{"id":4600,"date":"2025-05-06T08:02:57","date_gmt":"2025-05-06T15:02:57","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sql\/?p=4600"},"modified":"2025-05-07T09:47:48","modified_gmt":"2025-05-07T16:47:48","slug":"efficiently-and-elegantly-modeling-embeddings-in-azure-sql-and-sql-server","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sql\/efficiently-and-elegantly-modeling-embeddings-in-azure-sql-and-sql-server\/","title":{"rendered":"Efficiently and Elegantly Modeling Embeddings in Azure SQL and SQL Server"},"content":{"rendered":"<p>Storing and querying text embeddings in a database it might seem challenging, but with the right schema design, it&#8217;s not only possible, it&#8217;s powerful. Whether you\u2019re building AI-powered search, semantic filtering, or recommendation features, embeddings, and thus vectors, are now a first-class data type. So how do you model them well inside a database like SQL Server and Azure SQL?<\/p>\n<p>In this post, I\u2019ll walk you through a practical, scalable strategy to store embeddings effectively\u2014while keeping your schema clean, your queries readable, and your operations efficient.<\/p>\n<p>This blog post is part of a series of blogs I&#8217;m writing as I&#8217;m learning how to bring AI to the database, which is the most efficient way to use AI with your own data. <a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/storing-querying-and-keeping-embeddings-updated-options-and-best-practices\/\">Make sure to check out the entire series<\/a> as you&#8217;ll find answers to very common questions.<\/p>\n<h2>The Challenge of Managing Embeddings in a Database<\/h2>\n<p>Embeddings are high-dimensional vectors that represent the meaning of text. When you want to store and search them in a database environment, you quickly hit a few common challenges.<\/p>\n<p>First, embeddings can be large\u2014often several kilobytes per vector\u2014which means they unlikely belong in your core table structure. Second, if your text content is long (like an article, a contract, or a code snippet with detailed notes), you usually need to chunk the text and generate multiple embeddings per record. Finally, many real-world records include multiple text fields\u2014such as description, notes, and details\u2014each of which might need to be embedded separately. This rules out the idea of having a single \u201cembedding column\u201d directly embedded in your main data table.<\/p>\n<p>What you need instead is a flexible and maintainable model that accounts for all of this complexity without becoming a mess of JOINs and tangled logic.<\/p>\n<h2>The Right Way: One Embedding Table per Text Field<\/h2>\n<p>The most effective approach is to create one separate embedding table for each long-text field that needs to be vectorized. Each table holds the chunked embeddings for a specific field\u2014so for example, all description embeddings go into a <code>code_samples_description_embeddings<\/code> table, all note embeddings into <code>code_samples_notes_embeddings<\/code>, and so on.<\/p>\n<p>Here\u2019s what the base samples table might look like:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">create table dbo.code_samples\r\n(\r\n    id int identity primary key,\r\n    name nvarchar(100) not null,\r\n    description nvarchar(max) not null,\r\n    notes nvarchar(max) null,\r\n    details json null,\r\n    url nvarchar(1000) not null,\r\n    created_on datetime2 not null,\r\n    updated_on datetime2 not null\r\n)<\/code><\/pre>\n<p>Then, each long-form field gets its own embedding table. For example for the <code>description<\/code> field I created the following table to store the related embeddings:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">create table dbo.code_samples_description_embeddings (\r\n    id int not null,\r\n    chunk_index int not null,\r\n    embedding vector(1536) not null,\r\n    primary key (id, chunk_index)\r\n)<\/code><\/pre>\n<p>Similar tables will be created for the <code>notes<\/code> and the <code>details<\/code> columns<\/p>\n<p>This gives you the flexibility to handle large, multi-paragraph text inputs while keeping embeddings organized by field and easily manageable.<\/p>\n<h2>Elegant Querying with LEAST()<\/h2>\n<p>Let\u2019s say you want to find the top-k most relevant code samples for a given query vector. Relevance could come from the description, notes, or details field\u2014you don\u2019t necessarily know which.<\/p>\n<p>Instead of unioning results from all embedding tables, you can use a single query and leverage the <code>LEAST()<\/code> function to pick the closest match from all available embedding sources:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">select top(@k)\r\n    s.id, name, description, notes, details,\r\n    least(\r\n        vector_distance('cosine', de.embedding, @qv),\r\n        vector_distance('cosine', ne.embedding, @qv),\r\n        vector_distance('cosine', dte.embedding, @qv)\r\n    ) as distance_score\r\nfrom\r\n    samples s\r\ninner join\r\n    samples_description_embeddings de on de.id = s.id\r\nleft join\r\n    samples_notes_embeddings ne on ne.id = s.id\r\nleft join\r\n    samples_details_embeddings dte on dte.id = s.id\r\norder by\r\n    distance_score asc<\/code><\/pre>\n<p>This is where <code>LEAST()<\/code> shines. It cleanly expresses the idea: <em>\u201cGet the closest semantic match across all available text fields.\u201d<\/em><\/p>\n<p>The resulting query is readable, concise, and easy to maintain. Adding another field? Just add a new <code>LEFT JOIN<\/code> and another <code>vector_distance()<\/code> expression inside <code>LEAST()<\/code>.<\/p>\n<p>It\u2019s fantastic case that shows how SQL queries are expressive and elegant.<\/p>\n<h2>Embedding Metadata with JSON for Richer Context<\/h2>\n<p>While some fields (like descriptions or notes) benefit from chunked embeddings, others\u2014such as tags, author, category, or language\u2014are too short to justify embedding individually. Embedding \u201cPython\u201d or \u201cdatabase tutorial\u201d on their own doesn\u2019t carry much semantic weight.<\/p>\n<p>A better approach is to consolidate these lightweight fields into a single JSON column, and embed the result as one coherent string. This reduces complexity and adds semantic density.<\/p>\n<p>The sample table shown above as a <code>details<\/code> JSON column that looks like this:<\/p>\n<pre class=\"prettyprint language-json\"><code class=\"language-json\">{\r\n    \"tags\": [\"SQL\", \"vector search\"],\r\n    \"author\": \"Alice Smith\",\r\n    \"language\": \"T-SQL\",\r\n    \"category\": \"database design\"\r\n}<\/code><\/pre>\n<p>Now you only need one embedding operation to capture the full semantic fingerprint of that record\u2019s metadata. This approach is especially useful when metadata varies in shape across records or grows over time. It allows you to semantically represent almost the entire record\u2014long-form fields via chunked embedding tables, and metadata via a single embedded JSON blob\u2014without cluttering your schema or blowing up your compute budget.<\/p>\n<p>Of course, you don&#8217;t need to have a JSON column to follow this approach. You can easily generate the JSON data from your table using a <code>FOR JSON<\/code> or <code>JSON_OBJECT<\/code> function and generate the JSON on the fly, without needed to store it at all. You&#8217;ll be only storing the embedding calculated on the generated JSON.<\/p>\n<h2>Wrapping Up<\/h2>\n<p>If you\u2019re integrating embeddings into your database, the key is to think modularly and semantically.<\/p>\n<p>Start by breaking out each long-form text field into its own embedding table, allowing chunked storage and flexible vector querying. Use <code>LEAST()<\/code> function to combine vector distances elegantly at runtime. And for short, structured metadata, consolidate it into a JSON column and generate a single embedding from the flattened content.<\/p>\n<p>This strategy turns your existing structured data platform into a vector-aware, AI-powered backend\u2014without introducing unnecessary complexity or requiring a new database system.<\/p>\n<p>It\u2019s fast. It\u2019s clean.<em> And it works beautifully.<\/em><\/p>\n<h2>Real-World Example<\/h2>\n<p>Everything explained in this post has been used to build a real-world, production-grade sample, freely available on GitHub. The only difference is that, for my use case, I didn&#8217;t need to chunk the text, so in the embeddings table you&#8217;ll not find the <code>chunk_id<\/code>, but other than that, the sample is very well aligned with that explained her, (and with many more cool best practices that I&#8217;ll explain in future posts&#8230;.so stay tuned!<\/p>\n<p><a href=\"https:\/\/github.com\/yorek\/azure-sql-db-ai-samples-search\">https:\/\/github.com\/yorek\/azure-sql-db-ai-samples-search<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Storing and querying text embeddings in a database it might seem challenging, but with the right schema design, it&#8217;s not only possible, it&#8217;s powerful. Whether you\u2019re building AI-powered search, semantic filtering, or recommendation features, embeddings, and thus vectors, are now a first-class data type. So how do you model them well inside a database like [&hellip;]<\/p>\n","protected":false},"author":24720,"featured_media":4623,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[601,1,615],"tags":[666,665,640,667,647],"class_list":["post-4600","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-azure-sql","category-vectors","tag-best-pratices","tag-data-modeling","tag-embeddings","tag-practices","tag-vectors"],"acf":[],"blog_post_summary":"<p>Storing and querying text embeddings in a database it might seem challenging, but with the right schema design, it&#8217;s not only possible, it&#8217;s powerful. Whether you\u2019re building AI-powered search, semantic filtering, or recommendation features, embeddings, and thus vectors, are now a first-class data type. So how do you model them well inside a database like [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/4600","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/users\/24720"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/comments?post=4600"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/4600\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media\/4623"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media?parent=4600"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/categories?post=4600"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/tags?post=4600"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}