<h1>Embedding models and dimensions: optimizing the performance to resource-usage ratio</h1>
<p>Since the release of the vector preview, we've been working with many customers building AI solutions on Azure SQL and SQL Server, and one of the most common questions is how to support high-dimensional data, for example more than 2000 dimensions per vector. At the moment, the vector type supports "only" up to 1998 dimensions per embedding. The impression such a limitation may give is that you cannot use the latest and greatest embedding model offered by OpenAI, and also available in Azure, the <em>text-embedding-3-large</em> model, as it returns 3072 dimensions by default.</p>
<p>Well, that's not the case. I would even add that using such a high number of dimensions doesn't really give you much benefit compared to the costs that come with it. Let me show you why.</p>
<h2>Embedding dimensions and the MTEB benchmark</h2>
<p>"<em>MTEB is a massive benchmark for measuring the performance of text embedding models on diverse embedding tasks</em>", as stated here: <a href="https://huggingface.co/blog/mteb">https://huggingface.co/blog/mteb</a>. 
Taking a look at the published leaderboard, filtered for the models available in OpenAI (just to compare something that can be easily used by everyone), you can see very interesting results:</p>
<p><img decoding="async" class="aligncenter wp-image-3984 size-full" src="https://devblogs.microsoft.com/azure-sql/wp-content/uploads/sites/56/2024/12/Screenshot-2024-12-16-093822.png" alt="MTEB leaderboard filtered for OpenAI embedding models" width="1795" height="345" /></p>
<p>As you can notice, the average performance is very similar across models, and it is interesting to see that <em>text-embedding-3-large</em> can be configured to return only 256 dimensions instead of the default 3072 while still performing very well. Great performance with 1/12 of the resource usage: that's quite a deal!</p>
<h2>Setting the Dimension Count</h2>
<p>As described in the article announcing the OpenAI text-embedding-3 models, with those models "<em>developers can shorten embeddings (i.e. remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties by passing in the <code>dimensions</code> API parameter. 
For example, on the MTEB benchmark, a <code>text-embedding-3-large</code> embedding can be shortened to a size of 256 while still outperforming an unshortened <code>text-embedding-ada-002</code> embedding with a size of 1536.</em>"</p>
<p>Details can be found in the "Native support for shortening embeddings" section of the article "<a href="https://openai.com/index/new-embedding-models-and-api-updates/">New embedding models and API updates</a>", published by OpenAI in January 2024.</p>
<p>To set the desired dimension count, you just have to pass the <code>dimensions</code> parameter in the request payload. Here's a sample in T-SQL:</p>
<pre class="prettyprint language-sql"><code class="language-sql">declare @inputText nvarchar(max) = 'It''s fun to do the impossible.';

-- Ask the model for 1024 dimensions instead of the default 3072
declare @payload nvarchar(max) = json_object(
    'input': @inputText,
    'dimensions': 1024
);

declare @retval int, @response nvarchar(max);
exec @retval = sp_invoke_external_rest_endpoint
    @url = 'https://dm-open-ai-3.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2023-03-15-preview',
    @method = 'POST',
    @credential = [https://dm-open-ai-3.openai.azure.com],
    @payload = @payload,
    @response = @response output;

-- Extract the embedding array from the response and cast it to a vector
declare @re nvarchar(max) = json_query(@response, '$.result.data[0].embedding');
select cast(@re as vector(1024));</code></pre>
<p>Please note that I'm using a <code>DATABASE SCOPED CREDENTIAL</code>, as explained in the <code>sp_invoke_external_rest_endpoint</code> <a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-invoke-external-rest-endpoint-transact-sql?view=fabric&amp;tabs=request-headers#credentials">documentation</a>, to securely store and use the Azure OpenAI API key. 
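</p>
<p>For reference, a credential for this scenario can be created as sketched below, following the pattern from the documentation linked above. This is a minimal sketch: the credential name must match the endpoint URL used in the call, and the password and API key are placeholders you would replace with your own values.</p>
<pre class="prettyprint language-sql"><code class="language-sql">-- A database master key is required before creating the credential
-- (placeholder password: replace with your own)
if not exists (select * from sys.symmetric_keys where name = '##MS_DatabaseMasterKey##')
    create master key encryption by password = 'My$trong_Passw0rd!';

-- The credential name must match the URL of the called endpoint;
-- the secret holds the HTTP headers, here the Azure OpenAI api-key (placeholder)
create database scoped credential [https://dm-open-ai-3.openai.azure.com]
with identity = 'HTTPEndpointHeaders',
     secret = '{"api-key": "AZURE-OPENAI-KEY-HERE"}';</code></pre>
<p>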
Another, even better, option would be to use Managed Identities, so you don't have to store and protect keys at all.</p>
<h2>Considerations on choosing the right model and dimension size</h2>
<p>Setting 1024 dimensions seems to be a sweet spot for the <em>text-embedding-3-large</em> model: with 4 KB of space (1024 dimensions, each one using 4 bytes to store a single-precision float value), it gives pretty much the same performance as 3072 dimensions, which would instead use 12 KB. More importantly, calculating the dot product or, even more so, the cosine distance requires far less computation, which in turn means less CPU usage and less power drain for practically the same results, as shown by this chart, also referenced in the OpenAI article mentioned before:</p>
<p><a href="https://devblogs.microsoft.com/azure-sql/wp-content/uploads/sites/56/2024/12/Screenshot-2024-12-16-172106.png"><img decoding="async" class="aligncenter wp-image-4018 size-full" src="https://devblogs.microsoft.com/azure-sql/wp-content/uploads/sites/56/2024/12/Screenshot-2024-12-16-172106.png" alt="MTEB scores for OpenAI embedding models at different dimension counts" width="917" height="123" /></a></p>
<p>It is important to understand that the "latest and greatest" embedding model might not be the best for your case, and deciding on the right model goes well beyond the simple dimension count. 
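</p>
<p>To give an idea of how the chosen dimension count flows through the rest of the solution, here is a minimal sketch, with hypothetical table and column names, of storing 1024-dimension embeddings and querying them by cosine distance:</p>
<pre class="prettyprint language-sql"><code class="language-sql">-- Hypothetical table storing the shortened 1024-dimension embeddings
create table dbo.documents
(
    id int identity primary key,
    content nvarchar(max) not null,
    embedding vector(1024) not null
);

-- Return the 10 documents closest to a query embedding;
-- in practice, populate @queryVector from the embedding REST call shown earlier
declare @queryVector vector(1024);
select top(10)
    id,
    content,
    vector_distance('cosine', embedding, @queryVector) as distance
from dbo.documents
order by distance asc;</code></pre>
<p>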
Taking a deeper look at the <a href="https://huggingface.co/spaces/mteb/leaderboard">MTEB Leaderboard</a> is something I strongly suggest, so that you can pick the best model taking all factors into account: use case, dimension count, localization, resource usage, speed, and quality. That way you can make sure you avoid overspending to get pretty much the same results.</p>
<h2>Yes, but…</h2>
<p>What if you really need more than 2K dimensions? We are aware of some fascinating use cases, particularly in machine learning workloads, where more than 10K dimensions are necessary. We've also received feedback about scenarios where each dimension value is just a bit (binary quantization). Additionally, some embedding models are now reaching up to 4K dimensions. We're exploring all these options (and more) to make sure we prioritize correctly. Stay tuned for updates in 2025! If you have any feedback or use cases you'd like us to consider, please leave a comment below. Your input will help us shape vector support to provide the best balance between performance, ease of use, and practical application.</p>