{"id":2598,"date":"2023-11-15T08:00:50","date_gmt":"2023-11-15T16:00:50","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sql\/?p=2598"},"modified":"2023-11-14T20:55:36","modified_gmt":"2023-11-15T04:55:36","slug":"vector-search-with-azure-sql-database","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sql\/vector-search-with-azure-sql-database\/","title":{"rendered":"Vector Search with Azure SQL Database"},"content":{"rendered":"<p>With <a href=\"https:\/\/aka.ms\/AISearchIntegratedVectorizationAnnouncement\">public preview of integrated vectorization<\/a>, a ground-breaking capability of vector search in Azure AI Search (previously Azure Cognitive Search), you can do vector search with data stored in Azure SQL Database easily. This feature is designed to streamline the process of chunking, generating, storing, and querying vectors for vector search in Azure AI Search. This feature significantly speeds up the development of the vectorization pipeline and minimizes maintenance tasks during data ingestion and query time. It paves the way for seamless integration of vectors into your applications with traditional search demands and Retrieval-Augmented Generation (RAG) applications.<\/p>\n<p>Learn more about <a href=\"https:\/\/aka.ms\/AISearchIntegratedVectorizationAnnouncement\">Azure AI Search Integrated vectorization announcement<\/a>.<\/p>\n<h2>Key Concepts in Integrated Vectorization<\/h2>\n<ul>\n<li><strong>Vector search:<\/strong> In Azure AI Search, this is a capability for indexing, storing, and retrieving vector embeddings from a search index. By representing text as vectors, vector search can identify the most similar documents based on their proximity in a vector space. In vector search, vectorization refers to the conversion of text data into vector embeddings.<\/li>\n<li><strong>Chunking:<\/strong> Process of dividing data into smaller manageable parts (chunks) that can be processed independently. Chunking is required if source documents are too large for the maximum input size of embedding and\/or large language models.<\/li>\n<li><strong>Retrieval Augmented Generation (RAG):<\/strong> Architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system (i.e., Azure AI Search) that provides the data.<\/li>\n<\/ul>\n<h2>Example with data stored in Azure SQL Database<\/h2>\n<p>Let\u2019s create an example of an AI application that responds to users&#8217; queries based on the Azure SQL DB table of Amazon product reviews. This example is using a Python notebook. The same operations can be done via the Azure portal or scripting in your favorite deployment option. You can find this example at: <a href=\"https:\/\/aka.ms\/sql-ai\/sql-acs\/sample\">https:\/\/aka.ms\/sql-ai\/sql-acs\/sample<\/a>\u00a0 (Thanks to our SQL Data Science Team especially Jordan DuBeau).<\/p>\n<p>The end behavior will be something like:<\/p>\n<p><strong><em>[User search]: Canned dog food<\/em><\/strong><\/p>\n<p><strong><em>[AI Response]: After searching through our product database, I recommend &lt;product ID&gt; because&#8230;<\/em><\/strong><\/p>\n<p>Behind the scenes, we take the following steps:<\/p>\n<ul>\n<li>Set up a sample table in SQL DB and upload data to it.<\/li>\n<li>Set up an index in Azure AI Search to store the data we need, including vectorized versions of the text reviews.<\/li>\n<li>Set up an indexer in Azure AI Search to pull data into the index.\n<ul>\n<li>Automatically chunks and vectorizes the data using an Azure OpenAI Embedding service.<\/li>\n<\/ul>\n<\/li>\n<li>Use Azure AI Search to process the user&#8217;s query and search for the most relevant data.<\/li>\n<li>Use an Azure OpenAI Completion service to respond to the user&#8217;s query.<\/li>\n<\/ul>\n<h2>Architecture<\/h2>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1.png\"><img decoding=\"async\" class=\"aligncenter wp-image-2646 size-large\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1-1024x558.png\" alt=\"Image Integerated Vectorization 1\" width=\"640\" height=\"349\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1-1024x558.png 1024w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1-300x164.png 300w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1-768x419.png 768w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1-1536x838.png 1536w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2023\/11\/Integerated-Vectorization-1-2048x1117.png 2048w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<h2>Setting up environment<\/h2>\n<p>You will need to set up the environment and add the environment information in example.env file.<\/p>\n<p>Pre-requisites in this example:<\/p>\n<ul>\n<li>An existing SQL Database with server name, DB name, username, and password copied into\u00a0example.env\n<ul>\n<li>The user must have permission to create a new table and enable and view change tracking on the database<\/li>\n<li>You must whitelist your IP to access your SQL server by opening the SQL server resource in the Azure portal, navigating to Security \/ Networking, and adding your IP.<\/li>\n<\/ul>\n<\/li>\n<li>An OpenAI resource with the endpoint and key copied into\u00a0example.env<\/li>\n<li>An Azure AI Search resource with the endpoint and key copied into example.env<\/li>\n<li>The Python packages listed in\u00a0requirements.txt<\/li>\n<li>The Microsoft ODBC 18<\/li>\n<\/ul>\n<p>This sample shows how to load data as well, you can skip that step and start with your own product data if needed.<\/p>\n<p>Next, create a data source connection. This step creates a connection that will be used to pull data from our SQL table.\u00a0Documentation can be found\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers\">here.<\/a><\/p>\n<h2><span class=\"ui-provider\">Chunk, vectorize, configure, and store data<\/span><\/h2>\n<h4><span class=\"ui-provider\" style=\"font-size: 18pt;\">Create an index and configure integrated vectorization<\/span><\/h4>\n<p>Here are the steps:<\/p>\n<ol>\n<li>Take the combined text (summary + review text) from each product review.<\/li>\n<li>Split the combined text into chunks.<\/li>\n<li>Embed each chunk as a vector.<\/li>\n<li>(Later) search for the most relevant chunk based on the incoming query.<\/li>\n<\/ol>\n<p>To enable this, the search index will store all of the following data, for each chunk of text:<\/p>\n<ul>\n<li>Id of chunk<\/li>\n<li>Chunk text<\/li>\n<li>Vector version of chunk text<\/li>\n<li>Id of parent row<\/li>\n<li>Product Id from parent row<\/li>\n<li>Review text from parent row<\/li>\n<li>Summary text from parent row<\/li>\n<li>Score from parent row<\/li>\n<\/ul>\n<p>All of these values will be stored in SearchFields specified in the code.<\/p>\n<p>In this step we also configure the search algorithm(s), and the vectorizer that will automatically vectorize the incoming query.<\/p>\n<p>Documentation about creating indexes can be found\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/search-how-to-create-search-index?tabs=index-other-sdks\">here.<\/a><\/p>\n<h2><span class=\"cm-line\">Create skillset and indexer<\/span><\/h2>\n<p><span class=\"cm-line\">We use two built-in skills provided by Azure AI Search:\n<\/span><span class=\"cm-line\">1.\u00a0The\u00a0Split\u00a0Skill\u00a0takes\u00a0the\u00a0review\u00a0text\u00a0and\u00a0divides\u00a0it\u00a0into\u00a0chunks\u00a0(to\u00a0stay\u00a0within\u00a0the\u00a0token\u00a0limits\u00a0for\u00a0the\u00a0OpenAI\u00a0embedding\u00a0service).\n<\/span><span class=\"cm-line\">2.\u00a0The\u00a0Azure\u00a0Open\u00a0AI\u00a0Embedding\u00a0Skill\u00a0takes\u00a0the\u00a0outputs\u00a0of\u00a0the\u00a0Split\u00a0Skill\u00a0and\u00a0vectorizes\u00a0them\u00a0individually.\n<\/span><span class=\"cm-line\">\n<\/span><span class=\"cm-line\">Then we create an indexer that uses our skillset to pull data from the SQL Database, separate the text into chunks, vectorize each chunk, and store all the required data in the index we created above.\n<\/span><span class=\"cm-line\">\n<\/span><span class=\"cm-line\">Documentation is <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/search\/search-indexer-overview\">here<\/a> to start learning more about indexers and skillsets.<\/span><\/p>\n<h2>Use vector search for sample application<\/h2>\n<h2 style=\"font-weight: var(--jp-content-heading-font-weight);\">Perform queries<code class=\"language-py\"><\/code><\/h2>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">user_query = \"Canned dog food\"<\/code><\/pre>\n<p style=\"font-weight: var(--jp-content-heading-font-weight);\">In the following output, we find the top 3 chunks that are most relevant to the user&#8217;s query.<\/p>\n<p><span style=\"font-size: 10pt;\">Search score: 0.88524085<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Parent Id: 1 | Chunk id: f59640a3248d_1_pages_0<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Product Id: B001E4KFG0<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Text chunk: Summary: Good Quality Dog Food | Review: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better.<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review summary: Good Quality Dog Food<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review score: 5<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">&#8212;&#8211;<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Search score: 0.87025785<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Parent Id: 94 | Chunk id: f327d3004d0c_94_pages_3<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Product Id: B0019CW0HE<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Text chunk: a couple of cans. I came home and to my surprise realized that I could save $20 each time I bought dog food if I just buy it off Amazon.&lt;br \/&gt;&lt;br \/&gt;All in all, I definitely recommend and give my stamp of approval to natural balance dog food. While I have never eaten it, my dog seems to love it.<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review summary: Great Dog Food!<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review text: My golden retriever is one of the most picky dogs I&#8217;ve ever met. After experimenting with various types of food, I have found she loves natural balance. What I really like about natural balance is the fact that it has multiple flavors in dry and wet varieties. I mix her dry food with a little wet food and my golden loves it. Furthermore, I do like mixing up the flavors each time as I think the same meal day over day might get a little boring, so I figured why not. I tend to stay away from the fish type though as it smells&#8230;&lt;br \/&gt;&lt;br \/&gt;Additionally, I started purchasing off Amazon because Petco didn&#8217;t have the wet food box and only had a couple of cans. I came home and to my surprise realized that I could save $20 each time I bought dog food if I just buy it off Amazon.&lt;br \/&gt;&lt;br \/&gt;All in all, I definitely recommend and give my stamp of approval to natural balance dog food. While I have never eaten it, my dog seems to love it.<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review score: 5<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">&#8212;&#8211;<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Search score: 0.8640232<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Parent Id: 98 | Chunk id: 0bc498329489_98_pages_0<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Product Id: B0019CW0HE<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Text chunk: Summary: Great allergy sensitive dog food, dogs love it | Review: Our pup has experienced allergies in forms of hotspots and itching from other dog foods. The cheap &#8216;you can buy it anywhere&#8217; food not only have crazy preservatives in them but can cause health problems for your pets.<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review summary: Great allergy sensitive dog food, dogs love it<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review text: Our pup has experienced allergies in forms of hotspots and itching from other dog foods. The cheap &#8216;you can buy it anywhere&#8217; food not only have crazy preservatives in them but can cause health problems for your pets. This food works wonders on reducing allergies and our dog loves the food.&lt;br \/&gt;This message is RAMSEY FrAnkenSteiN approved.<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">Review score: 5<\/span><\/p>\n<p><span style=\"font-size: 10pt;\">&#8212;&#8211;<\/span><\/p>\n<h2>Call to OpenAI<\/h2>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">response = openai.Completion.create(\r\n    engine= config[\"openai_deployment_completion\"],\r\n    prompt=prompt,\r\n    max_tokens=1024,\r\n    n=1,\r\n    stop=None,\r\n    temperature=1,\r\n)\r\n\r\nprint(response['choices'][0]['text'])<\/code><\/pre>\n<p><span style=\"font-size: 10pt;\"><strong>Result:<\/strong> After searching through our product database, we recommend the Vitality canned dog food (B001E4KFG0). This product looks more like a stew than a processed meat, smells better, and was given a 5-star review by a finicky Labrador.<\/span><\/p>\n<h2>Clean up resources<\/h2>\n<p>After finishing the sample, remember to delete unneeded resources:<\/p>\n<ul>\n<li>Table created within existing SQL DB<\/li>\n<li>Within the Search Service resource:\n<ul>\n<li>Data source connection<\/li>\n<li>Index<\/li>\n<li>Skillset<\/li>\n<li>Indexer<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>These resources can always be recreated by rerunning the notebook.<\/p>\n<p>&nbsp;<\/p>\n<h2>Next Steps:<\/h2>\n<p>Learn more about Azure SQL and AI application development at <a href=\"https:\/\/aka.ms\/sql-ai\">https:\/\/aka.ms\/sql-ai<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With public preview of integrated vectorization, a ground-breaking capability of vector search in Azure AI Search (previously Azure Cognitive Search), you can do vector search with data stored in Azure SQL Database easily. This feature is designed to streamline the process of chunking, generating, storing, and querying vectors for vector search in Azure AI Search. [&hellip;]<\/p>\n","protected":false},"author":99201,"featured_media":81,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[590,561,591],"class_list":["post-2598","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sql","tag-ai","tag-openai","tag-vector-search"],"acf":[],"blog_post_summary":"<p>With public preview of integrated vectorization, a ground-breaking capability of vector search in Azure AI Search (previously Azure Cognitive Search), you can do vector search with data stored in Azure SQL Database easily. This feature is designed to streamline the process of chunking, generating, storing, and querying vectors for vector search in Azure AI Search. [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/2598","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/users\/99201"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/comments?post=2598"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/2598\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media\/81"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media?parent=2598"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/categories?post=2598"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/tags?post=2598"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}