Vector Search with Azure SQL Database

Muazma Zahid

With public preview of integrated vectorization, a ground-breaking capability of vector search in Azure AI Search (previously Azure Cognitive Search), you can do vector search with data stored in Azure SQL Database easily. This feature is designed to streamline the process of chunking, generating, storing, and querying vectors for vector search in Azure AI Search. This feature significantly speeds up the development of the vectorization pipeline and minimizes maintenance tasks during data ingestion and query time. It paves the way for seamless integration of vectors into your applications with traditional search demands and Retrieval-Augmented Generation (RAG) applications.

Learn more about Azure AI Search Integrated vectorization announcement.

Key Concepts in Integrated Vectorization

  • Vector search: In Azure AI Search, this is a capability for indexing, storing, and retrieving vector embeddings from a search index. By representing text as vectors, vector search can identify the most similar documents based on their proximity in a vector space. In vector search, vectorization refers to the conversion of text data into vector embeddings.
  • Chunking: Process of dividing data into smaller manageable parts (chunks) that can be processed independently. Chunking is required if source documents are too large for the maximum input size of embedding and/or large language models.
  • Retrieval Augmented Generation (RAG): Architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system (i.e., Azure AI Search) that provides the data.

Example with data stored in Azure SQL Database

Let’s create an example of an AI application that responds to users’ queries based on the Azure SQL DB table of Amazon product reviews. This example is using a Python notebook. The same operations can be done via the Azure portal or scripting in your favorite deployment option. You can find this example at: https://aka.ms/sql-ai/sql-acs/sample  (Thanks to our SQL Data Science Team especially Jordan DuBeau).

The end behavior will be something like:

[User search]: Canned dog food

[AI Response]: After searching through our product database, I recommend <product ID> because…

Behind the scenes, we take the following steps:

  • Set up a sample table in SQL DB and upload data to it.
  • Set up an index in Azure AI Search to store the data we need, including vectorized versions of the text reviews.
  • Set up an indexer in Azure AI Search to pull data into the index.
    • Automatically chunks and vectorizes the data using an Azure OpenAI Embedding service.
  • Use Azure AI Search to process the user’s query and search for the most relevant data.
  • Use an Azure OpenAI Completion service to respond to the user’s query.

Architecture

Image Integerated Vectorization 1

Setting up environment

You will need to set up the environment and add the environment information in example.env file.

Pre-requisites in this example:

  • An existing SQL Database with server name, DB name, username, and password copied into example.env
    • The user must have permission to create a new table and enable and view change tracking on the database
    • You must whitelist your IP to access your SQL server by opening the SQL server resource in the Azure portal, navigating to Security / Networking, and adding your IP.
  • An OpenAI resource with the endpoint and key copied into example.env
  • An Azure AI Search resource with the endpoint and key copied into example.env
  • The Python packages listed in requirements.txt
  • The Microsoft ODBC 18

This sample shows how to load data as well, you can skip that step and start with your own product data if needed.

Next, create a data source connection. This step creates a connection that will be used to pull data from our SQL table. Documentation can be found here.

Chunk, vectorize, configure, and store data

Create an index and configure integrated vectorization

Here are the steps:

  1. Take the combined text (summary + review text) from each product review.
  2. Split the combined text into chunks.
  3. Embed each chunk as a vector.
  4. (Later) search for the most relevant chunk based on the incoming query.

To enable this, the search index will store all of the following data, for each chunk of text:

  • Id of chunk
  • Chunk text
  • Vector version of chunk text
  • Id of parent row
  • Product Id from parent row
  • Review text from parent row
  • Summary text from parent row
  • Score from parent row

All of these values will be stored in SearchFields specified in the code.

In this step we also configure the search algorithm(s), and the vectorizer that will automatically vectorize the incoming query.

Documentation about creating indexes can be found here.

Create skillset and indexer

We use two built-in skills provided by Azure AI Search: 1. The Split Skill takes the review text and divides it into chunks (to stay within the token limits for the OpenAI embedding service). 2. The Azure Open AI Embedding Skill takes the outputs of the Split Skill and vectorizes them individually. Then we create an indexer that uses our skillset to pull data from the SQL Database, separate the text into chunks, vectorize each chunk, and store all the required data in the index we created above. Documentation is here to start learning more about indexers and skillsets.

Use vector search for sample application

Perform queries

user_query = "Canned dog food"

In the following output, we find the top 3 chunks that are most relevant to the user’s query.

Search score: 0.88524085

Parent Id: 1 | Chunk id: f59640a3248d_1_pages_0

Product Id: B001E4KFG0

Text chunk: Summary: Good Quality Dog Food | Review: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better.

Review summary: Good Quality Dog Food

Review text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.

Review score: 5

—–

Search score: 0.87025785

Parent Id: 94 | Chunk id: f327d3004d0c_94_pages_3

Product Id: B0019CW0HE

Text chunk: a couple of cans. I came home and to my surprise realized that I could save $20 each time I bought dog food if I just buy it off Amazon.<br /><br />All in all, I definitely recommend and give my stamp of approval to natural balance dog food. While I have never eaten it, my dog seems to love it.

Review summary: Great Dog Food!

Review text: My golden retriever is one of the most picky dogs I’ve ever met. After experimenting with various types of food, I have found she loves natural balance. What I really like about natural balance is the fact that it has multiple flavors in dry and wet varieties. I mix her dry food with a little wet food and my golden loves it. Furthermore, I do like mixing up the flavors each time as I think the same meal day over day might get a little boring, so I figured why not. I tend to stay away from the fish type though as it smells…<br /><br />Additionally, I started purchasing off Amazon because Petco didn’t have the wet food box and only had a couple of cans. I came home and to my surprise realized that I could save $20 each time I bought dog food if I just buy it off Amazon.<br /><br />All in all, I definitely recommend and give my stamp of approval to natural balance dog food. While I have never eaten it, my dog seems to love it.

Review score: 5

—–

Search score: 0.8640232

Parent Id: 98 | Chunk id: 0bc498329489_98_pages_0

Product Id: B0019CW0HE

Text chunk: Summary: Great allergy sensitive dog food, dogs love it | Review: Our pup has experienced allergies in forms of hotspots and itching from other dog foods. The cheap ‘you can buy it anywhere’ food not only have crazy preservatives in them but can cause health problems for your pets.

Review summary: Great allergy sensitive dog food, dogs love it

Review text: Our pup has experienced allergies in forms of hotspots and itching from other dog foods. The cheap ‘you can buy it anywhere’ food not only have crazy preservatives in them but can cause health problems for your pets. This food works wonders on reducing allergies and our dog loves the food.<br />This message is RAMSEY FrAnkenSteiN approved.

Review score: 5

—–

Call to OpenAI

response = openai.Completion.create(
    engine= config["openai_deployment_completion"],
    prompt=prompt,
    max_tokens=1024,
    n=1,
    stop=None,
    temperature=1,
)

print(response['choices'][0]['text'])

Result: After searching through our product database, we recommend the Vitality canned dog food (B001E4KFG0). This product looks more like a stew than a processed meat, smells better, and was given a 5-star review by a finicky Labrador.

Clean up resources

After finishing the sample, remember to delete unneeded resources:

  • Table created within existing SQL DB
  • Within the Search Service resource:
    • Data source connection
    • Index
    • Skillset
    • Indexer

These resources can always be recreated by rerunning the notebook.

 

Next Steps:

Learn more about Azure SQL and AI application development at https://aka.ms/sql-ai

1 comment

Discussion is closed. Login to edit/delete existing comments.

  • Chris CoulthrustMicrosoft employee 0

    Amazing ! Nice work

Feedback usabilla icon