February 4th, 2025

Storing, querying and keeping embeddings updated: options and best practices

Davide Mauri
Principal Product Manager

Embeddings and vectors are becoming common terms not only for engineers involved in AI-related activities but also for those working with databases. Some questions that frequently come up among users working with vectors and embeddings are:

  • How can I quickly “vectorize” the content already present in my database?
  • How can I keep the vectors updated whenever there is a change to the content from which they have been generated?
  • What if I have more than one column that I would like to use for generating embeddings?
  • Are there any best practices for storing and querying vectors in modern relational databases?

Let’s tackle these questions one by one, starting with the first. I’ll cover the remaining ones in the next blog posts.

Quickly vectorize existing data

This is a straightforward question. While the answer is often “it depends,” in this case, there is a clear and direct solution: to efficiently vectorize data, batching requests is essential. The OpenAI API—and by extension, the Azure OpenAI service—supports receiving an array of strings for vectorization, thereby minimizing REST call overhead:

https://platform.openai.com/docs/api-reference/embeddings
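
To make the idea concrete, here is a minimal C# sketch of such a batched call: a single POST carrying an array of strings to the OpenAI embeddings endpoint. Treat the endpoint, the authentication header and the model name as placeholders – Azure OpenAI, for example, expects an api-key header and a deployment-specific URL – and adapt them to your service.

```csharp
// Minimal sketch of a batched embedding request (placeholders: endpoint,
// auth header, model name). The whole array is vectorized in one REST call.
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient();
// OpenAI uses "Authorization: Bearer <key>"; Azure OpenAI uses an "api-key" header instead.
http.DefaultRequestHeaders.Add("Authorization",
    $"Bearer {Environment.GetEnvironmentVariable("OPENAI_API_KEY")}");

var payload = new
{
    model = "text-embedding-3-small",   // or your Azure OpenAI deployment name
    input = new[] { "first text to embed", "second text to embed", "third text to embed" }
};

var response = await http.PostAsJsonAsync("https://api.openai.com/v1/embeddings", payload);
response.EnsureSuccessStatusCode();

// The response contains one embedding per input string, in the same order.
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
foreach (var item in doc.RootElement.GetProperty("data").EnumerateArray())
{
    int index = item.GetProperty("index").GetInt32();
    int size  = item.GetProperty("embedding").GetArrayLength();
    Console.WriteLine($"input #{index}: {size}-dimensional vector");
}
```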

The OpenAI API is rapidly becoming the de facto standard, so even if you plan to use your own model with an on-premises inference server, batch support for embedding generation is very likely available. For instance, Ollama already supports this feature:

https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embeddings
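
Just to show how similar the shape is, here’s a hedged sketch of the same batched call against a local Ollama server using its /api/embed endpoint; the model name is only an example of an embedding model you may have pulled locally.

```csharp
// Minimal sketch of a batched call to a local Ollama server via /api/embed.
// "nomic-embed-text" is just an example of a locally pulled embedding model.
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

var response = await http.PostAsJsonAsync("/api/embed", new
{
    model = "nomic-embed-text",
    input = new[] { "first text to embed", "second text to embed" }  // one call, many texts
});
response.EnsureSuccessStatusCode();

using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
// "embeddings" holds one vector per input string, in the same order.
Console.WriteLine($"{doc.RootElement.GetProperty("embeddings").GetArrayLength()} vectors returned");
```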

Having established the optimal strategy, how can it be implemented? Several options exist, ranging from Azure Data Factory to Apache Spark. Essentially, any tool capable of reading from Azure SQL, batching requests, and making REST calls is suitable.
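
As a rough, hypothetical illustration of that “read, batch, call” loop, the sketch below reads rows from Azure SQL with the Microsoft.Data.SqlClient package and groups them into fixed-size batches. The connection string variable, table and column names are invented, and the actual REST call is only a placeholder since it was shown earlier.

```csharp
// Hypothetical "read, batch, call" loop: table, column and environment
// variable names are invented; the REST call itself is only a placeholder.
using System;
using System.Collections.Generic;
using Microsoft.Data.SqlClient;   // NuGet: Microsoft.Data.SqlClient

const int BatchSize = 50;         // texts per embedding request

using var conn = new SqlConnection(Environment.GetEnvironmentVariable("AZURE_SQL_CONNECTION_STRING"));
await conn.OpenAsync();

using var cmd = new SqlCommand("SELECT id, content FROM dbo.Articles", conn);
using var reader = await cmd.ExecuteReaderAsync();

var batch = new List<(int Id, string Text)>(BatchSize);
while (await reader.ReadAsync())
{
    batch.Add((reader.GetInt32(0), reader.GetString(1)));
    if (batch.Count == BatchSize)
    {
        Console.WriteLine($"sending a batch of {batch.Count} texts");  // batched /embeddings call goes here
        batch.Clear();
    }
}
if (batch.Count > 0)
    Console.WriteLine($"sending a final batch of {batch.Count} texts");
```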

Parallelize and scale

However, turning text into embeddings at scale presents complexities. For instance, if the text to be vectorized is too lengthy, it must be chunked to ensure the generated embeddings accurately represent the underlying concepts. Additionally, embedding models may have inherent limitations, such as restrictions on the amount of text (in terms of tokens) that can be vectorized simultaneously.
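
To make the chunking point concrete, here’s a deliberately naive sketch that splits long text into overlapping windows of words. It’s purely illustrative: real code should count tokens with the model’s own tokenizer and ideally split on semantic boundaries (sentences, paragraphs) rather than raw word counts.

```csharp
// Naive chunking sketch: fixed-size word windows with overlap, so each chunk
// stays within the model's input limit. Word count is only a rough proxy for
// the token count a real tokenizer would give you.
using System;
using System.Collections.Generic;
using System.Linq;

// Fake 1,000-word "document", just to exercise the chunker.
var longText = string.Join(' ', Enumerable.Range(0, 1000).Select(i => $"word{i}"));

foreach (var chunk in Chunk(longText))
    Console.WriteLine($"chunk of {chunk.Split(' ').Length} words");

static IEnumerable<string> Chunk(string text, int maxWords = 256, int overlap = 32)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    for (int start = 0; start < words.Length; start += maxWords - overlap)
    {
        yield return string.Join(' ', words.Skip(start).Take(maxWords));
        if (start + maxWords >= words.Length) break;   // last window covers the tail
    }
}
```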

To address these limitations and enhance scalability, one option is to scale out by deploying multiple embedding models in parallel. By distributing arrays of text to different models simultaneously, the solution can easily be scaled by adding more instances of the desired embedding model.
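
A minimal sketch of that idea, assuming a set of interchangeable OpenAI-compatible deployments: batches are assigned round-robin to the available endpoints and sent concurrently. Endpoint URLs, the model name and the fake batches are all placeholders.

```csharp
// Scale-out sketch: batch i goes to deployment i % N, all calls run concurrently.
// Endpoint URLs, model name and the fake batches are placeholders.
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

string[] endpoints =
{
    "https://embeddings-1.example.com/embeddings",
    "https://embeddings-2.example.com/embeddings",
    "https://embeddings-3.example.com/embeddings"
};

var http = new HttpClient();

// Fake batches standing in for the real "texts to vectorize" work items.
var batches = Enumerable.Range(0, 10)
    .Select(b => Enumerable.Range(0, 50).Select(i => $"text {b}-{i}").ToArray())
    .ToList();

// Adding capacity is just a matter of appending another endpoint to the array.
var tasks = batches.Select((texts, i) =>
    http.PostAsJsonAsync(endpoints[i % endpoints.Length],
        new { model = "text-embedding-3-small", input = texts }));

await Task.WhenAll(tasks);
```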

[Image: high-level architecture of the scale-out vectorization pipeline]

A working example

At a certain point, I had to vectorize approximately 10 million rows of text and needed to complete the task as efficiently as possible. I therefore developed a sample that takes care of the whole process. To keep it as customizable and adaptable to different scenarios as possible, I opted for .NET and implemented everything from scratch on top of the ConcurrentQueue class, which makes it easy to scale out and parallelize the work as much as needed.
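
This is not the sample’s actual code, but the core pattern can be sketched in a few lines: a ConcurrentQueue holds the batches to process (filled from Azure SQL in the real tool, with fake data here) and several consumer tasks drain it in parallel; ProcessBatchAsync stands in for the batched embedding call.

```csharp
// Sketch of the pattern only, not the sample's actual code: a ConcurrentQueue
// of batches drained by several consumers in parallel. Here the queue is filled
// up front with fake data to keep the sketch short; the real tool reads from
// Azure SQL and produces/consumes concurrently.
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<string[]>();

// Producer: enqueue 100 fake batches of 50 texts each.
for (int b = 0; b < 100; b++)
    queue.Enqueue(Enumerable.Range(0, 50).Select(i => $"text {b}-{i}").ToArray());

// Consumers: one task per parallel worker; raise the count (and the number of
// embedding deployments behind it) to scale out.
var consumers = Enumerable.Range(0, 4).Select(_ => Task.Run(async () =>
{
    while (queue.TryDequeue(out var batch))
        await ProcessBatchAsync(batch);
}));

await Task.WhenAll(consumers);

static Task ProcessBatchAsync(string[] batch)
{
    Console.WriteLine($"embedding a batch of {batch.Length} texts");
    return Task.CompletedTask;   // placeholder for the batched /embeddings call
}
```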

[Image: screenshot of the sample vectorizer tool running]

The sample – which can be used as a standalone tool if you don’t need any custom behavior; otherwise, feel free to contribute to it 🙂 – is available on GitHub:

https://github.com/Azure-Samples/azure-sql-db-vectorizer

The reason why vectorized data is stored in a dedicated table will be explained in the next blog post. Stay tuned!

Author

Davide Mauri
Principal Product Manager

Principal Product Manager in Azure SQL, with a career in IT dating back to 1997 and the prestigious Data Platform MVP award earned for 12 consecutive years. He currently serves as Principal Product Manager for Azure SQL Database, focusing on developers and AI.
