In the previous article of this series, we discussed how embeddings can be quickly created from data already in Azure SQL. This is a useful starting point, but since data in a database changes frequently, a common question arises: “How can the vectors be kept updated whenever there is a change to the content from which they have been generated?” This article aims to answer that question.

Solutions to keep embeddings updated

Several methods can be employed to ensure that embeddings stay in sync with the content from which they are generated:
- Using a Database Trigger
- Using Change Tracking
- Using an Azure Function Sql Trigger binding
- Using Azure Logic Apps
- Using Change Data Capture
- Using the new Change Event Stream
Using a Database Trigger

This is by far the simplest solution, as it requires only T-SQL code: it takes advantage of a database trigger to call an embedding model that vectorizes the text.
An AFTER INSERT, UPDATE trigger is created on the table containing the data to be vectorized. Internally, the trigger uses sp_invoke_external_rest_endpoint to call the embedding model and get the embedding vector back. To keep the code clean and reusable, it is recommended to wrap the embedding retrieval in a stored procedure, for example:
```sql
-- Store the Azure OpenAI endpoint credentials securely in the database
create database scoped credential [https://<your-azure-openai-endpoint>.openai.azure.com]
with identity = 'HTTPEndpointHeaders', secret = '{"api-key": ""}'; -- Add your Azure OpenAI key
go

create or alter procedure [dbo].[get_embedding]
@inputText nvarchar(max),
@embedding vector(1536) output
as
begin
    declare @retval int;
    declare @payload nvarchar(max) = json_object('input': @inputText);
    declare @response nvarchar(max);

    exec @retval = sp_invoke_external_rest_endpoint
        @url = 'https://<your-azure-openai-endpoint>.openai.azure.com/openai/deployments/<deployment-name>/embeddings?api-version=2023-03-15-preview',
        @method = 'POST',
        @credential = [https://<your-azure-openai-endpoint>.openai.azure.com],
        @payload = @payload,
        @response = @response output;

    set @embedding = cast(json_query(@response, '$.result.data[0].embedding') as vector(1536));
    return @retval;
end
go
```
And then call it from the trigger:

```sql
declare @retval int;
declare @content_vector vector(1536);
exec @retval = [dbo].[get_embedding] @content, @content_vector output with result sets none;
```
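As a minimal sketch of the full trigger, assuming a hypothetical dbo.Articles table with an id primary key, a content column, and a content_vector vector(1536) column (the table and column names are illustrative, not from the original article):

```sql
create or alter trigger [dbo].[trg_articles_embedding]
on [dbo].[Articles]
after insert, update
as
begin
    set nocount on;

    -- Skip firing when only the vector column was updated,
    -- so the trigger does not reprocess its own writes.
    if not update(content) return;

    declare @id int, @content nvarchar(max), @content_vector vector(1536);

    -- A cursor keeps the example simple, at the cost of one REST call
    -- per changed row inside the transaction.
    declare row_cursor cursor local fast_forward for
        select id, content from inserted;

    open row_cursor;
    fetch next from row_cursor into @id, @content;
    while @@fetch_status = 0
    begin
        exec [dbo].[get_embedding] @content, @content_vector output;

        update [dbo].[Articles]
        set content_vector = @content_vector
        where id = @id;

        fetch next from row_cursor into @id, @content;
    end

    close row_cursor;
    deallocate row_cursor;
end
```

Note that every REST call happens inside the modifying transaction, which is exactly the concurrency trade-off discussed below.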
It is important to understand that the trigger is part of the transaction that modifies the data in the table, which means that the performance of the transaction will be proportional to the performance of the embedding generation. If the embedding generation takes considerable time (for instance, if requests are being throttled by the called model), all the locks and resources used by the transaction will be held for a longer period, potentially resulting in reduced concurrency.
Therefore, this approach is recommended primarily for proofs of concept or scenarios where concurrent access to the table is very limited. In SQL Server, it is recommended to use read committed snapshot to minimize read-write contention. On Azure SQL, this setting is enabled by default, so no additional action is required.
You can find a sample showing trigger usage here: Vector Search Samples with T-SQL
Using Change Tracking
Change tracking is a powerful yet underutilized technology. It allows querying a table to retrieve all changes since the last query or from a specific version, making it ideal for detecting inserted or updated rows and updating embeddings accordingly. Lightweight and efficient, change tracking adds minimal overhead, making it a good fit for such scenarios. Learn more in the change tracking documentation.
With change tracking, you can create a micro-batching solution: a scheduled application, or even a simple scheduled T-SQL batch, that checks the monitored tables for changes every second. It connects to the chosen embedding model only when necessary, generates the embeddings, and then updates the table with the generated values.
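One iteration of such a micro-batch could look like the following sketch, assuming change tracking is enabled on the database and on a hypothetical dbo.Articles table, and that the last synchronized version is persisted in a hypothetical dbo.EmbeddingSyncState table (all names here are illustrative):

```sql
declare @last_version bigint, @current_version bigint;

-- Resume from the version processed by the previous batch run.
select @last_version = last_sync_version from [dbo].[EmbeddingSyncState];
set @current_version = change_tracking_current_version();

declare @id int, @content nvarchar(max), @content_vector vector(1536);

-- CHANGETABLE returns the primary keys of rows changed since @last_version.
declare changed_rows cursor local fast_forward for
    select a.id, a.content
    from changetable(changes [dbo].[Articles], @last_version) as ct
    join [dbo].[Articles] as a on a.id = ct.id
    where ct.SYS_CHANGE_OPERATION in ('I', 'U');

open changed_rows;
fetch next from changed_rows into @id, @content;
while @@fetch_status = 0
begin
    exec [dbo].[get_embedding] @content, @content_vector output;
    update [dbo].[Articles] set content_vector = @content_vector where id = @id;
    fetch next from changed_rows into @id, @content;
end
close changed_rows;
deallocate changed_rows;

-- Remember where this batch stopped so the next run resumes from here.
update [dbo].[EmbeddingSyncState] set last_sync_version = @current_version;
```

Since writing the vector back bumps the row's change version again, a production version would typically enable TRACK_COLUMNS_UPDATED and use CHANGE_TRACKING_IS_COLUMN_IN_MASK on SYS_CHANGE_COLUMNS to skip rows where only the vector column changed.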
Using an Azure Function Sql Trigger binding
Change tracking is a viable option to keep embeddings updated, but it requires some coding to establish a functional end-to-end solution. However, coding to manage change tracking is not necessary if Azure Functions are used. Azure Functions offer the option, through the SQL Trigger binding, to respond to any changes in monitored tables, allowing the execution of code – whether in C#, Python, JavaScript, or Java – whenever rows are inserted, updated, or deleted.
Azure Function Bindings – Azure SQL Trigger
This Azure Function binding utilizes Change Tracking to provide benefits without needing additional infrastructure or plumbing code. You only need to write the logic to connect to the desired embedding model to generate the embedding and update the inserted or changed rows.
A working example of this approach is available here: https://github.com/Azure-Samples/azure-sql-db-session-recommender-v2
Using Azure Logic Apps
Azure Logic Apps simplify the process by eliminating the need for coding. The tasks that can be performed using Change Tracking and the Azure Function SQL Trigger binding can also be done in Azure Logic Apps through the workflow editor, where you can design the process of monitoring a table for changes, generating the embeddings, and storing them back into the database. More information is available here:
Azure Logic Apps Connectors – Add a SQL Server Trigger
Using Change Data Capture
Change Data Capture is a technology available in the MSSQL Engine that allows developers to monitor a table for changes. It is more comprehensive and powerful than Change Tracking as it automatically stores all the changes made to the monitored table into a dedicated table, preserving the entire history.
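Enabling Change Data Capture takes only two system stored procedure calls, first at the database level and then per table (dbo.Articles below is a hypothetical example table):

```sql
-- Enable Change Data Capture for the current database.
exec sys.sp_cdc_enable_db;
go

-- Enable CDC for the table to monitor. All inserts, updates, and deletes
-- will be recorded in a dedicated change table that tools like Debezium
-- can read from. @supports_net_changes = 1 requires a primary key.
exec sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Articles',
    @role_name     = null,
    @supports_net_changes = 1;
go
```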
When used with tools like Debezium, it enables the creation of Change Stream solutions where data can be sent to Apache Kafka or Azure Event Hubs for further processing: Azure SQL and SQL Server Change Stream with Debezium.
Once the changed data is available in Apache Kafka or Azure Event Hubs, it is straightforward to use a custom application or an Azure Function to consume the change feed, generate the related embeddings, and then save them back to the database.
Using the new Change Event Stream
The newly announced Change Event Stream automates the process based on the concept of Change Data Capture. Once the data is available in Azure Event Hubs, executing an Azure Function to generate embeddings for the changed data and then save them back to the database requires only a few lines of code.
Conclusion
To ensure embeddings in Azure SQL remain up-to-date, various methods such as database triggers, change tracking, and Azure Functions can be employed. Each approach has its own benefits, from the simplicity of database triggers for proof-of-concept scenarios to the more robust capabilities of Change Data Capture and Azure Logic Apps for handling extensive data changes. By selecting the most suitable method based on specific requirements, organizations can maintain the accuracy and relevance of their embeddings, enhancing their AI-driven applications.