Announcing EAP for Vector Support in Azure SQL Database

Pooja Kamath

Davide Mauri

Image welcometotheworldofvector

We are super excited to announce the Early Adopter Preview of Native Vector Support in Azure SQL Database and are currently accepting requests from customers who wish to participate.

SQL beyond RDBMS

Vector databases have gained a lot of prominence lately. These specialized repositories handle vector data, crucial for applications like semantic search, chatbots, and recommendation systems.

However, a paradigm shift is underway: Why maintain a separate Vector database when Azure SQL Database can seamlessly accommodate vector embeddings?

By integrating vector search into Azure SQL, you simplify application development, coexisting with operational data for efficient similarity searches, joins, and aggregations—all while leveraging Azure SQL’s sophisticated query optimizer and robust enterprise features. The integration of vector search within RDBMS allows customers to perform vector similarity searches alongside traditional SQL queries, enhancing data analysis and decision-making.

Embeddings

Embedding is the mathematical representation of objects (such as words, paragraphs, documents, images, or audio) as dense vectors. These vectors are typically generated by deep learning models and are instrumental in various machine learning and AI applications. By translating data into a vector space, embeddings can effectively capture and quantify the semantic similarities between related concepts, facilitating tasks such as semantic search and natural language processing.

Embeddings capture semantic similarity between similar concepts. For example, consider word embeddings: Clusters of related words emerge (e.g., “computer,” “software,” “machine” in one cluster; “lion,” “cow,” “cat,” “dog” in another). The gap between clusters highlights dissimilarity in meaning or context

The real complex part is calculating the embeddings, but thanks to Azure OpenAI, everyone has an easily accessible REST service that can be used to get the embeddings using pre-trained ML models. Once the embeddings are generated, they can be stored into a SQL Server database. This allows you to store the embeddings alongside the data they represent, and to perform vector search queries to find similar data points.

A great article to understand how embeddings work, is the following: Introducing text and code embeddings.

Vector Search Use Cases

Vector search refers to the process of finding all vectors in a dataset that are similar to a specific query vector. Therefore, a query vector for the word human searches the entire dataset for similar vectors, and thus similar words: in this example it should find the word person as a close match. This closeness, or distance, is measured using a distance metric such as cosine distance

Image Screenshot 2024 05 16 183942

Similarity enables applications such as:

  • Search (where items are ranked by relevance to a query string)​
  • Clustering (where items are grouped by similarity)​
  • Recommendations (where related items are recommended)​
  • Anomaly detection (where outliers with little relatedness are identified)​
  • Diversity measurement (where similarity distributions are analyzed)​
  • Classification (where items are classified by their most similar label)

Feature Exploration

Azure SQL natively supports vectors allowing you to create, store and search vectors. Vectors are stored in a compact binary format so that calculating distance between two vectors is done in the fastest and most efficient way possible.

There are 4 functions introduced to perform operations on vectors in binary format and generated vectors can be stored in a VARBINARY(8000) column or variable:

Function Description
JSON_ARRAY_TO_VECTOR Creates a vector from a JSON array
ISVECTOR Tests whether a binary contains a valid vector
VECTOR_TO_JSON_ARRAY Returns a vector as a JSON array
VECTOR_DISTANCE Calculates the distance between two vectors using a specified distance metric

Example

Let us walk through an example of using Azure SQL DB to Create, Store Vector embeddings and perform similarity searches using our new functions.

The table has been created using the customer reviews dataset from FineFoods and enriching it with embeddings generated via the text-embedding-small Azure OpenAI model.

The embeddings have been generated using the concatenation of Summary + Text field.

Imagine a user asks, “What’s the best coffee?” We’ll transform their query into a vector and search our database of reviews to extract all products that are similar to provided question.

Let’s look at the data loaded in the table:

SELECT TOP(10) * FROM [dbo].[FineFoodReviews] ORDER BY Id

Results:

contents of finefoodreviews table

Storing Vectors

To have the broadest compatibility with any language and platform in the first wave vectors will take advantage of existing VARBINARY data type to store vector binary format.

Add the New Column: vector with the type VARBINARY(8000)

ALTER TABLE [dbo].[FineFoodReviews] ADD [VectorBinary] VARBINARY(8000);

Update the new column with the transformed data from the embedding column using the below function.

JSON_ARRAY_TO_VECTOR

We will pass the embeddings to the new built in function JSON_ARRAY_TO_VECTOR that will converts a JSON array to a compact binary representation of a vector

UPDATE [dbo].[FineFoodReviews]
SET [VectorBinary] = JSON_ARRAY_TO_VECTOR([vector]);
GO

Vectors are now stored in an efficient binary format that also enables usage of dedicated CPU vector processing extensions like SIMD and AVX.

SELECT TOP(5) ID, ProductId, Summary, Text, VectorBinary FROM [dbo].[FineFoodReviews] ORDER BY Id

Result:

Image vectorbinary

Specialized functions will allow developers to transform stored vector data back into JSON arrays and to check and mandate vector dimensionality

ISVECTOR

The function checks if the provided object is a valid vector: Returns 1 if valid, otherwise returns 0. Returns NULL if the expression is NULL

SELECT TOP(5) ISVECTOR(VectorBinary) AS isvector, VectorBinary, ProductId FROM [dbo].[FineFoodReviews] ORDER BY Id

Result:

Image isvector

You can also use the ISVECTOR function as a constraint. The following example alters the FineFoodReviews table to store vector embeddings coming from OpenAI using an embedding model that returns vectors with 1,536 dimensions.

ALTER TABLE [dbo].[FineFoodReviews] ADD [VectorBinary] VARBINARY(8000) CHECK (ISVECTOR([VectorBinary], 1536) = 1)

This ensures that the data in the [VectorBinary]column adheres to the specified vector requirements

VECTOR_TO_JSON_ARRAY

This function converts a vector in a compact binary format to a human-readable string format. The string format is the same as the one used by JSON to represent arrays.

SELECT TOP(5) ProductId, Summary, ISVECTOR(VectorBinary) AS isvector, VectorBinary, VECTOR_TO_JSON_ARRAY(VectorBinary) AS jsonvector FROM [dbo].[FineFoodReviews] ORDER BY ID

Result:

Image vector to jsonarray

Using a REST service to get embeddings

OpenAI models are available as REST endpoints and thus can be easily consumed from Azure SQL Database using the sp_invoke_external_rest_endpoint system stored procedure:

Using a call to a REST service to get embeddings is just one of the integration options you have when working with SQL Database and OpenAI.

You can let any of the available models access data stored in Azure SQL Database to create solutions where your users can interact with the data as shown here

For this specific sample you have to deploy an embeddings model text-embedding-small model, the same used for the source we are using in this sample. Once that is done, you need to get the API KEY and the URL of the deployed model and use in the below example

For example, you can use the stored procedure to get embeddings :

CREATE PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VARBINARY(8000) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    -- Set the @url variable with proper concatenation before the EXEC statement
    SET @url = 'https://<resourcename>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2023-03-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<openAIkey>"}', 
        @response = @response OUTPUT;

    -- Use JSON_QUERY to extract the embedding array directly
    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');

    
    SET @embedding = JSON_ARRAY_TO_VECTOR(@jsonArray);
END
GO

You can now call the stored procedure and use the embedding for similarity search against already stored embeddings as we will see below.

VECTOR_DISTANCE

This function calculates the distance between two vectors using a specified distance metric.

The following distance metrics are supported:

  • cosine – Cosine distance
  • euclidean – Euclidean distance
  • dot – (Negative) Dot product

Example Syntax: The following example creates a vector with three dimensions from a string with a JSON array

DECLARE @v1 VARBINARY(8000) = JSON_ARRAY_TO_VECTOR('[1,1]')
DECLARE @v2 VARBINARY(8000) = JSON_ARRAY_TO_VECTOR('[-1,-1]')

SELECT 
    VECTOR_DISTANCE('euclidean', @v1, @v2) AS euclidean,
    VECTOR_DISTANCE('cosine', @v1, @v2) AS cosine,
    VECTOR_DISTANCE('dot', @v1, @v2) AS negative_dot_product;

Similarity Search in Azure SQL DB

We now have the finefoodreviews table enriched with vector Embeddings for the customer reviews.

Let us now query our embedding table to get the top similar reviews given the User search query. Given any user search query, we can get the vector representation of that text. Then we can use that vector to calculate the cosine distance against all the customer review comments stored in the database and take only the closest ones which will return the product most likely connect to the product we are interested in. The reviews with the highest similarity are considered the most relevant to the query, helping users discover products or experiences related to their search.

  • We start by converting the user’s text query into its vector representation using an embedding model User text query :"healthy options instead of coke"
  • The resulting vector represents the semantic meaning of the query.

Cosine distance is then calculated between this query vector and the embeddings of reviews in our dataset.

--Assuming you have a stored procedure to get embeddings for a given text
DECLARE @e VARBINARY(8000);
EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'healthy options instead of coke', @embedding = @e OUTPUT;

SELECT TOP(10) ProductId,
               Summary,
               Text,
               VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance
FROM dbo.FineFoodReviews
ORDER BY Distance;

Result:

Image vectorsearch

You can see the power of semantic search by finding reviews that are contextually related to the search query, even if they don’t contain exact match keywords.

Let’s look at a simple example of using Keyword Filters which leverages both semantic and keyword-based techniques to enhance search results. The below query combines filters to narrow down the review set and then calculates cosine distances to prioritize semantically related reviews. This approach balances relevance and computational efficiency.

We filter reviews based on specific criteria:

  • Exclude anonymous users.
  • Include reviews with a score of 2 or higher.
  • Consider only detailed reviews (text length > 50 characters).
  • Include reviews containing specific words like “gluten” or “dairy.”

We calculate the cosine distance between each review’s embedding and the query embedding based on user text "quick fix breakfast option for toddlers"

-- Assuming you have a stored procedure to get embeddings for a given text
DECLARE @e VARBINARY(8000);
EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'quick fix breakfast option for toddlers', @embedding = @e OUTPUT;

-- Comprehensive query with multiple filters.
SELECT TOP(10)
    f.Id,
    f.ProductId,
    f.UserId,
    f.Score,
    f.Summary,
    f.Text,
    VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance,
    CASE 
        WHEN LEN(f.Text) > 100 THEN 'Detailed Review'
        ELSE 'Short Review'
    END AS ReviewLength,
    CASE 
        WHEN f.Score >= 4 THEN 'High Score'
        WHEN f.Score BETWEEN 2 AND 3 THEN 'Medium Score'
        ELSE 'Low Score'
    END AS ScoreCategory
FROM FineFoodReviews f
WHERE
    f.UserId NOT LIKE 'Anonymous%' -- User-based filter to exclude anonymous users
    AND f.Score >= 2 -- Score threshold filter
    AND LEN(f.Text) > 50 -- Text length filter for detailed reviews
    AND (f.Text LIKE '%gluten%' OR f.Text LIKE '%dairy%') -- Inclusion of specific words
ORDER BY
    Distance,  -- Order by distance
    f.Score DESC, -- Secondary order by review score
    ReviewLength DESC; -- Tertiary order by review length

Result:

Image semanticpluskeyword

Hence, we can see that vector search enhances semantic understanding and relevance, while traditional search provides simplicity and direct matches

More Samples

You can find more end-to-end samples at Azure-Samples/azure-sql-db-vector-search (github.com)

Private Preview Sign-up

We are currently accepting requests from customers who would like to participate in the private preview and try out the Native Vector Support for SQL feature. If you are interested, please fill out https://aka.ms/azuresql-vector-eap

Image QRCode for Vector Support in Azure SQL Database Private Preview Sign up 1

Wrapping up

Our journey to enhance SQL’s Vector capabilities has just begun with the Early Adopter preview. Join us in shaping the future of SQL databases by participating in the early adopter preview of Native Vector Support. You can submit your feedback using this Form or leave a comment on this blog.

Thank you for choosing Azure SQL Database as your data platform.

2 comments

Leave a comment

  • Ketan Gadre 0

    Thanks for a great article. It is very exciting to see Vector support in Azure SQL Db. I signed up and got access to the SQL Db private preview. However, when I tried to use JSON_ARRAY_TO_VECTOR function, it gave me the error “Msg 195, Level 15, State 10, Line 2, ‘JSON_ARRAY_TO_VECTOR’ is not a recognized built-in function name”. Do I have to do anything special to enable these functions in the Private Preview? Please advise.

    • Davide MauriMicrosoft employee 0

      Hi Ketan, that usually means that for some reasons the setting that enable the feature has not be set. I’ll follow up with you offline to check what’s going on.

Feedback usabilla icon