May 21st, 2024

Announcing EAP for Vector Support in Azure SQL Database

Image welcometotheworldofvector

We are super excited to announce the Early Adopter Preview of Native Vector Support in Azure SQL Database and are currently accepting requests from customers who wish to participate.

Latest update – 06 November 2024

Vector Support is now available as Public Preview! Read the announcement here: Public Preview of Native Vector Support in Azure SQL Database! Access to full documentation here: Vector functions.

SQL beyond RDBMS

Vector databases have gained a lot of prominence lately. These specialized repositories handle vector data, crucial for applications like semantic search, chatbots, and recommendation systems.

However, a paradigm shift is underway: Why maintain a separate Vector database when Azure SQL Database can seamlessly accommodate vector embeddings?

By integrating vector search into Azure SQL, you simplify application development, coexisting with operational data for efficient similarity searches, joins, and aggregations—all while leveraging Azure SQL’s sophisticated query optimizer and robust enterprise features. The integration of vector search within RDBMS allows customers to perform vector similarity searches alongside traditional SQL queries, enhancing data analysis and decision-making.

Embeddings

Embedding is the mathematical representation of objects (such as words, paragraphs, documents, images, or audio) as dense vectors. These vectors are typically generated by deep learning models and are instrumental in various machine learning and AI applications. By translating data into a vector space, embeddings can effectively capture and quantify the semantic similarities between related concepts, facilitating tasks such as semantic search and natural language processing.

Embeddings capture semantic similarity between similar concepts. For example, consider word embeddings: Clusters of related words emerge (e.g., “computer,” “software,” “machine” in one cluster; “lion,” “cow,” “cat,” “dog” in another). The gap between clusters highlights dissimilarity in meaning or context

The real complex part is calculating the embeddings, but thanks to Azure OpenAI, everyone has an easily accessible REST service that can be used to get the embeddings using pre-trained ML models. Once the embeddings are generated, they can be stored into a SQL Server database. This allows you to store the embeddings alongside the data they represent, and to perform vector search queries to find similar data points.

A great article to understand how embeddings work, is the following: Introducing text and code embeddings.

Vector Search Use Cases

Vector search refers to the process of finding all vectors in a dataset that are similar to a specific query vector. Therefore, a query vector for the word human searches the entire dataset for similar vectors, and thus similar words: in this example it should find the word person as a close match. This closeness, or distance, is measured using a distance metric such as cosine distance

Image Screenshot 2024 05 16 183942

Similarity enables applications such as:

  • Search (where items are ranked by relevance to a query string)​
  • Clustering (where items are grouped by similarity)​
  • Recommendations (where related items are recommended)​
  • Anomaly detection (where outliers with little relatedness are identified)​
  • Diversity measurement (where similarity distributions are analyzed)​
  • Classification (where items are classified by their most similar label)

Feature Exploration

Azure SQL natively supports vectors allowing you to create, store and search vectors. Vectors are stored in a compact binary format so that calculating distance between two vectors is done in the fastest and most efficient way possible.

There are 4 functions introduced to perform operations on vectors in binary format and generated vectors can be stored in a VECTOR column or variable:

 

Function Description 11 Sept Update
JSON_ARRAY_TO_VECTOR (Deprecated) Creates a vector from a JSON array Not needed anymore, you can CAST a JSON array string like ‘[1,2,3]’ to VECTOR
ISVECTOR (Deprecated) Tests whether a binary contains a valid vector Not needed anymore as vector type assure correct format already
VECTOR_TO_JSON_ARRAY (Deprecated) Returns a vector as a JSON array Not needed anymore, you can CAST a JSON array from vector to NVARCHAR to get a JSON array string back
VECTOR_DISTANCE Calculates the distance between two vectors using a specified distance metric No changes

Example

Let us walk through an example of using Azure SQL DB to Create, Store Vector embeddings and perform similarity searches using our new functions.

The table has been created using the customer reviews dataset from FineFoods and enriching it with embeddings generated via the text-embedding-small Azure OpenAI model.

The embeddings have been generated using the concatenation of Summary + Text field.

Imagine a user asks, “What’s the best coffee?” We’ll transform their query into a vector and search our database of reviews to extract all products that are similar to provided question.

Let’s look at the data loaded in the table:

SELECT TOP(10) * FROM [dbo].[FineFoodReviews] ORDER BY Id

Results:

contents of finefoodreviews table

Storing Vectors

To have the broadest compatibility with any language and platform in the first wave vectors will take advantage of existing VARBINARY data type to store vector binary format.

Add the New Column: vector with the type VECTOR(1536)

ALTER TABLE [dbo].[FineFoodReviews] ADD [VectorBinary] VECTOR(1536);

Update the new column with the transformed data from the embedding column using the below function.

Creating a vector from a JSON array

We will cast the embeddings – stored as JSON Array – to the vector type, to create a compact binary representation of a vector

UPDATE [dbo].[FineFoodReviews]
SET [VectorBinary] = CAST([vector] AS VECTOR(1536));
GO

Vectors are now stored in an efficient binary format that also enables usage of dedicated CPU vector processing extensions like SIMD and AVX.

SELECT TOP(5) ID, ProductId, Summary, Text, VectorBinary FROM [dbo].[FineFoodReviews] ORDER BY Id

Result:

Image vectorbinary

Specialized functions will allow developers to transform stored vector data back into JSON arrays and to check and mandate vector dimensionality

Getting a JSON array from vector

To transform the compact binary format to a human-readable string format you can just cast the vector binary format to a string:

SELECT TOP(5) ProductId, Summary, VectorBinary, CAST(VectorBinary as NVARCHAR(MAX)) AS jsonvector FROM [dbo].[FineFoodReviews] ORDER BY ID

Result:

Image vector to jsonarray

Using a REST service to get embeddings

OpenAI models are available as REST endpoints and thus can be easily consumed from Azure SQL Database using the sp_invoke_external_rest_endpoint system stored procedure:

Using a call to a REST service to get embeddings is just one of the integration options you have when working with SQL Database and OpenAI.

You can let any of the available models access data stored in Azure SQL Database to create solutions where your users can interact with the data as shown here

For this specific sample you have to deploy an embeddings model text-embedding-small model, the same used for the source we are using in this sample. Once that is done, you need to get the API KEY and the URL of the deployed model and use in the below example

For example, you can use the stored procedure to get embeddings :

CREATE PROCEDURE [dbo].[GET_EMBEDDINGS]
(
    @model VARCHAR(MAX),
    @text NVARCHAR(MAX),
    @embedding VECTOR(1536) OUTPUT
)
AS
BEGIN
    DECLARE @retval INT, @response NVARCHAR(MAX);
    DECLARE @url VARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);

    -- Set the @url variable with proper concatenation before the EXEC statement
    SET @url = 'https://<resourcename>.openai.azure.com/openai/deployments/' + @model + '/embeddings?api-version=2023-03-15-preview';

    EXEC dbo.sp_invoke_external_rest_endpoint 
        @url = @url,
        @method = 'POST',   
        @payload = @payload,   
        @headers = '{"Content-Type":"application/json", "api-key":"<openAIkey>"}', 
        @response = @response OUTPUT;

    -- Use JSON_QUERY to extract the embedding array directly
    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');

    
    SET @embedding = CAST(@jsonArray as VECTOR(1536));
END
GO

You can now call the stored procedure and use the embedding for similarity search against already stored embeddings as we will see below.

VECTOR_DISTANCE

This function calculates the distance between two vectors using a specified distance metric.

The following distance metrics are supported:

  • cosine – Cosine distance
  • euclidean – Euclidean distance
  • dot – (Negative) Dot product

Example Syntax: The following example creates a vector with three dimensions from a string with a JSON array

DECLARE @v1 VARBINARY(8000) = JSON_ARRAY_TO_VECTOR('[1,1]')
DECLARE @v2 VARBINARY(8000) = JSON_ARRAY_TO_VECTOR('[-1,-1]')

SELECT 
    VECTOR_DISTANCE('euclidean', @v1, @v2) AS euclidean,
    VECTOR_DISTANCE('cosine', @v1, @v2) AS cosine,
    VECTOR_DISTANCE('dot', @v1, @v2) AS negative_dot_product;

Similarity Search in Azure SQL DB

We now have the finefoodreviews table enriched with vector Embeddings for the customer reviews.

Let us now query our embedding table to get the top similar reviews given the User search query. Given any user search query, we can get the vector representation of that text. Then we can use that vector to calculate the cosine distance against all the customer review comments stored in the database and take only the closest ones which will return the product most likely connect to the product we are interested in. The reviews with the highest similarity are considered the most relevant to the query, helping users discover products or experiences related to their search.

  • We start by converting the user’s text query into its vector representation using an embedding model User text query :"healthy options instead of coke"
  • The resulting vector represents the semantic meaning of the query.

Cosine distance is then calculated between this query vector and the embeddings of reviews in our dataset.

--Assuming you have a stored procedure to get embeddings for a given text
DECLARE @e VECTOR(1536);
EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'healthy options instead of coke', @embedding = @e OUTPUT;

SELECT TOP(10) ProductId,
               Summary,
               Text,
               VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance
FROM dbo.FineFoodReviews
ORDER BY Distance;

Result:

Image vectorsearch

You can see the power of semantic search by finding reviews that are contextually related to the search query, even if they don’t contain exact match keywords.

Hybrid Search

Let’s look at a simple example of using Keyword Filters which leverages both semantic and keyword-based techniques to enhance search results. The below query combines filters to narrow down the review set and then calculates cosine distances to prioritize semantically related reviews. This approach balances relevance and computational efficiency.

We filter reviews based on specific criteria:

  • Exclude anonymous users.
  • Include reviews with a score of 2 or higher.
  • Consider only detailed reviews (text length > 50 characters).
  • Include reviews containing specific words like “gluten” or “dairy.”

We calculate the cosine distance between each review’s embedding and the query embedding based on user text "quick fix breakfast option for toddlers"

-- Assuming you have a stored procedure to get embeddings for a given text
DECLARE @e VECTOR(1536);
EXEC dbo.GET_EMBEDDINGS @model = '<yourmodeldeploymentname>', @text = 'quick fix breakfast option for toddlers', @embedding = @e OUTPUT;

-- Comprehensive query with multiple filters.
SELECT TOP(10)
    f.Id,
    f.ProductId,
    f.UserId,
    f.Score,
    f.Summary,
    f.Text,
    VECTOR_DISTANCE('cosine', @e, VectorBinary) AS Distance,
    CASE 
        WHEN LEN(f.Text) > 100 THEN 'Detailed Review'
        ELSE 'Short Review'
    END AS ReviewLength,
    CASE 
        WHEN f.Score >= 4 THEN 'High Score'
        WHEN f.Score BETWEEN 2 AND 3 THEN 'Medium Score'
        ELSE 'Low Score'
    END AS ScoreCategory
FROM FineFoodReviews f
WHERE
    f.UserId NOT LIKE 'Anonymous%' -- User-based filter to exclude anonymous users
    AND f.Score >= 2 -- Score threshold filter
    AND LEN(f.Text) > 50 -- Text length filter for detailed reviews
    AND (f.Text LIKE '%gluten%' OR f.Text LIKE '%dairy%') -- Inclusion of specific words
ORDER BY
    Distance,  -- Order by distance
    f.Score DESC, -- Secondary order by review score
    ReviewLength DESC; -- Tertiary order by review length

Result:

Image semanticpluskeyword

Hence, we can see that vector search enhances semantic understanding and relevance, while traditional search provides simplicity and direct matches

More Samples

You can find more end-to-end samples at Azure-Samples/azure-sql-db-vector-search (github.com)

Private Preview Sign-up

We are currently accepting requests from customers who would like to participate in the private preview and try out the Native Vector Support for SQL feature. If you are interested, please fill out https://aka.ms/azuresql-vector-eap

Image QRCode for Vector Support in Azure SQL Database Private Preview Sign up 1

Wrapping up

Our journey to enhance SQL’s Vector capabilities has just begun with the Early Adopter preview. Join us in shaping the future of SQL databases by participating in the early adopter preview of Native Vector Support. You can submit your feedback using this Form or leave a comment on this blog.

Thank you for choosing Azure SQL Database as your data platform.

Author

Pooja Kamath
Senior Product Manager
Davide Mauri
Principal Product Manager

Principal Product Manager in Azure SQL, with a career in IT spanning since 1997, earning the prestigious Data Platform MVP status for 12 consecutive years. Currently, he serves as the Principal Product Manager for Azure SQL Database, focusing on developers and AI.

5 comments

Discussion is closed. Login to edit/delete existing comments.

  • 김동현

    I just have applied for the EAP to try out the vector search feature.
    How long will it take to receive the results?

    • Pooja KamathMicrosoft employee Author

      Thank you for your interest! You should have received an email from me to fill in the opt-in form with your server details and accept the preview terms yesterday. Once we have those details we can enable your server in 1-2 business days. Please reach out to me directly on pookam@microsoft.com if you haven’t received an email from us.

      • 김동현

        thank you for your reply!

        I have found the email you sent, and I will review the contents to proceed accordingly.

  • Ketan Gadre

    Thanks for a great article. It is very exciting to see Vector support in Azure SQL Db. I signed up and got access to the SQL Db private preview. However, when I tried to use JSON_ARRAY_TO_VECTOR function, it gave me the error "Msg 195, Level 15, State 10, Line 2, 'JSON_ARRAY_TO_VECTOR' is not a recognized built-in function name". Do I have to do anything special to enable these functions in the Private Preview? Please advise.

    Read more
    • Davide MauriMicrosoft employee

      Hi Ketan, that usually means that for some reasons the setting that enable the feature has not be set. I’ll follow up with you offline to check what’s going on.