Vector Search with Azure Cosmos DB
Azure Cosmos DB NoSQL features advanced vector indexing and search capabilities powered by DiskANN, a suite of highly scalable, accurate, and cost-effective approximate nearest neighbor (ANN) algorithms for low-latency vector search at any scale.
Azure Cosmos DB offers key capabilities for building modern AI applications:
- A mission-critical vector store with industry-leading similarity search performance without relying on niche solutions.
- A no-ETL solution that keeps data and vectors together. This drastically reduces the complexity and overhead of AI application architectures.
- The ability to combine vector search with flexible query filters to support a wide range of use cases and scenarios.
- Ways to get started quickly and cost-effectively with Azure Cosmos DB’s serverless mode or take advantage of dynamic and instant Autoscale capabilities in provisioned throughput mode.
- Production-ready capabilities including 5 levels of built-in multitenancy, global replication, and industry-leading SLAs with up to 99.999% availability.
In this multi-part series, we dive into several aspects of vector indexing, vector similarity search performance, and highlight best practices with Azure Cosmos DB for NoSQL. To kick things off in Part 1, we:
- Explore full space vector search along with performance and cost characteristics.
- Share guidance around best practices for building a vector search solution using Azure Cosmos DB for NoSQL.
Exploring Vector Indexing with the Cohere Wiki-Embeddings dataset
To ground our discussion in a real-world scenario, we use the Wikipedia dataset embedded using Cohere’s multilingual-22-12 model. This dataset contains embeddings of Wikipedia articles, chunked into passages, with one 768-dimensional vector per passage. For simplicity, we use pre-processed embedding-only slices for the English text of the dataset hosted by BigANN. We’ll build on this repository in future blog posts to cover more advanced features. To download the dataset, use the download script included in the GitHub repository.
Getting Started with the DiskANN vector index for Azure Cosmos DB
If you are new to vector databases, we highly recommend reading this great introduction about ‘Vector Databases’ at Microsoft Learn. Azure Cosmos DB for NoSQL’s Indexing Policy documentation is another great resource to get started. We highlight the most relevant aspects of the scenario below. The code to replicate the instructions in this post is available on GitHub at ‘VectorIndexScenarioSuite’. For the full code walkthrough browse through the scenario at WikiCohereEnglishEmbeddingOnlyScenario.
Resource and Container Setup
Step 1: Set up an Azure Cosmos DB for NoSQL resource in your Azure subscription.
Step 2: In the Azure Portal, navigate to your Azure Cosmos DB for NoSQL resource, select the “Features” tab, and then enroll in the “Vector Search for NoSQL API (preview)”.
Step 3: Sign up for DiskANN
This can be done by filling out this form. Your Azure Cosmos DB account will be enrolled in the DiskANN preview within 1 week, and you’ll receive an email notifying you when this is completed.
Step 4: Create a collection with a Container Vector Policy and Indexing Policy
Before we create a vector index, we need to declare the properties of the embeddings we plan to index via the Container Vector Policy. For the Wiki-Cohere dataset, we declare that the embeddings are of type Float32, have 768 dimensions and should be compared with the Dot Product distance function in the following C# snippet:
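{
    // One entry per vector path; this scenario declares a single embedding property.
    new Embedding()
    {
        Path = EMBEDDING_PATH,                           // "/embedding"
        DataType = VectorDataType.Float32,
        DistanceFunction = DistanceFunction.DotProduct,
        Dimensions = 768,                                // one 768-dimensional vector per passage
    }
}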
With the Container Vector Policy defined, we can create a DiskANN index over these embeddings by declaring a Vector Indexing Policy on the same path (EMBEDDING_PATH, which is ‘/embedding’ here). While adding a vector index is optional, it is strongly recommended because it makes vector search efficient (lower latency, higher throughput, and lower RU consumption). Without a vector index, a vector search query scans every embedding in the collection and consumes a very large number of RUs. Enabling a DiskANN index builds a graph-structured index over the embeddings, so a query can be answered by traversing only a fraction of the embeddings in the index.
IndexingPolicy = new IndexingPolicy()
{
    VectorIndexes = new()
    {
        new VectorIndexPath()
        {
            Path = EMBEDDING_PATH,
            Type = VectorIndexType.DiskANN,
        }
    }
}
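To make the cost of an unindexed search concrete, the following Python sketch performs the kind of exhaustive scan described above: it scores every stored embedding against the query with a dot product and keeps the best k. The toy data and the names `dot` and `exact_top_k` are illustrative assumptions and are not part of the Azure Cosmos DB SDK.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, embeddings, k):
    """Score every embedding (a full scan) and return the k best (id, score) pairs.

    With the dot product, a larger score means a closer match.
    """
    scored = ((doc_id, dot(query, vec)) for doc_id, vec in embeddings.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional data; the dataset in this post uses 768 dimensions.
embeddings = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.5, 0.5, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.9, 0.1, 0.0],
}
top2 = exact_top_k([1.0, 0.0, 0.0], embeddings, k=2)
```

The work in `exact_top_k` grows linearly with the number of stored vectors, which is the behavior a graph-structured index is meant to avoid.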
Key callouts:
- We highly recommend adding the vector path to the “excludedPaths” section of the indexing policy. This prevents the embedding from also being indexed as a regular array path and keeps insertion performance optimized. This will become the default behavior after an upcoming release in Fall 2024.
- While DiskANN is in preview, we only allow a single DiskANN index path per container. This limitation will be relaxed in an upcoming release later in Fall 2024.
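As a rough illustration of the two callouts above, here is a Python sketch that writes both policies as plain dictionaries and checks them for consistency. The JSON field names (`vectorEmbeddings`, `vectorIndexes`, `excludedPaths`, and the values `float32`, `dotproduct`, `diskANN`) reflect the JSON form of these policies as we understand it from the public documentation; treat them as assumptions and verify them against the current docs. The helper `policy_problems` is hypothetical.

```python
import json

EMBEDDING_PATH = "/embedding"

# Container Vector Policy: declares the shape of the embeddings.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": EMBEDDING_PATH,
            "dataType": "float32",
            "distanceFunction": "dotproduct",
            "dimensions": 768,
        }
    ]
}

# Indexing policy: DiskANN index on the vector path, and the same path
# excluded from regular (array) indexing.
indexing_policy = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": EMBEDDING_PATH + "/*"}],
    "vectorIndexes": [{"path": EMBEDDING_PATH, "type": "diskANN"}],
}


def policy_problems(embedding_policy, index_policy):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    declared = {e["path"] for e in embedding_policy["vectorEmbeddings"]}
    excluded = {p["path"] for p in index_policy.get("excludedPaths", [])}
    vector_indexes = index_policy.get("vectorIndexes", [])
    for index in vector_indexes:
        path = index["path"]
        if path not in declared:
            problems.append(f"{path} is indexed but not declared in the vector policy")
        if path + "/*" not in excluded:
            problems.append(f"{path} is not listed under excludedPaths")
    if sum(1 for i in vector_indexes if i["type"] == "diskANN") > 1:
        problems.append("more than one DiskANN index path (not allowed during the preview)")
    return problems


problems = policy_problems(vector_embedding_policy, indexing_policy)
policy_json = json.dumps(
    {"vectorEmbeddingPolicy": vector_embedding_policy, "indexingPolicy": indexing_policy},
    indent=2,
)
```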
Data Ingestion
We can use the bulk ingestion support in the Azure Cosmos DB SDK to ingest embedding data with high throughput. This is enabled by setting the ‘AllowBulkExecution’ client option. For more details on bulk support, please see this resource.
CosmosClientOptions cosmosClientOptions = new()
{
    // Lets the SDK group concurrent point operations into batched requests.
    AllowBulkExecution = true,
};
To further speed up ingestion, you can send multiple batches in parallel. Please refer to the scenario code for a complete example.
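The idea of sending several batches in parallel can be sketched as follows. This is a language-neutral illustration in Python rather than the scenario code: `upsert_stub` is a hypothetical stand-in for a real SDK write call, and the batch size and parallelism values are arbitrary assumptions.

```python
import asyncio


async def upsert_stub(item):
    """Hypothetical stand-in for an SDK write call; it only yields control."""
    await asyncio.sleep(0)
    return item["id"]


def batches(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def ingest(items, batch_size=100, max_parallel_batches=4):
    """Write all items, keeping at most `max_parallel_batches` batches in flight."""
    limiter = asyncio.Semaphore(max_parallel_batches)

    async def send(batch):
        async with limiter:
            return await asyncio.gather(*(upsert_stub(item) for item in batch))

    results = await asyncio.gather(*(send(b) for b in batches(items, batch_size)))
    return [doc_id for batch_result in results for doc_id in batch_result]


items = [{"id": str(i), "embedding": [0.0] * 4} for i in range(250)]
written = asyncio.run(ingest(items))
```

Bounding the number of in-flight batches with a semaphore keeps the client from overwhelming the provisioned throughput; the right limit depends on the RU/s available.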
Key callouts:
- Azure Cosmos DB will skip vector indexing a document if the content in the relevant path does not meet both the type and the dimension specifications from Container’s Vector Embedding Policy.
- DiskANN requires a small number of records (at least 1,000 vectors) to bootstrap accurate quantization of the vectors. Azure Cosmos DB vector queries fall back to scanning the data when the total number of vectors ingested is fewer than 1,000, which means that vector search queries will be slower and cost more RUs until that threshold is reached.
- Vector indexing with DiskANN is an asynchronous process and time to ingest / search depends on the size of vector embedding, number of documents ingested and the amount of spare RU/s (throughput) available.
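Because documents whose embedding does not match the declared type and dimensions are silently skipped by the vector index, it can help to check documents on the client before ingesting them. The sketch below is one possible check in Python; `is_indexable` is a hypothetical helper that only approximates the rule described above, and the service's exact validation may differ.

```python
import math

EXPECTED_DIMENSIONS = 768  # matches the Container Vector Policy in this post


def is_indexable(doc, field="embedding", dimensions=EXPECTED_DIMENSIONS):
    """Return True if doc[field] is a list of `dimensions` finite numbers."""
    vec = doc.get(field)
    if not isinstance(vec, list) or len(vec) != dimensions:
        return False
    return all(
        isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)
        for x in vec
    )


good = {"id": "1", "embedding": [0.1] * 768}
wrong_size = {"id": "2", "embedding": [0.1] * 767}
wrong_type = {"id": "3", "embedding": ["0.1"] * 768}
missing = {"id": "4"}
checks = [is_indexable(d) for d in (good, wrong_size, wrong_type, missing)]
```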
Azure Cosmos DB normalizes the cost of data operations with Request Units (RUs) and abstracts any internal system resources such as CPU and IOPS. The cost of ingesting documents depends on document size and indexing. A rough estimate of the RU cost for ingesting and indexing vectors with DiskANN is:
- 20 RUs for a small document with a 768-dimensional vector (this data set).
- 30 RUs for a small document with a 1,536-dimensional vector.
- 50 RUs for a small document with a 3,072-dimensional vector.
The RU cost is subject to change during the preview of DiskANN.
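These per-document estimates translate into a back-of-the-envelope ingestion budget. The Python sketch below multiplies them out under two simplifying assumptions that are ours, not the service's: all provisioned throughput is spent on ingestion, and the asynchronous index build adds no extra delay. The 10,000 RU/s figure is an arbitrary example value.

```python
# Rough RU cost per small document, keyed by vector dimension (from the list above).
RU_PER_DOC = {768: 20, 1536: 30, 3072: 50}


def ingestion_estimate(num_docs, dimensions, provisioned_ru_per_sec):
    """Return (total RUs, seconds) needed to ingest num_docs documents."""
    total_ru = num_docs * RU_PER_DOC[dimensions]
    seconds = total_ru / provisioned_ru_per_sec
    return total_ru, seconds


# Example: 1M documents with 768-dimensional vectors at a hypothetical 10,000 RU/s.
total_ru, seconds = ingestion_estimate(1_000_000, 768, 10_000)
minutes = seconds / 60
```

Under these assumptions, 1M documents cost about 20 million RUs, which takes roughly 2,000 seconds at 10,000 RU/s.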
Executing a Vector Search Query
Query Experience
You can now perform a vector search query using the new VectorDistance system function in conjunction with an ORDER BY clause.
string queryText =
    $"SELECT TOP {K} c.id, VectorDistance(c.{EMBEDDING_COLUMN}, @vectorEmbedding) AS similarityScore " +
    $"FROM c ORDER BY VectorDistance(c.{EMBEDDING_COLUMN}, @vectorEmbedding, false)";
// Pass the query vector as a parameter rather than inlining it in the query text.
var queryDef = new QueryDefinition(queryText).WithParameter("@vectorEmbedding", vector);
Here the ORDER BY clause sorts on VectorDistance over the EMBEDDING_COLUMN property (‘/embedding’, the path declared in the vector policy), which is what makes this a vector query. We can also use ‘VectorDistance’ in the SELECT clause to project the similarity score alongside any other attribute.
The Index Metrics returned with a query show whether the vector index was used. Here is a sample:
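For readers working outside C#, the same query text and parameter can be assembled in Python. The sketch below only builds the strings and does not call any SDK; `build_vector_query` is a hypothetical helper, and the name/value parameter list is the shape we believe the azure-cosmos Python package accepts, which should be verified against that package's documentation.

```python
def build_vector_query(k, column, vector):
    """Build the TOP-k vector query text and its parameter list."""
    if isinstance(k, bool) or not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    if not column.isidentifier():
        raise ValueError("column must be a simple property name")
    text = (
        f"SELECT TOP {k} c.id, "
        f"VectorDistance(c.{column}, @vectorEmbedding) AS similarityScore "
        f"FROM c ORDER BY VectorDistance(c.{column}, @vectorEmbedding, false)"
    )
    parameters = [{"name": "@vectorEmbedding", "value": list(vector)}]
    return text, parameters


query_text, query_params = build_vector_query(10, "embedding", [0.1, 0.2, 0.3])
```

Validating `k` and `column` before interpolating them keeps the query text safe, while the vector itself always travels as a parameter.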
{
    "UtilizedIndexes": {
        "SingleIndexes": [
            {
                "IndexSpec": "\/embedding\/?"
            }
        ],
        "CompositeIndexes": []
    },
    "PotentialIndexes": {
        "SingleIndexes": [],
        "CompositeIndexes": []
    }
}
Please note that DiskANN, like the Quantized Flat index, uses quantized vectors to guide its search (Learn more here). The Quantized Flat index consists solely of one quantized version of each vector. To answer a query, it scans the entire collection of quantized vectors and then re-ranks roughly the top 5 × K closest candidates by their distance to the query computed on the full-precision vectors. If the DiskANN index build progress is below a certain threshold, the Azure Cosmos DB query engine can fall back to scanning the quantized vectors, as the Quantized Flat index does. In this case, Index Metrics will still show the path in the ‘IndexSpec’ section, but the query may not be using the full DiskANN capabilities.
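The scan-then-re-rank pattern described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `quantize` function is a simple scalar quantizer chosen for illustration and is not the quantization scheme Azure Cosmos DB uses, and the data is random.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantize(vec, levels=15):
    """Toy scalar quantization: snap each component in [-1, 1] to a coarse integer grid."""
    return [round(x * levels) for x in vec]


def quantized_flat_search(query, full, quantized, k, rerank_factor=5):
    """Scan all quantized vectors, then re-rank the best rerank_factor * k at full precision."""
    q_query = quantize(query)
    # Pass 1: cheap scores on quantized vectors, keep rerank_factor * k candidates.
    candidates = heapq.nlargest(
        rerank_factor * k, quantized, key=lambda i: dot(q_query, quantized[i])
    )
    # Pass 2: exact scores on the full-precision vectors, for the candidates only.
    reranked = sorted(candidates, key=lambda i: dot(query, full[i]), reverse=True)
    return reranked[:k]


rng = random.Random(0)
full = {i: [rng.uniform(-1, 1) for _ in range(8)] for i in range(500)}
quantized = {i: quantize(v) for i, v in full.items()}
query = [rng.uniform(-1, 1) for _ in range(8)]
result = quantized_flat_search(query, full, quantized, k=10)
```

The first pass still touches every vector, which is why this fallback costs more than a graph traversal; the second pass restores accuracy by using full-precision distances for a small candidate set.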
Query Performance
Key metrics to consider for a Vector Search query scenario:
- Throughput (number of query requests the database can process per unit of time).
- Latency (Time taken to process a query).
- Recall / Accuracy (proportion of true nearest neighbors correctly identified by the approximate indexing).
- Cost.
We encourage readers to experiment and measure these properties with the GitHub application. You can do this with a single click for the Wikipedia Cohere scenario. For general best practices with Azure Cosmos DB, please refer to this detailed performance guide. To simulate a real-world scenario, we use a pre-computed slice of 5000 query vectors which do not overlap with the vectors in the index.
Throughput
Azure Cosmos DB’s scale-out architecture allows vector query throughput to scale without a fixed upper limit in provisioned and autoscale modes.
Azure Cosmos DB for NoSQL Offering | Throughput |
Provisioned Mode | Unlimited |
Autoscale Mode | Unlimited |
Serverless Mode | Scales in Proportion with Storage (Please refer to Serverless Performance) |
Recall and Latency
Recall@k is a key metric which determines the quality of result of any Approximate K-Nearest Neighbor query. It measures the overlap between the top-k results returned by the index and actual top-k results.
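A minimal Python sketch of this metric, assuming the exact top-k results (the ground truth) are already known, for example from an exhaustive scan. The function name `recall_at_k` and the sample ids are illustrative.

```python
def recall_at_k(returned_ids, true_ids, k):
    """Fraction of the true top-k ids that appear in the returned top-k."""
    truth = set(true_ids[:k])
    if not truth:
        raise ValueError("ground truth must contain at least one id")
    hits = len(truth.intersection(returned_ids[:k]))
    return hits / len(truth)


# The index returned a, b, c, d; the true nearest neighbors are a, c, e, f.
recall = recall_at_k(["a", "b", "c", "d"], ["a", "c", "e", "f"], k=4)
```

Here two of the four true neighbors were found, so recall@4 is 0.5. Averaging this value over many queries gives the recall figures reported for a benchmark.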
Low query latency is critical for building highly responsive AI applications. Some best practices to consider when measuring latency:
- Ensure clients are in the same region as your Azure Cosmos DB and AI model resources.
- Reuse client instances (a singleton per AppDomain is recommended in C#).
- Avoid a “cold start” of the Azure Cosmos DB SDK by issuing warm-up queries before measuring latency.
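The warm-up practice can be sketched as a small timing harness in Python. `run_query_stub` is a hypothetical placeholder for a call that issues one vector search query, and the warm-up and sample counts are arbitrary assumptions.

```python
import time


def run_query_stub():
    """Hypothetical stand-in for issuing one vector search query."""
    return sum(i * i for i in range(1000))


def measure_latencies_ms(run_query, warmup=5, measured=20):
    """Run `warmup` untimed queries, then time `measured` queries in milliseconds."""
    for _ in range(warmup):
        run_query()  # results discarded: these only warm caches and connections
    latencies = []
    for _ in range(measured):
        start = time.perf_counter()
        run_query()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


latencies = measure_latencies_ms(run_query_stub)
```

Keeping the warm-up calls outside the timed loop prevents one-time costs, such as connection setup, from inflating the reported latencies.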
Latency numbers are commonly reported at several percentiles to give a clear picture of the overall distribution. Here is the recall and latency distribution for vector search queries on the Wiki-Cohere dataset when querying for the top 10 and top 50 candidates after warm-up:
(Please note that these preview numbers are rounded up and may change in the future.)
Scenario | P50 Latency (ms), k=10/50 | P95 Latency (ms), k=10/50 | Avg Latency (ms), k=10/50 | Recall@k (%), k=10/50 |
100k vectors | 25/95 | 29/108 | 25/95 | 92.47/98.03 |
1M vectors | 34/130 | 51/164 | 39/133 | 89.2/95.41 |
35M vectors | 108/544 | 166/879 | 112/569 | 90.73/95.67 |
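For readers reproducing such a table from their own measurements, the Python sketch below computes P50, P95, and the average from a list of latency samples using the nearest-rank method. Both the method and the sample values are our assumptions for illustration; the samples are made up and are not measurements from this post.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up latency samples in milliseconds, including one slow outlier.
samples_ms = [22, 24, 25, 25, 26, 27, 28, 29, 31, 95]
p50 = percentile(samples_ms, 50)
p95 = percentile(samples_ms, 95)
avg = sum(samples_ms) / len(samples_ms)
```

In this toy sample the single outlier pulls the average above the median, which is why percentiles are reported alongside the mean.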
Vector Search Cost
Finally, cost is an important consideration when choosing a vector database. The number of vectors, their dimension and query rate influence the cost. In Azure Cosmos DB for NoSQL, vector searches are charged as Request Units (RUs) like any other query (learn more about RUs here). In the table below, we list RU cost distributions for querying the TOP K = 10 and 50 closest vectors over data slices of different sizes.
(Please note that these preview numbers are rounded up and may change in the future.)
Scenario | P50 RU Cost, k=10/50 | P95 RU Cost, k=10/50 | Avg RU Cost, k=10/50 |
100k vectors | 36/159 | 39/169 | 36/159 |
1M vectors | 38/170 | 42/179 | 45/182 |
35M vectors | 282/1,249 | 300/1,298 | 282/1,245 |
In the 100k and 1M scenarios, all the embeddings are stored in one Azure Cosmos DB physical partition. Since DiskANN query cost is highly sub-linear in the number of vectors, the query cost increases only slightly when going from an index of 100k vectors to 1M vectors. In the 35M scenario, Azure Cosmos DB’s automatic partitioning distributes the data across 7 physical partitions to scale out. Because of the larger number of partitions, we see a roughly 7x increase in RU consumption, as the vector search query fans out to all the partitions. Targeting a query to one or a few partitions can help reduce the RU cost of queries.
To translate RUs into a monthly cost (in USD) for a production environment, we use the pricing and throughput assumptions below for Azure Cosmos DB for NoSQL in the East US 2 region:
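The fan-out effect can be checked against the table with simple arithmetic. The Python sketch below uses the P95 RU costs for a TOP 10 query from the table and the 7 physical partitions mentioned for the 35M scenario; the variable names are ours.

```python
# P95 RU cost of a TOP 10 query, taken from the RU cost table.
P95_RU_TOP10 = {"100k": 39, "1M": 42, "35M": 300}
PARTITIONS_35M = 7

# Cost attributable to each partition when the query fans out to all 7.
ru_per_partition_35m = P95_RU_TOP10["35M"] / PARTITIONS_35M

# Growth relative to the single-partition scenarios.
increase_vs_1m = P95_RU_TOP10["35M"] / P95_RU_TOP10["1M"]
increase_vs_100k = P95_RU_TOP10["35M"] / P95_RU_TOP10["100k"]
```

The 35M query costs about 43 RUs per partition, close to the 42 RUs of the single-partition 1M scenario, which is consistent with the cost being driven mainly by the number of partitions visited.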
- Pricing (full Azure Cosmos DB pricing details can be found here):
  - Azure Cosmos DB Provisioned Throughput (Manual): $0.008 per 100 RU/s per hour
  - Azure Cosmos DB Provisioned Throughput (Autoscale): $0.012 per 100 RU/s per hour
- Throughput:
  - Manual: RU/s is manually configured to sustain 10 and 100 queries per second (QPS) 100% of the time.
  - Autoscale: RU/s is configured to sustain the following scenarios, and autoscale automatically scales in/out to meet the throughput demand (QPS):
    - 10 QPS for 100% of the time
    - 10 QPS for 50% of the time, 1 QPS for 50% of the time
    - 100 QPS for 100% of the time
    - 100 QPS for 50% of the time, 10 QPS for 50% of the time
- RU cost estimate: We use the P95 RU cost listed in the table above for a TOP 10 vector search in each scenario.
(Please note that these preview numbers may change in the future.)
Scenario | Manual (100% 10 QPS) | Manual (100% 100 QPS) | Autoscale (50% 1 QPS, 50% 10 QPS) | Autoscale (100% 10 QPS) | Autoscale (50% 10 QPS, 50% 100 QPS) | Autoscale (100% 100 QPS) |
100k vectors | $22.46 | $224.64 | $18.53 | $33.70 | $185.33 | $336.96 |
1M vectors | $24.19 | $241.92 | $19.96 | $36.29 | $199.58 | $362.88 |
35M vectors | $172.80 | $1,728.00 | $142.56 | $259.20 | $1,425.60 | $2,592.00 |
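The figures in this table can be reproduced with a short calculation. The Python sketch below is our reconstruction, not an official pricing formula: it assumes a 720-hour month (30 days × 24 hours), which is the value that matches the table, and it models autoscale as the time-weighted average of the RU/s needed in each load phase.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours
PRICE_PER_100_RU_HOUR = {"manual": 0.008, "autoscale": 0.012}  # USD, East US 2
P95_RU_TOP10 = {"100k vectors": 39, "1M vectors": 42, "35M vectors": 300}


def monthly_cost(ru_per_query, mode, load_profile):
    """Monthly USD cost for a load profile given as (fraction_of_time, qps) pairs."""
    if abs(sum(fraction for fraction, _ in load_profile) - 1.0) > 1e-9:
        raise ValueError("time fractions must add up to 1")
    # Average RU/s needed across the phases of the load profile.
    avg_ru_per_sec = sum(fraction * qps * ru_per_query for fraction, qps in load_profile)
    return avg_ru_per_sec / 100 * PRICE_PER_100_RU_HOUR[mode] * HOURS_PER_MONTH


ru = P95_RU_TOP10["100k vectors"]
manual_10qps = monthly_cost(ru, "manual", [(1.0, 10)])
autoscale_mixed = monthly_cost(ru, "autoscale", [(0.5, 1), (0.5, 10)])
autoscale_100qps_35m = monthly_cost(P95_RU_TOP10["35M vectors"], "autoscale", [(1.0, 100)])
```

For the 100k scenario this gives about $22.46 for manual throughput at 10 QPS and about $18.53 for autoscale with the mixed 1/10 QPS load, matching the first row of the table.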
Going Forward
Azure Cosmos DB for NoSQL now supports scalable and cost-effective vector search with its DiskANN vector index. In this article, we reviewed index setup, ingestion, full space queries, cost estimates and best practices using the Cohere Wikipedia dataset. Over the next few months, we will roll out more performance improvements to DiskANN.
In the coming weeks we’ll publish more blog posts that dive deeper into other aspects of vector indexing and search like predicated (i.e., filtered) vector search, multi-tenant scenarios, streaming workloads with replaces and deletes, etc. Stay tuned!
Questions or feedback? Please reach out to: CDB4AI@microsoft.com
Next Steps
- Learn more about Vector Search in Azure Cosmos DB for NoSQL.
- Visit our AI Samples GitHub repo for links to documentation, code samples, solution accelerators, videos, and more!
- Try this QuickStart guide to build your own RAG chatbot in Python.
About Azure Cosmos DB
Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.