Introduction
Azure Cosmos DB now includes Sharded DiskANN, a powerful capability optimized for large-scale multitenant apps that splits a DiskANN index into smaller, more performant pieces. When you specify a VectorIndexShardKey in your container’s indexing policy, Azure Cosmos DB creates multiple DiskANN indexes, one for each unique value of the selected document property. This can provide multiple benefits, including:
- Lower latency – Each individual index is smaller, which allows vector searches to complete up to 91% faster than with a single, large index.
- Reduced cost – Searching a smaller index requires less compute, resulting in as much as 85% lower RU consumption.
- Improved recall – By narrowing the search to a specific shard, results are more accurate and relevant, increasing recall up to 12%.
- Better scalability – Sharding distributes data across multiple sub-indexes, enabling the system to scale more effectively with growing data.
- Isolated vector data – Sharded DiskANN keeps vector data segmented by shard keys, helping you manage and isolate data more precisely. Searches only run against the data matching the selected shard key.
Use Cases
Sharded DiskANN is especially valuable in scenarios where data isolation and targeted search are essential, such as:
- Multi-tenant environments – Shard vector indexes by tenant ID to ensure each tenant’s data is isolated. This setup allows queries to run independently for each tenant, improving both performance and security.
- Category-specific searches – Partition vector indexes by a property that defines a category (e.g., product type, region, or content tag). This allows vector searches to run exclusively within a given category, resulting in faster, more relevant results.
How is it different from a Partition Key?
The VectorIndexShardKey divides a DiskANN vector index into multiple distinct vector indexes, each corresponding to a unique value of the specified property. This segmentation ensures that vector searches are confined to the relevant subset of data, enhancing query efficiency and precision.
In contrast, the Partition Key in Cosmos DB determines how data is logically partitioned across the database’s physical infrastructure. When a partition key is specified in a query, Cosmos DB uses it to route the request directly to the pertinent physical partitions that store the corresponding data, optimizing both performance and scalability.
Note that the VectorIndexShardKey can reference the same path as the partition key or a different property within the document.
How to use Sharded DiskANN
Defining a vector index shard key
Here, we can see an example of defining the shard key based on a tenantID property. This can be any property of the data item, even the partition key. Note that the single string needs to be enclosed in an array.
"vectorIndexes": [
{
"path": "/vector2",
"type": "DiskANN",
"vectorIndexShardKey": ["/tenantID"]
}
]
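For reference, here is a minimal sketch of creating a container with this policy using the Python SDK (azure-cosmos 4.7.0 or later, which supports the vector_embedding_policy parameter). The endpoint, key, database and container names, embedding dimensions, and distance function below are illustrative assumptions, not prescribed values.

from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical account endpoint and key; replace with your own values.
client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
database = client.create_database_if_not_exists("vector-demo-db")

# Vector embedding policy for the /vector2 path (dimensions and distance function are assumptions).
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/vector2",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 1536
        }
    ]
}

# Indexing policy with a DiskANN vector index sharded on /tenantID.
indexing_policy = {
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/vector2/*"}],
    "vectorIndexes": [
        {
            "path": "/vector2",
            "type": "diskANN",
            "vectorIndexShardKey": ["/tenantID"]
        }
    ]
}

# The shard key can match the partition key (as here) or reference a different property.
container = database.create_container(
    id="documents",
    partition_key=PartitionKey(path="/tenantID"),
    indexing_policy=indexing_policy,
    vector_embedding_policy=vector_embedding_policy
)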
Use DiskANN without sharding
As VectorIndexShardKey is an optional parameter, omitting it creates one DiskANN index per physical partition, as usual.
"vectorIndexes": [
{
"path": "/vector2",
"type": "DiskANN"
}
]
Writing a vector search query focused on a shard key
To focus a vector search query on a specific shard key value, filter on the shard key property with a WHERE clause, as in the following example:
SELECT TOP 10 *
FROM c
WHERE c.tenantID = "tenant1"
ORDER BY VectorDistance(c.vector, {queryVector})
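As a rough illustration, the same shard-scoped query could be issued with the Python SDK as follows; the container handle, parameter names, and query vector are assumptions made for this sketch.

# Assumes `container` is an azure-cosmos ContainerProxy and `query_vector` is a list of floats.
query = (
    "SELECT TOP 10 c.id, VectorDistance(c.vector, @queryVector) AS similarityScore "
    "FROM c "
    "WHERE c.tenantID = @tenantId "
    "ORDER BY VectorDistance(c.vector, @queryVector)"
)

results = container.query_items(
    query=query,
    parameters=[
        {"name": "@queryVector", "value": query_vector},
        {"name": "@tenantId", "value": "tenant1"},
    ],
    # If tenantID is also the partition key, you can instead pass partition_key="tenant1"
    # to route the request to a single logical partition.
    enable_cross_partition_query=True,
)

for item in results:
    print(item["id"], item["similarityScore"])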
An example using the YFCC Dataset
We demonstrate the utility of Sharded DiskANN by indexing 1 million items from the YFCC dataset. Each document contains a 192-dimensional CLIP embedding of a (copyleft) Flickr photo or video taken between 2004 and early 2014, along with metadata such as camera model, country, year, and month. The number of documents per year ranges from roughly 30,000 to 144,000.
Here is a sample document:
{
    "embedding": […],
    "year": "2012",
    "month": "March",
    "camera": "FUJIFILM",
    "country": "GB"
}
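To give a sense of how such items could be loaded, here is a brief sketch using the Python SDK; the document id and the embedding variable are made up for illustration.

# Assumes `container` is an azure-cosmos ContainerProxy and `clip_embedding`
# is a 192-dimensional list of floats produced by a CLIP model.
doc = {
    "id": "yfcc-0000001",        # hypothetical document id (required by Cosmos DB)
    "embedding": clip_embedding,
    "year": "2012",
    "month": "March",
    "camera": "FUJIFILM",
    "country": "GB",
}
container.upsert_item(doc)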
We created two collections that index this dataset in two different ways, one with Sharded DiskANN and one without, and ran 5,000 queries of the form:
SELECT TOP 10 *
FROM c
WHERE c.year = "YEAR"
ORDER BY VectorDistance(c.embedding, {queryVector})
with the following vector indexing policy leveraging Sharded DiskANN:
"vectorIndexes": [
{
"path": "/embedding",
"type": "diskANN",
"vectorIndexShardKey": ["/year"]
}
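The sketch below shows, in simplified form, how such queries could be timed with the Python SDK. The query set, helper structure, and RU-charge header access are assumptions, and the recall computation against ground truth is omitted.

import time

# Assumes `container` is an azure-cosmos ContainerProxy and `queries` is a list
# of (year, query_vector) tuples drawn from the YFCC query set.
def run_queries(container, queries):
    sql = (
        "SELECT TOP 10 c.id "
        "FROM c "
        "WHERE c.year = @year "
        "ORDER BY VectorDistance(c.embedding, @queryVector)"
    )
    latencies_ms, ru_charges = [], []
    for year, vector in queries:
        start = time.perf_counter()
        list(container.query_items(
            query=sql,
            parameters=[
                {"name": "@year", "value": year},
                {"name": "@queryVector", "value": vector},
            ],
            enable_cross_partition_query=True,
        ))
        latencies_ms.append((time.perf_counter() - start) * 1000)
        # The request charge of the last operation is reported in the response headers.
        ru_charges.append(float(
            container.client_connection.last_response_headers["x-ms-request-charge"]
        ))
    return latencies_ms, ru_charges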
Example results
We compared a collection using Sharded DiskANN with default query parameters against a collection without Sharded DiskANN, both with default parameters and with an increased search depth (searchListSizeMultiplier = 100). With default settings, Sharded DiskANN yielded about 3x lower mean and p99 latency and much higher recall, at about 98%. It remained more accurate than the unsharded collection even when that collection's search depth was increased to searchListSizeMultiplier = 100, and the query RU charge also compared favorably.
Latency & Accuracy
| Scenario | Avg Latency (ms) | P50 Latency (ms) | P99 Latency (ms) | Recall@10 (%) |
| --- | --- | --- | --- | --- |
| With Sharded DiskANN (searchListSizeMultiplier = 10, default) | 7.28 | 10.07 | 13.43 | 98.09 |
| Without Sharded DiskANN (searchListSizeMultiplier = 10, default) | 22.40 | 21.29 | 38.59 | 66.17 |
| Without Sharded DiskANN (searchListSizeMultiplier = 100) | 94 | 95.47 | 164 | 87 |
RU Charges and Costs
| Scenario | Avg RU per query | P50 RU per query | P99 RU per query | Cost per 1M Queries ($) |
| --- | --- | --- | --- | --- |
| With Sharded DiskANN (searchListSizeMultiplier = 10, default) | 19.69 | 19.54 | 27.42 | $6.86 |
| Without Sharded DiskANN (searchListSizeMultiplier = 10, default) | 36.27 | 33.83 | 38.59 | $9.65 |
| Without Sharded DiskANN (searchListSizeMultiplier = 100) | 118.49 | 115.62 | 193.02 | $193.02 |
Summary
Sharded DiskANN in Azure Cosmos DB lets you break large DiskANN vector indexes into smaller, scoped indexes with a simple VectorIndexShardKey setting in your vector indexing policy. This can boost recall by up to 12%, cut costs by as much as 85%, and deliver search speeds up to 91% faster, giving you the performance needed to build scalable, high-performance AI and multitenant applications.
Try it out
Unlock lightning‑fast, cost‑effective, and precision‑tuned vector search across tenants or scopes and try Sharded DiskANN in Azure Cosmos DB today! Read the documentation for more details. Are you new to vector search in Azure Cosmos DB? Check out our overview documentation on how to get started.
About Azure Cosmos DB
Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.
Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.