May 19th, 2025

Boost Query Performance with Global Secondary Indexes in Azure Cosmos DB

Justine Cocchi
Senior Program Manager

Global secondary indexes for Azure Cosmos DB—now in Public Preview—make it easier to query data efficiently, especially as your datasets grow. 

Distributed databases like Azure Cosmos DB scale with dataset growth by partitioning data across multiple physical machines. When your queries include the partition key, the client routes them directly to the appropriate partition. Queries that include the partition key remain low-latency regardless of how many partitions exist.
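As a rough mental model of the routing described above, the sketch below hashes a partition key value to one of N partitions and contrasts that with a query that has no partition key and must fan out to every partition. The function names and the SHA-256 hash are illustrative assumptions; Azure Cosmos DB uses its own hashing scheme and partition key ranges internally.

```python
import hashlib
from typing import List, Optional


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key value to a partition number (toy model only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partitions_to_query(partition_count: int,
                        partition_key_value: Optional[str] = None) -> List[int]:
    """Return the partitions a query has to visit.

    With a partition key value the query goes to exactly one partition.
    Without one it has to fan out to all of them.
    """
    if partition_key_value is None:
        return list(range(partition_count))
    return [partition_for(partition_key_value, partition_count)]


if __name__ == "__main__":
    # Hypothetical key value, used only to exercise the model.
    print(partitions_to_query(10, "ana@contoso.com"))  # one partition
    print(partitions_to_query(10))                     # all ten partitions
```

The number of partitions visited stays at one for keyed queries no matter how large `partition_count` grows, which is why those queries remain low-latency as the container scales.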

But real-world applications, including AI apps and agents, often need to query data without the partition key. That’s where global secondary indexes (GSIs) help. GSIs let you define an alternate partition key and data model, allowing for fast, efficient lookups across a wider range of query patterns.

 

When to use GSI

Adding global secondary indexes (GSIs) can lower costs and boost performance for many workloads. Azure Cosmos DB automatically syncs changes to data from the source container to the GSI container. Data is kept up to date with no extra work on your part. Because GSIs have their own container properties—such as partition key, data model, and indexing policy—you can optimize for diverse query patterns while avoiding cross-partition queries on the source container. 

You can also add GSIs to containers as your application evolves, keeping queries efficient over time. Since GSIs exist as separate containers, they let you isolate specific parts of your workload. For example, you can create vector or full-text search indexes in the GSI to support vector and hybrid search, all without affecting transactional operations on the source container. 

 

Let’s see a real-world example

Contoso, a fictitious company, stores user information in an Azure Cosmos DB container. Because their verification process often involves looking up users by email, they chose to partition the container using the email address. However, some users prefer verification by phone number, so Contoso also needs to support lookups by that field. 

The users container spans 10 physical partitions and currently holds 200,000 user records. Queries by email perform efficiently thanks to the partitioning strategy, but queries by phone number are less optimized and need to search across all 10 partitions. To improve performance and reduce query cost, Contoso added a global secondary index (GSI) called usersByPhone. They partitioned the GSI by phone number and used it to compare performance.

 

SELECT *  
FROM c  
WHERE c.phoneNumber = "555-123-4567" 

RU charge on users container: 27.93; execution time: 6.90 seconds

RU charge on GSI container: 2.82; execution time: 0.49 seconds
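To make the difference between the two runs concrete, here is a minimal sketch of how the same lookup might be issued from the `azure-cosmos` Python SDK. It assumes the two arguments are container objects already obtained from a Cosmos client (client setup is omitted), and that the GSI's partition key path is `/phoneNumber`. The helper names are hypothetical and not part of the sample repository.

```python
from typing import Any, Dict, List

# Same query text for both containers; the phone number is passed as a parameter.
PHONE_QUERY = "SELECT * FROM c WHERE c.phoneNumber = @phone"


def find_by_phone_on_source(users_container: Any, phone: str) -> List[Dict[str, Any]]:
    """Query the source container, which is partitioned by email.

    The phone number is not the partition key here, so the query has to be
    allowed to fan out across partitions.
    """
    items = users_container.query_items(
        query=PHONE_QUERY,
        parameters=[{"name": "@phone", "value": phone}],
        enable_cross_partition_query=True,
    )
    return list(items)


def find_by_phone_on_gsi(gsi_container: Any, phone: str) -> List[Dict[str, Any]]:
    """Query the GSI container, assumed to be partitioned by /phoneNumber.

    Supplying the partition key value lets the query target one partition.
    """
    items = gsi_container.query_items(
        query=PHONE_QUERY,
        parameters=[{"name": "@phone", "value": phone}],
        partition_key=phone,
    )
    return list(items)
```

The only difference between the two calls is how the query is scoped: the source container needs `enable_cross_partition_query=True`, while the GSI container can be given the phone number as its `partition_key`.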

 

In this scenario, running the same query against the GSI container cuts the RU charge by about 90% (27.93 to 2.82 RUs) and the execution time by about 93% (6.90 to 0.49 seconds). Since Contoso performs thousands of phone number lookups per second during peak load, the GSI delivers significant cost savings and improved performance. This improvement is driven by the client routing the query directly to the correct GSI partition, as shown below.
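The savings can be checked directly from the measurements above. The short calculation below works out the percentage reductions and then scales the per-query RU charge to a peak load. The figure of 1,000 lookups per second is a hypothetical value chosen for illustration, not a number from Contoso's workload, and the estimate covers query charges only.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# Measurements reported for the phone number lookup.
SOURCE_RU, GSI_RU = 27.93, 2.82
SOURCE_SECONDS, GSI_SECONDS = 6.90, 0.49

# Hypothetical peak load, assumed for illustration only.
ASSUMED_LOOKUPS_PER_SECOND = 1_000

ru_reduction = percent_reduction(SOURCE_RU, GSI_RU)
time_reduction = percent_reduction(SOURCE_SECONDS, GSI_SECONDS)
source_ru_per_second = SOURCE_RU * ASSUMED_LOOKUPS_PER_SECOND
gsi_ru_per_second = GSI_RU * ASSUMED_LOOKUPS_PER_SECOND

print(f"RU reduction: {ru_reduction:.1f}%")                 # 89.9%
print(f"Execution time reduction: {time_reduction:.1f}%")   # 92.9%
print(f"Query RU/s on users container: {source_ru_per_second:.0f}")
print(f"Query RU/s on GSI container: {gsi_ru_per_second:.0f}")
```

Under that assumed load, the lookups alone would draw roughly 27,930 RU/s from the users container versus about 2,820 RU/s from the GSI.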

[Image: GSI FindByPhone]

 

Because GSIs are implemented as independent containers, you can define a custom indexing policy to further optimize queries. For example, Contoso runs targeted promotions for active users in specific area codes. To support this, they query their GSI and have added a composite index on isActive and phoneNumber to accelerate those lookups.
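For reference, a composite index is declared in the container's indexing policy under `compositeIndexes`. The sketch below builds such a policy as a Python dictionary and prints it as JSON. The included and excluded paths are common defaults assumed here, not Contoso's actual policy, and the function name is hypothetical.

```python
import json
from typing import Any, Dict


def build_gsi_indexing_policy() -> Dict[str, Any]:
    """Indexing policy with a composite index on isActive and phoneNumber."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # Assumed defaults: index every path except the system _etag property.
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        "compositeIndexes": [
            [
                # The equality-filtered property comes first, followed by
                # the property used in the prefix (range-style) filter.
                {"path": "/isActive", "order": "ascending"},
                {"path": "/phoneNumber", "order": "ascending"},
            ]
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_gsi_indexing_policy(), indent=2))
```

Because the policy belongs to the GSI container, adding this composite index does not change how the source container is indexed.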

 

SELECT * 
FROM c 
WHERE STARTSWITH(c.phoneNumber, "555") AND c.isActive = true

RU charge on users container: 33.32; execution time: 1.82 seconds

RU charge on GSI container: 6.98; execution time: 0.53 seconds

 

For this query, Contoso sees roughly a 79% reduction in RU charges (33.32 to 6.98 RUs) and a 71% reduction in execution time (1.82 to 0.53 seconds). The exact savings depend on the query’s complexity and the container’s configuration. By leveraging Azure Cosmos DB’s rich query syntax and flexible indexing capabilities, GSIs offer a powerful way to optimize performance.

If you’d like to try this sample yourself, the code is available on GitHub. The repository includes setup instructions, a data loader project, and a console app to run the queries and see the performance impact firsthand.

 

Getting started with global secondary indexes

You can enable global secondary indexes on your Azure Cosmos DB for NoSQL account directly from the Azure Portal. We’ve also added support for accounts that use continuous backup—this backup mode is now required before you can enable GSIs on your account.

[Image: enable global secondary indexes]

 

Global secondary indexes (GSIs) are the evolution of the former materialized views feature. We’ve made them easier and more cost-effective by removing the need to provision, manage, or pay for the builder component that handles data synchronization. Once you enable GSIs on your account, you can immediately start creating GSI containers—either through the Azure Portal or using the Azure CLI. GSI containers should be created using autoscale throughput to account for spikes and dips in traffic without falling too far behind the source container. After creation, the GSI automatically begins syncing data from the source container.
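As a hedged illustration of what a GSI container definition might look like when created outside the portal, the sketch below assembles a request body for the Contoso example. The field names (`materializedViewDefinition`, `sourceCollectionId`, `definition`, `autoscaleSettings`) are assumptions based on the preview resource format inherited from materialized views. The projected properties and the 1,000 RU/s autoscale maximum are hypothetical. Check the current documentation before relying on any of them.

```python
import json
from typing import Any, Dict


def build_gsi_container_body(gsi_name: str, source_name: str,
                             partition_key_path: str, definition_query: str,
                             max_autoscale_throughput: int) -> Dict[str, Any]:
    """Assemble a GSI container definition (assumed preview format)."""
    return {
        "properties": {
            "resource": {
                "id": gsi_name,
                "partitionKey": {"paths": [partition_key_path], "kind": "Hash"},
                # Assumed field names carried over from materialized views.
                "materializedViewDefinition": {
                    "sourceCollectionId": source_name,
                    "definition": definition_query,
                },
            },
            # Autoscale, as recommended above, so the GSI can keep up with
            # spikes in writes on the source container.
            "options": {
                "autoscaleSettings": {"maxThroughput": max_autoscale_throughput}
            },
        }
    }


if __name__ == "__main__":
    body = build_gsi_container_body(
        gsi_name="usersByPhone",
        source_name="users",
        partition_key_path="/phoneNumber",
        # Hypothetical projection; pick the properties your queries need.
        definition_query="SELECT c.email, c.phoneNumber, c.isActive FROM c",
        max_autoscale_throughput=1000,  # hypothetical value
    )
    print(json.dumps(body, indent=2))
```

A body like this would typically be saved to a file and passed to the Azure CLI or the management API. The exact command is omitted here because it depends on the API version in use.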

[Image: create global secondary indexes]

 

Global secondary indexes offer a powerful and flexible way to optimize query performance in Azure Cosmos DB for NoSQL. By supporting alternate partition keys and custom indexing policies, GSIs help you adapt to evolving application needs without compromising performance or scalability. Whether you’re improving lookup efficiency, isolating workloads, or enabling new query patterns, GSIs can help reduce cost and complexity.

 

Learn more

Explore the following resources to help you get started and make the most of GSIs in your applications:

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.

To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

Author

Justine is a Program Manager on the Azure Cosmos DB team working on various aspects of the SQL API.
