Global secondary indexes for Azure Cosmos DB—now in Public Preview—make it easier to query data efficiently, especially as your datasets grow.
Distributed databases like Azure Cosmos DB scale with dataset growth by partitioning data across multiple physical machines. When your queries include the partition key, the client routes them directly to the appropriate partition. Queries that include the partition key remain low-latency regardless of how many partitions exist.
But real-world applications, including AI apps and agents, often need to query data without the partition key. Such queries fan out to every partition, so they get slower and more expensive as the container grows. That’s where global secondary indexes (GSIs) help. GSIs let you define an alternate partition key and data model, allowing for fast, efficient lookups across a wider range of query patterns.
When to use GSIs
Adding global secondary indexes (GSIs) can lower costs and boost performance for many workloads. Azure Cosmos DB automatically syncs changes to data from the source container to the GSI container. Data is kept up to date with no extra work on your part. Because GSIs have their own container properties—such as partition key, data model, and indexing policy—you can optimize for diverse query patterns while avoiding cross-partition queries on the source container.
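As a conceptual illustration only, the sketch below models a GSI as a second in-memory container that holds the same items regrouped under a different partition key. The class names `ToyContainer` and `ToySourceContainer`, the `email` field name, and the sample users are hypothetical. The synchronous copy is a simplification: the real service performs the sync for you, and this is not a description of how it is implemented.

```python
class ToyContainer:
    """In-memory stand-in for a container: items grouped by partition key value."""

    def __init__(self, partition_key_field):
        self.partition_key_field = partition_key_field
        self.partitions = {}   # partition key value -> {item id: item}
        self._key_by_id = {}   # item id -> partition key value it is stored under

    def upsert(self, item):
        new_key = item[self.partition_key_field]
        old_key = self._key_by_id.get(item["id"])
        if old_key is not None and old_key != new_key:
            # The partition key value changed, so drop the stale copy.
            del self.partitions[old_key][item["id"]]
        self.partitions.setdefault(new_key, {})[item["id"]] = dict(item)
        self._key_by_id[item["id"]] = new_key

    def lookup(self, key_value):
        """Return all items stored under one partition key value."""
        return list(self.partitions.get(key_value, {}).values())


class ToySourceContainer(ToyContainer):
    """Source container that forwards every write to its attached toy indexes."""

    def __init__(self, partition_key_field):
        super().__init__(partition_key_field)
        self._indexes = []

    def add_index(self, index):
        # Backfill items that already exist, then keep forwarding new writes.
        for items in self.partitions.values():
            for item in items.values():
                index.upsert(item)
        self._indexes.append(index)

    def upsert(self, item):
        super().upsert(item)
        for index in self._indexes:
            index.upsert(item)


users = ToySourceContainer("email")
users.upsert({"id": "1", "email": "ana@contoso.example", "phoneNumber": "555-123-4567"})

users_by_phone = ToyContainer("phoneNumber")
users.add_index(users_by_phone)  # index added after data already exists
users.upsert({"id": "2", "email": "ben@contoso.example", "phoneNumber": "555-987-6543"})

print(users_by_phone.lookup("555-123-4567"))
```

The application only ever writes to `users`; the regrouped copy in `users_by_phone` is what makes a lookup by phone number a single-key read.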
You can also add GSIs to containers as your application evolves, keeping queries efficient over time. Since GSIs exist as separate containers, they let you isolate specific parts of your workload. For example, you can create vector or full-text search indexes in the GSI to support vector and hybrid search, all without affecting transactional operations on the source container.
Let’s see a real-world example
Contoso, a fictitious company, stores user information in an Azure Cosmos DB container. Because their verification process often involves looking up users by email, they chose to partition the container using the email address. However, some users prefer verification by phone number, so Contoso also needs to support lookups by that field.
The users container spans 10 physical partitions and currently holds 200,000 user records. Queries by email perform efficiently thanks to the partitioning strategy, but queries by phone number are less optimized and need to search across all 10 partitions. To improve performance and reduce query cost, Contoso added a global secondary index (GSI) called usersByPhone. They partitioned the GSI by phone number and ran the following query against both containers to compare performance:
SELECT *
FROM c
WHERE c.phoneNumber = "555-123-4567"
RU charge on users container: 27.93; execution time: 6.90 seconds
RU charge on GSI container: 2.82; execution time: 0.49 seconds
In this scenario, executing the same query on the GSI container reduces both RU charges and execution time by roughly 90%. Since Contoso performs thousands of phone number lookups per second during peak load, the GSI delivers significant cost savings and improved performance. This improvement comes from the client routing the query directly to the single GSI partition that holds the phone number, instead of fanning out to all 10 partitions of the users container.
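The routing difference described above can be sketched with a toy model that counts how many partitions a query has to visit. The SHA-256 hashing is only a stand-in for whatever the service does internally, and the `email` field name and the helper names are assumptions made for illustration.

```python
import hashlib

NUM_PARTITIONS = 10  # matches the 10 physical partitions in the Contoso example


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partition key value to a partition number (illustrative hashing only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_to_query(filter_field, filter_value, partition_key_field):
    """Return the partitions an equality filter on filter_field has to visit."""
    if filter_field == partition_key_field:
        # The filter pins the partition key, so exactly one partition is targeted.
        return [partition_for(filter_value)]
    # Otherwise the matching item could be anywhere, so every partition is queried.
    return list(range(NUM_PARTITIONS))


phone = "555-123-4567"
users_targets = partitions_to_query("phoneNumber", phone, partition_key_field="email")
gsi_targets = partitions_to_query("phoneNumber", phone, partition_key_field="phoneNumber")

print(len(users_targets), len(gsi_targets))  # prints: 10 1
```

The same filter touches ten partitions on the container keyed by email and one on the container keyed by phone number, which is the effect the measured RU and latency numbers reflect.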
Because GSIs are implemented as independent containers, you can define a custom indexing policy to further optimize queries. For example, Contoso runs targeted promotions for active users in specific area codes. To support this, they query their GSI and have added a composite index on isActive and phoneNumber to accelerate those lookups.
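A composite index like the one described above could be expressed with an indexing policy along these lines. This is a sketch, not Contoso's actual configuration: the included and excluded paths are simply the common defaults, and the sort orders are an assumption. The equality property (`isActive`) is listed before the range-filtered property (`phoneNumber`), which is the ordering composite indexes expect for this kind of filter.

```python
import json

# Indexing policy with one composite index on (isActive, phoneNumber).
# Paths other than the composite index are assumed defaults, not taken from the article.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    "compositeIndexes": [
        [
            {"path": "/isActive", "order": "ascending"},
            {"path": "/phoneNumber", "order": "ascending"},
        ]
    ],
}

# Print the JSON form, which is how indexing policies are usually written and reviewed.
print(json.dumps(indexing_policy, indent=2))
```

How the policy is attached to a GSI container depends on the tool you use to create it; the configuration guide listed under Learn more is the place to confirm the exact steps.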
SELECT *
FROM c
WHERE STARTSWITH(c.phoneNumber, "555") AND c.isActive = true
RU charge on users container: 33.32; execution time: 1.82 seconds
RU charge on GSI container: 6.98; execution time: 0.53 seconds
For this query, Contoso sees roughly an 80% reduction in RU charges and a 70% improvement in execution time. The exact savings depend on the query’s complexity and the container’s configuration. By leveraging Azure Cosmos DB’s rich query syntax and flexible indexing capabilities, GSIs offer a powerful way to optimize performance.
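The rounded percentages quoted for both queries can be checked directly from the measurements above. The helper name `percent_reduction` and the dictionary labels are made up for this sketch; the numbers are the ones reported in the article.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


# (users container, GSI container) for each reported measurement
measurements = {
    "phone lookup": {"ru": (27.93, 2.82), "seconds": (6.90, 0.49)},
    "area code + isActive": {"ru": (33.32, 6.98), "seconds": (1.82, 0.53)},
}

for name, m in measurements.items():
    ru = percent_reduction(*m["ru"])
    seconds = percent_reduction(*m["seconds"])
    print(f"{name}: RU charge down {ru}%, execution time down {seconds}%")

# phone lookup: RU charge down 89.9%, execution time down 92.9%
# area code + isActive: RU charge down 79.1%, execution time down 70.9%
```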
If you’d like to try this sample yourself, the code is available on GitHub. The repository includes setup instructions, a data loader project, and a console app to run the queries and see the performance impact firsthand.
Getting started with global secondary indexes
You can enable global secondary indexes on your Azure Cosmos DB for NoSQL account directly from the Azure Portal. We’ve also added support for accounts that use continuous backup—this backup mode is now required before you can enable GSIs on your account.
Global secondary indexes (GSIs) are the evolution of the former materialized views feature. We’ve made them easier and more cost-effective by removing the need to provision, manage, or pay for the builder component that handles data synchronization. Once you enable GSIs on your account, you can immediately start creating GSI containers—either through the Azure Portal or using the Azure CLI. GSI containers should be created using autoscale throughput to account for spikes and dips in traffic without falling too far behind the source container. After creation, the GSI automatically begins syncing data from the source container.
Global secondary indexes offer a powerful and flexible way to optimize query performance in Azure Cosmos DB for NoSQL. By supporting alternate partition keys and custom indexing policies, GSIs help you adapt to evolving application needs without compromising performance or scalability. Whether you’re improving lookup efficiency, isolating workloads, or enabling new query patterns, GSIs can help reduce cost and complexity.
Learn more
Explore the following resources to help you get started and make the most of GSIs in your applications:
- 📘 Global Secondary Indexes (Preview) – Azure Cosmos DB for NoSQL
- ⚙️ How to Configure Global Secondary Indexes (Preview)
About Azure Cosmos DB
Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.
To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.