New ways to use composite indexes in Azure Cosmos DB

Tim

You can now use composite indexes in Azure Cosmos DB to optimize additional cases of the most common inefficient queries! Anytime you have a slow or high request unit (RU) query, you should consider optimizing it with a composite index.

Current use cases for composite indexes:

Use case Example
Queries that ORDER BY multiple properties
SELECT * 
FROM c
ORDER BY c.name, c.age
Queries with an equality filter and ORDER BY
SELECT *
FROM c
WHERE c.name = “Tim”
ORDER BY c.age
(NEW) Queries with multiple equality and/or range filters
SELECT *
FROM c
WHERE c.name = “Tim” AND c.age > 18 and c._ts > 100
(NEW) Queries with aggregates and equality filters
SELECT SUM(c.total)
FROM c
WHERE c.name = “Tim”
(NEW) Queries with system functions
SELECT *
FROM c
WHERE c.name = “Tim” AND CONTAINS(c.biography, “developer”)
ORDER BY c.name, c.biography

 

This post looks at examples of the new use cases for composite indexes. For these tests, we’ve ingested 9 million items into an Azure Cosmos DB container. Here is an example item:

Cosmos DB sample item

Each item is a different product in an online clothing store. Multiple items can belong to the same CartId but each item has its own unique id value.

 

Queries with multiple equality and range filters:

Queries can now use multiple composite indexes in the same filter expression. This allows you to optimize queries that have multiple equality and range filters in the same filter expression. Remember, a single composite index can only apply to one range filter. Therefore, using multiple composite indexes can help optimize queries with multiple range filters.

Query:

SELECT TOP 10 *
FROM c
WHERE c.BuyerState = "TN" and c._ts > 1611947901 
and c.id > "9c5e4e08-e1d2-4091-8062-78d1da832523"

Composite indexes:

  1. (BuyerState ASC, _ts ASC)
  2. (BuyerState ASC, id ASC)

Query charge with default indexing policy: 533.47 RUs

Query charge with both composite indexes: 72.23 RUs

 

This optimization tends to have a higher impact for multiple range filters than queries with only equality filters. Composite indexes will have a bigger impact on high cardinality properties, such as _ts or id.

 

Queries with aggregates and an equality filters:

Aggregate queries (SUM,  AVG, MIN, or MAX) can now benefit from composite indexes. Previously, composite indexes could only optimize queries with a COUNT(1) aggregate.

Query: 
SELECT AVG(c._ts)
FROM c 
WHERE c.BuyerState = "TN"

 

Composite index: (BuyerState ASC, _ts ASC)

Query charge with default indexing policy: 186.25 RUs

Query charge with composite index: 39.8 RUs

 

Queries with higher cardinality properties typically benefit more from composite indexes. For example, if the property in the AVG() system function was price, instead of _ts, you might not need a composite index in the first place since price has fewer possible values than _ts.

Here’s an example:

Query:

SELECT AVG(c.price)
FROM c
WHERE c.BuyerState = "TN"

 

Composite index: (BuyerState ASC, price ASC)

Query charge with default indexing policy: 5.08 RUs

Query charge with composite index: 2.9 RUs

 

Queries with ORDER BY and system functions:

Queries with ORDER BY and system functions can now benefit from composite indexes. When you use a high cardinality property in the system function, composite indexes may have a big impact. In fact, the impact can be so significant, that it can be worthwhile to rewrite some queries with system functions to use ORDER BY so that they can leverage composite indexes.

 

Here’s an example:

Original Query: 
SELECT *
FROM c
WHERE c.BuyerState = "NJ" AND Contains (c.id, "abc", true)
Query with added ORDER BY:
SELECT *
FROM c
WHERE c.BuyerState = "NJ" AND Contains (c.id, "abc", true)
ORDER BY c.BuyerState, c.id

 

Composite index: (BuyerState ASC, id ASC)

Query charge with default indexing policy: 4,138.66 RUs

Query charge with composite index: 51.3 RUs

 

Adding an ORDER BY clause for the property in a system function improves system function execution. This is broader than just composite indexes and can often help in any query with a system function on a high cardinality property. Here’s an example that doesn’t involve composite indexes:

 

Here’s an example:

Original Query:

SELECT *
FROM c
WHERE RegexMatch (c.id, "abc")

Query with added ORDER BY:

SELECT *
FROM c
WHERE RegexMatch (c.id, "abc")
ORDER BY c.id

Query charge without ORDER BY: 8,845.61 RUs

Query charge with ORDER BY: 92.71 RUs

 

Queries with the following system functions are most likely to see big improvements since their efficiency depends on cardinality of the property in the system function:

  • Contains
  • RegexMatch
  • EndsWith

Having query results sorted can improve the way the query engine utilizes the index for high cardinality properties. Even if a query with a system function doesn’t require ORDER BY, you may be able to significantly improve by adding it, along with the relevant composite indexes.

 

Monitoring index policy transformations:

When you modify your indexing policy, the existing policy is automatically transformed to include the new additional indexed properties. When modifying indexing policy, there are some important points to keep in mind:

  • Indexing policy changes do not affect read or write availability
  • When you add a new index, the Cosmos DB query engine will not use the new index until the index transformation is complete
  • It is safe to modify indexing policy at any time, including when in production. It is common to add composite indexes while in production.

Additionally, it’s good to keep in mind that the following factors affect index transformation time:

  • Amount of data – The more data, the more time it takes for indexing policy transformations.
  • Number of added or removed indexes – A more significant change to the indexing policy will take longer than simple change (for example, adding a single composite index).
  • Available provisioned throughput – Indexing policy transformations will use RUs that are leftover from all other database operations.

You can track indexing policy transformation progress in the Azure Portal. Once the loading bar disappears, the indexing policy transformation has been successfully applied:

Tracking index progress

We look forward to using composite indexes for many additional query optimizations in the future! Stay up-to-date by reading our query troubleshooting guide when optimizing queries.

Learn more:

3 comments

Comments are closed. Login to edit/delete your existing comments

  • Munib Tahir

    Are composite indexes only available with the Core (SQL) API? I am using Cosmos DB with my Node.JS/Mongoose app via the Mongo 3.6 API.

  • KP

    Hi Tim,

    I was trying composite index on equality and range filter like this SELECT  * FROM c where c.isAvailable= false and c.productName = ‘headset’
    and c.createdDate > ‘2014-12-24’ with composite index as
    {
    “path”: “/isAvailable”,
    “order”: “ascending”
    },
    {
    “path”: “/productName “,
    “order”: “ascending”
    },
    {
    “path”: “/createdDate”,
    “order”: “ascending”
    }

    The above query will return 1.5 million records and we have total of 2 million records in container. The RUs even after composite index applied are looking high “1272.04 RUs”. Could you please let me know the reason behind this.
    Whereas when i applied order by to the same query, RUs reduced to 63 RUs.
    Why do i need to do order by even in non-system function queries? Is it like mandatory for using composite indexes efficiently?