We are constantly adding new query and indexing features to Azure Cosmos DB. This blog outlines a few recent query and indexing improvements for:
- Aggregates
- Inequality filters and filters on undefined values
- High precision dates and times
In this post, we’ll use a sample 5 GB dataset to show the significant reduction in RU consumption for many queries that utilize these improvements. To highlight the impact of these improvements, we’ll compare query RU charges with the RU charges if you had run these queries in early 2019. In the dataset, documents have the following structure:
The arrivalTime property is defined for about 95% of the documents in this dataset. All other properties are defined for all documents.
Aggregate performance improvements
We are excited to announce improved index utilization for aggregate functions in Azure Cosmos DB. The RU charge and execution time for many aggregate functions have significantly improved.
The following aggregate system functions will now benefit from an index. They’ll consume fewer RU’s and execute with lower latency than before:
- COUNT
- SUM
- AVG
- MAX
- MIN
Here’s an example query that finds the maximum airplane speed in our dataset where altitude is greater than 36,000:
Query:
SELECT MAX(c.speed) FROM c WHERE c.altitude > 36000
Original RU charge: 11,201.54 RUs
Current RU charge (with index improvements): 121.69 RUs
Here’s another example that computes the count of documents with “LAX” as the departure airport:
Query:
SELECT COUNT(1) FROM c WHERE c.departureAirport = “LAX”
Original RU charge: 12,823.04 RUs
Current RU charge (with index improvements): 24.58 RUs
To utilize these query engine improvements, you should add an index for properties in aggregate system functions.
Performance improvements for inequality filters and filters on undefined values
Queries with inequality filters or filters on undefined values can now be run more efficiently. Previously, these filters did not utilize the index. When executing a query, Azure Cosmos DB would first evaluate other less expensive filters (such as =, >, or <) in the query. If there were inequality filters or filters on undefined values remaining, the query engine would be required to load each of these documents. Since inequality filters and filters on undefined values now utilize the index, we can avoid loading these documents and see a significant improvement in RU charge.
Here’s a full list of query filters with improvements:
- Inequality comparison expression (e.g. c.age != 4)
- NOT IN expression (e.g. c.name NOT IN (‘Luis’, ‘Andrew’, ‘Deborah’))
- NOT IsDefined
- Is<Type> expressions (e.g. NOT IsDefined(c.age), NOT IsString(c.name))
- Coalesce operator expression (e.g. (c.name ?? ‘N/A’) = ‘Thomas’)
- Ternary operator expression (e.g. c.name = null ? ‘N/A’ : c.name)
If you have queries with these filters, you should add an index for the relevant properties.
Here’s an example query with an inequality filter:
Query:
SELECT TOP 2000 * FROM c WHERE c.departureAirport = "ORD" AND c.altitude < 10000 AND c.arrivalAirport NOT IN("JFK", "SEA", "EWR", "LHR", "LGA","MCO", "SFO","LAX", "DEN", "PHL", "SAN")
Original RU charge: 1,328.93 RUs
Current RU charge (with index improvements): 637.92 RUs
Here’s another example with a filter on an undefined value:
Query:
SELECT TOP 10 * FROM c WHERE c. arrivalAirport = “ORD” and NOT IS_DEFINED(c.arrivalTime)
Original RU charge: 11,296.10 RUs
Current RU charge (with index improvements): 26.34 RUs
High precision dates and times:
We have increased the precision from milliseconds to 100 nano seconds increments for the GetCurrentDateTime system function in Azure Cosmos DB. GetCurrentDateTime now returns the date in ISO 8601 format with seven-digit fractional second precision.
Here is an example:
Query:
SELECT GetCurrentDateTime() AS currentUtcDateTime
Results:
[{ "currentUtcDateTime": "2020-04-10T21:31:49.1234567Z" }]
Next steps:
Will this apply to all Cosmos collections that exists currently, or only new collections that are created from this point onwards?
what a fine blog post