April 9th, 2020

April query updates in Azure Cosmos DB

Tim Sander
Program Manager

We are constantly adding new query and indexing features to Azure Cosmos DB. This blog outlines a few recent query and indexing improvements for:

  • Aggregates
  • Inequality filters and filters on undefined values
  • High precision dates and times

In this post, we’ll use a sample 5 GB dataset to show the significant reduction in RU consumption for many queries that utilize these improvements. To highlight the impact of these improvements, we’ll compare query RU charges with the RU charges if you had run these queries in early 2019. In the dataset, documents have the following structure:

SampleJSONDocument

The arrivalTime property is defined for about 95% of the documents in this dataset. All other properties are defined for all documents.

 

Aggregate performance improvements

­We are excited to announce improved index utilization for aggregate functions in Azure Cosmos DB. The RU charge and execution time for many aggregate functions have significantly improved.

The following aggregate system functions will now benefit from an index. They’ll consume fewer RU’s and execute with lower latency than before:

  • COUNT
  • SUM
  • AVG
  • MAX
  • MIN

Here’s an example query that finds the maximum airplane speed in our dataset where altitude is greater than 36,000:

Query:

SELECT MAX(c.speed) FROM c WHERE c.altitude > 36000

Original RU charge: 11,201.54 RUs

Current RU charge (with index improvements): 121.69 RUs

 

Here’s another example that computes the count of documents with “LAX” as the departure airport:

Query:

SELECT COUNT(1) FROM c WHERE c.departureAirport = “LAX”

Original RU charge: 12,823.04 RUs

Current RU charge (with index improvements): 24.58 RUs

 

To utilize these query engine improvements, you should add an index for properties in aggregate system functions.

 

Performance improvements for inequality filters and filters on undefined values

Queries with inequality filters or filters on undefined values can now be run more efficiently. Previously, these filters did not utilize the index. When executing a query, Azure Cosmos DB would first evaluate other less expensive filters (such as =, >, or <) in the query. If there were inequality filters or filters on undefined values remaining, the query engine would be required to load each of these documents. Since inequality filters and filters on undefined values now utilize the index, we can avoid loading these documents and see a significant improvement in RU charge.

Here’s a full list of query filters with improvements:

  • Inequality comparison expression (e.g. c.age != 4)
  • NOT IN expression (e.g. c.name NOT IN (‘Luis’, ‘Andrew’, ‘Deborah’))
  • NOT IsDefined
  • Is<Type> expressions (e.g. NOT IsDefined(c.age), NOT IsString(c.name))
  • Coalesce operator expression (e.g. (c.name ?? ‘N/A’) = ‘Thomas’)
  • Ternary operator expression (e.g. c.name = null ? ‘N/A’ : c.name)

If you have queries with these filters, you should add an index for the relevant properties.

Here’s an example query with an inequality filter:

Query:

SELECT TOP 2000 * FROM c WHERE c.departureAirport = "ORD" AND c.altitude < 10000 AND c.arrivalAirport NOT IN("JFK", "SEA", "EWR", "LHR", "LGA","MCO", "SFO","LAX", "DEN", "PHL", "SAN")

Original RU charge: 1,328.93 RUs

Current RU charge (with index improvements): 637.92 RUs

 

Here’s another example with a filter on an undefined value:

Query:

SELECT TOP 10 * FROM c WHERE c. arrivalAirport = “ORD” and NOT IS_DEFINED(c.arrivalTime)

Original RU charge: 11,296.10 RUs

Current RU charge (with index improvements): 26.34 RUs

 

High precision dates and times:

We have increased the precision from milliseconds to 100 nano seconds increments for the GetCurrentDateTime system function in Azure Cosmos DB. GetCurrentDateTime now returns the date in ISO 8601 format with seven-digit fractional second precision.

Here is an example:

Query:

SELECT GetCurrentDateTime() AS currentUtcDateTime

Results:

[{
"currentUtcDateTime": "2020-04-10T21:31:49.1234567Z"
}]

Next steps:

Author

Tim Sander
Program Manager

Program Manager on the Azure Cosmos DB engineering team

2 comments

Discussion is closed. Login to edit/delete existing comments.

  • Ian Young

    Will this apply to all Cosmos collections that exists currently, or only new collections that are created from this point onwards?

  • Luis BosquezMicrosoft employee

    what a fine blog post