Understanding how to query arrays in Azure Cosmos DB

Avatar

Tim

This blog is the final part of a series of blogs where we’ll demystify commonly confused concepts for developers learning how to query data using the SQL (core) API in Azure Cosmos DB. Today, we’ll walk through tips and tricks for querying arrays.

In case you missed them, here are the earlier two blogs in the series:

Using an example, we’ll show how to use these concepts when querying arrays:

  • Accessing a specific array element
  • Iterating over arrays
  • JOINs
  • Subqueries

Example Scenario:

Nick is a developer that created an app that stores shopping lists. His app is incredibly popular around the holidays and has soared to 1 million active users!

He has a Cosmos container that has the shopping lists modeled as JSON documents. Here’s an example document:

{
  "id": "Tim",
  "city": "Seattle",
  "gifts": [
     {
        "recipient": "Andrew",
        "gift": "blanket"
     },
     {
        "recipient": "Deborah",
        "gift": "board game"
     },
     {
        "recipient": "Chris",
        "gift": "coffee maker"
     }
  ]
}

The remainder of the blog will focus on ways to query arrays, starting with the simplest (and least expressive) and concluding with the more advanced concepts.

 

Accessing a specific array position

The simplest way to query an array is to specify a specific position in the array. For example, the below query finds all shopping lists that have a recipient named Andrew in the first position within the gifts array.

Query:

SELECT *
FROM c
WHERE c.gifts[0].recipient = "Andrew"

This query is very simple to understand and inexpensive to run. However, filtering based on a specific array element isn’t enough for many scenarios.

 

Iterating over arrays

Azure Cosmos DB provides support for iterating over arrays by using the IN keyword in the FROM source. This is best illustrated with an example.

Query:

SELECT *
FROM c IN t.gifts

This query will return the data from the gifts array for all items in the container. The t in t.gifts was arbitrary. We could have used any letter or word (except for c) to reference the item.

Results:

[
   {
      "recipient": "Andrew",
      "gift": "blanket"
   },
   {
      "recipient": "Deborah",
      "gift": "board game"
   },
   {
      "recipient": "Chris",
      "gift": "coffee maker"
   },
…
]

This concept is especially useful if you need to filter on properties within the array.

For example, this query returns all gifts where the recipient is Andrew.

Query:

SELECT *
FROM c IN t.gifts
WHERE c.recipient = "Andrew"

However, once you’ve opted to iterate over an array by using the IN keyword in the FROM source, it isn’t possible to project or filter on any properties outside of that array. This limitation is solved by using JOINs.

 

JOINs

JOINs in Azure Cosmos DB are different from JOINs in relational databases. All JOINs in Cosmos DB are scoped within a single item. JOINs are a cross product between different sections of a single item.

For example, here’s a query that JOINs the gifts array with the rest of the item.

Query:

SELECT c.id, t AS gift
FROM c
JOIN t IN c.gifts

Results:

[
   {
      "id": "Tim",
      "gift": {
                 "recipient": "Andrew",
                 "gift": "blanket"
              }
   },
   {
      "id": "Tim",
      "gift": {
                 "recipient": "Deborah",
                 "gift": "board game"
   }
   },

]

In this case, the query returns all possible combinations for the id property and gifts array within each item. Using JOINs, you can construct more complex array queries, including queries that filter or project properties outside of the array.

Here’s an example query that adds on a filter for city, a property outside of the gifts array.

Query:

SELECT c.id, t AS gifts
FROM c
JOIN t IN c.gifts
WHERE t.recipient = "Andrew" AND c.city = "Seattle"

Subqueries

A subquery is a query nested within another query. Azure Cosmos DB supports correlated subqueries. Correlated subqueries have the following uses when querying arrays:

Optimize JOIN expressions

We can optimize most queries with JOINs and filters by rewriting them to include a subquery. This optimization is recommended if you use JOIN on a large array that later has a filter. By using a subquery, the filter is applied before the JOIN, instead of afterwards. For example, the query in the prior example could be rewritten.

Query:

SELECT c.id, gifts
FROM c
JOIN (SELECT VALUE t FROM t IN c.gifts WHERE t.recipient = 'Andrew') AS gifts
WHERE c.city = "Seattle"

EXISTS expression

The EXISTS expression, takes a subquery and returns true if the subquery returns at least one result. Because EXISTS take a subquery, it is more expressive than using ARRAY_CONTAINS, which is restricted to equality comparisons.

EXISTS stands out most from other array concepts because it can be used in the SELECT clause.

Query:

SELECT c.id, EXISTS(SELECT VALUE t FROM t IN c.gifts WHERE t.gift = "Coffee Maker") AS GivingCoffeeMaker
FROM c
WHERE c.city = "Seattle"

The above query returns all shopping lists from Seattle. It projects both the id value and a value that indicates whether that shopping list contains a coffee maker.

[
   {
      "id": "Tim",
      "GivingCoffeeMaker": false
   },
   {
      "id": "Thomas",
      "GivingCoffeeMaker": true
   }
…
]

ARRAY projection

You can use the ARRAY expression to project the results of a query as an array. Using the ARRAY expression, in combination with a JOIN, makes it easy construct arrays in the query’s output. In this example, we’ll create a ShoppingList array in the query projection.

Query:

SELECT c.id, ARRAY(SELECT VALUE t.gift FROM t in c.gifts) as ShoppingList
FROM c

Results:

[
   {
      "id": "Tim",
      "ShoppingList": [
         "blanket",
         "board game",
         "coffee maker"
      ]
   },
   {
      "id": "Matt",
      "ShoppingList": [
         "pillow",
         "gift card",
         "computer keyboard"
   ]
   }
…
]

Here are more resources to learn about querying arrays in Azure Cosmos DB:

6 comments

Leave a comment

  • Avatar
    haroon shah

    Thanks for a very timely article for a solution I was looking for. Along with what you have explained very nicely, I was wondering if there is a way to retrieve “gifts” that are “order by” “recipient”? At this time, I am sorting using C# lists. Thanks again for a great article.

    • Avatar
      Tim SanderMicrosoft employee

      Hi Haroon,

      Glad the article was useful! The scenario you mentioned (retrieve gifts sorted by recipient), isn’t easy to achieve right now. You can only ORDER BY values in your document and not values computed at runtime (in this case, the result of a JOIN).

      This is something that we plan to add in the future but for now, I can recommend the following two workarounds:
      – If possible, model the data as nested JSON (https://docs.microsoft.com/azure/cosmos-db/sql-query-working-with-json#nested-properties). In some cases, you do need an array but if nested JSON is a feasible alternative, you can easily ORDER BY any nested properties
      – If a small number of results are returned by the query, it would be practical to do the sorting client-side after getting the query results

      • Avatar
        Rich Mercer

        Really useful article thanks. However, I’m confused by your suggestion here so perhaps I’m missing something. How could you model an array as nested JSON? Surely if it’s an array it’s an array, and I’m not able to see how you could model that array differently as nested JSON. Do you have an example of that to help me out? I run into this problem a lot in my application where I’m having to sort C# lists as sorting isn’t possible in Cosmos. This will not scale over time as the number or results increases. I find this same restriction frustrating for other aggregate functions too, having come from SQL.

        • Avatar
          Tim SanderMicrosoft employee

          Hi Rich,

          It’s not always possible to do this, but in some cases you can model data slightly differently to be nested json instead of an array.

          For example, let’s say we have a document that contains a customer’s daily bank balance. This document has the bank balance in an array:

          {
            "id": "Contoso-Checking-Account-2020",
            "balance": [
                {
                  "day": 1,
                  "checkingAccount": 1000,
                  "savingsAccount": 5000
                },
                {
                  "day": 2,
                  "checkingAccount": 100,
                  "savingsAccount": 5000
                },
                {
                  "day": 3,
                  "checkingAccount": -10,
                  "savingsAccount": 5000,
                },
                   ...
            ]
          }

          Instead of using an array, you could have a property named Day1, Day2, Day3, etc and have the checkingAccount and savingsAccount balance nested within that.

          Examples like this are probably the exception, rather than the norm. Like you mentioned, sometimes you absolutely need an array.