Getting started with Azure Cosmos DB – end to end example

Cyrille

This is the first of two articles in which we will walk you through the steps for realizing your first Azure Cosmos DB implementation. In this post, we cover:

  • Relational or NoSQL
  • Identify access patterns
  • Configure Azure Cosmos DB
  • Using the Client SDK

Part 2 looks at performance tuning and monitoring.

Relational or NoSQL

For our scenario, we need to store data from sports events (e.g., marathon, triathlon, cycling, etc.). Users should be able to select an event and view a leaderboard. The amount of data that will be stored is estimated at 100 GB.

The schema of the data is different for various events and likely to change. As a result, this requires the database to be schema-agnostic and therefore we decided to use Azure Cosmos DB as our database. For more information, see Understanding the differences between Azure Cosmos DB NoSQL and relational databases | Microsoft Docs.

Identify access patterns

To design an efficient data model it is important to understand how the client application will interact with Azure Cosmos DB. The most important questions are:

  1. Is the access pattern more read-heavy or write-heavy?
  2. What are the main queries?
  3. What is the expected document size?

If the access pattern is read-heavy you want to choose a partition key that appears frequently as a filter in your queries. Queries can be efficiently routed to only the relevant physical partitions by including the partition key in the filter predicate.

When the access pattern is write-heavy you might want to choose item ID as the partition key. Item ID does a great job with evenly balancing partitioned throughput (RUs) and data storage since it’s a unique value. For more information, see Partitioning and horizontal scaling in Azure Cosmos DB | Microsoft Docs.

Finally, we need to understand the document size. 1 kb documents are very efficient in Azure Cosmos DB. To understand the impact of large documents on RU utilization see the capacity calculator and change the item size to a larger value. As a starting point you should start with only one container and embed all values of an entity in a single JSON document. This provides the best reading performance. However, if your document size is unpredictable and can grow to hundreds of kilobytes you might want to split these in different documents within the same container. For more information, see Modeling data in Azure Cosmos DB – Azure Cosmos DB | Microsoft Docs.

Let’s apply the considerations mentioned above to our scenario:

Our scenario

For our scenario, we expect far more reads than writes. Therefore we will optimize our Azure Cosmos DB instance for read access. The main queries for our scenario are:

Query 1: View top ranked participants for a selected event
SELECT * FROM c 
WHERE c.Eventname = '<eventname>'
AND c.Eventdate = '<eventdate'
ORDER BY c.TotalScore asc
Query 2: View all events for a selected year a person has participated in
SELECT c.Eventname FROM c 
WHERE c.Eventdate > '<startdate>' AND c.Eventdate < '<enddate>'
AND c.ParticipantId = <id>
Query 3: View all registered participants per event
SELECT c.ParticipantFirstname, c.ParticipantLastname, c.ParticipantId  FROM c 
WHERE c.Eventname = '<eventname>'
Query 4: View total score for a single participant per event
SELECT c.ParticipantFirstname, c.ParticipantLastname, c.TotalScore FROM c 
WHERE c.ParticipantId = <id>
AND c.Eventname = '<eventname>'

Except for query 2, all these queries are using Eventname as a filter. The amount of data stored per event is expected to be 1 GB which is well below the 20 GB limit of a logical partition. Therefore Eventname is a great fit as partition key. The document size for our sports events is small; less than 1 kb. Below is an example of such a document. Depending on the event the columns might change.

"id": "d5c137ae-5954-4eb2-9a7b-59ad3e65236d",
"Eventname": "Triathlon Amsterdam",
"Eventdate": "2021-03-04",
"ParticipantId": 4713,
"ParticipantLastname": "Visser",
"ParticipantFirstname": "Cyrille",
"Category": "Olympic",
"SwimmingScore": "03:17:16",
"CyclingScore": "03:25:48",
"RunningScore": "03:04:23",
"TotalScore": "09:47:27"

Configure Azure Cosmos DB

Now that we have a good understanding of the data model, we can create our Azure Cosmos DB account. For more information, see Create an Azure Cosmos account, database, and container from the Azure portal. Once the account is created a new database can be created using the Data Explorer:

Application Description automatically generated with medium confidence

We will not provision throughput on the database level. Provisioning throughput on the database level is useful when you need multiple containers which have a different utilization pattern. Since we only use a single container and we want predictable throughput for our container, we will provision throughput on the container level. For more information, see Provision throughput on Azure Cosmos containers and databases

Once the database is created a new container can be created:

Graphical user interface, text, application, email Description automatically generated

At the container level the partition key is specified, which in our case is /Eventname. We also provision throughput manually and select 1000 RU as a starting point. You can always change the amount of RU provisioned, or switch to autoscale at any moment in time. This doesn’t cause any downtime. For more information, see How to choose between manual and autoscale on Azure Cosmos DB.

Once the container is created, we can add a few documents by selecting items and clicking new item.

Graphical user interface, application Description automatically generated

Now we have added a few documents, we click the Query button and test one of our main queries. When clicking on Query Stats we see useful information about the RU consumption and the efficiency of our query. When writing queries for Azure Cosmos DB always validate how efficient your queries are.  For more information, see Troubleshoot query issues when using Azure Cosmos DB

Graphical user interface, text, application Description automatically generated

Using the Client SDK

Now our data model is defined and our Azure Cosmos DB instance is configured we will create the first version of our application using the Client SDK. The Client SDK is available for multiple languages, but for our scenario we will choose the latest stable .NET client library. For more information, see Quickstart – Build a .NET console app to manage Azure Cosmos DB SQL API resources

It is highly recommended to use a singleton Azure Cosmos DB client for the lifetime of your application. The reason for this is that initiating the client object is an expensive operation. Once initialized the client object addresses connection management and caching. For more information, see Azure Cosmos DB performance tips for .NET SDK v3

Writing data

Adding an item using the Client SDK is straightforward:

ItemResponse<dynamic> response = await _container.CreateItemAsync<dynamic>(item, new PartitionKey(partitionKeyValue));

The ItemResponse object has a property called RequestCharge. This property displays the RU’s being used to perform the insert operation. It is recommended to log this value for troubleshooting purposes. For more information, see Find request unit (RU) charge for a SQL query in Azure Cosmos DB

In our case the Insert operation consumes 9 RU.

Selecting data

To prepare and parameterize a query we will use the QueryDefinition object. For creating a query we can use the GetQueryItemIterator method:

            List<dynamic> list = new List<dynamic>();

            QueryDefinition query = new QueryDefinition("SELECT c.TotalScore FROM c WHERE c.Eventname = @Eventname AND c.Eventdate = @Eventdate ORDER BY c.TotalScore ASC")
                .WithParameter("@Eventname", eventName)
                .WithParameter("@Eventdate", eventDate);

            using (FeedIterator<dynamic> resultset = _container.GetItemQueryIterator<dynamic>(query))
            {
                while (resultset.HasMoreResults)
                {
                    FeedResponse<dynamic> response = await resultset.ReadNextAsync();
                    Console.WriteLine("Q1 took {0} ms. RU consumed: {1}, Number of items : {2}", response.Diagnostics.GetClientElapsedTime().TotalMilliseconds, response.RequestCharge, response.Count);

                    foreach (var item in response)
                    {
                        list.Add(item);
                    }
                }
            }

The GetQueryItemIterator returns a FeedIterator object. With this object, we can loop through the items. You will notice that the SDK applies query pagination. By default, a maximum of 1000 items are returned at once. This can be changed by adjusting the MaxItemCount of the QueryRequestOptions object. Using the method ReadNextAsync() method we can iterate through the entire result set. It is important to understand each iteration consumes RUs.

The table below shows the RUs consumed for each query.

Query # Iterations Items returned RU consumption
1 6 5001 208
2 1 10 3
3 6 5002 191
4 1 1 3

 

In part 2, we will focus on optimizing our queries.

 

Get started

3 comments

Leave a comment