Azure Cosmos DB design patterns – Part 2: Data Binning

Jay Gordon

Over the years, customers have asked us for help in designing applications around specific scenarios they were trying to achieve. In some cases, these centered around implementing certain patterns using a JSON-based NoSQL database. Some of these patterns are very common in the NoSQL world, but not well understood by those new to NoSQL databases. Other patterns were very specific to the Cosmos DB service itself in demonstrating how to leverage specific capabilities to solve difficult architectural challenges.

Azure Samples / cosmos-db-design-patterns

We’ve been capturing these patterns and sharing them with customers individually. We felt now was a good time to publish some of them more broadly to make them more discoverable. The result is Azure Cosmos DB Design Patterns, a repository on GitHub with a wide variety of samples that show how to implement specific patterns and solve design-related challenges when using Azure Cosmos DB in your solutions.

To help share these, we’ve created a blog post series on each of them. Each post will focus on a specific design pattern with a corresponding sample application that’s featured in this repository. We hope you enjoy and find this series useful.

Azure Cosmos DB design pattern: Data Binning

This post will focus on the data binning pattern. Data binning, also known as windowing or bucketing, is a technique that allows you to summarize or aggregate data into specific time intervals, or “bins.” It’s like putting data into neat little boxes, which can help you optimize costs, reduce storage requirements, and supercharge query performance.

The Scenario:

In the fast-paced world of data-driven decision-making, handling high-velocity data streams efficiently is a challenge many organizations face. Imagine you’re managing a hotel chain, and you’ve installed IoT devices in every room to monitor temperatures. These devices are sending an event to Azure IoT Hub every 5 seconds, resulting in a staggering 12,000 records per minute for a thousand rooms. But here’s the kicker: your Azure Cosmos DB-based application only needs to display results once per minute. That’s where the data binning pattern comes to the rescue.

In this scenario, we’re dealing with sensor data from IoT devices. The devices are diligently sending their temperature readings every 5 seconds. While this high-frequency data is crucial for monitoring, the Azure Cosmos DB-based application only needs minute-level summaries. Here’s how data binning is our hero.
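The numbers in this scenario are easy to check. Here is a minimal Python sketch of the arithmetic; the function name is hypothetical and is not taken from the sample repository.

```python
SECONDS_PER_MINUTE = 60


def records_per_minute(devices: int, interval_seconds: int) -> int:
    """Raw events per minute when every device reports on a fixed interval."""
    return devices * (SECONDS_PER_MINUTE // interval_seconds)


# A thousand rooms, one reading every 5 seconds.
raw = records_per_minute(devices=1000, interval_seconds=5)

# With 1-minute bins, each device contributes a single summary per minute.
binned = 1000

print(raw)            # 12000
print(raw // binned)  # 12 raw events collapse into each summary record
```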

Sample Implementation:

The sample implementation showcases how data binning can be implemented to optimize cost, reduce storage needs, and enhance query performance using Azure Cosmos DB.

  1. Simulating Sensor Events: The code simulates the continuous stream of sensor events. These events carry device IDs, timestamps, temperature readings, and units of measurement.
  2. Data Binning in Action: After simulating the sensor events, the magic happens. The application applies data binning, aggregating the data into 1-minute windows. This aggregation isn’t just about basic summarization; it calculates the average temperature, minimum temperature, maximum temperature, and the number of readings within each minute.
  3. Data Storage: With the data now neatly summarized, it’s time to store it in Azure Cosmos DB. Instead of storing every 5-second reading separately, you now have just one record per device per minute. This results in a significant reduction in the number of records, which is cost-effective and storage-friendly.
  4. Transformation Example: Here’s a sneak peek at how an event transforms from its original form (every 5 seconds) into the summarized form (1-minute window) before being written to Azure Cosmos DB:
Original Event:
{
  "deviceId": 1,
  "eventTimestamp": "12/30/2022 10:53:05 PM",
  "temperature": 71.3,
  "unit": "Fahrenheit",
  "receivedTimestamp": "12/30/2022 10:53:05.128 PM"
}

Summarized Event:
{
  "deviceId": 1,
  "eventTimestamp": "12/30/2022 10:53:00 PM",
  "avgTemperature": 71.2,
  "minTemperature": 71.1,
  "maxTemperature": 71.3,
  "numberOfReadings": 12,
  "readings": [
    {
      "eventTimestamp": "12/30/2022 10:53:05 PM",
      "temperature": 71.1
    },
    {
      "eventTimestamp": "12/30/2022 10:53:10 PM",
      "temperature": 71.1
    },
    // and so on...
  ],
  "receivedTimestamp": "12/30/2022 10:54:00 PM"
}
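The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's implementation: the helper names (`window_start`, `bin_events`) are hypothetical, the simulated temperatures are made up, and timestamps are written in ISO 8601 rather than the format shown in the JSON above. The field names follow the summarized event.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its 1-minute window."""
    return ts.replace(second=0, microsecond=0)


def bin_events(events):
    """Group events by (device, minute) and summarize each group."""
    buckets = defaultdict(list)
    for event in events:
        key = (event["deviceId"], window_start(event["eventTimestamp"]))
        buckets[key].append(event)

    summaries = []
    for (device_id, start), group in sorted(buckets.items()):
        temps = [e["temperature"] for e in group]
        summaries.append({
            "deviceId": device_id,
            "eventTimestamp": start.isoformat(),
            "avgTemperature": round(sum(temps) / len(temps), 1),
            "minTemperature": min(temps),
            "maxTemperature": max(temps),
            "numberOfReadings": len(temps),
            "readings": [
                {"eventTimestamp": e["eventTimestamp"].isoformat(),
                 "temperature": e["temperature"]}
                for e in group
            ],
        })
    return summaries


# Simulate two minutes of readings from one device, one every 5 seconds.
# The temperature values are invented for the example.
first = datetime(2022, 12, 30, 22, 53, 0)
events = [
    {"deviceId": 1,
     "eventTimestamp": first + timedelta(seconds=5 * i),
     "temperature": 70.0 + (i % 3),
     "unit": "Fahrenheit"}
    for i in range(24)
]

summaries = bin_events(events)
print(len(events), "raw events ->", len(summaries), "summary documents")
```

Each dictionary in `summaries` is one document per device per minute and is what would be written to Azure Cosmos DB in place of the individual readings.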

Why It Matters:

  • Cost Optimization: Data binning helps you optimize costs by slashing the number of records stored in Azure Cosmos DB. This is vital when dealing with high-velocity data streams, where storing every data point individually could be a costly affair.
  • Query Performance Boost: By aggregating data into predefined time windows, you make querying and analyzing the data faster and more cost-effective. Minute-level queries become a breeze compared to sifting through individual data points.
  • Storage Efficiency: Storing summarized data requires less storage space, making it a win-win situation for your wallet and your database.

Versatile Applications:

While this example revolves around IoT sensor data, the data binning pattern can be applied in various contexts. For instance, consider a social media platform where you want to analyze user engagement based on the number of likes received by posts. Instead of diving into individual like events, you could use data binning to group posts into different like ranges, making analysis more efficient.
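Binning by value range works the same way as binning by time window; only the key changes. A minimal Python sketch, assuming hypothetical bin edges, labels, and post data that you would replace with your own:

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical like ranges; pick edges that suit your own data.
BIN_EDGES = [10, 100, 1000]
BIN_LABELS = ["0-9", "10-99", "100-999", "1000+"]


def like_bin(likes: int) -> str:
    """Return the label of the like range that a post falls into."""
    return BIN_LABELS[bisect_right(BIN_EDGES, likes)]


# Made-up like counts per post.
posts = {"post-a": 3, "post-b": 42, "post-c": 100, "post-d": 5000, "post-e": 7}

# Count posts per range instead of inspecting every like event.
histogram = Counter(like_bin(count) for count in posts.values())
print(dict(histogram))
```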

In a world where data never sleeps, the data binning pattern is your secret weapon. It keeps your costs in check, your queries lightning-fast, and your storage needs minimal. Thanks to Azure Cosmos DB’s flexibility and scalability, you can implement this pattern seamlessly and make the most out of your high-velocity data.

Getting Started with Azure Cosmos DB Design Patterns

You can take a look at the sample code by visiting the data binning pattern on GitHub. You can also try it out for yourself by visiting the Azure Cosmos DB Design Patterns GitHub repo and cloning or forking it, then running it locally or from GitHub Codespaces. If you are new to Azure Cosmos DB, we’ve got you covered with a free Azure Cosmos DB account for 30 days, no credit card required. If you want more time, you can extend the free period, and you can upgrade as well.

Sign up for your free Azure Cosmos DB account at aka.ms/trycosmosdb.

Explore this and the other design patterns and see how Azure Cosmos DB can enhance your application development and data modeling efforts. Whether you’re an experienced developer or just getting started, the free trial allows you to discover the benefits firsthand.

To get started with Azure Cosmos DB Design Patterns, follow these steps:

  1. Visit the GitHub repository and explore the various design patterns and best practices provided.
  2. Clone or download the repository to access the sample code and documentation.
  3. Review the README files and documentation for each design pattern to understand when and how to apply them to your Azure Cosmos DB projects.
  4. Experiment with the sample code and adapt it to your specific use cases.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on Twitter, YouTube, and LinkedIn.

2 comments

  • John King

    I couldn’t understad what Cosmos db are, because I don’t use Azure, I’ll choose mongodb if I need no sql database because It work the same on locall and server and even k8s. why there no way to download it If cosmos db is a database software (or a docker image)? the only download option is for development locally on Window, So that means to me that cosmos db is an “Azure only” software, then it’s a big “No” for me and many other company.

    • Jay Gordon (Microsoft employee)

      John,

      Thanks for your feedback. Azure Cosmos DB is a platform as a service database which is indeed meant to be run on Azure. There are emulators to run locally that include a MongoDB API option along with the NoSQL API described in this post. If you have more questions, feel free to ask.
