Azure Cosmos DB design patterns – Part 5: Document versioning

Jay Gordon

Welcome to part five of our series of blog posts focused on sharing common design patterns you can use to build applications with Azure Cosmos DB for NoSQL. Over the years, customers have asked us for help designing applications for specific scenarios. In some cases, these requests centered on implementing particular patterns in a JSON-based NoSQL database. Some of these patterns are very common in the NoSQL world but not well understood by people new to NoSQL databases. Others are specific to Azure Cosmos DB and show how to use its capabilities to solve difficult architectural challenges.

Azure Samples / cosmos-db-design-patterns

We have been capturing these patterns and sharing them with customers individually. We felt now was a good time to publish some of them more broadly to make them easier for users to discover. The result is Azure Cosmos DB Design Patterns, a GitHub repository with a wide variety of samples that show how to implement specific patterns to solve design-related challenges when using Azure Cosmos DB in your solutions.

To help share these, we are publishing a series of blog posts, each focused on one design pattern and its corresponding sample application in this repository. We hope you enjoy and find this series useful.

Here is a list of the previous posts in this series:

Azure Cosmos DB design pattern: Document versioning

This post will focus on the document versioning design pattern. Document versioning in NoSQL databases is a design pattern that enables the tracking and management of changes to documents over time. This method is particularly useful in environments where data evolves frequently and where maintaining a historical record of these changes is essential. In a typical setup, each document is assigned a version number, which is updated as changes are made. The most recent version of the document is kept in a primary collection, while previous versions are stored in a separate historical collection. This approach ensures easy access to the latest data while preserving a complete history of document revisions.

The key advantage of document versioning is its ability to provide a comprehensive audit trail for data changes, which is critical in scenarios where understanding how data evolved over time is important. However, this pattern also adds complexity to data management. It increases the volume of writes, because each change is saved to the current collection and also stored as a new version in the historical collection, and storage grows with every revision. Accessing historical data may also require querying a separate collection, adding an extra layer of complexity to database interactions. Despite these challenges, document versioning remains a valuable tool for managing evolving data in NoSQL environments.
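To make the audit-trail benefit concrete, here is a minimal Python sketch that compares consecutive versions of one document and reports which top-level fields changed. It assumes each stored version carries an integer `version` field; that field and the `diff_versions` helper are hypothetical and are not taken from the sample repository.

```python
from typing import Any


def diff_versions(versions: list[dict[str, Any]]) -> list[tuple[int, str, Any, Any]]:
    """Return (version, field, old_value, new_value) for every top-level field
    that differs between consecutive versions of the same document.
    A field missing from one version is reported as None."""
    ordered = sorted(versions, key=lambda doc: doc["version"])
    changes: list[tuple[int, str, Any, Any]] = []
    for previous, current in zip(ordered, ordered[1:]):
        # Compare every top-level field except the (assumed) version counter.
        fields = (previous.keys() | current.keys()) - {"version"}
        for field in sorted(fields):
            old, new = previous.get(field), current.get(field)
            if old != new:
                changes.append((current["version"], field, old, new))
    return changes


# Two states of order 1101 from this post, tagged with the assumed version field.
history = [
    {"orderId": 1101, "version": 2, "status": "Cancelled"},
    {"orderId": 1101, "version": 1, "status": "Submitted"},
]
for version, field, old, new in diff_versions(history):
    print(f"v{version}: {field} changed from {old!r} to {new!r}")
# prints: v2: status changed from 'Submitted' to 'Cancelled'
```

Sorting by version first means the history can be read back from the historical collection in any order.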

The Scenario:

In many industries, regulatory guidelines require organizations to retain and track historical versions of documents. This is especially relevant in sectors where data retention for auditing and document control is critical. A common way to meet this requirement is a two-part storage strategy: the current version of each document lives in a dedicated collection, named to reflect that it holds present-day records, while a separate collection archives historical versions. This division keeps queries efficient, because reads of current documents never have to filter out historical records. Note that versioning is typically managed by the application layer; it is not a built-in function of the database system, such as Azure Cosmos DB.

This approach shows how a NoSQL database such as Azure Cosmos DB can be tailored to meet regulatory compliance and data management needs across industries. Separating current and historical data helps satisfy retention requirements while keeping reads of current data fast, and both current and historical information remain accessible when needed.

Sample Implementation:

The sample implementation models an eCommerce scenario in Azure Cosmos DB for NoSQL, where order documents change over their lifecycle. Here is how it is structured:

Original Order Document

When an order is first placed, its document looks like this:

{
    "customerId": 10,
    "orderId": 1101,
    "status": "Submitted",
    "orderDetails": [
        {
            "productName": "Product 1",
            "quantity": 1
        },
        {
            "productName": "Product 2",
            "quantity": 3
        }
    ]
}

This document is stored in the CurrentOrderStatus container, representing the latest status of the order.

Updated Order Document (After Cancellation)

Suppose the customer decides to cancel the order. The document is updated to reflect this change:

{
    "customerId": 10,
    "orderId": 1101,
    "status": "Cancelled",
    "orderDetails": [
        {
            "productName": "Product 1",
            "quantity": 1
        },
        {
            "productName": "Product 2",
            "quantity": 3
        }
    ]
}

The updated document, showing the order’s cancellation, is then saved back to the CurrentOrderStatus container.
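The application-side half of this flow might look like the following Python sketch. A plain dict stands in for the CurrentOrderStatus container, and a hypothetical `version` field (not present in the JSON above) is incremented on every write so that each state of the order can be told apart later. The `save_new_order` and `update_order` names are illustrative; a real application would write through an Azure Cosmos DB SDK rather than a dict.

```python
import copy
from typing import Any

# Stand-in for the CurrentOrderStatus container: one document per orderId.
current_order_status: dict[int, dict[str, Any]] = {}


def save_new_order(order: dict[str, Any]) -> dict[str, Any]:
    """Store a brand-new order as version 1 (version field is an assumption)."""
    doc = {**copy.deepcopy(order), "version": 1}
    current_order_status[doc["orderId"]] = doc
    return doc


def update_order(order_id: int, **changes: Any) -> dict[str, Any]:
    """Apply changes to the latest state and overwrite it as version + 1."""
    latest = current_order_status[order_id]
    updated = {**copy.deepcopy(latest), **changes, "version": latest["version"] + 1}
    current_order_status[order_id] = updated  # the previous state is replaced here
    return updated


save_new_order({
    "customerId": 10,
    "orderId": 1101,
    "status": "Submitted",
    "orderDetails": [
        {"productName": "Product 1", "quantity": 1},
        {"productName": "Product 2", "quantity": 3},
    ],
})
cancelled = update_order(1101, status="Cancelled")
print(cancelled["status"], cancelled["version"])  # Cancelled 2
```

Only the latest state survives in the current container; capturing the earlier states is the job of the change feed step.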

Implementing Document Versioning

In this implementation, document versioning is handled at the application layer rather than by the database. Each time an order document is modified, a new version is created, and the following happens:

  • The updated document (with the latest status) is saved in the CurrentOrderStatus container.
  • A Function App listens for changes to that container through the Azure Cosmos DB change feed.
  • For each changed document it receives, the Function App writes a copy to the HistoricalOrderStatus container, preserving a record of the order’s states for historical tracking.
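The Python sketch below approximates what the Function App’s change feed handler does, under stated assumptions: a dict stands in for the HistoricalOrderStatus container, `handle_changes` is a hypothetical name for the function that receives a batch of changed documents, and each history item gets an id built from `orderId` and a hypothetical `version` field. Azure Cosmos DB requires an item’s `id` to be unique within its logical partition, so reusing the order’s own id would make each new version overwrite the previous one in the history container; a version-qualified id keeps one item per version and makes a redelivered batch harmless. `LatestVersionFeed` is a toy model, not an SDK type, of how latest-version change feed reads can skip intermediate states.

```python
import copy
from typing import Any

# Stand-in for the HistoricalOrderStatus container: one item per order version.
historical_order_status: dict[str, dict[str, Any]] = {}


def handle_changes(changed_docs: list[dict[str, Any]]) -> None:
    """Copy each changed order document into the history store."""
    for doc in changed_docs:
        history_id = f"{doc['orderId']}-v{doc['version']}"
        # Upserting by a version-qualified id makes redelivery idempotent.
        historical_order_status[history_id] = {**copy.deepcopy(doc), "id": history_id}


class LatestVersionFeed:
    """Toy model of latest-version change feed: between two reads, only the
    newest change per order is kept, so intermediate versions can be skipped."""

    def __init__(self) -> None:
        self._pending: dict[int, dict[str, Any]] = {}

    def record(self, doc: dict[str, Any]) -> None:
        self._pending[doc["orderId"]] = copy.deepcopy(doc)

    def read(self) -> list[dict[str, Any]]:
        batch, self._pending = list(self._pending.values()), {}
        return batch


feed = LatestVersionFeed()
feed.record({"orderId": 1101, "version": 1, "status": "Submitted"})
handle_changes(feed.read())  # first read sees version 1
feed.record({"orderId": 1101, "version": 2, "status": "Processing"})  # illustrative status
feed.record({"orderId": 1101, "version": 3, "status": "Cancelled"})
handle_changes(feed.read())  # second read sees only version 3
print(sorted(historical_order_status))  # ['1101-v1', '1101-v3']
```

Because every copy carries its version number, a gap such as the missing v2 is at least detectable when the history is read back.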

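This process maintains the current state of each order while building a history of changes, which supports data retention policies and provides valuable insights for auditing and data analysis. One caveat applies: in its default latest-version mode, the change feed includes only the most recent change for each item, so if an order is updated more than once between reads, intermediate versions can be missed. When every version must be captured, consider the change feed’s all versions and deletes mode, which was in preview at the time of writing.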

Why it Matters:

Using document versioning in database systems, particularly in NoSQL environments, is valuable for several reasons:

  • Regulatory Compliance and Auditing: Many industries are governed by strict regulations that require the retention and tracking of historical data for compliance purposes. Document versioning enables organizations to maintain an audit trail of changes, thereby complying with legal requirements and facilitating audits.
  • Data Integrity and Recovery: With document versioning, it is easier to track changes and revert to previous versions in case of errors or data corruption. This capability is vital for maintaining data integrity and ensuring reliable data recovery mechanisms.
  • Change Management and Collaboration: In scenarios where multiple users or systems might update documents concurrently, versioning helps manage these changes. Paired with optimistic concurrency, a version number lets the application detect conflicting writes, and a history that records when each change was made (and by whom, if the application stores that) makes conflicts easier to investigate and resolve.
  • Historical Analysis and Reporting: Keeping historical versions of documents allows for detailed analysis and reporting. Organizations can track the evolution of data over time, gaining insights into trends, patterns, and operational efficiency.
  • System Performance Optimization: By segregating current and historical data into different collections or containers, document versioning can improve query performance. Systems can access current data more swiftly without sifting through a vast history of changes.
  • Bespoke Business Logic Implementation: Document versioning allows businesses to implement custom logic based on historical data changes. This might include triggering specific actions when data reaches certain states or maintaining custom logs for business analysis.
  • Enhanced User Experience: For applications that rely on historical data (like version control systems or content management systems), document versioning is essential for providing a rich user experience, allowing users to view, compare, and revert changes as needed.
  • Scalability and Future-proofing: As businesses grow and evolve, their data management needs become more complex. Document versioning offers a scalable way to manage data changes over time, helping the system remain robust and adaptable as requirements change.
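Two of the benefits listed above, reverting to an earlier version and coordinating concurrent writers, rely on more than the history alone. The Python sketch below pairs a version history with an optimistic-concurrency check: a write succeeds only if the writer saw the latest version. In Azure Cosmos DB itself this check is normally done with the item’s `_etag` and an If-Match condition; comparing a hypothetical `version` field here is an in-memory stand-in, and `VersionConflict`, `conditional_update`, and `revert_to` are illustrative names, not SDK APIs. For brevity the sketch writes history inline instead of through the change feed.

```python
import copy
from typing import Any


class VersionConflict(Exception):
    """Raised when a writer's expected version is no longer the latest one."""


# In-memory stand-ins for the current and historical containers.
current: dict[int, dict[str, Any]] = {}
history: dict[tuple[int, int], dict[str, Any]] = {}


def _commit(order_id: int, expected_version: int, body: dict[str, Any]) -> dict[str, Any]:
    """Write body as the next version, but only if nobody else wrote first."""
    found = current[order_id]["version"]
    if found != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{found}")
    new_doc = {**copy.deepcopy(body), "version": expected_version + 1}
    current[order_id] = new_doc
    history[(order_id, new_doc["version"])] = copy.deepcopy(new_doc)
    return new_doc


def conditional_update(order_id: int, expected_version: int, **changes: Any) -> dict[str, Any]:
    """Apply field changes if the latest version is still the one the caller read."""
    return _commit(order_id, expected_version, {**current[order_id], **changes})


def revert_to(order_id: int, target_version: int) -> dict[str, Any]:
    """Restore an old state by writing it as a new version; history is never rewritten."""
    snapshot = history[(order_id, target_version)]
    return _commit(order_id, current[order_id]["version"], snapshot)


first = {"orderId": 1101, "status": "Submitted", "version": 1}
current[1101] = copy.deepcopy(first)
history[(1101, 1)] = copy.deepcopy(first)

conditional_update(1101, expected_version=1, status="Cancelled")  # becomes v2
try:
    conditional_update(1101, expected_version=1, status="Shipped")  # stale writer
except VersionConflict as err:
    print("rejected:", err)  # rejected: expected v1, found v2
restored = revert_to(1101, 1)
print(restored["status"], restored["version"])  # Submitted 3
```

Reverting by writing a new version, rather than deleting later ones, keeps the audit trail intact: the cancellation at v2 remains on record even after the order is restored.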

In summary, document versioning is a valuable pattern in modern data management, offering significant benefits for compliance, data integrity, collaboration, and operational efficiency. Implementing it well can be a key factor in an organization’s ability to manage data effectively in a dynamic and evolving business environment.

Getting Started with Azure Cosmos DB Design Patterns

You can review the sample code by visiting the Document versioning pattern on GitHub. You can also try it yourself by cloning or forking the Azure Cosmos DB Design Patterns GitHub repo, then running it locally or in GitHub Codespaces. If you are new to Azure Cosmos DB, we have you covered with a free Azure Cosmos DB account for 30 days, no credit card required. If you want more time, you can extend the free period. You can also upgrade your account.

Sign up for your free Azure Cosmos DB account at aka.ms/trycosmosdb.

Explore this and the other design patterns and see how Azure Cosmos DB can enhance your application development and data modeling efforts. Whether you are an experienced developer or just getting started, the free trial allows you to discover the benefits firsthand.

To get started with Azure Cosmos DB Design Patterns, follow these steps:

  1. Visit the GitHub repository and explore the various design patterns and best practices provided.
  2. Clone or download the repository to access the sample code and documentation.
  3. Review the README files and documentation for each design pattern to understand when and how to apply them to your Azure Cosmos DB projects.
  4. Experiment with the sample code and adapt it to your specific use cases.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on Twitter, YouTube, and LinkedIn.

Try Azure Cosmos DB free with Azure AI Advantage

Sign up for the Azure AI Advantage! The Azure AI Advantage offer is for existing Azure AI and GitHub Copilot customers who want to use Azure Cosmos DB as part of their solution stack. With this offer, you get 40,000 free RUs, the equivalent of up to $6,000 in savings.

9 comments


  • Razvan Goga

    Hi!

    has the CosmosDB emulator (specifically the linux docker version) been deprecated? It’s unusable in Azure Devops pipelines since AzDevops deprecated the Ubuntu 18.04 images (more than 1 year ago) and the github repo looks like a ghost town

    • Paul Irwin

      I also am saddened by the lack of activity around the Cosmos DB emulator. It is a must to be able to develop applications offline without everyone having to have their own cloud Cosmos DB instance or stepping on your team members’ toes by sharing a database. Our team members that have Apple Silicon Macs have had to resort to installing a specific outdated version of the emulator in a Windows VM just to be able to use it, since the Linux Docker image still does not work on arm64 macOS even given Docker’s Rosetta emulation, over three years since the M1 came out. Combined with the failed .NET SDK v4 release, I hope the team will put efforts into improving this developer experience which feels a bit neglected.

        • Razvan Goga

          what about the problems on Ubuntu 20 and 22 (aka the only supported linux versions on AZDevops hosted agents)?

          • Mark Brown (Microsoft employee)

            I am looking into that. I or someone on our team will respond here with an update. Thanks.

          • Sajeetharan Sinnathurai (Microsoft employee)

            Hi Razvan, have you tried the latest version? both should work.

          • Razvan Goga

            Hi,

            which latest version would that be? when was it released? where can i find the release notes?

            i tried to run the docker emulator in an AzDevops pipeline at the end of November / early December 2023 and it did not work (same symptoms as in all the open github issues about this).

            https://github.com/Azure/azure-cosmos-db-emulator-docker/issues/45#issuecomment-1839417066

            I’ll give it another try, but if there is truly a new version that fixes the ubuntu behaviour, it would be very considerate from your side to update everyone on the countless open github issues referencing this problem – folks are waiting for a fix since the deprecation of the Ubuntu 18.

  • Jeroen Vrijkorte

    As per my understanding, the proposed solution is not reliable. The default change feed mode of CosmosDB has the following feature (quoted from docs):
    “Only the most recent change for a specific item is included in the change feed. Intermediate changes might not be available.”

    I think this implies that there are scenarios where changes to a document could be missed, for example if the change feed is processed in a polling fashion, and there are multiple updates to the same document within one change feed polling period. Depending on the reason for implementing document versioning, this may or may not be acceptable. I think it would be appropriate to mention such limitations in this article, as well as to suggest alternative approaches with strong guarantees about capturing all changes to a document.

    • Mark Brown (Microsoft employee)

      Thanks Jeroen for your thoughts.

      Yes, it is a possibility as change feed is not a true op-log.

      This can be isolated somewhat by leveraging optimistic concurrency in our SDKs when there are concurrent writes as the time to re-read, merge and commit may span polling periods. But to completely eliminate it, every mutation must be committed. That capability is in preview now and you can learn more about it here, https://learn.microsoft.com/azure/cosmos-db/change-feed#all-versions-and-deletes-mode-preview.
