Optimizing costs with the Azure Cosmos DB integrated cache

Tim Sander

Caching can be a great way to lower costs and reduce latency for read-heavy workloads. However, adding a cache also traditionally involves managing a resource separate from your database, including manually populating the cache and managing cache invalidation. In this blog, we’ll dive into the details of how to reap all the benefits without the traditional pain points of caching with the Azure Cosmos DB integrated cache!

With the integrated cache, you can add caching to an existing Azure Cosmos DB workload without modifying your application’s logic. The integrated cache helps read-heavy workloads further reduce costs and latency for repeated point reads and queries.

The integrated cache was battle-tested during the private preview last year and we were excited to announce the public preview at Build in May of this year. We are grateful for all the awesome customer feedback and feature requests in the private preview that helped shape our roadmap. A special thank you to the teams at ASOS for their feedback!

“Cosmos integrated cache is a potential gamechanger for the promotions platform at ASOS. We’ve significantly reduced our RU consumption and the overhead of managing cached data, while keeping well within our SLAs for our consuming teams.” – Mhamun Hussain, Software Engineer at ASOS

If you try out the public preview and have feedback, please send us an email at: cosmoscachefeedback@microsoft.com.

Getting started:

To use the integrated cache, you must first provision a dedicated gateway. A dedicated gateway provides server-side compute resources that act as a front-end to your Azure Cosmos DB account. When your account has a dedicated gateway provisioned, the gateway routes requests and caches data. Like provisioned throughput, the dedicated gateway is billed hourly. Unlike provisioned throughput, it is billed as a set compute capacity rather than based on the number of requests.

[Figure: Provisioning a dedicated gateway cluster in the Azure portal]


In most cases, the deciding factor for dedicated gateway size is the amount of data that you need to cache. Provisioning multiple dedicated gateway nodes, on the other hand, is helpful for improving availability or handling additional request volume.

You can select up to five nodes of the D4s, D8s, or D16s dedicated gateway sizes. Here’s a summary of the available SKUs:

SKU Name    vCPU    Memory
D4s         4       16 GB
D8s         8       32 GB
D16s        16      64 GB
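As a rough illustration of sizing by the amount of data to cache, here is a minimal sketch that picks the smallest SKU from the table above. The function name and the `usable_fraction` planning factor are hypothetical and not part of the service; the fraction of node memory that is actually available for cached data is an assumption you should validate for your own workload.

```python
from typing import Optional

# SKU name -> memory in GB, taken from the table above.
SKU_MEMORY_GB = {"D4s": 16, "D8s": 32, "D16s": 64}


def smallest_sku_for(cache_gb: float, usable_fraction: float = 0.5) -> Optional[str]:
    """Return the smallest SKU whose usable memory can hold cache_gb, or None.

    usable_fraction is a hypothetical planning factor (not from this post):
    it reserves part of each node's memory for everything except cached data.
    """
    if cache_gb < 0 or not 0 < usable_fraction <= 1:
        raise ValueError("cache_gb must be >= 0 and usable_fraction in (0, 1]")
    # Walk the SKUs from smallest to largest memory and stop at the first fit.
    for name, memory_gb in sorted(SKU_MEMORY_GB.items(), key=lambda kv: kv[1]):
        if cache_gb <= memory_gb * usable_fraction:
            return name
    return None
```

Node count is deliberately left out of the calculation: as described above, additional nodes help with availability and request volume rather than with how much data fits in the cache.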


When you provision a dedicated gateway cluster, an instance of the integrated cache is automatically provisioned on each dedicated gateway node. When your application connects to Azure Cosmos DB through the dedicated gateway and runs a point read or query, the gateway automatically checks the integrated cache first and only executes a read request on the backend partitions if the result has not previously been cached. Point reads and queries that hit the integrated cache will have an RU charge of 0, and cache misses will have the same RU charge that they would have without the integrated cache.
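The hit and miss behavior described above can be pictured with a small toy model. This is only a sketch of the read-through pattern, not the service's implementation; `ToyGateway` and `demo_backend` are hypothetical names, and the RU charges are borrowed from the demo results later in this post.

```python
from typing import Callable, Dict, Tuple


class ToyGateway:
    """Toy stand-in for a dedicated gateway node with an integrated cache."""

    def __init__(self, backend_read: Callable[[str], Tuple[object, float]]):
        # backend_read(key) -> (value, ru_charge); a hypothetical backend call.
        self._backend_read = backend_read
        self._cache: Dict[str, object] = {}

    def read(self, key: str) -> Tuple[object, float]:
        """Return (value, ru_charge) for a point read or query identified by key."""
        if key in self._cache:
            return self._cache[key], 0.0  # cache hit: no RUs consumed
        value, ru_charge = self._backend_read(key)  # cache miss: normal RU charge
        self._cache[key] = value
        return value, ru_charge


def demo_backend(key: str) -> Tuple[object, float]:
    # Charges from the demo in this post: 1 RU per point read,
    # 81.4 RUs for the sample query.
    ru_charge = 81.4 if key.startswith("query:") else 1.0
    return f"result-for-{key}", ru_charge


gateway = ToyGateway(demo_backend)
charges = [gateway.read("query:top-products")[1] for _ in range(100)]
total_ru = sum(charges)  # only the first read reaches the backend
```

In this model, 100 repeated runs of the same query cost the same as running it once, which is the effect the integrated cache aims for on repeated reads.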

[Figure: Connecting to the dedicated gateway with gateway mode]


Modifying your application’s code:

After provisioning a dedicated gateway, it only takes minutes to modify your application’s code to start caching. You can connect to the dedicated gateway and use the integrated cache with the same SDKs that you already use to connect to Azure Cosmos DB.

To start using the integrated cache to reduce read costs, there are just three simple steps:

  1. Use the new connection string for the dedicated gateway
  2. Switch your connection mode to gateway mode
  3. Use eventual consistency (either for the entire account or for the specific requests that you’d like to use the integrated cache)
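The three steps above can be turned into a simple pre-flight check. The sketch below is not SDK code: `ClientSettings` and `integrated_cache_problems` are hypothetical helpers, and the `sqlx.cosmos.azure.com` host suffix used to recognize a dedicated gateway endpoint is an assumption that you should compare against the connection string shown for your own account.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

# Assumption (not stated in this post): dedicated gateway endpoints use the
# "sqlx.cosmos.azure.com" host suffix. Verify against your own connection string.
DEDICATED_GATEWAY_SUFFIX = ".sqlx.cosmos.azure.com"


@dataclass
class ClientSettings:
    """Hypothetical holder for the three settings; not an SDK type."""
    connection_string: str
    connection_mode: str    # e.g. "Gateway" or "Direct"
    consistency_level: str  # e.g. "Eventual", "Session", "Strong"


def account_endpoint(connection_string: str) -> str:
    """Extract the AccountEndpoint value from a 'Key=Value;' connection string."""
    for part in connection_string.split(";"):
        key, _, value = part.partition("=")
        if key.strip().lower() == "accountendpoint":
            return value.strip()
    raise ValueError("connection string has no AccountEndpoint")


def integrated_cache_problems(settings: ClientSettings) -> List[str]:
    """Return which of steps 1-3 the given settings do not satisfy yet."""
    problems = []
    host = urlparse(account_endpoint(settings.connection_string)).hostname or ""
    if not host.endswith(DEDICATED_GATEWAY_SUFFIX):
        problems.append("step 1: not a dedicated gateway connection string")
    if settings.connection_mode.lower() != "gateway":
        problems.append("step 2: connection mode is not gateway")
    if settings.consistency_level.lower() != "eventual":
        problems.append("step 3: consistency is not eventual")
    return problems
```

This check only looks at client-level consistency. If you set eventual consistency per request instead, as step 3 allows, the third check would need to move to wherever you build request options.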

You can also set an optional MaxIntegratedCacheStaleness value, which is the maximum acceptable staleness of cached point reads and queries. The MaxIntegratedCacheStaleness is set at the per-request level and defaults to 5 minutes when unspecified.
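To make the per-request staleness setting more concrete, here is a toy model of how a maximum staleness could decide between a cache hit and a backend read. It is a sketch under stated assumptions rather than the service's implementation: `StalenessAwareCache` is a hypothetical class, the clock is injected so the example is deterministic, and treating an entry whose age exactly equals the limit as still fresh is an arbitrary choice.

```python
from typing import Callable, Dict, Tuple

DEFAULT_MAX_STALENESS_S = 5 * 60  # the 5-minute default mentioned above


class StalenessAwareCache:
    """Toy model of per-request staleness; not the service's implementation."""

    def __init__(self, backend_read: Callable[[str], object], clock: Callable[[], float]):
        self._backend_read = backend_read  # hypothetical backend call
        self._clock = clock                # injectable so the example is testable
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, cached_at)

    def read(self, key: str, max_staleness_s: float = DEFAULT_MAX_STALENESS_S) -> Tuple[object, bool]:
        """Return (value, was_cache_hit) for this request's staleness tolerance."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= max_staleness_s:
            return entry[0], True
        value = self._backend_read(key)    # missing or too old: read from the backend
        self._entries[key] = (value, now)  # and refresh the cached copy
        return value, False


now = [0.0]                                # fake clock, in seconds
versions = iter(["v1", "v2", "v3"])        # each backend read returns a newer version
cache = StalenessAwareCache(lambda key: next(versions), lambda: now[0])

first = cache.read("item-1")                       # nothing cached yet: backend read
now[0] = 120.0
second = cache.read("item-1")                      # 2 minutes old, within the default
third = cache.read("item-1", max_staleness_s=60)   # this request tolerates only 60 s
now[0] = 500.0
fourth = cache.read("item-1")                      # cached at t=120, now older than 5 minutes
```

Note that nothing happens in the background when an entry ages out. The cached copy is only refreshed when a later read finds it too stale for that request.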

Measuring performance improvements:

The best way to measure the impact of the integrated cache is to compare the consumed RUs before and after using the integrated cache. Based on your new level of consumed RUs after setting up the integrated cache, you can explore lowering your provisioned throughput. If the cost savings from the throughput reduction are greater than the cost of the dedicated gateway, you should keep using it!
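The break-even comparison above is simple arithmetic, sketched below. The function and parameter names are hypothetical, and all prices are inputs: look up the current provisioned throughput and dedicated gateway prices for your region rather than relying on any numbers used to try it out.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def monthly_net_savings(
    ru_per_s_before: int,
    ru_per_s_after: int,
    price_per_100_ru_per_s_hour: float,
    gateway_node_price_per_hour: float,
    gateway_nodes: int = 1,
) -> float:
    """Monthly throughput savings minus monthly dedicated gateway cost.

    A positive result means the integrated cache pays for itself.
    """
    # Hourly savings from lowering provisioned throughput (priced per 100 RU/s).
    throughput_saved = (ru_per_s_before - ru_per_s_after) / 100 * price_per_100_ru_per_s_hour
    # Hourly cost of the dedicated gateway, billed per node.
    gateway_cost = gateway_nodes * gateway_node_price_per_hour
    return (throughput_saved - gateway_cost) * HOURS_PER_MONTH
```

For example, with placeholder prices, `monthly_net_savings(20000, 8000, 0.008, 0.40)` is positive, while `monthly_net_savings(5000, 4000, 0.008, 0.40)` is negative because the throughput reduction is too small to cover one gateway node.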

In general, workloads that fit the following characteristics are most likely to save money with the integrated cache:

  • Read-heavy
  • Many repeated point reads or queries
  • Many high RU queries

However, workloads with any of the following characteristics are unlikely to benefit from the integrated cache:

  • Write-heavy
  • Rarely repeated point reads or queries
  • Session consistency or stronger consistency requirement

Quick example:

We’ve created a simple demo where we’ll run point reads and queries using a small dataset. In this demo, the application was hosted in a VM in the same region as an Azure Cosmos DB account with a dedicated gateway. We first ran 100 repeated point reads and queries but connected using the standard gateway instead of the dedicated gateway. Since we used the standard gateway, these reads couldn’t use the integrated cache. After that, we repeated the 100 point reads and queries using the dedicated gateway, so that the workload could benefit from caching.

Latency results (average):

Operation     Without integrated cache    With integrated cache
Point read    1.6 ms                      2.3 ms
Query         12.0 ms                     3.3 ms


These demo results show how reads, even without caching, are blazingly fast in Azure Cosmos DB. In fact, point read latency is so low that caching is rarely necessary for achieving lower latency. However, for complex queries, caching will likely result in a more noticeable latency improvement.


RU charge results (average):

Operation     Without integrated cache    With integrated cache
Point read    1 RU                        0 RUs
Query         81.4 RUs                    0 RUs


The impact of the integrated cache on cost savings is significant. If reads are frequently repeated, the integrated cache makes these reads “free”. They won’t use any of your RUs!


Next steps:

The great thing about the integrated cache is that you can take any existing Azure Cosmos DB workload on the Core (SQL) API and optimize it in minutes without any major code changes. We do provide a demo with sample code that compares the performance of backend vs. cached reads. However, you do not need to use this sample; any existing code that works with Azure Cosmos DB will also work with the integrated cache!



  • David Baker

    Hi Tim,

    Thanks for this article.

    – Can you tell us how MaxIntegratedCacheStaleness works? If I set it to 5 mins (which is the default value), what happens after 5 mins? Does the system automatically fetch the most recent version of an item and update the cache OR does it evict the item from the cache?
    – Is the cache configured at account level or can we target specific containers?
    – What’s the maximum number of gateway nodes that can be configured for a single Cosmos DB account? I assume that the cache is horizontally partitioned across the nodes. So, if a node was to go down, queries would hit the db directly? Is there any fault tolerance added to the nodes?

    • Tim Sander (Microsoft employee)

      Hi David,

      To best understand how MaxIntegratedCacheStaleness works, here’s an example: https://docs.microsoft.com/en-us/azure/cosmos-db/integrated-cache#integrated-cache-retention-time. To answer your question, the integrated cache is updated only if a new read is run after 5 min. The first read after 5 min will be executed on the backend (so cache miss) and refresh the integrated cache.

      The integrated cache is account-level.

      You can provision up to 5 dedicated gateway nodes per account. Right now, the integrated cache is not partitioned across the nodes (but this is a planned feature). If a node were to go down now, the queries would just be routed through another dedicated gateway node, which might result in a cache miss (but that node would then have the cached value). Provisioning multiple dedicated gateway nodes is the best way to achieve higher dedicated gateway availability.

      • David Baker

        Thanks for the reply. It’d be great if we can have container-level caching. For some containers, we’d like immediate consistency whereas others can tolerate eventual consistency and are good candidates for integrated cache.

  • Rzepka, Lou

    We primarily use the JavaScript SDK for querying Cosmos. I assume that if I configure the JavaScript client to use the Integrated Cache Endpoint, it would work as the C# SDK does. Is that a correct assumption? Related to that, to fully take advantage of the Integrated Cache, the MaxIntegratedCacheStaleness parameter would need to be set; however, according to the documentation, that is not yet available in the JavaScript SDK. Are there plans to incorporate that into the JavaScript SDK? Another question we have: if the Cosmos data changes, is the Integrated Cache updated? Thanks for any information you can provide.
