{"id":11603,"date":"2026-01-06T07:00:52","date_gmt":"2026-01-06T15:00:52","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=11603"},"modified":"2025-12-19T10:55:56","modified_gmt":"2025-12-19T18:55:56","slug":"how-azure-cosmos-db-powers-arms-federated-future-scaling-for-the-next-billion-requests","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/how-azure-cosmos-db-powers-arms-federated-future-scaling-for-the-next-billion-requests\/","title":{"rendered":"How Azure Cosmos DB Powers ARM\u2019s Federated Future: Scaling for the Next Billion Requests"},"content":{"rendered":"<p><strong>The Cloud at Hyperscale: ARM\u2019s Mission and Growth<\/strong><\/p>\n<p>Azure Resource Manager (ARM) is the backbone of Azure\u2019s resource provisioning and management, orchestrating billions of daily requests from customers around the globe. ARM manages all resources for Azure: VMs, Storage, Databases, etc. As Azure\u2019s reach expands and customer expectations rise, ARM\u2019s architecture must not only keep pace\u2014it must set the pace for cloud-scale reliability, agility, and innovation.<\/p>\n<p>In recent years, ARM has seen its request volume surge at an exponential rate, reaching unprecedented levels that continually redefine the boundaries of cloud-scale operations. Meeting this demand requires more than incremental improvements; it calls for a fundamental reimagining of how ARM stores, replicates, and serves data at planetary scale.<\/p>\n<p><strong>The Challenge: Global Scale Meets Regional Demands<\/strong><\/p>\n<ul>\n<li>High replication latency across distant regions<\/li>\n<li>Single points of failure that could impact global availability<\/li>\n<li>Difficulty scaling to meet explosive growth in request rates<\/li>\n<li>Complexities in meeting regional compliance and data sovereignty requirements<\/li>\n<\/ul>\n<p>The solution? 
A federated, regionally isolated architecture\u2014powered by Azure Cosmos DB.<\/p>\n<p><strong>Enter Azure Cosmos DB: The Engine of Federated Evolution<\/strong><\/p>\n<p>Azure Cosmos DB is uniquely suited to meet ARM\u2019s evolving needs. Its multi-region distribution, tunable consistency levels, seamless sharding, and elasticity make it the ideal foundation for a federated architecture.<\/p>\n<ul>\n<li><strong>Multi-region support:<\/strong> Azure Cosmos DB allows ARM to deploy data stores across strategic \u201chero\u201d regions in a self-backup architecture, where each region serves as a backup for the others, ensuring high availability and disaster recovery.<\/li>\n<li><strong>Cross-region replication:<\/strong> Azure Cosmos DB\u2019s built-in cross-region replication provides fault tolerance and failover in the event of regional outages, ensuring data remains available and consistent in backup regions.<\/li>\n<li><strong>Automatic failover (PPAF):<\/strong> Per-partition automatic failover ensures that even if a region or partition experiences issues, requests are seamlessly routed to healthy replicas in other regions.<\/li>\n<li><strong>Request hedging:<\/strong> ARM can route requests to multiple stores or regions, minimizing latency and avoiding bottlenecks.<\/li>\n<li><strong>Flexible sharding:<\/strong> Both horizontal (across accounts) and vertical (across containers) sharding allow ARM to scale out and fine-tune performance for different workloads.<\/li>\n<\/ul>\n<p><strong>ARM\u2019s Unique Use of Azure Cosmos DB<\/strong><\/p>\n<p>At ARM\u2019s scale, the challenge isn\u2019t just storing data\u2014it\u2019s ensuring global consistency, minimizing replication latency, and reducing the blast radius of outages. ARM uses Azure Cosmos DB in ways that go far beyond typical customer scenarios. 
To achieve this, ARM combines Azure Cosmos DB\u2019s native capabilities with custom-built solutions like inline durable replication (for intra-regional durability and follower container updates), advanced routing strategies, and follower containers partitioned for workload optimization. These custom innovations allow ARM to:<\/p>\n<ul>\n<li>Handle hyperscale workloads with predictable performance<\/li>\n<li>Reduce the impact of regional failures through layered failover strategies<\/li>\n<li>Optimize query and point-get operations across billions of resources<\/li>\n<\/ul>\n<p>This unique approach underscores the flexibility of Azure Cosmos DB and the engineering ingenuity required to operate at Azure\u2019s scale.<\/p>\n<p><strong>From Monolith to Federation: A New Architectural Paradigm<\/strong><\/p>\n<p><strong>1. Regional Segregation &amp; Global Federation<\/strong><\/p>\n<p>Each Azure region now maintains its own dedicated storage for resources, while a global store layer manages cross-region data and routing. This hybrid model delivers both regional autonomy and global consistency.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-11628\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-1-6.png\" alt=\"client \/ app\" width=\"1428\" height=\"798\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-1-6.png 1428w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-1-6-300x168.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-1-6-1024x572.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-1-6-768x429.png 768w\" sizes=\"(max-width: 1428px) 100vw, 1428px\" \/><\/p>\n<p><strong>2. 
Sharding for Scale<\/strong><\/p>\n<p>To achieve massive scalability and performance, ARM employs a dual sharding strategy:<\/p>\n<ul>\n<li><strong>Horizontal sharding:<\/strong> Data is distributed across multiple Azure Cosmos DB accounts (stores) within a region or pool. ARM uses a consistent hashing algorithm on a routing key (such as subscription or tenant ID) to determine which store holds each piece of data. This approach allows ARM to scale out seamlessly by adding more stores as needed, balancing load, and minimizing the risk of \u201chot partitions\u201d or bottlenecks.<\/li>\n<li><strong>Vertical sharding:<\/strong> Within each store, data is further partitioned into multiple containers, each optimized for a specific entity type or workload. This enables fine-tuning of throughput and partitioning strategies for different data shapes and access patterns. Vertical sharding also allows ARM to add containers and repartition data as requirements evolve, ensuring flexibility and efficiency at every layer.<\/li>\n<\/ul>\n<p>This dual sharding approach empowers ARM to scale horizontally and vertically, supporting explosive growth and diverse workloads without sacrificing performance or manageability.<\/p>\n<p><strong><img decoding=\"async\" class=\"alignnone wp-image-11629 size-large\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-2-6-1024x599.png\" alt=\"regional store unit\" width=\"1024\" height=\"599\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-2-6-1024x599.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-2-6-300x176.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-2-6-768x450.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2025\/12\/word-image-11603-2-6.png 1430w\" 
sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/strong><\/p>\n<p><strong>3. Robust Replication &amp; Reliability<\/strong><\/p>\n<ul>\n<li><strong>Inline Durable Replication (ARM Innovation):<\/strong> Within each region, ARM\u2019s data layer implements inline durable replication\u2014a custom-built mechanism that keeps primary and secondary stores in sync and ensures that follower containers (which may use different partitioning schemes) are always updated. This approach provides strong intra-regional durability and supports workload-optimized data access.<\/li>\n<li><strong>Azure Cosmos DB Cross-Regional Replication:<\/strong> For fault tolerance and disaster recovery, ARM leverages Azure Cosmos DB\u2019s built-in cross-regional replication. This ensures that, in the event of a regional outage, data remains available and consistent in backup regions, supporting seamless failover and business continuity.<\/li>\n<li><strong>Background Repair:<\/strong> Any failed inline replications are handled by background processes, guaranteeing data durability and consistency.<\/li>\n<\/ul>\n<p><strong>4. Regional Isolation for Performance and Compliance<\/strong><\/p>\n<p>With regional isolation, reads and writes are served locally whenever possible, minimizing latency and supporting data sovereignty. 
In the event of a regional outage, traffic managers and circuit breakers ensure seamless failover to backup regions.<\/p>\n<p><strong>Value Delivered: Scalability, Reliability, Performance, and Cost<\/strong><\/p>\n<ul>\n<li><strong>Scalability:<\/strong> ARM can now handle surging request rates with ease, scaling horizontally and vertically as demand grows.<\/li>\n<li><strong>Reliability:<\/strong> Multiple data copies, custom intra-regional replication and Azure Cosmos DB\u2019s cross-regional failover mechanisms (using PPAF) ensure business continuity\u2014even during outages or network partitions.<\/li>\n<li><strong>Performance:<\/strong> Localized data access means faster, more predictable operations for customers worldwide.<\/li>\n<li><strong>Security &amp; Compliance:<\/strong> Data remains within regional boundaries, supporting strict compliance and sovereignty requirements.<\/li>\n<li><strong>Cost Optimization:<\/strong> By leveraging Azure Cosmos DB\u2019s Per-Partition Per-Region Dynamic Autoscale, ARM achieved a <strong>75% cost reduction<\/strong> in recent months\u2014demonstrating that hyperscale reliability and performance can go hand-in-hand with operational efficiency.<\/li>\n<\/ul>\n<p><strong>Voices from the Front Lines<\/strong><\/p>\n<p>\u201cWith Azure Cosmos DB\u2019s federated architecture, we\u2019ve reduced replication latency and improved reliability for millions of customers worldwide.\u201d<\/p>\n<p><em>\u2014 ARM Engineering Team<\/em><\/p>\n<p>\u201cThe ability to scale out by simply adding more stores or containers means we\u2019re always ready for the next wave of growth.\u201d<\/p>\n<p><em>\u2014 ARM Product Management<\/em><\/p>\n<p><strong>The Road Ahead: Toward Full Regional Isolation<\/strong><\/p>\n<p>ARM\u2019s journey toward full regional isolation is more than a technical upgrade\u2014it\u2019s a leap toward a more resilient, scalable, and customer-centric Azure. 
As we continue to innovate atop Azure Cosmos DB, we\u2019re building the foundation for the next generation of cloud applications\u2014where performance, reliability, and compliance are never compromised.<\/p>\n<p><strong>Get Involved \/ Learn More<\/strong><\/p>\n<p>Curious how Azure Cosmos DB enables globally distributed, high-throughput, and resilient architectures at scale? Explore the Azure Cosmos DB documentation and engineering blogs to learn how features like multi-region distribution, flexible consistency models, autoscale throughput, and partition-level failover can help you design cloud-scale systems.<\/p>\n<p>Whether you\u2019re building globally available applications or modernizing large, distributed platforms, Azure Cosmos DB provides foundational primitives to support growth, resiliency, and operational simplicity.<\/p>\n<h2><strong>About Azure Cosmos DB<\/strong><\/h2>\n<p>Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.<\/p>\n<p>To stay in the loop on Azure Cosmos DB updates, follow us on\u00a0<a href=\"https:\/\/twitter.com\/AzureCosmosDB\" target=\"_blank\" rel=\"noopener\">X<\/a>,\u00a0<a href=\"https:\/\/aka.ms\/AzureCosmosDBYouTube\" target=\"_blank\" rel=\"noopener\">YouTube<\/a>, and\u00a0<a href=\"https:\/\/www.linkedin.com\/company\/azure-cosmos-db\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Cloud at Hyperscale: ARM\u2019s Mission and Growth Azure Resource Manager (ARM) is the backbone of Azure\u2019s resource provisioning and management, orchestrating billions of daily requests from customers around the globe. 
ARM manages all resources for Azure: VMs, Storage, Databases, etc. As Azure\u2019s reach expands and customer expectations rise, ARM\u2019s architecture must not only keep [&hellip;]<\/p>\n","protected":false},"author":204880,"featured_media":11634,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1980,1981,1982],"tags":[1983],"class_list":["post-11603","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-cosmos-db","category-azure-resource-manager","category-distributed-systems","tag-arm-federated-arm-azure-cosmos-db-hyperscale"],"acf":[],"blog_post_summary":"<p>The Cloud at Hyperscale: ARM\u2019s Mission and Growth Azure Resource Manager (ARM) is the backbone of Azure\u2019s resource provisioning and management, orchestrating billions of daily requests from customers around the globe. ARM manages all resources for Azure: VMs, Storage, Databases, etc. 
As Azure\u2019s reach expands and customer expectations rise, ARM\u2019s architecture must not only keep [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/11603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/204880"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=11603"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/11603\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/11634"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=11603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=11603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=11603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}