May 19th, 2025

Elevating Azure Cosmos DB Resilience with Per Partition Automatic Failover

Announcing Per Partition Automatic Failover for Azure Cosmos DB

Today, we’re excited to announce the preview of Per Partition Automatic Failover (PPAF) for Azure Cosmos DB, a significant improvement to our single-region write accounts that boosts availability and resilience. If you rely on Azure Cosmos DB to be “always on” for your mission-critical applications, this new feature is built for you. Per Partition Automatic Failover enables Azure Cosmos DB to recover more quickly and efficiently during regional outages.

Azure Cosmos DB already supports active-active deployments through multi-region writes. With PPAF, we now enable an active-active architecture for single-region write accounts by automatically allowing individual partitions to fail over to another region while preserving the desired level of data consistency.

Active-Active deployments using multi-writer and PPAF.

It’s a smarter failover for your database. Traditionally, if your account’s write region experienced an outage, the system had to fail over the entire account to a secondary region, which required time and coordination. PPAF changes that by performing failovers at the partition level, making geo-failover completely automatic and far more agile.

If a partition-set in the preferred write region has an outage, PPAF automatically promotes another region, chosen according to the configured priority order, as the new write region for that partition-set. Meanwhile, unaffected partitions continue writing to the preferred region without interruption. These granular, independent failovers keep the failover footprint limited to only the affected partitions, eliminating the need to fail over an entire and potentially large account. When the original write region becomes healthy again, the system detects the recovery and initiates a failback to the preferred region, automatically reconciling any data changes during the process.
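To make the per-partition behavior concrete, here is a minimal sketch in Python. It is a toy model, not how the service is implemented: the `PartitionSet` class, the `pick_write_region` and `reconcile` functions, and the region names are all hypothetical and exist only for illustration. It shows one partition-set failing over to the next region in priority order while another stays put, followed by a failback once the preferred region is healthy again.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PartitionSet:
    """Hypothetical model of a partition-set and its current write region."""
    name: str
    priority: List[str]  # failover priority order; index 0 is the preferred write region
    write_region: str = ""

    def __post_init__(self) -> None:
        if not self.write_region:
            self.write_region = self.priority[0]


def pick_write_region(priority: List[str], healthy: Set[str]) -> Optional[str]:
    """Return the highest-priority region that is currently healthy, or None."""
    for region in priority:
        if region in healthy:
            return region
    return None


def reconcile(partition_sets: List[PartitionSet],
              health: Dict[str, Set[str]]) -> Dict[str, str]:
    """Re-evaluate each partition-set independently.

    `health` maps a partition-set name to the regions where it is healthy.
    Because the preferred region is always checked first, the same function
    also models failback once the preferred region recovers.
    """
    for ps in partition_sets:
        target = pick_write_region(ps.priority, health[ps.name])
        if target is not None:
            ps.write_region = target
    return {ps.name: ps.write_region for ps in partition_sets}


# Illustrative region names (assumptions, not taken from the announcement).
regions = ["East US", "West US", "North Europe"]
sets = [PartitionSet("ps-0", regions), PartitionSet("ps-1", regions)]

# Only ps-0 is unhealthy in the preferred region; ps-1 is unaffected.
during_outage = reconcile(sets, {
    "ps-0": {"West US", "North Europe"},
    "ps-1": set(regions),
})

# The preferred region recovers, so ps-0 fails back.
after_recovery = reconcile(sets, {
    "ps-0": set(regions),
    "ps-1": set(regions),
})
```

In this model, `during_outage` moves only `ps-0` to the second-priority region, and `after_recovery` returns it to the preferred one. The real service also reconciles data changes during failback, which this sketch does not attempt to model.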

Near-Instant Recovery During Outages

The biggest benefit you’ll notice is faster recovery. Partition failovers are now much more granular and complete with an RTO of less than 2 minutes at P99. In our observations, Azure Cosmos DB accounts with PPAF enabled maintained write availability even during partial or full regional outages. Affected partitions switched regions so quickly that applications stayed online with no downtime.

Transparent to Your Application

The beauty of this feature is that your application doesn’t need complex logic to take advantage of it, especially for accounts using Strong consistency. You continue writing to your Azure Cosmos DB account endpoint as usual. Behind the scenes, the Azure Cosmos DB SDKs handle redirection when a partition failover occurs: your app automatically retries writes against the new write region for that partition. No code changes are required beyond upgrading to the latest SDK and enabling the feature on your account.
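The redirection described above can be pictured with a small simulation. This is a conceptual sketch only and does not reflect the actual SDK internals or API: `FakeBackend`, `PartitionAwareClient`, and `RegionUnavailable` are hypothetical names invented for this example. The idea it illustrates is that the client remembers a write region per partition, so a failed write to one partition is retried in the next region without affecting writes to other partitions.

```python
from typing import Dict, List, Set, Tuple


class RegionUnavailable(Exception):
    """Hypothetical error: a partition cannot accept writes in a region."""


class FakeBackend:
    """Stand-in for the service; rejects writes for (region, partition) pairs that are down."""

    def __init__(self, down: Set[Tuple[str, str]]) -> None:
        self.down = set(down)
        self.accepted: List[Tuple[str, str, dict]] = []

    def write(self, region: str, partition: str, item: dict) -> None:
        if (region, partition) in self.down:
            raise RegionUnavailable(f"{partition} is unavailable in {region}")
        self.accepted.append((region, partition, item))


class PartitionAwareClient:
    """Toy client that tracks the current write region for each partition."""

    def __init__(self, backend: FakeBackend, regions: List[str]) -> None:
        self.backend = backend
        self.regions = list(regions)          # priority order, preferred region first
        self.overrides: Dict[str, str] = {}   # partition -> region last known to accept writes

    def write(self, partition: str, item: dict) -> str:
        # Start with the region that last worked for this partition,
        # then fall back to the remaining regions in priority order.
        start = self.overrides.get(partition, self.regions[0])
        candidates = [start] + [r for r in self.regions if r != start]
        for region in candidates:
            try:
                self.backend.write(region, partition, item)
            except RegionUnavailable:
                continue
            self.overrides[partition] = region
            return region
        raise RegionUnavailable(f"no region accepted the write for {partition}")


# Partition p1 is down in the preferred region; p0 is healthy everywhere.
backend = FakeBackend(down={("East US", "p1")})
client = PartitionAwareClient(backend, ["East US", "West US"])

region_p0 = client.write("p0", {"id": "a"})
region_p1 = client.write("p1", {"id": "b"})
region_p1_again = client.write("p1", {"id": "c"})  # goes straight to the remembered region
```

The application code calls `client.write` the same way in every case, which is the sense in which the failover is transparent: the routing decision lives entirely in the client layer.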

Try it in Preview: Enable Per Partition Automatic Failover

Starting today, PPAF is available in preview for Azure Cosmos DB for NoSQL accounts. If you have a single-region write account with at least one read region, you can opt in to this preview. Here’s how to get started:

  • Check prerequisites: Currently, your account needs to use Strong, Session, Consistent Prefix, or Eventual consistency and reside in public Azure regions. You also need to update to the latest version of the .NET or Java SDK. Support for the Bounded Staleness consistency level and for other SDKs is coming soon. Stay tuned.
  • Enable the feature: You can enable the feature through the preview features on your subscription. On the preview features pane, for Azure Cosmos DB, enable the PPAF (Preview) feature. You can get more details on enabling preview features here.
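The account-level prerequisites above can be expressed as a quick self-check. The sketch below is an assumption-laden illustration, not an Azure API: the function name `ppaf_preview_blockers` and its parameters are hypothetical, and you would supply the values from your own account configuration. It covers only the consistency level and region layout; it does not verify the public-region or SDK-version requirements.

```python
from typing import List

# Consistency levels listed as supported in the preview announcement.
SUPPORTED_CONSISTENCY = {"Strong", "Session", "Consistent Prefix", "Eventual"}


def ppaf_preview_blockers(consistency: str,
                          write_regions: List[str],
                          read_regions: List[str]) -> List[str]:
    """Return human-readable reasons an account does not meet the listed prerequisites.

    An empty list means the checks modeled here all pass.
    """
    blockers = []
    if consistency not in SUPPORTED_CONSISTENCY:
        blockers.append(f"consistency level '{consistency}' is not supported in preview")
    if len(write_regions) != 1:
        blockers.append("account must have exactly one write region")
    if len(read_regions) < 1:
        blockers.append("account needs at least one read region")
    return blockers


# Illustrative inputs (region names are assumptions).
eligible = ppaf_preview_blockers("Session", ["East US"], ["West US"])
not_eligible = ppaf_preview_blockers("Bounded Staleness", ["East US"], [])
```

Here `eligible` comes back empty, while `not_eligible` lists two reasons: the unsupported consistency level and the missing read region.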

Raising the Bar for Cloud Database Uptime

PPAF is part of our ongoing commitment to make Azure Cosmos DB the most reliable globally distributed database for your applications. Previously, if you wanted the highest availability, you might have considered using multiple write regions in Azure Cosmos DB, but you then had to handle the complexity of conflict resolution. Now, with this new capability, even if you prefer a single-region write setup, you can achieve a level of resiliency that was previously only possible with multi-region writes.

Learn More & Provide Feedback 

To dive deeper, explore the feature documentation for further details. If you have questions, reach out to cosmosdbppafpreview@microsoft.com.

This preview is just the beginning. Looking ahead, we plan to expand support and make this feature generally available. Your input during this preview phase is invaluable to us as we polish the experience.

Leave a review

Tell us about your Azure Cosmos DB experience! Leave a review on PeerSpot and we’ll gift you $50. Get started here.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.

To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.
