Today, we are excited to announce the General Availability of Per Partition Automatic Failover (PPAF) for Azure Cosmos DB NoSQL API. PPAF is a significant advancement in how Azure Cosmos DB delivers availability and resilience for mission-critical workloads running on single-write-region accounts.
If you rely on Azure Cosmos DB to be always on for your mission-critical applications, PPAF is built for you.
With PPAF, Azure Cosmos DB can automatically recover affected partitions by failing over writes to a secondary region within 3 minutes at P99, without requiring application changes.
A smarter, more granular failover
Azure Cosmos DB already supports active-active deployments through multi-region writes. With PPAF, you can now achieve an active-active architecture on a single-write-region account by allowing individual partitions to fail over to other regions automatically, while preserving your configured consistency level.
Traditionally, if your account’s write region experienced an outage, Azure Cosmos DB had to fail over the entire account to a secondary region, which is complex and requires time. PPAF makes geo-failover automatic and far more agile by performing failovers at the partition level. If a partition-set in the preferred write region has an outage, PPAF automatically promotes another region as the new write region for that partition-set. Unaffected partitions continue writing to the preferred region without interruption. When the original write region recovers, the system detects the recovery, initiates a failback to the preferred region, and automatically reconciles any data changes during the process.
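The difference between account-level and partition-level failover can be sketched with a toy model. This is an illustration only, not how the service is implemented: the region names, the partition IDs, and the helper functions `fail_over`, `fail_back`, and `partitions_per_region` are all hypothetical, and real failback also involves data reconciliation that the sketch leaves out.

```python
from collections import Counter

PREFERRED = "East US"   # hypothetical preferred write region
SECONDARY = "West US"   # hypothetical secondary region


def fail_over(write_region, affected, target):
    """Return a new partition -> write-region map in which only the
    affected partitions move to the target region."""
    return {
        partition: (target if partition in affected else region)
        for partition, region in write_region.items()
    }


def fail_back(write_region, preferred):
    """Return a map in which every partition writes to the preferred region again."""
    return {partition: preferred for partition in write_region}


def partitions_per_region(write_region):
    """Count how many partitions are currently writing in each region."""
    return Counter(write_region.values())


# Four partitions, all writing to the preferred region.
healthy = {partition: PREFERRED for partition in range(4)}

# Partitions 1 and 2 lose their write region; 0 and 3 are untouched.
during_outage = fail_over(healthy, affected={1, 2}, target=SECONDARY)

# The preferred region recovers and the affected partitions move back.
after_recovery = fail_back(during_outage, PREFERRED)

print(partitions_per_region(during_outage))
```

In this model, an account-level failover would correspond to passing every partition as `affected`, which is the all-or-nothing behavior PPAF avoids.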

Use cases PPAF enables
Mission-critical applications that cannot tolerate write downtime. Payments, order management, real-time gaming, ride-hailing, and IoT ingestion workloads where every minute of write unavailability has a direct business impact. With sub-three-minute RTO at P99, regional incidents become a non-event for most of your users.
Single-write-region workloads that need multi-region resiliency. Customers who previously considered multi-region writes purely for availability, but did not want to design and operate conflict resolution logic, can now achieve a comparable resiliency profile without that complexity.
Workloads with strict consistency requirements. Financial systems, ledgers, and inventory platforms running on Strong consistency maintain RPO = 0 through a PPAF failover. Your consistency contract is honored end-to-end.
Near-instant recovery during outages
Partition failover is granular and designed to complete within 3 minutes at P99, representing a significant improvement over traditional account-level failover.
In practice, Azure Cosmos DB accounts with PPAF enabled have maintained write availability during partial regional disruptions, with affected partitions redirecting writes to secondary regions within minutes.
Transparent to your application
Your application does not need additional logic to take advantage of PPAF. You continue writing to your Azure Cosmos DB account endpoint as usual. Behind the scenes, the Azure Cosmos DB SDKs handle redirection when a partition failover occurs, and your application automatically retries writes to the new write region for that partition. No code changes are required beyond upgrading to a supported SDK version and enabling the feature on your account.
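To make the idea of per-partition redirection concrete, here is a minimal sketch of a client that retries a write in another region and remembers the result for that partition only. It is a conceptual model, not the SDKs' actual algorithm or API: `RegionUnavailable`, `write_with_redirect`, `send`, and the region names are hypothetical, and with a supported SDK none of this logic lives in your application.

```python
class RegionUnavailable(Exception):
    """Raised by the toy backend when a region cannot accept a write for a partition."""


def write_with_redirect(partition, item, regions, send, overrides):
    """Try the region remembered for this partition first, then the rest in order.

    `overrides` maps partition -> last region that accepted a write, so a
    redirect affects one partition rather than the whole account.
    """
    first = overrides.get(partition, regions[0])
    candidates = [first] + [region for region in regions if region != first]
    for region in candidates:
        try:
            send(region, partition, item)
        except RegionUnavailable:
            continue  # this region cannot take the write; try the next one
        overrides[partition] = region
        return region
    raise RegionUnavailable(f"no region accepted the write for partition {partition}")


# Toy backend: partition 1 cannot be written in East US.
down = {("East US", 1)}


def send(region, partition, item):
    if (region, partition) in down:
        raise RegionUnavailable(region)


regions = ["East US", "West US"]
overrides = {}
region_for_p0 = write_with_redirect(0, {"id": "a"}, regions, send, overrides)
region_for_p1 = write_with_redirect(1, {"id": "b"}, regions, send, overrides)
print(region_for_p0, region_for_p1)
```

Partition 0 keeps writing to the first region while partition 1 is redirected, which mirrors the behavior described above at a very small scale.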
What is new at General Availability
Since preview, we have expanded PPAF across the areas customers asked for:
- Broader consistency support. Strong, Session, Consistent Prefix, and Eventual consistency levels are supported at GA. Bounded Staleness is on the roadmap.
- Multi-language SDK coverage:
  - .NET SDK v3: v3.60.0 or later
  - Java SDK v4: v4.79.0 or later
  - Python SDK: v4.16.0 or later
  - Node.js SDK: v4.7.0 or later
- Production observability. A new PartitionWriteGlobalStatus metric shows the number of partitions writing in each region at any point in time.
- Resilience defaults. Per-Partition Circuit Breaker and Read Hedging are enabled by default for PPAF-enabled accounts, with configurable thresholds.
- Chaos simulation kit. A sample application allows you to inject partition-level faults and validate failover behavior safely.
- Pricing. PPAF is included as part of the Azure Cosmos DB Business Critical service tier.
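The minimum SDK versions listed above can be checked with a small helper before you enable the feature. This is a sketch under stated assumptions: `MIN_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical names, the version numbers are copied from the list above, and pre-release suffixes such as `-beta.1` are not handled.

```python
# Minimum SDK versions for PPAF, copied from the list in this post.
MIN_VERSIONS = {
    ".NET": (3, 60, 0),
    "Java": (4, 79, 0),
    "Python": (4, 16, 0),
    "Node.js": (4, 7, 0),
}


def parse_version(text):
    """Parse 'v4.16.0' or '4.16.0' into a tuple of ints.

    Pre-release suffixes are not handled in this sketch.
    """
    return tuple(int(part) for part in text.lstrip("v").split("."))


def meets_minimum(sdk, installed):
    """Return True if the installed version is at or above the listed minimum."""
    return parse_version(installed) >= MIN_VERSIONS[sdk]


print(meets_minimum("Python", "4.16.0"))
print(meets_minimum("Node.js", "4.10.0"))
```

Comparing integer tuples rather than strings matters here: as strings, "4.10.0" sorts before "4.7.0", which would wrongly reject a newer Node.js SDK.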
Enable Per Partition Automatic Failover
PPAF is available for Azure Cosmos DB for NoSQL accounts that meet the prerequisites. To get started:
- Upgrade your SDK to a supported version, as listed under Multi-language SDK coverage above.
- Enable the feature from the Features blade in the account-level settings.
Step-by-step instructions are available in the how-to guide.
With Per Partition Automatic Failover, Azure Cosmos DB redefines how single-write-region applications achieve resilience. By combining partition-level failover, preserved consistency, and recovery within 3 minutes at P99, you can build always-on applications without added complexity. We’re excited to see how you use PPAF to raise the bar for availability in your workloads.
Learn more and provide feedback
To dive deeper, explore the feature documentation and resources below:
- How-to: Configure and use Per Partition Automatic Failover
- Samples and chaos simulation kit
- Implementing Decentralized Per-Partition Automatic Failover in Azure Cosmos DB
- Azure Cosmos DB pricing
About Azure Cosmos DB
Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.
To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn. Join the discussion with other developers on the #nosql channel on the Microsoft Open Source Discord.
