Autoscale + serverless: new offers to fit any workload

Deborah Chen

This blog post was co-authored by Deborah Chen and Thomas Weiss, program managers on Azure Cosmos DB. 

Update: Serverless is available in preview for the Core (SQL) API as of August 19, 2020.

Over the years, we’ve heard from many of you that you’d like more flexibility in the Azure Cosmos DB billing model to better balance cost and performance. With our current provisioned throughput model, it’s easy to set the exact throughput your workload needs, measured in Request Units per second (RU/s) and guaranteed by Azure Cosmos DB’s SLAs. For scenarios where your application doesn’t need constant throughput, however, we know that choosing the right RU/s can be a challenge. For example, some apps have variable, unpredictable traffic, while others have no consistent usage at all and see only sporadic spikes.

To make this easier, we’re excited to announce two new ways to pay for your database operations: the general availability of autoscale provisioned throughput, which brings automatic scaling to our provisioned throughput model, and the upcoming preview of serverless. Together, provisioned throughput and serverless ensure that Azure Cosmos DB is, more than ever, a database that delivers the best performance and cost-effectiveness for any kind of workload.

Autoscale provisioned throughput (GA)

With autoscale provisioned throughput (formerly known as “autopilot”), Azure Cosmos DB automatically and instantaneously scales your throughput (RU/s) based on workload usage within a preset range. This means you can focus on building your application and let Azure Cosmos DB handle capacity and scale management for you. Autoscale is well suited for mission-critical applications that have variable or unpredictable traffic patterns.

It’s backed by all Azure Cosmos DB SLAs and supports all Azure Cosmos DB APIs: Core (SQL), Gremlin, Cassandra, Table, and API for MongoDB. It also helps optimize your RU/s usage and cost by scaling down when not in use.

How does autoscale work?

You set the highest, or maximum, throughput (RU/s), T_max, that you want your database or container to scale to. Azure Cosmos DB scales the RU/s based on usage, so that it always stays between 10% of T_max and T_max. For example, if you set a maximum throughput of 10,000 RU/s, throughput will scale between 1,000 and 10,000 RU/s. Billing is done hourly, for the highest RU/s the system scaled to within that hour.
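To make the range and the hourly billing rule concrete, here is a minimal Python sketch. The function name `autoscale_billed_rus` is hypothetical (it is not part of any Azure SDK), and the model assumes the system scales exactly to the observed demand, clamped between 10% of T_max and T_max.

```python
def autoscale_billed_rus(t_max: int, usage_samples: list[int]) -> int:
    """Hypothetical model: RU/s billed for one clock hour under autoscale.

    Assumes the system scales to exactly the observed demand, clamped to
    [0.1 * t_max, t_max], and that the hour is billed at the highest RU/s
    reached. Not an Azure SDK function.
    """
    floor = t_max // 10  # 10% of T_max (assumes t_max is a multiple of 10)
    # Demand below the floor still costs the floor; demand above t_max is capped.
    scaled = [min(max(u, floor), t_max) for u in usage_samples]
    # An hour with no recorded samples is billed at the floor.
    return max(scaled, default=floor)


# T_max = 10,000 RU/s, so the scaling range is 1,000 to 10,000 RU/s.
print(autoscale_billed_rus(10_000, [0, 2_500, 7_000]))  # busiest moment: 7,000 RU/s
print(autoscale_billed_rus(10_000, [0, 0]))             # idle hour: billed at the 1,000 RU/s floor
```

Because only the hourly maximum matters, a single short spike sets the price for the whole hour, while an idle hour never costs less than the 10% floor.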

Autoscale automatically adjusts the provisioned throughput to your traffic

Based on your feedback from the preview, we are introducing several new features with GA to make autoscale easier to use:

Custom values are now supported for the maximum throughput (RU/s), replacing the fixed set of values available in the preview. This gives you more flexibility to set the right value for your workload. Any resources created during the preview are automatically compatible with the new model.

Create new autoscale container in Azure portal

Autoscale can now be enabled on existing databases and containers. This makes it easy to take advantage of autoscale without having to migrate your data or create new resources. If you have an existing workload with a variable or unpredictable traffic pattern and don’t currently scale it manually, enabling autoscale can help optimize your RU/s usage and cost, as it scales down to the minimum of the RU/s range when not in use.

Finally, programmatic support is now available in the latest versions of the Azure Cosmos DB SDKs for .NET and Java, in Azure Resource Manager, and through commands for the Cassandra API and API for MongoDB. Support for PowerShell and Azure CLI will be available in an upcoming release.

How you can save with autoscale provisioned throughput

Imagine a workload with the following characteristics:

  • Sustained traffic that varies over time, with no predictable pattern
  • Peaks of 50,000 RU/s throughout the month
  • Peaks occur no more than 25% of the time

Example of a workload with variable traffic between 5,000 and 50,000 RU/s

The workload’s peak is easy to identify (50,000 RU/s), but the throughput it needs the rest of the time keeps changing. If you use standard provisioned throughput for this workload, you might simply set the throughput to 50,000 RU/s to handle your peaks over the course of the month. However, this would mean paying for the maximum all month, despite needing that capacity only 25% of the time.

With autoscale provisioned throughput, you would set 50,000 RU/s as your maximum and then pay for the throughput your workload uses, starting at 10% of your maximum. As long as you don’t use your maximum more than 1/3 of the time, you can expect considerable savings. Learn more about how to choose between standard and autoscale throughput.
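The savings argument can be checked with a rough calculation. This Python sketch uses the single-region prices a reader quotes in the comments below ($0.008 per 100 RU/s per hour for standard throughput, $0.012 for autoscale), which may not match current pricing, and a simplified, hypothetical traffic profile: a 720-hour month in which 25% of hours peak at 50,000 RU/s and the remaining hours peak at 5,000 RU/s. The function names are illustrative only.

```python
from decimal import Decimal

# Hourly price per 100 RU/s, single-region account, as quoted by a reader in the
# comments. Assumption: these may not match current Azure pricing.
STANDARD_PRICE = Decimal("0.008")
AUTOSCALE_PRICE = Decimal("0.012")


def standard_cost(provisioned_rus: int, hours: int) -> Decimal:
    """Standard throughput: the full provisioned RU/s is billed every hour."""
    return STANDARD_PRICE * Decimal(provisioned_rus) / 100 * hours


def autoscale_cost(t_max: int, hourly_peaks: list[int]) -> Decimal:
    """Autoscale: each hour is billed at its highest RU/s, never below 10% of t_max."""
    floor = t_max // 10
    total = Decimal(0)
    for peak in hourly_peaks:
        billed_rus = min(max(peak, floor), t_max)
        total += AUTOSCALE_PRICE * Decimal(billed_rus) / 100
    return total


# Hypothetical 30-day month (720 hours): 25% of hours peak at 50,000 RU/s,
# the other 75% peak at 5,000 RU/s.
HOURS = 720
peak_hours = HOURS // 4
hourly_peaks = [50_000] * peak_hours + [5_000] * (HOURS - peak_hours)

standard = standard_cost(50_000, HOURS)
autoscale = autoscale_cost(50_000, hourly_peaks)
print(f"standard:  ${standard:,.2f}")
print(f"autoscale: ${autoscale:,.2f}")
```

Under these assumptions, standard throughput fixed at 50,000 RU/s costs $2,880 for the month and autoscale costs $1,404, roughly half. With real, continuously varying traffic, the autoscale figure depends on each hour's actual peak. `Decimal` is used so the currency arithmetic stays exact.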

Announcing serverless on Azure Cosmos DB

Autoscale is a great fit for any situation where you need guaranteed throughput and performance, and your traffic isn’t predictable enough to scale your throughput manually. But what if your workload doesn’t require sustained throughput?

In some scenarios, you may expect your Azure Cosmos DB database to sit idle most of the time, only processing requests occasionally. This is typically the case when you get started with Azure Cosmos DB, build a prototype, or even run small, non-critical applications. Provisioning throughput isn’t required here; instead, you just need a cost-effective way to pay for the individual database requests you are sending.

A spiky workload with sporadic requests

To best serve these kinds of use cases, we are extremely excited to announce the upcoming preview of Azure Cosmos DB serverless, a purely consumption-based offer. With serverless, you will only pay for:

  • the request units (RU) consumed by your database operations,
  • the storage consumed by your data.

Only RUs consumed by your requests get billed in serverless mode

Because serverless is a true pay-per-request billing model, it will further lower the entry price for anyone who wants to start using Azure Cosmos DB or run small applications with light traffic.
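To contrast this with provisioned throughput, the sketch below models a bill that depends only on what was consumed. It is hypothetical: the function name and both rates are placeholders, since serverless prices had not been published when this post was written, and the example numbers are illustrative, not real pricing.

```python
from collections.abc import Iterable


def serverless_bill(request_charges: Iterable[float], stored_gb: float,
                    price_per_million_ru: float, price_per_gb_month: float) -> float:
    """Hypothetical serverless bill: consumed RUs plus storage, nothing else.

    `request_charges` holds the RU charge of each database operation in the
    billing period. Idle time adds nothing, because no throughput is provisioned.
    """
    consumed_ru = sum(request_charges)
    return consumed_ru / 1_000_000 * price_per_million_ru + stored_gb * price_per_gb_month


# Placeholder rates (NOT real pricing): 1.0 per million RU, 0.5 per GB-month.
busy_month = serverless_bill([5.0] * 400_000, stored_gb=10,
                             price_per_million_ru=1.0, price_per_gb_month=0.5)
idle_month = serverless_bill([], stored_gb=10,
                             price_per_million_ru=1.0, price_per_gb_month=0.5)
print(busy_month, idle_month)
```

With these placeholder rates, a month of 400,000 operations at 5 RU each costs 7.0 units (2.0 for RUs, 5.0 for storage), and a month with no requests at all costs only the 5.0 units of storage.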

Azure Cosmos DB serverless will launch in public preview in the next couple of months and will be available for all Azure Cosmos DB APIs.

A tour of Azure Cosmos DB pricing models in video

Watch this video to better understand how autoscale provisioned throughput and serverless make Azure Cosmos DB a cost-effective solution for any kind of workload:

Get started

To get started with autoscale, check out our guide on how to determine if you should use autoscale for your workload. Learn more about how autoscale works and how to enable autoscale on a new or existing workload.



  • Andrew Moreno

    For autoscale per-hour pricing, what would happen in the following scenario? I have a container with a max of 4,000 RU/s (min 400) that is idle except for a 2-minute burst up to 4,000 RU/s spanning the end of one hour and the start of the next, say from minute 59 of one hour to minute 01 of the next, after which it goes back to being idle. Will I be billed at the rate of 4,000 RU/s for both hours?

    • Deborah Chen (Microsoft employee)

      Hi Andrew, thanks for the question! Autoscale bills based on the highest RU/s the system scaled to within the hour. So in your scenario, yes, it would be billed for 4000 RU/s in both hours. Autoscale is best suited for when you have consistent, but unpredictable traffic, so if your workload does sit idle for most of the time, the upcoming serverless preview may be a better fit.
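Deborah's answer can be reproduced with a small Python sketch that buckets per-minute demand by clock hour and bills each hour at its highest scaled RU/s. The helper names, the sample date, and the per-minute sampling are all hypothetical. The scenario follows Andrew's question: T_max of 4,000 RU/s, idle except for a 2-minute burst.

```python
from collections import defaultdict
from datetime import datetime, timedelta

T_MAX = 4_000
FLOOR = T_MAX // 10  # autoscale never scales below 10% of T_max (400 RU/s here)


def billed_per_hour(samples):
    """samples: (timestamp, RU/s demand) pairs. Returns {hour_start: billed RU/s}."""
    billed = defaultdict(lambda: FLOOR)
    for ts, demand in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        billed[hour] = max(billed[hour], min(max(demand, FLOOR), T_MAX))
    return dict(billed)


def two_hours_with_burst(burst_minutes):
    """Per-minute demand over two hours: T_MAX during the burst, idle otherwise."""
    start = datetime(2020, 6, 1, 10, 0)  # arbitrary example date
    return [(start + timedelta(minutes=m), T_MAX if m in burst_minutes else 0)
            for m in range(120)]


# Burst at 10:59 and 11:00 straddles the hour boundary: both hours billed at 4,000.
straddling = billed_per_hour(two_hours_with_burst({59, 60}))
# The same 2-minute burst inside one hour: only that hour is billed at 4,000.
contained = billed_per_hour(two_hours_with_burst({30, 31}))
print(straddling)
print(contained)
```

The same two minutes of peak traffic are billed as one peak hour or two depending only on where they fall relative to the hour boundary.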

  • Modus Ponens

    Is it also supported in Azure Government cloud?

    • Mark Brown (Microsoft employee)

      Yes, this is available in Gov Cloud. We do not have ARM template support for it yet, but that should be available in less than 2 weeks.


  • Ather Shareef

    Cost Query:
    Note: Unit (100 RU/s per hour) and single-region account

    Cost of manually configuring provisioned throughput is $0.008/hour -> A
    Cost of automatically configuring provisioned throughput with Autopilot is $0.012/hour -> B

    i.e. an Autopilot container is 50% costlier than a normal container. Similarly, what will be the cost of Serverless containers? Will it be the same as A or B? How will it relate to A and B?

    • Mark Brown (Microsoft employee)

      You can read here to get a better understanding of when autoscale is the better option. Certainly there are scenarios where it is not cost effective, particularly for workloads with relatively stable and constant load.

      Serverless pricing will be available when we release to preview in a couple months.


  • Murray Bauer

    I couldn’t find the pricing for the new Serverless pricing model – when will this be available?

    • Mark Brown (Microsoft employee)

      Serverless pricing will be available when we release to preview in a couple months.


  • Maxim Rybkov

    Same limits on the number of containers as in the preview? Any news on increasing the number of containers?

  • Yair Ripshtos (Microsoft employee)

    Great article, Deborah!
    I have a question: if serverless is pay-per-request, why not use it in the first place? That way we wouldn’t even have to configure the maximum RU/s for the autoscale option and could just use Cosmos DB.

    • Kshitij Sharma

      My guess is the per-request pricing will be higher. So if your read/write volume is high this will cost more than the autoscale option.

      Could be completely wrong. We will know when they announce the pricing.

      • Mark Brown (Microsoft employee)

        Yes, this is correct. Serverless is intended for workloads where there is infrequent usage.

  • Fangyan Xu (Microsoft employee)

    Hi Deborah, I am very excited to see the autoscale and serverless launch. Our service uses Cosmos DB and we would like to try them ASAP. I think autoscale will help a lot; I just wonder whether we could use serverless in our case, as pay-per-request sounds more efficient.

    This article mentions that serverless is more suitable for non-critical services with low traffic. What is the definition of low traffic?

    • Mark Brown (Microsoft employee)

      In general, workloads that at times have zero requests. When we release the preview, we will include the pricing and guidance on how to choose the right throughput for your app.


  • Paul Huizer

    An app that has 24×7 ingress and occasional (5x a day) high-load queries for reporting purposes would need a hybrid solution: ‘Manual’ for the base load, and pay-per-use for RU/s beyond that. Any thoughts on that?

    • Mark Brown (Microsoft employee)

      If your app is largely steady state with pre-determined periods of heavier load due to regularly run queries then it would be more efficient to use standard throughput and combine that with this tool I wrote that schedules throughput changes using Azure Functions and PowerShell.

      Hope this is helpful.
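Mark's suggestion (standard throughput plus scheduled changes) boils down to a lookup from time of day to RU/s. The Python below sketches only that schedule logic, with hypothetical names and values: a 4,000 RU/s baseline and five one-hour reporting windows at 20,000 RU/s. It does not reproduce his tool, which uses Azure Functions and PowerShell to actually apply the throughput changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical schedule entry for a period of heavier load."""
    start_hour: int  # inclusive, 0-23 (UTC)
    end_hour: int    # exclusive
    rus: int         # standard RU/s to provision during the window


def throughput_for_hour(hour: int, baseline: int, windows: list[Window]) -> int:
    """Return the standard (manual) RU/s a scheduler would set at `hour`."""
    for w in windows:
        if w.start_hour <= hour < w.end_hour:
            return max(baseline, w.rus)
    return baseline


# Hypothetical schedule: five daily reporting runs, one hour each.
schedule = [Window(h, h + 1, 20_000) for h in (2, 7, 12, 17, 22)]
hourly = [throughput_for_hour(h, 4_000, schedule) for h in range(24)]

# Provisioned RU/s-hours per day, versus keeping 20,000 RU/s all day.
print(sum(hourly), 24 * 20_000)
```

With these made-up numbers the schedule provisions 176,000 RU/s-hours a day instead of 480,000. A real scheduler would also need to raise throughput slightly before each window starts, since the heavy queries would otherwise be rate limited while the change takes effect.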

      • Paul Huizer

        This indeed sounds like the way to go. Thanks!
