Autoscale + serverless: new offers to fit any workload

This blog post was co-authored by Deborah Chen and Thomas Weiss, program managers on Azure Cosmos DB.

Update: Serverless is available in preview for the Core (SQL) API as of August 19, 2020.

Over the years, we’ve heard from many of you that you’d like more flexibility with the Azure Cosmos DB billing model to better balance cost and performance. With our current provisioned throughput model, it’s easy to set the exact throughput you need for your workload, measured in Request Units per second (RU/s), guaranteed by Azure Cosmos DB’s SLAs. For scenarios where your application doesn’t need constant throughput, however, we know choosing the right RU/s can be a challenge. For example, some apps might have variable, unpredictable traffic, while others might not have consistent usage at all, seeing only sporadic spikes of usage.

To make this easier, we’re excited to announce two new ways to pay for your database operations: the general availability of autoscale provisioned throughput, which brings automatic scaling to our provisioned throughput model, and the upcoming preview of serverless. Together, provisioned throughput and serverless ensure that Azure Cosmos DB is, more than ever, a database that delivers the best performance and cost-effectiveness for any kind of workload.

Autoscale provisioned throughput (GA)

With autoscale provisioned throughput (formerly known as “autopilot”), Azure Cosmos DB automatically and instantaneously scales your throughput (RU/s) based on the workload usage within a preset range. This means you can focus on building your application and let Azure Cosmos DB handle the work of capacity and scale management for you. Autoscale is well suited for mission-critical applications that have variable or unpredictable traffic patterns.

It’s backed by all Azure Cosmos DB SLAs, supports all Azure Cosmos DB APIs – Core (SQL), Gremlin, Cassandra, Table, and API for MongoDB – and helps optimize your RU/s usage and cost by scaling down when not in use.

How does autoscale work?

You set the highest or maximum throughput (RU/s), T_max, you want your database or container to scale to. Azure Cosmos DB scales the RU/s based on usage, so that it’s always between 10% of T_max and T_max. For example, if you set a maximum throughput of 10,000 RU/s, the throughput will scale between 1,000 and 10,000 RU/s. Billing is done on a per-hour basis, for the highest RU/s the system scaled to within the hour.
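As a rough illustration of the scaling range and the per-hour billing rule, the Python sketch below models only the arithmetic stated in this post. It is not Azure Cosmos DB SDK code: the function names and the sample usage trace are hypothetical, and real billing may differ in details not covered here.

```python
from __future__ import annotations


def autoscale_range(t_max: int) -> tuple[int, int]:
    """Return the (minimum, maximum) RU/s for a given autoscale maximum."""
    # The minimum is 10% of T_max; integer division avoids float rounding.
    return t_max // 10, t_max


def billed_rus_per_hour(usage_by_hour: list[list[int]], t_max: int) -> list[int]:
    """Return the RU/s billed for each hour.

    Each inner list holds hypothetical RU/s samples observed during one hour.
    The billed value is the highest sample in that hour, kept within the
    autoscale range [10% of T_max, T_max].
    """
    t_min, _ = autoscale_range(t_max)
    billed = []
    for samples in usage_by_hour:
        peak = max(samples, default=0)  # an hour with no samples counts as idle
        billed.append(min(max(peak, t_min), t_max))
    return billed


# Hypothetical trace for a container with T_max = 10,000 RU/s.
trace = [
    [0, 500],        # nearly idle hour: billed at the 10% minimum
    [3000, 7500],    # moderate hour: billed at its highest point
    [200, 10000],    # short burst at the very end of the hour
    [10000, 300],    # the same burst continuing into the next hour
]
print(autoscale_range(10_000))             # (1000, 10000)
print(billed_rus_per_hour(trace, 10_000))  # [1000, 7500, 10000, 10000]
```

Note how a short burst that straddles an hour boundary is billed at the maximum in both hours, because each hour is billed for the highest RU/s reached within it.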

Based on your feedback from the preview, with GA, we are introducing several new features to make autoscale easier to use:

Custom values are now supported for the maximum throughput (RU/s), which replaces the set of values available in the preview. This gives you more flexibility to set the right value based on the needs of your workload. Any resources created during the preview will automatically be compatible with the new model.

Autoscale can now be enabled on existing databases and containers. This makes it easy to take advantage of autoscale without having to migrate your data or create resources. If you have an existing workload with a variable or unpredictable traffic pattern, and don’t currently do manual scaling yourself, enabling autoscale can help optimize your RU/s usage and cost, as it will scale down to the minimum of the RU/s range when not in use.

Finally, we now have programmatic support in the latest versions of the Azure Cosmos DB SDKs for .NET and Java, Resource Manager, and commands for Cassandra API and API for MongoDB. Support for PowerShell and Azure CLI will be available in an upcoming release.

How you can save with autoscale provisioned throughput

Imagine a workload with certain characteristics:

Sustained traffic that varies over time, with no predictable pattern
Peaks of 50,000 RU/s throughout the month
Peaks occur no more than 25% of the time

The workload’s peak is easily identified – 50,000 RU/s – but the throughput needs for the rest of the time keep changing. If you use standard provisioned throughput for this workload, you may decide to simply set the throughput capacity to 50,000 RU/s in order to manage your peaks over the course of the month. However, doing this would mean that you’d be paying the maximum all month – despite only needing that capacity 25% of the time.

With autoscale provisioned throughput, you would set 50,000 RU/s as your maximum and then pay for the throughput your workload uses, starting at 10% of your maximum. As long as you don’t use your maximum more than 1/3 of the time, you can expect considerable savings. Learn more about how to choose between standard and autoscale throughput.
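To make the comparison concrete, here is a small Python sketch of the monthly cost for this example workload under both models. Several inputs are assumptions rather than figures from this post: the hourly rates are the single-region rates quoted by a reader in the comments ($0.008 per 100 RU/s for standard, $0.012 for autoscale) and may not match current pricing, the month is taken as 720 hours, and the off-peak level of 15,000 RU/s is purely hypothetical.

```python
HOURS_PER_MONTH = 720    # assumption: a 30-day month
STANDARD_RATE = 0.008    # assumed USD per 100 RU/s per hour, standard throughput
AUTOSCALE_RATE = 0.012   # assumed USD per 100 RU/s per hour, autoscale throughput


def standard_monthly_cost(provisioned_rus: float) -> float:
    """Cost of keeping a fixed RU/s provisioned for the whole month."""
    return provisioned_rus / 100 * STANDARD_RATE * HOURS_PER_MONTH


def autoscale_monthly_cost(t_max: float, peak_fraction: float,
                           off_peak_rus: float) -> float:
    """Cost of autoscale when a fraction of hours hit T_max and the
    remaining hours scale to off_peak_rus (a simplified two-level model)."""
    t_min = t_max / 10  # autoscale never bills below 10% of T_max
    billed_off_peak = min(max(off_peak_rus, t_min), t_max)
    peak_hours = HOURS_PER_MONTH * peak_fraction
    off_peak_hours = HOURS_PER_MONTH - peak_hours
    billed_ru_hours = peak_hours * t_max + off_peak_hours * billed_off_peak
    return billed_ru_hours / 100 * AUTOSCALE_RATE


standard = standard_monthly_cost(50_000)
autoscale = autoscale_monthly_cost(50_000, peak_fraction=0.25, off_peak_rus=15_000)
print(f"standard:  ${standard:,.2f}")
print(f"autoscale: ${autoscale:,.2f}")
```

Under these assumptions, standard throughput comes to roughly $2,880 for the month and autoscale to roughly $2,052. Raising the hypothetical off-peak level narrows the gap, which is why the shape of your traffic matters when choosing between the two.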

Announcing serverless on Azure Cosmos DB

Autoscale is a great fit for any situation where you need guaranteed throughput and performance, and your traffic isn’t predictable enough to scale your throughput manually. But what if your workload doesn’t require sustained throughput?

In some scenarios, you may expect your Azure Cosmos DB database to sit idle most of the time, only processing requests occasionally. This is typically the case when you get started with Azure Cosmos DB, build a prototype, or even run small, non-critical applications. Provisioning throughput isn’t required here; instead, you just need a cost-effective way to pay for the individual database requests you are sending.

To best serve these kinds of use cases, we are extremely excited to announce the upcoming preview of Azure Cosmos DB serverless, a purely consumption-based offer. With serverless, you will only pay for:

the request units (RU) consumed by your database operations,
the storage consumed by your data.

Because serverless is a true pay-per-request billing model, it will lower even further the entry price for anyone who wants to start using Azure Cosmos DB or run small applications with light traffic.

Azure Cosmos DB serverless will launch in public preview in the next couple of months and will be available for all Azure Cosmos DB APIs.

A tour of Azure Cosmos DB pricing models in video

Watch this video to better understand how autoscale provisioned throughput and serverless make Azure Cosmos DB a cost-effective solution for any kind of workload:

Get started

To get started with autoscale, check out our guide on how to determine if you should use autoscale for your workload. Learn more about how autoscale works and how to enable autoscale on a new or existing workload.

Author

Deborah Chen

Principal Product Manager

Deborah Chen is a Product Manager on Azure Cosmos DB. She focuses on building a great partitioning and elasticity experience to enable customers to get the best cost/performance in Cosmos DB, as well as investments to improve Cosmos DB for customers building multi-tenant apps. In her free time, she enjoys building demos to show the capabilities of

17 comments

Paul Huizer June 21, 2020

An app that has 24×7 ingress and occasional (5x a day) high load queries for reporting purposes would need a hybrid solution. ‘Manual’ for baseload, and PayPerUse on exceeding RUs. Any thoughts on that?

Mark Brown June 21, 2020

If your app is largely steady state with pre-determined periods of heavier load due to regularly run queries, then it would be more efficient to use standard throughput and combine that with this tool I wrote that schedules throughput changes using Azure Functions and PowerShell. https://github.com/Azure-Samples/azure-cosmos-throughput-scheduler

Hope this is helpful.
- Paul Huizer June 22, 2020
  
  This indeed sounds like the way to go. Thanks!

Fangyan Xu June 15, 2020

Hi, Deborah, I am very excited to see the autoscale and serverless launch. Our service is using CosmosDB and we would like to try them asap. I think autoscale will help a lot; I just wonder if we could use serverless in our case, as pay-per-request sounds more efficient.

It was mentioned in this article that serverless is more suitable for a non-critical service with low traffic. What is the definition of low traffic?

Mark Brown June 16, 2020

In general, workloads that at times have zero requests. When we release the preview we will include the pricing and guidance for how to choose the right throughput for your app.

Thanks.

Yair Ripshtos June 1, 2020

Great article, Deborah!
I have a question – if serverless is pay-per-request, why not use it in the first place? That way we won’t even have to configure the maximum RUs for the autoscale option and can just use Cosmos.

Kshitij Sharma June 4, 2020

My guess is the per-request pricing will be higher. So if your read/write volume is high this will cost more than the autoscale option.

Could be completely wrong. We will know when they announce the pricing.
- Mark Brown June 5, 2020
  
  Yes, this is correct. Serverless is intended for workloads where there is infrequent usage.

Maxim Rybkov May 22, 2020

Same limits per number of containers as in the Preview? Any news to increase number of containers?

Murray Bauer May 20, 2020

I couldn’t find the pricing for the new Serverless pricing model – when will this be available?

Mark Brown May 21, 2020

Serverless pricing will be available when we release to preview in a couple months.

Thanks.

Ather Shareef May 20, 2020

Cost Query:
Note: Unit (100 RU/s per hour) and single-region account

Cost of Manually configuring provisioned throughput is $0.008/hour -> A
Cost of Automatically configuring provisioned throughput with Autopilot is $0.012/hour -> B

i.e. an Autopilot container is 50% costlier than a normal container. Similarly, what will be the cost of Serverless containers? Will it be the same as A or B? How will it relate to A & B?

Mark Brown May 21, 2020

You can read here to get a better understanding of when autoscale is the better option. https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-choose-offer. Certainly there are scenarios where it is not cost effective, particularly for workloads with relatively stable and constant load.

Serverless pricing will be available when we release to preview in a couple months.

Thanks.

Modus Ponens May 20, 2020

Is it also supported in Azure Government cloud?

Mark Brown May 21, 2020

Yes, this is available in Gov Cloud. We do not yet have ARM template support for it but that should be available in < 2 weeks.

Thanks.

Andrew Moreno May 19, 2020

For autoscale per-hour pricing what would happen in the following scenario: I have a container with max of 4000 RUs (min 400) which is idle except for a 2 minute burst up to 4000 RUs at the end of an hour and into another hour, say from minute 59 of one hour to minute 01 of the next hour and then went back to being idle. Will I be billed at rate of 4000 RUs for both hours?

Deborah Chen Author May 19, 2020

Hi Andrew, thanks for the question! Autoscale bills based on the highest RU/s the system scaled to within the hour. So in your scenario, yes, it would be billed for 4000 RU/s in both hours. Autoscale is best suited for when you have consistent, but unpredictable traffic, so if your workload does sit idle for most of the time, the upcoming serverless preview may be a better fit.
