Azure Cosmos DB Java SDK v4 - Exploring the new Async API

In this second post of our series on the Azure Cosmos DB Java SDK v4 for Core (SQL) API, I’m going to explore our new Async API.

To get caught up, check out the first post in this series, Azure Cosmos DB Java SDK v4 – New Java SDK Quickstart Guide and Sample Code!

Current users of our Java SDK v2 are familiar with our Sync API and may have tried our Java SDK v3 with mixed results. We are really excited about our new Java SDK v4, not only because its performance is much better than that of our v3 SDK, but also because it implements both an Async API and a Sync API.

If you are a Sync API user, you may be wondering why you would want to use the Async API. The answer is that asynchronous calls let you better saturate your available throughput. This is important because you want to squeeze every ounce of performance out of the provisioned throughput you are paying for.

Another thing to understand about our new Async API is that it is built upon a Reactive programming model. Reactive programming is a declarative, data-flow programming paradigm: program operation and control flow are described as a stream of data items passing through a pipeline of operations, and each operation affects the data that flows downstream. In practical terms, your code describes a directed graph of operations which represents the logic of the program. Here’s a simple declarative data flow in pseudo-code:

asynchronous data source => operation1 => operation2 => operation3 => print
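
To make the pseudo-code a little more concrete, here is a minimal sketch of the same pipeline shape. It is an illustration only: it uses the JDK's CompletableFuture as a stand-in rather than Reactor, and the class name, method name, and the three operations are made up for this example. In Reactor the same shape would be written with operators on Mono or Flux.

```java
import java.util.concurrent.CompletableFuture;

class PipelineSketch {
    // Describes the pipeline: async source => operation1 => operation2 => operation3.
    // Each stage is declared up front and runs once the stage before it has produced a value.
    static CompletableFuture<String> buildPipeline(int input) {
        return CompletableFuture
                .supplyAsync(() -> input)         // asynchronous data source
                .thenApply(x -> x + 1)            // operation1 (hypothetical)
                .thenApply(x -> x * 2)            // operation2 (hypothetical)
                .thenApply(x -> "result=" + x);   // operation3 (hypothetical)
    }

    public static void main(String[] args) {
        // The print stage is attached last; join() only keeps the demo alive until it has run.
        buildPipeline(20).thenAccept(System.out::println).join(); // prints "result=42"
    }
}
```

Nothing in buildPipeline waits for data; it only describes what should happen to each value as it flows downstream.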

With the Async API, Java SDK v4 requires that you use the Reactor framework to describe the logic of your application. In the rest of this post, I’ll demonstrate some common Cosmos DB tasks with the Async API and Reactor. Then I’ll show the same tasks with the Sync API for comparison. When you’re done, take a look at our new Reactor Pattern Guide to help you get started with Reactive Programming!

Note: Real Azure Cosmos DB performance test results are shown below. Reproducing them will incur container throughput and storage costs.

Async API

The Async API sends requests to Azure Cosmos DB using Reactor Netty with asynchronous I/O at the OS level – therefore your application won’t block waiting for the response to each request. Instead your application will push out as many requests per second as your system hardware and your provisioned throughput allow. Meanwhile responses from the server will be handled as they arrive. This raises the upper bound on request throughput substantially.

The Async Request Throughput sample shows an example of creating a new database and container if they do not already exist, and then inserting new items using the Async API. Clients, databases, and containers have special async types in Java SDK v4 (such as CosmosAsyncClient). Notice that in the sample, the time between request and response is used to run background tasks.
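
The idea of using the time between request and response for other work can be sketched without the SDK. The following is not the SDK sample: it is a JDK-only illustration (Java 9 or later) in which fakeCreateItem is a hypothetical stand-in for an async insert, the latency is simulated, and the "background work" is just a counter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BackgroundWorkSketch {
    // Hypothetical stand-in for an async item insert: it completes after latencyMs
    // on another thread, so the caller is never blocked.
    static CompletableFuture<String> fakeCreateItem(String id, long latencyMs) {
        return CompletableFuture.supplyAsync(
                () -> "created " + id,
                CompletableFuture.delayedExecutor(latencyMs, TimeUnit.MILLISECONDS));
    }

    // Issues `requests` inserts back to back, then does other work until every
    // response has been handled. Returns {responsesHandled, backgroundSteps}.
    static long[] run(int requests, long latencyMs) {
        AtomicInteger responses = new AtomicInteger();
        CompletableFuture<?>[] pending = new CompletableFuture<?>[requests];
        for (int i = 0; i < requests; i++) {
            pending[i] = fakeCreateItem("item-" + i, latencyMs)
                    .thenAccept(r -> responses.incrementAndGet()); // response handler
        }
        CompletableFuture<Void> all = CompletableFuture.allOf(pending);
        long backgroundSteps = 0;
        while (!all.isDone()) {
            backgroundSteps++;       // placeholder for useful background work
            Thread.onSpinWait();
        }
        return new long[] {responses.get(), backgroundSteps};
    }

    public static void main(String[] args) {
        long[] result = run(100, 50);
        System.out.println("responses handled: " + result[0]
                + ", background steps while waiting: " + result[1]);
    }
}
```

The calling thread sends every request before any response has arrived, and keeps working while the responses are handled by callbacks.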

I configured an Azure VM to issue Async API requests to Azure Cosmos DB. As you can see below, the VM drove more than 64000 RU/s of async request throughput into a geographically co-located container using a single execution thread:

In all the tests I ran for this blog post, I changed container provisioned throughput from 400 RU/s to 100000 RU/s to fully demonstrate attainable throughput.

Now let’s see how Sync API compares.

Sync API

If you found the Reactor patterns hard to follow, a benefit of the Sync API is that the code can be simpler and easier to read.

One thing you may find surprising is that the Sync API is actually built on top of the Async API. Put simply: it makes an Async API call and then blocks on it. For example, a Sync API call to CosmosContainer.createItem() basically calls CosmosAsyncContainer.createItem() and then calls .block() on the result. .block() waits until the Async API gets a response. Notice this means we cannot use the time between request and response for other tasks.
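
The call-then-block pattern can be sketched in a few lines. This is an illustration under stated assumptions, not the SDK's implementation: it uses only the JDK (Java 9 or later), createItemAsync is a hypothetical method with simulated latency, and CompletableFuture.join() plays the role that .block() plays in Reactor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class SyncOverAsyncSketch {
    // Hypothetical async call: returns immediately, completes about 20 ms later.
    static CompletableFuture<String> createItemAsync(String id) {
        return CompletableFuture.supplyAsync(
                () -> "created " + id,
                CompletableFuture.delayedExecutor(20, TimeUnit.MILLISECONDS));
    }

    // The "sync" flavor: make the async call, then block the calling thread
    // until the response is available.
    static String createItem(String id) {
        return createItemAsync(id).join(); // join() stands in for .block()
    }

    public static void main(String[] args) {
        System.out.println(createItem("item-1")); // prints "created item-1"
    }
}
```

While createItem is waiting inside join(), the calling thread can do nothing else, which is the cost described above.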

The Sync Request Throughput sample does the same thing as the Async Request Throughput sample, but using only sync calls. In the line that starts with client = , .buildClient() chooses the Sync API for your application.

I updated my Azure VM to send requests using Sync API, and for comparison it drove ~1000 RU/s of sync request throughput into Azure Cosmos DB – much less than the >64000 RU/s attainable with Async API:

And this is still with 100000 RU/s provisioned throughput. The gap arises because the Sync API waits the full response time before sending the next request. As long as Async API calls are non-blocking, Async throughput does not depend on response time and will be higher than Sync API throughput.
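
The effect of waiting out the full response time can be reproduced in a toy setting. The sketch below is a simulation only, using the JDK (Java 9 or later) with a hypothetical fakeRequest and an artificial latency. It does not call Azure Cosmos DB, and its timings are unrelated to the RU/s figures measured in this post.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class ThroughputSketch {
    // Hypothetical request with simulated latency; completes without blocking the caller.
    static CompletableFuture<Integer> fakeRequest(int id, long latencyMs) {
        return CompletableFuture.supplyAsync(
                () -> id,
                CompletableFuture.delayedExecutor(latencyMs, TimeUnit.MILLISECONDS));
    }

    // Sync style: wait for each response before sending the next request,
    // so total time grows roughly as requests * latency.
    static long syncElapsedMs(int requests, long latencyMs) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            fakeRequest(i, latencyMs).join();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Async style: send every request first, then wait once for all responses,
    // so the latencies overlap instead of adding up.
    static long asyncElapsedMs(int requests, long latencyMs) {
        long start = System.nanoTime();
        CompletableFuture<?>[] pending = new CompletableFuture<?>[requests];
        for (int i = 0; i < requests; i++) {
            pending[i] = fakeRequest(i, latencyMs);
        }
        CompletableFuture.allOf(pending).join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("sync:  " + syncElapsedMs(20, 50) + " ms");
        System.out.println("async: " + asyncElapsedMs(20, 50) + " ms");
    }
}
```

In the real service the async upper bound is still limited by your hardware and provisioned throughput; the sketch only shows why blocking on every response caps the request rate.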

Conclusion

So that’s it! We think our new Async API is pretty great. But if you’re not there yet and want or need to keep using the Sync API, we’ve got you covered there too. If you’re ready to take that next step or just want to explore, we’ve got lots of resources to help you get started. Go check out our new Reactor Pattern Guide and our Java SDK v4 samples, as well as our updated performance tips for Async Java.

Enjoy, and keep an eye out for our next post, Azure Cosmos DB Java SDK v4, Post #3: How to make a Java SDK v4 app with Change Feed!

4 comments

Grzegorz Kalisz April 13, 2020

Do you know when the library will be generally available?
- Andy Feldman Author April 19, 2020
  
  Grzegorz, the anticipated GA is in May. Look for an announcement in the timeframe of Microsoft’s BUILD conference!
Luis Bosquez April 9, 2020

This blog post is TOP notch
- Mark Brown April 9, 2020
  
  Good Day to you Sir!