Ever since Azure Cosmos DB first released a Java SDK, many customers have asked us to improve the SDK so that they can max out request throughput with fewer client-side CPU cores. Server-side, it’s no problem to scale up Azure Cosmos DB for any demand. However, in high-performance applications, the client app often drives compute cores to high utilization in an attempt to saturate the container’s provisioned throughput. If adding more compute cores is cost-prohibitive, SDK efficiency and consistent performance under load become critical. Customers have used Java SDK v3 and v2 for these performance-intensive workloads in the past, with mixed results.
To that end, we are really excited to announce that Java SDK v4 is now GA. This release brings a lot of benefits to the SDK, such as improved performance, new features, flexible pricing, a more intuitive API, and plenty of sample code and documentation. Compared with what is currently available, we believe v4 is an all-around better SDK for all “Azure Cosmos DB SQL API + Java” applications, especially high-performance ones.
The Java SDK v4 Release Notes page has links to all of our Java SDK v4 documentation and samples and is the best starting-point for anything you want to learn about the SDK. In this blog post I’ll highlight the most exciting offerings in this new release.
How much does Java SDK v4 improve performance?
Java SDK v4 incorporates years of user feedback on the Java SDK. The result is substantial optimization of request throughput and stability under load. At the time of writing, the rule of thumb is that Java SDK v4 delivers a 20% performance boost over older Java SDKs (v3 and v2), with more improvement expected in the future.
Java SDK v4 bundles Sync and Async APIs into one SDK, allowing the user to choose one or the other at client setup time. This means you only need to add the single Java SDK v4 Maven artifact to pom.xml to get access to both APIs. My v4 Sync vs Async blog post is a good starting point for using the Async API to optimize throughput. Give this post a read to understand the benefits and tradeoffs of Sync and Async. As a rule of thumb, the Async API is the high-performance option because it uses threads more efficiently.
Java SDK v4 Direct Mode is based on TCP, not HTTPS. This addresses two customer asks. The first, from multiple customers, was to have TCP-based Direct Mode in Java SDK v4, for the crucial reason that the TCP-based protocol supports request multiplexing, a critical SDK optimization not supported by early versions of HTTPS. The TCP-based Direct Mode implementation in Java SDK v4 is highly optimized and has been benchmarked to have lower latency than earlier HTTPS-based implementations.
The second ask was that both Sync and Async APIs should have TCP-based Direct Mode, and that the implementations should work equally well so that Sync API users are not penalized. Java SDK v4 accomplishes this by construction: the Sync API is a blocking wrapper around the Async API, so it uses the same TCP-based Direct Mode implementation as the Async API.
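To illustrate the “blocking wrapper” idea in general terms, here is a minimal sketch that uses only the JDK’s CompletableFuture. AsyncStore and SyncStore are hypothetical names, not SDK classes, and the real SDK builds on Project Reactor rather than CompletableFuture. The only point is that a sync facade delegating to an async core shares the same underlying request path.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical async core: every operation returns a future right away.
class AsyncStore {
    private final Map<String, String> items = new ConcurrentHashMap<>();

    CompletableFuture<Void> upsert(String id, String body) {
        // runAsync stands in for a non-blocking network call.
        return CompletableFuture.runAsync(() -> { items.put(id, body); });
    }

    CompletableFuture<String> read(String id) {
        // Completes with null when the id is not present.
        return CompletableFuture.supplyAsync(() -> items.get(id));
    }
}

// Hypothetical sync facade: same code path, but each call blocks until done.
class SyncStore {
    private final AsyncStore async;

    SyncStore(AsyncStore async) {
        this.async = async;
    }

    void upsert(String id, String body) {
        async.upsert(id, body).join(); // block the caller until the write completes
    }

    String read(String id) {
        return async.read(id).join(); // block the caller until the read completes
    }
}
```

Because SyncStore holds no logic of its own, any optimization made in AsyncStore is picked up by sync callers automatically, which is the property the paragraph above describes.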
Keys to getting the best performance? Start by reviewing the Java SDK v4 performance tips and troubleshooting docs to make sure you have fully optimized your application. Then review Azure Cosmos DB best practices for data modeling and partitioning. In the long term these factors can have more impact on performance than the SDK!
Java SDK v4 saves on throughput with autoscale now GA!
This year at BUILD 2020 the Azure Cosmos DB Team announced the GA of autoscale throughput provisioning, which enables flexible usage-based pricing for throughput (RU/s). Autoscale helps you save on workloads with “variable, unpredictable traffic.”
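As a rough illustration of why this helps with spiky workloads, here is a simplified model in plain Java (not SDK code). It assumes the documented autoscale behavior of keeping throughput between 10% of the configured maximum and the maximum; AutoscaleModel and its method are hypothetical names, and the real service’s scaling and billing details are not modeled.

```java
// Simplified, hypothetical model of autoscale throughput (not part of the SDK).
class AutoscaleModel {
    // RU/s the service would scale to for a given demand, assuming autoscale
    // keeps throughput between 10% of the configured maximum and the maximum.
    static int scaledRuPerSecond(int maxRuPerSecond, int demandRuPerSecond) {
        int floor = maxRuPerSecond / 10;
        return Math.max(floor, Math.min(demandRuPerSecond, maxRuPerSecond));
    }
}
```

With a 4000 RU/s maximum, a quiet period with 100 RU/s of demand sits at the 400 RU/s floor, while a burst of 9000 RU/s is capped at 4000 RU/s.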
Java SDK v4 introduces full support for provisioning autoscale throughput programmatically, giving you fine-grained control over your savings. The ThroughputProperties class represents your choice of provisioned throughput and pricing model (manual or autoscale) for a container.
Autoscale shared throughput provisioning on databases is also supported.
You can find more Java SDK v4 + autoscale sample code on the Java SDK v4 samples page.
Java SDK v4 adds DISTINCT query
SQL queries with DISTINCT are recognized by Azure Cosmos DB on the server side; however, DISTINCT also requires some client-side functionality, which is implemented for the first time in Java SDK v4. DISTINCT is useful for removing duplicate results. For example, a DISTINCT query over the last names of individuals stored in an Azure Cosmos DB container returns each unique last name only once.
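One way to picture the client-side part: results can arrive in several pages (for example, from different partitions), and the client still has to drop values that repeat across pages. The sketch below is a plain-JDK illustration of that merging step under that assumption. DistinctMerger, the lastName property, and the sample names are hypothetical, and this is not the SDK’s actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper showing cross-page de-duplication (not part of the SDK).
class DistinctMerger {
    // Example query text; the property name lastName is an assumption.
    static final String QUERY = "SELECT DISTINCT VALUE c.lastName FROM c";

    // Merge pages of results, keeping the first occurrence of each value.
    static List<String> mergeDistinct(List<List<String>> pages) {
        Set<String> seen = new LinkedHashSet<>(); // preserves arrival order
        for (List<String> page : pages) {
            seen.addAll(page);
        }
        return new ArrayList<>(seen);
    }
}
```

Given one page containing Andersen and Wakefield and a second page containing Wakefield and Miller, mergeDistinct returns Andersen, Wakefield, Miller.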
You can find more query sample code on the Java SDK v4 samples page.
Java SDK v4 – now with Analytical Time-to-Live (TTL)!
Also aligned with BUILD 2020, Azure Cosmos DB announced the public preview of Azure Synapse Link for Azure Cosmos DB HTAP capability and the Azure Cosmos DB Analytical Store – you can get an overview of the Azure Cosmos DB + Azure Synapse Link story here.
With analytical store enabled on your Azure Cosmos DB account, it’s easy to tier data retention between transactional store and analytical store by separately configuring TTL for each.
Each container has two independent TTL settings: one for the transactional store and one for the analytical store.
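To make the tiering idea concrete, here is a small plain-JDK model of how two separate TTLs play out for a single item. It is only an illustration: TtlTiering is a hypothetical name, it is not SDK code, and it assumes the convention that a TTL of -1 means items are kept indefinitely.

```java
import java.time.Duration;

// Hypothetical model of per-store retention (not part of the SDK).
class TtlTiering {
    // Assumed convention: -1 means items never expire.
    static final int NEVER_EXPIRE = -1;

    // True if an item of the given age is still retained under this TTL.
    static boolean isRetained(int ttlSeconds, Duration age) {
        return ttlSeconds == NEVER_EXPIRE || age.getSeconds() < ttlSeconds;
    }
}
```

For example, with a 30-day transactional TTL and an analytical TTL of -1, a 90-day-old item has aged out of the transactional store but is still available in the analytical store.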
You can find Java SDK v4 + Analytical Store sample code on the Java SDK v4 samples page. Look here for more on Transactional Store TTL in Azure Cosmos DB.
Upgrade to Java SDK v4
Thinking of upgrading to Java SDK v4? Here’s a quick summary of what’s new:
Java SDK | Release Date | Bundled APIs | Maven Jar | Java package name | API Reference | Release Notes |
---|---|---|---|---|---|---|
Async 2.x.x | June 2018 | Async(RxJava) | com.microsoft.azure::azure-cosmosdb | com.microsoft.azure.cosmosdb.rx | API | Release Notes |
Sync 2.x.x | Sept 2018 | Sync | com.microsoft.azure::azure-documentdb | com.microsoft.azure.documentdb | API | Release Notes |
3.x.x | July 2019 | Async(Reactor)/Sync | com.microsoft.azure::azure-cosmos | com.azure.data.cosmos | API | – |
4.0 | June 2020 | Async(Reactor)/Sync | com.azure::azure-cosmos | com.azure.cosmos | API | Release Notes |
We recommend upgrading to Java SDK v4 for the best performance, the newest features, and continued long-term support. We know upgrading is hard, which is why we provide migration guides that ease the transition. Take a look and give it your best shot!
- Java SDK v4 migration guide – a guide to the process of migrating to Java SDK v4 from previous versions, including a summary of breaking API changes as well as side-by-side code snippets comparing v4 with older versions.
- RxJava vs Project Reactor guide – a guide with side-by-side examples of upgrading Async Java SDK v2 code to Java SDK v4, with a focus on converting RxJava framework code to Project Reactor framework code.
Get started
It’s alright if you’re completely new to the Java SDK – follow these three steps to get started fast:
- Install JDK 8, the minimum supported Java runtime for the SDK.
- Work through the Quickstart Guide for Java SDK v4, which gets you access to the Java SDK v4 Maven artifact and walks you through basic Azure Cosmos DB requests.
- Read the Java SDK v4 performance tips and troubleshooting guides to optimize the SDK for your application.
Then visit the Java SDK v4 Release Notes page for the rest of our documentation and sample code.
Happy coding!