June 30th, 2025

Latest NoSQL Java Ecosystem Updates: June 2024 – June 2025

Theo van Kraay
Principal Program Manager

Welcome to the latest roundup of key updates across the Azure Cosmos DB Java ecosystem!

The largest external customers of Azure Cosmos DB API for NoSQL, running some of the biggest and most mission-critical workloads in Azure, are primarily Java users! From powerful new AI integrations to improvements in the Java SDK, Spring Data, Spark, and Kafka connectors, the past year has been transformative for developers building cloud-native and AI-powered applications. It’s never been easier or more powerful to build modern Java applications on Azure Cosmos DB!

Stay tuned for more updates in the future. Happy coding!


🤖 AI Integrations

In the past 12 months, Azure Cosmos DB has rolled out native support for AI development in Java, including integrations with Spring AI and LangChain4j, two leading frameworks for building AI applications.

Spring AI

LangChain4j

Azure Cosmos DB is now an ideal choice for building AI applications in Java. With native SDK support for vector indexing, full text search, hybrid search, and seamless integration with AI frameworks, developers can create intelligent apps that are fast, scalable, and easy to manage.


Java SDK Enhancements

Hybrid and Full Text Search (PR #42885)

Azure Cosmos DB now supports native Full Text Search (FTS) and Hybrid Search across structured and vector data. You can filter by semantic meaning and rank results by relevance.
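
Hybrid search merges a full text ranking and a vector ranking into a single ordering, and the hybrid query in this section does so with the RRF function. As a rough illustration of reciprocal rank fusion in its textbook form, the sketch below scores each document as the sum of 1 / (k + rank) over the input rankings. The class name, the tie-breaking rule, and the constant k = 60 are assumptions for illustration; the service's internal scoring may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Azure Cosmos DB SDK.
final class ReciprocalRankFusion {

    // Each inner list is one ranking of document ids, best match first.
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties fall back to id order (an arbitrary choice)
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }
}
```

A document that ranks reasonably well in both lists can outrank one that is first in one list and last in the other, which is the behavior hybrid ranking is after.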

Example queries:

SELECT TOP 50 c.id, c.abstract, c.title
FROM c
WHERE FullTextContainsAll(c.abstract, 'quantum', 'theory')
ORDER BY RANK FullTextScore(c.abstract, ['quantum', 'theory'])

SELECT TOP 50 c.id, c.abstract, c.title
FROM c
ORDER BY RANK RRF(FullTextScore(c.abstract, ['quantum']), VectorDistance(c.Embedding, [%s]))
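
The [%s] placeholder in the hybrid query stands for the query embedding, written as a comma-separated list of numbers. A minimal sketch of filling it in with String.format is shown here; the class and method names are hypothetical, and the embedding itself is assumed to come from whatever embedding model the application already uses.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of the Azure Cosmos DB SDK.
final class HybridQueryBuilder {

    private static final String TEMPLATE =
        "SELECT TOP 50 c.id, c.abstract, c.title FROM c "
        + "ORDER BY RANK RRF(FullTextScore(c.abstract, ['quantum']), "
        + "VectorDistance(c.Embedding, [%s]))";

    // Renders the vector as "v1, v2, ..." using locale-independent formatting.
    static String formatEmbedding(float[] embedding) {
        StringJoiner joiner = new StringJoiner(", ");
        for (float value : embedding) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    // Substitutes the rendered vector into the [%s] placeholder.
    static String build(float[] embedding) {
        return String.format(TEMPLATE, formatEmbedding(embedding));
    }
}
```

Only numeric values are interpolated here. Any user-supplied text, such as search terms, would be better passed as query parameters than concatenated into the string.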

Full Text Indexing Policy (PR #42278)

Azure Cosmos DB containers now support full text indexing natively through the indexing policy. This makes it easy to declare which paths should be searchable.

"fullTextPolicy": {
  "defaultLanguage": "en-US",
  "fullTextPaths": [
    { "path": "/abstract", "language": "en-US" }
  ]
}

Quantized Vector Indexing Enhancements for Flat, QuantizedFlat, and DiskANN (PR #42333)

Two new tuning knobs are now supported across vector index types – including Flat, QuantizedFlat, and DiskANN:

  • quantizationByteSize: controls trade-off between recall and latency.
  • indexingSearchListSize: size of candidate list during index build.

These settings allow deeper control for advanced vector search scenarios.

For more info, see our documentation on Vector index and query vectors in Azure Cosmos DB for Java.

Dynamic Request Options (PR #40061)

This feature allows developers to modify request options at runtime, such as consistency level, diagnostic thresholds, or throughput control settings. It enables dynamic configuration changes without needing to restart the application.

Use-case: Integrate Cosmos DB with your custom configuration service and tune SDK behavior dynamically.

CosmosAsyncClient client = new CosmosClientBuilder()
    .endpoint("https://your-account.documents.azure.com")
    .key("your-key")
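    .addOperationPolicy(cosmosOperationDetails -> {
        // Invoked for each operation, so edits to app.config take effect without a restart
        Properties config = new Properties();
        try (FileInputStream fis = new FileInputStream("app.config")) {
            config.load(fis);
            CosmosRequestOptions options = new CosmosRequestOptions();
            // The property value must match a ConsistencyLevel constant name, e.g. SESSION
            options.setConsistencyLevel(ConsistencyLevel.valueOf(config.getProperty("consistency")));
            cosmosOperationDetails.setRequestOptions(options);
        } catch (IOException e) {
            // Config could not be read: this operation keeps its existing request options
        }
    })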
    .buildAsyncClient();
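
The sample above opens and parses app.config on every operation, which keeps the example short but puts file I/O on the request path. One possible refinement, sketched here under the assumption that a background task or your configuration service triggers the refresh, is to cache the parsed properties and let the operation policy do an in-memory lookup. CachedConfig and its methods are hypothetical names, not SDK types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of the Azure Cosmos DB SDK.
final class CachedConfig {

    private final Path file;
    // Swapped atomically so readers never observe a half-loaded Properties object
    private final AtomicReference<Properties> current = new AtomicReference<>(new Properties());

    CachedConfig(Path file) {
        this.file = file;
        reload();
    }

    // Re-reads the file; on failure the last successfully loaded values stay in place.
    void reload() {
        Properties fresh = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            fresh.load(in);
            current.set(fresh);
        } catch (IOException e) {
            // Keep serving the previous snapshot
        }
    }

    // Cheap in-memory lookup, safe to call from the operation policy.
    String get(String key, String fallback) {
        return current.get().getProperty(key, fallback);
    }
}
```

Inside the operation policy, the file read would then be replaced by a call such as get("consistency", "SESSION"), with reload() scheduled at whatever interval suits the application.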

Extract Sub-Range Continuation Tokens (PR #42156)

This utility allows customers to extract individual continuation tokens from a combined change feed token. This is helpful for breaking a query into multiple sub-ranges and processing them in parallel.

List<String> tokens = CosmosChangeFeedContinuationTokenUtils.extractContinuationTokens(continuationToken);
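
Once the combined token has been split, each sub-range token can be handed to its own worker. The sketch below shows one way to fan the tokens out over a fixed thread pool using only the JDK. SubRangeFanOut is a hypothetical name, and the worker function stands in for whatever change feed processing the application performs per token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the Azure Cosmos DB SDK.
final class SubRangeFanOut {

    // Applies the worker to every token on a thread pool; results come back in token order.
    static <R> List<R> processInParallel(List<String> tokens, Function<String, R> worker, int maxThreads) {
        if (tokens.isEmpty()) {
            return new ArrayList<>();
        }
        int threads = Math.max(1, Math.min(maxThreads, tokens.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String token : tokens) {
                Callable<R> task = () -> worker.apply(token);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that sub-range is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for sub-range workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A sub-range worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the tokens list from the line above, a call would look like processInParallel(tokens, token -> drain(token), 4), where drain is an application-defined method that reads the change feed from that token.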

Complete Change Feed Queries (PR #42160)

The new setCompleteAfterAllCurrentChangesRetrieved(true) flag lets change feed queries finish automatically once all current changes have been read.

Ideal for batch workloads or event sourcing pipelines.

Read Consistency Strategy (Beta) (PR #45161)

Historically, Azure Cosmos DB has offered five consistency levels (Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual). These consistency levels governed both write durability (RPO) and read freshness (data staleness), and while the write consistency level could be set per account, read consistency was limited to reductions only (e.g., using Eventual reads from an account configured with Strong consistency).

This model, while simplifying configuration, has sometimes created confusion and rigidity – especially for customers choosing Bounded Staleness just to achieve quorum reads. In multi-region or multi-region write scenarios, Bounded Staleness has been shown to be misleading and problematic, notably interfering with newer high-availability features like Per-Partition Automatic Failover (PPAF). To address this, the SDK now introduces a new abstraction in beta preview: ReadConsistencyStrategy, enabling more flexible and accurate control over read behavior independently from write durability.

Key Benefits:

  • Customize read consistency per operation or at the client level.
  • Bypass misleading or limiting configurations like Bounded Staleness.
  • Improve compatibility with features like PPAF.
  • Enable developers to safely use Eventual or Session consistency defaults without sacrificing stronger read guarantees.

New ReadConsistencyStrategy Enum

public enum ReadConsistencyStrategy {
    DEFAULT,           // Honors default consistency level settings
    EVENTUAL,          // Eventually consistent read
    SESSION,           // Session consistency per session token
    LATEST_COMMITTED,  // Latest version committed in preferred region
    GLOBAL_STRONG      // Strong consistency across regions
}

Configure Strategy on Client

CosmosClient client = new CosmosClientBuilder()
    .readConsistencyStrategy(ReadConsistencyStrategy.LATEST_COMMITTED)
    .buildClient();

This overrides any consistency level set at the account or client level unless the strategy is explicitly overridden in request options.

Override Strategy Per Request

CosmosItemRequestOptions options = new CosmosItemRequestOptions()
    .setReadConsistencyStrategy(ReadConsistencyStrategy.GLOBAL_STRONG);

Together, the two snippets above set a default strategy on the client and override it for a specific operation.

This API allows granular control – e.g., a session-consistent read in an eventual-consistency environment.
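
The precedence described above (request options first, then the client-level strategy, then the configured consistency level) can be modeled in a few lines. This is a sketch of the rule as stated in this post, not the SDK's implementation. The local Strategy enum mirrors the SDK enum's constants, and treating DEFAULT or null as "not set" is an assumption.

```java
// Hypothetical model of the precedence rule, not part of the Azure Cosmos DB SDK.
final class ReadStrategyResolver {

    // Local mirror of the constants listed in the ReadConsistencyStrategy enum
    enum Strategy { DEFAULT, EVENTUAL, SESSION, LATEST_COMMITTED, GLOBAL_STRONG }

    // Returns the strategy that would apply to a single read.
    static Strategy effective(Strategy requestLevel, Strategy clientLevel) {
        if (requestLevel != null && requestLevel != Strategy.DEFAULT) {
            return requestLevel; // an explicit per-request override wins
        }
        if (clientLevel != null && clientLevel != Strategy.DEFAULT) {
            return clientLevel; // otherwise the client-wide strategy applies
        }
        // Nothing set: fall back to the configured consistency level
        return Strategy.DEFAULT;
    }
}
```

For the two snippets above, effective(GLOBAL_STRONG, LATEST_COMMITTED) yields GLOBAL_STRONG for the overridden request, while requests without an override resolve to LATEST_COMMITTED.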

🔁 If you override a request to use SESSION consistency while the client is not configured for it, be sure to enable session token capture explicitly via sessionCapturingOverrideEnabled(true).

Use cases unlocked by this change include:

  • Ensuring quorum reads in multi-region reads without needing bounded staleness.
  • Allowing globally strong reads without globally strong writes.
  • Facilitating hybrid strategies for read-heavy workloads.

This marks a major evolution in Azure Cosmos DB’s consistency model, allowing developers to fine-tune trade-offs between performance, freshness, and availability on a per-operation basis.

⚠️ Note: This feature is in beta (preview) and is currently supported only when using direct mode.

Per-Partition Automatic Failover (PR #44099)

Improves availability for single-write, multi-region accounts by automatically failing over reads and writes at the partition level when a region is unavailable.

Ideal for mission-critical apps that require high resilience and lower impact during regional outages.

Explore our full blog for more info on this game-changing feature for balancing high availability and consistency: Announcement blog


Spring Data for Azure Cosmos DB

Improved Exception Handling (PR #42902)

The Spring Data Cosmos module now throws more specific exceptions like CosmosBadRequestException and CosmosUnauthorizedException, instead of a generic CosmosAccessException.

This enables cleaner and more precise exception handling logic.

Improved findAllByIds() Performance with readMany() (PR #43759)

If the partition key matches the document ID, Spring Data will now automatically optimize findAllByIds() using the readMany() API for better performance.


Apache Spark Connector

CosmosClientBuilderInterceptor (PR #40714)

Spark developers can now inject logic into the CosmosClient creation process to attach custom monitoring or diagnostics.

spark.conf.set("spark.cosmos.account.clientBuilderInterceptors", "com.example.MyInterceptor")

Support for Non-Public Azure Clouds (PR #45310)

Run Spark workloads in government, China, or private Azure environments by configuring custom Entra ID and ARM endpoints.

spark.conf.set("spark.cosmos.account.azureEnvironment", "Custom")
spark.conf.set("spark.cosmos.account.azureEnvironment.management", "https://mygovcloud.management")
spark.conf.set("spark.cosmos.account.azureEnvironment.aad", "https://mygovcloud.aad")

UDFs for Partition Mapping (PR #43092)

Added GetFeedRangesForContainer and GetOverlappingFeedRange UDFs to make it easier to partition Databricks tables based on Cosmos DB feed ranges.

Improves performance and parallelism for distributed joins.

Continuation Token Size Config (PR #44480)

Limit continuation token size during queries to avoid token size overflows and client errors:

spark.conf.set("spark.cosmos.read.responseContinuationTokenLimitInKb", "16")

🎙️ Kafka Connector

Version 2 Now GA!

Read more: Kafka Connector v2 is GA

The GA release of Cosmos DB Kafka Connector V2 brings production-grade support with:

  • At-least-once and exactly-once delivery
  • Topic-based offset tracking
  • Robust failure recovery

V2 is now the recommended connector for production workloads.


Fixes, patches, and enhancements

In addition to all of the above features, we have of course made a large number of smaller bug fixes, security patches, enhancements, and improvements. You can track all the changes for each client library, along with the minimum version we recommend you use, by viewing the change logs:

Get Started with Java in Azure Cosmos DB

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

To easily build your first database, watch our Get Started videos on YouTube and explore ways to dev/test free.

Author

Theo van Kraay
Principal Program Manager

Principal Program Manager on the Azure Cosmos DB engineering team. Currently focused on AI, programmability, and developer experience for Azure Cosmos DB.
