May 21st, 2024

Enhancements in the Kafka Connector for Azure Cosmos DB: A New Era of Scalability and Flexibility

Theo van Kraay
Principal Program Manager

The Azure Fabric team announced a preview of Azure Cosmos DB CDC as a source for event streams at BUILD 2024. This capability uses a brand-new major version of the Azure Cosmos DB Kafka Connector behind the scenes, and we are excited to introduce that version, 2.0, in beta preview!

This new version brings significant improvements in both the source and sink connectors, enhancing scalability, performance, and flexibility for developers working with Azure Cosmos DB. Let’s dive into the key advancements and what they mean for your data processing needs.

Enhanced Source Connector: Unleashing Scalability

One of the primary focuses of this update has been the source connector. Previously, the source connector required at least one Kafka task per container. Version 2.0 removes this constraint: a single task can now read from multiple containers, which drastically reduces overhead when reading from many containers and improves the scalability of change feed reads.

The new version supports multiple containers per task, making read performance more efficient.
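
To make the difference concrete, here is a minimal Python sketch of how a small, fixed number of tasks could share many containers. The `assign_containers` helper, the container names, and the round-robin policy are illustrative assumptions; the connector decides its own work distribution, and this is not its actual logic.

```python
from collections import defaultdict


def assign_containers(containers, max_tasks):
    """Spread containers across at most `max_tasks` tasks, round-robin.

    Hypothetical helper for illustration only; it is not part of the
    connector and does not reflect its real assignment algorithm.
    """
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    # Never create more tasks than there are containers to read.
    task_count = min(max_tasks, len(containers))
    assignment = defaultdict(list)
    for index, container in enumerate(containers):
        assignment[index % task_count].append(container)
    return dict(assignment)


# Made-up container names.
containers = ["orders", "customers", "inventory", "shipments", "returns"]

# Version 1.0 style: at least one task per container, so five tasks here.
print("tasks needed with one task per container:", len(containers))

# Version 2.0 style: two tasks can cover all five containers.
print(assign_containers(containers, max_tasks=2))
```

With five containers and `max_tasks=2`, the first task ends up with three containers and the second with two, instead of five separate tasks each holding its own threads and connections.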

In addition to lifting the task limitation, the source connector now uses the change feed pull model, which is simpler and a more efficient fit for the Kafka environment. This model reduces the memory footprint and improves overall performance by minimizing the number of threads required. As a result, users get faster data processing with lower resource consumption.

Benchmark testing shows a substantial increase in performance with version 2.0 compared to 1.0.
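
The pull model itself is easy to picture in code. The following is a simplified, in-memory sketch, not the connector's implementation: `read_changes` and `poll_once` are hypothetical names, each change feed is modeled as a plain list, and the continuation is an integer index, whereas the real change feed returns opaque continuation tokens. The point it illustrates is that the reader asks for the next page when it is ready, so one polling loop can serve several containers without background threads.

```python
def read_changes(change_log, continuation, max_items):
    """Return up to `max_items` changes after `continuation`, plus the new position."""
    batch = change_log[continuation:continuation + max_items]
    return batch, continuation + len(batch)


def poll_once(change_logs, offsets, max_items=2):
    """One polling pass by a single task over every container assigned to it.

    `offsets` maps container name -> last position read and is updated in place.
    """
    records = []
    for container, log in change_logs.items():
        batch, offsets[container] = read_changes(
            log, offsets.get(container, 0), max_items
        )
        records.extend((container, change) for change in batch)
    return records


# Made-up data: two containers with a few pending changes each.
change_logs = {"orders": ["o1", "o2", "o3"], "customers": ["c1"]}
offsets = {}

print(poll_once(change_logs, offsets))  # first page from each container
print(poll_once(change_logs, offsets))  # remaining change from "orders"
print(poll_once(change_logs, offsets))  # nothing new, so an empty list
```

Because the caller drives every read, the amount of data held in memory at any moment is bounded by `max_items` per container in this sketch.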

Improved Sink Connector: Flexibility in Data Handling

The sink connector has also seen significant improvements. In version 1.0, users were limited to the “item overwrite” write strategy. Version 2.0 introduces multiple write strategies, including “item delete,” “item create only,” and “item update if the ETag hasn’t changed.” These new strategies offer greater flexibility, allowing users to tailor their data handling to specific use cases.

Version 2.0 introduces a variety of write strategies, providing more flexibility for data handling.
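
As a rough illustration of how these strategies differ, here is a small in-memory Python sketch. The `apply_write` function, the strategy strings, and the `etag-N` values are all hypothetical stand-ins chosen for this example; they are not the connector's configuration values, and the real connector applies these semantics against Azure Cosmos DB rather than a dictionary.

```python
import itertools

# Stand-in for server-generated ETags (assumption: any value that changes on write).
_etag_counter = itertools.count(1)


def apply_write(store, strategy, item, expected_etag=None):
    """Apply one sink record to `store` (id -> (etag, item)).

    Returns True if the store changed, False if the write was skipped.
    """
    item_id = item["id"]
    current = store.get(item_id)
    if strategy == "overwrite":
        pass  # always write, replacing any existing item
    elif strategy == "create_only":
        if current is not None:
            return False  # item already exists, leave it alone
    elif strategy == "delete":
        return store.pop(item_id, None) is not None
    elif strategy == "update_if_unchanged":
        if current is None or current[0] != expected_etag:
            return False  # item missing, or modified since it was read
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    store[item_id] = (f"etag-{next(_etag_counter)}", item)
    return True


store = {}
apply_write(store, "overwrite", {"id": "1", "qty": 1})            # written
apply_write(store, "create_only", {"id": "1", "qty": 2})          # skipped: exists
etag = store["1"][0]
apply_write(store, "update_if_unchanged", {"id": "1", "qty": 3},
            expected_etag="stale")                                # skipped: ETag differs
apply_write(store, "update_if_unchanged", {"id": "1", "qty": 3},
            expected_etag=etag)                                   # written
apply_write(store, "delete", {"id": "1"})                         # removed
```

The ETag check is what makes the conditional update safe under concurrent writers: a record computed from a stale read is dropped rather than silently overwriting a newer version.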

Additional Features and Enhancements

Version 2.0 also introduces several other important features:

  • Throughput Control: Now included in the connector, allowing users to manage and control the data ingestion rate.
  • Metrics Support: Integrated metrics collection for better monitoring and debugging, leveraging the Azure SDK’s capabilities.
  • Enhanced Security: Support for service principal with client secrets is now included, with plans to expand to managed identities and certificate-based authentication in upcoming updates.

Another critical improvement is the way metadata is handled. The connector now uses Kafka’s native offset tracking mechanism instead of a lease container, reducing the risk of data loss and improving reliability. Additionally, a low-usage metadata topic or container is used to handle partition splits and merges, ensuring seamless scalability and consistent data processing.

Improved metadata handling ensures accurate offset tracking and seamless scalability.
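
One way to picture the split handling is as bookkeeping over a map of offsets. The sketch below is a simplified model built on assumptions: the `split_range` helper, the `(container, range)` key layout, the hex range bounds, and the token strings are hypothetical, and it assumes child ranges resume from the parent's last recorded continuation. The connector's actual metadata format may differ.

```python
def split_range(offsets, container, parent, children):
    """Replace a parent range's offset entry with entries for its child ranges.

    Each child starts from the parent's last recorded continuation, so no
    change is skipped across the split (assumption for this sketch).
    """
    continuation = offsets.pop((container, parent))
    for child in children:
        offsets[(container, child)] = continuation
    return offsets


# Offsets keyed by (container, feed range); values are made-up continuation tokens.
offsets = {("orders", ("", "FF")): "token-42"}

# The full range splits into two halves; both inherit "token-42".
split_range(offsets, "orders", ("", "FF"), [("", "7F"), ("7F", "FF")])
print(offsets)
```

A merge would be the reverse bookkeeping step, which is why a small side store for range metadata is enough and a full lease container is not needed.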

Moving Forward

Version 2.0 of the Kafka Connector for Azure Cosmos DB is currently in beta, with a full release expected soon. You can already try the new features in the beta to see how they improve your data processing workflows. For more details, check out the V2 section of the Azure Cosmos DB Kafka Connector GitHub repository.

Stay tuned for more updates and enhancements as the team continues to refine and expand the capabilities of the Azure Cosmos DB Kafka Connector, making it an even more powerful tool for your data processing needs!

Get Started with Java in Azure Cosmos DB

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

To easily build your first database, watch our Get Started videos on YouTube and explore ways to dev/test free.

Author

Theo van Kraay
Principal Program Manager

Principal Program Manager on the Azure Cosmos DB engineering team. Focused on Apache Cassandra offerings, Java ecosystem, high availability, and customer success.
