{"id":3306,"date":"2021-08-13T18:02:51","date_gmt":"2021-08-14T01:02:51","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=3306"},"modified":"2021-08-18T09:31:30","modified_gmt":"2021-08-18T16:31:30","slug":"now-generally-available-azure-cosmos-db-kafka-source-sink-connectors","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/now-generally-available-azure-cosmos-db-kafka-source-sink-connectors\/","title":{"rendered":"Now Generally Available \u2013 Azure Cosmos DB Kafka Source &#038; Sink Connectors"},"content":{"rendered":"<p>Source and sink Apache Kafka connectors for Azure Cosmos DB are now generally available, in collaboration with Confluent. The connectors can also be used in self-managed clusters if preferred, with no code changes needed in the application tier and controlled fully through configurations.<\/p>\n<h2><\/h2>\n<p>&nbsp;<\/p>\n<h2>Kafka Connector Use Cases<\/h2>\n<h3>Microservice Architectures<\/h3>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/microservice-architecture.png\"><img decoding=\"async\" class=\"alignnone wp-image-3309\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/microservice-architecture-300x243.png\" alt=\"Image microservice architecture\" width=\"533\" height=\"432\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/microservice-architecture-300x243.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/microservice-architecture.png 574w\" sizes=\"(max-width: 533px) 100vw, 533px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><strong><span style=\"font-size: 10pt;\"><em>Figure 1: Microservices communicating with one another in a database agnostic manner by leveraging Kafka<\/em><\/span><\/strong><\/p>\n<p>Microservice architectures involve multiple services operating independently to serve a business function. These microservices may be built on different data stores, with a variety of formats and schemas. Some may be backed by relational data stores, some on horizontally scalable NoSQL services like Azure Cosmos DB and others on analytical and warehousing stores.<\/p>\n<p>However, these microservices will need to communicate with one another and share data for serving additional business functions. Furthermore, data from these microservices may be needed for serving downstream functions such as analytics, aggregations and archiving among others.<\/p>\n<p>In the absence of a common platform like Kafka, communication pipelines between services cannot be data and format agnostic, with the need for deeply complex and intertwined dependencies. With Kafka\u2019s rich ecosystem of source and sink connectors for a plethora of database services, microservices can communicate with each another and other downstream services seamlessly, while continuing to operate independently on a database that best fits their immediate function.<\/p>\n<p>&nbsp;<\/p>\n<h3>Data Migration<\/h3>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/data-migrations.png\"><img decoding=\"async\" class=\"wp-image-3310 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/data-migrations-300x82.png\" alt=\"Image data migrations\" width=\"725\" height=\"198\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/data-migrations-300x82.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/data-migrations-768x209.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/data-migrations.png 795w\" sizes=\"(max-width: 725px) 100vw, 725px\" \/><\/a><span style=\"font-size: 10pt;\"><strong><em>Figure 2: Data migration architecture leveraging Kafka as a middle-man<\/em><\/strong><\/span><\/p>\n<p>Data modernization efforts frequently involve migrations from to highly scalable and highly available services like Azure Cosmos DB. Tables from relational data stores are moved into Kafka topics either through CDC functionality or source Kafka connectors. Once staged, they can be converted into JSON documents through varying degrees of transformations before being ingested into Azure Cosmos DB at scale.<\/p>\n<p>Smaller migrations may involve a simple movement of data with minimal transformations, while the more common large scale migrations will involve additional components in the Kafka ecosystem such as <a href=\"https:\/\/docs.confluent.io\/platform\/current\/streams\/index.html#kafka-streams\">Kafka Streams<\/a> and <a href=\"https:\/\/docs.confluent.io\/platform\/current\/ksqldb\/index.html#ksql-home\">ksqlDB<\/a> in conjunction with Kafka Connect.<\/p>\n<p>&nbsp;<\/p>\n<h2>Source &amp; Sink Connector Architecture<\/h2>\n<h3>Source Connector<\/h3>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/source-connector.png\"><img decoding=\"async\" class=\"wp-image-3311 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/source-connector-300x143.png\" alt=\"Image source connector\" width=\"810\" height=\"386\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/source-connector-300x143.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/source-connector-768x367.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/source-connector.png 795w\" sizes=\"(max-width: 810px) 100vw, 810px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><strong><em>Figure 3: Architecture of the Kafka Source Connector for Azure Cosmos DB<\/em><\/strong><\/span><\/p>\n<p>The source connector for Azure Cosmos DB is built using the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/sql-api-sdk-java-v4\">Java v4 client<\/a> by leveraging the inbuilt Change Feed Processor as well as the Apache Kafka Producer library.<\/p>\n<p>Much like a Kafka Consumer that is responsible for retrieving changes from each broker for the topic, handle offsets and periodically checkpoint progress, the Change Feed Processor for Azure Cosmos DB is responsible for picking up changes across the underlying physical partitions for the container, managing lease ownership and publishing continuation tokens (equivalent of offsets in Kafka) to track progress.<\/p>\n<p>With a Kafka Connect cluster configured with the Azure Cosmos DB source connector, the following operations are performed seamlessly under the covers:<\/p>\n<ul>\n<li>A new instance of the Java client is instantiated per Worker in the Kafka Connect cluster<\/li>\n<li>A list of physical partitions for the Azure Cosmos DB container is retrieved by each client instance<\/li>\n<li>Ownership of physical partitions is assigned to each of the client instances across the cluster through a lease container<\/li>\n<li>Each client instance retrieves changes from the physical partitions it owns<\/li>\n<li>Periodic checkpointing is done to track progress made by the Azure Cosmos DB clients<\/li>\n<li>Periodic health checks are made to ensure a new owner is assigned if a worker in the Kafka Cluster becomes unavailable<\/li>\n<li>Data retrieved from the Azure Cosmos DB source container is converted into the format specified in the converter through Single Message Transforms (SMTs) and written to the Kafka topic<\/li>\n<li>Error handling and dead letter queues are also handled within the source connector<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3>Sink Connector<\/h3>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/sink-connector.png\"><img decoding=\"async\" class=\"wp-image-3312 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/sink-connector-300x135.png\" alt=\"Image sink connector\" width=\"858\" height=\"386\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/sink-connector-300x135.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/sink-connector-768x346.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/08\/sink-connector.png 796w\" sizes=\"(max-width: 858px) 100vw, 858px\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt;\"><strong><em>Figure 4: Architecture of the Kafka Sink Connector for Azure Cosmos DB<\/em><\/strong><\/span><\/p>\n<p>The sink connector for Azure Cosmos DB is built using the Apache Kafka Consumer library in conjunction with the Azure Cosmos DB <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/sql-api-sdk-java-v4\">Java v4 client<\/a>.<\/p>\n<p>With a Kafka Connect cluster configured with the Azure Cosmos DB sink connector, the following operations are performed seamlessly without the need for manual intervention:<\/p>\n<ul>\n<li>Consumers within the Consumer Group retrieve the broker partitions for the source Kafka topic<\/li>\n<li>The Consumer Group assigns the topic&#8217;s partitions to each of the Consumers (Workers) within the cluster<\/li>\n<li>The Consumers poll the broker partitions for incoming messages for the topic<\/li>\n<li>Offsets are periodically committed to capture state<\/li>\n<li>If a Consumer becomes unavailable, other Consumers will resume picking up changes from the last committed offset for the partitions owned by the original Consumer<\/li>\n<li>Messages retrieved from the Kafka cluster are converted to JSON if needed and written to the Azure Cosmos DB container through the Azure Cosmos DB client<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>Getting Started<\/h2>\n<ul>\n<li><a href=\"https:\/\/www.confluent.io\/hub\/microsoftcorporation\/kafka-connect-cosmos\">Download<\/a> the source and sink connectors from Confluent<\/li>\n<li>Find <a href=\"https:\/\/github.com\/microsoft\/kafka-connect-cosmosdb\/blob\/dev\/doc\/README_Source.md\">examples on configuring the source connector<\/a> in a self-managed or fully managed Kafka Connect cluster<\/li>\n<li>Find <a href=\"https:\/\/github.com\/microsoft\/kafka-connect-cosmosdb\/blob\/dev\/doc\/README_Sink.md\">examples on configuring the sink connector<\/a> in a self-managed or fully managed Kafka Connect cluster<\/li>\n<li>Find examples in Confluent\u2019s blog <a href=\"https:\/\/www.confluent.io\/blog\/announcing-confluent-cloud-azure-cosmos-db-connector\/?utm_source=linkedin&amp;utm_medium=organicsocial&amp;utm_campaign=tm.devx_ch.bp-announcing-azure-cosmos-db-sink-connector-in-confluent-cloud_content.connecting-to-apache-kafka\">here<\/a> to create a fully managed Connect cluster with the Azure Cosmos DB sink connector on Confluent Cloud<\/li>\n<li>Register for the <a style=\"background-color: #f7f7f9; font-size: 1rem;\" href=\"https:\/\/www.confluent.io\/online-talks\/unlock-data-by-connecting-confluent-cloud-with-azure-cosmos-db\/\">online talk on 08\/30<\/a><span style=\"font-size: 1rem;\"> for a technical deep dive and demo of the Azure Cosmos DB sink connector<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Explore the architecture of the source and sink Apache Kafka Connectors for Azure Cosmos DB along with their popular areas of usage.<\/p>\n","protected":false},"author":64774,"featured_media":3310,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"image","meta":{"_acf_changed":false,"footnotes":""},"categories":[14],"tags":[],"class_list":["post-3306","post","type-post","status-publish","format-image","has-post-thumbnail","hentry","category-core-sql-api","post_format-post-format-image"],"acf":[],"blog_post_summary":"<p>Explore the architecture of the source and sink Apache Kafka Connectors for Azure Cosmos DB along with their popular areas of usage.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/64774"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=3306"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3306\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/3310"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=3306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=3306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=3306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}