{"id":796,"date":"2020-04-14T09:00:18","date_gmt":"2020-04-14T16:00:18","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=796"},"modified":"2020-04-13T14:14:37","modified_gmt":"2020-04-13T21:14:37","slug":"bulk-improvements-net-sdk","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/bulk-improvements-net-sdk\/","title":{"rendered":"Bulk support improvements for Azure Cosmos DB .NET SDK"},"content":{"rendered":"<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/introducing-bulk-support-in-the-net-sdk\/\" target=\"_blank\" rel=\"noopener noreferrer\">Bulk support<\/a> has been available since version 3.4.0 of the <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/sql-api-sdk-dotnet-standard\" target=\"_blank\" rel=\"noopener noreferrer\">Azure Cosmos DB .NET SDK<\/a>. In this post we&#8217;ll go over the improvements released in the recent 3.8.0 SDK and how they affect your bulk operations.<\/p>\n<h3>All aboard<\/h3>\n<p>As we described in our <a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/introducing-bulk-support-in-the-net-sdk\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous post<\/a>, the SDK groups concurrent operations based on partition affinity and dispatches them as a single request. 
This means that if the data is distributed evenly across a wide range of partition key values (with different partition affinity), the SDK can create multiple independent groups of operations and dispatch them in parallel.<\/p>\n<p><img decoding=\"async\" class=\"size-full wp-image-801 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/04\/bulkgroup.png\" alt=\"Bulk groups operations by partition affinity\" width=\"518\" height=\"629\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/04\/bulkgroup.png 518w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/04\/bulkgroup-247x300.png 247w\" sizes=\"(max-width: 518px) 100vw, 518px\" \/><\/p>\n<p>In our initial Bulk implementation, we allowed multiple parallel requests as long as they had a different partition affinity, but only a single in-flight request for each partition.<\/p>\n<p>Think of it as a <strong>train station<\/strong>. There is <strong>one train track<\/strong> per partition, and each operation is a <strong>passenger<\/strong> inside a train. Multiple trains can depart at the same time as long as they are on <strong>different tracks<\/strong>, but on the same track only one train can be moving at a time: the next one departs only when the previous one <strong>comes back<\/strong>.<\/p>\n<h3>What&#8217;s changed?<\/h3>\n<p>In 3.8.0, we added a <a href=\"https:\/\/github.com\/Azure\/azure-cosmos-dotnet-v3\/pull\/1074\" target=\"_blank\" rel=\"noopener noreferrer\">congestion control component<\/a> to Bulk. Now, the SDK can send multiple requests in parallel for the same partition. If it detects throttling, it will start to limit the degree of parallelism on that partition. 
As long as throttling continues, the SDK keeps lowering that degree of parallelism until the volume of requests matches the available throughput (or the parallelism reaches its minimum of 1).<\/p>\n<p>Following the train example, trains heading to the same destination (partition) can now <strong>depart as soon as they are full<\/strong>, regardless of whether the previous train has come back. If congestion builds up at the destination, the station manager reduces the number of trains that can be en route on the same track at any given time until traffic is balanced.<\/p>\n<h3>What are the expected improvements?<\/h3>\n<p>This change means that we are <strong>increasing the data flow<\/strong> from the client and reactively decreasing it if the <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/set-throughput\" target=\"_blank\" rel=\"noopener noreferrer\">provisioned throughput<\/a> is not enough. Early performance numbers show an increase in throughput after this change. For example, in a container provisioned with 300,000 RU\/s, we are now able to insert 4.8 million documents in 2 minutes versus 4.3 million before (about <strong>12% more<\/strong> in the same time).<\/p>\n<p>When measuring time taken, in a container provisioned with 1 million RU\/s, inserting 5 million documents now takes 77 seconds versus 350 seconds before (almost <strong>80% less time<\/strong>).<\/p>\n<p>In scenarios where the provisioned throughput is low relative to the data volume (for example, inserting 10,000 documents in a container provisioned with 3,000 RU\/s), we see an increase in throttling, but the overall elapsed time remains in line with what it was before.<\/p>\n<h3>Next steps<\/h3>\n<ul>\n<li>Get the performance benefits by <a href=\"https:\/\/www.nuget.org\/packages\/Microsoft.Azure.Cosmos\" target=\"_blank\" rel=\"noopener noreferrer\">updating to the latest SDK<\/a>.<\/li>\n<li>If you are still using the Bulk Executor library, see our <a 
href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/how-to-migrate-from-bulk-executor-library\" target=\"_blank\" rel=\"noopener noreferrer\">new migration guide to V3 SDK<\/a>.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn about the improvements made on Bulk support to increase throughput usage in the latest Azure Cosmos DB .NET SDK.<\/p>\n","protected":false},"author":9477,"featured_media":61,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14,19],"tags":[],"class_list":["post-796","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-sql-api","category-tips-and-tricks"],"acf":[],"blog_post_summary":"<p>Learn about the improvements made on Bulk support to increase throughput usage in the latest Azure Cosmos DB .NET SDK.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/796","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/9477"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=796"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/796\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/61"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=796"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=796"},{"taxonomy":"post_t
ag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=796"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}