{"id":1719,"date":"2021-11-30T07:27:16","date_gmt":"2021-11-30T15:27:16","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sdk\/?p=1719"},"modified":"2021-11-30T07:40:09","modified_gmt":"2021-11-30T15:40:09","slug":"tuning-your-uploads-and-downloads-with-the-azure-storage-client-library-for-net","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sdk\/tuning-your-uploads-and-downloads-with-the-azure-storage-client-library-for-net\/","title":{"rendered":"Tuning your uploads and downloads with the Azure Storage client library for .NET"},"content":{"rendered":"<p>When transferring data with the Azure Storage client libraries, a lot is happening behind-the-scenes. These workings can affect speed, memory usage, and sometimes whether the transfer succeeds. This post will help you get the most out of Storage client library data transfers.<\/p>\n<p>These concepts apply to the <a href=\"https:\/\/www.nuget.org\/packages\/Azure.Storage.Blobs\">Azure.Storage.Blobs<\/a> and <a href=\"https:\/\/www.nuget.org\/packages\/Azure.Storage.Files.DataLake\">Azure.Storage.Files.DataLake<\/a> packages. Specifically, we&#8217;re looking at APIs that accept <a href=\"https:\/\/docs.microsoft.com\/dotnet\/api\/azure.storage.storagetransferoptions\">StorageTransferOptions<\/a> as a parameter. 
Commonly used examples are:<\/p>\n<ul>\n<li><code>BlobClient.UploadAsync(Stream stream, ...)<\/code><\/li>\n<li><code>BlobClient.UploadAsync(string path, ...)<\/code><\/li>\n<li><code>BlobClient.DownloadToAsync(Stream stream, ...)<\/code><\/li>\n<li><code>BlobClient.DownloadToAsync(string path, ...)<\/code><\/li>\n<li><code>DataLakeFileClient.UploadAsync(Stream stream, ...)<\/code><\/li>\n<li><code>DataLakeFileClient.UploadAsync(string path, ...)<\/code><\/li>\n<li><code>DataLakeFileClient.ReadToAsync(Stream stream, ...)<\/code><\/li>\n<li><code>DataLakeFileClient.ReadToAsync(string path, ...)<\/code><\/li>\n<\/ul>\n<h2><code>StorageTransferOptions<\/code><\/h2>\n<p><code>StorageTransferOptions<\/code> is the key class for tuning your performance. Storage transfers are partitioned into several subtransfers based on the values in this class. Here, you define values for the following properties, which are the basis for managing your transfer:<\/p>\n<ul>\n<li><code>MaximumConcurrency<\/code>: the maximum number of parallel subtransfers that can take place at once.\n<ul>\n<li>From launch until the present (<code>Azure.Storage.Blobs<\/code> 12.10.0 and <code>Azure.Storage.Files.DataLake<\/code> 12.8.0), only asynchronous operations can parallelize transfers. Synchronous operations will ignore this value and work in sequence.<\/li>\n<li>The effectiveness of this value is subject to the restrictions set by .NET&#8217;s connection pool limit, which may hinder you by default. 
For more information about these restrictions, see this <a href=\"https:\/\/devblogs.microsoft.com\/azure-sdk\/net-framework-connection-pool-limits\/\">blog post<\/a>.<\/li>\n<\/ul>\n<\/li>\n<li><code>MaximumTransferSize<\/code>: the maximum data size of a subtransfer, in bytes.\n<ul>\n<li>To keep data moving, the client libraries may not reach this value on every subtransfer.<\/li>\n<li>Different REST APIs support different maximum transfer sizes, and those values have changed across service versions. Check the documentation for your service version to determine the limits you can select for this value.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>You can also define a value for <code>InitialTransferSize<\/code>. Contrary to what the name suggests, your <code>MaximumTransferSize<\/code> does <strong>not<\/strong> limit this value. In fact, you often want <code>InitialTransferSize<\/code> to be <em>at least<\/em> as large as your <code>MaximumTransferSize<\/code>, if not larger. <code>InitialTransferSize<\/code> defines a separate data size limit for an initial attempt to do the entire operation at once with no subtransfers. Using a single transfer cuts down on overhead, leading to faster transfers for some data lengths based on your <code>MaximumTransferSize<\/code>. If you&#8217;re unsure what&#8217;s best for you, setting this property to the same value used for <code>MaximumTransferSize<\/code> is a safe option.<\/p>\n<p>Although the properties of this class are nullable, the client libraries apply a default for each value you don&#8217;t provide. These defaults are fine in a data center environment, but likely unsuitable for home consumer environments. Poorly tuned <code>StorageTransferOptions<\/code> can result in excessively long operations and even timeouts. 
You should always be proactive in determining your values for this class.<\/p>\n<h2>Uploads<\/h2>\n<p>The Storage client libraries will split a given upload stream into various subuploads based on provided <code>StorageTransferOptions<\/code>, each with their own dedicated REST call. With <code>BlobClient<\/code>, this operation will be <a href=\"https:\/\/docs.microsoft.com\/rest\/api\/storageservices\/put-block\">Put Block<\/a> and with <code>DataLakeFileClient<\/code>, this operation will be <a href=\"https:\/\/docs.microsoft.com\/rest\/api\/storageservices\/datalakestoragegen2\/path\/update\">Append Data<\/a>. The Storage client libraries manage these REST operations in parallel (depending on transfer options) to complete the total upload.<\/p>\n<p><em>Note: block blobs have a maximum block count of 50,000. Your blob, then, has a maximum size of 50,000 times <code>MaximumTransferSize<\/code>.<\/em><\/p>\n<h3>Buffering on uploads<\/h3>\n<p>The Storage REST layer doesn&#8217;t support picking up a REST upload where you left off. Individual transfers are either completed or lost. To ensure resiliency, if a stream isn&#8217;t seekable, the Storage client libraries will buffer the data for each individual REST call before starting the upload. Outside of network speed, this behavior is also why you may be interested in setting a smaller value for <code>MaximumTransferSize<\/code> even when uploading in sequence. <code>MaximumTransferSize<\/code> is the maximum division of data to be retried after a connection failure.<\/p>\n<p>If uploading with parallel REST calls to maximize network throughput, the client libraries need sources they can read from in parallel. 
Since streams are sequential, when uploading in parallel, the Storage client libraries will buffer the data for each individual REST call before starting the upload <strong>even if the provided stream is already seekable<\/strong>.<\/p>\n<p>To avoid the Storage client libraries buffering your data for upload, you must provide a seekable stream and ensure <code>MaximumConcurrency<\/code> is set to 1. While this strategy should suffice in most situations, your code could be using other features of the client libraries that require buffering anyway. In this case, buffering will still occur.<\/p>\n<h3><code>InitialTransferSize<\/code> on upload<\/h3>\n<p>When a seekable stream is provided, its length is checked against this value. If the stream length is within this value, the entire stream will be uploaded as a single REST call. Otherwise, upload will be done in parts as described previously in this document.<\/p>\n<p><em>Note: when using <code>BlobClient<\/code>, an upload within the <code>InitialTransferSize<\/code> will be performed using <a href=\"https:\/\/docs.microsoft.com\/rest\/api\/storageservices\/put-blob\">Put Blob<\/a>, rather than Put Block.<\/em><\/p>\n<p><code>InitialTransferSize<\/code> has no effect on an unseekable stream and will be ignored.<\/p>\n<h2>Downloads<\/h2>\n<p>The Storage client libraries will split a given download request into various subdownloads based on provided <code>StorageTransferOptions<\/code>, each with their own dedicated REST call. The client libraries manage these REST operations in parallel (depending on transfer options) to complete the total download.<\/p>\n<h3>Buffering on downloads<\/h3>\n<p>Receiving multiple HTTP responses simultaneously with body contents will have memory implications. However, the Storage client libraries don&#8217;t explicitly add a buffer step for downloaded contents. Incoming responses are processed in order. 
The client libraries configure a 16-kilobyte buffer for copying from the HTTP response stream to the caller-provided destination stream or file path.<\/p>\n<h3><code>InitialTransferSize<\/code> on download<\/h3>\n<p>The Storage client libraries will make one download range request using <code>InitialTransferSize<\/code> before anything else. Once that range is downloaded, the total resource size is known. If the initial request downloaded the whole content, we&#8217;re done! Otherwise, the download steps described previously will begin.<\/p>\n<h2>Summary<\/h2>\n<p><code>StorageTransferOptions<\/code> contains the tools to optimize your transfers. It provides options that affect transfer speeds and memory usage. Unless you&#8217;re working with trivial file sizes, be proactive in configuring these options based on the environment in which your client will run.<\/p>\n<p><!-- FOOTER: DO NOT EDIT OR REMOVE --><\/p>\n<p><div  class=\"d-flex justify-content-center\"><a class=\"cta_button_link btn-primary mb-24\" href=\"https:\/\/aka.ms\/azsdk\/releases\" target=\"_blank\">Azure SDK Releases<\/a><\/div><\/p>\n<h2>Azure SDK Blog Contributions<\/h2>\n<p>Thanks for reading this Azure SDK blog post. We hope you learned something new, and we welcome you to share the post. We&#8217;re open to Azure SDK blog contributions from our readers. 
To get started, contact us at <a href=\"mailto:azsdkblog@microsoft.com\">azsdkblog@microsoft.com<\/a> with your idea, and we&#8217;ll set you up as a guest blogger.<\/p>\n<ul>\n<li>Azure SDK Website: <a href=\"https:\/\/aka.ms\/azsdk\">aka.ms\/azsdk<\/a><\/li>\n<li>Azure SDK Intro (3-minute video): <a href=\"https:\/\/aka.ms\/azsdk\/intro\">aka.ms\/azsdk\/intro<\/a><\/li>\n<li>Azure SDK Intro Deck (PowerPoint deck): <a href=\"https:\/\/aka.ms\/azsdk\/intro\/deck\">aka.ms\/azsdk\/intro\/deck<\/a><\/li>\n<li>Azure SDK Releases: <a href=\"https:\/\/aka.ms\/azsdk\/releases\">aka.ms\/azsdk\/releases<\/a><\/li>\n<li>Azure SDK Blog: <a href=\"https:\/\/aka.ms\/azsdk\/blog\">aka.ms\/azsdk\/blog<\/a><\/li>\n<li>Azure SDK Twitter: <a href=\"https:\/\/twitter.com\/AzureSDK\">twitter.com\/AzureSDK<\/a><\/li>\n<li>Azure SDK Design Guidelines: <a href=\"https:\/\/aka.ms\/azsdk\/guide\">aka.ms\/azsdk\/guide<\/a><\/li>\n<li>Azure REST API Guidelines: <a href=\"https:\/\/aka.ms\/azapi\/guidelines\">aka.ms\/azapi\/guidelines<\/a><\/li>\n<li>Azure SDKs &amp; Tools: <a href=\"https:\/\/azure.microsoft.com\/downloads\">azure.microsoft.com\/downloads<\/a><\/li>\n<li>Azure SDK Central Repository: <a href=\"https:\/\/github.com\/azure\/azure-sdk#azure-sdk\">github.com\/azure\/azure-sdk<\/a><\/li>\n<li>Azure SDK for .NET: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-net\">github.com\/azure\/azure-sdk-for-net<\/a><\/li>\n<li>Azure SDK for Java: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-java\">github.com\/azure\/azure-sdk-for-java<\/a><\/li>\n<li>Azure SDK for Python: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-python\">github.com\/azure\/azure-sdk-for-python<\/a><\/li>\n<li>Azure SDK for JavaScript\/TypeScript: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-js\">github.com\/azure\/azure-sdk-for-js<\/a><\/li>\n<li>Azure SDK for Android: <a 
href=\"https:\/\/github.com\/Azure\/azure-sdk-for-android\">github.com\/Azure\/azure-sdk-for-android<\/a><\/li>\n<li>Azure SDK for iOS: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-ios\">github.com\/Azure\/azure-sdk-for-ios<\/a><\/li>\n<li>Azure SDK for Go: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-go\">github.com\/Azure\/azure-sdk-for-go<\/a><\/li>\n<li>Azure SDK for C: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-c\">github.com\/Azure\/azure-sdk-for-c<\/a><\/li>\n<li>Azure SDK for C++: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-cpp\">github.com\/Azure\/azure-sdk-for-cpp<\/a><\/li>\n<\/ul>\n<p><!-- FOOTER: DO NOT EDIT OR REMOVE --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to get better performance out of your Azure Storage transfers and avoid timeouts.<\/p>\n","protected":false},"author":42728,"featured_media":1722,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[701,750,706,703,738],"class_list":["post-1719","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sdk","tag-net","tag-azure-sdk","tag-azuresdk","tag-clientlibraries","tag-storage"],"acf":[],"blog_post_summary":"<p>Learn how to get better performance out of your Azure Storage transfers and avoid 
timeouts.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/1719","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/users\/42728"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/comments?post=1719"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/1719\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media\/1722"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media?parent=1719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/categories?post=1719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/tags?post=1719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}