To build scalable applications, it’s important to understand how your downstream dependencies scale and what limits you can hit.
The majority of Azure services expose functionality over HTTP REST APIs. The Azure SDKs, in turn, wrap the HTTP communication into an easy-to-use set of client and model types.
Every time you call a method on a Client class, an HTTP request is sent to the service. Sending an HTTP request requires a socket connection to be established between the client and the server. Establishing a connection is an expensive operation that can take longer than the processing of the request itself. To combat this, .NET maintains a pool of HTTP connections that can be reused instead of opening a new one for each request.
This post details the specifics of HTTP connection pooling, which differ based on the .NET runtime you are using, and ways to tune it to make sure connection limits don’t negatively affect your application’s performance.
NOTE: most of this is not applicable to applications using .NET Core. See the .NET Core section below for details.
.NET Framework
Connection pooling in the .NET Framework is controlled by the ServicePointManager class, and the most important fact to remember is that the pool, by default, is limited to 2 connections to a particular endpoint (host+port pair) in non-web applications, and to unlimited connections per endpoint in ASP.NET applications that have autoConfig enabled (without autoConfig the limit is set to 10). After the maximum number of connections is reached, HTTP requests are queued until one of the existing connections becomes available again.
Imagine writing a console application that uploads files to Azure Blob Storage. To speed up the process you decide to upload using 20 parallel threads. With the default connection pool limit, even though you have 20 BlockBlobClient.UploadAsync calls running in parallel, only 2 of them would actually be uploading data and the rest would be stuck in the queue.
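The scenario above can be sketched as follows. The connection string, container name, and source directory are hypothetical; on .NET Framework with the default limit, only 2 of these 20 uploads would transfer data at any one time.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

// Hypothetical connection string and container name.
var container = new BlobContainerClient("<connection-string>", "uploads");

string[] files = Directory.GetFiles(@"C:\data");

// 20 uploads started in parallel; with the default ServicePointManager
// limit of 2, only 2 requests use a connection at a time and the rest
// wait in the connection pool queue.
IEnumerable<Task> uploads = files.Take(20).Select(async path =>
{
    BlockBlobClient blob = container.GetBlockBlobClient(Path.GetFileName(path));
    using FileStream content = File.OpenRead(path);
    await blob.UploadAsync(content);
});

await Task.WhenAll(uploads);
```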
NOTE: The connection pool is centrally managed on .NET Framework. Every ServicePoint has one or more connection groups, and the limit is applied to the connections in a connection group. HttpClient creates a connection group per client, so every HttpClient instance gets its own limit, while instances of HttpWebRequest reuse the default connection group and all share the same limit (unless ConnectionGroupName is set). All Azure SDK clients by default use a shared instance of HttpClient and as such share the same pool of connections across all of them.
Symptoms of connection pool starvation
There are three main symptoms of connection pool starvation:
- Timeouts in the form of TaskCanceledException
- Latency spikes under load
- Low throughput
Every outgoing HTTP request has a timeout associated with it (typically 100 seconds), and the time spent waiting for a connection counts toward that timeout. If no connection becomes available after the 100 seconds elapse, the SDK call fails with a TaskCanceledException.
NOTE: because most Azure SDK clients are set up to retry intermittent connection issues, they try sending the request multiple times before surfacing the failure, so it might take a multiple of the default timeout before the exception is raised.
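While diagnosing, you can make failures surface sooner by lowering the retry count through the client options. A minimal sketch using SecretClient; the values are illustrative, not recommendations:

```csharp
using System;
using Azure.Core;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

var options = new SecretClientOptions();

// With fewer attempts, a starved pool surfaces as a TaskCanceledException
// after roughly one timeout instead of a multiple of it.
options.Retry.MaxRetries = 1;
options.Retry.Mode = RetryMode.Exponential;

var client = new SecretClient(
    new Uri("<vault-endpoint>"),
    new DefaultAzureCredential(),
    options);
```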
A typical exception caused by connection pool starvation might look like the following:
System.AggregateException : Retry failed after 4 tries. (A task was canceled.) (A task was canceled.) (A task was canceled.) (A task was canceled.)
----> System.Threading.Tasks.TaskCanceledException : A task was canceled.
----> System.Threading.Tasks.TaskCanceledException : A task was canceled.
----> System.Threading.Tasks.TaskCanceledException : A task was canceled.
----> System.Threading.Tasks.TaskCanceledException : A task was canceled.
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Pipeline.HttpPipelineSynchronousPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline)
at Azure.Core.Tests.PipelineTestBase.ExecuteRequest(HttpMessage message, HttpPipeline pipeline, CancellationToken cancellationToken)
Long-running requests with larger payloads or on slow network connections are more susceptible to timeout exceptions because they typically occupy connections for longer.
Another, less obvious, symptom of connection pool starvation is latency spikes. Take a web application that typically serves around 10 customers at the same time. Because the connection requirement is mostly under or just near the limit, it operates with optimal performance. But a rise in client count can push it over the connection pool limit, making parallel requests compete for limited connection pool resources and increasing response latency.
Low throughput in parallelized workloads can be another symptom. Take the console application discussed in the previous part. Considering that the local disk and network connection are fast and a single upload doesn’t saturate the entire network connection, adding more parallel uploads should increase network utilization and improve overall throughput. But if the application is limited by the connection pool size, this won’t happen.
Avoid undisposed response streams
Another common way to starve the connection pool is by not disposing unbuffered streams returned by some client SDK methods.
Most Azure SDK client methods buffer and deserialize the response for you. But some methods operate on large blocks of data that are impractical to load fully into memory, and instead return an active network stream that allows the data to be read and processed in chunks.
These methods expose the stream as part of the Value inside the Response&lt;T&gt;. One common example of such a method is BlockBlobClient.DownloadAsync, which returns Response&lt;BlobDownloadInfo&gt;, with BlobDownloadInfo exposing the stream through its Content property.
Each of these streams represents a network connection borrowed from the pool, and it is only returned when the stream is disposed or read to the end. By not doing so you are “burning” connections, permanently shrinking the pool. This can quickly lead to a situation where there are no more connections available for sending requests, and all requests fail with a timeout exception.
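A sketch of reading a download response safely: BlobDownloadInfo is disposable, and disposing it (or reading the Content stream to the end) returns the connection to the pool. The blob client construction and destination path are assumed.

```csharp
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

// blob is an existing BlockBlobClient (construction omitted).
async Task CopyBlobToFileAsync(BlockBlobClient blob, string destination)
{
    // The using declarations guarantee the underlying network stream is
    // returned to the connection pool even if CopyToAsync throws.
    using BlobDownloadInfo download = (await blob.DownloadAsync()).Value;
    using FileStream file = File.Create(destination);
    await download.Content.CopyToAsync(file);
}
```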
Changing the limits
You can use app.config/web.config files to change the limit, or do it in code. It’s also possible to change the limit on a per-endpoint basis.
// Raise the limit for a single endpoint.
ServicePoint servicePoint = ServicePointManager.FindServicePoint(new Uri("http://my-example-account.blob.core.windows.net/"));
servicePoint.ConnectionLimit = 40;

// Raise the default limit for all endpoints.
ServicePointManager.DefaultConnectionLimit = 20;
<configuration>
<system.net>
<connectionManagement>
<add address="http://my-example-account.blob.core.windows.net" maxconnection="40" />
<add address="*" maxconnection="20" />
</connectionManagement>
</system.net>
</configuration>
We recommend setting the limit to the maximum number of parallel requests you expect to send, then load testing/monitoring your application to achieve optimal performance.
NOTE: Default limits are applied when the first request is issued to a particular endpoint. After that changing the global value won’t have any effect on existing connections.
The limit can also be changed by providing a customized HttpClient instance when constructing an Azure SDK client:
// Applies only to this HttpClient instance.
using HttpClient httpClient = new HttpClient(
    new HttpClientHandler()
    {
        MaxConnectionsPerServer = 40
    });

SecretClientOptions options = new SecretClientOptions
{
    Transport = new HttpClientTransport(httpClient)
};

SecretClient client = new SecretClient(new Uri("<endpoint>"), new DefaultAzureCredential(), options);
.NET Core
There was a major change in connection pool management in .NET Core. Connection pooling happens at the HttpClient level, and the pool size is not limited by default. This means that HTTP connections are automatically scaled to satisfy your workload, and you shouldn’t be affected by the issues described in this post.
Issues with an infinite connection pool size
Setting the connection pool size to infinite might sound like a good idea, but it has its own set of issues. Azure limits the number of network connections a Virtual Machine or App Service instance can make, and exceeding the limit causes connections to be slowed down or terminated. If your application produces spikes of outbound requests, an adjustment using ServicePointManager on .NET Framework or the MaxConnectionsPerServer property on .NET Core/.NET Framework might be required to avoid exceeding the limit.
You can read more about troubleshooting intermittent outbound connection errors in Azure App Service and load balancer limits.
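On .NET Core, such a cap can be applied through the handler when constructing the shared HttpClient; a sketch assuming the client is then passed to SDK clients via HttpClientTransport as shown earlier (the values are illustrative):

```csharp
using System;
using System.Net.Http;

// SocketsHttpHandler is the default handler on .NET Core.
var handler = new SocketsHttpHandler
{
    // Cap concurrent connections per endpoint to stay under
    // platform outbound-connection limits.
    MaxConnectionsPerServer = 50,

    // Recycle pooled connections periodically so DNS changes are picked up.
    PooledConnectionLifetime = TimeSpan.FromMinutes(5)
};

var httpClient = new HttpClient(handler);
```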
Future improvements in Azure.Core
We are making changes in the upcoming Azure.Core library (#15263) to automatically increase the connection pool size to 50 for Azure endpoints in applications where the global setting is kept at the default value of 2. The change will ship in the Azure.Core 1.5.1 October release.
Azure SDK Blog Contributions
Thank you for reading this Azure SDK blog post! We hope that you learned something new and welcome you to share this post. We are open to Azure SDK blog contributions. Please contact us at azsdkblog@microsoft.com with your topic and we’ll get you setup as a guest blogger.
Azure SDK Links
- Azure SDK Website: aka.ms/azsdk
- Azure SDK Intro (3 minute video): aka.ms/azsdk/intro
- Azure SDK Intro Deck (PowerPoint deck): aka.ms/azsdk/intro/deck
- Azure SDK Releases: aka.ms/azsdk/releases
- Azure SDK Blog: aka.ms/azsdk/blog
- Azure SDK Twitter: twitter.com/AzureSDK
- Azure SDK Design Guidelines: aka.ms/azsdk/guide
- Azure SDKs & Tools: azure.microsoft.com/downloads
- Azure SDK Central Repository: github.com/azure/azure-sdk
- Azure SDK for .NET: github.com/azure/azure-sdk-for-net
- Azure SDK for Java: github.com/azure/azure-sdk-for-java
- Azure SDK for Python: github.com/azure/azure-sdk-for-python
- Azure SDK for JavaScript/TypeScript: github.com/azure/azure-sdk-for-js
- Azure SDK for Android: github.com/Azure/azure-sdk-for-android
- Azure SDK for iOS: github.com/Azure/azure-sdk-for-ios
- Azure SDK for Go: github.com/Azure/azure-sdk-for-go
- Azure SDK for C: github.com/Azure/azure-sdk-for-c
- Azure SDK for C++: github.com/Azure/azure-sdk-for-cpp