Inspired by Stephen Toub’s blog posts about performance in .NET, we are writing a similar post to highlight the performance improvements made to ASP.NET Core in 6.0.
Benchmarking Setup
We use BenchmarkDotNet for most of the examples throughout this post. The repo at https://github.com/BrennanConroy/BlogPost60Bench includes most of the benchmarks used here.
Most of the benchmark results in this post were generated with the following command line:
dotnet run -c Release -f net48 --runtimes net48 netcoreapp3.1 net5.0 net6.0
followed by selecting a specific benchmark to run from the list.
This tells BenchmarkDotNet:
- Build everything in a release configuration.
- Build it targeting the .NET Framework 4.8 surface area.
- Run each benchmark on each of .NET Framework 4.8, .NET Core 3.1, .NET 5, and .NET 6.
Some benchmarks were only run on .NET 6 (e.g., when comparing two ways of coding something on the same version):
dotnet run -c Release -f net6.0 --runtimes net6.0
and others were run on only a subset of the versions, e.g.:
dotnet run -c Release -f net5.0 --runtimes net5.0 net6.0
I’ll include the command used to run each benchmark as it comes up.
Most of the results in the post were generated by running the above benchmarks on Windows, primarily so that .NET Framework 4.8 could be included in the result set. However, unless otherwise called out, all of these benchmarks generally show comparable improvements when run on Linux or macOS. Simply make sure you have installed each runtime you want to measure. The benchmarks were run with a nightly build of .NET 6 RC1, along with the latest released downloads of .NET 5 and .NET Core 3.1.
Span<T>
In every release since Span<T> was added in .NET Core 2.1, we have converted more code to use spans, both internally and as part of the public API, to improve performance. This release is no exception.
PR dotnet/aspnetcore#28855 removed a temporary string allocation in PathString that came from string.Substring when adding two PathString instances; it now uses a Span<char> for the temporary string instead. In the benchmark below we use a short string and a longer string to show the performance difference from avoiding the temporary string.
dotnet run -c Release -f net48 --runtimes net48 net5.0 net6.0 --filter *PathStringBenchmark*
```csharp
private PathString _first = new PathString("/first/");
private PathString _second = new PathString("/second/");
private PathString _long = new PathString("/longerpathstringtoshowsubstring/");

[Benchmark]
public PathString AddShortString()
{
    return _first.Add(_second);
}

[Benchmark]
public PathString AddLongString()
{
    return _first.Add(_long);
}
```
Method | Runtime | Toolchain | Mean | Ratio | Allocated |
---|---|---|---|---|---|
AddShortString | .NET Framework 4.8 | net48 | 23.51 ns | 1.00 | 96 B |
AddShortString | .NET 5.0 | net5.0 | 22.73 ns | 0.97 | 96 B |
AddShortString | .NET 6.0 | net6.0 | 14.92 ns | 0.64 | 56 B |
AddLongString | .NET Framework 4.8 | net48 | 30.89 ns | 1.00 | 201 B |
AddLongString | .NET 5.0 | net5.0 | 25.18 ns | 0.82 | 192 B |
AddLongString | .NET 6.0 | net6.0 | 15.69 ns | 0.51 | 104 B |
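The idea behind this change can be sketched with plain BCL APIs. The helper below is a minimal, hypothetical illustration (`CombinePaths` is not an ASP.NET Core API, and this is not the actual PathString.Add code): when the first path ends with '/' and the second starts with '/', it skips the duplicate slash by slicing a span with `AsSpan(1)` instead of calling `Substring(1)`, so `string.Concat(ReadOnlySpan<char>, ReadOnlySpan<char>)` allocates only the final string.

```csharp
using System;

// Hypothetical helper for illustration; not the actual PathString.Add implementation.
static string CombinePaths(string first, string second)
{
    if (first.EndsWith('/') && second.StartsWith('/'))
    {
        // Slice off the duplicate leading '/' of `second` with a span instead of
        // second.Substring(1), so no temporary string is allocated.
        return string.Concat(first.AsSpan(), second.AsSpan(1));
    }

    return first + second;
}

Console.WriteLine(CombinePaths("/first/", "/second/")); // prints "/first/second/"
Console.WriteLine(CombinePaths("/first", "/second/"));  // prints "/first/second/"
```

The span overload of `string.Concat` is available in .NET Core 3.0 and later, so this particular sketch would not compile against .NET Framework 4.8.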
dotnet/aspnetcore#34001 introduced a new Span-based API for enumerating a query string that is allocation-free in the common case where there are no encoded characters, and allocates less when the query string does contain encoded characters.
dotnet run -c Release -f net6.0 --runtimes net6.0 --filter *QueryEnumerableBenchmark*
```csharp
#if NET6_0_OR_GREATER
public enum QueryEnum
{
    Simple = 1,
    Encoded,
}

[ParamsAllValues]
public QueryEnum QueryParam { get; set; }

private string SimpleQueryString = "?key1=value1&key2=value2";
private string QueryStringWithEncoding = "?key1=valu%20&key2=value%20";

[Benchmark(Baseline = true)]
public void QueryHelper()
{
    var queryString = QueryParam == QueryEnum.Simple ? SimpleQueryString : QueryStringWithEncoding;
    foreach (var queryParam in QueryHelpers.ParseQuery(queryString))
    {
        _ = queryParam.Key;
        _ = queryParam.Value;
    }
}

[Benchmark]
public void QueryEnumerable()
{
    var queryString = QueryParam == QueryEnum.Simple ? SimpleQueryString : QueryStringWithEncoding;
    foreach (var queryParam in new QueryStringEnumerable(queryString))
    {
        _ = queryParam.DecodeName();
        _ = queryParam.DecodeValue();
    }
}
#endif
```
Method | QueryParam | Mean | Ratio | Allocated |
---|---|---|---|---|
QueryHelper | Simple | 243.13 ns | 1.00 | 360 B |
QueryEnumerable | Simple | 91.43 ns | 0.38 | – |
QueryHelper | Encoded | 351.25 ns | 1.00 | 432 B |
QueryEnumerable | Encoded | 197.59 ns | 0.56 | 152 B |
It’s important to note that there is no free lunch. If you plan to enumerate the query string values multiple times, the new QueryStringEnumerable API can actually be more expensive than calling QueryHelpers.ParseQuery once and storing the resulting dictionary of parsed values.
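To show why a span-based approach can avoid allocations, here is a minimal sketch in plain C#. `TryGetRawValue` is a hypothetical helper, not an ASP.NET Core API, and it is much simpler than QueryStringEnumerable (it does no decoding): it walks the query string with `ReadOnlySpan<char>` slices and hands back the raw, still-encoded value as a span, so a string is only allocated if the caller asks for one.

```csharp
using System;

// Hypothetical helper (not an ASP.NET Core API): scans a query string with spans and
// returns the raw, still-encoded value for `key` without allocating substrings.
static bool TryGetRawValue(string queryString, string key, out ReadOnlySpan<char> value)
{
    ReadOnlySpan<char> remaining = queryString.AsSpan();
    if (!remaining.IsEmpty && remaining[0] == '?')
    {
        remaining = remaining.Slice(1);
    }

    while (!remaining.IsEmpty)
    {
        // Split off the next "name=value" segment at '&'.
        int amp = remaining.IndexOf('&');
        ReadOnlySpan<char> pair = amp >= 0 ? remaining.Slice(0, amp) : remaining;
        remaining = amp >= 0 ? remaining.Slice(amp + 1) : ReadOnlySpan<char>.Empty;

        // Split the segment into name and value at the first '='.
        int eq = pair.IndexOf('=');
        ReadOnlySpan<char> name = eq >= 0 ? pair.Slice(0, eq) : pair;
        if (name.SequenceEqual(key.AsSpan()))
        {
            value = eq >= 0 ? pair.Slice(eq + 1) : ReadOnlySpan<char>.Empty;
            return true;
        }
    }

    value = default;
    return false;
}

// A string is allocated (and decoded) only here, when the caller actually needs one.
if (TryGetRawValue("?key1=value1&key2=value%20", "key2", out ReadOnlySpan<char> raw))
{
    Console.WriteLine(Uri.UnescapeDataString(raw.ToString()));
}
```

Note that each lookup rescans the whole query string, which is the same trade-off described above: repeated lookups can cost more than parsing once into a dictionary.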
dotnet/aspnetcore#29448 from @paulomorgado uses the string.Create method, which lets you initialize a string’s contents after it is allocated, as long as you know its final length up front. This was used to remove some temporary string allocations in UriHelper.BuildAbsolute.
dotnet run -c Release -f netcoreapp3.1 --runtimes netcoreapp3.1 net6.0 --filter *UriHelperBenchmark*
```csharp
#if NETCOREAPP
[Benchmark]
public void BuildAbsolute()
{
    _ = UriHelper.BuildAbsolute("https", new HostString("localhost"));
}
#endif
```
Method | Runtime | Toolchain | Mean | Ratio | Allocated |
---|---|---|---|---|---|
BuildAbsolute | .NET Core 3.1 | netcoreapp3.1 | 92.87 ns | 1.00 | 176 B |
BuildAbsolute | .NET 6.0 | net6.0 | 52.88 ns | 0.57 | 64 B |
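As an illustration of the string.Create pattern, the following sketch builds a "scheme://host/" string in a single allocation. `BuildSchemeHostRoot` is a hypothetical, heavily simplified stand-in and not the real UriHelper.BuildAbsolute code, which also handles path base, path, query, and fragment. The final length is computed first, and the static lambda then copies each piece directly into the new string’s buffer, receiving its inputs through the state tuple rather than capturing them.

```csharp
using System;

// Hypothetical, simplified stand-in for UriHelper.BuildAbsolute: produces "scheme://host/".
static string BuildSchemeHostRoot(string scheme, string host)
{
    // The final length is known up front: scheme + "://" + host + "/".
    int length = scheme.Length + 3 + host.Length + 1;

    // string.Create allocates the string once and lets us fill its buffer in place,
    // so no intermediate strings are created.
    return string.Create(length, (scheme, host), static (span, state) =>
    {
        state.scheme.AsSpan().CopyTo(span);
        int pos = state.scheme.Length;

        "://".AsSpan().CopyTo(span.Slice(pos));
        pos += 3;

        state.host.AsSpan().CopyTo(span.Slice(pos));
        pos += state.host.Length;

        span[pos] = '/';
    });
}

Console.WriteLine(BuildSchemeHostRoot("https", "localhost")); // prints "https://localhost/"
```

Passing the inputs as state is what lets the lambda be `static`, so the delegate itself does not need a closure allocation.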
PR dotnet/aspnetcore#31267 converted some parsing logic in ContentDispositionHeaderValue to use Span<T>-based APIs to avoid temporary strings and a temporary byte[] in common cases.
dotnet run -c Release -f net48 --runtimes net48 netcoreapp3.1 net5.0 net6.0 --filter *ContentDispositionBenchmark*
```csharp
[Benchmark]
public void ParseContentDispositionHeader()
{
    var contentDisposition = new ContentDispositionHeaderValue("inline");
    contentDisposition.FileName = "FileÃName.bat";
}
```
Method | Runtime | Toolchain | Mean | Ratio | Allocated |
---|---|---|---|---|---|
ContentDispositionHeader | .NET Framework 4.8 | net48 | 654.9 ns | 1.00 | 570 B |
ContentDispositionHeader | .NET Core 3.1 | netcoreapp3.1 | 581.5 ns | 0.89 | 536 B |
ContentDispositionHeader | .NET 5.0 | net5.0 | 519.2 ns | 0.79 | 536 B |
ContentDispositionHeader | .NET 6.0 | net6.0 | 295.4 ns | 0.45 | 312 B |
Idle Connections
One of the major components of ASP.NET Core is hosting a server, which brings with it a host of different problems to optimize for. We’ll focus on improvements to idle connections in 6.0, where we made many changes to reduce the amount of memory used when a connection is waiting for data.
We made three distinct types of changes. The first was to reduce the size of the objects used by connections, including System.IO.Pipelines, SocketConnections, and SocketSenders. The second was to pool commonly accessed objects so we can reuse old instances and save on allocations. The third was to take advantage of something called “zero-byte reads”: we try to read from the connection with a zero-byte buffer, and once data is available the read completes without returning any data, but we now know there is data available and can immediately provide a buffer to read it. This avoids allocating a buffer up front for a read that may not complete until some point in the future, so we can defer a large allocation until we know data is available.
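The zero-byte read pattern can be sketched in a few lines of C#. This is a minimal illustration under our own assumptions, not Kestrel’s code: `ReadWhenAvailableAsync` is a hypothetical helper, and it uses `ArrayPool<byte>.Shared` for the buffer. On streams that support zero-byte reads (for example sockets, and SslStream after dotnet/runtime#49123), the first read completes only when data arrives; a MemoryStream, used here so the sketch runs anywhere, simply returns 0 immediately.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

// Minimal sketch of the zero-byte read pattern (hypothetical helper, not Kestrel's code).
static async Task<byte[]> ReadWhenAvailableAsync(Stream stream)
{
    // Step 1: wait for data without holding a buffer. Streams that support zero-byte
    // reads complete this only once data is available; MemoryStream returns 0 at once.
    await stream.ReadAsync(Memory<byte>.Empty);

    // Step 2: data is (likely) available, so only now rent a buffer for the real read.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    try
    {
        int read = await stream.ReadAsync(buffer.AsMemory());
        return buffer[..read]; // copy out just the bytes that were read
    }
    finally
    {
        // Hand the buffer back immediately, so a waiting connection holds no buffer.
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

using var input = new MemoryStream(new byte[] { 1, 2, 3 });
byte[] data = await ReadWhenAvailableAsync(input);
Console.WriteLine(data.Length); // prints 3
```

While a connection sits idle in step 1, the only memory it holds is the pending read itself, which is where the savings for thousands of idle connections come from.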
dotnet/runtime#49270 reduced the size of System.IO.Pipelines’ Pipe from ~560 bytes to ~368 bytes, a 34% reduction; since there are at least two pipes per connection, this was a great win. dotnet/aspnetcore#31308 refactored the Socket layer of Kestrel to avoid a few async state machines and to shrink the remaining ones, for ~33% allocation savings per connection.
dotnet/aspnetcore#30769 removed a per-connection PipeOptions allocation and moved the allocation to the connection factory, so we only allocate one for the entire lifetime of the server and reuse the same options for every connection. dotnet/aspnetcore#31311 from @benaadams replaced well-known header values in WebSocket requests with interned strings, which allowed the strings allocated during header parsing to be garbage collected, reducing the memory usage of long-lived WebSocket connections. dotnet/aspnetcore#30771 refactored the Sockets layer in Kestrel to combine the SocketReceiver object and its SocketAwaitableEventArgs into a single object, which saved a few bytes and resulted in fewer unique objects allocated per connection. That PR also pooled the SocketSender class, so instead of creating one per connection, you now have, on average, as many SocketSenders as there are cores. So in the benchmark below, with 10,000 connections, only 16 SocketSenders are allocated on my machine instead of 10,000, a savings of ~46 MB!
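The effect of pooling can be illustrated with a minimal, hypothetical object pool; this is not Kestrel’s SocketSender pool. A byte[] stands in for a sender, a `ConcurrentQueue<T>` holds idle instances, and a counter tracks how many instances were ever created. Because a sender is only held for the duration of a send, the number of instances follows the number of concurrent sends rather than the number of open connections.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical pool sketch; a byte[] stands in for a SocketSender.
var pool = new ConcurrentQueue<byte[]>();
int created = 0;

byte[] Rent()
{
    // Reuse an idle instance if one is available; otherwise create a new one.
    if (pool.TryDequeue(out var pooled))
    {
        return pooled;
    }

    Interlocked.Increment(ref created);
    return new byte[256];
}

void Return(byte[] item) => pool.Enqueue(item);

// Simulate 10,000 connections that each perform one send, one after another.
for (int i = 0; i < 10_000; i++)
{
    byte[] sender = Rent();
    // ... perform the send with `sender` here ...
    Return(sender);
}

Console.WriteLine($"Senders created: {created}"); // prints "Senders created: 1"
```

In this sequential simulation only one instance is ever created. With real, concurrent sends the count would grow toward the degree of send concurrency, which is consistent with the roughly one-per-core figure above.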
Another similarly sized change is dotnet/runtime#49123, which adds support for zero-byte reads in SslStream, so SslStream allocations for our 10,000 idle connections drop from ~46 MB to ~2.3 MB. dotnet/runtime#49117 added support for zero-byte reads on StreamPipeReader, which Kestrel then used in dotnet/aspnetcore#30863 to start issuing zero-byte reads on SslStream.
The culmination of all these changes is a massive reduction in memory usage for idle connections.
The following numbers are not from a BenchmarkDotNet app: since they measure idle connections, it was easier to set up a client and a server application.
The console and web application code is available in the following gist: https://gist.github.com/BrennanConroy/02e8459d63305b4acaa0a021686f54c7
Below is the amount of memory that 10,000 idle secure WebSocket (WSS) connections take up on the server on each framework.
Framework | Memory |
---|---|
net48 | 665.4 MB |
net5.0 | 603.1 MB |
net6.0 | 160.8 MB |
That’s an almost 4x memory reduction from net5.0 to net6.0 (603.1 MB / 160.8 MB ≈ 3.75)!
Entity Framework Core
EF Core made some massive improvements in 6.0: it is 31% faster at executing queries, and the TechEmpower Fortunes benchmark improved by 70% thanks to runtime updates, optimized benchmarks, and the EF improvements.
These improvements came from improving object pooling, intelligently checking if telemetry is enabled, and adding an option to opt out of thread safety checks when you know your app uses DbContext safely.
See the Announcing Entity Framework Core 6.0 Preview 4: Performance Edition blog post which highlights many of the improvements in detail.
Blazor
Native byte[] Interop
Blazor now has efficient support for byte arrays when performing JavaScript interop. Previously, byte arrays sent to and from JavaScript were Base64 encoded so they could be serialized as JSON, which increased the transfer size and the CPU load. The Base64 encoding has now been optimized away in .NET 6, allowing users to transparently work with byte[] in .NET and Uint8Array in JavaScript. Documentation is available on using this feature for JavaScript to .NET and for .NET to JavaScript.
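To make the transfer-size overhead concrete, this short snippet (our own illustration, using only `Convert.ToBase64String`) encodes a payload of the same size as the 22 kB array used in the benchmark that follows. Base64 turns every 3 input bytes into 4 characters, rounding up, so the payload grows by roughly a third before it is even embedded in a JSON message.

```csharp
using System;

// 1024 * 22 bytes, matching the array size used in the Blazor benchmark.
byte[] payload = new byte[1024 * 22];
string encoded = Convert.ToBase64String(payload);

// Base64 length is 4 * ceil(n / 3) characters, including '=' padding.
Console.WriteLine($"{payload.Length} bytes -> {encoded.Length} Base64 characters");
// prints "22528 bytes -> 30040 Base64 characters"
```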
Let’s take a look at a quick benchmark to see the difference between byte[] interop in .NET 5 and .NET 6. The following Razor code creates a 22 kB byte[] and sends it to a JavaScript receiveAndReturnBytes function, which immediately returns the byte[]. This roundtrip of data is repeated 10,000 times, and the timing data is printed to the screen. This code is the same for .NET 5 and .NET 6.
```razor
@using Microsoft.JSInterop
@inject IJSRuntime JSRuntime

<button @onclick="@RoundtripData">Roundtrip Data</button>
<hr />
@Message

@code {
    public string Message { get; set; } = "Press button to benchmark";

    private async Task RoundtripData()
    {
        var bytes = new byte[1024 * 22];
        List<double> timeForInterop = new List<double>();
        var testTime = DateTime.Now;

        for (var i = 0; i < 10_000; i++)
        {
            var interopTime = DateTime.Now;
            var result = await JSRuntime.InvokeAsync<byte[]>("receiveAndReturnBytes", bytes);
            timeForInterop.Add(DateTime.Now.Subtract(interopTime).TotalMilliseconds);
        }

        Message = $"Round-tripped: {bytes.Length / 1024d} kB 10,000 times and it took on average {timeForInterop.Average():F3}ms, and in total {DateTime.Now.Subtract(testTime).TotalMilliseconds:F1}ms";
    }
}
```
Next, let’s look at the receiveAndReturnBytes JavaScript function. In .NET 5, we must first decode the Base64 encoded byte array into a Uint8Array so it can be used in application code. Then we must re-encode it into Base64 before returning the data to the server.
```javascript
function receiveAndReturnBytes(bytesReceivedBase64Encoded) {
    const bytesReceived = base64ToArrayBuffer(bytesReceivedBase64Encoded);

    // Use Uint8Array data in application

    const bytesToSendBase64Encoded = base64EncodeByteArray(bytesReceived);

    if (bytesReceivedBase64Encoded !== bytesToSendBase64Encoded) {
        throw new Error("Expected input/output to match.");
    }

    return bytesToSendBase64Encoded;
}

// https://stackoverflow.com/a/21797381
function base64ToArrayBuffer(base64) {
    const binaryString = atob(base64);
    const length = binaryString.length;
    const result = new Uint8Array(length);
    for (let i = 0; i < length; i++) {
        result[i] = binaryString.charCodeAt(i);
    }
    return result;
}

function base64EncodeByteArray(data) {
    const charBytes = new Array(data.length);
    for (let i = 0; i < data.length; i++) {
        charBytes[i] = String.fromCharCode(data[i]);
    }
    const dataBase64Encoded = btoa(charBytes.join(''));
    return dataBase64Encoded;
}
```
The encoding/decoding adds significant overhead on both the client and the server, and requires extensive boilerplate code as well. So how would this be done in .NET 6? Well, it’s quite a bit simpler:
```javascript
function receiveAndReturnBytes(bytesReceived) {
    // bytesReceived comes as a Uint8Array ready for use
    // and can be used by the application or immediately returned.
    return bytesReceived;
}
```
So it’s definitely easier to write, but how does it perform? Running these snippets in a blazorserver template in .NET 5 and .NET 6 respectively, under Release configuration, we see .NET 6 offers a 78% performance improvement in byte[] interop (total time drops from 24,463 ms to 5,273 ms)!
Measurement | .NET 6 (ms) | .NET 5 (ms) | Improvement |
---|---|---|---|
Total Time | 5273 | 24463 | 78% |
Additionally, this byte array interop support is leveraged within the framework to enable bidirectional streaming interop between JavaScript and .NET. Users are now able to transport arbitrary binary data. Documentation is available for streaming both from .NET to JavaScript and from JavaScript to .NET.
Input File
Using the Blazor streaming interop mentioned above, we now support uploading large files via the InputFile component (previously, uploads were limited to ~2 GB). This component also features significant speed improvements on account of native byte[] streaming, as opposed to going through Base64 encoding. For instance, a 100 MB file is uploaded 77% quicker in comparison to .NET 5.
Run | .NET 6 (ms) | .NET 5 (ms) | Improvement |
---|---|---|---|
1 | 2591 | 10504 | 75% |
2 | 2607 | 11764 | 78% |
3 | 2632 | 11821 | 78% |
Average | | | 77% |
Note that the streaming interop support also enables efficient downloads of (large) files; for more details, please see the documentation.
The InputFile component was upgraded to use streaming in dotnet/aspnetcore#33900.
Hodgepodge
dotnet/aspnetcore#30320 from @benaadams modernized our TypeScript libraries and optimized them so websites load faster. The signalr.min.js file went from 36.8 kB compressed and 132 kB uncompressed to 16.1 kB compressed and 42.2 kB uncompressed, and the blazor.server.js file went from 86.7 kB compressed and 276 kB uncompressed to 43.9 kB compressed and 130 kB uncompressed.
dotnet/aspnetcore#31322 from @benaadams removes some unnecessary casts when getting common features from the connection’s feature collection. This gives a ~50% improvement when accessing common features from the collection. Unfortunately, this improvement can’t easily be shown in a standalone benchmark because it requires a number of internal types, so I’ll include the numbers from the PR here; if you’re interested in running them, the PR includes benchmarks that run against the internal code.
Method | Mean | Op/s | Diff |
---|---|---|---|
Get<IHttpRequestFeature>* | 8.507 ns | 117,554,189.6 | +50.0% |
Get<IHttpResponseFeature>* | 9.034 ns | 110,689,963.7 | – |
Get<IHttpResponseBodyFeature>* | 9.466 ns | 105,636,431.7 | +58.7% |
Get<IRouteValuesFeature>* | 10.007 ns | 99,927,927.4 | +50.0% |
Get<IEndpointFeature>* | 10.564 ns | 94,656,794.2 | +44.7% |
dotnet/aspnetcore#31519, also from @benaadams, adds default interface methods to the IHeaderDictionary type for accessing common headers via properties named after the header name. No more mistyping common headers when accessing the header dictionary! More interestingly for this blog post, this change allows server implementations to return a custom header dictionary that implements these new interface methods more optimally. For example, instead of querying an internal dictionary for a header value, which requires hashing the key and looking up an entry, the server might have the header value stored directly in a field and can return the field directly. This change resulted in improvements of up to ~488% in some cases when getting or setting header values. Once again, properly benchmarking this change requires internal types for the setup, so I’ll include the numbers from the PR; for those interested in trying it out, the PR contains benchmarks that run on the internal code.
Method | Branch | Type | Mean | Op/s | Delta |
---|---|---|---|---|---|
GetHeaders | before | Plaintext | 25.793 ns | 38,770,569.6 | – |
GetHeaders | after | Plaintext | 12.775 ns | 78,279,480.0 | +101.9% |
GetHeaders | before | Common | 121.355 ns | 8,240,299.3 | – |
GetHeaders | after | Common | 37.598 ns | 26,597,474.6 | +222.8% |
GetHeaders | before | Unknown | 366.456 ns | 2,728,840.7 | – |
GetHeaders | after | Unknown | 223.472 ns | 4,474,824.0 | +64.0% |
SetHeaders | before | Plaintext | 49.324 ns | 20,273,931.8 | – |
SetHeaders | after | Plaintext | 34.996 ns | 28,574,778.8 | +40.9% |
SetHeaders | before | Common | 635.060 ns | 1,574,654.3 | – |
SetHeaders | after | Common | 108.041 ns | 9,255,723.7 | +487.7% |
SetHeaders | before | Unknown | 1,439.945 ns | 694,470.8 | – |
SetHeaders | after | Unknown | 517.067 ns | 1,933,985.7 | +178.4% |
dotnet/aspnetcore#31466 used the new CancellationTokenSource.TryReset() method introduced in .NET 6 to reuse CancellationTokenSources when a connection closes without being canceled. The numbers below were collected by running bombardier against Kestrel with 125 connections for ~100,000 requests.
Branch | Type | Allocations | Bytes |
---|---|---|---|
Before | CancellationTokenSource | 98,314 | 4,719,072 |
After | CancellationTokenSource | 125 | 6,000 |
dotnet/aspnetcore#31528 and dotnet/aspnetcore#34075 made similar changes to reuse CancellationTokenSources for HTTPS handshakes and HTTP/3 streams, respectively.
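The reuse pattern can be sketched as follows. This is a simplified, hypothetical simulation rather than Kestrel’s code: one CancellationTokenSource is kept across requests, and every 100th request is canceled to mimic an aborted request. `CancellationTokenSource.TryReset()` (.NET 6+) returns false if the source has already been canceled, and only in that case is a new source allocated.

```csharp
using System;
using System.Threading;

// Hypothetical simulation: one CancellationTokenSource reused across 1,000 requests.
var cts = new CancellationTokenSource();
int allocations = 1;

for (int request = 0; request < 1_000; request++)
{
    _ = cts.Token; // in a real server, this token would flow into the request's work

    if (request % 100 == 99)
    {
        cts.Cancel(); // e.g. the client aborted this request
    }

    // TryReset succeeds only if the source was not canceled, so it can be reused.
    if (!cts.TryReset())
    {
        cts.Dispose();
        cts = new CancellationTokenSource();
        allocations++;
    }
}

cts.Dispose();
Console.WriteLine($"CancellationTokenSource allocations: {allocations}"); // prints 11
```

TryReset is intended for the sole owner of the source once the operation it was used for has completed; after a successful reset, any registrations still attached to the old token are no longer notified of subsequent cancellation.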
dotnet/aspnetcore#31660 improved the performance of server-to-client streaming in SignalR by reusing the allocated StreamItem object for the whole stream instead of allocating one per stream item. Similarly, dotnet/aspnetcore#31661 stores the HubCallerClients object on the SignalR connection instead of allocating it per Hub method call.
dotnet/aspnetcore#31506 from @ShreyasJejurkar refactored the internals of the WebSocket handshake to avoid a temporary List<T> allocation. dotnet/aspnetcore#32829 from @gfoidl refactored QueryCollection to reduce allocations and vectorize some of the code. dotnet/aspnetcore#32234 from @benaadams removed an unused field in HttpRequestHeaders enumeration, which improves performance by no longer assigning to that field for every header enumerated.
dotnet/aspnetcore#31333 from @martincostello converted Http.Sys to use LoggerMessage.Define, the high-performance logging API. This avoids unnecessary boxing of value types and repeated parsing of the logging format string, and in some cases avoids allocating strings or objects when the log level isn’t enabled.
dotnet/aspnetcore#31784 adds a new IApplicationBuilder.Use overload for registering middleware that avoids some unnecessary per-request allocations when running the middleware.
Old code looks like:
```csharp
app.Use(async (context, next) =>
{
    await next();
});
```
New code looks like:
```csharp
app.Use(async (context, next) =>
{
    await next(context);
});
```
The benchmark below simulates the middleware pipeline without setting up a server to showcase the improvement. An int stands in for the HttpContext of a request, and the terminal middleware returns a completed task.
dotnet run -c Release -f net6.0 --runtimes net6.0 --filter *UseMiddlewareBenchmark*
```csharp
private static Func<Func<int, Task>, Func<int, Task>> UseOld(Func<int, Func<Task>, Task> middleware)
{
    return next =>
    {
        return context =>
        {
            // Each call allocates a closure capturing `context` plus a new Func<Task> delegate.
            Func<Task> simpleNext = () => next(context);
            return middleware(context, simpleNext);
        };
    };
}

private static Func<Func<int, Task>, Func<int, Task>> UseNew(Func<int, Func<int, Task>, Task> middleware)
{
    // `next` is passed straight through, so nothing is allocated per call.
    return next => context => middleware(context, next);
}

Func<int, Task> Middleware = UseOld((c, n) => n())(i => Task.CompletedTask);
Func<int, Task> NewMiddleware = UseNew((c, n) => n(c))(i => Task.CompletedTask);

[Benchmark(Baseline = true)]
public Task Use()
{
    return Middleware(10);
}

[Benchmark]
public Task UseNew()
{
    return NewMiddleware(10);
}
```
Method | Mean | Ratio | Allocated |
---|---|---|---|
Use | 15.832 ns | 1.00 | 96 B |
UseNew | 2.592 ns | 0.16 | – |
Summary
I hope you enjoyed reading about some of the improvements made in ASP.NET Core 6.0! I also encourage you to take a look at the Performance Improvements in .NET 6 blog post, which covers performance in the runtime.