{"id":43050,"date":"2022-11-01T08:51:00","date_gmt":"2022-11-01T15:51:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=43050"},"modified":"2023-04-09T21:24:22","modified_gmt":"2023-04-10T04:24:22","slug":"performance-improvements-in-aspnet-core-7","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-aspnet-core-7\/","title":{"rendered":"Performance improvements in ASP.NET Core 7"},"content":{"rendered":"<p>Performance is a feature of .NET. In every release the .NET team and community contributors spend time making performance improvements, so .NET apps are faster and use less resources.<\/p>\n<p>This blog post highlights some of the performance improvements in ASP.NET Core 7. This is a continuation of last year&#8217;s post on <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-aspnet-core-6\">Performance improvements in ASP.NET Core 6<\/a>. And, of course, it continues to be inspired by <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance_improvements_in_net_7\">Performance Improvements in .NET 7<\/a>. 
Many of those improvements either indirectly or directly improve the performance of ASP.NET Core as well.<\/p>\n<h2>Benchmarking Setup<\/h2>\n<p>We will use <a href=\"https:\/\/github.com\/dotnet\/benchmarkdotnet\">BenchmarkDotNet<\/a> for most of the examples in this blog post.<\/p>\n<p>To set up a benchmarking project:<\/p>\n<ol>\n<li>Create a new console app (<code>dotnet new console<\/code>)<\/li>\n<li>Add a NuGet reference to BenchmarkDotNet (<code>dotnet add package BenchmarkDotNet<\/code>), version 0.13.2 or later<\/li>\n<li>Change Program.cs to <code>var summary = BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run();<\/code> (this needs <code>using BenchmarkDotNet.Running;<\/code> at the top of the file)<\/li>\n<li>Add the benchmarking code snippet that you want to run from the sections below<\/li>\n<li>Run <code>dotnet run -c Release<\/code> and enter the number of the benchmark you want to run when prompted<\/li>\n<\/ol>\n<p>Some of the benchmarks test internal types, and a self-contained benchmark cannot be written. In those cases I&#8217;ll either reference numbers obtained by running the benchmarks in the repository, or I&#8217;ll provide a simplified example to showcase what the improvement is doing.<\/p>\n<p>There are also some cases where I will reference our end-to-end benchmarks, which are public at https:\/\/aka.ms\/aspnet\/benchmarks, although we only display the last few months of data so that the page will load in a reasonable amount of time.<\/p>\n<h2>General server<\/h2>\n<p>Ampere machines are ARM-based, have many cores, and are being used as servers in cloud environments due to their lower power consumption and performance parity with x64 machines. As part of .NET 7, we identified areas where many-core machines weren&#8217;t scaling very well and fixed them to bring massive performance gains. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/69386\">dotnet\/runtime#69386<\/a> and <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/42237\">dotnet\/aspnetcore#42237<\/a> partitioned the global thread pool queue and the memory pool used by socket connections, respectively. Partitioning enables cores to operate on their own queues, which helps reduce contention and improve scalability on machines with large core counts. On our 80-core Ampere machine, the plaintext platform benchmark improved 514%, from 2.4m RPS to 14.6m RPS, and the JSON platform benchmark improved 311%, from 270k RPS to 1.1m RPS!<\/p>\n<p>We made a couple of tradeoffs to get this perf increase. First, strict FIFO ordering of work items in the global thread pool queue is no longer guaranteed because work is now read from multiple queues. Second, CPU usage may increase slightly when a machine is under low load, because work stealing has to search more queues to find work.<\/p>\n<p>A significant Windows-specific change came from <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/64834\">dotnet\/runtime#64834<\/a>, which switched the Windows IO pool to use a managed implementation. While this change by itself resulted in perf improvements, such as an ~11% increase in RPS for our JSON platform benchmark, it also allowed us to remove a thread pool dispatch in Kestrel in <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/43449\">dotnet\/aspnetcore#43449<\/a> that was previously there to get off the IO thread. 
Removing dispatching on Windows gave another ~18% RPS increase, for a total increase of ~27%, going from 800k RPS to 1.1m RPS.<\/p>\n<p><a href=\"https:\/\/github.com\/dotnet\/runtime\/issues\/12892\">Throwing exceptions can be expensive<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/38094\">dotnet\/aspnetcore#38094<\/a> identified an area in Kestrel&#8217;s Socket transport where we could avoid throwing an exception at one layer during connection closure. Not throwing exceptions reduced CPU usage in our connection close benchmarks: from 50% to 40% CPU on Linux, from 15% to 14% CPU on Windows, and from 24% to 18% CPU on 28-core ARM Linux! Another nice side effect of the change is that the number of exceptions per second dropped drastically, as shown in the graph below.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/10\/net7_exceptionsgraph.png\" alt=\"Graph showing large decrease in Exceptions per second\" \/><\/p>\n<p>We started using <code>PoolingAsyncValueTaskMethodBuilder<\/code> in 6.0 with <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/35011\">dotnet\/aspnetcore#35011<\/a>, which updated many of the <code>ReadAsync<\/code> methods in Kestrel to reduce the memory used when reading from requests. 
In 7.0 we&#8217;ve applied the <code>PoolingAsyncValueTaskMethodBuilder<\/code> to a few more methods in <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/41345\">dotnet\/aspnetcore#41345<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/issues\/68467\">dotnet\/runtime#68467<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/68457\">dotnet\/runtime#68457<\/a>.<\/p>\n<p>WebSockets are an excellent example for showcasing the allocation differences because they are long-lived connections that read multiple times from the request.\nThe following images come from a benchmark that performed 1000 reads on a single WebSocket connection.<\/p>\n<p>In 6.0, 1000 reads resulted in 3000 allocated state machines.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/10\/net6_websocket1000requests.png\" alt=\"allocation list showing 3000 state machine allocations\" \/><\/p>\n<p>And digging into them, we can see three separate state machines per read.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/10\/net6_statemachines.png\" alt=\"allocation list showing 3 separate state machines, each allocated 1000 times\" \/><\/p>\n<p>In 7.0, the per-read state machine allocations have been eliminated from WebSocket connection reads.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/10\/net7_websocket1000requests.png\" alt=\"allocation list showing state machine allocations now at 11 total\" \/><\/p>\n<p>Please note that <code>PoolingAsyncValueTaskMethodBuilder<\/code> isn&#8217;t just free performance that should be applied to all async APIs. While it may look nice from an allocation perspective and could improve microbenchmarks, it can perform worse in real-world applications. 
Pooling should be measured thoroughly before committing to it, which is why we have only applied the feature to specific APIs.<\/p>\n<h2>HTTP\/2<\/h2>\n<p>In 6.0, we identified an area with high lock contention in HTTP\/2 processing in Kestrel. HTTP\/2 has the concept of multiple streams over a single connection. When a stream writes to the connection, it needs to take a lock, which can block other concurrent streams. We experimented with a few different approaches to improve concurrency. We found a potential improvement by queuing the writes to a <a href=\"https:\/\/learn.microsoft.com\/dotnet\/core\/extensions\/channels\"><code>Channel<\/code><\/a> and letting a single consumer task process the queue and do all the writing, which removes most of the lock contention. PR <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/40925\">dotnet\/aspnetcore#40925<\/a> rewrote the HTTP\/2 output processing to use the <code>Channel<\/code> approach, and the results speak for themselves.\nBefore the change, a gRPC benchmark with 70 streams per connection and 28 connections reached 110k RPS with the server CPU sitting around 14%, which is a good indicator that either we weren&#8217;t generating enough load from the client or there was something stopping the server from doing more processing. After the change, RPS went to 4.1m, and the server CPU is now 100%, showing we are generating enough load and the server isn&#8217;t being blocked by the lock contention anymore! This change also improved the single-stream, multiple-connection benchmark from 1.2m to 6.8m RPS. 
This benchmark wasn&#8217;t suffering from lock contention and was already at 100% CPU before the change, so it was a pleasant surprise when it improved this much by changing our approach for handling HTTP\/2 frames!<\/p>\n<p>It&#8217;s always nice to see significant improvements in graphs, so here is the lock contention from before and after the change:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/10\/net7_grpc70s28cLockContention.png\" alt=\"graph showing sharp decrease in lock contention\" \/><\/p>\n<p>And here is the RPS improvement:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/10\/net7_grpc70s28c.png\" alt=\"graph showing sharp increase in RPS\" \/><\/p>\n<p>Another concept in HTTP\/2 is called flow-control. Flow-control is a protocol honored by both the client and server to specify how much data can be sent to either side before waiting to send more data. On connection start, a window size is specified and used as the max amount of data allowed to be sent over the connection until a <code>WINDOW_UPDATE<\/code> frame is received. This frame specifies how much data has been read and lets the sender know that more data can be sent over the connection. By default, Kestrel used a window size of 96 KB, and it sends a <code>WINDOW_UPDATE<\/code> once about half the window has been read. This window size means a client uploading a large file will send between 48 KB and 96 KB at a time to the server before receiving a <code>WINDOW_UPDATE<\/code> frame. Using these numbers, we can roughly estimate how long uploading a 108 MB file over a connection with 10 ms round-trip latency would take. 108 MB \/ 48 KB = 2,250 segments, and 2,250 segments &#215; 10 ms = 22.5 seconds for the upper bound. 108 MB \/ 96 KB = 1,125 segments, and 1,125 segments &#215; 10 ms = 11.25 seconds for the lower bound. 
These numbers aren&#8217;t precise because there will be some overhead in sending and processing the data, but they give us a rough idea of how long it can take. <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/43302\">dotnet\/aspnetcore#43302<\/a> increased the default window size used by Kestrel to 768 KB and shows that the upload of a 108 MB file now takes 4.3 seconds, down from 26.9 seconds before. The new upper and lower bounds become 2.8 seconds and 1.4 seconds, again without taking overhead into account.<\/p>\n<p>That raises the question: why not make the window size as big as possible to allow faster uploads? The reason is that there is still a connection-level limit on how many bytes can be sent at a time, and that limit exists to prevent any single connection from using too much memory on the server.<\/p>\n<h2>HTTP\/3<\/h2>\n<p>ASP.NET Core 6 introduced experimental support for HTTP\/3. In 7.0, HTTP\/3 is no longer experimental but still opt-in. Many changes to make HTTP\/3 non-experimental were around reliability, correctness, and finalizing the API shape. But that didn&#8217;t stop us from making performance improvements.<\/p>\n<p>Let&#8217;s start with the massive 900x performance improvement from <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/38826\">dotnet\/aspnetcore#38826<\/a>, which speeds up QPack, the mechanism HTTP\/3 uses to encode headers. Both the client and server use QPack, and we take advantage of that by sharing the .NET QPack implementation with the server code in ASP.NET Core and the client code (HttpClient) in .NET. So any improvements to QPack benefit both the client and server!<\/p>\n<p>QPack handles header compression to send and receive headers more efficiently. <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/38565\">dotnet\/aspnetcore#38565<\/a> taught QPack about a bunch of common headers. 
<a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/38681\">dotnet\/aspnetcore#38681<\/a> further improved QPack by compressing some header values as well.<\/p>\n<p>Given the headers:<\/p>\n<pre><code class=\"language-csharp\">headers.ContentLength = 0;\r\nheaders.ContentType = \"application\/json\";\r\nheaders.Age = \"0\";\r\nheaders.AcceptRanges = \"bytes\";\r\nheaders.AccessControlAllowOrigin = \"*\";<\/code><\/pre>\n<p>Originally the output from QPack was 109 bytes: <code>0x00 0x00 0x37 0x05 0x63 0x6F 0x6E 0x74 0x65 0x6E 0x74 0x2D 0x74 0x79 0x70 0x65 ...<\/code><\/p>\n<p>After the two changes above, the QPack output becomes the following 7 bytes: <code>0x00 0x00 0xEE 0xE0 0xE3 0xC2 0xC4<\/code><\/p>\n<p>Looking at the <code>0x63<\/code> byte through <code>0x65<\/code>, these represent the ASCII string <code>content-type<\/code> in hex. In .NET 7, we are compressing these into indexes, so each header in this example becomes a single byte.<\/p>\n<p>Running a benchmark before and after the changes shows a 5x improvement.<\/p>\n<table>\n<thead>\n<tr>\n<th>Before:<\/th>\n<th style=\"text-align: right\">Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th>Op\/s<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DecodeHeaderFieldLine_Static_Multiple<\/td>\n<td style=\"text-align: right\">235.32 ns<\/td>\n<td style=\"text-align: right\">2.981 ns<\/td>\n<td style=\"text-align: right\">2.788 ns<\/td>\n<td style=\"text-align: right\">4,249,586.2<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<thead>\n<tr>\n<th>After:<\/th>\n<th style=\"text-align: right\">Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th>Op\/s<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DecodeHeaderFieldLine_Static_Multiple<\/td>\n<td style=\"text-align: right\">45.47 ns<\/td>\n<td style=\"text-align: 
right\">0.556 ns<\/td>\n<td style=\"text-align: right\">0.520 ns<\/td>\n<td style=\"text-align: right\">21,992,040.2<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Other<\/h2>\n<h3>SignalR<\/h3>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/41465\">dotnet\/aspnetcore#41465<\/a> identified an area in <a href=\"https:\/\/learn.microsoft.com\/aspnet\/core\/signalr\/introduction\">SignalR<\/a> where we were allocating the same strings over and over again. The allocation was removed by caching the strings and comparing them against the raw <code>Span&lt;byte&gt;<\/code>. The change did make the code path allocation-free, but it made microbenchmarks a few nanoseconds slower (which can be fine since we&#8217;re reducing GC pressure in full apps). Still, we were not completely happy with that, so <a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/41644\">dotnet\/aspnetcore#41644<\/a> improved the change. It assumes that case-sensitive comparisons will be the most common (which they should be in this use case) and avoids UTF-8 encoding when doing same-case comparisons. 
The .NET 7 code is now allocation-free and faster.<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Gen0<\/th>\n<th style=\"text-align: right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringLookup<\/td>\n<td style=\"text-align: right\">100.19 ns<\/td>\n<td style=\"text-align: right\">1.343 ns<\/td>\n<td style=\"text-align: right\">1.256 ns<\/td>\n<td style=\"text-align: right\">0.0038<\/td>\n<td style=\"text-align: right\">32 B<\/td>\n<\/tr>\n<tr>\n<td>Utf8LookupBefore<\/td>\n<td style=\"text-align: right\">109.24 ns<\/td>\n<td style=\"text-align: right\">2.243 ns<\/td>\n<td style=\"text-align: right\">2.203 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>Utf8LookupAfter<\/td>\n<td style=\"text-align: right\">85.20 ns<\/td>\n<td style=\"text-align: right\">0.831 ns<\/td>\n<td style=\"text-align: right\">0.777 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Auth<\/h3>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/43210\">dotnet\/aspnetcore#43210<\/a> from <a href=\"https:\/\/github.com\/Kahbazi\">@Kahbazi<\/a> cached <code>PolicyAuthorizationResult<\/code> instances because they are immutable and, in common cases, are created with the same properties. 
You can see how effective this is with the following simplified benchmark.<\/p>\n<pre><code class=\"language-csharp\">[MemoryDiagnoser]\r\npublic class CachedBenchmark\r\n{\r\n    private static readonly object _cachedObject = new object();\r\n\r\n    [Benchmark]\r\n    public object GetObject()\r\n    {\r\n        return new object();\r\n    }\r\n\r\n    [Benchmark]\r\n    public object GetCachedObject()\r\n    {\r\n        return _cachedObject;\r\n    }\r\n}<\/code><\/pre>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Gen0<\/th>\n<th style=\"text-align: right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GetObject<\/td>\n<td style=\"text-align: right\">3.5884 ns<\/td>\n<td style=\"text-align: right\">0.0488 ns<\/td>\n<td style=\"text-align: right\">0.0432 ns<\/td>\n<td style=\"text-align: right\">0.0029<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<tr>\n<td>GetCachedObject<\/td>\n<td style=\"text-align: right\">0.7896 ns<\/td>\n<td style=\"text-align: right\">0.0439 ns<\/td>\n<td style=\"text-align: right\">0.0389 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/43268\">dotnet\/aspnetcore#43268<\/a>, also from <a href=\"https:\/\/github.com\/Kahbazi\">@Kahbazi<\/a>, applied the same change to multiple types in Authentication, and additionally added a cache for <code>Task&lt;AuthorizationPolicy&gt;<\/code> when resolving the authorization policy. 
The server knows about authorization policies at startup time, so it can create all the tasks upfront and save the per-request task allocation.<\/p>\n<pre><code class=\"language-csharp\">[MemoryDiagnoser]\r\npublic class CachedTaskBenchmark\r\n{\r\n    private static readonly object _cachedObject = new object();\r\n    private static readonly Dictionary&lt;string, object&gt; _cachedObjects = new Dictionary&lt;string, object&gt;()\r\n        { { \"policy\", _cachedObject } };\r\n    private static readonly Dictionary&lt;string, Task&lt;object&gt;&gt; _cachedTasks = new Dictionary&lt;string, Task&lt;object&gt;&gt;()\r\n        { { \"policy\", Task.FromResult(_cachedObject) } };\r\n\r\n    [Benchmark(Baseline = true)]\r\n    public Task&lt;object&gt; GetTask()\r\n    {\r\n        return Task.FromResult(_cachedObjects[\"policy\"]);\r\n    }\r\n    [Benchmark]\r\n    public Task&lt;object&gt; GetCachedTask()\r\n    {\r\n        return _cachedTasks[\"policy\"];\r\n    }\r\n}<\/code><\/pre>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Ratio<\/th>\n<th style=\"text-align: right\">Gen0<\/th>\n<th style=\"text-align: right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GetTask<\/td>\n<td style=\"text-align: right\">22.59 ns<\/td>\n<td style=\"text-align: right\">0.322 ns<\/td>\n<td style=\"text-align: right\">0.285 ns<\/td>\n<td style=\"text-align: right\">1.00<\/td>\n<td style=\"text-align: right\">0.0092<\/td>\n<td style=\"text-align: right\">72 B<\/td>\n<\/tr>\n<tr>\n<td>GetCachedTask<\/td>\n<td style=\"text-align: right\">11.70 ns<\/td>\n<td style=\"text-align: right\">0.065 ns<\/td>\n<td style=\"text-align: right\">0.055 ns<\/td>\n<td style=\"text-align: right\">0.52<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: 
right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/43124\">dotnet\/aspnetcore#43124<\/a> added a cache to the Authorization middleware that avoids recomputing the combined <code>AuthorizationPolicy<\/code> per request for each endpoint. Because endpoints generally stay the same after startup, we don&#8217;t need to grab the authorization metadata off endpoints for every request and combine them into a single policy. We can instead cache the combined policy on first access to the endpoint. Caching can yield significant savings if you implement a custom <code>IAuthorizationPolicyProvider<\/code> with expensive operations like database access that only need to run once for the application&#8217;s lifetime.<\/p>\n<h3>HttpResult<\/h3>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/40965\">dotnet\/aspnetcore#40965<\/a> is an excellent example of exploring multiple routes to achieve better performance. The goal was to cache <code>HttpResult<\/code> types. Most result types are like <code>UnauthorizedHttpResult<\/code>, which has no arguments and can be cached by creating a static instance once and always returning it. A more interesting result is <code>StatusCodeHttpResult<\/code>, which can be given any integer to represent the status code to return to the caller. The PR explored multiple ways to cache the <code>StatusCodeHttpResult<\/code> object and showed the performance numbers for each approach.<\/p>\n<table>\n<thead>\n<tr>\n<th><strong>Known status codes (e.g. 
200):<\/strong><\/th>\n<th style=\"text-align: right\">Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Gen 0<\/th>\n<th>Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>NoCache<\/td>\n<td style=\"text-align: right\">2.725 ns<\/td>\n<td style=\"text-align: right\">0.0285 ns<\/td>\n<td style=\"text-align: right\">0.0253 ns<\/td>\n<td style=\"text-align: right\">0.0001<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<tr>\n<td>StaticCacheWithDictionary<\/td>\n<td style=\"text-align: right\">5.733 ns<\/td>\n<td style=\"text-align: right\">0.0373 ns<\/td>\n<td style=\"text-align: right\">0.0331 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithFixedSizeArray<\/td>\n<td style=\"text-align: right\">2.184 ns<\/td>\n<td style=\"text-align: right\">0.0227 ns<\/td>\n<td style=\"text-align: right\">0.0212 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithFixedSizeArrayPerStatusGroup<\/td>\n<td style=\"text-align: right\">3.371 ns<\/td>\n<td style=\"text-align: right\">0.0151 ns<\/td>\n<td style=\"text-align: right\">0.0134 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithConcurrentDictionary<\/td>\n<td style=\"text-align: right\">5.450 ns<\/td>\n<td style=\"text-align: right\">0.1495 ns<\/td>\n<td style=\"text-align: right\">0.1468 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>StaticCacheWithSwitchExpression<\/td>\n<td style=\"text-align: right\">1.867 ns<\/td>\n<td style=\"text-align: right\">0.0045 ns<\/td>\n<td style=\"text-align: right\">0.0042 ns<\/td>\n<td style=\"text-align: 
right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithSwitchExpression<\/td>\n<td style=\"text-align: right\">1.889 ns<\/td>\n<td style=\"text-align: right\">0.0143 ns<\/td>\n<td style=\"text-align: right\">0.0119 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<thead>\n<tr>\n<th><strong>Unknown status codes (e.g. 150):<\/strong><\/th>\n<th style=\"text-align: right\">Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Gen 0<\/th>\n<th>Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>NoCache<\/td>\n<td style=\"text-align: right\">2.477 ns<\/td>\n<td style=\"text-align: right\">0.0818 ns<\/td>\n<td style=\"text-align: right\">0.1005 ns<\/td>\n<td style=\"text-align: right\">0.0001<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<tr>\n<td>StaticCacheWithDictionary<\/td>\n<td style=\"text-align: right\">8.479 ns<\/td>\n<td style=\"text-align: right\">0.0650 ns<\/td>\n<td style=\"text-align: right\">0.0576 ns<\/td>\n<td style=\"text-align: right\">0.0001<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithFixedSizeArray<\/td>\n<td style=\"text-align: right\">2.234 ns<\/td>\n<td style=\"text-align: right\">0.0361 ns<\/td>\n<td style=\"text-align: right\">0.0302 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithFixedSizeArrayPerStatusGroup<\/td>\n<td style=\"text-align: right\">4.809 ns<\/td>\n<td style=\"text-align: right\">0.0360 ns<\/td>\n<td style=\"text-align: right\">0.0281 ns<\/td>\n<td style=\"text-align: right\">0.0001<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithConcurrentDictionary<\/td>\n<td style=\"text-align: 
right\">6.076 ns<\/td>\n<td style=\"text-align: right\">0.0672 ns<\/td>\n<td style=\"text-align: right\">0.0595 ns<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<td style=\"text-align: right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>StaticCacheWithSwitchExpression<\/td>\n<td style=\"text-align: right\">4.195 ns<\/td>\n<td style=\"text-align: right\">0.0823 ns<\/td>\n<td style=\"text-align: right\">0.0770 ns<\/td>\n<td style=\"text-align: right\">0.0001<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<tr>\n<td>DynamicCacheWithSwitchExpression<\/td>\n<td style=\"text-align: right\">4.146 ns<\/td>\n<td style=\"text-align: right\">0.0401 ns<\/td>\n<td style=\"text-align: right\">0.0335 ns<\/td>\n<td style=\"text-align: right\">0.0001<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We ended up picking &#8220;StaticCacheWithSwitchExpression&#8221;, which uses a <a href=\"https:\/\/learn.microsoft.com\/visualstudio\/modeling\/code-generation-and-t4-text-templates\">T4 template<\/a> to generate cached fields for well-known status codes and a switch expression to return them.\nThis approach gave the best performance for known status codes, which will be the common case for most apps.<\/p>\n<h3>IndexOfAny<\/h3>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/39743\">dotnet\/aspnetcore#39743<\/a> from <a href=\"https:\/\/github.com\/martincostello\">@martincostello<\/a> noticed some places where we were passing a <code>char[]<\/code> of length 2 to <code>string.IndexOfAny<\/code>. The char array overload is slower than passing the 2 <code>char<\/code>s directly to the <code>ReadOnlySpan&lt;char&gt;.IndexOfAny<\/code> method. This change updated multiple call sites to remove the <code>char[]<\/code> and use the faster method. 
Note that the span-based <code>IndexOfAny<\/code> provides dedicated overloads for two and three values.<\/p>\n<pre><code class=\"language-csharp\">public class IndexOfAnyBenchmarks\r\n{\r\n    private const string AUrlWithAPathAndQueryString = \"http:\/\/www.example.com\/path\/to\/file.html?query=string\";\r\n    private static readonly char[] QueryStringAndFragmentTokens = new[] { '?', '#' };\r\n\r\n    [Benchmark(Baseline = true)]\r\n    public int IndexOfAny_String()\r\n    {\r\n        return AUrlWithAPathAndQueryString.IndexOfAny(QueryStringAndFragmentTokens);\r\n    }\r\n\r\n    [Benchmark]\r\n    public int IndexOfAny_Span_Array()\r\n    {\r\n        return AUrlWithAPathAndQueryString.AsSpan().IndexOfAny(QueryStringAndFragmentTokens);\r\n    }\r\n\r\n    [Benchmark]\r\n    public int IndexOfAny_Span_Two_Chars()\r\n    {\r\n        return AUrlWithAPathAndQueryString.AsSpan().IndexOfAny('?', '#');\r\n    }\r\n}<\/code><\/pre>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IndexOfAny_String<\/td>\n<td style=\"text-align: right\">7.004 ns<\/td>\n<td style=\"text-align: right\">0.1166 ns<\/td>\n<td style=\"text-align: right\">0.1091 ns<\/td>\n<td style=\"text-align: right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>IndexOfAny_Span_Array<\/td>\n<td style=\"text-align: right\">6.847 ns<\/td>\n<td style=\"text-align: right\">0.0371 ns<\/td>\n<td style=\"text-align: right\">0.0347 ns<\/td>\n<td style=\"text-align: right\">0.98<\/td>\n<\/tr>\n<tr>\n<td>IndexOfAny_Span_Two_Chars<\/td>\n<td style=\"text-align: right\">5.161 ns<\/td>\n<td style=\"text-align: right\">0.0697 ns<\/td>\n<td style=\"text-align: right\">0.0582 ns<\/td>\n<td style=\"text-align: right\">0.73<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Filters<\/h3>\n<p>In .NET 7, we introduced <a 
href=\"https:\/\/learn.microsoft.com\/aspnet\/core\/fundamentals\/minimal-apis\/min-api-filters?view=aspnetcore-7.0\">filters for Minimal APIs<\/a>. When designing the feature, we were very performance-conscious. We profiled filters during previews to find areas where we could improve the code&#8217;s performance after the feature&#8217;s initial merge.<\/p>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/41740\">dotnet\/aspnetcore#41740<\/a> fixed a case where we allocated an empty array for every request to parameterless endpoints when using filters. The fix was to use <code>Array.Empty&lt;object&gt;()<\/code> instead of <code>new object[0]<\/code>. The fix might seem obvious, but when writing <a href=\"https:\/\/learn.microsoft.com\/dotnet\/csharp\/programming-guide\/concepts\/expression-trees\/\">Expression Trees<\/a>, the mistake is quite easy to make.<\/p>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/41379\">dotnet\/aspnetcore#41379<\/a> removed the <a href=\"https:\/\/learn.microsoft.com\/dotnet\/csharp\/programming-guide\/types\/boxing-and-unboxing\">boxing<\/a> allocation for <code>ValueTask&lt;object&gt;<\/code>-returning methods in Minimal APIs, which is what filters used to wrap the user-provided delegate. Removing boxing made it so that the only overhead of adding a filter is allocating a context object that lets the user code inspect the arguments from a request to their endpoint.<\/p>\n<p><a href=\"https:\/\/github.com\/dotnet\/aspnetcore\/pull\/41406\">dotnet\/aspnetcore#41406<\/a> reduced the allocations for creating the filter context object by adding generic class implementations for endpoints with 1 to 10 parameters. 
This change avoids the <code>object[]<\/code> allocation for holding the parameter values and any boxing that would occur if using struct parameters.\nThe improvement can be shown with a simplified example:<\/p>\n<pre><code class=\"language-csharp\">[MemoryDiagnoser]\r\npublic class FilterContext\r\n{\r\n    internal abstract class Context\r\n    {\r\n        public abstract T GetArgument&lt;T&gt;(int index);\r\n    }\r\n\r\n    internal sealed class DefaultContext : Context\r\n    {\r\n        public DefaultContext(params object[] arguments)\r\n        {\r\n            Arguments = arguments;\r\n        }\r\n\r\n        private IList&lt;object?&gt; Arguments { get; }\r\n\r\n        public override T GetArgument&lt;T&gt;(int index)\r\n        {\r\n            return (T)Arguments[index]!;\r\n        }\r\n    }\r\n\r\n    internal sealed class Context&lt;T0&gt; : Context\r\n    {\r\n        public Context(T0 argument)\r\n        {\r\n            Arg0 = argument;\r\n        }\r\n\r\n        public T0 Arg0 { get; set; }\r\n\r\n        public override T GetArgument&lt;T&gt;(int index)\r\n        {\r\n            return index switch\r\n            {\r\n                0 =&gt; (T)(object)Arg0!,\r\n                _ =&gt; throw new IndexOutOfRangeException()\r\n            };\r\n        }\r\n    }\r\n\r\n    [Benchmark]\r\n    public TimeSpan GetArgBoxed()\r\n    {\r\n        var defaultContext = new DefaultContext(new TimeSpan());\r\n        return defaultContext.GetArgument&lt;TimeSpan&gt;(0);\r\n    }\r\n\r\n    [Benchmark]\r\n    public TimeSpan GetArg()\r\n    {\r\n        var typedContext = new Context&lt;TimeSpan&gt;(new TimeSpan());\r\n        return typedContext.GetArgument&lt;TimeSpan&gt;(0);\r\n    }\r\n}<\/code><\/pre>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th style=\"text-align: right\">Mean<\/th>\n<th style=\"text-align: right\">Error<\/th>\n<th style=\"text-align: right\">StdDev<\/th>\n<th style=\"text-align: right\">Gen0<\/th>\n<th style=\"text-align: 
right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GetArgBoxed<\/td>\n<td style=\"text-align: right\">16.865 ns<\/td>\n<td style=\"text-align: right\">0.1878 ns<\/td>\n<td style=\"text-align: right\">0.1466 ns<\/td>\n<td style=\"text-align: right\">0.0102<\/td>\n<td style=\"text-align: right\">80 B<\/td>\n<\/tr>\n<tr>\n<td>GetArg<\/td>\n<td style=\"text-align: right\">3.292 ns<\/td>\n<td style=\"text-align: right\">0.0806 ns<\/td>\n<td style=\"text-align: right\">0.0792 ns<\/td>\n<td style=\"text-align: right\">0.0031<\/td>\n<td style=\"text-align: right\">24 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Summary<\/h2>\n<p>Try out .NET 7 and let us know how your app&#8217;s performance has changed! We are always looking for feedback on how to improve the product and look forward to your contributions, be it an issue report or a PR.\nIf you want more performance goodness, you can read the <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance_improvements_in_net_7\/\">Performance Improvements in .NET 7<\/a> post. Also, take a look at <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/category\/developer-stories\/\">Developer Stories<\/a> which showcases multiple teams at Microsoft migrating from .NET Framework to .NET Core and seeing major performance and <a href=\"https:\/\/www.investopedia.com\/terms\/c\/cogs.asp\">COGS<\/a> wins.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>.NET 7 brings a great amount of performance improvements to ASP.NET Core developers. 
Find out what is new and how to take advantage of the latest enhancements.<\/p>\n","protected":false},"author":82107,"featured_media":43051,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,197,7509,3009],"tags":[7611,32,108],"class_list":["post-43050","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-aspnet","category-aspnetcore","category-performance","tag-dotnet-7","tag-asp-net-core","tag-performance"],"acf":[],"blog_post_summary":"<p>.NET 7 brings a great amount of performance improvements to ASP.NET Core developers. Find out what is new and how to take advantage of the latest enhancements.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/43050","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/82107"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=43050"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/43050\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/43051"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=43050"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=43050"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=43050"}],"curies":[{"name":"wp","href":"https:\/\/api.
w.org\/{rel}","templated":true}]}}