{"id":28871,"date":"2020-07-13T03:54:25","date_gmt":"2020-07-13T10:54:25","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=28871"},"modified":"2025-10-29T11:24:42","modified_gmt":"2025-10-29T18:24:42","slug":"performance-improvements-in-net-5","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-5\/","title":{"rendered":"Performance Improvements in .NET 5"},"content":{"rendered":"<p>In previous releases of .NET Core, I&#8217;ve blogged about the significant performance improvements that found their way into the release. For each post, from <a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2017\/06\/07\/performance-improvements-in-net-core\/\" rel=\"nofollow\">.NET Core 2.0<\/a> to <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-core-2-1\" rel=\"nofollow\">.NET Core 2.1<\/a> to <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-core-3-0\/\" rel=\"nofollow\">.NET Core 3.0<\/a>, I found myself having more and more to talk about.  Yet interestingly, after each I also found myself wondering whether there&#8217;d be enough meaningful improvements next time to warrant another post.  Now that .NET 5 is shipping preview releases, I can definitively say the answer is, again, &#8220;yes&#8221;.  .NET 5 has already seen a wealth of performance improvements, and even though it&#8217;s not scheduled for final release until <a href=\"https:\/\/github.com\/dotnet\/core\/blob\/master\/roadmap.md\">later this year<\/a> and there&#8217;s very likely to be a lot more improvements that find their way in by then, I wanted to highlight a bunch of the improvements that are already available now.  
In this post, I&#8217;ll highlight ~250 pull requests that have contributed to a myriad of performance improvements across .NET 5.<\/p>\n<h3><a id=\"user-content-setup\" class=\"anchor\" aria-hidden=\"true\" href=\"#setup\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"setup\"><\/a>Setup<\/h3>\n<p><a href=\"https:\/\/github.com\/dotnet\/benchmarkdotnet\">BenchmarkDotNet<\/a> is now the canonical tool for measuring the performance of .NET code, making it simple to analyze the throughput and allocation of code snippets.  As such, the majority of my examples in this post are measured using microbenchmarks written with that tool.  
To make it easy to follow-along at home (literally for many of us these days), I started by creating a directory and using the <code>dotnet<\/code> tool to scaffold it:<\/p>\n<div class=\"highlight highlight-source-shell\">\n<pre>mkdir Benchmarks\r\n<span class=\"pl-c1\">cd<\/span> Benchmarks\r\ndotnet new console<\/pre>\n<\/div>\n<p>and I augmented the contents of the generated Benchmarks.csproj to look like the following:<\/p>\n<div class=\"highlight highlight-text-xml\">\n<pre>&lt;<span class=\"pl-ent\">Project<\/span> <span class=\"pl-e\">Sdk<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Microsoft.NET.Sdk<span class=\"pl-pds\">\"<\/span><\/span>&gt;\r\n\r\n  &lt;<span class=\"pl-ent\">PropertyGroup<\/span>&gt;\r\n    &lt;<span class=\"pl-ent\">OutputType<\/span>&gt;Exe&lt;\/<span class=\"pl-ent\">OutputType<\/span>&gt;\r\n    &lt;<span class=\"pl-ent\">AllowUnsafeBlocks<\/span>&gt;true&lt;\/<span class=\"pl-ent\">AllowUnsafeBlocks<\/span>&gt;\r\n    &lt;<span class=\"pl-ent\">ServerGarbageCollection<\/span>&gt;true&lt;\/<span class=\"pl-ent\">ServerGarbageCollection<\/span>&gt;\r\n    &lt;<span class=\"pl-ent\">TargetFrameworks<\/span>&gt;net5.0;netcoreapp3.1;net48&lt;\/<span class=\"pl-ent\">TargetFrameworks<\/span>&gt;\r\n  &lt;\/<span class=\"pl-ent\">PropertyGroup<\/span>&gt;\r\n\r\n  &lt;<span class=\"pl-ent\">ItemGroup<\/span>&gt;\r\n    &lt;<span class=\"pl-ent\">PackageReference<\/span> <span class=\"pl-e\">Include<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>benchmarkdotnet<span class=\"pl-pds\">\"<\/span><\/span> <span class=\"pl-e\">Version<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>0.12.1<span class=\"pl-pds\">\"<\/span><\/span> \/&gt;\r\n  &lt;\/<span class=\"pl-ent\">ItemGroup<\/span>&gt;\r\n\r\n  &lt;<span class=\"pl-ent\">ItemGroup<\/span> <span class=\"pl-e\">Condition<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span> '$(TargetFramework)' == 'net48' <span 
class=\"pl-pds\">\"<\/span><\/span>&gt;\r\n    &lt;<span class=\"pl-ent\">PackageReference<\/span> <span class=\"pl-e\">Include<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>System.Memory<span class=\"pl-pds\">\"<\/span><\/span> <span class=\"pl-e\">Version<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>4.5.4<span class=\"pl-pds\">\"<\/span><\/span> \/&gt;\r\n    &lt;<span class=\"pl-ent\">PackageReference<\/span> <span class=\"pl-e\">Include<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>System.Text.Json<span class=\"pl-pds\">\"<\/span><\/span> <span class=\"pl-e\">Version<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>4.7.2<span class=\"pl-pds\">\"<\/span><\/span> \/&gt;\r\n    &lt;<span class=\"pl-ent\">Reference<\/span> <span class=\"pl-e\">Include<\/span>=<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>System.Net.Http<span class=\"pl-pds\">\"<\/span><\/span> \/&gt;\r\n  &lt;\/<span class=\"pl-ent\">ItemGroup<\/span>&gt;\r\n\r\n&lt;\/<span class=\"pl-ent\">Project<\/span>&gt;<\/pre>\n<\/div>\n<p>This lets me execute the benchmarks against .NET Framework 4.8, .NET Core 3.1, and .NET 5 (I currently have a <a href=\"https:\/\/github.com\/dotnet\/installer\/blob\/master\/README.md#installers-and-binaries\">nightly build<\/a> installed for Preview 8).  
The .csproj also references the <code>BenchmarkDotNet<\/code> NuGet package (the latest release of which is version 0.12.1) in order to be able to use its features, and then references several other libraries and packages, specifically in support of being able to run tests on .NET Framework 4.8.<\/p>\n<p>Then, I updated the generated Program.cs file in the same folder to look like this:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> <span class=\"pl-en\">BenchmarkDotNet<\/span>.<span class=\"pl-en\">Attributes<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">BenchmarkDotNet<\/span>.<span class=\"pl-en\">Diagnosers<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">BenchmarkDotNet<\/span>.<span class=\"pl-en\">Running<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Buffers<\/span>.<span class=\"pl-en\">Text<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Collections<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Collections<\/span>.<span class=\"pl-en\">Concurrent<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Collections<\/span>.<span class=\"pl-en\">Generic<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Collections<\/span>.<span class=\"pl-en\">Immutable<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">IO<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Linq<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Net<\/span>;\r\n<span class=\"pl-k\">using<\/span> 
<span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Net<\/span>.<span class=\"pl-en\">Http<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Net<\/span>.<span class=\"pl-en\">Security<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Net<\/span>.<span class=\"pl-en\">Sockets<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Runtime<\/span>.<span class=\"pl-en\">CompilerServices<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Threading<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Threading<\/span>.<span class=\"pl-en\">Tasks<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Text<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Text<\/span>.<span class=\"pl-en\">Json<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Text<\/span>.<span class=\"pl-en\">RegularExpressions<\/span>;\r\n\r\n[<span class=\"pl-en\">MemoryDiagnoser<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Program<\/span>\r\n{\r\n    <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Main<\/span>(<span class=\"pl-k\">string<\/span>[] <span class=\"pl-smi\">args<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">BenchmarkSwitcher<\/span>.<span class=\"pl-en\">FromAssemblies<\/span>(<span class=\"pl-k\">new<\/span>[] { <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">Program<\/span>).<span class=\"pl-smi\">Assembly<\/span> }).<span class=\"pl-en\">Run<\/span>(<span class=\"pl-smi\">args<\/span>);\r\n\r\n    <span 
class=\"pl-c\"><span class=\"pl-c\">\/\/<\/span> BENCHMARKS GO HERE<\/span>\r\n}<\/pre>\n<\/div>\n<p>and for each test, I copy\/paste the benchmark code shown in each example to where it shows <code>\"\/\/ BENCHMARKS GO HERE\"<\/code>.<\/p>\n<p>To run the benchmarks, I then do:<\/p>\n<div class=\"highlight highlight-source-shell\">\n<pre>dotnet run -c Release -f net48 --runtimes net48 netcoreapp31 netcoreapp50 --filter <span class=\"pl-k\">**<\/span> --join<\/pre>\n<\/div>\n<p>This tells BenchmarkDotNet to:<\/p>\n<ul>\n<li>Build the benchmarks using the .NET Framework 4.8 surface area (which is the lowest common denominator of all three targets and thus works for all of them).<\/li>\n<li>Run the benchmarks against each of .NET Framework 4.8, .NET Core 3.1, and .NET 5.<\/li>\n<li>Include all benchmarks in the assembly (don&#8217;t filter out any).<\/li>\n<li>Join the output together from all results from all benchmarks and display that at the end of the run (rather than interspersed throughout).<\/li>\n<\/ul>\n<p>In some cases where the API in question doesn&#8217;t exist for a particular target, I just leave off that part of the command line.<\/p>\n<p>Finally, a few caveats:<\/p>\n<ul>\n<li>My <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-core-3-0\/\" rel=\"nofollow\">last benchmarks post<\/a> was about .NET Core 3.0.  I didn&#8217;t write one about .NET Core 3.1 because, from a runtime and core libraries perspective, it saw relatively few improvements over its predecessor released just a few months prior.  However, there were some improvements, on top of which in some cases we&#8217;ve already backported improvements made for .NET 5 to .NET Core 3.1, where the changes were deemed impactful enough to warrant being added to the Long Term Support (LTS) release.  
As such, all of my comparisons here are against the latest .NET Core 3.1 servicing release (3.1.5) rather than against .NET Core 3.0.<\/li>\n<li>As the comparisons are about .NET 5 vs .NET Core 3.1, and as .NET Core 3.1 didn&#8217;t include the mono runtime, I&#8217;ve refrained from covering improvements made to mono, as well as core library improvements specifically focused on <a href=\"https:\/\/devblogs.microsoft.com\/aspnet\/blazor-webassembly-3-2-0-now-available\/\" rel=\"nofollow\">&#8220;Blazor&#8221;<\/a>.  Thus when I refer to &#8220;the runtime&#8221;, I&#8217;m referring to coreclr, even though as of .NET 5 there are multiple runtimes under its umbrella, and all of them have been improved.<\/li>\n<li>Most of my examples were run on Windows, because I wanted to be able to compare against .NET Framework 4.8 as well.  However, unless otherwise mentioned, all of the examples shown accrue equally to Windows, Linux, and macOS.<\/li>\n<li>The standard caveat: all measurements here are on my desktop machine, and your mileage may vary.  Microbenchmarks can be very sensitive to any number of factors, including processor count, processor architecture, memory and cache speeds, and on and on.  
However, in general I&#8217;ve focused on performance improvements and included examples that should generally withstand any such differences.<\/li>\n<\/ul>\n<p>Let&#8217;s get started&#8230;<\/p>\n<h2><a id=\"user-content-gc\" class=\"anchor\" aria-hidden=\"true\" href=\"#gc\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"gc\"><\/a>GC<\/h2>\n<p>For anyone interested in .NET and performance, garbage collection is frequently top of mind.  Lots of effort goes into reducing allocation, not because the act of allocating is itself particularly expensive, but because of the follow-on costs in cleaning up after those allocations via the garbage collector (GC).  No matter how much work goes into reducing allocations, however, the vast majority of workloads will incur them, and thus it&#8217;s important to continually push the boundaries of what the GC is able to accomplish, and how quickly.<\/p>\n<p>This release has seen a lot of effort go into improving the GC.  For example, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/25986\">dotnet\/coreclr#25986<\/a> implements a form of work stealing for the &#8220;mark&#8221; phase of the GC.  
The .NET GC is a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tracing_garbage_collection\" rel=\"nofollow\">&#8220;tracing&#8221;<\/a> collector, meaning that (at a very high level) when it runs it starts from a set of &#8220;roots&#8221; (known locations that are inherently reachable, such as a static field) and traverses from object to object, &#8220;marking&#8221; each as being reachable; after all such traversals, any objects not marked are unreachable and can be collected.  This marking represents a significant portion of the time spent performing collections, and this PR improves marking performance by better balancing the work performed by each thread involved in the collection.  When running with the &#8220;Server GC&#8221;, a thread per core is involved in collections, and as threads finish their allotted portions of the marking work, they&#8217;re now able to &#8220;steal&#8221; undone work from other threads in order to help the overall collection complete more quickly.<\/p>\n<p>As another example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35896\">dotnet\/runtime#35896<\/a> optimizes decommits on the &#8220;ephemeral&#8221; segment (gen0 and gen1 are referred to as &#8220;ephemeral&#8221; because they hold objects expected to last for only a short time).  Decommitting is the act of giving pages of memory back to the operating system at the end of segments after the last live object on that segment.  The question for the GC then becomes, when should such decommits happen, and how much should it decommit at any point in time, given that it may end up needing to allocate additional pages for additional allocations at some point in the near future.<\/p>\n<p>Or take <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32795\">dotnet\/runtime#32795<\/a>, which improves the GC&#8217;s scalability on machines with higher core counts by reducing lock contention involved in the GC&#8217;s scanning of statics.  
Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37894\">dotnet\/runtime#37894<\/a>, which avoids costly memory resets (essentially telling the OS that the relevant memory is no longer interesting) unless the GC sees it&#8217;s in a low-memory situation. Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37159\">dotnet\/runtime#37159<\/a>, which (although not yet merged, it is expected to be for .NET 5) builds on the work of <a href=\"https:\/\/github.com\/damageboy\">@damageboy<\/a> to vectorize sorting employed in the GC.  Or <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27729\">dotnet\/coreclr#27729<\/a>, which reduces the time it takes for the GC to suspend threads, something that&#8217;s necessary in order for it to get a stable view so that it can accurately determine which objects are being used.<\/p>\n<p>This is only a partial list of changes made to improve the GC itself, but that last example brings me to a topic of particular fascination for me, as it speaks to a lot of the work we&#8217;ve done in .NET in recent years.  In this release, we&#8217;ve continued, and even accelerated, the process of porting native implementations in the coreclr runtime from C\/C++ to instead be normal C# managed code in System.Private.CoreLib.  Such a move has a plethora of benefits, including making it much easier for us to share a single implementation across multiple runtimes (like coreclr and mono), and even making it easier for us to evolve API surface area, such as by reusing the same logic to handle both arrays and spans.  But one thing that takes some folks by surprise is that such benefits also include performance, in multiple ways.  One such way harkens back to one of the original motivations for using a managed runtime: safety.  By default, code written in C# is &#8220;safe&#8221;, in that the runtime ensures all memory accesses are bounds checked, and only by explicit action visible in the code (e.g. 
using the <code>unsafe<\/code> keyword, the <code>Marshal<\/code> class, the <code>Unsafe<\/code> class, etc.) is a developer able to remove such validation.  As a result, as maintainers of an open source project, our job of shipping a secure system is made significantly easier when contributions come in the form of managed code: while such code can of course contain bugs that might slip through code reviews and automated testing, we can sleep better at night knowing that the chances for such bugs to introduce security problems are drastically reduced.  That in turn means we&#8217;re more likely to accept improvements to managed code and at a higher velocity, with it being faster for a contributor to provide and faster for us to help validate.  We&#8217;ve also found a larger number of contributors interested in exploring performance improvements when they come in the form of C# rather than C.  And more experimentation from more people progressing at a faster rate yields better performance.<\/p>\n<p>There are, however, more direct forms of performance improvements we&#8217;ve seen from such porting.  There is a relatively small amount of overhead required for managed code to call into the runtime, but when such calls are made at high frequency, such overhead adds up.  Consider <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27700\">dotnet\/coreclr#27700<\/a>, which moved the implementation of the sorting of arrays of primitive types out of native code in coreclr and up into C# in CoreLib.  In addition to that code then powering new public APIs for sorting spans, it also made it cheaper to sort smaller arrays where the cost of doing so is dominated by the transition from managed code.  
We can see this with a small benchmark, which is just using <code>Array.Sort<\/code> to sort <code>int[]<\/code>, <code>double[]<\/code>, and <code>string[]<\/code> arrays of 10 items:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">public<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">DoubleSorting<\/span> : <span class=\"pl-en\">Sorting<\/span>&lt;<span class=\"pl-k\">double<\/span>&gt; { <span class=\"pl-k\">protected<\/span> <span class=\"pl-k\">override<\/span> <span class=\"pl-k\">double<\/span> <span class=\"pl-en\">GetNext<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_random<\/span>.<span class=\"pl-en\">Next<\/span>(); }\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Int32Sorting<\/span> : <span class=\"pl-en\">Sorting<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; { <span class=\"pl-k\">protected<\/span> <span class=\"pl-k\">override<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">GetNext<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_random<\/span>.<span class=\"pl-en\">Next<\/span>(); }\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">StringSorting<\/span> : <span class=\"pl-en\">Sorting<\/span>&lt;<span class=\"pl-k\">string<\/span>&gt;\r\n{\r\n    <span class=\"pl-k\">protected<\/span> <span class=\"pl-k\">override<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">GetNext<\/span>()\r\n    {\r\n        <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">dest<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">char<\/span>[<span class=\"pl-smi\">_random<\/span>.<span class=\"pl-en\">Next<\/span>(<span class=\"pl-c1\">1<\/span>, <span class=\"pl-c1\">5<\/span>)];\r\n        <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span 
class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">dest<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>) <span class=\"pl-smi\">dest<\/span>[<span class=\"pl-smi\">i<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-smi\">char<\/span>)(<span class=\"pl-s\">'a'<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">_random<\/span>.<span class=\"pl-en\">Next<\/span>(<span class=\"pl-c1\">26<\/span>));\r\n        <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">string<\/span>(<span class=\"pl-smi\">dest<\/span>);\r\n    }\r\n}\r\n\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">abstract<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Sorting<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;\r\n{\r\n    <span class=\"pl-k\">protected<\/span> <span class=\"pl-en\">Random<\/span> <span class=\"pl-smi\">_random<\/span>;\r\n    <span class=\"pl-k\">private<\/span> <span class=\"pl-en\">T<\/span>[] <span class=\"pl-smi\">_orig<\/span>, <span class=\"pl-smi\">_array<\/span>;\r\n\r\n    [<span class=\"pl-en\">Params<\/span>(<span class=\"pl-c1\">10<\/span>)]\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">Size<\/span> { <span class=\"pl-k\">get<\/span>; <span class=\"pl-k\">set<\/span>; }\r\n\r\n    <span class=\"pl-k\">protected<\/span> <span class=\"pl-k\">abstract<\/span> <span class=\"pl-en\">T<\/span> <span class=\"pl-en\">GetNext<\/span>();\r\n\r\n    [<span class=\"pl-en\">GlobalSetup<\/span>]\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n    {\r\n        <span class=\"pl-smi\">_random<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Random<\/span>(<span 
class=\"pl-c1\">42<\/span>);\r\n        <span class=\"pl-smi\">_orig<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-smi\">Size<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-smi\">_<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-en\">GetNext<\/span>()).<span class=\"pl-en\">ToArray<\/span>();\r\n        <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> (<span class=\"pl-en\">T<\/span>[])<span class=\"pl-smi\">_orig<\/span>.<span class=\"pl-en\">Clone<\/span>();\r\n        <span class=\"pl-smi\">Array<\/span>.<span class=\"pl-en\">Sort<\/span>(<span class=\"pl-smi\">_array<\/span>);\r\n    }\r\n\r\n    [<span class=\"pl-en\">Benchmark<\/span>]\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Random<\/span>()\r\n    {\r\n        <span class=\"pl-smi\">_orig<\/span>.<span class=\"pl-en\">AsSpan<\/span>().<span class=\"pl-en\">CopyTo<\/span>(<span class=\"pl-smi\">_array<\/span>);\r\n        <span class=\"pl-smi\">Array<\/span>.<span class=\"pl-en\">Sort<\/span>(<span class=\"pl-smi\">_array<\/span>);\r\n    }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Type<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DoubleSorting<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">88.88 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>DoubleSorting<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">73.29 ns<\/td>\n<td align=\"right\">0.83<\/td>\n<\/tr>\n<tr>\n<td>DoubleSorting<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">35.83 ns<\/td>\n<td align=\"right\">0.40<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>Int32Sorting<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">66.34 ns<\/td>\n<td 
align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Int32Sorting<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">48.47 ns<\/td>\n<td align=\"right\">0.73<\/td>\n<\/tr>\n<tr>\n<td>Int32Sorting<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">31.07 ns<\/td>\n<td align=\"right\">0.47<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>StringSorting<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">2,193.86 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>StringSorting<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">1,713.11 ns<\/td>\n<td align=\"right\">0.78<\/td>\n<\/tr>\n<tr>\n<td>StringSorting<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">1,400.96 ns<\/td>\n<td align=\"right\">0.64<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This in and of itself is a nice benefit of the move, as is the fact that in .NET 5 via <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37630\">dotnet\/runtime#37630<\/a> we also added <code>System.Half<\/code>, a new 16-bit floating-point primitive, and being in managed code, this sorting implementation&#8217;s optimizations almost immediately applied to it, whereas the previous native implementation would have required significant additional work, with no C++ standard type for <code>half<\/code>.  But, there&#8217;s an arguably even more impactful performance benefit here, and it brings us back to where I started this discussion: GC.<\/p>\n<p>One of the interesting metrics for the GC is &#8220;pause time&#8221;, which effectively means how long the GC must pause the runtime in order to perform its work.  Longer pause times have a direct impact on latency, which can be a crucial metric for all manner of workloads.  
As alluded to earlier, the GC may need to suspend threads in order to get a consistent view of the world and to ensure that it can move objects around safely, but if a thread is currently executing C\/C++ code in the runtime, the GC may need to wait until that call completes before it&#8217;s able to suspend the thread.  Thus, the more work we can do in managed code instead of native code, the better off we are for GC pause times.  We can use the same <code>Array.Sort<\/code> example to see this.  Consider this program:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Diagnostics<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Threading<\/span>;\r\n\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Program<\/span>\r\n{\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Main<\/span>()\r\n    {\r\n        <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Thread<\/span>(() <span class=\"pl-k\">=&gt;<\/span>\r\n        {\r\n            <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">a<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">int<\/span>[<span class=\"pl-c1\">20<\/span>];\r\n            <span class=\"pl-k\">while<\/span> (<span class=\"pl-c1\">true<\/span>) <span class=\"pl-smi\">Array<\/span>.<span class=\"pl-en\">Sort<\/span>(<span class=\"pl-smi\">a<\/span>);\r\n        }) { <span class=\"pl-smi\">IsBackground<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span> }.<span class=\"pl-en\">Start<\/span>();\r\n\r\n        <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">sw<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span 
class=\"pl-en\">Stopwatch<\/span>();\r\n        <span class=\"pl-k\">while<\/span> (<span class=\"pl-c1\">true<\/span>)\r\n        {\r\n            <span class=\"pl-smi\">sw<\/span>.<span class=\"pl-en\">Restart<\/span>();\r\n            <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">10<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n            {\r\n                <span class=\"pl-smi\">GC<\/span>.<span class=\"pl-en\">Collect<\/span>();\r\n                <span class=\"pl-smi\">Thread<\/span>.<span class=\"pl-en\">Sleep<\/span>(<span class=\"pl-c1\">15<\/span>);\r\n            }\r\n            <span class=\"pl-smi\">Console<\/span>.<span class=\"pl-en\">WriteLine<\/span>(<span class=\"pl-smi\">sw<\/span>.<span class=\"pl-smi\">Elapsed<\/span>.<span class=\"pl-smi\">TotalSeconds<\/span>);\r\n        }\r\n    }\r\n}<\/pre>\n<\/div>\n<p>This is spinning up a thread that just sits in a tight loop sorting a small array over and over, while on the main thread it performs 10 GCs, each with approximately 15 milliseconds between them.  So, we&#8217;d expect that loop to take a little more than 150 milliseconds.  But when I run this on .NET Core 3.1, I get numbers of seconds like this:<\/p>\n<div class=\"highlight highlight-source-shell\">\n<pre>6.6419048\r\n5.5663149\r\n5.7430339\r\n6.032052\r\n7.8892468<\/pre>\n<\/div>\n<p>The GC has difficulty here interrupting the thread performing the sorts, causing the GC pause times to be way higher than desirable.  Thankfully, when I instead run this on .NET 5, I get numbers like this:<\/p>\n<div class=\"highlight highlight-source-shell\">\n<pre>0.159311\r\n0.159453\r\n0.1594669\r\n0.1593328\r\n0.1586566<\/pre>\n<\/div>\n<p>which is exactly what we predicted we should get.  
By moving the Array.Sort implementation into managed code, where the runtime can more easily suspend the implementation when it wants to, we&#8217;ve made it possible for the GC to be much better at its job.<\/p>\n<p>This isn&#8217;t limited to just <code>Array.Sort<\/code>, of course.  A bunch of PRs performed such porting, for example <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32722\">dotnet\/runtime#32722<\/a> moving the <code>stdelemref<\/code> and <code>ldelemaref<\/code> JIT helpers to C#, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32353\">dotnet\/runtime#32353<\/a> moving portions of the <code>unbox<\/code> helper to C# (and instrumenting the rest with appropriate GC polling locations that let the GC suspend appropriately in the rest), <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27603\">dotnet\/coreclr#27603<\/a> \/ <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27634\">dotnet\/coreclr#27634<\/a> \/ <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27123\">dotnet\/coreclr#27123<\/a> \/ <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27776\">dotnet\/coreclr#27776<\/a> moving more array implementations like <code>Array.Clear<\/code> and <code>Array.Copy<\/code> to C#, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27216\">dotnet\/coreclr#27216<\/a> moving more of <code>Buffer<\/code> to C#, and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27792\">dotnet\/coreclr#27792<\/a> moving <code>Enum.CompareTo<\/code> to C#.  
Some of these changes then enabled subsequent gains, such as with <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32342\">dotnet\/runtime#32342<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35733\">dotnet\/runtime#35733<\/a>, which employed the improvements in <code>Buffer.Memmove<\/code> to achieve additional gains in various <code>string<\/code> and <code>Array<\/code> methods.<\/p>\n<p>As one final thought on this set of changes, another interesting thing to note is how micro-optimizations made in one release may be based on assumptions that are later invalidated, and when employing such micro-optimizations, one needs to be ready and willing to adapt.  In my .NET Core 3.0 blog post, I called out &#8220;peanut butter&#8221; changes like <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/21756\">dotnet\/coreclr#21756<\/a>, which switched lots of call sites from using <code>Array.Copy(source, destination, length)<\/code> to instead use <code>Array.Copy(source, sourceOffset, destination, destinationOffset, length)<\/code>, because the overhead involved in the former getting the lower bounds of the source and destination arrays was measurable.  But with the aforementioned set of changes that moved array-processing code to C#, the simpler overload&#8217;s overheads disappeared, making it both the simpler and faster choice for these operations.  As such, for .NET 5, PRs <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27641\">dotnet\/coreclr#27641<\/a> and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/42343\">dotnet\/corefx#42343<\/a> switched all of these call sites and more back to using the simpler overload.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36304\">dotnet\/runtime#36304<\/a> is another example of undoing previous optimizations due to changes that made them obsolete or actually harmful.  You&#8217;ve always been able to pass a single character to <code>String.Split<\/code>, e.g. 
<code>version.Split('.')<\/code>.  The problem, however, was that the only overload of <code>Split<\/code> that this could bind to was <code>Split(params char[] separator)<\/code>, which means that every such call resulted in the C# compiler generating a <code>char[]<\/code> allocation.  To work around that, previous releases saw caches added, allocating arrays ahead of time and storing them into statics that could then be used by <code>Split<\/code> calls to avoid the per-call <code>char[]<\/code>.  Now that there&#8217;s a <code>Split(char separator, StringSplitOptions options = StringSplitOptions.None)<\/code> overload in .NET, we no longer need the array at all.<\/p>\n<p>As one last example, I showed how moving code out of the runtime and into managed code can help with GC pauses, but there are of course other ways code remaining in the runtime can help with that.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36179\">dotnet\/runtime#36179<\/a> reduced GC pauses due to exception handling by ensuring the runtime was in <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/4fdf9ff8812869dcf957ce0d2eb07c0d5779d1c6\/docs\/coding-guidelines\/clr-code-guide.md#218-use-the-right-gc-mode--preemptive-vs-cooperative\">preemptive mode<\/a> around code such as getting &#8220;Watson&#8221; bucket parameters (basically, a set of data that uniquely identifies this particular exception and call stack for reporting purposes).<\/p>\n<h2><a id=\"user-content-jit\" class=\"anchor\" aria-hidden=\"true\" href=\"#jit\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 
00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"jit\"><\/a>JIT<\/h2>\n<p>.NET 5 is an exciting version for the Just-In-Time (JIT) compiler, too, with improvements of all kinds finding their way into the release.  As with any compiler, improvements made to the JIT can have wide-reaching effects.  Often individual changes have a small impact on an individual piece of code, but such changes are then magnified by the sheer number of places they apply.<\/p>\n<p>There is an almost unbounded number of optimizations that can be added to the JIT, and given an unlimited amount of time to run such optimizations, the JIT could create optimal code for any given scenario.  But the JIT doesn&#8217;t have an unbounded amount of time.  The &#8220;just-in-time&#8221; nature of the JIT means it&#8217;s performing the compilation as the app runs: when a method that hasn&#8217;t yet been compiled is invoked, the JIT needs to provide the assembly code for it on demand.  That means the thread can&#8217;t make forward progress until the compilation has completed, which in turn means the JIT needs to be strategic in what optimizations it applies and how it chooses to use its limited time budget.  
Various techniques are used to give the JIT more time, such as using &#8220;ahead of time&#8221; compilation (AOT) on some portions of the app to do as much of the compilation work as is possible before the app is executed (for example, the core libraries are all AOT compiled using a technology named <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/99aae90739c2ad5642a36873334c82a8b7fb2de9\/docs\/design\/coreclr\/botr\/readytorun-overview.md\">&#8220;ReadyToRun&#8221;<\/a>, which you may hear referred to as &#8220;R2R&#8221; or even &#8220;crossgen&#8221;, which is the tool that produces these images), or by using <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/9900dfb4b2e32cf02ca846adaf11e93211629ede\/docs\/design\/features\/tiered-compilation.md\">&#8220;tiered compilation&#8221;<\/a>, which allows the JIT to initially compile a method with few-to-no optimizations applied and thus be very fast in doing so, and only spend more time recompiling it with many more optimizations when it&#8217;s deemed valuable, namely when the method is shown to be used repeatedly.  However, more generally the developers contributing to the JIT simply choose to use the allotted time budget for optimizations that prove to be valuable given the code developers are writing and the code patterns they&#8217;re employing.  That means that as .NET evolves and gains new capabilities, new language features, and new library features, the JIT also evolves with optimizations suited to the newer style of code being written.<\/p>\n<p>A great example of that is with <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32538\">dotnet\/runtime#32538<\/a> from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a>.  
<code>Span&lt;T&gt;<\/code> has been permeating all layers of the .NET stack, as developers working on the runtime, core libraries, ASP.NET Core, and beyond recognize its power when it comes to writing safe and efficient code that also unifies handling for strings, managed arrays, natively-allocated memory, and other forms of data.  Similarly, value types (structs) are being used much more pervasively as a way to avoid object allocation overheads via stack allocation. But this heavy reliance on such types also introduces additional headaches for the runtime.  The coreclr runtime uses a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tracing_garbage_collection#Precise_vs._conservative_and_internal_pointers\" rel=\"nofollow\">&#8220;precise&#8221; garbage collector<\/a>, which means the GC is able to track with 100% accuracy what values refer to managed objects and what values don&#8217;t; that has benefits, but it also has cost (in contrast, the mono runtime uses a &#8220;conservative&#8221; garbage collector, which has some performance benefits, but also means it may interpret an arbitrary value on the stack that happens to be the same as a managed object&#8217;s address as being a live reference to that object).  One such cost is that the JIT needs to help the GC by guaranteeing that any local that could be interpreted as an object reference is zero&#8217;d out prior to the GC paying attention to it; otherwise, the GC could end up seeing a garbage value in a local that hadn&#8217;t been set yet, and assume it referred to a valid object, at which point &#8220;bad things&#8221; can happen.  The more reference locals there are, the more clearing needs to be done.  If you&#8217;re just clearing a few locals, it&#8217;s probably not noticeable.  But as the number increases, the amount of time spent clearing those locals can add up, especially in a small method used in a very hot code path.  
This situation has become much more common with spans and structs, where coding patterns often result in many more references (a <code>Span&lt;T&gt;<\/code> contains a reference) that need to be zero&#8217;d.  The aforementioned PR addressed this by updating the JIT&#8217;s generated code for the prolog blocks that perform this zero&#8217;ing to use <code>xmm<\/code> registers rather than using the <code>rep stosd<\/code> instruction.  Effectively, it vectorized the zeroing.  You can see the impact of this with the following benchmark:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Zeroing<\/span>()\r\n{\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s1<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>hello world<span class=\"pl-pds\">\"<\/span><\/span>;\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s2<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s1<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s3<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s2<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s4<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s3<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s5<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s4<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span 
class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s6<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s5<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s7<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s6<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s8<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s7<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s9<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s8<\/span>);\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">s10<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-smi\">s9<\/span>);\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">s1<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s2<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s3<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s4<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s5<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s6<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s7<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s8<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s9<\/span>.<span 
class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s10<\/span>.<span class=\"pl-smi\">Length<\/span>;\r\n}\r\n\r\n[<span class=\"pl-en\">MethodImpl<\/span>(<span class=\"pl-smi\">MethodImplOptions<\/span>.<span class=\"pl-smi\">NoInlining<\/span>)]\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-en\">Nop<\/span>(<span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">span<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">default<\/span>;<\/pre>\n<\/div>\n<p>On my machine, I get results like the following:<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Zeroing<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">22.85 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Zeroing<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">18.60 ns<\/td>\n<td align=\"right\">0.81<\/td>\n<\/tr>\n<tr>\n<td>Zeroing<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">15.07 ns<\/td>\n<td align=\"right\">0.66<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Note that such zero&#8217;ing is actually needed in more situations than I mentioned.  In particular, by default the C# specification requires that all locals be initialized to their default values before the developer&#8217;s code is executed.  
You can see this with an example like this:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Runtime<\/span>.<span class=\"pl-en\">CompilerServices<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Threading<\/span>;\r\n\r\n<span class=\"pl-k\">unsafe<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Program<\/span>\r\n{\r\n    <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Main<\/span>()\r\n    {\r\n        <span class=\"pl-k\">while<\/span> (<span class=\"pl-c1\">true<\/span>)\r\n        {\r\n            <span class=\"pl-en\">Example<\/span>();\r\n            <span class=\"pl-smi\">Thread<\/span>.<span class=\"pl-en\">Sleep<\/span>(<span class=\"pl-c1\">1<\/span>);\r\n        }\r\n    }\r\n\r\n    [<span class=\"pl-en\">MethodImpl<\/span>(<span class=\"pl-smi\">MethodImplOptions<\/span>.<span class=\"pl-smi\">NoInlining<\/span>)]\r\n    <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Example<\/span>()\r\n    {\r\n        <span class=\"pl-en\">Guid<\/span> <span class=\"pl-smi\">g<\/span>;\r\n        <span class=\"pl-smi\">Console<\/span>.<span class=\"pl-en\">WriteLine<\/span>(<span class=\"pl-k\">*<\/span><span class=\"pl-k\">&amp;<\/span><span class=\"pl-smi\">g<\/span>);\r\n    }\r\n}<\/pre>\n<\/div>\n<p>Run that, and you should see only <code>Guid<\/code>s of all <code>0<\/code>s output.  That&#8217;s because the C# compiler is emitting a <code>.locals init<\/code> flag into the IL for the compiled <code>Example<\/code> method, and that <code>.locals init<\/code> tells the JIT it needs to zero out all locals, not just those that contain references.  
However, in .NET 5, there&#8217;s a new attribute in the runtime (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/454\">dotnet\/runtime#454<\/a>):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">namespace<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Runtime<\/span>.<span class=\"pl-en\">CompilerServices<\/span>\r\n{\r\n    [<span class=\"pl-en\">AttributeUsage<\/span>(<span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Module<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Class<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Struct<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Constructor<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Method<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Property<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Event<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">AttributeTargets<\/span>.<span class=\"pl-smi\">Interface<\/span>, <span class=\"pl-en\">Inherited<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">false<\/span>)]\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">sealed<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">SkipLocalsInitAttribute<\/span> : <span class=\"pl-en\">Attribute<\/span> { }\r\n}<\/pre>\n<\/div>\n<p>This attribute is recognized by the C# compiler and is used to tell the compiler to not emit the <code>.locals init<\/code> when it otherwise would have.  
If we make a small tweak to the previous example, adding the attribute to the whole module:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Runtime<\/span>.<span class=\"pl-en\">CompilerServices<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>.<span class=\"pl-en\">Threading<\/span>;\r\n\r\n[<span class=\"pl-k\">module<\/span>: <span class=\"pl-en\">SkipLocalsInit<\/span>]\r\n\r\n<span class=\"pl-k\">unsafe<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Program<\/span>\r\n{\r\n    <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Main<\/span>()\r\n    {\r\n        <span class=\"pl-k\">while<\/span> (<span class=\"pl-c1\">true<\/span>)\r\n        {\r\n            <span class=\"pl-en\">Example<\/span>();\r\n            <span class=\"pl-smi\">Thread<\/span>.<span class=\"pl-en\">Sleep<\/span>(<span class=\"pl-c1\">1<\/span>);\r\n        }\r\n    }\r\n\r\n    [<span class=\"pl-en\">MethodImpl<\/span>(<span class=\"pl-smi\">MethodImplOptions<\/span>.<span class=\"pl-smi\">NoInlining<\/span>)]\r\n    <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Example<\/span>()\r\n    {\r\n        <span class=\"pl-en\">Guid<\/span> <span class=\"pl-smi\">g<\/span>;\r\n        <span class=\"pl-smi\">Console<\/span>.<span class=\"pl-en\">WriteLine<\/span>(<span class=\"pl-k\">*<\/span><span class=\"pl-k\">&amp;<\/span><span class=\"pl-smi\">g<\/span>);\r\n    }\r\n}<\/pre>\n<\/div>\n<p>you should now see different results, in particular you should very likely see non-zero <code>Guid<\/code>s.  
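<\/p>\n<p>The attribute doesn&#8217;t have to be applied module-wide.  As a minimal sketch (the <code>Checksum<\/code> method and its 64-byte buffer are made up for illustration, and the project is assumed to be compiled with unsafe code enabled), it can be placed on a single method that <code>stackalloc<\/code>s scratch space, provided that method writes every element before reading any of them:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>using System;\r\nusing System.Runtime.CompilerServices;\r\n\r\nunsafe class Program\r\n{\r\n    static void Main() =&gt; Console.WriteLine(Checksum(42));\r\n\r\n    \/\/ Hypothetical method: only this method opts out of .locals init.\r\n    [SkipLocalsInit]\r\n    static int Checksum(byte seed)\r\n    {\r\n        \/\/ With the attribute applied, this memory is not guaranteed to be zeroed.\r\n        Span&lt;byte&gt; scratch = stackalloc byte[64];\r\n\r\n        \/\/ Overwrite the whole buffer before reading anything from it.\r\n        scratch.Fill(seed);\r\n\r\n        int sum = 0;\r\n        foreach (byte b in scratch) sum += b;\r\n        return sum;\r\n    }\r\n}<\/pre>\n<\/div>\n<p>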
As of <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37541\">dotnet\/runtime#37541<\/a>, the core libraries in .NET 5 all use this attribute now to disable <code>.locals init<\/code> (in previous releases, <code>.locals init<\/code> was stripped out by a post-compilation step employed when building the core libraries).  Note that the C# compiler only allows <code>SkipLocalsInit<\/code> to be used in <code>unsafe<\/code> contexts, because it can easily result in corruption in code that hasn&#8217;t been appropriately validated for its use (so be thoughtful if \/ when you apply it).<\/p>\n<p>In addition to making zero&#8217;ing faster, there also have been changes to remove the zero&#8217;ing entirely.  For example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/31960\">dotnet\/runtime#31960<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36918\">dotnet\/runtime#36918<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37786\">dotnet\/runtime#37786<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38314\">dotnet\/runtime#38314<\/a> all contributed to removing zero&#8217;ing when the JIT could prove it to be duplicative.<\/p>\n<p>Such zero&#8217;ing is an example of a tax incurred for managed code, with the runtime needing it in order to provide guarantees of its model and of the requirements of the languages above it.  Another such tax is bounds checking.  One of the great advantages of using managed code is that a whole class of potential security vulnerabilities is made irrelevant by default.  The runtime ensures that indexing into arrays, strings, and spans is bounds-checked, meaning the runtime injects checks to ensure that the index being requested is within the bounds of the data being indexed (i.e. greater than or equal to zero and less than the length of the data).  
Here&#8217;s a simple example:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">public<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">char<\/span> <span class=\"pl-en\">Get<\/span>(<span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">s<\/span>, <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">s<\/span>[<span class=\"pl-smi\">i<\/span>];<\/pre>\n<\/div>\n<p>For this code to be safe, the runtime needs to generate a check that <code>i<\/code> falls within the bounds of string <code>s<\/code>, which the JIT does by using assembly like the following:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.Get(System.String, Int32)<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">sub<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">28<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">edx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jae<\/span><span class=\"pl-en\">       short M01_L00<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">movsxd<\/span><span class=\"pl-en\">    <\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">edx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">movzx<\/span><span class=\"pl-en\">     <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-en\">word ptr <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-v\">rax<\/span><span 
class=\"pl-s1\">*<\/span><span class=\"pl-c1\">2<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-en\">0C<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">28<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-en\">M01_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      CORINFO_HELP_RNGCHKFAIL<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 28<\/span><\/pre>\n<\/div>\n<p>This assembly was generated via a handy feature of Benchmark.NET: add <code>[DisassemblyDiagnoser]<\/code> to the class containing the benchmarks, and it spits out the disassembled assembly code.  We can see that the assembly takes the string (passed via the <code>rcx<\/code> register), loads the string&#8217;s length (which is stored 8 bytes into the object, hence the <code>[rcx+8]<\/code>), and compares that with <code>i<\/code> passed in the <code>edx<\/code> register.  If an unsigned comparison (unsigned so that any negative values wrap around to be larger than the length) finds <code>i<\/code> to be greater than or equal to the length, the code jumps to a helper, <code>CORINFO_HELP_RNGCHKFAIL<\/code>, that throws an exception.  That&#8217;s just a few instructions, but certain kinds of code can spend a lot of cycles indexing, and thus it&#8217;s helpful when the JIT can eliminate as many of the bounds checks as it can prove to be unnecessary.<\/p>\n<p>The JIT has already been capable of removing bounds checks in a variety of situations.  
For example, when you write the loop:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">int<\/span>[] <span class=\"pl-smi\">arr<\/span> <span class=\"pl-k\">=<\/span> ...;\r\n<span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">arr<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    <span class=\"pl-en\">Use<\/span>(<span class=\"pl-smi\">arr<\/span>[<span class=\"pl-smi\">i<\/span>]);<\/pre>\n<\/div>\n<p>the JIT can prove that <code>i<\/code> will never be outside the bounds of the array, and so it can elide the bounds checks it would otherwise generate.  In .NET 5, it can remove bounds checking in more places.  For example, consider this function that writes the bytes of an integer as characters to a span:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">TryToHex<\/span>(<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">value<\/span>, <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">span<\/span>)\r\n{\r\n    <span class=\"pl-k\">if<\/span> ((<span class=\"pl-k\">uint<\/span>)<span class=\"pl-smi\">span<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">&lt;=<\/span> <span class=\"pl-c1\">7<\/span>)\r\n        <span class=\"pl-k\">return<\/span> <span class=\"pl-c1\">false<\/span>;\r\n\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">byte<\/span>&gt; <span class=\"pl-smi\">map<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[] { (<span class=\"pl-smi\">byte<\/span>)<span 
class=\"pl-s\">'0'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'1'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'2'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'3'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'4'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'5'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'6'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'7'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'8'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'9'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'A'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'B'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'C'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'D'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'E'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'F'<\/span> };\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">28<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">1<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">24<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">2<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span 
class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">20<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">3<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">16<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">4<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">12<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">5<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">8<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">6<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[(<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&gt;&gt;<\/span> <span class=\"pl-c1\">4<\/span>) <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-smi\">span<\/span>[<span class=\"pl-c1\">7<\/span>] <span class=\"pl-k\">=<\/span> (<span class=\"pl-k\">char<\/span>)<span class=\"pl-smi\">map<\/span>[<span class=\"pl-smi\">value<\/span> <span class=\"pl-k\">&amp;<\/span> <span class=\"pl-c1\">0xF<\/span>];\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-c1\">true<\/span>;\r\n}\r\n\r\n<span 
class=\"pl-k\">private<\/span> <span class=\"pl-k\">char<\/span>[] <span class=\"pl-smi\">_buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">char<\/span>[<span class=\"pl-c1\">100<\/span>];\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">BoundsChecking<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-en\">TryToHex<\/span>(<span class=\"pl-smi\">int<\/span>.<span class=\"pl-smi\">MaxValue<\/span>, <span class=\"pl-smi\">_buffer<\/span>);<\/pre>\n<\/div>\n<p>First, in this example it&#8217;s worth noting we&#8217;re relying on a C# compiler optimization.  Note the:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">byte<\/span>&gt; <span class=\"pl-smi\">map<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[] { (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'0'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'1'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'2'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'3'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'4'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'5'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'6'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'7'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'8'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'9'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'A'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'B'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'C'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span 
class=\"pl-s\">'D'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'E'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'F'<\/span> };<\/pre>\n<\/div>\n<p>That looks terribly expensive, like we&#8217;re allocating a byte array on each call to <code>TryToHex<\/code>.  In fact, it&#8217;s not, and it&#8217;s actually better than if we had done:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">s_map<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[] { (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'0'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'1'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'2'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'3'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'4'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'5'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'6'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'7'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'8'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'9'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'A'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'B'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'C'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'D'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'E'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'F'<\/span> };\r\n...\r\n<span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">byte<\/span>&gt; <span class=\"pl-smi\">map<\/span> <span 
class=\"pl-k\">=<\/span> <span class=\"pl-smi\">s_map<\/span>;<\/pre>\n<\/div>\n<p>The C# compiler recognizes the pattern of a new byte array being assigned directly to a <code>ReadOnlySpan&lt;byte&gt;<\/code> (it also recognizes <code>sbyte<\/code> and <code>bool<\/code>, but nothing larger than a byte because of endianness concerns).  Because the array nature is then completely hidden by the span, the C# compiler emits that by actually storing the bytes into the assembly&#8217;s data section, and the span is just created by wrapping it around a pointer to the static data and the length:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-en\">IL_000c: ldsflda valuetype <\/span><span class=\"pl-s\">'&lt;PrivateImplementationDetails&gt;'<\/span><span class=\"pl-en\">\/<\/span><span class=\"pl-s\">'__StaticArrayInitTypeSize=16'<\/span><span class=\"pl-en\"> <\/span><span class=\"pl-s\">'&lt;PrivateImplementationDetails&gt;'<\/span><span class=\"pl-en\">::<\/span><span class=\"pl-s\">'2125B2C332B1113AAE9BFC5E9F7E3B4C91D828CB942C2DF1EEB02502ECCAE9E9'<\/span>\r\n<span class=\"pl-en\">IL_0011: ldc.i4.s <\/span><span class=\"pl-c1\">16<\/span>\r\n<span class=\"pl-en\">IL_0013: newobj instance void valuetype <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-en\">System.Runtime<\/span><span class=\"pl-s1\">]<\/span><span class=\"pl-en\">System.ReadOnlySpan'<\/span><span class=\"pl-c1\">1<\/span><span class=\"pl-en\">&lt;uint8&gt;::.ctor(void<\/span><span class=\"pl-s1\">*,<\/span><span class=\"pl-en\"> int32)<\/span><\/pre>\n<\/div>\n<p>This is important for this JIT discussion, because of that <code>ldc.i4.s 16<\/code> in the above.  That&#8217;s the IL loading the length of 16 to use to create the span, and the JIT can see that.  It knows then that the span has a length of 16, which means if it can prove that an access is always to a value greater than or equal to 0 and less than 16, it needn&#8217;t bounds check that access. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1644\">dotnet\/runtime#1644<\/a> did exactly that, recognizing patterns like <code>array[index % const]<\/code>, and eliding the bounds check when the <code>const<\/code> was less than or equal to the length.  In the previous <code>TryToHex<\/code> example, the JIT can see that the <code>map<\/code> span has a length of 16, and it can see that all of the indexing into it is done with <code>&amp; 0xF<\/code>, meaning all values will end up being in range, and thus it can eliminate all of the bounds checks on <code>map<\/code>.  Combine that with the fact that it could already see that no bounds checking is needed on the writes into the <code>span<\/code> (because it could see the length check earlier in the method guarded all of the indexing into <code>span<\/code>), and this whole method is bounds-check-free in .NET 5.  On my machine, this benchmark yields results like the following:<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>BoundsChecking<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">14.466 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">830 B<\/td>\n<\/tr>\n<tr>\n<td>BoundsChecking<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">4.264 ns<\/td>\n<td align=\"right\">0.29<\/td>\n<td align=\"right\">320 B<\/td>\n<\/tr>\n<tr>\n<td>BoundsChecking<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">3.641 ns<\/td>\n<td align=\"right\">0.25<\/td>\n<td align=\"right\">249 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Note the .NET 5 run is not only 15% faster than the .NET Core 3.1 run, we can see its assembly code size is 22% smaller (the extra &#8220;Code Size&#8221; column comes from my having added <code>[DisassemblyDiagnoser]<\/code> to the benchmark class).<\/p>\n<p>Another nice bounds checking removal comes from <a 
href=\"https:\/\/github.com\/nathan-moore\">@nathan-moore<\/a> in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36263\">dotnet\/runtime#36263<\/a>.  I mentioned that the JIT is already able to remove bounds checking for the very common pattern of iterating from 0 to the array, string, or span&#8217;s length, but there are variations on this that are also relatively common but that weren&#8217;t previously recognized.  For example, consider this microbenchmark which calls a method that detects whether a span of integers is sorted:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">int<\/span>[] <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">1000<\/span>).<span class=\"pl-en\">ToArray<\/span>();\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsSorted<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-en\">IsSorted<\/span>(<span class=\"pl-smi\">_array<\/span>);\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsSorted<\/span>(<span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">span<\/span>)\r\n{\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">span<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">-<\/span> <span class=\"pl-c1\">1<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-k\">if<\/span> (<span 
class=\"pl-smi\">span<\/span>[<span class=\"pl-smi\">i<\/span>] <span class=\"pl-k\">&gt;<\/span> <span class=\"pl-smi\">span<\/span>[<span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-c1\">1<\/span>])\r\n            <span class=\"pl-k\">return<\/span> <span class=\"pl-c1\">false<\/span>;\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-c1\">true<\/span>;\r\n}<\/pre>\n<\/div>\n<p>This slight variation from the recognized pattern was enough previously to prevent the JIT from eliding the bounds checks.  Not anymore.  .NET 5 on my machine is able to execute this 20% faster than .NET Core 3.1:<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IsSorted<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">1,083.8 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">236 B<\/td>\n<\/tr>\n<tr>\n<td>IsSorted<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">581.2 ns<\/td>\n<td align=\"right\">0.54<\/td>\n<td align=\"right\">136 B<\/td>\n<\/tr>\n<tr>\n<td>IsSorted<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">463.0 ns<\/td>\n<td align=\"right\">0.43<\/td>\n<td align=\"right\">105 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Another case where the JIT ensures checks are in place for a category of error is null checks.  The JIT does this in coordination with the runtime, with the JIT ensuring appropriate instructions are in place to incur hardware exceptions and with the runtime then translating such faults into .NET exceptions (e.g. <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/9df02475e09859a8d24852011cf3515f7a665670\/src\/coreclr\/src\/vm\/excep.cpp#L3073\">here<\/a>).  
But sometimes instructions are necessary only for null checks rather than also accomplishing other necessary functionality, and as long as the required null check happens due to some instruction, the unnecessary duplicative ones can be removed.  Consider this code:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-en\">i<\/span>, <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">j<\/span>) <span class=\"pl-smi\">_value<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">NullCheck<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_value<\/span>.<span class=\"pl-smi\">j<\/span><span class=\"pl-k\">++<\/span>;<\/pre>\n<\/div>\n<p>As a runnable benchmark, this does too little work to accurately measure with Benchmark.NET, but it&#8217;s a great way to see what assembly code is generated.  
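<\/p>\n<p>For anyone who wants to reproduce listings like the ones that follow, here&#8217;s a minimal sketch of a harness that dumps the generated assembly.  It assumes the BenchmarkDotNet NuGet package is referenced and that the benchmark lives in a class named <code>Program<\/code> (as the <code>; Program.NullCheck()<\/code> header in the listings suggests); the <code>Main<\/code> entry point shown is just one common way to launch a run, not necessarily the exact setup used for the results in this post:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>using BenchmarkDotNet.Attributes;\r\nusing BenchmarkDotNet.Running;\r\n\r\n\/\/ Assumption: the project references the BenchmarkDotNet package.\r\n[DisassemblyDiagnoser] \/\/ asks BenchmarkDotNet to export the JIT-generated assembly for each benchmark\r\npublic class Program\r\n{\r\n    \/\/ Illustrative entry point: lets the benchmark to run be chosen from the command line.\r\n    static void Main(string[] args) =&gt;\r\n        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);\r\n\r\n    private (int i, int j) _value;\r\n\r\n    [Benchmark]\r\n    public int NullCheck() =&gt; _value.j++;\r\n}<\/pre>\n<\/div>\n<p>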
With .NET Core 3.1, this method results in this assembly:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.NullCheck()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">nop<\/span><span class=\"pl-en\">       dword ptr <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">ecx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">8<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">4<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">lea<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">edx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">1<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">edx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 
23<\/span><\/pre>\n<\/div>\n<p>That <code>cmp [rcx],ecx<\/code> instruction is performing a null check on <code>this<\/code> as part of calculating the address of <code>j<\/code>.  Then the <code>mov eax,[rcx]<\/code> instruction is performing another null check as part of dereferencing <code>j<\/code>&#8217;s location.  That first null check is thus not actually necessary, with the instruction not providing any other benefits.  So, thanks to PRs like <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1735\">dotnet\/runtime#1735<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32641\">dotnet\/runtime#32641<\/a>, such duplication is recognized by the JIT in many more cases than before, and for .NET 5 we now end up with:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.NullCheck()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-en\">0C<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">lea<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">edx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">1<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">edx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 
12<\/span><\/pre>\n<\/div>\n<p>Covariance is another case where the JIT needs to inject checks to ensure that a developer can&#8217;t accidentally break type or memory safety.  Consider code like:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">class<\/span> <span class=\"pl-en\">A<\/span> { }\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">B<\/span> { }\r\n<span class=\"pl-k\">object<\/span>[] <span class=\"pl-smi\">arr<\/span> <span class=\"pl-k\">=<\/span> ...;\r\n<span class=\"pl-smi\">arr<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">A<\/span>();<\/pre>\n<\/div>\n<p>Is this code valid?  It depends.  Arrays in .NET are &#8220;covariant&#8221;, which means I can pass around an array <code>DerivedType[]<\/code> as a <code>BaseType[]<\/code>, where <code>DerivedType<\/code> derives from <code>BaseType<\/code>.  That means in this example, the <code>arr<\/code> could have been constructed as <code>new A[1]<\/code> or <code>new object[1]<\/code> or <code>new B[1]<\/code>. This code should run fine with the first two, but if the <code>arr<\/code> is actually a <code>B[]<\/code>, trying to store an <code>A<\/code> instance into it must fail; otherwise, code that&#8217;s using the array as a <code>B[]<\/code> could try to use <code>arr[0]<\/code> as a <code>B<\/code> and things could go badly quickly.  So, the runtime needs to protect against this by doing covariance checking, which really means when a reference type instance is stored into an array, the runtime needs to check that the assigned type is in fact compatible with the concrete type of the array.  With <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/189\">dotnet\/runtime#189<\/a>, the JIT is now able to eliminate more covariance checks, specifically in the case where the element type of the array is sealed, like <code>string<\/code>.  
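<\/p>\n<p>To make the failure mode concrete, here&#8217;s a small sketch of the store the covariance check guards against.  It reuses the unrelated <code>A<\/code> and <code>B<\/code> classes from the snippet above; the <code>Program<\/code> wrapper and the console message are illustrative only:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>using System;\r\n\r\nclass A { }\r\nclass B { }\r\n\r\nclass Program\r\n{\r\n    static void Main()\r\n    {\r\n        \/\/ Array covariance: a B[] can be referenced through an object[] variable.\r\n        object[] arr = new B[1];\r\n\r\n        try\r\n        {\r\n            arr[0] = new A(); \/\/ the runtime&#39;s covariance check rejects this store\r\n        }\r\n        catch (ArrayTypeMismatchException)\r\n        {\r\n            Console.WriteLine(\"Store rejected: arr is really a B[]\");\r\n        }\r\n\r\n        arr = new object[1];\r\n        arr[0] = new A(); \/\/ fine: an A is an object\r\n    }\r\n}<\/pre>\n<\/div>\n<p>When the element type is sealed, as with <code>string<\/code>, nothing can derive from it, so a <code>string[]<\/code> variable can only ever refer to an actual <code>string[]<\/code> and the failing case above can&#8217;t arise.  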
As a result of this, a microbenchmark like this now runs faster:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">string<\/span>[] <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">string<\/span>[<span class=\"pl-c1\">1000<\/span>];\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">CovariantChecking<\/span>()\r\n{\r\n    <span class=\"pl-k\">string<\/span>[] <span class=\"pl-smi\">array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_array<\/span>;\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">array<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">array<\/span>[<span class=\"pl-smi\">i<\/span>] <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>default<span class=\"pl-pds\">\"<\/span><\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>CovariantChecking<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">2.121 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">57 B<\/td>\n<\/tr>\n<tr>\n<td>CovariantChecking<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">2.122 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">57 B<\/td>\n<\/tr>\n<tr>\n<td>CovariantChecking<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">1.666 us<\/td>\n<td align=\"right\">0.79<\/td>\n<td align=\"right\">52 
B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Related to this are type checks.  I mentioned earlier that <code>Span&lt;T&gt;<\/code> solved a bunch of problems but also introduced new patterns that then drove improvements in other areas of the system; that goes as well for the implementation of <code>Span&lt;T&gt;<\/code> itself.  <code>Span&lt;T&gt;<\/code>&#8217;s constructor does a covariance check that requires a <code>T[]<\/code> to actually be a <code>T[]<\/code> and not a <code>U[]<\/code> where <code>U<\/code> derives from <code>T<\/code>, e.g. this program:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> <span class=\"pl-en\">System<\/span>;\r\n\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Program<\/span>\r\n{\r\n    <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Main<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-en\">A<\/span>&gt;(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">B<\/span>[<span class=\"pl-c1\">42<\/span>]);\r\n}\r\n\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">A<\/span> { }\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">B<\/span> : <span class=\"pl-en\">A<\/span> { }<\/pre>\n<\/div>\n<p>will result in an exception:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-smi\">System<\/span>.<span class=\"pl-smi\">ArrayTypeMismatchException<\/span>: <span class=\"pl-smi\">Attempted<\/span> <span class=\"pl-smi\">to<\/span> <span class=\"pl-smi\">access<\/span> <span class=\"pl-smi\">an<\/span> <span class=\"pl-smi\">element<\/span> <span class=\"pl-k\">as<\/span> <span class=\"pl-en\">a<\/span> <span class=\"pl-smi\">type<\/span> <span class=\"pl-smi\">incompatible<\/span> <span class=\"pl-smi\">with<\/span> <span class=\"pl-smi\">the<\/span> <span 
class=\"pl-smi\">array<\/span>.<\/pre>\n<\/div>\n<p>That exception stems from <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/f170db722be6fb695ca229bcbe46be0caa8b3a48\/src\/libraries\/System.Private.CoreLib\/src\/System\/Span.cs#L46-L47\">this check<\/a> in <code>Span&lt;T&gt;<\/code>&#8217;s constructor:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">if<\/span> (<span class=\"pl-k\">!<\/span><span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">T<\/span>).<span class=\"pl-smi\">IsValueType<\/span> <span class=\"pl-k\">&amp;&amp;<\/span> <span class=\"pl-smi\">array<\/span>.<span class=\"pl-en\">GetType<\/span>() <span class=\"pl-k\">!=<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">T<\/span>[]))\r\n    <span class=\"pl-smi\">ThrowHelper<\/span>.<span class=\"pl-en\">ThrowArrayTypeMismatchException<\/span>();<\/pre>\n<\/div>\n<p>PR <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32790\">dotnet\/runtime#32790<\/a> optimized just such an <code>array.GetType() != typeof(T[])<\/code> check when <code>T<\/code> is sealed, while <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1157\">dotnet\/runtime#1157<\/a> recognizes the <code>typeof(T).IsValueType<\/code> pattern and replaces it with a constant value (PR <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1195\">dotnet\/runtime#1195<\/a> does the same for <code>typeof(T1).IsAssignableFrom(typeof(T2))<\/code>).  
The net effect of that is a huge improvement on a microbenchmark like this:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">class<\/span> <span class=\"pl-en\">A<\/span> { }\r\n<span class=\"pl-k\">sealed<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">B<\/span> : <span class=\"pl-en\">A<\/span> { }\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">B<\/span>[] <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">B<\/span>[<span class=\"pl-c1\">42<\/span>];\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Ctor<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-en\">B<\/span>&gt;(<span class=\"pl-smi\">_array<\/span>).<span class=\"pl-smi\">Length<\/span>;<\/pre>\n<\/div>\n<p>for which I get results like:<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Ctor<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">48.8670 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">66 B<\/td>\n<\/tr>\n<tr>\n<td>Ctor<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">7.6695 ns<\/td>\n<td align=\"right\">0.16<\/td>\n<td align=\"right\">66 B<\/td>\n<\/tr>\n<tr>\n<td>Ctor<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">0.4959 ns<\/td>\n<td align=\"right\">0.01<\/td>\n<td align=\"right\">17 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The explanation of the difference is obvious when looking at the generated assembly, even when not completely versed in assembly code.  
Here&#8217;s what the <code>[DisassemblyDiagnoser]<\/code> shows was generated on .NET Core 3.1:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.Ctor()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">push<\/span><span class=\"pl-en\">      <\/span><span class=\"pl-v\">rdi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">push<\/span><span class=\"pl-en\">      <\/span><span class=\"pl-v\">rsi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">sub<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">28<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsi<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">test<\/span><span class=\"pl-en\">      <\/span><span class=\"pl-v\">rsi<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">rsi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jne<\/span><span class=\"pl-en\">       short M00_L00<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">xor<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jmp<\/span><span class=\"pl-en\">       short M00_L01<\/span>\r\n<span class=\"pl-en\">M00_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">rsi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      
System.Object.GetType()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rdi<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">rax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-en\">7FFE4B2D18AA<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rdi<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">rax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jne<\/span><span class=\"pl-en\">       short M00_L02<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rsi<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">M00_L01:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">28<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">pop<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">pop<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rdi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-en\">M00_L02:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      
System.ThrowHelper.ThrowArrayTypeMismatchException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 66<\/span><\/pre>\n<\/div>\n<p>and here&#8217;s what it shows for .NET 5:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.Ctor()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">test<\/span><span class=\"pl-en\">      <\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">rax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jne<\/span><span class=\"pl-en\">       short M00_L00<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">xor<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jmp<\/span><span class=\"pl-en\">       short M00_L01<\/span>\r\n<span class=\"pl-en\">M00_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">M00_L01:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 17<\/span><\/pre>\n<\/div>\n<p>As another example, in the GC discussion earlier I called out a bunch of 
benefits we&#8217;ve experienced from porting native runtime code to be managed C# code.  One that I didn&#8217;t mention then but will now is that it&#8217;s resulted in us making other improvements in the system that addressed key blockers to such porting but that then also serve to improve many other cases.  A good example of that is <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38229\">dotnet\/runtime#38229<\/a>.  When we first moved the native array sorting implementation to managed, we inadvertently incurred a regression for floating-point values, a regression that was helpfully spotted by <a href=\"https:\/\/github.com\/nietras\">@nietras<\/a> and which was subsequently fixed in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37941\">dotnet\/runtime#37941<\/a>.  The regression was due to the native implementation employing a special optimization that we were missing in the managed port (for floating-point arrays, moving all NaN values to the beginning of the array such that subsequent comparison operations could ignore the possibility of NaNs), and we successfully brought that over.  The problem, however, was expressing this in a way that didn&#8217;t result in tons of code duplication: the native implementation used templates, and the managed implementation used generics, but a limitation in inlining with generics made it such that helpers introduced to avoid lots of code duplication were causing non-inlineable method calls on every comparison employed in the sort.  PR <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38229\">dotnet\/runtime#38229<\/a> addressed that by enabling the JIT to inline shared generic code within the same type.  
Consider this microbenchmark:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">C<\/span> <span class=\"pl-smi\">c1<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">C<\/span>() { <span class=\"pl-smi\">Value<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">1<\/span> }, <span class=\"pl-smi\">c2<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">C<\/span>() { <span class=\"pl-smi\">Value<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">2<\/span> }, <span class=\"pl-smi\">c3<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">C<\/span>() { <span class=\"pl-smi\">Value<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">3<\/span> };\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Compare<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Comparer<\/span>&lt;<span class=\"pl-en\">C<\/span>&gt;.<span class=\"pl-en\">Smallest<\/span>(<span class=\"pl-smi\">c1<\/span>, <span class=\"pl-smi\">c2<\/span>, <span class=\"pl-smi\">c3<\/span>);\r\n\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Comparer<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt; <span class=\"pl-k\">where<\/span> <span class=\"pl-en\">T<\/span> : <span class=\"pl-en\">IComparable<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;\r\n{\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Smallest<\/span>(<span class=\"pl-en\">T<\/span> <span class=\"pl-smi\">t1<\/span>, <span class=\"pl-en\">T<\/span> <span class=\"pl-smi\">t2<\/span>, <span class=\"pl-en\">T<\/span> <span class=\"pl-smi\">t3<\/span>) <span class=\"pl-k\">=&gt;<\/span>\r\n        <span 
class=\"pl-en\">Compare<\/span>(<span class=\"pl-smi\">t1<\/span>, <span class=\"pl-smi\">t2<\/span>) <span class=\"pl-k\">&lt;=<\/span> <span class=\"pl-c1\">0<\/span> <span class=\"pl-k\">?<\/span>\r\n            (<span class=\"pl-en\">Compare<\/span>(<span class=\"pl-smi\">t1<\/span>, <span class=\"pl-smi\">t3<\/span>) <span class=\"pl-k\">&lt;=<\/span> <span class=\"pl-c1\">0<\/span> <span class=\"pl-k\">?<\/span> <span class=\"pl-c1\">0<\/span> <span class=\"pl-k\">:<\/span> <span class=\"pl-c1\">2<\/span>) <span class=\"pl-k\">:<\/span>\r\n            (<span class=\"pl-en\">Compare<\/span>(<span class=\"pl-smi\">t2<\/span>, <span class=\"pl-smi\">t3<\/span>) <span class=\"pl-k\">&lt;=<\/span> <span class=\"pl-c1\">0<\/span> <span class=\"pl-k\">?<\/span> <span class=\"pl-c1\">1<\/span> <span class=\"pl-k\">:<\/span> <span class=\"pl-c1\">2<\/span>);\r\n\r\n    [<span class=\"pl-en\">MethodImpl<\/span>(<span class=\"pl-smi\">MethodImplOptions<\/span>.<span class=\"pl-smi\">AggressiveInlining<\/span>)]\r\n    <span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Compare<\/span>(<span class=\"pl-en\">T<\/span> <span class=\"pl-smi\">t1<\/span>, <span class=\"pl-en\">T<\/span> <span class=\"pl-smi\">t2<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">t1<\/span>.<span class=\"pl-en\">CompareTo<\/span>(<span class=\"pl-smi\">t2<\/span>);\r\n}\r\n\r\n<span class=\"pl-k\">class<\/span> <span class=\"pl-en\">C<\/span> : <span class=\"pl-en\">IComparable<\/span>&lt;<span class=\"pl-en\">C<\/span>&gt;\r\n{\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">Value<\/span>;\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">CompareTo<\/span>(<span class=\"pl-en\">C<\/span> <span class=\"pl-smi\">other<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">other<\/span> <span 
class=\"pl-k\">is<\/span> <span class=\"pl-en\">null<\/span> ? <span class=\"pl-c1\">1<\/span> : <span class=\"pl-smi\">Value<\/span>.<span class=\"pl-en\">CompareTo<\/span>(<span class=\"pl-smi\">other<\/span>.<span class=\"pl-smi\">Value<\/span>);\r\n}<\/pre>\n<\/div>\n<p>The <code>Smallest<\/code> method is comparing the three supplied values and returning the index of the smallest.  It is a method on a generic type, and it&#8217;s calling to another method on that same type, which is in turn making calls out to methods on an instance of the generic type parameter.  As the benchmark is using <code>C<\/code> as the generic type, and as <code>C<\/code> is a reference type, the JIT will not specialize the code for this method specifically for <code>C<\/code>, and will instead use a &#8220;shared&#8221; implementation it generates to be used for all reference types.  In order for the <code>Compare<\/code> method to then call out to the correct interface implementation of <code>CompareTo<\/code>, that shared generic implementation employs a dictionary that maps from the generic type to the right target.  In previous versions of .NET, methods containing those generic dictionary lookups were not inlineable, which means that this <code>Smallest<\/code> method can&#8217;t inline the three calls it makes to <code>Compare<\/code>, even though <code>Compare<\/code> is attributed as <code>MethodImplOptions.AggressiveInlining<\/code>.  
The aforementioned PR removed that limitation, resulting in a very measurable speedup on this example (and making the array sorting regression fix feasible):<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Compare<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">8.632 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Compare<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">9.259 ns<\/td>\n<td align=\"right\">1.07<\/td>\n<\/tr>\n<tr>\n<td>Compare<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">5.282 ns<\/td>\n<td align=\"right\">0.61<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Most of the cited improvements here have focused on throughput, with the JIT producing code that executes more quickly, and that faster code is often (though not always) smaller.  Folks working on the JIT actually pay a lot of attention to code size, in many cases using it as a primary metric for whether a change is beneficial or not.  Smaller code is not always faster code (instructions can be the same size but have very different cost profiles), but at a high level it&#8217;s a reasonable metric, and smaller code does have direct benefits, such as less impact on instruction caches, less code to load, etc.  In some cases, changes are focused entirely on reducing code size, such as in cases where unnecessary duplication occurs.  
Consider this simple benchmark:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">_offset<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">ThrowHelpers<\/span>()\r\n{\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">arr<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">int<\/span>[<span class=\"pl-c1\">10<\/span>];\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">s0<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">arr<\/span>, <span class=\"pl-smi\">_offset<\/span>, <span class=\"pl-c1\">1<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">s1<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">arr<\/span>, <span class=\"pl-smi\">_offset<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-c1\">1<\/span>, <span class=\"pl-c1\">1<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">s2<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">arr<\/span>, <span class=\"pl-smi\">_offset<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-c1\">2<\/span>, <span class=\"pl-c1\">1<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">s3<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">arr<\/span>, <span 
class=\"pl-smi\">_offset<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-c1\">3<\/span>, <span class=\"pl-c1\">1<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">s4<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">arr<\/span>, <span class=\"pl-smi\">_offset<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-c1\">4<\/span>, <span class=\"pl-c1\">1<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">s5<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Span<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">arr<\/span>, <span class=\"pl-smi\">_offset<\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-c1\">5<\/span>, <span class=\"pl-c1\">1<\/span>);\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">s0<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s1<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s2<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s3<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s4<\/span>[<span class=\"pl-c1\">0<\/span>] <span class=\"pl-k\">+<\/span> <span class=\"pl-smi\">s5<\/span>[<span class=\"pl-c1\">0<\/span>];\r\n}<\/pre>\n<\/div>\n<p>The <code>Span&lt;T&gt;<\/code> constructor does <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/932098fe90d146a73ebd86a2e595398b63b1a600\/src\/libraries\/System.Private.CoreLib\/src\/System\/Span.cs#L68-L80\">argument validation<\/a>, which, when <code>T<\/code> is a value type, results in there being two call sites to a method on the <code>ThrowHelper<\/code> class, one that throws for a failed null check on the input array and one that throws when offset 
and count are out of range (<code>ThrowHelper<\/code> contains non-inlinable methods like <code>ThrowArgumentNullException<\/code>, which contains the actual <code>throw<\/code> and avoids the associated code size at every call site; the JIT currently isn&#8217;t capable of &#8220;outlining&#8221;, the opposite of &#8220;inlining&#8221;, so it needs to be done manually in cases where it matters).  In the above example, we&#8217;re creating six spans, which means six calls to the <code>Span&lt;T&gt;<\/code> constructor, all of which will be inlined.  The JIT can see that the array is non-null, so it can eliminate the null check and the <code>ThrowArgumentNullException<\/code> from inlined code, but it doesn&#8217;t know whether the offset and count are in range, so it needs to retain the range check and the call site for the <code>ThrowHelper.ThrowArgumentOutOfRangeException<\/code> method.  In .NET Core 3.1, that results in code like the following being generated for this <code>ThrowHelpers<\/code> method:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-en\">M00_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-en\">M00_L01:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-en\">M00_L02:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span 
class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-en\">M00_L03:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-en\">M00_L04:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-en\">M00_L05:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span><\/pre>\n<\/div>\n<p>In .NET 5, thanks to <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27113\">dotnet\/coreclr#27113<\/a>, the JIT is able to recognize this duplication, and instead of all six call sites, it&#8217;ll end up consolidating them into just one:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-en\">M00_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.ThrowHelper.ThrowArgumentOutOfRangeException()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span><\/pre>\n<\/div>\n<p>with all failed checks jumping to this shared location rather than each having its own 
copy.<\/p>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ThrowHelpers<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">424 B<\/td>\n<\/tr>\n<tr>\n<td>ThrowHelpers<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">252 B<\/td>\n<\/tr>\n<tr>\n<td>ThrowHelpers<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">222 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>These are just some of the myriad of improvements that have gone into the JIT in .NET 5.  There are many more. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32368\">dotnet\/runtime#32368<\/a> causes the JIT to see an array&#8217;s length as unsigned, which results in it being able to use better instructions for some mathematical operations (e.g. division) performed on the length. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/25458\">dotnet\/coreclr#25458<\/a> enables the JIT to use faster 0-based comparisons for some unsigned integer operations, e.g. using the equivalent of <code>a != 0<\/code> when the developer actually wrote <code>a &gt;= 1<\/code>. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1378\">dotnet\/runtime#1378<\/a> allows the JIT to recognize <code>\"constantString\".Length<\/code> as a constant value. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26740\">dotnet\/coreclr#26740<\/a> reduces the size of ReadyToRun images by removing <code>nop<\/code> padding. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33024\">dotnet\/runtime#33024<\/a> optimizes the instructions generated when performing <code>x * 2<\/code> when <code>x<\/code> is a <code>float<\/code> or <code>double<\/code>, using an add instead of a multiply. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27060\">dotnet\/coreclr#27060<\/a> improves the code generated for the <code>Math.FusedMultiplyAdd<\/code> intrinsic. 
<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27384\">dotnet\/coreclr#27384<\/a> makes volatile operations cheaper on ARM64 by using better fence instructions than were previously used, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38179\">dotnet\/runtime#38179<\/a> performs a peephole optimization on ARM64 to remove a bunch of redundant <code>mov<\/code> instructions. And on and on.<\/p>\n<p>There are also some significant changes in the JIT that are disabled by default, with the goal of getting real-world feedback on them and being able to enable them by default post-.NET 5.  For example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32969\">dotnet\/runtime#32969<\/a> provides an initial implementation of &#8220;On Stack Replacement&#8221; (OSR).  I mentioned tiered compilation earlier, which enables the JIT to first generate minimally-optimized code for a method, and then subsequently recompile that method with much more optimization when it is shown to be important.  This enables faster start-up time by allowing code to get going more quickly and only upgrading impactful methods once things are running.  However, tiered compilation relies on being able to replace an implementation such that the next time the method is called, the new implementation is invoked.  But what about long-running methods?  Tiered compilation is disabled by default for methods that contain loops (or, more specifically, backward branches) because they could end up running for a long time such that the replacement may not be used in a timely manner.  
OSR enables methods to be updated while their code is executing, while they&#8217;re &#8220;on stack&#8221;; lots of great details are in the <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/master\/docs\/design\/features\/OnStackReplacement.md\">design document<\/a> included in that PR.  Also related to tiered compilation, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1457\">dotnet\/runtime#1457<\/a> improves the call-counting mechanism by which tiered compilation decides which methods should be recompiled, and when. You can experiment with OSR by setting both the <code>COMPlus_TC_QuickJitForLoops<\/code> and <code>COMPlus_TC_OnStackReplacement<\/code> environment variables to <code>1<\/code>. As another example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1180\">dotnet\/runtime#1180<\/a> improves the generated code quality for code inside try blocks, enabling the JIT to keep values in registers where it previously couldn&#8217;t.  You can experiment with this by setting the <code>COMPlus_EnableEHWriteThru<\/code> environment variable to <code>1<\/code>.<\/p>\n<p>There are also a bunch of pending pull requests to the JIT that haven&#8217;t yet been merged but that very well could be before .NET 5 is released (in addition to, I expect, many more that haven&#8217;t been put up yet but will before .NET 5 ships in a few months).  For example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32716\">dotnet\/runtime#32716<\/a> enables the JIT to replace some branching comparisons like <code>a == 42 ? 3 : 2<\/code> with branchless implementations, which can help with performance when the hardware isn&#8217;t able to correctly predict which branch would be taken. 
Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37226\">dotnet\/runtime#37226<\/a>, which enables the JIT to take a pattern like <code>\"hello\"[0]<\/code> and replace it with just <code>h<\/code>; while generally a developer doesn&#8217;t write such code, this can help when inlining is involved, with a constant string passed into a method that gets inlined and that indexes into a constant location (generally after a length check, which, thanks to <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1378\">dotnet\/runtime#1378<\/a>, can also become a const).  Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1224\">dotnet\/runtime#1224<\/a>, which improves the code generation for the <code>Bmi2.MultiplyNoFlags<\/code> intrinsic.  Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37836\">dotnet\/runtime#37836<\/a>, which turns <code>BitOperations.PopCount<\/code> into an intrinsic in a manner that enables the JIT to recognize when it&#8217;s called with a constant argument and replace the whole operation with a precomputed constant.  Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37245\">dotnet\/runtime#37254<\/a>, which removes null checks emitted when working with const strings. 
Or <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32000\">dotnet\/runtime#32000<\/a> from <a href=\"https:\/\/github.com\/damageboy\">@damageboy<\/a>, which optimizes double negations.<\/p>\n<h3><a id=\"user-content-intrinsics\" class=\"anchor\" aria-hidden=\"true\" href=\"#intrinsics\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"intrinsics\"><\/a>Intrinsics<\/h3>\n<p>In .NET Core 3.0, over a thousand new hardware intrinsics methods were added and recognized by the JIT to enable C# code to directly target instruction sets like SSE4 and AVX2 (see the <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.runtime.intrinsics.x86\" rel=\"nofollow\">docs<\/a>).  These were then used to great benefit in a bunch of APIs in the core libraries.  However, the intrinsics were limited to x86\/x64 architectures.  In .NET 5, a ton of effort has gone into adding thousands more, specific to ARM64, thanks to multiple contributors, and in particular <a href=\"https:\/\/github.com\/TamarChristinaArm\">@TamarChristinaArm<\/a> from Arm Holdings.  And as with their x86\/x64 counterparts, these intrinsics have been put to good use inside core library functionality.  For example, the <code>BitOperations.PopCount()<\/code> method was previously optimized to use the x86 POPCNT intrinsic, and for .NET 5, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35636\">dotnet\/runtime#35636<\/a> augments it to also be able to use the ARM VCNT or ARM64 CNT equivalent.  
Similarly, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34486\">dotnet\/runtime#34486<\/a> modified <code>BitOperations.LeadingZeroCount<\/code>, <code>TrailingZeroCount<\/code>, and <code>Log2<\/code> to utilize the corresponding intrinsics. And at a higher level, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33749\/\">dotnet\/runtime#33749<\/a> from <a href=\"https:\/\/github.com\/Gnbrkm41\">@Gnbrkm41<\/a> augments multiple methods in <code>BitArray<\/code> to use ARM64 intrinsics to go along with the previously added support for SSE2 and AVX2. Lots of work has gone into ensuring that the <code>Vector<\/code> APIs perform well on ARM64, too, such as with <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37139\">dotnet\/runtime#37139<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36156\">dotnet\/runtime#36156<\/a>.<\/p>\n<p>Beyond ARM64, additional work has been done to vectorize more operations.  For example, <a href=\"https:\/\/github.com\/Gnbrkm41\">@Gnbrkm41<\/a> also submitted <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/31993\">dotnet\/runtime#31993<\/a>, which utilized ROUNDPS\/ROUNDPD on x64 and FRINTP\/FRINTM on ARM64 to improve the code generated for the new <code>Vector.Ceiling<\/code> and <code>Vector.Floor<\/code> methods.  
And <code>BitOperations<\/code> (which is a relatively low-level type implemented for most operations as a 1:1 wrapper around the most appropriate hardware intrinsics) was not only improved in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35650\">dotnet\/runtime#35650<\/a> from <a href=\"https:\/\/github.com\/saucecontrol\">@saucecontrol<\/a> but also had its usage in Corelib improved to be more efficient.<\/p>\n<p>Finally, a whole slew of changes went into the JIT to better handle hardware intrinsics and vectorization in general, such as <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35421\">dotnet\/runtime#35421<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/31834\">dotnet\/runtime#31834<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1280\">dotnet\/runtime#1280<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35857\">dotnet\/runtime#35857<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36267\">dotnet\/runtime#36267<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35525\">dotnet\/runtime#35525<\/a>.<\/p>\n<h2><a id=\"user-content-runtime-helpers\" class=\"anchor\" aria-hidden=\"true\" href=\"#runtime-helpers\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"runtime-helpers\"><\/a>Runtime Helpers<\/h2>\n<p>The GC and JIT represent large portions of the runtime, but there still remain significant portions of functionality in the runtime outside of these components, and those have similarly seen 
improvements.<\/p>\n<p>It&#8217;s interesting to note that the JIT doesn&#8217;t generate code from scratch for everything.  There are many places where pre-existing helper functions are invoked by the JIT, with the runtime supplying those helpers, and improvements to those helpers can have a meaningful impact on programs. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/23548\">dotnet\/coreclr#23548<\/a> is a great example.  In libraries like <code>System.Linq<\/code>, we&#8217;ve shied away from adding additional type checks for covariant interfaces because of significantly higher overhead for them versus for normal interfaces. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/23548\">dotnet\/coreclr#23548<\/a> (subsequently tweaked in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34427\">dotnet\/runtime#34427<\/a>) essentially adds a cache, such that the cost of these casts is amortized and they end up being much faster overall.  This is evident from a simple microbenchmark:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-k\">string<\/span>&gt; <span class=\"pl-smi\">_list<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-k\">string<\/span>&gt;();\r\n\r\n<span class=\"pl-c\"><span class=\"pl-c\">\/\/<\/span> IReadOnlyCollection&lt;out T&gt; is covariant<\/span>\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsIReadOnlyCollection<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-en\">IsIReadOnlyCollection<\/span>(<span class=\"pl-smi\">_list<\/span>);\r\n[<span class=\"pl-en\">MethodImpl<\/span>(<span class=\"pl-smi\">MethodImplOptions<\/span>.<span class=\"pl-smi\">NoInlining<\/span>)]  <span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span 
class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsIReadOnlyCollection<\/span>(<span class=\"pl-k\">object<\/span> <span class=\"pl-smi\">o<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">o<\/span> <span class=\"pl-k\">is<\/span> <span class=\"pl-en\">IReadOnlyCollection<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt;;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IsIReadOnlyCollection<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">105.460 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">53 B<\/td>\n<\/tr>\n<tr>\n<td>IsIReadOnlyCollection<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">56.252 ns<\/td>\n<td align=\"right\">0.53<\/td>\n<td align=\"right\">59 B<\/td>\n<\/tr>\n<tr>\n<td>IsIReadOnlyCollection<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">3.383 ns<\/td>\n<td align=\"right\">0.03<\/td>\n<td align=\"right\">45 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Another set of impactful changes came in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32270\">dotnet\/runtime#32270<\/a> (with JIT support in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/31957\">dotnet\/runtime#31957<\/a>).  In the past, generic methods maintained just a few dedicated dictionary slots that could be used for fast lookup of the types associated with the generic method; once those slots were exhausted, it fell back to a slower lookup table.  
The need for this limitation no longer exists, and these changes enabled fast lookup slots to be used for all generic lookups.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">GenericDictionaries<\/span>()\r\n{\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">14<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-en\">GenericMethod<\/span>&lt;<span class=\"pl-k\">string<\/span>&gt;(<span class=\"pl-smi\">i<\/span>);\r\n}\r\n\r\n[<span class=\"pl-en\">MethodImpl<\/span>(<span class=\"pl-smi\">MethodImplOptions<\/span>.<span class=\"pl-smi\">NoInlining<\/span>)]\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">object<\/span> <span class=\"pl-en\">GenericMethod<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;(<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">level<\/span>)\r\n{\r\n    <span class=\"pl-k\">switch<\/span> (<span class=\"pl-smi\">level<\/span>)\r\n    {\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">0<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">T<\/span>);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">1<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">2<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span 
class=\"pl-en\">T<\/span>&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">3<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">4<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">5<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">6<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">7<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span 
class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">8<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">9<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">10<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">11<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span 
class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">case<\/span> <span class=\"pl-c1\">12<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n        <span class=\"pl-k\">default<\/span>: <span class=\"pl-k\">return<\/span> <span class=\"pl-k\">typeof<\/span>(<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;&gt;);\r\n    }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th 
align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GenericDictionaries<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">104.33 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>GenericDictionaries<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">76.71 ns<\/td>\n<td align=\"right\">0.74<\/td>\n<\/tr>\n<tr>\n<td>GenericDictionaries<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">51.53 ns<\/td>\n<td align=\"right\">0.49<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a id=\"user-content-text-processing\" class=\"anchor\" aria-hidden=\"true\" href=\"#text-processing\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"text-processing\"><\/a>Text Processing<\/h2>\n<p>Text-based processing is the bread-and-butter of many applications, and a lot of effort in every release goes into improving the fundamental building blocks on top of which everything else is built.  Such changes extend from microoptimizations in helpers processing individual characters all the way up to overhauls of entire text-processing libraries.<\/p>\n<p><code>System.Char<\/code> received some nice improvements in .NET 5.  For example, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26848\">dotnet\/coreclr#26848<\/a> improved the performance of <code>char.IsWhiteSpace<\/code> by tweaking the implementation to require fewer instructions and less branching.  
Improvements to <code>char.IsWhiteSpace<\/code> then manifest in a bunch of other methods that rely on it, like <code>string.IsNullOrWhiteSpace<\/code> and <code>Trim<\/code>:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Trim<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span> test <span class=\"pl-pds\">\"<\/span><\/span>.<span class=\"pl-en\">AsSpan<\/span>().<span class=\"pl-en\">Trim<\/span>().<span class=\"pl-smi\">Length<\/span>;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Trim<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">21.694 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">569 B<\/td>\n<\/tr>\n<tr>\n<td>Trim<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">8.079 ns<\/td>\n<td align=\"right\">0.37<\/td>\n<td align=\"right\">377 B<\/td>\n<\/tr>\n<tr>\n<td>Trim<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">6.556 ns<\/td>\n<td align=\"right\">0.30<\/td>\n<td align=\"right\">365 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As another nice example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35194\">dotnet\/runtime#35194<\/a> improved the performance of <code>char.ToUpperInvariant<\/code> and <code>char.ToLowerInvariant<\/code> by improving the inlineability of various methods, streamlining the call paths from the public APIs down to the core functionality, and further tweaking the implementation to ensure the JIT was generating the best code.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n[<span class=\"pl-en\">Arguments<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>It's exciting to see great 
performance!<span class=\"pl-pds\">\"<\/span><\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">ToUpperInvariant<\/span>(<span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">s<\/span>)\r\n{\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">s<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">char<\/span>.<span class=\"pl-en\">ToUpperInvariant<\/span>(<span class=\"pl-smi\">s<\/span>[<span class=\"pl-smi\">i<\/span>]);\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Code Size<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ToUpperInvariant<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">208.34 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">171 B<\/td>\n<\/tr>\n<tr>\n<td>ToUpperInvariant<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">166.10 ns<\/td>\n<td align=\"right\">0.80<\/td>\n<td align=\"right\">164 B<\/td>\n<\/tr>\n<tr>\n<td>ToUpperInvariant<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">69.15 ns<\/td>\n<td align=\"right\">0.33<\/td>\n<td align=\"right\">105 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Going beyond single characters, in practically every release of .NET Core, we&#8217;ve worked to push the envelope for how fast we can make the existing formatting APIs.  
This release is no different.  And even though previous releases saw significant wins, this one moves the bar further.<\/p>\n<p><code>Int32.ToString()<\/code> is an incredibly common operation, and it&#8217;s important it be fast. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32528\">dotnet\/runtime#32528<\/a> from <a href=\"https:\/\/github.com\/ts2do\">@ts2do<\/a> made it even faster by adding inlineable fast paths for the key formatting routines employed by the method and by streamlining the path taken by various public APIs to get to those routines.  Other primitive <code>ToString<\/code> operations were also improved.  For example, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27056\">dotnet\/coreclr#27056<\/a> streamlined some code paths to reduce the cruft in getting from the public API to the point where bits are actually written out to memory.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">ToString12345<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-c1\">12345<\/span>.<span class=\"pl-en\">ToString<\/span>();\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">ToString123<\/span>() <span class=\"pl-k\">=&gt;<\/span> ((<span class=\"pl-k\">byte<\/span>)<span class=\"pl-c1\">123<\/span>).<span class=\"pl-en\">ToString<\/span>();<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ToString12345<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">45.737 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">40 B<\/td>\n<\/tr>\n<tr>\n<td>ToString12345<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">20.006 ns<\/td>\n<td 
align=\"right\">0.44<\/td>\n<td align=\"right\">32 B<\/td>\n<\/tr>\n<tr>\n<td>ToString12345<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">10.742 ns<\/td>\n<td align=\"right\">0.23<\/td>\n<td align=\"right\">32 B<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>ToString123<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">42.791 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">32 B<\/td>\n<\/tr>\n<tr>\n<td>ToString123<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">18.014 ns<\/td>\n<td align=\"right\">0.42<\/td>\n<td align=\"right\">32 B<\/td>\n<\/tr>\n<tr>\n<td>ToString123<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">7.801 ns<\/td>\n<td align=\"right\">0.18<\/td>\n<td align=\"right\">32 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In a similar vein, in previous releases we did some fairly heavy optimizations on <code>DateTime<\/code> and <code>DateTimeOffset<\/code>, but those improvements were primarily focused on how quickly we could convert the day\/month\/year\/etc. data into the right characters or bytes and write them to the destination.  In <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1944\">dotnet\/runtime#1944<\/a>, <a href=\"https:\/\/github.com\/ts2do\">@ts2do<\/a> focused on the step before that, optimizing the extraction of the day\/month\/year\/etc. from the raw tick count the <code>DateTime{Offset}<\/code> stores.  
That ended up being very fruitful, making it possible to output formats like &#8220;o&#8221; (the &#8220;round-trip date\/time pattern&#8221;) 30% faster than before (the change also applied the same decomposition optimization in other places in the codebase where those components were needed from a <code>DateTime<\/code>, but the improvement is easiest to show in a benchmark for formatting):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">_bytes<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[<span class=\"pl-c1\">100<\/span>];\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">char<\/span>[] <span class=\"pl-smi\">_chars<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">char<\/span>[<span class=\"pl-c1\">100<\/span>];\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">DateTime<\/span> <span class=\"pl-smi\">_dt<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">DateTime<\/span>.<span class=\"pl-smi\">Now<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">FormatChars<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_dt<\/span>.<span class=\"pl-en\">TryFormat<\/span>(<span class=\"pl-smi\">_chars<\/span>, <span class=\"pl-k\">out<\/span> <span class=\"pl-c1\">_<\/span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>o<span class=\"pl-pds\">\"<\/span><\/span>);\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">FormatBytes<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Utf8Formatter<\/span>.<span class=\"pl-en\">TryFormat<\/span>(<span class=\"pl-smi\">_dt<\/span>, <span 
class=\"pl-smi\">_bytes<\/span>, <span class=\"pl-k\">out<\/span> <span class=\"pl-c1\">_<\/span>, <span class=\"pl-s\">'O'<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>FormatChars<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">242.4 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>FormatChars<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">176.4 ns<\/td>\n<td align=\"right\">0.73<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>FormatBytes<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">235.6 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>FormatBytes<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">176.1 ns<\/td>\n<td align=\"right\">0.75<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There was also a multitude of improvements for operations on <code>string<\/code>s, such as with <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26621\">dotnet\/coreclr#26621<\/a> and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26962\">dotnet\/coreclr#26962<\/a>, which in some cases significantly improved the performance of culture-aware <code>StartsWith<\/code> and <code>EndsWith<\/code> operations on Linux.<\/p>\n<p>Of course, low-level processing is all well and good, but applications these days spend a lot of time doing higher-level operations like encoding of data in a particular format, such as UTF8.  Previous .NET Core releases saw <code>Encoding.UTF8<\/code> optimized, but .NET 5 improves it further still.  
<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27268\">dotnet\/coreclr#27268<\/a> optimizes it more, in particular for smaller inputs, by taking better advantage of stack allocation and improvements made in JIT devirtualization (where the JIT is able to avoid virtual dispatch because it can discover the actual concrete type of the instance it&#8217;s working with).<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">Roundtrip<\/span>()\r\n{\r\n    <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">bytes<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Encoding<\/span>.<span class=\"pl-smi\">UTF8<\/span>.<span class=\"pl-en\">GetBytes<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>this is a test<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">Encoding<\/span>.<span class=\"pl-smi\">UTF8<\/span>.<span class=\"pl-en\">GetString<\/span>(<span class=\"pl-smi\">bytes<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Roundtrip<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">113.69 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">96 B<\/td>\n<\/tr>\n<tr>\n<td>Roundtrip<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">49.76 ns<\/td>\n<td align=\"right\">0.44<\/td>\n<td align=\"right\">96 B<\/td>\n<\/tr>\n<tr>\n<td>Roundtrip<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">36.70 ns<\/td>\n<td align=\"right\">0.32<\/td>\n<td align=\"right\">96 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As important as UTF8 is, the &#8220;ISO-8859-1&#8221; encoding, otherwise known as &#8220;Latin1&#8221; (and which is now publicly exposed as 
<code>Encoding.Latin1<\/code> via <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37550\">dotnet\/runtime#37550<\/a>), is also very important, in particular for networking protocols like HTTP. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32994\">dotnet\/runtime#32994<\/a> vectorized its implementation, based in large part on similar optimizations previously done for <code>Encoding.ASCII<\/code>.  This yields a really nice performance boost, which can measurably impact higher-level usage in clients like <code>HttpClient<\/code> and in servers like Kestrel.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-en\">Encoding<\/span> <span class=\"pl-smi\">s_latin1<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Encoding<\/span>.<span class=\"pl-en\">GetEncoding<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>iso-8859-1<span class=\"pl-pds\">\"<\/span><\/span>);\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">Roundtrip<\/span>()\r\n{\r\n    <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">bytes<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">s_latin1<\/span>.<span class=\"pl-en\">GetBytes<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>this is a test. this is only a test. 
did it work?<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">s_latin1<\/span>.<span class=\"pl-en\">GetString<\/span>(<span class=\"pl-smi\">bytes<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Roundtrip<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">221.85 ns<\/td>\n<td align=\"right\">209 B<\/td>\n<\/tr>\n<tr>\n<td>Roundtrip<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">193.20 ns<\/td>\n<td align=\"right\">200 B<\/td>\n<\/tr>\n<tr>\n<td>Roundtrip<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">41.76 ns<\/td>\n<td align=\"right\">200 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Performance improvements to encoding also expanded to the encoders in <code>System.Text.Encodings.Web<\/code>, where PRs <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/42073\">dotnet\/corefx#42073<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/284\">dotnet\/runtime#284<\/a> from <a href=\"https:\/\/github.com\/gfoidl\">@gfoidl<\/a> improved the various <code>TextEncoder<\/code> types.  
This included using SSSE3 instructions to vectorize <code>FindFirstCharacterToEncodeUtf8<\/code> as well as <code>FindFirstCharToEncode<\/code> in the <code>JavaScriptEncoder.Default<\/code> implementation.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">char<\/span>[] <span class=\"pl-smi\">_dest<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">char<\/span>[<span class=\"pl-c1\">1000<\/span>];\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Encode<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">JavaScriptEncoder<\/span>.<span class=\"pl-smi\">Default<\/span>.<span class=\"pl-en\">Encode<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>This is a test to see how fast we can encode something that does not actually need encoding<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">_dest<\/span>, <span class=\"pl-k\">out<\/span> <span class=\"pl-c1\">_<\/span>, <span class=\"pl-k\">out<\/span> <span class=\"pl-c1\">_<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Encode<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">102.52 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Encode<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">33.39 ns<\/td>\n<td align=\"right\">0.33<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><a id=\"user-content-regular-expressions\" class=\"anchor\" aria-hidden=\"true\" href=\"#regular-expressions\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 
00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"regular-expressions\"><\/a>Regular Expressions<\/h3>\n<p>A very specific but extremely common form of parsing is via regular expressions. Back in early April, I shared a <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/regex-performance-improvements-in-net-5\/\" rel=\"nofollow\">detailed blog post<\/a> about some of the myriad performance improvements that have gone into .NET 5 for System.Text.RegularExpressions.  I&#8217;m not going to rehash all of that here, but I would encourage you to read it if you haven&#8217;t already, as it represents significant advancements in the library.  However, I also noted in that post that we would continue to improve <code>Regex<\/code>, and we have, in particular by adding more support for special but common cases.<\/p>\n<p>One such improvement was in newline handling when specifying <code>RegexOptions.Multiline<\/code>, which changes the meaning of the <code>^<\/code> and <code>$<\/code> anchors to match at the beginning and end of any line rather than just the beginning and end of the whole input string.  We previously didn&#8217;t do any special handling of beginning-of-line anchors (<code>^<\/code> when <code>Multiline<\/code> is specified), which meant that as part of the <code>FindFirstChar<\/code> operation (see the aforementioned blog post for background on what that refers to), we wouldn&#8217;t skip ahead as much as we otherwise could.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34566\">dotnet\/runtime#34566<\/a> taught <code>FindFirstChar<\/code> how to use a vectorized <code>IndexOf<\/code> to jump ahead to the next relevant location.  
The impact of that is highlighted in this benchmark, which is processing the text of &#8220;Romeo and Juliet&#8221; as downloaded from <a href=\"http:\/\/www.gutenberg.org\/cache\/epub\/1112\/pg1112.txt\" rel=\"nofollow\">Project Gutenberg<\/a>:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_input<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpClient<\/span>().<span class=\"pl-en\">GetStringAsync<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>http:\/\/www.gutenberg.org\/cache\/epub\/1112\/pg1112.txt<span class=\"pl-pds\">\"<\/span><\/span>).<span class=\"pl-smi\">Result<\/span>;\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Regex<\/span> <span class=\"pl-smi\">_regex<\/span>;\r\n\r\n[<span class=\"pl-en\">Params<\/span>(<span class=\"pl-c1\">false<\/span>, <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-smi\">bool<\/span> <span class=\"pl-smi\">Compiled<\/span> { <span class=\"pl-smi\">get<\/span>; <span class=\"pl-smi\">set<\/span>; }\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_regex<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Regex<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">@\"<\/span>^.*\\blove\\b.*$<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Multiline<\/span> <span class=\"pl-k\">|<\/span> (<span class=\"pl-smi\">Compiled<\/span> <span class=\"pl-k\">?<\/span> <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Compiled<\/span> <span class=\"pl-k\">:<\/span> <span 
class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">None<\/span>));\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Count<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_regex<\/span>.<span class=\"pl-en\">Matches<\/span>(<span class=\"pl-smi\">_input<\/span>).<span class=\"pl-smi\">Count<\/span>;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th>Compiled<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Count<\/td>\n<td>.NET FW 4.8<\/td>\n<td>False<\/td>\n<td align=\"right\">26.207 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Count<\/td>\n<td>.NET Core 3.1<\/td>\n<td>False<\/td>\n<td align=\"right\">21.106 ms<\/td>\n<td align=\"right\">0.80<\/td>\n<\/tr>\n<tr>\n<td>Count<\/td>\n<td>.NET 5.0<\/td>\n<td>False<\/td>\n<td align=\"right\">4.065 ms<\/td>\n<td align=\"right\">0.16<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>Count<\/td>\n<td>.NET FW 4.8<\/td>\n<td>True<\/td>\n<td align=\"right\">16.944 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Count<\/td>\n<td>.NET Core 3.1<\/td>\n<td>True<\/td>\n<td align=\"right\">15.287 ms<\/td>\n<td align=\"right\">0.90<\/td>\n<\/tr>\n<tr>\n<td>Count<\/td>\n<td>.NET 5.0<\/td>\n<td>True<\/td>\n<td align=\"right\">2.172 ms<\/td>\n<td align=\"right\">0.13<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Another such improvement was in the handling of <code>RegexOptions.IgnoreCase<\/code>.  The implementation of <code>IgnoreCase<\/code> uses <code>char.ToLower{Invariant}<\/code> to get the relevant characters to be compared, but that has overhead due to culture-specific mappings. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35185\">dotnet\/runtime#35185<\/a> enables those overheads to be avoided when the only character that could possibly lowercase to the character being compared against is that character itself.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-en\">Regex<\/span> <span class=\"pl-smi\">_regex<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Regex<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>hello.*world<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Compiled<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">IgnoreCase<\/span>);\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_input<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>abcdHELLO<span class=\"pl-pds\">\"<\/span><\/span> <span class=\"pl-k\">+<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">string<\/span>(<span class=\"pl-s\">'a'<\/span>, <span class=\"pl-c1\">128<\/span>) <span class=\"pl-k\">+<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>WORLD123<span class=\"pl-pds\">\"<\/span><\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsMatch<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_regex<\/span>.<span class=\"pl-en\">IsMatch<\/span>(<span class=\"pl-smi\">_input<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IsMatch<\/td>\n<td>.NET FW 4.8<\/td>\n<td 
align=\"right\">2,558.1 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>IsMatch<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">789.3 ns<\/td>\n<td align=\"right\">0.31<\/td>\n<\/tr>\n<tr>\n<td>IsMatch<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">129.0 ns<\/td>\n<td align=\"right\">0.05<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Related to that improvement is <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35203\">dotnet\/runtime#35203<\/a>, which, also in service of <code>RegexOptions.IgnoreCase<\/code>, reduces the number of virtual calls the implementation was making to <code>CultureInfo.TextInfo<\/code>, caching the <code>TextInfo<\/code> instead of the <code>CultureInfo<\/code> from which it came.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-en\">Regex<\/span> <span class=\"pl-smi\">_regex<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Regex<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Hello, <span class=\"pl-cce\">\\\\<\/span>w+.<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Compiled<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">IgnoreCase<\/span>);\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_input<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>This is a test to see how well this does.  
Hello, world.<span class=\"pl-pds\">\"<\/span><\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsMatch<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_regex<\/span>.<span class=\"pl-en\">IsMatch<\/span>(<span class=\"pl-smi\">_input<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IsMatch<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">712.9 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>IsMatch<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">343.5 ns<\/td>\n<td align=\"right\">0.48<\/td>\n<\/tr>\n<tr>\n<td>IsMatch<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">100.9 ns<\/td>\n<td align=\"right\">0.14<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>One of my favorite recent optimizations, though, was <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35824\">dotnet\/runtime#35824<\/a> (which was then augmented further in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35936\">dotnet\/runtime#35936<\/a>).  The change recognizes that, for a regex beginning with an atomic loop (one explicitly written or more commonly one upgraded to being atomic by automatic analysis of the expression), we can update the next starting position in the scan loop (again, see the blog post for details) based on where the loop ended rather than on where it started.  For many inputs, this can provide a big reduction in overhead.  
Using the benchmark and data from <a href=\"https:\/\/github.com\/mariomka\/regex-benchmark\">https:\/\/github.com\/mariomka\/regex-benchmark<\/a>:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Regex<\/span> <span class=\"pl-smi\">_email<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Regex<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">@\"<\/span>[\\w\\.+-]+@[\\w\\.-]+\\.[\\w\\.-]+<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Compiled<\/span>);\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Regex<\/span> <span class=\"pl-smi\">_uri<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Regex<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">@\"<\/span>[\\w]+:\/\/[^\/\\s?#]+[^\\s?#]+(?:\\?[^\\s#]*)?(?:#[^\\s]*)?<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Compiled<\/span>);\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Regex<\/span> <span class=\"pl-smi\">_ip<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Regex<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">@\"<\/span>(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">RegexOptions<\/span>.<span class=\"pl-smi\">Compiled<\/span>);\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_input<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpClient<\/span>().<span class=\"pl-en\">GetStringAsync<\/span>(<span class=\"pl-s\"><span 
class=\"pl-pds\">\"<\/span>https:\/\/raw.githubusercontent.com\/mariomka\/regex-benchmark\/652d55810691ad88e1c2292a2646d301d3928903\/input-text.txt<span class=\"pl-pds\">\"<\/span><\/span>).<span class=\"pl-smi\">Result<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Email<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_email<\/span>.<span class=\"pl-en\">Matches<\/span>(<span class=\"pl-smi\">_input<\/span>).<span class=\"pl-smi\">Count<\/span>;\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Uri<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_uri<\/span>.<span class=\"pl-en\">Matches<\/span>(<span class=\"pl-smi\">_input<\/span>).<span class=\"pl-smi\">Count<\/span>;\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">IP<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_ip<\/span>.<span class=\"pl-en\">Matches<\/span>(<span class=\"pl-smi\">_input<\/span>).<span class=\"pl-smi\">Count<\/span>;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Email<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">1,036.729 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Email<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">930.238 ms<\/td>\n<td align=\"right\">0.90<\/td>\n<\/tr>\n<tr>\n<td>Email<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">50.911 ms<\/td>\n<td align=\"right\">0.05<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>Uri<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">870.114 ms<\/td>\n<td 
align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Uri<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">759.079 ms<\/td>\n<td align=\"right\">0.87<\/td>\n<\/tr>\n<tr>\n<td>Uri<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">50.022 ms<\/td>\n<td align=\"right\">0.06<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>IP<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">75.718 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>IP<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">61.818 ms<\/td>\n<td align=\"right\">0.82<\/td>\n<\/tr>\n<tr>\n<td>IP<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">6.837 ms<\/td>\n<td align=\"right\">0.09<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Finally, not all focus was on the raw throughput of actually executing regular expressions.  One of the ways developers can get the best throughput with <code>Regex<\/code> is by specifying <code>RegexOptions.Compiled<\/code>, which uses Reflection Emit to generate IL at runtime, which in turn needs to be JIT compiled.  Depending on the expressions employed, <code>Regex<\/code> may spit out a fair amount of IL, which then can require a non-trivial amount of JIT processing to churn into assembly code. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35352\">dotnet\/runtime#35352<\/a> improved the JIT itself to help with this case, fixing some potentially quadratic-execution-time code paths the regex-generated IL was triggering. And <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35321\">dotnet\/runtime#35321<\/a> tweaked the IL operations used by the <code>Regex<\/code> engine to employ patterns much closer to what the C# compiler would emit, which is important because those same patterns are what the JIT is best tuned to optimize.  
On some real-world workloads featuring several hundred complex regular expressions, these combined to reduce the time it took to JIT the expressions by upwards of 20%.<\/p>\n<h2><a id=\"user-content-threading-and-async\" class=\"anchor\" aria-hidden=\"true\" href=\"#threading-and-async\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"threading-and-async\"><\/a>Threading and Async<\/h2>\n<p>One of the biggest changes around asynchrony in .NET 5 is actually not enabled by default, but is another experiment to get feedback. The <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/async-valuetask-pooling-in-net-5\/\" rel=\"nofollow\">Async ValueTask Pooling in .NET 5<\/a> blog post explains this in much more detail, but essentially <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26310\">dotnet\/coreclr#26310<\/a> introduced the ability for <code>async ValueTask<\/code> and <code>async ValueTask&lt;T&gt;<\/code> to implicitly cache and reuse the object created to represent an asynchronously completing operation, making the overhead of such methods amortized-allocation-free. The optimization is currently opt-in, meaning you need to set the <code>DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS<\/code> environment variable to <code>1<\/code> in order to enable it.  
One difficulty with enabling this arises for code that might be doing something more complex than just <code>await SomeValueTaskReturningMethod()<\/code>, as <code>ValueTask<\/code>s have more constraints than <code>Task<\/code>s on how they can be used.  To help with that, a new <a href=\"https:\/\/docs.microsoft.com\/en-us\/visualstudio\/code-quality\/ca2012\" rel=\"nofollow\"><code>UseValueTasksCorrectly<\/code> analyzer<\/a> was released that will flag most such misuse.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">ValueTaskCost<\/span>()\r\n{\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1_000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-en\">YieldOnce<\/span>();\r\n}\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">ValueTask<\/span> <span class=\"pl-en\">YieldOnce<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">Yield<\/span>();<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ValueTaskCost<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">1,635.6 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">294010 B<\/td>\n<\/tr>\n<tr>\n<td>ValueTaskCost<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">842.7 us<\/td>\n<td 
align=\"right\">0.51<\/td>\n<td align=\"right\">120184 B<\/td>\n<\/tr>\n<tr>\n<td>ValueTaskCost<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">812.3 us<\/td>\n<td align=\"right\">0.50<\/td>\n<td align=\"right\">186 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Some changes in the C# compiler bring additional benefits to async methods in .NET 5 (in that the core libraries in .NET 5 are compiled with the newer compiler).  Every async method has a &#8220;builder&#8221; that&#8217;s responsible for producing and completing the returned task, with the C# compiler generating code as part of an async method to use one. <a href=\"https:\/\/github.com\/dotnet\/roslyn\/pull\/41253\">dotnet\/roslyn#41253<\/a> from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a> avoids a struct copy generated as part of that code, which can help reduce overheads, in particular for <code>async ValueTask&lt;T&gt;<\/code> methods where the builder is relatively large (and grows as <code>T<\/code> grows). <a href=\"https:\/\/github.com\/dotnet\/roslyn\/pull\/45262\">dotnet\/roslyn#45262<\/a>, also from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a>, tweaks the same generated code to play better with the JIT&#8217;s zero&#8217;ing improvements discussed previously.<\/p>\n<p>There are also some improvements in specific APIs.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35575\">dotnet\/runtime#35575<\/a> was born out of some specific usage of <code>Task.ContinueWith<\/code>, where a continuation is used purely for the purposes of logging an exception in the &#8220;antecedent&#8221; <code>Task<\/code> continued from.  
The common case here is that the <code>Task<\/code> doesn&#8217;t fault, and this PR does a better job optimizing for that case.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">const<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">Iters<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">1_000_000<\/span>;\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">AsyncTaskMethodBuilder<\/span>[] <span class=\"pl-smi\">tasks<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">AsyncTaskMethodBuilder<\/span>[<span class=\"pl-smi\">Iters<\/span>];\r\n\r\n[<span class=\"pl-en\">IterationSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-smi\">Array<\/span>.<span class=\"pl-en\">Clear<\/span>(<span class=\"pl-smi\">tasks<\/span>, <span class=\"pl-c1\">0<\/span>, <span class=\"pl-smi\">tasks<\/span>.<span class=\"pl-smi\">Length<\/span>);\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">tasks<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-c1\">_<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">tasks<\/span>[<span class=\"pl-smi\">i<\/span>].<span class=\"pl-smi\">Task<\/span>;\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">OperationsPerInvoke<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Iters<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Cancel<\/span>()\r\n{\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span 
class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">tasks<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    {\r\n        <span class=\"pl-smi\">tasks<\/span>[<span class=\"pl-smi\">i<\/span>].<span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">ContinueWith<\/span>(<span class=\"pl-smi\">_<\/span> <span class=\"pl-k\">=&gt;<\/span> { }, <span class=\"pl-smi\">CancellationToken<\/span>.<span class=\"pl-smi\">None<\/span>, <span class=\"pl-smi\">TaskContinuationOptions<\/span>.<span class=\"pl-smi\">OnlyOnFaulted<\/span> <span class=\"pl-k\">|<\/span> <span class=\"pl-smi\">TaskContinuationOptions<\/span>.<span class=\"pl-smi\">ExecuteSynchronously<\/span>, <span class=\"pl-smi\">TaskScheduler<\/span>.<span class=\"pl-smi\">Default<\/span>);\r\n        <span class=\"pl-smi\">tasks<\/span>[<span class=\"pl-smi\">i<\/span>].<span class=\"pl-en\">SetResult<\/span>();\r\n    }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Cancel<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">239.2 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">193 B<\/td>\n<\/tr>\n<tr>\n<td>Cancel<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">140.3 ns<\/td>\n<td align=\"right\">0.59<\/td>\n<td align=\"right\">192 B<\/td>\n<\/tr>\n<tr>\n<td>Cancel<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">106.4 ns<\/td>\n<td align=\"right\">0.44<\/td>\n<td align=\"right\">112 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There were also tweaks to help with specific architectures. 
Because of the strong memory model employed by x86\/x64 architectures, <code>volatile<\/code> essentially evaporates at JIT time when targeting x86\/x64.  That is not the case for ARM\/ARM64, which have weaker memory models and where <code>volatile<\/code> results in fences being emitted by the JIT.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36697\">dotnet\/runtime#36697<\/a> removes several volatile accesses per work item queued to the <code>ThreadPool<\/code>, making the <code>ThreadPool<\/code> faster on ARM. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34225\">dotnet\/runtime#34225<\/a> hoisted a volatile access in <code>ConcurrentDictionary<\/code> out of a loop, which in turn improved throughput of some members on <code>ConcurrentDictionary<\/code> on ARM by as much as 30%. And <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36976\">dotnet\/runtime#36976<\/a> removed <code>volatile<\/code> entirely from another <code>ConcurrentDictionary<\/code> field.<\/p>\n<h2><a id=\"user-content-collections\" class=\"anchor\" aria-hidden=\"true\" href=\"#collections\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"collections\"><\/a>Collections<\/h2>\n<p>Over the years, C# has gained a plethora of valuable features.  
Many of these features are focused on developers being able to write code more succinctly, with the language\/compiler being responsible for all the boilerplate, such as with <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/welcome-to-c-9-0\/\" rel=\"nofollow\">records in C# 9<\/a>.  However, a few features are focused less on productivity and more on performance, and such features are a great boon to the core libraries, which can often use them to make everyone&#8217;s programs more efficient.  <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27195\">dotnet\/coreclr#27195<\/a> from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a> is a good example of this.  The PR improves <code>Dictionary&lt;TKey, TValue&gt;<\/code>, taking advantage of ref returns and ref locals, which were introduced in C# 7.  <code>Dictionary&lt;TKey, TValue&gt;<\/code>&#8217;s implementation is backed by an array of entries in the dictionary, and the dictionary has a core routine for looking up a key&#8217;s index in its entries array; that routine is then used from multiple functions, like the indexer, <code>TryGetValue<\/code>, <code>ContainsKey<\/code>, and so on. However, that sharing comes at a cost: by handing back the index and leaving it up to the caller to get the data from that slot as needed, the caller would need to re-index into the array, incurring a second bounds check.  With ref returns, that shared routine could instead hand back a ref to the slot rather than the raw index, enabling the caller to avoid the second bounds check while also avoiding making a copy of the entire entry.  The PR also included some low-level tuning, reorganizing fields and the operations used to update those fields in a way that enabled the JIT to generate better assembly.<\/p>\n<p><code>Dictionary&lt;TKey,TValue&gt;<\/code>&#8217;s performance was improved further by several more PRs.  
Like many hash tables, <code>Dictionary&lt;TKey,TValue&gt;<\/code> is partitioned into &#8220;buckets&#8221;, each of which is essentially a linked list of entries (stored in an array, not with individual node objects per item).  For a given key, a hashing function (<code>TKey<\/code>&#8217;s <code>GetHashCode<\/code> or the supplied <code>IEqualityComparer&lt;T&gt;<\/code>&#8217;s <code>GetHashCode<\/code>) is used to compute a hash code for the supplied key, and then that hash code is mapped deterministically to a bucket; once the bucket is found, the implementation then iterates through the chain of entries in that bucket looking for the target key.  The implementation tries to keep the number of entries in each bucket small, growing and rebalancing as necessary to maintain that condition.  As such, a large portion of the cost of a lookup is computing the hashcode-to-bucket mapping.  In order to help maintain a good distribution across the buckets, especially when a less-than-ideal hash code generator is employed by the supplied <code>TKey<\/code> or comparer, the dictionary uses a prime number of buckets, and the bucket mapping is done by <code>hashcode % numBuckets<\/code>.  But at the speeds important here, the division employed by the <code>%<\/code> operator is relatively expensive. 
Building on <a href=\"https:\/\/lemire.me\/blog\/2019\/02\/08\/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide\/\" rel=\"nofollow\">Daniel Lemire&#8217;s work<\/a>, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27299\">dotnet\/coreclr#27299<\/a> from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a> and then <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/406\">dotnet\/runtime#406<\/a> changed the use of <code>%<\/code> in 64-bit processes to instead use a couple of multiplications and shifts to achieve the same result but faster.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Dictionary<\/span>&lt;<span class=\"pl-k\">int<\/span>, <span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">_dictionary<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">10_000<\/span>).<span class=\"pl-en\">ToDictionary<\/span>(<span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">i<\/span>);\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Sum<\/span>()\r\n{\r\n    <span class=\"pl-en\">Dictionary<\/span>&lt;<span class=\"pl-k\">int<\/span>, <span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">dictionary<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_dictionary<\/span>;\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">10_000<\/span>; 
<span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">dictionary<\/span>.<span class=\"pl-en\">TryGetValue<\/span>(<span class=\"pl-smi\">i<\/span>, <span class=\"pl-k\">out<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">value<\/span>))\r\n            <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">value<\/span>;\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sum<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">77.45 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">67.35 us<\/td>\n<td align=\"right\">0.87<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">44.10 us<\/td>\n<td align=\"right\">0.57<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><code>HashSet&lt;T&gt;<\/code> is very similar to <code>Dictionary&lt;TKey, TValue&gt;<\/code>.  While it exposes a different set of operations (no pun intended), other than only storing a key rather than a key and a value, its data structure is fundamentally the same&#8230; or, at least, it used to be.  Over the years, given how much more <code>Dictionary&lt;TKey,TValue&gt;<\/code> is used than <code>HashSet&lt;T&gt;<\/code>, more effort has gone into optimizing <code>Dictionary&lt;TKey, TValue&gt;<\/code>&#8216;s implementation, and the two implementations have drifted. 
<a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/40106\">dotnet\/corefx#40106<\/a> from <a href=\"https:\/\/github.com\/JeffreyZhao\">@JeffreyZhao<\/a> ported some of the improvements from dictionary to hash set, and then <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37180\">dotnet\/runtime#37180<\/a> effectively rewrote <code>HashSet&lt;T&gt;<\/code>&#8216;s implementation by re-syncing it with dictionary&#8217;s (along with moving it lower in the stack so that some places a dictionary was being used for a set could be properly replaced).  The net result is that <code>HashSet&lt;T&gt;<\/code> ends up experiencing similar gains (more so even, because it was starting from a worse place).<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">HashSet<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">_set<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">10_000<\/span>).<span class=\"pl-en\">ToHashSet<\/span>();\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Sum<\/span>()\r\n{\r\n    <span class=\"pl-en\">HashSet<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">set<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_set<\/span>;\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">10_000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span 
class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">set<\/span>.<span class=\"pl-en\">Contains<\/span>(<span class=\"pl-smi\">i<\/span>))\r\n            <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">i<\/span>;\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sum<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">76.29 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">79.23 us<\/td>\n<td align=\"right\">1.04<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">42.63 us<\/td>\n<td align=\"right\">0.56<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Similarly, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37081\">dotnet\/runtime#37081<\/a> ported similar improvements from <code>Dictionary&lt;TKey, TValue&gt;<\/code> to <code>ConcurrentDictionary&lt;TKey, TValue&gt;<\/code>.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">ConcurrentDictionary<\/span>&lt;<span class=\"pl-k\">int<\/span>, <span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">_dictionary<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">ConcurrentDictionary<\/span>&lt;<span class=\"pl-k\">int<\/span>, <span class=\"pl-k\">int<\/span>&gt;(<span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">10_000<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">KeyValuePair<\/span>&lt;<span class=\"pl-k\">int<\/span>, <span class=\"pl-k\">int<\/span>&gt;(<span 
class=\"pl-smi\">i<\/span>, <span class=\"pl-smi\">i<\/span>)));\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Sum<\/span>()\r\n{\r\n    <span class=\"pl-en\">ConcurrentDictionary<\/span>&lt;<span class=\"pl-k\">int<\/span>, <span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">dictionary<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_dictionary<\/span>;\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">10_000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">dictionary<\/span>.<span class=\"pl-en\">TryGetValue<\/span>(<span class=\"pl-smi\">i<\/span>, <span class=\"pl-k\">out<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">value<\/span>))\r\n            <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">value<\/span>;\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sum<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">115.25 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">84.30 us<\/td>\n<td align=\"right\">0.73<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">49.52 us<\/td>\n<td 
align=\"right\">0.43<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>System.Collections.Immutable has also seen improvements in the release. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1183\">dotnet\/runtime#1183<\/a> is a one-line but impactful change from <a href=\"https:\/\/github.com\/hnrqbaggio\">@hnrqbaggio<\/a> to improve the performance of <code>foreach<\/code>&#8216;ing over an <code>ImmutableArray&lt;T&gt;<\/code> by adding <code>[MethodImpl(MethodImplOptions.AggressiveInlining)]<\/code> to <code>ImmutableArray&lt;T&gt;<\/code>&#8216;s <code>GetEnumerator<\/code> method.  We&#8217;re generally very cautious about sprinkling <code>AggressiveInlining<\/code> around: it can make microbenchmarks look really good, since it ends up eliminating the overhead of calling the relevant method, but it can also significantly increase code size, which can then negatively impact a whole bunch of things, such as causing the instruction cache to become much less effective.  In this case, however, it not only improves throughput but also actually reduces code size.  Inlining is a powerful optimization, not just because it eliminates the overhead of a call, but because it exposes the contents of the callee to the caller.  The JIT generally doesn&#8217;t do interprocedural analysis, due to the JIT&#8217;s limited time budget for optimizations, but inlining overcomes that by merging the caller and the callee, at which point the JIT optimizations of the caller factor in the callee.  Imagine a method <code>public static int GetValue() =&gt; 42;<\/code> and a caller that does <code>if (GetValue() * 2 &gt; 100) { ... lots of code ... }<\/code>.  If <code>GetValue()<\/code> isn&#8217;t inlined, that comparison and &#8220;lots of code&#8221; will get JIT&#8217;d, but if <code>GetValue()<\/code> is inlined, the JIT will see this as <code>if (84 &gt; 100) { ... lots of code ... }<\/code>, and the whole block will be dropped.  
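<\/p>\n<p>As a minimal sketch of that example (the <code>InliningSketch<\/code> class and the <code>GetValueNotInlined<\/code> method are hypothetical names, and the attributes are applied here only to force each behavior for illustration):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>using System;\r\nusing System.Runtime.CompilerServices;\r\n\r\npublic static class InliningSketch\r\n{\r\n    \/\/ The trivial callee from the text; the attribute only makes the inlining explicit.\r\n    [MethodImpl(MethodImplOptions.AggressiveInlining)]\r\n    public static int GetValue() =&gt; 42;\r\n\r\n    \/\/ Same body, but the JIT is told not to inline it (for contrast).\r\n    [MethodImpl(MethodImplOptions.NoInlining)]\r\n    public static int GetValueNotInlined() =&gt; 42;\r\n\r\n    public static void Main()\r\n    {\r\n        \/\/ With GetValue() inlined, the JIT can fold the condition to (84 &gt; 100),\r\n        \/\/ see that it is always false, and drop the whole block.\r\n        if (GetValue() * 2 &gt; 100)\r\n        {\r\n            Console.WriteLine(\"lots of code\");\r\n        }\r\n\r\n        \/\/ Without inlining, the call, the multiply, the compare, and the block are all compiled.\r\n        if (GetValueNotInlined() * 2 &gt; 100)\r\n        {\r\n            Console.WriteLine(\"lots of code\");\r\n        }\r\n    }\r\n}<\/pre>\n<\/div>\n<p>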
Thankfully such a simple method will almost always be automatically inlined, but <code>ImmutableArray&lt;T&gt;<\/code>&#8216;s <code>GetEnumerator<\/code> is just large enough that the JIT doesn&#8217;t recognize automatically how beneficial it will be.  In practice, when the <code>GetEnumerator<\/code> is inlined, the JIT ends up being able to better recognize that the <code>foreach<\/code> is iterating over an array, and instead of the generated code for <code>Sum<\/code> being:<\/p>\n<div class=\"highlight highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.Sum()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">push<\/span><span class=\"pl-en\">      <\/span><span class=\"pl-v\">rsi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">sub<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">30<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">xor<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">20<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">rax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">28<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">rax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">xor<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">esi<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">esi<\/span>\r\n<span class=\"pl-en\">       <\/span><span 
class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">ecx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">8<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">lea<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">20<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      System.Collections.Immutable.ImmutableArray'<\/span><span class=\"pl-c1\">1<\/span><span class=\"pl-s1\">[[<\/span><span class=\"pl-en\">System.Int32<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-en\"> System.Private.CoreLib<\/span><span class=\"pl-s1\">]]<\/span><span class=\"pl-en\">.GetEnumerator()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jmp<\/span><span class=\"pl-en\">       short M00_L01<\/span>\r\n<span class=\"pl-en\">M00_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">28<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">edx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jae<\/span><span class=\"pl-en\">       short M00_L02<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">20<\/span><span 
class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">edx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">28<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">movsxd<\/span><span class=\"pl-en\">    <\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">edx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rax<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">*<\/span><span class=\"pl-c1\">4<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">10<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">esi<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">M00_L01:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">28<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">inc<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-s1\">[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">28<\/span><span class=\"pl-s1\">],<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span 
class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">20<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">edx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">edx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jg<\/span><span class=\"pl-en\">        short M00_L00<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">esi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">30<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">pop<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsi<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-en\">M00_L02:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      CORINFO_HELP_RNGCHKFAIL<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 97<\/span><\/pre>\n<\/div>\n<p>as it is in .NET Core 3.1, in .NET 5 it ends up being<\/p>\n<div class=\"highlight 
highlight-source-assembly\">\n<pre><span class=\"pl-c\">; Program.Sum()<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">sub<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">28<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">xor<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">eax<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">8<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rcx<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">ecx<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">8<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">r8d<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-en\">0FFFFFFFF<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jmp<\/span><span class=\"pl-en\">       short M00_L01<\/span>\r\n<span class=\"pl-en\">M00_L00:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">r8d<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">ecx<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jae<\/span><span class=\"pl-en\">       short M00_L02<\/span>\r\n<span class=\"pl-en\">       <\/span><span 
class=\"pl-k\">movsxd<\/span><span class=\"pl-en\">    <\/span><span class=\"pl-v\">r9<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">r8d<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">mov<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">r9d<\/span><span class=\"pl-s1\">,[<\/span><span class=\"pl-v\">rdx<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-v\">r9<\/span><span class=\"pl-s1\">*<\/span><span class=\"pl-c1\">4<\/span><span class=\"pl-s1\">+<\/span><span class=\"pl-c1\">10<\/span><span class=\"pl-s1\">]<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">eax<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">r9d<\/span>\r\n<span class=\"pl-en\">M00_L01:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">inc<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">r8d<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">cmp<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">ecx<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-v\">r8d<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">jg<\/span><span class=\"pl-en\">        short M00_L00<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">add<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-v\">rsp<\/span><span class=\"pl-s1\">,<\/span><span class=\"pl-c1\">28<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">ret<\/span>\r\n<span class=\"pl-en\">M00_L02:<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">call<\/span><span class=\"pl-en\">      CORINFO_HELP_RNGCHKFAIL<\/span>\r\n<span class=\"pl-en\">       <\/span><span class=\"pl-k\">int<\/span><span class=\"pl-en\">       <\/span><span class=\"pl-c1\">3<\/span>\r\n<span class=\"pl-c\">; Total bytes of code 59<\/span><\/pre>\n<\/div>\n<p>So, much smaller code and 
much faster execution:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">ImmutableArray<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">ImmutableArray<\/span>.<span class=\"pl-en\">Create<\/span>(<span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">100_000<\/span>).<span class=\"pl-en\">ToArray<\/span>());\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Sum<\/span>()\r\n{\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">foreach<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">in<\/span> <span class=\"pl-smi\">_array<\/span>)\r\n        <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">i<\/span>;\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sum<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">187.60 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">187.32 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">46.59 us<\/td>\n<td align=\"right\">0.25<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><code>ImmutableList&lt;T&gt;.Contains<\/code> also saw significant improvements due to <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/40540\">dotnet\/corefx#40540<\/a> from <a 
href=\"https:\/\/github.com\/shortspider\">@shortspider<\/a>.  <code>Contains<\/code> had been implemented using <code>ImmutableList&lt;T&gt;<\/code>&#8216;s <code>IndexOf<\/code> method, which is in turn implemented on top of its <code>Enumerator<\/code>.  Under the covers <code>ImmutableList&lt;T&gt;<\/code> is implemented today as an <a href=\"https:\/\/en.wikipedia.org\/wiki\/AVL_tree\" rel=\"nofollow\">AVL tree<\/a>, a form of self-balancing binary search tree, and in order to walk such a tree in order, it needs to retain a non-trivial amount of state, and <code>ImmutableList&lt;T&gt;<\/code>&#8216;s enumerator goes to great pains to avoid allocating per enumeration in order to store that state.  That results in non-trivial overhead.  However, <code>Contains<\/code> doesn&#8217;t care about the exact index of an element in the list (nor which of potentially multiple copies is found), just that it&#8217;s there, and as such, it can employ a trivial recursive tree search. (And because the tree is balanced, we&#8217;re not concerned about stack overflow conditions.)<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">ImmutableList<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">_list<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">ImmutableList<\/span>.<span class=\"pl-en\">Create<\/span>(<span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">1_000<\/span>).<span class=\"pl-en\">ToArray<\/span>());\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Sum<\/span>()\r\n{\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span 
class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1_000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">_list<\/span>.<span class=\"pl-en\">Contains<\/span>(<span class=\"pl-smi\">i<\/span>))\r\n            <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">i<\/span>;\r\n\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sum<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">22.259 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">22.872 ms<\/td>\n<td align=\"right\">1.03<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">2.066 ms<\/td>\n<td align=\"right\">0.09<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The previously highlighted collection improvements were all to general-purpose collections, meant to be used with whatever data the developer needs stored.  But not all collection types are like that: some are much more specialized to a particular data type, and such collections see performance improvements in .NET 5 as well.  <code>BitArray<\/code> is one such example, with several PRs this release making significant improvements to its performance.  
In particular, <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41896\">dotnet\/corefx#41896<\/a> from <a href=\"https:\/\/github.com\/Gnbrkm41\">@Gnbrkm41<\/a> utilized AVX2 and SSE2 intrinsics to vectorize many of the operations on <code>BitArray<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33749\">dotnet\/runtime#33749<\/a> subsequently added ARM64 intrinsics, as well):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">bool<\/span>[] <span class=\"pl-smi\">_array<\/span>;\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">r<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Random<\/span>(<span class=\"pl-c1\">42<\/span>);\r\n    <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">1000<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-smi\">_<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">r<\/span>.<span class=\"pl-en\">Next<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">2<\/span>) <span class=\"pl-k\">==<\/span> <span class=\"pl-c1\">0<\/span>).<span class=\"pl-en\">ToArray<\/span>();\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">BitArray<\/span> <span class=\"pl-en\">Create<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">BitArray<\/span>(<span class=\"pl-smi\">_array<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th 
align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Create<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">1,140.91 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Create<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">861.97 ns<\/td>\n<td align=\"right\">0.76<\/td>\n<\/tr>\n<tr>\n<td>Create<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">49.08 ns<\/td>\n<td align=\"right\">0.04<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3><a id=\"user-content-linq\" class=\"anchor\" aria-hidden=\"true\" href=\"#linq\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"linq\"><\/a>LINQ<\/h3>\n<p>Previous releases of .NET Core saw a large amount of churn in the <code>System.Linq<\/code> codebase, in particular to improve performance.  That flow has slowed, but .NET 5 still sees performance improvements in LINQ.<\/p>\n<p>One notable improvement is in <code>OrderBy<\/code>.  As discussed earlier, there were multiple motivations for moving coreclr&#8217;s native sorting implementation up into managed code, one of which was being able to reuse it easily as part of span-based sorting methods.  Such APIs were exposed publicly, and with <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1888#issuecomment-575861604\">dotnet\/runtime#1888<\/a>, we were able to utilize that span-based sorting in <code>System.Linq<\/code>.  
This was beneficial in particular because it enabled utilizing the <code>Comparison&lt;T&gt;<\/code>-based sorting routines, which in turn enabled avoiding multiple levels of indirection on every comparison operation.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">r<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Random<\/span>(<span class=\"pl-c1\">42<\/span>);\r\n    <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">1_000<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-smi\">_<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">r<\/span>.<span class=\"pl-en\">Next<\/span>()).<span class=\"pl-en\">ToArray<\/span>();\r\n}\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">int<\/span>[] <span class=\"pl-smi\">_array<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Sort<\/span>()\r\n{\r\n    <span class=\"pl-k\">foreach<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">in<\/span> <span class=\"pl-smi\">_array<\/span>.<span class=\"pl-en\">OrderBy<\/span>(<span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">i<\/span>)) { }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sort<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">100.78 us<\/td>\n<td 
align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sort<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">101.03 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sort<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">85.46 us<\/td>\n<td align=\"right\">0.85<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Not bad for a one-line change.<\/p>\n<p>Another improvement was <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41342\">dotnet\/corefx#41342<\/a> from <a href=\"https:\/\/github.com\/timandy\">@timandy<\/a>.  The PR augmented <code>Enumerable.SkipLast<\/code> to special-case <code>IList&lt;T&gt;<\/code> as well as the internal <code>IPartition&lt;T&gt;<\/code> interface (which is how various operators communicate with each other for optimization purposes) in order to re-express <code>SkipLast<\/code> as a <code>Take<\/code> operation when the length of the source could be cheaply determined.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">IEnumerable<\/span>&lt;<span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">data<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">100<\/span>).<span class=\"pl-en\">ToList<\/span>();\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">SkipLast<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">data<\/span>.<span class=\"pl-en\">SkipLast<\/span>(<span class=\"pl-c1\">5<\/span>).<span class=\"pl-en\">Sum<\/span>();<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SkipLast<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">1,641.0 ns<\/td>\n<td 
align=\"right\">1.00<\/td>\n<td align=\"right\">248 B<\/td>\n<\/tr>\n<tr>\n<td>SkipLast<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">684.8 ns<\/td>\n<td align=\"right\">0.42<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As a final example, <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/40377\">dotnet\/corefx#40377<\/a> was arguably a long time coming.  This is an interesting case to me.  For a while now I&#8217;ve seen developers assume that <code>Enumerable.Any()<\/code> is more efficient than <code>Enumerable.Count() != 0<\/code>; after all, <code>Any()<\/code> only needs to determine whether there&#8217;s anything in the source, and <code>Count()<\/code> needs to determine how many things there are in the source.  Thus, with any reasonable collection, <code>Any()<\/code> should in the worst case be O(1) and <code>Count()<\/code> may in the worst case be O(N), so wouldn&#8217;t <code>Any()<\/code> always be preferable?  There are even Roslyn analyzers that recommend this conversion.  Unfortunately, it&#8217;s not always the case.  Until .NET 5, <code>Any()<\/code> was implemented essentially as follows:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> (<span class=\"pl-en\">IEnumerator<\/span>&lt;<span class=\"pl-en\">T<\/span>&gt; <span class=\"pl-en\">e<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-en\">source<\/span>.<span class=\"pl-en\">GetEnumerator<\/span>())\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-en\">e<\/span>.<span class=\"pl-en\">MoveNext<\/span>();<\/pre>\n<\/div>\n<p>That means that in the common case, even though it&#8217;s likely an O(1) operation, it&#8217;s going to result in an enumerator object being allocated as well as two interface dispatches.  
In contrast, since the initial release of LINQ in .NET Framework 3.5, <code>Count()<\/code> has had optimized code paths that special-case <code>ICollection&lt;T&gt;<\/code> to use its <code>Count<\/code> property, in which case generally it&#8217;s going to be O(1) and allocation-free with only one interface dispatch.  As a result, for very common cases (like the source being a <code>List&lt;T&gt;<\/code>), it was actually more efficient to use <code>Count() != 0<\/code> than it was to use <code>Any()<\/code>.  While adding a similar interface check to <code>Any()<\/code> has some overhead, it was worthwhile adding it to make the <code>Any()<\/code> implementation predictable and consistent with <code>Count()<\/code>, such that they could be more easily reasoned about and such that the prevailing wisdom about their costs would become correct.<\/p>\n<h2><a id=\"user-content-networking\" class=\"anchor\" aria-hidden=\"true\" href=\"#networking\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"networking\"><\/a>Networking<\/h2>\n<p>Networking is a critical component of almost any application these days, and great networking performance is of paramount importance.  As such, every release of .NET now sees a lot of attention paid to improving networking performance, and .NET 5 is no exception.<\/p>\n<p>Let&#8217;s start by looking at some primitives and working our way up.  <code>System.Uri<\/code> is used by almost every app to represent URLs, and it&#8217;s important that it be fast.  
A multitude of PRs have gone into making <code>Uri<\/code> much faster in .NET 5. Arguably the most important operation for a <code>Uri<\/code> is constructing one, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36915\">dotnet\/runtime#36915<\/a> made that faster for all <code>Uri<\/code>s, primarily just by paying attention to overheads and not incurring unnecessary costs:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Uri<\/span> <span class=\"pl-en\">Ctor<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Uri<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>https:\/\/github.com\/dotnet\/runtime\/pull\/36915<span class=\"pl-pds\">\"<\/span><\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Ctor<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">443.2 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">225 B<\/td>\n<\/tr>\n<tr>\n<td>Ctor<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">192.3 ns<\/td>\n<td align=\"right\">0.43<\/td>\n<td align=\"right\">72 B<\/td>\n<\/tr>\n<tr>\n<td>Ctor<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">129.9 ns<\/td>\n<td align=\"right\">0.29<\/td>\n<td align=\"right\">56 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>After construction, it&#8217;s very common for applications to access the various components of a <code>Uri<\/code>, and that has been improved as well.  In particular, it&#8217;s common with a type like <code>HttpClient<\/code> to have a single <code>Uri<\/code> that&#8217;s used repeatedly for issuing requests.  
The <code>HttpClient<\/code> implementation will access the <code>Uri.PathAndQuery<\/code> property in order to send that as part of the HTTP request (e.g. <code>GET \/dotnet\/runtime HTTP\/1.1<\/code>), and in the past that meant recreating a string for that portion of the <code>Uri<\/code> on every request.  Thanks to <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36460\">dotnet\/runtime#36460<\/a>, that is now cached (as is the <code>IdnHost<\/code>):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Uri<\/span> <span class=\"pl-smi\">_uri<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Uri<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>http:\/\/github.com\/dotnet\/runtime<span class=\"pl-pds\">\"<\/span><\/span>);\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">PathAndQuery<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_uri<\/span>.<span class=\"pl-smi\">PathAndQuery<\/span>;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>PathAndQuery<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">17.936 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">56 B<\/td>\n<\/tr>\n<tr>\n<td>PathAndQuery<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">30.891 ns<\/td>\n<td align=\"right\">1.72<\/td>\n<td align=\"right\">56 B<\/td>\n<\/tr>\n<tr>\n<td>PathAndQuery<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">2.854 ns<\/td>\n<td align=\"right\">0.16<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Beyond that, there are a myriad of ways code interacts with <code>Uri<\/code>s, many of which have been improved.  
For example, <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41772\">dotnet\/corefx#41772<\/a> improved <code>Uri.EscapeDataString<\/code> and <code>Uri.EscapeUriString<\/code>, which escape a string according to <a href=\"https:\/\/tools.ietf.org\/html\/rfc3986\" rel=\"nofollow\">RFC 3986<\/a> and <a href=\"https:\/\/tools.ietf.org\/html\/rfc3987\" rel=\"nofollow\">RFC 3987<\/a>.  Both of these methods relied on a shared helper that employed <code>unsafe<\/code> code, that roundtripped through a <code>char[]<\/code>, and that had a lot of complexity around Unicode handling.  This PR rewrote that helper to utilize newer features of .NET, like spans and <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.text.rune\" rel=\"nofollow\">runes<\/a>, in order to make the escape operation both safe and fast.  For some inputs, the gains are modest, but for inputs involving Unicode or even for long ASCII inputs, the gains are significant.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Params<\/span>(<span class=\"pl-c1\">false<\/span>, <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-smi\">bool<\/span> <span class=\"pl-smi\">ASCII<\/span> { <span class=\"pl-smi\">get<\/span>; <span class=\"pl-smi\">set<\/span>; }\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_input<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">ASCII<\/span> <span class=\"pl-k\">?<\/span>\r\n        <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">string<\/span>(<span class=\"pl-s\">'s'<\/span>, <span class=\"pl-c1\">20_000<\/span>) <span class=\"pl-k\">:<\/span>\r\n        <span class=\"pl-smi\">string<\/span>.<span class=\"pl-en\">Concat<\/span>(<span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Repeat<\/span>(<span 
class=\"pl-s\"><span class=\"pl-pds\">\"<\/span><span class=\"pl-cce\">\\x<\/span>D83D<span class=\"pl-cce\">\\x<\/span>DE00<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">10_000<\/span>));\r\n}\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_input<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">Escape<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Uri<\/span>.<span class=\"pl-en\">EscapeDataString<\/span>(<span class=\"pl-smi\">_input<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th>ASCII<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Escape<\/td>\n<td>.NET FW 4.8<\/td>\n<td>False<\/td>\n<td align=\"right\">6,162.59 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">60616272 B<\/td>\n<\/tr>\n<tr>\n<td>Escape<\/td>\n<td>.NET Core 3.1<\/td>\n<td>False<\/td>\n<td align=\"right\">6,483.85 us<\/td>\n<td align=\"right\">1.06<\/td>\n<td align=\"right\">60612025 B<\/td>\n<\/tr>\n<tr>\n<td>Escape<\/td>\n<td>.NET 5.0<\/td>\n<td>False<\/td>\n<td align=\"right\">243.09 us<\/td>\n<td align=\"right\">0.04<\/td>\n<td align=\"right\">240045 B<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>Escape<\/td>\n<td>.NET FW 4.8<\/td>\n<td>True<\/td>\n<td align=\"right\">86.93 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>Escape<\/td>\n<td>.NET Core 3.1<\/td>\n<td>True<\/td>\n<td align=\"right\">122.06 us<\/td>\n<td align=\"right\">1.40<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td>Escape<\/td>\n<td>.NET 5.0<\/td>\n<td>True<\/td>\n<td align=\"right\">14.04 us<\/td>\n<td 
align=\"right\">0.16<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/42225\">dotnet\/corefx#42225<\/a> provides corresponding improvements for <code>Uri.UnescapeDataString<\/code>.  The change included using the already vectorized <code>IndexOf<\/code> rather than a manual, pointer-based loop, in order to determine the first location of a character that needs to be unescaped, and then on top of that avoiding some unnecessary code and employing stack allocation instead of heap allocation when feasible.  While it helped to make all operations faster, the biggest gains came for strings which had nothing to unescape, meaning the <code>EscapeDataString<\/code> operation had nothing to escape and just returned its input unmodified (this condition was also subsequently helped further by <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41684\">dotnet\/corefx#41684<\/a>, which enabled the original strings to be returned when no changes were required):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_value<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">string<\/span>.<span class=\"pl-en\">Concat<\/span>(<span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Repeat<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>abcdefghijklmnopqrstuvwxyz<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">20<\/span>));\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-en\">Unescape<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Uri<\/span>.<span class=\"pl-en\">UnescapeDataString<\/span>(<span class=\"pl-smi\">_value<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th 
align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Unescape<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">847.44 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Unescape<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">846.84 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Unescape<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">21.84 ns<\/td>\n<td align=\"right\">0.03<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36444\">dotnet\/runtime#36444<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32713\">dotnet\/runtime#32713<\/a> made it faster to compare <code>Uri<\/code>s, and to perform related operations like putting them into dictionaries, especially for relative <code>Uri<\/code>s.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Uri<\/span>[] <span class=\"pl-smi\">_uris<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">1000<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Uri<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">$\"<\/span>\/some\/relative\/path?ID={<span class=\"pl-smi\">i<\/span>}<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">UriKind<\/span>.<span class=\"pl-smi\">Relative<\/span>)).<span class=\"pl-en\">ToArray<\/span>();\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">Sum<\/span>()\r\n{\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n    <span class=\"pl-k\">foreach<\/span> (<span class=\"pl-en\">Uri<\/span> <span 
class=\"pl-smi\">uri<\/span> <span class=\"pl-k\">in<\/span> <span class=\"pl-smi\">_uris<\/span>)\r\n        <span class=\"pl-smi\">sum<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">uri<\/span>.<span class=\"pl-en\">GetHashCode<\/span>();\r\n        \r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">sum<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Sum<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">330.25 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">47.64 us<\/td>\n<td align=\"right\">0.14<\/td>\n<\/tr>\n<tr>\n<td>Sum<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">18.87 us<\/td>\n<td align=\"right\">0.06<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Moving up the stack, let&#8217;s look at <code>System.Net.Sockets<\/code>.  Since the inception of .NET Core, the <a href=\"https:\/\/www.techempower.com\/benchmarks\/#section=data-r19&amp;hw=ph&amp;test=plaintext\" rel=\"nofollow\">TechEmpower benchmarks<\/a> have been used as one way of gauging progress.  Previously we focused primarily on the &#8220;Plaintext&#8221; benchmark, which has a particular set of very low-level performance characteristics, but for this release, we wanted to focus on improving two other benchmarks, &#8220;JSON Serialization&#8221; and &#8220;Fortunes&#8221; (the latter involves database access, and despite its name, the costs of the former are primarily about networking speed due to a very small JSON payload involved).  Our efforts here were primarily on Linux.  
And when I say &#8220;our&#8221;, I&#8217;m not just referring to folks that work on the .NET team itself; we had a very productive collaborative effort via a working group that spanned folks beyond the core team, including great ideas and contributions from <a href=\"https:\/\/github.com\/tmds\">@tmds<\/a> from Red Hat and <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a> from Illyriad Games.<\/p>\n<p>On Linux, the <code>Sockets<\/code> implementation is based on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Epoll\" rel=\"nofollow\">epoll<\/a>.  To achieve the huge scale demanded of many services, we can&#8217;t just dedicate a thread per <code>Socket<\/code>, which is where we&#8217;d be if blocking I\/O were employed for all operations on the <code>Socket<\/code>.  Instead, non-blocking I\/O is used, and when the operating system isn&#8217;t ready to fulfill a request (e.g. when <code>ReadAsync<\/code> is used on a <code>Socket<\/code> but there&#8217;s no data available to read, or when <code>SendAsync<\/code> is used on a <code>Socket<\/code> but there&#8217;s no space available in the kernel&#8217;s send buffer), epoll is used to notify the <code>Socket<\/code> implementation of a change in the socket&#8217;s status so that the operation can be tried again.  epoll is a way of using one thread to block efficiently waiting for changes on any number of sockets, and so the implementation maintains a dedicated thread for waiting for changes on all of the <code>Socket<\/code>s registered with that epoll.  Previously, the implementation maintained multiple epoll threads, generally a number equal to half the number of cores in the system.  
With multiple <code>Socket<\/code>s all multiplexed onto the same epoll and epoll thread, the implementation needs to be very careful not to run arbitrary work in response to a socket notification; doing so would happen on the epoll thread itself, and thus the epoll thread wouldn&#8217;t be able to process further notifications until that work completed.  Worse, if that work blocked waiting for another notification on any of the <code>Socket<\/code>s associated with that same epoll, the system would deadlock.  As such, the thread processing the epoll tried to do as little work as possible in response to a socket notification, extracting just enough information to queue the actual processing to the thread pool.<\/p>\n<p>It turns out that there was an interesting feedback loop happening between these epoll threads and the thread pool.  There was just enough overhead in queueing the work items from the epoll threads that multiple epoll threads were warranted, but multiple epoll threads resulted in some contention on that queueing, such that every additional thread added more than its fair share of overhead.  On top of that, the rate of queueing was just low enough that the thread pool would have trouble keeping all of its threads saturated in the case where a very small amount of work would happen in response to a socket operation (which is the case with the JSON serialization benchmark); this would in turn result in the thread pool spending more time sequestering and releasing threads, which made it slower, which created a feedback loop.  Long story short, less-than-ideal queueing led to slower processing and more epoll threads than truly needed.  This was rectified with two PRs, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35330\">dotnet\/runtime#35330<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35800\">dotnet\/runtime#35800<\/a>.  
#35330 changed the queueing model from the epoll threads such that rather than queueing one work item per event (when the epoll wakes up in response to a notification, there may actually be multiple notifications across all of the sockets registered with it, and it will provide all of those notifications in a batch), it would queue one work item for the whole batch.  The pool thread processing it then employs a model very much like how <code>Parallel.For\/ForEach<\/code> have worked for years, which is that the queued work item can reserve a single item for itself and then queue a replica of itself to help process the remainder.  This changes the calculus such that, on most reasonably sized machines, it actually becomes beneficial to have fewer epoll threads rather than more (and, not coincidentally, we want there to be fewer), so #35800 then changes the number of epoll threads used such that there typically ends up just being one (on machines with much larger core counts, there may still be more).  We also made the epoll count configurable via the <code>DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT<\/code> environment variable, which can be set to the desired count in order to override the system&#8217;s defaults if a developer wants to experiment with other counts and provide feedback on their results for their given workload.<\/p>\n<p>In <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37974\">dotnet\/runtime#37974<\/a> from <a href=\"https:\/\/github.com\/tmds\">@tmds<\/a> we&#8217;ve also added an experimental mode (triggered by setting the <code>DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS<\/code> environment variable to <code>1<\/code> on Linux) where we avoid queueing work to the thread pool at all, and instead just run all socket continuations (e.g. the <code>Work()<\/code> in <code>await socket.ReadAsync(); Work();<\/code>) on the epoll threads.  <em>Hic sunt dracones<\/em>!  
If a socket continuation stalls, no other work associated with that epoll thread will be processed.  Worse, if that continuation actually synchronously blocks waiting for other work associated with that epoll, the system will deadlock.  However, it&#8217;s possible a well-crafted program could achieve better performance in this mode, as the locality of processing could be better and the overhead of queueing to the thread pool could be avoided.  Since all sockets work is then run on the epoll threads, it no longer makes sense to default to one; instead it defaults to a number of threads equal to the number of processors.  Again, this is an experiment, and we&#8217;d welcome feedback on any positive or negative results you see.<\/p>\n<p>There were some other impactful changes as well.  In <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36371\">dotnet\/runtime#36371<\/a>, <a href=\"https:\/\/github.com\/tmds\">@tmds<\/a> changed some of the syscalls used for send and receive operations.  In the name of simplicity, the original implementation used the <code>sendmsg<\/code> and <code>recvmsg<\/code> syscalls for sending and receiving on sockets, regardless of how many buffers of data were being provided (these operations support vectored I\/O, where multiple buffers rather than just one can be passed to each method).  It turns out that there&#8217;s measurable overhead in doing so when there&#8217;s just one buffer, and #36371 was able to reduce the overhead of typical <code>SendAsync<\/code> and <code>ReceiveAsync<\/code> operations by preferring to use the <code>send<\/code> and <code>recv<\/code> syscalls when appropriate. In <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36705\">dotnet\/runtime#36705<\/a> <a href=\"https:\/\/github.com\/tmds\">@tmds<\/a> also changed how requests for socket operations are handled to use a lock-free rather than lock-based approach, in order to reduce some overheads.  
And in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36997\">dotnet\/runtime#36997<\/a>, <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a> removed some interface casts that were showing up as measurable overhead in the sockets implementation.<\/p>\n<p>These improvements are all focused on sockets performance on Linux at scale, making them difficult to demonstrate in a microbenchmark on a single machine.  There are other improvements, however, that are easier to see.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32271\">dotnet\/runtime#32271<\/a> removed several allocations from <code>Socket.Connect<\/code>, <code>Socket.Bind<\/code>, and a few other operations, where unnecessary copies were being made of some state in support of old Code Access Security (CAS) checks that are no longer relevant: the CAS checks were removed long ago, but the clones remained, so this just cleans those up, too. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32275\">dotnet\/runtime#32275<\/a> also removed an allocation from the Windows implementation of <code>SafeSocketHandle<\/code>. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/787\">dotnet\/runtime#787<\/a> refactored <code>Socket.ConnectAsync<\/code> so that it could share the same internal <code>SocketAsyncEventArgs<\/code> instance that ends up being used subsequently to perform <code>ReceiveAsync<\/code> operations, thereby avoiding extra allocations for the connect. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34175\">dotnet\/runtime#34175<\/a> utilizes the new Pinned Object Heap introduced in .NET 5 to use pre-pinned buffers in various portions of the <code>SocketAsyncEventArgs<\/code> implementation on Windows instead of having to use a <code>GCHandle<\/code> to pin (the corresponding functionality on Linux doesn&#8217;t require pinning, so it&#8217;s not used there).  
And in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37583\">dotnet\/runtime#37583<\/a>, <a href=\"https:\/\/github.com\/tmds\">@tmds<\/a> reduced allocations as part of the vectored I\/O <code>SendAsync<\/code>\/<code>ReceiveAsync<\/code> implementations on Unix by employing stack allocation where appropriate.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Socket<\/span> <span class=\"pl-smi\">_listener<\/span>, <span class=\"pl-smi\">_client<\/span>, <span class=\"pl-smi\">_server<\/span>;\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">_buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[<span class=\"pl-c1\">8<\/span>];\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">ArraySegment<\/span>&lt;<span class=\"pl-k\">byte<\/span>&gt;&gt; <span class=\"pl-smi\">_buffers<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">List<\/span>&lt;<span class=\"pl-en\">ArraySegment<\/span>&lt;<span class=\"pl-k\">byte<\/span>&gt;&gt;();\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_listener<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n    <span class=\"pl-smi\">_listener<\/span>.<span class=\"pl-en\">Bind<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">IPEndPoint<\/span>(<span class=\"pl-smi\">IPAddress<\/span>.<span 
class=\"pl-smi\">Loopback<\/span>, <span class=\"pl-c1\">0<\/span>));\r\n    <span class=\"pl-smi\">_listener<\/span>.<span class=\"pl-en\">Listen<\/span>(<span class=\"pl-c1\">1<\/span>);\r\n\r\n    <span class=\"pl-smi\">_client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n    <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">Connect<\/span>(<span class=\"pl-smi\">_listener<\/span>.<span class=\"pl-smi\">LocalEndPoint<\/span>);\r\n\r\n    <span class=\"pl-smi\">_server<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_listener<\/span>.<span class=\"pl-en\">Accept<\/span>();\r\n\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">_buffer<\/span>.<span class=\"pl-smi\">Length<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">_buffers<\/span>.<span class=\"pl-en\">Add<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">ArraySegment<\/span>&lt;<span class=\"pl-k\">byte<\/span>&gt;(<span class=\"pl-smi\">_buffer<\/span>, <span class=\"pl-smi\">i<\/span>, <span class=\"pl-c1\">1<\/span>));\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">SendReceive<\/span>()\r\n{\r\n    <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">SendAsync<\/span>(<span class=\"pl-smi\">_buffers<\/span>, 
<span class=\"pl-smi\">SocketFlags<\/span>.<span class=\"pl-smi\">None<\/span>);\r\n    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">total<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n    <span class=\"pl-k\">while<\/span> (<span class=\"pl-smi\">total<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-smi\">_buffer<\/span>.<span class=\"pl-smi\">Length<\/span>)\r\n        <span class=\"pl-smi\">total<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_server<\/span>.<span class=\"pl-en\">ReceiveAsync<\/span>(<span class=\"pl-smi\">_buffers<\/span>, <span class=\"pl-smi\">SocketFlags<\/span>.<span class=\"pl-smi\">None<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SendReceive<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">5.924 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">624 B<\/td>\n<\/tr>\n<tr>\n<td>SendReceive<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">5.230 us<\/td>\n<td align=\"right\">0.88<\/td>\n<td align=\"right\">144 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>On top of that, we come to <code>System.Net.Http<\/code>.  A bunch of improvements were made to <code>SocketsHttpHandler<\/code>, in two areas in particular.  The first is the processing of headers, which represents a significant portion of allocations and processing associated with the type. 
<a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41640\">dotnet\/corefx#41640<\/a> kicked things off by making <code>HttpHeaders.TryAddWithoutValidation<\/code> true to its name: due to how <code>SocketsHttpHandler<\/code> was enumerating request headers to write them to the wire, it ended up performing the validation on the headers even though the developer specified &#8220;WithoutValidation&#8221;, and the PR fixed that.  Multiple PRs, including <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35003\">dotnet\/runtime#35003<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34922\">dotnet\/runtime#34922<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32989\">dotnet\/runtime#32989<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34974\">dotnet\/runtime#34974<\/a> improved lookups in <code>SocketsHttpHandler<\/code>&#8217;s list of known headers (which helps avoid allocations when such headers are present) and augmented that list to be more comprehensive. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34902\">dotnet\/runtime#34902<\/a> updated the internal collection type used in various strongly-typed header collections to incur less allocation, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34724\">dotnet\/runtime#34724<\/a> made some of the allocations associated with headers pay-for-play only when they&#8217;re actually accessed (and also special-cased Date and Server response headers to avoid allocations for them in the most common cases).  
The net result is a small improvement to throughput but a significant improvement to allocation:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-en\">Socket<\/span> <span class=\"pl-smi\">s_listener<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">readonly<\/span> <span class=\"pl-en\">HttpClient<\/span> <span class=\"pl-smi\">s_client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpClient<\/span>();\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-en\">Uri<\/span> <span class=\"pl-smi\">s_uri<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">HttpGet<\/span>()\r\n{\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">m<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpRequestMessage<\/span>(<span class=\"pl-smi\">HttpMethod<\/span>.<span class=\"pl-smi\">Get<\/span>, <span class=\"pl-smi\">s_uri<\/span>);\r\n    <span class=\"pl-smi\">m<\/span>.<span class=\"pl-smi\">Headers<\/span>.<span class=\"pl-en\">TryAddWithoutValidation<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Authorization<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>ANYTHING SOMEKEY<span 
class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-smi\">m<\/span>.<span class=\"pl-smi\">Headers<\/span>.<span class=\"pl-en\">TryAddWithoutValidation<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Referer<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>http:\/\/someuri.com<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-smi\">m<\/span>.<span class=\"pl-smi\">Headers<\/span>.<span class=\"pl-en\">TryAddWithoutValidation<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>User-Agent<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/77.0.3865.90 Safari\/537.36<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-smi\">m<\/span>.<span class=\"pl-smi\">Headers<\/span>.<span class=\"pl-en\">TryAddWithoutValidation<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Host<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>www.somehost.com<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-k\">using<\/span> (<span class=\"pl-en\">HttpResponseMessage<\/span> <span class=\"pl-smi\">r<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">s_client<\/span>.<span class=\"pl-en\">SendAsync<\/span>(<span class=\"pl-smi\">m<\/span>, <span class=\"pl-smi\">HttpCompletionOption<\/span>.<span class=\"pl-smi\">ResponseHeadersRead<\/span>))\r\n    <span class=\"pl-k\">using<\/span> (<span class=\"pl-en\">Stream<\/span> <span class=\"pl-smi\">s<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">r<\/span>.<span class=\"pl-smi\">Content<\/span>.<span class=\"pl-en\">ReadAsStreamAsync<\/span>())\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">s<\/span>.<span 
class=\"pl-en\">CopyToAsync<\/span>(<span class=\"pl-smi\">Stream<\/span>.<span class=\"pl-smi\">Null<\/span>);\r\n}\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">CreateSocketServer<\/span>()\r\n{\r\n    <span class=\"pl-smi\">s_listener<\/span>.<span class=\"pl-en\">Bind<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">IPEndPoint<\/span>(<span class=\"pl-smi\">IPAddress<\/span>.<span class=\"pl-smi\">Loopback<\/span>, <span class=\"pl-c1\">0<\/span>));\r\n    <span class=\"pl-smi\">s_listener<\/span>.<span class=\"pl-en\">Listen<\/span>(<span class=\"pl-smi\">int<\/span>.<span class=\"pl-smi\">MaxValue<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">ep<\/span> <span class=\"pl-k\">=<\/span> (<span class=\"pl-en\">IPEndPoint<\/span>)<span class=\"pl-smi\">s_listener<\/span>.<span class=\"pl-smi\">LocalEndPoint<\/span>;\r\n    <span class=\"pl-smi\">s_uri<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Uri<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">$\"<\/span>http:\/\/{<span class=\"pl-smi\">ep<\/span>.<span class=\"pl-smi\">Address<\/span>}:{<span class=\"pl-smi\">ep<\/span>.<span class=\"pl-smi\">Port<\/span>}\/<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">response<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Encoding<\/span>.<span class=\"pl-smi\">UTF8<\/span>.<span class=\"pl-en\">GetBytes<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>HTTP\/1.1 200 OK<span class=\"pl-cce\">\\r\\n<\/span>Date: Sun, 05 Jul 2020 12:00:00 GMT <span class=\"pl-cce\">\\r\\n<\/span>Server: Example<span class=\"pl-cce\">\\r\\n<\/span>Content-Length: 5<span class=\"pl-cce\">\\r\\n\\r\\n<\/span>Hello<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-k\">byte<\/span>[] <span 
class=\"pl-smi\">endSequence<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[] { (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'<span class=\"pl-cce\">\\r<\/span>'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'<span class=\"pl-cce\">\\n<\/span>'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'<span class=\"pl-cce\">\\r<\/span>'<\/span>, (<span class=\"pl-smi\">byte<\/span>)<span class=\"pl-s\">'<span class=\"pl-cce\">\\n<\/span>'<\/span> };\r\n\r\n    <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">Run<\/span>(<span class=\"pl-k\">async<\/span> () <span class=\"pl-k\">=&gt;<\/span>\r\n    {\r\n        <span class=\"pl-k\">while<\/span> (<span class=\"pl-c1\">true<\/span>)\r\n        {\r\n            <span class=\"pl-en\">Socket<\/span> <span class=\"pl-smi\">s<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">s_listener<\/span>.<span class=\"pl-en\">AcceptAsync<\/span>();\r\n            <span class=\"pl-c1\">_<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">Run<\/span>(() <span class=\"pl-k\">=&gt;<\/span>\r\n            {\r\n                <span class=\"pl-k\">using<\/span> (<span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">ns<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NetworkStream<\/span>(<span class=\"pl-smi\">s<\/span>, <span class=\"pl-c1\">true<\/span>))\r\n                {\r\n                    <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[<span class=\"pl-c1\">1024<\/span>];\r\n                    <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">totalRead<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n                    
<span class=\"pl-k\">while<\/span> (<span class=\"pl-c1\">true<\/span>)\r\n                    {\r\n                        <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">read<\/span> <span class=\"pl-k\">=<\/span>  <span class=\"pl-smi\">ns<\/span>.<span class=\"pl-en\">Read<\/span>(<span class=\"pl-smi\">buffer<\/span>, <span class=\"pl-smi\">totalRead<\/span>, <span class=\"pl-smi\">buffer<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">-<\/span> <span class=\"pl-smi\">totalRead<\/span>);\r\n                        <span class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">read<\/span> <span class=\"pl-k\">==<\/span> <span class=\"pl-c1\">0<\/span>) <span class=\"pl-k\">return<\/span>;\r\n                        <span class=\"pl-smi\">totalRead<\/span> <span class=\"pl-k\">+=<\/span> <span class=\"pl-smi\">read<\/span>;\r\n                        <span class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">buffer<\/span>.<span class=\"pl-en\">AsSpan<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-smi\">totalRead<\/span>).<span class=\"pl-en\">IndexOf<\/span>(<span class=\"pl-smi\">endSequence<\/span>) <span class=\"pl-k\">==<\/span> <span class=\"pl-k\">-<\/span><span class=\"pl-c1\">1<\/span>)\r\n                        {\r\n                            <span class=\"pl-k\">if<\/span> (<span class=\"pl-smi\">totalRead<\/span> <span class=\"pl-k\">==<\/span> <span class=\"pl-smi\">buffer<\/span>.<span class=\"pl-smi\">Length<\/span>) <span class=\"pl-smi\">Array<\/span>.<span class=\"pl-en\">Resize<\/span>(<span class=\"pl-k\">ref<\/span> <span class=\"pl-smi\">buffer<\/span>, <span class=\"pl-smi\">buffer<\/span>.<span class=\"pl-smi\">Length<\/span> <span class=\"pl-k\">*<\/span> <span class=\"pl-c1\">2<\/span>);\r\n                            <span class=\"pl-k\">continue<\/span>;\r\n                        }\r\n\r\n                        <span class=\"pl-smi\">ns<\/span>.<span class=\"pl-en\">Write<\/span>(<span 
class=\"pl-smi\">response<\/span>, <span class=\"pl-c1\">0<\/span>, <span class=\"pl-smi\">response<\/span>.<span class=\"pl-smi\">Length<\/span>);\r\n\r\n                        <span class=\"pl-smi\">totalRead<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n                    }\r\n                }\r\n            });\r\n        }\r\n    });\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>HttpGet<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">123.67 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">98.48 KB<\/td>\n<\/tr>\n<tr>\n<td>HttpGet<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">68.57 us<\/td>\n<td align=\"right\">0.55<\/td>\n<td align=\"right\">6.07 KB<\/td>\n<\/tr>\n<tr>\n<td>HttpGet<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">66.80 us<\/td>\n<td align=\"right\">0.54<\/td>\n<td align=\"right\">2.86 KB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Some other header-related PRs were more specialized.  For example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34860\">dotnet\/runtime#34860<\/a> improved parsing of the Date header just by being more thoughtful about the approach.  The previous implementation was using <code>DateTime.TryParseExact<\/code> with a long list of viable formats; that knocks the implementation off its fast path and causes it to be much slower to parse even when the input matches the first format in the list.  And in the case of Date headers today, the vast majority of headers will follow the format outlined in <a href=\"https:\/\/tools.ietf.org\/html\/rfc1123\" rel=\"nofollow\">RFC 1123<\/a>, aka &#8220;r&#8221;.  
Thanks to improvements in previous releases, <code>DateTime<\/code>&#8216;s parsing of the &#8220;r&#8221; format is very fast, so we can just try that one directly first with the <code>TryParseExact<\/code> for a single format, and only if it fails fall back to the <code>TryParseExact<\/code> with the remainder.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">DateTimeOffset<\/span>? <span class=\"pl-en\">DatePreferred<\/span>()\r\n{\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">m<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpResponseMessage<\/span>();\r\n    <span class=\"pl-smi\">m<\/span>.<span class=\"pl-smi\">Headers<\/span>.<span class=\"pl-en\">TryAddWithoutValidation<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Date<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Sun, 06 Nov 1994 08:49:37 GMT<span class=\"pl-pds\">\"<\/span><\/span>);\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">m<\/span>.<span class=\"pl-smi\">Headers<\/span>.<span class=\"pl-smi\">Date<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DatePreferred<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">2,177.9 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">674 B<\/td>\n<\/tr>\n<tr>\n<td>DatePreferred<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">1,510.8 ns<\/td>\n<td align=\"right\">0.69<\/td>\n<td align=\"right\">544 B<\/td>\n<\/tr>\n<tr>\n<td>DatePreferred<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">267.2 ns<\/td>\n<td align=\"right\">0.12<\/td>\n<td align=\"right\">520 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The biggest 
improvements, however, came for HTTP\/2 in general.  In .NET Core 3.1, the HTTP\/2 implementation was functional, but not particularly tuned, and so some effort for .NET 5 went into making the HTTP\/2 implementation better, and in particular more scalable. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32406\">dotnet\/runtime#32406<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32624\">dotnet\/runtime#32624<\/a> significantly reduced allocations involved in HTTP\/2 GET requests by employing a custom <code>CopyToAsync<\/code> override on the response stream used for HTTP\/2 responses, by being more careful around how request headers are accessed as part of writing out the request (in order to avoid forcing lazily-initialized state into existence when it&#8217;s not necessary), and removing async-related allocations.  And <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32557\">dotnet\/runtime#32557<\/a> reduced allocations in HTTP\/2 POST requests by being better about how cancellation was handled and reducing allocation associated with async operations there, too. On top of those, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35694\">dotnet\/runtime#35694<\/a> included a bunch of HTTP\/2-related changes, including reducing the number of locks involved (HTTP\/2 involves more synchronization in the C# implementation than HTTP\/1.1, because in HTTP\/2 multiple requests are multiplexed onto the same socket connection), reducing the amount of work done while holding locks, in one key case changing the kind of locking mechanism used, adding more headers to the known headers optimization, and a few other tweaks to reduce overheads.  As a follow-up, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36246\">dotnet\/runtime#36246<\/a> removed some allocations due to cancellation and trailing headers (which are common in gRPC traffic).  
To demo this, I created a simple ASP.NET Core localhost server (using the Empty template and removing a small amount of code not needed for this example):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">using<\/span> <span class=\"pl-en\">Microsoft<\/span>.<span class=\"pl-en\">AspNetCore<\/span>.<span class=\"pl-en\">Builder<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">Microsoft<\/span>.<span class=\"pl-en\">AspNetCore<\/span>.<span class=\"pl-en\">Hosting<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">Microsoft<\/span>.<span class=\"pl-en\">AspNetCore<\/span>.<span class=\"pl-en\">Http<\/span>;\r\n<span class=\"pl-k\">using<\/span> <span class=\"pl-en\">Microsoft<\/span>.<span class=\"pl-en\">Extensions<\/span>.<span class=\"pl-en\">Hosting<\/span>;\r\n\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Program<\/span>\r\n{\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">static<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Main<\/span>(<span class=\"pl-k\">string<\/span>[] <span class=\"pl-smi\">args<\/span>) <span class=\"pl-k\">=&gt;<\/span>\r\n        <span class=\"pl-smi\">Host<\/span>.<span class=\"pl-en\">CreateDefaultBuilder<\/span>(<span class=\"pl-smi\">args<\/span>).<span class=\"pl-en\">ConfigureWebHostDefaults<\/span>(<span class=\"pl-smi\">b<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">b<\/span>.<span class=\"pl-en\">UseStartup<\/span>&lt;<span class=\"pl-en\">Startup<\/span>&gt;()).<span class=\"pl-en\">Build<\/span>().<span class=\"pl-en\">Run<\/span>();\r\n}\r\n\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">class<\/span> <span class=\"pl-en\">Startup<\/span>\r\n{\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Configure<\/span>(<span class=\"pl-en\">IApplicationBuilder<\/span> <span 
class=\"pl-smi\">app<\/span>, <span class=\"pl-en\">IWebHostEnvironment<\/span> <span class=\"pl-smi\">env<\/span>)\r\n    {\r\n        <span class=\"pl-smi\">app<\/span>.<span class=\"pl-en\">UseRouting<\/span>();\r\n        <span class=\"pl-smi\">app<\/span>.<span class=\"pl-en\">UseEndpoints<\/span>(<span class=\"pl-smi\">endpoints<\/span> <span class=\"pl-k\">=&gt;<\/span>\r\n        {\r\n            <span class=\"pl-smi\">endpoints<\/span>.<span class=\"pl-en\">MapGet<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>\/<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">context<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">context<\/span>.<span class=\"pl-smi\">Response<\/span>.<span class=\"pl-en\">WriteAsync<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Hello<span class=\"pl-pds\">\"<\/span><\/span>));\r\n            <span class=\"pl-smi\">endpoints<\/span>.<span class=\"pl-en\">MapPost<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>\/<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">context<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">context<\/span>.<span class=\"pl-smi\">Response<\/span>.<span class=\"pl-en\">WriteAsync<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Hello<span class=\"pl-pds\">\"<\/span><\/span>));\r\n        });\r\n    }\r\n}<\/pre>\n<\/div>\n<p>Then I used this client benchmark:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">HttpMessageInvoker<\/span> <span class=\"pl-smi\">_client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpMessageInvoker<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">SocketsHttpHandler<\/span>() { <span class=\"pl-smi\">UseCookies<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">false<\/span>, <span class=\"pl-smi\">UseProxy<\/span> <span 
class=\"pl-k\">=<\/span> <span class=\"pl-c1\">false<\/span>, <span class=\"pl-smi\">AllowAutoRedirect<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">false<\/span> });\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">HttpRequestMessage<\/span> <span class=\"pl-smi\">_get<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpRequestMessage<\/span>(<span class=\"pl-smi\">HttpMethod<\/span>.<span class=\"pl-smi\">Get<\/span>, <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Uri<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>https:\/\/localhost:5001\/<span class=\"pl-pds\">\"<\/span><\/span>)) { <span class=\"pl-smi\">Version<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">HttpVersion<\/span>.<span class=\"pl-smi\">Version20<\/span> };\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">HttpRequestMessage<\/span> <span class=\"pl-smi\">_post<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">HttpRequestMessage<\/span>(<span class=\"pl-smi\">HttpMethod<\/span>.<span class=\"pl-smi\">Post<\/span>, <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Uri<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>https:\/\/localhost:5001\/<span class=\"pl-pds\">\"<\/span><\/span>)) { <span class=\"pl-smi\">Version<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">HttpVersion<\/span>.<span class=\"pl-smi\">Version20<\/span>, <span class=\"pl-smi\">Content<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">ByteArrayContent<\/span>(<span class=\"pl-smi\">Encoding<\/span>.<span class=\"pl-smi\">UTF8<\/span>.<span class=\"pl-en\">GetBytes<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Hello<span class=\"pl-pds\">\"<\/span><\/span>)) };\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span 
class=\"pl-en\">Task<\/span> <span class=\"pl-en\">Get<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-en\">MakeRequest<\/span>(<span class=\"pl-smi\">_get<\/span>);\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">Post<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-en\">MakeRequest<\/span>(<span class=\"pl-smi\">_post<\/span>);\r\n\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">MakeRequest<\/span>(<span class=\"pl-en\">HttpRequestMessage<\/span> <span class=\"pl-smi\">request<\/span>) <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">WhenAll<\/span>(<span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">100<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-k\">async<\/span> <span class=\"pl-smi\">_<\/span> <span class=\"pl-k\">=&gt;<\/span>\r\n{\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">500<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    {\r\n        <span class=\"pl-k\">using<\/span> (<span class=\"pl-en\">HttpResponseMessage<\/span> <span class=\"pl-smi\">r<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">SendAsync<\/span>(<span class=\"pl-smi\">request<\/span>, <span class=\"pl-smi\">default<\/span>))\r\n        <span class=\"pl-k\">using<\/span> (<span class=\"pl-en\">Stream<\/span> <span class=\"pl-smi\">s<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">r<\/span>.<span 
class=\"pl-smi\">Content<\/span>.<span class=\"pl-en\">ReadAsStreamAsync<\/span>())\r\n            <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">s<\/span>.<span class=\"pl-en\">CopyToAsync<\/span>(<span class=\"pl-smi\">Stream<\/span>.<span class=\"pl-smi\">Null<\/span>);\r\n    }\r\n}));<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Get<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">1,267.4 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">122.76 MB<\/td>\n<\/tr>\n<tr>\n<td>Get<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">681.7 ms<\/td>\n<td align=\"right\">0.54<\/td>\n<td align=\"right\">74.01 MB<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>Post<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">1,464.7 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">280.51 MB<\/td>\n<\/tr>\n<tr>\n<td>Post<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">735.6 ms<\/td>\n<td align=\"right\">0.50<\/td>\n<td align=\"right\">132.52 MB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Note, too, that there&#8217;s still work being done in this area for .NET 5. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/39166\">dotnet\/runtime#38774<\/a> changes how writes are handled in the HTTP\/2 implementation and is expected to bring substantial scalability gains over the improvements that have already gone in, in particular for gRPC-based workloads.<\/p>\n<p>There were notable improvements to other networking components as well.  For example, the <code>XxAsync<\/code> APIs on the <code>Dns<\/code> type had been implemented on top of the corresponding <code>Begin\/EndXx<\/code> methods.  
For .NET 5 in <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41061\">dotnet\/corefx#41061<\/a>, that was inverted, such that the <code>Begin\/EndXx<\/code> methods were implemented on top of the <code>XxAsync<\/code> ones; that made the code simpler and a bit faster, while also reducing allocation (note that the .NET Framework 4.8 result is slightly faster because it&#8217;s not actually using async I\/O, but rather just queuing a work item to the <code>ThreadPool<\/code> that performs synchronous I\/O; that results in a bit less overhead but also less scalability):<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">_hostname<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Dns<\/span>.<span class=\"pl-en\">GetHostName<\/span>();\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>] <span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Task<\/span>&lt;<span class=\"pl-en\">IPAddress<\/span>[]&gt; <span class=\"pl-en\">Lookup<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Dns<\/span>.<span class=\"pl-en\">GetHostAddressesAsync<\/span>(<span class=\"pl-smi\">_hostname<\/span>);<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Lookup<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">178.6 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">4146 B<\/td>\n<\/tr>\n<tr>\n<td>Lookup<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">211.5 us<\/td>\n<td align=\"right\">1.18<\/td>\n<td align=\"right\">1664 B<\/td>\n<\/tr>\n<tr>\n<td>Lookup<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">209.7 us<\/td>\n<td align=\"right\">1.17<\/td>\n<td align=\"right\">984 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And while it&#8217;s a lesser-used type (though 
it is used by WCF), <code>NegotiateStream<\/code> was also similarly updated in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36583\">dotnet\/runtime#36583<\/a>, with all of its <code>XxAsync<\/code> methods re-implemented to use <code>async<\/code>\/<code>await<\/code>, and then in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37772\">dotnet\/runtime#37772<\/a> to reuse buffers rather than create new ones for each operation.  The net result is significantly less allocation in typical read\/write usage:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">_buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[<span class=\"pl-c1\">1<\/span>];\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">NegotiateStream<\/span> <span class=\"pl-smi\">_client<\/span>, <span class=\"pl-smi\">_server<\/span>;\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-smi\">using<\/span> <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">listener<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n    <span class=\"pl-smi\">listener<\/span>.<span class=\"pl-en\">Bind<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">IPEndPoint<\/span>(<span class=\"pl-smi\">IPAddress<\/span>.<span class=\"pl-smi\">Loopback<\/span>, <span class=\"pl-c1\">0<\/span>));\r\n    <span class=\"pl-smi\">listener<\/span>.<span class=\"pl-en\">Listen<\/span>(<span 
class=\"pl-c1\">1<\/span>);\r\n\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n    <span class=\"pl-smi\">client<\/span>.<span class=\"pl-en\">Connect<\/span>(<span class=\"pl-smi\">listener<\/span>.<span class=\"pl-smi\">LocalEndPoint<\/span>);\r\n\r\n    <span class=\"pl-en\">Socket<\/span> <span class=\"pl-smi\">server<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">listener<\/span>.<span class=\"pl-en\">Accept<\/span>();\r\n\r\n    <span class=\"pl-smi\">_client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NegotiateStream<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NetworkStream<\/span>(<span class=\"pl-smi\">client<\/span>, <span class=\"pl-c1\">true<\/span>));\r\n    <span class=\"pl-smi\">_server<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NegotiateStream<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NetworkStream<\/span>(<span class=\"pl-smi\">server<\/span>, <span class=\"pl-c1\">true<\/span>));\r\n\r\n    <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">WaitAll<\/span>(\r\n        <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">AuthenticateAsClientAsync<\/span>(),\r\n        <span class=\"pl-smi\">_server<\/span>.<span class=\"pl-en\">AuthenticateAsServerAsync<\/span>());\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">WriteRead<\/span>()\r\n{\r\n    <span 
class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">100<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    {\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">WriteAsync<\/span>(<span class=\"pl-smi\">_buffer<\/span>);\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_server<\/span>.<span class=\"pl-en\">ReadAsync<\/span>(<span class=\"pl-smi\">_buffer<\/span>);\r\n    }\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">ReadWrite<\/span>()\r\n{\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">100<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    {\r\n        <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">r<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_server<\/span>.<span class=\"pl-en\">ReadAsync<\/span>(<span class=\"pl-smi\">_buffer<\/span>);\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">WriteAsync<\/span>(<span class=\"pl-smi\">_buffer<\/span>);\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">r<\/span>;\r\n    }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>WriteRead<\/td>\n<td>.NET Core 3.1<\/td>\n<td 
align=\"right\">1.510 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">61600 B<\/td>\n<\/tr>\n<tr>\n<td>WriteRead<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">1.294 ms<\/td>\n<td align=\"right\">0.86<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>ReadWrite<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">3.502 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">76224 B<\/td>\n<\/tr>\n<tr>\n<td>ReadWrite<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">3.301 ms<\/td>\n<td align=\"right\">0.94<\/td>\n<td align=\"right\">226 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a id=\"user-content-json\" class=\"anchor\" aria-hidden=\"true\" href=\"#json\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"json\"><\/a>JSON<\/h2>\n<p>There were significant improvements made to the <code>System.Text.Json<\/code> library for .NET 5, and in particular for <code>JsonSerializer<\/code>, but many of those improvements were actually ported back to <code>.NET Core 3.1<\/code> and released as part of servicing fixes (see <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41771\">dotnet\/corefx#41771<\/a>). 
Even so, there are some nice improvements that show up in .NET 5 beyond those.<\/p>\n<p><a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/2259\">dotnet\/runtime#2259<\/a> refactored the model for how converters in the <code>JsonSerializer<\/code> handle collections, resulting in measurable improvements, in particular for larger collections:<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">MemoryStream<\/span> <span class=\"pl-smi\">_stream<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">MemoryStream<\/span>();\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">DateTime<\/span>[] <span class=\"pl-smi\">_array<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Enumerable<\/span>.<span class=\"pl-en\">Range<\/span>(<span class=\"pl-c1\">0<\/span>, <span class=\"pl-c1\">1000<\/span>).<span class=\"pl-en\">Select<\/span>(<span class=\"pl-smi\">_<\/span> <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">DateTime<\/span>.<span class=\"pl-smi\">UtcNow<\/span>).<span class=\"pl-en\">ToArray<\/span>();\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">LargeArray<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_stream<\/span>.<span class=\"pl-smi\">Position<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">JsonSerializer<\/span>.<span class=\"pl-en\">SerializeAsync<\/span>(<span class=\"pl-smi\">_stream<\/span>, <span class=\"pl-smi\">_array<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LargeArray<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">262.06 us<\/td>\n<td 
align=\"right\">1.00<\/td>\n<td align=\"right\">24256 B<\/td>\n<\/tr>\n<tr>\n<td>LargeArray<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">191.34 us<\/td>\n<td align=\"right\">0.73<\/td>\n<td align=\"right\">24184 B<\/td>\n<\/tr>\n<tr>\n<td>LargeArray<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">69.40 us<\/td>\n<td align=\"right\">0.26<\/td>\n<td align=\"right\">152 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>but even for smaller ones, e.g.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">MemoryStream<\/span> <span class=\"pl-smi\">_stream<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">MemoryStream<\/span>();\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">JsonSerializerOptions<\/span> <span class=\"pl-smi\">_options<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">JsonSerializerOptions<\/span>();\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Dictionary<\/span>&lt;<span class=\"pl-k\">string<\/span>, <span class=\"pl-k\">int<\/span>&gt; <span class=\"pl-smi\">_instance<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Dictionary<\/span>&lt;<span class=\"pl-k\">string<\/span>, <span class=\"pl-k\">int<\/span>&gt;()\r\n{\r\n    { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>One<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">1<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Two<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">2<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Three<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">3<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Four<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">4<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Five<span 
class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">5<\/span> },\r\n    { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Six<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">6<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Seven<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">7<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Eight<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">8<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Nine<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">9<\/span> }, { <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Ten<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-c1\">10<\/span> },\r\n};\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">Dictionary<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_stream<\/span>.<span class=\"pl-smi\">Position<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n    <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">JsonSerializer<\/span>.<span class=\"pl-en\">SerializeAsync<\/span>(<span class=\"pl-smi\">_stream<\/span>, <span class=\"pl-smi\">_instance<\/span>, <span class=\"pl-smi\">_options<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Dictionary<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">2,141.7 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">209 B<\/td>\n<\/tr>\n<tr>\n<td>Dictionary<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">1,376.6 ns<\/td>\n<td align=\"right\">0.64<\/td>\n<td align=\"right\">208 B<\/td>\n<\/tr>\n<tr>\n<td>Dictionary<\/td>\n<td>.NET 5.0<\/td>\n<td 
align=\"right\">726.1 ns<\/td>\n<td align=\"right\">0.34<\/td>\n<td align=\"right\">152 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37976\">dotnet\/runtime#37976<\/a> also helped improve the performance of small types by adding a layer of caching to help retrieve the metadata used internally for the type being serialized and deserialized.<\/p>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">MemoryStream<\/span> <span class=\"pl-smi\">_stream<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">MemoryStream<\/span>();\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-en\">MyAwesomeType<\/span> <span class=\"pl-smi\">_instance<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">MyAwesomeType<\/span>() { <span class=\"pl-smi\">SomeString<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>Hello<span class=\"pl-pds\">\"<\/span><\/span>, <span class=\"pl-smi\">SomeInt<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">42<\/span>, <span class=\"pl-smi\">SomeByte<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">1<\/span>, <span class=\"pl-smi\">SomeDouble<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">1.234<\/span> };\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">SimpleType<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_stream<\/span>.<span class=\"pl-smi\">Position<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>;\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">JsonSerializer<\/span>.<span class=\"pl-en\">SerializeAsync<\/span>(<span class=\"pl-smi\">_stream<\/span>, <span class=\"pl-smi\">_instance<\/span>);\r\n}\r\n\r\n<span class=\"pl-k\">public<\/span> 
<span class=\"pl-k\">struct<\/span> <span class=\"pl-en\">MyAwesomeType<\/span>\r\n{\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">string<\/span> <span class=\"pl-smi\">SomeString<\/span> { <span class=\"pl-k\">get<\/span>; <span class=\"pl-k\">set<\/span>; }\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">SomeInt<\/span> { <span class=\"pl-k\">get<\/span>; <span class=\"pl-k\">set<\/span>; }\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">double<\/span> <span class=\"pl-smi\">SomeDouble<\/span> { <span class=\"pl-k\">get<\/span>; <span class=\"pl-k\">set<\/span>; }\r\n    <span class=\"pl-k\">public<\/span> <span class=\"pl-k\">byte<\/span> <span class=\"pl-smi\">SomeByte<\/span> { <span class=\"pl-k\">get<\/span>; <span class=\"pl-k\">set<\/span>; }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SimpleType<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">1,204.3 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">265 B<\/td>\n<\/tr>\n<tr>\n<td>SimpleType<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">617.2 ns<\/td>\n<td align=\"right\">0.51<\/td>\n<td align=\"right\">192 B<\/td>\n<\/tr>\n<tr>\n<td>SimpleType<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">504.2 ns<\/td>\n<td align=\"right\">0.42<\/td>\n<td align=\"right\">192 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a id=\"user-content-trimming\" class=\"anchor\" aria-hidden=\"true\" href=\"#trimming\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 
010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"trimming\"><\/a>Trimming<\/h2>\n<p>Up until .NET Core 3.0, .NET Core was focused primarily on server workloads, with ASP.NET Core being the preeminent application model on the platform.  With .NET Core 3.0, Windows Forms and Windows Presentation Foundation (WPF) were added, bringing .NET Core to desktop applications.  With .NET Core 3.2, Blazor support for browser applications was released, but based on mono and the libraries from the mono stack.  With .NET 5, Blazor uses the .NET 5 mono runtime and all of the same .NET 5 libraries shared by every other app model.  This brings an important twist to performance: size.  While code size has always been an important issue (and is important for .NET Native applications), the scale required for a successful browser-based deployment really brings it to the forefront, as we need to be concerned about download size in a way we haven&#8217;t focused on with .NET Core in the past.<\/p>\n<p>To assist with application size, the .NET SDK includes a <a href=\"https:\/\/github.com\/mono\/linker\">linker<\/a> that&#8217;s capable of trimming away unused portions of the app, not only at the assembly level, but also at the member level, doing static analysis to determine what code is and isn&#8217;t used and throwing away the parts that aren&#8217;t.  This brings an interesting set of challenges: some coding patterns employed for convenience or simplified API consumption are difficult for the linker to analyze in a way that would allow it to throw away much of anything. As a result, one of the big performance-related efforts in .NET 5 is around improving the trimmability of the libraries.<\/p>\n<p>There are two facets to this:<\/p>\n<ol>\n<li>Not removing too much (correctness). 
We need to make sure that the libraries can actually be trimmed safely.  In particular, reflection (even reflection only over public surface area) makes it difficult for the linker to find all members that may actually be used, e.g. code in one place in the app uses <code>typeof<\/code> to get a <code>Type<\/code> instance, and passes that to another part of the app that uses <code>GetMethod<\/code> to retrieve a <code>MethodInfo<\/code> for a public method on that type, and passes that <code>MethodInfo<\/code> to another part of the app which invokes it.  To address that, the linker employs heuristics to minimize false positives on APIs that can be removed, but to help it further, a bunch of attributes have been added in .NET 5 that enable developers to make such implicit dependencies explicit, to suppress warnings from the linker on things it might deem to be unsafe but actually aren&#8217;t, and to force warnings onto consumers to say that certain portions of the surface area simply aren&#8217;t amenable to linking.  See <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35387\">dotnet\/runtime#35387<\/a>.<\/li>\n<li>Removing as much as possible (performance). We need to minimize the reasons why pieces of code need to be kept around.  This can manifest as refactoring implementations to change calling patterns, it can manifest as using conditions the linker can recognize and use to trim out whole swaths of code, and it can manifest as using finer-grained controls over exactly what needs to be kept and why.<\/li>\n<\/ol>\n<p>There are many examples of the second, so I&#8217;ll highlight a few to showcase the various techniques employed:<\/p>\n<ul>\n<li>Removing unnecessary code, such as in <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41177\">dotnet\/corefx#41177<\/a>.  
Here we find a lot of antiquated <code>TraceSource<\/code>\/<code>Switch<\/code> usage, which only existed to enable some debug-only tracing and asserts, but which no one was actually using anymore, and which were causing some of these types to be seen by the linker as used even in release builds.<\/li>\n<li>Removing antiquated code that once served a purpose but no longer does, such as in <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26750\">dotnet\/coreclr#26750<\/a>.  This type used to be important to help improve ngen (the predecessor of crossgen), but it&#8217;s no longer needed.  Or such as in <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26603\">dotnet\/coreclr#26603<\/a>, where some code was no longer actually used, but was causing types to be kept around nonetheless.<\/li>\n<li>Removing duplicate code, such as in <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41165\">dotnet\/corefx#41165<\/a>,  <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/40935\">dotnet\/corefx#40935<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26589\">dotnet\/coreclr#26589<\/a>.  Several libraries were using their own private copy of some hash code helper routines, resulting in each having its own copy of IL for that functionality.  They could instead be updated to use the shared <code>HashCode<\/code> type, which not only helps in IL size and trimming, but also helps to avoid extra code that needs to be maintained and to better modernize the codebase to utilize the functionality we&#8217;re recommending others use as well.<\/li>\n<li>Using different APIs, such as in <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41143\">dotnet\/corefx#41143<\/a>.  Code was using extension helper methods that were resulting in additional types being pulled in, but the &#8220;help&#8221; provided actually saved little-to-no code.  
A potentially better example is <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41142\">dotnet\/corefx#41142<\/a>, which removed use of the non-generic <code>Queue<\/code> and <code>Stack<\/code> types from the <code>System.Xml<\/code> implementations, instead using only the generic implementations (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26597\">dotnet\/coreclr#26597<\/a> did something similar, with <code>WeakReference<\/code>).  Or <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41111\">dotnet\/corefx#41111<\/a>, which changed some code in the XML library to use <code>HttpClient<\/code> rather than <code>WebRequest<\/code>, which allowed removing the entire <code>System.Net.Requests<\/code> dependency.  Or <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41110\">dotnet\/corefx#41110<\/a>, which avoided <code>System.Net.Http<\/code> needing to use <code>System.Text.RegularExpressions<\/code>: it was an unnecessary complication that could be replaced with a tiny amount of code specific to that use case.  Another example is <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26602\">dotnet\/coreclr#26602<\/a>, where some code was unnecessarily using <code>string.ToLower()<\/code>, and replacing its usage was not only more efficient, it helped to enable that overload to be trimmed away by default. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26601\">dotnet\/coreclr#26601<\/a> is similar.<\/li>\n<li>Rerouting logic to avoid rooting large swaths of unneeded code, such as in <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41075\">dotnet\/corefx#41075<\/a>.  If code just used <code>new Regex(string)<\/code>, that internally just delegated to the longer <code>Regex(string, RegexOptions)<\/code> constructor, and that constructor needs to be able to use the internal <code>RegexCompiler<\/code> in case <code>RegexOptions.Compiled<\/code> is used.  
By tweaking the code paths such that the <code>Regex(string)<\/code> constructor doesn&#8217;t depend on the <code>Regex(string, RegexOptions)<\/code> constructor, it becomes trivial for the linker to remove the whole <code>RegexCompiler<\/code> code path (and its dependency on reflection emit) if it&#8217;s not otherwise used.  <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41101\">dotnet\/corefx#41101<\/a> then took better advantage of this by ensuring the shorter calls could be used when possible.  This is a fairly common pattern for avoiding such unnecessary rooting.  Consider <code>Environment.GetEnvironmentVariable(string)<\/code>.  It used to call the <code>Environment.GetEnvironmentVariable(string, EnvironmentVariableTarget)<\/code> overload, passing in the default <code>EnvironmentVariableTarget.Process<\/code>.  Instead, the dependency was inverted: the <code>Environment.GetEnvironmentVariable(string)<\/code> overload contains only the logic for handling the <code>Process<\/code> case, and then the longer overload has <code>if (target == EnvironmentVariableTarget.Process) return GetEnvironmentVariable(name);<\/code>.  That way, the most common case of just using the simple overload doesn&#8217;t pull in all of the code paths necessary to handle the other much less common targets. <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/40944\">dotnet\/corefx#40944<\/a> is another example: for apps that just write to the console rather than also read from the console, it enables a lot more of the console internals to be linked away.<\/li>\n<li>Using lazy initialization, especially for static fields, such as in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37909\">dotnet\/runtime#37909<\/a>.  If a type is used and any of its static methods are called, its static constructor will need to be kept, and any fields initialized by the static constructor will also need to be kept.  
If such fields are instead lazily initialized on first use, the fields will only need to be kept if the code that performs that lazy initialization is reachable.<\/li>\n<li>Using feature switches, such as in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38129\">dotnet\/runtime#38129<\/a> (further benefited from in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38828\">dotnet\/runtime#38828<\/a>).  In many cases, whole feature sets may not be necessary for an app, such as logging or debugging support, but from the linker&#8217;s perspective, it sees the code being used and thus is forced to keep it.  However, the linker is capable of being told about replacement values it should use for known properties, e.g. you can tell the linker that when it sees a <code>Boolean<\/code>-returning <code>SomeClass.SomeProperty<\/code>, it should replace it with a constant false, which will in turn enable it to trim out any code guarded by that property.<\/li>\n<li>Ensuring that test-only code is only in tests, as in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38729\">dotnet\/runtime#38729<\/a>.  
In this case, some code intended only to be used for testing was getting compiled into the product assembly, and its tendrils were causing <code>System.Linq.Expressions<\/code> to be brought in as well.<\/li>\n<\/ul>\n<h2><a id=\"user-content-peanut-butter\" class=\"anchor\" aria-hidden=\"true\" href=\"#peanut-butter\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"peanut-butter\"><\/a>Peanut Butter<\/h2>\n<p>In my <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-core-3-0\/\" rel=\"nofollow\">.NET Core 3.0 performance post<\/a>, I talked about &#8220;peanut butter&#8221;: lots of small improvements here and there that individually don&#8217;t necessarily make a huge difference, but that address costs otherwise smeared across the code, such that fixing a bunch of them en masse can make a measurable difference.  As with previous releases, there are a myriad of these welcome improvements that have gone into .NET 5.  Here&#8217;s a smattering:<\/p>\n<ul>\n<li>Faster assembly loading.  For historical reasons, .NET Core had a lot of tiny implementation assemblies, with the split serving little meaningful purpose.  Yet every additional assembly that needs to be loaded adds overhead. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/2189\">dotnet\/runtime#2189<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/31991\">dotnet\/runtime#31991<\/a> merged a bunch of small assemblies together in order to reduce the number that need to be loaded.<\/li>\n<li>Faster math. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27272\">dotnet\/coreclr#27272<\/a> improved checks for NaN, making the code for <code>double.IsNaN<\/code> and <code>float.IsNaN<\/code> smaller and faster. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35456\">dotnet\/runtime#35456<\/a> from <a href=\"https:\/\/github.com\/john-h-k\">@john-h-k<\/a> is a nice example of using SSE and AMD64 intrinsics to measurably speed up <code>Math.CopySign<\/code> and <code>MathF.CopySign<\/code>. And <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34452\">dotnet\/runtime#34452<\/a> from <a href=\"https:\/\/github.com\/Marusyk\">@Marusyk<\/a> improved hash code generation for <code>Matrix3x2<\/code> and <code>Matrix4x4<\/code>.<\/li>\n<li>Faster crypto. In place of open-coded equivalents, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36881\">dotnet\/runtime#36881<\/a> from <a href=\"https:\/\/github.com\/vcsjones\">@vcsjones<\/a> used the optimized <code>BinaryPrimitives<\/code> in various places within <code>System.Security.Cryptography<\/code>, yielding more maintainable and faster code, and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/39600\">dotnet\/corefx#39600<\/a> from <a href=\"https:\/\/github.com\/VladimirKhvostov\">@VladimirKhvostov<\/a> optimized the out-of-favor-but-still-in-use <code>CryptoConfig.CreateFromName<\/code> method to be upwards of 10x faster.<\/li>\n<li>Faster interop. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36257\">dotnet\/runtime#36257<\/a> reduced entrypoint probing (where the runtime tries to find the exact native function to use for a P\/Invoke) by avoiding the Windows-specific &#8220;ExactSpelling&#8221; checks when on Linux and by setting it to true for more methods when on Windows. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33020\">dotnet\/runtime#33020<\/a> from <a href=\"https:\/\/github.com\/NextTurn\">@NextTurn<\/a> used <code>sizeof(T)<\/code> instead of <code>Marshal.SizeOf(Type)<\/code>\/<code>Marshal.SizeOf&lt;T&gt;()<\/code> in a bunch of places, as the former has much less overhead than the latter.  And <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33967\">dotnet\/runtime#33967<\/a>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35098\">dotnet\/runtime#35098<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/39059\">dotnet\/runtime#39059<\/a> reduced interop and marshaling costs in several libraries by using more blittable types, using spans and ref locals, using <code>sizeof<\/code>, and so on.<\/li>\n<li>Faster reflection emit. Reflection emit enables developers to write out IL at run-time, and if you can emit the same instructions in a way that takes up less space, you can save on the managed allocations needed to store the sequence.  A variety of IL opcodes have shorter variants for more common cases, e.g. <code>Ldc_I4<\/code> can be used to load any <code>int<\/code> value as a constant, but <code>Ldc_I4_S<\/code> is shorter and can be used to load any <code>sbyte<\/code>, while <code>Ldc_I4_1<\/code> is shorter still and is used to load the value <code>1<\/code>.  Some libraries take advantage of this and have their own mapping table as part of their emit code to employ the shortest relevant opcode; others don&#8217;t.  
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35427\">dotnet\/runtime#35427<\/a> just moved such a mapping into the <code>ILGenerator<\/code> itself, enabling us to delete all of the customized implementations in the libraries in dotnet\/runtime, and get the benefits of the mapping in all of those and others automatically.<\/li>\n<li>Faster I\/O. <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37705\">dotnet\/runtime#37705<\/a> from <a href=\"https:\/\/github.com\/bbartels\">@bbartels<\/a> improved <code>BinaryWriter.Write(string)<\/code>, giving it a fast path for various common inputs.  And <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35978\">dotnet\/runtime#35978<\/a> improved how relationships are managed inside <code>System.IO.Packaging<\/code> by using O(1) instead of O(N) lookups.<\/li>\n<li>Lots of small allocations removed here and there. For example, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35005\">dotnet\/runtime#35005<\/a> removes a <code>MemoryStream<\/code> allocation in <code>ByteArrayContent<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36228\">dotnet\/runtime#36228<\/a> from <a href=\"https:\/\/github.com\/Youssef1313\">@Youssef1313<\/a> removes a <code>List&lt;T&gt;<\/code> and underlying <code>T[]<\/code> allocation in <code>System.Reflection.MetadataLoadContext<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32297\">dotnet\/runtime#32297<\/a> removes a <code>char[]<\/code> allocation in <code>XmlConverter.StripWhitespace<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32276\">dotnet\/runtime#32276<\/a> removes a <code>byte[]<\/code> allocation on startup in <code>EventSource<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32298\">dotnet\/runtime#32298<\/a> removes a <code>char[]<\/code> allocation in <code>HttpUtility<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32299\">dotnet\/runtime#32299<\/a> removes potentially several 
<code>char[]<\/code>s in <code>ModuleBuilder<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32301\">dotnet\/runtime#32301<\/a> removes some <code>char[]<\/code> allocations from <code>String.Split<\/code> usage, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32422\">dotnet\/runtime#32422<\/a> removes a <code>char[]<\/code> allocation in <code>AsnFormatter<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34551\">dotnet\/runtime#34551<\/a> removes several string allocations in <code>System.IO.FileSystem<\/code>, <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41363\">dotnet\/corefx#41363<\/a> removes a <code>char[]<\/code> allocation in <code>JsonCamelCaseNamingPolicy<\/code>, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/25631\">dotnet\/coreclr#25631<\/a> removes string allocations from <code>MethodBase.ToString()<\/code>, <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41274\">dotnet\/corefx#41274<\/a> removes some unnecessary strings from <code>CertificatePal.AppendPrivateKeyInfo<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1155\">dotnet\/runtime#1155<\/a> from <a href=\"https:\/\/github.com\/Wraith2\">@Wraith2<\/a> removes temporary arrays from <code>SqlDecimal<\/code> via spans, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26584\">dotnet\/coreclr#26584<\/a> removed boxing that previously occurred when using methods like <code>GetHashCode<\/code> on some tuples, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27451\">dotnet\/coreclr#27451<\/a> removed several allocations from reflecting over custom attributes, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27013\">dotnet\/coreclr#27013<\/a> removed some string allocations from concatenations by replacing some inputs with consts, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34774\">dotnet\/runtime#34774<\/a> removed some temporary <code>char[]<\/code> allocations from 
<code>string.Normalize<\/code>.<\/li>\n<\/ul>\n<h2><a id=\"user-content-new-performance-focused-apis\" class=\"anchor\" aria-hidden=\"true\" href=\"#new-performance-focused-apis\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"new-performance-focused-apis\"><\/a>New Performance-focused APIs<\/h2>\n<p>This post has highlighted a plethora of existing APIs that simply get better when running on .NET 5.  In addition, there are lots of new APIs in .NET 5, some of which are focused on helping developers to write faster code (many more are focused on enabling developers to perform the same operations with less code, or on enabling new functionality that wasn&#8217;t easily accomplished previously).  Here are a few highlights, including some cases where the APIs are already being used internally by the rest of the libraries to lower costs in existing APIs:<\/p>\n<ul>\n<li><code>Decimal(ReadOnlySpan&lt;int&gt;)<\/code> \/ <code>Decimal.TryGetBits<\/code> \/ <code>Decimal.GetBits<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32155\">dotnet\/runtime#32155<\/a>): In previous releases we added lots of span-based methods for efficiently interacting with primitives, and <code>decimal<\/code> did get span-based <code>TryFormat<\/code> and <code>{Try}Parse<\/code> methods, but these new methods in .NET 5 enable efficiently constructing a <code>decimal<\/code> from a span as well as extracting the bits from a <code>decimal<\/code> into a span. 
You can see this support already being used in <code>SqlDecimal<\/code>, in <code>BigInteger<\/code>, in <code>System.Linq.Expressions<\/code>, and in <code>System.Reflection.Metadata<\/code>.<\/li>\n<li><code>MemoryExtensions.Sort<\/code> (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27700\">dotnet\/coreclr#27700<\/a>).  I talked about this earlier: new <code>Sort&lt;T&gt;<\/code> and <code>Sort&lt;TKey, TValue&gt;<\/code> extension methods enable sorting arbitrary spans of data.  These new public methods are already being used in <code>Array<\/code> itself (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27703\">dotnet\/coreclr#27703<\/a>) as well as in <code>System.Linq<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1888\">dotnet\/runtime#1888<\/a>).<\/li>\n<li><code>GC.AllocateArray&lt;T&gt;<\/code> and <code>GC.AllocateUninitializedArray&lt;T&gt;<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33526\">dotnet\/runtime#33526<\/a>). These new APIs are like using <code>new T[length]<\/code>, except with two specialized behaviors: using the <code>Uninitialized<\/code> variant lets the GC hand back arrays without forcefully clearing them (unless they contain references, in which case it must clear at least those), and passing <code>true<\/code> to the <code>bool pinned<\/code> argument returns arrays from the new Pinned Object Heap (POH), from which arrays are guaranteed to never be moved in memory such that they can be passed to external code without pinning them (i.e. without using <code>fixed<\/code> or <code>GCHandle<\/code>). 
<code>StringBuilder<\/code> gained support for using the uninitialized feature (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27364\">dotnet\/coreclr#27364<\/a>) to reduce the cost of expanding its internal storage, as did the new <code>TranscodingStream<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35145\">dotnet\/runtime#35145<\/a>), and even the new support for importing X509 certificates and collections from Privacy-Enhanced Mail (PEM) files (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38280\">dotnet\/runtime#38280<\/a>).  You can also see the pinning support being put to good use in the Windows implementation of <code>SocketAsyncEventArgs<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34175\">dotnet\/runtime#34175<\/a>), where it needs to allocate pinned buffers for operations like <code>ReceiveMessageFrom<\/code>.<\/li>\n<li><code>StringSplitOptions.TrimEntries<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35740\">dotnet\/runtime#35740<\/a>). <code>String.Split<\/code> overloads accept a <code>StringSplitOptions<\/code> enum that enables <code>Split<\/code> to optionally remove empty entries from the resulting array.  The new <code>TrimEntries<\/code> enum value works with or without this option to first trim results.  Regardless of whether <code>RemoveEmptyEntries<\/code> is used, this enables <code>Split<\/code> to avoid allocating strings for entries that would become empty once trimmed (or for the allocated strings to be smaller), and then in conjunction with <code>RemoveEmptyEntries<\/code> for the resulting array to be smaller in such cases.  Also, it was found to be common for consumers of <code>Split<\/code> to subsequently call <code>Trim()<\/code> on each string, so doing the trimming as part of the <code>Split<\/code> call can eliminate extra string allocations for the caller.  
This is used in a handful of types and methods in dotnet\/runtime, such as by <code>DataTable<\/code>, <code>HttpListener<\/code>, and <code>SocketsHttpHandler<\/code>.<\/li>\n<li><code>BinaryPrimitives.{Try}{Read\/Write}{Double\/Single}{Big\/Little}Endian<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/6864\">dotnet\/runtime#6864<\/a>).  You can see these APIs being used, for example, in the new Concise Binary Object Representation (CBOR) support added in .NET 5 (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34046\">dotnet\/runtime#34046<\/a>).<\/li>\n<li><code>MailAddress.TryCreate<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1052\">dotnet\/runtime#1052<\/a> from <a href=\"https:\/\/github.com\/MarcoRossignoli\">@MarcoRossignoli<\/a>) and <code>PhysicalAddress.{Try}Parse<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1057\">dotnet\/runtime#1057<\/a>).  The new <code>Try<\/code> overloads enable parsing without exceptions, and the span-based overloads enable parsing addresses from within larger contexts without incurring allocations for substrings.<\/li>\n<li><code>SocketAsyncEventArgs(bool unsafeSuppressExecutionContextFlow)<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/706\">dotnet\/runtime#706<\/a> from <a href=\"https:\/\/github.com\/MarcoRossignoli\">@MarcoRossignoli<\/a>). By default, asynchronous operations in .NET flow <code>ExecutionContext<\/code>, which means call sites implicitly &#8220;capture&#8221; the current <code>ExecutionContext<\/code> and &#8220;restore&#8221; it when executing the continuation code.  This is how <code>AsyncLocal&lt;T&gt;<\/code> values propagate through asynchronous operations.  Such flowing is generally cheap, but there is still a small amount of overhead.  
As socket operations can be performance-critical, this new constructor on <code>SocketAsyncEventArgs<\/code> can be used when the developer knows that the context won&#8217;t be needed in the callbacks raised by the instance.  You can see this used, for example, in <code>SocketsHttpHandler<\/code>&#8216;s internal <code>ConnectHelper<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1381\">dotnet\/runtime#1381<\/a>).<\/li>\n<li><code>Unsafe.SkipInit&lt;T&gt;<\/code> (<a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/41995\">dotnet\/corefx#41995<\/a>).  The C# compiler&#8217;s <a href=\"https:\/\/en.wikipedia.org\/wiki\/Definite_assignment_analysis\" rel=\"nofollow\">definite assignment<\/a> rules require that parameters and locals be assigned to in a variety of situations.  In very specific cases, that can require an extra assignment that isn&#8217;t actually needed, which, when counting every instruction and memory-write in performance-sensitive code, can be undesirable.  This method effectively enables code to pretend it wrote to the parameter or local without actually having done so.  
This is used in various operations on <code>Decimal<\/code> (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27377\">dotnet\/coreclr#27377<\/a>), in some of the new APIs on <code>IntPtr<\/code> and <code>UIntPtr<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/307\">dotnet\/runtime#307<\/a> from <a href=\"https:\/\/github.com\/john-h-k\">@john-h-k<\/a>), in <code>Matrix4x4<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36323\">dotnet\/runtime#36323<\/a> from <a href=\"https:\/\/github.com\/eanova\">@eanova<\/a>), in <code>Utf8Parser<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/33507\">dotnet\/runtime#33507<\/a>), and in <code>UTF8Encoding<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/31904\">dotnet\/runtime#31904<\/a>).<\/li>\n<li><code>SuppressGCTransitionAttribute<\/code> (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26458\">dotnet\/coreclr#26458<\/a>).  This is an advanced attribute for use with P\/Invokes that enables the runtime to suppress the <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/4fdf9ff8812869dcf957ce0d2eb07c0d5779d1c6\/docs\/coding-guidelines\/clr-code-guide.md#2.1.8\">cooperative-to-preemptive mode transition<\/a> it would normally incur, as it does when making internal <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/4fdf9ff8812869dcf957ce0d2eb07c0d5779d1c6\/docs\/design\/coreclr\/botr\/corelib.md#calling-from-managed-to-native-code\">&#8220;FCalls&#8221;<\/a> into the runtime itself.  This attribute needs to be used with extreme care (see the <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/c8a994222d8b6cb4202a85570ee860e4b34a89e9\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/InteropServices\/SuppressGCTransitionAttribute.cs#L46-L51\">detailed comments<\/a> in the attribute&#8217;s description).  
Even so, you can see it&#8217;s used by a few methods in Corelib (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/27473\">dotnet\/coreclr#27473<\/a>), and there are pending changes for the JIT that will make it even better (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/39111\">dotnet\/runtime#39111<\/a>).<\/li>\n<li><code>CollectionsMarshal.AsSpan<\/code> (<a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/26867\">dotnet\/coreclr#26867<\/a>). This method gives callers span-based access to the backing store of a <code>List&lt;T&gt;<\/code>.<\/li>\n<li><code>MemoryMarshal.GetArrayDataReference<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1036\">dotnet\/runtime#1036<\/a>).  This method returns a reference to the first element of an array (or to where it would have been if the array wasn&#8217;t empty).  No validation is performed, so it&#8217;s both dangerous and very fast.  This method is used in a bunch of places in Corelib, all for very low-level optimizations.  For example, it&#8217;s used as part of the previously-discussed cast helpers implemented in C# (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/1068\">dotnet\/runtime#1068<\/a>) and as part of using <code>Buffer.Memmove<\/code> in various places (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35733\">dotnet\/runtime#35733<\/a>).<\/li>\n<li><code>SslStreamCertificateContext<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38364\">dotnet\/runtime#38364<\/a>). When <code>SslStream.AuthenticateAsServer{Async}<\/code> is provided with the certificate to use, it tries to build the complete X509 chain, an operation which can have varying amounts of associated cost and even perform I\/O if additional certificate information needs to be downloaded. In some circumstances, that could happen for the same certificate used to create any number of <code>SslStream<\/code> instances, resulting in duplicated expense. 
<code>SslStreamCertificateContext<\/code> serves as a sort of cache for the results of such a computation, with the work able to be performed once in advance and then passed to <code>SslStream<\/code> for any amount of reuse.  This helps to avoid that duplicated effort, while also giving callers more predictability and control over any failures.<\/li>\n<li><code>HttpClient.Send<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34948\">dotnet\/runtime#34948<\/a>).  It may be strange to some readers to see a synchronous API called out here.  While <code>HttpClient<\/code> was designed for asynchronous usage, we have found situations where developers are unable to utilize asynchrony, such as when implementing an interface method that&#8217;s only synchronous, or being called from a native operation that requires a response synchronously, yet the need to download data is ubiquitous.  In these cases, forcing the developer to perform &#8220;sync over async&#8221; (meaning performing an asynchronous operation and then blocking waiting for it to complete) performs and scales worse than if a synchronous operation were used in the first place.  As such, .NET 5 sees limited new synchronous surface area added to <code>HttpClient<\/code> and its supporting types.  dotnet\/runtime does itself have use for this in a few places.  For example, on Linux when the <code>X509Certificates<\/code> support needs to download a certificate as part of chain building, it is generally on a code path that needs to be synchronous all the way back to an OpenSSL callback; previously this would use <code>HttpClient.GetByteArrayAsync<\/code> and then block waiting for it to complete, but that was shown to cause noticeable scalability problems for some users&#8230; <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38502\">dotnet\/runtime#38502<\/a> changed it to use the new sync API instead.  
Similarly, the older <code>HttpWebRequest<\/code> type is built on top of <code>HttpClient<\/code>, and in previous releases of .NET Core, its synchronous <code>GetResponse()<\/code> method was actually doing sync-over-async; as of <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38511\">dotnet\/runtime#38511<\/a>, it&#8217;s now using the synchronous <code>HttpClient.Send<\/code> method.<\/li>\n<li><code>HttpContent.ReadAsStream<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37494\">dotnet\/runtime#37494<\/a>).  This is logically part of the <code>HttpClient.Send<\/code> effort mentioned above, but I&#8217;m calling it out separately because it&#8217;s useful on its own.  The existing <code>ReadAsStreamAsync<\/code> method is a bit of an oddity.  It was originally exposed as async just in case a custom HttpContent-derived type would require that, but it&#8217;s extremely rare to find any overrides of <code>HttpContent.ReadAsStreamAsync<\/code> that aren&#8217;t synchronous, and the implementations returned from requests made on <code>HttpClient<\/code> are all synchronous.  As a result, callers end up paying for the <code>Task&lt;Stream&gt;<\/code> wrapper object for the returned <code>Stream<\/code>, when in practice it&#8217;s always immediately available.  Thus, the new <code>ReadAsStream<\/code> method can actually be useful in such cases to avoid the extra <code>Task&lt;Stream&gt;<\/code> allocation.  You can see it being employed in that manner in dotnet\/runtime in various places, such as by the <code>ClientWebSocket<\/code> implementation.<\/li>\n<li>Non-generic <code>TaskCompletionSource<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37452\">dotnet\/runtime#37452<\/a>).  Since <code>Task<\/code> and <code>Task&lt;T&gt;<\/code> were introduced, <code>TaskCompletionSource&lt;T&gt;<\/code> was a way of constructing tasks that would be completed manually by the caller via its <code>{Try}Set<\/code> methods.  
And since <code>Task&lt;T&gt;<\/code> derives from <code>Task<\/code>, the single generic type could be used for both generic <code>Task&lt;T&gt;<\/code> and non-generic <code>Task<\/code> needs.  However, this wasn&#8217;t always obvious to folks, leading to confusion about the right solution for the non-generic case, compounded by the ambiguity about which type to use for <code>T<\/code> when it was just throw-away.  .NET 5 adds a non-generic <code>TaskCompletionSource<\/code>, which not only eliminates the confusion, but helps a bit with performance as well, as it avoids the task needing to carry around space for a useless <code>T<\/code>.<\/li>\n<li><code>Task.WhenAny(Task, Task)<\/code> (<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34288\">dotnet\/runtime#34288<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/37448\">dotnet\/runtime#37488<\/a>).  Previously, any number of tasks could be passed to <code>Task.WhenAny<\/code> via its overload that accepts a <code>params Task[] tasks<\/code>.  However, in analyzing uses of this method, it was found that the vast majority of call sites always passed two tasks.  
The new public overload optimizes for that case, and a neat thing about this overload is that just recompiling those call sites will cause the compiler to bind to the new faster overload instead of the old one, so no code changes are needed to benefit from the overload.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-smi\">_incomplete<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">TaskCompletionSource<\/span>&lt;<span class=\"pl-k\">bool<\/span>&gt;().<span class=\"pl-smi\">Task<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">OneAlreadyCompleted<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">WhenAny<\/span>(<span class=\"pl-smi\">Task<\/span>.<span class=\"pl-smi\">CompletedTask<\/span>, <span class=\"pl-smi\">_incomplete<\/span>);\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">AsyncCompletion<\/span>()\r\n{\r\n    <span class=\"pl-en\">AsyncTaskMethodBuilder<\/span> <span class=\"pl-smi\">atmb<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">default<\/span>;\r\n    <span class=\"pl-en\">Task<\/span> <span class=\"pl-smi\">result<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">Task<\/span>.<span class=\"pl-en\">WhenAny<\/span>(<span class=\"pl-smi\">atmb<\/span>.<span class=\"pl-smi\">Task<\/span>, <span class=\"pl-smi\">_incomplete<\/span>);\r\n    <span class=\"pl-smi\">atmb<\/span>.<span class=\"pl-en\">SetResult<\/span>();\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">result<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Runtime<\/th>\n<th align=\"right\">Mean<\/th>\n<th 
align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>OneAlreadyCompleted<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">125.387 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">217 B<\/td>\n<\/tr>\n<tr>\n<td>OneAlreadyCompleted<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">89.040 ns<\/td>\n<td align=\"right\">0.71<\/td>\n<td align=\"right\">200 B<\/td>\n<\/tr>\n<tr>\n<td>OneAlreadyCompleted<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">8.391 ns<\/td>\n<td align=\"right\">0.07<\/td>\n<td align=\"right\">72 B<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>AsyncCompletion<\/td>\n<td>.NET FW 4.8<\/td>\n<td align=\"right\">289.042 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">257 B<\/td>\n<\/tr>\n<tr>\n<td>AsyncCompletion<\/td>\n<td>.NET Core 3.1<\/td>\n<td align=\"right\">195.879 ns<\/td>\n<td align=\"right\">0.68<\/td>\n<td align=\"right\">240 B<\/td>\n<\/tr>\n<tr>\n<td>AsyncCompletion<\/td>\n<td>.NET 5.0<\/td>\n<td align=\"right\">150.523 ns<\/td>\n<td align=\"right\">0.52<\/td>\n<td align=\"right\">160 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li>And too many <code>System.Runtime.Intrinsics<\/code> methods to even begin to mention!<\/li>\n<\/ul>\n<h2><a id=\"user-content-new-performance-focused-analyzers\" class=\"anchor\" aria-hidden=\"true\" href=\"#new-performance-focused-analyzers\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 
01-2.83 0z\"><\/path><\/svg><\/a><a name=\"new-performance-focused-analyzers\"><\/a>New Performance-focused Analyzers<\/h2>\n<p>The C# &#8220;Roslyn&#8221; compiler has a very useful extension point called <a href=\"https:\/\/docs.microsoft.com\/en-us\/visualstudio\/code-quality\/roslyn-analyzers-overview\" rel=\"nofollow\">&#8220;analyzers&#8221;<\/a>, or &#8220;Roslyn analyzers&#8221;.  Analyzers plug into the compiler and are given full read access to all of the source the compiler is operating over as well as the compiler&#8217;s parsing and modeling of that code, which enables developers to plug in their own custom analyses to a compilation.  On top of that, analyzers are not only runnable as part of builds but also in the IDE as the developer is writing their code, which enables analyzers to present suggestions, warnings, and errors on how the developer may improve their code.  Analyzer developers can also author &#8220;fixers&#8221; that can be invoked in the IDE and automatically replace the flagged code with a &#8220;fixed&#8221; alternative.  And all of these components can be distributed via NuGet packages, making it easy for developers to consume arbitrary analyses written by others.<\/p>\n<p>The <a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\">Roslyn Analyzers<\/a> repo contains a bunch of custom analyzers, including ports of the old <a href=\"https:\/\/docs.microsoft.com\/en-us\/visualstudio\/code-quality\/install-fxcop-analyzers\" rel=\"nofollow\">FxCop rules<\/a>.  It also contains new analyzers, and for .NET 5, the .NET SDK will include a large number of these analyzers automatically, including brand new ones that have been written for this release.  Multiple of these rules are either focused on or at least partially related to performance.  Here are a few examples:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\/pull\/3464\">Detecting accidental allocations as part of range indexing<\/a>.  
C# 8 introduced ranges, which make it easy to slice collections, e.g. <code>someCollection[1..3]<\/code>.  Such an expression translates into either use of the collection&#8217;s indexer that takes a <code>Range<\/code>, e.g. <code>public MyCollection this[Range r] { get; }<\/code>, or if no such indexer is present, into use of a <code>Slice(int start, int length)<\/code>.  By convention and design guidelines, such indexers and slice methods should return the same type over which they&#8217;re defined, so for example slicing a <code>T[]<\/code> produces another <code>T[]<\/code>, and slicing a <code>Span&lt;T&gt;<\/code> produces a <code>Span&lt;T&gt;<\/code>.  This, however, can lead to unexpected allocations hiding because of implicit casts.  For example, <code>T[]<\/code> can be implicitly cast to a <code>Span&lt;T&gt;<\/code>, but that also means that the result of slicing a <code>T[]<\/code> can be implicitly cast to a <code>Span&lt;T&gt;<\/code>, which means code like this <code>Span&lt;T&gt; span = _array[1..3];<\/code> will compile and run fine, except that it will incur an array allocation for the array slice produced by the <code>_array[1..3]<\/code> range indexing.  A more efficient way to write this would be <code>Span&lt;T&gt; span = _array.AsSpan()[1..3]<\/code>.  
This analyzer will detect several such cases and offer fixers to eliminate the allocation.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">Baseline<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-en\">Slice1<\/span>()\r\n{\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">span<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>hello world<span class=\"pl-pds\">\"<\/span><\/span>[<span class=\"pl-c1\">1<\/span>..<span class=\"pl-c1\">3<\/span>];\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">span<\/span>;\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-en\">Slice2<\/span>()\r\n{\r\n    <span class=\"pl-en\">ReadOnlySpan<\/span>&lt;<span class=\"pl-k\">char<\/span>&gt; <span class=\"pl-smi\">span<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>hello world<span class=\"pl-pds\">\"<\/span><\/span>.<span class=\"pl-en\">AsSpan<\/span>()[<span class=\"pl-c1\">1<\/span>..<span class=\"pl-c1\">3<\/span>];\r\n    <span class=\"pl-k\">return<\/span> <span class=\"pl-smi\">span<\/span>;\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Slice1<\/td>\n<td align=\"right\">8.3337 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">32 B<\/td>\n<\/tr>\n<tr>\n<td>Slice2<\/td>\n<td align=\"right\">0.4332 ns<\/td>\n<td align=\"right\">0.05<\/td>\n<td 
align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\/pull\/3497\">Prefer <code>Memory<\/code> overloads for <code>Stream.Read\/WriteAsync<\/code> methods<\/a>.  .NET Core 2.1 added new overloads to <code>Stream.ReadAsync<\/code> and <code>Stream.WriteAsync<\/code> that operate on <code>Memory&lt;byte&gt;<\/code> and <code>ReadOnlyMemory&lt;byte&gt;<\/code>, respectively.  This enables those methods to work with data from sources other than <code>byte[]<\/code>, and also enables optimizations like being able to avoid pinning if the <code>{ReadOnly}Memory&lt;byte&gt;<\/code> was created in a manner that specified it represented already pinned or otherwise immovable data.  However, the introduction of the new overloads also enabled a new opportunity to choose the return type for these methods, and we chose <code>ValueTask&lt;int&gt;<\/code> and <code>ValueTask<\/code>, respectively, rather than <code>Task&lt;int&gt;<\/code> and <code>Task<\/code>.  The benefit of that is enabling more synchronously completing calls to be allocation-free, and even more asynchronously completing calls to be allocation-free (though with more effort on the part of the developer of the override).  As a result, it&#8217;s frequently beneficial to prefer the newer overloads over the older ones, and this analyzer will detect use of the old overloads and offer fixes to automatically switch to the newer ones. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35941\">dotnet\/runtime#35941<\/a> has some examples of this fixing cases found in dotnet\/runtime.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre><span class=\"pl-k\">private<\/span> <span class=\"pl-en\">NetworkStream<\/span> <span class=\"pl-smi\">_client<\/span>, <span class=\"pl-smi\">_server<\/span>;\r\n<span class=\"pl-k\">private<\/span> <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">_buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-k\">byte<\/span>[<span class=\"pl-c1\">10<\/span>];\r\n\r\n[<span class=\"pl-en\">GlobalSetup<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Setup<\/span>()\r\n{\r\n    <span class=\"pl-smi\">using<\/span> <span class=\"pl-en\">Socket<\/span> <span class=\"pl-smi\">listener<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n    <span class=\"pl-k\">var<\/span> <span class=\"pl-smi\">client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">Socket<\/span>(<span class=\"pl-smi\">AddressFamily<\/span>.<span class=\"pl-smi\">InterNetwork<\/span>, <span class=\"pl-smi\">SocketType<\/span>.<span class=\"pl-smi\">Stream<\/span>, <span class=\"pl-smi\">ProtocolType<\/span>.<span class=\"pl-smi\">Tcp<\/span>);\r\n    <span class=\"pl-smi\">listener<\/span>.<span class=\"pl-en\">Bind<\/span>(<span class=\"pl-k\">new<\/span> <span class=\"pl-en\">IPEndPoint<\/span>(<span class=\"pl-smi\">IPAddress<\/span>.<span class=\"pl-smi\">Loopback<\/span>, <span class=\"pl-c1\">0<\/span>));\r\n    <span 
class=\"pl-smi\">listener<\/span>.<span class=\"pl-en\">Listen<\/span>();\r\n    <span class=\"pl-smi\">client<\/span>.<span class=\"pl-en\">Connect<\/span>(<span class=\"pl-smi\">listener<\/span>.<span class=\"pl-smi\">LocalEndPoint<\/span>);\r\n    <span class=\"pl-smi\">_client<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NetworkStream<\/span>(<span class=\"pl-smi\">client<\/span>);\r\n    <span class=\"pl-smi\">_server<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-k\">new<\/span> <span class=\"pl-en\">NetworkStream<\/span>(<span class=\"pl-smi\">listener<\/span>.<span class=\"pl-en\">Accept<\/span>());\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">Baseline<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">ReadWrite1<\/span>()\r\n{\r\n    <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_buffer<\/span>;\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    {\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">WriteAsync<\/span>(<span class=\"pl-smi\">buffer<\/span>, <span class=\"pl-c1\">0<\/span>, <span class=\"pl-smi\">buffer<\/span>.<span class=\"pl-smi\">Length<\/span>);\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_server<\/span>.<span class=\"pl-en\">ReadAsync<\/span>(<span class=\"pl-smi\">buffer<\/span>, <span class=\"pl-c1\">0<\/span>, <span 
class=\"pl-smi\">buffer<\/span>.<span class=\"pl-smi\">Length<\/span>); <span class=\"pl-c\"><span class=\"pl-c\">\/\/<\/span> may not read everything; just for demo purposes<\/span>\r\n    }\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">async<\/span> <span class=\"pl-en\">Task<\/span> <span class=\"pl-en\">ReadWrite2<\/span>()\r\n{\r\n    <span class=\"pl-k\">byte<\/span>[] <span class=\"pl-smi\">buffer<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-smi\">_buffer<\/span>;\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n    {\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_client<\/span>.<span class=\"pl-en\">WriteAsync<\/span>(<span class=\"pl-smi\">buffer<\/span>);\r\n        <span class=\"pl-k\">await<\/span> <span class=\"pl-smi\">_server<\/span>.<span class=\"pl-en\">ReadAsync<\/span>(<span class=\"pl-smi\">buffer<\/span>); <span class=\"pl-c\"><span class=\"pl-c\">\/\/<\/span> may not read everything; just for demo purposes<\/span>\r\n    }\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ReadWrite1<\/td>\n<td align=\"right\">7.604 ms<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">72001 B<\/td>\n<\/tr>\n<tr>\n<td>ReadWrite2<\/td>\n<td align=\"right\">7.549 ms<\/td>\n<td align=\"right\">0.99<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\/pull\/3443\">Prefer typed overloads on <code>StringBuilder<\/code><\/a>. 
<code>StringBuilder.Append<\/code> and <code>StringBuilder.Insert<\/code> have many overloads, for appending not just strings or objects but also various primitive types, like <code>Int32<\/code>.  Even so, it&#8217;s common to see code like <code>stringBuilder.Append(intValue.ToString())<\/code>.  The <code>StringBuilder.Append(Int32)<\/code> overload can be much more efficient, not requiring allocating a string, and should be preferred.  This analyzer comes with a fixer to detect such cases and automatically switch to using the more appropriate overload.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">Baseline<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Append1<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Clear<\/span>();\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Append<\/span>(<span class=\"pl-smi\">i<\/span>.<span class=\"pl-en\">ToString<\/span>());\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Append2<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Clear<\/span>();\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span 
class=\"pl-c1\">1000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Append<\/span>(<span class=\"pl-smi\">i<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Append1<\/td>\n<td align=\"right\">13.546 us<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">31680 B<\/td>\n<\/tr>\n<tr>\n<td>Append2<\/td>\n<td align=\"right\">9.841 us<\/td>\n<td align=\"right\">0.73<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/runtime\/issues\/33786\">Prefer <code>StringBuilder.Append(char)<\/code> over <code>StringBuilder.Append(string)<\/code><\/a>.  Appending a single <code>char<\/code> to a <code>StringBuilder<\/code> is a bit more efficient than appending a <code>string<\/code> of length 1.  Yet it&#8217;s fairly common to see code like <code>private const string Separator = \":\"; ...; builder.Append(Separator);<\/code>, and this would be better if the const were changed to be <code>private const char Separator = ':';<\/code>.  The analyzer will flag many such cases and help to fix them.  
Some examples of this being fixed in dotnet\/runtime in response to the analyzer are in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36097\">dotnet\/runtime#36097<\/a>.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">Baseline<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Append1<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Clear<\/span>();\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Append<\/span>(<span class=\"pl-s\"><span class=\"pl-pds\">\"<\/span>:<span class=\"pl-pds\">\"<\/span><\/span>);\r\n}\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">void<\/span> <span class=\"pl-en\">Append2<\/span>()\r\n{\r\n    <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Clear<\/span>();\r\n    <span class=\"pl-k\">for<\/span> (<span class=\"pl-k\">int<\/span> <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">0<\/span>; <span class=\"pl-smi\">i<\/span> <span class=\"pl-k\">&lt;<\/span> <span class=\"pl-c1\">1000<\/span>; <span class=\"pl-smi\">i<\/span><span class=\"pl-k\">++<\/span>)\r\n        <span class=\"pl-smi\">_builder<\/span>.<span class=\"pl-en\">Append<\/span>(<span class=\"pl-s\">':'<\/span>);\r\n}<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th align=\"right\">Mean<\/th>\n<th 
align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Append1<\/td>\n<td align=\"right\">2.621 us<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>Append2<\/td>\n<td align=\"right\">1.968 us<\/td>\n<td align=\"right\">0.75<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\/pull\/3584\">Prefer <code>IsEmpty<\/code> over <code>Count<\/code><\/a>. Similar to the LINQ <code>Any()<\/code> vs <code>Count()<\/code> discussion earlier, some collection types expose both an <code>IsEmpty<\/code> property and a <code>Count<\/code> property.  In some cases, such as with a concurrent collection like <code>ConcurrentQueue&lt;T&gt;<\/code>, it can be much more expensive to determine an exact count of the number of items in the collection than to determine simply whether there are any items in the collection.  In such cases, if code was written to do a check like <code>if (collection.Count != 0)<\/code>, it can be more efficient to instead write <code>if (!collection.IsEmpty)<\/code>.  
This analyzer helps to find such cases and fix them.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">Baseline<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsEmpty1<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_queue<\/span>.<span class=\"pl-smi\">Count<\/span> <span class=\"pl-k\">==<\/span> <span class=\"pl-c1\">0<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">bool<\/span> <span class=\"pl-en\">IsEmpty2<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">_queue<\/span>.<span class=\"pl-smi\">IsEmpty<\/span>;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IsEmpty1<\/td>\n<td align=\"right\">21.621 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<\/tr>\n<tr>\n<td>IsEmpty2<\/td>\n<td align=\"right\">4.041 ns<\/td>\n<td align=\"right\">0.19<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\/pull\/3838\">Prefer <code>Environment.ProcessId<\/code><\/a>.  <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38908\">dotnet\/runtime#38908<\/a> added a new static property <code>Environment.ProcessId<\/code>, which returns the current process&#8217; id.  It&#8217;s common to see code that previously tried to do the same thing with <code>Process.GetCurrentProcess().Id<\/code>.  The latter, however, is significantly less efficient, allocating a finalizable object and making a system call on every invocation, and in a manner that can&#8217;t easily support internal caching.  
This new analyzer helps to automatically find and replace such usage.<\/li>\n<\/ul>\n<div class=\"highlight highlight-source-cs\">\n<pre>[<span class=\"pl-en\">Benchmark<\/span>(<span class=\"pl-en\">Baseline<\/span> <span class=\"pl-k\">=<\/span> <span class=\"pl-c1\">true<\/span>)]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">PGCPI<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Process<\/span>.<span class=\"pl-en\">GetCurrentProcess<\/span>().<span class=\"pl-smi\">Id<\/span>;\r\n\r\n[<span class=\"pl-en\">Benchmark<\/span>]\r\n<span class=\"pl-k\">public<\/span> <span class=\"pl-k\">int<\/span> <span class=\"pl-en\">EPI<\/span>() <span class=\"pl-k\">=&gt;<\/span> <span class=\"pl-smi\">Environment<\/span>.<span class=\"pl-smi\">ProcessId<\/span>;<\/pre>\n<\/div>\n<table>\n<thead>\n<tr>\n<th>Method<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Ratio<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>PGCPI<\/td>\n<td align=\"right\">67.856 ns<\/td>\n<td align=\"right\">1.00<\/td>\n<td align=\"right\">280 B<\/td>\n<\/tr>\n<tr>\n<td>EPI<\/td>\n<td align=\"right\">3.191 ns<\/td>\n<td align=\"right\">0.05<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/roslyn-analyzers\/pull\/3432\">Avoid stackalloc in loops<\/a>. This analyzer doesn&#8217;t so much help you to make your code faster, but rather helps you to make your code correct when you&#8217;ve employed solutions for making your code faster.  Specifically, it flags cases where <code>stackalloc<\/code> is used to allocate memory from the stack, but where it&#8217;s used in a loop.  
The memory allocated from the stack as part of a <code>stackalloc<\/code> may not be released until the method returns, so if <code>stackalloc<\/code> is used in a loop, it can potentially result in allocating much more memory than the developer intended, and eventually result in a stack overflow that crashes the process.  You can see a few examples of this being fixed in <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34149\">dotnet\/runtime#34149<\/a>.<\/li>\n<\/ul>\n<h2><a id=\"user-content-whats-next\" class=\"anchor\" aria-hidden=\"true\" href=\"#whats-next\"><svg class=\"octicon octicon-link\" viewBox=\"0 0 16 16\" version=\"1.1\" width=\"16\" height=\"16\" aria-hidden=\"true\"><path fill-rule=\"evenodd\" d=\"M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z\"><\/path><\/svg><\/a><a name=\"whats-next\"><\/a>What&#8217;s Next?<\/h2>\n<p>Per the <a href=\"https:\/\/github.com\/dotnet\/core\/blob\/master\/roadmap.md\">.NET roadmap<\/a>, .NET 5 is scheduled to be released in November 2020, which is still several months away.  And while this post has demonstrated a huge number of performance advancements already in for the release, I expect we&#8217;ll see a plethora of additional performance improvements find their way into .NET 5, if for no other reason than there are currently PRs pending for a bunch (beyond the ones previously mentioned in other discussions), e.g. 
<a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34864\">dotnet\/runtime#34864<\/a> and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32552\">dotnet\/runtime#32552<\/a> further improve <code>Uri<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/402\">dotnet\/runtime#402<\/a> vectorizes <code>string.Compare<\/code> for ordinal comparisons, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/36252\">dotnet\/runtime#36252<\/a> improves the performance of <code>Dictionary&lt;TKey, TValue&gt;<\/code> lookups with <code>OrdinalIgnoreCase<\/code> by extending the existing non-randomization optimization to case-insensitivity, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/34633\">dotnet\/runtime#34633<\/a> provides an asynchronous implementation of DNS resolution on Linux, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32520\">dotnet\/runtime#32520<\/a> significantly reduces the overhead of <code>Activator.CreateInstance&lt;T&gt;()<\/code>, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/32843\">dotnet\/runtime#32843<\/a> makes <code>Utf8Parser.TryParse<\/code> faster for Int32 values, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/35654\">dotnet\/runtime#35654<\/a> improves the performance of <code>Guid<\/code> equality checks, <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/39117\">dotnet\/runtime#39117<\/a> reduces costs for <code>EventListeners<\/code> handling <code>EventSource<\/code> events, and <a href=\"https:\/\/github.com\/dotnet\/runtime\/pull\/38896\">dotnet\/runtime#38896<\/a> from <a href=\"https:\/\/github.com\/Bond-009\">@Bond-009<\/a> special-cases more inputs to <code>Task.WhenAny<\/code>.<\/p>\n<p>Finally, while we try really hard to avoid performance regressions, any release will invariably have some, and we&#8217;ll be spending time investigating ones we find. One known class of such regressions has to do with a feature enabled in .NET 5: ICU.  
.NET Framework and previous releases of .NET Core on Windows have used <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/intl\/national-language-support\" rel=\"nofollow\">National Language Support (NLS)<\/a> APIs for globalization on Windows, whereas .NET Core on Unix has used <a href=\"http:\/\/site.icu-project.org\/\" rel=\"nofollow\">International Components for Unicode (ICU)<\/a>.  .NET 5 <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/standard\/globalization-localization\/globalization-icu\" rel=\"nofollow\">switches to use ICU by default<\/a> on all operating systems if it&#8217;s available (Windows 10 includes it as of the May 2019 Update), enabling much better behavior consistency across OSes.  However, since these two technologies have different performance profiles, some operations (in particular culture-aware string operations) may end up being slower in some cases.  While we hope to mitigate most of these (which should also help to improve performance on Linux and macOS), and while any that do remain are likely to be inconsequential for your apps, you can <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/standard\/globalization-localization\/globalization-icu#using-nls-instead-of-icu\" rel=\"nofollow\">opt to continue using NLS<\/a> if the changes negatively impact your particular application.<\/p>\n<p>With <a href=\"https:\/\/dotnet.microsoft.com\/download\/dotnet\/5.0\" rel=\"nofollow\">.NET 5 previews<\/a> and <a href=\"https:\/\/github.com\/dotnet\/installer\/blob\/master\/README.md#installers-and-binaries\">nightly builds<\/a> available, I&#8217;d encourage you to download the latest bits and give them a whirl with your applications.  
And if you find things you think can and should be improved, we&#8217;d welcome your PRs to dotnet\/runtime!<\/p>\n<p>Happy coding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Explore many performance improvements, big and small, that have gone into the .NET 5 runtime and core libraries to make apps and services leaner and faster.<\/p>\n","protected":false},"author":360,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,196,195,3012,3013,756,3009],"tags":[4,8082],"class_list":["post-28871","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-dotnet-core","category-dotnet-framework","category-internals","category-async","category-csharp","category-performance","tag-net","tag-dotnetperf"],"acf":[],"blog_post_summary":"<p>Explore many performance improvements, big and small, that have gone into the .NET 5 runtime and core libraries to make apps and services leaner and 
faster.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/28871","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/360"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=28871"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/28871\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=28871"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=28871"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=28871"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}