{"id":17335,"date":"2018-04-18T12:29:45","date_gmt":"2018-04-18T19:29:45","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/dotnet\/?p=17335"},"modified":"2025-10-29T11:06:43","modified_gmt":"2025-10-29T18:06:43","slug":"performance-improvements-in-net-core-2-1","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-core-2-1\/","title":{"rendered":"Performance Improvements in .NET Core 2.1"},"content":{"rendered":"<p>Back before .NET Core 2.0 shipped, I wrote a <a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2017\/06\/07\/performance-improvements-in-net-core\/\" rel=\"nofollow\">post<\/a> highlighting various performance improvements in .NET Core 2.0 when compared with .NET Core 1.1 and the .NET Framework. As .NET Core 2.1 is in its final stages of being released, I thought it would be a good time to have some fun and take a tour through some of the myriad of performance improvements that have found their way into this release.<\/p>\n<p>Performance improvements show up in .NET Core 2.1 in a variety of ways. One of the big focuses of the release has been on the new <code>System.Span&lt;T&gt;<\/code> type that, along with its friends like <code>System.Memory&lt;T&gt;<\/code>, are now at the heart of the runtime and core libraries (see <a href=\"https:\/\/msdn.microsoft.com\/en-us\/magazine\/mt814808.aspx\" rel=\"nofollow\">this MSDN Magazine article<\/a> for an introduction). New libraries have been added in this release, like <a href=\"https:\/\/github.com\/dotnet\/corefx\/tree\/master\/src\/System.Memory\">System.Memory.dll<\/a>, <a href=\"https:\/\/github.com\/dotnet\/corefx\/tree\/master\/src\/System.Threading.Channels\">System.Threading.Channels.dll<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/corefx\/tree\/master\/src\/System.IO.Pipelines\">System.IO.Pipelines.dll<\/a>, each targeted at specific scenarios. 
And many new members have been added to existing types, for example ~250 new members across existing types in the framework that accept or return the new span and memory types, and counting members on new types focusing on working with span and memory more than doubles that (e.g. the new <code>BinaryPrimitives<\/code> and <code>Utf8Formatter<\/code> types). All such improvements are worthy of their own focused blog posts, but they&#8217;re not what I&#8217;m focusing on here. Rather, I&#8217;m going to walk through some of the myriad of improvements that have been made to existing functionality, to existing types and methods, places where you upgrade a library or app from .NET Core 2.0 to 2.1 and performance just gets better. For the purposes of this post, I\u2019m focused primarily on the <a href=\"https:\/\/github.com\/dotnet\/coreclr\">runtime<\/a> and the <a href=\"https:\/\/github.com\/dotnet\/corefx\">core libraries<\/a>, but there have also been substantial performance improvements higher in the stack, as well as in tooling.<\/p>\n<h4><a id=\"user-content-setup\" class=\"anchor\" href=\"#setup\"><\/a>Setup<\/h4>\n<p>In my post on <a href=\"https:\/\/blogs.msdn.microsoft.com\/dotnet\/2017\/06\/07\/performance-improvements-in-net-core\/\" rel=\"nofollow\">.NET Core 2.0 performance<\/a>, I demonstrated improvements using simple console apps with custom measurement loops, and I got feedback that readers would have preferred it if I&#8217;d used a standard benchmarking tool. While I explicitly opted not to do so then (the reason being making it trivial for developers to follow along by copying and pasting code samples into their own console apps), this time around I decided to experiment with the approach, made easier by tooling improvements in the interim. So, to actually run the complete code samples shown in this post, you&#8217;ll need a few things. In my setup, I have both .NET Core 2.0 and a preview of .NET Core 2.1 installed. 
I then did <code>dotnet new console<\/code>, and modified the resulting .csproj as follows, which includes specifying both releases as target frameworks and including a package reference for <a href=\"https:\/\/www.nuget.org\/packages\/BenchmarkDotNet\/\" rel=\"nofollow\">BenchmarkDotNet<\/a>, used to do the actual benchmarking.<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/20448df7cd8adb7e436c9ace9fafda5c.js\"><\/script><\/p>\n<p>Then I have the following scaffolding code in my Program.cs:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/21a1d4fc6bba108564359444b6cc1eed.js\"><\/script><\/p>\n<p>For each benchmark shown in this post, you should be able to simply copy-and-paste the relevant code to the spot marked by a comment in this .cs file, and then use <code>dotnet run -c Release -f netcoreapp2.0<\/code> to see the results. That will run the app using .NET Core 2.0, but the app itself is just the BenchmarkDotNet host, and the BenchmarkDotNet library will in turn create, build, and run .NET Core 2.0 and 2.1 apps for comparison. Note that in each results section, I&#8217;ve removed superfluous columns, to keep things tidy. I&#8217;ve also generally only shown results from running on Windows, when there&#8217;s no meaningful difference to highlight between platforms.<\/p>\n<p>With that, let&#8217;s explore.<\/p>\n<h2><a id=\"user-content-jit\" class=\"anchor\" href=\"#jit\"><\/a>JIT<\/h2>\n<p>A lot of work has gone into improving the Just-In-Time (JIT) compiler in .NET Core 2.1, with many optimizations that enhance a wide range of libraries and applications. Many of these improvements were sought based on needs of the core libraries themselves, giving these improvements both targeted and broad impact.<\/p>\n<p>Let&#8217;s start with an example of a JIT improvement that can have broad impact across many types, but in particular for collection classes. 
.NET Core 2.1 has improvements around &#8220;devirtualization&#8221;, where the JIT is able to statically determine the target of some virtual invocations and as a result avoid virtual dispatch costs and enable potential inlining. In particular, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14125\">dotnet\/coreclr#14125<\/a> taught the JIT about the <code>EqualityComparer&lt;T&gt;.Default<\/code> member, extending the JIT&#8217;s intrinsic recognizer to recognize this getter. When a method then does <code>EqualityComparer&lt;T&gt;.Default.Equals<\/code>, for example, the JIT is able to both devirtualize and often inline the callee, which for certain <code>T<\/code> types makes a huge difference in throughput. Before this improvement, if <code>T<\/code> were <code>Int32<\/code>, the JIT would end up emitting code to make a virtual call to the underlying <code>GenericEqualityComparer&lt;T&gt;.Equals<\/code> method, but with this change, the JIT is able to inline what ends up being a call to <code>Int32.Equals<\/code>, which itself is inlineable, and <code>EqualityComparer&lt;int&gt;.Default.Equals<\/code> becomes as efficient as directly comparing two <code>Int32<\/code>s for equality. 
The impact of this is obvious with the following benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/2f541dc8d0fdae06559317d57f21f1d6.js\"><\/script><\/p>\n<p>On my machine, I get output like the following, showcasing an ~2.5x speedup over .NET Core 2.0:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr style=\"height: 28px;\">\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th style=\"height: 28px;\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 28px;\">\n<td>EqualityComparerInt32<\/td>\n<td>.NET Core 2.0<\/td>\n<td style=\"height: 28px;\">2.2106 ns<\/td>\n<\/tr>\n<tr style=\"height: 28px;\">\n<td>EqualityComparerInt32<\/td>\n<td>.NET Core 2.1<\/td>\n<td style=\"height: 28px;\">0.8725 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Such improvements show up in indirect usage of <code>EqualityComparer&lt;T&gt;.Default<\/code>, as well. Many of the collection types in .NET, including <code>Dictionary&lt;TKey, TValue&gt;<\/code>, utilize <code>EqualityComparer&lt;T&gt;.Default<\/code>, and we can see the impact this improvement has on various operations employed by such collections. 
For example, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15419\">dotnet\/coreclr#15419<\/a> from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a> tweaked <code>Dictionary&lt;TKey, TValue&gt;<\/code>&#8217;s <code>ContainsValue<\/code> implementation to better take advantage of this devirtualization and inlining, such that running this benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/f2e0b8bac8dc840e6459a3ea28a3a5b4.js\"><\/script><\/p>\n<p>produces on my machine results like the following, showcasing an ~2.25x speedup:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DictionaryContainsValue<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">3.419 us<\/td>\n<\/tr>\n<tr>\n<td>DictionaryContainsValue<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">1.519 us<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In many situations, improvements like this in the JIT implicitly show up as improvements in higher-level code. In this specific case, though, it required the aforementioned change, which updated code like:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/886ab85b73790e72e06615e956437aed.js\"><\/script><\/p>\n<p>to instead be like:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/169be328abbff537744c0f515e26a7c6.js\"><\/script><\/p>\n<p>In other words, previously this code had been optimized to avoid the overheads associated with using <code>EqualityComparer&lt;TValue&gt;.Default<\/code> on each iteration of the loop. But that micro-optimization then defeated the JIT&#8217;s devirtualization logic, such that what used to be an optimization is now a deoptimization, and the code had to be changed back to a pattern the JIT could recognize to make it as efficient as possible. 
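<\/p>\n<p>To make the two shapes concrete, here is a minimal sketch of the pattern being described. It is not the actual <code>Dictionary&lt;TKey, TValue&gt;<\/code> source; the <code>ComparerShapes<\/code> type and its method names are hypothetical and exist only for illustration:<\/p>\n<pre><code>using System;\nusing System.Collections.Generic;\n\n\/\/ Hypothetical helper type, for illustration only.\nstatic class ComparerShapes\n{\n    \/\/ Hoisted shape: the comparer is cached in a local before the loop,\n    \/\/ so each Equals call goes through that local.\n    static bool ContainsHoisted&lt;T&gt;(T[] items, T value)\n    {\n        EqualityComparer&lt;T&gt; comparer = EqualityComparer&lt;T&gt;.Default;\n        for (int i = 0; i &lt; items.Length; i++)\n        {\n            if (comparer.Equals(items[i], value)) return true;\n        }\n        return false;\n    }\n\n    \/\/ Direct shape: EqualityComparer&lt;T&gt;.Default is read at each call site,\n    \/\/ which is the form the text says the JIT can now recognize.\n    static bool ContainsDirect&lt;T&gt;(T[] items, T value)\n    {\n        for (int i = 0; i &lt; items.Length; i++)\n        {\n            if (EqualityComparer&lt;T&gt;.Default.Equals(items[i], value)) return true;\n        }\n        return false;\n    }\n\n    static void Main()\n    {\n        int[] data = { 1, 2, 3, 42 };\n        \/\/ Both shapes compute the same answer; only the generated code differs.\n        Console.WriteLine(ContainsHoisted(data, 42) == ContainsDirect(data, 42));\n    }\n}<\/code><\/pre>\n<p>The trade-off may differ for reference types, so this is best treated as a pattern to measure rather than a rule to apply everywhere.<\/p>\n<p>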
A similar change was made in PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/25097\">dotnet\/corefx#25097<\/a>, in order to benefit from this improvement in LINQ&#8217;s Enumerable.Contains. However, there are many places where this JIT improvement does simply improve existing code, without any required changes. (There are also places where there are known further improvements to be made, e.g. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/issues\/17273\">dotnet\/coreclr#17273<\/a>.)<\/p>\n<p>In the previous discussion, I mentioned &#8220;intrinsics&#8221; and the ability for the JIT to recognize and special-case certain methods in order to help it better optimize for specific uses. .NET Core 2.1 sees additional intrinsic work, including for some long-standing but rather poorly performing methods in .NET. A key example is <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.enum.hasflag\" rel=\"nofollow\"><code>Enum.HasFlag<\/code><\/a>. This method should be simple, just doing a bit flag test to see whether a given enum value contains another, but because of how this API is defined, it&#8217;s relatively expensive to use. No more. In .NET Core 2.1 <code>Enum.HasFlag<\/code> is now a JIT intrinsic, such that the JIT generates the same quality code you would write by hand if you were doing manual bit flag testing. 
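<\/p>\n<p>For readers who have not written the manual form, a small sketch of the two equivalent tests follows. The <code>Access<\/code> enum and the <code>HasFlagSketch<\/code> class are hypothetical names chosen for illustration:<\/p>\n<pre><code>using System;\n\n\/\/ Hypothetical flags enum, for illustration only.\n[Flags]\nenum Access { None = 0, Read = 1, Write = 2, Execute = 4 }\n\nstatic class HasFlagSketch\n{\n    static void Main()\n    {\n        Access granted = Access.Read | Access.Write;\n\n        \/\/ The readable form.\n        bool viaHasFlag = granted.HasFlag(Access.Write);\n\n        \/\/ The hand-written bit test it is being compared against.\n        bool viaMask = (granted &amp; Access.Write) == Access.Write;\n\n        \/\/ The two tests agree on the result.\n        Console.WriteLine(viaHasFlag == viaMask);\n    }\n}<\/code><\/pre>\n<p>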
The evidence of this is in a simple benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/ba6dfb761fca5637912beb48db45727d.js\"><\/script><\/p>\n<p>On this test, I get results like the following, showing a 100% reduction in allocation (from 48 bytes per call to 0 bytes per call) and an ~50x improvement in throughput:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>EnumHasFlag<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">14.9214 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<tr>\n<td>EnumHasFlag<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">0.2932 ns<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This is an example where developers that cared about performance had to avoid writing code a certain way and can now write code that&#8217;s both maintainable and efficient, and also helps unaware developers fall into a &#8220;pit of success&#8221;. (Incidentally, this is also a case where Mono already <a href=\"http:\/\/www.mono-project.com\/docs\/about-mono\/releases\/4.0.0\/\" rel=\"nofollow\">had this optimization<\/a>.)<\/p>\n<p>Another example of this isn&#8217;t specific to a given API, but rather applies to the general shape of code. Consider the following implementation of string equality:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/f866942a9982712cf77d563b41b998d3.js\"><\/script><\/p>\n<p>Unfortunately, on previous releases of .NET, the code generated here was suboptimal, in particular due to the early exit from within the loop. 
Developers that cared about performance had to write this kind of loop in a specialized way, using <code>goto<\/code>s, for example as seen in the .NET Core 2.0 implementation of <code>String<\/code>&#8217;s <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/2e651d0b3eb5593f69d91c77b9d91556dccddf51\/src\/mscorlib\/src\/System\/String.Comparison.cs#L21\">CompareOrdinalIgnoreCaseHelper method<\/a>. In .NET Core 2.1, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13314\">dotnet\/coreclr#13314<\/a> rearranges basic blocks in loops to avoid needing such workarounds. You can see in .NET Core 2.1 that <code>goto<\/code> in CompareOrdinalIgnoreCaseHelper is <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/b3dff4d441013cbfa39ce209f45ba96d605d8e77\/src\/mscorlib\/shared\/System\/String.Comparison.cs#L47\">now gone<\/a>, and the shown benchmark is almost double the throughput of what it was in the previous release:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LoopBodyLayout<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">56.30 ns<\/td>\n<\/tr>\n<tr>\n<td>LoopBodyLayout<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">30.49 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Of course, folks contributing to the JIT don&#8217;t just care about such macro-level enhancements to the JIT, but also to improvements as low-level as tuning what instructions are generated for specific operations. For example, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13626\">dotnet\/coreclr#13626<\/a> from <a href=\"https:\/\/github.com\/mikedn\">@mikedn<\/a> enables the JIT to generate the more efficient BT instruction in some situations where TEST and LSH were otherwise being used. 
The impact of that can be seen on this benchmark extracted from that PR&#8217;s comments:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/536164a4ef3727ede0ba743669e896a0.js\"><\/script><\/p>\n<p>where with this change, .NET Core 2.1 executes this benchmark in roughly 25% less time than .NET Core 2.0 did:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LoweringTESTtoBT<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">1.414 ns<\/td>\n<\/tr>\n<tr>\n<td>LoweringTESTtoBT<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">1.057 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The JIT also saw a variety of improvements in .NET Core 2.1 around boxing. One of my personal favorites (because of the impact it has on async methods, to be discussed later in this post) is PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14698\">dotnet\/coreclr#14698<\/a> (and a follow-up PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17006\">dotnet\/coreclr#17006<\/a>), which enables writing code that would have previously allocated and now doesn&#8217;t. Consider this benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/74cb1d5b7f004bac397c8b292855dc79.js\"><\/script><\/p>\n<p>In it, we&#8217;ve got an <code>IAnimal<\/code> with a <code>MakeSound<\/code> method, and we&#8217;ve got a method that wants to accept an arbitrary <code>T<\/code>, test to see whether it&#8217;s an <code>IAnimal<\/code> (it might be something else), and if it is, call its <code>MakeSound<\/code> method. Prior to .NET Core 2.1, this allocates, because in order to get the <code>T<\/code> as an <code>IAnimal<\/code> on which I can call <code>MakeSound<\/code>, the <code>T<\/code> needs to be cast to the interface, which for a value type results in it being boxed, and therefore allocates. 
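<\/p>\n<p>The shape of code in question can be sketched roughly as follows. This is an illustrative reconstruction rather than the benchmark itself; the <code>Dog<\/code> struct, the <code>BoxingSketch<\/code> class, and the <code>MakeSoundIfAnimal<\/code> method are hypothetical names:<\/p>\n<pre><code>using System;\n\ninterface IAnimal { void MakeSound(); }\n\n\/\/ Hypothetical value type implementing the interface.\nstruct Dog : IAnimal\n{\n    public void MakeSound() { Console.WriteLine(\"Woof\"); }\n}\n\nstatic class BoxingSketch\n{\n    static void MakeSoundIfAnimal&lt;T&gt;(T thing)\n    {\n        \/\/ When T is a struct, converting it to IAnimal is a boxing conversion.\n        if (thing is IAnimal)\n        {\n            ((IAnimal)thing).MakeSound();\n        }\n    }\n\n    static void Main()\n    {\n        MakeSoundIfAnimal(new Dog()); \/\/ takes the IAnimal branch\n        MakeSoundIfAnimal(123);       \/\/ int is not an IAnimal, so nothing happens\n    }\n}<\/code><\/pre>\n<p>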
In .NET Core 2.1, though, this pattern is recognized, and the JIT is able not only to undo the boxing, but also then devirtualize and inline the callee. The impact of this can be substantial when this kind of pattern shows up on hot paths. Here are the benchmark results, highlighting a significant improvement in throughput and an elimination of the boxing allocations:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>BoxingAllocations<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">12.444 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<tr>\n<td>BoxingAllocations<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">1.391 ns<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This highlights just some of the improvements that have gone into the JIT in .NET Core 2.1. And while each is impressive in its own right, the whole is greater than the sum of the parts, as work was done to ensure that all of these optimizations, from devirtualization, to boxing removal, to invocation of the unboxed entry, to inlining, to struct promotion, to copy prop through promoted fields, to cleaning up after unused struct locals, and so on, all play nicely together. 
Consider this example provided by <a href=\"https:\/\/github.com\/andyayersms\">@AndyAyersMS<\/a>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/d93d64b9999d6dd944193121d318c56f.js\"><\/script><\/p>\n<p>In .NET Core 2.0, this resulted in the following assembly code generated:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/dd705708f43d2fd89a4b1e99eb1ee934.js\"><\/script><\/p>\n<p>In contrast, in .NET Core 2.1, that&#8217;s all consolidated to this being generated for Main:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/423fb4ad09e90fb6af797195cadec368.js\"><\/script><\/p>\n<p>Very nice.<\/p>\n<h2><a id=\"user-content-threading\" class=\"anchor\" href=\"#threading\"><\/a>Threading<\/h2>\n<p>Improvements to the JIT are an example of changes that can have very broad impact over large swaths of code. So, too, are changes to the runtime, and one key area where the runtime has seen significant improvements is in the area of threading. These improvements have come in a variety of forms, whether in reducing the overhead of low-level operations, or reducing lock contention in commonly used threading primitives, or reducing allocation, or generally improving the infrastructure behind async methods. Let&#8217;s look at a few examples.<\/p>\n<p>A key need in writing scalable code is taking advantage of thread statics, which are fields unique to each thread. The overhead involved in accessing a thread static is greater than that for normal statics, and it&#8217;s important that this be as low as possible as lots of functionality, in the runtime, in the core libraries, and in user code, depends on them, often on hot paths (for example, <code>Int32.Parse(string)<\/code> looks up the current culture, which is stored in a thread static). 
PRs <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14398\">dotnet\/coreclr#14398<\/a> and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14560\">dotnet\/coreclr#14560<\/a> significantly reduced this overhead involved in accessing thread statics. So, for example, this benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/3764bb2abf8c173bd695ab8161da6670.js\"><\/script><\/p>\n<p>yields these results on my machine:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ThreadStatics<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">7.322 ns<\/td>\n<\/tr>\n<tr>\n<td>ThreadStatics<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">5.269 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Whereas these thread statics changes were focused on improving the throughput of an individual piece of code, other changes focused on scalability and minimizing contention between pieces of code, in various ways. For example, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14216\">dotnet\/coreclr#14216<\/a> focused on costs involved in <code>Monitor<\/code> (what&#8217;s used under the covers by <code>lock<\/code> in C#) when there&#8217;s contention, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13243\">dotnet\/coreclr#13243<\/a> focused on the scalability of <code>ReaderWriterLockSlim<\/code>, and PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14527\">dotnet\/coreclr#14527<\/a> focused on reducing the contention in <code>Timer<\/code>s. Let&#8217;s take the last one as an example. Whenever a <code>System.Threading.Timer<\/code> is created, modified, fired, or removed, in .NET Core 2.0 that required taking a global timers lock; that meant that code which created lots of timers quickly would often end up serializing on this lock. 
To address this, .NET Core 2.1 partitions the timers across multiple locks, so that different threads running on different cores are less likely to contend with each other. The impact of that is visible in a benchmark like the following:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/6422d7d6aae0ddf9c4f808b2ed7c1a37.js\"><\/script><\/p>\n<p>This spawns multiple tasks, each of which creates a timer, does a bit of work, and then deletes the timer, and it yields the following results on my quad-core:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>TimerContention<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">332.8 ms<\/td>\n<\/tr>\n<tr>\n<td>TimerContention<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">135.6 ms<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Another significant improvement came in the form of both throughput improvement and allocation reduction, in <code>CancellationTokenSource<\/code>. <code>CancellationToken<\/code>s have become ubiquitous throughout the framework, in particular in asynchronous methods. It&#8217;s often the case that a single token will be created for the lifetime of some composite operation (e.g. the handling of a web request), and over its lifetime, it&#8217;ll be passed in and out of many sub-operations, each of which will <code>Register<\/code> a callback with the token for the duration of that sub-operation. In .NET Core 2.0 and previous .NET releases, the implementation was heavily focused on getting as much scalability as possible, achieved via a set of lock-free algorithms that were scalable but that incurred non-trivial costs in both throughput and allocation, so much so that it overshadowed the benefits of the lock-freedom. 
The associated level of scalability is also generally unnecessary, as the primary use case for a single <code>CancellationToken<\/code> does not involve many parallel operations, but instead many serialized operations one after the other. In .NET Core 2.1, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/12819\">dotnet\/coreclr#12819<\/a> changed the implementation to prioritize the more common scenarios; it&#8217;s still very scalable, but by switching away from a lock-free algorithm to one that instead employed striped locking (as in the <code>Timer<\/code> case), we significantly reduced allocations and improved throughput while still meeting scalability goals. These improvements can be seen from the following single-threaded benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/05510b42b90f8fbad328e544963fd99f.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SerialCancellationTokenRegistration<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">95.29 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<tr>\n<td>SerialCancellationTokenRegistration<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">62.45 ns<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>and also from this multi-threaded one (run on a quad-core):<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/024ac664f88177cc24adf941fb00b4bc.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ParallelCancellationTokenRegistration<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">31.31 ns<\/td>\n<\/tr>\n<tr>\n<td>ParallelCancellationTokenRegistration<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">18.19 
ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>These improvements to <code>CancellationToken<\/code> are just a piece of a larger set of improvements that have gone into async methods in .NET Core 2.1. As more and more code is written to be asynchronous and to use C#&#8217;s <code>async<\/code>\/<code>await<\/code> features, it becomes more and more important that async methods introduce as little overhead as possible. Some significant strides in that regard have been taken in .NET Core 2.1, on a variety of fronts.<\/p>\n<p>For example, on very hot paths that invoke asynchronous methods, one cost that shows up is simply the overhead involved in invoking an async method and awaiting it, in particular when it completes quickly and synchronously. In part due to the aforementioned JIT and thread static changes, and in part due to PRs like <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15629\">dotnet\/coreclr#15629<\/a> from <a href=\"https:\/\/github.com\/benaadams\">@benaadams<\/a>, this overhead has been cut by ~30%:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/544274d785011fdb91b62424ba7c1f32.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>AsyncMethodAwaitInvocation<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">20.36 ns<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<tr>\n<td>AsyncMethodAwaitInvocation<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">13.48 ns<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Bigger improvements, however, have come in the form of allocation reduction. 
In previous releases of .NET, the synchronous completion path for async methods was optimized for allocations, meaning that if an async method completed without ever suspending, it either wouldn&#8217;t allocate at all or at most would allocate one object (for the returned <code>Task&lt;T&gt;<\/code> if an internally cached one wasn&#8217;t available). However, asynchronous completion (where it suspends at least once) would incur multiple allocations. The first allocation would be for the returned <code>Task<\/code>\/<code>Task&lt;T&gt;<\/code> object, as the caller needs some object to hold onto to be able to know when the asynchronous operation has completed and to extract its result or exception. The second allocation is the boxing of the compiler-generated state machine: the &#8220;locals&#8221; for the async method start out on the stack, but when the method suspends, the state machine that contains these &#8220;locals&#8221; as fields gets boxed to the heap so that the data can survive across the await point. The third allocation is the <code>Action<\/code> delegate that&#8217;s passed to an awaiter and that&#8217;s used to move the state machine forward when the awaited object completes. And the fourth is a &#8220;runner&#8221; that stores additional context (e.g. ExecutionContext). These allocations can be seen by looking at a memory trace. 
For example, if we run this code:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/c3eded7eb8bd989af07c24020dc92cbb.js\"><\/script><\/p>\n<p>and look at the results from the Visual Studio allocation profiler, in .NET Core 2.0 we see these allocations associated with the async infrastructure:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2018\/04\/AsyncMethodAllocations_Before.png\" \/><\/p>\n<p>Due to PRs like <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13105\">dotnet\/coreclr#13105<\/a>, <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14178\">dotnet\/coreclr#14178<\/a> and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13907\">dotnet\/coreclr#13907<\/a>, the previous trace when run with .NET Core 2.1 instead looks like this:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2018\/04\/AsyncMethodAllocations_After.png\" \/><\/p>\n<p>The four allocations have been reduced to one, and the total bytes allocated has shrunk by half. When async methods are used heavily in an application, that savings adds up quickly. There have also been side benefits to the architectural changes that enabled these savings, including improved debuggability.<\/p>\n<h2><a id=\"user-content-string\" class=\"anchor\" href=\"#string\"><\/a>String<\/h2>\n<p>Moving up the stack, another area that&#8217;s seen a lot of performance love in .NET Core 2.1 is in commonly used primitive types, in particular <code>System.String<\/code>. Whether from vectorization, or using <code>System.Span&lt;T&gt;<\/code> and its optimizations internally, or adding fast paths for common scenarios, or reducing allocations, or simply trimming some fat, a bunch of functionality related to strings has gotten faster in 2.1. 
Let&#8217;s look at a few.<\/p>\n<p><code>String.Equals<\/code> is a workhorse of .NET applications, used for all manner of purposes, and thus it&#8217;s an ideal target for optimization. PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/16994\">dotnet\/coreclr#16994<\/a> improved the performance of <code>String.Equals<\/code> by vectorizing it, utilizing the already vectorized implementation of <code>Span&lt;T&gt;.SequenceEqual<\/code> as its core implementation. The effect can be seen here, in the comparison of two strings that differ only in their last character:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/b0c33a85e17b807317556f0cfe129a0b.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringEquals<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">16.16 ns<\/td>\n<\/tr>\n<tr>\n<td>StringEquals<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">10.20 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><code>String.IndexOf<\/code> and <code>String.LastIndexOf<\/code> are similarly vectorized with PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/16392\">dotnet\/coreclr#16392<\/a>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/2467bed663031103c6c714fee0b2f01b.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringIndexOf<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">41.13 ns<\/td>\n<\/tr>\n<tr>\n<td>StringIndexOf<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">15.94 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><code>String.IndexOfAny<\/code> was also optimized. 
In contrast to the previous PRs that improved performance via vectorization, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13219\">dotnet\/coreclr#13219<\/a> from <a href=\"https:\/\/github.com\/bbowyersmyth\">@bbowyersmyth<\/a> improves the performance of <code>IndexOfAny<\/code> by special-casing the most commonly-used lengths of the <code>anyOf<\/code> characters array and adding fast-paths for them:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/ef05c568d0788456861aab095957af65.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>IndexOfAny<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">94.66 ns<\/td>\n<\/tr>\n<tr>\n<td>IndexOfAny<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">38.27 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><code>String.ToLower<\/code> and <code>ToUpper<\/code> (as well as the <code>ToLower\/UpperInvariant<\/code> varieties) were improved in PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17391\">dotnet\/coreclr#17391<\/a>. As with the previous PR, these were improved by adding fast-paths for common cases. First, if the string passed in is entirely ASCII, then it does all of the computation in managed code and avoids calling out to the native globalization library to do the casing. 
This in and of itself yields a significant throughput improvement, e.g.<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/ac18829a881da4fb7ffd688ac80b2867.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringToLowerChangesNeeded<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">187.00 ns<\/td>\n<td align=\"right\">144 B<\/td>\n<\/tr>\n<tr>\n<td>StringToLowerChangesNeeded<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">96.29 ns<\/td>\n<td align=\"right\">144 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>But things look even better when the string is already in the target casing:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/5a2beec8cce09caeabb93d6cec9071ef.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringToLowerAlreadyCased<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">197.21 ns<\/td>\n<td align=\"right\">144 B<\/td>\n<\/tr>\n<tr>\n<td>StringToLowerAlreadyCased<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">68.81 ns<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In particular, note that all allocation has been eliminated.<\/p>\n<p>Another very common <code>String<\/code> API was improved to reduce allocation while also improving throughput. 
In .NET Core 2.0, <code>String.Split<\/code> allocates an <code>Int32[]<\/code> to track split locations in the string; PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15435\">dotnet\/coreclr#15435<\/a> from <a href=\"https:\/\/github.com\/dotnet\/\">@cod7alex<\/a> removed that and replaced it with either stack allocation or usage of <code>ArrayPool&lt;int&gt;.Shared<\/code>, depending on the input string&#8217;s length. Further, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15322\">dotnet\/coreclr#15322<\/a> took advantage of span internally to improve the throughput of several common cases. The results of both of these can be seen in this benchmark:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/913b0d6c85a7a96bbc5307f38d235b7c.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringSplit<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">459.5 ns<\/td>\n<td align=\"right\">1216 B<\/td>\n<\/tr>\n<tr>\n<td>StringSplit<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">305.2 ns<\/td>\n<td align=\"right\">480 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Even some corner cases of <code>String<\/code> usage saw improvements. For example, some developers use <code>String.Concat(IEnumerable&lt;char&gt;)<\/code> as a way to compose characters into strings. 
PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14298\">dotnet\/coreclr#14298<\/a> special-cased <code>T<\/code> == <code>char<\/code> in this overload, yielding some nice throughput and allocation wins:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/d9c956c23702b9c12840c488154dbeff.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringConcatCharEnumerable<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">22.05 us<\/td>\n<td align=\"right\">35.82 KB<\/td>\n<\/tr>\n<tr>\n<td>StringConcatCharEnumerable<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">15.56 us<\/td>\n<td align=\"right\">4.57 KB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a id=\"user-content-formatting-and-parsing\" class=\"anchor\" href=\"#formatting-and-parsing\"><\/a>Formatting and Parsing<\/h2>\n<p>The work done around strings also extends into the broad area of formatting and parsing, work that&#8217;s the bread-and-butter of many applications.<\/p>\n<p>As noted at the beginning of this post, many <code>Span&lt;T&gt;<\/code>-based methods were added across the framework, and while I&#8217;m not going to focus on those here from a new API perspective, the act of adding these APIs helped to improve existing APIs. Some existing APIs were improved by taking advantage of the new <code>Span&lt;T&gt;<\/code>-based methods. For example, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15110\">dotnet\/coreclr#15110<\/a> from <a href=\"https:\/\/github.com\/justinvp\">@justinvp<\/a> utilizes the new <code>Span&lt;T&gt;<\/code>-based <code>TryFormat<\/code> in <code>StringBuilder.AppendFormat<\/code>, which is itself used internally by <code>String.Format<\/code>. 
The usage of <code>Span&lt;T&gt;<\/code> enables the implementation internally to format directly into existing buffers rather than first formatting into allocated strings and then copying those strings to the destination buffer.<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/c3ba6f0e3addde88c5acb9d31dabbb9b.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringFormat<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">196.1 ns<\/td>\n<td align=\"right\">128 B<\/td>\n<\/tr>\n<tr>\n<td>StringFormat<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">151.3 ns<\/td>\n<td align=\"right\">80 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Similarly, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15069\">dotnet\/coreclr#15069<\/a> takes advantage of the <code>Span&lt;T&gt;<\/code>-based methods in various <code>StringBuilder.Append<\/code> overloads, to format the provided value directly into the <code>StringBuilder<\/code>&#8217;s buffer rather than going through a <code>String<\/code>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/faffac735b268f17777f75ca264a37e2.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>StringBuilderAppend<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">6.523 ms<\/td>\n<td align=\"right\">3992000 B<\/td>\n<\/tr>\n<tr>\n<td>StringBuilderAppend<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">3.268 ms<\/td>\n<td align=\"right\">0 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Another way the new <code>Span&lt;T&gt;<\/code>-based methods helped was as a motivational forcing function. 
In the .NET Framework and .NET Core 2.0 and earlier, most of the numeric parsing and formatting code in .NET was implemented in native code. Having that code as C++ made it a lot more difficult to add the new <code>Span&lt;T&gt;<\/code>-based methods, which would ideally share most of their implementation with their <code>String<\/code>-based forebears. However, all of that C++ was previously ported to C# as part of enabling .NET Native, and all of that code then found its way into <a href=\"https:\/\/github.com\/dotnet\/corert\">corert<\/a>, which also shares code with <a href=\"https:\/\/github.com\/dotnet\/coreclr\">coreclr<\/a>. For the .NET Core 2.1 release, we thus deleted most of the native parsing\/formatting code in coreclr and replaced it with the managed port, which is now shared between coreclr and corert. With the implementation in managed code, it was then also easier to iterate and experiment with optimizations, so not only did the code move to managed and not only is it now used for both the <code>String<\/code>-based and <code>Span&lt;T&gt;<\/code>-based implementations, many aspects of it also got faster.<\/p>\n<p>For example, via PRs like <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/15069\">dotnet\/coreclr#15069<\/a> and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17432\">dotnet\/coreclr#17432<\/a>, throughput of <code>Int32.ToString()<\/code> approximately doubled:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/b09005eabea3776aa6e61281d0fccb3c.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Int32Formatting<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">65.27 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<tr>\n<td>Int32Formatting<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">34.88 ns<\/td>\n<td align=\"right\">48 
B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>while via PRs like <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13389\">dotnet\/coreclr#13389<\/a>, Int32 parsing improved by over 20%:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/64d9ba2cd6f31809072dbcfd6d7fccfb.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Int32Parsing<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">96.95 ns<\/td>\n<\/tr>\n<tr>\n<td>Int32Parsing<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">76.99 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>These improvements aren&#8217;t limited to just integral types like <code>Int32<\/code>, <code>UInt32<\/code>, <code>Int64<\/code>, and <code>UInt64<\/code>. <code>Single.ToString()<\/code> and <code>Double.ToString()<\/code> improved as well, in particular on Unix where PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/12894\">dotnet\/coreclr#12894<\/a> from <a href=\"https:\/\/github.com\/mazong1123\">@mazong1123<\/a> provided an entirely new implementation for some very nice wins over the rather slow implementation that was there previously:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/018da8c7c7dcf04bfff313c378b3513e.js\"><\/script><\/p>\n<p><em>Windows<\/em>:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DoubleFormatting<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">448.7 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<tr>\n<td>DoubleFormatting<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">186.8 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>Linux<\/em> (note that my Windows and Linux installations are running on very different setups, so the values 
shouldn&#8217;t be compared across OSes):<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DoubleFormatting<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">2,018.2 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<tr>\n<td>DoubleFormatting<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">258.1 ns<\/td>\n<td align=\"right\">48 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The improvements in 2.1 also apply to less commonly used but still important numerical types, such as via PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/25353\">dotnet\/corefx#25353<\/a> for <code>BigInteger<\/code>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/f18f06656e462f40f8e60403d53d5616.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>BigIntegerFormatting<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">36.677 us<\/td>\n<td align=\"right\">34.73 KB<\/td>\n<\/tr>\n<tr>\n<td>BigIntegerFormatting<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">3.119 us<\/td>\n<td align=\"right\">3.27 KB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Note both the 10x improvement in throughput and 10x reduction in allocation.<\/p>\n<p>These improvements continue with other parsing and formatting routines. 
For example, in services in particular, <code>DateTime<\/code> and <code>DateTimeOffset<\/code> are often formatted using either the <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/standard\/base-types\/standard-date-and-time-format-strings#RFC1123\" rel=\"nofollow\"><code>\"r\"<\/code><\/a> or <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/standard\/base-types\/standard-date-and-time-format-strings#Roundtrip\" rel=\"nofollow\"><code>\"o\"<\/code><\/a> formats, both of which have been optimized in .NET Core 2.1, via PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17092\">dotnet\/coreclr#17092<\/a>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/81b51c6f60a2c6a0937d0fc6f85b20de.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DateTimeOffsetFormatR<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">220.89 ns<\/td>\n<td align=\"right\">88 B<\/td>\n<\/tr>\n<tr>\n<td>DateTimeOffsetFormatR<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">64.60 ns<\/td>\n<td align=\"right\">88 B<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>DateTimeOffsetFormatO<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">263.45 ns<\/td>\n<td align=\"right\">96 B<\/td>\n<\/tr>\n<tr>\n<td>DateTimeOffsetFormatO<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">104.66 ns<\/td>\n<td align=\"right\">96 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Even <code>System.Convert<\/code> has gotten in on the formatting and parsing performance fun, with parsing from Base64 via <code>FromBase64Chars<\/code> and <code>FromBase64String<\/code> getting significant speedups, thanks to PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17033\">dotnet\/coreclr#17033<\/a>:<\/p>\n<p><script 
src=\"https:\/\/gist.github.com\/stephentoub\/cc7cb8174bf26f4274f137bd7e8c72f0.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ConvertFromBase64String<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">45.99 us<\/td>\n<td align=\"right\">9.79 KB<\/td>\n<\/tr>\n<tr>\n<td>ConvertFromBase64String<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">29.86 us<\/td>\n<td align=\"right\">9.79 KB<\/td>\n<\/tr>\n<tr>\n<td><\/td>\n<td><\/td>\n<td align=\"right\"><\/td>\n<td align=\"right\"><\/td>\n<\/tr>\n<tr>\n<td>ConvertFromBase64Chars<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">46.34 us<\/td>\n<td align=\"right\">9.79 KB<\/td>\n<\/tr>\n<tr>\n<td>ConvertFromBase64Chars<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">29.51 us<\/td>\n<td align=\"right\">9.79 KB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a id=\"user-content-networking\" class=\"anchor\" href=\"#networking\"><\/a>Networking<\/h2>\n<p>The System.Net libraries received some good performance attention in .NET Core 2.0, but significantly more so in .NET Core 2.1.<\/p>\n<p>There have been some nice improvements throughout the libraries, such as PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/26850\">dotnet\/corefx#26850<\/a> from <a href=\"https:\/\/github.com\/JeffCyr\">@JeffCyr<\/a> improving <code>Dns.GetHostAddressesAsync<\/code> on Windows with a true asynchronous implementation, or PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/26303\">dotnet\/corefx#26303<\/a> providing an optimized endian-reversing routine which was then used by PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/26329\">dotnet\/corefx#26329<\/a> from <a href=\"https:\/\/github.com\/justinvp\">@justinvp<\/a> to optimize <code>IPAddress.HostToNetworkOrder<\/code>\/<code>NetworkToHostOrder<\/code>:<\/p>\n<p><script 
src=\"https:\/\/gist.github.com\/stephentoub\/5e8001eeb31b9142588f2c4fc29480f7.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>NetworkToHostOrder<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">10.760 ns<\/td>\n<\/tr>\n<tr>\n<td>NetworkToHostOrder<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">1.461 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>or in PRs like <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/28086\">dotnet\/corefx#28086<\/a>, <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/28084\">dotnet\/corefx#28084<\/a>, and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/22872\">dotnet\/corefx#22872<\/a> avoiding allocations in <code>Uri<\/code>:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/92f42725ba088be8b0eb40ba387b5f80.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>UriAllocations<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">997.6 ns<\/td>\n<td align=\"right\">1168 B<\/td>\n<\/tr>\n<tr>\n<td>UriAllocations<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">650.6 ns<\/td>\n<td align=\"right\">672 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>But the most impactful changes have come in higher-level types, in particular in <code>Socket<\/code>, <code>SslStream<\/code>, and <code>HttpClient<\/code>.<\/p>\n<p>At the sockets layer, there have been a variety of improvements, but the impact is most noticeable on Unix, where PRs like <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/23115\">dotnet\/corefx#23115<\/a> and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/25402\">dotnet\/corefx#25402<\/a> overhauled how socket operations are processed and the allocations they incur. 
This is visible in the following benchmark that repeatedly does receives that will always complete asynchronously, followed by sends to satisfy them, and which sees a 2x improvement in throughput:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/84b78a0152d746f023d64220396d31f0.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>SocketReceiveThenSend<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">102.82 ms<\/td>\n<\/tr>\n<tr>\n<td>SocketReceiveThenSend<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">48.95 ms<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Often used on top of sockets and <code>NetworkStream<\/code>, <code>SslStream<\/code> was improved significantly in .NET Core 2.1, as well, in a few ways. First, PRs like <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/24497\">dotnet\/corefx#24497<\/a> and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/23715\">dotnet\/corefx#23715<\/a> from <a href=\"https:\/\/github.com\/Drawaes\">@Drawaes<\/a>, as well as <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/22304\">dotnet\/corefx#22304<\/a> and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/29031\">dotnet\/corefx#29031<\/a> helped to clean up the <code>SslStream<\/code> codebase, making it easier to improve in the future but also removing a bunch of allocations (above and beyond the significant allocation reductions that were seen in .NET Core 2.0). Second, though, a significant scalability bottleneck in <code>SslStream<\/code> on Unix was fixed in PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/25646\">dotnet\/corefx#25646<\/a> from <a href=\"https:\/\/github.com\/Drawaes\">@Drawaes<\/a>, such that <code>SslStream<\/code> now scales well on Unix as concurrent usage increases. 
This, in concert with the sockets improvements and other lower-level improvements, contributes to the managed implementation beneath <code>HttpClient<\/code>.<\/p>\n<p><code>HttpClient<\/code> is a thin wrapper around an <code>HttpMessageHandler<\/code>, a public abstract class that represents an implementation of an HTTP client. A general-purpose implementation of <code>HttpMessageHandler<\/code> is provided in the form of the derived <code>HttpClientHandler<\/code> class, and while it&#8217;s possible to construct and pass a handler like <code>HttpClientHandler<\/code> to an <code>HttpClient<\/code> constructor (generally done to be able to configure the handler via its properties), <code>HttpClient<\/code> also provides a parameterless constructor that uses <code>HttpClientHandler<\/code> implicitly. In .NET Core 2.0 and earlier, <code>HttpClientHandler<\/code> was implemented on Windows on top of the native WinHTTP library, and it was implemented on Unix on top of the libcurl library. That dependency on the underlying external library has led to a variety of problems, including different behaviors across platforms and OS distributions as well as limited functionality on some platforms. In .NET Core 2.1, <code>HttpClientHandler<\/code> has a new default implementation implemented from scratch entirely in C# on top of the other System.Net libraries, e.g. System.Net.Sockets, System.Net.Security, etc. 
Not only does this address the aforementioned behavioral issues, it provides a significant boost in performance (the implementation is also exposed publicly as <code>SocketsHttpHandler<\/code>, which can be used directly instead of via <code>HttpClientHandler<\/code> in order to configure <code>SocketsHttpHandler<\/code>-specific properties).<\/p>\n<p>Here&#8217;s an example benchmark making a bunch of concurrent HTTPS calls to an in-process socket server:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/0a02b91c8ea2cf2914d52750458fa165.js\"><\/script><\/p>\n<p>On an 8-core Windows machine, here are my results:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Gen 0<\/th>\n<th align=\"right\">Gen 1<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ConcurrentHttpsGets<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">228.03 ms<\/td>\n<td align=\"right\">1250.0000<\/td>\n<td align=\"right\">312.5000<\/td>\n<\/tr>\n<tr>\n<td>ConcurrentHttpsGets<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">17.93 ms<\/td>\n<td align=\"right\">656.2500<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>That&#8217;s a 12.7x improvement in throughput and a huge reduction in garbage collections, even though the .NET Core 2.0 implementation has most of the logic in native rather than managed code! 
Similarly, on an 8-core Linux machine, here are my results:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Gen 0<\/th>\n<th align=\"right\">Gen 1<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>ConcurrentHttpsGets<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">135.46 ms<\/td>\n<td align=\"right\">750.0000<\/td>\n<td align=\"right\">250.0000<\/td>\n<\/tr>\n<tr>\n<td>ConcurrentHttpsGets<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">21.83 ms<\/td>\n<td align=\"right\">343.7500<\/td>\n<td align=\"right\">&#8211;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Again, huge improvement!<\/p>\n<h2><a id=\"user-content-and-more\" class=\"anchor\" href=\"#and-more\"><\/a>And More<\/h2>\n<p>Through this post I aimed to categorize and group various performance changes to highlight areas of concentrated improvement, but I also want to highlight that performance work has happened across the breadth of runtime and libraries, beyond this limited categorization. I&#8217;ve picked a few other examples to highlight some of the changes made elsewhere in the libraries throughout the stack.<\/p>\n<p>One particularly nice set of improvements came to file system enumeration support, in PRs <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/26806\">dotnet\/corefx#26806<\/a> and <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/25426\">dotnet\/corefx#25426<\/a>. This work has made enumerating directories and files not only faster but also with significantly less garbage left in its wake. 
Here&#8217;s an example enumerating all of the files in my <a href=\"https:\/\/github.com\/dotnet\/corefx\/tree\/master\/src\/System.IO.FileSystem\">System.IO.FileSystem library folder<\/a> from my corefx repo clone (obviously if you try this one out locally, you&#8217;ll need to update the path to whatever works on your machine):<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/f3ad0d33fc34b6ebcacf69c754a04d1a.js\"><\/script><\/p>\n<p>The improvements are particularly stark on Windows, where this benchmark shows a 3x improvement in throughput and a 50% reduction in allocation:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>EnumerateFiles<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">1,982.6 us<\/td>\n<td align=\"right\">71.65 KB<\/td>\n<\/tr>\n<tr>\n<td>EnumerateFiles<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">650.1 us<\/td>\n<td align=\"right\">35.24 KB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>but also on Unix, where this benchmark (with the path fixed up appropriately) on Linux shows a 15% improvement in throughput and a 31% reduction in allocation:<\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>EnumerateFiles<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">638.0 us<\/td>\n<td align=\"right\">56.09 KB<\/td>\n<\/tr>\n<tr>\n<td>EnumerateFiles<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">539.5 us<\/td>\n<td align=\"right\">38.6 KB<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This change internally benefited from the <code>Span&lt;T&gt;<\/code>-related work done throughout the framework, as did, for example, an improvement to <code>Rfc2898DeriveBytes<\/code> in <code>System.Security.Cryptography<\/code>. 
<code>Rfc2898DeriveBytes<\/code> computes cryptographic hash codes over and over as part of implementing password-based key derivation functionality. In previous releases, each iteration of that algorithm would result in at least one <code>byte[]<\/code> allocation, but now with <code>Span&lt;T&gt;<\/code>-based methods like <code>HashAlgorithm.TryComputeHash<\/code>, due to PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/23269\">dotnet\/corefx#23269<\/a> those allocations are entirely avoided. And that results in dramatic savings, especially for longer iteration counts:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/f0cd266fcd68dc34552e0235ba3ab56f.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<th align=\"right\">Allocated<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DeriveBytes<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">9.199 ms<\/td>\n<td align=\"right\">1120120 B<\/td>\n<\/tr>\n<tr>\n<td>DeriveBytes<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">8.084 ms<\/td>\n<td align=\"right\">176 B<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Effort has also been put into improving places where one platform is more deficient than others. For example, <code>Guid.NewGuid()<\/code> on Unix is considerably slower than it is on Windows. 
And while the gap hasn&#8217;t been entirely closed, as part of removing a dependency on the libuuid library, PR <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/16643\">dotnet\/coreclr#16643<\/a> did significantly improve the throughput of <code>Guid.NewGuid()<\/code> on Unix:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/debfdd9e8d52fbe5505e62c8656d8346.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GuidNewGuid<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">7.179 us<\/td>\n<\/tr>\n<tr>\n<td>GuidNewGuid<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">1.770 us<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The list goes on: improvements to array processing (e.g. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/13962\">dotnet\/coreclr#13962<\/a>), improvements to LINQ (e.g. <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/23368\">dotnet\/corefx#23368<\/a> from <a href=\"https:\/\/github.com\/dnickless\">@dnickless<\/a>), improvements to <code>Environment<\/code> (e.g. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/14502\">dotnet\/coreclr#14502<\/a> from <a href=\"https:\/\/github.com\/justinvp\">@justinvp<\/a>), improvements to collections (e.g. <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/26087\">dotnet\/corefx#26087<\/a> from <a href=\"https:\/\/github.com\/gfoidl\">@gfoidl<\/a>), improvements to globalization (e.g. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17399\">dotnet\/coreclr#17399<\/a>), improvements around pooling (e.g. <a href=\"https:\/\/github.com\/dotnet\/coreclr\/pull\/17078\">dotnet\/coreclr#17078<\/a>), improvements to <code>SqlClient<\/code> (e.g. <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/27758\">dotnet\/corefx#27758<\/a>), improvements to <code>StreamWriter<\/code> and <code>StreamReader<\/code> (e.g. 
<a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/22147\">dotnet\/corefx#22147<\/a>), and on.<\/p>\n<p>Finally, all of the examples shown throughout this post were already at least as good in .NET Core 2.0 (if not significantly better) as in the .NET Framework 4.7, and then .NET Core 2.1 just made things even better. However, there are a few places where features were missing in .NET Core 2.0 and have been brought back in 2.1, including for performance. One notable such improvement is in <code>Regex<\/code>, where the <code>RegexOptions.Compiled<\/code> option was exposed but ignored in .NET Core 2.0. PR <a href=\"https:\/\/github.com\/dotnet\/corefx\/pull\/24158\">dotnet\/corefx#24158<\/a> brought back the in-memory compilation support for <code>Regex<\/code>, enabling the same kinds of throughput improvements here previously available in the .NET Framework:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/stephentoub\/8b962c3e85810cc246a9d8c838b3c066.js\"><\/script><\/p>\n<table border=\"1\" cellpadding=\"5\">\n<thead>\n<tr>\n<th>Method<\/th>\n<th>Toolchain<\/th>\n<th align=\"right\">Mean<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RegexCompiled<\/td>\n<td>.NET Core 2.0<\/td>\n<td align=\"right\">473.7 ns<\/td>\n<\/tr>\n<tr>\n<td>RegexCompiled<\/td>\n<td>.NET Core 2.1<\/td>\n<td align=\"right\">295.2 ns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a id=\"user-content-whats-next\" class=\"anchor\" href=\"#whats-next\"><\/a>What&#8217;s Next?<\/h2>\n<p>Huge &#8220;thank you&#8221;s to everyone who has contributed to this release. As is obvious from this tour, there&#8217;s a lot to look forward to in .NET Core 2.1, and this post only scratched the surface of the improvements coming. 
We look forward to hearing your feedback and to your future contributions in the <a href=\"https:\/\/github.com\/dotnet\/coreclr\">coreclr<\/a>, <a href=\"https:\/\/github.com\/dotnet\/corefx\">corefx<\/a>, and other <a href=\"https:\/\/github.com\/dotnet\">dotnet<\/a> and <a href=\"https:\/\/github.com\/aspnet\">ASP.NET<\/a> repos!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Back before .NET Core 2.0 shipped, I wrote a post highlighting various performance improvements in .NET Core 2.0 when compared with .NET Core 1.1 and the .NET Framework. As .NET Core 2.1 is in its final stages of being released, I thought it would be a good time to have some fun and take a [&hellip;]<\/p>\n","protected":false},"author":360,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685],"tags":[8082],"class_list":["post-17335","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","tag-dotnetperf"],"acf":[],"blog_post_summary":"<p>Back before .NET Core 2.0 shipped, I wrote a post highlighting various performance improvements in .NET Core 2.0 when compared with .NET Core 1.1 and the .NET Framework. 
As .NET Core 2.1 is in its final stages of being released, I thought it would be a good time to have some fun and take a [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/17335","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/360"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=17335"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/17335\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=17335"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=17335"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=17335"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}