{"id":44715,"date":"2023-03-16T04:20:00","date_gmt":"2023-03-16T11:20:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=44715"},"modified":"2023-06-06T17:47:03","modified_gmt":"2023-06-07T00:47:03","slug":"how-async-await-really-works","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/how-async-await-really-works\/","title":{"rendered":"How Async\/Await Really Works in C#"},"content":{"rendered":"<p>Several weeks ago, the <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/\">.NET Blog<\/a> featured a post <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/why-dotnet\/\">What is .NET, and why should you choose it?<\/a>. It provided a high-level overview of the platform, summarizing various components and design decisions, and promising more in-depth posts on the covered areas.  This post is the first such follow-up, deep-diving into the history leading to, the design decisions behind, and implementation details of <code>async<\/code>\/<code>await<\/code> in C# and .NET.<\/p>\n<p>The support for <code>async<\/code>\/<code>await<\/code> has been around now for over a decade. In that time, it&#8217;s transformed how scalable code is written for .NET, and it&#8217;s both viable and extremely common to utilize the functionality without understanding exactly what&#8217;s going on under the covers.  
You start with a synchronous method like the following (this method is &#8220;synchronous&#8221; because a caller will not be able to do anything else until this whole operation completes and control is returned back to the caller):<\/p>\n<pre><code class=\"language-C#\">\/\/ Synchronously copy all data from source to destination.\r\npublic void CopyStreamToStream(Stream source, Stream destination)\r\n{\r\n    var buffer = new byte[0x1000];\r\n    int numRead;\r\n    while ((numRead = source.Read(buffer, 0, buffer.Length)) != 0)\r\n    {\r\n        destination.Write(buffer, 0, numRead);\r\n    }\r\n}<\/code><\/pre>\n<p>Then you sprinkle a few keywords, change a few method names, and you end up with the following asynchronous method instead (this method is &#8220;asynchronous&#8221; because control is expected to be returned back to its caller very quickly and possibly before the work associated with the whole operation has completed):<\/p>\n<pre><code class=\"language-C#\">\/\/ Asynchronously copy all data from source to destination.\r\npublic async Task CopyStreamToStreamAsync(Stream source, Stream destination)\r\n{\r\n    var buffer = new byte[0x1000];\r\n    int numRead;\r\n    while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) != 0)\r\n    {\r\n        await destination.WriteAsync(buffer, 0, numRead);\r\n    }\r\n}<\/code><\/pre>\n<p>Almost identical in syntax, still able to utilize all of the same control flow constructs, but now non-blocking in nature, with a significantly different underlying execution model, and with all the heavy lifting done for you under the covers by the C# compiler and core libraries.<\/p>\n<p>While it&#8217;s common to use this support without knowing exactly what&#8217;s happening under the hood, I&#8217;m a firm believer that understanding how something actually works helps you to make even better use of it.  
For <code>async<\/code>\/<code>await<\/code> in particular, understanding the mechanisms involved is especially helpful when you want to look below the surface, such as when you&#8217;re trying to debug things gone wrong or improve the performance of things otherwise gone right. In this post, then, we&#8217;ll deep-dive into exactly how <code>await<\/code> works at the language, compiler, and library level, so that you can make the most of these valuable features.<\/p>\n<p>To do that well, though, we need to go way back to before <code>async<\/code>\/<code>await<\/code> to understand what state-of-the-art asynchronous code looked like in its absence. Fair warning, it wasn&#8217;t pretty.<\/p>\n<h2>In the beginning&#8230;<\/h2>\n<p>All the way back in .NET Framework 1.0, there was the Asynchronous Programming Model pattern, otherwise known as the APM pattern, otherwise known as the Begin\/End pattern, otherwise known as the <code>IAsyncResult<\/code> pattern.  At a high-level, the pattern is simple.  For a synchronous operation <code>DoStuff<\/code>:<\/p>\n<pre><code class=\"language-C#\">class Handler\r\n{\r\n    public int DoStuff(string arg);\r\n}<\/code><\/pre>\n<p>there would be two corresponding methods as part of the pattern: a <code>BeginDoStuff<\/code> method and an <code>EndDoStuff<\/code> method:<\/p>\n<pre><code class=\"language-C#\">class Handler\r\n{\r\n    public int DoStuff(string arg);\r\n\r\n    public IAsyncResult BeginDoStuff(string arg, AsyncCallback? callback, object? 
state);\r\n    public int EndDoStuff(IAsyncResult asyncResult);\r\n}<\/code><\/pre>\n<p><code>BeginDoStuff<\/code> would accept all of the same parameters as does <code>DoStuff<\/code>, but in addition it would also accept an <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/967a59712996c2cdb8ce2f65fb3167afbd8b01f3\/src\/libraries\/System.Private.CoreLib\/src\/System\/AsyncCallback.cs#L14\"><code>AsyncCallback<\/code><\/a> delegate and an opaque state <code>object<\/code>, one or both of which could be <code>null<\/code>. The Begin method was responsible for initiating the asynchronous operation, and if provided with a callback (often referred to as the &#8220;continuation&#8221; for the initial operation), it was also responsible for ensuring the callback was invoked when the asynchronous operation completed.  The Begin method would also construct an instance of a type that implemented <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/967a59712996c2cdb8ce2f65fb3167afbd8b01f3\/src\/libraries\/System.Private.CoreLib\/src\/System\/IAsyncResult.cs#L17-L27\"><code>IAsyncResult<\/code><\/a>, using the optional <code>state<\/code> to populate that <code>IAsyncResult<\/code>&#8216;s <code>AsyncState<\/code> property:<\/p>\n<pre><code class=\"language-C#\">namespace System\r\n{\r\n    public interface IAsyncResult\r\n    {\r\n        object? AsyncState { get; }\r\n        WaitHandle AsyncWaitHandle { get; }\r\n        bool IsCompleted { get; }\r\n        bool CompletedSynchronously { get; }\r\n    }\r\n\r\n    public delegate void AsyncCallback(IAsyncResult ar);\r\n}<\/code><\/pre>\n<p>This <code>IAsyncResult<\/code> instance would then both be returned from the Begin method as well as passed to the <code>AsyncCallback<\/code> when it was eventually invoked.  
When ready to consume the results of the operation, a caller would then pass that <code>IAsyncResult<\/code> instance to the End method, which was responsible for ensuring the operation was completed (synchronously waiting for it to complete by blocking if it wasn&#8217;t) and then returning any result of the operation, including propagating any errors\/exceptions that may have occurred.  Thus, instead of writing code like the following to perform the operation synchronously:<\/p>\n<pre><code class=\"language-C#\">try\r\n{\r\n    int i = handler.DoStuff(arg); \r\n    Use(i);\r\n}\r\ncatch (Exception e)\r\n{\r\n    ... \/\/ handle exceptions from DoStuff and Use\r\n}<\/code><\/pre>\n<p>the Begin\/End methods could be used in the following manner to perform the same operation asynchronously:<\/p>\n<pre><code class=\"language-C#\">try\r\n{\r\n    handler.BeginDoStuff(arg, iar =&gt;\r\n    {\r\n        try\r\n        {\r\n            Handler handler = (Handler)iar.AsyncState!;\r\n            int i = handler.EndDoStuff(iar);\r\n            Use(i);\r\n        }\r\n        catch (Exception e2)\r\n        {\r\n            ... \/\/ handle exceptions from EndDoStuff and Use\r\n        }\r\n    }, handler);\r\n}\r\ncatch (Exception e)\r\n{\r\n    ... \/\/ handle exceptions thrown from the synchronous call to BeginDoStuff\r\n}<\/code><\/pre>\n<p>For anyone who&#8217;s dealt with callback-based APIs in any language, this should feel familiar.<\/p>\n<p>Things only got more complicated from there, however. For instance, there&#8217;s the issue of &#8220;stack dives.&#8221;  A stack dive is when code repeatedly makes calls that go deeper and deeper on the stack, to the point where it could potentially stack overflow.  The Begin method is allowed to invoke the callback synchronously if the operation completes synchronously, meaning the call to Begin might itself directly invoke the callback.  
And &#8220;asynchronous&#8221; operations that complete synchronously are actually very common; they&#8217;re not &#8220;asynchronous&#8221; because they&#8217;re guaranteed to complete asynchronously but rather are just permitted to. For example, consider an asynchronous read from some networked operation, like receiving from a socket.  If you need only a small amount of data for each individual operation, such as reading some header data from a response, you might put a buffer in place in order to avoid the overhead of lots of system calls. Instead of doing a small read for just the amount of data you need immediately, you perform a larger read into the buffer and then consume data from that buffer until it&#8217;s exhausted; that lets you reduce the number of expensive system calls required to actually interact with the socket.  Such a buffer might exist behind whatever asynchronous abstraction you&#8217;re using, such that the first &#8220;asynchronous&#8221; operation you perform (filling the buffer) completes asynchronously, but then all subsequent operations until that underlying buffer is exhausted don&#8217;t actually need to do any I\/O, instead just pulling from the buffer, and can thus all complete synchronously.  When the Begin method performs one of these operations, and finds it completes synchronously, it can then invoke the callback synchronously.  That means you have one stack frame that called the Begin method, another stack frame for the Begin method itself, and now another stack frame for the callback.  Now what happens if that callback turns around and calls Begin again?  If that operation completes synchronously and its callback is invoked synchronously, you&#8217;re now again several more frames deep on the stack.  And so on, and so on, until eventually you run out of stack.<\/p>\n<p>This is a real possibility that&#8217;s easy to repro.  
Try this program on .NET Core:<\/p>\n<pre><code class=\"language-C#\">using System.Net;\r\nusing System.Net.Sockets;\r\n\r\nusing Socket listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);\r\nlistener.Bind(new IPEndPoint(IPAddress.Loopback, 0));\r\nlistener.Listen();\r\n\r\nusing Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);\r\nclient.Connect(listener.LocalEndPoint!);\r\n\r\nusing Socket server = listener.Accept();\r\n_ = server.SendAsync(new byte[100_000]);\r\n\r\nvar mres = new ManualResetEventSlim();\r\nbyte[] buffer = new byte[1];\r\n\r\nvar stream = new NetworkStream(client);\r\n\r\nvoid ReadAgain()\r\n{\r\n    stream.BeginRead(buffer, 0, 1, iar =&gt;\r\n    {\r\n        if (stream.EndRead(iar) != 0)\r\n        {\r\n            ReadAgain(); \/\/ uh oh!\r\n        }\r\n        else\r\n        {\r\n            mres.Set();\r\n        }\r\n    }, null);\r\n};\r\nReadAgain();\r\n\r\nmres.Wait();<\/code><\/pre>\n<p>Here I&#8217;ve set up a simple client socket and server socket connected to each other.  The server sends 100,000 bytes to the client, which then proceeds to use <code>BeginRead<\/code>\/<code>EndRead<\/code> to consume them &#8220;asynchronously&#8221; one at a time (this is terribly inefficient and is only being done in the name of pedagogy).  The callback passed to <code>BeginRead<\/code> finishes the read by calling <code>EndRead<\/code>, and then if it successfully read the desired byte (in which case it wasn&#8217;t yet at end-of-stream), it issues another <code>BeginRead<\/code> via a recursive call to the <code>ReadAgain<\/code> local function.  However, in .NET Core, socket operations are much faster than they were on .NET Framework, and will complete synchronously if the OS is able to satisfy the operation synchronously (noting the kernel itself has a buffer used to satisfy socket receive operations).  
Thus, this stack overflows:\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2023\/03\/BeginReadStackOverflow.png\" alt=\"Stack overflow due to improper handling of synchronous completion\" \/><\/p>\n<p>So, compensation for this was built into the APM model.  There are two possible ways to compensate for this:<\/p>\n<ol>\n<li>Don&#8217;t allow the <code>AsyncCallback<\/code> to be invoked synchronously.  If it&#8217;s always invoked asynchronously, even if the operation completes synchronously, then the risk of stack dives goes away.  But so too does performance, because operations that complete synchronously (or so quickly that they&#8217;re observably indistinguishable) are very common, and forcing each of those to queue its callback adds measurable overhead.<\/li>\n<li>Employ a mechanism that allows the caller rather than the callback to do the continuation work if the operation completes synchronously. That way, you escape the extra method frame and continue doing the follow-on work no deeper on the stack.<\/li>\n<\/ol>\n<p>The APM pattern goes with option (2). For that, the <code>IAsyncResult<\/code> interface exposes two related but distinct members: <code>IsCompleted<\/code> and <code>CompletedSynchronously<\/code>.  <code>IsCompleted<\/code> tells you whether the operation has completed: you can check it multiple times, and eventually it&#8217;ll transition from <code>false<\/code> to <code>true<\/code> and then stay there.  In contrast, <code>CompletedSynchronously<\/code> never changes (or if it does, it&#8217;s a nasty bug waiting to happen); it&#8217;s used to communicate between the caller of the Begin method and the <code>AsyncCallback<\/code> which of them is responsible for performing any continuation work.  
If <code>CompletedSynchronously<\/code> is <code>false<\/code>, then the operation is completing asynchronously and any continuation work in response to the operation completing should be left up to the callback; after all, if the work didn&#8217;t complete synchronously, the caller of Begin can&#8217;t really handle it because the operation isn&#8217;t known to be done yet (and if the caller were to just call End, it would block until the operation completed).  If, however, <code>CompletedSynchronously<\/code> is <code>true<\/code> and the callback were to handle the continuation work, it would risk a stack dive, as it would be performing that continuation work deeper on the stack than where it started.  Thus, any implementations at all concerned about such stack dives need to examine <code>CompletedSynchronously<\/code> and have the caller of the Begin method do the continuation work if it&#8217;s <code>true<\/code>, which means the callback then needs to <em>not<\/em> do the continuation work.  This is also why <code>CompletedSynchronously<\/code> must never change: the caller and the callback need to see the same value to ensure that the continuation work is performed once and only once, regardless of race conditions.<\/p>\n<p>In our previous <code>DoStuff<\/code> example, that then leads to code like this:<\/p>\n<pre><code class=\"language-C#\">try\r\n{\r\n    IAsyncResult ar = handler.BeginDoStuff(arg, iar =&gt;\r\n    {\r\n        if (!iar.CompletedSynchronously)\r\n        {\r\n            try\r\n            {\r\n                Handler handler = (Handler)iar.AsyncState!;\r\n                int i = handler.EndDoStuff(iar);\r\n                Use(i);\r\n            }\r\n            catch (Exception e2)\r\n            {\r\n                ... 
\/\/ handle exceptions from EndDoStuff and Use\r\n            }\r\n        }\r\n    }, handler);\r\n    if (ar.CompletedSynchronously)\r\n    {\r\n        int i = handler.EndDoStuff(ar);\r\n        Use(i);\r\n    }\r\n}\r\ncatch (Exception e)\r\n{\r\n    ... \/\/ handle exceptions that emerge synchronously from BeginDoStuff and possibly EndDoStuff\/Use\r\n}<\/code><\/pre>\n<p>That&#8217;s a mouthful.  And so far we&#8217;ve only looked at consuming the pattern&#8230; we haven&#8217;t looked at implementing the pattern.  While most developers wouldn&#8217;t need to be concerned about leaf operations (e.g. implementing the actual <code>Socket.BeginReceive<\/code>\/<code>EndReceive<\/code> methods that interact with the operating system), many, many developers would need to be concerned with composing these operations (performing multiple asynchronous operations that together form a larger one), which means not only consuming other Begin\/End methods but also implementing them yourself so that your composition itself can be consumed elsewhere.  And, you&#8217;ll notice there was no control flow in my previous <code>DoStuff<\/code> example.  Introduce multiple operations into this, especially with even simple control flow like a loop, and all of a sudden this becomes the domain of experts that enjoy pain, or blog post authors trying to make a point.<\/p>\n<p>So just to drive that point home, let&#8217;s implement a complete example.  
At the beginning of this post, I showed a <code>CopyStreamToStream<\/code> method that copies all of the data from one stream to another (\u00e0 la <code>Stream.CopyTo<\/code>, but, for the sake of explanation, assuming that doesn&#8217;t exist):<\/p>\n<pre><code class=\"language-C#\">public void CopyStreamToStream(Stream source, Stream destination)\r\n{\r\n    var buffer = new byte[0x1000];\r\n    int numRead;\r\n    while ((numRead = source.Read(buffer, 0, buffer.Length)) != 0)\r\n    {\r\n        destination.Write(buffer, 0, numRead);\r\n    }\r\n}<\/code><\/pre>\n<p>Straightforward: we repeatedly read from one stream and then write the resulting data to the other, read from one stream and write to the other, and so on, until we have no more data to read.  Now, how would we implement this asynchronously using the APM pattern?  Something like this:<\/p>\n<pre><code class=\"language-C#\">public IAsyncResult BeginCopyStreamToStream(\r\n    Stream source, Stream destination,\r\n    AsyncCallback callback, object state)\r\n{\r\n    var ar = new MyAsyncResult(state);\r\n    var buffer = new byte[0x1000];\r\n\r\n    Action&lt;IAsyncResult?&gt; readWriteLoop = null!;\r\n    readWriteLoop = iar =&gt;\r\n    {\r\n        try\r\n        {\r\n            for (bool isRead = iar == null; ; isRead = !isRead)\r\n            {\r\n                if (isRead)\r\n                {\r\n                    iar = source.BeginRead(buffer, 0, buffer.Length, static readResult =&gt;\r\n                    {\r\n                        if (!readResult.CompletedSynchronously)\r\n                        {\r\n                            ((Action&lt;IAsyncResult?&gt;)readResult.AsyncState!)(readResult);\r\n                        }\r\n                    }, readWriteLoop);\r\n\r\n                    if (!iar.CompletedSynchronously)\r\n                    {\r\n                        return;\r\n                    }\r\n                }\r\n                else\r\n                {\r\n             
       int numRead = source.EndRead(iar!);\r\n                    if (numRead == 0)\r\n                    {\r\n                        ar.Complete(null);\r\n                        callback?.Invoke(ar);\r\n                        return;\r\n                    }\r\n\r\n                    iar = destination.BeginWrite(buffer, 0, numRead, writeResult =&gt;\r\n                    {\r\n                        if (!writeResult.CompletedSynchronously)\r\n                        {\r\n                            try\r\n                            {\r\n                                destination.EndWrite(writeResult);\r\n                                readWriteLoop(null);\r\n                            }\r\n                            catch (Exception e2)\r\n                            {\r\n                                ar.Complete(e2);\r\n                                callback?.Invoke(ar);\r\n                            }\r\n                        }\r\n                    }, null);\r\n\r\n                    if (!iar.CompletedSynchronously)\r\n                    {\r\n                        return;\r\n                    }\r\n\r\n                    destination.EndWrite(iar);\r\n                }\r\n            }\r\n        }\r\n        catch (Exception e)\r\n        {\r\n            ar.Complete(e);\r\n            callback?.Invoke(ar);\r\n        }\r\n    };\r\n\r\n    readWriteLoop(null);\r\n\r\n    return ar;\r\n}\r\n\r\npublic void EndCopyStreamToStream(IAsyncResult asyncResult)\r\n{\r\n    if (asyncResult is not MyAsyncResult ar)\r\n    {\r\n        throw new ArgumentException(null, nameof(asyncResult));\r\n    }\r\n\r\n    ar.Wait();\r\n}\r\n\r\nprivate sealed class MyAsyncResult : IAsyncResult\r\n{\r\n    private bool _completed;\r\n    private int _completedSynchronously;\r\n    private ManualResetEvent? _event;\r\n    private Exception? _error;\r\n\r\n    public MyAsyncResult(object? state) =&gt; AsyncState = state;\r\n\r\n    public object? 
AsyncState { get; }\r\n\r\n    public void Complete(Exception? error)\r\n    {\r\n        lock (this)\r\n        {\r\n            _completed = true;\r\n            _error = error;\r\n            _event?.Set();\r\n        }\r\n    }\r\n\r\n    public void Wait()\r\n    {\r\n        WaitHandle? h = null;\r\n        lock (this)\r\n        {\r\n            if (_completed)\r\n            {\r\n                if (_error is not null)\r\n                {\r\n                    throw _error;\r\n                }\r\n                return;\r\n            }\r\n\r\n            h = _event ??= new ManualResetEvent(false);\r\n        }\r\n\r\n        h.WaitOne();\r\n        if (_error is not null)\r\n        {\r\n            throw _error;\r\n        }\r\n    }\r\n\r\n    public WaitHandle AsyncWaitHandle\r\n    {\r\n        get\r\n        {\r\n            lock (this)\r\n            {\r\n                return _event ??= new ManualResetEvent(_completed);\r\n            }\r\n        }\r\n    }\r\n\r\n    public bool CompletedSynchronously\r\n    {\r\n        get\r\n        {\r\n            lock (this)\r\n            {\r\n                if (_completedSynchronously == 0)\r\n                {\r\n                    _completedSynchronously = _completed ? 1 : -1;\r\n                }\r\n\r\n                return _completedSynchronously == 1;\r\n            }\r\n        }\r\n    }\r\n\r\n    public bool IsCompleted\r\n    {\r\n        get\r\n        {\r\n            lock (this)\r\n            {\r\n                return _completed;\r\n            }\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>Yowsers.  And, even with all of that gobbledygook, it&#8217;s still not a great implementation.  
For example, the <code>IAsyncResult<\/code> implementation is locking on every operation rather than doing things in a more lock-free manner where possible, the <code>Exception<\/code> is being stored raw rather than as an <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/967a59712996c2cdb8ce2f65fb3167afbd8b01f3\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/ExceptionServices\/ExceptionDispatchInfo.cs#L9-L16\"><code>ExceptionDispatchInfo<\/code><\/a> that would enable augmenting its call stack when propagated, there&#8217;s a lot of allocation involved in each individual operation (e.g. a delegate being allocated for each <code>BeginWrite<\/code> call), and so on. Now, imagine having to do all of this for each method you wanted to write.  Every time you wanted to write a reusable method that would consume another asynchronous operation, you&#8217;d need to do all of this work.  And if you wanted to write reusable combinators that could operate over multiple discrete <code>IAsyncResult<\/code>s efficiently (think <code>Task.WhenAll<\/code>), that&#8217;s another level of difficulty; every operation implementing and exposing its own APIs specific to that operation meant there was no lingua franca for talking about them all similarly (though some developers wrote libraries that tried to ease the burden a bit, typically via another layer of callbacks that enabled the API to supply an appropriate <code>AsyncCallback<\/code> to a Begin method).<\/p>\n<p>And all of that complication meant that very few folks even attempted this, and for those who did, well, bugs were rampant.  To be fair, this isn&#8217;t really a criticism of the APM pattern.  Rather, it&#8217;s a critique of callback-based asynchrony in general.  
We&#8217;re all so used to the power and simplicity that control flow constructs in modern languages provide us with, and callback-based approaches typically run afoul of such constructs once any reasonable amount of complexity is introduced.  No other mainstream language had a better alternative available, either.<\/p>\n<p>We needed a better way, one in which we learned from the APM pattern, incorporating the things it got right while avoiding its pitfalls.  An interesting thing to note is that the APM pattern is just that, a pattern; the runtime, core libraries, and compiler didn&#8217;t provide any assistance in consuming or implementing the pattern.<\/p>\n<h2>Event-Based Asynchronous Pattern<\/h2>\n<p>.NET Framework 2.0 saw a few APIs introduced that implemented a different pattern for handling asynchronous operations, one primarily intended for doing so in the context of client applications.  This Event-based Asynchronous Pattern, or EAP, also came as a pair of members (at least, possibly more), this time a method to initiate the asynchronous operation and an event to listen for its completion.  Thus, our earlier <code>DoStuff<\/code> example might have been exposed as a set of members like this:<\/p>\n<pre><code class=\"language-C#\">class Handler\r\n{\r\n    public int DoStuff(string arg);\r\n\r\n    public void DoStuffAsync(string arg, object? userToken);\r\n    public event DoStuffEventHandler? DoStuffCompleted;\r\n}\r\n\r\npublic delegate void DoStuffEventHandler(object sender, DoStuffEventArgs e);\r\n\r\npublic class DoStuffEventArgs : AsyncCompletedEventArgs\r\n{\r\n    public DoStuffEventArgs(int result, Exception? error, bool canceled, object? 
userToken) :\r\n        base(error, canceled, userToken) =&gt; Result = result;\r\n\r\n    public int Result { get; }\r\n}<\/code><\/pre>\n<p>You&#8217;d register your continuation work with the <code>DoStuffCompleted<\/code> event and then invoke the <code>DoStuffAsync<\/code> method; it would initiate the operation, and upon that operation&#8217;s completion, the <code>DoStuffCompleted<\/code> event would be raised asynchronously from the caller.  The handler could then run its continuation work, likely validating that the <code>userToken<\/code> supplied matched the one it was expecting, enabling multiple handlers to be hooked up to the event at the same time.<\/p>\n<p>This pattern made a few use cases a bit easier while making other use cases significantly harder (and given the previous APM <code>CopyStreamToStream<\/code> example, that&#8217;s saying something). It didn&#8217;t get rolled out in a widespread manner, and it came and went effectively in a single release of .NET Framework, albeit leaving behind the APIs added during its tenure, like <code>Ping.SendAsync<\/code>\/<code>Ping.PingCompleted<\/code>:<\/p>\n<pre><code class=\"language-C#\">public class Ping : Component\r\n{\r\n    public void SendAsync(string hostNameOrAddress, object? userToken);\r\n    public event PingCompletedEventHandler? PingCompleted;\r\n    ...\r\n}<\/code><\/pre>\n<p>However, it did add one notable advance that the APM pattern didn&#8217;t factor in at all, and that has endured into the models we embrace today: <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/967a59712996c2cdb8ce2f65fb3167afbd8b01f3\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/SynchronizationContext.cs#L6\"><code>SynchronizationContext<\/code><\/a>.<\/p>\n<p><code>SynchronizationContext<\/code> was also introduced in .NET Framework 2.0, as an abstraction for a general scheduler.  
In particular, <code>SynchronizationContext<\/code>&#8216;s most used method is <code>Post<\/code>, which queues a work item to whatever scheduler is represented by that context.  The base implementation of <code>SynchronizationContext<\/code>, for example, just represents the <code>ThreadPool<\/code>, and so the <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/95df571be36ed8973d09746b61fae16b2e3f251f\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/SynchronizationContext.cs#L22\">base implementation of <code>SynchronizationContext.Post<\/code><\/a> simply delegates to <a href=\"https:\/\/learn.microsoft.com\/dotnet\/api\/system.threading.threadpool.queueuserworkitem\"><code>ThreadPool.QueueUserWorkItem<\/code><\/a>, which is used to ask the <code>ThreadPool<\/code> to invoke the supplied callback with the associated state on one of the pool&#8217;s threads. However, <code>SynchronizationContext<\/code>&#8216;s bread-and-butter isn&#8217;t just about supporting arbitrary schedulers; rather, it&#8217;s about supporting scheduling in a manner that works according to the needs of various application models.<\/p>\n<p>Consider a UI framework like Windows Forms.  As with most UI frameworks on Windows, controls are associated with a particular thread, and that thread runs a message pump which runs work that&#8217;s able to interact with those controls: only that thread should try to manipulate those controls, and any other thread that wants to interact with the controls should do so by sending a message to be consumed by the UI thread&#8217;s pump.  Windows Forms makes this easy with methods like <code>Control.BeginInvoke<\/code>, which queues the supplied delegate and arguments to be run by whatever thread is associated with that <code>Control<\/code>.  
You can thus write code like this:<\/p>\n<pre><code class=\"language-C#\">private void button1_Click(object sender, EventArgs e)\r\n{\r\n    ThreadPool.QueueUserWorkItem(_ =&gt;\r\n    {\r\n        string message = ComputeMessage();\r\n        button1.BeginInvoke(() =&gt;\r\n        {\r\n            button1.Text = message;\r\n        });\r\n    });\r\n}<\/code><\/pre>\n<p>That will offload the <code>ComputeMessage()<\/code> work to be done on a <code>ThreadPool<\/code> thread (so as to keep the UI responsive while it&#8217;s being processed), and then when that work has completed, queue a delegate back to the thread associated with <code>button1<\/code> to update <code>button1<\/code>&#8216;s label.  Easy enough.  WPF has something similar, just with its <code>Dispatcher<\/code> type:<\/p>\n<pre><code class=\"language-C#\">private void button1_Click(object sender, RoutedEventArgs e)\r\n{\r\n    ThreadPool.QueueUserWorkItem(_ =&gt;\r\n    {\r\n        string message = ComputeMessage();\r\n        button1.Dispatcher.InvokeAsync(() =&gt;\r\n        {\r\n            button1.Content = message;\r\n        });\r\n    });\r\n}<\/code><\/pre>\n<p>And .NET MAUI has something similar. But what if I wanted to put this logic into a helper method? e.g.<\/p>\n<pre><code class=\"language-C#\">\/\/ Call ComputeMessage and then invoke the update action to update controls.\r\ninternal static void ComputeMessageAndInvokeUpdate(Action&lt;string&gt; update) { ... }<\/code><\/pre>\n<p>I could then use that like this:<\/p>\n<pre><code class=\"language-C#\">private void button1_Click(object sender, EventArgs e)\r\n{\r\n    ComputeMessageAndInvokeUpdate(message =&gt; button1.Text = message);\r\n}<\/code><\/pre>\n<p>but how could <code>ComputeMessageAndInvokeUpdate<\/code> be implemented in such a way that it could work in any of those applications?  Would it need to be hardcoded to know about every possible UI framework? That&#8217;s where <code>SynchronizationContext<\/code> shines.  
We might implement the method like this:<\/p>\n<pre><code class=\"language-C#\">internal static void ComputeMessageAndInvokeUpdate(Action&lt;string&gt; update)\r\n{\r\n    SynchronizationContext? sc = SynchronizationContext.Current;\r\n    ThreadPool.QueueUserWorkItem(_ =&gt;\r\n    {\r\n        string message = ComputeMessage();\r\n        if (sc is not null)\r\n        {\r\n            sc.Post(_ =&gt; update(message), null);\r\n        }\r\n        else\r\n        {\r\n            update(message);\r\n        }\r\n    });\r\n}<\/code><\/pre>\n<p>That uses the <code>SynchronizationContext<\/code> as an abstraction to target whatever &#8220;scheduler&#8221; should be used to get back to the necessary environment for interacting with the UI.  Each application model then ensures that a <code>SynchronizationContext<\/code>-derived type that does the &#8220;right thing&#8221; is published as <code>SynchronizationContext.Current<\/code>.  For example, <a href=\"https:\/\/github.com\/dotnet\/winforms\/blob\/41b11b6a7290a2bbc0c293042f30d9632e55aae2\/src\/System.Windows.Forms\/src\/System\/Windows\/Forms\/WindowsFormsSynchronizationContext.cs#L13\">Windows Forms has this<\/a>:<\/p>\n<pre><code class=\"language-C#\">public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable\r\n{\r\n    public override void Post(SendOrPostCallback d, object? 
state) =&gt;\r\n        _controlToSendTo?.BeginInvoke(d, new object?[] { state });\r\n    ...\r\n}<\/code><\/pre>\n<p>and <a href=\"https:\/\/github.com\/dotnet\/wpf\/blob\/c67b9f6f5ad04f5c264b52de0733a8832714615f\/src\/Microsoft.DotNet.Wpf\/src\/WindowsBase\/System\/Windows\/Threading\/DispatcherSynchronizationContext.cs#L18\">WPF has this<\/a>:<\/p>\n<pre><code class=\"language-C#\">public sealed class DispatcherSynchronizationContext : SynchronizationContext\r\n{\r\n    public override void Post(SendOrPostCallback d, Object state) =&gt;\r\n        _dispatcher.BeginInvoke(_priority, d, state);\r\n    ...\r\n}<\/code><\/pre>\n<p>ASP.NET <em>used<\/em> to <a href=\"https:\/\/referencesource.microsoft.com\/#System.Web\/AspNetSynchronizationContext.cs,16\">have one<\/a>, which didn&#8217;t actually care about what thread work ran on, but rather that work associated with a given request was serialized such that multiple threads wouldn&#8217;t concurrently be accessing a given <code>HttpContext<\/code>:<\/p>\n<pre><code class=\"language-C#\">internal sealed class AspNetSynchronizationContext : AspNetSynchronizationContextBase\r\n{\r\n    public override void Post(SendOrPostCallback callback, Object state) =&gt;\r\n        _state.Helper.QueueAsynchronous(() =&gt; callback(state));\r\n    ...\r\n}<\/code><\/pre>\n<p>This also isn&#8217;t limited to such main application models.  For example, <a href=\"https:\/\/github.com\/xunit\/xunit\">xunit<\/a> is a popular unit testing framework, one that .NET&#8217;s core repos use for their unit testing, and it also employs multiple custom <code>SynchronizationContext<\/code>s. You can, for example, allow tests to run in parallel but limit the number of tests that are allowed to be running concurrently.  How is that enabled? 
Via a <code>SynchronizationContext<\/code>:<\/p>\n<pre><code class=\"language-C#\">public class MaxConcurrencySyncContext : SynchronizationContext, IDisposable\r\n{\r\n    public override void Post(SendOrPostCallback d, object? state)\r\n    {\r\n        var context = ExecutionContext.Capture();\r\n        workQueue.Enqueue((d, state, context));\r\n        workReady.Set();\r\n    }\r\n}<\/code><\/pre>\n<p><a href=\"https:\/\/github.com\/xunit\/xunit\/blob\/601e2d830853fa2ef0048d34afae520d6b73deca\/src\/xunit.v3.core\/Sdk\/MaxConcurrencySyncContext.cs#L14\"><code>MaxConcurrencySyncContext<\/code>&#8216;s<\/a> <code>Post<\/code> method just queues the work to its own internal work queue, which it then processes on its own worker threads, where it controls how many there are based on the max concurrency desired. You get the idea.<\/p>\n<p>How does this tie in with the Event-based Asynchronous Pattern?  Both EAP and <code>SynchronizationContext<\/code> were introduced at the same time, and the EAP dictated that the completion events should be queued to whatever <code>SynchronizationContext<\/code> was current when the asynchronous operation was initiated.  To simplify that ever so slightly (and arguably not enough to warrant the extra complexity), some helper types were also introduced in <code>System.ComponentModel<\/code>, in particular <code>AsyncOperation<\/code> and <code>AsyncOperationManager<\/code>.  The former was just a tuple that wrapped the user-supplied state object and the captured <code>SynchronizationContext<\/code>, and the latter just served as a simple factory to do that capture and create the <code>AsyncOperation<\/code> instance.  Then EAP implementations would use those, e.g. 
<code>Ping.SendAsync<\/code> called <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/5f94bffeff62f4b767a311a4505d6d40d86279d9\/src\/libraries\/System.ComponentModel.EventBasedAsync\/src\/System\/ComponentModel\/AsyncOperationManager.cs#L10-L36\"><code>AsyncOperationManager.CreateOperation<\/code><\/a> to capture the <code>SynchronizationContext<\/code>, and then when the operation completed, the <code>AsyncOperation<\/code>&#8216;s <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/5f94bffeff62f4b767a311a4505d6d40d86279d9\/src\/libraries\/System.ComponentModel.EventBasedAsync\/src\/System\/ComponentModel\/AsyncOperation.cs#L51-L77\"><code>PostOperationCompleted<\/code><\/a> method would be invoked to call the stored <code>SynchronizationContext<\/code>&#8216;s <code>Post<\/code> method.<\/p>\n<p><code>SynchronizationContext<\/code> provides a few more trinkets worthy of mention as they&#8217;ll show up again in a bit.  In particular, it exposes <code>OperationStarted<\/code> and <code>OperationCompleted<\/code> methods.  The base implementations of these virtuals are empty, doing nothing, but a derived implementation might override these to know about in-flight operations.  That means EAP implementations would also invoke these <code>OperationStarted<\/code>\/<code>OperationCompleted<\/code> methods at the beginning and end of each operation, in order to inform any present <code>SynchronizationContext<\/code> and allow it to track the work.  This is particularly relevant to the EAP pattern because the methods that initiate the async operations are <code>void<\/code> returning: you get nothing back that allows you to track the work individually.  We&#8217;ll get back to that.<\/p>\n<p>So, we needed something better than the APM pattern, and the EAP that came next introduced some new things but didn&#8217;t really address the core problems we faced. 
We still needed something better.<\/p>\n<h2>Enter Tasks<\/h2>\n<p>.NET Framework 4.0 introduced the <code>System.Threading.Tasks.Task<\/code> type. At its heart, a <code>Task<\/code> is just a data structure that represents the eventual completion of some asynchronous operation (other frameworks call a similar type a &#8220;promise&#8221; or a &#8220;future&#8221;).  A <code>Task<\/code> is created to represent some operation, and then when the operation it logically represents completes, the results are stored into that <code>Task<\/code>. Simple enough. But <em>the<\/em> key feature that <code>Task<\/code> provides that makes it leaps and bounds more useful than <code>IAsyncResult<\/code> is that it builds into itself the notion of a continuation.  That one feature means you can walk up to any <code>Task<\/code> and ask to be notified asynchronously when it completes, with the task itself handling the synchronization to ensure the continuation is invoked regardless of whether the task has already completed, hasn&#8217;t yet completed, or is completing concurrently with the notification request. Why is that so impactful?  Well, if you remember back to our discussion of the old APM pattern, there were two primary problems.<\/p>\n<ol>\n<li>You had to implement a custom <code>IAsyncResult<\/code> implementation for every operation: there was no built-in <code>IAsyncResult<\/code> implementation anyone could just use for their needs.<\/li>\n<li>You had to know prior to the Begin method being called what you wanted to do when it was complete. 
This makes it a significant challenge to implement combinators and other generalized routines for consuming and composing arbitrary async implementations.<\/li>\n<\/ol>\n<p>In contrast, with <code>Task<\/code>, that shared representation lets you walk up to an async operation <em>after<\/em> you&#8217;ve already initiated the operation and provide a continuation <em>after<\/em> you&#8217;ve already initiated the operation&#8230; you don&#8217;t need to provide that continuation <em>to<\/em> the method that initiates the operation.  Everyone who has asynchronous operations can produce a <code>Task<\/code>, and everyone who consumes asynchronous operations can consume a <code>Task<\/code>, and nothing custom needs to be done to connect the two: <code>Task<\/code> becomes the lingua franca for enabling producers and consumers of asynchronous operations to talk.  And that has changed the face of .NET.  More on that in a bit&#8230;<\/p>\n<p>For now, let&#8217;s better understand what this actually means.  Rather than dive into the intricate code for <code>Task<\/code>, we&#8217;ll do the pedagogical thing and just implement a simple version.  This isn&#8217;t meant to be a great implementation, rather only complete enough functionally to help understand the meat of what is a <code>Task<\/code>, which, at the end of the day, is really just a data structure that handles coordinating the setting and reception of a completion signal.  We&#8217;ll start with just a few fields:<\/p>\n<pre><code class=\"language-C#\">class MyTask\r\n{\r\n    private bool _completed;\r\n    private Exception? _error;\r\n    private Action&lt;MyTask&gt;? _continuation;\r\n    private ExecutionContext? 
_ec;\r\n    ...\r\n}<\/code><\/pre>\n<p>We need a field to know whether the task has completed (<code>_completed<\/code>), and we need a field to store any error that caused the task to fail (<code>_error<\/code>); if we were also implementing a generic <code>MyTask&lt;TResult&gt;<\/code>, there&#8217;d also be a <code>private TResult _result<\/code> field for storing the successful result of the operation.  Thus far, this looks a lot like our custom <code>IAsyncResult<\/code> implementation earlier (not a coincidence, of course).  But now the pi\u00e8ce de r\u00e9sistance, the <code>_continuation<\/code> field. In this simple implementation, we&#8217;re supporting just a single continuation, but that&#8217;s enough for explanatory purposes (the real <code>Task<\/code> employs an <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/81977309048600e67fdb44a7d4c99aaad89846d7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/Tasks\/Task.cs#L176-L178\"><code>object<\/code> field<\/a> that can either be an individual continuation object or a <code>List&lt;&gt;<\/code> of continuation objects).  This is a delegate that will be invoked when the task completes.<\/p>\n<p>Now, a bit of surface area. As noted, one of the fundamental advances in <code>Task<\/code> over previous models was the ability to supply the continuation work (the callback) <em>after<\/em> the operation was initiated.  
We need a method to let us do that, so let&#8217;s add <code>ContinueWith<\/code>:<\/p>\n<pre><code class=\"language-C#\">public void ContinueWith(Action&lt;MyTask&gt; action)\r\n{\r\n    lock (this)\r\n    {\r\n        if (_completed)\r\n        {\r\n            ThreadPool.QueueUserWorkItem(_ =&gt; action(this));\r\n        }\r\n        else if (_continuation is not null)\r\n        {\r\n            throw new InvalidOperationException(\"Unlike Task, this implementation only supports a single continuation.\");\r\n        }\r\n        else\r\n        {\r\n            _continuation = action;\r\n            _ec = ExecutionContext.Capture();\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>If the task has already been marked completed by the time <code>ContinueWith<\/code> is called, <code>ContinueWith<\/code> just queues the execution of the delegate.  Otherwise, the method stores the delegate, such that the continuation may be queued when the task completes (it also stores something called an <code>ExecutionContext<\/code>, and then uses that when the delegate is later invoked, but don&#8217;t worry about that part for now&#8230; we&#8217;ll get to it).  Simple enough.<\/p>\n<p>Then we need to be able to mark the <code>MyTask<\/code> as completed, meaning whatever asynchronous operation it represents has finished. For that, we&#8217;ll expose two methods, one to mark it completed successfully (&#8220;SetResult&#8221;), and one to mark it completed with an error (&#8220;SetException&#8221;):<\/p>\n<pre><code class=\"language-C#\">public void SetResult() =&gt; Complete(null);\r\n\r\npublic void SetException(Exception error) =&gt; Complete(error);\r\n\r\nprivate void Complete(Exception? 
error)\r\n{\r\n    lock (this)\r\n    {\r\n        if (_completed)\r\n        {\r\n            throw new InvalidOperationException(\"Already completed\");\r\n        }\r\n\r\n        _error = error;\r\n        _completed = true;\r\n\r\n        if (_continuation is not null)\r\n        {\r\n            ThreadPool.QueueUserWorkItem(_ =&gt;\r\n            {\r\n                if (_ec is not null)\r\n                {\r\n                    ExecutionContext.Run(_ec, _ =&gt; _continuation(this), null);\r\n                }\r\n                else\r\n                {\r\n                    _continuation(this);\r\n                }\r\n            });\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>We store any error, we mark the task as having been completed, and then if a continuation had previously been registered, we queue it to be invoked.<\/p>\n<p>Finally, we need a way to propagate any exception that may have occurred in the task (and, if this were a generic <code>MyTask&lt;T&gt;<\/code>, to return its <code>_result<\/code>); to facilitate certain scenarios, we also allow this method to block waiting for the task to complete, which we can implement in terms of <code>ContinueWith<\/code> (the continuation just signals a <code>ManualResetEventSlim<\/code> that the caller then blocks on waiting for completion).<\/p>\n<pre><code class=\"language-C#\">public void Wait()\r\n{\r\n    ManualResetEventSlim? mres = null;\r\n    lock (this)\r\n    {\r\n        if (!_completed)\r\n        {\r\n            mres = new ManualResetEventSlim();\r\n            ContinueWith(_ =&gt; mres.Set());\r\n        }\r\n    }\r\n\r\n    mres?.Wait();\r\n    if (_error is not null)\r\n    {\r\n        ExceptionDispatchInfo.Throw(_error);\r\n    }\r\n}<\/code><\/pre>\n<p>And that&#8217;s basically it. 
Now to be sure, the real <code>Task<\/code> is way more complicated, with a much more efficient implementation, with support for any number of continuations, with a multitude of knobs about how it should behave (e.g. should continuations be queued as is being done here or should they be invoked synchronously as part of the task&#8217;s completion), with the ability to store multiple exceptions rather than just one, with special knowledge of cancellation, with tons of helper methods for doing common operations (e.g. <code>Task.Run<\/code> which creates a <code>Task<\/code> to represent a delegate queued to be invoked on the thread pool), and so on.  But there&#8217;s no magic to any of that; at its core, it&#8217;s just what we saw here.<\/p>\n<p>You might also notice that my simple <code>MyTask<\/code> has public <code>SetResult<\/code>\/<code>SetException<\/code> methods directly on it, whereas <code>Task<\/code> doesn&#8217;t.  Actually, <code>Task<\/code> <em>does<\/em> have such methods, <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/81977309048600e67fdb44a7d4c99aaad89846d7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/Tasks\/Task.cs#L3271\">they&#8217;re just internal<\/a>, with a <code>System.Threading.Tasks.TaskCompletionSource<\/code> type serving as a separate &#8220;producer&#8221; for the task and its completion; that was done not out of technical necessity but as a way to keep the completion methods off of the thing meant only for consumption.  You can then hand out a <code>Task<\/code> without having to worry about it being completed out from under you; the completion signal is an implementation detail of whatever created the task and also reserves the right to complete it by keeping the <code>TaskCompletionSource<\/code> to itself. 
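<\/p>\n<p>To make that producer\/consumer split concrete, here is a minimal sketch using the real <code>TaskCompletionSource&lt;TResult&gt;<\/code>. The <code>Producer<\/code> class and its <code>Completion<\/code> and <code>Finish<\/code> members are hypothetical names invented for this illustration, not anything in .NET; the point is only that consumers are handed the <code>Task&lt;int&gt;<\/code>, while the ability to complete it stays private to the producer:<\/p>\n<pre><code class=\"language-C#\">using System;\r\nusing System.Threading.Tasks;\r\n\r\nvar producer = new Producer();\r\nTask&lt;int&gt; task = producer.Completion; \/\/ consumers only ever see the Task\r\n\r\n\/\/ Hook up a continuation before the operation has completed...\r\nTask first = task.ContinueWith(t =&gt; Console.WriteLine($\"Result: {t.Result}\"));\r\n\r\nproducer.Finish(42); \/\/ ...and only the producer is able to complete it.\r\nfirst.Wait();\r\n\r\n\/\/ A continuation registered after completion is still invoked.\r\ntask.ContinueWith(t =&gt; Console.WriteLine($\"Late: {t.Result}\")).Wait();\r\n\r\n\/\/ Hypothetical producer type for illustration; not part of .NET.\r\nsealed class Producer\r\n{\r\n    private readonly TaskCompletionSource&lt;int&gt; _tcs = new();\r\n\r\n    public Task&lt;int&gt; Completion =&gt; _tcs.Task;\r\n\r\n    public void Finish(int result) =&gt; _tcs.SetResult(result);\r\n}<\/code><\/pre>\n<p>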
(<code>CancellationToken<\/code> and <code>CancellationTokenSource<\/code> follow a similar pattern: <code>CancellationToken<\/code> is just a struct wrapper for a <code>CancellationTokenSource<\/code>, serving up only the public surface area related to consuming a cancellation signal but without the ability to produce one, which is a capability restricted to whomever has access to the <code>CancellationTokenSource<\/code>.)<\/p>\n<p>Of course, we can implement combinators and helpers for this <code>MyTask<\/code> similar to what <code>Task<\/code> provides.  Want a simple <code>MyTask.WhenAll<\/code>? Here you go:<\/p>\n<pre><code class=\"language-C#\">public static MyTask WhenAll(MyTask t1, MyTask t2)\r\n{\r\n    var t = new MyTask();\r\n\r\n    int remaining = 2;\r\n    Exception? e = null;\r\n\r\n    Action&lt;MyTask&gt; continuation = completed =&gt;\r\n    {\r\n        e ??= completed._error; \/\/ just store a single exception for simplicity\r\n        if (Interlocked.Decrement(ref remaining) == 0)\r\n        {\r\n            if (e is not null) t.SetException(e);\r\n            else t.SetResult();\r\n        }\r\n    };\r\n\r\n    t1.ContinueWith(continuation);\r\n    t2.ContinueWith(continuation);\r\n\r\n    return t;\r\n}<\/code><\/pre>\n<p>Want a <code>MyTask.Run<\/code>? You got it:<\/p>\n<pre><code class=\"language-C#\">public static MyTask Run(Action action)\r\n{\r\n    var t = new MyTask();\r\n\r\n    ThreadPool.QueueUserWorkItem(_ =&gt;\r\n    {\r\n        try\r\n        {\r\n            action();\r\n            t.SetResult();\r\n        }\r\n        catch (Exception e)\r\n        {\r\n            t.SetException(e);\r\n        }\r\n    });\r\n\r\n    return t;\r\n}<\/code><\/pre>\n<p>How about a <code>MyTask.Delay<\/code>? 
Sure:<\/p>\n<pre><code class=\"language-C#\">public static MyTask Delay(TimeSpan delay)\r\n{\r\n    var t = new MyTask();\r\n\r\n    var timer = new Timer(_ => t.SetResult());\r\n    timer.Change(delay, Timeout.InfiniteTimeSpan);\r\n\r\n    return t;\r\n}<\/code><\/pre>\n<p>You get the idea.<\/p>\n<p>With <code>Task<\/code> in place, all previous async patterns in .NET became a thing of the past.  Anywhere an asynchronous implementation previously was implemented with the APM pattern or the EAP pattern, new <code>Task<\/code>-returning methods were exposed.<\/p>\n<h3>And ValueTasks<\/h3>\n<p><code>Task<\/code> continues to be the workhorse for asynchrony in .NET to this day, with new methods exposed every release and routinely throughout the ecosystem that return <code>Task<\/code> and <code>Task&lt;TResult&gt;<\/code>. However, <code>Task<\/code> is a class, which means creating one does come with an allocation.  For the most part, one extra allocation for a long-lived asynchronous operation is a pittance and won&#8217;t meaningfully impact performance for all but the most performance-sensitive operations.  However, as was previously noted, synchronous completion of asynchronous operations is fairly common.  <code>Stream.ReadAsync<\/code> was introduced to return a <code>Task&lt;int&gt;<\/code>, but if you&#8217;re reading from, say, a <code>BufferedStream<\/code>, there&#8217;s a really good chance many of your reads are going to complete synchronously due to simply needing to pull data from an in-memory buffer rather than performing syscalls and real I\/O.  Having to allocate an additional object just to return such data is unfortunate (note it was the case with APM as well).  For non-generic <code>Task<\/code>-returning methods, the method can just return a singleton already-completed task, and in fact one such singleton is provided by <code>Task<\/code> in the form of <code>Task.CompletedTask<\/code>.  
But for <code>Task&lt;TResult&gt;<\/code>, it&#8217;s impossible to cache a <code>Task<\/code> for every possible <code>TResult<\/code>.  What can we do to make such synchronous completion faster?<\/p>\n<p>It is possible to cache <em>some<\/em> <code>Task&lt;TResult&gt;<\/code>s.  For example, <code>Task&lt;bool&gt;<\/code> is very common, and there are only two meaningful things to cache there: a <code>Task&lt;bool&gt;<\/code> when the <code>Result<\/code> is <code>true<\/code> and one when the <code>Result<\/code> is <code>false<\/code>.  Or while we wouldn&#8217;t want to try caching four billion <code>Task&lt;int&gt;<\/code>s to accommodate every possible <code>Int32<\/code> result, small <code>Int32<\/code> values are very common, so we could cache a few for, say, -1 through 8.  Or for arbitrary types, <code>default<\/code> is a reasonably common value, so we could cache a <code>Task&lt;TResult&gt;<\/code> where <code>Result<\/code> is <code>default(TResult)<\/code> for every relevant type.  And in fact, <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/81977309048600e67fdb44a7d4c99aaad89846d7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/Tasks\/Task.cs#L5222-L5273\"><code>Task.FromResult<\/code> does that today<\/a> (as of recent versions of .NET), using a small cache of such reusable <code>Task&lt;TResult&gt;<\/code> singletons and returning one of them if appropriate or otherwise allocating a new <code>Task&lt;TResult&gt;<\/code> for the exact provided result value.  Other schemes can be created to handle other reasonably common cases.  For example, when working with <code>Stream.ReadAsync<\/code>, it&#8217;s reasonably common to call it multiple times on the same stream, all with the same <code>count<\/code> for the number of bytes allowed to be read.  And it&#8217;s reasonably common for the implementation to be able to fully satisfy that <code>count<\/code> request.  
Which means it&#8217;s reasonably common for <code>Stream.ReadAsync<\/code> to repeatedly return the same <code>int<\/code> result value.  To avoid multiple allocations in such scenarios, multiple <code>Stream<\/code> types (like <code>MemoryStream<\/code>) will cache the last <code>Task&lt;int&gt;<\/code> they successfully returned, and if the next read ends up also completing synchronously and successfully with the same result, it can just return the same <code>Task&lt;int&gt;<\/code> again rather than creating a new one.  But what about other cases?  How can this allocation for synchronous completions be avoided more generally in situations where the performance overhead really matters?<\/p>\n<p>That&#8217;s where <code>ValueTask&lt;TResult&gt;<\/code> comes into the picture (<a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/understanding-the-whys-whats-and-whens-of-valuetask\/\">a much more detailed examination of <code>ValueTask&lt;TResult&gt;<\/code><\/a> is also available). <code>ValueTask&lt;TResult&gt;<\/code> started life as a discriminated union between a <code>TResult<\/code> and a <code>Task&lt;TResult&gt;<\/code>.  At the end of the day, ignoring all the bells and whistles, <a href=\"https:\/\/github.com\/dotnet\/corefx\/blob\/d6173e069a9bcedfdfd7f4f41e67d23f67157b61\/src\/System.Threading.Tasks.Extensions\/src\/System\/Threading\/Tasks\/ValueTask.cs#L53-L58\">that&#8217;s all it is<\/a> (or, rather, was), either an immediate result or a promise for a result at some point in the future:<\/p>\n<pre><code class=\"language-C#\">public readonly struct ValueTask&lt;TResult&gt;\r\n{\r\n   private readonly Task&lt;TResult&gt;? 
_task;\r\n   private readonly TResult _result;\r\n   ...\r\n}<\/code><\/pre>\n<p>A method could then return such a <code>ValueTask&lt;TResult&gt;<\/code> instead of a <code>Task&lt;TResult&gt;<\/code>, and at the expense of a larger return type and a little more indirection, avoid the <code>Task&lt;TResult&gt;<\/code> allocation if the <code>TResult<\/code> was known by the time it needed to be returned.<\/p>\n<p>There are, however, super duper extreme high-performance scenarios where you want to be able to avoid the <code>Task&lt;TResult&gt;<\/code> allocation even in the asynchronous-completion case.  For example, <code>Socket<\/code> lives at the bottom of the networking stack, and <code>SendAsync<\/code> and <code>ReceiveAsync<\/code> on sockets are on the super hot path for many a service, with both synchronous and asynchronous completions being very common (most sends complete synchronously, and many receives complete synchronously due to data having already been buffered in the kernel).  Wouldn&#8217;t it be nice if, on a given <code>Socket<\/code>, we could make such sending and receiving allocation-free, regardless of whether the operations complete synchronously or asynchronously?<\/p>\n<p>That&#8217;s where <code>System.Threading.Tasks.Sources.IValueTaskSource&lt;TResult&gt;<\/code> enters the picture:<\/p>\n<pre><code class=\"language-C#\">public interface IValueTaskSource&lt;out TResult&gt;\r\n{\r\n    ValueTaskSourceStatus GetStatus(short token);\r\n    void OnCompleted(Action&lt;object?&gt; continuation, object? 
state, short token, ValueTaskSourceOnCompletedFlags flags);\r\n    TResult GetResult(short token);\r\n}<\/code><\/pre>\n<p>The <code>IValueTaskSource&lt;TResult&gt;<\/code> interface allows an implementation to provide its own backing object for a <code>ValueTask&lt;TResult&gt;<\/code>, enabling the object to implement methods like <code>GetResult<\/code> to retrieve the result of the operation and <code>OnCompleted<\/code> to hook up a continuation to the operation. With that, <code>ValueTask&lt;TResult&gt;<\/code> evolved <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/81977309048600e67fdb44a7d4c99aaad89846d7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/Tasks\/ValueTask.cs#L465-L468\">a small change to its definition<\/a>, with its <code>Task&lt;TResult&gt;? _task<\/code> field replaced by an <code>object? _obj<\/code> field:<\/p>\n<pre><code class=\"language-C#\">public readonly struct ValueTask&lt;TResult&gt;\r\n{\r\n   private readonly object? _obj;\r\n   private readonly TResult _result;\r\n   ...\r\n}<\/code><\/pre>\n<p>Whereas the <code>_task<\/code> field was either a <code>Task&lt;TResult&gt;<\/code> or null, the <code>_obj<\/code> field now can also be an <code>IValueTaskSource&lt;TResult&gt;<\/code>.  Once a <code>Task&lt;TResult&gt;<\/code> is marked as completed, that&#8217;s it, it will remain completed and never transition back to an incomplete state. In contrast, an object implementing <code>IValueTaskSource&lt;TResult&gt;<\/code> has full control over the implementation, and is free to transition bidirectionally between complete and incomplete states, as <code>ValueTask&lt;TResult&gt;<\/code>&#8216;s contract is that a given instance may be consumed only once, thus by construction it shouldn&#8217;t observe a post-consumption change in the underlying instance (this is why analysis rules like <a href=\"https:\/\/learn.microsoft.com\/dotnet\/fundamentals\/code-analysis\/quality-rules\/ca2012\">CA2012<\/a> exist). 
This then enables types like <code>Socket<\/code> to pool <code>IValueTaskSource&lt;TResult&gt;<\/code> instances to use for repeated calls.  <code>Socket<\/code> caches up to two such instances, one for reads and one for writes, since the 99.999% case is to have at most one receive and one send in-flight at the same time.<\/p>\n<p>I mentioned <code>ValueTask&lt;TResult&gt;<\/code> but not <code>ValueTask<\/code>.  When dealing only with avoiding allocation for synchronous completion, there&#8217;s little performance benefit to a non-generic <code>ValueTask<\/code> (representing result-less, <code>void<\/code> operations), since the same condition can be represented with <code>Task.CompletedTask<\/code>.  But once we care about the ability to use a poolable underlying object for avoiding allocation in the asynchronous completion case, that then also matters for the non-generic.  Thus, when <code>IValueTaskSource&lt;TResult&gt;<\/code> was introduced, so too were <code>IValueTaskSource<\/code> and <code>ValueTask<\/code>.<\/p>\n<p>So, we have <code>Task<\/code>, <code>Task&lt;TResult&gt;<\/code>, <code>ValueTask<\/code>, and <code>ValueTask&lt;TResult&gt;<\/code>.  We&#8217;re able to interact with them in various ways, representing arbitrary asynchronous operations and hooking up continuations to handle the completion of those asynchronous operations. And yes, we can do so <em>before<\/em> or <em>after<\/em> the operation completes.<\/p>\n<p><em>But<\/em>&#8230; those continuations are still callbacks!<\/p>\n<p>We&#8217;re still forced into a continuation-passing style for encoding our asynchronous control flow!!<\/p>\n<p>It&#8217;s still really hard to get right!!!<\/p>\n<p>How can we fix that????<\/p>\n<h2>C# Iterators to the Rescue<\/h2>\n<p>The glimmer of hope for that solution actually came about a few years before <code>Task<\/code> hit the scene, with C# 2.0, when it added support for iterators.<\/p>\n<p>&#8220;Iterators?&#8221; you ask? 
&#8220;You mean for <code>IEnumerable&lt;T&gt;<\/code>?&#8221; That&#8217;s the one.  Iterators let you write a single method that is then used by the compiler to implement an <code>IEnumerable&lt;T&gt;<\/code> and\/or an <code>IEnumerator&lt;T&gt;<\/code>.  For example, if I wanted to create an enumerable that yielded the Fibonacci sequence, I might write something like this:<\/p>\n<pre><code class=\"language-C#\">public static IEnumerable&lt;int&gt; Fib()\r\n{\r\n    int prev = 0, next = 1;\r\n    yield return prev;\r\n    yield return next;\r\n\r\n    while (true)\r\n    {\r\n        int sum = prev + next;\r\n        yield return sum;\r\n        prev = next;\r\n        next = sum;\r\n    }\r\n}<\/code><\/pre>\n<p>I can then enumerate this with a <code>foreach<\/code>:<\/p>\n<pre><code class=\"language-C#\">foreach (int i in Fib())\r\n{\r\n    if (i &gt; 100) break;\r\n    Console.Write($\"{i} \");\r\n}<\/code><\/pre>\n<p>I can compose it with other <code>IEnumerable&lt;T&gt;<\/code>s via combinators like those on <code>System.Linq.Enumerable<\/code>:<\/p>\n<pre><code class=\"language-C#\">foreach (int i in Fib().Take(12))\r\n{\r\n    Console.Write($\"{i} \");\r\n}<\/code><\/pre>\n<p>Or I can just manually enumerate it directly via an <code>IEnumerator&lt;T&gt;<\/code>:<\/p>\n<pre><code class=\"language-C#\">using IEnumerator&lt;int&gt; e = Fib().GetEnumerator();\r\nwhile (e.MoveNext())\r\n{\r\n    int i = e.Current;\r\n    if (i &gt; 100) break;\r\n    Console.Write($\"{i} \");\r\n}<\/code><\/pre>\n<p>All of the above result in this output:<\/p>\n<pre><code class=\"language-text\">0 1 1 2 3 5 8 13 21 34 55 89<\/code><\/pre>\n<p>The really interesting thing about this is that in order to achieve the above, we need to be able to enter and exit that <code>Fib<\/code> method multiple times.  
We call <code>MoveNext<\/code>, it enters the method, the method then executes until it encounters a <code>yield return<\/code>, at which point the call to <code>MoveNext<\/code> needs to return <code>true<\/code> and a subsequent access to <code>Current<\/code> needs to return the yielded value.  Then we call <code>MoveNext<\/code> again, and we need to be able to pick up in <code>Fib<\/code> just after where we last left off, and with all of the state from the previous invocation intact.  Iterators are effectively coroutines provided by the C# language \/ compiler, with the compiler expanding my <code>Fib<\/code> iterator into a full-blown state machine:<\/p>\n<pre><code class=\"language-C#\">public static IEnumerable&lt;int&gt; Fib() =&gt; new &lt;Fib&gt;d__0(-2);\r\n\r\n[CompilerGenerated]\r\nprivate sealed class &lt;Fib&gt;d__0 : IEnumerable&lt;int&gt;, IEnumerable, IEnumerator&lt;int&gt;, IEnumerator, IDisposable\r\n{\r\n    private int &lt;&gt;1__state;\r\n    private int &lt;&gt;2__current;\r\n    private int &lt;&gt;l__initialThreadId;\r\n    private int &lt;prev&gt;5__2;\r\n    private int &lt;next&gt;5__3;\r\n    private int &lt;sum&gt;5__4;\r\n\r\n    int IEnumerator&lt;int&gt;.Current =&gt; &lt;&gt;2__current;\r\n    object IEnumerator.Current =&gt; &lt;&gt;2__current;\r\n\r\n    public &lt;Fib&gt;d__0(int &lt;&gt;1__state)\r\n    {\r\n        this.&lt;&gt;1__state = &lt;&gt;1__state;\r\n        &lt;&gt;l__initialThreadId = Environment.CurrentManagedThreadId;\r\n    }\r\n\r\n    private bool MoveNext()\r\n    {\r\n        switch (&lt;&gt;1__state)\r\n        {\r\n            default:\r\n                return false;\r\n            case 0:\r\n                &lt;&gt;1__state = -1;\r\n                &lt;prev&gt;5__2 = 0;\r\n                &lt;next&gt;5__3 = 1;\r\n                &lt;&gt;2__current = &lt;prev&gt;5__2;\r\n                &lt;&gt;1__state = 1;\r\n                return true;\r\n            case 1:\r\n                &lt;&gt;1__state = 
-1;\r\n                &lt;&gt;2__current = &lt;next&gt;5__3;\r\n                &lt;&gt;1__state = 2;\r\n                return true;\r\n            case 2:\r\n                &lt;&gt;1__state = -1;\r\n                break;\r\n            case 3:\r\n                &lt;&gt;1__state = -1;\r\n                &lt;prev&gt;5__2 = &lt;next&gt;5__3;\r\n                &lt;next&gt;5__3 = &lt;sum&gt;5__4;\r\n                break;\r\n        }\r\n        &lt;sum&gt;5__4 = &lt;prev&gt;5__2 + &lt;next&gt;5__3;\r\n        &lt;&gt;2__current = &lt;sum&gt;5__4;\r\n        &lt;&gt;1__state = 3;\r\n        return true;\r\n    }\r\n\r\n    IEnumerator&lt;int&gt; IEnumerable&lt;int&gt;.GetEnumerator()\r\n    {\r\n        if (&lt;&gt;1__state == -2 &amp;&amp;\r\n            &lt;&gt;l__initialThreadId == Environment.CurrentManagedThreadId)\r\n        {\r\n            &lt;&gt;1__state = 0;\r\n            return this;\r\n        }\r\n        return new &lt;Fib&gt;d__0(0);\r\n    }\r\n\r\n    IEnumerator IEnumerable.GetEnumerator() =&gt; ((IEnumerable&lt;int&gt;)this).GetEnumerator();\r\n    void IEnumerator.Reset() =&gt; throw new NotSupportedException();\r\n    void IDisposable.Dispose() { }\r\n}<\/code><\/pre>\n<p>All of the logic for Fib is now inside of the <code>MoveNext<\/code> method, but as part of a jump table that lets the implementation branch to where it last left off, which is tracked in a generated state field on the enumerator type.  And the variables I wrote as locals, like <code>prev<\/code>, <code>next<\/code>, and <code>sum<\/code>, have been &#8220;lifted&#8221; to be fields on the enumerator, so that they may persist across invocations of <code>MoveNext<\/code>.<\/p>\n<p>(Note that the previous code snippet showing how the C# compiler emits the implementation won&#8217;t compile as-is.  
The C# compiler synthesizes &#8220;unspeakable&#8221; names, meaning it names types and members it creates in a way that&#8217;s valid IL but invalid C#, so as not to risk conflicting with any user-named types and members.  I&#8217;ve kept everything named as the compiler does, but if you want to experiment with compiling it, you can rename things to use valid C# names instead.)<\/p>\n<p>In my previous example, the last form of enumeration I showed involved manually using the <code>IEnumerator&lt;T&gt;<\/code>. At that level, we&#8217;re manually invoking <code>MoveNext()<\/code>, deciding when it was an appropriate time to re-enter the coroutine. But&#8230; what if instead of invoking it like that, I could instead have the next invocation of <code>MoveNext<\/code> actually be part of the continuation work performed when an asynchronous operation completes?  What if I could <code>yield return<\/code> something that represents an asynchronous operation and have the consuming code hook up a continuation to that yielded object where that continuation then does the <code>MoveNext<\/code>? With such an approach, I could write a helper method like this:<\/p>\n<pre><code class=\"language-C#\">static Task IterateAsync(IEnumerable&lt;Task&gt; tasks)\r\n{\r\n    var tcs = new TaskCompletionSource();\r\n\r\n    IEnumerator&lt;Task&gt; e = tasks.GetEnumerator();\r\n\r\n    void Process()\r\n    {\r\n        try\r\n        {\r\n            if (e.MoveNext())\r\n            {\r\n                e.Current.ContinueWith(t =&gt; Process());\r\n                return;\r\n            }\r\n        }\r\n        catch (Exception e)\r\n        {\r\n            tcs.SetException(e);\r\n            return;\r\n        }\r\n        tcs.SetResult();\r\n    };\r\n    Process();\r\n\r\n    return tcs.Task;\r\n}<\/code><\/pre>\n<p>Now this is getting interesting.  We&#8217;re given an enumerable of tasks that we can iterate through.  
Each time we <code>MoveNext<\/code> to the next <code>Task<\/code> and get one, we then hook up a continuation to that <code>Task<\/code>; when that <code>Task<\/code> completes, it&#8217;ll just turn around and call right back to the same logic that does a <code>MoveNext<\/code>, gets the next <code>Task<\/code>, and so on.  This is building on the idea of <code>Task<\/code> as a single representation for any asynchronous operation, so the enumerable we&#8217;re fed can be a sequence of any asynchronous operations.  Where might such a sequence come from?  From an iterator, of course.  Remember our earlier <code>CopyStreamToStream<\/code> example and how gloriously horrible the APM-based implementation was?  Consider this instead:<\/p>\n<pre><code class=\"language-C#\">static Task CopyStreamToStreamAsync(Stream source, Stream destination)\r\n{\r\n    return IterateAsync(Impl(source, destination));\r\n\r\n    static IEnumerable&lt;Task&gt; Impl(Stream source, Stream destination)\r\n    {\r\n        var buffer = new byte[0x1000];\r\n        while (true)\r\n        {\r\n            Task&lt;int&gt; read = source.ReadAsync(buffer, 0, buffer.Length);\r\n            yield return read;\r\n            int numRead = read.Result;\r\n            if (numRead &lt;= 0)\r\n            {\r\n                break;\r\n            }\r\n\r\n            Task write = destination.WriteAsync(buffer, 0, numRead);\r\n            yield return write;\r\n            write.Wait();\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>Wow, this is almost legible.  We&#8217;re calling that <code>IterateAsync<\/code> helper, and the enumerable we&#8217;re feeding it is one produced by an iterator that&#8217;s handling all the control flow for the copy.  
It calls <code>Stream.ReadAsync<\/code> and then <code>yield return<\/code>s that <code>Task<\/code>; that yielded task is what will be handed off to <code>IterateAsync<\/code> after it calls <code>MoveNext<\/code>, and <code>IterateAsync<\/code> will hook a continuation up to that <code>Task<\/code>, which when it completes will then just call back into <code>MoveNext<\/code> and end up back in this iterator just after the <code>yield<\/code>.  At that point, the <code>Impl<\/code> logic gets the result of the read, calls <code>WriteAsync<\/code>, and again yields the <code>Task<\/code> it produced.  And so on.<\/p>\n<p>And that, my friends, is the beginning of <code>async<\/code>\/<code>await<\/code> in C# and .NET. Something around 95% of the logic in support of iterators and <code>async<\/code>\/<code>await<\/code> in the C# compiler is shared.  Different syntax, different types involved, but fundamentally the same transform. Squint at the <code>yield return<\/code>s, and you can almost see <code>await<\/code>s in their stead.<\/p>\n<p>In fact, some enterprising developers <a href=\"https:\/\/learn.microsoft.com\/archive\/msdn-magazine\/2008\/june\/concurrent-affairs-simplified-apm-with-the-asyncenumerator\">used iterators in this fashion for asynchronous programming<\/a> before <code>async<\/code>\/<code>await<\/code> hit the scene. And a similar transformation was prototyped in the experimental <a href=\"https:\/\/en.wikipedia.org\/wiki\/Axum_(programming_language)\">Axum<\/a> programming language, serving as a key inspiration for C#&#8217;s async support. Axum provided an <code>async<\/code> keyword that could be put onto a method, just like <code>async<\/code> can now in C#. <code>Task<\/code> wasn&#8217;t yet ubiquitous, so inside of <code>async<\/code> methods, the Axum compiler heuristically matched synchronous method calls to their APM counterparts, e.g. 
if it saw you calling <code>stream.Read<\/code>, it would find and utilize the corresponding <code>stream.BeginRead<\/code> and <code>stream.EndRead<\/code> methods, synthesizing the appropriate delegate to pass to the Begin method, while also generating a complete APM implementation for the <code>async<\/code> method being defined such that it was compositional. It even integrated with <code>SynchronizationContext<\/code>! While Axum was eventually shelved, it served as an awesome and motivating prototype for what eventually became <code>async<\/code>\/<code>await<\/code> in C#.<\/p>\n<h2><code>async<\/code>\/<code>await<\/code> under the covers<\/h2>\n<p>Now that we know how we got here, let&#8217;s dive into how it actually works.  For reference, here&#8217;s our example synchronous method again:<\/p>\n<pre><code class=\"language-C#\">public void CopyStreamToStream(Stream source, Stream destination)\r\n{\r\n    var buffer = new byte[0x1000];\r\n    int numRead;\r\n    while ((numRead = source.Read(buffer, 0, buffer.Length)) != 0)\r\n    {\r\n        destination.Write(buffer, 0, numRead);\r\n    }\r\n}<\/code><\/pre>\n<p>and again here&#8217;s what the corresponding method looks like with <code>async<\/code>\/<code>await<\/code>:<\/p>\n<pre><code class=\"language-C#\">public async Task CopyStreamToStreamAsync(Stream source, Stream destination)\r\n{\r\n    var buffer = new byte[0x1000];\r\n    int numRead;\r\n    while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) != 0)\r\n    {\r\n        await destination.WriteAsync(buffer, 0, numRead);\r\n    }\r\n}<\/code><\/pre>\n<p>A breath of fresh air in comparison to everything we&#8217;ve seen thus far. The signature changed from <code>void<\/code> to <code>async Task<\/code>, we call <code>ReadAsync<\/code> and <code>WriteAsync<\/code> instead of <code>Read<\/code> and <code>Write<\/code>, respectively, and both of those operations are prefixed with <code>await<\/code>.  That&#8217;s it.  
The compiler and the core libraries take over the rest, fundamentally changing how the code is actually executed.  Let&#8217;s dive into how.<\/p>\n<h3>Compiler Transform<\/h3>\n<p>As we&#8217;ve already seen, as with iterators, the compiler rewrites the async method into one based on a state machine.  We still have a method with the same signature the developer wrote (<code>public Task CopyStreamToStreamAsync(Stream source, Stream destination)<\/code>), but the body of that method is completely different:<\/p>\n<pre><code class=\"language-C#\">[AsyncStateMachine(typeof(&lt;CopyStreamToStreamAsync&gt;d__0))]\r\npublic Task CopyStreamToStreamAsync(Stream source, Stream destination)\r\n{\r\n    &lt;CopyStreamToStreamAsync&gt;d__0 stateMachine = default;\r\n    stateMachine.&lt;&gt;t__builder = AsyncTaskMethodBuilder.Create();\r\n    stateMachine.source = source;\r\n    stateMachine.destination = destination;\r\n    stateMachine.&lt;&gt;1__state = -1;\r\n    stateMachine.&lt;&gt;t__builder.Start(ref stateMachine);\r\n    return stateMachine.&lt;&gt;t__builder.Task;\r\n}\r\n\r\nprivate struct &lt;CopyStreamToStreamAsync&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder &lt;&gt;t__builder;\r\n    public Stream source;\r\n    public Stream destination;\r\n    private byte[] &lt;buffer&gt;5__2;\r\n    private TaskAwaiter &lt;&gt;u__1;\r\n    private TaskAwaiter&lt;int&gt; &lt;&gt;u__2;\r\n\r\n    ...\r\n}<\/code><\/pre>\n<p>Note that the only signature difference from what the dev wrote is the lack of the <code>async<\/code> keyword itself.  <code>async<\/code> isn&#8217;t actually a part of the method signature; like <code>unsafe<\/code>, when you put it in the method signature, you&#8217;re expressing an implementation detail of the method rather than something that&#8217;s actually exposed as part of the contract.  
Using <code>async<\/code>\/<code>await<\/code> to implement a <code>Task<\/code>-returning method is an implementation detail.<\/p>\n<p>The compiler has generated a struct named <code>&lt;CopyStreamToStreamAsync&gt;d__0<\/code>, and it&#8217;s zero-initialized an instance of that struct on the stack. Importantly, if the async method completes synchronously, this state machine will never have left the stack.  That means there&#8217;s no allocation associated with the state machine <em>unless<\/em> the method needs to complete asynchronously, meaning it <code>await<\/code>s something that&#8217;s not yet completed by that point.  More on that in a bit.<\/p>\n<p>This struct <em>is<\/em> the state machine for the method, containing not only all of the transformed logic from what the developer wrote, but also fields for tracking the current position in that method as well as all of the &#8220;local&#8221; state the compiler lifted out of the method that needs to survive between <code>MoveNext<\/code> invocations.  It&#8217;s the logical equivalent of the <code>IEnumerable&lt;T&gt;<\/code>\/<code>IEnumerator&lt;T&gt;<\/code> implementation we saw in the iterator. (Note that the code I&#8217;m showing is from a release build; in debug builds the C# compiler will actually generate these state machine types as classes, as doing so can aid in certain debugging exercises.)<\/p>\n<p>After initializing the state machine, we see a call to <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/6319039691477bf9296a0d62fd4a2491868966d8\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/AsyncTaskMethodBuilder.cs#L25\"><code>AsyncTaskMethodBuilder.Create()<\/code><\/a>.  
While we&#8217;re currently focused on <code>Task<\/code>s, the C# language and compiler allow for arbitrary types (<a href=\"https:\/\/learn.microsoft.com\/dotnet\/csharp\/language-reference\/proposals\/csharp-7.0\/task-types#builder-type\">&#8220;task-like&#8221; types<\/a>) to be returned from <code>async<\/code> methods, e.g. I can write a method <code>public async MyTask CopyStreamToStreamAsync<\/code>, and it would compile just fine as long as we augment the <code>MyTask<\/code> we defined earlier in an appropriate way. That appropriateness includes declaring an associated &#8220;builder&#8221; type and associating it with the type via the <code>AsyncMethodBuilder<\/code> attribute:<\/p>\n<pre><code class=\"language-C#\">[AsyncMethodBuilder(typeof(MyTaskMethodBuilder))]\r\npublic class MyTask\r\n{\r\n    ...\r\n}\r\n\r\npublic struct MyTaskMethodBuilder\r\n{\r\n    public static MyTaskMethodBuilder Create() { ... }\r\n\r\n    public void Start&lt;TStateMachine&gt;(ref TStateMachine stateMachine) where TStateMachine : IAsyncStateMachine { ... }\r\n    public void SetStateMachine(IAsyncStateMachine stateMachine) { ... }\r\n\r\n    public void SetResult() { ... }\r\n    public void SetException(Exception exception) { ... }\r\n\r\n    public void AwaitOnCompleted&lt;TAwaiter, TStateMachine&gt;(\r\n        ref TAwaiter awaiter, ref TStateMachine stateMachine)\r\n        where TAwaiter : INotifyCompletion\r\n        where TStateMachine : IAsyncStateMachine { ... }\r\n    public void AwaitUnsafeOnCompleted&lt;TAwaiter, TStateMachine&gt;(\r\n        ref TAwaiter awaiter, ref TStateMachine stateMachine)\r\n        where TAwaiter : ICriticalNotifyCompletion\r\n        where TStateMachine : IAsyncStateMachine { ... }\r\n\r\n    public MyTask Task { get { ... 
} }\r\n}<\/code><\/pre>\n<p>In this context, such a &#8220;builder&#8221; is something that knows how to create an instance of that type (the <code>Task<\/code> property), complete it either successfully and with a result if appropriate (<code>SetResult<\/code>) or with an exception (<code>SetException<\/code>), and handle hooking up continuations to <code>await<\/code>ed things that haven&#8217;t yet completed (<code>AwaitOnCompleted<\/code>\/<code>AwaitUnsafeOnCompleted<\/code>).  In the case of <code>System.Threading.Tasks.Task<\/code>, it is by default associated with the <code>AsyncTaskMethodBuilder<\/code>.  Normally that association is provided via an <code>[AsyncMethodBuilder(...)]<\/code> attribute applied to the type, but <code>Task<\/code> is known specially to C# and so isn&#8217;t actually adorned with that attribute.  As such, the compiler has reached for the builder to use for this <code>async<\/code> method, and is constructing an instance of it using the <code>Create<\/code> method that&#8217;s part of the pattern.  Note that as with the state machine, <code>AsyncTaskMethodBuilder<\/code> is also a struct, so there&#8217;s no allocation here, either.<\/p>\n<p>The state machine is then populated with the arguments to this entry point method.  Those parameters need to be available to the body of the method that&#8217;s been moved into <code>MoveNext<\/code>, and as such these arguments need to be stored in the state machine so that they can be referenced by the code on the subsequent call to <code>MoveNext<\/code>.  The state machine is also initialized to be in the initial <code>-1<\/code> state.  
If <code>MoveNext<\/code> is called and the state is <code>-1<\/code>, we&#8217;ll end up starting logically at the beginning of the method.<\/p>\n<p>Now the most unassuming but most consequential line: a call to the builder&#8217;s <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/6319039691477bf9296a0d62fd4a2491868966d8\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/AsyncTaskMethodBuilder.cs#L32-L33\"><code>Start<\/code><\/a> method.  This is another part of the pattern that must be exposed on a type used in the return position of an <code>async<\/code> method, and it&#8217;s used to perform the initial <code>MoveNext<\/code> on the state machine.  The builder&#8217;s Start method is effectively just this:<\/p>\n<pre><code class=\"language-C#\">public void Start&lt;TStateMachine&gt;(ref TStateMachine stateMachine) where TStateMachine : IAsyncStateMachine\r\n{\r\n    stateMachine.MoveNext();\r\n}<\/code><\/pre>\n<p>such that calling <code>stateMachine.&lt;&gt;t__builder.Start(ref stateMachine);<\/code> is really just calling <code>stateMachine.MoveNext()<\/code>.  In which case, why doesn&#8217;t the compiler just emit that directly? Why have <code>Start<\/code> at all?  The answer is that there&#8217;s a tad bit more to <code>Start<\/code> than I let on.  But for that, we need to take a brief detour into understanding <code>ExecutionContext<\/code>.<\/p>\n<h4>ExecutionContext<\/h4>\n<p>We&#8217;re all familiar with passing around state from method to method.  You call a method, and if that method specifies parameters, you call the method with arguments in order to feed that data into the callee.  This is explicitly passing around data.  But there are other more implicit means.  For example, rather than passing data as arguments, a method could be parameterless but could dictate that some specific static fields may be populated prior to making the method call, and the method will pull state from there.  
Nothing about the method&#8217;s signature indicates it takes arguments, because it doesn&#8217;t: there&#8217;s just an implicit contract between the caller and callee that the caller might populate some memory locations and the callee might read those memory locations.  The callee and the caller may not even realize it&#8217;s happening if they&#8217;re intermediaries, e.g. method <code>A<\/code> might populate the statics and then call <code>B<\/code> which calls <code>C<\/code> which calls <code>D<\/code> which eventually calls <code>E<\/code> that reads the values of those statics. This is often referred to as &#8220;ambient&#8221; data: it&#8217;s not passed to you via parameters but rather is just sort of hanging out there and available for you to consume if desired.<\/p>\n<p>We can take this a step further, and use thread-local state. Thread-local state, which in .NET is achieved via static fields attributed as <code>[ThreadStatic]<\/code> or via the <code>ThreadLocal&lt;T&gt;<\/code> type, can be used in the same way, but with the data limited to just the current thread of execution, with every thread able to have its own isolated copy of those fields.  With that, you could populate the thread static, make the method call, and then upon the method&#8217;s completion revert the changes to the thread static, enabling a fully isolated form of such implicitly passed data.<\/p>\n<p>But, what about asynchrony? If we make an asynchronous method call and logic inside that asynchronous method wants to access that ambient data, how would it do so?  If the data were stored in regular statics, the asynchronous method would be able to access it, but you could only ever have one such method in flight at a time, as multiple callers could end up overwriting each other&#8217;s state when they write to those shared static fields.  
If the data were stored in thread statics, the asynchronous method would be able to access it, but only up until the point where it stopped running synchronously on the calling thread; if it hooked up a continuation to some operation it initiated and that continuation ended up running on some other thread, it would no longer have access to the thread static information.  Even if it did happen to run on the same thread, either by chance or because the scheduler forced it to, by the time it did it&#8217;s likely the data would have been removed and\/or overwritten by some other operation initiated by that thread.  For asynchrony, what we need is a mechanism that would allow arbitrary ambient data to flow across these asynchronous points, such that throughout an async method&#8217;s logic, wherever and whenever that logic might run, it would have access to that same data.<\/p>\n<p>Enter <code>ExecutionContext<\/code>.  The <code>ExecutionContext<\/code> type is the vehicle by which ambient data flows from async operation to async operation.  It lives in a <code>[ThreadStatic]<\/code>, but then when some asynchronous operation is initiated, it&#8217;s &#8220;captured&#8221; (a fancy way of saying &#8220;read a copy from that thread static&#8221;), stored, and then when the continuation of that asynchronous operation is run, the <code>ExecutionContext<\/code> is first restored to live in the <code>[ThreadStatic]<\/code> on the thread which is about to run the operation.  
<code>ExecutionContext<\/code> is the mechanism by which <code>AsyncLocal&lt;T&gt;<\/code> is implemented (in fact, in .NET Core, <code>ExecutionContext<\/code> is entirely about <code>AsyncLocal&lt;T&gt;<\/code>, nothing more), such that if you store a value into an <code>AsyncLocal&lt;T&gt;<\/code>, and then for example queue a work item to run on the <code>ThreadPool<\/code>, that value will be visible in that <code>AsyncLocal&lt;T&gt;<\/code> inside of that work item running on the pool:<\/p>\n<pre><code class=\"language-C#\">var number = new AsyncLocal&lt;int&gt;();\r\n\r\nnumber.Value = 42;\r\nThreadPool.QueueUserWorkItem(_ =&gt; Console.WriteLine(number.Value));\r\nnumber.Value = 0;\r\n\r\nConsole.ReadLine();<\/code><\/pre>\n<p>That will print <code>42<\/code> every time this is run.  It doesn&#8217;t matter that the moment after we queue the delegate we reset the value of the <code>AsyncLocal&lt;int&gt;<\/code> back to 0, because the <code>ExecutionContext<\/code> was captured as part of the <code>QueueUserWorkItem<\/code> call, and that capture included the state of the <code>AsyncLocal&lt;int&gt;<\/code> at that exact moment.  
We can see this in more detail by implementing our own simple thread pool:<\/p>\n<pre><code class=\"language-C#\">using System.Collections.Concurrent;\r\n\r\nvar number = new AsyncLocal&lt;int&gt;();\r\n\r\nnumber.Value = 42;\r\nMyThreadPool.QueueUserWorkItem(() =&gt; Console.WriteLine(number.Value));\r\nnumber.Value = 0;\r\n\r\nConsole.ReadLine();\r\n\r\nclass MyThreadPool\r\n{\r\n    private static readonly BlockingCollection&lt;(Action, ExecutionContext?)&gt; s_workItems = new();\r\n\r\n    public static void QueueUserWorkItem(Action workItem)\r\n    {\r\n        s_workItems.Add((workItem, ExecutionContext.Capture()));\r\n    }\r\n\r\n    static MyThreadPool()\r\n    {\r\n        for (int i = 0; i &lt; Environment.ProcessorCount; i++)\r\n        {\r\n            new Thread(() =&gt;\r\n            {\r\n                while (true)\r\n                {\r\n                    (Action action, ExecutionContext? ec) = s_workItems.Take();\r\n                    if (ec is null)\r\n                    {\r\n                        action();\r\n                    }\r\n                    else\r\n                    {\r\n                        ExecutionContext.Run(ec, s =&gt; ((Action)s!)(), action);\r\n                    }\r\n                }\r\n            })\r\n            { IsBackground = true }.UnsafeStart();\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>Here <code>MyThreadPool<\/code> has a <code>BlockingCollection&lt;(Action, ExecutionContext?)&gt;<\/code> that represents its work item queue, with each work item being the delegate for the work to be invoked as well as the <code>ExecutionContext<\/code> associated with that work.  The static constructor for the pool spins up a bunch of threads, each of which just sits in an infinite loop taking the next work item and running it.  If no <code>ExecutionContext<\/code> was captured for a given delegate, the delegate is just invoked directly.  
But if an <code>ExecutionContext<\/code> was captured, rather than invoking the delegate directly, we call the <code>ExecutionContext.Run<\/code> method, which will restore the supplied <code>ExecutionContext<\/code> as the current context prior to running the delegate, and will then reset the context afterwards.  This example includes the exact same code with an <code>AsyncLocal&lt;int&gt;<\/code> previously shown, except this time using <code>MyThreadPool<\/code> instead of <code>ThreadPool<\/code>, yet it will still output <code>42<\/code> each time, because the pool is properly flowing <code>ExecutionContext<\/code>.<\/p>\n<p>As an aside, you&#8217;ll note I called <code>UnsafeStart<\/code> in <code>MyThreadPool<\/code>&#8217;s static constructor. Starting a new thread is exactly the kind of asynchronous point that should flow <code>ExecutionContext<\/code>, and indeed, <code>Thread<\/code>&#8217;s <code>Start<\/code> method uses <code>ExecutionContext.Capture<\/code> to capture the current context, store it on the <code>Thread<\/code>, and then use that captured context when eventually invoking the <code>Thread<\/code>&#8217;s <code>ThreadStart<\/code> delegate.  I didn&#8217;t want to do that in this example, though, as I didn&#8217;t want the <code>Thread<\/code>s to capture whatever <code>ExecutionContext<\/code> happened to be present when the static constructor ran (doing so could make a demo about <code>ExecutionContext<\/code> more convoluted), so I used the <code>UnsafeStart<\/code> method instead.  Threading-related methods that begin with <code>Unsafe<\/code> behave exactly the same as the corresponding methods that lack the <code>Unsafe<\/code> prefix except that they <em>don&#8217;t<\/em> capture <code>ExecutionContext<\/code>, e.g. 
<code>Thread.Start<\/code> and <code>Thread.UnsafeStart<\/code> do identical work, but whereas <code>Start<\/code> captures <code>ExecutionContext<\/code>, <code>UnsafeStart<\/code> does not.<\/p>\n<h4>Back To Start<\/h4>\n<p>We took a detour into discussing <code>ExecutionContext<\/code> when I was writing about the implementation of <code>AsyncTaskMethodBuilder.Start<\/code>, which I said was effectively:<\/p>\n<pre><code class=\"language-C#\">public void Start&lt;TStateMachine&gt;(ref TStateMachine stateMachine) where TStateMachine : IAsyncStateMachine\r\n{\r\n    stateMachine.MoveNext();\r\n}<\/code><\/pre>\n<p>and then suggested I&#8217;d simplified a bit.  That simplification was ignoring the fact that the method actually needs to factor <code>ExecutionContext<\/code> into things, and is thus more like this:<\/p>\n<pre><code class=\"language-C#\">public void Start&lt;TStateMachine&gt;(ref TStateMachine stateMachine) where TStateMachine : IAsyncStateMachine\r\n{\r\n    ExecutionContext previous = Thread.CurrentThread._executionContext; \/\/ [ThreadStatic] field\r\n    try\r\n    {\r\n        stateMachine.MoveNext();\r\n    }\r\n    finally\r\n    {\r\n        ExecutionContext.Restore(previous); \/\/ internal helper\r\n    }\r\n}<\/code><\/pre>\n<p>Rather than just calling <code>stateMachine.MoveNext()<\/code> as I&#8217;d previously suggested we did, we do a dance here of getting the current <code>ExecutionContext<\/code>, then invoking <code>MoveNext<\/code>, and then upon its completion resetting the current context back to what it was prior to the <code>MoveNext<\/code> invocation.<\/p>\n<p>The reason for this is to prevent ambient data leakage from an async method out to its caller.  
An example method demonstrates why that matters:<\/p>\n<pre><code class=\"language-C#\">async Task ElevateAsAdminAndRunAsync()\r\n{\r\n    using (WindowsIdentity identity = LoginAdmin())\r\n    {\r\n        using (WindowsImpersonationContext impersonatedUser = identity.Impersonate())\r\n        {\r\n            await DoSensitiveWorkAsync();\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>&#8220;Impersonation&#8221; is the act of changing ambient information about the current user to instead be that of someone else; this lets code act on behalf of someone else, using their privileges and access. In .NET, such impersonation flows across asynchronous operations, which means it&#8217;s part of <code>ExecutionContext<\/code>.  Now imagine if <code>Start<\/code> didn&#8217;t restore the previous context, and consider this code:<\/p>\n<pre><code class=\"language-C#\">Task t = ElevateAsAdminAndRunAsync();\r\nPrintUser();\r\nawait t;<\/code><\/pre>\n<p>This code could find that the <code>ExecutionContext<\/code> modified inside of <code>ElevateAsAdminAndRunAsync<\/code> remains after <code>ElevateAsAdminAndRunAsync<\/code> returns to its synchronous caller (which happens the first time the method <code>await<\/code>s something that&#8217;s not yet complete).  That&#8217;s because after calling <code>Impersonate<\/code>, we call <code>DoSensitiveWorkAsync<\/code> and <code>await<\/code> the task it returns.  Assuming that task isn&#8217;t complete, it will cause the invocation of <code>ElevateAsAdminAndRunAsync<\/code> to yield and return to the caller, with the impersonation still in effect on the current thread. That is not something we want.  
As such, <code>Start<\/code> erects this guard that ensures any modifications to <code>ExecutionContext<\/code> don&#8217;t flow <em>out<\/em> of the synchronous method call and only flow along with any subsequent work performed by the method.<\/p>\n<h4>MoveNext<\/h4>\n<p>So, the entry point method was invoked, the state machine struct was initialized, <code>Start<\/code> was called, and that invoked <code>MoveNext<\/code>.  What is <code>MoveNext<\/code>?  It&#8217;s the method that contains all of the original logic from the dev&#8217;s method, but with a whole bunch of changes.  Let&#8217;s start just by looking at the scaffolding of the method. Here&#8217;s a decompiled version of what the compiler emits for our method, but with everything inside of the generated <code>try<\/code> block removed:<\/p>\n<pre><code class=\"language-C#\">private void MoveNext()\r\n{\r\n    try\r\n    {\r\n        ... \/\/ all of the code from the CopyStreamToStreamAsync method body, but not exactly as it was written\r\n    }\r\n    catch (Exception exception)\r\n    {\r\n        &lt;&gt;1__state = -2;\r\n        &lt;buffer&gt;5__2 = null;\r\n        &lt;&gt;t__builder.SetException(exception);\r\n        return;\r\n    }\r\n\r\n    &lt;&gt;1__state = -2;\r\n    &lt;buffer&gt;5__2 = null;\r\n    &lt;&gt;t__builder.SetResult();\r\n}<\/code><\/pre>\n<p>Whatever other work is performed by <code>MoveNext<\/code>, it has the responsibility of completing the <code>Task<\/code> returned from the <code>async Task<\/code> method when all of the work is done.  If the body of the <code>try<\/code> block throws an exception that goes unhandled, then the task will be faulted with that exception.  And if the async method successfully reaches its end (equivalent to a synchronous method returning), it will complete the returned task successfully.  In either of those cases, it&#8217;s setting the state of the state machine to indicate completion. 
(I sometimes hear developers theorize that, when it comes to exceptions, there&#8217;s a difference between those thrown before the first <code>await<\/code> and after&#8230; based on the above, it should be clear that is <em>not<\/em> the case.  Any exception that goes unhandled inside of an <code>async<\/code> method, no matter where it is in the method and no matter whether the method has yielded, will end up in the above <code>catch<\/code> block, with the caught exception then stored into the <code>Task<\/code> that&#8217;s returned from the <code>async<\/code> method.)<\/p>\n<p>Also note that this completion is going through the builder, using its <code>SetException<\/code> and <code>SetResult<\/code> methods that are part of the pattern for a builder expected by the compiler.  If the async method has previously suspended, the builder will have already had to manufacture a <code>Task<\/code> as part of that suspension handling (we&#8217;ll see how and where soon), in which case calling <code>SetException<\/code>\/<code>SetResult<\/code> will complete that <code>Task<\/code>.  If, however, the async method hasn&#8217;t previously suspended, then we haven&#8217;t yet created a <code>Task<\/code> or returned anything to the caller, so the builder has more flexibility in how it produces that <code>Task<\/code>.  If you remember previously in the entry point method, the very last thing it does is return the <code>Task<\/code> to the caller, which it does by returning the result of accessing the builder&#8217;s <code>Task<\/code> property (so many things called &#8220;Task&#8221;, I know):<\/p>\n<pre><code class=\"language-C#\">public Task CopyStreamToStreamAsync(Stream source, Stream destination)\r\n{\r\n    ...\r\n    return stateMachine.&lt;&gt;t__builder.Task;\r\n}<\/code><\/pre>\n<p>The builder knows if the method ever suspended, in which case it has a <code>Task<\/code> that was already created and just returns that.  
If the method never suspended and the builder doesn&#8217;t yet have a task, it can manufacture a completed task here. In this case, with a successful completion, it can just use <code>Task.CompletedTask<\/code> rather than allocating a new task, avoiding any allocation. In the case of a generic <code>Task&lt;TResult&gt;<\/code>, the builder can just use <code>Task.FromResult&lt;TResult&gt;(TResult result)<\/code>.<\/p>\n<p>The builder can also do whatever translations it deems are appropriate to the kind of object it&#8217;s creating.  For example, <code>Task<\/code> actually has three possible final states: success, failure, and canceled. The <code>AsyncTaskMethodBuilder<\/code>&#8216;s <code>SetException<\/code> method <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/3e73be1b8082840545dbf85867cc4f9023e9b1aa\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/AsyncTaskMethodBuilderT.cs#L461-L486\">special-cases <code>OperationCanceledException<\/code><\/a>, transitioning the <code>Task<\/code> into a <code>TaskStatus.Canceled<\/code> final state if the exception provided is or derives from <code>OperationCanceledException<\/code>; otherwise, the task ends as <code>TaskStatus.Faulted<\/code>.  
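<\/p>\n<p>As a minimal sketch of that translation (the <code>ThrowAsync<\/code> helper and the <code>Program<\/code> class are hypothetical names introduced only for this illustration), the following program lets two different exceptions escape an <code>async Task<\/code> method and then inspects the final status of each returned task:<\/p>\n<pre><code class=\"language-C#\">using System;\r\nusing System.Threading.Tasks;\r\n\r\nclass Program\r\n{\r\n    \/\/ Hypothetical helper: suspends once, then lets the supplied exception escape the async method.\r\n    static async Task ThrowAsync(Exception e)\r\n    {\r\n        await Task.Yield();\r\n        throw e;\r\n    }\r\n\r\n    static async Task Main()\r\n    {\r\n        Task canceled = ThrowAsync(new OperationCanceledException());\r\n        Task faulted = ThrowAsync(new InvalidOperationException());\r\n\r\n        \/\/ Awaiting rethrows the stored exception in both cases.\r\n        try { await canceled; } catch (OperationCanceledException) { }\r\n        try { await faulted; } catch (InvalidOperationException) { }\r\n\r\n        Console.WriteLine(canceled.Status); \/\/ Canceled\r\n        Console.WriteLine(faulted.Status);  \/\/ Faulted\r\n    }\r\n}<\/code><\/pre>\n<p>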
Such a distinction often isn&#8217;t apparent in consuming code; since the exception is stored into the <code>Task<\/code> regardless of whether it&#8217;s marked as <code>Canceled<\/code> or <code>Faulted<\/code>, code <code>await<\/code>&#8216;ing that <code>Task<\/code> will not be able to observe the difference between the states (the original exception will be propagated in either case)&#8230; it only affects code that interacts with the <code>Task<\/code> directly, such as via <code>ContinueWith<\/code>, which has overloads that enable a continuation to be invoked only for a subset of completion statuses.<\/p>\n<p>Now that we understand the lifecycle aspects, here&#8217;s everything filled in inside the <code>try<\/code> block in <code>MoveNext<\/code>:<\/p>\n<pre><code class=\"language-C#\">private void MoveNext()\r\n{\r\n    try\r\n    {\r\n        int num = &lt;&gt;1__state;\r\n\r\n        TaskAwaiter&lt;int&gt; awaiter;\r\n        if (num != 0)\r\n        {\r\n            if (num != 1)\r\n            {\r\n                &lt;buffer&gt;5__2 = new byte[4096];\r\n                goto IL_008b;\r\n            }\r\n\r\n            awaiter = &lt;&gt;u__2;\r\n            &lt;&gt;u__2 = default(TaskAwaiter&lt;int&gt;);\r\n            num = (&lt;&gt;1__state = -1);\r\n            goto IL_00f0;\r\n        }\r\n\r\n        TaskAwaiter awaiter2 = &lt;&gt;u__1;\r\n        &lt;&gt;u__1 = default(TaskAwaiter);\r\n        num = (&lt;&gt;1__state = -1);\r\n        IL_0084:\r\n        awaiter2.GetResult();\r\n\r\n        IL_008b:\r\n        awaiter = source.ReadAsync(&lt;buffer&gt;5__2, 0, &lt;buffer&gt;5__2.Length).GetAwaiter();\r\n        if (!awaiter.IsCompleted)\r\n        {\r\n            num = (&lt;&gt;1__state = 1);\r\n            &lt;&gt;u__2 = awaiter;\r\n            &lt;&gt;t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);\r\n            return;\r\n        }\r\n        IL_00f0:\r\n        int result;\r\n        if ((result = awaiter.GetResult()) != 0)\r\n  
      {\r\n            awaiter2 = destination.WriteAsync(&lt;buffer&gt;5__2, 0, result).GetAwaiter();\r\n            if (!awaiter2.IsCompleted)\r\n            {\r\n                num = (&lt;&gt;1__state = 0);\r\n                &lt;&gt;u__1 = awaiter2;\r\n                &lt;&gt;t__builder.AwaitUnsafeOnCompleted(ref awaiter2, ref this);\r\n                return;\r\n            }\r\n            goto IL_0084;\r\n        }\r\n    }\r\n    catch (Exception exception)\r\n    {\r\n        &lt;&gt;1__state = -2;\r\n        &lt;buffer&gt;5__2 = null;\r\n        &lt;&gt;t__builder.SetException(exception);\r\n        return;\r\n    }\r\n\r\n    &lt;&gt;1__state = -2;\r\n    &lt;buffer&gt;5__2 = null;\r\n    &lt;&gt;t__builder.SetResult();\r\n}<\/code><\/pre>\n<p>This kind of complication might feel a tad familiar.  Remember how convoluted our manually-implemented <code>BeginCopyStreamToStream<\/code> based on APM was?  This isn&#8217;t quite as complicated, but is also way better in that the compiler is doing the work for us, having rewritten the method in a form of continuation passing while ensuring that all necessary state is preserved for those continuations.  Even so, we can squint and follow along.  Remember that the state was initialized to -1 in the entry point.  We then enter <code>MoveNext<\/code>, find that this state (which is now stored in the <code>num<\/code> local) is neither 0 nor 1, and thus execute the code that creates the temporary buffer and then branches to label IL_008b, where it makes the call to <code>source.ReadAsync<\/code>.  
Note that at this point we&#8217;re still running synchronously from this call to <code>MoveNext<\/code>, and thus synchronously from <code>Start<\/code>, and thus synchronously from the entry point, meaning the developer&#8217;s code called <code>CopyStreamToStreamAsync<\/code> and it&#8217;s still synchronously executing, having not yet returned back a <code>Task<\/code> to represent the eventual completion of this method. That might be about to change&#8230;<\/p>\n<p>We call <code>Stream.ReadAsync<\/code> and we get back a <code>Task&lt;int&gt;<\/code> from it.  The read may have completed synchronously, it may have completed asynchronously but so fast that it&#8217;s now already completed, or it might not have completed yet.  Regardless, we have a <code>Task&lt;int&gt;<\/code> that represents its eventual completion, and the compiler emits code that inspects that <code>Task&lt;int&gt;<\/code> to determine how to proceed: if the <code>Task&lt;int&gt;<\/code> has in fact already completed (doesn&#8217;t matter whether it was completed synchronously or just by the time we checked), then the code for this method can just continue running synchronously&#8230; no point in spending unnecessary overhead queueing a work item to handle the remainder of the method&#8217;s execution when we can instead just keep running here and now.  But to handle the case where the <code>Task&lt;int&gt;<\/code> hasn&#8217;t completed, the compiler needs to emit code to hook up a continuation to the <code>Task<\/code>.  It thus needs to emit code that asks the <code>Task<\/code> &#8220;are you done?&#8221;  Does it talk to the <code>Task<\/code> directly to ask that?<\/p>\n<p>It would be limiting if the only thing you could <code>await<\/code> in C# was a <code>System.Threading.Tasks.Task<\/code>.  Similarly, it would be limiting if the C# compiler had to know about every possible type that could be <code>await<\/code>ed.  
Instead, C# does what it typically does in cases like this: it employs a pattern of APIs. Code can <code>await<\/code> anything that exposes that appropriate pattern, the &#8220;awaiter&#8221; pattern (just as you can <code>foreach<\/code> anything that provides the proper &#8220;enumerable&#8221; pattern).  For example, we can augment the <code>MyTask<\/code> type we wrote earlier to implement the awaiter pattern:<\/p>\n<pre><code class=\"language-C#\">class MyTask\r\n{\r\n    ...\r\n    public MyTaskAwaiter GetAwaiter() =&gt; new MyTaskAwaiter { _task = this };\r\n\r\n    public struct MyTaskAwaiter : ICriticalNotifyCompletion\r\n    {\r\n        internal MyTask _task;\r\n\r\n        public bool IsCompleted =&gt; _task._completed;\r\n        public void OnCompleted(Action continuation) =&gt; _task.ContinueWith(_ =&gt; continuation());\r\n        public void UnsafeOnCompleted(Action continuation) =&gt; _task.ContinueWith(_ =&gt; continuation());\r\n        public void GetResult() =&gt; _task.Wait();\r\n    }\r\n}<\/code><\/pre>\n<p>A type can be awaited if it exposes a <code>GetAwaiter()<\/code> method, which <code>Task<\/code> does.  That method needs to return something that in turn exposes several members, including an <code>IsCompleted<\/code> property, which is used to check at the moment <code>IsCompleted<\/code> is called whether the operation has already completed.  And you can see that happening: at IL_008b, the <code>Task<\/code> returned from <code>ReadAsync<\/code> has <code>GetAwaiter<\/code> called on it, and then <code>IsCompleted<\/code> accessed on that struct awaiter instance.  If <code>IsCompleted<\/code> returns <code>true<\/code>, then we&#8217;ll end up falling through to IL_00f0, where the code calls another member of the awaiter: <code>GetResult()<\/code>.  
If the operation failed, <code>GetResult()<\/code> is responsible for throwing an exception in order to propagate it out of the <code>await<\/code> in the async method; otherwise, <code>GetResult()<\/code> is responsible for returning the result of the operation, if there is one.  In the case here of the <code>ReadAsync<\/code>, if that result is 0, then we break out of our read\/write loop, go to the end of the method where it calls <code>SetResult<\/code>, and we&#8217;re done.<\/p>\n<p>Backing up a moment, though, the really interesting part of all of this is what happens if that <code>IsCompleted<\/code> check actually returns <code>false<\/code>.  If it returns <code>true<\/code>, we just keep on processing the loop, akin to in the APM pattern when <code>CompletedSynchronously<\/code> returned true and the caller of the Begin method, rather than the callback, was responsible for continuing execution. But if <code>IsCompleted<\/code> returns false, we need to suspend the execution of the async method until the <code>await<\/code>&#8216;d operation completes.  That means returning out of <code>MoveNext<\/code>, and as this was part of <code>Start<\/code> and we&#8217;re still in the entry point method, that means returning the <code>Task<\/code> out to the caller.  But before any of that can happen, we need to hook up a continuation to the <code>Task<\/code> being awaited (noting that to avoid stack dives as in the APM case, if the asynchronous operation completes after <code>IsCompleted<\/code> returns false but before we get to hook up the continuation, the continuation still needs to be invoked asynchronously from the calling thread, and thus it&#8217;ll get queued).  Since we can <code>await<\/code> anything, we can&#8217;t just talk to the <code>Task<\/code> instance directly; instead, we need to go through some pattern-based method to perform this.<\/p>\n<p>Does that mean there&#8217;s a method on the awaiter that will hook up the continuation?  
That would make sense; after all, <code>Task<\/code> itself supports continuations, has a <code>ContinueWith<\/code> method, etc&#8230; shouldn&#8217;t it be the <code>TaskAwaiter<\/code> returned from <code>GetAwaiter<\/code> that exposes the method that lets us set up a continuation?  It does, in fact.  The awaiter pattern requires that the awaiter implement the <code>INotifyCompletion<\/code> interface, which contains a single method <code>void OnCompleted(Action continuation)<\/code>.  An awaiter can also optionally implement the <code>ICriticalNotifyCompletion<\/code> interface, which inherits <code>INotifyCompletion<\/code> and adds a <code>void UnsafeOnCompleted(Action continuation)<\/code> method.  Per our previous discussion of <code>ExecutionContext<\/code>, you can guess what the difference between these two methods is: both hook up the continuation, but whereas <code>OnCompleted<\/code> should flow <code>ExecutionContext<\/code>, <code>UnsafeOnCompleted<\/code> needn&#8217;t. The need for two distinct methods here, <code>INotifyCompletion.OnCompleted<\/code> and <code>ICriticalNotifyCompletion.UnsafeOnCompleted<\/code>, is largely historical, having to do with Code Access Security, or CAS.  CAS no longer exists in .NET Core, and it&#8217;s off by default in .NET Framework, having teeth only if you opt back in to the legacy partial trust feature.  When partial trust is used, CAS information flows as part of <code>ExecutionContext<\/code>, and thus not flowing it is &#8220;unsafe&#8221;, hence why methods that don&#8217;t flow <code>ExecutionContext<\/code> were prefixed with &#8220;Unsafe&#8221;.  Such methods were also attributed as <code>[SecurityCritical]<\/code>, and partially trusted code can&#8217;t call a <code>[SecurityCritical]<\/code> method.  
As a result, two variants of <code>OnCompleted<\/code> were created, with the compiler preferring to use <code>UnsafeOnCompleted<\/code> if provided, but with the <code>OnCompleted<\/code> variant always provided on its own in case an awaiter needed to support partial trust.  From an async method perspective, however, the builder always flows <code>ExecutionContext<\/code> across await points, so an awaiter that also does so is unnecessary and duplicative work.<\/p>\n<p>Ok, so the awaiter does expose a method to hook up the continuation.  The compiler <em>could<\/em> use it directly, except for a very critical piece of the puzzle: what exactly should the continuation be?  And more to the point, with what object should it be associated? Remember that the state machine struct is on the stack, and the <code>MoveNext<\/code> invocation we&#8217;re currently running in is a method call on that instance. We need to preserve the state machine so that upon resumption we have all the correct state, which means the state machine can&#8217;t just keep living on the stack; it needs to be copied to somewhere on the heap, since the stack is going to end up being used for other subsequent, unrelated work performed by this thread.  And then the continuation needs to invoke the <code>MoveNext<\/code> method on that copy of the state machine on the heap.<\/p>\n<p>Moreover, <code>ExecutionContext<\/code> is relevant here as well.  The state machine needs to ensure that any ambient data stored in the <code>ExecutionContext<\/code> is captured at the point of suspension and then applied at the point of resumption, which means the continuation also needs to incorporate that <code>ExecutionContext<\/code>.  So, just creating a delegate that points to <code>MoveNext<\/code> on the state machine is insufficient.  It&#8217;s also undesirable overhead.  
If, when we suspend, we create a delegate that points to <code>MoveNext<\/code> on the state machine, each time we do so we&#8217;ll be boxing the state machine struct (even when it&#8217;s already on the heap as part of some other object) and allocating an additional delegate (the delegate&#8217;s <code>this<\/code> object reference will be to a newly boxed copy of the struct).  We thus need to do a complicated dance whereby we ensure we only promote the struct from the stack to the heap the first time the method suspends execution but all other times use the same heap object as the target of the <code>MoveNext<\/code>, and in the process ensure we&#8217;ve captured the right context, and upon resumption ensure we&#8217;re using that captured context to invoke the operation.<\/p>\n<p>That&#8217;s a lot more logic than we want the compiler to emit&#8230; we instead want it encapsulated in a helper, for several reasons. First, it&#8217;s a lot of complicated code to be emitted into each user&#8217;s assembly. Second, we want to allow customization of that logic as part of implementing the builder pattern (we&#8217;ll see an example of why later when talking about pooling).  And third, we want to be able to evolve and improve that logic and have existing previously-compiled binaries just get better.  That&#8217;s not a hypothetical; the library code for this support was completely overhauled in .NET Core 2.1, such that the operation is much more efficient than it was on .NET Framework.  
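<\/p>\n<p>The boxing cost described above is easy to observe outside of any async machinery. The following is only a standalone sketch (the <code>Counter<\/code> struct and <code>Program<\/code> class are hypothetical and unrelated to the compiler-generated types): creating a delegate to an instance method of a struct boxes a copy of that struct, so the delegate operates on a heap copy rather than on the original value.<\/p>\n<pre><code class=\"language-C#\">using System;\r\n\r\n\/\/ Hypothetical mutable struct standing in for a state machine.\r\nstruct Counter\r\n{\r\n    public int Value;\r\n    public void Increment() =&gt; Value++;\r\n}\r\n\r\nclass Program\r\n{\r\n    static void Main()\r\n    {\r\n        var counter = new Counter();\r\n\r\n        \/\/ Creating the delegate boxes a copy of the struct; the delegate&#39;s\r\n        \/\/ target is that heap copy, not the local variable on the stack.\r\n        Action increment = counter.Increment;\r\n        increment();\r\n        increment();\r\n\r\n        Console.WriteLine(counter.Value);                      \/\/ 0\r\n        Console.WriteLine(((Counter)increment.Target).Value); \/\/ 2\r\n    }\r\n}<\/code><\/pre>\n<p>Both invocations mutate the same boxed copy while the stack-based original is untouched, which is why a suspended method has to resume on one heap-resident state machine rather than on a fresh copy each time.<\/p>\n<p>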
We&#8217;ll start by exploring exactly how this worked on .NET Framework, and then look at what happens now in .NET Core.<\/p>\n<p>You can see in the code generated by the C# compiler what happens when we need to suspend:<\/p>\n<pre><code class=\"language-C#\">if (!awaiter.IsCompleted) \/\/ we need to suspend when IsCompleted is false\r\n{\r\n    &lt;&gt;1__state = 1;\r\n    &lt;&gt;u__2 = awaiter;\r\n    &lt;&gt;t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);\r\n    return;\r\n}<\/code><\/pre>\n<p>We&#8217;re storing into the state field the state id that indicates the location we should jump to when the method resumes.  We&#8217;re then persisting the awaiter itself into a field, so that it can be used to call <code>GetResult<\/code> after resumption.  And then just before returning out of the <code>MoveNext<\/code> call, the very last thing we do is call <code>&lt;&gt;t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this)<\/code>, asking the builder to hook up a continuation to the awaiter for this state machine. 
(Note that it calls the builder&#8217;s <code>AwaitUnsafeOnCompleted<\/code> rather than the builder&#8217;s <code>AwaitOnCompleted<\/code> because the awaiter implements <code>ICriticalNotifyCompletion<\/code>; the state machine handles flowing <code>ExecutionContext<\/code> so we needn&#8217;t require the awaiter to as well&#8230; as mentioned earlier, doing so would just be duplicative and unnecessary overhead.)<\/p>\n<p>The implementation of that <code>AwaitUnsafeOnCompleted<\/code> method is too complicated to copy here, so I&#8217;ll summarize <a href=\"https:\/\/referencesource.microsoft.com\/#mscorlib\/system\/runtime\/compilerservices\/AsyncMethodBuilder.cs,535\">what it does<\/a> on .NET Framework:<\/p>\n<ol>\n<li>\n<p>It uses <code>ExecutionContext.Capture()<\/code> to grab the current context.<\/p>\n<\/li>\n<li>\n<p>It then allocates a <code>MoveNextRunner<\/code> object to wrap both the captured context as well as the boxed state machine (which we don&#8217;t yet have if this is the first time the method suspends, so we just use <code>null<\/code> as a placeholder).<\/p>\n<\/li>\n<li>\n<p>It then creates an <code>Action<\/code> delegate to a <code>Run<\/code> method on that <code>MoveNextRunner<\/code>; this is how it&#8217;s able to get a delegate that will invoke the state machine&#8217;s <code>MoveNext<\/code> in the context of the captured <code>ExecutionContext<\/code>.<\/p>\n<\/li>\n<li>\n<p>If this is the first time the method is suspending, we won&#8217;t yet have a boxed state machine, so at this point it boxes it, creating a copy on the heap by storing the instance into a local typed as the <code>IAsyncStateMachine<\/code> interface.  That box is then stored into the <code>MoveNextRunner<\/code> that was allocated.<\/p>\n<\/li>\n<li>\n<p>Now comes a somewhat mind-bending step.  
If you look back at the definition of the state machine struct, it contains the builder, <code>public AsyncTaskMethodBuilder &lt;&gt;t__builder;<\/code>, and if you look at the definition of the builder, it contains <code>internal IAsyncStateMachine m_stateMachine;<\/code>.  The builder needs to reference the boxed state machine so that on subsequent suspensions it can see it&#8217;s already boxed the state machine and doesn&#8217;t need to do so again.  But we just boxed the state machine, and that state machine contained a builder whose <code>m_stateMachine<\/code> field is null.  We need to mutate that boxed state machine&#8217;s builder&#8217;s <code>m_stateMachine<\/code> to point to its parent box.  To achieve that, the <code>IAsyncStateMachine<\/code> interface that the compiler-generated state machine struct implements includes a <code>void SetStateMachine(IAsyncStateMachine stateMachine);<\/code> method, and that state machine struct includes an implementation of that interface method:<\/p>\n<pre><code class=\"language-C#\">private void SetStateMachine(IAsyncStateMachine stateMachine) =&gt;\r\n    &lt;&gt;t__builder.SetStateMachine(stateMachine);<\/code><\/pre>\n<p>So the builder boxes the state machine, and then passes that box to the box&#8217;s <code>SetStateMachine<\/code> method, which calls to the builder&#8217;s <code>SetStateMachine<\/code> method, which stores the box into the field. Whew.<\/p>\n<\/li>\n<li>\n<p>Finally, we have an <code>Action<\/code> that represents the continuation, and that&#8217;s passed to the awaiter&#8217;s <code>UnsafeOnCompleted<\/code> method.  
In the case of a <code>TaskAwaiter<\/code>, the task will store that <code>Action<\/code> into the task&#8217;s continuation list, such that when the task completes, it&#8217;ll invoke the <code>Action<\/code>, call back through the <code>MoveNextRunner.Run<\/code>, call back through <code>ExecutionContext.Run<\/code>, and finally invoke the state machine&#8217;s <code>MoveNext<\/code> method to re-enter the state machine and continue running from where it left off.<\/p>\n<\/li>\n<\/ol>\n<p>That&#8217;s what happens on .NET Framework, and you can witness the outcome of this in a profiler, such as by running an allocation profiler to see what&#8217;s allocated on each await.  Let&#8217;s take this silly program, which I&#8217;ve written just to highlight the allocation costs involved:<\/p>\n<pre><code class=\"language-C#\">using System.Threading;\r\nusing System.Threading.Tasks;\r\n\r\nclass Program\r\n{\r\n    static async Task Main()\r\n    {\r\n        var al = new AsyncLocal&lt;int&gt;() { Value = 42 };\r\n        for (int i = 0; i &lt; 1000; i++)\r\n        {\r\n            await SomeMethodAsync();\r\n        }\r\n    }\r\n\r\n    static async Task SomeMethodAsync()\r\n    {\r\n        for (int i = 0; i &lt; 1000; i++)\r\n        {\r\n            await Task.Yield();\r\n        }\r\n    }\r\n}<\/code><\/pre>\n<p>This program is creating an <code>AsyncLocal&lt;int&gt;<\/code> to flow the value 42 through all subsequent async operations.  It&#8217;s then calling <code>SomeMethodAsync<\/code> 1000 times, each of which is suspending\/resuming 1000 times.  
In Visual Studio, I run this using the <a href=\"https:\/\/learn.microsoft.com\/visualstudio\/profiling\/dotnet-alloc-tool\">.NET Object Allocation Tracking profiler<\/a>, which yields the following results:\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2023\/03\/AllocationNetFramework.png\" alt=\"Allocation associated with asynchronous operations on .NET Framework\" \/>\nThat&#8217;s&#8230; a lot of allocation! Let&#8217;s examine each of these to understand where they&#8217;re coming from.<\/p>\n<ul>\n<li><code>ExecutionContext<\/code>.  There&#8217;s over a million of these being allocated. Why? Because in .NET Framework, <code>ExecutionContext<\/code> is a <em>mutable<\/em> data structure.  Since we want to flow the data that was present at the time an async operation was forked and we don&#8217;t want it to then see mutations performed after that fork, we need to copy the <code>ExecutionContext<\/code>.  Every single forked operation requires such a copy, so with 1000 calls to <code>SomeMethodAsync<\/code> each of which is suspending\/resuming 1000 times, we have a million <code>ExecutionContext<\/code> instances. Ouch.<\/li>\n<li><code>Action<\/code>. Similarly, every time we <code>await<\/code> something that&#8217;s not yet complete (which is the case with our million <code>await Task.Yield()<\/code>s), we end up allocating a new <code>Action<\/code> delegate to pass to that awaiter&#8217;s <code>UnsafeOnCompleted<\/code> method.<\/li>\n<li><code>MoveNextRunner<\/code>.  Same deal; there&#8217;s a million of these, since in the outline of the steps earlier, every time we suspend, we&#8217;re allocating a new <code>MoveNextRunner<\/code> to store the <code>Action<\/code> and the <code>ExecutionContext<\/code>, in order to execute the former with the latter.<\/li>\n<li><code>LogicalCallContext<\/code>. Another million.  
These are an implementation detail of <code>AsyncLocal&lt;T&gt;<\/code> on .NET Framework; <code>AsyncLocal&lt;T&gt;<\/code> stores its data into the <code>ExecutionContext<\/code>&#8216;s &#8220;logical call context&#8221;, which is a fancy way of saying the general state that&#8217;s flowed with the <code>ExecutionContext<\/code>.  So, if we&#8217;re making a million copies of the <code>ExecutionContext<\/code>, we&#8217;re making a million copies of the <code>LogicalCallContext<\/code>, too.<\/li>\n<li><code>QueueUserWorkItemCallback<\/code>. Each <code>Task.Yield()<\/code> is queueing a work item to the thread pool, resulting in a million allocations of the work item objects used to represent those million operations.<\/li>\n<li><code>Task&lt;VoidResult&gt;<\/code>. There&#8217;s a thousand of these, so at least we&#8217;re out of the &#8220;million&#8221; club.  Every <code>async Task<\/code> invocation that completes asynchronously needs to allocate a new <code>Task<\/code> instance to represent the eventual completion of that call.<\/li>\n<li><code>&lt;SomeMethodAsync&gt;d__1<\/code>.  This is the box of the compiler-generated state machine struct.  1000 methods suspend, 1000 boxes occur.<\/li>\n<li><code>QueueSegment<\/code>\/<code>IThreadPoolWorkItem[]<\/code>.  There are several thousand of these, and they&#8217;re not technically related to async methods specifically, but rather to work being queued to the thread pool in general.  In .NET Framework, the thread pool&#8217;s queue is a linked list of non-circular segments. These segments aren&#8217;t reused; for a segment of length N, once N work items have been enqueued into and dequeued from that segment, the segment is discarded and left up for garbage collection.<\/li>\n<\/ul>\n<p>That was .NET Framework.  
<a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/8de96c8b1b1cc3a781f23dcdf68c0aeb62dadbe7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/AsyncTaskMethodBuilderT.cs#L97-L145\">This<\/a> is .NET Core:\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2023\/03\/AllocationNetCore.png\" alt=\"Allocation associated with asynchronous operations on .NET Core\" \/>\nSo much prettier!  For this sample on .NET Framework, there were more than 5 million allocations totaling ~145MB of allocated memory.  For that same sample on .NET Core, there were instead only ~1000 allocations totaling only ~109KB.  Why so much less?<\/p>\n<ul>\n<li><code>ExecutionContext<\/code>. In .NET Core, <code>ExecutionContext<\/code> is now <em>immutable<\/em>.  The downside to that is that every change to the context, e.g. by setting a value into an <code>AsyncLocal&lt;T&gt;<\/code>, requires allocating a new <code>ExecutionContext<\/code>.  The upside, however, is that flowing context is way, way, way more common than is changing it, and as <code>ExecutionContext<\/code> is now immutable, we no longer need to clone as part of flowing it. &#8220;Capturing&#8221; the context is literally just reading it out of a field, rather than reading it and doing a clone of its contents. So it&#8217;s not only way, way, way more common to flow than to change, it&#8217;s also way, way, way cheaper.<\/li>\n<li><code>LogicalCallContext<\/code>. This no longer exists in .NET Core.  In .NET Core, the only thing <code>ExecutionContext<\/code> exists for is the storage for <code>AsyncLocal&lt;T&gt;<\/code>.  Other things that had their own special place in <code>ExecutionContext<\/code> are modeled in terms of <code>AsyncLocal&lt;T&gt;<\/code>.  
For example, impersonation in .NET Framework would flow as part of the <code>SecurityContext<\/code> that&#8217;s part of <code>ExecutionContext<\/code>; in .NET Core, impersonation flows via an <code>AsyncLocal&lt;SafeAccessTokenHandle&gt;<\/code> that uses a <code>valueChangedHandler<\/code> to make appropriate changes to the current thread.<\/li>\n<li><code>QueueSegment<\/code>\/<code>IThreadPoolWorkItem[]<\/code>. In .NET Core, the <code>ThreadPool<\/code>&#8216;s global queue is now implemented as a <code>ConcurrentQueue&lt;T&gt;<\/code>, and <code>ConcurrentQueue&lt;T&gt;<\/code> has been rewritten to be a linked list of <em>circular<\/em> segments of non-fixed size. Once the size of a segment is large enough that the segment never fills because steady-state dequeues are able to keep up with steady-state enqueues, no additional segments need to be allocated, and the same large-enough segment is just used endlessly.<\/li>\n<\/ul>\n<p>What about the rest of the allocations, like <code>Action<\/code>, <code>MoveNextRunner<\/code>, and <code>&lt;SomeMethodAsync&gt;d__1<\/code>?\nUnderstanding how the remaining allocations were removed requires diving into how this now works on .NET Core.<\/p>\n<p>Let&#8217;s rewind our discussion back to when we were discussing what happens at suspension time:<\/p>\n<pre><code class=\"language-C#\">if (!awaiter.IsCompleted) \/\/ we need to suspend when IsCompleted is false\r\n{\r\n    &lt;&gt;1__state = 1;\r\n    &lt;&gt;u__2 = awaiter;\r\n    &lt;&gt;t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);\r\n    return;\r\n}<\/code><\/pre>\n<p>The code that&#8217;s emitted here is the same regardless of which platform surface area is being targeted, so regardless of .NET Framework vs .NET Core, the generated IL for this suspension is identical.  
What changes, however, is the implementation of that <code>AwaitUnsafeOnCompleted<\/code> method, which on .NET Core is much different:<\/p>\n<ol>\n<li>\n<p>Things do start out the same: the method calls <code>ExecutionContext.Capture()<\/code> to get the current execution context.<\/p>\n<\/li>\n<li>\n<p>Then things diverge from .NET Framework. The builder in .NET Core has just a single field on it:<\/p>\n<pre><code class=\"language-C#\">public struct AsyncTaskMethodBuilder\r\n{\r\n    private Task&lt;VoidTaskResult&gt;? m_task;\r\n    ...\r\n}<\/code><\/pre>\n<p>After capturing the <code>ExecutionContext<\/code>, it checks whether that <code>m_task<\/code> field contains an instance of an <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/8de96c8b1b1cc3a781f23dcdf68c0aeb62dadbe7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/AsyncTaskMethodBuilderT.cs#L273\"><code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code><\/a>, where <code>TStateMachine<\/code> is the type of the compiler-generated state machine struct.  That <code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code> type is the &#8220;magic.&#8221;  It&#8217;s defined like this:<\/p>\n<pre><code class=\"language-C#\">private class AsyncStateMachineBox&lt;TStateMachine&gt; :\r\n    Task&lt;TResult&gt;, IAsyncStateMachineBox\r\n    where TStateMachine : IAsyncStateMachine\r\n{\r\n    private Action? _moveNextAction;\r\n    public TStateMachine? StateMachine;\r\n    public ExecutionContext? Context;\r\n    ...\r\n}<\/code><\/pre>\n<p>Rather than having a separate <code>Task<\/code>, this <em>is<\/em> the task (note its base type).  Rather than boxing the state machine, the struct just lives as a strongly-typed field on this task.  
And rather than having a separate <code>MoveNextRunner<\/code> to store both the <code>Action<\/code> and the <code>ExecutionContext<\/code>, they&#8217;re just fields on this type, and since this <em>is<\/em> the instance that gets stored into the builder&#8217;s <code>m_task<\/code> field, we have direct access to it and don&#8217;t need to re-allocate things on every suspension.  If the <code>ExecutionContext<\/code> changes, we can just overwrite the field with the new context and don&#8217;t need to allocate anything else; any <code>Action<\/code> we have still points to the right place. So, after capturing the <code>ExecutionContext<\/code>, if we already have an instance of this <code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code>, this isn&#8217;t the first time the method is suspending, and we can just store the newly captured <code>ExecutionContext<\/code> into it.  If we don&#8217;t already have an instance of <code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code>, then we need to allocate it:<\/p>\n<pre><code class=\"language-C#\">var box = new AsyncStateMachineBox&lt;TStateMachine&gt;();\r\ntaskField = box; \/\/ important: this must be done before storing stateMachine into box.StateMachine!\r\nbox.StateMachine = stateMachine;\r\nbox.Context = currentContext;<\/code><\/pre>\n<p>Note that line which the source comments as &#8220;important&#8221;.  This takes the place of that complicated <code>SetStateMachine<\/code> dance in .NET Framework, such that <code>SetStateMachine<\/code> isn&#8217;t actually used at all in .NET Core.  The <code>taskField<\/code> you see there is a <code>ref<\/code> to the <code>AsyncTaskMethodBuilder<\/code>&#8216;s <code>m_task<\/code> field. 
We allocate the <code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code>, then via <code>taskField<\/code> store that object into the builder&#8217;s <code>m_task<\/code> (this is the builder that&#8217;s in the state machine struct on the stack), and then copy that stack-based state machine (which now already contains the reference to the box) into the heap-based <code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code>, such that the <code>AsyncStateMachineBox&lt;TStateMachine&gt;<\/code> appropriately and recursively ends up referencing itself.  Still mind bending, but a much more efficient mind bending.<\/p>\n<\/li>\n<li>\n<p>We can then get an <code>Action<\/code> to a method on this instance that will invoke its <code>MoveNext<\/code> method, which in turn does the appropriate <code>ExecutionContext<\/code> restoration prior to calling into the <code>StateMachine<\/code>&#8217;s <code>MoveNext<\/code>.  And that <code>Action<\/code> can be cached into the <code>_moveNextAction<\/code> field such that any subsequent use can just reuse the same <code>Action<\/code>.  That <code>Action<\/code> is then passed to the awaiter&#8217;s <code>UnsafeOnCompleted<\/code> to hook up the continuation.<\/p>\n<\/li>\n<\/ol>\n<p>That accounts for why most of the rest of the allocations are gone: <code>&lt;SomeMethodAsync&gt;d__1<\/code> doesn&#8217;t get boxed and instead just lives as a field on the task itself, and the <code>MoveNextRunner<\/code> is no longer needed as it existed only to store the <code>Action<\/code> and <code>ExecutionContext<\/code>.  But, based on this explanation, we should have still seen 1000 <code>Action<\/code> allocations, one per method call, and we didn&#8217;t.  Why?  
And what about those <code>QueueUserWorkItemCallback<\/code> objects&#8230; we&#8217;re still queueing as part of <code>Task.Yield()<\/code>, so why aren&#8217;t those showing up?<\/p>\n<p>As I noted, one of the nice things about pushing off the implementation details into the core library is that the implementation can evolve over time, and we&#8217;ve already seen how it evolved from .NET Framework to .NET Core.  It&#8217;s also evolved further from the initial rewrite for .NET Core, with additional optimizations that benefit from having internal access to key components in the system.  In particular, the async infrastructure knows about core types like <code>Task<\/code> and <code>TaskAwaiter<\/code>.  And because it knows about them and has access to their internals, it doesn&#8217;t have to play by the publicly-defined rules.  The awaiter pattern followed by the C# language requires an awaiter to have an <code>OnCompleted<\/code> or <code>UnsafeOnCompleted<\/code> method, both of which take the continuation as an <code>Action<\/code>, and that means the infrastructure needs to be able to create an <code>Action<\/code> to represent the continuation, in order to work with arbitrary awaiters the infrastructure knows nothing about.  But if the infrastructure encounters an awaiter it <em>does<\/em> know about, it&#8217;s under no obligation to take the same code path.  For all of the core awaiters defined in System.Private.CoreLib, then, the infrastructure has a leaner path it can follow, one that doesn&#8217;t require an <code>Action<\/code> at all.  These awaiters all know about <code>IAsyncStateMachineBox<\/code>es, and are able to treat the box object itself as the continuation.  
So, for example, the <code>YieldAwaitable<\/code> returned by <code>Task.Yield<\/code> is able to queue the <code>IAsyncStateMachineBox<\/code> itself directly into the <code>ThreadPool<\/code> as a work item, and the <code>TaskAwaiter<\/code> used when <code>await<\/code>&#8216;ing a <code>Task<\/code> is able to store the <code>IAsyncStateMachineBox<\/code> itself directly into the <code>Task<\/code>&#8216;s continuation list.  No <code>Action<\/code> needed, no <code>QueueUserWorkItemCallback<\/code> needed.<\/p>\n<p>Thus, in the very common case where an async method only awaits things from System.Private.CoreLib (<code>Task<\/code>, <code>Task&lt;TResult&gt;<\/code>, <code>ValueTask<\/code>, <code>ValueTask&lt;TResult&gt;<\/code>, <code>YieldAwaitable<\/code>, and the <code>ConfigureAwait<\/code> variants of those), worst case is there&#8217;s only ever a single allocation of overhead associated with the entire lifecycle of the async method: if the method ever suspends, it allocates that single <code>Task<\/code>-derived type which stores all other required state, and if the method never suspends, there&#8217;s no additional allocation incurred.<\/p>\n<p>We can get rid of that last allocation as well, if desired, at least in an amortized fashion.  As has been shown, there&#8217;s a default builder associated with <code>Task<\/code> (<code>AsyncTaskMethodBuilder<\/code>), and similarly there&#8217;s a default builder associated with <code>Task&lt;TResult&gt;<\/code> (<code>AsyncTaskMethodBuilder&lt;TResult&gt;<\/code>) and with <code>ValueTask<\/code> and <code>ValueTask&lt;TResult&gt;<\/code> (<code>AsyncValueTaskMethodBuilder<\/code> and <code>AsyncValueTaskMethodBuilder&lt;TResult&gt;<\/code>, respectively).  
For <code>ValueTask<\/code>\/<code>ValueTask&lt;TResult&gt;<\/code>, the builders are actually fairly simple, as they themselves only handle the synchronously-and-successfully-completing case: there, the async method completes without ever suspending and the builders can just return a <code>ValueTask.CompletedTask<\/code> or a <code>ValueTask&lt;TResult&gt;<\/code> wrapping the result value. For everything else, they just delegate to <code>AsyncTaskMethodBuilder<\/code>\/<code>AsyncTaskMethodBuilder&lt;TResult&gt;<\/code>, since the <code>ValueTask<\/code>\/<code>ValueTask&lt;TResult&gt;<\/code> that&#8217;ll be returned just wraps a <code>Task<\/code> and it can share all of the same logic.  But <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/performance-improvements-in-net-6\">.NET 6 and C# 10<\/a> introduced the ability to override the builder that&#8217;s used on a method-by-method basis, and introduced a couple of specialized builders for <code>ValueTask<\/code>\/<code>ValueTask&lt;TResult&gt;<\/code> that are able to pool <code>IValueTaskSource<\/code>\/<code>IValueTaskSource&lt;TResult&gt;<\/code> objects representing the eventual completion rather than using <code>Task<\/code>s.<\/p>\n<p>We can see the impact of this in our sample. 
Let&#8217;s slightly tweak our <code>SomeMethodAsync<\/code> we were profiling to return <code>ValueTask<\/code> instead of <code>Task<\/code>:<\/p>\n<pre><code class=\"language-C#\">static async ValueTask SomeMethodAsync()\r\n{\r\n    for (int i = 0; i &lt; 1000; i++)\r\n    {\r\n        await Task.Yield();\r\n    }\r\n}<\/code><\/pre>\n<p>That will result in this generated entry point:<\/p>\n<pre><code class=\"language-C#\">[AsyncStateMachine(typeof(&lt;SomeMethodAsync&gt;d__1))]\r\nprivate static ValueTask SomeMethodAsync()\r\n{\r\n    &lt;SomeMethodAsync&gt;d__1 stateMachine = default;\r\n    stateMachine.&lt;&gt;t__builder = AsyncValueTaskMethodBuilder.Create();\r\n    stateMachine.&lt;&gt;1__state = -1;\r\n    stateMachine.&lt;&gt;t__builder.Start(ref stateMachine);\r\n    return stateMachine.&lt;&gt;t__builder.Task;\r\n}<\/code><\/pre>\n<p>Now, we add <code>[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]<\/code> to the declaration of <code>SomeMethodAsync<\/code>:<\/p>\n<pre><code class=\"language-C#\">[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]\r\nstatic async ValueTask SomeMethodAsync()\r\n{\r\n    for (int i = 0; i &lt; 1000; i++)\r\n    {\r\n        await Task.Yield();\r\n    }\r\n}<\/code><\/pre>\n<p>and the compiler instead outputs this:<\/p>\n<pre><code class=\"language-C#\">[AsyncStateMachine(typeof(&lt;SomeMethodAsync&gt;d__1))]\r\n[AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))]\r\nprivate static ValueTask SomeMethodAsync()\r\n{\r\n    &lt;SomeMethodAsync&gt;d__1 stateMachine = default;\r\n    stateMachine.&lt;&gt;t__builder = PoolingAsyncValueTaskMethodBuilder.Create();\r\n    stateMachine.&lt;&gt;1__state = -1;\r\n    stateMachine.&lt;&gt;t__builder.Start(ref stateMachine);\r\n    return stateMachine.&lt;&gt;t__builder.Task;\r\n}<\/code><\/pre>\n<p>The actual C# code gen for the entirety of the implementation, including the whole state machine (not shown), is almost identical; the 
<em>only<\/em> difference is the type of the builder that&#8217;s created and stored and thus used everywhere we previously saw references to the builder.  And if you look at <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/8de96c8b1b1cc3a781f23dcdf68c0aeb62dadbe7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/PoolingAsyncValueTaskMethodBuilderT.cs#L152-L218\">the code for <code>PoolingAsyncValueTaskMethodBuilder<\/code><\/a>, you&#8217;ll see its structure is almost identical to that of <code>AsyncTaskMethodBuilder<\/code>, including using some of the exact same shared routines for doing things like special-casing known awaiter types.  The key difference is that instead of doing <code>new AsyncStateMachineBox&lt;TStateMachine&gt;()<\/code> when the method first suspends, it instead does <code>StateMachineBox&lt;TStateMachine&gt;.RentFromCache()<\/code>, and upon the async method (<code>SomeMethodAsync<\/code>) completing and an <code>await<\/code> on the returned <code>ValueTask<\/code> completing, the rented box is returned to the cache. That means (amortized) zero allocation:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2023\/03\/AllocationNetCoreWithPooling.png\" alt=\"Allocation associated with asynchronous operations on .NET Core with pooling\" \/><\/p>\n<p>That cache in and of itself is a bit interesting.  Object pooling can be a good idea and it can be a bad idea.  The more expensive an object is to create, the more valuable it is to pool them; so, for example, it&#8217;s a lot more valuable to pool really large arrays than it is to pool really tiny arrays, because larger arrays not only require more CPU cycles and memory accesses to zero out, they put more pressure on the garbage collector to collect more often.  For very small objects, though, pooling them can be a net negative.  
Pools are just memory allocators, as is the GC, so when you pool, you&#8217;re trading off the costs associated with one allocator for the costs associated with another, and the GC is very efficient at handling lots of tiny, short-lived objects.  If you do a lot of work in an object&#8217;s constructor, avoiding that work can dwarf the costs of the allocator itself, making pooling valuable.  But if you do little to no work in an object&#8217;s constructor, and you pool it, you&#8217;re betting that your allocator (your pool) is more efficient for the access patterns employed than is the GC, and that is frequently a bad bet.  There are other costs involved as well, and in some cases you can end up effectively fighting against the GC&#8217;s heuristics; for example, the GC is optimized based on the premise that references from higher generation (e.g. gen2) objects to lower generation (e.g. gen0) objects are relatively rare, but pooling objects can invalidate that premise.<\/p>\n<p>Now, the objects created by async methods aren&#8217;t <em>tiny<\/em>, and they can be on super hot paths, so pooling can be reasonable. But to make it as valuable as possible we also want to avoid as much overhead as possible. The pool is thus very simple, opting to make renting and returning really fast with little to no contention, even if that means it might end up allocating more than it would if it cached more aggressively.  
For each state machine type, the implementation <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/8de96c8b1b1cc3a781f23dcdf68c0aeb62dadbe7\/src\/libraries\/System.Private.CoreLib\/src\/System\/Runtime\/CompilerServices\/PoolingAsyncValueTaskMethodBuilderT.cs#L287-L292\">pools<\/a> up to a single state machine box per <em>thread<\/em> and a single state machine box per <em>core<\/em>; this allows it to rent and return with minimal overhead and minimal contention (no other thread can be accessing the thread-specific cache at the same time, and it&#8217;s rare for another thread to be accessing the core-specific cache at the same time).  And while this might seem like a relatively small pool, it&#8217;s also quite effective at significantly reducing steady state allocation, given that the pool is only responsible for storing objects not currently in use; you could have a million async methods all in flight at any given time, and even though the pool is only able to store up to one object per thread and per core, it can still avoid dropping lots of objects, since it only needs to store an object long enough to transfer it from one operation to another, not while it&#8217;s in use by that operation.<\/p>\n<h3>SynchronizationContext and ConfigureAwait<\/h3>\n<p>We talked about <code>SynchronizationContext<\/code> previously in the context of the EAP pattern and mentioned that it would show up again. <code>SynchronizationContext<\/code> makes it possible to call reusable helpers and automatically be scheduled back whenever and to wherever the calling environment deems fit.  As a result, it&#8217;s natural to expect that to &#8220;just work&#8221; with <code>async<\/code>\/<code>await<\/code>, and it does.  
Going back to our button click handler from earlier:<\/p>\n<pre><code class=\"language-C#\">ThreadPool.QueueUserWorkItem(_ =&gt;\r\n{\r\n    string message = ComputeMessage();\r\n    button1.BeginInvoke(() =&gt;\r\n    {\r\n        button1.Text = message;\r\n    });\r\n});<\/code><\/pre>\n<p>with <code>async<\/code>\/<code>await<\/code> we&#8217;d like to instead be able to write this as follows:<\/p>\n<pre><code class=\"language-C#\">button1.Text = await Task.Run(() =&gt; ComputeMessage());<\/code><\/pre>\n<p>That invocation of <code>ComputeMessage<\/code> is offloaded to the thread pool, and upon the method&#8217;s completion, execution transitions back to the UI thread associated with the button, and the setting of its <code>Text<\/code> property happens on that thread.<\/p>\n<p>That integration with <code>SynchronizationContext<\/code> is left up to the awaiter implementation (the code generated for the state machine knows nothing about <code>SynchronizationContext<\/code>), as it&#8217;s the awaiter that is responsible for actually invoking or queueing the supplied continuation when the represented asynchronous operation completes.  While a custom awaiter need not respect <code>SynchronizationContext.Current<\/code>, the awaiters for <code>Task<\/code>, <code>Task&lt;TResult&gt;<\/code>, <code>ValueTask<\/code>, and <code>ValueTask&lt;TResult&gt;<\/code> all do.  That means that, by default, when you <code>await<\/code> a <code>Task<\/code>, a <code>Task&lt;TResult&gt;<\/code>, a <code>ValueTask<\/code>, a <code>ValueTask&lt;TResult&gt;<\/code>, or even the result of a <code>Task.Yield()<\/code> call, the awaiter will look up the current <code>SynchronizationContext<\/code> and, if it finds a non-default one, will eventually queue the continuation to that context.<\/p>\n<p>We can see this if we look at the code involved in <code>TaskAwaiter<\/code>.  
Here&#8217;s a snippet of the <a href=\"https:\/\/github.com\/dotnet\/runtime\/blob\/967a59712996c2cdb8ce2f65fb3167afbd8b01f3\/src\/libraries\/System.Private.CoreLib\/src\/System\/Threading\/Tasks\/Task.cs#L2558-L2583\">relevant code<\/a> from Corelib:<\/p>\n<pre><code class=\"language-C#\">internal void UnsafeSetContinuationForAwait(IAsyncStateMachineBox stateMachineBox, bool continueOnCapturedContext)\r\n{\r\n    if (continueOnCapturedContext)\r\n    {\r\n        SynchronizationContext? syncCtx = SynchronizationContext.Current;\r\n        if (syncCtx != null &amp;&amp; syncCtx.GetType() != typeof(SynchronizationContext))\r\n        {\r\n            var tc = new SynchronizationContextAwaitTaskContinuation(syncCtx, stateMachineBox.MoveNextAction, flowExecutionContext: false);\r\n            if (!AddTaskContinuation(tc, addBeforeOthers: false))\r\n            {\r\n                tc.Run(this, canInlineContinuationTask: false);\r\n            }\r\n            return;\r\n        }\r\n        else\r\n        {\r\n            TaskScheduler? scheduler = TaskScheduler.InternalCurrent;\r\n            if (scheduler != null &amp;&amp; scheduler != TaskScheduler.Default)\r\n            {\r\n                var tc = new TaskSchedulerAwaitTaskContinuation(scheduler, stateMachineBox.MoveNextAction, flowExecutionContext: false);\r\n                if (!AddTaskContinuation(tc, addBeforeOthers: false))\r\n                {\r\n                    tc.Run(this, canInlineContinuationTask: false);\r\n                }\r\n                return;\r\n            }\r\n        }\r\n    }\r\n\r\n    ...\r\n}<\/code><\/pre>\n<p>This is part of a method that&#8217;s determining what object to store into the <code>Task<\/code> as a continuation.  It&#8217;s being passed the <code>stateMachineBox<\/code>, which, as was alluded to earlier, can be stored directly into the <code>Task<\/code>&#8216;s continuation list. 
However, this special logic might wrap that <code>IAsyncStateMachineBox<\/code> to also incorporate a scheduler if one is present.  It checks to see whether there&#8217;s currently a non-default <code>SynchronizationContext<\/code>, and if there is, it creates a <code>SynchronizationContextAwaitTaskContinuation<\/code> as the actual object that&#8217;ll be stored as the continuation; that object in turn wraps the original box and the captured <code>SynchronizationContext<\/code>, and knows how to invoke the former&#8217;s <code>MoveNext<\/code> in a work item queued to the latter.  This is how you&#8217;re able to <code>await<\/code> as part of some event handler in a UI application and have the code after the <code>await<\/code>&#8217;s completion continue on the right thread. The next interesting thing to note here is that it&#8217;s not just paying attention to a <code>SynchronizationContext<\/code>: if it couldn&#8217;t find a custom <code>SynchronizationContext<\/code> to use, it also looks to see whether there&#8217;s a custom <code>TaskScheduler<\/code> (the scheduler type used by <code>Task<\/code>s) in play that needs to be considered.  As with <code>SynchronizationContext<\/code>, if there&#8217;s a non-default one of those, it&#8217;s then wrapped with the original box in a <code>TaskSchedulerAwaitTaskContinuation<\/code> that&#8217;s used as the continuation object.<\/p>\n<p>But arguably the most interesting thing to notice here is the very first line of the method body: <code>if (continueOnCapturedContext)<\/code>.  We only do these checks for <code>SynchronizationContext<\/code>\/<code>TaskScheduler<\/code> if <code>continueOnCapturedContext<\/code> is <code>true<\/code>; if it&#8217;s <code>false<\/code>, the implementation behaves as if both were default and ignores them.  What, pray tell, sets <code>continueOnCapturedContext<\/code> to <code>false<\/code>?  
You&#8217;ve probably guessed it: using the ever popular <code>ConfigureAwait(false)<\/code>.<\/p>\n<p>I talk about <code>ConfigureAwait<\/code> at length in <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/configureawait-faq\/\">ConfigureAwait FAQ<\/a>, so I&#8217;d encourage you to read that for more information.  Suffice it to say, the <em>only<\/em> thing <code>ConfigureAwait(false)<\/code> does as part of an <code>await<\/code> is feed its argument <code>Boolean<\/code> into this function (and others like it) as that <code>continueOnCapturedContext<\/code> value, so as to skip the checks on <code>SynchronizationContext<\/code>\/<code>TaskScheduler<\/code> and behave as if neither of them existed.  In the case of <code>Task<\/code>s, this then permits the <code>Task<\/code> to invoke its continuations wherever it deems fit rather than being forced to queue them to execute on some specific scheduler.<\/p>\n<p>I previously mentioned one other aspect of <code>SynchronizationContext<\/code>, and I said we&#8217;d see it again: <code>OperationStarted<\/code>\/<code>OperationCompleted<\/code>.  Now&#8217;s the time.  These rear their heads as part of the feature everyone loves to hate: <code>async void<\/code>. <code>ConfigureAwait<\/code>-aside, <code>async void<\/code> is arguably one of the most divisive features added as part of <code>async\/await<\/code>.  It was added for one reason and one reason only: event handlers. In a UI application, you want to be able to write code like the following:<\/p>\n<pre><code class=\"language-C#\">button1.Click += async (sender, eventArgs) =&gt;\r\n{\r\n  button1.Text = await Task.Run(() =&gt; ComputeMessage());  \r\n};<\/code><\/pre>\n<p>but if all <code>async<\/code> methods had to have a return type like <code>Task<\/code>, you wouldn&#8217;t be able to do this. The <code>Click<\/code> event has a signature <code>public event EventHandler? 
Click;<\/code>, with <code>EventHandler<\/code> defined as <code>public delegate void EventHandler(object? sender, EventArgs e);<\/code>, and thus to provide a method that matches that signature, the method needs to be <code>void<\/code>-returning.<\/p>\n<p>There are a variety of reasons <code>async void<\/code> is considered bad, why <a href=\"https:\/\/learn.microsoft.com\/archive\/msdn-magazine\/2013\/march\/async-await-best-practices-in-asynchronous-programming\">articles<\/a> recommend avoiding it wherever possible, and why <a href=\"https:\/\/github.com\/microsoft\/vs-threading\/blob\/main\/doc\/analyzers\/VSTHRD101.md\">analyzers<\/a> have sprung up to flag use of them.  One of the biggest issues is with delegate inference.  Consider this program:<\/p>\n<pre><code class=\"language-C#\">using System.Diagnostics;\r\n\r\nTime(async () =&gt;\r\n{\r\n    Console.WriteLine(\"Enter\");\r\n    await Task.Delay(TimeSpan.FromSeconds(10));\r\n    Console.WriteLine(\"Exit\");\r\n});\r\n\r\nstatic void Time(Action action)\r\n{\r\n    Console.WriteLine(\"Timing...\");\r\n    Stopwatch sw = Stopwatch.StartNew();\r\n    action();\r\n    Console.WriteLine($\"...done timing: {sw.Elapsed}\");\r\n}<\/code><\/pre>\n<p>One could easily expect this to output an elapsed time of at least 10 seconds, but if you run this you&#8217;ll instead find output like this:<\/p>\n<pre><code class=\"language-text\">Timing...\r\nEnter\r\n...done timing: 00:00:00.0037550<\/code><\/pre>\n<p>Huh? Of course, based on everything we&#8217;ve discussed in this post, it should be understood what the problem is.  The <code>async<\/code> lambda is actually an <code>async void<\/code> method.  Async methods return to their caller the moment they hit the first suspension point.  If this were an <code>async Task<\/code> method, that&#8217;s when the <code>Task<\/code> would be returned.  But in the case of an <code>async void<\/code>, nothing is returned.  
All the <code>Time<\/code> method knows is that it invoked <code>action();<\/code> and the delegate call returned; it has no idea that the async method is actually still &#8220;running&#8221; and will asynchronously complete later.<\/p>\n<p>That&#8217;s where <code>OperationStarted<\/code>\/<code>OperationCompleted<\/code> come in. Such <code>async void<\/code> methods are similar in nature to the EAP methods discussed earlier: the initiation of such methods is <code>void<\/code>, and so you need some other mechanism to be able to track all such operations in flight. The EAP implementations thus call the current <code>SynchronizationContext<\/code>&#8216;s <code>OperationStarted<\/code> when the operation is initiated and <code>OperationCompleted<\/code> when it completes, and <code>async void<\/code> does the same.  The builder associated with <code>async void<\/code> is <code>AsyncVoidMethodBuilder<\/code>.  Remember in the entry point of an async method how the compiler-generated code invokes the builder&#8217;s static <code>Create<\/code> method to get an appropriate builder instance? <code>AsyncVoidMethodBuilder<\/code> takes advantage of that in order to hook creation and invoke <code>OperationStarted<\/code>:<\/p>\n<pre><code class=\"language-C#\">public static AsyncVoidMethodBuilder Create()\r\n{\r\n    SynchronizationContext? sc = SynchronizationContext.Current;\r\n    sc?.OperationStarted();\r\n    return new AsyncVoidMethodBuilder() { _synchronizationContext = sc };\r\n}<\/code><\/pre>\n<p>Similarly, when the builder is marked for completion via either <code>SetResult<\/code> or <code>SetException<\/code>, it invokes the corresponding <code>OperationCompleted<\/code> method.  
This is how a unit testing framework like xunit is able to have <code>async void<\/code> test methods and still employ a maximum degree of concurrency on concurrent test executions, for example in xunit&#8217;s <a href=\"https:\/\/github.com\/xunit\/xunit\/blob\/4d1f2e5d4ac9260487d0a8f35a2d045388021b33\/src\/xunit.v3.core\/Sdk\/AsyncTestSyncContext.cs#L1\">AsyncTestSyncContext<\/a>.<\/p>\n<p>With that knowledge, we can now rewrite our timing sample:<\/p>\n<pre><code class=\"language-C#\">using System.Diagnostics;\r\n\r\nTime(async () =>\r\n{\r\n    Console.WriteLine(\"Enter\");\r\n    await Task.Delay(TimeSpan.FromSeconds(10));\r\n    Console.WriteLine(\"Exit\");\r\n});\r\n\r\nstatic void Time(Action action)\r\n{\r\n    var oldCtx = SynchronizationContext.Current;\r\n    try\r\n    {\r\n        var newCtx = new CountdownContext();\r\n        SynchronizationContext.SetSynchronizationContext(newCtx);\r\n\r\n        Console.WriteLine(\"Timing...\");\r\n        Stopwatch sw = Stopwatch.StartNew();\r\n        \r\n        action();\r\n        newCtx.SignalAndWait();\r\n\r\n        Console.WriteLine($\"...done timing: {sw.Elapsed}\");\r\n    }\r\n    finally\r\n    {\r\n        SynchronizationContext.SetSynchronizationContext(oldCtx);\r\n    }\r\n}\r\n\r\nsealed class CountdownContext : SynchronizationContext\r\n{\r\n    private readonly ManualResetEventSlim _mres = new ManualResetEventSlim(false);\r\n    private int _remaining = 1;\r\n\r\n    public override void OperationStarted() => Interlocked.Increment(ref _remaining);\r\n\r\n    public override void OperationCompleted()\r\n    {\r\n        if (Interlocked.Decrement(ref _remaining) == 0)\r\n        {\r\n            _mres.Set();\r\n        }\r\n    }\r\n\r\n    public void SignalAndWait()\r\n    {\r\n        OperationCompleted();\r\n        _mres.Wait();\r\n    }\r\n}<\/code><\/pre>\n<p>Here, I&#8217;ve created a <code>SynchronizationContext<\/code> that tracks a count for pending operations, and supports blocking 
until they have all completed. When I run that, I get output like this:<\/p>\n<pre><code class=\"language-text\">Timing...\r\nEnter\r\nExit\r\n...done timing: 00:00:10.0149074<\/code><\/pre>\n<p>Tada!<\/p>\n<h3>State Machine Fields<\/h3>\n<p>At this point, we&#8217;ve seen the generated entry point method and how everything in the <code>MoveNext<\/code> implementation works.  We also glimpsed some of the fields defined on the state machine.  Let&#8217;s take a closer look at those.<\/p>\n<p>For the <code>CopyStreamToStreamAsync<\/code> method shown earlier:<\/p>\n<pre><code class=\"language-C#\">public async Task CopyStreamToStreamAsync(Stream source, Stream destination)\r\n{\r\n    var buffer = new byte[0x1000];\r\n    int numRead;\r\n    while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) != 0)\r\n    {\r\n        await destination.WriteAsync(buffer, 0, numRead);\r\n    }\r\n}<\/code><\/pre>\n<p>here are the fields we ended up with:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;CopyStreamToStreamAsync&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder &lt;&gt;t__builder;\r\n    public Stream source;\r\n    public Stream destination;\r\n    private byte[] &lt;buffer&gt;5__2;\r\n    private TaskAwaiter &lt;&gt;u__1;\r\n    private TaskAwaiter&lt;int&gt; &lt;&gt;u__2;\r\n\r\n    ...\r\n}<\/code><\/pre>\n<p>What is each of these?<\/p>\n<ul>\n<li><code>&lt;&gt;1__state<\/code>. This is the &#8220;state&#8221; in &#8220;state machine&#8221;. It defines the current state the state machine is in, and most importantly what should be done the next time <code>MoveNext<\/code> is called.  If the state is -2, the operation has completed.  If the state is -1, either we&#8217;re about to call <code>MoveNext<\/code> for the first time or <code>MoveNext<\/code> code is currently running on some thread.  
If you&#8217;re debugging an async method&#8217;s processing and you see the state as -1, that means there&#8217;s some thread somewhere that&#8217;s actually executing the code contained in the method.  If the state is 0 or greater, the method is suspended, and the value of the state tells you at which <code>await<\/code> it&#8217;s suspended.  While this isn&#8217;t a hard and fast rule (certain code patterns can confuse the numbering), in general the state assigned corresponds to the 0-based number of the <code>await<\/code> in top-to-bottom ordering of the source code. So, for example, if the body of an <code>async<\/code> method were entirely:\n<pre><code class=\"language-C#\">await A();\r\nawait B();\r\nawait C();\r\nawait D();<\/code><\/pre>\n<p>and you found the state value was 2, that almost certainly means the async method is currently suspended waiting for the task returned from <code>C()<\/code> to complete.<\/p>\n<\/li>\n<li><code>&lt;&gt;t__builder<\/code>. This is the builder for the state machine, e.g. <code>AsyncTaskMethodBuilder<\/code> for a <code>Task<\/code>, <code>AsyncValueTaskMethodBuilder&lt;TResult&gt;<\/code> for a <code>ValueTask&lt;TResult&gt;<\/code>, <code>AsyncVoidMethodBuilder<\/code> for an <code>async void<\/code> method, or whatever builder was declared for use via <code>[AsyncMethodBuilder(...)]<\/code> on either the async return type or overridden via such an attribute on the async method itself.  As previously discussed, the builder is responsible for the lifecycle of the async method, including creating the return task, eventually completing that task, and serving as an intermediary for suspension, with the code in the async method asking the builder to suspend until a specific awaiter completes.<\/li>\n<li><code>source<\/code>\/<code>destination<\/code>.  These are the method parameters.  You can tell because they&#8217;re not name mangled; the compiler has named them exactly as the parameter names were specified.  
As noted earlier, all parameters that are used by the method body need to be stored onto the state machine so that the <code>MoveNext<\/code> method has access to them. Note I said &#8220;used by&#8221;.  If the compiler sees that a parameter is unused by the body of the async method, it can optimize away the need to store the field. For example, given the method:\n<pre><code class=\"language-C#\">public async Task M(int someArgument)\r\n{\r\n    await Task.Yield();\r\n}<\/code><\/pre>\n<p>the compiler will emit these fields onto the state machine:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;M&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder &lt;&gt;t__builder;\r\n    private YieldAwaitable.YieldAwaiter &lt;&gt;u__1;\r\n    ...\r\n}<\/code><\/pre>\n<p>Note the distinct lack of something named <code>someArgument<\/code>.  But, if we change the async method to actually use the argument in any way:<\/p>\n<pre><code class=\"language-C#\">public async Task M(int someArgument)\r\n{\r\n    Console.WriteLine(someArgument);\r\n    await Task.Yield();\r\n}<\/code><\/pre>\n<p>it shows up:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;M&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder &lt;&gt;t__builder;\r\n    public int someArgument;\r\n    private YieldAwaitable.YieldAwaiter &lt;&gt;u__1;\r\n    ...\r\n}<\/code><\/pre>\n<\/li>\n<li><code>&lt;buffer&gt;5__2;<\/code>.  This is the <code>buffer<\/code> &#8220;local&#8221; that got lifted to be a field so that it could survive across <code>await<\/code> points.  The compiler tries reasonably hard to keep state from being lifted unnecessarily.  Note that there&#8217;s another local in the source, <code>numRead<\/code>, that <em>doesn&#8217;t<\/em> have a corresponding field in the state machine.  Why? Because it&#8217;s not necessary.  
That local is set as the result of the <code>ReadAsync<\/code> call and is then used as the input to the <code>WriteAsync<\/code> call.  There&#8217;s no <code>await<\/code> in between those across which the <code>numRead<\/code> value would need to be stored.  Just as in a synchronous method the JIT compiler could choose to store such a value entirely in a register and never actually spill it to the stack, the C# compiler can avoid lifting this local to be a field as it needn&#8217;t preserve its value across any awaits.  In general, the C# compiler can elide lifting locals if it can prove that their value needn&#8217;t be preserved across <code>await<\/code>s.<\/li>\n<li><code>&lt;&gt;u__1<\/code> and <code>&lt;&gt;u__2<\/code>. There are two <code>await<\/code>s in the async method: one for a <code>Task&lt;int&gt;<\/code> returned by <code>ReadAsync<\/code>, and one for a <code>Task<\/code> returned by <code>WriteAsync<\/code>.  <code>Task.GetAwaiter()<\/code> returns a <code>TaskAwaiter<\/code>, and <code>Task&lt;TResult&gt;.GetAwaiter()<\/code> returns a <code>TaskAwaiter&lt;TResult&gt;<\/code>, both of which are distinct struct types. Since the compiler needs to get these awaiters prior to the <code>await<\/code> (<code>IsCompleted<\/code>, <code>UnsafeOnCompleted<\/code>) and then needs to access them after the <code>await<\/code> (<code>GetResult<\/code>), the awaiters need to be stored.  And since they&#8217;re distinct struct types, the compiler needs to maintain two separate fields to do so (the alternative would be to box them and have a single <code>object<\/code> field for awaiters, but that would result in extra allocation costs).  The compiler will try to reuse fields whenever possible, though.  
If I have:\n<pre><code class=\"language-C#\">public async Task M()\r\n{\r\n    await Task.FromResult(1);\r\n    await Task.FromResult(true);\r\n    await Task.FromResult(2);\r\n    await Task.FromResult(false);\r\n    await Task.FromResult(3);\r\n}<\/code><\/pre>\n<p>there are five <code>await<\/code>s, but only two different types of awaiters involved: three are <code>TaskAwaiter&lt;int&gt;<\/code> and two are <code>TaskAwaiter&lt;bool&gt;<\/code>.  As such, there only end up being two awaiter fields on the state machine:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;M&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder &lt;&gt;t__builder;\r\n    private TaskAwaiter&lt;int&gt; &lt;&gt;u__1;\r\n    private TaskAwaiter&lt;bool&gt; &lt;&gt;u__2;\r\n    ...\r\n}<\/code><\/pre>\n<p>Then if I change my example to instead be:<\/p>\n<pre><code class=\"language-C#\">public async Task M()\r\n{\r\n    await Task.FromResult(1);\r\n    await Task.FromResult(true);\r\n    await Task.FromResult(2).ConfigureAwait(false);\r\n    await Task.FromResult(false).ConfigureAwait(false);\r\n    await Task.FromResult(3);\r\n}<\/code><\/pre>\n<p>there are still only <code>Task&lt;int&gt;<\/code>s and <code>Task&lt;bool&gt;<\/code>s involved, but I&#8217;m actually using four distinct struct awaiter types, because the awaiter returned from the <code>GetAwaiter()<\/code> call on the thing returned by <code>ConfigureAwait<\/code> is a different type than that returned by <code>Task.GetAwaiter()<\/code>&#8230; this is again evident from the awaiter fields created by the compiler:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;M&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder &lt;&gt;t__builder;\r\n    private TaskAwaiter&lt;int&gt; &lt;&gt;u__1;\r\n    private TaskAwaiter&lt;bool&gt; &lt;&gt;u__2;\r\n    private 
ConfiguredTaskAwaitable&lt;int&gt;.ConfiguredTaskAwaiter &lt;&gt;u__3;\r\n    private ConfiguredTaskAwaitable&lt;bool&gt;.ConfiguredTaskAwaiter &lt;&gt;u__4;\r\n    ...\r\n}<\/code><\/pre>\n<p>If you find yourself wanting to optimize the size associated with an async state machine, one thing you can look at is whether you can consolidate the kinds of things being awaited and thereby consolidate these awaiter fields.<\/p>\n<\/li>\n<\/ul>\n<p>There are other kinds of fields you might see defined on a state machine. Notably, you might see some fields containing the word &#8220;wrap&#8221;. Consider this silly example:<\/p>\n<pre><code class=\"language-C#\">public async Task&lt;int&gt; M() =&gt; await Task.FromResult(42) + DateTime.Now.Second;<\/code><\/pre>\n<p>This produces a state machine with the following fields:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;M&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder&lt;int&gt; &lt;&gt;t__builder;\r\n    private TaskAwaiter&lt;int&gt; &lt;&gt;u__1;\r\n    ...\r\n}<\/code><\/pre>\n<p>Nothing special so far.  Now flip the order of the expressions being added:<\/p>\n<pre><code class=\"language-C#\">public async Task&lt;int&gt; M() =&gt; DateTime.Now.Second + await Task.FromResult(42);<\/code><\/pre>\n<p>With that, you get these fields:<\/p>\n<pre><code class=\"language-C#\">private struct &lt;M&gt;d__0 : IAsyncStateMachine\r\n{\r\n    public int &lt;&gt;1__state;\r\n    public AsyncTaskMethodBuilder&lt;int&gt; &lt;&gt;t__builder;\r\n    private int &lt;&gt;7__wrap1;\r\n    private TaskAwaiter&lt;int&gt; &lt;&gt;u__1;\r\n    ...\r\n}<\/code><\/pre>\n<p>We now have one more: <code>&lt;&gt;7__wrap1<\/code>. Why? Because we computed the value of <code>DateTime.Now.Second<\/code>, and only after computing it, we had to <code>await<\/code> something, and the value of the first expression needs to be preserved in order to add it to the result of the second.  
The compiler thus needs to ensure that the temporary result from that first expression is available to add to the result of the <code>await<\/code>, which means it needs to spill the result of the expression into a temporary, which it does with this <code>&lt;&gt;7__wrap1<\/code> field.  If you ever find yourself hyper-optimizing async method implementations to drive down the amount of memory allocated, you can look for such fields and see if small tweaks to the source could avoid the need for spilling and thus avoid the need for such temporaries.<\/p>\n<h2>Wrap Up<\/h2>\n<p>I hope this post has helped to illuminate exactly what&#8217;s going on under the covers when you use <code>async<\/code>\/<code>await<\/code>, but thankfully you generally don&#8217;t need to know or care. There are many moving pieces here, all coming together to create an efficient solution to writing scalable asynchronous code without having to deal with callback soup.  And yet at the end of the day, those pieces are actually relatively simple: a universal representation for any asynchronous operation, a language and compiler capable of rewriting normal control flow into a state machine implementation of coroutines, and patterns that bind them all together.  Everything else is optimization gravy.<\/p>\n<p>Happy coding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Async\/await was added to the C# language over a decade ago and has transformed how we write scalable code for .NET. But how does it really work? 
In this post, we take a deep dive into its internals.<\/p>\n","protected":false},"author":360,"featured_media":44716,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,7699,7254],"tags":[36,7715,7714,46],"class_list":["post-44715","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-dotnet-fundamentals","category-featured","tag-async","tag-async-programming","tag-async-await","tag-c"],"acf":[],"blog_post_summary":"<p>Async\/await was added to the C# language over a decade ago and has transformed how we write scalable code for .NET. But how does it really work? In this post, we take a deep dive into its internals.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/44715","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/360"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=44715"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/44715\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/44716"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=44715"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=44715"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=44715"}],"curies":[{"name":"wp","href":"h
ttps:\/\/api.w.org\/{rel}","templated":true}]}}