{"id":104945,"date":"2021-03-10T07:00:00","date_gmt":"2021-03-10T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=104945"},"modified":"2021-03-11T09:41:04","modified_gmt":"2021-03-11T17:41:04","slug":"20210310-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210310-00\/?p=104945","title":{"rendered":"Creating other types of synchronization objects that can be used with co_await, part 2: The basic library"},"content":{"rendered":"<p>Last time, I teased <a title=\"Creating other types of synchronization objects that can be used with co_await, part 1\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210309-00\/?p=104942\"> a library for building awaitable synchronization objects<\/a>. It builds on the code had earlier written for one-shot events by distilling the pattern to its essence and then rebuilding it in a more generic way.<\/p>\n<p>We start with the linked list nodes which are used to keep track of coroutines who are waiting for the synchronization object.<\/p>\n<pre>namespace async_helpers::impl\r\n{\r\n    struct node_base\r\n    {\r\n        node_base* next;\r\n        node_base* prev;\r\n    };\r\n\r\n    struct node_handle : node_base\r\n    {\r\n        std::experimental::coroutine_handle&lt;&gt; handle{};\r\n    };\r\n\r\n    template&lt;typename Extra&gt;\r\n    struct node : node_handle\r\n    {\r\n        template&lt;typename... Args&gt;\r\n        node(Args&amp;&amp;... args) :\r\n            extra(std::forward&lt;Args&gt;(args)...) {}\r\n\r\n        Extra extra;\r\n    };\r\n}\r\n<\/pre>\n<p>The <code>node_base<\/code> is the node for a doubly-linked list. Derived from that is a <code>node_handle<\/code> that also carries a coroutine handle payload, and further derived from that is a <code>node&lt;Extra&gt;<\/code> which allows us to attach other payload to the node.<\/p>\n<p>Next comes the linked list of nodes:<\/p>\n<pre>namespace async_helpers\r\n{\r\n    struct node_list : impl::node_base\r\n    {\r\n        node_list() : node_base{ this, this } {}\r\n        node_list(node_list const&amp;) = delete;\r\n        void operator=(node_list const&amp;) = delete;\r\n\r\n        bool empty() const noexcept\r\n        {\r\n            return next == this;\r\n        }\r\n\r\n        void append_node(impl::node_base&amp; node) noexcept\r\n        {\r\n            node.next = this;\r\n            node.prev = prev;\r\n            prev-&gt;next = std::addressof(node);\r\n            prev = std::addressof(node);\r\n        }\r\n\r\n        bool append_list(node_list&amp; other) noexcept\r\n        {\r\n            if (other.empty()) return false;\r\n            other.prev-&gt;next = next;\r\n            next-&gt;prev = other.prev;\r\n            next = other.next;\r\n            next-&gt;prev = this;\r\n            other.next = other.prev = std::addressof(other);\r\n            return true;\r\n        }\r\n\r\n        impl::node_base* peek_head() const noexcept\r\n        {\r\n            return empty() ? nullptr : next;\r\n        }\r\n\r\n        impl::node_base* try_remove_head() noexcept\r\n        {\r\n            if (empty()) return nullptr;\r\n            auto node = next;\r\n            next = node-&gt;next;\r\n            next-&gt;prev = this;\r\n            return node;\r\n        }\r\n    };\r\n}\r\n<\/pre>\n<p>These are unexciting operations on doubly-linked lists. We have to write them ourselves because the C++ standard library uses non-intrusive linked lists, but we want intrusive linked lists to avoid the error conditions associated with memory allocation failures.<\/p>\n<p>The node list is represented by a sentinel node that marks the beginning\/end of the list. There is a little quirkiness in that I use <code>std::addressof<\/code> instead of the <code>&amp;<\/code> operator, out of an abundance of caution, in case somebody decides to overload the <code>&amp;<\/code> operator.<\/p>\n<pre>namespace async_helpers\r\n{\r\n    \/\/ Specialize if you need extra data for co_await.\r\n    template&lt;typename State&gt; struct extra_await_data {};\r\n}\r\n<\/pre>\n<p>We provide a traits class that lets you attach extra information to your <code>awaiter<\/code> to record additional context about awaiting. This will come in handy when we try to implement reader\/writer locks.<\/p>\n<p>The real excitement is in our <code>state<\/code> object. The intended usage is that you derive your full state object from the basic one, and the basic one uses CRTP to allow your full state object to customize certain operations. Last time, we saw how we could use this pattern for a one-shot event.<\/p>\n<p>We&#8217;ll look at the <code>awaitable_<wbr \/>state<\/code> a little bit at a time.<\/p>\n<pre>namespace async_helpers\r\n{\r\n    template&lt;typename State&gt;\r\n    class awaitable_state\r\n    {\r\n        std::mutex mutex;\r\n        node_list sentinel;\r\n\r\n        State&amp; parent() { return static_cast&lt;State&amp;&gt;(*this); }\r\n\r\n        auto&amp; extra_node(impl::node_base&amp; node)\r\n        { static_cast&lt;impl::node&lt;extra_await_data&gt;&amp;&gt;(node); }\r\n<\/pre>\n<p>An awaitable object&#8217;s state consists of a list of waiting coroutines, protected by a mutex. The helper function <code>parent()<\/code> helps with CRPT by downcasting the <code>awaitable_<wbr \/>state<\/code> to the full <code>State<\/code> object. Similarly, <code>extra_<wbr \/>node<\/code> takes a <code>node_<wbr \/>base<\/code> and downcasts it all the way to a <code>node&lt;extra_<wbr \/>await_<wbr \/>data&gt;<\/code> so we can access the full node complete with extra data.<\/p>\n<pre>    public:\r\n        using extra_await_data = extra_await_data&lt;State&gt;;\r\n        using node_list = async_helpers::node_list;\r\n<\/pre>\n<p>For convenience, we give names to the extra data and the node list itself.<\/p>\n<pre>        bool fast_claim(extra_await_data const&amp;) const noexcept\r\n        { return false; }\r\n<\/pre>\n<p>The <code>fast_<wbr \/>claim<\/code> method is optional. If the derived class doesn&#8217;t implement it, then we provide a default implementation that says &#8220;Nope, must do it the slow way.&#8221;<\/p>\n<pre>        extra_await_data* peek_head() const noexcept\r\n        {\r\n            auto node = sentinel.peek_head();\r\n            return node ? &amp;extra_node(node)-&gt;extra : nullptr;\r\n        }\r\n<\/pre>\n<p>The derived class can use <code>peek_<wbr \/>head<\/code> to peek at the extra data associated with the coroutine at the head of the wait list. If the wait list is empty, then the method returns <code>nullptr<\/code>. This method may be called only from within an action, since it requires that the lock be held.<\/p>\n<p>Also from within an action, the derived class can ask for one waiting coroutine to be resumed:<\/p>\n<pre>        bool resume_one(node_list&amp; list) noexcept\r\n        {\r\n            auto node = sentinel.try_remove_head();\r\n            if (!node) return false;\r\n            list.append_node(*node);\r\n            return true;\r\n        }\r\n<\/pre>\n<p>This is done by unlinking the head node from the waiting list and appending it to the resume list. We append to the tail of the resume list to preserve FIFO.<\/p>\n<p>Another thing an action can do is ask for all of the waiting coroutines to be released.<\/p>\n<pre>        bool resume_all(node_list&amp; list) noexcept\r\n        {\r\n            return list.append_list(sentinel);\r\n        }\r\n<\/pre>\n<p>This is equivalent to calling <code>resume_one<\/code> in a loop until it fails, but we can do it more efficiently by moving the entire list at once.<\/p>\n<p>When a coroutine awaits the synchronization object, we ask <code>await_<wbr \/>suspend<\/code> to do the work:<\/p>\n<pre>        bool await_suspend(\r\n            std::experimental::coroutine_handle&lt;&gt; handle,\r\n            impl::node&lt;extra_await_data&gt;&amp; node)\r\n        {\r\n            auto guard = std::lock_guard(mutex);\r\n            if (parent().claim(node.extra)) return false;\r\n            node.handle = handle;\r\n            sentinel.append_node(node);\r\n            return true;\r\n        }\r\n<\/pre>\n<p>The caller provides a coroutine handle and a preallocated <code>node<\/code>. We enter the lock and try to claim the synchronization object. If successful, then we are done, and we return <code>false<\/code> to tell the coroutine machinery that the coroutine should resume immediately. Otherwise, we remember the coroutine handle and append the node to the list of waiters, returning <code>true<\/code> to complete the suspension.<\/p>\n<pre>        void await_resume(\r\n            impl::node&lt;extra_await_data&gt;&amp; node) noexcept\r\n        {\r\n            node.handle = nullptr;\r\n        }\r\n<\/pre>\n<p>When the coroutine resumes, we null out the coroutine handle so that we know that the coroutine is no longer suspended and that we therefore are no longer in the cancellation case. I&#8217;ll discuss later why we do it in <code>await_resume<\/code>.<\/p>\n<pre>        void destruct_node(impl::node_base&amp; node) noexcept\r\n        {\r\n            if (node.handle) {\r\n                auto guard = std::lock_guard(mutex);\r\n                node.next = node.prev-&gt;next;\r\n                node.prev = node.next-&gt;prev;\r\n            }\r\n        }\r\n<\/pre>\n<p>This is part of our accommodation for cancellation: In the case that a waiting coroutine is destroyed, we check if it has a pending resumption handle. If so, then we are in the cancellation case, and we unlink it from the list of waiters. There is a race here: If the coroutine has already been scheduled for resumption, our attempt to unlink it will corrupt memory because the <code>prev<\/code> and <code>next<\/code> members of the node point to already-resumed coroutines. However, <!-- backref: Creating a <CODE>co_await<\/CODE> awaitable signal that can be awaited multiple times, part 6 --> as we discussed earlier, if this happens, then it means that the caller was already in a race condition, trying to destroy a coroutine as it is resuming. It is the caller&#8217;s responsibility to ensure that destroying a suspended coroutine happens only when it can guarantee that the coroutine is not at risk of resumption.\u00b9<\/p>\n<p>The next bit is the part that makes actions happen:<\/p>\n<pre>        template&lt;typename... Params, typename... Args&gt;\r\n        void action_impl(\r\n            void (State::*handler)(node_list&amp;, Params...),\r\n            Args&amp;&amp;... args)\r\n        {\r\n            node_list list;\r\n            {\r\n                auto guard = std::lock_guard(mutex);\r\n                (parent().*handler)(list,\r\n                    std::forward&lt;Args&gt;(args)...);\r\n            }\r\n            resume_list(list);\r\n        }\r\n<\/pre>\n<p>We create an empty node list outside the lock, take the lock, and then call the handler, forwarding all the parameters. When the handler returns, we drop the lock and resume all the coroutines it had requested to be resumed, using this <code>resume_<wbr \/>list<\/code> method:<\/p>\n<pre>    private:\r\n        void resume_list(node_list&amp; list)\r\n        {\r\n            auto node = list.next;\r\n            while (node != std::addressof(list))\r\n            {\r\n                resume_node(std::exchange(node, node-&gt;next));\r\n            }\r\n        }\r\n\r\n        void resume_node(impl::node_base* node) noexcept\r\n        {\r\n            extra_node(*node).handle.resume();\r\n        }\r\n    };\r\n<\/pre>\n<p>Resuming a list consists of calling <code>resume_<wbr \/>node<\/code> on each node in the list. Note that the node is owned by the coroutine, so resuming the coroutine will cause the awaiter to be destructed, and the node will disappear with it. We therefore have to make sure that we do not touch the node after resuming the coroutine. In this case, it means that we advance to the <code>next<\/code> node and pull out the <code>handle<\/code> before resuming the handle.<\/p>\n<p>This example resumes the coroutines synchronously. We&#8217;ll work on asynchronous resumption next time.<\/p>\n<p>The last piece is defining the synchronization object itself:<\/p>\n<pre>    template&lt;typename State&gt;\r\n    class awaitable_sync_object\r\n    {\r\n        std::shared_ptr&lt;State&gt; shared;\r\n\r\n    public:\r\n        template&lt;typename... Args&gt;\r\n        awaitable_sync_object(Args&amp;&amp;... args) :\r\n            shared(std::make_shared&lt;State&gt;(\r\n                std::forward&lt;Args&gt;(args)...)) {}\r\n\r\n        template&lt;typename Arg = typename State::extra_await_data&gt;\r\n        auto make_awaiter(Arg arg = {})\r\n        { return awaiter{ *shared, std::forward&lt;Arg&gt;(arg) }; }\r\n\r\n        auto operator co_await() { return awaiter{ *shared }; }\r\n\r\n    protected:\r\n        using state = State;\r\n\r\n        State&amp; get_state() const noexcept { return *shared; }\r\n\r\n        template&lt;typename... Args&gt;\r\n        void action_impl(Args&amp;&amp;... args) const\r\n        {\r\n            shared-&gt;action_impl(std::forward&lt;Args&gt;(args)...);\r\n        }\r\n\r\n        struct awaiter\r\n        {\r\n            template&lt;typename... Args&gt;\r\n            awaiter(State&amp; state, Args&amp;&amp;... args)\r\n                : s(state), node(std::forward&lt;Args&gt;(args)...) {}\r\n\r\n            State&amp; s;\r\n            impl::node&lt;extra_await_data&lt;State&gt;&gt; node;\r\n\r\n            bool await_ready()\r\n            { return s.fast_claim(node.extra); }\r\n\r\n            bool await_suspend(\r\n                std::experimental::coroutine_handle&lt;&gt; handle)\r\n            { return s.await_suspend(handle, node); }\r\n\r\n            void await_resume()\r\n            { s.await_resume(node); }\r\n\r\n            ~awaiter() { s.destruct_node(node); }\r\n        };\r\n    };\r\n}\r\n<\/pre>\n<p>The <code>awaitable_<wbr \/>sync_<wbr \/>object<\/code> is a wrapper around a <code>shared_<wbr \/>ptr<\/code> of the <code>State<\/code>. The shared state is constructed by forwarding all of the <code>awaitable_<wbr \/>sync_<wbr \/>object<\/code> constructor parameters to the <code>State<\/code> constructor, so that you can construct the <code>State<\/code> object with any default initial state. For example, semaphores may want to be initialized with an initial token count and possibly also a maximum token count.<\/p>\n<p>There is also a <code>make_<wbr \/>awaiter<\/code> method for creating the awaiter with optional <code>extra_<wbr \/>await_<wbr \/>data<\/code>, if you need to pass additional context for a particular <code>await<\/code> operation.<\/p>\n<p>And the last public method is the <code>co_await<\/code> operator which creates our custom-defined <code>awaiter<\/code>.<\/p>\n<p>The protected methods are for the custom synchronization object to use as part of its implementation. The <code>get_<wbr \/>state()<\/code> method lets the implementation access its own state. The <code>action_<wbr \/>impl<\/code> forwards its parameters to the state&#8217;s <code>action_<wbr \/>impl<\/code> method, which we discussed earlier.<\/p>\n<p>The <code>awaiter<\/code> is generalized so that you can construct the extra data in the node. The <code>await_<wbr \/>ready<\/code> tries to do a <code>fast_<wbr \/>claim<\/code>, which might allow the caller to bypass suspension if it was able to claim the object immediately. The <code>await_<wbr \/>suspend<\/code> method forwards to the state, which we discussed earlier. The <code>await_<wbr \/>resume<\/code> method asks the <code>State<\/code> to clean up, and recall that the method sets the <code>handle<\/code> to null to indicate that the coroutine is no longer suspended. We use that fact in the awaiter&#8217;s destructor to detect the case where the coroutine is destroyed while suspended: In that case, the <code>handle<\/code> will still hold the coroutine&#8217;s continuation (which is being abandoned), and that tells us to unlink this coroutine from the list of waiters.<\/p>\n<p>There is a race condition here, in the case that a coroutine is destroyed while a resumption is in flight, but as I noted earlier, this is a bug in the caller, so we don&#8217;t need to defend against it particularly hard. I chose to null out the <code>handle<\/code> in <code>await_<wbr \/>resume<\/code> because that is more likely to be optimized out by the compiler, seeing as the awaiter&#8217;s destructor runs immediately after <code>await_<wbr \/>resume<\/code>, which increases the likelihood that the compiler will be able to propagate the value into the awaiter&#8217;s destructor and realize that the code path is dead in the case of normal resumption.<\/p>\n<p>Now, we could have nulled out the coroutine handle at many points in the lifetime of the awaiter. Why did I choose to do it in <code>await_<wbr \/>resume<\/code>? Let&#8217;s sketch out the various possible fates of the awaiter:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td style=\"border: solid 1px black;\" colspan=\"5\">Awaiter constructed<br \/>\n<code>handle = nullptr;<\/code><\/td>\n<\/tr>\n<tr>\n<td>\u2193<\/td>\n<td>&nbsp;<\/td>\n<td>\u2193<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black;\" rowspan=\"7\">No suspension<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black;\">Suspended<br \/>\n<code>await_suspend<\/code><br \/>\n<code>handle = coroutine;<\/code><\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\u2193<\/td>\n<td>\u2198\ufe0e<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black;\">Resume<br \/>\n<code>resume_node<\/code><\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black;\" rowspan=\"5\">Cancelled<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\u2193<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black;\">Resuming<br \/>\n<code>resume_node_callback<\/code><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\u2193<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black;\">Resumed<br \/>\n<code>await_ready<\/code><br \/>\n<code>handle = nullptr;<\/code><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>\u2193<\/td>\n<td>&nbsp;<\/td>\n<td>\u2193<\/td>\n<td>&nbsp;<\/td>\n<td>\u2193<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black;\" colspan=\"5\">Awaiter destructed<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In the case where the coroutine doesn&#8217;t suspend because it was able to claim the synchronization object, the <code>handle<\/code> starts out as <code>nullptr<\/code> and stays that way until it is destructed. I&#8217;m hoping the optimizer can observe that the <code>handle<\/code> is not modified through this non-suspending path. The destructor&#8217;s call to <code>destruct_<wbr \/>node<\/code> will therefore do nothing, since it does work only if the handle is non-null, and can consequently be completely optimized out.<\/p>\n<p>In the case where the coroutine is cancelled, the coroutine handle is set to a non-null value in <code>await_<wbr \/>suspend<\/code> after the coroutine has suspended. When the coroutine is cancelled, the coroutine&#8217;s destructor destructs the awaiter, and at that point, the <code>destruct_<wbr \/>node<\/code> observes a non-null handle and knows to unlink the node from the list of waiters.<\/p>\n<p>The last case is the one down the middle, and it&#8217;s the most complicated one: The synchronization object cannot be claimed, so the <code>handle<\/code> gets set to the coroutine handle after the coroutine suspends. Later, some code signals the synchronization object, and it resumes all of the waiters. Eventually, we get to the <code>await_<wbr \/>ready<\/code> which is called inside the now-resumed coroutine, and it sets the <code>handle<\/code> back to null to indicate that everything is back to normal. By setting the <code>handle<\/code> to null at the last moment before destruction, I&#8217;m hoping the optimizer can recognize that the destructor&#8217;s call to <code>destruct_<wbr \/>node<\/code> will do nothing, since it does work only if the handle is non-null, and can be completely optimized out.<\/p>\n<p>It looks like gcc 10.1 and the Microsoft Visual C++ 19.24 compiler both can take my hints, and they do do optimize out the <code>destruct_<wbr \/>node<\/code> calls in the two cases I noted. So hooray for that.<\/p>\n<p>Okay, that was a whirlwind tour of our little library for writing generic awaitable synchronization objects. I did mention that this version resumes coroutines synchronously. Next time, <a title=\"Creating other types of synchronization objects that can be used with co_await, part 3: Parallel resumption\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210311-00\/?p=104949\"> we&#8217;ll make it resume coroutines on a thread pool<\/a>.<\/p>\n<p>\u00b9 We could defend against this by having <code>resume_<wbr \/>list<\/code> make the node point to itself, so that if the race occurs, we don&#8217;t corrupt memory immediately. But really, all we are doing is delaying the corruption, because the resumption will try to resume a destroyed coroutine, which will corrupt memory in a different way. Maybe we should <code>std::terminate<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Distilling the pattern to its essence, and the building back up.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-104945","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Distilling the pattern to its essence, and the building back up.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/104945","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=104945"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/104945\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=104945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=104945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=104945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}