{"id":108392,"date":"2023-07-05T07:00:00","date_gmt":"2023-07-05T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=108392"},"modified":"2023-06-19T07:29:23","modified_gmt":"2023-06-19T14:29:23","slug":"20230705-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230705-00\/?p=108392","title":{"rendered":"How to wait for multiple C++ coroutines to complete before propagating failure, preallocating the coroutine frame"},"content":{"rendered":"<p>Last time, <a title=\"\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230704-00\/?p=108389\"> we dealt with memory allocation failures in our <code>when_<wbr \/>all_<wbr \/>completed<\/code> coroutine by terminating immediately<\/a>. But can we avoid memory allocation failures entirely?<\/p>\n<p>We learned some time ago that the coroutine frame consists of the following things:<\/p>\n<ul>\n<li>Promise object.<\/li>\n<li>Inbound parameters.<\/li>\n<li>Local variables.<\/li>\n<li>Temporaries.<\/li>\n<li>Compiler overhead.<\/li>\n<\/ul>\n<p>This size is fixed for a given coroutine, but it varies from coroutine to coroutine depending on what goes into the coroutine body.<\/p>\n<p>But if we can anticipate the necessary size for the coroutine frame, we can preallocate the memory in the caller&#8217;s frame, thereby avoiding the need for dynamic allocation.<\/p>\n<p>Recall that our coroutine body looks like this:<\/p>\n<pre>    auto capture_exception = [](auto&amp; async)\r\n        -&gt; all_completed_result {\r\n        co_await std::move(async);\r\n    };\r\n<\/pre>\n<p>Let&#8217;s look at the sizes of the things that contribute to the coroutine frame. 
Some of them are easy to calculate:<\/p>\n<ul>\n<li>Promise object: <code>sizeof(all_completed_promise)<\/code>.<\/li>\n<li>Inbound parameters: Size of a reference is <code>sizeof(void*)<\/code>.<\/li>\n<li>Local variables: None. The lambda body declares no local variables.<\/li>\n<\/ul>\n<p>Temporaries will require some thought. What temporaries are created by our lambda?<\/p>\n<p>The <code>co_await<\/code> expression may create a temporary awaiter, so we&#8217;ll have to calculate the size of the awaiter associated with the <code>async<\/code> object.<\/p>\n<p>The last piece is the compiler overhead. We will have to determine this experimentally because each compiler is welcome to implement coroutines in its own way. (Although there is <a title=\"Debugging coroutine handles: The Microsoft Visual C++ compiler, clang, and gcc\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20211007-00\/?p=105777\"> a de facto ABI shared by the three major compiler vendors<\/a>.)<\/p>\n<p>Determining the awaiter requires us to reimplement <a title=\"C++ coroutines: Defining the co_await operator\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20191218-00\/?p=103221\"> the algorithm the compiler uses to find the awaiter<\/a>. 
I&#8217;ll adapt the code from the C++\/WinRT library:<\/p>\n<pre>class awaiter_finder\r\n{\r\n    template&lt;typename T&gt;\r\n    static void find_co_await_member(T&amp;&amp;, ...);\r\n    template&lt;typename T&gt;\r\n    static auto find_co_await_member(T&amp;&amp; value, int)\r\n    -&gt; decltype(static_cast&lt;T&amp;&amp;&gt;(value).operator co_await()) {\r\n         return static_cast&lt;T&amp;&amp;&gt;(value).operator co_await();\r\n    }\r\n    template&lt;typename T&gt;\r\n    using member_awaiter = decltype(find_co_await_member(std::declval&lt;T&gt;(), 0));\r\n\r\n    template&lt;typename T&gt;\r\n    static void find_co_await_free(T&amp;&amp;, ...);\r\n    template&lt;typename T&gt;\r\n    static auto find_co_await_free(T&amp;&amp; value, int)\r\n    -&gt; decltype(operator co_await(static_cast&lt;T&amp;&amp;&gt;(value))) {\r\n         return operator co_await(static_cast&lt;T&amp;&amp;&gt;(value));\r\n    }\r\n    template&lt;typename T&gt;\r\n    using free_awaiter = decltype(find_co_await_free(std::declval&lt;T&gt;(), 0));\r\n\r\npublic:\r\n    template&lt;typename T&gt;\r\n    static auto get_awaiter(T&amp;&amp; value)\r\n    {\r\n        if constexpr (!std::is_same_v&lt;member_awaiter&lt;T&gt;, void&gt;) {\r\n            return find_co_await_member(static_cast&lt;T&amp;&amp;&gt;(value), 0);\r\n        } else if constexpr (!std::is_same_v&lt;free_awaiter&lt;T&gt;, void&gt;) {\r\n            return find_co_await_free(static_cast&lt;T&amp;&amp;&gt;(value), 0);\r\n        } else {\r\n            return (char)0;\r\n        }\r\n    }\r\n\r\n    template&lt;typename T&gt;\r\n    using type = decltype(get_awaiter(std::declval&lt;T&gt;()));\r\n};\r\n<\/pre>\n<p>This uses SFINAE to detect whether a class has <code>co_await<\/code> as a member operator and to detect whether it supports <code>co_await<\/code> as a free function operator. 
These are the two cases where the <code>co_await<\/code> will create a temporary awaiter, and in those cases, we return the (suitably decayed)\u00b9 type of that awaiter. In the case where the object is its own awaiter, there is no temporary, so the extra memory required for the awaiter is zero. There are no objects of size zero in C++, so we just use a <code>char<\/code>, which has size 1. This is an overestimate, but that&#8217;s okay.<\/p>\n<p>We can now use the <code>awaiter_finder<\/code> to build up the storage for holding our coroutine frames. Since only one coroutine frame is needed at a time, we just need a buffer that is big enough to hold the largest one.\u00b2<\/p>\n<pre>template&lt;typename... Types&gt;\r\nstruct coroutine_frame_storage\r\n{\r\n    void* overhead[50];\r\n    struct alignas(typename awaiter_finder::type&lt;Types&gt;...) {\r\n        char buffer[(std::max)({ sizeof(\r\n            typename awaiter_finder::type&lt;Types&gt;)... })];\r\n    } awaiter;\r\n};\r\n<\/pre>\n<p>There is no way to calculate the compiler overhead except by experimenting with the compiler. In my experiments, the compiler overhead is around 48 bytes, so I&#8217;m going to be generous and reserve 50 pointers&#8217; worth of space (200 bytes on 32-bit systems, 400 bytes on 64-bit systems). That space also has to cover the promise object and the inbound parameters, since <code>coroutine_<wbr \/>frame_<wbr \/>storage<\/code> does not reserve separate room for them.<\/p>\n<p>We can preallocate the memory for the coroutine frame in the caller and pass it to the coroutine function as a bonus parameter. 
The custom <code>operator new<\/code> then knows to use that storage for the coroutine frame.<\/p>\n<pre>struct all_completed_promise\r\n{\r\n    ...\r\n\r\n    <span style=\"border: solid 1px currentcolor; border-bottom: none;\">template&lt;typename Lambda, typename...Storage, typename Async&gt;<\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">void* operator new(                                          <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    std::size_t n,                                           <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    Lambda&amp;&amp;,                                                <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    coroutine_frame_storage&lt;Storage...&gt;&amp; storage,            <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    Async&amp;&amp;) {                                               <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    \/\/ If this terminates, then we need to increase the      <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    \/\/ extra overhead in coroutine_frame_storage.            
<\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    if (n &gt; sizeof(storage)) std::terminate();               <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">\u00a0                                                            <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    return std::addressof(storage);                          <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">}                                                            <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">\u00a0                                                            <\/span>\r\n    <span style=\"border: solid 1px currentcolor; border-top: none;\">void operator delete(void*) {}                               <\/span>\r\n};\r\n\r\ntemplate&lt;typename... T&gt;\r\nIAsyncAction when_all_complete(T... asyncs)\r\n{\r\n    std::exception_ptr eptr;\r\n    <span style=\"border: solid 1px currentcolor;\">coroutine_frame_storage&lt;T...&gt; storage;<\/span>\r\n\r\n    auto capture_exception = [](<span style=\"border: solid 1px currentcolor;\">auto&amp; storage<\/span>, auto&amp; async)\r\n        -&gt; all_completed_result {\r\n        co_await std::move(async);\r\n    };\r\n\r\n    auto accumulate = [&amp;](std::exception_ptr e) {\r\n        if (eptr == nullptr) eptr = e;\r\n    };\r\n\r\n    (accumulate(co_await capture_exception(<span style=\"border: solid 1px currentcolor;\">storage<\/span>, asyncs)), ...);\r\n\r\n    if (eptr) std::rethrow_exception(eptr);\r\n}\r\n<\/pre>\n<p>Note that this code reuses the same <code>coroutine_<wbr \/>frame_<wbr \/>storage<\/code> for each <code>co_await<\/code>. This requires that the coroutine storage be deleted before the next one starts. We accomplished this by having the <code>all_<wbr \/>completed_<wbr \/>result<\/code> destroy the coroutine when it resumes. 
That way, the storage is no longer in use when the next <code>co_await<\/code> begins.<\/p>\n<p>This was an awful lot of work to avoid &#8220;out of memory&#8221; errors, and it involves a little bit of chumminess with the compiler (to calculate the size of the coroutine frame). Mind you, precalculating the coroutine frame size is one of the things called out in the original coroutine specification for scenarios that must avoid dynamic memory allocation, so at least what we&#8217;re doing is implicitly acceptable to the authors of the coroutine specification.<\/p>\n<p>But maybe we can avoid needing to be chummy at all.<\/p>\n<p>We&#8217;ll look at this next time. The secret is to avoid coroutines.<\/p>\n<p>\u00b9 The decay happens because <code>get_<wbr \/>awaiter<\/code> has a deduced <code>auto<\/code> return type. If the <code>co_await<\/code> operator returns a reference, the referred-to object is copied and returned by value.<\/p>\n<p>\u00b2 Note that we could not simply write<\/p>\n<pre>template&lt;typename... Types&gt;\r\nstruct coroutine_frame_storage\r\n{\r\n    void* overhead[50];\r\n    char awaiter[std::max(\r\n        { sizeof(typename awaiter_finder::type&lt;Types&gt;)... 
})];\r\n};\r\n<\/pre>\n<p>This version reserves the correct number of bytes, but a <code>char<\/code> array carries no alignment requirement of its own, so nothing guarantees that the buffer is suitably aligned for the awaiters.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Avoiding dynamic memory allocation.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-108392","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Avoiding dynamic memory allocation.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=108392"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108392\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=108392"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=108392"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=108392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}