{"id":107842,"date":"2023-02-17T07:00:00","date_gmt":"2023-02-17T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=107842"},"modified":"2023-02-16T22:59:20","modified_gmt":"2023-02-17T06:59:20","slug":"20230217-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230217-00\/?p=107842","title":{"rendered":"What happens if you co_await a std::future, and why is it a bad idea?"},"content":{"rendered":"<p>The C++ standard library introduced <code>std::future<\/code> in C++11, along with various functions and types that produce futures: <code>std::async<\/code>, <code>std::packaged_task<\/code>, and <code>std::promise<\/code>. The only way to know when the result of a <code>std::future<\/code> is ready is to poll for it, or simply block until the result is ready.<\/p>\n<p>When the Visual C++ compiler implemented experimental coroutine support, it added the ability to <code>co_await<\/code> a <code>std::future<\/code>: If you do that, the coroutine suspends until the <code>std::future<\/code> produces a result, and the result of the <code>std::future<\/code> becomes the result of the <code>co_await<\/code>.<\/p>\n<p>That sounds convenient.<\/p>\n<p>A customer reported that sometimes their program would crash with an out-of-memory error. They sent us some of the crash dumps they received. The crash dumps showed that their program had created <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20050729-14\/?p=34773\"> around 2000 threads<\/a> before finally succumbing. 
And most of the threads were waiting on a condition variable.<\/p>\n<pre>ntdll!ZwWaitForAlertByThreadId+0x14\r\nntdll!RtlSleepConditionVariableSRW+0x137\r\nKERNELBASE!SleepConditionVariableSRW+0x33\r\nmsvcp_win!Concurrency::details::stl_condition_variable_win7::wait_for+0x15\r\nmsvcp_win!Concurrency::details::stl_condition_variable_win7::wait+0x19\r\nmsvcp_win!_Cnd_wait+0x2a\r\ncontoso!std::condition_variable::wait+0x10\r\ncontoso!std::_Associated_state&lt;winrt::hstring&gt;::_Wait+0x3b\r\ncontoso!std::_State_manager&lt;winrt::hstring&gt;::wait+0x42\r\ncontoso!std::experimental::_Future_awaiter&lt;winrt::hstring&gt;::await_suspend::__l2::&lt;lambda_5f42a2a4a1d632a6517852fe05159fc3&gt;::operator()+0x45\r\ncontoso!std::invoke+0x45\r\ncontoso!std::thread::_Invoke&lt;std::tuple&lt;&lt;lambda_5f42a2a4a1d632a6517852fe05159fc3&gt; &gt;,0&gt;+0x53\r\nucrtbase!thread_start&lt;unsigned int (__cdecl*)(void *),1&gt;+0x93\r\nKERNEL32!BaseThreadInitThunk+0x14\r\nntdll!RtlUserThreadStart+0x28\r\n<\/pre>\n<p>From the function names on the stack, we can deduce that this code is waiting for a <code>std::future<\/code> to become ready. 
(Lots of the names are strong hints, but the giveaway is <code>_Future_awaiter<\/code>.)<\/p>\n<p>Let&#8217;s look at how <code>operator co_await<\/code> is implemented for <code>std::future<\/code>:<\/p>\n<pre>template &lt;class _Ty&gt;\r\nstruct _Future_awaiter {\r\n    future&lt;_Ty&gt;&amp; _Fut;\r\n\r\n    bool await_ready() const {\r\n        return _Fut._Is_ready();\r\n    }\r\n\r\n    void await_suspend(\r\n        experimental::coroutine_handle&lt;&gt; _ResumeCb) {\r\n        \/\/ TRANSITION, change to .then if and when future gets .then\r\n        thread _WaitingThread(\r\n            [&amp;_Fut = _Fut, _ResumeCb]() mutable {\r\n            _Fut.wait();\r\n            _ResumeCb();\r\n        });\r\n        _WaitingThread.detach();\r\n    }\r\n\r\n    decltype(auto) await_resume() {\r\n        return _Fut.get();\r\n    }\r\n};\r\n<\/pre>\n<p>To <code>co_await<\/code> a <code>std::future<\/code>, the code first checks if the value is already set. If not, then we create a thread and have the thread call <code>future.wait()<\/code>, which is a blocking wait. When the wait is satisfied, the coroutine resumes.<\/p>\n<p>The stack is consistent with our analysis. We are on a dedicated thread running the lambda inside <code>await_suspend<\/code>, and that lambda is waiting for the <code>std::future<\/code> to produce the result.<\/p>\n<p>Each <code>co_await<\/code> of a <code>std::future<\/code> burns a thread. Checking the customer&#8217;s code showed that there&#8217;s a <code>std::future&lt;winrt::hstring&gt;<\/code> that represents some calculation. The calculation itself requires asynchronous work, so each time somebody asks for the value to be calculated, a new <code>std::future<\/code> is created to represent the calculation, and the caller then <code>co_await<\/code>s for the result of the calculation.<\/p>\n<p>What happened is that the calculation for some reason is taking a long time, and a lot of requests have piled up. 
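<\/p>\n<p>To see the cost in isolation, here&#8217;s a sketch of my own in portable C++ (an illustration, not the customer&#8217;s code): every outstanding wait on a single <code>std::shared_future<\/code> parks one dedicated thread in a blocking <code>wait()<\/code>, just like the lambda in <code>await_suspend<\/code> does.<\/p>\n<pre>#include &lt;atomic&gt;\r\n#include &lt;cstdio&gt;\r\n#include &lt;future&gt;\r\n#include &lt;thread&gt;\r\n#include &lt;vector&gt;\r\n\r\nint main()\r\n{\r\n    std::promise&lt;int&gt; promise;\r\n    std::shared_future&lt;int&gt; fut = promise.get_future().share();\r\n\r\n    std::atomic&lt;int&gt; resumed{ 0 };\r\n    std::vector&lt;std::thread&gt; waiters;\r\n\r\n    \/\/ Each simulated co_await burns an entire thread that does\r\n    \/\/ nothing but sit in a blocking wait, like the lambda in\r\n    \/\/ await_suspend above.\r\n    for (int i = 0; i &lt; 20; ++i) {\r\n        waiters.emplace_back([fut, &amp;resumed] {\r\n            fut.wait();  \/\/ blocks, like _Fut.wait()\r\n            ++resumed;   \/\/ stand-in for resuming the coroutine\r\n        });\r\n    }\r\n\r\n    promise.set_value(42); \/\/ the result finally becomes ready\r\n    for (auto&amp; t : waiters) t.join();\r\n\r\n    std::printf(\"resumed %d waiters, value %d\\n\",\r\n                resumed.load(), fut.get());\r\n}\r\n<\/pre>\n<p>Twenty outstanding waits cost twenty blocked threads that accomplish nothing. Scale that up to a couple thousand piled-up requests, and you get the out-of-memory failure we saw in the crash dumps.<\/p>\n<p>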
Under normal conditions, stackless coroutines do not consume a thread while they are suspended; they just sign up to be resumed when the thing they are awaiting finally produces a result. But <code>std::future<\/code> has no way to register a callback to be invoked when the result is ready. The only way to find out is to wait for it, and that consumes a thread. (That&#8217;s what the &#8220;TRANSITION&#8221; comment is trying to say: When it becomes possible to register a callback for the readiness of a <code>std::future<\/code>, we should switch to it.)<\/p>\n<p>The program is using <code>std::promise<\/code> as an implementation of a task completion source, unaware that the implementation is very expensive, burning a thread for each outstanding <code>co_await<\/code>. We advised the customer to switch to something lighter weight, such as the <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210323-00\/?p=104987\"> task completion source we developed as part of our study of coroutines<\/a>.<\/p>\n<p>Or you can build your own quick-and-dirty task completion source, with the limitation that it doesn&#8217;t support exceptions. (Because I&#8217;m lazy.) 
For this customer&#8217;s purpose, that may be sufficient.<\/p>\n<pre>template&lt;typename T&gt;\r\nstruct qd_completion_source\r\n{\r\n    void set_result(T value) {\r\n        result = std::move(value);\r\n        SetEvent(event.get());\r\n    }\r\n\r\n    auto resume_when_ready() {\r\n        return winrt::resume_on_signal(event.get());\r\n    }\r\n\r\n    T&amp; get_result() { return *result; }\r\n\r\nprivate:\r\n    std::optional&lt;T&gt; result;\r\n    winrt::handle event = winrt::check_pointer(CreateEvent(nullptr, TRUE, FALSE, nullptr));\r\n};\r\n\r\n\/\/ Produce the qd_completion_source\r\nstd::shared_ptr&lt;qd_completion_source&lt;int&gt;&gt;\r\nStartSomething()\r\n{\r\n    auto source = std::make_shared&lt;\r\n            qd_completion_source&lt;int&gt;&gt;();\r\n\r\n    [](auto source) -&gt; winrt::fire_and_forget {\r\n        co_await step1();\r\n        co_await step2();\r\n        source-&gt;set_result(co_await step3());\r\n    }(source);\r\n\r\n    return source;\r\n}\r\n\r\n\/\/ Consume the qd_completion_source\r\nwinrt::fire_and_forget GetSomethingResult()\r\n{\r\n    auto source = StartSomething();\r\n\r\n    co_await source-&gt;resume_when_ready();\r\n\r\n    auto result = source-&gt;get_result();\r\n}\r\n<\/pre>\n<p>When the result is ready, our quick-and-dirty completion source saves the answer in the <code>std::optional<\/code> and then signals the event. To resume when the result is ready, we resume when the event is set.<\/p>\n<p>If you want to be awaitable more than once, you can return a copy of the result from <code>await_resume<\/code> rather than moving the result to the caller.<\/p>\n<p>Like I said, this is a quick-and-dirty version. It still uses a kernel object to synchronize between the producer and consumer, but even so, a kernel event is far lighter than an entire thread! 
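<\/p>\n<p>A version built on <code>coroutine_handle&lt;&gt;<\/code> can avoid even the kernel object: the completion source just remembers the suspended coroutine&#8217;s handle and resumes it directly when the result is published. The skeleton of such a version looks something like this C++20 sketch of my own (hypothetical names; single consumer, no locking, no exception support, and it assumes the producer and consumer run on the same thread).<\/p>\n<pre>#include &lt;coroutine&gt;\r\n#include &lt;cstdio&gt;\r\n#include &lt;optional&gt;\r\n#include &lt;utility&gt;\r\n\r\n\/\/ Hypothetical name; single consumer, no locking, no exceptions.\r\ntemplate&lt;typename T&gt;\r\nstruct handle_completion_source\r\n{\r\n    struct awaiter\r\n    {\r\n        handle_completion_source&amp; src;\r\n\r\n        bool await_ready() const { return src.result.has_value(); }\r\n        void await_suspend(std::coroutine_handle&lt;&gt; h) { src.waiting = h; }\r\n        T&amp; await_resume() { return *src.result; }\r\n    };\r\n\r\n    awaiter resume_when_ready() { return awaiter{ *this }; }\r\n\r\n    void set_result(T value)\r\n    {\r\n        result = std::move(value);\r\n        if (waiting) {\r\n            \/\/ Resume the suspended consumer directly: no event, no thread.\r\n            std::exchange(waiting, {}).resume();\r\n        }\r\n    }\r\n\r\nprivate:\r\n    std::optional&lt;T&gt; result;\r\n    std::coroutine_handle&lt;&gt; waiting;\r\n};\r\n\r\n\/\/ Minimal fire-and-forget coroutine type.\r\nstruct fire_and_forget\r\n{\r\n    struct promise_type\r\n    {\r\n        fire_and_forget get_return_object() { return {}; }\r\n        std::suspend_never initial_suspend() { return {}; }\r\n        std::suspend_never final_suspend() noexcept { return {}; }\r\n        void return_void() {}\r\n        void unhandled_exception() {}\r\n    };\r\n};\r\n\r\nint main()\r\n{\r\n    handle_completion_source&lt;int&gt; source;\r\n    int seen = 0;\r\n\r\n    \/\/ Consumer: suspends without consuming a thread or a kernel object.\r\n    [](handle_completion_source&lt;int&gt;&amp; src, int&amp; out) -&gt; fire_and_forget {\r\n        out = co_await src.resume_when_ready();\r\n    }(source, seen);\r\n\r\n    source.set_result(42); \/\/ publishes the result and resumes the consumer\r\n\r\n    std::printf(\"seen %d\\n\", seen);\r\n}\r\n<\/pre>\n<p>(Sketch only: a real version needs synchronization between producer and consumer, and support for propagating exceptions.) 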
I started writing a version that used <code>coroutine_handle&lt;&gt;<\/code> but realized that I <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210323-00\/?p=104987\"> already did that<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Just waiting for something to finish.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-107842","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Just waiting for something to finish.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=107842"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107842\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=107842"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=107842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=107842"}],"curies":[{"name"
:"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}