{"id":107287,"date":"2022-10-14T07:00:00","date_gmt":"2022-10-14T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=107287"},"modified":"2022-10-14T06:53:42","modified_gmt":"2022-10-14T13:53:42","slug":"20221014-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20221014-00\/?p=107287","title":{"rendered":"The case of the memory corruption from a coroutine that already finished"},"content":{"rendered":"<p>A customer was getting sporadic crashes in the following code fragment:<\/p>\n<pre>class Widget : WidgetT&lt;Widget&gt;\r\n{\r\npublic:\r\n    winrt::IAsyncOperation&lt;bool&gt; InitializeAsync();\r\n\r\nprivate:\r\n    winrt::IAsyncAction GetHighScoreAsync();\r\n    winrt::IAsyncAction GetNameAsync();\r\n    winrt::IAsyncAction GetPictureAsync();\r\n\r\n    winrt::hstring m_name{ L\"(anonymous)\" };\r\n    winrt::SoftwareBitmap m_picture{ DefaultPicture() };\r\n    std::optional&lt;int32_t&gt; m_highScore;\r\n};\r\n\r\nwinrt::IAsyncAction Widget::GetHighScoreAsync()\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_highScore = co_await GetHighScoreFromServer();\r\n}\r\n\r\nwinrt::IAsyncAction Widget::GetNameAsync()\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_name = co_await GetNameFromIdentityService();\r\n}\r\n\r\nwinrt::IAsyncAction Widget::GetPictureAsync()\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_picture = co_await DecodePictureFromSettings();\r\n}\r\n\r\nwinrt::IAsyncOperation&lt;bool&gt;\r\n    Widget::InitializeAsync()\r\n{\r\n    auto lifetime = get_strong();\r\n\r\n    try {\r\n        \/\/ Get information in parallel. Faster!\r\n        co_await winrt::when_all(\r\n            GetHighScoreAsync(),\r\n            GetNameAsync(),\r\n            GetPictureAsync());\r\n    } catch (...) 
{\r\n        \/\/ Service is unavailable or something\r\n        \/\/ else went wrong. Just proceed with whatever\r\n        \/\/ worked.\r\n    }\r\n\r\n    ShowHighScore(m_highScore);\r\n    BuildGreeting(m_name);\r\n    CropPicture(m_picture);\r\n}\r\n<\/pre>\n<p>The idea here is that they have an <code>Initialize\u00adAsync<\/code> coroutine function that wants to run a bunch of other coroutines to initialize stuff, and let those other coroutines run in parallel, since each one is doing something different. When all of the helper coroutines are done, we process the results. And if any of the helper coroutines fails, that&#8217;s okay. We just proceed with what we were able to get.<\/p>\n<p>The crashes, though, indicated that <code>Build\u00adGreeting<\/code> or <code>Crop\u00adPicture<\/code> were crashing on their accesses to <code>m_name<\/code> and <code>m_picture<\/code>.<\/p>\n<p>Let&#8217;s take a survey of how various programming languages allow you to wait for multiple asynchronous actions:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Language<\/th>\n<th>Method<\/th>\n<th>Result<\/th>\n<th>If any fail<\/th>\n<\/tr>\n<tr>\n<td>C++<\/td>\n<td>Concurrency::when_all<\/td>\n<td><code>vector&lt;T&gt;<\/code><\/td>\n<td>fail immediately<\/td>\n<\/tr>\n<tr>\n<td>C++<\/td>\n<td>winrt::when_all<\/td>\n<td><code>void<\/code><\/td>\n<td>fail immediately<\/td>\n<\/tr>\n<tr>\n<td>C#<\/td>\n<td>Task.WhenAll<\/td>\n<td><code>T[]<\/code><\/td>\n<td>wait for others<\/td>\n<\/tr>\n<tr>\n<td>JavaScript<\/td>\n<td>Promise.all<\/td>\n<td><code>Array<\/code><\/td>\n<td>fail immediately<\/td>\n<\/tr>\n<tr>\n<td>JavaScript<\/td>\n<td>Promise.allSettled<\/td>\n<td><code>Array<\/code><\/td>\n<td>wait for others<\/td>\n<\/tr>\n<tr>\n<td>Python<\/td>\n<td>asyncio.gather<\/td>\n<td>List<\/td>\n<td>fail immediately by 
default<\/td>\n<\/tr>\n<tr>\n<td>Rust<\/td>\n<td>join!<\/td>\n<td>tuple<\/td>\n<td>wait for others<\/td>\n<\/tr>\n<tr>\n<td>Rust<\/td>\n<td>try_join!<\/td>\n<td>tuple<\/td>\n<td>fail immediately<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Python&#8217;s <code>asyncio.gather<\/code> lets you choose, through its <code>return_exceptions<\/code> parameter, whether a failed coroutine causes <code>gather<\/code> to fail immediately or whether <code>gather<\/code> waits for the others and reports the exception as one of the results. The default is to fail immediately.<\/p>\n<p>This customer is using <code>winrt::when_all<\/code>, which (consults table) fails as soon as any coroutine fails. (Our <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20200902-00\/?p=104155\"> custom <code>when_all<\/code> has the same behavior<\/a>.)<\/p>\n<p>What happened is that one of the coroutines (let&#8217;s say <code>Get\u00adHigh\u00adScore\u00adAsync<\/code>) failed with an exception. That caused <code>winrt::when_all<\/code> to propagate the exception and stop waiting for the other coroutines, which kept running. The <code>Initialize\u00adAsync<\/code> method ignores the exception and then proceeds on the false assumption that all of the methods ran to completion (possibly with failure). When it tries to use <code>m_name<\/code>, it races against the still-running <code>Get\u00adName\u00adAsync<\/code> method, causing the <code>L\"(anonymous)\"<\/code> string to be destructed at the same time it is being copied, which does not end well. A similar race occurs when <code>Crop\u00adPicture<\/code> reads <code>m_picture<\/code> while <code>Get\u00adPicture\u00adAsync<\/code> is writing to it.<\/p>\n<p>The simple solution here is to catch the exceptions in the coroutines so that they never produce a failure.
That way, <code>winrt::when_all<\/code> never completes early.<\/p>\n<pre>winrt::IAsyncAction Widget::GetHighScoreAsync() <span style=\"color: blue;\">try<\/span>\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_highScore = co_await GetHighScoreFromServer();\r\n} <span style=\"color: blue;\">catch (...)\r\n{\r\n}<\/span>\r\n\r\nwinrt::IAsyncAction Widget::GetNameAsync() <span style=\"color: blue;\">try<\/span>\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_name = co_await GetNameFromIdentityService();\r\n} <span style=\"color: blue;\">catch (...)\r\n{\r\n}<\/span>\r\n\r\nwinrt::IAsyncAction Widget::GetPictureAsync() <span style=\"color: blue;\">try<\/span>\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_picture = co_await DecodePictureFromSettings();\r\n} <span style=\"color: blue;\">catch (...)\r\n{\r\n}<\/span>\r\n\r\nwinrt::IAsyncOperation&lt;bool&gt;\r\n    Widget::InitializeAsync()\r\n{\r\n    auto lifetime = get_strong();\r\n\r\n    <span style=\"color: red;\">\/\/ <span style=\"text-decoration: line-through;\">try {<\/span><\/span>\r\n    \/\/ Get information in parallel. Faster!\r\n    co_await winrt::when_all(\r\n        GetHighScoreAsync(),\r\n        GetNameAsync(),\r\n        GetPictureAsync());\r\n    <span style=\"color: red;\">\/\/ <span style=\"text-decoration: line-through;\">} catch (...) {<\/span><\/span>\r\n    <span style=\"color: red;\">\/\/ <span style=\"text-decoration: line-through;\">    \/\/ Service is unavailable or something<\/span><\/span>\r\n    <span style=\"color: red;\">\/\/ <span style=\"text-decoration: line-through;\">    \/\/ else went wrong.
Just proceed with whatever<\/span><\/span>\r\n    <span style=\"color: red;\">\/\/ <span style=\"text-decoration: line-through;\">    \/\/ worked.<\/span><\/span>\r\n    <span style=\"color: red;\">\/\/ <span style=\"text-decoration: line-through;\">}<\/span><\/span>\r\n\r\n    ShowHighScore(m_highScore);\r\n    BuildGreeting(m_name);\r\n    CropPicture(m_picture);\r\n}\r\n<\/pre>\n<p>This code happens to use WIL, so there&#8217;s a helper macro for catching exceptions and logging them.<\/p>\n<pre>winrt::IAsyncAction Widget::GetHighScoreAsync() <span style=\"color: blue;\">try<\/span>\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_highScore = co_await GetHighScoreFromServer();\r\n} <span style=\"color: blue;\">CATCH_LOG()<\/span>\r\n\r\nwinrt::IAsyncAction Widget::GetNameAsync() <span style=\"color: blue;\">try<\/span>\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_name = co_await GetNameFromIdentityService();\r\n} <span style=\"color: blue;\">CATCH_LOG()<\/span>\r\n\r\nwinrt::IAsyncAction Widget::GetPictureAsync() <span style=\"color: blue;\">try<\/span>\r\n{\r\n    auto lifetime = get_strong();\r\n    co_await winrt::resume_background();\r\n\r\n    m_picture = co_await DecodePictureFromSettings();\r\n} <span style=\"color: blue;\">CATCH_LOG()<\/span>\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The zombie coroutine.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-107287","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>The zombie 
coroutine.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=107287"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107287\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=107287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=107287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=107287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
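The table's distinction between "fail immediately" and "wait for others" can be illustrated in portable C++. This is a minimal sketch, not C++/WinRT code: it assumes `std::future<void>` obtained from `std::async` as a stand-in for `IAsyncAction`, and `wait_all_settled` and `example` are hypothetical names invented for illustration. It also swaps in a different technique from the post's fix. Instead of having each helper coroutine swallow its own exceptions, the aggregate wait itself keeps waiting on the remaining operations after one of them fails, in the spirit of `Promise.allSettled`.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <future>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper (not part of C++/WinRT or the standard library).
// Waits for every future to finish, even after one of them has failed,
// and returns one entry per future: nullptr if it succeeded, otherwise
// the exception it threw. This is "wait for others" semantics.
std::vector<std::exception_ptr>
wait_all_settled(std::vector<std::future<void>>& futures)
{
    std::vector<std::exception_ptr> results;
    results.reserve(futures.size());
    for (auto& f : futures) {
        try {
            f.get(); // blocks until this operation has finished
            results.push_back(nullptr);
        } catch (...) {
            // Record the failure and keep waiting on the rest.
            results.push_back(std::current_exception());
        }
    }
    return results;
}

// Hypothetical usage: one operation fails right away while the other
// is still running, loosely mimicking GetHighScoreAsync and GetNameAsync.
bool example()
{
    using namespace std::chrono_literals;
    bool slowFinished = false;

    std::vector<std::future<void>> ops;
    ops.push_back(std::async(std::launch::async, [] {
        throw std::runtime_error("service unavailable");
    }));
    ops.push_back(std::async(std::launch::async, [&slowFinished] {
        std::this_thread::sleep_for(50ms);
        slowFinished = true;
    }));

    auto results = wait_all_settled(ops);

    // Both operations have finished, so reading slowFinished cannot race
    // with the worker thread that wrote it.
    return results[0] != nullptr && results[1] == nullptr && slowFinished;
}
```

Calling `example()` returns `true`: the first operation's exception is captured, and the slow operation has finished before its result is reported, so the caller can safely read the state it wrote. A fail-immediately wait such as `winrt::when_all`, by contrast, resumes its caller while the other operations may still be running, which is the race described in the post.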