{"id":111291,"date":"2025-06-20T07:00:00","date_gmt":"2025-06-20T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=111291"},"modified":"2025-06-20T09:44:14","modified_gmt":"2025-06-20T16:44:14","slug":"20250620-00-2","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250620-00\/?p=111291","title":{"rendered":"The case of the invalid handle error when a handle is closed while a thread is waiting on it"},"content":{"rendered":"<p>A customer tracked one of their crashes to an invalid handle exception being raised when one thread closed a handle that another thread was waiting for. Or at least that&#8217;s how they presented the problem.<\/p>\n<p>The stack trace in the crash dump said<\/p>\n<pre>ntdll!KiRaiseUserExceptionDispatcher+0x3a\r\nKERNELBASE!WaitForMultipleObjectsEx+0x123\r\nKERNELBASE!WaitForMultipleObjects+0x11\r\ncontoso!Widget::WaitUntilReadyAsync$_ResumeCoro$1+0x1316\r\ncontoso!std::experimental::coroutine_handle&lt;void&gt;::resume+0xc\r\ncontoso!std::experimental::coroutine_handle&lt;void&gt;::operator()+0xc\r\ncontoso!winrt::impl::resume_background_callback+0x10\r\nntdll!TppSimplepExecuteCallback+0x14d\r\nntdll!TppWorkerThread+0x819\r\nkernel32!BaseThreadInitThunk+0x17\r\n<\/pre>\n<p>Here&#8217;s a simplified version of the code:<\/p>\n<pre>struct Widget : std::enable_shared_from_this&lt;Widget&gt;\r\n{\r\n    wil::unique_event m_readyEvent{ wil::EventOptions::ManualReset };\r\n    wil::unique_event m_shutdownEvent{ wil::EventOptions::ManualReset };\r\n\r\n    winrt::IAsyncOperation&lt;bool&gt; WaitUntilReadyAsync()\r\n    {\r\n        co_await winrt::resume_background();\r\n\r\n        HANDLE events[] = { m_readyEvent.get(), m_shutdownEvent.get() };\r\n        auto status = WaitForMultipleObjects(ARRAYSIZE(events), events,\r\n                        FALSE \/* bWaitAll *\/, INFINITE);\r\n\r\n        switch (status) {\r\n        case WAIT_OBJECT_0:\r\n            co_return true; \/\/ 
the ready event is set\r\n\r\n        case WAIT_OBJECT_0 + 1:\r\n            co_return false; \/\/ the shutdown event is set\r\n\r\n        case WAIT_FAILED:\r\n            FAIL_FAST_LAST_ERROR();\r\n\r\n        default:\r\n            FAIL_FAST();\r\n        }\r\n    }\r\n};\r\n<\/pre>\n<p>The customer&#8217;s debugging showed that the <code>Widget<\/code> object had already destructed. (The coroutine should have done an <code>auto lifetime = shared_from_this()<\/code> to ensure that the <code>Widget<\/code> did not destruct while it was still in use.) The destructor of <code>wil::unique_event<\/code> closes the event handle, so we have a case of closing a handle while a thread was waiting on it. The <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code> documentation calls this out:<\/p>\n<blockquote class=\"q\"><p>If one of these handles is closed while the wait is still pending, the function&#8217;s behavior is undefined.<\/p><\/blockquote>\n<p>The customer noted that the behavior was undefined, but nevertheless wondered why they were crashing at the <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code> call. They tried to reproduce the error in-house by forcing the object to destruct during the wait, but they couldn&#8217;t get the crash that their clients were getting.<\/p>\n<p>The first thing to note is that &#8220;undefined behavior&#8221; means that anything can happen. Maybe it crashes, maybe it hangs, maybe it returns &#8220;The event is set&#8221; even though it isn&#8217;t, maybe it just seems to work okay. 
There&#8217;s no requirement that the undefined behavior be consistent from one instance to another.<\/p>\n<p>But you don&#8217;t need to use the &#8220;undefined behavior&#8221; escape hatch to explain the behavior.<\/p>\n<p>If you have a case where one thread closes a handle after another thread has started waiting for it, then you also have a case where one thread closes a handle <i>before<\/i> another thread has started waiting for it.<\/p>\n<p>If this is possible&#8230;<\/p>\n<table style=\"border-collapse: collapse;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th style=\"border: 1px currentcolor; border-style: none solid solid none;\">Thread 1<\/th>\n<th style=\"border-bottom: solid 1px currentcolor;\">Thread 2<\/th>\n<\/tr>\n<tr>\n<td style=\"border-right: solid 1px currentcolor;\"><tt>WaitForMultipleObjects(...);<\/tt><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border-right: solid 1px currentcolor;\">\u00a0<\/td>\n<td><tt>CloseHandle(...);<\/tt><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Then this is also possible because there is no synchronization between the two threads, so one CPU might just get lucky and execute a tiny bit faster than the other:<\/p>\n<table style=\"border-collapse: collapse;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th style=\"border: 1px currentcolor; border-style: none solid solid none;\">Thread 1<\/th>\n<th style=\"border-bottom: solid 1px currentcolor;\">Thread 2<\/th>\n<\/tr>\n<tr>\n<td style=\"border-right: solid 1px currentcolor;\">\u00a0<\/td>\n<td><tt>CloseHandle(...);<\/tt><\/td>\n<\/tr>\n<tr>\n<td style=\"border-right: solid 1px currentcolor;\"><tt>WaitForMultipleObjects(...);<\/tt><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In this case, it&#8217;s clear that the <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code> will fail with &#8220;invalid handle&#8221; since the handle had already been closed by the time we try to wait on 
it.<\/p>\n<p>Therefore, my guess was that we are in the second case, where the <code>CloseHandle<\/code> raced ahead of the <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code>, causing the <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code> to wait on an invalid handle. We know that it&#8217;s possible, and the exception code of &#8220;invalid handle&#8221; is consistent with that theory.<\/p>\n<p>The customer was not entirely convinced that the object was being destroyed before the wait. They observed that the <code>status<\/code> variable always held the value 41. What does 41 mean?<\/p>\n<p>The <code>status<\/code> variable contains the value 41, not because <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code> returned 41, but because <code>Wait\u00adFor\u00adMultiple\u00adObjects<\/code> <i>never returned a value<\/i>. This process had enabled the strict handle checking policy by using <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/processthreadsapi\/nf-processthreadsapi-setprocessmitigationpolicy\"> <code>Set\u00adProcess\u00adMitigation\u00adPolicy<\/code><\/a>, which means that a use of an invalid handle raises an exception rather than just failing with <code>ERROR_<wbr \/>INVALID_<wbr \/>HANDLE<\/code>. You can see this in the stack trace where the call to <code>Wait\u00adFor\u00adMultiple\u00adObjects\u00adEx<\/code> raised an exception rather than returning.<\/p>\n<p>Therefore, the <code>status<\/code> variable has yet to be initialized, so it contains garbage. The value 41 doesn&#8217;t mean anything; it&#8217;s a value left over from previous computations.<\/p>\n<p>The customer was still unconvinced. &#8220;Are you sure that it&#8217;s garbage? 
In the case of garbage, we would have expected it to contain a random number, but in the crash dumps, the value is consistently 41.&#8221;<\/p>\n<p>Garbage is not the same as random.<\/p>\n<p>If you look in my garbage can as I take it out to the curb, you will always find a bag of lint on top. That&#8217;s because the laundry room is the room closest to where I keep the garbage can, so the last thing I do before taking out the garbage can is grab the trash from the laundry room and put it into the garbage can, and the result is that the lint always winds up on the top.<\/p>\n<p>But it&#8217;s still garbage.<\/p>\n<p>In the case of the program that is crashing, the storage assigned to that variable happens to have always had a leftover value of 41. Why? Who knows. I guess you could debug it further if you are really curious, but really, it&#8217;s not of any consequence.<\/p>\n<p>The code could fix the race condition by taking a strong reference to the <code>Widget<\/code> to ensure that it doesn&#8217;t destruct while it&#8217;s still in use.<\/p>\n<pre>    winrt::IAsyncOperation&lt;bool&gt; WaitUntilReadyAsync()\r\n    {\r\n        <span style=\"border: solid 1px currentcolor;\">auto lifetime = shared_from_this();<\/span>\r\n\r\n        co_await winrt::resume_background();\r\n\r\n        \u27e6 ... rest as before ... 
\u27e7\r\n    }\r\n<\/pre>\n<p><b>Bonus chatter<\/b>: I found it a bit ironic that the customer simultaneously believed that &#8220;unpredictable behavior should always behave the same&#8221; when they were trying to reproduce the problem, yet also believed that &#8220;unpredictable behavior should never behave the same&#8221; when they were trying to explain the consistent presence of the value 41.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You are theorizing one race but experiencing another.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-111291","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>You are theorizing one race but experiencing another.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=111291"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111291\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=111291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\
/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=111291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=111291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}