{"id":108342,"date":"2023-06-15T07:00:00","date_gmt":"2023-06-15T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=108342"},"modified":"2023-06-15T07:05:42","modified_gmt":"2023-06-15T14:05:42","slug":"20230615-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230615-00\/?p=108342","title":{"rendered":"The case of the invalid handle despite being managed by an RAII type"},"content":{"rendered":"<p>There were sporadic reports of an invalid handle in a component. This component was using a &#8220;widget tracker&#8221;. The widget tracker is rather complicated, but here&#8217;s a simplified version:<\/p>\n<pre>typedef struct Tracker* TrackerHandle;\r\n\r\nenum CallbackReason\r\n{\r\n    WidgetFound,\r\n    InitialScanComplete,\r\n    TrackerDestroyed,\r\n}:\r\n\r\ntypedef void (CALLBACK *TrackerCallback)(int reason, int id, void* context);\r\n\r\nstruct TrackerOptions { \/* not important *\/ };\r\n\r\nHRESULT WINAPI CreateTracker(\r\n    TrackerOptions const* options,\r\n    TrackerCallback callback,\r\n    void* context,\r\n    TrackerHandle* trackerHandle);\r\n\r\nvoid WINAPI DestroyTracker(\r\n    TrackerHandler trackerHandle);\r\n<\/pre>\n<p>The idea is that you create a tracker, and the tracker will issue a <code>Widget\u00adFound<\/code> callback for each widget it finds. Next, it issues a <code>Initial\u00adScan\u00adComplete<\/code> callback to let you know that the initial widget search is complete, and it&#8217;s now monitoring for any new widgets that get created, issuing <code>Widget\u00adFound<\/code> for any new widgets that get discovered. Finally, when you destroy the tracker, you get a <code>Destroyed<\/code> callback to let you know that the tracker is destroyed and will not generate any new callbacks.<\/p>\n<p>The customer created different kinds of trackers, but used a common class to help find the first widget.<\/p>\n<pre>struct FirstWidgetResults\r\n{\r\n  int GetWidgetId() { return widgetId; }\r\n  void WaitForResults() { readyEvent.wait(); }\r\n  void WaitForDestroyed() { destroyedEvent.wait(); }\r\n\r\n  static void CALLBACK OnCallback(\r\n      int reason, int id, void* context)\r\n  {\r\n    auto self = (MyTrackerResults*)context;\r\n\r\n    switch (reason) {\r\n    case WidgetFound:\r\n      if (self-&gt;widgetId == INVALID_WIDGET_ID) {\r\n        self-&gt;widgetId = id;\r\n        self-&gt;readyEvent.Set();\r\n      }\r\n      break;\r\n        \r\n    case InitialScanComplete:\r\n      self-&gt;readyEvent.Set();\r\n      break;\r\n\r\n    case TrackerDestroyed:\r\n      self-&gt;destroyedEvent.Set();\r\n      break;\r\n    }\r\n  }\r\n\r\nprivate:\r\n  int widgetId = INVALID_WIDGET_ID;\r\n\r\n  wil::unique_event readyEvent{ wil::EventOptions::None };\r\n  wil::unique_event destroyedEvent{ wil::EventOptions::None };\r\n\r\n};\r\n<\/pre>\n<p>The idea behind this <code>First\u00adWidget\u00adResults<\/code> class is that you use it to deal with the widget tracker callbacks:<\/p>\n<ul>\n<li>When the first widget is found, it remembers the ID of that widget and signals that the results are ready.<\/li>\n<li>When the initial search is complete, it signals that the results are ready. This is important if no widgets were found during the initial search, so that we don&#8217;t wait forever for widgets to be found when none exist.<\/li>\n<li>When the tracker is destroyed, it signals that the helper class can safely be destructed.<\/li>\n<\/ul>\n<p>They found that sometimes they got an invalid handle failure at the <code>destroyedEvent.Set()<\/code>. How can that be? The event handle is kept in an RAII type, so we shouldn&#8217;t be having handle lifetime issues, yet the crash dumps tell us otherwise.<\/p>\n<p>The call stack for the invalid handle exception looks like this:<\/p>\n<pre>ntdll!KiRaiseUserExceptionDispatcher+0x3a\r\nKERNELBASE!SetEvent+0xd\r\nContoso!wil::details::SetEvent+0xa\r\nContoso!wil::event_t::SetEvent+0xa\r\nContoso!FirstWidgetResults::OnCallback+0x73\r\nlitware!WidgetTracker::RaiseCallback+0xe2\r\n...\r\n<\/pre>\n<p>We can ask the debugger for the identity of the <code>First\u00adWidget\u00adResult<\/code> that failed.<\/p>\n<pre>0:008&gt; .frame 2\r\n02 Contoso!wil::details::SetEvent+0xa\r\n0:008&gt; dv\r\n\r\n0:008&gt; .frame 3\r\n03 Contoso!wil::event_t::SetEvent+0xa\r\n0:008&gt; dv\r\n\r\n0:008&gt; .frame 4\r\n04 Contoso!FirstWidgetResults::OnCallback+0x73\r\n0:008&gt; dv\r\n         reason = &lt;value unavailable&gt;\r\n             id = &lt;value unavailable&gt;\r\n        context = &lt;value unavailable&gt;\r\n\r\n0:008&gt; .frame 5\r\n05 litware!WidgetTracker::RaiseCallback+0xe2\r\n0:008&gt; dv\r\n               this = 0x000001d6`b72bbde0\r\n              event = 0x000001d6`b72bbe58\r\n\r\n0:008&gt; ?? this\r\nclass WidgetTracker * 0x000001d6`b72bbde0\r\n   +0x008 m_SRWLock        : _RTL_SRWLOCK\r\n   +0x000 __VFN_table : 0x00007ffd`0d1c8c50\r\n   +0x010 m_pClientContext : 0x000000a9`4e5ff298 Void\r\n   +0x018 m_pClientCallback : 0x00007ffc`d3e35980 Contoso!FirstWidgetResults::OnCallback+0\r\n   ...\r\n<\/pre>\n<p>The compiler optimized out the event handle and the <code>this<\/code> pointer for the <code>First\u00adWidget\u00adResult<\/code>, so we had to go digging further and further back into the stack until we could recover it. We eventually were able to find it from the widget tracker itself.<\/p>\n<pre>0:008&gt; dt -r1 contoso!FirstWidgetResults 0x000000a9`4e5ff298\r\n   +0000 widgetId          : -739262712\r\n   +0x008 readyEvent       : wil::unique_event\r\n      +0x000 m_ptr            : 0x00007ffc`eeb48b58 Void\r\n   +0x010 destroyedEvent   : wil::unique_event\r\n      +0x000 m_ptr            : 0x00007ffc`eeaf0000 Void\r\n<\/pre>\n<p>Yeah, all of those values look like garbage. Widget IDs and kernel event handles are normally small integers, but here they are very large.<\/p>\n<p>This class is used in a few places, and all of them put it on the stack. So let&#8217;s see if we can find what thread&#8217;s stack this pointer belongs to.<\/p>\n<pre>0:008&gt; ~*k1\r\n\r\n   0  Id: 3700.2404 Suspend: 0 Teb: 000000a9`4d490000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4d3deda8 00007ffd`0d7f43a9     ntdll!ZwWaitForMultipleObjects+0x14\r\n\r\n   1  Id: 3700.2728 Suspend: 0 Teb: 000000a9`4d49a000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4daff2a8 00007ffd`0d7f43a9     ntdll!ZwWaitForMultipleObjects+0x14\r\n\r\n   2  Id: 3700.1d40 Suspend: 0 Teb: 000000a9`4d49e000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4dcff128 00007ffd`0e5f53dc     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n   3  Id: 3700.36b8 Suspend: 0 Teb: 000000a9`4d4a2000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4deff4e8 00007ffd`0d7f43a9     ntdll!ZwWaitForMultipleObjects+0x14\r\n\r\n   4  Id: 3700.34d8 Suspend: 0 Teb: 000000a9`4d4a4000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4dfff828 00007ffd`0d7f43a9     ntdll!ZwWaitForMultipleObjects+0x14\r\n\r\n   5  Id: 3700.21dc Suspend: 0 Teb: 000000a9`4d4aa000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e2ff7b8 00007ffd`0e5f53dc     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n   6  Id: 3700.2184 Suspend: 0 Teb: 000000a9`4d4ae000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e4ff768 00007ffd`0ae89910     win32u!ZwGdiDdDDIWaitForVerticalBlankEvent+0x14\r\n\r\n   7  Id: 3700.2180 Suspend: 0 Teb: 000000a9`4d4b0000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4ddff7b8 00007ffc`ec21d7e0     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n#  8  Id: 3700.381c Suspend: 0 Teb: 000000a9`4d50e000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4d8fd7f8 00007ffd`101dabd1     ntdll!ZwAlpcSendWaitReceivePort+0x14\r\n\r\n   9  Id: 3700.3a6c Suspend: 0 Teb: 000000a9`4d524000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e5ff698 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  10  Id: 3700.3854 Suspend: 0 Teb: 000000a9`4d52e000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e0ffac8 00007ffd`0e9501f3     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n  11  Id: 3700.3438 Suspend: 0 Teb: 000000a9`4d530000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e3ff7e8 00007ffd`0e9505f3     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n  12  Id: 3700.31bc Suspend: 0 Teb: 000000a9`4d538000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4d7ff838 00007ffd`0e9501f3     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n  13  Id: 3700.11e8 Suspend: 0 Teb: 000000a9`4d546000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e9ff448 00007ffd`0d7f43a9     ntdll!ZwWaitForMultipleObjects+0x14\r\n\r\n  14  Id: 3700.5c4 Suspend: 0 Teb: 000000a9`4d552000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4d6ff818 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  15  Id: 3700.2188 Suspend: 0 Teb: 000000a9`4d554000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4d9ffb38 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  16  Id: 3700.2df0 Suspend: 0 Teb: 000000a9`4d556000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4dbff4b8 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  17  Id: 3700.1c74 Suspend: 0 Teb: 000000a9`4d558000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e1ff3a8 00007ffd`0e5f53dc     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n\r\n  18  Id: 3700.3190 Suspend: 0 Teb: 000000a9`4d55c000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e7ff768 00007ffd`0ae89910     win32u!ZwGdiDdDDIWaitForVerticalBlankEvent+0x14\r\n\r\n  19  Id: 3700.38b8 Suspend: 0 Teb: 000000a9`4d55e000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e6ffaa8 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  20  Id: 3700.3bc4 Suspend: 0 Teb: 000000a9`4d560000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e8ff878 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  21  Id: 3700.b34 Suspend: 0 Teb: 000000a9`4d562000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4eaffbf8 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  22  Id: 3700.2de8 Suspend: 0 Teb: 000000a9`4d564000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4ebffa48 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  23  Id: 3700.1f70 Suspend: 0 Teb: 000000a9`4d566000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4ecff7c8 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  24  Id: 3700.1180 Suspend: 0 Teb: 000000a9`4d568000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4efff9f8 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n\r\n  25  Id: 3700.684 Suspend: 0 Teb: 000000a9`4d56a000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4f0ffcd8 00007ffc`ec21d7e0     win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14\r\n<\/pre>\n<p>The <code>~*k1<\/code> command means to select (<code>~<\/code>) all threads (<code>*<\/code>) and execute the <code>k1<\/code> command, which dumps the top frame of the call stack.<\/p>\n<p>I&#8217;m not actually interested in the call stacks. What I want to see is the <code>Child-SP<\/code> values, so I can find the thread that <code>0x000000a9`4e5ff298<\/code> belongs to.<\/p>\n<p>And here it is:<\/p>\n<pre>   9  Id: 3700.3a6c Suspend: 0 Teb: 000000a9`4d524000 Unfrozen\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e5ff698 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n<\/pre>\n<p>If we look at the thread&#8217;s call stack, we see<\/p>\n<pre>0:008&gt; ~9k\r\nChild-SP          RetAddr               Call Site\r\n000000a9`4e5ff698 00007ffd`10111b3d     ntdll!ZwWaitForWorkViaWorkerFactory+0x14\r\n000000a9`4e5ff6a0 00007ffd`0e4e458d     ntdll!TppWorkerThread+0x2fd\r\n000000a9`4e5ff9b0 00007ffd`10157558     kernel32!BaseThreadInitThunk+0x1d\r\n000000a9`4e5ff9e0 00000000`00000000     ntdll!RtlUserThreadStart+0x28\r\n<\/pre>\n<p>This is a thread pool worker thread that is waiting for work. The original work (that created the <code>First\u00adWidget\u00adResults<\/code>) has evidently already returned, seeing as it&#8217;s not on the stack any more, and the pointer is in fact below the current stack pointer, so it&#8217;s in freed stack space.<\/p>\n<p>What we have is a use-after-free of the <code>First\u00adWidget\u00adResults<\/code>. That explains why it&#8217;s filled with garbage, and why RAII didn&#8217;t help: The object did indeed keep the handle valid as long as it was alive, but the object itself is no longer alive!<\/p>\n<p>All uses of this <code>First\u00adWidget\u00adResults<\/code> class are as automatic variables. Here&#8217;s an example:<\/p>\n<pre>using unique_tracker = wil::unique_any&lt;\r\n    TrackerHandle, decltype(&amp;::DestroyTracker), ::DestroyTracker&gt;;\r\n\r\nWidget FindWidgetWithEnabledAdapter()\r\n{\r\n  TrackerOptions options;\r\n  \/* ... code to set the options ... *\/\r\n\r\n  FirstWidgetResults results;\r\n\r\n  \/\/ Start the tracker (or die trying)\r\n  unique_tracker tracker;\r\n  THROW_IF_FAILED(CreateTracker(\r\n        &amp;options,\r\n        FirstWidgetResults::OnCallback,\r\n        &amp;results,\r\n        &amp;tracker));\r\n\r\n  \/\/ Wait for results to arrive\r\n  results.WaitForResults();\r\n\r\n  \/\/ Fail if no qualifying widget was found\r\n  THROW_HR_IF(HRESULT_FROM_WIN32(ERROR_NOT_FOUND),\r\n              results.GetWidgetId() == INVALID_WIDGET_ID);\r\n\r\n  \/\/ Stop the tracker and wait for it to be destroyed.\r\n  tracker.reset();\r\n  results.WaitForDestroyed();\r\n\r\n  \/\/ Return the widget that we found.\r\n  return WidgetFromId(results.GetWidgetId());\r\n}\r\n<\/pre>\n<p>Do you see the problem?<\/p>\n<p>We evaluate the results of the tracker immediately after the <code>readyEvent<\/code> is signaled, and we throw an exception out of the function if we couldn&#8217;t find an acceptable Widget. Now, the <code>tracker<\/code> variable is inside an RAII type (<code>unique_tracker<\/code>), so it will be destroyed as part of the destruction of locals. However, the early exit means that we never perform the <code>destroyedEvent.wait()<\/code>. We just destruct the <code>results<\/code> without waiting.<\/p>\n<p>Now that we have a theory for the premature destruction, we can look into the error logs to seek confirmation. And indeed, the error logs show that we did throw the <code>ERROR_<wbr \/>NOT_<wbr \/>FOUND<\/code> because we didn&#8217;t find any widget.<\/p>\n<p>Okay, now that we have a theory, and the theory is supported by evidence, what can we do to fix the problem? We&#8217;ll look at that next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The handle remains valid for the object&#8217;s lifetime, but what is the object&#8217;s lifetime?<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-108342","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>The handle remains valid for the object&#8217;s lifetime, but what is the object&#8217;s lifetime?<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=108342"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108342\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=108342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=108342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=108342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}