{"id":110777,"date":"2025-01-17T07:00:00","date_gmt":"2025-01-17T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110777"},"modified":"2025-01-17T09:13:05","modified_gmt":"2025-01-17T17:13:05","slug":"20250117-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250117-00\/?p=110777","title":{"rendered":"The case of the crash when trying to erase an element from a <CODE>std::set<\/CODE>"},"content":{"rendered":"<p>Today, we&#8217;ll look at a crash that occurred when trying to erase an element from a <code>std::set<\/code>.<\/p>\n<pre>rax=000001f565bc046e rbx=000001f589b20340 rcx=000001f565bc046e\r\nrdx=000000e6658feca8 rsi=000001f589b20690 rdi=000001f589b203c0\r\nrip=00007ffdd4726bc4 rsp=000000e6658fec30 rbp=0000388a1713ab55\r\n r8=000001f589b895d0  r9=000001f589b895d0 r10=000001f589000140\r\nr11=0000000000000000 r12=0000000000000001 r13=000000007ffe0385\r\nr14=0000000000000000 r15=000001f589b8f900\r\n\r\nLitWare!std::_Tree&lt;std::_Tset_traits&lt;WidgetWatcher *,\r\n        std::less&lt;WidgetWatcher *&gt;,\r\n        std::allocator&lt;WidgetWatcher *&gt;,0&gt; &gt;::_Eqrange+0x14\r\n    [inlined in LitWare!std::_Tree&lt;std::_Tset_traits&lt;\r\n        WidgetWatcher *,std::less&lt;WidgetWatcher *&gt;,\r\n        std::allocator&lt;WidgetWatcher *&gt;,0&gt; &gt;::erase+0x18]:\r\n00007ffd`d4726bc4 cmp     byte ptr [rax+19h],r11b ds:000001f5`65bc0487=??\r\n<\/pre>\n<p>The stack trace has some information about how we got here.<\/p>\n<pre>LitWare!std::_Tree&lt;std::_Tset_traits&lt;Widget *,\r\n        std::less&lt;Widget *&gt;,\r\n        std::allocator&lt;Widget *&gt;,0&gt; &gt;::_Eqrange+0x14\r\nLitWare!std::_Tree&lt;std::_Tset_traits&lt;Widget *,\r\n        std::less&lt;Widget *&gt;,\r\n        std::allocator&lt;Widget *&gt;,0&gt; &gt;::erase+0x18\r\nLitWare!Widget::~Widget+0xc8\r\nLitWare!Widget::`scalar deleting destructor'+0x14\r\nLitWare!DestroyWidget+0x15\r\nFabrikam!Doodad::~Doodad+0x75\r\nFabrikam!Doodad::`scalar deleting destructor'+0x14\r\nFabrikam!Doodad::Release+0x40\r\nContoso!Gadget::~Gadget+0x66\r\nucrtbase!&lt;lambda_\u27e6...\u27e7&gt;::operator()+0xa5\r\nucrtbase!__crt_seh_guarded_call&lt;int&gt;::operator()&lt;\u27e6...\u27e7&gt;+0x3b\r\nucrtbase!__acrt_lock_and_call+0x1c\r\nucrtbase!_execute_onexit_table+0x3d\r\nContoso!dllmain_crt_process_detach+0x45\r\nContoso!dllmain_dispatch+0xe6\r\nntdll!LdrpCallInitRoutine+0xb0\r\nntdll!LdrShutdownProcess+0x260\r\nntdll!RtlExitUserProcess+0x114\r\nkernel32!FatalExit+0xb\r\nucrtbased!exit_or_terminate_process+0x3a\r\nucrtbased!common_exit+0x85\r\nucrtbased!exit+0x16\r\n<\/pre>\n<p>The top of the stack tells us that we are trying to erase an element from a <code>std::set<\/code>. This happened in the <code>Widget<\/code> destructor:<\/p>\n<pre>struct Widget\r\n{\r\n    \u27e6 ... \u27e7\r\n\r\n    \/\/ For debugging purposes, keep track of all of the Widgets.\r\n    static wil::srwlock s_lock;\r\n    static std::set&lt;Widget*&gt; s_allWidgets;\r\n};\r\n\r\nWidget::~Widget()\r\n{\r\n    auto guard = s_lock.lock_exclusive();\r\n    s_allWidgets.erase(this);\r\n}\r\n<\/pre>\n<p>The idea is that whenever we create a <code>Widget<\/code>, we store its address in the <code>s_allWidgets<\/code> set, and when we destroy one, we remove it from the set. The comment notes that this is just for debugging purposes, so that when we want to see what all the Widgets are doing, we can walk through the set to find each one.<\/p>\n<p>Okay, so now that we have a general idea of what this code is trying to do, let&#8217;s go back and study the crash.<\/p>\n<p>First, what is the Widget being destructed?<\/p>\n<pre>0:000&gt; .frame 2\r\n02 LitWare!Widget::~Widget+0xc8\r\n0:000&gt; dv\r\n           this = <span style=\"border: solid 1px currentcolor;\">0x000001f5`89b20340<\/span>\r\n<\/pre>\n<p>Okay, remember that number.<\/p>\n<p>Next, is the <code>std::set<\/code> corrupted?<\/p>\n<pre>0:000&gt; dx LitWare!Widget::s_allWidgets\r\nLitWare!Widget::s_allWidgets : { size=0x1 }\r\n    [&lt;Raw View&gt;]\r\n    [comparator]     : less\r\n    [allocator]      : allocator\r\n0:000&gt; ?? LitWare!Widget::s_allWidgets\r\nclass std::set&lt;Widget *,std::less&lt;Widget *&gt;,std::allocator&lt;Widget *&gt; &gt;\r\n   +0x000 _Mypair          : std::_Compressed_pair&lt;\u27e6...\u27e7&gt;\r\n0:000&gt; ?? LitWare!Widget::s_allWidgets._Mypair\r\nclass std::_Compressed_pair&lt;\u27e6...\u27e7&gt;\r\n   +0x000 _Myval2          : std::_Compressed_pair&lt;\u27e6...\u27e7&gt;\r\n0:000&gt; ?? LitWare!Widget::s_allWidgets._Mypair._Myval2\r\nclass std::_Compressed_pair&lt;\u27e6...\u27e7&gt;\r\n   +0x000 _Myval2          : std::_Compressed_pair&lt;\u27e6...\u27e7&gt;\r\n0:000&gt; ?? LitWare!Widget::s_allWidgets._Mypair._Myval2._Myval2\r\nclass std::_Tree_val&lt;std::_Tree_simple_types&lt;Widget *&gt; &gt;\r\n   +0x000 _Myhead          : 0x000001f5`89b895d0 std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x008 _Mysize          : 1\r\n<\/pre>\n<p>Okay, after digging through a bunch of compressed pairs, we finally get to the goods. There is one element in the set, and we can look at the sentinel node.<\/p>\n<pre>0:000&gt; ?? LitWare!Widget::s_allWidgets._Mypair._Myval2._Myval2._Myhead\r\nstruct std::_Tree_node&lt;Widget *,void *&gt; * 0x000001f5`89b895d0\r\n   +0x000 _Left            : 0x000001f5`89b800c<span style=\"border: solid 1px currentcolor; border-bottom: none;\">b<\/span> std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x008 _Parent          : 0x000001f5`65bc046<span style=\"border: solid 1px currentcolor; border-top: none;\">e<\/span> std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x010 _Right           : 0x000001f5`89b846e<span style=\"border: solid 1px transparent;\">0<\/span> std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x018 _Color           : 1 ''\r\n   +0x019 _Isnil           : 1 ''\r\n   +0x020 _Myval           : (null)\r\n<\/pre>\n<p>This already looks bad, because the <code>_Left<\/code> and <code>_Parent<\/code> pointers are misaligned. Furthermore, if the set has only one element, then the root (<code>_Parent<\/code>), left, and right nodes should all be the same, but all of these pointers are different.<\/p>\n<p>The not-yet-obviously-corrupted pointer is <code>_Right<\/code>, and chasing through shows a similar type of corruption:<\/p>\n<pre>0:000&gt; ?? LitWare!Widget::s_allWidgets._Mypair._Myval2._Myval2._Myhead-&gt;_Right\r\nstruct std::_Tree_node&lt;Widget *,void *&gt; * 0x000001f5`89b846e0\r\n   +0x000 _Left            : 0x000001f5`89b80268 std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x008 _Parent          : 0x000001f5`f69fb889 std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x010 _Right           : 0x000001f5`89b895d0 std::_Tree_node&lt;Widget *,void *&gt;\r\n   +0x018 _Color           : 1 ''\r\n   +0x019 _Isnil           : 0 ''\r\n   +0x020 _Myval           : 0x000001f5`89b20340 Widget *\r\n0:000&gt;\r\n<\/pre>\n<p>The first two pointers are corrupted, but the rest looks okay. (Notice that the <code>_Right<\/code> points back to the sentinel node, as expected, and the <code>_Myval<\/code> is the pointer to the <code>Widget<\/code> we are trying to remove.)<\/p>\n<p>When I see a memory block that has had two pointers unexpectedly written to the start, my instincts tell me that I might be looking at a freed memory block. It is a common design in heap managers to store metadata about a freed memory block in the freed memory block itself. Often, the free blocks are kept in a doubly-linked list, which explains why the corruption is seen as two pointers.<\/p>\n<p>Okay, so my working theory is that this is freed memory, which implies that the <code>std::set<\/code> has already destructed. This theory is supported by the stack trace:<\/p>\n<pre>LitWare!std::_Tree&lt;std::_Tset_traits&lt;Widget *,\r\n        std::less&lt;Widget *&gt;,\r\n        std::allocator&lt;Widget *&gt;,0&gt; &gt;::_Eqrange+0x14\r\nLitWare!std::_Tree&lt;std::_Tset_traits&lt;Widget *,\r\n        std::less&lt;Widget *&gt;,\r\n        std::allocator&lt;Widget *&gt;,0&gt; &gt;::erase+0x18\r\nLitWare!Widget::~Widget+0xc8\r\nLitWare!Widget::`scalar deleting destructor'+0x14\r\nLitWare!DestroyWidget+0x15\r\nFabrikam!Doodad::~Doodad+0x75\r\nFabrikam!Doodad::`scalar deleting destructor'+0x14\r\nFabrikam!Doodad::Release+0x40\r\nContoso!Gadget::~Gadget+0x66\r\nucrtbase!&lt;lambda_\u27e6...\u27e7&gt;::operator()+0xa5\r\nucrtbase!__crt_seh_guarded_call&lt;int&gt;::operator()&lt;\u27e6...\u27e7&gt;+0x3b\r\nucrtbase!__acrt_lock_and_call+0x1c\r\nucrtbase!_execute_onexit_table+0x3d\r\nContoso!dllmain_crt_process_detach+0x45\r\nContoso!dllmain_dispatch+0xe6\r\nntdll!LdrpCallInitRoutine+0xb0\r\nntdll!LdrShutdownProcess+0x260\r\nntdll!RtlExitUserProcess+0x114\r\nkernel32!FatalExit+0xb\r\nucrtbased!exit_or_terminate_process+0x3a\r\nucrtbased!common_exit+0x85\r\nucrtbased!<span style=\"border: solid 1px currentcolor;\">exit<\/span>+0x16\r\n<\/pre>\n<p>The process is exiting, and we are notifying <code>Contoso.dll<\/code>. This prompts the DLL to destruct its global variables, one of which is apparently a <code>Gadget<\/code>, or at least it&#8217;s a global variable whose destruction leads to the destruction of a <code>Gadget<\/code>. The destruction of the <code>Gadget<\/code> in turn release a <code>Doodad<\/code>, which causes it to destruct, and the destructor of the <code>Doodad<\/code> calls <code>Destroy\u00adWidget<\/code> which causes the <code>Widget<\/code> to destruct, and that&#8217;s when we notice that the <code>s_all\u00adWidgets<\/code> has already been destructed.<\/p>\n<p>We are a victim of the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/siof\"> static initialization order fiasco<\/a>, but at the other end of the story.<\/p>\n<p>People usually worry about the static initialization order fiasco at the initialization side: Making sure that objects you depend on during initialization have themselves been initialized. But what goes up must come down, and you also have to make sure that the object you depend on during destruction have themselves not yet been destructed.<\/p>\n<p>In this case, the <code>s_all<wbr \/>Widgets<\/code> was destructed as part of <code>LitWare.dll<\/code>&#8216;s cleanup, even though <code>Contoso.dll<\/code> still had plans on using it.<\/p>\n<p>This can happen even in the absence of DLLs at all. Objects with static storage duration are destructed in reverse order of construction, so we may have this:<\/p>\n<pre>\/\/ File 1 - global variable, suppose this constructs first\r\nstruct Gadget\r\n{\r\n    std::unique_ptr m_widget;\r\n};\r\nGadget mainGadget;\r\n\r\n\/\/ File 2 - global variable, suppose this constructs second\r\nstd::set&lt;Widget*&gt; s_allWidgets;\r\n<\/pre>\n<p>At program startup, we construct the <code>mainGadget<\/code>. The <code>Gadget<\/code> constructor doesn&#8217;t need a <code>Widget<\/code> right away, so there&#8217;s no static initialization order fiasco. Later, we decide that the main gadget needs a widget, so we do a <code>m_widget = std::make_unique&lt;Widget&gt;()<\/code>, and this returns the newly-constructed <code>Widget<\/code>, as well as registering the widget in the <code>s_all<wbr \/>Widgets<\/code> set.<\/p>\n<p>At destruction, the <code>s_all<wbr \/>Widgets<\/code> destructs first, and then when the main gadget destructs, it destructs the now-nonempty <code>m_widget<\/code>, which destructs the <code>Widget<\/code> and tries to unregister it from the already-destructed <code>std::set<\/code>.<\/p>\n<p>Using function-local statics to create the <code>std::set<\/code> on demand at first use avoids the static initialization order fiasco, but it doesn&#8217;t help the static <i>destruction<\/i> order fiasco. The <code>std::set<\/code> was constructed <i>after<\/i> the <code>Gadget<\/code>, so it destructs <i>first<\/i>.<\/p>\n<p>This particular component used the Windows Implementation Library, so it can use <a href=\"https:\/\/github.com\/microsoft\/wil\/wiki\/Shutdown-aware-objects#processshutdowninprogress\"> <code>wil::<wbr \/>Process\u00adShutdown\u00adIn\u00adProgress()<\/code><\/a> to skip the <code>Widget<\/code>-tracking code during shutdown.<\/p>\n<pre>Widget::~Widget()\r\n{\r\n    <span style=\"border: solid 1px currentcolor;\">if (!wil::ProcessShutdownInProgress()) {<\/span>\r\n        auto guard = s_lock.lock_exclusive();\r\n        s_allWidgets.erase(this);\r\n    }\r\n}\r\n<\/pre>\n<p>If you don&#8217;t use WIL, you can get a similar effect by creating your own shutdown canary.<\/p>\n<pre>bool s_shuttingDown = false;\r\nstd::set&lt;Widget*&gt; s_allWidgets;\r\n\r\n\/\/ Put the canary last so that it destructs first.\r\nstruct shutdown_canary\r\n{\r\n    ~shutdown_canary() { s_shuttingDown = true; }\r\n} s_widgetShutdownCanary;\r\n\r\nWidget::~Widget()\r\n{\r\n    <span style=\"border: solid 1px currentcolor;\">if (!s_shuttingDown)<\/span>\r\n        auto guard = s_lock.lock_exclusive();\r\n        s_allWidgets.erase(this);\r\n    }\r\n}\r\n<\/pre>\n<p><b>Bonus chatter<\/b>: Afterward, I was able to confirm my theory that the memory for the <code>std::set<\/code> had indeed been freed.<\/p>\n<pre>0:000&gt; !heap -p -a 0x000001f5`89b895d0\r\n    address 000001f589b895d0 found in\r\n    _HEAP @ 1f589000000\r\n              HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state\r\n        000001f589b895d0 0048 0000  [00]   000001f589b895d0    00048 - (free)\r\n\r\n0:000&gt; !heap -p -a 0x000001f5`89b846e0\r\n    address 000001f589b846e0 found in\r\n    _HEAP @ 1f589000000\r\n              HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state\r\n        000001f589b846e0 0048 0000  [00]   000001f589b846e0    00048 - (free)\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Another kind of fiasco.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-110777","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Another kind of fiasco.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110777","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110777"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110777\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110777"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110777"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110777"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}