{"id":109242,"date":"2024-01-05T07:00:00","date_gmt":"2024-01-05T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=109242"},"modified":"2024-01-12T06:38:20","modified_gmt":"2024-01-12T14:38:20","slug":"20240105-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20240105-00\/?p=109242","title":{"rendered":"The case of the vector with an impossibly large size"},"content":{"rendered":"<p>A customer had a program that crashed with this stack:<\/p>\n<pre>contoso!Widget::GetCost\r\ncontoso!StandardWidgets::get_TotalCost+0x12f\r\nrpcrt4!Invoke+0x73\r\nrpcrt4!Ndr64StubWorker+0xb9b\r\nrpcrt4!NdrStubCall3+0xd7\r\ncombase!CStdStubBuffer_Invoke+0xdb\r\ncombase!ObjectMethodExceptionHandlingAction&lt;&lt;lambda_...&gt; &gt;+0x47\r\ncombase!DefaultStubInvoke+0x376\r\ncombase!ServerCall::ContextInvoke+0x6f3\r\ncombase!ComInvokeWithLockAndIPID+0xacb\r\ncombase!ThreadInvoke+0x103\r\nrpcrt4!DispatchToStubInCNoAvrf+0x18\r\nrpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x1a9\r\nrpcrt4!RPC_INTERFACE::DispatchToStubWithObject+0x1a7\r\nrpcrt4!LRPC_SCALL::DispatchRequest+0x308\r\nrpcrt4!LRPC_SCALL::HandleRequest+0xdcb\r\nrpcrt4!LRPC_SASSOCIATION::HandleRequest+0x2c3\r\nrpcrt4!LRPC_ADDRESS::HandleRequest+0x183\r\nrpcrt4!LRPC_ADDRESS::ProcessIO+0x939\r\nrpcrt4!LrpcIoComplete+0xff\r\nntdll!TppAlpcpExecuteCallback+0x14d\r\nntdll!TppWorkerThread+0x4b4\r\nkernel32!BaseThreadInitThunk+0x18\r\nntdll!RtlUserThreadStart+0x21\r\n<\/pre>\n<p>They wondered if some recent change to Windows was the source of the problem, since it didn&#8217;t happen as much in earlier versions of Windows.<\/p>\n<p>The stack trace pointed to <code>Widget::<wbr \/>IsEnabled<\/code>, which was crashing on the first instruction because it was given an invalid <code>this<\/code> pointer.<\/p>\n<pre>00007fff`73a8a59f mov edx,dword ptr [rcx+40h]\r\n                                    ds:00000000`00000040=????????\r\n<\/pre>\n<p>The <code>Widget<\/code> pointer came from a <code>std::vector<\/code> that is a member of the <code>Standard\u00adWidgets<\/code> class.<\/p>\n<pre>using namespace Microsoft::WRL;\r\n\r\nclass StandardWidgets : RuntimeClass&lt;IStandardWidgets, FtmBase&gt;\r\n{\r\n    IFACEMETHODIMP get_TotalCost(INT32* result);\r\n    \u27e6 other methods not relevant here \u27e7\r\n\r\nprivate:\r\n    HRESULT LazyInitializeWidgetList();\r\n\r\n    static constexpr PCWSTR standardWidgetNames[] = {\r\n        L\"Bob\", L\"Carol\", L\"Ted\", L\"Alice\" };\r\n    static constexpr int standardWidgetCount =\r\n        ARRAYSIZE(standardWidgetNames);\r\n\r\n    std::vector&lt;ComPtr&lt;Widget&gt;&gt; m_widgets;\r\n};\r\n<\/pre>\n<p>The code crashed at this call to <code>Widget::<wbr \/>GetCost<\/code>:<\/p>\n<pre>IFACEMETHODIMP StandardWidgets::get_TotalCost(INT32* result)\r\n{\r\n    *result = 0;\r\n\r\n    RETURN_IF_FAILED(LazyInitializeWidgetList());\r\n\r\n    INT32 totalCost = 0;\r\n    for (int i = 0; i &lt; standardWidgetCount; i++) {\r\n        totalCost += <span style=\"border: solid 1px currentcolor;\">m_widgets[i]-&gt;GetCost()<\/span>; \/\/ here\r\n    }\r\n    *result = totalCost;\r\n    return S_OK;\r\n}\r\n<\/pre>\n<p>The customer&#8217;s debugging showed that at the point of the crash, not only was the <code>widget<\/code> garbage, but the <code>m_widgets<\/code> vector had an impossibly large number of elements. The <code>m_widgets<\/code> is expected to have only four widgets, but it somehow found itself with ten, and sometimes as many as a hundred widgets. Of course, they were nearly all corrupted.<\/p>\n<p>Here&#8217;s the code that lazy-initializes the widget list:<\/p>\n<pre>HRESULT StandardWidgets::LazyInitializeWidgetList()\r\n{\r\n    \/\/ Early-out if already initialized.\r\n    if (!m_widgets.empty()) {\r\n        return S_OK;\r\n    }\r\n\r\n    \/\/ Lazy-create the vector of standard widgets\r\n    try {\r\n        m_widgets.reserve(standardWidgetCount);\r\n        for (auto name : standardWidgetNames) {\r\n            ComPtr&lt;Widget&gt; widget;\r\n            RETURN_IF_FAILED(\r\n                MakeAndInitialize&lt;Widget&gt;(&amp;widget, name));\r\n            m_widgets.push_back(widget);\r\n        }\r\n    } catch (std::bad_alloc const&amp;) {\r\n        return E_OUTOFMEMORY;\r\n    }\r\n    return S_OK;\r\n}\r\n<\/pre>\n<p>The customer noted that the <code>reserve<\/code> method is always called with the value 4, and the code never pushes more than four items into the vector. They admitted that if there is a problem creating all four of the standard widgets, the vector could end up with fewer than four widgets, but it should never have <i>more<\/i> than four.<\/p>\n<p>You already have multiple clues that point toward what the customer&#8217;s problem is. I&#8217;ll give you some time to think about it.<\/p>\n<p>In the meantime, let&#8217;s look at other issues with how the code lazy-initializes the widget list.<\/p>\n<p>As the customer noted, if there is a problem creating any of the four standard widgets, the failure is propagated to the caller of <code>Lazy\u00adInitialize\u00adWidget\u00adList<\/code>, and <code>get_TotalCost<\/code> in turn propagates the error to its own caller, and it never gets to the point where it walks through the vector adding up all the costs.<\/p>\n<p>If there is a memory allocation failure at the <code>reserve()<\/code>, or if there is a problem with the first standard widget, then the vector remains empty, and a second call to <code>Lazy\u00adInitialize\u00adWidget\u00adList<\/code> will make a new attempt at initialization.<\/p>\n<p>If there is a problem with the second or subsequent standard widget, however, things get weird. The <code>Lazy\u00adInitialize\u00adWidget\u00adList<\/code> function returns a failure, which causes <code>get_TotalCost<\/code> to return failure. But the second time someone calls <code>get_TotalCost<\/code>, <code>Lazy\u00adInitialize\u00adWidget\u00adList<\/code> will see a nonempty vector and assume that everything was initialized. This time, the <code>get_TotalCost<\/code> method will proceed with the summation and perform an out-of-bounds array access when it gets to the widget that failed to be created.<\/p>\n<p>Oops.<\/p>\n<p>This particular problem boils down to leaving a partially-initialized <code>m_widgets<\/code> if the lazy initialization fails. To avoid this problem, we should create the vector in a local variable and transfer it to the member variable only after we are sure all of the widgets were created successfully.<\/p>\n<pre>HRESULT StandardWidgets::LazyInitializeWidgetList()\r\n{\r\n    \/\/ Early-out if already initialized.\r\n    if (!m_widgets.empty()) {\r\n        return S_OK;\r\n    }\r\n\r\n    \/\/ Lazy-create the vector of standard widgets\r\n    try {\r\n        <span style=\"border: solid 1px currentcolor;\">std::vector&lt;ComPtr&lt;Widget&gt;&gt; widgets;<\/span>\r\n        <span style=\"border: solid 1px currentcolor;\">widgets<\/span>.reserve(standardWidgetCount);\r\n        for (auto name : standardWidgetNames) {\r\n            ComPtr&lt;Widget&gt; widget;\r\n            RETURN_IF_FAILED(\r\n                MakeAndInitialize&lt;Widget&gt;(&amp;widget, name));\r\n            <span style=\"border: solid 1px currentcolor;\">widgets<\/span>.push_back(widget);\r\n        }\r\n        <span style=\"border: solid 1px currentcolor;\">m_widgets.swap(widgets);<\/span>\r\n    } catch (std::bad_alloc const&amp;) {\r\n        return E_OUTOFMEMORY;\r\n    }\r\n    return S_OK;\r\n}\r\n<\/pre>\n<p>This ensures that the <code>m_widgets<\/code> is either totally empty or totally initialized. It is never in a half-initialized state.<\/p>\n<p>While we&#8217;re at it, we probably should convert the loop in <code>get_TotalCost<\/code> into a ranged <code>for<\/code> loop. Right now, <code>get_<wbr \/>Total\u00adCost<\/code> has a hidden dependency on <code>Lazy\u00adInitialize\u00adWidgets<\/code>: It assumes that <code>Lazy\u00adInitialize\u00adWidgets<\/code> always creates exactly the number of widgets as there are <code>standard\u00adWidget\u00adNames<\/code>. Maybe in the future, you might want to suppress some of the standard widgets based on some configuration setting. If you add that configuration setting and forget to update <code>get_<wbr \/>Total\u00adCost<\/code> to account for suppressed widgets, you will have an out-of-bounds index. All the logic to decide which widgets are standard should be local to <code>Lazy\u00adInitialize\u00adWidgets<\/code>.<\/p>\n<pre>IFACEMETHODIMP StandardWidgets::get_TotalCost(INT32* result)\r\n{\r\n    *result = 0;\r\n\r\n    RETURN_IF_FAILED(LazyInitializeWidgetList());\r\n\r\n    INT32 totalCost = 0;\r\n    <span style=\"border: solid 1px currentcolor; border-bottom: none;\">for (auto&amp;&amp; widget : m_widgets) {  <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    totalCost += widget-&gt;GetCost();<\/span>\r\n    <span style=\"border: solid 1px currentcolor; border-top: none;\">}                                  <\/span>\r\n    *result = totalCost;\r\n    return S_OK;\r\n}\r\n<\/pre>\n<p>Or if you want to get fancy,<\/p>\n<pre>IFACEMETHODIMP StandardWidgets::get_TotalCost(INT32* result)\r\n{\r\n    *result = 0;\r\n\r\n    RETURN_IF_FAILED(LazyInitializeWidgetList());\r\n\r\n    <span style=\"border: solid 1px currentcolor; border-bottom: none;\">*result = std::transform_reduce(                         <\/span>\r\n    <span style=\"border: 1px currentcolor; border-style: none solid;\">    m_widgets.begin(), m_widgets.end(), 0, std::plus&lt;&gt;(),<\/span>\r\n    <span style=\"border: solid 1px currentcolor; border-top: none;\">    [](auto&amp;&amp; w) { return w-&gt;GetCost(); });              <\/span>\r\n    return S_OK;\r\n}\r\n<\/pre>\n<p>Okay, but back to the crash. I think I&#8217;ve added enough filler to give you time to consider what is happening.<\/p>\n<p>When I looked at this crash, I noticed that the class is implemented with <a href=\"https:\/\/learn.microsoft.com\/en-us\/cpp\/cppcx\/wrl\/runtimeclass-class?view=msvc-170\"> the <code>Microsoft::<wbr \/>WRL::<wbr \/>RuntimeClass<\/code> template class<\/a>, and the implementation explicitly listed <code>FtmBase<\/code> as a template parameter, marking this class as free-threaded (also known as &#8220;agile&#8221;), which means that it can be used from multiple threads simultaneously.\u00b9<\/p>\n<p>The object is eligible for multithreaded use, but there are no mutexes to protect two threads from modifying <code>m_widgets<\/code> at the same time. I suspected a race condition.<\/p>\n<p>You can also observe that the class is being used in a free-threaded manner because the stack trace that leads to the crash says that it&#8217;s running on a thread pool thread (<code>TppWorkerThread<\/code>), and thread pool threads default to the multi-threaded apartment.\u00b2 The only code on the stack between <code>TppWorkerThread<\/code> and the application code is all COM and RPC, so no application code snuck in and initialized the thread into single-threaded apartment mode.<\/p>\n<p>And when I looked at the crash dump, I caught the code red-handed: There was another thread also calling into this code.<\/p>\n<pre>\/\/ Crashing thread\r\n0:006&gt; .frame 1\r\n01 contoso!StandardWidgets::get_TotalCost+0x12f\r\n0:006&gt; dv\r\n    this = <span style=\"border: solid 1px currentcolor;\">0x000001b7`492833b0<\/span>\r\n    ...\r\n\r\n\/\/ Another thread running at the time of the crash\r\n0:004&gt; kn\r\n # Call Site\r\n00 ntdll!ZwDelayExecution+0x14\r\n01 ntdll!RtlDelayExecution+0x4c\r\n02 KERNELBASE!SleepEx+0x84\r\n04 kernel32!WerpReportFault+0xa4\r\n05 KERNELBASE!UnhandledExceptionFilter+0xd3a02\r\n06 ntdll!TppExceptionFilter+0x7a\r\n07 ntdll!TppWorkerpInnerExceptionFilter+0x1a\r\n08 ntdll!TppWorkerThread$filt$3+0x19\r\n09 ntdll!__C_specific_handler+0x96\r\n0a ntdll!__GSHandlerCheck_SEH+0x6a\r\n0b ntdll!RtlpExecuteHandlerForException+0xf\r\n0c ntdll!RtlDispatchException+0x2d4\r\n0d ntdll!KiUserExceptionDispatch+0x2e\r\n0e contoso!StandardWidgets::get_TotalCost+0x12f\r\n0f rpcrt4!Invoke+0x73\r\n10 rpcrt4!Ndr64StubWorker+0xb9b\r\n11 rpcrt4!NdrStubCall3+0xd7\r\n12 combase!CStdStubBuffer_Invoke+0xdb\r\n14 combase!ObjectMethodExceptionHandlingAction&lt;&lt;lambda_...&gt; &gt;+0x47\r\n16 combase!DefaultStubInvoke+0x376\r\n1a combase!ServerCall::ContextInvoke+0x6f3\r\n1f combase!ComInvokeWithLockAndIPID+0xacb\r\n21 combase!ThreadInvoke+0x103\r\n22 rpcrt4!DispatchToStubInCNoAvrf+0x18\r\n23 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x1a9\r\n25 rpcrt4!RPC_INTERFACE::DispatchToStubWithObject+0x1a7\r\n27 rpcrt4!LRPC_SCALL::DispatchRequest+0x308\r\n29 rpcrt4!LRPC_SCALL::HandleRequest+0xdcb\r\n2a rpcrt4!LRPC_SASSOCIATION::HandleRequest+0x2c3\r\n2b rpcrt4!LRPC_ADDRESS::HandleRequest+0x183\r\n2c rpcrt4!LRPC_ADDRESS::ProcessIO+0x939\r\n2d rpcrt4!LrpcIoComplete+0xff\r\n2e ntdll!TppAlpcpExecuteCallback+0x14d\r\n2f ntdll!TppWorkerThread+0x4b4\r\n30 kernel32!BaseThreadInitThunk+0x18\r\n31 ntdll!RtlUserThreadStart+0x21\r\n0:004&gt; .frame 0xe\r\n0e contoso!StandardWidgets::get_TotalCost+0x12f\r\n0:004&gt; dv\r\n    this = <span style=\"border: solid 1px currentcolor;\">0x000001b7`492833b0<\/span>\r\n    ...\r\n<\/pre>\n<p>Notice that the <code>this<\/code> pointer is the same for both threads, so we have proof that this object is being used from multiple threads simultaneously.<\/p>\n<p>The fix for the multithreading issue is to ensure that only one thread tries to initialize the widget vector at a time. We can do this by adding a mutex, but I&#8217;m going to go even further and use the <code>std::once_flag<\/code>, whose purpose in life is to be used in conjunction with <code>std::<wbr \/>call_once<\/code> to perform thread-safe one-time initialization, which is exactly what we want.<\/p>\n<pre>class StandardWidgets : RuntimeClass&lt;IStandardWidgets, FtmBase&gt;\r\n{\r\n    IFACEMETHODIMP get_TotalCost(INT32* result);\r\n    \u27e6 other methods not relevant here \u27e7\r\n\r\nprivate:\r\n    HRESULT LazyInitializeWidgetList();\r\n\r\n    static constexpr PCWSTR standardWidgetNames[] = {\r\n        L\"Bob\", L\"Carol\", L\"Ted\", L\"Alice\" };\r\n    static constexpr int standardWidgetCount =\r\n        ARRAYSIZE(standardWidgetNames);\r\n\r\n    <span style=\"border: solid 1px currentcolor;\">std::once_flag m_initializeFlag;<\/span>\r\n    std::vector&lt;ComPtr&lt;Widget&gt;&gt; m_widgets;\r\n};\r\n\r\nHRESULT StandardWidgets::LazyInitializeWidgetList()\r\n{\r\n    try {\r\n        <span style=\"border: solid 1px currentcolor;\">std::call_once(m_initializeFlag, [&amp;] {<\/span>\r\n            std::vector&lt;ComPtr&lt;Widget&gt;&gt; widgets;\r\n            widgets.reserve(standardWidgetCount);\r\n            for (auto name : standardWidgetNames) {\r\n                ComPtr&lt;Widget&gt; widget;\r\n                <span style=\"border: solid 1px currentcolor;\">THROW<\/span>_IF_FAILED(\r\n                    MakeAndInitialize&lt;Widget&gt;(&amp;widget, name));\r\n                widgets.push_back(widget);\r\n            }\r\n            m_widgets = std::move(widgets);\r\n        <span style=\"border: solid 1px currentcolor;\">});<\/span>\r\n    } catch (std::bad_alloc const&amp;) {\r\n        return E_OUTOFMEMORY;\r\n    }\r\n    return S_OK;\r\n}\r\n<\/pre>\n<p><b>Update<\/b>: We had to change the <code>RETURN_<wbr \/>IF_<wbr \/>FAILED<\/code> to <code>THROW_<wbr \/>IF_<wbr \/>FAILED<\/code> because (1)\u00a0without the change, the compiler will complain that not all code paths return a value, because the lambda also falls off the end, and more importantly, (2)\u00a0<code>call_once<\/code> doesn&#8217;t care about the lambda return value; it uses exceptions to detect errors. <b>End Update<\/b>.<\/p>\n<p>This version also deals with the edge case where there are no standard widgets at all. The original code would continuously try to reinitialize the vector since it couldn&#8217;t tell whether an empty vector means &#8220;Not yet initialized&#8221; or &#8220;Successfully initialized (and it&#8217;s empty)&#8221;.<\/p>\n<p><b>Bonus chatter<\/b>: How did this multithreaded race condition lead to a vector with a ridiculous size? Well, we saw some time ago that <a title=\"Inside STL: The vector\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230802-00\/?p=108524\"> the internal structure of a <code>std::vector<\/code><\/a> is three pointers, one for the start of the vector data, one for the end of the valid data, and one for the end of the allocated data. If two threads call <code>reserve()<\/code> simultaneously, both will allocate new data, and then they were race to update the three pointers. You might end up with a &#8220;start&#8221; pointer that points to the data allocated by the first thread, but an &#8220;end&#8221; pointer that points to the data allocated by the second thread, resulting in a vector of unusual size.<\/p>\n<p>\u00b9 It was actually convenient that the implementation lists <code>FtmBase<\/code> explicitly, since the default behavior varies depending on whether <code>__WRL_<wbr \/>CONFIGURATION_<wbr \/>LEGACY__<\/code> is set.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Template parameter<\/th>\n<th>Standard mode<\/th>\n<th>Legacy mode<\/th>\n<\/tr>\n<tr>\n<td>Nothing specified<\/td>\n<td align=\"center\">Free-threaded<\/td>\n<td align=\"center\">Not free-threaded<\/td>\n<\/tr>\n<tr>\n<td><code>FtmBase<\/code><\/td>\n<td colspan=\"2\" align=\"center\">Free-threaded<\/td>\n<\/tr>\n<tr>\n<td><code>InhibitFtmBase<\/code><\/td>\n<td colspan=\"2\" align=\"center\">Not free-threaded<\/td>\n<\/tr>\n<tr>\n<td><code>InhibitFtmBase<\/code> + <code>FtmBase<\/code><\/td>\n<td colspan=\"2\" align=\"center\">Not free-threaded<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In pseudocode:<\/p>\n<pre>bool isFreeThreaded = !InhibitFtmBase &amp;&amp; (FtmBase || Standard mode);\r\n<\/pre>\n<p>The explicitly inclusion of <code>FtmBase<\/code> saved me the trouble of looking up what mode the customer&#8217;s project is using.<\/p>\n<p>\u00b2 Assuming the multi-threaded apartment exists at all.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Play threading games, win threading prizes.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-109242","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Play threading games, win threading prizes.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/109242","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=109242"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/109242\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=109242"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=109242"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=109242"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}