{"id":93645,"date":"2016-06-10T07:00:00","date_gmt":"2016-06-10T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=93645"},"modified":"2019-03-13T11:50:52","modified_gmt":"2019-03-13T18:50:52","slug":"20160610-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20160610-00\/?p=93645","title":{"rendered":"Investigating an app compat problem: Part 3: Paydirt"},"content":{"rendered":"<p>Last time, we learned that the proximate cause of failure was that we were trying to set a bit in a bit array, except that the <code>this<\/code> pointer was null. That didn&#8217;t really bring us any closer to the bug. What we need to do is find out why the calling function tried to invoke the method on a null pointer. <\/p>\n<p>The function that generated the null pointer is kind of long, but let&#8217;s see what we can get out of it. <\/p>\n<pre>\ncontoso!ContosoInitialize+0x5620:\n31426280 push    ebp\n31426281 mov     ebp, esp\n31426283 push    0FFFFFFFFh\n31426285 push    offset contoso!ContosoInitialize+0xa57c3 (314c6423)\n3142628a mov     eax, dword ptr fs:[00000000h]\n31426290 push    eax\n31426291 mov     dword ptr fs:[0],esp\n31426298 sub     esp, 24h       \/\/ 24h bytes of local variables\n<\/pre>\n<p>So far, we have the standard prologue for functions that use exception handling when compiled by the Microsoft C++ compiler. The compiler uses <code>[ebp-4]<\/code> to keep track of what objects need to be unwound if an exception is raised, so don&#8217;t be surprised to see apparent write-only operations to <code>[ebp-4]<\/code>. These writes are actually clues to us that a stack object with a nontrivial destructor was just constructed or destructed. 
<\/p>\n<pre>\n3142629b mov     dword ptr [ebp-2Ch], ecx   \/\/ this\n3142629e mov     eax, dword ptr [ebp-2Ch]\n314262a1 mov     ecx, dword ptr [eax+400h]  \/\/ this-&gt;m_tlsIndex\n314262a7 cmp     ecx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)] \/\/ some global variable\n<\/pre>\n<p>We learned from our first day that the value at offset <code>0x400<\/code> is a TLS slot index. We compare it against some global variable. What&#8217;s up with that global variable? <\/p>\n<p>Let&#8217;s search the DLL for all references to that global variable. <\/p>\n<pre>\n0:000&gt; s 31410000 3151e000 9c 69 50 31\n3142592c  9c 69 50 31 89 08 8b 55-fc c7 42 04 00 00 00 00  .iP1...U..B.....\n31425b6d  9c 69 50 31 74 0c 8b 4d-f0 e8 75 01 00 00 85 c0  .iP1t..M..u.....\n31425f9b  9c 69 50 31 74 11 8b 55-fc 8b 84 95 f8 fe ff ff  .iP1t..U........\n31425ffe  9c 69 50 31 89 91 00 04-00 00 8b 45 fc 8b e5 5d  .iP1.......E...]\n314262a9  9c 69 50 31 75 71 8b 15-ac 69 d2 1f 52 8d 4d e8  .iP1uq...i..R.M.\n314262d0  9c 69 50 31 75 3b 6a 00-68 c0 5f c4 1f 8b 55 d4  .iP1u;j.h._...U.\n0:000&gt; u 3142592c-2 3142592c\ncontoso!ContosoInitialize+0x4cca:\n3142592a mov     ecx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)]\n0:000&gt; u 31425b6d-2 31425b6d\ncontoso!ContosoInitialize+0x4f0b:\n31425b6b cmp     eax, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)]\n0:000&gt; u 31425f9b-2 31425f9b\ncontoso!ContosoInitialize+0x5339:\n31425f99 cmp     ecx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)]\n0:000&gt; u 31425ffe-2 31425ffe\ncontoso!ContosoInitialize+0x539c:\n31425ffc mov     edx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)]\n0:000&gt; u 314262a9-2 314262a9\ncontoso!ContosoInitialize+0x5647:\n314262a7 cmp     ecx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)]\n0:000&gt; u 314262d0-2 314262d0\ncontoso!ContosoInitialize+0x566e:\n314262ce cmp     ecx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)]\n<\/pre>\n<p>It appears that 
this is a read-only variable. Therefore, its current value is its permanent value. And we saw last time that the permanent value is zero. <\/p>\n<p>And we already found a bug. This code assumes that zero is not a valid TLS index. Actually, the invalid TLS index goes by the name <code>TLS_OUT_OF_INDEXES<\/code>, which is the value that <code>TlsAlloc<\/code> uses to say &#8220;Sorry, I couldn&#8217;t allocate a TLS index for you.&#8221; If this app ever calls <code>TlsAlloc<\/code> and gets zero back, it will think that it hasn&#8217;t yet assigned a TLS slot. <\/p>\n<p>But that&#8217;s not the bug that we&#8217;re chasing, because we got a TLS index of 65. But at least we can come up with a nice name for the variable. <\/p>\n<pre>\nDWORD invalidTlsIndex = 0;\n<\/pre>\n<p>Back to the function, already in progress. <\/p>\n<pre>\n314262ad jne     contoso!ContosoInitialize+0x56c0 (31426320) \/\/ if valid, then skip\n314262af mov     edx, dword ptr [contoso!ContosoInitialize+0xe5d4c (315069ac)] \/\/ get some other thing\n314262b5 push    edx\n314262b6 lea     ecx, [ebp-18h]\n314262b9 call    contoso!ContosoInitialize+0x5170 (31425dd0) \/\/ construct something, probably\n314262be mov     dword ptr [ebp-4], 0                                           \/\/ exception unwinding tracking\n<\/pre>\n<p>I&#8217;m guessing that we&#8217;re constructing something because it fits this pattern: Put into the <code>ecx<\/code> register the address of some memory never accessed before, then call a function. The object being constructed here doesn&#8217;t participate in managing the TLS index: We assume this because it doesn&#8217;t take the TLS slot as a parameter. Therefore, we will ignore it for now. (Although I do know what it is, and you might be able to guess too, after we disassemble a little more.) 
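<\/p>\n<p>To make that pattern concrete, here is a rough sketch of the kind of source that produces it under the 32-bit Microsoft compiler. The names <code>Thing<\/code> and <code>Example<\/code> are made up for illustration and say nothing about what the app&#8217;s class really is; the comments describe typical code generation, not this particular binary. <\/p>\n<pre>\n\/\/ Hypothetical class with a nontrivial destructor.\nstruct Thing\n{\n    explicit Thing(void* arg);\n    ~Thing();\n};\n\nvoid Example(void* arg)\n{\n    \/\/ Typical x86 codegen: push arg, lea ecx, [address of local],\n    \/\/ call Thing::Thing, then update the unwind state at [ebp-4].\n    Thing local(arg);\n\n    \/\/ ... function body ...\n\n    \/\/ At scope exit: reset the unwind state at [ebp-4],\n    \/\/ lea ecx, [address of local], call Thing::~Thing.\n}\n<\/pre>\n<p>The stack address loaded into <code>ecx<\/code> is the <code>this<\/code> pointer for the constructor, which is why memory that has never been touched before is such a strong hint. 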
<\/p>\n<pre>\n314262c5 mov     eax, dword ptr [ebp-2Ch]           \/\/ this\n314262c8 mov     ecx, dword ptr [eax+400h]          \/\/ this-&gt;m_tlsIndex\n314262ce cmp     ecx, dword ptr [contoso!ContosoInitialize+0xe5d3c (3150699c)] \/\/ invalidTlsIndex\n314262d4 jne     contoso!ContosoInitialize+0x56b1 (31426311) \/\/ jump if not equal\n<\/pre>\n<p>This next chunk of code is executed if the TLS index is zero; presumably it allocates a TLS slot. Let&#8217;s see. <\/p>\n<pre>\n314262d6 push    0                                  \/\/ mystery parameter\n314262d8 push    offset contoso!ContosoInitialize+0x5360 (31425fc0) \/\/ function callback\n314262dd mov     edx, dword ptr [ebp-2Ch]           \/\/ this\n314262e0 add     edx, 400h                          \/\/ &amp;this-&gt;m_tlsIndex\n314262e6 push    edx\n314262e7 call    contoso!ContosoInitialize+0x6680 (314272e0) \/\/ looks promising\n314262ec add     esp, 0Ch\n314262ef test    eax, eax\n314262f1 je      contoso!ContosoInitialize+0x56b1 (31426311) \/\/ jump if zero\n314262f3 mov     dword ptr [ebp-1Ch], 0             \/\/ local1c = 0\n314262fa mov     dword ptr [ebp-4], 0FFFFFFFFh\n31426301 lea     ecx, [ebp-18h]                     \/\/ destruct that thing on the stack\n31426304 call    contoso!ContosoInitialize+0x5200 (31425e60)\n31426309 mov     eax, dword ptr [ebp-1Ch]           \/\/ return local1c\n3142630c jmp     contoso!ContosoInitialize+0x5783 (314263e3)\n31426311 mov     dword ptr [ebp-4], 0FFFFFFFFh\n31426318 lea     ecx, [ebp-18h]                     \/\/ destruct that thing on the stack\n3142631b call    contoso!ContosoInitialize+0x5200 (31425e60)\n...\n314263e3 mov     ecx, dword ptr [ebp-0Ch]\n314263e6 mov     dword ptr fs:[0], ecx\n314263ed mov     esp, ebp\n314263ef pop     ebp\n314263f0 ret\n<\/pre>\n<p> So far, we have reverse-compiled the code to look like this: <\/p>\n<pre>\nSomeBitArrayClass1* Class2::f_31426280()\n{\n    if (this-&gt;m_tlsIndex == invalidTlsIndex)\n    {\n        Class3 
object3(...);\n        if (this-&gt;m_tlsIndex == invalidTlsIndex)\n        {\n            if (f_314272e0(&amp;this-&gt;m_tlsIndex, f_31425fc0, 0) != 0)\n            {\n                return nullptr;\n            }\n        }\n    }\n    ... more code ...\n<\/pre>\n<p>The <code>Class3<\/code> object is probably some sort of synchronization object, since what we have here looks very much like a double-checked locking pattern. <\/p>\n<p>Anyway, that function at <code>314272e0<\/code> probably allocates the TLS slot, seeing as we pass the address of where we want to put the TLS index. <\/p>\n<pre>\ncontoso!ContosoInitialize+0x6680:\n314272e0 push    ebp\n314272e1 mov     ebp, esp\n314272e3 push    ecx\n314272e4 call    dword ptr [contoso!ContosoInitialize+0xa85bc (314c921c)] \/\/ TlsAlloc\n314272ea mov     ecx, dword ptr [ebp+8]     \/\/ arg1\n314272ed mov     dword ptr [ecx], eax       \/\/ save it\n314272ef mov     edx, dword ptr [ebp+8]\n314272f2 cmp     dword ptr [edx], 0FFFFFFFFh \/\/ was it invalid?\n314272f5 je      contoso!ContosoInitialize+0x66b3 (31427313) \/\/ Y: bail\n314272f7 mov     eax, dword ptr [ebp+10h]\n314272fa push    eax\n314272fb mov     ecx, dword ptr [ebp+0Ch]\n314272fe push    ecx\n314272ff mov     edx, dword ptr [ebp+8]\n31427302 mov     eax, dword ptr [edx]\n31427304 push    eax\n31427305 call    contoso!ContosoInitialize+0x53b0 (31426010) \/\/ succeeded, keep going\n3142730a mov     ecx, eax\n3142730c call    contoso!ContosoInitialize+0x5490 (314260f0)\n31427311 jmp     contoso!ContosoInitialize+0x6710 (31427370)\n31427370 mov     esp, ebp\n31427372 pop     ebp\n<\/pre>\n<p>This function allocates the TLS slot, and if successful, it does something ambiguous. 
The code sequence at <code>314272f7<\/code> could be any of <\/p>\n<pre>\n    f_31426010(*tlsIndex, callbackFunction, arg3)-&gt;f_314260f0()\n    f_31426010(*tlsIndex, callbackFunction)-&gt;f_314260f0(arg3)\n    f_31426010(*tlsIndex)-&gt;f_314260f0(callbackFunction, arg3)\n    f_31426010()-&gt;f_314260f0(*tlsIndex, callbackFunction, arg3)\n<\/pre>\n<p>If the <code>f_31426010<\/code> and <code>f_314260f0<\/code> functions were <code>__cdecl<\/code>, then there would be <code>add esp, N<\/code> instructions after each call, and that would tell us how many parameters each function consumes. But there aren&#8217;t any, which means that the callee cleans the stack, as with <code>__stdcall<\/code>. <\/p>\n<p>To find out which of the above four cases is the one we have, we need to look at the function epilogue for <code>f_31426010<\/code>. That will tell us how many bytes of parameters it consumes, and that will tell us which of the parameters belong to <code>f_31426010<\/code> and which belong to <code>f_314260f0<\/code>. <\/p>\n<pre>\ncontoso!ContosoInitialize+0x53b0:\n31426010 push    ebp\n31426011 mov     ebp,esp\n...\n314260ea mov     esp,ebp\n314260ec pop     ebp\n314260ed ret\n<\/pre>\n<p>Okay, the function ends with a plain <code>ret<\/code>, which combined with the lack of <code>add esp, N<\/code> in the calling code means that it consumes zero parameters from the stack. <\/p>\n<p>Therefore, we are in this case: <\/p>\n<pre>\n    f_31426010()-&gt;f_314260f0(*tlsIndex, callbackFunction, arg3)\n<\/pre>\n<p>The parameters (including the TLS slot index that we are tracking very closely) all go to the <code>f_314260f0<\/code> function. 
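<\/p>\n<p>As a reminder of what these conventions look like from the source side, here is a small sketch for the 32-bit Microsoft compiler. The function and class names are invented for illustration, and the comments describe the usual code generation rather than anything recovered from the app. <\/p>\n<pre>\n\/\/ Caller cleans the stack: the function ends with a plain \"ret\",\n\/\/ and the call site follows the call with \"add esp, 0Ch\".\nint __cdecl SumCdecl(int a, int b, int c) { return a + b + c; }\n\n\/\/ Callee cleans the stack: the function ends with \"ret 0Ch\"\n\/\/ (three 4-byte parameters), and the call site does nothing extra.\nint __stdcall SumStdcall(int a, int b, int c) { return a + b + c; }\n\n\/\/ Member functions default to __thiscall: \"this\" travels in ecx,\n\/\/ and the callee cleans the stack just as with __stdcall.\nstruct Widget\n{\n    int Sum(int a, int b, int c) { return a + b + c; }\n};\n<\/pre>\n<p>In the callee-cleans cases, the number after <code>ret<\/code> is the callee&#8217;s parameter byte count, which is the clue we just used. 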
<\/p>\n<pre>\ncontoso!ContosoInitialize+0x5490:\n314260f0 push    ebp\n314260f1 mov     ebp, esp\n314260f3 push    0FFFFFFFFh\n314260f5 push    offset contoso!ContosoInitialize+0xa5778 (314c63d8)\n314260fa mov     eax, dword ptr fs:[00000000h]\n31426100 push    eax\n31426101 mov     dword ptr fs:[0], esp\n31426108 sub     esp, 28h\n3142610b mov     dword ptr [ebp-34h], ecx       \/\/ save \"this\"\n3142610e mov     eax, dword ptr [contoso!ContosoInitialize+0xe5d4c (315069ac)]\n31426113 push    eax\n31426114 lea     ecx, [ebp-18h]\n31426117 call    contoso!ContosoInitialize+0x5170 (31425dd0) \/\/ construct Class3\n3142611c mov     dword ptr [ebp-4], 0\n31426123 mov     ecx, dword ptr [ebp+8]         \/\/ tlsIndex\n31426126 mov     dword ptr [ebp-10h], ecx       \/\/ local10 = tlsIndex\n31426129 cmp     dword ptr [ebp-10h], 40h       \/\/ tlsIndex compared with 64\n3142612d jae     contoso!ContosoInitialize+0x551f (3142617f) \/\/ Jump if tlsIndex &gt;= 64\n\n3142617f mov     dword ptr [ebp-30h], 0FFFFFFFFh \/\/ local30 = -1\n31426186 mov     dword ptr [ebp-4], 0FFFFFFFFh\n3142618d lea     ecx, [ebp-18h]\n31426190 call    contoso!ContosoInitialize+0x5200 (31425e60) \/\/ destruct Class3\n31426195 mov     eax, dword ptr [ebp-30h]       \/\/ return local30\n31426198 mov     ecx, dword ptr [ebp-0Ch]\n3142619b mov     dword ptr fs:[0], ecx\n314261a2 mov     esp, ebp\n314261a4 pop     ebp\n314261a5 ret     0Ch\n<\/pre>\n<p>Aha! We see that the code checks the numeric value of the TLS index. The only meaningful value to compare the index against is <code>TLS_OUT_OF_INDEXES<\/code>. Once you verify that you have a valid TLS index, the actual numeric value is opaque. Changing behavior based on the numeric value of the slot index is highly suspect. 
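<\/p>\n<p>For contrast, here is a minimal sketch of the documented pattern, written from scratch rather than recovered from the app. The names <code>g_tlsIndex<\/code>, <code>InitTls<\/code>, and <code>SetPerThreadValue<\/code> are hypothetical; only <code>TlsAlloc<\/code>, <code>TlsSetValue<\/code>, and <code>TLS_OUT_OF_INDEXES<\/code> come from the Windows API. <\/p>\n<pre>\n#include &lt;windows.h&gt;\n\n\/\/ Hypothetical global. TLS_OUT_OF_INDEXES is the only sentinel TlsAlloc defines.\nDWORD g_tlsIndex = TLS_OUT_OF_INDEXES;\n\nbool InitTls()\n{\n    g_tlsIndex = TlsAlloc();\n    \/\/ 0, 64, and 65 are all acceptable results. Only this one value means failure.\n    return g_tlsIndex != TLS_OUT_OF_INDEXES;\n}\n\nbool SetPerThreadValue(void* value)\n{\n    \/\/ The index goes back to the system untouched, never range-checked.\n    return g_tlsIndex != TLS_OUT_OF_INDEXES &amp;&amp; TlsSetValue(g_tlsIndex, value) != FALSE;\n}\n<\/pre>\n<p>Nothing in this sketch cares whether the index is 0 or 65. 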
<\/p>\n<p>The partially-reverse-compiled function looks like this: <\/p>\n<pre>\nint f_314260f0(DWORD tlsIndex, callback, x)\n{\n    Class3 object3(...);\n    if (tlsIndex &lt; 64)\n    {\n        ...\n    }\n    else\n    {\n        return -1;\n    }\n}\n<\/pre>\n<p>Holy cow, this function simply rejects TLS slot indices that are greater than or equal to 64! The <code>-1<\/code> is passed along unchanged by <code>f_314272e0<\/code>, and that nonzero result causes <code>f_31426280<\/code> to return <code>nullptr<\/code>, which leads to our null pointer crash. <\/p>\n<p>Now we understand why the program is crashing. It mishandles TLS slot indices that are 64 or higher. It tries to reject them, but notice that when function <code>f_314260f0<\/code> returns <code>-1<\/code>, neither calling function frees the TLS slot or resets <code>this-&gt;m_tlsIndex<\/code> back to <code>invalidTlsIndex<\/code>. Instead, the TLS slot index stays allocated, and the next time the code wants to use the object, it sees that the TLS slot index is valid and tries to use the value stored in that slot. <\/p>\n<p>Except we never stored anything there. <\/p>\n<p>What&#8217;s so special about the number 64? We&#8217;ll dig into that next time. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Finding the answer.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-93645","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Finding the answer.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/93645","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=93645"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/93645\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=93645"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=93645"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=93645"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}