{"id":112144,"date":"2026-03-17T07:00:00","date_gmt":"2026-03-17T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=112144"},"modified":"2026-03-24T20:01:50","modified_gmt":"2026-03-25T03:01:50","slug":"20260317-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20260317-00\/?p=112144","title":{"rendered":"Windows stack limit checking retrospective: x86-32 also known as i386, second try"},"content":{"rendered":"<p><a title=\"Windows stack limit checking retrospective: x86-32, also known as i386\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20260312-00\/?p=112136\"> The last time we looked at the Windows stack limit checker on x86-32 (also known as i386)<\/a>, we noted that the function has changed over the years. Here&#8217;s the revised version.<\/p>\n<pre>_chkstk:\r\n    push    ecx             ; preserve register\r\n\r\n    lea     ecx, [esp][4]   ; ecx = original stack pointer - 4\r\n    sub     ecx, eax        ; ecx = new stack pointer - 4\r\n\r\n    sbb     eax, eax        ; clamp ecx to zero if underflow\r\n    not     eax\r\n    and     ecx, eax\r\n\r\n    mov     eax, esp        ; round current stack pointer\r\n    and     eax, -PAGE_SIZE ; to page boundary\r\n\r\n    ; eax = most recently probed page\r\n    ; ecx = desired final stack pointer\r\n\r\ncheck:\r\n    cmp     ecx, eax        ; done probing?\r\n    jb      probe           ; N: keep probing\r\n\r\n    mov     eax, ecx        ; eax = desired final stack pointer - 4\r\n    pop     ecx             ; restore register\r\n    xchg    esp, eax        ; move stack pointer to final home - 4\r\n                            ; eax gets old stack pointer\r\n    mov     eax, [eax]      ; get return address\r\n    mov     [esp], eax      ; put it on top of the stack\r\n    ret                     ; and \"return\" to it\r\n\r\nprobe:\r\n    sub     eax, PAGE_SIZE  ; move to next page\r\n    test    [eax], eax      ; probe it\r\n    
jmp     check           ; go back to see if we're done\r\n<\/pre>\n<p>Instead of jumping to the caller, the code copies the caller&#8217;s address to the top of the stack and performs a <code>ret<\/code>. This is a significant change because it avoids <a title=\"Optimization is often counter-intuitive\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20041216-00\/?p=36973\"> desynchronizing the return address predictor<\/a>.<\/p>\n<p>The <code>ret<\/code> will increment the stack pointer by four bytes, so the code over-allocates the stack by four bytes to compensate.<\/p>\n<p>This code remains a drop-in replacement for the old <code>chkstk<\/code> function, so there is no need to change the compiler&#8217;s code generator. It also means that you can link together code compiled with the old <code>chkstk<\/code> and the new <code>chkstk<\/code> since the two versions are compatible. It does mean that we still have the wacky calling convention of returning with an adjusted stack pointer, but that&#8217;s now part of the ABI so we have to live with it.<\/p>\n<p>Since we perform a <code>ret<\/code> instruction on a return address that was not placed there by a matching <code>call<\/code> instruction, this code is not compatible with shadow stacks (which Intel calls Control-Flow Enforcement Technology, or CET). The <code>chkstk<\/code> function&#8217;s wacky calling convention makes it incompatible with shadow stacks.<\/p>\n<p>Okay, so much for that sadness. 
Next time, we&#8217;ll look at the Alpha AXP.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Appeasing the invisible return address predictor.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-112144","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Appeasing the invisible return address predictor.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=112144"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112144\/revisions"}],"predecessor-version":[{"id":112161,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112144\/revisions\/112161"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=112144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=112144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=112144"}],"curie
s":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}