{"id":112146,"date":"2026-03-18T07:00:00","date_gmt":"2026-03-18T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=112146"},"modified":"2026-03-18T09:11:59","modified_gmt":"2026-03-18T16:11:59","slug":"20260318-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20260318-00\/?p=112146","title":{"rendered":"Windows stack limit checking retrospective: Alpha AXP"},"content":{"rendered":"<p>We continue our historical survey of Windows stack-checking functions by looking at the Alpha AXP.<\/p>\n<pre>; on entry, t12 is the number of bytes to allocate\r\n; on exit, stack has been validated (but not adjusted)\r\n; modifies t8, t9, t10\r\n\r\n_chkstk:\r\n    subq    sp, t12, t8     ; t8 = new stack pointer\r\n    mov     v0, t10         ; save v0 in t10 (call_pal will overwrite it)\r\n    bgt     sp, usermode    ; branch if running in user mode\r\n\r\n    call_pal rdksp          ; PAL call to get start of kernel stack in v0\r\n    lda     t9, -KERNEL_STACK_SIZE(v0) ; t9 = end of stack\r\n    br      zero, havelimit\r\n\r\nusermode:\r\n    call_pal rdteb          ; PAL call to get TEB in v0\r\n    ldl     t9, StackLimit(v0) ; t9 = end of stack\r\n\r\nhavelimit:\r\n    mov     t10, v0         ; recover original v0 for caller\r\n\r\n    cmpult  t8, t9, t10     ; is stack growth needed?\r\n    beq     t10, done       ; N: then nothing to do\r\n\r\n    ldil    t10, -PAGE_SIZE ; t10 = mask for rounding down to a page\r\n    and     t8, t10, t8     ; round down to nearest page\r\n\r\nprobe:\r\n    lda     t9, -PAGE_SIZE(t9)  ; prepare to touch a page\r\n    stq     zero, 0(t9)         ; touch it\r\n    cmpeq   t8, t9, t10         ; finished?\r\n    beq     t10, probe          ; N: keep going\r\n\r\ndone:\r\n    ret     zero, (ra)          ; return to caller\r\n<\/pre>\n<p>We see a lot of similarities to MIPS and PowerPC: The code short-circuits the case where the stack does not need to expand, and it relies on the architectural split between user mode 
and kernel mode at the halfway point in the address space. As with MIPS (but not PowerPC or 80386), the probe loop writes to the memory to fault it in.\u00b9<\/p>\n<p>A new wrinkle here is that this code uses 64-bit calculations when adjusting the stack pointer. The Alpha AXP is a 64-bit processor. Although it doesn&#8217;t have a &#8220;32-bit mode&#8221;, you can still pretend that it&#8217;s a 32-bit processor by simply choosing not to use any of the 64-bit features.<\/p>\n<p>This code appears to have been written early in the history of the Alpha AXP project, and it contains some seemingly unnecessary register preservation. For example, it goes out of its way to preserve <code>v0<\/code>, even though <code>v0<\/code> is a volatile register that does not contain anything interesting on entry to the function. My theory is that it does this because it wants to maintain compatibility with a non-Microsoft compiler that might use <code>v0<\/code> as part of its calling convention, and this allowed the Windows NT team to start porting their operating system without having to wait for the Microsoft Languages team to come up with an Alpha AXP version of the Microsoft Visual C compiler.<\/p>\n<p>Here is a typical usage of this function to build a large stack frame in a function prologue:<\/p>\n<pre>    mov     ra, t11             ; save return address\r\n    ldil    t12, 17320          ; large stack frame\r\n    bsr     ra, _chkstk         ; fault pages in if necessary\r\n    subq    sp, t12, sp         ; allocate the stack frame\r\n    mov     t11, ra             ; restore return address for standard entry\r\n<\/pre>\n<p>The prologue relies on the fact that the <code>_chkstk<\/code> function preserves the <code>t11<\/code> register.<\/p>\n<p>Next time, we&#8217;ll jump back to the present by looking at the stack limit checking on x86-64 (also known as amd64).<\/p>\n<p>\u00b9 My new theory is that writing to memory as part of stack expansion avoids a soft page fault. 
If you only read, then the memory manager maps in a shared zero page to satisfy the read, and then marks the page as copy-on-write.\u00b2 And then the stack actually expands into that space and you take a soft page fault to promote the copy-on-write page to a full write page.\u00b3<\/p>\n<p>\u00b2 The idea here is that if you create a bunch of zero pages, the memory manager can map a single page of zeroes into all of them and make the pages copy-on-write. If you don&#8217;t actually write to them before the pages get paged out, then it avoids having to do the work of finding a writable zero page to map in, and it also avoids writing the modified page when the page gets paged out. Allocating a page that is never used happens a lot: You might allocate a megabyte of memory but use only the first 64KB of it.<\/p>\n<p>\u00b3 The calculation here is &#8220;When the stack expands, what is the likelihood that the memory will actually be written to?&#8221; If the stack expands by just a little bit, then the likelihood is high, but if it expands by a lot, then the likelihood decreases because the large stack expansion is probably due to a large stack array, and there&#8217;s a good chance that the function won&#8217;t actually use all of it. 
It&#8217;s a cost\/benefit analysis, and the authors of the <code>_chkstk<\/code> functions came to different conclusions, perhaps based on different usage patterns by code written for different processors.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Double the size, double the fun.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-112146","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Double the size, double the fun.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=112146"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112146\/revisions"}],"predecessor-version":[{"id":112147,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/112146\/revisions\/112147"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=112146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=112146"},{"taxonomy":
"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=112146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}