March 18th, 2026

Windows stack limit checking retrospective: Alpha AXP

We continue our historical survey of Windows stack-checking functions by looking at the Alpha AXP.

; on entry, t12 is the number of bytes to allocate
; on exit, stack has been validated (but not adjusted)
; modifies t8, t9, t10

_chkstk:
    subq    sp, t12, t8     ; t8 = new stack pointer
    mov     v0, t10         ; save v0 in t10 (call_pal will overwrite it)
    bgt     sp, usermode    ; branch if running in user mode

    call_pal rdksp          ; PAL call to get start of kernel stack in v0
    lda     t9, -KERNEL_STACK_SIZE(v0) ; t9 = end of stack
    br      zero, havelimit

usermode:
    call_pal rdteb          ; PAL call to get TEB in v0
    ldl     t9, StackLimit(v0) ; t9 = end of stack

havelimit:
    mov     t10, v0         ; recover original v0 for caller

    cmpult  t8, t9, t10     ; is stack growth needed?
    beq     t10, done       ; N: then nothing to do

    ldil    t10, -PAGE_SIZE ; t10 = mask that clears the offset within a page
    and     t8, t10, t8     ; round down to nearest page

probe:
    lda     t9, -PAGE_SIZE(t9)  ; prepare to touch a page
    stq     zero, 0(t9)         ; touch it
    cmpeq   t8, t9, t10         ; finished?
    beq     t10, probe          ; N: keep going

done:
    ret     zero, (ra)          ; return to caller

We see a lot of similarities to MIPS and PowerPC: The code short-circuits the case where the stack does not need to expand, and it relies on the architectural split between user mode and kernel mode at the halfway point in the address space. As with MIPS (but not PowerPC or 80386), the probe loop writes to the memory to fault it in.¹
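To make the control flow easier to follow, here is a rough Python model of the same logic. It is a sketch under stated assumptions, not the original code: the `touch` callback is a hypothetical stand-in for the `stq zero, 0(t9)` store, the page size is assumed to be 8KB, the stack limit is assumed to be page-aligned, and addresses are modeled as non-negative integers so that the unsigned comparison is just an ordinary one.

```python
PAGE_SIZE = 0x2000  # assumption: 8KB pages


def chkstk(sp, size, stack_limit, touch):
    """Model of the probe logic; `touch` stands in for the store that faults a page in."""
    new_sp = sp - size                 # subq sp, t12, t8
    if new_sp >= stack_limit:          # cmpult t8, t9, t10 / beq t10, done
        return                         # no stack growth needed
    target = new_sp & -PAGE_SIZE       # round down to the start of its page
    page = stack_limit
    while True:
        page -= PAGE_SIZE              # lda t9, -PAGE_SIZE(t9)
        touch(page)                    # stq zero, 0(t9)
        if page == target:             # cmpeq t8, t9, t10
            break


def pages_touched(sp, size, stack_limit):
    """Return the page addresses the model would touch, in order."""
    touched = []
    chkstk(sp, size, stack_limit, touched.append)
    return touched
```

For example, `pages_touched(0x130100, 17320, 0x130000)` walks down three pages from the limit, whereas a small allocation that stays above the limit touches nothing. Note that the model, like the assembly, only terminates if the limit is page-aligned, because the loop exits on an exact match.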

A new wrinkle here is that this code uses 64-bit calculations when adjusting the stack pointer. The Alpha AXP is a 64-bit processor. Although it doesn’t have a “32-bit mode”, you can still pretend that it’s a 32-bit processor by simply choosing not to use any of the 64-bit features.
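One consequence worth spelling out: the `ldl` instruction sign-extends the 32-bit value it loads, so if 32-bit addresses are kept sign-extended in the 64-bit registers (an assumption in this sketch, though it is what `ldl` naturally produces), then any address in the upper half of the 32-bit address space looks negative as a 64-bit signed value. That is what lets a single `bgt sp, usermode` distinguish the two halves. The helper names below are made up for illustration.

```python
def sign_extend_32(value):
    """Interpret the low 32 bits of value as a signed quantity, the way ldl does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def is_user_mode_sp(sp32):
    """Model of `bgt sp, usermode`: taken when the register is positive as a signed value."""
    return sign_extend_32(sp32) > 0
```

A stack pointer like `0x0012F000` stays positive and takes the user-mode branch, while something like `0x80100000` becomes negative after sign extension and falls through to the kernel-mode path.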

This code appears to have been written early in the history of the Alpha AXP project, and it contains some seemingly unnecessary register preservation. For example, it goes out of its way to preserve v0, even though v0 is a volatile register that does not contain anything interesting on entry to the function. My theory is that it does this because it wants to maintain compatibility with a non-Microsoft compiler that might use v0 as part of its calling convention, and this allowed the Windows NT team to start porting their operating system without having to wait for the Microsoft Languages team to come up with an Alpha AXP version of the Microsoft Visual C compiler.

Here is a typical usage of this function to build a large stack frame in a function prologue:

    mov     ra, t11             ; save return address
    ldil    t12, 17320          ; large stack frame
    bsr     ra, _chkstk         ; fault pages in if necessary
    subq    sp, t12, sp         ; allocate the stack frame
    mov     t11, ra             ; restore return address for standard entry

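The prologue relies on the fact that the _chkstk function preserves the t11 and t12 registers: t11 holds the saved return address, and t12 is reused as the frame size when the stack pointer is adjusted.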

Next time, we’ll jump back to the present by looking at the stack limit checking on x86-64 (also known as amd64).

¹ My new theory is that writing to memory as part of stack expansion avoids a soft page fault. If you only read, then the memory manager maps in a shared zero page to satisfy the read, and then marks the page as copy-on-write.² And then the stack actually expands into that space and you take a soft page fault to promote the copy-on-write page to a full write page.³

² The idea here is that if you create a bunch of zero pages, the memory manager can map a single page of zeroes into all of them and make the pages copy-on-write. If you don’t actually write to them before the pages get paged out, then it avoids having to do the work of finding a writable zero page to map in, and it also avoids writing the modified page when the page gets paged out. Allocating a page that is never used happens a lot: You might allocate a megabyte of memory but use only the first 64KB of it.

³ The calculation here is “When the stack expands, what is the likelihood that the memory will actually be written to?” If the stack expands by just a little bit, then the likelihood is high, but if it expands by a lot, then the likelihood decreases because the large stack expansion is probably due to a large stack array, and there’s a good chance that the function won’t actually use all of it. It’s a cost/benefit analysis, and the authors of the _chkstk functions came to different conclusions, perhaps based on different usage patterns by code written for different processors.
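The trade-off in footnotes 1 and 3 can be put into a toy model. This is purely illustrative and follows the theory in the footnotes rather than any documented memory manager behavior: it counts faults for a single newly exposed stack page, and the function name and the `"read"`/`"write"` labels are invented for the sketch.

```python
def soft_faults(probe, later_written):
    """Count faults for one new stack page in a toy model.

    probe: "read" or "write", the kind of access the probe loop performs
    later_written: whether the function ever actually writes to the page
    """
    faults = 1  # the probe itself faults the page in
    if probe == "read" and later_written:
        faults += 1  # the first real write promotes the copy-on-write zero page
    # A write probe never pays a second fault, but the page is dirty from the
    # start even if the function never uses it (a cost this model does not count).
    return faults
```

In this model, a read probe costs two faults for a page that ends up being written and one for a page that is never used, while a write probe always costs one. Which choice wins depends on how often the probed pages really get written.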

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

1 comment

  • skSdnW 7 seconds ago

    call_pal = call privileged architecture library