March 20th, 2026

Windows stack limit checking retrospective: arm64, also known as AArch64

Our survey of stack limit checking wraps up with arm64, also known as AArch64.

The stack limit checking takes two forms: a simple version for pure arm64 processes, and a more complex version for Arm64EC. I’m going to look at the simple version. The complex version differs in that it has to check whether the code is running on the native arm64 stack or the emulation stack before calculating the stack limit. That part isn’t all that interesting.

; on entry, x15 is the number of paragraphs to allocate
;           (bytes divided by 16)
; on exit, stack has been validated (but not adjusted)
; modifies x16, x17

chkstk:
    subs    x16, sp, x15, lsl #4
                            ; x16 = sp - x15 * 16
                            ; x16 = desired new stack pointer
    csello  x16, xzr, x16   ; clamp to 0 on underflow

    mov     x17, sp
    and     x17, x17, #-PAGE_SIZE   ; round down to nearest page
    and     x16, x16, #-PAGE_SIZE   ; round down to nearest page

    cmp     x16, x17        ; on the same page?
    beq     done            ; Y: nothing to do

probe:
    sub     x17, x17, #PAGE_SIZE ; move to next page¹
    ldr     xzr, [x17]      ; probe
    cmp     x17, x16        ; done?
    bne     probe           ; N: keep going

done:
    ret
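If assembly isn't your native tongue, here is a rough C model of the same logic. It is only a sketch for following along: the function name `chkstk_model`, the `probed` output array, and the 4KB `PAGE_SIZE` are assumptions made for illustration, and instead of actually reading memory, it just records the page addresses the loop would touch.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size for this model */

/*
 * Hypothetical model of the probe loop (not the real implementation).
 * Records the page addresses that would be read, in order, and
 * returns how many probes the loop would perform.
 */
static size_t chkstk_model(uint64_t sp, uint64_t paragraphs,
                           uint64_t *probed, size_t capacity)
{
    uint64_t bytes = paragraphs << 4;

    /* subs + csello: clamp to 0 if the subtraction borrows */
    uint64_t target = (sp < bytes) ? 0 : sp - bytes;

    /* round both addresses down to the start of their pages */
    uint64_t page_mask = ~(uint64_t)(PAGE_SIZE - 1);
    uint64_t current = sp & page_mask;
    target &= page_mask;

    size_t count = 0;
    while (current != target) {      /* same page? nothing to do */
        current -= PAGE_SIZE;        /* move to next page */
        if (count < capacity) {
            probed[count] = current; /* stands in for the ldr */
        }
        count++;
    }
    return count;
}
```

In this model, a request that stays on the current page produces no probes at all, and a request that underflows gets clamped so that the loop walks all the way down to page zero, which on a real system would fault long before it got there.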

The inbound value in x15 is the number of bytes desired divided by 16. Since the arm64 stack must be kept 16-byte aligned, we know that the division by 16 will not produce a remainder. Passing the amount in paragraphs expands the number of bytes expressible in a single constant load (via the movz instruction) from 0xFFF0 to 0xFFFF0, allowing convenient allocation of stack frames up to just shy of a megabyte in size. Since the default stack size is a megabyte, this is sufficient to cover all typical usages.
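The arithmetic behind those limits can be checked with a few lines of C. This is just a worked calculation, and the names `MOVZ_IMM_MAX`, `STACK_ALIGN`, `max_frame_in_bytes`, and `max_frame_in_paragraphs` are made up for the occasion.

```c
#include <stdint.h>

#define MOVZ_IMM_MAX 0xFFFFu /* largest 16-bit immediate in one movz */
#define STACK_ALIGN  16u     /* arm64 stack alignment, one paragraph */

/* Largest aligned frame if the register carried a byte count */
static uint32_t max_frame_in_bytes(void)
{
    return MOVZ_IMM_MAX & ~(STACK_ALIGN - 1); /* 0xFFF0 */
}

/* Largest frame if the register carries a paragraph count */
static uint32_t max_frame_in_paragraphs(void)
{
    return MOVZ_IMM_MAX * STACK_ALIGN;        /* 0xFFFF0 */
}
```

The paragraph form tops out at 0xFFFF0, which is 16 bytes short of 0x100000, one megabyte.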

Here’s an example of how a function might use chkstk in its prologue:

    mov     x15, #17328/16      ; desired stack frame size divided by 16
    bl      chkstk              ; ensure enough stack space available
    sub     sp, sp, x15, lsl #4 ; reserve the stack space

Okay, so let’s summarize all of the different stack limit checks into a table, because people like tables.

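|                                        | x86-32 | MIPS  | PowerPC        | Alpha AXP | x86-64 | AArch64    |
|----------------------------------------|--------|-------|----------------|-----------|--------|------------|
| unit requested                         | Bytes  | Bytes | Negative bytes | Bytes     | Bytes  | Paragraphs |
| adjusts stack pointer before returning | Yes    | No    | No             | No        | No     | No         |
| detects stack placement at runtime     | No     | Yes   | Yes            | Yes       | Yes    | Yes        |
| short-circuits                         | No     | Yes   | Yes            | Yes       | Yes    | No         |
| probe operation                        | Read   | Write | Read           | Write     | Either | Read       |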

As we discussed earlier, if the probe operation is a write, then short-circuiting is mandatory.

¹ If you’re paying close attention, you may have noticed that PAGE_SIZE is too large to fit in a 12-bit immediate constant. No problem, because the assembler rewrites it as

    sub x17, x17, #PAGE_SIZE/4096, lsl #12
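The rule the assembler is applying is that an add or sub immediate is a 12-bit value, optionally shifted left by 12. Here is a small sketch of that check in C; the helper name `addsub_imm_encodable` and its `needs_shift` output are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if value fits an A64 add/sub immediate: a 12-bit
 * constant, optionally shifted left by 12. On success, *needs_shift
 * reports whether the "lsl #12" form is required.
 */
static bool addsub_imm_encodable(uint64_t value, bool *needs_shift)
{
    *needs_shift = false;
    if (value <= 0xFFF) {
        return true;              /* fits directly in imm12 */
    }
    if ((value & 0xFFF) == 0 && (value >> 12) <= 0xFFF) {
        *needs_shift = true;      /* fits as imm12, lsl #12 */
        return true;
    }
    return false;                 /* needs a separate constant load */
}
```

A value of 4096 fails the first test but passes the second, which is why the shifted form gets used for a 4KB page size.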

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.