Windows stack limit checking retrospective: x86-32, also known as i386

Raymond Chen

We start our survey of historical stack limit checking functions on Windows with the 80386 family of processors. This function has actually changed form over the years, so we’ll start with the “original flavor”.

Originally, you called the _chkstk function by putting the desired number of bytes in the eax register and then calling it. The function touched each page of the new stack region in order, from the top down, moved the stack pointer down by the requested amount, and returned with the stack pointer still adjusted. This is an unusual calling convention, since it is neither caller-clean nor callee-clean. It's callee-dirty! The function returns with more stack than it started with.
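Why touch every page instead of just moving the stack pointer? Windows commits thread stacks on demand. The page just below the committed region is a guard page. Touching it commits that page and moves the guard one page lower. Touching memory further down skips the guard page, and that is an access violation rather than stack growth.

The sketch below is a deliberately simplified C model of that mechanism. The names `stack_model`, `touch`, `probe_pages`, and `MODEL_PAGE_SIZE` are hypothetical. It assumes 4 KB pages and ignores the stack's overall reservation limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size, like PAGE_SIZE in the listing */

/* Hypothetical, simplified model of a thread stack. Addresses at or above
   committed_low are committed; the page just below it is the guard page;
   anything lower is reserved but not yet committed. */
typedef struct {
    uint32_t committed_low; /* lowest committed address, page-aligned */
} stack_model;

/* Touch one address. Returns false if the access would be an access
   violation in this model. */
bool touch(stack_model *s, uint32_t addr) {
    assert(s->committed_low % MODEL_PAGE_SIZE == 0);
    uint32_t guard = s->committed_low - MODEL_PAGE_SIZE;
    if (addr >= s->committed_low)
        return true;                  /* already committed */
    if (addr >= guard) {
        s->committed_low = guard;     /* guard page hit: commit it, and the */
        return true;                  /* guard moves one page lower */
    }
    return false;                     /* skipped past the guard page */
}

/* Touch from sp down to sp - size one page at a time, as _chkstk does. */
bool probe_pages(stack_model *s, uint32_t sp, uint32_t size) {
    while (size >= MODEL_PAGE_SIZE) {
        sp -= MODEL_PAGE_SIZE;
        size -= MODEL_PAGE_SIZE;
        if (!touch(s, sp))
            return false;
    }
    return touch(s, sp - size);       /* the leftover partial page */
}
```

In this model, start with the stack pointer at 0x0FF800 and the committed region at 0x0FF000. Allocating 17,320 bytes page by page succeeds and leaves 0x0FB000 committed. Touching only the final address faults, because that address lies more than a page below the committed region.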

```
_chkstk:
    push    ecx             ; preserve register

    ; calculate the stack pointer of the caller
    mov     ecx, esp
    add     ecx, 8          ; 4 bytes were auto-pushed for the return address,
                            ; we pushed 4 bytes for the ecx

touch:
    cmp     eax, PAGE_SIZE  ; less than a page to go?
    jb      finalpage       ; do the last page and finish
    sub     ecx, PAGE_SIZE  ; allocate a page from our pretend stack pointer
    or      dword ptr [ecx], 0 ; touch the memory
    sub     eax, PAGE_SIZE  ; did a page
    jmp     touch           ; go back and do some more

finalpage:
    sub     ecx, eax        ; allocate the leftovers from our pretend stack pointer
    or      dword ptr [ecx], 0 ; touch the memory
    mov     eax, esp        ; remember original stack pointer
    mov     esp, ecx        ; move the real stack to match our pretend stack
    mov     ecx, [eax]      ; recover original ecx
    mov     eax, 4[eax]     ; recover return address
    jmp     eax             ; "return" to caller
```
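The register juggling is easier to follow as a C model of the listing's arithmetic. The model records the probe addresses instead of touching memory. `simulate_chkstk`, `chkstk_result`, and `MODEL_PAGE_SIZE` are hypothetical names, and the 4 KB page size is an assumption.

The model also makes the "callee-dirty" behavior visible. The ESP the caller sees afterward is exactly `frame_size` bytes lower than it was before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed; stands in for PAGE_SIZE */
#define MAX_TOUCHES 64

typedef struct {
    uint32_t new_esp;               /* ESP after the "return" */
    size_t touch_count;             /* number of "or dword ptr [ecx], 0" */
    uint32_t touches[MAX_TOUCHES];  /* probed addresses, in order */
} chkstk_result;

/* caller_esp is ESP in the caller just before "call _chkstk". The listing
   recomputes it as ESP + 8 once the return address and ECX are pushed.
   frame_size is the value the caller put in EAX. */
chkstk_result simulate_chkstk(uint32_t caller_esp, uint32_t frame_size) {
    assert(frame_size / MODEL_PAGE_SIZE < MAX_TOUCHES);
    chkstk_result r = {0};
    uint32_t ecx = caller_esp;       /* mov ecx, esp / add ecx, 8 */
    uint32_t eax = frame_size;
    while (eax >= MODEL_PAGE_SIZE) { /* touch: */
        ecx -= MODEL_PAGE_SIZE;
        r.touches[r.touch_count++] = ecx;
        eax -= MODEL_PAGE_SIZE;
    }
    ecx -= eax;                      /* finalpage: */
    r.touches[r.touch_count++] = ecx;
    r.new_esp = ecx;                 /* mov esp, ecx */
    return r;
}
```

A 17,320-byte frame works out to four full-page probes plus one probe for the remaining 936 bytes. When the size is an exact multiple of the page size, the final probe lands on the same address as the last full-page probe, which is harmless.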

A function with a large stack frame would begin something like this:

```
function:
    push    ebp         ; link into frame chain
    mov     ebp, esp
    push    ebx         ; save non-volatile register
    push    esi
    push    edi
    mov     eax, 17320  ; large stack frame
    call    _chkstk     ; allocate it from our stack safely
                        ; behaves like "sub esp, eax"
```
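For context, here is the kind of C function that gets such a prologue. Microsoft's compiler inserts a stack probe when a function's locals exceed a threshold, which defaults to 4096 bytes on x86 and can be changed with the `/Gs` option. The exact prologue it emits depends on the compiler version and options.

`sum_of_bytes` is a hypothetical example. Built with any other compiler, it is just ordinary C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical function whose locals exceed one 4 KB page. Built with
   32-bit MSVC, its prologue would probe the stack before the body runs. */
unsigned sum_of_bytes(const unsigned char *src, size_t len) {
    unsigned char buffer[17320];   /* large stack frame, as in the listing */
    unsigned total = 0;
    assert(len <= sizeof buffer);
    memcpy(buffer, src, len);      /* copy into the big local buffer */
    for (size_t i = 0; i < len; i++)
        total += buffer[i];
    return total;
}
```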

This goes into the competition for “wackiest x86-32 calling convention.”¹

Next time, we’ll look at how stack probing happens on MIPS, which has its own quirks, but nothing as crazy as this.

Bonus chatter: The strange calling convention dates back to the 16-bit 8086. Back then, there were two versions of the chkstk function, depending on whether you called it near or far.

```
; frame size in ax

chkstk:
#if NEAR
    pop     bx          ; pop 16-bit return address
#else // FAR
    pop     bx          ; pop 32-bit return address
    pop     dx
#endif

    inc     ax
    and     al, 0xFE    ; round up to even

    sub     ax, sp      ; check for stack overflow
    jae     overflow    ; Y: overflow
    neg     ax          ; ax = new stack pointer

    cmp     ax, ss:[pStackTop]
    ja      overflow    ; stack mysteriously too high

    cmp     ax, ss:[pStackMin] ; new stack limit?
    jbe     nochange
    mov     ss:[pStackMin], ax ; update stack limit
nochange:

    mov     sp, ax      ; update the stack pointer

#if NEAR
    jmp     bx          ; "return" to caller
#else // FAR
    push    dx          ; restore return address
    push    bx
    retf                ; return to caller
#endif
```
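The first half of the 16-bit routine leans on the carry flag. `sub ax, sp` borrows exactly when the rounded frame size is smaller than SP. So `jae` catches any frame that would reach or wrap past address 0, and `neg` turns `size - sp` into `sp - size`.

The C sketch below covers only that arithmetic. `chkstk16_new_sp` is a hypothetical name, and the `pStackTop`/`pStackMin` checks are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models "inc ax / and al, 0xFE / sub ax, sp / jae overflow / neg ax".
   Returns false where the listing would jump to overflow; otherwise
   stores the new stack pointer in *new_sp. */
bool chkstk16_new_sp(uint16_t sp, uint16_t frame_size, uint16_t *new_sp) {
    uint16_t ax = (uint16_t)(frame_size + 1);  /* inc ax (wraps at 16 bits) */
    ax = (uint16_t)(ax & 0xFFFEu);             /* and al, 0xFE: round up to even */
    if (ax >= sp)                              /* sub ax, sp: no borrow -> jae */
        return false;
    ax = (uint16_t)(ax - sp);                  /* negative result, mod 2^16 */
    ax = (uint16_t)(0u - ax);                  /* neg ax: sp - rounded size */
    *new_sp = ax;
    return true;
}
```

For example, with SP at 0x1000 and a frame size of 0x101, the size rounds up to 0x102 and the new SP is 0x0EFE. With SP at 0x100, a frame size of 0xFF rounds up to 0x100 and is rejected.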

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

9 comments

Jonathan Harston March 15, 2026

Reminds me of code I had to do with BSD Unix on the PDP11. On startup, to set up the stack, I had to copy the return address into a register, go through a loop "touching" memory downwards until it was OK, and tell the OS I was moving the stack. This is why I had to get the return address: once you told the OS you were moving the stack, the stack vanished! So I set up the new stack and returned via the register value. But you couldn't move the stack before telling the OS, as the destination memory didn't exist. And once you'd told the OS you were moving the stack, the source memory didn't exist. That was a *PAIN* to debug.

Euro Micelli March 13, 2026 · Edited

I was very confused by the Bonus Chatter for a while. What would a 16-bit chkstk be for? Stacks are fixed at compile time on 16-bit Windows, and you can’t grow the stack anyway because it’s sandwiched between the static data and the local heap (Petzold 3.1 p281).
But of course, you would like to AT LEAST detect and defend against a stack overflow even if the hardware won’t help and your only option is to bail out. I presume the win16 functions are called in every function’s prolog and will trigger an Application Fault if the allocated stack would be exceeded. It would also need to be called by alloca. And I remember there was a compiler flag to enable/disable such protection; it probably controlled whether to include/omit such call from the prolog.

Stephan Leclercq March 13, 2026

Is it just me, or is there a mixup between eax and ecx? The caller should set eax, not ecx, and the first “push ecx” does not preserve the allocation size; it just preserves whatever is in ecx.
- Swap Swap March 13, 2026
  
  Agreed
Csaba Varga March 12, 2026 · Edited
Wow, this must be old code indeed, if it doesn’t worry about returning with a JMP. The x86 branch predictor assumes every CALL to be paired with a RET, so it will mispredict a bunch of future RETs if you get back to your caller without using a RET. The predictor-friendly way to return would be replacing the final two instructions with:
```
    push 4[eax]           ; copy return address to the top of the stack
    ret                   ; return to caller
```
- Neil Rashbrook March 13, 2026
  
  On the face of it, does chkstk behave similarly to alloca, or am I missing something?
  - Mark Fling March 13, 2026
    
    _alloca is the stack allocator intrinsic. __alloca_probe is emitted to check on _alloca. IIRC, it’s equivalent to __chkstk.
    
    I’m guilty of using _alloca, judiciously, when it makes sense. I just keep my allocations below PAGE_SIZE.
- Neil Rashbrook March 13, 2026
  
  None of the 16-bit x86 microprocessors had a branch predictor at all. The Pentium Pro was the first to have a return address stack. The Pentium did have branch prediction, but I don't think it needed strict pairing, so while it would be temporarily confused by the JMP, it would get its act together again at the next RET.
Peter Cooper Jr. March 12, 2026

Is it me, or is there a footnote ¹ that doesn't point to anything? Is that, like, a joke, since it's pointing to a page that hasn't been allocated yet?