March 11th, 2026

How do compilers ensure that large stack allocations do not skip over the guard page?

Some time ago we took a closer look at the stack guard page and how a rogue stack access from another thread into the guard page could result in the guard page being lost. (And we used this information to investigate a stack overflow failure.)

You might have noticed that the “one guard page at a time” policy assumes that the stack grows one page at a time. But what if a function has a lot of local variables (or just one large local variable), so that its local frame is larger than a page, and the first variable the function uses is the one at the lowest address? That access would land in the reserved region (red in the diagram on the linked page) rather than in the guard page (yellow in the diagram). Since the address is not in a guard page, it is simply an invalid memory access, and the process would crash.
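To make the arithmetic concrete, here is a minimal sketch in Python that classifies where the lowest-addressed local of a frame would land. The 4 KB page size, the addresses, and the helper name `classify_access` are all assumptions made up for illustration, and the model ignores the upper and lower bounds of the stack reservation.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages

def classify_access(stack_limit, address):
    """Classify an address relative to stack_limit, the lowest committed address.

    In this simplified model the single guard page occupies
    [stack_limit - PAGE_SIZE, stack_limit); everything below it is
    reserved but not accessible.
    """
    if address >= stack_limit:
        return "committed"
    if address >= stack_limit - PAGE_SIZE:
        return "guard"
    return "reserved"

# Hypothetical numbers: 512 bytes of committed stack remain below the stack pointer.
stack_limit = 0x100000
stack_pointer = stack_limit + 0x200

for frame_size in (256, 2048, 16384):
    lowest_local = stack_pointer - frame_size
    print(frame_size, classify_access(stack_limit, lowest_local))
```

With these made-up numbers, the small frame stays in committed memory, the medium frame lands in the guard page (which is the well-behaved case), and the 16 KB frame jumps past the guard page into reserved memory.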

Yet processes don’t crash when this happens. How does that work?

The answer is that when the stack pointer needs to move by more than the size of a page (typically 4KB), the compiler generates a call to a helper function called something like _chkstk. The job of this function is to touch all of the pages spanned by the desired stack allocation, in order, so that guard pages can be converted to committed memory. The system maintains only one guard page, namely the page that is just below the allocated portion of the stack. Once you touch that guard page, the system converts it to a committed page, updates the stack limit, and creates a new guard page one page further down. That’s why the access has to be sequential: You have to make sure that the first access outside the stack limit is to wherever the guard page is.
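The sequential-probe idea can be sketched as a small Python simulation. This is not the real _chkstk, which is a compiler runtime helper; it is a toy model under stated assumptions: 4 KB pages, a single guard page, and no handling of running out of reserved stack. The names `SimulatedStack`, `AccessViolation`, and `probe_and_allocate` are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages

class AccessViolation(Exception):
    """Raised when a touch lands outside both committed memory and the guard page."""

class SimulatedStack:
    def __init__(self, base, reserved_pages, committed_pages=1):
        self.base = base                                  # one past the highest stack address
        self.lowest = base - reserved_pages * PAGE_SIZE   # bottom of the reservation
        self.limit = base - committed_pages * PAGE_SIZE   # lowest committed address

    def touch(self, address):
        if self.limit <= address < self.base:
            return  # already committed
        guard_start = self.limit - PAGE_SIZE
        if self.lowest <= guard_start <= address < self.limit:
            # The guard page was hit: it becomes committed, the stack limit
            # moves down, and the new guard page is one page further down.
            self.limit = guard_start
            return
        raise AccessViolation(hex(address))

def probe_and_allocate(stack, stack_pointer, size):
    """Touch the stack one page at a time, highest address first."""
    new_sp = stack_pointer - size
    addr = stack_pointer
    while addr - PAGE_SIZE > new_sp:
        addr -= PAGE_SIZE
        stack.touch(addr)  # each step moves exactly one page, so no page is skipped
    stack.touch(new_sp)    # final touch is at most one page below the previous one
    return new_sp

stack = SimulatedStack(base=0x200000, reserved_pages=256, committed_pages=2)
sp = stack.base - 0x100
frame = 5 * PAGE_SIZE

try:
    stack.touch(sp - frame)  # jumping straight to the far end of the frame
except AccessViolation as e:
    print("unprobed access faulted at", e)

new_sp = probe_and_allocate(stack, sp, frame)
stack.touch(new_sp)  # now succeeds: every intervening page was committed in order
print("stack limit is now", hex(stack.limit))
```

The direct access fails because it lands below the one guard page, while the probing loop always makes its first out-of-limit access in the page that is currently the guard page.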

The form of this stack-checking function has changed over the years, and we’ll be spending a few days doing a historical survey of how it worked. We’ll start next time with the 80386 family of processors, also known as x86-32 and i386.

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

6 comments

  • LB

    Didn’t a previous blog entry discover that at some point a change was made to have multiple guard pages instead of just one, or am I misremembering?

  • Shawn Van Ness

    How does _alloca(size_t) play with this? Does it do the necessary _chkstk() probing?

    I’m sure using a TLS slot is probably always a better idea, but I’m curious whether this is an argument against the use of _alloca().

  • Joshua Hudson

    I remember trying to look into this and discovering that on i386, MSVC and GCC expect different calling conventions for _chkstk and getting confused trying to sort it out.

  • Cole Tobin

    Why not have a page fault handler that detects the faulting address being the stack and page in the other pages?

    • Csaba Varga

      My guess: you don’t want an invalid pointer dereference to allocate a huge chunk of stack, just because the pointer happens to be pointing where the stack might grow, eventually. You want an invalid pointer dereference to segfault most of the time.

      99.9% of functions are happy with just having a single guard page, and it causes zero overhead for them. The remaining 0.1% can probably tolerate the tiny performance hit of calling _chkstk. (The actual allocation may be costly, as pete.d’s example shows, but you would need to do that anyway when a function needs so much space from the stack.)