How do compilers ensure that large stack allocations do not skip over the guard page?

Raymond Chen

Some time ago we took a closer look at the stack guard page and how a rogue stack access from another thread into the guard page could result in the guard page being lost. (And we used this information to investigate a stack overflow failure.)

You might have noticed that the “one guard page at a time” policy assumes that the stack grows one page at a time. But what if a function has a lot of local variables (or just one large local variable) such that the size of the local frame is greater than a page, and the first variable that the function uses is the one at the lowest address? That would result in a memory access in the reserved region (red in the diagram on the linked page), rather than in the guard page (yellow in the diagram), and since it’s not in a guard page, that is simply an invalid memory access, and the process would crash.

Yet processes don’t crash when this happens. How does that work?

The answer is that when the stack pointer needs to move by more than the size of a page (typically 4 KB), the compiler generates a call to a helper function called something like _chkstk. The job of this function is to touch all of the pages spanned by the desired stack allocation, in order, so that each guard page is converted to committed memory in turn. The system maintains only one guard page, namely the page that is just below the allocated portion of the stack. Once you touch that guard page, the system converts it to a committed page, updates the stack limit, and creates a new guard page one page further down. That’s why the access has to be sequential: You have to make sure that the first access outside the stack limit is to wherever the guard page is.
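To make the "one guard page at a time" rule concrete, here is a toy model in C. It is only a sketch and not the actual _chkstk code or the real memory manager: the SimStack type, the function names, and the 4 KB page size are all made up for illustration, and the end-of-stack case is simplified. Pages are numbered downward from the top of the stack, and a touch succeeds only if it lands on a committed page or on the current guard page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096u  /* assumed page size for this model */

/* Hypothetical model: pages are numbered downward from the stack base.
   Pages [0, guard) are committed, page `guard` is the single guard page,
   and pages (guard, reserved) are reserved but not accessible. */
typedef struct {
    size_t guard;     /* index of the one guard page */
    size_t reserved;  /* total number of pages in the reservation */
} SimStack;

/* Simulate one memory access to `page`. Returns false where the real
   system would raise an exception (access violation or stack overflow). */
bool touch_page(SimStack *s, size_t page)
{
    if (page < s->guard) {
        return true;                 /* already committed */
    }
    if (page == s->guard && page + 1 < s->reserved) {
        s->guard++;                  /* guard becomes committed; new guard one page down */
        return true;
    }
    return false;                    /* skipped past the guard, or out of stack */
}

/* What a _chkstk-style helper does in this model: touch every page
   spanned by the allocation, in order, starting at the current depth. */
bool probe_sequential(SimStack *s, size_t depth, size_t bytes)
{
    assert(bytes > 0);
    size_t first = depth / SIM_PAGE_SIZE;
    size_t last = (depth + bytes - 1) / SIM_PAGE_SIZE;
    for (size_t p = first; p <= last; p++) {
        if (!touch_page(s, p)) {
            return false;
        }
    }
    return true;
}

/* What happens without the helper if the first access is to the
   variable at the lowest address of the new frame. */
bool touch_lowest_only(SimStack *s, size_t depth, size_t bytes)
{
    assert(bytes > 0);
    return touch_page(s, (depth + bytes - 1) / SIM_PAGE_SIZE);
}
```

In this model, start with one committed page and the guard at page 1, and ask for a three-page frame from a depth of 100 bytes. touch_lowest_only lands on page 3, which is past the guard page, so it fails and the guard does not move. probe_sequential walks pages 0 through 3, moving the guard down one page at each step, and succeeds.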

The form of this stack-checking function has changed over the years, and we’ll be spending a few days doing a historical survey of how the various versions worked. We’ll start next time with the 80386 family of processors, also known as x86-32 and i386.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

6 comments

LB March 13, 2026

Didn’t a previous blog entry discover that at some point a change was made to have multiple guard pages instead of just one, or am I misremembering?

Shawn Van Ness March 12, 2026

How does _alloca(size_t) play with this? Does it do the necessary _chkstk() probing?

I’m sure using a TLS slot is probably always a better idea, but I’m curious whether this is an argument against the use of _alloca().

Joshua Hudson March 12, 2026 · Edited

I remember trying to look into this and discovering that on i386, MSVC and GCC expect different calling conventions for _chkstk and getting confused trying to sort it out.

Cole Tobin March 12, 2026

Why not have a page fault handler that detects the faulting address being the stack and page in the other pages?
- Csaba Varga March 12, 2026
  
  My guess: you don’t want an invalid pointer dereference to allocate a huge chunk of stack, just because the pointer happens to be pointing where the stack might grow, eventually. You want an invalid pointer dereference to segfault most of the time.
  
  99.9% of functions are happy with just having a single guard page, it causes zero overhead for them. The remaining 0.1% can probably tolerate the tiny performance hit of calling _chkstk. (The actual allocation may be costly, as pete.d’s example shows, but you would need to do that anyway when a function needs so much space from the stack.)

pete.d March 11, 2026

Fun fact: the initially released version of MS Flight Sim 2000 included a huge performance bug any time shorelines were in the rendered view (i.e. flying around bodies of water). Some quick and dirty profiling revealed the _chkstk function to be the cause. Turned out, some huge temporary data structure was being allocated on the stack for the purpose of the shoreline rendering, and so every frame this very costly paging in of dozens of pages (or was it hundreds? I don’t even remember at this point) was killing the frame rate.

Switching to a globally allocated buffer that could be reused each frame fixed the problem.

(And no, I wasn’t the person who created the bug, but I did wind up finding and fixing it. The online patch for the release included the fix, and there were so many other bugs in the initial release that the patch got published almost immediately after the release, so for most people the problem was only around a few weeks at the most.)
