Some time ago we took a closer look at the stack guard page and how a rogue stack access from another thread into the guard page could result in the guard page being lost. (And we used this information to investigate a stack overflow failure.)
You might have noticed that the “one guard page at a time” policy assumes that the stack grows one page at a time. But what if a function has a lot of local variables (or just one large local variable), so that its local frame is larger than a page, and the first variable the function uses is the one at the lowest address? That access would land in the reserved region (red in the diagram on the linked page) rather than in the guard page (yellow in the diagram). Since the page it lands on is not a guard page, the access is simply invalid, and the process would crash.
Yet processes don’t crash when this happens. How does that work?
The answer is that when the stack pointer needs to move by more than the size of a page (typically 4KB), the compiler generates a call to a helper function with a name like _chkstk. The job of this function is to touch each page spanned by the desired stack allocation, in order, so that each guard page is converted to committed memory in turn. The system maintains only one guard page, namely the page just below the allocated portion of the stack. When you touch that guard page, the system converts it to a committed page, updates the stack limit, and creates a new guard page one page further down. That’s why the accesses have to be sequential: you have to make sure that the first access beyond the stack limit lands on the current guard page.
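To make the ordering requirement concrete, here is a toy simulation in Python. It is a sketch of the rules described above (one committed page to start, a single guard page just below the committed region, touching the guard page commits it and moves the guard down, touching any other reserved page is an invalid access), not the actual Windows memory manager or any compiler’s _chkstk. The names ToyStack, naive_alloc, probed_alloc and STACK_PAGES are hypothetical, and running off the end of the reservation is simplified to "no new guard page" rather than a real stack overflow exception.

```python
PAGE_SIZE = 4096   # typical x86 page size
STACK_PAGES = 16   # hypothetical size of the reserved stack region, in pages

RESERVED, GUARD, COMMITTED = "reserved", "guard", "committed"


class ToyStack:
    """Toy model of a thread stack. Depth 0 is the stack base (highest address);
    larger depths are lower addresses, so the stack grows toward higher page indices."""

    def __init__(self):
        self.pages = [RESERVED] * STACK_PAGES
        self.pages[0] = COMMITTED  # page the stack pointer starts in
        self.pages[1] = GUARD      # the single guard page just below it
        self.sp = 0                # current depth of the stack pointer, in bytes

    def touch(self, depth):
        """Simulate a memory access; return False where the real access would fault."""
        i = depth // PAGE_SIZE
        if i >= STACK_PAGES:
            return False           # outside the reservation entirely
        if self.pages[i] == COMMITTED:
            return True
        if self.pages[i] == GUARD:
            # Commit the guard page and move the guard one page further down.
            # Simplification: at the end of the reservation we just stop,
            # where the real system would report a stack overflow.
            self.pages[i] = COMMITTED
            if i + 1 < STACK_PAGES:
                self.pages[i + 1] = GUARD
            return True
        return False               # reserved but not guard: invalid access

    def naive_alloc(self, n):
        """No probing: the first access is to the lowest address of the new frame."""
        if not self.touch(self.sp + n):
            return False
        self.sp += n
        return True

    def probed_alloc(self, n):
        """_chkstk-style: touch each page between the old and new stack pointer, in order."""
        target = self.sp + n
        for depth in range(self.sp + PAGE_SIZE, target, PAGE_SIZE):
            if not self.touch(depth):
                return False
        if not self.touch(target):
            return False
        self.sp = target
        return True


a, b = ToyStack(), ToyStack()
print(a.naive_alloc(3 * PAGE_SIZE))   # False: skips the guard page and hits a reserved page
print(b.probed_alloc(3 * PAGE_SIZE))  # True: pages 1, 2 and 3 are committed in turn
```

The naive allocation fails even though plenty of reserved stack remains, because its first access jumps past the guard page. The probed allocation succeeds because every access beyond the stack limit is, by construction, to wherever the guard page currently is.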
The form of this stack-checking function has changed over the years, and we’ll spend a few days on a historical survey of how the various versions worked. We’ll start next time with the 80386 family of processors, also known as x86-32 and i386.
Fun fact: the initially released version of MS Flight Sim 2000 had a huge performance bug whenever shorelines were in the rendered view (i.e. when flying around bodies of water). Some quick-and-dirty profiling revealed the _chkstk function to be the cause. It turned out that a huge temporary data structure was being allocated on the stack for the shoreline rendering, so every frame paid for the very costly paging in of dozens of pages (or was it hundreds? I don't even remember at this point), and that was killing the frame rate.
Switching to a globally allocated buffer that could be...