We continue our historical survey of Windows stack-checking functions by looking at the PowerPC.
The weird thing here is that on PowerPC, you ask for the negative of the stack frame size. We’ll see why soon.
; on entry, r12 is the *negative* of the number of bytes to allocate
; on exit, stack has been validated (but not adjusted)
chkstk:
subi r0, r12, PAGE_SIZE - 1 ; expand by another page to make sure we get it all
; get the stack limit for the current stack
cmpwi sp, 0 ; check what kind of stack we are on¹
add r0, r0, sp ; r0 = proposed new stack limit
bge+ usermode ; nonnegative means user mode
mfsprg r11, 1 ; get kernel thread state
lwz r11, StackStart(r11) ; where the stack started
subi r11, r11, KERNEL_STACK_SIZE ; where the stack ends
b havelimit
usermode:
lwz r11, StackLimit(r13) ; get stack limit from TEB
havelimit:
sub r0, r11, r0 ; r0 = bytes of stack growth needed
srawi. r0, r0, 12 ; r0 = pages of stack growth needed
blelr ; if ≤ 0, then nothing to do
mtctr r0 ; prepare to loop
probe:
lwzu r0, -PAGE_SIZE(r11) ; touch a page and adjust r11
bdnz probe ; keep touching
blr ; return
As with the MIPS version, this code short-circuits the case where the stack has already grown enough to accommodate the allocation. To do that calculation, it has to know where the stack limit is, which in turn means sniffing at the stack pointer to see whether it is a user-mode stack or a kernel-mode stack. This relies on the fact that on the PowerPC, the kernel/user split is architectural at the midpoint of the address space, so the sign of the stack pointer tells you which half you are in.
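To make the arithmetic concrete, here is a rough C model of the calculation that chkstk performs before it starts probing. It is only a sketch: the helper names `is_user_mode_stack` and `pages_to_probe` are made up for illustration, the 4KB page size is an assumption inferred from the shift by 12, and the model does not touch any memory.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4KB pages, matching "srawi. r0, r0, 12" */

/* Hypothetical helper: models "cmpwi sp, 0 / bge+ usermode".
   A stack pointer with the top bit clear is nonnegative, hence user mode. */
static int is_user_mode_stack(uint32_t sp)
{
    return (sp & 0x80000000u) == 0;
}

/* Hypothetical helper: models the arithmetic only, not the probe loop.
   sp          - current stack pointer
   neg_size    - the value passed in r12 (negative of the bytes to allocate)
   stack_limit - the limit loaded into r11
   Returns how many pages the probe loop would touch. */
static uint32_t pages_to_probe(uint32_t sp, int32_t neg_size, uint32_t stack_limit)
{
    assert(neg_size <= 0);

    /* subi r0, r12, PAGE_SIZE - 1 ; add r0, r0, sp */
    uint32_t proposed = sp + (uint32_t)neg_size - (PAGE_SIZE - 1);

    /* sub r0, r11, r0 */
    uint32_t growth = stack_limit - proposed;

    /* srawi. / blelr: a zero or negative result means nothing to do */
    if (growth == 0 || (growth & 0x80000000u) != 0) {
        return 0;
    }
    return growth / PAGE_SIZE; /* same as the shift for positive values */
}
```

With a stack pointer of 0x00130000, a limit of 0x0012F000, and the 17320-byte frame used in the calling example, the model reports four pages, which matches the four pages below the limit that the new frame reaches into.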
You would call this by doing something like
mflr r0 ; move return address to r0
stw r29, -12(r1) ; save non-volatile register
stw r30, -8(r1) ; save non-volatile register
stw r31, -4(r1) ; save non-volatile register
stw r0, -16(r1) ; save return address
li r12, -17320 ; large stack frame (negative)
bl _chkstk ; fault pages in if necessary
stwux r1, r1, r12 ; create stack frame and link
; store r1 to memory at r1 + r12
; r1 = r1 + r12
And now we see why the chkstk function wants the stack frame size as a negative number: A negative number allows the caller to use the atomic stwux instruction (store word with update indexed). The indexed store instructions add two registers to calculate the effective address. There is no variant that subtracts two registers, so using a negative number lets us get the effect of a subtraction while still formally performing an addition.
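The effect of that store-with-update can be modeled in a few lines of C. This is a sketch under loud assumptions: `fake_memory`, `store_with_update`, and `load_word` are hypothetical names, the 64KB array merely stands in for the stack, and the word is stored in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for memory; addresses are offsets into this array. */
static uint8_t fake_memory[65536];

/* Hypothetical model of a store word with update indexed where the stored
   register is also the base register: compute base + index, store the old
   base at that address, and return the address as the new base. */
static uint32_t store_with_update(uint32_t base, int32_t index)
{
    uint32_t ea = base + (uint32_t)index; /* an addition, even when index < 0 */
    assert(ea <= sizeof fake_memory - sizeof base);
    memcpy(&fake_memory[ea], &base, sizeof base); /* back link to the old frame */
    return ea;
}

/* Hypothetical helper: read back a word from the fake memory. */
static uint32_t load_word(uint32_t address)
{
    uint32_t value;
    assert(address <= sizeof fake_memory - sizeof value);
    memcpy(&value, &fake_memory[address], sizeof value);
    return value;
}
```

Calling `store_with_update(40000, -17320)` returns 22680 as the new stack pointer, and the word at 22680 holds 40000, the old stack pointer. The frame is allocated and linked in one step, with no point at which the stack pointer has moved but the link has not yet been written.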
Next time, we’ll look at changes to the 80386 stack limit checker.
¹ I think there’s a micro-optimization opportunity here: Instead of
cmpwi sp, 0 ; check what kind of stack we are on
add r0, r0, sp ; r0 = proposed new stack limit
bge+ usermode ; nonnegative means user mode
we could ask the add to update the flags and use that result.
add. r0, r0, sp ; r0 = proposed new stack limit
; and see what kind of stack it is
bge+ usermode ; nonnegative means user mode
This produces a different result if the value in r12 was so large that the sum crossed from user mode to kernel mode or vice versa, but that’s okay. The old code didn’t handle that case either!
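For the curious, the difference between the two tests can be sketched in C. The function names are hypothetical, and the 4KB page size is the same assumption as in the listing.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4KB pages */

/* Hypothetical model of the original test: the sign of the stack pointer. */
static int user_mode_by_sp(uint32_t sp)
{
    return (sp & 0x80000000u) == 0;
}

/* Hypothetical model of the add. variant: the sign of the proposed limit,
   which is sp plus the (negative) r12 minus PAGE_SIZE - 1. */
static int user_mode_by_sum(uint32_t sp, int32_t neg_size)
{
    assert(neg_size <= 0);
    uint32_t sum = sp + (uint32_t)neg_size - (PAGE_SIZE - 1);
    return (sum & 0x80000000u) == 0;
}
```

For an ordinary user-mode stack pointer like 0x00130000 or a kernel-mode one like 0x80130000, both functions agree for the 17320-byte frame. They disagree only in cases like a stack pointer of 0x80001000 with an 8192-byte request, where the sum dips below 0x80000000 and the second test sees a nonnegative value.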