Our survey of stack limit checking wraps up with arm64, also known as AArch64.
The stack limit checking takes two forms: a simple version for pure arm64 processes and a more complex version for Arm64EC. I’m going to look at the simple version. The complex version differs in that it has to check whether the code is running on the native arm64 stack or the emulation stack before calculating the stack limit. That part isn’t all that interesting.
```
; on entry, x15 is the number of paragraphs to allocate
;           (bytes divided by 16)
; on exit, stack has been validated (but not adjusted)
; modifies x16, x17

chkstk:
    subs    x16, sp, x15, lsl #4
                            ; x16 = sp - x15 * 16
                            ; x16 = desired new stack pointer
    csello  x16, xzr, x16   ; clamp to 0 on underflow

    mov     x17, sp
    and     x17, x17, #-PAGE_SIZE   ; round down to nearest page
    and     x16, x16, #-PAGE_SIZE   ; round down to nearest page

    cmp     x16, x17        ; on the same page?
    beq     done            ; Y: nothing to do

probe:
    sub     x17, x17, #PAGE_SIZE    ; move to next page¹
    ldr     xzr, [x17]      ; probe
    cmp     x17, x16        ; done?
    bne     probe           ; N: keep going

done:
    ret
```
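If the register shuffling is hard to follow, here is a rough Python model of which pages the loop ends up reading. Treat it as a sketch under assumptions rather than a description of the real implementation: `pages_probed` and `MASK64` are names invented for this illustration, the 4KB value for `PAGE_SIZE` is an assumption, and the model says nothing about what happens when a probe actually faults.

```python
PAGE_SIZE = 0x1000          # assumption: 4KB pages
MASK64 = (1 << 64) - 1      # registers are 64 bits wide


def pages_probed(sp, paragraphs, page_size=PAGE_SIZE):
    """Return the page addresses the probe loop would read, in order."""
    requested = (paragraphs << 4) & MASK64      # x15, lsl #4
    target = sp - requested                     # subs x16, sp, x15, lsl #4
    if target < 0:                              # the subtraction borrowed
        target = 0                              # csello x16, xzr, x16

    current = sp & -page_size & MASK64          # and x17, x17, #-PAGE_SIZE
    target = target & -page_size & MASK64       # and x16, x16, #-PAGE_SIZE

    probed = []
    while current != target:                    # beq done / bne probe
        current -= page_size                    # sub x17, x17, #PAGE_SIZE
        probed.append(current)                  # ldr xzr, [x17]
    return probed


if __name__ == "__main__":
    # Hypothetical stack pointer, frame of 1083 paragraphs (17328 bytes).
    print([hex(page) for page in pages_probed(0x10000F40, 1083)])
```

In this model, a request that stays on the current page probes nothing, and a request that underflows gets clamped so that the loop walks all the way down to page zero. On a real system, that walk would presumably run off the end of the stack and fault long before it got there.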
The inbound value in x15 is the number of bytes desired divided by 16. Since the arm64 stack must be kept 16-byte aligned, we know that the division by 16 will not produce a remainder. Passing the amount in paragraphs expands the number of bytes expressible in a single constant load from 0xFFF0 to 0xFFFF0 (via the movz instruction), allowing convenient allocation of stack frames up to just shy of a megabyte in size. Since the default stack size is a megabyte, this is sufficient to cover all typical usages.
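The arithmetic behind those limits is small enough to check by hand, but here it is in Python anyway. This is only an illustration: `paragraphs_for_frame` is a hypothetical helper, and it assumes the compiler wants the count to fit in the 16-bit immediate of a single unshifted movz.

```python
MOVZ_IMM_MAX = 0xFFFF   # 16-bit immediate of an unshifted movz
PARAGRAPH = 16          # bytes per paragraph


def paragraphs_for_frame(frame_bytes):
    """Return the value to load into x15 for a frame of the given size.

    Raises ValueError if the size is not 16-byte aligned or if the
    paragraph count does not fit in a 16-bit immediate.
    """
    if frame_bytes % PARAGRAPH != 0:
        raise ValueError("frame size must be a multiple of 16")
    paragraphs = frame_bytes // PARAGRAPH
    if paragraphs > MOVZ_IMM_MAX:
        raise ValueError("paragraph count does not fit in 16 bits")
    return paragraphs


# Largest aligned frame if the immediate counted bytes.
largest_in_bytes = MOVZ_IMM_MAX - MOVZ_IMM_MAX % PARAGRAPH
# Largest frame when the immediate counts paragraphs.
largest_in_paragraphs = MOVZ_IMM_MAX * PARAGRAPH

if __name__ == "__main__":
    print(hex(largest_in_bytes), hex(largest_in_paragraphs))
```

The paragraph-based maximum falls exactly 16 bytes short of a megabyte, which is where "just shy" comes from.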
Here’s an example of how a function might use chkstk in its prologue:
```
    mov     x15, #17328/16  ; desired stack frame size divided by 16
    bl      chkstk          ; ensure enough stack space available
    sub     sp, sp, x15, lsl #4 ; reserve the stack space
```
Okay, so let’s summarize all of the different stack limit checks into a table, because people like tables.
| | x86-32 | MIPS | PowerPC | Alpha AXP | x86-64 | AArch64 |
|---|---|---|---|---|---|---|
| unit requested | Bytes | Bytes | Negative bytes | Bytes | Bytes | Paragraphs |
| adjusts stack pointer before returning | Yes | No | No | No | No | No |
| detects stack placement at runtime | No | Yes | Yes | Yes | Yes | Yes |
| short-circuits | No | Yes | Yes | Yes | Yes | No |
| probe operation | Read | Write | Read | Write | Either | Read |
As we discussed earlier, if the probe operation is a write, then short-circuiting is mandatory.
¹ If you’re paying close attention, you may have noticed that PAGE_SIZE is too large to fit in a 12-bit immediate constant. No problem, because the assembler rewrites it as
```
    sub     x17, x17, #PAGE_SIZE/4096, lsl #12
```
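The rule at work is that the immediate form of add and sub takes a 12-bit unsigned value, optionally shifted left by 12. Here is a small Python sketch of that encodability check. The function name `encode_addsub_imm` is made up for this illustration, and it models only the immediate field, not the rest of the instruction.

```python
def encode_addsub_imm(value):
    """Return (imm12, shift) if value fits an add/sub immediate, else None."""
    if 0 <= value <= 0xFFF:
        return (value, 0)               # fits directly, no shift needed
    if value & 0xFFF == 0 and 0 < (value >> 12) <= 0xFFF:
        return (value >> 12, 12)        # fits as imm12 with lsl #12
    return None                         # needs to be loaded some other way


if __name__ == "__main__":
    # 4KB pages, 64KB pages, and a value that fits neither form.
    for candidate in (0x1000, 0x10000, 0x1001):
        print(hex(candidate), encode_addsub_imm(candidate))
```

Any page size that is a multiple of 4096 and below 16MB passes this check, so the footnoted instruction works for 4KB pages as well as for larger ones.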