The AArch64 processor (aka arm64), part 19: Miscellaneous instructions
There are far more instructions than I’m going to cover here in this series. I’ve skipped over the floating point instructions, the SIMD instructions, and specialty instructions that I haven’t yet seen come out of the compiler. I’m also largely skipping over the instructions that are not part of the core instruction set but are available only in optional extensions.
Here are a few that are still interesting, even if I haven’t seen the compiler generate them.
    ; count leading zeroes (high order bits)
    clz     Rd, Rm      ; Rd = number of leading zeroes in Rm

    ; count leading sign bits (high order bits)
    cls     Rd, Rm      ; Rd = number of leading sign bits in Rm

    ; reverse bits in register
    rbit    Rd, Rm      ; Rd = Rm bitwise reversed

    ; reverse bytes in register
    rev     Rd, Rm      ; Rd = Rm bytewise reversed

    ; reverse bytes in each halfword
    rev16   Rd, Rm

    ; reverse bytes in each word
    rev32   Rd, Rm

    ; reverse bytes in doubleword
    ; (pseudo-instruction, equivalent to rev with 64-bit register)
    rev64   Rd, Rm
A few miscellaneous bit-fiddling instructions. The reversal instructions are primarily for changing data endianness. AArch64 lost the REVSH instruction from AArch32.
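To make the semantics of these instructions concrete, here is a minimal sketch of portable C reference versions. The function names (`clz32`, `cls32`, `rbit32`, `rev_w`, `rev16_w`, `rev32_x`, `rev_x`) are made up for illustration and are not intrinsics from any library. The sketch assumes that `clz` of zero yields the register width and that `cls` counts the bits after the sign bit that match it, which is my reading of the architecture.

```c
#include <stdint.h>

/* clz Wd, Wm: number of leading zero bits; 32 if the input is zero. */
static inline unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* cls Wd, Wm: number of bits below the sign bit that equal the sign bit (0..31). */
static inline unsigned cls32(uint32_t x)
{
    /* Bit i of y is set where bit i and bit i+1 of x differ (i = 0..30). */
    uint32_t y = (x ^ (x >> 1)) & UINT32_C(0x7FFFFFFF);
    return clz32(y) - 1;    /* y only has 31 meaningful bits */
}

/* rbit Wd, Wm: reverse the order of all 32 bits. */
static inline uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* rev Wd, Wm: reverse the four bytes of a 32-bit register. */
static inline uint32_t rev_w(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & UINT32_C(0x0000FF00)) |
           ((x << 8) & UINT32_C(0x00FF0000)) | (x << 24);
}

/* rev16 Wd, Wm: swap the two bytes inside each 16-bit halfword. */
static inline uint32_t rev16_w(uint32_t x)
{
    return ((x & UINT32_C(0x00FF00FF)) << 8) | ((x >> 8) & UINT32_C(0x00FF00FF));
}

/* rev32 Xd, Xm: reverse the bytes inside each 32-bit word of a 64-bit register. */
static inline uint64_t rev32_x(uint64_t x)
{
    return ((uint64_t)rev_w((uint32_t)(x >> 32)) << 32) | rev_w((uint32_t)x);
}

/* rev Xd, Xm (also spelled rev64): reverse all eight bytes. */
static inline uint64_t rev_x(uint64_t x)
{
    return ((uint64_t)rev_w((uint32_t)x) << 32) | rev_w((uint32_t)(x >> 32));
}
```

For example, `rev_w(0x11223344)` gives `0x44332211`, which is the endianness swap the reversal instructions are mostly used for.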
The next few instructions provide multiprocessing hints.
    ; yield to other threads
    yield

    ; wait for interrupt (privileged instruction)
    wfi
The YIELD instruction is a hint to multi-threading processors that the current thread should be de-prioritized in favor of other threads. You typically see this instruction dropped into spin loops, usually by way of a compiler intrinsic.
The WFI instruction instructs the processor to go into a low-power state until an interrupt occurs. There are other instructions related to “events” which I won’t bother going into.
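The spin-loop use of YIELD described above might look something like the following sketch. Everything named here (`cpu_relax`, `demo_spinlock`, `spin_init`, `spin_try_lock`, `spin_lock`, `spin_unlock`) is hypothetical and exists only for illustration. It assumes a GCC- or Clang-style inline assembler when building for AArch64 and quietly degrades to a plain busy-wait everywhere else.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit YIELD on AArch64 (GCC/Clang syntax assumed),
 * and do nothing on other targets so the sketch still builds. */
static inline void cpu_relax(void)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("yield" ::: "memory");
#endif
}

/* Hypothetical minimal spin lock built on a C11 atomic_flag. */
typedef struct { atomic_flag held; } demo_spinlock;

static inline void spin_init(demo_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now ours. */
static inline bool spin_try_lock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static inline void spin_lock(demo_spinlock *l)
{
    /* While somebody else holds the lock, hint that we are just waiting. */
    while (!spin_try_lock(l))
        cpu_relax();
}

static inline void spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The hint does not change what the loop computes. It only tells the processor that this thread is spinning, so a core that runs several hardware threads can give the others more of its resources.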
The next few instructions are for communicating with the operating system:
    hlt     #imm16      ; halt
    svc     #imm16      ; system call
    brk     #imm16      ; software breakpoint
    udf     #imm16      ; undefined opcode
The instructions carry a 16-bit immediate that the operating system can choose to use for whatever purpose it desires.
The undefined opcode is a range of instructions from 0x00000000 through 0x0000ffff that are architecturally set aside as permanently undefined instructions.¹
The operating system assigns specific BRK immediates to special operations.
    brk     #0xf000     ; breakpoint
    brk     #0xf001     ; assertion failure
    brk     #0xf002     ; debug service
    brk     #0xf003     ; fastfail
    brk     #0xf004     ; divide by zero
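To see where the 16-bit immediate lives in these instructions, here is a small sketch that builds the 32-bit instruction words. The helper names (`encode_udf`, `encode_svc`, `encode_brk`, `encode_hlt`, `is_udf`) are invented for this example. The base opcodes for SVC, BRK, and HLT come from my reading of the Arm architecture manual rather than from this article, so treat them as assumptions to check against the official encoding tables.

```c
#include <stdint.h>

/* udf #imm16: the immediate is the whole instruction word (0x0000xxxx). */
static inline uint32_t encode_udf(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* svc/brk/hlt #imm16: the immediate occupies bits 20:5 of the word.
 * Base opcodes are assumptions taken from the architecture manual. */
static inline uint32_t encode_svc(uint16_t imm16)
{
    return UINT32_C(0xD4000001) | ((uint32_t)imm16 << 5);
}

static inline uint32_t encode_brk(uint16_t imm16)
{
    return UINT32_C(0xD4200000) | ((uint32_t)imm16 << 5);
}

static inline uint32_t encode_hlt(uint16_t imm16)
{
    return UINT32_C(0xD4400000) | ((uint32_t)imm16 << 5);
}

/* True if the word falls in the permanently undefined range. */
static inline int is_udf(uint32_t insn)
{
    return (insn >> 16) == 0;
}
```

Under those assumptions, `encode_brk(0xF000)` comes out as `0xD43E0000`, and an all-zero word satisfies `is_udf`, so a jump into zero-filled memory traps immediately.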
The divide-by-zero breakpoint is raised by code the compiler emits ahead of a division: it tests the denominator at run time and executes the BRK if the denominator is zero.
        cbnz    w2, @F      ; jump if denominator is nonzero
        brk     #0xf004     ; oops: manually raise div0 exception
    @@: sdiv    w0, w1, w2  ; signed divide
And of course, we have this guy:
    nop
The NOP instruction does nothing but occupy space. Use it to pad code to meet alignment requirements, but do not use it for timing.
Now that we have the basic instruction set under our belt, we’ll look at the calling convention next time.
¹ This means that zero encodes udf #0, which will trap on an invalid instruction. This is different from classic ARM, where zero encodes andeq r0, r0, r0 which is functionally a nop, and Thumb-2, where zero is the movs r0, r0 instruction, which is a mostly-nop except that it sets flags. Personally, I’m a big fan of having zero encode an invalid instruction. It helps post-mortem debugging a lot.
It’s really interesting that ARM carries on the BRK mnemonic from the 6502 (together with the “hidden” immediate, more on this below) but does not assign it the opcode 0x00. I agree that it is a big help to have a trap, breakpoint or invalid opcode at zero. Zero is, by far, the most frequent byte value in data memory, so this guarantees that any jump into the wild will break into the debugger almost immediately. It is especially useful when the OS or firmware has a built-in debugger, such as the Apple II’s Monitor.
About the hidden parameter. The BRK instruction is treated like a software interrupt. It causes the PC (program counter) and P (status) registers to be pushed on the stack, like a hardware interrupt (arranged so the RTI instruction can resume execution). Normally, the address pushed is that of the *next* instruction (the point where the execution should resume). But after a BRK, the address pushed onto the stack is *two* bytes after the one containing the 0x00 opcode. That suggests that the chip’s designers intended to have a 1-byte immediate to the BRK instruction to indicate a trap code, but failed to document it.
From right up-to-date to a blast from the past: https://scarybeastsecurity.blogspot.com/2020/11/reverse-engineering-forgotten-1970s.html
This old Intel 8271 disk controller also had a yield instruction and some curious task management built in.
A fascinating read to go alongside this fascinating read.
Why 64k undefined instructions? Seems like a lot. For signaling maybe? But there’s already svc and brk…
In C++ and similar languages, many “null pointer” accesses don’t actually touch address zero. Instead, they land at small offsets from zero.
For example, in C++ there’s a jump table (vtable) holding the address of the actual ‘final’ function that will be called for each virtual function.
When you call a virtual function on a null object, the CPU ends up calling through some entry in a null jump table, at an address of something like zero + (8 * N) + offset.
A whole range also guards against a lot of other bad values that really occur, e.g. loop indices, which are relatively small most of the time.