The AArch64 processor (aka arm64), part 19: Miscellaneous instructions

Raymond Chen

There are far more instructions than I’m going to cover here in this series. I’ve skipped over the floating point instructions, the SIMD instructions, and specialty instructions that I haven’t yet seen come out of the compiler. I’m also largely skipping over the instructions that are not part of the core instruction set but are available only in optional extensions.

Here are a few that are still interesting, even if I haven’t seen the compiler generate them.

    ; count leading zeroes (high order bits)
    clz     Rd, Rm          ; Rd = number of leading zeroes in Rm

    ; count leading sign bits (high order bits)
    cls     Rd, Rm          ; Rd = number of leading sign bits in Rm

    ; reverse bits in register
    rbit    Rd, Rm          ; Rd = Rm bitwise reversed

    ; reverse bytes in register
    rev     Rd, Rm          ; Rd = Rm bytewise reversed

    ; reverse bytes in each halfword
    rev16   Rd, Rm

    ; reverse bytes in each word
    rev32   Rd, Rm

    ; reverse bytes in doubleword
    ; (pseudo-instruction, equivalent to rev with 64-bit register)
    rev64   Rd, Rm

A few miscellaneous bit-fiddling instructions. The reversal instructions are primarily for changing data endianness. AArch64 lost the REVSH instruction from AArch32.
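To make the semantics concrete, here is a minimal sketch in C that models what these instructions compute. The function names are made up for illustration and are not compiler intrinsics; the `_w` and `_x` suffixes are my shorthand for the 32-bit and 64-bit register forms. Note that the leading-sign-bit count does not include the sign bit itself.

```c
#include <stdint.h>

/* clz (32-bit): count zero bits starting from the most significant bit.
   A zero input yields the full register width, 32. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (UINT32_C(1) << (31 - n)))) {
        n++;
    }
    return n;
}

/* cls (32-bit): count the bits below the sign bit that match the sign bit.
   XORing each bit with its upper neighbor turns "same as neighbor" into 0,
   so the answer is a leading-zero count over the resulting 31-bit value. */
unsigned cls32(uint32_t x)
{
    uint32_t diff = (x >> 1) ^ (x & UINT32_C(0x7fffffff));
    return clz32(diff) - 1;   /* bit 31 of diff is always zero */
}

/* rbit (32-bit): bit 0 becomes bit 31, bit 1 becomes bit 30, and so on. */
uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* rev with a 32-bit register: reverse all four bytes. */
uint32_t rev_w(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & UINT32_C(0x0000ff00)) |
           ((x << 8) & UINT32_C(0x00ff0000)) | (x << 24);
}

/* rev16 with a 32-bit register: swap the two bytes within each halfword. */
uint32_t rev16_w(uint32_t x)
{
    return ((x >> 8) & UINT32_C(0x00ff00ff)) | ((x << 8) & UINT32_C(0xff00ff00));
}

/* rev32 with a 64-bit register: reverse the bytes within each 32-bit word. */
uint64_t rev32_x(uint64_t x)
{
    return ((uint64_t)rev_w((uint32_t)(x >> 32)) << 32) | rev_w((uint32_t)x);
}

/* rev (or rev64) with a 64-bit register: reverse all eight bytes. */
uint64_t rev_x(uint64_t x)
{
    return ((uint64_t)rev_w((uint32_t)x) << 32) | rev_w((uint32_t)(x >> 32));
}
```

For example, `rev_w(0x11223344)` produces `0x44332211`, whereas `rev16_w(0x11223344)` produces `0x22114433`.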

The next few instructions provide multiprocessing hints.

    ; yield to other threads
    yield

    ; wait for interrupt (privileged instruction)
    wfi

The YIELD instruction is a hint to multi-threading processors that the current thread should be de-prioritized in favor of other threads. You typically see this instruction dropped into spin loops, via the intrinsic __yield().
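Here is a rough sketch of what such a spin loop might look like in C. The `spinlock` type, the `spin_*` functions, and the `cpu_yield` macro are all hypothetical names invented for this example. The macro uses `__yield()` when building for ARM64 with the Microsoft compiler, falls back to inline assembly for GCC and clang on AArch64, and expands to nothing elsewhere. A production lock would probably want a back-off strategy rather than spinning forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit the YIELD hint where we know how to. */
#if defined(_MSC_VER) && defined(_M_ARM64)
  #include <intrin.h>
  #define cpu_yield() __yield()
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  #define cpu_yield() __asm__ volatile("yield" ::: "memory")
#else
  #define cpu_yield() ((void)0)   /* no hint available on this target */
#endif

/* Hypothetical minimal spin lock built on a C11 atomic flag. */
typedef struct {
    atomic_flag held;
} spinlock;

/* Returns true if the lock was free and is now ours. */
bool spin_try_lock(spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire);
}

void spin_lock(spinlock *lock)
{
    while (!spin_try_lock(lock)) {
        cpu_yield();   /* tell the processor we are just spinning */
    }
}

void spin_unlock(spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

The hint goes inside the loop body so that it executes once per failed attempt to take the lock.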

The WFI instruction instructs the processor to go into a low-power state until an interrupt occurs. There are other instructions related to “events” which I won’t bother going into.

The next few instructions are for communicating with the operating system:

        hlt     #imm16      ; halt
        svc     #imm16      ; system call
        brk     #imm16      ; software breakpoint
        udf     #imm16      ; undefined opcode

The instructions carry a 16-bit immediate that the operating system can choose to use for whatever purpose it desires.

The undefined opcode is a range of instructions from 0x00000000 through 0x0000ffff that are architecturally set aside as permanently undefined instructions.¹
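If you are writing a tool that inspects instruction words, the range check is a single mask. A small sketch, with function names I made up for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 32-bit instruction word falls in the permanently
   undefined range 0x00000000 through 0x0000ffff. */
bool is_udf(uint32_t insn)
{
    return (insn & UINT32_C(0xffff0000)) == 0;
}

/* The 16-bit immediate is simply the low half of the instruction word.
   Only meaningful if is_udf(insn) is true. */
uint16_t udf_imm16(uint32_t insn)
{
    return (uint16_t)(insn & UINT32_C(0xffff));
}
```

A word of all zero bits satisfies `is_udf` and decodes as `udf #0`.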

Windows uses BRK for special operations.

        brk     #0xf000     ; breakpoint
        brk     #0xf001     ; assertion failure
        brk     #0xf002     ; debug service
        brk     #0xf003     ; fastfail
        brk     #0xf004     ; divide by zero
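As an illustration, here is a sketch in C that recognizes these instructions in memory. It relies on the A64 encoding of BRK, which puts the 16-bit immediate in bits 20 through 5 of the base opcode `0xD4200000`. The function and macro names are hypothetical, and the table covers only the five codes listed above.

```c
#include <stddef.h>
#include <stdint.h>

#define BRK_BASE UINT32_C(0xD4200000)   /* brk #0 */
#define BRK_MASK UINT32_C(0xFFE0001F)   /* every bit except the imm16 field */

/* Build the instruction word for brk #imm16. */
uint32_t encode_brk(uint16_t imm16)
{
    return BRK_BASE | ((uint32_t)imm16 << 5);
}

/* Describe a Windows BRK code. Returns NULL if the word is not a BRK
   instruction or if its code is not one of the five listed above. */
const char *windows_brk_name(uint32_t insn)
{
    if ((insn & BRK_MASK) != BRK_BASE) {
        return NULL;
    }
    switch ((insn >> 5) & 0xffff) {
    case 0xf000: return "breakpoint";
    case 0xf001: return "assertion failure";
    case 0xf002: return "debug service";
    case 0xf003: return "fastfail";
    case 0xf004: return "divide by zero";
    default:     return NULL;
    }
}
```

For example, `encode_brk(0xf000)` is `0xD43E0000`, and passing that word to `windows_brk_name` returns `"breakpoint"`.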

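The compiler emits a run-time check ahead of a division and raises the divide-by-zero breakpoint if the denominator turns out to be zero. The check is needed because the division instructions themselves do not trap on a zero denominator; they just produce zero.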

    cbnz    w2, @F          ; jump if denominator is nonzero
    brk     #0xf004         ; oops: manually raise div0 exception
@@: sdiv    w0, w1, w2      ; signed divide
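Here is a sketch in C of why the explicit test is there. The first function models what a bare 32-bit SDIV does: it never traps, a zero denominator produces zero, and the one overflowing case (the most negative value divided by −1) produces the most negative value again. The second function models the compiler's sequence, with a `trapped` flag standing in for the `brk #0xf004`. Both function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a bare 32-bit sdiv: it never raises an exception. */
int32_t sdiv_model(int32_t numerator, int32_t denominator)
{
    if (denominator == 0) {
        return 0;                     /* divide by zero quietly yields zero */
    }
    if (numerator == INT32_MIN && denominator == -1) {
        return INT32_MIN;             /* overflow wraps instead of trapping */
    }
    return numerator / denominator;   /* truncates toward zero, like sdiv */
}

/* Model of the compiler-generated sequence: test first, then divide.
   *trapped is set where the real code would execute brk #0xf004. */
int32_t checked_div(int32_t numerator, int32_t denominator, bool *trapped)
{
    *trapped = (denominator == 0);
    if (*trapped) {
        return 0;   /* placeholder result; the real code raises the exception here */
    }
    return sdiv_model(numerator, denominator);
}
```

Without the check, `sdiv_model(7, 0)` shows what the program would see: a result of zero and no indication that anything went wrong.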

And of course, we have this guy:

    nop

The NOP instruction does nothing but occupy space. Use it to pad code to meet alignment requirements, but do not use it for timing.
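The padding arithmetic is simple because every A64 instruction is four bytes. Here is a sketch in C of what a code generator might do, using the NOP encoding `0xD503201F`. The function names are hypothetical, and the sketch assumes that the alignment is a power of two no smaller than 4 and that the offset is already a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

#define A64_NOP UINT32_C(0xD503201F)   /* encoding of nop */

/* Number of NOPs needed to advance a byte offset to the next multiple of
   align. Assumes align is a power of two >= 4 and offset is a multiple of 4. */
size_t nops_to_align(size_t offset, size_t align)
{
    size_t misalignment = offset & (align - 1);
    return ((align - misalignment) & (align - 1)) / 4;
}

/* Write the padding into a buffer of instruction words, starting at the
   given byte offset. Returns the new, aligned byte offset. The caller is
   responsible for making sure the buffer is large enough. */
size_t pad_with_nops(uint32_t *code, size_t offset, size_t align)
{
    size_t count = nops_to_align(offset, align);
    for (size_t i = 0; i < count; i++) {
        code[offset / 4 + i] = A64_NOP;
    }
    return offset + count * 4;
}
```

For example, padding from byte offset 4 to a 16-byte boundary takes three NOPs, and an offset that is already aligned takes none.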

Now that we have the basic instruction set under our belt, we’ll look at the calling convention next time.

¹ This means that zero encodes udf #0, which will trap on an invalid instruction. This is different from classic ARM, where zero encodes andeq r0, r0, r0 which is functionally a nop, and Thumb-2, where zero is the movs r0, r0 instruction, which is a mostly-nop except that it sets flags. Personally, I’m a big fan of having zero encode an invalid instruction. It helps post-mortem debugging a lot.



  • Yukkuri Reimu

    Why 64k undefined instructions? Seems like a lot. For signaling maybe? But there’s already svc and brk…

    • Richard Thompson

      In C++ and similar languages, a lot of “null pointers” aren’t actually the value zero. Instead, they’re small offsets from zero.

      For example, in C++ there’s a jump table (vtable) that holds the actual ‘final’ function to call for each virtual function.
      When you call a virtual function on a null object, the CPU calls some entry in a null jump table, which will have an address of something like zero + (8 * N) + offset.

      It also guards against a lot of real bad values, e.g., loop indices are relatively small most of the time.

  • Antonio Rodríguez

    It’s really interesting that ARM carries on the BRK mnemonic from the 6502 (together with the “hidden” immediate, more on this below) but does not assign it the opcode 0x00. I agree that it is a big help to have a trap, breakpoint, or invalid opcode at zero. It is, by far, the most frequent byte value in data memory, and it guarantees that any jump into the wild will break into the debugger almost immediately. It is especially useful when the OS or firmware has a built-in debugger, such as the Apple II’s Monitor.

    About the hidden parameter: The BRK instruction is treated like a software interrupt. It causes the PC (program counter) and P (status) registers to be pushed on the stack, like a hardware interrupt (arranged so the RTI instruction can resume execution). Normally, the address pushed is that of the *next* instruction (the point where the execution should resume). But after a BRK, the address pushed onto the stack is *two* bytes after the one containing the 0x00 opcode. That suggests that the chip’s designers intended to have a 1-byte immediate to the BRK instruction to indicate a trap code, but failed to document it.
