The AArch64 processor (aka arm64), part 12: Memory access and alignment

Accessing memory is done primarily through load and store instructions.

    ; load word or doubleword register
    ldr     Rn/zr, [...]

    ; load unsigned byte
    ldrb    Wn/zr, [...]

    ; load signed byte
    ldrsb   Rn/zr, [...]

    ; load unsigned halfword
    ldrh    Wn/zr, [...]

    ; load signed halfword
    ldrsh   Rn/zr, [...]

    ; load signed word
    ldrsw   Xn/zr, [...]

    ; load pair of registers
    ldp     Rd1/zr, Rd2/zr, [...]

    ; load pair of registers as signed word
    ldpsw   Xd1/zr, Xd2/zr, [...]

AArch64 does not have AArch32’s LDM instruction for loading up to 13 registers at once. As a consolation present, it gives you a LDP instruction for loading two registers, either 32-bit or 64-bit, from consecutive bytes of memory. (The first register uses the lower address.) The LDP instruction is commonly used with the 64-bit registers to load spilled registers from the stack.

There is a corresponding selection of instructions for storing to memory, but obviously the sign extension variations are not relevant.

    ; store word or doubleword register
    str     Rn/zr, [...]

    ; store byte
    strb    Wn/zr, [...]

    ; store halfword
    strh    Wn/zr, [...]

    ; store pair of registers
    stp     Rd1/zr, Rd2/zr, [...]

Not all addressing modes are available for all variations. This is not something you worry about when reading assembly language, but it’s something you need to keep in mind when writing it.

Size	`[Xn/sp, #imm]` (−256 … +255)	`[Xn/sp, #imm]` `[Xn/sp, #imm]!` `[Xn/sp], #imm`	`[pc, #imm]` (±1MB)	`[Xn/sp, Rn/zr, extend]`
byte	•	•		•
halfword	•	•		•
word	•	•	loads only	•
doubleword	•	•	loads only	•
pair		•

The reach of the second column is is (0 … 4095) × size, except that the reach of the the register pairs is (−64 … 63) × size.

All operand sizes support register indirect with offset. Only word and doubleword support pc-relative (and even those are supported only for loads). And register pairs support only register indirect with offset.

There are some ambiguous encodings, because a constant offset in the range 0 … 255 that is a multiple of the operand size can be encoded either as a 9-bit signed byte offset, or as a 12-bit unsigned element offset. By default, assemblers will use the 12-bit unsigned element offset, but you can force the 9-bit signed byte offset by changing the opcode from LDxxx and STxxx to LDUxxx and STUxxx. The U stands for unscaled.

Windows enables automatic unaligned access fixups. Simple unaligned memory accesses are fixed up automatically by the processor, but you lose atomicity: It is possible for an unaligned memory access to read a torn value. Any such tearing is at the byte level.

Original value	`12`	`34`	`56`	`78`	aligned

Processor 1 reads					misaligned

Processor 2 writes	`AB`	`CD`	`EF`	`01`	aligned

You can still take alignment faults if the misaligned memory access is fancy, such as a locked load, store exclusive, or a load with a memory barrier. We’ll learn about these special memory accesses next time.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

The AArch64 processor (aka arm64), part 12: Memory access and alignment

Author

0 comments

Read next

The AArch64 processor (aka arm64), part 13: Atomic access

The AArch64 processor (aka arm64), part 14: Barriers

Author

0 comments

Read next

The AArch64 processor (aka arm64), part 13: Atomic access

The AArch64 processor (aka arm64), part 14: Barriers

Stay informed