The AArch64 processor (aka arm64), part 1: Introduction

The 64-bit version of the ARM architecture is formally known as AArch64. It is the 64-bit version of classic 32-bit ARM, which has been retroactively renamed AArch32.

Even though the architecture formally goes by the name AArch64, many people (including Windows) call it arm64. Even more confusing, the instruction set is called A64. (The 32-bit ARM instruction sets have also been retroactively renamed: Classic ARM is now called A32, and Thumb-2 is now called T32.)

AArch64 differs from AArch32 so much that I’m going to cover it fresh rather than treating it as an extension of AArch32. That said, I will nevertheless call out notable points of difference from AArch32.

No more Thumb mode

AArch64 is an extension of the classic ARM instruction set, not an extension of Thumb-2. So we’re back to fixed-size 32-bit instructions (aligned on 4-byte boundaries). No more gymnastics with low registers and high registers, or using non-intuitive instructions to avoid a 32-bit encoding, or remembering to set the bottom bit on code addresses to avoid accidentally switching into classic mode.

A note for those familiar with the classic ARM instruction set: One thing that did not get carried forward was arbitrary predication. The answers to this StackOverflow question dig into the reasons why predication was removed. Short version: Predication is rarely used, it consumes a lot of opcode space, it doesn’t interact well with out-of-order execution, and branch prediction is almost as good.

Data sizes

The architectural terms for data sizes are the same as AArch32.

Term	Size
byte	8 bits
halfword	16 bits
word	32 bits
doubleword	64 bits

The processor supports both big-endian and little-endian operation. Windows uses it exclusively in little-endian mode. AArch64 lost the Aarch32 SETEND instruction for switching endianness from user mode. Not that Windows supported it anyway.

Registers

Everything has doubled. The general-purpose registers are now 64 bits wide instead of 32. And the number of such registers has doubled from 16 to 32 okay just 31. The encoding that would correspond to register 31 has been reused for other purposes. So not quite doubled.

Register	Preserved?	Notes
`x0`	No	Parameter 1, return value
`x1`	No	Parameter 2
`x2`	No	Parameter 3
`x3`	No	Parameter 4
`x4`	No	Parameter 5
`x5`	No	Parameter 6
`x6`	No	Parameter 7
`x7`	No	Parameter 8
`x8`	No
`x9`	No
`x10`	No
`x11`	No
`x12`	No
`x13`	No
`x14`	No
`x15`	No
`x16` (`xip0`)	Volatile	Intra-procedure call scratch register
`x17` (`xip1`)	Volatile	Intra-procedure call scratch register
`x18` (`xpr`)	read-only	TEB
`x19`	Yes
`x20`	Yes
`x21`	Yes
`x22`	Yes
`x23`	Yes
`x24`	Yes
`x25`	Yes
`x26`	Yes
`x27`	Yes
`x28`	Yes
`x29` (`fp`)	Yes	frame pointer
`x30` (`lr`)	No	link register
register “31” usually represents `sp` or `zr`, depending on instruction

The link register is architectural; the rest are convention.

You can refer to the least significant 32 bits of each 64-bit register by changing the leading x to a w, so we have w0 through w30. If an instruction targets a w register, the result is zero-extended to fill the x register.¹

Particularly notable is that the stack pointer sp and program counter pc are no longer general-purpose registers, like they were in AArch32. The registers still exist, but they are treated as special registers rather than being encoded in the same way as the other general-purpose registers.

In AArch64, the pc special register reads as the address of the instruction being executed, rather than being four bytes ahead, as it was in AArch32. The extra +4 in AArch32 was an artifact of the internal pipelining of the original ARM and became a backward compatibility constraint even as the pipeline depth changed.

Windows requires that the stack remain 16-byte aligned, and it enables hardware enforcement of this requirement. The 32-bit subregister of sp is called wsp, although it is of no practical use. (The 64-bit register is still called sp, not xsp. Go figure.)

There is a 16-byte red zone below the stack pointer, but it’s reserved for code analysis. Intrusive profilers inject assembly language fragments into compiled code to update profiling information, and they need some space to store two registers so they can free up some registers to do their profiling work.

The xip0 and xip1 registers are volatile because they are used to assist with branch instructions that try to branch to an address that is out of range. We’ll see later that these registers are also used by function prologues and epilogues.

There is a new xzr pseudo-register (and its 32-bit alias wzr) which reads as zero, and writes are ignored. As I noted in the above table, if an instruction encodes a register number of 31, then a special behavior kicks in, typically by treating mythical register 31 as an alias for sp or zr. Generally speaking, when being used as a base address register, imaginary register 31 represents sp, but when used for arithmetic or as a destination register, it represents zr.²

In instruction descriptions, I will use these shorthands:

Shorthand	Meaning
`Xn`	Any `x#` register
`Xn/zr`	Any `x#` register or `xzr`
`Xn/sp`	Any `x#` register or `sp`
`Wn`	Any `w#` register
`Wn/zr`	Any `w#` register or `wzr`
`Wn/sp`	Any `w#` register or `wsp`
`Rn`	Any `x#` or `w#` register
`Rn/zr`	Any `x#` register, `w#` register, `xzr` or `wzr`

The floating point registers have been reorganized. They have doubled in size (to 128 bits) as well as in number, and the single-precision registers are no longer paired up.

Register	Preserved?	Notes
`v0`	No	Parameter 1, return value
`v1`	No	Parameter 2
`v2`	No	Parameter 3
`v3`	No	Parameter 4
`v4`	No	Parameter 5
`v5`	No	Parameter 6
`v6`	No	Parameter 7
`v7`	No	Parameter 8
`v8` through `v15`	Low 64 bits only	Upper 64 bits are not preserved
`v16` through `v31`	No

Each floating point register can be viewed in multiple ways. The partial registers are stored in the least significant bits of the full register.

Name	Meaning	Notes
`v#`	SIMD vector
`q#`	128-bit value	quad precision
`d#`	64-bit value	double precision
`s#`	32-bit value	single precision
`h#`	16-bit value	half precision
`b#`	8-bit value

The flags register is formally known as the Application Program Status Register (APSR). The flags available to user mode are the same as in AArch32:

Mnemonic	Meaning	Notes
N	Negative	Set if the result is negative
Z	Zero	Set if the result is zero
C	Carry	Multiple purposes
V	Overflow	Signed overflow
Q	Saturation	Accumulated overflow
GE[n]	Greater than or equal to	4 flags (SIMD)

The overflow flag records whether the most recent operation resulted in signed overflow. The saturation flag is used by multimedia instructions to accumulate whether any overflow occurred since it was last cleared. The GE flags record the result of SIMD operations. By convention, flags are not preserved across calls.

There are a number of AArch64 features that you are extremely unlikely to see in Windows code, such as tagged pointers, tagged memory, and pointer authentication, so I won’t cover them here. I also won’t cover floating point instructions or SIMD instructions.

Next time, we’ll look at some of the weird transformations that can be performed inside an instruction.

Additional references:

Code in ARM Assembly: Registers explained. An analogous series looking at AArch64 from the Apple point of view rather than Windows.
Writing ARM64 Code for Apple Platforms: The Apple ABI specification for AArch64.

¹ The Windows debugger isn’t quite sure which name to use for these registers. The disassembler calls the registers xip0, xip1, and xpr, but the expression evaluator doesn’t understand those names; you have to call them @x16, @x17, and @x18. On the other hand, the expression evaluator does understand @fp and @lr and refuses to acknowledge the existence of the names @x29 and @x30. Furthermore, the expression evaluator doesn’t understand any of the w aliases.

² AArch64’s register 31 is similar to PowerPC’s register 0, which changes meaning depending on the instruction. In PowerPC assembly, it was on you to keep track of which encodings treat register 0 as a value register, and which treat it as a zero register. At least AArch64 expresses the two cases differently: If an encoding uses pseudo-register 31 to mean sp, then you really must write sp. If you write xzr, you get an error.

PowerPC on the other hand would happily let you specify r0 even if the instruction treats it as zero. Which was one of the jokes from the short-lived parody twitter account that mocked PowerPC.

mscdfr – Means Something Completely Different For r0

— PowerPC Instructions (@ppcinstructions) January 21, 2015

Pedro Justo

July 26, 2022 · Edited

Funny the article has the link to Apple's ABI but is missing the link to Windows ABI. Here it is:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170

Plus the zoom-in on the Exception Handling portion of the ABI:
https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=msvc-170

And the shameless plug to the Arm64EC ABI:
https://docs.microsoft.com/en-us/windows/arm/arm64ec-abi

Small note on register assignment table:
- x8 is used for returning large types by value.
- x15 is used for __chkstk.

Final note: Latest builds of Windows available through the Windows Insiders programs already have Pointer Authentication enabled. So it is quite *likely* you'll start seeing it since it will show up on every non-leaf function's prolog and epilog.

4 comments

Discussion is closed. Login to edit/delete existing comments.

Dougall J July 27, 2022 · Edited

I think AArch64 removed the Q and GE[n] flags too. The Application Program Status Register documentation states:

“The APSR exists only in AArch32 state. In AArch64 state, the Q and GE flags cannot be read or written to, and the condition flags are held in a special-purpose register called the NZCV register.”
Jonathan Harston July 27, 2022

A few years ago I helped in a project to write an ARM64 assembler. Yeah gods, what have they done to the instruction encoding? The encoding is this, except sometimes it’s this, or occasionally this. And there’s bits where orthogonality would give a chunk of “null” operations, but instead they stuff extra instructions into those encodings, meaning your assembler has to overthink and check itself to filter out an assembly statement that encodes into a stuffed instruction.
Pedro Justo July 26, 2022 · Edited

Funny the article has the link to Apple's ABI but is missing the link to Windows ABI. Here it is:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170

Plus the zoom-in on the Exception Handling portion of the ABI:
https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=msvc-170

And the shameless plug to the Arm64EC ABI:
https://docs.microsoft.com/en-us/windows/arm/arm64ec-abi

Small note on register assignment table:
- x8 is used for returning large types by value.
- x15 is used for __chkstk.

Final note: Latest builds of Windows available through the Windows Insiders programs already have Pointer Authentication enabled. So it is quite *likely* you'll start seeing it since it will show up on every non-leaf function's prolog and epilog.

Read more
Funny the article has the link to Apple’s ABI but is missing the link to Windows ABI. Here it is:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170

Plus the zoom-in on the Exception Handling portion of the ABI:
https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling?view=msvc-170

And the shameless plug to the Arm64EC ABI:
https://docs.microsoft.com/en-us/windows/arm/arm64ec-abi

Small note on register assignment table:
– x8 is used for returning large types by value.
– x15 is used for __chkstk.

Final note: Latest builds of Windows available through the Windows Insiders programs already have Pointer Authentication enabled. So it is quite *likely* you’ll start seeing it since it will show up on every non-leaf function’s prolog and epilog.

Read less
Matt D. July 26, 2022

Nice to see a new series on an ISA!

Application Binary Interface for the Arm Architecture is also a good reference, https://github.com/ARM-software/abi-aa/
Particularly AAPCS64, Procedure Call Standard for the Arm 64-bit Architecture (AArch64), https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst

The AArch64 processor (aka arm64), part 1: Introduction

Author

4 comments

Read next

The AArch64 processor (aka arm64), part 2: Extended register operations

The AArch64 processor (aka arm64), part 3: Addressing modes

Author

4 comments

Read next

The AArch64 processor (aka arm64), part 2: Extended register operations

The AArch64 processor (aka arm64), part 3: Addressing modes

Stay informed