{"id":106898,"date":"2022-07-26T07:00:00","date_gmt":"2022-07-26T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106898"},"modified":"2022-07-26T07:28:21","modified_gmt":"2022-07-26T14:28:21","slug":"20220726-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220726-00\/?p=106898","title":{"rendered":"The AArch64 processor (aka arm64), part 1: Introduction"},"content":{"rendered":"<p>The 64-bit version of the ARM architecture is formally known as AArch64. It is the 64-bit version of classic 32-bit ARM, which has been retroactively renamed AArch32.<\/p>\n<p>Even though the architecture formally goes by the name AArch64, many people (including Windows) call it arm64. Even more confusing, the instruction set is called A64. (The 32-bit ARM instruction sets have also been retroactively renamed: Classic ARM is now called A32, and Thumb-2 is now called T32.)<\/p>\n<p>AArch64 differs from AArch32 so much that I&#8217;m going to cover it fresh rather than treating it as an extension of AArch32. That said, I will nevertheless call out notable points of difference from AArch32.<\/p>\n<p><b>No more Thumb mode<\/b><\/p>\n<p>AArch64 is an extension of the classic ARM instruction set, not an extension of Thumb-2. So we&#8217;re back to fixed-size 32-bit instructions (aligned on 4-byte boundaries). No more gymnastics with low registers and high registers, or using non-intuitive instructions to avoid a 32-bit encoding, or remembering to set the bottom bit on code addresses to avoid accidentally switching into classic mode.<\/p>\n<p>A note for those familiar with the classic ARM instruction set: One thing that did not get carried forward was arbitrary predication. The answers to this StackOverflow question <a href=\"https:\/\/stackoverflow.com\/questions\/22168992\/why-are-conditionally-executed-instructions-not-present-in-later-arm-instruction\"> dig into the reasons why predication was removed<\/a>. 
Short version: Predication is rarely used, it consumes a lot of opcode space, it doesn&#8217;t interact well with out-of-order execution, and branch prediction is almost as good.<\/p>\n<p><b>Data sizes<\/b><\/p>\n<p>The architectural terms for data sizes are the same as AArch32.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Term<\/th>\n<th>Size<\/th>\n<\/tr>\n<tr>\n<td>byte<\/td>\n<td>\u20078 bits<\/td>\n<\/tr>\n<tr>\n<td>halfword<\/td>\n<td>16 bits<\/td>\n<\/tr>\n<tr>\n<td>word<\/td>\n<td>32 bits<\/td>\n<\/tr>\n<tr>\n<td>doubleword<\/td>\n<td>64 bits<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The processor supports both big-endian and little-endian operation. Windows uses it exclusively in little-endian mode. AArch64 lost the AArch32 <code>SETEND<\/code> instruction for switching endianness from user mode. Not that Windows supported it anyway.<\/p>\n<p><b>Registers<\/b><\/p>\n<p>Everything has doubled. The general-purpose registers are now 64 bits wide instead of 32. And the number of such registers has doubled from 16 to <span style=\"text-decoration: line-through;\">32<\/span> okay just 31. The encoding that would correspond to register 31 has been reused for other purposes. 
So not quite doubled.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Register<\/th>\n<th>Preserved?<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>x0<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 1, return value<\/td>\n<\/tr>\n<tr>\n<td><var>x1<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 2<\/td>\n<\/tr>\n<tr>\n<td><var>x2<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 3<\/td>\n<\/tr>\n<tr>\n<td><var>x3<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 4<\/td>\n<\/tr>\n<tr>\n<td><var>x4<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 5<\/td>\n<\/tr>\n<tr>\n<td><var>x5<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 6<\/td>\n<\/tr>\n<tr>\n<td><var>x6<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 7<\/td>\n<\/tr>\n<tr>\n<td><var>x7<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 8<\/td>\n<\/tr>\n<tr>\n<td><var>x8<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x9<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x10<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x11<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x12<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x13<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x14<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x15<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x16<\/var> (<var>xip0<\/var>)<\/td>\n<td>Volatile<\/td>\n<td>Intra-procedure call scratch register<\/td>\n<\/tr>\n<tr>\n<td><var>x17<\/var> (<var>xip1<\/var>)<\/td>\n<td>Volatile<\/td>\n<td>Intra-procedure call scratch register<\/td>\n<\/tr>\n<tr>\n<td><var>x18<\/var> 
(<var>xpr<\/var>)<\/td>\n<td>read-only<\/td>\n<td>TEB<\/td>\n<\/tr>\n<tr>\n<td><var>x19<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x20<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x21<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x22<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x23<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x24<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x25<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x26<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x27<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x28<\/var><\/td>\n<td>Yes<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>x29<\/var> (<var>fp<\/var>)<\/td>\n<td>Yes<\/td>\n<td>frame pointer<\/td>\n<\/tr>\n<tr>\n<td><var>x30<\/var> (<var>lr<\/var>)<\/td>\n<td>No<\/td>\n<td>link register<\/td>\n<\/tr>\n<tr>\n<td colspan=\"3\">register &#8220;31&#8221; usually represents <var>sp<\/var> or <var>zr<\/var>, depending on instruction<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The link register is architectural; the rest are convention.<\/p>\n<p>You can refer to the least significant 32 bits of each 64-bit register by changing the leading <var>x<\/var> to a <var>w<\/var>, so we have <var>w0<\/var> through <var>w30<\/var>. If an instruction targets a <var>w<\/var> register, the result is zero-extended to fill the <var>x<\/var> register.\u00b9<\/p>\n<p>Particularly notable is that the stack pointer <var>sp<\/var> and program counter <var>pc<\/var> are no longer general-purpose registers, like they were in AArch32. 
The registers still exist, but they are treated as special registers rather than being encoded in the same way as the other general-purpose registers.<\/p>\n<p>In AArch64, the <var>pc<\/var> special register reads as the address of the instruction being executed, rather than being four bytes ahead, as it was in AArch32. The extra +4 in AArch32 was an artifact of the internal pipelining of the original ARM and became a backward compatibility constraint even as the pipeline depth changed.<\/p>\n<p>Windows requires that the stack remain 16-byte aligned, and it enables hardware enforcement of this requirement. The 32-bit subregister of <var>sp<\/var> is called <var>wsp<\/var>, although it is of no practical use. (The 64-bit register is still called <var>sp<\/var>, not <var>xsp<\/var>. Go figure.)<\/p>\n<p>There is a 16-byte red zone below the stack pointer, but it&#8217;s reserved for code analysis. Intrusive profilers inject assembly language fragments into compiled code to update profiling information, and they need some space to store two registers so they can free up some registers to do their profiling work.<\/p>\n<p>The <var>xip0<\/var> and <var>xip1<\/var> registers are volatile because they are used to assist with branch instructions that try to branch to an address that is out of range. We&#8217;ll see later that these registers are also used by function prologues and epilogues.<\/p>\n<p>There is a new <var>xzr<\/var> pseudo-register (and its 32-bit alias <var>wzr<\/var>) which reads as zero, and writes are ignored. As I noted in the above table, if an instruction encodes a register number of 31, then a special behavior kicks in, typically by treating mythical register 31 as an alias for <var>sp<\/var> or <var>zr<\/var>. 
Generally speaking, when being used as a base address register, imaginary register 31 represents <var>sp<\/var>, but when used for arithmetic or as a destination register, it represents <var>zr<\/var>.\u00b2<\/p>\n<p>In instruction descriptions, I will use these shorthands:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Shorthand<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><code>Xn<\/code><\/td>\n<td>Any <var>x#<\/var> register<\/td>\n<\/tr>\n<tr>\n<td><code>Xn\/zr<\/code><\/td>\n<td>Any <var>x#<\/var> register or <var>xzr<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>Xn\/sp<\/code><\/td>\n<td>Any <var>x#<\/var> register or <var>sp<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>Wn<\/code><\/td>\n<td>Any <var>w#<\/var> register<\/td>\n<\/tr>\n<tr>\n<td><code>Wn\/zr<\/code><\/td>\n<td>Any <var>w#<\/var> register or <var>wzr<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>Wn\/sp<\/code><\/td>\n<td>Any <var>w#<\/var> register or <var>wsp<\/var><\/td>\n<\/tr>\n<tr>\n<td><code>Rn<\/code><\/td>\n<td>Any <var>x#<\/var> or <var>w#<\/var> register<\/td>\n<\/tr>\n<tr>\n<td><code>Rn\/zr<\/code><\/td>\n<td>Any <var>x#<\/var> register, <var>w#<\/var> register, <var>xzr<\/var> or <var>wzr<\/var><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The floating point registers have been reorganized. 
They have doubled in size (to 128 bits) as well as in number, and the single-precision registers are no longer paired up.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Register<\/th>\n<th>Preserved?<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>v0<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 1, return value<\/td>\n<\/tr>\n<tr>\n<td><var>v1<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 2<\/td>\n<\/tr>\n<tr>\n<td><var>v2<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 3<\/td>\n<\/tr>\n<tr>\n<td><var>v3<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 4<\/td>\n<\/tr>\n<tr>\n<td><var>v4<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 5<\/td>\n<\/tr>\n<tr>\n<td><var>v5<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 6<\/td>\n<\/tr>\n<tr>\n<td><var>v6<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 7<\/td>\n<\/tr>\n<tr>\n<td><var>v7<\/var><\/td>\n<td>No<\/td>\n<td>Parameter 8<\/td>\n<\/tr>\n<tr>\n<td><var>v8<\/var> through <var>v15<\/var><\/td>\n<td>Low 64 bits only<\/td>\n<td>Upper 64 bits are not preserved<\/td>\n<\/tr>\n<tr>\n<td><var>v16<\/var> through <var>v31<\/var><\/td>\n<td>No<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Each floating point register can be viewed in multiple ways. 
The partial registers are stored in the least significant bits of the full register.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Name<\/th>\n<th>Meaning<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td><var>v#<\/var><\/td>\n<td>SIMD vector<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td><var>q#<\/var><\/td>\n<td>128-bit value<\/td>\n<td>quad precision<\/td>\n<\/tr>\n<tr>\n<td><var>d#<\/var><\/td>\n<td>64-bit value<\/td>\n<td>double precision<\/td>\n<\/tr>\n<tr>\n<td><var>s#<\/var><\/td>\n<td>32-bit value<\/td>\n<td>single precision<\/td>\n<\/tr>\n<tr>\n<td><var>h#<\/var><\/td>\n<td>16-bit value<\/td>\n<td>half precision<\/td>\n<\/tr>\n<tr>\n<td><var>b#<\/var><\/td>\n<td>8-bit value<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The flags register is formally known as the Application Program Status Register (APSR). The flags available to user mode are the same as in AArch32:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Mnemonic<\/th>\n<th>Meaning<\/th>\n<th>Notes<\/th>\n<\/tr>\n<tr>\n<td>N<\/td>\n<td>Negative<\/td>\n<td>Set if the result is negative<\/td>\n<\/tr>\n<tr>\n<td>Z<\/td>\n<td>Zero<\/td>\n<td>Set if the result is zero<\/td>\n<\/tr>\n<tr>\n<td>C<\/td>\n<td>Carry<\/td>\n<td>Multiple purposes<\/td>\n<\/tr>\n<tr>\n<td>V<\/td>\n<td>Overflow<\/td>\n<td>Signed overflow<\/td>\n<\/tr>\n<tr>\n<td>Q<\/td>\n<td>Saturation<\/td>\n<td>Accumulated overflow<\/td>\n<\/tr>\n<tr>\n<td>GE[n]<\/td>\n<td>Greater than or equal to<\/td>\n<td>4 flags (SIMD)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The overflow flag records whether the most recent operation resulted in signed overflow. The saturation flag is used by multimedia instructions to accumulate whether any overflow occurred since it was last cleared. The GE flags record the result of SIMD operations. 
By convention, flags are not preserved across calls.<\/p>\n<p>There are a number of AArch64 features that you are extremely unlikely to see in Windows code, such as tagged pointers, tagged memory, and pointer authentication, so I won&#8217;t cover them here. I also won&#8217;t cover floating point instructions or SIMD instructions.<\/p>\n<p>Next time, we&#8217;ll look at some of the weird transformations that can be performed inside an instruction.<\/p>\n<p><b>Additional references<\/b>:<\/p>\n<ul>\n<li><a href=\"https:\/\/eclecticlight.co\/2021\/06\/16\/code-in-arm-assembly-registers-explained\/\"> Code in ARM Assembly: Registers explained<\/a>. An analogous series looking at AArch64 from the Apple point of view rather than Windows.<\/li>\n<li><a href=\"https:\/\/developer.apple.com\/documentation\/xcode\/writing-arm64-code-for-apple-platforms\"> Writing ARM64 Code for Apple Platforms<\/a>: The Apple ABI specification for AArch64.<\/li>\n<\/ul>\n<p>\u00b9 The Windows debugger isn&#8217;t quite sure which name to use for these registers. The disassembler calls the registers <var>xip0<\/var>, <var>xip1<\/var>, and <var>xpr<\/var>, but the expression evaluator doesn&#8217;t understand those names; you have to call them <code>@x16<\/code>, <code>@x17<\/code>, and <code>@x18<\/code>. On the other hand, the expression evaluator does understand <code>@fp<\/code> and <code>@lr<\/code> and refuses to acknowledge the existence of the names <code>@x29<\/code> and <code>@x30<\/code>. Furthermore, the expression evaluator doesn&#8217;t understand any of the <var>w<\/var> aliases.<\/p>\n<p>\u00b2 AArch64&#8217;s register 31 is similar to PowerPC&#8217;s register 0, which <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180808-00\/?p=99445\"> changes meaning depending on the instruction<\/a>. In PowerPC assembly, it was on you to keep track of which encodings treat register 0 as a value register, and which treat it as a zero register. 
At least AArch64 expresses the two cases differently: If an encoding uses pseudo-register 31 to mean <var>sp<\/var>, then you really must write <var>sp<\/var>. If you write <var>xzr<\/var>, you get an error.<\/p>\n<p>PowerPC on the other hand would happily let you specify <var>r0<\/var> even if the instruction treats it as zero. Which was one of the jokes from the <a href=\"https:\/\/twitter.com\/ppcinstructions\"> short-lived parody twitter account<\/a> that mocked PowerPC.<\/p>\n<blockquote class=\"twitter-tweet\" data-lang=\"en\">\n<p dir=\"ltr\" lang=\"en\">mscdfr &#8211; Means Something Completely Different For r0<\/p>\n<p>\u2014 PowerPC Instructions (@ppcinstructions) <a href=\"https:\/\/twitter.com\/ppcinstructions\/status\/557938532401295360?ref_src=twsrc%5Etfw\">January 21, 2015<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Make it a double.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106898","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Make it a 
double.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106898","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106898"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106898\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106898"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106898"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106898"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}