{"id":106953,"date":"2022-08-08T07:00:00","date_gmt":"2022-08-08T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106953"},"modified":"2022-08-08T06:50:38","modified_gmt":"2022-08-08T13:50:38","slug":"20220808-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220808-00\/?p=106953","title":{"rendered":"The AArch64 processor (aka arm64), part 10: Loading constants"},"content":{"rendered":"<p>Since AArch64 uses fixed-size 32-bit instructions, you have to exercise some creativity to load a 64-bit constant.<\/p>\n<pre>    ; move wide with zero\r\n    ; Rd = imm16 &lt;&lt; n\r\n    ; n can be 0, 16, 32, or 48\r\n    movz    Rd, #imm16, LSL #n\r\n\r\n    ; move wide with not\r\n    ; Rd = ~(imm16 &lt;&lt; n)\r\n    ; n can be 0, 16, 32, or 48\r\n    movn    Rd, #imm16, LSL #n\r\n\r\n    ; move wide with keep\r\n    ; Rd[n+15:n] = imm16\r\n    movk    Rd, #imm16, LSL #n\r\n<\/pre>\n<p>The <code>MOVZ<\/code> instruction loads a 16-bit unsigned value into one of the four lanes of a 64-bit destination, or one of the two lanes of a 32-bit destination. All the remaining lanes are set to zero.<\/p>\n<p>The <code>MOVN<\/code> instruction does the same thing as <code>MOVZ<\/code>, except that the entire result is bitwise inverted. (Be careful not to confuse <code>MOVN<\/code> with <code>MVN<\/code>.)<\/p>\n<p>The <code>MOVK<\/code> instruction does the same thing as <code>MOVZ<\/code>, except that instead of setting the other lanes to zero, the other lanes are left unchanged.<\/p>\n<p>Loading a 32-bit value can be done in two instructions by using <code>MOVZ<\/code> to load 16 bits into one half of the register, then <code>MOVK<\/code> to load the remaining 16 bits into the other half.<\/p>\n<pre>    movz    w0, #0x1234             ; w0 = 0x00001234\r\n    movk    w0, #0xABCD, LSL #16    ; w0 = 0xABCD1234\r\n<\/pre>\n<p>This technique can be extended to load a 64-bit value in four steps, but that&#8217;s getting quite unwieldy. 
The compiler is more likely to store the value in the code segment and use a <var>pc<\/var>-relative addressing mode to load it.<\/p>\n<pre>    ; special syntax for pc-relative loads\r\n    ldr     x0, =0x123456789ABCDEF0 ; load 64-bit value\r\n    ldr     w0, =0x12345678         ; load 32-bit value\r\n<\/pre>\n<p>As I noted in the discussion of addressing modes, the assembler and disassembler use this special equals-sign notation to represent a <var>pc<\/var>-relative load. It means that the value is stored in a <var>literal pool<\/var> in the code segment, and a <var>pc<\/var>-relative load is being used to fetch it. The assembler batches up all of these literals and emits them between functions. The <var>pc<\/var>-relative load has a reach of \u00b11MB, so you are unlikely to run into the problem that you had on AArch32, where the reach was only \u00b14KB, and you had to find a safe place to dump the literals in the middle of the function.<\/p>\n<p>There are quite a number of instructions that generate constants, and if you use the <code>MOV<\/code> pseudo-instruction, the assembler will try to find one that works.<\/p>\n<pre>    ; load up a constant somehow\r\n    mov     Rd, #imm\r\n<\/pre>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Instruction<\/th>\n<th>Used for<\/th>\n<\/tr>\n<tr>\n<td><code>add Rd, zr, #imm12<\/code><\/td>\n<td><code>0x00000000`00000XXX<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>add Rd, zr, #imm12, LSL #12<\/code><\/td>\n<td><code>0x00000000`00XXX000<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>sub Wd, wzr, #imm12<\/code><\/td>\n<td><code>0x00000000`FFFFFXXX<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>sub Wd, wzr, #imm12, LSL #12<\/code><\/td>\n<td><code>0x00000000`FFXXX000<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>sub Xd, xzr, #imm12<\/code><\/td>\n<td><code>0xFFFFFFFF`FFFFFXXX<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>sub Xd, xzr, #imm12, LSL 
#12<\/code><\/td>\n<td><code>0xFFFFFFFF`FFXXX000<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movz Rd, #imm16<\/code><\/td>\n<td><code>0x00000000`0000XXXX<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movz Rd, #imm16, LSL #16<\/code><\/td>\n<td><code>0x00000000`XXXX0000<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movz Rd, #imm16, LSL #32<\/code><\/td>\n<td><code>0x0000XXXX`00000000<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movz Rd, #imm16, LSL #48<\/code><\/td>\n<td><code>0xXXXX0000`00000000<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movn Wd, #imm16<\/code><\/td>\n<td><code>0x00000000`FFFFXXXX<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movn Wd, #imm16, LSL #16<\/code><\/td>\n<td><code>0x00000000`XXXXFFFF<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movn Xd, #imm16<\/code><\/td>\n<td><code>0xFFFFFFFF`FFFFXXXX<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movn Xd, #imm16, LSL #16<\/code><\/td>\n<td><code>0xFFFFFFFF`XXXXFFFF<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movn Xd, #imm16, LSL #32<\/code><\/td>\n<td><code>0xFFFFXXXX`FFFFFFFF<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>movn Xd, #imm16, LSL #48<\/code><\/td>\n<td><code>0xXXXXFFFF`FFFFFFFF<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>orr Xd, xzr, #imm<\/code><\/td>\n<td>Value can be expressed as a <!-- backref: The AArch64 processor (aka arm64), part 6: Bitwise operations -->Bitwise operation constant<\/td>\n<\/tr>\n<tr>\n<td><code>orr Wd, wzr, #imm<\/code><\/td>\n<td>Value can be expressed as lower 32 bits of a <!-- backref: The AArch64 processor (aka arm64), part 6: Bitwise operations -->Bitwise operation constant<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>A common type of sort-of constant is the address of a global variable. It&#8217;s a constant whose value isn&#8217;t discovered until runtime. 
We&#8217;ll look at those next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Getting them into a register.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106953","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Getting them into a register.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106953","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106953"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106953\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106953"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106953"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106953"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}