{"id":106958,"date":"2022-08-10T07:00:00","date_gmt":"2022-08-10T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106958"},"modified":"2022-11-29T09:42:31","modified_gmt":"2022-11-29T17:42:31","slug":"20220810-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220810-00\/?p=106958","title":{"rendered":"The AArch64 processor (aka arm64), part 12: Memory access and alignment"},"content":{"rendered":"<p>Accessing memory is done primarily through load and store instructions.<\/p>\n<pre>    ; load word or doubleword register\r\n    ldr     Rn\/zr, [...]\r\n\r\n    ; load unsigned byte\r\n    ldrb    Wn\/zr, [...]\r\n\r\n    ; load signed byte\r\n    ldrsb   Rn\/zr, [...]\r\n\r\n    ; load unsigned halfword\r\n    ldrh    Wn\/zr, [...]\r\n\r\n    ; load signed halfword\r\n    ldrsh   Rn\/zr, [...]\r\n\r\n    ; load signed word\r\n    ldrsw   Xn\/zr, [...]\r\n\r\n    ; load pair of registers\r\n    ldp     Rd1\/zr, Rd2\/zr, [...]\r\n\r\n    ; load pair of registers as signed word\r\n    ldpsw   Xd1\/zr, Xd2\/zr, [...]\r\n<\/pre>\n<p>AArch64 does not have AArch32&#8217;s <code>LDM<\/code> instruction for loading up to 13 registers at once. As a consolation present, it gives you a <code>LDP<\/code> instruction for loading two registers, either 32-bit or 64-bit, from consecutive bytes of memory. (The first register uses the lower address.) 
The <code>LDP<\/code> instruction is commonly used with the 64-bit registers to load spilled registers from the stack.<\/p>\n<p>There is a corresponding selection of instructions for storing to memory, but obviously the sign extension variations are not relevant.<\/p>\n<pre>    ; store word or doubleword register\r\n    str     Rn\/zr, [...]\r\n\r\n    ; store byte\r\n    strb    Wn\/zr, [...]\r\n\r\n    ; store halfword\r\n    strh    Wn\/zr, [...]\r\n\r\n    ; store pair of registers\r\n    stp     Rd1\/zr, Rd2\/zr, [...]\r\n<\/pre>\n<p>Not all addressing modes are available for all variations. This is not something you worry about when reading assembly language, but it&#8217;s something you need to keep in mind when writing it.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Size<\/th>\n<th style=\"text-align: center;\"><code>[Xn\/sp, #imm]<\/code><br \/>\n<span style=\"font-weight: normal; font-size: 80%;\">(\u2212256 \u2026 +255)<\/span><\/th>\n<th style=\"text-align: center;\"><code>[Xn\/sp, #imm]<\/code><br \/>\n<code>[Xn\/sp, #imm]!<\/code><br \/>\n<code>[Xn\/sp], #imm<\/code><\/th>\n<th><code>[pc, #imm]<\/code><br \/>\n<span style=\"font-weight: normal; font-size: 80%;\">(\u00b11MB)<\/span><\/th>\n<th><code>[Xn\/sp, Rn\/zr, extend]<\/code><\/th>\n<\/tr>\n<tr>\n<td>byte<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td>&nbsp;<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<\/tr>\n<tr>\n<td>halfword<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td>&nbsp;<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<\/tr>\n<tr>\n<td>word<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td style=\"text-align: center;\">loads only<\/td>\n<td style=\"text-align: 
\u2022<\/td>\n<\/tr>">
center;\">\u2022<\/td>\n<\/tr>\n<tr>\n<td>doubleword<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td style=\"text-align: center;\">loads only<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<\/tr>\n<tr>\n<td>pair<\/td>\n<td style=\"text-align: center;\">\u00a0<\/td>\n<td style=\"text-align: center;\">\u2022<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The reach of the second column is (0 \u2026 4095) \u00d7 <var>size<\/var>, except that the reach of the register pairs is (\u221264 \u2026 63) \u00d7 <var>size<\/var>.<\/p>\n<p>All operand sizes support register indirect with offset. Only word and doubleword support <var>pc<\/var>-relative (and even those are supported only for loads). And register pairs support only register indirect with offset.<\/p>\n<p>There are some ambiguous encodings, because a constant offset in the range 0 \u2026 255 that is a multiple of the operand size can be encoded either as a 9-bit signed byte offset, or as a 12-bit unsigned element offset. By default, assemblers will use the 12-bit unsigned element offset, but you can force the 9-bit signed byte offset by changing the opcode from <code>LDxxx<\/code> and <code>STxxx<\/code> to <code>LDUxxx<\/code> and <code>STUxxx<\/code>. The <code>U<\/code> stands for <i>unscaled<\/i>.<\/p>\n<p>Windows enables automatic unaligned access fixups. Simple unaligned memory accesses are fixed up automatically by the processor, but you lose atomicity: It is possible for an unaligned memory access to read a torn value. 
Any such tearing is at the byte level.<\/p>\n<table style=\"border-collapse: collapse;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td>Original value<\/td>\n<td style=\"border: solid 1px gray;\"><code>12<\/code><\/td>\n<td style=\"border: solid 1px gray;\"><code>34<\/code><\/td>\n<td style=\"border: solid 1px gray;\"><code>56<\/code><\/td>\n<td style=\"border: solid 1px gray;\"><code>78<\/code><\/td>\n<td>aligned<\/td>\n<\/tr>\n<tr style=\"height: 1ex;\">\n<td colspan=\"4\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>Processor 1 reads<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px gray; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; background-color: #ddd;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td>misaligned<\/td>\n<\/tr>\n<tr style=\"height: 1ex;\">\n<td colspan=\"4\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>Processor 2 writes<\/td>\n<td style=\"border: solid 1px gray;\"><code>AB<\/code><\/td>\n<td style=\"border: solid 1px gray;\"><code>CD<\/code><\/td>\n<td style=\"border: solid 1px gray;\"><code>EF<\/code><\/td>\n<td style=\"border: solid 1px gray;\"><code>01<\/code><\/td>\n<td>aligned<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The misaligned halfword read from processor 1 could produce <code>34|56<\/code>, <code>34|EF<\/code>, <code>CD|56<\/code>, or <code>CD|EF<\/code>. But it won&#8217;t produce <code>3D|EF<\/code>.<\/p>\n<p>You can still take alignment faults if the misaligned memory access is fancy, such as a locked load, store exclusive, or a load with a memory barrier. 
We&#8217;ll learn about these special memory accesses next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The load and store part of the load\/store architecture.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106958","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>The load and store part of the load\/store architecture.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106958","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106958"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106958\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106958"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106958"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106958"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}