{"id":106945,"date":"2022-08-04T07:00:00","date_gmt":"2022-08-04T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106945"},"modified":"2022-11-29T10:15:42","modified_gmt":"2022-11-29T18:15:42","slug":"20220804-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220804-00\/?p=106945","title":{"rendered":"The AArch64 processor (aka arm64), part 8: Bit shifting and rotation"},"content":{"rendered":"<p>Bit shifting and rotation instructions on AArch64 fall into two general categories: Hard-coded shift amounts and variable shifts.<\/p>\n<p>The hard-coded shifts are done by repurposing the versatile <a title=\"The AArch64 processor (aka arm64), part 7: Bitfield manipulation\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220803-00\/?p=106941\"> bitfield manipulation instructions<\/a>.<\/p>\n<pre>    ; logical shift left by fixed amount\r\n    ; ubfiz Rd, Rn, #shift, #(size-shift)\r\n    lsl     Rd\/zr, Rn\/zr, #shift\r\n\r\n    ; logical shift right by fixed amount\r\n    ; ubfx  Rd, Rn, #shift, #(size-shift)\r\n    lsr     Rd\/zr, Rn\/zr, #shift\r\n\r\n    ; arithmetic shift right by fixed amount\r\n    ; sbfx  Rd, Rn, #shift, #(size-shift)\r\n    asr     Rd\/zr, Rn\/zr, #shift\r\n<\/pre>\n<p>Left shifting is done by doing a bit insertion of the surviving bits into the upper bits of the destination. 
It&#8217;s the special case where the number of bits is exactly equal to the register size minus the shift amount.<\/p>\n<table style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>shift<\/td>\n<td colspan=\"2\">size\u2212shift<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; border-right: none; text-align: right; width: 4em;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; border-right: none; width: 3em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; border-left: none; width: 4em; background-color: #ddd;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td colspan=\"1\">\n<table style=\"width: 100%;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td align=\"left\">\u21d9<\/td>\n<td align=\"center\">\u21d9<\/td>\n<td align=\"center\">\u21d9<\/td>\n<td align=\"right\">\u21d9<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; border-right: none; width: 4em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; border-left: none; width: 3em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; width: 4em;\">zero-fill<\/td>\n<\/tr>\n<tr>\n<td colspan=\"2\">size\u2212shift<\/td>\n<td>shift<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Right shifting is the same thing, but using the unsigned bitfield extract instruction to go in the opposite direction:<\/p>\n<table style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td colspan=\"2\">size\u2212shift<\/td>\n<td>shift<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; border-right: none; width: 4em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; border-left: none; width: 3em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; width: 
4em;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td colspan=\"1\">\n<table style=\"width: 100%;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td align=\"left\">\u21d8<\/td>\n<td align=\"center\">\u21d8<\/td>\n<td align=\"center\">\u21d8<\/td>\n<td align=\"right\">\u21d8<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; width: 4em;\">zero-fill<\/td>\n<td style=\"border: solid 1px gray; border-right: none; width: 3em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; border-left: none; width: 4em; background-color: #ddd;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>shift<\/td>\n<td colspan=\"2\">size\u2212shift<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And arithmetic right shifting uses the signed bitfield extract in order to get sign-extension behavior.<\/p>\n<table style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td colspan=\"2\">size\u2212shift<\/td>\n<td>shift<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; border-right: none; width: 4em; text-align: left; background-color: #ddd; color: black;\">S<\/td>\n<td style=\"border: solid 1px gray; border-left: none; width: 3em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; width: 4em;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: left;\">\u21d3<\/td>\n<td colspan=\"1\">\n<table style=\"width: 100%;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td align=\"left\">\u21d8<\/td>\n<td align=\"center\">\u21d8<\/td>\n<td align=\"center\">\u21d8<\/td>\n<td align=\"right\">\u21d8<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; width: 4em;\">sign-fill<\/td>\n<td style=\"border: solid 1px gray; border-right: none; width: 3em; text-align: left; background-color: #ddd; color: black;\">S<\/td>\n<td style=\"border: solid 1px 
gray; border-left: none; width: 4em; background-color: #ddd;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>shift<\/td>\n<td colspan=\"2\">size\u2212shift<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Rotation can be synthesized from double-register extraction by using the rotation source as both of the source registers for extraction.<\/p>\n<pre>    ; rotate right by fixed amount\r\n    ; extr  Rd, Rs, Rs, #shift\r\n    ror     Rd\/zr, Rs\/zr, #shift\r\n<\/pre>\n<table style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: 1px gray; border-style: solid solid none solid;\" colspan=\"2\">size<\/td>\n<td style=\"border: 1px gray; border-style: solid solid none solid;\">shift<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px gray; border-right-style: dashed; width: 4em; position: relative;\">\n<div style=\"position: absolute; width: 7em; top: 0; left: 0; text-align: center;\">Rs<\/div>\n<\/td>\n<td style=\"border: solid 1px gray; border-left-style: dashed; width: 3em; background-color: #ddd;\">\u00a0<\/td>\n<td style=\"border: solid 1px gray; border-right-style: dashed; width: 4em; position: relative; background-color: #ddd; color: black;\">\n<div style=\"position: absolute; width: 7em; top: 0; left: 0; text-align: center;\">Rs<\/div>\n<\/td>\n<td style=\"border: solid 1px gray; border-left-style: dashed; width: 3em;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td colspan=\"2\">\n<table style=\"width: 100%;\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td align=\"center\">\u21d3<\/td>\n<td align=\"center\">\u21d3<\/td>\n<td align=\"center\">\u21d3<\/td>\n<td align=\"center\">\u21d3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px gray; text-align: center; background-color: #ddd; color: black;\" 
colspan=\"2\">Rd<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Note that there is no &#8220;rotate with carry&#8221; instruction. The AArch32 <code>rrx<\/code> instruction does not exist in AArch64.\u00b9 It would have been handy for <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220207-00\/?p=106223\"> finding the average of two unsigned integers without overflow<\/a>.<\/p>\n<p>The variable shifts have their own dedicated instructions.<\/p>\n<pre>    ; logical shift left variable\r\n    ; Wd = Wn &lt;&lt; (Wm &amp; 31)\r\n    ; Xd = Xn &lt;&lt; (Xm &amp; 63)\r\n    lslv    Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; logical shift right variable\r\n    ; Wd = Wn &gt;&gt; (Wm &amp; 31), unsigned shift\r\n    ; Xd = Xn &gt;&gt; (Xm &amp; 63), unsigned shift\r\n    lsrv    Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; arithmetic shift right variable\r\n    ; Wd = Wn &gt;&gt; (Wm &amp; 31), signed shift\r\n    ; Xd = Xn &gt;&gt; (Xm &amp; 63), signed shift\r\n    asrv    Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; rotate right variable\r\n    ; Rd = Rn rotated right by Rm positions\r\n    rorv    Rd\/zr, Rn\/zr, Rm\/zr\r\n<\/pre>\n<p>Note that the shift amount is taken modulo the bit size of the operand. (This doesn&#8217;t really matter for <code>RORV<\/code> since rotating by the operand bit size has no effect.)<\/p>\n<p>The pseudo-instructions <code>LSL<\/code>, <code>LSR<\/code>, <code>ASR<\/code>, and <code>ROR<\/code> also accept a register as the second input operand, in which case the assembler converts them to the corresponding <code>V<\/code> instruction. This means that when writing assembly, you can just write <code>LSL<\/code> and let the assembler figure out which real opcode it corresponds to.<\/p>\n<p>There are no <code>S<\/code> variants of the bit shifting instructions. They never update flags, unlike AArch32, whose flag-setting shifts updated the carry with the last bit shifted out. 
If you want to know which bit got shifted out, you&#8217;ll have to calculate it yourself, say by shifting the same value by one position fewer and then inspecting the top\/bottom bit (depending on the shift direction).<\/p>\n<p>I have my guesses as to why the designers removed the flags behavior from these instructions: First, it removes a partial register update (of the flags), which creates a usually-unwanted dependency on the previous flags. Second, no major programming language gives you access to the bit that was shifted out, so it wasn&#8217;t used in practice anyway.<\/p>\n<p><b>Exercise<\/b>: Suppose there were no double-register extraction instruction and no variable rotation instruction. Synthesize fixed and variable rotation from other instructions. (Answer below.)<\/p>\n<p><b>Bonus chatter<\/b>: In AArch32, the bottom 8 bits of the shift-count register were used. But in AArch64, only the bottom 5 (for 32-bit operands) or 6 (for 64-bit operands) bits are used.<\/p>\n<p><b>Answer to exercise<\/b>: You can synthesize a fixed rotation from a shift and a bitfield insertion.<\/p>\n<pre>    ; rotate r1 left by #imm, producing r0\r\n                                        ; r1 = ABCDEFGH\r\n    lsl     r0, r1, #imm                ; r0 = EFGH0000\r\n    bfxil   r0, r1, #(size-imm), #imm   ; r0 = EFGHABCD\r\n<\/pre>\n<p>A variable rotation can be synthesized from a pair of shifts.<\/p>\n<pre>    ; rotate r1 left by r2, producing r0\r\n    ; (destroys r2)\r\n                                        ; r1 = ABCDEFGH\r\n    lslv    r0, r1, r2                  ; r0 = EFGH0000\r\n    neg     r2, r2                      ; r2 = (size - r2) mod size\r\n    lsrv    r2, r1, r2                  ; r2 = 0000ABCD\r\n    orr     r0, r0, r2                  ; r0 = EFGHABCD\r\n<\/pre>\n<p>\u00b9 Although AArch64 doesn&#8217;t explicitly have a &#8220;rotate left through carry&#8221; instruction, you can still do it in a single instruction:<\/p>\n<pre>    adcs    r0, r1, r1  ; r0 = r1 rotated 
left through carry\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Sliding around.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106945","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Sliding around.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106945","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106945"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106945\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}