{"id":106915,"date":"2022-07-29T07:00:00","date_gmt":"2022-07-29T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106915"},"modified":"2022-08-01T11:54:25","modified_gmt":"2022-08-01T18:54:25","slug":"20220729-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220729-00\/?p=106915","title":{"rendered":"The AArch64 processor (aka arm64), part 4: Addition and subtraction"},"content":{"rendered":"<p>Most of the binary operation instructions are of the form<\/p>\n<pre>    op      x, y, z         ; x = y op z\r\n<\/pre>\n<p>They take two source operands, combine them according to some operation, and put the result in the destination register.<\/p>\n<p>Similarly, most of the unary operation instructions look like<\/p>\n<pre>    op      x, y            ; x = op y\r\n<\/pre>\n<p>The destination is typically a numbered register or <var>sp<\/var>, and can be a 64-bit register or a 32-bit subregister. If you use a 32-bit subregister, then the result is zero-extended to a 64-bit value.<\/p>\n<p>Okay, let&#8217;s start with addition:<\/p>\n<pre>    add     Rd\/sp, Rn\/sp, #imm12\r\n    add     Rd\/sp, Rn\/sp, #imm12, LSL #12\r\n\r\n    add     Rd\/zr, Rn\/zr, Rm\/zr, LSL #n\r\n    add     Rd\/zr, Rn\/zr, Rm\/zr, LSR #n\r\n    add     Rd\/zr, Rn\/zr, Rm\/zr, ASR #n\r\n\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, UXTB #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, UXTH #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, UXTW #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, UXTX #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, SXTB #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, SXTH #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, SXTW #n    ; 0 \u2264 n \u2264 4\r\n    add     Rd\/sp, Rn\/sp, Rm\/zr, SXTX #n    ; 0 \u2264 n \u2264 4\r\n<\/pre>\n<p>To ask for flags to be set based on the result, apply an 
<code>S<\/code> suffix to the opcode, producing <code>ADDS<\/code>.<\/p>\n<p>Note that some of these encodings permit the operand to be <code>sp<\/code>, but others allow <code>zr<\/code>.<\/p>\n<p>The first two versions add an immediate. It is either a 12-bit unsigned immediate (0 \u2264 <var>n<\/var> \u2264 4095) or a 12-bit unsigned immediate shifted left by 12. This means that you can express constants of the form <code>0x00000XXX<\/code> and <code>0x00XXX000<\/code>. The disassembler does the <code>LSL #12<\/code> for you, so you won&#8217;t actually see the <code>#imm12, LSL #12<\/code> version disassembled as such. Instead, you&#8217;ll see the shifted constant:<\/p>\n<pre>    add     x0, x1, #0x123000   ; encoded as #0x123, LSL #12\r\n<\/pre>\n<p>The next block of variants adds a shifted register. You are allowed to shift doublewords by up to 63 positions and words up to 31 positions. You don&#8217;t need any larger shifts, because unsigned shifting by an amount greater than or equal to the operand bit size just gives you zero, so you should just have used <var>zr<\/var>. And signed shifting right by an amount greater than or equal to the operand size is the same as shifting right by one less than the operand bit size.<\/p>\n<p>The last block lets you use the extended registers. You can use all of the extended forms, and the shift amount can be up to four positions. These extended registers with shifts are convenient for calculating array offsets:<\/p>\n<pre>    ; x0 = x1 + (int32_t)x2 * 16\r\n    add     x0, x1, x2, SXTW #4\r\n<\/pre>\n<p>In this case, <var>x1<\/var> is the base of an array where each element is of size 16, and <var>x2<\/var> is a 32-bit signed array index, and we calculate the address of the element into <var>x0<\/var>.<\/p>\n<p>The ARM uses true carry. 
This means that for subtraction, the carry is clear when a borrow occurs, and subtract with carry subtracts an additional unit if the inbound carry is clear.<\/p>\n<p>The subtraction instructions have the same available variants as the addition instructions.<\/p>\n<pre>    ; calculate x = y - z\r\n    sub     x, y, z     ; same options as add\r\n\r\n    ; calculate x = y - z, set flags\r\n    subs    x, y, z     ; same options as adds\r\n<\/pre>\n<p>Adding and subtracting with carry have only one encoding option.<\/p>\n<pre>    ; Rd = Rn + Rm + carry\r\n    adc     Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; Rd = Rn + Rm + carry, set flags\r\n    adcs    Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; Rd = Rn - Rm - !carry\r\n    sbc     Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; Rd = Rn - Rm - !carry, set flags\r\n    sbcs    Rd\/zr, Rn\/zr, Rm\/zr\r\n<\/pre>\n<p>From the addition and subtraction instructions, we can construct these pseudo-instructions, taking advantage of literal zeros and the hard-coded zero register: Reads from the zero register produce zero, and writes to the zero register are discarded.<\/p>\n<pre>    ; move register to\/from sp\r\n    mov     sp, Rn              ; add sp, Rn, #0\r\n    mov     Rn, sp              ; add Rn, sp, #0\r\n<\/pre>\n<p>Adding zero gives you the ability to move between <var>sp<\/var> and the general-purpose registers. The immediate forms of <code>ADD<\/code> accept <var>sp<\/var> but not <var>zr<\/var> as a source, so you cannot load a constant by adding an immediate to the zero register. 
We&#8217;ll see later that other register-to-register moves are encoded with a different pseudo-instruction, and that constants are loaded by other instructions.<\/p>\n<p>The use of true carry permits the following group of pseudo-instructions for adding or subtracting negative numbers:<\/p>\n<pre>    add     a, b, #-n           ; sub  a, b, #n\r\n    adds    a, b, #-n           ; subs a, b, #n\r\n\r\n    sub     a, b, #-n           ; add  a, b, #n\r\n    subs    a, b, #-n           ; adds a, b, #n\r\n<\/pre>\n<p>The immediate operand to the <code>ADD<\/code> and <code>SUB<\/code> instruction families is treated as unsigned, but you can switch to the opposite instruction to get negative values (provided <var>n<\/var> \u2260 0). Note that this works due to ARM&#8217;s use of true carry. (If ARM had used borrow, then this conversion would set the carry bit incorrectly.)<\/p>\n<pre>    cmp     x, y                ; subs zr, x, y\r\n    cmn     x, y                ; adds zr, x, y\r\n<\/pre>\n<p>The <var>compare<\/var> and <var>compare negative<\/var> instructions are just subtraction and addition that set flags and throw away the result. Beware of <a title=\"The ARM processor (Thumb-2), part 6: The lie hiding inside the CMN instruction\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210607-00\/?p=105288\">the lie hiding inside the CMN instruction<\/a>.<\/p>\n<pre>    ; negate (possibly setting flags)\r\n    neg     x, y, shift         ; sub  x, zr, y, shift\r\n    negs    x, y, shift         ; subs x, zr, y, shift\r\n\r\n    ; negate with carry (possibly setting flags)\r\n    ngc     x, y                ; sbc  x, zr, y\r\n    ngcs    x, y                ; sbcs x, zr, y\r\n<\/pre>\n<p>Subtracting from zero gives you the ability to negate a value. 
Note that <code>NEG<\/code> and <code>NEGS<\/code> are available only with shifted registers because the corresponding subtraction instructions support <var>zr<\/var> as the first input only when the second input is a shifted register. (Of course, you can shift by <var>#0<\/var> if you didn&#8217;t really want to shift the second input.) The negate-with-carry pseudo-instructions take a plain register with no shift, since <code>SBC<\/code> and <code>SBCS<\/code> have only the one encoding.<\/p>\n<p>That turned out to be a lot to say about addition and subtraction. Next time, we&#8217;ll look at the fancier arithmetic operations: Multiplication and division.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Starting with the basic arithmetic.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106915","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Starting with the basic 
arithmetic.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106915"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106915\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106915"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106915"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}