{"id":106922,"date":"2022-08-01T07:00:00","date_gmt":"2022-08-01T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=106922"},"modified":"2022-08-01T06:59:53","modified_gmt":"2022-08-01T13:59:53","slug":"20220801-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220801-00\/?p=106922","title":{"rendered":"The AArch64 processor (aka arm64), part 5: Multiplication and division"},"content":{"rendered":"<p>There are a lot of ways of multiplying two values. The most basic way is to multiply two registers of the same size, producing a result of the same size.<\/p>\n<pre>    ; multiply and add\r\n    ; Rd = Ra + (Rn \u00d7 Rm)\r\n    madd    Rd\/zr, Rn\/zr, Rm\/zr, Ra\/zr\r\n\r\n    ; multiply and subtract\r\n    ; Rd = Ra - (Rn \u00d7 Rm)\r\n    msub    Rd\/zr, Rn\/zr, Rm\/zr, Ra\/zr\r\n<\/pre>\n<p>The product is then added to or subtracted from a third register.<\/p>\n<p>You get some pseudo-instructions if you hard-code the third input operand to zero.<\/p>\n<pre>    ; multiply\r\n    mul     a, b, c                         ; madd a, b, c, zr\r\n\r\n    ; multiply and negate\r\n    mneg    a, b, c                         ; msub a, b, c, zr\r\n<\/pre>\n<p>The next fancier way of multiplying two registers is to multiply two 32-bit registers and get a 64-bit result.<\/p>\n<pre>    ; unsigned multiply and add long\r\n    ; Xd = Xa + (Wn \u00d7 Wm), unsigned multiply\r\n    umaddl  Xd\/zr, Wn\/zr, Wm\/zr, Xa\/zr\r\n\r\n    ; unsigned multiply and subtract long\r\n    ; Xd = Xa - (Wn \u00d7 Wm), unsigned multiply\r\n    umsubl  Xd\/zr, Wn\/zr, Wm\/zr, Xa\/zr\r\n\r\n    ; signed multiply and add long\r\n    ; Xd = Xa + (Wn \u00d7 Wm), signed multiply\r\n    smaddl  Xd\/zr, Wn\/zr, Wm\/zr, Xa\/zr\r\n\r\n    ; signed multiply and subtract long\r\n    ; Xd = Xa - (Wn \u00d7 Wm), signed multiply\r\n    smsubl  Xd\/zr, Wn\/zr, Wm\/zr, Xa\/zr\r\n<\/pre>\n<p>Again, the result of the multiplication is added to or 
subtracted from an accumulator. The naming of these opcodes is a little confusing, because the word <i>long<\/i> in the name refers to the multiplication, not the addition or subtraction. The multiplication is 32 \u00d7 32 \u2192 64, and the result is then accumulated as a 64-bit value.<\/p>\n<p>You can probably guess what the pseudo-instructions are. Just hard-code the zero register as the accumulator.<\/p>\n<pre>    ; unsigned multiply long\r\n    umull   a, b, c                     ; umaddl a, b, c, zr\r\n\r\n    ; unsigned multiply and negate long\r\n    umnegl  a, b, c                     ; umsubl a, b, c, zr\r\n\r\n    ; signed multiply long\r\n    smull   a, b, c                     ; smaddl a, b, c, zr\r\n\r\n    ; signed multiply and negate long\r\n    smnegl  a, b, c                     ; smsubl a, b, c, zr\r\n<\/pre>\n<p>The last pair of multiplication instructions gives you the missing piece of the 64 \u00d7 64 \u2192 128 multiply.<\/p>\n<pre>    ; unsigned multiply high\r\n    ; Xd = (Xn \u00d7 Xm) &gt;&gt; 64, unsigned multiply\r\n    umulh   Xd\/zr, Xn\/zr, Xm\/zr\r\n\r\n    ; signed multiply high\r\n    ; Xd = (Xn \u00d7 Xm) &gt;&gt; 64, signed multiply\r\n    smulh   Xd\/zr, Xn\/zr, Xm\/zr\r\n<\/pre>\n<p>These give you the upper 64 bits of a 64 \u00d7 64 \u2192 128 multiply. 
If you want the full 128 bits, you combine it with the corresponding 64 \u00d7 64 \u2192 64 multiply to get the lower 64 bits.<\/p>\n<pre>    ; unsigned 64 \u00d7 64 \u2192 128\r\n    ; r1:r0 = r2 \u00d7 r3\r\n    mul     r0, r2, r3\r\n    umulh   r1, r2, r3\r\n\r\n    ; signed 64 \u00d7 64 \u2192 128\r\n    ; r1:r0 = r2 \u00d7 r3\r\n    mul     r0, r2, r3\r\n    smulh   r1, r2, r3\r\n<\/pre>\n<p>Don&#8217;t be fooled by the lack of symmetry: Even though there is a <code>UMULL<\/code> instruction, it is not the counterpart to <code>UMULH<\/code>, and the <code>SMULL<\/code> instruction is not the counterpart to <code>SMULH<\/code>!<\/p>\n<p>Whereas there is a large variety of ways to multiply two registers, there are only two ways to divide them.<\/p>\n<pre>    ; unsigned divide\r\n    ; Rd = Rn \u00f7 Rm, unsigned divide, round toward zero\r\n    udiv    Rd\/zr, Rn\/zr, Rm\/zr\r\n\r\n    ; signed divide\r\n    ; Rd = Rn \u00f7 Rm, signed divide, round toward zero\r\n    sdiv    Rd\/zr, Rn\/zr, Rm\/zr\r\n<\/pre>\n<p>If you try to divide by zero, there is no exception. The result is just zero. If you want to trap division by zero, you&#8217;ll have to test for a zero denominator explicitly.<\/p>\n<p>There is also no exception for dividing the most negative integer by \u22121. You just get the most negative integer back.<\/p>\n<p>None of the multiplication or division operations set flags.<\/p>\n<p>There is no instruction for calculating the remainder. You can do that manually by calculating <var>r<\/var> = <var>n<\/var> \u2212 (<var>n<\/var> \u00f7 <var>d<\/var>) \u00d7 <var>d<\/var>. 
This can be done by following up the division with an <code>msub<\/code>:<\/p>\n<pre>    ; unsigned remainder after division\r\n    udiv    Rq, Rn, Rm          ; Rq = Rn \u00f7 Rm\r\n    msub    Rr, Rq, Rm, Rn      ; Rr = Rn - Rq \u00d7 Rm\r\n                                ;    = Rn - (Rn \u00f7 Rm) \u00d7 Rm\r\n\r\n    ; signed remainder after division\r\n    sdiv    Rq, Rn, Rm          ; Rq = Rn \u00f7 Rm\r\n    msub    Rr, Rq, Rm, Rn      ; Rr = Rn - Rq \u00d7 Rm\r\n                                ;    = Rn - (Rn \u00f7 Rm) \u00d7 Rm\r\n<\/pre>\n<p>Next time, we&#8217;ll look at the logical operations and their extremely weird immediates.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Arithmetic gets harder.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-106922","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Arithmetic gets 
harder.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106922","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=106922"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/106922\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=106922"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=106922"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=106922"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}