The AArch64 processor (aka arm64), part 5: Multiplication and division
There are a lot of ways of multiplying two values. The most basic way is to multiply two registers of the same size, producing a result of the same size.
; multiply and add ; Rd = Ra + (Rn × Rm) madd Rd/zr, Rn/zr, Rm/zr, Ra/zr ; multiply and subtract ; Rd = Ra - (Rn × Rm) msub Rd/zr, Rn/zr, Rm/zr, Ra/zr
The product is then added to or subtracted from a third register.
You get some pseudo-instructions if you hard-code the third input operand to zero.
; multiply mul a, b, c ; madd a, b, c, zr ; multiply and negate mneg a, b, c ; msub a, b, c, zr
The next fancier way of multiplying two registers is to multiply two 32-bit registers and get a 64-bit result.
; unsigned multiply and add long ; Xd = Xa + (Wn × Wm), unsigned multiply umaddl Xd/zr, Wn/zr, Wm/zr, Xa/zr ; unsigned multiply and subtract long ; Xd = Xa - (Wn × Wm), unsigned multiply umsubl Xd/zr, Wn/zr, Wm/zr, Xa/zr ; signed multiply and add long ; Xd = Xa + (Wn × Wm), signed multiply smaddl Xd/zr, Wn/zr, Wm/zr, Xa/zr ; signed multiply and subtract long ; Xd = Xa - (Wn × Wm), signed multiply smsubl Xd/zr, Wn/zr, Wm/zr, Xa/zr
Again, the result of the multiplication is added to or subtracted from an accumulator. The naming of this opcode is a little confusing, because the word long in the opcode talks about the multiplication, not the addition or subtraction. The multiplication is 32 × 32 → 64, and the result is then accumulated as a 64-bit value.
You can probably guess what the pseudo-instructions are. Just hard-code the zero register as the accumulator.
; unsigned multiply long umull a, b, c ; umaddl a, b, c, zr ; unsigned multiply and negate long umnegl a, b, c ; umsubl a, b, c, zr ; signed multiply long smull a, b, c ; smaddl a, b, c, zr ; signed multiply and negate long smnegl a, b, c ; smsubl a, b, c, zr
The last multiplication instruction gives you the missing piece of the 64 × 64 → 128 multiply.
; unsigned multiply high ; Xd = (Xn × Xm) >> 64, unsigned multiply umulh Xd/zr, Xn/zr, Xm/zr ; signed multiply high ; Xd = (Xn × Xm) >> 64, signed multiply smulh Xd/zr, Xn/zr, Xm/zr
These give you the upper 64 bits of a 64 × 64 → 128 multiply. If you want the full 128 bits, you combine it with the corresponding 64 × 64 → 64 multiply to get the lower 64 bits.
; unsigned 64 × 64 → 128 ; r1:r0 = r2 × r3 mul r0, r2, r3 umulh r1, r2, r3 ; signed 64 × 64 → 128 ; r1:r0 = r2 × r3 mul r0, r2, r3 smulh r1, r2, r3
Don’t be fooled by the lack of symmetry: Even though there is a
UMULL instruction, it is not the counterpart to
SMULL instruction is not the counterpart to
Whereas there are a large variety of ways to multiple two registers, there are only two ways to divide them.
; unsigned divide ; Rd = Rn ÷ Rm, unsigned divide, round toward zero udiv Rd/zr, Rn/zr, Rm/zr ; signed divide ; Rd = Rn ÷ Rm, signed divide, round toward zero sdiv Rd/zr, Rn/zr, Rm/zr
If you try to divide by zero, there is no exception. The result is just zero. If you want to trap division by zero, you’ll have to test for a zero denominator explicitly.
There is also no exception for dividing the most negative integer by −1. You just get the most negative integer back.
None of the multiplication or division operations set flags.
There is no instruction for calculating the remainder. You can do that manually by calculating r = n − (n ÷ d) × d. This can be done by following up the division with an
; unsigned remainder after division udiv Rq, Rn, Rm ; Rq = Rn ÷ Rm msub Rr, Rq, Rm, Rn ; Rr = Rn - Rq × Rm ; = Rn - (Rn ÷ Rm) × Rm ; signed remainder after division sdiv Rq, Rn, Rm ; Rq = Rn ÷ Rm msub Rr, Rq, Rm, Rn ; Rr = Rn - Rq × Rm ; = Rn - (Rn ÷ Rm) × Rm
Next time, we’ll look at the logical operations and their extremely weird immediates.
I’m just glad they belatedly decided that division was actually worth doing in hardware. (I think v7-R also has division instructions, but I’ve never seen that in the wild.)