{"id":102776,"date":"2019-08-09T07:00:00","date_gmt":"2019-08-09T14:00:00","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/oldnewthing\/?p=102776"},"modified":"2019-09-13T21:27:19","modified_gmt":"2019-09-14T04:27:19","slug":"20190809-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190809-00\/?p=102776","title":{"rendered":"The SuperH-3, part 5: Multiplication"},"content":{"rendered":"<p><a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190808-00\/?p=102774\"> Last time, we looked at simple addition and subtraction<\/a>. Now let&#8217;s look at multiplication.<\/p>\n<p>Multiplication operations report their results in a pair of 32-bit registers called <var>MACH<\/var> and <var>MACL<\/var>, which collectively form a 64-bit virtual register known as <var>MAC<\/var> (multiply and accumulate).<\/p>\n<p>We start with the simple multiplication operations.<\/p>\n<pre>    MUL.L   Rm, Rn  ; MACL =           Rm *           Rn, no effect on MACH\r\n    MULS.W  Rm, Rn  ; MACL = ( int16_t)Rm * ( int16_t)Rn, no effect on MACH\r\n    MULU.W  Rm, Rn  ; MACL = (uint16_t)Rm * (uint16_t)Rn, no effect on MACH\r\n<\/pre>\n<p>The <code>.W<\/code> operations treat the two source operands as 16-bit values, either signed or unsigned, and store the 32-bit result into <var>MACL<\/var>. The <code>MUL.L<\/code> instruction treats the source operands as full 32-bit values and produces a 32-bit result in <var>MACL<\/var>. 
(It doesn&#8217;t matter whether the sources are considered signed or unsigned because the lower 32 bits of the result are the same either way.)<\/p>\n<p>The next instructions produce 64-bit results.<\/p>\n<pre>    DMULS.L Rm, Rn      ; MAC = Rn * Rm,   signed 32x32\u219264 multiply\r\n    DMULU.L Rm, Rn      ; MAC = Rn * Rm, unsigned 32x32\u219264 multiply\r\n\r\n    MAC.L   @Rm+, @Rn+  ; MAC += @Rm++ * @Rn++, signed 32x32\u219264 multiply\r\n    MAC.W   @Rm+, @Rn+  ; MAC += @Rm++ * @Rn++, signed 16x16\u219264 multiply\r\n<\/pre>\n<p>The <code>MAC.x<\/code> instructions are interesting in that they access two memory locations in one instruction. Both <var>Rm<\/var> and <var>Rn<\/var> are treated as addresses: 16-bit or 32-bit values are loaded from those addresses, treated as signed integers, and multiplied together. The product is added to the 64-bit accumulator register <var>MAC<\/var>, and finally both registers are incremented by the operand size. The instruction is evidently designed for computing the dot product of two vectors.<\/p>\n<p>There&#8217;s an additional wrinkle to the <code>MAC.x<\/code> instructions: If you set the <var>S<\/var> flag, then the operations use saturating addition rather than wraparound addition. For <code>MAC.L<\/code>, the result saturates as a 48-bit value, which is then sign-extended to a 64-bit value in <var>MAC<\/var>. For <code>MAC.W<\/code>, the result saturates as a 32-bit value, and the bottom bit of <var>MACH<\/var> is set to 1 if an overflow occurred.<\/p>\n<p>In practice, of these multiplication instructions, you will likely see only <code>MUL.L<\/code> in compiler-generated code.<\/p>\n<p>Oh wait, how do you get the answers out of the <var>MAC<\/var> registers? 
Yeah, there are instructions for that too.<\/p>\n<pre>    CLRMAC              ; MAC = 0\r\n\r\n    LDS     Rm, MACH    ; MACH = Rm\r\n    LDS     Rm, MACL    ; MACL = Rm\r\n    LDS.L   @Rm+, MACH  ; MACH = @Rm+\r\n    LDS.L   @Rm+, MACL  ; MACL = @Rm+\r\n\r\n    STS     MACH, Rn    ; Rn = MACH\r\n    STS     MACL, Rn    ; Rn = MACL\r\n    STS.L   MACH, @-Rn  ; @-Rn = MACH\r\n    STS.L   MACL, @-Rn  ; @-Rn = MACL\r\n<\/pre>\n<p>The <code>CLRMAC<\/code> instruction sets <var>MAC<\/var> to zero, which is a good starting point for subsequent <code>MAC.x<\/code> instructions.<\/p>\n<p>The <code>LDS<\/code> instructions move values into the <var>MAC<\/var> registers. You can move a value directly from a register or load it (with post-increment) from memory. Conversely, the <code>STS<\/code> instructions move values out of the <var>MAC<\/var> registers, either into a general-purpose register or (with pre-decrement) into memory.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20190812-00\/?p=102778\"> Next up is integer division<\/a>, which is going to be interesting.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now things get more complicated.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-102776","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Now things get more 
complicated.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/102776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=102776"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/102776\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=102776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=102776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=102776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}