{"id":98435,"date":"2018-04-04T07:00:00","date_gmt":"2018-04-04T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=98435"},"modified":"2021-06-04T09:33:19","modified_gmt":"2021-06-04T16:33:19","slug":"20180404-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180404-00\/?p=98435","title":{"rendered":"The MIPS R4000, part 3: Multiplication, division, and the temperamental HI and LO registers"},"content":{"rendered":"<p>The MIPS R4000 can perform multiplication and division in hardware, but it does so in an unusual way, and this is where the temperamental <var>HI<\/var> and <var>LO<\/var> registers enter the picture.<\/p>\n<p>The <var>HI<\/var> and <var>LO<\/var> registers are 32-bit registers which hold or accumulate the results of a multiplication or addition. You cannot operate on them directly. They are set by a suitable arithmetic operation, and by special instructions for moving values in and out.<\/p>\n<p>The multiplication instructions treat <var>HI<\/var> and <var>LO<\/var> as a logical 64-bit register, where the high-order 32 bits are in the <var>HI<\/var> register and the low-order 32 bits are in the <var>LO<\/var> register.<\/p>\n<pre>    MUL     rd, rs, rt      ; rd = rs * rt, corrupts HI and LO\r\n    MULT    rs, rt          ; HI:LO = rs * rt (signed)\r\n    MULTU   rs, rt          ; HI:LO = rs * rt (unsigned)\r\n<\/pre>\n<p>The simplest version is <code>MUL<\/code> which multiples two 32-bit registers and stores a 32-bit result into a general-purpose register. As a side effect, it corrupts the <var>HI<\/var> and <var>LO<\/var> registers. (This is the only multiplication or division operation that puts the result in a general-purpose register instead of into <var>HI<\/var> and <var>LO<\/var>.)<\/p>\n<p>The <code>MULT<\/code> instruction multiplies two signed 32-bit values to form a 64-bit result, which it stores in <var>HI<\/var> and <var>LO<\/var>.<\/p>\n<p>The <code>MULTU<\/code> instruction does the same thing, but treats the factors as unsigned.<\/p>\n<p>The next group of multiplication instructions performs accumulation.<\/p>\n<pre>    MADD    rs, rt          ; HI:LO += rs * rt (signed)\r\n    MADDU   rs, rt          ; HI:LO += rs * rt (unsigned)\r\n    MSUB    rs, rt          ; HI:LO -= rs * rt (signed)\r\n    MSUBU   rs, rt          ; HI:LO -= rs * rt (unsigned)\r\n<\/pre>\n<p>After performing the appropriate multiplication operation, the 64-bit result is added to or subtracted from the value currently in the <var>HI<\/var> and <var>LO<\/var> registers.<\/p>\n<p>Note that the <code>U<\/code> suffix applies to the signed-ness of the multiplication, not to whether the operation traps on signed overflow during addition or subtraction. None of the multiplication instructions trap.<\/p>\n<p>The operation runs faster if you put the smaller factor in <var>rt<\/var>, so if you know (or suspect) that one of the values is smaller than the other, you can try to arrange for the smaller number to be in <var>rt<\/var>.<\/p>\n<p>You might think that the division operations take a 64-bit value in <var>HI<\/var> and <var>LO<\/var> and divide it by a 32-bit register. But you&#8217;d be wrong. They divide a 32-bit value by another 32-bit value and store the quotient and remainder in in <var>HI<\/var> and <var>LO<\/var>.<\/p>\n<pre>    DIV     rd, rs, rt      ; LO = rs \/ rt, HI = rs % rt (signed)\r\n    DIVU    rd, rs, rt      ; LO = rs \/ rt, HI = rs % rt (unsigned)\r\n<\/pre>\n<p>None of the division operations trap, not even for overflow or divide-by-zero. If you divide by zero or incur division overflow, the results in <var>HI<\/var> and <var>LO<\/var> are garbage. If you care about overflow or division by zero, you need to check for it explicitly.<\/p>\n<p>Okay, that&#8217;s great. We&#8217;ve done some calculations and put the results into <var>HI<\/var> and <var>LO<\/var>. But how do we get the answer out? (And how do you put the initial values in, if you are using <code>MADD<\/code> or <code>MSUB<\/code>?)<\/p>\n<pre>    MFHI    rd              ; rd = HI \"move from HI\"\r\n    MFLO    rd              ; rd = LO \"move from LO\"\r\n    MTHI    rs              ; HI = rs \"move to HI\"\r\n    MTLO    rs              ; LO = rs \"move to LO\"\r\n<\/pre>\n<p>The multiplication and division operations take some time to execute,\u00b9 and if you try to read the results too soon, you will stall until the results are available. Therefore, it&#8217;s best to distract yourself with some other operations while waiting for the multiplication or division operation to do its thing. (For example, you might check if you need to raise a runtime exception because you just asked the processor to divide by zero.)<\/p>\n<p>The temperamental part of the <var>HI<\/var> and <var>LO<\/var> registers is in how you read the values out.<\/p>\n<p>Tricky rule number one: Once you perform a <code>MTHI<\/code> or <code>MTLO<\/code> instruction, <i>both<\/i> of the previous values in <var>HI<\/var> and <var>LO<\/var> are lost. That means you can&#8217;t do this:<\/p>\n<pre>    MULT    r1, r2          ; HI:LO = r1 * r2 (signed)\r\n    ... stuff that doesn't involve HI or LO ...\r\n    MTHI    r3              ; HI = r3\r\n    ... stuff that doesn't involve HI or LO ...\r\n    MFLO    r4              ; r4 = GARBAGE\r\n<\/pre>\n<p>You might na\u00efvely think that the <code>MTHI<\/code> replaces the value in the <var>HI<\/var> register and leaves the <var>LO<\/var> register alone, but since this is the first write to either of the <var>HI<\/var> or <var>LO<\/var> registers since the last multiplication or division operation, <i>both<\/i> registers are lost, and your attempt to fetch the value of <var>LO<\/var> will return garbage.<\/p>\n<p>Note that this applies only to the first write to <var>HI<\/var> or <var>LO<\/var>. The second write behaves as you would expect. For example, if you perform <code>MTHI<\/code> followed by <code>MTLO<\/code>, the <code>MTHI<\/code> will set <var>HI<\/var> and corrupt <var>LO<\/var>, but the <code>MTLO<\/code> will set <var>LO<\/var> and leave <var>HI<\/var> alone.<\/p>\n<p>Tricky rule number two: If you try to read a value from <var>HI<\/var> or <var>LO<\/var>, you must wait two instructions before performing any operation that writes to <var>HI<\/var> or <var>LO<\/var>.\u00b2 Otherwise, the reads will produce garbage. The instruction that writes to <var>HI<\/var> or <var>LO<\/var> could be a multiplication or division operation, or it could be <code>MTHI<\/code> or <code>MTLO<\/code>.<\/p>\n<p>Tricky rule number two means that the following sequence is invalid:<\/p>\n<pre>    DIV     r1, r2          ; LO = r1 \/ r2, HI = r1 % r2 (signed)\r\n    ... stuff that doesn't involve HI or LO ...\r\n    MFHI    r3              ; r3 = <span style=\"text-decoration: line-through;\">r1 % r2<\/span> GARBAGE\r\n    MULT    r4, r5          ; HI:LO = r4 * r5 (signed)\r\n<\/pre>\n<p>Since the <code>MULT<\/code> comes too soon after the <code>MFHI<\/code>, the <code>MFHI<\/code> will put garbage into <var>r3<\/var>. You need to stick two instructions between the <code>MFHI<\/code> and the <code>MULT<\/code> in order to avoid this.<\/p>\n<p>(Tricky rule number two was removed in the R8000. On the R8000, if you perform a multiplication or division or <code>MTxx<\/code> too soon after a <code>MFxx<\/code>, the processor will stall until the danger window has passed.)<\/p>\n<p>Okay, next time we&#8217;ll look at constants.<\/p>\n<p>\u00b9 Wikipedia says that <a href=\"https:\/\/en.wikipedia.org\/wiki\/R4000#Integer_execution\">latency of 32-bit multiplication was 10 cycles, and latency of 32-bit division was a whopping 69 cycles<\/a>.<\/p>\n<p>\u00b2 Commenter <a href=\"http:\/\/os161.eecs.harvard.edu\">David Holland<\/a> explains that this weird rule is due to a pipeline hazard: The multiply or divide operation is not recalled if an exception occurs while the operation is in flight. If the <code>MFLO<\/code> and a subsequent multiply are both in flight and an interrupt occurs, the multiply will complete by the time the exception handler gets around to saving the <var>HI<\/var> and <var>LO<\/var> registers. When execution resumes at the <code>MFLO<\/code>, it will read the low result of the <i>following<\/i> multiplication, rather than the preceding one. That&#8217;s why you have to wait two cycles: You have to make sure that the <code>MFLO<\/code> has cleared the pipeline before initiating any new operations that may write to <var>HI<\/var> and <var>LO<\/var>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You have to treat them nicely or they will refuse to co&ouml;perate.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-98435","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>You have to treat them nicely or they will refuse to co&ouml;perate.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/98435","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=98435"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/98435\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=98435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=98435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=98435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}