{"id":99445,"date":"2018-08-08T07:00:00","date_gmt":"2018-08-08T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99445"},"modified":"2019-03-13T00:38:09","modified_gmt":"2019-03-13T07:38:09","slug":"20180808-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180808-00\/?p=99445","title":{"rendered":"The PowerPC 600 series, part 3: Arithmetic"},"content":{"rendered":"<p>Before we start with arithmetic, we need to have a talk about carry. <\/p>\n<p>The PowerPC uses true carry for both addition and subtraction. This is different from the x86 family of processors, for which the carry flag is actually a borrow bit when used in subtraction. <a HREF=\"https:\/\/en.wikipedia.org\/wiki\/Carry_flag#Carry_flag_vs._borrow_flag\">You can read more about the difference on Wikipedia<\/a>. There are some instructions which perform a combined addition and subtraction, and in that case, the only sane choice is to use true carry. (If you had chosen carry as borrow, then it wouldn&#8217;t be clear whether the final carry bit represented the carry from the addition or the borrow from subtraction.) <\/p>\n<p>To emphasize the fact that the PowerPC uses true carry, I will rewrite all subtractions as additions, taking advantage of the twos complement identity <\/p>\n<pre>\n    -x = ~x + 1\n<\/pre>\n<p>Okay, now we can do some arithmetic. Let&#8217;s start with addition. <\/p>\n<pre>\n    add     rd, ra, rb      ; rd = ra + rb\n    add.    rd, ra, rb      ; rd = ra + rb, update cr0\n    addo    rd, ra, rb      ; rd = ra + rb, update         XER overflow bits\n    addo.   
rd, ra, rb      ; rd = ra + rb, update cr0 and XER overflow bits\n<\/pre>\n<p>These instructions add two source registers and optionally update the <var>xer<\/var> register to capture any possible overflow (by appending an <code>o<\/code>), and also optionally update the <var>cr0<\/var> register to reflect the sign of the result and any summary overflow (by appending a period). <\/p>\n<p>I don&#8217;t know what they were thinking, using an easily-overlooked mark of punctuation to carry important information. <\/p>\n<p>There is also a version of the above instruction that takes a signed 16-bit immediate: <\/p>\n<pre>\n    addi    rd, ra\/0, imm16 ; rd = ra\/0 + (int16_t)imm16\n<\/pre>\n<p>Note that this variant does not accept <code>o<\/code> or <code>.<\/code> suffixes. <\/p>\n<p>The <var>ra\/0<\/var> notation means &#8220;This can be any general purpose register, but if you ask for <var>r0<\/var>, you actually get the constant zero.&#8221; The register <var>r0<\/var> is weird like that. Sometimes it stands for itself, but sometimes it reads as zero. As a result, the <var>r0<\/var> register isn&#8217;t used much. <\/p>\n<p>The assembler lets you write <var>r0<\/var> through <var>r31<\/var> as synonyms for the integers 0 through 31, so the following are equivalent: <\/p>\n<pre>\n    add     r3, r0, r4      ; r3 = r0 + r4\n    add      3,  0,  4      ; r3 = r0 + r4\n    add     r3, r0,  4      ; r3 = r0 + r4\n<\/pre>\n<p>This can get very confusing. That last example sure looks like you&#8217;re setting <var>r3<\/var> to <var>r0<\/var> plus 4, but it&#8217;s not. The 4 is in a position where a register is expected, so it actually means <var>r4<\/var>. <\/p>\n<p>Similarly, you might think you&#8217;re adding an immediate to <var>r0<\/var> when you write <\/p>\n<pre>\n    addi    r3, r0, 256     ; r3 = r0 + 256, right?\n<\/pre>\n<p>but nope, the value of 0 as the second operand to <code>addi<\/code> is interpreted as the constant zero, not register number zero. 
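<\/p>\n<p>To make the <var>ra\/0<\/var> rule concrete, here is a small sketch. The register numbers and the constant are arbitrary choices for illustration, and the second operand of the first <code>addi<\/code> is written as a bare <code>0<\/code> so that it looks like what it means. <\/p>\n<pre>\n    addi    r3,  0, 256     ; r3 = 0 + 256 (ra\/0 slot: 0 reads as the constant zero)\n    addi    r3, r3, 1       ; r3 = r3 + 1  (any other register reads normally)\n    add     r3, r0, r4      ; r3 = r0 + r4 (plain add has no ra\/0, so this is the real r0)\n<\/pre>\n<p>After the first two instructions, <var>r3<\/var> holds 257 regardless of what was in <var>r0<\/var>. 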
<\/p>\n<p>Fortunately, the Windows disassembler always calls registers by their mnemonic rather than by number. <\/p>\n<p>Wait, we&#8217;re not done with addition yet. <\/p>\n<pre>\n    ; add and set carry\n    addc    rd, ra, rb      ; rd = ra + rb, update carry\n    addc.   rd, ra, rb      ; rd = ra + rb, update carry and cr0\n    addco   rd, ra, rb      ; rd = ra + rb, update carry         and XER overflow bits\n    addco.  rd, ra, rb      ; rd = ra + rb, update carry and cr0 and XER overflow bits\n<\/pre>\n<p>The &#8220;add and set carry&#8221; instructions act like the corresponding regular add instructions, except that they also update the carry bit in <var>xer<\/var> based on whether a carry propagated out of the highest-order bit. <\/p>\n<pre>\n    ; add extended\n    adde    rd, ra, rb      ; rd = ra + rb + carry, update carry\n    adde.   rd, ra, rb      ; rd = ra + rb + carry, update carry and cr0\n    addeo   rd, ra, rb      ; rd = ra + rb + carry, update carry         and XER overflow bits\n    addeo.  rd, ra, rb      ; rd = ra + rb + carry, update carry and cr0 and XER overflow bits\n<\/pre>\n<p>The &#8220;add extended&#8221; instructions act like the corresponding &#8220;add and set carry&#8221; instructions, except that they also add 1 if the carry bit was set. This makes multiword addition convenient. <\/p>\n<pre>\n    ; add minus one extended\n    addme   rd, ra          ; rd = ra + carry + ~0, update carry\n    addme.  rd, ra          ; rd = ra + carry + ~0, update carry and cr0\n    addmeo  rd, ra          ; rd = ra + carry + ~0, update carry         and XER overflow bits\n    addmeo. rd, ra          ; rd = ra + carry + ~0, update carry and cr0 and XER overflow bits\n<\/pre>\n<p>The &#8220;add minus one extended&#8221; instruction is like &#8220;add extended&#8221; except that the second parameter is hard-coded to &minus;1. I wrote <code>~0<\/code> instead of &minus;1 to emphasize that we are using true carry. 
(This is the combined addition-and-subtraction instruction I alluded to at the top of the article. It adds carry and then subtracts one.) <b>Added<\/b>: As commenter Neil noted below, through the magic of true carry, this is the same as &#8220;subtract zero extended&#8221;, which makes it handy for multiword arithmetic. <\/p>\n<pre>\n    ; add zero extended\n    addze   rd, ra          ; rd = ra + carry, update carry\n    addze.  rd, ra          ; rd = ra + carry, update carry and cr0\n    addzeo  rd, ra          ; rd = ra + carry, update carry         and XER overflow bits\n    addzeo. rd, ra          ; rd = ra + carry, update carry and cr0 and XER overflow bits\n<\/pre>\n<p>The &#8220;add zero extended&#8221; instruction is like &#8220;add extended&#8221; except that the second parameter is hard-coded to zero. <\/p>\n<p>And then there are some instructions that take signed 16-bit immediates: <\/p>\n<pre>\n    ; add immediate shifted\n    addis   rd, ra\/0, imm16  ; rd = ra\/0 + (imm16 &lt;&lt; 16)\n\n    ; add immediate and set carry\n    addic   rd, ra, imm16    ; rd = ra + (int16_t)imm16, update carry\n\n    ; add immediate and set carry and update cr0\n    addic.  rd, ra, imm16    ; rd = ra + (int16_t)imm16, update carry and cr0\n<\/pre>\n<p>Phew, that was addition. There are also subtraction instructions, which should look mostly familiar now that you&#8217;ve seen addition. <\/p>\n<pre>\n    ; subtract from\n    subf    rd, ra, rb      ; rd = ~ra + rb + 1\n    subf.   rd, ra, rb      ; rd = ~ra + rb + 1, update cr0\n    subfo   rd, ra, rb      ; rd = ~ra + rb + 1, update         XER overflow bits\n    subfo.  rd, ra, rb      ; rd = ~ra + rb + 1, update cr0 and XER overflow bits\n\n    ; subtract from and set carry\n    subfc   rd, ra, rb      ; rd = ~ra + rb + 1, update carry\n    subfc.  rd, ra, rb      ; rd = ~ra + rb + 1, update carry and cr0\n    subfco  rd, ra, rb      ; rd = ~ra + rb + 1, update carry         and XER overflow bits\n    subfco. 
rd, ra, rb      ; rd = ~ra + rb + 1, update carry and cr0 and XER overflow bits\n\n    ; subtract from extended\n    subfe    rd, ra, rb     ; rd = ~ra + rb + carry, update carry\n    subfe.   rd, ra, rb     ; rd = ~ra + rb + carry, update carry and cr0\n    subfeo   rd, ra, rb     ; rd = ~ra + rb + carry, update carry         and XER overflow bits\n    subfeo.  rd, ra, rb     ; rd = ~ra + rb + carry, update carry and cr0 and XER overflow bits\n\n    ; subtract from minus one extended\n    subfme   rd, ra         ; rd = ~ra + carry + ~0, update carry\n    subfme.  rd, ra         ; rd = ~ra + carry + ~0, update carry and cr0\n    subfmeo  rd, ra         ; rd = ~ra + carry + ~0, update carry         and XER overflow bits\n    subfmeo. rd, ra         ; rd = ~ra + carry + ~0, update carry and cr0 and XER overflow bits\n\n    ; subtract from zero extended\n    subfze   rd, ra         ; rd = ~ra + carry, update carry\n    subfze.  rd, ra         ; rd = ~ra + carry, update carry and cr0\n    subfzeo  rd, ra         ; rd = ~ra + carry, update carry         and XER overflow bits\n    subfzeo. rd, ra         ; rd = ~ra + carry, update carry and cr0 and XER overflow bits\n\n    ; subtract from immediate and set carry\n    subfic  rd, ra, imm16   ; rd = ~ra + (int16_t)imm16 + 1, update carry\n<\/pre>\n<p>Note that the instruction is &#8220;subtract from&#8221;, not &#8220;subtract&#8221;. The second operand is subtracted from the third operand; in other words, the two operands are backwards. 
Fortunately, the assembler provides a family of synthetic instructions that simply swap the last two operands: <\/p>\n<pre>\n    subf    rd, rb, ra      ; sub  rd, ra, rb\n    ; similarly \"sub.\", \"subo\", and \"subo.\".\n\n    subfc   rd, rb, ra      ; subc rd, ra, rb\n    ; similarly \"subc.\", \"subco\", and \"subco.\".\n<\/pre>\n<p>The second problem is that there is no <code>subfis<\/code> to subtract a shifted immediate, nor is there <code>subfic.<\/code> to update flags after subtracting from an immediate. But the assembler can synthesize those too: <\/p>\n<pre>\n    addi    rd, ra\/0, -imm16 ; subi   rd, ra\/0, imm16\n    addis   rd, ra\/0, -imm16 ; subis  rd, ra\/0, imm16\n    addic   rd, ra, -imm16   ; subic  rd, ra, imm16\n    addic.  rd, ra, -imm16   ; subic. rd, ra, imm16\n<\/pre>\n<p>PowerPC&#8217;s use of true carry allows this trick to work while still preserving the semantics of carry and overflow. <\/p>\n<p>We wrap up with multiplication and division. <\/p>\n<pre>\n    ; multiply low immediate\n    mulli   rd, ra, imm16    ; rd = (int32_t)ra * (int16_t)imm16\n\n    ; multiply low word\n    mullw   rd, ra, rb       ; rd = (int32_t)ra * (int32_t)rb\n    ; also \"mullw.\", \"mullwo\", and \"mullwo.\".\n\n    ; multiply high word\n    mulhw   rd, ra, rb       ; rd = ((int32_t)ra * (int32_t)rb) &gt;&gt; 32\n    ; also \"mulhw.\"\n\n    ; multiply high word unsigned\n    mulhwu  rd, ra, rb       ; rd = ((uint32_t)ra * (uint32_t)rb) &gt;&gt; 32\n    ; also \"mulhwu.\"\n<\/pre>\n<p>The &#8220;multiply low&#8221; instructions perform the multiplication and return the low-order 32 bits. The &#8220;multiply high&#8221; instructions return the high-order 32 bits. 
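<\/p>\n<p>Putting the two halves together gives a full 64-bit product. This is a sketch that assumes the unsigned factors are in <var>r3<\/var> and <var>r4<\/var> and that <var>r5<\/var> and <var>r6<\/var> are free for the result; those register choices are mine, purely for illustration. <\/p>\n<pre>\n    ; unsigned 32-bit by 32-bit multiply, 64-bit result in r6:r5\n    mullw   r5, r3, r4       ; r5 = low-order  32 bits of r3 * r4\n    mulhwu  r6, r3, r4       ; r6 = high-order 32 bits of (uint32_t)r3 * (uint32_t)r4\n<\/pre>\n<p>For a signed product, use <code>mulhw<\/code> in place of <code>mulhwu<\/code>. The low-order 32 bits come out the same either way, so <code>mullw<\/code> serves both cases. 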
<\/p>\n<p>Finally, we have division: <\/p>\n<pre>\n    ; divide word\n    divw    rd, ra, rb       ; rd = (int32_t)ra &divide; (int32_t)rb\n    ; also \"divw.\", \"divwo\", and \"divwo.\".\n\n    ; divide word unsigned\n    divwu   rd, ra, rb       ; rd = (uint32_t)ra &divide; (uint32_t)rb\n    ; also \"divwu.\", \"divwuo\", and \"divwuo.\".\n<\/pre>\n<p>If you try to divide by zero or (for <code>divw<\/code>) if you try to divide <code>0x80000000<\/code> by &minus;1, then the results are garbage, and if you used the <code>o<\/code> version of the instruction, then the overflow flag is set. No trap is generated. (If you didn&#8217;t use the <code>o<\/code> version, then you get no indication that anything went wrong. You just get garbage.) <\/p>\n<p>There is no modulus instruction. If you want to get the remainder, take the quotient, multiply it by the divisor, and subtract the product from the dividend. <\/p>\n<p>Okay, that was arithmetic. <a HREF=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/\">Next up<\/a> are the bitwise logical operators and combining arithmetic and logical operators to load constants. <\/p>\n<p><b>Bonus snark<\/b>: For a reduced instruction set computer, it sure has an awful lot of instructions. And we haven&#8217;t even gotten to control flow yet. 
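<\/p>\n<p>Returning to the remainder recipe for a moment, here is how it might look. This is a sketch that assumes the dividend is in <var>r3<\/var>, the divisor is in <var>r4<\/var>, and <var>r5<\/var> is free as a scratch register; the register choices are arbitrary. <\/p>\n<pre>\n    ; r5 = signed remainder of r3 &divide; r4\n    divw    r5, r3, r4       ; r5 = r3 &divide; r4 (quotient)\n    mullw   r5, r5, r4       ; r5 = quotient * divisor\n    subf    r5, r5, r3       ; r5 = ~r5 + r3 + 1 = dividend - (quotient * divisor)\n<\/pre>\n<p>Note the operand order on <code>subf<\/code>: the thing being subtracted comes first. If the divisor is zero, or if you divide <code>0x80000000<\/code> by &minus;1, the quotient is garbage and so is the remainder. 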
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Who knew there were so many ways to add numbers.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99445","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Who knew there were so many ways to add numbers.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99445","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99445"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99445\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99445"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99445"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99445"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}