{"id":99435,"date":"2018-08-07T07:00:00","date_gmt":"2018-08-07T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99435"},"modified":"2019-03-13T00:38:05","modified_gmt":"2019-03-13T07:38:05","slug":"20180807-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180807-00\/?p=99435","title":{"rendered":"The PowerPC 600 series, part 2: Condition registers and the integer exception register"},"content":{"rendered":"<p>The integer exception register <var>xer<\/var> contains a bunch of stuff, but the bits that are relevant to us are <\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Bit<\/th>\n<th>Name<\/th>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>Summary overflow<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>Overflow<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>Carry<\/td>\n<\/tr>\n<\/table>\n<p>Some instructions update the overflow and summary overflow bits in the <var>xer<\/var> register. When those instructions are executed, the overflow bit is updated to represent whether the operation resulted in a signed overflow. The summary overflow bit accumulates all the overflow bits since it was last explicitly reset. This lets you perform a series of arithmetic operations and then test a single bit at the end to see if an overflow occurred anywhere along the way. <\/p>\n<p>Some instructions consume and\/or target the carry bit in <var>xer<\/var>. We&#8217;ll discuss how carry works when we get to integer arithmetic. <\/p>\n<p>Each of the <var>cr#<\/var> condition registers consists of four bits, numbered from most significant to least significant. 
<\/p>\n<table BORDER=\"1\" CELLSPACING=\"0\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border: solid 1px black;border-collapse: collapse\">\n<tr>\n<th>Bit<\/th>\n<th>Name<\/th>\n<th>Mnemonic<\/th>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>Less than<\/td>\n<td><var>lt<\/var><\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>Greater than<\/td>\n<td><var>gt<\/var><\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>Equal to<\/td>\n<td><var>eq<\/var><\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td>Summary overflow<\/td>\n<td><var>so<\/var><\/td>\n<\/tr>\n<\/table>\n<p>For convenience, the assembler predefines the constants <var>lt<\/var>, <var>gt<\/var>, <var>eq<\/var>, and <var>so<\/var> to represent their respective bit numbers. <\/p>\n<p>The <code>cmp<\/code> family of instructions compares two values and writes the result to a condition register. <\/p>\n<pre>\n    cmpw    crd, ra, rb     ; crd = compare ( int32_t)ra with ( int32_t)rb\n    cmpwi   crd, ra, imm16  ; crd = compare ( int32_t)ra with ( int16_t)imm16\n    cmplw   crd, ra, rb     ; crd = compare (uint32_t)ra with (uint32_t)rb\n    cmplwi  crd, ra, imm16  ; crd = compare (uint32_t)ra with (uint16_t)imm16\n<\/pre>\n<p>You can compare two registers, or you can compare a register with an immediate, and you can choose whether the comparison is signed or unsigned. (Recall that the <code>l<\/code> stands for <i>logical<\/i>.) For example: <\/p>\n<pre>\n    cmpw    cr3, r0, r1     ; cr3 = compare r0 with r1 as signed values\n<\/pre>\n<p> The <var>lt<\/var>, <var>gt<\/var>, and <var>eq<\/var> bits are set according to the result of the comparison, and the <var>so<\/var> bit receives a copy of the current summary overflow bit in <var>xer<\/var>. 
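<\/p>\n<p>As a quick illustration of why the signed versus unsigned choice matters, here is a minimal sketch that compares the same two bit patterns both ways. It assumes the <code>li<\/code> (load immediate) synthetic instruction, which this series has not covered yet, and the register choices are arbitrary. <\/p>\n<pre>\n    li      r3, -1          ; r3 = 0xFFFFFFFF (assumed synthetic instruction)\n    li      r4, 1           ; r4 = 0x00000001\n    cmpw    cr6, r3, r4     ; signed: -1 is less than 1, so cr6[lt] is set\n    cmplw   cr7, r3, r4     ; unsigned: 0xFFFFFFFF is greater than 1, so cr7[gt] is set\n<\/pre>\n<p>In both cases, the <var>eq<\/var> bit is clear, and the <var>so<\/var> bit is whatever the summary overflow bit in <var>xer<\/var> happened to be at the time. 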
<\/p>\n<p>If you do not specify a destination condition register, it defaults to <var>cr0<\/var>: <\/p>\n<pre>\n    cmpw    ra, rb          ; cr0 = compare ( int32_t)ra with ( int32_t)rb\n    cmpwi   ra, imm16       ; cr0 = compare ( int32_t)ra with ( int16_t)imm16\n    cmplw   ra, rb          ; cr0 = compare (uint32_t)ra with (uint32_t)rb\n    cmplwi  ra, imm16       ; cr0 = compare (uint32_t)ra with (uint16_t)imm16\n<\/pre>\n<p>As we&#8217;ll see later, some arithmetic instructions implicitly update <var>cr0<\/var> by comparing the computed result against zero. (Similarly, some floating point operations implicitly update <var>cr1<\/var>.) When performed as part of an arithmetic instruction, the comparison is always performed as a signed comparison, even if the instruction&#8217;s underlying operation was unsigned. <\/p>\n<p>If you combine an update of <var>cr0<\/var> with an arithmetic operation, the <var>so<\/var> bit is a copy of the summary overflow bit in the <var>xer<\/var> register at the end of the instruction. That means that if an arithmetic operation requests both <var>cr0<\/var> and <var>xer<\/var> to be updated, the <var>xer<\/var> register is updated first, and then the summary overflow bit from <var>xer<\/var> is copied to the <var>so<\/var> bit in <var>cr0<\/var>. As a result, the <var>so<\/var> bit in <var>cr0<\/var> captures whether a signed overflow occurred in any overflow-detecting operation up to and including the current one. <\/p>\n<p>The Microsoft compiler tends to prefer to target <var>cr6<\/var> and <var>cr7<\/var> in its comparison instructions. It doesn&#8217;t make much difference to the processor, but I suspect the compiler tries to avoid <var>cr0<\/var> so that it doesn&#8217;t conflict with the use of <var>cr0<\/var> by the arithmetic instructions. 
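<\/p>\n<p>To make the ordering concrete, here is a rough sketch. It assumes the <code>addo.<\/code> form of the addition instruction, where the <code>o<\/code> suffix requests an update of the overflow bits in <var>xer<\/var> and the <code>.<\/code> suffix requests an update of <var>cr0<\/var>. Arithmetic has not been covered yet, so treat the instruction and the register choices as illustrative assumptions. <\/p>\n<pre>\n    addo.   r3, r4, r5      ; r3 = r4 + r5\n                            ; xer[ov] = 1 if the signed addition overflowed, else 0\n                            ; xer[so] = xer[so] | xer[ov]\n                            ; cr0[lt], cr0[gt], cr0[eq] = signed compare of r3 with 0\n                            ; cr0[so] = xer[so], including this instruction's overflow\n    cmpw    cr6, r6, r7     ; explicit comparison goes to cr6 and leaves cr0 alone\n<\/pre>\n<p>Sending the explicit comparison to <var>cr6<\/var> means that the flags recorded in <var>cr0<\/var> by the <code>addo.<\/code> are still available afterward. 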
<\/p>\n<pre>\n    mcrxr  crd              ; crd = first four bits of xer\n<\/pre>\n<p>The &#8220;move to condition register from <var>xer<\/var>&#8221; instruction copies the summary overflow, overflow, and carry bits from the <var>xer<\/var> register to the specified condition register, and then it clears the bits from <var>xer<\/var>. <\/p>\n<p>No, I don&#8217;t know why they left the &#8220;e&#8221; out of the opcode. <\/p>\n<p>This is how you reset the summary overflow.&sup1; <\/p>\n<pre>\n    mtxer  ra               ; xer = ra\n    mfxer  rd               ; rd = xer\n<\/pre>\n<p>These instructions&sup2; move to\/from the <var>xer<\/var> register. They are another way to clear the <var>xer<\/var> register, or to set it to a particular initial state. <\/p>\n<p>There are a good number of bitwise operations that combine two condition register bits and store the result into a third condition register bit. These let you build boolean expressions out of condition registers. <\/p>\n<pre>\n    crand   bd, ba, bb  ; cr[bd] =   cr[ba] &amp;  cr[bb]\n    cror    bd, ba, bb  ; cr[bd] =   cr[ba] |  cr[bb]\n    crxor   bd, ba, bb  ; cr[bd] =   cr[ba] ^  cr[bb]\n    crnand  bd, ba, bb  ; cr[bd] = !(cr[ba] &amp;  cr[bb])\n    crnor   bd, ba, bb  ; cr[bd] = !(cr[ba] |  cr[bb])\n    creqv   bd, ba, bb  ; cr[bd] = !(cr[ba] ^  cr[bb])\n    crandc  bd, ba, bb  ; cr[bd] =   cr[ba] &amp; !cr[bb] \"and complement\"\n    crorc   bd, ba, bb  ; cr[bd] =   cr[ba] | !cr[bb] \"or complement\"\n<\/pre>\n<p>Remember that the PowerPC numbers bits from most significant to least significant, so bit zero is the high-order bit. <\/p>\n<p>To save you from having to memorize all the bit numbers, the assembler lets you write <var>cr0<\/var> to mean 0, <var>cr1<\/var> to mean 1, and so on through <var>cr7<\/var>, which means 7. 
Combined with the constants for the four bits in the condition register, this lets you write <\/p>\n<pre>\n    crand   4*cr3+eq, 4*cr2+lt, 4*cr6+gt ; cr3[eq] = cr2[lt] &amp; cr6[gt]\n<\/pre>\n<p>instead of the instruction only a processor&#8217;s mother could love: <\/p>\n<pre>\n    crand   14, 8, 25                    ; cr3[eq] = cr2[lt] &amp; cr6[gt]\n<\/pre>\n<p>There are also special instructions for transferring between <var>cr<\/var> and a general-purpose register. <\/p>\n<pre>\n    mfcr    rt           ; rt = cr\n    mtcrf   mask, ra     ; cr = ra (selected by mask)\n<\/pre>\n<p>The mask is an 8-bit immediate. If a bit is set, then the corresponding <var>cr#<\/var> is copied from the corresponding bits of <var>ra<\/var>. For example, 128 means &#8220;Copy the top four bits of <var>ra<\/var> into <var>cr0<\/var>, and leave all the other condition registers alone.&#8221; Recall that the PowerPC counts bits from most significant to least significant, so <var>cr0<\/var> is stored in the highest-order four bits. <\/p>\n<p>The assembler provides synthetic instructions for various special cases of the above operations: <\/p>\n<pre>\n    creqv   bd, bd, bd  ; crset   bd          ; cr[bd]  = 1\n    crxor   bd, bd, bd  ; crclr   bd          ; cr[bd]  = 0\n    cror    bd, ba, ba  ; crmove  bd, ba      ; cr[bd]  = cr[ba]\n    crnor   bd, ba, ba  ; crnot   bd, ba      ; cr[bd]  = !cr[ba]\n    mtcr    ra          ; mtcrf   255, ra     ; cr = ra\n<\/pre>\n<p>Here&#8217;s an example of how these boolean operations could be used: <\/p>\n<pre>\n    cmpw    cr2, r4, r5 ; compare r4 with r5, put result in cr2\n    cmpw    cr3, r6, r7 ; compare r6 with r7, put result in cr3\n    crandc  4*cr0+eq, 4*cr2+gt, 4*cr3+eq ; cr0[eq] = cr2[gt] &amp; !cr3[eq]\n    beq     destination ; jump if r4 &gt; r5 &amp;&amp; r6 != r7\n<\/pre>\n<p>We perform two comparison operations and put the results into <var>cr2<\/var> and <var>cr3<\/var>. 
We then perform a boolean &#8220;and not&#8221; operation that calculates <\/p>\n<pre>\n    cr0[eq] = (r4 &gt; r5) &amp; !(r6 == r7)\n            = (r4 &gt; r5) &amp; (r6 != r7)\n<\/pre>\n<p>The result is placed into the <var>eq<\/var> position of <var>cr0<\/var>, which makes it a perfect place to be the branch condition of the <code>beq<\/code> instruction. <\/p>\n<p>The traditional way of doing this on processors that don&#8217;t have these fancy condition register operations is to perform a test and a conditional branch, then another test and another conditional branch. Combining the results of the test and performing a single branch means that the entire sequence consumes only one slot in the branch predictor. This leaves more slots free to predict other branches, and the single slot this sequence does consume can predict the final result, which might be easier to predict than the individual pieces. (For example, the test might be validating that a parameter is one of two valid values. The parameter is almost always valid, even though one might not be able to predict which of the two valid values it is at any particular time.) <\/p>\n<p>Fabian Giesen notes that <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20170822-00\/?p=96865#comment-1306986\">in practice, you don&#8217;t get to perform this optimization as often as you&#8217;d like<\/a> because of short-circuiting rules in many programming languages. Under those rules, this optimization works only if the second term can be evaluated without any risk of taking any exceptions (or if the language permits you to take an exception anyway, say, because any exception would be the result of undefined behavior). <\/p>\n<p>I have yet to see the Microsoft C compiler for PowerPC perform this optimization. It just does things the conventional way. But that may just be because I haven&#8217;t encountered a situation where the optimization is even possible. 
(Also, because I&#8217;m studying code from Windows NT 3.51, and compiler technology was not as advanced back then.) <\/p>\n<p>Okay, <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180808-00\/?p=99445\">next time<\/a> we&#8217;ll start doing some arithmetic. <\/p>\n<p>&sup1; You might have noticed that there are only three interesting bits in <var>xer<\/var> but room for four bits in a condition register. The last bit is undefined. Usually, you don&#8217;t care much about the bits that got transferred; the main purpose of the instruction is its side effect of clearing the summary overflow. <\/p>\n<p>&sup2; These instructions are actually special cases of the <code>mtspr<\/code> and <code>mfspr<\/code> instructions which move to\/from a special register. The <var>xer<\/var> register is formally register <var>spr1<\/var>, so the <code>mtxer<\/code> and <code>mfxer<\/code> instructions are technically synthetic instructions. <\/p>\n<pre>\n    mtspr  1, ra            ; spr1 = ra\n    mfspr  1, rd            ; rd = spr1\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Keeping track of things that happened.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99435","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Keeping track of things that 
happened.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99435","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99435"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99435\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}