{"id":99495,"date":"2018-08-15T07:00:00","date_gmt":"2018-08-15T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99495"},"modified":"2019-03-13T00:38:22","modified_gmt":"2019-03-13T07:38:22","slug":"20180815-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180815-00\/?p=99495","title":{"rendered":"The PowerPC 600 series, part 8: Control transfer"},"content":{"rendered":"<p>The PowerPC 600 series has a few types of control transfer instructions. Let&#8217;s look at direct branches first. <\/p>\n<pre>\n    b       target          ; branch to target\n    bl      target          ; branch to target and link\n<\/pre>\n<p>The direct branch instructions perform an unconditional relative branch to the target. They have a reach of &plusmn;32<a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20090611-00\/?p=17933\">MB<\/a>. All the &#8220;&#8230; and link&#8221; instructions set the <var>lr<\/var> register to the return address (the instruction after the branch). This happens even for conditional branches when the branch is not taken. <\/p>\n<p>There are also absolute versions of these instructions: <\/p>\n<pre>\n    ba      target          ; branch to target (absolute form)\n    bla     target          ; branch to target and link (absolute form)\n<\/pre>\n<p>The absolute versions treat the displacement as an absolute address rather than as a displacement from the current instruction pointer. These are not useful in Windows NT, but could be useful in embedded systems. <\/p>\n<p>Things get exciting when you look at the conditional branches. Formally, they are written as <\/p>\n<pre>\n    bc      BO, BI, target  ; branch conditional\n    bcl     BO, BI, target  ; branch conditional and link\n<\/pre>\n<p>Conditional branch instructions have a reach of only &plusmn;32<a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20090611-00\/?p=17933\">KB<\/a>. 
There are also absolute variants <code>bca<\/code> and <code>bcla<\/code> which treat the displacement as an absolute address, allowing conditional branches to the top and bottom 32KB of address space. Again, absolute addressing is not that useful in Windows NT. <\/p>\n<p>The magical <var>BO<\/var> and <var>BI<\/var> parameters describe the condition to be tested. You can optionally decrement the <var>ctr<\/var> register and check if the result is zero or nonzero.&sup1; You can also optionally check if a specific bit in the <var>cr<\/var> register is set (true) or clear (false), and sometimes you can provide a static prediction hint. The following combinations are valid: <\/p>\n<table BORDER=\"1\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border-collapse: collapse\">\n<tr>\n<th>Decrement <var>ctr<\/var>?<\/th>\n<th>Test a bit in <var>cr<\/var>?<\/th>\n<th>Prediction hint<\/th>\n<th><var>BO<\/var><\/th>\n<th>Mnemonic<\/th>\n<\/tr>\n<tr>\n<td>Yes, test for nonzero<\/td>\n<td>No<\/td>\n<td><\/td>\n<td ALIGN=\"right\">16<\/td>\n<td><code>dnz<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for nonzero<\/td>\n<td>No<\/td>\n<td>Not taken<\/td>\n<td ALIGN=\"right\">24<\/td>\n<td><code>dnz-<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for nonzero<\/td>\n<td>No<\/td>\n<td>Taken<\/td>\n<td ALIGN=\"right\">25<\/td>\n<td><code>dnz+<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for nonzero<\/td>\n<td>Test for false<\/td>\n<td><\/td>\n<td ALIGN=\"right\">0<\/td>\n<td><code>dnzf<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for nonzero<\/td>\n<td>Test for true<\/td>\n<td><\/td>\n<td ALIGN=\"right\">8<\/td>\n<td><code>dnzt<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for zero<\/td>\n<td>No<\/td>\n<td><\/td>\n<td ALIGN=\"right\">18<\/td>\n<td><code>dz<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for zero<\/td>\n<td>No<\/td>\n<td>Not taken<\/td>\n<td ALIGN=\"right\">26<\/td>\n<td><code>dz-<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for zero<\/td>\n<td>No<\/td>\n<td>Taken<\/td>\n<td 
ALIGN=\"right\">27<\/td>\n<td><code>dz+<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for zero<\/td>\n<td>Test for true<\/td>\n<td><\/td>\n<td ALIGN=\"right\">10<\/td>\n<td><code>dzt<\/code><\/td>\n<\/tr>\n<tr>\n<td>Yes, test for zero<\/td>\n<td>Test for false<\/td>\n<td><\/td>\n<td ALIGN=\"right\">2<\/td>\n<td><code>dzf<\/code><\/td>\n<\/tr>\n<tr>\n<td>No<\/td>\n<td>Test for false<\/td>\n<td><\/td>\n<td ALIGN=\"right\">4<\/td>\n<td><code>f<\/code><\/td>\n<\/tr>\n<tr>\n<td>No<\/td>\n<td>Test for false<\/td>\n<td>Not taken<\/td>\n<td ALIGN=\"right\">6<\/td>\n<td><code>f-<\/code><\/td>\n<\/tr>\n<tr>\n<td>No<\/td>\n<td>Test for false<\/td>\n<td>Taken<\/td>\n<td ALIGN=\"right\">7<\/td>\n<td><code>f+<\/code><\/td>\n<\/tr>\n<tr>\n<td>No<\/td>\n<td>Test for true<\/td>\n<td><\/td>\n<td ALIGN=\"right\">12<\/td>\n<td><code>t<\/code><\/td>\n<\/tr>\n<tr>\n<td>No<\/td>\n<td>Test for true<\/td>\n<td>Not taken<\/td>\n<td ALIGN=\"right\">14<\/td>\n<td><code>t-<\/code><\/td>\n<\/tr>\n<tr>\n<td>No<\/td>\n<td>Test for true<\/td>\n<td>Taken<\/td>\n<td ALIGN=\"right\">15<\/td>\n<td><code>t+<\/code><\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\" ALIGN=\"center\">Unconditional<\/td>\n<td>Taken<\/td>\n<td ALIGN=\"right\">20<\/td>\n<td><\/td>\n<\/tr>\n<\/table>\n<p>Any <var>BO<\/var> values not in the above table are reserved for future use and should be avoided if you know what&#8217;s good for you. <\/p>\n<p>A static prediction hint overrides any internal branch prediction algorithm, so you&#8217;d better have very high confidence that your hint is correct. <\/p>\n<p>These mnemonics save you from having to memorize the <var>BO<\/var> numbers. <\/p>\n<pre>\n    b<u>xx<\/u>     BI, target  ; branch conditional\n    b<u>xx<\/u>l    BI, target  ; branch conditional and link\n<\/pre>\n<p>Except that if the mnemonic ends in a <code>+<\/code> or <code>-<\/code>, then the prediction hint goes at the very end. For example, &#8220;branch if false and link, predict not taken&#8221; is <code>bfl-<\/code>. 
<\/p>\n<p>The bit index <var>BI<\/var> can be written as a number, but as we saw when we learned about condition registers, you can combine the condition register bit mnemonics with the <var>cr#<\/var> mnemonics to produce a reference to a condition bit. For example, <code>4*cr2+gt<\/code> means &#8220;The <var>gt<\/var> bit in the <var>cr2<\/var> condition register.&#8221; And since the numeric value of <var>cr0<\/var> is zero, you can omit <code>4*cr0+<\/code>, which gives some surprisingly readable results like <\/p>\n<pre>\n    bt       eq, target  ; branch if eq is set in cr0\n<\/pre>\n<p>The assembler goes one step further and provides a few combination mnemonics:&sup2; <\/p>\n<table BORDER=\"1\" CELLPADDING=\"3\" CLASS=\"cp3\" STYLE=\"border-collapse: collapse\">\n<tr>\n<th>Branch and condition<\/th>\n<th>Mnemonic<\/th>\n<th>Meaning<\/th>\n<\/tr>\n<tr>\n<td><code>bt lt<\/code><\/td>\n<td><code>blt<\/code><\/td>\n<td>Branch if less than<\/td>\n<\/tr>\n<tr>\n<td><code>bt gt<\/code><\/td>\n<td><code>bgt<\/code><\/td>\n<td>Branch if greater than<\/td>\n<\/tr>\n<tr>\n<td><code>bt eq<\/code><\/td>\n<td><code>beq<\/code><\/td>\n<td>Branch if equal<\/td>\n<\/tr>\n<tr>\n<td><code>bt so<\/code><\/td>\n<td><code>bso<\/code><\/td>\n<td>Branch if summary overflow<\/td>\n<\/tr>\n<tr>\n<td><code>bf lt<\/code><\/td>\n<td><code>bnl<\/code><\/td>\n<td>Branch if not less than<\/td>\n<\/tr>\n<tr>\n<td><code>bf gt<\/code><\/td>\n<td><code>bng<\/code><\/td>\n<td>Branch if not greater than<\/td>\n<\/tr>\n<tr>\n<td><code>bf eq<\/code><\/td>\n<td><code>bne<\/code><\/td>\n<td>Branch if not equal<\/td>\n<\/tr>\n<tr>\n<td><code>bf so<\/code><\/td>\n<td><code>bns<\/code><\/td>\n<td>Branch if not summary overflow<\/td>\n<\/tr>\n<\/table>\n<p>The mnemonics can separate the condition bit from the condition register, so you can get <\/p>\n<pre>\n    beq      cr4, target  ; branch if eq is set in cr4\n<\/pre>\n<p>Okay, the next type of branch instruction is the computed jump. 
<\/p>\n<pre> \n    bcctr    BO, BI, BH   ; branch conditional to address in ctr\n    bcctrl   BO, BI, BH   ; branch conditional to address in ctr and link\n\n    bclr     BO, BI, BH   ; branch conditional to address in lr\n    bclrl    BO, BI, BH   ; branch conditional to address in lr and link\n<\/pre>\n<p>You are not allowed to use any of the &#8220;decrement <var>ctr<\/var>&#8221; branch operations with the <code>bcctr<\/code> or <code>bcctrl<\/code> instructions because shame on you for even thinking about trying it. <\/p>\n<p>The <var>BO<\/var> and <var>BI<\/var> codes follow the same rules as above, and the assembler provides mnemonics for various combinations. If you go to PowerPC reference materials, you&#8217;ll see <a HREF=\"https:\/\/developer.apple.com\/library\/content\/documentation\/DeveloperTools\/Reference\/Assembler\/050-PowerPC_Addressing_Modes_and_Assembler_Instructions\/ppc_instructions.html#\/\/apple_ref\/doc\/uid\/TP30000824-TPXREF105\">horrid tables<\/a> that look like some sort of dystopian declension table from a long-forgotten Slavic language. In this hypothetical language, <code>bdnztlrl<\/code> means something like &#8220;branch on odd-numbered Thursdays,&#8221; I guess. (Okay, it actually means &#8220;<u>b<\/u>ranch, after <u>d<\/u>ecrementing <code>ctr<\/code>, if the result is <u>n<\/u>on<u>z<\/u>ero, and if the condition bit is <u>t<\/u>rue, to the address in the <code><u>lr<\/u><\/code> register, and <u>l<\/u>ink.&#8221;) <\/p>\n<p>The <var>BH<\/var> field provides a hint for branch prediction, primarily whether the branch target is likely to be the same as the previous time the branch was encountered. Branches through an import table are likely to be the same each time. Branches through a vtable could also use this hint if the method is being dispatched from the same object in a loop. (The vtable is unlikely to change during the loop.) 
<\/p>\n<p>The processor optimizes on the assumption that <code>bctr<\/code> is a computed jump and <code>blr<\/code> is a subroutine return,&sup3; although the <var>BH<\/var> hints can tweak those assumptions. Furthermore, Windows NT <i>requires<\/i> that non-leaf subroutine returns be encoded exclusively as <code>blr<\/code>. You are not allowed to pull fancy tricks like <code>beqlr<\/code> to perform a conditional subroutine return. This is not a significant problem in practice because there&#8217;s usually other stuff that needs to be done as part of the function epilogue. Adding this rule makes the exception unwinding code easier. <\/p>\n<p>For the same reason, the conditional versions of the &#8220;and link&#8221; branches are mostly useless in practice because even if you can conditionalize the link, you still prepared the function call unconditionally. You might have been better off just branching over the function call entirely. <\/p>\n<p>Okay, so great, you have these instructions that operate on the <var>lr<\/var> and <var>ctr<\/var> registers, but how do you actually get values in and out of them? <\/p>\n<pre>\n    mflr    rt           ; rt = lr\n    mfctr   rt           ; rt = ctr\n\n    mtlr    rs           ; lr = rs\n    mtctr   rs           ; ctr = rs\n<\/pre>\n<p>The &#8220;move from\/to <var>lr<\/var>\/<var>ctr<\/var>&#8221; instructions let you move values into and out of the <var>lr<\/var> and <var>ctr<\/var> registers. (Like <code>mfxer<\/code> and <code>mtxer<\/code>, these are actually shorthand for <code>mfspr<\/code> and <code>mtspr<\/code> with the appropriate magic number for <var>lr<\/var> or <var>ctr<\/var>.) <\/p>\n<p>In practice, the first instruction of a non-leaf function is <code>mflr r0<\/code> to save the return address, and when it&#8217;s ready to return, it will do a <code>mtlr r0<\/code> to load up the return address in preparation for the <code>blr<\/code>. 
This is pretty much the only thing the Microsoft compiler uses the <var>r0<\/var> register for: Transferring the return address in and out of <var>lr<\/var>. <\/p>\n<p>But wait, I&#8217;m getting ahead of myself. I promised to talk about the table of contents, so let&#8217;s do that <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180816-00\/?p=99505\">next time<\/a>. <\/p>\n<p><b>Bonus chatter<\/b>: PowerPC mnemonics are so absurd that there was even <a HREF=\"https:\/\/twitter.com\/ppcinstructions\">a short-lived parody twitter account for them<\/a>. Now that you&#8217;ve learned most of the instructions, you may understand some of the more insidey jokes, like <\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">mscdfr &#8211; Means Something Completely Different For r0<\/p>\n<p>&mdash; PowerPC Instructions (@ppcinstructions) <a href=\"https:\/\/twitter.com\/ppcinstructions\/status\/557938532401295360?ref_src=twsrc%5Etfw\">January 21, 2015<\/a><\/p><\/blockquote>\n<p>&sup1; Note that even if you loaded a 64-bit value into the <var>ctr<\/var> register (because you detected that you had a 64-bit-capable processor), the test for zero or non-zero is performed only against the least-significant 32 bits of the <var>ctr<\/var> register when the processor is in 32-bit mode (which is what Windows NT uses). <\/p>\n<p>&sup2; The assembler also provides <code>bge<\/code> (branch if greater than or equal to) as an alias for <code>bnl<\/code> (branch if not less than). I think that&#8217;s misleading, because <code>bge<\/code> suggests that the test checks two bits (<var>gt<\/var> and <var>eq<\/var>) and branches if either is set. But in fact it checks whether <var>lt<\/var> is clear. Now, if the condition register was set by a comparison, then the two cases are equivalent, but if you have been playing games with condition register flags, you can get into states where the trichotomy of numbers breaks down. 
<\/p>\n<p>&sup3; The return address predictor gives the processor the ability to start speculating instructions at the return address even before you move the return address into the <var>lr<\/var> register! <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Jump around.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99495","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Jump around.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99495","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99495"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99495\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99495"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99495"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99495"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templa
ted":true}]}}