{"id":99545,"date":"2018-08-22T07:00:00","date_gmt":"2018-08-22T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99545"},"modified":"2019-03-13T00:38:34","modified_gmt":"2019-03-13T07:38:34","slug":"20180822-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180822-00\/?p=99545","title":{"rendered":"The PowerPC 600 series, part 13: Common patterns"},"content":{"rendered":"<p>Now that we understand function calls and the table of contents, we can demonstrate some common calling sequences. If you are debugging through PowerPC code, you&#8217;ll need to be able to recognize these different types of calling sequences in order to keep your bearings. <\/p>\n<p>Non-virtual calls generally look like this:<\/p>\n<pre>\n    ; Put the parameters in r3 through r10,\n    ; and additional parameters go on the stack\n    ; after the home space (not shown here).\n    mr      r3, r30     ; parameter 1 copied from another register\n    li      r4, 1       ; parameter 2 is calculated in place\n    addi    r5, r1, 32  ; parameter 3 is address of local variable\n    bl      destination ; call the function\n    nop                 ; no need to restore table of contents\n<\/pre>\n<p>The final <code>nop<\/code> may be omitted if the compiler can prove that <code>destination<\/code> is a function in the same module. If it turns out that the destination is a glue function, then the <code>nop<\/code> becomes<\/p>\n<pre>\n    lwz     r2, 4(r1)   ; restore table of contents\n<\/pre>\n<p>Virtual calls load the destination from the target&#8217;s vtable, and it&#8217;s a function pointer, so we need to prepare the destination&#8217;s table of contents as well. <\/p>\n<pre>\n    ; \"this\" passed in r3. 
Other parameters go\n    ; into r4 through r10, with additional parameters\n    ; on the stack after the home space (not shown here).\n    mr      r3, r30     ; parameter 1 copied from another register\n    li      r4, 1       ; parameter 2 is calculated in place\n    addi    r5, r1, 32  ; parameter 3 is address of local variable\n    <font COLOR=\"blue\">lwz     r11, 0(r3)  ; r11 = vtable of target\n    lwz     r11, n(r11) ; r11 = function pointer from vtable\n    lwz     r12, 0(r11) ; r12 = address of code\n    lwz     r2, 4(r11)  ; load table of contents for destination\n    mtctr   r12         ; put code address into ctr\n    bctrl               ; and call it\n    lwz     r2, n(r1)   ; restore our table of contents<\/font>\n<\/pre>\n<p>I put all of the virtual dispatch code in one block of contiguous instructions, but in practice the compiler may choose to interleave it with the preparation of the function arguments to avoid data load stalls. The above example uses <var>r11<\/var> and <var>r12<\/var> as temporary registers for preparing the call, but in practice, the compiler will use any volatile register that is not being used to pass parameters.&sup1; <\/p>\n<p>A call to an imported function indirects through the import address table entry. This is doubly complicated because we have to ask the current table of contents where the import address table entry is, and then we need to set up the table of contents for the destination. 
<\/p>\n<pre>\n    ; Put the parameters in r3 through r10,\n    ; and additional parameters go on the stack\n    ; after the home space (not shown here).\n    mr      r3, r30     ; parameter 1 copied from another register\n    li      r4, 1       ; parameter 2 is calculated in place\n    addi    r5, r1, 32  ; parameter 3 is address of local variable\n    <font COLOR=\"blue\">lwz     r11, n(r2)  ; r11 points to import address table entry\n    lwz     r11, 0(r11) ; r11 = function pointer from import address table entry<\/font>\n    lwz     r12, 0(r11) ; r12 = address of code\n    lwz     r2, 4(r11)  ; load table of contents for destination\n    mtctr   r12         ; put code address into ctr\n    bctrl               ; and call it\n    lwz     r2, n(r1)   ; restore our table of contents\n<\/pre>\n<p>A call to an imported function incurs several memory accesses: <\/p>\n<ol>\n<li>Loading the address of the import address table entry     from the table of contents.<\/li>\n<li>Loading the function pointer from the import address table.<\/li>\n<li>Loading the destination function&#8217;s     code pointer and table of contents from the descriptor.<\/li>\n<\/ol>\n<p>I listed the last two loads (the code pointer and the table of contents) as a single item since they almost always come from the same cache line. The theory is that the load from the table of contents is probably also in cache, so it should be relatively cheap. (I don&#8217;t know how well this holds up in practice.) <\/p>\n<p>If the compiler sees multiple calls to the same imported function, it will often put the address of the import address table entry into a non-volatile register so it can avoid the load from the table of contents for the second and subsequent times it calls the function. <\/p>\n<p>The last interesting calling pattern for today is the jump table, commonly used for dense <code>switch<\/code> statements. 
Suppose we have this: <\/p>\n<pre>\n    switch (n) {\n    case 1: ...; break;\n    case 2: ...; break;\n    case 3: ...; break;\n    case 4: ...; break;\n    }\n<\/pre>\n<p>The resulting code would look like this:&sup2; <\/p>\n<pre>\n    ; jump to address based on value in r3\n    addi    r3, r3, -1          ; subtract 1\n    cmplwi  r3, 4               ; in range of the jump table?\n    bnl     default             ; nope, go to the \"case default\"\n    lwz     r12, n(r2)          ; get address of jump table\n    rlwinm  r3, r3, 2, 0, 29    ; convert to byte offset\n    lwzx    r12, r12, r3        ; load entry from jump table\n    mtctr   r12                 ; put code address into ctr\n    bctr                        ; and jump there\n<\/pre>\n<p>The jump table pattern first performs a single-comparison range check by the standard trick of offsetting the control value by the lowest value in the range and using an unsigned comparison against the length of the range. Assuming the range check passes, we have to load the address of the jump table from the table of contents, then use the adjusted value (shifted left by 2) to index into the jump table to fetch the jump destination. We then move the jump destination into <code>ctr<\/code> and jump to it. <\/p>\n<p>The compiler always codes the jump as a <code>bctr<\/code> because the processor assumes that <code>bctr<\/code> is used for computed jumps. <\/p>\n<p><a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180823-00\/?p=99555\">Next time<\/a>, we wrap up our whirlwind tour of the PowerPC 600 series by putting what we&#8217;ve learned to the test. <\/p>\n<p>&sup1; You&#8217;d think that <var>r0<\/var> would be a great choice for this purpose, but it&#8217;s not, thanks to the special rule that <var>r0<\/var> cannot be used as the base register for effective address computations. <\/p>\n<p>&sup2; At least, that&#8217;s what the result should be like. 
In practice, I&#8217;ve seen the compiler generate code like this:<\/p>\n<pre>\n    ; jump to address based on value in r3\n    addi    r11, r3, -1         ; r11 = value - 1\n    cmplwi  r11, 4              ; in range of the jump table?\n    bnl     default             ; nope, go to the \"case default\"\n    lwz     r12, n(r2)          ; get address of jump table\n    rlwinm  r3, r3, 2, 0, 29    ; convert original value to byte offset\n    addi    r3, r3, -4          ; apply the offset again\n    lwzx    r12, r12, r3        ; load entry from jump table\n    mtctr   r12                 ; put code address into ctr\n    bctr                        ; and jump there\n<\/pre>\n<p>The compiler goes to the work of calculating <var>r3<\/var>&nbsp;&minus;&nbsp;1 into <var>r11<\/var>, but when it comes time to look up the jump table entry, it goes back to the original value in <var>r3<\/var>, scales it up to a byte offset, and then has to perform an extra subtraction to cover for the fact that it shifted the wrong value. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to recognize different kinds of jumps and calls.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99545","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>How to recognize different kinds of jumps and calls.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99545","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99545"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99545\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99545"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99545"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99545"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}