{"id":99525,"date":"2018-08-20T07:00:00","date_gmt":"2018-08-20T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=99525"},"modified":"2019-03-13T00:38:30","modified_gmt":"2019-03-13T07:38:30","slug":"20180820-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20180820-00\/?p=99525","title":{"rendered":"The PowerPC 600 series, part 11: Glue routines"},"content":{"rendered":"<p>The PowerPC has a concept of a &#8220;glue routine&#8221;. This is a little block of code to assist with control transfer, most of the time to allow a caller in one module to call a function in another module. There are two things that make glue routines tricky: Jumping to the final target and juggling two tables of contents (the caller&#8217;s and the callee&#8217;s). <\/p>\n<p>Registers <var>r11<\/var> and <var>r12<\/var> are available to glue routines as scratch registers. You can use them in your code, but be aware that they may be trashed by a glue routine, which means in practice that they are good only until the next taken jump instruction. (We saw earlier that <var>r12<\/var> is used by prologues, but since prologues run at the start of a function, and you must have jumped there, prologues are welcome to use <var>r12<\/var> as a scratch register because any valid caller must have assumed that <var>r12<\/var> could have been trashed by a glue routine anyway.) <\/p>\n<p>Let&#8217;s take care of the easy case first: Suppose the routines share the same table of contents. This is usually the case if the caller and callee are in the same module. A glue routine may become necessary if a branch target ends up being too far away to be reached by the original branch, and the linker needs to insert a glue routine near the caller that in turn jumps to the callee. (On the Alpha AXP, <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20170807-00\/?p=96766\">this is called a <i>trampoline<\/i><\/a>.) 
<\/p>\n<pre>\n    bl     toofar_glue\n    ...\n\ntoofar_glue:\n    lwz    r11, n(r2)       ; r11 = original jump target (toofar)\n    mtctr  r11              ; ctr = original jump target (toofar)\n    bctr                    ; and jump to toofar\n<\/pre>\n<p><b>Exercise<\/b>: We had two choices for the register to use for the indirect jump. We could have used <var>ctr<\/var> or <var>lr<\/var>. Why did we choose <var>ctr<\/var>? <\/p>\n<p>Next is the hard part: A glue routine that needs to connect functions that may have different tables of contents. This sort of thing happens if you na&iuml;vely import a function. <\/p>\n<pre>\n    bl     toofar_glue\n    ...\n\ntoofar_glue:\n    lwz    r11, n(r2)       ; r11 = function pointer\n    lwz    r12, 0(r11)      ; r12 = code pointer\n    stw    r2, 4(r1)        ; save caller's table of contents\n    mtctr  r12              ; ctr = code for target\n    lwz    r2, 4(r11)       ; load callee's table of contents\n    bctr                    ; and jump to toofar\n<\/pre>\n<p>The inter-module glue function sets up both the code pointer and the table of contents for the destination function. But there&#8217;s the question of what to do with the old table of contents. For now, we save it in one of the reserved words on the stack, but we&#8217;re still in trouble because the callee will return back to the caller with the wrong table of contents. Oh no! <\/p>\n<p>The solution is to have the compiler leave a <code>nop<\/code> after every call that might be to a glue routine that jumps to another module. If the linker determines that the call target is indeed a glue routine, then it patches the nop to <code>lwz r2, 4(r1)<\/code> to reload the caller&#8217;s table of contents. 
So from the caller&#8217;s perspective, calling a glue routine looks like this: <\/p>\n<pre>\n    ; before\n    bl     toofar           ; not sure if this is a glue routine or not\n    nop                     ; so let's drop a nop here just in case\n\n    ; after the linker inserts the glue routine\n    bl     toofar_glue      ; turns out this was a glue routine after all\n    lwz    r2, 4(r1)        ; reload caller's table of contents\n<\/pre>\n<p>The system also leaves the word at <code>8(r1)<\/code> available for the runtime, but I don&#8217;t see any code actually using it.&sup1; The remaining three reserved words in the stack frame have not been assigned a purpose yet; they remain reserved. <\/p>\n<p>If the compiler can prove&sup2; that the call destination uses the same table of contents as the caller, then it can omit the <code>nop<\/code>. <\/p>\n<p>The glue code saves the table of contents at <code>4(r1)<\/code>, but the calling function may have already saved its table of contents on the stack, in which case saving the table of contents <i>again<\/i> is redundant. On the other hand, if a function does not call through any function pointers, then it doesn&#8217;t explicitly manage its table of contents because it figures the table of contents will never need to be restored. So there&#8217;s a trade-off here: Do you force every function to save its table of contents on the stack just in case it calls a glue routine (and teach the linker how to fish the table of contents back out, so it can replace the <code>nop<\/code> with the correct reload instruction)? Or do you incur an extra store at every call to a glue routine? Windows chose the latter. My guess is that glue routines are already a bit expensive, so making them marginally more expensive is better than penalizing every non-leaf function with extra work that might end up not being needed after all.&sup3; <\/p>\n<p><b>Exercise<\/b>: Discuss the impact of glue routines on tail call elimination. 
<\/p>\n<p><a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20180821-00\/?p=99535\">Next time<\/a>, we&#8217;ll look at leaf functions. <\/p>\n<p>&sup1; My guess is that intrusive code coverage\/profiling tools may use it as a place to save the <var>r11<\/var> register, thereby making <var>r11<\/var> available to increment the coverage count. But I haven&#8217;t found any PowerPC binaries instrumented for code coverage to know for sure. <\/p>\n<p>&sup2; Microsoft compilers in the early 1990s did not support link-time code generation, so the compiler can prove this only if the function being called resides in the same translation unit as the caller. <\/p>\n<p>&sup3; It&#8217;s possible to eliminate most glue routines with sufficient diligence: Explicitly mark your imported functions as <code>__declspec(dllimport)<\/code> so that they aren&#8217;t na&iuml;vely imported any more. The only glue routines remaining would be the ones for calls to functions that are too far away. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Binding the two sides together.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-99525","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Binding the two sides 
together.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99525","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=99525"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/99525\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=99525"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=99525"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=99525"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}