{"id":90801,"date":"2015-07-29T07:00:00","date_gmt":"2015-07-29T21:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20150729-00\/?p=90801\/"},"modified":"2019-03-13T12:17:50","modified_gmt":"2019-03-13T19:17:50","slug":"20150729-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20150729-00\/?p=90801","title":{"rendered":"The Itanium processor, part 3: The Windows calling convention, how parameters are passed"},"content":{"rendered":"<p>The calling convention on Itanium uses a variable-sized register window. The mechanism by which this is done is rather complicated, so I&#8217;m first going to present a conceptual version, and then I&#8217;ll come back and fix up some of the implementation details. For today, I&#8217;m just going to talk about how parameters are passed. There are other aspects of the calling convention that I will cover in separate articles. <\/p>\n<p>Recall that the first 32 registers <var>r0<\/var> through <var>r31<\/var> are static (do not change), and the remaining registers <var>r32<\/var> through <var>r127<\/var> are stacked. These stacked registers fall into three categories: <i>input registers<\/i>, <i>local registers<\/i>, and <i>output registers<\/i>. <\/p>\n<p>The input registers receive the function parameters. On entry to a function, the function&#8217;s parameters are received in registers starting at <var>r32<\/var> and increasing. For example, a function that takes two parameters receives the first parameter in <var>r32<\/var> and the second parameter in <var>r33<\/var>. <\/p>\n<p>Immediately after the input registers are the registers for the function&#8217;s private use. These are known as <i>local registers<\/i>. For example, if that function with two parameters also wants four registers for private use, those private registers would be <var>r34<\/var> through <var>r37<\/var>. 
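<\/p>\n<p>The numbering rule so far can be written out as a small worked example. The letters <code>i<\/code> and <code>l<\/code> are labels invented for this sketch (the number of input registers and local registers), not assembler syntax. <\/p>\n<pre>\ninput registers: r32 through r(32 + i - 1)\nlocal registers: r(32 + i) through r(32 + i + l - 1)\n\nwith i = 2 and l = 4:\ninput registers: r32 through r33\nlocal registers: r34 through r37\n<\/pre>\n<p>By the same rule, a function with no parameters has its local registers start at <var>r32<\/var>. 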
<\/p>\n<p>After the local registers are the registers used to call other functions, known as <i>output registers<\/i>.&sup1; For example, if the function with two parameters and four local registers wants to call a function that has three parameters, it would put those parameters in registers <var>r38<\/var> through <var>r40<\/var>. Therefore, a function needs as many output registers as the maximum number of parameters of any function it calls. <\/p>\n<p>The input registers and local registers are collectively known as the <i>local region<\/i>. The input registers, local registers, and output registers are collectively known as the <i>register frame<\/i>. <\/p>\n<p>Any registers higher than the last output register are off-limits to the function, and we shall henceforth pretend they do not exist. Since the registers go up to <var>r127<\/var>, and in practice register frames are around one or two dozen registers, there end up being a lot of registers that go unused. <\/p>\n<p>The first thing a function does is notify the processor of its intended register usage. It uses the <code>alloc<\/code> instruction to say how many input registers, local registers, and output registers it needs. <\/p>\n<pre>\nalloc r35 = ar.pfs, 2, 4, 3, 0\n<\/pre>\n<p>This means, &#8220;Set up my register frame as follows: Two input registers, four local registers, three output registers, and no rotating registers. Put the previous register frame state (<var>pfs<\/var>) in register <var>r35<\/var>.&#8221; <\/p>\n<p>The second thing a function does is save the return address, typically in one of the local registers it just created. For example, the above <code>alloc<\/code> might be followed by <\/p>\n<pre>\nmov r34 = rp\n<\/pre>\n<p>On entry to a function, the <var>rp<\/var> register contains the caller&#8217;s return address, and most of the time, the compiler will save the return address in a register. 
Note that this means that on the Itanium, a stack buffer overrun will never overwrite a return address, since return addresses are not kept on the stack. (Let that sink in. On Itanium, return addresses <i>are not kept on the stack<\/i>. This means that tricks like <a HREF=\"http:\/\/msdn.microsoft.com\/library\/s975zw7k\"><code>_Address&shy;Of&shy;Return&shy;Address<\/code><\/a> will not work!) <\/p>\n<p>By convention, the <var>rp<\/var> and <var>ar.pfs<\/var> are saved in consecutive registers (here, <var>r34<\/var> and <var>r35<\/var>). This convention makes exception unwinding slightly easier. <\/p>\n<p>Let&#8217;s see what happens when somebody calls this function. Suppose the caller&#8217;s register frame looks like this: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"3\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"5\">local region<\/td>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"3\">input<\/td>\n<td COLSPAN=\"2\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r41<\/var><\/td>\n<\/tr>\n<\/table>\n<p>The caller places the parameters to our function in its output registers, in this case <var>r37<\/var> and <var>r38<\/var>. 
(Our function takes only two parameters, so <var>r39<\/var> and beyond are not used.) <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"4\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"5\">local region<\/td>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"3\">input<\/td>\n<td COLSPAN=\"2\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r41<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A<\/td>\n<td>&hellip;<\/td>\n<td>F<\/td>\n<td>G<\/td>\n<td>H<\/td>\n<td>I<\/td>\n<td>J<\/td>\n<td>K<\/td>\n<td>L<\/td>\n<td>M<\/td>\n<td>N<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<\/tr>\n<\/table>\n<p>The caller then invokes our function. <\/p>\n<p>Our function opens by performing this <code>alloc<\/code>, declaring two input registers, four local registers, and three output registers. 
<\/p>\n<pre>\nalloc r35 = ar.pfs, 2, 4, 3, 0\n<\/pre>\n<p>That <code>alloc<\/code> instruction shuffles the registers like this: <\/p>\n<ul>\n<li>The static registers don&#8217;t change.<\/li>\n<li>The registers in the caller&#8217;s local region are saved in a magic place.<\/li>\n<li>The specified number of output registers from the caller become     the new function&#8217;s input registers.<\/li>\n<li>New local and output registers are created but left uninitialized.<\/li>\n<li>The previous function state is placed in the specified register     (for restoration at function exit).     There are many parts of the function state, but the part we care     about is the frame state, which describes how registers are assigned.<\/li>\n<\/ul>\n<p>Here&#8217;s what the register frame looks like after all but the last steps above: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"5\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"6\">local region<\/td>\n<td COLSPAN=\"3\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\">input<\/td>\n<td COLSPAN=\"4\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 
2em\"><var>r40<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A<\/td>\n<td>&hellip;<\/td>\n<td>F<\/td>\n<td>G<\/td>\n<td>M<\/td>\n<td>N<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"5\">unchanged<\/td>\n<td COLSPAN=\"2\">moved<\/td>\n<td COLSPAN=\"7\">uninitialized<\/td>\n<\/tr>\n<\/table>\n<p>The last step (storing the previous function state in the specified register) updates the <var>r35<\/var> register: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"4\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"6\">local region<\/td>\n<td COLSPAN=\"3\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\">input<\/td>\n<td COLSPAN=\"4\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A<\/td>\n<td>&hellip;<\/td>\n<td>F<\/td>\n<td>G<\/td>\n<td>M<\/td>\n<td>N<\/td>\n<td>?<\/td>\n<td>pfs<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<\/tr>\n<\/table>\n<p>The next instruction is typically one to save the return address. 
<\/p>\n<pre>\nmov r34 = rp\n<\/pre>\n<p>After that <code>mov<\/code> instruction, the function prologue is complete, and the register state looks like this: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"4\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"6\">local region<\/td>\n<td COLSPAN=\"3\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\">input<\/td>\n<td COLSPAN=\"4\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A<\/td>\n<td>&hellip;<\/td>\n<td>F<\/td>\n<td>G<\/td>\n<td>M<\/td>\n<td>N<\/td>\n<td>ra<\/td>\n<td>pfs<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<\/tr>\n<\/table>\n<p>where <code>ra<\/code> is the function&#8217;s return address. <\/p>\n<p>At this point the function runs and does actual work. 
Once it&#8217;s done, its register state might look like this: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"4\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"6\">local region<\/td>\n<td COLSPAN=\"3\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\">input<\/td>\n<td COLSPAN=\"4\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A&prime;<\/td>\n<td>&hellip;<\/td>\n<td>F&prime;<\/td>\n<td>G&prime;<\/td>\n<td>T<\/td>\n<td>U<\/td>\n<td>ra<\/td>\n<td>pfs<\/td>\n<td>V<\/td>\n<td>W<\/td>\n<td>X<\/td>\n<td>Y<\/td>\n<td>Z<\/td>\n<\/tr>\n<\/table>\n<p>The function epilogue typically consists of three instructions: <\/p>\n<pre>\nmov rp = r34     \/\/ prepare to return to caller\nmov ar.pfs = r35 \/\/ restore previous function state\nbr.ret rp        \/\/ return!\n<\/pre>\n<p>This sequence begins by copying the saved return address into the <var>rp<\/var> register so that we can jump back to it. (We could have copied <var>r34<\/var> into any scratch branch register, but by convention we use the <var>rp<\/var> register because it makes exception unwinding easier.) <\/p>\n<p>Next, it restores the register state from the <var>pfs<\/var> it saved at function entry. 
Finally, it transfers control back to the caller by jumping through the <var>rp<\/var> register. (We cannot do a <code>br.ret r34<\/code> because <code>r34<\/code> is not a branch register; the parameter to <code>br.ret<\/code> must be a branch register.) <\/p>\n<p>Restoring the previous function state causes the caller&#8217;s register frame layout to be restored, and the values of the registers in the caller&#8217;s local region are restored from that magic place. <\/p>\n<p>The register state upon return back to the caller looks like this: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"5\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"5\">local region<\/td>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"3\">input<\/td>\n<td COLSPAN=\"2\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r41<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A&prime;<\/td>\n<td>&hellip;<\/td>\n<td>F&prime;<\/td>\n<td>G&prime;<\/td>\n<td>H<\/td>\n<td>I<\/td>\n<td>J<\/td>\n<td>K<\/td>\n<td>L<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"5\">unchanged<\/td>\n<td COLSPAN=\"5\">restored<\/td>\n<td 
COLSPAN=\"5\">uninitialized<\/td>\n<\/tr>\n<\/table>\n<p>From the point of view of the calling function, calling another function has the following effect: <\/p>\n<ul>\n<li>Static registers are shared with the called function.     (Any changes to static registers are visible to the caller.)<\/li>\n<li>The local region is preserved across the call.<\/li>\n<li>The output registers are trashed by the call.<\/li>\n<\/ul>\n<p>At most eight parameters are passed in registers. Any additional parameters are passed on the stack, and it is the caller&#8217;s responsibility to clean them up. (The stack-based parameters begin <i>after the red zone<\/i>. We&#8217;ll talk more about the red zone later.) <\/p>\n<p>Thank goodness for the parameter cap, because a variadic function doesn&#8217;t know how many parameters were passed, so it would otherwise not know how many input parameters to declare in its <code>alloc<\/code> instruction. The parameter cap means that variadic functions <code>alloc<\/code> eight input registers, and typically the first thing they do is spill them onto the stack so that they are contiguous with any parameters beyond 8 (if any). Note that this spilling must be done very carefully to avoid <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2004\/01\/19\/60162.aspx\">crashing if the corresponding register does not correspond to an actual parameter but happens to be a NaT left over from a failed speculative execution<\/a>. (There is a special instruction for spilling without taking a NaT consumption exception.) <\/p>\n<p>If any parameter is smaller than 64 bits, then the unused bits of the corresponding register are garbage and should be ignored. I didn&#8217;t discuss floating point parameters or aggregates. 
You can <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2004\/01\/13\/58199.aspx#62212\">read Thiago&#8217;s comment<\/a> for a quick version, or dig into the <a HREF=\"http:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/guides\/itanium-software-runtime-architecture-guide.pdf\"><i>Itanium Software Conventions and Runtime Architecture Guide<\/i><\/a> (Section 8.5: Parameter Passing) for gory details. <\/p>\n<p>Okay, that&#8217;s the conceptual model. The actual implementation is not quite as I described it, but the conceptual model is good enough for most debugging purposes. Here are some of the implementation details which will come in handy if you need to roll up your sleeves. <\/p>\n<p>First of all, the processor does not actually distinguish between input registers and local registers. It only cares about the local region. In other words, the parameters to the <code>alloc<\/code> instruction are <\/p>\n<ul>\n<li>Size of local region.<\/li>\n<li>Number of output registers.<\/li>\n<li>Number of rotating registers.<\/li>\n<li>Register to receive previous function state.<\/li>\n<\/ul>\n<p>When the called function establishes its register frame, the processor just takes all the caller&#8217;s output registers (even the ones that aren&#8217;t actually relevant to the function call) and slides them down to <var>r32<\/var>. It is the compiler&#8217;s responsibility to ensure that the code passes the correct number of parameters. 
Therefore, our diagram of the function call process would more accurately go like this: The caller&#8217;s register frame looks like this before the call: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"4\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"5\">local region<\/td>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"3\">input<\/td>\n<td COLSPAN=\"2\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r41<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A<\/td>\n<td>&hellip;<\/td>\n<td>F<\/td>\n<td>G<\/td>\n<td>H<\/td>\n<td>I<\/td>\n<td>J<\/td>\n<td>K<\/td>\n<td>L<\/td>\n<td>M<\/td>\n<td>N<\/td>\n<td>X&#x2081;<\/td>\n<td>X&#x2082;<\/td>\n<td>X&#x2083;<\/td>\n<\/tr>\n<\/table>\n<p>where the X values are whatever garbage values happen to be left over from previous computations, possibly even NaT. 
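<\/p>\n<p>As a rough sketch, the call site behind the frame above might look like this. The arithmetic, the constant, and the name <code>OurFunction<\/code> are made up for illustration, and instruction-group stops are left out. <\/p>\n<pre>\nadd r37 = r32, r33                 \/\/ hypothetical: compute the first parameter (M) into the first output register\nmov r38 = 1                        \/\/ hypothetical: second parameter (N) goes in the next output register\nbr.call.sptk.many rp = OurFunction \/\/ call; rp receives the return address\n<\/pre>\n<p>Nothing in this sequence writes to <var>r39<\/var> through <var>r41<\/var>, so they keep whatever values they already had. 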
<\/p>\n<p>When the called function sets up its register frame (before storing the previous register frame), it gets this: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td COLSPAN=\"5\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"5\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"6\">local region<\/td>\n<td COLSPAN=\"3\" ROWSPAN=\"2\">output<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"2\">input<\/td>\n<td COLSPAN=\"4\">local<\/td>\n<\/tr>\n<tr>\n<td STYLE=\"width: 2em\"><var>r0<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r1<\/var><\/td>\n<td STYLE=\"width: 2em\">&hellip;<\/td>\n<td STYLE=\"width: 2em\"><var>r30<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r31<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r32<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r33<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r34<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r35<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r36<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r37<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r38<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r39<\/var><\/td>\n<td STYLE=\"width: 2em\"><var>r40<\/var><\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td>A<\/td>\n<td>&hellip;<\/td>\n<td>F<\/td>\n<td>G<\/td>\n<td>M<\/td>\n<td>N<\/td>\n<td>X&#x2081;<\/td>\n<td>X&#x2082;<\/td>\n<td>X&#x2083;<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<td>?<\/td>\n<\/tr>\n<tr>\n<td COLSPAN=\"5\">unchanged<\/td>\n<td COLSPAN=\"2\">moved<\/td>\n<td COLSPAN=\"7\">uninitialized<\/td>\n<\/tr>\n<\/table>\n<p>The processor took all the output registers from the caller and slid them down to <var>r32<\/var> through <var>r36<\/var>. <\/p>\n<p>Of course, the called function shouldn&#8217;t try to read from any registers beyond <var>r33<\/var>, if it knows what&#8217;s good for it, because those registers contain nothing of value and may indeed be poisoned by a NaT. 
<\/p>\n<p>This little implementation detail has no practical consequences because those registers were uninitialized in the conceptual model anyway. But it does mean that when you disassemble the <code>alloc<\/code> instruction, you&#8217;ll see that the distinction between input registers and local registers has been lost, and that both sets of registers are reported as input registers. In other words, an instruction written as <\/p>\n<pre>\nalloc r34 = ar.pfs, 2, 4, 3, 0\n<\/pre>\n<p>disassembles as <\/p>\n<pre>\nalloc r34 = ar.pfs, 6, 0, 3, 0\n<\/pre>\n<p>The disassembler doesn&#8217;t know how many of the six registers in the local region are input registers and how many are local, so it just treats them all as input registers. <\/p>\n<p>That explains some of the undefined registers, but what about those question marks? To solve this riddle, we need to answer a different question first: &#8220;Where is this magic place that the caller&#8217;s local region gets saved to and restored from?&#8221; <\/p>\n<p>This is where the infamous Itanium <a HREF=\"http:\/\/blogs.msdn.com\/b\/oldnewthing\/archive\/2005\/04\/21\/410397.aspx\">second stack<\/a> comes into play. <\/p>\n<p>There are two stacks on Itanium. One is indexed by the <var>sp<\/var> register and is what one generally means when one says <i>the stack<\/i>. The other stack is indexed by the <var>bsp<\/var> register (<i>backing store pointer<\/i>), and it is the magic place where these &#8220;registers from long ago&#8221; are saved. The <var>bsp<\/var> register grows <i>upward<\/i> in memory (toward higher addresses), opposite from the <var>sp<\/var> which grows downward (toward lower addresses). Windows allocates the two stacks right next to each other. Here&#8217;s <a HREF=\"http:\/\/blogs.msdn.com\/b\/slavao\/archive\/2005\/03\/19\/399117.aspx\">an artistic impression by Slava Oks<\/a>. Bear in mind that Slava drew the diagram upside-down (low addresses at the top, high addresses at the bottom). 
The <var>bsp<\/var> grows toward higher addresses, but in Slava&#8217;s diagram, that direction is downward. <\/p>\n<p>One curious implementation detail is that the two stacks abut each other without a gap. I&#8217;m told that the kernel team considered putting a no-access page between the two stacks, so that a runaway memory copy into the stack would encounter an access violation before it reached the backing store. For whatever reason, they didn&#8217;t bother. <\/p>\n<p>Now, the processor is sneaky and doesn&#8217;t actually push the values onto the backing store immediately. Instead, the processor rotates them into high-numbered unused registers (all the registers beyond the last output register), and only when it runs out of space there does it spill them into the backing store. When the function returns, the rotation is undone, and the values squirreled away into the high-numbered unused registers magically reappear in the caller&#8217;s local region. <\/p>\n<p>Each time a function is called, the registers rotate to the left, and when a function returns, the registers rotate to the right. As a result, the local regions of functions in the call stack can be found among the off-limits registers, up until we reach the last spill point. <\/p>\n<p>Suppose the call stack looks like this (most recent function at the top): <\/p>\n<pre>\na() -- current function\nb()\nc()\nd()\ne()\nf()\ng()\n<\/pre>\n<p>If we zoom out, we can see all those local regions. 
<\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td VALIGN=\"bottom\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"3\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"2\">a<\/td>\n<td ROWSPAN=\"2\">open<\/td>\n<td>g<\/td>\n<td>f<\/td>\n<td>e<\/td>\n<td>d<\/td>\n<td>c<\/td>\n<td>b<\/td>\n<\/tr>\n<tr>\n<td>LR<\/td>\n<td>O<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<\/tr>\n<tr>\n<td BGCOLOR=\"#ffbbff\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;<\/td>\n<td>&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFBBBB\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFFF99\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#ACF3FD\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#DEB19E\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A8A8FF\">&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A5D3CA\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<\/tr>\n<\/table>\n<p>Why don&#8217;t we see any output registers for any functions other than the current one? You know why: Because at each function call, the caller&#8217;s output registers become the called function&#8217;s input registers. If you really wanted to draw the output registers, you could do it like this, where each function&#8217;s input registers is shared with the caller&#8217;s output registers. 
<\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td VALIGN=\"bottom\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"5\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"3\">a<\/td>\n<td ROWSPAN=\"2\">open<\/td>\n<td COLSPAN=\"3\">g<\/td>\n<td STYLE=\"border-bottom: solid transparent 1px\"><\/td>\n<td COLSPAN=\"3\">e<\/td>\n<td STYLE=\"border-bottom: solid transparent 1px\"><\/td>\n<td COLSPAN=\"3\">c<\/td>\n<td STYLE=\"border-bottom: solid transparent 1px\"><\/td>\n<\/tr>\n<tr>\n<td>I<\/td>\n<td>L<\/td>\n<td>O<\/td>\n<p>     <!-- open -->    <\/p>\n<td>I<\/td>\n<td>L<\/td>\n<td>O<\/td>\n<td STYLE=\"border-top: solid transparent 1px\"><\/td>\n<td>I<\/td>\n<td>L<\/td>\n<td>O<\/td>\n<td STYLE=\"border-top: solid transparent 1px\"><\/td>\n<td>I<\/td>\n<td>L<\/td>\n<td>O<\/td>\n<td STYLE=\"border-top: solid transparent 1px\"><\/td>\n<\/tr>\n<tr>\n<td BGCOLOR=\"#ffbbff\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;<\/td>\n<td>&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFBBBB\">&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFBBBB\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFFF99\">&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFFF99\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#ACF3FD\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#ACF3FD\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#DEB19E\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#DEB19E\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A8A8FF\">&bull;&bull;<\/td>\n<td BGCOLOR=\"#A8A8FF\">&bull;&bull;<\/td>\n<td BGCOLOR=\"#A5D3CA\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A5D3CA\">&bull;&bull;&bull;<\/td>\n<\/tr>\n<tr>\n<td ROWSPAN=\"2\"><\/td>\n<td>O<\/td>\n<td ROWSPAN=\"2\" COLSPAN=\"5\"><\/td>\n<td>I<\/td>\n<td>L<\/td>\n<td>O<\/td>\n<td STYLE=\"border-bottom: solid transparent 
1px\"><\/td>\n<td>I<\/td>\n<td>L<\/td>\n<td>O<\/td>\n<td STYLE=\"border-bottom: solid transparent 1px\"><\/td>\n<td>I<\/td>\n<td>L<\/td>\n<\/tr>\n<tr>\n<td>b<\/td>\n<td COLSPAN=\"3\">f<\/td>\n<td STYLE=\"border-top: solid transparent 1px\"><\/td>\n<td COLSPAN=\"3\">d<\/td>\n<td STYLE=\"border-top: solid transparent 1px\"><\/td>\n<td COLSPAN=\"2\">b<\/td>\n<\/table>\n<p>But we won&#8217;t bother drawing this exploded view any more. <\/p>\n<p>Now, if the function <code>a<\/code> calls another function <code>x<\/code>, then all the registers rotate left, with <code>a<\/code>&#8216;s local region wrapping around to the end of the list: <\/p>\n<table BORDER=\"1\" STYLE=\"border: solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td VALIGN=\"bottom\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"3\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"2\">x<\/td>\n<td ROWSPAN=\"2\">open<\/td>\n<td>g<\/td>\n<td>f<\/td>\n<td>e<\/td>\n<td>d<\/td>\n<td>c<\/td>\n<td>b<\/td>\n<td>a<\/td>\n<\/tr>\n<tr>\n<td>LR<\/td>\n<td>O<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<\/tr>\n<tr>\n<td BGCOLOR=\"#ffbbff\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#E6DBFF\">&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#E6DBFF\">&bull;&bull;&bull;&bull;<\/td>\n<td>&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFBBBB\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFFF99\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#ACF3FD\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#DEB19E\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A8A8FF\">&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A5D3CA\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<\/tr>\n<\/table>\n<p>And when <code>x<\/code> returns, the registers rotate right, bringing us back to <\/p>\n<table BORDER=\"1\" STYLE=\"border: 
solid 1px black;border-collapse: collapse;text-align: center\">\n<tr>\n<td VALIGN=\"bottom\" ROWSPAN=\"2\">static<\/td>\n<td ROWSPAN=\"3\" WIDTH=\"0\" BGCOLOR=\"black\"><\/td>\n<td COLSPAN=\"2\">a<\/td>\n<td ROWSPAN=\"2\">open<\/td>\n<td>g<\/td>\n<td>f<\/td>\n<td>e<\/td>\n<td>d<\/td>\n<td>c<\/td>\n<td>b<\/td>\n<\/tr>\n<tr>\n<td>LR<\/td>\n<td>O<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<td>LR<\/td>\n<\/tr>\n<tr>\n<td BGCOLOR=\"#ffbbff\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#C0FF97\">&bull;&bull;&bull;<\/td>\n<td>&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFBBBB\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#FFFF99\">&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#ACF3FD\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#DEB19E\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A8A8FF\">&bull;&bull;&bull;&bull;<\/td>\n<td BGCOLOR=\"#A5D3CA\">&bull;&bull;&bull;&bull;&bull;&bull;<\/td>\n<\/tr>\n<\/table>\n<p>Note that the conceptual model doesn&#8217;t care about this implementation detail. In theory, future versions of the Itanium processor might have additional &#8220;bonus registers&#8221; after <var>r127<\/var> which are programmatically inaccessible but which are used to expand the number of register frames that can be held before needing to spill. <\/p>\n<p>With this additional information, you now can see the contents of those undefined registers on entry to a function: They contain whatever garbage happened to be left over in the open registers. 
<\/p>\n<p>You can also see the contents of the uninitialized output registers on return from a function: They contain whatever garbage happened to be left over in the called function&#8217;s input registers. This behavior is actually documented by the processor, so in theory somebody could invent a calling convention where information is passed from a function back to its caller through the input registers, say, for a language that supports functions with multiple return values. (In other words, the input registers are actually in\/out registers.) The Windows calling convention doesn&#8217;t use this feature, however. <\/p>\n<p>It so happens that the debugger forces a full spill into the backing store when it gains control. This is useful, because groveling into the backing store gives you a way to see the local regions of any function on the stack. <\/p>\n<pre>\nkd&gt; r\n...\n      r32 =      6fbffd21130 0        r33 =          1170065 0\n      r34 =      6fbffd23700 0        r35 =                8 0\n      r36 =      6fbffd21338 0        r37 =            20000 0\n      r38 =             8000 0        r39 =             2000 0\n      r40 =              800 0        r41 =              400 0\n      r42 =              100 0        r43 =               80 0\n      r44 =              200 0        r45 =            10000 0\n      r46 =         7546fdf0 0        r47 = c000000000000693 0\n      r48 =             5041 0        r49 =         75ab0000 0\n      r50 =      6fbffd21130 0        r51 =          1170065 0\n      r52 =      6fbfc79f770 0        r53 =         7546cbe0 0\nkd&gt; dq @bsp\n000006fb`fc7a02e0  000006fb`ffd21130 00000000`01170065 \/\/ r32 and r33\n000006fb`fc7a02f0  000006fb`ffd23700 00000000`00000008 \/\/ r34 and r35\n000006fb`fc7a0300  000006fb`ffd21338 00000000`00020000 \/\/ r36 and r37\n000006fb`fc7a0310  00000000`00008000 00000000`00002000 \/\/ r38 and r39\n000006fb`fc7a0320  00000000`00000800 00000000`00000400 \/\/ r40 and r41\n000006fb`fc7a0330  
00000000`00000100 00000000`00000080 \/\/ r42 and r43\n000006fb`fc7a0340  00000000`00000200 00000000`00010000 \/\/ r44 and r45\n000006fb`fc7a0350  00000000`7546fdf0 c0000000`00000693 \/\/ r46 and r47\n<\/pre>\n<p>But wait, ia64 integer registers are 65 bits wide, not 64. The extra bit is the NaT bit. Where did that go? <\/p>\n<p>Whenever the <var>bsp<\/var> reaches the last 8-byte slot before a 512-byte boundary (that is, when (<var>bsp<\/var> &amp; 0x1F8) == 0x1F8, which happens after every 63 spilled registers), the value spilled into the backing store is not a 64-bit register but rather the accumulated NaT bits. You are not normally interested in the NaT bits, so the only practical consequence of this is that you have to remember to skip that one entry each time you cross a 512-byte boundary. <\/p>\n<p>Suppose we wanted to look at our caller&#8217;s local region. Here&#8217;s the start of a sample function. Don&#8217;t worry about most of the instructions; just pay attention to the <code>alloc<\/code> and the <code>mov ... = rp<\/code>. <\/p>\n<pre>\nSAMPLE!.Sample:\n       <b>alloc    r47 = ar.pfs, 013h, 00h, 04h, 00h<\/b>\n       mov      r48 = pr\n       addl     r31 = -2004312, gp\n       adds     sp = -1072, sp ;;\n       ld8.nta  r3 = [sp]\n       <b>mov      r46 = rp<\/b>\n       adds     r36 = 0208h, r32\n       or       r49 = gp, r0 ;;\n<\/pre>\n<p>Suppose you hit a breakpoint partway through this function, and you want to know why the caller passed a strange value for the first input parameter <var>r32<\/var>. 
<\/p>\n<p>From reading the function prologue, you see that the return address is kept in <var>r46<\/var>, so you can disassemble there to see how your caller set up its output parameters: <\/p>\n<pre>\nkd&gt; u @r46-20\nSAMPLE!.Caller+2bd0:\n       ld8      r47 = [r32]\n       ld4      r46 = [r33]\n       or       r45 = r35, r0\n       nop.b    00h\n       nop.b    00h\n       br.call.sptk.many  rp = SAMPLE!.Sample\n<\/pre>\n<p>(Notice the <code>nop<\/code> instructions which suggest that this is unoptimized code.) <\/p>\n<p>But we don&#8217;t know which of those registers are the output registers of the caller. For that, we need to know the register frame of the caller. We see from the <code>alloc<\/code> instruction that the previous function state (<code>pfs<\/code>) was saved in the <var>r47<\/var> register. <\/p>\n<pre>\nkd&gt; ?@r47\nEvaluate expression: -4611686018427386221 = c0000000`00000693\n<\/pre>\n<p>This value is not easy to parse. The bottom seven bits record the total size of the caller&#8217;s register frame, which includes both the local region and the output registers. The size of the local region is kept in bits 7 through 13, which is a bit tricky to extract by eye. You take the third and fourth digits from the right, double the value, and add one more if the second digit from the right is 8 or higher. This is easier to do than to explain: <\/p>\n<ul>\n<li>The third- and fourth-to-last digits are <code>06<\/code> hex.<\/li>\n<li>Double that, and you get 12 (decimal).<\/li>\n<li>Since the second-to-last digit is 9, add one more.<\/li>\n<li>Result: 13.<\/li>\n<\/ul>\n<p>The previous function&#8217;s local region has 13 registers. Therefore, the previous function&#8217;s output registers begin at 32 + 13 = 45. (You can also see that the previous function had 0x13 = 19 registers in its register frame, and you can therefore infer that it had 19 &minus; 13 = 6 output registers.) 
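<\/p>\n<p>If you would rather not do the arithmetic by eye, the same two fields can be extracted with a shift and a mask. This is only a sketch: it assumes the saved <var>pfs<\/var> is still in <var>r47<\/var>, as in the sample function above, and that the debugger&#8217;s expression evaluator accepts the <code>&gt;&gt;<\/code> and <code>&amp;<\/code> operators. <\/p>\n<pre>\n\/\/ bits 7 through 13: size of the caller's local region\nkd&gt; ? (@r47 &gt;&gt; 7) &amp; 0x7f\n\/\/ bits 0 through 6: size of the caller's entire register frame\nkd&gt; ? @r47 &amp; 0x7f\n<\/pre>\n<p>For the value <code>c0000000`00000693<\/code>, these come out to 0xd = 13 and 0x13 = 19, which agrees with the digit-doubling trick. 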
<\/p>\n<p>Applying this information to the disassembly of the caller, we see that the caller passed <\/p>\n<ul>\n<li>first output register <var>r45<\/var> = <var>r35<\/var>.     (Recall that the <var>r0<\/var> register is always zero,     so or&#8217;ing it with another value just copies that other value.)<\/li>\n<li>second output register <var>r46<\/var> = 4-byte value stored at [<var>r33<\/var>]<\/li>\n<li>third output register <var>r47<\/var> = 8-byte value stored at [<var>r32<\/var>]<\/li>\n<\/ul>\n<p>That first output register was a copy of the <var>r35<\/var> register. We can grovel through the backing store to see what that value is. <\/p>\n<pre>\n0:000&gt; dq @bsp-0n13*8 l4\n000006fb`ffe906d8  00000000`4b1e9720 00000000`4b1ea2e8     \/\/ r32 and r33\n000006fb`ffe906e8  00000000`0114a7c0 000006fb`fe728cac     \/\/ r34 and r35\n<\/pre>\n<p>And now we have extracted the registers from our caller&#8217;s local region. Specifically, we see that the caller&#8217;s <var>r35<\/var> is <code>000006fb`fe728cac<\/code>. <\/p>\n<p>We can extend this technique to grovel even further back in the stack. To do that, we need to obtain the <var>pfs<\/var> chain so we can see the structure of the register frame for each function in the call stack. <\/p>\n<p>From the disassembly above, we saw that the return address, which points into the caller, was kept in <var>r46<\/var>. To go back another level, we need to find that caller&#8217;s caller. We merely repeat the exercise, but with the caller. Sometimes it can be hard to find the start of a function (especially if you don&#8217;t have symbols); it can be easier to look for the <i>end<\/i> of the function instead! Instead of looking for the <code>alloc<\/code> and <code>mov ... = rp<\/code> instructions which save the previous function state and return address, we look for the <code>mov ar.pfs = ...<\/code> and <code>mov rp = ...<\/code> instructions which restore them. 
<\/p>\n<p>Here&#8217;s an example of a stack trace I had to reconstruct: <\/p>\n<pre>\n0:000&gt; u\n00000000`4b17e9d4       mov      <b>rp = r37<\/b>              \/\/ return address\n00000000`4b17e9e4       mov.i    <b>ar.pfs = r38<\/b>          \/\/ restore pfs\n00000000`4b17e9e8       br.ret.sptk.many  rp ;;        \/\/ return to caller\n0:000&gt; dq @bsp\n000006fb`ffe90758  000006fb`fe761cc0 000006fb`ffe8f860 \/\/ r32 and r33\n000006fb`ffe90768  000006fb`ffe8fa70 00000000`00000104 \/\/ r34 and r35\n000006fb`ffe90778  00000000`0114a7c0 <b>00000000`4b1b6890<\/b> \/\/ r36 and r37\n000006fb`ffe90788  <b>c0000000`0000<u>05<\/u>0e<\/b> 00000000`00005001 \/\/ r38 and r39\n<\/pre>\n<p>Double the <code>05<\/code> to get 10 (decimal), and don&#8217;t add one since the next digit (<code>0<\/code>) is less than 8. The previous function therefore has 10 registers in its local region. <\/p>\n<p>The current function&#8217;s return address is kept in <var>r37<\/var> and the <var>pfs<\/var> in <var>r38<\/var>. I&#8217;ve highlighted them in the <var>bsp<\/var> dump. <\/p>\n<p>Let&#8217;s disassemble at the return address and dump that function&#8217;s local variables, thereby walking back one level in the call stack. 
<\/p>\n<pre>\n0:000&gt; u 00000000`4b1b6890\n...\n00000000`4b1b6bd4       mov      <b>rp = r38<\/b> ;;           \/\/ return address\n00000000`4b1b6be4       mov.i    <b>ar.pfs = r39<\/b>          \/\/ restore pfs\n00000000`4b1b6be8       br.ret.sptk.many  rp ;;\n\/\/ we calculated that the local region of the previous function is size 0xA\n0:000&gt; dq @bsp-a*8 la\n000006fb`ffe90708  000006fb`fe73bfc0 000006fb`fe73ff10     \/\/ r32 and r33\n000006fb`ffe90718  00000000`00000000 000006fb`ffe8f850     \/\/ r34 and r35\n000006fb`ffe90728  000006fb`ffe8f858 00000000`00000000     \/\/ r36 and r37\n000006fb`ffe90738  <b>00000000`4b1e9350 c0000000`00000308<\/b>     \/\/ r38 and r39\n000006fb`ffe90748  00000000`00009001 00000000`4b57e000     \/\/ r40 and r41\n<\/pre>\n<p>By studying the value in the caller&#8217;s <var>r39<\/var>, we see that the caller&#8217;s caller has 3 &times; 2 + 0 = 6 registers in its local region. And the caller&#8217;s <var>r38<\/var> gives us the return address. Let&#8217;s walk back another frame in the call stack. <\/p>\n<pre>\n0:000&gt; u 4b1e9350\n...\n00000000`4b1e9354       mov      <b>rp = r34<\/b>              \/\/ return address\n00000000`4b1e9368       mov.i    <b>ar.pfs = r35<\/b>          \/\/ restore pfs\n00000000`4b1e9378       br.ret.sptk.many  rp ;;\n0:000&gt; dq @bsp-a*8-6*8 l6\n000006fb`ffe906d8  00000000`0114a7c0 000006fb`fe728cac     \/\/ r32 and r33\n000006fb`ffe906e8  <b>00000000`4b1e9720 c0000000`00000389<\/b>     \/\/ r34 and r35\n000006fb`ffe906f8  00000000`00009001 00000000`4b57e000     \/\/ r36 and r37\n<\/pre>\n<p>This time, the return address is in <var>r34<\/var> and the previous <var>pfs<\/var> is in <var>r35<\/var>. This time, the caller&#8217;s caller&#8217;s caller has 3 &times; 2 + 1 = 7 registers in its local region. 
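<\/p>\n<p>Each step back in the call stack subtracts one more local region from <var>bsp<\/var>, so the address of the next dump is a running sum. A sketch of the arithmetic for the frame we are about to look at, assuming no NaT collection entry lies in between (otherwise you would need to skip one extra slot): <\/p>\n<pre>\n\/\/ local regions walked so far: 0xa, then 6, then 7 registers of 8 bytes each\n0:000&gt; ? @bsp - (0xa + 6 + 7) * 8\n<\/pre>\n<p>With <var>bsp<\/var> at <code>000006fb`ffe90758<\/code>, as in the first dump, this evaluates to <code>000006fb`ffe906a0<\/code>, and none of the addresses from there up to <var>bsp<\/var> satisfies (address &amp; 0x1F8) == 0x1F8, so no entry needs to be skipped here. 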
<\/p>\n<pre>\n0:000&gt; u 4b1e9720\n...\n00000000`4b1e9784       mov      <b>rp = r35<\/b>             \/\/ return address\n00000000`4b1e9788       adds     sp = 010h, sp ;;\n00000000`4b1e9790       nop.m    00h\n00000000`4b1e9794       mov      pr = r37, -2 ;;\n00000000`4b1e9798       mov.i    <b>ar.pfs = r36<\/b>         \/\/ restore pfs\n00000000`4b1e97a0       nop.m    00h\n00000000`4b1e97a4       nop.f    00h\n00000000`4b1e97a8       br.ret.sptk.many  rp ;;\n0:000&gt; dq @bsp-a*8-6*8-7*8 l7\n000006fb`ffe906a0  00000000`0114a7c0 00000000`00000000    \/\/ r32 and r33\n000006fb`ffe906b0  00000000`0114a900 <b>00000000`4b19ba00<\/b>    \/\/ r34 and r35\n000006fb`ffe906c0  <b>c0000000`0000058f<\/b> 00000000`00009001    \/\/ r36 and r37\n000006fb`ffe906d0  00000000`4b57e000                      \/\/ r38\n<\/pre>\n<p>This function also allocates 0x10 bytes from the stack, so if you want to see its stack variables, you can dump the values at <var>sp + 0x10<\/var> for length 0x10. The <code>+ 0x10<\/code> is to skip over the red zone. <\/p>\n<p>Anyway, that&#8217;s the way to reconstruct the call stack on an Itanium. Repeat until bored. <\/p>\n<p>Maybe you can spot the fast one I pulled when discussing how the <code>alloc<\/code> instruction and <var>pfs<\/var> register work. More details <a HREF=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/\">next time<\/a>, when we discuss leaf functions and the red zone. <\/p>\n<p><b>Bonus chapter<\/b>: <a HREF=\"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/20150730-01\/?p=90781\">How does spilling actually work<\/a>? <\/p>\n<p>&sup1; When not preparing to call another function, the output registers can be used for any purpose, with the understanding that the values will not be preserved across a function call. 
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Slide on over.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-90801","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>Slide on over.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/90801","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=90801"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/90801\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=90801"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=90801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=90801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}