{"id":105332,"date":"2021-06-22T07:00:00","date_gmt":"2021-06-22T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=105332"},"modified":"2021-06-22T07:21:28","modified_gmt":"2021-06-22T14:21:28","slug":"the-arm-processor-thumb-2-part-17-prologues-and-epilogues","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210622-00\/?p=105332","title":{"rendered":"The ARM processor (Thumb-2), part 17: Prologues and epilogues"},"content":{"rendered":"<p><a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210621-00\/?p=105327\"> The calling convention and ABI for ARM on Windows<\/a> dictates a lot of the structure of function prologues and epilogues.<\/p>\n<p>Here&#8217;s a typical function prologue:<\/p>\n<pre>    push    {r4-r7,r11,lr}      ; save a bunch of registers\r\n    add     r11, sp, #0x10      ; link into frame pointer chain\r\n    sub     sp, sp, #0x20       ; allocate space for locals\r\n                                ; and outbound stack parameters\r\n<\/pre>\n<p>This is probably easier to explain with pictures.<\/p>\n<p>On entry, the stack looks like this:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" title=\"top of stack (pointed to by sp) is the stack parameter, higher up the stack is the previous r11 (pointed to by r11, the frame chain), followed by the return address.\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td style=\"border: solid 1px black; border-top: none; text-align: center;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r11<\/var><\/td>\n<td>\u2190 <var>r11<\/var> (frame chain)<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">\u22ee<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: 
center;\">stack param<\/td>\n<td>\u2190 <var>sp<\/var><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>On entry to the function, <var>lr<\/var> contains the return address. After pushing the <var>r4<\/var> through <var>r7<\/var>, <var>r11<\/var>, and <var>lr<\/var> registers, we have<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" title=\"Pushed onto the previous stack are the return address, the previous r11, previous r7, previous r6, previous r5, and previous r4. sp points to the last-pushed value (previous r4)\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td style=\"border: solid 1px black; border-top: none; text-align: center;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r11<\/var><\/td>\n<td>\u2190 <var>r11<\/var> (frame chain)<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">\u22ee<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center;\">stack param<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center; background-color: #b0e0e6;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center; background-color: #b0e0e6;\">previous <var>r11<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center; background-color: #b0e0e6;\">previous <var>r7<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center; background-color: #b0e0e6;\">previous <var>r6<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center; background-color: #b0e0e6;\">previous <var>r5<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; text-align: center; 
background-color: #b0e0e6;\">previous <var>r4<\/var><\/td>\n<td>\u2190 <var>sp<\/var><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The incoming <var>lr<\/var> is saved on the stack, so we know where to return to when we&#8217;re done. The incoming <var>r11<\/var> is the head of the linked list of stack frames, and we push it onto the stack so we can create a new node on the linked list. And we also push four saved registers so that they are available for us to use in the function.<\/p>\n<p>It is not a coincidence that the convention is to use <var>r11<\/var> as the frame pointer. This puts it on the stack right next to the <var>lr<\/var> register, so that the return address is right next to the frame pointer.\u00b9<\/p>\n<p>The next instruction calculates <var>r11<\/var> as <var>sp<\/var> + <code>0x10<\/code>, which makes it point to where we saved <var>r11<\/var> onto the stack. This links a new node onto the stack frame chain.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" title=\"r11 (frame chain) now points to the previous r11 on the stack (which in turn points to the previous previous r11)\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td rowspan=\"13\">\u00a0<\/td>\n<p><!-- keep Chrome happy --><\/p>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; border-top: none; text-align: center;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"line-height: 50%;\">\u00a0<\/td>\n<td rowspan=\"2\">\u25b7<\/td>\n<td style=\"border: solid 1px black; text-align: center;\" rowspan=\"2\">previous <var>r11<\/var><\/td>\n<td rowspan=\"2\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px black; border-style: solid none none solid; line-height: 50%;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td style=\"border-left: solid 1px black;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td 
style=\"border: solid 1px black; text-align: center;\">\u22ee<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border-left: solid 1px black;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">stack param<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border-left: solid 1px black;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px black; border-style: none none solid solid; line-height: 50%;\">\u00a0<\/td>\n<td rowspan=\"2\">\u00a0<\/td>\n<td style=\"border: solid 1px black; text-align: center;\" rowspan=\"2\">previous <var>r11<\/var><\/td>\n<td rowspan=\"2\">\u2190 <var>r11<\/var> (frame chain)<\/td>\n<\/tr>\n<tr>\n<td style=\"line-height: 50%;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r7<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r6<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r5<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r4<\/var><\/td>\n<td>\u2190 <var>sp<\/var><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And the last step in the prologue is allocating additional space for local variables and outbound parameters.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" title=\"additional space for locals have been allocated on the stack, with space for outbound parameters allocated on top of it. 
The stack pointer (sp) points to the outbound parameters\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td rowspan=\"15\">\u00a0<\/td>\n<p><!-- keep Chrome happy --><\/p>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; border-top: none; text-align: center;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"line-height: 50%;\">\u00a0<\/td>\n<td rowspan=\"2\">\u25b7<\/td>\n<td style=\"border: solid 1px black; text-align: center;\" rowspan=\"2\">previous <var>r11<\/var><\/td>\n<td rowspan=\"2\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px black; border-style: solid none none solid; line-height: 50%;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td style=\"border-left: solid 1px black;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">\u22ee<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border-left: solid 1px black;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">stack param<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border-left: solid 1px black;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">return address<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: 1px black; border-style: none none solid solid; line-height: 50%;\">\u00a0<\/td>\n<td rowspan=\"2\">\u00a0<\/td>\n<td style=\"border: solid 1px black; text-align: center;\" rowspan=\"2\">previous <var>r11<\/var><\/td>\n<td rowspan=\"2\">\u2190 <var>r11<\/var> (frame chain)<\/td>\n<\/tr>\n<tr>\n<td style=\"line-height: 50%;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r7<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 
previous <var>r6">
1px black; text-align: center;\">previous <var>r6<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r5<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center;\">previous <var>r4<\/var><\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center; height: 4em; background-color: #b0e0e6;\">locals<\/td>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; text-align: center; height: 2em; background-color: #b0e0e6;\">outbound<br \/>\nparameters<\/td>\n<td valign=\"bottom\">\u2190 <var>sp<\/var><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Windows does not require that the <var>r11<\/var> register be the head of a linked list of stack frames,\u00b2 but all Windows system components are compiled with frame pointers enabled: It makes debugging a lot easier (since the <code>k<\/code> command always produces a stack trace), and it permits automated stack traces, such as those created by <code>xperf<\/code>. 
In the stack frame chain, the return address is stored immediately adjacent to the <var>r11<\/var> pointer.<\/p>\n<p>To return from the function, we run things in reverse:<\/p>\n<pre>    add     sp, sp, #0x20       ; free locals and outbound stack parameters\r\n    pop     {r4-r7,r11,pc}      ; restore registers and return\r\n<\/pre>\n<p>The <code>pop<\/code> instruction is magic.<\/p>\n<p>The obvious part of the <code>pop<\/code> instruction is restoring registers <var>r4<\/var> through <var>r7<\/var>.<\/p>\n<p>The less obvious part is that we pop the original <var>r11<\/var> back into <var>r11<\/var>, which has the effect of deleting the frame from the linked list of stack frames.<\/p>\n<p>The totally magic part is that we pop the return address (which was originally passed in <var>lr<\/var>) directly into the <var>pc<\/var> register. Writing to the <var>pc<\/var> register acts like a jump instruction, so this jumps to the return address after the work of this instruction is complete.\u00b3<\/p>\n<p>The last thing the <code>pop<\/code> instruction does is update the stack pointer, which puts it back at the location it had when control originally entered the function. 
And then execution resumes at the return address.<\/p>\n<p>The standard prologue looks like this:<\/p>\n<pre>    push    {...,r11,lr}        ; save registers, frame pointer, return address\r\n    add     r11, sp, #nn        ; re-establish frame chain\r\n                                ; can be \"mov r11, sp\" if only r11 and lr were pushed\r\n    vpush   {d8,...}            ; save floating point registers\r\n    sub     sp, sp, #nnn        ; create local frame\r\n<\/pre>\n<p>I call this the standard prologue because the function unwind metadata is optimized for prologues that take this form.<\/p>\n<p>Next time, we&#8217;ll look at some tweaks and optimizations to this general pattern.<\/p>\n<p>\u00b9 Now, there are two other registers in between <var>r11<\/var> and <var>lr<\/var>: We have the intraprocedure call scratch register <var>r12<\/var>, and we have the stack pointer <var>sp<\/var> (also known as <var>r13<\/var>). Fortunately, we can avoid having to push either of these two registers. The intraprocedure call scratch register is a volatile register that is not expected to be preserved, and the stack pointer is preserved either by keeping track of its value through the function (subtracting a frame on entry and adding it back on exit), or recovering it from the frame pointer. You aren&#8217;t ever tempted to push the stack pointer because you cannot reliably pop it back anyway.<\/p>\n<p>\u00b2 The documentation is a bit unclear on this. In the discussion of the integer registers, it says<\/p>\n<blockquote class=\"q\"><p>Windows uses r11 for fast-walking of the stack frame. For more information, see the Stack Walking section. Because of this requirement, r11 must point to the topmost link in the chain at all times. 
Do not use r11 for general purposes\u2014your code will not generate correct stack walks during analysis.<\/p><\/blockquote>\n<p>The use of the words <i>requirement<\/i>, <i>must<\/i> and <i>do not<\/i> implies that using <var>r11<\/var> as the frame pointer is mandatory.<\/p>\n<p>But then when you get to the Stack Walking section, it says<\/p>\n<blockquote class=\"q\"><p>Generally, the r11 register points to the next link in the chain, which is an {r11, lr} pair that specifies the pointer to the previous frame on the stack and the return address. We recommend that your code also enable frame pointers for improved profiling and tracing.<\/p><\/blockquote>\n<p>This time, the use of the words <i>generally<\/i> and <i>recommend<\/i> implies that using <var>r11<\/var> as the frame pointer is merely a suggestion, albeit a strong suggestion.<\/p>\n<p>I&#8217;m not sure who is right, but I&#8217;m going to assume that the use of <var>r11<\/var> as a frame pointer is <i>strongly recommended<\/i> rather than <i>required<\/i>. I&#8217;m interpreting the first paragraph by adding the underlined clarifying words:<\/p>\n<blockquote class=\"q\"><p>Windows uses r11 for fast-walking of the stack frame. For more information, see the Stack Walking section. Because of this requirement <u>in order for fast-walking to work<\/u>, r11 must point to the topmost link in the chain at all times <u>if you want fast-walking to work<\/u>. <u>If you know what&#8217;s good for you<\/u>, do not use r11 for general purposes\u2014<u>if you ignore this advice, then<\/u> your code will not generate correct stack walks during analysis.<\/p><\/blockquote>\n<p>\u00b3 It is totally not a coincidence that <var>lr<\/var> and <var>pc<\/var> are adjacent registers. 
This allows you to push a set of registers including <var>lr<\/var>, and then pop the same set of registers, but substituting <var>pc<\/var> for <var>lr<\/var>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Implementing the receiving end of the calling convention.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-105332","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Implementing the receiving end of the calling convention.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/105332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=105332"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/105332\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=105332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=105332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=105332"}],"curies":[{"name"
:"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}