{"id":107059,"date":"2022-08-26T07:00:00","date_gmt":"2022-08-26T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=107059"},"modified":"2022-08-25T07:23:56","modified_gmt":"2022-08-25T14:23:56","slug":"20220826-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20220826-00\/?p=107059","title":{"rendered":"The AArch64 processor (aka arm64), part 23: Common patterns"},"content":{"rendered":"<p>Let&#8217;s look at some common patterns in compiler-generated code. We&#8217;ll start with a simple function call.<\/p>\n<pre>extern DWORD CreateWidget(WIDGETINFO const* info, int flags, HWIDGET* widget);\r\nextern WIDGETINFO c_info;\r\n\r\nif (CreateWidget(&amp;c_info, WidgetFlags::FailIfExists,\r\n                 &amp;widget) != NO_ERROR) ...\r\n\r\n    mov     w1, #1          ; WidgetFlags::FailIfExists\r\n    adrp    x0, unrelated_global ; top 52 bits of pointer to c_info\r\n    add     x0, x0, #0x320  ; lower 12 bits of pointer to c_info\r\n    add     x2, sp, #0x40   ; x2 -&gt; widget\r\n    bl      CreateWidget    ; call it\r\n    cbnz    w0, error       ; branch if nonzero return value\r\n<\/pre>\n<p>The parameters are loaded into the <var>x0<\/var> through <var>x2<\/var> registers, though not necessarily in that order. In this case, <var>w1<\/var> is the <var>flags<\/var> parameter, and it gets a hard-coded constant.<\/p>\n<p>The <var>info<\/var> parameter is a pointer to a global, so we use the <code>ADRP<\/code> + <code>ADD<\/code> sequence to get its address. Note that the name of the <code>c_info<\/code> variable appears nowhere in the disassembly. 
We just have to realize that <code>c_info<\/code> is <code>0x320<\/code> bytes after <code>unrelated_global<\/code>.<\/p>\n<p>The last parameter is a pointer to a local variable, so we calculate its address by adding the appropriate offset to the stack pointer.<\/p>\n<p>After the function returns, we branch if it returned a nonzero value in <var>w0<\/var>, which is the return value register for 32-bit integers.<\/p>\n<p>If <code>Create\u00adWidget<\/code> is a na\u00efvely-imported function, then that <code>BL<\/code> will call the import stub, which looks like this:<\/p>\n<pre>CreateWidget:\r\n    adrp        xip0, _imp_ResetDoodad\r\n    ldr         xip0, [xip0, #0x8E8]\r\n    br          xip0\r\n<\/pre>\n<p>This is an import stub that uses the <var>xip0<\/var> scratch register to look up the import address entry for <code>_imp_CreateWidget<\/code> by loading the doubleword that is <code>0x8E8<\/code> bytes after <code>_imp_ResetDoodad<\/code>. Again, since we are building the address in two parts, the actual destination variable is not visible in the disassembly.<\/p>\n<p>If the <code>Create\u00adWidget<\/code> function had been declared with <code>__declspec(dllimport)<\/code>, then the compiler would call indirectly through the import address table:<\/p>\n<pre>    <span style=\"color: #808080;\">mov     w1, #1          ; WidgetFlags::FailIfExists\r\n    adrp    x0, unrelated_global ; top 52 bits of pointer to c_info\r\n    add     x0, x0, #0x320  ; lower 12 bits of pointer to c_info\r\n    add     x2, sp, #0x40   ; x2 -&gt; widget<\/span>\r\n\r\n    adrp    x8, _imp_ResetDoodad\r\n    ldr     x8, [x8, #0x8E8] ; load CreateWidget function pointer\r\n    blr     x8              ; call it\r\n\r\n    <span style=\"color: #808080;\">cbnz    w0, error       ; branch if nonzero return value<\/span>\r\n<\/pre>\n<p>Virtual method calls also require obtaining the destination function pointer at runtime, this time from the vtable.<\/p>\n<pre>p-&gt;method(42);\r\n\r\n    
; assume x19 holds \"p\"\r\n\r\n    mov     x0, x19         ; x0 = this\r\n    ldr     x8, [x19]       ; x8 -&gt; vtable\r\n    mov     w1, #42         ; parameter 1\r\n    ldr     x8, [x8, #8]    ; load function pointer for p-&gt;method\r\n    blr     x8              ; call it\r\n<\/pre>\n<p>If control flow guard is active, then there will be a call to validate the call target before using it.<\/p>\n<pre>    ldr     x8, [x19]       ; x8 -&gt; vtable\r\n    ldr     x20, [x8, #8]   ; x20 = function pointer for p-&gt;method\r\n    adrp    x8, unrelated_symbol+0x4280 ; page that contains __guard_check_icall_fptr\r\n    ldr     x8, [x8, #0x820] ; x8 -&gt; __guard_check_icall_fptr\r\n    mov     x15, x20        ; x15 = address to check\r\n    blr     x8              ; validate the target\r\n    mov     x0, x19         ; x0 = this\r\n    mov     w1, #42         ; parameter 1\r\n    blr     x20             ; call the function\r\n<\/pre>\n<p>The <code>__guard_<wbr \/>check_<wbr \/>icall_<wbr \/>fptr<\/code> function uses a nonstandard calling convention: It takes the pointer to be checked in the <var>x15<\/var> register instead of <var>x0<\/var>.<\/p>\n<p>The last interesting code generation is the table-based dispatch for dense switch statements.<\/p>\n<pre>    ; switch on value in w19\r\n    cmp     w19, #9         ; beyond end of table?\r\n    bhi     do_default      ; Y: then go to default case\r\n    adr     x9, switch_table\r\n    ldrsw   x8, [x9, w19 uxtw #2] ; load offset from table\r\n    adr     x9, some_code   ; some code address in the middle of the cases\r\n    add     x8, x9, x8, lsl #2 ; move forward by this many instructions\r\n    br      x8              ; and jump there\r\n<\/pre>\n<p>First, we reject values which don&#8217;t correspond to an entry in our table. 
In more complex scenarios, the <code>BHI<\/code> might take us to code that tests some straggler values, or possibly even tests a different jump table.<\/p>\n<p>If the value has an entry in our switch table, we use <code>ADR<\/code> to get the address of the table, which is stored in the code segment somewhere nearby (probably after the end of the function). Then we use <code>LDRSW<\/code> to load a signed word from the table, using the value in <var>w19<\/var> as an unsigned index, shifted left by 2, which makes it a word index.<\/p>\n<p>Okay, so we now have an offset loaded from the table.<\/p>\n<p>Next, we set <var>x9<\/var> to point to some code and use the offset as an instruction count (shifted left by 2 since each instruction is 4 bytes) relative to the code address. That produces a new code address which we branch to.<\/p>\n<p>Depending on how much code exists in each of the cases, the jump table could be a table of bytes, halfwords, or (in this case) words.<\/p>\n<p>Sometimes the compiler is super-clever, and it puts the jump table close to the code. That way, it doesn&#8217;t need to load an anchor code address. <i>The jump table itself serves as the anchor<\/i>.<\/p>\n<pre>    ; switch on value in w19\r\n    cmp     w19, #9         ; beyond end of table?\r\n    bhi     do_default      ; Y: then go to default case\r\n    adr     x9, switch_table\r\n    ldrsw   x8, [x9, w19 uxtw #2] ; load offset from table\r\n;   don't need to reload x9\r\n    add     x8, x9, x8, lsl #2 ; move forward by this many instructions relative to table\r\n    br      x8              ; and jump there\r\n<\/pre>\n<p>In principle, the compiler could have used a jump table of code pointers rather than a jump table of instruction offsets. Although using offsets costs an extra instruction or two (to add the offset to an anchor code address), it does allow for a smaller table, since each entry is only a word, or possibly as small as a byte. 
It also makes the code position-independent, which means fewer relocations are needed.<\/p>\n<p>We&#8217;ll wrap up the series with the traditional line-by-line walkthrough of a simple function.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learning to recognize various code generation patterns.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-107059","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Learning to recognize various code generation patterns.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107059","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=107059"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/107059\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=107059"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=107059"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?po
st=107059"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}