{"id":105355,"date":"2021-06-24T08:51:46","date_gmt":"2021-06-24T15:51:46","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=105355"},"modified":"2021-06-24T08:51:46","modified_gmt":"2021-06-24T15:51:46","slug":"20210624-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20210624-46\/?p=105355","title":{"rendered":"The ARM processor (Thumb-2), part 19: Common patterns"},"content":{"rendered":"<p>We saw some time ago how to recognize dense switch statements that use the <code>TBB<\/code> and <code>TBH<\/code> instructions. Here are some other common sequences in compiler-generated code. Note that instructions are likely to be reordered by the compiler to avoid stalls.<\/p>\n<p>A call to an imported function is an indirect function call through a global function pointer:<\/p>\n<pre>    movs    r0, #0          ; first parameter\r\n    movs    r1, #42         ; second parameter\r\n    ldr     r3, =|__imp__Function| ; address of global function pointer\r\n    ldr     r3, [r3]        ; load function pointer\r\n    blx     r3              ; call the function\r\n<\/pre>\n<p>A call to a virtual function is an indirect function call through a function pointer stored in the object&#8217;s vtable.<\/p>\n<pre>    mov     r0, r4          ; r0 = this\r\n    mov     r1, #42         ; r1 = first parameter\r\n    mov     r3, [r0]        ; r3 -&gt; vtable\r\n    ldr     r3, [r3, #4]    ; r3 = pointer to function from vtable\r\n    blx     r3              ; call the function\r\n<\/pre>\n<p>Windows components are compiled with <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/secbp\/control-flow-guard\"> control flow guard (CFG)<\/a>, which validates indirect jump targets, making it harder for malware to redirect indirect calls to malicious payload. Calls to virtual functions go through CFG to make it harder for an attacker to manufacture a fake vtable and trick code into calling through it. A virtual function call with CFG enabled looks like this:<\/p>\n<pre>    mov     r5, [r4]        ; r5 -&gt; vtable\r\n    mov     r5, [r5]        ; r5 = pointer to function from vtable\r\n    ldr     r3, =|__guard_check_call_fptr| ; address of pointer to CFG check function\r\n    ldr     r3, [r3]        ; r3 = pointer to CFG check function\r\n    mov     r0, r5          ; r0 = pointer to function from vtable\r\n    blx     r3              ; check the pointer in r0\r\n\r\n    mov     r0, r4          ; r0 = this\r\n    mov     r1, #42         ; r1 = first parameter\r\n    mov     r3, [r0]        ; r3 -&gt; vtable\r\n    blx     r5              ; call the pointer we validated\r\n<\/pre>\n<p>An important detail here is that we call indirectly through the same pointer we validated, rather than loading it from memory again. This avoids a TOCTTOU race condition, where the attacker swaps in a malicious function pointer after the old value is validated.<\/p>\n<p>Another common sequence is the dense switch statement, which uses the <code>TBB<\/code> and <code>TBH<\/code> instructions.<\/p>\n<pre>    cmp     r0, #8          ; beyond end of table?\r\n    bhi     default_case    ; Y: go to the default case\r\n    tbb     [pc, r0]        ; B: use jump table\r\n    dcb     4, 53, 4, 53, 93, 53, 143, 172, 205\r\ncase_0:\r\ncase_2:\r\n    ...\r\n<\/pre>\n<p>It is common to put the jump table immediately after the table branch instruction, and address it with <var>pc<\/var>, which has conveniently been moved forward four bytes, so it points at what would be the next instruction. In our example, the jump table is eight bytes long, so an entry of 4 means that we jump ahead 4 \u00d7 2 bytes, which takes us just past the jump table.<\/p>\n<p>The barrel shifter also comes in handy when performing multiword bit shifting, since you can use the barrel shifter to isolate the bits that need to move between words.<\/p>\n<pre>    ; logical right shift doubleword in (r1,r0) by N\r\n    lsrs    r0, r0, #N              ; logical shift right lower half\r\n    orr     r0, r0, r1 lsl #(32-N)  ; copy low N bits 0 of r1 to high bits of r0\r\n    lsrs    r1, r1, #N              ; logical shift right upper half\r\n<\/pre>\n<p>In pictures, we are doing this:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td rowspan=\"10\">\u00a0<\/td>\n<td colspan=\"3\">r1<\/td>\n<td>&nbsp;<\/td>\n<td colspan=\"3\">r0<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; border-right: none; width: 5ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 17ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #87cefa;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 5ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 17ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #808080;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">Initial conditions<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">\u00a0<\/td>\n<td>\u2b0a<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; border-right: none; width: 5ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 17ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #87cefa;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: white;\">0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 17ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 5ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">After <code>lsrs r0, r0, #N<\/code><\/td>\n<\/tr>\n<tr>\n<td colspan=\"3\">\u00a0<\/td>\n<td>\u2b0a<\/td>\n<td>&nbsp;<\/td>\n<td>logical or<\/td>\n<\/tr>\n<tr>\n<td colspan=\"4\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #87cefa;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 17ex; background: white;\">0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 5ex; background: white;\">\u00a0<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">\u00a0<\/td>\n<td>\u21ca<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; border-right: none; width: 5ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 17ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #87cefa;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #87cefa;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 17ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 5ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">After <code>orr r0, r0, r1 lsl #(32-N)<\/code><\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; width: 5ex; background: white;\">0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 17ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 5ex; background: #e0e0e0;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 5ex; background: #87cefa;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 17ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 5ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">After <code>lsrs r1, r1, #N<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We shift the lower half right by <var>N<\/var> positions, which zeroes out the top <var>N<\/var> bits of the lower half. Then we use the barrel shifter to take the upper half and shift it left by 32 \u2212 <var>N<\/var> positions: This takes the lower <var>N<\/var> bits and move them to the top of the 32-bit value, clearing all the other bits. The result is then <code>orr<\/code>&#8216;d into the shifted lower half, so the net effect is that the low <var>N<\/var> bits of the upper half are copied to the upper <var>N<\/var> bits of the lower half. Finally, we shift the upper half right by <var>N<\/var> positions.<\/p>\n<p>For an arithmetic doubleword right shift, you can replace the final instruction with <code>asrs<\/code>. And an analogous three-instruction sequence works for multibit left shifts.<\/p>\n<p>Shifting by more than 32 bits is just a matter of shifting the surviving half by <var>N<\/var> \u2212 32 and either zeroing-out (for logical shifts) or sign-extending (for arithmetic right shift) the remaining bits.<\/p>\n<p>This pattern extends naturally to sizes beyond two words, though you won&#8217;t see that in compiler-generated code seeing as there are no arithmetic integer types bigger than 64 bits in 32-bit Windows.<\/p>\n<p>We&#8217;ll wrap up the series, as is traditional, with an annotated code walkthrough of a simple function.<\/p>\n<p><b>Bonus chatter<\/b>: For the special case of shifting by one position, you can take shortcuts: Start at the opposite end and rotate the carry into the other half.<\/p>\n<pre>    ; single bit right shift doubleword in (r1, r0)\r\n    lsrs    r1, r1, #1          ; logical shift right upper half\r\n    rrx     r0, r0              ; rotate shifted-out bit into high bit of lower half\r\n<\/pre>\n<p>Here it is in pictures:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse; text-align: center;\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td rowspan=\"6\">\u00a0<\/td>\n<td colspan=\"3\">r1<\/td>\n<td>&nbsp;<\/td>\n<td>C<\/td>\n<td>&nbsp;<\/td>\n<td colspan=\"3\">r0<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; border-right: none; width: 1ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 25ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 1ex; background: #87cefa;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 1ex;\">?<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 1ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 25ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 1ex; background: #808080;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">Initial conditions<\/td>\n<\/tr>\n<tr>\n<td>&nbsp;<\/td>\n<td>\u2b0a<\/td>\n<td>&nbsp;<\/td>\n<td>\u2b0a<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; width: 1ex; background: white;\">0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 25ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 1ex; background: #e0e0e0;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 1ex; background: #87cefa;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 1ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 25ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; width: 1ex; background: #808080;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">After <code>lsrs r1, r1, #1<\/code><\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">\u00a0<\/td>\n<td>\u2b0a<\/td>\n<td>&nbsp;<\/td>\n<td>\u2b0a<\/td>\n<\/tr>\n<tr>\n<td style=\"border: solid 1px black; width: 1ex; background: white;\">0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 25ex; background: #e0e0e0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 1ex; background: #e0e0e0;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 1ex; background: #808080;\">\u00a0<\/td>\n<td>&nbsp;<\/td>\n<td style=\"border: solid 1px black; width: 1ex; background: #87cefa;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-right: none; width: 25ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"border: solid 1px black; border-left: none; width: 1ex; background: #f0f0f0;\">\u00a0<\/td>\n<td style=\"padding-left: 1em; text-align: left; font-size: 90%;\">After <code>rrx r0, r0<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The trick here is that if only one bit is being shifted out, we can hold it in the carry, and then use <code>rrx<\/code> to shift it into the high bit of the lower half. The bottom bit of the lower half ends up in the carry, ready to be rotated into the next word (if you need to shift a large array right by one bit).<\/p>\n<p>The same trick works for shifting left, using the fact that <code>adcs<\/code> can be used to perform a left rotate through carry.<\/p>\n<pre>    ; single bit left shift doubleword in (r1,r0)\r\n    adds    r0, r0, r0          ; shift left and propagate bit 31 to carry\r\n    adcs    r1, r1, r1          ; shift left and fill bottom bit with carry\r\n<\/pre>\n<p>These special-case sequences for 1-bit shifts do introduce instruction dependencies, which is bad for out-of-order execution, so compilers may avoid them for performance reasons. The results I see are inconsistent:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th rowspan=\"2\">Compiler<\/th>\n<th colspan=\"2\">Uses short version for 1-bit shift<\/th>\n<\/tr>\n<tr>\n<th>Right<\/th>\n<th>Left<\/th>\n<\/tr>\n<tr>\n<td>MSVC<\/td>\n<td>No<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td>clang<\/td>\n<td>Yes<\/td>\n<td>No<\/td>\n<\/tr>\n<tr>\n<td>gcc<\/td>\n<td>No<\/td>\n<td>Yes<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Code sequences you will learn to recognize.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[2],"class_list":["post-105355","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-history"],"acf":[],"blog_post_summary":"<p>Code sequences you will learn to recognize.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/105355","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=105355"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/105355\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=105355"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=105355"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=105355"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}