{"id":110378,"date":"2024-10-16T07:00:00","date_gmt":"2024-10-16T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110378"},"modified":"2024-10-16T13:01:37","modified_gmt":"2024-10-16T20:01:37","slug":"20241016-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241016-00\/?p=110378","title":{"rendered":"Effects of classic return address tricks on hardware-assisted return address protection"},"content":{"rendered":"<p>The x86-32 architecture notoriously does not offer direct access to the instruction pointer, and a common trick is to use <code>call<\/code>\/<code>pop<\/code> to read the instruction pointer.<\/p>\n<pre>    ; read current address into register\r\n    call    @F\r\n@@: pop     eax     ; eax = current address\r\n<\/pre>\n<p>And since x86-64 does not offer an absolute jump instruction, it is a common trick to use a <code>push<\/code>\/<code>ret<\/code> as a substitute.<\/p>\n<pre>    ; jump to absolute address\r\n    push    0x12345678\r\n    ret             ; jump to 0x12345678\r\n<\/pre>\n<p>We learned a while back that these unmatched <code>call<\/code>\/<code>ret<\/code> pairs <a title=\"Optimization is often counter-intuitive\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20041216-00\/?p=36973\"> unbalance the return address predictor<\/a>\u00b9 and end up being net pessimizations.<\/p>\n<p>And we recently learned that <a title=\"A quick introduction to return address protection technologies\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241015-00\/?p=110374\"> they also unbalance the hardware shadow stack<\/a>, and the consequences of that are even worse: Instead of merely damaging your performance, this code doesn&#8217;t run <i>at all<\/i>, because a return whose address does not match the top of the shadow stack raises an exception.<\/p>\n<p>In the case of Windows, the kernel receives the exception and checks whether the code performing the invalid 
<code>ret<\/code> is marked as compatible with return address protection. If so, then any return address protection failure is considered fatal. If not, then the kernel tries to forgive the error by popping entries off the hardware shadow stack until it finds a return address that matches the one popped from the CPU stack. If no match is found, then the failure is treated as fatal.<\/p>\n<p>If you do a <code>push<\/code>\/<code>ret<\/code>, the return address you pushed is nowhere in the valid return address history, and the kernel will terminate the process.<\/p>\n<p>If you do a <code>call<\/code>\/<code>pop<\/code>, then you pushed an extra entry onto the shadow stack, and what happens next varies.<\/p>\n<p>If your function ends with a <code>ret<\/code>, then that <code>ret<\/code> will be mismatched, and the kernel notices that it occurred inside a DLL that is marked as &#8220;not CET compatible&#8221;, so the kernel will shake its head, &#8220;oh man, here&#8217;s a weirdo&#8221;, and it will look up the stack and find the true return address one entry higher.<\/p>\n<p>If your function ends with a tail call optimization that jumps to another function, then that other function&#8217;s <code>ret<\/code> will be the one that takes the mismatch exception. 
If that other function is in a DLL that is marked as &#8220;CET compatible&#8221;, then the kernel will say, &#8220;<a href=\"https:\/\/knowyourmeme.com\/memes\/thats-a-paddlin\">That&#8217;s a paddlin&#8217;<\/a>&#8221; and terminate the process.<\/p>\n<p>So the <code>push<\/code>\/<code>ret<\/code> pattern results in a guaranteed process termination, whereas the <code>call<\/code>\/<code>pop<\/code> might result in a process termination depending on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dirty_Harry\"> how lucky you feel<\/a>.<\/p>\n<p>(Not recommended.)<\/p>\n<p>\u00b9 It appears that this specific pattern of <code>call<\/code>\/<code>pop<\/code> <a href=\"https:\/\/blog.stuffedcow.net\/2018\/04\/ras-microbenchmarks\/#call0\"> is special-cased inside modern processors<\/a> and does not unbalance the return address predictor stack after all.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Return address manipulations that are possibly even more impermissible than they already were.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-110378","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Return address manipulations that are possibly even more impermissible than they already 
were.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110378","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110378"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110378\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110378"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110378"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110378"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}