{"id":107373,"date":"2022-11-09T07:00:00","date_gmt":"2022-11-09T15:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=107373"},"modified":"2022-11-09T05:40:00","modified_gmt":"2022-11-09T13:40:00","slug":"20221109-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20221109-00\/?p=107373","title":{"rendered":"Why don&#8217;t Windows functions begin with a pointless MOV EDI,EDI instruction on x86-64?"},"content":{"rendered":"<p>Some time ago, we investigated <a href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20110921-00\/?p=9583\">why Windows functions all begin with a pointless MOV EDI,EDI instruction<\/a>. The answer was that the instruction served as a two-byte <code>NOP<\/code> that could be hot-patched into a jump instruction, thereby allowing certain types of security fixes to be applied to a running system. (Fixes that alter data structures or involve cross-process communication cannot be applied this way.)<\/p>\n<p>But you may have noticed that on 64-bit Windows, these pointless instructions are gone. Is hot-patching dead?<\/p>\n<p>No, hot-patching is still alive. But on 64-bit Windows, the hot-patch point is implemented differently.<\/p>\n<p>The idea is that we don&#8217;t have to insert a pointless two-byte <code>nop<\/code> instruction into every function. If the first instruction of the function is already two or more bytes long, then that instruction can itself serve as the hot-patch point.<\/p>\n<p>The case where the first instruction of a function is two or more bytes long is by far the dominant one. There are only a few one-byte instructions remaining in x86-64. 
The ones you&#8217;re likely to encounter in user-mode compiler-generated code are<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td><code>push r<\/code><\/td>\n<td><code>leave<\/code><\/td>\n<td><code>cwde<\/code><\/td>\n<td><code>int 3<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>pop r<\/code><\/td>\n<td><code>ret<\/code><\/td>\n<td><code>cdq<\/code><\/td>\n<td><code>nop<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>where <code>r<\/code> is the 64-bit version of one of the eight named (not numbered) registers.<\/p>\n<p>Some of these instructions are not going to appear naturally at the start of a function.<\/p>\n<ul>\n<li><code>leave<\/code> doesn&#8217;t make sense because it is equivalent to <code>mov rsp, rbp<\/code> followed by <code>pop rbp<\/code>, so it overwrites the callee-preserved <code>rbp<\/code> register.<\/li>\n<li><code>cwde<\/code> and <code>cdq<\/code> don&#8217;t make sense because they read from <code>rax<\/code> (<code>cwde<\/code> sign-extends <code>ax<\/code>, and <code>cdq<\/code> sign-extends <code>eax<\/code>), but that register holds no defined value on entry to a function.<\/li>\n<li><code>nop<\/code> can just be omitted.<\/li>\n<li>Starting with a <code>pop<\/code> is disallowed by the Win32 ABI. The return address must stay on the stack.<\/li>\n<\/ul>\n<p>And then some of the instructions can be worked around if they happen to be the start of a function.<\/p>\n<ul>\n<li><code>push<\/code>: If the function pushes any of the registers <code>r8<\/code> through <code>r15<\/code>, those can be pushed first, since pushing one of those registers requires a REX prefix, making it a two-byte instruction. Or the instruction could be re-encoded with a redundant REX prefix <code>0x48<\/code>. Alternatively, the compiler could save the register in the home space, which uses a multi-byte <code>mov [rsp+n], r<\/code> instruction.<\/li>\n<li><code>ret<\/code>: This happens if the function is empty and returns no value. 
The compiler can change this to a 3-byte <code>ret 0<\/code> (<code>c2 00 00<\/code>) or a 2-byte <a href=\"https:\/\/repzret.org\/p\/repzret\/\"><code>repz ret<\/code><\/a> (<code>f3 c3<\/code>).<\/li>\n<\/ul>\n<p>The last remaining instruction is <code>int 3<\/code> (encoded as the single byte <code>cc<\/code>), which is generated by the <code>__debugbreak<\/code> intrinsic.<\/p>\n<p>One option is to use the alternate two-byte encoding <code>cd 03<\/code> (<code>int nn<\/code>, with <code>nn<\/code> = 3). However, the code that contains the <code>__debugbreak<\/code> may rely on it being a one-byte instruction: it might intend to patch it with a one-byte <code>nop<\/code>, or it might handle the breakpoint exception by incrementing the instruction pointer by one to step over the opcode.<\/p>\n<p>Instead, the compiler plays it safe and begins the function with a two-byte <code>nop<\/code> (<code>66 90<\/code>), which is encoded as if it were <code>xchg ax, ax<\/code>, and in fact the Microsoft debugger disassembles it as such.<\/p>\n<p>The pointless <code>mov edi, edi<\/code> instruction is gone. And most of the time, the compiler can juggle things so that you don&#8217;t even notice that it arranged for the first instruction of a function to be a multi-byte instruction. 
The only time it fails is if the first thing your function does is <code>__debugbreak<\/code>, in which case the compiler inserts a pointless <code>xchg ax, ax<\/code> instruction, also known as the two-byte <code>nop<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Applying the hot-patch in a different way.<\/p>\n","protected":false},"author":1069,"categories":[1],"tags":[25]}