November 9th, 2022


Why don’t Windows functions begin with a pointless MOV EDI,EDI instruction on x86-64?

Raymond Chen

Some time ago, we investigated why Windows functions all begin with a pointless MOV EDI,EDI instruction. The answer was that the instruction was used as a two-byte NOP which could be hot-patched to a jump instruction, thereby allowing certain types of security fixes to be applied to a running system. (Those which alter data structures or involve cross-process communication would not benefit from this.)
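As a reminder of what the old 32-bit scheme looked like, here is a minimal simulation on a byte buffer. It assumes the layout described in the earlier article: five bytes of padding sit in front of the function, a five-byte `jmp rel32` is written into that padding, and then the two-byte `mov edi, edi` is overwritten with a short jump back to it. The `hotpatch` function and the toy `image` are hypothetical names for illustration. A real hot-patch has to perform the two-byte write atomically on live code, which this sketch does not attempt.

```python
import struct

MOV_EDI_EDI = bytes([0x8B, 0xFF])   # two-byte no-op at the function entry
JMP_SHORT_M5 = bytes([0xEB, 0xF9])  # jmp short, disp8 = -7 from the end of the jmp = entry - 5
JMP_REL32 = 0xE9                    # opcode of the five-byte jmp rel32


def hotpatch(image: bytearray, entry: int, replacement: int) -> None:
    """Redirect the function at offset `entry` to the one at offset `replacement`."""
    if entry < 5:
        raise ValueError("no room for the five-byte jump before the entry point")
    if bytes(image[entry:entry + 2]) != MOV_EDI_EDI:
        raise ValueError("entry point is not mov edi, edi")
    # rel32 is measured from the end of the five-byte jmp, which is the entry itself.
    rel32 = replacement - entry
    # Step 1: write the long jump into the padding (nothing executes there yet).
    image[entry - 5:entry] = bytes([JMP_REL32]) + struct.pack("<i", rel32)
    # Step 2: flip the two-byte no-op into a short jump to the long jump.
    image[entry:entry + 2] = JMP_SHORT_M5


# Toy image: padding, old function (mov edi, edi; ret), padding, replacement (ret).
image = bytearray(b"\xCC" * 5 + MOV_EDI_EDI + b"\xC3" + b"\xCC" * 5 + b"\xC3")
hotpatch(image, entry=5, replacement=13)
print(image.hex(" "))
```

The order of the two writes matters: the long jump is dead code until the short jump is in place, so a thread entering the function sees either the old code or the complete redirection.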

But you may have noticed that on 64-bit Windows, these pointless instructions are gone. Is hot-patching dead?

No, hot-patching is still alive. But on 64-bit Windows, the hot-patch point is implemented differently.

The idea is that we don’t have to insert a pointless two-byte nop instruction into every function. If the first instruction of the function is already a two-byte instruction (or bigger), then that instruction can itself serve as the hot-patch point.

The case where the first instruction of a function is two bytes or larger is by far the dominant one. There are only a few one-byte instructions remaining in x86-64. The ones you’re likely to encounter in user-mode compiler-generated code are

`push r`	`leave`	`cwde`	`int 3`
`pop r`	`ret`	`cdq`	`nop`

where r is the 64-bit version of one of the eight named (not numbered) registers.
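To make the table concrete, here is a small lookup of the opcode bytes for those instructions, plus a check of whether a function's first byte is one of them. The names `ONE_BYTE_OPCODES` and `starts_with_one_byte_instruction` are made up for this sketch, and the table covers only the instructions listed above, not every one-byte opcode that exists.

```python
# The eight named registers, in hardware encoding order.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# One-byte instructions from the table above, keyed by opcode byte.
ONE_BYTE_OPCODES = {
    **{0x50 + i: f"push {r}" for i, r in enumerate(REGS)},
    **{0x58 + i: f"pop {r}" for i, r in enumerate(REGS)},
    0x90: "nop",
    0x98: "cwde",
    0x99: "cdq",
    0xC3: "ret",
    0xC9: "leave",
    0xCC: "int 3",
}


def starts_with_one_byte_instruction(code: bytes) -> bool:
    """True if the first instruction is one of the listed one-byte instructions.

    Looking at the first byte is enough here: a prefix byte (REX, 0x66, 0xF3)
    is not in the table, and it makes the instruction at least two bytes long.
    """
    return len(code) > 0 and code[0] in ONE_BYTE_OPCODES


print(starts_with_one_byte_instruction(bytes([0x53])))        # push rbx
print(starts_with_one_byte_instruction(bytes([0x41, 0x54])))  # push r12
```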

Some of these instructions are not going to appear naturally at the start of a function.

- `leave` doesn’t make sense because it mutates a callee-preserved register.
- `cwde` and `cdq` don’t make sense because they use `rax` as an input register, but that register is undefined on entry to a function.
- `nop` can just be omitted.
- Starting with a `pop` is disallowed by the Win32 ABI. The return address must stay on the stack.

And then some of the instructions can be worked around if they happen to be the start of a function.

- `push`: If the function pushes any registers `r8` or higher, those can be pushed first, since the push of a high-numbered register is a two-byte instruction. Or the instruction could be re-encoded with a redundant REX prefix 0x48. Alternatively, the compiler could save the register in the home space, which uses a multi-byte `mov [rsp+n], r` instruction.
- `ret`: This happens if the function is empty and returns no value. The compiler can change this to a 3-byte `ret 0` or a 2-byte `repz ret`.
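The byte counts behind these workarounds can be checked against the encodings. The table below is a sketch with a few representative instructions. The choice of `rbx`, `r12`, and the offset 8 is arbitrary and only for illustration, and `is_valid_patch_point` is a hypothetical helper that applies the "two bytes or bigger" rule.

```python
# Representative encodings; register choices and the +8 offset are arbitrary examples.
ENCODINGS = {
    "push rbx": bytes([0x53]),                          # one byte: the problem case
    "push r12": bytes([0x41, 0x54]),                    # REX.B prefix makes it two bytes
    "push rbx (redundant REX 0x48)": bytes([0x48, 0x53]),
    "mov [rsp+8], rbx": bytes([0x48, 0x89, 0x5C, 0x24, 0x08]),
    "ret": bytes([0xC3]),                               # one byte: the problem case
    "ret 0": bytes([0xC2, 0x00, 0x00]),
    "repz ret": bytes([0xF3, 0xC3]),
}


def is_valid_patch_point(encoding: bytes) -> bool:
    """A first instruction can serve as the hot-patch point if it is two bytes or bigger."""
    return len(encoding) >= 2


for name, encoding in ENCODINGS.items():
    verdict = "ok" if is_valid_patch_point(encoding) else "too short"
    print(f"{name:32} {encoding.hex(' '):16} {verdict}")
```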

The last remaining instruction is int 3, which is generated by the __debugbreak intrinsic.

One option is to use the alternate two-byte encoding cd 03 (int nn, with nn = 3). However, the code with the __debugbreak may be relying on it being a one-byte instruction, because it intends to patch it with a one-byte nop, or it intends to handle the breakpoint exception by stepping over the opcode by incrementing the instruction pointer.

Instead, the compiler plays it safe and begins the function with a two-byte nop, which is encoded as if it were xchg ax, ax, and in fact the Microsoft debugger disassembles it as such.
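To see why the two-byte `int` encoding is risky, consider a handler that steps over the breakpoint by adding one to the instruction pointer. The sketch below models that with plain byte offsets. The `resume_after_breakpoint` helper and the two toy function bodies are assumptions for illustration, not anything the compiler or the debugger actually runs.

```python
INT3 = bytes([0xCC])          # one-byte breakpoint
INT_03 = bytes([0xCD, 0x03])  # two-byte encoding: int nn with nn = 3
NOP2 = bytes([0x66, 0x90])    # two-byte nop, disassembled as xchg ax, ax
RET = bytes([0xC3])


def resume_after_breakpoint(breakpoint_offset: int) -> int:
    """Model of code that assumes the breakpoint is exactly one byte long."""
    return breakpoint_offset + 1


# What the compiler emits: two-byte nop as the patch point, then the one-byte int 3.
chosen = NOP2 + INT3 + RET
# The rejected alternative: make the breakpoint itself the two-byte patch point.
alternative = INT_03 + RET

# With the chosen layout, stepping over the breakpoint lands on the ret.
print(hex(chosen[resume_after_breakpoint(2)]))
# With the alternative, it lands on the 03 operand byte, in the middle of the instruction.
print(hex(alternative[resume_after_breakpoint(0)]))
```

Both layouts give a first instruction of at least two bytes, but only the first one keeps the breakpoint at its expected one-byte size.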

The pointless mov edi, edi instruction is gone. And most of the time, the compiler can juggle things so that you don’t even notice that it arranged for the first instruction of a function to be a multi-byte instruction. The only time it fails is if the first thing your function does is __debugbreak, in which case the compiler inserts a pointless xchg ax, ax instruction, also known as the two-byte nop.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

6 comments


Sachin Joseph November 10, 2022

Curious if hotpatching in ARM64 works in the same way as x64 🙂
Александр Алексеев November 10, 2022

Perhaps, I am missing something, but the old MOV EDI, EDI method allowed you to call the patch for the function and then jump back to the function, since main function’s body (immediately after MOV EDI, EDI) will remain untouched.

It is no longer possible in x64, since you will overwrite actual function’s payload. So, if you want to call the original function, you will have to do something with its first instruction.

Is this scenario (calling the original function) not used widely, so it was abandoned?
- Raymond Chen Author November 10, 2022
  
  The hotpatch never calls the original function. It is a replacement function.
Joshua Hudson November 9, 2022

A recent hotpatch put CreateFileW in Windows 11 into unpatchable territory. It may well have been fixed since; I stopped checking.

The initial instruction was movable just fine. There were not five bytes before it that were free. The end of the previous function ran into the space needed to put the far jmp instruction.

We don’t have to do this nonsense anymore; we finally won the war about actually requiring full disk encryption on workstations.
Adam Rosenfield November 9, 2022

Are there many functions that start with any of the single-byte opcodes CLC, STC, CLD, or STD? (There are probably not many with INT1, HLT, CMC, SAHF, LAHF, CLI, or STI in user-mode code.)
- Csaba Varga November 9, 2022 · Edited
  
  I’m pretty sure the direction flag is expected to be clear on function entry, so CLD is unnecessary. STD might happen if the first thing the function needs to do is run a repeated string instruction backwards, but that sounds like a pretty uncommon thing to do. (String instructions also mutate RSI and/or RDI, so you would need to save them to the stack first anyway.) Setting, resetting or complementing the carry flag may have been useful in the old DOS days when some functions communicated through that flag, but not so much nowadays.
  
  In any case, if the function needs to execute any of those instructions in the very beginning, the compiler can always insert XCHG AX, AX to work around the issue.
  
