If you look at the disassembly of functions inside Windows DLLs, you’ll find that they begin with the seemingly pointless instruction MOV EDI, EDI. This instruction copies a register to itself and updates no flags; it is completely meaningless. So why is it there?
It’s a hot-patch point.
The MOV EDI, EDI instruction is a two-byte NOP, which is just enough space to patch in a jump instruction so that the function can be updated on the fly. The intention is that the MOV EDI, EDI instruction will be replaced with a two-byte JMP $-5 instruction to redirect control to the five bytes of patch space that come immediately before the start of the function. Five bytes is enough for a full jump instruction, which can send control to the replacement function installed somewhere else in the address space.
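To make the byte layout concrete, here is a rough sketch in Python that models the two-step patch on a plain byte buffer. It is only a model: the names hot_patch and short_jump_target, the buffer, and the addresses are made up for illustration, and the opcode bytes (8B FF for MOV EDI, EDI, 90 for NOP, E9 for the five-byte jump, EB F9 for JMP $-5) are the standard x86 encodings rather than anything quoted above. A real patcher would also have to deal with page protection and write the two entry bytes atomically, which this sketch ignores.

```python
import struct

PATCH_SPACE = 5                        # padding bytes before the entry point
MOV_EDI_EDI = bytes([0x8B, 0xFF])      # the two-byte NOP at the entry point
SHORT_JMP_BACK = bytes([0xEB, 0xF9])   # JMP $-5: rel8 = -5 - 2 = -7 = 0xF9


def short_jump_target(entry_addr: int) -> int:
    """Address reached by the two-byte jump placed at entry_addr."""
    rel8 = SHORT_JMP_BACK[1] - 0x100   # interpret 0xF9 as a signed byte (-7)
    return entry_addr + len(SHORT_JMP_BACK) + rel8


def hot_patch(image: bytearray, entry: int, base: int, target: int) -> None:
    """Redirect the function at offset `entry` of `image` to address `target`.

    `base` is the (hypothetical) address at which image[0] is loaded.
    """
    pad = entry - PATCH_SPACE
    if image[pad:entry] != b"\x90" * PATCH_SPACE:
        raise ValueError("no NOP padding before the entry point")
    if image[entry:entry + 2] != MOV_EDI_EDI:
        raise ValueError("entry point is not MOV EDI, EDI")

    # Step 1: write the five-byte jump into the padding. Nothing executes
    # these bytes yet, so this write does not race with running threads.
    next_ip = base + pad + PATCH_SPACE
    rel32 = (target - next_ip) & 0xFFFFFFFF
    image[pad:entry] = b"\xE9" + struct.pack("<I", rel32)

    # Step 2: swap the two-byte NOP for the two-byte backward jump.
    image[entry:entry + 2] = SHORT_JMP_BACK


# Five NOPs, MOV EDI, EDI, then an assumed PUSH EBP / MOV EBP, ESP prologue.
image = bytearray(b"\x90" * 5 + MOV_EDI_EDI + b"\x55\x8b\xec")
hot_patch(image, entry=5, base=0x1000, target=0x2000)
```

The order of the two steps is the point: the long jump is fully in place before the entry point is switched over to it.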
Although the five bytes of patch space before the start of the function consist of five one-byte NOP instructions, the function entry point uses a single two-byte NOP.
Why not use Detours to hot-patch the function? Then you wouldn’t need any patch space at all.
The problem with Detouring a function during live execution is that you can never be sure that at the moment you are patching in the Detour, another thread isn’t in the middle of executing an instruction that overlaps the first five bytes of the function. (And you have to alter the code generation so that no instruction starting at offsets 1 through 4 of the function is ever the target of a jump.) You could work around this by suspending all the threads while you’re patching, but that still won’t stop somebody from doing a CreateRemoteThread after you thought you had successfully suspended all the threads.
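As a small illustration of the overlap problem, the sketch below lists the instruction boundaries that fall strictly inside the bytes a patch overwrites. The helper name interior_boundaries is hypothetical, and the prologue PUSH EBP / MOV EBP, ESP / SUB ESP, 10h (instruction lengths 1, 2, and 3) is an assumed typical example, not something taken from a particular function.

```python
def interior_boundaries(lengths, patch_len):
    """Offsets of instruction starts that lie strictly inside the patched bytes.

    `lengths` holds the byte lengths of the instructions at the function
    entry, in order. A thread about to execute at any returned offset would
    resume in the middle of the newly written instruction.
    """
    offsets = []
    pos = 0
    for n in lengths:
        pos += n
        if pos >= patch_len:
            break
        offsets.append(pos)
    return offsets


# Assumed prologue: PUSH EBP (1 byte), MOV EBP, ESP (2), SUB ESP, 10h (3).
detour_risk = interior_boundaries([1, 2, 3], patch_len=5)

# With a two-byte NOP at the entry point, a two-byte patch covers exactly
# one instruction, so there is no interior boundary at all.
hotpatch_risk = interior_boundaries([2, 1, 2, 3], patch_len=2)
```

In this model, the five-byte overwrite leaves two offsets at which a thread could be caught, while the two-byte overwrite of the two-byte NOP leaves none.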
Why not just use two NOP instructions at the entry point?
Well, because a NOP instruction consumes one clock cycle and one pipe, so two of them would consume two clock cycles and two pipes. (The instructions will likely be paired, one in each pipe, so the combined execution will take one clock cycle.) On the other hand, the MOV EDI, EDI instruction consumes one clock cycle and one pipe. (In practice, the instruction will occupy one pipe, leaving the other available to execute another instruction in parallel. You might say that the instruction executes in half a cycle.) However you calculate it, the MOV EDI, EDI instruction executes in half the time of two NOP instructions.
On the other hand, the five NOPs inserted before the start of the function are never executed, so it doesn’t matter what you use to pad them. It could’ve been five garbage bytes for all anybody cares.
But much more important than cycle-counting is that the use of a two-byte NOP avoids the Detours problem: If the code had used two single-byte NOP instructions, then there is the risk that you will install your patch just as a thread has finished executing the first single-byte NOP and is about to begin executing the second single-byte NOP, resulting in the thread treating the second half of your JMP $-5 as the start of a new instruction.
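The race can be sketched as a tiny model of what a thread would fetch after the patch lands. The helper stale_fetches and the lists of instruction lengths are illustrative assumptions; the only hard facts used are that JMP $-5 encodes as EB F9 and that a lone F9 byte decodes as the one-byte STC instruction on x86.

```python
SHORT_JMP = bytes([0xEB, 0xF9])  # JMP $-5


def stale_fetches(lengths):
    """Bytes of the patched entry slot that a thread could fetch as an opcode.

    `lengths` holds the byte lengths of the original instructions occupying
    the two-byte entry slot. Every boundary between two of them is a place
    where a thread may be about to resume, and after the patch it would read
    the corresponding byte of the jump as if it began an instruction.
    """
    bad = []
    pos = 0
    for n in lengths[:-1]:       # the end of the slot is not inside it
        pos += n
        bad.append((pos, SHORT_JMP[pos]))
    return bad


two_nops = stale_fetches([1, 1])     # NOP, NOP
mov_edi_edi = stale_fetches([2])     # MOV EDI, EDI
```

With two one-byte NOPs, a thread parked at offset 1 would start executing at the F9 byte and then carry on into the old function body instead of the replacement. With the single two-byte instruction there is no such offset, provided the two bytes are written in one atomic store.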
There’s a lot of patching machinery going on that most people don’t even realize. Maybe at some point, I’ll get around to writing about how the operating system manages patches for software that isn’t installed yet, so that when you do install the software, the patch is already there, thereby closing the vulnerability window between installing the software and downloading the patches.