Windows Control Flow Guard (CFG) is a defense-in-depth feature that validates indirect call targets. The idea is that each module that is enabled for CFG provides a bitmap that describes which addresses in the module are intended to be targets of indirect calls. When CFG is enabled in a process, indirect function calls are checked against this bitmap, and if the address is deemed invalid, the process terminates itself, and the Watson service records the details for future investigation.
If you are studying a crash in the control flow guard validator,¹ you may want to pick out the failed address so you can better understand what went wrong and use it to guide the next step of your debugging. (Was it a bad address? Was the DLL unloaded? Was it a garbage value due to use-after-free?)
In general, the control flow guard validator takes a function address in some register, performs shifting and masking operations using that register as a source (to calculate the bit position in the call target bitmap), and then tests a bit. The source register is left unchanged so that the caller, on success, can use the validated address as a jump target.
Let’s practice. Here’s one of the control flow guard validator functions for x86-64, which Windows often calls x64. Try to spot the register that holds the address being validated.
ntdll!LdrpValidateUserCallTarget:
mov rdx,qword ptr [ntdll!................]
mov rax,rcx
shr rax,9 ; shift
mov rdx,qword ptr [rdx+rax*8] ; crash here
mov rax,rcx
shr rax,3
test cl,0Fh
jne @1
bt rdx,rax
jae @2
ret
@1: btr rax,0
bt rdx,rax
jae @3
@2: or rax,1
bt rdx,rax
jae @3
ret
@3: mov rax,rcx
xor r10d,r10d
jmp ntdll!LdrpHandleInvalidUserCallTarget
We see that the value in rcx gets moved into rax, and then rax gets shifted. So the address being validated is in rcx. The marked instruction is the only one that accesses memory, so if there’s a crash, it’ll happen there. The rest of the function is just bit twiddling.
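To make the arithmetic concrete, here is a minimal Python sketch of what the listing above does with rcx before the marked load. The function name, the bitmap base, and the sample pointer are all made up for illustration; only the shifts and the scale factor come from the listing.

```python
# A sketch of the x64 listing's address arithmetic.
# bitmap_slot_x64 and the sample values are illustrative, not Windows APIs.

def bitmap_slot_x64(target, bitmap_base):
    """Return (address of the 8-byte bitmap word, bit offset within it)."""
    word_index = target >> 9                      # shr rax,9
    word_address = bitmap_base + word_index * 8   # mov rdx,[rdx+rax*8]
    bit_offset = (target >> 3) & 63               # shr rax,3; bt wraps the offset modulo 64
    return word_address, bit_offset

# Hypothetical values: a made-up bitmap base and a garbage "function pointer".
word_address, bit_offset = bitmap_slot_x64(0xDDDDDDDDDDDDDDDD, 0x00007DF500000000)
print(hex(word_address), bit_offset)
```

If the word address that comes out of this calculation is not backed by memory, the marked load is where the access violation occurs.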
Let’s do the same exercise for x86-32, which Windows often just calls x86.
ntdll!LdrpValidateUserCallTarget:
mov edx,dword ptr [ntdll!........]
mov eax,ecx
shr eax,8 ; shift
mov edx,dword ptr [edx+eax*4] ; crash here
mov eax,ecx
shr eax,3
test cl,0Fh
jne @1
bt edx,eax
jae ...
ret
@1: btr eax,0
bt edx,eax
jae ...
or eax,1
bt edx,eax
jae ...
ret
This time, it’s the value in ecx that gets moved into eax, and then eax gets shifted. The address being validated is therefore in ecx. Again, the marked instruction is the only one that accesses memory.
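If you want to double-check your guess, you can see whether the candidate register reproduces the faulting address when run through the same shift and scale as the marked instruction. The following Python sketch assumes you have copied the register values out of the crash dump by hand; the helper name and all of the sample values are hypothetical.

```python
# Sanity check: does a candidate register explain the faulting address?
# The defaults mirror the x86-32 listing: shr eax,8 and [edx+eax*4].

def explains_fault(candidate, base, fault_address, shift=8, scale=4):
    """True if base + (candidate >> shift) * scale is the faulting address."""
    computed = (base + (candidate >> shift) * scale) & 0xFFFFFFFF  # 32-bit wraparound
    return computed == fault_address

# Hypothetical register values at the time of the crash.
registers = {"ecx": 0xDDDDDDDD, "eax": 0x00DDDDDD}
bitmap_base = 0x00B30000      # value of edx before the load, made up for this example
fault_address = 0x042A7774    # address reported in the access violation, made up

for name, value in registers.items():
    print(name, explains_fault(value, bitmap_base, fault_address))
```

A register that passes this check is a strong candidate for the address being validated. One that fails is probably a temporary.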
One more: This time, it’s 32-bit ARM, which Windows calls simply arm.
ntdll!LdrpValidateUserCallTarget:
mov r3,#0x....
movt r3,#0x....
ldr r3,[r3]
lsrs r2,r0,#6 ; shift
ubfx r1,r0,#3,#3
ldrb r2,[r3,r2] ; crash here
mov r3,r0
and r0,r0,#0xF
subs r0,r0,#1
bne ...
There are two memory accesses this time. The first is loading from a fixed address (built into r3 in two instructions), so it matches the first instruction of the x86-32 and x86-64 versions; it’s just that x86 can load from many fixed addresses in just one instruction.
The second group of instructions is the interesting one. It shifts the value in r0 and puts the result in r2. It also uses r0 as the source for a bit extraction operation that puts the result in r1, and then it accesses some memory. So it looks like r0 is the address, since it’s the source of the shift instruction.
Mind you, this code modifies r0 later on, so the value in r0 doesn’t hold the address through the entire function. It got copied into r3 for safekeeping, so if you break in later in the function, you’ll want to look to r3 for the address. But if you crash on the memory access, the address is in r0.
Our last example is AArch64, which Windows usually calls arm64.
ntdll!LdrpValidateUserCallTarget:
adrp xip0,ntdll!....
ldr xip0,[xip0,#0x598]
lsr xip1,x15,#6 ; shift
tst x15,#0xF
ldrb wip1,[xip0,xip1] ; crash here
ubfx xip0,x15,#3,#3
bne @2
lsr xip1,xip1,xip0
tbz wip1,#0,@3
@1: ret
@2: and xip0,xip0,#-2
lsr xip1,xip1,xip0
tbz wip1,#0,@4
@3: tbnz wip1,#1,@1
@4: mov xip0,#0
b @5
@5: b ntdll!LdrpHandleInvalidUserCallTarget
Again, we start by loading an address from memory, and then we shift a register, this time the x15 register. There is a bit test instruction whose result is used later, and then we perform a memory access (which could crash). From inspection, we therefore see that the address being validated is in x15.
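The ARM listings shift by 6 and load a byte, whereas the x86 listings shift by 8 or 9 and load a dword or qword, but they all land on the same bit. Here is a small Python sketch that checks this, assuming the usual little-endian bit numbering within a loaded word. The function names are mine, not anything from ntdll.

```python
# Compare the three ways the listings locate the bit for a given address.

def slot_qword(target):
    """x86-64 style: shr 9 for the qword index, bit offset modulo 64."""
    return target >> 9, (target >> 3) & 63

def slot_dword(target):
    """x86-32 style: shr 8 for the dword index, bit offset modulo 32."""
    return target >> 8, (target >> 3) & 31

def slot_byte(target):
    """ARM style: lsr #6 for the byte index, ubfx #3,#3 for the bit in the byte."""
    return target >> 6, (target >> 3) & 7

def absolute_bit(index, bit, unit_bits):
    """Position of the bit counted from the start of the bitmap."""
    return index * unit_bits + bit

target = 0x12345678  # arbitrary sample address
positions = {
    absolute_bit(*slot_qword(target), 64),
    absolute_bit(*slot_dword(target), 32),
    absolute_bit(*slot_byte(target), 8),
}
print(positions)  # a single element means all three agree
```

This is why you can carry the same mental model from one architecture to the next: only the size of the load changes, not the bit being consulted.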
The point of this exercise is not to memorize the registers that each architecture uses for control flow guard,³ but rather to take a little information about the design of control flow guard (checking a bit in a bitmap, using the address passed in a register to calculate the index),² and to use that to figure out on the fly which register you need to look at based on the code surrounding the crashing access.
¹ Usually, these crashes occur because the address that got passed in is so invalid that there is no memory at the location where the bit in the validation bitmap is supposed to be, resulting in an access violation.
² You don’t even have to know the precise meaning of the bits in the bitmap. All you have to remember is that the address is used to determine the bit to check.
³ I sure don’t have them memorized. Each time it happens, I just re-derive it from the instructions around the crash.