As we noted last time, the relative branch instructions have a limited reach. In particular, the bl instruction, which is used for intra-module direct calls, has a reach of around ±16MB. But what happens if the call target is too far away? Or if the function is a naïvely-imported function?
In the case of a faraway call target, the linker injects a trampoline, called a veneer in the ARM documentation.
    bl      toofar_trampoline
    ...
toofar_trampoline:
    mov     r12, #lo(|toofar|+1)
    movt    r12, #hi(|toofar|+1)
    bx      r12             ; jump to r12
The r12 register, known as the intra-procedure-call scratch register (ip), is a register that the linker is permitted to use for the purpose of generating trampolines and function prologues. From the compiler’s point of view, it is super-volatile: Any branch instruction could damage the r12 register.
In practice, the compiler doesn’t use r12 for anything at all.
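To make the far-call case concrete, here is a minimal Python sketch of the two calculations involved: deciding whether a bl can reach its target, and splitting the target address into the two 16-bit halves that the MOV and MOVT instructions carry. The helper names are hypothetical, and the exact offset bounds assume the Thumb-2 bl encoding (a signed, halfword-aligned offset measured from the instruction address plus 4). The `| 1` mirrors the `+1` in the listing above, which sets the Thumb bit so that bx stays in Thumb mode.

```python
# Hypothetical helpers for illustration only; not the linker's actual code.

BL_MIN_OFFSET = -(1 << 24)       # -16 MB
BL_MAX_OFFSET = (1 << 24) - 2    # +16 MB, less one halfword


def bl_can_reach(branch_addr, target_addr):
    """Return True if a Thumb-2 bl at branch_addr can reach target_addr directly."""
    pc = branch_addr + 4         # in Thumb state, pc reads as instruction address + 4
    offset = target_addr - pc
    return offset % 2 == 0 and BL_MIN_OFFSET <= offset <= BL_MAX_OFFSET


def mov_movt_halves(target_addr):
    """Split (target | 1) into the immediates for mov (low half) and movt (high half)."""
    value = (target_addr | 1) & 0xFFFFFFFF   # set bit 0: the Thumb bit
    return value & 0xFFFF, value >> 16


def needs_trampoline(branch_addr, target_addr):
    """A trampoline is needed exactly when the direct bl cannot reach."""
    return not bl_can_reach(branch_addr, target_addr)
```

For example, `mov_movt_halves(0x12345678)` yields the pair `(0x5679, 0x1234)`, which would appear as the immediates of the mov and movt in the trampoline.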
In the case of a naïvely-imported function, the actual call target is stored in the import address table, and the linker must generate a trampoline that jumps to the imported function:
    bl      imported_trampoline
    ...
imported_trampoline:
    mov     r12, #lo(iat_imported)
    movt    r12, #hi(iat_imported)
    ldr     pc, [r12]
Here, we take advantage of the overly-uniform pc register: Loading a value into it acts as a jump instruction. It saves an instruction, because we don’t have to load the jump target into a register and then BX to it.
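The import trampoline can be modeled in a few lines of Python. This is only a sketch: memory is a hypothetical dictionary mapping word addresses to 32-bit values, and the function name and addresses are made up for illustration. It shows how the mov and movt pair builds the address of the import address table slot, and how the final load into pc is what performs the jump.

```python
def run_import_trampoline(memory, iat_slot_addr):
    """Simulate mov/movt/ldr pc and return the address execution would jump to.

    memory is a hypothetical dict of {word address: 32-bit value}.
    """
    lo = iat_slot_addr & 0xFFFF
    hi = (iat_slot_addr >> 16) & 0xFFFF

    r12 = lo                              # mov  r12, #lo(iat_imported): upper half is zeroed
    r12 = (r12 & 0xFFFF) | (hi << 16)     # movt r12, #hi(iat_imported): replaces only the top half
    pc = memory[r12]                      # ldr  pc, [r12]: the load doubles as the jump
    return pc


# The loader is assumed to have stored the imported function's address in the slot.
iat = {0x10002000: 0x77001235}
destination = run_import_trampoline(iat, 0x10002000)
```

Note that r12 ends up holding the address of the table slot rather than the function itself; the function's address is only known at load time, which is why it has to be fetched from memory.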
Next time, we’ll look at a few miscellaneous instructions.
Bonus chatter: I don’t know why the linker prefers to use a MOV + MOVT instruction pair instead of a single pc-relative LDR. My guess is that it avoids memory latency.
Bonus chatter 2: You might think that trampolines can never be deployed for jumps within a function. However, that’s not true: Code motion due to profile-guided optimization can cause rarely-executed code blocks to be relocated to faraway locations in the module. The most likely case is that a relative short jump becomes long and has to be converted to a jump-to-a-jump. In rare cases, the destination could end up more than 16MB away, in which case you would need a full trampoline.
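The three outcomes described in Bonus chatter 2 can be sketched as a simple classification on the branch distance. The function name is hypothetical, and the reach values are assumptions: the short reach defaults to about ±1MB (the reach of the 32-bit conditional branch encoding), the long reach to about ±16MB, and the sketch ignores the small asymmetry at the positive end of each range.

```python
MB = 1 << 20


def plan_branch(offset, short_reach=1 * MB, long_reach=16 * MB):
    """Classify how a branch with the given byte offset could be emitted.

    short_reach and long_reach are assumed, simplified symmetric limits.
    """
    if -short_reach <= offset < short_reach:
        return "direct"            # the original branch still reaches
    if -long_reach <= offset < long_reach:
        return "jump-to-a-jump"    # branch to a nearby long-reach unconditional branch
    return "trampoline"            # load the full address into r12 and bx
```

With these defaults, a block moved 4MB away is handled by a jump-to-a-jump, while one moved 20MB away falls into the rare full-trampoline case.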
At which addresses do the trampolines live? As a trampoline has to live close to the branch insn, do compilers always reserve spaces around functions for potential trampoline injection?
As long as no object file is larger than 16MB, the linker can insert the trampolines between object files. And if function-based linking is active (which it always is in practice), then the linker can insert code between functions.
Wait, then isn’t it inconsistent to assume that any branch may change R12? If the assumption is that trampolines can be reached from any point within a function, then no branch to a target within the function should ever require a trampoline, right? In that case, couldn’t the compiler treat R12 as a call-clobbered register?
The AAPCS states that r12 may be altered "at any branch instruction that is exposed to a relocation that supports inter-working or long branches."
So as long as the compiler knows a given branch is non-interworking (so all of them in this case, as we're limited to Thumb) and non-long, then it is safe to use r12 for intermediate values.
Or in case of huge >16MB functions, the compiler can still use long branches and r12 safely as long as the branch target is not exposed to a relocation.
I imagine that in that case it'd be the compiler's responsibility to add the...
The compiler can’t really assume that a branch is safe from rewrite because code motion due to PGO can rewrite any short branch to a long branch and trigger a trampoline.
I guess MGetz’s note that LDR is discouraged for R12 is the actual answer. Since we are not starved for registers, dealing with the complexities around R12 is probably just not worth it.
OK, I think I somehow misread the entire article thinking it’s the dynamic linker that injects trampolines…
Short answer to your ponder: Because ARM explicitly say to avoid using r12 as a target of an LDR instruction. Beyond that LDR would seem to be the better choice as it doesn’t require an immediate whereas MOVT does and thus could load the jump directly from the relocation table without requiring a fix up by the loader.
That caution is in the “doubleword load on classic ARM” section, so it means that “ldrd r12, [pc, #imm]” is strongly discouraged. The caution doesn’t apply here since not only are we Thumb-2 (not classic ARM), we aren’t even a doubleword load. My guess is that the reason for the recommendation against “ldrd r12” in classic ARM is that the implicit second destination is “r13” which is “sp”.
Yeah... I was reading that. I can't find any specific reason other than that. Googling around actually shows examples where compilers are actually doing an . The ARM docs seem mostly to be indicating "This is reserved for the linker... just ignore it" more than anything else. They do have a very prescribed relocation mechanism though, but I couldn't find out which variant MS is using; that does have conventions on how you talk to as best I can tell.
The only other thing I can think of from the MS linker/compiler perspective is the hooking mechanism like they do...