June 16th, 2021

The ARM processor (Thumb-2), part 13: Trampolines

Raymond Chen

As we noted last time, the relative branch instructions have a limited reach. In particular, the bl instruction, which is used for intra-module direct calls, has a reach of around ±16MB. But what happens if the call target is too far away? Or if the function is a naïvely-imported function?

In the case of a faraway call target, the linker injects a trampoline, called a veneer in the ARM documentation.

    bl      toofar_trampoline
...

toofar_trampoline:
    mov     r12, #lo(|toofar|+1)    ; low 16 bits of the target; +1 marks it as Thumb code
    movt    r12, #hi(|toofar|+1)    ; high 16 bits of the target
    bx      r12             ; jump to r12
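
To make the lo() and hi() notation concrete, here is a minimal Python sketch of how the two 16-bit immediates combine into the 32-bit value that ends up in r12. The function names and the sample address are hypothetical, chosen only for illustration; this models the arithmetic, not the linker's actual code.

```python
# Illustrative model only: the names and the sample address are made up.

def lo(value):
    """Low 16 bits of a 32-bit value (the mov immediate)."""
    return value & 0xFFFF

def hi(value):
    """High 16 bits of a 32-bit value (the movt immediate)."""
    return (value >> 16) & 0xFFFF

def r12_after_trampoline(toofar):
    """Model r12 after the mov/movt pair for a Thumb function at `toofar`."""
    value = toofar + 1                        # +1 sets bit 0, marking a Thumb target for bx
    r12 = lo(value)                           # mov: writes the low half, clears the high half
    r12 = (hi(value) << 16) | (r12 & 0xFFFF)  # movt: replaces only the high half
    return r12

# Hypothetical function address (even, as Thumb code addresses are).
print(hex(r12_after_trampoline(0x12345678)))  # 0x12345679
```

The pair always costs two instructions, whatever the address, because each instruction can carry only 16 bits of immediate.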

The r12 register, known as the intra-procedure-call scratch register (ip), is one that the linker is permitted to use when generating trampolines and function prologues. From the compiler’s point of view, it is super-volatile: Any branch instruction could damage the r12 register.

In practice, the compiler doesn’t use r12 for anything at all.

In the case of a naïvely-imported function, the actual call target is stored in the import address table, and the linker must generate a trampoline that jumps to the imported function:

    bl      imported_trampoline
...

imported_trampoline:
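    mov     r12, #lo(iat_imported)  ; low 16 bits of the IAT entry's address
    movt    r12, #hi(iat_imported)  ; high 16 bits of the IAT entry's address
    ldr     pc, [r12]               ; load the target from the IAT and jump to it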

Here, we take advantage of the overly-uniform pc register: Loading a value into it acts as a jump instruction. It saves an instruction, because we don’t have to load the jump target into a register and then BX to it.
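
The extra level of indirection can be modeled with a short Python sketch. The addresses, the dictionary standing in for memory, and the function name are all invented for the example; this illustrates the data flow only, not real loader or linker behavior.

```python
# Illustrative model only: the addresses below are invented for the example.

IAT_IMPORTED = 0x00402000          # hypothetical address of the IAT slot

def run_imported_trampoline(memory, iat_slot):
    """Model the three trampoline instructions; return (r12, value loaded into pc)."""
    r12 = iat_slot & 0xFFFF                    # mov  r12, #lo(iat_imported)
    r12 |= ((iat_slot >> 16) & 0xFFFF) << 16   # movt r12, #hi(iat_imported)
    loaded = memory[r12]                       # ldr  pc, [r12]: read the slot, jump there
    return r12, loaded

# The slot's contents are filled in at load time; this value is made up.
memory = {IAT_IMPORTED: 0x77101235}
r12, target = run_imported_trampoline(memory, IAT_IMPORTED)
print(hex(r12), hex(target))   # 0x402000 0x77101235
```

Note that the immediates encode the address of the slot, not the target itself, so the destination is determined by whatever value is in the slot when the trampoline runs.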

Next time, we’ll look at a few miscellaneous instructions.

Bonus chatter: I don’t know why the linker prefers to use a MOV + MOVT instruction pair instead of a single pc-relative LDR. My guess is that it avoids memory latency.

Bonus chatter 2: You might think that trampolines can never be deployed for jumps within a function. However, that’s not true: Code motion due to profile-guided optimization can cause rarely-executed code blocks to be relocated to faraway locations in the module. The most likely case is that a relative short jump becomes long and has to be converted to a jump-to-a-jump. In rare cases, the destination could end up more than 16MB away, in which case you would need a full trampoline.
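
The "too far" decision can be sketched in Python as follows. The ±16MB figure comes from the article; the exact bounds used here (an even offset from -2^24 to 2^24 - 2, measured from the branch address plus 4) are an assumption based on the Thumb-2 BL encoding and are worth checking against the ARM architecture manual. The function names are hypothetical.

```python
# Illustrative model only: bounds are assumed from the Thumb-2 BL encoding.

BL_MIN_OFFSET = -(1 << 24)       # -16MB
BL_MAX_OFFSET = (1 << 24) - 2    # just under +16MB; offsets are always even

def bl_reaches(branch_addr, target_addr):
    """Return True if a bl at branch_addr can reach target_addr directly."""
    pc = branch_addr + 4                   # in Thumb state, pc reads as instruction address + 4
    offset = (target_addr & ~1) - pc       # ignore the Thumb bit, if present
    return BL_MIN_OFFSET <= offset <= BL_MAX_OFFSET

def needs_trampoline(branch_addr, target_addr):
    """Return True if the call must be routed through a trampoline."""
    return not bl_reaches(branch_addr, target_addr)

# Hypothetical addresses: a nearby target, then one 32MB away.
print(needs_trampoline(0x00401000, 0x00402000))  # False
print(needs_trampoline(0x00401000, 0x02401000))  # True
```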

10 comments

紅樓鍮 June 16, 2021

At which addresses do the trampolines live? As a trampoline has to live close to the branch insn, do compilers always reserve spaces around functions for potential trampoline injection?
- Raymond Chen Author June 16, 2021
  
  As long as no object file is larger than 16MB, the linker can insert the trampolines between object files. And if function-based linking is active (which it always is in practice), then the linker can insert code between functions.
  - Florian Philipp June 17, 2021
    
    Wait, then isn’t it inconsistent to assume that any branch may change R12? If the assumption is that trampolines can be reached from any point within a function, then no branch to a target within the function should ever require a trampoline, right? In that case, couldn’t the compiler treat R12 as a call-clobbered register?
  - Jonathan Duncan June 17, 2021
    
    The AAPCS states that r12 may be altered “at any branch instruction that is exposed to a relocation that supports inter-working or long branches.”
    
    So as long as the compiler knows a given branch is non-interworking (so all of them in this case, as we’re limited to Thumb) and non-long, then it is safe to use r12 for intermediate values.
    
    Or in the case of huge >16MB functions, the compiler can still use long branches and r12 safely as long as the branch target is not exposed to a relocation.
    
    I imagine that in that case it’d be the compiler’s responsibility to add the trampoline within the function rather than the linker, and I suspect the compiler would choose to do so via a chaining short-branch at the trampoline rather than using r12 anyway.
    
  - Raymond Chen Author June 17, 2021
    
    The compiler can’t really assume that a branch is safe from rewrite because code motion due to PGO can rewrite any short branch to a long branch and trigger a trampoline.
  - Florian Philipp June 17, 2021
    
    I guess MGetz’s note that LDR is discouraged for R12 is the actual answer. Since we are not starved for registers, dealing with the complexities around R12 is probably just not worth it.
  - 紅樓鍮 June 16, 2021
    
    OK, I think I somehow misread the entire article thinking it’s the dynamic linker that injects trampolines…
MGetz June 16, 2021 · Edited

Short answer to your ponder: Because ARM explicitly say to avoid using r12 as a target of an LDR instruction. Beyond that LDR would seem to be the better choice as it doesn’t require an immediate whereas MOVT does and thus could load the jump directly from the relocation table without requiring a fix up by the loader.
- Raymond Chen Author June 17, 2021
  
  That caution is in the “doubleword load on classic ARM” section, so it means that “ldrd r12, [pc, #imm]” is strongly discouraged. The caution doesn’t apply here since not only are we Thumb-2 (not classic ARM), we aren’t even a doubleword load. My guess is that the recommendation against “ldrd r12” in classic ARM is that the implicit second destination is “r13” which is “sp”.
  - MGetz June 17, 2021 · Edited
    
    Yeah… I was reading that. I can’t find any specific reason other than that. Googling around actually shows examples where compilers are actually doing an ldr r12, [PC offset]. The ARM docs seem mostly to be indicating “This is reserved for the linker… just ignore it” more than anything else. They do have a very prescribed relocation mechanism though, but I couldn’t find out which variant MS is using; that does have conventions on how you talk to r12 as best I can tell.
    
    The only other thing I can think of from the MS linker/compiler perspective is the hooking mechanism like they do on x86, but normally that’s in a function prolog. mov + movt is 8 bytes which would be more than enough to clobber with a relocation if necessary at runtime to any arbitrary location.
    