The ARM processor (Thumb-2), part 13: Trampolines

Raymond Chen

As we noted last time, the relative branch instructions have a limited reach. In particular, the bl instruction, which is used for intra-module direct calls, has a reach of around ±16MB. But what happens if the call target is too far away? Or if the function is a naïvely-imported function?

In the case of a faraway call target, the linker injects a trampoline, called a veneer in the ARM documentation.

    bl      toofar_trampoline

    mov     r12, #lo(|toofar|+1)
    movt    r12, #hi(|toofar|+1)
    bx      r12             ; jump to r12

The r12 register, known as the intra-procedure-call scratch register, is a register that the linker is permitted to use for the purpose of generating trampolines and function prologues. From the compiler’s point of view, it is super-volatile: Any branch instruction could damage the r12 register.

In practice, the compiler doesn’t use r12 for anything at all.
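To make the linker's decision concrete, here is a rough Python sketch of the two calculations involved: whether a bl can reach its target directly, and how a 32-bit code address splits into the two 16-bit immediates for the mov and movt pair. The function names and sample addresses are hypothetical. The exact range limits are taken from the Thumb-2 bl encoding (a signed, even, 25-bit displacement measured from the instruction's address plus 4), not from anything the linker documents.

```python
# Hypothetical helper names. The limits follow the Thumb-2 bl encoding.
BL_MIN = -(1 << 24)       # -16,777,216 bytes
BL_MAX = (1 << 24) - 2    # +16,777,214 bytes (displacements are always even)


def needs_trampoline(call_site, target):
    """Return True if a bl at call_site cannot reach target directly."""
    # In Thumb state, pc reads as the address of the instruction plus 4.
    displacement = target - (call_site + 4)
    return not (BL_MIN <= displacement <= BL_MAX)


def mov_movt_immediates(target):
    """Split a code address into the (lo, hi) immediates for mov/movt."""
    # The +1 in the listing sets bit 0 so that bx stays in Thumb state.
    value = (target | 1) & 0xFFFFFFFF
    return value & 0xFFFF, value >> 16


lo, hi = mov_movt_immediates(0x01234568)
print(hex(lo), hex(hi))   # 0x4569 0x123
```

A real linker also has to find a spot for the trampoline that is itself within reach of the call site, which this sketch ignores.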

In the case of a naïvely-imported function, the actual call target is stored in the import address table, and the linker must generate a trampoline that jumps to the imported function:

    bl      imported_trampoline

    mov     r12, #lo(iat_imported)
    movt    r12, #hi(iat_imported)
    ldr     pc, [r12]

Here, we take advantage of the overly-uniform pc register: Loading a value into it acts as a jump instruction. It saves an instruction, because we don’t have to load the jump target into a register and then BX to it.
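The indirection can be modeled with a toy simulation. This is only a sketch: memory is a Python dictionary, the slot address and function names are made up for illustration, and it assumes the loader stores the target with bit 0 set to indicate Thumb code. The point is that the trampoline's instructions never change; only the value in the import address table slot does.

```python
# Toy model: memory maps an address to the 32-bit value stored there.
IAT_IMPORTED = 0x00402000   # hypothetical address of the IAT slot


def bind_import(memory, target):
    """Model the loader writing the resolved target into the IAT slot."""
    memory[IAT_IMPORTED] = target | 1   # assumed: bit 0 marks Thumb code


def imported_trampoline(memory):
    """Return the address the trampoline transfers control to."""
    r12 = IAT_IMPORTED   # mov r12, #lo(iat_imported) / movt r12, #hi(iat_imported)
    pc = memory[r12]     # ldr pc, [r12]
    return pc


memory = {}
bind_import(memory, 0x70001000)
print(hex(imported_trampoline(memory)))   # 0x70001001
```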

Next time, we’ll look at a few miscellaneous instructions.

Bonus chatter: I don’t know why the linker prefers to use a MOV + MOVT instruction pair instead of a single pc-relative LDR. My guess is that it avoids memory latency.

Bonus chatter 2: You might think that trampolines can never be deployed for jumps within a function. However, that’s not true: Code motion due to profile-guided optimization can cause rarely-executed code blocks to be relocated to faraway locations in the module. The most likely case is that a relative short jump becomes long and has to be converted to a jump-to-a-jump. In rare cases, the destination could end up more than 16MB away, in which case you would need a full trampoline.
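The three outcomes can be summarized in a small Python sketch. The reach values are assumptions drawn from the 32-bit Thumb-2 branch encodings (roughly ±1MB for a conditional branch and ±16MB for an unconditional one), rounded to powers of two, and the function name is hypothetical. It is not a description of what the compiler or linker actually implements.

```python
COND_REACH = 1 << 20      # 32-bit conditional branch: about ±1MB
UNCOND_REACH = 1 << 24    # 32-bit unconditional branch: about ±16MB


def plan_branch(displacement, conditional):
    """Classify how a branch with this displacement could be encoded."""
    if conditional and -COND_REACH <= displacement < COND_REACH:
        return "direct"
    if -UNCOND_REACH <= displacement < UNCOND_REACH:
        # A conditional branch that is out of reach hops to a nearby
        # unconditional branch, which covers the rest of the distance.
        return "jump-to-a-jump" if conditional else "direct"
    return "trampoline"


print(plan_branch(2 << 20, conditional=True))    # jump-to-a-jump
print(plan_branch(32 << 20, conditional=False))  # trampoline
```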



  • MGetz 0

    Short answer to your ponder: Because ARM explicitly say to avoid using r12 as a target of an LDR instruction. Beyond that LDR would seem to be the better choice as it doesn’t require an immediate whereas MOVT does and thus could load the jump directly from the relocation table without requiring a fix up by the loader.

    • Raymond Chen (Microsoft employee) 0

      That caution is in the “doubleword load on classic ARM” section, so it means that “ldrd r12, [pc, #imm]” is strongly discouraged. The caution doesn’t apply here since not only are we Thumb-2 (not classic ARM), we aren’t even a doubleword load. My guess is that the recommendation against “ldrd r12” in classic ARM is that the implicit second destination is “r13” which is “sp”.

      • MGetz 0

        Yeah… I was reading that. I can’t find any specific reason other than that. Googling around actually shows examples where compilers are actually doing an ldr r12, [PC offset]. The ARM docs seem mostly to be indicating “This is reserved for the linker… just ignore it” more than anything else. They do have a very prescribed relocation mechanism though, but I couldn’t find out which variant MS is using; that does have conventions on how you talk to r12 as best I can tell.

        The only other thing I can think of from the MS linker/compiler perspective is the hooking mechanism like they do on x86, but normally that’s in a function prolog. mov + movt is 8 bytes which would be more than enough to clobber with a relocation if necessary at runtime to any arbitrary location.

  • 紅樓鍮 0

    At which addresses do the trampolines live? As a trampoline has to live close to the branch insn, do compilers always reserve spaces around functions for potential trampoline injection?

    • Raymond Chen (Microsoft employee) 0

      As long as no object file is larger than 16MB, the linker can insert the trampolines between object files. And if function-based linking is active (which it always is in practice), then the linker can insert code between functions.

      • 紅樓鍮 0

        OK, I think I somehow misread the entire article thinking it’s the dynamic linker that injects trampolines…

      • Florian Philipp 0

        Wait, then isn’t it inconsistent to assume that any branch may change R12? If the assumption is that trampolines can be reached from any point within a function, then no branch to a target within the function should ever require a trampoline, right? In that case, couldn’t the compiler treat R12 as a call-clobbered register?

        • Jonathan Duncan 0

          The AAPCS states that r12 may be altered “at any branch instruction that is exposed to a relocation that supports inter-working or long branches.”

          So as long as the compiler knows a given branch is non-interworking (so all of them in this case, as we’re limited to Thumb) and non-long, then it is safe to use r12 for intermediate values.

          Or in case of huge >16MB functions, the compiler can still use long branches and r12 safely as long as the branch target is not exposed to a relocation.

          I imagine that in that case it’d be the compiler’s responsibility to add the trampoline within the function rather than the linker, and I suspect the compiler would choose to do so via a chaining short-branch at the trampoline rather than using r12 anyway.

          • Florian Philipp 0

            I guess MGetz’s note that LDR is discouraged for R12 is the actual answer. Since we are not starved for registers, dealing with the complexities around R12 is probably just not worth it.

          • Raymond Chen (Microsoft employee) 0

            The compiler can’t really assume that a branch is safe from rewrite because code motion due to PGO can rewrite any short branch to a long branch and trigger a trampoline.
