Adventures in application compatibility: The cost of forgetting to specify a calling convention


We saw last time that the Windows header files sometimes look at the world through __stdcall-colored glasses, and that causes problems when a header file fails to specify an explicit calling convention.

The developers of one particular component made the mistake of omitting an explicit calling convention for one of their callback function pointer types, but it didn’t cause any immediate problems. Consumers who compiled with __cdecl as the default calling convention passed a __cdecl function pointer, but things happened to work out okay.

However, people reported that after installing a servicing update, some programs that used that component started crashing. The reason is that the servicing update altered the code generation, and now the misplaced stack pointer started causing problems.

What we have here is a confluence of multiple mistakes. The feature team authored their header file incorrectly, failing to specify an explicit calling convention. This led to customers consuming the header file incorrectly, and passing callback function pointers that used the __cdecl calling convention instead of __stdcall.
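To make the mistake concrete, here is a minimal sketch of what the two versions of such a declaration might look like. Only the name WIDGETFILTERPROC comes from this article; the WIDGETAPI macro, the WIDGETFILTERPROC_IMPLICIT typedef, and the KeepEvenSums and FilterWidget functions are hypothetical names invented for illustration, and the real component's header surely differs.

```c
/* Hypothetical header sketch. WIDGETAPI and every function name here are
   illustrative assumptions, not the real component's declarations. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define WIDGETAPI __stdcall   /* only meaningful on 32-bit x86 */
#else
#define WIDGETAPI             /* other targets have one convention */
#endif

/* Problematic: the convention is whatever the consumer's compiler
   defaults to (on MSVC, /Gd means __cdecl and /Gz means __stdcall). */
typedef int (*WIDGETFILTERPROC_IMPLICIT)(int a, int b);

/* Explicit: every consumer agrees with the component, regardless of
   the consumer's compiler switches. */
typedef int (WIDGETAPI *WIDGETFILTERPROC)(int a, int b);

/* A consumer callback that matches the explicit declaration. */
int WIDGETAPI KeepEvenSums(int a, int b)
{
    return ((a + b) % 2) == 0;
}

/* Stand-in for the component code that invokes the callback. */
int FilterWidget(WIDGETFILTERPROC filter, int a, int b)
{
    return filter(a, b);
}
```

With the explicit form, a consumer who builds with a different default calling convention and passes a mismatched function gets a compile-time diagnostic instead of a latent stack imbalance.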

Now the application compatibility adventure begins.

In addition to fixing the header files to be explicit about the calling convention (to prevent the problem from spreading), the component has to be modified so that it can be used with either calling convention.

__declspec(naked) __declspec(noinline)
WrapCallbackWithESPFix(WIDGETFILTERPROC filter, int a, int b)
{
    __asm {
        mov     edi, edi                ; hotpatch stub
        push    ebp                     ; establish stack frame
        mov     ebp, esp
        push    b
        push    a
        mov     ecx, filter             ; call target
        call    [__guard_check_icall_fptr] ; validate the call target
        call    filter                  ; make the call

        ; restore esp if the callee mismanaged it due to wrong calling convention
        mov     esp, ebp
        pop     ebp
        ret     12
    }
}
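
The stack arithmetic behind the wrapper can be checked with a small model. This is only a sketch of the bookkeeping, not code from the component: it assumes 4-byte arguments and return addresses as on 32-bit x86, and the names esp_after_call and esp_after_wrapper are made up for this illustration.

```c
#include <stdint.h>

enum convention { CONV_CDECL, CONV_STDCALL };

/* Model of a caller that assumes __stdcall: it pushes the arguments,
   calls the callback, and does no cleanup of its own afterward. */
uint32_t esp_after_call(uint32_t esp, enum convention actual,
                        uint32_t arg_bytes)
{
    esp -= arg_bytes;              /* push b, push a */
    esp -= 4;                      /* call pushes the return address */
    esp += 4;                      /* ret pops the return address */
    if (actual == CONV_STDCALL) {
        esp += arg_bytes;          /* a __stdcall callee pops its arguments */
    }                              /* a __cdecl callee leaves them behind */
    return esp;
}

/* Model of the wrapper: esp is remembered in ebp before the arguments
   are pushed and restored afterward, so the callee's behavior no
   longer matters. */
uint32_t esp_after_wrapper(uint32_t esp_at_entry, enum convention actual)
{
    uint32_t esp = esp_at_entry;
    esp -= 4;                      /* push ebp */
    uint32_t ebp = esp;            /* mov ebp, esp */
    esp = esp_after_call(esp, actual, 8);
    esp = ebp;                     /* mov esp, ebp */
    esp += 4;                      /* pop ebp */
    esp += 4 + 12;                 /* ret 12: return address + 3 arguments */
    return esp;
}
```

In this model, a bare call to a __cdecl callback leaves esp 8 bytes lower than the caller expects, whereas the wrapper ends at the same esp for either convention.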

It so happens that this workaround didn’t hang around indefinitely. The component in question has a very small audience, and in particular, only one of the clients was encountering this problem. That customer made a fix for their program and deployed it via their update channel. The workaround was removed a little less than a year later.

Bonus reading: Throwing garbage on the sidewalk: The sad history of the rundll32 program.



  • Jonathan Duncan

    Maybe it’s my ARM-colored glasses showing, but every time I read about x86 calling conventions I can’t imagine how it ended up such a mess.
    But perhaps it's just a miracle that ARM managed to not fall into that mess and dictated a single calling convention.

    If you have some insight into the historical circumstances that led to either I’d be really interested to hear.

    • Me Gusta

      It is boringly simple really. ARM support first appeared with Windows 8, and ARM64 support first appeared during the Windows 10 lifetime. It is the same with x86-64 support, where the first version to support it was Windows XP/Server 2003.
      The x86 mess came from the age of the architecture and backwards compatibility issues, but because x86-64, ARM and ARM64 are so new, everything was just incorporated into the default C calling convention.

      • Jonathan Duncan

        Yeah, but ARM has had its documented single “ARM Procedure Call Standard” since at least 1994, Windows just adopted the standard as all other vendors have chosen to do with ARM.

        I don’t know the Intel history, but presumably they didn’t think to document and enforce a unified calling convention standard, so everyone just did their own thing.

      • Александр Гутенев

        There’s still some mess even in x86-64. The default `__fastcall` convention isn’t fast enough, so `__vectorcall` also exists.

    • Zak Larue-Buckley

      I suspect the underlying reason is that x86 just doesn’t have enough registers so params have to go on the stack.

      This means lots of pushing and popping of params for every call and so whatever calling convention you use, there is compromise. (Eg: __stdcall can’t do var-args, __cdecl needs clean-up code at every call site, __fastcall may cause more spilling in callee…)

      Modern architectures have enough registers to pass 4 or so params in registers so there is less need for spilling/clean-up code.

      I suppose modern compilers can all do link-time code-gen anyway so internally, calling conventions become a moot point…