Why are some system functions exported as stubs instead of as forwarders?

Raymond Chen

If you do a little digging around inside some Windows system functions, you’ll see that, for example, the Create­ProcessW function looks like this:

kernel32!CreateProcessW:
6b819ef0 mov     edi,edi
6b819ef2 push    ebp
6b819ef3 mov     ebp,esp
6b819ef5 pop     ebp
6b819ef6 jmp     dword ptr [kernel32!kernelbase_CreateProcessW]

The first four instructions have no net effect, so basically this is just an indirect jump to the kernelbase!CreateProcessW function. In other words, it’s a stub that forwards to the real implementation over in kernelbase.

Why is it done this way? Why isn’t the Create­ProcessW function just a forwarder to kernelbase? That would avoid having to travel through kernel32 just to reach kernelbase.

Yes, this would normally be a forwarder, but it’s not. For backward compatibility.

Wait, why is there a compatibility constraint that the Create­ProcessW function cannot be a forwarder?

Set the time machine to 2001. The Microsoft Layer for Unicode (MSLU) was just released, also affectionately known as “Unicows”, after the DLL component of MSLU: unicows.dll.

MSLU was a combination of a static library and a DLL. You wrote a Unicode application and linked it with the MSLU static library. This library contained its own definitions for a large number of functions, including Create­ProcessW. When your Unicode application called the alternate version of Create­ProcessW, the library checked whether it was running on a version of Windows that was ANSI-only (the Windows 95 series) or a version that supported Unicode (the Windows NT series).

If it was running on an ANSI-only system, then the stub loaded the unicows.dll library and forwarded the call to a helper function in that library. The helper thunked the Unicode parameters to ANSI, called the Create­ProcessA function, converted the results back to Unicode, and returned them to the caller. If it was running on a Unicode system, then the stub forwarded the call to the operating system’s Create­ProcessW function.

In other words, the static library contained a stub that decided whether to allow the Unicode call to go straight to the Unicode version of the underlying function, or whether it should convert the call to ANSI and call the ANSI version of the underlying function.
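To make the ANSI path concrete, here is a minimal sketch in portable C of what such a thunk does: convert the wide-character argument to a narrow string, then call the ANSI version. This is not the MSLU code. The names `fake_create_process_a` and `thunk_create_process_w` are hypothetical stand-ins, the standard `wcstombs` function stands in for whatever conversion the real helper used, and the step that converts results back to Unicode is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for an ANSI-only system function.
   It just reports the length of the narrow string it received. */
int fake_create_process_a(const char *command_line) {
    return (int)strlen(command_line);
}

/* Hypothetical thunk: converts the wide string to a narrow one and
   calls the ANSI function. Returns -1 if the conversion fails.
   Strings longer than the buffer are truncated in this sketch. */
int thunk_create_process_w(const wchar_t *command_line) {
    char narrow[260];
    size_t written;

    assert(command_line != NULL);

    /* Leave room for the terminator, which wcstombs does not write
       when the output fills the requested size. */
    written = wcstombs(narrow, command_line, sizeof narrow - 1);
    if (written == (size_t)-1) {
        return -1; /* a character had no narrow equivalent */
    }
    narrow[written] = '\0';

    return fake_create_process_a(narrow);
}
```

Calling `thunk_create_process_w(L"notepad.exe")` hands the 11-character narrow string to the ANSI stand-in. A real thunk also has to deal with structures, output buffers, and code pages, which is why the real work lived in a separate DLL.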

Okay, great, so where do DLL forwarders come into the story?

After the MSLU static library decides which code path it should use, it goes back and patches the caller’s import table to point directly to the destination function. That way, the second and subsequent calls are direct and don’t go through the evaluation step again. (This is the same sort of trick that the delay-load stubs use.)
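The patch-on-first-call trick can be sketched in portable C with a function pointer playing the role of the import table entry. This is only an illustration under stated assumptions: `import_slot`, `resolving_stub`, `system_supports_unicode`, and the two destination functions are hypothetical names, and a real implementation patches an entry in the module's import address table rather than an ordinary global variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature standing in for a real system function. */
typedef int (*create_fn)(const char *command_line);

/* Counts how often the decision logic runs (for illustration only). */
int resolve_count = 0;

/* Hypothetical platform flag; a real library would ask the OS. */
int system_supports_unicode = 1;

/* Stand-ins for the two possible destinations. */
int native_unicode_impl(const char *command_line) {
    (void)command_line;
    return 1; /* pretend: went straight to the OS Unicode function */
}

int ansi_thunk_impl(const char *command_line) {
    (void)command_line;
    return 2; /* pretend: converted to ANSI and called the ANSI function */
}

int resolving_stub(const char *command_line);

/* Simulated import table entry. It starts out pointing at the stub. */
create_fn import_slot = resolving_stub;

/* First call lands here: pick the destination, patch the slot so
   later calls bypass this function, then complete the call. */
int resolving_stub(const char *command_line) {
    create_fn target = system_supports_unicode ? native_unicode_impl
                                               : ansi_thunk_impl;
    resolve_count++;
    import_slot = target;
    return target(command_line);
}

/* What the application's call site effectively does. */
int call_through_import(const char *command_line) {
    assert(import_slot != NULL);
    return import_slot(command_line);
}
```

If you call `call_through_import` twice, the decision logic runs only once. The second call goes directly to whichever destination was chosen, because the slot no longer points at the stub.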

In the case where the MSLU static library decided to pass the function straight to the Unicode version of the underlying function, it needs to get the address of that Unicode version of the underlying function. For reasons not entirely clear to me, it doesn’t use the Get­Proc­Address function.¹ Instead it has a custom implementation of Get­Proc­Address which parses the DLL export table manually to find the function to forward to.

That custom implementation of Get­Proc­Address doesn’t support forwarders. There’s even a comment acknowledging as much:

   // This is a forwarder - Ignore for now.

Therefore, any function supported by MSLU may not take the form of a DLL forwarder. It must be a stub. Just in case somebody runs a program from the early 2000s written with MSLU.
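To see why a hand-rolled lookup trips over forwarders, here is a rough sketch in portable C. It relies on one fact from the PE format: an export whose address falls inside the export directory itself is a forwarder, and that address points to a string naming the real target rather than to code. Everything else is an assumption for illustration. The `export_table` structure is a simplified stand-in rather than the real on-disk layout, the function names are hypothetical, and this is not the MSLU source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a DLL's export information.
   (The real PE export directory uses separate name, ordinal,
   and address tables.) */
typedef struct {
    uint32_t dir_rva;          /* where the export directory starts */
    uint32_t dir_size;         /* size of the export directory in bytes */
    size_t count;              /* number of named exports */
    const char *const *names;  /* exported names */
    const uint32_t *rvas;      /* one address (RVA) per name */
} export_table;

/* An RVA inside the export directory denotes a forwarder string,
   not executable code. */
int is_forwarder(const export_table *t, uint32_t rva) {
    return rva >= t->dir_rva && rva - t->dir_rva < t->dir_size;
}

/* Returns the RVA of the named export, or 0 if the name is missing
   or the export is a forwarder. Giving up on forwarders mirrors the
   limitation described above. */
uint32_t find_export_no_forwarders(const export_table *t, const char *name) {
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) != 0) {
            continue;
        }
        if (is_forwarder(t, t->rvas[i])) {
            return 0; /* forwarder: this lookup cannot resolve it */
        }
        return t->rvas[i];
    }
    return 0;
}
```

A lookup written this way finds nothing usable once an export is turned into a forwarder, even though the function is still there as far as the real Get­Proc­Address is concerned. Exporting a stub keeps the address pointing at actual code, so this kind of parser keeps working.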

Bonus chatter: This requirement that the function be a stub and not be a forwarder applies only to the x86-32 version of Windows, since that’s the only architecture supported by the Windows 95 series, and therefore the only one supported by MSLU. However, the functions are stubs on all architectures, presumably for simplicity of implementation.

¹ My suspicion was that it does this to avoid certain reentrancy issues in the loader, but I’m not sure.

13 comments


  • Alex Martin 0

    Have you decided to go with x86-32 and x86-64 now? It seems more rational to me, but Microsoft has always used (and to my knowledge still uses) x86 and x64.

    • Raymond Chen (Microsoft employee) 0

      I haven’t yet settled on a naming convention.

      • Mystery Man 0

        Assuming your context does not allow you to use x86, please go with what people already use: “IA-32” or “i386”.

        I’m concerned that if you do otherwise, whatever you settle on probably has the same feel as phrases like “.NET Standard”, “inbox”, “boot partition”, “system partition” and, my personal favorite, “Program Files”.

        • Julien Oster 0

          The problem is: “What people already use” might just be what you perceive. IA-32 was coined retroactively after IA-64 (Itanium) became a thing, and “i386” has the opposite problem of being very antiquated and also rather misleading. A whole lot of non-64bit x86 code does not actually run on an actual 386 CPU anymore, and I see other weird stuff like “i586” in platform names.

          • Mystery Man 0

            That’s how a geek thinks. “The problem is: World War 1 was coined retroactively after World War 2.” A layman responds “Good!” Or maybe tries to humor the geek by asking “Retroactively or retrospectively? Oh, and by the way, which one is correct, ‘Aerith’ or ‘Aeris’? Can we call something an ‘HTML5 app’ if its HTML is antiquated but it is using CSS 3?”

            I use language as a way of helping my readers understand me, not to infuriate them.

    • skSdnW 0

      Parts of Windows use AMD64. It confuses some people but it made sense at the time when you had two different 64-bit XP versions.

  • Sunil Joshi 0

    I’m assuming that the performance implications of this are actually negligible at runtime because it’s entirely predictable?

  • Patrick 0

    But what are the four “no net effect” instructions for?

      • skSdnW 0

        But why does MSVC insist on setting up a useless stack frame? Older versions of the compiler (VC6 etc.) do not do this.

        • Kasper Brandt 0

          Maybe it’s compiled with /Oy-
          (I haven’t checked what exactly it does – I guess the jump is from a tail call optimization? Maybe that just gives that slightly silly result when combined with forcing stack frame generation)

  • ori damari 0

    But many functions like VirtualProtect do not have a string argument and still use a stub.

  • Dzmitry Konanka 0

    If this is a single, well-known library written by MS, why didn’t you use your app-compatibility engine to hotpatch its code, or even intercept loading of the old unicows.dll and load a new version instead, with blackjack and forwarded exports support?
