After I made my DLL delay-load another DLL, my DLL has started crashing in its process detach code

Raymond Chen

Raymond

A customer had a DLL, let’s call it CONTOSO.DLL, and that DLL linked to another DLL, let’s call it WIDGET.DLL. To improve DLL load time, they made WIDGET.DLL a delay-loaded DLL via the /DELAYLOAD linker option. This worked out great, except that sometimes their DLL crashed when shutting down.

When the WIDGET.DLL was a static dependency, the loader made a note to ensure that WIDGET.DLL was loaded and ready before calling CONTOSO.DLL‘s initialization function, and made sure that WIDGET.DLL remained valid until CONTOSO.DLL completed its uninitialization.

Switching the WIDGET.DLL to a /DELAYLOAD DLL removes this static dependency, and the loader isn’t around to help any more.

When the process shuts down, the loader uninitializes the DLLs in an order that tries¹ to preserve the static dependencies, so that a DLL waits until all its dependents are uninitialized before itself uninitializing. However, the loader does not have insight into dynamically-created dependencies, and the DLLs may unload out of order.

What happened is that CONTOSO.DLL initialized without WIDGET.DLL, and then later somebody needed a widget, so it loaded WIDGET.DLL and did some widget stuff, and then cached the widget so it wouldn’t have to go through all that nonsense again.

In the CONTOSO.DLL module’s DLL_PROCESS_DETACH code, it checks² if there is a cached widget, and if so, destroys it.

WIDGET.DLL was a dynamic dependency, the module loader doesn’t take it into account when calculating the order in which modules should be uninitlalized. The loader sees no static dependency between CONTOSO.DLL and WIDGET.DLL, so the order in which they uninitialize is arbitrary.

And if the arbitrary decision ends up selecting WIDGET.DLL to uninitialize first, then you have a crash when CONTOSO.DLL tries to call into an already-uninitialized DLL.

Note that this problem occurs only at process shutdown. If CONTOSO.DLL unloads via a runtime call to Free­Library, it will still be able to call into WIDGET.DLL because it hasn’t yet called Free­Library on WIDGET.DLL. But during process shutdown, the module loader needs to free all the things, and the outstanding Load­Library won’t prevent that from happening.

The solution is to bypass widget cleanup if the DLL_PROCESS_DETACH handler realizes that the process is terminating. Just leak the widget. The building is being demolished. You don’t need to sweep the floors.

The DLL was able to start without the widget DLL. It should be able to finish without the widget DLL.

¹ I say “tries” because circular dependencies make such an effort impossible to achieve, but the loader does the best it can.

² It’s important to check for evidence of widgets before trying to clean up widget-related things. Otherwise, you may end up loading a DLL in your DLL_PROCESS_DETACH handler, and that’s not good.

 

Raymond Chen
Raymond Chen

Follow Raymond   

8 comments

Comments are closed.

  • Avatar
    Ian Boyd

    I know a lot of developers have balked at the idea of leaking memory or handles, because while running in the debugger they have telemetry tools that report memory or handles leaks – and they don’t want to obscure real memory leaks with the false positives. That’s why I wrap those Free functions in IsDebuggerPresent. This way I still get the quick shutdown, and my memory and GDI leak trackers don’t go insane.

    • Avatar
      Dmitry Vozzhaev

      Before using IsDebuggerPresent one should estimate how many undebuggable bugs will make her happy.

  • Avatar
    Leif Strand

    “a DLL waits until all its dependents are uninitialized before itself uninitializing”
    Strike that, reverse it.

  • Avatar
    Piotr Siódmak

    Hi,
    Sorry to bother you with this, but do you know if the loader is documented anywhere? I can’t find anything. I have an issue of a third party application crashing on startup and all I have so far is that in ntdll!LdrInitializeThunk after a call to ntdll!NtTestAlert esi is 0xc0000139 (which apperantly means “STATUS_ENTRYPOINT_NOT_FOUND”) and then is sent to ntdll!RtlRaiseStatus. When run from the debugger the application instead of showing the WER UI shows a message box with “The procedure entry point ??0SomeFunction@@QAE@H@Z could not be located in the dynamic link library C:\Program Files (x86)\Application\Application.exe”.
    I’m stuck. Why wouldn’t the loader be able to find the entry point? !dlls show an entry point address. Everything point to the exe itself, so it can’t be a dependency dll (it has loaded a few of its own dlls by the time the error happened).
    Searching for NtTestAlert didn’t help. Is there any resource that could help me or is it black box land?

    • Avatar
      Ramón Sola

      A typical cause for error 0xc0000139 is a DLL version mismatch, a particular instance of the dreaded “DLL Hell”.

      The message “The procedure entry point XXXX could not be located…” cannot be trusted anymore since Windows 8, I think. It currently shows the executable module that tried to reference the entry point, not the module in which the function entry point should actually be found as it was reported in previous versions of Windows. This issue puzzled me a few years ago when some users were starting to discuss program load failures related to _crtCreateSymbolicLinkW in forums. The error message sometimes pointed out MSVCR110.DLL and other times it referenced MSVCP110.DLL. The correct one was the former: _crtCreateSymbolicLinkW really was located in MSVCR110.DLL (since Visual Studio 2012 Update 1), but it was usually referenced from MSVCP110.DLL.

      If you have installed the Microsoft Debugging Tools for Windows, you can enable “Show loader snaps” for Application.exe by running “gflags -i application.exe +sls” on the command line. Alternatively, you can set the GlobalFlag value to 2 (as a DWORD value) or to 0x00000002 (as a string, REG_SZ value) in the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\Application.exe. The application must be run under a debugger to see the “loader snaps”. These “snaps” are pretty verbose messages sent to the debugger from the Windows DLL loader as it does its work. The very last lines before the final 0xC0000139 status code should pinpoint where the problem is.

      You could also try specific tools like the venerable and pretty dated Dependency Walker or the fairly newer Dependencies from LucasG (search for it in GitHub).

      • Avatar
        Piotr Siódmak

        This helped a lot, thank you.
        After setting gflags, I could see “LdrpNameToOrdinal – WARNING: Procedure “??0SomeFunction@@QAE@H@Z” could not be located in DLL at base 0x10000000″ in the debugger. The address 0x10000000 pointed at ApplicationLibrary3.dll. After checking with peviewer, indeed Application.exe tried to import ??0SomeFunction@@QAE@H@Z from ApplicationLibrary3.dll, but the library did not export it. The library was in a previous version (luckily the version in the file’s metadata reflects application version). It probably wasn’t replaced by the updater because ApplicationTrayWidget.exe held tightly to it. Kicking all users out, killing all Application processes and reinstalling it did the job.

        Thank you for the help.

      • Avatar
        Ben Voigt

        It would be so helpful if descriptions of these OS-provided diagnostics were linked in the LoadLibrary or GetProcAddress function documentation, or the dllimport keyword documentation (in the toolchain), or pretty much anwhere at all having to do with building software that consumes DLL exports.  Troubleshooting guides are nice but tracing loader operation should not be considered a separate activity from development and debugging.