The case of Explorer calling into an unloaded DLL while trying to run down a reference to it

Raymond Chen

There was a large number of crashes in Explorer that were tracked back to attempting to release a COM object that belonged to a DLL that was no longer in memory.

A typical call stack at the crash looked like this:

combase!<lambda_...>::operator()+0x9e
combase!ObjectMethodExceptionHandlingAction<lambda_...>+0x1b
combase!CStdIdentity::ReleaseCtrlUnk+0x68
combase!CStdMarshal::DisconnectWorker_ReleasesLock+0x385
combase!CStdMarshal::DisconnectSwitch_ReleasesLock+0x28
combase!CStdMarshal::DisconnectAndReleaseWorker_ReleasesLock+0x3c
combase!CStdMarshal::DisconnectAndRelease+0x35
combase!COIDTable::ThreadCleanup+0xd5
combase!FinishShutdown::<lambda_...>::operator()+0x5
combase!ObjectMethodExceptionHandlingAction<lambda_...>+0x15
combase!FinishShutdown+0x45
combase!ApartmentUninitialize+0x67
combase!wCoUninitialize+0x11a
combase!CoUninitialize+0xb6
imm32!CtfImmCoUninitialize+0x48
msctf!CicFlsCallback+0x50
ntdll!RtlProcessFlsData+0xf6
ntdll!LdrShutdownThread+0x32
ntdll!RtlExitUserThread+0x4c
KERNELBASE!FreeLibraryAndExitThread+0x34
ucrtbase!common_end_thread+0x84
ucrtbase!_endthreadex+0x7
ucrtbase!thread_start+0x46
kernel32!BaseThreadInitThunk+0x24
ntdll!__RtlUserThreadStart+0x2f
ntdll!_RtlUserThreadStart+0x1b

I took a sample of ten crashes with this stack to see if I could find a pattern. The object being released is still alive (the data for it is still present in memory, and it still has a vtable), but the code that the vtable points to has already been unloaded. Fortunately, the system remembers the DLLs that were most recently unloaded, so we can use that to look up the DLLs that hosted the objects that are being run down.
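To make that lookup concrete, here is a minimal sketch of the idea: given a list of recently unloaded modules and the address that a stale vtable entry points to, find the module whose old address range covers it. This is a portable model, not the system's actual bookkeeping. The `UnloadedModule` record, the `FindUnloadedOwner` helper, and the assumption that the list is ordered oldest first are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of a recently unloaded module: where it used to live.
struct UnloadedModule {
    std::string name;
    std::uintptr_t base;  // address the module image was loaded at
    std::size_t size;     // size of the module image in bytes
};

// Returns the unloaded module whose former address range contains
// `address`, or nullptr if no recorded module covers it.
// Assumption: `recent` is ordered oldest first.
const UnloadedModule* FindUnloadedOwner(
    const std::vector<UnloadedModule>& recent, std::uintptr_t address)
{
    // Walk newest to oldest so that the most recent unload wins
    // if an address range was reused by a later module.
    for (auto it = recent.rbegin(); it != recent.rend(); ++it) {
        assert(it->size != 0);  // an empty range cannot own anything
        if (address >= it->base && address - it->base < it->size) {
            return &*it;
        }
    }
    return nullptr;
}
```

Feeding it the code address from the object's vtable tells you which departed DLL that object belonged to, which is how each crash in the sample gets assigned a culprit.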

The ten crashes break down like this:

Number  Culprit
7       Contoso
2       Fabrikam
1       LitWare

The vast majority of the issues are with Contoso, so we’ll focus on that one.

An interesting detail is that in four of the Contoso crashes, some version of the Contoso setup program is running.

I got lucky and discovered that Contoso is an open source project, so I was able to make further progress by reading the code and seeing what they were trying to do.

Contoso injects its DLL into Explorer and takes over a bunch of stuff. When Contoso wants to unload from Explorer, it unhooks all the hooks that it installed and unloads itself. It wasn’t loaded by COM, so COM is not going to call DllCanUnloadNow to see whether the DLL has any active COM objects that would require it to remain loaded in memory.
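For readers who haven't written one, this is roughly the bookkeeping that sits behind a typical DllCanUnloadNow: the module keeps a count of live objects and reports that it is safe to unload only when the count is zero. The sketch below is a portable model rather than real COM code. `Widget`, `g_liveObjects`, and `CanUnloadNow` are hypothetical names, and `kOk` and `kFalse` stand in for S_OK and S_FALSE.

```cpp
#include <atomic>
#include <cassert>

// Stand-ins for the COM result codes S_OK and S_FALSE.
constexpr long kOk = 0;     // no live objects: safe to unload
constexpr long kFalse = 1;  // objects still outstanding: stay loaded

// Module-wide count of live objects handed out by this DLL.
std::atomic<long> g_liveObjects{0};

// Hypothetical object whose lifetime is reflected in the module count.
class Widget {
public:
    Widget() { ++g_liveObjects; }
    ~Widget() {
        long remaining = --g_liveObjects;
        assert(remaining >= 0);  // the count must never go negative
        (void)remaining;
    }
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
};

// Model of the answer a DllCanUnloadNow implementation typically gives.
long CanUnloadNow() {
    return g_liveObjects.load() == 0 ? kOk : kFalse;
}
```

The point is that this check only protects you if somebody asks. A DLL that unloads itself on its own schedule skips the question entirely, even while the count is nonzero.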

However, it does produce COM objects, in particular implementations of IAccessible, so that its UI objects are available to screen readers and other UI automation clients.

Once I had this foothold, it was relatively easy to reproduce the problem:

  • Start Narrator.
  • Launch the Contoso UI.
  • Uninstall Contoso.
  • Perform a developer shutdown of Explorer; Ctrl+Shift+RightClick, Exit Explorer.

Here’s what’s going on.

Launching the Contoso UI causes Narrator to ask for the IAccessible interface so it can navigate the user interface elements.

Uninstalling Contoso causes it to remove its injected DLL, even though there are IAccessible objects still outstanding. These are ticking time bombs waiting to be triggered.¹

Shutting down Explorer causes COM to be shut down for the process, at which time it runs down all the outstanding objects. And that’s when it trips over these IAccessible objects that are backed by code that is no longer present in the process.

The fix is to create a custom COM context to hold your objects so that you can disconnect them prior to unloading. And the project owners agreed to make a fix to do exactly that.
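Here is a small model of why the ordering matters. It is not COM and it is not the Contoso project's actual fix. `Module`, `StubTable`, `DisconnectAll`, and `RunDown` are hypothetical names standing in for the injected DLL, COM's table of externally referenced objects, the disconnect step (the job that CoDisconnectObject and CoDisconnectContext do in real COM), and the rundown at process shutdown.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Model of an injected module. Calling into it after unload is the crash.
struct Module {
    bool loaded = true;
};

// Model of the table of externally referenced objects that gets run down
// at shutdown. Each entry is a callback that releases one object.
class StubTable {
public:
    void Register(Module* owner, std::function<void()> release) {
        entries_.push_back({owner, std::move(release)});
    }

    // Model of the disconnect step: release everything that belongs to
    // one module while that module's code is still in memory.
    void DisconnectAll(Module* owner) {
        assert(owner->loaded);  // disconnecting after unload is too late
        std::vector<Entry> kept;
        for (auto& e : entries_) {
            if (e.owner == owner) e.release();
            else kept.push_back(std::move(e));
        }
        entries_ = std::move(kept);
    }

    // Model of shutdown rundown. Returns how many releases would have
    // jumped into code that is no longer present.
    int RunDown() {
        int callsIntoUnloadedCode = 0;
        for (auto& e : entries_) {
            if (e.owner->loaded) e.release();
            else ++callsIntoUnloadedCode;
        }
        entries_.clear();
        return callsIntoUnloadedCode;
    }

private:
    struct Entry {
        Module* owner;
        std::function<void()> release;
    };
    std::vector<Entry> entries_;
};
```

If the module marks itself unloaded without calling `DisconnectAll` first, `RunDown` reports a call into unloaded code for every object it left behind. If it disconnects first, the table has nothing left that points into the module, and the rundown is uneventful.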

One of the burdens of Explorer is that it is an attractive target for third-party code to inject itself into, even though doing so is totally unsupported. And when that third-party code crashes, it’s Explorer that takes the blame.

One crash caused by a third-party code injector down. A few million more to go.

¹ That explains why the Contoso installer is often running at the time of these crashes. One of the things that the Contoso installer does is uninstall the previous version.

17 comments

  • Joshua Hudson

    It’s quite possible to blame this one on MS anyway, but explorer isn’t the problem at all. Here we have yet another case of an application jumping through hoops because it can’t delete a file that contains running code.

    I know MS doesn’t want to use such functionality, but it makes so much extra work because it’s not there for those that do. And now you’ve had to pay a piece of it.

    • Me Gusta

      Do you know that at the very least you can move files that contain running code? Sure, this moves the problem from how to replace the file that is in use with how to delete the files after they unloaded, but at the very least this situation can be easily prevented anyway.

      • Joshua Hudson

        > how to delete the files after they unloaded

        Got a solution that doesn’t require admin rights and doesn’t require a process to be left running waiting for the file to become unlocked?

        • Me Gusta

          In this case? Let the setup program deal with it after explorer was restarted. But I would also say that since this is dealing with explorer, you are going to be using a setup program with admin rights anyway so using MoveFile with delay until reboot is an option.
          In general? The Windows task scheduler. The only requirements are whatever is needed to execute the command and access what you want to clean up.

    • Kythyria Tieran

      If you don’t want to be admin or leave a watcher behind, just force the user to log out. Since it’s not a reboot it’s *clearly* not any kind of downtime. /s

      The actual problem here is that the file *is* being “successfully” unloaded (and probably getting replaced on disk just fine). It’s just leaving some dangling pointers behind. If you only replaced the file on disk you’d still have to either unload the original somehow or make explorer restart without unloading the DLL.

  • Sebastiaan Dammann

    > Perform a developer shutdown of Explorer; Ctrl+Shift+RightClick, Exit Explorer.

    That is a nice trick! You should do a blog post with a few of these 😀

  • Piotr Siódmak

    I would expect a shell extension crash to cause a bluescreen. I think I read somewhere (old MSDN?) that they run on the kernel, but it was looong ago, so probably a mistake.
    It would be interesting to see how you tracked that stack trace to Contoso. Did you have to instruct WER to gather more information? What kind of memory dump was it?

      • Piotr Siódmak

        I think it was specifically about taskbar extensions and why they removed them after vista (if I remember correctly). Like the little mini windows media player control bar – I miss that one. The problem might have been related to the fact that people started loading multiple versions of .net into the shell (I vaguely recall discussions on how to use silverlight on the taskbar).

    • Me Gusta

      It is more or less the Vista one, but they probably had to move it due to the start changes in Windows 8+. Or it always worked on the task bar but nobody mentioned that.

  • word merchant

    Of course this is one of the fantastic things about open source – smart people in the wider community can contribute improvements one way or the other.