The case of the DLL that was not present in memory despite not being formally unloaded, part 2

Last time, we looked at crashes caused by a DLL being removed from memory behind everybody’s back, causing crashes when somebody tried to call into that no-longer-there DLL that everybody thought was still there.

A colleague of mine who was looking at other crashes coming from this process found that most of those other crashes were also of the form “a data structure was corrupted because somebody wrote the single byte 01 into it.” That piece of information made everything fall into place for my side of the investigation.

We saw earlier that the bottom bit of the HMODULE is set for datafile module handles. Therefore, if one of these stray 01 bytes happens to overwrite the bottom byte of an existing HMODULE handle, that turns it into a (fake) datafile module handle. And then, during process destruction, when a component dutifully cleans up the DLLs it loaded by freeing them (say, because they were stored in an RAII type like wil::unique_hmodule), the code passes this (fake) datafile module handle to FreeLibrary. The FreeLibrary function sees the bottom bit set and says, “Oh, this must be the handle to a module that was loaded via LOAD_LIBRARY_AS_DATAFILE,” so it frees it as a datafile.
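To make the bit arithmetic concrete, here is a minimal sketch that compiles without any Windows headers. The names FakeModuleHandle, LooksLikeDatafileHandle, and CorruptLowByte are made up for illustration, and the sketch assumes that a normal module handle is the module's base address, so its bottom byte starts out as 00.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HMODULE so the sketch compiles anywhere.
// Assumption: a normal module handle is the module's base address,
// so its bottom byte is 00.
using FakeModuleHandle = std::uintptr_t;

// Mirrors the check described above: bottom bit set means
// "this is a datafile module handle".
inline bool LooksLikeDatafileHandle(FakeModuleHandle h)
{
    return (h & 1) != 0;
}

// Simulates the stray write: the bottom byte of the stored handle
// is replaced by the single byte 01.
inline FakeModuleHandle CorruptLowByte(FakeModuleHandle h)
{
    return (h & ~FakeModuleHandle{0xFF}) | FakeModuleHandle{0x01};
}
```

With a made-up base address of 0x10000000, CorruptLowByte produces 0x10000001, and LooksLikeDatafileHandle goes from false to true. Nothing else about the handle changed, which is why the code holding it has no idea anything is wrong.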

Freeing a datafile module means undoing the steps that were taken when the module was loaded as a datafile: Unmapping the DLL from memory. In particular, loading a module as a datafile does not add the DLL to the list of DLLs that were loaded as code; therefore, unloading a datafile module doesn’t remove it from that list. As far as the DLL list is concerned, the DLL is still in memory.
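Here is a toy model of that bookkeeping mismatch. It is not the real loader: ToyLoader and its members are hypothetical, reference counting is ignored, and the only thing it tries to capture is that the datafile path unmaps memory without touching the list of code DLLs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model, not the real loader. It tracks which base addresses are
// mapped and which ones appear in the "loaded as code" list.
struct ToyLoader
{
    std::set<std::uintptr_t> mapped;      // what is actually in memory
    std::set<std::uintptr_t> codeModules; // the list of DLLs loaded as code

    std::uintptr_t LoadAsCode(std::uintptr_t base)
    {
        mapped.insert(base);
        codeModules.insert(base);
        return base;            // the handle is the base address
    }

    std::uintptr_t LoadAsDatafile(std::uintptr_t base)
    {
        mapped.insert(base);    // mapped, but never added to codeModules
        return base | 1;        // bottom bit marks a datafile handle
    }

    void Free(std::uintptr_t handle)
    {
        if (handle & 1) {
            // Datafile path: undo the mapping only. The code list is untouched.
            mapped.erase(handle & ~std::uintptr_t{1});
        } else {
            // Code path: remove from the list and unmap.
            codeModules.erase(handle);
            mapped.erase(handle);
        }
    }
};
```

If you call LoadAsCode and then pass the returned handle with its bottom bit set to Free, the base address disappears from mapped but stays in codeModules. That is the state the crashing process was in: the list says the DLL is there, but the memory is gone.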

A one-bit error caused the code to lie and attempt to free a module handle that did not correspond to a LoadLibrary call, resulting in mass havoc.

The “DLL unmapped from memory” crash is just an alternate manifestation of the “somebody is writing 01 bytes to places they shouldn’t” bug. The original bug had a larger bucket spray than we initially thought.

The good news is that all of the crashes have funneled down to a single bug. The bad news is that you now have to debug this one memory corruption bug.

Unfortunately, at the time of this writing, the root memory corruption bug in the third-party program has yet to be identified. We don’t know whether it’s coming from an operating system component or from the program itself, though the fact that it appears to occur only in one process, where it sprays across multiple modules, suggests that it’s a problem with that program, or that there’s something peculiar about how this specific process uses the system.

If you look at the original stack trace, you can see that the problem is occurring at process termination. That’s probably why the problem has lurked for so long: Crashes at exit often go unnoticed because there is no end-user loss of functionality. The user was finished with the program anyway. Whether it exits cleanly or with a crash doesn’t affect the user much.

Sorry. Not all stories have a happy ending.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

8 comments

Shimon Doodkin June 27, 2026

Could you please ask someone to fix the memory leak in the MS phone app, where a WinRT allocation is not being released?

Alexander Bessonov June 27, 2026

Looks like a great candidate for reproduction under Time Travel Debugger. If the problem is still reproducible (which is not guaranteed, as the process will run much slower and some race conditions may disappear), it will be very easy to find the instruction that writes that byte.

Ken Settle 19 hours ago

So the root cause sounds like a major cyber security problem that is easily exploitable. The fix is to verify that the handle doesn’t point to code. That check belongs in the OS.
Forget about “how it happened” and fix the root cause!

Jacob Manaker 17 hours ago

I’m not seeing how this bug crosses a security boundary.

- Ray 14 hours ago
  
  @Ken Settle On the next episode of the airtight hatchway…

HXO June 27, 2026

I was chasing something (I forget what) and set up a SchTsk to trigger on Eventlog
Application – Application Error
Application – Application Hang
with action
msg.exe * /time:864000 Application Hang or Crash, see EvtLog.
Later found one of my own .NET Framework applications tripped on something on the way out, but no one had noticed.
