January 21st, 2025

Why doesn’t the Windows blue screen of death prominently identify the company that created the driver that crashed?

When there is a crash in the kernel, Windows displays the famous blue screen of death. Why doesn’t the blue screen message also say, “This crash brought to you by Company X, author of Driver D”? Wouldn’t that make it easier for users to understand whom to blame for the problem?

When you assign blame, you need to be sure you are assigning blame correctly. If you mess up, then you’ll be like one of those “Find my lost phone” apps that give the wrong location for a phone, causing some poor homeowner to be harassed repeatedly.

What the kernel knows is that the crash occurred due to the execution of a specific instruction, and it can even figure out what memory address that instruction was attempting to access. But it is not necessarily the case that the driver that is executing that instruction is the one at fault for the crash.

One large category of failure is memory corruption. These types of failures are often quite difficult to debug because memory corruption usually does not manifest itself as a crash in the code that did the corrupting, but rather as a crash in the code that tries to use the corrupted data. If you blame the driver that executed the crashing instruction, you’ll be blaming the victim, rather than the culprit.
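Here is a toy sketch of that victim-versus-culprit pattern. It is a simulation in Python, not kernel code: the shared `POOL`, the driver names `buggy.sys` and `audio.sys`, and the `SimulatedCrash` exception are all made up for illustration. The buggy writer returns normally, and the fault only shows up later, while the innocent driver is the one executing.

```python
class SimulatedCrash(Exception):
    """Stands in for a kernel-mode access violation (hypothetical)."""
    def __init__(self, executing_driver, bad_value):
        super().__init__(f"{executing_driver} used corrupted value {bad_value}")
        self.executing_driver = executing_driver
        self.bad_value = bad_value

# One shared block of memory. By convention (assumed for this sketch):
#   POOL[0:8]  belongs to buggy.sys
#   POOL[8]    belongs to audio.sys and holds an index into TABLE
POOL = bytearray(16)
TABLE = ["entry0", "entry1", "entry2", "entry3"]

def buggy_driver_write(data):
    # Bug: never checks that data fits in its own 8 bytes,
    # so a 9-byte write spills into audio.sys's byte at POOL[8].
    POOL[0:len(data)] = data

def audio_driver_read():
    # audio.sys trusts its own data. If someone else corrupted it,
    # audio.sys is the one executing when the fault happens.
    index = POOL[8]
    if index >= len(TABLE):
        raise SimulatedCrash("audio.sys", index)
    return TABLE[index]

POOL[8] = 2
before = audio_driver_read()       # works fine before the corruption

buggy_driver_write(b"\xff" * 9)    # the culprit runs and returns normally

blamed = None
try:
    audio_driver_read()            # the victim crashes later
except SimulatedCrash as crash:
    blamed = crash.executing_driver

print("driver executing at crash time:", blamed)
```

Blaming whoever was executing at crash time names `audio.sys`, even though the only bug in this sketch is in `buggy_driver_write`.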

If you assume that memory corruption is random, then each time the system crashes, it will blame a different driver, and the conclusion of the user might be, “There must be some unknown driver that is causing all these other drivers to crash.” But it might also be “Each time I try to repair a driver that Windows blamed for the crash, I just get a crash in some other driver. Windows is so horribly broken that all of the drivers are crashing! And sometimes, it comes back and re-blames the driver that I just repaired. Oh, and how do I repair the ntoskrnl.exe driver?”

Furthermore, it’s not valid to assume that memory corruption is random. We’ve seen memory corruption bugs that consistently corrupt the same innocent victim repeatedly. So you can’t even use a rule of thumb that “Well, if ten consecutive crashes are on the same instruction, then that code is definitely at fault.”
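To make both cases concrete, here is a small simulation in Python. Everything in it is an assumption made for illustration: the driver names, the 64-byte `LAYOUT`, and the rule that the blamed driver is whichever one owns the corrupted byte. It models only the naive "blame whoever crashed" policy, not any real Windows behavior.

```python
import random
from collections import Counter

CULPRIT = "buggy.sys"  # hypothetical driver that does the corrupting

# Hypothetical memory layout: each innocent driver owns 16 bytes.
LAYOUT = {
    "net.sys": range(0, 16),
    "audio.sys": range(16, 32),
    "video.sys": range(32, 48),
    "disk.sys": range(48, 64),
}

def owner_of(offset):
    """Return the driver whose data lives at this offset."""
    for name, span in LAYOUT.items():
        if offset in span:
            return name
    raise ValueError(f"offset {offset} is outside the simulated memory")

def naive_blame(corrupted_offset):
    # The crash surfaces when the owner next uses its corrupted data,
    # so the naive policy blames the owner, never the culprit.
    return owner_of(corrupted_offset)

def tally(pick_offset, crashes=10):
    """Count who gets blamed over a series of simulated crashes."""
    return Counter(naive_blame(pick_offset()) for _ in range(crashes))

rng = random.Random(2025)  # fixed seed so reruns are repeatable

# Case 1: the stray write lands somewhere different each time.
random_blame = tally(lambda: rng.randrange(64))

# Case 2: the stray write always lands on the same byte.
fixed_blame = tally(lambda: 20)

print("random corruption:", dict(random_blame))
print("fixed corruption: ", dict(fixed_blame))
```

In the fixed case, all ten crashes point at `audio.sys`, which is exactly the situation where the "ten consecutive crashes" rule of thumb would convict the wrong driver. In neither case does `buggy.sys` ever show up in the tally.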

Assigning blame for an access violation in native code is difficult because the nature of memory corruption can lead to the access violation occurring in a component unrelated to the one that is the source of the problem. An incorrect assignment of blame causes users (and technology reporters) to march on the company’s headquarters with torches and pitchforks, and now you have public relations and legal problems on your hands.

Related reading: Windows 95 provided the name of the crashing driver in its blue screen messages, giving users the incorrect impression that it was a Windows-provided driver that was crashing their system.

Bonus reading: Steve Ballmer did not write the text for the blue screen of death. But he did write the text for the Ctrl+Alt+Del screen in Windows 3.1.

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

5 comments

  • Joshua Hudson

    To bring this back on topic; it has always been that providing this information has been *better* than avoiding pointing fingers. The benefit is unique, and therefore searchable, crash reports.

    We should get something like:

    faultcode instruction-address fault-address

    where both addresses are given as module!address if it’s within a loaded module or absolute if not.

    If you’re really worried about finger pointing you could give the CRC16 of the module name instead; still unique enough to search for.

  • Igor Levicki

    The real reason why Microsoft isn't pointing fingers is because Microsoft itself is the enabler of bad kernel drivers with its meaningless and lax WHQL certification program which apparently only serves to prevent individual developers and hobbyists from being able to write kernel-mode drivers while big players can submit any crap code and get it rubber-stamped for a fee.

    I have one question for you Raymond, a question I dare you ask the management -- how the Crowdstrike's Falcon endpoint "security" kernel mode driver got WHQL certified and cross-signed by Microsoft when it contained what basically amounts to ring 0 script...

    • Luke727

      @Igor Levicki
      You’re not wrong, but what’s the alternative? Letting just anybody distribute kernel code was not sustainable from a security perspective, and it’s not feasible for Microsoft to deeply inspect every driver everyone wants to publish. The end result of your criticism is ultimately the removal of 3rd party kernel access entirely.

    • Me Gusta

      In regards to the "how to assign the blame" portion. Again, that doesn't guarantee that the driver that gets the blame is the one responsible for the error. A driver stack can have filter drivers between the user mode call and the function driver, or filter drivers between the function driver and the bus driver.

      If there is a bug in a filter driver that corrupts a pointer, then it is entirely possible for a filter driver to do its work, corrupt the pointer and then pass the corrupted information onto the next driver. In this case, it is potentially the...

      • Igor Levicki

        I am aware of all that, but that's not the main point I was making.

        The main point is that Microsoft WHQL certification is a rent seeking sham that also serves as barrier to entry for individual developers, hobbyists, and small companies who want to write kernel driver code.

        In short, "WHQL certified" doesn't certify anything except that:

        1. You own a company and can prove it with documents
        2. You paid for an EV certificate from a whitelisted root CA
        3. You are paying rent for Azure AD so you can keep using the signing portal

        Given that Microsoft didn't catch Crowdstrike's parser/interpreter...
