February 13th, 2023

Adventures in application compatibility: The case of the display control panel crash on exit

Windows reliability telemetry reported that there were a large number of crashes in the Display control panel. Since these crashes are coming via telemetry and the Windows Error Reporting service, there is no information about what steps are required to reproduce the problem. All we have are crash dumps.

The crash was due to the instruction pointer being in the middle of nowhere. For example, in one dump, it was at address ffffffff`924bbde0. Close study shows that this value is suspiciously similar to 00007fff`924bbde0, which is the address of ntdll!ButtonWndProc_A. This tells me that somebody subclassed a button, and then tried to restore the original window procedure, but they messed up and truncated the 64-bit pointer value to a 32-bit signed integer. Bonus insult: Their button is ANSI, not Unicode. It’s (checks watch) 2023, get with the program. Not everybody who uses a computer speaks English.

To debug this problem, I had to do some triangulation of the crash dumps to look for a third party component that was common to all (or at least most) of the crashes. Since this was a Display control panel, I focused on the video card information, since video card drivers can provide a custom Display control panel plug-in to show off their driver-specific features.

And I found it.

The custom property sheet that comes with one particular video card has a bug in its WM_DESTROY handler: It casts a WNDPROC to a 32-bit value, causing the upper 32 bits to be lost.

Here is the reverse-engineered dialog procedure:

INT_PTR CALLBACK DialogProc(
    HWND hdlg, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
  if (uMsg == WM_INITDIALOG) {
        SetWindowLongPtr(hdlg, GWLP_USERDATA,
                ((PROPSHEETPAGE*)lParam)->lParam);
  }
  MyClass* self = (MyClass*)GetWindowLongPtr(
        hdlg, GWLP_USERDATA);
  return self ? self->RealDialogProc(hdlg, uMsg, wParam, lParam)
              : FALSE;
}

And here is the “real” DLGPROC:

INT_PTR CALLBACK MyClass:RealDialogProc(
    HWND hdlg, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
  switch (uMsg)
  {
  ...
  case WM_DESTROY:
    SetWindowLongPtr(GetDlgItem(m_dlg, IDC_SOME_BUTTON),
        GWLP_WNDPROC, (LONG)g_originalWndProc);
   ...
  }
}

I suspect this code was originally written as 32-bit code, and the line was

    SetWindowLong(GetDlgItem(m_dlg, IDC_SOME_BUTTON),
        GWL_WNDPROC, (LONG)g_originalWndProc);

When porting to 64-bit, the Set­Window­Long becomes Set­Window­Long­Ptr to expand the value to 64 bits, and the name of the index changes from GWL_WNDPROC to GWLP_WNDPROC, with the extra P emphasizing that the value should be passed to Get/Set­Window­Long­Ptr.

But they forgot to upgrade their cast from (LONG) to (LONG_PTR), so they were accidentally truncating their 64-bit value to a sign-extended 32-bit value as part of the restoration.

I went for style points and came up with a one-byte patch to fix the bug.

// rbx = (LONG)g_originalWndProc
48631d33540300  movsxd  rbx,dword ptr [contoso+0x39c50]

Patch the second byte from 63 to 8b:

// rbx = (LONG_PTR)g_originalWndProc
488b1d33540300  mov     rbx,qword ptr [contoso+0x39c50]

It turns out that all the machines that are hitting this bug are running drivers that are over ten years old. The current drivers don’t have this bug. In fact, the current drivers don’t even have the custom control panel extension! In the time since I originally did this investigation, it appears that people finally got their act together and upgraded their video drivers, because there has been only one recorded occurrence of this crash worldwide in the past 30 days.

Bonus reading: The difference between a junior and senior position at a video card company.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

15 comments

Discussion is closed. Login to edit/delete existing comments.

  • Stefan Kanthak · Edited

    "ntdll!ButtonWndProc_A ... Bonus insult: Their button is ANSI, not Unicode. It’s (checks watch) 2023, get with the program. Not everybody who uses a computer speaks English."

    Hmmm... https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page#activeCodePage but advertises that functions like ntdll!ButtonWndProc_A support Unicode!

    Now switch from ntdll to NLS, the "(Inter)National Language Support", and query its GetLocaleInfoA function for LOCALE_SNATIVEDIGITS of MAKELCID(MAKELANGID(LANG_ARABIC, SUBLANG_DEFAULT), SORT_DEFAULT), which according to the documentation of LOCALE_SNATIVE* should return the Arabic numerals ٠١٢٣٤٥ ٦٧٨٩, or for LOCALE_SNATIVELANGUAGE of MAKELCID(MAKELANGID(LANG_ARMENIAN, SUBLANG_ARMENIAN_ARMENIA),...

    Read more
  • Zsigmond LÅ‘rinczy

    35+ years ago long/LONG was defined as int32_t (just like DWORD was defined as uint32_t), everything since then (i.e. choosing LLP64 instead of LP64) is just consequence of this. (Maybe type LPARAM=intptr_t could have been used instead (e.g. Get-/SetWindowLparam), but that didn’t happen.))

  • Ryan Phelps

    Did your psychic debugging powers lead you to look at the address of the ntdll functions? That would have taken me approximately infinity -1 years to think of.

    • Joshua Hudson

      A negative pointer in 64 bit address space makes the mistake obvious. If you’re going to do something that can succeed from there, you’re going to overlay on the low 32 bits of the pointer until you find a match somewhere in the dump.

  • Andreas Schulz

    > It’s (checks watch) 2023, get with the program. Not everybody who uses a computer speaks English.

    Coming from the platform that still fails to support proper UTF-8 in its APIs (https://github.com/microsoft/terminal/issues/7777). It’s 2023, indeed. But you’re blaming the wrong party here.

    • Daniel Sturm

      If only Microsoft had also ignored localization to non English for many many years instead of offering it years before utf8 came around.

    • George Tokmaji

      The platform has properly supported UTF-16 and therefore Unicode for decades. One might think a Control Panel applet would make use of that.

      • Joshua Hudson

        UTF-16 is legacy trash but yeah I too would expect a control panel applet to use it. Among other reasons, you can’t manifest that because it runs in some other process.

      • Stefan Kanthak

        Fortunately, nobody needs a (Microsoft) C library on Windows: the Win32 API provides all interfaces with its -W functions.
        The -A functions are the legacy trash you wrote about; manifested to abuse UTF-8 they become outright dangerous or just suck — BIG TIME: see FullDisclosure

      • Andreas Schulz

        They did not fix that entirely yet, as you can read with the linked issue above. ReadConsoleA + manifest still does not work.

  • George Tokmaji

    > which is the address of ntdll!ButtonWndProc_A

    Huh, I’d have expected it to be in user32.dll, not ntdll.dll.

    • Stefan Kanthak

      NTDLL.dll also exports NtdllDefWindowProc_A, NtdllDefWindowProc_W, NtdllDialogWndProc_A and NtdllDialogWndProc_W
      Now guess where USER32.dll forwards its DefDlgProcA, DefDlgProcW, DefWindowProcA and DefWindowProcW to.

  • MGetz

    I remember at my first job out of college I had to do work on an extension to a well known piece of publishing software that has largely fallen out of favor that we sold. This was about the time of the x64 transition and I had just finished up the 64bit port of an extension we had to office, a job made all the more fun because someone was doing pinvokes directly to mangled...

    Read more
  • c h · Edited

    Win10, since long time (and maybe a bit related to this topic):

    Virtual Desktop going from say Desktop 1 to Desktop 2, crashes a heck of a lot. I mean, a lot.

    I only notice when VS 2019 is running in D1. Switch to D2 and things just might jitter for a bit (four display monitors, Nvidia), then blooooop, a crashdump gets made and sent. I would include the name but I delete them...

    Read more