Somebody asked our team for help because they believed they hit a deadlock in their program’s UI. (It’s unclear why they asked our team, but I guess since our team uses the window manager, and their program uses the window manager, we’re all in the same boat. You’d think they’d ask the window manager team for help.)
But it turns out that solving the problem required no special expertise. In fact, you probably know enough to solve it, too.
Here are the interesting threads:
0 Id: 980.d30 Suspend: 1 Teb: 7ffdf000 Unfrozen ChildEBP RetAddr 0023dc90 7745dd8c ntdll!KiFastSystemCallRet 0023dc94 774619e0 ntdll!ZwWaitForSingleObject+0xc 0023dcf8 774618fb ntdll!RtlpWaitOnCriticalSection+0x154 0023dd20 00cd03f2 ntdll!RtlEnterCriticalSection+0x152 0023dd38 00cd0635 myapp!LogMsg+0x15 0023dd58 00cd0c6a myapp!LogRawIndirect+0x27 0023fcb8 00cb64a7 myapp!Log+0x62 0023fce8 00cd7598 myapp!SimpleClientConfiguration::Cleanup+0x17 0023fcf8 00cd8ffe myapp!MsgProc+0x1a9 0023fd10 00cda1a9 myapp!Close+0x43 0023fd24 761636d2 myapp!WndProc+0x62 0023fd50 7616330c USER32!InternalCallWinProc+0x23 0023fdc8 76164030 USER32!UserCallWinProcCheckWow+0x14b 0023fe2c 76164088 USER32!DispatchMessageWorker+0x322 0023fe3c 00cda3ba USER32!DispatchMessageW+0xf 0023fe9c 00cd0273 myapp!GuiMain+0xe8 0023feb4 00ccdeca myapp!wWinMain+0x87 0023ff48 7735c6fc myapp!__wmainCRTStartup+0x150 0023ff54 7742e33f kernel32!BaseThreadInitThunk+0xe 0023ff94 00000000 ntdll!_RtlUserThreadStart+0x23 1 Id: 980.ce8 Suspend: 1 Teb: 7ffdd000 Unfrozen ChildEBP RetAddr 00f8d550 76162f81 ntdll!KiFastSystemCallRet 00f8d554 76162fc4 USER32!NtUserSetWindowLong+0xc 00f8d578 76162fe5 USER32!_SetWindowLong+0x131 00f8d590 74aa5c2b USER32!SetWindowLongW+0x15 00f8d5a4 74aa5b65 comctl32_74a70000!ClearWindowStyle+0x23 00f8d5cc 74ca568f comctl32_74a70000!CCSetScrollInfo+0x103 00f8d618 76164ea2 uxtheme!ThemeSetScrollInfoProc+0x10e 00f8d660 00cdd913 USER32!SetScrollInfo+0x57 00f8d694 00cdf0a4 myapp!SetScrollRange+0x3b 00f8d6d4 00cdd777 myapp!TextOutputStringColor+0x134 00f8d93c 00cd04c4 myapp!TextLogMsgProc+0x3db 00f8d960 00cd0635 myapp!LogMsg+0xe7 00f8d980 00cd0c6a myapp!LogRawIndirect+0x27 00f8f8e0 00cd6367 myapp!Log+0x62 00f8faf0 7735c6fc myapp!remote_ext::ServerListenerThread+0x45c 00f8fafc 7742e33f kernel32!BaseThreadInitThunk+0xe 00f8fb3c 00000000 ntdll!_RtlUserThreadStart+0x23
The thing about debugging deadlocks is that you usually don’t need to understand what’s going on. The diagnosis is largely mechanical once you get your foot in the door. (Though sometimes it’s hard to get your initial footing.)
Let’s look at thread 0.
It is waiting for a critical section.
The owner of that critical section is thread 1.
How do I know that?
Well, I could’ve debugged it, or I could’ve used my
psychic powers to say,
“Gosh, that function is called LogMsg
,
and look there’s another thread that is inside the
function LogMsg
.
I bet that function is using a critical section to ensure
that only one thread uses it at a time.”
Okay, so thread 0 is waiting for thread 1.
What is thread 1 doing?
Well, it entered the critical section back in the
LogMsg
function,
and then it did some text processing and, oh look,
it’s doing a SetScrollInfo
.
The SetScrollInfo
went into comctl32
and ultimately resulted in a SetWindowLong
.
The window that the application passed to
SetScrollInfo
is owned by thread 0.
How do I know that?
Well, I could’ve debugged it, or I could’ve used my
psychic powers to say,
“Gosh, the change in the scroll info has led to a change
in window styles, and the thread is trying to notify the
window of the change in style.
The window clearly belongs to another thread; otherwise we wouldn’t
be stuck in the first place,
and given that we see only two threads, there isn’t much choice
as to what other thread it could be!”
At this point, I think you see the deadlock. Thread 0 is waiting for thread 1 to exit the critical section, but thread 1 is waiting for thread 0 to process the style change message.
What happened here is that the program sent a message while holding a critical section. Since message handling can trigger hooks and cross-thread activity, you cannot hold any resources when you send a message because the hook or the message recipient might want to acquire that resource that you own, resulting in a deadlock.
0 comments