Today we’re going to debug a hang. Here are some of the (redacted) stacks of the process. I left some red herrings and other frustrations.
0: kd> !process ffffe000045ef940 7
PROCESS ffffe000045ef940
SessionId: 1 Cid: 0a50 Peb: 7ff6b661f000 ParentCid: 0a0c
DirBase: 12e5c6000 ObjectTable: ffffc0000288ae80 HandleCount: 1742.
Image: contoso.exe
THREAD ffffe000018d68c0 Cid 0a50.0a54 Teb: 00007ff6b661d000 Win32Thread: fffff90143635a90 WAIT: (WrUserRequest) UserMode Non-Alertable
ffffe000046192c0 SynchronizationEvent
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!KeWaitForMultipleObjects+0x44e
0xfffff960`0038bed0
0x1
0xffffd000`24257b80
0xfffff901`43635a90
0xd
0xffffe000`00000001
0xfffff803`ffffff00
THREAD ffffe000045f88c0 Cid 0a50.0a8c Teb: 00007ff6b64ea000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
ffffe000041c1830 SynchronizationEvent
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`248ebc40)
ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe1
ntdll!RtlEnterCriticalSection+0x94
ntdll!LdrpAcquireLoaderLock+0x2c
ntdll!LdrShutdownThread+0x64
ntdll!RtlExitUserThread+0x3e
KERNELBASE!FreeLibraryAndExitThread+0x4c
combase!CRpcThreadCache::RpcWorkerThreadEntry+0x62
KERNEL32!BaseThreadInitThunk+0x30
ntdll!RtlUserThreadStart+0x42
THREAD ffffe00003c46080 Cid 0a50.0a9c Teb: 00007ff6b64e6000 Win32Thread: fffff90143713a90 WAIT: (UserRequest) UserMode Non-Alertable
ffffe000041c1830 SynchronizationEvent
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`367ece40)
ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe1
ntdll!RtlEnterCriticalSection+0x94
ntdll!LdrpAcquireLoaderLock+0x4c
ntdll!LdrpFindOrMapDll+0x75d
ntdll!LdrpLoadDll+0x394
ntdll!LdrLoadDll+0xc6
kernelbase!LoadLibraryExW+0x142
kernelbase!LoadLibraryExA+0x26
contoso!__delayLoadHelper2+0x2b
contoso!_tailMerge_Winmm_dll+0x3f
contoso!PolarityReverser::OnCompleted+0x28
contoso!PolarityReverser::Reverse+0xf4
contoso!ListItem::ReversePolarity+0x7e
contoso!View::OnContextMenu+0x8
contoso!View::WndProc+0x25e
user32!UserCallWinProcCheckWow+0x13a
user32!DispatchClientMessage+0xf8
user32!__fnEMPTY+0x2d
ntdll!KiUserCallbackDispatcherContinue
user32!ZwUserMessageCall+0xa
user32!RealDefWindowProcWorker+0x1e2
user32!RealDefWindowProcW+0x52
uxtheme!_ThemeDefWindowProc+0x33e
uxtheme!ThemeDefWindowProcW+0x11
user32!DefWindowProcW+0x1b6
comctl32!CListView::WndProc+0x25e
comctl32!CListView::s_WndProc+0x52
user32!UserCallWinProcCheckWow+0x13a
user32!SendMessageWorker+0xa72
user32!SendMessageW+0x10a
comctl32!CLVMouseManager::HandleMouse+0xd10
comctl32!CLVMouseManager::OnButtonDown+0x27
comctl32!CListView::WndProc+0x1a4186
comctl32!CListView::s_WndProc+0x52
user32!UserCallWinProcCheckWow+0x13a
user32!DispatchMessageWorker+0x1a7
THREAD ffffe0000462b8c0 Cid 0a50.0ac0 Teb: 00007ff6b64dc000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
ffffe0000462c980 NotificationEvent
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`201e9c40)
ntdll!ZwWaitForSingleObject+0xa
KERNELBASE!WaitForSingleObjectEx+0xa5
contoso!CNetworkManager::ThreadProc+0x94
KERNEL32!BaseThreadInitThunk+0x30
ntdll!RtlUserThreadStart+0x42
THREAD ffffe000046ad340 Cid 0a50.0b38 Teb: 00007ff6b64b6000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
ffffe000049108c0 Thread
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`2563bc40)
ntdll!ZwWaitForSingleObject+0xa
KERNELBASE!WaitForSingleObjectEx+0xa5
litware!CDiscovery::Uninitialize+0x8c
litware!CApiInstance::~CApiInstance+0x48
litware!CApiInstance::`scalar deleting destructor'+0x14
litware!std::tr1::_Ref_count_obj<CApiInstance>::_Destroy+0x31
litware!std::tr1::_Ref_count_base::_Decref+0x1b
litware!std::tr1::_Ptr_base<CApiInstance>::_Decref+0x20
litware!std::tr1::shared_ptr<CApiInstance>::{dtor}+0x20
litware!std::tr1::shared_ptr<CApiInstance>::reset+0x3c
litware!CSingleton<CApiInstance>::ReleaseRef+0x97
litware!LitWareUninitialize+0xed
fabrikam!CDoodadHelper::~CDoodadHelper+0x67
fabrikam!_CRT_INIT+0xda
fabrikam!__DllMainCRTStartup+0x1e5
ntdll!LdrpCallInitRoutine+0x57
ntdll!LdrpProcessDetachNode+0xfe
ntdll!LdrpUnloadNode+0x77
ntdll!LdrpDecrementNodeLoadCount+0xd0
ntdll!LdrUnloadDll+0x34
KERNELBASE!FreeLibrary+0x22
combase!CClassCache::CDllPathEntry::CFinishObject::Finish+0x28
combase!CClassCache::CFinishComposite::Finish+0x80
combase!CClassCache::FreeUnused+0xda
combase!CoFreeUnusedLibrariesEx+0x2c
combase!CDllHost::MTAWorkerLoop+0x7d
combase!CDllHost::WorkerThread+0x122
combase!CRpcThread::WorkerLoop+0x4e
combase!CRpcThreadCache::RpcWorkerThreadEntry+0x46
KERNEL32!BaseThreadInitThunk+0x30
ntdll!RtlUserThreadStart+0x42
THREAD ffffe000046db8c0 Cid 0a50.0b50 Teb: 00007ff6b64aa000 Win32Thread: fffff9014370da90 WAIT: (UserRequest) UserMode Non-Alertable
ffffe000046dcae0 NotificationEvent
ffffe000046dd3c0 SynchronizationEvent
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForMultipleObjects+0x22b
nt!ObWaitForMultipleObjects+0x1f8
nt!NtWaitForMultipleObjects+0xde
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`21801c40)
ntdll!ZwWaitForMultipleObjects+0xa
KERNELBASE!WaitForMultipleObjectsEx+0xe1
USER32!MsgWaitForMultipleObjectsEx+0x14e
contoso!EventManagerImpl::MessageLoop+0x32
contoso!EventManagerImpl::BackgroundProcessing+0x134
ntdll!TppWorkpExecuteCallback+0x2eb
ntdll!TppWorkerThread+0xa12
KERNEL32!BaseThreadInitThunk+0x30
ntdll!RtlUserThreadStart+0x42
THREAD ffffe000049108c0 Cid 0a50.06cc Teb: 00007ff6b6470000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
ffffe000041c1830 SynchronizationEvent
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13
ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe1
ntdll!RtlEnterCriticalSectionContended+0x94
ntdll!LdrpAcquireLoaderLock+0x2c
ntdll!LdrShutdownThread+0x64
ntdll!RtlExitUserThread+0x3e
KERNEL32!BaseThreadInitThunk+0x38
ntdll!RtlUserThreadStart+0x42
Since debugging is an exercise in optimism, let’s ignore the stacks that didn’t come out properly. If we can’t make any headway, we can try to fix them, but let’s be hopeful that the stacks that are good will provide enough information.
Generally speaking, the deeper the stack, the more interesting it is, because uninteresting threads tend to be hanging out in their message loop or event loop, whereas interesting threads are busy doing something and have a complex stack trace to show for it.
Indeed, one of the deep stacks belongs to thread
0a9c,
and it also has a very telling section:
ntdll!RtlpWaitOnCriticalSection+0xe1
ntdll!RtlEnterCriticalSection+0x94
ntdll!LdrpAcquireLoaderLock+0x4c
ntdll!LdrpFindOrMapDll+0x75d
ntdll!LdrpLoadDll+0x394
ntdll!LdrLoadDll+0xc6
kernelbase!LoadLibraryExW+0x142
kernelbase!LoadLibraryExA+0x26
contoso!__delayLoadHelper2+0x2b
contoso!_tailMerge_Winmm_dll+0x3f
The polarity reverser’s completion handler is trying to
load winmm via delay-load.
That load request is waiting on a critical section,
and it should be clear both from the scenario and the function names
that the critical section it is trying to claim is the loader lock.
In real life, I just proceeded with that conclusion, but
but just for demonstration purposes, here’s how we can double-check:
0: kd> .thread ffffe00003c46080 0: kd> kn *** Stack trace for last set context - .thread/.cxr resets it # Call Site 00 nt!KiSwapContext+0x76 01 nt!KiSwapThread+0x14c 02 nt!KiCommitThreadWait+0x126 03 nt!KeWaitForSingleObject+0x1cc 04 nt!NtWaitForSingleObject+0xb1 05 nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`367ece40) 06 ntdll!ZwWaitForSingleObject+0xa 07 ntdll!RtlpWaitOnCriticalSection+0xe1 08 ntdll!RtlEnterCriticalSection+0x94 09 ntdll!LdrpAcquireLoaderLock+0x4c 0a ntdll!LdrpFindOrMapDll+0x75d 0b ntdll!LdrpLoadDll+0x394 0c ntdll!LdrLoadDll+0xc6 0d kernelbase!LoadLibraryExW+0x142 0e kernelbase!LoadLibraryExA+0x26 0f contoso!__delayLoadHelper2+0x2b 10 contoso!_tailMerge_Winmm_dll+0x3f
We need to grab the critical section passed to
RtlEnterCriticalSection,
but since this is an x64 machine, the parameter was passed
in registers, not on the stack, so we need to figure out where
the rcx register got stashed.
I’m going to assume that the same critical section is
the first (only?) parameter to
RtlpWaitOnCriticalSection.
I don’t know this for a fact, but it seems like a reasonable guess.
The guess might be wrong; we’ll see.
We disassemble the function look to see where it stashes rcx.
0: kd> u ntdll!RtlpWaitOnCriticalSection
mov qword ptr [rsp+18h],rbx
push rbp
push rsi
push rdi
push r12
push r13
push r14
push r15
mov rax,qword ptr [ntdll!__security_cookie (000007ff`3099d020)]
xor rax,rsp
mov qword ptr [rsp+80h],rax
mov r14,qword ptr gs:[30h]
xor r12d,r12d
lea rax,[ntdll!LdrpLoaderLock (00007fff`d4f51cb8)]
mov r15d,r12d
cmp rcx,rax
mov ebp,edx
sete r15b
mov rbx,rcx // ⇐ Bingo
Awesome, we can suck rbx out of the trap frame.
0: kd> .trap ffffd000`367ece40 rax=0000000000000000 rbx=00007fffd4f51cb8 rcx=000007f8136f2c2a rdx=0000000000000000 rsi=00000000000001e8 rdi=0000000000000000 rip=000007f8136f2c2a rsp=000000000cf7f798 rbp=0000000000000000 r8=000000000cf7f798 r9=0000000000000000 r10=0000000000000000 r11=0000000000000344 r12=0000000000000000 r13=0000000000000000 r14=000007f696870000 r15=000000007ffe0382 iopl=0 nv up ei pl zr na po nc cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 ntdll!ZwWaitForSingleObject+0xa: 000007f8`136f2c2a c3 ret
Okay, let’s see if that value in rbx pans out.
0: kd> !cs 0x00007fff`d4f51cb8 ----------------------------------------- Critical section = 0x00007fffd4f51cb8 (ntdll!LdrpLoaderLock+0x0) DebugInfo = 0x00007fffd4f55228 LOCKED LockCount = 0x8 WaiterWoken = No OwningThread = 0x0000000000000b38 RecursionCount = 0x1 LockSemaphore = 0x1A8 SpinCount = 0x0000000004000000
Hooray, we confirmed that this is indeed the loader lock.
I would have been surprised if it had been anything else!
(If you had been paying attention, you would have noticed
the lea rax,[ntdll!LdrpLoaderLock (00007fff`d4f51cb8)]
in the disassembly which already confirms the value.)
We also see that the owning thread is 0xb38. Here’s its stack again:
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`2563bc40)
ntdll!ZwWaitForSingleObject+0xa
KERNELBASE!WaitForSingleObjectEx+0xa5
litware!CDiscovery::Uninitialize+0x8c
litware!CApiInstance::~CApiInstance+0x48
litware!CApiInstance::`scalar deleting destructor'+0x14
litware!std::tr1::_Ref_count_obj<CApiInstance>::_Destroy+0x31
litware!std::tr1::_Ref_count_base::_Decref+0x1b
litware!std::tr1::_Ptr_base<CApiInstance>::_Decref+0x20
litware!std::tr1::shared_ptr<CApiInstance>::{dtor}+0x20
litware!std::tr1::shared_ptr<CApiInstance>::reset+0x3c
litware!CSingleton<CApiInstance>::ReleaseRef+0x97
litware!LitWareUninitialize+0xed
fabrikam!CDoodadHelper::~CDoodadHelper+0x67
fabrikam!_CRT_INIT+0xda
fabrikam!__DllMainCRTStartup+0x1e5
ntdll!LdrpCallInitRoutine+0x57
ntdll!LdrpProcessDetachNode+0xfe
ntdll!LdrpUnloadNode+0x77
ntdll!LdrpDecrementNodeLoadCount+0xd0
ntdll!LdrUnloadDll+0x34
KERNELBASE!FreeLibrary+0x22
combase!CClassCache::CDllPathEntry::CFinishObject::Finish+0x28
combase!CClassCache::CFinishComposite::Finish+0x80
combase!CClassCache::FreeUnused+0xda
combase!CoFreeUnusedLibrariesEx+0x2c
combase!CDllHost::MTAWorkerLoop+0x7d
combase!CDllHost::WorkerThread+0x122
combase!CRpcThread::WorkerLoop+0x4e
combase!CRpcThreadCache::RpcWorkerThreadEntry+0x46
KERNEL32!BaseThreadInitThunk+0x30
ntdll!RtlUserThreadStart+0x42
Reading from the bottom up, we see that this thread is doing
some work on behalf of COM;
specifically, it is freeing unused libraries.
The fabrikam library presumably responded
S_OK to
DllCanUnloadNow,
so COM says,
“Okay, then out you go.”
As part of
DLL_PROCESS_DETACH processing,
the C++ runtime library runs global destructors.
The
CDoodadHelper destructor calls into
the LitWareUninitialize function in
litware.dll.
That function decrements a reference count, and it appears
that the reference count went to zero because it’s destructing
the CApiInstance object.
The destructor for that function calls
CDiscovery::Uninitialize,
and that function waits on a kernel object.
The debugger was kind enough to tell us what the object is:
THREAD ffffe000046ad340 Cid 0a50.0b38 Teb: 00007ff6b64b6000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
ffffe000049108c0 Thread
It’s a thread.
<!–
But in case we didn’t know that,
here’s how we could have figured it out anyway:
Look at the handle that was passed to
WaitForSingleObject:
0: kd> kn
*** Stack trace for last set context - .thread/.cxr resets it
# Call Site
00 nt!KiSwapContext+0x76
01 nt!KiSwapThread+0x14c
02 nt!KiCommitThreadWait+0x126
03 nt!KeWaitForSingleObject+0x1cc
04 nt!NtWaitForSingleObject+0xb1
05 nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd000`2563bc40)
06 ntdll!ZwWaitForSingleObject+0xa
07 KERNELBASE!WaitForSingleObjectEx+0xa5
08 litware!CDiscovery::Uninitialize+0x8c
0: kd> .frame 7
07 KERNELBASE!WaitForSingleObjectEx+0xa5
0: kd> dv
hHandle = 0x00000000`000016e8
dwMilliseconds = 0xffffffff
bAlertable = 0n0
...
The code is waiting infinitely (0xffffffff)
for handle 0x16e8.
Let’s see what that handle refers to.
0: kd> !handle 16e8 f
PROCESS ffffe000045ef940
SessionId: 1 Cid: 0a50 Peb: 7ff6b661f000 ParentCid: 0a0c
DirBase: 12e5c6000 ObjectTable: ffffc0000288ae80 HandleCount: 1742.
Image: contoso.exe
Handle table at ffffc0000288ae80 with 1742 entries in use
16e8: Object: ffffe000049108c0 GrantedAccess: 001fffff (Protected) Entry: ffffc00002b8aba0
Object: ffffe000049108c0 Type: (ffffe000001fff20) Thread
ObjectHeader: ffffe00004910890 (new version)
HandleCount: 2 PointerCount: 32770
That value ffffe000049108c0
matches the value reported by the
thread dump at the start.
–>
Going back to the thread dump at the start, we also can see what thread
ffffe000049108c0 is doing.
Here it is again:
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14c
nt!KiCommitThreadWait+0x126
nt!KeWaitForSingleObject+0x1cc
nt!NtWaitForSingleObject+0xb1
nt!KiSystemServiceCopyEnd+0x13
ntdll!ZwWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe1
ntdll!RtlEnterCriticalSectionContended+0x94
ntdll!LdrpAcquireLoaderLock+0x2c
ntdll!LdrShutdownThread+0x64
ntdll!RtlExitUserThread+0x3e
KERNEL32!BaseThreadInitThunk+0x38
ntdll!RtlUserThreadStart+0x42
That thread is trying to acquire the loader lock so it can
send DLL_THREAD_DETACH notifications.
But the loader lock is held by the
FreeLibrary.
Result: Deadlock, as the two threads are waiting for each other.
(You can also see that thread 0xa8c is stuck
in the same place because it too is trying to exit.)
The underlying problem is that the
Fabrikam DLL is waiting on a thread (indirectly via LitWare)
while inside its own DllMain.
The Fabrikam code could avoid this problem by calling
LitWareUninitialize
when its last object is destroyed
rather than when the DLL is unloaded.
(Of course, it also has to remember to call
LitWareInitialize when its first
object is created.)
0 comments