November 13th, 2015

Diagnosing high CPU by studying profiling results, example

A customer asked for assistance determining why their program demonstrated sporadic high CPU. It occurred only at a client location, so they couldn’t set up a full debugging environment. They were able to convince the client to trace one of the spikes in a profiler. During this capture, high CPU was recorded for around 20% of the running time. Here is the drill-down of where most of that time was going.

Stack % Weight
[Root] 19.36
 |- ntdll.dll!RtlUserThreadStart 19.36
 |    kernel32.dll!BaseThreadInitThunk 19.36
 |    contoso.exe!__wmainCRTStartup 19.36
 |    contoso.exe!wWinMain 19.36
 | ⋮
 |    fabrikam.dll!Widget::~Widget 19.36
 |    fabrikam.dll!WidgetProxy::Disconnect 19.36
 |    fabrikam.dll!WidgetProxy::DisconnectAll 19.36
 |    fabrikam.dll!WidgetProxy::TransactCommand 19.36
 |    fabrikam.dll!WidgetProxy::WaitForResponse 19.36
 |    |- user32.dll!MsgWaitForMultipleObjectsEx 18.63
 |    |    |- kernel32.dll!WaitForMultipleObjectsExImplementation 17.37
 |    |    |    |- KernelBase.dll!WaitForMultipleObjectsEx 16.55
 |    |    |    |    |- ntdll.dll!ZwWaitForMultipleObjects 8.69
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceCopyEnd 4.76
 |    |    |    |    |    |    |- ntoskrnl.exe!NtWaitForMultipleObjects 4.71
 |    |    |    |    |    |    |    |- ntoskrnl.exe!ObpWaitForMultipleObjects 4.02
 |    |    |    |    |    |    |    |    |- ntoskrnl.exe!ObpWaitForMultipleObjects 4.02
 |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KiInterruptDispatchNoLock <0.01
 |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KiCheckForKernelApcDelivery <0.01
 |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KiDpcInterrupt <0.01
 |    |    |    |    |    |    |    |    |- ntoskrnl.exe!KiInterruptDispatch <0.01
 |    |    |    |    |    |    |    |- ntoskrnl.exe!NtWaitForMultipleObjects 0.49
 |    |    |    |    |    |    |    |- ntoskrnl.exe!memcpy 0.20
 |    |    |    |    |    |    |    |- ntoskrnl.exe!KiInterruptDispatchNoLock <0.01
 |    |    |    |    |    |    |    |- ntoskrnl.exe!KiDpcInterrupt <0.01
 |    |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceCopyEnd 0.05
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemCall64 1.84
 |    |    |    |    |    |- ntdll.dll!ZwWaitForMultipleObjects 1.20
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceExit 0.56
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceRepeat 0.16
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceGdiTebAccess 0.13
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceStart 0.03
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceCopyStart 0.01
 |    |    |    |    |    |- ntoskrnl.exe!KiInterruptDispatchNoLock <0.01
 |    |    |    |    |- KernelBase.dll!BaseSetLastNTError 6.57
 |    |    |    |    |    |- ntdll.dll!RtlNtStatusToDosError 6.33
 |    |    |    |    |    |    |- ntdll.dll!RtlNtStatusToDosErrorNoTeb 6.05
 |    |    |    |    |    |    |    |- ntdll.dll!RtlNtStatusToDosErrorNoTeb 6.04
 |    |    |    |    |    |    |    |- ntoskrnl.exe!KiDpcInterrupt <0.01
 |    |    |    |    |    |    |    |- ntoskrnl.exe!KiApcInterrupt <0.01
 |    |    |    |    |    |    |    |- ntoskrnl.exe!KiInterruptDispatchNoLock <0.01
 |    |    |    |    |    |    |- ntdll.dll!RtlNtStatusToDosError 0.29
 |    |    |    |    |    |    |- ntoskrnl.exe!KiInterruptDispatchNoLock <0.01
 |    |    |    |    |    |- KernelBase.dll!BaseSetLastNTError 0.19
 |    |    |    |    |    |- ntdll.dll!RtlSetLastWin32Error 0.05
 |    |    |    |    |- ntdll.dll!memcpy 0.15
 |    |    |    |    |- KernelBase.dll!memcpy 0.01
 |    |    |    |    |- ntoskrnl.exe!KiApcInterrupt <0.01
 |    |    |    |    |- ntoskrnl.exe!KiDpcInterrupt <0.01
 |    |    |    |- kernel32.dll!WaitForMultipleObjectsExImplementation 0.60
 |    |    |    |- ntdll.dll!memcpy 0.16
 |    |    |    |- kernel32.dll!WaitForMultipleObjectsEx 0.04
 |    |    |    |- kernel32.dll!memcpy 0.02
 |    |    |    |- ntoskrnl.exe!KiInterruptDispatchNoLock <0.01
 |    |    |    |- ntoskrnl.exe!KiApcInterrupt <0.01
 |    |    |- ntoskrnl.exe!KiDpcInterrupt <0.01
 |    |    |- ntoskrnl.exe!KiApcInterrupt <0.01
 |    |- KernelBase.dll!GetTickCount 0.24
 |    |- KernelBase.dll!GetLastError 0.20
 |    |- fabrikam.dll!GetLastError 0.09
 |    |- kernel32.dll!GetLastError 0.02

From this chart, you can see that all of the time is consumed in Widget­Proxy::Wait­For­Response, and most of that time is in Msg­Wait­For­Multiple­Objects­Ex. There is a lot of detail inside that function, so let’s hide everything that contributes less than one percent.

Stack % Weight
 |    fabrikam.dll!WidgetProxy::WaitForResponse 19.36
 |    |- user32.dll!MsgWaitForMultipleObjectsEx 18.63
 |    |    |- kernel32.dll!WaitForMultipleObjectsExImplementation 17.37
 |    |    |    |- KernelBase.dll!WaitForMultipleObjectsEx 16.55
 |    |    |    |    |- ntdll.dll!ZwWaitForMultipleObjects 8.69
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemServiceCopyEnd 4.76
 |    |    |    |    |    |    |- ntoskrnl.exe!NtWaitForMultipleObjects 4.71
 |    |    |    |    |    |- ntoskrnl.exe!KiSystemCall64 1.84
 |    |    |    |    |- KernelBase.dll!BaseSetLastNTError 6.57

If we look only at the time spent in Wait­For­Multiple­Objects­Ex, 8.69%/16.55% = around half of the time is spent in Zw­Wait­For­Multiple­Objects (the kernel part of the function which does the waiting), and 6.57%/16.55% = around 40% of the time is the time setting the last error code.

That’s odd. Why is this function spending nearly half of its time setting the last error code?

My theory: Because the call is failing with an error!

That explains the high CPU. The call to Wait­For­Multiple­Objects­Ex is failing, which means that instead of waiting, it returns immediately with an error code. The Widget­Proxy::Wait­For­Response function doesn’t quite know what to do in that case, so it shrugs its shoulders and tries waiting again. Eventually, whatever it’s waiting for actually happens, and the call returns, but instead of waiting at low CPU, it accidentally created a CPU spin loop.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

0 comments

Discussion are closed.