August 26th, 2020

Analyzing stalls and context switches in Timing Captures

Steven Pratschner
Program Manager

A context switch occurs when a CPU core switches from executing code on one thread to executing code on a different thread, or to idle. When the OS switches the thread that is running on a CPU, it must save state for the thread that is being switched out and restore state for the thread that is being switched in. This state, or context, includes data such as the current values of all registers and the program counter. Saving and restoring this state is expensive from a performance perspective, so reducing the overall number of context switches is often key to achieving predictable, consistent frame times.

Context switches can happen for several reasons, for example when a thread waits on a synchronization object, or when the OS scheduler determines that a thread’s quantum has expired and another thread is ready to run on that core.
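To make the synchronization case concrete, here is a minimal sketch in standard C++ of a thread that waits on a condition variable until another thread signals it. The function name RunBlockingWaitDemo and the payload value are hypothetical and chosen only for illustration. If the worker reaches the wait before the signal arrives, the OS will typically switch it out; in a capture that would likely appear as a stall on the worker, with the signalling thread as the one that readied it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical demo: a worker blocks on a condition variable until the
// calling thread publishes a value and signals it.
// Returns the value the worker observed after it woke up.
int RunBlockingWaitDemo()
{
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int payload = 0;
    int observed = -1;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        // If 'ready' is still false, the worker blocks here. A blocked thread
        // is normally switched out so the core can run something else.
        cv.wait(lock, [&] { return ready; });
        observed = payload;
    });

    {
        std::lock_guard<std::mutex> lock(m);
        payload = 42;
        ready = true;
    }
    // Signalling makes the worker runnable again (it is "readied").
    cv.notify_one();

    worker.join();
    assert(observed == 42);
    return observed;
}
```

Calling RunBlockingWaitDemo() from a title thread is enough to exercise the pattern. Whether the worker actually blocks depends on timing, so a given run may or may not produce a stall.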

Timing Captures in PIX have several features to help you determine when context switches occur, and to analyze what caused each switch.

Collecting context switch data

If you intend to analyze context switches, you’ll want to collect callstacks for each context switch. To collect callstacks, ensure that the Callstacks on context switches checkbox is selected before starting a capture. The checkbox is selected by default.

Image pix newtiming capture cswitch callstacks option

Finding stalls and context switches

By default, context switches are displayed as small, vertical red lines in the thread and core lanes in the Timeline. Periods of time when a thread is not running, known as stalled or unscheduled time, are drawn by default using a cross-hatched pattern. Every period of unscheduled time is bordered by a context switch out at the beginning and a context switch back in at the end. The default visualization for context switches and unscheduled time is shown in the following figure.

Image pix new timing capture core lane cswitch
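The relationship between context switches and unscheduled time can be sketched as a small data model. The following is an illustration only, not PIX's internal representation: the types ThreadSwitch and Stall and the function BuildStalls are hypothetical names. It pairs each switch out on a thread with the next switch back in, and treats the time between them as one stall.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one context switch as seen from a single thread.
struct ThreadSwitch {
    uint64_t timeNs;   // time of the switch, in nanoseconds
    bool switchedIn;   // true: thread started running; false: thread stopped
};

// One period of unscheduled time on the thread.
struct Stall {
    uint64_t beginNs;  // time of the switch out
    uint64_t endNs;    // time of the switch back in
    uint64_t DurationNs() const { return endNs - beginNs; }
};

// Pairs each switch out with the next switch in. The input must be sorted
// by time. A stall that is still open at the end of the input is dropped.
std::vector<Stall> BuildStalls(const std::vector<ThreadSwitch>& switches)
{
    std::vector<Stall> stalls;
    bool haveOut = false;
    uint64_t outTimeNs = 0;
    uint64_t lastTimeNs = 0;

    for (const ThreadSwitch& s : switches) {
        assert(s.timeNs >= lastTimeNs);  // input must be in time order
        lastTimeNs = s.timeNs;

        if (!s.switchedIn) {
            if (!haveOut) {              // ignore a repeated switch out
                haveOut = true;
                outTimeNs = s.timeNs;
            }
        } else if (haveOut) {
            stalls.push_back(Stall{outTimeNs, s.timeNs});
            haveOut = false;
        }
    }
    return stalls;
}
```

For a thread that is switched out at 100 ns, back in at 250 ns, out again at 400 ns and back in at 900 ns, this produces two stalls of 150 ns and 500 ns.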

If you’re looking at a timing capture specifically to find and analyze context switches, you can change the default visualization so that the switches stand out more. The Bold Context Switches option in the lane configuration extends the vertical red lines that identify context switches to the full height of the thread or core lane.

Image pix new timing capture bold cswitch

It is often useful to quickly identify the stalls and context switches that occurred during the execution of a particular PIX event. When an event is selected in the Timeline or Range Details views, the information displayed in the Element Details view includes a list of all the stalls in time order. For example, the following figure shows an instance of an event named UpdatePlayerArmor that had 8 stalls.

Image pix new timing capture cswitch event element details
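If you export or post-process this kind of data yourself, the per-event stall list can be approximated with a simple overlap test. The sketch below is an assumption about how such a list could be computed, not a description of what PIX does internally: TimeRange, StallsDuringEvent and TotalStalledNs are hypothetical names, and clipping stalls to the event boundaries is a choice made here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open time range [beginNs, endNs), used for both
// events and stalls.
struct TimeRange {
    uint64_t beginNs;
    uint64_t endNs;
};

// Returns the stalls that overlap the event, in time order. Each returned
// stall is clipped to the event boundaries (an assumption of this sketch).
std::vector<TimeRange> StallsDuringEvent(std::vector<TimeRange> threadStalls,
                                         const TimeRange& event)
{
    assert(event.beginNs <= event.endNs);

    std::sort(threadStalls.begin(), threadStalls.end(),
              [](const TimeRange& a, const TimeRange& b) {
                  return a.beginNs < b.beginNs;
              });

    std::vector<TimeRange> result;
    for (const TimeRange& s : threadStalls) {
        // Skip stalls that end before the event starts or start after it ends.
        if (s.endNs <= event.beginNs || s.beginNs >= event.endNs) {
            continue;
        }
        result.push_back(TimeRange{std::max(s.beginNs, event.beginNs),
                                   std::min(s.endNs, event.endNs)});
    }
    return result;
}

// Sums the duration of a list of stalls.
uint64_t TotalStalledNs(const std::vector<TimeRange>& stalls)
{
    uint64_t total = 0;
    for (const TimeRange& s : stalls) {
        total += s.endNs - s.beginNs;
    }
    return total;
}
```

Comparing TotalStalledNs against the event's own duration gives a rough measure of how much of the event was spent unscheduled.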

Context switches can also be found using the Range Details view. Select Context switches from the Items to show dropdown to see a list of all context switches for a selected time period. The list of context switches displayed in Range Details can be filtered by core using the Selector panel on the left-hand side of the view.

Image pix new timing capture cswitch range details
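The filtering that Range Details performs can be pictured as a query over a list of context switch records. The following sketch assumes a hypothetical ContextSwitch record and a hypothetical SwitchesInRange function; the field names and the kAnyCore sentinel are illustrative and do not reflect any PIX file format or API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical context switch record.
struct ContextSwitch {
    uint64_t timeNs;       // when the switch occurred
    uint32_t core;         // core the switch occurred on
    uint32_t oldThreadId;  // thread running before the switch
    uint32_t newThreadId;  // thread running after the switch
};

// Sentinel meaning "do not filter by core".
const uint32_t kAnyCore = std::numeric_limits<uint32_t>::max();

// Returns the switches whose time falls in [beginNs, endNs], optionally
// restricted to a single core. The input order is preserved.
std::vector<ContextSwitch> SwitchesInRange(const std::vector<ContextSwitch>& all,
                                           uint64_t beginNs,
                                           uint64_t endNs,
                                           uint32_t core = kAnyCore)
{
    assert(beginNs <= endNs);

    std::vector<ContextSwitch> out;
    for (const ContextSwitch& cs : all) {
        const bool inRange = cs.timeNs >= beginNs && cs.timeNs <= endNs;
        const bool coreMatches = (core == kAnyCore) || (cs.core == core);
        if (inRange && coreMatches) {
            out.push_back(cs);
        }
    }
    return out;
}
```

Passing a core number as the last argument plays the role of the Selector panel. Omitting it returns every switch in the time range.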

Analyzing stalls and context switches

Selecting a context switch in the Timeline or Range Details view populates the Element Details view with information about the context switch. Element Details shows which core the switch occurred on, the time at which the switch occurred, and the callstacks for the thread that was running before and the thread (if any) that was running after the context switch. The following figure shows that a call from a title’s AppMain thread to the Win32 API ReadFile caused a context switch. After the call, AppMain transitioned from running to Idle.

Image pix new timing capture cswitch element details

If a thread begins to run because of an action that occurred on another thread, information from the readying thread is displayed in both the Timeline and in Element Details. The context switch in the following figure shows that Core 0 goes from Idle to executing thread AppMain, and that AppMain was readied by a thread named Worker. The Element Details view includes the callstack for the readying thread, the readying time, and the readying reason. In the Timeline, an arrow is drawn from the readying thread (Worker) to the thread that is now running (AppMain).

Image pix new timing capture cswitch readythread

If a context switch involves a non-title process, PIX will show the process name and ID in Element Details. The following figure shows that Core 0 switched from running the non-title process devenv.exe to running a title thread named AP1. In this case, the wait reason tells us that this switch was initiated by the operating system’s thread scheduler.

Image pix new timing capture cswitch otherproc

In addition to selecting individual context switches, you can select an entire stall. Selecting a stall gives you quick access to all three callstacks related to the stall: the callstack on the switch out, the callstack on the switch back in, and the readying thread callstack, if any. These callstacks are shown in Element Details along with other information about the stall, including its duration, as shown in the following figure.

Image pix new timing capture cswitch stallselect

Author

Steven Pratschner
Program Manager

I'm the Program Manager for the PIX CPU tools in the Gaming Division at Microsoft. PIX helps you identify the performance issues that may be affecting the frame rate of your DX12 AAA title on Windows and on Xbox.