PIX CPU Tools: Finding and Analyzing Context Switches
A context switch occurs when a CPU core stops executing code on one thread and either starts executing code on a different thread or goes idle. When the OS changes the thread that is running on a core, it must save the state of the thread being switched out and restore the state of the thread being switched in. This state, or context, includes data such as the current values of all registers and the program counter. Saving and restoring this state is expensive, so reducing the overall number of context switches is often key to achieving predictable, consistent frame times.
Context switches can happen for several reasons, for example when a thread waits on a synchronization object, or when the OS scheduler determines that a thread’s quantum has expired and another thread is ready to run on that core.
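As a rough illustration of the first case, the sketch below blocks one thread on a standard C++ condition variable until a worker thread publishes a value. The function name WaitForWorkerResult and the value 42 are made up for this example and have nothing to do with PIX itself. If the value is not ready when the wait begins, the OS is free to switch the waiting thread out, and the worker's notification is what makes it ready to run again.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: blocks the calling thread until a worker publishes a value.
int WaitForWorkerResult() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int result = 0;

    std::thread worker([&] {
        {
            std::lock_guard<std::mutex> lock(m);
            result = 42;   // arbitrary illustrative value
            ready = true;
        }
        cv.notify_one();   // makes the waiting thread ready to run again
    });

    {
        std::unique_lock<std::mutex> lock(m);
        // If `ready` is still false, wait() blocks here. While blocked, the
        // thread is not scheduled, so the core can run something else.
        cv.wait(lock, [&] { return ready; });
    }

    worker.join();
    assert(ready);
    return result;
}
```

Whether a switch actually happens on a given run depends on timing: if the worker finishes before the wait starts, the predicate is already true and the caller never blocks.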
The new implementation of Timing Captures in PIX has several features to help you determine when context switches occur, and to analyze what caused each switch.
Finding Context Switches
Context switches are shown in the Timeline as vertical red lines on both the Core and Thread lanes.
Closely related to the display of context switches in the Timeline is the display of unscheduled, or “swapped out”, time. Unscheduled time is shown as a cross-hatched pattern in the lane. Every period of unscheduled time is bordered by a context switch out at the beginning and a context switch back in at the end.
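The pairing of a switch out with the following switch back in can be sketched in a few lines of C++. The types below (ContextSwitch, SwitchKind, Stall) and the function FindStalls are hypothetical names invented for this illustration; they are not a PIX data format or API. The sketch assumes the switches belong to a single thread and are already sorted by time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one context switch on a single thread.
enum class SwitchKind { Out, In };

struct ContextSwitch {
    std::uint64_t timeNs;  // timestamp in nanoseconds
    SwitchKind kind;       // switched off a core, or back onto one
};

// One period of unscheduled time, bordered by an Out and the next In.
struct Stall {
    std::uint64_t startNs;
    std::uint64_t endNs;
    std::uint64_t lengthNs() const { return endNs - startNs; }
};

// Pairs each switch out with the next switch in. A trailing Out with no
// matching In (thread still swapped out at the end of the data) is ignored.
std::vector<Stall> FindStalls(const std::vector<ContextSwitch>& switches) {
    std::vector<Stall> stalls;
    bool swappedOut = false;
    std::uint64_t outTimeNs = 0;

    for (const ContextSwitch& cs : switches) {
        if (cs.kind == SwitchKind::Out) {
            if (!swappedOut) {
                swappedOut = true;
                outTimeNs = cs.timeNs;
            }
        } else if (swappedOut) {
            assert(cs.timeNs >= outTimeNs);  // input must be time-ordered
            stalls.push_back({outTimeNs, cs.timeNs});
            swappedOut = false;
        }
    }
    return stalls;
}
```

Each Stall produced this way corresponds to one cross-hatched region in a thread lane, and its length is the time the thread spent unscheduled.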
The display of context switches and unscheduled time in the Timeline is useful for quickly seeing when a frame or region of time has an excessive number of context switches. The Metrics view can often be used to find regions of time that may contain a large number of context switches. While PIX doesn’t currently support graphing the number of context switches or the length of stalls directly, it does allow graphing event durations. Often, if you graph the duration of your CPU frame event over the length of the capture, the spikes in frame time correspond to frames that contain a large number of context switches.
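If you have frame durations available outside of PIX (for example, from your own logging), the same idea can be approximated in code. The sketch below flags frames that take noticeably longer than the median frame. The function name FindFrameSpikes and the default factor of 1.5 are arbitrary choices for illustration, not something PIX defines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of frames whose duration exceeds `factor` times the
// median frame duration. Hypothetical helper, not part of PIX.
std::vector<std::size_t> FindFrameSpikes(const std::vector<double>& frameMs,
                                         double factor = 1.5) {
    assert(factor > 0.0);
    std::vector<std::size_t> spikes;
    if (frameMs.empty()) {
        return spikes;
    }

    // Sort a copy so the caller's frame order is preserved.
    std::vector<double> sorted = frameMs;
    std::sort(sorted.begin(), sorted.end());
    const double median = sorted[sorted.size() / 2];  // upper median if even

    for (std::size_t i = 0; i < frameMs.size(); ++i) {
        if (frameMs[i] > factor * median) {
            spikes.push_back(i);
        }
    }
    return spikes;
}
```

The flagged frames are only candidates: a long frame can have other causes, so each one still needs to be inspected in the Timeline to confirm that context switches are responsible.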
Once you find a frame or region of time that contains several context switches, you’ll often want to navigate sequentially through all the context switches that occurred during that time period.
The PIX UI offers a few different options for sequentially navigating through context switches. First, the Element Details view includes a chronological list of all stalls that occurred while a selected event is running. If you have an event that represents a frame of CPU time, you can use Element Details to navigate through all context switches that occurred in a particular frame. For example, the following picture shows a selected frame-level event that includes 22 stalls.
Element Details includes an Out and an In button for each context switch. Pressing these buttons will navigate to the switch in the timeline and populate Element Details with information about the switch. The length of each stall, in nanoseconds, is displayed for each pair of context switches.
Note that pressing the Out and In buttons changes the contents of Element Details. To return to the list of stalls for your selected event, use the back button in the upper left corner of Element Details.
Another way to navigate through a list of context switches is to use the Range Details view. This technique is similar in some ways to navigating using the list of stalls in Element Details, with a key difference being that Range Details will list the context switches for all threads and cores in the region of time you’ve selected, not just the context switches for a single PIX event.
After selecting a range of time, use the Items to Show dropdown to view the list of all context switches that occurred during that time range.
By default, the list of context switches in Range Details is shown in chronological order. Clicking on one of the columns in the table changes the sort order.
Selecting a context switch in Range Details populates Element Details with information about the context switch, including its callstack. The up and down arrow keys can be used to quickly navigate the list of context switches.
Analyzing Context Switches
The primary view used to determine why a context switch occurred is Element Details. When a context switch is selected either in the Timeline or Range Details, the Element Details view is populated with additional data about that context switch. This data includes the time at which the context switch occurred, the core on which it occurred, and callstacks for the From, To, and Readying threads.
Note that for context switch callstacks to be collected, you must enable the Capture callstacks on context switches option when you start the capture. This option is on by default.
You’ll also want to make sure that you’ve configured PIX to be able to locate your title’s PDBs. Without the PDBs, the context switch callstacks displayed in Element Details will contain only function addresses, not function names.
By studying the From, To and Readying callstacks, you’ll be able to determine why a context switch occurred. For example, the context switch in the following diagram shows the render thread beginning to run again because the main thread dispatched some rendering work.
The To and From threads in a context switch won’t necessarily both be threads in your title. It’s quite possible for a thread in your title to be switched out in favor of a thread in a different process. When this occurs, Element Details includes the process ID for the non-title process.
The fact that Element Details shows the process ID but not the process name is a bug that we’ll fix in a future release. For now, you’ll have to use a utility like tlist or Task Manager to map the process ID to a name. For example, the following picture shows a context switch where one of my title threads was swapped out to run Microsoft Teams. I used Task Manager to identify the external process in this case.
The core lanes also contain information about which non-title process was running at any point in time. In the picture above, the core lanes tell me that core 5 switched from running my game to running Teams.
By default, the context switch information for threads in non-title processes does not contain callstacks. If you’d like to capture callstacks for all processes, uncheck the Limit Context Switch Callstacks to Target Process Only option in Settings. But be forewarned: collecting callstacks for all processes is expensive!
Program Manager – PIX team