Analyzing CPU samples in Timing Captures
PIX includes a CPU sampling profiler that can optionally be run when taking a Timing Capture. Collecting CPU samples allows you to analyze how functions within your title are impacting performance.
Viewing CPU samples is useful in several scenarios. For example, CPU samples can help you determine what code is running on a thread or core during portions of your title that have sparse or no instrumentation with PIX events. Diagnosing performance issues with CPU samples in this scenario is more efficient than adding instrumentation to your title, rebuilding, and redeploying.
In addition, an aggregated view of the CPU samples taken over a relatively long capture helps you find the functions that account for the most CPU time, whether because they are executed frequently or because they take a long time to execute.
Collecting CPU Samples
Before collecting samples, PIX must be configured to find your title’s PDBs. Symbol information in the PDBs is used to display function names for the CPU samples PIX collects. See Configuring PIX to access CPU PDBs for more information.
To collect CPU samples, select the CPU Samples checkbox in the CPU section of the Timing Capture options pane. A sampling interval must also be specified using the Sampling Rate dropdown. The three built-in options for sampling interval let you balance the increased resolution you’ll get from more CPU samples against the additional overhead a higher sampling rate incurs.
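As a rough illustration of that trade-off, the sketch below estimates how many samples would land inside an event of a given duration at different sampling rates. The function name and the rates are hypothetical and chosen only for illustration; they are not taken from the Sampling Rate dropdown, and the estimate assumes a single thread that runs continuously for the whole event.

```python
def expected_samples(duration_ms: float, rate_hz: float) -> float:
    """Average number of samples expected on one continuously running thread."""
    return duration_ms * rate_hz / 1000.0


# Hypothetical rates for illustration only; check the Sampling Rate
# dropdown for the options PIX actually offers.
for rate_hz in (1_000, 4_000, 8_000):
    count = expected_samples(duration_ms=4.0, rate_hz=rate_hz)
    print(f"{rate_hz} Hz: about {count:.0f} samples in a 4 ms event")
```

A short event may receive only a handful of samples at a low rate, which is why a higher rate gives more detail at the cost of more overhead.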
When a Timing Capture is opened, PIX displays information about the samples collected in the Timeline, Element Details and Range Details views.
CPU samples in the Timeline and Element Details views
Individual CPU samples are shown on the Thread and Core lanes in the Timeline as vertical black lines just above the thread or core indicator bar. Selecting a sample populates the Element Details view with the sample’s callstack.
The following figure shows a PIX event named RunSimulation that takes several milliseconds to complete and has no child PIX events. Selecting individual samples gives you more detail about what code is running during this period of time in which the title hasn’t been instrumented with more granular PIX events. For example, looking at the callstack for the selected sample in the following figure shows that a function named UpdateEnemyPositions is running and that it is allocating memory using the new operator.
Customizing the display of CPU Samples
The display of CPU samples can be customized using either the gear icon next to a thread or core lane, or the Lane Configuration panel. You can change the height of the vertical black line that represents a sample, and choose whether samples are shown in the lane at all, as shown in the following figure.
Analyzing aggregated CPU samples
When a range of time is selected in the Timeline, the Range Details view can be used to see an aggregation of all samples that occurred in the selected time range. The samples are aggregated across all threads and cores and can be viewed either as a stack tree or a flat function list. Select Sampled Functions from the Items to show dropdown in Range Details to view the aggregated samples.
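Conceptually, selecting a time range gathers every sample whose timestamp falls inside the range, regardless of which thread or core it was taken on. The following is a minimal sketch of that idea, not PIX's implementation. The `Sample` type, its fields, and the inclusive range bounds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; the field names are illustrative."""
    time_ms: float
    thread: str
    stack: tuple  # function names, root first


def samples_in_range(samples, start_ms, end_ms):
    """Keep samples from every thread whose timestamp is in [start_ms, end_ms]."""
    return [s for s in samples if start_ms <= s.time_ms <= end_ms]


captured = [
    Sample(0.5, "Main", ("RunSimulation",)),
    Sample(1.5, "Main", ("RunSimulation", "UpdateEnemyPositions")),
    Sample(1.7, "Worker", ("ProcessJobs",)),
    Sample(3.2, "Worker", ("ProcessJobs",)),
]

selected = samples_in_range(captured, start_ms=1.0, end_ms=2.0)
print([(s.thread, s.stack[-1]) for s in selected])
```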
Viewing the aggregated samples as a stack tree allows you to see which callstacks occurred most frequently in the selected time range. For each function in the stack tree, columns show the inclusive and exclusive sample counts (the number of times a sample landed in the function or its children, and in the function itself) along with the corresponding inclusive and exclusive percentages.
The following figure shows the aggregated stack tree for a range of time that corresponds to an instance of the RunSimulation PIX event. The Inc % column in the stack tree shows that 33.33% of the samples occurred in the UpdateEnemyPositions function or its children.
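To make the inclusive and exclusive columns concrete, here is a minimal sketch of how callstacks can be aggregated into a stack tree. It is an illustration under stated assumptions, not how PIX computes its columns. The callstacks are invented, with UpdatePhysics added as a hypothetical sibling function, and are chosen so that UpdateEnemyPositions ends up with an inclusive share of 33.33%.

```python
from collections import defaultdict


def aggregate_stack_tree(samples):
    """Map each call path (root first) to [inclusive, exclusive] sample counts."""
    counts = defaultdict(lambda: [0, 0])
    for stack in samples:
        if not stack:
            continue
        # Inclusive: the sample counts for every frame along its call path.
        for depth in range(1, len(stack) + 1):
            counts[tuple(stack[:depth])][0] += 1
        # Exclusive: the sample counts only for the frame it landed in.
        counts[tuple(stack)][1] += 1
    return dict(counts)


# Invented callstacks for illustration; not data from a real capture.
samples = [
    ("RunSimulation", "UpdateEnemyPositions", "operator new"),
    ("RunSimulation", "UpdateEnemyPositions"),
    ("RunSimulation", "UpdatePhysics"),
    ("RunSimulation", "UpdatePhysics"),
    ("RunSimulation", "UpdatePhysics"),
    ("RunSimulation",),
]

tree = aggregate_stack_tree(samples)
total = len(samples)
for path in sorted(tree):
    inclusive, exclusive = tree[path]
    indent = "  " * (len(path) - 1)
    print(f"{indent}{path[-1]}: inc={inclusive} ({100.0 * inclusive / total:.2f}%), "
          f"exc={exclusive} ({100.0 * exclusive / total:.2f}%)")
```

In this made-up data, two of the six samples pass through UpdateEnemyPositions, but only one of them landed in the function itself; the other landed in operator new beneath it.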
To view the aggregated samples as a flat list, open the Display Options panel and switch the slider from Stack Tree to Function List. The Display Options panel can also be used to customize which columns appear in the events list.
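The difference between the two views can be sketched as follows: a flat function list merges every occurrence of a function into one row, no matter which call path led to it. This is an illustrative sketch with invented callstacks. Counting a function at most once per sample for the inclusive total is an assumption here, since the source does not say how PIX treats recursive calls.

```python
from collections import Counter


def aggregate_function_list(samples):
    """Return (inclusive, exclusive) per-function sample counts."""
    inclusive = Counter()
    exclusive = Counter()
    for stack in samples:
        if not stack:
            continue
        # Assumption: count each function once per sample, even if it
        # appears several times on the stack because of recursion.
        inclusive.update(set(stack))
        exclusive[stack[-1]] += 1
    return inclusive, exclusive


# Invented callstacks (root first); "operator new" is reached via two callers.
samples = [
    ("RunSimulation", "UpdateEnemyPositions", "operator new"),
    ("RunSimulation", "UpdatePhysics", "operator new"),
    ("RunSimulation", "UpdatePhysics"),
]

inclusive, exclusive = aggregate_function_list(samples)
for name, count in sorted(inclusive.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name}: inc={count}, exc={exclusive[name]}")
```

A stack tree would show operator new as two separate nodes, one under each caller, while the flat list reports a single row that combines both.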