Performance Monitor Unit (PMU) events are used to measure CPU performance and to characterize how workloads use the CPU. Windows provides a way to collect PMU events through Event Tracing for Windows (ETW). Combined with other ETW events, they tell a much more concrete story about performance. I recently added “Recording Hardware performance (PMU) Events” to the docs.microsoft.com site. Using hardware counters is an advanced subject, and I couldn’t include all the specific details there. This article recaps that documentation with a focus on how to use perf tools such as WPR and Xperf to collect PMU events. I will keep the same order of subjects, so it is a good idea to keep the prerequisite document open on the side.
Prerequisite reading: Recording Hardware performance (PMU) Events
First things first, let's define the terms that are used frequently.
- PMU – Performance Monitoring Unit in CPU
- PMU events – Processor performance events, aka profile sources, not to be confused with Windows ETW events
- PMC – Performance monitoring counter in CPU, which counts PMU events.
Enumerating PMU events supported in the system
First, list the PMU events supported on your system with the command wpr -pmcsources or xperf -pmcsources.
The output shows the IDs and names of the available PMU events (aka profile sources), their default interval and interval range, and the logger number (session) that is currently using each event.
Logger column
The logger number shows up only if a session is collecting PMU events on ETW events and the OS is Windows 11 build 22453 or later. Also, the pmcsession command shows information about the sessions that use PMU events on ETW events. The same command exists for Xperf.exe. Its output is mostly redundant with pmcsources, except for ‘Hook Ids’.
PMU events are for one session only!
The logger number in the command output is useful for checking whether anyone else is using a PMU event. If another session is using the PMU event, the slot is taken, and you will not get that PMU event in your session. In the picture above, the session with logger number 32 is using TotalCycles and InstructionRetired. If you start a trace that uses those PMU events, you won’t find them in your trace. To make sure PMU events make it into your session, it helps to set Strict="true" on the <HardwareCounter> element in the custom profile, or to use the -strict option in Xperf, so that the trace simply fails to start. WPR returns the error 0x800700aa if any of the PMU events are already in use.
If the PMU event resource is already in use, you will have to stop the session that holds it first (session #32 in this example). Refer to the previous posts if you need help enumerating sessions and stopping a session.
One more thing to consider is that the pmcsources output only shows the logger number of a session that is collecting PMU events on ETW events. If a session is collecting PMU events on a sampling basis, wpr -pmcsources won’t show its logger number. In that case, enumerate the sessions using xperf and check whether any system session is using the “pmc_profile” keyword, since that keyword is what enables sampling on PMU events.
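The error code 0x800700aa mentioned above is easier to interpret once it is split into its HRESULT fields. The sketch below is a minimal illustration and not part of WPR: the helper name decode_hresult is hypothetical, and the only facts it relies on are the standard HRESULT layout and the Win32 error code ERROR_BUSY (170, "The requested resource is in use").

```python
FACILITY_WIN32 = 7   # HRESULT facility used for wrapped Win32 error codes
ERROR_BUSY = 170     # Win32 error: "The requested resource is in use."


def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into failure bit, facility, and code (hypothetical helper)."""
    hresult &= 0xFFFFFFFF
    return {
        "failure": bool(hresult >> 31),      # bit 31: severity (1 = failure)
        "facility": (hresult >> 16) & 0x7FF, # bits 16-26: facility
        "code": hresult & 0xFFFF,            # bits 0-15: error code
    }


fields = decode_hresult(0x800700AA)
if fields["facility"] == FACILITY_WIN32 and fields["code"] == ERROR_BUSY:
    print("PMU event resource is busy: another session is probably using it")
```

Read this way, the error matches the situation described above: the counter resource is held by another session.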
Collecting PMU Events
CPUs have multiple Performance Monitoring Counters (PMCs) that can count PMU events, and ETW provides ways to collect those counter values. You can collect PMU events either whenever certain kernel events fire or on a sampling basis, that is, whenever a PMC interrupt fires due to counter overflow.
The number of counters differs from platform to platform, and the mapping between counters and PMU events may not be one to one. Also, many PMU events can only be mapped to a subset of the counters. We have not yet implemented support for that constraint, either by allowing explicit mapping or by letting the constraint be described in the configuration data and then searching for an available mapping.
Collecting PMU Events on ETW Events – Example WPRP profile
Below is a custom profile that logs the TotalCycles and InstructionRetired counter values on CSwitch events. You can add more ETW events, up to four, but consider choosing events that form a start/stop pair to make the trace more useful. CSwitch is an event that is both a beginning and an end, since it ends execution of one thread and begins another.
To start and stop the trace, use wpr -start profilename.wprp -filemode and wpr -stop filename.etl.
<?xml version="1.0" encoding="utf-8"?>
<WindowsPerformanceRecorder Version="1.0">
  <Profiles>
    <SystemCollector Id="SystemCollector_PMC" Base="" Name="WPRSystemCollector">
      <BufferSize Value="1024" /> <!-- PMU events can be verbose. -->
      <Buffers Value="256" />
    </SystemCollector>
    <SystemProvider Id="SystemProvider_ProcThreadForEventCounters">
      <Keywords>
        <Keyword Value="Loader"/>
        <Keyword Value="ProcessThread"/>
        <Keyword Value="CSwitch"/> <!-- Need to turn on CSwitch system keyword to trace CSwitch event in HardwareCounter -->
      </Keywords>
    </SystemProvider>
    <HardwareCounter Id="HardwareCounters_EventCounters" Strict="true"> <!-- Optional Strict attribute to hard fail on any issues -->
      <Counters>
        <Counter Value="TotalCycles"/> <!-- Make sure the event is not used by another session. Only one session can use the event -->
        <Counter Value="InstructionRetired" />
      </Counters>
      <Events>
        <Event Value="CSwitch"/> <!-- Counter values are logged on this event. Turn on appropriate keyword in SystemProvider -->
      </Events>
    </HardwareCounter>
    <!-- Profile to capture the counters with ETW events -->
    <Profile Id="PMCE.Verbose.File" LoggingMode="File" Name="PMCE" DetailLevel="Verbose" Description="PMC Test">
      <Collectors Operation="Add">
        <SystemCollectorId Value="SystemCollector_PMC">
          <SystemProviderId Value="SystemProvider_ProcThreadForEventCounters"></SystemProviderId>
          <HardwareCounterId Value="HardwareCounters_EventCounters"></HardwareCounterId>
        </SystemCollectorId>
      </Collectors>
    </Profile>
  </Profiles>
</WindowsPerformanceRecorder>
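To illustrate why CSwitch works as a start/stop pair, here is a minimal sketch of how counter values logged on consecutive context switches could be turned into per-thread totals and instructions per cycle (IPC). Everything in it is an assumption for illustration: the records are hand-written, the tuple layout (CPU, thread switched in, TotalCycles, InstructionRetired) is hypothetical, and it assumes the logged values are running per-CPU counts, so you would have to adapt it to however you extract the events from your trace.

```python
from collections import defaultdict

# Hypothetical records, one per CSwitch event:
# (cpu, thread id switched in, TotalCycles value, InstructionRetired value)
switches = [
    (0, 100, 1_000, 500),
    (0, 200, 5_000, 2_500),
    (0, 100, 6_000, 4_500),
    (0, 0, 10_000, 6_500),
]


def per_thread_totals(records):
    """Attribute the counter delta between two switches on a CPU to the thread that ran."""
    last = {}                             # cpu -> (tid, cycles, instructions) at previous switch
    totals = defaultdict(lambda: [0, 0])  # tid -> [cycles, instructions]
    for cpu, new_tid, cycles, instructions in records:
        if cpu in last:
            tid, prev_cycles, prev_instructions = last[cpu]
            totals[tid][0] += cycles - prev_cycles
            totals[tid][1] += instructions - prev_instructions
        last[cpu] = (new_tid, cycles, instructions)
    return dict(totals)


def ipc(cycles, instructions):
    """Instructions per cycle; returns 0.0 when no cycles were counted."""
    return instructions / cycles if cycles else 0.0


for tid, (cycles, instructions) in per_thread_totals(switches).items():
    print(tid, cycles, instructions, ipc(cycles, instructions))
```

Each switch closes the interval of the outgoing thread and opens the interval of the incoming one, which is why a single event type is enough here.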
Sampling on PMC Overflow – Example WPRP Profile
Below is a custom profile that samples PMU events, each at its own interval.
<?xml version="1.0" encoding="utf-8"?>
<WindowsPerformanceRecorder Version="1.0">
  <Profiles>
    <SystemCollector Id="SystemCollector_PMC" Base="" Name="WPRSystemCollector">
      <BufferSize Value="1024" /> <!-- PMU events can be verbose. -->
      <Buffers Value="256" />
    </SystemCollector>
    <SystemProvider Id="SystemProvider_ProcThreadForSamplingCounters" Base="">
      <Keywords>
        <Keyword Value="Loader"/>
        <Keyword Value="ProcessThread"/>
        <Keyword Value="PmcProfile" /> <!-- required for sampling counters-->
      </Keywords>
    </SystemProvider>
    <HardwareCounter Id="HardwareCounters_SamplingCounters" Strict="true"> <!-- Optional Strict attribute to hard fail of any issues -->
      <SampledCounters>
        <SampledCounter Value="CacheMisses" /> <!-- sampling(periodic) counters using the default interval-->
        <SampledCounter Value="BranchMispredictions" Interval="100000"/> <!-- interval is number of events -->
      </SampledCounters>
    </HardwareCounter>
    <!-- Profile to capture the counters on PMC overflow aka. rollover -->
    <Profile Id="PMC.Verbose.File" Base="" LoggingMode="File" Name="PMC" DetailLevel="Verbose" Description="PMC Test">
      <Collectors Operation="Add">
        <SystemCollectorId Value="SystemCollector_PMC">
          <SystemProviderId Value="SystemProvider_ProcThreadForSamplingCounters"></SystemProviderId>
          <HardwareCounterId Value="HardwareCounters_SamplingCounters"></HardwareCounterId>
        </SystemCollectorId>
      </Collectors>
    </Profile>
  </Profiles>
</WindowsPerformanceRecorder>
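Because the interval is a number of events, each sample stands for roughly one interval's worth of events. The sketch below shows that arithmetic on made-up data: the list of process names is hypothetical (one entry per sample), and INTERVAL simply mirrors the Interval="100000" set for BranchMispredictions in the profile above. Treat the result as an estimate, not an exact count.

```python
from collections import Counter

INTERVAL = 100_000  # events per sample, as configured for BranchMispredictions above

# Hypothetical data: the process that was running when each sample fired
sampled_processes = ["app.exe", "app.exe", "svc.exe", "app.exe"]


def estimate_events(samples, interval):
    """Estimate events per process as (number of samples) x (events per sample)."""
    return {name: count * interval for name, count in Counter(samples).items()}


for name, events in estimate_events(sampled_processes, INTERVAL).items():
    print(f"{name}: ~{events} branch mispredictions")
```

A smaller interval gives finer attribution at the cost of more samples, which is one reason the buffer settings in the profile are generous.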
Configuring Extended PMU Counter Configurations
Configuration using WPRP Custom Profile
The example below shows how to define extended PMU counters in a custom profile. Unlike registration through the registry, this registration does not persist across a system reboot unless you set Persist="true".
<?xml version="1.0" encoding="utf-8"?>
<WindowsPerformanceRecorder Version="1.0">
  <Profiles>
    <SystemCollector Id="SystemCollector_PMC" Name="WPRSystemCollector">
      <BufferSize Value="1024" /> <!-- PMU events can be verbose. -->
      <Buffers Value="256" />
    </SystemCollector>
    <SystemProvider Id="SystemProvider_ProcThreadForSamplingCounters">
      <Keywords>
        <Keyword Value="Loader"/>
        <Keyword Value="ProcessThread"/>
        <Keyword Value="PmcProfile" /> <!-- required for sampling counters-->
      </Keywords>
    </SystemProvider>
    <MicroArchitecturalConfig Id="CounterConfig_Mine"> <!-- Optional Strict attribute to hard fail of any issues -->
      <ProfileSources Architecture="AMD" Family="6" Model="158" Stepping="10"> <!-- The values are examples only! -->
        <ProfileSource Name="UOPS_ISSUED.ANY" Event="0xE" Unit="0x01" Interval="0x02000003" AllowsHalt="false" Persist="false"/>
        <ProfileSource Name="DTLB_LOAD_MISSES.WALK_COMPLETED_4K" Event="0x8" Unit="0x02" Interval="0x02000003" AllowsHalt="false" Persist="false"/>
        <ProfileSource Name="L1D_PEND_MISS.PENDING_CYCLES_ANY" Event="0x48" Unit="0x01" Interval="0x02000003" ExtendedBits="01000100"/>
      </ProfileSources>
    </MicroArchitecturalConfig>
    <HardwareCounter Id="HardwareCounters_MyCounters">
      <MicroArchitecturalConfigId Value="CounterConfig_Mine"></MicroArchitecturalConfigId>
      <SampledCounters>
        <SampledCounter Value="UOPS_ISSUED.ANY" Interval="2147483647"/>
        <SampledCounter Value="DTLB_LOAD_MISSES.WALK_COMPLETED_4K" Interval="2147483647"/>
        <SampledCounter Value="L1D_PEND_MISS.PENDING_CYCLES_ANY" Interval="2147483647"/>
      </SampledCounters>
    </HardwareCounter>
    <Profile Id="PMCExtended.Verbose.File" LoggingMode="File" Name="PMCExtended" DetailLevel="Verbose" Description="PMC Test">
      <Collectors Operation="Add">
        <SystemCollectorId Value="SystemCollector_PMC">
          <SystemProviderId Value="SystemProvider_ProcThreadForSamplingCounters"></SystemProviderId>
          <HardwareCounterId Value="HardwareCounters_MyCounters"></HardwareCounterId>
        </SystemCollectorId>
      </Collectors>
    </Profile>
  </Profiles>
</WindowsPerformanceRecorder>
Further Reading
CPU vendors such as ARM, Intel, and AMD each publish detailed technical reference manuals for the PMU events available on their platforms. Here are a couple of starting points.
- For ARM, see the Performance Monitor Unit sections under the specific processor being used. Documentation – Arm Developer
- For Intel, see the Performance Monitor Unit section in Volume 3B of the Intel 64 and IA-32 Architectures Software Developer’s Manual. Downloadable JSON counter descriptions for individual architectures are available at https://download.01.org/perfmon