We are happy to announce the latest WindowsPerf bug-fix release ver. 3.7.2. This release based on ver. 3.5.0-beta is a continuation of WindowsPerf development effort. Current release focuses on extending profiling capabilities and user-space functionality. For previous releases check our GitLab release page.
These updates aim to improve the performance and stability of the Kernel Driver. Please update to the latest version to benefit from these improvements!
Highlights from 3.7.2 Release
This release includes several important updates
- New
wperf man <name>command which allows users to print manual style help about Telemetry Solution data (events, metrics and group of metrics). - Update Telemetry Solution data with Neoverse-V2.
- See Neoverse V2 telemetry JSON for more details.
- We’ve enabled Event Tracing for Windows (ETW) output on the
wperfapplication andwperf-driverby default forReleaseproject configuration.- Use WindowsPerf WPA ETL Plugin to interpret WindowsPerf ETW event injection with WPA. Read plugin installation instructions for more details.
- Various command line options updates:
- Improvements to how we define core(s), for compatibility with Linux perf CLI.
- CPU ranges for
-care now available. Now you can use notation like-c 1-3which corresponds to-c 1,2,3. Use wherever applicable. - New
--cpualias for-c.
- CPU ranges for
- Add time units to
--timeout/sleepand-icommand line options. Now you can use suffixes:msfor milliseconds,sfor seconds,mfor minutes,hfor hours,dfor days. For example--timeout 10sfor 10 seconds timeout.- Note: However,
--timeout 2h30mis not allowed.
- Improvements to how we define core(s), for compatibility with Linux perf CLI.
- Other minor improvements include:
- New FeatureString is added to
wperf --versioncommand to show users whichENABLE_*macros were enabled during driver and application build. - We now correctly escape
\nand\twhen we output JSON withwperf ... --jsoncommands.
- New FeatureString is added to
- Preview support for Arm Statistical Profiling Extension.
Arm Statistical Profiling Extension (SPE) support in WindowsPerf
In addition to our roadmap improvements, we’ve also started working on support for Arm Statistical Profiling Extension (SPE). WindowsPerf added in sample and record command support for the SPE. SPE is an optional feature in ARMv8.2 hardware that allows CPU instructions to be sampled and associated with the source code location where that instruction occurred.
Note: Currently SPE is available on Windows On Arm in Test Mode only, due to OS SPE support limitations. WindowsPerf SPE implementation was tested on Windows 11 Pro 24H2 (OS build 26100.1150).
Users can use the same --annotate and --disassemble interface of WindowsPerf with one exception. To reach out to SPE resources please use -e command with arm_spe_0// options. For example:
> wperf record -e arm_spe_0//` -c 0 --timeout 10 -- workload.exe
SPE detection
Note: SPE support in WindowsPerf is in development! You can enable beta code of SPE support with the ENABLE_SPE macro or just rebuild the project with Debug+SPE configuration.
You can check if your hardware supports SPE or if WindowsPerf can detect it with wperf test command. See below an example of spe_device.version_name property value on a system with feature SPE. FEAT_SPE indicates your hardware has the necessary SPE extension baked in.
>wperf test
Test Name Result
========= ======
...
spe_device.version_name FEAT_SPE
How do I know if WindowsPerf binaries support optional SPE?
You can check the feature string (FeatureString) of both wperf and wperf-driver with the wperf --version command. If FeatureString for both components (wperf and wperf-driver) contains +spe (and spe_device.version_name contains FEAT_SPE) you are good to go!
> wperf --version
Component Version GitVer FeatureString
========= ======= ====== =============
wperf 3.7.2 4338371a +etw-app+spe
wperf-driver 3.7.2 4338371a +trace+spe
How to use SPE from command line
Users can specify SPE filters with arm_spe_0//. We added CLI parser function for -e arm_spe_0/*/ notation for record command. Where * is a comma separated list of supported filters. Currently we support filters. Users can define filters such as store_filter=, load_filter=, branch_filter= or short equivalents like st=, ld= and b=. Use 0 or 1 to disable or enable a given filter. For example:
> wperf record -e arm_spe_0/branch_filter=1/
> wperf record -e arm_spe_0/load_filter=1,branch_filter=0/
> wperf record -e arm_spe_0/st=0,ld=0,b=1/
SPE register PMSFCR_EL1.FT enables filtering by operation type. When enabled PMSFCR_EL1.{ST, LD, B} define the collected types:
STenables collection of store sampled operations, including all atomic operations.LDenables collection of load sampled operations, including atomic operations that return a value to a register.Benables collection of branch sampled operations, including direct and indirect branches and exception returns.
Filters store_filter=, load_filter=, branch_filter= enable these filters.
Example
> wperf record -e arm_spe_0/ld=1/ -c 8 -- cpython\PCbuild\arm64\python_d.exe -c 10**10**100
base address of 'cpython\PCbuild\arm64\python_d.exe': 0x7ff69e251288, runtime delta: 0x7ff55e250000
sampling ...e..........e... done!
======================== sample source: LOAD_STORE_ATOMIC-LOAD-GP/retired+level1-data-cache-access+tlb_access, top 50 hot functions ========================
overhead count symbol
======== ===== ======
93.75 15 x_mul:python312_d.dll
6.25 1 _Py_ThreadCanHandleSignals:python312_d.dll
100.00% 16 top 2 in total
Annotate example with ld=1 filter enabled: enables collection of load sampled operations, including atomic operations that return a value to a register.
WindowsPerf: what’s next
The WindowsPerf ecosystem has recently expanded its capabilities and now supports a Visual Studio extension that is readily available in the Microsoft Marketplace. This integration allows developers to leverage the robust performance analysis of WindowsPerf ver. 3.7.2 directly within the familiar environment of Visual Studio, enhancing productivity and efficiency. Read the extension Wiki pages for more details.
Furthermore, we are excited to announce that our team is actively working on extending this support to Visual Studio Code. This upcoming feature will broaden the accessibility of WindowsPerf, enabling even more developers to optimize their applications with ease. Stay tuned for more updates on this front!
What to expect in the next release?
- Stable support for Statistical Profiling Extension, please note that we may release a version with stable SPE-beta support before release 4.0.0 planned at the end of September.
- Improved documentation on Event Tracing for Windows (ETW) support. We plan to write a comprehensive tutorial on how to use WindowsPerf and its
WPA-plugin-etl.
Where to find us?
For source code and binary releases please visit our WindowsPerf webpage at GitLab. Additional project resources include WindowsPerf Wiki and WindowsPerf JIRA project board.
If you have any questions, issues you would like to raise please visit our WindowsPerf GitLab issue page and create a new issue with a clear description of the problem you’re facing or issue you want help with.
