Missing perf_event Features
and when they finally made it into a mainline kernel
Note this lists changes/fixes relevant to the low level interface
as used by PAPI developers.
Many other changes are made release to release, mainly
to enhance the perf tool itself. I don't track those.
Issues that Still need to be Addressed
Note that some of these fixes were backported to previous
- Fixed-counters 0 and 1 on Intel processors (for retired instructions and unhalted core cycles)
give slightly different results than the same event in a
general-purpose counter. There is no way to specify you want
the fixed vs general event.
- AMD Lightweight Profiling (LWP) - userspace-only (only minimal
context-switch support needed in kernel). This depends on
advanced xsave support, which seems to have been rejected and
then abandoned due to the kernel devs demanding that any
perf counter access go through the kernel interface.
- Throttling - perf_events throttles the PMU interrupt; in some
cases this can interfere with measurements
- NMI issues - the PMU uses NMI interrupts. There have been some ongoing
problems with 2.6.36-rc where spurious NMIs are not being
handled properly (either the PMU code is too eager and eats
NMIs not from the PMU, or else when trying to fix this some
PMU interrupts might be lost)
- NMI watchdog bug -- if the NMI watchdog is enabled then schedulability
checking is broken. You can add more events than can fit (with the
watchdog stealing one) and not find out until read time.
- Heterogeneous processors. The ARM big/little
Cortex-A7/Cortex-A15 chips may involve context switches to chips
with different PMUs, with all the inherent complications.
- Intel QoS/RMI Cache analysis
- DTLB load misses event wrong on ivybridge
it should be 0x8108 DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK
not the sandybridge DTLB_LOAD_MISSES.CAUSES_A_WALK
- The PERF_SAMPLE_DATA_SRC results are a bitfield, so endian-dependent.
This causes issues if recording on one machine and reading out
on another. The Power people were working on a fix (always
forcing little endian) but I don't think it's made the kernel
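One way to sidestep the bitfield problem when reading samples is to treat PERF_SAMPLE_DATA_SRC as a plain 64-bit integer and pull the fields out with shifts and masks. The sketch below assumes the value was produced with the little-endian bitfield layout and uses the field widths from the 4.x-era union perf_mem_data_src (an assumption; check your kernel's perf_event.h). The names DATA_SRC_FIELDS and decode_data_src are made up for illustration. It does not handle values written by a big-endian kernel that packed the bitfield from the other end.

```python
# Field layout as (bit offset, width). Assumed from the 4.x-era
# union perf_mem_data_src on a little-endian machine.
DATA_SRC_FIELDS = {
    "mem_op": (0, 5),
    "mem_lvl": (5, 14),
    "mem_snoop": (19, 5),
    "mem_lock": (24, 2),
    "mem_dtlb": (26, 7),
}


def decode_data_src(raw, byteorder="little"):
    """Decode the 8 raw bytes of a data_src sample without using bitfields.

    byteorder is the byte order of the file being read, not of the
    machine doing the reading.
    """
    val = int.from_bytes(raw, byteorder)
    return {
        name: (val >> offset) & ((1 << width) - 1)
        for name, (offset, width) in DATA_SRC_FIELDS.items()
    }
```

Because the decoding works on the integer value, the same code gives the same answer on any reading machine as long as the byte order of the file is passed in correctly.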
4.11 -- Not released yet
4.10 -- 19 February 2017
4.9 -- 11 December 2016
- PERF_RECORD_MMAP was missing flags on anon memory
- Fix PM_BRU_CMPL event on Power9
- Reject non-sampling PEBS events
- BTS and LBR cannot be enabled at same time on goldmont
- Power9 raw events
- ARM: imx: Added perf functionality to mmdc driver
- Power8 json events
- Lots of intel json event files
4.8 -- 2 October 2016
- Counter overflow could fail since v3.11 on many
architectures (easiest to reproduce on KNL/SLM)
- cstate events for Knights Mill and Knights Landing
- Disallow grouping of uncore events from different PMUs
- Support for AMD fam17h
- IMC uncores for more skylake
- Honor fixed counters count when running in VM
- Knights Mill added to pmu/uncore/RAPL
- Perf tool now supports reading event lists from Intel
provided JSON file.
- ARM64 APM-Xgene support 832c927d119b5be3a01376b8e3033286eb5797e1
- Add Skylake server uncore support cd34cd97b7b4336aa2c623c37daffab264c7c6ce
- Intel Apollo Lake (Goldmont) RAPL support
- ARM cpumask in /sys 48538b5863d8e8f8d5
- ARM: 8611/1: l2x0: add PMU support b828f960215f02e5d2c88bbd27565c694254a15a
- Can have bpf program as handler for perf event?
- Optimization to make uncore/rapl reads faster if on same CPU
- group_sched_out was not scheduling events out atomically
4.7 -- 24 July 2016
- Exclusive filter perf-pmu
- On AMD HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
- On many systems had wrong number of uncores
- Wrong Power9 Events
- Default loglevel for perf throttling message changed
- Skylake client uncore
- Power9 support
- Skylake server RAPL support
- Intel model numbers all changed to #defines, annoying
- Perf callchain limit in perf_event_max_stack
4.6 -- 15 May 2016
- Fix Intel constraints when hyperthreading is off
- Address ranges in Intel PT (eadf48cab4b6b0ab8b)
- Skylake RAPL PSys domain (3521ba1cc351e80)
- New /proc/sys/kernel/perf_event_max_stack file (c5dfd78eb79851e278b7)
- arm64 Broadcom Vulcan PMU (201a72b2829fa6d58)
- write_backward support (9ecda41acb971e)
- Intel Goldmont support (8b92c3a78d40fb220)
- AMD fam17h retired instruction msr (aaf248848db5039276)
- AMD fam15h+ PTSC (8a22426184774d7ced9c1)
- RAPL driver modularized (4b6e2571bf00019e01625)
- Intel uncore driver modularized (e633c65a1d5859da17)
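Since the perf_event_max_stack file only exists on kernels that have this change, a tool that cares about the callchain limit has to cope with the file being absent. A minimal sketch, assuming the file holds a single decimal integer; the helper name read_max_stack is hypothetical:

```python
from pathlib import Path

MAX_STACK_PATH = Path("/proc/sys/kernel/perf_event_max_stack")


def read_max_stack(path=MAX_STACK_PATH):
    """Return the callchain depth limit, or None if it cannot be read.

    None covers both older kernels (no such file) and unreadable or
    unexpected contents.
    """
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

Calling read_max_stack() with no arguments checks the running kernel; passing a path makes it easy to test against a scratch file.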
4.5 -- 13 March 2016
- Kabylake support (cba1b3798e2c4c094)
- Skylake server support (b89c173788c3a8ed)
- AMD accumulated power driver (c7ab62bfbe0e27)
- AMD IOMMU events (f8519155b4d522)
- Intel MBM memory bandwidth monitoring (87f01cc2a2914b61)
- Fix PEBS source type for Nehalem and Westmere (e17dc65328057)
- Actually enable Knight's Landing (4d120c535d638a952e)
- x86 perf support moved from
arch/x86/kernel/cpu to arch/x86/events/
4.4 -- 10 January 2016
- Large number of fixes based on results of syzkaller
- Skylake IMC uncore driver (0e1eb0a1f5530b)
- Remove l1-dcache-stores event on amd fam15h (9cc2617de5b92)
- Knight's Landing uncore support (77af0037de0)
- Knight's Landing PMU support (1e7b939062)
- Broadwell-EP uncore support (d6980ef32570e)
- cycles:pp support for atom (673d188ba5b1)
- PEBS support fixed on Core2/Atom (1424a09a9e18)
- Fix LBR support on Atom (6fc2e83077b)
- Add cycles:ppp (724697648eec)
- Add support for ARM Cortex-A72 (5d7ee87708d4d86)
- ARM64 add event descriptions (9e9caa6a4961)
- ARM add event descriptions (3fbac6ccb6c3)
4.3 -- 1 November 2015
- PERF_BRANCH_SAMPLE_CALL (c229bf9dc179d2)
- Intel cstate PMU (7ce1346a68425)
4.2 -- 31 August 2015
- Add ARCv2 support
- Add a msr driver for various intel free-running counters
SC, IA32_APERF, IA32_MPERF, IA32_PPERF, SMI_COUNT
- Add broadwell-de uncore support
- Add intel skylake support
- skylake support for branch cycle counts
- haswell and broadwell cbox/arb uncore
- intel ARB uncore support
- Knight's landing RAPL support
4.1 -- 22 June 2015
- More broadwell models supported
- Skylake has valid PEBS status bits
- Enable batched PEBS samples (gathering multiple samples
before triggering an interrupt)
- Allow sampling indirect jumps
- Fix the intel hyperthread workaround
- Broadwell-U uncore IMC support
- Fix PERF_COUNT_SW_CPU_MIGRATIONS event
4.0 -- 13 April 2015
- cycles:pp event broken on atom/core2/nhm/wsm since
3.19, fixed. 517e6341fa123ec3a2f9ea78ad547be910529881
- Fix RAPL domains (DRAM RAPL counters were off by a factor of 4)
- Can attach eBPF filters to tracepoint events
- clockid support (can change which kernel clock is used in timestamps)
- Intel Processor Trace support (Broadwell)
- AUX buffer support (in conjunction with Processor Trace)
- Intel Cache QoS support (CQM), in a separate PMU
- Haswell LBR Stack backtrace support
- Broadwell CPU support (91f1b70582c62576f429cf78d53751c66677553d)
- Broadwell INST_RETIRED.ALL event cannot be used if bottom 6 bits
of period not zero (294fe0f52a44c6f207211de0686c369a961b5533)
- Update Haswell Offcore event support (it's different from Sandybridge)
- Workaround for the Hyperthreading event scheduling bug
on recent Intel machines:
errata BJ122 (SNB), BV98 (IVB), HSD29 (HSW) lead to count corruptions
for the various MEM_UOPS_RETIRED and MEM_LOAD_UOPS_RETIRED
events when hyperthreading is enabled.
- Add a new BTS (branch trace) PMU driver that is separate.
- Update the userspace page info for software events at context
switch time (6a694a607a97d58c042fb7fbd60ef1caea26950c)
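For the Broadwell INST_RETIRED.ALL restriction above, a tool that wants a predictable period can pick one whose bottom 6 bits are already zero before opening the event. This is only a user-side sketch; align_period is a hypothetical helper, and rounding up rather than down is my choice, not something the kernel requires.

```python
def align_period(period, low_bits=6):
    """Round a sample period up to the next value whose low bits are zero."""
    mask = (1 << low_bits) - 1
    return (period + mask) & ~mask
```

With the default of 6 bits this rounds to a multiple of 64, so a requested period of 100000 becomes 100032.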
3.19 -- 8 February 2015
- rdpmc instruction used to be enabled globally in all processes
once one process started perf; now this is saved/restored
3.18 -- 7 December 2014
3.17 -- 6 October 2014
3.16 -- 3 August 2014
3.15 -- 8 June 2014
- Atom airmont support (ef454caeb740ee4e1b89aeb7f7692d5ddffb6830)
- PEBS interrupt support
3.14 -- 31 March 2014
- Support for IMC (memory controller)
uncore on (non-server) Sandybridge, Ivybridge and Haswell
- Userspace callchains no longer supported on function trace events
- Various Pentium 4 fixes.
3.13 -- 19 January 2014
- RAPL energy support (4788e5b4b2338f85fa42a712a182d8afd65d7c58)
- Different ring-buffer write behavior
- New PERF_FLAG_FD_CLOEXEC flag (a21b0b354d4ac39be691f51c53562e2c24443d9e)
- PERF_EVENT_IOC_PERIOD changes take effect immediately rather than
after next stop/start
This changes the ABI, and ARM had different
behavior for a while from (3581fe0ef37ce12ac7a4f74831168352ae848edc)
3.12 -- 3 November 2013
- More HSW transactional memory support
3.11 -- 2 September 2013
- enhanced Intel Silvermont (22nm Atom) CPU support (offcore events)
- improved Intel SNB-EP uncore PMU: QPI filters
- add attr->mmap2 attribute (but disabled before release)
- all Power7 events available via sysfs (urgh)
- PERF_EVENT_IOC_ID ioctl to return event ID
- export u64 time_zero on the mmap header page to allow TSC
- The previous exposed a bug in how the mmap/rdpmc page worked
(cap_usr_time and cap_usr_rdpmc mapped to the same bit).
Thus user rdpmc detection code has to be re-written
to use new fields.
- dummy software event
- new PERF_SAMPLE_IDENTIFIER to make samples always parseable
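The rdpmc detection rewrite mentioned above comes down to looking at the newer capability bits on the mmap page and distrusting the old shared bit. A rough sketch follows, taking the 64-bit capabilities word as an integer. The bit positions assume the little-endian layout of the bitfield in struct perf_event_mmap_page (cap_bit0, cap_bit0_is_deprecated, cap_user_rdpmc, cap_user_time, cap_user_time_zero, in that order). The function name decode_caps and the "reliable" key are made up for illustration.

```python
CAP_BIT0 = 1 << 0                # old shared bit (time and rdpmc)
CAP_BIT0_IS_DEPRECATED = 1 << 1  # set by fixed kernels: ignore bit 0
CAP_USER_RDPMC = 1 << 2
CAP_USER_TIME = 1 << 3
CAP_USER_TIME_ZERO = 1 << 4


def decode_caps(capabilities):
    """Interpret the capabilities word from the perf mmap header page."""
    if capabilities & CAP_BIT0_IS_DEPRECATED:
        # Fixed kernel: each capability has its own bit.
        return {
            "rdpmc": bool(capabilities & CAP_USER_RDPMC),
            "time": bool(capabilities & CAP_USER_TIME),
            "time_zero": bool(capabilities & CAP_USER_TIME_ZERO),
            "reliable": True,
        }
    # Older kernel: bit 0 stood for both capabilities, so a set bit
    # cannot tell us which one is really available.
    legacy = bool(capabilities & CAP_BIT0)
    return {"rdpmc": legacy, "time": legacy, "time_zero": False, "reliable": False}
```

When "reliable" is False the caller probably wants a fallback, such as checking the index field or simply using read().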
3.10 -- 30 June 2013
- Event multiplexing by hrtimers
- Add sysfs entry to adjust multiplexing interval per PMU
- AMD IOMMU uncore PMU support
- hw_breakpoint cleanups (inspired by my trinity bug report)
- Intel Haswell PMU
- Overflow on hw breakpoint events fixed to not double count
- Allow overlapping bit ranges in sysfs format files
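For anyone parsing the sysfs format files by hand (the ones under /sys/bus/event_source/devices/<pmu>/format/), each file holds a string such as "config:0-7" or "config:0-7,32-35" naming the bits a field occupies. The sketch below parses such a string and scatters a field value into those bits. Since fields are simply ORed together, overlapping ranges do not need special handling. The function names parse_format and pack_value are hypothetical, and the example strings are what I would expect on x86, not something every PMU provides.

```python
def parse_format(spec):
    """Parse 'config:0-7,32-35' into ('config', [(0, 7), (32, 35)])."""
    field, _, ranges = spec.strip().partition(":")
    parsed = []
    for part in ranges.split(","):
        low, _, high = part.partition("-")
        # A single bit such as 'config:23' has no '-', so reuse the low bit.
        parsed.append((int(low), int(high or low)))
    return field, parsed


def pack_value(value, ranges):
    """Spread the low bits of value across the bit ranges, in listed order."""
    packed = 0
    for low, high in ranges:
        width = high - low + 1
        packed |= (value & ((1 << width) - 1)) << low
        value >>= width
    return packed
```

As a usage example, with event at "config:0-7" and umask at "config:8-15", packing event 0x08 and umask 0x81 and ORing the results gives the raw code 0x8108.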
3.9 -- 28 April 2013
- AMD Fam16h Northbridge and L2I event support
- Intel PEBS Precise store
- Intel PEBS Load Latency Measurement
- Change AMD Fam15h Northbridge support to use separate PMU
- Ivy Bridge Model 58 Uncore support
- Ivy Bridge EP Model 62 Uncore support
- MEM_*_RETIRED events blocked on Ivy Bridge due to Errata
- Fix PERF_SAMPLE_BRANCH_KERNEL to require root.
3.8 -- 18 February 2013
- Fix crash: Fix offcore_rsp valid mask for SNB/IVB (f1923820c447e986a9d)
- Fix crash: Treat attr.config as u64 in perf_swevent_init() (8176cced706b5e5d158875)
- fix kernel crash with PEBS/BTS after suspend/resume (1d9d8639c063)
- Add SNB/SNB-EP scheduling constraints for cycle_activity event
- perf/POWER7: Create a sysfs format entry for Power7 events (3bf7b07ece6e)
- Meta architecture support
- Add Intel IvyBridge event scheduling constraints (69943182bb9)
- AMD Fam15h Northbridge Support (e259514eef764a)
3.7 -- 10 December 2012
- Generic event mappings are now available in
3.6 -- 1 October 2012
- Intel Knights Corner / MIC / Xeon Phi Support
- Minor updates to P6 driver
3.5 -- 22 July 2012
- Complete Intel uncore support, including the large set
of Nehalem-EX uncores.
3.4 -- 20 May 2012
- AMD IBS support?
- More fam15h constraint support?
- Re-enable PEBS on Sandybridge if new enough firmware available?
- Preliminary Uncore support?
- Xen virtualized counter support
- uprobe support
3.3 -- 19 March 2012
- s390x support
- LBR support
- Userspace rdpmc() support (read without syscall), although
no proper way to detect it.
3.2 -- 4 January 2012
3.1 -- 24 October 2011
- ENOSPC errors switched to be EINVAL.
- Intel fixed counter 2 - (core2 and later). This is the way
to count UNHALTED_REFERENCE_CYCLES. The regular cycle counter
is affected by frequency scaling and Turbo mode.
Special support is needed because the event number means different
things when programmed on the fixed counter versus a standard counter.
Access is by PERF_COUNT_HW_REF_CPU_CYCLES.
- User access to Nehalem/Westmere/Sandybridge Offcore events
(note, this is different than "Uncore" events)
finally added. (These require special support as they
access an extra MSR). Unfortunately detecting if this
feature is available is not-reliable for older kernels
due to an implementation bug.
- cpuid has a mask that tells if some events are not supported.
Support is added to disable these events properly.
- Introduced a possible
bug with regard to throttling
- Event scheduler re-write by Richter.
node-stores and node-stores-misses generalized events changed.
- Support for KVM in-guest counter use.
3.0 -- 22 July 2011
- Model 45 SandyBridge EP support
2.6.39 -- 19 May 2011
- PERF_COUNT_HW_STALLED_CYCLES_FRONTEND and
PERF_COUNT_HW_STALLED_CYCLES_BACKEND generalized event
2.6.38 -- 15 March 2011
2.6.37 -- 5 Jan 2011
Nehalem/Westmere Offcore Response support was removed for raw
access (needed by PAPI and other external tools)
See here for more details
- SandyBridge support
- AMD Family 15h (Bulldozer, Interlagos) support
- cgroup support
- Xeon E7 (aka Westmere EX) support (model 47)
- Nehalem built-in cache events changed.
- Fixes resource leak DoS when using inherit (f07b34a6fac9873fd)
- Fixes kernel panic when multithreaded/multiplexing runs
are done (6db8828cafd6a)
2.6.36 -- 21 Oct 2010
- Support for MIPS merged (de74696cde9).
- Fix bogus AMD64 TLB events (ba0cef3d149ce4db293c572bf36ed352b11ce7b9).
- Fix bogus context time tracking (ce9f2357a).
- Removed the /sys/devices/system/cpu/perf_events directories.
If you're trying to detect if perf is running, try
- Before 2.6.37 you could enable profiling in a sibling event
group by sending a PERF_EVENT_IOC_REFRESH to the
group leader (this was undocumented behavior).
This behavior was removed in 2.6.37.
2.6.35 -- 1 Aug 2010
- Support for SH-3 merged
- Support for DEC Alpha merged (c1b3662b648).
- ARM and SH oprofile built on top of perf
- Better handling of spurious NMIs (e51ab6afa1).
- Support for raw SPARC64 events (c12212b66).
- Fix for some Pentium 4 bugs that could lock machine (c991da813a0).
- Per-thread events with a cpu filter, i.e., cpu != -1, were not
reporting correct timings when the thread never ran on the
- Fix Nehalem-EX PMU programming errata (15c1ed06db).
2.6.34 -- 17 May 2010
- Support for Pentium 4 processors merged. Common usage cases
still broken though, and won't be fixed until at least 2.6.39
- "Retired Branches" predefined event fixed for AMD64;
the wrong event is used on all previous kernels (f287d332ce835f77a4f5077d2c0ef1e3f9ea42d2).
- SNOOPQ_REQUEST_OUTSTANDING constraints fixed for Westmere
- Frequency-driven sw-events were broken (a6ee4fa268).
- A "make perf-tarbz2-src-pkg" option was made to make it
possible to build perf w/o the kernel (ad2ad58ae53).
- Fix a problem when tracefiles aren't aligned to 8-bytes
causing problems on arches that need that (2dba103a17bd).
- The value scale times of group siblings are not
updated when the monitored task dies (8dbab958a29).
- PERF_FORMAT_GROUP didn't
work as expected if the task the counters were attached to quit
before the read() call (025b88fee770b).
- perf kvm measuring guest performance support.
- Possible problem with overflow on Power fixed
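When debugging PERF_FORMAT_GROUP problems like the one above, it helps to know what read() on the group leader is supposed to return: a count of events, the optional enabled and running times, then one value (and optional id) per event, all as 64-bit words. A sketch of splitting such a buffer follows. The flag values match the perf_event.h enum, the buffer is assumed to be in native byte order, and parse_group_read is a hypothetical name.

```python
import struct

PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1
PERF_FORMAT_ID = 1 << 2
PERF_FORMAT_GROUP = 1 << 3


def parse_group_read(buf, read_format):
    """Split the bytes returned by read() on a group leader into fields."""
    if not read_format & PERF_FORMAT_GROUP:
        raise ValueError("read_format does not include PERF_FORMAT_GROUP")
    # Everything in the buffer is an unsigned 64-bit word.
    words = iter(struct.unpack(f"={len(buf) // 8}Q", buf))
    result = {"nr": next(words), "time_enabled": None, "time_running": None}
    if read_format & PERF_FORMAT_TOTAL_TIME_ENABLED:
        result["time_enabled"] = next(words)
    if read_format & PERF_FORMAT_TOTAL_TIME_RUNNING:
        result["time_running"] = next(words)
    has_id = bool(read_format & PERF_FORMAT_ID)
    # One (value, id) pair per event; id is None when not requested.
    result["values"] = [
        (next(words), next(words) if has_id else None)
        for _ in range(result["nr"])
    ]
    return result
```

A buffer that is shorter than the nr field implies is a sign something went wrong, which is roughly how this class of bug shows up in practice.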
2.6.33 -- 24 Feb 2010
- Support for Nehalem-EX chips added
- Support for ARMv6 processors added (b94658f857c47f2)
- Support for Intel CoreDuo added (d41180d7bc3e74f14ef)
- Support for Westmere processors (b3f73080401e2fa3a6)
- Fix for PERF_FORMAT_GROUP not working for attached processes.
- Enforce constraints on AMD Northbridge Events
- The PERF_COUNT_SW_CONTEXT_SWITCHES event switched from
being reported always as a user event to being
reported always as a kernel event
- Simple LBR support?
2.6.32 -- 3 Dec 2009
- Fixes made that are needed for PAPI
event multiplexing to work properly.
- Fixes made that enable more straightforward
detection of counter conflicts when allocating events.
- Counts might not have been updated when attached
processes exit (f439c167ae559533,cd9e13a4c89ee44)
- Add event constraints for x86 processors
- Add support for per-task per-cpu counters
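For context on the multiplexing fixes: when more events are requested than there are counters, each event only runs part of the time, and tools usually estimate the full count by scaling with the enabled and running times reported by the kernel. A minimal sketch of that arithmetic, with scale_count as a hypothetical helper name; it is the usual estimate, not necessarily exactly what PAPI does:

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate the full-run count of a multiplexed event.

    Returns None when the event never got onto a counter, since there
    is nothing to extrapolate from.
    """
    if time_running == 0:
        return None
    return raw * time_enabled // time_running
```

An event that counted 500 while running for a quarter of the enabled time is estimated at 2000. The estimate is only as good as the assumption that the workload behaves the same while the event is not being counted.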
2.6.31 -- 9 Sep 2009
- Performance Counters for Linux renamed Perf Events.
- Main header file renamed from perf_counter.h to
- F_SETOWN_EX fcntl() parameter introduced,
which is needed to
properly get overflow counts from threads.
(F_SETOWN behavior was unintentionally broken
in 2.6.12 and no one noticed until 2.6.32)
- Initial merge of Performance Counters for Linux (PCL) code.
Various resolved issues to Watch For when using Older Kernels
- Event Constraints -- On kernels before 2.6.34 event constraints are not enforced by
the kernel. This means that when using PAPI or perf on machines
like Core2 or Nehalem you will get a "0" result for some events
if specified first on the command line, but proper results
when specified second. (or vice-versa). Short of upgrading
the kernel there's not much that can be done about this without
a lot of overhead.
- Inherit Multithreading Crash -- On kernels before 2.6.39
it is very easy to completely out-of-memory the kernel by turning
on event inheriting and spawning a lot of threads. A fix has been
merged and backported via stable updates.
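A tool can warn about the two issues above by looking at the kernel release string. The sketch below is only a heuristic: because fixes get backported to stable and distribution kernels, a version number below the threshold does not prove the bug is present. The names kernel_version, KNOWN_ISSUES and issues_for are made up for illustration.

```python
import re

# (first fixed version, short description), taken from the notes above.
KNOWN_ISSUES = [
    ((2, 6, 34), "event constraints not enforced by the kernel"),
    ((2, 6, 39), "inherit plus many threads can exhaust kernel memory"),
]


def kernel_version(release):
    """Turn '2.6.33-rc1' or '4.9.0-3-amd64' into a comparable tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def issues_for(release):
    """List the known issues that may affect the given kernel release."""
    version = kernel_version(release)
    return [text for fixed_in, text in KNOWN_ISSUES if version < fixed_in]
```

For the running kernel the release string can be taken from platform.release() in the standard library.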