Missing perf_event Features
and when they finally made it into a mainline kernel
Note this lists changes/fixes relevant to the low level interface
as used by PAPI developers.
Many other changes are made release to release, mainly
to enhance the perf tool itself. I don't track those.
Issues that Still need to be Addressed
Note that some of these fixes were backported to previous
stable releases.
- Fixed-counters 0 and 1 on Intel processors (for retired instructions
and cycles)
give slightly different results than the same event in a
general-purpose counter. There is no way to specify you want
the fixed vs general event.
- AMD Lightweight Profiling (LWP) - userspace-only (only minimal
context-switch support needed in kernel). This depends on
advanced xsave support, which seems to have been rejected and
then abandoned due to the kernel devs demanding that any
perf counter access go through the kernel interface.
- Throttling - perf_events throttles the PMU interrupt; in some
cases this can interfere with measurements
- NMI issues - the PMU uses NMI interrupts. There have been some ongoing
problems with 2.6.36-rc where spurious NMIs are not being
handled properly (either the PMU code is too eager and eats
NMIs not from the PMU, or else when trying to fix this some
PMU interrupts might be lost)
- NMI watchdog bug -- if the NMI watchdog is enabled then schedulability
checking is broken. You can add more events than can fit (with the
watchdog stealing one) and not find out until read time.
- Heterogeneous processors. The ARM big.LITTLE
Cortex-A7/Cortex-A15 chips may involve context switches to cores
with different PMUs, with all the inherent complications.
- Intel QoS/RMI Cache analysis
- DTLB load misses event wrong on ivybridge
it should be 0x8108 DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK
not the sandybridge DTLB_LOAD_MISSES.CAUSES_A_WALK
- The PERF_SAMPLE_DATA_SRC results are a bitfield, so endian-dependent.
This causes issues if recording on one machine and reading out
on another. The Power people were working on a fix (always
forcing little endian) but I don't think it's made the kernel
yet.
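To make the DTLB entry above concrete: on Intel core PMUs a raw event config carries the event select in bits 0-7 and the unit mask in bits 8-15, so 0x8108 is event 0x08 with umask 0x81. A minimal Python sketch of that split (the helper names are made up for this note, and the higher config bits such as cmask and inv are ignored):

```python
def split_raw_event(config):
    """Split an Intel-style raw config into (event_select, umask)."""
    event_select = config & 0xFF   # bits 0-7
    umask = (config >> 8) & 0xFF   # bits 8-15
    return event_select, umask


def make_raw_event(event_select, umask):
    """Build the low 16 bits of a raw config from event select and umask."""
    return (event_select & 0xFF) | ((umask & 0xFF) << 8)
```

For the Ivy Bridge event above, split_raw_event(0x8108) gives event select 0x08 and umask 0x81.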
5.0 -- Not released yet
4.20 -- 23 December 2018
4.19* -- 22 October 2018
4.18 -- 12 August 2018
- Preliminary RISC-V support. 32c81bced35696e1ffe92170c72fba16edef3023
- Intel uncore free-running counter PMU
0f519f0352e37e7d71bdce5559517c74a35f6e33
- possible fix for sw/hw event mix
a1150c202207cc8501bebc45b63c264f91959260
- fix MIPS counting on hardware multithreads
84002c88599d6b537e54b003f763215be2075243
4.17 -- 3 June 2018
- Don't enable freeze-on-smi for PerfMon V1
4e949e9b9d1e3edcdab3b54656c5851bd9e49c67
- uncore: Fix SBOX support for Broadwell CPUs
15a3e845b01ce2342cf187dc123c92c44c3c8170
- Enable C-state residency events for Cannon Lake
1159e09476536250c2a0173d4298d15114df7a89
- Add Cannon Lake support for RAPL profiling
490d03e83da2a5e9d7db84b1ec30a9c95415787e
- Add blacklisted events for Power9 DD2.1
64acab4e4fca19706e907bec435cc2acb65c83f3
- Implement fast breakpoint modification via _IOC_MODIFY_ATTRIBUTES
32ff77e8cc9e66cc4fb38098f64fd54cc8f54573
- Disable userspace RDPMC usage for large PEBS (??)
1af22eba248efe2de25658041a80a3d40fb3e92e
- Fix large period handling on Broadwell CPUs
f605cfca8c39ffa2b98c06d2b9f30ba64f1e54e3
- ARM: dts: imx7: add CPU PMU support
a934a5834d59d8529da228aec1f3f89d1be225e9
- drm/vc4: Expose performance counters to userspace
65101d8c9108201118efa7e08f4e2c57f438deb9
- perf/core: Implement the 'perf_kprobe' PMU
e12f03d7031a977356e3d7b75a68c2185ff8d155
4.16 -- 1 April 2018
- Fix linear IP of PEBS real_ip on Haswell and later CPUs
71eb9ee9596d8df3d5723c3cfc18774c6235e8b1
- Rename confusing 'freerunning PEBS' API and implementation to 'large PEBS'
174afc3e7dd7823df8218e16e7768b834097184e
- Don't accidentally clear high bits in bdw_limit_period()
e5ea9b54a055619160bbfe527ebb7d7191823d66
- Disable userspace RDPMC usage for large PEBS
2c2a9bbe7fecb2ad4981b6f4a56cacbfb849f848
- Fix Skylake UPI event format
317660940fd9dddd3201c2f92e25c27902c753fa
- Add support for MSR_IA32_THERM_STATUS
9ae21dd66b970b5e3192a636353d75ede0529338
- ARM DynamIQ Shared Unit PMU support
7520fa99246dade7ab6dde1573a146beed632abd
- arm_spe: Fail device probe when arm64_kernel_unmapped_at_el0()
7a4a0c1555b824e0d3dd72942481b1190abea604
- ARM: dts: exynos: Add CPU perf counters to Exynos54xx boards
c4f2fc00defc65950dfabce7a4c70cd2a289111d
- drm/i915/pmu: Expose a PMU interface for perf queries
b46a33e271ed81bd765c632b972c49d5b44729c7
4.15 -- 28 January 2018
- Disable Intel BTS when KPTI (meltdown mitigation) enabled
99a9dc98ba52267ce5e062b52de88ea1f1b2a7d8
- Fix Haswell/Broadwell server RAPL events
1289e0e29857e606a70a0200bf7849ae38d3493a
- Enable free running PEBS for REGS_USER/INTR
2fe1bc1f501d55e5925b4035bcd85781adc76c63
- remove unsupported events for Cortex-A73
f8ada189550984ee21f27be736042b74a7da1d68
- Add event constraint for BDX PCU
bb9fbe1b57503f790dbbf9f06e72cb0fb9e60740
- Hide TSX events when RTM is not supported
58ba4d5a25579e5c7e312bd359c95f3a9a0a242c
- Add support for HiSilicon SoC DDRC PMU driver
904dcf03f086a2e3b9d1e02cb57c43ea2e588c8c
- Add support for HiSilicon SoC HHA PMU driver
2bab3cf9104c5ab80a1b9c706d81d997548401e4
- Add support for HiSilicon SoC L3C PMU driver
2940bc4333707a05e69b3ffd737bda0dc0c3004f
- Add support for HiSilicon SoC uncore PMU driver
6ce4ef94195da926245b58311119ed9d52428fdc
- Add support for ARMv8.2 Statistical Profiling Extension
d5d9696b03808bc6be723cc85288c912c3a05606
- Add PERF_AUX_FLAG_COLLISION to report colliding samples
085b30625e39df67d7320f22269796276c6b0c11
4.14 -- 12 November 2017
- x86 uncore: Correct num_boxes for IIO and IRP
29b46dfb136cdbeece542b3f01115237e43f2855
- RAPL support for DENVERTON and GEMINI_LAKE
450a97893559354b927c935f39ee11126f01f520
- Goldmont, Goldmont Plus and Xeon Phi have MSR_SMI_COUNT
1aaccc40a1864053da26605b0297be16dd52641e
- Skylake server and gemini lake C-state
b09c146f8f63c0e03adba74df76bf9c2be466fec
- PERF_SAMPLE_PHYS_ADDR support
fc7ce9c74c3ad232b084d80148654f926d01ece7
- /sys/bus/event_source/devices/cpu/caps/ support
b00233b5306512a09e339d69ef5e390a77f2d302
- Only show format attributes when supported
a5df70c354c26e20d5fd8eb64517f724e97ef0b2
- Skylake changed the encoding of the PEBS data source field
6ae5fa61d27dcb055f4198bcf6c8dbbf1bb33f52
- Cortex A35 support
e884f80cf2a76a86547e2316982e1f200f556ddf
- Cortex A73 support
5561b6c5e9813df16d7453f6ce1a0546221fca97
- arm64: Allow more than one cycle counter
1031a1592908ccd3240f4a5731c96c382c932310
- Intel cache quality monitoring (cqm) removed
c39a0e2c8850f08249383f2425dbd8dbe4baad69
- Power: IMC PMU support
39a846db1d574a498511ffccd75223a35cdcb059
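A tool can probe the caps directory mentioned above without knowing in advance which files a given kernel provides. The following Python sketch simply reads whatever is there; the function name and the sysfs_root parameter are mine (the parameter exists only so the code can be pointed at a test directory), and an empty result just means the kernel or PMU does not export caps.

```python
import os


def read_pmu_caps(pmu="cpu", sysfs_root="/sys/bus/event_source/devices"):
    """Return {file name: contents} for <sysfs_root>/<pmu>/caps, or {} if absent."""
    caps_dir = os.path.join(sysfs_root, pmu, "caps")
    caps = {}
    if not os.path.isdir(caps_dir):
        return caps  # older kernel, or a PMU that exports no caps
    for name in sorted(os.listdir(caps_dir)):
        try:
            with open(os.path.join(caps_dir, name)) as f:
                caps[name] = f.read().strip()
        except OSError:
            continue  # unreadable entry: skip it rather than fail
    return caps
```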
4.13 -- 3 September 2017
- Make rdpmc finally pass all PAPI tests
bfe334924ccd9f4a53f30240c03cf2f43f5b2df1
- Many skylake server (SKX) fixes
8aa7b7b4b4a601978672dce6604b9f5630b2eeb8
- Add Goldmont Plus CPU PMU support
dd0b06b551f6b14da19582e301814746d838965a
- Add Apollo Lake CSTATE support
5c10b048c37cc08a21fa97a0575eccf4948948ca
- Fix branch event code on Power9
24bedcb7c811375a962a621613ad152d95bc28ba
- Add ARM xgene PMUv3 support
c0f7f7acdecdd7cf9a19c0af5c3dc649e1b934f7
- Correct event creation with PERF_FORMAT_GROUP
ba5213ae6b88fb170c4771fef6553f759c7d8cdd
- Add BPF support to all perf_event types
f91840a32deef5cb1bf73338bc5010f843b01426
- ARMv8.1 16-bit event config
fe7296e19221c6dc125a06b52e28ccbdb76d9b58
- perf/x86: Add sysfs entry to freeze counters on SMI
6089327f5424f227bb6a8cf92363c2617e054453
4.12 -- 2 July 2017
- Make Skylake TLB event handle 1G misses
fb3a5055cd7098f8d1dd0cd38d7172211113255f
- perf/aux rb_alloc_aux() was returning
-ENOTSUPP to userspace 8a1898db51a3390241cd5fae267dc8aaa9db0f8b
- Fix MIPS I6400 counter selection
f7a31b5e7874f77464a4eae0a8ba84b9ae0b3a54
- perf/core: Drop kernel samples even though :u is specified
(this broke things and was reverted in 4.13)
cc1582c231ea041fbc68861dfaf957eaf902b829
- Fix Broadwell-EP DRAM RAPL units 33b88e708e7dfa58dc896da2a98f5719d2eb315c
- Add big-endian bitfield definitions for perf_mem_data_src on Power
8c5073db0ee680c7e70e123918c9b260e49f757d
- Fix spurious NMI with PEBS Load Latency event fd583ad1563bec5f00140e1f2444adbcd331caad
- Qualcomm L3 cache PMU 3071f13d75f627ed8648535815a0506d50cbc6ed
- AMD IOMMU support multiple IOMMU pmus
- Goldmont top-down events
ed827adb009490673c9c63e0b716e0fa36afbcc1
- Add PERF_RECORD_NAMESPACES e422267322cd319e2695a535e47c5b1feeac45eb
- r8a7796: Add Cortex-A53 PMU ccc499330dbcaa8f6065bd1b10a64ca09fa96c3e,
Add CA53 L2 cache-controller node,
Add Cortex-A57 PMU node
- ARM: dts: socfpga: Add support for PMU
34869353774bc6de05291fc6ad50d7f471fa3cd8
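For reference when reading the perf_mem_data_src entry above, here is a Python sketch that decodes the low fields of a PERF_SAMPLE_DATA_SRC value. It assumes the little-endian bitfield layout as I read it in linux/perf_event.h (mem_op:5, mem_lvl:14, mem_snoop:5, mem_lock:2, mem_dtlb:7) and that the value has already been read as a native 64-bit integer. It does not handle the big-endian layout, and fields added by later kernels are ignored.

```python
def decode_data_src(value):
    """Decode the low fields of a little-endian perf_mem_data_src value."""
    return {
        "mem_op": value & 0x1F,              # bits 0-4
        "mem_lvl": (value >> 5) & 0x3FFF,    # bits 5-18
        "mem_snoop": (value >> 19) & 0x1F,   # bits 19-23
        "mem_lock": (value >> 24) & 0x3,     # bits 24-25
        "mem_dtlb": (value >> 26) & 0x7F,    # bits 26-32
    }
```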
4.11 -- 30 April 2017
- Properly check limits of value in perf_cpu_time_max_percent
1572e45a924f254d9570093abde46430c3172e3d
- Various Power9 event tweaks
- Kaby Lake RAPL support f2029b1e47b607619d1dd2cb0bbb77f64ec6b7c2
- Have SET_FILTER ioctl fail if you specify a kernel filter and
exclude_filter==1 9ccbfbb157a38921702402281ca7be530b4c3669
- Large number of Intel vendor events added to perf tool json file
- Qualcomm ARM l2 cache pmu driver 21bdbb7102edeaebb5ec4ef530c8f442f7562c96
- Add support for PT_WRITE events
5443624bedd0d23e112d5f2a919435182875bce9
- AMD Fam17h uncore support bc1daef6b5da574bca0a2ec7f9b4d0c5fe0c7d11
- Fake PPC 8xx events added
75b824727680a9d12c34d78096a5ac642e53f5d0
- Intel i915 GPU perf interface does *not* use perf_event for reasons
described in 7abbd8d670bb928366aa94332a173aa3d394ebfe
4.10 -- 19 February 2017
- PERF_RECORD_MMAP was missing flags on anon memory
0b3589be9b98994ce3d5aeca52445d1f5627c4ba
- Fix PM_BRU_CMPL event on Power9
d89f473ff6f84872e761419f7233d6e00f99c340
- Reject non-sampling PEBS events
18e7a45af91acdde99d3aa1372cc40e1f8142f7b
- BTS and LBR cannot be enabled at same time on goldmont
b0c1ef52959582144bbea9a2b37db7f4c9e399f7
- Power9 raw events
18201b204286a1ef478ef52b00ab9f6c5739b4f6
- ARM: imx: Added perf functionality to mmdc driver
e76bdfd7403aae582461901955d0136381e34435
- Power8 json events
2a81fa3bb5edb4a9dc9cb04cd591c99d41eb4f4c
- Lots of intel json event files
4.9 -- 11 December 2016
- Counter overflow could fail since v3.11 on many
architectures (easiest to reproduce on KNL/SLM
(silvermont?))
7f612a7f0bc13a2361a152862435b7941156b6af
- cstate events for Knights Mill and Knights Landing
1dba23b12f49d7cf3d4504171c62541122b55141
889882bce2a5f69242c1f3acd840983f467499b9
- Disallow grouping of uncore events from different PMUs
033ac60c7f21f9996a0fab2fd04f334afbf77b33
- Support for AMD fam17h
e40ed1542dd779e5037a22c6b534e57127472365
- IMC uncores for more skylake
d786810b2f896854506e7b698a137f074942e410
- Honor fixed counters count when running in VM
f92b7604149a55cb601fc0b52911b1e11f0f2514
- Knights Mill added to pmu/uncore/RAPL
ba2f81575eba8dcf128354169c20ae23f810f652
36c4b6c14d20b37fda79cbcd3e8ef7d11f5ef9dc
608284bf0def3ca5e6936920fcd84294101ef12d
- Perf tool now supports reading event lists from Intel
provided JSON file.
- ARM64 APM-Xgene support 832c927d119b5be3a01376b8e3033286eb5797e1
- Add Skylake server uncore support cd34cd97b7b4336aa2c623c37daffab264c7c6ce
- Intel Apollo Lake (Goldmont) RAPL support
2668c6195685f4b6f281767d10b4f4f2e32c2305
- ARM cpumask in /sys 48538b5863d8e8f8d5
- ARM: 8611/1: l2x0: add PMU support b828f960215f02e5d2c88bbd27565c694254a15a
- Can have bpf program as handler for perf event?
4df20483ab287b24a7ffe38e53d473880de3dd98
- Optimization to make uncore/rapl reads faster if on same CPU
d6a2f9035bfc27d0e9d78b13635dda9fb017ac01
- group_sched_out was not scheduling events out atomically
3f005e7de3db8d0b3f7a1f399aa061dc35b65864
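The cpumask files mentioned above (found under /sys/bus/event_source/devices/ for uncore-style PMUs) use the usual kernel CPU list syntax, such as "0-3,8". A small Python sketch for turning that text into a list of CPU numbers follows; the function name is made up for this note.

```python
def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus
```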
4.8 -- 2 October 2016
- Exclusive filter perf-pmu
3bf6215a1b30db7df6083c708caab3fe1a8e8abe
- On AMD HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
080fe0b790ad438fc1b61621dac37c1964ce7f35
- Many systems had the wrong number of uncores
10e9e7bd598f9a66a11a22514c68c13c41fc821b
- Wrong Power9 Events
1a058f164348a71229afd35bb5bbbb0fb514555d
- Default loglevel for perf throttling message changed
0d87d7ec22a0879d3926faa4f4f4412a5dee1fba
- Skylake client uncore
46866b59dfbe9bf99bb1323ce1f3fd2073a81aa3
- Power9 support
8c002dbd05eecbb2933e9668da9614b33c7a97d2
- Skylake server RAPL support
348c5ac6c7dc117e1de095bf07c86c31101d56f3
- Intel model numbers all changed to #defines, annoying
- Perf callchain limit in perf_event_max_stack
97c79a38cd454602645f0470ffb444b3b75ce574
4.7 -- 24 July 2016
- Fix Intel constraints when hyperthreading is off
(9010ae4a8dee29)
- Address ranges in Intel PT (eadf48cab4b6b0ab8b)
- Skylake RAPL PSys domain (3521ba1cc351e80)
- New /proc/sys/kernel/perf_event_max_stack file (c5dfd78eb79851e278b7)
- arm64 Broadcom Vulcan PMU (201a72b2829fa6d58)
- write_backward support (9ecda41acb971e)
- Intel Goldmont support (8b92c3a78d40fb220)
- AMD fam17h retired instruction msr (aaf248848db5039276)
- AMD fam15h+ PTSC (8a22426184774d7ced9c1)
- RAPL driver modularized (4b6e2571bf00019e01625)
- Intel uncore driver modularized (e633c65a1d5859da17)
4.6 -- 15 May 2016
- Kabylake support (cba1b3798e2c4c094)
- Skylake server support (b89c173788c3a8ed)
- AMD accumulated power driver (c7ab62bfbe0e27)
- AMD IOMMU events (f8519155b4d522)
- Intel MBM memory bandwidth monitoring (87f01cc2a2914b61)
- Fix PEBS source type for Nehalem and Westmere (e17dc65328057)
- Actually enable Knight's Landing (4d120c535d638a952e)
- x86 perf support moved from
arch/x86/kernel/cpu to arch/x86/events/
4.5 -- 13 March 2016
- Large number of fixes based on results of syzkaller
- Skylake IMC uncore driver (0e1eb0a1f5530b)
- Remove l1-dcache-stores event on amd fam15h (9cc2617de5b92)
- Knight's Landing uncore support (77af0037de0)
- Knight's Landing PMU support (1e7b939062)
- Broadwell-EP uncore support (d6980ef32570e)
- cycles:pp support for atom (673d188ba5b1)
- PEBS support fixed on Core2/Atom (1424a09a9e18)
- Fix LBR support on Atom (6fc2e83077b)
- Add cycles:ppp (724697648eec)
- Add support for ARM Cortex-A72 (5d7ee87708d4d86)
- ARM64 add event descriptions (9e9caa6a4961)
- ARM add event descriptions (3fbac6ccb6c3)
4.4 -- 10 January 2016
- PERF_BRANCH_SAMPLE_CALL (c229bf9dc179d2)
- Intel cstate PMU (7ce1346a68425)
4.3 -- 1 November 2015
- Add ARCv2 support
- Add a msr driver for various intel free-running counters
TSC, IA32_APERF, IA32_MPERF, IA32_PPERF, SMI_COUNT
b7b7c7821d932ba18ef6c8eafc8536066b4c2ef4
- Add broadwell-de uncore support
070e98873cf7196cad58f8b6e5278dd5533c81f0
- Add intel skylake support
9a92e16fd7b4ccd9aabcbc4d42a3fb5f9a3cf4a1
- skylake support for branch cycle counts
71ef3c6b9d4665ee7afbbe4c208a98917dcfc32f
- haswell and broadwell cbox/arb uncore
3a999587b4a1815cf4dadddf6b5aad470f048239
- intel ARB uncore support
e3a13192d86048e91a2a1d534abe5ac2397d9113
- Knight's landing RAPL support
3a2a7797326a4bc59b7ff0cc92c8b274abf21892
4.2 -- 31 August 2015
- More broadwell models supported
4b36f1a4139c9284df74c0f5d7655603d67807df
- Skylake has valid PEBS status bits
a3d86542de8850be52e8589da22b24002941dfb7
- Enable batched PEBS samples (gathering multiple samples
before triggering an interrupt)
3569c0d7c5440d6fd06b10e1ef9614588a049bc7
- Allow sampling indirect jumps
5b68164d6a1fdbe02b30bd777d1f686c6d901f28
- Fix the intel hyperthread workaround
- Broadwell-U uncore IMC support
a41f3c8cd4e28dcbebd8ec27a9602c86cfa5f009
- Fix PERF_COUNT_SW_CPU_MIGRATIONS event
ff303e66c240ba6269e31817a386995440a18c99
4.1 -- 22 June 2015
- cycles:pp event broken on atom/core2/nhm/wsm since
3.19, fixed. 517e6341fa123ec3a2f9ea78ad547be910529881
- Fix RAPL domains (DRAM RAPL counters were off by a factor of 4
on Haswell-EP)
645523960102fa0ac0578d070630e49ab05f06d1
- Can attach eBPF filters to tracepoint events
(2541517c32be2531e0da59dfd7efc1ce844644f5)
- clockid support (can change which kernel clock is used in timestamps)
- Intel Processor Trace support (Broadwell)
(52ca9ced3f70779589e6ecc329baffe69d8f5f7a)
- AUX buffer support (in conjunction with Processor Trace)
- Intel Cache QoS support (CQM), in a separate PMU
(4afbb24ce5e723c8a093a6674a3c33062175078a)
- Haswell LBR Stack backtrace support
(e9d7f7cd97c45e2c612d3b38be05b4cfb27939ee)
- Broadwell CPU support (91f1b70582c62576f429cf78d53751c66677553d)
- Broadwell INST_RETIRED.ALL event cannot be used if bottom 6 bits
of period not zero (294fe0f52a44c6f207211de0686c369a961b5533)
- Update Haswell Offcore event support (it's different from Sandybridge)
(0f1b5ca240c65ed9533f193720f337bf24fb2f2f)
- Workaround for the Hyperthreading event scheduling bug
on recent Intel machines:
errata BJ122 (SNB), BV98 (IVB) and HSD29 (HSW) lead to count corruption
for the various MEM_UOPS_RETIRED and MEM_LOADS_UOPS_RETIRED
events when hyperthreading is enabled.
(b37609c30e41264c4df4acff78abfc894499a49b)
- Add a new BTS (branch trace) PMU driver that is separate.
(8062382c8dbe2dc11d37e7f0b139508cf10de9d4)
- Update the userspace page info for software events at context
switch time (6a694a607a97d58c042fb7fbd60ef1caea26950c)
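The Broadwell INST_RETIRED.ALL restriction above is easy to check before opening an event. This Python sketch only encodes the rule as stated in this list (the bottom 6 bits of the period must be zero); the function names are mine, and the kernel may apply further adjustments that are not modeled here.

```python
def bdw_period_ok(period):
    """True if the sample period has its bottom 6 bits clear."""
    return period > 0 and (period & 0x3F) == 0


def round_down_period(period):
    """Clear the bottom 6 bits, i.e. round down to a multiple of 64."""
    return period & ~0x3F
```

For example, a requested period of 100000 fails the check, and rounding down gives 99968.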
4.0 -- 13 April 2015
- rdpmc instruction used to be enabled globally in all processes
once one process started perf; now this is saved/restored
per-process (7911d3f7af14a614617e38245fedf98a724e46a9)
3.19 -- 8 February 2015
- Atom airmont support (ef454caeb740ee4e1b89aeb7f7692d5ddffb6830)
- PEBS interrupt support
3.18 -- 7 December 2014
3.17 -- 6 October 2014
3.16 -- 3 August 2014
3.15 -- 8 June 2014
- Support for IMC (memory controller)
uncore on (non-server) Sandybridge, Ivybridge and Haswell
- Userspace callchains no longer supported on function trace events
- Various Pentium 4 fixes.
3.14 -- 31 March 2014
- RAPL energy support (4788e5b4b2338f85fa42a712a182d8afd65d7c58)
- Different ring-buffer write behavior
(c7f2e3cd6c1f4932ccc4135d050eae3f7c7aef63)
- New PERF_FLAG_FD_CLOEXEC flag (a21b0b354d4ac39be691f51c53562e2c24443d9e)
- PERF_EVENT_IOC_PERIOD changes take effect immediately rather than
after next stop/start
(bad7192b842c83e580747ca57104dd51fe08c223).
This changes the ABI, and ARM had different
behavior for a while from (3581fe0ef37ce12ac7a4f74831168352ae848edc)
to (9450d14fb959336803e5209119eb422b667b96aa).
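RAPL counts (see the energy support entry above) are raw values that must be multiplied by a scale to get Joules. On kernels that export it, the scale is in a sysfs file next to the event, for example energy-pkg.scale under /sys/bus/event_source/devices/power/events/. The Python sketch below reads that file and applies it; the function names and the events_dir parameter are made up for this note.

```python
import os


def read_rapl_scale(event="energy-pkg",
                    events_dir="/sys/bus/event_source/devices/power/events"):
    """Return the Joules-per-count scale for a RAPL event, or None if absent."""
    try:
        with open(os.path.join(events_dir, event + ".scale")) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return None


def counts_to_joules(raw_count, scale):
    """Convert a raw RAPL count to Joules using the sysfs scale."""
    return raw_count * scale
```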
3.13 -- 19 January 2014
- More HSW transactional memory support
3.12 -- 3 November 2013
- enhanced Intel Silvermont (22nm Atom) CPU support (offcore events)
- improved Intel SNB-EP uncore PMU: QPI filters
- add attr->mmap2 attribute (but disabled before release)
- all Power7 events available via sysfs (urgh)
- PERF_EVENT_IOC_ID ioctl to return event ID
- export u64 time_zero on the mmap header page to allow TSC
calculation
- The previous exposed a bug in how the mmap/rdpmc page worked
(cap_usr_time and cap_usr_rdpmc mapped to the same bit).
Thus user rdpmc detection code has to be re-written
to use new fields.
- dummy software event
- new PERF_SAMPLE_IDENTIFIER to make samples always parseable
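The rdpmc detection change above amounts to checking a marker bit before trusting the capability bits in the mmap page. Here is a Python sketch of that decision, assuming the bit positions as I understand them from linux/perf_event.h on a little-endian machine (bit 0 is the old shared bit, bit 1 is cap_bit0_is_deprecated, then cap_user_rdpmc, cap_user_time, cap_user_time_zero). The function name and the returned dictionary are made up for this note.

```python
CAP_BIT0 = 1 << 0                # old bit shared by cap_usr_time/cap_usr_rdpmc
CAP_BIT0_IS_DEPRECATED = 1 << 1  # set by kernels that use the new fields
CAP_USER_RDPMC = 1 << 2
CAP_USER_TIME = 1 << 3
CAP_USER_TIME_ZERO = 1 << 4


def decode_mmap_caps(capabilities):
    """Decode the mmap page 'capabilities' word; None means 'cannot tell'."""
    if capabilities & CAP_BIT0_IS_DEPRECATED:
        return {
            "user_rdpmc": bool(capabilities & CAP_USER_RDPMC),
            "user_time": bool(capabilities & CAP_USER_TIME),
            "user_time_zero": bool(capabilities & CAP_USER_TIME_ZERO),
        }
    # Older kernel: bit 0 meant both things at once, so it is ambiguous.
    ambiguous = None if capabilities & CAP_BIT0 else False
    return {"user_rdpmc": ambiguous, "user_time": ambiguous,
            "user_time_zero": False}
```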
3.11 -- 2 September 2013
- Event multiplexing by hrtimers
(9e6302056f8029f438e853432a856b9f13de26a6)
- Add sysfs entry to adjust multiplexing interval per PMU
(62b8563979273424d6ebe9201e34d1acc133ad4f)
- AMD IOMMU uncore PMU support
(7be6296fdd75f716f7348251433ea68c4b362cf3)
- hw_breakpoint cleanups (inspired by a trinity bug report)
- Intel Haswell PMU
- Overflow on hw breakpoint events fixed to not double count
(ab573844e3058eef2788803d373019f8bebead57)
- Allow overlapping bit ranges in sysfs format files
(fd851780e61ac36e8d59fe87cca01a2e673930ff)
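The sysfs format files mentioned above describe which bits of config a named field occupies, for example "config:0-7" or, for a field split across two ranges, "config:0-7,32-35". Below is a Python sketch of parsing such a line and scattering a value into the listed bits. The helper names are mine, and filling the lowest range first is my assumption about the intended ordering.

```python
def parse_format(spec):
    """Parse 'config:0-7,32-35' into ('config', [(0, 7), (32, 35)])."""
    field, ranges = spec.strip().split(":", 1)
    bit_ranges = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        bit_ranges.append((int(lo), int(hi) if hi else int(lo)))
    return field, bit_ranges


def place_value(value, bit_ranges):
    """Scatter the low bits of value into bit_ranges, first range first."""
    result = 0
    for lo, hi in bit_ranges:
        width = hi - lo + 1
        result |= (value & ((1 << width) - 1)) << lo
        value >>= width
    return result
```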
3.10 -- 30 June 2013
- AMD Fam16h Northbridge and L2I event support
(c43ca5091a374c1f6778bd7e4a39a5a10735a917)
- Intel PEBS Precise store
(9ad64c0f481c37a63dd39842a0fd264bee44a097)
- Intel PEBS Load Latency Measurement
(f20093eef5f7843a25adfc0512617d4b1ff1aa6e)
- Change AMD Fam15h Northbridge support to use separate PMU
(0cf5f4323b1b51ecca3e952f95110e03ea611882)
- Ivy Bridge Model 58 Uncore support
(9a6bc14350b130427725f33e371e86212fa56c85)
- Ivy Bridge EP Model 62 Uncore support
(e850f9c33c0c7cc4097ae29f6f8d633237d235e6)
- MEM_*_RETIRED events blocked on Ivy Bridge due to Errata
BV98 (741a698f420c34c458294a6accecfbad702a7c52)
- Fix PERF_SAMPLE_BRANCH_KERNEL to require root.
(7cc23cd6c0c7d7f4bee057607e7ce01568925717)
3.9 -- 28 April 2013
- Fix crash: Fix offcore_rsp valid mask for SNB/IVB (f1923820c447e986a9d)
- Fix crash: Treat attr.config as u64 in perf_swevent_init() (8176cced706b5e5d158875)
- fix kernel crash with PEBS/BTS after suspend/resume (1d9d8639c063)
- Add SNB/SNB-EP scheduling constraints for cycle_activity event
(fd4a5aef002bb5)
- perf/POWER7: Create a sysfs format entry for Power7 events (3bf7b07ece6e)
- Meta architecture support
- Add Intel IvyBridge event scheduling constraints (69943182bb9)
- AMD Fam15h Northbridge Support (e259514eef764a)
3.8 -- 18 February 2013
- Generic event mappings are now available in
/sys/bus/event_source/devices/cpu/events/
(a47473939db20e3961b200eb00acf5fcf084d755)
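Each file in that events directory holds a comma-separated list of terms such as "event=0x3c" or "event=0x2e,umask=0x4f". A Python sketch for parsing one into a dictionary follows. The function name is made up, and treating a bare term with no "=" as the value 1 is my assumption.

```python
def parse_event_string(text):
    """Parse 'event=0x2e,umask=0x4f' into {'event': 0x2e, 'umask': 0x4f}."""
    terms = {}
    for term in text.strip().split(","):
        if not term:
            continue
        name, sep, value = term.partition("=")
        # int(..., 0) accepts both hex (0x..) and decimal values
        terms[name] = int(value, 0) if sep else 1
    return terms
```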
3.7 -- 10 December 2012
- Intel Knights Corner / MIC / Xeon Phi Support
- Minor updates to P6 driver
3.6 -- 1 October 2012
- Complete Intel uncore support, including the large set
of Nehalem-EX uncores.
3.5 -- 22 July 2012
- AMD IBS support?
- More fam15h constraint support?
- Re-enable PEBS on Sandybridge if new enough firmware available?
- Preliminary Uncore support?
- Xen virtualized counter support
- uprobe support
3.4 -- 20 May 2012
- s390x support
- LBR support
(bce38cd53e5ddba9cb6d708c4ef3d04a4016ec7e)
(caff2befffe899e63df5cc760b7ed01cfd902685)
- Userspace rdpmc() support (read without syscall), although
no proper way to detect it.
3.3 -- 19 March 2012
- ENOSPC errors switched to be EINVAL.
- Intel fixed counter 2 - (core2 and later). This is the way
to count UNHALTED_REFERENCE_CYCLES. The regular cycle counter
is affected by frequency scaling and Turbo mode.
Special support is needed because the event number means different
things when programmed on the fixed counter versus a standard
generic counter.
Access is by PERF_COUNT_HW_REF_CPU_CYCLES.
- User access to Nehalem/Westmere/Sandybridge Offcore events
(note, this is different than "Uncore" events)
finally added. (These require special support as they
access an extra MSR). Unfortunately detecting if this
feature is available is not-reliable for older kernels
due to an implementation bug.
- cpuid has a mask that tells if some events are not supported.
Support is added to disable these events properly.
- Introduced a possible bug with regard to throttling
- Event scheduler re-write by Richter.
- Nehalem/Westmere
node-stores and node-stores-misses generalized events changed.
- Support for KVM in-guest counter use.
3.2 -- 4 January 2012
3.1 -- 24 October 2011
- Model 45 SandyBridge EP support
3.0 -- 22 July 2011
- PERF_COUNT_HW_STALLED_CYCLES_FRONTEND and
PERF_COUNT_HW_STALLED_CYCLES_BACKEND generalized event
2.6.39 -- 19 May 2011
Nehalem/Westmere Offcore Response support was removed for raw
access (needed by PAPI and other external tools)
- SandyBridge support
- AMD Family 15h (Bulldozer, Interlagos) support
- cgroup support
- Xeon E7 (aka Westmere EX) support (model 47)
- Nehalem built-in cache events changed.
- Fixes resource leak DoS when using inherit (f07b34a6fac9873fd)
- Fixes kernel panic when multithreaded/multiplexing runs
are done (6db8828cafd6a)
2.6.38 -- 15 March 2011
2.6.37 -- 5 Jan 2011
- Support for MIPS merged (de74696cde9).
- Fix bogus AMD64 TLB events (ba0cef3d149ce4db293c572bf36ed352b11ce7b9).
- Fix bogus context time tracking (ce9f2357a).
- Removed the /sys/devices/system/cpu/perf_events directories.
If you're trying to detect if perf is running, try
/proc/sys/kernel/perf_event_paranoid
- Before 2.6.37 you could enable profiling in a sibling event
group by sending a PERF_EVENT_IOC_REFRESH to the
group leader (this was undocumented behavior).
This behavior was removed in 2.6.37.
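The perf_event_paranoid check suggested above can be sketched in Python as follows. The function name is made up, and the path is a parameter only so the code can be tested against a temporary file.

```python
import os


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the paranoid level as an int, or None if the file is missing."""
    if not os.path.exists(path):
        return None  # perf_event is probably not available
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

A return value of None suggests the kernel was built without perf_event support; any integer (including negative values) means the interface is present.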
2.6.36 -- 21 Oct 2010
- Support for SH-3 merged
- Support for DEC Alpha merged (c1b3662b648).
- ARM and SH oprofile built on top of perf
- Better handling of spurious NMIs (e51ab6afa1).
- Support for raw SPARC64 events (c12212b66).
- Fix for some Pentium 4 bugs that could lock machine (c991da813a0).
- Per-thread events with a cpu filter, i.e., cpu != -1, were not
reporting correct timings when the thread never ran on the
monitored cpu (3f8aa77a1).
- Fix Nehalem-EX PMU programming errata (15c1ed06db).
2.6.35 -- 1 Aug 2010
- Support for Pentium 4 processors merged. Common usage cases
still broken though, and won't be fixed until at least 2.6.39
- "Retired Branches" predefined event fixed for AMD64;
the wrong event is used on all previous kernels (f287d332ce835f77a4f5077d2c0ef1e3f9ea42d2).
- SNOOPQ_REQUEST_OUTSTANDING constraints fixed for Westmere
(bbbe758bd)
- Frequency-driven sw-events were broken (a6ee4fa268).
- A "make perf-tarbz2-src-pkg" option was made to make it
possible to build perf w/o the kernel (ad2ad58ae53).
- Fix a problem when tracefiles aren't aligned to 8-bytes
causing problems on arches that need that (2dba103a17bd).
- The value scale times of group siblings are not
updated when the monitored task dies (8dbab958a29).
- PERF_FORMAT_GROUP didn't
work as expected if the task the counters were attached to quit
before the read() call (025b88fee770b).
- perf kvm measuring guest performance support.
- Possible problem with overflow on Power fixed
(219a92a4c40db2fac604f63bce9a5a3fe1967879)
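Since PERF_FORMAT_GROUP comes up repeatedly in this list, here is a Python sketch of decoding the buffer that read() returns for a group leader. It follows the layout documented in linux/perf_event.h (nr, optional time_enabled, optional time_running, then one value and optional id per event) and assumes the buffer holds native-endian 64-bit words. The function name and the returned dictionary are made up for this note.

```python
import struct

PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1
PERF_FORMAT_ID = 1 << 2
PERF_FORMAT_GROUP = 1 << 3


def parse_group_read(buf, read_format):
    """Decode a read() buffer from an event opened with PERF_FORMAT_GROUP."""
    if not read_format & PERF_FORMAT_GROUP:
        raise ValueError("read_format lacks PERF_FORMAT_GROUP")
    words = struct.unpack("=%dQ" % (len(buf) // 8), buf)
    pos = 0
    result = {"nr": words[pos], "values": []}
    pos += 1
    if read_format & PERF_FORMAT_TOTAL_TIME_ENABLED:
        result["time_enabled"] = words[pos]
        pos += 1
    if read_format & PERF_FORMAT_TOTAL_TIME_RUNNING:
        result["time_running"] = words[pos]
        pos += 1
    for _ in range(result["nr"]):
        value = words[pos]
        pos += 1
        event_id = None
        if read_format & PERF_FORMAT_ID:
            event_id = words[pos]
            pos += 1
        result["values"].append((value, event_id))
    return result
```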
2.6.34 -- 17 May 2010
- Support for Nehalem-EX chips added
- Support for ARMv6 processors added (b94658f857c47f2)
- Support for Intel CoreDuo added (d41180d7bc3e74f14ef)
- Support for Westmere processors (b3f73080401e2fa3a6)
- Fix for PERF_FORMAT_GROUP not working for attached processes.
(050735b08ca8a016bbace4445fa025b88fee770b)
- Enforce constraints on AMD Northbridge Events
(38331f62c20456454eed9ebea2525f072c6f1d2e)
- The PERF_COUNT_SW_CONTEXT_SWITCHES event switched from
being reported always as a user event to being
reported always as a kernel event
(e49a5bd38159dfb1928fd25b173bc9de4bbadb21)
- Simple LBR support?
(caff2befffe899e63df5cc760b7ed01cfd902685)
2.6.33 -- 24 Feb 2010
- Fixes made that are needed for PAPI
event multiplexing to work properly.
- Fixes made that enable more straightforward
detection of counter conflicts when allocating events.
- Counts might not have been updated when attached
processes exit (f439c167ae559533,cd9e13a4c89ee44)
- Add event constraints for x86 processors
(f1682835c3cd5c54a1)
- Add support for per-task per-cpu counters
(f4c4176f21533e22bcc292030da72bcfa105f5b8)
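When events are multiplexed, the usual way to estimate the full count is to scale the raw value by the ratio of time enabled to time running, both of which the kernel reports when the matching read_format bits are set. A minimal Python sketch of that estimate follows; the function name is mine, and returning 0 for an event that never ran is my choice, not something the kernel defines.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate the full count for a multiplexed event."""
    if time_running == 0:
        return 0  # the event never got onto the PMU; nothing to scale
    return raw * time_enabled // time_running
```

For example, a raw count of 500 from an event that ran for a quarter of its enabled time is estimated as 2000.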
2.6.32 -- 3 Dec 2009
- Performance Counters for Linux renamed Perf Events.
- Main header file renamed from perf_counter.h to
perf_event.h
- F_SETOWN_EX fcntl() parameter introduced,
which is needed to
properly get overflow counts from threads.
(F_SETOWN behavior was unintentionally broken
in 2.6.12 and no one noticed until 2.6.32)
2.6.31 -- 9 Sep 2009
- Initial merge of Performance Counters for Linux (PCL) code.
Various resolved issues to Watch For when using Older Kernels
- Event Constraints -- On kernels before 2.6.34 event constraints are not enforced by
the kernel. This means that when using PAPI or perf on machines
like Core2 or Nehalem you will get a "0" result for some events
if specified first on the command line, but proper results
when specified second. (or vice-versa). Short of upgrading
the kernel there's not much that can be done about this without
a lot of overhead.
- Inherit Multithreading Crash -- On kernels before 2.6.39
it is very easy to exhaust kernel memory by turning
on event inheriting and spawning a lot of threads. A fix has been
merged and backported via stable updates.
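A tool that wants to warn about the two issues above can compare the running kernel release against the versions given here. This Python sketch does only that; the function names are made up, and because fixes were backported to stable releases a plain version comparison can be overly pessimistic.

```python
import re


def kernel_version(release):
    """Turn a release string such as '2.6.33-5-generic' into (2, 6, 33)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g or 0) for g in m.groups())


def constraints_enforced(release):
    """Event constraints are enforced from 2.6.34 on, per the note above."""
    return kernel_version(release) >= (2, 6, 34)


def inherit_crash_fixed(release):
    """The inherit out-of-memory fix is in 2.6.39 (and some stable updates)."""
    return kernel_version(release) >= (2, 6, 39)
```

The release string can be obtained with platform.release() from the Python standard library.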