perf_events patches

Getting patches included in the kernel can be slow going, and the kernel developers block patches for various reasons. The patches collected here are the ones useful for running PAPI.

Use rdpmc when possible in-kernel

The rdpmc instruction is faster than the equivalent rdmsr call, so use it when possible in the kernel.

The perfctr kernel patches did this after extensive testing showed rdpmc to always be faster (see etc/costs in the perfctr-2.6 package for a historical list of the overheads).

I ran some tests on a 3.2 kernel; the kernel module I used was included in the first posting of this patch:

                  rdmsr           rdpmc
Core2 T9900       203.9 cycles    30.9 cycles
AMD fam0fh         56.2 cycles     9.8 cycles
Atom 6/28/2       129.7 cycles    50.6 cycles
Sandybridge-EP    103.9 cycles    32.2 cycles


The speedup from using rdpmc is large, although admittedly it is a drop in the bucket compared to the other overheads involved.

It's probably possible (and desirable) to do this without requiring a new field in the hw_perf_event structure, but the fixed events make this tricky.

Changes since the last version: properly use the "rdpmc" macro, and make event_base_rdpmc an int rather than an unsigned long.


Intel RAPL events patch

On Sandybridge chips you can measure energy consumed. Patch coming soon.

Re-enable raw Nehalem/Westmere OFFCORE_EVENTS support

This patch is no longer necessary after the 3.3 release.

Nehalem/Westmere OFFCORE_EVENTS support was finally merged into the 2.6.39 development tree. Shortly before release, Ingo Molnar disabled access to the RAW interface, which PAPI/libpfm4 needs in order to use these events. You can read his detailed reasoning for this decision in the git changelog.

Until support is re-enabled, you can apply the following patch: offcore_raw.patch