Getting patches included in the kernel can be slow going, and the kernel
developers will block some patches for various reasons. The patches collected
here are ones useful for running PAPI.
Use rdpmc when possible in-kernel
The rdpmc instruction is faster than the equivalent rdmsr call,
so use it when possible in the kernel.
The perfctr kernel patches did this, after extensive testing showed
rdpmc to always be faster (One can look in etc/costs in the perfctr-2.6
package to see a historical list of the overhead).
I have done some tests on a 3.2 kernel. The kernel module I used
was included in the first posting of this patch:
| Processor      | rdmsr        | rdpmc       |
| Core2 T9900    | 203.9 cycles | 30.9 cycles |
| AMD fam0fh     |  56.2 cycles |  9.8 cycles |
| Atom 6/28/2    | 129.7 cycles | 50.6 cycles |
| Sandybridge-EP | 103.9 cycles | 32.2 cycles |
The speedup from using rdpmc is large, although granted
it really is a drop in the bucket compared to the other overheads involved.
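To put numbers on "large", the per-read speedup can be worked out directly from the table above. The following is a minimal sketch in C that only does arithmetic on the measured values; the struct and function names are made up for illustration and are not part of the patch.

```c
#include <stdio.h>

/* Per-read cost in cycles, copied from the table above (3.2 kernel). */
struct read_cost {
    const char *cpu;
    double rdmsr_cycles;
    double rdpmc_cycles;
};

static const struct read_cost costs[] = {
    { "Core2 T9900",    203.9, 30.9 },
    { "AMD fam0fh",      56.2,  9.8 },
    { "Atom 6/28/2",    129.7, 50.6 },
    { "Sandybridge-EP", 103.9, 32.2 },
};

#define NUM_COSTS (sizeof(costs) / sizeof(costs[0]))

/* How many times faster rdpmc is than rdmsr for one table row. */
static double speedup(const struct read_cost *c)
{
    return c->rdmsr_cycles / c->rdpmc_cycles;
}

/* Cycles saved on each counter read by switching to rdpmc. */
static double cycles_saved(const struct read_cost *c)
{
    return c->rdmsr_cycles - c->rdpmc_cycles;
}

/* Print one summary line per processor. */
static void print_summary(void)
{
    for (size_t i = 0; i < NUM_COSTS; i++)
        printf("%-15s %4.1fx faster, %6.1f cycles saved per read\n",
               costs[i].cpu, speedup(&costs[i]), cycles_saved(&costs[i]));
}
```

Calling print_summary() from a main() shows rdpmc to be roughly 6.6x faster on the Core2, 5.7x on the AMD fam0fh, 2.6x on the Atom, and 3.2x on the Sandybridge-EP.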
It's probably possible (and desirable) to do this without
requiring a new field in the hw_perf_event structure, but the fixed events
make this tricky.
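One likely reason the fixed events get in the way: on Intel, rdpmc selects a fixed-function counter by setting bit 30 of the index, so the rdpmc index cannot simply reuse the counter number the way it can for the general-purpose counters. The sketch below shows that mapping in plain C. It assumes fixed counters are numbered from 32 upward, as the kernel's X86_PMC_IDX_FIXED did around this time; the macro and function names here are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Assumption: fixed-function counters start at index 32 in the
 * kernel's internal numbering (X86_PMC_IDX_FIXED in kernels of
 * this era). */
#define PMC_IDX_FIXED    32

/* On Intel, bit 30 of the rdpmc index selects the fixed-function
 * counters instead of the general-purpose ones. */
#define RDPMC_FIXED_FLAG (1u << 30)

/* Hypothetical helper: translate an internal counter index into
 * the value rdpmc expects in ECX. */
static uint32_t rdpmc_index(int idx)
{
    if (idx >= PMC_IDX_FIXED)
        return (uint32_t)(idx - PMC_IDX_FIXED) | RDPMC_FIXED_FLAG;
    return (uint32_t)idx;
}
```

Caching the result of a mapping like this once per event, rather than recomputing it on every read, is one plausible motivation for keeping a separate event_base_rdpmc field.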
Changes since the last version: properly use the "rdpmc" macro, and
make event_base_rdpmc an int rather than an unsigned long.
Intel RAPL events patch
On Sandybridge chips you can measure energy consumed.
Patch coming soon.
Re-enable raw Nehalem/Westmere OFFCORE_EVENTS support
This patch is no longer necessary after the 3.3 release.
Nehalem/Westmere OFFCORE_EVENTS support was finally merged into the 2.6.39
development tree. Shortly before release, access to the RAW interface
(which PAPI/libpfm4 needs to access these events) was disabled by Ingo Molnar.
He explained his reasoning for this decision in detail.
To ensure raw access is re-enabled, you can apply the following patch: offcore_raw.patch