Retired Branches
PAPI_BR_INS
branches:u
The retired branches performance counter is another one that
should in theory be deterministic, but in practice it
is not (mainly on x86).
x86 and x86_64
Retired branch counts on x86 generally include at least one
extra count each time a hardware interrupt or
page fault happens, even
if only user-space code is being monitored.
The non-determinism on x86 therefore tracks the
hardware interrupt and page-fault count, similar to the way the
retired instruction
count behaves.
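As a rough illustration of the relationship described above, the sketch below subtracts an assumed overcount from a raw reading. The function name `adjust_branch_count`, its parameters, and the sample numbers are hypothetical and are not part of perf or PAPI. The default of one extra count per event is an assumption based on the "at least one" wording, so the result should be read as an upper bound on the true count rather than an exact value.

```python
def adjust_branch_count(measured, hw_interrupts, page_faults, extra_per_event=1):
    """Subtract the assumed overcount from a raw retired-branch reading.

    Assumed model (hypothetical, not taken from any vendor documentation):
        measured = true_count + extra_per_event * (hw_interrupts + page_faults)
    """
    overcount = extra_per_event * (hw_interrupts + page_faults)
    return measured - overcount


# Made-up illustrative numbers, not real measurements.
raw_reading = 1_000_250
print(adjust_branch_count(raw_reading, hw_interrupts=200, page_faults=50))
```

The interrupt and page-fault totals have to come from elsewhere (for example, other counters or kernel statistics collected over the same interval), and any mismatch in the measurement window will show up as residual error.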
- Pentium Pro, II, III
- BR_INST_RETIRED (0xc4,0x00) -
This particular counter seems to overcount by approximately
an integer multiple
of the number of hardware interrupts taken. The multiple
was measured as 4x on a Pentium III using perf, and
3x on a Pentium Pro using pfmon.
- AMD
- RETIRED_BRANCH_INSTRUCTIONS (r5000c2:u) -
The documentation specifies that this includes *all* control-flow
changes, including exceptions and interrupts.
- Atom
- BRANCH_INSTRUCTIONS_RETIRED (branches:u)
- Core2
- BRANCH_INSTRUCTIONS_RETIRED (branches:u) --
The cpuid instruction also counts as a branch.
- Nehalem
- BRANCH_INSTRUCTIONS_RETIRED (branches:u)
- SandyBridge
- BRANCH_INSTRUCTIONS_RETIRED (branches:u)
- Pentium D
- BRANCH_RETIRED:MMNP:MMNM:MMTP:MMTM
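A per-interrupt multiple like the one quoted for the Pentium Pro, II, III entry could be estimated along the following lines. This is only a sketch under stated assumptions: a benchmark whose true branch count is known in advance, and a separate source for the number of hardware interrupts taken in each run. The helper `estimate_multiplier` and all the numbers are hypothetical and made up for illustration; they are not measurements.

```python
from statistics import median


def estimate_multiplier(runs, expected):
    """Estimate the extra branch counts added per hardware interrupt.

    runs:     iterable of (measured_branches, hw_interrupts) pairs
    expected: known true branch count of the benchmark (assumed available)
    """
    ratios = [
        (measured - expected) / interrupts
        for measured, interrupts in runs
        if interrupts > 0  # runs without interrupts carry no information
    ]
    if not ratios:
        raise ValueError("need at least one run with a non-zero interrupt count")
    # The median is less sensitive to a single noisy run than the mean.
    return median(ratios)


# Made-up illustrative data, not real measurements.
expected_branches = 1_000_000
sample_runs = [(1_000_400, 100), (1_000_812, 203), (1_000_196, 49)]
print(estimate_multiplier(sample_runs, expected_branches))
```

If the ratios cluster near an integer, that is consistent with the "integer multiple" behaviour described for this counter; a wide spread would suggest that other sources of overcount are involved.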
POWER
- Power6
branches:u on Power6 is not deterministic. Apparently there
is an extra bit that can be set to get better results, but the
perf_events default event does not set it.