Retired Branches
PAPI_BR_INS
branches:u
The retired branches performance counter is another event that
should in theory be deterministic, but in practice it is not
(mainly on x86).
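To show roughly what "branches:u" means in practice, the following C sketch opens the generic branch-instructions hardware event with the Linux perf_event_open(2) system call. Kernel and hypervisor counting are excluded, and the sketch counts a loop with a fixed trip count. The helper names `count_user_branches` and `fixed_loop` are hypothetical. The sketch assumes a Linux system whose kernel and PMU expose the event and whose perf_event_paranoid setting allows user-space counting. Running it repeatedly on x86 is one way to see counts that vary slightly between runs even though the loop never changes.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so call it via syscall(). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: count retired branches in user space only
 * (the "branches:u" event) while fn() runs.
 * Returns 0 on success, -1 if the counter cannot be opened or read. */
int count_user_branches(void (*fn)(void), long long *count)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped; enable explicitly below */
    attr.exclude_kernel = 1; /* the ":u" modifier: user space only */
    attr.exclude_hv = 1;

    /* pid 0, cpu -1: this thread, on whatever CPU it runs. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With the default read_format, read() returns a single 64-bit value. */
    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Hypothetical workload whose loop executes a fixed number of branches. */
void fixed_loop(void)
{
    for (volatile int i = 0; i < 1000000; i++)
        ;
}
```

A caller could invoke `count_user_branches(fixed_loop, &c)` several times and compare the values. The function returns -1 when the counter is unavailable, for example inside a container or in a VM without PMU access.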
x86 and x86_64
On x86, retired branch counts generally include at least one
extra branch each time a hardware interrupt or page fault
occurs, even when only user-space code is being monitored.
The non-determinism therefore tracks the hardware-interrupt and
page-fault counts, much as it does for the retired instruction
count.
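The relationship described above suggests a first-order correction. If each hardware interrupt and each page fault adds a fixed number of spurious branches, that number can be subtracted from the measured value. The C sketch below is only an illustration under that assumption. `estimate_exact_branches` and `page_faults_so_far` are hypothetical names. Page faults come from getrusage(2). A per-process hardware-interrupt count is not available from a standard call, so the caller must supply it, and how to obtain it is an assumption.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Total page faults (minor + major) taken by this process so far,
 * or -1 if getrusage() fails.  Take the difference of two calls to
 * get the faults incurred inside a measured region. */
long long page_faults_so_far(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long long)ru.ru_minflt + (long long)ru.ru_majflt;
}

/* Hypothetical first-order model: assume every hardware interrupt and
 * every page fault adds `per_event` spurious branches to a
 * user-space-only count, and subtract them from the measurement. */
long long estimate_exact_branches(long long measured,
                                  long long interrupts,
                                  long long page_faults,
                                  long long per_event)
{
    return measured - per_event * (interrupts + page_faults);
}
```

With `per_event` set to 1, this matches the "at least one extra" lower bound. Some processors overcount by more, so the right multiplier has to be determined per machine.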
- Pentium Pro, II, III
    
    - BR_INST_RETIRED (0xc4,0x00) - 
    This counter appears to overcount by roughly an integer
    multiple of the number of hardware interrupts taken.  The
    multiplier was measured as 4x on a Pentium III using perf
    and 3x on a Pentium Pro using pfmon.
    
 
- AMD
    
    - RETIRED_BRANCH_INSTRUCTIONS (r5000c2:u) - 
    AMD's documentation states that this event counts *all*
    control-flow changes, including exceptions and interrupts.
    
 
- Atom
    
    - BRANCH_INSTRUCTIONS_RETIRED (branches:u)
    
 
- Core2
    
    - BRANCH_INSTRUCTIONS_RETIRED (branches:u) - 
    The cpuid instruction also counts as a branch.
    
 
- Nehalem
    
    - BRANCH_INSTRUCTIONS_RETIRED (branches:u)
    
 
- SandyBridge
    
    - BRANCH_INSTRUCTIONS_RETIRED (branches:u)
    
 
- Pentium D
 
    - BRANCH_RETIRED:MMNP:MMNM:MMTP:MMTM
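
The Pentium Pro and Pentium III results suggest a linear model: measured = exact + k * interrupts, with k measured at about 3 and 4 on those chips. One way to estimate k on a given machine is to run the same workload several times and record the retired-branch count together with the number of hardware interrupts taken during each run. A least-squares line through those points then gives k and the interrupt-free count. Collecting the per-run interrupt count is left to the caller and is an assumption here. The function name `fit_interrupt_overcount` is hypothetical.

```c
#include <stddef.h>

/* Fit measured = exact + k * interrupts over n runs of the same
 * workload using ordinary least squares.  Returns 0 on success, -1 if
 * there are fewer than two runs or all runs saw the same interrupt
 * count (the slope is then undefined). */
int fit_interrupt_overcount(const long long *interrupts,
                            const long long *measured, size_t n,
                            double *k, double *exact)
{
    if (n < 2)
        return -1;

    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_x += (double)interrupts[i];
        mean_y += (double)measured[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = (double)interrupts[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * ((double)measured[i] - mean_y);
    }
    if (sxx == 0.0)
        return -1;

    *k = sxy / sxx;                /* extra branches per interrupt */
    *exact = mean_y - *k * mean_x; /* estimated interrupt-free count */
    return 0;
}
```

This sketch assumes that the overcount is linear in the interrupt count alone. Page faults, the exception and interrupt branches that AMD documents, or the cpuid effect noted for Core2 would need additional terms.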
    
 
POWER
- Power6
    branches:u on Power6 is not deterministic.  Reportedly there is
    an extra bit that can be set to get better results, but the
    default perf_events event does not set it.
     