Retired Instructions
PAPI preset: PAPI_TOT_INS
perf event: instructions:u
The retired instruction performance counter is important because it
should in theory be deterministic and because it feeds the widely reported
cycles per instruction (CPI) metric. Unfortunately, this counter has many
sources of non-determinism, and various processor errata can also
affect it.
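To illustrate why errors in this counter matter, the following minimal sketch computes CPI from a cycle count and a retired instruction count, and shows how an overcount in retired instructions shifts the result. The function name and the sample numbers are invented for illustration and do not come from any measurement.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Return CPI; both inputs are raw counter readings."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


# Hypothetical readings: the same cycle count, with and without
# a 2% overcount in retired instructions.
exact_cpi = cycles_per_instruction(2_000_000, 1_000_000)
skewed_cpi = cycles_per_instruction(2_000_000, 1_020_000)

print(f"exact CPI:  {exact_cpi:.4f}")
print(f"skewed CPI: {skewed_cpi:.4f}")
```

An overcounted instruction total makes CPI look lower than it really is, so any error in the retired instruction count carries straight through to the reported metric.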
Please see the paper
"Can Hardware Performance Counters be Trusted?" by Weaver and McKee for
more details on retired instruction non-determinism. The main causes
are Linux address space randomization, unintentional non-determinism in the
binary or operating system (just changing the number of environment
variables can cause problems), and processor errata.
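One simple way to see this non-determinism is to run the same program several times and compare the retired instruction counts. The sketch below summarizes the spread of such a set of counts. It assumes the counts have already been collected by some tool; the function name and the sample values are hypothetical.

```python
from statistics import mean


def count_spread(counts):
    """Summarize run-to-run variation in a list of instruction counts."""
    if not counts:
        raise ValueError("need at least one count")
    lowest, highest = min(counts), max(counts)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        # Range relative to the mean count, as a fraction.
        "relative_range": (highest - lowest) / mean(counts),
    }


# Hypothetical counts from five runs of the same binary.
runs = [1_000_000, 1_000_003, 1_000_001, 1_000_000, 1_000_002]
print(count_spread(runs))
```

A fully deterministic counter would give a range of zero. When comparing runs, it helps to hold the environment fixed, since the text notes that even the number of environment variables can change the count.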
ia64
- IA32_INST_RETIRED - x86 compatibility mode
- IA64_INST_RETIRED
MIPS
- INSTRUCTIONS_GRADUATED - r12000
PPC
- PM_INST_CMPL
On PPC32, this value might be lower than expected due to "folded"
branch instructions.
SPARC
x86 and x86_64
On x86 in general, retired instruction counts include at least one
extra instruction each time a hardware interrupt happens, even
if only user-space code is being monitored. The one exception to
this is the Pentium 4 counter.
Rep-prefixed string instructions are another special case:
even if the instruction repeats many times, it is counted as only
one instruction.
A page fault that brings a page into memory for the first time
(on a load or store) also counts as an additional instruction.
If the x87 top-of-stack pointer overflows, an extra instruction
is counted.
Note that instructions incorporating a floating point "fwait" count
as two instructions even though the disassembler lists them
as only one.
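The general x86 rules above can be combined into a rough estimate of what the counter should read. The sketch below is an assumption-laden model, not a description of any tool: the function and parameter names are hypothetical, it assumes one extra instruction per x87 stack overflow, and because each hardware interrupt adds at least one instruction, the result is only a lower bound.

```python
def expected_x86_count(
    disassembled_instructions: int,
    hw_interrupts: int = 0,
    first_touch_page_faults: int = 0,
    x87_stack_overflows: int = 0,
    fwait_instructions: int = 0,
) -> int:
    """Lower bound on the retired instruction count on generic x86.

    disassembled_instructions is the dynamic count as a disassembler
    would see it: each rep-prefixed string instruction counts once no
    matter how often it repeats, and each instruction incorporating
    fwait counts once.
    """
    return (
        disassembled_instructions
        + hw_interrupts            # at least one extra per interrupt
        + first_touch_page_faults  # one extra per first-touch fault
        + x87_stack_overflows      # assumed: one extra per overflow
        + fwait_instructions       # counted as two, so one extra each
    )


# Hypothetical run: 1000 instructions, 3 interrupts, 2 first-touch
# page faults, 1 x87 stack overflow, 4 fwait-incorporating instructions.
print(expected_x86_count(1000, 3, 2, 1, 4))
```

This model does not apply to the Pentium 4 counter, which the text singles out as the exception for hardware interrupts.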
- Pentium Pro, II, III
- INST_RETIRED - this count is equal to the number
of retired instructions plus the hardware interrupt count
as measured by HW_INT_RX.
- Core2
- INSTRUCTIONS_RETIRED
- count is high by number of hardware interrupts.
- Atom
- INSTRUCTIONS_RETIRED
- count is high by number of hardware interrupts.
- Nehalem
- INSTRUCTIONS_RETIRED
- count is high by number of hardware interrupts.
- SandyBridge
- INSTRUCTIONS_RETIRED
- count is high by number of hardware interrupts.
- AMD
- RETIRED_INSTRUCTIONS - the count is high by the number
of hardware interrupts, possibly the same number as counted
by INTERRUPTS_TAKEN.
Also, the "fninit", "fnsave", and "fnclex" instructions are each
counted as one extra instruction if one of the exception status
word flags (such as PE or ZE) is set.
- Pentium 4
- INSTR_RETIRED:NBOGUSNTAG -
This counter does not include hardware interrupts, unlike other
x86 processors.
This counter counts the fldcw instruction as two separate
instructions, which can cause up to 2% error on some of the
SPEC benchmarks.
This counter also counts the fldenv, frstor, maskmovq, emms,
cvtpd2pi (mem), cvttpd2pi (mem), sfence, and mfence
instructions twice.
If a long-running rep-prefixed instruction is interrupted,
an additional instruction is counted.
- Pentium D
- INSTR_COMPLETED:NBOGUS -
This counter does not have the fldcw problem found
in older Pentium 4s. It does count hardware interrupts.
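The per-processor notes above suggest a simple correction: subtract the hardware interrupt count where the event includes it, and on Pentium 4 subtract one for each instruction that is counted twice. The sketch below encodes only what the list states. The table layout, the function name, and the sample numbers are hypothetical, and it ignores the AMD fninit/fnsave/fnclex case and the Pentium 4 interrupted-rep case.

```python
# Whether the retired-instruction event includes hardware interrupts,
# taken from the processor list above.
COUNTS_HW_INTERRUPTS = {
    "Pentium Pro": True,
    "Pentium II": True,
    "Pentium III": True,
    "Core2": True,
    "Atom": True,
    "Nehalem": True,
    "SandyBridge": True,
    "AMD": True,
    "Pentium 4": False,
    "Pentium D": True,
}

# Instructions the Pentium 4 event counts as two.
P4_DOUBLE_COUNTED = frozenset({
    "fldcw", "fldenv", "frstor", "maskmovq", "emms",
    "cvtpd2pi (mem)", "cvttpd2pi (mem)", "sfence", "mfence",
})


def corrected_count(family, raw_count, hw_interrupts=0, instruction_mix=None):
    """Estimate the true retired instruction count for one measurement.

    instruction_mix maps mnemonic -> number of times it was executed;
    it is only consulted for the Pentium 4.
    """
    corrected = raw_count
    if COUNTS_HW_INTERRUPTS[family]:
        corrected -= hw_interrupts
    if family == "Pentium 4" and instruction_mix:
        corrected -= sum(
            n for name, n in instruction_mix.items()
            if name in P4_DOUBLE_COUNTED
        )
    return corrected


# Hypothetical measurements.
print(corrected_count("Core2", 1_000_050, hw_interrupts=50))
print(corrected_count("Pentium 4", 1_000_030, hw_interrupts=50,
                      instruction_mix={"fldcw": 20, "mfence": 10, "add": 500}))
```

Keeping the per-family behavior in a table rather than in branching code makes it easy to add a processor once its counter behavior has been checked.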