Retired Instructions

PAPI_TOT_INS

instructions:u

The retired instruction performance counter is important because it should, in theory, be deterministic, and because it feeds the widely reported cycles per instruction (CPI) metric. In practice, there are many sources of non-determinism in this counter, and various processor errata also apply to it.
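As a minimal sketch of why this matters for CPI, the Python below computes CPI from a cycle count and a retired instruction count, and reports how far repeated instruction counts for the same program drift apart. The function names and the sample numbers in the usage note are illustrative assumptions, not measurements.

```python
from typing import Sequence


def cycles_per_instruction(cycles: int, retired_instructions: int) -> float:
    """Return CPI, i.e. cycles divided by retired instructions."""
    if retired_instructions <= 0:
        raise ValueError("retired_instructions must be positive")
    return cycles / retired_instructions


def relative_spread(instruction_counts: Sequence[int]) -> float:
    """Return (max - min) / min over repeated runs of the same program.

    A fully deterministic counter would give 0.0 here.
    """
    if not instruction_counts:
        raise ValueError("need at least one run")
    lowest = min(instruction_counts)
    if lowest <= 0:
        raise ValueError("instruction counts must be positive")
    return (max(instruction_counts) - lowest) / lowest
```

For example, with made-up values, cycles_per_instruction(3000, 1000) gives 3.0, and relative_spread([1000, 1000, 1010]) gives a spread of about one percent.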

Please see the paper "Can Hardware Performance Counters be Trusted" by Weaver and McKee for more details on retired instruction non-determinism. The main causes are Linux address space randomization, unintentional non-determinism in the binary or operating system (merely changing the number of environment variables can perturb the count), and processor errata.

ia64

MIPS

PPC

SPARC

x86 and x86_64

Retired instruction counts on x86 generally also include at least one extra instruction each time a hardware interrupt occurs, even if only user-space code is being monitored. The one exception is the Pentium 4 counter.

Another special case is rep-prefixed string instructions. Even if the instruction repeats many times, it is counted as only one instruction.
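The effect of the rep rule can be sketched with a toy model. The Python below takes a dynamic instruction trace as hypothetical (mnemonic, iterations) pairs and returns both the number of instructions the counter would report and the number of iterations actually performed. The trace format and the function name are assumptions made for illustration only.

```python
from typing import Iterable, Tuple


def count_retired(trace: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (counted_instructions, performed_iterations) for a trace.

    Each trace entry is (mnemonic, iterations). A rep-prefixed string
    instruction contributes one counted instruction no matter how many
    iterations it performs; every other instruction is one and one.
    """
    counted = 0
    iterations_done = 0
    for mnemonic, iterations in trace:
        counted += 1  # the counter sees one retired instruction
        if mnemonic.lower().startswith("rep"):
            iterations_done += iterations  # rep, repe, repne, repz, repnz
        else:
            iterations_done += 1
    return counted, iterations_done
```

For a trace such as [("mov", 1), ("rep movsb", 4096), ("ret", 1)], the counter would report 3 instructions even though 4098 iterations of work were performed.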

A page fault that brings a page into memory for the first time (on a load or store) also counts as an additional instruction.

If the x87 top-of-stack pointer overflows, an extra instruction is counted.

Note that floating point instructions that incorporate "fwait" count as two instructions, even though the disassembler lists them as only one.
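The x86 effects above that add to the count can be combined into a rough correction. The Python sketch below assumes the effects are independent and simply additive, and that the event tallies are already available from some other source. Both are assumptions, and the class and function names are hypothetical. Because each interrupt adds at least one instruction, the result is only a lower bound on the overcount.

```python
from dataclasses import dataclass


@dataclass
class X86Events:
    """Hypothetical per-run event tallies; how they are gathered is left open."""
    hardware_interrupts: int = 0
    first_touch_page_faults: int = 0
    x87_stack_overflows: int = 0
    fwait_instructions: int = 0  # executions of instructions incorporating fwait


def minimum_overcount(events: X86Events, counts_interrupts: bool = True) -> int:
    """Lower bound on extra retired instructions relative to the disassembly.

    Pass counts_interrupts=False to model the Pentium 4 counter, which
    does not count the extra instruction on hardware interrupts.
    """
    interrupts = events.hardware_interrupts if counts_interrupts else 0
    return (interrupts
            + events.first_touch_page_faults
            + events.x87_stack_overflows
            + events.fwait_instructions)  # counted twice, listed once


def maximum_program_count(measured: int, events: X86Events,
                          counts_interrupts: bool = True) -> int:
    """Upper bound on the instruction count as the disassembly would list it."""
    return measured - minimum_overcount(events, counts_interrupts)
```

For instance, a run with 3 interrupts, 2 first-touch page faults, 1 x87 stack overflow, and 4 fwait-incorporating instructions has a minimum overcount of 10, so a measured count of 1010 corresponds to at most 1000 instructions in the disassembly.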

