Retired Instruction Inconsistencies on x86
Background
I am doing research using hardware performance counters on the x86 platform.
Retired-instruction counts do not always agree across the various x86
implementations, nor do they agree with the counts reported by dynamic
binary instrumentation (DBI) tools. This page summarizes some of the
instructions that cause discrepancies in the counts.
Many thanks to Kenneth Hoste, who has been a big help in understanding these issues.
See this page for some tables of retired
instruction counts on various machines for the SPEC benchmarks.
fldcw
The fldcw (load x87 FPU control word) instruction is counted as two
retired instructions on Pentium 4-based (NetBurst) cores, but as only one
on P6 cores. (Sample code that exhibits this:
anom_c.c, anom.s).
This accounts for many of the discrepancies seen on P4-based machines
running SPEC, most notably in the mesa benchmark.
Here is a file of fldcw instruction counts for SPEC CPU 2000.
Here is a file of fldcw instruction counts for SPEC CPU 2006.
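As a rough illustration of where fldcw comes from in compiled C, the sketch below performs truncating double-to-int conversions in a loop. This is an assumption about typical compiler output, not something measured on this page: when code is built for the x87 FPU (for example with gcc -m32 -mfpmath=387), compilers commonly wrap each truncating cast in a pair of fldcw instructions, because C requires truncation toward zero while the x87 default rounding mode is round-to-nearest. The helper predicted_netburst_count is hypothetical and simply encodes the rule above: each executed fldcw adds one extra count on NetBurst relative to P6.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum of truncating double -> int conversions. On an x87 build each
 * (int) cast typically compiles to fnstcw / fldcw / fistp / fldcw,
 * i.e. two fldcw per loop iteration (this is compiler-dependent).
 */
long sum_truncated(const double *values, size_t n)
{
    long total = 0;
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += (int)values[i];   /* C truncates toward zero */
    return total;
}

/*
 * Hypothetical helper: expected NetBurst retired-instruction count, given
 * the P6 count for the same run and the number of fldcw executed, assuming
 * fldcw counts as 2 on NetBurst and 1 on P6 and nothing else differs.
 */
unsigned long long predicted_netburst_count(unsigned long long p6_count,
                                            unsigned long long fldcw_executed)
{
    return p6_count + fldcw_executed;   /* one extra count per fldcw */
}
```

Under that code-generation assumption, a loop converting one million values executes about two million fldcw, and so would count about two million more instructions on NetBurst than on P6.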
rep prefix
The rep prefix repeats a string instruction, decrementing %ecx on each
iteration, until %ecx reaches zero.
The Intel documentation specifies that a rep-prefixed instruction
counts as one instruction, no matter how many times it repeats.
Some models also have errata affecting the count when a rep sequence is
interrupted by a hardware interrupt.
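The counting rule can be made concrete with a small C model of rep movsb, in which the while loop stands in for the hardware repeating the string operation. The struct and function names are hypothetical; the model assumes the direction flag is clear and ignores interrupts and errata. It reports both the single retired instruction that the Intel documentation specifies and the number of iterations actually performed; a tool that counted every iteration separately would see the second number instead.

```c
#include <assert.h>
#include <stddef.h>

/* Counts for one execution of a rep-prefixed string instruction. */
struct rep_counts {
    unsigned long retired;     /* per the Intel docs: always one */
    unsigned long iterations;  /* times the string operation actually ran */
};

/*
 * Illustrative model of "rep movsb": copy one byte, advance both pointers,
 * and decrement the count register (%ecx) until it reaches zero.
 * Assumes the direction flag is clear; ignores interrupts and errata.
 */
struct rep_counts model_rep_movsb(unsigned char *dst,
                                  const unsigned char *src,
                                  unsigned long ecx)
{
    struct rep_counts c = { 1, 0 };
    assert((dst != NULL && src != NULL) || ecx == 0);
    while (ecx != 0) {
        *dst++ = *src++;
        ecx--;
        c.iterations++;
    }
    return c;
}
```

For a 4096-byte copy the model reports retired == 1 and iterations == 4096, so a per-iteration count would attribute 4095 more instructions to that single rep movsb.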
Operating System Concerns
Under Linux, it seems that the first time a page is faulted into
memory, one extra retired instruction is counted. This happens only
the first time a page is accessed (i.e., it does not happen on
subsequent TLB misses). I am not sure of the source of this.
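One way to observe first-touch faults from user space is sketched below, assuming Linux (or another system that provides mmap with anonymous mappings and getrusage with ru_minflt). It maps fresh anonymous pages, writes one byte per page twice, and reports the minor-fault count of each pass; only the first pass should fault. The function names are hypothetical. If each first-touch fault adds one retired instruction, as observed above, the first-pass fault count would approximate the extra instructions that pass introduces.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS under strict glibc modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON     /* older BSD-style name */
#endif

/* Minor (no disk I/O) page faults taken so far by this process. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Write one byte to each page of buf; return the minor faults this caused. */
long faults_for_pass(unsigned char *buf, size_t pages, size_t page_size)
{
    volatile unsigned char *p = buf;   /* keep the stores from being elided */
    assert(buf != NULL && page_size > 0);
    long before = minor_faults();
    for (size_t i = 0; i < pages; i++)
        p[i * page_size] = (unsigned char)i;
    return minor_faults() - before;
}

/*
 * Map `pages` fresh anonymous pages, touch them twice, and report the
 * minor-fault count of each pass. Returns 0 on success, -1 on error.
 */
int measure_first_touch(size_t pages, long *first, long *second)
{
    assert(first != NULL && second != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t len = pages * (size_t)ps;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    *first = faults_for_pass(buf, pages, (size_t)ps);
    *second = faults_for_pass(buf, pages, (size_t)ps);
    munmap(buf, len);
    return 0;
}
```

With 64 pages one would expect roughly 64 faults on the first pass and none on the second, although the exact numbers depend on the kernel.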
Hardware interrupts that occur while a program is running also
seem to increment the retired-instruction count. On AMD processors
this can be tracked with the INTERRUPTS_TAKEN performance counter.
I do not believe Intel processors have a corresponding counter.
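If one assumes that each first-touch page fault and each hardware interrupt adds exactly one count (the page reports one extra instruction per first-touch fault; for interrupts the amount is only an assumption), a measured count could be adjusted as in the sketch below. The function and its parameters are hypothetical; on AMD the interrupt figure would come from the INTERRUPTS_TAKEN counter.

```c
#include <assert.h>

/*
 * Hypothetical correction for the two effects described above. Assumes
 * each first-touch page fault and each interrupt taken adds exactly one
 * to the retired-instruction count; the per-interrupt cost in particular
 * is an assumption, not a documented property of any processor.
 */
unsigned long long adjusted_retired(unsigned long long measured,
                                    unsigned long long first_touch_faults,
                                    unsigned long long interrupts_taken)
{
    unsigned long long noise = first_touch_faults + interrupts_taken;

    /* More noise than measured instructions means the model does not fit. */
    assert(noise <= measured);
    if (noise > measured)
        return 0;   /* avoid unsigned wrap-around when built with NDEBUG */
    return measured - noise;
}
```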
In theory, it should be possible to obtain the number of pages
touched and the number of interrupts taken from the
/proc/<pid>/stat file.
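A minimal sketch of reading the page-fault fields from /proc/<pid>/stat follows, assuming Linux. It uses the field numbering documented in proc(5), where minflt (minor faults) is field 10 and majflt (major faults) is field 12; because the command name in field 2 is parenthesized and may contain spaces, parsing starts after the last ')'. The function name read_page_faults is hypothetical, and the sketch reads only the page-fault fields; it does not attempt to recover interrupt counts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read minflt (field 10) and majflt (field 12) from /proc/<pid>/stat.
 * Returns 0 on success, -1 on failure.
 */
int read_page_faults(pid_t pid, unsigned long *minflt, unsigned long *majflt)
{
    char path[64];
    char line[4096];
    char state;

    assert(minflt != NULL && majflt != NULL);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    /* Skip "pid (comm)"; comm may itself contain spaces or ')'. */
    char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    /* state ppid pgrp session tty_nr tpgid flags minflt cminflt majflt */
    int n = sscanf(p + 1, " %c %*d %*d %*d %*d %*d %*u %lu %*u %lu",
                   &state, minflt, majflt);
    return n == 3 ? 0 : -1;
}
```

Reading the fields before and after a run of the target process and subtracting gives the faults taken during that interval.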
Here is one of the sample programs I used when testing these counters:
rep_test.s