Retired Instruction Inconsistencies on x86

Background

I am doing research using hardware performance counters on the x86 platform. The retired-instruction counts reported by the various x86 implementations do not always agree with each other, nor with the counts given by various dynamic binary instrumentation (DBI) tools. This page summarizes some of the instructions that cause the counts to differ. Many thanks to Kenneth Hoste, who has been a big help in understanding these issues.

See this page for some tables of retired instruction counts on various machines for the SPEC benchmarks.

fldcw

The fldcw instruction is counted as two retired instructions on the Pentium 4-based (NetBurst) cores, but only as one on P6 cores. (Sample code that exhibits this: anom_c.c, anom.s.)

This accounts for many of the discrepancies seen on P4-based machines when running SPEC, most notably in the mesa benchmark.
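Because each fldcw retires as two instructions on NetBurst but as one on P6, a NetBurst count can be brought in line with a P6 count by subtracting one for every fldcw executed. The following is a minimal sketch of that arithmetic. The function name and the example numbers are hypothetical, and the fldcw count is assumed to come from another source, such as a DBI tool.

```python
def p6_equivalent_count(netburst_retired: int, fldcw_executed: int) -> int:
    """Remove the extra count NetBurst adds for each executed fldcw.

    netburst_retired: retired-instruction count measured on a NetBurst core.
    fldcw_executed:   number of fldcw instructions the program executed
                      (assumed to be known from a separate measurement).
    """
    if fldcw_executed < 0 or fldcw_executed > netburst_retired:
        raise ValueError("fldcw count must be between 0 and the retired count")
    # Each fldcw is counted twice on NetBurst and once on P6,
    # so the difference is exactly one count per fldcw.
    return netburst_retired - fldcw_executed


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(p6_equivalent_count(1_000_250, 250))
```

This only corrects for fldcw; the other effects described on this page would still need to be handled separately.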

Here is a file of fldcw instruction counts in SPEC CPU 2000.

Here is a file of fldcw instruction counts in SPEC CPU 2006.

rep prefix

The rep prefix repeats a string instruction, decrementing %ecx on each iteration, until %ecx is zero.

The Intel documentation specifies that a rep-prefixed instruction counts as one instruction, no matter how many times it repeats. There are also errata on some models that apply when a rep sequence is interrupted by a hardware interrupt.
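To make the distinction between iterations and retired instructions concrete, here is a small software model of a rep movsb. It is only a sketch: the function name is hypothetical, it models the documented behaviour described above rather than any particular chip, and it ignores the interrupt-related errata.

```python
def model_rep_movsb(src: bytes, ecx: int):
    """Model 'rep movsb': copy ecx bytes, one per iteration.

    Returns (copied_bytes, iterations, retired_instructions).
    """
    if ecx < 0 or ecx > len(src):
        raise ValueError("ecx must be between 0 and len(src)")
    dst = bytearray()
    iterations = 0
    index = 0
    # The count is checked before each iteration, so ecx == 0 copies nothing.
    while ecx != 0:
        dst.append(src[index])  # one movsb step
        index += 1
        ecx -= 1                # rep decrements the count register
        iterations += 1
    # Per the documented behaviour, the whole sequence counts as one
    # retired instruction regardless of how many iterations ran.
    retired_instructions = 1
    return bytes(dst), iterations, retired_instructions


if __name__ == "__main__":
    print(model_rep_movsb(b"hello world", 5))
```

A tool that counts every iteration separately would report the iteration count here, while a counter that follows the documented behaviour would report one.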

Operating System Concerns

Under Linux, it seems that the first time a page is faulted into memory, one extra retired instruction is counted. This only happens the first time a page is accessed (i.e., it does not happen on subsequent TLB misses). I am not sure of the source of this.

Hardware interrupts that happen while a program is running also seem to increment the retired_instruction count. On AMD processors this can be tracked with the INTERRUPTS_TAKEN performance counter. I do not think the Intel processors have a corresponding counter.

In theory, it should be possible to find out the values for the pages used and the interrupts that happen through the /proc/<pid>/stat file.
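As a starting point, the page-fault counts can be read from /proc/<pid>/stat as sketched below. This assumes the field layout given in the Linux proc(5) manual page, where field 10 is minflt and field 12 is majflt. The function names are hypothetical. The sketch covers only the page-fault fields; it does not attempt to read interrupt counts, which may need a different source such as the system-wide /proc/interrupts.

```python
def parse_stat_line(line: str) -> dict:
    """Extract minor and major page-fault counts from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    rparen = line.rindex(")")
    rest = line[rparen + 1:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    return {
        "minflt": int(rest[10 - 3]),  # minor faults (no disk I/O needed)
        "majflt": int(rest[12 - 3]),  # major faults (page read from disk)
    }


def read_fault_counts(pid="self") -> dict:
    """Read the fault counts for a process; 'self' means the caller."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_stat_line(stat_file.read())


if __name__ == "__main__":
    print(read_fault_counts())
```

Sampling these values before and after the measured region gives the number of faults taken in between, which can then be compared against the unexplained extra counts.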

Here is one of the sample codes I used when testing these counters: rep_test.s