Load Instructions
PAPI_LD_INS
Retired loads.
x86 and x86_64
On most (all?) processors the first floating point instruction
executed adds an extra load to the count.
Page faults count as a load (5 loads on a Pentium D).
The "cmps" string instruction (which loads two values from
different memory locations and compares them) seems to count as
only one load rather than two (for an exception, see Pentium D).
CMOV instructions with a memory operand count as a load even
if the condition isn't met.
x87 FPU exceptions cause an additional load instruction.
SSE exceptions cause an additional load instruction.
fbstp instructions count as 1 load and 1 store.
- Pentium Pro, II, III
- AMD
- Atom
- Core2
- INST_RETIRED:LOADS - r5001c0:u
The "leave" instruction counts as two loads.
The "fstenv", "fxsave", and "fsave" instructions also count
as a load.
The "maskmovq" and "maskmovdqu" instructions count as a load.
"movups", "movupd" and "movdqu" _to_ memory are counted as a load
(as well as a store).
- Nehalem, Nehalem EX
- MEM_INST_RETIRED:LOADS - r50010b:u
paddb, paddw, and paddd operating on a memory value
*do not* count as loads on Nehalem.
- SandyBridge
- MEM_UOP_RETIRED:ANY_LOADS - r5381d0:u
As with the Pentium 4, this counts uops, not instructions.
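The raw codes listed above (r5001c0, r50010b, r5381d0) can be split into their fields. This is a minimal sketch, assuming they follow the usual Intel IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits 8-15, then the USR, OS, INT and EN flag bits). The struct and function names are made up for illustration and are not part of PAPI or perf. The trailing ":u" is not part of the hex value; it appears to be the perf-style "user space only" modifier.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the fields of a raw x86 event code.
   Bit positions assume the Intel IA32_PERFEVTSELx register layout. */
struct raw_event {
    unsigned event;   /* bits 0-7:  event select */
    unsigned umask;   /* bits 8-15: unit mask */
    unsigned usr;     /* bit 16:    count in user mode */
    unsigned os;      /* bit 17:    count in kernel mode */
    unsigned intr;    /* bit 20:    interrupt on overflow */
    unsigned enable;  /* bit 22:    counter enabled */
};

/* Split a raw code such as 0x5001c0 into its fields. */
struct raw_event decode_raw_event(uint64_t config)
{
    struct raw_event e;
    e.event  = (unsigned)(config & 0xff);
    e.umask  = (unsigned)((config >> 8) & 0xff);
    e.usr    = (unsigned)((config >> 16) & 1);
    e.os     = (unsigned)((config >> 17) & 1);
    e.intr   = (unsigned)((config >> 20) & 1);
    e.enable = (unsigned)((config >> 22) & 1);
    return e;
}

/* Print one decoded event on a single line. */
void print_raw_event(const char *name, uint64_t config)
{
    struct raw_event e = decode_raw_event(config);
    printf("%s: event=0x%02x umask=0x%02x usr=%u os=%u int=%u en=%u\n",
           name, e.event, e.umask, e.usr, e.os, e.intr, e.enable);
}
```

Under that assumption, calling print_raw_event("Core2", 0x5001c0) reports event 0xc0 with unit mask 0x01, and 0x5381d0 reports event 0xd0 with unit mask 0x81.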
- Pentium 4
- FRONT_END_EVENT:NBOGUS,UOPS_TYPE:TAGLOADS
A push of a segment register (fs/gs) counts as two loads.
movups (store) counts as a load.
movdqu (load) counts as two loads.
prefetch instructions *do not* count as loads
lddqu counts as two loads.
movupd (load) counts as two loads.
fstps counts as two (not zero) loads.
fldt counts as two loads.
fldenv counts as 7 loads?
frstor counts as 23 loads??
fxrstor counts as 26 loads???
rep lods string instructions count as individual loads
rep movs string instructions are counted as blocks of 16-byte copies,
plus one load per byte for the remainder
rep cmps string instruction counts as two individual loads
rep scas string instruction counts as individual loads
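The Pentium 4 "rep movs" note can be turned into a rough expected count, which is handy when checking a measured PAPI_LD_INS value against a known copy length. This is only one reading of the note: it assumes a byte-granular copy, one load counted per full 16-byte block, one load per leftover byte, and no alignment effects. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: expected Pentium 4 load count for a
   "rep movs" copy of nbytes bytes, assuming one counted load per
   full 16-byte block plus one per remaining byte. */
size_t p4_rep_movs_expected_loads(size_t nbytes)
{
    size_t blocks    = nbytes / 16;  /* full 16-byte block copies */
    size_t remainder = nbytes % 16;  /* leftover bytes, one load each */
    return blocks + remainder;
}
```

Under these assumptions a 100-byte copy would be expected to show 6 block loads plus 4 single-byte loads. Measured counts may differ, so treat any mismatch as something to investigate rather than as an error.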