Software Prefetches
PAPI_PRF_SW
Retired software prefetches.
x86 and x86_64
There are several kinds of software prefetch instructions on x86/x86_64.
Four of them came with SSE1. Their behavior is implementation specific:
on AMD, prefetcht0/t1/t2 all do the same thing, while the Intel P4 tends
to treat them differently than other implementations do.
- PREFETCHNTA - non-temporal, meaning you plan to use
the data once and never again
- PREFETCHT0 - load to all cache levels
- PREFETCHT1 - load to L2 and higher (not L1)
- PREFETCHT2 - load to L3 and higher (not L1 or L2)
Two of them are AMD 3DNow! instructions that remain supported
despite the deprecation of all other 3DNow! instructions:
- PREFETCH - prefetch into L1
- PREFETCHW - prefetch for writing into L1; the line is placed in the
modified state from the start
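As a rough illustration of how these instructions are usually reached from C, the sketch below uses the GCC/Clang `__builtin_prefetch(addr, rw, locality)` builtin. On x86 targets these compilers typically map locality 3/2/1/0 to PREFETCHT0/T1/T2/NTA, and `rw = 1` may become PREFETCHW when the target enables it; treat that mapping as an assumption to verify in the generated assembly. The function names and `PREFETCH_DISTANCE` are hypothetical and not tuned for any processor.

```c
#include <stddef.h>

/* Illustrative look-ahead distance in elements; an assumption, not a tuned value. */
#define PREFETCH_DISTANCE 16

/* Sum an array, prefetching ahead with a "keep in all cache levels" hint.
 * With GCC/Clang on x86, locality 3 is expected to emit PREFETCHT0. */
long sum_with_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (high temporal locality) */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}

/* Same loop for data that is read once and not reused.
 * With GCC/Clang on x86, locality 0 is expected to emit PREFETCHNTA. */
long sum_streaming(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 0 (no temporal locality) */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Each executed `__builtin_prefetch` call here is one software prefetch instruction, which is what a software-prefetch event would be expected to count, subject to the per-processor caveats listed on this page.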
Breakdown by Processor Type
- Pentium Pro, II, III
- AMD
All of these events appear to be speculative, and can also be affected
by conflicting hardware prefetches.
- PREFETCH_INSTRUCTIONS_DISPATCHED:LOAD - r53014b
- PREFETCH_INSTRUCTIONS_DISPATCHED:STORE - r53024b
- PREFETCH_INSTRUCTIONS_DISPATCHED:NTA - r53034b
- PREFETCH_INSTRUCTIONS_DISPATCHED:ALL - r53074b
- Atom
- PREFETCH:PREFETCHT0 - r530107
This event seems to double count if an invalid instruction
is fetched.
- PREFETCH:SW_L2 - r530607
This event is complicated: it measures t0/t1/t2, but
with varying multiples depending on conflicts.
- PREFETCH:PREFETCHNTA - r530807
Seems to measure t0/t1 as well, with varying
multiples depending on conflicts.
- Core2
- SSE_PRE_EXEC:NTA + SSE_PRE_EXEC:L1 + SSE_PRE_EXEC:L2 -
r530007:u,r530107:u,r530207
The "fxsave" instruction counts as 2 T0 prefetches,
the "fxrstor" instruction counts as 5 T0 prefetches.
- Nehalem, Nehalem EX
- Pentium 4
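The raw codes in the list above (for example r53014b) look like Linux perf-style raw event strings. Assuming they follow the usual x86 event-select register layout (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, enable in bit 22), they can be split into fields as sketched below. The `raw_event` struct and `decode_raw_event` function are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fields of a raw event code, assuming the standard x86 event-select layout. */
struct raw_event {
    unsigned event_select; /* bits 0-7  */
    unsigned unit_mask;    /* bits 8-15 */
    int usr;               /* bit 16: count in user mode   */
    int os;                /* bit 17: count in kernel mode */
    int enabled;           /* bit 22: counter enable       */
};

/* Decode a string such as "r53014b"; a trailing modifier like ":u" is ignored
 * because strtoull stops at the first non-hex character. */
struct raw_event decode_raw_event(const char *code)
{
    struct raw_event ev;
    uint64_t value;

    if (*code == 'r') {
        code++; /* skip the perf-style 'r' prefix */
    }
    value = strtoull(code, NULL, 16);

    ev.event_select = (unsigned)(value & 0xff);
    ev.unit_mask    = (unsigned)((value >> 8) & 0xff);
    ev.usr          = (int)((value >> 16) & 1);
    ev.os           = (int)((value >> 17) & 1);
    ev.enabled      = (int)((value >> 22) & 1);
    return ev;
}
```

Under that assumption, r53014b decodes to event select 0x4b with unit mask 0x01, and the 0x53 prefix sets the USR, OS, and enable bits (plus the interrupt bit, which this sketch does not expose).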