On Nehalem/Westmere processors the predefined PAPI_FP_OPS event counts
only SSE floating point operations, so it returns zero for applications
compiled to use 32-bit style x87 floating point. This page investigates
replacing the PAPI_FP_OPS definition used in PAPI 4.1.2.1 and earlier
(FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION +
FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION) with
a different combination of events
(FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87).
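Both definitions are derived events: the value reported as PAPI_FP_OPS is the sum of two native counter values. The sketch below illustrates that in Python purely for readability. The `counts` dictionary and the helper `derived_fp_ops` are hypothetical; in practice the raw values would come from PAPI reading the native events listed above.

```python
# Sub-events of each PAPI_FP_OPS definition, as named on this page.
OLD_FP_OPS = (
    "FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION",
    "FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION",
)
NEW_FP_OPS = (
    "FP_COMP_OPS_EXE:SSE_FP",
    "FP_COMP_OPS_EXE:X87",
)


def derived_fp_ops(counts, subevents):
    """Sum the native counts that make up a derived PAPI_FP_OPS value.

    counts: hypothetical mapping of native event name -> raw counter value.
    A sub-event that was not recorded is treated as 0.
    """
    return sum(counts.get(name, 0) for name in subevents)
```

For code that uses only x87 floating point, the SSE sub-events stay at zero, so the old sum is zero while the new sum picks up the X87 count. Both definitions use two sub-events.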
DGEMM Results
64-bit with PAPI 4.1.2.1:
FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION +
FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION
64-bit with proposed new event:
FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87
Note that the graph is essentially identical to the previous one.
32-bit with PAPI 4.1.2.1:
FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION +
FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION
Note that the NAIVE and ATLAS results are 0.
32-bit with proposed new event:
FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87
Note that with the new event, the NAIVE and ATLAS results match the expected values.
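The page does not spell out what "expected" means. Assuming the usual convention for matrix multiply, C = A·B with an M×K and a K×N operand performs M·N·K multiplies and M·N·K adds, i.e. 2·M·N·K floating point operations (2N³ for square N×N matrices). A minimal sketch of that reference count, with hypothetical helper names, which also checks the formula against the naive triple loop:

```python
def gemm_expected_flops(m, n, k):
    """Reference flop count for C(m x n) += A(m x k) * B(k x n).

    Each of the m*n output elements needs k multiplies and k adds,
    giving the conventional 2*m*n*k (alpha/beta scaling ignored).
    """
    return 2 * m * n * k


def naive_gemm_flops(n):
    """Count flops by walking the naive triple loop for square n x n inputs."""
    flops = 0
    for i in range(n):
        for j in range(n):
            for p in range(n):
                flops += 2  # one multiply and one add per inner iteration
    return flops
```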
SGEMM Results
64-bit with PAPI 4.1.2.1:
64-bit with proposed new event:
32-bit with PAPI 4.1.2.1:
32-bit with proposed new event:
Summary
The above plots show that the proposed new event matches the previous
64-bit behavior while also producing the expected results on 32-bit.
The new event is derived from the same number of sub-events (two) as
the previous event, so switching events should have minimal impact.
Further investigation is needed to determine why the GOTO results are
a steady 8x lower than expected, even with "inherit" support built in.
ATLAS shows an interesting periodic behavior on 64-bit and is
consistently lower than expected by a factor of two.
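When rerunning these tests, discrepancies like "8x lower" or "a factor of two" can be expressed as the ratio of the expected count (for example 2N³ for an N×N DGEMM, under the usual convention) to the measured PAPI_FP_OPS value. A minimal sketch, using a hypothetical helper:

```python
def shortfall_factor(measured, expected):
    """Return how many times smaller the measured count is than expected.

    1.0 means the counter matches the reference count; 8.0 and 2.0
    correspond to the GOTO and ATLAS observations above. Returns
    infinity when nothing was counted (e.g. the old event on x87 code).
    """
    if measured <= 0:
        return float("inf")
    return expected / measured
```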