Nehalem PAPI_FP_OPS event

On Nehalem/Westmere processors the predefined PAPI_FP_OPS event counts only SSE operations, so it returns zero for applications compiled to use 32-bit-style x87 floating point. This page investigates replacing the PAPI_FP_OPS definition used in PAPI 4.1.2.1 and earlier (FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION + FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION) with a different combination of events (FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87).
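Both definitions are derived events that add two native counts, so the difference between them can be modeled in a few lines of arithmetic. This is a sketch only: the struct and function names are hypothetical and are not part of the PAPI API, and in practice the four counts would be read from the native events through PAPI.

```c
/* Hypothetical container for raw counts of the FP_COMP_OPS_EXE sub-events.
 * Field names are illustrative, not PAPI identifiers. */
struct fp_comp_ops_exe {
    long long sse_single_precision; /* FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION */
    long long sse_double_precision; /* FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION */
    long long sse_fp;               /* FP_COMP_OPS_EXE:SSE_FP */
    long long x87;                  /* FP_COMP_OPS_EXE:X87 */
};

/* PAPI_FP_OPS as defined in PAPI 4.1.2.1 and earlier: SSE sub-events only. */
long long fp_ops_old(const struct fp_comp_ops_exe *c)
{
    return c->sse_single_precision + c->sse_double_precision;
}

/* Proposed PAPI_FP_OPS: SSE plus x87, still the sum of two sub-events. */
long long fp_ops_new(const struct fp_comp_ops_exe *c)
{
    return c->sse_fp + c->x87;
}
```

For a 32-bit binary whose arithmetic runs entirely on the x87 unit, the SSE fields stay at zero, so `fp_ops_old` reports 0 while `fp_ops_new` reports the X87 count.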


DGEMM Results

64-bit with PAPI 4.1.2.1: FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION + FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION

64-bit with proposed new event: FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87
Note that this graph is essentially identical to the previous one.

32-bit with PAPI 4.1.2.1: FP_COMP_OPS_EXE:SSE_SINGLE_PRECISION + FP_COMP_OPS_EXE:SSE_DOUBLE_PRECISION
Note that the NAIVE and ATLAS results are 0.

32-bit with proposed new event: FP_COMP_OPS_EXE:SSE_FP + FP_COMP_OPS_EXE:X87
Note that with the new event, the NAIVE and ATLAS results match the expected values.
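This sketch assumes that the "expected" value for an N x N DGEMM is the conventional operation count of 2N^3 (one multiply and one add per inner-loop iteration); the original does not state the formula. The function names are hypothetical and this is not the benchmark's actual NAIVE code. It simply tallies its own floating-point operations in software, which is the quantity a correctly defined PAPI_FP_OPS should approximate.

```c
#include <stddef.h>

/* Naive row-major DGEMM, C += A * B, for n x n matrices.
 * Returns the number of floating-point operations performed,
 * counted in software: one multiply and one add per inner iteration. */
long long naive_dgemm(size_t n, const double *a, const double *b, double *c)
{
    long long flops = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
                flops += 2; /* one multiply + one add */
            }
            c[i * n + j] = sum;
        }
    }
    return flops;
}

/* Expected operation count for an n x n x n DGEMM: 2 * n^3. */
long long dgemm_expected_flops(long long n)
{
    return 2 * n * n * n;
}
```

For n = 4 both functions give 128, the value a counter on this loop would be compared against.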

SGEMM Results

64-bit with PAPI 4.1.2.1:
64-bit with proposed new event:
32-bit with PAPI 4.1.2.1:
32-bit with proposed new event:

Summary

The plots above show that the proposed new event reproduces the previous 64-bit behavior while also producing the expected results on 32-bit. The new event is built from the same number of sub-events (two) as the previous event, so switching events should have minimal impact.

Further investigation is needed to determine why the GOTO results are consistently 8x lower than expected, even with "inherit" support built in. ATLAS shows an interesting periodic behavior on 64-bit and is consistently lower than expected by a factor of two.
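Discrepancies like these are easiest to track as a shortfall factor, the expected count divided by the measured one. A minimal sketch, assuming the conventional 2N^3 expected count for an N x N DGEMM (an assumption, not stated in the measurements above); the function name is hypothetical. Under this convention the GOTO results would sit at a factor of about 8 and the 64-bit ATLAS results at about 2.

```c
/* Factor by which a measured PAPI_FP_OPS count falls short of the
 * expected 2 * n^3 operations of an n x n DGEMM (hypothetical helper).
 * 1.0 means the count matches; 8.0 means it is 8x too low.
 * Returns -1.0 when nothing was counted, since the ratio is undefined
 * (e.g. the old SSE-only event on x87 code). */
double fp_ops_shortfall(long long n, long long measured)
{
    long long expected = 2 * n * n * n;
    if (measured <= 0)
        return -1.0;
    return (double)expected / (double)measured;
}
```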