Deterministic Performance Counter Work
I have been researching the inherent determinism (and overcount) of hardware
performance counters.
Currently this work has mostly been about finding deterministic
events on x86_64 machines, as well as finding sources of non-determinism
and overcount.
TL;DR Summary
On x86, most performance counters are not deterministic. This includes
events such as "userspace retired instructions" that you would expect
to be the same from run to run. It holds even for assembly language
programs written with an exact number of instructions to rule out OS
and application variation.
Why? It turns out that interrupts (both software and hardware) increment
most retired-instruction counters, and interrupts are usually unpredictable.
Why does this happen on both AMD and Intel?
The current theory is that the iret (interrupt return)
instruction is broken into four uops, some of which finish in kernelspace
and some in userspace. The userspace portion is counted
as a userspace instruction, which makes the count non-deterministic.
The unaffected counters (often stores or conditional branches) are those
that do not count iret instructions.
A summary of my findings as of April 2013 can be found in
the following ISPASS 2013 paper:
- V.M. Weaver, D. Terpstra, S. Moore.
"Non-Determinism and Overcount on Modern Hardware
Performance Counter Implementations",
IEEE International Symposium on Performance Analysis of
Systems and Software (ISPASS 2013), Austin, Texas, April 2013.
- The IEEE would prefer that you obtain this paper through their
IEEE Xplore interface
here.
- You can also view my personal copy of the paper. Warning!
IEEE Copyright rules apply!
ispass2013_deterministic.pdf
- Here are the slides from the talk I gave at ISPASS:
ispass2013_deterministic_slides.pdf
I presented an earlier version of this work at the
FHPM 2010 Workshop.
An older and more disorganized breakdown of my results is
available here.
Source Code
This project has two phases. The first involved writing
a large hand-coded assembly benchmark that was used to
find non-determinism in the hardware performance counters
on a variety of x86_64 systems.
The second phase, which is underway, automates the code generation
and adds some sort of automatic search for such problems.
The most recent version of the tool used to gather this
information can be obtained via:
git clone https://github.com/deater/deterministic
An older stable release can be downloaded:
deterministic-0.23.tar.bz2 (2M)
15 February 2013
Uses of this research
The highest-profile use of this work is by the
RR deterministic replay team at Mozilla.
You can read an article that references the work
here.