The implications of Hyper-threading on machines with
Simulator Workloads
by Vince Weaver ( vince _at_ csl.cornell.edu )
11 November 2005
The Problem
I disabled hyper-threading on the sampaka and
cluizel clusters, a decision multiple people have questioned.
I have conducted a set of experiments to determine the performance
implications of hyper-threading for the most common workload of the clusters,
namely long running processor simulations.
The Experiment
The test was run on "cluizel39", one node of the cluizel cluster.
The node was running Linux 2.6.11 with perfctr and bmcsensor support
patched in. This kernel does have a hyper-threading aware
scheduler.
The node has the following hardware:
- Dual Intel Pentium IV Xeon Processors, 2.8GHz
- 2GB of RAM. 8kB L1 D-cache. 12kB trace cache. 512kB unified L2 cache.
- No disk. All I/O via NFS over 100MB/s ethernet.
The test program was the alpha version of simple-scalar 4.0.
The test benchmark was equake from the spec2k benchmarks, compiled
statically on boulder. The Minnispec lgred.in input was used.
Multiple identical copies of the simulation were launched simultaneously
via a script, and the resulting time it took each thread to complete
was recorded.
The test was conducted both with and without Hyperthreading enabled
via the BIOS.
Results
The errorbars indicate maximum and minimum time of the simultaneous
runs, while the plotted line connects the average times.
Note the errorbars are much wider for odd-numbers of processes.
This is due to the way the Linux scheduler works.
In the 3-thread case, one process gets a CPU to itself, thus finishing
much sooner than the two processes that must share the remaining CPU.
Bouncing processes from cpu to cpu is avoided because
it can cause poor cache performance.
There is no significant performance gain from hypertheading; in fact
in the 3 thread case the hyperthreading performance is worse.
These results are far from the 25-30% maximum performance gains
Intel claim are possible.
Also note that the OS does take hyperthreading into account when
scheduling. If it didn't, then the 2 and 4 cpu cases would be worse for
the HT case (as the scheduler would have assigned the jobs to the first
two cpus it saw (parent and sibling of cpu#0), making one cpu overload
and leaving the other idle).
Conclusion
Hyper-threading would not be a benefit on our clusters.
In the two process case, the machine performs identically
whether hyperthreading is enabled on disabled.
When running more than 2 threads, hyperthreading has no significant
impact on the time taken to finish a job. In fact, hyperthreading
can make performance slightly worse.
There are numerous other reasons to disable hyperthreading:
- RAM Pressure: With twice as many threads, each ends up
with half as much available RAM.
- Confuses the batch scheduler: NBS does not have special
support for hyperthreads; it sees each as a full processor.
Thus it will put 4 jobs on one node and leave others idle.
- Licensing confusion: Many commercial software products
will charge twice as much for a license for a hyper-threaded CPU
- User confusion: Userspace utilities like "top" report
hyperthreaded CPU's as being 2 distinct CPUs. Thus users
will run twice as many processes to get "full" utilization even
if it doesn't make sense performance-wise.
- Security Reasons:It is reported that information can be
leaked between hyperthreads