The GFLOPS/W of the various machines in the VMW Research Group

This is a companion to the top performance list for the same set of machines. We use approximately the same methodology as the Green500 list.
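As a rough illustration of the metric, the sketch below assumes that the GFLOPS/W column is the Linpack GFLOPS result divided by the average power drawn during the run, which is how the Green500 list reports efficiency. The function name `gflops_per_watt` is made up for this example, and the input values are copied from rows of the table on this page. Because the published figures were presumably computed from unrounded measurements, recomputing from the rounded table values may differ slightly in the last digit for some rows.

```python
def gflops_per_watt(gflops, avg_power_w):
    """Energy efficiency: sustained GFLOPS divided by average power in watts.

    Assumption: this mirrors the Green500-style metric (performance per watt
    averaged over the benchmark run); it is not taken from the author's scripts.
    """
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return gflops / avg_power_w


# (GFLOPS, average power in W) copied from the table on this page
rows = {
    "haswell-ep": (428, 201),
    "fam15h-piledriver": (122, 262),
    "fam10h-phenom": (40.3, 145.4),
}

for name, (gflops, watts) in rows.items():
    print(f"{name}: {gflops_per_watt(gflops, watts):.3f} GFLOPS/W")
```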

I'm using a WattsUpPro meter for the power readings.
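The Average Power and Max Power columns can be thought of as summaries of a series of meter readings taken while Linpack runs. The following is a minimal sketch under that assumption: the sample values are invented for illustration, the helper `summarize_power` is hypothetical, and nothing here reflects the meter's actual log format. Idle power would be measured the same way on an otherwise unloaded machine.

```python
from statistics import mean


def summarize_power(samples_w):
    """Return the average and maximum of a list of power readings in watts."""
    if not samples_w:
        raise ValueError("need at least one power sample")
    return {"average_w": mean(samples_w), "max_w": max(samples_w)}


# Hypothetical readings in watts, one per sampling interval during a run
samples = [10.0, 12.0, 14.0, 12.0]

summary = summarize_power(samples)
print(f"average: {summary['average_w']:.1f} W, max: {summary['max_w']:.1f} W")
```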

The raw data for these rankings is available here: http://github.com/deater/performance_results.git

# Name GFLOPS/W GFLOPS Average Power Max Power Idle Power arch vendor type Cores/Threads RAM Other Linpack Settings
1 m1 5.9 GFLOPS/W 154 GFLOPS 26 W - W - W ARM64 Apple M1 Macbook Air 4 (8) 16GB (N=12000, 4 threads, OpenBLAS)
2 Raspberry Pi5 3.6 GFLOPS/W 31.4 GFLOPS 8.6W 15.2W 4.3W ARM64 ARMv8 Cortex A76 4 8GB (N=20000, 4 threads, OpenBLAS)
3 haswell-ep 2.13 GFLOPS/W 428 GFLOPS 201 W 298 W 58.7 W x86_64 Intel 6/63/2 hsw e5-2640v3 16 (32) 80GB (N=80000, 16 threads, OpenBLAS)
4 Raspberry Pi-4B (4GB) 64-bit kernel/userspace 2.02 GFLOPS/W 13.5 GFLOPS 6.66W 7.30W 2.56W aarch64 ARMv8 Cortex A72 4 4GB (N=20000, 4 threads, OpenBLAS)
5 haswell/quadro (cpu only) 1.68 GFLOPS/W 181 GFLOPS 107.9 W 134.1 W 29.3 W x86_64 Intel 6/60/3 hsw i7-4790 4 (8) 16GB Quadro K2200 (N=40000, 4 threads, OpenBLAS)
6 broadwell macbookair 1.64 GFLOPS/W 47.7 GFLOPS 29.1 W 32.6 W 10.0 W x86_64 Intel 6/61/4 bdw i5-5250U 2 (4) 4GB macbook-air (N=20000, 2 threads, OpenBLAS)
7 haswell desktop 1.56 GFLOPS/W 145 GFLOPS 92.7 W 126.6 W 22.3 W x86_64 Intel 6/60/3 hsw i7-4770 4 (8) 4GB (N=20000, 4 threads, OpenBLAS)
8 Raspberry Pi-4B (1GB) 1.50 GFLOPS/W 9.92 GFLOPS 6.6W 7.9W 2.9W ARM ARMv7/8 Cortex A72 4 1GB (N=9000, 4 threads, OpenBLAS)
9 haswell desktop (instrumented) 1.47 GFLOPS/W 115 GFLOPS 80.6 W 107 W 25.9 W x86_64 Intel 6/60/3 hsw i5-4570S 4 (4) 4GB (N=20000, 4 threads, OpenBLAS)
10 Raspberry Pi Zero 2 W 1.46 GFLOPS/W 5.10 GFLOPS 3.5W 4.8W 0.9W ARM ARMv7 Cortex A53 4 512MB (N=6000, 4 threads, OpenBLAS)
11 Raspberry Pi-4B (4GB) 1.35 GFLOPS/W 9.69 GFLOPS 7.2W 8.2W 2.8W ARM ARMv7/8 Cortex A72 4 4GB (N=19000, 4 threads, OpenBLAS)
12 ivb mac-mini 1.21 GFLOPS/W 41.2 GFLOPS 33.9 W 35.8 W 11.5 W x86_64 Intel 6/58/9 ivb i5-3210M 2 (4) 4GB (N=20000, 2 threads, OpenBLAS)
13 jetson-tx1 1.20 GFLOPS/W 16 GFLOPS 13.4 W 15.3 W 2.1 W ARM ARMv8 Cortex A57 4 4GB NVIDIA (N=20000, 4 threads, OpenBLAS)
14 Raspberry Pi-3A+ 1.19 GFLOPS/W 5.00 GFLOPS 4.1W 7.6W 1.4W ARM ARMv7/8 Cortex A53 4 512MB (N=5000, 4 threads, OpenBLAS)
15 raspberry pi 2B-v1.2 1.07 GFLOPS/W 4.43 GFLOPS 4.1W 5.1W 1.7W ARM ARMv7/8 Cortex A53 4 1GB 2B-v1.2 (N=8000, 4 threads, OpenBLAS)
16 ivb macbook-air 1.02 GFLOPS/W 34.5 GFLOPS 34.0 W 37.2 W 13.8 W x86_64 Intel 6/58/9 ivb i5-3427U 2 (4) 4GB (N=10000, 2 threads, OpenBLAS)
17 raspberry pi4 cluster 0.863 GFLOPS/W 50.7 GFLOPS 58.8W 64.8W 33.6W ARM ARMv8 Cortex A72 20 24GB 4 Model B (N=40000, 20 threads, OpenBLAS)
18 raspberry pi3 0.813 GFLOPS/W* 3.62 GFLOPS 4.3W 4.8W 1.8W ARM ARMv7/8 Cortex A53 4 1GB 3 Model B (N=6000*, 4 threads, OpenBLAS)
19 fam17h-epyc 0.795 GFLOPS/W 109 GFLOPS 137 W 151 W 67 W x86_64 AMD 23/1/2 EPYC 7251 8 (16) 16GB (N=40000, 8 threads, OpenBLAS)
20 raspberry pi 3B+ 0.73 GFLOPS/W 5.3 GFLOPS 7.3W 9.4W 2.6W ARM ARMv7/8 Cortex A53 4 1GB 3B+ (N=10000, 4 threads, OpenBLAS)
21 BeagleV 0.68 GFLOPS/W 5.3 GFLOPS 8.0W 9.0W 3.4W RISCV RISCV64 TH1520 4 4GB BeagleV-Ahead (N=8000, 4 threads, OpenBLAS)
22 odroid-xu 0.599 GFLOPS/W 8.3 GFLOPS 13.9 W 18.4 W 2.7 W ARM ARMv7 Cortex A7/A15 4 Big 4 Little 2GB Exynos 5 Octa (N=12000, 4 threads, OpenBLAS)
23 fam15h-piledriver 0.466 GFLOPS/W 122 GFLOPS 262 W 335 W 167 W x86_64 AMD 21/2/0 Opteron 6376 16 (32) 16GB (N=40000, 16 threads, OpenBLAS)
24 dragonboard 0.450 GFLOPS/W 2.10 GFLOPS 4.7W 5.7W 2.4W ARM ARMv8 Cortex-A53 4 1GB Snapdragon 410c (N=8000, 4 threads, OpenBLAS)
25 haswell/quadro (GPU only) 0.436 GFLOPS/W 38.4 GFLOPS (Double) 88.0 W 121.1 W 29.2 W NVIDIA Quadro K2200 4GB (N=40000, hpl-cuda)
26 raspberry pi2 0.432 GFLOPS/W 1.47 GFLOPS 3.4 W 3.6 W 1.8 W ARM ARMv7 Cortex A7 4 1GB Model 2 (N=10000, 4 threads, OpenBLAS)
27 fam15h-a10 0.432 GFLOPS/W 54 GFLOPS 125.6 W 148.6 W 28.2 W x86_64 AMD 21/19/1 A10-6800B 4 8GB (N=30000, 4 threads, OpenBLAS)
28 fam16h-a8-jaguar 0.354 GFLOPS/W 14.1 GFLOPS 39.7 W 43.6 W 22.5 W x86_64 AMD 22/48/1 A8-6410 4 4GB (N=10000, 4 threads, OpenBLAS)
29 core2 0.292 GFLOPS/W 18.0 GFLOPS 61.7 W 67.9 W 23.4 W x86_64 Intel 6/23/10 Core2 P8700 2 4GB (N=15000, 2 threads, OpenBLAS)
30 chromebook 0.277 GFLOPS/W 3.0 GFLOPS 10.7 W 11.1 W 5.9 W ARM ARMv7 Cortex A15 2 2GB Exynos 5 Dual (N=10000, 2 threads, OpenBLAS)
31 fam10h-phenom 0.277 GFLOPS/W 40.3 GFLOPS 145.4 W 175.0 W 69.5 W x86_64 AMD 16/4/3 Phenom II X4 955 4 2GB (N=15000, 4 threads, OpenBLAS)
32 raspberry pi-zero-w 0.238 GFLOPS/W 0.247 GFLOPS 1.0 W 1.1 W 0.6 W ARM ARMv6 BCM2835 1 512MB Model Zero-W (N=4000, 1 thread, OpenBLAS)
33 raspberry pi-zero 0.236 GFLOPS/W 0.319 GFLOPS 1.3 W 1.4 W 0.8 W ARM ARMv6 BCM2835 1 512MB Model Zero (N=5000, 1 thread, OpenBLAS)
34 raspberry pi-aplus 0.223 GFLOPS/W 0.218 GFLOPS 1.0 W 1.0 W 0.8 W ARM ARMv6 BCM2835 1 256MB Model A+ (N=4000, 1 thread, OpenBLAS)
35 cubieboard2 0.194 GFLOPS/W 0.861 GFLOPS 4.4W 4.6W 2.2W ARM ARMv7 Cortex A7 2 1GB Allwinner A20 (N=8000, 2 threads, OpenBLAS)
36 atom-cedarview desktop 0.170 GFLOPS/W 3.1 GFLOPS 18.2 W 18.5 W 15.5 W x86_64 Intel 6/54/1 Atom D2550 2 (4) 4GB (N=10000, 2 threads, OpenBLAS)
37 pi-cluster 0.166 GFLOPS/W 15.5 GFLOPS 93.1 W 96.8 W 71.3 W arm Cortex A7 Raspberry Pi 2 96 24GB (N=48000, 96 threads, OpenBLAS)
38 pandaboard-es 0.163 GFLOPS/W 0.951 GFLOPS 5.8 W 6.5 W 3.0 W ARM ARMv7 Cortex A9 2 1GB ES, OMAP4 (N=4000, 2 threads, OpenBLAS)
39 atom-cedarview server 0.149 GFLOPS/W 2.6 GFLOPS 22.1 W 22.4 W 18.6 W x86_64 Intel 6/54/9 Atom S1260 2 (4) 4GB (N=20000, 2 threads, OpenBLAS)
40 raspberry pi-bplus 0.118 GFLOPS/W 0.213 GFLOPS 1.8 W 1.9 W 1.6 W ARM ARMv6 BCM2835 1 512MB Model B+ (N=5000, 1 thread, OpenBLAS)
41 fam14h-bobcat 0.106 GFLOPS/W 2.76 GFLOPS 26.1 W 27.1 W 14.8 W x86_64 AMD 20/2/0 Bobcat 2 2GB G-T56N (N=8000, 2 threads, OpenBLAS)
42 raspberry pi compute-module 0.103 GFLOPS/W 0.217 GFLOPS 2.1W 2.2W 1.9W ARM ARMv6 BCM2835 1 512MB Pi Compute Module (N=6000, 1 thread, OpenBLAS)
43 atom-eeepc 0.086 GFLOPS/W 1.37 GFLOPS 15.9 W 16.3 W 10.2 W x86 Intel 6/28/2 Atom N270 1 (2) 2GB eeepc 901 (N=12000, 2 threads, OpenBLAS)
44 raspberry pi b 0.073 GFLOPS/W 0.213 GFLOPS 2.9 W 3.0 W 2.7 W ARM ARMv6 BCM2835 1 512MB Model B (N=5000, 1 thread, OpenBLAS)
45 Pentium D 0.064 GFLOPS/W 10.3 GFLOPS 160.7 W 180.5 W 77.2 W x86_64 Intel 15/6/5 Pentium 4/D 1 (2) 1GB (N=8000, 2 threads, OpenBLAS)
46 beaglebone-black 0.026 GFLOPS/W 0.068 GFLOPS 2.6 W 2.8 W 1.9 W ARM ARMv7 Cortex A8 1 512MB TI AM3 (N=5000, 1 thread, OpenBLAS)
47 gumstix-overo 0.015 GFLOPS/W 0.041 GFLOPS 2.7 W 2.8 W 2.0 W ARM ARMv7 Cortex A8 1 256MB TI OMAP3 (N=4000, 1 thread, ATLAS)
48 beagleboard-xm 0.014 GFLOPS/W 0.054 GFLOPS 4.0 W 4.3 W 3.2 W ARM ARMv7 Cortex A8 1 512MB TI DM3730 (N=5000, 1 thread, OpenBLAS)
49 Pentium II 0.005 GFLOPS/W 0.238 GFLOPS 48.3 W 48.7 W 31.2 W x86 Intel 6/5/2 Pentium II 1 256MB (N=3000, 1 thread, OpenBLAS)
50 sparc 0.003 GFLOPS/W 0.456 GFLOPS 140.7W 146.8W 136.9W SUN SPARC Ultra 1 512MB TI Ultrasparc II (N=5000, 1 thread, OpenBLAS)
51 appleII 6.65E-9 GFLOPS/W 1.33E-7 GFLOPS 20.1 W 20.1 W 20.1 W MOS 65C02 Apple IIe 1 128k platinum (N=10, 1 thread, BASIC)
? ELF Membership Card ? ??? GFLOPS ? ? ? ELF RCA1802 1 32kB ??? (??, 1 thread)
? sandybridge-ep ? 85 GFLOPS ? ? ? x86_64 Intel 6/45/? snb 12 (24) 16GB (N=40000, 12 threads, ATLAS)
? trimslice ? ??? GFLOPS ? ? ? ARM ARMv7 Cortex A9 2 1GB Tegra2 ???
? octane ? ??? GFLOPS ? ? ? MIPS SGI MIPS R12k 1 ??? ??? (??, 1 thread)
? avr32 ? ??? GFLOPS ? ? ? AVR32 AVR AP7000 1 ??? ??? (??, 1 thread)
? gumstix-netstix ? ??? GFLOPS ? ? ? ARM ARMv5 Intel PXA255 1 64MB ??? (??, 1 thread)
? k6-2+ ? ??? GFLOPS ? ? ? x86 AMD K6-2+ 1 ?? ??? (??, 1 thread)
? 486 ? ??? GFLOPS ? ? ? x86 Cyrix 486 1 20MB ??? (??, 1 thread)
? g3-iBook ? ??? GFLOPS ? ? ? PPC Apple G3 1 640MB ??? (??, 1 thread)
? g4-powerBook ? ??? GFLOPS ? ? ? PPC Apple G4 1 2 GB ??? (??, 1 thread)
? p4 ? ??? GFLOPS ? ? ? x86 Intel Pentium 4 1 768MB ??? (??, 1 thread)
? core duo ? ??? GFLOPS ? ? ? x86 Intel Core Duo 2 2 GB ??? (??, 1 thread)
* -- the pandaboard crashes due to overheating (even with heatsinks) if N=10000 is used

* -- the pi3 overheats/throttles when running Linpack. With a full heatsink and active cooling it can reportedly achieve 6.4 GFLOPS, likely breaking the 1 GFLOPS/W barrier.
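To make the pi3 footnote concrete: at 6.4 GFLOPS, efficiency exceeds 1 GFLOPS/W only if average power stays below 6.4 W. The sketch below works that out and compares it with the 4.3 W average from the throttled run in the table. The function name `power_budget_w` is hypothetical, and the power draw of a cooled pi3 (including any fan) is not reported here, so this only shows the budget rather than confirming the result.

```python
def power_budget_w(gflops, target_gflops_per_w):
    """Highest average power (W) at which `gflops` still meets the target efficiency."""
    if target_gflops_per_w <= 0:
        raise ValueError("target efficiency must be positive")
    return gflops / target_gflops_per_w


reported_gflops = 6.4        # reported pi3 result with heatsink and active cooling
throttled_avg_power_w = 4.3  # average power of the throttled run in the table

budget = power_budget_w(reported_gflops, 1.0)
headroom = budget - throttled_avg_power_w

print(f"power budget for 1 GFLOPS/W: {budget:.1f} W")
print(f"headroom over the throttled run: {headroom:.1f} W")
```

Any extra power drawn by the unthrottled CPU and the cooling itself would have to fit within that headroom.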

Red indicates that the machine is currently not in working order.

