Benchmarks
Programs specifically chosen to measure performance
Must reflect typical workload of the user
Benchmark types
Real applications
Small benchmarks
Benchmark suites
Synthetic benchmarks
Real Applications
Workload: the set of programs a typical user runs day in and day out.
Using these real applications as metrics is a direct way of comparing the execution time of the workload on two machines.
Using real applications for metrics has certain restrictions:
They are usually big
They take time to port to different machines
They take considerable time to execute
It is hard to observe the outcome of a certain improvement technique
Comparing & Summarizing Performance
             Computer A   Computer B
Program 1    1 s          100 s
Program 2    1000 s       100 s
Total time   1001 s       200 s
A is 100 times faster than B for program 1
B is 10 times faster than A for program 2
For total performance, the arithmetic mean is used:
$$AM = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Time}_i$$
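As a quick check of the arithmetic mean on the two-program table above, here is a minimal Python sketch. The function name `arithmetic_mean` and the variable names are illustrative choices, not part of the slides.

```python
def arithmetic_mean(times_s):
    """Arithmetic mean of a list of execution times in seconds."""
    return sum(times_s) / len(times_s)

# Execution times from the table: [Program 1, Program 2]
computer_a = [1, 1000]
computer_b = [100, 100]

am_a = arithmetic_mean(computer_a)
am_b = arithmetic_mean(computer_b)
print(am_a, am_b)  # 500.5 100.0
```

By this summary, Computer B has the lower mean execution time, which matches the comparison of total times (1001 s against 200 s).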
Arithmetic Mean
If the programs in the workload are not each run the same number of times, then
we have to use the weighted arithmetic mean:
$$AM = \frac{1}{n} \sum_{i=1}^{n} w_i \, \mathrm{Time}_i$$
• Suppose that program 1 runs 10 times as often as program 2. Which machine is faster?
                      weight   Computer A   Computer B
Program 1 (seconds)   10       1            100
Program 2 (seconds)   1        1000         100
Weighted AM           -        ?            ?
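One way to fill in the question marks is sketched below in Python. It follows the weighted formula as written on the slide (the weighted sum divided by n = 2). The function name `weighted_am` is an illustrative choice. Dividing by the sum of the weights instead would scale both results by the same constant and would not change which machine comes out faster.

```python
def weighted_am(weights, times_s):
    """Weighted arithmetic mean as on the slide: (1/n) * sum(w_i * Time_i)."""
    n = len(times_s)
    return sum(w * t for w, t in zip(weights, times_s)) / n

weights = [10, 1]            # program 1 runs 10 times as often as program 2
computer_a = [1, 1000]       # seconds for program 1, program 2
computer_b = [100, 100]

wam_a = weighted_am(weights, computer_a)
wam_b = weighted_am(weights, computer_b)
print(wam_a, wam_b)  # 505.0 550.0
```

With these weights Computer A has the smaller weighted mean, so the ranking flips compared with the unweighted case.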
Small Benchmarks
Small code segments which are common in many applications
For example, loops with certain instruction mix
for (j = 0; j < 8; j++)
    S = S + A[j] * B[i - j];
Good for architects and designers
Since small code segments are easy to compile and simulate, even by
hand, designers use this kind of benchmark while working on a
novel machine
Can be abused by compiler designers who introduce special-purpose
optimizations targeted at a specific benchmark.
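The loop kernel in this section can be sketched as a runnable small benchmark. The Python version below is only an illustration: the input arrays, the index `i`, and the repetition count are hypothetical values chosen so that `i - j` stays within range, and `timeit` is used as a simple timer.

```python
import timeit

def kernel(a, b, i):
    """Accumulate S = S + a[j] * b[i - j] for j = 0..7."""
    s = 0
    for j in range(8):
        s += a[j] * b[i - j]
    return s

a = list(range(1, 9))      # hypothetical input: 1..8
b = list(range(10, 26))    # hypothetical input: 10..25
i = 7                      # chosen so that i - j >= 0 for all j

result = kernel(a, b, i)
elapsed_s = timeit.timeit(lambda: kernel(a, b, i), number=10_000)
print(result, elapsed_s)
```

The result value is deterministic, while the elapsed time depends on the machine, which is exactly what such a benchmark is meant to expose.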
Benchmark Suites
SPEC (Standard Performance Evaluation Corporation)
Non-profit organization that aims to produce "fair, impartial and
meaningful benchmarks for computers"
Began in 1989 - SPEC89 (CPU intensive)
Companies agreed on a set of real programs and inputs that they
hope best reflect a typical user’s workload.
Valuable indicator of performance
Can still be abused
Updates are required as the applications and their workloads change over
time
SPEC Benchmark Sets
CPU Performance (SPEC CPU2006)
Graphics (SPECviewperf)
High-performance computing (HPC2002,
MPI2007, OMP2001)
Java server applications (jAppServer2004)
a multi-tier benchmark for measuring the performance of Java 2
Enterprise Edition (J2EE) technology-based application servers.
Mail systems (MAIL2001, SPECimap2003)
Network File systems (SFS97_R1 (3.0))
Web servers (SPEC WEB99, SPEC WEB99 SSL)
More information: http://www.spec.org/
SPECint
Integer Benchmarks
Name             Description
400.perlbench    Programming Language
401.bzip2        Compression
403.gcc          C Compiler
429.mcf          Combinatorial Optimization
445.gobmk        Artificial Intelligence
456.hmmer        Search Gene Sequence
458.sjeng        Artificial Intelligence
462.libquantum   Physics / Quantum Computing
464.h264ref      Video Compression
471.omnetpp      Discrete Event Simulation
473.astar        Path-finding Algorithms
483.xalancbmk    XML Processing
SPECfp
Floating Point Benchmarks
Name       Type
wupwise    Quantum chromodynamics
swim       Shallow water model
mgrid      Multigrid solver in 3D potential field
applu      Parabolic/elliptic partial differential equations
mesa       Three-dimensional graphics library
galgel     Computational fluid dynamics
art        Image recognition using neural nets
equake     Seismic wave propagation simulation
facerec    Image recognition of faces
ammp       Computational chemistry
lucas      Primality testing
fma3d      Crash simulation
sixtrack   High-energy nuclear physics accelerator design
apsi       Meteorology; pollutant distribution
SPEC CPU2006 – Summarizing
SPEC ratio: the execution time measurements are normalized by
dividing the execution time on a reference machine by the measured
execution time
Example: on a Sun Microsystems Fire V20z, which has an AMD Opteron 252 CPU
running at 2600 MHz, the 164.gzip benchmark executes in 90.4 s.
The reference time for this benchmark is 1400 s, so the SPEC ratio for this
benchmark is 1400/90.4 × 100 ≈ 1548 (a unitless value)
Performances of the different programs in the suite are summarized
using the “geometric mean” of SPEC ratios.
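The SPEC ratio and its geometric-mean summary can be sketched in Python as follows. The gzip numbers come from the slide, and the factor of 100 follows the slide's example. The function names and the two extra ratios in `hypothetical_ratios` are made up purely for illustration and are not measured results.

```python
import math

def spec_ratio(reference_time_s, measured_time_s, scale=100):
    """Reference time divided by measured time, scaled as in the slide's example."""
    return reference_time_s / measured_time_s * scale

def geometric_mean(values):
    """n-th root of the product, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

gzip_ratio = spec_ratio(1400, 90.4)
print(round(gzip_ratio, 1))  # 1548.7

# Hypothetical ratios for two other benchmarks, for illustration only
hypothetical_ratios = [gzip_ratio, 1200.0, 1800.0]
summary = geometric_mean(hypothetical_ratios)
print(round(summary, 1))
```

A larger ratio means the measured machine ran the benchmark faster relative to the reference machine.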
Pentium III & Pentium 4
Comparing Pentium III and Pentium 4
Ratio                        Pentium III   Pentium 4
CINT2000/Clock rate in MHz   0.47          0.36
CFP2000/Clock rate in MHz    0.34          0.39
Implementation efficiency?
SPEC WEB99
System      Processor           # of disk drives   # of CPUs   # of networks   Clock rate (GHz)   Result
1550/1000   Pentium III         2                  2           2               1                  2765
1650        Pentium III         3                  2           1               1.4                1810
2500        Pentium III         8                  2           4               1.13               3435
2550        Pentium III         1                  2           1               1.26               1454
2650        Pentium 4 Xeon      5                  2           4               3.06               5698
4600        Pentium 4 Xeon      10                 2           4               2.2                4615
6400/700    Pentium III Xeon    5                  4           4               0.7                4200
6600        Pentium 4 Xeon MP   8                  4           8               2                  6700
8450/700    Pentium III Xeon    7                  8           8               0.7                8001
Power Consumption Concerns
Performance studied at different levels:
1. Maximum power
2. Intermediate level that conserves battery life
3. Minimum power that maximizes battery life
Intel Mobile Pentium & Pentium M: two available clock rates
1. Maximum
2. Reduced clock rate
Three Intel Mobile Processors (maximum/reduced clock rate)
Pentium M @ 1.6/0.6 GHz
Pentium 4-M @ 2.4/1.2 GHz
Pentium III-M @ 1.2/0.8 GHz
Energy Efficiency
Synthetic Benchmarks
Artificial programs constructed to try to match the characteristics of
a large set of programs.
Goal: Create a single benchmark program where the execution
frequency of instructions in the benchmark simulates the instruction
frequency in a large set of benchmarks.
Examples:
Dhrystone, Whetstone
They are not real programs
Compiler and hardware optimizations can inflate the improvement
far beyond what the same optimization would do with real programs
Amdahl’s Law in Computing
Improving one aspect of a machine by a factor of n does not
improve the overall performance by the same amount.
Speedup = (Performance after improvement) / (Performance before improvement)
Speedup = (Execution time before improvement) / (Execution time after improvement)
Execution Time After Improvement =
Execution Time Unaffected + (Execution Time Affected / n)
Amdahl’s Law
Example: Suppose a program runs in 100 s on a
machine, with multiplication responsible for 80 s of this
time.
How much do we have to improve the speed of
multiplication if we want the program to run 4 times
faster?
Can we improve the performance by a factor 5?
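The two questions can be worked through by solving Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / n) for n. The Python sketch below does this; the function name `required_improvement` and the choice to return `None` for an unreachable target are illustrative, not from the slides.

```python
def required_improvement(total_s, affected_s, target_speedup):
    """Factor n needed on the affected part, or None if the target is unreachable."""
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_speedup
    # Time budget left for the affected part after the unaffected part is paid for
    budget_s = target_time_s - unaffected_s
    if budget_s <= 0:
        return None
    return affected_s / budget_s

n_for_4x = required_improvement(100, 80, 4)  # 25 s target: 20 + 80/n = 25
n_for_5x = required_improvement(100, 80, 5)  # 20 s target: 20 + 80/n = 20
print(n_for_4x, n_for_5x)  # 16.0 None
```

Running 4 times faster requires multiplication to become 16 times faster. A factor of 5 is not reachable, because the 20 s that do not involve multiplication already use up the whole 20 s target.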
Amdahl’s Law
The performance enhancement possible due to a given improvement is
limited by the amount that the improved feature is used.
In the previous example, it makes sense to improve multiplication since it
takes 80% of all execution time.
But after a certain amount of improvement, further effort to optimize
multiplication will yield insignificant improvement.
Law of Diminishing Returns
A corollary to Amdahl’s Law is to make the common case fast.
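The diminishing returns can be seen numerically with the earlier 100 s program in which multiplication takes 80 s. The following Python sketch (the function name `overall_speedup` and the chosen values of n are illustrative) prints the overall speedup for increasingly large improvements to multiplication.

```python
def overall_speedup(total_s, affected_s, n):
    """Amdahl's Law: speedup when the affected part becomes n times faster."""
    time_after_s = (total_s - affected_s) + affected_s / n
    return total_s / time_after_s

for n in (2, 4, 16, 64, 1024):
    print(n, round(overall_speedup(100, 80, n), 2))
```

Going from n = 4 to n = 16 raises the overall speedup from 2.5 to 4.0, but even n = 1024 stays below 5, the limit set by the 20 s that are unaffected.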
Examples
Suppose we enhance a machine making all floating-point instructions run
five times faster. If the execution time of some benchmark before the
floating-point enhancement is 10 seconds, what will the speedup be if half
of the 10 seconds is spent executing floating-point instructions?
We are looking for a benchmark to show off the new floating-point unit
described above, and want the overall benchmark to show a speedup of 3.
One benchmark we are considering runs for 90 seconds with the old
floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our
desired speedup on this benchmark?
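Both examples follow from the same Amdahl's Law relation. The Python sketch below works them out; the function names are illustrative, and `affected_time_needed` is simply the relation total/speedup = (total - x) + x/n solved for x.

```python
def speedup(total_s, affected_s, n):
    """Overall speedup when affected_s seconds become n times faster."""
    return total_s / ((total_s - affected_s) + affected_s / n)

def affected_time_needed(total_s, n, target_speedup):
    """Seconds that must be affected to reach target_speedup with factor n."""
    return (total_s - total_s / target_speedup) / (1 - 1 / n)

# Example 1: 10 s benchmark, half of it floating point, FP made 5 times faster
first = speedup(10, 5, 5)
print(round(first, 2))  # 1.67

# Example 2: 90 s benchmark, FP made 5 times faster, desired speedup of 3
fp_seconds = affected_time_needed(90, 5, 3)
print(round(fp_seconds, 1), round(fp_seconds / 90, 3))  # 75.0 0.833
```

In the first example the new time is 5 + 5/5 = 6 s, so the speedup is 10/6. In the second, floating-point instructions would have to account for about 75 of the 90 seconds, roughly 83% of the execution time.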
Remember
Total execution time is a consistent summary of performance
Execution Time = (IC × CPI)/f
For a given architecture, performance increases come from:
1. increases in clock rate (without too much adverse CPI effects)
2. improvements in processor organization that lower CPI
3. compiler enhancements that lower CPI and/or IC
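To illustrate how each of the three sources changes Execution Time = (IC × CPI)/f, where IC is the instruction count, CPI the cycles per instruction, and f the clock rate, here is a small Python sketch. All numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def execution_time_s(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds: (IC x CPI) / f."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical baseline: 2 billion instructions, CPI of 1.5, 2 GHz clock
baseline = execution_time_s(2e9, 1.5, 2e9)

higher_clock = execution_time_s(2e9, 1.5, 2.5e9)   # 1. higher clock rate
lower_cpi = execution_time_s(2e9, 1.2, 2e9)        # 2. better organization
fewer_instr = execution_time_s(1.6e9, 1.5, 2e9)    # 3. better compiler

for name, t in [("baseline", baseline), ("higher clock", higher_clock),
                ("lower CPI", lower_cpi), ("fewer instructions", fewer_instr)]:
    print(f"{name}: {t:.2f} s")
```

In this made-up case each change alone brings the time from 1.50 s to about 1.20 s, which shows that the three factors are interchangeable in their effect on execution time.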