
Benchmarks
 Programs specifically chosen to measure performance
 Must reflect typical workload of the user
 Benchmark types
 Real applications
 Small benchmarks
 Benchmark suites
 Synthetic benchmarks
Real Applications
 Workload: the set of programs a typical user runs day in and day out.
 Using these real applications as metrics is a direct way of comparing
the execution time of the workload on two machines.
 Using real applications for metrics has certain restrictions:
 They are usually big
 They take time to port to different machines
 They take considerable time to execute
 It is hard to observe the outcome of a particular improvement
technique
Comparing & Summarizing Performance
             Computer A   Computer B
 Program 1        1 s        100 s
 Program 2     1000 s        100 s
 Total time    1001 s        200 s
 A is 100 times faster than B for program 1
 B is 10 times faster than A for program 2
 For total performance, the arithmetic mean is used:

     AM = (1/n) × Σ_{i=1..n} Time_i
Arithmetic Mean
 If the programs in the workload are not run an equal number of times,
we have to use the weighted arithmetic mean:

     WAM = Σ_{i=1..n} w_i × Time_i    (where the weights w_i sum to 1)
 Suppose that program 1 runs 10 times as often as program 2. Which
machine is faster?

                       weight   Computer A   Computer B
 Program 1 (seconds)       10            1          100
 Program 2 (seconds)        1         1000          100
 Weighted AM                -            ?            ?
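The question marks above can be filled in with a short computation; this is a sketch in Python using the table's weights and times, with the weights normalized by their sum:

```python
# Program 1 runs 10 times as often as program 2 (weights from the table).
weights = [10, 1]
times_a = [1, 1000]    # Computer A, seconds
times_b = [100, 100]   # Computer B, seconds

w_sum = sum(weights)
wam_a = sum(w * t for w, t in zip(weights, times_a)) / w_sum  # (10*1 + 1*1000)/11
wam_b = sum(w * t for w, t in zip(weights, times_b)) / w_sum  # (10*100 + 1*100)/11

print(f"Weighted AM: A = {wam_a:.1f} s, B = {wam_b:.1f} s")
# A ≈ 91.8 s versus B = 100.0 s, so A is faster under this workload.
```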
Small Benchmarks
 Small code segments which are common in many applications
 For example, loops with certain instruction mix
 for (j = 0; j < 8; j++)
     S = S + A[j] * B[i-j];
 Good for architects and designers
 Since small code segments are easy to compile and simulate, even by
hand, designers use these kinds of benchmarks while working on a
novel machine
 Can be abused by compiler writers, who may introduce special-purpose
optimizations targeted at a specific benchmark
Benchmark Suites
 SPEC (Standard Performance Evaluation Corporation)
 Non-profit organization that aims to produce "fair, impartial and
meaningful benchmarks for computers"
 Began in 1989 with SPEC89 (CPU intensive)
 Companies agreed on a set of real programs and inputs which they
hope best reflect a typical user's workload
 Valuable indicator of performance
 Can still be abused
 Updates are required as applications and their workloads change
over time
SPEC Benchmark Sets
 CPU Performance (SPEC CPU2006)
 Graphics (SPECviewperf)
 High-performance computing (HPC2002,
MPI2007, OMP2001)
 Java server applications (jAppServer2004)
 a multi-tier benchmark for measuring the performance of Java 2
Enterprise Edition (J2EE) technology-based application servers.
 Mail systems (MAIL2001, SPECimap2003)
 Network File systems (SFS97_R1 (3.0))
 Web servers (SPEC WEB99, SPEC WEB99 SSL)
 More information: http://www.spec.org/
SPECInt
Integer Benchmarks
 Name             Description
 400.perlbench    Programming Language
 401.bzip2        Compression
 403.gcc          C Compiler
 429.mcf          Combinatorial Optimization
 445.gobmk        Artificial Intelligence
 456.hmmer        Search Gene Sequence
 458.sjeng        Artificial Intelligence
 462.libquantum   Physics / Quantum Computing
 464.h264ref      Video Compression
 471.omnetpp      Discrete Event Simulation
 473.astar        Path-finding Algorithms
 483.xalancbmk    XML Processing
SPECfp
Floating Point Benchmarks
 Name       Type
 wupwise    Quantum chromodynamics
 swim       Shallow water model
 mgrid      Multigrid solver in 3D potential field
 applu      Parabolic/elliptic partial differential equations
 mesa       Three-dimensional graphics library
 galgel     Computational fluid dynamics
 art        Image recognition using neural networks
 equake     Seismic wave propagation simulation
 facerec    Image recognition of faces
 ammp       Computational chemistry
 lucas      Primality testing
 fma3d      Crash simulation
 sixtrack   High-energy nuclear physics accelerator design
 apsi       Meteorology; pollutant distribution
SPEC CPU2006 – Summarizing
 SPEC ratio: execution time measurements are normalized by dividing
the execution time on a reference machine by the measured execution
time
 Example machine: a Sun Microsystems Fire V20z, which has an AMD
Opteron 252 CPU running at 2600 MHz
 The 164.gzip benchmark executes in 90.4 s
 The reference time for this benchmark is 1400 s
 SPEC ratio = 1400/90.4 × 100 = 1548 (a unitless value)
 Performances of different programs in the suites are summarized
using “geometric mean” of SPEC ratios.
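A minimal sketch of summarizing a suite with the geometric mean; the SPEC ratios below are made-up illustrative values, not published results:

```python
import math

# Hypothetical SPEC ratios for a four-benchmark suite (illustrative only).
ratios = [1548, 900, 1200, 1100]

# Geometric mean: the nth root of the product of the n ratios.
gm = math.prod(ratios) ** (1 / len(ratios))

# Equivalent log-domain form, which avoids overflow for large suites.
gm_log = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print(f"geometric mean = {gm:.0f}")
```

The geometric mean is used because the ratio of two machines' geometric means equals the geometric mean of their per-benchmark ratios, so the summary does not depend on which machine is the reference.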
Comparing Pentium III and Pentium 4

 Ratio                          Pentium III   Pentium 4
 CINT2000 / Clock rate in MHz       0.47         0.36
 CFP2000 / Clock rate in MHz        0.34         0.39

 Implementation efficiency?
SPEC WEB99

 System     Processor          # of disk   # of   # of       Clock rate   Result
                               drives      CPUs   networks   (GHz)
 1550/1000  Pentium III            2         2        2        1            2765
 1650       Pentium III            3         2        1        1.4          1810
 2500       Pentium III            8         2        4        1.13         3435
 2550       Pentium III            1         2        1        1.26         1454
 2650       Pentium 4 Xeon         5         2        4        3.06         5698
 4600       Pentium 4 Xeon        10         2        4        2.2          4615
 6400/700   Pentium III Xeon       5         4        4        0.7          4200
 6600       Pentium 4 Xeon MP      8         4        8        2            6700
 8450/700   Pentium III Xeon       7         8        8        0.7          8001
Power Consumption Concerns

 Performance is studied at three levels:
 1. Maximum power
 2. Intermediate level that conserves battery life
 3. Minimum power that maximizes battery life

 Intel Mobile Pentium & Pentium M: two available clock rates
 1. Maximum clock rate
 2. Reduced clock rate

 Three Intel Mobile Processors (compared for energy efficiency):
  Pentium M @ 1.6/0.6 GHz
  Pentium 4-M @ 2.4/1.2 GHz
  Pentium III-M @ 1.2/0.8 GHz
Synthetic Benchmarks
 Artificial programs constructed to match the characteristics of a
large set of programs.
 Goal: create a single benchmark program in which the execution
frequency of instructions simulates the instruction frequency
observed across a large set of benchmarks.
 Examples:
 Dhrystone, Whetstone
 They are not real programs
 Compiler and hardware optimizations can inflate the measured
improvement far beyond what the same optimization would achieve on
real programs
Amdahl’s Law in Computing
 Improving one aspect of a machine by a factor of n does not
improve the overall performance by the same amount.
 Speedup = (Performance after improvement) /
(Performance before improvement)
 Speedup = (Execution time before improvement) /
(Execution time after improvement)
 Execution Time After Improvement =
Execution Time Unaffected +
(Execution Time Affected / n)
Amdahl’s Law
 Example: Suppose a program runs in 100 s on a
machine, with multiplication responsible for 80 s of this
time.
 How much do we have to improve the speed of
multiplication if we want the program to run 4 times
faster?
 Can we improve the performance by a factor of 5?
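A sketch of the arithmetic, assuming the 80 s / 20 s split stated above (Python used purely for illustration):

```python
def time_after(t_unaffected, t_affected, n):
    """Amdahl's Law: execution time after the affected part is sped up by n."""
    return t_unaffected + t_affected / n

t_mult, t_rest = 80.0, 20.0   # multiplication time, everything else

# Target: 4x overall speedup, so the new total time must be 100/4 = 25 s.
# 25 = 20 + 80/n  =>  80/n = 5  =>  n = 16
n = t_mult / (100 / 4 - t_rest)
print(f"multiplication must become {n:.0f}x faster")

# Target: 5x overall speedup would require a total of 20 s, but the
# unaffected 20 s alone already takes that long: even an infinitely
# fast multiplier only reaches time_after(20, 80, inf) = 20 s.
```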
Amdahl’s Law
 The performance enhancement possible due to a given improvement is
limited by the amount that the improved feature is used.
 In previous example, it makes sense to improve multiplication since it
takes 80% of all execution time.
 But once a certain improvement has been made, further effort to
optimize the multiplication yields insignificant gains.
 Law of Diminishing Returns
 A corollary to Amdahl's Law: make the common case fast.
Examples
 Suppose we enhance a machine making all floating-point instructions run
five times faster. If the execution time of some benchmark before the
floating-point enhancement is 10 seconds, what will the speedup be if half
of the 10 seconds is spent executing floating-point instructions?
 We are looking for a benchmark to show off the new floating-point unit
described above, and want the overall benchmark to show a speedup of 3.
One benchmark we are considering runs for 90 seconds with the old
floating-point hardware. How much of the execution time would
floating-point instructions have to account for in this program in
order to yield our desired speedup on this benchmark?
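Both examples reduce to the fraction-based form of Amdahl's Law; here is a sketch in Python using the values from the examples above:

```python
def overall_speedup(frac, factor):
    """Amdahl's Law: overall speedup when a fraction `frac` of the
    execution time is sped up by `factor`."""
    return 1 / ((1 - frac) + frac / factor)

# Example 1: half of a 10 s run is floating point; FP becomes 5x faster.
s = overall_speedup(0.5, 5)
print(f"speedup = {s:.2f}")            # 1 / (0.5 + 0.1) ≈ 1.67

# Example 2: we want an overall speedup of 3 with a 5x faster FP unit.
# Solve 3 = 1 / ((1 - f) + f/5):  1/3 = 1 - (4/5) f  =>  f = (2/3) / (4/5)
f = (1 - 1 / 3) / (1 - 1 / 5)
print(f"FP fraction needed = {f:.3f}, i.e. {f * 90:.0f} s of the 90 s run")
```

So the second benchmark qualifies only if roughly 75 of its 90 seconds are spent in floating-point instructions.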
Remember
 Total execution time is a consistent summary of performance

     Execution Time = (IC × CPI) / f

 For a given architecture, performance increases come from:
 1. increases in clock rate (without too many adverse CPI effects)
 2. improvements in processor organization that lower CPI
 3. compiler enhancements that lower CPI and/or IC
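The execution-time equation can be sketched numerically; the instruction count, CPI, and clock rate below are hypothetical values chosen for illustration:

```python
def execution_time(ic, cpi, clock_hz):
    """Execution Time = (IC x CPI) / f, for instruction count IC,
    cycles per instruction CPI, and clock rate f in Hz."""
    return ic * cpi / clock_hz

# Hypothetical program: 10^9 instructions, CPI of 2, on a 2 GHz clock.
t = execution_time(1e9, 2.0, 2e9)
print(f"execution time = {t:.1f} s")

# Halving CPI (better organization) or halving IC (better compiler)
# halves the execution time, matching the list above.
```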