Performance and Quantitative Principles

Download Report

Transcript Performance and Quantitative Principles

ENGS 116 Lecture 2
1
Performance and Quantitative Principles
Vincent H. Berk
September 26th, 2008
Reading for today: Chapter 1.1 - 1.4, Amdahl article
Reading for Monday: Chapter 1.5 – 1.11, Mazor article
Homework for Wednesday: 1.1, 1.3, 1.6, 1.7, 1.13
ENGS 116 Lecture 2
2
Review

Task of Computer Designers
– Determine which attributes are important for a new machine
– Design a machine to maximize performance without violating
cost/power/functionality constraints
 3 Components of “Architecture”
– Instruction set design
– Organization
– Hardware
ENGS 116 Lecture 2
3
Benchmarking Games
 Different configurations used to run the same workload on two
systems.
 Compiler customized to optimize the workload.
 Workload arbitrarily picked to skew results.
 Test specification written to be biased toward one machine.
ENGS 116 Lecture 2
4
Design benchmarks for:
 Industrial and design
 Consumer Electronics
 Networking, routers
 Office applications
 Telecommunications
 Weapon systems
ENGS 116 Lecture 2
5
Execution time
 Weighted arithmetic mean: sum over execution time of all programs
run, times their relative frequencies
 Normalized execution time: take a reference machine, set it to 1, then
compute normalized execution times for others based on this machine
 Geometric mean of normalized execution time (reference computer
becomes irrelevant, ratios can arbitrarily be compared)
ENGS 116 Lecture 2
6
Amdahl’s Law
Execution time after improvement =
 Exec.timeaffectedby improvemen

t

+ Exec.timeunaffected
Amountof improvemen
t


Speedup =
Execution time before improvement
Execution time after improvement
 Make the common case fast
ENGS 116 Lecture 2
7
Amdahl’s Law
Speedup due to enhancement E:
Speedup(E)
=
ExTime w/o E
ExTime w/ E

Performance w/ E
Performance w/o E
Suppose that enhancement E accelerates a fraction F of the task
by a factor S, and the remainder of the task is unaffected:
ExTime (E) =
Speedup (E) =
ENGS 116 Lecture 2
8
Amdahl’s Law

Fract ionenhanced 
ExT imenew = ExT imeold  ( 1  Fract ionenhanced ) +

Speedup enhanced 

Speedup overall =
ExT imeold
1
=
Fract ionenhanced
ExT imenew
( 1  Fract ionenhanced ) +
Speedup enhanced
ENGS 116 Lecture 2
9
Amdahl’s Law
Example: Floating point instructions improved to run 2X,
but only 10% of actual instructions are FP
ExTime new=
Speedup overall=
ENGS 116 Lecture 2
10
Corollary: Make The Common Case Fast
 All instructions require an instruction fetch, only a fraction require a
data fetch/store.
– Optimize instruction access over data access
 Programs exhibit locality.
Spatial Locality
Temporal Locality
 Access to small memories is faster.
– Provide a storage hierarchy such that the most frequent accesses are to
the smallest (closest) memories.
Disk/Tape
Registers Cache
Memory
ENGS 116 Lecture 2
11
Metrics of Performance
Application
Answers per month
Operations per second
Programming
Language
Compiler
ISA
Millions of instructions per second: MIPS
Millions of FP operations per second: MFLOPS
Datapath
Control
Function Units
Transistors
Wires
Megabytes per second
Pins
Cycles per second (clock rate)
ENGS 116 Lecture 2
12
Marketing Metrics
MIPS =
Instr Count Clock Rate
=
6
Time 10
CPI 10 6
 Machines with different instruction sets?
 Programs with different instruction mixes?
– Dynamic frequency of instructions
 Uncorrelated with performance
FP Operations
MFLOPS =
Time 10 6
 Machine dependent
 Often not where time is spent
ENGS 116 Lecture 2
13
Aspects of CPU Performance
CP U t ime=
Seconds Inst rcount
Cycles
Seconds
=


P rogram P rogram Inst rcount
Cycle
Instr. Count
Program
Compiler
Instruction Set
Organization
Technology
CPI
Clock Rate
ENGS 116 Lecture 2
14
Aspects of CPU Performance
CP U t ime=
Seconds Inst rcount
Cycles
Seconds
=


P rogram P rogram Inst rcount
Cycle
Program
Instr. Count
X
Compiler
X
(X)
Instruction Set
X
X
Organization
Technology
CPI
X
Clock Rate
X
X
ENGS 116 Lecture 2
15
Cycles Per Instruction
Average Cycles per Instruction
CPI = (CPU Time  Clock Rate) / Instruction Count
Cycles / Instruction Count
CPU time = Cycle Time 
n
 CP I  I
i
i
i=1
Instruction Frequency
n
CP I =  CP Ii  Fi
whereFi =
i=1
Invest resources where time is spent!
Ii
Inst rCount
=
ENGS 116 Lecture 2
16
Example: Calculating CPI
Base Machine (Reg / Reg)
Op
FreqCycles
CPI (i)
ALU
50%
1
.5
Load
20%
2
.4
Store
10%
2
.2
Branch
20%
2
.4
1.5
Typical Mix
(% Time)
(33%)
(27%)
(13%)
(27%)
ENGS 116 Lecture 2
17
Example
Want to add register / memory operations
- One source operand in memory
- One source operand in register
- Cycle count of 2
Side effect: Branch cycle count will increase to 3.
What fraction of the loads must be eliminated for this to pay off?
Base Machine (Reg / Reg)
Op
ALU
Load
Store
Branch
Freq
50%
20%
10%
20%
Cycles
1
2
2
2
ENGS 116 Lecture 2
18
Example Solution
Exec Time = Instruction Count  CPI  Clock
Op
Freq
ALU
.50
Load
.20
Store
.10
Branch
.20
Reg/Mem
1.00
Cycles
1
2
2
2
1.5
CPI
.5
.4
.2
.4
Freq
Cycles CPI
ENGS 116 Lecture 2
19
Example Solution
Exec Time = Instruction Count  CPI  Clock
Op
Freq
ALU
.50
Load
.20
Store
.10
Branch
.20
Reg/Mem
1.00
Cycles
1
2
2
2
1.5
CPI
.5
.4
.2
.4
Freq
.5 – X
.2 – X
.1
.2
X
1–X
Cycles
CPI
1
.5 – X
2
.4 – 2X
2
.2
3
.6
2
2X
(1.7 – X) /(1 – X)
Cycles New
Instructions New
CPINew must be normalized to new instruction frequency
ENGS 116 Lecture 2
20
Example Solution
Exec Time = Instruction Count  CPI  Clock
Op
Freq
ALU
.50
Load
.20
Store
.10
Branch
.20
Reg/Mem
1.00
Cycles
1
2
2
2
1.5
CPI
.5
.4
.2
.4
Freq
.5 – X
.2 – X
.1
.2
X
1–X
Cycles
CPI
1
.5 – X
2
.4 – 2X
2
.2
3
.6
2
2X
(1.7 – X) / (1 – X)
Instr CntOld  CPIOld  ClockOld = Instr CntNew  CPINew  ClockNew
ENGS 116 Lecture 2
21
Example Solution
Exec Time = Instruction Count  CPI  Clock
Op
Freq
Cycles CPI
Freq
ALU
.50
1
.5
.5 – X
Load
.20
2
.4
.2 – X
Store
.10
2
.2
.1
Branch
.20
2
.4
.2
Reg/Mem
X
1.00
1.5
1–X
Instr CntOld  CPIOld  ClockOld
Cycles
CPI
1
.5 – X
2
.4 – 2X
2
.2
3
.6
2
2X
(1.7 – X) / (1 – X)
= Instr CntNew  CPINew  ClockNew
1.00
 1.5
= (1 – X)  (1.7 – X) / (1 – X)
1.5
=
1.7 – X
0.2
=
X
ALL loads must be eliminated for this to be a win!