Performance and Quantitative Principles
Download
Report
Transcript Performance and Quantitative Principles
ENGS 116 Lecture 2
1
Performance and Quantitative Principles
Vincent H. Berk
September 26th, 2008
Reading for today: Chapter 1.1 - 1.4, Amdahl article
Reading for Monday: Chapter 1.5 – 1.11, Mazor article
Homework for Wednesday: 1.1, 1.3, 1.6, 1.7, 1.13
ENGS 116 Lecture 2
2
Review
Task of Computer Designers
– Determine which attributes are important for a new machine
– Design a machine to maximize performance without violating
cost/power/functionality constraints
3 Components of “Architecture”
– Instruction set design
– Organization
– Hardware
ENGS 116 Lecture 2
3
Benchmarking Games
Different configurations used to run the same workload on two
systems.
Compiler customized to optimize the workload.
Workload arbitrarily picked to skew results.
Test specification written to be biased toward one machine.
ENGS 116 Lecture 2
4
Design benchmarks for:
Industrial and design
Consumer Electronics
Networking, routers
Office applications
Telecommunications
Weapon systems
ENGS 116 Lecture 2
5
Execution time
Weighted arithmetic mean: sum over execution time of all programs
run, times their relative frequencies
Normalized execution time: take a reference machine, set it to 1, then
compute normalized execution times for others based on this machine
Geometric mean of normalized execution time (reference computer
becomes irrelevant, ratios can arbitrarily be compared)
ENGS 116 Lecture 2
6
Amdahl’s Law
Execution time after improvement =
Exec.timeaffectedby improvemen
t
+ Exec.timeunaffected
Amountof improvemen
t
Speedup =
Execution time before improvement
Execution time after improvement
Make the common case fast
ENGS 116 Lecture 2
7
Amdahl’s Law
Speedup due to enhancement E:
Speedup(E)
=
ExTime w/o E
ExTime w/ E
Performance w/ E
Performance w/o E
Suppose that enhancement E accelerates a fraction F of the task
by a factor S, and the remainder of the task is unaffected:
ExTime (E) =
Speedup (E) =
ENGS 116 Lecture 2
8
Amdahl’s Law
Fract ionenhanced
ExT imenew = ExT imeold ( 1 Fract ionenhanced ) +
Speedup enhanced
Speedup overall =
ExT imeold
1
=
Fract ionenhanced
ExT imenew
( 1 Fract ionenhanced ) +
Speedup enhanced
ENGS 116 Lecture 2
9
Amdahl’s Law
Example: Floating point instructions improved to run 2X,
but only 10% of actual instructions are FP
ExTime new=
Speedup overall=
ENGS 116 Lecture 2
10
Corollary: Make The Common Case Fast
All instructions require an instruction fetch, only a fraction require a
data fetch/store.
– Optimize instruction access over data access
Programs exhibit locality.
Spatial Locality
Temporal Locality
Access to small memories is faster.
– Provide a storage hierarchy such that the most frequent accesses are to
the smallest (closest) memories.
Disk/Tape
Registers Cache
Memory
ENGS 116 Lecture 2
11
Metrics of Performance
Application
Answers per month
Operations per second
Programming
Language
Compiler
ISA
Millions of instructions per second: MIPS
Millions of FP operations per second: MFLOPS
Datapath
Control
Function Units
Transistors
Wires
Megabytes per second
Pins
Cycles per second (clock rate)
ENGS 116 Lecture 2
12
Marketing Metrics
MIPS =
Instr Count Clock Rate
=
6
Time 10
CPI 10 6
Machines with different instruction sets?
Programs with different instruction mixes?
– Dynamic frequency of instructions
Uncorrelated with performance
FP Operations
MFLOPS =
Time 10 6
Machine dependent
Often not where time is spent
ENGS 116 Lecture 2
13
Aspects of CPU Performance
CP U t ime=
Seconds Inst rcount
Cycles
Seconds
=
P rogram P rogram Inst rcount
Cycle
Instr. Count
Program
Compiler
Instruction Set
Organization
Technology
CPI
Clock Rate
ENGS 116 Lecture 2
14
Aspects of CPU Performance
CP U t ime=
Seconds Inst rcount
Cycles
Seconds
=
P rogram P rogram Inst rcount
Cycle
Program
Instr. Count
X
Compiler
X
(X)
Instruction Set
X
X
Organization
Technology
CPI
X
Clock Rate
X
X
ENGS 116 Lecture 2
15
Cycles Per Instruction
Average Cycles per Instruction
CPI = (CPU Time Clock Rate) / Instruction Count
Cycles / Instruction Count
CPU time = Cycle Time
n
CP I I
i
i
i=1
Instruction Frequency
n
CP I = CP Ii Fi
whereFi =
i=1
Invest resources where time is spent!
Ii
Inst rCount
=
ENGS 116 Lecture 2
16
Example: Calculating CPI
Base Machine (Reg / Reg)
Op
FreqCycles
CPI (i)
ALU
50%
1
.5
Load
20%
2
.4
Store
10%
2
.2
Branch
20%
2
.4
1.5
Typical Mix
(% Time)
(33%)
(27%)
(13%)
(27%)
ENGS 116 Lecture 2
17
Example
Want to add register / memory operations
- One source operand in memory
- One source operand in register
- Cycle count of 2
Side effect: Branch cycle count will increase to 3.
What fraction of the loads must be eliminated for this to pay off?
Base Machine (Reg / Reg)
Op
ALU
Load
Store
Branch
Freq
50%
20%
10%
20%
Cycles
1
2
2
2
ENGS 116 Lecture 2
18
Example Solution
Exec Time = Instruction Count CPI Clock
Op
Freq
ALU
.50
Load
.20
Store
.10
Branch
.20
Reg/Mem
1.00
Cycles
1
2
2
2
1.5
CPI
.5
.4
.2
.4
Freq
Cycles CPI
ENGS 116 Lecture 2
19
Example Solution
Exec Time = Instruction Count CPI Clock
Op
Freq
ALU
.50
Load
.20
Store
.10
Branch
.20
Reg/Mem
1.00
Cycles
1
2
2
2
1.5
CPI
.5
.4
.2
.4
Freq
.5 – X
.2 – X
.1
.2
X
1–X
Cycles
CPI
1
.5 – X
2
.4 – 2X
2
.2
3
.6
2
2X
(1.7 – X) /(1 – X)
Cycles New
Instructions New
CPINew must be normalized to new instruction frequency
ENGS 116 Lecture 2
20
Example Solution
Exec Time = Instruction Count CPI Clock
Op
Freq
ALU
.50
Load
.20
Store
.10
Branch
.20
Reg/Mem
1.00
Cycles
1
2
2
2
1.5
CPI
.5
.4
.2
.4
Freq
.5 – X
.2 – X
.1
.2
X
1–X
Cycles
CPI
1
.5 – X
2
.4 – 2X
2
.2
3
.6
2
2X
(1.7 – X) / (1 – X)
Instr CntOld CPIOld ClockOld = Instr CntNew CPINew ClockNew
ENGS 116 Lecture 2
21
Example Solution
Exec Time = Instruction Count CPI Clock
Op
Freq
Cycles CPI
Freq
ALU
.50
1
.5
.5 – X
Load
.20
2
.4
.2 – X
Store
.10
2
.2
.1
Branch
.20
2
.4
.2
Reg/Mem
X
1.00
1.5
1–X
Instr CntOld CPIOld ClockOld
Cycles
CPI
1
.5 – X
2
.4 – 2X
2
.2
3
.6
2
2X
(1.7 – X) / (1 – X)
= Instr CntNew CPINew ClockNew
1.00
1.5
= (1 – X) (1.7 – X) / (1 – X)
1.5
=
1.7 – X
0.2
=
X
ALL loads must be eliminated for this to be a win!