CpE 242 Introduction to Computer Architecture Lecture 2

Download Report

Transcript CpE 242 Introduction to Computer Architecture Lecture 2

CpE 442
Introduction to Computer Architecture
The Role of Performance
Instructor: H. H. Ammar
CpE442 Lec2.1
Introduction to Computer Architectures
Overview of Today’s Lecture:
The Role of Performance
° Review from Last Lecture
° Definition and Measures of Performance
° Summarizing Performance and Performance
Pitfalls
CpE442 Lec2.2
Introduction to Computer Architectures
Review: What is "Computer Architecture"
° Co-ordination of
levels of abstraction
Application
Operating
System
Compiler
Instr. Set Proc. I/O system
Instruction Set
Architecture
Digital Design
Circuit Design
° Under a set of rapidly changing Forces
CpE442 Lec2.3
Introduction to Computer Architectures
Review: Levels of Representation
temp = v[k];
High Level Language
Program
v[k] = v[k+1];
v[k+1] = temp;
Compiler
lw $15,
lw $16,
sw $16,
sw $15,
Assembly Language
Program
Assembler
Machine Language
Program
0000
1010
1100
0101
1001
1111
0110
1000
1100
0101
1010
0000
0110
1000
1111
1001
0($2)
4($2)
0($2)
4($2)
1010
0000
0101
1100
1111
1001
1000
0110
0101
1100
0000
1010
1000
0110
1001
1111
Machine Interpretation
Control Signal
Specification
CpE442 Lec2.4
Introduction to Computer Architectures
Review: Levels of Organization
SPARCstation 20
Computer
SPARC
Processor
CpE442 Lec2.5
Memory
Devices
Control
Input
Datapath
Output
Introduction to Computer Architectures
Review: Summary from Last Lecture
° All computers consist of five components
• Processor: (1) datapath and (2) control
• (3) Memory
• (4) Input devices and (5) Output devices
° Not all “memory” are created equally
• Cache: fast (expensive) memory are placed closer to the processor
• Main memory: less expensive memory--we can have more
° Input and output (I/O) devices has the messiest organization
• Wide range of speed: graphics vs. keyboard
• Wide range of requirements: speed, standard, cost ... etc.
• Least amount of research (so far)
CpE442 Lec2.6
Introduction to Computer Architectures
Processor Performance
120
IBM Power 2/590
P
e 100
r
f
o
80
r
60
DEC AXP
3000
m
a
n
c
HP 9000/750
1.54X/yr
40
IBM
20
e
Sun-4/260
MIPS M/120
MIPS M2000 RS6000/540
1.35X/yr
0
1987
1988
1989
1990
1991
1992
1993
Year
CpE442 Lec2.7
Introduction to Computer Architectures
Metrics of performance
Answers per month
Operations per second
Application
Programming
Language
Compiler
ISA
(millions) of Instructions per second – MIPS
(millions) of (F.P.) operations per second – MFLOP/s
Datapath
Control
Megabytes per second
Function Units
Transistors Wires Pins
CpE442 Lec2.8
Cycles per second (clock rate)
Introduction to Computer Architectures
Relating Processor Metrics
° CPU execution time = CPU clock cycles/pgm X clock
cycle time
° or CPU execution time = CPU clock cycles/pgm ÷
clock rate
° CPU clock cycles/pgm = Instructions/pgm X CPI the
avg. clock cycles per instruction
° or CPI = CPU clock cycles/pgm ÷ Instructions/pgm
° CPI tells us something about the Instruction Set
Architecture, the Implementation of that architecture,
and the program measured
CpE442 Lec2.9
Introduction to Computer Architectures
Aspects of CPU Performance
CPU time
= Seconds
Program
instr. count
= Instructions x Cycles
Program
CPI
x Seconds
Instruction
Cycle
clock rate
Program
Compiler
Instr. Set Arch.
Organization
Technology
CpE442 Lec2.10
Introduction to Computer Architectures
Aspects of CPU Performance
CPU time
= Seconds
Program
instr count
= Instructions x Cycles
Program
CPI
Program
X
(x)
Compiler
X
(x)
Instr. Set.
X
X
Organization
Technology
CpE442 Lec2.11
X
x Seconds
Instruction
Cycle
clock rate
X
X
Introduction to Computer Architectures
Organizational Trade-offs
Application
Programming
Language
Compiler
ISA
Datapath
Control
Instruction Mix
CPI
Function Units
Transistors Wires Pins
CpE442 Lec2.12
Cycle Time
Introduction to Computer Architectures
CPI
“Average cycles per instruction”
CPI = (CPU Time * Clock Rate) / Instruction Count
= Clock Cycles / Instruction Count
n
S
CPU time = ClockCycleTime *
i =1
n
CPI =
S CPI i
i =1
CPI * I
i
i
"instruction frequency"
*
F
i
where F i =
I i
Instruction Count
Invest Resources where time is Spent!
CpE442 Lec2.13
Introduction to Computer Architectures
Example
Base Machine (Reg / Reg)
Op
Freq(Fi)
CPI(i)
ALU
50%
1
Load
20%
2
Store
10%
2
Branch
20%
2
Typical Mix
.5
.4
.2
.4
1.5
% Time
33%
27%
13%
27%
The CPI = 1.5 cycles per instruction
Assignment 1: Turn in the solution of the following problems
from the text book By Thursday September 4,
Chapter 2, Exercises Section, problems number
2.1, 2.2, 2.3, 2.4, 2.10, 2.11, 2.12, 2.13, and 2.15
CpE442 Lec2.14
Introduction to Computer Architectures
Assume a program of 1 million instructions, Compare the
performance of
Base Machine (B) with the above CPI, 1 GHZ clock, and
Enhanced Machine (E) with 1.333 GHZ and a one cycle increase
for L/S
And branch instructions
Enhanced Machine (Reg / Reg)
Op
Freq CPI(i)
% Time
ALU 50%
1
.5
25%
Load 20%
3
.6
30%
Store 10%
3
.3
15%
Branch20% 3
.6
30%
2.0
CpE442 Lec2.15
Introduction to Computer Architectures
Perf. of machine X = 1 / exec. Time of prog
on machine X
Perf. of E / Perf. of B = exec. Time of B /
exec. Time of E = 1.5 * 1 / 2 * 0.75
=1
Performance of B is similar to that of E,
No gain in performance
CpE442 Lec2.16
Introduction to Computer Architectures
Marketing Metrics
MIPS = Instruction Count / (Time * 10^6)
= Clock Rate / (CPI * 10^6)
•machines with different instruction sets ?
•programs with different instruction mixes ?
• dynamic frequency of instructions
• uncorrelated with performance
MFLOP/S= FP Operations / (Time * 10^6)
•machine dependent
•often not where time is spent
CpE442 Lec2.17
Introduction to Computer Architectures
Example showing why MIPS can fail
Compare performance with Compilers 1 and 2 for a
given program on a given machine
Instruction Count in Billion for
instruction classes
A B C
Compiler 1
5 1
1
Compiler 2
10 1
1
clock cycles
1 2
3
Clock cycles using compiler1 = 10 Billion
Clock cycles using compiler2 = 15 Billion
assuming 1GHZ clock
CPU Time 1 = 10 secs
CPU Time 2 = 15 secs
yet the MIPS rating is
MIPS 1 = (instr. Count/cpu time in sec x 10^6) = 700
MIPS 2 = 800
CpE442 Lec2.18
Introduction to Computer Architectures
Why Do Benchmarks?
° How we evaluate differences
• Different systems
• Changes to a single system
° Provide a target
• Benchmarks should represent large class of important
programs
• Improving benchmark performance should help many
programs
° For better or worse, benchmarks shape a field
° Good ones accelerate progress
• good target for development
° Bad benchmarks hurt progress
• help real programs v. sell machines/papers?
• Inventions that help real programs don’t help benchmark
CpE442 Lec2.19
Introduction to Computer Architectures
Programs to Evaluate Processor Performance
° (Toy) Benchmarks
• 10-100 line
• e.g.,: sieve, puzzle, quicksort
° Synthetic Benchmarks
• attempt to match average frequencies of real
workloads
• e.g., Whetstone, dhrystone
° Kernels
• Time critical excerpts Real programs
• e.g., gcc, spice
CpE442 Lec2.20
Introduction to Computer Architectures
Successful Benchmark: SPEC
° EE Times + 5 companies band together to
perform Systems Performance Evaluation
Committee (SPEC) in 1988:
Sun, MIPS, HP, Apollo, DEC
° Create standard list of programs, inputs,
reporting: some real programs, includes OS
calls, some I/O
CpE442 Lec2.21
Introduction to Computer Architectures
SPEC first round
° First round 1989; 10 programs, single number to summarize
performance
° One program: 99% of time in single line of code
° New front-end compiler could improve dramatically
800
700
SPEC Perf
600
500
400
300
200
100
tomcatv
fpppp
matrix300
eqntott
li
nasa7
doduc
spice
epresso
gcc
0
B ench mark
CpE442 Lec2.22
Introduction to Computer Architectures
SPEC second round, SPEC95
• 8 integer benchmarks in C and 10 floating pt benchmarks in Fortran
CpE442 Lec2.23
Introduction to Computer Architectures
Amdahl's Law
Speedup due to enhancement E:
ExTime w/o E
Speedup(E) = -------------------ExTime w/ E
Performance w/ E
=
--------------------Performance w/o E
Suppose that enhancement E accelerates a fraction F of the task
by a factor S and the remainder of the task is unaffected then,
ExTime(with E) = ((1-F) + F/S) X ExTime(without E)
Speedup(with E) = ExTime(without E) ÷
((1-F) + F/S) X ExTime(without E)
<= 1/(1-F) speed up is bounded by this factor
CpE442 Lec2.24
Introduction to Computer Architectures
Performance Evaluation Summary
CPU time
= Seconds
Program
= Instructions x Cycles
Program
x Seconds
Instruction
Cycle
° Time is the measure of computer performance!
° Good products created when have:
• Good benchmarks
• Good ways to summarize performance
° If not good benchmarks and summary, then choice between
improving product for real programs vs. improving product to
get more sales=> sales almost always wins
° Remember Amdahl’s Law: Speedup is limited by unimproved
part of program
CpE442 Lec2.25
Introduction to Computer Architectures