Lecture 1: Course Introduction and Overview

Lecture 1: Review of Technology
Trends and Cost/Performance
Prof. David A. Patterson
Computer Science 252
Spring 1998
DAP.S98 1
Original
Big Fishes Eating Little Fishes
1988 Computer Food Chain (figure): Mainframe, Supercomputer, Minisupercomputer, Minicomputer, Workstation, PC, Massively Parallel Processors
1998 Computer Food Chain (figure): Mainframe, Server, Supercomputer, Workstation, PC; shown outside the chain: Massively Parallel Processors, Minisupercomputer, Minicomputer
Now who is eating whom?
Why Such Change in 10 years?
• Performance
– Technology Advances
» CMOS VLSI dominates older technologies (TTL, ECL) in
cost AND performance
– Computer architecture advances improve the low end
» RISC, superscalar, RAID, …
• Price: Lower costs due to …
– Simpler development
» CMOS VLSI: smaller systems, fewer components
– Higher volumes
» CMOS VLSI: same development cost for 10,000 vs. 10,000,000 units
– Lower margins by class of computer, due to fewer services
• Function
– Rise of networking/local interconnection technology
Technology Trends: Microprocessor Capacity
Figure: transistors per chip (log scale, 1,000 to 100,000,000) vs. year (1970 to 2000), tracking Moore’s Law through i4004, i8080, i8086, i80286, i80386, i80486, and Pentium, with a “Graduation Window” marked.
Alpha 21264: 15 million
Pentium Pro: 5.5 million
PowerPC 620: 6.9 million
Alpha 21164: 9.3 million
Sparc Ultra: 5.2 million
CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 7 yrs
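The two CMOS trends can be combined into a rough transistor-count growth rate. This is a minimal sketch under a first-order assumption that is ours, not the slide's: transistors per chip scale with die area divided by the square of the line width.

```python
# First-order assumption (not from the slide): transistors ~ die_area / line_width**2.
DIE_AREA_GROWTH_PER_YEAR = 2 ** (1 / 3)    # die size: 2X every 3 years
LINE_WIDTH_SHRINK_PER_YEAR = 2 ** (1 / 7)  # line width: halves every 7 years


def transistor_growth_per_year():
    """Annual growth factor in transistors per chip under the assumption above."""
    density_gain = LINE_WIDTH_SHRINK_PER_YEAR ** 2  # halving width ~ 4X density
    return DIE_AREA_GROWTH_PER_YEAR * density_gain


g = transistor_growth_per_year()
print(f"about {g:.2f}X per year, {g ** 3:.1f}X every 3 years")
```

Under this assumption the estimate comes to roughly 1.54X per year; real chips also gain or lose transistors through design choices, so treat it only as an order-of-magnitude check on the Moore’s Law curve.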
Memory Capacity (Single Chip DRAM)
Figure: DRAM size in bits (log scale, 1,000 to 1,000,000,000) vs. year (1970 to 2000).
Year   Size (Mb)   Cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns
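One way to read the DRAM table is as a compound annual growth rate. The sketch below takes only the first and last rows (1980 and 2000); the helper name `annual_rate` is our own, hypothetical choice.

```python
# (year, size in Mb, cycle time in ns) from the single-chip DRAM table.
DRAM = [
    (1980, 0.0625, 250), (1983, 0.25, 220), (1986, 1, 190),
    (1989, 4, 165), (1992, 16, 145), (1996, 64, 120), (2000, 256, 100),
]


def annual_rate(start, end, years):
    """Compound annual growth factor that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years)


first, last = DRAM[0], DRAM[-1]
span = last[0] - first[0]                             # 20 years
capacity_rate = annual_rate(first[1], last[1], span)  # size grows
speed_rate = annual_rate(last[2], first[2], span)     # cycle time shrinks
print(f"capacity: {capacity_rate:.2f}X/yr, cycle time: {speed_rate:.3f}X/yr faster")
```

Capacity grows about 1.52X per year over this span, close to the 4X-in-3-years trend (about 1.59X per year), while cycle time improves only about 5% per year.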
Technology Trends (Summary)
        Capacity        Speed (latency)
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
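These rates compound. A minimal sketch that projects each trend over a 6-year window (the “6yrs to graduate” figure used in the lecture’s closing summary); the `growth` helper and `TRENDS` table are our own illustrative names.

```python
def growth(factor, period_years, elapsed_years):
    """Total growth when a quantity improves by `factor` every `period_years`."""
    return factor ** (elapsed_years / period_years)


# tech: ((capacity factor, period in years), (speed factor, period in years))
TRENDS = {
    "Logic": ((2, 3), (2, 3)),
    "DRAM": ((4, 3), (2, 10)),
    "Disk": ((4, 3), (2, 10)),
}

for tech, ((cf, cp), (sf, sp)) in TRENDS.items():
    print(f"{tech}: capacity x{growth(cf, cp, 6):.1f}, "
          f"speed x{growth(sf, sp, 6):.2f} over 6 years")
```

DRAM and disk capacity grow 16X in 6 years, while their latency improves only about 1.5X, which is why capacity and latency have to be planned for separately.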
Processor Performance Trends
Figure: performance (log scale, 0.1 to 1000) vs. year (1965 to 2000) for Supercomputers, Mainframes, Minicomputers, and Microprocessors.
Processor Performance (1.35X before, 1.55X now)
Figure: performance (0 to 1200) vs. year (1987 to 1997), with a 1.54X/yr trend line; machines plotted: Sun-4/260, MIPS M/120, MIPS M2000, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600.
Performance Trends
(Summary)
• Workstation performance (measured in SPECmarks)
improves roughly 50% per year
(2X every 18 months)
• Improvement in cost performance estimated
at 70% per year
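Annual growth factors and doubling times are two views of the same trend. This sketch converts between them; the helper names are hypothetical.

```python
import math


def doubling_time_years(annual_factor):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(annual_factor)


def annual_factor_from_doubling(doubling_years):
    """Annual growth factor implied by doubling every `doubling_years`."""
    return 2 ** (1 / doubling_years)


for rate in (1.35, 1.50, 1.55):
    months = 12 * doubling_time_years(rate)
    print(f"{rate:.2f}X/yr doubles every {months:.1f} months")
print(f"2X every 18 months = {annual_factor_from_doubling(1.5):.2f}X/yr")
```

“Roughly 50% per year” and “2X every 18 months” are close but not identical: 1.5X per year doubles in about 20.5 months, and doubling every 18 months is about 1.59X per year.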
Measurement and Evaluation
Architecture is an iterative process:
• Searching the space of possible designs
• At all levels of computer systems
Figure: a loop between Design and Analysis, labeled Creativity and Cost/Performance Analysis, separating Good Ideas, Mediocre Ideas, and Bad Ideas.
Computer Architecture Topics
Figure labels, grouped by area:
• Input/Output and Storage: Disks, WORM, Tape; RAID
• Memory Hierarchy: DRAM, L2 Cache, L1 Cache; Coherence, Bandwidth, Latency
• Instruction Set Architecture: Addressing, Protection, Exception Handling
• Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, DSP
• Also shown: Emerging Technologies, Interleaving, Bus protocols, VLSI
Computer Architecture Topics
Figure: processor-memory (P M) nodes connected through switches (S) to an Interconnection Network (Processor-Memory-Switch organization).
• Multiprocessors: Shared Memory, Message Passing, Data Parallelism
• Networks and Interconnections: Network Interfaces; Topologies, Routing, Bandwidth, Latency, Reliability
CS 252 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st Century
Figure: Computer Architecture (Instruction Set Design, Organization, Hardware) at the center, surrounded by Technology, Parallelism, Programming Languages, Applications, Operating Systems, Measurement & Evaluation, Interface Design (ISA), and History.
Topic Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1996.
• 1.5 weeks Review: Fundamentals of Computer Architecture (Ch. 1), Instruction Set Architecture (Ch. 2), Pipelining (Ch. 3)
• 1 week: Pipelining and Instruction Level Parallelism (Ch. 4)
• 2.5 weeks: Vector Processors and DSPs (Appendix B)
• 1 week: Memory Hierarchy (Chapter 5)
• 1.5 weeks: Input/Output and Storage (Chapter 6)
• 1.5 weeks: Networks and Interconnection Technology (Chapter 7)
• 1.5 weeks: Multiprocessors (Ch. 8 + Culler book draft Chapter 1)
• Research Guest Lectures: Reconfigurable MPer (“BRASS”), DRAM+MPer (“IRAM”), Systems of Systems (“Millennium”)
CS252: Staff
Instructor: David A. Patterson
Office: 635 Soda Hall, 642-6587, patterson@cs
Office Hours: Wed 3:30-4:30 or by appt.
(Contact Tim Ryan, 643-4014, tryan@cs, 634 Soda)
T.A.: Joe Gebis
Office: ?? Soda Hall, 642-??, gebis@eecs
TA Office Hours: TBD
Class: Wed, Fri 2:10:00 - 3:30:00, 203 McLaughlin
Text: Computer Architecture: A Quantitative Approach, Second Edition (1996) (second printing)
Web page: http://http.cs.berkeley.edu/~patterson/252/
Lectures available online before 11:30 AM on the day of lecture
Newsgroup: ucb.class.c252
Lecture style
• 1-Minute Review
• 20-Minute Lecture
• 5-Minute Administrative Matters
• 25-Minute Lecture
• 5-Minute Break (water, stretch)
• 25-Minute Lecture
• Instructor will come to class early & stay after to answer questions
Figure: attention vs. time, with marks at 20 min., the Break, and “In Conclusion, ...”
Grading
• 30% Homeworks (work in pairs)
• 30% Examinations (2 Midterms)
• 30% Research Project (work in pairs)
– Transition from undergrad to grad student
– Berkeley wants you to succeed, but you need to show initiative
– pick topic
– meet 3 times with faculty/TA to see progress
– give oral presentation
– give poster session
– written report like conference paper
– 3 weeks work full time for 2 people
– Opportunity to do “research in the small” to help make transition from good student to research colleague
• 10% Class Participation
Course Style
• Reduce the pressure of taking quizzes
– Only 2 Graded Quizzes: Wednesday Mar. 4 and Wed. Apr. 22
– Our goal: test knowledge vs. speed writing
– 3 hrs to take 1.5-hr test (5:30-8:30 PM, Sibley Auditorium)
– You can bring a summary sheet to both midterm quizzes
» Transfer ideas from book to paper
– Last chance Q&A: during class time day of exam
• Students/Staff meet over free pizza/drinks at La Val's: Wed Mar. 4 (8:30 PM) and Wed Apr 22 (8:30 PM)
Course Style
• Everything is on the course Web page: www.cs.berkeley.edu/~pattrsn/252S98/index.html
• Notes:
– ASUC said today that the books would arrive in less than 1 week. They can also be found in local bookstores (Cody's and a few in Barnes and Noble), as well as at WWW bookstores.
– The Handouts section of the CS152 homepage from Fall 1997 includes that semester's midterms as well as pointers to past exams. Solutions are included.
• Schedule:
– 2 Graded Quizzes: Wednesday Mar. 4 and Wed. Apr. 22
– Project Reviews: Fri. Feb 25, Wed. Apr 1, Wed. Apr 15
– Oral Presentations: Thu/Fri April 30/May 1 1-7PM/1-5PM
– 252 Poster Session: Wed May 6
– 252 Last lecture: Fri May 8
– Project Papers/URLs due: Mon May 11
• Project Suggestions
Related Courses
• CS 152 (Strong Prerequisite): How to build it; Implementation details. Basic knowledge of the organization of a computer is assumed!
• CS 252: Why, Analysis, Evaluation
• CS 258: Parallel Architectures, Languages, Systems
• CS 250: Integrated Circuit Technology from a computer-organization viewpoint
Coping with CS 252
• Spring 95 CS 252 = my worst teaching experience
• Too many students with too varied background?
• 60 students:
– To give proper attention to projects (as well as homeworks and quizzes), I can handle up to 36 students
• Limiting Number of Students
– First priority is first year CS/EECS grad students
– Second priority is N-th year CS/EECS grad students
– Third priority is College of Engineering grad students
– Fourth priority is CS/EECS undergraduate seniors (Note: 1 graduate course unit = 2 undergraduate course units)
– All other categories
• If not this semester, 252 is offered regularly (Fall)
Coping with CS 252
• Students with too varied background?
– In past, CS grad students took written prelim exams on
undergraduate material in hardware, software, and theory
– 1st 5 weeks reviewed background, helped 252, 262, 270
– Prelims were dropped => some unprepared for CS 252?
• In class exam on Wednesday January 28
– Doesn’t affect grade, only admission into class
– 2 grades: Admitted or audit/take CS 152 1st
– Recapturing the common background will improve your experience
• Review: Chapters 1-3, CS 152 home page, maybe
“Computer Organization and Design (COD)2/e”
– Chapters 1 to 8 of COD if never took prerequisite
– If did take a class, be sure COD Chapters 2, 6, 7 are familiar
– Copies in Bechtel Library on 2-hour reserve
Computer Engineering Methodology
Figure: a design cycle driven by Technology Trends: Evaluate Existing Systems for Bottlenecks (Benchmarks), then Simulate New Designs and Organizations (Workloads), then Implement Next Generation System (Implementation Complexity), and repeat.
Measurement Tools
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles
The Bottom Line: Performance (and Cost)
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200
• Time to run the task (ExTime)
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means
$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$
• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
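The plane table can be checked with a few lines of arithmetic. This sketch reads pmph as passenger-miles per hour (passengers times speed), which reproduces the table’s throughput column; the dictionary layout and helper names are illustrative only.

```python
# Data from the DC-to-Paris table.
boeing_747 = {"hours": 6.5, "mph": 610, "passengers": 470}
concorde = {"hours": 3.0, "mph": 1350, "passengers": 132}


def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["mph"]


def times_faster(extime_y, extime_x):
    """n such that X is n times faster than Y: ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


# Latency view: Concorde (X) vs. Boeing 747 (Y), using trip time.
speed_ratio = times_faster(boeing_747["hours"], concorde["hours"])
# Throughput view: performance is throughput, so n = Perf(X) / Perf(Y).
throughput_ratio = throughput_pmph(boeing_747) / throughput_pmph(concorde)
print(f"Concorde is {speed_ratio:.2f}x faster (trip time)")
print(f"747 delivers {throughput_ratio:.2f}x the throughput")
```

The same pair of machines ranks differently depending on the metric: Concorde wins on latency by about 2.17X, while the 747 wins on throughput by about 1.61X.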
Amdahl's Law
Speedup due to enhancement E:
$$ \text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E} $$
Suppose that enhancement E accelerates a fraction F
of the task by a factor S, and the remainder of the
task is unaffected
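Amdahl’s Law
$$ \text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right] $$
$$ \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} $$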
Amdahl’s Law
• Floating point instructions improved to run 2X;
but only 10% of actual instructions are FP
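$$ \text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{\text{old}} $$
$$ \text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053 $$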
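A minimal Python sketch of Amdahl’s Law applied to the floating-point example. Like the slide’s arithmetic, it treats the 10% as a fraction of execution time; if FP instructions took longer than average, their share of time would exceed their 10% share of instructions.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the old execution time
    is sped up by `speedup_enhanced` and the rest is unaffected."""
    new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / new_time


# FP example: 10% of the time is FP, and FP is made 2X faster.
print(f"{amdahl_speedup(0.10, 2):.3f}")     # 1.053
# Even an enormously fast FP unit is bounded by 1 / (1 - 0.10).
print(f"{amdahl_speedup(0.10, 1e12):.3f}")  # 1.111
```

The second line shows the point of the law: no matter how much the enhanced fraction is accelerated, the unenhanced 90% caps the overall speedup at about 1.11.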
Metrics of Performance
Figure: levels of a system, each paired with a typical metric:
Application: Answers per month; Operations per second
Programming Language
Compiler
ISA: (millions) of Instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s
Datapath, Control: Megabytes per second
Function Units
Transistors, Wires, Pins: Cycles per second (clock rate)
Aspects of CPU Performance
$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$
               Inst Count   CPI   Clock Rate
Program            X
Compiler           X        (X)
Inst. Set.         X         X
Organization                 X        X
Technology                            X
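The CPU time equation can be evaluated directly. The program size, CPI values, and 200 MHz clock below are hypothetical numbers chosen only to show how a change that helps one factor can hurt another.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """Instructions/Program x Cycles/Instruction x Seconds/Cycle.

    Dividing by the clock rate is the same as multiplying by the cycle time."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions, CPI 1.5, 200 MHz clock.
base = cpu_time_seconds(1_000_000_000, 1.5, 200e6)
# Hypothetical compiler change: 10% fewer instructions, but CPI rises to 1.6.
tuned = cpu_time_seconds(900_000_000, 1.6, 200e6)
print(f"base {base:.2f} s, tuned {tuned:.2f} s, speedup {base / tuned:.3f}")
```

Only the product of the three factors matters: here the 10% cut in instruction count is partly offset by the higher CPI, leaving a speedup of about 1.04.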
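Cycles Per Instruction
“Average Cycles per Instruction”
$$ \text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}} $$
$$ \text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i $$
“Instruction Frequency”
$$ \text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} $$
Invest Resources where time is Spent!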
Example: Calculating CPI
Base Machine (Reg / Reg), Typical Mix
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total                    1.5
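The CPI example follows directly from the weighted-sum formula, CPI = Σ CPI_i × F_i. This sketch recomputes the table, including each operation’s share of execution time; the `MIX` name is our own.

```python
MIX = {  # op: (frequency, cycles), from the "Typical Mix" table
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

cpi = sum(freq * cycles for freq, cycles in MIX.values())
time_share = {op: freq * cycles / cpi for op, (freq, cycles) in MIX.items()}

for op, (freq, cycles) in MIX.items():
    print(f"{op:<6} CPI(i) = {freq * cycles:.1f}  share of time = {time_share[op]:.0%}")
print(f"CPI = {cpi:.2f}")
```

The time-share column is what “invest resources where time is spent” points at: loads and branches are only 40% of instructions but account for about 53% of the cycles.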
SPEC: System Performance
Evaluation Cooperative
• First Round 1989
– 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point
programs)
» Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
– new set of programs: SPECint95 (8 integer programs) and
SPECfp95 (10 floating point)
– “benchmarks useful for 3 years”
– Single flag setting for all programs: SPECint_base95,
SPECfp_base95
How to Summarize Performance
• Arithmetic mean (weighted arithmetic mean) tracks execution time: $\sum_{i=1}^{n} T_i / n$ or $\sum_{i=1}^{n} W_i \times T_i$ (weights $W_i$ summing to 1)
• Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $n / \sum_{i=1}^{n} (1/R_i)$ or $1 / \sum_{i=1}^{n} (W_i/R_i)$
• Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
• But do not take the arithmetic mean of normalized execution time; use the geometric mean: $\left(\prod_{i=1}^{n} \text{NormalizedTime}_i\right)^{1/n}$
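A small sketch of the three means, using hypothetical numbers throughout. The first part shows the harmonic mean of rates matching total execution time when each program does the same work; the second shows why the arithmetic mean of normalized times is misleading while the geometric mean is not.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(rates):
    """n / sum(1/R_i): matches total time when every program does equal work."""
    return len(rates) / sum(1 / r for r in rates)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical: two programs, each doing 100 MFLOP, at 10 and 100 MFLOPS.
rates = [10.0, 100.0]
total_time = sum(100.0 / r for r in rates)       # 10 s + 1 s
print(harmonic_mean(rates), 200.0 / total_time)  # the two values agree

# Hypothetical execution times (s) of two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]
b_norm_to_a = [b / a for a, b in zip(times_a, times_b)]
a_norm_to_b = [a / b for a, b in zip(times_a, times_b)]
# Arithmetic mean of normalized times: each machine looks 5.05X slower than the other.
print(arithmetic_mean(b_norm_to_a), arithmetic_mean(a_norm_to_b))
# Geometric mean: both come out to (about) 1.0, so the two views agree.
print(geometric_mean(b_norm_to_a), geometric_mean(a_norm_to_b))
```

The geometric mean is consistent regardless of which machine is the reference, but note that it does not track total execution time the way the arithmetic mean of raw times does.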
5 minute Class Break
• 80 minutes straight is too long for me to lecture (2:10:00 – 3:30:00):
– 1 minute: review last time & motivate this lecture
– 20 minutes: lecture
– 3 minutes: discuss class management
– 25 minutes: lecture
– 5 minutes: break
– 25 minutes: lecture
– 1 minute: summary of today’s important topics
SPEC First Round
• One program: 99% of time in single line of code
• New front-end compiler could improve dramatically
Figure: SPEC Perf (0 to 800) by benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv.
Impact of Means on SPECmark89 for IBM 550
Program     Ratio to VAX     Time            Weighted Time
            Before  After    Before  After   Before  After
gcc           30      29       49      51     8.91    9.22
espresso      35      34       65      67     7.64    7.86
spice         47      47      510     510     5.69    5.69
doduc         46      49       41      38     5.81    5.45
nasa7         78     144      258     140     3.43    1.86
li            34      34      183     183     7.86    7.86
eqntott       40      40       28      28     6.68    6.68
matrix300     78     730       58       6     3.43    0.37
fpppp         90      87       34      35     2.97    3.07
tomcatv       33     138       20      19     2.01    1.94
Mean          54      72      124     108    54.42   49.99
            (Geometric)      (Arithmetic)    (Weighted Arith.)
Ratio          1.33             1.16             1.09
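The time columns of the IBM 550 table show how the choice of mean changes the reported improvement. This sketch recomputes the arithmetic and weighted summaries from the table’s Time and Weighted Time columns only.

```python
# Program order: gcc, espresso, spice, doduc, nasa7, li, eqntott,
# matrix300, fpppp, tomcatv (Time and Weighted Time columns of the table).
time_before = [49, 65, 510, 41, 258, 183, 28, 58, 34, 20]
time_after = [51, 67, 510, 38, 140, 183, 28, 6, 35, 19]
weighted_before = [8.91, 7.64, 5.69, 5.81, 3.43, 7.86, 6.68, 3.43, 2.97, 2.01]
weighted_after = [9.22, 7.86, 5.69, 5.45, 1.86, 7.86, 6.68, 0.37, 3.07, 1.94]

arith_before = sum(time_before) / len(time_before)
arith_after = sum(time_after) / len(time_after)
print(f"arithmetic mean time: {arith_before:.1f} -> {arith_after:.1f}, "
      f"ratio {arith_before / arith_after:.2f}")
print(f"weighted time: {sum(weighted_before):.2f} -> {sum(weighted_after):.2f}, "
      f"ratio {sum(weighted_before) / sum(weighted_after):.2f}")
# matrix300 alone went from 58 to 6 time units, the largest single-program change.
```

The weighted sums come out within rounding of the table’s 54.42 and 49.99, and the ratios match the slide’s 1.16 (arithmetic) and 1.09 (weighted): the same compiler change looks more or less impressive depending on how results are summarized.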
Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
– Good benchmarks
– Good ways to summarize performance
• Since sales are partly a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
• If benchmarks or summaries are inadequate, a company must choose between improving the product for real programs and improving the product to get more sales;
Sales almost always wins!
• Execution time is the measure of computer
performance!
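Integrated Circuits Costs
$$ \text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}} $$
$$ \text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per Wafer} \times \text{Die yield}} $$
$$ \text{Dies per wafer} = \frac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die Area}}} - \text{Test dies} $$
$$ \text{Die Yield} = \text{Wafer yield} \times \left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha} $$
Die Cost goes roughly with die area^4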
Real World Examples
Chip          Metal    Line    Wafer   Defects   Area   Dies/   Yield   Die
              layers   width   cost    /cm2      mm2    wafer           Cost
386DX         2        0.90    $900    1.0       43     360     71%     $4
486DX2        3        0.80    $1200   1.0       81     181     54%     $12
PowerPC 601   4        0.80    $1700   1.3       121    115     28%     $53
HP PA 7100    3        0.80    $1300   1.0       196    66      27%     $73
DEC Alpha     3        0.70    $1500   1.2       234    53      19%     $149
SuperSPARC    3        0.70    $1700   1.6       256    48      13%     $272
Pentium       3        0.80    $1500   1.5       296    40      9%      $417
– From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
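A sketch of the die cost formulas in Python. The wafer diameter is not given on the slides; we assume a 150 mm (15 cm) wafer with no test dies, which happens to reproduce the table’s Dies/wafer column. The `alpha=3.0` default in `die_yield` is likewise an assumed process parameter, and the loop uses the table’s own Yield column rather than that model.

```python
import math


def dies_per_wafer(wafer_diam_mm, die_area_mm2, test_dies=0):
    """Whole dies per wafer: wafer area over die area, minus edge loss and test dies."""
    gross = math.pi * (wafer_diam_mm / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diam_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss - test_dies)


def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=3.0):
    """Yield model from the slide; alpha=3.0 is an assumption, not from the table."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost, dies, yield_fraction):
    return wafer_cost / (dies * yield_fraction)


WAFER_DIAM_MM = 150  # assumption: a 15 cm wafer matches the Dies/wafer column
for chip, area, wafer_cost, table_yield in [("386DX", 43, 900, 0.71),
                                            ("Pentium", 296, 1500, 0.09)]:
    n = dies_per_wafer(WAFER_DIAM_MM, area)
    print(f"{chip}: {n} dies/wafer, die cost ${die_cost(wafer_cost, n, table_yield):.0f}")
```

Larger dies lose twice: fewer fit on the wafer and a smaller fraction of them work, which is why die cost climbs much faster than die area.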
Cost/Performance
What is Relationship of Cost to Price?
• Component Costs
• Direct Costs (add 25% to 40%) recurring costs: labor,
purchasing, scrap, warranty
• Gross Margin (add 82% to 186%) nonrecurring costs:
R&D, marketing, sales, equipment maintenance, rental, financing
cost, pretax profits, taxes
• Average Discount to get List Price (add 33% to 66%): volume
discounts and/or retailer markup
Figure: breakdown of List Price (as a share of list price): Average Discount 25% to 40%, Gross Margin 34% to 39%, Direct Cost 6% to 8%, Component Cost 15% to 33%; List Price minus Average Discount gives the Avg. Selling Price.
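The cost-to-price chain can be sketched as three successive markups. This assumes, as the “add” wording suggests, that each markup applies to the running total; the $1000 component cost is a hypothetical input.

```python
def list_price(component_cost, direct, gross_margin, discount):
    """Apply the three markups in turn; each is a fraction of the running total."""
    direct_cost_total = component_cost * (1 + direct)           # + direct costs
    avg_selling_price = direct_cost_total * (1 + gross_margin)  # + gross margin
    return avg_selling_price * (1 + discount)                   # + average discount


low = list_price(1000, 0.25, 0.82, 0.33)   # smallest markups on the slide
high = list_price(1000, 0.40, 1.86, 0.66)  # largest markups on the slide
print(f"$1000 of components -> list price ${low:,.0f} to ${high:,.0f}")
```

Under these markups, list price ends up roughly 3X to 6.6X the component cost.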
Chip Prices (August 1993)
• Assume purchase 10,000 units
Chip          Area mm2   Mfg. cost   Price    Multiplier   Comment
386DX         43         $9          $31      3.4          Intense Competition
486DX2        81         $35         $245     7.0          No Competition
PowerPC 601   121        $77         $280     3.6
DEC Alpha     234        $202        $1231    6.1          Recoup R&D?
Pentium       296        $473        $965     2.0          Early in shipments
Summary: Price vs. Cost
Figure: list price of Mini, W/S, and PC broken into Component Costs, Direct Costs, Gross Margin, and Average Discount, shown as a percentage of list price (0% to 100%) and on a 0 to 5 scale relative to component costs (totals labeled 4.7, 3.5, and 3.8).
Computer Architecture Is …
the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
Amdahl, Blaauw, and Brooks, 1964
Computer Architecture’s
Changing Definition
• 1950s to 1960s: Computer Architecture Course
Computer Arithmetic
• 1970s to mid 1980s: Computer Architecture Course
Instruction Set Design, especially ISA appropriate
for compilers
• 1990s: Computer Architecture Course
Design of CPU, memory system, I/O system,
Multiprocessors
Instruction Set Architecture (ISA)
Figure: the instruction set as the interface layer between software (above) and hardware (below).
Interface Design
A good interface:
• Lasts through many implementations (portability, compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
Figure: many uses above a single Interface, with implementations imp 1, imp 2, and imp 3 appearing below it over time.
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model
from Implementation
High-level Language Based
(B5000 1963)
Concept of a Family
(IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets
(Vax, Intel 432 1977-80)
Load/Store Architecture
(CDC 6600, Cray 1 1963-76)
RISC
(Mips, Sparc, HP-PA, IBM RS6000, . . . 1987)
Evolution of Instruction Sets
• Major advances in computer architecture are
typically associated with landmark instruction
set designs
– Ex: Stack vs GPR (System 360)
• Design decisions must take into account:
– technology
– machine organization
– programming languages
– compiler technology
– operating systems
• And they in turn influence these
A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPRs (R0 contains zero; double-precision values take a register pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch
see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
Example: MIPS
Register-Register
  bits 31-26: Op | 25-21: Rs1 | 20-16: Rs2 | 15-11: Rd | 10-6: (blank) | 5-0: Opx
Register-Immediate
  bits 31-26: Op | 25-21: Rs1 | 20-16: Rd | 15-0: immediate
Branch
  bits 31-26: Op | 25-21: Rs1 | 20-16: Rs2/Opx | 15-0: immediate
Jump / Call
  bits 31-26: Op | 25-0: target
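The fixed field positions make decoding a matter of shifts and masks. This sketch packs and unpacks the formats above; the opcode values are hypothetical, bits 10-6 are simply left zero, and sign-extending the 16-bit immediate is our assumption rather than something the slide states.

```python
def field(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(value):
    """Interpret a 16-bit field as two's complement (an assumption here)."""
    return value - (1 << 16) if value & 0x8000 else value


def encode_reg_reg(op, rs1, rs2, rd, opx):
    """Pack a Register-Register word; bits 10-6 are left as zero."""
    return (op << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | opx


def decode_reg_reg(word):
    return {"op": field(word, 31, 26), "rs1": field(word, 25, 21),
            "rs2": field(word, 20, 16), "rd": field(word, 15, 11),
            "opx": field(word, 5, 0)}


def decode_reg_imm(word):
    return {"op": field(word, 31, 26), "rs1": field(word, 25, 21),
            "rd": field(word, 20, 16),
            "immediate": sign_extend16(field(word, 15, 0))}


def decode_jump(word):
    return {"op": field(word, 31, 26), "target": field(word, 25, 0)}


# Hypothetical opcode values, chosen only to exercise the layouts.
word = encode_reg_reg(op=0, rs1=1, rs2=2, rd=3, opx=0x20)
print(hex(word), decode_reg_reg(word))
imm_word = (8 << 26) | (1 << 21) | (2 << 16) | 0xFFFC
print(decode_reg_imm(imm_word))
```

Because Op and Rs1 sit in the same bits in every format, hardware can read the opcode and the first register before it knows which format it is decoding, one payoff of a fixed-format RISC encoding.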
Summary, #1
• Designing to Last through Trends
        Capacity        Speed
Logic   2x in 3 years   2x in 3 years
DRAM    4x in 3 years   2x in 10 years
Disk    4x in 3 years   2x in 10 years
• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …
– Throughput, bandwidth
• “X is n times faster than Y” means
$$ n = \frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} $$
Summary, #2
• Amdahl’s Law:
$$ \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} $$
• CPI Law:
$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$
• Execution time is the REAL measure of computer performance!
• Good products are created when we have:
– Good benchmarks, good ways to summarize performance
• Die Cost goes roughly with die area^4
• Can PC industry support engineering/research investment?