Transcript lect01

Lecture 1: Course Introduction,
Technology Trends, Performance
Professor Alvin R. Lebeck
Computer Science 220
Fall 2001
Administrative
• Office Hours
Office: D304 LSRC
Hours: Mon 10:00-11:00 Thurs 2:00-3:00 or by appointment (email)
email: [email protected]
Phone: 660-6551
• Teaching Assistant
Fareed Zaffar
Office: D125 LSRC
Hours: Tuesday 10:00-11:00, Wednesday 1:00-2:00
email: [email protected]
Phone: 660-6576
© Alvin R. Lebeck 2001
CPS 220
2
Administrative (Grading)
• 30% Homeworks
– 6 Homeworks
– 5 points per day late, for first 10 days
– Always do the homework (better late than never)
• 30% Examinations (Midterm + Final)
• 30% Research Project (work in pairs)
• 10% Class Participation
• This course requires hard work.
© Alvin R. Lebeck 2001
CPS 220
3
Administrative (Continued)
• Midterm Exam: In class (75 min) Closed book
• Final Exam: (3 hours) closed book
• This is a “Quals” Course.
– Quals pass based on Midterm and Final exams only
© Alvin R. Lebeck 2001
4
Administrative (Continued)
• Course Web Page
– http://www.cs.duke.edu/courses/fall01/cps220
– Lectures posted there after class (pdf)
– Homework posted there
• Course News Group
– duke.cs.cps220
– Use it to 1) read announcements/comments on class or homework,
2) ask questions (help), 3) communicate with each other
• Need Duke CS account
– Duke ID, ACPUB account name (see HW #0)
© Alvin R. Lebeck 2001
5
SPIDER: Systems Seminar
• Systems & Architecture Seminar
– Wednesdays 3:45-5:00 in D344
– duke.cs.os-research (spider newsgroup)
• Presentations on current work
– Practice talks for conferences
– Discussion on recent papers
– Your own research
• Why you should go?
– If you want to work in Systems/Architecture…
– Good time to practice public speaking in front of friendly crowd
– Learn about current topics
© Alvin R. Lebeck 2001
6
Assignment
• Homework #0 (Background, due Thursday)
• Read Chapters 1 & 2
© Alvin R. Lebeck 2001
7
CPS 220 Course Focus
Understanding the design techniques, machine
structures, technology factors, evaluation methods that
will determine the form of computers in 21st Century
Technology
Parallelism
Programming
Languages
Applications
Computer Architecture:
• Instruction Set Design
• Organization
• Hardware
Power
Operating
Systems
© Alvin R. Lebeck 2001
Measurement &
Evaluation
CPS 220
Interface Design
(ISA)
History
8
Related Courses
Prerequisites
• CPS 104: Basic Machine Organization
• CPS 110: Basic Operating System Functions
• This course: focus on why, analysis, evaluation
– Cost/performance
– Power budget
Follow on Courses
• CPS 221: Advanced Computer Architecture II
– Parallel computer architecture
© Alvin R. Lebeck 2001
9
Computer Architecture Is …
the attributes of a [computing] system as
seen by the programmer, i.e., the
conceptual structure and functional
behavior, as distinct from the organization
of the data flows and controls, the logic
design, and the physical implementation.
Amdahl, Blaaw, and Brooks, 1964
SOFTWARE
© Alvin R. Lebeck 2001
CPS 220
10
Topic Coverage
Textbook: Hennessy and Patterson, Computer
Architecture: A Quantitative Approach, 2nd Ed.,
1995.
•
•
•
•
•
•
•
•
•
•
•
Fundamentals of Computer Architecture (Chapter 1)
Instruction Set Architecture (Chapter 2, Appendix C&D)
Pipelining (Chapter 3)
Advanced Pipelining and ILP (Chapter 4)
Memory Hierarchy (Chapter 5)
Input/Output and Storage (Chapter 6)
Networks and Interconnection Technology (Chapter 7)
Multiprocessors (Chapter 8)
Vectors (Apendix)
New Architectures/trends (papers)
Power (papers)
© Alvin R. Lebeck 2001
CPS 220
11
Computer Architecture Topics
Input/Output and Storage
Disks, WORM, Tape
Emerging Technologies
Interleaving
Bus protocols
DRAM
Memory
Hierarchy
Coherence,
Bandwidth,
Latency
L2 Cache
L1 Cache
Addressing,
Protection,
Exception Handling
VLSI
Instruction Set Architecture
Pipelining, Hazard Resolution,
Superscalar, Reordering,
Prediction, Speculation
© Alvin R. Lebeck 2001
RAID
CPS 220
Pipelining and Instruction
Level Parallelism
12
Computer Architecture Topics (CPS 221)
P M
P M
S
°°°
P M
P M
Interconnection Network
Processor-Memory-Switch
Multiprocessors
Networks and Interconnections
© Alvin R. Lebeck 2001
CPS 220
Shared Memory,
Message Passing,
Data Parallel
Network Interfaces
Topologies,
Routing,
Bandwidth,
Latency,
Reliability
13
Computer Engineering Methodology
Technology
Trends
© Alvin R. Lebeck 2001
14
Computer Engineering Methodology
Evaluate Existing
Systems for
Bottlenecks
Benchmarks
Technology
Trends
© Alvin R. Lebeck 2001
15
Computer Engineering Methodology
Evaluate Existing
Systems for
Bottlenecks
Benchmarks
Technology
Trends
Simulate New
Designs and
Organizations
Workloads
© Alvin R. Lebeck 2001
16
Computer Engineering Methodology
Implementation
Complexity
Evaluate Existing
Systems for
Bottlenecks
Benchmarks
Technology
Trends
Implement Next
Generation System
Simulate New
Designs and
Organizations
Workloads
© Alvin R. Lebeck 2001
17
Context for Designing New Architectures
• Application Area
– Special Purpose (e.g., DSP) / General Purpose
– Scientific (FP intensive) / Commercial (Mainframe)
– Portable (Power matters)
• Level of Software Compatibility
– Object Code/Binary Compatible (cost HW vs. SW; IBM S/360)
– Assembly Language (dream to be different from binary)
– Programming Language; Why not?
© Alvin R. Lebeck 2001
CPS 220
18
Context for Designing New Architectures
• OS Requirements for General Purpose Apps
–
–
–
–
–
Size of Address Space
Memory Management/Protection
Context Switch
Interrupts and Traps
Communication
• Standards: Innovation vs. Competition
–
–
–
–
IEEE 754 Floating Point
I/O Bus
Networks
Operating Systems / Programming Languages ...
© Alvin R. Lebeck 2001
CPS 220
19
Technology Trends: Microprocessor Capacity
1000000000
“Graduation Window”
100000000
10000000
1000000
100000
10000
1000
19
71
19
74
19
78
19
82
19
85
19
89
19
93
19
95
20
00
20
03
20
06
20
06
20
11
100
Intel
Digital
Pentium Pro: 5.5 million
Sparc Ultra: 5.2 million
PowerPC 620: 6.9 million
Alpha 21164: 9.3 million
Alpha 21264: 15 million
Pentium III: 28 million
Pentium 4: 42 million
Alpha 21364: 100 million
Alpha 21464: 250 million
CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 7 yrs
© Alvin R. Lebeck 2001
20
DRAM Capacity (single chip)
size
10000000
1000000
100000
10000
1000
100
year size
1980 64 Kb
1983 256 Kb
1986 1 Mb
1989 4 Mb
1992 16 Mb
1996 64Mb
1998 256Mb
2002 1Gb
cyc time
250 ns
220 ns
190 ns
165 ns
145 ns
104 ns
10
1980 1983 1986 1989 1992 1996 1998 2002
© Alvin R. Lebeck 2001
21
Technology Trends (Summary)
Capacity
Speed
Logic
2x in 3 years
2x in 3 years
DRAM
4x in 3 years
1.4x in 10 years
Disk
2x in 3 years
1.4x in 10 years
© Alvin R. Lebeck 2001
CPS 220
22
Processor Performance
300
Sun UltraSparc
P
e
r
f
o
r
m
a
n
c
e
1.54X /yr
250
DEC 21064a
200
150
IBM Power 2/590
100
DEC AX P 3000
HP 9000/750
50
MIPS M/120
MIPS M2000
Sun-4/260
IBM
RS6000/540
1.35X /yr
0
1987 1988 1989 1990 1991 1992 1993 1994 1995
Year
© Alvin R. Lebeck 2001
CPS 220
23
Alpha SPECint and SPECfp
Integer
Floating Point
1.54x/yr
Performance (Specmark)
700
600
500
400
300
200
100
0
1995 1996 1997 1998 1999 2000 2001 2002 2003 2004
© Alvin R. Lebeck 2001
24
Chip Area Reachable in One Clock Cycle
Fraction of Chip Reached
1.2
1
0.8
f16
f8
fSIA
0.6
0.4
0.2
0
250
180
130
100
70
50
35
Nanometers
© Alvin R. Lebeck 2001
25
Power Density
Power Density W/cm^2
1000
100
Processor
Hot Plate
Laser diode
10
1
1.5
1
0.8 0.6 0.35 0.25 0.18 0.13 0.1
Microns
© Alvin R. Lebeck 2001
26
Processor Perspective
• Putting performance growth in perspective:
Pentium-III
Cray YMP
Personal Comp. Supercomputer
Year
1998
1988
MIPS
> 400 MIPS
< 50 MIPS
Linpack
140 MFLOPS
160 MFLOPS
Cost
$3,000
$1M ($1.6M in 1994$)
Clock
400 MHz
167 MHz
Cache
512 KB
0.25 KB
Memory
128 MB
256 MB
• 1988 supercomputer in 1998 personal computer!
© Alvin R. Lebeck 2001
27
Measurement and Evaluation
Design
Architecture is an iterative process:
• Searching the space of possible designs
• At all levels of computer systems
Analysis
Creativity
Cost /
Performance
Analysis
Good Ideas
Bad Ideas
© Alvin R. Lebeck 2001
CPS 220
Mediocre Ideas
28
Measurement Tools
•
•
•
•
How do I evaluate an idea?
Performance, Cost, Die Area, Power Estimation
Benchmarks, Traces, Mixes
Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental Laws
• Question: What is “better” Boeing 747 or Concorde?
© Alvin R. Lebeck 2001
CPS 220
29
The Bottom Line: Performance (and Cost)
Plane
DC to Paris
Speed
Passengers
Throughput
(pmph)
Boeing 747
6.5 hours
610 mph
470
286,700
BAD/Sud
Concorde
3 hours
1350 mph
132
178,200
• Time to run the task (ExTime)
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
© Alvin R. Lebeck 2001
CPS 220
30
The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means
ExTime(Y)
--------ExTime(X)
=
Performance(X)
--------------Performance(Y)
• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
© Alvin R. Lebeck 2001
CPS 220
31
Performance Terminology
“X is n% faster than Y” means:
ExTime(Y)
--------- =
ExTime(X)
Performance(X)
-------------Performance(Y)
=
1
+
n
----100
n = 100(Performance(X) - Performance(Y))
Performance(Y)
Example: Y takes 15 seconds to complete a task,
X takes 10 seconds. What % faster is X?
© Alvin R. Lebeck 2001
CPS 220
32
Example
ExTime(Y)
=
ExTime(X)
© Alvin R. Lebeck 2001
15
10
1.5
1.0
=
n
=
n
=
CPS 220
=
Performance (X)
Performance (Y)
100 (1.5 - 1.0)
1.0
50%
33
Amdahl's Law
Speedup due to enhancement E:
ExTime w/o E
Speedup(E) = ------------ExTime w/ E
=
Performance w/ E
------------------Performance w/o E
Suppose that enhancement E accelerates a fraction F
of the task by a factor S, and the remainder of the
task is unaffected, then:
ExTime(E) =
Speedup(E) =
© Alvin R. Lebeck 2001
CPS 220
34
Amdahl’s Law
ExTimenew = ExTimeold x (1 - Fractionenhanced) + Fractionenhanced
Speedupenhanced
Speedupoverall =
ExTimeold
ExTimenew
1
=
(1 - Fractionenhanced) + Fractionenhanced
Speedupenhanced
© Alvin R. Lebeck 2001
CPS 220
35
Amdahl’s Law
• Floating point instructions improved to run 2X;
but only 10% of actual instruction execution time
is FP
ExTimenew =
Speedupoverall =
© Alvin R. Lebeck 2001
CPS 220
36
Amdahl’s Law
• Floating point instructions improved to run 2X;
but only 10% of actual instruction execution time
is FP
ExTimenew = ExTimeold x (0.9 + .1/2) = 0.95 x ExTimeold
Speedupoverall =
© Alvin R. Lebeck 2001
1
0.95
CPS 220
=
1.053
37
Corollary: Make The Common Case Fast
• All instructions require an instruction fetch,
only a fraction require a data fetch/store.
– Optimize instruction access over data access
• Programs exhibit locality
Spatial Locality
Temporal Locality
• Access to small memories is faster
– Provide a storage hierarchy such that the most frequent
accesses are to the smallest (closest) memories.
Reg's
© Alvin R. Lebeck 2001
Cache
Memory
CPS 220
Disk / Tape
38
Occam's Toothbrush
• The simple case is usually the most frequent and the
easiest to optimize!
• Do simple, fast things in hardware and be sure the
rest can be handled correctly in software
© Alvin R. Lebeck 2001
CPS 220
39
Metrics of Performance
Application
Answers per month
Operations per second
Programming
Language
Compiler
ISA
(millions) of Instructions per second: MIPS
(millions) of (FP) operations per second: MFLOP/s
Datapath
Control
Function Units
Transistors Wires Pins
© Alvin R. Lebeck 2001
Megabytes per second
Cycles per second (clock rate)
CPS 220
40
Aspects of CPU Performance
CPU time
= Seconds
Program
= Instructions x
Cycles
Program
Instruction
Instr. Cnt
CPI
x Seconds
Cycle
Clock Rate
Program
Compiler
Instr. Set
Organization
Technology
© Alvin R. Lebeck 2001
CPS 220
41
Marketing Metrics
Instructio n Count
Clock Rate
6
MIPS 
 10 
 106
Time
CPI
• Machines with different instruction sets ?
• Programs with different instruction mixes ?
– Dynamic frequency of instructions
• Uncorrelated with performance
FP Operations
MFLOPS 
 10 6
Time
• Machine dependent
• Often not where time is spent
© Alvin R. Lebeck 2001
CPS 220
Normalized:
add,sub,compare,mult
1
divide, sqrt
4
exp, sin, . . .
8
43
Cycles Per Instruction
“Average Cycles Per Instruction”
CPI 
CPU time  Clock Rate
Cycles

Instructio n Count
Instructio n Count
n
CPU time  Cycle Time   CPI i  I i
i 1
“Instruction Frequency”
n
CPI   CPI i  Fi
i 1
wher e Fi 
Ii
Instructio n Count
Invest Resources where time is Spent!
© Alvin R. Lebeck 2001
44
Organizational Trade-offs
Application
Programming
Language
Compiler
Instruction Mix
ISA
Datapath
Control
Function Units
Transistors Wires Pins
© Alvin R. Lebeck 2001
CPS 220
CPI
Cycle Time
45
Example: Calculating CPI
Base Machine (Reg / Reg)
Op
Freq Cycles CPIi
ALU
50%
1
.5
Load
20%
2
.4
Store
10%
2
.2
Branch
20%
2
.4
1.5
(% Time)
(33%)
(27%)
(13%)
(27%)
Typical Mix
© Alvin R. Lebeck 2001
CPS 220
46
Example
Add register / memory operations to traditional RISC:
– One source operand in memory
– One source operand in register
– Cycle count of 2
Branch cycle count to increase to 3.
What fraction of the loads must be eliminated for this
to pay off?
Base Machine (Reg / Reg)
Op
Freq Cycles
ALU
50% 1
Load
20% 2
Store
10% 2
Branch
20% 2
© Alvin R. Lebeck 2001
CPS 220
47
Next Time
• Benchmarks
• Performance Metrics
• Cost
• Instruction Set Architectures
TODO
• Read Chapters 1 & 2
• Do Homework #0
© Alvin R. Lebeck 2001
CPS 220
48