Processor - University of California, Berkeley

Introduction to Hardware/Architecture
David A. Patterson
http://cs.berkeley.edu/~patterson/talks
[email protected]
EECS, University of California
Berkeley, CA 94720-1776

Technology Trends: Microprocessor Capacity
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2000). Chips plotted: i4004, i8080, i8086, i80286, i80386, i80486, Pentium.]
Alpha 21264: 15 million
Pentium Pro: 5.5 million
PowerPC 620: 6.9 million
Alpha 21164: 9.3 million
Sparc Ultra: 5.2 million
2X transistors/chip every 1.5 years: called “Moore’s Law”

Technology Trends: Processor Performance
[Figure: processor performance (0 to 900) versus year (1987 to 1997), growing 1.54X/yr. Machines plotted: Sun-4/260, MIPS M/120, MIPS M/2000, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600.]
Processor performance increase/yr
mistakenly referred to as Moore’s Law (transistors/chip)

5 components of any Computer
° Processor (active)
  ° Control (“brain”)
  ° Datapath (“brawn”)
° Memory (passive): where programs, data live when running
° Devices
  ° Input: Keyboard, Mouse
  ° Output: Display, Printer
  ° Disk, Network

Computer Technology => Dramatic Change
° Processor
  ° 2X in speed every 1.5 years; 1000X performance in last 15 years
° Memory
  ° DRAM capacity: 2X / 1.5 years; 1000X size in last 15 years
  ° Cost per bit: improves about 25% per year
° Disk
  ° Capacity: > 2X in size every 1.5 years
  ° Cost per bit: improves about 60% per year
  ° 120X size in last decade
° State-of-the-art PC “when you graduate” (1997-2001)
  ° Processor clock speed: 1500 MegaHertz (1.5 GigaHertz)
  ° Memory capacity: 500 MegaBytes (0.5 GigaBytes)
  ° Disk capacity: 100 GigaBytes (0.1 TeraBytes)
  ° New units! Mega => Giga, Giga => Tera

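The "2X every 1.5 years" and "1000X in 15 years" figures are the same claim stated two ways. A minimal Python sketch of that arithmetic, assuming steady exponential growth; the function names are illustrative and not from the talk.

```python
def growth_factor(years, doubling_period=1.5):
    """Total improvement after `years` if capability doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def annual_rate(doubling_period=1.5):
    """Equivalent per-year multiplier for a given doubling period."""
    return 2 ** (1 / doubling_period)


if __name__ == "__main__":
    # 15 years at 2X per 1.5 years is 10 doublings.
    print(growth_factor(15))        # 1024.0
    print(round(annual_rate(), 2))  # 1.59
```

Ten doublings give 1024X, which the slide rounds to 1000X.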
Integrated Circuit Costs
Die cost = Wafer cost / (Dies per Wafer * Die yield)
[Figure: wafer divided into dies, some of them with flaws]
Die cost goes roughly with the cube of the area:
fewer dies per wafer * worse yield as die area grows

Die Yield (1993 data)
Raw Dies Per Wafer
die area (mm2)     100   144   196   256   324   400
6”/15cm wafer      139    90    62    44    32    23
8”/20cm wafer      265   177   124    90    68    52
10”/25cm wafer     431   290   206   153   116    90
die yield          23%   19%   16%   12%   11%   10%
typical CMOS process: α=2, wafer yield=90%, defect density=2/cm2, 4 test sites/wafer
Good Dies Per Wafer (Before Testing!)
die area (mm2)     100   144   196   256   324   400
6”/15cm wafer       31    16     9     5     3     2
8”/20cm wafer       59    32    19    11     7     5
10”/25cm wafer      96    53    32    20    13     9
typical cost of an 8”, 4 metal layers, 0.5um CMOS wafer: ~$2000

1993 Real World Examples
Chip          Metal   Line    Wafer   Defect  Area   Dies/   Yield  Die
              layers  width   cost    /cm2    mm2    wafer          Cost
386DX         2       0.90    $900    1.0     43     360     71%    $4
486DX2        3       0.80    $1200   1.0     81     181     54%    $12
PowerPC 601   4       0.80    $1700   1.3     121    115     28%    $53
HP PA 7100    3       0.80    $1300   1.0     196    66      27%    $73
DEC Alpha     3       0.70    $1500   1.2     234    53      19%    $149
SuperSPARC    3       0.70    $1700   1.6     256    48      13%    $272
Pentium       3       0.80    $1500   1.5     296    40      9%     $417
From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15

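The die costs in the 1993 examples follow from the earlier formula, die cost = wafer cost / (dies per wafer * die yield). A minimal Python sketch that recomputes them from the wafer cost, dies/wafer, and yield columns; the names `die_cost` and `CHIPS` are illustrative, not from the talk.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Die cost = wafer cost / (dies per wafer * die yield)."""
    return wafer_cost / (dies_per_wafer * die_yield)


# (chip, wafer cost in $, dies per wafer, yield) taken from the 1993 examples.
CHIPS = [
    ("386DX", 900, 360, 0.71),
    ("486DX2", 1200, 181, 0.54),
    ("PowerPC 601", 1700, 115, 0.28),
    ("HP PA 7100", 1300, 66, 0.27),
    ("DEC Alpha", 1500, 53, 0.19),
    ("SuperSPARC", 1700, 48, 0.13),
    ("Pentium", 1500, 40, 0.09),
]

if __name__ == "__main__":
    for name, wafer_cost, dies, die_yield in CHIPS:
        # Rounded to whole dollars, as in the slide.
        print(f"{name}: ${die_cost(wafer_cost, dies, die_yield):.0f}")
```

Rounded to whole dollars, the results match the Die Cost column: the Pentium's large die and 9% yield make it about 100 times costlier per die than the 386DX.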
Processor Trends/ History
° History of innovations to 2X / 1.5 yr
  ° Pipelining (helps seconds / clock, or clock rate)
  ° Out-of-Order Execution (helps clocks / instruction)
  ° Superscalar (helps clocks / instruction)

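The parenthetical notes refer to two factors of execution time. A minimal Python sketch, assuming the usual performance equation (time = instructions * clocks/instruction * seconds/clock), which the slide implies but does not state; the workload numbers are hypothetical.

```python
def cpu_time(instructions, clocks_per_instruction, seconds_per_clock):
    """Execution time = instructions * (clocks / instruction) * (seconds / clock)."""
    return instructions * clocks_per_instruction * seconds_per_clock


if __name__ == "__main__":
    # Hypothetical workload: 1 billion instructions, 2 clocks/instruction, 10 ns clock.
    base = cpu_time(1e9, 2.0, 10e-9)
    # Pipelining shortens seconds/clock (halved here, purely for illustration).
    pipelined = cpu_time(1e9, 2.0, 5e-9)
    # Superscalar or out-of-order execution lowers clocks/instruction.
    superscalar = cpu_time(1e9, 1.0, 5e-9)
    print(base, pipelined, superscalar)
```

Each innovation attacks a different factor, so their gains multiply.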
Pipelining is Natural!
° Laundry Example
° Ann, Brian, Cathy, Dave (loads A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
° Washer takes 30 minutes
° Dryer takes 30 minutes
° “Folder” takes 30 minutes
° “Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM in sixteen 30-minute steps; loads A, B, C, D in task order, each finishing all four stages before the next load starts]
Sequential laundry takes 8 hours for 4 loads

Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, overlapped across seven 30-minute steps]
Pipelined laundry takes 3.5 hours for 4 loads!

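The 8-hour and 3.5-hour totals can be reproduced by counting 30-minute steps. A minimal Python sketch, assuming four equal stages (washer, dryer, folder, stasher) and no stalls; the function names are illustrative.

```python
def sequential_minutes(loads, stages=4, stage_minutes=30):
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages=4, stage_minutes=30):
    """A new load enters the first stage as soon as that stage is free.

    The first load takes `stages` steps; every later load finishes
    one step after the load ahead of it.
    """
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    print(sequential_minutes(4) / 60)  # 8.0 hours
    print(pipelined_minutes(4) / 60)   # 3.5 hours
```

Pipelining does not shorten any single load (still 2 hours); it raises throughput by keeping all four stages busy.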
Pipeline Hazard: Stall
[Figure: pipelined timeline from 6 PM to 2 AM in 30-minute steps for loads A to F in task order, with a bubble in the pipeline]
A depends on D; stall since folder tied up

Out-of-Order Laundry: Don’t Wait
[Figure: pipelined timeline from 6 PM to 2 AM in 30-minute steps for loads A to F in task order, with a bubble only for the dependent load]
A depends on D; rest continue; need more resources to allow out-of-order

Superscalar Laundry: Parallel per stage
[Figure: timeline from 6 PM to 2 AM in 30-minute steps for loads A to F in task order, several loads per stage at once; loads labeled (light clothing), (dark clothing), (very dirty clothing), (light clothing), (dark clothing), (very dirty clothing)]
More resources, HW match mix of parallel tasks?

Superscalar Laundry: Mismatch Mix
[Figure: timeline from 6 PM to 2 AM in 30-minute steps for loads A to D in task order; loads labeled (light clothing), (light clothing), (dark clothing), (light clothing)]
Task mix underutilizes extra resources

State of the Art: Alpha 21264
° 15M transistors
° 2 64KB caches on chip; 16MB L2 cache off chip
° Clock <1.7 nsec, or >600 MHz
° 90 watts
° Superscalar: fetches up to 6 instructions/clock cycle, retires up to 4 instructions/clock cycle
° Execution out-of-order

Other example: Sony PlayStation 2
° Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5)
° Superscalar MIPS core + vector coprocessor + graphics/DRAM
° Claim: “Toy Story” realism brought to games

The Goal: Illusion of large, fast, cheap memory
° Fact: Large memories are slow, fast memories are small
° How do we create a memory that is large, cheap and fast (most of the time)?
° Hierarchy of Levels
  ° Similar to Principle of Abstraction: hide details of multiple levels

Hierarchy Analogy: Term Paper
° Working on paper in library at a desk
° Option 1: Every time need a book
  ° Leave desk to go to shelves (or stacks)
  ° Find the book
  ° Bring one book back to desk
  ° Read section interested in
  ° When done with section, leave desk and go to shelves carrying book
  ° Put the book back on shelf
  ° Return to desk to work
  ° Next time need a book, go to first step

Hierarchy Analogy: Library
° Option 2: Every time need a book
  ° Leave some books on desk after fetching them
  ° Only go to shelves when need a new book
  ° When go to shelves, bring back related books in case you need them; sometimes you’ll need to return books not used recently to make space for new books on desk
  ° Return to desk to work
  ° When done, replace books on shelves, carrying as many as you can per trip
° Illusion: whole library on your desktop
° Buzzword “cache” from French for hidden treasure

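Option 2 behaves like a small cache with a "return the book not used recently" policy. A minimal Python sketch of that desk, assuming a least-recently-used replacement rule and ignoring the "bring back related books" step; the `Desk` class and the book titles are hypothetical.

```python
from collections import OrderedDict


class Desk:
    """A desk that holds a few books; the least recently used book goes back first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.books = OrderedDict()  # oldest use first, most recent use last
        self.trips = 0              # trips to the shelves (misses)

    def read(self, title):
        if title in self.books:
            self.books.move_to_end(title)   # hit: book is already on the desk
            return
        self.trips += 1                     # miss: walk to the shelves
        if len(self.books) >= self.capacity:
            self.books.popitem(last=False)  # return the least recently used book
        self.books[title] = True


if __name__ == "__main__":
    desk = Desk(capacity=3)
    for title in ["A", "B", "A", "C", "A", "B", "D", "A"]:
        desk.read(title)
    print(desk.trips)  # 4 trips for 8 reads
```

Under Option 1 the same eight reads would cost eight trips; keeping three books on the desk halves that for this access pattern.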
Why Hierarchy works: Natural Locality
° The Principle of Locality: Programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference versus address space, from 0 to 2^n - 1]
° What programming constructs lead to Principle of Locality?

Memory Hierarchy: How Does it Work?
° Temporal Locality (Locality in Time):
  ° Keep most recently accessed data items closer to the processor
  ° Library Analogy: Recently read books are kept on desk
  ° Block is unit of transfer (like book)
° Spatial Locality (Locality in Space):
  ° Move blocks consisting of contiguous words to the upper levels
  ° Library Analogy: Bring back nearby books on shelves when fetching a book; hope that you might need them later for your paper

Memory Hierarchy Pyramid
[Figure: pyramid with the Central Processor Unit (CPU) at the top and Level 1, Level 2, Level 3, ..., Level n below it. Level 1 is the “upper” level and Level n the “lower” level of the memory hierarchy. Going down: increasing distance from CPU, decreasing cost / MB. Width of each level: size of memory at that level.]
(data cannot be in level i unless also in i+1)

Big Idea of Memory Hierarchy
° Temporal locality: keep recently accessed data items closer to processor
° Spatial locality: moving contiguous words in memory to upper levels of hierarchy
° Uses smaller and faster memory technologies close to the processor
  ° Fast hit time in highest level of hierarchy
  ° Cheap, slow memory furthest from processor
° If hit rate is high enough, hierarchy has access time close to the highest (and fastest) level and size equal to the lowest (and largest) level

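The last point can be put in numbers. A minimal Python sketch, assuming the common average-access-time model (hit time + miss rate * miss penalty), which is not stated on the slide; the 1 ns and 100 ns figures are hypothetical.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time + (1.0 - hit_rate) * miss_penalty


if __name__ == "__main__":
    # Hypothetical two-level hierarchy: 1 ns upper level,
    # 100 ns extra to fetch from the lower level on a miss.
    for hit_rate in (0.50, 0.90, 0.99):
        print(hit_rate, average_access_time(1.0, hit_rate, 100.0))
```

As the hit rate approaches 1, the average approaches the upper level's hit time even though almost all of the capacity sits in the slow level.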
Disk Description / History
[Figure: disk drive showing platter, track, sector, cylinder, arm, head, track buffer, and embedded processor (ECC, SCSI)]
1973: 1.7 Mbit/sq. in, 140 MBytes
1979: 7.7 Mbit/sq. in, 2,300 MBytes
source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”

Disk History
[Figure: areal density (log scale, 1 to 10,000) versus year (1970 to 2000)]
1989: 63 Mbit/sq. in, 60,000 MBytes
1997: 1450 Mbit/sq. in, 2300 MBytes (2.5” diameter)
1997: 3090 Mbit/sq. in, 8100 MBytes (3.5” diameter)
2000: 10,100 Mbit/sq. in, 25,000 MBytes
2000: 11,000 Mbit/sq. in, 73,400 MBytes
source: N.Y. Times, 2/23/98, page C3

State of the Art: Ultrastar 72ZX
[Figure: disk drive showing platter, track, sector, cylinder, arm, head, track buffer, and embedded processor]
° 73.4 GB, 3.5 inch disk
° 2¢/MB
° 16 MB track buffer
° 11 platters, 22 surfaces
° 15,110 cylinders
° 7 Gbit/sq. in. areal density
° 17 watts (idle)
° 0.1 ms controller time
° 5.3 ms avg. seek (seek 1 track => 0.6 ms)
° 3 ms = 1/2 rotation
° 37 to 22 MB/s to media
Latency = Queuing Time + Controller time + Seek Time + Rotation Time (per access) + Size / Bandwidth (per byte)
source: www.ibm.com; www.pricewatch.com; 2/14/00

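The latency formula can be evaluated with the Ultrastar 72ZX figures above. A minimal Python sketch, assuming no queuing, 1 MB = 10^6 bytes, and a hypothetical 64 KB request; the function name and parameter names are illustrative.

```python
def disk_latency_ms(size_bytes, bandwidth_mb_per_s, seek_ms=5.3,
                    rotation_ms=3.0, controller_ms=0.1, queuing_ms=0.0):
    """Latency = queuing + controller + seek + rotation + size / bandwidth."""
    # Transfer time: bytes / (bytes per second), converted to milliseconds.
    transfer_ms = size_bytes / (bandwidth_mb_per_s * 1e6) * 1e3
    return queuing_ms + controller_ms + seek_ms + rotation_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical 64 KB request on an idle disk, at both ends of the
    # 37 to 22 MB/s media rate quoted for this drive.
    for bandwidth in (37.0, 22.0):
        print(round(disk_latency_ms(64 * 1024, bandwidth), 2))
```

For a request this small, the per-access terms (8.4 ms of controller, seek, and rotation time) dominate the per-byte transfer time.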
A glimpse into the future?
° IBM microdrive for digital cameras
  ° 340 Mbytes
° Disk target in 5-7 years?

Questions?
Contact us if you’re interested:
email: [email protected]
http://iram.cs.berkeley.edu/