Transcript ppt
CS-447– Computer Architecture
M,W 10-11:20am
Lecture 19
Memory Hierarchy
November 7th, 2007
Majd F. Sakr
[email protected]
www.qatar.cmu.edu/~msakr/15447-f07/
15-447 Computer Architecture
Fall 2007 ©
During This Lecture
° Introduction to the Memory Hierarchy
• Processor Memory Gap
• Locality
• Latency Hiding
15-447 Computer Architecture
Fall 2007 ©
The Big Picture
Computer
Processor Memory
(active) (passive)
Control
(“brain”)
Datapath
(“brawn”)
Devices
Input
(where
programs, Output
data live
when
running)
15-447 Computer Architecture
Keyboard,
Mouse
Disk,
Network
Display,
Printer
Fall 2007 ©
Processor-DRAM Memory Gap (latency)
Performance
100
10
1
µProc
60%/yr.
“Moore’s Law”
(2X/1.5yr)
Processor-Memory
Performance Gap:
(grows 50% / year)
DRAM
DRAM
9%/yr.
(2X/10 yrs)
CPU
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
1000
Time
15-447 Computer Architecture
Fall 2007 ©
Memories:
°SRAM:
• value is stored on a pair of inverting
gates
• very fast but takes up more space than
DRAM (4 to 6 transistors)
15-447 Computer Architecture
Fall 2007 ©
Memories:
DRAM:
• value is stored as a charge on capacitor
(must be refreshed)
• very small but slower than SRAM (factor
of 5 to 10)
Word line
Pass
Transistor
Capacitor
Bit
line
15-447 Computer Architecture
Fall 2007 ©
Memory
° Users want large and fast memories!
° SRAM access times are .5 – 5ns at cost of $4000 to
$10,000 per GB.
° DRAM access times are 50-70ns at cost of $100 to
$200 per GB.
° Disk access times are 5 to 20 million ns at cost of
$.50 to $2 per GB.
15-447 Computer Architecture
Fall 2007 ©
2004
Storage Trends
SRAM
metric
1980
1985
1990
1995
2000
2005
2005:1980
$/MB
access (ns)
19,200
300
2,900
150
320
35
256
15
100
12
75
10
256
30
1980
1985
1990
1995
2000
2005
2005:1980
$/MB
8,000
access (ns)
375
typical size(MB) 0.064
880
200
0.256
100
100
4
30
70
16
1
60
64
0.20
50
1,000
40,000
8
15,000
1985
1990
1995
2000
2005
2005:1980
100
75
10
8
28
160
0.30
10
1,000
0.05
8
9,000
0.001
10,000
4
22
400,000 400,000
DRAM
metric
Disk
metric
1980
$/MB
500
access (ms)
87
typical size(MB) 1
15-447 Computer Architecture
Fall 2007 ©
CPU Clock Rates
1980
1985
1990
1995
processor
8080
286
386
Pentium P-III
P-4
clock rate(MHz)
cycle time(ns)
1
1,000
6
166
20
50
150
6
3,000
0.3
15-447 Computer Architecture
2000
750
1.3
2005
2005:1980
3,000
3,333
Fall 2007 ©
The CPU-Memory Gap
The gap widens between DRAM, disk, and CPU speeds.
ns
100,000,000
10,000,000
1,000,000
100,000
10,000
1,000
100
Disk seek time
DRAM access time
SRAM access time
10
1
0
CPU cycle time
1980
1985
1990
1995
2000
2005
Year
15-447 Computer Architecture
Fall 2007 ©
Locality
° Principle of Locality:
• Programs tend to reuse data and instructions near
those they have used recently, or that were recently
referenced themselves.
• Temporal locality: Recently referenced items are
likely to be referenced in the near future.
• Spatial locality: Items with nearby addresses tend
to be referenced close together in time.
sum = 0;
Locality Example:
for (i = 0; i < n; i++)
• Data
sum += a[i];
– Reference array elements in succession
return sum;
(stride-1 reference pattern): Spatial locality
– Reference sum each iteration: Temporal locality
• Instructions
– Reference instructions in sequence: Spatial locality
– Cycle through loop repeatedly: Temporal locality
15-447 Computer Architecture
Fall 2007 ©
Locality Example
° Claim: Being able to look at code and get a
qualitative sense of its locality is a key skill
for a professional programmer.
° Question: Does this function have good
locality?
int sum_array_rows(int a[M][N])
{
int i, j, sum = 0;
for (i = 0; i < M; i++)
for (j = 0; j < N; j++)
sum += a[i][j];
return sum;
}
15-447 Computer Architecture
Fall 2007 ©
Locality Example
°Question: Does this function have
good locality?
int sum_array_cols(int a[M][N])
{
int i, j, sum = 0;
for (j = 0; j < N; j++)
for (i = 0; i < M; i++)
sum += a[i][j];
return sum;
}
15-447 Computer Architecture
Fall 2007 ©
Memory Hierarchy (1/3)
°Processor
• executes instructions on order of
nanoseconds to picoseconds
• holds a small amount of code and data in
registers
°Memory
• More capacity than registers, still limited
• Access time ~50-100 ns
°Disk
• HUGE capacity (virtually limitless)
• VERY slow: runs ~milliseconds
15-447 Computer Architecture
Fall 2007 ©
Memory Hierarchy (2/3)
Processor
Higher
Levels in
memory
hierarchy
Lower
Level 1
Level 2
Increasing
Distance
from Proc.,
Decreasing
speed
Level 3
...
Level n
Size of memory at each level
As we move to deeper levels the latency
goes up and price per bit goes down.
15-447 Computer Architecture
Fall 2007 ©
Memory Hierarchy (3/3)
°If level closer to Processor, it must be:
• smaller
• faster
• subset of lower levels (contains most
recently used data)
°Lowest Level (usually disk) contains
all available data
°Other levels?
15-447 Computer Architecture
Fall 2007 ©
Memory Caching
°We’ve discussed three levels in the
hierarchy: processor, memory, disk
°Mismatch between processor and
memory speeds leads us to add a new
level: a memory cache
°Implemented with SRAM technology
15-447 Computer Architecture
Fall 2007 ©
Memory Hierarchy Analogy: Library (1/2)
°You’re writing a term paper
(Processor) at a table in Library
°Library is equivalent to disk
• essentially limitless capacity
• very slow to retrieve a book
°Table is memory
• smaller capacity: means you must return
book when table fills up
• easier and faster to find a book there
once you’ve already retrieved it
15-447 Computer Architecture
Fall 2007 ©
Memory Hierarchy Analogy: Library (2/2)
°Open books on table are cache
• smaller capacity: can have very few open
books fit on table; again, when table fills up,
you must close a book
• much, much faster to retrieve data
°Illusion created: whole library open on
the tabletop
• Keep as many recently used books open on
table as possible since likely to use again
• Also keep as many books on table as
possible, since faster than going to library
15-447 Computer Architecture
Fall 2007 ©
Memory Hierarchy Basis
°Disk contains everything.
°When Processor needs something,
bring it into to all higher levels of
memory.
°Cache contains copies of data in
memory that are being used.
°Memory contains copies of data on
disk that are being used.
°Entire idea is based on Temporal
Locality: if we use it now, we’ll want to
use it again soon (a Big Idea)
15-447 Computer Architecture
Fall 2007 ©
A View of the Memory Hierarchy
Regs
Instr. Operands
Cache
Blocks
Upper Level
Faster
L2 Cache
Blocks
Memory
Pages
Disk
Files
Tape
15-447 Computer Architecture
Larger
Lower Level
Fall 2007 ©