CS61C: Machine Structures
CS 161
Ch 7: Memory Hierarchy
LECTURE 16
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan
Cache Access Time
With Load Bypass:
Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
OR
Without Load Bypass:
Average Memory Access Time = Time for a hit + Miss Rate x Miss Penalty
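To make the second form concrete, here is a minimal C sketch that evaluates it. The 1-cycle hit time and 50-cycle miss penalty match the example later in this lecture; the 5% miss rate is an assumed value for illustration.

#include <stdio.h>

/* AMAT without load bypass: hit time + miss rate x miss penalty.
 * Hit time and miss penalty match the lecture's later example;
 * the 5% miss rate is an assumed illustrative value. */
int main(void) {
    double hit_time = 1.0;       /* cycles */
    double miss_rate = 0.05;     /* fraction of accesses that miss (assumed) */
    double miss_penalty = 50.0;  /* cycles */

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.2f cycles\n", amat);   /* 1 + 0.05 * 50 = 3.50 */
    return 0;
}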
Unified vs Split Caches
[Figure: two organizations. Unified: Proc -> Unified Cache-1 -> Unified Cache-2. Split: Proc -> separate I-Cache-1 and D-Cache-1 -> Unified Cache-2.]
Unified Cache:
• Lower miss ratio, because all of the space is available to either instructions or data
• Lower cache bandwidth, because an instruction and a data word cannot be read at the same time through the single port
Split Cache:
• Higher miss ratio, because instructions or data may run out of space even though space is available in the other cache
• Higher bandwidth, because an instruction and a data word can be accessed at the same time
Example:
• 16KB I-cache and 16KB D-cache: instruction miss rate = 0.64%, data miss rate = 6.47%
• 32KB unified cache: aggregate miss rate = 1.99%
° Which is better (ignoring the L2 cache)?
• Assume 33% data operations, so 75% of accesses are instruction fetches (1.0/1.33)
• Hit time = 1 cycle, miss penalty = 50 cycles
• Note that a data hit incurs 1 extra stall cycle for the unified cache (only one port)
AMAT_Harvard = 75% x (1 + 0.64% x 50) + 25% x (1 + 6.47% x 50) = 2.05
AMAT_Unified = 75% x (1 + 1.99% x 50) + 25% x (1 + 1 + 1.99% x 50) = 2.24
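The following short C sketch reproduces the two AMAT figures above; every number comes directly from the slide.

#include <stdio.h>

int main(void) {
    double inst_frac = 0.75, data_frac = 0.25;  /* 1.0/1.33 and 0.33/1.33 */
    double penalty = 50.0;                      /* miss penalty, cycles */

    /* Split (Harvard) caches: separate instruction and data miss rates */
    double amat_split = inst_frac * (1 + 0.0064 * penalty)
                      + data_frac * (1 + 0.0647 * penalty);

    /* Unified cache: one aggregate miss rate, plus 1 extra stall cycle on
     * data accesses because the single port is busy with the fetch */
    double amat_unified = inst_frac * (1 + 0.0199 * penalty)
                        + data_frac * (1 + 1 + 0.0199 * penalty);

    printf("AMAT (split)   = %.2f cycles\n", amat_split);    /* ~2.05 */
    printf("AMAT (unified) = %.2f cycles\n", amat_unified);  /* ~2.24 */
    return 0;
}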
Static RAM (SRAM)
°Six transistors connected in a cross-coupled fashion
• Provides regular AND inverted outputs
• Implemented in CMOS process
[Figure: Single-Port 6-T SRAM cell]
Dynamic Random Access Memory - DRAM
° DRAM organization is similar to SRAM, except that each bit of DRAM is built from a pass transistor and a capacitor, shown on the next slide
° Fewer transistors per bit gives higher density, but the discharge through the capacitor is slow
° The capacitor needs to be recharged, or refreshed, giving rise to a high cycle time. Q: What is the difference between access time and cycle time?
° Uses a two-level decoder, as shown later. Note that 2,048 bits are accessed per row, but only one bit is used
Dynamic RAM
° SRAM cells exhibit high speed but poor density
° DRAM: simple transistor/capacitor pairs in
high density form
[Figure: one DRAM cell: a pass transistor gated by the Word Line connects a capacitor C to the Bit Line; the Bit Line feeds a Sense Amp.]
DRAM logical organization (4 Mbit)
• Access time of DRAM = row access time + column access time + refresh time (a worked example follows the figure below)
[Figure: the 4-Mbit DRAM is a 2,048 x 2,048 Memory Array. Address lines A0-A10 (11 bits) feed a Row Decoder and a Column Decoder; the selected Word Line reads an entire row of Storage Cells into the Sense Amps & I/O, and the Column Decoder selects the single data bit D/Q.]
° Square root of bits per RAS/CAS (the 11 row/column address bits each select one of 2,048 = sqrt(4M) lines)
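As a back-of-the-envelope illustration of the access-time formula, here is a small C sketch; the individual row, column, and refresh timings are assumed values, not figures from the lecture.

#include <stdio.h>

/* The timing values below are assumptions chosen only to illustrate how
 * the DRAM access-time formula composes; they are not lecture figures. */
int main(void) {
    double t_row = 35.0;      /* row (RAS) access time, ns (assumed) */
    double t_col = 15.0;      /* column (CAS) access time, ns (assumed) */
    double t_refresh = 5.0;   /* average refresh overhead, ns (assumed) */

    printf("DRAM access time = %.1f ns\n", t_row + t_col + t_refresh);
    return 0;
}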
Virtual Memory
° Idea 1: Many Programs sharing DRAM Memory so
that context switches can occur
° Idea 2: Allow program to be written without memory
constraints – program can exceed the size of the
main memory
° Idea 3: Relocation: parts of the program can be placed at different locations in memory instead of as one big chunk
° Virtual Memory:
(1) DRAM Memory holds many programs running
at same time (processes)
(2) use DRAM Memory as a kind of “cache” for
disk
Disk Technology in Brief
°Disk is mechanical memory
[Figure: a disk platter with concentric tracks, a R/W arm, and a rotation speed of 3600 - 7200 RPM.]
°Disk Access Time = seek time + rotational delay + transfer time (worked example below)
• usually measured in milliseconds
°A “miss” to disk is extremely expensive
• typical access time = millions of clock cycles
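A quick C sketch of the disk access-time arithmetic. The 8 ms seek time, 7200 RPM spindle speed, 0.1 ms transfer time, and 1 GHz clock are assumed illustrative values; the average rotational delay is taken as half a revolution.

#include <stdio.h>

/* All numbers here are assumed illustrative values, not lecture figures. */
int main(void) {
    double seek_ms = 8.0;
    double rpm = 7200.0;
    double rotational_ms = 0.5 * (60.0 / rpm) * 1000.0;  /* half a revolution, ~4.17 ms */
    double transfer_ms = 0.1;

    double access_ms = seek_ms + rotational_ms + transfer_ms;
    double cycles = access_ms * 1e-3 * 1e9;              /* at an assumed 1 GHz clock */

    printf("Disk access time ~ %.2f ms ~ %.1f million clock cycles\n",
           access_ms, cycles / 1e6);
    return 0;
}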
Virtual Memory has own terminology
° Each process has its own private “virtual
address space” (e.g., 2^32 bytes); the CPU actually
generates “virtual addresses”
° Each computer has a “physical address space”
(e.g., 128 MegaBytes DRAM); also called “real
memory”
° Address translation: mapping virtual addresses
to physical addresses
• Allows multiple programs to use (different
chunks of physical) memory at same time
• Also allows some chunks of virtual memory
to be represented on disk, not in main
memory (to exploit memory hierarchy)
Mapping Virtual Memory to Physical Memory
° Divide Memory into equal sized
“chunks” (say, 4KB each)
° Any chunk of Virtual Memory
assigned to any chunk of
Physical Memory (“page”)
[Figure: a single process's 64 MB Virtual Memory (Code, Static, Heap, and Stack regions, growing up from address 0) mapped chunk by chunk onto Physical Memory; the heap pages land in arbitrary physical page frames, also numbered from 0.]
Handling Page Faults
°A page fault is like a cache miss
• Must find page in lower level of hierarchy
°If valid bit is zero, the Physical Page
Number points to a page on disk
°When the OS starts a new process, it
creates space on disk for all the pages
of the process, sets all valid bits in the
page table to zero, and makes all Physical
Page Numbers point to disk
• called Demand Paging: pages of the
process are loaded from disk only as
needed
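Below is a minimal sketch of the demand-paging step in C. The PageTableEntry layout and the allocate_frame / read_page_from_disk helpers are hypothetical names invented for illustration; a real OS interface looks different.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical page-table entry: when valid is false, the number field
 * holds a disk location instead of a physical page number. */
typedef struct {
    bool     valid;
    uint32_t ppn_or_disk;
} PageTableEntry;

/* Stub helpers, for illustration only */
static uint32_t allocate_frame(void) { return 42; }      /* pretend frame 42 is free */
static void read_page_from_disk(uint32_t disk_block, uint32_t frame) {
    printf("loading disk block %u into frame %u\n", disk_block, frame);
}

/* Demand paging: on a page fault, bring the page in from disk, then mark
 * the entry valid so the faulting access can be retried. */
static void handle_page_fault(PageTableEntry *pte) {
    uint32_t frame = allocate_frame();
    read_page_from_disk(pte->ppn_or_disk, frame);  /* entry held the disk location */
    pte->ppn_or_disk = frame;                      /* entry now holds the PPN */
    pte->valid = true;
}

int main(void) {
    PageTableEntry pte = { .valid = false, .ppn_or_disk = 7 };  /* page lives on disk */
    if (!pte.valid) handle_page_fault(&pte);
    printf("resident: PPN = %u\n", pte.ppn_or_disk);
    return 0;
}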
Comparing the 2 levels of hierarchy
°                   Cache                           Virtual Memory
° Block or Line                                     Page
° Miss                                              Page Fault
° Block Size:       32-64B                          Page Size: 4K-16KB
° Placement:        Direct Mapped,                  Fully Associative
                    N-way Set Associative
° Replacement:      LRU or Random                   Least Recently Used (LRU) approximation
° Write Thru or Back                                Write Back
° How Managed:      Hardware                        Hardware + Software (Operating System)
How to Perform Address Translation?
°VM divides memory into equal sized
pages
°Address translation relocates entire
pages
• offsets within the pages do not change
• if the page size is a power of two, the
virtual address separates into two fields:
[ Virtual Page Number | Page Offset ]   (the virtual address)
• like the cache index and offset fields (see the sketch below)
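A minimal C sketch of the split, assuming 4 KB pages (12 offset bits, a page size used elsewhere in this lecture); the address value and variable names are illustrative.

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 12u   /* 4 KB pages => 12-bit page offset (assumed) */

int main(void) {
    uint32_t vaddr  = 0x00403ABCu;                        /* example virtual address */
    uint32_t vpn    = vaddr >> OFFSET_BITS;               /* virtual page number */
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);  /* page offset, unchanged by translation */

    printf("VA = 0x%08X -> VPN = 0x%05X, offset = 0x%03X\n", vaddr, vpn, offset);
    return 0;
}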
Mapping Virtual to Physical Address
[Figure: translation with a 1KB page size. The Virtual Address splits into a Virtual Page Number (bits 31-10) and a Page Offset (bits 9-0). Translation replaces the Virtual Page Number with a Physical Page Number (bits 29-10 of the Physical Address); the Page Offset (bits 9-0) passes through unchanged.]
Address Translation
°Want fully associative page placement
°How to locate the physical page?
°Search impractical (too many pages)
°A page table is a data structure which
contains the mapping of virtual pages
to physical pages
• There are several different ways, all up to
the operating system, to keep this data
around
°Each process running in the system
has its own page table
Address Translation: Page Table
[Figure: the virtual page number from the Virtual Address (VA), added to the Page Table Register, indexes into the Page Table, which is located in physical memory. Each entry holds a Valid bit (V), Access Rights (A.R.), and a Physical Page Number (P.P.N.). The Physical Page Number plus the page offset form the Physical Memory Address (PA); an entry with V = 0 refers to a page on disk.]
° Access Rights: None, Read Only, Read/Write, Executable
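A compact C sketch of the lookup in the figure. The entry layout, table size, and function names are illustrative, not a real hardware or OS format; 4 KB pages are assumed.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define OFFSET_BITS 12u   /* assume 4 KB pages */
#define NUM_PAGES   1024u /* small table, just for the example */

typedef struct {
    bool     valid;   /* V: page resident in physical memory? */
    uint8_t  rights;  /* A.R.: access rights */
    uint32_t ppn;     /* P.P.N.: physical page number */
} PageTableEntry;

static PageTableEntry page_table[NUM_PAGES];  /* base held in a page table register */

static uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> OFFSET_BITS;   /* index into the page table */
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);
    PageTableEntry *pte = &page_table[vpn];

    if (!pte->valid) {
        printf("page fault on VPN 0x%X (page is on disk)\n", vpn);
        return 0;
    }
    return (pte->ppn << OFFSET_BITS) | offset;  /* physical address */
}

int main(void) {
    page_table[3] = (PageTableEntry){ .valid = true, .rights = 0, .ppn = 7 };
    printf("PA = 0x%08X\n", translate((3u << OFFSET_BITS) | 0x2A4u));  /* 0x000072A4 */
    return 0;
}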
Optimizing for Space
°Page Table too big!
• 4GB Virtual Address Space / 4KB page =
2^20 (~1 million) Page Table Entries
= 4MB just for the Page Table of a single
process (at 4 bytes per entry)!
°Variety of solutions to trade off Page
Table size against performance
° Use a limit register to restrict page table size
and let it grow as more pages are needed; multilevel
page tables; paging the page tables; etc.
(Take O/S Class to learn more)
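The arithmetic from the slide, as a small C program; the 4-byte entry size is the value implied by the slide's 4MB total.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t va_space   = 1ull << 32;  /* 4GB virtual address space */
    uint64_t page_size  = 4096;        /* 4KB pages */
    uint64_t entry_size = 4;           /* 4-byte entries (implied by the 4MB total) */

    uint64_t entries = va_space / page_size;  /* 2^20, about 1 million */
    uint64_t bytes   = entries * entry_size;  /* 4MB per process */

    printf("%llu entries, %llu MB per process\n",
           (unsigned long long)entries, (unsigned long long)(bytes >> 20));
    return 0;
}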
How to Translate Fast?
° Problem: Virtual Memory requires two
memory accesses!
• one to translate Virtual Address into Physical
Address (page table lookup)
• one to transfer the actual data (cache hit)
• But Page Table is in physical memory!
° Observation: since there is locality in pages
of data, there must be locality in the virtual
addresses of those pages!
° Why not create a cache of virtual to physical
address translations to make translation
fast? (smaller is faster)
° For historical reasons, such a “page table
cache” is called a Translation Lookaside
Buffer, or TLB
Typical TLB Format
Virtual Page Nbr | Physical Page Nbr | Valid | Ref | Dirty | Access Rights
   (“tag”)         (everything after the tag is the “data”)
• The TLB is just a cache of the page table mappings
• Dirty: since write back is used, need to know
whether or not to write the page back to disk when it is
replaced
• Ref: Used to calculate LRU on replacement
• TLB access time comparable to cache
(much less than main memory access time)
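One possible C encoding of a TLB entry with the fields above, plus a fully associative lookup; the field widths, the 64-entry size (matching the DECStation example later), and the linear search are illustrative simplifications.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define TLB_ENTRIES 64

typedef struct {
    uint32_t vpn;     /* virtual page number ("tag") */
    uint32_t ppn;     /* physical page number */
    bool     valid;
    bool     ref;     /* referenced: used to approximate LRU on replacement */
    bool     dirty;   /* page must be written back to disk when evicted */
    uint8_t  rights;  /* access rights */
} TLBEntry;

static TLBEntry tlb[TLB_ENTRIES];

/* Fully associative lookup: compare the VPN against every valid entry. */
static bool tlb_lookup(uint32_t vpn, uint32_t *ppn_out) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].ref = true;   /* note the reference for LRU approximation */
            *ppn_out = tlb[i].ppn;
            return true;         /* TLB hit */
        }
    }
    return false;                /* TLB miss: walk the page table */
}

int main(void) {
    tlb[0] = (TLBEntry){ .vpn = 0x403, .ppn = 0x1A2, .valid = true };
    uint32_t ppn;
    if (tlb_lookup(0x403, &ppn)) printf("TLB hit: PPN = 0x%X\n", ppn);
    return 0;
}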
Translation Look-Aside Buffers
•TLB is usually small, typically 32-4,096 entries
• Like any other cache, the TLB can be fully
associative, set associative, or direct mapped
[Figure: the Processor sends a virtual address to the TLB. On a TLB hit, the physical address goes to the Cache: a cache hit returns data to the processor, a cache miss goes to Main Memory. On a TLB miss, the Page Table in main memory is consulted; a page fault or protection violation invokes the OS Fault Handler, which brings the page in from Disk Memory.]
[Figure: address translation and cache access on the DECStation 3100 / MIPS R2000. The 32-bit virtual address splits into a 20-bit virtual page number (bits 31-12) and a 12-bit page offset. The TLB (64 entries, fully associative) holds valid and dirty bits, a tag, and the physical page number; a matching tag signals a TLB hit and yields the physical address. The physical address then splits into a 16-bit physical address tag, a 14-bit cache index, and a 2-bit byte offset for the 16K-entry, direct-mapped cache with 32-bit data; a tag match with the valid bit set signals a cache hit.]
Real Stuff: Pentium Pro Memory Hierarchy
° Address Size:
32 bits (VA, PA)
° VM Page Size:
4 KB, 4 MB
° TLB organization:
separate i,d TLBs
(i-TLB: 32 entries,
d-TLB: 64 entries)
4-way set associative
LRU approximated
hardware handles miss
° L1 Cache:
8 KB, separate i,d
4-way set associative
LRU approximated
32 byte block
write back
° L2 Cache:
256 or 512 KB