Virtual Memory
Announcements
• TA Swap
– Tyler Steele is a 415 TA
– Rohit Doshi is a 414 TA
• My office hours today: 3:25-4:25
2
Review: Memory Hierarchy of a
Modern Computer System
• Take advantage of the principle of locality to:
– Present as much memory as the cheapest technology can provide
– Provide access at the speed offered by the fastest technology
[Figure: memory hierarchy, from the Processor (Datapath, Control, Registers, On-Chip Cache) through the Second-Level Cache (SRAM) and Main Memory (DRAM) to Secondary Storage (Disk) and Tertiary Storage (Tape)]
Speed (ns): 1s → 10s-100s → 100s → 10,000,000s (10s ms) → 10,000,000,000s (10s sec)
Size (bytes): 100s → Ks-Ms → Ms → Gs → Ts
3
Review: A Summary on Sources
of Cache Misses
• Compulsory (cold start): first reference to a block
– “Cold” fact of life: not a whole lot you can do about it
– Note: when running “billions” of instructions, compulsory misses are
insignificant
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Conflict (collision):
– Multiple memory locations mapped to same cache location
– Solutions: increase cache size, or increase associativity
• Two others:
– Coherence (Invalidation): other process (e.g., I/O) updates memory
– Policy: Due to non-optimal replacement policy
4
Review: Where does a Block Get
Placed in a Cache?
• Example: Block 12 placed in an 8-block cache (32-block address space)
– Direct mapped: block 12 can go only into block 4 (12 mod 8)
– Set associative (2-way, sets 0-3): block 12 can go anywhere in set 0 (12 mod 4)
– Fully associative: block 12 can go anywhere
5
Review: Other Caching Questions
• What line gets replaced on a cache miss?
– Easy for Direct Mapped: Only one possibility
– Set Associative or Fully Associative:
• Random
• LRU (Least Recently Used)
• What happens on a write?
– Write through: The information is written to both the cache and to
the block in the lower-level memory
– Write back: The information is written only to the block in the cache
• Modified cache block is written to main memory only when it is
replaced
• Question: is the block clean or dirty?
6
Caching Applied to Address Translation
[Figure: the CPU's virtual address is looked up in the TLB; on a hit, the cached physical address goes straight to physical memory; on a miss, the MMU translates via the page table; the data read or write itself then proceeds untranslated]
• Question is one of page locality: does it exist?
– Instruction accesses spend a lot of time on the same page (since
accesses are sequential)
– Stack accesses have definite locality of reference
– Data accesses have less page locality, but still some…
• Can we have a TLB hierarchy?
– Sure: multiple levels at different sizes/speeds
7
What Actually Happens on a TLB
Miss?
• Hardware-traversed page tables:
– On TLB miss, hardware in MMU looks at current page table to fill TLB
(may walk multiple levels)
• If PTE valid, hardware fills TLB and processor never knows
• If PTE is marked invalid, causes a page fault; the kernel then decides
what to do
• Software-traversed page tables (like MIPS; see the sketch below)
– On TLB miss, processor receives TLB fault
– Kernel traverses page table to find PTE
• If PTE valid, fills TLB and returns from fault
• If PTE marked as invalid, internally calls Page Fault handler
• Most chip sets provide hardware traversal
– Modern operating systems tend to have more TLB faults since they use
translation for many things
– Examples:
• shared segments
• user-level portions of an operating system
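To make the software-traversed case concrete, here is a minimal sketch of a MIPS-style TLB refill handler, assuming a single-level page table and 4 KB pages; pte_t, tlb_write_random, and do_page_fault are illustrative names, not a real kernel API:

```c
/* Minimal sketch of a software TLB refill handler (MIPS-style).
 * Assumes a single-level page table and 4 KB pages; all names
 * here are illustrative, not a real kernel API. */
#include <stdint.h>

#define PTE_VALID 0x1u

typedef uint32_t pte_t;

extern pte_t *page_table;                              /* current process   */
extern void tlb_write_random(uint32_t vpn, pte_t pte); /* fill one TLB slot */
extern void do_page_fault(uint32_t vaddr);             /* invalid-PTE path  */

/* Invoked by the TLB-miss trap. */
void tlb_refill(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> 12;       /* virtual page number (4 KB pages) */
    pte_t pte = page_table[vpn];      /* kernel walks the page table      */

    if (pte & PTE_VALID)
        tlb_write_random(vpn, pte);   /* fill TLB, return from the fault  */
    else
        do_page_fault(vaddr);         /* marked invalid: page fault       */
}
```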
8
Goals for Today
• Virtual memory
• How does it work?
– Page faults
– Resuming after page faults
• When to fetch?
• What to replace?
– Page replacement algorithms
• FIFO, OPT, LRU (Clock)
– Page Buffering
– Allocating Pages to processes
9
What is virtual memory?
• Each process has the illusion of a large address space
– 2^32 bytes for 32-bit addressing
• However, physical memory is much smaller
• How do we give this illusion to multiple processes?
– Virtual memory: some addresses reside on disk
[Figure: virtual pages 0 through N map through the page table either to frames in physical memory or to disk]
10
Virtual Memory
• Separates the user's logical memory from physical
memory.
– Only part of the program needs to be in memory for
execution
– Logical address space can therefore be much larger than
physical address space
– Allows address spaces to be shared by several
processes
– Allows for more efficient process creation
11
Virtual Memory
• Load the entire process in memory (swapping), run it, exit
– Slow (for big processes)
– Wasteful (the process might not need everything)
• Solution: partial residency
– Paging: bring in individual pages, not the entire process
– Demand paging: bring in a page only when it is required
• Where to fetch a page from?
– A contiguous space on disk: the swap file (pagefile.sys)
12
How does VM work?
• Modify Page Tables with another bit (“valid”)
– If page in memory, valid = 1, else valid = 0
– If page is in memory, translation works as before
– If page is not in memory, translation causes a page fault
[Figure: a 4-entry page table; entries 0 (frame 32, V=1) and 2 (frame 177, V=1) point to memory, while entries 1 (4183, V=0) and 3 (5721, V=0) point to disk]
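To make the valid-bit check concrete, here is a minimal sketch of translation; the PTE layout and names are assumptions for illustration, not a real hardware format:

```c
/* Sketch of a page-table entry with a valid bit; the layout is an
 * assumption for illustration, not a real hardware format. */
#include <stdint.h>

typedef struct {
    uint32_t valid : 1;   /* 1 = page in memory, 0 = page on disk       */
    uint32_t frame : 31;  /* frame number if valid, else a disk address */
} pte_t;

/* Translate vaddr; returns 1 on success, 0 to signal a page fault. */
int translate(const pte_t *page_table, uint32_t vaddr, uint32_t *paddr)
{
    pte_t pte = page_table[vaddr >> 12];               /* 4 KB pages */
    if (!pte.valid)
        return 0;                                      /* page fault */
    *paddr = ((uint32_t)pte.frame << 12) | (vaddr & 0xfffu);
    return 1;
}
```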
13
Page Faults
• On a page fault:
– OS finds a free frame, or evicts one from memory (which one?)
• Want knowledge of the future?
– Issues disk request to fetch data for page (what to fetch?)
• Just the requested page, or more?
– Block current process, context switch to new process (how?)
• Process might be executing an instruction
– When the disk I/O completes, set the valid bit to 1 and put the current
process back on the ready queue (see the sketch below)
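Putting those steps together, a sketch of the fault path, reusing the pte_t layout from the previous sketch; every helper here (alloc_frame, choose_victim, disk_read, and so on) is an assumed name:

```c
/* Sketch of the page-fault path. All helpers are assumed names. */
#include <stdint.h>

typedef struct { uint32_t valid : 1; uint32_t frame : 31; } pte_t; /* as above */

extern int  alloc_frame(void);                 /* -1 if none free          */
extern int  choose_victim(void);               /* replacement policy       */
extern int  frame_is_dirty(int frame);
extern void disk_write_frame(int frame);       /* write back a dirty page  */
extern void disk_read(uint32_t disk_addr, int frame);
extern void block_current_process(void);       /* context switch while I/O runs */
extern void make_ready_current(void);          /* back on the ready queue  */

void handle_page_fault(pte_t *pt, uint32_t vaddr)
{
    uint32_t vpn = vaddr >> 12;

    int frame = alloc_frame();                 /* find a free frame...     */
    if (frame < 0) {
        frame = choose_victim();               /* ...or evict one (which?) */
        if (frame_is_dirty(frame))
            disk_write_frame(frame);           /* write back dirty victim  */
    }

    block_current_process();                   /* another process runs     */
    disk_read(pt[vpn].frame, frame);           /* fetch page (disk addr)   */

    pt[vpn].frame = frame;                     /* on completion:           */
    pt[vpn].valid = 1;                         /* mark the PTE valid...    */
    make_ready_current();                      /* ...and re-queue process  */
}
```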
14
Steps in Handling a Page Fault
15
Resuming after a page fault
• Should be able to restart the instruction
• For RISC processors this is simple:
– Instructions are idempotent until their memory references complete
• More complicated for CISC:
– E.g., move 256 bytes from one location to another
– Possible solution:
• Ensure all pages the instruction touches are in memory before it executes
16
Page Fault (Cont.)
• Restart instruction: hard cases
– block move (the copy may be partially done when the fault hits)
– auto increment/decrement addressing (registers may already be updated)
17
When to fetch?
• Just before the page is used!
– Need to know the future
• Demand paging:
– Fetch a page when it faults
• Prepaging:
– Get the faulting page plus some of its neighbors, or
– Get all pages that were in use the last time the process was swapped out
18
Performance of Demand Paging
• Page fault rate p, with 0 ≤ p ≤ 1.0
– if p = 0, no page faults
– if p = 1, every reference is a fault
• Effective Access Time (EAT)
EAT = (1 – p) × memory access time
+ p × (page fault overhead
+ swap page out
+ swap page in
+ restart overhead)
19
Demand Paging Example
• Memory access time = 200 nanoseconds
• Average page-fault service time = 8 milliseconds
• EAT = (1 – p) × 200 + p × (8 milliseconds)
= (1 – p) × 200 + p × 8,000,000
= 200 + p × 7,999,800 (in nanoseconds)
• If one access out of 1,000 causes a page fault:
EAT = 200 + 0.001 × 7,999,800 ≈ 8.2 microseconds.
This is a slowdown by a factor of 40!!
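The same arithmetic as a small self-contained program, convenient for trying other fault rates:

```c
/* Effective access time for demand paging (all times in ns). */
#include <stdio.h>

int main(void)
{
    const double mem_ns   = 200.0;   /* memory access time           */
    const double fault_ns = 8e6;     /* page-fault service: 8 ms     */
    const double p        = 0.001;   /* one fault per 1,000 accesses */

    double eat = (1.0 - p) * mem_ns + p * fault_ns;
    printf("EAT = %.1f ns, slowdown = %.0fx\n", eat, eat / mem_ns);
    return 0;                        /* prints 8199.8 ns, 41x        */
}
```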
20
What to replace?
• What happens if there is no free frame?
– Find some page in memory that is not really in use and swap it
out
• Page Replacement
– When process has used up all frames it is allowed to use
– OS must select a page to eject from memory to allow new
page
– The page to eject is selected using the Page Replacement
Algorithm
• Goal: Select page that minimizes future page faults
21
Page Replacement
• Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
• Use modify (dirty) bit to reduce overhead of page
transfers – only modified pages are written to disk
• Page replacement completes separation between logical
memory and physical memory – large virtual memory
can be provided on a smaller physical memory
22
Page Replacement
23
Page Replacement Algorithms
• Random: Pick any page to eject at random
– Used mainly for comparison
• FIFO: The page brought in earliest is evicted
– Ignores usage
– Suffers from “Belady’s Anomaly”
• Fault rate can increase when the number of frames increases
• E.g., 0 1 2 3 0 1 4 0 1 2 3 4 with 3 vs. 4 frames
• OPT: Belady’s algorithm
– Select page not used for longest time
• LRU: Evict page that hasn’t been used the longest
– Past could be a good predictor of the future
24
First-In-First-Out (FIFO) Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per
process):
frame 1: 1 → 4 → 5
frame 2: 2 → 1 → 3
frame 3: 3 → 2 → 4
9 page faults
• 4 frames:
frame 1: 1 → 5 → 4
frame 2: 2 → 1 → 5
frame 3: 3 → 2
frame 4: 4 → 3
10 page faults
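A short simulation reproduces both counts and makes Belady's anomaly easy to experiment with; this is a sketch, not tied to any real system:

```c
/* FIFO page-replacement simulator: counts faults for a reference
 * string, showing Belady's anomaly (more frames, more faults). */
#include <stdio.h>

static int fifo_faults(const int *refs, int n, int nframes)
{
    int frames[16], next = 0, faults = 0;  /* assumes nframes <= 16;
                                              next = oldest slot    */
    for (int i = 0; i < nframes; i++) frames[i] = -1;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < nframes; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (!hit) {
            frames[next] = refs[i];        /* evict the oldest page */
            next = (next + 1) % nframes;
            faults++;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}
```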
25
FIFO Illustrating Belady’s Anomaly
26
Optimal Algorithm
• Replace page that will not be used for longest period of
time
• 4 frames example
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
frame 1: 1 → 4
frame 2: 2
frame 3: 3
frame 4: 4 → 5
6 page faults
• How do you know this?
• Used for measuring how well your algorithm performs
27
Least Recently Used (LRU)
Algorithm
• Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 (4 frames)
frame 1: 1 → 5
frame 2: 2
frame 3: 3 → 5 → 4
frame 4: 4 → 3
8 page faults
28
Implementing Perfect LRU
• On reference: Time stamp each page
• On eviction: Scan for oldest frame
• Problems:
– Large page lists
– Timestamps are costly
• Approximate LRU
– LRU is already an approximation!
[Figure: timestamp example: at t=13 the CPU executes 0xffdcd: add r1,r2,r3 and at t=14 0xffdd0: ld r1, 0(sp); the referenced pages' timestamps become t=14 while others remain at t=4 and t=5]
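A minimal sketch of perfect LRU with timestamps, assuming a fixed set of frames and a logical clock (illustrative names throughout):

```c
/* Perfect LRU via timestamps: stamp on every reference,
 * scan all frames on eviction. Costly, hence the approximations. */
#include <stdint.h>

#define NFRAMES 64

static uint64_t stamp[NFRAMES];   /* last-use time per frame */
static uint64_t now;              /* logical clock           */

void on_reference(int frame)      /* called on every memory reference */
{
    stamp[frame] = ++now;
}

int choose_victim(void)           /* O(NFRAMES) scan for the oldest */
{
    int victim = 0;
    for (int f = 1; f < NFRAMES; f++)
        if (stamp[f] < stamp[victim])
            victim = f;
    return victim;
}
```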
29
LRU: Clock Algorithm
• Each page has a reference bit
– Set on use, reset periodically by the OS
• Algorithm:
– FIFO + reference bit (keep pages in circular list)
• Scan: if ref bit is 1, set to 0, and proceed. If ref bit is 0, stop and evict.
• Problem:
– Low accuracy for large memory
[Figure: pages arranged in a circular list, each with a reference bit (R=0 or R=1); the clock hand sweeps around, clearing R=1 bits until it finds a page with R=0 to evict]
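A minimal sketch of the clock sweep over a circular array of frames; ref[] stands in for the hardware-set reference bits:

```c
/* Sketch of the clock (second-chance) algorithm over a circular
 * array of frames; ref[] models the per-page reference bit. */
#define NFRAMES 64

static int ref[NFRAMES];          /* set to 1 on use (by hardware) */
static int hand;                  /* current clock position        */

int clock_evict(void)
{
    for (;;) {
        if (ref[hand]) {
            ref[hand] = 0;        /* give the page a second chance */
            hand = (hand + 1) % NFRAMES;
        } else {
            int victim = hand;    /* ref bit 0: evict this page    */
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```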
30
LRU with large memory
• Solution: Add another hand
– Leading edge clears ref bits
– Trailing edge evicts pages with ref bit 0
• What if the angle between the hands is small?
• What if the angle is big?
[Figure: the same circular list of pages with reference bits, now swept by two hands; the leading hand clears reference bits, and the trailing hand evicts pages whose bit is still 0]
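A sketch of the two-handed variant under the same assumptions; ANGLE (a made-up constant) is the separation between the hands:

```c
/* Two-handed clock sketch: the leading hand clears reference bits,
 * the trailing hand (ANGLE frames behind) evicts unreferenced pages. */
#define NFRAMES 64
#define ANGLE   16   /* small angle: little time to be re-referenced,
                        close to FIFO; large angle: close to one-hand clock */

static int ref[NFRAMES];          /* set to 1 on use (by hardware) */
static int lead;                  /* leading hand position         */

int twohand_evict(void)
{
    for (;;) {
        int trail = (lead - ANGLE + NFRAMES) % NFRAMES;
        ref[lead] = 0;            /* leading hand clears the bit    */
        lead = (lead + 1) % NFRAMES;
        if (!ref[trail])          /* still unused ANGLE steps later */
            return trail;         /* trailing hand evicts it        */
    }
}
```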
31
Clock Algorithm: Discussion
• Sensitive to sweeping interval
– Fast: lose usage information
– Slow: all pages look used
• Clock: add reference bits
– Could use (ref bit, modified bit) as ordered pair
– Might have to scan all pages
• LFU: Remove the page with the lowest reference count
– Does not track when the page was referenced
– Use multiple bits; shift right by 1 at regular intervals (aging)
• MFU: remove the most frequently used page
• LFU and MFU do not approximate OPT well
32
Page Buffering
• Cute, simple trick (Windows XP/2000, Mach, VMS):
– Keep a list of free pages
– Track which page the free page corresponds to
– Periodically write modified pages, and reset modified bit
[Figure: on eviction, modified pages are added to a modified list (batched writes = speed) and unmodified pages go straight to the free list; frames for new pages are taken ("used") from the free list]
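A sketch of the bookkeeping, with every structure and function name assumed for illustration:

```c
/* Page-buffering sketch: evicted pages park on a modified or free
 * list instead of being written/freed one at a time. */
#include <stddef.h>

struct page { int dirty; struct page *next; };
struct list { struct page *head; };

static struct list free_list;       /* clean pages, ready for reuse   */
static struct list modified_list;   /* dirty pages awaiting writeback */

extern void disk_write(struct page *p);   /* assumed I/O routine */

static void push(struct list *l, struct page *p) { p->next = l->head; l->head = p; }
static struct page *pop(struct list *l)
{
    struct page *p = l->head;
    if (p) l->head = p->next;
    return p;
}

void evict(struct page *p)          /* called when a frame is reclaimed */
{
    push(p->dirty ? &modified_list : &free_list, p);
}

void flush_modified(void)           /* run periodically: batch writes = speed */
{
    struct page *p;
    while ((p = pop(&modified_list)) != NULL) {
        disk_write(p);
        p->dirty = 0;               /* reset the modified bit */
        push(&free_list, p);        /* now clean and reusable */
    }
}
```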
33
Allocating Pages to Processes
• Global replacement
– Single memory pool for entire system
– On page fault, evict oldest page in the system
– Problem: protection
• Local (per-process) replacement
– Have a separate pool of pages for each process
– Page fault in one process can only replace pages from its own
process
– Problem: might have idle resources
34
Allocation of Frames
• Each process needs a minimum number of pages
• Example: IBM 370 – 6 pages to handle SS MOVE
instruction:
– instruction is 6 bytes, might span 2 pages
– 2 pages to handle from
– 2 pages to handle to
• Two major allocation schemes
– fixed allocation
– priority allocation
35
Summary
• Demand Paging:
– Treat memory as cache on disk
– Cache miss: get the page from disk
• Transparent Level of Indirection
– User program is unaware of activities of OS behind scenes
– Data can be moved without affecting application correctness
• Replacement policies
– FIFO: Place pages on queue, replace page at end
– OPT: replace page that will be used farthest in future
– LRU: Replace page that hasn’t been used for the longest time
• Clock Algorithm: Approximation to LRU
– Arrange all pages in circular list
– Sweep through them, marking as not “in use”
– If a page is not “in use” for one pass, then it can be replaced
36