Transcript slides

Virtual Memory
CSE451
Andrew Whitaker
Problem: Physical Memory Scarcity
 Q: Which is bigger:
A 64 bit address space
Or, the number of hydrogen
atoms in a star?
 A: It’s the star
2^64 bytes in an address space
10^57 hydrogen atoms in a star
57 * log 10 > 64 * log 2
 But, the fact we have to ask is
significant!
Solution: Virtual Memory
 Physical memory stores a subset of virtual address
space
 The rest is stored on disk
 Main memory acts as a page cache
[Figure: a large virtual address space mapped onto a smaller physical memory; pages not in physical memory reside on disk]
 Implemented transparently by the OS and hardware
 Application can’t tell which pages are in memory
How Does Virtual Memory Work?
[Figure: page table entry layout: V, R, M bits (1 bit each), a 2-bit prot field, and a 20-bit page frame number]
Page table entry contains a valid bit
Says whether the mapping is valid
If the valid bit is not set, the hardware raises a page fault
Behaves like an (involuntary) system call
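
As a concrete illustration, here is a Java sketch of decoding a PTE with this layout. The bit positions are assumptions chosen to match the figure (V, R, M: one bit each; prot: two bits; frame number: twenty bits); real architectures differ.

// Decoding a 25-bit PTE with the layout in the figure. Bit positions
// here are assumptions for illustration, not any real architecture's.
public class PteSketch {
    static boolean valid(int pte)      { return ((pte >>> 24) & 1) != 0; }
    static boolean referenced(int pte) { return ((pte >>> 23) & 1) != 0; }
    static boolean modified(int pte)   { return ((pte >>> 22) & 1) != 0; }
    static int protection(int pte)     { return (pte >>> 20) & 0x3; }
    static int frameNumber(int pte)    { return pte & 0xFFFFF; }
}
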
What Happens on a Page Fault?
 Hardware raises an exception
 OS page fault handler runs
 First, make sure the address is valid
 Second, verify access type
 Not allowed to write to a read-only page!
 If access is legal, allocate a new page frame
 What happens if physical memory is scarce… stay tuned!
 OS reads the page from disk into the frame
 The process/thread is blocked during the read
 Once the read completes, the OS updates the PTE and resumes the process/thread
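
A minimal sketch of these steps as a user-level Java simulation. Everything here (the PTE class, the maps standing in for the page table and disk) is invented for illustration; the real handler runs in the kernel and manipulates hardware page tables.

import java.util.HashMap;
import java.util.Map;

// Toy page-fault handler following the steps above. All structures
// are illustrative inventions, not a real kernel interface.
public class PageFaultSketch {
    static class PTE { boolean valid, readOnly; int frame; }

    static Map<Integer, PTE> pageTable = new HashMap<>(); // vpn -> PTE
    static Map<Integer, byte[]> disk = new HashMap<>();   // vpn -> contents
    static byte[][] frames = new byte[1024][];            // physical memory
    static int nextFreeFrame = 0;

    static void handleFault(int vpn, boolean isWrite) {
        PTE pte = pageTable.get(vpn);
        if (pte == null)                      // 1. address not mapped at all
            throw new RuntimeException("SIGSEGV: invalid address");
        if (isWrite && pte.readOnly)          // 2. illegal access type
            throw new RuntimeException("SIGSEGV: write to read-only page");
        int frame = nextFreeFrame++;          // 3. allocate a frame (page
                                              //    replacement if none free)
        frames[frame] = disk.get(vpn);        // 4. "disk read"; the faulting
                                              //    thread would block here
        pte.frame = frame;                    // 5. update the PTE; the thread
        pte.valid = true;                     //    can now be resumed
    }
}
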
Process Startup
 Two options for new processes:
Eagerly bring in pages
Lazily fault in pages
 Called demand paging
 Most OSes prefer demand paging
Doesn’t require guessing or maintaining history
 Slightly smarter approach: clustering
Bring in the faulting page and several subsequent pages
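
A minimal Java sketch of clustering, under the assumption that a simple set tracks which pages are resident; the cluster size is an arbitrary illustrative value.

import java.util.HashSet;
import java.util.Set;

// Clustered fault-in: bring in the faulting page plus a few subsequent
// pages, betting on sequential access.
public class ClusterPager {
    static final int CLUSTER_SIZE = 4;
    static Set<Integer> resident = new HashSet<>();

    static void pageIn(int vpn) { resident.add(vpn); } // stand-in for disk I/O

    static void handleFault(int faultingVpn) {
        for (int vpn = faultingVpn; vpn < faultingVpn + CLUSTER_SIZE; vpn++)
            if (!resident.contains(vpn))
                pageIn(vpn);   // amortize one fault over several nearby pages
    }
}
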
Dealing With Memory Scarcity
Physical memory is over-allocated
We can run out!
OS needs a page replacement strategy
Before diving into particular strategies, let's look at some theory…
Locality of Reference
Programs tend to reuse data and
instructions they have used recently
These “hot” items are a small fraction of
total memory
Rule of thumb: 90% of execution time is spent
in 10% of the code
This is what makes caching work!
The Working Set Model
The working set represents those pages
currently in use by a program
The working set contents change over time
So does the size of the working set
To get reasonable performance, a program
must be allocated enough pages for its
working set
[Figure: requests/second of throughput for a hypothetical web server vs. number of page frames allocated. The curve is not linear: performance decays rapidly when the allocation falls below the working set.]
Thrashing
 Programs with an allocation beneath their
working set will thrash
Each page fault evicts a “hot” page
That page will be needed again soon…
…and referencing it triggers another page fault
[repeat]
 Net result: no work gets done
All time is devoted to paging operations
Over-allocation
Giving a program more
than its working set
does very little good
Page eviction strategies
take advantage of this
Generalizing to Multiple Processes
Let W be the sum of the working sets of all
active processes
Let M be amount of physical memory
If W > M, the system as a whole will thrash
No page replacement policy will work :-(
If W <= M, the system might perform well
Key issue: is the page replacement policy smart
enough to identify working sets?
Belady’s Algorithm
 Evict the page that won’t be used for the
longest time in the future
This page is probably not in the working set
 If it is in the working set, we’re thrashing
 This is optimal!
Minimizes the number of page faults
 Major problem: this requires a crystal ball
We can’t “know” the future memory reference sequence
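
Offline, though, with the whole reference string in hand, the policy is easy to simulate. A Java sketch (the reference string and frame count below are made-up examples):

import java.util.HashSet;
import java.util.Set;

// Simulation of Belady's optimal policy: on a fault with memory full,
// evict the resident page whose next use lies farthest in the future.
public class BeladySim {
    static int countFaults(int[] refs, int numFrames) {
        Set<Integer> resident = new HashSet<>();
        int faults = 0;
        for (int i = 0; i < refs.length; i++) {
            if (resident.contains(refs[i])) continue;   // hit: no fault
            faults++;
            if (resident.size() == numFrames) {
                int victim = -1, farthest = -1;
                for (int page : resident) {             // find the next use
                    int next = Integer.MAX_VALUE;       // of each resident page
                    for (int j = i + 1; j < refs.length; j++)
                        if (refs[j] == page) { next = j; break; }
                    if (next > farthest) { farthest = next; victim = page; }
                }
                resident.remove(victim);                // evict the page used
            }                                           // farthest in future
            resident.add(refs[i]);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        System.out.println(countFaults(refs, 3));       // prints 7
    }
}
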
Temporal Locality
 Assume the past is a decent indicator of the
future
 How good are these algorithms:
LIFO
 Newest page is kicked out
FIFO
 Oldest page is kicked out
Random
 Random page is kicked out
LRU
 Least recently used page is kicked out
Paging Algorithms
 LIFO is horrendous
 Random is also pretty bad
 LRU is pretty good
 FIFO is mediocre
VAX/VMS used a form of FIFO because of hardware
limitations
Implementing LRU: Approach #1
One (bad) approach:
// on each memory reference:
long timeStamp = System.currentTimeMillis();
sortedList.insert(pageFrameNumber, timeStamp);
Problem: this is too inefficient
Time stamp + data structure manipulation on
each memory operation
Too complex for hardware
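
By contrast, exact LRU is easy and cheap in software caches, where every access goes through a method call. A sketch using Java's LinkedHashMap in access order:

import java.util.LinkedHashMap;
import java.util.Map;

// Exact LRU as a software cache: with access ordering, the eldest
// entry is the least recently used, so evicting it on overflow
// implements LRU exactly.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);          // true = order entries by access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;        // evict LRU entry when over capacity
    }
}

The catch for paging: ordinary loads and stores never enter the OS, so nothing like this can run per memory reference. Hence the hardware-assisted approaches that follow.
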
Making LRU Efficient
Use hardware support
Reference bit is set when pages are accessed
Can be cleared by the OS
[Figure: the same page table entry layout as before, with the reference (R) bit highlighted]
Trade off accuracy for speed
It suffices to find a “pretty old” page
Approach #2: LRU Approximation
with Reference Bits
 For each page, maintain a set of reference bits
 Let’s call it a reference byte
 Periodically, shift the HW reference bit into the highest-order bit of the reference byte
 Suppose the reference byte was 10101010
 If the HW bit was set, the new reference byte becomes 11010101
 The frame with the lowest value is the (approximate) LRU page
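
A Java sketch of this aging scheme, with arrays simulating the per-frame state (in a real OS the reference bit lives in the PTE and is read and cleared there):

// Reference-byte ("aging") approximation of LRU.
public class AgingSketch {
    static int[] refByte = new int[1024];          // 8-bit counter per frame
    static boolean[] hwRefBit = new boolean[1024]; // simulated HW R bits

    // Called periodically from a timer; the interval is a tuning knob.
    static void agePages() {
        for (int f = 0; f < refByte.length; f++) {
            refByte[f] >>>= 1;                     // 10101010 -> 01010101
            if (hwRefBit[f]) refByte[f] |= 0x80;   //          -> 11010101
            hwRefBit[f] = false;                   // clear for next interval
        }
    }

    // The frame with the smallest counter is approximately LRU.
    static int victim() {
        int best = 0;
        for (int f = 1; f < refByte.length; f++)
            if (refByte[f] < refByte[best]) best = f;
        return best;
    }
}
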
Analyzing Reference Bits
Pro: Does not impose overhead on every
memory reference
The shift interval can be configured
Con: Scanning all page frames can still be
inefficient
e.g., 4 GB of memory, 4KB pages => 1 million
page frames
Approach #3: LRU Clock
Use only a single bit per page frame
Basically, this is a degenerate form of reference
bits
On page eviction:
Scan through the list of reference bits
If the value is zero, replace this page
If the value is one, set the value to zero
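
A minimal Java sketch of the sweep, with the per-frame reference bits simulated as a boolean array:

// Clock page replacement: a hand sweeps a circular set of frames.
public class ClockSketch {
    static boolean[] refBit = new boolean[1024];   // simulated HW R bits
    static int hand = 0;                           // clock hand position

    static int evict() {
        while (true) {
            if (!refBit[hand]) {                   // bit 0: not referenced
                int victim = hand;                 // since last sweep; evict
                hand = (hand + 1) % refBit.length;
                return victim;
            }
            refBit[hand] = false;                  // bit 1: clear it and give
            hand = (hand + 1) % refBit.length;     // the page a second chance
        }
    }
}
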
Why “Clock”?
Typically implemented with a circular queue
[Figure: a circular queue of page frames, each labeled with its reference bit (0 or 1); the clock hand sweeps around the ring]
Analyzing Clock
 Pro: Very low overhead
Only runs when a page needs to be evicted
Takes the first page that hasn’t been referenced
 Con: Isn’t very accurate (one measly bit!)
Degenerates into FIFO if all reference bits are set
 Pro: But, the algorithm is self-regulating
If there is a lot of memory pressure, the clock runs more
often (and is more up-to-date)
When Does LRU Do Badly?
 LRU performs poorly when there is little temporal
locality, e.g., a sequential scan:
1 2 3 4 5 6 7 8
 Example: many database workloads, such as a query that scans a whole table:
SELECT *
FROM Employees
WHERE Salary < 25000
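
A scan like this (absent an index on Salary) touches each page once and never again, so recency predicts nothing. A tiny Java simulation of the worst case, with an arbitrary frame count: cycling through one more page than there are frames makes every reference fault under LRU.

import java.util.LinkedHashSet;

// LRU's worst case: scan numFrames + 1 pages repeatedly. Every
// reference faults, because LRU always evicts the page needed next.
public class ScanVsLru {
    public static void main(String[] args) {
        int numFrames = 4;
        LinkedHashSet<Integer> lru = new LinkedHashSet<>(); // front = LRU
        int faults = 0, refs = 0;
        for (int pass = 0; pass < 3; pass++) {
            for (int page = 0; page <= numFrames; page++) { // 5-page scan
                refs++;
                if (lru.remove(page)) { lru.add(page); continue; } // hit
                faults++;
                if (lru.size() == numFrames)
                    lru.remove(lru.iterator().next());      // evict LRU page
                lru.add(page);
            }
        }
        System.out.println(faults + "/" + refs + " references fault"); // 15/15
    }
}
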