Transcript slides
Virtual Memory
CSE451
Andrew Whitaker
Problem: Physical Memory Scarcity
Q: Which is bigger:
A 64-bit address space
Or the number of hydrogen atoms in a star?
A: It’s the star
2^64 bytes in an address space
10^57 hydrogen atoms in a star
57 > 64 * log10(2) ≈ 19.3, so 10^57 > 2^64
But the fact that we have to ask is
significant!
Solution: Virtual Memory
Physical memory stores a subset of the virtual
address space
The rest is stored on disk
Main memory acts as a page cache
[Figure: virtual memory address space mapped partly to physical memory, partly to disk]
Implemented transparently by the OS and hardware
Application can’t tell which pages are in memory
How Does Virtual Memory Work?
[Figure: page table entry layout: V (1 bit), R (1 bit), M (1 bit), prot (2 bits), page frame number (20 bits)]
Page table entry contains a valid bit
Says whether the mapping is valid
If the valid bit is not set, the system raises a
page fault
Behaves like an (involuntary) system call
What Happens on a Page Fault?
Hardware raises an exception
OS page fault handler runs
First, make sure the address is valid
Second, verify access type
Not allowed to write to a read-only page!
If access is legal, allocate a new page frame
What happens if physical memory is scarce… stay tuned!
OS reads the page in from disk
Process/thread is blocked while the read is in progress
Once read completes, OS updates the PTE and resumes
the process/thread
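A toy, self-contained sketch of these steps (every name here is illustrative; this is not a real kernel API):

  // Sketch of the fault-handling steps above; all names are made up.
  class PageFaultSketch {
      static class Pte { boolean valid, readOnly; int frame; long diskBlock; }

      java.util.Map<Long, Pte> pageTable = new java.util.HashMap<>();

      void handlePageFault(long vpn, boolean isWrite) {
          Pte pte = pageTable.get(vpn);
          if (pte == null)                  // 1. address not mapped at all
              throw new RuntimeException("segmentation fault");
          if (isWrite && pte.readOnly)      // 2. illegal access type
              throw new RuntimeException("protection fault");
          int frame = allocateFrame();      // 3. grab a free page frame
          readBlockIntoFrame(pte.diskBlock, frame);  // thread blocks here
          pte.frame = frame;                // 4. fix up the PTE and resume
          pte.valid = true;
      }

      int allocateFrame() { return 0; }                   // stub
      void readBlockIntoFrame(long block, int frame) { }  // stub
  }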
Process Startup
Two options for new processes:
Eagerly bring in pages
Lazily fault in pages
Called demand paging
Most OSes prefer demand paging
Doesn’t require guessing or maintaining history
Slightly smarter approach: clustering
Bring in the faulting page and several subsequent pages
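A minimal sketch of fault-time clustering (CLUSTER_SIZE and the helpers are invented for illustration):

  // On a fault, bring in the faulting page plus a few subsequent pages.
  class ClusteringSketch {
      static final int CLUSTER_SIZE = 4;   // faulting page + 3 neighbors
      java.util.Set<Long> resident = new java.util.HashSet<>();

      void faultIn(long faultingVpn) {
          for (long vpn = faultingVpn; vpn < faultingVpn + CLUSTER_SIZE; vpn++) {
              if (resident.add(vpn))       // not already in memory
                  loadFromDisk(vpn);       // ideally one mostly-sequential read
          }
      }
      void loadFromDisk(long vpn) { }      // stub for the disk read
  }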
Dealing With Memory Scarcity
Physical memory is over-allocated
We can run out!
OS needs a page replacement strategy
Before diving into particular strategies, let's
look at some theory…
Locality of Reference
Programs tend to reuse data and
instructions they have used recently
These “hot” items are a small fraction of
total memory
Rule of thumb: 90% of execution time is spent
in 10% of the code
This is what makes caching work!
The Working Set Model
The working set represents those pages
currently in use by a program
The working set contents change over time
So does the size of the working set
To get reasonable performance, a program
must be allocated enough pages for its
working set
[Figure: requests/second of throughput vs. number of page frames allocated to a process, for a hypothetical web server. The curve is not linear: performance decays rapidly once the allocation drops below the working set.]
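A runnable sketch of the working-set definition: the distinct pages touched in the last WINDOW references (the window size and reference string here are made up):

  import java.util.*;

  class WorkingSetSketch {
      public static void main(String[] args) {
          final int WINDOW = 5;            // working-set window, in references
          int[] refs = {1, 2, 1, 3, 1, 2, 4, 4, 4, 5};
          Deque<Integer> window = new ArrayDeque<>();
          for (int page : refs) {
              window.addLast(page);
              if (window.size() > WINDOW) window.removeFirst();
              // The working set is the set of distinct pages in the window.
              System.out.println("working set = " + new HashSet<>(window));
          }
      }
  }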
Thrashing
Programs with an allocation beneath their
working set will thrash
Each page fault evicts a “hot” page
That page will be needed soon…
Evicted page takes a page fault
[repeat]
Net result: no work gets done
All time is devoted to paging operations
Over-allocation
Giving a program more
than its working set
does very little good
Page eviction strategies
take advantage of this
Generalizing to Multiple Processes
Let W be the sum of working sets for all
active processes
Let M be amount of physical memory
If W > M, the system as a whole will thrash
No page replacement policy will work :-(
If W <= M, the system might perform well
Key issue: is the page replacement policy smart
enough to identify working sets?
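A trivial sketch of the W vs. M comparison (all sizes invented); a system could use exactly this check to decide whether to suspend a process rather than thrash:

  // Compare the summed working sets against physical memory, in pages.
  class ThrashCheckSketch {
      public static void main(String[] args) {
          int[] workingSets = {300, 450, 250};   // per-process working sets
          int physicalFrames = 1024;             // M
          int w = java.util.Arrays.stream(workingSets).sum();
          System.out.println(w <= physicalFrames
              ? "might perform well (W <= M)"
              : "will thrash (W > M)");
      }
  }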
Belady’s Algorithm
Evict the page that won’t be used for the
longest time in the future
This page is probably not in the working set
If it is in the working set, we’re thrashing
This is optimal!
Minimizes the number of page faults
Major problem: this requires a crystal ball
We can't know the future memory reference sequence
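Still, given a recorded reference string, we can simulate it after the fact. A runnable sketch (reference string and frame count invented):

  import java.util.*;

  class BeladySketch {
      public static void main(String[] args) {
          int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
          int frames = 3, faults = 0;
          Set<Integer> resident = new HashSet<>();
          for (int i = 0; i < refs.length; i++) {
              if (resident.contains(refs[i])) continue;   // hit
              faults++;
              if (resident.size() == frames) {
                  // Evict the page whose next use is farthest in the future.
                  int victim = -1, farthest = -1;
                  for (int page : resident) {
                      int next = refs.length;             // never used again
                      for (int j = i + 1; j < refs.length; j++)
                          if (refs[j] == page) { next = j; break; }
                      if (next > farthest) { farthest = next; victim = page; }
                  }
                  resident.remove(victim);
              }
              resident.add(refs[i]);
          }
          System.out.println(faults + " faults");         // 7 for this string
      }
  }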
Temporal Locality
Assume the past is a decent indicator of the
future
How good are these algorithms:
LIFO
Newest page is kicked out
FIFO
Oldest page is kicked out
Random
Random page is kicked out
LRU
Least recently used page is kicked out
Paging Algorithms
LIFO is horrendous
Random is also pretty bad
LRU is pretty good
FIFO is mediocre
VAX VMS used a form of FIFO because of hardware
limitations
Implementing LRU: Approach #1
One (bad) approach:
On each memory reference:
  // timestamp every access and keep frames sorted by time (names illustrative)
  long timeStamp = System.currentTimeMillis();
  sortedList.insert(pageFrameNumber, timeStamp);
Problem: this is too inefficient
Time stamp + data structure manipulation on
each memory operation
Too complex for hardware
Making LRU Efficient
Use hardware support
Reference bit is set when pages are accessed
Can be cleared by the OS
[Figure: page table entry layout again: V (1 bit), R (1 bit), M (1 bit), prot (2 bits), page frame number (20 bits); the R (reference) bit is the one used here]
Trade off accuracy for speed
It suffices to find a “pretty old” page
Approach #2: LRU Approximation
with Reference Bits
For each page, maintain a set of reference bits
Let’s call it a reference byte
Periodically, shift the HW reference bit into the highest-order bit of the reference byte
Suppose the reference byte was 10101010
If the HW bit was set, the new reference byte becomes 11010101
Frame with the lowest value is the LRU page
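The aging step is one line of bit manipulation. A runnable sketch using the example byte above:

  // Shift right, then put the hardware reference bit in the high-order position.
  class AgingSketch {
      public static void main(String[] args) {
          int refByte = 0b10101010;
          int hwBit = 1;                       // page was touched this interval
          refByte = (refByte >>> 1) | (hwBit << 7);
          System.out.println(Integer.toBinaryString(refByte));  // 11010101
      }
  }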
Analyzing Reference Bits
Pro: Does not impose overhead on every
memory reference
The shift interval can be configured
Con: Scanning all page frames can still be
inefficient
e.g., 4 GB of memory, 4KB pages => 1 million
page frames
Approach #3: LRU Clock
Use only a single bit per page frame
Basically, this is a degenerate form of reference
bits
On page eviction:
Scan through the list of reference bits
If the value is zero, replace this page
If the value is one, set the value to zero
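A minimal sketch of that sweep (frame count and field names invented):

  // Circular scan over per-frame reference bits; referenced frames get
  // a second chance, and the first unreferenced frame is the victim.
  class ClockSketch {
      boolean[] refBit = new boolean[8];       // one bit per page frame
      int hand = 0;                            // the clock hand

      int pickVictim() {
          while (true) {
              if (!refBit[hand]) {             // not referenced since last sweep
                  int victim = hand;
                  hand = (hand + 1) % refBit.length;
                  return victim;
              }
              refBit[hand] = false;            // clear and move on
              hand = (hand + 1) % refBit.length;
          }
      }

      public static void main(String[] args) {
          ClockSketch c = new ClockSketch();
          c.refBit[0] = c.refBit[1] = true;    // frames 0 and 1 recently used
          System.out.println("victim = " + c.pickVictim());   // victim = 2
      }
  }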
Why “Clock”?
Typically implemented with a circular queue
[Figure: circular queue of page frames, each tagged with a reference bit (0 or 1); the clock hand sweeps around the circle]
Analyzing Clock
Pro: Very low overhead
Only runs when a page needs to be evicted
Takes the first page that hasn’t been referenced
Con: Isn’t very accurate (one measly bit!)
Degenerates into FIFO if all reference bits are set
Pro: But, the algorithm is self-regulating
If there is a lot of memory pressure, the clock runs more
often (and is more up-to-date)
When Does LRU Do Badly?
LRU performs poorly when there is little temporal
locality:
e.g., a sequential scan: 1 2 3 4 5 6 7 8
Example: Many database workloads:
SELECT *
FROM Employees
WHERE Salary < 25000
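The SELECT above scans the whole Employees table once, which is exactly the sequential pattern LRU handles worst. A runnable sketch of the failure mode (sizes invented): repeatedly scanning 8 pages through 4 frames makes LRU miss on every reference.

  import java.util.*;

  class LruScanSketch {
      public static void main(String[] args) {
          int frames = 4, faults = 0;
          // Insertion order doubles as recency order: least recent is first.
          LinkedHashSet<Integer> lru = new LinkedHashSet<>();
          for (int pass = 0; pass < 2; pass++) {
              for (int page = 1; page <= 8; page++) {
                  if (!lru.remove(page)) {         // miss
                      faults++;
                      if (lru.size() == frames) {  // evict least recent page
                          Iterator<Integer> it = lru.iterator();
                          it.next();
                          it.remove();
                      }
                  }
                  lru.add(page);                   // page is now most recent
              }
          }
          System.out.println(faults + " faults in 16 references");  // all 16 miss
      }
  }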