Transcript Slide
Chapter 9: Virtual Memory
Objectives
Understand the virtual memory system, its benefits, and the
mechanisms that make it feasible:
Demand paging
Page-replacement algorithms
Frame allocation
Locality and working-set models
Understand how kernel memory is allocated and used
Operating System Concepts – 7th Edition, Feb 22, 2005
9.2
Silberschatz, Galvin and Gagne ©2005
Background
Virtual memory – separation of user logical memory from physical
memory
Only part of the program needs to be in memory for execution
Logical address space can therefore be much larger than
physical address space
Allows address spaces to be shared by several processes
Allows for more efficient process creation
Virtual memory can be implemented via
Demand paging
Demand segmentation
Virtual Memory That is Larger Than Physical Memory
Demand Paging
The core enabling idea of virtual memory systems:
A page is brought into memory only when needed
Why?
Less I/O needed
Less memory needed
Faster response
More processes can be admitted to the system
How?
Process generates logical (virtual) addresses which are
mapped to physical addresses using a page table
If the requested page is not in memory, kernel brings it from
hard disk
How do we know whether a page is in memory?
Valid-Invalid Bit
Each page-table entry has a valid–invalid bit:
v → the page is in memory
i → the page is not in memory
Initially, the bit is set to i for all entries
[Figure: a page table whose entries hold a frame # plus the valid–invalid bit — here v, v, v, v, i, …, i, i]
During address translation, if the valid–invalid bit
is i, then it could be:
Illegal reference (outside the process’
address space) → abort the process
Legal reference but not in memory →
page fault (to bring the page in)
Handling a Page Fault
1. Operating system looks at another table to decide:
Invalid reference → abort
Just not in memory → bring it in (steps 2 – 6)
2. Get empty frame
3. Swap page into frame
4. Reset tables
5. Set validation bit = v
6. Restart the instruction that caused the page fault
Handling a Page Fault (cont’d)
Handling a Page Fault (cont’d)
Difficulties with restarting an instruction
C ← A + B (page fault while accessing C: bring C into memory, then redo the addition)
Can be complicated for other instructions
e.g., a block-move instruction that copies a block of data from one
location to another, possibly overlapping
– Some source data may be overwritten before the fault is detected
– Solution: the hardware attempts to access both ends of both blocks; if any
is not in memory, a page fault occurs before the instruction executes
Bottom line: demand paging may raise subtle problems, and they must
be addressed
Performance of Demand Paging
Page-fault rate p, where 0 ≤ p ≤ 1.0
p = 0 means no page faults
p = 1 means every reference is a fault
Effective Access Time (EAT)
EAT = (1 – p) x memory access
+ p (page fault overhead
+ swap page out
+ swap page in
+ restart overhead
)
Demand Paging: Example
Memory access time = 200 nanoseconds
Average page-fault service time = 8 milliseconds (disk latency, seek and
transfer time)
EAT = (1 – p) x 200 + p x (8 milliseconds)
= (1 – p) x 200 + p x 8,000,000
= 200 + p x 7,999,800
If one access out of 1,000 causes a page fault, then
EAT = 8.2 microseconds.
This is a slowdown by a factor of 40!!
Bottom line: We should minimize number of page faults; they are very
costly
Virtual Memory and Process Creation
Virtual memory allows faster/efficient process creation using
Copy-on-Write (COW) technique
COW allows both parent and child processes to initially share the
same pages in memory
If either process modifies a shared page, the page is copied
Page Replacement
Page fault occurs → need to bring the requested page into memory:
Find the location of the desired page on disk
Find a free frame:
- If there is a free frame, use it
- If there is no free frame, use a page replacement algorithm
to select a victim frame
Bring the desired page into the (newly) free frame; update the
page table and free frame list
Restart the process
Page Replacement
Note: we can save the swap-out overhead (writing the victim to disk) if the
victim page was NOT modified
We associate a dirty (modify) bit with each page to indicate whether the
page has been modified → significant savings (one less I/O operation)
Page Replacement Algorithms
Objective: minimize page-fault rate
Algorithm evaluation
Take particular string of memory references (reference string) and
Compute the number of page faults on that string
The reference string looks like:
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Notes
We use page numbers
The address sequence could have been: 100, 250, 270, 301, 490,
…, assuming a page size of 100 bytes
References 250 and 270 fall in the same page (page 2); only the first
may cause a page fault. That is why 2 appears only once
Page Faults vs. Number of Frames
First-In-First-Out (FIFO) Page Replacement
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process)
Let us work it out
On every page fault, we show memory contents
Number of page faults: 9
Pros
Easy to understand and implement
Cons
Performance may not always be good:
It may replace a page that is used heavily (e.g., one holding
a variable that is used most of the time)
It suffers from Belady’s anomaly
FIFO: Belady’s Anomaly
Assume the reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
If we have 3 frames, how many page faults?
9 page faults
If we have 4 frames, how many page faults? Students work it out.
10 page faults
More frames are supposed to result in fewer page faults!
Belady’s Anomaly: more frames → more page faults
FIFO: Belady’s Anomaly (cont’d)
Optimal Algorithm
Replace the page that will not be used for the longest period of time
4-frames example
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Memory contents at each fault (columns):
1  1  4
2  2  2
3  3  3
4  5  5
6 page faults
How do we know the future? We cannot!
Used as a yardstick for measuring how well other algorithms
perform (for comparison)
Least Recently Used (LRU) Algorithm
Try to approximate the Optimal policy; look at the past and infer the future
If a page has not been used for a long period, it may not be needed
anymore (e.g., pages of an initialization module)
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Memory contents at each fault (columns), 4 frames:
1  1  1  1  5
2  2  2  2  2
3  5  5  4  4
4  4  3  3  3
8 page faults (compare to: optimal 6, FIFO 10)
LRU and Optimal do not suffer from Belady’s anomaly
LRU Implementation: Counters
Every page-table entry has a time-of-use (counter) field
Every time page is referenced through this entry, copy the CPU logical
clock into this field
CPU clock is maintained in a register and incremented with every
memory access
When a page needs to be replaced, search for the page with smallest
(oldest) value
Cons: search time, updating the time-of-use fields (writing to memory),
clock overflow
Needs hardware support (increment clock and update time-of-use field)
LRU Implementation: Stack
Keep a stack of page numbers in a
doubly-linked list
If a page is referenced, move it to
the top
The least recently used page sinks
to the bottom
Requires updating 6 pointers in the
worst case for a memory reference
No search for replacement
Also needs hardware support to
update the stack
LRU Implementation (cont’d)
Can we implement LRU without hardware support?
Say by using interrupts, i.e., when the hardware needs to
update the stack or the counters, it issues an interrupt and an
ISR does the update?
NO. Too costly: it would slow every memory reference by a factor
of at least 10
Even LRU (which approximates OPT) is not easy to
implement without hardware support!
Second-chance Replacement
An approximation of LRU, also known as Clock replacement
Each page has a reference bit (ref_bit), initially = 0
When page is referenced, ref_bit is set to 1 (by hardware)
Maintain a moving pointer to the next (candidate) victim
When choosing a page to replace, check the ref_bit of the next victim:
if ref_bit == 0, replace it
else set ref_bit to 0 (and leave the page in memory),
move the pointer to the next page, and repeat until a victim is found
Second-Chance (Clock) Page-Replacement Algorithm
Counting Page Replacement Algorithms
Keep a counter of the number of references that have been made to
each page
LFU Algorithm: replace page with smallest count
Problem: some pages were heavily used at earlier time, but are
no longer needed, will stay (and waste) memory
MFU Algorithm: replace page with highest count
Argument: page with the smallest count was probably just
brought in and has yet to be used
Problem: consider code that uses a module or subroutine
heavily; MFU will consider its pages good candidates for eviction!
There are situations in which LFU and MFU could be helpful
E.g., consider database code that reads many pages and then
processes them. Even though the read module has accumulated a
large reference count, we need to evict its pages during processing
(MFU would be better than LRU here)
Allocation of Frames
Each process needs a minimum number of pages
Defined by the computer architecture (hardware)
instruction width and number of address indirection levels
Consider an instruction that takes one operand and allows one level
of indirection. What is the min number of frames needed to execute
it?
load [addr]
Answer: 3 (load is in a page, addr is in another, [addr] is in a third)
Note: Maximum number of frames allocated to a process is
determined by the OS
Frame Allocation
Equal allocation: all processes get the same number of frames
m frames, n processes → each process gets m/n frames
Proportional allocation: allocate according to the size of the process
si = size of process pi,  S = Σ si
m = total number of frames
ai = allocation for pi = (si / S) × m
Example: m = 64, s1 = 10, s2 = 127, so S = 137
a1 = 10/137 × 64 ≈ 5
a2 = 127/137 × 64 ≈ 59
Priority allocation: use proportional allocation based on priorities rather than size
Global vs. Local Replacement
If a page fault occurs and there is no free frame, we need to free one:
Global replacement
Process selects a replacement frame from the set of all frames;
one process can take a frame from another
Pros: Better throughput (process can use any available frame)
Cons: a process cannot control its own page-fault rate
Commonly used in operating systems
Local replacement
Each process selects from only its own set of allocated frames
Pros: Each process has its own share of frames; not impacted
by the paging behavior of others
Cons: a process may suffer from high page-fault rate even
though there are lightly used frames allocated to other processes
Thrashing
What will happen if a process does not have “enough” frames
to maintain its active set of pages in memory?
The page-fault rate is very high. This leads to:
low CPU utilization, which
makes the OS think that it needs to increase the degree of
multiprogramming, thus
OS admits another process to the system (making it worse!)
Thrashing ≡ a process is busy swapping pages in and out more
than executing
Thrashing (cont'd)
Thrashing (cont’d)
To prevent thrashing, we should provide each process with as
many frames as it needs
How do we know how many frames a process actually needs?
A program is usually composed of several functions or
modules
When executing a function, memory references are made to
instructions and local variables of that function and some
global variables
So, we may need to keep in memory only the pages needed
to execute the function
After finishing a function, we execute another. Then, we bring
in pages needed by the new function
This is called the Locality Model
Locality Model
The Locality Model states that
As a process executes, it moves from locality to locality, where a
locality is a set of pages that are actively used together
Notes
Locality is not restricted to functions/modules; it is more general.
It could be a segment of code within one function, e.g., a
loop touching data/instructions in several pages
Localities may overlap
Locality is a major reason behind the success of demand paging
How can we know the size of a locality?
Using the Working-Set model
Working-Set Model
Let Δ be a fixed number of page references,
called the working-set window
The set of pages in the most recent Δ page references is the working
set
Example: Δ = 10
Size of the WS at t1 is 5 pages, and at t2 it is 2 pages
Working-Set Model (cont’d)
The accuracy of the WS model depends on the choice of Δ
if Δ is too small, it will not encompass the entire locality
if Δ is too large, it will encompass several localities
if Δ = ∞, it will encompass the entire program
Using the WS model
OS monitors the WS of each process
It allocates number of frames = WS size to that process
If we have more memory frames available, another process can
be started
Keeping Track of the Working Set
WS is a moving window
At each memory reference, a new reference is added at one
end, and another is dropped off the other end
Maintaining the entire window is costly
Solution: Approximate with interval timer + a reference bit
Example: Δ = 10,000 references
Timer interrupts every 5,000 references
Keep 2 history bits in memory for each page
Whenever the timer interrupts, shift each page’s reference bit into
its history bits, then reset all reference bits to 0
If one of the in-memory bits = 1 → page is in the working set
Control Thrashing Using WS Model
WSSi ≡ working-set size of process Pi
= total number of pages referenced in the most recent Δ
m ≡ memory size in frames
D = Σ WSSi ≡ total demand for frames
if D > m → thrashing
Policy: if D > m, then suspend one of the processes
But maintaining the WS is costly. Is there an easier way to control
thrashing?
Control Thrashing Using Page-Fault Rate
Monitor page-fault rate and increase/decrease allocated frames accordingly
Establish “acceptable” page-fault rate range (upper and lower bound)
If the actual rate is too low, the process loses a frame
If the actual rate is too high, the process gains a frame
Allocating Kernel Memory
Treated differently from user memory, why?
Kernel requests memory for structures of varying sizes
Process descriptors (PCB), semaphores, file descriptors, …
Some of them are less than a page
Some kernel memory needs to be contiguous
some hardware devices interact directly with physical
memory without using virtual memory
Virtual memory may just be too expensive for the kernel (cannot
afford a page fault)
Often, a free-memory pool is dedicated to the kernel from which it
allocates the needed memory using:
Buddy system, or
Slab allocation
Buddy System
Allocates memory from fixed-size segment consisting of physically-
contiguous pages
Memory is allocated using a power-of-2 allocator
Satisfies requests in units sized as a power of 2
A request is rounded up to the next-highest power of 2
Fragmentation: a 17 KB request will be rounded to 32 KB!
When a smaller allocation is needed than is available, the current
chunk is split into two buddies of the next-lower power of 2
Splitting continues until an appropriately sized chunk is available
On free, adjacent “buddies” are combined (coalesced) to
form a larger segment
Used in older Unix/Linux systems
Buddy System Allocator
Slab Allocator
Slab allocator
Creates caches, each consisting of one or more slabs
Slab is one or more physically contiguous pages
Single cache for each unique kernel data structure
Each cache is filled with objects – instantiations of the data
structure
Objects are initially marked as free
When a structure is stored, its object is marked as used
Benefits
Fast memory allocation, no fragmentation
Used in Solaris, Linux
Slab Allocation
VM and Memory-Mapped Files
VM enables mapping a file to memory address space of a process
How?
A page-sized portion of the file is read from the file system into
a physical frame
Subsequent reads/writes to/from file are treated as ordinary
memory accesses
Example: mmap() on Unix systems
Why?
I/O operations (e.g., read(), write()) on files become ordinary
memory accesses → simplifies file handling (simpler code)
More efficient: memory accesses are less costly than I/O
system calls
One way of implementing shared memory for inter-process
communication
Memory-Mapped Files and Shared Memory
Memory-mapped files allow several processes to map the same file
Allowing the pages in memory to be shared
Windows XP implements shared memory using this technique
Other Issues: Page Size and Pre-paging
Page size selection impacts
fragmentation
page table size
I/O overhead
locality
Prepaging
Prepage all or some of the pages a process will need, before
they are referenced
Tradeoff
Reduce number of page faults at process startup
But, may waste memory and I/O because some of the
prepaged pages may not be used
Other Issues – Program Structure
Program structure
int data[128][128];
Each row is stored in one page; fewer than 128 frames are allocated
How many page faults in each of the following programs?
Program 1 (column-major traversal)
for (j = 0; j < 128; j++)
    for (i = 0; i < 128; i++)
        data[i][j] = 0;
#page faults: 128 x 128 = 16,384
Program 2 (row-major traversal)
for (i = 0; i < 128; i++)
    for (j = 0; j < 128; j++)
        data[i][j] = 0;
#page faults: 128
Other Issues – I/O interlock
Example Scenario
A process allocates a buffer for an I/O request (in its own address space)
The process issues an I/O request and waits (blocks) for it
Meanwhile, the CPU is given to another process, which incurs a page fault
The (global) replacement algorithm chooses the page that contains the
buffer as the victim!
Later, the I/O device sends an interrupt signaling the request is ready
BUT the frame that contains the buffer is now used by a different
process!
Solutions
Lock the (buffer) page in memory (I/O Interlock)
Make I/O in kernel memory (not in user memory): data is first transferred
to kernel buffers then copied to user space
Note: page locking can be used in other situations as well, e.g., kernel pages
are locked in memory
OS Example: Windows XP
Uses demand paging with clustering
Clustering brings in pages surrounding the faulting page
Processes are assigned working set minimum and working set
maximum
Working set minimum is the minimum number of pages the process
is guaranteed to have in memory
A process may be assigned as many pages as it needs, up to its working
set maximum
When the amount of free memory in the system falls below a
threshold, automatic working set trimming is performed to
restore the amount of free memory
Working set trimming removes pages from processes that have
pages in excess of their working set minimum
Summary
Virtual memory: A technique to map a large logical address space
onto a smaller physical address space
Uses demand paging: bring pages into memory when needed
Allows running large programs in small physical memory, page
sharing, and efficient process creation; simplifies programming
Page fault: occurs when a referenced page is not in memory
Page replacement algorithms: FIFO, OPT, LRU, second-chance, …
Frame allocation: proportional, global and local page replacement
Thrashing: process does not have sufficient frames → too many page
faults → poor CPU utilization
Locality and working-set models
Kernel memory: buddy system, slab allocator
Many issues and tradeoffs: page size, pre-paging, I/O interlock, ….