Transcript CHAP8

Virtual Memory
Chapter 8
Memory Hierarchy
[Figure: CPU with registers, backed by cache, main memory, and disk]

             Register    Cache        Memory      Disk
size:        4B x 32     32 KB-8 MB   2-3 GB      100-500 GB
speed:       1 ns        2 ns         30 ns       8 ms
$/Mbyte:                 $125/MB      $0.20/MB    $0.001/MB
line size:   8 B         32 B         4 KB

(figures circa 2012)
Larger … slower … cheaper …
Types of Memory
• Real memory
– Main memory
• Virtual memory
– Memory on disk
– Allows for effective multiprogramming and relieves the user of the
tight constraints of main memory
Virtual Memory Properties
• A process may be broken up into pieces that do not need to be located
contiguously in main memory
• No need to load all pieces of a process in main memory
• A process may be swapped in and out of main memory such that it
occupies different regions
• Memory references are dynamically translated into physical addresses at
run time
• Advantages:
– More processes may be maintained in main memory
– With so many processes in main memory, it is very likely a process
will be in the Ready state at any particular time (CPU efficiency)
– A process may be larger than all of main memory
A System with Virtual Memory (paging)
• Each process has its own page table
• Each page table entry contains the frame number of the
corresponding page in main memory or the disk address
[Figure: CPU issues virtual addresses 0 to N-1; the per-process page table maps each virtual page either to a physical address 0 to P-1 in main memory or to a disk address]
Address Translation:
Hardware converts virtual addresses to physical addresses via page table
VM Address Translation
Parameters
P = 2^p = page size (bytes)
N = 2^n = virtual address limit
M = 2^m = physical address limit

virtual address (n bits):   | virtual page number (bits n-1..p) | page offset (bits p-1..0) |
                                      | address translation
                                      v
physical address (m bits):  | physical page number (bits m-1..p) | page offset (bits p-1..0) |
Page offset bits don’t change as a result of translation
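As a concrete sketch of this split, assuming p = 12 (4 KB pages); the parameter values and addresses are illustrative:

```python
# Split a virtual address into VPN and page offset, assuming p = 12.
p = 12                      # page offset bits, so page size P = 2**p = 4096
P = 1 << p

def split(virtual_address):
    vpn = virtual_address >> p          # virtual page number: bits n-1..p
    offset = virtual_address & (P - 1)  # page offset: bits p-1..0 (unchanged by translation)
    return vpn, offset

print(split(0x12345))   # VPN 0x12, offset 0x345
```

Translation replaces only the VPN; the low p bits pass through untouched.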
Execution of a Program
• Operating system brings into main memory a few pieces of the program
– Resident set: the portion of the process that is in main memory
• An interrupt is generated when a page is needed that is not in main
memory - page fault
• Operating system places the process in the Blocked state, and a DMA
transfer is started to get the page from disk
• An interrupt is issued when disk I/O is complete which causes the OS to
place the affected process in the Ready Queue
Thrashing –
• Swapping out a piece of a process just before that piece is needed
• The processor spends most of its time swapping pieces rather than
executing user instructions
Principle of Locality
• Program and data references within a process tend to cluster
• Only a small portion of a process will be needed over a short
period of time
• Possible to make intelligent guesses about which pieces will
be needed in the future
• Therefore, the principle of locality can be used to make virtual
memory work efficiently
Page Fault
• Page table entry indicates virtual address not in memory
• OS exception handler invoked to move data from disk into memory
– current process suspends, others can resume
– OS has full control over placement, etc.
[Figure: before the fault, the page table entry for the faulting virtual address points to disk; after the fault, the page is resident in memory and the entry holds its physical address]
Servicing a Page Fault
• (1) Initiate block read: processor signals the I/O controller
– Read block of length P starting at disk address X and store
starting at memory address Y
• (2) Read occurs: Direct Memory Access (DMA) transfer under control of
the I/O controller, over the memory-I/O bus
• (3) Read done: I/O controller signals completion
– Interrupts the processor
– OS resumes the suspended process
[Figure: processor (with registers and cache), memory, and I/O controller with disks, all on the memory-I/O bus; arrows mark steps (1)-(3)]
Direct Memory Access
• Takes control of the system from the CPU to transfer data to and
from memory over the system bus
• Only one bus master at a time, usually the DMA controller due to tight
timing constraints
• Cycle stealing is used to transfer data on the system bus, i.e. the
instruction cycle is suspended so that data can be transferred by the
DMA controller in bursts. The CPU can only access the bus between
these bursts
• No interrupts occur (except at the very end)
Support Needed for Paging
• Hardware support for address translation is critically needed
for paging
• OS must be able to move pages efficiently between
secondary memory and main memory (DMA takes care of it)
• Software/Hardware support to handle page faults
Page Tables
[Figure: memory-resident page table indexed by virtual page number; each entry holds a valid bit plus a physical page number or disk address. Entries with valid = 1 point into physical memory; entries with valid = 0 point into disk storage (a swap file or regular system file)]
Address Translation
[Figure: the page table base register locates the page table; the virtual page number (VPN, bits n-1..p of the virtual address) acts as the table index. The selected entry holds valid and modify bits and the physical page number (PPN). If valid = 0, the page is not in memory. The physical address is the PPN (bits m-1..p) concatenated with the unchanged page offset (bits p-1..0)]
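The walk can be sketched in Python. The page-table contents and the dict representation are toy assumptions, not a real MMU:

```python
# Minimal page-table walk, assuming p = 12 and a toy page table mapping
# VPN -> (valid, modify, ppn); all entries here are made up.
p = 12

page_table = {
    0x12: (1, 0, 0x7),   # valid, clean, resident in physical page 0x7
    0x13: (0, 0, None),  # valid = 0: page not in memory -> page fault
}

def translate(va):
    vpn, offset = va >> p, va & ((1 << p) - 1)
    valid, modify, ppn = page_table[vpn]
    if not valid:
        raise RuntimeError("page fault")   # OS handler would load the page
    return (ppn << p) | offset             # PPN concatenated with unchanged offset

print(hex(translate(0x12ABC)))   # prints 0x7abc
```

Accessing a VPN whose entry has valid = 0 raises the toy "page fault", mirroring the exception path described earlier.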
Page Table Entries
• Present bit (P) == valid bit
• Modify bit (M) == dirty bit: needed to indicate whether the page
has been altered since it was last loaded into main memory
– If no change has been made, the page does not have to be
written back to the disk when it needs to be swapped out
• In addition, a Reference (or use) bit can be used to indicate whether
the page has been referenced (read) since it was last checked
– This is particularly useful for the Clock replacement policy (to
be studied later)
Translation Lookaside Buffer
• Each virtual memory reference can cause two physical memory
accesses
– one to fetch the page table entry
– one to fetch the data
• To overcome this problem a high-speed cache is set up for page
table entries
– called the TLB - Translation Lookaside Buffer
• TLB stores the most recently used page table entries: virtual page
numbers together with their mappings. Uses associative mapping (i.e.
many virtual page numbers can map to the same TLB index)
• Given a virtual address, processor examines the TLB first
• If page table entry is present (a hit), the frame number is
retrieved and the real address is formed
• If page table entry is not found in the TLB (a miss), the page
number is used to index the page table, and TLB is updated to
include the new page entry
Use of a TLB (Translation Lookaside Buffer)
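A minimal software model of the hit/miss sequence described above, assuming a tiny two-entry TLB with least-recently-used eviction; real TLBs do this lookup in hardware, and the capacity and mappings here are made up:

```python
# Toy fully-associative TLB in front of a page table.
from collections import OrderedDict

TLB_SIZE = 2
tlb = OrderedDict()                 # VPN -> frame number, most recent last
page_table = {0: 5, 1: 9, 2: 3}    # toy mapping, VPN -> frame number

def lookup(vpn):
    if vpn in tlb:                  # TLB hit: frame number retrieved directly
        tlb.move_to_end(vpn)
        return tlb[vpn], True
    ppn = page_table[vpn]           # TLB miss: walk the page table...
    if len(tlb) >= TLB_SIZE:
        tlb.popitem(last=False)     # ...evict the least recently used entry...
    tlb[vpn] = ppn                  # ...and update the TLB with the new entry
    return ppn, False

print(lookup(0))   # (5, False): miss, page table walked
print(lookup(0))   # (5, True): hit, no page table access
```

On a hit the page table is never touched, which is the whole point: the second memory access is avoided.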
Multi-level Page Tables

Given:




Level 2
Tables
4KB (212) page size
32-bit address space
4-byte PTE
Problem:

(4 MB)
Level 1
Root
Table
(4 KB)
Would need a 4 MB page table!
220 *4 bytes

Common solution


multi-level page tables e.g., 2-level table

Level 1 table: 1024 entries, each of
which points to a Level 2 page table.

Level 2 table: 1024 entries, each of
which points to a page
Page tables are stored in VM
...
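The index arithmetic for these parameters can be checked in a few lines; the test address is arbitrary:

```python
# Two-level index split for a 32-bit address with 4 KB pages:
# 10-bit level-1 index + 10-bit level-2 index + 12-bit offset = 32 bits.
def split_two_level(va):
    l1 = (va >> 22) & 0x3FF   # index into the 1024-entry root table
    l2 = (va >> 12) & 0x3FF   # index into the selected level-2 table
    off = va & 0xFFF          # page offset
    return l1, l2, off

# A flat table would need 2**20 PTEs * 4 bytes = 4 MB per process:
assert 2**20 * 4 == 4 * 2**20
print(split_two_level(0xDEADBEEF))
```

The win is that level-2 tables for unused 4 MB regions of the address space need not exist at all.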
Page Size ?
• Smaller page size → less internal fragmentation
(due to the last page of a process)
• Smaller page size → more pages required per process
→ larger page tables
→ page tables in virtual memory
(i.e. double page faults possible)
• Also, secondary memory is designed to efficiently transfer
large blocks of data, which favors a larger page size
Example Page Sizes
In Linux, type “getconf PAGESIZE”
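The internal-fragmentation side of the trade-off can be quantified with a small sketch; the process sizes are made up:

```python
# Internal fragmentation comes only from the (partially filled) last page.
import math

def internal_fragmentation(process_bytes, page_bytes):
    pages = math.ceil(process_bytes / page_bytes)   # pages needed
    return pages * page_bytes - process_bytes       # wasted bytes in last page

print(internal_fragmentation(10_000, 4096))   # 3 pages -> 2288 wasted bytes
print(internal_fragmentation(10_000, 1024))   # 10 pages -> 240 wasted bytes
```

Shrinking the page from 4 KB to 1 KB cuts the waste but more than triples the number of page table entries for the same process.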
Fetch Policy
• Determines when a page should be brought into memory
• Demand paging: bring a page into main memory only when it is referenced
– Many page faults when the process is first started
• Prepaging brings in more pages than needed
– More efficient to bring in pages that reside contiguously on the disk
Replacement Policy
• If memory is full, which page should be replaced?
– Page that is least likely to be referenced in the near future
– Most policies predict the future behavior on the basis of past behavior
• Frame Locking
– Associate a lock bit with each frame. If frame is locked, it may not be replaced
– Kernel, Control structures, I/O buffers
Basic Replacement Algorithms
• Optimal policy (OPT)
– Selects the page for which the time to the next reference is the longest
– Impossible to have perfect knowledge of future events
• First-In, First-Out (FIFO)
– Simplest to implement
– Page that has been in memory the longest is replaced (may be needed again soon!)
• Least Recently Used (LRU)
– Replaces the page that has not been referenced for the longest time
– By the principle of locality, this should be the page least likely to be
referenced in the near future
– Each page could be tagged with the time of last reference (extra overhead!)
• Clock Policy
– Additional bit called the use bit
– When a page is first loaded in memory → use bit = 1
– When the page is referenced → use bit = 1
– Going clockwise, the first frame encountered with use bit = 0 is replaced
– During the search for a replacement, the use bit is changed from 1 to 0
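The clock steps above can be sketched as follows; the frame contents, use bits, and hand position are made-up examples:

```python
# Clock replacement: advance the hand, clearing use bits, until a frame
# with use bit 0 is found; replace it and set its use bit.
def clock_replace(frames, use, hand, new_page):
    while use[hand] == 1:
        use[hand] = 0                     # give the page a second chance
        hand = (hand + 1) % len(frames)
    frames[hand] = new_page               # replace the victim
    use[hand] = 1                         # newly loaded page: use bit = 1
    return (hand + 1) % len(frames)       # hand stops just past the victim

frames = ["A", "B", "C"]
use = [1, 0, 1]
hand = 0
hand = clock_replace(frames, use, hand, "D")
print(frames, use, hand)   # ['A', 'D', 'C'] [0, 1, 1] 2
```

Here A's use bit is cleared as the hand passes, B (use bit 0) is the victim, and C is untouched.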
Page Replacement
Reference string (4 frames): A B C D A B E A B C D E

FIFO (10 faults):
Refs:  A  B  C  D  A  B  E  A  B  C  D  E
       A  A  A  A  A  A  E  E  E  E  D  D
          B  B  B  B  B  B  A  A  A  A  E
             C  C  C  C  C  C  B  B  B  B
                D  D  D  D  D  D  C  C  C

OPT (6 faults):
Refs:  A  B  C  D  A  B  E  A  B  C  D  E
       A  A  A  A  A  A  A  A  A  A  D  D
          B  B  B  B  B  B  B  B  B  B  B
             C  C  C  C  C  C  C  C  C  C
                D  D  D  E  E  E  E  E  E

LRU (8 faults):
Refs:  A  B  C  D  A  B  E  A  B  C  D  E
       A  A  A  A  A  A  A  A  A  A  A  E
          B  B  B  B  B  B  B  B  B  B  B
             C  C  C  C  E  E  E  E  D  D
                D  D  D  D  D  D  C  C  C
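The FIFO and LRU fault counts can be reproduced with a small simulation; this is a sketch using a plain list for the frames, not an OS implementation:

```python
# Count page faults for FIFO and LRU on a reference string.
def count_faults(refs, nframes, policy):
    frames, faults = [], 0
    for r in refs:
        if r in frames:
            if policy == "LRU":          # refresh recency on a hit
                frames.remove(r)
                frames.append(r)
            continue
        faults += 1                      # page fault
        if len(frames) >= nframes:
            frames.pop(0)                # evict head: oldest (FIFO) / least recent (LRU)
        frames.append(r)
    return faults

refs = list("ABCDABEABCDE")
print(count_faults(refs, 4, "FIFO"), count_faults(refs, 4, "LRU"))   # 10 8
```

OPT cannot be simulated this way without lookahead into the future of the reference string, which is exactly why it is unrealizable in practice.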
Example of a clock policy operation
Resident Set Size
• Fixed-allocation
– gives a process a fixed number of pages within which to execute
– when a page fault occurs, one of the pages of that process is replaced
• Variable-allocation
– number of pages allocated to a process varies over the lifetime of the process
Variable Allocation, Global Scope
• Easiest to implement; adopted by many operating systems
• OS keeps a list of free frames
• When a page fault occurs, a free frame is added to the resident set of this process
• If no free frame, replaces one from any process (the replacement algorithm picks!)
Variable Allocation, Local Scope
• When a new process is added, allocate to it a number of page frames based on the
application type, or other criteria (resident set)
• Page fault → select a page for replacement from the resident set of this process
• From time to time, increase/decrease the resident set size to improve performance
Load Control & Process Suspension
• Load Control determines the optimum number of processes to be resident in
main memory
• As the multiprogramming level increases, processor utilization rises, but:
– Too many processes → small resident set for each process → thrashing
– Too few processes → higher probability for all processes to be blocked
→ idle CPU time increases
• If the multiprogramming level is to be reduced, some processes must be suspended.
Here are some good choices for suspension:
• Lowest priority process
• Faulting process (will soon be blocked anyway)
– high probability that it does not have its working set in main memory
• Last process activated (least likely to have its working set resident)
• Process with smallest resident set
– this process requires the least future effort to reload
• Largest process
– generates the most free frames, making future suspensions unlikely