Paging - METU Computer Engineering

Download Report

Transcript Paging - METU Computer Engineering

Memory Management
•
•
•
•
•
•
•
•
Background
Logical versus Physical Address Space
Swapping
Contiguous Allocation
Virtual memory Management
Paging
– Background
– Implementation of Paging
– Demand Paging
– Page-Replacement Algorithms
– Thrashing
Segmentation
Segmentation with Paging
1
1999
Background
•
•
•
Program must be brought into memory and placed within a process for it to
be executed.
Input queue – collection of processes on the disk that are waiting to be
brought into memory for execution.
Address binding of instructions and data to memory addresses can
happen at three different stages.
– Compile time: If memory location known a priori, absolute code can be
generated; must recompile code if starting location changes.
– Load time: Must generate relocatable code if memory location is not
known at compile time.
– Execution time: Binding delayed until run time if the process can be
moved during its execution from one memory segment to another.
Need hardware support for address maps (e.g., base and limit
registers).
2
1999
Dynamic Loading and Linking
•
•
Loading
– Routine is not loaded until it is called
– Better memory-space utilization; unused routine is never loaded.
– Useful when large amounts of code are needed to handle
infrequently occurring cases.
Linking
– Linking postponed until execution time.
– Small piece of code, stub, used to locate the appropriate
memory-resident library routine.
– Stub replaces itself with the address of the routine, and executes
the routine.
– Operating system needed to check if routine is in processes’
memory address.
3
1999
Overlays
•
•
•
•
An early solution to the problem of accommodating programs that are
too large to fit in main memory is that of overlays: splitting the
programs into pieces and one piece loading another.
The responsibility of splitting to minimize swaps is that of the
user/programmer.
Keep in memory only those instructions and data that are needed at
any given time.
No special support needed from operating system, programming
design of overlay structure is complex. User arranged the programs in
a execution tree. Only the path to the root needs to be resident.
A
C
D
E
F
G
4
E
1999
Logical (or virtual) vs. Physical Address
Space
•
•
•
•
The concept of a logical address space that is bound to a physical
address space is central to proper memory management.
– Logical address – generated by the CPU unique to individual
programs; also referred to as virtual address. Each program
has its own virtual address space.
– Physical address – address seen by the memory unit, unique
to the system. Normally, two virtual addresses will not bound
to the same physical address, if it is not specially intended.
Compile time address binding involves logical addressing only.
Logical and physical addresses are the same in load-time addressbinding.
Logical (virtual) and physical addresses differ in execution-time
address-binding scheme.
Any instruction execution involves real memory location to
complete, if memory address is an operand.
5
1999
Memory-Management Unit (MMU)
•
•
•
•
MMU is a hardware device that maps virtual to physical address.
For example, in a partitioned memory allocation scheme, the
MMU adds a value in the relocation register (starting address of
the segment allocated to the program) to every address (offset)
generated by a user process at the time it is sent to memory.
The user program deals with logical addresses; it never sees the
real physical addresses.
Operating systems see the virtual address of the user programs
as well as the physical memory address, so that certain MMU
registers are loaded by proper values, during the program
execution.
6
1999
Contiguous Memory Allocation
•
•
Main memory usually into two parts:
– Resident operating system, usually held in low memory with
interrupt vector.
– User processes then held in high memory.
Partitioning schemes: Fixed size partitioning, variable size
partitioning. In both schemes
– Relocation-register scheme used to protect user processes
from each other, and from changing operating-system code
and data.
– Relocation register contains value of smallest physical
address; limit register contains range of logical addresses –
each logical address must be less than the limit register.
7
1999
Contiguous Memory Allocation (Cont.)
•
Variable size Multiple-partition allocation
– Hole – block of available memory; holes of various size are
scattered throughout memory.
– When a process arrives, it is allocated memory from a hole
large enough to accommodate it.
– Operating system maintains information about:
a) allocated partitions b) free partitions (hole)
OS
OS
OS
OS
process 5
process 5
process 5
process 5
process 9
process 9
process 8
process 2
process 10
process 2
process 2
8
process 2
1999
Contiguous Allocation (Cont.):Dynamic
Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes,
maintained by a list?
•First-fit:
Allocate the first hole that is big enough. Next fit is its
variation.
•Best-fit:
Allocate the smallest hole that is big enough; must
search entire list, unless ordered by size. Produces the smallest
leftover hole.
•Worst-fit:
Allocate the largest hole; must also search entier list.
Produces the largest leftover hole.
•Quick fit: separate list for each size is maintained, with this
merge is expensive
First-fit and best-fit seem to perform better than worst-fit in terms of
speed and storage utilization.
*Note that the list could be linear or linked.
9
1999
Fragmentation
•
External fragmentation – total memory space exists to satisfy a
request, but it is not contiguous.
•
Internal fragmentation – allocated memory may be slightly larger than
requested memory; this difference in memory is internal to a
partition, but not being used.
•
Reduce external fragmentation by compaction
– Shuffle memory contents to place all free memory together in
one large block.
– Compaction is possible only if relocation is dynamic, and is done
at execution time.
– I/O problem
 Lock job in memory while it is involved in I/O.
 Do I/O only into OS buffers.
10
1999
Modeling multi-Programming
•
•
If p is the fraction of the time a process spends in I/O wait state,
then
CPU utilization with n processes in the system = 1 - pn
==> higher the multiprogramming ==> better is the CPU utilization.
– Balancing of n based on p: a mix of CPU- versus I/Ointensive jobs
•
•
Fifty percent rule: On the average (over time), if the mean
number of processes in memory is n, the mean number of holes is
n/2.
Unused memory rule: If k is the ratio of the average size of a
process to that of a hole, then the fraction of memory occupied by
holes, f = k/(k+2).
– Computation of f: if s=process size, h=hole size==> k=h/s,
hole memory H=ks(n/2)=m-ns, ==> m=ns(k/2 +1), from f=h/m,
f= (ksn/2)/m, thus, f=k/(k+2).
11
1999
How to keep track of memory usage in
variable size partition case
•
•
•
1. Bit Maps
2. Linked Lists
3. Buddy System
Memory Management with Bit Maps
Divide up the memory into allocation units.
Corresponding to each allocation unit is a bit in the bit map
that is 0/1 (allocated/free).
Issues: Size of the allocation unit: too small versus too large
Allocation efficiency of large chunks of memory
Memory Management with Linked Lists
Just keep track of the end points of allocated and free memory
segments.
•
Memory Management with Buddy System
Deal with the memory as segments of size 2k for some positive k.
Buddy Splitting and coalescing. It is fast, but suffers from both
internal and external fragmentation.
12
1999
Swapping
•
A process can be swapped temporarily out of memory to a
backing store, and then brought back into memory for continued
execution.
•
Backing store – fast disk large enough to accommodate copies of
all memory images for all users; must provide direct access to
these memory images.
•
Roll out, roll in – swapping variant used for priority-based
scheduling algorithms; lower-priority process is swapped out so
higher-priority process can be loaded and executed.
•
•
Major part of swap time is transfer time; total transfer time is
directly proportional to the amount of memory swapped.
Modified and modern versions of swapping are found on many
systems, i.e., UNIX and Microsoft Windows.
13
1999
Schematic View of Swapping
14
1999
Virtual Memory (VM)
Content
• Background
• Paging
–
–
–
–
•
•
•
•
•
valid-invalid bits
page-faults
address translation
multilevel page table
Demand Paging and its performance
Page Replacement and Page-Replacement Algorithms
Allocation of Frames
Thrashing
Segmentation: pure segmentation, Segmentation with paging
15
1999
VM: Background
•
•
Basic idea behind the virtual memory is that the combined size of
program, data, and stack may exceed the the amount of physical
memory available for it.
The operating system uses physical memory as well as secondary
storage to solve the problem. Paging seems to be a state of art
universal method. Process is allocated physical memory whenever
the latter is available.
Paging
•
•
•
•
•
•
Divide physical memory into fixed-sized blocks called frames (size is
power of 2, between 512 bytes and 8192 bytes).
Divide logical memory into blocks of same size called pages.
Keep track of all free frames.
To run a program of size n pages, need to find n free frames and load
program.
Set up a page table to translate logical to physical addresses.
Internal fragmentation.
16
1999
Paging: Valid-Invalid Bit
•
•
•
With each page table entry a valid–invalid bit is associated
(1  in-memory, 0  not-in-memory)
Initially valid–invalid but is set to 0 on all entries.
Example of a page table snapshot.
Frame #
valid-invalid bit
1
1
1
1
0

0
0
•
page table
During address translation, if valid–invalid bit in page table entry
is 0  page fault.
17
1999
Paging: Page Fault
•If there is ever a reference to a page, first reference will trap to
OS  page fault
•OS looks at another table to select an empty frame:
•Swap page into frame.
•Reset page’s, validation bit to 1.
•Restart instruction
If there is no free page
•Page replacement – find some page in memory, but not really in use
•For performance reasons use an algorithm which will result in
minimum number of page faults.
•Same page may be brought into memory several times.
18
1999
Address Translation Scheme
•
•
•
•
•
Address generated by CPU is divided into:
– Page number (p) – used as an index into a page table which
contains base address of each page in physical memory.
– Page offset (d) – combined with base address to define the
physical memory address that is sent to the memory unit.
The base address is the address of the frame to which the page is
mapped.
If the virtual page is already in the memory, the mapping is straight
forward and very efficient.
If the page is not in the memory, first a page fault occurs, after
which the address mapping is as above.
See the next illustration for clearity:
19
1999
Address Translation Architecture
20
1999
Paging Example
21
1999
Implementation of Page Table
•
•
•
•
•
Page table is kept in main memory.
Page-table base register (PTBR) points to the page table.
Page-table length register (PRLR) indicates size of the page
table.
In this scheme every data/instruction access requires two memory
accesses. One for the page table and one for the data/instruction.
The two memory access problem can be solved by the use of a
special fast-lookup hardware cache called associative registers or
translation look-aside buffers (TLBs)
22
1999
Associative Register
•
Associative registers – parallel search
Page #
•
Frame #
Address translation (A´, A´´)
– If A´ is in associative register, get frame # out.
– Otherwise get frame # from page table in memory
•
Valid-invalid bit attached to each entry in the page table:
– “valid” indicates that the associated page is in the process’
logical address space, and is thus a legal page.
– “invalid” indicates that the page is not in the process’ logical
address space.
23
1999
Effective Access Time
•
•
•
•
Associative Lookup =  time unit
Assume memory cycle time is 1 microsecond
Hit ratio – percentage of times that a page number is found in the
associative registers = 
Effective Access Time (EAT)
EAT = f(PageTableAccessTime,MemoryAccessTime)
EAT = (1 + )  + (2 + )(1 – )
=2+–
24
1999
Two-Level Page-Table Scheme
25
1999
Two-Level Paging Example
•
•
•
A logical address (on 32-bit machine with 4K page size) is divided
into:
– a page number consisting of 20 bits.
– a page offset consisting of 12 bits.
Since the page table is paged, the page number is further divided
into:
– a 10-bit page number.
– a 10-bit page offset.
Thus, a logical address is as follows:
page number page offset
pi
p2
d
10
10
12
where pi is an index into the outer page table, and p2 is the
displacement within the page of the outer page table.
26
1999
Address-Translation Scheme
•
Address-translation scheme for a three-level 32-bit paging
architecture. SegmentTable(s1),
FirstLevelSegmentPageTable(s2),
SecondLevelSegmentPageTable(d1), PageOffset(d2),
27
1999
Multilevel Paging and Performance
•
•
•
Since each level is stored as a separate table in memory,
mapping a logical address to a physical one may take four
memory accesses.
Even though time needed for one memory access is theoretically
four times as much, caching permits performance to remain
reasonable.
Cache hit rate of 98 percent yields: mem.access=100 nsec,
cache.access=20 nsec,
effective access time = 0.98 x (100+20) + 0.02 x (400+20)
= 126 nanoseconds.
which is only a 26 percent slowdown in memory access time.
28
1999
Inverted Page Table
•
•
•
•
One entry for each real page of memory.
Entry consists of the virtual address of the page stored in that real
memory location, with information about the process that owns
that page.
Decreases memory needed to store each page table, but
increases time needed to search the table when a page reference
occurs.
Use hash table to limit the search to one — or at most a few —
page-table entries.
29
1999
Inverted Page Table Architecture
30
1999
Shared Pages
•
•
Shared code
– One copy of read-only (reentrant) code shared among
processes (i.e., text editors, compilers, window systems).
– Shared code must appear in same location in the logical
address space of all processes.
Private code and data
– Each process keeps a separate copy of the code and data.
– The pages for the private code and data can appear
anywhere in the logical address space.
31
1999
Shared Pages Example
32
1999
Virtual Memory: Demand Paging
•
Bring a page into memory only when it is needed.
– Less I/O needed
– Less memory needed
– Faster response
– More users
•
Page is needed  reference to it
– invalid reference  abort
– not-in-memory  bring to memory
•
Page replacement – find some page in memory, but not really I
use, swap it out:
– algorithm
– performance – want an algorithm which will result in minimum
number of page faults.
•
Same page may be brought into memory several times.
33
1999
Performance of Demand Paging
•
•
Page Fault Rate 0  p  1.0
– if p = 0 no page faults
– if p = 1, every reference is a fault
Effective Access Time (EAT)
EAT = (1 – p) x memory access
+ p (page fault overhead
+ [swap page out ]
+ swap page in
+ restart overhead)
34
1999
Demand Paging Example
•
•
•
•
Memory access time = 1 microsecond
50% of the time the page that is being replaced has been
modified and therefore needs to be swapped out.
Swap Page Time = 10 msec = 10,000 msec
Computation of Effective Access Time
EAT = (1 – p) x 1 + p (15000)
= 1 + 15000P
(in msec)
= 7501 msec
35
1999
Page Replacement
•
•
•
•
•
Prevent over-allocation of memory by modifying page-fault service
routine to include page replacement.
Use modify (dirty) bit to reduce overhead of page transfers – only
modified pages are written to disk.
Page replacement completes separation between logical memory
and physical memory – large virtual memory can be provided on a
smaller physical memory.
Want lowest page-fault rate
– Evaluate algorithm by running it on a particular string of
memory references (reference string) and computing the
number of page faults on that string.
In all examples, the reference string is taken as
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.
36
1999
First-In-First-Out (FIFO) Algorithm
•
•
•
•
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process)
1
1
4
5
2
2
1
3
3
3
2
4
1
1
5
4
2
2
1
5
3
3
2
4
4
3
9 page faults
4 frames
10 page faults
FIFO Replacement – Belady’s Anomaly
– more frames  less page faults
37
1999
Optimal Algorithm
•
•
Replace page that will not be used for longest period of time.
4 frames example
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
1
4
2
6 page faults
3
4
•
•
5
How do you know this?
Used for measuring how well your algorithm performs.
38
1999
Least Recently Used (LRU) Algorithm
•
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
1
5
2
•
•
3
5
4
3
4
Counter implementation
– Every page entry has a counter; every time page is
referenced through this entry, copy the clock into the counter.
– When a page needs to be changed, look at the counters to
determine which are to change.
Stack implementation – keep a stack of page numbers in a double
link form: Move the page referenced to the top:requires 6 pointers
to be changed. No search for replacement
39
1999
LRU Approximation Algorithms
•
•
Reference bit
– With each page associate a bit, initially -= 0
– When page is referenced bit set to 1.
– Replace the one which is 0 (if one exists). We do not know
the order, however.
Second chance
– Need reference bit.
– Clock replacement.
– If page to be replaced (in clock order) has reference bit = 1.
then:
 set reference bit 0.
 leave page in memory.
 replace next page (in clock order), subject to same
rules.
40
1999
Counting Algorithms
•
•
•
Keep a counter of the number of references that have been made
to each page.
LFU Algorithm: replaces page with smallest count.
MFU Algorithm: based on the argument that the page with the
smallest count was probably just brought in and has yet to be
used.
41
1999
Allocation of Frames
•
•
•
•
Each process needs a minimum number of pages to be in the
memory to progress in execution.
Example: IBM 370 – 6 pages to handle SS MOVE instruction:
– instruction is 6 bytes, might span 2 pages.
– 2 pages to handle from.
– 2 pages to handle to.
Two major allocation schemes.
– fixed allocation: equal or proportional to the size of the
program.
– priority allocation: select the page to be replaced from lower
priority process or proportional to priority.
Global vs. Local replacement: choosing from the entire space vs.
choosing from the self address space.
42
1999
Thrashing
•
If a process does not have “enough” pages, the page-fault rate is
very high. This leads to:
– low CPU utilization.
– operating system thinks that it needs to increase the degree of
multiprogramming.
– another process added to the system.
•
Thrashing  a process is busy swapping pages in and out.
43
1999
Working-Set Model
•
•
•
•
•
•
•
Locality model: Process migrates from one locality to another.
Localities may overlap. Thrashing occurs, because
 size of locality > total memory size
  working-set window  a fixed number of page references
Example: 10,000 instruction
WSSi (working set of Process Pi) =
total number of pages referenced in the most recent  (varies in
time)
– if  too small will not encompass entire locality.
– if  too large will encompass several localities.
– if  =   will encompass entire program.
D =  WSSi  total demand frames
if D > m  Thrashing
Policy if D > m, then suspend one of the processes.
44
1999
Keeping Track of the Working Set
•
•
•
•
Approximate with interval timer + a reference bit
Example:  = 10,000
– Timer interrupts after every 5000 time units.
– Keep in memory 2 bits for each page.
– Whenever a timer interrupts copy and sets the values of all
reference bits to 0.
– If one of the bits in memory = 1  page in working set.
Why is this not completely accurate?
Improvement = 10 bits and interrupt every 1000 time units.
45
1999
Page-Fault Frequency Scheme
•
Establish “acceptable” page-fault rate.
– If actual rate too low, process loses frame.
– If actual rate too high, process gains frame.
46
1999
Virtual memory: Segmentation
•
•
Memory-management scheme that supports user view of
memory.
A program is a collection of segments. A segment is a logical unit
such as:
main program,
procedure,
function,
local variables, global variables,
common block,
stack,
symbol table, arrays
47
1999
Logical View of Segmentation
1
4
1
2
3
2
4
3
user space
physical memory space
48
1999
Segmentation Architecture
•
Logical address consists of a two tuple:
<segment-number, offset>,
•
•
•
Segment table – maps two-dimensional physical addresses; each
table entry has:
– base – contains the starting physical address where the
segments reside in memory.
– limit – specifies the length of the segment.
Segment-table base register (STBR) points to the segment table’s
location in memory.
Segment-table length register (STLR) indicates number of
segments used by a program;
segment number s is legal if s < STLR.
49
1999
Segmentation Architecture (Cont.)
•
•
•
Relocation.
– dynamic
– by segment table
Sharing.
– shared segments
– same segment number
Allocation.
– first fit/best fit
– external fragmentation
50
1999
Segmentation Architecture (Cont.)
•
•
•
•
Protection. With each entry in segment table associate:
– validation bit = 0  illegal segment
– read/write/execute privileges
Protection bits associated with segments; code sharing occurs at
segment level.
Since segments vary in length, memory allocation is a dynamic
storage-allocation problem.
A segmentation example is shown in the following diagram
51
1999
Sharing of segments
52
1999
Segmentation with Paging – MULTICS
•
•
The MULTICS system solved problems of external fragmentation
and lengthy search times by paging the segments.
Solution differs from pure segmentation in that the segment-table
entry contains not the base address of the segment, but rather the
base address of a page table for this segment.
53
1999
MULTICS Address Translation Scheme
54
1999
Segmentation with Paging – Intel 386
•
As shown in the following diagram, the Intel 386 uses
segmentation with paging for memory management with a twolevel paging scheme.
55
1999
Intel 30386 address translation
56
1999