Memory Management
Background
• Memory consists of a large array of words or bytes, each with its own address. The CPU fetches instructions from memory according to the value of the program counter. These instructions may cause additional loading from and storing to specific memory addresses.
• The memory unit sees only a stream of memory addresses; it does not know how they are generated.
• A program must be brought into memory and placed within a process for it to be run.
• Input queue – collection of processes on the disk that are waiting to be brought into memory for execution.
• User programs go through several steps before being run.
Multistep Processing of a User Program
Binding of Instructions and Data to Memory
Address binding of instructions and data to memory addresses can
happen at three different stages.
• Compile time: If the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes. Example: .COM-format programs in MS-DOS.
• Load time: Relocatable code must be generated if the memory location is not known at compile time.
• Execution time: Binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Hardware support for address maps (e.g., relocation registers) is needed.
Logical vs. Physical Address Space
• The concept of a logical address space that is bound to a
separate physical address space is central to proper memory
management.
– Logical address – address generated by the CPU; also
referred to as virtual address.
– Physical address – address seen by the memory unit.
• The set of all logical addresses generated by a program is a
logical address space; the set of all physical addresses
corresponding to these logical addresses is a physical
address space.
• Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme.
Memory-Management Unit (MMU)
• Hardware device that maps virtual address to
physical address.
• In a simple MMU scheme, the value in the
relocation register is added to every address
generated by a user process at the time it is sent
to memory.
• The user program deals with logical addresses;
it never sees the real physical addresses.
Dynamic relocation using a relocation register
Dynamic Loading
• A routine is not loaded until it is called.
• Better memory-space utilization; unused routine
is never loaded.
• Useful when large amounts of code are needed
to handle infrequently occurring cases.
• No special support from the operating system is
required.
• Implemented through program design.
Dynamic Linking
• Linking is postponed until execution time.
• Small piece of code, stub, is used to locate the
appropriate memory-resident library routine, or
to load the library if the routine is not already
present.
• Stub replaces itself with the address of the
routine, and executes the routine.
• The operating system is needed to check whether the routine is in the process's memory address space.
• Dynamic linking is particularly useful for libraries.
Swapping
• A process can be swapped temporarily out of memory to a backing
store, and then brought back into memory for continued execution.
• Backing store – fast disk large enough to accommodate copies of all
memory images for all users; must provide direct access to these
memory images.
• Roll out, roll in – swapping variant used for priority-based scheduling
algorithms; lower-priority process is swapped out so higher-priority
process can be loaded and executed.
• Major part of swap time is transfer time; total transfer time is directly
proportional to the amount of memory swapped.
• Modified versions of swapping are found on many systems (i.e.,
UNIX, Linux, and Windows).
Schematic View of Swapping
Contiguous Allocation
• Main memory is usually divided into two partitions:
– Resident operating system, usually held in low memory with the interrupt vector
– User processes, held in high memory
• Single-partition allocation
– A relocation-register scheme is used to protect user processes from each other, and from changes to operating-system code and data.
– The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses. Each logical address must be less than the limit register.
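The relocation-register scheme above can be sketched in a few lines of Python; the register values here are illustrative assumptions, not figures from the slides.

```python
# Hedged sketch of dynamic relocation: every logical address is first
# checked against the limit register, then offset by the relocation register.
RELOCATION_REGISTER = 14000   # smallest physical address of the partition
LIMIT_REGISTER = 3000         # range of legal logical addresses

def mmu_translate(logical_address):
    """Return the physical address, or trap on an out-of-range access."""
    if logical_address < 0 or logical_address >= LIMIT_REGISTER:
        raise MemoryError("trap: addressing error (beyond limit register)")
    return logical_address + RELOCATION_REGISTER
```

With these values, logical address 346 maps to physical address 14346, while an access at or beyond 3000 traps to the operating system.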
HW support for relocation and limit registers
Memory Allocation
How to satisfy a request of size n from a list of free blocks
• First-fit: Allocate the first block that is big enough
• Best-fit: Allocate the smallest block that is big
enough; must search entire list, unless ordered by
size. Produces the smallest leftover block.
• Worst-fit: Allocate the largest block; must also
search entire list. Produces the largest leftover
block.
First-fit and best-fit are better than worst-fit in terms of speed and storage utilization.
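The three placement strategies can be sketched as follows, assuming the free list is simply a Python list of hole sizes (the sizes used below are hypothetical):

```python
def first_fit(holes, n):
    """Return the index of the first hole big enough for a request of size n."""
    for i, size in enumerate(holes):
        if size >= n:
            return i
    return None  # no hole can satisfy the request

def best_fit(holes, n):
    """Return the index of the smallest hole that is big enough."""
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None

def worst_fit(holes, n):
    """Return the index of the largest hole that is big enough."""
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None
```

For example, with holes of sizes [100, 500, 200, 300, 600] and a request of size 212, first-fit picks the 500 hole, best-fit the 300 hole (smallest leftover), and worst-fit the 600 hole (largest leftover).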
Fragmentation
• External Fragmentation – total memory space
exists to satisfy a request, but it is not
contiguous.
• Internal Fragmentation – allocated memory
may be slightly larger than requested memory;
this size difference is memory internal to a
partition, but not being used.
• Reduce external fragmentation by compaction
– Shuffle memory contents to place all free memory
together in one large block.
– Compaction is possible only if relocation is dynamic and is done at execution time.
Paging
• Logical address space of a process can be
noncontiguous; process is allocated physical memory
whenever the latter is available.
• Divide physical memory into fixed-sized blocks called
frames (size is power of 2, for example 512 bytes).
• Divide logical memory into blocks of same size called
pages.
• Keep track of all free frames.
• To run a program of size n pages, need to find n free
frames and load program.
• Set up a page table to translate logical to physical
addresses.
• Internal fragmentation may occur.
Address Translation Scheme
• Address generated by CPU is divided into:
– Page number (p) – used as an index into a
page table which contains base address of
each page in physical memory.
– Page offset (d) – combined with base address
to define the physical memory address that is
sent to the memory unit.
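Because the page size is a power of 2, splitting an address into p and d is just a division and a remainder (or a shift and a mask). A minimal sketch using the 4-byte pages of the example slide; the page-table contents below are illustrative:

```python
PAGE_SIZE = 4  # bytes; must be a power of 2

def split_address(logical_address):
    """Split a logical address into (page number p, page offset d)."""
    return logical_address // PAGE_SIZE, logical_address % PAGE_SIZE

def translate(logical_address, page_table):
    """Map a logical address to a physical address via the page table."""
    p, d = split_address(logical_address)
    frame = page_table[p]          # frame holding page p in physical memory
    return frame * PAGE_SIZE + d   # combine frame base with the offset
```

With the page table [5, 6, 1, 2], logical address 13 splits into page 3, offset 1, and maps to physical address 2 × 4 + 1 = 9.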
Address Translation Architecture
Paging Example
Paging Example (page size: 4 bytes)
Free Frames
Before allocation
After allocation
Hardware Support
• Most OSes allocate a page table for each process. A pointer to the page table is stored with the other register values in the PCB.
• When the dispatcher starts a process, it must reload the user registers and define the correct hardware page-table values from the stored user page table.
• Hardware implementation can be done in these ways:
– Set of dedicated registers – built with high-speed logic to make page-address translation efficient
– Page table kept in main memory – a page-table base register (PTBR) points to the page table. (In this scheme every data/instruction-byte access requires two memory accesses: one for the page-table entry and one for the byte.)
Hardware Support
• The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative registers, associative memory, or translation look-aside buffers (TLBs).
• A TLB entry consists of two parts: a key and a value. An item to be searched is compared with all keys simultaneously. If the item is located, the corresponding value is returned.
• Fast but expensive.
• Typically, the number of entries in a TLB is between 64 and 1,024.
Associative Memory
• Associative memory – parallel search over (page #, frame #) pairs.
• Address translation for page P:
– If P is in an associative register, get the frame # out.
– Otherwise get the frame # from the page table in memory.
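The lookup path can be sketched in software; a real TLB compares all keys in parallel in hardware, so the dict below merely stands in for the associative search, and the FIFO eviction policy is an assumption for the sketch.

```python
class TLB:
    """Software stand-in for a small associative translation cache."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}             # key: page number -> value: frame number

    def lookup(self, page):
        return self.entries.get(page)  # frame number on a hit, None on a miss

    def insert(self, page, frame):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest entry
        self.entries[page] = frame

def translate(page, tlb, page_table):
    """Try the TLB first; on a miss, walk the in-memory page table."""
    frame = tlb.lookup(page)
    if frame is None:                  # TLB miss: costs an extra memory access
        frame = page_table[page]
        tlb.insert(page, frame)        # cache the translation for next time
    return frame
```

A second reference to the same page then hits in the TLB and skips the page-table access entirely.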
Paging Hardware With TLB
• Some TLBs store address-space identifiers (ASIDs) in each TLB entry. An ASID uniquely identifies each process and is used to provide address-space protection for that process.
• When the TLB attempts to resolve virtual page numbers, it ensures that the ASID of the currently running process matches the ASID associated with the virtual page.
• If the ASIDs do not match, the attempt is treated as a TLB miss.
• ASIDs allow the TLB to contain entries for several processes simultaneously.
Segmentation
• Memory-management scheme that supports the user view of memory.
• A program is a collection of segments. Each segment has a name and a length. Segment addresses specify both the segment name and the offset within the segment.
• A segment is a logical unit such as:
– main program, procedure,
– function, method,
– object,
– local variables, global variables,
– common block,
– stack,
– symbol table, arrays
User’s View of a Program
Logical View of Segmentation
(figure: segments 1–4 of the user space mapped to noncontiguous regions of the physical memory space)
Segmentation Architecture
• A logical address consists of a two-tuple:
<segment-number, offset>
• Segment table – maps two-dimensional logical addresses into one-dimensional physical addresses; each table entry has:
– base – contains the starting physical address where the segment resides in memory.
– limit – specifies the length of the segment.
• Segment-table base register (STBR) points to the segment table's location in memory.
• Segment-table length register (STLR) indicates the number of segments used by a program; segment number s is legal if s < STLR.
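The base/limit check performed by the segmentation hardware can be sketched as follows; the segment-table contents here are illustrative assumptions:

```python
# Each entry is (base, limit); the values below are made-up examples.
SEGMENT_TABLE = [(1400, 1000), (6300, 400), (4300, 400)]

def translate(segment, offset):
    """Map <segment-number, offset> to a physical address, trapping on errors."""
    if segment >= len(SEGMENT_TABLE):   # segment number must be < STLR
        raise MemoryError("trap: invalid segment number")
    base, limit = SEGMENT_TABLE[segment]
    if offset >= limit:                 # offset must fall within the segment
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset
```

With this table, a reference to byte 53 of segment 2 maps to 4300 + 53 = 4353, while a reference past a segment's limit traps to the operating system.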
Segmentation Hardware
Example of Segmentation
Sharing of Segments
Segmentation with Paging
• Both paging and segmentation have their
advantages and disadvantages.
• Problems of external fragmentation and lengthy
search times can be solved by paging the
segments.
• Solution differs from pure segmentation in that
the segment-table entry contains not the base
address of the segment, but rather the base
address of a page table for this segment.
Virtual-Memory Management
Background
• Virtual memory – separation of user logical memory from physical memory. Allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.
– Only part of the program needs to be in memory for execution.
– Logical address space can therefore be much larger than physical address space.
– Allows address spaces to be shared by several processes.
– Allows for more efficient process creation.
• Virtual memory can be implemented via:
– Demand paging
– Demand segmentation
Virtual Memory That Is Larger Than Physical Memory
Virtual-address Space
Shared Library Using Virtual Memory
Demand Paging
• Technique of bringing a page into memory only when it is needed; used in virtual-memory systems.
• The pager brings only the required pages, rather than the whole process, into main memory.
• Benefits:
– Less I/O needed
– Less memory needed
– Faster response
– More users
• Page is needed ⇒ reference to it
– invalid reference ⇒ abort
– not in memory ⇒ bring to memory
Transfer of a Paged Memory to
Contiguous Disk Space
• To distinguish between the pages that are in memory and the pages that are on disk, a valid–invalid bit is used.
• This bit is set to "valid" if the page is both legal and in memory.
• This bit is set to "invalid" if the page is either not valid (not in the logical address space of the process) or is valid but not in main memory.
• While the process executes and accesses pages that are memory resident, execution proceeds normally.
• If the process tries to access a page that is not in memory, the access to the page marked invalid causes a page-fault trap, the result of the OS's failure to bring the desired page into memory.
Page Table When Some Pages
Are Not in Main Memory
Procedure for handling a page fault
1. Check the page table (in the PCB) for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, the process is terminated. If it was valid, but the page has not been brought in, it is paged in.
3. A free frame is located.
4. A disk operation is initiated to read the desired page into the newly allocated frame.
5. On completion of the disk read, the page table of the process is modified to indicate that the page is now in memory.
6. The instruction that was trapped is restarted. The process can now access the page as though it had always been there.
Steps in Handling a Page Fault
• In the extreme case, a process starts executing with no pages in memory.
• The OS sets the instruction pointer to the first instruction of the process, which is on a non-memory-resident page; the process immediately faults for the page.
• After this page is brought into memory, the process continues to execute, faulting as necessary until every page it needs is in memory.
• When all the required pages are in memory, the process executes with no further faults. This scheme is called pure demand paging – never bring a page into memory until it is needed.
• Hardware support:
– Page table
– Secondary memory – to hold pages not currently in main memory (swap space)
Performance of Demand Paging
• Page-fault rate p: 0 ≤ p ≤ 1.0
– if p = 0, no page faults
– if p = 1, every reference is a fault
• Effective Access Time (EAT):
EAT = (1 – p) × memory access time + p × page-fault time
where
page-fault time = page-fault overhead + [swap page out] + swap page in + restart overhead
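The EAT formula translates directly into code; the 200 ns access time and 8 ms fault-service time used as defaults below are the figures from the worked example on the later slide.

```python
def effective_access_time(p, memory_access_ns=200, page_fault_ns=8_000_000):
    """EAT = (1 - p) * memory access time + p * page-fault service time."""
    return (1 - p) * memory_access_ns + p * page_fault_ns
```

With p = 0.001 (one fault per 1,000 references), EAT = 8,199.8 ns, roughly 40 times slower than a fault-free memory access.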
A page fault causes the following sequence to occur:
• Trap to the OS.
• Save the user registers and process state.
• Determine that the interrupt was a page fault.
• Check that the page reference was legal and determine the location of the page on disk.
• Issue a read from the disk to a free frame:
– Wait in the queue for this device until the read request is serviced.
– Wait for the device seek and/or latency time.
– Begin the transfer of the page to a free frame.
• While waiting, allocate the CPU to another process.
• Receive an interrupt from the disk I/O subsystem.
• Save the registers and process state of the other user.
• Determine that the interrupt was from the disk.
• Correct the page table to show that the page is now in memory.
• Wait for the CPU to be allocated to this process again.
• Restore the user registers, process state, and new page table, and then resume the interrupted instruction.
Example to calculate EAT
Average page-fault service time = 8 milliseconds
Memory access time = 200 nanoseconds
Effective Access Time = (1 – p) × 200 + p × 8,000,000
= 200 + 7,999,800 × p
EAT is directly proportional to the page-fault rate. If p = 1 out of 1,000, then
EAT = 200 + 7,999,800 × 1/1,000 = 8,199.8 nanoseconds ≈ 8.2 microseconds
If we want performance to be degraded by no more than 10%:
220 > 200 + 7,999,800 × p
20 > 7,999,800 × p
p < 0.0000025
It is important to keep the page-fault rate low in order to keep the effective access time low.
Page Replacement
• Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement.
• Use a modify (dirty) bit to reduce the overhead of page transfers – only modified pages are written back to disk.
• Page replacement completes the separation between logical memory and physical memory – a large virtual memory can be provided on a smaller physical memory.
Need For Page Replacement
Basic Page Replacement
1. Find the location of the desired page on disk.
2. Find a free frame:
– If there is a free frame, use it.
– If there is no free frame, use a page-replacement algorithm to select a victim frame.
– Write the victim frame to disk; change the page and frame tables accordingly.
3. Read the desired page into the (newly) free frame. Update the page and frame tables.
4. Restart the process.
To evaluate a page-replacement algorithm, a reference string is used, which is a string of memory references.
Page Replacement
Graph of Page Faults Versus The
Number of Frames
FIFO Page Replacement
• This algorithm associates with each page the time that the page was brought into memory.
• When a page has to be replaced, the oldest page is chosen. This can be implemented with a FIFO queue: the new page that is brought in is inserted at the tail of the queue.
FIFO Page Replacement
56
• Easy to understand and program, but performance is not always good.
• If the page selected for replacement is not in active use, everything still works fine. If an active page is replaced with a new one, a fault occurs almost immediately to retrieve the active page.
• A bad replacement choice increases the page-fault rate and slows down process execution.
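A minimal sketch of FIFO replacement that counts faults; running it on the classic reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 shows Belady's anomaly, with faults increasing from 9 to 10 as the frame count grows from 3 to 4.

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    resident = set()          # pages currently in memory
    queue = deque()           # arrival order: head of the queue is the oldest
    faults = 0
    for page in reference_string:
        if page in resident:
            continue          # hit: FIFO ignores references to resident pages
        faults += 1
        if len(resident) == num_frames:
            resident.discard(queue.popleft())   # evict the oldest page
        resident.add(page)
        queue.append(page)
    return faults
```

Counterintuitively, giving this reference string more frames produces more faults, which is exactly the anomaly the next figure illustrates.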
FIFO Illustrating Belady’s
Anomaly
Optimal Page Replacement
• Replace the page that will not be used for the longest period of time.
• Guarantees the lowest possible page-fault rate for a fixed number of frames.
• Better than FIFO page replacement.
• Difficult to implement, as it requires future knowledge of the reference string.
Optimal Page Replacement
Least Recently Used (LRU) Page Replacement
• This algorithm associates with each page the time of that page's last use. When a page must be replaced, LRU chooses the page that has not been used for the longest period of time.
LRU Page Replacement
• Good performance, but difficult to implement; requires substantial hardware assistance.
• Two implementations are feasible:
– Counters – associate with each page-table entry a time-of-use field, and add a logical clock or counter to the CPU. Whenever a reference to a page is made, the contents of the clock register are copied to the time-of-use field in the page-table entry for that page.
– Stack implementation (to record the most recent page references) – keep a stack of page numbers in doubly linked form:
• When a page is referenced, move it to the top.
• Using a doubly linked list requires 6 pointers to be changed.
• No search is needed for replacement – the victim is at the bottom of the stack.
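The stack idea can be sketched with an ordinary Python list standing in for the doubly linked stack (most recently used page at the end):

```python
def lru_faults(reference_string, num_frames):
    """Count page faults under LRU, keeping pages ordered by recency of use."""
    stack = []                # stack[0] is least recently used, stack[-1] most
    faults = 0
    for page in reference_string:
        if page in stack:
            stack.remove(page)        # referenced: move it to the top
        else:
            faults += 1
            if len(stack) == num_frames:
                stack.pop(0)          # victim is the least recently used page
        stack.append(page)
    return faults
```

On the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with 3 frames, LRU incurs 12 faults.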
Use Of A Stack to Record The Most
Recent Page References
LRU Approximation Algorithms
• Reference bit
– With each page associate a bit, initially = 0.
– When the page is referenced, the bit is set to 1.
– Replace a page whose bit is 0 (if one exists). We do not know the order, however.
• Second chance
– Needs a reference bit.
– FIFO replacement.
– If the page to be replaced (in clock order) has reference bit = 1, then:
• set the reference bit to 0
• leave the page in memory
• replace the next page (in clock order), subject to the same rules
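A sketch of the clock form of second chance; the frames form a circular buffer swept by a "hand" pointer. Starting a newly loaded page with its reference bit set is a design choice assumed here, not mandated by the algorithm.

```python
def clock_faults(reference_string, num_frames):
    """Count page faults under the second-chance (clock) algorithm."""
    pages = [None] * num_frames   # circular buffer of resident pages
    ref_bit = [0] * num_frames
    hand = 0                      # clock hand: next candidate for replacement
    faults = 0
    for page in reference_string:
        if page in pages:
            ref_bit[pages.index(page)] = 1   # referenced: earn a second chance
            continue
        faults += 1
        while ref_bit[hand] == 1:            # spare pages whose bit is set
            ref_bit[hand] = 0                # ...but clear the bit in passing
            hand = (hand + 1) % num_frames
        pages[hand] = page                   # bit is 0: replace this victim
        ref_bit[hand] = 1                    # newly loaded page was referenced
        hand = (hand + 1) % num_frames
    return faults
```

When every resident page has its bit set, the hand clears all the bits and degenerates to plain FIFO, as the slide notes imply.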
Second-Chance (Clock) Page-Replacement Algorithm
Counting Algorithms
• Keep a counter of the number of references that have been made to each page.
• Least Frequently Used (LFU) page-replacement algorithm: replaces the page with the smallest count, since an actively used page will have a large count value.
• Most Frequently Used (MFU) page-replacement algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
Allocation of Frames
• Each process needs a minimum number of pages.
• Example: Consider a single-user system with 128 KB of memory and pages of 1 KB each. This system has 128 frames. The OS may take 35 KB, leaving 93 frames for the user process. Under pure demand paging, all 93 frames would initially be put on the free-frame list. When a user process started execution, it would generate a sequence of page faults. The first 93 page faults would all get frames from the free-frame list. On exhaustion of the free-frame list, a page-replacement algorithm would select a page to be replaced.
• Two major allocation schemes:
– fixed allocation
– priority allocation
Fixed Allocation
• Equal allocation – split m frames among n processes to give everyone an equal share, m/n frames. The leftover frames can be used as a free-frame buffer pool.
• For example, if there are 100 frames and 5 processes, give each process 20 frames.
• Proportional allocation – allocate according to the size of the process:
s_i = size of process p_i
S = Σ s_i
m = total number of frames
a_i = allocation for p_i = (s_i / S) × m
Example: m = 64, s_1 = 10, s_2 = 127
a_1 = (10 / 137) × 64 ≈ 5
a_2 = (127 / 137) × 64 ≈ 59
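The proportional-allocation formula a_i = (s_i / S) × m translates into a one-line sketch; rounding each share to the nearest frame is an assumption here (truncating and keeping the remainder in a free pool is equally common):

```python
def proportional_allocation(sizes, total_frames):
    """Allocate frames in proportion to process size: a_i = (s_i / S) * m."""
    S = sum(sizes)
    return [round(s * total_frames / S) for s in sizes]
```

With m = 64 and processes of sizes 10 and 127, the shares come out to 5 and 59 frames respectively.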
Priority Allocation
• Use a proportional allocation scheme based on priorities rather than size.
• If process P_i generates a page fault:
– select for replacement one of its own frames, or
– select for replacement a frame from a process with a lower priority number.
Global vs. Local Allocation
• Global replacement – a process selects a replacement frame from the set of all frames; one process can take a frame from another.
• Local replacement – each process selects from only its own set of allocated frames.
Thrashing
• If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
– low CPU utilization
– the operating system thinking that it needs to increase the degree of multiprogramming
– another process being added to the system
• Thrashing ≡ a process is busy swapping pages in and out rather than executing.
Cause of Thrashing
• The OS monitors CPU utilization. If CPU utilization is too low, the OS increases the degree of multiprogramming by introducing more processes.
• A global page-replacement algorithm replaces pages without regard to the processes to which they belong.
• A process enters a new phase of execution and requires more pages. It faults and starts grabbing pages from other processes.
• These processes need their pages too, so they fault and take away pages from still other processes.
• These faulting processes must use the paging device to swap pages in and out.
• As they queue up for the paging device, the ready queue empties; the resulting low CPU utilization leads the OS to increase the degree of multiprogramming by adding more processes. This cycle continues, increasing thrashing and increasing effective access time (EAT).
Thrashing (Cont.)
• Thrashing effects can be reduced by using a local replacement algorithm: if one process starts thrashing, it cannot steal frames from other processes and cause them to thrash as well.
• The thrashing processes will still spend most of their time in the queue for the paging device, so the average service time for a page fault increases, resulting in an increase in effective access time (EAT).
• To prevent this, each process must be provided with as many frames as it needs. Two methods used are:
– Working-set model
– Page-fault frequency
Working-Set Model
• Δ ≡ working-set window ≡ a fixed number of page references. If a page is in active use, it will be in the working set; if it is no longer being used, it will drop from the working set.
• WSS_i (working-set size of process P_i) = total number of distinct pages referenced in the most recent Δ references (varies in time)
– if Δ is too small, it will not encompass the entire locality
– if Δ is too large, it will encompass several localities
– if Δ = ∞, it will encompass the entire program
• D = Σ WSS_i ≡ total demand for frames
• if D > m ⇒ thrashing
• Policy: if D > m, then suspend one of the processes
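Computing a working set and the total demand D from reference strings can be sketched directly; the window size Δ and the reference strings below are illustrative.

```python
def working_set(reference_string, t, delta):
    """Pages referenced in the window of delta references ending at time t."""
    start = max(0, t - delta + 1)
    return set(reference_string[start:t + 1])

def total_demand(ref_strings, t, delta):
    """D = sum of working-set sizes over all processes; D > m means thrashing."""
    return sum(len(working_set(rs, t, delta)) for rs in ref_strings)
```

For the reference string 1, 2, 1, 5, 7, 7, 7, 7, 5, 1 with Δ = 4, the working set at t = 9 is {7, 5, 1}, so WSS = 3 at that instant.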
Working-set model
Page-Fault Frequency Scheme
• Establish an "acceptable" page-fault rate.
– If the actual rate is too low, the process loses a frame.
– If the actual rate is too high, the process gains a frame.