Page Allocation strategies in Unix & Linux

Download Report

Transcript Page Allocation strategies in Unix & Linux

Memory Management
Joe O’Pecko
CS-520-A
What is Memory Management?
•
•
The art and the process of coordinating and
controlling the use of memory in a computer
system.
Memory management can be divided into three
areas:
1. Memory management hardware (MMUs, RAM, etc.);
2. Operating system memory management (virtual
memory, protection);
3. Application memory management (allocation,
deallocation, garbage collection).
What is Virtual Memory?
•
An addressing scheme implemented in hardware and software
that allows non-contiguous memory to be addressed as if it is
contiguous. The technique used by all current implementations
provides two major capabilities to the system:
1.
2.
Memory can be addressed that does not currently reside in main
memory and the hardware and OS will load the required memory
from auxiliary storage automatically, without any knowledge of the
program addressing the memory, thus allowing a program to
reference more memory than actually exists in the computer.
In multi tasking systems, total memory isolation, otherwise referred to
as a discrete address space, can be provided to every task except
the lowest level operating system. This greatly increases reliability by
isolating program problems within a specific task and allowing
unrelated tasks to continue to process.
UNIX Memory Management
• UNIX System V and Linux support a
demand paging virtual memory
architecture
– The entire process does not have to reside in
main memory to execute
– The kernel loads pages for a process on
demand when the process references the
pages
• BSD 4.0 was the first major UNIX
implementation of a demand paging policy
Demand Paging – the Good
• Permits greater flexibility in mapping the virtual address space of a
process into the physical memory of a machine
• Allows the size of a process to be greater than the amount of
available physical memory
• Does not load the pages that are never accessed, so saves the
memory for other programs and increases the degree of
multiprogramming.
• Less loading latency at the program startup.
• Less disk overhead because of fewer page reads.
• Ability to run large programs on the machine, even though it does
not have sufficient memory to run the program. This method is better
than an old technique called overlays.
• Does not need extra hardware support than what paging needs,
since protection fault can be used to get page fault.
Demand Paging – the Bad
• Individual programs face extra latency when
they access a page for the first time. So
prepaging, a method of remembering which
pages a process used when it last executed and
preloading few of them, is used to improve
performance.
• Memory management with page replacement
algorithms become slightly more complex.
• Possible security risks, including vulnerability to
timing attacks.
What is memory allocation?
• The process of assigning blocks of memory on
request
• Typically the allocator, i.e. memory manager,
receives memory from the operating system in a
small number of large blocks that it must divide
up to satisfy the request for smaller blocks
• There are typically many ways to perform this,
each with their own strengths and weaknesses
Kernel Memory Allocator
• The Kernel Memory Allocator (KMA) is a
subsystem that tries to satisfy the requests
for memory areas from all parts of the
system.
– Some requests come from other kernel
subsystems needing memory for kernel use.
– Some requests come via system calls from
user programs to increase their processes’
address space.
KMA Goals
• It must be fast. This is crucial since it is invoked
by all kernel subsystems (including interrupt
handlers).
• It should minimize the amount of wasted
memory.
• It should reduce the memory fragmentation
problem.
• It should be able to cooperate with other
memory management subsystems in order to
borrow and release pages from them.
Kernel Memory Allocation
• Linux distinguishes between allocating dynamic
memory for kernel mode processes from user
mode processes
– Kernel functions get dynamic memory in a fairly
straightforward manner
• The kernel is the highest priority component of the OS
• The kernel trusts itself; no need to insert protection code
– User mode processes
• Requests for dynamic memory are nonurgent and can be
deferred since the process may not access additional
memory in a timely fashion
• Nontrusted by kernel, so kernel must be prepared to catch all
addressing errors
Page Frame Management
• The kernel must keep track of the current status
of each page frame
• It must be able to distinguish the page frames
used to contain pages belonging to processes
from those that contain kernel code or kernel
data structures; similarly, it must be able to
determine whether a page frame in dynamic
memory is free or not.
• This state information is kept in a page
descriptor of type page. All page descriptors are
stored in the mem_map array.
Memory Zones
•
•
Ideally, a page frame is a memory storage unit that can be used for
anything: storing kernel and user data, buffering disk data, and so on.
However, real computer architectures have hardware constraints that may
limit the way page frames can be used. In particular, the Linux kernel must
deal with two hardware constraints of the 80 x 86 architecture:
– The Direct Memory Access (DMA) processors for old ISA buses have a strong
limitation: they are able to address only the first 16 MB of RAM.
– In modern 32-bit computers with lots of RAM, the CPU cannot directly access all
physical memory because the linear address space is too small.
•
To cope with these two limitations, Linux 2.6 partitions the physical memory
of every memory node into three zones. In the 80 x 86 UMA architecture the
zones are:
– ZONE_DMA
• Contains page frames of memory below 16 MB
– ZONE_NORMAL
• Contains page frames of memory at and above 16 MB and below 896 MB
– ZONE_HIGHMEM
• Contains page frames of memory at and above 896 MB
Zoned Page Frame Allocator
Kernel subsystem that handles memory allocation requests for groups
of contiguous page frames
The Buddy System Algorithm
• The kernel must establish a robust and efficient strategy
for allocating groups of contiguous page frames.
• Must deal with a well-known memory management
problem call external fragmentation, where frequent
requests and releases of different sized groups of
contiguous page frames lead to a situation in which
small blocks are scattered inside blocks of allocated
page frames.
• Two solutions to avoid external fragmentation:
– Use the paging circuitry to map groups of noncontiguous free
page frames into intervals of contiguous linear addresses.
– Develop a suitable technique to keep track of the existing blocks
of free contiguous page frames, avoiding as much as possible
the need to split up a large free block to satisfy a request for a
smaller one.
The Buddy System Algorithm
(cont.)
• Used by the kernel for allocating groups of
contiguous page frames. (see mm/page_alloc.c)
• All free page frames are grouped into 11 lists of
blocks that contain groups of 1, 2, 4, 8, 16, 32,
64, 128, 256, 512, and 1024 contiguous page
frames
• The physical address of the first page frame of a
block is a multiple of the group size. E.g. the
initial address of a 16-page-frame block is a
multiple of 16 x 2^12
Allocating memory via the Buddy
System Algorithm
• The algorithm for allocating, for example, a block
of 256 contiguous page frames
– First checks for a free block in the 256-page-frame list
– If no free block, it then looks in the 512-page-frame
list for a free block
– If it finds a block, the kernel allocates 256 of the 512
page frames and puts the remaining 256 into the list
of free 256-page-frame blocks.
– If no free 512-page block, kernel looks at the next
larger list (i.e., a free 1024-page-frame block)
allocating it and dividing the block similarly
– If no block can be allocated an error is reported
Reclaiming memory via the Buddy
System Algorithm
• The kernel attempts to merge pairs of free buddy
blocks of size b together into a single block of
size 2b. Two blocks are considered buddies if:
– Both have the same size.
– They are located in contiguous physical addresses.
– The physical address of the first page frame of the
first block is a multiple of 2 x b x 2^12.
• This is an iterative algorithm; if it successfully
merges released blocks, b is doubled and bigger
blocks are attempted.
Buddy System Example
64K
64K
64K
64K
64K
64K
64K
64K
64K
64K
64K
128K
256K
64K
64K
128K
256K
64K
64K
128K
256K
64K
64K
128K
128K
128K
512K
64K
64K
128K
128K
128K
512K
128K
128K
128K
512K
128K
128K
512K
1024K
128K
256K
1024K
512K
512K
64K
64K
64K
64K
64K
64K
64K
Buddy System Data Structures
Linux 2.6 uses a different buddy system for
each zone. Each one relies on the
following main data structures:
• mem_map array – contains all the page
frame descriptors on the system
• An array having 11 elements of type
free_area, one element for each group
size.
/proc/buddyinfo
• Used primarily for diagnosing memory fragmentation issues. Using
the buddy algorithm, each column represents the number of pages
of a certain order (a certain size) that are available at any given
time.
• The DMA row references the first 16 MB on a system, the HighMem
row references all memory greater than 4 GB on a system, and the
Normal row references all memory in between.
Node 0, zone
DMA 90
Node 0, zone
Normal 1650
Node 0, zone
HighMem 2
6
2
1
1
…
310
5
0
0
…
0
0
1
1
…
Memory Area Management
• A memory area is a sequence of memory cells having contiguous
physical addresses and an arbitrary length.
• The buddy system algorithm adopts the page frame as the basic
memory area. This is fine for dealing with relatively large memory
requests, but what about requests for small memory areas, say a
few tens or hundreds of bytes?
• It would be wasteful to allocate a full page frame to store a few
bytes. A better approach instead consists of introducing new data
structures that describe how small memory areas are allocated
within the same page frame. In doing so, we introduce a new
problem called internal fragmentation. It is caused by a mismatch
between the size of the memory request and the size of the memory
area allocated to satisfy the request.
Memory Area Management
early Linux versions
• Power of 2 allocator
– Provides memory areas whose sizes are geometrically
distributed; in other words, the size depends on a power of 2
rather than on the size of the data to be stored. In this way, no
matter what the memory request size is, ensures that the internal
fragmentation is always smaller than 50 percent. Following this
approach, the kernel creates 13 geometrically distributed lists of
free memory areas whose sizes range from 32 to 131, 072
bytes.
– The buddy system is invoked both to obtain additional page
frames needed to store new memory areas and, conversely, to
release page frames that no longer contain memory areas. A
dynamic list is used to keep track of the free memory areas
contained in each page frame.
– Running a memory area allocation algorithm on top of the buddy
system was not efficient.
Slab Allocator
• First adopted in Sun Microsystems Solaris 2.4 Operating
System.
• Views the memory areas as objects
• Does not discard objects which have been allocated and
subsequently released, but instead saves them in
memory in order to service new requests since kernel
functions tend to request memory areas of the same
type repeatedly
• Avoids internal fragmentation by classifying requests for
memory areas according to frequency.
• Groups objects into caches, which store objects of the
same type. E.g. when a file is opened, the memory area
needed to store the corresponding “open file” object is
taken from a slab allocator name filp (for “file pointer”)
Slab Allocator
• Performs the following functions
– Allocate memory
– Initialize objects/structures
– Use objects/structures
– Deconstruct objects/structures
– Free memory
• /proc/slabinfo – gives full information about
memory usage on the slab level. (see also
/usr/bin/slabtop)
Alternative KMAs
•
•
•
•
•
•
•
Resource Map Allocator
Simple Power-of-Two Free Lists
The McKusick-Karels Allocator
The Buddy System
SVR4 Lazy Buddy Allocator
Mach-OSF/1 Zone Allocator
Solaris Slab Allocator
References
• Bach, Maurice J. The Design of the Unix Operating
System. Prentice Hall, 1994
• Bovet, Daniel P. and Cesati, Marco. Understanding the
Linux Kernel 3rd Edition. O’Reilly, 2005
• Knuth, Donald E. The Art of Computer Programming
Volume 1. Addison Wesley, 1997
• http://en.wikipedia.org/wiki/Buddy_memory_allocation
• http://en.wikipedia.org/wiki/Demand_paging
• http://en.wikipedia.org/wiki/Virtual_memory
• http://www.redhat.com/docs/manuals/enterprise/RHEL-4Manual/ref-guide/s1-proc-topfiles.html
• http://www.usenix.org/publications/library/proceedings/bo
s94/full_papers/bonwick.a