Transcript Slide 1

Memory Management
Computer programs must be in main memory (also called random-access memory or RAM) to be executed. Main memory is the
only large storage area (millions to billions of bytes) that the processor can access directly. It commonly is implemented in a
semiconductor technology called dynamic random-access memory (DRAM), which forms an array of memory words. Each word
has its own address. Interaction is achieved through a sequence of load or store instructions to specific memory addresses. The
load instruction moves a word from main memory to an internal register within the CPU, whereas the store instruction moves the
content of a register to main memory. Aside from explicit loads and stores, the CPU automatically loads instructions from main
memory for execution.
A typical instruction-execution cycle, as executed on a system with a von Neumann architecture, first fetches an instruction
from memory and stores that instruction in the instruction register. The instruction is then decoded and may cause operands to
be fetched from memory and stored in some internal register. After the instruction has been executed on the operands, the result
may be stored back in memory. Notice that the memory unit sees only a stream of memory addresses; it does not know how
they are generated (by the instruction counter, indexing, indirection, literal addresses, or some other means) or what they are for
(instructions or data). Accordingly, we can ignore how a memory address is generated by a program. We are interested only in
the sequence of memory addresses generated by the running program.
Ideally, we want the programs and data to reside in main memory permanently. This arrangement usually is not possible for the
following two reasons:
1. Main memory is usually too small to store all needed programs and data permanently.
2. Main memory is a volatile storage device that loses its contents when power is turned off or otherwise lost.
Thus, most computer systems provide secondary storage as an extension of main memory. The main requirement for secondary
storage is that it be able to hold large quantities of data permanently.
The most common secondary-storage device is a magnetic disk, which provides storage for both programs and data. Most
programs (web browsers, compilers, word processors, spreadsheets, and so on) are stored on a disk until they are loaded into
memory. Many programs then use the disk as both a source and a destination of the information for their processing. Hence, the
proper management of disk storage is of central importance to a computer system.
In a larger sense, however, the storage structure that we have described, consisting of registers, main memory, and magnetic
disks, is only one of many possible storage systems. Others include cache memory, CD-ROM, magnetic tapes, and so on. Each
storage system provides the basic functions of storing a datum and of holding that datum until it is retrieved at a later
time. The main differences among the various storage systems lie in speed, cost, size, and volatility.
The wide variety of storage systems in a computer system can be organized in a hierarchy according to speed and cost. The
higher levels are expensive, but they are fast.
As we move down the hierarchy, the cost per bit generally decreases, whereas the access time generally increases. This
trade-off is reasonable; if a given storage system were both faster and less expensive than another (other properties being the
same), then there would be no reason to use the slower, more expensive memory. In fact, many early storage devices, including
paper tape and core memories, are relegated to museums now that magnetic tape and semiconductor memory have become
faster and cheaper. The top four levels of memory in the figure above may be constructed using semiconductor memory.
In addition to differing in speed and cost, the various storage systems are either volatile or nonvolatile. As mentioned earlier,
volatile storage loses its contents when the power to the device is removed. In the absence of expensive battery and generator
backup systems, data must be written to nonvolatile storage for safekeeping. In the hierarchy shown in the figure above, the
storage systems above the electronic disk are volatile, whereas those below are nonvolatile. An electronic disk can be designed
to be either volatile or nonvolatile. During normal operation, the electronic disk stores data in a large DRAM array, which is
volatile. But many electronic-disk devices contain a hidden magnetic hard disk and a battery for backup power. If external
power is interrupted, the electronic-disk controller copies the data from RAM to the magnetic disk. When external power is
restored, the controller copies the data back into the RAM. Another form of electronic disk is flash memory, which is popular in
cameras and personal digital assistants (PDAs), in robots, and increasingly as removable storage on general-purpose
computers. Flash memory is slower than DRAM but needs no power to retain its contents. Another form of nonvolatile storage
is NVRAM, which is DRAM with battery backup power. This memory can be as fast as DRAM but has a limited duration in
which it is nonvolatile.
The design of a complete memory system must balance all the factors just discussed: It must use only as much expensive
memory as necessary while providing as much inexpensive, nonvolatile memory as possible. Caches can be installed to
improve performance where a large access-time or transfer-rate disparity exists between two components.
Caching is an important principle of computer systems. Information is normally kept in some storage system (such as main
memory). As it is used, it is copied into a faster storage system (the cache) on a temporary basis. When we need a particular
piece of information, we first check whether it is in the cache. If it is, we use the information directly from the cache; if it is not,
we use the information from the source, putting a copy in the cache under the assumption that we will need it again soon.
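The lookup-then-fill pattern described above can be written out in a few lines. The sketch below is purely illustrative: it models the slow storage system as an array and uses a tiny direct-mapped software cache with assumed names and sizes, not any particular hardware design.

    /* Minimal sketch of the caching principle: check the fast store first,
     * fall back to the slow store on a miss, and remember the result.
     * All names and sizes here are illustrative. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define STORE_WORDS 4096
    #define CACHE_LINES 64

    static uint32_t backing_store[STORE_WORDS];          /* the slow storage system */

    static struct { bool valid; size_t addr; uint32_t value; } cache[CACHE_LINES];

    uint32_t read_word(size_t addr)
    {
        size_t line = addr % CACHE_LINES;                 /* where a copy would live  */
        if (cache[line].valid && cache[line].addr == addr)
            return cache[line].value;                     /* hit: use the cached copy */

        uint32_t value = backing_store[addr];             /* miss: go to the source   */
        cache[line].valid = true;                         /* keep a copy, expecting   */
        cache[line].addr  = addr;                         /* to need it again soon    */
        cache[line].value = value;
        return value;
    }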
In addition, internal programmable registers, such as index registers, provide a high-speed cache for main memory. The
programmer (or compiler) implements the register-allocation and register-replacement algorithms to decide which information to
keep in registers and which to keep in main memory. There are also caches that are implemented totally in hardware. For
instance, most systems have an instruction cache to hold the next instructions expected to be executed. Without this cache, the
CPU would have to wait several cycles while an instruction was fetched from main memory. For similar reasons, most systems
have one or more high-speed data caches in the memory hierarchy. We are not concerned with these hardware-only caches in
this text, since they are outside the control of the operating system.
Main Memory
We begin our discussion by covering several issues that are pertinent to the various techniques for managing memory. This
includes an overview of basic hardware issues, the binding of symbolic memory addresses to actual physical addresses, and
distinguishing between logical and physical addresses. We conclude with a discussion of dynamically loading and linking code
and shared libraries. Main memory and the registers built into the processor itself are the only storage that the CPU can access
directly. There are machine instructions that take memory addresses as arguments, but none that take disk addresses.
Therefore, any instructions in execution, and any data being used by the instructions, must be in one of these direct-access
storage devices. If the data are not in memory, they must be moved there before the CPU can operate on them.
Registers that are built into the CPU are generally accessible within one cycle of the CPU clock. Most CPUs can decode
instructions and perform simple operations on register contents at the rate of one or more operations per clock tick. The same
cannot be said of main memory, which is accessed via a transaction on the memory bus. Memory access may take many cycles
of the CPU clock to complete, in which case the processor normally needs to stall, since it does not have the data required to
complete the instruction that it is executing. This situation is intolerable because of the frequency of memory accesses. The
remedy is to add fast memory between the CPU and main memory. A memory buffer used to accommodate such a speed
differential is called a cache.
Basic Hardware
Not only are we concerned with the relative speed of accessing physical memory, but we must also ensure correct operation. We
have to protect the operating system from access by user processes and, in addition, protect user processes from one another.
This protection must be provided by the hardware. It can be implemented in several ways, as we shall see throughout the
chapter. In this section, we outline one possible implementation.
We first need to make sure that each process has a separate memory space. To do this, we need the ability to determine the
range of legal addresses that the process may access and to ensure that the process can access only these legal addresses.
We can provide this protection by using two registers, usually a base and a limit, as illustrated in Figure 1. The base register
holds the smallest legal physical memory address; the limit register specifies the size of the range. For example, if the base
register holds 300040 and the limit register holds 120900, then the program can legally access all addresses from 300040 through
420939 (inclusive).
Figure 1. A base and a limit register define a logical address space.
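The legality test that the hardware applies can be summarized as a single comparison. The sketch below is illustrative C, not a description of the actual circuit, and it simply plugs in the numbers from the example above.

    /* Sketch of the base/limit check from the example above (illustrative only). */
    #include <stdbool.h>
    #include <stdint.h>

    static const uint32_t base_reg  = 300040;   /* smallest legal physical address */
    static const uint32_t limit_reg = 120900;   /* size of the legal range         */

    bool address_is_legal(uint32_t addr)
    {
        /* legal addresses run from base through base + limit - 1 (here 420939) */
        return addr >= base_reg && addr < base_reg + limit_reg;
    }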
Protection of memory space is accomplished by having the CPU hardware compare every address generated in user mode
with the registers. Any attempt by a program executing in user mode to access operating-system memory or other users'
memory results in a trap to the operating system, which treats the attempt as a fatal error (Figure 2). This scheme prevents a
user program from (accidentally or deliberately) modifying the code or data structures of either the operating system or other
users.
The base and limit registers can be loaded only by the operating system, which uses a special privileged instruction. Since
privileged instructions can be executed only in kernel mode, and since only the operating system executes in kernel mode, only
the operating system can load the base and limit registers. This scheme allows the operating system to change the value of the
registers but prevents user programs from changing the registers' contents.
The operating system, executing in kernel mode, is given unrestricted access to both operating system and users' memory.
This provision allows the operating system to load users' programs into users' memory, to dump out those programs in case of
errors, to access and modify parameters of system calls, and so on.
Figure 2. Hardware address protection with base and limit registers
Address Binding
Usually, a program resides on a disk as a binary executable file. To be executed, the program must be brought into memory
and placed within a process. Depending on the memory management in use, the process may be moved between disk and
memory during its execution. The processes on the disk that are waiting to be brought into memory for execution form the
input queue.
The normal procedure is to select one of the processes in the input queue and to load that process into memory. As the process
is executed, it accesses instructions and data from memory. Eventually, the process terminates, and its memory space is
declared available.
Most systems allow a user process to reside in any part of the physical memory. Thus, although the address space of the
computer starts at 00000, the first address of the user process need not be 00000. This approach affects the addresses that the
user program can use. In most cases, a user program will go through several steps — some of which may be optional — before
being executed (Figure 3). Addresses may be represented in different ways during these steps. Addresses in the source
program are generally symbolic (such as count). A compiler will typically bind these symbolic addresses to relocatable
addresses (such as "14 bytes from the beginning of this module"). The linkage editor or loader will in turn bind the relocatable
addresses to absolute addresses (such as 74014). Each binding is a mapping from one address space to another.
Classically, the binding of instructions and data to memory addresses can be done at any step along the way:
Compile time. If you know at compile time where the process will reside in memory, then absolute code can be generated. For
example, if you know that a user process will reside starting at location R, then the generated compiler code will start at that
location and extend up from there. If, at some later time, the starting location changes, then it will be necessary to recompile
this code. The MS-DOS .COM-format programs are bound at compile time.
Load time. If it is not known at compile time where the process will reside in memory, then the compiler must generate
relocatable code. In this case, final binding is delayed until load time. If the starting address changes, we need only reload the
user code to incorporate this changed value (a sketch of load-time relocation appears after this list).
Execution time. If the process can be moved during its execution from one memory segment to another, then binding must be
delayed until run time. Special hardware must be available for this scheme to work. Most general-purpose operating systems
use this method.
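As an illustration of load-time binding, a relocating loader can be pictured as walking a table of relocatable references and adding the program's load address to each one. This is only a sketch under assumed names (reloc_entry, load_base); real object-file formats and loaders are more involved.

    /* Illustrative sketch of load-time address binding: every relocatable
     * reference recorded by the compiler/linker is patched by adding the
     * address at which the program image was actually loaded. Names assumed. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct reloc_entry {
        size_t offset_in_image;   /* where in the loaded image the reference sits */
    };

    void relocate_image(uint8_t *image, uint32_t load_base,
                        const struct reloc_entry *table, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint32_t ref;
            memcpy(&ref, image + table[i].offset_in_image, sizeof ref);
            ref += load_base;     /* relocatable address -> absolute address */
            memcpy(image + table[i].offset_in_image, &ref, sizeof ref);
        }
    }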
Fig 3. Multistep processing of a user program.
Fig 4. Dynamic relocation using a relocation register.
Logical Versus Physical Address Space
An address generated by the CPU is commonly referred to as a logical address, whereas an address seen by the memory
unit - that is, the one loaded into the memory-address register of the memory - is commonly referred to as a physical
address.
The compile-time and load-time address-binding methods generate identical logical and physical addresses. However, the
execution-time address-binding scheme results in differing logical and physical addresses. In this case, we usually refer to
the logical address as a virtual address. We use logical address and virtual address interchangeably in this text. The set of
all logical addresses generated by a program is a logical address space; the set of all physical addresses corresponding to
these logical addresses is a physical address space. Thus, in the execution-time address-binding scheme, the logical and
physical address spaces differ.
The run-time mapping from virtual to physical addresses is done by a hardware device called the memory-management
unit (MMU). We can choose from many different methods to accomplish such mapping. For the time being, we illustrate this
mapping with a simple MMU scheme. The base register is now called a relocation register. The value in the relocation
register is added to every address generated by a user process at the time it is sent to memory (see Figure 4). For example,
if the base is at 14000, then an attempt by the user to address location 0 is dynamically relocated to location 14000; an
access to location 346 is mapped to location 14346. The MS-DOS operating system running on the Intel 80x86 family of
processors uses four relocation registers when loading and running processes.
The user program never sees the real physical addresses. The program can create a pointer to location 346, store it in
memory, manipulate it, and compare it with other addresses - all as the number 346. Only when it is used as a memory
address (in an indirect load or store, perhaps) is it relocated relative to the base register. The user program deals with
logical addresses. The memory-mapping hardware converts logical addresses into physical addresses. The final location of
a referenced memory address is not determined until the reference is made.
We now have two different types of addresses: logical addresses (in the range 0 to max) and physical addresses (in the
range R + 0 to R + max for a base value R). The user generates only logical addresses and thinks that the process runs in
locations 0 to max. The user program supplies logical addresses; these logical addresses must be mapped to physical
addresses before they are used.
The concept of a logical address space that is bound to a separate physical address space is central to proper memory
management.
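The relocation-register scheme amounts to a single addition on every memory reference. The sketch below mirrors the 14000/346 example from the text; it is an illustration of the idea, not a model of real MMU hardware.

    /* Sketch of dynamic relocation with a relocation (base) register:
     * physical address = logical address + relocation register. */
    #include <stdint.h>

    static uint32_t relocation_reg = 14000;   /* loaded by the OS in kernel mode */

    uint32_t mmu_map(uint32_t logical)
    {
        return logical + relocation_reg;      /* 0 -> 14000, 346 -> 14346 */
    }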
Dynamic Loading
In our discussion so far, the entire program and all data of a process must be in physical memory for the process to execute. The
size of a process is thus limited to the size of physical memory. To obtain better memory-space utilization, we can use dynamic
loading. With dynamic loading, a routine is not loaded until it is called. All routines are kept on disk in a relocatable load format.
The main program is loaded into memory and is executed. When a routine needs to call another routine, the calling routine first
checks to see whether the other routine has been loaded. If not, the relocatable linking loader is called to load the desired
routine into memory and to update the program's address tables to reflect this change. Then control is passed to the newly
loaded routine. The advantage of dynamic loading is that an unused routine is never loaded. This method is particularly useful
when large amounts of code are needed to handle infrequently occurring cases, such as error routines. In this case, although the
total program size may be large, the portion that is used (and hence loaded) may be much smaller.
Dynamic loading does not require special support from the operating system. It is the responsibility of the users to design their
programs to take advantage of such a method. Operating systems may help the programmer, however, by providing library
routines to implement dynamic loading.
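On POSIX systems, one form of such library support is the dlopen/dlsym interface, which lets a program pull a routine in only when it is first needed. The sketch below uses that real interface, but the library name (libreport.so) and the routine name (generate_report) are made up for the example, and error handling is minimal.

    /* Sketch of dynamic loading via the POSIX dlopen/dlsym interface.
     * The library and symbol names are hypothetical; on Linux, link with -ldl. */
    #include <dlfcn.h>
    #include <stdio.h>

    int call_report_routine(void)
    {
        /* Load the routine's library only when it is actually needed. */
        void *handle = dlopen("libreport.so", RTLD_LAZY);
        if (!handle) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return -1;
        }

        /* Look up the routine and call it through a function pointer
         * (the *(void **) idiom is the POSIX-documented way to convert). */
        void (*generate_report)(void);
        *(void **)(&generate_report) = dlsym(handle, "generate_report");
        if (!generate_report) {
            fprintf(stderr, "dlsym failed: %s\n", dlerror());
            dlclose(handle);
            return -1;
        }

        generate_report();
        dlclose(handle);
        return 0;
    }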
Dynamic Linking and Shared Libraries
Figure 3 also shows dynamically linked libraries. Some operating systems support only static linking, in which system language
libraries are treated like any other object module and are combined by the loader into the binary program image. The concept of
dynamic linking is similar to that of dynamic loading. Here, though, linking, rather than loading, is postponed until execution time.
This feature is usually used with system libraries, such as language subroutine libraries. Without this facility, each program on a
system must include a copy of its language library (or at least the routines referenced by the program) in the executable image.
This requirement wastes both disk space and main memory. With dynamic linking, a stub is included in the image for each
library-routine reference. The stub is a small piece of code that indicates how to locate the appropriate memory-resident library
routine or how to load the library if the routine is not already present. When the stub is executed, it checks to see whether the
needed routine is already in memory. If not, the program loads the routine into memory. Either way, the stub replaces itself with
the address of the routine and executes the routine. Thus, the next time that particular code segment is reached, the library
routine is executed directly, incurring no cost for dynamic linking. Under this scheme, all processes that use a language library
execute only one copy of the library code.
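The stub mechanism can be pictured as a function pointer that initially points at a small resolver: on the first call the resolver locates the memory-resident routine, overwrites the pointer, and finishes the call, so later calls go straight to the library code. This is only a conceptual sketch (here libm's sqrt stands in for the library routine), not how a real linker lays out its stubs.

    /* Conceptual sketch of a dynamic-linking stub: the first call resolves the
     * routine and replaces the stub with the routine's address; subsequent
     * calls incur no dynamic-linking cost. */
    #include <math.h>

    static double stub_sqrt(double x);
    static double (*sqrt_entry)(double) = stub_sqrt;   /* calls start at the stub */

    static double stub_sqrt(double x)
    {
        /* First use: locate the library routine, patch the entry point,
         * and finish this call by invoking the real routine directly. */
        sqrt_entry = sqrt;        /* here the "library routine" is libm's sqrt */
        return sqrt(x);
    }

    double my_sqrt(double x)
    {
        return sqrt_entry(x);     /* later calls bypass the stub entirely */
    }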
This feature can be extended to library updates (such as bug fixes). A library may be replaced by a new version, and all
programs that reference the library will automatically use the new version. Without dynamic linking, all such programs would
need to be re-linked to gain access to the new library. So that programs will not accidentally execute new, incompatible versions
of libraries, version information is included in both the program and the library. More than one version of a library may be loaded
into memory, and each program uses its version information to decide which copy of the library to use. Thus, only programs that
are compiled with the new library version are affected by the incompatible changes incorporated in it. Other programs linked
before the new library was installed will continue using the older library. This system is also known as shared libraries.
Monoprogramming without Swapping or Paging
The simplest possible memory management scheme is to run just one program at a time, sharing the memory between that
program and the operating system. Three variations on this theme are shown in Figure 5. The operating system may be at the
bottom of memory in RAM (Random Access Memory), as shown in Fig. 5 a), or it may be in ROM (Read-Only Memory) at the
top of memory, as shown in Fig. 5 b), or the device drivers may be at the top of memory in a ROM and the rest of the system in
RAM down below, as shown in Fig. 5 c). The first model was formerly used on mainframes and minicomputers but is rarely
used any more. The second model is used on some palmtop computers and embedded systems. The third model was used
by early personal computers (e.g., running MS-DOS), where the portion of the system in the ROM is called the BIOS (Basic
Input Output System).
Figure 5. Three simple ways of organizing memory with an operating system and one user process. Other possibilities also exist.
When the system is organized in this way, only one process at a time can be running. As soon as the user types a command, the
operating system copies the requested program from disk to memory and executes it. When the process finishes, the operating
system displays a prompt character and waits for a new command. When it receives the command, it loads a new program into
memory, overwriting the first one.
Multiprogramming with Fixed Partitions
Except on very simple embedded systems, monoprogramming is hardly used any more. Most modern systems allow multiple
processes to run at the same time. Having multiple processes running at once means that when one process is blocked waiting
for I/O to finish, another one can use the CPU. Thus multiprogramming increases the CPU utilization. Network servers always
have the ability to run multiple processes (for different clients) at the same time, but most client (i.e., desktop) machines also
have this ability nowadays.
The easiest way to achieve multiprogramming is simply to divide memory up into n (possibly unequal) partitions. This
partitioning can, for example, be done manually when the system is started up.
When a job arrives, it can be put into the input queue for the smallest partition large enough to hold it. Since the partitions are
fixed in this scheme, any space in a partition not used by a job is wasted while that job runs. In Figure 6 a) we see how this
system of fixed partitions and separate input queues looks.
Figure 6. (a) Fixed memory partitions with separate input queues for each partition. (b) Fixed memory partitions with a single input queue.
The disadvantage of sorting the incoming jobs into separate queues becomes apparent when the queue for a large partition is
empty but the queue for a small partition is full, as is the case for partitions 1 and 3 in Figure 6 a). Here small jobs have to wait
to get into memory, even though plenty of memory is free. An alternative organization is to maintain a single queue as in Figure
6 b). Whenever a partition becomes free, the job closest to the front of the queue that fits in it could be loaded into the empty
partition and run. Since it is undesirable to waste a large partition on a small job, a different strategy is to search the whole input
queue whenever a partition becomes free and pick the largest job that fits. Note that the latter algorithm discriminates against
small jobs as being unworthy of having a whole partition, whereas usually it is desirable to give the smallest jobs (often
interactive jobs) the best service, not the worst.
One way out is to have at least one small partition around. Such a partition will allow small jobs to run without having to allocate
a large partition for them.
Another approach is to have a rule stating that a job that is eligible to run may not be skipped over more than k times. Each
time it is skipped over, it gets one point. When it has acquired k points, it may not be skipped again.
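One way to code the single-queue policy together with the k-skip rule is sketched below: when a partition becomes free, take the largest queued job that fits, but a job that has already accumulated k skip points must be taken as soon as it fits. The job structure and the value of MAX_SKIPS are assumed for illustration.

    /* Sketch of single-queue job selection with the "skip at most k times" rule. */
    #include <stddef.h>

    #define MAX_SKIPS 3            /* the k of the rule; value is illustrative */

    struct job {
        size_t size;               /* memory the job needs          */
        int    skips;              /* times it has been passed over */
    };

    /* Returns the index of the job to load into the freed partition, or -1. */
    int pick_job(struct job *queue, size_t njobs, size_t partition_size)
    {
        int choice = -1;
        for (size_t i = 0; i < njobs; i++) {
            if (queue[i].size > partition_size)
                continue;                           /* does not fit this partition */
            if (queue[i].skips >= MAX_SKIPS) {
                choice = (int)i;                    /* may not be skipped again    */
                break;
            }
            if (choice < 0 || queue[i].size > queue[choice].size)
                choice = (int)i;                    /* largest fitting job so far  */
        }
        /* Every other job that fits but stays behind collects a skip point. */
        for (size_t i = 0; i < njobs; i++)
            if ((int)i != choice && queue[i].size <= partition_size)
                queue[i].skips++;
        return choice;                              /* -1 if nothing fits */
    }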
This system, with fixed partitions set up by the operator in the morning and not changed thereafter, was used by OS/360 on
large IBM mainframes for many years. It was called MFT (Multiprogramming with a Fixed number of Tasks or OS/MFT). It is
simple to understand and equally simple to implement: incoming jobs are queued until a suitable partition is available, at which
time the job is loaded into that partition and run until it terminates. However, nowadays few, if any, operating systems support
this model, even on mainframe batch systems.
Swapping
A process must be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing
store and then brought back into memory for continued execution. For example, assume a multiprogramming environment with
a round-robin CPU-scheduling algorithm. When a quantum expires, the memory manager will start to swap out the process that
just finished and to swap another process into the memory space that has been freed (Figure 7). In the meantime, the CPU
scheduler will allocate a time slice to some other process in memory. When each process finishes its quantum, it will be
swapped with another process. Ideally, the memory manager can swap processes fast enough that some processes will be in
memory, ready to execute, when the CPU scheduler wants to reschedule the CPU. In addition, the quantum must be large
enough to allow reasonable amounts of computing to be done between swaps.
A variant of this swapping policy is used for priority-based scheduling algorithms. If a higher-priority process arrives and wants
service, the memory manager can swap out the lower-priority process and then load and execute the higher-priority process.
When the higher-priority process finishes, the lower-priority process can be swapped back in and continued. This variant of
swapping is sometimes called roll out, roll in.
Fig. 7. Swapping of two processes using a disk as a backing store.
The operation of a swapping system is illustrated in Fig. 8. Initially, only process A is in memory. Then processes B and C are
created or swapped in from disk. In Fig 8 d) A is swapped out to disk. Then D comes in and B goes out. Finally A comes in again.
Since A is now at a different location, addresses contained in it must be relocated, either by software when it is swapped in or
(more likely) by hardware during program execution.
Fig. 8 Memory allocation changes as processes come into memory and leave it. The shaded regions are unused memory.
The main difference between the fixed partitions of Fig. 6 and the variable partitions of Fig 8 is that the number, location, and
size of the partitions vary dynamically in the latter as processes come and go, whereas they are fixed in the former. The
flexibility of not being tied to a fixed number of partitions that may be too large or too small improves memory utilization, but it
also complicates allocating and de-allocating memory, as well as keeping track of it.
When swapping creates multiple holes in memory, it is possible to combine them all into one big one by moving all the
processes downward as far as possible. This technique is known as memory compaction. It is usually not done because it
requires a lot of CPU time. For example, on a 1-GB machine that can copy at a rate of 2 GB/sec (0.5 nsec/byte) it takes about
0.5 sec to compact all of memory. That may not seem like much time, but it would be noticeably disruptive to a user watching a
video stream. Normally, a process that is swapped out will be swapped back into the same memory space it occupied
previously. This restriction is dictated by the method of address binding. If binding is done at assembly or load time, then the
process cannot be easily moved to a different location. If execution-time binding is being used, however, then a process can be
swapped into a different memory space, because the physical addresses are computed during execution time.
Swapping requires a backing store. The backing store is commonly a fast disk. It must be large enough to accommodate copies of all
memory images for all users, and it must provide direct access to these memory images. The system maintains a ready queue
consisting of all processes whose memory images are on the backing store or in memory and are ready to run. Whenever the
CPU scheduler decides to execute a process, it calls the dispatcher. The dispatcher checks to see whether the next process in
the queue is in memory. If it is not, and if there is no free memory region, the dispatcher swaps out a process currently in
memory and swaps in the desired process. It then reloads registers and transfers control to the selected process.
The context-switch time in such a swapping system is fairly high. To get an idea of the context-switch time, let us assume that
the user process is 10 MB in size and the backing store is a standard hard disk with a transfer rate of 40 MB per second. The
actual transfer of the 10-MB process to or from main memory takes 10000 KB/40000 KB per second = 1/4 second = 250
milliseconds. Assuming that no head seeks are necessary, and assuming an average latency of 8 milliseconds, the swap time
is 258 milliseconds. Since we must both swap out and swap in, the total swap time is about 516 milliseconds.
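The arithmetic generalizes to a simple estimate: swap time is roughly size divided by transfer rate, plus the average latency, counted twice because a swap out is followed by a swap in. A small sketch with the numbers from the text:

    /* Sketch of the swap-time estimate used above: transfer time plus average
     * latency, doubled for a swap out followed by a swap in. */
    #include <stdio.h>

    int main(void)
    {
        double size_mb    = 10.0;    /* process size                   */
        double rate_mb_s  = 40.0;    /* disk transfer rate             */
        double latency_ms = 8.0;     /* average latency, no head seeks */

        double one_way_ms = size_mb / rate_mb_s * 1000.0 + latency_ms;   /* 258 ms */
        double total_ms   = 2.0 * one_way_ms;                            /* 516 ms */

        printf("one-way swap: %.0f ms, total swap: %.0f ms\n", one_way_ms, total_ms);
        return 0;
    }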
For efficient CPU utilization, we want the execution time for each process to be long relative to the swap time. Thus, in a
round-robin CPU-scheduling algorithm, for example, the time quantum should be substantially larger than 0.516 seconds.
Notice that the major part of the swap time is transfer time. The total transfer time is directly proportional to the amount of
memory swapped. If we have a computer system with 512 MB of main memory and a resident operating system taking 25 MB,
the maximum size of the user process is 487 MB. However, many user processes may be much smaller than this—say, 10 MB.
A 10-MB process could be swapped out in 258 milliseconds, compared with the 6.4 seconds required for swapping 256 MB.
Clearly, it would be useful to know exactly how much memory a user process is using, not simply how much it might be using.
Then we would need to swap only what is actually used, reducing swap time. For this method to be effective, the user must
keep the system informed of any changes in memory requirements. Thus, a process with dynamic memory requirements will
need to issue system calls (request memory and release memory) to inform the operating system of its changing memory
needs.
Swapping is constrained by other factors as well. If we want to swap a process, we must be sure that it is completely idle. Of
particular concern is any pending I/O. A process may be waiting for an I/O operation when we want to swap that process
to free up memory. However, if the I/O is asynchronously accessing the user memory for I/O buffers, then the process cannot be
swapped. Assume that the I/O operation is queued because the device is busy. If we were to swap out process P1 and swap in
process P2, the I/O operation might then attempt to use memory that now belongs to process P2. There are two main solutions
to this problem: Never swap a process with pending I/O, or execute I/O operations only into operating-system buffers. Transfers
between operating-system buffers and process memory then occur only when the process is swapped in.
A point that is worth making concerns how much memory should be allocated for a process when it is created or swapped in. If processes are
created with a fixed size that never changes, then the allocation is simple: the operating system allocates exactly what is
needed, no more and no less.
If, however, processes' data segments can grow, for example, by dynamically allocating memory from a heap, as in many
programming languages, a problem occurs whenever a process tries to grow. If a hole is adjacent to the process, it can be
allocated and the process can be allowed to grow into the hole. On the other hand, if the process is adjacent to another process,
the growing process will either have to be moved to a hole in memory large enough for it, or one or more processes will have to
be swapped out to create a large enough hole. If a process cannot grow in memory and the swap area on the disk is full, the
process will have to wait or be killed.
If it is expected that most processes will grow as they run, it is probably a good idea to allocate a little extra memory whenever a
process is swapped in or moved, to reduce the overhead associated with moving or swapping processes that no longer fit in
their allocated memory. However, when swapping processes to disk, only the memory actually in use should be swapped; it is
wasteful to swap the extra memory as well. In Fig 9 a) and b) we see a memory configuration in which space for growth has
been allocated to two processes.
Memory Management with Bitmaps
When memory is assigned dynamically, the operating system must manage it. In general terms, there are two ways to keep track
of memory usage: bitmaps and free lists. In this section and the next one we will look at these two methods in turn.
With a bitmap, memory is divided up into allocation units, perhaps as small as a few words and perhaps as large as several
kilobytes. Corresponding to each allocation unit is a bit in the bitmap, which is 0 if the unit is free and 1 if it is occupied (or vice
versa). Figure 10 shows part of memory and the corresponding bitmap.
Figure 10. (a) A part of memory with five processes and three holes. The tick marks show the memory allocation units. The shaded regions (0 in the bitmap) are free. (b) The corresponding bitmap. (c) The same information as a list.
The size of the allocation unit is an important design issue. The smaller the allocation unit, the larger the bitmap. However, even
with an allocation unit as small as 4 bytes, 32 bits of memory will require only 1 bit of the map. A memory of 32n bits will use n
map bits, so the bitmap will take up only 1/33 of memory. If the allocation unit is chosen large, the bitmap will be smaller, but
appreciable memory may be wasted in the last unit of the process if the process size is not an exact multiple of the allocation
unit.
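Allocating with a bitmap means finding a run of consecutive 0 bits long enough for the request and then setting those bits. The sketch below shows the idea; the total number of units and the function names are assumed for illustration.

    /* Minimal sketch of bitmap allocation: find a run of 'count' consecutive
     * free allocation units (0 bits), mark them used (1 bits), and return the
     * index of the first unit, or -1 if no run is long enough. */
    #include <limits.h>
    #include <stddef.h>

    #define NUNITS 1024                         /* total allocation units (assumed) */
    static unsigned char bitmap[NUNITS / CHAR_BIT];

    static int  bit_is_set(size_t i) { return bitmap[i / CHAR_BIT] >> (i % CHAR_BIT) & 1; }
    static void set_bit(size_t i)    { bitmap[i / CHAR_BIT] |= 1u << (i % CHAR_BIT); }

    long bitmap_alloc(size_t count)
    {
        size_t run = 0;
        for (size_t i = 0; i < NUNITS; i++) {
            run = bit_is_set(i) ? 0 : run + 1;  /* extend or reset the run of 0s */
            if (run == count) {
                size_t first = i + 1 - count;
                for (size_t j = first; j <= i; j++)
                    set_bit(j);                 /* mark the units as occupied */
                return (long)first;
            }
        }
        return -1;                              /* no hole of 'count' units found */
    }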
Memory Management with Linked Lists
Another way of keeping track of memory is to maintain a linked list of allocated and free memory segments, where a segment is
either a process or a hole between two processes. The memory of Figure 10 a) is represented in Fig. 10 c) as a linked list of
segments. Each entry in the list specifies a hole (H) or process (P), the address at which it starts, the length, and a pointer to the
next entry.
In this example, the segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates
or is swapped out, updating the list is straightforward. A terminating process normally has two neighbors (except when it is at
the very top or very bottom of memory). These may be either processes or holes, leading to the four combinations shown in Fig.
11. In Fig. 11 a) updating the list requires replacing a P by an H. In Fig. 11 (b) and also in Fig. 11(c), two entries are coalesced
into one, and the list becomes one entry shorter. In Fig. 11(d), three entries are merged and two items are removed from the list.
Since the process table slot for the terminating process will normally point to the list entry for the process itself, it may be more
convenient to have the list as a doubly linked list, rather than the singly linked list of Fig. 10(c). This structure makes it easier to
find the previous entry and to see if a merge is possible.
Figure 11. Four neighbor combinations for the terminating process, X.
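The doubly linked segment list and the four coalescing cases of Figure 11 can be sketched as follows. Structure and function names are assumed, and the process-table bookkeeping is omitted.

    /* Sketch of the segment list of Fig. 10(c)/Fig. 11: each node is a process
     * (P) or hole (H) with a start address and length, sorted by address.
     * Freeing a process turns its node into a hole and merges it with a
     * neighboring hole on either side (the four cases of Fig. 11). */
    #include <stdlib.h>

    enum seg_kind { SEG_PROCESS, SEG_HOLE };

    struct segment {
        enum seg_kind kind;
        size_t start, length;
        struct segment *prev, *next;   /* doubly linked, sorted by address */
    };

    static void absorb_next(struct segment *s)      /* merge s with s->next */
    {
        struct segment *n = s->next;
        s->length += n->length;
        s->next = n->next;
        if (n->next) n->next->prev = s;
        free(n);
    }

    void free_segment(struct segment *x)            /* process X terminates */
    {
        x->kind = SEG_HOLE;                         /* case (a): P becomes H */
        if (x->next && x->next->kind == SEG_HOLE)   /* case (b): hole after  */
            absorb_next(x);
        if (x->prev && x->prev->kind == SEG_HOLE)   /* case (c): hole before */
            absorb_next(x->prev);
        /* cases (b) and (c) together give case (d): three entries become one */
    }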
When the processes and holes are kept on a list sorted by address, several algorithms can be used to allocate memory for a newly
created process (or an existing process being swapped in from disk). We assume that the memory manager knows how much
memory to allocate. The simplest algorithm is first fit. The memory manager scans along the list of segments until it finds a hole
that is big enough. The hole is then broken up into two pieces, one for the process and one for the unused memory, except in the
statistically unlikely case of an exact fit. First fit is a fast algorithm because it searches as little as possible.
A minor variation of first fit is next fit. It works the same way as first fit, except that it keeps track of where it is whenever it finds
a suitable hole. The next time it is called to find a hole, it starts searching the list from the place where it left off last time, instead
of always at the beginning, as first fit does. Simulations by Bays show that next fit gives slightly worse performance than first fit.
Another well-known algorithm is best fit. Best fit searches the entire list and takes the smallest hole that is adequate. Rather
than breaking up a big hole that might be needed later, best fit tries to find a hole that is close to the actual size needed.
As an example of first fit and best fit, consider Figure 10 again. If a block of size 2 is needed, first fit will allocate the hole at 5,
but best fit will allocate the hole at 18.
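First fit and best fit differ only in when the scan stops. The sketch below reproduces the example from the text for a request of size 2; the hole list (and in particular the hole lengths) is assumed for illustration, chosen so that the hole at 5 is the first that fits and the hole at 18 is the tightest fit.

    /* Sketch comparing first fit and best fit on a list of holes sorted by
     * address. Hole lengths are illustrative. */
    #include <stddef.h>
    #include <stdio.h>

    struct hole { size_t start, length; };

    static struct hole holes[] = { {5, 3}, {18, 2}, {26, 3} };
    static const size_t nholes = sizeof holes / sizeof holes[0];

    long first_fit(size_t need)
    {
        for (size_t i = 0; i < nholes; i++)
            if (holes[i].length >= need)
                return (long)holes[i].start;      /* stop at the first hole that fits */
        return -1;
    }

    long best_fit(size_t need)
    {
        long best = -1;
        size_t best_len = (size_t)-1;
        for (size_t i = 0; i < nholes; i++)       /* always scans the whole list */
            if (holes[i].length >= need && holes[i].length < best_len) {
                best = (long)holes[i].start;
                best_len = holes[i].length;
            }
        return best;
    }

    int main(void)
    {
        printf("first fit: %ld, best fit: %ld\n", first_fit(2), best_fit(2));  /* 5, 18 */
        return 0;
    }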
Best fit is slower than first fit because it must search the entire list every time it is called. Somewhat surprisingly, it also results in
more wasted memory than first fit or next fit because it tends to fill up memory with tiny, useless holes. First fit generates larger
holes on the average.
To get around the problem of breaking up nearly exact matches into a process and a tiny hole, one could think about worst fit,
that is, always take the largest available hole, so that the hole broken off will be big enough to be useful. Simulation has shown
that worst fit is not a very good idea either.
All four algorithms can be speeded up by maintaining separate lists for processes and holes. In this way, all of them devote their
full energy to inspecting holes, not processes. The inevitable price that is paid for this speedup on allocation is the additional
complexity and slowdown when deallocating memory, since a freed segment has to be removed from the process list and
inserted into the hole list.
If distinct lists are maintained for processes and holes, the hole list may be kept sorted on size, to make best fit faster. When
best fit searches a list of holes from smallest to largest, as soon as it finds a hole that fits, it knows that the hole is the smallest
one that will do the job, hence the best fit. No further searching is needed, as it is with the single list scheme. With a hole list
sorted by size, first fit and best fit are equally fast, and next fit is pointless.