Memory Thrashing Protection in Multi
Download
Report
Transcript Memory Thrashing Protection in Multi
Memory Thrashing Protection in
Multi-Programming Environment
Xiaodong Zhang
College of William and Mary
Memory Management for Multiprogramming
Space sharing among interactive programs in
virtual memory is managed by page replacement.
Commonly used policy is the global LRU
replacement in the entire user memory space.
Thrashing: accumulated memory requests of
multiple programs exceed available user space,
No program is able to establish its working set;
causing large page faults;
low CPU utilization; and
execution of each program practically stops.
Past and Existing Thrashing Protection Methods
Local page replacement:
Each program is statically allocated a fixed size.
DEC VAX machine had this in its VMS in early 1980’s.
Two problems: (1) under utilization of memory space,
and (2) could not adapt dynamic changes of programs.
It is no longer used in any systems.
Load control:
While thrashing, some job(s) is/are suspend or swapped.
Open BSD operating systems, IBM RS/6000, HP9000.
HP-UX has a ``serialize()” command for thrashing.
Linux makes load controls based on RSS (resident set
size) reporting the total number of occupied pages.
Limits and Problems of Load Controls
A thrashing is often triggered by a brief spike of
memory demand, a load control can over-react.
Suspending a job causes other related jobs to quit.
When a job is suspended, its working set can be
replaced quickly by other running programs, very
expensive to rebuild the working set.
A lightweight and dynamic protection is much
more desirable than a brute-force action.
Some Insights into Thrashing
The global LRU replacement generates two types
of LRU pages for replacement:
True LRU pages: to which programs do not need to
access.
False LRU pages: to which programs have not been able
to access due to required working set is not set up yet, or
page faults are being conducted.
A system cannot distinguish true or false LRU
pages, but selects both for replacement.
The amount false LRU pages is a status indicator:
no, marginally, or seriously thrashing.
Token-based Thrashing Protection Facility
Jiang/Zhang, Performance Evaluation, 05, (W&M).
Conducted intensive experiments at the kernel level along
with analysis on memory thrashing:
A sudden spike of memory demand from one can generate many
false LRU pages in others, particularly in an less demanding one.
As false pages reach to a certain amount, the system becomes
little productive even when physical memory is not too small.
Basic idea of the token mechanism:
As the system enters a pre-thrashing stage (low RSS, and high
idle CPU), a token is issued to a process so that it can quickly
form its working set and proceed.
This approach can effectively and timely avoid thrashing.
Some Alternatives in Its Implementation
Which process to issue the token?
A less memory demanding process.
How long does a process hold the token?
It is adjustable and proportional to the thrashing degree.
What happens if thrashing is too serious?
It becomes a polite load control mechanism by setting a
long token time so that each program has to be executed
one by one.
Multi-tokens can be effective for light thrashing.
The token and its variations were implemented and
tested in Linux kernel 2.2.
Outcome of This Work
A paper entitled ``Token-ordered LRU: …” has
been rejected by several top system conferences.
(Main reason: this is not a hot OS topic anymore).
A successful technology transfer based on it!
A group of independent Linux kernel developers
organized by Rik van Reil of RedHat started a project to
include the token into the Linux kernel in July 2004.
The implementation insights and detailed technical
discussions are well documented in the Internet.
Token was formally adopted in Linux kernel 2.6.,
12/04, serving millions of users world wide.