Memory Thrashing Protection in Multi

Download Report

Transcript Memory Thrashing Protection in Multi

Memory Thrashing Protection in
Multi-Programming Environment
Xiaodong Zhang
College of William and Mary
Memory Management for Multiprogramming
 Space sharing among interactive programs in
virtual memory is managed by page replacement.
 Commonly used policy is the global LRU
replacement in the entire user memory space.
 Thrashing: accumulated memory requests of
multiple programs exceed available user space,




No program is able to establish its working set;
causing large page faults;
low CPU utilization; and
execution of each program practically stops.
Past and Existing Thrashing Protection Methods
 Local page replacement:




Each program is statically allocated a fixed size.
DEC VAX machine had this in its VMS in early 1980’s.
Two problems: (1) under utilization of memory space,
and (2) could not adapt dynamic changes of programs.
It is no longer used in any systems.
 Load control:




While thrashing, some job(s) is/are suspend or swapped.
Open BSD operating systems, IBM RS/6000, HP9000.
HP-UX has a ``serialize()” command for thrashing.
Linux makes load controls based on RSS (resident set
size) reporting the total number of occupied pages.
Limits and Problems of Load Controls
 A thrashing is often triggered by a brief spike of
memory demand, a load control can over-react.
 Suspending a job causes other related jobs to quit.
 When a job is suspended, its working set can be
replaced quickly by other running programs, very
expensive to rebuild the working set.
 A lightweight and dynamic protection is much
more desirable than a brute-force action.
Some Insights into Thrashing
 The global LRU replacement generates two types
of LRU pages for replacement:


True LRU pages: to which programs do not need to
access.
False LRU pages: to which programs have not been able
to access due to required working set is not set up yet, or
page faults are being conducted.
 A system cannot distinguish true or false LRU
pages, but selects both for replacement.
 The amount false LRU pages is a status indicator:
no, marginally, or seriously thrashing.
Token-based Thrashing Protection Facility
 Jiang/Zhang, Performance Evaluation, 05, (W&M).
 Conducted intensive experiments at the kernel level along
with analysis on memory thrashing:


A sudden spike of memory demand from one can generate many
false LRU pages in others, particularly in an less demanding one.
As false pages reach to a certain amount, the system becomes
little productive even when physical memory is not too small.
 Basic idea of the token mechanism:


As the system enters a pre-thrashing stage (low RSS, and high
idle CPU), a token is issued to a process so that it can quickly
form its working set and proceed.
This approach can effectively and timely avoid thrashing.
Some Alternatives in Its Implementation
 Which process to issue the token?

A less memory demanding process.
 How long does a process hold the token?

It is adjustable and proportional to the thrashing degree.
 What happens if thrashing is too serious?

It becomes a polite load control mechanism by setting a
long token time so that each program has to be executed
one by one.
 Multi-tokens can be effective for light thrashing.
 The token and its variations were implemented and
tested in Linux kernel 2.2.
Outcome of This Work
 A paper entitled ``Token-ordered LRU: …” has
been rejected by several top system conferences.
(Main reason: this is not a hot OS topic anymore).
 A successful technology transfer based on it!


A group of independent Linux kernel developers
organized by Rik van Reil of RedHat started a project to
include the token into the Linux kernel in July 2004.
The implementation insights and detailed technical
discussions are well documented in the Internet.
 Token was formally adopted in Linux kernel 2.6.,
12/04, serving millions of users world wide.