Process Introspection: A Checkpoint Mechanism for High Performance Heterogeneous Distributed Systems.
University of Virginia.
Author: Adam J. Ferrari.
Some Basic Terminology.
- What is a process?
A process is a program in execution: an entity actually running under an operating system.
- What does introspection mean?
Introspection means understanding one's inner self (Merriam-Webster Online).
Goals of the Process Introspection Project.
- To construct a checkpoint/restart mechanism for a heterogeneous environment.
- This mechanism should be:
1. Efficient.
2. Flexible.
3. Most importantly, platform independent.
Heterogeneous Environment.
- Networks of heterogeneous workstations became popular mainly because of their better price/performance ratio.
- Some characteristics:
1. A collection of workstations with different operating systems and architectures, connected by a network.
2. Generally used for compute-intensive applications, in which many idle or lightly loaded workstations share a task, making efficient use of otherwise idle time.
3. User-dedicated machines.
Ex: our own department.
Efficient Utilization.
- To take advantage of these heterogeneous workstations, the following services should be provided to processes:
1. Process Migration.
2. Load Balancing.
3. Fault Tolerance.
Checkpoint/Restart Mechanism.
- Two main phases:
1. Save the current running state of a process.
2. Reconstruct the original running process from the saved image and resume execution from exactly the point where it was interrupted.
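The two phases can be illustrated with a minimal C sketch. It assumes a toy "process" whose entire state is a loop index and a partial sum held in one struct; the names `struct state`, `run`, `save_state` and `restore_state` are hypothetical and not part of the described system, which must also capture call stacks, heap data and pointers.

```c
#include <stdio.h>

/* Hypothetical toy process state: a loop that sums 1..n. */
struct state {
    long i;    /* next loop index: the logical point of execution */
    long sum;  /* partial result */
    long n;    /* loop bound */
};

/* Runs the loop. If i reaches stop_at, returns 0 before doing that
   iteration (a simulated checkpoint request); returns 1 when finished. */
static int run(struct state *s, long stop_at) {
    while (s->i <= s->n) {
        if (s->i == stop_at)
            return 0;
        s->sum += s->i;
        s->i++;
    }
    return 1;
}

/* Phase 1: save the running state as decimal text, which any host can
   parse regardless of its byte order. Returns 1 on success. */
static int save_state(const struct state *s, char *buf, size_t len) {
    int w = snprintf(buf, len, "%ld %ld %ld", s->i, s->sum, s->n);
    return w > 0 && (size_t)w < len;
}

/* Phase 2: reconstruct the state from the saved image. Returns 1 on success. */
static int restore_state(struct state *s, const char *buf) {
    return sscanf(buf, "%ld %ld %ld", &s->i, &s->sum, &s->n) == 3;
}
```

For example, starting from `{1, 0, 100}`, `run(&a, 40)` stops with a partial sum of 780; restoring the saved text into a fresh struct and calling `run(&b, -1)` finishes with 5050, the same result as an uninterrupted run.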
Advantages of Using the Checkpoint/Restart Mechanism.
- Process Migration, enabling:
1. Distributed Load Balancing.
2. Efficient Resource Utilization.
- Crash Recovery and Transaction Rollback.
- Useful in System Administration.
- Lowering the Programming Burden.
- Running Complex Simulations or Models.
Implementation Challenges/Complexity.
- Because the computing environment is heterogeneous, the checkpoint/restart mechanism must be platform independent. It must:
1. Capture the state of a running process.
2. Reinstantiate it on a completely different architecture or OS platform, which may have a different instruction set, data format, and address space layout.
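One concrete instance of the data format problem is byte order: an integer saved as raw memory on a little-endian machine reads back incorrectly on a big-endian one. A common remedy, sketched below in C as an illustration rather than as the project's actual encoding, is to write every value in a fixed canonical order. `encode_u32_be` and `decode_u32_be` are hypothetical helper names.

```c
#include <stdint.h>

/* Writes v into out[0..3] most significant byte first (big-endian),
   whatever the host CPU's own byte order is. */
static void encode_u32_be(uint32_t v, unsigned char out[4]) {
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Reads back a value written by encode_u32_be, on any host. */
static uint32_t decode_u32_be(const unsigned char in[4]) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Because both functions use only shifts and conversions, `encode_u32_be(0x01020304, buf)` produces the bytes `01 02 03 04` on every host. Floating-point formats and differing type sizes need similar, more involved treatment.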
Existing Implementations.
- V migration mechanism.
Compiler support is used to generate meta-information about a process, describing the locations and types of data items that must be modified at migration time to mask data representation differences.
Disadvantages:
1. Requires kernel support. Some other examples: MOSIX, Sprite.
2. Requires data to be stored at the same address in all migrated versions.
- Theimer and Hayes.
Construct an intermediate source code representation of a running process at the point of migration, and recompile this source on the migration target.
Never implemented.
Process Introspection Design.
- Process + Introspection: the ability of a process to examine and describe its own internal state in a logical, platform-independent format.
- Extends the technique of hand-coding checkpoint/restart mechanisms into an automated approach.
Components Involved.
- The Process Introspection Design Pattern.
- Process Introspection Library (PIL).
- Automatic Process Introspection Compiler (APrIL).
- Standard Checkpoint Interface.
- Central Checkpoint Coordinator.
Process Introspection Design Pattern.
- A design template for writing checkpointable code.
- Based on a process model.
Adding functionality to the modules.
- Ability to save/restore threads of control.
1. Poll points (checks for pending checkpoint requests) are inserted so that call stacks can be saved.
   - Poll point placement is a key performance trade-off: more poll points lower the latency of serving a checkpoint request but add run-time overhead.
2. Serving a checkpoint request: the subroutine saves its data and logical point of execution, then returns to its calling subroutine.
3. Restarting a process from a checkpoint: beginning with the initial subroutine that was active at the checkpoint, restore its variables from the checkpoint and use control flow to reach the point of execution recorded in the checkpoint.
Then call the next subroutine from the checkpointed stack.
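The three steps above can be sketched in self-contained C. This is a toy under stated assumptions, not PIL or APrIL output. All names are hypothetical. The "checkpoint" simply stays in two global arrays instead of being written out. Each function has exactly one poll point or call site, so every logical program counter (LPC) value is 1. `inner` polls on every iteration. When a request arrives, each frame saves its locals and LPC and returns. On restart, `outer` restores its frame and jumps back into the call, where `inner` restores the rest.

```c
#include <assert.h>

/* Hypothetical runtime state (a toy, not PIL's data structures). */
static int  lpc[16];  static int lpc_top;   /* logical program counter stack */
static long data[16]; static int data_top;  /* saved local variables */
static int  checkpoint_now;  /* a checkpoint has been requested */
static int  restoring;       /* the call stack is being rebuilt */
static long steps;           /* inner iterations executed so far */
static long request_at = -1; /* simulated request time (-1: never) */

/* Returns 0 + 1 + ... + (n - 1), with one poll point per iteration. */
static long inner(long n) {
    long k = 0, acc = 0;
    if (restoring) {                    /* prologue: rebuild this frame */
        int pt = lpc[--lpc_top];
        assert(pt == 1);                /* only one poll point here */
        (void)pt;
        k = data[--data_top];           /* pop in reverse order of saving */
        acc = data[--data_top];
        restoring = 0;                  /* innermost frame: restart done */
    }
    for (; k < n; k++) {
        if (steps == request_at)        /* simulated external request */
            checkpoint_now = 1;
        if (checkpoint_now) {           /* poll point 1 */
            data[data_top++] = acc;     /* save the data... */
            data[data_top++] = k;
            lpc[lpc_top++] = 1;         /* ...and the logical point */
            return 0;                   /* unwind; 0 leaves caller's total unchanged */
        }
        acc += k;
        steps++;
    }
    return acc;
}

/* Sums inner(0) .. inner(m - 1); the call to inner() is logical point 1. */
static long outer(long m) {
    long j = 0, total = 0;
    if (restoring) {
        int pt = lpc[--lpc_top];
        assert(pt == 1);
        (void)pt;
        j = data[--data_top];
        total = data[--data_top];
        goto call_inner;                /* re-enter the saved call */
    }
    for (; j < m; j++) {
call_inner:
        total += inner(j);
        if (checkpoint_now) {           /* callee unwound: save this frame */
            data[data_top++] = total;
            data[data_top++] = j;
            lpc[lpc_top++] = 1;
            return 0;
        }
    }
    return total;
}
```

Run uninterrupted, `outer(5)` returns 10. With `steps = 0` and `request_at = 3`, the call unwinds during `inner(3)` and leaves two frames on the stacks. Setting `restoring = 1`, clearing `checkpoint_now` and calling `outer(5)` again rebuilds both frames and also returns 10. Jumping into the loop body with `goto` is legal C here because no variable-length arrays are involved.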
Adding functionality to the process (contd.)
- Ability to save/restore memory blocks.
   - Must handle the different data representations and address space layouts of different platforms.
- For pointers:
1. A raw memory address cannot be saved.
2. A logical description must be saved instead, so high-level descriptors are needed.
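Below is a minimal C sketch of such a descriptor. It assumes a hypothetical registry in which every checkpointable memory block is registered with its element size and count. `register_block`, `describe`, `resolve` and `struct ptr_desc` are illustrative names, not the PIL API. A pointer is saved as a (block number, element index) pair. After restart it is turned back into an address, even though the blocks may then live at different addresses and have different element sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry of memory blocks known to the checkpointer. */
struct block { void *base; size_t elem_size; size_t count; };
#define MAX_BLOCKS 8
static struct block blocks[MAX_BLOCKS];
static int nblocks;

/* Registers a block; its position in the registry is its logical id. */
static int register_block(void *base, size_t elem_size, size_t count) {
    assert(nblocks < MAX_BLOCKS && elem_size > 0);
    blocks[nblocks].base = base;
    blocks[nblocks].elem_size = elem_size;
    blocks[nblocks].count = count;
    return nblocks++;
}

/* Logical pointer: block id plus element index, no raw address. */
struct ptr_desc { int block; size_t index; };

/* Raw address -> logical description; block is -1 if p is in no block.
   Comparing addresses via uintptr_t assumes a flat address space. */
static struct ptr_desc describe(const void *p) {
    uintptr_t a = (uintptr_t)p;
    for (int b = 0; b < nblocks; b++) {
        uintptr_t start = (uintptr_t)blocks[b].base;
        size_t bytes = blocks[b].elem_size * blocks[b].count;
        if (a - start < bytes) {  /* unsigned: also false when a < start */
            struct ptr_desc d = { b, (size_t)(a - start) / blocks[b].elem_size };
            return d;
        }
    }
    struct ptr_desc none = { -1, 0 };
    return none;
}

/* Logical description -> address in the (possibly relocated) block. */
static void *resolve(struct ptr_desc d) {
    assert(d.block >= 0 && d.block < nblocks);
    assert(d.index < blocks[d.block].count);
    return (char *)blocks[d.block].base + d.index * blocks[d.block].elem_size;
}
```

For example, with two 10-element `double` arrays `A` and `B` registered in that order, `describe(&B[3])` yields block 1, index 3. If block 1 is re-registered at a new array `C` on restart, `resolve` returns `&C[3]`.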
[Figure: component overview showing the APrIL Compiler, Transformed Code, Hand-coded Checkpointable Modules, the Process Introspection Library (PIL), and the Checkpoint.]
Process Introspection Library (PIL).
- A consistent API for manipulating the elements of a process.
- Automates and integrates:
1. Thread management.
2. Logical Program Counter Stack.
3. Data format conversion.
4. Checkpoint/restart of statically allocated data.
5. Checkpoint/restart of dynamically allocated data.
6. Pointer analysis/description.
APrIL: Automatic Process Introspection Compiler.
- A source code translator.
- Inserts code to keep the PIL tables updated at run time.
- Places poll points in the module code, so that a thread executing the module periodically polls for checkpoint requests.
- During restart, the process must restore all threads of execution.
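The slides do not say how a checkpoint request reaches a running process. One plausible arrangement, assumed here for illustration only, is an asynchronous POSIX signal (`SIGUSR1`) whose handler merely sets a flag. The actual work is done at the next poll point, where the process state is consistent. The function names below are hypothetical.

```c
#include <signal.h>

/* Set asynchronously when a checkpoint is requested. A signal handler can
   portably do little more than write a volatile sig_atomic_t, so the
   actual saving is deferred to the next poll point. */
static volatile sig_atomic_t checkpoint_requested;

static void on_checkpoint_signal(int sig) {
    (void)sig;
    checkpoint_requested = 1;
}

/* Installs the handler for SIGUSR1 (a POSIX signal); returns 1 on success. */
static int install_checkpoint_handler(void) {
    return signal(SIGUSR1, on_checkpoint_signal) != SIG_ERR;
}

/* Sums 0..n-1, polling once per iteration. If a request is seen, stops at
   that point, stores the index in *stopped_at and returns the partial sum;
   otherwise *stopped_at is -1. raise_at (-1 for never) makes the loop send
   itself the signal, simulating an external request. */
static long work(long n, long raise_at, long *stopped_at) {
    long sum = 0;
    *stopped_at = -1;
    for (long i = 0; i < n; i++) {
        if (i == raise_at)
            raise(SIGUSR1);
        if (checkpoint_requested) {     /* poll point */
            checkpoint_requested = 0;
            *stopped_at = i;            /* a real system would save state here */
            return sum;
        }
        sum += i;
    }
    return sum;
}
```

After installing the handler, `work(10, 4, &at)` returns 6 (0 + 1 + 2 + 3) with `at == 4`, while `work(10, -1, &at)` runs to completion and returns 45. The deferral is the point of poll points. A handler that interrupts arbitrary code sees the process at an unknown location. A poll point, by contrast, is a known location whose live variables the transformed code can describe.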
Example - Function Prologues
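    void example(double *A)
    {
        int i;
        double temp[100];

        /* Register the local array (element type and count) with PIL. */
        PIL_RegisterStackPointer(temp, PIL_Double, 100);

        /* On restart: rebuild this frame, then jump to the saved poll point. */
        if (PIL_CheckpointStatus & PIL_StatusRestoreNow) {
            int PIL_restore_point = PIL_PopLPCValue();
            A = PIL_RestoreStackPointer();
            i = PIL_RestoreStackInt();
            PIL_RestoreStackDoubles(temp, 100);
            switch (PIL_restore_point) {
            case 1: PIL_DoneRestart(); goto _PIL_PollPt_1;
            case 2: goto _PIL_PollPt_2;   /* restart continues inside the callee */
            case 3: PIL_DoneRestart(); goto _PIL_PollPt_3;
            }
        }
        /* ... function body with poll points ... */
    }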
Example - Poll Points
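    _PIL_PollPt_2:
        i = function(A, temp, 100);
    _PIL_PollPt_3:
        if (PIL_CheckpointStatus & PIL_StatusCheckpointNow) {
            if (PIL_CheckpointStatus & PIL_StatusCheckpointInProgress)
                /* The callee already saved its frame: resume by re-entering the call. */
                PIL_PushLPCValue(2);
            else {
                /* The checkpoint starts here: resume at this poll point. */
                PIL_PushLPCValue(3);
                PIL_CheckpointStatus |= PIL_StatusCheckpointInProgress;
            }
            goto _PIL_save_frame_;
        }
        /* ... */
    _PIL_save_frame_:
        /* Save this frame's locals, then return to the caller. */
        PIL_SaveStackPointer(A);
        PIL_SaveStackInt(i);
        PIL_SaveStackDoubles(temp, 100);
        return;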
APrIL: Automatic Process Introspection Compiler.
[Figure: compilation flow. High Level Language source is translated by APrIL into PIL-based Transformed Code, which Back End Compilers build into Binary 1, Binary 2, ..., Binary N.]
Checkpoint Coordination and Module Interfaces.
- Lets the modules of a process interoperate to produce a checkpoint or to restart the process.
- SCI (Standard Checkpoint Interface) events:
Process Startup: registers any global variable or data type definitions.
Checkpoint Start/End: bracket the saving of the module's information.
Restart: restores the module's state from the checkpoint.
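The SCI events could map onto a per-module table of callbacks that a coordinator invokes in turn. The C sketch below assumes that design. `struct sci_module`, `sci_register`, `sci_broadcast` and the sample `counter_module` are hypothetical names. A real coordinator would also move the checkpoint data itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-module hooks, one per SCI event. A module may leave
   any hook NULL if it has nothing to do for that event. */
struct sci_module {
    const char *name;
    void (*startup)(void);           /* register globals and data types */
    void (*checkpoint_start)(void);  /* begin saving module state */
    void (*checkpoint_end)(void);    /* finish saving module state */
    void (*restart)(void);           /* restore module state */
};

enum sci_event { SCI_STARTUP, SCI_CHECKPOINT_START, SCI_CHECKPOINT_END,
                 SCI_RESTART };

#define SCI_MAX_MODULES 8
static const struct sci_module *sci_modules[SCI_MAX_MODULES];
static int sci_count;

static void sci_register(const struct sci_module *m) {
    assert(sci_count < SCI_MAX_MODULES);
    sci_modules[sci_count++] = m;
}

/* Toy coordinator: delivers one event to every registered module. */
static void sci_broadcast(enum sci_event ev) {
    for (int i = 0; i < sci_count; i++) {
        const struct sci_module *m = sci_modules[i];
        void (*hook)(void) = NULL;
        switch (ev) {
        case SCI_STARTUP:          hook = m->startup;          break;
        case SCI_CHECKPOINT_START: hook = m->checkpoint_start; break;
        case SCI_CHECKPOINT_END:   hook = m->checkpoint_end;   break;
        case SCI_RESTART:          hook = m->restart;          break;
        }
        if (hook)
            hook();
    }
}

/* Example module that only counts the events it handles. */
static int counter_events;
static void counter_hook(void) { counter_events++; }
static const struct sci_module counter_module = {
    "counter", counter_hook, counter_hook, NULL, counter_hook
};
```

Broadcasting all four events to `counter_module` increments its counter three times, because it leaves the Checkpoint End hook unset. Sending Checkpoint Start to every module before any Checkpoint End lets the coordinator treat the checkpoint as one global operation across modules.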
Judging an Implementation.
- Little or No Programmer Effort.
- Convenient Programmer Interface.
- Low Checkpoint Request Service Latency.
- Low Run-Time Overhead.
- Control over the Number of Checkpoints.
- Integrates Well with the Existing Environment.
Example Overhead Measurements.
(Normal = original code; Trans = transformed, checkpointable code; Opt = the same compiled with optimization.)

NXN Matrix Multiply, RS/6000, xlc
N     Normal   Trans    Opt      Trans Opt   Latency
32    0.03     0.03     0.01     0.01        0.04
64    0.26     0.27     0.06     0.07        0.08
128   2.66     2.66     1.16     1.16        0.09
256   21.16    21.17    9.14     9.18        0.2
512   288.96   288.99   198.01   199.95      0.6

Quicksort, 2^N Keys, Ultrasparc, gcc
N     Normal   Trans    Opt      Trans Opt   Latency
17    1.95     1.96     0.83     1           0.02
18    3.28     3.68     1.2      1.46        0.022
19    6.86     7.54     2.48     2.99        0.025
20    13.91    15.31    4.92     5.94        0.026
21    28.25    31.25    9.85     12.22       0.03

Run times in seconds, latencies in milliseconds.
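The table reports raw times, so the overhead of the transformed code must be derived. A simple measure is the relative increase of the transformed time over the original time at the same optimization level. This measure is an assumption of the sketch, not something stated on the slides. `overhead_pct` is a hypothetical helper.

```c
#include <assert.h>

/* Relative run-time overhead, in percent, of transformed (checkpointable)
   code over the original code at the same optimization level. */
static double overhead_pct(double t_original, double t_transformed) {
    assert(t_original > 0.0);
    return 100.0 * (t_transformed - t_original) / t_original;
}
```

For quicksort with 2^20 keys, this gives about 10% unoptimized ((15.31 - 13.91) / 13.91) and about 21% optimized ((5.94 - 4.92) / 4.92). For the optimized 512 x 512 matrix multiply it gives under 1% ((199.95 - 198.01) / 198.01). The checkpoint request service latency stays below 1 ms in every row.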
Project Status.
- Prototype PIL implemented.
Tested across multiple platforms: Solaris, IRIX, AIX, OSF1, Linux, Win95/NT.
- Example applications demonstrated, e.g. matrix multiply, SOR, sort.
Hand-coded to use PIL.
Checkpointed/restarted across the above platforms.
- APrIL under design and construction.
Thank you. Questions?