Single System Image OS for Clusters: Kerrighed


GGF10 - GridCPR-WG
PARIS project-team Activities in Checkpoint Recovery
Christine Morin
[email protected]
PARIS INRIA project-team
IRISA – Rennes (France)
http://www.irisa.fr/paris
Berlin, March 11th, 2004
Cluster Federations
[Figure: cluster federation; labels SAN, LAN, WAN]

A particular case of grid



Interconnection of several clusters of moderate size
Homogeneity and heterogeneity
- More and more homogeneous platforms: PC, Linux
- Heterogeneous networks (SAN, LAN, WAN)
- Clusters with different amounts and kinds of resources
Considered applications

Scientific applications (numerical simulation)
- Sequential and parallel applications based on either the shared-memory or the message-passing communication paradigm
Code coupling applications
Applications requiring a huge amount of resources (memory, computing power)
Dynamicity


A cluster may join or leave the federation at any time
Individual nodes may fail in a cluster
Grid-aware OS for Cluster Federations

A single system image OS on each cluster
A cluster appears as a single machine which offers a kind of standard interface
Mosix, Amoeba, Kerrighed
[Figure labels: DSM, DFS, CPU]
A cluster federation is seen as a set of peers
Structured peer-to-peer (P2P) network (instead of a hierarchy)
- Fully decentralized control
- Native support for dynamicity
- Designed for scalability
Size of the routing tables bounded by log(N)
Probabilistic log(N) bounds on the number of routing hops
“Standardization” of the APIs (IRIS project)
Promising work to take into account the network's topology and security issues (Pastry)
Structured P2P systems usually provide distributed hash tables (DHT)
- Building block for higher level services
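The log(N) bounds on routing table size and hop count quoted for structured P2P networks can be illustrated with a minimal sketch. It assumes a Chord-style identifier ring in which every identifier is occupied by a node, which is a simplification and not the Pastry design cited in the slide. The function names `finger_table` and `route` are hypothetical. On such a fully populated ring the hop bound is deterministic, whereas real overlays only give it with high probability.

```python
def finger_table(node, bits):
    """Routing table of `node`: one entry per power of two, i.e. log2(N) entries."""
    size = 1 << bits
    return [(node + (1 << k)) % size for k in range(bits)]


def route(src, key, bits):
    """Greedy clockwise routing from `src` to the node owning `key`.

    Assumption: all 2**bits identifiers are live nodes, so the owner of
    `key` is the node with that identifier. Returns the nodes visited.
    """
    size = 1 << bits
    path = [src]
    current = src
    while current != key:
        remaining = (key - current) % size
        # Keep only fingers that do not overshoot the key, then jump as far as possible.
        candidates = [f for f in finger_table(current, bits)
                      if (f - current) % size <= remaining]
        current = max(candidates, key=lambda f: (f - current) % size)
        path.append(current)
    return path


if __name__ == "__main__":
    bits = 4  # 16 nodes, identifiers 0..15
    print(route(0, 11, bits))
    worst = max(len(route(s, k, bits)) - 1
                for s in range(1 << bits) for k in range(1 << bits))
    print("worst-case hops:", worst)
```

Each hop removes the highest set bit of the remaining clockwise distance, so a lookup never needs more than `bits` hops while each node stores only `bits` routing entries.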
Current Work on Checkpoint Recovery

Cluster Federation
Execution of multithreaded applications in cluster federations
A coherence protocol for cached copies of volatile objects in peer-to-peer systems (multiple failures tolerated)
Hierarchical checkpointing protocol for code coupling applications
Cluster single system image (SSI) operating system: Kerrighed
Full POSIX thread interface
- Global process and memory management
- Configurable global scheduler
High availability
- Dynamic resource management for tolerating cluster reconfigurations (node addition, eviction or failure)
- Checkpoint recovery mechanisms
Goals for Checkpoint Recovery in Kerrighed


Experimental platform for checkpointing strategies for parallel applications
Basic mechanisms common to different checkpointing protocols in MP and SM systems
Being able to checkpoint any kind of parallel application
Transparent checkpointing
Implementation in a single system of various checkpointing strategies
To allow the programmer to choose a suitable strategy for a particular application
To be able to compare several strategies with realistic (industrial) applications
Avoid code duplication in the system
Robustness
Fair comparison

Common framework
- Dependency management
- Unified model for message-passing and shared memory models
- Direct Dependency Vector (DDV) management
- Message logging
- Incremental checkpointing
- Checkpointing in background
- Communication system
- Checkpoint and rollback servers
- Checkpoint numbering
- Atomic multicast
- Stable storage
- Different implementations: disk, memory
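Incremental checkpointing, listed among the common mechanisms above, is generally understood as saving only the data modified since the previous checkpoint. The sketch below illustrates that general idea on a toy page table. The class `IncrementalCheckpointer` and its methods are hypothetical and do not describe Kerrighed's kernel-level implementation.

```python
class IncrementalCheckpointer:
    """Toy address space made of numbered pages, with dirty-page tracking."""

    def __init__(self, pages):
        self.pages = dict(pages)      # page number -> contents
        self.dirty = set(self.pages)  # every page is dirty before the first checkpoint
        self.checkpoints = []         # each entry holds only the pages saved at that point

    def write(self, page, data):
        """Modify a page and remember that it must be saved next time."""
        self.pages[page] = data
        self.dirty.add(page)

    def checkpoint(self):
        """Save only the pages written since the previous checkpoint."""
        delta = {p: self.pages[p] for p in self.dirty}
        self.checkpoints.append(delta)
        self.dirty.clear()
        return delta

    def restore(self):
        """Rebuild the last checkpointed state by replaying the deltas in order."""
        state = {}
        for delta in self.checkpoints:
            state.update(delta)
        return state


if __name__ == "__main__":
    ck = IncrementalCheckpointer({0: "a", 1: "b", 2: "c"})
    print(ck.checkpoint())  # first checkpoint saves all three pages
    ck.write(1, "B")
    print(ck.checkpoint())  # second checkpoint saves page 1 only
    print(ck.restore())
```

The trade-off is that recovery has to replay the chain of deltas, so a real system would periodically merge them or take a full checkpoint.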
Checkpoint Recovery in Kerrighed: Current Status and Work Directions

Current Status
- Linux-based Kerrighed prototype (2.4)
- Small kernel patch and a set of modules
- Transparent checkpoint recovery for individual (computing) processes
- Virtualization of a process in the cluster
- Unique ghost mechanism for process migration, checkpointing and restoration
- Easy specialization of the stable storage implementation
- Ghost can be sent to or retrieved from network, memory or disk
Work Directions
- Complete the debugging of coordinated checkpointing (and recovery) for multithreaded and message-passing based applications
- Checkpointable locks and barriers in a cluster
- Disk I/O management
- POSIX extension for a proper integration of transparent checkpointing/recovery in the operating system
[Figure: ghost process; labels Duplication, Migration, Checkpoint/restart, Memory, Disk, Network]
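The ghost idea on this slide, a single export/import mechanism whose target can be memory, disk or the network, can be sketched in user space as follows. This is only an illustration under stated assumptions: the names `MemoryGhost`, `DiskGhost`, `export_process` and `import_process` are hypothetical, the process state is a plain dictionary, and `pickle` stands in for the kernel-level serialization that Kerrighed actually performs.

```python
import os
import pickle
import tempfile


class MemoryGhost:
    """Ghost target backed by a bytes buffer held in memory."""

    def __init__(self):
        self._data = b""

    def write(self, data):
        self._data = data

    def read(self):
        return self._data


class DiskGhost:
    """Ghost target backed by a file, standing in for disk-based stable storage."""

    def __init__(self, path):
        self.path = path

    def write(self, data):
        with open(self.path, "wb") as f:
            f.write(data)

    def read(self):
        with open(self.path, "rb") as f:
            return f.read()


def export_process(state, ghost):
    """Serialize a (toy) process state into any ghost target."""
    ghost.write(pickle.dumps(state))


def import_process(ghost):
    """Rebuild the process state from a ghost target."""
    return pickle.loads(ghost.read())


if __name__ == "__main__":
    state = {"pid": 42, "registers": [1, 2, 3], "memory": {"0x1000": b"data"}}
    path = os.path.join(tempfile.mkdtemp(), "ghost.bin")
    for ghost in (MemoryGhost(), DiskGhost(path)):
        export_process(state, ghost)
        print(type(ghost).__name__, import_process(ghost) == state)
```

Because `export_process` and `import_process` only rely on `write` and `read`, checkpointing to disk, checkpointing to memory and (with a socket-backed target) migration would share the same code path, which is the point the slide makes about specializing stable storage.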
Hierarchical Checkpoint Recovery for Cluster Federations
Relaxed inter-cluster synchronism to reflect the architecture
- Coordinated checkpointing in a cluster
- Communication-induced checkpointing between clusters
- Independent checkpoints in each cluster
- Forced checkpoints when a communication generates a new dependency
- Force a checkpoint only if the sender has saved a checkpoint since its last send
Several cluster checkpoints are kept
Management of Direct Dependency Vectors (DDV) to detect dependencies
- DDV included in inter-cluster messages
- DDV associated with cluster checkpoints
- Garbage collection of useless cluster checkpoints
Evaluation by discrete-event simulation
Works well if
- Few inter-cluster communications
- Inter-cluster communications « quasi-unidirectional »
[Figure labels: Simulation, Processing, Display]
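The forced-checkpoint rule of this slide can be sketched as follows. This is a simplified model under stated assumptions: each cluster is reduced to a checkpoint sequence number and a Direct Dependency Vector, a message carries only the sender's name and sequence number rather than the full DDV, and the class name `Cluster` and its methods are hypothetical. Recording the new dependency in the DDV before saving the forced checkpoint is also an assumption of this sketch.

```python
class Cluster:
    """One cluster of the federation, seen as a single checkpointing unit."""

    def __init__(self, name, peers):
        self.name = name
        self.sn = 0                          # sequence number of the last cluster checkpoint
        self.ddv = {p: 0 for p in peers}     # last sender checkpoint known for each peer
        self.ddv[name] = 0
        self.checkpoints = []                # several checkpoints are kept: (sn, DDV copy)
        self.forced = 0                      # number of forced checkpoints taken

    def checkpoint(self, forced=False):
        """Take a cluster checkpoint and store the current DDV with it."""
        self.sn += 1
        self.ddv[self.name] = self.sn
        self.checkpoints.append((self.sn, dict(self.ddv)))
        if forced:
            self.forced += 1

    def send(self):
        """Inter-cluster message header: sender name and its checkpoint number."""
        return (self.name, self.sn)

    def receive(self, message):
        """Force a checkpoint only if the sender checkpointed since its last send."""
        sender, sender_sn = message
        if sender_sn > self.ddv[sender]:
            self.ddv[sender] = sender_sn     # a new dependency is created
            self.checkpoint(forced=True)     # taken before the message is delivered


if __name__ == "__main__":
    a, b = Cluster("A", ["B"]), Cluster("B", ["A"])
    b.receive(a.send())   # A has not checkpointed yet: no forced checkpoint
    a.checkpoint()        # independent checkpoint in cluster A
    b.receive(a.send())   # new dependency: B is forced to checkpoint
    b.receive(a.send())   # same sender checkpoint: nothing to do
    print(b.forced, b.checkpoints)
```

With few inter-cluster messages, or messages flowing mostly in one direction, the condition in `receive` is rarely true, which matches the slide's observation about when the protocol works well.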
Future Work

Checkpoint recovery in the large (we plan to hire a PhD student)
- Dealing with applications with huge data sets executed in cluster federations
- Follow-up of our preliminary work on a hierarchical checkpointing protocol for code coupling applications in cluster federations
Based on Kerrighed experimental platform
- Not only basic coordinated checkpointing but also several variants of independent and communication-induced strategies
- Standard interface and basic building blocks
Implementation in Kerrighed of ideas studied in previous projects
- ICARE fault tolerant software DSM
  - Combining replication inherent to the DSM with the replication needed for ensuring recovery data stability
  - Extension of the coherence protocol to manage recovery data in memory
- HA-PSLS
  - Integration of a DSM and a parallel file system
- Upgrading ICARE
  - Cohabitation of persistent and memory checkpoints
  - Swap management (to avoid memory size limitation and to evict recovery data from memory)
  - Mapped file management (in-place checkpoints)
Kerrighed is registered as a community trademark.
http://www.kerrighed.org
[email protected]
Software Distribution

Kerrighed web site
- http://www.kerrighed.org (open since mid-November 2002)
- Open source under GPL licence
- Current version: Kerrighed V0.81 based on Linux 2.4.24
Kerrighed users mailing-list
- [email protected] (created in April 2003)
Kerrighed forum (created February 2004)
Notes
- Kerrighed is a registered trademark
- Kerrighed deposit at APP for each public release
- Kerrighed tutorial (in conjunction with ICS’04, Saint-Malo (France), June 27th, 2004)
RoadMap for Kerrighed Prototype

March 2004


April 2004 Kerrighed V1.00 (SSI-OSCAR)


SGFD
January 2005 Kerrighed V1.10
MPI (with migration)
64 bits (Opteron)
Checkpointing for parallel applications
July 2005 Kerrighed V2.0
High availability
Current Support: EDF

Kerrighed research prototype (2000-2003)
CRECO EDF/INRIA
- CIFRE Ph.D. grant (Geoffroy Vallée)
- Industrial Post-Doc (Renaud Lottiaux)
Experiments with first industrial applications provided by EDF
- HRM1D, CATHARE, Cyrano 3, Aster
Kerrighed integration in OSCAR (2004-2005)
INRIA Industrial Post-Doc (G. Vallée) with EDF & ORNL
SSI-OSCAR
Current Support: DGA

Kerrighed robustness and full set of functionalities (2003-2005)
COCA PEA funded by DGA
- Partnership with CGEY and ONERA-CERT
- 2 full time engineers (Renaud Lottiaux, David Margery)
Experiments with industrial applications
- Ligase, Gorf3D, Mixsar, RTI HLA
Current Kerrighed Team (being part of the PARIS project-team)
Faculty
- Christine Morin (DR, INRIA)
Post-doc
- Geoffroy Vallée (PDI-EDF)
Engineers
- Renaud Lottiaux (INRIA)
- David Margery (INRIA)
PhD students
- Pascal Gallard (INRIA)
- Gaël Utard (INRIA)
- Louis Rilling (ENS-Cachan)
Invited researcher
- Isaac Scherson (UCI)
Master students
- Jamal Ghaffour
- Etienne Rivière
Former members
- Ramamurthy Badrinath (assistant professor, IIT Kharagpur, India), May 2002 – April 2003
- Viet Hoa Dinh (engineer), September 2001 – September 2002
- Jean-Yves Burlett (Master student, univ. Rennes 1), February-June 2001
- Sébastien Monnet (Master student, univ. Rennes 1), February-June 2003
- H. Maka (Bachelor student, IIT Kharagpur), May-July 2003
Academic Collaborations

University of Ulm, Germany


Rutgers University, USA



SSI-OSCAR
University of California, Irvine, USA


Myrinet, Infiniband
Self healing clusters
ORNL


Checkpointing for shared memory parallel applications
Global scheduling
Deakin University, Australia

SSI (informal contacts)