Single System Image OS for Clusters: Kerrighed
GGF10 - GridCPR-WG
PARIS project-team Activities in Checkpoint Recovery
Christine Morin
[email protected]
PARIS INRIA project-team
IRISA – Rennes (France)
http://www.irisa.fr/paris
Berlin, March 11th, 2004
Cluster Federations
A particular case of grid
Interconnection of several clusters of moderate size
Homogeneity and heterogeneity
More and more homogeneous platforms: PC, Linux
Heterogeneous networks (SAN, LAN, WAN)
Clusters with different amounts and kinds of resources
Considered applications
Scientific applications (numerical simulation)
Sequential and parallel applications based either on the shared memory or the message-passing communication paradigm
Code coupling applications
Applications requiring a huge amount of resources (memory, computing power)
Dynamicity
A cluster may join or leave the federation at any time
Individual nodes may fail in a cluster
Grid-aware OS for Cluster Federations
A single system image OS on each cluster
A cluster appears as a single machine which offers a kind of standard interface
Mosix, Amoeba, Kerrighed
A cluster federation is seen as a set of peers
Structured peer to peer (P2P) network (instead of a hierarchy)
Fully decentralized control
Native support for dynamicity
Designed for scalability
Size of the routing tables bounded by log(N)
Probabilistic log(N) bounds on the number of routing hops
“Standardization” of the APIs (IRIS project)
Promising work to take into account the network's topology and security issues (Pastry)
Structured P2P systems usually provide distributed hash tables (DHT)
Building block for higher level services
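The logarithmic bounds on routing state and hop count can be illustrated with a small simulation. The sketch below is not Pastry and not any Kerrighed component: it swaps in a simplified Chord-style ring with finger tables, chosen only because it is compact. The names `Ring`, `key_id`, `lookup` and the 16-bit identifier space are assumptions made for this example.

```python
import hashlib
from bisect import bisect_left

M = 16            # identifier bits (illustrative assumption)
RING = 1 << M     # size of the identifier space


def key_id(name: str) -> int:
    """Hash a node name or a key onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING


class Ring:
    """Static Chord-style ring: every node keeps M routing entries."""

    def __init__(self, node_names):
        self.ids = sorted({key_id(n) for n in node_names})
        # Finger i of node n is the first node at or after n + 2**i.
        self.fingers = {
            n: [self.successor((n + (1 << i)) % RING) for i in range(M)]
            for n in self.ids
        }

    def successor(self, ident: int) -> int:
        """First node clockwise at or after ident (the owner of ident)."""
        i = bisect_left(self.ids, ident)
        return self.ids[i % len(self.ids)]

    def lookup(self, start: int, key: int):
        """Route from node `start` to the owner of `key`; return (owner, hops)."""
        owner = self.successor(key)
        node, hops = start, 0
        while node != owner:
            to_key = (key - node) % RING
            # Fingers lying strictly between the current node and the key.
            preceding = [f for f in self.fingers[node]
                         if 0 < (f - node) % RING < to_key]
            if preceding:
                # Jump as far as possible without passing the key.
                node = max(preceding, key=lambda f: (f - node) % RING)
            else:
                # No node before the key: the immediate successor owns it.
                node = self.fingers[node][0]
            hops += 1
        return owner, hops
```

Each node stores M entries regardless of how many nodes join, and every hop at least halves the remaining distance to the last node before the key, which is where the logarithmic hop bound comes from. A distributed hash table is then a `put`/`get` layer that stores each value on the node returned by `lookup`.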
Current Work on Checkpoint Recovery
Cluster Federation
Execution of multithreaded applications in cluster federations
A coherence protocol for cached copies of volatile objects in peer-to-peer systems (multiple failures tolerated)
Hierarchical checkpointing protocol for code coupling applications
Cluster single system image (SSI) operating system: Kerrighed
Full Posix thread interface
Global process and memory management
Configurable global scheduler
High availability
Dynamic resource management for tolerating cluster reconfigurations (node addition, eviction or failure)
Checkpoint recovery mechanisms
Goals for Checkpoint Recovery in Kerrighed
Experimental platform for checkpointing strategies for parallel applications
Basic mechanisms common to different checkpointing protocols in MP and SM systems
Being able to checkpoint any kind of parallel application
Transparent checkpointing
Implementation in a single system of various checkpointing strategies
To allow the programmer to choose a suitable strategy for a particular application
To be able to compare several strategies with realistic (industrial) applications
Avoid code duplication in the system
Robustness
Fair comparison
Common framework
Dependency management
Unified model for message-passing and shared memory models
Direct Dependency Vector (DDV) management
Message logging
Incremental checkpointing
Checkpointing in background
Communication system
Checkpoint and rollback servers
Checkpoint numbering
Atomic multicast
Stable storage
Different implementations: disk, memory
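Incremental checkpointing, listed above among the common mechanisms, can be pictured as saving only the pages modified since the previous checkpoint. The following is a minimal user-level sketch under that assumption. It is not Kerrighed code: `PagedMemory`, `checkpoint` and `restore` are hypothetical names, and a real kernel would detect writes through page protection rather than an explicit `write` call.

```python
class PagedMemory:
    """Toy address space that remembers which pages changed."""

    def __init__(self, n_pages: int, page_size: int = 4096):
        self.pages = [bytes(page_size) for _ in range(n_pages)]
        # Before the first checkpoint every page counts as modified.
        self.dirty = set(range(n_pages))

    def write(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def checkpoint(self) -> dict:
        """Return only the pages modified since the last checkpoint."""
        delta = {p: self.pages[p] for p in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def restore(n_pages: int, page_size: int, deltas: list) -> PagedMemory:
    """Rebuild a memory image by replaying checkpoints oldest first."""
    mem = PagedMemory(n_pages, page_size)
    for delta in deltas:
        for page_no, data in delta.items():
            mem.pages[page_no] = data
    mem.dirty.clear()
    return mem
```

The first checkpoint is a full image and later ones are deltas, so recovery needs the whole chain. That is one reason the choice of stable storage (disk or memory) matters for recovery time.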
Checkpoint Recovery in Kerrighed: Current Status and Work Directions
Current Status
Linux-based Kerrighed prototype (2.4)
Small kernel patch and a set of modules
Transparent checkpoint recovery for (computing) individual processes
Virtualization of a process in the cluster
Unique ghost mechanism for process migration, checkpointing and restoration
Easy specialization of the stable storage implementation
Ghost can be sent to or retrieved from network, memory or disk
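The idea of one ghost mechanism with interchangeable back ends can be sketched as a small interface. This is an illustration only: the class and function names (`Ghost`, `MemoryGhost`, `DiskGhost`, `export_process`, `import_process`) are hypothetical and do not reflect Kerrighed's kernel interfaces, and the process state is reduced to a plain dictionary serialized with `pickle`.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod


class Ghost(ABC):
    """Destination or source for an exported process image."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read(self) -> bytes: ...


class MemoryGhost(Ghost):
    """Keeps the image in memory (e.g. for a memory checkpoint)."""

    def __init__(self):
        self._data = b""

    def write(self, data: bytes) -> None:
        self._data = data

    def read(self) -> bytes:
        return self._data


class DiskGhost(Ghost):
    """Stores the image in a file (e.g. for a disk checkpoint)."""

    def __init__(self, path: str):
        self.path = path

    def write(self, data: bytes) -> None:
        with open(self.path, "wb") as f:
            f.write(data)

    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            return f.read()


def export_process(state: dict, ghost: Ghost) -> None:
    """Serialize the process state into any ghost back end."""
    ghost.write(pickle.dumps(state))


def import_process(ghost: Ghost) -> dict:
    """Rebuild the process state from any ghost back end."""
    return pickle.loads(ghost.read())


def demo():
    state = {"pid": 42, "registers": {"pc": 4096}, "pages": {0: b"\x00" * 8}}
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for ghost in (MemoryGhost(), DiskGhost(os.path.join(tmp, "ghost.bin"))):
            export_process(state, ghost)
            results.append(import_process(ghost))
    return state, results
```

Because `export_process` and `import_process` only see the `Ghost` interface, the same code path could serve migration (a network ghost), checkpointing (a disk or memory ghost) and restart, which is the code-sharing argument made on the slide.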
Work Directions
Complete the debugging of coordinated checkpointing (and recovery) for multithreaded and message-passing based applications
Checkpointable locks and barriers in a cluster
Disk I/O management
Posix extension for a proper integration of transparent checkpointing/recovery in the operating system
[Figure: ghost process; labels: Duplication, Migration, Checkpoint/restart; Memory, Disk, Network]
Hierarchical Checkpoint Recovery for Cluster Federations
Relaxed inter-cluster synchronism to reflect the architecture
Coordinated checkpointing in a cluster
Communication-induced checkpointing between clusters
Independent checkpoints in each cluster
Forced checkpoints when a communication generates a new dependency
Evaluation by discrete-event simulation
Works well if
Few inter-cluster communications
Inter-cluster communications "quasi-unidirectional"
Force a checkpoint only if the sender has saved a checkpoint since its last send
Several cluster checkpoints are kept
Management of Direct Dependency Vectors (DDV) to detect dependencies
DDV included in inter-cluster messages
DDV associated with cluster checkpoints
Garbage collection of useless cluster checkpoints
[Figure: labels: Simulation, Processing, Display]
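The forced-checkpoint rule between clusters can be sketched as follows. This is a simplified model built on stated assumptions, not the protocol's actual implementation: each cluster is reduced to one object, `sn` stands for the sequence number of its latest cluster checkpoint, only that number is piggybacked on messages, and the order "update the DDV, then save" is a choice made for the example. The names `Cluster`, `send` and `receive` are hypothetical.

```python
class Cluster:
    """One cluster of the federation, seen as a single checkpointing unit."""

    def __init__(self, name: str, others: list):
        self.name = name
        self.sn = 0                           # latest cluster checkpoint number
        self.ddv = {o: 0 for o in others}     # last checkpoint number seen per cluster
        self.checkpoints = []                 # several cluster checkpoints are kept

    def checkpoint(self, forced: bool = False) -> None:
        """Save a cluster checkpoint stamped with a copy of the DDV."""
        self.sn += 1
        self.checkpoints.append(
            {"sn": self.sn, "ddv": dict(self.ddv), "forced": forced}
        )

    def send(self) -> tuple:
        """Inter-cluster message header: sender name and its current number."""
        return (self.name, self.sn)

    def receive(self, header: tuple) -> bool:
        """Handle an incoming message; return True if a checkpoint was forced."""
        sender, sender_sn = header
        if sender_sn > self.ddv[sender]:
            # The sender has checkpointed since its last message seen here:
            # the message creates a new dependency, so save a checkpoint
            # before the message is delivered to the application.
            self.ddv[sender] = sender_sn
            self.checkpoint(forced=True)
            return True
        return False
```

With few inter-cluster messages, or with traffic flowing mostly in one direction, the condition in `receive` is rarely true, so forced checkpoints stay rare. This matches the conditions under which the slide says the protocol works well.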
Future Work
Checkpoint recovery in the large (we plan to hire a PhD student)
Dealing with applications with huge data sets executed in cluster federations
Follow-up of our preliminary work on a hierarchical checkpointing protocol for code coupling applications in cluster federations
Based on Kerrighed experimental platform
Not only basic coordinated checkpointing but also various variants of
independent and communication-induced strategies
Standard interface and basic building blocks
Implementation in Kerrighed of ideas studied in previous projects
ICARE fault tolerant software DSM
Combining replication inherent to the DSM with the replication needed for ensuring recovery data stability
Extension of the coherence protocol to manage recovery data in memory
HA-PSLS
Integration of a DSM and a parallel file system
Upgrading ICARE
Cohabitation of persistent and memory checkpoints
Swap management (to avoid memory size limitation and to evict recovery data from memory)
Mapped file management (in-place checkpoints)
Kerrighed is registered as a community trademark.
http://www.kerrighed.org
[email protected]
Software Distribution
Kerrighed web site
http://www.kerrighed.org (open since mid-November 2002)
Open source under GPL licence
Current version: Kerrighed V0.81 based on Linux 2.4.24
Kerrighed users mailing-list
[email protected] (created in April 2003)
Kerrighed forum (created February 2004)
Notes
Kerrighed is a registered trademark
Kerrighed deposit at APP for each public release
Kerrighed tutorial (in conjunction with ICS’04, Saint-Malo (France), June 27th, 2004)
Roadmap for Kerrighed Prototype
March 2004
April 2004 Kerrighed V1.00 (SSI-OSCAR)
SGFD
January 2005 Kerrighed V1.10
MPI (with migration)
64-bit (Opteron)
Checkpointing for parallel applications
July 2005 Kerrighed V2.0
High availability
Current Support: EDF
Kerrighed research prototype (2000-2003)
CRECO EDF/INRIA
CIFRE Ph.D. grant (Geoffroy Vallée)
Industrial Post-Doc (Renaud Lottiaux)
Experiments with first industrial applications provided by EDF
HRM1D, CATHARE, Cyrano 3, Aster
Kerrighed integration in OSCAR (2004-2005)
INRIA Industrial Post-Doc (G. Vallée) with EDF & ORNL
SSI-OSCAR
Current Support: DGA
Kerrighed robustness and full set of functionalities (2003-2005)
COCA PEA funded by DGA
Partnership with CGEY and ONERA-CERT
2 full-time engineers (Renaud Lottiaux, David Margery)
Experiments with industrial applications
Ligase, Gorf3D, Mixsar, RTI HLA
Current Kerrighed Team (part of the PARIS project-team)
Faculty: Christine Morin (DR, INRIA)
Post-doc: Geoffroy Vallée (PDI-EDF)
Engineers: Renaud Lottiaux (INRIA), David Margery (INRIA)
PhD students: Pascal Gallard (INRIA), Gaël Utard (INRIA), Louis Rilling (ENS-Cachan)
Invited researcher: Isaac Scherson (UCI)
Master students: Jamal Ghaffour, Etienne Rivière
Former members
Ramamurthy Badrinath (assistant professor, IIT Kharagpur, India), May 2002 – April 2003
Viet Hoa Dinh (engineer), September 2001 – September 2002
Jean-Yves Burlett (Master student, univ. Rennes 1), February-June 2001
Sébastien Monnet (Master student, univ. Rennes 1), February-June 2003
H. Maka (Bachelor student, IIT Kharagpur), May-July 2003
Academic Collaborations
University of Ulm, Germany
Rutgers University, USA
SSI-OSCAR
University of California, Irvine, USA
Myrinet, InfiniBand
Self-healing clusters
ORNL
Checkpointing for shared memory parallel applications
Global scheduling
Deakin University, Australia
SSI (informal contacts)