
Checkpointing Facility on a Metasystem
Yudith Cardinale and Emilio Hernández
Universidad Simón Bolívar
Caracas, Venezuela

Metasystems

Heterogeneous computational resources tied by middleware and accessible transparently through a network
Checkpointing is a key service for recovery from component failures

Java Checkpointing

Checkpoints can be taken in a machine-independent format
Strategies proposed for adding persistence to the Java execution environment:
  Language-level mechanisms
  Extensions to the JVM
  Checkpointing layer underneath the JVM
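
To make the first strategy (language-level mechanisms) concrete, here is a minimal sketch that uses standard Java object serialization to save and restore application state. It illustrates only that strategy; it is not the extended-JVM approach and not SUMA code. The class PrimeState and its methods are hypothetical names chosen for this example, and the code assumes a modern JDK (Java 7 or later).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: application state that can checkpoint itself
// through standard Java serialization (a language-level mechanism).
class PrimeState implements Serializable {
    private static final long serialVersionUID = 1L;

    final List<Integer> primes = new ArrayList<>();
    int candidate = 2; // next number to test

    // Continue the computation until `count` primes have been found.
    void computeUpTo(int count) {
        while (primes.size() < count) {
            if (isPrime(candidate)) {
                primes.add(candidate);
            }
            candidate++;
        }
    }

    // Trial division by the primes found so far (all primes below n).
    private boolean isPrime(int n) {
        for (int p : primes) {
            if ((long) p * p > n) break;
            if (n % p == 0) return false;
        }
        return true;
    }

    // Write the whole object graph to a file in the serialization format,
    // which does not depend on the machine architecture.
    void save(File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Rebuild the state from a previously written checkpoint file.
    static PrimeState restore(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (PrimeState) in.readObject();
        }
    }
}
```

A restored PrimeState resumes from the saved candidate rather than from 2. The drawback of this strategy is visible here: the application itself must be written to expose and save its state, which is what the transparency criterion is about.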

Java Checkpointing on Metasystems

The selection of an appropriate checkpointing approach for a metasystem should take into account:
  Portability
  Transparency
  Low intrusiveness
We chose the extended JVM approach

SUMA overview

Checkpointing facility in SUMA

Experiments

Application that calculates the first “n” prime numbers
143 MHz SUN Ultra 1 workstations with Solaris 7 and JDK 1.2.2
SUMACkpThreadMonitor takes a checkpoint every four minutes
Checkpoints are saved in a shared file system (NFS)
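
The slides do not show how SUMACkpThreadMonitor is implemented. As a rough illustration of the general idea of a monitor thread that triggers a checkpoint at a fixed interval, the following sketch uses the standard ScheduledExecutorService. The class PeriodicCheckpointer and its methods are hypothetical, and the checkpoint action is supplied by the caller; none of this is SUMA code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a checkpoint monitor: runs a caller-supplied
// checkpoint action at a fixed period on a background daemon thread.
class PeriodicCheckpointer implements AutoCloseable {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ckpt-monitor");
            t.setDaemon(true); // do not keep the JVM alive after the application ends
            return t;
        });
    private final AtomicInteger taken = new AtomicInteger();

    PeriodicCheckpointer(Runnable checkpointAction, long period, TimeUnit unit) {
        timer.scheduleAtFixedRate(() -> {
            try {
                checkpointAction.run();
                taken.incrementAndGet();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs,
                // so report the failure and keep the schedule alive.
                System.err.println("checkpoint failed: " + e);
            }
        }, period, period, unit);
    }

    // Number of checkpoints completed so far.
    int checkpointsTaken() {
        return taken.get();
    }

    // Stop the monitor thread.
    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

To mirror the interval used in the experiment, a caller would pass 4 and TimeUnit.MINUTES. The action runs on a different thread from the computation, so in a real setting it must capture a consistent snapshot of the application state, for example by synchronizing with the worker thread.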

Results

Ckpt size   Ckpt time   Recovery time
65 KB       1.44 sec    4.67 sec
125 KB      2.58 sec    7.02 sec
165 KB      3.87 sec    9.11 sec
310 KB      7.26 sec    16.46 sec

Execution time without ckpt = 819 min 7 sec

Execution time with ckpt = 868 min 41 sec

201 checkpoints were taken
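
The overhead implied by these two execution times can be worked out directly. The sketch below converts both times to seconds and derives the total overhead, the relative overhead, and the average cost per checkpoint. The class and method names are illustrative; the inputs are the figures reported above. The per-checkpoint average assumes the whole difference is attributable to the 201 checkpoints.

```java
import java.util.Locale;

// Illustrative helper: names are assumptions, inputs come from the reported results.
class CheckpointOverhead {
    static long toSeconds(long minutes, long seconds) {
        return minutes * 60 + seconds;
    }

    // Extra run time with checkpointing enabled, in seconds.
    static long overheadSeconds(long withoutCkpt, long withCkpt) {
        return withCkpt - withoutCkpt;
    }

    // Overhead as a percentage of the run without checkpointing.
    static double overheadPercent(long withoutCkpt, long withCkpt) {
        return 100.0 * (withCkpt - withoutCkpt) / withoutCkpt;
    }

    public static void main(String[] args) {
        long withoutCkpt = toSeconds(819, 7);  // 49147 s
        long withCkpt = toSeconds(868, 41);    // 52121 s
        int checkpoints = 201;

        long overhead = overheadSeconds(withoutCkpt, withCkpt); // 2974 s
        double percent = overheadPercent(withoutCkpt, withCkpt);
        double perCheckpoint = (double) overhead / checkpoints;

        // Prints: Overhead: 2974 s (6.05%), about 14.8 s per checkpoint
        System.out.printf(Locale.ROOT,
            "Overhead: %d s (%.2f%%), about %.1f s per checkpoint%n",
            overhead, percent, perCheckpoint);
    }
}
```

In other words, checkpointing added 49 min 34 sec to the run, roughly 6% of the execution time without checkpointing.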

Conclusions and future work

Transparent checkpointing and recovery services in a Java-based metasystem
Architecture-independent Java checkpoints
Ongoing research: extension to parallel applications, and heuristics to activate checkpointing automatically