Profiling Facility on a Metasystem
Checkpointing Facility on a Metasystem
Yudith Cardinale and Emilio Hernández
Universidad Simón Bolívar
Caracas, Venezuela
Metasystems
Heterogeneous computational resources tied by middleware and accessible transparently through a network
Checkpointing is a key service for recovery from component failures
Java Checkpointing
Checkpoints can be taken in a machine-independent format
Strategies proposed for adding persistence to the Java execution environment:
Language-level mechanisms
Extensions to the JVM
Checkpointing layer underneath the JVM
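To make the first strategy concrete, here is a minimal sketch of a language-level checkpoint that uses standard Java serialization, whose byte format does not depend on the host architecture. It illustrates the language-level mechanism only, not the extended-JVM approach adopted later in these slides. The class names `PrimeCheckpointSketch` and `PrimeState` and the checkpoint file are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PrimeCheckpointSketch {

    /** Hypothetical application state: primes found so far and the next candidate. */
    static class PrimeState implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Integer> primes = new ArrayList<>();
        int nextCandidate = 2;

        /** Tests one candidate by trial division against the primes found so far. */
        void step() {
            boolean isPrime = true;
            for (int p : primes) {
                if ((long) p * p > nextCandidate) break;
                if (nextCandidate % p == 0) { isPrime = false; break; }
            }
            if (isPrime) primes.add(nextCandidate);
            nextCandidate++;
        }
    }

    /** Writes the state to a file using Java's serialization format. */
    static void save(PrimeState state, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        }
    }

    /** Reads a previously saved state back, possibly on a different machine. */
    static PrimeState restore(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (PrimeState) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("primes", ".ckpt");
        PrimeState state = new PrimeState();
        while (state.primes.size() < 5) state.step();
        save(state, file);                       // checkpoint after five primes

        PrimeState resumed = restore(file);      // simulate recovery after a failure
        while (resumed.primes.size() < 10) resumed.step();
        System.out.println(resumed.primes);      // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
        Files.delete(file);
    }
}
```

The limitation of this style is that the application itself must decide what to save and when, which is why it rates poorly on transparency and intrusiveness.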
Java Checkpointing on Metasystems
The selection of an appropriate checkpointing approach for a metasystem should take into account:
Portability
Transparency
Low intrusiveness
We chose the extended JVM approach
SUMA overview
Checkpointing facility in SUMA
Experiments
Application that calculates the first “n” prime numbers
143 MHz Sun Ultra 1 workstations with Solaris 7 and JDK 1.2.2
SUMACkpThreadMonitor takes a checkpoint every four minutes
Checkpoints are saved in a shared file system (NFS)
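The periodic policy described above can be sketched with a timer thread that invokes a checkpoint action at a fixed interval. This is not the SUMACkpThreadMonitor implementation, which is not shown in these slides. `PeriodicCheckpointer` and its methods are hypothetical names, and the checkpoint action is supplied by the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical monitor that runs a checkpoint action at a fixed interval. */
public class PeriodicCheckpointer implements AutoCloseable {

    // Single daemon thread, so the monitor never keeps the JVM alive on its own.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "ckpt-monitor");
                t.setDaemon(true);
                return t;
            });
    private final AtomicInteger taken = new AtomicInteger();

    public PeriodicCheckpointer(Runnable checkpointAction, long interval, TimeUnit unit) {
        // Note: if checkpointAction throws, scheduleAtFixedRate cancels later runs.
        timer.scheduleAtFixedRate(() -> {
            checkpointAction.run();
            taken.incrementAndGet();
        }, interval, interval, unit);
    }

    /** Number of checkpoint actions that have completed so far. */
    public int checkpointsTaken() {
        return taken.get();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        // The experiment used a four-minute interval; a short one is used here
        // so that the demonstration finishes quickly.
        try (PeriodicCheckpointer monitor = new PeriodicCheckpointer(
                () -> System.out.println("checkpoint taken"), 100, TimeUnit.MILLISECONDS)) {
            Thread.sleep(350);
            System.out.println("checkpoints so far: " + monitor.checkpointsTaken());
        }
    }
}
```

For the four-minute policy, the constructor call would be `new PeriodicCheckpointer(action, 4, TimeUnit.MINUTES)`. A real checkpoint action must also make sure it captures the application state at a consistent point, which this sketch leaves to the caller.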
Results
Ckpt size    Ckpt time    Recovery time
65 KB        1.44 sec.    4.67 sec.
125 KB       2.58 sec.    7.02 sec.
165 KB       3.87 sec.    9.11 sec.
310 KB       7.26 sec.    16.46 sec.
Execution time without ckpt = 819 min 7 sec
Execution time with ckpt = 868 min 41 sec
201 checkpoints were taken
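The overhead implied by these figures can be worked out directly. The short sketch below does the arithmetic, assuming that the whole difference between the two runs is due to checkpointing, which the slides do not state explicitly. The class and method names are hypothetical.

```java
import java.util.Locale;

public class CheckpointOverhead {

    /** Converts a "minutes and seconds" duration to seconds. */
    static long toSeconds(long minutes, long seconds) {
        return minutes * 60 + seconds;
    }

    /** Relative overhead of the checkpointed run, in percent. */
    static double overheadPercent(long withoutCkpt, long withCkpt) {
        return 100.0 * (withCkpt - withoutCkpt) / withoutCkpt;
    }

    public static void main(String[] args) {
        long withoutCkpt = toSeconds(819, 7);   // 49147 s
        long withCkpt = toSeconds(868, 41);     // 52121 s
        long overhead = withCkpt - withoutCkpt; // 2974 s, i.e. 49 min 34 s
        int checkpoints = 201;

        System.out.println(String.format(Locale.ROOT,
                "overhead = %d s (%.2f %%), %.2f s per checkpoint",
                overhead,
                overheadPercent(withoutCkpt, withCkpt),
                (double) overhead / checkpoints));
        // prints: overhead = 2974 s (6.05 %), 14.80 s per checkpoint
    }
}
```

Under that assumption the checkpointed run is about 6 % slower, at an average cost of roughly 15 seconds per checkpoint.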
Conclusions and future work
Transparent checkpointing and recovery services in a Java-based metasystem
Architecture-independent Java checkpoints
Ongoing research: extend to parallel applications and heuristics to activate checkpointing automatically