PPT - Computer Science Department, Technion
Transparent Fault-Tolerant Java Virtual Machine
Roy Friedman & Alon Kama
Computer Science — Technion
FT-JVM Goals
Fault-tolerant environment for executing Java applications
Highly Reliable
Fault tolerance can be increased by using more machines
Low Maintenance
Apps should execute without interruption, overcoming failures of individual machines
Apps should not have to be modified in order to run on the system
Recovery upon failure of individual machines should be swift
Transparency
Failures should be masked, and the transition to another machine should be transparent
Fault Tolerance by Replication
Replication: coordinating a set of replicas of the computation on processors that fail independently
Potential for a dramatic decrease in Mean Time To Repair (MTTR)
Achieve t-fault-tolerance (surviving up to t crash failures) with t+1 replicas
Increased hardware cost, since the computation is duplicated
Overhead and complexity of maintaining consistency
Replication + Transparency (masking of failures, maintaining the illusion of a single copy) = High availability
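Why a lower MTTR translates into high availability can be seen from the standard steady-state availability formula, where MTTF is the mean time to failure. The figures below are hypothetical, chosen only to show the effect: an MTTF of 1000 hours, with an MTTR of 10 hours for a manual restart versus 0.01 hours (36 seconds) for an automatic switch to a replica.

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
\qquad
\begin{aligned}
\mathrm{MTTR} = 10\,\mathrm{h}:&\quad A = \tfrac{1000}{1010} \approx 0.9901\\
\mathrm{MTTR} = 0.01\,\mathrm{h}:&\quad A = \tfrac{1000}{1000.01} \approx 0.99999
\end{aligned}
```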
Replication for Java
Replication at the Java Virtual Machine level
Replication at this level is cost-effective, portable, and transparent to the application developer and the user
Approach extends Bressoud & Schneider (1995), who implemented active replication below the operating system
T. Bressoud and F. Schneider. Hypervisor-based Fault-Tolerance. SOSP-15, 1995.
Design of the FT-JVM
Replication requires deterministic execution, which is difficult to achieve because of:
Preemptive context switches
Lock contention in SMP
I/O availability differences
Environment-specific attributes
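Preemptive context switches break determinism because a timer interrupt can land at a different point in each replica's execution. A minimal sketch of the alternative, assuming a toy interpreter in which each thread is a sequence of steps and the scheduler switches threads after a fixed number of executed steps rather than on a timer. `DeterministicScheduler`, `SimThread` and `QUANTUM` are hypothetical names, not the FT-JVM's code. Because every switch point depends only on the step counter, two runs (or a primary and a backup) produce the same interleaving.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Toy scheduler: switches threads after a fixed number of steps, never on wall-clock time. */
public class DeterministicScheduler {
    /** Hypothetical quantum: steps a thread runs before a forced switch. */
    static final int QUANTUM = 3;

    /** A simulated thread: a name plus a count of remaining steps. */
    static final class SimThread {
        final String name;
        int remaining;
        SimThread(String name, int steps) { this.name = name; this.remaining = steps; }
    }

    /** Runs all threads round-robin and returns the trace of executed steps. */
    static List<String> run(List<SimThread> threads) {
        ArrayDeque<SimThread> ready = new ArrayDeque<>(threads);
        List<String> trace = new ArrayList<>();
        while (!ready.isEmpty()) {
            SimThread t = ready.poll();
            // Execute up to QUANTUM steps; the switch point depends only on this counter.
            for (int i = 0; i < QUANTUM && t.remaining > 0; i++) {
                trace.add(t.name);
                t.remaining--;
            }
            if (t.remaining > 0) ready.add(t); // requeue at the tail: round-robin order
        }
        return trace;
    }

    public static void main(String[] args) {
        List<String> a = run(List.of(new SimThread("A", 5), new SimThread("B", 4)));
        List<String> b = run(List.of(new SimThread("A", 5), new SimThread("B", 4)));
        System.out.println(a);
        System.out.println(a.equals(b)); // same inputs, same interleaving
    }
}
```

With thread A needing 5 steps and B needing 4, this prints `[A, A, A, B, B, B, A, A, B]` followed by `true`.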
Changes made to the VM:
Deterministic thread scheduling
Deterministic thread switching
Non-deterministic operations relay their information to the replication module
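A minimal sketch of relaying non-deterministic operations, assuming the primary evaluates each such operation (here, reading a clock) and passes the result to its replication engine, while a backup skips the operation and takes the primary's result in the same order. `ReplicationEngine`, `nonDeterministic` and the in-memory queue standing in for the network link are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

/**
 * Hypothetical replication engine: the primary records the results of
 * non-deterministic operations; a backup replays them in the same order.
 * A shared in-memory queue stands in for the link between machines.
 */
public class ReplicationEngine {
    private final Queue<Long> channel; // stand-in for the primary-to-backup link
    private final boolean isPrimary;

    ReplicationEngine(Queue<Long> channel, boolean isPrimary) {
        this.channel = channel;
        this.isPrimary = isPrimary;
    }

    /**
     * On the primary, evaluates the operation and passes its result to the engine.
     * On a backup, ignores the operation and retrieves the primary's result instead.
     */
    long nonDeterministic(LongSupplier op) {
        if (isPrimary) {
            long value = op.getAsLong();
            channel.add(value);
            return value;
        }
        Long value = channel.poll();
        if (value == null) throw new IllegalStateException("backup is ahead of primary");
        return value;
    }

    public static void main(String[] args) {
        Queue<Long> link = new ArrayDeque<>();
        ReplicationEngine primary = new ReplicationEngine(link, true);
        ReplicationEngine backup = new ReplicationEngine(link, false);

        long p1 = primary.nonDeterministic(System::currentTimeMillis);
        long p2 = primary.nonDeterministic(System::nanoTime);
        // The backup's own clock is never consulted; it sees the primary's values.
        long b1 = backup.nonDeterministic(System::currentTimeMillis);
        long b2 = backup.nonDeterministic(System::nanoTime);
        System.out.println(p1 == b1 && p2 == b2); // true
    }
}
```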
Design of the FT-JVM
Replication module:
One replication engine per processor, on both the primary and the backups
Data packages are passed to the engine on the primary and retrieved from it on the backups
Threads waiting for I/O yield instead of blocking, and are rescheduled at specific intervals
I/O is checked at the beginning of a frame; a frame boundary is determined by X context switches or by the lack of schedulable application threads
[Diagram: the primary sends the data for frame n, then frame n+1, to the backup]
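A sketch of frame-based I/O delivery under stated assumptions: `FRAME_SWITCHES` plays the role of X, I/O is modeled as lists of strings, and a queue replaces the network between primary and backup; `FrameIo`, `runSchedule` and the other names are hypothetical. The primary polls its (simulated) devices only at a frame boundary and forwards the result as that frame's data; the backup runs the same schedule and, at each boundary, takes the next package instead of polling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Supplier;

/**
 * Hypothetical frame-based I/O: a frame ends after FRAME_SWITCHES context
 * switches, or earlier if no application thread is runnable. I/O is checked
 * only at a frame boundary, so a backup that applies the same data at the
 * same boundaries sees I/O at exactly the same points in its execution.
 */
public class FrameIo {
    static final int FRAME_SWITCHES = 4; // hypothetical value of X

    private int switchesInFrame = 0;
    private final Supplier<List<String>> ioSource; // real I/O on the primary, the link on a backup
    private final List<List<String>> delivered = new ArrayList<>(); // I/O handed to the app, per frame

    FrameIo(Supplier<List<String>> ioSource) { this.ioSource = ioSource; }

    /** Called by the scheduler on every context switch. */
    void onContextSwitch(boolean anyRunnable) {
        switchesInFrame++;
        if (switchesInFrame >= FRAME_SWITCHES || !anyRunnable) {
            delivered.add(ioSource.get()); // I/O is checked only at a frame boundary
            switchesInFrame = 0;
        }
    }

    List<List<String>> delivered() { return delivered; }

    /** The same deterministic schedule on every replica: at switch 6 every thread waits for I/O. */
    static void runSchedule(FrameIo vm) {
        for (int s = 1; s <= 10; s++) vm.onContextSwitch(s != 6);
    }

    public static void main(String[] args) {
        Queue<List<String>> link = new ArrayDeque<>(); // stand-in for the network
        int[] polls = {0};
        // Primary: poll the simulated devices, then forward the result as frame data.
        FrameIo primary = new FrameIo(() -> {
            List<String> data = List.of("input-" + polls[0]++);
            link.add(data);
            return data;
        });
        // Backup: never touches devices; it takes the primary's frame data in order.
        FrameIo backup = new FrameIo(link::poll);

        runSchedule(primary);
        runSchedule(backup);
        System.out.println(primary.delivered());
        System.out.println(primary.delivered().equals(backup.delivered()));
    }
}
```

Running `main` prints `[[input-0], [input-1], [input-2]]` and then `true`: the second frame closes early at switch 6, when no thread is runnable, on both replicas.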
Performance Results
[Figures: performance graphs, including an SMP Raytrace benchmark]
Conclusion
Ideal for long-running, low-I/O Java applications
Only a small performance degradation, even with frequent synchronization between replicas (e.g., every second)
Quick detection and recovery from failure