G-JavaMPI:
A Grid Middleware for Distributed Java Computing with
MPI Binding and Process Migration Supports
Lin Chen, Cho-Li Wang, Francis C. M. Lau and Ricky K. K. Ma
Department of Computer Science and Information Systems
The University of Hong Kong
{lchen2+clwang+fcmlau+kk1ma}@csis.hku.hk
Outline
Motivation
Overall system architecture
Detailed Issues
Related works
Conclusion & Future Work
GCC2002 Presentation
Lin Chen, CSIS, HKU (Dec. 26, 2002)
Motivation
Grid computing: large-scale resource sharing and high
performance
Globus Project: provides the basic services required to build
and use a Grid
(authentication, security, resource allocation, remote data access,
information services, etc.)
However:
- long-running applications require continuous computation
- better utilization of resources requires scheduling and load balancing
Java process migration:
- architecture-independent bytecode makes migration easier
Motivation
Let the programmer write a grid application easily
- no need to distinguish inter-site from intra-site
  communication (the programmer must handle this when using the Globus
  communication libraries directly)
- SPMD: one program can be executed at multiple places or
  sites
MPI paradigm
- a group of distributed processes that can perform peer-to-peer or
  collective communication
- communication source and destination addresses are
  independent of the real physical network addresses (adaptable)
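The point that MPI addresses are logical rather than physical can be illustrated with a small sketch. It assumes a hypothetical `RankDirectory` class that maps each MPI rank to a host and port. The class names, host names, and ports are illustrative and are not taken from G-JavaMPI.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical physical location of a process (not a G-JavaMPI type).
final class Endpoint {
    final String host;
    final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}

// Hypothetical table from logical MPI rank to current physical endpoint.
final class RankDirectory {
    private final Map<Integer, Endpoint> table = new ConcurrentHashMap<Integer, Endpoint>();

    // Record (or replace) the physical location of a rank.
    void register(int rank, Endpoint endpoint) {
        table.put(rank, endpoint);
    }

    // Resolve a logical rank to its current physical location.
    Endpoint resolve(int rank) {
        Endpoint e = table.get(rank);
        if (e == null) {
            throw new IllegalArgumentException("unknown rank " + rank);
        }
        return e;
    }
}

class RankDirectoryDemo {
    public static void main(String[] args) {
        RankDirectory dir = new RankDirectory();
        dir.register(1, new Endpoint("siteA-node2", 5000));
        System.out.println("rank 1 at " + dir.resolve(1));
        // After a migration only the directory entry changes;
        // senders keep addressing rank 1.
        dir.register(1, new Endpoint("siteB-node7", 5000));
        System.out.println("rank 1 at " + dir.resolve(1));
    }
}
```

Because application code only ever names a rank, a process can move to another site without any change to the programs that communicate with it.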
System Overview
[Figure: system architecture. Grid sites are connected over a WAN. Each site has a Globus Gatekeeper and local schedulers (LS) and runs Java-MPI processes inside JVMs. A migration module (M) resides in each JVM. Labels (1), (2), (3) and (1*), (2*), (3*) mark the MPI communication routes.]
Migrating: a new process is restarted through a Globus remote job
request with delegated user credentials and Java-MPI job
credentials.
(*) Some legacy messages are redirected during migration.
System overview
Globus Toolkit libraries
LS: local schedulers
M: migration modules
Java MPI communication daemons
Java-MPI processes (one shown before migration, one after migration)
(1) – (2) – (3): MPI communication route before migration
(1*) – (2*) – (3*): MPI communication route after migration
(*): Java MPI communication daemons redirect legacy messages that should
go to the migrated process
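The redirection of legacy messages can be sketched as a forwarding table kept by each communication daemon. This is a minimal single-threaded model under stated assumptions: the `CommDaemon` class, its methods, and the use of strings as messages are hypothetical and do not reflect the actual daemon, which the slides describe as an MPICH-G2 process.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of one communication daemon per site.
final class CommDaemon {
    private final String site;
    private final Map<Integer, Queue<String>> localQueues = new HashMap<Integer, Queue<String>>();
    private final Map<Integer, CommDaemon> forwarding = new HashMap<Integer, CommDaemon>();

    CommDaemon(String site) {
        this.site = site;
    }

    // A process with this rank now runs at this site.
    void attach(int rank) {
        localQueues.put(rank, new ArrayDeque<String>());
    }

    // The process moved away: hand over queued messages and
    // remember where later (legacy) messages must be sent.
    void migrate(int rank, CommDaemon destination) {
        Queue<String> pending = localQueues.remove(rank);
        destination.attach(rank);
        forwarding.put(rank, destination);
        if (pending != null) {
            for (String m : pending) {
                destination.deliver(rank, m);
            }
        }
    }

    // Deliver locally if the rank lives here, otherwise redirect.
    void deliver(int rank, String message) {
        Queue<String> q = localQueues.get(rank);
        if (q != null) {
            q.add(message);
            return;
        }
        CommDaemon next = forwarding.get(rank);
        if (next == null) {
            throw new IllegalStateException("rank " + rank + " unknown at site " + site);
        }
        next.deliver(rank, message);
    }

    // Returns the next message for the rank, or null if none is queued here.
    String receive(int rank) {
        Queue<String> q = localQueues.get(rank);
        return q == null ? null : q.poll();
    }
}
```

A sender that has not yet learned the new location still delivers to the old daemon, and the message reaches the migrated process through the forwarding entry.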
Layered design
Layers, from top to bottom:
Java-MPI Applications
Java-MPI API & Java API (Java-MPI API Layer)
JVM with Execution State Probe & Migration Plug-in, through JVMDI (Migration Layer)
Restorable Communication Services (Restorable MPI Comm Layer): Migration Instructions, Message Queues, Authentication, Info. Update
Load Balancing Module: Control Block, DLB Policy
MPICH-G2
Globus Services
OS
Hardware
Java-MPI binding
Restorable communication layer
- The daemon, a running MPICH-G2 process,
  provides MPI communication services
- It communicates with the Java-MPI process
  through IPC
- Post-migration message
  redirection
[Figure: the process space and the restorable communication layer]
Java Process Migration
State capturing:
- a probe attached to each JVM saves the process
  context through JVMDI (Java Virtual Machine Debug Interface)
  - all runtime data: PC register, stack frames, objects,
    method area (local variables), etc.
  - event notification: method_entry, frame_pop, etc.
- object serialization is used to package all reachable
  objects in the heap
- New JDK 1.4.0 & 1.4.1 released in Aug. 2002 support “full-speed debugging”
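The use of object serialization to package reachable objects can be sketched with the standard `java.io` streams. The `Node` and `HeapSnapshot` classes below are hypothetical stand-ins for application state. The sketch only shows that serializing one root object also captures everything reachable from it, and it does not involve JVMDI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical application object; its fields stand in for heap state.
final class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    final int value;
    Node next;

    Node(int value) {
        this.value = value;
    }
}

final class HeapSnapshot {
    // Serialize the root and every object reachable from it.
    static byte[] capture(Serializable root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object graph from the captured bytes.
    static Object restore(byte[] snapshot) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return in.readObject();
        }
    }
}

class HeapSnapshotDemo {
    public static void main(String[] args) throws Exception {
        Node a = new Node(1);
        Node b = new Node(2);
        a.next = b;
        b.next = a; // a cycle: shared references survive serialization
        Node copy = (Node) HeapSnapshot.restore(HeapSnapshot.capture(a));
        System.out.println(copy.value + " -> " + copy.next.value);
        System.out.println("cycle preserved: " + (copy.next.next == copy));
    }
}
```

Serialization follows references transitively and preserves shared and cyclic references, which is why a single root is enough to package a whole object graph.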
[Figure: through JVMDI, the probe obtains (1) execution state data and (2) event notifications from the JVM]
Java process migration
State Restoration:
- An exception handler is inserted in the bytecode
  (pre-processing before execution) to
  restore local variables and “jump” to the
  original execution point
- Objects are re-allocated when the JVM is restarted
- Dynamic class loading
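The restoration idea can be approximated at source level. The sketch below is an analogy under stated assumptions, not the actual bytecode transformation: `SavedFrame`, `RestoreSignal`, and `SumTask` are hypothetical names, the handler is written by hand rather than inserted by a pre-processor, and the loop index plays the role of the saved execution point because Java source has no `goto`.

```java
// Hypothetical saved frame: the local variables of one method activation.
final class SavedFrame {
    final int i;
    final long sum;

    SavedFrame(int i, long sum) {
        this.i = i;
        this.sum = sum;
    }
}

// Hypothetical signal raised at method entry when a restart is in progress.
final class RestoreSignal extends RuntimeException {
    private static final long serialVersionUID = 1L;
}

final class SumTask {
    SavedFrame captured; // set when the task stops at a migration point

    // Sums 0..n-1. Returns -1 if it stopped at iteration stopAt to migrate.
    long run(int n, int stopAt, SavedFrame restore) {
        int i = 0;
        long sum = 0;
        try {
            if (restore != null) {
                throw new RestoreSignal();
            }
        } catch (RestoreSignal signal) {
            // Handler: reload the locals, then fall through to the loop,
            // which continues from the saved iteration.
            i = restore.i;
            sum = restore.sum;
        }
        for (; i < n; i++) {
            if (i == stopAt) {
                captured = new SavedFrame(i, sum);
                return -1;
            }
            sum += i;
        }
        return sum;
    }
}

class SumTaskDemo {
    public static void main(String[] args) {
        SumTask before = new SumTask();
        before.run(10, 4, null);                     // stops at iteration 4
        SumTask after = new SumTask();               // stands in for the restarted process
        long result = after.run(10, -1, before.captured);
        System.out.println("resumed result = " + result);
    }
}
```

The resumed run produces the same result as an uninterrupted run because the restored locals make the loop continue exactly where the first run stopped.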

Information update
[Figure: message sequence among the migration source site, the other sites, and the migration destination site]
1. Migration begins at the source site
2. The other sites are notified
   (including the destination site)
3. The process arrives at the safe migration point
   (all legacy messages are consumed)
4. The local site information is updated with
   the process’s new location
5. Process state capture begins
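The ordering of the information update steps can be sketched as follows. This is one possible reading of the slide, with several assumptions: the `Site` and `MigrationUpdate` classes are hypothetical, notified sites are assumed to update their location tables as soon as they receive the notice, and legacy messages are modeled as plain strings in a queue.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-site view of where each rank currently runs.
final class Site {
    final String name;
    final Map<Integer, String> location = new HashMap<Integer, String>();

    Site(String name) {
        this.name = name;
    }

    void recordLocation(int rank, String siteName) {
        location.put(rank, siteName);
    }
}

final class MigrationUpdate {
    final List<String> log = new ArrayList<String>();

    void run(int rank, Site source, Site destination, List<Site> others,
             Deque<String> legacyMessages) {
        log.add("begin");
        // Notify every other site, including the destination
        // (assumption: each one updates its table on notice).
        destination.recordLocation(rank, destination.name);
        for (Site s : others) {
            s.recordLocation(rank, destination.name);
        }
        log.add("notified");
        // Safe migration point: consume all legacy messages first.
        while (!legacyMessages.isEmpty()) {
            legacyMessages.poll();
        }
        log.add("safe-point");
        // The source site records the new location last.
        source.recordLocation(rank, destination.name);
        log.add("source-updated");
        // Only now would state capture start.
        log.add("capture");
    }
}
```

Draining the legacy messages before capture means the captured state does not have to include in-flight messages addressed to the old location.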
Process Restart
Original process:
- creates a new user certificate proxy
  (proxy_init_cred), delegated to the remote site
- gets the resource allocation
- the new process can then be started
  (similar to a normal Globus job submission)
Newly started process:
- JVM initialization; the probe is started at the same time
- the process is suspended at the beginning while
  the probe reads the context from the dump file
- the execution context is restored
- the process resumes and
  continues from the last execution point
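The suspend, restore, and resume sequence on the new process can be sketched with a latch. Everything here is a simplified assumption: the dump file holds a single `long` counter instead of a real execution context, the "probe" is the calling thread, and `RestartSketch` and its methods are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;

final class RestartSketch {
    // Write a toy context (one counter) to the dump file.
    static void writeDump(Path file, long counter) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeLong(counter);
        }
    }

    // Read the toy context back from the dump file.
    static long readDump(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            return in.readLong();
        }
    }

    // Starts a worker that stays suspended until the context is restored,
    // then continues counting for extraSteps more steps.
    static long restartAndRun(Path dump, final int extraSteps)
            throws IOException, InterruptedException {
        final CountDownLatch restored = new CountDownLatch(1);
        final long[] context = new long[1];
        final long[] result = new long[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    restored.await(); // suspended at the beginning
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long c = context[0];
                for (int k = 0; k < extraSteps; k++) {
                    c++;
                }
                result[0] = c;
            }
        });
        worker.start();
        context[0] = readDump(dump); // the probe restores the context
        restored.countDown();        // the process resumes
        worker.join();
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        Path dump = Files.createTempFile("context", ".dump");
        writeDump(dump, 40L);
        System.out.println("continued to " + restartAndRun(dump, 2));
        Files.delete(dump);
    }
}
```

The latch guarantees that the worker never observes the context before the probe has finished writing it.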
Experiment Results
Hardware
- 32-node cluster “ostrich”
- configured as two grid points of 16 nodes
- 733MHz Pentium III processor
- 392MB of memory
- connected by a 24-port Fast Ethernet
  switch
Software
- Linux 2.2.14
- Globus 2.0
- Sun JDK 1.4.0_02 (supporting JVMDI
  with full-speed debugging mode)
- MPICH 1.2.4 (MPICH-G2)
Experiment results
[Figure: Bandwidth. Y-axis: bandwidth (Kbyte/s), 0 to 5000. X-axis: message size (byte), 8 to 2048. Series: intra-site bandwidth, inter-site bandwidth.]
Bandwidth comparison between inter-site and intra-site communication
with the installation of the MPI communication layer.
Experiment results
[Figure: Latency. Y-axis: latency (s), 0 to 0.6. X-axis: message size (byte), 1 to 2048. Series: inter-site latency, intra-site latency.]
Latency comparison for small messages between intra-site and inter-site communication
with the installation of the MPI communication layer.
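The slides do not say how bandwidth and latency were measured. A common approach is a ping-pong test, and the sketch below shows how both metrics could be derived from such a test. The `PingPongMetrics` class is hypothetical, the sample numbers in `main` are made up, and 1 Kbyte is assumed to be 1024 bytes.

```java
final class PingPongMetrics {
    // One-way latency: total time for all round trips, halved per round trip.
    static double oneWayLatencySeconds(double totalRoundTripSeconds, int roundTrips) {
        return totalRoundTripSeconds / (2.0 * roundTrips);
    }

    // Bandwidth in Kbyte/s for one message of messageBytes bytes
    // that takes oneWaySeconds to arrive (1 Kbyte = 1024 bytes, assumed).
    static double bandwidthKBytesPerSecond(int messageBytes, double oneWaySeconds) {
        return (messageBytes / 1024.0) / oneWaySeconds;
    }

    public static void main(String[] args) {
        // Made-up sample: 1000 round trips of a 2048-byte message took 1 second.
        double latency = oneWayLatencySeconds(1.0, 1000);
        double bandwidth = bandwidthKBytesPerSecond(2048, latency);
        System.out.println("one-way latency (s): " + latency);
        System.out.println("bandwidth (Kbyte/s): " + bandwidth);
    }
}
```

Averaging over many round trips reduces the effect of timer resolution, which matters for the small message sizes shown in the plots.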
Experiment results
[Figure: Time for capturing and restoring objects. Y-axis: time (microsecond), 0 to 3000. X-axis: object size (byte), 1 to 10M. Series: capturing objects, restoring objects.]
Time spent in capturing and restoring objects
Experiment results
[Figure: Time for capturing and restoring Java frames. Y-axis: time (seconds), 0 to 6. X-axis: number of frames, 1 to 300. Series: capturing frames, restoring frames.]
Time spent in capturing and restoring frames
Related Works
Java bindings for MPI: “mpiJava”, “JavaMPI”,
“MPIJ”, etc.
Java process or thread migration:
- Add additional backup code to programs [Aglets [IBM96]]
- Insert backup statements in the source or bytecode; a
  backup object is used to store state [Wasp project
  [Funfrocken98]]
- Extend the JVM to make state accessible from Java programs and
  support type recognition of the Java stack [Sara Bouchenak
  2000]
- Use JVMDI to capture state and insert bytecode instructions in
  the program body to help restoration [Torsten2001]
- JESSICA (supports thread migration in the JVM)
Conclusion
A new middleware for the Grid with
Java-MPI communication and
transparent process migration features.
- Programmers can write MPI-style programs in the Java language
- The Java process migration mechanism
  supports the development of any dynamic
  load balancing policy or fault tolerance
  mechanism

Future Plan
Develop scientific and
engineering applications on top of this
middleware
Support the transfer of other I/O
(including file stage-in/out)
Develop a load balancing algorithm for the grid
environment (considering both CPU and network
load)
The End
Thanks !