The Ibis Project:
Simplifying Grid Programming & Deployment
Henri Bal
[email protected]
Vrije Universiteit Amsterdam
The ‘Promise of the Grid’
● Efficient and transparent (i.e., easy-to-use) wall-socket computing over a distributed set of resources [Sunderam ICCS’2004, based on Foster/Kesselman]
Parallel computing on grids
● Mostly limited to
  ● trivially parallel applications
    ● parameter sweeps, master/worker
  ● applications that run on one cluster at a time
    ● use grid to schedule application on a suitable cluster
● Our goal: run real parallel applications on a large-scale grid, using co-allocated resources
Efficient wide-area algorithms
● Latency-tolerant algorithms with asynchronous communication
  ● Search algorithms (Awari-solver [CCGrid’08])
  ● Model checkers (DiVinE [PDMC’08])
● Algorithms with hierarchical communication
  ● Divide-and-conquer
  ● Broadcast trees
● …
Reality: ‘Problems of the Grid’
[Figure: a user facing a collection of wide-area grid systems]
● Performance & scalability
● Heterogeneous
● Low-level & changing programming interfaces
● Connectivity issues
● Fault tolerance
● Malleability
⇒ Writing & deploying grid applications is hard
The Ibis Project
● Goal: drastically simplify grid programming/deployment
  ● “write and go!”
Approach (1)
● Write & go: minimal assumptions about execution environment
  ● Virtual Machines (Java) deal with heterogeneity
● Use middleware-independent APIs
  ● Mapped automatically onto middleware
● Different programming abstractions
  ● Low-level message passing
  ● High-level divide-and-conquer
Approach (2)
● Designed to run in a dynamic/hostile grid environment
  ● Handle fault tolerance and malleability
  ● Solve connectivity problems automatically (SmartSockets)
● Modular and flexible: can replace Ibis components by external ones
  ● Scheduling: Zorilla P2P system or external broker
Global picture
[Figure: the Ibis stack, covered in the rest of this talk — applications on top of Satin (divide & conquer), the communication layer (IPL) with SmartSockets, and JavaGAT with the Zorilla P2P system for deployment]
Outline
● Grid programming
  ● IPL
  ● Satin
  ● SmartSockets
● Grid deployment
  ● JavaGAT
  ● Zorilla
● Applications and experiments
Ibis Design
Ibis Portability Layer (IPL)
● Java-centric “run-anywhere” library
  ● Sent along with the application (jar files)
● Point-to-point, multicast, streaming, …
● Efficient communication
  ● Configured at startup, based on capabilities (multicast, ordering, reliability, callbacks)
  ● Bytecode rewriter avoids serialization overhead
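Requesting capabilities at startup looks roughly as follows — a minimal sketch following the ibis.ipl API as shown in the Ibis tutorials (exact names may differ across Ibis versions):

    import ibis.ipl.Ibis;
    import ibis.ipl.IbisCapabilities;
    import ibis.ipl.IbisFactory;
    import ibis.ipl.PortType;

    public class IplExample {
        public static void main(String[] args) throws Exception {
            // Declare the capabilities this run needs; the IPL picks a
            // matching implementation at startup.
            PortType portType = new PortType(
                    PortType.COMMUNICATION_RELIABLE,   // reliability
                    PortType.SERIALIZATION_OBJECT,     // object serialization
                    PortType.RECEIVE_EXPLICIT,         // explicit receive, no callbacks
                    PortType.CONNECTION_ONE_TO_ONE);   // point-to-point

            IbisCapabilities capabilities =
                    new IbisCapabilities(IbisCapabilities.ELECTIONS_STRICT);

            Ibis ibis = IbisFactory.createIbis(capabilities, null, portType);
            System.out.println("Joined pool as " + ibis.identifier());
            ibis.end();
        }
    }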
Serialization
● Based on bytecode rewriting
  ● Adds (de)serialization code to serializable types
  ● Prevents reflection overhead at runtime
● Pipeline: source → Java compiler → bytecode → bytecode rewriter → rewritten bytecode → JVMs
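Conceptually, the rewriter adds explicit read/write methods to each serializable class at compile time, so no reflective field lookups happen at runtime. A hand-written sketch of the kind of code it generates (the method names here are illustrative, not the actual generated names):

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // User code only declares 'implements Serializable'; the rewriter adds
    // explicit (de)serialization methods at the bytecode level.
    class Particle implements java.io.Serializable {
        double x, y, z;
        int id;

        void ibisWrite(DataOutputStream out) throws IOException {
            out.writeDouble(x);   // each field written directly:
            out.writeDouble(y);   // no reflection needed at runtime
            out.writeDouble(z);
            out.writeInt(id);
        }

        void ibisRead(DataInputStream in) throws IOException {
            x = in.readDouble();
            y = in.readDouble();
            z = in.readDouble();
            id = in.readInt();
        }
    }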
Membership Model
● JEL (Join-Elect-Leave) model
  ● Simple model for tracking resources; supports malleability & fault tolerance
  ● Notifications of nodes joining or leaving
  ● Elections
● Supports all common programming models
● Centralized and distributed implementations
  ● Broadcast trees, gossiping
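A sketch of how an application observes JEL events through the IPL, using the RegistryEventHandler callbacks and registry elections from the current ibis.ipl API (the handler signatures shown are those of recent releases and may differ from the API of this talk's era):

    import ibis.ipl.Ibis;
    import ibis.ipl.IbisCapabilities;
    import ibis.ipl.IbisFactory;
    import ibis.ipl.IbisIdentifier;
    import ibis.ipl.PortType;
    import ibis.ipl.RegistryEventHandler;

    public class JelExample implements RegistryEventHandler {
        // Join/Leave notifications drive malleability and fault tolerance.
        public void joined(IbisIdentifier node) { System.out.println("joined: " + node); }
        public void left(IbisIdentifier node)   { System.out.println("left: " + node); }
        public void died(IbisIdentifier node)   { System.out.println("died: " + node); }
        public void electionResult(String name, IbisIdentifier winner) { }
        public void gotSignal(String signal, IbisIdentifier source) { }
        public void poolClosed() { }
        public void poolTerminated(IbisIdentifier source) { }

        public static void main(String[] args) throws Exception {
            IbisCapabilities caps = new IbisCapabilities(
                    IbisCapabilities.ELECTIONS_STRICT,
                    IbisCapabilities.MEMBERSHIP_TOTALLY_ORDERED);
            PortType portType = new PortType(PortType.COMMUNICATION_RELIABLE,
                    PortType.SERIALIZATION_OBJECT, PortType.RECEIVE_EXPLICIT,
                    PortType.CONNECTION_ONE_TO_ONE);
            Ibis ibis = IbisFactory.createIbis(caps, new JelExample(), portType);
            ibis.registry().enableEvents();  // start delivering join/leave upcalls
            // Election: all members agree on a single winner for "master".
            IbisIdentifier master = ibis.registry().elect("master");
            System.out.println("master is " + master);
            ibis.end();
        }
    }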
Programming models
● Remote Method Invocation (RMI)
● Group Method Invocation (GMI)
● MPJ (MPI Java ‘standard’)
● Satin (Divide & Conquer)
Satin: divide-and-conquer
[Figure: fib(5) call tree — fib(4) and fib(3) subtrees split recursively down to fib(0)/fib(1) leaves, distributed over CPUs 1, 2, and 3]
● Divide-and-conquer is inherently hierarchical
● More general than master/worker
● Cilk-like primitives (spawn/sync) in Java
● Supports malleability and fault tolerance
● Supports data-sharing between different branches through Shared Objects
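The canonical Satin example from the Satin papers is Fibonacci with spawn/sync: methods declared in a Spawnable interface are spawned, and sync() blocks until the spawned results are available (package names follow the Ibis distribution; the Satin bytecode rewriter turns the calls into spawns):

    // Methods declared in a Spawnable interface are executed asynchronously.
    interface FibInterface extends ibis.satin.Spawnable {
        long fib(long n);
    }

    public class Fib extends ibis.satin.SatinObject implements FibInterface {
        public long fib(long n) {
            if (n < 2) {
                return n;
            }
            long x = fib(n - 1);  // spawned: may be stolen by another node
            long y = fib(n - 2);  // spawned
            sync();               // block until both spawned results are in
            return x + y;
        }

        public static void main(String[] args) {
            Fib f = new Fib();
            long result = f.fib(30);  // the root call is spawned too
            f.sync();
            System.out.println("fib(30) = " + result);
        }
    }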
Satin implementation
● Load balancing is done automatically
  ● Cluster-aware Random Stealing (CRS)
  ● Combines Cilk’s Random Stealing with asynchronous wide-area steals
● Self-adaptive malleability and fault tolerance
  ● Add/remove machines on the fly
  ● Survive crashes by efficient recomputation/checkpointing
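A sketch of the CRS policy: an idle node keeps at most one asynchronous wide-area steal outstanding while it continues synchronous random stealing inside its own cluster. The node abstraction below is a hypothetical stand-in, not Satin's internals:

    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical node abstraction; Satin's internal types differ.
    interface StealTarget {
        boolean trySyncSteal();        // blocking steal attempt (cheap locally)
        void sendAsyncStealRequest();  // non-blocking wide-area steal request
    }

    class CrsScheduler {
        private final Random rng = new Random();
        private final AtomicBoolean wideAreaStealPending = new AtomicBoolean(false);
        private final List<StealTarget> localCluster;
        private final List<StealTarget> remoteNodes;

        CrsScheduler(List<StealTarget> local, List<StealTarget> remote) {
            this.localCluster = local;
            this.remoteNodes = remote;
        }

        // Called whenever the local work queue runs empty.
        void onIdle() {
            // Keep at most one wide-area steal in flight; it overlaps with
            // local stealing instead of blocking on the high WAN latency.
            if (wideAreaStealPending.compareAndSet(false, true)) {
                remoteNodes.get(rng.nextInt(remoteNodes.size())).sendAsyncStealRequest();
            }
            // Meanwhile, steal synchronously from a random local-cluster
            // node, exactly as in Cilk's random stealing.
            localCluster.get(rng.nextInt(localCluster.size())).trySyncSteal();
        }

        // Invoked by the communication layer when the wide-area reply arrives.
        void onWideAreaReply() {
            wideAreaStealPending.set(false);
        }
    }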
Self-adaptation with Satin
● Adapt #CPUs to the level of parallelism
● Migrate work from overloaded to idle CPUs
● Remove CPUs with poor network connectivity
● Add CPUs dynamically when
  ● the level of parallelism increases
  ● CPUs were removed or crashed
● Can also remove/add entire clusters
  ● e.g., for network problems
[Wrzesinska et al., PPoPP’07]
Approach
● Weighted Average Efficiency (WAE):
  WAE = (1/#CPUs) × Σᵢ speedᵢ × (1 − overheadᵢ)
  ● overheadᵢ is the fraction of time CPU i spends idle or communicating
  ● speedᵢ is the relative speed of CPU i (measured periodically)
● General idea: keep WAE between Emin (30%) and Emax (50%)
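A small sketch of this adaptation rule, using the thresholds from the slide; the NodeStats class is a hypothetical container for the periodically measured values:

    // Hypothetical container for the per-CPU values measured periodically.
    class NodeStats {
        final double speed;     // relative speed of this CPU
        final double overhead;  // fraction of time spent idle or communicating
        NodeStats(double speed, double overhead) {
            this.speed = speed;
            this.overhead = overhead;
        }
    }

    class Adaptation {
        static final double E_MIN = 0.30, E_MAX = 0.50;

        // WAE = (1/#CPUs) * sum_i speed_i * (1 - overhead_i)
        static double weightedAverageEfficiency(NodeStats[] cpus) {
            double sum = 0.0;
            for (NodeStats c : cpus) {
                sum += c.speed * (1.0 - c.overhead);
            }
            return sum / cpus.length;
        }

        static String decide(NodeStats[] cpus) {
            double wae = weightedAverageEfficiency(cpus);
            if (wae < E_MIN) return "overhead too high: shrink (remove CPUs)";
            if (wae > E_MAX) return "resources saturated: grow (add CPUs)";
            return "within [Emin, Emax]: keep current allocation";
        }
    }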
Overloaded network link
[Plot: iteration duration per iteration]
● Uplink of 1 cluster reduced to 100 KB/s
● Remove badly connected cluster, get a new one
Connectivity Problems
● Firewalls & Network Address Translation (NAT) restrict incoming traffic
● Addressing problems
  ● Machines with >1 network interface (IP address)
  ● Machines on a private network (e.g., NAT)
● No direct communication allowed
  ● e.g., between compute nodes and the external world
SmartSockets library
● Detects connectivity problems
● Tries to solve them automatically
  ● With as little help from the user as possible
● Integrates existing and several new solutions
  ● Reverse connection setup, STUN, TCP splicing, SSH tunneling, smart addressing, etc.
● Uses a network of hubs as a side channel
[Example figures: connection setup through the hub network]
[Maassen et al., HPDC’07]
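As one concrete illustration of these techniques, reverse connection setup sidesteps a firewall that blocks incoming traffic by asking the unreachable side, via the hub network, to connect outward instead. The sketch below uses plain java.net with a hypothetical hub channel; it is not the SmartSockets API:

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    // Hypothetical side channel through the hub network (not SmartSockets' API).
    interface HubChannel {
        void requestReverseConnection(String target, String myHost, int myPort);
    }

    class ReverseConnect {
        // A wants to reach B, but B's firewall/NAT drops incoming connections.
        // A listens on a rendezvous port and asks B, via the hubs, to connect
        // *outward* to A; outgoing traffic is usually permitted.
        static Socket connectReverse(HubChannel hub, String target) throws IOException {
            try (ServerSocket rendezvous = new ServerSocket(0)) {
                hub.requestReverseConnection(target,
                        InetAddress.getLocalHost().getHostAddress(),
                        rendezvous.getLocalPort());
                rendezvous.setSoTimeout(10_000);  // give up after 10 seconds
                return rendezvous.accept();       // B's outward connection lands here
            }
        }
    }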
Overview
● JavaGAT
● Zorilla P2P
JavaGAT
● GAT: Grid Application Toolkit
  ● Used by applications to access grid services
  ● File copying, resource discovery, job submission & monitoring, user authentication
● Makes grid applications independent of the underlying grid infrastructure
● The API is currently being standardized (SAGA)
  ● SAGA implemented on JavaGAT
Grid Applications with GAT
[Figure: a grid application calls the GAT API (File.copy(...), submitJob(...)) for remote files, monitoring, information, and resource management; the GAT engine intelligently dispatches each call to an adaptor — GridLab, Globus (e.g., gridftp), Unicore, SSH, P2P, or local]
[van Nieuwpoort et al., SC’07]
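A minimal sketch of the two calls in the figure, following the JavaGAT API as used in its tutorials (the URIs and executable path are placeholders; exact signatures may vary per release):

    import org.gridlab.gat.GAT;
    import org.gridlab.gat.GATContext;
    import org.gridlab.gat.URI;
    import org.gridlab.gat.io.File;
    import org.gridlab.gat.resources.Job;
    import org.gridlab.gat.resources.JobDescription;
    import org.gridlab.gat.resources.ResourceBroker;
    import org.gridlab.gat.resources.SoftwareDescription;

    public class GatExample {
        public static void main(String[] args) throws Exception {
            GATContext context = new GATContext();

            // File.copy(...): the engine dispatches to a matching adaptor
            // (gridftp, ssh, local, ...) depending on the URI scheme.
            File file = GAT.createFile(context, new URI("any://example.org/input.dat"));
            file.copy(new URI("file:///tmp/input.dat"));

            // submitJob(...): describe the executable, then hand it to a broker.
            SoftwareDescription sd = new SoftwareDescription();
            sd.setExecutable("/bin/hostname");
            JobDescription jd = new JobDescription(sd);
            ResourceBroker broker =
                    GAT.createResourceBroker(context, new URI("any://example.org"));
            Job job = broker.submitJob(jd);
            System.out.println("job state: " + job.getState());
        }
    }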
Zorilla: Java P2P supercomputing middleware
Zorilla components
● Job management
  ● Handling malleability and crashes
● Robust Random Gossiping
  ● Periodic information exchange between nodes
  ● Robust against firewalls, NATs, failing nodes
● Clustering: nearest neighbor
● Flood scheduling
  ● Incrementally search for resources at more and more distant nodes (see the sketch below)
[Drost et al., HPDC’07]
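A sketch of the flood-scheduling idea: advertise the job to direct neighbors first, then re-flood with a growing hop radius until enough resources are claimed. All types and names here are illustrative, not Zorilla's:

    import java.util.List;

    // Illustrative neighbor abstraction; not Zorilla's actual classes.
    interface OverlayNeighbor {
        int tryClaimCores(String jobId, int wanted);    // cores claimed at that node
        void flood(String jobId, int wanted, int ttl);  // re-advertise, ttl hops further
    }

    class FloodScheduler {
        static final int MAX_RADIUS = 8;

        // Search for resources at more and more distant nodes: start with the
        // direct neighbors (radius 1) and re-flood with a larger radius until
        // enough cores are claimed. Nodes dedupe repeated floods by jobId.
        static int schedule(List<OverlayNeighbor> neighbors, String jobId, int wanted) {
            int claimed = 0;
            for (int radius = 1; radius <= MAX_RADIUS && claimed < wanted; radius++) {
                for (OverlayNeighbor n : neighbors) {
                    claimed += n.tryClaimCores(jobId, wanted - claimed);
                    if (claimed >= wanted) break;
                    n.flood(jobId, wanted - claimed, radius - 1);
                }
            }
            return claimed;  // may be fewer than requested
        }
    }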
Overview
Ibis applications
● e-Science (VL-e)
  ● Brain MEG-imaging
  ● Mass spectroscopy
  ● Multimedia content analysis
● Various parallel applications
  ● SAT-solver, N-body, grammar learning, …
● Other programming systems
  ● Workflow engine for astronomy (D-grid), grid file system, ProActive, Jylab, …
Overview of experiments
● DAS-3: Dutch Computer Science grid
● Satin applications on DAS-3
● Zorilla desktop grid experiment
● Multimedia content analysis
● High-resolution video processing
DAS-3
● 272 nodes (AMD Opterons), 792 cores, 1 TB memory
● LAN: Myrinet 10G, Gigabit Ethernet
● WAN (StarPlane): 20-40 Gb/s OPN
● Heterogeneous: 2.2-2.6 GHz, single/dual-core; Delft has no Myrinet
Gene sequence comparison in Satin (on DAS-3)
[Plots: speedup on 1 cluster; run times on 5 clusters]
● Divide & conquer scales much better than master/worker
● 78% efficiency on 5 clusters (with 1462 WAN msgs/sec)
Barnes-Hut (Satin) on DAS-3
[Plots: speedup on 1 cluster; run times on 5 clusters]
● Shared-object extension to the D&C model improves scalability
● 57% efficiency on 5 clusters (with 1371 WAN msgs/sec)
Zorilla Desktop Grid Experiment
● Small experimental desktop grid setup
  ● Student PCs running Zorilla overnight
  ● PCs with 1 CPU, 1 GB memory, 1 Gb/s Ethernet
● Experiment: gene sequence application
  ● 16 cores of DAS-3 with Globus
  ● 16-core desktop grid with Zorilla
  ● Combination, using the Ibis-Deploy deployment tool
[Chart: run times — 877 sec, 3574 sec, 1099 sec]
● Easy deployment with Zorilla, JavaGAT & Ibis-Deploy
Multimedia content analysis
● Analyzes video streams to recognize objects
● Extract feature vectors from images
  ● Describe properties (color, shape)
  ● Data-parallel task implemented with C++/MPI
● Compute on consecutive images
  ● Task parallelism on a grid
MMCA application
[Figure: a Java client on a local desktop machine talks via Ibis to a Java broker (on any machine world-wide), which dispatches work to parallel Horus servers (C++) running on the grid]
MMCA with Ibis
● Initial implementation with TCP was unstable
● Ibis simplifies communication, fault tolerance
● SmartSockets solves connectivity problems
● Clickable deployment interface
● Demonstrated at many conferences (SC’07)
  ● 20 clusters on 3 continents, 500-800 cores
  ● Frame rate increased from 1/30 to 15 frames/sec
[Seinstra et al., IEEE Multimedia’07]
MMCA: ‘Most Visionary Research’ award at AAAI 2007 (Frank Seinstra et al.)
High Resolution Video Processing
● Realtime processing of CineGrid movie data
  ● 3840x2160 (4xHD) @ 30 fps = 1424 MB/sec
● Multi-cluster processing pipeline
  ● Using DAS-3, StarPlane and Ibis
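The bandwidth figure is consistent with 16 bits per RGB channel, i.e. 6 bytes per pixel (our assumption, not stated on the slide): 3840 × 2160 pixels × 6 B/pixel × 30 frames/sec ≈ 1424 MB/sec.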
CineGrid with Ibis
● Use of StarPlane requires no configuration
  ● StarPlane is connected to the local Myrinet network
  ● Detected & used automatically by SmartSockets
● Easy setup of the application pipeline
  ● Connection administration of the application is simplified by the IPL election mechanism
● Simple multi-cluster deployment (Ibis-Deploy)
● Uses Ibis serialization for high throughput
Summary
● Goal: simplify grid programming/deployment
● Key ideas in Ibis
  ● Virtual machines (JVM) deal with heterogeneity
  ● High-level programming abstractions (Satin)
  ● Handle fault tolerance, malleability, and connectivity problems automatically (Satin, SmartSockets)
  ● Middleware-independent APIs (JavaGAT)
  ● Modular design
Acknowledgements
Current and past members: Rob van Nieuwpoort, Jason Maassen, Thilo Kielmann, Frank Seinstra, Niels Drost, Ceriel Jacobs, Kees Verstoep, John Romein, Gosia Wrzesinska, Rutger Hofman, Maik Nijhuis, Olivier Aumage, Fabrice Huet, Alexandre Denis, Roelof Kemp, Kees van Reeuwijk
More information
● Ibis can be downloaded from http://www.cs.vu.nl/ibis
● Papers: Satin [PPoPP’07], SmartSockets [HPDC’07], Gossiping [HPDC’07], JavaGAT [SC’07], MMCA [IEEE Multimedia’07]
● Ibis tutorials
  ● Next one at CCGrid 2008 (19 May, Lyon)