Transcript Slides

Work Stealing for Irregular Parallel Applications on
Computational Grids
Vladimir Janjic
University of St Andrews
12th December 2011
In this talk…
 Feudal Stealing algorithm for scheduling irregular parallel
applications
 Combination of Grid-GUM and Cluster-aware Random
Stealing
 Irregular parallel applications -- task trees highly unbalanced
November 21, 2006
What is work stealing?
 Work Stealing -- passive, distributed, dynamic scheduling
method
 Idle “thieves” steal work from busy “victims”
November 21, 2006
To steal or not to steal?
 Why stealing?
 Dynamic, adaptive and “cheap”
 Does not require prior knowledge of task dependencies =>
good for irregular applications
 Inherently distributed => scalability
 Why not stealing?
 Not optimal
 Possibly slow work distribution
November 21, 2006
Work stealing on Computational Grids
 Grid-GUM (GpH), Satin (Java d&c), Javelin (Java), Atlas (Java)
 The main problem : Steal attempts can be expensive due to
high latencies
 Especially for irregular applications, where all work may
be concentrated on a few nodes
 The main questions are where to send steal attempts and how
to respond to them
 Use load information (Grid-GUM)
November 21, 2006
Cluster-aware Random Stealing (CRS)
 Local (within a cluster) and remote (outside of cluster) stealing
done in parallel
 Works well for regular applications on heterogeneous
environments with a lot of parallelism
 Not so well for irregular
November 21, 2006
Centralised and distributed work
stealing
November 21, 2006
Feudal Stealing
 Use the CRS algorithm as a base
 Local stealing done using Random Stealing
 Remote stealing done via cluster head nodes
 Only head nodes (and a victim) visited
 Head nodes hold load information
November 21, 2006
3
2
1
4
5
6
7
8
9
Cluster 0
Local load
PE Load
2
2
3
3
4
0
5
7
6
0
7
0
8
1
9
5
Remote Load
Cl Time Load
1 1000 23
2
0
0
3 2000
0
Feudal Stealing
November 21, 2006
How is load information in head nodes
obtained?
 Load of nodes inside the cluster periodically sent
 Load of remote clusters updated from remote-steal messages
 Cluster load information attached to remote-steal messages
(similar to Grid-GUM)
November 21, 2006
Evaluation of Feudal Work Stealing
 Using simulations, on generic benchmarks for load balancing
algorithms (UTS -- Unbalanced Tree Search)
 For regular and less-irregular applications, performs as well as
CRS and better than Grid-GUM
 For highly-irregular applications, better than CRS and GridGUM
November 21, 2006
Comparison of Feudal Stealing, CRS and
Grid-GUM
QuickTime™ and a
decompressor
are needed to see this picture.
November 21, 2006
Improvements of Feudal Stealing over
CRS
November 21, 2006
Conclusions
 Feudal Stealing works well for irregular parallel applications on
Computational Grids
 Sacrifices some desirable features of “pure” work stealing in
order to make better selection of remote targets
 Tested only using simulations. Implementation in Grid-GUM
under way
 Tested only on artificial applications (unbalanced tree search)
November 21, 2006
More info
 Vladimir Janjic, Load Balancing of Irregular Parallel
Applications On Heterogeneous Distributed Computing
Environments, PhD Thesis, University of St Andrews, 2011
 Vladimir Janjic, Kevin Hammond, Think Locally Steal Globally :
Using Dynamic Load Information in Work-Stealing on
Computational Grids, Submitted to CCGrid 2012
 Vladimir Janjic, Kevin Hammond, Feudal Work-Stealing, In
preparration, planned for submission to EuroPar 2012
November 21, 2006