Qiu_Talk - NSF PI Meeting

Applying Twister for Scientific Applications
NSF Cloud PI Workshop
March 17, 2011
Judy Qiu
http://salsahpc.indiana.edu
School of Informatics and Computing
Indiana University
SALSA
Twister v0.9
March 15, 2011
New Infrastructure for Iterative MapReduce Programming
SALSA Group
• Auto generation of partition files and configureMaps
• Auto configuration to set up the Twister environment automatically on a cluster
• Concurrent file loading in the Mapper configuration phase, with file loading balancing
• Performance improvement (e.g. JVM tuning)
• Scalability
• Bingjing Zhang, Yang Ruan, Tak-Lon Wu, Judy Qiu, Adam Hughes, Geoffrey Fox, Applying Twister to Scientific Applications, Proceedings of IEEE CloudCom 2010 Conference, Indianapolis, November 30-December 3, 2010
K-Means Clustering
map: Compute the distance to each data point from each cluster center and assign points to cluster centers
reduce: Compute new cluster centers
User program: Compute new cluster centers
[Figure: Time for 20 iterations]
• Iteratively refining operation
• Typical MapReduce runtimes incur extremely high overheads
– New maps/reducers/vertices in every iteration
– File system based communication
• Long running tasks and faster communication in Twister enable it to perform close to MPI
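The map/reduce split described on this slide can be sketched in a few lines of plain Python. This is only an in-process illustration, not Twister code: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans`) and the tiny data set are assumptions made for the example. The point partitions play the role of static data that stays cached across iterations, while the cluster centers are the small variable data resent each round.

```python
import math

def kmeans_map(points, centers):
    """Map task: assign each cached point to its nearest center and
    emit per-center partial sums and counts."""
    partial = {}
    for p in points:
        k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        s, n = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([a + b for a, b in zip(s, p)], n + 1)
    return partial

def kmeans_reduce(partials, centers):
    """Reduce step: merge partial sums and compute the new cluster centers."""
    new_centers = []
    for k, old in enumerate(centers):
        total, count = [0.0] * len(old), 0
        for partial in partials:
            if k in partial:
                s, n = partial[k]
                total = [a + b for a, b in zip(total, s)]
                count += n
        # Keep the old center if no point was assigned to it.
        new_centers.append(tuple(t / count for t in total) if count else old)
    return new_centers

def kmeans(partitions, centers, iterations=20):
    """User program: iterate map -> reduce, feeding new centers back in."""
    for _ in range(iterations):
        partials = [kmeans_map(part, centers) for part in partitions]
        centers = kmeans_reduce(partials, centers)
    return centers

# Hypothetical data: two partitions of 2D points, two initial centers.
partitions = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
centers = kmeans(partitions, [(1.0, 1.0), (9.0, 9.0)])
```

In a runtime that recreates map tasks every iteration, the point partitions would be reloaded 20 times; here they are passed by reference each round, which mirrors the long-running-task idea.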
Motivation
Data Deluge: experienced in many domains
MapReduce: data centered, QoS
Classic Parallel Runtimes (MPI): efficient and proven techniques
Expand the applicability of MapReduce to more classes of applications
[Figure: four patterns. Map-Only (input -> map -> output); MapReduce (input -> map -> reduce); Iterative MapReduce (input -> map -> reduce, with iterations); More Extensions (Pij)]
A Programming Model for Iterative MapReduce
• Distributed data access
• In-memory MapReduce
• Distinction between static data and variable data (data flow vs. δ flow)
• Cacheable map/reduce tasks (long running tasks)
• Combine operation
• Support for fast intermediate data transfers
[Figure: the user program iterates over Configure() (static data), Map(Key, Value) (δ flow), Reduce (Key, List<Value>), Combine (Map<Key,Value>), and Close()]
Cloud Technologies and Their Applications
SaaS Applications/Workflow: Data Mining Services in the Cloud (Smith Waterman Dissimilarities, PhyloD Using DryadLINQ, Clustering, Multidimensional Scaling, Generative Topographic Mapping, etc.)
Higher Level Languages: Apache PigLatin / Microsoft DryadLINQ / Google Sawzall
Cloud Platform: Apache Hadoop / Twister; Microsoft Dryad / Twister
Cloud Infrastructure: Nimbus, Eucalyptus, OpenStack, OpenNebula; Linux Virtual Machines, Windows Virtual Machines
Hypervisor/Virtualization: Xen, KVM
Hardware: Bare-metal Nodes
MPI & Iterative MapReduce papers
• MapReduce on MPI: Torsten Hoefler, Andrew Lumsdaine and Jack Dongarra, Towards Efficient MapReduce Using MPI, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2009, Volume 5759/2009, 240-249
• MPI with generalized MapReduce
• Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey Fox, Twister: A Runtime for Iterative MapReduce, Proceedings of the First International Workshop on MapReduce and its Applications of ACM HPDC 2010 conference, Chicago, Illinois, June 20-25, 2010
• Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski, Pregel: A System for Large-Scale Graph Processing, Proceedings of the 2010 international conference on Management of data, Indianapolis, Indiana, USA, Pages 135-146, 2010
• Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst, HaLoop: Efficient Iterative Data Processing on Large Clusters, Proceedings of the VLDB Endowment, Vol. 3, No. 1, The 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore.
• Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica, Spark: Cluster Computing with Working Sets, poster at http://radlab.cs.berkeley.edu/w/upload/9/9c/Spark-retreat-poster-s10.pdf
• Russel Power, Jinyang Li, Piccolo: Building Fast, Distributed Programs with Partitioned Tables, OSDI 2010, Vancouver, BC, Canada
Features of Existing Architectures
Google, Apache Hadoop, Sector/Sphere, Dryad/DryadLINQ (DAG based)
• Programming Model (SPMD)
– MapReduce (optionally “map-only”)
– Focus on single-step MapReduce computations (DryadLINQ supports more than one stage)
• Input and Output Handling
– Distributed data access (HDFS in Hadoop, Sector in Sphere, and shared directories in Dryad)
– Output normally goes to the distributed file system
• Intermediate data
– Transferred via file systems (local disk -> HTTP -> local disk in Hadoop)
– Easy to support fault tolerance
– Considerably high latencies
• Fault Tolerance
Twister Architecture
[Figure: Twister architecture. The main program, through the Twister Driver, connects to a broker network (B). On each worker node a Twister Daemon hosts map (M) and reduce (R) tasks. Numbered data paths: (1) static data, read from local disk; (2) variable data (key,value), received via the brokers; (3) map output goes directly to the reducer; (4) reduce output goes to local disk or to the Combiner.]
• Scripts for file manipulations
• Twister daemon is a process, but Map/Reduce tasks are Java Threads (Hybrid approach)
Twister Programming Model
User program’s process space:
    configureMaps(..)
    configureReduce(..)
    while(condition){
        runMapReduce(..)
        updateCondition()
    } //end while
    close()
Worker Nodes: cacheable map/reduce tasks with Local Disk; each iteration runs Map(), Reduce(), and the Combine() operation
May send <Key,Value> pairs directly
Communications/data transfers via the pub-sub broker network
Two configuration options:
1. Using local disks (only for maps)
2. Using pub-sub bus
Twister API
1. configureMaps(PartitionFile partitionFile)
2. configureMaps(Value[] values)
3. configureReduce(Value[] values)
4. runMapReduce()
5. runMapReduce(KeyValue[] keyValues)
6. runMapReduceBCast(Value value)
7. map(MapOutputCollector collector, Key key, Value val)
8. reduce(ReduceOutputCollector collector, Key key, List<Value> values)
9. combine(Map<Key, Value> keyValues)
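To make the call sequence above concrete, here is a minimal in-process sketch in Python. It is not the Twister API: `ToyDriver`, `configure_maps`, and `run_map_reduce_bcast` are hypothetical names that only echo the shape of configureMaps and runMapReduceBCast, and everything runs in a single process. The example broadcasts a threshold each iteration, counts values at or below it in the cached partitions, and uses bisection in the user program to find the median.

```python
class ToyDriver:
    """In-process stand-in for an iterative MapReduce driver
    (hypothetical; not the Twister classes)."""

    def __init__(self, map_fn, reduce_fn, combine_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.combine_fn = combine_fn
        self.partitions = []

    def configure_maps(self, partitions):
        # Static data is loaded once and stays cached across iterations.
        self.partitions = list(partitions)

    def run_map_reduce_bcast(self, value):
        # The same variable value is broadcast to every cached map task.
        grouped = {}
        for part in self.partitions:
            for key, val in self.map_fn(part, value):
                grouped.setdefault(key, []).append(val)
        reduced = {key: self.reduce_fn(key, vals) for key, vals in grouped.items()}
        # Combine collects the reduce outputs back in the user program.
        return self.combine_fn(reduced)

def count_below(partition, threshold):
    """Map: count values in this partition at or below the broadcast threshold."""
    yield "below", sum(1 for x in partition if x <= threshold)

driver = ToyDriver(count_below,
                   lambda key, vals: sum(vals),
                   lambda reduced: reduced["below"])
driver.configure_maps([[1, 9, 4], [7, 3], [8, 2, 6, 5]])

# User program loop: bisection on the threshold until at least 5 of the
# 9 values lie at or below it.
lo, hi, target = 0.0, 10.0, 5
while hi - lo > 1e-6:
    mid = (lo + hi) / 2
    if driver.run_map_reduce_bcast(mid) >= target:
        hi = mid
    else:
        lo = mid
median = round(hi)
```

The design point the sketch tries to show is that only the small broadcast value changes between iterations, while the partitions passed to `configure_maps` are never reloaded.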
Pagerank – An Iterative MapReduce Algorithm
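[Figure: in each iteration, map tasks (M) take the partial adjacency matrix and the current page ranks (compressed) and emit partial updates; reduce tasks (R) produce partially merged updates; the combiner (C) feeds them into the next iteration.]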
• Well-known PageRank algorithm [1]
• Used ClueWeb09 [2] (1TB in size) from CMU
• Reuse of map tasks and faster communication pay off
[1] Pagerank Algorithm, http://en.wikipedia.org/wiki/PageRank
[2] ClueWeb09 Data Set, http://boston.lti.cs.cmu.edu/Data/clueweb09/
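A rough Python sketch of how PageRank fits the map/reduce/combine pattern follows. It is an illustration under stated assumptions rather than the implementation used in the talk: the function names, the damping factor of 0.85, and the three-page graph are made up for the example, and every page is assumed to have at least one out-link. The adjacency partitions are the static data; the rank vector is the variable data sent each iteration.

```python
from collections import defaultdict

def pagerank_map(partition, ranks):
    """Map: for a cached adjacency partition, emit partial rank updates."""
    partial = defaultdict(float)
    for page, links in partition.items():
        share = ranks[page] / len(links)
        for target in links:
            partial[target] += share
    return dict(partial)

def pagerank_reduce(partials):
    """Reduce: merge the partial updates from all map tasks."""
    merged = defaultdict(float)
    for partial in partials:
        for page, value in partial.items():
            merged[page] += value
    return dict(merged)

def pagerank_combine(merged, pages, damping):
    """Combine: apply damping and build the rank vector for the next iteration."""
    n = len(pages)
    return {p: (1 - damping) / n + damping * merged.get(p, 0.0) for p in pages}

def pagerank(partitions, iterations=50, damping=0.85):
    pages = sorted(p for part in partitions for p in part)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        partials = [pagerank_map(part, ranks) for part in partitions]
        ranks = pagerank_combine(pagerank_reduce(partials), pages, damping)
    return ranks

# Hypothetical graph split into two partitions: a -> b, c; b -> c; c -> a.
partitions = [{"a": ["b", "c"], "b": ["c"]}, {"c": ["a"]}]
ranks = pagerank(partitions)
```

Page "b" ends up with the lowest rank because it receives only half of the rank of "a".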
Overhead OpenMPI v Twister
negative overhead due to cache
http://futuregrid.org
Dimension Reduction Algorithms
• Multidimensional Scaling (MDS) [1]
o Given the proximity information among points.
o Optimization problem to find a mapping in the target dimension of the given data, based on pairwise proximity information, while minimizing the objective function.
o Objective functions: STRESS (1) or SSTRESS (2)
o Only needs pairwise distances δij between original points (typically not Euclidean)
o dij(X) is the Euclidean distance between mapped (3D) points
• Generative Topographic Mapping (GTM) [2]
o Find optimal K-representations for the given data (in 3D), known as the K-cluster problem (NP-hard)
o Original algorithm uses the EM method for optimization
o Deterministic Annealing algorithm can be used for finding a global solution
o Objective function is to maximize the log-likelihood
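The equations numbered (1) and (2) did not survive in this transcript. For reference, the usual textbook forms of the two MDS objective functions are given below; the weights w_ij are an assumption of this note (they are often set to 1), δ_ij denotes the original pairwise dissimilarities, and d_ij(X) the Euclidean distance between mapped points i and j.

```latex
\sigma(X) = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}(X) - \delta_{ij}\bigr)^{2} \qquad (1)

\sigma^{2}(X) = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}(X)^{2} - \delta_{ij}^{2}\bigr)^{2} \qquad (2)
```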
[1] I. Borg and P. J. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, NY, U.S.A., 2005.
[2] C. Bishop, M. Svensén, and C. Williams. GTM: The generative topographic mapping. Neural Computation, 10(1):215–234, 1998.
Multi-dimensional Scaling (EM)
While(condition)
{
    <X> = [A] [B] <C>
    C = CalcStress(<X>)
}
While(condition)
{
    <T> = MapReduce1([B],<C>)
    <X> = MapReduce2([A],<T>)
    C = MapReduce3(<X>)
}
• Maps high dimensional data to lower dimensions (typically 2D or 3D)
• SMACOF (Scaling by MAjorizing a COmplicated Function) [1]
[1] J. de Leeuw, "Applications of convex analysis to multidimensional scaling," Recent Developments in Statistics, pp. 133-145, 1977.
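The loop on this slide can be illustrated with a small pure-Python SMACOF iteration. This is a sketch under assumptions, not the Twister MDS code: it uses unit weights, in which case the [A] matrix of the slide reduces to a 1/N scaling, and the function names and four-point data set are invented for the example. The three steps inside `smacof_step` and `stress` correspond loosely to the three MapReduce jobs: T = B·C, X = A·T, and the stress calculation.

```python
import math

def distances(X):
    """Euclidean distance matrix of the current configuration."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def stress(X, delta):
    """Unweighted STRESS: sum over i < j of (d_ij(X) - delta_ij)^2."""
    D = distances(X)
    n = len(X)
    return sum((D[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def smacof_step(X, delta):
    """One SMACOF update with unit weights: X_new = (1/N) * B(X) * X."""
    n, dim = len(X), len(X[0])
    D = distances(X)
    # B(X): b_ij = -delta_ij / d_ij for i != j (0 if d_ij == 0),
    # and b_ii = -sum of the off-diagonal entries in row i.
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and D[i][j] > 0:
                B[i][j] = -delta[i][j] / D[i][j]
        B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
    # T = B * X (first matrix product), then scale by 1/N (second product).
    T = [[sum(B[i][k] * X[k][c] for k in range(n)) for c in range(dim)]
         for i in range(n)]
    return [[t / n for t in row] for row in T]

# Hypothetical target: the pairwise distances of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
delta = distances(square)
X = [[0.0, 0.0], [2.0, 0.1], [1.5, 1.2], [-0.3, 0.9]]  # arbitrary start
history = [stress(X, delta)]
for _ in range(50):
    X = smacof_step(X, delta)
    history.append(stress(X, delta))
```

The recorded `history` should never increase, which is the majorization property that makes SMACOF suitable for a simple iterate-until-converged loop.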
Next Generation Sequencing Pipeline on Cloud
[Figure: pipeline stages. FASTA file (N sequences) -> block pairings -> Blast / pairwise distance calculation (MapReduce) -> dissimilarity matrix (N(N-1)/2 values) -> pairwise clustering and MDS (MPI) -> visualization in Plotviz.]
• Users submit their jobs to the pipeline and the results are shown in a visualization tool.
• This chart illustrates a hybrid model with MapReduce and MPI. Twister will be a unified solution for the pipeline mode.
• The components are services and so is the whole pipeline.
• We could investigate which stages of the pipeline services are suitable for private or commercial Clouds.
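The block-pairing stage of the pipeline can be sketched as follows. The code is illustrative only: the block size, the helper names, and the mismatch-fraction distance are stand-ins chosen for the example (the pipeline itself uses sequence alignment). It shows how splitting the index range into blocks and keeping only the upper triangle yields exactly N(N-1)/2 values.

```python
def mismatch_fraction(a, b):
    """Stand-in distance: fraction of mismatched positions
    (equal-length strings assumed; not an alignment score)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def block_pairings(n, block_size):
    """Yield (rows, cols) index ranges for blocks on or above the diagonal."""
    starts = range(0, n, block_size)
    for r in starts:
        for c in starts:
            if c >= r:
                yield (range(r, min(r + block_size, n)),
                       range(c, min(c + block_size, n)))

def map_block(seqs, rows, cols):
    """Map task: distances for one block, keeping only pairs with i < j."""
    return {(i, j): mismatch_fraction(seqs[i], seqs[j])
            for i in rows for j in cols if i < j}

def dissimilarity(seqs, block_size=2):
    """Merge the per-block results into one sparse upper-triangular matrix."""
    result = {}
    for rows, cols in block_pairings(len(seqs), block_size):
        result.update(map_block(seqs, rows, cols))
    return result

# Hypothetical input: five short sequences of equal length.
seqs = ["ACGT", "ACGA", "TCGA", "TTGA", "ACGT"]
dist = dissimilarity(seqs)
```

Each block is independent of the others, which is why this stage maps naturally onto map-only or MapReduce execution.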
Scale-up Sequence Clustering Model with Twister
[Figure: gene sequences (N = 1 million) -> select reference -> reference sequence set (M = 100K) -> pairwise alignment & distance calculation -> distance matrix -> multidimensional scaling (MDS) -> reference coordinates (x, y, z). The remaining N-M sequence set (900K) -> interpolative MDS with pairwise distance calculation -> N-M coordinates (x, y, z). Both coordinate sets -> visualization -> 3D plot. Several stages are annotated O(N²).]
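A quick back-of-the-envelope calculation suggests why the reference-set approach helps at this scale. The figures N and M come from the slide; the cost model is an assumption of this note: full MDS needs all N(N-1)/2 pairwise distances, while the interpolative path is assumed to compare each of the remaining N-M sequences against all M reference sequences.

```python
def pair_count(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

N, M = 1_000_000, 100_000

full = pair_count(N)              # all-pairs distances over N sequences
reference = pair_count(M)         # all-pairs distances within the reference set
interpolation = (N - M) * M       # assumed: each remaining sequence vs. all references
reduced = reference + interpolation

print(f"full:    {full:,}")
print(f"reduced: {reduced:,}")
print(f"ratio:   {full / reduced:.2f}")
```

Under these assumptions the distance count drops from roughly 5.0 x 10^11 to roughly 9.5 x 10^10, about a factor of five; an interpolation scheme that uses only a few nearest reference points per sequence would reduce the later MDS cost further.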
Current Sequence Clustering Model with MPI
[Figure: gene sequences -> pairwise alignment & distance calculation (MPI.NET implementation; Smith-Waterman / Needleman-Wunsch* with Kimura2 / Jukes-Cantor / Percent-Identity) -> distance matrix. Distance matrix -> pairwise clustering (MPI.NET implementation) -> cluster indices. Distance matrix -> multidimensional scaling (MPI.NET implementation; Chi-Square / Deterministic Annealing) -> coordinates. Cluster indices and coordinates -> visualization (C# desktop application based on VTK) -> 3D plot.]
* Note: The implementations of the Smith-Waterman and Needleman-Wunsch algorithms are from the Microsoft Biology Foundation library.
Twister MDS Interpolation Performance Test
300+ Students learning about Twister & Hadoop MapReduce technologies, supported by FutureGrid.
July 26-30, 2010 NCSA Summer School Workshop
http://salsahpc.indiana.edu/tutorial
• Washington University
• University of Minnesota
• Iowa
• IBM Almaden Research Center
• University of California at Los Angeles
• San Diego Supercomputer Center
• Michigan State
• Univ. Illinois at Chicago
• Notre Dame
• Johns Hopkins
• Penn State
• Indiana University
• University of Texas at El Paso
• University of Arkansas
• University of Florida
http://salsahpc.indiana.edu/b534/
Summary
• MapReduce and MPI are SPMD programming models
• Twister extends MapReduce to iterative algorithms
• Data mining in the Cloud (Data Analysis in the Cloud)
• Several iterative algorithms we have implemented
– K-Means Clustering
– PageRank
– Matrix Multiplication
– Breadth First Search
– Multi Dimensional Scaling (MDS)
• Integrating a distributed file system
• Integrating with a high performance messaging system
• Programming with side effects yet supporting fault tolerance
• MapReduceRoles4Azure
• Will have prototype Twister4Azure by May 2011
Twister for Azure
[Figure: Twister for Azure. A scheduling queue feeds the worker role, which hosts map workers (Map 1, Map 2, ... Map n) and reduce workers (Red 1, Red 2, ... Red n) together with an in-memory data cache, task monitoring, and role monitoring. A Job Bulletin Board and a Map Task Table (MapID, ..., Status) track task state.]
Sequence Assembly Performance
Acknowledgements to:
SALSA HPC Group
Indiana University
http://salsahpc.indiana.edu