Introduction to Grid Computing


A Common Application Platform (CAP) for SURAgrid
Mahantesh Halappanavar, John-Paul Robinson, Enis Afgan, Mary Fran Yafchak, and Purushotham Bangalore
SURAgrid All-Hands Meeting, 27 September 2007, Washington, D.C.
Introduction
Problem Statement:
“How to quickly grid-enable scientific applications to exploit SURAgrid resources?”

Goals for grid-enabling applications:
- Dynamic resource discovery
- Collective resource utilization
- Simple job management and accounting
- Minimal programming effort
Identifying Patterns
- Algorithm Structures
- Support Structures
- Relationships
Basic Process
Problems → Algorithms → Programs → Implementation
(The three transitions are guided by Algorithm Structures, Supporting Structures, and Programming Environments, respectively.)
Algorithm Structures
How to organize? (Linear and Recursive)
1. Organize by Tasks
   - Task Parallelism
   - Divide and Conquer
2. Organize by Data Decomposition
   - Geometric Decomposition (sketched below)
   - Recursive Data
3. Organize by Flow of Data
   - Pipeline
   - Event-Based Coordination
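As a concrete illustration of one of these structures (not from the original slides), here is a minimal C sketch of Geometric Decomposition: the data domain is split into contiguous blocks, one per worker. The worker count and array length are arbitrary placeholders.

```c
#include <stdio.h>

/* Geometric Decomposition: split a 1-D domain of n elements into
 * nearly equal contiguous blocks, one block per worker. */
static void block_range(int n, int workers, int id, int *lo, int *hi)
{
    int base = n / workers;        /* minimum block size            */
    int rem  = n % workers;        /* first `rem` blocks get one more */
    *lo = id * base + (id < rem ? id : rem);
    *hi = *lo + base + (id < rem ? 1 : 0);   /* half-open range [lo, hi) */
}

int main(void)
{
    const int n = 10, workers = 4;           /* arbitrary example sizes */
    for (int id = 0; id < workers; id++) {
        int lo, hi;
        block_range(n, workers, id, &lo, &hi);
        printf("worker %d owns elements [%d, %d)\n", id, lo, hi);
    }
    return 0;
}
```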
Support Structures
(the step from Algorithms to Programs in the basic process above)

Program Structures:
1. SPMD
2. Master/Worker
3. Loop Parallelism (sketched below)
4. Fork/Join

Data Structures:
1. Shared Data
2. Shared Queue
3. Distributed Array
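To make one of these program structures concrete, here is a minimal sketch (not from the original slides) of the Loop Parallelism pattern in C with OpenMP; the array size and loop body are arbitrary placeholders.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

/* Loop Parallelism: the serial loop structure is kept, and OpenMP
 * distributes its iterations across a team of threads. */
int main(void)
{
    static double a[N], b[N];   /* zero-initialized static arrays */
    double sum = 0.0;

    /* Each thread works on a disjoint subset of the iterations;
     * the partial sums are combined by the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        b[i] = 2.0 * a[i] + 1.0;
        sum += b[i];
    }

    printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```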
Relationships: AS and SS

Relationship between Supporting Structures (SS) patterns (SPMD, Loop Parallelism, Master/Worker, Fork/Join) and Algorithm Structure (AS) patterns (Task Parallelism, Divide & Conquer, Geometric Decomposition, Recursive Data, Pipeline, Event-Based Coordination): the slide shows a matrix rating each SS/AS pairing from one (*) to four (****) stars.
Relationships: SS and PE

                     OpenMP   MPI    Java
  SPMD                ***     ****   **
  Loop Parallelism    ****    *      ***
  Master/Worker       **      ***    ***
  Fork/Join           ***            ****

Relationship between Supporting Structures (SS) patterns and Programming Environments (PE), rated from one (*) to four (****) stars.
Important Observation
“This (SPMD) pattern is by far the most commonly used pattern for structuring parallel programs. It is particularly relevant for MPI programmers and problems using the Task Parallelism and Geometric Decomposition patterns. It has also proved effective for problems using the Divide and Conquer, and Recursive Data patterns.”
Single Program, Multiple Data (SPMD). This is the most common way to organize a parallel program, especially on MIMD computers. The idea is that a single program is written and loaded onto each node of a parallel computer. Each copy of the single program runs independently (aside from coordination events), so the instruction streams executed on each node can be completely different. The specific path through the code is in part selected by the node ID.
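To make the SPMD idea concrete, here is a minimal sketch (not from the original slides) in C with MPI, where every process runs the same program and the rank (node ID) selects the path through the code; the token value is an arbitrary placeholder.

```c
#include <stdio.h>
#include <mpi.h>

/* SPMD: one program, launched on every node; behaviour branches on rank. */
int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's node ID      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes   */

    if (rank == 0) {
        /* Rank 0 takes the "coordinator" path. */
        token = 42;
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&token, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        printf("rank 0 sent token %d to %d workers\n", token, size - 1);
    } else {
        /* All other ranks take the "worker" path. */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d received token %d\n", rank, token);
    }

    MPI_Finalize();
    return 0;
}
```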
The Computing Continuum
- Java
- MPI
- OpenMP
Scientific Applications
- What are they?
- How are they built?
Scientific Applications
- Life Sciences: Biology, Chemistry, …
- Engineering: Aerospace, Civil, Mechanical, Environmental
- Physics: QCD, Black Holes, …
- …

The SCaLeS Reports: http://www.pnl.gov/scales/
Building Blocks
- OpenMP; MPI; BLACS, …
- Metis/ParMetis, Zoltan, Chaco, …
- BLAS and LAPACK, …
- ScaLAPACK, MUMPS, SuperLU, …
- PETSc, Aztec, …

“A core requirement of many engineering and scientific applications is the need to solve linear and non-linear systems of equations, eigensystems and other related problems.” – The Trilinos Project
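Since these building blocks exist to solve exactly such systems, here is a minimal sketch (not from the original slides) of the dense-solver layer in action: solving a small system Ax = b with LAPACK's dgesv routine, assuming an installation that provides the LAPACKE C interface. The 3x3 matrix and right-hand side are arbitrary placeholders.

```c
#include <stdio.h>
#include <lapacke.h>

/* Solve A x = b for a small dense system using LAPACK's dgesv
 * (LU factorization with partial pivoting) via the LAPACKE C interface. */
int main(void)
{
    /* Row-major 3x3 system; values are arbitrary for illustration. */
    double A[9] = { 4.0, 1.0, 0.0,
                    1.0, 3.0, 1.0,
                    0.0, 1.0, 2.0 };
    double b[3] = { 1.0, 2.0, 3.0 };   /* overwritten with the solution x */
    lapack_int ipiv[3];

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1, A, 3, ipiv, b, 1);
    if (info != 0) {
        fprintf(stderr, "dgesv failed: info = %d\n", (int)info);
        return 1;
    }
    printf("x = [%f, %f, %f]\n", b[0], b[1], b[2]);
    return 0;
}
```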
ScaLAPACK

The ScaLAPACK software stack (http://acts.nersc.gov/scalapack):
- ScaLAPACK (global)
- PBLAS (global): parallel BLAS
- LAPACK (local): linear systems, least squares, singular value decomposition, eigenvalues
- BLAS (local): Level 1/2/3, platform specific (sketched below)
- BLACS (local): communication routines targeting linear algebra operations
- MPI/PVM/…: communication layer (message passing)
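To illustrate what the BLAS layer at the bottom of this stack provides, here is a minimal sketch (not from the original slides) calling one routine from each of the three levels through the CBLAS interface; the vectors and matrices are arbitrary placeholders.

```c
#include <stdio.h>
#include <cblas.h>

/* One call from each BLAS level via the CBLAS interface:
 * Level 1: vector-vector (ddot), Level 2: matrix-vector (dgemv),
 * Level 3: matrix-matrix (dgemm). */
int main(void)
{
    double x[2] = { 1.0, 2.0 };
    double y[2] = { 3.0, 4.0 };
    double A[4] = { 1.0, 2.0,              /* 2x2, row-major */
                    3.0, 4.0 };
    double B[4] = { 5.0, 6.0,
                    7.0, 8.0 };
    double C[4] = { 0.0, 0.0, 0.0, 0.0 };
    double z[2] = { 0.0, 0.0 };

    /* Level 1: dot = x . y */
    double dot = cblas_ddot(2, x, 1, y, 1);

    /* Level 2: z = 1.0*A*x + 0.0*z */
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 2, 2, 1.0, A, 2, x, 1, 0.0, z, 1);

    /* Level 3: C = 1.0*A*B + 0.0*C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("dot = %f, z = [%f, %f], C[0][0] = %f\n", dot, z[0], z[1], C[0]);
    return 0;
}
```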
PETSc
Portable, Extensible Toolkit for Scientific Computation

The PETSc layered architecture (top to bottom):
- PETSc PDE application codes
- ODE integrators; nonlinear solvers; unconstrained minimization; visualization interface
- Linear solvers: preconditioners + Krylov methods
- Object-oriented matrices, vectors, and indices; grid management; profiling interface
- Computation and communication kernels: MPI, MPI-IO, BLAS, LAPACK
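As a minimal sketch (not from the original slides) of how application code sits on top of these layers, the following assembles a small 1-D Laplacian in parallel and solves it with PETSc's KSP linear solvers. It assumes a recent PETSc 3.x installation (the 2007-era API differs slightly, e.g. KSPSetOperators took an extra argument), omits error checking, and uses an arbitrary problem size.

```c
#include <petscksp.h>

/* Assemble a tridiagonal (1-D Laplacian) matrix across MPI ranks and
 * solve A x = b with a Krylov method + preconditioner chosen at run time. */
int main(int argc, char **argv)
{
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PetscInt i, j, n = 100, Istart, Iend;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Distributed sparse matrix: each rank owns a block of rows. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);

    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) {
        PetscScalar diag = 2.0, off = -1.0;
        MatSetValues(A, 1, &i, 1, &i, &diag, INSERT_VALUES);
        if (i > 0)     { j = i - 1; MatSetValues(A, 1, &i, 1, &j, &off, INSERT_VALUES); }
        if (i < n - 1) { j = i + 1; MatSetValues(A, 1, &i, 1, &j, &off, INSERT_VALUES); }
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* Right-hand side and solution vectors. */
    VecCreate(PETSC_COMM_WORLD, &b);
    VecSetSizes(b, PETSC_DECIDE, n);
    VecSetFromOptions(b);
    VecDuplicate(b, &x);
    VecSet(b, 1.0);

    /* Krylov solver; method and preconditioner are selectable on the
     * command line, e.g. -ksp_type cg -pc_type jacobi. */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    VecDestroy(&x);
    VecDestroy(&b);
    MatDestroy(&A);
    PetscFinalize();
    return 0;
}
```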
Observation
“Explicit message passing will remain the dominant programming model for the foreseeable future because of the huge investment in application codes.”
- Jim Tomkins, Bob Ballance, and Sue Kelly, ASC PI Meeting, Nevada, Feb 2007
Common Application Platform
- Basic Architecture
- Building Blocks
- Conclusions
Basic Architecture
Users reach SURAgrid either through a browser via the SURAgrid Portal or from the command line (GSISSH, etc.); jobs are submitted to the MetaScheduler (MS), which dispatches them across the participating sites (SiteA, Site1, Site2, Site3).
Building Blocks
- MPICH-G2 (a grid-enabled implementation of MPI built on the Globus Toolkit)
Conclusions
- Power shift!
  - The onus is now on system administrators
- Load balancing
  - Minimize communication costs
  - Limits: ScaLAPACK (latency < 500 ms)
  - Dynamic redistribution of work
- Heterogeneous environments
  - Issues with floating-point operations