Experiences with Distributed and Parallel Matlab on CCS


A Dynamic World,
what can Grids do for Multi-Core computing?
Daniel Goodman, Anne Trefethen
and Douglas Creager
[email protected]
What we will cover
• Why we think that cluster programming models are not always
sufficient for multi-core computing
• Why we think that Grid programming models are in many
cases more appropriate
• A quick look at some programming models that have worked
well in grids and that we believe could be constructive in multi-core environments
• Look at where some of these ideas are reappearing in models
for multi-core computing
Assumptions when
programming clusters
• Nodes within an allocated set are all
homogeneous, both in terms of their configuration
and the loads being placed on them
• Once nodes have been allocated to a process
they will not be used by any other user process
until the first finishes
Assumptions when
programming clusters
• Outside of very tightly coupled tasks on very
large numbers of processors, the noise caused
by other background tasks running on the node
has a minimal effect on user processes
• Because nodes will run the same background
tasks, large supercomputers are able to handle the
problem of background tasks through
centralised control of when such tasks execute
Models for Programming
Clusters
• Message passing, MPI
• Shared memory, OpenMP
• Embarrassingly parallel batch jobs
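The third of these models is the simplest to illustrate. Below is a minimal Python sketch of an embarrassingly parallel batch job, assuming a made-up `simulate` task; real batch jobs would of course run as separate processes on separate cluster nodes.

```python
# Sketch of the "embarrassingly parallel batch job" model:
# independent work items, no communication between them.
from multiprocessing import Pool

def simulate(seed):
    """Stand-in for an independent batch task (hypothetical workload)."""
    return (seed * 2654435761) % 2**32  # cheap deterministic "result"

if __name__ == "__main__":
    with Pool(4) as pool:                       # one worker per allocated core
        results = pool.map(simulate, range(8))  # no inter-task communication
    print(len(results))
```

Because the tasks never communicate, the scheduler is free to run them in any order and on any nodes, which is why this model tolerates heterogeneity better than the other two.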
Properties of Multi-core
systems
• Cores will be shared with a wide range
of other applications dynamically
• Load can no longer be considered
homogeneous across the cores
• Cores will likely not be homogeneous
as accelerators become common for
scientific hardware
• Source code will often be unavailable,
preventing compilation against the
specific hardware configuration
Multi-core processor with all
nodes allocated to each task
[Figure: animation of five frames showing cores statically allocated for the lifetime of each task; legend: Idle, Task A, Task B (Single Threaded)]
Multi-core processor where
allocated nodes can change
[Figure: animation of six frames showing the set of cores allocated to each task changing over time; legend: Idle, Task A, Task B (Single Threaded)]
Map-Reduce
• Developed by Google to simplify programming analysis
functions to execute in their heterogeneous distributed
computing environments
• Constructed around ideas drawn from functional
programming
• Has allowed the easy harnessing of huge amounts of
computing power spanning many distributed resources
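The pattern can be sketched on a single machine in a few lines of Python; the real system distributes the map and reduce phases across many machines, but the functional shape is the same.

```python
# Minimal single-machine sketch of the MapReduce pattern.
from collections import defaultdict

def map_phase(documents, mapper):
    """Apply the user-supplied mapper, yielding (key, value) pairs."""
    for doc in documents:
        yield from mapper(doc)

def shuffle(pairs):
    """Group intermediate values by key (done by the framework)."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups, reducer):
    """Apply the user-supplied reducer to each key's values."""
    return {k: reducer(vs) for k, vs in groups.items()}

# Classic word-count example.
docs = ["grid computing", "multi core computing"]
mapper = lambda doc: [(w, 1) for w in doc.split()]
counts = reduce_phase(shuffle(map_phase(docs, mapper)), sum)
print(counts["computing"])  # 2
```

Because the mapper and reducer are pure functions, the framework is free to rerun them on different machines, which is exactly what makes the model robust in a heterogeneous environment.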
Boinc
• Developed by David Anderson at Berkeley
• Is an abstracted version of the framework behind the
SETI@home project
• Designed to make the construction and management of
embarrassingly parallel tasks trivial
• Used by a range of other projects including
climateprediction.net, folding@home, LHC@home.....
Martlet
• Developed for the analysis of data
produced by the climateprediction.net
project
• Based on ideas from functional programming
• Able to dynamically adjust the
workflow to adapt to changing
numbers of resources and data
distributions
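The key idea — a workflow written against an abstract shape and specialised at run time to whatever resources exist — can be sketched as follows. This is an assumed illustration, not Martlet's actual syntax; `adapt_workflow` and `run` are hypothetical names.

```python
# Hypothetical sketch: an abstract "map then combine" workflow that is
# specialised at run time to the number of available workers.
def adapt_workflow(partitions, workers):
    """Assign data partitions to however many workers exist right now."""
    assignment = {w: [] for w in range(workers)}
    for i, partition in enumerate(partitions):
        assignment[i % workers].append(partition)
    return assignment

def run(partitions, workers, func, combine):
    plan = adapt_workflow(partitions, workers)
    partials = [combine(func(p) for p in parts)
                for parts in plan.values() if parts]
    return combine(partials)

# The same abstract workflow under two different resource counts.
data = [[1, 2], [3, 4], [5], [6, 7, 8]]
print(run(data, 2, sum, sum))  # 36
print(run(data, 3, sum, sum))  # 36
```

The answer is independent of the worker count, so the system can rebalance as resources and data distributions change.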
Grid-GUM and GpH
• Grid-GUM is a platform to support Glasgow parallel
Haskell in a Grid environment
• Programmer defines places where programs could
potentially have multiple threads executing
• Each processor has a thread and uses work stealing
where possible to handle the dynamic and
heterogeneous nature of tasks and resources.
• Intelligent scheduling reduces communication between
disparate resources e.g. between machines or clusters
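The work-stealing idea can be sketched in Python (this is an assumed illustration of the pattern, not Grid-GUM's implementation): each worker drains its own queue first and only steals from a victim when it runs dry.

```python
# Minimal work-stealing sketch: idle workers steal from other workers' deques.
import threading
from collections import deque

def work_stealing_run(task_lists, n_workers):
    deques = [deque(ts) for ts in task_lists]
    lock = threading.Lock()
    results = []

    def worker(i):
        while True:
            with lock:
                task = None
                if deques[i]:
                    task = deques[i].popleft()   # local work first
                else:
                    for d in deques:             # otherwise steal from a victim
                        if d:
                            task = d.pop()       # steal from the far end
                            break
                if task is None:
                    return                       # no work anywhere: finish
            results.append(task())               # run outside the lock

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results

# Worker 1 starts with nothing and must steal everything from worker 0.
tasks = [[lambda x=x: x * x for x in range(6)], []]
print(sorted(work_stealing_run(tasks, 2)))  # [0, 1, 4, 9, 16, 25]
```

Stealing from the opposite end of the victim's deque is the conventional choice: it reduces contention with the owner and tends to move larger, older chunks of work.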
Styx Grid Services
• Developed at Reading University to analyse huge amounts of
environmental data
• Built on top of the Styx protocol, originally developed for
the Plan 9 operating system
• Allows the effective construction of workflows pipelining
processes
• Reduces the amount of data active in the system at any
one time, and improves the performance of multi-stage
analysis techniques
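Pipelined stages of this kind map naturally onto Python generators: each stage streams records to the next as they arrive, so only a small window of data is active at once instead of every intermediate dataset being materialised. The stage names below are illustrative, not from Styx Grid Services.

```python
# Sketch of pipelined workflow stages: each stage consumes the previous
# one lazily, so data flows through without intermediate files.
def read_records(n):
    for i in range(n):
        yield i                      # stand-in for reading measurements

def calibrate(stream):
    for r in stream:
        yield r * 0.5                # stage 1 runs as data arrives

def threshold(stream, limit):
    for r in stream:
        if r < limit:
            yield r                  # stage 2 consumes stage 1 lazily

pipeline = threshold(calibrate(read_records(10)), limit=2.0)
print(list(pipeline))  # [0.0, 0.5, 1.0, 1.5]
```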
Abstract Grid Workflow
Language (AGWL)
• XML-based workflow language developed to hide much
of the system-dependent complexity
• AGWL never contains descriptions of the data transfer,
partitioning of data, or locations of hardware
• At runtime, the underlying system examines the
available resources and compiles the workflow into
Concrete Grid Workflow Language automatically adding
the detail
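The separation can be sketched as follows (a hypothetical illustration of the idea, not AGWL's actual schema): the abstract workflow names only activities and data flow, and a run-time "compiler" binds each activity to a concrete host.

```python
# Hypothetical sketch: abstract workflow (no locations) compiled to a
# concrete one by binding hosts discovered at run time.
abstract_workflow = [
    {"activity": "extract", "inputs": [],     "outputs": ["d1"]},
    {"activity": "analyse", "inputs": ["d1"], "outputs": ["d2"]},
]

def compile_to_concrete(workflow, available_hosts):
    """Assign each abstract activity to a concrete host (round robin)."""
    concrete = []
    for i, step in enumerate(workflow):
        bound = dict(step, host=available_hosts[i % len(available_hosts)])
        concrete.append(bound)
    return concrete

concrete = compile_to_concrete(abstract_workflow, ["node-a", "node-b"])
print([s["host"] for s in concrete])  # ['node-a', 'node-b']
```

Because the abstract description never mentions hosts or transfers, the same workflow can be recompiled whenever the available resources change.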
Programming Multi-Core
Some ideas that appear in these projects are also
appearing in some other places.
These include:
•Microsoft’s LINQ constructs
•CodePlay’s Sieve Constructs
•Intel’s Thread Building Blocks API
•Dynamic DAG generation
LINQ
Based on Lambda Calculus and now part of the
.NET framework, LINQ is intended to provide a
uniform way of accessing and applying functions to
data stored in different data structures.
This allows not only the easy construction of pipelines,
but also the automatic construction of parallel
pipelines.
This has much in common with Styx Grid Services.
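LINQ itself is a C#/.NET feature, but the shared idea — one uniform query pipeline over different underlying containers, which a runtime can then parallelise — can be sketched in Python. The `query`/`Pipeline` names below are illustrative, not LINQ's API.

```python
# Sketch of a LINQ-style pipeline: a uniform query interface over any
# iterable, built from lazy generator stages.
class Pipeline:
    def __init__(self, it):
        self._it = it
    def where(self, pred):
        return Pipeline(x for x in self._it if pred(x))   # filter stage
    def select(self, f):
        return Pipeline(f(x) for x in self._it)           # transform stage
    def to_list(self):
        return list(self._it)                             # force evaluation

def query(source):
    """Same query shape regardless of the data structure behind it."""
    return Pipeline(iter(source))

evens_squared = (query(range(10))
                 .where(lambda x: x % 2 == 0)
                 .select(lambda x: x * x)
                 .to_list())
print(evens_squared)  # [0, 4, 16, 36, 64]
```

Because each stage is a pure, lazy transformation, a parallel runtime (as in PLINQ) could partition the source and run the same stages on each partition.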
Sieve
Sieve is a range of language constructs and a
supporting compiler that allows users to construct a
range of parallel programming patterns.
These patterns include marking points where the
code can be split and automatically managing a pool
of threads to execute this code complete with work
stealing.
This is the same pattern used by Grid-GUM and
Glasgow Parallel Haskell.
Thread Building Blocks
Intel’s Thread Building Blocks is an API
supporting a range of different parallel
programming models.
This includes divide-and-conquer methods and
batch methods producing tasks to be handled
by a thread pool, allowing dynamic load balancing.
These are very similar to Boinc, Martlet and
Map-Reduce.
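The divide-and-conquer-onto-a-thread-pool pattern can be sketched in Python rather than TBB's C++ (note that CPython's GIL means this sketch illustrates the structure, not a real speed-up for CPU-bound work):

```python
# Sketch of divide and conquer over a thread pool: split the problem into
# small chunks, then let the pool balance the chunks across workers.
from concurrent.futures import ThreadPoolExecutor

def split(data, grain=8):
    """Divide phase: recursively split until chunks are small enough."""
    if len(data) <= grain:
        return [data]
    mid = len(data) // 2
    return split(data[:mid], grain) + split(data[mid:], grain)

def parallel_sum(data, workers=4):
    chunks = split(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunks)   # conquer phase on the pool
    return sum(partials)                   # combine the partial results

print(parallel_sum(list(range(100))))  # 4950
```

Keeping the blocking waits out of the pool threads (the pool only ever runs leaf tasks) avoids the classic deadlock of recursive tasks waiting on subtasks inside a bounded pool.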
Dynamic Dependency
Analysis
• Work carried out at a range of institutions including
University of Tennessee and Oak Ridge National
Laboratory
• Takes code written in a high-level language, and
dynamically converts this into a DAG of dependent tasks
• This can automatically generate thousands of tasks that
can be scheduled to try and both keep all the cores busy
all the time and adapt to changing resources
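The scheduling idea can be sketched as follows (an assumed illustration of the general shape, not any specific system's code): tasks declare their inputs, and a task becomes runnable as soon as all of its dependencies have produced results.

```python
# Sketch of DAG scheduling: run each task once its dependencies are done.
def run_dag(tasks):
    """tasks: name -> (dependency names, function taking their results)."""
    done = {}
    pending = dict(tasks)
    while pending:
        # Every task in `ready` could run now, on any idle core.
        ready = [n for n, (deps, _) in pending.items()
                 if all(d in done for d in deps)]
        assert ready, "cycle in task graph"
        for name in ready:
            deps, fn = pending.pop(name)
            done[name] = fn(*[done[d] for d in deps])
    return done

# A tiny dependent-task graph: c needs the results of a and b.
tasks = {
    "a": ([], lambda: 2),
    "b": ([], lambda: 3),
    "c": (["a", "b"], lambda a, b: a * b),
}
print(run_dag(tasks)["c"])  # 6
```

In a real system the `ready` set at each step is what gets farmed out to idle cores, and its size at any moment bounds the parallelism available.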
Conclusions
• Multi-core machines will operate in a much more
heterogeneous and dynamic environment than clusters do
today.
• Some aspects of grid computing have already started
looking at the problems associated with such environments.
• Some approaches to programming multi-core machines
already include some of these ideas.
• Ideas from functional programming appear repeatedly
• It is important that we remember why we must include such
functionality in the models.