
Distributed Programming
CA107 Topics in Computing Series
Martin Crane
Karl Podesta
The Basics…
• What is a Distributed System (DS)?
• How does it differ from a Parallel Computer (MPP)?
– differences become fuzzy… both are now called Supercomputers or High Performance Computers (HPC)
• Supercomputers and Supermodels:
– both expensive
– both hard to deal with/prone to tantrums
– both look glamorous but...
– both spend lots of time doing tedious tasks for others:
• mostly matrix-vector products for Supercomputers
• being live mannequins for Supermodels
Why High Performance Computing?
• Solve larger and larger scientific problems
– advanced product design
– economic analysis
– weather prediction/ climate modelling
• Store and process huge amount of data
– data mining and knowledge discovery
– image processing, multi-media information
– internet information storage and search (e.g. Google)
Different Supercomputers
(MPPs) in Your Neighbourhood
• Single Instruction, Multiple Data (SIMD)
– as seen on PlayStation 2
– very useful for processing large arrays, e.g. a(i) = b(i) + c(i)*d(i) for every i (as found in games)
• Multiple Instruction, Multiple Data (MIMD)
– as seen in Deep Blue
• But these are dinosaurs - we want something
more flexible
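The array update a(i) = b(i) + c(i)*d(i) quoted above can be sketched as follows. This is only an illustration of the data-parallel pattern: the function name and the vector width `LANES` are assumptions made for the example, and plain Python still works through each chunk one element at a time, whereas SIMD hardware would apply the single instruction to a whole chunk at once.

```python
# Sketch of the SIMD idea: one operation, a(i) = b(i) + c(i)*d(i),
# applied uniformly to every element of the arrays.
# LANES is a hypothetical vector width chosen for illustration.
LANES = 4


def multiply_add(b, c, d):
    """Return a where a[i] = b[i] + c[i] * d[i], taken LANES elements at a time."""
    if not (len(b) == len(c) == len(d)):
        raise ValueError("input arrays must have the same length")
    a = []
    for start in range(0, len(b), LANES):
        stop = start + LANES
        # On SIMD hardware this chunk would be handled by one instruction;
        # here Python simply loops over the elements of the chunk.
        a.extend(x + y * z
                 for x, y, z in zip(b[start:stop], c[start:stop], d[start:stop]))
    return a


b = [1.0, 2.0, 3.0, 4.0, 5.0]
c = [2.0, 2.0, 2.0, 2.0, 2.0]
d = [10.0, 20.0, 30.0, 40.0, 50.0]
a = multiply_add(b, c, d)
print(a)
```

The same instruction is used for every index and no element depends on another, which is what makes this kind of loop a good fit for SIMD.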
Problems with Traditional
Supercomputer (i.e. MPP)
• Expensive
– very high starting cost ($10,000s per node)
– expensive software
– high maintenance cost
– costly to upgrade
• Vendor dependent
– lots of companies have come and gone (Datacube, Thinking Machines with its Connection Machine, etc.)
So, real/poor people cannot do HPC!
PC Cluster: a poor man’s
supercomputer!
• built from high-end PCs and high-speed comms
network
• supports standard parallel programming based on the
message-passing model (MPI, the Message Passing Interface, a library standard rather than a language)
• cheap (16 node cluster can cost less than $10k)
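The message-passing model mentioned above can be sketched without a cluster. The example below is not MPI: it uses Python threads as stand-in "nodes" and `queue.Queue` objects as their mailboxes, and the names `worker` and `parallel_sum` are made up for illustration. What it shares with real message passing is that nodes hold no common data and cooperate only by sending and receiving messages.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """A stand-in node: receive a chunk of work, send back a partial sum."""
    chunk = inbox.get()                # blocking receive from the master
    outbox.put((rank, sum(chunk)))     # send the partial result back


def parallel_sum(data, n_workers):
    """Master: scatter the data, then gather and combine the partial sums."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])   # every n-th element to each node
    partials = [results.get() for _ in range(n_workers)]
    for t in threads:
        t.join()
    return sum(value for _, value in partials)


total = parallel_sum(list(range(1, 101)), 4)
print(total)
```

On a real cluster the queues would be replaced by network messages between separate machines, which is where interconnect speed starts to matter.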
Cluster Diagram Here
DCU CA Cluster Resources
• “John the Baptist” Cluster
– built by Redbrick using old CA machines
– 24 individual 450MHz machines
– connected by a Fast Ethernet switch
– harbinger of better things…
• “The one that is to come”……
– 24 SMP machines
– each with 2 GHz
– plus loadsa memory!
– arrives about Xmas time, appropriately enough
What are the issues in HPC?
• Communication vs Computation
– size/ nature of problem
– interconnect speed/ processor speed
• Fault tolerance
– quality of hardware
– nature of problem
• Load balancing
– nature of problem/ quality of programmer
– even an easy problem can be made difficult &
slow by a bad implementation
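The load-balancing point can be made concrete with a small sketch. It assumes that the run finishes only when the most heavily loaded node finishes, and it compares two ways of handing out the same tasks. The task costs and both function names are hypothetical; the greedy "largest task to the least-loaded node" rule is one common heuristic, not something prescribed by these slides.

```python
import heapq


def block_partition(costs, n_nodes):
    """Naive split: contiguous blocks of tasks, one block per node."""
    size = -(-len(costs) // n_nodes)   # ceiling division
    return [sum(costs[i * size:(i + 1) * size]) for i in range(n_nodes)]


def greedy_partition(costs, n_nodes):
    """Give each task, largest first, to the currently least-loaded node."""
    loads = [0] * n_nodes
    heapq.heapify(loads)
    for cost in sorted(costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return sorted(loads)


costs = [8, 7, 6, 5, 1, 1, 1, 1]       # hypothetical per-task costs
block = block_partition(costs, 4)
greedy = greedy_partition(costs, 4)

# The slowest node sets the overall run time.
print("block split: ", block, "finishes at", max(block))
print("greedy split:", greedy, "finishes at", max(greedy))
```

Both splits do the same total work; only the distribution differs, and with it the time at which the last node finishes.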
Influence of Nature of Problem
on Speed
• What is speed?
– speed-up is a better measure: (time on 1 node) / (time on n nodes)
• Speed-up and Problems
– very good: embarrassingly parallel problems
– fair to middling: regular and synchronous problems
• a bit of cross-talk between nodes
– bad: irregular/ asynchronous problems
• lots of cross-talk between nodes
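The speed-up ratio above, and the effect of cross-talk on it, can be sketched as follows. The `speedup` function is the definition from the slide. The `modelled_time` function is a toy model assumed purely for illustration: compute time divides evenly across nodes, and each extra node adds a fixed communication cost `t_comm`. Real problems will not follow this formula exactly.

```python
def speedup(time_on_one_node, time_on_n_nodes):
    """Speed-up = time on 1 node / time on n nodes."""
    return time_on_one_node / time_on_n_nodes


def modelled_time(t_compute, t_comm, n_nodes):
    """Toy model (an assumption): perfectly divisible compute time plus
    a communication cost of t_comm for every node beyond the first."""
    return t_compute / n_nodes + t_comm * (n_nodes - 1)


t_compute = 100.0   # hypothetical single-node run time
n_nodes = 10

# t_comm = 0 mimics an embarrassingly parallel problem (no cross-talk);
# larger values mimic increasing cross-talk between nodes.
for label, t_comm in [("no cross-talk", 0.0),
                      ("a bit of cross-talk", 1.0),
                      ("lots of cross-talk", 10.0)]:
    t_n = modelled_time(t_compute, t_comm, n_nodes)
    print(f"{label}: speed-up = {speedup(t_compute, t_n):.2f}")
```

Under this model the speed-up equals the node count only when there is no communication at all, and it falls away as the communication term grows relative to the compute term.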