Parallel Computing
Why Parallel/Distributed Computing?
Sushil K. Prasad
[email protected]
.
What is Parallel and Distributed Computing?
Solving a single problem faster using multiple CPUs
E.g., Matrix Multiplication C = A X B (a sketch follows below)
Parallel = Shared Memory among all CPUs
Distributed = Local Memory per CPU
Common Issues: Partitioning, Synchronization, Dependencies, Load Balancing
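A minimal C/OpenMP sketch of the matrix-multiplication example above; the size N, the fill values, and the row-wise partition are illustrative assumptions, not from the slides (compile with -fopenmp, e.g. with gcc):

#include <stdio.h>

#define N 256                                /* illustrative matrix size */

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    /* fill A and B with simple values so the result is easy to check */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }

    /* partition: the rows of C are divided among the available CPUs */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];    /* no dependence across (i, j) */
            C[i][j] = sum;
        }

    printf("C[0][0] = %f\n", C[0][0]);       /* expect 2.0 * N = 512 */
    return 0;
}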
.
ENIAC (350 op/s), 1946 (U.S. Army photo)
.
ASCI White (10 teraops/sec, 2006)
Megaflops = 10^6 flops ≈ 2^20
Giga = 10^9 = billion ≈ 2^30
Tera = 10^12 = trillion ≈ 2^40
Peta = 10^15 = quadrillion ≈ 2^50
Exa = 10^18 = quintillion ≈ 2^60
.
65 Years of Speed Increases
Today (2011): K computer – 8 petaflops = 8 × 10^15 flops
ENIAC (1946): 350 flops
.
Why Parallel and Distributed Computing?
Grand Challenge Problems
Weather Forecasting; Global Warming
Materials Design – superconducting material at room temperature; nanodevices; spaceships
Organ Modeling; Drug Discovery
.
Why Parallel and Distributed Computing?
Physical Limitations of Circuits
Heat and speed-of-light effects
Superconducting materials to counter the heat effect
Speed-of-light effect – no solution!
.
Microprocessor Revolution
[Figure: processor speed (log scale) vs. time for micros, minis, mainframes, and supercomputers]
.
Why Parallel and Distributed Computing?
VLSI – Effect of Integration
1M transistors enough for full functionality – DEC's Alpha ('90s)
The rest must go into multiple CPUs per chip
Cost – multitudes of average CPUs give better FLOPS/$ compared to traditional supercomputers
.
Modern Parallel Computers
Caltech’s Cosmic Cube (Seitz and Fox)
Commercial copy-cats
nCUBE Corporation (512 CPUs)
Intel's Supercomputer Systems
iPSC/1, iPSC/2, Intel Paragon (512 CPUs)
Thinking Machines Corporation
CM-2 (65K 4-bit CPUs) – 12-dimensional hypercube – SIMD
CM-5 – fat-tree interconnect – MIMD
Tianhe-1A: 4.7 petaflops, 14K Xeon X5670 CPUs and 7,168 Nvidia Tesla M2050 GPUs
K computer: 8 petaflops (8 × 10^15 FLOPS), 2011, 68K 2.0 GHz 8-core CPUs = 548,352 cores
.
Why Parallel and Distributed Computing?
Everyday Reasons
Utilize available local networked workstations and Grid resources
Solve compute-intensive problems faster
Make infeasible problems feasible
Reduce design time
Leverage large combined memory
Solve larger problems in the same amount of time
Improve an answer's precision
Gain competitive advantage
Exploit commodity multi-core and GPU chips
Find Jobs!
.
Why Shared-Memory Programming?
Easier conceptual environment
Programmers are typically familiar with concurrent threads and processes sharing an address space
CPUs within multi-core chips share memory
OpenMP: an application programming interface (API) for shared-memory systems
Supports higher-performance parallel programming of symmetric multiprocessors (see the sketch below)
Java threads
MPI for Distributed Memory Programming
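A minimal OpenMP sketch (not from the slides) of what sharing an address space means: every thread in the parallel region sees the same counter variable, so the update must be synchronized.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int counter = 0;                   /* lives in the shared address space */
    #pragma omp parallel
    {
        #pragma omp atomic             /* synchronize the shared update */
        counter++;
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    printf("threads seen: %d\n", counter);
    return 0;
}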
.
Seeking Concurrency
Data dependence graphs
Data parallelism
Functional parallelism
Pipelining
.
Data Dependence Graph
Directed graph
Vertices = tasks
Edges = dependencies (an edge from task u to task v means v must wait for u to complete)
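A small self-contained example (the task names and edges are hypothetical): representing the graph as a list of (from, to) edges, any task with in-degree zero has no unmet dependencies and may run concurrently with other such tasks.

#include <stdio.h>

#define NTASKS 4

int main(void) {
    const char *task[NTASKS] = { "read A", "read B", "C = A x B", "print C" };
    int edges[3][2] = { {0, 2}, {1, 2}, {2, 3} };   /* (from, to) pairs */
    int indeg[NTASKS] = { 0 };

    for (int e = 0; e < 3; e++)
        indeg[edges[e][1]]++;                       /* count dependencies */

    for (int v = 0; v < NTASKS; v++)
        if (indeg[v] == 0)
            printf("'%s' can start immediately\n", task[v]);
        else
            printf("'%s' waits on %d task(s)\n", task[v], indeg[v]);
    return 0;
}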
.
Data Parallelism
Independent tasks apply the same operation to different elements of a data set
for i ← 0 to 99 do
  a[i] ← b[i] + c[i]
endfor
Okay to perform operations concurrently
Speedup: potentially p-fold, p = #processors
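The same loop written in C with an OpenMP work-sharing directive (a sketch, not the slides' code); each iteration touches a different element, so the iterations may run concurrently.

#include <stdio.h>

#define N 100

int main(void) {
    double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    #pragma omp parallel for           /* same operation, different elements */
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[99] = %f\n", a[99]);     /* expect 99 + 198 = 297 */
    return 0;
}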
.
Functional Parallelism
Independent tasks apply different operations to different data elements
a ← 2
b ← 3
m ← (a + b) / 2
s ← (a^2 + b^2) / 2
v ← s - m^2
First and second statements may execute concurrently
Third and fourth statements may execute concurrently
Speedup: limited by the number of concurrent subtasks
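A sketch of the same computation using OpenMP sections (one possible way to express functional parallelism, not the slides' code): m and s apply different operations to the same data and run in separate sections; v depends on both, so it runs after the implicit barrier.

#include <stdio.h>

int main(void) {
    double a = 2.0, b = 3.0, m = 0.0, s = 0.0, v;

    #pragma omp parallel sections
    {
        #pragma omp section
        m = (a + b) / 2.0;                 /* mean */
        #pragma omp section
        s = (a * a + b * b) / 2.0;         /* mean of squares */
    }                                      /* implicit barrier: m and s done */

    v = s - m * m;                         /* depends on both m and s */
    printf("v = %f\n", v);                 /* expect 0.25 */
    return 0;
}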
.
Pipelining
Divide a process into stages
Produce several items simultaneously
Speedup: limited by the number of concurrent subtasks = the number of stages in the pipeline
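A small sequential simulation of a three-stage pipeline (the stage functions are illustrative assumptions): at each "cycle" the three stages handle three different items, which is exactly the work a real pipeline would perform concurrently on three processors.

#include <stdio.h>

#define ITEMS 5

int stage1(int x) { return x + 1; }    /* fetch/prepare */
int stage2(int x) { return x * 2; }    /* transform */
int stage3(int x) { return x - 3; }    /* finish */

int main(void) {
    int s1[ITEMS], s2[ITEMS], out[ITEMS];

    /* cycle t: stage 1 works on item t, stage 2 on item t-1, stage 3 on t-2 */
    for (int t = 0; t < ITEMS + 2; t++) {
        if (t < ITEMS)                  s1[t]      = stage1(t);
        if (t >= 1 && t - 1 < ITEMS)    s2[t - 1]  = stage2(s1[t - 1]);
        if (t >= 2 && t - 2 < ITEMS)    out[t - 2] = stage3(s2[t - 2]);
    }

    for (int i = 0; i < ITEMS; i++)
        printf("item %d -> %d\n", i, out[i]);
    return 0;
}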