Transcript lec0

Parallel Processing
(CS 676)
Overview
Jeremy R. Johnson
Goals
• Parallelism: To run large and difficult programs fast.
• Course: To become effective parallel programmers
– “How to Write Parallel Programs”
– “Parallelism will become, in the not too distant future, an essential
part of every programmer’s repertoire”
– “Coordination – a general phenomenon of which parallelism is one
example – will become a basic and widespread phenomenon in CS”
• Why?
– Some problems require extensive computing power to solve
– The most powerful computer by definition is a parallel machine
– Parallel computing is becoming ubiquitous
– Distributed & networked computers with simultaneous users require coordination
Top 500
LINPACK Benchmark
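• Solve a dense N × N system of linear equations, y = Ax,
using Gaussian elimination with partial pivoting
– $\tfrac{2}{3}N^3 + 2N^2$ floating-point operations
• High Performance LINPACK (HPL) is used to measure performance
for the TOP500 list (introduced by Jack Dongarra)

Ignoring row exchanges, Gaussian elimination factors A into a lower triangular L and an upper triangular U:
$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
=
\begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}
\begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}
$$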
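The operation count is what converts a measured solve time into a flop rate. A minimal Python sketch, assuming the rate is simply the nominal count $\tfrac{2}{3}N^3 + 2N^2$ divided by elapsed seconds; `linpack_flops` and `gflops_rate` are hypothetical helper names, and a particular benchmark implementation may use a slightly different lower-order term.

```python
# Hypothetical helpers; the count 2/3 n^3 + 2 n^2 is the nominal LINPACK count.

def linpack_flops(n: int) -> float:
    """Nominal floating-point operation count for solving a dense n x n system
    by Gaussian elimination with partial pivoting: 2/3 n^3 + 2 n^2."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def gflops_rate(n: int, seconds: float) -> float:
    """Achieved rate in gigaflops (1e9 floating-point operations per second)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return linpack_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing: an n = 10,000 solve that took 100 seconds.
    n, seconds = 10_000, 100.0
    print(f"{linpack_flops(n):.4e} flops, {gflops_rate(n, seconds):.2f} Gflop/s")
    # prints: 6.6687e+11 flops, 6.67 Gflop/s
```

For N = 10,000 the $2N^2$ term is only about 0.03% of the total, which is why the cost is usually quoted as $O(N^3)$.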
Example LU Decomposition
• Solve the following linear system
y + z = 1
x + z = 1
x + y = 1
• Find LU decomposition A = PLU, where
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
Big Machines
Cray-2
DoE Lawrence Livermore National Laboratory (1985)
3.9 gigaflops
8-processor vector machine

Cray X-MP/4
DoE, LANL,… (1983)
941 megaflops
4-processor vector machine
Big Machines
Tianhe-1A
NSC Tianjin, China (2010)
2.507 petaflops
14,336 Xeon X5670 processors
7,168 Nvidia Tesla M2050 GPUs

Cray Jaguar
ORNL (2009)
1.75 petaflops
224,256 AMD Opteron cores
Need for Parallelism
Multicore
Intel Core i7
Multicore
Cyclops64
80 gigaflops
80 cores @ 500 megahertz
multiply-accumulate

IBM Blue Gene/L
2004-2007
478.2 teraflops
65,536 “compute nodes”
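The Cyclops64 numbers fit a simple peak-rate calculation, assuming each core completes one multiply-accumulate per clock cycle and that a multiply-accumulate counts as two floating-point operations (a multiply and an add). This is a theoretical peak, not a measured benchmark result:

$$
80\ \text{cores} \times \left(500 \times 10^{6}\ \tfrac{\text{cycles}}{\text{s}}\right) \times 2\ \tfrac{\text{flops}}{\text{cycle}} = 80 \times 10^{9}\ \tfrac{\text{flops}}{\text{s}} = 80\ \text{gigaflops}
$$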
Multicore
Multicore
GPU
Nvidia GTX 480
1.34 teraflops
480 SP (700 MHz)
Fermi chip 3 billion transistors
Google Server
• 2003: 15,000 servers ranging
from 533 MHz Intel Celeron to
dual 1.4 GHz Intel Pentium III
• 2005: 200,000 servers
• 2006: upwards of servers
Drexel Machines
• Tux
• 5 nodes
• Draco
• 20 nodes
– 4 Quad-Core AMD
Opteron 8378
processors (2.4 GHz)
– 32 GB RAM
– Dual Xeon Processor
X5650 (2.66 GHz)
– 6 GTX 480
– 72 GB RAM
• 4 nodes
– 6 C2070 GPUs
Programming Challenge
• “But the primary challenge for an 80-core chip will be
figuring out how to write software that can take advantage
of all that horsepower.”
• Read more: http://news.cnet.com/Intel-shows-off-80-coreprocessor/21001006_36158181.html?tag=mncol#ixzz1AHCK1LEc
Basic Idea
• One way to solve a problem fast is to break the problem
into pieces, and arrange for all of the pieces to be solved
simultaneously.
• The more pieces, the faster the job goes, up to the point
where the pieces become too small for the effort of
breaking them up and distributing them to be worth the bother.
• A “parallel program” is a program that uses this breaking-up
and handing-out approach to solve large or difficult
problems.
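The break-up, hand-out, and combine pattern can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration under stated assumptions, not a method from the course: the task (summing i² over a range) and the helper names `chunk_bounds`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical. It uses a thread pool so it runs anywhere; in CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so for real speedup on CPU-bound work one would swap in `ProcessPoolExecutor` (same `Executor` interface, run under an `if __name__ == "__main__":` guard).

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, pieces: int) -> list[tuple[int, int]]:
    """Break range(n) into `pieces` contiguous [lo, hi) intervals whose sizes differ by at most 1."""
    if pieces < 1:
        raise ValueError("need at least one piece")
    base, extra = divmod(n, pieces)
    bounds, lo = [], 0
    for p in range(pieces):
        hi = lo + base + (1 if p < extra else 0)  # the first `extra` pieces get one more item
        bounds.append((lo, hi))
        lo = hi
    return bounds


def sum_of_squares(bounds: tuple[int, int]) -> int:
    """Solve one piece: the sum of i*i for lo <= i < hi."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n: int, pieces: int = 4) -> int:
    """Sum of i*i for 0 <= i < n, computed piece by piece on a pool of workers."""
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        # Hand out one task per piece, then combine the partial results.
        return sum(pool.map(sum_of_squares, chunk_bounds(n, pieces)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1000, pieces=8))  # 332833500, same as the serial sum
```

Making `pieces` much larger than the number of workers or the amount of work per piece only adds per-task scheduling overhead, which is the "too small to be worth the bother" limit in practice.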