EECS 570
Notes on Chapter 2 – Parallel Programs
Terminology
• Task:
– Programmer-defined sequential piece of work
– Concurrency is only across tasks
– Amount of work per task may be:
• small (fine-grained)
• large (coarse-grained)
• Process (thread):
– Abstract entity that performs tasks
– Equivalent OS concepts
– Must communicate and synchronize with other processes
– Executes on a processor
• typically a one-to-one mapping
Steps in Creating a Parallel Program
• Decomposition, Assignment, Orchestration, Mapping (each covered below)
Decomposition
• Break up computation into tasks to be divided among processes
– could be static, quasi-static, or dynamic
– i.e., identify concurrency and decide the level at which to exploit it
Goal: enough tasks to keep processes busy...
...but not too many
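To make this concrete, here is a minimal C sketch of a static decomposition (names and the chunk size are illustrative, not from the notes): an array sum is broken into fixed-size chunk tasks, and the chunk size controls how many tasks exist and hence how much concurrency is exposed.

/* Static decomposition: carve an n-element array sum into chunk "tasks".
 * Each task is a contiguous range [start, end); handing tasks to processes
 * is the separate assignment step. */
#include <stdio.h>

#define N     1000
#define CHUNK 100                         /* task grain: one task = 100 elements */

typedef struct { int start, end; } task_t;   /* a programmer-defined task */

int main(void) {
    double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    /* Decomposition: N/CHUNK independent tasks. */
    task_t tasks[N / CHUNK];
    int ntasks = 0;
    for (int start = 0; start < N; start += CHUNK)
        tasks[ntasks++] = (task_t){ start, start + CHUNK };

    /* Here the tasks still run sequentially; concurrency exists only
     * across tasks, which assignment later maps onto processes. */
    double sum = 0.0;
    for (int t = 0; t < ntasks; t++)
        for (int i = tasks[t].start; i < tasks[t].end; i++)
            sum += a[i];

    printf("tasks = %d, sum = %f\n", ntasks, sum);
    return 0;
}

Halving CHUNK doubles the number of tasks: more opportunity for load balancing, but more management overhead.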
Amdahl's Law
• Assume fraction s of sequential execution is
inherently serial
– remainder (1- s) can be perfectly parallelized
• Speedup with p processors is:
Speedup(p) = 1 / (s + (1 - s)/p)
• Limit: as p → ∞, speedup → 1/s
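A small sketch (illustrative, not from the notes) that evaluates the expression above for a program that is 5% serial; as p grows, the (1 - s)/p term vanishes and the speedup flattens out at 1/s:

/* Evaluate Amdahl's Law: speedup = 1 / (s + (1 - s)/p). */
#include <stdio.h>

static double amdahl_speedup(double s, double p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double s = 0.05;                              /* 5% inherently serial */
    for (int p = 1; p <= 1024; p *= 4)
        printf("p = %4d  speedup = %6.2f\n", p, amdahl_speedup(s, p));
    printf("limit (1/s)      = %6.2f\n", 1.0 / s);   /* = 20 here */
    return 0;
}

With s = 0.05, even 1024 processors give a speedup of only about 19.6, against the limit of 20.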
Aside on Cost-Effective Computing
• Isn't Speedup(P) < P inefficient?
• If only throughput matters, use P computers instead?
• But much of a computer's cost is NOT in the processor
[Wood & Hill, IEEE Computer 2/95]
Let Costup(P) = Cost(P)/Cost(1)
• Parallel computing cost-effective:
Speedup(P) > Costup(P)
• E.g., for an SGI PowerChallenge w/500MB:
Costup(32) = 8.6
– so the 32-processor machine is cost-effective whenever Speedup(32) > 8.6
Assignment
• Assign tasks to processes
– Again, can be static, dynamic, or in between
• Goals:
– Balance workload
– Reduce communication
– Minimize management overhead
• Decomposition + Assignment = Partitioning
• Mostly independent of architecture/programming
model
Orchestration
• How do we achieve task communication, synchronization, and
assignment given the programming model?
– data structures (naming)
– task scheduling
– communication: messages, shared data accesses
– synchronization: locks, semaphores, barriers, etc.
• Goals
– Reduce cost of communication and synchronization
– Preserve data locality (reduce communication, enhance caching)
– Schedule tasks to satisfy dependencies early
– Reduce overhead of parallelism management
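As a concrete (and purely illustrative, not from the notes) example of these mechanisms in a shared-address-space model, the POSIX threads sketch below uses shared data for communication, a mutex around a shared sum, and a barrier so that no thread reads the total before every thread has contributed; compile with -pthread:

/* Minimal orchestration with POSIX threads: shared data, a mutex, a barrier. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static double sum = 0.0;                              /* shared data */
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;                     /* synchronization */

static void *worker(void *arg) {
    long id = (long)arg;
    double partial = (double)(id + 1);                /* stand-in for real work */

    pthread_mutex_lock(&sum_lock);                    /* protect shared update */
    sum += partial;
    pthread_mutex_unlock(&sum_lock);

    pthread_barrier_wait(&barrier);                   /* wait for all deposits */

    if (id == 0)
        printf("total = %f\n", sum);                  /* all updates now visible */
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}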
Mapping
• Assign processes to processors
– Usually up to OS, maybe with user
hints/preferences
– Usually assume one-to-one, static
• Terminology:
– space sharing: partition the processors among applications
– gang scheduling: schedule all of an application's processes together
– processor affinity: keep a process on the same processor over time
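As one concrete example of a user hint for mapping, the sketch below pins the calling process to a single processor using the Linux/glibc affinity interface (the API here is an assumption of the example, not something discussed in the notes):

/* Linux-specific sketch of processor affinity: ask the OS to keep the
 * calling process on CPU 0.  The OS treats this as a constraint/hint. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                            /* run only on processor 0 */

    /* pid 0 means "the calling process". */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("running on CPU %d\n", sched_getcpu());
    return 0;
}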
Parallelizing Computation vs. Data
Above view is centered around computation
• Computation is decomposed and assigned (partitioned)
Partitioning data is often a natural view too
• Computation follows data: owner computes
• Grid example; data mining; High Performance Fortran (HPF)
But not always sufficient
• Distinction between computation and data is stronger in many applications
– e.g., Barnes-Hut, Raytrace
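A tiny sketch of the owner-computes rule on a 1-D grid (names and sizes are illustrative, not from the notes): the data is partitioned across processes, and the process that owns a point performs the update for that point, even though updates near block boundaries read values owned by a neighbor, and those reads are the communication:

/* Owner computes: each "process" updates only the grid points it owns. */
#include <stdio.h>

#define N 16

int main(void) {
    double x[N], xnew[N];
    int p = 4;                                    /* conceptual process count */
    for (int i = 0; i < N; i++) x[i] = (double)i;

    /* "Process" k owns points [k*N/p, (k+1)*N/p); with real processes each
     * k would run concurrently.  Edge reads cross ownership boundaries. */
    for (int k = 0; k < p; k++)
        for (int i = k * (N / p); i < (k + 1) * (N / p); i++) {
            double left  = (i > 0)     ? x[i - 1] : 0.0;
            double right = (i < N - 1) ? x[i + 1] : 0.0;
            xnew[i] = 0.5 * (left + right);       /* owner of x[i] computes it */
        }

    printf("xnew[1] = %f\n", xnew[1]);
    return 0;
}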
Assignment
• Static assignments (given decomposition into rows)
– block: row i is assigned to process floor(i/(n/p)), i.e., each process
gets a contiguous block of n/p rows
– cyclic: process i is assigned rows i, i+p, i+2p, and so on
• Dynamic
– get a row index, work on the row, get a new row, and so on
• Static assignment reduces concurrency (from n to p)
– block assignment reduces communication by keeping adjacent rows together
(see the sketch below)
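The sketch below (illustrative names, not from the notes) computes which process owns each row under the block and cyclic schemes for n = 8 rows and p = 4 processes; dynamic assignment is only described in a comment because it uses a shared counter rather than a fixed formula:

/* Block vs. cyclic row assignment for n rows and p processes. */
#include <stdio.h>

/* Block: contiguous chunks of ceil(n/p) rows per process. */
static int block_owner(int i, int n, int p)  { return i / ((n + p - 1) / p); }

/* Cyclic: process k gets rows k, k+p, k+2p, ... */
static int cyclic_owner(int i, int p)        { return i % p; }

int main(void) {
    int n = 8, p = 4;
    for (int i = 0; i < n; i++)
        printf("row %d: block -> P%d, cyclic -> P%d\n",
               i, block_owner(i, n, p), cyclic_owner(i, p));
    /* Dynamic assignment: each process repeatedly fetches the next unclaimed
     * row index from a shared counter (lock or fetch-and-add) until none remain. */
    return 0;
}

Block keeps adjacent rows on the same process (less boundary communication); cyclic interleaves rows so uneven work per row tends to average out across processes.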