Potential for Parallel Computation


Potential for Parallel Computation, part 2
Module 3
1
Behavior of Algorithms

Small Problems
• Can calculate actual performance measures
• Size (operations); time units
• Often can generalize, but need proof
For large problems, characterize the asymptotic behavior
• i.e. Big Oh!
• An upper bound
2
Big Oh Definition
Assume f(n) & g(n). f(n) is of order g(n),
that is, f(n) = 0 (g(n)) iff
there exist constants C & N such that for
n>N, |f(n)| < c |g(n)|
That is, f(n) grows no faster than g(n)
 g(n) is an upper bound

3
Omega Ω -- lower bound
As in Big Oh, except
f(n) = Ω(g(n)) means
|f(n)| ≥ C |g(n)| for n > N
• g(n) is a lower bound
4
Exact Bound
If both f(n) = O(g(n)) and
f(n) = Ω(g(n)), then
f(n) = Θ(g(n)),
and we say g(n) is an exact bound for
f(n) (aka a tight bound)
5
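A brief worked example (not from the original slides): take f(n) = 3n^2 + 5n. For n > 5 we have 3n^2 ≤ 3n^2 + 5n ≤ 4n^2, so f(n) = O(n^2) with C = 4 and f(n) = Ω(n^2) with C = 3; hence f(n) = Θ(n^2).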
Speedup and Efficiency of Algorithms
For any given computation (algorithm):
Let T_P be the time to perform the computation with P
processors (arithmetic units, or PEs).
We assume that any P independent operations can be done
simultaneously.
Note: the depth of an algorithm is T_∞, the minimum
execution time.
The speedup with P processors is S_P = T_1 / T_P,
and the efficiency is E_P = S_P / P.
6
These numbers, S_P and E_P, refer to an algorithm
and not to a machine.
Similar numbers can be defined for specific hardware.
The time T_1 can be chosen in different ways:
to evaluate how good an algorithm is, it should be
the time for the "BEST" sequential algorithm.
7
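As a minimal sketch (Python; not part of the original slides, and the function names are our own), the definitions of speedup and efficiency can be written directly:

def speedup(t1, tp):
    # S_P = T_1 / T_P: how many times faster the P-processor run is
    return t1 / tp

def efficiency(t1, tp, p):
    # E_P = S_P / P: fraction of the P processors usefully employed
    return speedup(t1, tp) / p

# Illustrative numbers only: 100 time units on 1 processor,
# 15 time units on 8 processors.
print(speedup(100, 15))        # ~6.67
print(efficiency(100, 15, 8))  # ~0.83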
The Minimum Number of Processors
Giving the Maximum Speedup:
Let P_∞ be the minimum number of processors
such that
T_P = T_∞,
i.e.,
P_∞ = min { P | T_P = T_∞ }
Then T_P, S_P, and E_P at P = P_∞ are the best known time,
speedup, and efficiency, respectively.
8
E<8> = A + B(CDE + F + G) + H
9
By also applying the distributive law, we can get an even
smaller depth, but the number of operations will increase.
10
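A hedged worked illustration (the operation and depth counts below are our reconstruction, not taken from the slides), using E<8> = A + B(CDE + F + G) + H:
• Sequentially, the 7 operations (3 multiplications, 4 additions) take 7 steps.
• With enough processors and a balanced tree: compute C·D, F+G, and A+H in step 1; (C·D)·E in step 2; CDE + (F+G) in step 3; B·(CDE+F+G) in step 4; and the final sum with A+H in step 5, for a depth of 5.
• Distributing B gives A + BCDE + BF + BG + H: 9 operations instead of 7, but they balance into a tree of depth 4.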
Performance

Consider small problems
• e.g. evaluation of an arithmetic expression
• Not much improvement possible
• Even auto-optimization probably takes longer
than sequential processing

Gains: problems that are "compute bound",
i.e. processor bound; computation intensive
11
Factors Influencing Performance

Level            Notes
Hardware         Establishes fundamental speed scale
Architecture     Both individual unit & system level
Operating Sys.   As an extension to hardware
Language         Compiler & run-time support
Program          Control structure & synchronization
Algorithm        Data dependence structure
12
Impacting Performance
Hardware (user has no influence)
• Digital logic
• Clock speed
• Circuit interconnection
Architecture
• Sequential vs. degree of parallelism
• ALU, CU, memory, cache
• Synchronization among processors
13
Impacting Performance
Operating System
• Shared resources
• Process control, synchronization, data movement
• I/O
Programming Language
• Operations available & the implementation
• Compiler / optimizations
14
Impacting Performance
Program
• Organization & style; structure
• Data structures
• Study of compiler design is helpful
Algorithm
• Often a tradeoff of memory vs. speed
• Among "reasonable" algorithms, the choice often has
only minimal impact
15
Amdahl’s Law
Let T(P) be the execution time with hardware parallelism P.
Let S be the time for the sequential part of the work,
and let Q be the time to do the parallel part of the work sequentially,
i.e., S and Q are the sequential and parallel amounts of work,
measured by time on one processor.
The total time is then
T(P) = S + Q/P
16
Amdahl’s Law
Expressing this in terms of the fraction of serial work,
f = S / (S + Q), with T(1) = S + Q,
Amdahl’s law states that
Speedup:    S_P = T(1) / T(P) = 1 / (f + (1 - f)/P)
Efficiency: E_P = S_P / P = 1 / (fP + (1 - f))
17
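A minimal Python sketch of these formulas (illustrative only; not part of the original slides, and the function names are our own):

def amdahl_speedup(f, p):
    # S_P = 1 / (f + (1 - f) / P), where f is the serial fraction
    return 1.0 / (f + (1.0 - f) / p)

def amdahl_efficiency(f, p):
    # E_P = S_P / P
    return amdahl_speedup(f, p) / p

# Illustrative numbers only: 5% serial work, 100 processors.
print(amdahl_speedup(0.05, 100))     # ~16.8, far below the ideal 100
print(amdahl_efficiency(0.05, 100))  # ~0.17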
There are several consequences of this simple
performance model.
In order to achieve at least 50% efficiency
on a program with hardware parallelism P,
f can be no larger than 1/(P - 1)
(from E_P = 1/(fP + 1 - f) ≥ 1/2; for P = 100, f must stay below about 1%).
This becomes harder and harder to achieve
as P becomes large.
Amdahl used this result to argue that
sequential processing was best, but
the model has several useful interpretations in different
parallel environments:
18
• A very small amount of unparallelized code can
have a very large effect on efficiency if the parallelism
is large
• A fast vector processor must also have a fast
scalar processor in order to achieve a sizeable
fraction of its peak performance
• Effort in parallelizing a small fraction of code that
is currently executed sequentially may pay off in
large performance gains
• Hardware that allows even a small fraction of new
things to be done in parallel may be considerably
more efficient.
19
Although Amdahl’s law is a simple performance
model, it need not be taken simplistically.
The behavior of the sequential fraction, f, for example, can be
quite important.
System sizes, especially the number of processors, P, are often
increased for the purpose of running larger problems.
Increasing the problem size often does not significantly increase
the absolute amount of sequential work.
In this case, f is a decreasing function of problem size,
and if the problem size is increased along with P, the somewhat
pessimistic implications of the equations above look much more favorable
(see Problem 2.16 for a specific example).
The behavior of performance as both problem and system size
increase is called scalability.
20
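A hedged Python sketch of this scaling effect (our own example; it assumes a hypothetical model in which the serial time S stays fixed while the parallel work Q grows linearly with problem size n):

def serial_fraction(n, s=1.0, q_per_item=1.0):
    # f = S / (S + Q), with S fixed and Q proportional to problem size n
    q = q_per_item * n
    return s / (s + q)

def amdahl_speedup(f, p):
    return 1.0 / (f + (1.0 - f) / p)

# Growing the problem while running on P = 1000 processors:
for n in (1_000, 10_000, 100_000):
    f = serial_fraction(n)
    print(n, round(f, 6), round(amdahl_speedup(f, 1000), 1))
# f shrinks as n grows, so the achieved speedup climbs toward the ideal 1000.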