Models_3_Typedx

Download Report

Transcript Models_3_Typedx

Models 3
Professor Ioana Banicescu
CSE 8843
The work-time presentation framework
of parallel algorithms:
WT paradigm: provides a 2 level top-down description of
parallel algorithm
Upper-level: suppresses details of the algorithm
Lower-level: follows general scheduling principle and results
in a full PRAM description
Upper-level WT:
Goal: describe the algorithm in terms of sequence of time
units, where each time unit may include any number of
concurrent operation
Work: Total number of operations used
Eg: For l <= i <= u par do, the statements corresponding to all
values of I between l and u are executed concurrently.
Example: Compute the sum S of n=2k numbers stored in an
array A, processors Pi, 1<=i<=n
Algorithm: (Sum)
Input: n=2k numbers stored in an array A
Output: S= ni=0 A(i)
Begin
1. For 1<=i<=n
Set B(i):= A(i)
2. For h= 1 to log n do,
for 1 <= i <= n/2k
Set B(i)= B(2i-1) + B(2i)
3. Set S= B(1)
End
This algorithm does not mention:
• How many PEs are there?
• How operations will be allocated to PEs?
Algorithm specifies time units, where each time unit may
include any number of concurrent operations.
Time units: log n + 2
• n operations performed within the first time unit (step 1)
• j-th time unit (iteration l= j-1 of step 2) has n/2j-1
operations for 2 <= j <= (log n + 1)
• Last time unit: 1 operation (step 3)
Work performed by the algorithm:
W 𝑛 =𝑛+
= O(n)
T(n)= O(log n)
log 𝑛
𝑗=1
𝑛/2𝑗 + 1
Lower Level:
• Suppose that WT presentation of the algorithm results in a
parallel algorithm that runs in T(n) time units while
performing W(n) work.
• Using WT scheduling principle, we can almost always adapt
this algorithm to run on a p processor PRAM in:
<= W(n)/p + T(n) parallel steps
The WT scheduling principle:
a) Let Wi(n) be the number of operations performed in time
unit i, 1 <= i <= T(n)
b) Simulate each set of Wi(n) operations in Wi(n)/p parallel
steps by the p processors for each 1<=i<=T(n)
c) If the simulation is successful, the corresponding pprocessor PRAM algorithm takes
≤
𝑖
(𝑐𝑒𝑖𝑙)
𝑊𝑖 (𝑛)
𝑝
≤
𝑖
(𝑓𝑙𝑜𝑜𝑟)
parallel steps as desired
WT scheduling principle:
𝑊𝑖 (𝑛)
𝑝
+ 1 ≤ (floor)
𝑊𝑖 (𝑛)
𝑝
+ T(n)
• During each time unit i, the Wi(n) operations are scheduled
as evenly as possible among available p processors.
• During time units 1 and 2, each PE is scheduled to execute
the same number of operations.
• During time unit 3, the p-th PE executes one less operation
than are executed by remaining PEs.
• During time unit 4, there are only k possible concurrent
operations.