Transcript Document
Towards a Realistic Scheduling Model
Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes
IEEE TPDS, Vol. 17, No. 3, pp. 263-275, 2006.
Parallel processing is the oldest discipline in computer science – yet the general problem is far from solved
Why is parallel processing
difficult?
• ”Jo flere kokker, jo mere søl” (Norwegian: ”the more cooks, the more mess”)
– Partitioning and transforming problems
– Load balancing
– Inter-processor communication
– Granularity
– Architecture
Implementing parallel
systems
• Manually
– MPI
– PVM
– Linda
• Automatically
– Parallelising compilers (Fortran)
– Static scheduling
Taskgraph scheduling:
Representing static
computations
Modelling computations
A = B + C
[Figure: taskgraph with B and C as inputs to A]
Valid sequences: CBA, BCA
Invalid sequences: ABC, ACB, CAB, BAC
Data dependencies
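The rule behind these sequences can be sketched in code: a sequence is valid exactly when every task appears after all of its data dependencies. A minimal Python sketch (the dict representation is an assumption for illustration, not the talk's notation):

```python
from itertools import permutations

# Dependencies for A = B + C: task A needs the results of B and C.
deps = {"A": {"B", "C"}, "B": set(), "C": set()}

def is_valid(order):
    """A sequence is valid if every task appears after all of its predecessors."""
    seen = set()
    for task in order:
        if not deps[task] <= seen:
            return False
        seen.add(task)
    return True

# Enumerate all 6 orderings of A, B, C and keep the valid ones.
valid = ["".join(p) for p in permutations("ABC") if is_valid(p)]
print(valid)  # the two valid sequences: BCA and CBA
```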
Another example
A = (B - C) / D
F = B + G
[Figure: taskgraph with B, C, D as inputs to A, and B, G as inputs to F]
Scheduling
Static taskgraph scheduling techniques
The scheduling process
[Figure: a taskgraph with tasks A–E and communication edges c1–c5 is allocated to processors p1 and p2, yielding a schedule]
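The allocation step can be sketched as a simple list scheduler: process tasks in a precedence-respecting order and greedily place each one on the processor where it can start earliest. The task weights and edges below are hypothetical (the slide's figure is not fully recoverable from the transcript), and communication costs are ignored for brevity:

```python
# Hypothetical taskgraph: task -> weight, task -> predecessors.
tasks = {"A": 2, "B": 3, "C": 1, "D": 2, "E": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C", "D"]}

def list_schedule(order, num_procs=2):
    """Greedy list scheduling: place each task where it starts earliest."""
    proc_free = [0] * num_procs   # time at which each processor becomes free
    finish = {}                   # task -> finish time
    for t in order:
        # a task is ready once all its predecessors have finished
        ready = max((finish[p] for p in preds[t]), default=0)
        # pick the processor giving the earliest start time
        proc = min(range(num_procs), key=lambda i: max(proc_free[i], ready))
        start = max(proc_free[proc], ready)
        finish[t] = start + tasks[t]
        proc_free[proc] = finish[t]
    return finish

finish = list_schedule(["A", "B", "C", "D", "E"])
print(max(finish.values()))  # makespan of the resulting schedule
```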
Topological sorting
• Topological sorting
– to order the vertices of a graph
such that the precedence
constraints are not violated
• All valid schedules represent a
topological sort
• Scheduling algorithms differ in
how they topologically sort the
graph
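A standard way to obtain such an ordering is Kahn's algorithm: repeatedly emit a task whose predecessors have all been emitted. A minimal sketch, using the A = (B-C)/D, F = B+G example from earlier (the predecessor-dict representation is my assumption):

```python
from collections import deque

def topological_sort(preds):
    """Kahn's algorithm over a task -> list-of-predecessors mapping."""
    succs = {t: [] for t in preds}
    indeg = {t: len(p) for t, p in preds.items()}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    queue = deque(t for t, d in indeg.items() if d == 0)  # tasks with no deps
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for s in succs[t]:           # releasing t may make successors ready
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if len(order) != len(preds):
        raise ValueError("cycle: no topological order exists")
    return order

preds = {"A": ["B", "C", "D"], "F": ["B", "G"],
         "B": [], "C": [], "D": [], "G": []}
order = topological_sort(preds)
```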
The importance of
abstraction
• Abstraction is important to preserve
generality
• Too specific
float sum = 0;
for (int i = 0; i < 8; i++) {
    sum += a[i];
}
• General and flexible
float sum = sumArray(a);
Communication
Communication is a major
bottleneck
• Typically a 1:50 to 1:10,000 difference between computation and communication cost
• Communication cost is not very dependent on data size.
• The interconnection network topology affects the overall time.
Scheduling work prior to
1995
• Assumptions
– Zero-interprocessor
communication costs
– Fully connected processor
interconnection networks.
Amounts of data transfer
Public transport is a good
thing?
Data size is not the major factor
• Multiple single messages:
connect, send; connect, send; connect, send
• Single compound message:
connect; send, send, send
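This batching effect can be illustrated with a simple cost model in which every message pays a fixed startup ("connect") cost plus a per-byte transfer cost; the constants below are assumed for illustration, not measured:

```python
STARTUP = 100.0   # fixed per-message startup cost (assumed units)
PER_BYTE = 0.01   # transfer cost per byte (assumed)

def multiple_single_messages(sizes):
    # connect-send, connect-send, ...: one startup cost per message
    return sum(STARTUP + PER_BYTE * s for s in sizes)

def single_compound_message(sizes):
    # connect once, then send all the data
    return STARTUP + PER_BYTE * sum(sizes)

sizes = [1024] * 3
print(multiple_single_messages(sizes))  # pays 3 startup costs
print(single_compound_message(sizes))   # pays only 1 startup cost
```

Because the startup cost dominates for small messages, the compound message wins even though the total data volume is identical.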
Interconnection
topology
Fully connected vs. the ring
[Figure: in a fully connected network every pair of processors has a direct link; in a ring, a message from one node to another may have to pass through intermediate nodes]
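The contrast can be made concrete with a small hop-count sketch (a simplified model; real networks also differ in link bandwidth and contention):

```python
def ring_hops(src, dst, n):
    """Minimum hops between nodes src and dst on an n-node bidirectional ring."""
    d = abs(src - dst) % n
    return min(d, n - d)   # go whichever way round is shorter

def fully_connected_hops(src, dst):
    """Every pair of distinct nodes has a direct link."""
    return 0 if src == dst else 1

print(ring_hops(0, 4, 8))          # halfway round an 8-node ring: 4 hops
print(fully_connected_hops(0, 4))  # always a single hop
```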
Interprocessor communication
• Zero vs non-zero
communication overheads
• Direct links vs connecting
nodes
[Figure: bus-based multiprocessor – processors P1–P4 connected via a bus to shared RAM (shared memory)]
[Figure: mesh multiprocessor – processors P11–P44 arranged in a 4×4 grid (distributed memory)]
Avoiding communication
overheads
Duplication
[Figure: a taskgraph where task a (weight 1) precedes b and c (weight 1 each) over unit-cost communication edges. Without duplication, a, b and c run sequentially on p1, finishing at t=3. Duplicating a on both p1 and p2 lets b run on p1 and c on p2 at t=2, so the schedule finishes at t=2]
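The gain shown in the figure can be checked with a toy calculation, assuming unit task weights and a unit communication cost as on the slide:

```python
W = 1      # every task has weight 1 (as in the figure)
COMM = 1   # inter-processor communication cost (unit, as in the figure)

def makespan(duplicate_a):
    """Finish time of the b/c pair, with or without duplicating task a."""
    finish_a_p1 = W                   # a runs on p1, finishing at t=1
    finish_b = finish_a_p1 + W        # b on p1, right after a
    if duplicate_a:
        # a is also executed on p2; c waits only for the local copy
        finish_a_p2 = W
        finish_c = finish_a_p2 + W
    else:
        # c on p2 must wait for a's result to arrive over the network
        finish_c = finish_a_p1 + COMM + W
    return max(finish_b, finish_c)

print(makespan(False))  # 3: communication on the critical path
print(makespan(True))   # 2: duplication removes the communication
```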
When considering
communication overheads
Classic communication
model: Assumptions
• Local communications have
zero communication costs
• Communication is conducted by a dedicated communication subsystem.
• Communication can be
performed concurrently
• The network is fully connected
Implications
• Network contention (not
modelled)
– Tasks compete for communication
resources
• Contention can be modelled:
– Different types of edges
– Switch vertices (in addition to processor vertices)
Processor involvement in
communication I
Two-sided involvement
(TCP/IP PC-cluster)
Processor involvement in
communication II
One-sided involvement
(Shared memory Cray T3E)
Processor involvement in
communication III
Third party involvement
(Dedicated DMA hardware Meiko CS-2)
Problems
• All classic scheduling models assume third-party involvement.
• Very little hardware is equipped with dedicated support for third-party involvement.
• Estimated finish times for tasks are hugely inaccurate.
• Scheduling algorithms are therefore very suboptimal.
Even more problems
Results
[Figure: experimental results on bobcat, Sun E3500, and Cray T3E-900]
The End