Associative - Kent State University

Download Report

Transcript Associative - Kent State University

SIMD and Associative
Computing
Computational Models and
Algorithms
Associative Computing Topics
• Introduction
– References for Associative Computing
– Motivation for the MASC model
– The MASC and ASC Models
– A Language Designed for the ASC Model
– Two ASC Algorithms and Programs
• ASC and MASC Algorithm Examples
– ASC version of Prim’s MST Algorithm
– ASC version of QUICKHULL
– MASC version of QUICKHULL.
2
Associative Computing References
Note: Below KSU papers are available on
the website: http://www.cs.kent.edu/~parallel/
(Click on the link to “papers” in left column)
1.
2.
Maher Atwah, Johnnie Baker, and Selim Akl, An
Associative Implementation of Classical Convex
Hull Algorithms, Proc of the IASTED International
Conference on Parallel and Distributed Computing
and Systems, 1996, 435-438
Johnnie Baker and Mingxian Jin, Simulation of
Enhanced Meshes with MASC, a MSIMD Model,
Proc. of the Eleventh IASTED International
Conference on Parallel and Distributed Computing
and Systems, Nov. 1999, 511-516.
3
Associative Computing References
3.
4.
5.
Mingxian Jin, Johnnie Baker, and Kenneth Batcher,
Timings for Associative Operations on the MASC
Model, Proc. of the 15th International Parallel and
Distributed Processing Symposium, (Workshop on
Massively Parallel Processing, San Francisco, April
2001.
Jerry Potter, Johnnie Baker, Stephen Scott, Arvind
Bansal, Chokchai Leangsuksun, and Chandra
Asthagiri, An Associative Computing Paradigm,
Special Issue on Associative Processing, IEEE
Computer, 27(11):19-25, Nov. 1994. (Note: MASC is
called ‘ASC’ in this article.)
Jerry Potter, Associative Computing - A Programming
Paradigm for Massively Parallel Computers, Plenum
Publishing Company, 1992.
4
Associative Computers
Associative Computer: A SIMD computer
with a few additional features supported in
hardware.
• These additional features can be
supported (less efficiently) in traditional
SIMDs in software.
• The name “associative” is due to its ability
to locate items in the memory of PEs by
content rather than location.
5
Associative Models
The ASC model (for ASsociative Computing) gives
a list of the properties assumed for an
associative computer.
The MASC (for Multiple ASC) Model
• Supports multiple SIMD (or MSIMD)
computation.
• Allows model to have more than one Instruction
Stream (IS)
– The IS corresponds to the control unit of a SIMD.
• ASC is the MASC model with only one IS.
– The one IS version of the MASC model is sufficiently
important to have its own name.
6
ASC & MASC are KSU Models
• Several professors and their graduate students at Kent
State University have worked on models
• The STARAN and the ASPRO fully support the ASC
model in hardware. The MPP supports ASC, partly in
hardware and partly in software.
– Prof. Batcher was chief architect or consultant
• Dr. Potter developed a language for ASC
• Dr. Baker works on algorithms for models, applications,
and architectures to support models
• Dr. Walker has students who worked on a hardware
design to support the ASC and MASC models.
• Dr. Batcher and Dr. Potter are currently not actively
working on ASC/MASC models but still provide advice.
7
Motivation
• The STARAN Computer (Goodyear Aerospace,
early 1970’s) and later the ASPRO provided an
architectural model for associative computing
embodied in the ASC model.
• ASC extends the data parallel programming
style to a complete computational model.
• ASC provides a practical model that supports
massive parallelism.
• MASC provides a hybrid data-parallel, control
parallel model that supports associative
programming.
• Descriptions of these models allow them to be
compared to other parallel models
8
The ASC Model
C
E
L
L
Cells
Memory
PE
  
N
E
T
W
O
R
K
IS
Memory
PE
Memory
PE
9
Basic Properties of ASC
• Instruction Stream
– The IS has a copy of the program and can broadcast
instructions to cells in unit time
• Cell Properties
– Each cell consists of a PE and its local memory
– All cells listen to the IS
– A cell can be active, inactive, or idle
• Inactive cells listen but do not execute IS commands
until reactivated
• Idle cells contain no essential data and are available
for reassignment
• Active cells execute IS commands synchronously
10
Basic Properties of ASC
• Responder Processing
– The IS can detect if a data test is satisfied by
any of its responder cells in constant time
(i.e., any-responders property).
– The IS can select an arbitrary responder in
constant time (i.e., pick-one property).
11
Basic Properties of ASC
• Constant Time Global Operations (on a
parallel variable with a value in each PE)
– Logical OR and AND of binary values
– Maximum and minimum of numbers
– Associative searches
• Communications
– There are two networks
• PE communications (or cell) network
• IS broadcast/reduction network (which could be
implemented as two separate networks)
12
Basic Properties of ASC
– The PE communications network is normally
supported by an interconnection network
• E.g., a 2D mesh
– The broadcast/reduction network(s) are
normally supported by two networks
(broadcast & reduction) but sometimes
combined.
• See posted paper by Jin, Baker, & Batcher (listed in
associative references)
• Control Features
– PEs and the IS and the networks all operate
synchronously, using the same clock
13
Non-SIMD Properties of ASC
• Observation: The ASC properties that are unusual for
SIMDs are the constant time operations:
– Constant time responder processing
• Any-responders?
• Pick-one
– Constant time global operations
• Logical OR and AND of binary values
• Maximum and minimum value of numbers
• Associative Searches
• Typically, SIMDs cannot perform any of these operations
in constant time – but have efficient executions of them.
• These timings are justified by implementations using a
resolver in the paper by Jin, Baker, & Batcher (listed in
14
associative references and posted).
Typical Data Structure for ASC Model
PE1
Make
Color
Year
Dodge
red
1994
PE2
PE3
IS
PE4
Price
Busyidle
1
1
0
0
Ford
blue
1996
1
1
Ford
white
1998
0
1
PE5
PE6
PE7
Model
On
lot
Subaru
red
1997
0
0
0
0
1
1
Make, Color – etc. are fields the programmer establishes
Various data types are supported. Some examples will show
string data, but they are not supported in the ASC simulator. 15
The Associative Search
PE1
Make
Color
Year
Dodge
red
1994
Model
Price
PE2
PE3
IS
PE4
Busyidle
1
1
0
0
Ford
blue
1996
1
1
Ford
white
1998
0
1
PE5
PE6
PE7
On
lot
Subaru
red
1997
0
0
0
0
1
1
IS asks for all cars that are red and on the lot.
PE1 and PE7 respond by setting a mask bit in their PE.
16
Characteristics of Associative
Programming
• Consistent use of style of programming called
data parallel programming
• Consistent use of global associative searching
and responder processing
• Usually, frequent use of the constant time
global reduction operations: AND, OR, MAX,
MIN
• Broadcast of data using IS bus allows the use
of the PE network to be restricted to parallel
data movement.
17
Associative Programming Characteristics
•
•
•
•
Tabular representation of data – think 2D arrays
Use of searching instead of sorting
Use of searching instead of pointers
Use of searching instead of the ordering provided
by linked lists, stacks, queues
• Promotes an highly intuitive programming style
that promotes high productivity
• Uses structure codes (i.e., numeric
representation) to represent data structures such
as trees, graphs, embedded lists, and matrices.
• Examples of the above are given in
– Ref: Nov. 1994 IEEE Computer article.
– Also, see “Associative Computing” book by Potter.
18
Languages Designed for the ASC
• Professor Potter has created several languages for the
ASC model.
• ASC is a C-like language designed for ASC model
• ACE is a higher level language than ASC that uses
natural language syntax; e.g., plurals, pronouns.
• Anglish is an ACE variant that uses an English-like
grammar (e.g., “their”, “its”)
• An OOPs version of ASC for the MASC was discussed
(by Potter and his students), but never designed.
• Language References:
– ASC Primer – Copy available on parallel lab website
www.cs.kent.edu/~parallel/
– “Associative Computing” book by Potter [11] – some features in
this book were never fully implemented in ASC Compiler
19
Algorithms and Programs Implemented in
ASC
• A wide range of algorithms implemented in ASC
without the use of the PE network:
– Graph Algorithms
• minimal spanning tree
• shortest path
• connected components
– Computational Geometry Algorithms
• convex hull algorithms (Jarvis March, Quickhull,
Graham Scan, etc)
• Dynamic hull algorithms
20
ASC Algorithms and Programs
(not requiring PE network)
– String Matching Algorithms
• all exact substring matches
• all exact matches with “don’t care” (i.e., wild card)
characters.
– Algorithms for NP-complete problems
• traveling salesperson
• 2-D knapsack.
– Data Base Management Software
• associative data base
• relational data base
21
ASC Algorithms and Programs
(not requiring a PE network)
– A Two Pass Compiler for ASC – not the one
we will be using. This compiler uses ASC
parallelism.
• first pass
• optimization phase
– Two Rule-Based Inference Engines for AI
• An Expert System OPS-5 interpreter
• PPL (Parallel Production Language interpreter)
– A Context Sensitive Language Interpreter
• (OPS-5 variables force context sensitivity)
– An associative PROLOG interpreter
22
Associative Algorithms & Programs
(using a network)
• There are numerous associative programs that
use a PE network;
– 2-D Knapsack ASC Algorithm using a 1-D mesh
– Image processing algorithms using 1-D mesh
– FFT (Fast Fourier Transform) using 1-D nearest
neighbor & Flip networks
– Matrix Multiplication using 1-D mesh
– An Air Traffic Control Program (using Flip network
connecting PEs to memory)
• Demonstrated using live data at Knoxville in mid
70’s.
• All but first were developed in assembler at
Goodyear Aerospace
23
Two Algorithm Examples for ASC
• The remainder of the slides are optional
• They illustrate two optimal algorithms for ASC –
described at the data structures level.
– Minimal Spanning Tree
• Provides an ASC version of the sequential Prim’s
algorithm (sometimes called Dijkstra-Prim
algorithm)
– Convex Hull
• Provides an ASC version of the Quickhull
sequential algorithm
• The implementation of these algorithms are
simpler than implementation of their sequential
version.
24
Example 1 - MST
• A graph has nodes labeled by some identifying
letter or number and arcs which are directional
and have weights associated with them.
• Such a graph could represent a map where the
nodes are cities and the arc weights give the
mileage between two cities.
3
A
B
5
2
C
D
4
E 5
25
The MST Problem
• The MST problem assumes the weights are positive, the
graph is connected, and seeks to find the minimal
spanning tree,
–
i.e. a subgraph that is a tree1, that includes all nodes
(i.e. it spans), and
– where the sum of the weights on the arcs of the
subgraph is the smallest possible weight (i.e. it is
minimal).
• Why would an algorithm solving this problem be useful?
• Note: The solution may not be unique.
1
A tree is a set of points called vertices, pairs of distinct
vertices called edges, such that (1) there is a sequence
of edges called a path from any vertex to any other, and
(2) there are no circuits, that is, no paths starting from a
26
vertex and returning to the same vertex.
ASC-MST Algorithm Preliminaries
• Next, a “data structure” level presentation of
Prim’s algorithm for the MST is given.
• The data structure used is illustrated in the next
two slides.
– This example is from the Nov. 1994 IEEE Computer
paper cited in the references.
• There are two types of variables for the ASC
model, namely
– the parallel variables (i.e., ones for the PEs)
– the scalar variables (ie., the ones used by the IS).
– Scalar variables are essentially global variables.
• Can replace each with a parallel variable with this scalar
value stored in each entry.
27
ASC-MST Algorithm Preliminaries (cont.)
• In order to distinguish between them here, the parallel
variables names end with a “$” symbol.
• Each step in this algorithm takes constant time.
• One MST edge is selected during each pass through the
loop in this algorithm.
• Since a spanning tree has n-1 edges, the running time of
this algorithm is O(n) and its cost is O(n 2).
– Definition of cost is (running time)  (number of processors)
• Since the sequential running time of the Prim MST
algorithm is O(n 2) and is time optimal, this parallel
implementation is cost optimal.
– Cost & optimality will be covered in parallel algorithm
performance evaluation chapter (See Ch 7 of Quinn)
28
Graph used for Data Structure
a
22
7
b
4
8
c
3
9
6
d
3
e
f
Figure 6 in [Potter, Baker, et. al.]
29
current_best$
7
IS
d ∞
4 ∞ ∞ 3 ∞ yes b
4
root
a
e ∞
3 6
3 ∞ ∞ yes b
3
nextnode
b
f
candidate$
7 ∞ ∞ 6 9 yes b
f$
8
e$
c
d$
2
c$
a
b$
∞ 7
a$
b 2
node$
2 8 ∞ ∞ ∞ no
mask$
a ∞
PEs
parent$
Data Structure for MST Algorithm
4 3 ∞ no
∞ ∞ 9 ∞ ∞ ∞ wait
30
Algorithm: ASC-MST-PRIM(root)
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
Initialize candidates to “waiting”
If there are any finite values in root’s field,
set candidate$ to “yes”
set parent$ to root
set current_best$ to the values in root’s field
set root’s candidate field to “no”
Loop while some candidate$ contain “yes”
for them
restrict mask$ to mindex(current_best$)
set next_node to a node identified in the preceding step
set its candidate to “no”
if the value in their next_node’s field are less than current_best$, then
set current_best$ to value in next_node’s field
set parent$ to next_node
if candidate$ is “waiting” and the value in its next_node’s field is finite
set candidate$ to “yes”
set parent$ to next_node
31
set current_best to the values in next_node’s field
Comments on ASC-MST Algorithm
• The three preceding slides are Figure 6 in [Potter, Baker,
et.al.] IEEE Computer, Nov 1994].
• Preceding slide gives a compact, data-structures level
pseudo-code description for this algorithm
– Pseudo-code illustrates Potter’s use of pronouns
(e.g., them, its) and possessive nouns.
– The mindex function returns the index of a processor
holding the minimal value.
– This MST pseudo-code is much shorter and simpler
than data-structure level sequential MST pseudocodes
• e.g., see one of Baase’s textbooks cited in references
• Algorithm given in Baase’s books is identical to this parallel
algorithm, except for a sequential computer
• Next, a more detailed explanation of the algorithm in
preceding slide will be given next.
32
Tracing 1st Pass of MST Algorithm on Figure 6
root
nextnode
a ∞
2 8 ∞ ∞ ∞
b 2
∞ 7
c
8
7 ∞ ∞ 6 9
IS
d ∞
4 ∞ ∞ 3 ∞
a
e ∞
3 6
f
current_best$
parent$
candidate$
f$
e$
d$
c$
b$
a$
node$
mask$
PEs
(Put below chart & Figure 6 on board)
4 3 ∞
3 ∞ ∞
∞ ∞ 9 ∞ ∞ ∞
33
Algorithm: ASC-MST-PRIM
• Initially assign any node to root.
• All processors set
– candidate$ to “wait”
– current-best$ to 
– the candidate field for the root node to “no”
• All processors whose distance d from their node
to root node is finite do
– Set their candidate$ field to “yes
– Set their parent$ field to root.
– Set current_best$ = d.
34
Algorithm: ASC-MST-PRIM (cont. 2/3)
• While the candidate field of some processor is
“yes”,
– Restrict the active processors whose candidate field
is “yes” and (for these processors) do
• Compute the minimum value x of current_best$.
• Restrict the active processors to those with
current_best$ = x and do
– pick an active processor, say node y.
» Set the candidate$ value of node y to “no”
– Set the scalar variable next-node to y.
35
Algorithm: ASC-MST-PRIM (cont. 3/3)
–If the value z in the next_node column of
a processor is less than its
current_best$ value, then
»Set current_best$ to z.
»Set parent$ to next_node
– For all processors, if candidate$ is “waiting”
and the distance of its node from next_node y
is finite, then
• Set candidate$ to “yes”
• Set current_best$ to the distance of its node from
y.
• Set parent$ to y
36
Quickhull Algorithm for ASC
• Reference:
– [Maher, Baker, Akl, “An Associative
Implementation of Classical Convex
Hull Algorithms” ]
h
• Review of Sequential Quickhull
Algorithm


– Suffices to find the upper convex hull
of points that are on or above the line
• Select point h so that the area of
triangle weh is maximal.
• Proceed recursively with the sets
of points on or above the lines wh
and he .









w







37
e

Previous Illustration


h












w




e

38
Example for Data Structure

P6, h


p5
p7 

p4

p1, w

p2


P3, e

39
Data Structure for Preceding Example
right-pt$
area$
left-pt$
y-coord$
name$
x-coord$
PE
mask
point$
hull$
job$
p1 1
3
p1 p3 1
1
p2 7
1
p1 p3
0
p3 12 2
p1 p3 1
1
p4 8
4
p1 p3
1
h
p5 11 7
p1 p3
1
ctr
p6 8
9
p1 p3 1
1
p7 2
6
p1 p3
1
0
IS
w
e
h
40
Algorithms & Assumption
• Basic algorithms exist for the following problems
in Euclidean geometry for plane:
– Determine whether a third point lies on, above, or
below the line determined by two other points.
– Compute the area of a triangle determined by three
points.
• Standard Assumption
– Three arbitrary points do not all lie on the same line.
Reference: Introduction to Algorithms by Cormen, Leisterson, Rivest,
(& Stein), McGraw Hill, Chapter on Computational Geometry.
41
ASC Quickhull Algorithm
(Upper Convex Hull)
ASC-Quickhull( planar-point-set )
1. Initialize: ctr = 1, area$ = 0, hull$ = 0
2. Find the PE with the minimal x-coord$ and let
w be its point$
a) Set its hull$ value to 1
3. Find the PE with the PE with maximal x-coord$
and let e be its point$
a) Set its hull$ to 1
4. All PEs set their left-pt to w and right-pt to e.
5. If the point$ for a PE lies above the line
a) Then set its job$ value to 1
b) Else set its job$ value to 0
42
ASC Quickhull Algorithm (cont)
6.
Loop while parallel job$ contains a nonzero value
a) The IS makes its active cell those with a maximal
job$ value.
b) Each (active) PE computes and stores the area of
triangle (left-pt$, right-pt$, point$ ) in area$
c) Find the PE with the maximal area$ and let h be its
point.
•
Set its hull$ value to 1
d) Each PE whose point$ is above left  pt , h
•
•
sets its job$ value to ++ctr
sets its right-pt to h
e) Each PE whose point$ is above h, right  pt
•
•
f)
sets its job$ to ++ctr
sets its left-pt to h
Each PE with job$ < ctr -2 sets its job$ value to 0
43
Highest Job Order Assigned to
Points Above Lines




7
4


3
2
5
6

1



44
Order that Triangles are Computed




6
4


2
5
3
7

1



45
Performance of ASC-Quickhull
Average Case:
• Assume either of the following:
– For some integer k>1, on average 1/k of the
points above each line being processed are
eliminated each round.
• For example, consider k = 3, as one of three
different areas are eliminated each round
– O(lg n) points are on the convex hull.
• For randomly generated points, the number of
convex hull points is very close to lg(n) points.
46
Performance of ASC-Quickhull
(cont)
• Either of above assumptions imply the
average running time is O(lg n).
– For example, each pass through algorithm
loop produces one convex hull point.
• The average cost is O(n lg n)
Worst Case:
• Running time is O(n).
• Cost is O(n2)
Recall: The definition of cost is
Cost = (running time)  (nr. of processors)
47
Additional Comments on ASC
Convex Hull Algorithm
• The full “convex hull” algorithm requires that an
order (e.g., clockwise) list of convex hull points
be returned.
– Preceding algorithms for ASC and MASC can be
extended to handle this.
• This detail is omitted here to keep the algorithms
simpler.
– More information can be found in the paper “An
Associative Implementation of Classical Convex Hull
Algorithms” by Atwah, Baker, and Akl and in Maher
Atwah’s master’s thesis at KSU.
48