par-langs-ch7 - Introduction to Concurrency in Programming Languages



Concurrency in Programming Languages
Matthew J. Sottile
Timothy G. Mattson
Craig E Rasmussen
© 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen
Chapter 7 Objectives
• Examine language constructs for parallel programming
  – Array abstractions
  – Message passing
  – Control flow
• Look at how parallelism and concurrency fit into the functional programming paradigm
• Focus on modern languages
Outline
• Array abstractions
• Message passing
• Control-flow parallelism
• Functional languages
Parallelism and arrays
• Array-based computations typically exhibit a high degree of parallelism.
  – Example: elementwise addition of two arrays to compute a third containing the sum of the first two (see the sketch below).
• In languages without rich array types, this parallelism is extracted by parallelizing loops.
  – In the sum example above, this means recognizing that loop iterations are independent and can run in parallel.
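As a language-neutral illustration of the elementwise example (this sketch is ours, not from the slides; the vecAdd name is invented), the whole-collection form makes the independence of the output elements explicit:

-- Minimal sketch: elementwise sum of two vectors, expressed over whole
-- collections rather than with an explicit loop. Each output element
-- depends only on the corresponding inputs, so every position could in
-- principle be computed in parallel.
vecAdd :: [Double] -> [Double] -> [Double]
vecAdd = zipWith (+)

main :: IO ()
main = print (vecAdd [1, 2, 3] [10, 20, 30])   -- [11.0,22.0,33.0]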
Example: adding two vectors
/* code to sum two vectors of length len */
int i;
double *x, *y, *sum;
int len;

for (i = 0; i < len; i++)
    sum[i] = x[i] + y[i];
The ith iteration doesn’t depend on any other, so it can be run in parallel with other iterations.
Array types
• In languages like C, where no rich array type is present, computations on arrays must be explicitly coded as loops.
• Languages with rich array types allow the programmer to express computations over arrays more concisely, leaving the decision of how to implement them up to the compiler.
Array types
• Languages without rich array types require the compiler to infer parallel operations from loops.
• The compiler must then figure out how best to parallelize them.
• In other words, the complexity of the compilation task to parallelize code is greater for languages with limited type systems.
  – Rich type systems help both programmer and compiler!
Example: adding two vectors
INTEGER :: len
DOUBLE PRECISION, DIMENSION(len) :: x, y, sum
sum = x + y
Addition of the whole array is expressed by adding the arrays themselves. The compiler decides how to implement this.
Higher-level abstraction
• The array-based computations of languages with whole-array syntax are defined to update the array elements in parallel.
• The compiler chooses how to implement this efficiently.
  – Possibly sequentially, possibly in parallel.
  – The important thing is that the programmer doesn’t worry about it: the compiler handles the decision based on the target platform.
Pros and Cons of Explicit Loops
• Con: Explicit loops take freedom away from the compiler.
  – Parallelization of loops requires the compiler to infer the presence of parallelism.
  – Complex loops mean the compiler may miss parallelization opportunities.
• Pro: Explicit loops give the programmer full control.
  – Array syntax can make it harder to understand the properties of the program created by the compiler.
  – Example: memory usage is harder to reason about due to implicit temporaries.
Temporaries
• Consider the following code:
int i;
double *tmp, *x;
int len;

for (i = 0; i < len-1; i++)
    tmp[i] = x[i] + x[i+1];

for (i = 0; i < len-1; i++)
    x[i] = tmp[i];
• A dependency is present between loop iterations. A temporary array is used to prevent overwriting values until all elements have been processed, and then the results are copied back.
Temporaries
• Now consider the same code in array notation.
INTEGER :: len
DOUBLE PRECISION, DIMENSION(len) :: x
x(1:len-1) = x(1:len-1) + x(2:len)
• In order to guarantee the same behavior as the code with loops, the compiler must generate temporary storage behind the scenes.
• In complex code, it can be difficult to see where temporaries will arise. They are not explicitly shown in code using array notation.
Temporaries
• The problem of implicit temporaries is well known.
• Elemental functions are a mechanism to avoid unexpected temporary overhead.
Elemental functions
• Functions that return arrays often force the compiler to create a temporary to store their result when they are used in an expression.
• Elemental functions are functions on scalar values that can be called with array arguments as well.
• Calling an elemental function on an array implicitly loops the function application over the array.
Elemental functions
• This allows the compiler to avoid temporaries, since it can infer that calling the function with an array is equivalent to a loop in which it is called for each element.
• Code that would otherwise be generated with temporaries can be generated to work in-place when elemental functions are present.
  – Elemental functions make reasoning about memory overhead easier.
Shifts
• A common pattern in array-based computations is to work with an array where the indices have been shifted by a fixed amount.
  – Example: subtracting each element from its successor is equivalent to subtracting the array from itself with its indices shifted by one (sketched below).
• Shift primitives are often available to inform the compiler of this pattern.
  – Shift primitives also deal with boundary issues, such as circular shifts or shifts in which zero elements are introduced at the boundaries.
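Fortran provides shift intrinsics such as CSHIFT and EOSHIFT for this pattern; the sketch below is our own, using Haskell lists, and shows a simple circular shift plus the successor-difference example built by combining the data with a shifted view of itself:

-- Circular shift left by one: the element that falls off the front wraps around.
cshift1 :: [a] -> [a]
cshift1 []       = []
cshift1 (v : vs) = vs ++ [v]

-- Subtract each element from its successor: combine the list with a view of
-- itself shifted by one index (the result has one fewer element).
diffs :: Num a => [a] -> [a]
diffs vs = zipWith (-) (drop 1 vs) vs

main :: IO ()
main = do
  print (cshift1 [1, 2, 3, 4 :: Int])    -- [2,3,4,1]
  print (diffs   [1, 4, 9, 16 :: Int])   -- [3,5,7]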
Message passing
• Parallel programs are often built by explicitly passing data between processes.
  – Sending and receiving messages.
• The Message Passing Interface (MPI) library is an example of this at the API level.
• Some languages integrate the concept of message passing directly into the language syntax.
Message passing
• Message passing can take many forms.
  – One-sided: one process initiates a send or receive with another process without the other process explicitly issuing a corresponding receive or send.
    • This is often a form of direct memory access, not necessarily via hardware-level methods.
  – Two-sided: both processes participate, with one explicitly issuing a send or receive and the other explicitly issuing the corresponding receive or send.
  – Remote procedure call: message passing can also take the form of invoking functions between process spaces.
    • RPC is often implemented via one- or two-sided messaging.
Actor model
• The actor model is an abstraction for message passing.
• It is encountered in the Erlang language.
• Processes are actors.
• Messages are sent between actors by delivering them into mailbox buffers.
• Actors receiving messages examine their mailboxes to find data that was sent to them asynchronously (a rough sketch follows below).
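Erlang is the language the slides point to; as a rough analogue only (our own sketch, not from the slides), a mailbox-driven actor can be imitated in Haskell with a lightweight thread reading messages from a Chan:

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Messages that can be delivered to the actor's mailbox.
data Msg = Greet String | Stop

-- The actor: repeatedly examine the mailbox and react to each message.
actor :: Chan Msg -> IO ()
actor mailbox = do
  msg <- readChan mailbox
  case msg of
    Greet name -> putStrLn ("hello, " ++ name) >> actor mailbox
    Stop       -> putStrLn "actor stopping"

main :: IO ()
main = do
  mailbox <- newChan
  _ <- forkIO (actor mailbox)          -- the actor runs as its own thread
  writeChan mailbox (Greet "world")    -- messages are delivered asynchronously
  writeChan mailbox Stop
  threadDelay 100000                   -- crude: give the actor time to drain its mailbox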
Channels
• Languages with channel types allow variables to represent endpoints of message passing operations.
• A variable may be defined to be a channel.
• Operations on the channel allow data to be pushed into it (sent) or read from it (received).
• The occam language provides a channel type (a comparable Haskell sketch follows below).
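occam is the language cited above; Haskell's Control.Concurrent.Chan offers a comparable channel type. A minimal sketch (ours, not from the slides):

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  ch <- newChan                                         -- a variable holding a channel
  _  <- forkIO (forM_ [1 .. 3 :: Int] (writeChan ch))   -- push (send) data into it
  replicateM_ 3 (readChan ch >>= print)                 -- read (receive) data from it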
Co-arrays
• Traditional arrays have one or more dimensions that can be indexed.
  – These dimensions translate to linear coordinates in memory.
• Co-arrays add dimensions that correspond to other processors.
  – Traditional dimensions still correspond to a mapping to linear memory coordinates, but an additional dimension allows one to indicate which processor owns the data.
• Introduced in Co-Array Fortran, adopted in Fortran 2008.
Control flow
• Arrays and message passing are data centric.
  – Parallelism is focused on the data.
• Control-flow parallelism focuses on parallelism of different operations.
Algol collateral clauses
• Early languages like Algol allowed programmers to specify sequences of operations in which the order of execution didn’t matter.
  – This implied that the operations had no dependencies and could therefore execute in parallel.
• Example: x=f(1), y=g(); z=f(x); x=y+z;
• The comma indicates that the first two statements can execute in parallel, but the assignments to z and x that follow them must execute in sequence.
Occam PAR, SEQ, and ALT
• The occam language provided keywords to indicate that blocks of statements should be executed in parallel or in sequence.
• SEQ and PAR blocks could be nested.
  – A SEQ block could contain PAR blocks, meaning that each of the PAR blocks could contain parallelism internally, but they would execute in sequence relative to each other.
Occam PAR and SEQ
• Example:
SEQ
  x := 14
  PAR
    foo1(x)
    foo2(x)
  PAR
    bar1(x)
    bar2(x)
  y := 12
SEQ has four activities that will be executed in sequence. Two of these are PAR blocks whose contents can be executed in parallel within each block.
Occam ALT
• Sometimes a program will want to execute one statement from a set of candidates based on a condition. Occam ALT expresses this.
  – Useful for dealing with multiple message passing channels.
• For example, given two channels, pick the one that receives data first:
INT x:
ALT
  a ? x
    doSomething(x)
  b ? x
    doSomethingElse(x)
Parallel loops
• A common idiom in modern languages is the parallel loop.
  – Most prevalent: OpenMP parallel for loops.
• Explicitly tell the compiler when we wish to execute loop iterations in parallel with each other.
  – Removes the need for the compiler to infer parallelism.
  – Also allows the programmer to indicate explicitly what data is shared versus private to each loop iteration.
Parallel loops
• Consider the following parallel loop:
#pragma omp parallel shared(mean)
{
    #pragma omp for private(i) reduction(+:mean)
    for (i = 0; i < N; i++) {
        mean = mean + x[i] / (double)N;
    }
}
• This tells the compiler that the loop iterations can be done in parallel, and that the value of mean computed for each iteration should be summed up to compute the overall mean of x[].
Parallel loops
• OpenMP allows programmers to say:
  – When loop iterations should be performed in parallel.
  – How to recombine the results of these iterations to form a single result (the “reduction”).
  – Which data should be treated as shared versus loop-private.
    • Helps the compiler determine where copies should be made.
• Parallel loops are often very easy for programmers to reason about.
Functional languages
• Functional languages focus on functions applied to data.
• They avoid specifying how operations work and instead focus on what operations should be applied to data.
  – The “how” of the functions is deferred to the compiler.
  – The programmer focuses on “what” they want to do, regardless of how it is implemented.
Maps
• Example:
  – Given a list of elements “x”, square each of them:
  – y = map square x
• The functional programmer simply says that y is defined as the result of applying the function square to all elements of x.
• The programmer never specifies how this function is applied to the list (a complete, runnable version follows below).
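The one-liner above is essentially Haskell already; a complete version (with square defined here for illustration, and example data that is ours):

-- The programmer states *what* to compute; how the elements are visited
-- is not spelled out anywhere in this definition.
square :: Int -> Int
square n = n * n

main :: IO ()
main = do
  let x = [1, 2, 3, 4, 5]
      y = map square x
  print y                      -- [1,4,9,16,25]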
Functional programming
• Primitives in functional languages typically focus on how a single function is applied to data via various patterns (each sketched below):
  – Mapping over all elements.
  – Folding a function over all elements to derive a single value.
  – Combining multiple sequences to produce a single sequence.
• Functional languages typically work at a higher level of abstraction, explicitly decoupling “what” to do from “how” to do it.
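A minimal Haskell sketch of the three patterns (the example values are ours):

main :: IO ()
main = do
  let xs = [1, 2, 3, 4] :: [Double]
      ys = [10, 20, 30, 40] :: [Double]
  print (map (* 2) xs)         -- mapping over all elements
  print (foldr (+) 0 xs)       -- folding to derive a single value (here, the sum)
  print (zipWith (+) xs ys)    -- combining two sequences into one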
Opportunity for parallelism
• By separating the “what to do” from the “how to do it”, functional languages give compilers great freedom relative to languages in which programmers specify low-level machine operations (a small sketch follows below).
• Functional languages like Haskell often also take side effects into account.
  – Explicitly stating whether or not an operation has an effect.
  – This has a strong impact on parallelization.
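One concrete illustration of that freedom in GHC Haskell is the par and pseq combinators from the parallel package: two effect-free computations can be evaluated side by side. This sketch is ours and assumes the parallel package is installed and the program is compiled with -threaded.

import Control.Parallel (par, pseq)   -- from the "parallel" package

-- An independent, effect-free computation; nothing constrains when it runs.
sumTo :: Integer -> Integer
sumTo n = sum [1 .. n]

main :: IO ()
main = do
  let a = sumTo 10000000
      b = sumTo 20000000
      -- spark a in parallel, evaluate b, then combine the two results
      total = a `par` (b `pseq` (a + b))
  print total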
Effects and parallelism
• Effects complicate parallelism.
• If two computations have no effects and don’t depend on each other, their order doesn’t matter.
• If they do have effects, their order matters even when they don’t depend on each other.
  – Effects such as printing a value cause the output to vary depending on order.
• Languages like Haskell capture the effect properties of computations in the type system (sketched below).
  – This aids parallelizing compilers.
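A small sketch of how the types record this (ours, not from the slides): a pure function has no IO in its type, so independent calls to it can be reordered freely, while an effectful computation is marked with IO and its ordering is observable.

-- Pure: the type promises no side effects, so independent calls may be
-- evaluated in any order (or in parallel) without changing the result.
scale :: Double -> Double -> Double
scale factor v = factor * v

-- Effectful: the IO in the type records interaction with the outside world,
-- so reordering these computations changes the observable output.
report :: String -> Double -> IO ()
report label v = putStrLn (label ++ ": " ++ show v)

main :: IO ()
main = do
  let a = scale 2.0 3.0
      b = scale 2.0 4.0
  report "a" a
  report "b" b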