par-langs-ch6 - Introduction to Concurrency in Programming Languages

Concurrency in Programming Languages
Matthew J. Sottile
Timothy G. Mattson
Craig E Rasmussen
© 2009 Matthew J. Sottile, Timothy G. Mattson, and Craig E Rasmussen
Chapter 6 Objectives
• Examine the historical development of concurrent
and parallel systems.
– Hardware
– Languages
• Discuss the relationship that languages had with evolving hardware.
Outline
• Evolution of machines
• Evolution of languages
• Limits of automatic parallelization
Early concurrency
• Biggest challenge with early computers was
keeping them busy.
– Increasing utilization.
• Idle CPUs meant wasted cycles.
• Common source of idle CPUs: I/O
• How to use the CPU when a program is waiting on
I/O?
Addressing the I/O problem
• Two early methods became popular:
– Multiprogramming
– Interrupt driven I/O
• Concurrency addressed the utilization problem by allowing other programs to use the machine's resources when one became blocked on I/O.
Multiprogramming
• Transparently multiplex the computing hardware to
give the appearance of simultaneous execution of
multiple programs.
• Prior to multicore, single CPU machines used this
approach to provide multitasking.
Interrupt Driven I/O
• Another development was interrupt-driven I/O.
• This allows the hardware to notify the CPU when I/O operations are complete.
– Avoids inefficient active polling and checking by CPU.
– CPU can do other work and only worry about I/O
operations when the I/O hardware tells it that they are
ready.
• Interrupt-based hardware helps manage parallel
devices within a machine.
CPU sophistication
• Multiprogramming and interrupt-based hardware
addressed early utilization problems when CPUs
were simple.
• CPUs continued to advance though:
– Huge increases in speed (cycles/second)
– More sophisticated instruction sets
• This led to more hardware advances to support
the increased capabilities of newer CPUs.
Memory performance
• The gap between CPU clock speeds and the speed of physical I/O devices (like tapes) was always large.
• Soon CPUs overtook digital memory performance
as well.
– Memory itself looked slower and slower relative to the
CPU.
– This resulted in the same utilization problem faced in the early days with I/O devices.
Hierarchical memory
• One approach to dealing with this growing gap
between CPU performance and memory
performance was caches.
• Place one or more small, fast memories between
the CPU and main memory.
– Recently accessed data is replicated there.
– The locality assumption is often safe: memory that was recently accessed is likely to be accessed again.
– Caches speed up these subsequent accesses, amortizing the cost of the first access from slow memory over the multiple subsequent accesses served from the fast cache.
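The locality argument can be seen directly in code. Below is a small illustrative sketch (mine, not from the slides) in Java: summing a 2D array row by row touches consecutive memory locations and so keeps hitting cache lines that are already loaded, while the column-by-column version keeps jumping between rows.

// Illustration of the locality assumption: row-major traversal of a 2D array
// touches contiguous memory; column-major traversal does not.
public class Locality {
    static long sumRowMajor(int[][] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++)         // walk each row in order:
            for (int j = 0; j < a[i].length; j++)  // consecutive elements share cache lines
                s += a[i][j];
        return s;
    }

    static long sumColumnMajor(int[][] a) {
        long s = 0;
        for (int j = 0; j < a[0].length; j++)      // walk down columns: each access
            for (int i = 0; i < a.length; i++)     // lands in a different row's storage
                s += a[i][j];
        return s;
    }

    public static void main(String[] args) {
        int[][] a = new int[2048][2048];
        // Same result, but the row-major version makes far better use of the cache.
        System.out.println(sumRowMajor(a) + " " + sumColumnMajor(a));
    }
}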
Pipelining
• Another advance was to decompose instructions into pieces so that the CPU could be structured like a factory assembly line.
– Instructions start at one end, and as they pass through,
subtasks are performed such that at the end, the
instruction is complete.
• This allows multiple instructions to be executing at
any point in time, each at a different point in the
pipeline.
– This is a form of parallelism: instruction level
parallelism.
Pipelining
• Pipelining allowed for more complex instructions to
be provided that required multiple clock cycles to
complete.
– Each clock cycle, part of the instruction could proceed.
• Instruction-level parallelism allowed this multi-cycle complexity to be hidden.
– If the pipeline could be kept full, then every cycle an
instruction could complete.
Vector processing
• Another method for achieving parallelism in the
CPU was to allow each instruction to operate on a
set of data elements simultaneously.
• Vector processing took this approach – instructions operated on small vectors of data elements.
• This became very popular in scientific computing,
and later in multimedia and graphics processing.
– Today’s graphics processing units are a modern
descendant of early vector machines.
Dataflow
• In the 1970s and 1980s, dataflow was proposed as an alternative architecture to the traditional designs dating back to the 1940s.
• In dataflow, programs are represented as graphs
in which vertices represent computations and
edges represent data flowing into and out of
computations.
– Large opportunities for parallelism – any computation
with data ready could execute.
Dataflow
• Example: the expression (a+b)*(c+d) can be represented as a dataflow graph in which two + nodes (computing a+b and c+d) both feed a single * node.
• The independence of the + nodes means they can execute in parallel (see the sketch below).
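A rough sketch of that graph, expressed with futures in Java rather than on dataflow hardware: the two + nodes run independently, and the * node fires only once both of its inputs have arrived.

import java.util.concurrent.CompletableFuture;

// The (a+b)*(c+d) dataflow graph: two independent + nodes feeding one * node.
public class DataflowExpr {
    public static void main(String[] args) {
        int a = 1, b = 2, c = 3, d = 4;
        CompletableFuture<Integer> left  = CompletableFuture.supplyAsync(() -> a + b); // + node
        CompletableFuture<Integer> right = CompletableFuture.supplyAsync(() -> c + d); // + node
        // The * node consumes both edges and runs when both values are ready.
        int result = left.thenCombine(right, (x, y) -> x * y).join();
        System.out.println(result); // 21
    }
}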
Massively parallel machines
• In the 1980s and early 1990s, “massively parallel”
machines with many, many parallel processors
were created.
• Design goal: use many, many simple CPUs to
solve problems with lots of available parallelism.
– Many instructions would complete per second due to
high processor count.
– This would outperform systems with a small number of complex CPUs.
– Relied on finding problems with lots of parallelism.
Distributed systems and clusters
• Special purpose supercomputers (such as MPPs)
were very expensive.
• Networks of workstations could achieve similar performance when programs were written to use message passing to coordinate the parallel processing elements.
• Performance was often lower than that of special purpose supercomputers due to network latency, but the extreme cost savings of clusters of workstations outweighed the performance impact.
Today: Multicore and accelerators
• Processor manufacturers today can put multiple
CPUs on a single physical chip.
– Expensive shared memory multiprocessors of the past
are now cheap desktop or laptop processors!
• Demands of multimedia (video, 3D games) have
led to adoption of multicore and vector processing
in special purpose accelerators.
– Most machines today have a graphics card that is more
capable than a vector supercomputer 20 to 30 years
ago.
Outline
• Evolution of machines
• Evolution of languages
• Limits of automatic parallelization
FORTRAN
• The first widely used language was FORTRAN –
the FORmula TRANslation language.
– Dominant in numerical applications, which were the
common application area for early computers.
• Standard FORTRAN took many decades to adopt concurrency constructs.
• Dialects of FORTRAN built for specific machines
did adopt these constructs.
– E.g.: IVTRAN for the ILLIAC IV added the “DO FOR
ALL” loop to provide parallel loops.
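As a hedged modern analogue (Java parallel streams rather than IVTRAN), a "DO FOR ALL"-style loop declares that every iteration is independent, so the runtime is free to spread the iterations across processors.

import java.util.stream.IntStream;

// A parallel loop in the spirit of "DO FOR ALL": no iteration depends on another.
public class ParallelLoop {
    public static void main(String[] args) {
        double[] x = new double[1_000_000];
        // Each index i is handled independently and writes only its own slot.
        IntStream.range(0, x.length).parallel().forEach(i -> x[i] = Math.sin(i) * 2.0);
        System.out.println(x[10]);
    }
}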
ALGOL
• The ALGOrithmic Language was introduced
around the same time as FORTRAN.
– Introduced control flow constructs present in modern
languages.
• ALGOL 68 introduced concurrency constructs.
– Collateral clauses allowed the programmer to express
sequences of operations that could be executed in
arbitrary order (or in parallel).
– Added a data type for semaphores used for
synchronization purposes.
– Introduced “par begin”, a way of expressing that a block
of statements can be executed in parallel.
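A small sketch of both ideas using Java threads and java.util.concurrent.Semaphore (an analogue, not ALGOL 68 syntax): two statements execute in parallel, "par begin" style, while a binary semaphore protects the shared update.

import java.util.concurrent.Semaphore;

// "par begin"-style parallel block plus a semaphore-protected critical section.
public class ParBlock {
    static int counter = 0;
    static final Semaphore mutex = new Semaphore(1); // binary semaphore

    static void bumpCounter() {
        try {
            mutex.acquire();              // P operation: enter the critical section
            try {
                counter++;                // shared update, one thread at a time
            } finally {
                mutex.release();          // V operation: leave the critical section
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Roughly "par begin bumpCounter(); bumpCounter(); end":
        Thread t1 = new Thread(ParBlock::bumpCounter);
        Thread t2 = new Thread(ParBlock::bumpCounter);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter); // always 2 thanks to the semaphore
    }
}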
Concurrent Pascal and Modula
• Concurrent Pascal is a dialect of the Pascal
language designed for operating systems software
developers.
– Operating systems have been fundamentally concerned with concurrency since the introduction of multiprogramming and parallel I/O devices.
• Added constructs representing processes and
monitors.
– Monitors are similar to objects in that they provide data
encapsulation and synchronization primitives for
defining and enforcing critical sections that operate on
this data.
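A minimal monitor-style class, sketched in Java rather than Concurrent Pascal: the data is encapsulated in the object, each method body is a critical section guarded by the object's built-in lock, and wait/notify provides the condition synchronization that monitors offer.

// A one-slot buffer written as a monitor: all access to 'slot' happens
// inside synchronized methods of this class.
public class OneSlotBuffer {
    private Integer slot = null;   // encapsulated state, only touched inside the monitor

    public synchronized void put(int v) throws InterruptedException {
        while (slot != null) wait();   // block until the slot is empty
        slot = v;
        notifyAll();                   // wake a consumer waiting in take()
    }

    public synchronized int take() throws InterruptedException {
        while (slot == null) wait();   // block until a value is available
        int v = slot;
        slot = null;
        notifyAll();                   // wake a producer waiting in put()
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        OneSlotBuffer buf = new OneSlotBuffer();
        Thread producer = new Thread(() -> {
            try { for (int i = 0; i < 3; i++) buf.put(i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        for (int i = 0; i < 3; i++) System.out.println("took " + buf.take());
        producer.join();
    }
}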
Communicating Sequential Processes
• CSP was a formal language defined to represent
concurrent processes that interact with each other
via message passing.
• The occam language was heavily influenced by
CSP and provided language constructs for:
– Creating processes.
– Defining sequential and parallel sequences of
operations.
– Channels for communication between processes.
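A rough analogue of CSP/occam channels using a Java BlockingQueue (a stand-in chosen for this sketch, not occam syntax): two threads play the role of processes and communicate only through the channel.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Two "processes" (threads) connected by a channel (a capacity-1 queue).
public class ChannelDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    channel.put(i);   // blocks while the previous value has not been taken
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    System.out.println("received " + channel.take()); // blocks until a value arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}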
Ada
• Ada was a language designed for the US Dept. of
Defense in the 1970s and 1980s, still in use today.
• Intended to be used for critical systems in which
high software assurance was required.
• Included constructs for concurrent programming
early in its development.
• Ada used the “task” abstraction to represent
concurrently executing activities.
– Communication via message passing or shared
memory.
Ada
• Task communication is via rendezvous.
• Tasks reach points where they block until another
task reaches a paired point, at which time they
communicate and continue.
– The tasks “rendezvous” with each other.
• Ada 95 also introduced protected objects for
synchronization of data access.
– Provides mutual exclusion primitives to the programmer.
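A sketch of the rendezvous idea using a Java SynchronousQueue (an analogue, not Ada tasking): neither side proceeds until the other reaches the matching point, and the value is handed over at that moment.

import java.util.concurrent.SynchronousQueue;

// SynchronousQueue has no capacity, so put() and take() each block until a partner arrives.
public class Rendezvous {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> meetingPoint = new SynchronousQueue<>();

        Thread caller = new Thread(() -> {
            try {
                meetingPoint.put("request");   // blocks here until the other task accepts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        caller.start();
        // The accepting task blocks here until the caller arrives with its message.
        String msg = meetingPoint.take();
        System.out.println("rendezvous completed with: " + msg);
        caller.join();
    }
}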
Functional languages
• Functional languages have been around since
LISP in the 1950s.
• Typically considered to be at a higher level of abstraction than imperative languages like C.
– E.g.: Mapping a function over a list. A functional
programmer doesn’t implement lower level details like
how the list is represented or how the loop iterating over
its elements is structured.
• Higher level of abstraction leaves more decision
making up to the compiler.
– Such as parallelization.
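A small illustration of this abstraction level, using Java streams rather than LISP: the programmer states what to compute over the list and leaves the traversal strategy, sequential or parallel, to the library.

import java.util.List;
import java.util.stream.Collectors;

// Mapping a function over a list without writing the loop by hand.
public class MapExample {
    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4, 5);

        // Sequential mapping: only the intent (square every element) is stated.
        List<Integer> squares = xs.stream().map(x -> x * x).collect(Collectors.toList());

        // The same declaration of intent, with parallel evaluation left to the runtime.
        List<Integer> squaresPar = xs.parallelStream().map(x -> x * x).collect(Collectors.toList());

        System.out.println(squares + " " + squaresPar);
    }
}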
Functional languages
• MultiLISP was a dialect of Scheme that focused
on concurrent and parallel programming.
• Included the notion of a future variable.
– A future is a variable that can be passed around but
may not have a value associated with it until a later
time.
– Operations on these future values can be synchronized
if they are required before they are available.
• Futures have influenced modern languages, like
Java and X10.
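A hedged sketch of the future idea in Java (named on the slide as one modern descendant): the future variable exists immediately and can be passed around, but reading it synchronizes with the computation that produces the value.

import java.util.concurrent.CompletableFuture;

// The caller receives the future right away; the value arrives later.
public class FutureDemo {
    static CompletableFuture<Long> expensiveComputation(int n) {
        return CompletableFuture.supplyAsync(() -> {
            long sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Long> f = expensiveComputation(1_000_000);
        doOtherWork();            // proceeds without waiting for f
        long value = f.join();    // blocks only if the value is not ready yet
        System.out.println(value);
    }

    static void doOtherWork() {
        System.out.println("working on something else while the future is computed");
    }
}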
Dataflow languages
• Dataflow languages like Id and VAL were created
to program dataflow hardware.
– Purely functional core languages restricted side effects
and made derivation of data flow graphs easier.
– Lack of side effects facilitates parallelism.
– I-structures and M-structures were part of the Id
language to provide facilities for synchronization,
memory side effects, and I/O.
• Modern languages like Haskell provide constructs
(e.g.: Haskell MVars) that are based on features of
dataflow languages.
Logic languages
• Based on formal logic expressions.
– Programmers stated sets of relations and facts to
encode their problems.
– Fundamental core of the languages are logical
operators (AND, OR, NOT).
– High degree of parallelism in logic expressions. If we
have an expression “A and B and C”, A, B, and C can
be evaluated in parallel.
• Logic languages like PROLOG influenced modern
languages like Erlang.
Parallel languages
• A number of languages were explored that
specifically focused on parallelism.
– Earlier examples were focused on general purpose
programming, with concurrency constructs as a
secondary concern.
• Languages like High Performance Fortran and
ZPL instead focused on abstractions that were
specifically designed to build parallel programs.
Parallel languages
• In both HPF and ZPL, data distribution was a key
concern.
– Given a set of parallel processing elements, how does
the programmer describe how large logical data
structures are physically decomposed across the
processors?
– Goal was to let the compiler generate the often tedious
and error prone code to handle data distribution and
movement.
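A sketch of the bookkeeping such languages generate on the programmer's behalf, assuming a simple block distribution (my example, not HPF or ZPL syntax): each processing element owns one contiguous block of a global array.

// Block distribution of a global array of n elements over nprocs processing elements.
public class BlockDistribution {
    // Returns {start, end} (end exclusive) of the block owned by rank p out of nprocs.
    static int[] blockRange(int n, int nprocs, int p) {
        int base = n / nprocs;          // minimum block size
        int extra = n % nprocs;         // the first 'extra' ranks get one more element
        int start = p * base + Math.min(p, extra);
        int size = base + (p < extra ? 1 : 0);
        return new int[] { start, start + size };
    }

    public static void main(String[] args) {
        int n = 10, nprocs = 4;
        for (int p = 0; p < nprocs; p++) {
            int[] r = blockRange(n, nprocs, p);
            System.out.println("rank " + p + " owns [" + r[0] + ", " + r[1] + ")");
        }
        // rank 0 owns [0, 3), rank 1 owns [3, 6), rank 2 owns [6, 8), rank 3 owns [8, 10)
    }
}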
Parallel languages
• Encourages users to take a "global view" of programs rather than focusing on the local, per-processor view.
• This lets the programmer focus on the problem they want to solve, instead of the details of how to map their problem onto a parallel machine.
• Parallel languages also focus on retargetability.
– If parallelization decisions are fully controlled by the
compiler, then it can make different decisions for
different platforms. Portability is easier in this case.
Modern languages
• With the introduction of multicore, concurrency
constructs are being added to most mainstream
languages now.
– Java: Threading and synchronization primitives.
– Fortran: Co-Array Fortran added to the 2008 standard.
– .NET languages: Synchronization and threading.
– Clojure: LISP derivative with software transactional memory.
– Scala: Concurrent functional language.
– Haskell: Software transactional memory, MVars, threads.
Modern languages
• Most new language features in concurrent
languages are based on features explored in
earlier languages.
• Studying older languages that include concurrency constructs helps us understand what motivated their design and creation.
Outline
• Evolution of machines
• Evolution of languages
• Limits of automatic parallelization
Inference of parallelism is hard
• Most people agree that it is exceptionally difficult
to automatically parallelize programs written in
sequential languages.
– Language features get in the way. E.g., pointers introduce the potential for aliasing, which restricts the compiler's freedom to parallelize (illustrated in the sketch below).
– High-level abstractions are lost in low-level
implementations. Complex loops and pointer-based
data structures make it very challenging to infer
structures that can be parallelized.
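A small illustration of the aliasing problem, written in Java for consistency with the other sketches: if the two array parameters refer to the same array, the loop iterations depend on each other and cannot safely be reordered or run in parallel.

// A loop that looks embarrassingly parallel -- unless dst and src alias.
public class Aliasing {
    static void shiftCopy(int[] dst, int[] src) {
        for (int i = 1; i < dst.length; i++) {
            dst[i] = src[i - 1] + 1;   // with dst == src, iteration i reads what iteration i-1 wrote
        }
    }

    public static void main(String[] args) {
        int[] a = { 0, 0, 0, 0 };
        shiftCopy(a, a);               // aliased call: the result depends on sequential execution
        System.out.println(java.util.Arrays.toString(a)); // [0, 1, 2, 3]
    }
}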
Move towards concurrent languages
• Vectorizing and parallelizing compilers are very
powerful, but they are reaching their limits as
parallelism seen in practice increases.
• The big trend in language design is to introduce
language features that are built to support
concurrent and parallel programming.