Parallel Collections - Prokopec

Scala Parallel Collections
Aleksandar Prokopec, Tiark Rompf
Scala Team
EPFL
Introduction
• multi-core programming is not straightforward
• we need better higher-order abstractions
• libraries and tools have only begun to use these new capabilities
• collections are everywhere
Scala Collection Framework
• most operations are implemented in terms of an abstract method
def foreach[U](f: T => U): Unit
• new collections are created using
builders
trait Builder[Elem, To]
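As a quick illustration of the Builder contract (using the standard List.newBuilder factory): elements are appended one by one, and result() produces the final collection.

val b = List.newBuilder[Int]
b += 1
b += 2
b += 3
b.result()  // List(1, 2, 3)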
Example
• the filter method:
def filter(p: T => Boolean): Repr = {
  val b = newBuilder
  for (x <- this) if (p(x)) b += x
  b.result
}
List(1, 2, 3, 4, 5, 6, 7).filter(_ % 2 == 0)
[Diagram: the list 1, 2, 3, 4, 5, 6, 7 is traversed; elements satisfying the predicate are added to a Builder, which produces the result List(2, 4, 6)]
Parallel operations
• parallel traversal should be easy for
some data structures
• could filter be parallelized by
having a concurrent builder?
• 3 problems:
– the order of elements may no longer be preserved – what about sequences?
– performance concerns – a shared concurrent builder is a point of contention
– there are more complicated methods, such as span
Method span
• assume an array (keep it simple)
array.span(_ >= 0)
[Diagram: the array 3 11 99 -1 19 22 99 21 42 63 -5 11 -2 -7 1 33 is divided into prefixElems (the longest prefix of elements satisfying the predicate) and suffixElems (the rest)]
um... parallelizing this naively is not a good idea
Method reduce
• span seems inherently sequential
• we’ll get back to this, let’s try
something simpler – reduce
def reduce[U >: T](op: (U, U) => U): U
• takes an associative operator and applies it between all the elements
(examples: addition, concatenation)
Method reduce
• assume the associative operator is concatenation
val s = "Tell your friends and family to use Scala."
s.split(" ").toArray.reduce(_ + _)
[Diagram: a reduction tree concatenates the words pairwise – "Tell", "your", "friends and" combine into "Tell your friends and", which is joined with "family to use Scala." to rebuild the full sentence]
Method reduce
• we might have more processors
[Diagram: a balanced reduction tree over the elements 1 to 8 – adjacent pairs are added to give 3, 7, 11 and 15, these give 10 and 26, and the root yields 36]
• this is a well-known pattern in parallel programming
• but we need the right abstraction
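The shape of the tree does not matter precisely because the operator is associative; for example:

(1 to 8).reduce(_ + _)  // 36, no matter how the pairs are grouped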
Method split
• we can implement methods such
as reduce, foreach, count, find
and forall assuming we can
divide the collection
• new abstract operation
def split: Seq[Repr]
• returns a non-trivial partition of
the collection
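A minimal sketch of how an accessor method like reduce can be built on top of splitting. This is illustrative only: it uses scala.concurrent futures and an ad-hoc parReduce helper rather than the library's internal task scheduler, and splitAt stands in for the abstract split.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// split the collection, reduce the halves in parallel, and combine
// the partial results with the associative operator
def parReduce[T](xs: Vector[T], threshold: Int)(op: (T, T) => T): T =
  if (xs.length <= threshold) xs.reduceLeft(op)
  else {
    val (left, right) = xs.splitAt(xs.length / 2)   // plays the role of split
    val rightF = Future { parReduce(right, threshold)(op) }
    op(parReduce(left, threshold)(op), Await.result(rightF, Duration.Inf))
  }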
Method split
def split: Seq[Repr]
• how to implement?
– copy elements
– produce a wrapper
– use data structure properties (e.g. tree)
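For example, a wrapper-based split over an array might look like this (an illustrative ArraySlice class, not the library's implementation); both halves share the underlying array, so no elements are copied.

case class ArraySlice[T](arr: Array[T], from: Int, until: Int) {
  def length: Int = until - from
  def split: Seq[ArraySlice[T]] = {
    val mid = from + length / 2   // split point in the middle
    Seq(ArraySlice(arr, from, mid), ArraySlice(arr, mid, until))
  }
}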
Method filter
• this abstract method can be used to implement accessor methods
• for transformer methods such as filter this is not sufficient – the
partial result collections must be merged
[Diagram: each processor filters its own partition; the partial results 2, 4, 6, 8 and 8, 0, 2, 2 are merged into the final collection 2, 4, 6, 8, 8, 0, 2, 2]
Method combine
• we need another abstraction
def combine[Other >: Repr](that: Other): Other
• creates a collection that contains
all the elements of this collection
and that collection
Method combine
def combine[Other >: Repr](that: Other): Other
• how to implement?
– copy the elements
– use lazy evaluation (elements end up copied twice)
– use specialized data structures
Lazy collection evaluation
• merging occurs more than once
• each processor adds results to its own builder
• the final evaluation occurs at the root
[Diagram: leaf tasks copy their results into local builders; inner nodes merge the builders cheaply; at the root the final array is allocated and the elements are copied in]
Lazy collection evaluation
• advantages:
– easier to apply to existing collections
– for certain data structures copying is cheap (e.g. arrays)
– merging is very cheap
• disadvantages:
– elements are copied twice, which hurts cheap operations
– garbage collection occurs more often
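To make the idea concrete, here is a toy lazy combiner in the spirit of the diagram above (illustrative, not the library's actual Combiner): += appends to a local chunk, combine merely links the chunk lists, and result performs the single final copy at the root.

import scala.collection.mutable.ArrayBuffer

class LazyCombiner[T](private val chunks: ArrayBuffer[ArrayBuffer[T]]) {
  def this() = this(ArrayBuffer(new ArrayBuffer[T]))
  def +=(elem: T): this.type = { chunks.last += elem; this }
  def combine(that: LazyCombiner[T]): LazyCombiner[T] =
    new LazyCombiner(chunks ++ that.chunks)   // cheap merge: links, no copying
  def result: ArrayBuffer[T] = {
    val buf = new ArrayBuffer[T]
    buf.sizeHint(chunks.map(_.length).sum)    // allocate once, at the root
    for (c <- chunks; x <- c) buf += x        // the single copy pass
    buf
  }
}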
Specialized data structures
• some data structures can be merged efficiently (trees, heaps,
skip lists…)
• immutable vectors – immutable sequences with efficient splitting
and concatenation
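A sketch of why tree-like structures merge efficiently: concatenation can be a constant-time structural operation, deferring any flattening. This ConcTree is a toy, not the library's data structure.

sealed trait ConcTree[+T] { def size: Int }
case class Leaf[T](elems: Vector[T]) extends ConcTree[T] { def size = elems.size }
case class Cat[T](left: ConcTree[T], right: ConcTree[T]) extends ConcTree[T] {
  val size = left.size + right.size
}

def combine[T](a: ConcTree[T], b: ConcTree[T]): ConcTree[T] = Cat(a, b)  // O(1)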
Method span
• each processor keeps 2 builders, one for the prefix and one for the suffix
• the merge has 2 cases:
– a counterexample occurs in the left partition
– no counterexample occurs in the left partition
[Diagram: two workers scan their partitions of the array, each filling a prefix builder and a suffix builder; the partial results are merged according to the two cases above]
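The merge itself can be written down directly; a sketch with illustrative Vector-based types, where each worker returns a (prefix, suffix) pair and the suffix is empty exactly when no counterexample was found in that chunk:

def mergeSpan[T](left: (Vector[T], Vector[T]),
                 right: (Vector[T], Vector[T])): (Vector[T], Vector[T]) = {
  val (lPre, lSuf) = left
  val (rPre, rSuf) = right
  if (lSuf.isEmpty) (lPre ++ rPre, rSuf)   // no counterexample on the left
  else (lPre, lSuf ++ rPre ++ rSuf)        // counterexample on the left: the
}                                          // whole right partition is suffix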
Load balancing
• processor availability and data
processing cost may not be uniform
[Diagram: one processor has finished its partition ("Done!") while the other is still working through elements 750, 751, 752, …]
• solution: fine-grained division – create more tasks than processors
Work-stealing
• we need to schedule tasks onto processors – work stealing
• each processor has a task queue
• when it runs out of tasks, it steals tasks from other queues
[Diagram: proc 1 has run out of tasks and steals one from proc 2's queue]
Adaptive work-stealing
• still, a large number of tasks can lead to overhead
• solution: adaptive partitioning
Adaptive work-stealing
• ensures better load balancing
[Diagram: with adaptive partitioning, a stolen task is split further, so both proc 1 and proc 2 stay busy]
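One way to picture adaptive partitioning: a worker peels off exponentially growing chunks of its range, so only a logarithmic number of tasks is created unless stealing forces further splits. A toy chunkSizes helper, illustrative rather than the actual scheduler:

def chunkSizes(total: Int, start: Int = 1): List[Int] =
  if (total <= 0) Nil
  else {
    val n = start min total
    n :: chunkSizes(total - n, start * 2)  // each chunk twice the previous size
  }

chunkSizes(1000)  // List(1, 2, 4, 8, ..., 489): about 10 tasks instead of 1000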
Package hierarchy
• a subpackage of the collection package:

collection
  mutable
  immutable
  parallel
    mutable
    immutable
Class hierarchy
• consistent with existing collections
• clients can refer to parallel
collections transparently
[Diagram: ParallelIterable extends Iterable; ParallelSeq, ParallelSet and ParallelMap extend Seq, Set and Map respectively, as well as ParallelIterable]
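Under this hierarchy, where ParallelSeq extends Seq, client code written against the general type transparently accepts a parallel collection. A hypothetical sketch (the names and supertypes reflect the design presented in this talk):

def total(xs: Seq[Int]): Int = xs.reduce(_ + _)
// total(parallelArray) – a ParallelArray could be passed where Seq is
// expected, and reduce would dispatch to the parallel implementation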
How to use
• be aware of side-effects:
var k = 0
array.foreach(k += _)
• parallel collections are not concurrent collections
• be careful with small collections – the cost of setting up parallelism
may outweigh the benefit
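For the side-effect example above: foreach races on k when the collection is parallel, so a reduction avoids the shared variable (a small sketch using the .par conversion shown on the next slide):

val k = array.par.reduce(_ + _)  // no shared mutable state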
How to use
• parallel ranges – a way to parallelize
for-loops
for (i <- (0 until 1000).par) yield {
  var num = i
  var lst: List[Int] = Nil
  while (num > 0) {
    lst ::= num % 2
    num = num / 2
  }
  lst
}
Benchmarks
• microbenchmarks with low-cost per-element operations (running times for
1, 2, 4, 6 and 8 processors; lower is better):

foreach           1     2     4     6     8
Sequential     1227  1227  1227  1227  1227
ParallelArray  1180   797   529   449   421
Extra166       1195   757   544   442   403

reduce            1     2     4     6     8
Sequential      949   949   949   949   949
ParallelArray   832   551   375   328   297
Extra166        890   566   363   300   282
Benchmarks
• microbenchmarks with low-cost per-element operations (continued):

filter            1     2     4     6     8
Sequential      611   611   611   611   611
ParallelArray   476   333   235   216   208
Extra166        581   372   296   280   264

find              1     2     4     6     8
Sequential     1181  1181  1181  1181  1181
ParallelArray   961   608   410   331   300
Extra166        841   602   393   309   294
Current state
• an array – ParallelArray
• ranges – ParallelRange
• views – ParallelView
• working on – ParallelVector and ParallelHashMap
Conclusion
• good performance results
• nice integration with the existing collections
• more parallel collections are being worked on
• will be integrated into Scala 2.8.1