Lecture 2: Basic MapReduce Algorithm Design


MapReduce
Theory and Practice
http://net.pku.edu.cn/~course/cs402/2010/
彭波
[email protected]
School of Electronics Engineering and Computer Science, Peking University
7/15/2010
Some slides borrowed from Jimmy Lin and Aaron Kimball
Outline

- Functional Languages and MapReduce
- MapReduce Basics
- MapReduce Algorithm Design
- Hadoop and Java Practice
Functional Languages and MapReduce
What is Functional Programming?

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state. [1]
Example

Summing the integers 1 to 10 in Java:

int total = 0;
for (int i = 1; i <= 10; ++i)
    total = total + i;

The computation method is variable assignment.
Example

Summing the integers 1 to 10 in Haskell:

sum [1..10]

The computation method is function application.
Why is it Useful?

- The abstract nature of functional programming leads to considerably simpler programs
- It also supports a number of powerful new ways to structure and reason about programs
Functional Programming Review

- Functional operations do not modify data structures:
  - They always create new ones
  - Original data still exists in unmodified form
- Data flows are implicit in program design
- Order of operations does not matter
Functional Programming Review

fun foo(l: int list) =
    sum(l) + mul(l) + length(l)

- Order of sum(), mul(), etc. does not matter
- They do not modify l
Functional Updates Do Not Modify Structures

fun append(x, lst) =
    let val lst' = reverse lst in reverse (x :: lst') end

The append() function above reverses the list, adds the new element to the front, and reverses the result, which appends x to the end of lst.
But it never modifies lst!
Functions Can Be Used As Arguments

fun DoDouble(f, x) = f (f x)

It does not matter what f does to its argument; DoDouble() will do it twice.
A function is called higher-order if it takes a function as an argument or returns a function as a result.
Map

map f lst: ('a -> 'b) -> ('a list) -> ('b list)

Creates a new list by applying f to each element of the input list; returns output in order.

[Figure: f applied independently to each element of the input list, producing the output list in the same order]
Fold

fold f x0 lst: ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b

Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list.

[Figure: f applied successively along the list, threading the accumulator from the initial value through to the returned result]
fold left vs. fold right

- Order of list elements can be significant
- Fold left moves left-to-right across the list
- Fold right moves right-to-left

SML implementation:

fun foldl f a []      = a
  | foldl f a (x::xs) = foldl f (f(x, a)) xs

fun foldr f a []      = a
  | foldr f a (x::xs) = f(x, (foldr f a xs))
Example

fun foo(l: int list) =
    sum(l) + mul(l) + length(l)

How can we implement this with map and foldl?
Example (Solved)

fun foo(l: int list) =
    sum(l) + mul(l) + length(l)

fun sum(lst)    = foldl (fn (x, a) => a + x) 0 lst
fun mul(lst)    = foldl (fn (x, a) => a * x) 1 lst
fun length(lst) = foldl (fn (x, a) => a + 1) 0 lst

(Note the argument order: with the foldl above, f receives the tuple (element, accumulator).)
map Implementation

fun map f []      = []
  | map f (x::xs) = (f x) :: (map f xs)

- This implementation moves left-to-right across the list, mapping elements one at a time
- … But does it need to?
Implicit Parallelism in map

- In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements
- If the order in which f is applied to the elements of the list does not matter, we can reorder or parallelize execution
- This is the "secret" that MapReduce exploits
References

- http://net.pku.edu.cn/~course/cs501/2008/resource/haskell/functional.ppt
- http://net.pku.edu.cn/~course/cs501/2008/resource/haskell/
MapReduce Basics
Typical Large-Data Problem

- Iterate over a large number of records
- Extract something of interest from each
- Shuffle and sort intermediate results
- Aggregate intermediate results
- Generate final output

Key idea: provide a functional abstraction for the two key operations, extract (map) and aggregate (reduce).

(Dean and Ghemawat, OSDI 2004)
Roots in Functional Programming

[Figure: MapReduce's map corresponds to the functional map, applying f to every element; its reduce corresponds to fold, aggregating the results with g]
MapReduce

Programmers specify two functions:
  map (k, v) → <k', v'>*
  reduce (k', v') → <k', v'>*
- All values with the same key are sent to the same reducer

The execution framework handles everything else…
[Figure: mappers consume input pairs (k1 v1, k2 v2, …) and emit intermediate pairs (a 1, b 2, c 3, …); shuffle and sort aggregates values by key (a → 1 5, b → 2 7, c → 2 3 6 8); reducers consume the grouped values and emit final output pairs (r1 s1, r2 s2, r3 s3)]
What's "everything else"?
MapReduce "Runtime"

- Handles scheduling
  - Assigns workers to map and reduce tasks
- Handles "data distribution"
  - Moves processes to data
- Handles synchronization
  - Gathers, sorts, and shuffles intermediate data
- Handles errors and faults
  - Detects worker failures and restarts
- Everything happens on top of a distributed FS (later)
MapReduce

Programmers specify two functions:
  map (k, v) → <k', v'>*
  reduce (k', v') → <k', v'>*
- All values with the same key are reduced together

The execution framework handles everything else…
Not quite… usually, programmers also specify:
  partition (k', number of partitions) → partition for k'
  - Often a simple hash of the key, e.g., hash(k') mod n (see the sketch below)
  - Divides up the key space for parallel reduce operations
  combine (k', v') → <k', v'>*
  - Mini-reducers that run in memory after the map phase
  - Used as an optimization to reduce network traffic
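To make the partition function concrete, here is a minimal Java sketch of hash partitioning; the logic mirrors Hadoop's default HashPartitioner (the standalone class and main() are illustrative):

// Sketch: assign intermediate key k' to one of n reduce partitions.
// Mirrors the logic of Hadoop's default HashPartitioner.
public class HashPartitionSketch {
    public static int getPartition(Object key, int numPartitions) {
        // Mask off the sign bit so the result is non-negative, then mod n.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("apple", 3));   // same key always lands in the same partition
        System.out.println(getPartition("banana", 3));
    }
}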
[Figure: the same dataflow with combiners and partitioners added. Each mapper's output is first combined locally (e.g., two c values merged into c 9) and then partitioned; shuffle and sort still aggregates values by key, but fewer pairs cross the network before the reducers run]
Two more details…

- Barrier between map and reduce phases
  - But we can begin copying intermediate data earlier
- Keys arrive at each reducer in sorted order
  - No enforced ordering across reducers
"Hello World": Word Count

Map(String docid, String text):
    for each word w in text:
        Emit(w, 1);

Reduce(String term, Iterable<Int> values):
    int sum = 0;
    for each v in values:
        sum += v;
    Emit(term, sum);
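For reference, a minimal runnable sketch of the same algorithm against the Hadoop 0.20 API that is introduced at the end of this lecture (class and variable names are illustrative):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();  // reused to avoid object creation

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);      // emit (w, 1)
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text term, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(term, result);       // emit (term, sum)
        }
    }
}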
MapReduce can refer to…

- The programming model
- The execution framework (aka "runtime")
- The specific implementation

Usage is usually clear from context!
MapReduce Implementations

- Google has a proprietary implementation in C++
  - Bindings in Java, Python
- Hadoop is an open-source implementation in Java
  - Development led by Yahoo, used in production
  - Now an Apache project
  - Rapidly expanding software ecosystem
- Lots of custom research implementations
  - For GPUs, cell processors, etc.
Adapted from (Dean and Ghemawat, OSDI 2004):

[Figure: MapReduce execution overview. (1) The user program submits the job to the master; (2) the master schedules map and reduce tasks onto workers; (3) map workers read input splits; (4) map output is written to local disk; (5) reduce workers remote-read the intermediate files; (6) reduce workers write the output files. The pipeline runs from the input files through the map phase, intermediate files (on local disk), and the reduce phase to the output files]
MapReduce Algorithm Design
"Everything Else"

- The execution framework handles everything else…
  - Scheduling: assigns workers to map and reduce tasks
  - "Data distribution": moves processes to data
  - Synchronization: gathers, sorts, and shuffles intermediate data
  - Errors and faults: detects worker failures and restarts
- Limited control over data and execution flow
  - All algorithms must be expressed in m, r, c, p (map, reduce, combine, partition)
- You don't know:
  - Where mappers and reducers run
  - When a mapper or reducer begins or finishes
  - Which input a particular mapper is processing
  - Which intermediate key a particular reducer is processing
Tools for the Programmer

- Cleverly-constructed data structures
  - Bring partial results together
- Sort order of intermediate keys
  - Control the order in which reducers process keys
- Partitioner
  - Control which reducer processes which keys
- Preserving state in mappers and reducers
  - Capture dependencies across multiple keys and values
Preserving State

[Figure: a Mapper object and a Reducer object, one object per task, each holding internal state. configure is the API initialization hook, called once per task; map is called once per input key-value pair; reduce is called once per intermediate key; close is the API cleanup hook, called once per task]
Scalable Hadoop Algorithms: Themes

- Avoid object creation
  - Inherently costly operation
  - Garbage collection
- Avoid buffering
  - Limited heap size
  - Works for small datasets, but won't scale!
Importance of Local Aggregation

- Ideal scaling characteristics:
  - Twice the data, twice the running time
  - Twice the resources, half the running time
- Why can't we achieve this?
  - Synchronization requires communication
  - Communication kills performance
- Thus… avoid communication!
  - Reduce intermediate data via local aggregation
  - Combiners can help
Shuffle and Sort

[Figure: on the map side, output accumulates in an in-memory circular buffer, spills to disk, and the spills are merged into intermediate files on disk; the combiner can run during spilling and merging. On the reduce side, the reducer fetches spills from this and other mappers and merges them on disk before reducing; other reducers fetch their own partitions in parallel]
Word Count: Baseline

[Code figure not reproduced; the baseline is the word count implementation shown earlier]
What's the impact of combiners?
Word Count: Version 1

[Code figure not reproduced]
Are combiners still needed?
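A sketch of what Version 1 likely showed, aggregating counts locally within a single map() call before emitting (a reconstruction under that assumption; names illustrative, imports as in the word count sketch plus java.util.HashMap and java.util.Map):

// Sketch: tally words inside one map() call, then emit the local counts.
public static class PerCallMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();
    private final IntWritable count = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String w = itr.nextToken();
            Integer c = counts.get(w);
            counts.put(w, c == null ? 1 : c + 1);   // local tally, no emits yet
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            word.set(e.getKey());
            count.set(e.getValue());
            context.write(word, count);             // emit (w, local count)
        }
    }
}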
Word Count: Version 2

[Code figure not reproduced]
Are combiners still needed?
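A sketch of the likely Version 2, which preserves the tally across all map() calls in a task and emits only in cleanup(). This is the "in-mapper combining" pattern described on the next slide (names illustrative; same imports as above):

// Sketch: in-mapper combining, with state preserved across map() calls.
public static class InMapperCombiningMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Map<String, Integer> counts;

    @Override
    protected void setup(Context context) {
        counts = new HashMap<String, Integer>();    // state lives for the whole task
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String w = itr.nextToken();
            Integer c = counts.get(w);
            counts.put(w, c == null ? 1 : c + 1);   // no emits here at all
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        Text word = new Text();
        IntWritable count = new IntWritable();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            word.set(e.getKey());
            count.set(e.getValue());
            context.write(word, count);             // one emit per distinct word per task
        }
    }
}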
Design Pattern for Local Aggregation

- "In-mapper combining"
  - Fold the functionality of the combiner into the mapper by preserving state across multiple map calls
- Advantages
  - Speed
  - Why is this faster than actual combiners?
- Disadvantages
  - Explicit memory management required
  - Potential for order-dependent bugs
Combiner Design

- Combiners and reducers share the same method signature
  - Sometimes, reducers can serve as combiners
  - Often, not…
- Remember: combiners are optional optimizations
  - Should not affect algorithm correctness
  - May be run 0, 1, or multiple times
- Example: find the average of all integers associated with the same key
Computing the Mean: Version 1

[Code figure not reproduced]
Why can't we use the reducer as a combiner?
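Because the mean is not associative: partial means cannot simply be averaged again. A quick worked example with illustrative numbers:

Mean(1, 2, 3, 4, 5) = 3
Mean(Mean(1, 2), Mean(3, 4, 5)) = Mean(1.5, 4) = 2.75 ≠ 3

Sums and counts, by contrast, are associative and commutative, which motivates the pair-based versions that follow.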
Computing the Mean: Version 2

[Code figure not reproduced]
Why doesn't this work?
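Assuming the missing figure follows the standard development of this example (as in Lin and Dyer's book), Version 2 adds a combiner that emits (sum, count) pairs while the mapper still emits raw integers. The type sketch below shows the contract violation: a combiner's input and output types must match the mapper's output type, because the framework may run it 0, 1, or multiple times:

map:     (t, r)            → (t, r)              raw integers
combine: (t, [r1, r2, …])  → (t, (sum, count))   output type changed!
reduce:  (t, [(s, c), …])  → (t, mean)           breaks if the combiner runs 0 times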
Computing the Mean: Version 3

[Code figure not reproduced]
Fixed?
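Yes, assuming the figure matches the standard pair-based fix: the mapper itself emits (value, 1) pairs, so the combiner's input and output types agree. A sketch, relying on a custom PairWritable(sum, count) type like the one implemented under "Complex Data Types in Hadoop" below (names illustrative):

// Sketch of Version 3: pair-based mean with a type-correct combiner.
public static class MeanMapper
        extends Mapper<Text, IntWritable, Text, PairWritable> {
    @Override
    protected void map(Text key, IntWritable value, Context context)
            throws IOException, InterruptedException {
        context.write(key, new PairWritable(value.get(), 1));   // emit (value, 1)
    }
}

public static class MeanCombiner
        extends Reducer<Text, PairWritable, Text, PairWritable> {
    @Override
    protected void reduce(Text key, Iterable<PairWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0, count = 0;
        for (PairWritable p : values) { sum += p.getSum(); count += p.getCount(); }
        context.write(key, new PairWritable(sum, count));       // same type in and out
    }
}

public static class MeanReducer
        extends Reducer<Text, PairWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<PairWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0, count = 0;
        for (PairWritable p : values) { sum += p.getSum(); count += p.getCount(); }
        context.write(key, new DoubleWritable((double) sum / count));  // final mean
    }
}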
Computing the Mean: Version 4

[Code figure not reproduced]
Are combiners still needed?
Algorithm Design: Running Example

- Term co-occurrence matrix for a text collection
  - M = N x N matrix (N = vocabulary size)
  - Mij: number of times i and j co-occur in some context (for concreteness, let's say context = sentence)
- Why?
  - Distributional profiles as a way of measuring semantic distance
  - Semantic distance useful for many language processing tasks
MapReduce: Large Counting Problems

- Term co-occurrence matrix for a text collection = specific instance of a large counting problem
  - A large event space (number of terms)
  - A large number of observations (the collection itself)
  - Goal: keep track of interesting statistics about the events
- Basic approach
  - Mappers generate partial counts
  - Reducers aggregate partial counts

How do we aggregate partial counts efficiently?
First Try: "Pairs"

- Each mapper takes a sentence:
  - Generate all co-occurring term pairs
  - For all pairs, emit (a, b) → count
- Reducers sum up counts associated with these pairs
- Use combiners!
Pairs: Pseudo-Code

[Code figure not reproduced]
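A Java sketch of the pairs approach (a reconstruction; for simplicity the pair key is encoded as Text "a:b" rather than a custom WritableComparable, and imports follow the word count sketch):

// Sketch: emit ((u, v), 1) for every co-occurring pair in the sentence.
public static class PairsMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text pair = new Text();

    @Override
    protected void map(LongWritable key, Text sentence, Context context)
            throws IOException, InterruptedException {
        String[] terms = sentence.toString().split("\\s+");
        for (String u : terms) {
            for (String v : terms) {
                if (!u.equals(v)) {            // all co-occurring pairs
                    pair.set(u + ":" + v);
                    context.write(pair, ONE);  // emit ((u, v), 1)
                }
            }
        }
    }
}

// The reducer just sums counts per pair, exactly like word count's SumReducer.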
"Pairs" Analysis

- Advantages
  - Easy to implement, easy to understand
- Disadvantages
  - Lots of pairs to sort and shuffle around (upper bound?)
  - Not many opportunities for combiners to work
Another Try: "Stripes"

- Idea: group together pairs into an associative array

  (a, b) → 1
  (a, c) → 2
  (a, d) → 5        a → { b: 1, c: 2, d: 5, e: 3, f: 2 }
  (a, e) → 3
  (a, f) → 2

- Each mapper takes a sentence:
  - Generate all co-occurring term pairs
  - For each term, emit a → { b: countb, c: countc, d: countd, … }
- Reducers perform element-wise sum of associative arrays

    a → { b: 1,       d: 5, e: 3 }
  + a → { b: 1, c: 2, d: 2,       f: 2 }
  = a → { b: 2, c: 2, d: 7, e: 3, f: 2 }
Stripes: Pseudo-Code

[Code figure not reproduced]
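A Java sketch of the stripes approach (a reconstruction), using Hadoop's MapWritable as the associative array; imports follow the word count sketch plus org.apache.hadoop.io.MapWritable, org.apache.hadoop.io.Writable, and java.util.Map:

// Sketch: one stripe (associative array of co-occurrence counts) per term.
public static class StripesMapper
        extends Mapper<LongWritable, Text, Text, MapWritable> {
    @Override
    protected void map(LongWritable key, Text sentence, Context context)
            throws IOException, InterruptedException {
        String[] terms = sentence.toString().split("\\s+");
        for (String u : terms) {
            MapWritable stripe = new MapWritable();   // u → { v: count, ... }
            for (String v : terms) {
                if (u.equals(v)) continue;
                Text t = new Text(v);
                IntWritable c = (IntWritable) stripe.get(t);
                stripe.put(t, new IntWritable(c == null ? 1 : c.get() + 1));
            }
            context.write(new Text(u), stripe);       // duplicates merge downstream
        }
    }
}

public static class StripesReducer
        extends Reducer<Text, MapWritable, Text, MapWritable> {
    @Override
    protected void reduce(Text key, Iterable<MapWritable> stripes, Context context)
            throws IOException, InterruptedException {
        MapWritable sum = new MapWritable();
        for (MapWritable stripe : stripes) {          // element-wise sum
            for (Map.Entry<Writable, Writable> e : stripe.entrySet()) {
                IntWritable c = (IntWritable) sum.get(e.getKey());
                int add = ((IntWritable) e.getValue()).get();
                sum.put(e.getKey(), new IntWritable(c == null ? add : c.get() + add));
            }
        }
        context.write(key, sum);
    }
}

The same reducer logic can serve as the combiner, which is exactly why stripes make better use of combiners than pairs.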
"Stripes" Analysis

- Advantages
  - Far less sorting and shuffling of key-value pairs
  - Can make better use of combiners
- Disadvantages
  - More difficult to implement
  - Underlying object more heavyweight
  - Fundamental limitation in terms of size of event space
[Figure: performance comparison of the two approaches. Cluster size: 38 cores. Data source: Associated Press Worldstream (APW) portion of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)]
Relative Frequencies

- How do we estimate relative frequencies from counts?

  f(B|A) = count(A, B) / count(A) = count(A, B) / Σ_B' count(A, B')

- Why do we want to do this?
- How do we do this with MapReduce?
f(B|A): "Stripes"

a → { b1: 3, b2: 12, b3: 7, b4: 1, … }

- Easy!
  - One pass to compute (a, *)
  - Another pass to directly compute f(B|A)
f(B|A): "Pairs"

(a, *) → 32          The reducer holds this value in memory

(a, b1) → 3          (a, b1) → 3 / 32
(a, b2) → 12         (a, b2) → 12 / 32
(a, b3) → 7          (a, b3) → 7 / 32
(a, b4) → 1          (a, b4) → 1 / 32
…                    …

- For this to work:
  - Must emit an extra (a, *) for every bn in the mapper
  - Must make sure all a's get sent to the same reducer (use a partitioner)
  - Must make sure (a, *) comes first (define the sort order)
  - Must hold state in the reducer across different key-value pairs
"Order Inversion"

- Common design pattern
  - Computing relative frequencies requires the marginal counts
  - But the marginal cannot be computed until you see all counts
  - Buffering is a bad idea!
  - Trick: getting the marginal counts to arrive at the reducer before the joint counts
- Optimizations
  - Apply the in-memory combining pattern to accumulate marginal counts
  - Should we apply combiners?
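A sketch of the machinery (illustrative: pair keys are encoded as Text "a:b", and "*" is chosen because it sorts before alphanumeric characters in Text's byte order, so "a:*" reaches the reducer before any "a:b"):

// 1. Partition on the left element only, so all (a, ·) pairs, including
//    the marginal (a, *), reach the same reducer.
public static class LeftElementPartitioner
        extends org.apache.hadoop.mapreduce.Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String left = key.toString().split(":")[0];
        return (left.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// 2. The reducer sees (a, *) first and remembers the marginal as state
//    preserved across subsequent keys.
public static class FrequencyReducer
        extends Reducer<Text, IntWritable, Text, DoubleWritable> {
    private long marginal = 0;   // state held across key-value pairs

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (IntWritable v : values) sum += v.get();
        if (key.toString().endsWith(":*")) {
            marginal = sum;                              // count(a), arrives first
        } else {
            context.write(key, new DoubleWritable((double) sum / marginal));
        }
    }
}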
Synchronization: Pairs vs. Stripes

- Approach 1: turn synchronization into an ordering problem
  - Sort keys into the correct order of computation
  - Partition the key space so that each reducer gets the appropriate set of partial results
  - Hold state in the reducer across multiple key-value pairs to perform the computation
  - Illustrated by the "pairs" approach
- Approach 2: construct data structures that bring partial results together
  - Each reducer receives all the data it needs to complete the computation
  - Illustrated by the "stripes" approach
Secondary Sorting

- MapReduce sorts input to reducers by key
  - Values may be arbitrarily ordered
- What if we want to sort the values as well?
  - E.g., k → (v1, r), (v3, r), (v4, r), (v8, r)…
Secondary Sorting: Solutions

- Solution 1:
  - Buffer values in memory, then sort
  - Why is this a bad idea?
- Solution 2:
  - "Value-to-key conversion" design pattern: form a composite intermediate key, (k, v1)
  - Let the execution framework do the sorting
  - Preserve state across multiple key-value pairs to handle processing
  - Anything else we need to do? (see the sketch below)
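A sketch of the remaining pieces, answering "anything else?": partition and group on the natural key k alone, so each reduce call still corresponds to one original key while the values arrive sorted. Here the composite key is encoded as Text "k:v" for simplicity, and NaturalKeyGroupingComparator is a hypothetical comparator you would implement alongside:

// Partition on the natural key only, ignoring the value part of "k:v".
public static class NaturalKeyPartitioner
        extends org.apache.hadoop.mapreduce.Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String k = key.toString().split(":")[0];
        return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// In the driver:
//   job.setPartitionerClass(NaturalKeyPartitioner.class);
//   job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
// where the grouping comparator compares only the k portion of "k:v",
// so all composite keys sharing k are grouped into one reduce call.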
Recap: Tools for Synchronization

- Cleverly-constructed data structures
  - Bring data together
- Sort order of intermediate keys
  - Control the order in which reducers process keys
- Partitioner
  - Control which reducer processes which keys
- Preserving state in mappers and reducers
  - Capture dependencies across multiple keys and values
Issues and Tradeoffs

- Number of key-value pairs
  - Object creation overhead
  - Time for sorting and shuffling pairs across the network
- Size of each key-value pair
  - De/serialization overhead
- Local aggregation
  - Opportunities to perform local aggregation vary
  - Combiners make a big difference
  - Combiners vs. in-mapper combining
  - RAM vs. disk vs. network
Debugging at Scale

- Works on small datasets, won't scale… why?
  - Memory management issues (buffering and object creation)
  - Too much intermediate data
  - Mangled input records
- Real-world data is messy!
  - Word count: how many unique words are in Wikipedia?
  - There's no such thing as "consistent data"
  - Watch out for corner cases
  - Isolate unexpected behavior, bring it local
Hadoop and Java Practice
Basic Hadoop API* (0.20.0)

- Mapper
  - map(KEYIN key, VALUEIN value, Mapper.Context context)
  - setup(Mapper.Context context)
  - cleanup(Mapper.Context context)
- Reducer/Combiner
  - reduce(KEYIN key, Iterable<VALUEIN> values, Reducer.Context context)
  - setup/cleanup
- Partitioner
  - getPartition(KEY key, VALUE value, int numPartitions)

*Note: forthcoming API changes…
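A minimal driver that wires these pieces together (a sketch against the 0.20 API; TokenMapper and SumReducer refer to the word count sketch earlier):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job();                               // picks up default configuration
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenMapper.class);
        job.setCombinerClass(WordCount.SumReducer.class);  // reducer doubles as combiner here
        job.setReducerClass(WordCount.SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}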
Data Types in Hadoop

- Writable: defines a de/serialization protocol. Every data type in Hadoop is a Writable.
- WritableComparable: defines a sort order. All keys must be of this type (but not values).
- IntWritable, LongWritable, Text, …: concrete classes for different data types.
- SequenceFiles: binary encoding of a sequence of key/value pairs.
Complex Data Types in Hadoop

- How do you implement complex data types?
- The easiest way:
  - Encode it as Text, e.g., (a, b) = "a:b"
  - Use regular expressions to parse and extract data
  - Works, but pretty hack-ish
- The hard way:
  - Define a custom implementation of WritableComparable
  - Must implement: readFields, write, compareTo
  - Computationally efficient, but slow for rapid prototyping
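For example, the PairWritable type assumed in the mean example might look like this, a minimal sketch of the hard way:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A (sum, count) pair implementing the three required methods.
public class PairWritable implements WritableComparable<PairWritable> {
    private long sum;
    private long count;

    public PairWritable() {}                        // required no-arg constructor
    public PairWritable(long sum, long count) { this.sum = sum; this.count = count; }

    public long getSum()   { return sum; }
    public long getCount() { return count; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(sum);                         // serialization
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        sum = in.readLong();                        // deserialization, same field order
        count = in.readLong();
    }

    @Override
    public int compareTo(PairWritable other) {      // sort order: by sum, then count
        if (sum != other.sum) return sum < other.sum ? -1 : 1;
        if (count != other.count) return count < other.count ? -1 : 1;
        return 0;
    }
}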
Basic Cluster Components

- One of each:
  - Namenode (NN)
  - Jobtracker (JT)
- Set of each per slave machine:
  - Tasktracker (TT)
  - Datanode (DN)
Putting everything together…

[Figure: the namenode daemon runs on the namenode and the jobtracker on the job submission node; each slave node runs a tasktracker daemon and a datanode daemon on top of the local Linux file system]
Anatomy of a Job

- MapReduce program in Hadoop = Hadoop job
  - Jobs are divided into map and reduce tasks
  - An instance of running a task is called a task attempt
  - Multiple jobs can be composed into a workflow
- Job submission process
  - Client (i.e., driver program) creates a job, configures it, and submits it to the jobtracker
  - JobClient computes input splits (on the client end)
  - Job data (jar, configuration XML) are sent to the JobTracker
  - JobTracker puts job data in a shared location, enqueues tasks
  - TaskTrackers poll for tasks
  - Off to the races…
[Figure: input side of a job. The InputFormat divides each input file into InputSplits; a RecordReader turns each split into key-value records, which are consumed by a Mapper that produces intermediate data]

[Figure: middle of a job. Each Mapper's intermediates pass through a Partitioner (combiners omitted here), which routes each key to the appropriate Reducer]

[Figure: output side of a job. Each Reducer writes its results through a RecordWriter supplied by the OutputFormat, producing one output file per reducer]

Source: redrawn from a slide by Cloudera, cc-licensed
Input and Output

- InputFormat:
  - TextInputFormat
  - KeyValueTextInputFormat
  - SequenceFileInputFormat
  - …
- OutputFormat:
  - TextOutputFormat
  - SequenceFileOutputFormat
  - …
Shuffle and Sort in Hadoop

- Probably the most complex aspect of MapReduce!
- Map side
  - Map outputs are buffered in memory in a circular buffer
  - When the buffer reaches a threshold, contents are "spilled" to disk
  - Spills are merged into a single, partitioned file (sorted within each partition): the combiner runs here
- Reduce side
  - First, map outputs are copied over to the reducer machine
  - "Sort" is a multi-pass merge of map outputs (happens in memory and on disk): the combiner runs here
  - Final merge pass goes directly into the reducer
Q&A
What is Hugs?

- An interpreter for Haskell, and the most widely used implementation of the language
- An interactive system, which is well-suited for teaching and prototyping purposes
- Hugs is freely available from: www.haskell.org/hugs
The Standard Prelude

When Hugs is started it first loads the library file Prelude.hs, and then repeatedly prompts the user for an expression to be evaluated.

For example:

> 2+3*4
14

> (2+3)*4
20
The standard prelude also provides many useful functions that operate on lists. For example:

> length [1,2,3,4]
4

> product [1,2,3,4]
24

> take 3 [1,2,3,4,5]
[1,2,3]
Function Application

In mathematics, function application is denoted using parentheses, and multiplication is often denoted using juxtaposition or space.

f(a,b) + c d

Apply the function f to a and b, and add the result to the product of c and d.

In Haskell, function application is denoted using space, and multiplication is denoted using *.

f a b + c*d

As previously, but in Haskell syntax.