right sub-problems

Download Report

Transcript right sub-problems

A Randomized Linear-Time
Algorithm to Find Minimum
Spaning Trees
David R. Karger
Philip N. Klein
Robert E. Tarjan
黃則翰
蘇承祖
張紘睿
許智程
戴于晉
R96922141
R96922077
R96922136
D95922022
R96922171
Outline
Introduction
 Basic Property & Definition
 Algorithm
 Analysis

Outline
Introduction
 Basic Property & Definition
 Algorithm
 Analysis

Introduction
[Borůvka 1962] O(m log n)
 Gabow et al.[1984] O(m log β(m,n) )
◦ β(m,n)= min { i |log(i)n <= m/n}


Verification algorithm
◦ King[1993] O(m)

A randomize algorithm runs in O(m) time with
high probability
Outline
Introduction
 Basic Property & Definition
 Algorithm
 Analysis

Cycle property

For any cycle C in a graph, the heaviest edge in
C dose not appear in the minimum spanning
forest.
3
2
3
5
6
2
5
6
Cut Property

For any proper nonempty subset X of the
vertices, the lightest edge with exactly one
endpoint in X belongs to the minimum
spanning tree
5
2
3
6
X
Definition

Let G be a graph with weighted edges.
◦ w(x,y)
 The weight of edge {x,y}

If F is a forest of a subgraph in G
◦ F(x, y)
 the path (if any) connecting x and y in F
◦ wF(x, y)
 the maximum weight of an edge on F(x, y)
◦ wF(x, y)=∞
 If x and y are not connected in F
F-heavy & F-light

An edge {x,y} is F-heavy if w(x,y) > wF(x,y)
and F-light otherwise

Edge of F are all F-light
A
3
C
2
E
5
D
B
W(C,D)=5
WF(C,D)=5
F-light
G
2
5
H
F
6
W(B,D)=6
WF(B,D)=max{2,3,5}
F-heavy
3
6
W(F,H)=6
WF(F,H)= ∞
F-light
Observation
No F-heavy edge can be in the minimum
spanning forest of G (cycle property)
 Discard edge that cannot be in the minimum
spanning tree


F-light edge can be the candidate edge for the
minimum spanning tree of G
Outline
Introduction
 Basic Property & Definition
 Algorithm
 Analysis

Boruvka Algorithm
For each vertex, select the minimum-weight
edge incident to the vertex.
 Replace by a single vertex each connected
component defined by the selected edges.
 Delete all resulting isolated vertices, loops, and
all but the lowest-weight edge among each set
of multiple edges.

Algorithm Step1

Apply two successive Boruvka steps to the
graph, thereby reducing the number of vertices
by at least a factor of four.
Algorithm Step2
Choose a subgraph H by selecting each edge
independently with probability ½.
 Apply the algorithm recursively to H, producing
a minimum spanning forest F of H.
 Find all the F-heavy edges and delete them from
the contracted graph.

Algorithm Step3

Apply the algorithm recursively to the
remaining graph to compute a spanning forest
F’. Return those edges contracted in Step1
together with the edges of F’.
Original Problem
G
Boruvka × 2
Sample with
p=0.5
Left
Subproblem
G*
H
Return
minimum forest
F of H
F’
G’
Right
Subproblem
Delete F-heavy
edges from G*
Correctness
By the cut property, every edge contracted
during Step1 is in the MSF.
 By the cycle property, the edges deleted in
Step2 do NOT belong to the MSF.
 By the induction hypothesis, the MSF of the
remaining graph is correctly determined in the
recursive call of Step3.

Candidate Edge of MST

The expected number of F-light edges in G is at
most n/p (negative binomial)

For every sample graph H, the expected
candidate edge for MST in G is at most n/p (Flight edge)
Random-sampling
To help discard some edge that cannot be in the
minimum spanning tree
 Construct the sample graph H
◦ Process the edges in increasing order
◦ To process an edge e
◦ 1. Test whether both endpoints of e in same
component
◦ 2. Include the edge in H with probability p
◦ 3. If e is in H and is F-light, add e to the Forest
F

Random-sampling
5
C
H
G
E
4
14
A
6
3
7
D
B
10
9
5
C
13
F
F
E
4
14
A
6
3
7
B
G
11
D
10
9
G
11
F
13
W(E,G)=14
WF(E,G)=max{5,6,9,13}
F-heavy
W(E,F)=11
WF(E,F)=max{5,6,9}
F-heavy
W(D,F)=9
WF(D,F)=9
F-light
W(A,B)=7
WF(A,B)= ∞
F-light
Random-sampling
G
5
C
E
4
14
A
6
3
7
D
B
G
G
10
9
5
C
11
F
F
1. Random select edges to H
2. Find F of H
E
4
14
A
6
3
7 B
13
D
G
10
9
11
F
1. Increasing Order
2. If F-light
Throw
If
Select
3. Else
Throw
Don’t select
13
Observation

No F-heavy edge can be in the minimum
spanning forest of G (cycle property)

F-light edge can be the candidate edge for the
minimum spanning tree of G

The forest F produced is the forest that would
be produced by Kruskal and inlcude all possible
MSF of G
Observation
The size of F is at most n-1
 The expected number of F-light edges in G is at
most n/p (negative binomial)

 k  n  1 n
 p (1  p) k
f (k ; n; p)  
 k 
 k  (n  1)  n 1
 p (1  p) k p
 
k


1 p
p
1 p
n
n

n

Expected n =
p
p
Mean k = n
Outline
Introduction
 Basic Property & Definition
 Algorithm
 Analysis

Analysis of the Algorithm
The worst case.
 The expectations running time.
 The probability of the expectations running
time.

Running time Analysis
Total running time= running time in each steps.
 Step(1): 2 steps Boruvka’s algorithm
 Step(2):Dixon-Rauch-Tarjan verification
algorithm.
 All takes linear time to the number of edges.
◦ Estimate the total number of edges.

Observe the recursion tree
G=(V,E) |V| = n, |E|=m .
◦ m≧n/2 since there is no isolate vertices.
 Each problem generates at most 2 subproblems.
◦ At depth d, there is at most 2d nodes.
◦ Each node in depth d has at most n/4d
vertices.
 The depth d is at most log4n.
◦ There are at most vertices in all subproblems

d 0 2d n / 4d  d 0 n / 2d  2n


The worst case
Theorem 4.1 The worst-case running time of
the minimum-spanning-forest algorithm is
O(min{n2,m log n}), the same as the bound for
Boruvka’s algorithm.

Proof: There is two different estimate ways.
1. A subproblem at depth contains at most
(n/4d)2/2 edges.
 Total edges in all subproblems is:


log4 n
d 0
(n / 4d ) 2 d
2  O( n 2 )
2
The worst case
2. Consider a subprolbem G=(V,E) after step(1),
we have a G’=(V’ ,E’),|E’|≦|E| - |V|/2, |V’| ≦|V|/4
Edges in left-child = |H|
Edges in right-child ≦ |E’| - |H| + |F|
so edges in two subproblem is less then:
(|H|) + (|E’| - |H| + |F|)
=|E’| +|F|≦|E|-|V|/2 + |V|/4≦|E|
The two sub problem at most contains |E|
edges.
The worst case
m edges
 m edges
log n
 m edges
 m edges
The worst case
The depth is at most log4n and each level has at
most m edges, so there are at most (m log n)
edges.
 The worst-case running time of the minimumspanning-forest algorithm is O(min{n2,m log n}).

Analysis of the Algorithm
The worst case.
 The expectations running time.
 The probability of the expectations running
time.

Analysis – Average Case (1/8)

Theorem: the expected running time of the
minimum spanning forest algorithm is O(m)
◦ Calculating the expected total number of edges
for all left path problems
Original Problem
Left Sub-problem
Left Subsub-problem
Right Sub-problem
Right Subsub-problem
Analysis – Average Case (2/8)
Calculating the expected total edge number
for one left path started at one problem
with m’ edges
 Evaluating the total edge number for all
right sub-problems
# of edges = m’

Expected total
edge number ≤ 2m’
Analysis – Average Case (3/8)

Calculating the expected total edge number for
one left path started at one problem with m’
edges
E[edge number of H] = 0.5 × edge number
of G*
2. ∵ Boruvka × 2
∴ edge number of G* ≤ edge number of
G
E[edge number of H] ≤ 0.5 × edge number
of G
Original Problem 1.
G
Boruvka × 2
G*
Sample with
p=0.5
H
Left
Sub-problem
G’
Right
Sub-problem
Analysis – Average Case (4/8)

Calculating the expected total edge number for
one left path started at one problem with m’
edges
Original Problem
G
Boruvka × 2
G*
# of edges = m’
Sample with
p=0.5
H
Left
Sub-problem
G’
E[edge number of H] ≤ 0.5 × edge number
# of edges ≤ 0.5 ×
of G
m’
Right
Sub-problem
Expected total edge number ≤
2m’
=
Analysis – Average Case (5/8)

Calculating the expected total edge number for
one left path L started at one problem with
m’ edges
K.O.
◦ Expected total edge number on L ≤ 2m’
• Evaluating the total edge number of all right
sub-problems
• E[total edges of all right sub-problem] ≤ n
Analysis – Average Case (6/8)

Evaluating the total edge number for all right subproblems
◦ To prove : E[total edges of all right sub-problem] ≤ n
Original Problem 1. ∵ Boruvka × 2
∴ vertex number of G*
G
≤ 0.25 × vertex number of G
Boruvka × 2
2. Based on lemma 2.1:
G*
E[edge number of G’] ≤ 2 × vertex number of G*
Sample with
p=0.5
H
Left
Sub-problem
Return
minimum forest
F of H
G’
E[edge number of G’] ≤ 0.5×vertex number of G
Right
Sub-problem
Delete F-heavy
edges from G*
Analysis – Average Case (7/8)

Evaluating the total edge number for all right subproblems
◦ To prove : E[total edges of all right sub-problem] ≤ n
Original Problem
G
=n
Boruvka × 2
G*
# of vertices of originalproblems=n
# of vertices of subproblems ≤ 2×n/4
Sample with
p=0.5
H
Left
Sub-problem
G’
E[edge
Right
Sub-problem
# of edges of right
sub-problems ≤ n/2
number
of G’] of
≤ sub0.5×vertex
number
# of edges
of right of G
# of vertices
sub-problems ≤ 2×n/8
problems ≤ 4×n/42
# of edges of right sub# of vertices of sub3
problems ≤ 4×n/(42×2)
problems ≤ 8×n/4
# of edges of right sub# of vertices of subproblems ≤ 8×n/(43×2)
problems ≤ 16×n/44
Analysis – Average Case (8/8)

Calculating the expected total edge number for one
left path started at one problem with m’ edges
◦ Expected total edge number for one left path ≤

2m’
Evaluating the total edge number for all right subproblems
◦ E[total edges of all right sub-problem] ≤ n
# of edges = m’
Expected
totalall sub-problems]
E[processed edges in the original problem
and
edge number ≤ 2m’
=2×(m+n)
Analysis of the Algorithm
The worst case.
 The expectations running time.
 The probability of the expectations running
time.

The Probability of Linearity

Theorem 4.3
◦ The minimum spanning forest algorithm runs
in Ο(m) time with probability 1 – exp(-Ω(m))
The Probability of Linearity
Chernoff Bound:
Given xi as i.d.d. random variables and 0< i  n, and X is
the sum of all xi, for t > 0, we have
n
 
PrX  A   e  At  E e tXi
i1
Thus, the probability that less than s successes (each
with chance p) within k trials is
k
 
PrX  s   e st  E e tXi
i1
 e st  (pe t )k
 e  Ω ( s ) , for t  1 2 and p 
1
2
The Probability of Linearity

Right Subproblems
◦ At most the number of vertices in all right
subproblems: n/2 ( proved by theorem 4.2 )
◦ n/2 is the upper bound on the total number
of heads in nickel-flips
Right Subproblems

The probability
◦ It occurs fewer than n/2 heads in a sequence
of 3m nickel-tosses
m + n ≦ 3m since n/2 ≦ m
 The probability is exp (-Ω(m)) by a
Chernoff bound

The Probability of Linearity

Left Subproblem
◦ Sequence: every sequence ends up with a tail,
that is, HH…HHT
◦ The number of occurrences of tails is at most
the number of sequences
◦ Assume that there are at most m’ edges in
the root problem and in all right subproblems
Left Subproblems
The probability
◦ It occurs m’ tails in a sequence of more than
3m’ coin-tosses

The probability is exp (-Ω(m)) by a
Chernoff bound
The Probability of Linearity

Combining Right & Left Subproblems
◦ The total number of edges is Ο(m) with a
high-probability bound 1 – exp(-Ω(m))