right sub-problems
Download
Report
Transcript right sub-problems
A Randomized Linear-Time
Algorithm to Find Minimum
Spaning Trees
David R. Karger
Philip N. Klein
Robert E. Tarjan
黃則翰
蘇承祖
張紘睿
許智程
戴于晉
R96922141
R96922077
R96922136
D95922022
R96922171
Outline
Introduction
Basic Property & Definition
Algorithm
Analysis
Outline
Introduction
Basic Property & Definition
Algorithm
Analysis
Introduction
[Borůvka 1962] O(m log n)
Gabow et al.[1984] O(m log β(m,n) )
◦ β(m,n)= min { i |log(i)n <= m/n}
Verification algorithm
◦ King[1993] O(m)
A randomize algorithm runs in O(m) time with
high probability
Outline
Introduction
Basic Property & Definition
Algorithm
Analysis
Cycle property
For any cycle C in a graph, the heaviest edge in
C dose not appear in the minimum spanning
forest.
3
2
3
5
6
2
5
6
Cut Property
For any proper nonempty subset X of the
vertices, the lightest edge with exactly one
endpoint in X belongs to the minimum
spanning tree
5
2
3
6
X
Definition
Let G be a graph with weighted edges.
◦ w(x,y)
The weight of edge {x,y}
If F is a forest of a subgraph in G
◦ F(x, y)
the path (if any) connecting x and y in F
◦ wF(x, y)
the maximum weight of an edge on F(x, y)
◦ wF(x, y)=∞
If x and y are not connected in F
F-heavy & F-light
An edge {x,y} is F-heavy if w(x,y) > wF(x,y)
and F-light otherwise
Edge of F are all F-light
A
3
C
2
E
5
D
B
W(C,D)=5
WF(C,D)=5
F-light
G
2
5
H
F
6
W(B,D)=6
WF(B,D)=max{2,3,5}
F-heavy
3
6
W(F,H)=6
WF(F,H)= ∞
F-light
Observation
No F-heavy edge can be in the minimum
spanning forest of G (cycle property)
Discard edge that cannot be in the minimum
spanning tree
F-light edge can be the candidate edge for the
minimum spanning tree of G
Outline
Introduction
Basic Property & Definition
Algorithm
Analysis
Boruvka Algorithm
For each vertex, select the minimum-weight
edge incident to the vertex.
Replace by a single vertex each connected
component defined by the selected edges.
Delete all resulting isolated vertices, loops, and
all but the lowest-weight edge among each set
of multiple edges.
Algorithm Step1
Apply two successive Boruvka steps to the
graph, thereby reducing the number of vertices
by at least a factor of four.
Algorithm Step2
Choose a subgraph H by selecting each edge
independently with probability ½.
Apply the algorithm recursively to H, producing
a minimum spanning forest F of H.
Find all the F-heavy edges and delete them from
the contracted graph.
Algorithm Step3
Apply the algorithm recursively to the
remaining graph to compute a spanning forest
F’. Return those edges contracted in Step1
together with the edges of F’.
Original Problem
G
Boruvka × 2
Sample with
p=0.5
Left
Subproblem
G*
H
Return
minimum forest
F of H
F’
G’
Right
Subproblem
Delete F-heavy
edges from G*
Correctness
By the cut property, every edge contracted
during Step1 is in the MSF.
By the cycle property, the edges deleted in
Step2 do NOT belong to the MSF.
By the induction hypothesis, the MSF of the
remaining graph is correctly determined in the
recursive call of Step3.
Candidate Edge of MST
The expected number of F-light edges in G is at
most n/p (negative binomial)
For every sample graph H, the expected
candidate edge for MST in G is at most n/p (Flight edge)
Random-sampling
To help discard some edge that cannot be in the
minimum spanning tree
Construct the sample graph H
◦ Process the edges in increasing order
◦ To process an edge e
◦ 1. Test whether both endpoints of e in same
component
◦ 2. Include the edge in H with probability p
◦ 3. If e is in H and is F-light, add e to the Forest
F
Random-sampling
5
C
H
G
E
4
14
A
6
3
7
D
B
10
9
5
C
13
F
F
E
4
14
A
6
3
7
B
G
11
D
10
9
G
11
F
13
W(E,G)=14
WF(E,G)=max{5,6,9,13}
F-heavy
W(E,F)=11
WF(E,F)=max{5,6,9}
F-heavy
W(D,F)=9
WF(D,F)=9
F-light
W(A,B)=7
WF(A,B)= ∞
F-light
Random-sampling
G
5
C
E
4
14
A
6
3
7
D
B
G
G
10
9
5
C
11
F
F
1. Random select edges to H
2. Find F of H
E
4
14
A
6
3
7 B
13
D
G
10
9
11
F
1. Increasing Order
2. If F-light
Throw
If
Select
3. Else
Throw
Don’t select
13
Observation
No F-heavy edge can be in the minimum
spanning forest of G (cycle property)
F-light edge can be the candidate edge for the
minimum spanning tree of G
The forest F produced is the forest that would
be produced by Kruskal and inlcude all possible
MSF of G
Observation
The size of F is at most n-1
The expected number of F-light edges in G is at
most n/p (negative binomial)
k n 1 n
p (1 p) k
f (k ; n; p)
k
k (n 1) n 1
p (1 p) k p
k
1 p
p
1 p
n
n
n
Expected n =
p
p
Mean k = n
Outline
Introduction
Basic Property & Definition
Algorithm
Analysis
Analysis of the Algorithm
The worst case.
The expectations running time.
The probability of the expectations running
time.
Running time Analysis
Total running time= running time in each steps.
Step(1): 2 steps Boruvka’s algorithm
Step(2):Dixon-Rauch-Tarjan verification
algorithm.
All takes linear time to the number of edges.
◦ Estimate the total number of edges.
Observe the recursion tree
G=(V,E) |V| = n, |E|=m .
◦ m≧n/2 since there is no isolate vertices.
Each problem generates at most 2 subproblems.
◦ At depth d, there is at most 2d nodes.
◦ Each node in depth d has at most n/4d
vertices.
The depth d is at most log4n.
◦ There are at most vertices in all subproblems
d 0 2d n / 4d d 0 n / 2d 2n
The worst case
Theorem 4.1 The worst-case running time of
the minimum-spanning-forest algorithm is
O(min{n2,m log n}), the same as the bound for
Boruvka’s algorithm.
Proof: There is two different estimate ways.
1. A subproblem at depth contains at most
(n/4d)2/2 edges.
Total edges in all subproblems is:
log4 n
d 0
(n / 4d ) 2 d
2 O( n 2 )
2
The worst case
2. Consider a subprolbem G=(V,E) after step(1),
we have a G’=(V’ ,E’),|E’|≦|E| - |V|/2, |V’| ≦|V|/4
Edges in left-child = |H|
Edges in right-child ≦ |E’| - |H| + |F|
so edges in two subproblem is less then:
(|H|) + (|E’| - |H| + |F|)
=|E’| +|F|≦|E|-|V|/2 + |V|/4≦|E|
The two sub problem at most contains |E|
edges.
The worst case
m edges
m edges
log n
m edges
m edges
The worst case
The depth is at most log4n and each level has at
most m edges, so there are at most (m log n)
edges.
The worst-case running time of the minimumspanning-forest algorithm is O(min{n2,m log n}).
Analysis of the Algorithm
The worst case.
The expectations running time.
The probability of the expectations running
time.
Analysis – Average Case (1/8)
Theorem: the expected running time of the
minimum spanning forest algorithm is O(m)
◦ Calculating the expected total number of edges
for all left path problems
Original Problem
Left Sub-problem
Left Subsub-problem
Right Sub-problem
Right Subsub-problem
Analysis – Average Case (2/8)
Calculating the expected total edge number
for one left path started at one problem
with m’ edges
Evaluating the total edge number for all
right sub-problems
# of edges = m’
Expected total
edge number ≤ 2m’
Analysis – Average Case (3/8)
Calculating the expected total edge number for
one left path started at one problem with m’
edges
E[edge number of H] = 0.5 × edge number
of G*
2. ∵ Boruvka × 2
∴ edge number of G* ≤ edge number of
G
E[edge number of H] ≤ 0.5 × edge number
of G
Original Problem 1.
G
Boruvka × 2
G*
Sample with
p=0.5
H
Left
Sub-problem
G’
Right
Sub-problem
Analysis – Average Case (4/8)
Calculating the expected total edge number for
one left path started at one problem with m’
edges
Original Problem
G
Boruvka × 2
G*
# of edges = m’
Sample with
p=0.5
H
Left
Sub-problem
G’
E[edge number of H] ≤ 0.5 × edge number
# of edges ≤ 0.5 ×
of G
m’
Right
Sub-problem
Expected total edge number ≤
2m’
=
Analysis – Average Case (5/8)
Calculating the expected total edge number for
one left path L started at one problem with
m’ edges
K.O.
◦ Expected total edge number on L ≤ 2m’
• Evaluating the total edge number of all right
sub-problems
• E[total edges of all right sub-problem] ≤ n
Analysis – Average Case (6/8)
Evaluating the total edge number for all right subproblems
◦ To prove : E[total edges of all right sub-problem] ≤ n
Original Problem 1. ∵ Boruvka × 2
∴ vertex number of G*
G
≤ 0.25 × vertex number of G
Boruvka × 2
2. Based on lemma 2.1:
G*
E[edge number of G’] ≤ 2 × vertex number of G*
Sample with
p=0.5
H
Left
Sub-problem
Return
minimum forest
F of H
G’
E[edge number of G’] ≤ 0.5×vertex number of G
Right
Sub-problem
Delete F-heavy
edges from G*
Analysis – Average Case (7/8)
Evaluating the total edge number for all right subproblems
◦ To prove : E[total edges of all right sub-problem] ≤ n
Original Problem
G
=n
Boruvka × 2
G*
# of vertices of originalproblems=n
# of vertices of subproblems ≤ 2×n/4
Sample with
p=0.5
H
Left
Sub-problem
G’
E[edge
Right
Sub-problem
# of edges of right
sub-problems ≤ n/2
number
of G’] of
≤ sub0.5×vertex
number
# of edges
of right of G
# of vertices
sub-problems ≤ 2×n/8
problems ≤ 4×n/42
# of edges of right sub# of vertices of sub3
problems ≤ 4×n/(42×2)
problems ≤ 8×n/4
# of edges of right sub# of vertices of subproblems ≤ 8×n/(43×2)
problems ≤ 16×n/44
Analysis – Average Case (8/8)
Calculating the expected total edge number for one
left path started at one problem with m’ edges
◦ Expected total edge number for one left path ≤
2m’
Evaluating the total edge number for all right subproblems
◦ E[total edges of all right sub-problem] ≤ n
# of edges = m’
Expected
totalall sub-problems]
E[processed edges in the original problem
and
edge number ≤ 2m’
=2×(m+n)
Analysis of the Algorithm
The worst case.
The expectations running time.
The probability of the expectations running
time.
The Probability of Linearity
Theorem 4.3
◦ The minimum spanning forest algorithm runs
in Ο(m) time with probability 1 – exp(-Ω(m))
The Probability of Linearity
Chernoff Bound:
Given xi as i.d.d. random variables and 0< i n, and X is
the sum of all xi, for t > 0, we have
n
PrX A e At E e tXi
i1
Thus, the probability that less than s successes (each
with chance p) within k trials is
k
PrX s e st E e tXi
i1
e st (pe t )k
e Ω ( s ) , for t 1 2 and p
1
2
The Probability of Linearity
Right Subproblems
◦ At most the number of vertices in all right
subproblems: n/2 ( proved by theorem 4.2 )
◦ n/2 is the upper bound on the total number
of heads in nickel-flips
Right Subproblems
The probability
◦ It occurs fewer than n/2 heads in a sequence
of 3m nickel-tosses
m + n ≦ 3m since n/2 ≦ m
The probability is exp (-Ω(m)) by a
Chernoff bound
The Probability of Linearity
Left Subproblem
◦ Sequence: every sequence ends up with a tail,
that is, HH…HHT
◦ The number of occurrences of tails is at most
the number of sequences
◦ Assume that there are at most m’ edges in
the root problem and in all right subproblems
Left Subproblems
The probability
◦ It occurs m’ tails in a sequence of more than
3m’ coin-tosses
The probability is exp (-Ω(m)) by a
Chernoff bound
The Probability of Linearity
Combining Right & Left Subproblems
◦ The total number of edges is Ο(m) with a
high-probability bound 1 – exp(-Ω(m))