No Slide Title

Download Report

Transcript No Slide Title

Approximating the MST Weight in
Sublinear Time
Bernard Chazelle (Princeton)
Ronitt Rubinfeld (NEC)
Luca Trevisan (U.C. Berkeley)
Sublinear Time Algorithms
• Make sense for problems on very large data
sets
• Go contrary to common intuition that “an
algorithm must be given at least enough time to
read all the input”
• Must be probabilistic
• Must be approximate
Approximation
• For decision problems:
the output is the correct answer either for the
given input, or at least for some other input
“close” to it.
(Property Testing)
• For optimization problems:
the output is a number that is close to the
cost of the optimal solution for the given
input.
(There is not enough time to construct a
solution)
Previous Examples
• The cost of the max cut in a graph with n nodes
and cn2 edges can be approximated to within a
factor e in time 2poly(1/ec).
(Goldreich, Goldwasser, Ron)
• Other results for “dense” instances of optimization
problems, for low-rank approximation of
matrices, . . .
• No results (that we know of) for problems on
bounded-degree graphs.
Our Result
• Given a connected weighted graph G, with
maximum degree d and with weights in the
range {1, . . . , w},
• we can compute the weight of the minimum
spanning tree of G to within a factor of e in time
O(dwe-2log w/e);
• we also prove that it is necessary to look at
W(dwe-2) entries in the representation of G.
(We assume that G is represented using adjacency lists)
Main Intuition
• Suppose all weights are 1 or 2
• Then the MST weight is equal to
n – 2 + # of conn. comp. induced by weight-1 edges
weight 1
weight 2
connected components
Induced by weight-1 edges
MST
Algorithm
Algorithm for weights in {1,2}
• To approximate the MST weight to within a
multiplicative factor (1+e) it’s enough to
approximate c1 to within an additive factor en
(c1:= # of connected components induced by
weight-1 edges)
• To approximate c1 we use ideas from
Goldreich-Ron (property testing of
connectivity)
• The algorithm runs in time O(de-2loge-1)
Approximating # of connected
components
• Given a graph G of max degree d with n nodes
we want to compute c, the number of connected
components of G up to an additive error en.
• For every vertex u, define
nu := 1 / size of component of u
• Then
c = Su nu
• And if we call
au:= max {nu, e }
• Then
c = Su au  en
Wrapping up the analysis
• Can estimate summation of au using sampling
• Once we pick a vertex u at random, the value
au can be computed in time O(d/e)
• We need to pick O(1/e2) vertices, so we get
running time O(d/e3)
Algorithm
CC-APPROX(e)
Repeat O(1/e2) times
pick a random vertex v
do a BFS from v, stopping after 2/e steps
b:= 1 / number of visited vertices
return (average of the values b) * n
Improved Algorithm
CC-APPROX(e, W)
Repeat O(1/e2) times
pick a random vertex v
do first step of a BSF from v
b:=0; t:=1
(*) flip a coin
If heads, and visited <W nodes so far
t:=2*t
continue BSF until ends or t nodes are visited
if BSF ends, b:= 2#random coins / nodes visited
else go to (*)
return (average of the values b) * n
• Inner procedure takes average O(dlog W) time
Analysis
• Main idea: if v is in a component of size c<W, then b
is zero with prob. ~(1 – 1/c) and ~1 with probability
~1/c. The average of b is 1/c.
• Setting W:=2/e we get
– each time, the average of b is within e/2 from the
average over v of nv
(that is, (# conn. comp.)/n)
– Repeating O(1/e2) times, the probability of
deviating by another factor e/2 is bounded by a
constant
– The average running time is O(de-2logW), that is
O(de-2log e-1).
General Weights
• Generalize argument for weight 1 and 2.
• Let
ci = # of connected components induced by
edges of weight at most i
• Then the MST weight is
n – w + Si=1,. . ., w-1 ci
Final Algorithm
• For j=1,. . ., w-1, call CC-APPROX(e,2w/e) on
the subgraph of G obtained by removing
edges of cost >j
• Get ai, an approximation of ci
• Return n – w + Si=1,. . ., w-1 ai
• Average answer is within en/2 from cost of
MST, and variance is bounded
• Total running time O(dwe-2log w/e)
Extensions
• Low average degree
• Non-integer weights
Lower Bound
Abstract sampling problem
•
•
•
•
•
Fix p,e
Define two binary distributions A,B
Pr[A=1] = p, Pr[A=0]=1-p
Pr[B=1] = p+ ep, Pr[B=0]=1-p-ep
Distinguishing A from B with constant
probability requires W(1/pe2) samples
Reduction
• Fix p = 1/w
• We consider two distributions of weights over
a cycle of length n
• In distribution G, for each edge we sample
from A; if A=0 the edge gets weight 1,
otherwise it gets weight w
• In distribution H, same with B
• H and G are likely to have MST costs that
differ by about en
• To distinguish them we need to look at
W(w/e2) edge weights
Higher Degree
• Sample from G or H as before,
– also add d-1 forward edges of weight w+1
from each vertex
– randomly permute names of vertices
• Now, on average, reading t edge weights
gives us t/d samples from A or B, so
t=W(dw/e2)
Conclusions
• A plausibility result showing that
approximation for a standard graph problem
in bounded degree (and sparse) graphs can
be achieved in time independent of number of
vertices
• Use of approximate cost without solution?
• More problems?
– The trivial Max SAT approximation algorithms can
be implemented in constant time, and give (an
implicit representation of) a solution
– Non-trivial Max SAT approximation? (say, 3/4)
– Something really useful?