Transcript Slide 1
Relative Approximations in
Geometric Hypergraphs
Esther Ezra
New York University
Range Counting
P - Set of points in Rd (objects) .
R - Family of regions in Rd (ranges) .
Goal:
Preprocess P in order to efficiently answer
counting queries.
Counting query:
Given a range R,
report the size of P ∩ R.
Various Problem Settings (GIS, DB…)
Input: Cities in the USA (mapped as
points in R2 ) .
Query: How many cities are located
within a certain distance from a given
location? (disk in R2 )
Input: people with personal details, as
age, yearly income, address, etc.
(mapped as points in Rd )
Query: How many people are in the age
[35,50] and have an income of at least
50,000$ ? (axis-parallel boxes in Rd )
Approximate Range Counting
The Range Counting problem is well studied
[AM-94, AS-93, Chaz-93, Chaz-94, Mat-93,
Mat-94, MW-92,…]
Too expensive
Computing the exact count requires storing the entire set X.
If approximate count is sufficient then we can save space:
Compute a small subset Z ⊆ X that approximates
|R ∩ X| well for each R ∈ R.
Specifically, we would like the two proportions |R ∩ X| / |X|,
|R ∩ Z| / |Z| to be roughly the same.
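As a concrete illustration of this idea (not part of the talk; all names and numbers below are illustrative), here is a minimal Python sketch comparing the proportion |R ∩ X| / |X| with the proportion |R ∩ Z| / |Z| obtained from a random subset Z, for interval ranges on the real line:

    import random

    def proportion(points, lo, hi):
        """Fraction of the given points that fall inside the interval [lo, hi]."""
        return sum(1 for x in points if lo <= x <= hi) / len(points)

    random.seed(0)
    X = [random.random() for _ in range(100_000)]        # the full point set
    Z = random.sample(X, 1_000)                          # a much smaller subset

    lo, hi = 0.2, 0.35                                   # a query range R
    print(proportion(X, lo, hi), proportion(Z, lo, hi))  # the two proportions are close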
Range Spaces
The approximate range counting is an instance of a more general
and abstract setting:
Range space (X, R) :
X – Objects
R – Ranges: Subsets of X .
|R| ≤ 2^{|X|}
Abstract form: Hypergraphs.
X – vertices.
R – hyperedges.
Geometric Range Spaces
Specification: X ⊆ R^d, R = set of simply-shaped regions in R^d.
X – Points on the real line.
R – Intervals.
X – Points on the plane.
R – halfplanes, disks,…
Assume X is finite, |X| = n:
|R| is polynomial in |X|, and this continues
to hold for any “projected” range space.
The approximate count
The measure of a range R is the quantity
X(R) = |R ∩ X| / |X|
(and similarly Z(R) = |R ∩ Z| / |Z| for a subset Z ⊆ X).
Let 0 < ε < 1 be the error parameter. Ideally,
we would like to have a relatively small
subset Z ⊆ X s.t.
X(R) (1 − ε) ≤ Z(R) ≤ X(R) (1 + ε),
equivalent to |Z(R) − X(R)| ≤ ε X(R).
But this requirement is too strong.
At the extreme, when R ∩ Z = ∅, R ∩ X must also be empty!
So this property may force |Z| = Ω(|X|)!
Cannot distinguish between empty/non-empty ranges
when Z is small!
Relative (p,ε)-Approximation
Thus, refine the definition for the approximate count by
introducing an additional parameter 0 < p < 1.
Definition:
A subset Z ⊆ X is a relative (p,ε)-approximation for (X, R) if:
|Z(R) − X(R)| ≤ ε X(R), if X(R) ≥ p, and
|Z(R) − X(R)| ≤ ε p, otherwise.
In particular, when R ∩ Z = ∅, X(R) does not exceed ε p.
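The definition translates directly into a verification routine; here is a hedged Python sketch (the names are mine, and ranges are assumed to be given explicitly as sets of objects) that checks the two cases of the definition for every range:

    def is_relative_approximation(X, Z, ranges, p, eps):
        """Check whether Z is a relative (p, eps)-approximation for (X, ranges).

        X, Z   : collections of objects (Z is intended to be a subset of X),
        ranges : iterable of sets, each range given by its member objects.
        """
        X, Z = set(X), set(Z)
        for R in ranges:
            x_meas = len(R & X) / len(X)    # measure X(R)
            z_meas = len(R & Z) / len(Z)    # measure Z(R)
            if x_meas >= p:
                if abs(z_meas - x_meas) > eps * x_meas:   # relative-error case
                    return False
            elif abs(z_meas - x_meas) > eps * p:          # absolute-error case
                return False
        return True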
How small can we make Z?
Upper bounds [LLS-01]
Theorem [Li, Long, Srinivasan 2001]:
A random sample of
O( log(1/p) / (ε^2 p) )
elements of X is a relative (p, ε)-approximation for (X, R)
with constant probability.
(The bound does not depend on n.)
Example: Points and intervals on the real line:
Put |Z| = 2/(εp), taking every (εpn/2)-th point of X in sorted order.
We have |Z(R) − X(R)| ≤ εp/2 ≤ ε X(R) whenever X(R) ≥ p.
(In the accompanying figure: an interval containing a p-fraction of the points, i.e., pn points.)
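A hedged sketch of the sampling statement above, for points and intervals on the line: draw a random sample of size roughly c·log(1/p)/(ε^2 p) (the constant c = 4 is a placeholder, not the constant from [LLS-01]) and compare the two measures on a few interval ranges.

    import math, random

    def lls_style_sample(X, p, eps, c=4):
        """Random sample of size ~ c * log(1/p) / (eps^2 * p); c is illustrative."""
        m = min(len(X), math.ceil(c * math.log(1 / p) / (eps ** 2 * p)))
        return random.sample(X, m)

    random.seed(1)
    X = sorted(random.random() for _ in range(200_000))
    p, eps = 0.01, 0.25
    Z = lls_style_sample(X, p, eps)

    for lo, hi in [(0.1, 0.15), (0.4, 0.9), (0.33, 0.331)]:
        x_meas = sum(lo <= x <= hi for x in X) / len(X)
        z_meas = sum(lo <= x <= hi for x in Z) / len(Z)
        print(f"range [{lo},{hi}]  X-measure={x_meas:.4f}  Z-measure={z_meas:.4f}")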
Previous Results
Points and halfplanes in 2D:
O( log^{4/3}(1/p) / (ε^{4/3} p) ) [Har-Peled Sharir 2011]
Points and halfplanes in 3D:
O( log^{3/2}(1/p) / (ε^{3/2} p) ) [Har-Peled Sharir 2011]
• This case is somewhat restricted
(bound corresponds to multiple
subsets of this overall size).
We improve the dependency on p
for a wider family of range spaces.
Our Result:
Well-Behaved Range Spaces
We consider “well-behaved” range spaces (X, R):
1. For any parameter 1 ≤ k ≤ n, the number of ranges of size
at most k is nearly-linear in n and polynomial in k;
that is, the bound is n φ(n) · k^c, where φ(n) is a slowly
growing function and c > 0 is a constant.
2. This property holds for any “projected” range space
(X’, R’). That is, X’ ⊆ X, and R’ = { R ∩ X’ | R ∈ R }.
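To make property 1 concrete, here is a small sketch (mine, not the talk's) that enumerates the distinct projected ranges of size at most k for points and intervals on the line; every nonempty such range is a contiguous run of the sorted points, so there are at most nk of them, i.e., linear in n and polynomial in k.

    def interval_ranges_at_most_k(points, k):
        """Distinct subsets of `points` cut off by an interval, of size at most k."""
        pts = sorted(points)
        ranges = {frozenset()}                      # the empty range
        for i in range(len(pts)):                   # leftmost point of the run
            for j in range(i, min(i + k, len(pts))):
                ranges.add(frozenset(pts[i:j + 1]))
        return ranges

    pts = [3, 1, 4, 1.5, 9, 2.6, 5.3]
    print(len(interval_ranges_at_most_k(pts, 3)))   # at most n*k + 1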
Well-Behaved Primal
Range Spaces in Geometry
Range Space | # ranges of size ≤ k
Points and halfspaces in 2 and 3 dimensions | O(nk), O(nk^2)
Points and disks in the plane | O(nk^2)
Points and axis-parallel boxes in 2 and 3 dimensions | O(nk^2 log n), O(nk^2 log^3 n) (involves some canonization)
Points and α-fat triangles in the plane (each angle ≥ α) | O(nk^3 log^2 n)
Well-Behaved Dual
Range Spaces in Geometry
Range Space | # ranges of size ≤ k
Halfspaces and points in 2 and 3 dimensions | O(nk), O(nk^2)
Pseudo-disks and points in the plane | O(nk)
α-fat triangles and points in the plane | O(nk log* n)
Locally γ-fat objects and points in the plane | nk · 2^{O(log* n)}
Our Result
Theorem:
Any well-behaved range space (X, R) admits
a relative (p, ε)-approximation Z of size
O( { log log(1/p) + log(1/ε) } / (ε^2 p) ),
and Z can be constructed in expected polynomial time.
(For simplicity of presentation, assume φ(·) grows slower than log(·).
The constant of proportionality depends on the degree of the
polynomial in the bound on |R|. This is an improvement when p << ε.)
Corollary:
Each of the well-behaved geometric range spaces listed
above admits a relative (p, ε)-approximation of size
O( { log log(1/p) + log(1/ε) } / (ε^2 p) ).
Highlights of the Construction
F – a relative (p, ε)-approximation for (X, R), F ⊆ X.
T – the set of ranges projected onto F.
Replace X by F (this removes the dependency on n):
|F| = O( log(1/p) / (ε^2 p) ).
Now, compute an improved relative (p, ε)-approximation
for (F, T); we lose only a constant factor
in ε and in p.
Since (X, R) is well-behaved, then by definition,
(F, T) is also well-behaved.
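A high-level sketch of the first stage of this construction, under the simplifying assumptions that ranges are given explicitly as sets and that the initial approximation F is obtained by uniform random sampling (the helper names and the constant are mine):

    import math, random

    def first_stage(X, ranges, p, eps, c=4):
        """Stage 1: replace X by a sample F of size ~ c*log(1/p)/(eps^2 p),
        and project every range onto F."""
        m = min(len(X), math.ceil(c * math.log(1 / p) / (eps ** 2 * p)))
        F = set(random.sample(list(X), m))
        T = [R & F for R in ranges]     # the projected ranges
        return F, T

    # Stage 2 (not sketched) computes the improved relative approximation for (F, T);
    # since (X, ranges) is well-behaved, so is (F, T), and only constant factors
    # in p and eps are lost by the projection.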
Classifying the objects in F
An object in F is heavy if it participates in “many” ranges (of T).
Otherwise, this object is light.
(All heavy objects are put into the target approximation.)
Claim: The number of heavy objects is at most O(1/p) << |F|.
The proof follows from the fact that the range space
(F, T) is well-behaved.
Handwaving argument: Consider the case where all ranges have
size ~ k, and the number of such ranges is O(|F|k). Then the number of
object/range incidences is O(|F|k^2), and so there are only
O(|F|/k) (heavy) objects that participate in > k^3 ranges.
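A minimal sketch of the heavy/light classification (the threshold is left as a parameter; the talk's actual threshold depends on the well-behavedness parameters):

    from collections import Counter

    def classify(F, T, threshold):
        """Split the objects of F into heavy (members of more than `threshold`
        ranges of T) and light (all the rest)."""
        degree = Counter()
        for R in T:                      # each range is a subset of F
            for obj in R:
                degree[obj] += 1
        heavy = {obj for obj in F if degree[obj] > threshold}
        return heavy, set(F) - heavy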
A random sampling scheme for
the light objects
The overwhelming majority of F consists of light objects.
Sample each light object independently with probability
q := (log log(1/p) + log(1/ε)) / (log(1/p) + log(1/ε)).
Let F1 be the resulting sample.
E[|F1|] = O( (log log(1/p) + log(1/ε)) / (ε^2 p) ).
Goal: Show that the heavy objects and F1 comprise a relative
(p, ε)-approximation for (F, T).
(This relative approximation is weighted.)
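A minimal sketch of the sampling step, with the probability q taken verbatim from the slide (assuming p < 1/e so that log log(1/p) is well defined):

    import math, random

    def sample_light(light, p, eps):
        """Keep each light object independently with probability
        q = (log log(1/p) + log(1/eps)) / (log(1/p) + log(1/eps))."""
        q = (math.log(math.log(1 / p)) + math.log(1 / eps)) / \
            (math.log(1 / p) + math.log(1 / eps))      # q <= 1 for p < 1/e
        return {obj for obj in light if random.random() < q}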
Analysis
The analysis for the light objects consists of two major steps.
1. Show that, for a fixed range R, the probability of having a
relative error between its original measure (w.r.t. the light
objects) and its approximated measure (w.r.t. F1) is small.
(Use a standard Chernoff bound; denote this bad event by A_R.)
2. The number of events (ranges) is too large!
But they admit a relatively small “degree of dependency”.
So we can apply the (asymmetric) Lovasz Local Lemma in
order to conclude that, with positive probability, none of these
events happen.
The Lovasz Local Lemma (LLL)
Goal: Show that there exists an assignment x from the
set of events to (0,1), s.t.
(*)  Prob[A_R] ≤ x(A_R) · Π_{A_{R’} ~ A_R, R’ ≠ R} (1 − x(A_{R’}))
for each A_R.
Then the Local Lemma of Lovasz implies:
Prob[ ⋀_R ¬A_R ] ≥ Π_R (1 − x(A_R)) > 0.
There exists a sample F1 that well approximates all range counts.
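Condition (*) can be checked mechanically. A hedged Python sketch (the data-structure choices are mine): given probability estimates for the events, a candidate assignment x, and the dependency graph, verify the hypothesis of the asymmetric Local Lemma.

    import math

    def lll_condition_holds(prob, x, neighbors):
        """Check (*): Prob[A] <= x(A) * prod over dependent A' != A of (1 - x(A')).

        prob, x   : dicts mapping each event to a number in (0, 1),
        neighbors : dict mapping each event to the events it depends on.
        """
        for A in prob:
            product = math.prod(1 - x[B] for B in neighbors[A] if B != A)
            if prob[A] > x[A] * product:
                return False
        return True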
Applying the Local Lemma of Lovasz:
Dependency among events
Observation:
Due to our sampling model, for a pair of ranges R, R’ the two
events A_R, A_{R’} are mutually independent iff there is no light object
in F that participates in both R, R’.
Omitting further technical details,
the degree of dependency for a fixed
range depends on its size.
These values are non-uniform!
(which is the reason we need to resort
to the asymmetric version of LLL)
Non-Uniform Degree of Dependency
We overcome this difficulty by applying an exponential partition
of the ranges according to their count:
A range R ∈ T lies at the i-th layer of T, for i = 1, …, log(1/p), if
2^{i-1} p ≤ |R ∩ F| / |F| ≤ 2^i p,
and lies at the 0-th layer if
0 ≤ |R ∩ F| / |F| ≤ p.
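A direct transcription of this layering rule into Python (ranges are assumed to be given as subsets of F; the function name is mine):

    import math

    def layer_of(R, F, p):
        """Layer index of a range R under the exponential partition above."""
        measure = len(R & F) / len(F)
        if measure <= p:
            return 0
        # smallest i >= 1 with measure <= 2^i * p
        return math.ceil(math.log2(measure / p))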
Wrapping Up
Then the bound on Prob[A_R] (computed by Chernoff's bound) is
sensitive to the layer: Prob[A_R] < (ε / log(1/p))^{B · 2^{i-1}},
for a suitable constant B.
The (asymmetric) Lovasz Local Lemma “connects” between all
layers:
x(A_R) = exp{2^{i+1}} · Prob[A_R], for each R ∈ S_i
(where S_i denotes the set of ranges in the i-th layer).
This yields one (universal) sample for all ranges at all layers.
Remark: The simpler version of LLL can be applied in each layer
separately, but that produces multiple samples and not a single one.
Keeping the Sample Small
We have shown that
E[|F1|] = O( (log log(1/p) + log(1/ε)) / (ε^2 p) )
(recall that each light object is sampled independently
with probability q).
Nevertheless, we need to show that a sample of that size exists,
and that it also well approximates all ranges (w.r.t. light objects).
Let B be the (bad) event: |F1| > (1 + β) E[|F1|],
where β is a sufficiently large constant.
Our analysis extends the Local Lemma to include this event.
Applying the Constructive LLL
Apply the randomized algorithm of Moser and Tardos [MT 2010].
Our scenario matches the properties of the setting in [MT 2010] :
The light objects induce a set of mutually independent random
variables: each of them is chosen randomly and independently
with probability q.
Each bad event A_R (and B) is determined by these variables.
A sample F1 with the above desired properties can be
constructed in expected polynomial time.
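For context, a generic sketch of the Moser-Tardos resampling loop (this is the textbook scheme in schematic form, not the specific instantiation used in the talk; `variables`, `events`, and the two callbacks are assumptions of the sketch):

    import random

    def moser_tardos(variables, events, resample, is_bad, max_rounds=10**6):
        """Generic Moser-Tardos resampling.

        variables   : dict name -> current value of an independent random variable,
        events      : list of (event_name, variable_names) pairs (the bad events),
        resample(v) : returns a fresh independent sample for variable v,
        is_bad(e, variables) : True if bad event e currently holds.
        """
        for _ in range(max_rounds):
            bad = [(e, vbls) for e, vbls in events if is_bad(e, variables)]
            if not bad:
                return variables              # no bad event holds: done
            e, vbls = random.choice(bad)      # pick some occurring bad event
            for v in vbls:                    # resample only the variables it depends on
                variables[v] = resample(v)
        raise RuntimeError("did not converge within max_rounds")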
Application:
Set Multi-Cover Problem
A set cover for (X, R) is a subset S ⊆ R, s.t.
any x ∈ X is covered by S.
Goal: find smallest set cover.
Finding a set cover of smallest size
is NP-hard, even for geometric range spaces!
Use an approximation algorithm instead
[Lovasz 75, Chvatal 79, Clarkson 93, ERS 05...].
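For reference, the classical greedy approximation for (plain) set cover, which gives an O(log n)-approximation; this is the standard textbook algorithm, not the multi-cover algorithm discussed on the next slide:

    def greedy_set_cover(universe, sets):
        """Repeatedly pick the set covering the most still-uncovered elements."""
        uncovered = set(universe)
        cover = []
        while uncovered:
            best = max(sets, key=lambda S: len(S & uncovered))
            if not best & uncovered:
                raise ValueError("the given sets do not cover the universe")
            cover.append(best)
            uncovered -= best
        return cover

    U = set(range(10))
    S = [set(range(5)), set(range(4, 8)), {7, 8, 9}, {0, 9}]
    print(greedy_set_cover(U, S))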
Application:
Set Multi-Cover Problem
In set multi-cover each item x ∈ X is associated
with a demand d(x).
Goal: Find a smallest subset of R, s.t. each item x ∈ X
is covered at least d(x) times.
Model: Allow repetitions.
Set Multi-Cover with Repetitions
Theorem [Chekuri, Clarkson, Har-Peled]:
If (X, R) admits a relative (p, ε)-approximation of size
O( φ(1/p) / (ε^2 p) ),
computable in polynomial time,
then there exists a polynomial-time algorithm that approximates
the smallest set multi-cover up to a factor of O(φ(OPT)).
Applying the standard bound O( log(1/p) / (ε^2 p) ) yields
an approximation factor of O(log OPT).
Applying our bound for well-behaved range spaces yields an
approximation factor of O(log log OPT) .
Further Research
Can we remove the factor log(1/ε) in the numerator of our
bound, and thus obtain the bound O( log log(1/p) / (ε^2 p) )?
Li, Long, and Srinivasan [LLS-01] obtained such an
improvement for more general range spaces: they reduced the
bound O( (log(1/p) + log(1/ε)) / (ε^2 p) ) to
O( log(1/p) / (ε^2 p) ).
Does our technique have any implications to combinatorial
discrepancy? Can one integrate this machinery with existing
techniques in order to obtain improved bounds on certain
scenarios?
The Case of Points and Halfplanes
[Har-Peled Sharir 2011]
The bound for points and halfplanes is achieved via
combinatorial discrepancy.
X – set of n points.
H – all halfplanes defined on X.
Informally, we want to color each point of X either red or blue, in
such a way that any halfplane in H has roughly the same number of
red and blue points.
The maximum deviation from an even splitting, over all halfplanes,
is the discrepancy of H.
The Case of Points and Halfplanes
[Har-Peled Sharir 2011]
Formally, a coloring of X is any mapping χ : X → {−1, +1}.
The discrepancy of H, denoted by disc(H), is the minimum, over
all colorings χ, of
disc(χ, H) := max_{h ∈ H} disc(χ, h),
where disc(χ, h) = | Σ_{x ∈ h} χ(x) |.
What is the best coloring for X ?
Theorem: [Matousek 95]
disc(H) = O(n^{1/4})
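A brute-force sketch of these definitions for points and halfplanes (exhaustive over all colorings, so only feasible for tiny n; the halfplane encoding a·x + b·y ≤ c and all names are illustrative):

    from itertools import product

    def disc_of_coloring(points, halfplanes, chi):
        """max over halfplanes h of | sum of chi(x) over points x lying in h |."""
        return max(abs(sum(chi[i] for i, (x, y) in enumerate(points)
                           if a * x + b * y <= c))
                   for a, b, c in halfplanes)

    def discrepancy(points, halfplanes):
        """Minimum of disc_of_coloring over all +-1 colorings (brute force)."""
        return min(disc_of_coloring(points, halfplanes, chi)
                   for chi in product((-1, 1), repeat=len(points)))

    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    hps = [(1, 0, 0.5), (0, 1, 0.5), (1, 1, 1.5)]   # halfplanes a*x + b*y <= c
    print(discrepancy(pts, hps))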
The Case of Points and Halfplanes
[Har-Peled Sharir 2011]
The analysis of Har-Peled and Sharir shows that the discrepancy
is sensitive to the size of the halfspace:
Theorem: [Har-Peled Sharir 2011]
Given a set X of n points in the plane, there exists a coloring
χ : X → {−1, +1}, s.t., for any halfplane h ∈ H,
disc(χ, h) = O( |h ∩ X|^{1/4} log n ).
This property yields a bound of O( log^{4/3}(1/p) / (ε^{4/3} p) ) on the
size of the relative (p, ε)-approximation, using a “halving technique”.
Thank you