Sampling Lower Bounds via Information Theory
Ziv Bar-Yossef
IBM Almaden
Standard Approach to Hardness of Approximation

Hardness of approximation for f: Xn → Y reduces to the hardness of a decision “promise problem”.

“Promise problem”:
• ∀ a ∈ A, b ∈ B, f(a) is “far” from f(b).
• Given x ∈ A ∪ B, decide whether x ∈ A.

[Figure: the sets A and B inside the domain Xn]
The “Election Problem”

• Input: a sequence x of n votes for k parties.
• Want an estimate x̂ s.t. ||x̂ − x|| < ε.
• How big a poll do we have to conduct?
• ∀ S ⊆ [k], it is easy to decide between
  A = { x | x(S) ≥ ½ + ε } and
  B = { x | x(S) ≤ ½ − ε }.
• Hardness is due to the abundance of such decision problems →
  the poll has to be of size Ω(k).

[Figure: vote distribution x for n = 18, k = 6: 7/18, 4/18, 3/18, 2/18, 1/18, 1/18]
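A minimal sketch of such a poll in Python (my illustration, not the talk's; the helper poll and the vote encoding are hypothetical): sample q votes uniformly with replacement and return the empirical fractions. Taking q = O(k/ε²) matches the upper bound quoted later in the talk.

    import random
    from collections import Counter

    def poll(votes, q):
        """Estimate the vote-fraction vector from q uniform i.i.d. samples."""
        sample = random.choices(votes, k=q)      # queries with replacement
        counts = Counter(sample)
        return {party: counts[party] / q for party in set(votes)}

    # The figure's instance: n = 18 votes over k = 6 parties.
    votes = [1]*7 + [2]*4 + [3]*3 + [4]*2 + [5]*1 + [6]*1
    print(poll(votes, q=1000))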
Similarity Hardness vs. Abundance Hardness

Hardness of approximation for f: Xn → Y comes in two flavors:
• Similarity hardness: the hardness of a single decision “promise problem”.
• Abundance hardness: the abundance of such decision “promise problems”.

In this talk:
• A lower bound technique that captures both types of hardness in the context of sampling algorithms.
Why Sampling?

[Figure: an algorithm reads the input data set through a small number of queries]

• Queries can be chosen randomly.
• Output is typically approximate.
• Sub-linear time & space.
Some Examples

Statistics:
• Statistical decision and estimation
• Statistical learning
• …

CS:
• PAC and machine learning
• Property testing
• Sub-linear time approximation algorithms
• Extractors and dispersers
• …
Query Complexity
Query complexity of a function f:
# of queries required to approximate f
Examples:
• High query complexity:
– Parity
– # of distinct elements
• Low query complexity:
– Mean in [0,1]
– Median
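A sketch of why the mean has low query complexity (illustrative code, names mine): the average of q uniform samples is within O(1/√q) of the true mean with high probability, independently of n. Parity, by contrast, flips when any single entry flips, so approximating it requires reading essentially the whole input.

    import random

    def estimate_mean(x, q):
        """Average of q uniform i.i.d. samples; error ~ 1/sqrt(q) w.h.p."""
        return sum(random.choice(x) for _ in range(q)) / q

    x = [random.random() for _ in range(10**6)]    # values in [0,1]
    print(abs(estimate_mean(x, q=10_000) - sum(x) / len(x)))  # ~ 0.003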
Our Main Result
• A technique for obtaining lower bounds on the
query complexity of approximating functions
– Template for obtaining specific lower bounds
• Arbitrary domain and range
• All types of approximation
• Usable for wide classes of functions with symmetry properties
– Outperforms previous techniques for functions with
“abundance hardness”
– Matches previous techniques for functions with
“similarity hardness”
Previous Work

• Statistics
  – Cramér-Rao inequality
  – VC dimension
  – Optimality of the sequential probability ratio test
• CS
  – Lower bounds via the Hellinger distance [B., Kumar, Sivakumar 01]
  – Specific lower bounds [Canetti, Even, Goldreich 95], [Radhakrishnan, Ta-Shma 96], [Dagum, Karp, Luby, Ross 95], [Schulman, Vazirani 99], [Charikar, Chaudhuri, Motwani, Narasayya 00]

None addresses abundance hardness!
Multi-Way Reduction from a Binary Promise Problem

f: Xn → Y, with pairwise “disjoint” inputs a, b, c: the values f(a), f(b), f(c) are pairwise “far” apart in Y.

[Figure: f(a), f(b), f(c) as well-separated regions of Y]

Multi-way promise problem:
Given x ∈ { a, b, c }, decide whether x = a, x = b, or x = c.
Can be solved by any sampling algorithm approximating f.
Main Result

The lower bound “recipe”

f: Xn → Y: a function with an appropriate symmetry property.
1. Identify a set S = { x1,…,xm } of “pairwise disjoint” inputs.
2. Calculate the “dissimilarity” D(x1,…,xm) among x1,…,xm.
   (D(·,…,·) is a distance measure taking values in [0, log m].)

Theorem:
Any algorithm approximating f requires q queries, where
  q ≥ ((1 − δ) · log m − 1) / D(x1,…,xm).

Tradeoff between “similarity hardness” (small dissimilarity D) and “abundance hardness” (large m); a worked example follows.
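To see the recipe in action, a small sanity-check computation (the bound is as reconstructed above; the numbers and the helper query_lower_bound are illustrative):

    import math

    def query_lower_bound(m, D, delta=1/3):
        """Fano-style bound: q >= ((1 - delta) * log2(m) - 1) / D."""
        return ((1 - delta) * math.log2(m) - 1) / D

    # Election-style setting: m = 2^Omega(k) disjoint inputs, D = O(eps^2).
    k, eps = 100, 0.05
    m = 2 ** (k // 10)       # illustrative design size
    D = eps ** 2             # illustrative dissimilarity
    print(query_lower_bound(m, D))   # grows like k / eps^2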
Measure of Dissimilarity

μi: the distribution of the value of a uniformly chosen entry of xi.

Then D(x1,…,xm) = JS(μ1,…,μm), the Jensen-Shannon divergence among μ1,…,μm (defined below).
Application I: The Election Problem

Previous bounds on the query complexity:
• Ω(1/ε²) [BKS01]
• Ω(√k) [Batu et al. 00]
• O(k/ε²) [BKS01]

Theorem [This paper]: Ω(k/ε²)
Combinatorial Designs

t-design: a family of subsets B1, B2, B3, … ⊆ [k] whose pairwise differences are large (|Bi ∖ Bj| ≥ k/t for all i ≠ j).

[Figure: overlapping subsets B1, B2, B3 of [k]]

Proposition:
For all k and for all t ≥ 12, there exists a t-design of size m = 2^Ω(k).
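The existence proof is probabilistic; here is a minimal sketch of the object (my illustration: draw random half-size subsets of [k] and keep those far from all sets kept so far; the 2^Ω(k) size bound is what the proposition asserts, not something this code proves):

    import random

    def random_t_design(k, t, m_target, tries=10**5):
        """Greedily collect size-k/2 subsets of [k] whose pairwise
        differences satisfy len(Bi - Bj) >= k/t."""
        design = []
        for _ in range(tries):
            B = frozenset(random.sample(range(k), k // 2))
            if all(len(B - C) >= k // t for C in design):
                design.append(B)
            if len(design) == m_target:
                break
        return design

    print(len(random_t_design(k=60, t=12, m_target=100)), "sets found")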
Proof of the Lower Bound

Step 1: Identification of a set S of pairwise disjoint inputs:
B1,…,Bm ⊆ [k]: a t-design of size m = 2^Ω(k).
S = { x1,…,xm }, where xi gives total vote weight ½ + ε to the parties in Bi and ½ − ε to those in [k] ∖ Bi.

Step 2: Dissimilarity calculation: D(x1,…,xm) = O(ε²).

By the main theorem, the # of queries is at least Ω(k/ε²).
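Why Step 2 gives O(ε²) (a hedged sketch, assuming each μi is near-uniform on [k] with its mass tilted by ε toward Bi, so that the average μ̄ stays Θ(1/k) per party): using the standard inequality KL ≤ χ²,

    \mathrm{KL}(\mu_i \parallel \bar\mu)
      \le \chi^2(\mu_i, \bar\mu)
      = \sum_{z=1}^{k} \frac{(\mu_i(z) - \bar\mu(z))^2}{\bar\mu(z)}
      \le k \cdot \frac{O(\varepsilon^2/k^2)}{\Omega(1/k)}
      = O(\varepsilon^2),

and averaging over i gives D(x1,…,xm) = JS(μ1,…,μm) = O(ε²).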
Application II: Low Rank Matrix Approximation

Exact low rank approximation:
• Given an m × n real matrix M and k ≤ m, n, find the m × n matrix Mk of rank k for which ||M − Mk||F is minimized.
• Solution: SVD. Requires querying all of M.

Approximate low rank approximation (LRMk):
• Get a rank-k matrix A s.t. ||M − A||F ≤ ||M − Mk||F + ε||M||F.

Theorem [This paper]:
Computing LRMk requires Ω(m + n) queries.
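For reference, the exact SVD solution (a standard numpy sketch; by the Eckart-Young theorem, truncating to the top k singular values is optimal in Frobenius norm):

    import numpy as np

    def best_rank_k(M, k):
        """Optimal rank-k approximation of M in Frobenius norm."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]

    M = np.random.randn(50, 40)
    Mk = best_rank_k(M, k=5)
    print(np.linalg.matrix_rank(Mk), np.linalg.norm(M - Mk))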
Proof of the Lower Bound

Step 1: Identification of a set S of pairwise disjoint inputs:
B1,…,Bt ⊆ [2k]: a combinatorial design of size t = 2^Ω(k), with each |Bi| = k.
S = { M1,…,Mt }, where Mi is all-zero except for its top-left 2k × 2k block, whose diagonal is the characteristic vector of Bi.

[Figure: Mi with the characteristic vector of Bi on the diagonal of a 2k × 2k block, zeros elsewhere]

• Mi is of rank k ⇒ (Mi)k = Mi.
• ||Mi||F = √k.
• ||Mi − Mj||F ≥ |Bi ∖ Bj|^1/2 ≥ (k/12)^1/2 ≥ ε (||Mi||F + ||Mj||F).

Step 2: Dissimilarity calculation: D(M1,…,Mt) = 2k/m.

By the main theorem, the # of queries is at least Ω(m) (a symmetric argument over columns gives Ω(n)).
Low Rank Matrix Approximation (cont.)

Theorem [Frieze, Kannan, Vempala 98]:
By querying an s × s submatrix of M chosen using any distributions which “approximate” the row and column weight distributions of M, one can solve LRMk with s = O(k⁴/ε³).

Theorem [This paper]:
Solving LRMk by querying an s × s submatrix of M chosen even according to the exact row and column weight distributions of M requires s = Ω(k/ε²).
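A sketch of the sampling strategy in question (assuming the “weight distributions” are the squared row/column norms, which is the standard choice in this line of work; the helper name is mine):

    import numpy as np

    def weighted_submatrix(M, s):
        """Draw an s x s submatrix, rows/columns sampled by squared norm."""
        rng = np.random.default_rng()
        p_row = (M ** 2).sum(axis=1)
        p_col = (M ** 2).sum(axis=0)
        rows = rng.choice(M.shape[0], size=s, p=p_row / p_row.sum())
        cols = rng.choice(M.shape[1], size=s, p=p_col / p_col.sum())
        return M[np.ix_(rows, cols)]

    M = np.random.randn(200, 150)
    print(weighted_submatrix(M, s=30).shape)   # (30, 30)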
Oblivious Sampling

Phase 1: Choose query positions i1,…,iq.
Phase 2: Query xi1,…,xiq.

• Query positions are independent of the given input.
• The algorithm has a fixed query distribution λ on [n]^q.
• i.i.d. queries: the queries are independent and identically distributed: λ = ρ^q, where ρ is a distribution on [n].
Main Theorem: Outline of the Proof

Adaptive sampling
  ↓ (for functions with symmetry properties)
Oblivious sampling with i.i.d. queries
  ↓
Statistical classification
  ↓
Lower bounds via information theory
Statistical Classification

[Figure: a black box containing one of the distributions μ1,…,μm emits q i.i.d. samples; a classifier reads them and outputs an index i ∈ [m]]

• μ1,…,μm are distributions on Z.
• The classifier is required to be correct with probability ≥ 1 − δ.
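A minimal sketch of such a classifier (maximum likelihood over the m known distributions; my illustration, not the talk's):

    import math, random

    def classify(samples, dists):
        """Return the index i whose distribution maximizes the likelihood."""
        def loglik(d):
            return sum(math.log(d[z]) if d[z] > 0 else -math.inf
                       for z in samples)
        return max(range(len(dists)), key=lambda i: loglik(dists[i]))

    # Two distributions on Z = {0, 1, 2}; draw q samples from dists[1].
    dists = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    samples = random.choices(range(3), weights=dists[1], k=50)
    print(classify(samples, dists))   # typically prints 1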
From Sampling to Classification

• T: an oblivious algorithm with query distribution λ = ρ^q that (ε,δ)-approximates f: Xn → Y.
• μx: the joint distribution of a query and its answer when T runs on input x (a distribution on [n] × X).
• S = { x1,…,xm }: a set of pairwise disjoint inputs.

[Figure: a black box containing one of μx1,…,μxm feeds q i.i.d. samples to T; decide i iff T’s output ∈ A(xi)]
Jensen-Shannon Divergence [Lin 91]

• KL divergence between distributions μ, ν on Z:
  KL(μ || ν) = Σ_{z∈Z} μ(z) log( μ(z) / ν(z) ).
• Jensen-Shannon divergence among distributions μ1,…,μm on Z (with μ̄ = (1/m) Σi μi):
  JS(μ1,…,μm) = (1/m) Σi KL(μi || μ̄).

[Figure: distributions μ1,…,μ8 arranged around their average μ̄]
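A direct transcription of these definitions (a sketch: finite Z, base-2 logarithms, distributions as lists):

    import math

    def kl(p, q):
        """KL divergence KL(p || q) between distributions on {0,...,|Z|-1}."""
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def js(dists):
        """Jensen-Shannon: average KL of each mu_i to their mean."""
        m = len(dists)
        avg = [sum(col) / m for col in zip(*dists)]
        return sum(kl(d, avg) for d in dists) / m

    print(js([[0.9, 0.1], [0.1, 0.9]]))   # 1 - H(0.9) ~ 0.53 bits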
Main Result

Theorem [Classification lower bound]:
Any δ-error classifier for μ1,…,μm requires q queries, where
  q ≥ ((1 − δ) · log m − 1) / JS(μ1,…,μm).

Corollary [Query complexity lower bound]:
For any oblivious algorithm with query distribution λ = ρ^q that (ε,δ)-approximates f, and for any set S = { x1,…,xm } of “pairwise disjoint” inputs, the number of queries q is at least
  ((1 − δ) · log m − 1) / JS(μx1,…,μxm).
Outline of the Proof

Lemma 1 [Classification error lower bound]:
JS(μ1^q,…,μm^q) ≥ (1 − δ) · log m − 1.
Proof: by Fano’s inequality.

Lemma 2 [Decomposition of Jensen-Shannon]:
JS(μ1^q,…,μm^q) ≤ q · JS(μ1,…,μm).
Proof: by subadditivity of entropy and conditional independence.
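Chaining the two lemmas (statements as reconstructed above) yields the main theorem:

    q \cdot \mathrm{JS}(\mu_1,\dots,\mu_m)
      \;\ge\; \mathrm{JS}(\mu_1^q,\dots,\mu_m^q)
      \;\ge\; (1-\delta)\log m - 1
    \quad\Longrightarrow\quad
    q \;\ge\; \frac{(1-\delta)\log m - 1}{\mathrm{JS}(\mu_1,\dots,\mu_m)}.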
Conclusions
• General lower bound technique for the query complexity
– Template for obtaining specific bounds
– Works for wide classes of functions
– Captures both “similarity hardness” and “abundance hardness”
• Applications
– The “Election Problem”
– Low rank matrix approximation
– Matrix reconstruction
• Also proved
– A lower bound technique for the expected query complexity
– Tightly captures similarity hardness but not abundance hardness
• Open problems
– Tight bounds for low rank matrix approximation
– Better lower bounds on the expected query complexity
– Lower bounds for non-symmetric functions
Simulation of Adaptive Sampling by Oblivious Sampling

Definition:
f: Xn → Y is symmetric if ∀x and ∀π ∈ Sn, f(π(x)) = f(x).
f is A-symmetric if ∀x, ∀π, A(π(x)) = A(x).

Lemma [BKS01]:
Any q-query algorithm approximating an A-symmetric f can be simulated by a q-query oblivious algorithm whose queries are uniform without replacement.

Corollary:
If q < n/2, it can be simulated by a 2q-query oblivious algorithm whose queries are uniform with replacement.
Simulation Lemma: Outline of the Proof

• T: a q-query sampling algorithm approximating f.
• WLOG, T never queries the same location twice.

Simulation:
• Pick a random permutation π.
• Run T on π(x).
• By A-symmetry, the output is likely to be in A(π(x)) = A(x).
• The queries to x are uniform without replacement.
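A sketch of this simulation (T is abstracted as a callback that issues queries; all names are mine):

    import random

    def simulate_oblivious(T, x):
        """Run the adaptive algorithm T on a randomly permuted view of x.

        Since the permutation is uniform and T never repeats a location,
        the positions actually read from x are uniform without replacement.
        """
        n = len(x)
        pi = random.sample(range(n), n)     # a uniform permutation of [n]
        return T(lambda i: x[pi[i]])        # T sees the permuted input

    # Toy T: estimate the mean from its first 3 (possibly adaptive) queries.
    T = lambda query: sum(query(i) for i in range(3)) / 3
    print(simulate_oblivious(T, [0, 1, 1, 0, 1, 0, 1, 1]))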
Extensions

Definitions:
• f is (g,A)-symmetric if ∀x, ∀π, ∀y ∈ A(π(x)), g(π,y) ∈ A(x).
• A function f on m × n matrices is A-row-symmetric if, for all matrices M and all row-permutation matrices Π, A(Π · M) = A(M).

Similarly: A-column-symmetry, and (g,A)-row- and column-symmetry.

We prove: similar simulations hold for all of the above.