Property Testing:
A Learning Theory Perspective
Dana Ron
Tel Aviv University
Property Testing (Informal Definition)
For a fixed property P and any object O,
determine whether O has property P,
or whether O is far from having property P
(i.e., far from any object that has property P).
Task should be performed by inspecting the object (in
as few places as possible).
Examples
• The object can be a function and the property
can be linearity.
• The object can be a string and the property can
be membership in a fixed regular language L.
• The object can be a graph and the property can
be 3-colorability.
Two Views of Property Testing
• A relaxation of exactly deciding whether the object
has the property or does not have the property.
• A relaxation of learning the object (with membership
queries and under the uniform distribution).
In either case want testing algorithm to be significantly
more efficient than decision/learning algorithm.
Q: Which view is “more right”?
A: Depends…
Mainly on type of objects and properties studied:
Combinatorial objects and properties vs. function
classes that are of interest to learning community
A Little Background
• Initially defined by Rubinfeld and Sudan in the
context of Program Testing (of algebraic functions).
• With Goldreich and Goldwasser initiated study of
testing properties of combinatorial objects, and
in particular graphs.
• Growing body of work deals with properties of
functions, graphs, strings, sets of points ...
Many algorithms with complexity that is sub-linear in
(or even independent of) size of object.
Formal Definition (“standard” model)
A property testing algorithm for property P is given a distance
parameter ε and query access(*) to a function f.
If f has property P then the algorithm should accept w.h.p.
If f is ε-far from any function having property P then the
algorithm should reject w.h.p.
Distance is measured with respect to the uniform dist.(**)
(*) May consider testing with random examples only
(**) May consider other distributions (unknown dist.)
Property Testing and Learning:
Basic Comments/Observations
Motivation I: Can use testing as a preliminary step to learning,
that is, for efficiently selecting a good hypothesis class.
Motivation II: The relation between testing and learning is similar
to the relation between decision and search.
Observation: Testing is no harder than (proper) learning:
If we have a learning algorithm for a function class F that outputs
a hypothesis in F, then we can use it to test the property of
belonging to F.
That is, run the learning algorithm with the accuracy
parameter set to ε/2, and check that the hypothesis it
outputs is at most 3ε/4-far from f on an independent
sample.
Want testing algorithm to be more efficient than learning
algorithm.
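To make the reduction concrete, here is a minimal Python sketch. It assumes hypothetical interfaces (not from the talk): proper_learner(f, accuracy) returns a hypothesis in F, f and the hypothesis are query oracles (callables), and sample_domain() draws one uniform point from the domain; the sample-size constant is illustrative, not tuned.

```python
def test_via_proper_learning(f, proper_learner, sample_domain, eps):
    """Test membership in class F given a proper learner for F (sketch)."""
    # Run the proper learner with accuracy parameter eps/2.
    # (If f is not in F, the learner may return an arbitrary h in F;
    # the distance check below still catches that case.)
    h = proper_learner(f, eps / 2)
    # Estimate dist(f, h) on a fresh uniform sample; O(1/eps) points
    # suffice to separate distance <= eps/2 from distance > eps w.h.p.
    m = max(1, int(48 / eps))  # illustrative constant
    disagreements = 0
    for _ in range(m):
        x = sample_domain()
        if f(x) != h(x):
            disagreements += 1
    # Accept iff the empirical distance is at most 3*eps/4.
    return disagreements / m <= 3 * eps / 4
```

If f is in F, the learner returns h with dist(f,h) ≤ ε/2, so the estimate falls below 3ε/4 w.h.p.; if f is ε-far from F, every h in F has dist(f,h) > ε, and the estimate exceeds 3ε/4 w.h.p.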
Classes/Properties for which testing is
more efficient than learning
Linear functions
Low-degree polynomials
Singletons, Monomials
(small) DNF and general Boolean formula and circuits
Monotone functions
Juntas
Halfspaces
Decision Lists, Decision Trees, Branching Programs
Clustering
Properties of Distributions
Two recurring approaches: the “self-correcting” approach and
the “enforce&test” approach.
Linearity Testing [Blum,Luby,Rubinfeld]
Def1: Let F be a finite field. A function f : F^n → F is called
linear (multi-linear) if there exist constants a_1,…,a_n ∈ F
s.t. for every x = x_1,…,x_n ∈ F^n it holds that
f(x) = Σ_i a_i·x_i.
Def2: A function f is said to be ε-far from linear if for
every linear function g, dist(f,g) > ε, where
dist(f,g) = Pr[f(x) ≠ g(x)] (x selected uniformly in F^n).
Fact: A function f : F^n → F is linear iff for every x,y ∈ F^n it
holds that f(x)+f(y) = f(x+y).
Linearity Testing Cont’
Linearity Test
1) Uniformly and independently select Θ(1/ε) pairs of
elements x,y ∈ F^n.
2) For every pair x,y selected, verify that f(x)+f(y) = f(x+y).
3) If for any of the pairs selected linearity is violated (i.e.,
f(x)+f(y) ≠ f(x+y)), then REJECT, otherwise ACCEPT.
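The test is short enough to state as code. A minimal Python sketch, assuming F = GF(2) (so addition in F^n is bitwise XOR) and query access to f on n-bit tuples; the trial count is Θ(1/ε) with an illustrative constant:

```python
import random

def blr_linearity_test(f, n, eps):
    """BLR linearity test over GF(2)^n; f maps n-bit tuples to {0, 1}."""
    trials = max(1, int(4 / eps))  # Theta(1/eps) pairs
    for _ in range(trials):
        x = tuple(random.randint(0, 1) for _ in range(n))
        y = tuple(random.randint(0, 1) for _ in range(n))
        x_plus_y = tuple(a ^ b for a, b in zip(x, y))  # addition in GF(2)^n
        if (f(x) ^ f(y)) != f(x_plus_y):  # violating pair found
            return False  # REJECT
    return True  # ACCEPT: no violation observed
```

For example, a true parity such as f = lambda x: x[0] ^ x[3] always passes, while a function ε-far from every linear function is rejected with probability at least 2/3.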
Query complexity: Θ(1/ε), i.e., independent of n. In
contrast to learning, where Ω(n) queries/examples are needed.
Theorem: If f is linear then the test accepts w.p. 1, and if f
is ε-far from linear then with probability at least 2/3 the
test rejects it.
Lemma: If f is accepted with probability greater than 1/3,
then f is ε-close to linear.
Linearity Testing Cont’
Suppose f is accepted w.p. > 1/3
⇒ small (< ε/2) fraction of violating pairs (f(x)+f(y) ≠ f(x+y)).
Define the self-corrected version of f, denoted g:
For each x,y let Vy(x) = f(x+y) − f(y) (the vote of y on x);
g(x) = Plurality_y(Vy(x)).
Can show that (conditioned on < ε/2 fraction of violating pairs):
(1) g is linear.
(2) dist(f,g) < ε.
Main Technical Lemma (informal): if there are few violating pairs,
then for every x, for almost all y, Vy(x) = g(x).
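The self-corrected value g(x) can be computed at any single point by a plurality vote over random shifts. A sketch over GF(2)^n (where subtraction coincides with XOR); the number of votes is an illustrative choice governing the error probability:

```python
import random
from collections import Counter

def self_correct(f, n, x, votes=31):
    """Return g(x) = Plurality over random y of V_y(x) = f(x+y) - f(y)."""
    tally = Counter()
    for _ in range(votes):
        y = tuple(random.randint(0, 1) for _ in range(n))
        x_plus_y = tuple(a ^ b for a, b in zip(x, y))
        tally[f(x_plus_y) ^ f(y)] += 1  # the vote of y on x
    return tally.most_common(1)[0][0]  # plurality value
```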
Testing “Basic” Properties of Functions
[Parnas, R, Samorodnitsky]
Considers the following function classes:
• Singletons: f(x) = x_i
• Monomials: f(x) = x_i ∧ x_j ∧ x_k
• DNF: f(x) = (x_i ∧ x_j) ∨ (x_k ∧ x_l ∧ x_m)
Testing “Basic” Properties of Functions Cont’
• Can test whether f is a singleton using O(1/ε) queries.
• Can test whether f is a monomial using O(1/ε) queries.
• Can test whether f is a monotone DNF with at most t
terms using Õ(t²/ε) queries.
Common theme: no dependence of the query complexity on the
input size n, and a linear dependence on the distance
parameter ε (as opposed to learning these classes, where
there is a (logarithmic) dependence on n).
Recent result of [Diakonikolas, Lee, Matulef, Onak, Rubinfeld,
Servedio, Wan] greatly extends the above
Testing (Monotone) Singletons
Singletons satisfy:
(1) Pr[f(x) = 1] = 1/2
(2) f(x ∧ y) = f(x) ∧ f(y) for all x,y
Natural test: check, by sampling, that the conditions hold
(approximately); see the sketch below.
Can analyze the natural test for the case that the distance
between the function and the class of singletons is not too big
(bounded away from 1/2).
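A sketch of this natural test in Python (over {0,1}^n, with f returning 0/1; the sample sizes and the slack for condition (1) are illustrative):

```python
import random

def natural_singleton_test(f, n, m=200):
    """Check, by sampling, that f approximately satisfies (1) and (2)."""
    # Condition (1): Pr[f(x) = 1] should be close to 1/2.
    ones = sum(f(tuple(random.randint(0, 1) for _ in range(n)))
               for _ in range(m))
    if abs(ones / m - 0.5) > 0.1:  # illustrative slack
        return False
    # Condition (2): f(x AND y) = f(x) AND f(y) on random pairs.
    for _ in range(m):
        x = tuple(random.randint(0, 1) for _ in range(n))
        y = tuple(random.randint(0, 1) for _ in range(n))
        x_and_y = tuple(a & b for a, b in zip(x, y))
        if f(x_and_y) != (f(x) & f(y)):
            return False
    return True
```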
Testing Singletons II - Parity Testing
Observation: Singletons are a special case of parity functions
(i.e., linear functions): g(x) = ⊕_{i∈S} x_i.
Claim: Let g(x) = ⊕_{i∈S} x_i. If |S| ≥ 2 then
Pr[g(x ∧ y) ≠ g(x) ∧ g(y)] ≥ 1/4
Modified algorithm:
(1) Test whether f is a parity function (with distance parameter ε)
using the algorithm of [BLR].
(2) Uniformly select a constant number of pairs x,y and check
whether any is a violating pair (i.e., f(x ∧ y) ≠ f(x) ∧ f(y)).
Testing Singletons III - Self Correcting
This “almost works”: If f is a singleton - always accepted.
If f is ε-far from parity - rejected w.h.p.
But if f is ε-close to a parity function g, then cannot simply
apply the claim to argue that there are many violating pairs w.r.t. f.
If we could only test violations w.r.t. g instead of f ...
Use Self-Corrector of [BLR] to “fix” f into parity function (g),
and then test violations on self-corrected version.
Testing Singletons IV - The Algorithm
Final Algorithm for Testing Singletons:
(1) Test whether f is a parity function with distance parameter ε,
using the algorithm of [BLR].
(2) Uniformly select a constant number of pairs x,y. Verify that
Self-Cor(f,x) ∧ Self-Cor(f,y) = Self-Cor(f,x∧y).
(3) Verify that Self-Cor(f,1) = 1, where 1 denotes the all-1 vector.
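Putting the pieces together, a sketch of the final algorithm, reusing the blr_linearity_test and self_correct sketches from above (the number of pairs is an illustrative constant):

```python
import random

def test_singleton(f, n, eps):
    """Monotone-singleton test in the spirit of [PRS] (sketch)."""
    # (1) Test that f is (close to) a parity function.
    if not blr_linearity_test(f, n, eps):
        return False
    # (2) Check conjunction pairs against the self-corrected version of f.
    for _ in range(20):  # constant number of pairs
        x = tuple(random.randint(0, 1) for _ in range(n))
        y = tuple(random.randint(0, 1) for _ in range(n))
        x_and_y = tuple(a & b for a, b in zip(x, y))
        if (self_correct(f, n, x) & self_correct(f, n, y)) \
                != self_correct(f, n, x_and_y):
            return False
    # (3) A monotone singleton outputs 1 on the all-1 vector.
    return self_correct(f, n, (1,) * n) == 1
```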
Testing Monomials and Monotone DNF
The monomial testing algorithm has a similar structure to the
singleton testing algorithm. (Here too it suffices to find a test
for monotone monomials.)
- The first stage of linearity testing is replaced by Affinity Testing:
if f is a monomial then F1 = {x : f(x)=1} is an affine subspace.
[Fact: H is an affine subspace iff for all x,y,z ∈ H, x⊕y⊕z ∈ H.]
The affinity test is similar to the parity test: select x,y ∈ F1 and
z ∈ {0,1}^n, and verify that f(x⊕y⊕z) = f(x)f(y)f(z); see the sketch below.
- The second stage is analogous to the singleton test (check for
violating pairs). Here affinity adds structure that helps analyze the
second stage.
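A sketch of the affinity check in Python (assumes query access to f and samples F1 by rejection sampling, which is only sensible when f is reasonably balanced; all constants illustrative):

```python
import random

def affinity_test(f, n, trials=50, max_attempts=10000):
    """Check that F1 = {x : f(x)=1} behaves like an affine subspace."""
    def sample_f1():
        for _ in range(max_attempts):  # rejection sampling from F1
            x = tuple(random.randint(0, 1) for _ in range(n))
            if f(x) == 1:
                return x
        return None  # F1 appears (nearly) empty

    for _ in range(trials):
        x, y = sample_f1(), sample_f1()
        if x is None or y is None:
            return False
        z = tuple(random.randint(0, 1) for _ in range(n))
        xyz = tuple(a ^ b ^ c for a, b, c in zip(x, y, z))
        # Since f(x) = f(y) = 1, checking f(xyz) = f(x)f(y)f(z)
        # amounts to checking f(xyz) = f(z).
        if f(xyz) != (f(x) & f(y) & f(z)):
            return False
    return True
```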
Testing monotone DNF: use the monomial test as a subroutine.
The result of [DLMORSW07], which extends to other families (e.g.,
non-monotone DNF), uses different techniques.
Testing of Clustering [Alon,Dar,Parnas,R]
Notation:
X - set of points |X| = n
dist(x,y) - distance between points x and y.
Assume that triangle inequality holds (dist(x,y)≤ dist(x,z)+dist(z,y)).
For any subset S of X, the diameter of S is
d(S) = max_{x,y∈S} dist(x,y)
Clustering Cont’
X is (k,b)-clusterable if exists a k-way
partition (clustering) of X s.t. each
cluster has diameter at most b.
X is ε-far from being (k,b')-clusterable (b' ≥ b)
if there is no k-way partition of any Y ⊆ X, |Y| ≥ (1−ε)n,
s.t. each cluster has diameter at most b'.
(In particular, will look at b' = (1+β)b, β ≤ 1.)
In first case algorithm should accept
and in second reject with probability ≥ 2/3
Clustering Cont’
Testing Algorithm (input: k, b, ε, β)
(1) Take a sample of m = m(k, ε, β) points from X.
(2) If the sample is (k,b)-clusterable then accept, o.w. reject.
If X is (k,b)-clusterable then the algorithm always accepts.
Suppose X is ε-far from being (k,(1+β)b)-clusterable.
Show that it rejects w.p. at least 2/3.
Will prove this for a general metric and β = 1, where m = O(k/ε).
Note: for a general metric one cannot go below β = 1 unless
m = Ω(|X|^{1/2}) is allowed, but one can do so for Euclidean distance
in d dimensions (in that case the complexity must depend on (1/β)^{d/2}).
Other sublinear clustering work (e.g. for k-median) include: [Indyk],
[Mishra,Oblinger,Pitt], [Ben-David], [Meyerson,O’Callaghan,Plotkin]
Clustering Cont’
Consider following mental experiment:
- Let points in sample be x1,x2,…,xm
- Construct (growing) set of “cluster representatives” REPS.
- Initially, REPS = {x1}.
- At each step, take the next point that is at distance > b from every
x in REPS, and add it to REPS.
Claim: If X is ε-far from being (k,2b)-clusterable then w.h.p.
|REPS| > k, causing the algorithm to reject as required.
Proof Idea: At each step, consider the clustering “enforced” by
REPS: each point in X\REPS is assigned to the closest x_i in REPS.
Additional sample points “test” this clustering.
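For a general metric with β = 1, the whole test can be phrased as exactly this greedy procedure run on the sample: reject iff more than k representatives are found. A Python sketch (points is a list, dist is the metric; the constant in m = O(k/ε) is illustrative):

```python
import random

def test_diameter_clustering(points, dist, k, b, eps):
    """Accept iff a greedy scan of a random sample yields at most k
    representatives that are pairwise more than b apart."""
    m = max(1, int(10 * k / eps))  # m = O(k/eps)
    sample = random.sample(points, min(m, len(points)))
    reps = []
    for x in sample:
        # x becomes a new representative if it is > b from all current ones.
        if all(dist(x, r) > b for r in reps):
            reps.append(x)
        if len(reps) > k:
            return False  # k+1 points pairwise > b apart: REJECT
    return True  # ACCEPT
```

Rejection is sound: k+1 representatives are pairwise at distance > b, so no k-way partition can place all of them in clusters of diameter at most b.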
Tolerant Testing of Clustering
[Parnas,R,Rubinfeld]
Tolerant Testing: Reject when ε-far, but accept when ε'-close.
Tolerant Testing Algorithm (input: k, b, ε, ε', β)
(1) Take a sample of m = m(k, ε, ε', β) points from X.
(2) If the sample is (ε' + (ε−ε')/2)-close to (k,b)-clusterable
then accept, o.w. reject.
Can analyze using a generalization of a framework by
Czumaj & Sohler for (standard) testing that captures
aspects of “enforce&test” approach.
The sample size has a quadratic dependence on 1/(ε−ε'), and
the same dependence on the other parameters as the (standard)
testing algorithm.
Distribution-Free Testing (with queries)
First results [GGR]: trivial positive results that follow from
the reduction from testing to learning, and simple negative
results for combinatorial properties (e.g., bipartiteness).
First non-trivial positive results [Halevi, Kushilevitz]:
(1) Linearity and Low-degree polynomials. In general: when
have self-corrector.
(2) Monotonicity in low dimensions
(Also had positive results for graph properties)
On the other hand, [Halevi, Kushilevitz] showed hardness of
distribution-free testing of monotonicity in high dimensions
(i.e., complexity exponential in d over {0,1}^d).
Recently, [Glasner, Servedio] showed that for several classes
(monomials, decision lists, linear threshold functions),
Ω((n/log n)^{1/5}) queries are needed for distribution-free testing.
Conclusions and Open Problems
Property testing: a relaxation of learning where one should
determine (w.h.p.) whether there exists a good approximation of
f in class F, rather than find such an approximation.
Can serve as preliminary step to learning.
For quite a few function classes have testing algorithms
that are more query-efficient than learning algorithms.
Some extend to distribution-free testing; for some there are
strong lower bounds.
Still much is left to be understood about relation between
testing and learning (has flavor of relation between decision
and search).
Thanks
Learning Boolean Formulae
• Can learn singletons and monomials under the uniform
distribution using O(log n/ε) queries/examples
(a variation on Occam's razor).
• Can properly learn monotone DNF with t terms and r literals
using Õ(r·log²n/ε + t·(r + 1/ε)) queries
[Angluin + Bshouty&Jackson&Tamon].
Main difference w.r.t. the testing results: in testing there is no
dependence on n, and a different algorithmic approach is used.