Transcript slides
Statistics vs Big Data
Constantinos Daskalakis
CSAIL and EECS, MIT
Greek Stochastics
YOU WANT BIG DATA?
I'LL GIVE YOU BIG DATA!
BIG Data:
– Facebook: 20 petabytes of images daily
– Human genome: 40 exabytes of storage by 2025
– SKA Telescope: 1 exabyte daily
small Data:
– High-dimensional: DNA microarray, computer vision
– Expensive: experimental drugs, financial records
What properties do your BIG
distributions have?
e.g. 1: Play the lottery?
Is it uniform? Is the lottery unfair?
from Hitlotto.com:
"Lottery experts agree, past number histories can be the key to predicting future winners."
True Story!
• Polish lottery Multilotek
– Choose "uniformly" at random 20 distinct numbers out of 1 to 80.
– Initial machine was biased
 • e.g., the probability of drawing a number in 50-59 was too small
Thanks to Krzysztof Onak (pointer) and Eric Price (graph)
New Jersey Pick 3,4 Lottery
• New Jersey Pick k (k = 3, 4) Lottery.
– Pick k random digits in order.
– 10^k possible values.
• Data:
– Pick 3: 8522 results from 5/22/75 to 10/15/00
 • χ²-test (on Excel) answers "42% confidence"
– Pick 4: 6544 results from 9/1/77 to 10/15/00.
• fewer results (6544) than possible values (10,000)
• not a good idea to run a χ² test
• convergence to the χ² distribution won't kick in for small sample sizes
• (textbook) rule of thumb: to run χ², want an expected number of at least 5 observations of each element in the domain under the null hypothesis (see the sketch below)
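A minimal sketch (my addition, not from the talk) of how the rule of thumb plays out on Pick-4-sized synthetic data; the draws and constants are only illustrative:

```python
# Sketch: Pearson's chi-squared uniformity check on Pick-4-sized synthetic
# data, with the textbook rule of thumb on expected counts.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
domain_size = 10**4      # Pick 4: 10^4 possible values
n_samples = 6544         # roughly the Pick 4 dataset size from the slide

draws = rng.integers(0, domain_size, size=n_samples)   # synthetic "lottery"
observed = np.bincount(draws, minlength=domain_size)
expected = np.full(domain_size, n_samples / domain_size)

# Rule of thumb: each expected count under the null should be >= ~5 for the
# chi-squared approximation to kick in.  Here it is ~0.65, so the p-value
# below should not be trusted.
if expected.min() < 5:
    print(f"warning: min expected count {expected.min():.2f} < 5")

stat, pvalue = chisquare(observed, f_exp=expected)
print(f"chi2 = {stat:.1f}, p = {pvalue:.3f}")
```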
e.g. 2: Independence Testing
Shopping patterns:
Independent of zip code?
e.g. 2: Linkage Disequilibrium
[Figure: genome with locus 1, locus 2, …, locus n marked]
Single nucleotide polymorphisms: are they independent?
Suppose n loci, 2 possible states each; then:
• the state of one's genome is an element of {0,1}^n
• humans: some distribution p over {0,1}^n
Question: Is p a product distribution OR far from all product distributions?
Should we expect the genomes from the 1000 Genomes Project to be sufficient? Up to how many loci?
e.g. 3: Outbreak of diseases
• Similar patterns in different years?
• More prevalent near large airports?
[Maps: Flu 2005, Flu 2006]
Distributions on BIG domains
• Given samples of a distribution, need to know, e.g.,
– entropy
– number of distinct elements
– "shape" (monotone, unimodal, etc.)
– closeness to uniform, Gaussian, Zipfian…
• No assumptions on the shape of the distribution
– i.e., no parametric, smoothness, monotonicity, or normality assumptions, …
• Considered in statistics, information theory, machine learning, databases, algorithms, physics, biology, …
Old questions, new challenges
Classical setting:
– Domain: 1000 coin tosses; small domain D
– n large, |D| small (comparatively)
– Asymptotic analysis; computation not crucial
Modern setting:
– Domain: one human genome; large domain D
– n small, |D| large (comparatively)
– New challenges: samples, computation, communication, storage
A Key Question
• How many samples do you need, in terms of the domain size?
– Do you need to estimate the probability of each domain item?
– OR –
– Can the sample complexity be sublinear in the size of the domain? (Sublinearity rules out standard statistical techniques.)
Aim
Algorithms with sublinear sample complexity
[Venn diagram: statistics, machine learning, information theory, algorithms]
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Problem formulation
Model:
• D: discrete domain; p: unknown distribution over D
• 𝒫: family of distributions over D; may be non-parametric, e.g. unimodal, product, log-concave
Problem:
• Given: samples from the unknown p
• With probability 0.9, distinguish: p ∈ 𝒫 vs ℓ1(p, 𝒫) > ε
• where ℓ1(p, 𝒫) ≜ min_{q∈𝒫} ℓ1(p, q), and ℓ1(p, q) ≜ 2·max_{events A} |p(A) − q(A)|
Objective:
• Minimize samples: sublinear in |D|?
• Computational efficiency
Well-studied Problem
(Composite) hypothesis testing
– Neyman-Pearson test
– Kolmogorov-Smirnov test
– Pearson's chi-squared test
– Generalized likelihood ratio test
– …
Quantities of Interest
p_F = Pr(reject when the hypothesis is valid)
p_M = Pr(accept when the hypothesis is false)
Classical focus: consistency; error exponents exp(−c·n) as n → ∞; an asymptotic regime whose guarantees kick in when n ≫ |D|.
Our focus: Can the sample size be sublinear in |D|? Can false positives be strongly controlled?
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Testing Fairness of a Coin
• θ: unknown probability of 1 (heads); p = Bernoulli(θ)
• 𝒫 = {Bernoulli(1/2)}
• Distinguish: p ∈ 𝒫 vs d_TV(p, 𝒫) > ε
• Question: Is θ = 0.5 OR is |θ − 0.5| > ε?
• Goal: Toss the coin several times, deduce the correct answer w/ probability > 0.99
• Number of samples required?
– Can estimate θ by tossing the coin n times, then taking θ̂ = (1/n)·Σ_{i=1}^n X_i
– By concentration bounds, if n > c/ε², then |θ̂ − θ| < ε/3 w/ probability > 0.99
• Are Ω(1/ε²) many samples necessary?
– Suppose there is a tester using n samples
– Then it can distinguish one sample of X = (X_1, …, X_n), where each X_i ∼ B(0.5), from one sample of Y = (Y_1, …, Y_n), where each Y_i ∼ B(0.5 + ε), w/ probability > 0.99
– Claim: Any tester has error probability at least ½·(1 − d_TV(X, Y))
– But d_TV(X, Y) ≤ √2·H(X, Y) = O(ε·√n), where H is the Hellinger distance; so n = Ω(1/ε²) samples are needed for the error probability to be small (see the simulation sketch below)
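The upper-bound direction is easy to see empirically. Below is a small simulation sketch (my code; the constant c = 20 is arbitrary, and the accept threshold is set at ε/2 rather than the slide's ε/3):

```python
# Simulation of the coin-fairness tester: n ~ c/eps^2 tosses, accept "fair"
# iff the empirical bias is < eps/2.
import numpy as np

def fair_coin_test(tosses, eps):
    """Return True iff the empirical frequency of 1s is within eps/2 of 1/2."""
    return abs(tosses.mean() - 0.5) < eps / 2

rng = np.random.default_rng(1)
eps, c = 0.1, 20
n = int(c / eps**2)  # ~2000 tosses

fair = rng.random((100, n)) < 0.5          # 100 trials of a fair coin
biased = rng.random((100, n)) < 0.5 + eps  # 100 trials of an eps-biased coin
print("fair coins accepted:  ", np.mean([fair_coin_test(t, eps) for t in fair]))
print("biased coins rejected:", np.mean([not fair_coin_test(t, eps) for t in biased]))
```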
Testing Uniformity
• 𝒫 = {Uniform_D}; p: unknown distribution over D
• Sample access to p
• Distinguish: p ∈ 𝒫 vs d_TV(p, 𝒫) > ε
• Question: is p = U_D or d_TV(p, U_D) > ε?
• [Paninski'03]: Θ(√|D|/ε²) samples and time
"Intuition:"
• (Lower bound) Suppose q is the uniform distribution over {1, …, n}, and p is either uniform on {1, …, n} or uniform on a random size-n/2 subset of {1, …, n}
– unless Ω(√n) samples are observed, there are no collisions, hence the two cases cannot be distinguished
• (Upper bound) Collision statistics suffice to distinguish (see the sketch below)
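A hedged sketch of the collision-based tester; the acceptance threshold (1 + ε²)·baseline is an ad-hoc choice for this demo, not the calibrated constant from [Paninski'03]:

```python
# Collision-based uniformity tester: count colliding pairs and compare
# against the uniform baseline m*(m-1)/2 * (1/|D|).
import numpy as np

def collision_count(samples):
    """Number of colliding pairs among the samples."""
    _, counts = np.unique(samples, return_counts=True)
    return int((counts * (counts - 1) // 2).sum())

def uniformity_test(samples, domain_size, eps):
    m = len(samples)
    baseline = m * (m - 1) / 2 / domain_size
    # Uniform: E[collisions] = baseline.  eps-far in TV: E[collisions] is
    # inflated by roughly a (1 + Theta(eps^2)) factor; split the difference.
    return collision_count(samples) < baseline * (1 + eps**2)

rng = np.random.default_rng(2)
n, eps = 10_000, 0.25
m = int(10 * np.sqrt(n) / eps**2)        # Theta(sqrt(|D|)/eps^2) samples
uniform = rng.integers(0, n, size=m)
half = rng.integers(0, n // 2, size=m)   # uniform on half the domain: TV = 1/2
print("uniform accepted:     ", uniformity_test(uniform, n, eps))
print("half-support rejected:", not uniformity_test(half, n, eps))
```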
Proving Lower Bounds
• [Le Cam'73]: Consider two disjoint sets of distributions 𝒫1, 𝒫2. Suppose algorithm A is given n samples from some unknown p ∈ 𝒫1 ∪ 𝒫2 and claims to distinguish p ∈ 𝒫1 vs p ∈ 𝒫2.
• Then:
Pr[error] ≥ ½·(1 − inf_{q1 ∈ conv_n(𝒫1), q2 ∈ conv_n(𝒫2)} d_TV(q1, q2))
where conv_n(𝒫) is the set of all distributions generating n samples as follows:
– choose a random distribution p from 𝒫 (according to some distribution over 𝒫)
– then generate n samples from p
Proving Lower Bounds (cont'd)
To prove the Ω(√|D|/ε²) lower bound for uniformity testing, apply Le Cam's bound above with:
• 𝒫1 = {Uniform[n]}
• 𝒫2 = {p | p_{2i+1} = (1 ± ε)/n, p_{2i+2} = (1 ∓ ε)/n, ∀i} (a random sign per pair)
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Identity Testing ("goodness of fit")
• p, q: distributions over D
• q: given explicitly; sample access to p
• Question: is p = q or d_TV(p, q) > ε?
• I.e., 𝒫 = {q}, p unknown: distinguish p ∈ 𝒫 vs d_TV(p, 𝒫) > ε
• [Batu-Fischer-Fortnow-Kumar-Rubinfeld-White'01]…
• [Paninski'08, Valiant-Valiant'14]: Θ(√|D|/ε²) samples and time
• [w/ Acharya-Kamath'15]: a tolerant goodness-of-fit test with the same sample size can distinguish: χ²(p, q) ≤ ε² vs ℓ1²(p, q) > 4·ε²
– where χ²(p, q) ≜ Σ_{i∈D} (p_i − q_i)²/q_i
– Cauchy-Schwarz: χ²(p, q) ≥ ℓ1(p, q)², so this is indeed a tolerant guarantee
A new χ² Goodness-of-Fit Test
• Goal: given q and sample access to p, distinguish:
Case 1: χ²(p, q) ≤ ε² vs Case 2: ℓ1²(p, q) ≥ 4·ε²
• Approach: Draw Poisson(m) many samples from p
– N_i: # of appearances of symbol i ∈ D
– Poissonization: the N_i ∼ Poisson(m·p_i) are independent random variables
• Statistic: Z = Σ_{i∈D} ((N_i − m·q_i)² − N_i)/(m·q_i)
– Side-note: Pearson's χ² test uses the statistic Σ_{i∈D} (N_i − m·q_i)²/(m·q_i)
– Subtracting N_i in the numerator gives an unbiased estimator and, importantly, may hugely decrease the variance
– E[Z] = m·χ²(p, q)
– Case 1: E[Z] ≤ m·ε²; Case 2: E[Z] ≥ m·4ε²
– chug chug chug… bound the variance of Z ⟹ O(√|D|/ε²) samples suffice to distinguish (see the sketch below)
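A sketch of the statistic Z under Poissonized sampling (my illustration; the domain, ε, and m are arbitrary, and the decision threshold m·2ε² simply splits the two expectations):

```python
# Modified chi-squared statistic with Poissonized sampling.
import numpy as np

def modified_chi2_statistic(q, p_sampler, m, rng):
    """Z = sum_i ((N_i - m q_i)^2 - N_i) / (m q_i), with N_i ~ Poisson(m p_i)."""
    n_draws = rng.poisson(m)                      # Poissonization
    counts = np.bincount(p_sampler(n_draws), minlength=len(q))
    expected = m * q
    return float((((counts - expected) ** 2 - counts) / expected).sum())

rng = np.random.default_rng(3)
D, eps, m = 10_000, 0.2, 50_000
q = np.full(D, 1 / D)                             # hypothesis: uniform
p_far = np.full(D, 1 / D)
p_far[: D // 2] *= 1 + 2 * eps                    # Paninski-style perturbation
p_far[D // 2 :] *= 1 - 2 * eps                    # chi2(p_far, q) = 4 eps^2

sample_q = lambda k: rng.choice(D, size=k, p=q)
sample_far = lambda k: rng.choice(D, size=k, p=p_far)

# E[Z] = m * chi2(p, q): ~0 under the null, >= m * 4 eps^2 in the far case.
print("Z under the null:", modified_chi2_statistic(q, sample_q, m, rng))
print("Z when p is far :", modified_chi2_statistic(q, sample_far, m, rng))
print("threshold m*2*eps^2 =", m * 2 * eps**2)
```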
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Testing Properties of Distributions
• So far 𝒫 = {single distribution}
– restrictive, as we rarely know the hypothesis distribution exactly
• Natural extension: test structural properties
– monotonicity: "PDF is monotone," e.g. cancer vs radiation
– unimodality: "PDF is single-peaked," e.g. single source of disease
– log-concavity: "log PDF is concave"
– monotone hazard rate: "log(1 − CDF) is concave"
– product distribution, e.g. testing linkage disequilibrium
• Example question:
– 𝒫 = {unimodal distributions over [n]}
– Sample access to p
– Is p unimodal OR is p ε-far from all unimodal distributions?
Testing Properties of Distributions
[w/ Acharya and Kamath 2015]:
1. Testing identity, monotonicity, log-concavity, monotone hazard rate, and unimodality for distributions over an (ordered) set D is doable w/ O(√|D|/ε²) samples and time.
2. Testing monotonicity/independence of a distribution over D = [n]^d is doable w/ O(n^{d/2}/ε²) ≡ O(√|D|/ε²) samples and time.
– previous best for monotonicity testing: O(n^{d−0.5}/ε⁴) [Bhattacharyya-Fischer-Rubinfeld-Valiant'11]
– previous best for independence: d = 2 only, with worse bounds [Batu et al.'01]
3. All bounds above are optimal.
– Matching lower bounds for 1 and 2 via Le Cam.
4. Unified approach, computationally efficient tests.
N.B. Contemporaneous work of [Canonne et al.'2015] provides a different unified approach for testing structure, but their results are suboptimal.
A Natural Approach
Goal: Given 𝒫 and sample access to p, distinguish p ∈ 𝒫 vs ℓ1(p, 𝒫) > ε.
[Flow: Choose a hypothesis q ∈ 𝒫 → Test the hypothesis (how well does q fit p?)]
A Natural Approach (cont'd)
• Goal: Given 𝒫 and sample access to p, distinguish p ∈ 𝒫 vs ℓ1(p, 𝒫) > ε.
• A Learning-Followed-By-Testing Algorithm:
1. Learn a hypothesis q ∈ 𝒫 s.t.
 • p ∈ 𝒫 ⟹ ℓ1(p, q) < ε/2 (needs a cheap "proper learner")
 • ℓ1(p, 𝒫) > ε ⟹ ℓ1(p, q) > ε (automatic, since q ∈ 𝒫)
2. Reduce to "tolerant goodness of fit":
 • given ε, sample access to p, and an explicit description of q, distinguish ℓ1(p, q) < ε/2 vs ℓ1(p, q) > ε
• Problem: a tolerant tester requires an almost-linear number of samples in the support of p
– namely Ω(|D|/log|D|) samples [Valiant-Valiant'10]
• Could try investing more samples for more accurate learning, but the proper-learning complexity vs tolerant-testing complexity tradeoff does not work out to give the optimal testing complexity
A Modified Approach
• Goal: Given 𝒫 and sample access to p, distinguish p ∈ 𝒫 vs ℓ1(p, 𝒫) > ε.
• A Learning-Followed-By-Testing Algorithm:
1. Learn a hypothesis q ∈ 𝒫 s.t.
 • p ∈ 𝒫 ⟹ χ²(p, q) < ε²/2 (needs a cheap "proper learner" in χ²)
 • ℓ1(p, 𝒫) > ε ⟹ ℓ1(p, q) > ε (automatic, since q ∈ 𝒫)
2. Reduce to "tolerant goodness of fit" in (χ², ℓ1):
 • given ε, sample access to p, and an explicit description of q, distinguish χ²(p, q) < ε²/2 vs ℓ1(p, q) > ε
• Now tolerant testing has the right complexity of O(√|D|/ε²)
• Pertinent question: are there sublinear proper learners in χ²?
– We show that the χ²-learning complexity is dominated by the testing complexity for all properties of distributions we consider
Tutorial: part 2
Summary so far
• Hypothesis testing in the small-sample regime:
[Diagram: p → n i.i.d. samples → Test(𝒫) → Pass/Fail]
• p: unknown distribution over some discrete set D
• 𝒫: set of distributions over D
• Given: ε, δ, and sample access to p
• Goal: w/ prob ≥ 1 − δ tell p ∈ 𝒫 vs ℓ1(p, 𝒫) > ε
• Properties of interest: Is p uniform? unimodal? log-concave? MHR? a product measure?
• All the above properties can be tested w/ O(√|D|/ε² · log(1/δ)) samples and time
• Unified approach based on a modified Pearson goodness-of-fit test: statistic Z = Σ_{i∈D} ((N_i − E_i)² − N_i)/E_i
– tight control of false positives: want to be able to both assert and reject the null hypothesis
– accommodates sublinear sample size
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Other Distances (beyond ℓ1)
• So far focused on the ℓ1 (a.k.a. total variation) distance
• Given sample access to p, w/ prob ≥ 1 − δ distinguish: p ∈ 𝒫 vs ℓ1(p, 𝒫) > ε
• Stronger distances?
– [Acharya-D-Kamath]: results are actually shown for χ², using χ²(p, 𝒫) ≥ ℓ1(p, 𝒫)²
– Should also extend to KL, using KL(p, 𝒫) ≥ ½·ℓ1(p, 𝒫)²
• Weaker distances?
– ℓ2 is easy to test [Goldreich-Ron], but makes less sense, e.g.:
p = (2/m, 2/m, 2/m, …, 2/m, 0, 0, 0, …, 0)
q = (0, 0, 0, …, 0, 2/m, 2/m, 2/m, …, 2/m)
– ℓ1 distance = 2 (maximal), while ℓ2 distance = 2/√m → 0 (see the check below)
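A quick numeric check of this example (my snippet):

```python
# Disjoint-support distributions: maximal l1 distance, vanishing l2 distance.
import numpy as np

m = 1_000_000
p = np.concatenate([np.full(m // 2, 2 / m), np.zeros(m // 2)])
q = np.concatenate([np.zeros(m // 2), np.full(m // 2, 2 / m)])

print("l1 =", np.abs(p - q).sum())            # 2, as large as possible
print("l2 =", np.sqrt(((p - q) ** 2).sum()))  # 2/sqrt(m) -> 0
```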
Tolerance
• So far, focused on the non-tolerant version:
– Given a set of distributions 𝒫 and sample access to p
– Distinguish: p ∈ 𝒫 vs dist(p, 𝒫) > ε
• Tolerant version:
– Distinguish: dist(p, 𝒫) < ε/2 vs dist(p, 𝒫) > ε
– [Valiant-Valiant'10]: Ω(|D|/log|D|) samples are needed
• Tolerant version in (χ², ℓ1):
– Distinguish: χ²(p, 𝒫) < ε²/2 vs ℓ1²(p, 𝒫) > ε²
– [w/ Acharya, Kamath'15]: O(√|D|/ε²) samples suffice
– (Different ratios between the two thresholds only change the constants in the O(·) notation.)
Goodness of Fit
Our goodness-of-fit test was given an explicit distribution q and sample access to a distribution p, and was asked to test p = q vs dist(p, q) > ε.
Sometimes both distributions are unknown, e.g.:
[Figure: transactions of 20-30 yr olds vs transactions of 30-40 yr olds]
Same or different?
Goodness of Fit w/ two unknowns
[Diagram: p → i.i.d. samples → Test ← i.i.d. samples ← q → Pass/Fail]
Given sample access to two unknown distributions p, q:
Distinguish p = q vs dist(p, q) > ε
Goodness of Fit w/ two unknowns
• [Batu-Fortnow-Rubinfeld-Smith-White], [P. Valiant], …
• [Chan-Diakonikolas-Valiant-Valiant]: Tight upper and lower bound of Θ(max{|D|^{2/3}/ε^{4/3}, √|D|/ε²}).
• Why different (from the known-q case)?
– Collision statistics are all that matter
– Collisions on "heavy" elements can hide the collision statistics of the rest of the domain
– Construct pairs of distributions where the heavy elements are identical, but the "light" elements are either identical or very different
(A sketch of a closeness statistic follows.)
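For concreteness, here is a hedged sketch of a two-unknown-samples statistic in the spirit of [Chan-Diakonikolas-Valiant-Valiant]; I am quoting the statistic from memory, and the demo below only eyeballs the two values rather than applying a calibrated threshold:

```python
# Closeness statistic: Z = sum_i ((X_i - Y_i)^2 - X_i - Y_i) / (X_i + Y_i).
# Under Poissonized sampling with equal expected sizes, E[Z] = 0 iff p = q.
import numpy as np

def closeness_statistic(xs, ys, domain_size):
    X = np.bincount(xs, minlength=domain_size).astype(float)
    Y = np.bincount(ys, minlength=domain_size).astype(float)
    support = X + Y > 0
    X, Y = X[support], Y[support]
    return float((((X - Y) ** 2 - X - Y) / (X + Y)).sum())

rng = np.random.default_rng(4)
D, m = 10_000, 40_000
p = rng.dirichlet(np.ones(D))        # two unknown distributions
q = rng.dirichlet(np.ones(D))
xs = rng.choice(D, size=rng.poisson(m), p=p)
print("p vs p:", closeness_statistic(xs, rng.choice(D, size=rng.poisson(m), p=p), D))
print("p vs q:", closeness_statistic(xs, rng.choice(D, size=rng.poisson(m), p=q), D))
```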
Continuous Distributions
• What can we say about continuous distributions?
– without extra assumptions such as smoothness of the PDF or parametric modeling, we cannot stick to hard distances (ℓ1, χ², KL)
• Instead of restricting p, 𝒫, let us switch distances
• Can extend results if we switch to the Kolmogorov distance
– recall: d_TV(p, q) = sup_{events ℰ} |p(ℰ) − q(ℰ)|
– in contrast: d_K(p, q) = sup over "rectangles" ℰ = (−∞, x_1] × ⋯ × (−∞, x_d] of |p(ℰ) − q(ℰ)|
• Now want to distinguish: p ∈ 𝒫 vs d_K(p, 𝒫) > ε
• Claim: Tolerant testing in Kolmogorov distance of any distribution property (continuous or discrete) of d-dimensional distributions can be performed from O(d/ε²) samples.
• Importantly: the Kolmogorov distance allows graceful scaling with the dimensionality of the data
Dvoretzky–Kiefer–Wolfowitz inequality
• Suppose X_1, …, X_n are i.i.d. samples from a (single-dimensional) F, and let F_n be the resulting empirical CDF, namely
F_n(x) = (1/n)·Σ_{i=1}^n 1{X_i ≤ x}.
Then: Pr[d_K(F, F_n) > ε] ≤ 2·e^{−2nε²}, ∀ε > 0.
• i.e. O(1/ε²) samples suffice to learn any single-dimensional dist'n to within ε in Kolmogorov distance (see the sketch below).
• VC inequality ⟹ the same is true for d-dimensional distributions when #samples is at least O(d/ε²)
• After learning in Kolmogorov distance, can tolerantly test any property.
• Runtime under investigation.
– trouble: computing/approximating the Kolmogorov distance of two high-dimensional distributions is generally a hard computational question.
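A short sketch of the DKW bound in action (my code; the sup is approximated on a grid, and the unknown F is taken to be a standard normal for the demo):

```python
# DKW in action: O(1/eps^2) samples pin down the CDF to within eps
# in Kolmogorov distance.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
eps, delta = 0.05, 0.01
n = int(np.log(2 / delta) / (2 * eps**2))   # from Pr[d_K > eps] <= 2 exp(-2 n eps^2)

samples = rng.normal(size=n)                # "unknown" F = standard normal
grid = np.linspace(-4, 4, 2001)
emp_cdf = np.searchsorted(np.sort(samples), grid, side="right") / n
true_cdf = norm.cdf(grid)

print(f"n = {n}, sup |F_n - F| ~ {np.abs(emp_cdf - true_cdf).max():.4f} "
      f"(target eps = {eps})")
```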
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Testing in High-Dimensions
• Already talked about testing high-dimensional distributions in Kolmogorov distance.
– Sample complexity is O(d/ε²)
• Next focus: discrete distributions, stronger distances
High-Dimensional Discrete Distnβs
• Consider a source generating n-bit strings ∈ {0,1}^n (e.g. 400-bit images):
– 0011010101 (sample 1)
– 0101001110 (sample 2)
– 0011110100 (sample 3)
– …
• Are the bits/pixels independent?
– Our algorithms require Θ(2^{n/2}/ε²) samples
• Is the source generating graphs over n nodes Erdős-Rényi G(n, 1/2)?
– Our algorithms require Θ(2^{(n choose 2)/2}/ε²) samples
• The exponential dependence on n is unsettling, but necessary
– The lower bound exploits the high possible correlation among the bits
• Nature is not adversarial
– Often high-dimensional systems have structure, e.g. Markov random fields (MRFs), graphical models (Bayes nets), etc.
• Testing high-dimensional distributions with combinatorial structure?
High-Dimensional Discrete Dist'ns (cont'd)
[Same slide as above, overlaid with the answer:]
• [w/ Dikkala, Kamath'16]: If the unknown p is known to be an Ising model, then poly(n, 1/ε) samples suffice to test independence and goodness of fit. (Extends to MRFs.)
Ising Model
• Statistical physics, computer vision, neuroscience, social science
• Ising model:
– Probability distribution defined in terms of a graph G = (V, E), edge potentials θ_e, node potentials θ_v
– State space {±1}^V:
p_θ(x) ∝ exp( Σ_{e=(u,v)∈E} θ_e·x_u·x_v + Σ_{v∈V} θ_v·x_v )
– High |θ_e|'s ⟹ strongly (anti-)correlated spins
[Figure: Ising model, strong vs weak ties (θ_v = 0); edge strengths θ_e = 1, 0.5, 0.25, 0.125, 0; "high temperature regime" vs "low temperature regime"]
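To make the model concrete, here is a minimal Glauber-dynamics sampler (my sketch; the burn-in of 50·n steps is an arbitrary choice, not a proven mixing time):

```python
# Glauber dynamics for the Ising model (theta_v = 0 case): pick a node,
# resample its spin from the conditional distribution given the rest.
import numpy as np

def glauber_sample(theta, n_steps, rng):
    """theta: symmetric (n x n) matrix of edge potentials, zero diagonal."""
    n = theta.shape[0]
    x = rng.choice([-1, 1], size=n)
    for _ in range(n_steps):
        v = rng.integers(n)
        local_field = theta[v] @ x            # theta[v, v] = 0, so x_v drops out
        p_plus = 1 / (1 + np.exp(-2 * local_field))  # Pr[x_v = +1 | rest]
        x[v] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(6)
n, beta = 20, 0.4
theta = np.zeros((n, n))
theta[0, 1] = theta[1, 0] = beta   # one strong tie on an otherwise empty graph
samples = np.array([glauber_sample(theta, 50 * n, rng) for _ in range(2000)])
print("E[X_0 X_1] ~", (samples[:, 0] * samples[:, 1]).mean())  # ~tanh(0.4)
print("E[X_0 X_2] ~", (samples[:, 0] * samples[:, 2]).mean())  # ~0
```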
Testing Ising Models
p_θ(x) ∝ exp( Σ_{e=(u,v)∈E} θ_e·x_u·x_v + Σ_{v∈V} θ_v·x_v )
• Given: sample access to an Ising model p_θ over G = (V, E) w/ n = |V| nodes, m = |E| edges
– θ unknown, graph G unknown
– p_θ supported on {±1}^V
• Goal: distinguish p_θ ∈ ℐ vs ℓ1(p_θ, ℐ) > ε, where ℐ denotes the product ("independent") distributions over {±1}^V
• [w/ Dikkala, Kamath'16]: poly(n, 1/ε) samples suffice
• Warmup:
– SKL(p_θ, p_θ′) = KL(p_θ, p_θ′) + KL(p_θ′, p_θ)
  = Σ_{e=(u,v)∈E} (θ_e − θ′_e)·(E_θ[X_u X_v] − E_θ′[X_u X_v]) + Σ_{v∈V} (θ_v − θ′_v)·(E_θ[X_v] − E_θ′[X_v])
– SKL(p_θ, p_θ′) ≥ ℓ1(p_θ, p_θ′)²
Testing Ising Models (cont'd)
• Same setup; measure the distance in SKL instead: distinguish p_θ ∈ ℐ vs SKL(p_θ, ℐ) > ε (by the warmup, SKL upper-bounds ℓ1²)
• Focus from now on: θ_v = 0, so the candidate null is the uniform distribution U over {±1}^V
Testing Ising Models: a localizing test (focus: θ_v = 0)
p_θ(x) ∝ exp( Σ_{e=(u,v)∈E} θ_e·x_u·x_v )
• Given: sample access to an Ising model p_θ over G = (V, E) w/ n = |V| nodes, m = |E| edges
– θ unknown, graph G unknown; p_θ supported on {±1}^V
• Goal: distinguish p_θ = U_{{±1}^V} vs SKL(p_θ, U) > ε
• From the warmup formula:
SKL(p_θ, U) > ε ⟹ Σ_{E∋e=(u,v)} θ_e·E_θ[X_u X_v] > ε ⟹ ∃(u,v): E_θ[X_u X_v] > ε/(m·θ_max)
• p_θ = U ⟹ ∀(u,v): E_θ[X_u X_v] = 0
• Hence, can distinguish which is the case from O(m²·θ_max²/ε²) samples, by estimating all pairwise correlations.
• This localizes the departure from uniformity (independence) on some edge (see the sketch below). Cheaper non-localizing test?
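A sketch of this localizing test (my code; the threshold tau and the planted correlation are arbitrary demo values):

```python
# Localize a departure from uniformity: estimate all pairwise correlations
# and flag an edge whose empirical |E[X_u X_v]| is too large.
import numpy as np

def localize_departure(samples, tau):
    """samples: (N, n) matrix of +/-1 spins.  Returns a pair (u, v) with
    empirical |E[X_u X_v]| > tau, or None if everything looks uniform."""
    N, n = samples.shape
    corr = samples.T @ samples / N          # corr[u, v] ~ E[X_u X_v]
    np.fill_diagonal(corr, 0)
    u, v = np.unravel_index(np.abs(corr).argmax(), corr.shape)
    return (u, v) if abs(corr[u, v]) > tau else None

rng = np.random.default_rng(7)
N, n = 5000, 30
uniform = rng.choice([-1, 1], size=(N, n))
print("uniform:", localize_departure(uniform, tau=0.1))    # None (w.h.p.)

coupled = uniform.copy()                                   # tie nodes 0 and 1
coupled[:, 1] = np.where(rng.random(N) < 0.9, coupled[:, 0], coupled[:, 1])
print("coupled:", localize_departure(coupled, tau=0.1))    # (0, 1)
```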
Testing Ising Models: a global statistic (focus: θ_v = 0)
• Claim: By expending some samples, can identify a distinguishing statistic of the form Z = Σ_{u,v} c_{uv}·X_u·X_v, where c_{uv} ∈ {±1} for all u, v.
• Issue: can't bound Var[Z] intelligently, as the variables aren't pairwise independent.
• If θ_e = 0, ∀e, then Var[Z] = O(n²)
• Otherwise, the best one can say is the trivial Var[Z] = O(n⁴)
– and this is, in fact, tight in the low temperature regime:
 • consider two disjoint cliques with super-strong ties (θ_e = +∞ within each clique)
 • suppose c_{u,v} = 1 for all u, v
 • then Z dances around its mean by Ω(n²)
• How about the high temperature regime?
Ising Model: Strong vs weak ties
[Figure: graphs with edge strengths θ_e = 1, 0.5, 0.25, 0.125, 0; node potentials θ_v = 0]
• "low temperature regime": exponential mixing of the Glauber dynamics
• "high temperature regime" (θ_e ≤ 1/d_max): O(n·log n) mixing of the Glauber dynamics
Testing Ising Models: bounding the variance (focus: θ_v = 0)
• Setup as above: distinguish p_θ = U_{{±1}^V} vs SKL(p_θ, U) > ε using Z = Σ_{u,v} c_{uv}·X_u·X_v
• If θ_e = 0, ∀e: Var[Z] = O(n²)
• Low temperature: Var[Z] = O(n⁴), and this is tight
• High temperature: Var[Z] = O(n³/d_max) ⟹ O(n²) for dense graphs
– proof via exchangeable pairs [Stein, …, Chatterjee 2006]
Exchangeable Pairs
• Goal: Given f(·) and a random vector X ∼ D, want to bound the moments of f(X), or prove concentration of f(X) about its mean
• Approach:
– Define a pair of random vectors (X, X′) such that:
 • (X, X′) has the same distribution as (X′, X) (exchangeability)
 • the marginal distributions are D (faithfulness)
– Find an anti-symmetric function F(x, y) (i.e. F(x, y) = −F(y, x), ∀x, y) such that E[F(X, X′)|X] = f(X) − E[f(X)]
• Claims:
1. Var[f(X)] = ½·E[(f(X) − f(X′))·F(X, X′)]
2. Let v(X) = ½·E[ |(f(X) − f(X′))·F(X, X′)| | X ].
 If v(X) ≤ C a.s., then Pr[|f(X) − E[f(X)]| ≥ t] ≤ 2·e^{−t²/2C}.
Silly Example
• f(X) = Σ_{i=1}^n X_i, where X = (X_1, …, X_n) ∼ D_1 × ⋯ × D_n
– Suppose for all i: E[X_i] = μ_i and X_i ∈ [a_i, b_i] a.s.
• Goal: prove concentration of f(X) about its mean
• Define the exchangeable pair (X, X′) as follows:
– sample X ∼ D_1 × ⋯ × D_n
– pick i u.a.r.; sample X_i′ ∼ D_i; and set X_j′ = X_j, ∀j ≠ i
• Choose the anti-symmetric function F(X, X′) = n·(f(X) − f(X′))
– Then E[F(X, X′)|X] = n·(f(X) − E[f(X′)|X]) = Σ_i X_i − Σ_i μ_i = f(X) − E[f(X)] ✓
• Bounding v(X) = ½·E[ |(f(X) − f(X′))·F(X, X′)| | X ] = (n/2)·E[(f(X) − f(X′))²|X]:
– v(X) = (n/2)·(1/n)·Σ_i E[(X_i − X_i′)²|X] ≤ ½·Σ_i (b_i − a_i)²
• Pr[|f(X) − E[f(X)]| ≥ t] ≤ 2·e^{−t²/Σ_i (b_i − a_i)²}, recovering a Hoeffding-type inequality (see the check below).
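A Monte Carlo sanity check (my addition) of Claim 1 on this example, with D_i = Uniform[0, 1] so that Var[f(X)] = n/12:

```python
# Verify Var[f(X)] = 1/2 E[(f(X) - f(X')) F(X, X')] with F = n (f(x) - f(x')).
import numpy as np

rng = np.random.default_rng(8)
n, trials = 10, 200_000
X = rng.random((trials, n))                 # D_i = Uniform[0, 1]

i = rng.integers(n, size=trials)            # exchangeable pair: resample one
Xp = X.copy()                               # coordinate chosen u.a.r.
Xp[np.arange(trials), i] = rng.random(trials)

f, fp = X.sum(axis=1), Xp.sum(axis=1)
print("Var[f(X)]             ~", f.var())                      # n/12 ~ 0.8333
print("1/2 E[(f-f') F(X,X')] ~", 0.5 * ((f - fp) * n * (f - fp)).mean())
```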
Interesting Example: Ising
• X ∼ p_θ(x) ∝ exp( Σ_{e=(u,v)∈E} θ_e·x_u·x_v ); f_c(x) = Σ_{u,v} c_{uv}·x_u·x_v
• How to define the exchangeable pair?
– Natural approach: sample X ∼ p_θ
– Do one step of the Glauber dynamics from X to find X′
 • i.e. pick a random node v and resample X_v′ from the marginal of p_θ at v, conditioning on the state of all other nodes being X_{−v}
• Harder question: find an anti-symmetric F(·,·) s.t. E[F(X, X′)|X] = f_c(X) − E[f_c(X)]
– Approach that works for any f:
 • F(x, x′) = Σ_{t=0}^∞ E[ f(X_t) − f(X_t′) | X_0 = x, X_0′ = x′ ]
 • where (X_t)_{t≥0}, (X_t′)_{t≥0} are two (potentially coupled) executions of the Glauber dynamics starting from states x and x′ respectively; this requires a good coupling
• Challenging question: bound
Var[f_c(X)] = ½·E[(f_c(X) − f_c(X′))·F(X, X′)]
 = ½·Σ_{t=0}^∞ E[ (f_c(X) − f_c(X′))·E[f_c(X_t) − f_c(X_t′) | X_0, X_0′] ]
– Need to show the function contracts as the Glauber dynamics unravels
Showing Contraction
Generous coupling: choose the same node in both chains, but update independently.
– This differs from the "greedy coupling" typically used, where the same node is chosen and the update is coordinated to maximize the probability of the same update.
Lemma 1: [equation on slide]
Lemma 2: [equation on slide]
The Menu
Motivation
Problem Formulation
Uniformity Testing, Goodness of Fit
Testing Properties of Distributions
Discussion/Road Ahead
Testing in High Dimensions
Future Directions
Future Directions
Markov Chain Testing
Example: n = 52 cards; ≥ 7 riffle shuffles* are needed [Bayer-Diaconis'92]:
– d_TV(Uni, 6 shuffles) > 0.6
– d_TV(Uni, 7 shuffles) ≈ 0.33
*riffle shuffle = Gilbert-Shannon-Reeds (GSR) model for a distribution on card permutations (see the sketch below).
Empirical fact: different real-world shuffling styles give different Markov chains! [Diaconis'03]
[Ongoing work with Dikkala, Gravin]
• Question: how close is a real shuffle to the GSR distribution?
• Given: sample access to shuffles ∼ F_riffle (n² variables)
• Goal: distinguish: F_riffle is GSR vs ℓ1(F_riffle, GSR) > ε
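For reference, a minimal implementation (mine) of one GSR riffle shuffle, following the standard description: binomial cut, then drop cards with probability proportional to the packet sizes:

```python
# Gilbert-Shannon-Reeds riffle shuffle.
import numpy as np

def gsr_riffle(deck, rng):
    """One GSR riffle shuffle of the deck."""
    n = len(deck)
    cut = rng.binomial(n, 0.5)                      # binomial cut point
    left, right = list(deck[:cut]), list(deck[cut:])
    out = []
    while left and right:
        # drop the next card from a packet w.p. proportional to its size
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return np.array(out + left + right)

rng = np.random.default_rng(9)
deck = np.arange(52)
for _ in range(7):                # >= 7 riffles: close to uniform
    deck = gsr_riffle(deck, rng)
print(deck)
```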
Markov Chain Testing
Question: test M = M* vs dist(M, M*) > ε. Which distance?
• Let P, Q be the transition matrices of the chains M, M*
• Object of interest: word_v^k(P) = (v ≡ v_1 → v_2 → ⋯ → v_k), the length-k trajectory of the chain P started at v
• Pertinent question: asymptotics of d_TV(word_v^k(P), word_v^k(Q)) as k → ∞?
• Easier to quantify: 1 − d_H²(word_v^k(P), word_v^k(Q)) decays like ρ(√P ∘ √Q)^k, for all v, where √· is entrywise and ρ(·) is the spectral radius
• So the proposed notion of distance: dist(M, M*) = 1 − ρ(√P ∘ √Q)
• Results: testing symmetric M, M* with O(n/ε² + cover time) samples
[Ongoing work with Dikkala, Gravin]
Testing Combinatorial Structure
Efficiently testing combinatorial structure, e.g.:
• Is the phylogenetic tree assumption true? Sapiens-Neanderthal early interbreeding [Slatkin et al.'13]
• Is a graphical model a tree? [ongoing work with Bresler, Acharya]
Testing from a Single Sample
• Given one social network, one brain, etc., how can we test the validity of a certain generative model?
• Get many samples from one sample?
[Venn diagram: statistics, machine learning, information theory, algorithms]
Thank You!