Here are the slides.

Transcript

Influence and Noise
Gil Kalai
Hebrew University of Jerusalem,
Yale University,
Microsoft R&D, Israel
ICS2011, Beijing, January 2011
Part I: Influence
Cause
When does event A cause event B?
Example (Kira Radinsky’s automatic system for
making deductions based on internet
searches):
Earthquake causes Tsunami
Cabbage grew causes Linux release
Causality
When does event A cause event B?
Central problem in philosophy, law,
economics, physics, statistics, CS…
(Examples are often somber)
Two people try to assassinate a third person who
plans a trip to the desert. One puts poison in
his jar; the other empties it.
A person throws a baby from a tall building,
another is waiting with a sharp sword.
Influence
The word “influence” (dating back, according to
Merriam-Webster dictionary, to the 14th
century) is close to the word “fluid”. The original
definition of influence is: “an ethereal fluid held
to flow from the stars and to affect the actions
of humans." The modern meaning (according to
Wiktionary) is: "The power to affect, control or
manipulate something or someone."
Influence
What is the influence an event
A has on another event B?
It can be regarded as an approach
to causality and also as a
generalization of it.
Influence
(Of a variable on a
function of many variables) The
"amount" by which changing the
value of a variable will change
the value of the function.
Boolean Functions
We consider a BOOLEAN FUNCTION
f : {-1,1}^n → {-1,1}
f(x_1, x_2, ..., x_n)
It is convenient to regard {-1,1}^n as a
probability space with the uniform
probability distribution.
Influence
We consider a BOOLEAN FUNCTION
f : {-1,1}^n → {-1,1}
The influence of the k-th variable x_k on f,
denoted by I_k(f), is the probability that
flipping the value of the k-th variable will
flip the value of f.
I(f) is the sum of all individual influences.
Examples
1) Dictatorship: f(x_1, x_2, ..., x_n) = x_1
I_1(f) = 1 and I_k(f) = 0 for k > 1.
2) Majority: f(x_1, x_2, ..., x_n) = 1
iff x_1 + x_2 + ... + x_n > 0
I_k(f) behaves like n^{-1/2} for every k.
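As an illustration (not part of the slides; the sampling scheme and parameters are my own), the influence I_k(f) can be estimated by straightforward Monte Carlo sampling, and the estimates match the dictatorship and majority examples above:

    import random

    def influence(f, n, k, samples=100_000):
        """Estimate I_k(f) = Pr[flipping coordinate k flips f]
        under the uniform distribution on {-1,1}^n."""
        count = 0
        for _ in range(samples):
            x = [random.choice((-1, 1)) for _ in range(n)]
            y = x[:]
            y[k] = -y[k]                 # flip the k-th coordinate
            if f(x) != f(y):
                count += 1
        return count / samples

    dictatorship = lambda x: x[0]
    majority = lambda x: 1 if sum(x) > 0 else -1   # take n odd to avoid ties

    n = 101
    print(influence(dictatorship, n, 0))   # ~1.0
    print(influence(dictatorship, n, 5))   # ~0.0
    print(influence(majority, n, 5))       # ~0.08, of order n^{-1/2}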
Critical Percolation
Examples (cont.)
3) The crossing event for percolations
For percolation, every hexagon corresponds to a
variable: x_i = -1 if the hexagon is white and x_i = 1 if
it is grey. f = 1 if there is a left-to-right grey
crossing.
I_k(f) behaves like n^{-3/8} for all but a few k.
Examples (cont.)
4) Recursive majority of threes
I_k(f) behaves like n^{-log_3 2} for every k.
5) Ben-Or Linial TRIBES example
Divide the variables into tribes of size roughly log n - log log n.
f = 1 iff for some tribe all variables equal 1.
I_k(f) behaves like K log n / n for some constant K.
KKL theorem: There always
exists an influential variable
Theorem (Kahn, Kalai, Linial 1988):
Let f be a Boolean function and suppose that
Prob(f=1) = s.
Then there is a variable k such that
I_k(f) ≥ C s(1-s) log n / n.
This result was conjectured by Ben-Or and Linial.
Fourier
Given a Boolean function
f : {-1,1}^n → {-1,1},
the Fourier expansion of f is
simply writing f(x) as a sum of
multilinear (square free)
monomials.
Fourier Spectrum
For every set S of variables we have
the associated Fourier coefficient.
The sum of squares of Fourier
coefficients is 1.
This defines a probability
distribution called the “Fourier
spectrum” (or “Fourier distribution”).
The probability that k belongs to S,
when S is distributed according to
the Fourier spectrum is the influence
of variable k on f.
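A brute-force sketch (my own, feasible only for small n) makes this concrete: compute every coefficient hat_f(S) = E[f(x) · prod_{i in S} x_i], check that the squares sum to 1 (Parseval), and recover the influence of a variable as the spectrum mass on sets containing it:

    from itertools import product

    def fourier_coefficients(f, n):
        """All Fourier coefficients of f: {-1,1}^n -> {-1,1},
        indexed by subsets S encoded as bitmasks."""
        points = list(product((-1, 1), repeat=n))
        coeffs = {}
        for S in range(1 << n):
            total = 0.0
            for x in points:
                chi = 1
                for i in range(n):
                    if S >> i & 1:
                        chi *= x[i]          # chi_S(x) = prod_{i in S} x_i
                total += f(x) * chi
            coeffs[S] = total / len(points)  # hat_f(S) = E[f(x) chi_S(x)]
        return coeffs

    n = 5
    majority = lambda x: 1 if sum(x) > 0 else -1
    coeffs = fourier_coefficients(majority, n)

    print(sum(c * c for c in coeffs.values()))   # Parseval: 1.0
    k = 0
    print(sum(c * c for S, c in coeffs.items() if S >> k & 1))
    # = I_0(majority) = C(4,2)/2^4 = 0.375 for n = 5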
Fourier
The study of Boolean functions
based on their Fourier expansion
is fruitful.
It can be regarded as a very
special case of spectral methods
in graph theory.
Hypercontractivity
A very useful technical tool:
The ratio between p-norms of low
degree polynomials is bounded.
(Khintchine, Nelson, Bonami,
Gross, Beckner…)
A glance at further advances and open
problems
Extensions to general Bernoulli
product spaces.
Notions of influence for continuous
product spaces; larger alphabets;
graph products.
Symmetry and influence.
Understanding the influence of sets of
variables.
Power laws for influence.
Part II:
Choice, power, rationality
and manipulation
Individual and collective
rationality and judgments
Boolean functions can model how individual
preferences between two alternatives aggregate.
We can consider aggregation of individual
preferences between m alternatives.
(Social welfare functions.)
We can consider aggregation of judgments on r
different binary questions, when there are certain
consistency requirements. (Judgment aggregation.)
Elections: measures of power
Influence = Banzhaf power index.
Shapley-Shubik power index:
the integral over p of the influence of the k-th
player with respect to the Bernoulli probability
with parameter p.
(The Shapley-Shubik indices sum to 1.)
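A sketch with a hypothetical four-player weighted voting game (the weights and quota are my own example): the Banzhaf index is the probability of being pivotal for a uniform coalition of the others, and the Shapley-Shubik index comes from integrating the pivotality probability over the Bernoulli parameter p, where the integral of p^t (1-p)^(n-1-t) evaluates exactly to t!(n-1-t)!/n!:

    from itertools import combinations
    from math import factorial

    weights = [4, 2, 1, 1]    # hypothetical voting weights
    quota = 5                 # a coalition wins iff its total weight >= quota
    n = len(weights)

    def wins(coalition):
        return sum(weights[i] for i in coalition) >= quota

    for k in range(n):
        others = [i for i in range(n) if i != k]
        banzhaf = 0
        shapley = 0.0
        for t in range(n):
            for T in combinations(others, t):
                if wins(set(T) | {k}) and not wins(set(T)):  # k is pivotal
                    banzhaf += 1
                    # Beta integral of p^t (1-p)^(n-1-t) over [0,1]
                    shapley += factorial(t) * factorial(n - 1 - t) / factorial(n)
        print(k, banzhaf / 2 ** (n - 1), shapley)
    # the Shapley-Shubik column sums to 1, as stated on the slide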
Voting Paradoxes
Condorcet's paradox
Arrow's paradox
The "doctrinal paradox"
Manipulation
A social choice function is a function from the
profile of individual order relations to the set of
alternatives.
Manipulation: reporting an incorrect preference
relation will improve the outcome.
A measure for manipulation
The Gibbard-Satterthwaite theorem asserts that for
a non-dictatorial social choice function with at least 3
possible outcomes there are preferences that
lead to manipulation.
The manipulation power (Friedgut, Kalai and
Nisan) of an individual k for a social choice
function f, denoted by M_k(f), is the probability
that x'_k is a profitable manipulation for voter k
when the profile of preferences x_1, x_2, …, x_n and
x'_k are chosen uniformly at random.
Conclusion of the
Algorithmic GT/Econ Part
The notions discussed in this lecture (measures for
influence, power, manipulation, noise sensitivity…)
may be of interest to other GT/econ models. For
example, the model of exchange economy.
Off-topic comment: Why is it rational and important to give
incentives to difficult technical work?
Added in proof: There is work supporting this thought
by Kleinberg and Oren.
Part III:
Randomness
Collective coin flipping
We need to create a random bit using
a protocol based on random bits
contributed by n processors. Some of
the processors are malicious.
A simple suggestion: Choice based on a
Boolean function where each processor
contributes a single bit.
(An often asked question: why not
choose among these bits at random?)
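A sketch of why the function matters (the adversary model here is my own simplification): a single malicious processor that sees the honest bits and then chooses its own bit can raise Pr[output = 1] by exactly half the influence of its variable, which is mild for majority but total for parity:

    import random

    def adversarial_prob(f, n, bad=0, samples=200_000):
        """Pr[f = 1] when processor `bad` picks its bit last,
        trying to force the outcome to 1."""
        ones = 0
        for _ in range(samples):
            x = [random.choice((-1, 1)) for _ in range(n)]
            x[bad] = 1
            hi = f(x)
            x[bad] = -1
            lo = f(x)
            if hi == 1 or lo == 1:   # adversary succeeds if either value works
                ones += 1
        return ones / samples

    n = 101
    majority = lambda x: 1 if sum(x) > 0 else -1
    parity = lambda x: 1 if x.count(-1) % 2 == 0 else -1

    print(adversarial_prob(majority, n))  # ~0.54: bias of about I_k/2
    print(adversarial_prob(parity, n))    # 1.0: one processor decides parity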
Randomness as a
computational resource
“So, yes, I know that the theory folks consider
derandomization an open problem, but from my
perspective, it is a solved problem for all practical
purposes.” AnonCSProf on Shtetl-Optimized
What is Randomness?
What is randomness and what is probability
are fundamental questions in many areas.
Computational complexity offers a deep
understanding of randomness. Its asymptotic
nature makes it of little use in statistics.
Q: What allows much simpler answers in
statistics?
A: The interactive proof system known as
"statistical hypothesis testing".
What is the source of
randomness?
Is human uncertainty the only source of randomness?
What is the explanation of the apparent randomness of
high-level phenomena in nature? For example, the
distribution of females vs. males in a population (I am
referring to randomness in terms of unpredictability,
not in the sense of the distribution necessarily being
even).
1. Is it accepted that these phenomena are not really
random, meaning that given enough information one
could predict them? If so, isn't that the case for all
random phenomena?
2. If there is true randomness and the outcome cannot
be predicted, what is the origin of that randomness? (Is
it a result of the randomness in the micro world -
quantum phenomena, etc.?)
Sharp threshold phenomena:
Determinism from
randomness
Law of large numbers: Large stochastic
systems behave deterministically.
Sharp threshold phenomena: Choose
the value of the variables to be 1 with
probability p, independently. As p increases,
the probability that f = 1 rapidly changes from 0 to 1.
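A sketch (the parameters are my own) showing the sharp threshold for majority: Pr_p[f = 1] as a function of p jumps from near 0 to near 1 around p = 1/2, and the jump gets steeper as n grows:

    import random

    def prob_one(f, n, p, samples=20_000):
        """Estimate Pr[f = 1] when each variable is 1 with probability p."""
        hits = 0
        for _ in range(samples):
            x = [1 if random.random() < p else -1 for _ in range(n)]
            hits += f(x) == 1
        return hits / samples

    majority = lambda x: 1 if sum(x) > 0 else -1
    for n in (11, 101, 1001):
        row = [round(prob_one(majority, n, p), 2)
               for p in (0.40, 0.45, 0.50, 0.55, 0.60)]
        print(n, row)   # the 0-to-1 jump around p = 1/2 sharpens with n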
Sharp threshold phenomena:
Influence is a sort of derivative. Large total
influence corresponds to a small threshold
interval.
Theorem (Friedgut, Kalai 1996): Symmetry
implies sharp threshold.
Theorem (Kalai 2004): Sharp threshold is
equivalent to diminishing maximum Shapley-Shubik
power index.
The economics term: complete aggregation of
information.
Influence without
independence
(Haggstrom, Kalai, Mossel; Graham, Grimmett.)
Influence version A: The probability
that changing the value of a variable
will change the value of the function.
Influence version B: The
(normalized) correlation between the
value of the variable and the value of
the function.
Part IV:
Noise
Noise Sensitivity
We consider a BOOLEAN FUNCTION
f : {-1,1}^n → {-1,1}
f(x_1, x_2, ..., x_n)
Given x_1, x_2, ..., x_n we define y_1, y_2, ..., y_n as follows:
y_i = x_i with probability 1-t
y_i = -x_i with probability t
(independently for each i).
Noise Sensitivity
Let C(f;t) be the correlation between
f(x_1, x_2, ..., x_n) and f(y_1, y_2, ..., y_n).
A sequence of Boolean functions (f_n) is
noise-sensitive if for every t > 0,
C(f_n; t) tends to zero with n.
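A sketch (test functions and parameters my own) estimating C(f;t): parity has correlation exactly (1-2t)^n, so it is about as noise-sensitive as possible, while majority keeps substantial correlation at small t:

    import random

    def noise_correlation(f, n, t, samples=100_000):
        """Estimate C(f;t) = E[f(x) f(y)], where y flips each bit
        of x independently with probability t."""
        total = 0
        for _ in range(samples):
            x = [random.choice((-1, 1)) for _ in range(n)]
            y = [-xi if random.random() < t else xi for xi in x]
            total += f(x) * f(y)
        return total / samples

    n, t = 101, 0.05
    majority = lambda x: 1 if sum(x) > 0 else -1
    parity = lambda x: 1 if x.count(-1) % 2 == 0 else -1

    print(noise_correlation(majority, n, t))  # ~0.71, bounded away from 0
    print(noise_correlation(parity, n, t))    # (1-2t)^n = 0.9^101 ≈ 0.00002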
Noise Stability
Let's fix a small s = 0.0001 (say).
A Boolean function is noise-stable to
noise level t if the probability that
f(x_1, x_2, ..., x_n) is different from
f(y_1, y_2, ..., y_n) is smaller than s.
Noise sensitivity and non-classical stochastic
processes; black noise
Closely related notions to "noise sensitivity" were
studied by Tsirelson and Vershik. In their
terminology "noise sensitivity" translates to "non-Fock
processes", "black noise", and "non-classical
stochastic processes". Their motivation is closer to
mathematical quantum physics.
BKS theorem
Theorem (Benjamini, Kalai, Schramm 1999):
Monotone balanced Boolean functions are noise
sensitive unless they have substantial
correlation with some weighted majority
functions.
Percolation is Noise
sensitive
Corollary (BKS 1999): The crossing event
for the critical planar percolation model is
noise-sensitive.
Theorem (Schramm and Smirnov, 2010):
Percolation is a 2-dimensional black noise.
Percolation is Noise sensitive
Imagine two separate pictures of n by n
hexagonal models for percolation. A hexagon
is grey with probability ½.
If the grey and white hexagons are
independent in the two pictures, the
probability for crossing in both is ¼.
If for each hexagon the correlation between
its colors in the two pictures is 0.99, the
probability for crossing in both pictures is still
very close to ¼ as n grows! If you put one
drawing on top of the other you will hardly
notice a difference!
Other cases of noise sensitivity
First Passage Percolation (Benjamini, Kalai, Schramm)
The recursive Majority on threes example
by Ben-Or and Linial (BKS)
Eigenvalues of random Gaussian matrices (essentially
follows from the work of Tracy-Widom); here we leave
the Boolean setting.
Examples related to random walks (requires replacing the
discrete cube by trees), and more...
Majority is noise stable
Sheppard's Theorem (1899):
Suppose that there is a probability t
of a mistake in counting each vote.
The probability that the outcome of
the election is reversed is:
arccos(1-2t)/π + o(1)
When t is small this behaves like t^{1/2}.
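A quick empirical check (parameters my own) of Sheppard's formula: miscount each of n votes independently with probability t and compare the observed frequency of reversed majority outcomes with arccos(1-2t)/π:

    import math
    import random

    def majority(v):
        return 1 if sum(v) > 0 else -1

    def reversal_prob(n, t, samples=100_000):
        """Fraction of runs in which miscounting flips the election."""
        flips = 0
        for _ in range(samples):
            x = [random.choice((-1, 1)) for _ in range(n)]
            y = [-xi if random.random() < t else xi for xi in x]
            flips += majority(x) != majority(y)
        return flips / samples

    n, t = 1001, 0.01
    print(reversal_prob(n, t))               # empirical frequency
    print(math.acos(1 - 2 * t) / math.pi)    # Sheppard's value ≈ 0.0637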
Majority is noise stable
(cont.)
Weighted majority functions are also
noise stable (BKS, Peres)
Is there a more stable voting rule?
Sure! Dictatorship.
Majority is stablest!
Theorem (Mossel, O'Donnell and
Oleszkiewicz 2005):
Let (f_n) be a sequence of Boolean functions
with diminishing maximum influence, i.e.,
lim_{n→∞} max_k I_k(f_n) = 0.
Then the probability that the outcome of the
election is reversed, when every vote is
flipped with probability t, is at least
(1-o(1)) arccos(1-2t)/π.
Majority is stablest:
Two applications
1. The probabilities of cyclic outcomes for
voting rules with diminishing influences
are minimized for the majority voting
rule.
2. Improving the Goemans-Williamson
0.878567 approximation algorithm is
hard, unique-games-hard.
Is the universe noise
sensitive?
Are the basic models of high energy
physics noise stable?
If this is indeed the case, does it
reflect some law of physics?
Otherwise, will noise sensitivity allow
additional modeling power?
Part V: Computational
complexity
Are the notions discussed here
related to computational complexity?
(We already mentioned the relation to PCP; there
are some interesting connections to
randomized decision trees.)
Diversion: CC and modeling
Computational complexity and
Influence
(…Ajtai, Furst, Saxe, Sipser, Yao, Hastad, Boppana,
Linial, Mansour, Nisan…)
The total influence of Boolean functions
that can be described by depth-D, size-M
Boolean circuits is at most
(log M)^{D-1}.
Computational complexity and
Noise stability
Conjecture 1: Let f be a monotone Boolean
function described by monotone threshold
circuits of size M and depth D. Then f
is stable to (1/t)-noise where
t = (log M)^{100D}.
Conjecture 2: For some η > 0, every
balanced monotone function in TC0 has
correlation at least η with a function in
monotone TC0.
Complexity of sampling the
Fourier spectrum
Suppose that f is a Boolean
function in P. Can we approximately
sample according to its Fourier
spectrum?
This is unknown and it might be
hard.
But... it is in BQP. (Namely, it is
known to be easy for quantum
computers.)
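For small n the Fourier spectrum can of course be sampled by brute force (a sketch of my own, exponential in n; the open question on the slide is about doing this efficiently when f is in P):

    import random
    from itertools import product

    def chi(S, x):
        """chi_S(x) = prod_{i in S} x_i, with S encoded as a bitmask."""
        out = 1
        for i, xi in enumerate(x):
            if S >> i & 1:
                out *= xi
        return out

    def sample_fourier_spectrum(f, n):
        """Sample a set S with probability hat_f(S)^2 (these sum to 1)."""
        points = list(product((-1, 1), repeat=n))
        weights = []
        for S in range(1 << n):
            c = sum(f(x) * chi(S, x) for x in points) / len(points)
            weights.append(c * c)
        return random.choices(range(1 << n), weights=weights)[0]

    majority = lambda x: 1 if sum(x) > 0 else -1
    print(bin(sample_fourier_spectrum(majority, 5)))  # a subset, as a bitmask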
Computation with noise
Fault tolerant (quantum) computation.
Are quantum computers possible?
(This is a main research interest for me in recent years.)
The hope regarding FTQC:
No matter what the quantum computer computes
or simulates, nearly all of the noise will be a
mixture of states that are not codewords in the
error correcting code, but which are correctable
to states in the code.
The concern:
The process for creating a quantum error
correcting code will necessarily lead to a
mixture of the desired codeword with undesired
codewords.
Polymath3
A recent endeavor: you are most welcome to join
Details can be found on my blog
http://gilkalai.wordpress.com/2010/02/10/noisestability-and-threshold-circuits/
1) The AC0 analog
2) Positivity vs. Monotonicity
3) Natural Proof Obstructions
谢谢 (Thank you!)
תודה רבה! (Thank you very much!)
It is great to be in Beijing!