Randomization
Seminar on Communication Complexity
Carmella Kroitoru
What’s new?
Until now, Alice and Bob were all-powerful, but
deterministic.
But what if Alice and Bob could act in a
randomized fashion?
What is a 'randomized fashion'?
Just flipping a coin, for instance.
What now??
Now players "toss coins".
The coin tosses influence the messages the protocol sends.
The communication over an input (x, y) is not fixed
anymore; it is a random variable.
And so is the protocol's answer.
So, how can we define the success of the
protocol?
The conservative way - Las Vegas protocols
Success = the protocol always gives
the right f(x,y).

The liberal option – Monte Carlo protocols
Success = a high probability that the protocol
will give the right f(x,y).

Where do we get all the coins?
                          Deterministic                  Randomized
Input                     Alice gets x, Bob gets y       Alice gets x, Bob gets y
Additional information    (none)                         Alice has r(A), Bob has r(B)
Output dependence         f(x,y) depends only on x, y    f(x,y) depends on x, y, r(A), r(B)
Denote: r(I) is a random string of arbitrary length (I ∈ {A, B}).
So how does the decision tree look?
[Figure: the protocol tree before and after randomization. Internal nodes query x1 and y1; the leaves are z(1,1), z(1,0), z(0,1), z(0,0).]
But what if we get the 'wrong' r(I)?
Does it mean that f(x,y) will be wrong?
Yes!
For the same input (x, y), the protocol's answer can differ,
depending on r(A) and r(B).
So how will we get the right answer?
Through the magic of probability.
Definitions:
P – a randomized protocol.
- Zero error: for every (x, y), Pr[ P(x,y) = f(x,y) ] = 1
- ε error: for every (x, y), Pr[ P(x,y) = f(x,y) ] ≥ 1 - ε
- One sided ε error: for every (x, y),
  if f(x,y) = 0 then Pr[ P(x,y) = 0 ] = 1
  if f(x,y) = 1 then Pr[ P(x,y) = 1 ] ≥ 1 - ε
Now the output on (x, y) can vary depending on r(A) and r(B).
What about D(f)?
Is the number of bits exchanged on a given input (x, y) still fixed?
No!
So how will we measure the running time?
Running time – first method

The worst case running time of a randomized
protocol P on input (x, y) is the maximum, over all
random strings r(A), r(B), of the number of bits exchanged.

The worst case cost of P is the maximum of this
running time over all inputs (x, y).
Running time – second method

The average case running time of a randomized
protocol P on input (x, y) is the expected number of
bits exchanged, over the choice of r(A), r(B).

The average case cost of P is the maximum of this
average running time over all inputs (x, y).
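In symbols (a sketch of these definitions; the notation D^worst, D^avg below is ours, matching the DWORST / DAVG used later):

\[
D^{\mathrm{worst}}(P) \;=\; \max_{x,y}\;\max_{r_A,r_B}\; \#\{\text{bits exchanged by } P \text{ on } (x,y,r_A,r_B)\}
\]
\[
D^{\mathrm{avg}}(P) \;=\; \max_{x,y}\; \mathbb{E}_{r_A,r_B}\big[\,\#\{\text{bits exchanged by } P \text{ on } (x,y,r_A,r_B)\}\,\big]
\]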
Average?
r(A) and r(B) are chosen independently,
according to some probability distribution.
Should we consider a distribution over the inputs (x, y)?
No: there is no 'average input'; the cost is still maximized over (x, y).
Now what’s all that ‘probability’ talk?
Didn’t I pass that course already?
Let’s do a quick review..
Binomial Distribution
A binomial experiment (a sequence of Bernoulli trials)
is a statistical experiment with the following properties:
- n independent trials, y1, ..., yn.
- Each trial is a success or a failure (denoted 1/0, T/F).
- The probability of success, p, is the same on every trial.
- Denote: S = number of successes.
Example: coin toss. Success = 'Heads'.

trial     y1   y2   y3   ...   yn
result     0    1    1   ...    0

S = ∑ yi
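A tiny simulation makes S concrete (a sketch in Python; the choices n = 10 and p = 0.5 are ours, not from the slides):

    import random

    def binomial_experiment(n, p):
        """Run n independent Bernoulli trials with success probability p."""
        ys = [1 if random.random() < p else 0 for _ in range(n)]
        return ys, sum(ys)          # S = sum of the y_i's

    ys, S = binomial_experiment(n=10, p=0.5)   # e.g. 10 fair coin tosses
    print(ys, "S =", S)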
Exercise:
Given a box with red and blue balls, suppose the red
balls are at least a δ fraction of the total.
What is the probability that 1/δ draws (with replacement)
see no red ball?
At most (1 - δ)^(1/δ) ≤ 1/e, roughly 1/3.
But what if 1/3 isn't good enough? What if we want < α?
What is the probability that (1/δ)·ln(1/α) draws see no red ball?
At most (1 - δ)^((1/δ)·ln(1/α)) ≤ e^(-ln(1/α)) = α.
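A quick empirical check of these bounds (a sketch; δ = 0.1, α = 0.01 and the number of trials are arbitrary assumptions):

    import math, random

    delta, alpha = 0.1, 0.01
    k1 = math.ceil(1 / delta)                    # failure probability at most 1/e (about 1/3)
    k2 = math.ceil(math.log(1 / alpha) / delta)  # failure probability below alpha

    def sees_no_red(k, delta):
        """True if k independent draws (with replacement) miss the red balls."""
        return all(random.random() >= delta for _ in range(k))

    trials = 100_000
    for k in (k1, k2):
        misses = sum(sees_no_red(k, delta) for _ in range(trials))
        print(f"k={k}: empirical {misses/trials:.4f}, bound {(1 - delta)**k:.4f}")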
Expectation

Linearity:
  E(a) = a
  E(a·X + b) = a·E(X) + b
  E(X + Y) = E(X) + E(Y)

Non-multiplicativity:
  E(X·Y) = E(X)·E(Y) only if X and Y are independent
What is the probability that a random
variable deviates from its expectation?

We focus on sums/averages of n bounded variables:
S = ∑ yi, where each yi ∈ {0, 1}.

Note: similar bounds can be obtained for other
bounded variables.
Don't know anything about the yi's:
If the yi's are non-negative, but we know nothing else about them,
we can use Markov's inequality to bound S:
Pr[ S ≥ t·E[S] ] ≤ 1/t for every t ≥ 1.
Some intuition: no more than 1/5th of the population can have more than 5
times the average income.
Example: toss 100 coins, Pr(# of heads ≥ 60) ≤ 5/6 (E[S] = 50, t = 6/5).
So without knowing whether the yi's are bounded, or whether they are independent,
we get an upper bound.
It is not a great bound, though, and it says nothing about the lower tail.
Bounding S = ∑ yi when the yi's are independent
An exact result:
Use the cumulative binomial probability.
This refers to the probability that the binomial random variable
falls within a specified range.
Example: toss 100 coins, Pr(# of heads ≥ 60) = ?
Solution: we compute 41 individual probabilities using the
binomial formula; the sum of all these probabilities is the
answer we seek.
Denote: k = # of heads
Pr(k ≥ 60; n = 100, p = 0.5) = Pr(k = 60; 100, 0.5) +
Pr(k = 61; 100, 0.5) + . . . + Pr(k = 100; 100, 0.5)
Pr(k ≥ 60; n = 100, p = 0.5) ≈ 0.028
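The 0.028 figure can be verified directly (a sketch using only the Python standard library):

    from math import comb

    def binom_upper_tail(n, p, k):
        """Pr[# of successes >= k] for a Binomial(n, p) variable."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    print(binom_upper_tail(100, 0.5, 60))   # about 0.0284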
More generally:
When the yi's are independent 0/1 variables, we can use
Chernoff's inequality to bound S.
Then for any 0 < δ ≤ 1:
Pr[ S ≥ (1 + δ)·E[S] ] ≤ e^(-δ²·E[S]/3)
Example: toss 100 coins independently.
Pr(# of heads ≥ 60) ≤ e^(-(1/5)²·50/3) = e^(-2/3) ≈ 0.51    (δ = 1/5)
- Dramatically better bounds than Markov as the deviation grows.
- Worse than the exact cumulative probability, but much easier and
applicable in more cases.
- Gives bounds for all deviations of S (above or below E[S]).
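For comparison, the three estimates of Pr(# of heads ≥ 60) for 100 fair coins (a sketch; the Chernoff form is the e^(-δ²·E[S]/3) version assumed above):

    from math import comb, exp

    n, p, k = 100, 0.5, 60
    mu = n * p                       # E[S] = 50

    exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    markov = mu / k                  # Pr[S >= k] <= E[S]/k = 5/6
    delta = k / mu - 1               # relative deviation, here 1/5
    chernoff = exp(-delta**2 * mu / 3)

    print(f"exact={exact:.4f}  markov={markov:.4f}  chernoff={chernoff:.4f}")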
Back to running time
3 types of errors → 3 complexity measures (1)
Let f : X × Y → {0, 1}.
Let's define complexity measures for randomized
protocols computing f:
R0(f) is the minimum average case cost of a protocol P
that computes f with zero error.
3 types of errors → 3 complexity measures (2)
For 0 < ε < ½,
Rε(f) is the minimum worst case
cost of a protocol P that computes f with error ε.
For 0 < ε < 1,
R¹ε(f) is the minimum worst case
cost of a protocol P that computes f with one sided error
ε.
Wait!
What’s the meaning of ‘average’ for f with zero
error?
And the meaning of ‘worst’ for f with error?
Worst case = Θ (Average case)
Reminder: DAVG(A) is the average running time
over the random strings r(A), r(B), maxed
over all possible inputs x, y.
Suppose a protocol A makes an error ε/2 and DAVG(A) = t.
Define A':
~ Execute A as long as at most 2t/ε bits are exchanged
~ If A finished, return its answer
~ Else return 0
The number of bits exchanged by A' in the worst case is
DWORST(A') = 2t/ε
So we found k = 2/ε such that DWORST(A') = k · DAVG(A)
But do A and A' have the same error?

A has error ε/2.

What is the error of A'?

A' can return the answer that A output; that has a chance of
ε/2 of being wrong.

Or A' can output 0 because A wasn't done after 2t/ε
bits. What is the chance of that? We use Markov:
Pr[ A exchanged more than 2t/ε bits ] =
Pr[ #{bits exchanged in A} > (2/ε)·t ] ≤ 1/(2/ε) = ε/2
So A' has error at most ε/2 + ε/2 = ε;
that is, error(A') ≤ 2·error(A), and if A's error is one sided
then so is A''s (truncation can only turn an answer into 0).
What if ε = 0?
For zero error protocols, using the worst case
cost gives exactly the deterministic
communication complexity.
How come?
A deterministic protocol can just fix some r(A)
and r(B) and proceed.
So for zero error protocols, we only care about
the average case cost.
Exercise (1):
A: Calculate f(x, y) for some fixed x,y, with some protocol
with one sided error ε.
A’: Execute A t times.
Denote: ~ fi – result of execution #i
~ if for some i fi=1 then res(A’)= 1 , else res(A’)= 0.
Pr [res(A’) ≠ f(x, y)] = ?
Solution:
If res(A') = 1, it is the right answer with certainty: the error is one
sided, so A never outputs 1 when f(x, y) = 0.
If res(A') = 0, it might be the right result, or we got a wrong
result in all t independent executions. What is the chance of that?
Pr[ mistake in A ] ≤ ε, so Pr[ mistake in A' ] ≤ ε^t.
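A sketch of this amplification, with the underlying protocol replaced by a hypothetical coin-flip stub (run_A and amplify_or are our names, not part of the seminar):

    import random

    def run_A(f_xy, eps):
        """Stub for a one-sided-error protocol: never errs when f = 0,
        errs (outputs 0) with probability eps when f = 1."""
        if f_xy == 0:
            return 0
        return 0 if random.random() < eps else 1

    def amplify_or(f_xy, eps, t):
        """A': run A t times independently; output 1 iff some run output 1."""
        return 1 if any(run_A(f_xy, eps) == 1 for _ in range(t)) else 0

    # the error on f = 1 inputs drops from eps to eps**t
    eps, t, trials = 0.3, 5, 100_000
    errors = sum(amplify_or(1, eps, t) != 1 for _ in range(trials))
    print(errors / trials, "vs bound", eps**t)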
Exercise (2):
A: Calculate f(x, y) for some fixed x,y, with some protocol
with error ε.
A’: Execute A t times. Denote: fi – result of execution #i
res(A’) = maj {fi}
What is the probability that A' outputs a wrong answer?
Solution (1):
Define a Bernoulli trial: yi = 1 if fi ≠ f(x, y).
To get res(A') wrong, more than half of the executions must be wrong, i.e. S = ∑ yi > t/2.
E[S] = εt. So what is the chance that S > t/2?
Hint: use Chernoff for that.
Solution (2):
Let's fix ε ≤ ¼ (this generalizes to any ε < ½) and take δ = 1.
In the worst case E[S] = t/4, and t/2 = (1 + δ)·E[S], so Chernoff gives
Pr[ S > t/2 ] ≤ e^(-δ²·E[S]/3) = e^(-t/12).
What if we want to bound the error probability by some smaller α?
Then we need e^(-t/12) ≤ α, that is, t = 12·ln(1/α).
So the error probability can be reduced by taking a bigger t,
meaning we pay only a small penalty in communication.
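A sketch of the majority amplification with the same kind of stub protocol (run_A, amplify_maj and the numbers are illustrative assumptions):

    import math, random

    def run_A(f_xy, eps):
        """Stub for a two-sided eps-error protocol."""
        return f_xy if random.random() >= eps else 1 - f_xy

    def amplify_maj(f_xy, eps, t):
        """A': run A t times independently and output the majority answer."""
        ones = sum(run_A(f_xy, eps) for _ in range(t))
        return 1 if ones > t / 2 else 0

    eps, alpha = 0.25, 0.01
    t = math.ceil(12 * math.log(1 / alpha))   # t = 12*ln(1/alpha) from the slide
    trials = 20_000
    errors = sum(amplify_maj(1, eps, t) != 1 for _ in range(trials))
    print(f"t={t}, empirical error {errors/trials:.5f}, target alpha={alpha}")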
Field
A non-empty set F with two binary operations +
(addition) and * (multiplication) is called a field
if it satisfies:
- Closure of F under + and *
- Associativity of + and *
- Commutativity of + and *
- Distributivity of * over +
- Identities for + and * (0 and 1)
- Inverses for + (of every element) and for * (of every nonzero element)
Examples: the rational numbers, GF[7] = {0, 1, 2, 3, 4, 5, 6} with arithmetic mod 7.
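A quick illustration of why GF[7] is a field: every nonzero element has a multiplicative inverse mod 7 (a sketch):

    # GF[7]: arithmetic mod 7; every nonzero element has a multiplicative inverse
    p = 7
    for a in range(1, p):
        inv = next(b for b in range(1, p) if (a * b) % p == 1)
        print(f"{a} * {inv} = 1 (mod {p})")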
Example: EQ
Denote: Alice’s input A = a0a1…an-1
Bob’s input B = b0b1…bn-1
Let’s think of these inputs as polynomials over
GF[p] where n² < p < 2n², p is prime.
That is A(x) = a0 + a1x + a2x² + … + an-1x^(n-1) mod p
and B(x) = b0 + b1x + b2x² + … + bn-1x^(n-1) mod p
Alice picks at random t in GF[p] and sends Bob both t and
A(t). Bob outputs 1 if A(t) = B(t) and 0 otherwise.
#{bits exchanged} = O(log p) = O(log n)
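A sketch of this fingerprinting protocol in Python (the function names and the tiny parameters n = 4, p = 17 are our choices for illustration):

    import random

    def poly_eval(coeffs, t, p):
        """Evaluate a0 + a1*t + ... + a_{n-1}*t^(n-1) mod p (Horner's rule)."""
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * t + c) % p
        return acc

    def eq_protocol(a_bits, b_bits, p):
        """Alice sends (t, A(t)); Bob outputs 1 iff A(t) = B(t)."""
        t = random.randrange(p)            # Alice's random field element
        return 1 if poly_eval(a_bits, t, p) == poly_eval(b_bits, t, p) else 0

    # n = 4 bits, so we need a prime with n^2 < p < 2n^2; 17 is in (16, 32)
    a, b, p = [1, 0, 1, 1], [1, 0, 0, 1], 17
    print(eq_protocol(a, b, p))            # wrongly outputs 1 with probability <= (n-1)/p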
Correctness
Note that if A=B then A(t) = B(t) for all t, so f(A,B) = 1.
If A≠B then we have 2 distinct polynomials of degree n-1.
Such polynomials can be equal on at most n-1 (out of p)
elements of the field (since their difference is a non
zero polynomial of degree ≤ n-1, and has at most n-1
roots).
Hence the probability of error is at most (n-1)/p ≤ n/n² = 1/n
So we have shown that R(EQ) = O(log n); moreover the error is
one sided (the protocol never errs when A = B) and is at most 1/n.
This is in contrast to D(EQ) = D(NE) = n + 1.
Exercise
Prove that the following protocol for EQ achieves similar
performance:
Alice and Bob view their inputs A and B as n-bit integers
(between 1 and 2^n).
Alice chooses a random prime number p from among the first n
primes. (If n = 3, then p is 2, 3 or 5.)
She sends p and (A mod p) to Bob.
Bob outputs 1 if A mod p = B mod p, and 0 otherwise.
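A sketch of this variant (first_n_primes and eq_mod_prime are our names; A and B are taken as small integers for illustration):

    import random

    def first_n_primes(n):
        """Return the first n primes by simple trial division."""
        primes, candidate = [], 2
        while len(primes) < n:
            if all(candidate % q for q in primes):
                primes.append(candidate)
            candidate += 1
        return primes

    def eq_mod_prime(A, B, n):
        """Alice picks a random p among the first n primes and sends (p, A mod p)."""
        p = random.choice(first_n_primes(n))
        return 1 if A % p == B % p else 0

    # n = 3: p is chosen from {2, 3, 5}, as in the slide
    print(eq_mod_prime(5, 8, 3))   # wrong (outputs 1) only when p = 3, since 5 = 8 (mod 3)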
Solution
First note that if A = B, then of course
A mod p = B mod p.
If A ≠ B and Bob accepts, it means that A ≡ B (mod p).
Define BAD(A,B) = { p : p is among the first n primes and A ≡ B (mod p) }.
For example, BAD(5,8) = {3}.
Claim (without proof): |BAD(A,B)| ≤ ½·n.
Then the probability that Bob accepts (when A ≠ B) is at most (½·n) / n = ½.
We can repeat this 100 times (independently) to get error 2^(-100).
So the communication is O(log n),
since max{first n primes} < n³ and O(log n³) = O(log n).
So how much better can randomized protocols be
than deterministic ones?
Lemma: D(f) = 2^(O(R(f))).
We will prove a more delicate statement:
D(f) = O( 2^(Rε(f)) · ( Rε(f) + log(1/(½ - ε)) ) ).
Proof 1:
We present a deterministic simulation of a given
randomized protocol P.
For each leaf l of the protocol P, Alice sends
Bob p(A, l), the probability that, given x, she would
respond in a way leading to the leaf l.
Bob computes p(B, l), the probability that, given
y, he would respond in a way leading to the leaf l.
Bob then computes p(l) = p(A, l) · p(B, l), the
probability of reaching l.
Proof 2:
Alice and Bob do this for every leaf, thus
computing p(l) for all the leaves.
Bob then checks which output value (0 or 1) has total
probability at least 1 - ε, and that is the right f(x, y).
What's the problem?
We would need to send exact probability values, that is,
real numbers.
But it is enough to use a precision of
O( Rε(f) + log(1/(½ - ε)) ) bits per value.
Proof 3:
This guarantees a deviation of at most
(½ - ε) / 2^(Rε(f)) in each transmitted value.
This implies that the computed p(l) is at most
(½ - ε) / 2^(Rε(f)) away from the true p(l)
(since p(B, l) ≤ 1).
There are at most 2^(Rε(f)) leaves, so the total error
over all the leaves is at most ½ - ε.
Therefore Bob only needs to check which of the
values (0 or 1) has probability more than ½,
and that is the correct f(x, y) value.
Q.E.D.
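Counting the communication in this simulation (our summary of the bound, under the same assumptions as in the proof):

\[
D(f)\;\le\; \underbrace{2^{R_\varepsilon(f)}}_{\#\text{ of leaves}}\cdot \underbrace{O\!\left(R_\varepsilon(f)+\log\frac{1}{1/2-\varepsilon}\right)}_{\text{bits per leaf}} \;=\; 2^{O(R_\varepsilon(f))}\quad\text{for constant }\varepsilon<\tfrac12 .
\]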