
Discrete Math
CS 2800
Prof. Bart Selman
[email protected]
Module
Probability --- Part c)
1) Linearity of Expectation
2) St. Petersburg (Gambling) Paradox
3) Randomized Algorithms
4) Variance
Linearity of Expectation

Thm. $E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n)$

Very useful result. Holds even when the $X_i$'s are dependent!
Proof: (case n = 2)

$E(X_1 + X_2) = \sum_{s \in S} p(s)\,(X_1(s) + X_2(s))$   (defn. of expectation; summing over elementary events in the sample space)

$= \sum_{s \in S} p(s) X_1(s) + \sum_{s \in S} p(s) X_2(s)$

$= E(X_1) + E(X_2)$

QED

Aside, alternative defn. of expectation: $E[X] = \sum_{x} x \, P(X = x)$
Example application

Consider n flips of a biased coin (prob. p of heads). What is the expected number of heads?

Let $X_i$ be a 0/1 r.v. such that $X_i$ is 1 iff the ith coin flip comes up heads. ($X_i$ is called an "indicator variable".)

So, we want to know:

$E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n)$   (Linearity of Expectation!)

What is $E(X_i)$?   $E(X_i) = 1 \cdot p + 0 \cdot (1 - p) = p$

So, $E(X_1) + E(X_2) + \ldots + E(X_n) = np$

QED

Holds even if the coins are not flipped independently!
Consider: all coins "glued" together, so either all come up "heads" or all come up "tails".
Still the correct expectation! Why? (can be quite counter-intuitive)
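A minimal simulation sketch in Python (parameters chosen only for illustration) of this point: the estimated expected number of heads is np whether the flips are independent or "glued" together.

```python
import random

def expected_heads(n, p, glued, trials=100_000):
    """Estimate E[number of heads] over many trials."""
    total = 0
    for _ in range(trials):
        if glued:
            # All n coins share a single flip: all heads or all tails.
            total += n if random.random() < p else 0
        else:
            # n independent flips.
            total += sum(random.random() < p for _ in range(n))
    return total / trials

n, p = 10, 0.3
print(expected_heads(n, p, glued=False))  # ~ 3.0 = n*p
print(expected_heads(n, p, glued=True))   # ~ 3.0 = n*p (same expectation!)
```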
Example
Consider n children of different heights placed in a line at random.
Starting from the beginning of the line, select the first child. Continue walking,
until you find a taller child or reach the end of the line.
When you encounter a taller child, also select him/her, and continue to look for
the next taller child, or reach the end of the line.
Question: What is the expected value of the number of children selected
from the line??
Hmm. Looks tricky…
What would you guess? Lineup of 100 kids... [e.g. 15 more or less?]
Lineup of 1,000 kids… [e.g. 25 more or less?]
Let X be the r.v. denoting the number of children selected
from the line.
X  X1  X 2  ...  X n
1
where
Xi 
{
1 if the tallest among the first i children.
(i.e. will be selected from the line)
0 otherwise
By linearity of expectation, we have:
E( X )  E( X1  X 2  ...  X n )  E( X1 )  E( X 2 )  ...  E( X n )
What is E(X_i)?
What is P(X_1 = 1)?
What is P(X_n = 1)?
What is P(X_i = 1)?
A: 1
A: 1/n
A: 1/i
Note that the one tallest
person among i persons
needs to be at the end.
(by symmetry: equally
likely in any position)
Now, E(X_i) = 0 * P(X_i = 0) + 1 * P(X_i = 1)
= 0 * (1 – 1/i) + 1 * (1/i)
Consider doubling
= 1/i.
queue:
So, E(X) = 1 + 1/2 + 1/3 + 1/4 … + 1/nWhat’s probablility
1
tallest kid in first
small!!
 ln(n)    , wheree   0.57722...
Euler ' s con.ts
half?
n
e.g. N = 100
E(X) ~ 5
N = 200
E(X) ~ 6
N = 1000
E(X) ~ 7
N = 1,000,000
E(X) ~ 14
N = 1,000,000,000 E(X) ~ 21
A: ½
(in which case
2nd half doesn’t
add
anything!)
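A small Python simulation sketch (parameters chosen for illustration) that estimates the expected number of selected children and compares it with the harmonic-number approximation ln(n) + γ:

```python
import math
import random

def selected_count(n):
    """Walk a random lineup and count selected children:
    a child is selected iff taller than everyone before them."""
    heights = random.sample(range(n), n)   # n distinct heights in random order
    tallest_so_far = -1
    count = 0
    for h in heights:
        if h > tallest_so_far:
            count += 1
            tallest_so_far = h
    return count

n, trials = 1000, 10_000
avg = sum(selected_count(n) for _ in range(trials)) / trials
print(avg)                        # ~ 7.5 for n = 1000
print(math.log(n) + 0.57722)      # approximation ln(n) + gamma
```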
Indicator Random Variable

Recall: Linearity of Expectation

$E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n)$

An indicator random variable Y is:
– a 0/1 binary random variable.
– 1 corresponds to an event E occurring, and 0 corresponds to the event not occurring.
– Then E(Y) = 1*P(E) + 0*(1 - P(E)) = P(E).

Expected number of times an event occurs:
– E(X1 + X2 + …) = E(X1) + E(X2) + … = P(E1) + P(E2) + …
$E\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$
Suppose everyone (n students) puts their cell phone in a pile in the middle of the room,
and I return them randomly. What is the expected number of students who
receive their own phone back? Guess??

Define for i = 1, …, n, a random variable:

$X_i = \begin{cases} 1 & \text{if student } i \text{ gets the right phone,} \\ 0 & \text{otherwise.} \end{cases}$

Need to calculate: $E\!\left[\sum_{i=1}^{n} X_i\right]$

k    Pr(X_i = k)
0    1 - 1/n
1    1/n

E[X_i] = Pr(X_i = 1) = 1/n
Why? Symmetry! All phones are equally likely.
So,

E[X] = E[X1 + X2 + … + Xn]
     = E[X1] + E[X2] + … + E[Xn]
     = 1/n + 1/n + … + 1/n
     = 1

Independent of n!! So, we expect just one student to get his or her own cell phone back…
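A quick Python sketch (illustrative sizes) estimating the expected number of students who get their own phone back; it comes out close to 1 regardless of n:

```python
import random

def fixed_points(n):
    """Number of students who get their own phone back when the
    n phones are handed out in a uniformly random order."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i, phone in enumerate(perm) if phone == i)

for n in (5, 50, 500):
    trials = 20_000
    avg = sum(fixed_points(n) for _ in range(trials)) / trials
    print(n, round(avg, 3))   # ~ 1.0 for every n
```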
Suppose there are N couples at a party, and suppose m (random) people get
"sleepy and leave"… What is the expected number of ("complete") couples left?

Define for i = 1, …, N, a random variable:

$X_i = \begin{cases} 1 & \text{if couple } i \text{ remains,} \\ 0 & \text{otherwise.} \end{cases}$

Define r.v. X = X1 + X2 + … + XN, and we want E[X].

E[X] = E[X1 + X2 + … + XN]
     = E[X1] + E[X2] + … + E[XN]

So, what do we know about Xi?

$E[X_i] = \Pr(X_i = 1) = \dfrac{\binom{2N-2}{m}}{\binom{2N}{m}}$

How? (# of ways of choosing the m who leave from everyone other than couple i) / (# of ways of choosing m from all 2N people).

E[X1] + E[X2] + … + E[XN] = N × E[X1] = (2N - m)(2N - m - 1) / (2(2N - 1))
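A short Python sketch (illustrative N and m) comparing a simulation of the sleepy-couples problem with the closed-form expectation (2N - m)(2N - m - 1) / (2(2N - 1)):

```python
import random

def complete_couples_left(N, m):
    """m random people (out of 2N) leave; count couples with both
    partners still present. Persons 2i and 2i+1 form couple i."""
    leavers = set(random.sample(range(2 * N), m))
    return sum(1 for i in range(N)
               if 2 * i not in leavers and 2 * i + 1 not in leavers)

N, m, trials = 20, 15, 50_000
avg = sum(complete_couples_left(N, m) for _ in range(trials)) / trials
exact = (2*N - m) * (2*N - m - 1) / (2 * (2*N - 1))
print(round(avg, 3), round(exact, 3))   # the two values should be close (~7.69)
```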
Probability Paradox II
The St. Petersburg (Gambling) Paradox
Consider the following game:
– A coin is flipped until a head appears.
– The payoff is based on the number of tails observed (n) before the
first head.
– The payoff is calculated as $2^n.
Note: This grows pretty fast!
How much are you willing to pay to play this game?
To answer this: you want to know the expected payoff.
[General idea: If expected payoff is e.g. $100, accept to play
the game by paying e.g. $95. Expect to win $5 per game on average
over time!]
Aside: Fair Bets / Games
A fair bet is a lottery in which the expected payoff is equal to the cost of
playing.
Most people will not take a fair bet unless the dollar amount involved is
reasonable (you could lose!).
St. Petersburg Paradox

Number of Tails Before First Head | Probability      | Payoff     | Probability x Payoff
0                                 | (1/2)^1 = 1/2    | 2^0 = $1   | $0.50
1                                 | (1/2)^2 = 1/4    | $2         | $0.50
2                                 | (1/2)^3 = 1/8    | $4         | $0.50
3                                 | (1/2)^4 = 1/16   | $8         | $0.50
4                                 | (1/2)^5 = 1/32   | $16        | $0.50
n                                 | (1/2)^(n+1)      | $2^n       | $0.50
Total                             | 1.00             |            | ∞

Expected payoff: infinity… Pretty good! Hmm. Really??
St. Petersburg Paradox (cont'd)

The expected payoff is infinite!!

But, how much would you be willing to pay to play the game?
– Most people would only pay a couple of dollars.
– Note that the marginal (extra) contribution to the expectation from each additional possible run of tails is just $0.50.

What would be a reasonable entrance fee in "real life"? Bernoulli proposed to
replace the expectation E[G] of the profit G = 2^n with the expectation
(E[√G])^2, where u(x) = √x is called a utility function. This would lead to a
fair entrance fee of (E[√G])^2 = (1/(2 - √2))^2 ≈ $2.91.

Idea: the "utility" of money goes with the square root. The utility of money:
"Your first million is way more special than your second million." 

It is not so clear if that is the right way out of the paradox, because for any
proposed utility function u(x), one can modify the Casino rule so that the
paradox reappears: pay (2^k)^2 dollars if the utility function is u(x) = √x, or
pay e^(2^k) dollars if the utility function is u(x) = log(x).

Discussions about what is the "right utility function" are a key part of
economics and the social sciences.

But, the paradox continues. Why do we not go with the expected value?
St. Petersburg Paradox: Another way out

Number of Tails Before First Head | Probability      | Payoff     | Probability x Payoff
0                                 | (1/2)^1 = 1/2    | $1         | $0.50
1                                 | (1/2)^2 = 1/4    | $2         | $0.50
2                                 | (1/2)^3 = 1/8    | $4         | $0.50
3                                 | (1/2)^4 = 1/16   | $8         | $0.50
4                                 | (1/2)^5 = 1/32   | $16        | $0.50
n                                 | (1/2)^(n+1)      | $2^n       | $0.50
Total                             | 1.00             |            | ∞

Expected payoff: ∞.

What is the expectation when the max payout is $1M?
I.e., after a run of 20 or more Tails, you get "just" $1M.
Answer: only about $10. Hmm…
So, to get to infinity, by far the most contributions come from extremely rare,
extremely large payout runs.
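As a rough check (a sketch, assuming the payout is capped at exactly $2^20 ≈ $1M after 20 or more tails), the capped expectation can be computed directly; it matches the "about $10" figure up to the ~$1 contribution of the capped tail:

$E[\text{capped payoff}] = \sum_{n=0}^{19} \left(\tfrac{1}{2}\right)^{n+1} 2^{n} \;+\; \sum_{n=20}^{\infty} \left(\tfrac{1}{2}\right)^{n+1} 2^{20} = 20 \cdot \tfrac{1}{2} \;+\; 2^{20}\left(\tfrac{1}{2}\right)^{20} = 10 + 1 = \$11.$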
St. Petersburg Paradox: Perspective

Expectation relies on being able to do many repeated runs. But with only a few
runs and a large entry fee, you have a substantial probability of "losing it
all," unless you get long runs right away.

The problem of "Gambler's ruin." Most folks don't want to take that chance…

The paradox leads to interesting simulations.
Average payout per k games played so far (k <= 4,000).
Note the sudden jumps. Where do they come from?
Similar to e.g. the stock market 
[Figure: average payout per game vs. number of games played, assuming a $10
entrance fee per game. The average hovers around $2.50 per game with occasional
sudden jumps; the player "should have stopped" early on, since the losses become
hard to recover. After 4,000 games you would have lost around $10K, and a payout
that large requires a run of roughly 13 to 14 tails in a row. Still, eventually
the player would win, but that may take a very long time!!]
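A minimal Python sketch (assuming a fair coin and the $2^n payoff rule) that reproduces this kind of simulation: the running average payout mostly sits low, with sudden jumps whenever a long run of tails occurs.

```python
import random

def one_game():
    """Play one St. Petersburg game: flip until heads, payoff 2^(number of tails)."""
    tails = 0
    while random.random() < 0.5:   # tails with probability 1/2
        tails += 1
    return 2 ** tails

games = 4000
total = 0.0
for k in range(1, games + 1):
    total += one_game()
    if k % 500 == 0:
        print(k, round(total / k, 2))   # running average payout per game
```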
The paradox is related to "Power Law Distributions."

Consider a positive random variable X >= 1. Let the probability of X > x be given by:

$P(X > x) = \dfrac{1}{x^k}$

where k is a constant > 1. Then

$E[X] = \dfrac{k}{k-1}$

[Plot: probability density function]
Unexpected expectations…

The conditional probability that X exceeds a value x, given that X > x_0, is given by:

$P(X > x \mid X > x_0) = \dfrac{P(X > x)}{P(X > x_0)} = \dfrac{x_0^{\,k}}{x^k}$

The expectation of this conditional variable $X_{x_0}$ is given by:

$E[X_{x_0}] = \dfrac{x_0\, k}{k-1}$

OK. Looks rather "unremarkable" but beware…
We commissioned a project with an expected delivery time of 3 days.
Unfortunately, the work is power-law distributed.
We have k = 1.5, because E[X] = k / (k - 1) = 3.
So, when the project starts, we expect it to be finished in 3 days.

It's been 5 days. How much longer can we expect to wait?

Hmm. We have x_0 = 5 and k = 1.5, and

$E[X_{x_0}] = \dfrac{x_0\, k}{k-1} = 3 x_0$

So, the expected remaining wait time is now another 15 - 5 = 10 days!!
If it is still not done after 60 days, we'll need to wait another 120 days, in expectation!

Not so atypical! Actually, this models real-world waiting times for project
completions, without "drop-dead deadlines".
Underlying intuition: "rare long delays" are actually not so rare!
Or "the longer you have waited for something, the longer you may yet have to wait."
Summary: Beware of expectations and “heavy tails”!
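A small Python sketch (illustrative, assuming the Pareto tail P(X > x) = x^(-1.5) for x >= 1) checking the "the longer you wait, the longer you still wait" effect by conditioning on X > x_0:

```python
import random

def pareto_sample(k=1.5):
    """Draw X with P(X > x) = x^(-k) for x >= 1, via inverse transform sampling."""
    u = random.random()
    return (1 - u) ** (-1.0 / k)

samples = [pareto_sample() for _ in range(2_000_000)]
# Heavy tail (infinite variance for k < 2), so the sample mean converges slowly.
print(sum(samples) / len(samples))   # roughly k/(k-1) = 3

x0 = 5
tail = [x for x in samples if x > x0]
print(sum(tail) / len(tail))         # roughly 3 * x0 = 15 (conditional expectation)
```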
Randomized Algorithms
Example: MAX 3-SAT

Consider a propositional logic formula on N Boolean variables in conjunctive
normal form (CNF), i.e., a conjunction (logical AND) of disjunctions (logical OR).

Example:

$(x_1 \lor x_2 \lor x_3) \land (x_2 \lor x_3) \land (\lnot x_3)$

The truth assignment with x1 and x2 assigned to True and x3 assigned to False
satisfies this formula. Each disjunction is also referred to as a "clause".
If each clause contains exactly k variables, we refer to the formula as a k-CNF formula.
MAX 3-SAT cont.
Problem: MAX-3-SAT
Given a 3-CNF formula F, find a truth assignment that
satisfies as many clauses as possible.
The MAX 3-SAT problem is a so-called NP-hard problem; it is generally believed
that no efficient (i.e., polynomial time) algorithm exists for solving such problems.

Note: a search space of 2^N truth assignments.

[Photos: Stephen Cook, Leonid Levin]
[The $1M Clay Millennium prize, click on P =/= NP]
So, finding a maximally satisfying assignment is (most likely) computationally
very hard. However, it's surprisingly easy to find a reasonably good assignment,
satisfying 7/8ths (87.5%) of the clauses in expectation. How??

Thm. Given a 3-CNF formula with k clauses, the expected number of clauses
satisfied by a random assignment is $\frac{7}{8} k$.

Proof. (by linearity of expectation)

A random assignment is obtained by setting each variable x1, …, xn
independently to True or False with probability ½ each.

Let Z denote the r.v. equal to the number of satisfied clauses. Z can be
written as a sum of random indicator variables Zi, one for each clause. I.e.,

$Z = Z_1 + Z_2 + \ldots + Z_k$

with Zi = 1 if the ith clause is satisfied, and 0 otherwise.

Now, we have by linearity of expectation

$E[Z] = E[Z_1] + E[Z_2] + \ldots + E[Z_k]$

(Remember this holds no matter how the random variables Zi are correlated!)

What is $E[Z_i]$?

The probability that a clause is not satisfied is $(1/2)^3 = 1/8$ (all three of
its literals must come out False). So, the probability that a clause is
satisfied by the random assignment is 1 - 1/8 = 7/8. So,

$E[Z_i] = 0 \cdot 1/8 + 1 \cdot 7/8 = 7/8$

And, therefore:

$E[Z] = \frac{7}{8} k$

QED
So, we can actually find a pretty good assignment, in expectation,
very easily, even though it’s believed intractable to find the
maximally satisfying assignment. We can obtain yet another
surprise from our analysis.
$E[X] = \sum_{s \in S} p(s)\, X(s)$
Note that a random variable has to assume a value at least
as large as its expectation at some point in the sample space. This
observation immediately leads us to the following result.
Thm. Given a 3-CNF formula, there must exist a truth assignment
that satisfies at least a 7/8th fraction of the clauses.
So, from the analysis of a random event (a randomly sampled
truth assignment), we have now obtained a statement that does
not involve any randomness or probability! (pause…)
The technique we used to prove this result is more generally
referred to as the “probabilistic method”, which can be used to show
the existence of certain combinatorial objects (in this case, a truth
assignment satisfying 7/8th of the clauses) by showing that a random
construction produces the desired object with non-zero probability.
The probabilistic method (link) is a non-constructive proof
technique. The method was pioneered by the famous
mathematician Paul Erdős (link).
End note: We showed that a randomly generated assignment satisfies
7/8th of the clauses, in expectation. Hmm… How often do we have to
guess to be sure to have an assignment satisfying 7/8th of the clauses?
It can be shown that the expected number of guesses grows only
polynomially in N, the number of Boolean variables.
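A minimal Python sketch (a generic illustration, not the course's code) of this randomized approach: draw random assignments for a 3-CNF formula, given as a list of clauses of signed literals, until one satisfies at least 7/8 of the clauses.

```python
import random

def satisfied_clauses(clauses, assignment):
    """Count clauses with at least one true literal.
    A clause is a list of signed ints: +i means x_i, -i means NOT x_i."""
    return sum(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

def random_assignment(num_vars):
    return {i: random.random() < 0.5 for i in range(1, num_vars + 1)}

def seven_eighths_assignment(clauses, num_vars):
    """Keep guessing until a random assignment satisfies >= 7/8 of the clauses;
    the expected number of guesses grows only polynomially in num_vars."""
    target = 7 * len(clauses) / 8
    while True:
        a = random_assignment(num_vars)
        if satisfied_clauses(clauses, a) >= target:
            return a

# Example 3-CNF: (x1 v ~x2 v x3) ^ (~x1 v x2 v x3) ^ (x1 v x2 v ~x3)
clauses = [[1, -2, 3], [-1, 2, 3], [1, 2, -3]]
print(seven_eighths_assignment(clauses, num_vars=3))
```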
Variance
The variance of a random variable is its expected squared deviation from its mean.
– A measure of "dispersion" for random variables.
– It is calculated in the natural way: the expectation is subtracted from each
value the random variable could take; the difference is squared and weighted by
the probability of occurrence:

$\mathrm{Var}(X) = \sum_{x} \big(x - E[X]\big)^2 \, P(X = x)$
Variance

– Example, die throw.  E[X] = 3.5

x     x - E[X]   (x - E[X])^2   P(X = x)   Product
1     -2.5        6.25           1/6        1.042
2     -1.5        2.25           1/6        0.375
3     -0.5        0.25           1/6        0.042
4      0.5        0.25           1/6        0.042
5      1.5        2.25           1/6        0.375
6      2.5        6.25           1/6        1.042
Tot                                         2.918
Variance

What is the difference between the following distributions? Both have E[X] = 1.

k      0    1    2          k      0    1    2
P(k)   .1   .8   .1         P(k)   .4   .2   .4

Need a measure of spread.
Use the average squared difference from the mean.
Variance

Definition: Let X be a r.v. on a sample space S. The variance of X, denoted by Var(X), is:

$\mathrm{Var}(X) = \sum_{s \in S} \big(X(s) - E[X]\big)^2 \, p(s)$

Theorem: If X is a r.v. on a sample space S, then

$\mathrm{Var}(X) = E[X^2] - E[X]^2$

(By algebra shown in your book, page 436.)
Variance

What is the difference between the following distributions? Both have E[X] = 1.
Using $\mathrm{Var}(X) = E[X^2] - E[X]^2$:

k      0    1    2                               k      0    1    2
P(k)   .1   .8   .1                              P(k)   .4   .2   .4

E[X^2] = (0)(.1) + (1)(.8) + (4)(.1) = 1.2       E[X^2] = (0)(.4) + (1)(.2) + (4)(.4) = 1.8
Var(X) = 1.2 - 1 = 0.2                           Var(X) = 1.8 - 1 = 0.8
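A tiny Python sketch computing the two variances above both from the definition and via E[X^2] - E[X]^2, just to confirm they agree (up to floating-point rounding):

```python
def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance_def(dist):
    """Definition: expected squared deviation from the mean."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Shortcut: E[X^2] - E[X]^2."""
    return sum(x ** 2 * p for x, p in dist.items()) - mean(dist) ** 2

d1 = {0: 0.1, 1: 0.8, 2: 0.1}
d2 = {0: 0.4, 1: 0.2, 2: 0.4}
print(variance_def(d1), variance_shortcut(d1))   # ~ 0.2  0.2
print(variance_def(d2), variance_shortcut(d2))   # ~ 0.8  0.8
```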
Independent Random Variables: Definition

Random variables X and Y are independent if the occurrence of one does not
affect the other's probability:

P(X = x, Y = y) = P(X = x) * P(Y = y) for all x and y
i.e. P(X, Y) = P(X) * P(Y)

otherwise X and Y are dependent.

Independent Random Variables

Random variables X and Y are independent if the occurrence of one does not
affect the probability of occurrence of the other, iff

P(X = x | Y = y) = P(X = x) for all x and y
i.e. P(X | Y) = P(X) for all values of X and Y

otherwise dependent.

Independent Random Variables (alt. defn.)

Random variables X and Y are independent if the occurrence of one does not
affect the probability of occurrence of the other, iff

P(X <= x, Y <= y) = P(X <= x) * P(Y <= y)
i.e. F(X, Y) = F(X) * F(Y)

otherwise X and Y are dependent.
Independent Random Variables

Theorem: If X and Y are independent random variables on a sample space S,
then E[XY] = E[X] E[Y].

$E[XY] = \sum_{s \in S} X(s)\, Y(s)\, p(s)$

$= \sum_{r_1 \in X(S),\; r_2 \in Y(S)} r_1 r_2 \, p(X = r_1 \text{ and } Y = r_2)$

$= \sum_{r_1 \in X(S),\; r_2 \in Y(S)} r_1 r_2 \, p(X = r_1)\, p(Y = r_2)$   (by independence)

$= \Big[\sum_{r_1 \in X(S)} r_1 \, p(X = r_1)\Big] \Big[\sum_{r_2 \in Y(S)} r_2 \, p(Y = r_2)\Big]$

$= E[X]\, E[Y]$
Independent Random Variables

Theorem: If X and Y are independent random variables on a sample space S, then
V[X + Y] = V[X] + V[Y]. Furthermore, if Xi, i = 1, 2, …, n (n a positive
integer), are pairwise independent random variables on S, then
V[X1 + X2 + … + Xn] = V[X1] + V[X2] + … + V[Xn].

$V[X + Y] = E[(X + Y)^2] - E[X + Y]^2$
$= E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2$
$= E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2$   (using E[XY] = E[X]E[Y] by independence)
$= (E[X^2] - E[X]^2) + (E[Y^2] - E[Y]^2)$
$= V[X] + V[Y]$
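A quick Python sketch (illustrative) checking both theorems empirically with two independent dice, and showing that the identities can fail when the variables are dependent (e.g. Y = X):

```python
import random

def stats(pairs):
    """Return (E[XY] - E[X]E[Y], Var[X+Y] - (Var[X] + Var[Y])) estimated from samples."""
    n = len(pairs)
    ex  = sum(x for x, _ in pairs) / n
    ey  = sum(y for _, y in pairs) / n
    exy = sum(x * y for x, y in pairs) / n
    vx  = sum((x - ex) ** 2 for x, _ in pairs) / n
    vy  = sum((y - ey) ** 2 for _, y in pairs) / n
    s   = [x + y for x, y in pairs]
    es  = sum(s) / n
    vs  = sum((v - es) ** 2 for v in s) / n
    return exy - ex * ey, vs - (vx + vy)

independent = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(200_000)]
dependent   = [(x, x) for x, _ in independent]   # Y = X: maximally dependent

print(stats(independent))   # both differences ~ 0
print(stats(dependent))     # both differences clearly nonzero
```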
Variance

– Standard deviation: as in the case of describing data, it is often easier to
work in standard deviations:

$\sigma_X = \sqrt{\sigma_X^2} = \sqrt{V(X)}$
Functions of a random variable

It is possible to calculate expectations and variances of functions of random variables:

$E[g(X)] = \sum_{x} g(x)\, P(X = x)$

$V[g(X)] = \sum_{x} \big(g(x) - E[g(X)]\big)^2 \, P(X = x)$
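A last small Python sketch (illustrative) computing E[g(X)] and V[g(X)] for a fair die with g(x) = x^2:

```python
def expectation_of_g(dist, g):
    """E[g(X)] = sum over x of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

def variance_of_g(dist, g):
    """V[g(X)] = sum over x of (g(x) - E[g(X)])^2 * P(X = x)."""
    m = expectation_of_g(dist, g)
    return sum((g(x) - m) ** 2 * p for x, p in dist.items())

die = {x: 1 / 6 for x in range(1, 7)}
g = lambda x: x ** 2
print(expectation_of_g(die, g))   # E[X^2] = 91/6 ~ 15.17
print(variance_of_g(die, g))      # V[X^2] ~ 149.14
```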