Transcript Document

Outline
• Historical note about Bayes' rule
• Bayesian updating for probability density functions
  – Salary offer estimate
• Coin trials example
• Reading material:
  – Gelman, Andrew, et al. Bayesian Data Analysis. CRC Press, 2003, Chapter 1.
• Slides based in part on a lecture by Prof. Joo-Ho Choi of Korea Aerospace University
Historical Note
• Birth of Bayesian statistics
– Rev. Thomas Bayes proposed Bayes' theorem (1763): the parameter θ of a
binomial distribution is estimated from observed data. Laplace rediscovered
it, attached his name to it (1812), and generalized it to many problems.
– For more than 100 years, the Bayesian "degree of belief" was rejected as
vague and subjective; the objective "frequency" view was accepted in
statistics.
– Jeffreys (1939) rediscovered it and built the modern theory (1961). Until
the 1980s, use was still limited by computational requirements.
• Flourishing of Bayesian statistics
– From 1990, rapid advances in hardware and software made it practical.
– Bayesian techniques are now applied across science (economics, medicine)
and engineering.
Bayesian Probability
• What is Bayesian probability?
– Classical: the relative frequency of an event, given many repeated trials
(e.g., the probability of throwing 10 with a pair of dice).
– Bayesian: the degree of belief that a statement is true, based on the
evidence at hand.
• Saturn mass estimation
– Classical: the mass is fixed but unknown.
– Bayesian: the mass is described probabilistically based on observations
(e.g., uniformly in an interval (a, b)).
Bayes' rule for pdf's
• θ is a parameter to be estimated from data y, described by a probability
density.
• Conditional probability density functions satisfy
p(θ, y) = p(θ) p(y|θ) = p(y) p(θ|y)
• Leading to Bayes' rule:
p(θ|y) = p(θ) p(y|θ) / p(y)
• Often written as
p(θ|y) ∝ p(θ) p(y|θ) = p(θ) L(y|θ)
• L is used because p(y|θ) is called the likelihood function.
• Instead of dividing by p(y), one can divide by the area under the curve:
p(θ|y) = p(θ) L(y|θ) / ∫ p(θ) L(y|θ) dθ
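The normalized form of Bayes' rule can be sketched numerically on a grid over θ. The flat prior and the 4-heads-in-5-tosses data below are illustrative assumptions chosen for this sketch:

```python
import numpy as np

# Grid over the parameter theta; prior and data are illustrative assumptions
theta = np.linspace(0.0, 1.0, 1001)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                 # flat prior p(theta)

x, n = 4, 5                                 # assumed data: 4 heads in 5 tosses
like = theta**x * (1 - theta)**(n - x)      # L(y|theta), constant factor dropped

# Bayes' rule: posterior is prior * likelihood, divided by the area under the curve
unnorm = prior * like
post = unnorm / (unnorm.sum() * dtheta)

print(theta[np.argmax(post)])   # mode at x/n = 0.8
```

Dividing by the area under the curve (here a Riemann sum) makes the posterior integrate to 1, so p(y) itself never has to be computed.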
Bayesian updating
– The process schematically: the prior distribution p(θ) is multiplied by the
likelihood function of the observed data y to give the updated (posterior) PDF:
p(θ|y) ∝ L(y|θ) p(θ)
[Figure: left panel shows the prior distribution, the observed data, and the
resulting posterior distribution plotted against θ; right panel shows the
posterior shifting as observed data are added.]
– The posterior estimate is a weighted average of the data and the prior:
θ_post = k y + (1 − k) θ_prior
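The weighted-average update θ_post = k y + (1 − k) θ_prior holds exactly in the conjugate normal case, where k is set by the prior and measurement variances. A minimal sketch with illustrative numbers (not from the slides):

```python
# Conjugate normal case: prior N(theta_prior, s0^2) and one observation y
# with noise std s give theta_post = k*y + (1-k)*theta_prior, where
# k = s0^2 / (s0^2 + s^2). All numbers below are illustrative assumptions.
theta_prior, s0 = 10.0, 2.0   # prior mean and std
y, s = 12.0, 1.0              # observation and its noise std

k = s0**2 / (s0**2 + s**2)
theta_post = k * y + (1 - k) * theta_prior
s_post = (1.0 / (1.0 / s0**2 + 1.0 / s**2)) ** 0.5  # posterior std shrinks

print(k, theta_post)   # k = 0.8, theta_post = 11.6
```

A tight prior (small s0) gives small k and keeps the estimate near the prior; precise data (small s) gives k near 1 and pulls it toward the observation.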
Salary estimate example
• You are considering an engineering position for which salary offers θ
(in thousand dollars) have recently followed the triangular distribution
p(θ) = 0.1 − 0.01|100 − θ|,  90 ≤ θ ≤ 110
• Your friend received a $93K offer for a similar position, and you know
that the range of offers for such positions is no more than $5K.
• Before your friend's data, what was your chance of an offer < $93K?
• Estimate the distribution of the expected offer and the likeliest value.
If you are offered a salary θ, your friend's offer y must satisfy
|y − θ| ≤ 5, so the likelihood of the $93K offer is L(93|θ) = 0.1 over that
range, giving
p(θ|93) ∝ L(93|θ) p(θ) = 0.1 (0.1 − 0.01(100 − θ)),  90 ≤ θ ≤ 98
The right-hand side is 0.008 at θ = 98, so the area is 0.008 × 8/2 = 0.032.
Dividing by this area to make it equal to 1:
p(θ|93) = 3.125 (0.1 − 0.01(100 − θ)),  90 ≤ θ ≤ 98
[Figure: the triangular prior p(θ) and the flat likelihood L(y|θ) = 0.1
plotted against θ over 90–100.]
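The area computation above can be checked numerically. This is a sketch on a grid over θ, reproducing the prior, the flat likelihood, and the normalization:

```python
import numpy as np

# Grid version of the salary-posterior algebra from the slide
theta = np.linspace(90, 110, 2001)
dtheta = theta[1] - theta[0]

prior = 0.1 - 0.01 * np.abs(100 - theta)            # triangular prior p(theta)
like = np.where(np.abs(93 - theta) <= 5, 0.1, 0.0)  # flat likelihood L(93|theta)

unnorm = prior * like
area = unnorm.sum() * dtheta    # should be close to the slide's 0.032
post = unnorm / area

print(round(area, 4), theta[np.argmax(post)])  # area ~ 0.032, mode near theta = 98
```

The posterior rises linearly over 90–98 and peaks at the likelihood cutoff θ = 98, which is the likeliest offer given the friend's data.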
Self-evaluation question
• What value of a salary offer to your friend would leave you with the least
uncertainty about your own expected offer?
Coin Trials Example
• Problem
– For a weighted (uneven) coin, the probability of heads is to be determined
from experiments. This is the parameter θ to be estimated.
– Assume the true θ is 0.78, the value that would be obtained after
infinitely many trials. But we don't know this; we can only infer it from
experiments.
• Bayesian parameter estimation
Prior knowledge p0(θ):
1. No prior information
2. Normal distribution centered at 0.5 with σ = 0.05
3. Uniform distribution on [0.5, 0.7]
Experiment data: x heads out of n trials.
• 4 out of 5 trials
• 78 out of 100 trials
Posterior distribution of θ:
p(θ|x) ∝ p(x|θ) p0(θ)
Likelihood by the binomial distribution, where x is the count of successes
(heads), n − x the count of failures, θ the success probability, and 1 − θ
the failure probability, given the parameter θ:
p(x|θ) = nCx θ^x (1 − θ)^(n−x)
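A minimal sketch of the grid posterior for the three priors, using the 78-out-of-100 data; the unnormalized prior shapes below are my own shorthand for the slide's three cases:

```python
import math
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)
dtheta = theta[1] - theta[0]

def binom_like(x, n):
    # Binomial likelihood p(x|theta) = C(n,x) * theta^x * (1-theta)^(n-x)
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

# The three priors from the slide (unnormalized shapes suffice)
priors = {
    "no prior (flat)":  np.ones_like(theta),
    "normal(0.5,0.05)": np.exp(-0.5 * ((theta - 0.5) / 0.05) ** 2),
    "uniform[0.5,0.7]": ((theta >= 0.5) & (theta <= 0.7)).astype(float),
}

def posterior(prior, x, n):
    unnorm = prior * binom_like(x, n)
    return unnorm / (unnorm.sum() * dtheta)

# 78 heads out of 100 trials (true theta = 0.78 in the slide's setup)
for name, pr in priors.items():
    post = posterior(pr, 78, 100)
    print(name, round(theta[np.argmax(post)], 2))
```

The flat prior puts the posterior mode at the observed frequency 0.78; the normal prior pulls it toward 0.5; the uniform [0.5, 0.7] prior pins it at the 0.7 boundary.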
Probability of heads: posterior distributions
[Figure: posterior PDFs of θ for the three priors. Red: prior; wide curve:
4 heads out of 5; narrow curve: 78 heads out of 100.]
1. No prior (uniform).
2. N(0.5, 0.05): the poor prior slows convergence.
3. U(0.5, 0.7): the posterior cannot exceed the barrier at 0.7 due to the
incorrect prior.
Probability of 5 consecutive heads
• Prediction using the posterior (no-prior case).
• Exact value is binom(5,5,0.78) = 0.78^5 = 0.289.
Estimation process: form the posterior distribution of θ,
p(θ|x) ∝ p(x|θ) p0(θ),
draw random samples of θ from this PDF, and compute p = binom(5,5,θ) = θ^5
for each sampled θ.
[Figure: posterior PDF of θ; histogram of 10,000 samples of θ; histogram of
10,000 samples of the predicted p (posterior prediction process), with
median 0.282 and 5%/95% bounds 0.172 and 0.416.]
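The sampling process above can be sketched directly. With a flat prior and x heads in n tosses, the posterior of θ is Beta(x+1, n−x+1), so for the 78-out-of-100 data it is Beta(79, 23); treating the posterior this way is an assumption of this sketch (the slides sample the posterior PDF numerically):

```python
import numpy as np

rng = np.random.default_rng(0)

# Flat-prior posterior for 78 heads in 100 tosses, assumed Beta(79, 23)
samples = rng.beta(79, 23, size=10_000)   # 10,000 samples of theta

# Probability of 5 consecutive heads for each sampled theta
p_pred = samples**5

print(round(float(np.median(p_pred)), 3))            # near the slide's 0.282
print(np.round(np.percentile(p_pred, [5, 95]), 3))   # near 0.172 and 0.416
```

The spread of p_pred reflects the remaining uncertainty in θ after 100 trials; the point prediction 0.78^5 = 0.289 ignores that uncertainty.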
Practice problems
1. For the salary estimate problem,
what is the probability of getting a
better offer than your friend?
2. For the salary problem, calculate the
95% confidence bounds on your
salary around the mean and median
of your expected salary distribution.
3. Slide 9 shows the risks associated
with using a prior. When is it
important to use a prior?
Source: Smithsonian Institution
Number: 2004-57325