
A Practical Course in Graphical Bayesian Modeling; Class 1
Eric-Jan Wagenmakers
Outline
• A bit of probability theory
• Bayesian foundations
• Parameter estimation: A simple example
• WinBUGS and R2WinBUGS
Probability Theory
(Wasserman, 2004)
• The sample space Ω is the set of possible outcomes of an experiment.
• If we toss a coin twice then Ω = {HH, HT, TH, TT}.
• The event that the first toss is heads is A = {HH, HT}.
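As a small aside (added to this transcript for illustration, not on the original slide), the sample space and the event can be written down directly in R:

# Sample space for two coin tosses, and the event "first toss is heads"
omega <- c("HH", "HT", "TH", "TT")
A <- omega[substr(omega, 1, 1) == "H"]   # "HH" "HT"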
Probability Theory
(Wasserman, 2004)
• A ∩ B denotes intersection: “A and B”
• A ∪ B denotes union: “A or B”
Probability Theory
(Wasserman, 2004)
P is a probability measure when the following axioms
are satisfied:
1. Probabilities are never negative:
   P(A) ≥ 0
2. Probabilities add to 1:
   P(Ω) = 1
3. The probability of the union of non-overlapping (disjoint) events is the sum of their probabilities:
   P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
Probability Theory
(Wasserman, 2004)
For any events A and B:
P(A ∪ B) = P(A) + P(B) − P(A, B)
[Venn diagram: overlapping events A and B within the sample space Ω]
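A quick numerical check of this identity, using the two-coin sample space from before (an illustration added here, not on the slide):

# P(A or B) = P(A) + P(B) - P(A, B), with four equally likely outcomes
omega <- c("HH", "HT", "TH", "TT")
p <- function(E) length(E) / length(omega)   # probability of an event E
A <- c("HH", "HT")                           # first toss heads
B <- c("HH", "TH")                           # second toss heads
p(union(A, B)) == p(A) + p(B) - p(intersect(A, B))   # TRUE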
Conditional Probability
The conditional probability of A given B is
P(A | B) = P(A, B) / P(B)
[Venn diagram: events A and B within Ω]
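For instance, in the two-coin example (a check added here, not on the slide), the probability that the first toss is heads given that the second toss is heads is:

p_AB <- 1/4    # P(first and second toss heads) = P({HH})
p_B  <- 2/4    # P(second toss heads) = P({HH, TH})
p_AB / p_B     # P(A | B) = 0.5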
Conditional Probability
You will often encounter this as
P(A | B) P(B) = P(A, B)
[Venn diagram: events A and B within Ω]
Conditional Probability
From
P(A, B) = P(A | B) P(B)
and
P(A, B) = P(B | A) P(A)
follows Bayes’ rule.
Bayes’ Rule
P(A | B) = P(B | A) P(A) / P(B)
The Law of
Total Probability
Let A1,…,Ak be a partition of Ω. Then, for any event B:
P(B) = Σ_{i=1}^k P(B | Ai) P(Ai)
The Law of
Total Probability
This is just a weighted average of the conditional probabilities P(B | Ai), with the probabilities of the disjoint sets A1,…,Ak as weights. For instance, when all P(Ai) are equal, the equation becomes:
P(B) = (1/k) Σ_{i=1}^k P(B | Ai)
Bayes’ Rule Revisited
P(Ai | B) = P(B | Ai) P(Ai) / Σ_{j=1}^k P(B | Aj) P(Aj)
Example
(Wasserman, 2004)
• I divide my Email into three categories: “spam”, “low priority”, and “high priority”.
• Previous experience suggests that the a priori probabilities of a random Email belonging to these categories are .7, .2, and .1, respectively.
Example
(Wasserman, 2004)
• The probabilities of the word “free” occurring in the three categories are .9, .01, and .01, respectively.
• I receive an Email with the word “free”. What is the probability that it is spam?
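The slide leaves the computation to the audience; here is a sketch of the arithmetic in R, combining Bayes’ rule with the law of total probability:

prior <- c(spam = 0.7, low = 0.2, high = 0.1)    # a priori category probabilities
like  <- c(spam = 0.9, low = 0.01, high = 0.01)  # P("free" | category)
post  <- like * prior / sum(like * prior)        # Bayes' rule; the denominator is the total probability of "free"
round(post["spam"], 3)                           # about 0.995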
Outline
• A bit of probability theory
• Bayesian foundations
• Parameter estimation: A simple example
• WinBUGS and R2WinBUGS
The Bayesian Agenda
• Bayesians use probability to quantify uncertainty or “degree of belief” about parameters and hypotheses.
• Prior knowledge for a parameter θ is updated through the data to yield the posterior knowledge.
The Bayesian Agenda
P(θ | D) = P(D | θ) P(θ) / P(D),  where  P(D) = ∫ P(D | θ) P(θ) dθ
Also note that this equation allows one to learn, from the
probability of what is observed, something about what is
not observed.
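A minimal sketch of how this works numerically (added for illustration; the data are hypothetical): on a grid of θ values the integral in the denominator becomes a sum, so the posterior is just likelihood times prior, renormalized.

theta <- seq(0, 1, length.out = 1001)             # grid of parameter values
prior <- rep(1 / length(theta), length(theta))    # uniform prior on the grid
like  <- dbinom(7, size = 12, prob = theta)       # hypothetical data: 7 successes in 12 trials
post  <- like * prior / sum(like * prior)         # P(theta | D); the sum approximates P(D)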
The Bayesian Agenda
• But why would one measure “degree of belief” by means of probability? Couldn’t we choose something else that makes sense?
• Yes, perhaps we could, but the choice of probability is anything but ad hoc.
The Bayesian Agenda
• Assume “degree of belief” can be measured by a single number.
• Assume you are rational, that is, not self-contradictory or “obviously silly”.
• Then degree of belief can be shown to follow the same rules as the probability calculus.
The Bayesian Agenda
• For instance, a rational agent would not hold intransitive beliefs, such as:
Bel(A) > Bel(B)
Bel(B) > Bel(C)
Bel(C) > Bel(A)
The Bayesian Agenda
• When you use a single number to measure uncertainty or quantify evidence, and these numbers do not follow the rules of probability calculus, you can (almost certainly?) be shown to be silly or incoherent.
• One of the theoretical attractions of the Bayesian paradigm is that it ensures coherence right from the start.
Coherence Example
à la De Finetti
• There exists a ticket that says “If the French national soccer team wins the 2010 World Cup, this ticket pays $1.”
• You must determine the fair price for this ticket.
• After you set the price, I can choose either to sell the ticket to you or to buy the ticket from you. This is similar to how you would divide a pie according to the rule “you cut, I choose”.
• Please write this number down; you are not allowed to change it later!
Coherence Example
à la De Finetti
• There exists another ticket that says “If the Spanish national soccer team wins the 2010 World Cup, this ticket pays $1.”
• You must again determine the fair price for this ticket.
Coherence Example
à la De Finetti
• There exists a third ticket that says “If either the French or the Spanish national soccer team wins the 2010 World Cup, this ticket pays $1.”
• What is the fair price for this ticket?
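The point of the exercise: the two events are mutually exclusive, so a coherent price for the third ticket is the sum of the first two prices. If your prices violate this, I can trade against you and make a guaranteed profit (a Dutch book). A small R sketch with made-up, incoherent prices:

prices <- c(france = 0.20, spain = 0.15, either = 0.45)   # hypothetical prices; 0.45 > 0.20 + 0.15
# I sell you the "either" ticket and buy your "france" and "spain" tickets.
outcomes <- c("france", "spain", "other")
my_profit <- sapply(outcomes, function(w) {
  pay_either <- as.numeric(w %in% c("france", "spain"))              # what I pay out on the ticket I sold
  pay_sep    <- as.numeric(w == "france") + as.numeric(w == "spain") # what the tickets I bought pay me
  unname(prices["either"]) - pay_either + pay_sep - unname(prices["france"] + prices["spain"])
})
my_profit   # 0.10 in every outcome: a sure gain for me, whatever happens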
Bayesian Foundations
• Bayesians use probability to quantify uncertainty or “degree of belief” about parameters and hypotheses.
• Prior knowledge for a parameter θ is updated through the data to yield posterior knowledge.
• This happens through the use of probability calculus.
Bayes’ Rule
P(θ | D) = P(D | θ) P(θ) / P(D)
Here P(θ | D) is the posterior distribution, P(D | θ) is the likelihood, P(θ) is the prior distribution, and P(D) is the marginal probability of the data.
Bayesian Foundations
P(θ | D) = P(D | θ) P(θ) / P(D),  where  P(D) = ∫ P(D | θ) P(θ) dθ
This equation allows one to learn, from the
probability of what is observed, something about what is
not observed. Bayesian statistics was long known as
“inverse probability”.
Nuisance Variables
• Suppose θ is the mean of a normal distribution, and α is the standard deviation.
• You are interested in θ, but not in α.
• Using the Bayesian paradigm, how can you go from P(θ, α | x) to P(θ | x)? That is, how can you get rid of the nuisance parameter α? Show how this involves P(α).
Nuisance Variables
P(θ | x) = ∫ P(θ, α | x) dα
         = ∫ P(θ | α, x) P(α | x) dα
         ∝ ∫ P(θ | α, x) P(x | α) P(α) dα
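A rough numerical illustration of integrating out α on a grid (not from the slides; the data and the flat priors are assumptions made purely for this sketch):

x <- c(0.1, -0.4, 0.8)                             # hypothetical data
theta <- seq(-3, 3, length.out = 61)               # grid for the mean
alpha <- seq(0.2, 3, length.out = 50)              # grid for the standard deviation
joint <- outer(theta, alpha, Vectorize(function(t, a)
  prod(dnorm(x, mean = t, sd = a))))               # likelihood times flat prior, unnormalized
joint <- joint / sum(joint)                        # P(theta, alpha | x) on the grid
post_theta <- rowSums(joint)                       # P(theta | x): alpha summed (integrated) out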
Predictions
• Suppose you observe data x, and you use a model with parameter θ.
• What is your prediction for new data y, given that you’ve observed x? In other words, show how you can obtain P(y | x).
Predictions
P(y | x) = ∫ P(y | θ, x) P(θ | x) dθ
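A minimal Monte Carlo sketch of this integral (added for illustration; the Beta posterior and the binomial model for the new data are assumptions, not taken from the slide):

set.seed(123)
theta_draws <- rbeta(10000, 3, 2)                      # draws from an assumed posterior P(theta | x)
y_new <- rbinom(10000, size = 5, prob = theta_draws)   # one draw from P(y | theta, x) per theta draw
table(y_new) / length(y_new)                           # approximates the predictive distribution P(y | x)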
Want to Know More?
Outline
• A bit of probability theory
• Bayesian foundations
• Parameter estimation: A simple example
• WinBUGS and R2WinBUGS
Bayesian Parameter
Estimation: Example
• We prepare for you a series of 10 factual true/false questions of equal difficulty.
• You answer 9 out of 10 questions correctly.
• What is your latent probability θ of answering any one question correctly?
Bayesian Parameter
Estimation: Example
• We start with a prior distribution for θ. This reflects all we know about θ prior to the experiment. Here we make a standard choice and assume that all values of θ are equally likely a priori.
Bayesian Parameter
Estimation: Example
• We then update the prior distribution by means of the data (technically, the likelihood) to arrive at a posterior distribution.
The Likelihood
• We use the binomial model, in which P(D | θ) is given by
P(D | θ) = (n choose s) θ^s (1 − θ)^(n − s),
where n = 10 is the number of trials and s = 9 is the number of successes.
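In R this likelihood is available directly as dbinom (a one-line illustration, not on the slide):

theta <- seq(0, 1, by = 0.01)
likelihood <- dbinom(9, size = 10, prob = theta)   # choose(10, 9) * theta^9 * (1 - theta)^1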
Bayesian Parameter
Estimation: Example
• The posterior distribution is a compromise between what we knew before the experiment (i.e., the prior) and what we have learned from the experiment (i.e., the likelihood). The posterior distribution reflects all that we know about θ.
[Figure: analytical posterior distribution for θ; mode = 0.9, 95% credible interval (0.59, 0.98)]
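For this example the posterior is available analytically: a uniform Beta(1, 1) prior combined with 9 successes out of 10 gives a Beta(10, 2) posterior. A short check in R (added here; the numbers match the figure):

post_mode <- (10 - 1) / (10 + 2 - 2)           # 0.9
post_int  <- qbeta(c(0.025, 0.975), 10, 2)     # roughly (0.59, 0.98)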
Bayesian Parameter
Estimation: Example
• Sometimes it is difficult or impossible to obtain the posterior distribution analytically.
• In this case, we can use Markov chain Monte Carlo algorithms to sample from the posterior. As the number of samples increases, the approximation to the analytical posterior becomes arbitrarily good.
[Figure: MCMC approximation to the posterior; mode = 0.89, 95% credible interval (0.59, 0.98). With 9000 samples, the result is almost identical to the analytical one.]
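A minimal random-walk Metropolis sketch that draws 9000 samples from this posterior (an illustration added here; WinBUGS uses its own, more sophisticated samplers):

log_post <- function(theta) {                    # log posterior, up to a constant
  if (theta <= 0 || theta >= 1) return(-Inf)     # uniform prior: zero outside (0, 1)
  dbinom(9, size = 10, prob = theta, log = TRUE)
}
set.seed(1)
n_samples <- 9000
chain <- numeric(n_samples)
chain[1] <- 0.5                                  # arbitrary starting value
for (i in 2:n_samples) {
  proposal <- chain[i - 1] + rnorm(1, sd = 0.1)  # symmetric random-walk proposal
  if (log(runif(1)) < log_post(proposal) - log_post(chain[i - 1])) {
    chain[i] <- proposal                         # accept
  } else {
    chain[i] <- chain[i - 1]                     # reject: keep the current value
  }
}
quantile(chain, c(0.025, 0.975))                 # close to the analytical (0.59, 0.98)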
Outline
• A bit of probability theory
• Bayesian foundations
• Parameter estimation: A simple example
• WinBUGS and R2WinBUGS
WinBUGS
Bayesian inference Using Gibbs Sampling
You want to have this installed (plus the registration key).
WinBUGS
• Knows many probability distributions (likelihoods);
• Allows you to specify a model;
• Allows you to specify priors;
• Will then automatically run the MCMC sampling routines and produce output.
Want to Know More
About MCMC?
Models in WinBUGS
• The models you can specify in WinBUGS are directed acyclic graphs (DAGs).
Models in WinBUGS
(Spiegelhalter, 1998)
Below, E depends only on C
[DAG: A and B are root nodes; C and D each have parents A and B; E has parent C]
Models in WinBUGS
(Spiegelhalter, 1998)
If the nodes are stochastic, the joint
distribution factorizes…
Models in WinBUGS
(Spiegelhalter, 1998)
P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | A, B) P(E | C)
Models in WinBUGS
(Spiegelhalter, 1998)
This means we can sometimes perform
“local” computations to get what we want
Models in WinBUGS
(Spiegelhalter, 1998)
What is P(C|A,B,D,E)?
Models in WinBUGS
(Spiegelhalter, 1998)
P(C | A, B, D, E) is proportional to P(C | A, B) P(E | C), so D is irrelevant.
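A tiny numerical check of this claim (the conditional probability tables are made up for illustration): conditioning on A, B, and E, the distribution of C follows from P(C | A, B) and P(E | C) alone, so D never enters.

# Condition on A = 1, B = 1, E = 1; C is binary.
p_C_given_AB <- c(C0 = 0.3, C1 = 0.7)    # P(C | A = 1, B = 1), illustrative numbers
p_E_given_C  <- c(C0 = 0.2, C1 = 0.9)    # P(E = 1 | C), illustrative numbers
unnorm <- p_C_given_AB * p_E_given_C     # P(C | A, B) * P(E | C)
unnorm / sum(unnorm)                     # P(C | A, B, D, E), whatever the value of D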
WinBUGS & R
• WinBUGS produces MCMC samples.
• We want to analyze the output in a nice program, such as R.
• This can be accomplished using the R package “R2WinBUGS”.
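A hedged sketch of what such an analysis looks like for the binomial example above (this assumes WinBUGS and the R2WinBUGS package are installed; the file and variable names are illustrative, not prescribed by the course):

library(R2WinBUGS)

# Minimal BUGS model: binomial likelihood with a uniform prior on the rate
model_string <- "
model {
  k ~ dbin(theta, n)
  theta ~ dbeta(1, 1)
}"
writeLines(model_string, "rate_model.txt")

data_list <- list(k = 9, n = 10)
inits     <- function() list(theta = runif(1))

fit <- bugs(data = data_list, inits = inits,
            parameters.to.save = "theta",
            model.file = "rate_model.txt",
            n.chains = 3, n.iter = 10000)
print(fit)   # posterior summaries and MCMC samples for theta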
End of Class 1