Introduction to Bayesian Statistics
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland
Resources
• Albert, Jim (2007) Bayesian Computation with R, Springer.
• Ntzoufras, Ioannis (2009) Bayesian Modeling Using WinBUGS, Wiley.
• Kéry, Marc (2010) Introduction to WinBUGS for Ecologists, Academic Press.
Topics
• Probability
• Bayes’ Theorem
• Bayesian Statistics
Basic Definitions
• Suppose Ω is a sample space—the set of outcomes, ω, of an
experiment—for example the possible results of flipping a coin or
rolling a die.
• Suppose F is the collection of possible events (subsets of Ω) involving
outcomes in Ω, including:
– An empty event, Ø, with no outcomes belonging to it.
– Simple events consisting of single outcomes.
– Complex events consisting of multiple alternative outcomes (e.g., rolling
an even number on a six-sided die).
• An event in F that combines the outcomes in two other events, A and
B, is called the union of A and B and is written A∪B.
• An event in F made up of the outcomes present in both A and B is
known as the intersection of A and B and is written A∩B.
Probability Measures
• A probability measure, P, is a function that assigns to each event in F a real number between 0 and 1, called the probability of the event, and that satisfies the following requirements:
– P(Ω) = 1
– If there are two disjoint events, A and B in F—that
is, A∩B = Ø—then P(A∪B) = P(A)+P(B). (This rule
must also be true for any countable number of
pairwise disjoint events.)
Conditional Probability
• The “conditional probability of B given A”, written
P(B|A), describes the probability of an outcome
being in B given that it is known to be in A.
• P(B|A) = P(A∩B)/P(A).
• For example, let A be even die rolls of a fair six-sided die, and B be die rolls that are a multiple of three.
• A={2,4,6}, B={3,6}, A∩B={6}, and P(B|A) =
P(A∩B)/P(A)=(1/6)/(1/2) = 1/3.
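• This can also be checked by simulation in R (a minimal sketch, not part of the original slides):

set.seed(1)
rolls <- sample(1:6, 100000, replace = TRUE)   # simulate rolls of a fair six-sided die
A <- rolls %% 2 == 0                           # even rolls
B <- rolls %% 3 == 0                           # multiples of three
mean(B[A])                                     # estimate of P(B|A); close to 1/3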
Probability Models
• The triple <Ω, F, P> is called a probability
model.
• Some theorems can be easily proven:
1. Let Ø be an event in which there are no outcomes. Then P(Ø) = 0.
2. Define ¬A to be an event consisting of all the
outcomes not in A. Then P(¬A) = 1 – P(A).
3. If A∩B = Ø, P(A∪B) = P(A)+P(B).
Bayes’ Theorem
• Bayes’ theorem is a provable consequence of these axioms (Wikipedia):
– P(A|B) = P(B|A)P(A)/P(B)
• That is, the probability of A given B is the probability of B given A multiplied by the probability of A and divided by the probability of B.
• Also, P(A|B) ∝ P(B|A)P(A)
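• As a quick numerical check (not on the original slides), Bayes’ theorem can be verified in R with the earlier die events, A = even rolls and B = multiples of three:

p_A  <- 3/6                              # P(even) = 1/2
p_B  <- 2/6                              # P(multiple of three) = 1/3
p_AB <- 1/6                              # P(A∩B) = P({6})
p_B_given_A <- p_AB / p_A                # 1/3, from the conditional-probability slide
p_A_given_B <- p_B_given_A * p_A / p_B   # Bayes’ theorem
p_A_given_B                              # 1/2, agreeing with P(A∩B)/P(B) computed directly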
What Are Bayesian Statistics?
• Bayesian statistics are the working out of the
implications of Bayes’ Theorem.
• They allow you to deduce the posterior (afterwards) probability distribution of an event if you know the prior (before) probability distribution of the event and have some additional information.
• It’s a theorem, so it is always true.
Why is Bayes’ Theorem Useful?
• If the prior probability distribution is ‘vague’ or ‘noninformative’, you can incrementally add information to produce a posterior distribution that reflects just that information. That posterior distribution is very similar to the distribution you would come up with using classical statistics.
• If you start with real information in your prior, that
is also taken into account, which is even more
useful.
Density Functions
• You often have a ‘density’ function that is a good model of how events are distributed. A few typical density functions (evaluated in the R sketch after this list) include:
1. The binomial distribution (one parameter, the probability of a ‘heads’, once the number of trials is fixed)
2. The Poisson distribution (one parameter, the mean number
of occurrences in a unit time interval)
3. The exponential distribution (one parameter, the rate)
4. The normal distribution (two parameters, the mean and the
variance)
5. The uniform distribution (two parameters, the beginning
and the end)
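• Each of these densities is built into R; a short sketch evaluating one value of each (the parameter values are arbitrary illustrations):

dbinom(7, size = 10, prob = 0.5)   # binomial: 7 heads in 10 tosses with P(heads) = 0.5
dpois(3, lambda = 2)               # Poisson: 3 occurrences when the mean per interval is 2
dexp(1.5, rate = 2)                # exponential with rate 2, evaluated at 1.5
dnorm(0.5, mean = 0, sd = 1)       # normal with mean 0 and variance 1
dunif(0.3, min = 0, max = 1)       # uniform on [0, 1]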
Likelihood
• Suppose you have an event, ω, drawn from a
process described by a density. The probability
of the event is then the value of the density for
that event.
• This is the ‘likelihood’ of that event.
• If you have multiple samples, the corresponding
likelihood is the product of the density values
for each of the events.
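• A minimal R sketch of this product, assuming (purely for illustration) five counts drawn from a Poisson distribution with mean 2:

x <- c(1, 3, 2, 0, 2)                      # hypothetical observed counts
prod(dpois(x, lambda = 2))                 # likelihood of the whole sample
exp(sum(dpois(x, lambda = 2, log = TRUE))) # same value; summing log densities avoids underflow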
Maximum Likelihood
• Suppose you have a probability density function, f(x, θ), where θ is a parameter, such as the mean, that you want to estimate. If you have n data samples, x1, …, xn, the most likely value of θ is the one that maximizes the value of the likelihood.
• Mathematically, the likelihood is L(θ) = Π f(xn, θ), the joint density of the sample: the product of the f(xn, θ) values over all n samples.
• You can often use calculus to find the maximizing θ.
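• Numerically (a sketch, not from the original slides), the same Poisson likelihood can be maximized with R’s optimize(); for the Poisson, calculus gives the sample mean, and the numerical answer agrees:

x <- c(1, 3, 2, 0, 2)                                   # hypothetical counts, as above
loglik <- function(theta) sum(dpois(x, lambda = theta, log = TRUE))
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum   # ≈ mean(x) = 1.6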
Bayes and Maximum Likelihood
• Suppose you have a prior distribution, f(θ), and
some data, described by a likelihood function,
li(data|θ). The posterior distribution, f(θ|data),
can be calculated by applying Bayes’ Theorem.
– f(θ|data) ∝ li(data|θ) f(θ)
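• A minimal grid sketch of this proportionality in R, assuming the same hypothetical Poisson counts and a flat prior on θ (both are illustrative choices, not from the slides):

x     <- c(1, 3, 2, 0, 2)                 # hypothetical counts
theta <- seq(0.01, 10, by = 0.01)         # grid of candidate parameter values
prior <- rep(1, length(theta))            # flat (‘noninformative’) prior
lik   <- sapply(theta, function(t) prod(dpois(x, lambda = t)))
post  <- lik * prior                      # posterior ∝ likelihood × prior
post  <- post / sum(post * 0.01)          # normalize so it integrates to 1 over the grid
theta[which.max(post)]                    # posterior mode ≈ 1.6, the maximum-likelihood estimate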
Worked Example
• 51 smokers in 83 cases of lung cancer
• 23 smokers in 70 disease-free controls
• P(smoker|case) = 51/83
• P(smoker|control) = 23/70
• P(case|smoker) = P(smoker|case)P(case)/
(P(smoker|case)P(case)+P(smoker|control)P(control))
• Relative risk = RR = P(case|smoker)/
P(case|nonsmoker) = 1.87
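• The calculation can be reproduced in R; a sketch assuming equal prior probabilities for case and control (the slide does not state the prior, and the relative-risk value depends on that choice):

p_s_case    <- 51/83         # P(smoker | case)
p_s_control <- 23/70         # P(smoker | control)
p_case      <- 0.5           # assumed prior probability of being a case; P(control) = 1 - p_case
p_case_smoker    <- p_s_case * p_case /
                    (p_s_case * p_case + p_s_control * (1 - p_case))
p_case_nonsmoker <- (1 - p_s_case) * p_case /
                    ((1 - p_s_case) * p_case + (1 - p_s_control) * (1 - p_case))
p_case_smoker / p_case_nonsmoker   # relative risk under the assumed prior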