PSI Bayes Course My slides used in the Introduction to


Applied Bayesian Methods
Phil Woodward
Phil Woodward 2014
Introduction to
Bayesian Statistics
Inferences via Sampling Theory
• Inferences made via sampling distribution of statistics
– A model with unknown parameters is assumed
– Statistics (functions of the data) are defined
– These statistics are in some way informative about the parameters
– For example, they may be unbiased, minimum variance estimators
• Probability is the frequency with which recurring events occur
– The recurring event is the statistic for fixed parameter values
– The probabilities arise by considering data other than those actually seen
– Need to decide on the most appropriate “reference set”
– Confidence intervals and p-values are p(data “or more extreme” | θ) calculations
• Difficulties when making inferences
– Nuisance parameters are an issue when there are no suitable sufficient statistics
– Constraints in the parameter space cause difficulties
– Confidence intervals and p-values are routinely misinterpreted
• They are not p(θ | data) calculations
How does Bayes add value?
• Informative Prior
– Natural approach for incorporating information already available
– Smaller, cheaper, quicker and more ethical studies
– More precise estimates and more reliable decisions
– Sometimes weakly informative priors can overcome model fitting failure
• Probability as a “degree of belief”
– Quantifies our uncertainty in any unknown quantity or event
– Answers questions of direct scientific interest
• P(state of world | data) rather than P(data* | state of world)
• Model building and making inferences
– Nuisance parameters no longer a “nuisance”
– Random effects, non-linear terms, complex models all handled better
– Functions of parameters estimated with ease
– Predictions and decision analysis follow naturally
– Transparency in assumptions
• Beauty in its simplicity!
– p(θ | x) = p(x | θ) p(θ) / p(x)
– Avoids issue of identifying “best” estimators and their sampling properties
– More time spent addressing issues of direct scientific relevance
Probability
• Most Bayesians treat probability as a measure of belief
– Some believe probabilities can be objective (not discussed here)
– Probability not restricted to recurring events
• E.g. probability it will rain tomorrow is a Bayesian probability
– Probabilities lie between 0 (impossible event) and 1 (certain event)
– Probabilities between 0 and 1 can be calibrated via the “fair bet”
• What is a “fair bet”?
– Bookmaker sells a bet by stating the odds for or against an event
– Odds are set to encourage a punter to buy the bet
• E.g. odds of 2-to-1 against means that for each unit staked two are won, plus the stake
– A fair bet is when one is indifferent to being bookmaker or punter
• i.e. one doesn’t believe either side has an unfair advantage in the gamble
Probability
• Relationship between odds and probability
– One-to-one mapping between odds (O) and probability (P): P = O / (1 + O)
where O equals the ratio X/Y for odds of X-to-Y in favour
and the ratio Y/X for odds of X-to-Y against an event
e.g. odds of 2-to-1 against, if fair, give O = ½ and so imply probability equals ⅓
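The mapping between odds and probability can be checked with exact fractions. This is a minimal Python sketch. It assumes P = O / (1 + O), with O defined as in the bullet above, and the function names are illustrative only.

```python
from fractions import Fraction


def odds_to_probability(x: int, y: int, against: bool = False) -> Fraction:
    """Probability implied by fair odds of x-to-y, in favour of or against an event."""
    # O = X/Y for odds in favour, Y/X for odds against; then P = O / (1 + O)
    o = Fraction(y, x) if against else Fraction(x, y)
    return o / (1 + o)


def probability_to_odds(p: Fraction) -> Fraction:
    """Inverse mapping, giving odds in favour: O = P / (1 - P)."""
    return p / (1 - p)


print(odds_to_probability(2, 1, against=True))  # 2-to-1 against -> 1/3
print(probability_to_odds(Fraction(1, 3)))      # -> 1/2, i.e. 1-to-2 in favour
```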
• Probabilities defined this way are inevitably subjective
– People with different knowledge may have different probabilities
– Controversy occurs when using this definition to interpret data
– Science should be “objective”, so “subjectivity” to some is heresy
– But where do the models that Frequentists use come from?
– Are the decisions made when designing studies purely objective?
– Is judgment needed when generalising from a sample to a population?
Probability
• Subjectivity does not mean biased, prejudiced or unscientific
– Large body of research into elicitation of personal probabilities
– Where a frequency interpretation applies, it should support these beliefs
• E.g. the probability of the next roll of a die coming up a six should be ⅙ for everyone
unless you have good reason to doubt the die is fair
– An advantage of the Bayesian definition is that it allows all other
information to be taken into account
• E.g. you may suspect the person offering a bet on the die roll is of dubious character
• Bayesians are better equipped to win at poker than Frequentists!
• All unknown quantities, including parameters, are considered random variables (epistemic uncertainty)
– each parameter still has only one true value
– our uncertainty in this value is represented by a probability distribution
Exchangeability
• Exchangeability is an important Bayesian concept
– exchangeable quantities cannot be partitioned into more similar sub-groups
– nor can they be ordered in a way that implies we can distinguish between them
– exchangeability is often used to justify prior distributions for parameters analogous to classical random effects
The Bayesian Paradigm
From
Pr(A | B) = Pr(A, B) / Pr(B)
and
Pr(B | A) = Pr(A, B) / Pr(A)
comes Bayes Theorem
Pr(A | B) = Pr(A) Pr(B | A) / Pr(B)
Nothing controversial yet.
The Bayesian Paradigm
How is Bayes Theorem (mis)used?
Coin tossing study: Is the coin fair?
Model
rᵢ ~ bern(π), i = 1, 2, ..., n
rᵢ = 1 if the ith toss is a head, 0 if a tail
Let terms in Bayes Theorem be
A = π (controversial; why?)
B = r
then
p(π | r) = p(π) p(r | π) / p(r)
The Bayesian Paradigm
What are these terms?
p(r | π) is the likelihood = bin(n, Σr | π) (not controversial)
p(π) is the prior = ??? (controversial)
The prior formally represents our knowledge of π before observing r
The Bayesian Paradigm
What are these terms (continued)?
p(r) is the normalising constant
= ∫ p(r | π) p(π) dπ
(the difficult bit! In general, though not in this particular case: MCMC to the rescue!)
p(π | r) is the posterior
The posterior formally represents our knowledge of π after observing r
The Bayesian Paradigm
A worked example.
Coin tossed 5 times giving 4 heads and 1 tail
(What if the data were 5 dogs in a tox study: 4 OK, 1 with an AE?)
p(r | π) = bin(n = 5, Σr = 4 | π)
p(π) = beta(a, b); when a = b = 1, beta(1, 1) ≡ U(0, 1)
Why choose a beta distribution?!
- conjugacy … posterior p(π | r) = beta(a + Σr, b + n - Σr)
- can represent vague belief?
- can be an objective reference?
- Beta family is flexible (could be informative) ...but is a stronger prior justifiable?
The Bayesian Paradigm
A worked example (continued).
Applying Bayes theorem
p(π|r) = beta(5, 2)
95% credible interval
π : (0.36 to 0.96)
Pr[π ∈ (0.36 to 0.96) | Σr = 4] = 0.95
95% confidence interval
π : (0.28 to 0.995)
Pr[Σr ≥ 4 | π = 0.28] = 0.025, Pr[Σr ≤ 4 | π = 0.995] = 0.025
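Because the beta prior is conjugate, the worked example can be reproduced without MCMC. This Python sketch uses only the standard library. It assumes integer beta parameters, so that the beta cdf can be written as a binomial tail sum, and the bisection helper and function names are hypothetical. It computes three things:

- the beta(5, 2) posterior;
- its equal-tailed 95% credible interval;
- the exact (Clopper-Pearson) 95% confidence interval, defined by the two tail conditions above.

```python
from math import comb


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """beta(a, b) cdf for integer a, b >= 1: I_x(a, b) = Pr[Bin(a + b - 1, x) >= a]."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Pr[X >= k] for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def solve_increasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(x) = target, with f increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, heads, a, b = 5, 4, 1, 1                  # data and beta(1, 1) = U(0, 1) prior
a_post, b_post = a + heads, b + n - heads     # conjugate update -> beta(5, 2)

cred = (solve_increasing(lambda x: beta_cdf_int(x, a_post, b_post), 0.025),
        solve_increasing(lambda x: beta_cdf_int(x, a_post, b_post), 0.975))

# Clopper-Pearson: Pr[sum r >= 4 | pi] = 0.025 and Pr[sum r <= 4 | pi] = 0.025
ci_lower = solve_increasing(lambda p: binom_upper_tail(heads, n, p), 0.025)
ci_upper = solve_increasing(lambda p: binom_upper_tail(heads + 1, n, p), 0.975)

print(f"posterior beta({a_post}, {b_post})")
print(f"95% credible interval:   ({cred[0]:.2f} to {cred[1]:.2f})")
print(f"95% confidence interval: ({ci_lower:.2f} to {ci_upper:.3f})")
```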
The Bayesian Paradigm
Bayesian inference for simple Normal model
Clinical study: What’s the mean response to placebo?
Model
yᵢ ~ N(µ, σ²), i = 1, 2, ..., n (placebo subjects only)
assume σ known and for convenience use the
precision parameter τ = σ⁻² (reciprocal of variance)
Terms in Bayes Theorem are
p(µ | y) = p(µ) p(y | µ) / p(y)
The Bayesian Paradigm
Improper
prior density
The Bayesian Paradigm
Posterior precision equals sum of prior and data precisions
Posterior mean equals weighted mean of prior and data
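The two results above can be written out for a Normal prior on µ with σ known. This is a minimal sketch. It assumes a prior N(prior_mean, 1/prior_prec) and n observations with mean ȳ. The numbers are hypothetical and are not the worked example's data. They are chosen so that σ²/n = 0.5, which with a flat prior (precision 0) gives a N(80, 0.5) posterior like the one in the worked example.

```python
from math import sqrt
from statistics import NormalDist


def normal_update(prior_mean, prior_prec, ybar, n, tau):
    """Posterior (mean, precision) for mu given n observations with mean ybar,
    known precision tau = 1/sigma^2, and a conjugate Normal prior."""
    data_prec = n * tau                   # precision of ybar is n / sigma^2
    post_prec = prior_prec + data_prec    # precisions add
    # precision-weighted mean of the prior mean and the data mean
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, post_prec


def credible_interval(mean, prec, level=0.95):
    """Equal-tailed credible interval for a Normal posterior."""
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / sqrt(prec)
    return mean - half_width, mean + half_width


# Hypothetical: n = 8, ybar = 80, sigma = 2 (sigma^2 / n = 0.5)
informative = normal_update(prior_mean=70.0, prior_prec=1 / 10**2, ybar=80.0, n=8, tau=1 / 2**2)
vague = normal_update(prior_mean=0.0, prior_prec=0.0, ybar=80.0, n=8, tau=1 / 2**2)
print("informative prior:", informative)
print("vague prior:", vague, credible_interval(*vague))
```

With the vague prior, the credible interval coincides with the classical ȳ ± 1.96σ/√n interval.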
The Bayesian Paradigm
The Bayesian Paradigm
A worked example (continued).
Applying Bayes theorem
p(µ | y) = N(80, 0.5)
95% credible interval
µ : (78.6 to 81.4)
95% confidence interval
µ : (78.6 to 81.4)
The Bayesian Paradigm
Bayesian inference for simple Normal model
The case when both mean and variance are unknown
Model
yᵢ ~ N(µ, σ²), i = 1, 2, ..., n
Terms in Bayes Theorem are
p(µ, τ | y) = p(µ, τ) p(y | µ, τ) / p(y)
The Bayesian Paradigm
The Bayesian Paradigm
The Bayesian Paradigm
Bayesian inference for Normal Linear Model
Model
y = Xθ + ε
εᵢ ~ N(0, σ²), i = 1, 2, ..., n
y and ε are n × 1 vectors of observations and errors
X is an n × k matrix of known constants
θ is a k × 1 vector of unknown regression coefficients
Terms in Bayes Theorem are
p(θ, τ | y) = p(θ, τ) p(y | θ, τ) / p(y)
The Bayesian Paradigm
The Bayesian Paradigm
In summary, for Normal Linear Model (“fixed effects”)
Classical confidence intervals can be interpreted
as Bayesian credible intervals
But we need to be aware of the implicit prior distributions
Not generally the case for other error distributions
But for “large samples” when likelihood based estimator
has approximate Normal distribution, a Bayesian
interpretation can again be made
“Random effects” models are not so easily compared
Don’t assume classical results have Bayesian interpretation
The Bayesian Paradigm
Conditional (on µ) distribution for a future response: yf | µ ~ N(µ, σ²)
Posterior distribution for µ: N(µ₁, 1/τ₁)
Predictive distribution: yf ~ N(µ₁, 1/τ₁ + 1/τ)
i.e. the variance is the sum of the posterior variance of µ and the conditional variance of yf
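The predictive result can be checked by simulation. Draw µ from its posterior, then draw yf given µ, and compare the sample variance with 1/τ₁ + 1/τ. This Monte Carlo sketch uses hypothetical values: a posterior N(80, 0.5) and a residual sd of 2.

```python
import random
from statistics import fmean


def predictive_draws(mu1, tau1, tau, n_draws=200_000, seed=1):
    """Composition sampling: mu ~ N(mu1, 1/tau1) from the posterior, then
    yf | mu ~ N(mu, 1/tau); the yf values are draws from the predictive distribution."""
    rng = random.Random(seed)
    sd_post, sd_resp = (1 / tau1) ** 0.5, (1 / tau) ** 0.5
    return [rng.gauss(rng.gauss(mu1, sd_post), sd_resp) for _ in range(n_draws)]


# Hypothetical: posterior for mu is N(80, 0.5) (tau1 = 2), residual sd 2 (tau = 0.25)
yf = predictive_draws(mu1=80.0, tau1=2.0, tau=0.25)
m_hat = fmean(yf)
v_hat = fmean((x - m_hat) ** 2 for x in yf)
print(f"mean {m_hat:.2f}, variance {v_hat:.2f}, theory {1 / 2.0 + 1 / 0.25:.2f}")
```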
The Bayesian Paradigm
Predictive Distributions
When are predictive distributions useful?
When designing studies (“design priors” must be informative)
– we predict the data using priors to assess the design
– we may use informative priors to reduce study size, these being predictions from historical studies
When undertaking interim analyses
– we can predict the remaining data using the current posterior
When checking adequacy of our assumed model
– model checking involves comparing observations with predictions
When making decisions after the study has completed
– we can predict future trial data to assess the probability of success, helping to determine the best strategy or decide to stop
Some argue predictive inferences should be our main focus
– be interested in observable rather than unobservable quantities
– e.g. how many patients will do better on this drug?
The Bayesian Paradigm
δ is treatment effect
The Bayesian Paradigm
The Bayesian Paradigm
The Bayesian Paradigm
Making Decisions
A simple Bayesian approach defines criteria of the form
Pr(δ ≥ Δ) > π
where Δ is an effect size of interest, and π is the
probability required to make a positive decision
For example, a Bayesian analogue of significance could be
Pr(δ > 0) > 0.95
But is believing δ > 0 enough for further investment?
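This sketch applies the decision rule when the posterior for δ is Normal, e.g. N(d, Vd) under a vague prior. The posterior values are hypothetical. They show that a posterior can satisfy Pr(δ > 0) > 0.95 yet fail Pr(δ ≥ Δ) > 0.95 for a larger Δ.

```python
from math import sqrt
from statistics import NormalDist


def prob_effect_at_least(Delta, post_mean, post_var):
    """Pr(delta >= Delta) for a Normal posterior N(post_mean, post_var)."""
    return 1 - NormalDist(post_mean, sqrt(post_var)).cdf(Delta)


def go_decision(Delta, pi, post_mean, post_var):
    """Positive decision if Pr(delta >= Delta) > pi."""
    return prob_effect_at_least(Delta, post_mean, post_var) > pi


# Hypothetical posterior for the treatment effect: N(3, 1.5^2)
for Delta in (0.0, 2.0):
    p = prob_effect_at_least(Delta, 3.0, 1.5**2)
    print(f"Pr(delta >= {Delta}) = {p:.3f}, GO = {go_decision(Delta, 0.95, 3.0, 1.5**2)}")
```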
END OF PART 1
intro to WinBUGS
illustrating fixed effect models
Bayesian Model Checking
Bayesian Model Checking
Brief outline of some methods easy to use with MCMC
Consider three model checking objectives
1. Examination of individual observations
2. Global tests of goodness-of-fit
3. Comparison between competing models
In all cases we compare observed statistics with
expectations, i.e. predictions conditional on a model
Bayesian Model Checking
yᵢ is the observation
Yᵢ is the prediction
E(Yᵢ) is the mean of the predictive distribution
Bayesian residuals can be examined as we do classical residuals (p-value concept)
Bayesian Model Checking
Ideally we would have a separate evaluation dataset
Predictive distribution for Yi is then independent of yi
Typically not available for clinical studies
Cross-validation next best, but difficult within WinBUGS
The following methods use the data twice, so they will be
conservative, i.e. overstate how well the model fits the data
We will illustrate using WinBUGS code for the simplest NLM
Bayesian Model Checking
(Examination of Individual Observations)
{
   ### Priors
   mu ~ dnorm(0, 1.0E-6)
   prec ~ dgamma(0.001, 0.001) ; sigma <- pow(prec, -0.5)
   ### Likelihood
   for (i in 1:N) {
      # more typically, each Y[i] has a different mean, mu[i]
      Y[i] ~ dnorm(mu, prec)
   }
   ### Model checking
   for (i in 1:N) {
      ### Residuals and Standardised Residuals
      # each residual has a distribution; use its mean as the residual
      resid[i] <- Y[i] - mu
      st.resid[i] <- resid[i] / sigma
      ### Replicate data set & Prob observation is extreme
      # Y.rep[i] is a prediction accounting for uncertainty in parameter
      # values, but not in the type of model assumed
      Y.rep[i] ~ dnorm(mu, prec)
      # mean of Pr.big[i] estimates Pr(Y.rep[i] <= Y[i]); near 1 means a future
      # observation is rarely this big (Pr.small[i] is the mirror image);
      # only need both when Y.rep[i] could exactly equal Y[i]
      Pr.big[i] <- step( Y[i] - Y.rep[i] )
      Pr.small[i] <- step( Y.rep[i] - Y[i] )
   }
}
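The same per-observation checks can be sketched in Python outside WinBUGS. The sketch makes two assumptions:

- it uses the standard noninformative-prior posterior for (µ, σ²) as a stand-in for the vague dnorm/dgamma priors;
- the data are hypothetical.

Pr.big follows the step() definition above, so it estimates Pr(Y.rep[i] ≤ Y[i]).

```python
import random
from statistics import fmean, variance


def posterior_draws(y, n_draws=4000, seed=2):
    """(mu, sigma) draws for y_i ~ N(mu, sigma^2) under p(mu, sigma^2) proportional to 1/sigma^2:
    sigma^2 | y ~ (n-1) s^2 / chi^2_(n-1), then mu | sigma, y ~ N(ybar, sigma^2 / n)."""
    rng = random.Random(seed)
    n, ybar, s2 = len(y), fmean(y), variance(y)
    draws = []
    for _ in range(n_draws):
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square with n-1 d.f.
        sigma = ((n - 1) * s2 / chi2) ** 0.5
        draws.append((rng.gauss(ybar, sigma / n**0.5), sigma))
    return draws


def individual_checks(y, draws, seed=3):
    """Posterior mean residual and mean of step(y_i - Y.rep_i) for each observation."""
    rng = random.Random(seed)
    resid = [fmean(yi - mu for mu, _ in draws) for yi in y]
    pr_big = [fmean(1.0 if yi >= rng.gauss(mu, sigma) else 0.0 for mu, sigma in draws)
              for yi in y]
    return resid, pr_big


y = [9.8, 10.4, 10.1, 9.6, 10.2, 9.9, 10.0, 13.5]  # hypothetical data
resid, pr_big = individual_checks(y, posterior_draws(y))
for yi, r, p in zip(y, resid, pr_big):
    print(f"y = {yi:5.1f}   mean residual = {r:6.2f}   Pr.big = {p:.3f}")
```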
Bayesian Model Checking
(Global tests of goodness-of-fit)
Identify a discrepancy measure
– e.g. a measure of skewness for testing this aspect of the Normal assumption
– typically a function of the data, but could be a function of both data and parameters
Predict (replicate) values of this measure
– conditional on the type of model assumed, but accounting for uncertainty in parameter values
Compute a “Bayesian p-value” for the observed discrepancy
– a similar approach is used for individual observations
– the convention for global tests is to quote a “p-value”
Bayesian Model Checking
(Global tests of goodness-of-fit)
{
   # … code as before …
   ### Model checking
   for (i in 1:N) {
      ### Residuals and Standardised Residuals
      resid[i] <- Y[i] - mu
      st.resid[i] <- resid[i] / sigma
      m3[i] <- pow( st.resid[i], 3)
      ### Replicate data set
      Y.rep[i] ~ dnorm(mu, prec)
      resid.rep[i] <- Y.rep[i] - mu
      st.resid.rep[i] <- resid.rep[i] / sigma
      m3.rep[i] <- pow( st.resid.rep[i], 3)
   }
   skew <- mean( m3[] )
   skew.rep <- mean( m3.rep[] )
   # p.skew interpreted as for a classical p-value, i.e. small is evidence of a discrepancy
   p.skew.pos <- step( skew.rep - skew )
   p.skew.neg <- step( skew - skew.rep )
}
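The skewness check can be sketched the same way. It again assumes the noninformative-prior posterior for (µ, σ) in place of the vague WinBUGS priors, and uses hypothetical data with one large value. Each posterior draw gives an observed and a replicated skewness. The two proportions returned play the roles of p.skew.pos and p.skew.neg.

```python
import random
from statistics import fmean, variance


def posterior_draws(y, n_draws=4000, seed=2):
    """Noninformative-prior posterior draws of (mu, sigma), a stand-in for MCMC output."""
    rng = random.Random(seed)
    n, ybar, s2 = len(y), fmean(y), variance(y)
    draws = []
    for _ in range(n_draws):
        sigma = ((n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2.0)) ** 0.5
        draws.append((rng.gauss(ybar, sigma / n**0.5), sigma))
    return draws


def skewness_pvalues(y, draws, seed=4):
    """Proportions of draws with skew.rep >= skew and with skew >= skew.rep."""
    rng = random.Random(seed)
    n_pos = n_neg = 0
    for mu, sigma in draws:
        skew = fmean(((yi - mu) / sigma) ** 3 for yi in y)           # mean of m3[]
        y_rep = [rng.gauss(mu, sigma) for _ in y]                   # replicate data set
        skew_rep = fmean(((yr - mu) / sigma) ** 3 for yr in y_rep)  # mean of m3.rep[]
        n_pos += skew_rep >= skew   # step(skew.rep - skew)
        n_neg += skew >= skew_rep   # step(skew - skew.rep)
    return n_pos / len(draws), n_neg / len(draws)


y = [9.8, 10.4, 10.1, 9.6, 10.2, 9.9, 10.0, 13.5]  # hypothetical, right-skewed by one value
p_pos, p_neg = skewness_pvalues(y, posterior_draws(y))
print(f"p.skew.pos = {p_pos:.3f}, p.skew.neg = {p_neg:.3f}")
```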
Bayesian Model Checking
(Comparison between competing models)
Bayes factors
not easy to implement using MCMC
will not be discussed further
ratio of marginal likelihoods under competing models
Bayesian analogue of the classical likelihood ratio test
Bayesian Model Checking
(Comparison between competing models)
Deviance Information Criterion (DIC)
a Bayesian “information criterion” but not the BIC
will not discuss theory, focus on practical interpretation
WinBUGS & SAS can report this for most models
DIC is the sum of two separately interpretable quantities
DIC = Dbar + pD
Dbar : the posterior mean of the deviance
pD : the effective number of parameters in the model
pD = Dbar - Dhat
Dhat : deviance point estimate using posterior mean of θ
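This sketch shows the DIC calculation from posterior draws for the simplest Normal model. It rests on two assumptions:

- the draws come from the noninformative-prior posterior rather than from MCMC;
- Dhat plugs in the posterior means of mu and the precision.

pD depends on the second choice. For this two-parameter model, pD should come out close to 2.

```python
import math
import random
from statistics import fmean, variance


def deviance(y, mu, prec):
    """Deviance (-2 x log-likelihood) for y_i ~ N(mu, 1/prec)."""
    return sum(math.log(2 * math.pi / prec) + prec * (yi - mu) ** 2 for yi in y)


def posterior_draws(y, n_draws=4000, seed=8):
    """(mu, prec) draws under p(mu, sigma^2) proportional to 1/sigma^2."""
    rng = random.Random(seed)
    n, ybar, s2 = len(y), fmean(y), variance(y)
    draws = []
    for _ in range(n_draws):
        prec = rng.gammavariate((n - 1) / 2, 2 / ((n - 1) * s2))  # shape, scale
        draws.append((rng.gauss(ybar, 1 / math.sqrt(n * prec)), prec))
    return draws


def dic(y, draws):
    """Return (DIC, Dbar, Dhat, pD), with Dhat at the posterior means of mu and prec."""
    d_bar = fmean(deviance(y, mu, prec) for mu, prec in draws)
    d_hat = deviance(y, fmean(mu for mu, _ in draws), fmean(p for _, p in draws))
    p_d = d_bar - d_hat
    return d_bar + p_d, d_bar, d_hat, p_d


rng = random.Random(9)
y = [rng.gauss(5.0, 2.0) for _ in range(40)]  # hypothetical data
dic_value, d_bar, d_hat, p_d = dic(y, posterior_draws(y))
print(f"Dbar={d_bar:.2f}  Dhat={d_hat:.2f}  pD={p_d:.2f}  DIC={dic_value:.2f}")
```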
Bayesian Model Checking
(Comparison between competing models)
Deviance Information Criterion (DIC)
DIC = Dbar + pD
pD will differ from the total number of parameters
when posterior distributions are correlated
typically the case for “random effect parameters”
non-orthogonal designs, correlated covariates
common for non-linear models
pD will be smaller because some parameters’ effects “overlap”
Bayesian Model Checking
(Comparison between competing models)
Deviance Information Criterion (DIC)
DIC = Dbar + pD
Measures model’s ability to make short-term predictions
Smaller values of DIC indicate a better model
Rules of thumb for comparing models fitted to the same data
DIC difference > 10 is clear evidence of being better
DIC difference > 5 (< 10) is still strong evidence
There are still some unresolved issues with DIC
relatively early days in its use, so use other methods as well
Bayesian Model Checking
(practical advice)
“All models are wrong, but some are useful”
if we keep looking, or have lots of data, we will find lack-of-fit
need to assess whether model’s deficiencies matter
depends upon the inferences and decisions of interest
judge the model on whether it is fit for purpose
Sensitivity analyses are useful when uncertain
should assess sensitivity to both the likelihood and the prior, e.g. replace Normal with t distribution
Model expansion may be necessary
Bayesian approach particularly good here
Informative priors and MCMC allow greater flexibility
Introduction to BugsXLA
Parallel Group Clinical Study
(Analysis of Covariance)
BugsXLA
(case study 3.1)
Switch to Excel and demonstrate how BugsXLA facilitates rapid Bayesian model specification and analysis via WinBUGS.
BugsXLA
(case study 3.1)
BugsXLA
(case study 3.1)
Settings used by WinBUGS
Posterior distributions to be summarised
Posterior samples to be imported
Save WinBUGS files, create R scripts
Suggested settings
BugsXLA
(case study 3.1)
Fixed factor effects parameterised as contrasts from a zero constrained level
Priors for other parameter types
Default priors chosen to be “vague” (no guarantees!)
Bayesian model checking options
BugsXLA
(case study 3.1)
BugsXLA uses generic names for parameters in WinBUGS code (deciphered on input!)
Recommend adding MC Error to input list
The Excel sheet used to display the results
BugsXLA
(case study 3.1)
Generic names … deciphered
Posterior means, standard deviations & credible intervals
Could compute ratio MC Error / St.Dev. using cell formula
Reminder of model, prior and WinBUGS settings
BugsXLA
(case study 3.1)
BugsXLA
(case study 3.1)
BugsXLA interprets the contents of cells to define predictions & contrasts to be estimated
In this case, predicted means for each level of factor TRT are defined
BugsXLA
(case study 3.1)
BugsXLA
(Default Settings)
Other default settings can be personalised, e.g. default priors
Recommend turning this off once you understand how the effects are parameterised
Can set own alerts to be used with model checking functions
BugsXLA
(Default Settings)
Fixed factor effects parameterisation can be changed to SAS (last level)
Obtaining Prior Distributions
Obtaining Prior Distributions
• Brief overview of main approaches*
• Further issues in the use of Priors*
* based on chapter 5 of Spiegelhalter et al (2004)
Obtaining Prior Distributions
• Misconceptions: They are not necessarily
– Prespecified (but prespecification is strongly recommended: the data must not influence the prior distribution)
– Unique
– Known
– Influential
• Bayesian analysis
– Transforms prior into posterior beliefs
– Doesn’t produce the posterior distribution
– Context and audience important
– Sensitivity to alternative assumptions vital
• Prior could differ at design & analysis stage
– May want less controversial vague priors in analysis
– Design priors usually have to be informative
Obtaining Prior Distributions
• Five broad approaches
– Elicitation of subjective opinion
– Summarising past evidence
– Default priors
– Robust priors
– Estimation using hierarchical models
Obtaining Prior Distributions
• Elicitation of subjective opinion
– Most useful when little ‘objective’ evidence
– Less controversial at the design stage
– Elicitation should be kept simple & interactive
– O’Hagan is a strong advocate
• Spiegelhalter et al do not recommend
– Prefer archetypal views; see Default Priors
Summarising Past Evidence
Typically, yₕ ~ N(θₕ, σₕ²)
Exchangeable (meta-analytic-predictive): θ, θₕ ~ N(μ, τ²)
(a) τ = ∞, μ = K
(b) τ ~ dist. (typically (b) adequate, maybe with more complexity)
(f) τ = 0, μ = θ
(c) θₕ = θ + δₕ, δₕ ~ N(0, σδₕ²), i.e. θₕ ~ N(θ, σδₕ²)
Obtaining Prior Distributions
• Default Priors
– Vague a.k.a. non-informative or reference
• WinBUGS (general advice for ‘simple’ models):
– Location parms ~ Normal with huge variance
– Lowest level error variance ~ inv-gamma(small, small)
– Hierarchical error variances … controversial
sd ~ Uniform(0, big) or ~ Half-Normal(big); big < huge!
(Parameter “big” can be derived via eliciting inferred quantities, e.g. credible differences between study means.)
– Sceptical & Enthusiastic Priors
• Sceptical used to determine when success achieved
• Enthusiastic used to determine when to stop
• Sceptical prior centred on 0 with small prob. effect > Δ
• Enthusiastic prior centred on Δ with small prob. effect < 0
– ‘Lump-and-smear’ Priors
• Point mass at the null hypothesis
(Might be appropriate for unprecedented mechanisms in ED stage.)
Obtaining Prior Distributions
• Robust Priors
– We always assess model assumptions
– Bayesians assess prior assumptions also
– Use a ‘community of priors’
• Discrete set
• Parametric family
• Non-parametric family
(Perhaps develop a range of priors appropriate in the typical case.)
– Interpretation section recommended in report
• Show how data affect a range of prior beliefs
Example of a parametric family of priors
α is the discounting factor discussed previously
(variant d: “equal but discounted”)
Not recommended, as there is no operational interpretation
and no means of assessing suitable values for α
Obtaining Prior Distributions
• Hierarchical priors
– In simplest case, the same as (b) Exchangeable
– ‘Borrow strength’ between studies
• counter view: ‘share weakness’
– Three essential ingredients
• Exchangeable parameters
• Form for random-effects dist.
– Typically Normal, although t is perhaps more realistic
• Hyperprior for parms of random-effects dist.
– sd ~ Uniform(0, Max Credible) or Half-Normal(big) or Half-Cauchy(large)
Obtaining Prior Distributions
• Case Study
– Dental Pain Studies
– Informative prior for placebo mean
• Used in the formal analysis
– Meta-analytic-predictive approach
Obtaining Prior Distributions
(part of table of prior studies considered relevant; placebo data, TOTPAR[6])
Authors | Title | Treatments | Mean | SE
Elliot W. Ehrich et al. | Characterization of rofecoxib as a cyclooxygenase-2 isoform inhibitor and demonstration of analgesia in the dental pain model | Rofecoxib 50 and 500 mg; Ibuprofen 400 mg; Placebo | 3.01 | 0.51
Fricke J. et al. | Valdecoxib Is More Efficacious Than Rofecoxib in Relieving Pain Associated With Oral Surgery | Valdecoxib 40 mg; Rofecoxib 50 mg; Placebo | 3.01 | 0.76
Chang DJ et al. | Rofecoxib versus codeine/acetaminophen in postoperative dental pain: a double-blind, randomized, placebo- and active comparator-controlled clinical trial | Rofecoxib 50 mg; Codeine/Acetaminophen 60/600 mg; Placebo | 3.4 | 1.22
Raymond Cheung et al. | Analgesic Efficacy of Celecoxib in Postoperative Oral Surgery Pain: A Single-Dose, Two-Center, Randomized, Double-Blind, Active- and Placebo-Controlled Study | Celecoxib 400 mg; Ibuprofen 400 mg; Placebo | 3.7 | 0.75
Thomas Van Dyke et al. | Combination Oxycodone 5 mg/Ibuprofen 400 mg for the Treatment of Postoperative Pain: A Double-Blind, Placebo and Active-Controlled Parallel-Group Study | Oxycodone/Ibuprofen 5 mg/400 mg; Ibuprofen 400 mg; Oxycodone 5 mg; Placebo | 4.2 | 0.83
Obtaining Prior Distributions
• Meta-analysis of historical data
– Published summary data
• Normal Linear Mixed Model
Yᵢ = θᵢ + eᵢ
θᵢ ~ N(µθ, ω²)
eᵢ ~ N(0, SEᵢ²)
Yᵢ are the observed placebo means from each study
SEᵢ are their associated standard errors
Obtaining Prior Distributions
• WinBUGS used to determine prior
• assumes study means exchangeable
• but not responses from different studies
{
   for (i in 1:N) {
      Y[i] ~ dnorm(theta[i], prec[i])
      theta[i] ~ dnorm(mu.theta, tau.theta)
      prec[i] <- pow(se[i], -2)
   }
   # ‘newtrial’ provides the prior for a future study
   newtrial ~ dnorm(mu.theta, tau.theta)
   mu.theta ~ dnorm(0, 1.0E-6)
   tau.theta <- pow(omega, -2)
   omega ~ dunif(0, 100)
}
If studies are smaller, the model should account for the fact that each se is estimated.
If there are few studies, a slightly more informative prior might be needed.
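This is a rough cross-check on the WinBUGS model without MCMC. The sketch:

- assumes a flat prior for mu.theta in place of dnorm(0, 1.0E-6);
- integrates mu.theta out analytically;
- averages over omega ~ U(0, 100) on a grid.

It uses only the five placebo rows listed in the table, so its output is illustrative and is not the prior used in the case study.

```python
import math

# Placebo means and SEs from the five table rows shown (part of the relevant studies only)
Y = [3.01, 3.01, 3.4, 3.7, 4.2]
SE = [0.51, 0.76, 1.22, 0.75, 0.83]


def given_omega(omega, y=Y, se=SE):
    """With mu.theta integrated out under a flat prior, return log p(y | omega)
    (up to a constant) and the conditional posterior mean and variance of mu.theta."""
    w = [1 / (s**2 + omega**2) for s in se]       # marginal precision of each Y[i]
    v_mu = 1 / sum(w)
    m_mu = v_mu * sum(wi * yi for wi, yi in zip(w, y))
    log_lik = 0.5 * (math.log(v_mu) + sum(math.log(wi) for wi in w)
                     - sum(wi * (yi - m_mu) ** 2 for wi, yi in zip(w, y)))
    return log_lik, m_mu, v_mu


def newtrial_predictive(omega_max=100.0, grid_size=20_000, y=Y, se=SE):
    """Mean and sd of newtrial ~ N(mu.theta, omega^2), omega ~ U(0, omega_max),
    averaging over omega with a midpoint grid."""
    step = omega_max / grid_size
    omegas = [(k + 0.5) * step for k in range(grid_size)]
    terms = [given_omega(om, y, se) for om in omegas]
    top = max(t[0] for t in terms)
    wts = [math.exp(t[0] - top) for t in terms]   # uniform prior: weight = likelihood
    total = sum(wts)
    mean = sum(wt * m for wt, (_, m, _) in zip(wts, terms)) / total
    # E[newtrial^2 | omega] = Var(mu | omega) + omega^2 + E[mu | omega]^2
    second = sum(wt * (v + om**2 + m**2)
                 for wt, om, (_, m, v) in zip(wts, omegas, terms)) / total
    return mean, math.sqrt(second - mean**2)


mean, sd = newtrial_predictive()
print(f"predictive prior for a future placebo mean: N({mean:.2f}, {sd:.2f}^2)")
```

As omega_max shrinks to 0, this reduces to the fixed-effect (inverse-variance weighted) summary.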
Obtaining Prior Distributions
Gamma Distribution (or Inv-Gamma Dist.)
Particularly useful in Bayesian statistics
– Conjugate for Poisson mean
– Marginal distribution for σ⁻² in NLMs
NOTE: more than one parameterisation of Gamma Dist.
Chi-Sqr is a special case of the Gamma
ChiSqr(v) ≡ Gamma(v/2, 0.5)   (shape, rate)
If s² (v d.f.) is the usual (residual mean square) estimate of a Normal variance
(and conventional vague prior: p(σ²) ∝ σ⁻²)
Posterior p(σ² | s²) = v s² Inv-ChiSqr(v)
= Inv-Gamma(v/2, v s²/2)
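The Inv-Gamma form translates directly into WinBUGS, whose dgamma(r, μ) uses shape and rate. This sketch assumes the scaled chi-square summary is placed on the precision 1/σ² as Gamma(v/2, v s²/2). The values s² = 0.026 and df = 44 are those quoted later for case study 3.1. A Monte Carlo check confirms E[σ²] = v s²/(v - 2).

```python
import random
from statistics import fmean


def scaled_chisq_to_dgamma(s2, df):
    """sigma^2 ~ df*s2*Inv-ChiSqr(df) = Inv-Gamma(df/2, df*s2/2), so the precision
    1/sigma^2 ~ Gamma(shape = df/2, rate = df*s2/2), i.e. WinBUGS dgamma(shape, rate)."""
    return df / 2, df * s2 / 2


shape, rate = scaled_chisq_to_dgamma(s2=0.026, df=44)
print(f"prec ~ dgamma({shape:g}, {rate:.3f})")

# Monte Carlo check; note random.gammavariate takes a scale (1 / rate), not a rate
rng = random.Random(5)
sig2 = [1 / rng.gammavariate(shape, 1 / rate) for _ in range(100_000)]
print(fmean(sig2), 44 * 0.026 / 42)
```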
Obtaining Prior Distributions
• Empirical criticism of priors (e.g. observed placebo mean response)
– George Box suggested a Bayesian p-value (see model checking section)
• Prior predictive distribution for future observation
• Compare actual observation with predictive dist.
• Calculate prob. of observing more extreme
• Measure of conflict between prior and data
– But what should you do if conflict occurs?
• At least report this fact
• Greater emphasis on analysis with a vaguer (or heavy tailed, e.g. t4 distribution) prior
– Robust prior approach
• Formally model doubt using a mixture prior
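Box's check can be sketched for a Normal prior on a placebo mean. Before seeing the data, the observed mean then has prior predictive distribution N(prior mean, prior variance + squared standard error). All numbers in the sketch are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prior_predictive_tails(y_obs, prior_mean, prior_var, sampling_var):
    """Return Pr(Y <= y_obs) and Pr(Y >= y_obs) under the prior predictive
    Y ~ N(prior_mean, prior_var + sampling_var) for an observed mean."""
    predictive = NormalDist(prior_mean, sqrt(prior_var + sampling_var))
    lower = predictive.cdf(y_obs)
    return lower, 1 - lower


# Hypothetical: prior N(3.4, 0.4^2) for the placebo mean; observed mean 5.0, SE 0.5
lower, upper = prior_predictive_tails(5.0, 3.4, 0.4**2, 0.5**2)
print(f"Pr(predicted >= observed) = {upper:.4f}")
```

A very small tail probability like this one signals conflict between the prior and the data.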
Obtaining Prior Distributions
• Key Points
– Subjectivity cannot be completely avoided
– Range of priors should be considered
– Elicited priors tend to be overly enthusiastic
– Historical data is best basis for priors
– Archetypal priors provide a range of beliefs
– Default priors are not always ‘weak’
– Exchangeability is a strong assumption
• but with hierarchical model plus covariates, best option?
– Sensitivity analysis is very important
BugsXLA
Deriving and using
informative prior distributions
BugsXLA
(case study 5.3, details not covered in this course)
Predict placebo mean response in a future study
BugsXLA can model study level summary statistics (d.f. optional)
Typically, the model is much simpler than this, e.g. placebo data only, no study level covariates, so only a random STUDY factor in the model
BugsXLA
(using informative prior distributions)
Back to Case Study 3.1
We will assume we have derived informative priors for:
Placebo mean response
Normal with mean 0 and standard deviation 0.04
Residual variance
Scaled Chi-Square with s² = 0.026 and df = 44
Switch back to Excel and show how to use this in BugsXLA
BugsXLA
(case study 3.1, informative prior)
Import samples so prior and posterior can be compared.
Ignore this, unless you have R loaded and wish to explore in your own time.
Informative priors for placebo mean and residual variance
BugsXLA
(case study 3.1, informative prior)
Click ‘sigma’ then ‘Post Plots’ icon
Update Graph
Can edit histogram (‘user specified’)
Repeat for ‘Beta0’ (placebo) & ‘X.Eff[1,3]’ (TRT C)
BugsXLA
(case study 3.1, informative prior)
BugsXLA
(case study 3.1, informative prior)
CAUTION: although the prior for TRT:C is flat, the posterior is influenced by other priors
Can obtain other posterior summaries
Bayesian Study Design
Bayesian Study Design
Consider a generic decision criterion of the form
GO decision if Pr(δ ≥ Δ) > π
δ is the treatment effect
Δ is an effect size of interest
π is the probability required to make a positive decision
As previously discussed,
a Bayesian analogue of significance could be
Pr(δ > 0) > 0.95
Bayesian Study Design
Bayesian Study Design
[Figure; axis label: Number of subjects]
Bayesian Study Design
Operating Characteristics (OC)
Simple to calculate in any statistical software
e.g. for 2 group PG or AB/BA XO design
R code (assign pi, df, delta, DELTA, sigma and N first; otherwise pi is R's built-in constant 3.14159...)
# non-central t cdf
1 - pt( qt(pi, df= df), df= df, ncp= (delta - DELTA)/(sigma*sqrt(2/N)) )
# normal cdf
1 - pnorm( qnorm(pi), mean= (delta - DELTA)/(sigma*sqrt(2/N)) )
Bayesian Study Design
Operating Characteristics (OC)
If we want to account for uncertainty in σ
Determine a Bayesian distribution for σ
e.g. σ ~ U(12, 18)
Use simulation to calculate the OC
1. Simulate a σ value from its distribution
2. For each value of δ and this σ value compute Pr(GO)
3. Repeat 1 & 2 10,000 times, say, and take the mean for each δ value
Warning
This unconditional Pr(GO) averages high and low probabilities
Is being under-powered more concerning than being over-powered?
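The simulation steps can be sketched in Python using the normal-cdf form of the OC (the pnorm expression above). The sketch assumes:

- a 2-group parallel design with n subjects per group, so Vd = 2σ²/n;
- a vague analysis prior;
- hypothetical design values.

σ ~ U(12, 18) is the example distribution above.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

PHI = NormalDist()  # standard Normal


def pr_go(delta, sigma, n, Delta, pi):
    """Normal-approximation OC, Pr(GO | delta, sigma), for GO if Pr(delta >= Delta | data) > pi,
    i.e. d - z_pi se(d) > Delta, with se(d) = sigma * sqrt(2 / n)."""
    se = sigma * sqrt(2 / n)
    return 1 - PHI.cdf(PHI.inv_cdf(pi) - (delta - Delta) / se)


def pr_go_unconditional(delta, n, Delta, pi, sigma_lo=12.0, sigma_hi=18.0,
                        n_sims=10_000, seed=6):
    """Steps 1-3: draw sigma ~ U(sigma_lo, sigma_hi), compute Pr(GO), and average."""
    rng = random.Random(seed)
    return fmean(pr_go(delta, rng.uniform(sigma_lo, sigma_hi), n, Delta, pi)
                 for _ in range(n_sims))


# Hypothetical design: n = 30 per group, Delta = 0, pi = 0.95
for delta in (0, 5, 10, 15):
    print(delta, round(pr_go(delta, 15.0, 30, 0, 0.95), 3),
          round(pr_go_unconditional(delta, 30, 0, 0.95), 3))
```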
Bayesian Study Design
Bayesian Study Design
(Bayesian NLM inference reminder)
Prior: p(δ) = N(δ₀, ω²)   (vague if ω ≈ ∞)
Known σ
Likelihood (sufficient statistic): p(d | δ) = N(δ, Vd)
e.g. 2 arm PG or AB/BA XO, Vd = 2σ²/N
Posterior: p(δ | d) = N(Mδ, Vδ)
Mδ = Vδ(δ₀/ω² + d/Vd)   weighted average
1/Vδ = 1/ω² + 1/Vd   precisions are additive
Vague prior implies p(δ | d) = N(d, Vd) = “confidence dist.”
Bayesian Study Design
(Bayesian NLM predictions reminder)
Posterior Distribution:
p(δ | d) = N(Mδ, Vδ)
Conditional Distribution for d* (as yet unobserved) from future study:
p(d* | δ) = N(δ, Vd*)
assuming studies are “exchangeable”; Vd* determined by future study design
Predictive Distribution for d*:
p(d* | d) = ∫ p(d* | δ) p(δ | d) dδ = N(M*, V*)
M* = Mδ
V* = Vδ + Vd*   sum of posterior and conditional variances
Bayesian Study Design
(Prior Predictive Distribution)
Can make predictions based on “prior beliefs”
Prior Distribution (Design Prior):
p(δ) = N(δ₀, ω²)
Conditional Distribution for d (unobserved at design stage) from planned study:
p(d | δ) = N(δ, Vd)
Predictive Distribution for d:
p(d) = ∫ p(d | δ) p(δ) dδ = N(M₀, V₀)
M₀ = δ₀
V₀ = ω² + Vd   sum of prior and conditional variances
Bayesian Study Design
(Assurance)
(also known as expected/marginal power or predictive probability)
Classical power:
Let C denote the event “reject null hypothesis”
Power = Pr[C | δ] i.e. a conditional probability
More generally, C can be any decision criterion
Refer to the earlier OC calculations
Bayesian predictive probability:
Pr[C] = ∫ Pr[C | δ] p(δ) dδ
The “unconditional” probability of C occurring
(although it is conditional on our prior beliefs)
“The probability, given our prior knowledge, that we
will meet the decision criteria at the end of the study.”
Bayesian Study Design
(Assurance)
Consider original GO decision, with vague Analysis Prior
Pr[C | d] = Pr[d - tπ se(d) > Δ]
Prior predictive distribution, with informative Design Prior
p(d) = N(δ₀, ω² + Vd)
Pr[C] = Pr[N(δ₀, ω² + Vd) > tπ se(d) + Δ]
Pr[C] = Φ[(δ₀ - tπ se(d) - Δ) / √(ω² + Vd)]
where Φ[.] is the standard Normal cdf
What if vague Design Prior, i.e. ω very large?
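This sketch implements the assurance formula, using z in place of tπ (known σ). It also includes a Monte Carlo version that draws δ from the design prior and then d given δ. The design values are hypothetical. Letting ω grow very large pushes Pr[C] towards Φ(0) = 0.5, because the numerator stays fixed while the denominator grows.

```python
import random
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard Normal


def assurance(delta0, omega, Vd, Delta, pi):
    """Pr[C] = Phi[(delta0 - z_pi se(d) - Delta) / sqrt(omega^2 + Vd)] for design prior
    delta ~ N(delta0, omega^2), vague analysis prior, known sigma (t_pi -> z_pi)."""
    se = sqrt(Vd)
    return PHI.cdf((delta0 - PHI.inv_cdf(pi) * se - Delta) / sqrt(omega**2 + Vd))


def assurance_mc(delta0, omega, Vd, Delta, pi, n_sims=200_000, seed=7):
    """Simulate delta from the design prior, then d | delta, and count how often C occurs."""
    rng = random.Random(seed)
    se, z = sqrt(Vd), PHI.inv_cdf(pi)
    hits = 0
    for _ in range(n_sims):
        d = rng.gauss(rng.gauss(delta0, omega), se)
        if d - z * se > Delta:
            hits += 1
    return hits / n_sims


# Hypothetical design: sigma = 15 and 30 per group, so Vd = 2 * 15**2 / 30 = 15
print(assurance(10.0, 5.0, 15.0, 0.0, 0.95), assurance_mc(10.0, 5.0, 15.0, 0.0, 0.95))
print(assurance(10.0, 1e6, 15.0, 0.0, 0.95))  # very vague design prior
```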
Bayesian Study Design
(Assurance)
Plot comparing classical (‘conditional power’) OC and assurance
[Figure; labels: ω, δ₀]
Bayesian Study Design
(Assurance)
For superiority, Δ = 0, and noting z ≈ t for large d.f.
Pr[C] = Φ[(δ₀ - zπ se(d)) / √(ω² + Vd)]
same as Eq.3 in O’Hagan et al (2005)
For non-inferiority, Δ is negative
same as Eq.6 in O’Hagan et al (2005)
Bayesian Study Design
(Interim Analysis)
Can make predictions after an interim analysis
Let estimate at interim (n subjects) be d’
Let estimate from part 2 (m subjects) be d* (unobserved at interim stage)
Predictive distribution: p(d* | d’) = N(Mδ, Vδ + Vd*)
If vague prior at study start (Design Prior)
Mδ = d’
Vδ = Vd’
For a 2 arm PG or AB/BA XO
Vd* = 2σ²/m and Vd’ = 2σ²/n
Bayesian Study Design
(Interim Analysis with vague Design & Analysis Priors)
Consider original GO decision (with Vd = 2σ²/N)
Pr[C | d] = Pr[d - tπ se(d) > Δ]
This criterion, C, can be expressed in terms of d’ and d*
(m d* + n d’)/N - tπ σ√(2/N) > Δ
At the interim stage d* is the only unknown
And so it is convenient to express C as
d* > (√N tπ σ√2 + NΔ - n d’) / m
but Bayesians can do better than this!
Classical “conditional power”, Pr[C | δ, d’]
Pr[N(δ, 2σ²/m) > (√N tπ σ√2 + NΔ - n d’) / m]
= 1 - Φ[√(N/m) tπ + (NΔ - n d’ - mδ) / (σ√(2m))]
which, for Δ = 0 & tπ = zπ, is the same as Eq.2 in Grieve (1991)
Bayesian Study Design
(Interim Analysis with vague Design & Analysis Priors)
Bayesian predictive probability, Pr[C | d’]:
$\Pr[N(d', 2\sigma^2(n^{-1} + m^{-1})) > (N^{1/2} t_\pi \sigma 2^{1/2} + N\Delta - n d')/m]$
$= 1 - \Phi[(n/m)^{1/2} \{t_\pi - N^{1/2}(d' - \Delta)/(\sigma 2^{1/2})\}]$
which, for Δ = 0 and tπ = zπ, is the same as Eq.3 in Grieve (1991)
“The probability, given our knowledge at the interim, that we will meet the decision criteria at the end of the study.”
Interim futility / success criteria could be based on this probability, e.g.
futile if Pr[C | d’] < 0.2
success if Pr[C | d’] > 0.8
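The predictive probability under vague Design and Analysis Priors, together with the example futility/success cut-offs, can be sketched as below. The sketch assumes known σ and N = n + m. The function names and the three decision labels are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(x: float) -> float:
    """Standard Normal cdf Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def predictive_probability(d_int: float, n: int, m: int, sigma: float,
                           t_pi: float, Delta: float = 0.0) -> float:
    """Pr[C | d'] using d* | d' ~ N(d', 2*sigma**2*(1/n + 1/m))."""
    N = n + m
    z = sqrt(n / m) * (t_pi - sqrt(N) * (d_int - Delta) / (sigma * sqrt(2.0)))
    return 1.0 - norm_cdf(z)


def interim_decision(pp: float, futility: float = 0.2, success: float = 0.8) -> str:
    """Hypothetical rule applying the example cut-offs to Pr[C | d']."""
    if pp < futility:
        return "stop: futile"
    if pp > success:
        return "stop: success"
    return "continue"


if __name__ == "__main__":
    # Hypothetical interim with 25 per group seen and 25 per group to come.
    for d_int in (0.0, 5.0, 10.0):
        pp = predictive_probability(d_int, n=25, m=25, sigma=10.0, t_pi=1.96)
        print(f"d' = {d_int:4.1f}: Pr[C | d'] = {pp:.3f} -> {interim_decision(pp)}")
```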
Bayesian Study Design
(Interim analysis predictive probability)
[Plot comparing ‘conditional power’ and predictive probability following an interim analysis (25/grp) with a vague prior distribution; the ‘belief distribution’ for delta is centred on d' with variance Vd']
The only differences from the analysis done prior to study start are:
1) the OC curve is conditional on both delta and the interim data
2) the ‘belief distribution’ for delta is updated using the interim data
Whether this ‘belief distribution’ is a prior or a posterior depends on one’s perspective
Could use an informative design prior, updated using the interim data …
Bayesian Study Design
(Interim Analysis with informative Design Prior)
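If we allow an informative design prior at study start, $p(\delta) = N(\delta_0, \omega^2)$ (refer back to the Bayesian NLM reminders), then
$p(d^* \mid d') = N(M_\delta, V_\delta + 2\sigma^2/m)$
$M_\delta = V_\delta(\delta_0/\omega^2 + d' n/(2\sigma^2))$
$1/V_\delta = 1/\omega^2 + n/(2\sigma^2)$
Bayesian predictive probability, Pr[C | d’], still with a vague analysis prior:
$\Pr[N(M_\delta, V_\delta + 2\sigma^2/m) > (N^{1/2} t_\pi \sigma 2^{1/2} + N\Delta - n d')/m]$
$= 1 - \Phi[(N^{1/2} t_\pi \sigma 2^{1/2} - n d' - m M_\delta + N\Delta)/(m (V_\delta + 2\sigma^2/m)^{1/2})]$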
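With an informative Design Prior, the same calculation first updates N(δ0, ω²) with the interim estimate and then uses the resulting predictive distribution for d*. A minimal sketch follows. It assumes known σ and V_d' = 2σ²/n, and the function names are hypothetical. Letting ω become very large should reproduce the vague-prior predictive probability.

```python
from math import erf, sqrt


def norm_cdf(x: float) -> float:
    """Standard Normal cdf Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def update_design_prior(d_int: float, n: int, sigma: float,
                        delta0: float, omega: float):
    """Posterior N(M_delta, V_delta) for delta after the interim (V_d' = 2*sigma**2/n)."""
    precision = 1.0 / omega ** 2 + n / (2.0 * sigma ** 2)
    v_delta = 1.0 / precision
    m_delta = v_delta * (delta0 / omega ** 2 + d_int * n / (2.0 * sigma ** 2))
    return m_delta, v_delta


def predictive_probability_informative(d_int: float, n: int, m: int, sigma: float,
                                       t_pi: float, delta0: float, omega: float,
                                       Delta: float = 0.0) -> float:
    """Pr[C | d'] with Design Prior N(delta0, omega**2) and a vague Analysis Prior."""
    N = n + m
    m_delta, v_delta = update_design_prior(d_int, n, sigma, delta0, omega)
    numerator = sqrt(N) * t_pi * sigma * sqrt(2.0) - n * d_int - m * m_delta + N * Delta
    return 1.0 - norm_cdf(numerator / (m * sqrt(v_delta + 2.0 * sigma ** 2 / m)))


if __name__ == "__main__":
    # Hypothetical interim: d' = 5 from 25 per group, 25 per group to come, sigma = 10.
    for omega in (1e6, 10.0, 3.0):
        pp = predictive_probability_informative(5.0, 25, 25, 10.0, 1.96,
                                                delta0=10.0, omega=omega)
        print(f"Design Prior N(10, {omega:g}^2): Pr[C | d'] = {pp:.3f}")
```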
Bayesian Study Design
(using informative prior to reduce sample size)
Typically, we only have an informative prior for the placebo response
Notation (in addition to that used previously):
γ is the true placebo mean response
$p(\gamma) = N(\gamma_0, \psi^2)$, the informative prior for γ
nA, nP are the numbers of subjects receiving active and placebo
The effective number of subjects this prior contributes is
$n_\gamma = \sigma^2 / \psi^2$
which may be intuitive when re-expressed as $\psi^2 = \sigma^2 / n_\gamma$
… but is our intuition correct?
Bayesian Study Design
(using informative prior to reduce sample size)
If the prior for the treatment effect, δ, is vague, then the posterior can be derived in closed form (the proof is left as an exercise).
It can be seen that the informative prior is equivalent to nγ additional placebo subjects with a sample mean of γ0.
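The equivalence can be checked numerically. The sketch below assumes σ is known. It also assumes that, with a vague prior on δ, the active arm carries no information about γ. Under those assumptions it updates γ conjugately from the placebo data and compares the result with pooling nγ pseudo-subjects whose mean is γ0. It is one way to work the exercise, not the slide's own derivation. The function names and numbers are hypothetical.

```python
def posterior_gamma(ybar_p, n_p, sigma, gamma0, psi):
    """Conjugate Normal update of the placebo mean gamma (sigma assumed known)."""
    precision = 1.0 / psi ** 2 + n_p / sigma ** 2
    mean = (gamma0 / psi ** 2 + n_p * ybar_p / sigma ** 2) / precision
    return mean, 1.0 / precision


def pooled_with_pseudo_subjects(ybar_p, n_p, sigma, gamma0, psi):
    """Treat the prior as n_gamma = sigma**2/psi**2 extra placebo subjects with mean gamma0."""
    n_gamma = sigma ** 2 / psi ** 2
    mean = (n_gamma * gamma0 + n_p * ybar_p) / (n_gamma + n_p)
    return mean, sigma ** 2 / (n_gamma + n_p)


def posterior_delta(ybar_a, n_a, ybar_p, n_p, sigma, gamma0, psi):
    """Vague prior on delta: delta | data ~ N(ybar_a - E[gamma], sigma**2/n_a + Var[gamma])."""
    g_mean, g_var = posterior_gamma(ybar_p, n_p, sigma, gamma0, psi)
    return ybar_a - g_mean, sigma ** 2 / n_a + g_var


if __name__ == "__main__":
    # Hypothetical data: placebo mean 25 from 40 subjects, sigma = 70, prior N(18, 12**2).
    print(posterior_gamma(25.0, 40, 70.0, 18.0, 12.0))
    print(pooled_with_pseudo_subjects(25.0, 40, 70.0, 18.0, 12.0))
```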
Bayesian Study Design
(using informative prior to reduce sample size)
Worked example
Suppose the predictive distribution for the placebo mean (placebo prior) is $p(\gamma) \sim N(18, 12^2)$
Forecast residual standard deviation σ = 70 (obtained in the usual way, not shown here)
Design the study in the usual way, ignoring the informative prior.
Effective N of placebo prior: $\mathrm{Eff.N} = (70/12)^2 \approx 34$
Then reduce the placebo arm by 34 and retain the same power / precision.
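The arithmetic of the worked example can be sketched as follows. The sketch assumes σ is known and counts the prior as extra placebo subjects when computing the variance of the treatment-effect estimate. The conventional size of 100 per arm is a hypothetical input, not a value from the example.

```python
def effective_n(sigma: float, psi: float) -> float:
    """Effective number of placebo subjects contributed by a N(gamma0, psi**2) prior."""
    return (sigma / psi) ** 2


def var_treatment_effect(n_active: int, n_placebo: int, sigma: float, psi=None) -> float:
    """Variance of the estimated treatment effect, counting the prior as extra placebo subjects."""
    extra = 0.0 if psi is None else effective_n(sigma, psi)
    return sigma ** 2 / n_active + sigma ** 2 / (n_placebo + extra)


if __name__ == "__main__":
    eff = effective_n(70.0, 12.0)
    print(f"Eff.N = {eff:.2f}")
    n_per_arm = 100                      # hypothetical size from a design ignoring the prior
    n_placebo = n_per_arm - round(eff)   # reduced placebo arm
    print(f"placebo arm reduced from {n_per_arm} to {n_placebo}")
    print(f"Var(effect), no prior, {n_per_arm}/arm: {var_treatment_effect(n_per_arm, n_per_arm, 70.0):.1f}")
    print(f"Var(effect), with prior, reduced arm:  {var_treatment_effect(n_per_arm, n_placebo, 70.0, 12.0):.1f}")
```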
Bayesian Study Design
(using informative prior to reduce sample size)
Unless there are no doubts at all, use a Robust Prior,
i.e. a mixture of informative and vague prior distributions:
$p(\text{placebo mean}) \sim 0.9 \times N(18, 12^2) + 0.1 \times N(18, 120^2)$
This represents a 10% chance that the meta-data are not exchangeable,
in which case the prior effectively reverts to the vague component
(it can also be thought of as a heavy-tailed distribution)
Also compute a Bayesian p-value of data-prior compatibility:
Pr(“> observed mean” | prior ~ N(18, 12²))
Note: the predictive distribution for the observed placebo mean is $N(18, 12^2 + \sigma^2/n_P)$
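Both calculations can be sketched using the Normal predictive distribution of the observed placebo mean. The p-value follows the upper-tail definition on the slide. The posterior mixture weights come from a standard conjugate mixture update, which is an addition here, not something stated on the slide. Each prior weight is multiplied by that component's predictive density at the observed mean and then renormalised. The sketch assumes σ is known. The function names and example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prior_data_p_value(ybar_p, n_p, sigma, gamma0=18.0, psi=12.0):
    """Pr(mean > observed | prior N(gamma0, psi**2)), via N(gamma0, psi**2 + sigma**2/n_p)."""
    predictive = NormalDist(gamma0, sqrt(psi ** 2 + sigma ** 2 / n_p))
    return 1.0 - predictive.cdf(ybar_p)


def robust_posterior_weights(ybar_p, n_p, sigma, components):
    """Posterior weights of a Normal-mixture prior; components = [(weight, mean, sd), ...]."""
    scaled = [w * NormalDist(mu, sqrt(sd ** 2 + sigma ** 2 / n_p)).pdf(ybar_p)
              for w, mu, sd in components]
    total = sum(scaled)
    return [s / total for s in scaled]


# Robust prior from the slide: 0.9 x N(18, 12^2) + 0.1 x N(18, 120^2).
ROBUST_PRIOR = [(0.9, 18.0, 12.0), (0.1, 18.0, 120.0)]

if __name__ == "__main__":
    # Hypothetical placebo data: sigma = 70, 34 placebo subjects.
    for ybar in (18.0, 60.0, 120.0):
        p = prior_data_p_value(ybar, 34, 70.0)
        w_inf, w_vague = robust_posterior_weights(ybar, 34, 70.0, ROBUST_PRIOR)
        print(f"observed mean {ybar:5.1f}: p = {p:.3f}, weight on informative = {w_inf:.3f}")
```

When the observed mean conflicts with the informative component, its posterior weight falls and the vague component takes over. This is the reverting behaviour described above.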
Bayesian Emax Model
dose/concentration response model
Bayesian Emax Model
The Emax model is often used for dose response data,
and is even more common for concentration response data;
in a biological (non-clinical) context it is known as the logistic or sigmoidal curve
More generally, it could be used to model a monotonic relationship between a response and a covariate:
initially the response changes very slowly with the covariate
then the response changes much more rapidly
finally the response slows again as a plateau is reached
Bayesian Emax Model
λ is sometimes referred to as the ‘Hill slope’
when λ = 1, a ~80-fold range is needed to cover ED10 to ED90
the curve is approximately linear on the log-scale between ED20 and ED80
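The Emax mean function used in the WinBUGS code later in this section can be sketched in Python. Solving that function for the dose that gives a fraction p of Emax yields ED_p = ED50·(p/(1 − p))^(1/λ). Hence ED90/ED10 = 81^(1/λ), which is the ~80-fold range quoted for λ = 1. The function names are hypothetical.

```python
def emax_mean(x, e0, emax, ed50, hill):
    """Sigmoid Emax model: E0 + Emax * x**hill / (x**hill + ED50**hill)."""
    return e0 + emax * x ** hill / (x ** hill + ed50 ** hill)


def ed_p(p, ed50, hill):
    """Dose (or concentration) achieving fraction p (0 < p < 1) of Emax."""
    return ed50 * (p / (1.0 - p)) ** (1.0 / hill)


if __name__ == "__main__":
    for hill in (0.5, 1.0, 2.0):
        ratio = ed_p(0.9, 1.0, hill) / ed_p(0.1, 1.0, hill)
        print(f"Hill = {hill}: ED90/ED10 = {ratio:.0f}-fold")
```

A steeper curve (larger λ) needs a much narrower dose range to move from ED10 to ED90. This is one reason the data rarely pin λ down when the dose range is small.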
Bayesian Emax Model
Convergence issues are common with MLE of Emax models
[Figure: two fitted Emax curves of Response (1 to 7) against Concentration (ug/mL, log scale, 1 to 1000). One panel: Hill coefficient not restrained, no data on upper asymptote. Other panel: Hill coefficient = 1.]
Most clinical data are more variable than this and cover a smaller dose range
Classical fitting algorithms can fail to provide any solution
Bayesian Emax Model
Prior distributions are required for all parameters
E0: placebo (negative control) response
utilise historical data as discussed earlier
Emax: maximum possible effect relative to E0
typically vague, similar approach to the treatment effect prior
ED50: dose that gives 50% of the Emax effect
could be weakly informative, based on the same information used to choose the dose range
e.g. log-normal centred on a mid/low dose (GM), with 90% CI (0.1 × GM, 10 × GM)
λ determines the gradient of the dose response
typically needs to be very informative, as clinical data rarely provide much information regarding λ
e.g. log-normal centred on 1, with 90% CI (0.5, 2)
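The log-scale SD that gives a stated 90% interval follows from SD = ln(ratio)/z₀.₉₅, where the interval is (centre/ratio, centre × ratio). WinBUGS `dnorm` is parameterised by mean and precision (1/variance), so the precision 1/SD² is the value to supply. The sketch below computes both. The function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

# Upper 95% point of the standard Normal, so +/- Z95 SDs spans a central 90% interval.
Z95 = NormalDist().inv_cdf(0.95)


def lognormal_prior_for_90pct_ratio(ratio):
    """Log-scale SD and precision so the 90% interval is (centre/ratio, centre*ratio)."""
    sd = log(ratio) / Z95
    return sd, 1.0 / sd ** 2


def ninety_pct_ratio_from_precision(precision):
    """Inverse check: multiplicative half-width of the 90% interval implied by a precision."""
    return exp(Z95 / precision ** 0.5)


if __name__ == "__main__":
    for label, ratio in (("ED50, 90% CI (0.1, 10) x GM", 10.0), ("Hill, 90% CI (0.5, 2)", 2.0)):
        sd, prec = lognormal_prior_for_90pct_ratio(ratio)
        print(f"{label}: log-scale SD = {sd:.3f}, precision = {prec:.3f}")
```

For example, ratio 10 gives SD ≈ 1.40 (precision ≈ 0.51) and ratio 2 gives SD ≈ 0.42 (precision ≈ 5.63). If 1.4 is supplied as a precision rather than as an SD, the implied interval is only about 4-fold either side of the centre.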
Bayesian Emax Model
(WinBUGS code)
{
  ### Priors
  # priors should be checked for appropriateness in each particular case
  prec ~ dgamma( 0.001, 0.001 )
  sigma <- pow(prec, -0.5)
  E0 ~ dnorm( 0, 1.0E-6 )     #... but could be informative for placebo mean response
  Emax ~ dnorm( 0, 1.0E-6 )   #... typically vague
  # dnorm takes a precision (1/variance): log-scale SD 1.4 gives precision 0.51
  log.ED50 ~ dnorm( ???, 0.51 )
  ED50 <- exp( log.ED50 )     #... Gives 90% CI of (0.1, 10) x exp(???)
  # log-scale SD 0.42 gives precision 5.6
  log.Hill ~ dnorm( 0, 5.6 )
  Hill <- exp( log.Hill )     #... Gives 90% CI of (0.5, 2)
  ### Likelihood
  for (i in 1:N) {
    Y[i] ~ dnorm(mu[i], prec)
    mu[i] <- E0 + (Emax * pow( X[i], Hill) ) / ( pow( X[i], Hill ) + pow( ED50, Hill ) )
  }
}
Bayesian Emax Model
(WinBUGS code)
{
… code as before …
### Quantities of potential interest
for (i in 1:N.doses) {
### effect over placebo for pre-specified doses (values entered as data in node DOSE)
effect[i] <- (Emax * pow( DOSE[i], Hill) ) / ( pow( DOSE[i], Hill ) + pow( ED50, Hill ) )
### predicted mean response for pre-specified doses
DOSE.mean[i] <- effect[i] + E0
### probability of exceeding pre-specified effect of size ??? (mean of node DOSE.PrEffBig)
DOSE.PrEffBig[i] <- step( effect[i] - ??? )
}
### estimate dose giving effect of size ???
DOSE.BigEff.0 <- ED50 * ( pow( ???, 1/Hill ) ) / ( pow( Emax - ???, 1/Hill ) )
# set to LARGE dose (LARGE pre-specified) if ??? > Emax
DOSE.BigEff <- DOSE.BigEff.0 * step( Emax - ??? ) + LARGE * step( ??? - Emax )
}
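The inverse used for DOSE.BigEff.0 comes from solving Emax·D^λ/(D^λ + ED50^λ) = E for D. This gives D = ED50·(E/(Emax − E))^(1/λ). Below is a Python sketch of the same calculation. The sketch assumes Emax > 0 and a positive target effect. It returns a caller-chosen `large` value (a hypothetical stand-in for LARGE) when the target cannot be reached. Unlike the WinBUGS version, which adds two step() terms, it branches explicitly.

```python
def emax_effect(dose, emax, ed50, hill):
    """Effect over placebo at a given dose."""
    return emax * dose ** hill / (dose ** hill + ed50 ** hill)


def dose_for_effect(target, emax, ed50, hill, large=float("inf")):
    """Dose giving an effect of size `target` over placebo, or `large` if target >= Emax.

    Assumes an increasing dose response: emax > 0 and target > 0.
    """
    if emax <= 0 or target <= 0:
        raise ValueError("this sketch assumes emax > 0 and target > 0")
    if target >= emax:
        return large
    return ed50 * (target / (emax - target)) ** (1.0 / hill)


if __name__ == "__main__":
    # Hypothetical posterior-mean parameters: Emax = 6, ED50 = 50, Hill = 1.5.
    for target in (3.0, 4.5, 7.0):
        print(f"target {target}: dose = {dose_for_effect(target, 6.0, 50.0, 1.5, large=1000.0):.1f}")
```

Applied to each MCMC draw of (Emax, ED50, Hill), rather than to point estimates, this gives a posterior distribution for the required dose.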
BugsXLA
Emax models
Pharmacology Biomarker Experiment
BugsXLA
(case study 7.1)
Fixed & random effects Emax models can be fitted using BugsXLA.
Details are not covered in this course.
References
Bolstad, W.M. (2007). Introduction to Bayesian Statistics. 2nd Edition. John Wiley & Sons, New York.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004). Bayesian Data Analysis. 2nd Edition. Chapman & Hall/CRC. (3rd Edition now available.)
Grieve, A. (1991). Predictive probability in clinical trials. Biometrics, 47, 323-330.
Lee, P.M. (2004). Bayesian Statistics: An Introduction. 3rd Edition. Hodder Arnold, London, U.K.
Neuenschwander, B., Capkun-Niggli, G., Branson, M. and Spiegelhalter, D.J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7, 5-18.
Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. John Wiley & Sons, Hoboken, NJ.
O’Hagan, A., Stevens, J. and Campbell, M. (2005). Assurance in clinical trial design. Pharmaceutical Statistics, 4, 187-201.
Spiegelhalter, D., Abrams, K. and Myles, J. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, New York.
Woodward, P. (2012). Bayesian Analysis Made Simple: An Excel GUI for WinBUGS. Chapman & Hall/CRC.