Bayesian statistics
Advanced Statistics for Interventional Cardiologists
What you will learn
• Introduction
• Basics of multivariable statistical modeling
• Advanced linear regression methods
• Hands-on session: linear regression
• Bayesian methods
• Logistic regression and generalized linear model
• Resampling methods
• Meta-analysis
• Hands-on session: logistic regression and meta-analysis
• Multifactor analysis of variance
• Cox proportional hazards analysis
• Hands-on session: Cox proportional hazards analysis
• Propensity analysis
• Most popular statistical packages
• Conclusions and take home messages
(Programme split across a 1st and a 2nd day.)
Bayesian methods
• Key concepts and scope
• Frequentists vs Bayesians
• Current applications
• Future applications
Hype or hope?
PubMed query on 10 January 2009
Who was Bayes?
1702–1761
Bayes T. Essay towards solving a problem in the
doctrine of chances. Philos Trans R Soc Lond
1763;53:370–418.
The Bayes theorem
The main feature of Bayesian statistics is that it takes into account prior knowledge of the hypothesis:

P(H | D) = P(D | H) × P(H) / P(D)

where P(H | D) is the posterior (conditional) probability of hypothesis H given data D, P(D | H) is the likelihood of the hypothesis (the conditional probability of the data given H), P(H) is the prior (marginal) probability of the hypothesis, and P(D) is the probability of the data (its prior or marginal probability, acting as a normalizing constant).
Thus it relates the conditional and marginal probabilities of two random events, and it is often used to compute posterior probabilities given observations.
In other words
Posterior odds = Prior odds × Bayes factor
• Prior odds: the odds that the hypothesis is true before seeing the data (the "subjective" component)
• Bayes factor: the data component (the evidence)
• Posterior odds: the final odds that the hypothesis is true
Key concepts
• Resampling: Given a set of values obtained by drawing from a probability distribution (for example, the sample obtained at step n of a sequential Monte Carlo scheme), resampling is the procedure of drawing a new sample from these values, typically according to appropriate resampling weights.
• Probability distribution: The probability measure obtained by considering the probabilities P[X ∈ A] of a random object X taking on values in various regions A.
• Conditional probability: The conditional probability P[A|B] of A given B is the ratio P[A and B]/P[B] of the probability of both A and B to the probability of B.
• Prior distribution: The probability distribution of an unknown parameter θ before data is observed (in the Bayesian paradigm of statistics, the prior distribution expresses one's beliefs about what value might be taken by the parameter).
• Posterior distribution: The conditional probability distribution of an unknown parameter θ after data is observed, and hence conditional on that data. Thus if we observe the result Y = y, then the posterior probability of θ lying in the region A is given by the conditional probability P[θ ∈ A | Y = y].
Prior pitfalls
• Flat (uninformative) priors mean that the posterior probability is directly proportional to the likelihood
• The value of H at the peak of the posterior distribution is then equal to the maximum likelihood estimate of H
• Informative priors can have a strong effect on posterior probabilities (see the sketch below)
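As a minimal illustration (pure Python; the counts are hypothetical), a beta-binomial example shows both points: under a flat beta(1, 1) prior the posterior mode equals the maximum likelihood estimate, while an informative prior pulls it away:

```python
# Hypothetical data: 7 successes in 10 trials; the MLE of the rate is 7/10 = 0.70.
s, f = 7, 3

# Flat (uninformative) beta(1, 1) prior: posterior is beta(1 + s, 1 + f),
# whose mode (a - 1) / (a + b - 2) equals the MLE.
a, b = 1 + s, 1 + f
print("flat-prior posterior mode:", (a - 1) / (a + b - 2))          # 0.70 = MLE

# Informative beta(20, 20) prior (centred on 0.5): posterior is beta(20 + s, 20 + f).
a, b = 20 + s, 20 + f
print("informative-prior posterior mode:", (a - 1) / (a + b - 2))   # ~0.54, pulled toward 0.5
```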
Do you already practice
Bayesian methods?
Gill et al, BMJ 2005
Do you already practice Bayesian methods?
A non-invasive test for plaque vulnerability generates the following results: if a tested patient has the disease, the test returns a positive result 99% of the time, or with probability 0.99; if a tested patient does not have the disease, the test returns a positive result 5% of the time, or with probability 0.05.
Naively, one might think that only 5% of positive test results are false, but that is quite wrong.
Suppose that only 0.1% of the population has vulnerable plaques, so that a randomly selected patient has a 0.001 prior probability of having the disease.
We can use Bayes' theorem to calculate the probability that a positive test result is a false positive…
Do you already practice Bayesian methods?
Let A represent the condition in which the patient has vulnerable plaques, and B represent the evidence of a positive test result. Then the probability that the patient actually has the disease given the positive test result is

P(A | B) = P(B | A) × P(A) / [P(B | A) × P(A) + P(B | not A) × P(not A)]
         = (0.99 × 0.001) / (0.99 × 0.001 + 0.05 × 0.999) ≈ 0.019

hence the probability that a positive result is a false positive is about 1 − 0.019 = 0.98, or 98%.
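The arithmetic above can be checked in a few lines of Python, transcribing the slide's numbers directly:

```python
# Prevalence of vulnerable plaques and test characteristics from the example.
p_disease = 0.001       # P(A): prior probability of disease
sens = 0.99             # P(B|A): positive test given disease
false_pos_rate = 0.05   # P(B|not A): positive test given no disease

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_positive = sens * p_disease + false_pos_rate * (1 - p_disease)
p_disease_given_pos = sens * p_disease / p_positive

print(round(p_disease_given_pos, 3))      # ~0.019
print(round(1 - p_disease_given_pos, 2))  # ~0.98: about 98% of positives are false
```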
Why Bayes?
• On-line learning (ideal for adapting)
• Predictive probabilities (including modeling outcome relationships)
• Synthesis (via hierarchical modeling, for example)
Current stance
Wijeysundera et al, JCE 2009
Typical output data
• For a typical Bayesian data analysis, we might summarize our findings by reporting:
– the posterior mean and mode,
– several important posterior percentiles (corresponding to, say, probability levels .025, .25, .50, .75, and .975),
– a plot of the posterior itself if it is multimodal, highly skewed, or otherwise badly behaved, and possibly
– posterior probabilities that arise from the context of the problem (a minimal sketch follows).
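As a minimal sketch (Python with numpy; the normal posterior here is a hypothetical stand-in), such a report can be generated from posterior draws:

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(3.7, 1.7, size=10_000)   # hypothetical posterior sample

print(f"posterior mean: {draws.mean():.2f}")
qs = np.percentile(draws, [2.5, 25, 50, 75, 97.5])
print("posterior percentiles (.025, .25, .50, .75, .975):", qs.round(2))
# For a multimodal or skewed posterior, one would also plot a histogram of `draws`.
```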
Example from Diamond and Kaul's spreadsheet
1: enter prior (eg subjective) expected benefit or disadvantage of treatment A vs B (eg 20% decrease -> 0.80; 30% increase -> 1.3)
2: read log odds ratio (OR) and its standard deviation (SD) computed from data entered at 1
3: enter data from 2 into "Prior"
4: enter results of current study
5: read frequency distribution of prior, data and posterior probabilities
6: test hypothesis that A is better or worse than B
7 & 8: verify when posterior probability changes significantly depending on variability (7) or point estimate (8) of prior probability
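The arithmetic the spreadsheet performs can be sketched as a conjugate normal update on the log odds ratio scale (a Python sketch, not the authors' actual code; the prior SD and the current-study numbers below are hypothetical):

```python
import math
from statistics import NormalDist

# Steps 1-3 (hypothetical prior): 20% expected benefit of A vs B -> OR 0.80,
# expressed on the log odds ratio scale; the prior SD of 0.30 is an assumed value.
prior_mean, prior_sd = math.log(0.80), 0.30

# Step 4 (hypothetical current study): observed OR 0.70 with SD 0.25 on the log scale.
data_mean, data_sd = math.log(0.70), 0.25

# Step 5: conjugate normal update, a precision-weighted average of prior and data.
w_prior, w_data = 1 / prior_sd**2, 1 / data_sd**2
post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
post_sd = (w_prior + w_data) ** -0.5

print(f"posterior OR: {math.exp(post_mean):.2f}")
# Step 6: posterior probability that A is better than B, i.e. P(log-OR < 0).
print(f"P(A better than B): {NormalDist(post_mean, post_sd).cdf(0.0):.3f}")
```

Steps 7 and 8 correspond to re-running this update while varying prior_sd and prior_mean, respectively, to see when the posterior conclusion changes.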
Empirical and hierarchical
Bayes methods
In addition to likelihood, Bayesian analysis depends on a prior distribution for the model parameters. This prior can be non-parametric or parametric, depending on unknown parameters which may in turn be drawn from some second-stage prior.
This sequence of parameters and priors constitutes a hierarchical model. The hierarchy must stop at some point, with all remaining prior parameters assumed known. Rather than make this assumption, the empirical Bayes (EB) approach uses the observed data to estimate these final-stage parameters (or to directly estimate the Bayes decision rule) and then proceeds as though the prior were known.
Empirical Bayes methods
Thus, empirical Bayes methods are a class of methods which use empirical data to evaluate or approximate the conditional probability distributions that arise from Bayes' theorem.
These methods allow one to estimate quantities (probabilities, averages, etc.) about an individual member of a population by combining information from empirical measurements on the individual and on the entire population.
By relying little or not at all on prior subjective assumptions, empirical Bayes methods may be more robust and reliable than other Bayes methods.
Bayesian inference
• Statistical model f(y│θ) for how data, y, are
related to unknown parameters, θ.
Typical parameters are means and
standard deviations for continuous y and
proportions for binary y.
• Prior belief p(θ) about parameters which
gives a probability distribution expressing
which values of θ are most likely a priori (ie
before we see the data).
Bayesian inference
• Posterior distribution p(θ│y) combines data
model and prior using Bayes’ rule.
• Estimates of θ, standard errors, intervals all
come from p(θ│y).
• Simple principle: for many Bayesian
statistical models, posterior estimate is
weighted average of the prior estimate and
the standalone (frequentist) estimate.
Simple example
• Two arm clinical trial: parameter, θ, is the
average treatment versus control difference
for a continuous outcome.
• Prior estimate: θ=4.0 (95%CI -2.0, 10.0).
• Data from trial give standalone estimate of
θ=3.5 (95%CI -0.5, 7.5).
• Posterior estimate: θ=3.7 (95% CI 0.3-7.0).
• Posterior estimate pulled towards prior
mean, with posterior interval narrower.
Inference for simple example
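The update in this simple example can be reproduced as a precision-weighted average (a minimal Python sketch assuming normal prior and likelihood, with SDs back-calculated from the stated 95% intervals):

```python
# Prior: 4.0 (95% CI -2.0 to 10.0); data: 3.5 (95% CI -0.5 to 7.5).
prior_mean, prior_sd = 4.0, 6.0 / 1.96   # SD = half-width of the 95% CI / 1.96
data_mean, data_sd = 3.5, 4.0 / 1.96

# Posterior = precision-weighted average of prior and standalone estimates.
w_prior, w_data = 1 / prior_sd**2, 1 / data_sd**2
post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
post_sd = (w_prior + w_data) ** -0.5

print(round(post_mean, 1))                         # 3.7, pulled toward the prior mean
print(round(post_mean - 1.96 * post_sd, 1),
      round(post_mean + 1.96 * post_sd, 1))        # roughly (0.3, 7.0), narrower than either
```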
Incremental trials
• Change setting slightly: suppose we have
an original trial that was suggestive but not
conclusive.
• We want to design an incremental trial to
generate additional data.
• Would like to use data from both trials in the
analysis.
Incremental trials with full borrowing
(all patients are considered similar)
Incremental trials with partial borrowing (discount factors varying depending on pre-specified decisions)
Multiple sources of data
• What if we have more than one original trial?
• We can combine inference from these trials together with the incremental data to produce posterior inference in exactly the same way.
• Should different trials have different discount factors?
• This is similar to meta-analysis, but an important difference is that the main interest is inference on the incremental trial.
Subgroup culling
• An incremental trial may focus on a
promising subgroup from an original trial.
• We would like to perform inference for
subgroup in incremental trial borrowing
possibly more from the same subgroup in
the original trial and less from the other
subgroups.
Interim summary
• Goal of Bayesian analysis in an incremental
clinical trial: combine standalone inference
for new data with some evidence from
previous trials and possibly other
subgroups.
• Question: how much to discount the
evidence from all these prior sources of
information in achieving this goal?
Flexible borrowing
• Common sense approach: amount of
borrowing should depend on similarity of
prior source to incremental data.
• We use this principle instinctively in our
daily lives, so why not in statistics?
• We would never go into a negotiation or diagnosis with the idea of relying on exactly 25% previous knowledge and 75% new information.
Hierarchical modeling
• Statisticians use hierarchical models to
achieve flexible pooling of data sources.
• Amount of pooling is data-driven: more
similarity of prior data implies more weight
will be given to it in determining the
posterior inference.
• Recommended approach of FDA in Draft
Guidance for Bayesian Clinical Trials of
Medical Devices.
Bridging the divide
• FDA: Bayesian designs must preserve
standard operating characteristics required
of all clinical trials:
Adequate power (low type II error) when
treatment effect is positive
Low type I error when no treatment effect.
• Problem is that a fixed weight prior
distribution centered on a positive treatment
effect inflates type I error.
Central concepts of
hierarchical modeling
• If incremental data are quite different from prior data, the model will analyze the incremental data in a mostly standalone manner (keeping type I error low).
• If studies are quite similar, the model will borrow more from the prior (increasing power).
• The amount of borrowing from prior studies is not predetermined, but is a function of their similarity to the incremental data.
Hierarchical modeling with culling
(subgroup pooling): the case of
moderate pooling
Hierarchical modeling with culling
(subgroup pooling): the case of
minimal pooling
Hierarchical models
• Each data source/subgroup is assumed to have a unique value of θ.
• These unique values θ1, θ2, θ3, … are not allowed to be arbitrarily disparate: they are constrained to follow a common distribution.
• The parameter space is high-dimensional even for fairly simple models.
• Analytic calculations for Bayesian hierarchical modeling are typically impossible, so estimates are obtained by approximation or simulation (see the sketch below).
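A minimal sketch of the shrinkage idea (Python with numpy; the data and the crude variance estimator are hypothetical stand-ins for a full MCMC analysis):

```python
import numpy as np

# Hypothetical observed effects and within-study variances for three data sources.
y = np.array([3.5, 4.2, 1.0])        # estimated theta_1, theta_2, theta_3
v = np.array([1.0, 1.5, 1.2])        # sampling variance of each estimate

# Crude moment-style estimate of the between-study variance tau^2
# (real analyses would use MCMC or, e.g., the DerSimonian-Laird estimator).
mu_hat = np.average(y, weights=1 / v)
tau2 = max(0.0, float(np.average((y - mu_hat) ** 2, weights=1 / v)) - v.mean())

# Shrinkage: each theta_i is pulled toward the common mean; the smaller tau^2
# (i.e. the more similar the sources), the stronger the pooling.
shrink = v / (v + tau2)
theta = shrink * mu_hat + (1 - shrink) * y
print(theta.round(2))
```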
Advantages of Bayesian
hierarchical models
• Hierarchical models give ability to flexibly
pool new studies with previous ones.
• Amount of similarity will dictate the amount
of borrowing from previous studies in
analyzing the results of the new one.
• A Bayesian approach is natural, and the
hierarchical model structure allows type I
error to be kept to standard low levels
Importance of computing
• Important factors converging to make Bayesian inference feasible for hierarchical models:
– Moore's law (computing power has increased exponentially: better than a 10,000-fold increase since 1980)
– Focus on Monte Carlo (ie sampling-based simulation) methods for Bayesian inference
– Development of Gibbs sampling and other Markov chain-based simulation techniques
Bayes and Markov at Monte Carlo
Determination of posterior distributions in
most Bayesian analyses comes down to the
evaluation of complex, often high-dimensional
integrals.
Thus, in all but the simplest model settings some intractable
integrations remain. The earliest solution involved asymptotic
methods to obtain analytic approximations. The simplest
such result is to use a normal approximation to the posterior,
a Bayesian version of the Central Limit Theorem. More
complicated asymptotic techniques enable more accurate,
possibly asymmetric posterior approximations.
Bayes and Markov at Monte Carlo
As a result, most Bayesians have turned to
Monte Carlo integration methods.
Even more powerful are the more recently
developed iterative Monte Carlo methods,
including the Gibbs sampler.
These methods produce a Markov chain, the output of which
corresponds to (through a correlated sequence of random
numbers) a sample from the joint posterior distribution.
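Not the Gibbs sampler itself, but a closely related random-walk Metropolis sketch (pure Python; the target and tuning are hypothetical) shows how a Markov chain's output approximates a posterior sample:

```python
import math
import random

random.seed(1)

def log_post(theta):
    """Hypothetical target: the beta(170, 130) posterior, up to a constant."""
    if not 0 < theta < 1:
        return -math.inf
    return 169 * math.log(theta) + 129 * math.log(1 - theta)

theta, chain = 0.5, []
for _ in range(50_000):
    prop = theta + random.gauss(0, 0.05)           # random-walk proposal
    delta = log_post(prop) - log_post(theta)
    # Metropolis rule: accept with probability min(1, posterior ratio).
    if delta >= 0 or random.random() < math.exp(delta):
        theta = prop
    chain.append(theta)

kept = chain[5_000:]                               # discard burn-in
print(f"chain mean: {sum(kept) / len(kept):.3f}")  # close to 170/300 = 0.567
```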
Monte Carlo simulations to approximate
Bayesian posterior probability
Suppose we didn't know how to analytically integrate the beta(eg 170, 130) posterior… but we do know how to simulate from a beta distribution.
[Figure: histograms of 100 and 10,000 simulated draws of theta plotted against the true posterior density; sample means 0.569 and 0.566 versus a true posterior mean of 0.567.]
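The figure's simulation can be reproduced in a few lines (a numpy sketch; the seed, and hence the exact sample means, are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
for n in (100, 10_000):
    draws = rng.beta(170, 130, size=n)             # simulate from the posterior
    print(f"{n:>6} samples: mean = {draws.mean():.3f}")
print(f"true posterior mean = {170 / (170 + 130):.3f}")   # 0.567
```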
The other side of the moon
• An irony of the Bayesian renaissance is that some Bayesian
statisticians have become a major obstacle to the dissemination
of Bayesian approaches.
• They have done this by agonizing over representations of
ignorance (rather than information) and by insisting on precision
in prior specification and analytical computation far beyond
anything required of frequentist methods or by the messy
problems of observational data analysis.
• Their preferred methods, Markov chain Monte Carlo (MCMC), require special software, have their own special problems, obscure important parallels between frequentist and Bayesian methods, and are unnecessary for the imprecise data and goals of everyday epidemiology (which is largely only semi-quantitative inference about an adjusted risk comparison).
Bayesian methods
• Key concepts and scope
• Frequentists vs Bayesians
• Current applications
• Future applications
The Bayes theorem
Frequentists vs Bayesians
Bayes vs Fisher
Goodman, Ann Intern Med 1999
"Classical" statistical inference vs Bayesian inference

Issue: External information
  Frequentist: Informally used in design.
  Bayesian: Used formally by specifying a prior probability distribution.
Issue: Parameter
  Frequentist: A fixed state of nature.
  Bayesian: An unknown quantity which can have a probability distribution.
Issue: Basic question
  Frequentist: "How likely are these data given a particular value of the parameter?"
  Bayesian: "How likely is a particular value of the parameter given these data?"
Issue: Reporting statistical results
  Frequentist: Likelihood functions, p-values, confidence intervals.
  Bayesian: Plots of posterior distributions of the parameter, calculation of specific posterior probabilities of interest, and use of the posterior distribution in formal decision analysis.
Issue: Interim analyses
  Frequentist: The number of analyses dictates overall and nominal significance levels and repeated confidence intervals.
  Bayesian: Probability and credible interval calculations are not affected by the number or timing of interim analyses.
Issue: Interim predictions
  Frequentist: Conditional power analyses.
  Bayesian: Predictive probability of reaching a firm conclusion.
Issue: Dealing with subsets in trials
  Frequentist: Adjusted p-values (e.g. Bonferroni).
  Bayesian: Subset effects shrunk towards zero by a "skeptical" prior.
Bayes’ advantages
• The "objectivity" of frequentist statistics has been obtained by disregarding
any prior knowledge about the process being measured. Yet in science
there usually is some prior knowledge about the process being measured.
Throwing this prior information away is wasteful of information. Bayesian
statistics uses both sources of information; the prior information we
have about the process and the information about the process contained
in the data. They are combined using Bayes’ theorem.
• It allows direct probability statements about the parameters. This is
much more useful to a scientist than the confidence statements allowed
by frequentist statistics.
• It has a single tool, Bayes’ theorem, which is used in all situations.
• It often outperforms frequentist methods, even when judged by frequentist criteria.
• It has a straightforward way of dealing with nuisance parameters.
They are always marginalized out of the joint posterior distribution.
• It gives a way to find the predictive distribution of future observations. This is not always easily done in a frequentist way.
Bayes' advantages
In other words, the frequentist approach to hypothesis testing gives the likelihood that the data fit the hypothesis, whereas the Bayesian approach gives the probability that the hypothesis fits the data.
Bridging the divide
• The statistician faces many challenges: designing complex
studies, summarizing complex data, fitting probability models,
drawing conclusions about the present, and making predictions.
• Currently, most statistical analyses are performed with software
packages which use methods based on a classical, or
frequentist, statistical philosophy.
• In this framework, maximum likelihood estimates (MLEs) and
hypothesis tests based on p-values figure prominently.
• Against this background, the Bayesian approach to statistical
design and analysis is emerging as an increasingly effective and
practical alternative to the frequentist one.
• Due to computing advances which enable Bayesian designs and
analyses, the battles between frequentists and Bayesians that
were once common are being replaced by a single, more
eclectic approach.
Bayesian methods
• Key concepts and scope
• Frequentists vs Bayesians
• Current applications
• Future applications
Selected history of Bayesian
trials: updated 2004!
• Medical devices (30+)
• 200+ at M.D. Anderson (Phase I, II, I/II)
• Cancer & Leukemia Group B
• Pharma
– ASTIN (Pfizer)
– Pravigard PAC (BMS)
– Other
• Decision analysis (go to phase III?)
Advantages of greater use of Bayesian
methods in clinical research
• Learn faster: more efficient trials
• Spend less: more efficient
drug/device development
• Minimize alpha error inflation
• Reduce ethical issues: better
treatment of patients in clinical trials
ADAPTIVE DESIGNS:
Approach and Methodology
• Look at the accumulating data
• Update probabilities
• Find predictive probabilities
• Use backward induction
• Simulate to find false positive rate and statistical power (a minimal sketch follows)
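As a sketch of the last step (Python with numpy; the design, sample sizes and success cutoff are hypothetical, not those of any actual trial), simulation can estimate the false positive rate and power of a simple Bayesian two-arm design:

```python
import numpy as np

rng = np.random.default_rng(0)

def success_rate(p_control, p_treatment, n=100, n_sims=2_000, cutoff=0.975):
    """Fraction of simulated trials declared successful, i.e. trials in which
    P(treatment rate > control rate | data) exceeds the cutoff, using flat
    beta(1, 1) priors on both arms."""
    wins = 0
    for _ in range(n_sims):
        xc = rng.binomial(n, p_control)                 # simulated control responses
        xt = rng.binomial(n, p_treatment)               # simulated treatment responses
        pc = rng.beta(1 + xc, 1 + n - xc, size=2_000)   # posterior draws, control
        pt = rng.beta(1 + xt, 1 + n - xt, size=2_000)   # posterior draws, treatment
        wins += (pt > pc).mean() > cutoff
    return wins / n_sims

print("false positive rate:", success_rate(0.30, 0.30))  # no true treatment effect
print("power:              ", success_rate(0.30, 0.50))  # true effect present
```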
Consequences of Bayesian
Adaptive Approach
• Fundamental change in way we
do medical research
• More rapid progress
• We’ll get the dose right!
• Better treatment of patients
• . . . at less cost
ADAPTIVE RANDOMIZATION
Giles et al, JCO 2003
• Troxacitabine (T) in acute myeloid
leukemia (AML) combined with cytarabine
(A) or idarubicin (I)
• Adaptive randomization to:
IA vs TA vs TI
• Max n = 75
• End point: Time to CR (< 50 days)
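A minimal sketch of the idea behind such adaptive randomization (Python with numpy; the response rates and the allocation rule are hypothetical, not the actual Giles et al design):

```python
import numpy as np

rng = np.random.default_rng(7)
true_cr = {"IA": 0.50, "TA": 0.30, "TI": 0.25}     # hypothetical CR rates per arm
succ = dict.fromkeys(true_cr, 0)
fail = dict.fromkeys(true_cr, 0)

for _ in range(75):                                # max n = 75, as in the trial
    # Randomize proportionally to each arm's posterior mean response rate
    # under a flat beta(1, 1) prior: better-performing arms get more patients.
    post_mean = {a: (1 + succ[a]) / (2 + succ[a] + fail[a]) for a in true_cr}
    arms = list(true_cr)
    probs = np.array([post_mean[a] for a in arms])
    arm = rng.choice(arms, p=probs / probs.sum())
    if rng.random() < true_cr[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1

print({a: succ[a] + fail[a] for a in true_cr})     # patients allocated per arm
```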
Adaptive vs tailored balanced design with the same false-positive rate & power (mean number of patients by arm):

ORR scenario    pS=0.20, pE=0.35 | pS=0.30, pE=0.45 | pS=0.40, pE=0.55
Arm             Std      Exp     | Std      Exp     | Std      Exp
Adaptive        68       168     | 79       178     | 74       180
Balanced        171      171     | 203      203     | 216      216
Savings         103      3       | 124      25      | 142      36
Facing key problems
Kpozehouen et al, AJE 2005
Usefulness for mega trials
Diamond et al, JACC 2004
Incorporating skepticism into EBM
Impact of Bayesian methods
in current EBM
Bayes & model selection
• Bayesian model selection
– Posterior odds and posterior model probabilities
– Utility measures
– Predictive criteria
• Model selection criteria
– Akaike Information Criterion (AIC)
– Bayes Information Criterion (BIC)
– Other criteria
Bayesian information criterion (BIC)
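For reference (the standard definition, not spelled out on the slide): BIC = k·ln(n) − 2·ln(L̂), where k is the number of model parameters, n the sample size, and L̂ the maximized likelihood; among candidate models, the one with the lowest BIC is preferred. BIC penalizes complexity more heavily than AIC, which uses 2k in place of k·ln(n).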
Bayesian decision model
Pros and Cons: the AMIHOT II
Another real-world application
Bayesian methods
• Key concepts and scope
• Frequentists vs Bayesians
• Current applications
• Future applications
FDA and Bayes: dispelling myths
•Does the FDA Center for Devices and Radiological Health (CDRH) entertain only Bayesian submissions? No, only 5% are Bayesian.
•Are most of the CDRH statisticians Bayesian? No.
•Do the Bayesians in CDRH handle only Bayesian submissions? No.
•Does saying "Bayesian statistics" lead automatically to approval? No.
•Does CDRH force sponsors to use Bayesian approaches? No (although they may be least clinically burdensome).
•Is there a lower success criterion for Bayesian submissions? No; however, there is a different one. If a standard statistical analysis and a Bayesian analysis were to always yield the same basic conclusion, there would be no reason for Bayesian approaches. Often in the Bayesian approach there is prior information that is ignored in the frequentist one.
Bayesian updating tools for
real-time safety monitoring
Software
•FAST*PRO
•GAUSS
•R
•BUGS (Bayesian inference Using Gibbs
Sampling)
•S-PLUS
•SAS
•STATA
•WinBUGS
Questions?
Take home messages
•Bayesian statistics provides a rational
theory of personal beliefs compounded
with real world data in the context of
uncertainty.
•It primarily states that utility maximization, in conjunction with Bayes' theorem, should be the foundation of rational decision-making.
•The appreciation of Bayesian methods is
growing fast both inside and outside the
statistics community.
Take home messages
•Scientists with little or no formal statistical background are discovering Bayesian methods as a viable approach to their problems.
•However, computational issues and
reliance on different priors and model
specifications limit their current role in
clinical research.
•Thus, Bayesian methods are still mostly
confined to regulatory submissions or
complex meta-analytic endeavors.
Before the next break, a question
for you: who is a Bayesian?
A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
For further slides on these topics
please feel free to visit the
metcardio.org website:
http://www.metcardio.org/slides.html