Introduction to Bayesian Methods (I)
C. Shane Reese
Department of Statistics
Brigham Young University
Outline
Definitions
Classical or Frequentist
Bayesian
Comparison (Bayesian vs. Classical)
Bayesian Data Analysis
Examples
Definitions
Problem: Unknown population parameter (θ)
must be estimated.
EXAMPLE #1:
θ = Probability that a randomly selected person will
be a cancer survivor
Data are binary, parameter is unknown and
continuous
EXAMPLE #2:
θ = Mean survival time of cancer patients.
Data are continuous, parameter is continuous.
Definitions
Step 1 of either formulation is to pose a statistical (or probability) model for the random variable that represents the phenomenon.
EXAMPLE #1:
a reasonable choice for f(y|θ) (the sampling density or likelihood function) would be that the number of 6-month survivors (Y) follows a binomial distribution, with n subjects followed and probability θ that any one subject survives.
EXAMPLE #2:
a reasonable choice for f(y|θ) would be that the survival time (Y) has an exponential distribution with mean θ.
Classical (Frequentist) Approach
All pertinent information enters the problem through the likelihood function in the form of data (Y1, ..., Yn):
$$f(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$$
objective in nature
software packages all have this capability
maximum likelihood, unbiased estimation, etc.
confidence intervals, difficult interpretation
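To make the classical recipe concrete for Example #1, here is a minimal sketch with made-up counts (y = 6 survivors out of n = 10 subjects): the maximum likelihood estimate Y/n and a standard Wald confidence interval.

```python
# Classical analysis for binomial data: MLE and 95% Wald confidence interval.
# The counts y and n below are hypothetical, not from the slides.
import math

y, n = 6, 10                                    # hypothetical data
theta_hat = y / n                               # maximum likelihood estimate of theta
se = math.sqrt(theta_hat * (1 - theta_hat) / n)
lower, upper = theta_hat - 1.96 * se, theta_hat + 1.96 * se

print(f"MLE = {theta_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```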
Bayesian Data Analysis
data enters through the likelihood function, with allowance for other information as well
$$p(\theta \mid y_1, \dots, y_n) = \text{const} \times \prod_{i=1}^{n} f(y_i \mid \theta)\,\pi(\theta)$$
reads: the posterior distribution is a constant multiplied by the likelihood multiplied by the prior distribution
posterior distribution: in light of the data our
updated view of the parameter
prior distribution: before any data collection, the
view of the parameter
Additional Information
Prior Distributions
can come from expert opinion, historical studies,
previous research, or general knowledge of a
situation (see examples)
there exists a “flat” or “noninformative” prior which represents a state of ignorance.
Controversial piece of Bayesian methods
Objective Bayes, Empirical Bayes
Bayesian Data Analysis
inherently subjective (prior is controversial)
few software packages have this capability
result is a probability distribution
credible intervals use the language that
everyone uses anyway. (Probability that θ is in
the interval is 0.95)
see examples for demonstration
Mammography
Sensitivity: True Positive (Cancer ID’d!)
Specificity: True Negative (Healthy not ID’d!)

                 Test Result
Patient Status   Positive   Negative
Cancer           88%        12%
Healthy          24%        76%
Mammography Illustration
My friend (40!!!) heads into her OB/GYN for a mammogram (per her doctor’s orders) and gets a positive test result.
Does she have cancer?
Specificity, sensitivity both high! Seems likely ... or
does it?
Important points: incidence of breast cancer in
40 year old women is 126.2 per 100,000 women.
Bayes Theorem for Mammography
$$\Pr(\text{cancer} \mid \text{positive}) = \frac{\Pr(\text{positive} \mid \text{cancer})\,\Pr(\text{cancer})}{\Pr(\text{positive})}$$
$$= \frac{\Pr(\text{positive} \mid \text{cancer})\,\Pr(\text{cancer})}{\Pr(\text{positive} \mid \text{cancer})\,\Pr(\text{cancer}) + \Pr(\text{positive} \mid \text{healthy})\,\Pr(\text{healthy})}$$
$$= \frac{0.88(0.00126)}{0.88(0.00126) + 0.24(0.99874)} = 0.00461!!!$$
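A small sketch of this calculation in code; the function name is mine, and the 0.88, 0.24, and 0.00126 are the slide’s values.

```python
# Bayes' theorem for a positive screening test:
# Pr(cancer | positive) = sens * prior / (sens * prior + fp_rate * (1 - prior)).
def pr_cancer_given_positive(sensitivity, false_positive_rate, prior):
    numerator = sensitivity * prior
    denominator = numerator + false_positive_rate * (1 - prior)
    return numerator / denominator

print(pr_cancer_given_positive(0.88, 0.24, 0.00126))   # about 0.0046
```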
Mammography Tradeoffs
Impacts of false positive
Stress
Invasive follow-up procedures
Worth the trade-off with a less than 1% (0.46%) chance you actually have cancer???
Mammography Illustration
My mother-in-law had the same diagnosis in 2001. Holden, UT is a “downwinder” town, and she was 65.
Does she have cancer?
Specificity, sensitivity both high! Seems likely ... or
does it?
Important points: incidence of breast cancer in
65 year old women is 470 per 100,000 women,
and approx 43% in “downwinder” cities.
Does this change our assessment?
Downwinder Mammography
$$\Pr(\text{cancer} \mid \text{positive}) = \frac{\Pr(\text{positive} \mid \text{cancer})\,\Pr(\text{cancer})}{\Pr(\text{positive})}$$
$$= \frac{\Pr(\text{positive} \mid \text{cancer})\,\Pr(\text{cancer})}{\Pr(\text{positive} \mid \text{cancer})\,\Pr(\text{cancer}) + \Pr(\text{positive} \mid \text{healthy})\,\Pr(\text{healthy})}$$
$$= \frac{0.88(0.42)}{0.88(0.42) + 0.24(0.58)} = 0.7251!!!$$
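The same sketch function from the earlier slide applies; only the prior changes.

```python
# Same hypothetical function as before, now with the downwinder prior of 0.42.
print(pr_cancer_given_positive(0.88, 0.24, 0.42))   # about 0.73
```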
Modified Example #1
One person in the class stands at the back and throws the ball to the target on the board (10 times).
before we have the person throw the ball ten times
does the choice of person change the a priori belief
you have about the probability they will hit the
target (θ)?
before we have the person throw the ball ten times
does the choice of target size change the a priori
belief you have about the probability they will hit the
target (θ)?
Prior Distributions
a convenient choice for this prior information is
the Beta distribution where the parameters
defining this distribution are the number of a priori
successes and failures. For example, if you
believe your prior opinions on the success or
failure are worth 8 throws and you think the
person selected can hit the target drawn on the
board 6 times, we would say that θ has a Beta(6,2) distribution.
Bayes for Example #1
Classically, if our data are Binomial(n, θ), then we would calculate Y/n as our estimate and use a confidence interval formula for a proportion.
If our data are Binomial(n, θ) and our prior
distribution is Beta(a,b), then our posterior
distribution is Beta(a+y,b+n−y).
thus, in our example:
a =
b =
n =
y =
and so the posterior distribution is: Beta( , )
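A minimal sketch of this conjugate update, assuming the Beta(6, 2) prior elicited on the previous slide and a hypothetical outcome of 7 hits in 10 throws (fill in the class results).

```python
# Beta-Binomial conjugate update: a Beta(a, b) prior with Binomial(n, theta)
# data gives a Beta(a + y, b + n - y) posterior. Data below are hypothetical.
from scipy import stats

a, b = 6, 2          # prior pseudo-counts: 6 hits, 2 misses
n, y = 10, 7         # hypothetical class data: 7 hits in 10 throws

posterior = stats.beta(a + y, b + n - y)
lower, upper = posterior.interval(0.95)       # central 95% credible interval
print(f"posterior mean = {posterior.mean():.3f}, "
      f"95% credible interval = ({lower:.3f}, {upper:.3f})")
```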
Bayesian Interpretation
Therefore we can say that the probability that θ is
in the interval ( , ) is 0.95.
Notice that we don’t have to address the problem
of “in repeated sampling”
this is a direct probability statement
relies on the prior distribution
Example: Phase II Dose Finding
Goal:
Fit models of the form:
$$y_d = f(d) + \varepsilon_d, \qquad \varepsilon_d \sim \mathrm{N}(0, \sigma^2),$$
where d = 1, ..., D is the dose level.
Definition of Terms
ED(Q):
Lowest dose for which Q% of efficacy is achieved
Multiple definitions:
Def. 1: based on the target response $f(0) + Q\,(f(d_{\max}) - f(0))$
Def. 2: based on the effect over placebo $Q\,(f(d_{\max}) - f(0))$
with ED(Q) taken as the $\arg\min$ over $d \in \{1, \dots, D\}$ of the doses whose response reaches the target.
Example: Q=.95, ED95 dose is the lowest dose for
which .95 efficacy is achieved
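A minimal sketch of this definition; the dose grid and fitted mean responses below are made up.

```python
# ED(Q): the lowest dose whose fitted mean response reaches
# f(0) + Q * (f(d_max) - f(0)). Dose grid and fitted values are hypothetical.
def ed_q(doses, f, Q=0.95):
    target = f[0] + Q * (f[-1] - f[0])           # f(0) + Q * (f(d_max) - f(0))
    for dose, fitted in zip(doses[1:], f[1:]):   # skip placebo (dose 0)
        if fitted >= target:
            return dose
    return None                                  # no dose reaches the target

doses = [0, 10, 20, 40, 80, 120]         # placebo plus five active doses
f = [1.0, 2.5, 3.8, 4.6, 4.9, 5.0]       # hypothetical fitted mean responses
print(ed_q(doses, f, Q=0.95))            # -> 80
```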
Classical Approach
Completely randomized design
n_d ≡ n
Perform F-test for difference between
groups
If significant at α = 0.05, then call the trial a
“success”, and determine the most
effective dose as the lowest dose that
achieves some pre-specified criteria (ED95)
Bayesian Adaptive Approach
Assign patients to doses adaptively based on the
amount of information about the dose-response
relationship.
Goal: maximize expected change in information
gain:
$$\Delta \mathrm{SD}_d = \Delta V(Y_d)\,\Pr(d = \mathrm{ED95})$$
Weighted average of the posterior variances
and the probability that a particular dose is the
ED95 dose.
Probability of Allocation
Assign patients to doses based on
$$r_d = \frac{\Delta \mathrm{SD}_d}{\sum_{d=1}^{D} \Delta \mathrm{SD}_d}$$
where r_d is the probability of being assigned to dose d.
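A minimal sketch of the allocation rule, with made-up ΔSD values for six doses.

```python
# Adaptive allocation: r_d proportional to the information measure Delta SD_d.
# The Delta SD values below are hypothetical.
delta_sd = [0.12, 0.40, 0.25, 0.18, 0.10, 0.07]    # one value per dose d = 1..D
total = sum(delta_sd)
r = [value / total for value in delta_sd]          # r_d = DSD_d / sum of DSD_d
print([round(p, 3) for p in r])                    # allocation probabilities
```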
Four Decisions at Interim Looks
Stop trial for success: the trial is a
success, let’s move on to next phase.
Stop trial for futility: the trial is going
nowhere, let’s stop now and cut our
losses.
Stop trial because the maximum
number of patients allowed is
reached (Stop for cap): trial outcome
is still uncertain, but we can’t afford to
continue trial.
Continue
Stop for Futility
The dose-finding trial is stopped
because there is insufficient evidence
that any of the doses is efficacious.
If the posterior probability that the
mean change for the most likely ED95
dose is within a “clinically meaningful
amount” of the placebo response is
greater than 0.99 then the trial stops
for futility.
Stop for Success
The dose-finding trial is stopped when
the current probability that the ED95*
is sufficiently efficacious is sufficiently
high.
If the posterior probability that the most likely ED95 dose is better than placebo reaches a high threshold (0.99 or higher), then the trial stops early for success.
Note: Posterior (after updated data)
probability drives this decision.
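A hedged sketch of how such a rule could be checked from posterior draws; the draws below are simulated stand-ins, not output from the actual trial model.

```python
# Checking a success rule from posterior (MCMC-style) draws of the placebo mean
# and the mean at the currently most likely ED95 dose. Draws are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
theta_placebo = rng.normal(1.0, 0.3, size=10_000)   # posterior draws, placebo
theta_ed95 = rng.normal(2.2, 0.4, size=10_000)      # posterior draws, ED95 dose

pr_better = np.mean(theta_ed95 > theta_placebo)     # Pr(ED95 dose beats placebo)
print(pr_better, "stop for success" if pr_better >= 0.99 else "continue")
```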
Stop for Cap
Cap: If the sample size reaches the
maximum (the cap) defined for all
dose groups the trial stops.
Refine definition based on
application. Perhaps one dose
group reaching max is of interest.
Almost always $$$ driven.
Continue
Continue: If none of the above
three conditions hold then the trial
continues to accrue.
Decision to continue or stop is
made at each interim look at the
data (accrual is in batches)
Benefits of Approach
Statistical: weighting by the
variance of the response at each
dose allows quicker resolution of
dose-response relationship.
Medical: Integrating over the
probability that each dose is ED95
allows quicker allocation to more
efficacious doses.
Example of Approach
Reduction in average number of
events
Y=reduction of number of events
D=6 (5 active, 1 placebo)
Potential exists for a non-monotonic dose-response relationship.
Let ν(d) be the dose value for dose d.
Model for Example
$$Y_i \sim \theta_{d_i} + \mathrm{N}(0, \sigma^2)$$
$$\theta_0 \sim \mathrm{N}(0, \tau^2)$$
$$\theta_d \sim \mathrm{N}\!\left(\theta_{d-1},\, [\nu(d) - \nu(d-1)]\,\tau^2\right)$$
$$\tau^2 \sim \mathrm{IG}(0.001, 1000)$$
$$\sigma^2 \sim \mathrm{IG}(0.001, 1000)$$
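A minimal sketch of one draw from the dynamic prior on the dose means; the dose values and τ are illustrative, and in the full model σ² and τ² carry the inverse-gamma priors above.

```python
# One prior draw from the random-walk part of the model:
# theta_0 ~ N(0, tau^2), theta_d ~ N(theta_{d-1}, [v(d) - v(d-1)] * tau^2).
# Dose values and tau below are illustrative, not taken from the slides.
import numpy as np

rng = np.random.default_rng(1)
dose_values = [0, 10, 20, 40, 80, 120]     # v(d): placebo plus five active doses
tau = 0.1                                  # illustrative value of tau

theta = [rng.normal(0.0, tau)]             # theta_0
for previous, current in zip(dose_values[:-1], dose_values[1:]):
    step_sd = np.sqrt(current - previous) * tau
    theta.append(rng.normal(theta[-1], step_sd))

print(np.round(theta, 3))                  # one draw of the dose-mean curve
```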
Dynamic Model Properties
Allows for flexibility.
Borrows strength from
“neighboring” doses and similarity
of response at neighboring doses.
Simplified version of Gaussian
Process Models.
Potential problem: semi-parametric, thus only considers doses within the dose range; the target response is $\mu_0 + 0.95(\mu_{\max} - \mu_0)$ (hence the ED95*).
Example Curves
[Figure: example dose-response curves]
Simulations
5000 simulated trials at each of the
5 scenarios
Fixed dose design,
n_d ≡ n = 130
Bayesian adaptive approach as
outlined above
Compare two approaches for
each of 5 cases with sample size,
power, and type-I error
Results (power & alpha)
Case   Pr(S)   Pr(F)   Pr(cap)   P(Rej)
1      .018    .973    .009      .049
2      1       0       0         .235
3      1       0       0         .759
4      1       0       0         .241
5      1       0       0         .802
Results (n)
        Dose
Case    0      10     20     40     80     120
1       51.6   26.1   26.2   31.2   33.5   36.8
2       28.4   10.9   13.8   18.9   22.5   19.2
3       27.7   11.3   14.5   25.2   17     15.2
4       31.2   10.8   13.3   19.6   22.2   27.8
5       28.9   18.0   22.3   21.1   14.5   10.7
Fixed   130    130    130    130    130    130
Observations
Adaptive design serves two purposes:
Get patients to efficacious doses
More efficient statistical estimation
Sample size considerations
Dose expansion -- inclusion of safety
considerations
Incorporation of uncertainties!!!
Predictive inference is POWERFUL!!!
Conclusions
Science is subjective (what about the choice of
a likelihood?)
Bayes uses all available information
Makes interpretation easier
BAD NEWS: I have shown very simple cases . . . they get much harder.
GOOD NEWS: They are possible (and practical)
with advanced computational procedures