Introduction to Bayesian Methods (I)
C. Shane Reese
Department of Statistics
Brigham Young University
Outline
 Definitions
 Classical or Frequentist
 Bayesian
 Comparison (Bayesian vs. Classical)
 Bayesian Data Analysis
 Examples
Definitions
 Problem: Unknown population parameter (θ)
must be estimated.
 EXAMPLE #1:
 θ = Probability that a randomly selected person will
be a cancer survivor
 Data are binary, parameter is unknown and
continuous
 EXAMPLE #2:
 θ = Mean survival time of cancer patients.
 Data are continuous, parameter is continuous.
Definitions
 Step 1 of either formulation is to pose a statistical (or
probability) model for the random variable that
represents the phenomenon.
 EXAMPLE #1:
 a reasonable choice for f (y|θ) (the sampling density or
likelihood function) would be that the number of 6
month survivors (Y) follows a binomial distribution,
with a total of n subjects followed and probability θ
that any one subject survives.
 EXAMPLE #2:
 a reasonable choice for f (y|θ) would be that survival
time (Y) has an exponential distribution with mean θ.
Classical (Frequentist)
Approach
 All pertinent information enters the problem
through the likelihood function in the form of
data (Y1, . . . , Yn):

f(y1, . . . , yn | θ) = ∏_{i=1}^{n} f(yi | θ)

 objective in nature
 software packages all have this capability
 maximum likelihood, unbiased estimation, etc.
 confidence intervals, difficult interpretation
Bayesian Data Analysis
 data enters through the likelihood function, as well
as an allowance for other information

p(θ | y1, . . . , yn) = constant × ∏_{i=1}^{n} f(yi | θ) × π(θ)

 reads: the posterior distribution is a constant
multiplied by the likelihood multiplied by the prior
distribution
 posterior distribution: our updated view of the
parameter in light of the data
 prior distribution: the view of the parameter
before any data collection
Additional Information
 Prior Distributions
 can come from expert opinion, historical studies,
previous research, or general knowledge of a
situation (see examples)
 there exists a “flat” or “noninformative” prior, which
represents a state of ignorance.
 Controversial piece of Bayesian methods
 Objective Bayes, Empirical Bayes
Bayesian Data Analysis
 inherently subjective (prior is controversial)
 few software packages have this capability
 result is a probability distribution
 credible intervals use the language that
everyone uses anyway. (Probability that θ is in
the interval is 0.95)
 see examples for demonstration
Mammography
o Sensitivity: True Positive (Cancer ID’d!)
o Specificity: True Negative (Healthy not ID’d!)

Patient Status by Test Result:

          Positive   Negative
Cancer    88%        12%
Healthy   24%        76%
Mammography Illustration
 My friend (40!!!) heads into her OB/GYN for a
mammogram (per doctor’s orders) and receives a
positive test result.
 Does she have cancer?
 Specificity, sensitivity both high! Seems likely ... or
does it?
 Important points: incidence of breast cancer in
40 year old women is 126.2 per 100,000 women.
Bayes Theorem for Mammography

Pr(cancer | positive) = Pr(positive | cancer) Pr(cancer) / Pr(positive)

= Pr(positive | cancer) Pr(cancer) / [Pr(positive | cancer) Pr(cancer) + Pr(positive | healthy) Pr(healthy)]

= 0.88(0.00126) / [0.88(0.00126) + 0.24(0.99874)] = 0.00461!!!
Mammography Tradeoffs
 Impacts of false positive
 Stress
 Invasive follow-up procedures
 Worth the trade-off with a less than 1%
(0.46%) chance you actually have cancer???
Mammography Illustration
 My mother-in-law has the same diagnosis in 2001.
 Holden, UT is a “downwinder”, she was 65.
 Does she have cancer?
 Specificity, sensitivity both high! Seems likely ... or
does it?
 Important points: incidence of breast cancer in
65 year old women is 470 per 100,000 women,
and approx 43% in “downwinder” cities.
 Does this change our assessment?
Downwinder Mammography

Pr(cancer | positive) = Pr(positive | cancer) Pr(cancer) / Pr(positive)

= Pr(positive | cancer) Pr(cancer) / [Pr(positive | cancer) Pr(cancer) + Pr(positive | healthy) Pr(healthy)]

= 0.88(0.42) / [0.88(0.42) + 0.24(0.58)] = 0.7251!!!
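The two mammography calculations above can be sketched in a few lines of Python. The function name and structure here are mine, not from the slides; the sensitivity (0.88), false-positive rate (0.24), and priors are the slide's values.

```python
# Posterior probability of cancer given a positive mammogram, via Bayes' theorem.
def pr_cancer_given_positive(prior, sensitivity=0.88, false_positive=0.24):
    """Pr(cancer | positive) = sens*prior / (sens*prior + fp*(1 - prior))."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive * (1.0 - prior))

# 40-year-old: incidence 126.2 per 100,000 women -> prior ~ 0.00126
p_young = pr_cancer_given_positive(0.00126)

# 65-year-old downwinder: prior ~ 0.42 (slide's figure)
p_downwinder = pr_cancer_given_positive(0.42)

print(round(p_young, 4), round(p_downwinder, 4))  # 0.0046 0.7264
```

Note the downwinder case computes to about 0.7264 with these inputs; the slide reports 0.7251, presumably from slightly different rounding of the inputs.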
Modified Example #1
 One person in the class stands at the back and
throws the ball to the target on the board (10
times).
 before we have the person throw the ball ten times
does the choice of person change the a priori belief
you have about the probability they will hit the
target (θ)?
 before we have the person throw the ball ten times
does the choice of target size change the a priori
belief you have about the probability they will hit the
target (θ)?
Prior Distributions
 a convenient choice for this prior information is
the Beta distribution, where the parameters
defining the distribution are the numbers of a priori
successes and failures. For example, if you
believe your prior opinion on success or
failure is worth 8 throws and you think the
person selected can hit the target drawn on the
board 6 times, we would say that θ has a Beta(6,2)
distribution.
Bayes for Example #1
 if our data are Binomial(n, θ) then we would
calculate Y/n as our estimate and use a
confidence interval formula for a proportion.
 If our data are Binomial(n, θ) and our prior
distribution is Beta(a,b), then our posterior
distribution is Beta(a+y,b+n−y).
 thus, in our example:
 a = 6, b = 2, n = 10, y = (number of observed hits)
 and so the posterior distribution is: Beta(6 + y, 2 + 10 − y)
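The conjugate Beta-Binomial update above is a one-liner in code. This sketch assumes the Beta(6,2) prior and n = 10 throws from the example; the outcome y = 7 hits is a made-up illustration, and the 95% credible interval is approximated by Monte Carlo rather than an exact quantile formula.

```python
import random

# Conjugate update from the slides:
# prior Beta(a, b) + y successes in n trials -> posterior Beta(a + y, b + n - y).
def beta_binomial_posterior(a, b, y, n):
    return a + y, b + n - y

# Slide's prior Beta(6, 2), n = 10 throws; y = 7 hits is hypothetical.
a_post, b_post = beta_binomial_posterior(6, 2, y=7, n=10)
print((a_post, b_post))  # (13, 5)

# Monte Carlo 95% credible interval for theta from the posterior.
random.seed(1)
draws = sorted(random.betavariate(a_post, b_post) for _ in range(20000))
lo, hi = draws[int(0.025 * 20000)], draws[int(0.975 * 20000)]
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

The interval (lo, hi) is exactly the kind of statement the next slide interprets: the probability that θ lies in it is 0.95.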
Bayesian Interpretation
 Therefore we can say that the probability that θ is
in the interval ( , ) is 0.95.
 Notice that we don’t have to address the problem
of “in repeated sampling”
 this is a direct probability statement
 relies on the prior distribution
Example: Phase II Dose
Finding
Goal:
 Fit models of the form:

y_d = f(d) + ε_d, where ε_d ~ N(0, σ²)

and d = 1, …, D is the dose level.
Definition of Terms
 ED(Q):
 Lowest dose for which Q% of efficacy is achieved
 Multiple definitions; the one used here:

ED(Q) = argmin_{d ∈ {1, …, D}} { d : f(d) ≥ f(0) + Q( f(d_max) − f(0) ) }

 Example: Q = .95; the ED95 dose is the lowest dose for
which 95% of efficacy is achieved
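The ED(Q) rule above is easy to evaluate on a fitted curve. This is a minimal sketch under my own assumptions: `f_values` stands for fitted mean responses at each dose with placebo at index 0, and the numbers are invented for illustration.

```python
# Find the ED(Q) dose index from fitted mean responses f(0..D),
# using: lowest d with f(d) >= f(0) + Q*(f(d_max) - f(0)).
def ed_q(f_values, q):
    target = f_values[0] + q * (f_values[-1] - f_values[0])
    for d, fd in enumerate(f_values):
        if fd >= target:
            return d
    return None

# Hypothetical fitted means at 6 dose levels (placebo first).
fitted = [0.0, 1.1, 2.5, 3.6, 3.9, 4.0]
print(ed_q(fitted, 0.95))  # 4: the lowest dose achieving 95% of the max effect
```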
Classical Approach
 Completely randomized design with n_d ≡ n
 Perform F-test for difference between
groups
 If significant at α = 0.05, then call the trial a
“success”, and determine the most
effective dose as the lowest dose that
achieves some pre-specified criterion (ED95)
Bayesian Adaptive
Approach
 Assign patients to doses adaptively based on the
amount of information about the dose-response
relationship.
 Goal: maximize expected change in information
gain:

ΔSD_d = ΔV(Y_d) · Pr(d = ED95)
 Weighted average of the posterior variances
and the probability that a particular dose is the
ED95 dose.
Probability of Allocation
 Assign patients to doses based on

r_d = ΔSD_d / Σ_{d=1}^{D} ΔSD_d

where r_d is the probability of being assigned to
dose d
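The allocation rule is just a normalization of the per-dose scores. In this sketch the ΔSD_d values are made up; in the actual design they would come from the posterior variances and Pr(d = ED95) at each interim look.

```python
# Normalize per-dose information scores into assignment probabilities r_d.
def allocation_probs(dsd):
    total = sum(dsd)
    return [x / total for x in dsd]

# Hypothetical ΔSD_d for D = 6 dose groups (placebo first).
dsd = [0.5, 1.2, 2.0, 3.1, 1.8, 0.9]
r = allocation_probs(dsd)
print([round(x, 3) for x in r])  # probabilities summing to 1
```

Doses where the response is still uncertain, or which are likely to be the ED95, get larger r_d and so receive more patients.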
Four Decisions at Interim Looks
 Stop trial for success: the trial is a
success, let’s move on to next phase.
 Stop trial for futility: the trial is going
nowhere, let’s stop now and cut our
losses.
 Stop trial because the maximum
number of patients allowed is
reached (Stop for cap): trial outcome
is still uncertain, but we can’t afford to
continue trial.
 Continue
Stop for Futility
 The dose-finding trial is stopped
because there is insufficient evidence
that any of the doses is efficacious.
 If the posterior probability that the
mean change for the most likely ED95
dose is within a “clinically meaningful
amount” of the placebo response is
greater than 0.99 then the trial stops
for futility.
Stop for Success
 The dose-finding trial is stopped when
the current probability that the ED95*
is sufficiently efficacious is sufficiently
high.
 If the posterior probability that the
most likely ED95 dose is better than
placebo reaches a high value (0.99 or
higher), then the trial stops early for
success.
 Note: Posterior (after updated data)
probability drives this decision.
Stop for Cap
Cap: If the sample size reaches the
maximum (the cap) defined for all
dose groups the trial stops.
Refine definition based on
application. Perhaps one dose
group reaching max is of interest.
Almost always $$$ driven.
Continue
Continue: If none of the above
three conditions hold then the trial
continues to accrue.
Decision to continue or stop is
made at each interim look at the
data (accrual is in batches)
Benefits of Approach
Statistical: weighting by the
variance of the response at each
dose allows quicker resolution of
dose-response relationship.
Medical: Integrating over the
probability that each dose is ED95
allows quicker allocation to more
efficacious doses.
Example of Approach
 Reduction in average number of
events
 Y=reduction of number of events
 D=6 (5 active, 1 placebo)
 Potential exists for a non-monotonic dose-response
relationship.
 Let ν(d) be the dose value for dose
d.
Model for Example

Y_i ~ θ_{d_i} + N(0, σ²)
θ_0 ~ N(0, τ²)
θ_d ~ N( θ_{d−1}, [ν(d) − ν(d−1)] τ² )
τ² ~ IG(0.001, 1000)
σ² ~ IG(0.001, 1000)
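One way to see what the random-walk prior on θ_d implies is to simulate curves from it. This sketch holds τ² fixed (rather than drawing it from the IG prior) and uses hypothetical dose values ν(d); both are my assumptions for illustration.

```python
import random

# Simulate one draw of (theta_0, ..., theta_D) from the random-walk prior:
# theta_0 ~ N(0, tau2); theta_d ~ N(theta_{d-1}, [v(d) - v(d-1)] * tau2).
def simulate_theta(doses, tau2, rng):
    theta = [rng.gauss(0.0, tau2 ** 0.5)]
    for d in range(1, len(doses)):
        step_var = (doses[d] - doses[d - 1]) * tau2  # variance grows with dose gap
        theta.append(rng.gauss(theta[-1], step_var ** 0.5))
    return theta

rng = random.Random(0)
doses = [0, 10, 20, 40, 80, 120]  # hypothetical v(d) values from the example
theta = simulate_theta(doses, tau2=1.0, rng=rng)
print(len(theta))  # 6
```

Because each θ_d is centered at its neighbor with variance proportional to the dose gap, nearby doses borrow strength from each other, which is exactly the "dynamic model" property described next.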
Dynamic Model Properties
Allows for flexibility.
Borrows strength from
“neighboring” doses and similarity
of response at neighboring doses.
Simplified version of Gaussian
Process Models.
Potential problem: semi-parametric,
thus only considers doses within the
dose range:
m_0 + 0.95( m* − m_0 )
Example Curves
[figure: example dose-response curves]
Simulations
5000 simulated trials at each of the
5 scenarios
Fixed dose design: n_d ≡ n = 130
Bayesian adaptive approach as
outlined above
Compare the two approaches for
each of the 5 cases on sample size,
power, and type-I error
Results (power & alpha)

Case   Pr(S)   Pr(F)   Pr(cap)   P(Rej)
1      .018    .973    .009      .049
2      1       0       0         .235
3      1       0       0         .759
4      1       0       0         .241
5      1       0       0         .802
Results (n)

Dose     0      10     20     40     80     120
Case 1   51.6   26.1   26.2   31.2   33.5   36.8
Case 2   28.4   10.9   13.8   18.9   22.5   19.2
Case 3   27.7   11.3   14.5   25.2   17     15.2
Case 4   31.2   10.8   13.3   19.6   22.2   27.8
Case 5   28.9   18.0   22.3   21.1   14.5   10.7
Fixed    130    130    130    130    130    130
Observations
 Adaptive design serves two purposes:
 Get patients to efficacious doses
 More efficient statistical estimation
 Sample size considerations
 Dose expansion -- inclusion of safety
considerations
 Incorporation of uncertainties!!!
Predictive inference is POWERFUL!!!
Conclusions
 Science is subjective (what about the choice of
a likelihood?)
 Bayes uses all available information
 Makes interpretation easier
 BAD NEWS: I have shown very simple cases . . .
they get much harder.
 GOOD NEWS: They are possible (and practical)
with advanced computational procedures