Model Averaging in Risk Assessment

Download Report

Transcript Model Averaging in Risk Assessment

Model Averaging: Beyond Model
Uncertainty in Risk Analysis *
Matthew W. Wheeler
NIOSH
[email protected]
*The findings and conclusions in this report are those of the authors and do not
necessarily represent the views of the National Institute for Occupational Safety
and Health
1
Acknowledgements
• A. John Bailer
Miami University/NIOSH
2
Outline
•
•
•
•
Introduction/Motivation.
Model Averaging 101.
Validation Simulation Study.
MA Software for dichotomous response.
3
Introduction/Motivation
• Model choice is frequently often a point of contention
among risk assessors/risk managers.
• Frequently multiple models describe the data
“equivalently.”
• Two model’s estimates of risk, especially at the lower
bound, can differ dramatically.
• Model uncertainty is inherent in most risk estimation,
though practically ignored in most situations.
4
Introduction/Motivation (cont)
• Consider the problem of estimating a benchmark dose (BMD)
from dichotomous dose response data.
• Here we seek to estimate the BMD from a “plausible” model,
given experimental data.
• In these experiments:
– Animals are exposed to some potential hazard.
– The adverse response is assumed to be distributed binomially.
– Risk (i.e, probability of adverse response) is estimated using regression
modeling.
– Multiple dose-response models can be used to estimate risk.
5
Common Dose-Response
Models Used:
logistic
 1 (d ) 
model:
log-logistic
gamma:
model:
1
1  exp    d 
 2 (d )   
(1)
(1   )
1  exp     ln d 
 3 (d )    (1   )
1
d
t
  
 1 t
e dt
(2)
(3)
 4 (d )    (1   )1  exp( 1d  2d 2 ...) (4)
0
multistage
probit
 5 (d )  a  d 
(5)
6
Common Dose-Response
Models Used:
log-probit
quantal-linear
quantal-quadratic
Weibull
 6 (d )    (1   )a   ln( d )
(6)
 7 (d )    (1   )1  exp( d )
(7)
 8 (d )    (1   )1  exp( d 2 )(8)
 9 (d )    (1   )1  exp( 
d  )
(9)
where Γ(α)= gamma function evaluated at α, for Ф(x) = CDF N(0,1) and πi = γ
when di=0 for models (2) and (7).
7
Benchmark dose estimation
• BMD is the dose associated with the a specified increase in
response relative to the control response (BMR); e.g.,:
dose d such that:
BMR=[πd- π0]/[1- π0] or
BMR= πd- π0
• The BMR is commonly set at values of 1%, 5%, 10%.
• BMDL = 100(1-α)% lower confidence limit on the BMD.
NOTE: As πd is dependent on a model thus the BMD is model dependent!!
8
Typical Risk Estimation Process
• Given data (in absence of mechanistic
information), a typical analyst will:
– Estimate the regression coefficients for
models (1)-(9).
– Estimate the BMD/BMDL given the
model.
– Pick the “best model.”
9
Example of this:
• Consider TiO2 lung tumor data which has
been combined from the studies of Heinrich
et al. (1995) Muhle et al. (1991) and Lee et
al. (1985).
• Here the benchmark dose (BMD) as well as
its lower bound (BMDL) are estimated at
BMRs of 10 and 1%.
10
Extra Risk
BMD
(BMDL)
AIC‡
X2 GOF
P-value
Quantal-Quadratic
364.95
0.57
Logistic
365.39
0.50
Probit
365.66
0.48
Log-Probit
365.74
0.55
Gamma
365.90
0.53
Model *
Log-Logistic
366.03
0.52
Weibull
366.11
0.52
Multistage (3-Degree Polynomial)
368.11
0.61
Quantal-Linear
370.35
0.26
10%
1%
0.96
0.30
(0.84)
(0.26)
1.01
0.25
(0.92)
(0.21)
0.97
0.22
(0.87)
(0.18)
1.01
0.51
(0.79)
(0.20)
1.03
0.52
(0.83)
(0.18)
1.04
0.50
(0.82)
(0.18)
1.04
0.49
(0.83)
(0.17)
1.04
0.47
(0.85)
(0.14)
0.80
0.08
(0.62)
(0.06)
** All fits were
obtained using
the US EPA’s
BMDS
‡ Calculated
using the number
of parameters vs.
the number of
non-bounded
parameters.
11
12
Model Choice
• If we pick best AIC the quantal-quadratic
model risk estimates would be chosen.
• If we pick the best Pearson Χ2 test statistic the
3-degree multistage model would be chosen.
• All of the models are reasonable based on
these statistics.
13
Model Choice (cont)
• Estimates are “reasonably similar” at the 10% estimate.
• Estimates vary by a factor of 5 for the 1% estimate.
• If a BMR of 0.1% is used (results not shown) the Ti02
BMD/BMDL estimates differ by a factor of 35.
• This heterogeneity in model estimates exists even though the
model fit statistics are very similar.
• Model uncertainty (and heated debate) result when any one of the
above models is chosen.
14
Model Averaging
• A better way would be to find an adequate way to
combine all estimates, and thus describe/account for
model uncertainty.
• Model Averaging (MA) is one such method that may
satisfactorily account for model uncertainty.
• Instead of focusing on a single model it allows
researchers to focus on “plausible behavior.”
15
Model Averaging
(cont.)
• We can think of any model contributing information
(including possible bias) to an analysis.
• Picking any one model ignores other plausible
information, and possibly introduces bias into the
analysis.
• Model averaging is a method that attempts to
synthesize all of the information available.
16
Model Averaging
(cont)
• Kang et al. (Regulatory Toxicology and
Pharmacology, 2000) and Bailer et al. (Risk
Analysis,2005) proposed model averaging for
risk assessment.
• They used an “Average-BMD” methodology.
17
“Average Dose” MA
• In this situation the benchmark dose is computed as :
BMˆ DMA   wk BMˆ Dk
k
• Here the BMDk is estimated from one of K models individually
• The BMDL is computed as:
BMˆ DLMA   wk BMˆ DLk
k
• Weights are based upon the formula:
wj 
exp(  I j / 2)
K
 exp(  I
i 1
•
i
/ 2)
Where Ii=AIC, Ii=KIC , or Ii=BIC. Other weights are possible.
18
“Average Dose” MA
• This method was found to perform unreliably
in many situations in terms of nominal
coverage and bias. (Wheeler and Bailer,
Environmental and Ecological Statistics, In
Press).
• The results suggested a different BMD
estimation technique may perform better. We
call this the “average model” MA.
19
“Average-Model” MA
• Given the fits of models (1)-(9) MA:
– Calculates the dose-response based upon a weighted
average of dose-responses Raftery et al. (1997),
Buckland et al. (1997)
– Estimates the MA dose-response curve as:
K
 MA (d )    i (θ, d )  wi
i 1
– Weights are formed as above.
20
0.20
0.15
0.10
Logistic (wi=0.45)
Quantal Linear (wi=0.25)
0.05
Probability of Response
Model Average
Gamma (wi=0.3)
0.0
0.2
0.4
0.6
Dose
0.8
1.0
21
“Average-Model” MA Benchmark
Dose
• Given this “Average-model,” the benchmark dose is
then computed by finding the dose that satisfies the
equation
BMR = [πMA(d)i- π MA(0)]/[1- π MA(0)].
• The BMDL is computed through a parametric
bootstrap. Here the 5th percentile of the bootstrap
distribution is used to compute the 95% lower tailed
confidence limit estimate on the BMD.
22
“Average-Model” vs. “Average-Dose”
• This is substantially different from the
“average-dose” method.
– “Average-Dose:”
Model Fits → Individual BMD estimation → BMD MA
estimate
– “Average-Model:”
Model Fits → MA Model Estimate → MA-BMD estimation
23
TiO2 Analysis Revisited
Extra Risk
BMD
(BMDL)
X2 GOF
P-value
AIC
Weights
AIC
Quantal-Quadratic
0.192
364.95
0.57
Logistic
0.154
365.39
0.50
Probit
0.134
365.66
0.48
Log-Probit
0.129
365.74
0.55
Gamma
0.119
365.90
0.53
Log-Logistic
0.112
366.03
0.52
Weibull
0.108
366.11
0.52
Multistage
0.039
368.11
0.61
Quantal-Linear
0.013
370.35
0.26
n/a
n/a
n/a
n/a
n/a
n/a
“average-model” Model Averaging BMD
(BMDL)
“average dose” Model Averaging BMD
(BMDL)
10%
0.96
(0.84)
1.01
(0.92)
0.97
(0.87)
1.01
(0.79)
1.03
(0.83)
1.04
(0.82)
1.04
(0.83)
1.04
(0.85)
0.80
(0.62)
1.01
(0.82)
1.00
(0.84)
1%
0.30
(0.26)
0.25
(0.21)
0.22
(0.18)
0.51
(0.20)
0.52
(0.18)
0.50
(0.18)
0.49
(0.17)
0.47
(0.14)
0.08
(0.06)
0.36
(0.12)
0.38
(0.20)
24
Validation Study
• MA seems like a good idea, however we need
to know if it works well in practice.
• A simulation study was conducted to
investigate the behavior of MA.
25
Validation Study
• A Monte Carlo simulation study was conducted to
investigate the behavior of “average model”
model averaging.
• 54 true model conditions, using models (1) – (9)
were used in the simulation.
• Full study described in Wheeler and Bailer (Risk
Analysis, in Press)
26
Step 1: Decide upon “TRUTH”
Truth was determined by using
various response patterns. These
were chosen to reflect shallow,
medium, and steep curvature as
well as low and high background
response rates. In all six response
patterns were chosen.
Response Responses at doses of
Pattern
0, 0.5, 1
1
(2%,7%,20%)
2
(2%,14%,34%)
3
(2%,25%,50%)
4
(7%,12%,25%)
5
(7%,19%,39%)
6
(7%,30%,55%)
0.5
0.4
0.3
0.2
0.1
0.0
-0.1
0.1
0.3
0.5
0.7
0.9
1.1
27
Step 1 (cont)
Using
these 6 dose-response patterns the parameters associated with
models (1)-(9) (e.g. γ,α,β,θ,θ2 ) were determined. For example:
Response Pattern 2
Dose 0 0.5
1 Weibull Fit
Resp 2% 14% 34%
True Weibull Model
γ
α
β
0.02 1.60
0.40
These
fits were used as the true underlying model in the simulation.
Three points allowed a unique specification of all models.
A variety of dose-response patterns were represented by the 54(6
conditions x 9 models) true dose response curves.
28
Step 2: Given truth set up
simulation conditions.
• The simulation proceeded by generating hypothetical
toxicology experiments with response probability π(d).
• With π(d) specified by one of the 54 true dose-response curves
estimated in step 1.
• These experiments consisted of 4 dose group design with
doses of 0, 0.25, 0.50, and 1.0.
• n=50 for all dose groups.
• 2000 experiments were generated per true dose-response
curve.
• Bias as well as coverage [i.e., Pr(BMDL ≤ BMDtrue)] was
estimated.
• Coverage is reported here.
29
Step 3: Gather appropriate
statistics.
• In each experiment the “average-model” BMD as well as the
BMDL was estimated.
• BMRs of 1% and 10% were used to estimate the BMD.
• Two model spaces for averaging were considered.
– One space consisted of three flexible models: the multistage, Weibull
and the log-probit model.
– The second space had seven models that added the probit, logistic,
quantal-linear, and quantal-quadratic to the three model space.
• The simulation took approximately 1 CPU year of
computation.
30
Coverage
• The BMDL was computed [with α = 0.05] and
compared to the true benchmark dose at the
specified BMR.
• Coverage probability [i.e., Pr(BMDL ≤
BMDtrue)] was estimated across 2000
simulations.
31
Coverage BMR = 10%
3
6
Multistage (3,7)
Log-Probit (3,7)
Weibull (3,7)
Probit (7)
Quantal Linear (7)
Quantal Quadratic (7)
1.0
0.9
0.8
0.7
1.0
0.9
0.8
Gamma (None)
Log-Logistic (None)
0.7
Logistic (7)
1.0
0.9
0.8
0.7
3
6
7 Model MA
3 Model MA
3
6
32
Coverage BMR = 1%
3
6
Multistage (3,7)
Log-Probit (3,7)
Weibull (3,7)
Probit (7)
Quantal Linear (7)
Quantal Quadratic (7)
100.0%
90.0%
80.0%
70.0%
100.0%
90.0%
80.0%
Gamma (None)
Log-Logistic (None)
70.0%
Logistic (7)
100.0%
90.0%
80.0%
70.0%
3
6
7 Model MA
3 Model MA
3
6
33
Coverage (Summary)
• Nominal coverage is reached for most
simulation conditions.
• MA fails to reach nominal coverage in the
quantal-linear and similar cases.
34
Quantal-Linear Problems
• The quantal-linear case represents a cause of concern if MA is
to be used.
• It is important to understand when (and why) it doesn’t work,
and compare the method to current practice.
• We first compare it to the common practice of picking the
“best model.”
• The “best-model” is chosen by a chi-squared goodness of fit
statistic.
35
Coverage Comparison
MA vs. Best Model
BMR = 10%
Best Model
from X2*
BMR = 1%
”average model” MA
7Model
3 Model
Best Model
from X2*
”average model” MA
7Model
3 Model
Quantal Linear 1
0.89
0.83
0.90
0.72
0.79
0.89
Quantal Linear 2
0.80
0.78
0.90
0.75
0.83
0.89
Quantal Linear 3
0.77
0.83
0.91
0.76
0.83
0.88
Quantal Linear 4
0.90
0.88
0.95
0.69
0.90
0.97
Quantal Linear 5
0.69
0.81
0.93
0.68
0.86
0.88
Quantal Linear 6
0.70
0.80
0.91
0.69
0.83
0.89
*True model included in the model set.
36
MA vs. “Best Model”
• As a decision rule, in terms of coverage,
“Average-model” 3-model averaging performs
better than picking the best fitting model.
NOTE: The true model was included in the set of
models the “best model” was chosen from.
37
Further Analysis of Quantal Linear
Model
• Despite the above it is disconcerting that a set
of flexible models like the 3-model space
model average fails to describe quantal-linear
behavior.
• We look at the sampling distribution of the
experiment for the quantal-linear model, in
attempt to explain this behavior.
38
10
5
0
Observed Incidence
15
20
Sampling distribution for the quantal-linear model
0.0
0.2
0.4
0.6
Dose
0.8
1.0
39
Average fit for 3-model MA models
0.15
Multistage
Excess Risk
0.10
Quantal Linear Truth
0.05
Weibull
0.00
Log-Probit
0.0
0.2
0.4
0.6
Dose
0.8
1.0
40
Further Analysis of Quantal Linear
Model
• The flexibility of the models combined with
the sampling distribution introduces bias into
the estimation of the dose-response curve.
• The bias carries through in BMD estimation.
• This also may be the cause of the conservative
behavior (i.e. coverage > 99%) seen in the
quantal-quadratic case.
41
Bias Corrected Intervals
• The evident bias suggests a search for a corrective mechanism.
• Percentile based bootstrap confidence intervals are not optimal.
• Biased Corrected and Accelerated (BCa) confidence intervals (CI) are
preferred (Efron, 1994)
• The BCa CIs are computationally more difficult. (i.e., full simulation
becomes longer if these were being computed in all conditions)
• Only the quantal-linear conditions were analyzed using the BCa confidence
interval.
42
7 Model Space
3 Model Space
BCa
Percentile
BCa
Percentile
Quantal-linear 1
0.852
0.829
0.868
0.903
Quantal-linear 2
0.891
0.776
0.928
0.901
Quantal-linear 3
0.926
0.831
0.918
0.906
Quantal-linear 4
0.840
0.881
0.836
0.946
Quantal-linear 5
0.868
0.807
0.924
0.927
Quantal-linear 6
0.909
0.798
0.917
0.914
43
• BCa CIs fail to be appropriate in shallow dose
response cases.
• In these cases an infinite BMD can be estimated
through the bootstrap re-samples.
• This causes a problem for the BCa technique, as it
violates its underlying assumptions.
• These problems are identifiable through histogram
plots of the bootstrap resample.
44
Sampling distribution where BCa BMDLfails to cover the true
BMD when the percentile based method does.
2000
1500
1000
500
0
0.134479
1.712962
3.291446
4.869929
6.448412
8.026896
9.605379
0.923721
2.502204
4.080687
5.659171
7.237654
8.816137
Bootstrap Resamples
45
Simulation Conclusions
• Model averaging recovers the central estimate, as well as the
lower bound in most cases. (Some bias is however evident)
• BMDs and BMDLs with BMRs of 1% are most problematic.
• Problems exhibited with true quantal-linear dose-response
mechanisms.
• BCa confidence intervals can help remediate some of these
problems.
46
• “Average-model” model averaging is superior
to picking the best model.
• It recovers the true BMD uncertainty (in terms
of correct coverage probabilities) when the
true model is not in the MA model space.
• The results show MA is not a panacea, it is
however a step in the right direction.
47
48
Model Averaging Software
• The results are promising but implementation
of this approach is difficult.
• The simulation code has been repackaged to
allow users to implement dichotomous doseresponse model averaging.
• This is done in a simple MS Windows
command prompt program.
49
• Software is described that will implement
Model Averaging for Dichotomous doseresponse data (affectionately named MADr for
short).
• The software, as well as the code, is freely
available under the GNU public license.
• It will be available, in the near future, on
NIOSH’s web site. A beta version is
distributed with this talk.
50
MADr-BMD: Algorithm Overview
Dichotomous
Data Input File
Model 1 ….
….
Model K
Create
Resampled Data
Model Average
BMD
Model 1 ….
….
Model K
Parametric
Bootstrap BMDL
Model Average
BMD
Output
51
Data Specification
• MADr is run on the command line by using the
command “madr” at the prompt.
• It’s input is a text file that can be edited using notepad
or similar text editor.
• By default it expects the file “input.txt”.
• Alternate input file names can be specified.
For Example:
“madr tio2.txt”
“madr newtoxdata.txt”
52
Example input file
250 1e-8 1e-8
0 0 1 1 1 1 1
101
2 1 0.1
0.95 5000 0
2
4
0
50
22.5
50
45.0
50
90.0
50
1 1
0
3
10
29
250 1e-8 1e-8
Iterations, Parameter Conv. , Relative Function Conv.
0 0 1 1 1 1 1 1 1
MA Specification: 1 = Included 0 = Not-included
Model Order: (Quantal-Linear, Quantal-Quadratic, Multistage
Logistic, Probit, Weibull, Log-Probit, Log-Logistic, Gamma)
101
Random Seed (Specifying 0 implies current clock time will be used)
2 1 0.1
Averaging Criterion ( 1 = BIC, 2 = AIC, 3=KIC,
4 = BICB, 5 = AICB, 6 = KICB ) ,
Risk Type ( 1 = Added Risk, 2 = Extra Risk),
BMR
0.95 5000 0
Nominal Coverage Probability, Number of bootstrap resamples, Output
Bootstrap resamples (0 = no 1 = yes)
2
Degree of multistage polynomial.
4
Number of data lines
0
22.5
45.0
90.0
50
50
50
50
0
3
10
29
Data Specification:
Dose, Number of experimental units, Number of observed responses
53
Program output
• The output for the analysis is printed to the
screen.
• This can be redirected to a text file using the
“>” command.
For Example:
“madr > output.txt”
“madr tio2.txt > anal1_output.txt”
54
Model
Weight -2log(L)
AIC
BIC
-------------------------------------------------------------------------Multistage
0.100
50.61
56.61
62.35
Logistic
0.228
50.96
54.96
58.79
Probit
0.218
51.05
55.05
58.88
Weibull
0.100
50.61
56.61
62.35
Log-Probit
0.122
50.21
56.21
61.95
Log-Logistic
0.121
50.23
56.23
61.97
Gamma
0.111
50.40
56.40
62.13
------------------------------------------------------------------------------------------------------------------'Average-Model' Benchmark Dose Estimate
-----------------------------------------Nominally Specified Confidence Limit:0.950
Weighting Criterion: AIC
BMD Calculation: Added Risk
BMR: 0.100000
BMD: 0.894456594251
BMDL(BCa):0.303525585681
BMDL(Percentile):0.296586853918
Acceleration: -0.034445
Bootstrap Resamples: 5000
Random Seed: 101
------------------------------------------
55
MODEL: Multistage, 2-degree polynomial:
----------------------------------------Parameters
Estimate
StdErr
-----------------------------------------GAMMA:
0.171088
0.079273
BETA(1):
0.000000
N/A
BETA(2):
0.141611
0.052429
Optimization Succeeded
-----------------------------------------MODEL: Logistic:
-----------------------------------------Parameters
Estimate
StdErr
-----------------------------------------ALPHA:
-1.824682
0.540209
BETA:
1.003971
0.301075
Optimization Succeeded
-----------------------------------------MODEL: Probit:
-----------------------------------------Parameters
Estimate
StdErr
-----------------------------------------ALPHA:
-1.083244
0.300300
BETA:
0.590790
0.160163
Optimization Succeeded
------------------------------------------
56
Conclusions
• Simulation Results + MADr implies
dichotomous based model averaging can be
used reliably in ones own research.
• Though we have tested MADr, it is still a “use
at your own risk program.”
57
• As mentioned before model averaging is not a
panacea.
• As such it does not:
– Relieve scientists from using their expert
judgment.
– Give automatic license to produce a low dose
extrapolations.
– Remove the need for adequate individual model fit
diagnostics.
– Remove all model uncertainty from the analysis.
58
• It does:
– Reframe the debate of model choice.
– Produces relatively stable central estimates often
independent of a given model being included in the
average.
59
Selected References
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological
Methodology, 25, 111-163.
Hoeting, J.A., Madigan, D., Raftery, A.E., & Volinsky, C.T. (1999) Bayesian model
averaging: a tutorial. Statistical Science, 14, 382-417.
Buckland, S.T., Burnham, K. P., & Augustin, N. H., (1997). Model Selection: An Integral
Part of Inference. Biometrics, 53, 603-618.
Kang, S.H., Kodell, R.L., Chen, J.J. (2000) Incorporating Model Uncertainties along with
Data Uncertainties in Microbial Risk Assessment. Regulatory Toxicology and
Pharmacology, 32, 68-72.
Bailer, A.J., Noble R.B. and Wheeler, M. (2005) Model uncertainty and risk
estimation for quantal responses. Risk Analysis, 25,291-299.
Wheeler, M. W., & Bailer, A.J., (In Press). Properties of model-averaged BMDLs:
study of model averaging in dichotomous risk estimation. Risk Analysis
A
60