Bayesian Analysis of Radioactive Contamination

Data Mining of Environmental
Models for Sensitivity Analysis
Tom Stockton
Paul Black, Andy Schuh, Kate Catlett, John Tauxe
Neptune and Company, Inc.
www.neptuneandco.com
Issue
How to conduct a sensitivity analysis of a
complex, high-dimensional probabilistic
environmental model?
Decision Modeling
1. Decision Model, build and solve
– Decision Actions and Outcomes
– Utility (costs, liabilities, desires)
– Probabilistic model
• Scenario
• Model
• Parameter
2. Sensitivity analysis (knowledge re-discovery)
3. Value of information analysis (OUT-path)
4. Data collection
5. Update model (Bayesian or ad hoc)
Decision Modeling
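\[
U(d \mid I) = \sup_d \int_{\Theta} \int_{S} \int_{M} \int_{Y} U(d \mid y, S, M, \theta_M)\, p(S)\, p(M \mid S)\, p(\theta_M \mid S)\, p(I \mid \theta_M, M, S)\, p(y \mid \theta_M, M, S)\; dy\, dS\, dM\, d\theta_M
\]
$U(d \mid y, S, M, \theta_M)$: utility function
$p(S)$: scenario uncertainty
$p(M \mid S)$: model uncertainty
$p(\theta_M \mid S)$: parameter uncertainty
$p(I \mid \theta_M, M, S)$: data likelihood
$p(y \mid \theta_M, M, S)$: risk predictive distribution
where:
$U$ = utility, loss, cost
$d$ = decision
$I$ = information/data
$M$ = model structure
$\theta_M$ = model parameters
$S$ = scenario
$y$ = risk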
Sensitivity Analysis
Given a model:
Y = f (X)
[Y = GoldSim(X)]
Sensitivity analysis is aimed at
describing the influence of each
input variable Xi on the model
response Y
Sensitivity Measures
• One-At-A-Time (OAT)
• Differential Analysis
• Global
– Statistical
• scatter plots, correlation, regression, rank transformations
– Data mining
• Sobol, FAST, MARS, MART
[Figure: f(X) plotted against X_i]
Desirable Properties
of a SA Measure
• Efficiency
– account for all effects while being
computationally affordable
• Simplicity
– implementable and interpretable
• Model Independent
– The method can handle non-linearity and non-monotonicity (across time and space)
K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
Sensitivity Measures
• OAT and Differential Analysis,
for complex probabilistic models,
often are
– not efficient, and
– not model independent
Global Sensitivity Measures
• Sensitivity Measure
\[ S_i = \frac{\mathrm{Var}_{X_i}\left[ E(Y \mid x_i) \right]}{\mathrm{Var}(Y)} \]
• Build a statistical model of the model
response and the model inputs using the
Monte Carlo simulation results
• Decompose variance of the output and
attribute to input variables
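A minimal sketch of how this measure could be estimated directly from Monte Carlo results, assuming independent inputs: sort the runs by x_i, split them into equal-count bins, and use the variance of the bin means of Y as an estimate of Var[E(Y | x_i)]. The toy model, the bin count and the function name `first_order_index` are illustrative assumptions and are not taken from the original analysis.

```python
import random
import statistics

def first_order_index(x, y, n_bins=20):
    """Estimate S_i = Var[E(Y | x_i)] / Var(Y) from paired samples (x, y)."""
    order = sorted(range(len(x)), key=lambda k: x[k])
    size = len(x) // n_bins
    grand_mean = statistics.fmean(y)
    between = 0.0
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(x)
        idx = order[start:end]
        # Mean of Y within this slice of x_i approximates E(Y | x_i)
        bin_mean = statistics.fmean(y[k] for k in idx)
        between += len(idx) * (bin_mean - grand_mean) ** 2
    return (between / len(x)) / statistics.pvariance(y)

# Hypothetical toy model Y = 4*X1 + X2 with independent uniform inputs.
# Analytically Var(Y) = 17/12, so S_1 = 16/17 and S_2 = 1/17.
rng = random.Random(0)
n = 20000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
y = [4.0 * a + b for a, b in zip(x1, x2)]
s1 = first_order_index(x1, y)
s2 = first_order_index(x2, y)
```

The binning estimate carries a small upward bias that shrinks as the number of runs per bin grows, so the bin count is a tuning choice rather than a fixed rule.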
Standardized Rank Regression
SRR
– Rank Y and X_i and scale the ranks to a mean of 0 and a variance of 1 for convenience
– Based on the ranks of Y and X_i:
\[ y = \sum_{i=1}^{p} \beta_i x_i \]
– Assuming the X_i are independent:
\[ \mathrm{Var}(Y) = \sum_{i=1}^{p} \beta_i^2\, \mathrm{Var}(X_i) \quad \text{so} \quad S_i = \beta_i^2 \]
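The SRR steps can be sketched as follows: rank each variable, scale the ranks to mean 0 and variance 1, fit a linear regression on the scaled ranks, and square the coefficients. This is a minimal standard-library sketch; the helper names, the small Gauss-Jordan solver and the toy model are assumptions made for illustration only.

```python
import math
import random
import statistics

def standardized_ranks(v):
    """Replace values by their ranks, then scale to mean 0 and variance 1."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    ranks = [0.0] * len(v)
    for r, k in enumerate(order):
        ranks[k] = float(r)
    mu = statistics.fmean(ranks)
    sd = statistics.pstdev(ranks)
    return [(r - mu) / sd for r in ranks]

def solve(a, b):
    """Solve the small linear system a * beta = b by Gauss-Jordan elimination."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * w for u, w in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]

def srr(inputs, y):
    """Standardized rank regression coefficients, one per input."""
    xs = [standardized_ranks(x) for x in inputs]
    ys = standardized_ranks(y)
    n = len(ys)
    gram = [[sum(u * w for u, w in zip(xi, xj)) / n for xj in xs] for xi in xs]
    rhs = [sum(u * w for u, w in zip(xi, ys)) / n for xi in xs]
    return solve(gram, rhs)

# Hypothetical monotonic toy model; x3 has no effect on y.
rng = random.Random(2)
n = 2000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
x3 = [rng.random() for _ in range(n)]
y = [math.exp(3.0 * a + b) for a, b in zip(x1, x2)]
betas = srr([x1, x2, x3], y)
s = [b * b for b in betas]  # S_i = beta_i^2 under the independence assumption
```

Because the toy model is monotonic in each input, the rank transformation linearizes it well; for a non-monotonic response the same code would return coefficients near zero even for influential inputs.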
Fourier Amplitude Sensitivity Test
FAST
– Explores the multidimensional space of the
input factors along a search curve, using a
Fourier transform.
– Handles main and interaction effects
K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
Issues
• Differential Analysis
– not feasible: requires derivatives of complex models
• SRR and OAT
– not model independent: trouble with
nonmonotonic nonlinear models.
– not efficient: trouble with interaction effects
in high dimensional models
• FAST
– not efficient: requires separate model runs
Possible Solutions
• Data mine the probabilistic model output
– Multivariate Adaptive Regression Splines
(MARS)
– Multiple Additive Regression Trees
(MART)
Data Mining
• MARS
– Non-parametric recursive partitioning approach that
fits separate splines to distinct intervals of the
predictor variables.
• MART
– Explores the multidimensional input space of the
input factors using gradient boosting of additive
regression models.
• Advantages
– Search for interactions between variables, allowing any degree of
interaction to be considered.
– Tracks very complex data structures in high-dimensional data.
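A minimal sketch of the boosting idea behind MART, assuming depth-1 regression trees (stumps) and squared-error loss. Each round fits one stump to the current residuals, and the squared-error reduction of each split is credited to the variable it splits on, which gives a relative influence per input. Stumps capture main effects only; deeper trees would be needed for interactions. The toy model, learning rate and function names are illustrative assumptions, not the implementation used for the results in this presentation.

```python
import random

def best_stump(xs, resid):
    """Find the single split that most reduces squared error of the residuals."""
    n = len(resid)
    total = sum(resid)
    best = (0.0, None, None, 0.0, 0.0)  # gain, variable, threshold, left mean, right mean
    for j in range(len(xs[0])):
        order = sorted(range(n), key=lambda k: xs[k][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += resid[order[pos - 1]]
            right_sum = total - left_sum
            # Squared-error reduction relative to predicting the overall mean
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - total ** 2 / n
            if gain > best[0]:
                thr = 0.5 * (xs[order[pos - 1]][j] + xs[order[pos]][j])
                best = (gain, j, thr, left_sum / pos, right_sum / (n - pos))
    return best

def relative_influence(xs, y, rounds=60, rate=0.1):
    """Boost stumps on the residuals and accumulate split gains per variable."""
    mean = sum(y) / len(y)
    resid = [v - mean for v in y]
    influence = [0.0] * len(xs[0])
    for _ in range(rounds):
        gain, j, thr, left_mean, right_mean = best_stump(xs, resid)
        if j is None:
            break
        influence[j] += gain
        resid = [r - rate * (left_mean if x[j] <= thr else right_mean)
                 for x, r in zip(xs, resid)]
    total = sum(influence)
    return [v / total for v in influence]

# Hypothetical non-monotonic toy model: x1 dominates, x2 is weak, x3 is inert.
rng = random.Random(3)
xs = [[rng.random(), rng.random(), rng.random()] for _ in range(400)]
y = [abs(4.0 * x[0] - 2.0) + 0.3 * x[1] for x in xs]
infl = relative_influence(xs, y)
```

The response here is V-shaped in x1, the kind of non-monotonic effect that a rank regression tends to miss but that a tree-based fit can track.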
Sensitivity Indices via
ANOVA decomposition
\[ \hat{f}(X) = a_0 + \sum_{K_m = 1} f_i(x_i) + \sum_{K_m = 2} f_{i,j}(x_i, x_j) + \sum_{K_m = 3} f_{i,j,k}(x_i, x_j, x_k) + \cdots \]
Sensitivity indices are calculated using basis functions not including x_s
\[ \hat{f}(X_{-s}) = a_0 + \sum_{i \notin s} f_i(x_i) + \sum_{\{i,j\} \notin s} f_{i,j}(x_i, x_j) + \sum_{\{i,j,k\} \notin s} f_{i,j,k}(x_i, x_j, x_k) + \cdots \]
\[ SPR_{x_s} = \sum \left( y - \hat{f}(X_{-s}) \right)^2 \]
\[ S_{x_s} = \frac{SPR_{x_s}}{\sum_{x} SPR_{x}} \]
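The index calculation can be sketched as follows, assuming a fitted surrogate is already available as a list of ANOVA terms, each tagged with the inputs it uses. Dropping every term that involves x_s gives the reduced prediction, the squared residuals give SPR for x_s, and the indices are the SPR values normalized by their sum. The surrogate terms, coefficients and names below are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical fitted surrogate: a0 + f1(x1) + f2(x2) + f12(x1, x2).
# Each entry pairs the indices of the inputs a term uses with the term itself.
A0 = 1.0
TERMS = [
    ((0,), lambda x: 3.0 * (x[0] - 0.5)),
    ((1,), lambda x: 1.0 * (x[1] - 0.5)),
    ((0, 1), lambda x: 2.0 * (x[0] - 0.5) * (x[1] - 0.5)),
]

def predict(x, drop=None):
    """Surrogate prediction, optionally omitting all terms that involve input `drop`."""
    return A0 + sum(f(x) for used, f in TERMS if drop is None or drop not in used)

def spr(samples, ys, s):
    """Sum of squared residuals when terms involving input s are removed."""
    return sum((y - predict(x, drop=s)) ** 2 for x, y in zip(samples, ys))

rng = random.Random(1)
samples = [(rng.random(), rng.random(), rng.random()) for _ in range(5000)]
ys = [predict(x) for x in samples]  # assumption: the surrogate reproduces y exactly

sprs = [spr(samples, ys, s) for s in range(3)]
indices = [v / sum(sprs) for v in sprs]
```

In this toy setup the third input appears in no term, so removing it changes nothing and its index is zero; the interaction term is counted for both x1 and x2.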
Analytical Example
Sobol’ g-function
\[ y = \prod_{i=1}^{p} g_i(x_i) \]
\[ g_i(x_i) = \frac{|4 x_i - 2| + a_i}{1 + a_i} \]
\[ x_i = \frac{1}{2} + \frac{1}{\pi} \arcsin\left( \sin(w_i s + \varphi_i) \right) \]
Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method
for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
Example: Sobol’ g-function
Sensitivities
Input   a     w     Analytic  MART    MARS    FAST    SRR
x1      0     23    0.73      0.565   0.733   0.773   0.0005
x2      1     55    0.23      0.281   0.224   0.193   0.0015
x3      4.5   77    0.032     0.094   0.036   0.025   0.045
x4      9     97    0.009     0.05    0.009   0.008   0.197
x5      99    107   0.0001    0.005   0.0006  0.0002  0.207
x6      99    113   0.0001    0.004   0.0000  0.0005  0.437
x7      99    121   0.0001    0.0     0.0000  0.0001  0.007
x8      99    125   0.0001    0.0     0.0000  0.0002  0.105
Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method
for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
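For readers who want to reproduce the test function, a minimal sketch of the Sobol' g-function and of the search-curve transformation follows, using the a and w values listed for x1 to x8 in this example. The function names are assumptions, and the phase shifts are set to zero by default as a simplification.

```python
import math
import random

A = [0, 1, 4.5, 9, 99, 99, 99, 99]        # a_i values from the example
W = [23, 55, 77, 97, 107, 113, 121, 125]  # w_i values from the example

def g_function(x, a=A):
    """Sobol' g-function: product of g_i(x_i) = (|4 x_i - 2| + a_i) / (1 + a_i)."""
    y = 1.0
    for xi, ai in zip(x, a):
        y *= (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai)
    return y

def search_curve(s, w=W, phi=None):
    """Map the scalar s to a point in [0, 1]^p: x_i = 1/2 + arcsin(sin(w_i s + phi_i)) / pi."""
    phi = phi if phi is not None else [0.0] * len(w)
    return [0.5 + math.asin(math.sin(wi * s + ph)) / math.pi for wi, ph in zip(w, phi)]

# Plain Monte Carlo evaluation with independent uniform inputs
rng = random.Random(5)
ys = [g_function([rng.random() for _ in A]) for _ in range(20000)]
mean_y = sum(ys) / len(ys)

# One point on the search curve
point = search_curve(0.3)
```

Each factor g_i has mean 1 over a uniform input, so the Monte Carlo mean of y should sit close to 1; a small a_i makes the corresponding input influential, while a_i = 99 makes it nearly inert.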
[Figure: decision-process flowchart. Boxes carry sequence numbers 1 to 8 in the original figure. Recoverable box labels and contents:]
Contamination: Future Inventory; Existing Inventory; Fate & Transport
Management Options: Institutional Controls; Site Maintenance; Waste Acceptance; Closure; Monitoring/Surveillance
Risk Assessment: Occupational; MOP & IHI; Cumulative (CA); Ecosystem
Research, Monitoring, Information & Data Collection
Value of Information
Cost Benefit Analysis (labels: Risk, Budgets, Management, Cost)
Costs and benefits: Disposal Costs; Closure Costs; Monitoring Costs; Potential Liabilities; ALARA Costs; Disposal Fees; Analysis Costs; Public Benefit
Decision point: Can the risk be managed to regulatory thresholds at an acceptable cost with an acceptable level of uncertainty? (YES / NO)
Choose Management Options & Update Management Plan
Reviews and decisions: Maintenance Review; Periodic Review; Waste Acceptance Decision; Closure Decision
Regulations & Guidance
Legend: Uncertainty analysis; Sensitivity analysis; Iteration loop; Sequence number
Simulation Results
• Model Inputs ( X )
– Inventory
– Fate and transport
• Upward advection
• Biotic transport
• Model response ( Y )
– “EPA-SUM”
[Figure: distribution of the model response, EPA Sum. Axes: Model Response (log scale) and Probability.]
Relative Influence Plot
[Figure: relative influence (0 to 1) of each input, MART compared with SRR. Inputs shown: Upward Flux Rate, Termite1 b, Solubility U, Kd U, Kd Np, Dry Bulk Density, Ant2 NestWidth, Ant2 MaxDepth.]
Partial Dependence Plots
[Figure: partial dependence of the model response on each input, MART compared with SRR. Panels: Dry Bulk Density, Kd U, Ant2 NestWidth, Ant2 MaxDepth, Solubility U, Termite1 b, Kd Np, Upward Flux Rate. Vertical axis: partial dependence.]
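A partial dependence curve can be sketched as follows: fix one input at each value on a grid, leave the other inputs at their sampled values, and average the model (or surrogate) prediction over the sample. The toy model, the grid and the names are illustrative assumptions; centering the curve at zero is one common plotting convention.

```python
import random

def partial_dependence(model, samples, j, grid):
    """Average prediction with input j fixed at each grid value."""
    curve = []
    for g in grid:
        total = 0.0
        for x in samples:
            x_fixed = list(x)
            x_fixed[j] = g  # overwrite input j, keep the other sampled inputs
            total += model(x_fixed)
        curve.append(total / len(samples))
    return curve

def toy_model(x):
    """Hypothetical response: quadratic in x[0], linear in x[1]."""
    return (x[0] - 0.5) ** 2 + 2.0 * x[1]

rng = random.Random(4)
samples = [[rng.random(), rng.random()] for _ in range(1000)]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
pd_curve = partial_dependence(toy_model, samples, 0, grid)

# Center the curve so that it averages to zero over the grid
offset = sum(pd_curve) / len(pd_curve)
centered = [v - offset for v in pd_curve]
```

For this toy model the curve for the first input recovers the quadratic shape, with its minimum at 0.5, regardless of how the second input is distributed.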
Co-partial Dependence Plot
Variation Explained
Sites and times: GCD (10,000); LANL (50; 100; 500; 1,000; 10,000)
Methods: SRR; MART/MARS
Values, in transcript order: 0.91, 0.99, 0.87, 0.86, 0.75, 0.71, 0.71, 0.94, 0.96, 0.91, 0.95, 0.93
Sensitivity Convergence
[Figure: measure of relative sensitivity (horizontal axis) against simulation size (100, 500, 1000, 2500 and 5000 simulations), for SRR and MART. Panels: Upward.Flux.Rate, Kd.def.Kd.Np, Ant2.Data.NestWidth, Dry.Bulk.Density, Kd.def.Kd.U, Termite1.Data.b.]
Upward Flux OAT
[Figure: EPA Sum (log scale, about 1e-18 to 1e-10) plotted against upward flux rate (about 0.00005 to 0.00035).]
Summary
• MART and MARS appear to provide an
– Efficient
– Simple (?)
– Model Independent
approach to data mining probabilistic
model results for sensitivity analysis
Finally…
• The decision context:
– Is the uncertainty in the model response
too high?
– Is there value in reducing input
uncertainty?
– SA and cost used to estimate the value of
collecting additional information.
FAST
\[ y = \sum_{j=-\infty}^{\infty} \left\{ A_j \cos(js) + B_j \sin(js) \right\} \tag{1} \]
where A_j and B_j are the Fourier coefficients and can be estimated via a fast Fourier
transform algorithm.
The spectrum of the Fourier transform is
\[ \Lambda_j = A_j^2 + B_j^2 \tag{2} \]
Summing all \Lambda_j provides an estimate of the total variance in y
\[ \hat{D} = \sum_{j \in Z} \Lambda_j \tag{3} \]
Summing \Lambda_j over the frequency embedded in x_i and its associated higher
harmonics, Z^0, provides an estimate of the variance due to the uncertainty in x_i
\[ \hat{D}_i = \sum_{j \in Z^0} \Lambda_j \tag{4} \]
The sensitivity of y to x_i is then given by
\[ \hat{S}_i = \hat{D}_i / \hat{D} \tag{5} \]
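Equations (1) to (5) can be sketched in a few lines, under some simplifying assumptions: the Fourier coefficients are computed by direct summation rather than an FFT, phase shifts are set to zero, only the first four harmonics of each frequency are summed, and the total variance is taken as the sample variance of y along the curve (which equals the sum of the spectrum by Parseval's identity). The toy model, the frequencies and the function names are illustrative and not taken from the original study.

```python
import math

def search_curve(w, s):
    """x = 1/2 + arcsin(sin(w s)) / pi, a triangle wave in s with frequency w."""
    return 0.5 + math.asin(math.sin(w * s)) / math.pi

def fast_indices(model, freqs, n=1025, harmonics=4):
    """Classic FAST first-order indices, one per input frequency."""
    s_vals = [-math.pi + 2.0 * math.pi * (k + 0.5) / n for k in range(n)]
    ys = [model([search_curve(w, s) for w in freqs]) for s in s_vals]
    mean = sum(ys) / n
    d_total = sum((y - mean) ** 2 for y in ys) / n  # total variance D
    out = []
    for w in freqs:
        d_i = 0.0
        for p in range(1, harmonics + 1):
            j = p * w
            a_j = sum(y * math.cos(j * s) for y, s in zip(ys, s_vals)) / n
            b_j = sum(y * math.sin(j * s) for y, s in zip(ys, s_vals)) / n
            # Factor 2 accounts for the matching negative frequency -j
            d_i += 2.0 * (a_j * a_j + b_j * b_j)
        out.append(d_i / d_total)
    return out

def toy_model(x):
    """Hypothetical linear model with S_1 = 16/17 and S_2 = 1/17."""
    return 4.0 * x[0] + x[1]

s_fast = fast_indices(toy_model, freqs=[11, 35])
```

The frequencies 11 and 35 are chosen so that their low-order harmonics do not coincide; truncating at four harmonics loses a small fraction of each input's variance, so the estimates sit slightly below the analytic values.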
MARS
• Non-parametric recursive partitioning approach that fits separate
splines to distinct intervals of the predictor variables.
• Both the variables and the knot locations are selected by a brute
force, exhaustive search, optimized simultaneously by
evaluating a "loss of fit" criterion.
• Searches for interactions between variables, allowing any degree
of interaction to be considered.
• Tracks very complex data structures in high-dimensional data.
J.H. Friedman, (1991), “Multivariate Adaptive Regression Splines,” The Annals of Statistics, 19, 1-14
Software:
Trevor Hastie and Robert Tibshirani, MDA Library for R (‘GNU S’).
Ross Ihaka and Robert Gentleman, (1996) R: A Language for Data Analysis and Graphics, Journal of Computational
and Graphical Statistics, 5, 3, 299-314. www.r-project.org.
MART
• Multiple Additive Regression Trees
– Explores the multidimensional input
space of the input factors using gradient
boosting of additive regression models.
– Handles main and interaction effects.
– Fast
K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.