Applications of Bayesian sensitivity
and uncertainty analysis to the
statistical analysis of computer
simulators for carbon dynamics
Marc Kennedy
Clive Anderson, Stefano Conti, Tony O’Hagan
Probability & Statistics, University of Sheffield
Outline
Uncertainties in computer simulators
Bayesian inference about simulator outputs
– Creating an emulator for the simulator
– Deriving uncertainty and sensitivity measures
Example application
Some recent extensions
Uncertainties in computer
simulators
Consider a complex deterministic code with
a vector of inputs and single output
y = f(x)
Use of the code is subject to:
– Input uncertainty
– Code uncertainty
Input uncertainty
The inputs to the simulator are unknown for a
given real world scenario
Therefore the true value of the output is uncertain
A Monte Carlo approach is often used to take this
uncertainty into account
– Sample from the probability distribution of X
– Run the simulator for each point in the sample to give a
sample from the distribution of Y
– Very inefficient…not practical for complex codes
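The Monte Carlo approach above can be sketched in a few lines; the simulator and the input distribution here are purely illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(x):
    # Hypothetical cheap stand-in for an expensive deterministic code y = f(x);
    # in practice a single call might take hours or days.
    return np.sin(x[0]) + 0.5 * x[1] ** 2

# Sample from the probability distribution of X (here, two independent N(0,1) inputs)
x_sample = rng.standard_normal((10000, 2))

# Run the simulator for each point in the sample, giving a sample from Y's distribution
y_sample = np.array([simulator(x) for x in x_sample])

# Summaries of the induced output uncertainty
print(y_sample.mean(), y_sample.std())
```

Every sample point costs one full simulator run, which is exactly why this brute-force approach becomes impractical for complex codes.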
Code uncertainty
The code output at a given input point is
unknown until we run it at that point
– In practice codes can take hours or days to run, so
we have a limited number of runs
We have some prior beliefs about the output
– Smooth function of the inputs
Bayesian inference about
simulator outputs
Bayesian solution involves building an emulator
Highly efficient
– Makes maximum use of all available information
– A single set of simulator runs is required to train the
emulator. All sensitivity and uncertainty information is
derived directly from this
– The inputs for these runs can be chosen to give good
information about the simulator output
A natural way to treat the different uncertainties
within a coherent framework
Inference about functions using
Gaussian processes
We model f(·) as an unknown function
having a Gaussian process prior distribution:
[f(·) | β, σ²] ~ N(h(·)ᵀβ, σ² c(·,·))
Prior expectation of the model
output as a function of the inputs
h(.) is a vector of regression functions and β
are unknown coefficients
Inference about functions using
Gaussian processes
We model f(·) as an unknown function
having a Gaussian process prior distribution:
[f(·) | β, σ²] ~ N(h(·)ᵀβ, σ² c(·,·))
Prior beliefs about covariance
between model outputs
c(·,·) is a correlation function, which defines
our beliefs about smoothness of the output,
and σ² is the GP variance
Choice of correlation function
We use the product of univariate Gaussian
functions:
c(x, x') = exp{-∑_{k=1}^{p} b_k (x_k - x_k')²}
where b_k is a measure of the roughness of
the function in the kth input
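This correlation function is straightforward to write in code; the inputs and roughness values below are illustrative:

```python
import numpy as np

def gaussian_correlation(x, x_prime, b):
    # c(x, x') = exp(-sum_k b_k (x_k - x_k')^2): the product of univariate
    # Gaussian functions, with b_k the roughness in the kth input.
    x, x_prime, b = map(np.asarray, (x, x_prime, b))
    return float(np.exp(-np.sum(b * (x - x_prime) ** 2)))

# Identical inputs are perfectly correlated; correlation decays with distance,
# and decays faster when the roughness parameter b is larger.
print(gaussian_correlation([0.0, 0.0], [0.0, 0.0], [0.5, 0.5]))  # 1.0
print(gaussian_correlation([0.0], [2.0], [0.5]))   # exp(-2)
print(gaussian_correlation([0.0], [2.0], [0.01]))  # exp(-0.04), much closer to 1
```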
[Figures: random realizations of Gaussian processes with Gaussian correlation, for roughness b = 0.5, 0.2, 0.1 and 0.01; each realization Z is plotted over the (X, Y) input plane alongside the correlation function against distance |x-x'|. Smaller roughness gives slower-decaying correlation and smoother realizations]
Conditioning on code runs
Conditional on the observed set of training
runs,
y_i = f(x_i),  i = 1, 2, …, n
f () is still a Gaussian process, with simple
analytical forms for the posterior mean and
covariance functions
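A minimal sketch of this conditioning, assuming a zero prior mean for brevity (the talk's emulator uses a regression mean h(·)ᵀβ) and the Gaussian correlation function defined earlier:

```python
import numpy as np

def gaussian_corr_matrix(X1, X2, b):
    # c(x, x') = exp(-sum_k b_k (x_k - x_k')^2), evaluated pairwise
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * b).sum(-1)
    return np.exp(-d2)

def condition_gp(X_train, y_train, X_new, b, sigma2=1.0):
    # Posterior mean and covariance of a zero-mean GP, conditional on
    # noise-free training runs y_i = f(x_i).
    A = gaussian_corr_matrix(X_train, X_train, b)
    t = gaussian_corr_matrix(X_new, X_train, b)
    mean = t @ np.linalg.solve(A, y_train)
    cov = sigma2 * (gaussian_corr_matrix(X_new, X_new, b) - t @ np.linalg.solve(A, t.T))
    return mean, cov

# Because the runs are noise-free, the posterior interpolates them: at each
# design point the mean equals the observed output and the variance is zero.
X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()
m, C = condition_gp(X, y, X, b=np.array([0.5]))
print(np.allclose(m, y), np.max(np.abs(np.diag(C))))
```

Between design points the posterior variance grows, which is how the emulator expresses code uncertainty.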
[Figures: emulators conditioned on 2, 3 and 5 code runs, for large and small roughness b]
More about the emulator
The emulator mean is an estimate of the
model output and can be used as a surrogate
The emulator is much more…
– It is a probability distribution for the whole
function
– This allows us to derive inferences for many
output related quantities, particularly integrals
Inference for integrals
For particular forms of input distribution
(Gaussian or uniform), analytical forms
have been derived for integration-based
sensitivity measures
– Main effects of individual inputs
– Joint effects of pairs of inputs
– Sensitivity indices
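The talk derives these measures analytically, but the quantities themselves can be illustrated by brute-force Monte Carlo on a cheap stand-in for the emulator mean; the function and uniform input ranges below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def emulator_mean(x):
    # Hypothetical cheap surrogate for the simulator (e.g. an emulator mean)
    return x[..., 0] + 0.1 * x[..., 1] + x[..., 0] * x[..., 1]

# Crude double-loop Monte Carlo estimate of the first-order sensitivity index
# S_k = Var(E[Y | X_k]) / Var(Y), with independent uniform inputs on [0, 1]
n_outer, n_inner = 500, 500
total_var = emulator_mean(rng.uniform(0, 1, size=(n_outer * n_inner, 2))).var()

indices = []
for k in range(2):
    cond_means = []
    for v in rng.uniform(0, 1, size=n_outer):
        Xi = rng.uniform(0, 1, size=(n_inner, 2))
        Xi[:, k] = v                      # fix input k, vary the others
        cond_means.append(emulator_mean(Xi).mean())
    indices.append(np.var(cond_means) / total_var)

print(indices)   # input 0 accounts for far more output variance than input 1
```

This double loop costs hundreds of thousands of evaluations, which is affordable only because the surrogate is cheap; that is precisely the role the emulator plays for an expensive simulator.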
Example Application
Sheffield Dynamic Global
Vegetation Model (SDGVM)
Developed within the Centre for Terrestrial
Carbon Dynamics
Our job with SDGVM is to:
– Apply sensitivity analysis for model testing
– Identify the greatest sources of uncertainty
– Correctly reflect the uncertainty in predictions
Net Ecosystem Production
(CARBON FLUX)
Loss
Loss
– Terrestrial carbon
source if NEP is
negative
– Terrestrial carbon
sink if NEP is
positive
Some Input Parameters
Leaf life span
Leaf area
Budburst temperature
Senescence temperature
Wood density
Maximum carbon storage
Xylem conductivity
Soil clay %
Soil sand %
Soil depth
Soil bulk density
Main Effect: Leaf life span
[Figure: mean NEP against leaf life-span, 100-350 days]
If leaves die young, NEP is predicted to be higher, on average. Why?
Main Effect: Leaf life span (updated)
[Figure: mean NEP against leaf life-span, 100-350 days]
If leaves die young, SDGVM allowed a second growing season, resulting in increased carbon uptake. This problem was fixed by the modellers.
Main Effect: Senescence Temperature
[Figure: mean NEP against senescence temperature, 4-10]
Large values mean the leaves drop earlier, so reduce the growing season. Small values mean the leaves stay until the temperature is very low.
Error discovered in the soil module
When soil bulk density was added to the active parameter set, the Gaussian process model did not fit the training data properly.
[Figures: NEP against bulk density, 0-1,500,000, before and after the fix]
Our GP model depends on the output being a smooth function of the inputs. The problem was again fixed by the modellers.
SDGVM: new sensitivity
analysis
Extended sensitivity analysis to 14 input
parameters (using a more stable version)
Assumed uniform probability distributions
for each of the parameters
The aim here is to identify the greatest
potential sources of uncertainty
[Figures: main effects of leaf life span (days), water potential (MPa), max. age (years) and minimum growth rate (m) on NEP (g/m2/y)]
Percentage of total output variance:
– Leaf life span: 69.1% (by investing effort to learn more about this parameter, output uncertainty could be significantly reduced)
– Minimum growth rate: 14.2%
– Water potential: 3.4%
– Maximum age: 1.0%
Extensions to the theory
Multiple outputs
So far we have created independent
emulators for each output
– Ignores information about the correlation
between outputs
We are experimenting with simple models
linking the outputs together
This is an important first step in treating
dynamic emulators and in aggregating code
outputs
Dynamic emulators
Physical systems typically evolve over time
Their behaviour is modelled via dynamic
codes
y_t = f(y_{t-1}, x, z_t)
– where x are tuning constants and z_t are context-specific drivers
– Recursive emulation of yt over the appropriate
time span shows promising results
[Figure: CENTURY output and the dynamic emulator prediction plotted together over time]
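Recursive emulation can be sketched as follows, with a hypothetical linear one-step emulator mean standing in for a fitted GP:

```python
import numpy as np

def one_step(y_prev, z):
    # Hypothetical one-step emulator mean for y_t = f(y_{t-1}, x, z_t),
    # with the tuning constants x absorbed into the coefficients
    return 0.9 * y_prev + 0.5 * z

def recursive_emulation(y0, drivers):
    # Iterate the one-step emulator over the driver sequence z_1..z_T,
    # feeding each predicted y_t back in as the next state
    traj = [y0]
    for z in drivers:
        traj.append(one_step(traj[-1], z))
    return np.array(traj)

traj = recursive_emulation(y0=1.0, drivers=[1.0, 0.0, 0.0])
print(traj)  # [1.0, 1.4, 1.26, 1.134]
```

A full treatment would also propagate the emulator's predictive uncertainty through each step, not just the mean.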
Aggregating outputs
Motivated by the UK carbon budget problem
– The total UK carbon absorbed by vegetation is a sum of
individual pixels/sites
– Each site has a different set of input parameters (e.g.
vegetation/soil properties), but some of these are
correlated
This is a multiple output code
– Each site represents a different output
Bayesian uncertainty analysis is being extended, to
make inference about the sum
References
For Bayesian analysis of computer models:
– Kennedy, M. C. and O’Hagan, A. (2001).
Bayesian calibration of computer models (with
discussion). J. Roy. Statist. Soc. B, 63: 425-464.
For Bayesian Sensitivity analysis:
– Oakley, J. E. and O’Hagan, A. (2004).
Probabilistic sensitivity analysis of complex
models: A Bayesian approach. J. Roy. Statist.
Soc. B, 66: 751-769.