Case studies in Gaussian process
modelling of computer codes for
carbon accounting
Marc Kennedy,
Clive Anderson, Stefano Conti, Tony O’Hagan
Talk Outline
 Centre for Terrestrial Carbon Dynamics
 Computer Models in CTCD
 Bayesian emulators
 Case Study 1: SPA
 Case Study 2: SDGVM
Centre for Terrestrial Carbon Dynamics
The CTCD…
 is a NERC centre of excellence for Earth
Observation
 made up of groups from Sheffield, York,
Edinburgh, UCL, Forest Research
 brings together experts in vegetation modelling,
soil science, earth observation, carbon flux
measurement and statistics
Net Ecosystem Production (NEP)
[Diagram: carbon gain and losses, whose balance is NEP]
– Terrestrial carbon
source if NEP is
negative
– Terrestrial carbon
sink if NEP is
positive
Computer Models in CTCD
 SPA
– Simulates plant processes at 30-minute
time intervals
 ForestETP
– Stand scale
– Localised modelling
 SDGVM
– Global scale
– Coarse resolution
Statistical objectives within CTCD
 Contribute to the development of these models
– through model testing using sensitivity
analysis
 Identify the greatest sources of uncertainty
 Correctly reflect the uncertainty in predictions
– Uncertainty analysis: propagating the
parameter uncertainty through the model
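A minimal sketch of what propagating parameter uncertainty through a model can look like, using plain Monte Carlo sampling. The function `toy_model`, its coefficients and the input ranges are hypothetical stand-ins chosen for illustration; they are not SDGVM or any CTCD model, and the talk's own analyses rely on emulators rather than brute-force sampling.

```python
import random
import statistics

def toy_model(leaf_life_span, min_growth_rate):
    """Hypothetical stand-in for an expensive simulator (not SDGVM)."""
    return 0.5 * leaf_life_span + 20000.0 * min_growth_rate

def propagate_uncertainty(model, input_ranges, n_samples=10000, seed=0):
    """Sample each input uniformly from its range and summarise the output."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        sample = {name: rng.uniform(lo, hi)
                  for name, (lo, hi) in input_ranges.items()}
        outputs.append(model(**sample))
    return statistics.mean(outputs), statistics.stdev(outputs)

# Illustrative ranges only (assumed, not elicited values).
ranges = {"leaf_life_span": (100.0, 350.0),
          "min_growth_rate": (0.0035, 0.0045)}
mean_out, sd_out = propagate_uncertainty(toy_model, ranges)
print(f"output mean = {mean_out:.1f}, output sd = {sd_out:.1f}")
```

The output standard deviation is the quantity an uncertainty analysis reports: it reflects how much the prediction could move purely because the inputs are not known exactly.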
Bayesian Emulation of Models
 Model output is an unknown function of its inputs
– Convenient prior is a Gaussian process
– Run code at set of ‘well chosen’ input points
– Obtain posterior distribution
 The emulator is the posterior distribution of the output
– Fast approximation
– Measure of uncertainty
– Nice analytical form for further analysis
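The steps on this slide can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions: one input, a squared-exponential covariance with fixed (not estimated) variance and length scale, and a constant prior mean set to the average of the runs. The function `toy_code` is a hypothetical stand-in for an expensive simulator, and none of the names below come from the talk.

```python
import math

def sq_exp(x1, x2, variance=1.0, length=0.25):
    """Squared-exponential covariance between two scalar inputs (assumed form)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l

def solve_lower(l, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    return z

def solve_upper_from_lower(l, z):
    """Solve L^T w = z by back substitution."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(l[k][i] * w[k] for k in range(i + 1, n))) / l[i][i]
    return w

def fit_emulator(xs, ys, jitter=1e-10):
    """Condition the GP prior on the code runs (xs, ys)."""
    mean = sum(ys) / len(ys)
    k = [[sq_exp(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    l = cholesky(k)
    alpha = solve_upper_from_lower(l, solve_lower(l, [y - mean for y in ys]))
    return {"xs": xs, "mean": mean, "chol": l, "alpha": alpha}

def predict(em, x_new):
    """Posterior mean and variance of the code output at x_new."""
    k_star = [sq_exp(x_new, x) for x in em["xs"]]
    post_mean = em["mean"] + sum(k * a for k, a in zip(k_star, em["alpha"]))
    v = solve_lower(em["chol"], k_star)
    post_var = max(sq_exp(x_new, x_new) - sum(t * t for t in v), 0.0)
    return post_mean, post_var

def toy_code(x):
    """Hypothetical stand-in for an expensive simulator."""
    return math.sin(3.0 * x) + x

design = [i / 7 for i in range(8)]  # 8 code runs on [0, 1]
emulator = fit_emulator(design, [toy_code(x) for x in design])
m, v = predict(emulator, 0.5)
print(f"emulator mean {m:.3f}, variance {v:.2e}, code output {toy_code(0.5):.3f}")
```

The posterior mean plays the role of the fast approximation and the posterior variance is the accompanying measure of uncertainty; it shrinks to roughly zero at inputs where the code has actually been run.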
Case study 1: Soil Plant Atmosphere (SPA)
Model
 SPA is a fine scale model created by Mat
Williams
– Aggregated SPA outputs were used to
create the simpler up-scaled model (ACM:
the Aggregated Canopy Model) by fitting a
set of simple equations with 9 parameters
 Can an emulator do any better than ACM as an
approximation to SPA?
ACM vs. Emulator for predicting SPA
 Bayesian emulator created using only 150 of
the total 6561 points used to create ACM
 Predicted remaining 6411 SPA points using
emulator and ACM
– Compare Root Mean Square Errors
(RMSE)
RMSE = 0.726 using ACM
RMSE = 0.314 using emulator
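For reference, the RMSE comparison can be computed as in the sketch below. The numbers are made up purely to show the call; they are not SPA, ACM or emulator values, and the variable names are hypothetical.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between predictions and held-out outputs."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up held-out outputs and two competing approximations.
held_out = [2.0, 4.0, 6.0]
approx_a = [2.5, 3.0, 7.0]
approx_b = [2.1, 3.9, 6.2]
rmse_a = rmse(approx_a, held_out)
rmse_b = rmse(approx_b, held_out)
print(f"RMSE A = {rmse_a:.3f}, RMSE B = {rmse_b:.3f}")
```

The approximation with the smaller RMSE is, on average, closer to the held-out outputs.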
[Figure: two scatter plots, ACM Predictions and Emulator Predictions, each on axes from 0 to 15]
Case Study 2: Sheffield Dynamic Global
Vegetation Model
 SDGVM is a point model
– each pixel represents an area, with an
associated vegetation type / land use
 Vegetation type is described using 14 plant
functional type parameters
 SDGVM is constantly being developed
– To improve process modelling
– To incorporate more detailed driving data
Plant Functional Type inputs
Examples:
 Leaf life span
 Leaf area
 Temperature when bud bursts
 Temperature when leaf falls
 Wood density
 Maximum carbon storage
 Xylem conductivity
 The emulator allows small groups of inputs to vary, with the
others fixed at their original default values
Soil inputs
 Soil clay %
 Soil sand %
 Soil depth
 Bulk density
Emulator for SDGVM
 Built an emulator for the NEP output of SDGVM
– 80 runs in the 5-dimensional input space were used as
training data
– A maximin Latin hypercube design was used to ensure
even coverage of the input space. Plant scientists
specified the ranges
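One simple way to obtain a maximin Latin hypercube is to generate many random Latin hypercubes and keep the one whose closest pair of points is furthest apart. The sketch below does this for 80 runs in 5 dimensions on the unit cube; the random-search strategy, the number of candidates and the `scale` helper are assumptions made for illustration, and the talk does not say how its design was constructed.

```python
import itertools
import math
import random

def latin_hypercube(n_runs, n_dims, rng):
    """Random Latin hypercube on [0, 1]^d: one point per stratum in each input."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_runs))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_runs for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_runs)]

def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))

def maximin_lhs(n_runs, n_dims, n_candidates=50, seed=0):
    """Keep the candidate hypercube with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_runs, n_dims, rng)
                  for _ in range(n_candidates))
    return max(candidates, key=min_distance)

def scale(design, ranges):
    """Map unit-cube points onto the (low, high) range of each input."""
    return [tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(point, ranges))
            for point in design]

unit_design = maximin_lhs(n_runs=80, n_dims=5)
print(f"{len(unit_design)} runs, minimum distance {min_distance(unit_design):.3f}")
```

Calling `scale(unit_design, ranges)` with one `(low, high)` pair per input would convert the design to the ranges specified by the plant scientists.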
Sample training runs (5 inputs per run; output obtained by running the code):

Input 1   Input 2    Input 3    Input 4    Input 5      Output
254.0     6.304346   7.913044   20.28985   6.521775     24.259
330.0     8.739128   8.173912   13.4058    19.56525     14.24
326.0     8.30435    5.56522    7.971025   50.000023    18.384
145.0     5.521742   5.043478   0.72465    33.695625    36.204
236.0     9.43478    8.782606   1.08695    75.0         -3.214
123.0     9.608696   9.478258   21.0145    71.739151    1.774
…         …          …          …          …            …
Model testing: Sensitivity analysis
 We use sensitivity analysis for model checking
and for model interpretation
 Calculate main effects of each code input
– How does output change if we vary the
input, averaged over other inputs?
 Building the emulator has uncovered bugs
– simply by trying different combinations of
input values
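A main effect can be approximated numerically by fixing one input at each value on a grid and averaging the output over the remaining inputs. The sketch below does this by Monte Carlo with uniform distributions on the other inputs. The function `toy_emulator_mean` and its coefficients are hypothetical, standing in for an emulator's posterior mean; a Gaussian process emulator can often supply such averages analytically, which this sketch does not attempt.

```python
import random

def main_effect(model, ranges, input_name, grid, n_samples=2000, seed=0):
    """Average model output over the other inputs at each grid value of one input."""
    rng = random.Random(seed)
    # Draw the other inputs once and reuse them at every grid value.
    others = [{name: rng.uniform(lo, hi)
               for name, (lo, hi) in ranges.items() if name != input_name}
              for _ in range(n_samples)]
    effect = []
    for value in grid:
        total = sum(model(**{input_name: value}, **sample) for sample in others)
        effect.append(total / n_samples)
    return effect

def toy_emulator_mean(leaf_life_span, senescence_temp):
    """Hypothetical cheap function standing in for an emulator's posterior mean."""
    return (0.05 * leaf_life_span + 2.0 * senescence_temp
            + 0.01 * leaf_life_span * senescence_temp)

# Ranges chosen to match the plotted axis ranges; the model itself is invented.
ranges = {"leaf_life_span": (100.0, 350.0), "senescence_temp": (4.0, 10.0)}
grid = [100.0, 225.0, 350.0]
effect = main_effect(toy_emulator_mean, ranges, "leaf_life_span", grid)
for x, y in zip(grid, effect):
    print(f"leaf life span {x:.0f}: mean output {y:.2f}")
```

Plotting `effect` against `grid` gives a curve of the kind shown on the main effect slides; an unexpected shape in such a curve is what prompts a closer look at the code.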
Main Effect: Leaf life span
[Figure: mean NEP (0 to 30) against leaf life-span (100 to 350)]
Main Effect: Leaf life span (updated)
[Figure: mean NEP (0 to 30) against leaf life-span (100 to 350)]
Main Effect: Senescence Temperature
[Figure: mean NEP (0 to 30) against senescence (4 to 10)]
Main Effects: Soil inputs
 Soil inputs had been fixed in SDGVM
 Output sensitive to sand content, but not clay content,
over these ranges
 More detailed soil input data are now used
[Figure: mean NEP (0 to 30) against soil clay% (0 to 25) and against soil sand% (0 to 60)]
Error discovered in the soil module
[Figure: NEP (-20 to 80) against bulk density (0 to 1500000), in two panels: Before… and After…]
SDGVM: new sensitivity analysis
 We initially analysed uncertainty in the NEP
output at a single test site, using rough ranges
for the 14 plant functional type parameters
 Assumed default (uniform) probability
distributions for the parameters
 The aim here is to identify the greatest
potential sources of uncertainty
[Figure: main effects on NEP (g/m2/y) of leaf life span (days), water potential (MPa), max. age (years) and minimum growth rate (m)]
Contribution to NEP uncertainty:
– Leaf life span 69.1%
– Minimum growth rate 14.2%
– Water potential 3.4%
– Maximum age 1.0%
Plant Functional Type parameters
 Uncertainty is driven by just a few key
parameters
– Maximum age
– Leaf life span
– Water potential
– Minimum growth rate
 The next step was to refine the rough
probability distributions for these parameters
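The percentage attributed to each parameter can be read as the share of output variance explained by that parameter alone, Var(E[Y | X_i]) / Var(Y). The sketch below estimates these shares with a simple double-loop Monte Carlo scheme on an invented two-input model. The model, its input names `a` and `b`, and the sample sizes are all assumptions for illustration; the talk's own figures come from an emulator-based analysis, not from this estimator.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def first_order_shares(model, ranges, n_outer=500, n_inner=200, seed=0):
    """Estimate Var(E[Y | X_i]) / Var(Y) for each input by nested sampling."""
    rng = random.Random(seed)

    def draw(exclude=None):
        return {name: rng.uniform(lo, hi)
                for name, (lo, hi) in ranges.items() if name != exclude}

    total_var = variance([model(**draw()) for _ in range(n_outer * n_inner)])
    shares = {}
    for name, (lo, hi) in ranges.items():
        cond_means = []
        for _ in range(n_outer):
            value = rng.uniform(lo, hi)  # fix this input ...
            outs = [model(**{name: value}, **draw(exclude=name))
                    for _ in range(n_inner)]  # ... and average over the rest
            cond_means.append(sum(outs) / n_inner)
        shares[name] = variance(cond_means) / total_var
    return shares

def toy_model(a, b):
    """Hypothetical additive model: input a dominates the output variance."""
    return 3.0 * a + 1.0 * b

shares = first_order_shares(toy_model, {"a": (0.0, 1.0), "b": (0.0, 1.0)})
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}% of output variance")
```

For this toy model the exact shares are 90% and 10%, so the estimates give a feel for the Monte Carlo error. Because the shares depend on the input distributions, refining those distributions can change which parameter dominates.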
Elicitation
 We elicited formal probability distributions for
the key parameters
– based on discussion with Ian Woodward
– representing his uncertainty about their
values within the UK
– noting that each really applies as an
average over the species actually present in
a given pixel
[Figure: distributions for leaf life span (days), minimum growth rate (m), maximum age (years) and water potential (MPa)]
Uniform probability distributions
– Leaf life span 69.1%
– Minimum growth rate 14.2%
– Water potential 3.4%
– Maximum age 1.0%
– Mean NEP = 174 gC m-2
– Std deviation = 14.32 gC m-2
Refined probability distributions
– Minimum growth rate 64%
– Leaf life span 13.2%
– Seeding density 10%
– Water potential 3.3%
– Maximum age 1%
– Mean NEP = 163 gC m-2
– Std deviation = 12.65 gC m-2
Uncertainty analysis at sample sites
 We carried out uncertainty analyses of the NEP
outputs from SDGVM at 9 sites/pixels
[Figure: NEP uncertainty at each site, NEP axis from 20 to 270]
Sites:
– Stockton on the Forest (near York)
– Milton Keynes
– Barnstaple (Devon)
– Keswick (Lake District)
– Lowland (Scotland)
– Dartmoor
– New Forest (Hampshire)
– Kielder
– S. Ballater (Scotland)
 Uncertainty is clearly substantial, even when
we only take account of uncertainty in these
parameters
 The most important parameter is minimum
growth rate, which accounts for typically at
least 60% of overall NEP uncertainty
– This suggests targeting this parameter for
research
 Seeding density?
Ongoing work
 We need to estimate uncertainty in the overall
UK carbon budget
– Developing new theory for aggregating
uncertainty over many pixels
 Windows software will be made available later
this year
www.shef.ac.uk/st1mck
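As background to the aggregation problem mentioned under ongoing work, the standard identity for the variance of a sum is written out below. The symbols are introduced here for illustration and do not come from the talk: $Y_i$ is the NEP at pixel $i$ and $n$ is the number of pixels. This is textbook material, not the new theory being developed.

```latex
\operatorname{Var}\Bigl(\sum_{i=1}^{n} Y_i\Bigr)
  = \sum_{i=1}^{n} \operatorname{Var}(Y_i)
  + 2 \sum_{1 \le i < j \le n} \operatorname{Cov}(Y_i, Y_j)
```

The covariance terms are the reason per-pixel variances cannot simply be added: if pixels share the same uncertain parameters, their outputs would be expected to be correlated, and with many pixels the number of covariance terms grows much faster than the number of variance terms.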