Uncertainty Analysis Using GEM-SA
Outline
Setting up the project
Running a simple analysis
Exercise
More complex analyses
GEM-SA course - session 4
Setting up the project
Create a new project
Select Project -> New,
or click toolbar icon
Project dialog appears
We’ll specify the data
files first
Files
Using “Browse”
buttons, select
input and output
files
The “Inputs” file contains one column for each
parameter and one row for each model
training run (the design)
The “Outputs” file contains the outputs from
those runs (one column, in this example)
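A quick way to check that the two files line up before loading them into GEM-SA is sketched below. It assumes the files are plain whitespace-delimited text, which is an assumption rather than something stated on the slide, and it uses two made-up miniature files, not the model1 demo data.

```python
import io
import numpy as np

# Made-up miniature files for illustration only (NOT the model1 demo data).
# Assumed format: whitespace-delimited numbers, one training run per row.
inputs_text = """\
0.1 10.0
0.4 12.5
0.9 11.0
"""
outputs_text = """\
3.2
4.1
5.0
"""

# One row per training run, one column per input parameter.
design = np.loadtxt(io.StringIO(inputs_text), ndmin=2)
# One row per training run, one column per output.
outputs = np.loadtxt(io.StringIO(outputs_text), ndmin=2)

n_runs, n_inputs = design.shape
n_outputs = outputs.shape[1]

# The two files must describe the same set of runs.
assert outputs.shape[0] == n_runs, "inputs and outputs need one row per run"
print(n_runs, "runs,", n_inputs, "inputs,", n_outputs, "output column(s)")
```

With real files, replace the `io.StringIO(...)` arguments with the file paths.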
Our example
We’ll use the example “model1” in the GEM-SA
DEMO DATA directory
This example is based on a vegetation model
with 7 inputs
RESAEREO, DEFLECT, FACTOR, MO, COVER,
TREEHT, LAI
The model has 16 outputs, but for the present
we will consider output 4
June monthly GPP
Number of inputs
Click on Options tab
Select number of
inputs using
Or click “From Inputs
File”
Define input names
Click on “Names …”
The “Input
parameter names”
dialog opens
Enter parameter
names
Click “OK”
Complete the project
We will leave all other settings at their default
values for now
Click “OK”
The Input Parameter
Ranges window
appears
Close and save project
Click “Defaults from
input ranges” button
Click “OK”
Select Project -> Save
Or click toolbar icon
Choose a name and
click “Save”
Running a simple analysis
Build the emulator
Click
to build the emulator
A lot of things now start to happen!
The log window at the bottom starts to record various
bits of information
A little window appears showing progress of
minimisation of the roughness parameter estimation
criterion
The “Main Effects” tab is selected, in which several
graphs are drawn
Progress bar at the bottom
Focus on the log window
The “Main Effects” and “Sensitivity Analysis”
tabs are concerned with SA, and will be
considered in the next session
We are interested just now simply in Uncertainty
Analysis (UA)
The “Output Summary” tab contains all we need
and more
But the key things can be seen more simply in
the log window at the bottom
Diagnostics of the emulator build
The basic uncertainty analysis results
Emulation diagnostics
Note where the log window reports …
Estimating emulator parameters by maximising probability distribution...
maximised posterior for emulator parameters: precision = sigmasquared = 0.342826, roughness = 0.217456 0.0699709 0.191557 16.9933 0.599439 0.459675 1.01559
The first line says roughness parameters have
been estimated by the simplest method
The values of these indicate how non-linear the
effect of each input parameter is
Note the high value for input 4 (MO)
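To see which inputs stand out, the roughness values can be paired with the input names. The sketch below assumes the log lists the values in the same order as the inputs were named, which agrees with the slide's remark that the fourth value belongs to MO.

```python
# Input names in the order given for the model1 example.
names = ["RESAEREO", "DEFLECT", "FACTOR", "MO", "COVER", "TREEHT", "LAI"]

# Roughness values copied from the log line above (assumed to be in input order).
roughness = [0.217456, 0.0699709, 0.191557, 16.9933, 0.599439, 0.459675, 1.01559]

# Sort from roughest (most non-linear effect) to smoothest.
ranked = sorted(zip(names, roughness), key=lambda pair: pair[1], reverse=True)

for name, value in ranked:
    print(f"{name:9s} {value:10.4f}")
```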
Uncertainty analysis – mean
Below this, the log reports
Estimate of mean output is 24.145, with variance 0.00388252
So the best estimate of the output (June GPP) is
24.1 (mol C/m²)
This is averaged over the uncertainty in the 7 inputs
Better than just fixing inputs at best estimates
There is an emulation standard error of 0.062 in this
figure
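The emulation standard error is simply the square root of the variance reported on the log line, as this short check shows.

```python
import math

# Values copied from the log line quoted above.
ua_mean = 24.145
ua_mean_variance = 0.00388252

# Standard error of the mean estimate due to emulation.
emulation_se = math.sqrt(ua_mean_variance)
print(f"mean = {ua_mean:.1f}, emulation standard error = {emulation_se:.3f}")
```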
Uncertainty analysis – variance
The final line of the log is
Estimate of total output variance = 73.9033
This shows the uncertainty in the model output
that is induced by input uncertainties
The variance is 73.9
Equal to a standard deviation of 8.6
So although the best estimate of the output is 24.1,
the uncertainty in inputs means it could easily be as
low as 16 or as high as 33
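The figures of 16 and 33 are the mean minus and plus one standard deviation. A small check using the two log values:

```python
import math

# Values copied from the log lines quoted on this and the previous slide.
ua_mean = 24.145
total_variance = 73.9033

sd = math.sqrt(total_variance)
low, high = ua_mean - sd, ua_mean + sd  # one standard deviation either side

print(f"sd = {sd:.1f}, mean - sd = {low:.1f}, mean + sd = {high:.1f}")
```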
Exercise
A small change
Run the same model with Output 11 instead of
Output 4
Calculate the coefficient of variation (CV) for this
output
NB: the CV is defined as the standard deviation
divided by the mean
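As a sketch of the calculation, here is the CV worked out for Output 4 from the log values already shown. The function name is just an illustration; for the exercise, substitute the mean and total variance that the log reports for Output 11.

```python
import math

def coefficient_of_variation(mean, variance):
    """Standard deviation divided by the mean."""
    return math.sqrt(variance) / mean

# Output 4 values from the earlier slides; replace with your Output 11 results.
cv_output4 = coefficient_of_variation(mean=24.145, variance=73.9033)
print(f"CV for output 4 = {cv_output4:.3f}")
```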
More complex analyses
Input distributions
Default is to assume the uncertainty in each
input is represented by a uniform distribution
Range determined by the range of values found in
the input file or separately input
A normal (Gaussian) distribution
is generally a more realistic
representation of uncertainty
Range unbounded
More probability in the middle
[Figure: normal density curve plotted against x, for x from -4 to 4]
Changing input distributions
Reopen Project
dialog by Project -> Edit … or
clicking on
Select Options tab
Click All unknown,
product normal
Then OK
A new dialog
opens to specify
means and
variances
Model 1 example
Uniform
distributions from
input ranges
Normal
distributions to
match
Range about 4 std
deviations
Except for MO
Narrower
distribution
             Uniform          Normal
Parameter    Lower   Upper    Mean   Variance
RESAEREO     80      200      140    900
DEFLECT      0.6     1        0.8    0.01
FACTOR       0.1     0.5      0.3    0.01
MO           30      100      60     100
COVER        0.6     0.99     0.8    0.01
TREEHT       10      40       25     100
LAI          3.75    9        6.5    1
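To see how each normal distribution compares with its uniform range, the width of the range can be expressed in standard deviations. The numbers below are copied from the table on this slide; the variable names are only for illustration.

```python
import math

# (lower, upper, mean, variance) for each input, from the table on this slide.
table = {
    "RESAEREO": (80, 200, 140, 900),
    "DEFLECT": (0.6, 1, 0.8, 0.01),
    "FACTOR": (0.1, 0.5, 0.3, 0.01),
    "MO": (30, 100, 60, 100),
    "COVER": (0.6, 0.99, 0.8, 0.01),
    "TREEHT": (10, 40, 25, 100),
    "LAI": (3.75, 9, 6.5, 1),
}

# Width of the uniform range measured in normal standard deviations.
range_in_sd = {}
for name, (lower, upper, mean, variance) in table.items():
    range_in_sd[name] = (upper - lower) / math.sqrt(variance)
    print(f"{name:9s} range = {range_in_sd[name]:.2f} standard deviations")
```

For MO the range covers 7 standard deviations, which is what makes its normal distribution narrower relative to the range than, say, RESAEREO at 4.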
Effect on UA
After running the revised model, we see:
It runs faster, with no need to rebuild the emulator
The emulator fit is unchanged
The mean is changed a little and variance is halved
Estimate of mean output is 26.2698, with variance 0.00784475
Estimate of total output variance = 38.1319
Reducing MO uncertainty further
If we reduce the variance of MO even more, to
49:
UA mean changes a little more and variance reduces
again
Estimate of mean output is 26.3899, with variance 0.0108792
Estimate of total output variance = 27.1335
Notice also how the emulation variance of the UA mean has
increased (it was 0.004 with uniform inputs)
This is because the design points cover the new
ranges less thoroughly
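Putting the three runs side by side makes both trends easier to see. The values are copied from the log lines quoted in this session; the labels for the runs are just shorthand.

```python
import math

# (UA mean, emulation variance of the mean, total output variance) per run,
# copied from the log lines quoted in this session.
runs = {
    "uniform inputs": (24.145, 0.00388252, 73.9033),
    "normal inputs, MO variance 100": (26.2698, 0.00784475, 38.1319),
    "normal inputs, MO variance 49": (26.3899, 0.0108792, 27.1335),
}

emulation_se = {}
for label, (mean, mean_variance, total_variance) in runs.items():
    emulation_se[label] = math.sqrt(mean_variance)
    print(f"{label:32s} mean = {mean:7.3f}  "
          f"emulation se = {emulation_se[label]:.3f}  "
          f"output sd = {math.sqrt(total_variance):.2f}")
```

The output variance falls as the input uncertainty is reduced, while the emulation standard error of the mean rises.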
A homework exercise
What happens if we reduce the uncertainty in
MO to zero?
Two ways to do this
Literally set variance to zero
Select “Some known, rest product normal” on Project
dialog, check the tick box for MO in the mean and
variance dialog
What changes do you see in the UA?
Cross-validation
Reopen the Project dialog and select the
Options tab
Look at the bottom menu box, labelled “Cross-validation”
There are 3 options
None
Leave-one-out
Leave final 20% out
CV (here cross-validation, not the coefficient of variation) is a way of checking the emulator fit
Default is None because CV takes time
Leave-one-out CV
After estimating roughness and other
parameters, GEM predicts each training run
point using only the remaining n-1 points
Results appear in log window
Cross Validation Root Mean-Squared Error = 0.907869
Cross Validation Root Mean-Squared Relative Error = 4.34773 percent
Cross Validation Root Mean-Squared Standardised Error = 1.15273
Largest standardised error is 4.32425 for data point 61
Cross Validation variances range from 0.18814 to 3.92191
Written cross-validation means to file cvpredmeans.txt
Written cross-validation variances to file cvpredvars.txt
(Model 1, output 4, uniform inputs)
The root mean-squared standardised error should be close to 1
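The idea of leave-one-out CV can be sketched with a very simple Gaussian process in NumPy. This is not GEM-SA's emulator: the zero-mean GP, the squared-exponential kernel, the fixed hyperparameters, the toy data and the formulas used for the three summaries are all assumptions made for illustration.

```python
import numpy as np

def sq_exp_kernel(a, b, roughness, sigma2):
    # Squared-exponential covariance; larger roughness = less smooth response.
    diff = a[:, None, :] - b[None, :, :]
    return sigma2 * np.exp(-np.sum(roughness * diff**2, axis=2))

def gp_predict(x_train, y_train, x_new, roughness, sigma2, nugget=1e-8):
    # Zero-mean GP on centred outputs; returns predictive means and variances.
    centre = y_train.mean()
    k = sq_exp_kernel(x_train, x_train, roughness, sigma2)
    k += nugget * np.eye(len(x_train))  # small jitter for numerical stability
    k_star = sq_exp_kernel(x_new, x_train, roughness, sigma2)
    mean = centre + k_star @ np.linalg.solve(k, y_train - centre)
    var = sigma2 - np.sum(k_star * np.linalg.solve(k, k_star.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

def leave_one_out(x, y, roughness, sigma2):
    # Hyperparameters stay fixed; each point is predicted from the other n-1.
    n = len(y)
    means, variances = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        m, v = gp_predict(x[keep], y[keep], x[i:i + 1], roughness, sigma2)
        means[i], variances[i] = m[0], v[0]
    errors = y - means
    standardised = errors / np.sqrt(variances)
    return {
        "rmse": float(np.sqrt(np.mean(errors**2))),
        "rms_relative_percent": float(100 * np.sqrt(np.mean((errors / y)**2))),
        "rms_standardised": float(np.sqrt(np.mean(standardised**2))),
        "worst_point": int(np.argmax(np.abs(standardised))) + 1,  # 1-based
    }

# Toy data (made up): 30 runs of a smooth function of two inputs.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(30, 2))
y = 5 + np.sin(3 * x[:, 0]) + 0.5 * x[:, 1]

result = leave_one_out(x, y, roughness=np.array([2.0, 0.5]), sigma2=1.0)
for key, value in result.items():
    print(key, value)
```

Because the hyperparameters here are fixed by hand rather than estimated, the standardised error in this toy example need not be close to 1.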
Leave final 20% out CV
This is an even better check, because it tests the
emulator on data that have not been used in any
way to build it
Emulator is built on first 80% of data and used to
predict last 20%
Cross Validation Root Mean-Squared Error = 1.46954
Cross Validation Root Mean-Squared Relative Error = 7.4922 percent
Cross Validation Root Mean-Squared Standardised Error = 1.73675
Largest standardised error is 5.05527 for data point 22
Cross Validation variances range from 0.277304 to 4.886
Standardised error a bit bigger
But not bad for just 24 runs predicted
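The split itself is easy to picture. If 24 runs make up the final 20%, the design must contain 120 runs; that total is inferred from the slide, not read from GEM-SA. How GEM-SA rounds when the number of runs is not a multiple of 5 is not covered here, so the rounding below is an assumption.

```python
import numpy as np

n_runs = 120                   # inferred: 24 held-out runs are 20% of the design
n_test = round(0.2 * n_runs)   # assumed rounding rule
n_train = n_runs - n_test

# Rows are used in file order: the first 80% build the emulator,
# the last 20% are predicted and compared with the true outputs.
train_idx = np.arange(n_train)
test_idx = np.arange(n_train, n_runs)

print(f"build on runs 1-{n_train}, predict runs {n_train + 1}-{n_runs}")
```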
Output Summary tab
The “Output Summary” tab presents all of the
key results in a single list
Tidier than searching for the details in the log
window
Although the log window actually has more
information
Can print using
Other options
There are various other options associated with
the emulator building that we have not dealt with
See the built-in help facility for explanations
Also slides at the end of session 3
But we’ve done the main things that should be
considered in practice
And it’s enough to be going on with!
When it all goes wrong
How do we know when the emulator is not working?
Large roughness parameters
Especially ones hitting the limit of 99
Large emulation variance on UA mean
Poor CV standardised prediction error
Especially when some are extremely large
In such cases, see if a larger training set helps
Other ideas like transforming output scale
A suite of diagnostics is being developed in MUCM
See Bastos and O’Hagan on my website
http://tonyohagan.co.uk/academic/pub.html
Not implemented in GEM-SA yet
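The first warning sign can be checked mechanically once the roughness values have been copied out of the log. The helper below is hypothetical, not part of GEM-SA, and assumes that a value at the limit is reported as 99 or more.

```python
def inputs_at_roughness_limit(names, roughness, limit=99.0):
    """Return the names of inputs whose roughness has hit the limit."""
    return [name for name, value in zip(names, roughness) if value >= limit]

# Model 1, output 4: names and roughness values from the earlier slides.
names = ["RESAEREO", "DEFLECT", "FACTOR", "MO", "COVER", "TREEHT", "LAI"]
roughness = [0.217456, 0.0699709, 0.191557, 16.9933, 0.599439, 0.459675, 1.01559]

flagged = inputs_at_roughness_limit(names, roughness)
print("inputs at the roughness limit:", flagged if flagged else "none")
```

For model 1 no input is at the limit, although MO is clearly the roughest.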