Deutscher Wetterdienst
Bootstrapping – using different methods to
estimate statistical differences between
model errors
Ulrich Damrath
COSMO GM Rome 2011
Some typical situations occurring during
operational verification:
Ulrich Damrath: Bootstrapping – using different methods to estimate statistical differences between model errors, COSMO GM Rome September 2011
Questions:
1. Question: Are the differences between scores due to noise, or are they statistically
significant?
2. Question: Are there significant differences between the quality of different
models? (Of interest to users of forecasts)
3. Question: Are there significant differences between the quality of models for
different situations? (Of interest to model developers)
Problem: BIASes may be normally distributed, but what about RMSEs?
A possible solution: Application of bootstrap techniques to get confidence
intervals or quantiles of the distribution
1. Question concerning the bootstrap method: How many replications are
necessary to get stable statistical results?
2. Question concerning the bootstrap method: How should the sample data be
grouped in order to avoid autocorrelation effects?
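A minimal sketch of this idea in Python, assuming NumPy: the RMSE is recomputed on samples drawn with replacement, and quantiles of the resulting values serve as interval bounds without any normality assumption. The synthetic errors, the function name bootstrap_rmse_quantiles and the seeds are hypothetical and not taken from the talk. Resampling single values, as done here, ignores the autocorrelation raised in the second question.

```python
import numpy as np

def bootstrap_rmse_quantiles(errors, n_replications=500,
                             probs=(0.1, 0.5, 0.9), seed=0):
    """Quantiles of the RMSE over bootstrap replications."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    n = errors.size
    # One row of indices per replication, drawn with replacement.
    idx = rng.integers(0, n, size=(n_replications, n))
    rmse = np.sqrt((errors[idx] ** 2).mean(axis=1))
    return np.quantile(rmse, probs)

# Synthetic forecast-minus-observation errors (illustrative only).
data_rng = np.random.default_rng(1)
errors = data_rng.normal(loc=0.5, scale=2.0, size=200)

q10, q50, q90 = bootstrap_rmse_quantiles(errors)
sample_rmse = np.sqrt(np.mean(errors ** 2))
print(f"RMSE = {sample_rmse:.3f}, 10% = {q10:.3f}, 50% = {q50:.3f}, 90% = {q90:.3f}")
```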
The principle of bootstrapping for a sample with 10 elements
Realisation 1: mean value using elements: 5 3 8 7 8 4 7 0 4 3
Realisation 2: mean value using elements: 3 2 0 5 1 2 0 2 2 8
Realisation 3: mean value using elements: 5 2 3 6 8 3 8 0 8 6
Realisation 4: mean value using elements: 7 5 1 6 4 0 1 2 1 6
Realisation 5: mean value using elements: 6 5 8 6 1 0 0 2 3 2
Realisation 6: mean value using elements: 1 0 5 5 6 5 8 5 5 8
Realisation 7: mean value using elements: 3 4 4 4 2 8 5 3 2 6
Realisation 8: mean value using elements: 0 8 2 0 6 4 1 6 6 5
Realisation 9: mean value using elements: 0 7 5 6 3 2 2 3 8 8
Realisation 10: mean value using elements: 2 2 3 6 6 6 6 2 0 0
The mean value of all realisations (replications) gives the bootstrap mean.
The standard deviation of all mean values gives the bootstrap standard deviation.
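The procedure can be sketched in Python as follows. This is a minimal illustration, assuming NumPy; the sample values, the function name bootstrap_mean and the seed are hypothetical and not taken from the talk. Each realisation draws 10 element indices with replacement, as in the index lists above.

```python
import numpy as np

def bootstrap_mean(sample, n_replications, seed=0):
    """Return (bootstrap mean, bootstrap standard deviation) of the sample mean."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Each row holds the element indices of one realisation.
    indices = rng.integers(0, n, size=(n_replications, n))
    replicate_means = sample[indices].mean(axis=1)
    return replicate_means.mean(), replicate_means.std(ddof=1)

# Hypothetical sample with 10 elements (e.g. forecast errors).
sample = [1.2, -0.4, 0.8, 2.1, -1.5, 0.3, 0.9, -0.7, 1.6, 0.1]
boot_mean, boot_std = bootstrap_mean(sample, n_replications=500)
print(f"bootstrap mean = {boot_mean:.3f}, bootstrap std = {boot_std:.3f}")
```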
Bootstrap properties for three analytical cases
Number of sample values: 31
Bootstrap properties for three analytical cases
Number of sample values: 310
Bootstrap properties for three analytical cases
Number of sample values: 3100
Bootstrap properties for three analytical cases
Number of sample values: 31000
Bootstrap properties for three analytical cases
Number of sample values: 310000
Conclusion concerning the convergence of the method:
A number of ~500 replications seems to be appropriate
to get a stable value for the bootstrap variance.
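How the bootstrap standard deviation settles with the number of replications can be checked with a small experiment such as the following sketch. It assumes NumPy and a synthetic, normally distributed sample of 310 values; the replication counts, the seed and the function name are illustrative assumptions, not the analytical cases of the talk.

```python
import numpy as np

def bootstrap_std_of_mean(sample, n_replications, rng):
    """Bootstrap standard deviation of the sample mean."""
    n = sample.size
    idx = rng.integers(0, n, size=(n_replications, n))
    return sample[idx].mean(axis=1).std(ddof=1)

rng = np.random.default_rng(42)
sample = rng.normal(size=310)  # synthetic data, illustrative only

results = {}
for n_replications in (10, 50, 100, 500, 1000, 5000):
    results[n_replications] = bootstrap_std_of_mean(sample, n_replications, rng)
    print(f"{n_replications:5d} replications: bootstrap std = {results[n_replications]:.4f}")

# Reference value for comparison: standard error of the mean.
standard_error = sample.std() / np.sqrt(sample.size)
print(f"standard error of the mean = {standard_error:.4f}")
```

For small replication counts the printed values scatter noticeably from run to run; for several hundred replications they typically stay close to the standard error of the mean.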
Setting the sample characteristics:
Treating each pair of observations and forecasts as a
single sample member leads to large sample sizes with
relatively high autocorrelation. Therefore values are grouped
by blocks of one, two and four days.
Additionally, a block size was constructed using the
optimal block length L_OPT, which can be estimated by
L_OPT = NINT{[6^(1/2) * a / (1 - a)] * N^(1/3)}
with "a" as a function of the autocorrelation, N as the sample size and NINT as the nearest-integer function.
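Grouping by blocks can be sketched as a block bootstrap in which whole blocks of consecutive values, rather than single values, are drawn with replacement. The example below is a minimal illustration assuming NumPy and hourly values (24 per day); the synthetic AR(1) errors, the autocorrelation of 0.9, the function name block_bootstrap_rmse and the seeds are assumptions made for the example, not the data or code of the talk. Any other block length, such as an estimated optimal one, could be passed as block_length.

```python
import numpy as np

def block_bootstrap_rmse(errors, block_length, n_replications=500, seed=0):
    """RMSE replicates from resampling non-overlapping blocks with replacement."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    n_blocks = errors.size // block_length
    # Drop an incomplete trailing block so that all blocks have equal length.
    blocks = errors[: n_blocks * block_length].reshape(n_blocks, block_length)
    # Draw whole blocks, keeping the order of values inside each block.
    idx = rng.integers(0, n_blocks, size=(n_replications, n_blocks))
    return np.sqrt((blocks[idx] ** 2).mean(axis=(1, 2)))

# Synthetic hourly errors for 92 days with AR(1) autocorrelation (assumed 0.9).
data_rng = np.random.default_rng(3)
n_values = 92 * 24
noise = data_rng.normal(size=n_values)
errors = np.empty(n_values)
errors[0] = noise[0]
for t in range(1, n_values):
    errors[t] = 0.9 * errors[t - 1] + noise[t]

spread = {}
for label, length in (("single values", 1), ("1 day", 24),
                      ("2 days", 48), ("4 days", 96)):
    spread[label] = block_bootstrap_rmse(errors, length).std(ddof=1)
    print(f"{label:>13}: bootstrap std of RMSE = {spread[label]:.3f}")
```

With positively autocorrelated data, resampling single values tends to underestimate the spread of the score, which is the motivation for drawing blocks.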
The real world: Dependence of bootstrap standard deviation and
bootstrap confidence intervals on the number of replications
2m-temperature forecasts during Summer 2010 and
10m-wind speed during Winter 2010/2011.
BIASes for different periods, models and weather elements
The real world: Dependence of bootstrap standard deviation and
bootstrap confidence intervals on the number of replications
2m-temperature forecasts during Summer 2010 and
10m-wind speed during Winter 2010/2011.
RMSEs for different periods, weather elements and
types of mean wind direction over Germany (700 hPa)
Quantiles 10% and 90% for different bootstrap types, Period 01.06.2010 – 31.08.2010
COSMO-EU (solid), COSMO-DE (dotted), Element Temperature 2m
Top: Median and quantiles (green: overlapping quantiles, red: non-overlapping quantiles)
Bottom: another visualisation of the overlapping intervals
(bluish: overlapping intervals, deep red: non-overlapping intervals)
Quantiles 10% and 90% for different bootstrap types, Period 01.06.2010 – 31.08.2010
COSMO-EU (solid), COSMO-DE (dotted), Element Wind speed 10m
Top: Median and quantiles (green: overlapping quantiles, red: non-overlapping quantiles)
Bottom: another visualisation of the overlapping intervals
(bluish: overlapping intervals, deep red: non-overlapping intervals)
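The comparison behind these plots can be sketched as follows: for each model the 10% and 90% quantiles of the bootstrapped RMSE are computed, and the two intervals are tested for overlap. This is a minimal illustration assuming NumPy; the synthetic errors of two hypothetical models (model_a, model_b), the function names and the seeds are assumptions made for the example and do not represent COSMO-EU or COSMO-DE data.

```python
import numpy as np

def bootstrap_rmse_interval(errors, n_replications=500,
                            probs=(0.1, 0.9), seed=0):
    """Lower and upper quantile of the RMSE over bootstrap replications."""
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    idx = rng.integers(0, errors.size, size=(n_replications, errors.size))
    rmse = np.sqrt((errors[idx] ** 2).mean(axis=1))
    lower, upper = np.quantile(rmse, probs)
    return lower, upper

def intervals_overlap(interval_a, interval_b):
    """True if the two closed intervals share at least one point."""
    return interval_a[0] <= interval_b[1] and interval_b[0] <= interval_a[1]

# Synthetic errors of two hypothetical models with clearly different spread.
data_rng = np.random.default_rng(7)
errors_model_a = data_rng.normal(scale=1.0, size=500)
errors_model_b = data_rng.normal(scale=3.0, size=500)

interval_a = bootstrap_rmse_interval(errors_model_a, seed=1)
interval_b = bootstrap_rmse_interval(errors_model_b, seed=2)
overlap = intervals_overlap(interval_a, interval_b)
print("model_a:", interval_a, "model_b:", interval_b, "overlap:", overlap)
```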
Comparison of overlapping quantile intervals for different wind directions
NW: north-westerly flow,
SW: south-westerly flow,
NO: north-easterly flow,
SO: south-easterly flow
Some typical situations occurring during
operational verification in 2009, 2010 and 2011:
Modification of the turbulent mixing length in May 2009:
lturb = κ z / (1 + κ z / lturlen), with κ the von Kármán constant
COSMO-EU and COSMO-DE (old): lturlen = 500 m
COSMO-DE (new): lturlen = 150 m
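The effect of the reduced length scale can be illustrated numerically. The sketch below assumes a Blackadar-type formulation lturb = κ z / (1 + κ z / lturlen) with a von Kármán constant κ = 0.4; both the functional form and the value of κ are assumptions made for illustration, as are the chosen heights and the function name turbulent_length.

```python
KAPPA = 0.4  # von Karman constant (assumed value)

def turbulent_length(z, lturlen, kappa=KAPPA):
    """Blackadar-type turbulent length scale in metres (assumed form)."""
    return kappa * z / (1.0 + kappa * z / lturlen)

for z in (10.0, 100.0, 1000.0):
    old = turbulent_length(z, lturlen=500.0)
    new = turbulent_length(z, lturlen=150.0)
    print(f"z = {z:6.0f} m: lturb = {old:6.1f} m (lturlen 500 m), "
          f"{new:6.1f} m (lturlen 150 m)")
```

In this assumed form lturb approaches κ z near the ground and lturlen at large heights, so the smaller value mainly limits the length scale away from the surface.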
Conclusions:
Different ways of grouping the samples lead to
different results concerning the statistical significance
of the model errors.
Block methods give more or less equivalent results.
The results of the comparison of different models
may help users to decide which model should be
used.
The results for different weather types (flow
directions) may give developers some hints
concerning the development of model physics.