Long-range dependence in the North Atlantic Oscillation


Extreme Value Modelling in Climate Science:
Why do it and how it can fail!
Professor David B. Stephenson
U. of Exeter
Aims:
• What the heck do we mean by “extreme”?
• Summary of statistical methods used in climate science
  Statistics for modelling the process rather than for just making indices
• Some examples of extreme value modelling:
  • Problem 1: Properties of drought indices
  • Problem 2: Trends in extreme gridded temperatures
  • Problem 3: Trends in largest annual skew tides
NCAR summer colloquium, 8 June 2011
© 2011 [email protected]
1
Some wet and windy extremes
Convective severe storm
Hurricane
Extra-tropical cyclone
Polar low
Extra-tropical cyclone
2
Some dry and hot extremes
Drought
Dust storm
Dust storm
Wild fire
3
All are complex multivariate spatio-temporal events!
So, to simplify massively, it helps to focus on the time evolution of a
single variable related to the event, e.g. wind speeds of major
extratropical cyclones passing by London, losses to an insurer, etc.
MARKED POINT PROCESS: random times, random marks
4
What do we mean by “extreme”?
Large meteorological values
• Maximum value (i.e. a local extremum)
• Exceedance above a high threshold
• Record breaker (time-varying threshold equal to max of previously observed values)
Rare event in the tail of distribution
(e.g. less than 1 in 100 years – p=0.01)
Large losses (severe or high-impact)
(e.g. $200 billion if hurricane hits Miami)
hazard, vulnerability, and exposure
Risk = V(h(x, t)) e(x, t)
NOTE! Extremeness is not a binary property of an event but an ordering of a process
[Image: Gare Montparnasse, 22 Oct 1895]
Stephenson, D.B. (2008): Chapter 1: Definition, diagnosis, and origin of extreme weather and climate events,
In Climate Extremes and Society, R. Murnane and H. Diaz (Eds), Cambridge University Press, 348 pp.
5
IPCC 2001 definitions
X~N(0,1) Y~N(0.5,1.5)
Simple extremes:
“individual local weather variables
exceeding critical levels on a continuous
scale”
Complex extremes:
“severe weather associated with particular
climatic phenomena, often requiring
a critical combination of variables”
Extreme weather event:
“An extreme weather event is an event
that is rare within its statistical reference
distribution at a particular place.
Definitions of "rare" vary, but an extreme
weather event would normally be as
rare or rarer than the 10th or 90th percentile.”
Extreme climate event:
“an average of a number of weather events over
a certain period of time which is itself extreme
(e.g. rainfall over a season)”
px=rank(x)/(n+1)
6
How might extreme events change?
Changes in location, scale,
and shape all lead to
big changes in the tail of the
distribution.
Some physical arguments
exist for changes in location
and scale.
E.g. multiplicative change
in precipitation due to
increased humidity
(change in scale)
Scale change impacts high quantiles!
Example: Normal variable
A 1% increase in standard deviation s
shifts the 10-year return value (x0.9) by
0.0128s (1.28% of s) and the 200-year return value (x0.995)
by 0.0258s (2.58% of s).
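The normal-variable example can be checked numerically. The sketch below assumes a normal variable whose mean is held fixed, so the p-quantile moves by z_p times the change in standard deviation; the function name is illustrative only.

```python
from scipy.stats import norm

def quantile_shift_from_scale_change(p, relative_change_in_sd):
    """Shift of the p-quantile of a normal variable, in units of the
    original standard deviation s, when s grows by the given fraction
    and the mean is held fixed."""
    return norm.ppf(p) * relative_change_in_sd

# 1% increase in s: 10-year (p=0.9) and 200-year (p=0.995) return values
shift_10yr = quantile_shift_from_scale_change(0.9, 0.01)
shift_200yr = quantile_shift_from_scale_change(0.995, 0.01)
print(f"10-year shift:  {shift_10yr:.4f} s")
print(f"200-year shift: {shift_200yr:.4f} s")
```

The further into the tail the quantile lies, the larger the shift for the same relative change in scale.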
7
How can we relate the tails …
8
to the bulk of the distribution?
Change in scale
PDF =
Or …
Change in shape
Probability Density Function
Probable Dinosaur Function??
9
Quantile attribution
Describe the changes in quantiles in terms of changes in
the location, the scale, and the shape of the parent
distribution:
\[ \Delta X_p = \Delta X_{0.5} + \frac{\Delta \mathrm{IQR}}{\mathrm{IQR}} \, (X_p - X_{0.5}) + \text{shape changes} \]
The quantile shift is the sum of:
• a location effect (shift in median)
• a scale effect (change in IQR)
• a shape effect
Ferro, C.A.T., D.B. Stephenson, and A. Hannachi, 2005: Simple non-parametric techniques
for exploring changing probability distributions of weather, J. Climate, 18, 4344-4354.
Beniston, M. and Stephenson, D.B. (2004): Extreme climatic events and their evolution under
changing climatic conditions, Global and Planetary Change, 44, pp 1-9
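A minimal sketch of the quantile attribution, assuming two samples (a reference period and a later period) and empirical quantiles from NumPy. The function name and the synthetic data are illustrative and are not taken from the cited papers.

```python
import numpy as np

def quantile_attribution(x_ref, x_new, p):
    """Split the change in the p-quantile into location, scale and
    shape effects, using the median and the interquartile range."""
    q25_r, q50_r, q75_r, qp_r = np.quantile(x_ref, [0.25, 0.5, 0.75, p])
    q25_n, q50_n, q75_n, qp_n = np.quantile(x_new, [0.25, 0.5, 0.75, p])
    iqr_ref = q75_r - q25_r
    iqr_new = q75_n - q25_n
    total = qp_n - qp_r
    location = q50_n - q50_r                              # shift in median
    scale = (iqr_new - iqr_ref) / iqr_ref * (qp_r - q50_r)  # change in IQR
    shape = total - location - scale                      # residual
    return {"total": total, "location": location, "scale": scale, "shape": shape}

# Synthetic check: a pure shift and rescaling should leave no shape effect
rng = np.random.default_rng(0)
x_ref = rng.gamma(shape=2.0, size=5000)
x_new = 1.0 + 1.2 * x_ref
parts = quantile_attribution(x_ref, x_new, p=0.9)
print(parts)
```

The shape effect is defined here as the residual, so any change not explained by the median and IQR ends up in that term.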
10
Example: Regional Model Simulations of daily Tmax
T90
ΔT90-Δm
ΔT90 (2071-2100 minus 1971-2000)
ΔT90-Δm-(T90-m) Δs/s
→ Changes in location, scale and shape all important
11
Statistical methods used in climate science
Extreme indices – sample statistics
Basic extreme value modelling
• GEV modelling of block maxima
• GPD modelling of excesses above high threshold
• Point process model of exceedances
More complex EVT models
• Inclusion of explanatory factors (e.g. trend, ENSO, etc.)
• Spatial pooling
• Max stable processes
• Bayesian hierarchical models
• + many more
Other stochastic process models
12
Extreme indices are useful and easy but …
• They don’t always measure extreme values in the tail of the distribution!
• They often confound changes in rate and magnitude
• They strongly depend on threshold and so make model comparison difficult
• They say nothing about extreme behaviour for rarer extreme events at higher thresholds
• They generally don’t involve probability so fail to quantify uncertainty (no inferential model)
More informative approach:
model the extremal process using statistical models whose parameters are then sufficient
to provide complete summaries of all other possible statistics (and can simulate!)
See: Katz, R.W. (2010) “Statistics of Extremes in Climate Change”, Climatic Change, 100, 71-76
13
Furthermore … indices are not METRICS!
One should avoid the word “metric” unless
the statistic has distance properties! Index,
sample/descriptive statistic, or measure is
a more sensible name!
Oxford English Dictionary:
Metric - A binary function of a topological space
which gives, for any two points of the space, a value
equal to the distance between them, or a value
treated as analogous to distance for analysis.
Properties of a metric:
d(x, y) ≥ 0
d(x, y) = 0 if and only if x = y
d(x, y) = d(y, x)
d(x, z) ≤ d(x, y) + d(y, z)
14
Universal Poisson process for extremes
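[Figure: points of the process in the time window from t=t1 to t=t2; N = number of points with Z>z]
For a large number n of independent and identically distributed values and a
sufficiently high threshold z:
\[ N \sim \mathrm{Poisson}(\Lambda), \qquad \Pr(N = n) = \frac{\Lambda^n e^{-\Lambda}}{n!} \]
\[ \Lambda = (t_2 - t_1) \left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \]
→ Miraculous limit theorem for tails of i.i.d. variables!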
15
Probability models for maxima and excesses
In the limit n, z → ∞:
\[ \Pr(N = n) = \frac{\Lambda^n e^{-\Lambda}}{n!}, \qquad \Lambda = (t_2 - t_1) \left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \]
\[ \Rightarrow \Pr\{\max(Z) \le z\} = \Pr\{N(z) = 0\} = e^{-\Lambda} \]
Generalized Extreme Value (GEV) distribution
\[ \Rightarrow \Pr\{Z > z \mid Z > u\} = \frac{\Lambda(z)}{\Lambda(u)} = \left[ 1 + \xi \left( \frac{z - u}{\tilde\sigma} \right) \right]^{-1/\xi}, \qquad \tilde\sigma = \sigma + \xi (u - \mu) \]
Generalized Pareto Distribution (GPD)
Note: extremal properties are
characterised by only three parameters
(for ANY underlying distribution!)
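As a numerical check of these two results, the sketch below computes the Poisson intensity directly and compares it with SciPy's `genextreme` and `genpareto` distributions. The parameter values are arbitrary illustrations. Note that SciPy's GEV shape parameter `c` has the opposite sign to ξ.

```python
import numpy as np
from scipy.stats import genextreme, genpareto

def intensity(z, mu, sigma, xi, duration=1.0):
    """Expected number of points above level z in a window of given duration."""
    return duration * (1.0 + xi * (z - mu) / sigma) ** (-1.0 / xi)

mu, sigma, xi = 10.0, 2.0, 0.1   # illustrative values only
u, z = 12.0, 14.0                # threshold and level of interest

# Maxima: Pr(max <= z) = Pr(no points above z) = exp(-Lambda)
p_max = np.exp(-intensity(z, mu, sigma, xi))
p_gev = genextreme.cdf(z, c=-xi, loc=mu, scale=sigma)

# Excesses: Pr(Z > z | Z > u) = Lambda(z) / Lambda(u)
p_excess = intensity(z, mu, sigma, xi) / intensity(u, mu, sigma, xi)
sigma_u = sigma + xi * (u - mu)  # threshold-dependent GPD scale
p_gpd = genpareto.sf(z - u, c=xi, scale=sigma_u)

print(p_max, p_gev)
print(p_excess, p_gpd)
```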
16
Why use these probability models?
• Model parameters are sufficient for providing a
complete threshold-independent description of
extremal properties. All other statistics of the
extremal process are a function of these three
parameters.
• The models provide a rigorous probability
framework for making inference about extremal
behaviour. Their mathematically justifiable
parametric form allows more precise inference
about tail properties.
• Model can be used to smoothly interpolate between
empirical quantiles/probabilities. Such interpolation
makes efficient use of all the large values.
• Model can be used to extrapolate out carefully to rarer,
less frequently (or never!) observed events AND
provide intervals for such predictions!
17
Problem 1: Do 2 drought series have similar extremal properties?
Observed index
n=90
Reconstructed index
n=5000
Data example kindly
provided by
Eleanor Burke,
Met Office
18
Do 2 drought series have similar extremal properties?
Observed index
n=90
+
Reconstructed index
n=5000
Data example kindly
provided by
Eleanor Burke,
Met Office
19
Return level plots
Empirical quantiles versus empirical return periods:
\[ \left( \frac{1}{1 - i/(n+1)}, \; y_{[i]} \right) \]
Outlier in the extended data set? Slight kink at 2.5 in d1
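The points of such a return level plot can be computed as follows. This is a minimal sketch in which y[i] is taken to be the i-th smallest of n values; the sample data are made up for illustration.

```python
import numpy as np

def empirical_return_levels(y):
    """Return (return_period, return_level) pairs for a return level plot:
    the i-th smallest value is plotted against 1 / (1 - i/(n+1))."""
    levels = np.sort(np.asarray(y, dtype=float))
    n = levels.size
    i = np.arange(1, n + 1)
    periods = 1.0 / (1.0 - i / (n + 1))
    return periods, levels

periods, levels = empirical_return_levels([0.3, 1.2, 0.7, 2.5])
for T, y_i in zip(periods, levels):
    print(f"return period {T:5.2f}  level {y_i:.1f}")
```

Return periods here are in units of the sampling interval of the series; plotting them on a logarithmic axis is a common choice.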
20
Quantile-Quantile plot
Empirical distributions similar except for the big outlier in d1
21
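Modelling the excesses using GPD
\[ F(z) = \Pr\{Z \le z \mid Z > u\} = 1 - \left[ 1 + \xi \left( \frac{z - u}{\tilde\sigma} \right) \right]^{-1/\xi}, \qquad \tilde\sigma = \sigma + \xi (u - \mu) \]
\[ \Rightarrow f(z) = \frac{1}{\tilde\sigma} \left[ 1 + \xi \left( \frac{z - u}{\tilde\sigma} \right) \right]^{-1 - 1/\xi} \]
\[ \Rightarrow E(Z - u \mid Z > u) = \frac{\sigma + \xi (u - \mu)}{1 - \xi} \]
Assumptions:
1. Asymptotic support: lim n, u → ∞
2. Independence of Z_i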
22
Use of mean excess to find a suitable threshold u
[Figure: mean excess plots for the observed index (d0) and the simulated index (d1)]
\[ E(X - u \mid X > u) = \frac{\sigma + \xi u}{1 - \xi} \]
→ GPD implies linear behaviour in mean excess for u from about 0 to 1
→ Try fits with u=0.5 as threshold
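A mean excess curve of this kind can be sketched as below. The data are synthetic GPD draws with assumed parameters (scale 0.6, shape -0.1), not the drought indices shown on the slide. For GPD data the sample mean excess should follow the straight line (σ + ξu)/(1 − ξ).

```python
import numpy as np
from scipy.stats import genpareto

def mean_excess(x, thresholds):
    """Sample mean of (x - u) over the values exceeding each threshold u."""
    x = np.asarray(x, dtype=float)
    result = []
    for u in thresholds:
        excesses = x[x > u] - u
        result.append(excesses.mean() if excesses.size else np.nan)
    return np.array(result)

# Synthetic GPD sample (assumed parameters, for illustration only)
sigma, xi = 0.6, -0.1
rng = np.random.default_rng(1)
sample = genpareto.rvs(c=xi, scale=sigma, size=20000, random_state=rng)

thresholds = np.array([0.0, 0.5, 1.0])
observed = mean_excess(sample, thresholds)
theory = (sigma + xi * thresholds) / (1.0 - xi)
print(np.round(observed, 3))
print(np.round(theory, 3))
```

In practice the curve is drawn for many thresholds, and the lowest threshold above which it looks linear is a candidate for u.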
23
Nested model approach
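\[ f(z) = \frac{1}{\tilde\sigma} \left[ 1 + \xi \left( \frac{z - u}{\tilde\sigma} \right) \right]^{-1 - 1/\xi} \]
Null model H0:
\[ \tilde\sigma = \tilde\sigma_0, \qquad \xi = \xi_0 \]
Contrast model Ha:
\[ \tilde\sigma = \tilde\sigma_0 + \tilde\sigma_1 X_i, \qquad \xi = \xi_0 + \xi_1 X_i \]
\[ X_i = \begin{cases} 0 & \text{obs data} \\ 1 & \text{extended data} \end{cases} \]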
24
Maximum Likelihood Estimates
(standard errors in parentheses)
Model                  No. of params  Akaike Inf. Criterion  \tilde\sigma_0  \tilde\sigma_1  \xi_0           \xi_1
Null                   2              1050.4                 0.631 (0.023)   ---             -0.107 (0.024)  ---
Contrast               4              1053.6                 0.629 (0.023)   0.234 (0.321)   -0.105 (0.024)  -0.299 (0.313)
Contrast with outlier  4              1085.2                 0.599 (0.021)   0.264 (0.321)   -0.042 (0.019)  -0.361 (0.313)
Predicted upper limit: \( u - \tilde\sigma_0 / \xi_0 = 0.5 + 0.631/0.107 = 6.4 \)
Is there a statistically significant difference at 5% level?
• Difference in deviance: 1050.4-1053.6+2*2=0.8; \( D_2 - D_4 \sim \chi^2_2 \), p=0.67
• Parameter estimates: 0.234/0.321=0.729; \( \hat{\tilde\sigma}_1 / \hat{s} \sim t_{n-p} \), p=0.23
  0.299/0.313=0.956; \( \hat\xi_1 / \hat{s} \sim t_{n-p} \), p=0.17
→ No significant difference between the exceedances at 5% level
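The arithmetic behind these tests can be reproduced from the tabulated values. The sketch below assumes AIC = deviance + 2 × (number of parameters), and it uses a one-sided normal approximation in place of the t distribution, which is an assumption that is reasonable only for large n.

```python
from scipy.stats import chi2, norm

# Values read from the table of maximum likelihood estimates
aic_null, k_null = 1050.4, 2
aic_contrast, k_contrast = 1053.6, 4

# Likelihood ratio (deviance) test of the null against the contrast model
deviance_null = aic_null - 2 * k_null
deviance_contrast = aic_contrast - 2 * k_contrast
deviance_diff = deviance_null - deviance_contrast
p_lrt = chi2.sf(deviance_diff, df=k_contrast - k_null)

# Tests on the individual contrast parameters (estimate / standard error)
p_scale = norm.sf(0.234 / 0.321)
p_shape = norm.sf(0.299 / 0.313)

# Upper limit implied by a negative shape parameter: u - sigma0 / xi0
upper_limit = 0.5 - 0.631 / (-0.107)

print(f"deviance difference {deviance_diff:.1f}, p = {p_lrt:.2f}")
print(f"scale contrast p = {p_scale:.2f}, shape contrast p = {p_shape:.2f}")
print(f"predicted upper limit = {upper_limit:.1f}")
```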
25
Model checking: do the quantiles match?
Return period:
\[ T = \frac{1}{\Pr(Z > z)} = \frac{1}{\Pr(Z > z \mid Z > u) \, \Pr(Z > u)} = \frac{1}{(1 - F(z)) \, \Pr(Z > u)} \]
Return value:
\[ z_T = u + \frac{\tilde\sigma}{\xi} \left\{ \left[ T \, \Pr(Z > u) \right]^{\xi} - 1 \right\} \]
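The return value formula and its inverse can be written as a pair of small functions. This is a sketch: the threshold, scale and shape are the null-model estimates quoted earlier, while the exceedance probability `p_u = 0.3` is a hypothetical value chosen only to make the example run.

```python
def gpd_return_value(T, u, sigma_u, xi, p_u):
    """Level exceeded on average once every T observations (xi != 0)."""
    return u + sigma_u / xi * ((T * p_u) ** xi - 1.0)

def gpd_return_period(z, u, sigma_u, xi, p_u):
    """Return period, in number of observations, of a level z above u."""
    tail = (1.0 + xi * (z - u) / sigma_u) ** (-1.0 / xi)
    return 1.0 / (tail * p_u)

u, sigma_u, xi = 0.5, 0.631, -0.107   # null-model estimates
p_u = 0.3                             # hypothetical Pr(Z > u)

z_100 = gpd_return_value(100.0, u, sigma_u, xi, p_u)
T_back = gpd_return_period(z_100, u, sigma_u, xi, p_u)
print(f"100-observation return value: {z_100:.2f}")
print(f"return period recovered from it: {T_back:.1f}")
```

Because the shape estimate is negative, every return value stays below the upper limit u − σ̃/ξ however large T becomes.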


→ No! The null model underestimates the empirical quantiles
26
Model checking: are estimates stable?
[Figure: maximum likelihood parameter estimates plotted against threshold u]
No! Constant up to u=1.7 but then trends for larger values?!
27
Model checking: uniform in time?
Uniform distribution in time and exponential between events
28
Problem 2: Extremes in surface temperature
Coelho, C.A.S., Ferro, C.A.T., Stephenson, D.B. and Steinskog, D.J.
(2008): Methods for exploring spatial and temporal variability of
extreme events in climate data, Journal of Climate, 21, pp 2072-2092
Observed surface temperatures 1870-2005
Monthly mean gridded surface temperature (HadCRUT2v)
• 5 degree resolution
• Summer months only: June July August
• Grid points with >50% missing values and SH are omitted.
Maximum monthly temperatures
[Figure: map of maximum temperature (Celsius), colour scale from 0 to 40]
29
Non-stationarity due to seasonality and long term trends
Example: Grid point in Central Europe (12.5ºE, 47.5ºN)
[Figure: a) monthly temperature (Celsius) by year, 2001 to 2006, showing the long term trend in mean and the 75th quantile threshold (u_{y,m} = 16.2ºC), with the excess (T_{y,m} – u_{y,m}) and the 2003 exceedance marked]
30
GPD scale and shape estimates
[Figure: maps of the GPD scale and shape estimates, including \( e^{\sigma_0} \) and \( e^{\sigma_1} \)]
\[ \Pr(Z > z \mid Z > u) = \left[ 1 + \xi \left( \frac{z - u}{\sigma} \right) \right]^{-1/\xi} \]
\[ \log \sigma = \sigma_0 + \sigma_1 x, \qquad \xi = \xi_0 \]
Scale parameter is large over high-latitude land areas AND shows some
dependence on x=ENSO.
Shape parameter is mainly negative,
suggesting a finite upper temperature.
Spatial pooling has been used to get
more reliable, less noisy shape
estimates
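A GPD with a covariate in the log of the scale parameter can be fitted by direct maximisation of the likelihood. The sketch below is not the authors' code: it uses synthetic excesses, a synthetic covariate `x`, assumed true parameters, and `scipy.optimize.minimize` with the Nelder-Mead method.

```python
import numpy as np
from scipy.optimize import minimize

def gpd_negloglik(params, excess, x):
    """Negative log-likelihood of GPD excesses with
    log(sigma) = sigma0 + sigma1 * x and constant shape xi."""
    sigma0, sigma1, xi = params
    sigma = np.exp(sigma0 + sigma1 * x)
    if abs(xi) < 1e-8:                       # exponential limit as xi -> 0
        return np.sum(np.log(sigma) + excess / sigma)
    arg = xi * excess / sigma
    if np.any(arg <= -1.0):                  # outside the support
        return np.inf
    return np.sum(np.log(sigma) + (1.0 + 1.0 / xi) * np.log1p(arg))

# Synthetic data: covariate x and GPD excesses drawn by inverse CDF
rng = np.random.default_rng(0)
n = 2000
true_sigma0, true_sigma1, true_xi = 0.2, 0.3, -0.1
x = rng.normal(size=n)
sigma_true = np.exp(true_sigma0 + true_sigma1 * x)
uniform = rng.uniform(size=n)
excess = sigma_true / true_xi * (uniform ** (-true_xi) - 1.0)

fit = minimize(gpd_negloglik, x0=[0.1, 0.1, 0.05], args=(excess, x),
               method="Nelder-Mead",
               options={"maxiter": 5000, "maxfev": 5000,
                        "xatol": 1e-6, "fatol": 1e-6})
print("sigma0, sigma1, xi =", np.round(fit.x, 3))
```

Modelling log σ rather than σ keeps the scale positive for any covariate value, which is why the log link is convenient here.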
31
How significant is ENSO on extremes?
→ Null hypothesis of no effect can only be rejected with
confidence over tropical Pacific and Northern Continents
32
Use of covariates in models
“with four parameters I can fit an
elephant and with five I can make him
wiggle his trunk.” - John von Neumann
33
Model can be used to estimate return periods
[Figure: a) August 2003: excesses above 75% threshold (Celsius); b) August 2003: return period for the excess (years)]
→ return period of 133 years for August 2003 event over Europe
34
Spatial pooling
Pool over local grid points but allow for spatial
variation by including local spatial covariates to
reduce bias (bias-variance tradeoff).
For each grid point, estimate 5 GPD
parameters by maximising the following
likelihood over the 8 neighbouring grid points:
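\[ L_{ij} = \prod_{\Delta i = -1}^{1} \; \prod_{\Delta j = -1}^{1} f\left( y_{i+\Delta i,\, j+\Delta j} ;\; \sigma_{i+\Delta i,\, j+\Delta j},\; \xi_{i+\Delta i,\, j+\Delta j} \right) \]
\[ \log \sigma_{i+\Delta i,\, j+\Delta j} = \sigma^{0}_{i,j} + \sigma^{x}_{i,j} (x_{i+\Delta i} - x_i) + \sigma^{y}_{i,j} (y_{j+\Delta j} - y_j) \]
\[ \xi_{i+\Delta i,\, j+\Delta j} = \xi^{0}_{i,j} \]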
No spatial pooling:
Local pooling:
2 parameters from n data values
5 parameters from 9n data values
Coelho et al., 2008:Methods for Exploring Spatial and Temporal
Variability of Extreme Events in Climate Data, J. Climate
35
Teleconnections of extremes
Bivariate measure of extremal dependency (Coles et al., Extremes, 1999):
\[ \bar\chi = \frac{2 \log \Pr(Y > u)}{\log \Pr((X > u) \,\&\, (Y > u))} - 1 \]
[Figure: b) Chi bar (75th quantile), Central Europe; map with colour scale from -0.4 to 0.7]
→ association with extremes in subtropical Atlantic
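An empirical version of this dependence measure can be sketched as below. One assumption is made explicit here: both series are first rank-transformed to a common uniform scale, so that a single threshold (the 0.75 quantile) applies to both. The data are synthetic.

```python
import numpy as np

def chi_bar(x, y, q=0.75):
    """Empirical chi-bar at quantile level q, after transforming both
    series to uniform margins by their ranks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    u_x = (np.argsort(np.argsort(x)) + 1) / (n + 1)
    u_y = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    p_marginal = np.mean(u_y > q)
    p_joint = np.mean((u_x > q) & (u_y > q))
    return 2.0 * np.log(p_marginal) / np.log(p_joint) - 1.0

rng = np.random.default_rng(3)
a = rng.normal(size=20000)
b = rng.normal(size=20000)
chi_bar_identical = chi_bar(a, a)     # perfectly dependent series
chi_bar_independent = chi_bar(a, b)   # independent series
print(round(chi_bar_identical, 3), round(chi_bar_independent, 3))
```

Values near 1 indicate that the extremes of the two series occur together, and values near 0 indicate that they behave as if independent.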
36
Problem 3: Is there a time trend in extreme skew tides?
10 largest skew tides
for each of n=149 years
Is there a time trend in the
extremes?
Dots show largest values
Line is linear fit to the mean
of the 10 values
Data example kindly
provided by
Tom Howard,
Met Office
37
r Largest Order model
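In the limit n, z → ∞:
\[ \Lambda = (t_2 - t_1) \left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi}, \qquad \Pr(N = n) = \frac{\Lambda^n e^{-\Lambda}}{n!} \]
\[ \Rightarrow \Pr\{ {\max}^{(r)}(Z) \le z \} = \Pr\{ N(z) \le r - 1 \} = e^{-\Lambda} \sum_{s=0}^{r-1} \frac{\Lambda^s}{s!} \]
r Largest Order distribution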
Trend model:
\[ \mu = \mu_0 + \mu_1 X, \qquad \sigma = \sigma_0, \qquad \xi = \xi_0 \]
38
Maximum Likelihood Estimates
(standard errors in parentheses)
Model        No. of params  AIC      \mu_0           \mu_1           \sigma_0        \xi_0
Null, r=10   3              -6882.1  0.661 (0.0096)  ---             0.147 (0.0065)  0.033 (0.025)
Null, r=5    3              -2469.2  0.659 (0.0099)  ---             0.146 (0.0068)  0.050 (0.034)
Null, r=1    3              -91.5    0.658 (0.014)   ---             0.146 (0.0099)  0.031 (0.059)
Trend, r=10  4              -6880.4  0.754 (0.0095)  -4.57E-5 (???)  0.147 (0.0042)  0.032 (0.0025)
• Estimates for null model are similar for r=1,5,10
• Estimates get more precise for larger r
• Null model has slightly better AIC than trend model
• Trend model has trouble estimating trend parameter
→ Could either constrain shape=0 and/or pool over more data
39
Model checking: null model r=10
Model slightly underestimates largest r=1 and r=2 quantiles
40
Model checking: trend model r=10
→ Including a time trend does not improve the r=1 and 2 fits
41
Summary
• Sufficiently large values of an independent
identically distributed variable can be described by
a 3-parameter non-homogeneous Poisson process;
• This leads to simple parametric forms for the
distribution of maxima and r-largest values (GEV)
and exceedances above a high threshold (GPD);
• MLE can be used to estimate the parameters (but
estimates are often sensitive to individual values);
• Non-stationarity can be accounted for by making
model parameters systematic functions of
covariates;
• Spatial pooling can be used to obtain more
precise estimates but covariates have to be
included to avoid bias
42
Some outstanding questions …
1. What do extreme indices really tell us about extremes?
2. How best to develop well-specified extreme value models that account for
non-stationarity (non-identical distributions) caused by natural and climate
change processes?
3. How to deal with large sampling uncertainty due to the rarity of events and
shortness of available observational records? Robust estimation in the
presence of outlier events?
4. What can imperfect climate models tell us about real world extremes?
How to bias correct model errors in extremes?
5. How to develop and test well-specified inferential frameworks for prediction
and attribution of real world extremes from multi-model ensembles?
43
References
Stephenson, D.B. (2008): Chapter 1: Definition, diagnosis, and origin of extreme weather and
climate events, In Climate Extremes and Society, R. Murnane and H. Diaz (Eds), Cambridge University Press, 348 pp.
Definitions of what we mean by extreme, rare, severe and high-impact events
Ferro, C.A.T., Stephenson, D.B. and Hannachi, A. (2005): Simple non-parametric techniques for
exploring changing probability distributions of weather, J. Climate, 18, pp 4344-4354
Attribution of changes in extremes to changes in bulk distribution
Beniston, M. and Stephenson, D.B. (2004): Extreme climatic events and their evolution under
changing climatic conditions, Global and Planetary Change, 44, pp 1-9
Time-varying attribution of changes in heat wave extremes to changes in bulk distribution
Coelho, C.A.S., Ferro, C.A.T., Stephenson, D.B. and Steinskog, D.J. (2008): Methods for
exploring spatial and temporal variability of extreme events in climate data, Journal of Climate,
21, pp 2072-2092
GPD fits to gridded data including covariates. Spatial pooling and teleconnection methods.
Antoniadou, A., Besse, P., Fougeres, A.-L., Le Gall, C. and Stephenson, D.B. (2001): L'Oscillation
Atlantique Nord (NAO) et son influence sur le climat européen, Revue de Statistique Appliquée,
XLIX (3), pp 39-60
One of the earliest papers to use climate covariates in EVT fits – NAO effect on CET extremes
Coles, S.: An Introduction to Statistical Modeling of Extreme Values, Springer.
Excellent overview of extreme value theory.
44
There are worse things
than extreme climate …
e.g. extreme ironing!
Thanks for your attention
[email protected]
45
Tubing Boulder Creek on Sunday?
Sunday noon
whitewatertubing.com
See me today if you are
interested.
46
Proposed taxonomy of atmospheric extremes
Rarity
• Rare weather/climate events
Severity
• Rare and Severe events
• Rare and Non-Severe events
Rapidity
• Rare, Severe, Acute events, e.g. hurricane in New England
• Rare, Severe, Chronic events, e.g. European blocking
• Rare, Non-severe, Acute events, e.g. hurricane over the South Atlantic ocean
• Rare, Non-severe, Chronic events, e.g. Atlantic blocking
Acute: Having a rapid onset and following a short but severe course.
Chronic: Lasting for a long period of time or marked by frequent recurrence.
Stephenson, D.B. (2008): Chapter 1: Definition, diagnosis, and origin of extreme weather and climate events,
In Climate Extremes and Society, R. Murnane and H. Diaz (Eds), Cambridge University Press, 348 pp.
47