Transcript part 1
Vector Generalized Additive Models
and applications to extreme value
analysis
Olivier Mestre (1,2)
(1) Météo-France, Ecole Nationale de la Météorologie, Toulouse, France
(2) Université Paul Sabatier, LSP, Toulouse, France
Based on previous studies carried out in collaboration with:
Stéphane Hallegatte (CIRED, Météo-France)
Sébastien Denvil (LMD)
SMOOTHER
« Smoother=tool for summarizing the trend of a response measurement Y
as a function of predictors » (Hastie & Tibshirani)
estimate of the trend that is less variable than Y itself
Smoothing matrix S
Y*=SY
The equivalent degrees of freedom (df) of the smoother S is the trace of S.
This allows comparison with parametric models.
Pointwise standard error bands
$\mathrm{Cov}(Y^*) = V = S S^{T} \sigma^2$; given an estimate of $\sigma^2$, this allows approximate
confidence intervals (fitted values ± 2 × the square root of the diagonal of V)
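The quantities on this slide can be sketched in a few lines. This is a minimal illustration, assuming a plain running-mean smoother (chosen for simplicity, not necessarily a smoother used in the talk) and a noise variance supplied by hand; all function names are illustrative.

```python
import math

def running_mean_matrix(n, half_width):
    """Smoothing matrix S of a running mean: row i averages the points
    within half_width positions of i (window truncated at the ends)."""
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        for j in range(lo, hi):
            S[i][j] = 1.0 / (hi - lo)
    return S

def smooth(S, y):
    """Y* = S Y."""
    return [sum(s * v for s, v in zip(row, y)) for row in S]

def equivalent_df(S):
    """Equivalent degrees of freedom: trace of S."""
    return sum(S[i][i] for i in range(len(S)))

def pointwise_se(S, sigma2):
    """Square root of the diagonal of V = S S^T sigma^2.
    The i-th diagonal entry of S S^T is the sum of squares of row i."""
    return [math.sqrt(sigma2 * sum(v * v for v in row)) for row in S]

# Toy data (made up for illustration)
y = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]
S = running_mean_matrix(len(y), half_width=1)
y_star = smooth(S, y)
df = equivalent_df(S)
se = pointwise_se(S, sigma2=1.0)  # sigma2 = 1.0 is an assumed value
bands = [(f - 2 * s, f + 2 * s) for f, s in zip(y_star, se)]
```

A wider window gives a smaller trace, so fewer equivalent degrees of freedom and narrower bands.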
SCATTERPLOT SMOOTHING EXAMPLE
Data: wind farm production vs numerical windspeed forecasts
SMOOTHING
Problems raised by smoothers
How to average the response values in
each neighborhood?
How large to take the neighborhoods?
Tradeoff between bias and variance of Y*
SMOOTHING: POLYNOMIAL (parametric)
Linear and cubic parametric least squares fits: MODEL DRIVEN
APPROACHES
SMOOTHING: BIN SMOOTHER
In this example, optimum intervals are determined by means of a
regression tree
SMOOTHING: RUNNING LINE
Running line
KERNEL SMOOTHER
Nadaraya-Watson estimator
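A minimal sketch of a kernel smoother of this type: the fit at a target point is a kernel-weighted average of the responses. The Gaussian kernel and the bandwidth values are assumptions made for illustration; the slides do not say which kernel was used.

```python
import math

def nadaraya_watson(x, y, x0, bandwidth):
    """Kernel-weighted average of y at the target point x0,
    using a Gaussian kernel (an assumed choice)."""
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# Toy data (made up): y = x^2 on a small grid
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0]

# Small bandwidth: the fit follows the data closely (low bias, high variance)
fit_small = nadaraya_watson(x, y, 2.0, bandwidth=0.1)
# Large bandwidth: the fit tends to the global mean (high bias, low variance)
fit_large = nadaraya_watson(x, y, 2.0, bandwidth=100.0)
```

The bandwidth plays the role of the neighborhood size, which is where the bias-variance tradeoff is decided.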
SMOOTHING: LOESS
The smooth at the target point is the value of a locally weighted linear fit
(tricube weights)
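A minimal sketch of the local fit at a single target point, assuming distinct x values and a nearest-neighbor span. It omits the robustness iterations of full LOESS implementations; the function names and the default span are illustrative.

```python
import math

def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def loess_point(x, y, x0, span=0.5):
    """Locally weighted linear fit evaluated at x0.
    The neighborhood is the k nearest points, k = ceil(span * n), at least 3.
    Assumes the x values are distinct."""
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))
    h = sorted(abs(xi - x0) for xi in x)[k - 1]  # distance to k-th neighbor
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:  # only one point carries weight: fall back to its value
        return ybar
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    return ybar + (sxy / sxx) * (x0 - xbar)

# Toy check (made-up data): an exactly linear trend is reproduced exactly
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
fit = loess_point(x, y, 4.5, span=0.5)
```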
CUBIC SMOOTHING SPLINES
This smoother is the solution of the following optimization problem:
among all functions f(x) with two continuous derivatives, choose the
one that minimizes the penalized sum of squares
$$\sum_{i=1}^{n} \bigl(Y_i - f(X_i)\bigr)^2 \;+\; \lambda \int_a^b \bigl(f''(x)\bigr)^2 \, dx$$
First term: closeness to the data
Second term: penalization of the curvature of f
It can be shown that the unique solution to this problem is a natural cubic
spline with knots at the unique values $x_i$
The smoothing parameter $\lambda$ can be set by means of cross-validation
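Because the minimizer is linear in Y, the smoothing spline fits the Y*=SY framework of the SMOOTHER slide. A short sketch of that standard result follows; the symbols $\hat{f}$, $K$ and $S_\lambda$ are notation introduced here and do not appear in the slides, and $\lambda$ denotes the smoothing parameter.

```latex
% \hat{f}: vector of fitted values at the X_i (assumed distinct)
% K: penalty matrix depending only on the X_i, such that
%    \int_a^b (f''(x))^2 dx = \hat{f}^T K \hat{f} for a natural cubic spline
\hat{f} = (I + \lambda K)^{-1} Y = S_\lambda Y,
\qquad
\mathrm{df}(\lambda) = \operatorname{tr}(S_\lambda)
```

As $\lambda \to 0$ the fit interpolates the data (df tends to n); as $\lambda \to \infty$ it tends to the least squares line (df tends to 2). Choosing a target df is therefore equivalent to choosing $\lambda$.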
CUBIC SMOOTHING SPLINES
Cubic smoothing splines with equivalent df=5 and 10
Additive models
Gaussian linear model: $E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Gaussian additive model: $E[Y] = S_1(X_1) + S_2(X_2)$
S1, S2 smooth functions of predictors X1, X2, usually LOESS, SPLINE
Estimation of S1, S2 : « Backfitting Algorithm »
PRINCIPLE OF THE BACKFITTING ALGORITHM
Y=S1(X1)+e
estimation S1*
Y-S1*(X1)=S2(X2)+e
estimation S2*
Y-S2*(X2)=S1(X1)+e
estimation S1**
Y-S1**(X1)=S2(X2)+e
estimation S2**
Y-S2**(X2)=S1(X1)+e
estimation S1***
Etc… until convergence
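The alternation above can be sketched as follows. This is a minimal illustration under stated assumptions: the smoother is a Gaussian kernel smoother (the slides mention LOESS or splines), and an intercept plus centering of each smooth is added, as is commonly done to keep the decomposition identifiable. Function names, bandwidth and data are all illustrative.

```python
import math

def kernel_smooth(x, r, bandwidth):
    """Gaussian kernel smooth of r against x, evaluated at each x[i]."""
    out = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        out.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return out

def backfit(x1, x2, y, bandwidth=0.1, tol=1e-8, max_iter=200):
    """Backfitting for Y = alpha + S1(X1) + S2(X2) + e."""
    n = len(y)
    alpha = sum(y) / n              # intercept: mean of the response
    s1, s2 = [0.0] * n, [0.0] * n
    for _ in range(max_iter):
        old = s1 + s2
        # Smooth the partial residuals Y - alpha - S2 against X1
        s1 = kernel_smooth(x1, [yi - alpha - b for yi, b in zip(y, s2)], bandwidth)
        m1 = sum(s1) / n
        s1 = [v - m1 for v in s1]   # center S1
        # Smooth the partial residuals Y - alpha - S1 against X2
        s2 = kernel_smooth(x2, [yi - alpha - a for yi, a in zip(y, s1)], bandwidth)
        m2 = sum(s2) / n
        s2 = [v - m2 for v in s2]   # center S2
        if max(abs(a - b) for a, b in zip(s1 + s2, old)) < tol:
            break                   # estimates no longer change: converged
    return alpha, s1, s2

# Synthetic noise-free data on an 8 x 8 grid (made up for illustration)
grid = [i / 7 for i in range(8)]
x1 = [a for a in grid for _ in grid]
x2 = [b for _ in grid for b in grid]
y = [a ** 2 + math.sin(3 * b) for a, b in zip(x1, x2)]

alpha, s1, s2 = backfit(x1, x2, y)
fitted = [alpha + a + b for a, b in zip(s1, s2)]
```

Each pass only needs a one-dimensional smoother, which is why any of the scatterplot smoothers seen earlier can be plugged in.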
Additive models
One efficient way to perform non-linear regression, but…
Crucial point
SUITABLE ONLY WHEN THERE ARE FEW PREDICTORS
2 or 3 predictors at most
Additive models
Philosophy
DATA-DRIVEN APPROACH RATHER THAN
MODEL-DRIVEN APPROACH
USEFUL AS EXPLORATORY TOOLS
Approximate inference tests are possible, but full inferences are better
assessed by means of parametric models
Generalized Additive models (GAM)
Extension to non-normal dependent variables
Generalized additive models: additive modelling of the natural
parameter of exponential-family distributions (Poisson, binomial, gamma,
Gaussian…).
$g[\mu] = \eta = S_1(X_1) + S_2(X_2)$
Vector Generalized Additive Models (VGAM): one step beyond…
Example 1
Annual number and maximum integrated
intensity (PDI) of hurricane tracks
over the North Atlantic
Number of Hurricanes
Number of Hurricanes in North Atlantic ~ Poisson distribution
Factors influencing the number of hurricanes
GAM applied to number of hurricanes (YEAR,SST,SOI,NAO)
GAM model
$\log(\lambda) = \beta_0 + S_1(\mathrm{SST}) + S_2(\mathrm{SOI})$
PARAMETRIC model
“broken stick model” (with continuity constraint) in SOI, revealed by
GAM analysis
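$$\log(\lambda) = \begin{cases}
\beta_0 + \beta_{SOI}^{(1)}\,\mathrm{SOI} + \beta_{SST}\,\mathrm{SST} & \mathrm{SOI} < K \\
\beta_0 + \beta_{SOI}^{(1)}\,\mathrm{SOI} + \beta_{SOI}^{(2)}\,(\mathrm{SOI} - K) + \beta_{SST}\,\mathrm{SST} & \mathrm{SOI} \ge K
\end{cases}$$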
The best fit is obtained for the SOI breakpoint K=1
log-likelihood=-316.16, to be compared with -318.71 (linearity)
a standard deviance test allows linearity to be rejected (p-value=0.02)
Expectation of the hurricane number is then straightforwardly
computed as a function of SOI and SST
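The deviance test quoted above can be checked from the two log-likelihoods. This is a sketch assuming a chi-square reference distribution with 1 degree of freedom (one extra slope, with the breakpoint K treated as fixed); the slides do not state the degrees of freedom used.

```python
import math

loglik_broken_stick = -316.16   # from the slide
loglik_linear = -318.71         # from the slide

# Deviance (likelihood-ratio) statistic: twice the log-likelihood gain
deviance = 2.0 * (loglik_broken_stick - loglik_linear)

# Upper tail of a chi-square with 1 df (assumed): P(X > d) = erfc(sqrt(d / 2))
p_value = math.erfc(math.sqrt(deviance / 2.0))
```

Under these assumptions the statistic is 5.10 and the p-value is close to 0.02, in line with the value reported on the slide.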
EXPECTATION OF HURRICANE NUMBERS
OBSERVED vs EXPECTED: r=0.6
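A minimal sketch of how the expected number follows from the broken stick model. The coefficient values below are made-up placeholders, not the estimates from the study, and the function name is illustrative; only the breakpoint K=1 comes from the slides.

```python
import math

def expected_hurricanes(soi, sst, b0, b_soi1, b_soi2, b_sst, k=1.0):
    """Poisson mean under the broken stick model:
    log(mean) is linear in SST and piecewise linear in SOI,
    with a change of slope at SOI = k and no jump (continuity)."""
    eta = b0 + b_soi1 * soi + b_sst * sst
    if soi >= k:
        eta += b_soi2 * (soi - k)   # extra slope beyond the breakpoint
    return math.exp(eta)

# Hypothetical coefficients, for illustration only
coef = dict(b0=1.5, b_soi1=0.1, b_soi2=0.2, b_sst=0.5)

mean_below = expected_hurricanes(soi=0.0, sst=0.0, **coef)
mean_above = expected_hurricanes(soi=2.0, sst=0.0, **coef)
```

The hinge term (SOI - K) vanishes at the breakpoint, which is what enforces the continuity constraint.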