Project - MPO581

Regression
Jenna King
MPO581
April 24, 2012
Background
Background – Linear Regression



Linear regression quantifies the relationship between a dependent variable and one or more independent variables.
Simple linear regression is used when there is only one independent variable.
Multiple linear regression is used when there is more than one independent variable.
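As a small illustration of the simple (one-variable) case, the least-squares slope and intercept can be computed directly. This is only a sketch with made-up numbers; the function name and the sample data are hypothetical and are not taken from the project.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(x, y)
print(intercept, slope)
```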
Background – Experiment


ENSO prediction is inhibited by the “spring barrier”: prior to the end of April it is very difficult to predict ENSO for the coming fall and winter.
By June an ENSO signal has usually started to appear, so forecasting is much easier.
In this experiment I will create logistic regression models using 25 years of both March and June SST (sea surface temperature), OLR (outgoing longwave radiation), precip, SLP (sea level pressure), and U-wind (zonal wind) data versus Dec.-Feb. averaged SST data of the same year, and compare the results.
El Niño/La Niña Criteria


For an El Niño event to be officially recognized, there must be a three-month period in which the SST, averaged over 120°W-170°W and 5°S-5°N, is at least 0.5ºC higher than the 30-year average. The La Niña criterion is the same, but with SST at least 0.5ºC lower than the 30-year average.
The data used in this experiment covered 25 years, so that is the period over which the mean was calculated.
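The criteria above can be sketched as a small labelling function. This is an illustration only: the function name, the treatment of an anomaly of exactly 0.5ºC as an event, and the sample temperatures are assumptions, not details taken from the project.

```python
THRESHOLD_C = 0.5  # anomaly threshold from the criteria above, in degrees C


def classify_enso(three_month_mean_sst, long_term_mean_sst, threshold=THRESHOLD_C):
    """Label a season from its area-averaged SST and the long-term mean.

    Assumption: an anomaly exactly equal to the threshold counts as an event.
    """
    anomaly = three_month_mean_sst - long_term_mean_sst
    if anomaly >= threshold:
        return "El Niño"
    if anomaly <= -threshold:
        return "La Niña"
    return "Neither"


# Hypothetical values: a 27.0 C long-term mean and three example seasons
long_term_mean = 27.0
labels = [classify_enso(sst, long_term_mean) for sst in (27.8, 26.3, 27.2)]
print(labels)
```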
Multinomial Logistic Regression



The main reason for using a logistic regression is that it produces probabilities for a binary outcome.
In this case, instead of producing projected SSTs (a continuous value that is theoretically unbounded in the model), a logistic model will produce probabilities for 1 (yes, El Niño will occur) or 0 (no, El Niño will not occur).
In a multinomial logistic regression, more than two outcomes can be examined. In this case, the possible outcomes are El Niño occurrence, La Niña occurrence, and neither.
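The slides do not write out the multinomial formula. One common formulation assigns each outcome its own linear score and converts the scores to probabilities with the softmax function. The sketch below assumes that formulation; the scores are hypothetical numbers chosen only for illustration.

```python
import math


def softmax(scores):
    """Convert one linear score per outcome into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


outcomes = ["El Niño", "La Niña", "Neither"]
scores = [0.2, 1.5, 0.0]  # hypothetical linear scores, one per outcome
probs = softmax(scores)
for outcome, p in zip(outcomes, probs):
    print(f"{outcome}: {p:.4f}")
```

The outcome with the largest score always receives the largest probability, and the three probabilities sum to 1.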
Math!
Probability of predicted outcomes:
\[ \hat{P} = \frac{e^{u}}{1 + e^{u}} \]
where
\[ u = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \]
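A minimal sketch of this calculation: compute the linear predictor u from a set of coefficients, then pass it through the logistic function. The coefficient and predictor values are hypothetical, and the two-branch form of the logistic function is just a numerically safer way of writing e^u / (1 + e^u).

```python
import math


def linear_predictor(betas, xs):
    """u = beta_0 + beta_1 * x_1 + ... + beta_n * x_n."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


def logistic(u):
    """P = e^u / (1 + e^u), evaluated without overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    eu = math.exp(u)
    return eu / (1.0 + eu)


# Hypothetical coefficients (beta_0, beta_1, beta_2) and predictor values
u = linear_predictor([0.5, 2.0, -1.0], [1.0, 3.0])
p = logistic(u)
print(u, p)
```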
Math!
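In our case:
\[ u_i = \beta_0 + \beta_1\,\mathrm{precip}_i + \beta_2\,\mathrm{olr}_i + \beta_3\,\mathrm{sst}_i + \beta_4\,\mathrm{slp}_i + \beta_5\,\mathrm{uwnd}_i \]
We're going to start small and build up to this. Beginning with SST itself, we will add variables in order of the strength (absolute value) of their correlation with SST.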
Variable   Correlation with SST
Precip      0.7537
U-Wind      0.7478
OLR        -0.7400
SLP        -0.3360
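The ordering can be reproduced by ranking the variables by the absolute value of their correlation with SST. The sketch below uses the four values from the table. The `pearson_r` helper shows one standard way such correlations could be computed; it is an assumption, since the slides do not say how the values were obtained.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Correlations with SST, copied from the table above
correlation_with_sst = {"Precip": 0.7537, "U-Wind": 0.7478, "OLR": -0.7400, "SLP": -0.3360}

# Strongest relationship first, regardless of sign
order_of_addition = sorted(
    correlation_with_sst,
    key=lambda name: abs(correlation_with_sst[name]),
    reverse=True,
)
print(order_of_addition)
```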
Experiment 1 – Using SST only


In this experiment I will use only March/June SST data as the independent variable.
Formula for this experiment:
\[ u_i = \beta_0 + \beta_1\,\mathrm{sst}_i \]
Experiment 1 – Using SST only
Event      Average Probability - March   Average Probability - June
La Niña    0.4000                        0.6331
El Niño    0.2457                        0.4685
Neither    0.3711                        0.4250
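The slides do not define "average probability". One plausible reading is the mean probability the model assigns to the observed event, averaged over the years in which that event occurred. The sketch below implements that reading as an assumption; the labels and probabilities are made-up numbers, not the project's data.

```python
def average_probability_by_event(observed_events, predicted_probs):
    """Mean probability given to the observed event, grouped by event.

    Assumed interpretation of the tables' "average probability".
    """
    totals, counts = {}, {}
    for event, probs in zip(observed_events, predicted_probs):
        totals[event] = totals.get(event, 0.0) + probs[event]
        counts[event] = counts.get(event, 0) + 1
    return {event: totals[event] / counts[event] for event in totals}


# Hypothetical observed events and model output for four years
observed = ["La Niña", "La Niña", "El Niño", "Neither"]
predicted = [
    {"La Niña": 0.6, "El Niño": 0.1, "Neither": 0.3},
    {"La Niña": 0.4, "El Niño": 0.2, "Neither": 0.4},
    {"La Niña": 0.2, "El Niño": 0.5, "Neither": 0.3},
    {"La Niña": 0.3, "El Niño": 0.3, "Neither": 0.4},
]
averages = average_probability_by_event(observed, predicted)
print(averages)
```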
Experiment 2 – Using SST and Precip


In this experiment I will use only March/June SST and precip data as the independent variables.
Formula for this experiment:
\[ u_i = \beta_0 + \beta_1\,\mathrm{sst}_i + \beta_2\,\mathrm{precip}_i \]
Experiment 2 – Using SST and Precip
Event      Average Probability - March   Average Probability - June
La Niña    0.4047                        0.6753
El Niño    0.2957                        0.4866
Neither    0.3896                        0.4530
Slight improvements over experiment 1!
Experiment 3 – Using SST, Precip, and U-Wind


In this experiment I will use only March/June SST, precip, and U-wind data as the independent variables.
Formula for this experiment:
\[ u_i = \beta_0 + \beta_1\,\mathrm{sst}_i + \beta_2\,\mathrm{precip}_i + \beta_3\,\mathrm{uwnd}_i \]
Experiment 3 – Using SST, Precip, and U-Wind
Event      Average Probability - March   Average Probability - June
La Niña    0.4738                        0.6796
El Niño    0.2821*                       0.5067
Neither    0.4406                        0.4498*
* = lower than in previous experiment
Experiment 4 – Using SST, Precip, U-Wind, and OLR


In this experiment I will use only March/June SST, precip, U-wind, and OLR data as the independent variables.
Formula for this experiment:
\[ u_i = \beta_0 + \beta_1\,\mathrm{sst}_i + \beta_2\,\mathrm{precip}_i + \beta_3\,\mathrm{uwnd}_i + \beta_4\,\mathrm{olr}_i \]
Experiment 4 – Using SST, Precip, U-Wind, and OLR
Event      Average Probability - March   Average Probability - June
La Niña    0.4790                        0.8237
El Niño    0.3052                        0.6488
Neither    0.4397*                       0.6446
* = lower than in previous experiment
Huge improvement in June predictability!
Experiment 5 – Using all five variables


In this experiment I will use March/June SST, precip, U-wind, OLR, and SLP data as the independent variables.
Formula for this experiment:
\[ u_i = \beta_0 + \beta_1\,\mathrm{sst}_i + \beta_2\,\mathrm{precip}_i + \beta_3\,\mathrm{uwnd}_i + \beta_4\,\mathrm{olr}_i + \beta_5\,\mathrm{slp}_i \]
Experiment 5 – Using all five variables
Event      Average Probability - March   Average Probability - June
La Niña    0.4797                        0.8545
El Niño    0.3094                        0.7547
Neither    0.4422                        0.7178
Another large improvement in June predictability
Comparison
Average probability by event:
Event      SST only - March   SST only - June   All five variables - March   All five variables - June
La Niña    0.4000             0.6331            0.4797                       0.8545
El Niño    0.2457             0.4685            0.3094                       0.7547
Neither    0.3711             0.4250            0.4422                       0.7178
Definite improvement in predictions using both
March and June data
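The size of the improvement can be read off by subtracting the SST-only values from the five-variable values. The short sketch below does that arithmetic with the numbers from the comparison table; the variable names are illustrative.

```python
# (March, June) average probabilities copied from the comparison table
sst_only = {"La Niña": (0.4000, 0.6331), "El Niño": (0.2457, 0.4685), "Neither": (0.3711, 0.4250)}
all_five = {"La Niña": (0.4797, 0.8545), "El Niño": (0.3094, 0.7547), "Neither": (0.4422, 0.7178)}

# Gain from adding the four extra variables, rounded to the table's precision
gains = {
    event: (
        round(all_five[event][0] - sst_only[event][0], 4),
        round(all_five[event][1] - sst_only[event][1], 4),
    )
    for event in sst_only
}
for event, (march_gain, june_gain) in gains.items():
    print(f"{event}: March +{march_gain:.4f}, June +{june_gain:.4f}")
```

For every event the June gain (between about 0.22 and 0.29) is several times larger than the March gain (between about 0.06 and 0.08).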
Experiment 6: Using the model



In experiments 1-5 we produced probabilities using the same data that were used to create the model.
How does the model perform on other data?
To find out, the model created in experiment 5 (using all five variables) is applied to two additional years.
Experiment 6: Results
Average probability by event:
Event      Year 1 (2007) - March   Year 1 (2007) - June   Year 2 (2008) - March   Year 2 (2008) - June
La Niña    0.1965                  0.9534                 0.5081                  0.9940
El Niño    0.3179                  0.0465                 0.0824                  0.0000
Neither    0.4855                  0.0001                 0.4095                  0.0060
Most probable outcome:
Year 1 March: Neither, Year 1 June: La Niña
Year 2 March: La Niña, Year 2 June: La Niña
Experiment 6: Results

In reality, neither of these years featured an El
Niño or a La Niña event.
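The comparison between the model output and what actually happened can be sketched by taking the most probable outcome for each year and month and counting matches. The probabilities below are copied from the Experiment 6 table, and the observed outcome ("Neither" for both years) is the one stated in the slides; the variable names are illustrative.

```python
# Probabilities from the Experiment 6 table, keyed by (year, month of input data)
predictions = {
    ("2007", "March"): {"La Niña": 0.1965, "El Niño": 0.3179, "Neither": 0.4855},
    ("2007", "June"): {"La Niña": 0.9534, "El Niño": 0.0465, "Neither": 0.0001},
    ("2008", "March"): {"La Niña": 0.5081, "El Niño": 0.0824, "Neither": 0.4095},
    ("2008", "June"): {"La Niña": 0.9940, "El Niño": 0.0000, "Neither": 0.0060},
}
observed = "Neither"  # per the slides, neither year had an El Niño or La Niña event

# Most probable outcome for each case
most_probable = {case: max(probs, key=probs.get) for case, probs in predictions.items()}
n_correct = sum(label == observed for label in most_probable.values())

for (year, month), label in most_probable.items():
    print(f"{year} {month}: {label}")
print(f"{n_correct} of {len(most_probable)} cases match the observed outcome")
```

Only the 2007 March case matches the observed outcome, and both June models assign more than 95% probability to a La Niña that did not occur.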
Conclusions



The multinomial logistic regression models created in experiments 1-5 performed as expected: the March models had much less success in predicting ENSO than the June models.
In most cases, prediction accuracy increased when more variables were included.
The models performed badly when applied to 2007 and 2008 data, but two years is too small a sample to draw any firm conclusions.
Questions?