Practical session on neural network modelling

Antonello Pasini, Rocco Langone
CNR - Institute of Atmospheric Pollution
Rome, Italy
Attribution studies
Attribution investigations aim at identifying the influence of some forcings on a particular meteo-climatic variable, i.e. at explaining the different weights that several causes have on a single effect. In this field, neural networks have already been used successfully in global studies, as we showed in the morning lesson.
Today we will deal with data used in a master’s thesis, in which our target was twofold: i) to understand which global atmospheric circulation patterns most influence the temperatures observed in the SW Alpine region over the last 50 years; ii) to reconstruct the observed temperatures from these patterns.
The neural model
The models generally used in “attribution” research are layered, feed-forward neural networks, trained by the “error back propagation” method, using an optimization algorithm to minimize an Error Function:
g(a) = [1 + exp(-2ka)]^(-1)    (the “logsig” transfer function, with steepness k)

Error Function: let F be a function to minimize (in our case, the mean square error between network outputs and targets).

Gradient descent: Δw = -η ∂F/∂w, where η is the learning rate.

Gradient descent with momentum: Δw(t) = -η ∂F/∂w + α Δw(t-1), where α is the momentum coefficient.
For example, in the case of our neural network, the weight-update equations become (excluding the momentum term and the thresholds of the neurons):

Δw_ij = η δ_j x_i

where x_i is the input carried by the connection and δ_j is the error term of neuron j, obtained by back-propagating the output error through the derivative of the logsig function, g'(a) = 2k g(a)[1 - g(a)].
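As an illustration of these updates, here is a minimal sketch in Python (not the NEWNET code): a 4-4-1 feed-forward net with logsig hidden units and a linear output unit, trained by plain gradient descent; the layer sizes, learning rate and steepness below are assumptions chosen only for the example.

```python
import numpy as np

def logsig(a, k=1.0):
    """The "logsig" transfer function g(a) = [1 + exp(-2ka)]^(-1)."""
    return 1.0 / (1.0 + np.exp(-2.0 * k * a))

def train_step(x, t, W1, W2, eta=0.01, k=1.0):
    """One back-propagation update for a single pattern x with target t
    (no momentum term, no neuron thresholds, as in the text above)."""
    h = logsig(W1 @ x, k)                       # forward pass: hidden layer
    y = W2 @ h                                  # forward pass: linear output
    delta_out = t - y                           # output error term
    delta_hid = 2.0 * k * h * (1.0 - h) * (W2.T @ delta_out)  # back-propagated term
    W2 += eta * np.outer(delta_out, h)          # update = eta * delta * (input of the layer)
    W1 += eta * np.outer(delta_hid, x)
    return 0.5 * float(np.sum((t - y) ** 2))    # squared error for this pattern

# usage sketch: rng = np.random.default_rng(0)
# W1 = rng.uniform(-0.5, 0.5, (4, 4)); W2 = rng.uniform(-0.5, 0.5, (1, 4))
```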
The All Frame procedure
Our model is endowed with a particular kind of training procedure called “all-frame” or “leave-one-out” cross-validation. It is a cycle in which, at every step, all the data but one (the “hole”) constitute the training set and the remaining pattern is the validation/test set.
Training stops in one of two ways:
• by fixing the number of epochs (i.e. the output cycles);
• by early stopping (if the error function on the training set falls below a threshold).
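A minimal sketch of the all-frame cycle in Python, assuming hypothetical train(...) and predict(...) helpers (NEWNET performs this loop internally):

```python
import numpy as np

def all_frame(X, T, train, predict):
    """Leave-one-out cycle: at every step all patterns but one (the "hole")
    form the training set and the hole is the validation/test pattern."""
    n = len(X)
    reconstructed = np.empty(n)
    for hole in range(n):
        keep = np.arange(n) != hole                    # all the data but one
        model = train(X[keep], T[keep])                # train on the rest
        reconstructed[hole] = predict(model, X[hole])  # predict the hole
    return reconstructed                               # one value per pattern
```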
Ensemble runs
The back-propagation learning method is local, and performance depends in part on the values of the initial random weights, because convergence to the absolute minimum of the error function is not assured.
In this situation we have to assess the variability of performance by choosing several random values for the weights. This is done automatically by our model at every run. It allows us to perform several runs (ensemble runs) in order to explore the landscape of the error function widely. This is particularly useful when the error function presents several local minima… in which we can be “trapped”.
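A sketch of such ensemble runs, where run_once is a hypothetical helper that trains the network from fresh random initial weights and returns its performance (e.g. the Pearson R of that run):

```python
import numpy as np

def ensemble_runs(run_once, n_runs=10, first_seed=0):
    """Repeat the training n_runs times with different random initial weights
    and summarize the spread of performance across the ensemble."""
    scores = [run_once(np.random.default_rng(first_seed + i)) for i in range(n_runs)]
    return float(np.mean(scores)), float(np.std(scores))  # centre and error bar
```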
Some words about NEWNET
With the executable “NEWNET1.exe” you can create and train a neural network in two ways:
1) all the parameters are set in a file (of the kind “.par”);
2) step by step, scrolling a menu…
We will use, for convenience, the first way…
The parameters file and the data file must have the same filename (with different extensions, respectively “.par” and “.txt”), and they have to be in the same folder as NEWNET.
Once you have all the files in your folder, you are ready to launch the executable file:
1) set “l” (load files) and write the filename without extension;
2) set “a” to enter the training menu;
3) once you are in the training menu, set “f” (all-frame method), set the number of patterns of the whole dataset (in our case 49), and push a key to continue;
4) the file with the results will appear in your folder (with the same name as the other files and the extension “.taf”). If the file already exists, you can overwrite it by choosing “s” (yes) or not (by setting “n”). In the latter case you have to write a new filename without extension.
The parameter file *.par
NL: number of layers
LP: total number of data patterns
TP: number of test patterns, a subset of LP (used when you are not training with the “all-frame” procedure)
RR: variability range of the starting random weights
T: initial threshold of all neurons
ETA: learning rate
MOM: momentum
MSS: threshold for early stopping (training stops when the MSS of the training set becomes smaller than this threshold)
CY: maximum number of epochs (output cycles) to perform
UPL: number of units per layer
AF: kind of activation (transfer) function (1 = linear, 2 = tansig, 3 = logsig)
ST: steepness
DMS: minimum variation of MSS between two consecutive epochs, below which training stops, because of a probable “flat plateau”.
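Purely as an illustration of how these entries relate to one another (the real .par syntax is NEWNET’s own), a small consistency check over a parameter set stored as a Python dictionary could look like this:

```python
def check_params(p):
    """Sanity checks on a parameter set p, keyed by the names listed above."""
    assert p["NL"] == len(p["UPL"]), "UPL must give one unit count per layer (NL)"
    assert 0 <= p["TP"] <= p["LP"], "the TP test patterns are a subset of the LP patterns"
    assert p["AF"] in (1, 2, 3), "AF: 1 = linear, 2 = tansig, 3 = logsig"
    assert p["ETA"] > 0, "ETA is the learning rate"
    assert 0 <= p["MOM"] < 1, "MOM is the momentum coefficient (assumed to lie in [0, 1))"
    assert p["CY"] > 0 and p["MSS"] > 0 and p["DMS"] >= 0, "stopping criteria"
```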
Task 1
We give you 3 data sets, built on 3 different mixes of 4 circulation patterns (indices). We have set all the parameters (in the .par file) in a way that allows an optimized reconstruction of temperatures (no overfitting, no premature stopping, a good number of hidden neurons, etc.).
Question: which choice of inputs determines the best results in terms of the Pearson
coefficient R between observed and reconstructed temperatures?
Please, for each data set, consider 10 ensemble runs with random initial weights, just to explore the variability of the results. Build an error bar for each file and display the results in a graph.
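A sketch of this analysis in Python (the way the runs are collected is a placeholder; only the Pearson R and the error-bar computation matter here):

```python
import numpy as np
import matplotlib.pyplot as plt

def pearson_r(observed, reconstructed):
    """Pearson correlation coefficient R between observed and reconstructed temperatures."""
    return float(np.corrcoef(observed, reconstructed)[0, 1])

def error_bar(observed, runs):
    """runs: the 10 reconstructed series (one per ensemble run) for one data set."""
    rs = [pearson_r(observed, rec) for rec in runs]
    return float(np.mean(rs)), float(np.std(rs))

# e.g., with the (mean, std) pairs of the three data sets:
# plt.errorbar([1, 2, 3], means, yerr=stds, fmt="o"); plt.ylabel("Pearson R"); plt.show()
```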
Example
Input = 4 atmospheric teleconnection pattern indices (AO, EA, EBI, SCAN)
Target = mean temperature of extended winter
In this case you can get an interesting result by creating a 4-4-1 net trained with
these parameters:
[Table from the original slide: Pearson R of the extended-winter reconstruction for three input combinations — AO, EAWR, ABI, ENSO; AO, EA, EBI, SCAN; NAO, EA, AO, ABI — the R values are not reproduced in this transcript.]
This is the best result we obtained. Did you do any better?
The correlation coefficient between output and target, in the best single run, is about 0.75…
Task 2
Once you have determined the best combination of inputs for correctly reconstructing the temperatures of extended winter, try to evaluate convergence or overfitting problems with this combination, by changing the number of epochs in the .par file.
You can evaluate which choice is the optimal one and clearly see the non-convergence and overfitting problems.
Please, make a graph of performance (in terms of R) for several values of epochs.
Then you can change the number of hidden neurons and see an example of overfitting due to too many neurons.
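One possible way to organize this sweep in Python, where run_ensemble is a hypothetical helper returning the mean and standard deviation of R over the ensemble runs for a given number of epochs and of hidden neurons:

```python
import matplotlib.pyplot as plt

def epoch_sweep(run_ensemble, epoch_values, hidden_neurons=4):
    """Plot R against the number of epochs (CY) for a fixed hidden-layer size."""
    means, stds = zip(*(run_ensemble(cy, hidden_neurons) for cy in epoch_values))
    plt.errorbar(epoch_values, means, yerr=stds, fmt="o-")
    plt.xlabel("epochs (CY)")
    plt.ylabel("Pearson R")
    plt.show()

# the same loop, varied over hidden_neurons at a fixed number of epochs,
# exposes the overfitting due to too many neurons
```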
[Result 1 and Result 2: plots of the results, not reproduced in this transcript.]