Outline - Lecture 1 - Chiang Mai University

Download Report

Transcript Outline - Lecture 1 - Chiang Mai University

EE749
Neural Networks
How to design a good performance NN?
Kasin Prakobwaitayakit
Department of Electrical Engineering
Chiangmai University
Glossary
•
Pattern: a complete set of data inputs and
outputs that provide a snapshot of the system
being modelled
–
•
•
also called examples, cases
Feature: an identifying characteristic in the
data that the model will ideally capture
Domain: a set of boundaries that define the
range of expected / observed data for a
particular model or problem
Backpropagation Modelling Heuristics
•
•
•
•
•
•
•
•
•
•
•
•
•
Selection of model inputs and outputs
Defining the model domain
Data pre-processing
Selection of training/testing cases
Data scaling
Number of hidden layers
Number of neurons
Activation function selection
Initial weight values
Learning rate and momentum
Presentation of patterns
Stopping criteria for training
Improving model performance
Selection of Model Inputs and Outputs
•
Start with a set of inputs that are KNOWN to
affect the process
–
•
•
•
then add other inputs that are suspected of
having a relationship in the process one at a time
Eliminate input variables that are redundant
(high covariance)
Eliminate data patterns that do not contribute
to training (no new information)
Identify / eliminate data patterns with the
same inputs that have different outputs
Defining the Model Domain
•
ANNs should be confined to a limited
domain
–
•
develop separate models for contradictory areas
of the domain
Build models that predict a single output
–
link multiple models together, if required
Data Pre-processing
•
•
•
Any time the dynamic range of an input is
over a few orders of magnitude, a
logarithmic transformation should be applied
Transformations may also be useful in
“reconditioning” input data when an ANN
has trouble converging
Non-numerical inputs need to be codified
Selection of Training/Testing Cases
•
•
Training cases should be representative of
the problem domain
For good generalization capacity, the
training set must be complete
–
•
important variable must be measured
Data set is randomly divided into training
and testing sets
–
in a 70:30 ratio, for example
•
Rules of thumb
–
–
The number of training patterns should be at
least 5 times the number of nodes in the network
The number of training cases should be roughly
= the number of weights*the inverse of the
accuracy parameter (e, e=0.9 means 90%
accuracy of prediction is required)
Data Scaling
•
Output variables should be scaled in the 0.1
to 0.9 range
–
avoids operating in the saturation range of the
sigmoid function
Number of Hidden Layers
•
•
Increasing the number of hidden layers
increases both the time and number of
patterns (examples) required for training
For most problems, one hidden layer will
suffice
–
•
problems can always be solved with two
Multiple slabs (Ward Nets) supposedly
increase processing power
–
each slab (group of neurons) acts as a detector
for one or more input features
Number of Neurons
•
•
•
One neuron in the input layer for each input
One neuron in the output layer for each
output
Proper number of hidden neurons is often
determined experimentally
–
–
•
too few = poor ability to capture features
too many = poor ability to generalize (ANN
simply memorizes training data)
Various rules of thumb have been reported
–
–
0.75*N
2*N +1
N = number of inputs
•
•
•
In general, more weights introduce more
local minima to the error surface
Flat regions in the error surface can mislead
gradient-based error minimization methods
(backpropagation)
Start with a small network and then add
connections as needed
–
•
avoids convergence problems as the networks
get too large
The optimum ratio of hidden neurons in the
first to second hidden layer is 3:1
Activation Function Selection
•
•
Sigmoidal (logistic) activation functions are
the most widely used
Thresholding functions are only useful for
categorical outputs
Initial Weight Values
•
Weights need to be randomized initially
–
•
if all weights are set to the same number, the
GDR would never be able to leave the starting
point
The backpropagation algorithm may also
have difficulty if the connection weights are
prematurely saturated (>0.9)
Learning Rate and Momentum
•
Learning rate () affects the speed of
convergence.
–
–
•
If it is large (>0.5), the weights will be changed
more drastically, but this may cause the
optimum to be overshot
If it is small (<0.2) the weights will be changed
in smaller increments, thus causing the system to
converge more slowly with little oscillation
The best training occurs when the learning
rate is as large as possible without leading to
oscillation
•
•
The learning rate can be increased as
learning progresses, or a momentum term
added to improve network learning speed
The momentum factor () has the ability to
dampen out oscillations caused by increasing
the learning rate
–
a momentum of 0.9 allows higher learning rates
Presenting Patterns to the ANN
•
•
Present patterns in a random fashion
If input patterns can be easily classified, do
not train the ANN on all patterns in a class
in succession
–
•
the ANN will forget information as it moves
from class to class
Shaping can be used to improve network
training
–
involves starting off on a very small cohesive
data set and then adding more patterns that have
greater deviations from variable means as
training progresses
Stopping Criteria for Training
•
Training should be stopped when one of the
following conditions is met
–
–
–
the testing error is sufficiently small
the testing error begins to increase
a set number of iterations has passed
Improving Model Performance
•
Re-initialize network weights to a new set of
random values
–
•
•
•
•
•
re-train the model
Adjust learning rate and momentum
Modify stopping criteria
Prune network weights
Use genetic algorithms to adjust network
topology
Add noise to training cases to decrease the
chance of memorization
ANN Modelling Approach
ASSESS NEEDS AND
SUITABIL ITY
SELECT IN
OUT
COL LECT AND
ANALYZE DATA
SELE
ORGAN
PAT T
S
ARCHIT
CHARAC
APPLY MODEL
DEVEL OPMENT
Needs and Suitability Assessment
• What are the modelling needs ?
• Is the ANN technique suitable to meet these
needs ?
• Can the following be met ?
–
–
–
–
data requirements
software requirements
hardware requirements
personnel requirements
Data Collection and Analysis
• Successful models require careful attention
to data details
• Recall : relevant historical data is a key
requirement
• For data collection, investigate:
– data availability
• parameters, time-frame, frequency, format
– QA/QC protocols
• data reliability
– process changes
• Data requirements and guidelines:
– data for each of the parameters must be
available
– at least one full cycle of data must be available
– appropriate QA/QC protocols must be in place
– data collected prior to major process changes
should generally not be used
• Data analysis involves:
– data characterization
– a complete statistical analysis
• Data characterization for each parameter
– qualitative assessment of hourly, daily, seasonal
trends (graphical examination of data)
– time-series analyses may be warranted
• Statistical analysis for each parameter
– measures of central tendency
• mean, median, mode
– measures of variability
• standard deviation, variance
– percentile analyses
– identification of outliers, erroneous entries,
non-entries
Application of a Model-Building
Protocol
• There is no accepted “best method” of
developing ANN models
• An infinite number of distinct architectures
are possible
• A protocol is to reduce the number of
architectures that are evaluated
• A sample five-step protocol:
ASSESS NEEDS AND
SUITABIL ITY
SELECT INPUTS AND
OUT PUT S
COL LECT AND
ANALYZE DATA
SELECT AND
ORGANIZ E DATA
PAT TERNS
SET
ARCHIT ECT URE
CHARACTERIST ICS
APPLY MODEL
DEVEL OPMENT
PROTOCOL
• Selection of model inputs and outputs
Why it’s important:
– ANN models are based on process inputs and
outputs
How it’s done:
– first, select the model output
• best models only have one output parameter
– next, select model inputs from available input
parameters
– selection is based on data availability, literature,
expert knowledge
• Selection and organization of data patterns
Why it’s important:
– ANN models are only as good as the data used
– separate independent data sets are required to
test and validate the model
How it’s done:
– examine each data pattern for erroneous entries,
outliers, blank entries
– delete questionable data patterns
– sort and divide data into training, testing, and
production sets
– perform a statistical analysis on each of the
three data sets
• Determination of architecture characteristics
Why it’s important:
– each modeling scenario will have an optimal
architecture
How it’s done:
– initially, hold many factors at software defaults
or at pre-determined values
• use modelling heuristics from literature along with
expert knowledge
– determine the number of hidden layer neurons
• compare results for different runs
• Evaluation of model stability
Why it’s important:
– ensure that model results are independent of the
method of data sorting
How it’s done:
– build new training, testing, and production sets
from the original database
– re-train models on the “new” data sets
– compare results with initial runs
• Model fine tuning
Why it’s important:
– some models require minor improvements in
order to meet process operating criteria
How it’s done:
– modeling parameters previously held constant
can be varied to improve model results
– the model fine-tuning methodology is typically
researcher-specific
Evaluating Model Performance
• In many situations, more than one good
model can be developed
• The best model is the one that both
– meets the modelling needs initially identified
– offers the smallest prediction errors
• Therefore need to be able to evaluate model
performance
• Prediction errors can be assessed
– graphically
• visual representation of missed predictions
6
5
4
3
2
1
Pattern #
Actual
Network
181
171
161
151
141
131
121
111
101
91
81
71
61
51
41
31
21
11
0
1
Clarifier Effluent Colour (TCU)
7
– using statistics
• absolute measures of error
– mean absolute error (MAE)
– maximum absolute error
• relative measures of error
– mean absolute percent error (MAPE)
• coefficients of correlation
– coefficient of correlation (r)
– coefficient of multiple correlation (R)
• coefficients of determination
– coefficient of determination (r2)
– coefficient of multiple determination (R2)
• Model residuals should also be studied
– residual = prediction error
– residuals should be
• Normally distributed
– plot a histogram of the residuals
• have a mean = 0
• independent
– plot the residuals (in time order if applicable)
– plot should be free of obvious trends
• have constant variance
– plot the residuals against the predicted values
– plot should not show spreading, converging, or
other trends
Model Evaluation Using Real-time
Data
• Need to consider:
– changes in the frequency of data collection
– changes in the methodology of data collection
– existence of QA/QC protocols to detect
erroneous data
• The evaluation can take the form of
– simulated real-time testing on a stand-alone PC
– online testing in real-time
• Methodology
–
–
–
–
select the time frame of the test
the data is ported to the developed models
process each data pattern and record the results
the model predicted values are compared to the
actual values
– prediction errors are determined