Neural Networks - University of Southern Mississippi

Data Mining:
Concepts and Techniques
— Chapter 6 —
Chapter 6. Classification and Prediction
• What is classification? What is prediction?
• Issues regarding classification and prediction
• Classification by decision tree induction
• Bayesian classification
• Rule-based classification
• Classification by backpropagation
• Support Vector Machines (SVM)
• Lazy learners (or learning from your neighbors)
• Frequent-pattern-based classification
• Other classification methods
• Prediction
• Accuracy and error measures
• Ensemble methods
• Model selection
• Summary
What are Neural Networks?
• Models of the brain and nervous system
• Highly parallel
– Process information much more like the brain than a serial
computer
• Learning
• Very simple principles
• Very complex behaviours
• Applications
– As powerful problem solvers
– As biological models
Biological Neural Nets
• Pigeons as art experts (Watanabe et al. 1995)
– Experiment:
• Pigeon in Skinner box
• Present paintings of two different artists (e.g. Chagall / Van
Gogh)
• Reward for pecking when presented a particular artist (e.g. Van
Gogh)
• Pigeons were able to discriminate between Van
Gogh and Chagall with 95% accuracy (when
presented with pictures they had been trained on)
• Discrimination still 85% successful for previously
unseen paintings of the artists
• Pigeons do not simply memorise the pictures
• They can extract and recognise patterns (the ‘style’)
• They generalise from the already seen to make
predictions
• This is what neural networks (biological and artificial) are good at (unlike a conventional computer)
ANNs – The basics
• ANNs incorporate the two fundamental components
of biological neural nets:
1. Neurones (nodes)
2. Synapses (weights)
• Neurone vs. node
• Structure of a node: a squashing function limits node output
• Synapse vs. weight
Feed-forward nets
• Information flow is unidirectional
• Data is presented to the input layer
• Passed on to the hidden layer
• Passed on to the output layer
• Information is distributed
• Information processing is parallel
• Internal representation (interpretation) of data
• Feeding data through the net:
    net = \sum_{i=0}^{n} w_i x_i
    o = \sigma(net) = \frac{1}{1 + e^{-net}}
  Example: net = (1 \times 0.25) + (0.5 \times (-1.5)) = 0.25 + (-0.75) = -0.5
  Squashing: o = \frac{1}{1 + e^{0.5}} \approx 0.3775
  (a code sketch of this computation follows below)
• Data is presented to the network in the form of
activations in the input layer
• Data usually requires preprocessing
– Analogous to senses in biology
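A minimal code sketch of the forward pass through a single node, reproducing the slide's worked numbers; the function name is my own, not from the slides.

    import math

    def node_output(weights, inputs):
        # net = sum_i w_i * x_i, then squash with the sigmoid o = 1/(1 + e^-net)
        net = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-net))

    # The slide's example: inputs (1, 0.5) with weights (0.25, -1.5)
    # net = (1 * 0.25) + (0.5 * -1.5) = -0.5, and sigma(-0.5) ~ 0.3775
    print(node_output([0.25, -1.5], [1.0, 0.5]))   # 0.3775...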
Defining a Network Topology
• First decide the network topology:
– # of units in the input layer,
– # of hidden layers (if > 1),
– # of units in each hidden layer,
– and # of units in the output layer
• Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0] (sketched below)
• For discrete attributes, one input unit per domain value (an encoding sketch follows the table below)
• For classification with more than two classes, one output unit per class is used
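A minimal sketch of the min-max normalization step above; the function name and sample values are mine, not from the slides.

    def min_max_normalize(values):
        # Scale numeric attribute values into [0.0, 1.0] (min-max normalization)
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0 for _ in values]   # constant attribute: map all to 0.0
        return [(v - lo) / (hi - lo) for v in values]

    ages = [25, 38, 47, 33, 52]            # illustrative numeric attribute
    print(min_max_normalize(ages))         # 25 -> 0.0, 52 -> 1.0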
Training data: buys_computer

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no
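Since every attribute in this table is categorical, "one input unit per domain value" amounts to one-hot encoding. A minimal sketch, with helper names and domain orderings of my own choosing:

    def one_hot(value, domain):
        # One input unit per domain value: 1.0 for the matching unit, else 0.0
        return [1.0 if value == v else 0.0 for v in domain]

    AGE = ["<=30", "31...40", ">40"]
    INCOME = ["low", "medium", "high"]

    # First tuple of the table: age=<=30, income=high, student=no, credit=fair
    x = (one_hot("<=30", AGE)
         + one_hot("high", INCOME)
         + [0.0]    # student: no -> 0.0, yes -> 1.0 (binary, one unit suffices)
         + [0.0])   # credit_rating: fair -> 0.0, excellent -> 1.0
    print(x)        # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0] -> 8 input units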
• Weight settings determine the behaviour of a network
⇒ How can we find the right weights?
Training the Network - Learning
• Backpropagation
– Requires training set (input / output pairs)
– Starts with small random weights
– Error is used to adjust weights (supervised learning)
⇒ Gradient descent on the error landscape
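The "gradient descent on the error landscape" note corresponds to the standard update rule below (a common textbook formulation, not spelled out on the slide), where E is the squared error on the current tuple and \eta is the learning rate:

    \Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}},
    \qquad E = \tfrac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2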
Backpropagation
• Iteratively process a set of training tuples & compare the network's
prediction with the actual known target value
• For each training tuple, the weights are modified to minimize the
mean squared error between the network's prediction and the actual
target value
• Modifications are made in the “backwards” direction: from the output
layer, through each hidden layer down to the first hidden layer, hence
“backpropagation”
• Steps
– Initialize weights (to small random #s) and biases in the network
– Propagate the inputs forward (by applying activation function)
– Backpropagate the error (by updating weights and biases)
– Terminating condition (when error is very small, etc.)
net = \sum_{i=0}^{n} w_i x_i
o = \sigma(net) = \frac{1}{1 + e^{-net}}

\delta_k = o_k (1 - o_k)(t_k - o_k)                          (error term for output unit k)
\delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \delta_k    (error term for hidden unit h)
w_{ji} \leftarrow w_{ji} + \Delta w_{ji}, where \Delta w_{ji} = \eta \, \delta_j \, x_{ji}    (\eta is the learning rate)
Termination Conditions
• Fixed number of iterations
• Error on training examples falls below threshold
• Error on validation set meets some criteria
Example

[Figure: a multilayer feed-forward network with input units 1, 2, 3, hidden units 4, 5, and output unit 6]

• Learning rate l = 0.9 and class label = 1
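Continuing the backprop_step sketch above, a hedged illustration of training a 3-2-1 network shaped like the figure (units 1-3 → 4-5 → 6). The slide's initial weight table is not preserved in this transcript, so the input tuple and random initial weights below are illustrative assumptions; only the learning rate 0.9 and class label 1 come from the slide.

    rng = np.random.default_rng(0)             # illustrative initial weights;
    W_hidden = rng.uniform(-0.5, 0.5, (2, 3))  # the slide's actual weight
    W_out = rng.uniform(-0.5, 0.5, (1, 2))     # table is not preserved here

    x = np.array([1.0, 0.0, 1.0])   # illustrative input tuple
    t = np.array([1.0])             # class label = 1 (from the slide)

    for _ in range(100):            # repeated passes with learning rate l = 0.9
        o = backprop_step(x, t, W_hidden, W_out, eta=0.9)
    print(o)                        # output moves toward the target 1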
Avoiding overfitting
• Weight decay
  – Decrease all weights by a small factor during each iteration (see the sketch after this list)
  – Biases learning away from overly complex decision surfaces
• Validation data
  – Train with the training set
  – Measure error on the validation set
  – Keep the best weights so far on the validation data
• Cross-validation to determine the best number of iterations
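A hedged sketch combining weight decay with "keep the best weights so far on validation data"; grad_step and val_error are assumed caller-supplied hooks (not functions from the slides), and weights is a list of NumPy arrays.

    def train_with_decay(weights, grad_step, val_error, decay=1e-4, epochs=200):
        """Weight decay plus early-stopping-style weight selection.

        grad_step(weights) runs one backpropagation pass over the training
        set; val_error(weights) returns the error on a held-out validation
        set. Both are assumptions of this sketch.
        """
        best = [w.copy() for w in weights]
        best_err = float("inf")
        for _ in range(epochs):
            grad_step(weights)            # one training pass (updates weights)
            for w in weights:
                w *= (1.0 - decay)        # decay: shrink weights by a small factor
            err = val_error(weights)      # error on the validation set
            if err < best_err:            # keep the best weights seen so far
                best = [w.copy() for w in weights]
                best_err = err
        return best                       # the weights that generalized best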