CS 621 Artificial Intelligence
Lecture 21 - 04/10/05
Prof. Pushpak Bhattacharyya
Artificial Neural Networks: Models, Algorithms, Applications
Softcomputing
• A neural network is a computing device composed of very simple processing elements, called neurons, which are interconnected.
• The processing power of such devices comes from the many connections.
• There are different models of neural network, such as the Perceptron, Back Propagation, Hopfield, Adaptive Resonance Theory, and the Kohonen Net.
• The first three models are supervised learning algorithms and the rest are unsupervised.
Fundamental Unit: Perceptron
• These are single neurons, motivated by brain cells.
• ∑_{i=1}^{n} wi xi is compared with the threshold θ.
• If the net input is more than the threshold, the neuron fires, i.e., assumes state 1; otherwise the state is 0.
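A minimal Python sketch of this firing rule (the function name is illustrative; the weights and threshold shown in the usage lines are the AND solution found on a later slide):

def perceptron_output(weights, inputs, theta):
    """Fire (output 1) if the net input exceeds the threshold, else output 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > theta else 0

# With theta = 1.5 and w1 = w2 = 1.0 (the AND solution):
print(perceptron_output([1.0, 1.0], [1, 1], 1.5))   # -> 1
print(perceptron_output([1.0, 1.0], [0, 1], 1.5))   # -> 0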
Computing with Perceptron
Computing the AND function:

X1   X2   Net input vs θ        Y
0    0    0·w1 + 0·w2 < θ       0
0    1    0·w1 + 1·w2 < θ       0
1    0    1·w1 + 0·w2 < θ       0
1    1    1·w1 + 1·w2 > θ       1

From this, one possible solution is θ = 1.5, w1 = 1.0 and w2 = 1.0.
Geometrical View
Perceptron Training Algorithm (PTA): Preprocess
Pre-processing for PTA
• The threshold is absorbed as an extra weight w0, and a new input line x0 is introduced which carries a constant input of -1.
• Also, we negate the components of the vectors in the 0-class, so that the only test to be performed is W·X > 0 (i.e., ∑_{i=0}^{n} wi xi > 0).
• In PTA we need to find the values <w0, w1, ..., wn>.
Perceptron Training Algorithm
Input: preprocessed vectors Xi, i = 1 to m
1. Choose random values for wi, i = 0 to n
2. i := 1
3. IF W·Xi > 0 goto 6
4. W := W + Xi
5. goto 2
6. i := i + 1
7. IF (i > m) goto 9
8. goto 3
9. End
• Thus, whenever an example is misclassified, we add it to the weight vector, and this goes on until the test W·Xi > 0 is satisfied for all input vectors.
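Below is a minimal Python sketch of the algorithm (the function name is illustrative; the goto-based control flow is replaced by repeated passes over the data, and an upper bound on passes is added so the loop terminates even on non-separable data):

import random

def pta(vectors, n_weights, max_passes=1000):
    # Step 1: choose random values for the weights
    w = [random.uniform(-1, 1) for _ in range(n_weights)]
    for _ in range(max_passes):
        all_passed = True
        for x in vectors:                                    # test each Xi
            if sum(wi * xi for wi, xi in zip(w, x)) <= 0:    # test W.Xi > 0 fails
                w = [wi + xi for wi, xi in zip(w, x)]        # W := W + Xi
                all_passed = False
        if all_passed:
            return w                    # every vector now satisfies W.Xi > 0
    raise RuntimeError("no convergence -- is the function linearly separable?")

# Preprocessed AND vectors (see the example on the following slides):
print(pta([(-1, 1, 1), (1, 0, 0), (1, 0, -1), (1, -1, 0)], n_weights=3))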
Example of PTA
AND function
1-class: {<1, 1>}
0-class: {<0, 0>, <0, 1>, <1, 0>}
Augmented classes:
1-class: {< -1, 1, 1>}
0-class: {<-1, 0, 0>, <-1, 0, 1>, <-1, 1, 0>}
Negating the 0-class vectors, the total set of vectors is:
X1 = < -1, 1, 1>
X2 = < 1, 0, 0>
X3 = < 1, 0, -1>
X4 = < 1, -1, 0>
Example of PTA (contd)
Start with a random value for the weight vector: W = <0, 0, 0>
W·X1 = 0, so the test fails.
Wnew = Wold + X1 = <-1, 1, 1>
This fails for X2.
Keep on adding the failed vectors; finally the correct result must come.
WHY?
Perceptron Training Convergence Theorem
Whatever be the initial choice of the weights, the
Perceptron Training Algorithm will eventually converge
by finding the correct weight values, provided the
function being trained is linearly separable.
Non-Linearity
How to deal with non-linearity:
• 1. Tolerate some error, e.g., in the case of XOR: classify the X points but also include one O point.
• 2. Separate by a higher-order surface; but in that case the neuron no longer remains a linear threshold element.
Non-Linearity (contd)
• Introduce more hyperplanes so that one segment encloses points of only one kind and no point of the other kind.
Multilayer Perceptrons
• Multilayer perceptrons are more powerful than the single-layer perceptron.
• The neurons have the sigmoid function as their input-output relationship.
• Merits of the sigmoid:
1. Biological plausibility: it mimics the behaviour of actual brain cells.
2. Easy-to-compute derivative: dy/dx = y(1 − y).
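This derivative property follows directly from the definition of the sigmoid: with y = 1/(1 + e^{-x}), dy/dx = e^{-x}/(1 + e^{-x})² = [1/(1 + e^{-x})] · [e^{-x}/(1 + e^{-x})] = y(1 − y).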
Backpropagation Algorithm
Backpropagation Algorithm (contd)
• Each neuron in one layer is connected to all neurons in the next layer, if any.
Let I = <i1, i2, …, im> be the input vector, O = <o1, o2, …, on> be the output vector, and T = <t1, t2, …, tn> be the target vector.
Then the error is E = ½ ∑_{i=1}^{n} (ti − oi)².
BP: Gradient Descent
Δwji ∝ −∂E/∂wji
where Δwji = change in the weight between the ith (feeding) and jth (fed) neuron.
With the learning constant η as the constant of proportionality,
Δwji = −η ∂E/∂wji
From this, the weight-change values for the whole network can easily be found.
The weight change always reduces the error. Hence BP is by nature a greedy strategy.
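A minimal sketch of one such gradient-descent step for a single sigmoid output neuron, using the error E = ½ ∑ (ti − oi)² defined above (function and variable names are illustrative, not from the slides):

import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def update_output_weights(w, x, t, eta=0.5):
    """One gradient-descent step: w = weights, x = inputs, t = target, eta = learning rate."""
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # dE/dw_i = -(t - o) * o * (1 - o) * x_i, so  Δw_i = -eta * dE/dw_i
    delta = (t - o) * o * (1 - o)
    return [wi + eta * delta * xi for wi, xi in zip(w, x)]

# e.g. w = update_output_weights([0.1, -0.2], [1.0, 0.5], t=1)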
Analysis of BP
Influence of the learning rate:
A large value gives fast progress, but causes oscillation about the minimum; too small a value makes the progress of the algorithm very slow.
Symmetry breaking:
If the mapping demands different weights but we start with the same weights everywhere, then BP will never converge.
Momentum factor:
The momentum factor hastens the operating point towards the minimum, and once near the minimum it dampens the oscillations.
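A common way to write the momentum update (α is the momentum factor; this form is standard in the literature): Δwji(t) = −η ∂E/∂wji + α Δwji(t − 1).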
Local Minima
Due to the greedy nature of BP, it can get stuck in a local minimum m and will never be able to reach the global minimum g, since the error can only decrease with each weight change.
Some useful tips for BP
• If the network weights do not change, one of the following might have happened:
a) The learning rate is too small.
b) Network paralysis: the neurons are operating near the saturation region (output 1 or 0) because the inputs are highly positive or negative. In this case, scale the inputs.
c) The network is stuck in a local minimum. In this case, restart with a fresh random set of weights.
Tips (contd)
• If there is an equal distribution of positive and negative values in the input, then use the tanh(x) curve.
Application
• Loan Defaulter recognition: classification task.
Name       Age   Sex   Addr                           Income (Rs/yr)   Defaulter (Y/N)
A.K. Das   43    M     5th Street, Juhu Reclamation   5 lakhs          N
S. Singh   52    M     A.S. Marg, Powai               0.7              Y
Network Design
• Input layer with 4 neurons (in actuality hundreds of attributes).
• Hidden layer with 2 neurons.
• Output layer with a single neuron.
• Learning rate of about 0.6 and momentum factor of about 0.3.
• Training can be done with very efficient packages/hardware:
http://www-ra.informatik.uni-tuebingen.de/SNNS/
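A rough sketch of such a 4-2-1 network trained with backpropagation in plain Python/NumPy, using the learning rate and momentum suggested above (this is an illustrative stand-in, not the SNNS package linked on the slide; the attribute encoding and initialisation are assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (2, 4))   # input (4 attributes) -> hidden (2 neurons)
W2 = rng.uniform(-0.5, 0.5, (1, 2))   # hidden (2) -> output (1)
dW1_prev = np.zeros_like(W1)
dW2_prev = np.zeros_like(W2)
eta, alpha = 0.6, 0.3                 # learning rate and momentum factor

# x: 4 numeric attributes (e.g. age, sex, locality code, income), scaled to [0, 1]
# t: 1 for defaulter, 0 otherwise -- encoding assumed for illustration
def train_step(x, t):
    global W1, W2, dW1_prev, dW2_prev
    h = sigmoid(W1 @ x)                        # hidden activations
    o = sigmoid(W2 @ h)                        # network output
    delta_o = (t - o) * o * (1 - o)            # output-layer error term
    delta_h = (W2.T @ delta_o) * h * (1 - h)   # back-propagated hidden error
    dW2 = eta * np.outer(delta_o, h) + alpha * dW2_prev
    dW1 = eta * np.outer(delta_h, x) + alpha * dW1_prev
    W2 += dW2; W1 += dW1
    dW2_prev, dW1_prev = dW2, dW1

# train_step would be called repeatedly over the (scaled) rows of the table above.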
Hopfield net
• Inspired by associative memory, i.e., memory retrieval is not by address, but by part of the data.
• Consists of N neurons, fully connected, with symmetric weight strengths wij = wji.
• No self-connections, so the weight matrix is zero-diagonal and symmetric.
• Each computing element, or neuron, is a linear threshold element with threshold = 0.
Computation
Stability
• Asynchronous mode of operation: at any instant a
randomly selected neuron compares the net input with
the threshold.
• In the synchronous mode of operation all neurons
update themselves simultaneously at any instant of
time.
• Since there are feedback connections in the Hopfield
Net, the question of stability arises. At every time
instant the network evolves and finally settles into a
stable state.
• How does the Hopfield Net function as associative
memory ?
• One needs to store or stabilize a vector which is
the memory element.
Example
w12 = w21 = 5
w13 = w31 = 3
w23 = w32 = 2
At time t=0
s1(t) = 1
s2(t) = -1
s3(t) = 1
This is an unstable state: the net input to neuron 1 is w12·s2 + w13·s3 = 5(−1) + 3(1) = −2 < 0 while s1 = 1, so neuron 1 will flip.
A stable pattern is called
an attractor for the net.
Energy Consideration
• Stable patterns correspond to minimum energy states.
• Energy at state <x1, x2, x3, …, xn>:
E = −½ ∑_j ∑_{i≠j} wji xi xj
• Change in energy always comes out to be negative in
the asynchronous mode of operation. Energy always
decreases.
• Stability ensured.
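A small sketch that runs asynchronous updates on the three-neuron example above and checks that the energy never increases (the ±1 state convention and the specific weights are taken from the example slide):

import random

W = {(1, 2): 5, (1, 3): 3, (2, 3): 2}        # symmetric weights from the example

def weight(i, j):
    if i == j:
        return 0                             # no self-connection
    return W[(min(i, j), max(i, j))]

def energy(s):
    # E = -1/2 * sum_j sum_{i != j} w_ji * x_i * x_j
    return -0.5 * sum(weight(i, j) * s[i] * s[j]
                      for i in s for j in s if i != j)

s = {1: 1, 2: -1, 3: 1}                      # initial (unstable) state
for _ in range(20):
    e_before = energy(s)
    i = random.choice(list(s))               # asynchronous: one random neuron
    net = sum(weight(i, j) * s[j] for j in s if j != i)
    s[i] = 1 if net >= 0 else -1             # threshold = 0
    assert energy(s) <= e_before             # energy never increases
print(s, energy(s))                          # settles into a stable (attractor) state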
Association Finding
• Association detection is a very important problem in data mining.
• A Hopfield net can be trained to store associations.
• Classic example: people who buy bread also buy butter.
• Stores keep bread and butter together.
• Sweet shops serving Bengali sweets also keep popular Bengali magazines.
Hopfield Net as a Storehouse of Associations
• Get the features of situations in terms of
<attribute, value> pairs
• Train the Hopfield net with an appropriate learning
algorithm (Hopfield rule).
• Use this to detect associations in future examples.
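A hedged sketch of this idea: the "Hopfield rule" is taken here to be the standard Hebbian storage prescription wij = ∑_p xi^p xj^p over the stored patterns; the ±1 feature encoding and the toy patterns are assumptions for illustration.

import numpy as np

def store(patterns):
    """Build a Hopfield weight matrix from ±1 patterns (one per row)."""
    p = np.asarray(patterns, dtype=float)
    W = p.T @ p                       # Hebbian sum: w_ij = sum_p x_i^p x_j^p
    np.fill_diagonal(W, 0.0)          # no self-connections
    return W

def recall(W, x, steps=100):
    """Asynchronous recall from a partial or corrupted cue x (entries in {-1, +1})."""
    x = np.array(x, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(steps):
        i = rng.integers(len(x))
        x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    return x

# e.g. each component encodes one <attribute, value> pair such as "buys bread"
W = store([[1, 1, -1, -1], [-1, -1, 1, 1]])
print(recall(W, [1, -1, -1, -1]))     # partial cue completes to the stored association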
Conclusion
• Neural nets and fuzzy logic are alternative modes of computation that provide ways of discovering knowledge from high volumes of data.
• The first crucial step is feature selection.
• The features can be fed into the neural net.
• The features can be described qualitatively, supported with profiles, for data mining tasks.
References
Books:
R.J. Schalkoff, Artificial Neural Networks, McGraw Hill Publishers, 1997.
G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty and Information, Prentice Hall India, 1995.
E. Cox, Fuzzy Logic, Charles River Media Inc. Publication, 1995.
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
Journals:
Neural Computation, IEEE Transactions on Neural Networks, Data and Knowledge Engineering Journal, SIGKDD, IEEE Expert, etc.