CS623: Introduction to
Computing with Neural Nets
(lecture-15)
Pushpak Bhattacharyya
Computer Science and Engineering
Department
IIT Bombay
Finding weights for Hopfield Net
applied to TSP
• An alternate and more convenient form of Eproblem
• Eproblem = E1 + E2
where
E1 is the term enforcing, for n cities, that each city is in exactly one position and each position holds exactly one city, and
E2 is the term for the tour distance
Expressions for E1 and E2
$$E_1 = \frac{A}{2}\left[\sum_{\alpha=1}^{n}\left(\sum_{i=1}^{n} x_{i\alpha} - 1\right)^{2} + \sum_{i=1}^{n}\left(\sum_{\alpha=1}^{n} x_{i\alpha} - 1\right)^{2}\right]$$

$$E_2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\alpha=1}^{n} d_{ij}\, x_{i\alpha}\left(x_{j,\alpha+1} + x_{j,\alpha-1}\right)$$
Explanatory example
[Fig. 1: three cities 1, 2 and 3; the figure shows the two possible directions in which the tour can take place.]
city \ pos    1      2      3
    1        x11    x12    x13
    2        x21    x22    x23
    3        x31    x32    x33

For the matrix above, xiα = 1 if and only if the ith city is in position α.
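As a small illustration (not on the original slide; 0-indexed positions assumed), the two tour directions of Fig. 1 correspond to two such assignment matrices:

```python
import numpy as np

# Tour 1 -> 2 -> 3 (one direction in Fig. 1): city i in position i
x_forward = np.eye(3, dtype=int)

# Tour 1 -> 3 -> 2 (the opposite direction): cities 2 and 3 swap positions
x_reverse = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [0, 1, 0]])

# Both satisfy the E1 constraints: every row and every column sums to 1
assert (x_forward.sum(axis=0) == 1).all() and (x_reverse.sum(axis=1) == 1).all()
```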
Expressions of Energy
$$E_{problem} = E_1 + E_2$$

$$E_1 = \frac{A}{2}\big[(x_{11} + x_{12} + x_{13} - 1)^2 + (x_{21} + x_{22} + x_{23} - 1)^2 + (x_{31} + x_{32} + x_{33} - 1)^2 + (x_{11} + x_{21} + x_{31} - 1)^2 + (x_{12} + x_{22} + x_{32} - 1)^2 + (x_{13} + x_{23} + x_{33} - 1)^2\big]$$
Expressions (contd.)
$$E_2 = \frac{1}{2}\big[\,d_{12}x_{11}(x_{22} + x_{23}) + d_{12}x_{12}(x_{23} + x_{21}) + d_{12}x_{13}(x_{21} + x_{22}) + d_{13}x_{11}(x_{32} + x_{33}) + d_{13}x_{12}(x_{33} + x_{31}) + d_{13}x_{13}(x_{31} + x_{32}) + \ldots\big]$$
Enetwork
$$E_{network} = -\big[\,w_{11,12}x_{11}x_{12} + w_{11,13}x_{11}x_{13} + w_{12,13}x_{12}x_{13} + w_{11,21}x_{11}x_{21} + w_{11,22}x_{11}x_{22} + w_{11,23}x_{11}x_{23} + w_{11,31}x_{11}x_{31} + w_{11,32}x_{11}x_{32} + w_{11,33}x_{11}x_{33} + \ldots\big]$$
Find row weight
• To find w11,12 = -(coefficient of x11x12 in Eproblem)
• Search for x11x12 in Eproblem: expanding (x11 + x12 + x13 - 1)² in E1 gives the cross term A·x11x12; E2 cannot contribute
• Therefore w11,12 = -A
Find column weight
• To find w11,21 = -(coefficient of x11x21 in Eproblem)
• Search for x11x21 in Eproblem: expanding (x11 + x21 + x31 - 1)² in E1 gives the cross term A·x11x21; E2 cannot contribute
• Therefore w11,21 = -A
Find Cross weights
• To find w11,22 = -(coefficient of x11x22 in Eproblem)
• Search for x11x22 in Eproblem: E1 cannot contribute
• The coefficient of x11x22 in E2 is (d12 + d21)/2, since the pair is counted once with (i, j) = (1, 2) and once with (i, j) = (2, 1)
• Therefore, w11,22 = -((d12 + d21)/2)
Find Cross weights
• To find w11,33 = -(coefficient of x11x33 in Eproblem)
• Search for x11x33 in Eproblem: positions 1 and 3 are cyclically adjacent, so E1 cannot contribute and only E2 does
• w11,33 = -((d13 + d31)/2)
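These coefficient readings can be checked symbolically. The sketch below (not part of the lecture; it assumes sympy and 0-indexed positions with cyclic wraparound) expands Eproblem for n = 3 and picks off the product-term coefficients:

```python
import sympy as sp

n = 3
syms = sp.symbols('x11 x12 x13 x21 x22 x23 x31 x32 x33')
X = sp.Matrix(3, 3, list(syms))
A = sp.Symbol('A')
d = sp.Matrix(3, 3, lambda i, j: 0 if i == j else sp.Symbol(f'd{i+1}{j+1}'))

E1 = (A / 2) * (sum((sum(X.row(i)) - 1)**2 for i in range(n))
                + sum((sum(X.col(a)) - 1)**2 for a in range(n)))
E2 = sp.Rational(1, 2) * sum(
    d[i, j] * X[i, a] * (X[j, (a + 1) % n] + X[j, (a - 1) % n])
    for i in range(n) for j in range(n) for a in range(n))

Ep = sp.expand(E1 + E2)

# w_pq = -(coefficient of x_p x_q in Eproblem)
print(-Ep.coeff(X[0, 0] * X[0, 1]))  # row weight w11,12    -> -A
print(-Ep.coeff(X[0, 0] * X[1, 0]))  # column weight w11,21 -> -A
print(-Ep.coeff(X[0, 0] * X[1, 1]))  # cross weight w11,22  -> -d12/2 - d21/2
```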
Summary
• Row weights = -A
• Column weights = -A
• Cross weights = -[(dij + dji)/2] for j = i ± 1;
0 for j > i + 1 or j < i - 1
• Threshold = -2A
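The summary rules translate directly into a weight matrix. Below is a minimal sketch (assumptions: 0-indexed neurons laid out as index = i·n + a, cyclically adjacent positions as in E2, and A = 1):

```python
import numpy as np

def tsp_weights(d, A=1.0):
    """Weights w[(i,a),(j,b)] for the n*n neurons of the TSP Hopfield net."""
    n = d.shape[0]
    w = np.zeros((n * n, n * n))
    for i in range(n):
        for a in range(n):
            for j in range(n):
                for b in range(n):
                    if (i, a) == (j, b):
                        continue                    # no self-connection
                    if i == j:                      # same row (same city)
                        w[i*n + a, j*n + b] = -A
                    elif a == b:                    # same column (same position)
                        w[i*n + a, j*n + b] = -A
                    elif b in ((a + 1) % n, (a - 1) % n):   # adjacent positions
                        w[i*n + a, j*n + b] = -(d[i, j] + d[j, i]) / 2
                    # non-adjacent positions: weight stays 0
    return w

theta = -2.0  # threshold -2A with A = 1, as in the summary

d = np.array([[0, 2, 5], [2, 0, 3], [5, 3, 0]], float)  # hypothetical distances
W = tsp_weights(d)
assert np.allclose(W, W.T)  # symmetric weights, as a Hopfield net requires
```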
Interpretation of wts and thresholds
• Row wt. being negative causes the winner neuron to
suppress others: one 1 per row.
• Column wt. being negative causes the winner neuron to
suppress others: one 1 per column.
• Threshold being -2A makes it possible for activations to be positive sometimes.
• For non-neighbour row and column (j>i+1 or j<i-1)
neurons, the wt is 0; this is because non-neighbour cities
should not influence the activations of corresponding
neurons.
• Cross wts, when non-zero, are proportional to the negative of the distance; this discourages cities with large distances between them from being neighbours on the tour.
Can we compare Eproblem and Enetwork?

$$E_1 = \frac{A}{2}\left[\sum_{\alpha=1}^{n}\left(\sum_{i=1}^{n} x_{i\alpha} - 1\right)^{2} + \sum_{i=1}^{n}\left(\sum_{\alpha=1}^{n} x_{i\alpha} - 1\right)^{2}\right]$$
E1 has square terms (xiα)² which evaluate to 1 or 0. It also has constants, again evaluating to 1.
The sum of the square terms and constants is
<= n × (n + 1) + n × (n + 1) = 2n(n + 1),
since each of the n row sums and n column sums contributes at most n square terms and one constant.
Additionally, there are linear terms of the form const·xiα, which produce the thresholds of the neurons when equated with the linear terms in Enetwork.
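To make the counting concrete, here is the expansion of one squared row sum (a worked step, not on the original slide):

$$\left(\sum_{\alpha=1}^{n} x_{i\alpha} - 1\right)^{2} = \underbrace{\sum_{\alpha=1}^{n} x_{i\alpha}^{2}}_{n \text{ terms, each} \le 1} + \;2\sum_{\alpha<\beta} x_{i\alpha}x_{i\beta} \;-\; 2\sum_{\alpha=1}^{n} x_{i\alpha} \;+\; \underbrace{1}_{\text{constant}}$$

Each of the 2n squared sums thus contributes at most n + 1 from its square and constant terms, giving the bound 2n(n + 1); the product terms are matched against Enetwork's weights and the linear terms against its thresholds.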
Can we compare Eproblem and Enetwork? (contd.)

$$E_2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\alpha=1}^{n} d_{ij}\, x_{i\alpha}\left(x_{j,\alpha+1} + x_{j,\alpha-1}\right)$$

This expression can contribute only product terms, which are equated with the product terms in Enetwork.
Can we compare Eproblem and Enetwork? (contd.)
• So, yes, we CAN compare Eproblem and
Enetwork.
• Eproblem <= Enetwork + 2n(n+1)
• When the weight and threshold values are
chosen by the described procedure,
minimizing Enetwork implies minimizing
Eproblem
Principal Component Analysis
Purpose and methodology
• Detect correlations in multivariate data
• Given P variables in the multivariate data,
introduce P principal components Z1, Z2,
Z3, …ZP
• Find those components which are
responsible for the biggest variation
• Retain only those, thereby reducing the dimensionality of the problem
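A compact sketch of this methodology (not from the lecture; plain numpy, standardized variables, eigendecomposition of the correlation matrix):

```python
import numpy as np

def pca_top_k(X, k):
    """X: n samples x p variables. Returns the scores of the top-k components."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    Y = (X - mu) / sigma                  # standardized variables
    R = np.corrcoef(Y, rowvar=False)      # p x p correlation matrix
    vals, vecs = np.linalg.eigh(R)        # eigh since R is symmetric
    order = np.argsort(vals)[::-1]        # biggest variation first
    top = vecs[:, order[:k]]              # directions Z1 .. Zk
    return Y @ top                        # retained principal components

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))             # stand-in for the 150 x 4 IRIS table
Z = pca_top_k(X, 2)
print(Z.shape)                            # (150, 2): dimensionality reduced
```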
Example: IRIS Data (only 3 values out of 150)

ID    Petal Length (a1)   Petal Width (a2)   Sepal Length (a3)   Sepal Width (a4)   Classification
001   5.1                 3.5                1.4                 0.2                Iris-setosa
051   7.0                 3.2                4.7                 1.4                Iris-versicolor
101   6.3                 3.3                6.0                 2.5                Iris-virginica
Training and Testing Data
• Training: 80% of the data; 40 from each class: total 120
• Testing: Remaining 30
• Do we have to consider all the 4 attributes for
classification?
• Do we have to have 4 neurons in the input layer?
• Fewer neurons in the input layer may reduce the overall size of the n/w and thereby reduce training time
• It will also likely improve generalization performance (Occam's Razor hypothesis: a simpler hypothesis, i.e. a smaller neural net, generalizes better)
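The 120/30 split described above can be reproduced as follows (a sketch assuming scikit-learn's bundled copy of the IRIS data, not the lecture's own files):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)        # 150 samples, 4 attributes, 3 classes
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=120, stratify=y, random_state=0)  # 40 per class for training

print(X_tr.shape, X_te.shape)            # (120, 4) (30, 4)
```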
The multivariate data
X1     X2     X3     X4     X5    …    Xp
x11    x12    x13    x14    x15   …    x1p
x21    x22    x23    x24    x25   …    x2p
x31    x32    x33    x34    x35   …    x3p
x41    x42    x43    x44    x45   …    x4p
…
xn1    xn2    xn3    xn4    xn5   …    xnp
Some preliminaries
• Sample mean vector: <µ1, µ2, µ3, …, µp>
For the ith variable:

$$\mu_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$$

• Variance for the ith variable:

$$\sigma_i^{2} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \mu_i)^{2}$$

• Sample covariance:

$$c_{ab} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{aj} - \mu_a)(x_{bj} - \mu_b)$$

This measures the correlation in the data. In fact, the correlation coefficient is

$$r_{ab} = \frac{c_{ab}}{\sigma_a \sigma_b}$$
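These statistics are one-liners in numpy; the sketch below (samples in rows, variables in columns, values taken from the three IRIS rows shown earlier) confirms that the two routes to rab agree:

```python
import numpy as np

X = np.array([[5.1, 3.5, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [6.3, 3.3, 6.0, 2.5]])    # the three IRIS rows shown earlier

mu = X.mean(axis=0)                      # sample mean vector <mu_1 ... mu_p>
var = X.var(axis=0, ddof=1)              # per-variable variance with 1/(n-1)
cov = np.cov(X, rowvar=False)            # sample covariance matrix c_ab
r = cov / np.outer(np.sqrt(var), np.sqrt(var))   # correlation coefficients r_ab
print(np.allclose(r, np.corrcoef(X, rowvar=False)))  # True
```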
Standardize the variables
• For each variable xij, replace the values by

$$y_{ij} = \frac{x_{ij} - \mu_i}{\sigma_i}$$
Correlation Matrix
$$R = \begin{bmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & 1 & r_{23} & \cdots & r_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & 1 \end{bmatrix}$$
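A short sketch tying the last two slides together (made-up data; np.corrcoef computes R directly, and standardizing the variables leaves R unchanged):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))                       # stand-in multivariate data
Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized y_ij

R = np.corrcoef(Y, rowvar=False)                    # p x p, ones on the diagonal
print(np.allclose(np.diag(R), 1.0))                 # True
print(np.allclose(R, np.corrcoef(X, rowvar=False))) # True: standardization preserves R
```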