
Multi-Valued Neurons and Multilayer Neural Network based on Multi-Valued Neurons
MVN and MLMVN
1
A threshold function is a linearly separable function

Linear separability means that it is possible to separate the "1"s and the "-1"s by a hyperplane.

[Figure: the values of f plotted at the corners (±1, ±1) of the plane: f(1,1) = 1, f(-1,1) = -1, f(1,-1) = -1, f(-1,-1) = -1; f(x1, x2) is the OR function]
2
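A single neuron with a sign (threshold) activation can realize this separation. Below is a minimal sketch; the particular weights w0 = -1, w1 = w2 = 1 are an illustrative choice (not taken from the slides) that reproduces the figure above.

```python
# A single real-valued threshold neuron; the weights are an illustrative choice.
def threshold_neuron(x1, x2, w0=-1.0, w1=1.0, w2=1.0):
    z = w0 + w1 * x1 + w2 * x2      # weighted sum
    return 1 if z >= 0 else -1      # sign (threshold) activation

for x in [(1, 1), (-1, 1), (1, -1), (-1, -1)]:
    print(x, '->', threshold_neuron(*x))   # prints 1, -1, -1, -1 as in the figure
```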
Threshold Boolean Functions
• A threshold (linearly separable) function can be learned by a single neuron
• The number of threshold functions is very small in comparison to the number of all functions (104 of 256 for n=3, about 2000 of 65536 for n=4, etc.)
• Non-threshold (nonlinearly separable) functions cannot be learned by a single neuron (Minsky-Papert, 1969); they can be learned only by a neural network
3
XOR – a classical non-threshold (non-linearly separable) function

Non-linear separability means that it is impossible to separate the "1"s and the "-1"s by a hyperplane.

[Figure: the values of f plotted at the corners (±1, ±1) of the plane: f(1,1) = 1, f(-1,1) = -1, f(1,-1) = -1, f(-1,-1) = 1]
4
Multi-valued mappings
• The first artificial neurons could learn only Boolean functions.
• However, Boolean functions can describe only a very limited class of problems.
• Thus, the ability to learn and implement not only Boolean, but also multiple-valued and continuous functions is very important for solving pattern recognition, classification and approximation problems.
• This determines the importance of neurons that can learn and implement multiple-valued and continuous mappings.
5
Traditional approach to learn the
multiple-valued mappings by a neuron:
• Sigmoid activation function (the most popular):

$F(z) = \dfrac{1}{1 + e^{-z}}$

[Figure: the sigmoid curve, passing through F(0) = 0.5 and saturating at 0 and 1]
6
Sigmoidal neurons: limitations
• The sigmoid activation function has limited plasticity and limited flexibility.
• Thus, to learn functions whose behavior differs significantly from that of the sigmoid function, it is necessary to create a network, because a single sigmoidal neuron is not able to learn such functions.
7
Is it possible to overcome the Minsky-Papert limitation for the classical perceptron?
Yes!!!
8
We can overcome the Minsky-Papert limitation using complex-valued weights and a complex activation function.
9
Is it possible to learn XOR and Parity n
functions using a single neuron?
• Any classical monograph/textbook on neural networks claims that to learn the XOR function, a network of at least three neurons is needed.
• This is true for real-valued neurons and real-valued neural networks.
• However, it is not true for complex-valued neurons!
• Moving to the complex domain is the right way to overcome the Minsky-Papert limitation and to learn multiple-valued and nonlinearly separable Boolean functions using a single neuron.
10
[Diagram: taxonomy of NEURAL NETWORKS – Traditional Neurons (Generalizations of Sigmoidal Neurons, Neuro-Fuzzy Networks) and Complex-Valued Neurons (Multi-Valued and Universal Binary Neurons)]
11
Complex numbers
• Unlike a real number, which is geometrically a point on a line, a complex number is a point on a plane.
• Its coordinates are called the real (Re, horizontal) and the imaginary (Im, vertical) parts of the number.
• i is the imaginary unit.
• r is the modulus (absolute value) of the number.

Algebraic form of a complex number: $z = x + iy$

[Figure: a complex number as a point on the plane with modulus r]
12
Complex numbers
φ is the argument (phase, in physical terms) of a complex number.

Trigonometric and exponential (Euler's) forms of a complex number:
$z = r(\cos\varphi + i\sin\varphi) = re^{i\varphi}$

[Figure: a unit circle in the complex plane]
13
Complex numbers
Complex-conjugated numbers
$z = x + iy = re^{i\varphi} = r(\cos\varphi + i\sin\varphi)$
$\bar{z} = x - iy = re^{-i\varphi} = r(\cos\varphi - i\sin\varphi)$
14
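For readers following along in code, here is a minimal sketch (using Python's built-in complex type; the value 3 + 4i is an arbitrary example) of the modulus, argument, exponential form and conjugate introduced above.

```python
# A minimal sketch of the forms of a complex number (the value 3+4i is arbitrary).
import cmath

z = 3 + 4j
r, phi = abs(z), cmath.phase(z)        # modulus r and argument (phase) φ
print(r * cmath.exp(1j * phi))         # exponential form r·e^{iφ} reconstructs x + iy
print(z.conjugate())                   # the complex conjugate x − iy = r·e^{−iφ}
```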
XOR problem

n=2, m=4 – four sectors
W = (0, 1, i) – the weighting vector

[Figure: the complex plane divided into four sectors by the points 1, i, -1, -i; P_B = 1 in the 1st and 3rd sectors, P_B = -1 in the 2nd and 4th sectors]

x1  x2 | z = w0 + w1x1 + w2x2 | P_B(z) | f(x1, x2)
 1   1 |        1 + i         |   1    |    1
 1  -1 |        1 - i         |  -1    |   -1
-1   1 |       -1 + i         |  -1    |   -1
-1  -1 |       -1 - i         |   1    |    1
15
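A minimal sketch of this single complex-valued neuron in Python, using the weighting vector W = (0, 1, i) from the slide and a binary activation that returns +1 on even sectors and -1 on odd sectors of the complex plane (m = 4):

```python
# A single complex-valued neuron computing XOR with W = (0, 1, i) and m = 4 sectors.
import cmath

def p_binary(z, m=4):
    """+1 on even sectors, -1 on odd sectors of the complex plane."""
    j = int((cmath.phase(z) % (2 * cmath.pi)) // (2 * cmath.pi / m))  # sector index
    return 1 if j % 2 == 0 else -1

w0, w1, w2 = 0, 1, 1j                      # the weighting vector W = (0, 1, i)
for x1 in (1, -1):
    for x2 in (1, -1):
        z = w0 + w1 * x1 + w2 * x2         # weighted sum
        print((x1, x2), '->', p_binary(z))   # reproduces the f(x1, x2) column of the table
```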
Parity 3 problem

n=3, m=6: 6 sectors
W = (0, ε, 1, 1) – the weighting vector, where ε = exp(i2π/6)

[Figure: the complex plane divided into six sectors with alternating labels 1, -1; ε marks the first sector boundary]

x1  x2  x3 | Z = w0 + w1x1 + w2x2 + w3x3 | P_B(Z) | f(x1, x2, x3)
 1   1   1 |            ε + 2            |   1    |    1
 1  -1   1 |              ε              |  -1    |   -1
 1   1  -1 |              ε              |  -1    |   -1
 1  -1  -1 |            ε - 2            |   1    |    1
-1   1   1 |            2 - ε            |  -1    |   -1
-1  -1   1 |             -ε              |   1    |    1
-1   1  -1 |             -ε              |   1    |    1
-1  -1  -1 |           -ε - 2            |  -1    |   -1
16
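A minimal sketch checking this table: the same binary activation as in the XOR sketch, now with m = 6 sectors and W = (0, ε, 1, 1). The small tolerance added to the sector index is my own device so that weighted sums falling exactly on a sector border (such as Z = ε) are assigned to the upper sector, as the activation definition requires.

```python
# Verifying the Parity-3 table with m = 6 sectors and W = (0, ε, 1, 1), ε = exp(i·2π/6).
import cmath, itertools

def p_binary(z, m=6):
    two_pi = 2 * cmath.pi
    j = int((cmath.phase(z) % two_pi) / (two_pi / m) + 1e-9)   # sector index (border -> upper sector)
    return 1 if j % 2 == 0 else -1                              # +1 on even, -1 on odd sectors

eps = cmath.exp(2j * cmath.pi / 6)      # primitive 6th root of unity
w = (0, eps, 1, 1)                      # weighting vector W = (w0, w1, w2, w3)

for x1, x2, x3 in itertools.product((1, -1), repeat=3):
    z = w[0] + w[1] * x1 + w[2] * x2 + w[3] * x3
    print((x1, x2, x3), p_binary(z), x1 * x2 * x3)   # neuron output equals the parity x1·x2·x3
```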
Multi-Valued Neuron (MVN)
• The Multi-Valued Neuron (MVN) is a neural element with n inputs and one output lying on the unit circle, and with complex-valued weights.
• The theoretical background behind the MVN is multiple-valued (k-valued) threshold logic over the field of complex numbers.
17
Multi-valued mappings and multiple-valued logic
• We traditionally use Boolean functions and Boolean (two-valued) logic to represent two-valued mappings:

$x_1, \ldots, x_n \in \{0, 1\}; \quad f(x_1, \ldots, x_n) \in \{0, 1\}$
$x_1, \ldots, x_n \in \{1, -1\}; \quad f(x_1, \ldots, x_n) \in \{1, -1\}$

• To represent multi-valued mappings, we should use multiple-valued logic.
18
Multiple-Valued Logic:
classical view
• The values of multiple-valued (k-valued) logic are traditionally encoded by the integers {0, 1, …, k-1}
• On the one hand, this approach looks natural.
• On the other hand, it captures only the quantitative properties, while it cannot capture the qualitative properties.
19
Multiple-Valued Logic:
classical view
• For example, suppose we need to represent different colors in terms of multiple-valued logic. Let Red=0, Orange=1, Yellow=2, Green=3, etc.
• What does this mean?
• Is it true that Red < Orange < Yellow < Green?!
20
Multiple-Valued (k-valued) logic over the field
of complex numbers
• To represent and handle both the quantitative and the qualitative properties, it is possible to move to the field of complex numbers.
• In this case, the argument (phase) may be used to represent the quality, and the amplitude may be used to represent the quantity.
21
Multiple-Valued (k-valued) logic over the field
of complex numbers
$j \in \{0, 1, \ldots, k-1\}$ – the regular values of k-valued logic

one-to-one correspondence:

$j \;\rightarrow\; \varepsilon^j = \exp(i 2\pi j / k), \qquad \varepsilon^j \in \{\varepsilon^0, \varepsilon, \varepsilon^2, \ldots, \varepsilon^{k-1}\}$

The kth roots of unity are the values of k-valued logic over the field of complex numbers.

$\varepsilon = \exp(i 2\pi / k)$ – the primitive kth root of unity

[Figure: the kth roots of unity 1, ε, ε², …, ε^{k-1} on the unit circle]
22
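A minimal sketch of this encoding (k = 6 here is an arbitrary choice): each value j of k-valued logic is mapped to the kth root of unity ε^j, and every encoded value has absolute value 1.

```python
# Encoding the values {0, 1, ..., k-1} of k-valued logic as kth roots of unity.
import cmath

k = 6                                         # arbitrary illustrative choice
eps = cmath.exp(2j * cmath.pi / k)            # primitive kth root of unity
for j in range(k):
    v = eps ** j                              # ε^j = exp(i·2πj/k)
    print(j, '->', complex(round(v.real, 3), round(v.imag, 3)), '|ε^j| =', round(abs(v), 3))
```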
Important advantage
• In multiple-valued logic over the field of complex numbers, all values of this logic are algebraically (arithmetically) equitable: they are normalized, and their absolute values are equal to 1.
• In the example with the colors, in terms of multiple-valued logic over the field of complex numbers they are coded by different phases. Hence, their quality is represented by the phase.
• Since the phase determines the corresponding frequency, this representation meets the physical nature of the colors.
23
Discrete-Valued (k-valued)
Activation Function
$P(z) = \exp(i 2\pi j / k) = \varepsilon^j$, if $\;2\pi j / k \le \arg(z) < 2\pi (j+1)/k$

Function P maps the complex plane onto the set of the kth roots of unity.

[Figure: the complex plane divided into k equal sectors 0, 1, …, k-1; a weighted sum z lying in sector j is mapped to ε^j]
24
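A minimal sketch of this activation function; the tiny tolerance is my own device to keep values lying exactly on a sector border in that sector, per the inequality above.

```python
# The discrete k-valued activation: P(z) = exp(i·2πj/k) for the sector j containing arg(z).
import cmath

def P_discrete(z, k):
    two_pi = 2 * cmath.pi
    j = int((cmath.phase(z) % two_pi) / (two_pi / k) + 1e-12)  # sector index, lower border included
    return cmath.exp(1j * two_pi * j / k)                      # the kth root of unity ε^j

print(P_discrete(3 + 1j, k=16))   # the root of unity at the lower border of z's sector
```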
Discrete-Valued (k-valued) Activation Function

[Figure: the discrete activation function for k = 16 – the complex plane divided into 16 sectors]
25
Multi-Valued Neuron (MVN)
$f(x_1, \ldots, x_n) = P(w_0 + w_1 x_1 + \ldots + w_n x_n)$

f is a function of k-valued logic (a k-valued threshold function).

[Figure: a neuron with inputs x1, …, xn, weighted sum z = w0 + w1x1 + … + wnxn, and activation P(z) producing the output f(x1, …, xn)]
26
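A minimal sketch of the MVN forward pass, reusing the discrete activation shown earlier (repeated here so the snippet is self-contained); the weights and inputs in the usage lines are arbitrary illustrative values, not taken from the slides.

```python
# MVN forward pass: f(x1,...,xn) = P(w0 + w1*x1 + ... + wn*xn).
import cmath

def P_discrete(z, k):
    two_pi = 2 * cmath.pi
    j = int((cmath.phase(z) % two_pi) / (two_pi / k) + 1e-12)
    return cmath.exp(1j * two_pi * j / k)

def mvn_output(weights, inputs, k):
    """weights = (w0, w1, ..., wn); inputs = (x1, ..., xn), each a kth root of unity."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))  # weighted sum
    return P_discrete(z, k)

# arbitrary illustrative weights and inputs
k = 4
x = (cmath.exp(2j * cmath.pi * 1 / k), cmath.exp(2j * cmath.pi * 3 / k))
print(mvn_output((0.2 + 0.1j, 1, 1j), x, k))
```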
MVN: main properties
• The key properties of the MVN:
– Complex-valued weights
– The activation function is a function of the argument of the weighted sum
– Complex-valued inputs and output lying on the unit circle (kth roots of unity)
– Higher functionality than that of traditional neurons (e.g., sigmoidal ones)
– Simplicity of learning
27
MVN Learning
• Learning is reduced to movement along the unit circle
• No derivative is needed; learning is based on the error-correction rule

$\varepsilon^q$ – the desired output
$\varepsilon^s$ – the actual output
$\delta = \varepsilon^q - \varepsilon^s$ – the error, which completely determines the weights adjustment

[Figure: the unit circle with the desired output ε^q, the actual output ε^s, and the error δ between them]
28
Learning Algorithm for the Discrete MVN with the Error-Correction Learning Rule

$W_{r+1} = W_r + \dfrac{\alpha_r}{n+1}\,(\varepsilon^q - \varepsilon^s)\,\bar{X}$

W – the weighting vector; X – the input vector
$\bar{X}$ is the complex conjugate of X
$\alpha_r$ – the learning rate (should always be equal to 1)
r – the current iteration; r+1 – the next iteration
$\varepsilon^q$ – the desired output (sector)
$\varepsilon^s$ – the actual output (sector)
29
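A minimal sketch of this learning procedure. The epoch cap, the zero initial weights, and the toy identity-mapping task in the usage lines are my own illustrative choices, not prescribed by the slides.

```python
# Discrete MVN learning with the error-correction rule
# W_{r+1} = W_r + (α_r/(n+1)) · (ε^q − ε^s) · conj(X), with α_r = 1.
import cmath

def sector(z, k):
    two_pi = 2 * cmath.pi
    return int((cmath.phase(z) % two_pi) / (two_pi / k) + 1e-12)

def learn_discrete_mvn(samples, n, k, epochs=100):
    """samples: list of (inputs, desired_sector); inputs are kth roots of unity."""
    eps = cmath.exp(2j * cmath.pi / k)
    w = [0j] * (n + 1)                                    # w0, w1, ..., wn (zero start)
    for _ in range(epochs):
        errors = 0
        for x, q in samples:
            xv = (1,) + tuple(x)                          # prepend the constant input x0 = 1
            z = sum(wi * xi for wi, xi in zip(w, xv))     # weighted sum
            s = sector(z, k)
            if s != q:                                    # adjust the weights only on error
                errors += 1
                delta = eps ** q - eps ** s               # error ε^q − ε^s
                w = [wi + delta * xi.conjugate() / (n + 1) for wi, xi in zip(w, xv)]
        if errors == 0:
            break
    return w

# toy usage: learn the identity mapping ε^j -> sector j for k = 4, n = 1
k, n = 4, 1
eps = cmath.exp(2j * cmath.pi / k)
samples = [((eps ** j,), j) for j in range(k)]
w = learn_discrete_mvn(samples, n, k)
print([sector(w[0] + w[1] * x[0], k) for x, _ in samples])   # expected: [0, 1, 2, 3]
```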
Continuous-Valued Activation Function
Continuous-valued case ($k \to \infty$):

$P(z) = \exp(i \arg z) = e^{i\,\mathrm{Arg}\, z} = z / |z|$

Function P maps the complex plane onto the unit circle.

[Figure: the weighted sum z projected onto the unit circle]
30
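A minimal sketch of the continuous activation (the sample value 3 + 4i is arbitrary):

```python
# The continuous MVN activation P(z) = z/|z| projects the weighted sum onto the unit circle.
def P_continuous(z: complex) -> complex:
    return z / abs(z)

print(P_continuous(3 + 4j))   # (0.6+0.8j): same argument as z, absolute value 1
```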
Continuous-Valued Activation Function
31
Continuous-Valued Activation Function
32
Learning Algorithm for the Continuous MVN with the Error-Correction Learning Rule

$W_{r+1} = W_r + \dfrac{\alpha_r}{n+1}\left(\varepsilon^q - \dfrac{z_r}{|z_r|}\right)\bar{X}$

W – the weighting vector; X – the input vector
$\bar{X}$ is the complex conjugate of X
$\alpha_r$ – the learning rate (should always be equal to 1)
r – the current iteration; r+1 – the next iteration
z – the weighted sum
$\varepsilon^q$ – the desired output
$z/|z|$ – the actual output
$\delta = \varepsilon^q - z/|z|$ – the neuron's error
33
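A minimal sketch of one step of this rule; the starting weights, the single input and the target in the usage lines are arbitrary illustrative values.

```python
# One step of the continuous error-correction rule
# W_{r+1} = W_r + (α_r/(n+1)) · (ε^q − z_r/|z_r|) · conj(X), with α_r = 1.
import cmath

def continuous_mvn_update(w, x, desired, n):
    """w = [w0, ..., wn]; x = (x1, ..., xn); `desired` is a point on the unit circle."""
    xv = (1,) + tuple(x)                               # constant input x0 = 1
    z = sum(wi * xi for wi, xi in zip(w, xv))          # weighted sum
    delta = desired - z / abs(z)                       # neuron's error ε^q − z/|z|
    return [wi + delta * xi.conjugate() / (n + 1) for wi, xi in zip(w, xv)]

# toy usage: repeat the step until the output reaches the target
w, x, target = [0.3 + 0.2j, 1 + 0j], (cmath.exp(0.7j),), cmath.exp(2.0j)
for _ in range(20):
    w = continuous_mvn_update(w, x, target, n=1)
z = w[0] + w[1] * x[0]
print(abs(z / abs(z) - target))                        # error is (numerically) close to 0
```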
A role of the factor 1/(n+1) in the Learning Rule

$\delta = \varepsilon^q - \dfrac{z}{|z|}$ – the neuron's error

The weights after the correction:

$\tilde{w}_0 = w_0 + \dfrac{\delta}{n+1}; \quad \tilde{w}_1 = w_1 + \dfrac{\delta}{n+1}\bar{x}_1; \ \ldots; \ \tilde{w}_n = w_n + \dfrac{\delta}{n+1}\bar{x}_n$

The weighted sum after the correction (the inputs lie on the unit circle, so $\bar{x}_j x_j = 1$):

$\tilde{z} = \tilde{w}_0 + \tilde{w}_1 x_1 + \ldots + \tilde{w}_n x_n = \left(w_0 + \dfrac{\delta}{n+1}\right) + \left(w_1 + \dfrac{\delta}{n+1}\bar{x}_1\right)x_1 + \ldots + \left(w_n + \dfrac{\delta}{n+1}\bar{x}_n\right)x_n =$

$= w_0 + w_1 x_1 + \ldots + w_n x_n + (n+1)\dfrac{\delta}{n+1} = z + \delta$

– exactly what we are looking for.
35
Self-Adaptation of the Learning Rate

$W_{r+1} = W_r + \dfrac{\alpha_r}{(n+1)\,|z_r|}\left(\varepsilon^q - \dfrac{z_r}{|z_r|}\right)\bar{X}$

$1/|z_r|$ is a self-adaptive part of the learning rate; $|z_r|$ is the absolute value of the weighted sum on the previous (rth) iteration.

[Figure: $1/|z| < 1$ when $|z| > 1$, and $1/|z| > 1$ when $|z| < 1$]
36
Modified Learning Rules with the Self-Adaptive Learning Rate

Discrete MVN:
$W_{r+1} = W_r + \dfrac{\alpha_r}{(n+1)\,|z_r|}\,(\varepsilon^q - \varepsilon^s)\,\bar{X}$

Continuous MVN:
$W_{r+1} = W_r + \dfrac{\alpha_r}{(n+1)\,|z_r|}\left(\varepsilon^q - \dfrac{z_r}{|z_r|}\right)\bar{X}$

$1/|z_r|$ is a self-adaptive part of the learning rate.
37
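A minimal sketch of the self-adaptive variant for the continuous MVN (the discrete case differs only in using ε^q − ε^s as the error):

```python
# Self-adaptive continuous MVN update: the correction is additionally divided by |z_r|.
import cmath

def self_adaptive_update(w, x, desired, n):
    xv = (1,) + tuple(x)
    z = sum(wi * xi for wi, xi in zip(w, xv))          # weighted sum z_r
    delta = desired - z / abs(z)                       # error ε^q − z_r/|z_r|
    rate = 1.0 / ((n + 1) * abs(z))                    # 1/((n+1)|z_r|): self-adaptive learning rate
    return [wi + rate * delta * xi.conjugate() for wi, xi in zip(w, xv)]
```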
Convergence of the learning algorithm
• It is proven that the MVN learning algorithm converges after not more than k! iterations for the k-valued activation function.
• For the continuous MVN, the learning algorithm converges with precision λ after not more than (π/λ)! iterations, because in this case it is reduced to learning in π/λ-valued logic.
38
MVN as a model of
a biological neuron
The state of a biological neuron is determined by the frequency of the generated impulses; the amplitude of the impulses is always constant.

Excitation → high frequency
Intermediate state → medium frequency
No impulses → inhibition → zero frequency
39
MVN as a model of
a biological neuron
40
MVN as a model of
a biological neuron
[Figure: neuron states on the unit circle – maximal excitation at phase 0 (2π), maximal inhibition at phase π, intermediate states in between]
41
MVN as a model of
a biological neuron
[Figure: maximal inhibition at phase π, maximal excitation at phase 0 (2π)]
42
MVN:
• Learns faster
• Adapts better
• Learns even highly nonlinear functions
• Opens new, very promising opportunities for network design
• Is much closer to the biological neuron
• Allows using the Fourier phase spectrum as a source of features for solving different recognition/classification problems
• Allows using hybrid (discrete/continuous) inputs/output
43