Neural Networks 2nd Edition
Simon Haykin
Chap 1. Introduction
柯博昌
What is a Neural Network



A neural network is a massively parallel distributed
processor made up of simple processing units, which
has a natural propensity for storing experiential
knowledge and making it available for use.
Knowledge is acquired by the network from its
environment through a learning process. The
procedure used to carry out the learning process is
called a learning algorithm.
Interneuron connection strengths, known as synaptic
weights, are used to store the acquired knowledge.
Benefits of Neural Networks

The computing power of neural networks derives from:
– Its massively parallel distributed structure
– Its ability to learn and therefore generalize

Using neural networks offers the following properties:
– Nonlinearity
– Input-Output Mapping
– Adaptivity
– Evidential Response
– Contextual Information
– Fault Tolerance
– VLSI Implementability
– Uniformity of Analysis and Design
– Neurobiological Analogy

Supervised Learning: Modifying the synaptic weights by applying a set of training samples, each consisting of an input signal and the corresponding desired response.
Human Brain - Function Block

Block diagram representation of the human nervous system:
Stimulus → Receptors → Neural Net (Brain) → Effectors → Response
(forward signal flow from left to right, with feedback paths in the reverse direction)

Receptors: Convert stimuli from the human body or the external environment into electrical impulses that convey information to the brain.
Effectors: Convert electrical impulses generated by the brain into discernible responses as system outputs.
Comparisons: Neural Net. vs. Brain

Neurons are the structural constituents of the brain.
Neurons are five to six orders of magnitude slower than silicon logic gates (silicon gates: about 10⁻⁹ s per event, neural events: about 10⁻³ s).
There are roughly 10 billion neurons and 60 trillion synapses or connections in the human cortex.
Energetic efficiency:
– Brain: about 10⁻¹⁶ joules per operation per second.
– Best computers today: about 10⁻⁶ joules per operation per second.
Synapses

Synapses are elementary structural and functional units that mediate the interactions between neurons.
The most common kind of synapse is the chemical synapse.
The operation of a synapse:
– A pre-synaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons.
– The transmitter then acts on a post-synaptic process.
– A synapse thus converts a pre-synaptic electrical signal into a chemical signal and then back into a post-synaptic electrical signal (a nonreciprocal two-port device).
Pyramidal Cell
(figure)

Cytoarchitectural map of the cerebral cortex
(figure)
Nonlinear model of a neuron

(Figure: input signals x1, x2, ..., xm are weighted by the synaptic weights wk1, wk2, ..., wkm and combined at a summing junction together with the bias bk; the induced local field vk is passed through an activation function φ(·) to produce the output yk.)

$$u_k = \sum_{j=1}^{m} w_{kj} x_j$$
$$v_k = u_k + b_k$$
$$y_k = \varphi(v_k) = \varphi(u_k + b_k)$$

Letting $b_k = w_{k0}$ and $x_0 = +1$:
$$v_k = \sum_{j=0}^{m} w_{kj} x_j \qquad y_k = \varphi(v_k)$$
Nonlinear model of a neuron (Cont.)

(Figure: the affine transformation produced by the presence of a bias; the induced local field vk is plotted against the linear combiner's output uk for bk > 0, bk = 0, and bk < 0.)

Another nonlinear model of a neuron:

(Figure: the bias is treated as a synaptic weight wk0 = bk driven by a fixed input x0 = +1; the inputs x0, x1, ..., xm with weights wk0, wk1, ..., wkm feed the summing junction, which produces vk, and the activation function φ(·) gives the output yk. The synaptic weights now include the bias.)
Types of Activation Function

Threshold Function
$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$

Piecewise-Linear Function
$$\varphi(v) = \begin{cases} 1 & v \ge +1/2 \\ v & -1/2 < v < +1/2 \\ 0 & v \le -1/2 \end{cases}$$

Sigmoid Function
$$\varphi(v) = \frac{1}{1 + \exp(-av)}$$
where a is the slope parameter.

(Figures: plots of φ(v) versus v for the three activation functions; the sigmoid is plotted for increasing values of the slope parameter a.)
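For reference, here is an illustrative implementation of the three activation functions (not from the slides; the sample points and the values chosen for the slope parameter a are arbitrary):

```python
import numpy as np

def threshold(v):
    """Threshold (Heaviside) function: 1 if v >= 0, else 0."""
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    """Piecewise-linear function as defined on the slide:
    1 for v >= +1/2, v for -1/2 < v < +1/2, 0 for v <= -1/2."""
    return np.where(v >= 0.5, 1.0, np.where(v > -0.5, v, 0.0))

def sigmoid(v, a=1.0):
    """Logistic sigmoid with slope parameter a."""
    return 1.0 / (1.0 + np.exp(-a * v))

if __name__ == "__main__":
    v = np.linspace(-2.0, 2.0, 9)
    print(threshold(v))
    print(piecewise_linear(v))
    for a in (0.5, 1.0, 2.0):               # increasing slope parameter a
        print(a, sigmoid(v, a).round(3))
```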
Types of Activation Function (Cont.)

The activation functions defined above range from 0 to +1.
Sometimes an activation function that ranges from -1 to +1 is needed. How can it be obtained?
Let φ(·) denote an activation function ranging from 0 to +1 and φ'(·) the corresponding function ranging from -1 to +1. Then
$$\varphi'(\cdot) = 2\varphi(\cdot) - 1$$
Note: if φ(v) is the sigmoid function,
$$\varphi'(v) = \frac{2}{1 + \exp(-av)} - 1 = \frac{1 - \exp(-av)}{1 + \exp(-av)} = \tanh(av/2)$$
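A quick numerical check of the identity 2φ(v) - 1 = tanh(av/2) for the sigmoid (an illustrative sketch, not from the slides; the slope a and sample points are arbitrary):

```python
import numpy as np

a, v = 1.5, np.linspace(-3.0, 3.0, 7)       # arbitrary slope parameter and sample points
lhs = 2.0 / (1.0 + np.exp(-a * v)) - 1.0    # 2 * phi(v) - 1 with phi the logistic sigmoid
rhs = np.tanh(a * v / 2.0)                  # tanh(a * v / 2)
print(np.allclose(lhs, rhs))                # True: the two expressions agree
```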
Stochastic Model of a Neuron

The model above is deterministic, in that its input-output behavior is precisely defined.
Some applications of neural networks base the analysis on a stochastic neuronal model.
Let x denote the state of the neuron and P(v) the probability of firing, where v is the induced local field of the neuron:
$$x = \begin{cases} +1 & \text{with probability } P(v) \\ -1 & \text{with probability } 1 - P(v) \end{cases}$$
A standard choice for P(v) is the sigmoid-shaped function
$$P(v) = \frac{1}{1 + \exp(-v/T)}$$
where T is a pseudo-temperature used to control the noise level and therefore the uncertainty in firing.
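A minimal sketch of this stochastic firing rule (not from the slides; the values of v and T are arbitrary), showing how the pseudo-temperature T controls the uncertainty in firing:

```python
import numpy as np

def stochastic_neuron_state(v, T, rng):
    """Return +1 with probability P(v) = 1/(1 + exp(-v/T)), else -1."""
    p_fire = 1.0 / (1.0 + np.exp(-v / T))
    return 1 if rng.random() < p_fire else -1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = 0.5                                   # example induced local field
    for T in (0.1, 1.0, 10.0):                # low T -> nearly deterministic, high T -> noisy
        states = [stochastic_neuron_state(v, T, rng) for _ in range(10000)]
        print(T, np.mean(np.array(states) == 1))   # empirical firing frequency
```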
Neural Network  Directed Graph
Synaptic Links
Activation Links
wkj
xj
j(×)
xj
yk=wkjxj
yk=j(xj)
yi
Synaptic Convergence
(fan-in)
Synaptic Divergence
(fan-out)
yk=yi+yj
yj
xj
xj
xj
14
Signal-flow Graph of a Neuron

(Figure: signal-flow graph of a neuron; the inputs x0 = +1, x1, ..., xm are weighted by wk0 = bk, wk1, ..., wkm, summed to form vk, and passed through φ(·) to give the output yk.)
Feedback

Feedback plays a major role in recurrent networks.

(Figure: single-loop feedback system; the input xj(n) is added to the fed-back signal B[yk(n)] to form xj'(n), which is applied to the forward operator A to produce the output yk(n).)

$$y_k(n) = A[x_j'(n)]$$
$$x_j'(n) = x_j(n) + B[y_k(n)]$$
$$y_k(n) = \frac{A}{1 - AB}\,[x_j(n)]$$
where A and B act as operators.

A/(1 - AB) is referred to as the closed-loop operator, and AB as the open-loop operator.
In general, AB ≠ BA.
Feedback (Cont.)

Let A be a fixed weight, w, and B a unit-delay operator, z⁻¹. Then
$$\frac{A}{1 - AB} = \frac{w}{1 - w z^{-1}} = w\,(1 - w z^{-1})^{-1}$$
Using the Taylor (binomial) expansion,
$$(1 - w z^{-1})^{-1} = \sum_{l=0}^{\infty} w^{l} z^{-l}$$
so that
$$\frac{A}{1 - AB} = w \sum_{l=0}^{\infty} w^{l} z^{-l}$$
$$y_k(n) = w \sum_{l=0}^{\infty} w^{l} z^{-l}[x_j(n)]$$
Since $z^{-l}[x_j(n)] = x_j(n - l)$,
$$y_k(n) = \sum_{l=0}^{\infty} w^{l+1}\, x_j(n - l)$$
Time Responses for different weights, w

(Figure: time responses yk(n) for w > 1, w = 1, and w < 1; each response starts at w·xj(0) at n = 0 and then grows exponentially, stays constant, or decays exponentially, respectively.)

Conclusions:
1. If |w| < 1, yk(n) converges exponentially; the system is stable.
2. If |w| ≥ 1, yk(n) diverges; the system is unstable.

Think about:
1. How does the time response change if -1 < w < 0?
2. How does the time response change if w ≤ -1?
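As a sanity check on the expansion y_k(n) = Σ w^(l+1) x_j(n-l) and on the conclusions above, here is an illustrative sketch (not from the slides; the input sequences and the chosen values of w are arbitrary). It runs the feedback loop recursively, compares it with the truncated sum, and prints the impulse response for a stable, a marginal, and an unstable weight:

```python
import numpy as np

def feedback_recursive(x, w):
    """Run the single-loop feedback system y(n) = w * (x(n) + y(n-1)), with y(-1) = 0."""
    y = np.zeros(len(x))
    prev = 0.0
    for n in range(len(x)):
        y[n] = w * (x[n] + prev)
        prev = y[n]
    return y

def feedback_closed_form(x, w):
    """y(n) = sum_{l=0}^{n} w^(l+1) * x(n - l), the truncated series expansion."""
    return np.array([sum(w ** (l + 1) * x[n - l] for l in range(n + 1))
                     for n in range(len(x))])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(20)                 # arbitrary input sequence
    impulse = np.zeros(6)
    impulse[0] = 1.0                            # x_j(n) = delta(n), so y_k(n) = w^(n+1)
    for w in (0.5, 1.0, 2.0):                   # |w| < 1 stable, w = 1 marginal, w > 1 unstable
        same = np.allclose(feedback_recursive(x, w), feedback_closed_form(x, w))
        print(w, same, feedback_recursive(impulse, w))   # decaying, constant, growing
```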
Network Architectures

Single-Layer Feedforward Networks: an input layer of source nodes projects directly onto an output layer of neurons.

Multilayer Feedforward Networks: an input layer of source nodes, one or more layers of hidden neurons, and a layer of output neurons.

Fully Connected: every node in each layer is connected to every other node in the adjacent forward layer. Otherwise, the network is Partially Connected.
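To illustrate how a multilayer feedforward network composes the neuron model layer by layer, here is a minimal sketch (not from the slides; the sigmoid activation, the layer sizes, and the random weights are my own choices):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def feedforward(x, weights, biases):
    """Forward pass through a fully connected feedforward network.
    weights[l] has shape (n_out, n_in) for layer l; biases[l] has shape (n_out,)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)       # each layer applies v = W a + b, then phi(v)
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [4, 3, 2]                # 4 source nodes, 3 hidden neurons, 2 output neurons
    weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
    biases = [rng.standard_normal(sizes[i + 1]) for i in range(len(sizes) - 1)]
    x = rng.standard_normal(sizes[0])            # an example input vector
    print(feedforward(x, weights, biases))
```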
Network Architectures (Cont.)

Recurrent Networks with no self-feedback loops and no hidden neurons: the output of each neuron is fed back, through unit-delay operators z⁻¹, as input to the other neurons.

Recurrent Networks with hidden neurons: feedback connections originate from the hidden neurons as well as from the output neurons, again through unit-delay operators z⁻¹.
Knowledge Representation

Primary characteristics of knowledge representation:
– What information is actually made explicit
– How the information is physically encoded for subsequent use

Knowledge is goal directed.
A good solution depends on a good representation of knowledge.
A set of input-output pairs, with each pair consisting of an input signal and the corresponding desired response, is referred to as a set of training data or a training sample.
Rules for Knowledge Representation

Rule 1: Similar inputs from similar classes should usually produce similar representations inside the network.

Measuring similarity, with $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^T$ and $\mathbf{x}_j = [x_{j1}, x_{j2}, \ldots, x_{jm}]^T$:

(1) Euclidean distance:
$$d(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\| = \left[\sum_{k=1}^{m} (x_{ik} - x_{jk})^2\right]^{1/2}, \qquad \text{Similarity} \propto \frac{1}{d(\mathbf{x}_i, \mathbf{x}_j)}$$

(2) Inner product:
$$(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j = \sum_{k=1}^{m} x_{ik} x_{jk}, \qquad \cos(\theta) = \frac{\mathbf{x}_i^T \mathbf{x}_j}{\|\mathbf{x}_i\|\,\|\mathbf{x}_j\|}$$
where θ is the angle between $\mathbf{x}_i$ and $\mathbf{x}_j$.

If $\|\mathbf{x}_i\| = 1$ and $\|\mathbf{x}_j\| = 1$:
$$d^2(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^T(\mathbf{x}_i - \mathbf{x}_j) = 2 - 2\,\mathbf{x}_i^T \mathbf{x}_j$$
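Both similarity measures, and the relation d² = 2 - 2·xiᵀxj for unit-length vectors, can be checked with a short sketch (illustrative, not from the slides; the random vectors are arbitrary):

```python
import numpy as np

def euclidean_distance(xi, xj):
    """d(xi, xj) = ||xi - xj||."""
    return np.linalg.norm(xi - xj)

def inner_product(xi, xj):
    """(xi, xj) = xi^T xj."""
    return float(np.dot(xi, xj))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xi = rng.standard_normal(5)
    xj = rng.standard_normal(5)
    xi /= np.linalg.norm(xi)                   # normalize both vectors to unit length
    xj /= np.linalg.norm(xj)
    d2 = euclidean_distance(xi, xj) ** 2
    print(np.isclose(d2, 2.0 - 2.0 * inner_product(xi, xj)))   # d^2 = 2 - 2 xi^T xj
```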
Rules for Knowledge Representation (Cont.)

Rule 2: Items to be categorized as separate classes should be given widely different representations in the network. (This is the exact opposite of Rule 1.)

Rule 3: If a particular feature is important, then there should be a large number of neurons involved in the representation of that item.

Rule 4: Prior information and invariance should be built into the design of a neural network, thereby simplifying the network design by not having to learn them.
How to Build Prior Information into Neural Network Design

Ex:
– Restricting the network architecture through the use of local connections known as receptive fields.
– Constraining the choice of synaptic weights through the use of weight-sharing.

$$v_j = \sum_{i=1}^{6} w_i\, x_{i+j-1}, \qquad j = 1, 2, 3, 4$$

This weighted sum is a convolution sum; a network that combines local connections and weight-sharing in this way is called a convolution network. Here x1, ..., x6 constitute the receptive field for hidden neuron 1, and so on for the other hidden neurons.
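To illustrate the weight-sharing constraint, here is a minimal sketch (not from the slides) in which the same six weights are applied to every hidden neuron's receptive field; the input size of 9 is implied by the formula with j = 1, ..., 4, and the weight and input values are arbitrary:

```python
import numpy as np

def shared_weight_local_fields(x, w):
    """v_j = sum_{i=1}^{len(w)} w_i * x_{i+j-1} for each valid shift j.
    With len(x) = 9 and len(w) = 6 this gives the four hidden-neuron local fields."""
    m = len(w)
    return np.array([np.dot(w, x[j:j + m]) for j in range(len(x) - m + 1)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(9)                 # source nodes x1, ..., x9
    w = rng.standard_normal(6)                 # six shared weights (one receptive field)
    v = shared_weight_local_fields(x, w)
    print(v)                                    # local fields v1, ..., v4
    print(np.allclose(v, np.convolve(x, w[::-1], mode="valid")))  # same as a convolution sum
```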
Artificial Intelligence (AI)

Goal: Developing paradigms or algorithms that require machines to perform cognitive tasks.

An AI system must be capable of doing three things:
– Store knowledge
– Apply the knowledge stored to solve problems
– Acquire new knowledge through experience

Key components:
– Representation
– Reasoning
– Learning

(Figure: the three key components, Representation, Learning, and Reasoning, shown as interacting parts of an AI system.)