Introduction to neural computation


Neural Computation
0368-4149-01
Prof. Nathan Intrator
TA: Yehudit Hasson
Tuesday 16:00-19:00 Dan David 111
Office hours: Wed 4-5
[email protected]
Neural Computation
• Neuroscience
– The objective is to understand the human brain
– Biologically realistic models of neurons
– Biologically realistic connection topologies
• Neural computation
– The objective is to develop learning, representation and
computation methods
– Novel architectures for data representation and processing
The goals of neural computation
• To understand how the brain actually works
– It's big, very complicated, and made of yucky stuff that
dies when you poke it around
• To understand a new style of computation
– Inspired by neurons and their adaptive connections
– Very different style from sequential computation
• Should be good for things that brains are good at (e.g. vision)
• Should be bad for things that brains are bad at (e.g. 23 x 71)
• To solve practical problems by using novel learning
algorithms
– Learning algorithms can be very useful even if they have
nothing to do with how the brain works
The Brain
The brain - that's my second most favorite organ!
- Woody Allen
The Brain: Fundamental Questions
• What kind of information is extracted from the
environment?
• How is information represented, e.g. visual?
• How is information stored?
• How is information altered (learning & memory)?
• How is information processed and manipulated?
The Brain: Simpler Questions
• How is 3D information stored?
• How is relational information stored:
– The child is on the floor
– The book is in the bag
• How are verbs associated with adjectives?
• How is information bound together:
– Collections of items which are on the table
– Collection of edges which form an object
Physiological experiments help us learn how a new scene is
analyzed; in particular, eye movements are used to learn
about the analysis strategy.
In this unseen set of images, it takes a very long time to
detect the change between the bear and the microscope.
How do we observe changes in familiar scenes so quickly?
Man versus Machine
(hardware)

Numbers                  | Human brain             | Von Neumann computer
# elements               | 10^10 - 10^12 neurons   | 10^7 - 10^8 transistors
# connections / element  | 10^3 - 10^4             | 10
switching frequency      | 10^3 Hz                 | 10^9 Hz
energy / operation       | 10^-16 Joule            | 10^-6 Joule
power consumption        | 10 Watt                 | 100 - 500 Watt
reliability of elements  | low                     | reasonable
reliability of system    | high                    | reasonable
Man versus Machine
(information processing)

Features             | Human brain  | Von Neumann computer
Data representation  | analog       | digital
Memory localization  | distributed  | localized
Control              | distributed  | localized
Processing           | parallel     | sequential
Skill acquisition    | learning     | programming

No memory management,
no hardware/software/data distinction
Brain Performance
Flies have a better stabilizing mechanism
than a Boeing 747
Their gyroscope is being studied
in a wind tunnel
http://www.kyb.mpg.de/publications/pdfs/pdf340.pdf
The bat’s external ears pick up both the emitted sounds and the
returning echoes, serving as the receiving antennas. Echo
delay estimation of about 20 nanoseconds!
Movies: Navigation
DARPA Robot Race
Dolphin’s sonar properties
• Send up to 200 clicks per second!
• Frequency range 15 kHz – 120 kHz
• Excellent sensor array (whole face)
• Discriminate between alloys of aluminum
• ‘See’ a tennis ball from 75 meters
• Distinguish between a penny and a dime from 3 meters
• Detect fish buried 0.5 meter underground
• Excellent shape discrimination (same material)
W. W. L. Au (1993) The sonar of dolphins. (Springer).
Brief Outline
• Unsupervised Learning
– Short bio motivation
– Unsupervised neuronal model
– Connection with Projection Pursuit and advanced feature
extraction
• Supervised Learning Schemes
– Perceptron and Multi-Layer Perceptron
– RBF, SVM, trees
– Training and optimization
• Model Selection and Validation (advanced training methods)
– Cross-validation, regularization, noise injection
– Ensembles
• Brain Machine Interface
– EEG, fMRI modalities
– Brain state interpretation based on machine learning models
– Recent research in BMI
Introduction to the Brain
By: Geoffrey Hinton
www.cs.toronto.edu/~hinton/csc321/notes/lec1.ppt
A typical cortical neuron
• Gross physical structure:
– There is one axon that branches
– There is a dendritic tree that collects input
from other neurons
• Axons typically contact dendritic trees at
synapses
– A spike of activity in the axon causes
charge to be injected into the post-synaptic
neuron
• Spike generation:
– There is an axon hillock that generates
outgoing spikes whenever enough charge
has flowed in at synapses to depolarize the
cell membrane
[Figure: a typical cortical neuron: axon, cell body, dendritic tree]
The synaptic junction
Synapses, Ca influx, release of neurotransmitter, opening
of post-synaptic channels
Some relevant terms
Axon, dendrite
Ion channels
Membrane rest potential
Action potential,
refractory period
The Biological Neuron
• 10 billion neurons in human brain
• Summation of input stimuli
– Spatial (signals)
– Temporal (pulses)
• Threshold over composed inputs
• Constant firing strength
• 10 billion synapses in human brain
• Chemical transmission and modulation
of signals
• Inhibitory synapses
• Excitatory synapses
Biological Neural Networks
• 10,000 synapses per neuron
• Computational power = connectivity
• Plasticity
– new connections (?)
– strength of connections modified
Neural Dynamics

[Figure: membrane potential (mV) vs. time (ms): rest potential, activation threshold, action potential, refractory period]

Action potential ≈ 100 mV
Activation threshold ≈ 20-30 mV
Rest potential ≈ -65 mV
Spike time ≈ 1-2 ms
Refractory time ≈ 10-20 ms
The Artificial Neuron

[Figure: inputs x1(t) ... x5(t) feed neuron i through weights wi1 ... wi5, producing output yi(t)]

Stimulus: u_i(t) = Σ_j w_ij x_j(t)
Response: y_i(t) = f(u_rest + u_i(t))

u_rest = resting potential
x_j(t) = output of neuron j at time t
w_ij = connection strength between neuron i and neuron j
u_i(t) = total stimulus at time t
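As a minimal sketch of the stimulus/response pair above (the weights, inputs, and resting potential below are made-up illustrative values):

```python
# Minimal sketch of the artificial neuron: the stimulus is a weighted
# sum of the inputs, and the response applies an activation function f
# to the resting potential plus the stimulus. f is left pluggable
# (identity here); all numbers are made-up illustrative values.

def stimulus(weights, inputs):
    # u_i(t) = sum_j w_ij * x_j(t)
    return sum(w * x for w, x in zip(weights, inputs))

def response(u, u_rest=-65.0, f=lambda z: z):
    # y_i(t) = f(u_rest + u_i(t))
    return f(u_rest + u)

w = [0.5, -1.0, 2.0]   # connection strengths w_ij
x = [2.0, 1.0, 3.0]    # outputs x_j(t) of the presynaptic neurons

u = stimulus(w, x)     # 1.0 - 1.0 + 6.0 = 6.0
y = response(u)        # -65.0 + 6.0 = -59.0
print(u, y)
```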
Artificial Neural Models
• McCulloch-Pitts-type Neurons (static)
– Digital neurons: activation state interpretation (snapshot of
the system each time a unit fires)
– Analog neurons: firing rate interpretation (activation of
units equal to firing rate)
– Activation of neurons encodes information
• Spiking Neurons (dynamic)
– Firing pattern interpretation (spike trains of units)
– Timing of spike trains encodes information (time to first
spike, phase of signal, correlation and synchronicity)
Binary Neurons

Stimulus: u_i = Σ_j w_ij x_j
Response: y_i = f(u_rest + u_i)

f(z) = ON   if z ≥ θ
       OFF  otherwise
θ = threshold

[Figure: “hard” threshold (Heaviside) activation: output jumps from off to on at the threshold]

• ex: Perceptrons, Hopfield NNs, Boltzmann Machines
• Main drawbacks: can only map binary functions,
biologically implausible.
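A minimal sketch of the hard-threshold unit above, encoding ON/OFF as +1/-1 (the weights, inputs, and threshold are made-up illustrative values):

```python
# Binary (hard-threshold) neuron: the weighted sum plus resting
# potential is compared against a threshold theta; the output is
# ON (+1) or OFF (-1). All numeric values are made-up examples.

def binary_neuron(weights, inputs, theta=0.0, u_rest=0.0):
    u = sum(w * x for w, x in zip(weights, inputs))  # u_i = sum_j w_ij x_j
    z = u_rest + u
    return 1 if z >= theta else -1                   # f(z): ON / OFF

print(binary_neuron([1.0, -0.5], [1.0, 1.0], theta=0.2))  # z = 0.5  -> 1 (ON)
print(binary_neuron([1.0, -0.5], [0.0, 1.0], theta=0.2))  # z = -0.5 -> -1 (OFF)
```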
Analog Neurons

Stimulus: u_i = Σ_j w_ij x_j
Response: y_i = f(u_rest + u_i)

f(z) = 2 / (1 + e^(-z)) - 1

[Figure: “soft” threshold (sigmoid) activation 2/(1+exp(-z)) - 1: output rises smoothly from off (-1) to on (+1)]

• ex: MLPs, Recurrent NNs, RBF NNs...
• Main drawbacks: difficult to process time patterns,
biologically implausible.
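A minimal sketch of the analog neuron with the soft threshold above (the weights and inputs are made-up illustrative values):

```python
import math

# Analog neuron with the "soft" threshold f(z) = 2/(1 + exp(-z)) - 1,
# which maps any stimulus smoothly into the off/on range (-1, 1).
# Weights and inputs are made-up illustrative values.

def soft_threshold(z):
    return 2.0 / (1.0 + math.exp(-z)) - 1.0

def analog_neuron(weights, inputs, u_rest=0.0):
    u = sum(w * x for w, x in zip(weights, inputs))  # u_i = sum_j w_ij x_j
    return soft_threshold(u_rest + u)                # y_i = f(u_rest + u_i)

print(soft_threshold(0.0))                      # 0.0: halfway between off and on
print(analog_neuron([2.0, 1.0], [1.0, -1.0]))   # f(1.0), roughly 0.462
```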
Spiking Neurons

Stimulus: u_i(t) = Σ_j w_ij x_j(t)

Response: y_i(t) = f( u_rest + η(t - t_f) + Σ_t ε_0(t, u_i) )

f(z) = ON   if z ≥ θ and dz/dt > 0
       OFF  otherwise

η = spike and after-spike potential
u_rest = resting potential
ε(t, u(t)) = trace at time t of the input
θ = threshold
x_j(t) = output of neuron j at time t
w_ij = efficacy of the synapse from neuron i to neuron j
u_i(t) = input stimulus at time t
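The firing condition above (emit a spike when the potential crosses the threshold from below) can be illustrated with a toy discrete-time simulation. This is only a sketch: the potential trace and threshold are made up, and dz/dt is approximated by the difference between successive samples.

```python
# Toy discrete-time illustration of the spiking condition: the unit
# emits a spike when its potential z crosses the threshold theta from
# below (z >= theta AND z is rising, approximating dz/dt > 0).
# The potential trace and theta are made-up illustrative values.

def spike_times(z_trace, theta):
    spikes = []
    for t in range(1, len(z_trace)):
        rising = z_trace[t] > z_trace[t - 1]            # dz/dt > 0
        crossed = z_trace[t] >= theta > z_trace[t - 1]  # crosses theta upward
        if rising and crossed:
            spikes.append(t)
    return spikes

# potential rises, spikes, resets, then rises and spikes again
z = [-65, -60, -50, -35, -70, -68, -55, -30, -72]
print(spike_times(z, theta=-40))   # fires at t = 3 and t = 7
```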
Spiking Neuron Dynamics

[Figure: neuron output over time: spikes y(t), threshold θ, and the trace u_rest + η(t - t_f) following each spike]
Hebb’s Postulate of Learning
When an axon of cell A is near enough to excite a cell B
and repeatedly or persistently takes part in firing it,
some growth process or metabolic change takes place
in one or both cells such that A’s efficiency as one of
the cells firing B is increased.
Hebb’s Postulate: revisited
• Stent (1973) and Changeux and Danchin (1976)
expanded Hebb’s rule so that it also models
inhibitory synapses:
1. If the two neurons on either side of a synapse are
activated simultaneously (synchronously), then the
strength of that synapse is selectively increased.
2. If the two neurons on either side of a synapse are
activated asynchronously, then that synapse is
selectively weakened or eliminated.
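The two-part rule above can be sketched as a toy weight update. The learning rate and the binary activity values are made-up illustrative choices, not part of the original postulate:

```python
# Sketch of the two-sided Hebbian rule: a synapse is strengthened when
# pre- and post-synaptic neurons fire together, and weakened when they
# fire asynchronously. Learning rate and activity values are made up.

def hebb_update(w, pre, post, lr=0.1):
    # pre/post are 1 (active) or 0 (silent) in a given time window
    if pre == 1 and post == 1:
        return w + lr          # synchronous activity: strengthen
    if pre != post:
        return w - lr          # asynchronous activity: weaken
    return w                   # both silent: no change

w = 0.5
w = hebb_update(w, pre=1, post=1)   # strengthened to about 0.6
w = hebb_update(w, pre=1, post=0)   # weakened back toward 0.5
print(w)
```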
Synapses
• When a spike travels along an axon and arrives at a synapse it
causes vesicles of transmitter chemical to be released
– There are several kinds of transmitter
• The transmitter molecules diffuse across the synaptic cleft and
bind to receptor molecules in the membrane of the post-synaptic
neuron thus changing their shape.
– This opens up holes that allow specific ions in or out.
• The effectiveness of the synapse can be changed
– vary the number of vesicles of transmitter
– vary the number of receptor molecules.
• Synapses are slow, but they have advantages over RAM
– Very small
– They adapt using locally available signals (but how?)
How the brain works
• Each neuron receives inputs from other neurons
- Some neurons also connect to receptors
- Cortical neurons use spikes to communicate
- The timing of spikes is important
• The effect of each input line on the neuron is controlled by a
synaptic weight
– The weights can be
positive or negative
• The synaptic weights adapt so that the whole network learns to
perform useful computations
– Recognizing objects, understanding language, making plans,
controlling the body
• You have about 10^11 neurons, each with about 10^3 weights
– A huge number of weights can affect the computation in a very
short time. Much better bandwidth than a Pentium.
Modularity and the brain
• Different bits of the cortex do different things.
– Local damage to the brain has specific effects
– Specific tasks increase the blood flow to specific regions.
• But cortex looks pretty much the same all over.
– Early brain damage makes functions relocate
• Cortex is made of general purpose stuff that has the ability to turn
into special purpose hardware in response to experience.
– This gives rapid parallel computation plus flexibility
– Conventional computers get flexibility by having stored
programs, but this requires very fast central processors to
perform large computations.
Idealized neurons
• To model things we have to idealize them (e.g. atoms)
– Idealization removes complicated details that are not
essential for understanding the main principles
– Allows us to apply mathematics and to make analogies to
other, familiar systems.
– Once we understand the basic principles, it’s easy to add
complexity to make the model more faithful
• It is often worth understanding models that are known to be
wrong (but we mustn’t forget that they are wrong!)
– E.g. neurons that communicate real values rather than
discrete spikes of activity.
Linear neurons
• These are simple but computationally limited
– If we can make them learn we may get insight into
more complicated neurons
y = b + Σ_i x_i w_i

y = output
b = bias
x_i = ith input
w_i = weight on the ith input
i = index over input connections

[Figure: output y as a linear function of the total input]
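A minimal sketch of the linear neuron above (the weights, bias, and inputs are made-up illustrative values):

```python
# Linear neuron: the output is the bias plus the weighted sum of the
# inputs, with no nonlinearity. All numbers are made-up examples.

def linear_neuron(x, w, b):
    # y = b + sum_i x_i w_i
    return b + sum(xi * wi for xi, wi in zip(x, w))

print(linear_neuron([1.0, 2.0, 3.0], [0.5, -0.25, 1.0], b=0.5))  # 3.5
```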
Binary threshold neurons
• McCulloch-Pitts (1943): influenced Von Neumann!
– First compute a weighted sum of the inputs from other
neurons
– Then send out a fixed size spike of activity if the weighted
sum exceeds a threshold.
– Maybe each spike is like the truth value of a proposition
and each neuron combines truth values to compute the truth
value of another proposition!
z = Σ_i x_i w_i

y = 1 if z ≥ θ
    0 otherwise

[Figure: step activation: y jumps from 0 to 1 at the threshold θ]
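The proposition idea above (each spike as a truth value) can be shown with a unit wired to compute logical AND. The unit weights and threshold θ = 2 are one conventional illustrative choice:

```python
# McCulloch-Pitts unit as a truth-value combiner: with unit weights
# and threshold theta = 2, the neuron fires exactly when both binary
# inputs are 1, i.e. it computes logical AND. The weights and
# threshold are a conventional illustrative choice.

def mp_neuron(x, w, theta):
    z = sum(xi * wi for xi, wi in zip(x, w))  # z = sum_i x_i w_i
    return 1 if z >= theta else 0             # fixed-size spike or nothing

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mp_neuron([a, b], w=[1, 1], theta=2))  # AND truth table
```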
Linear threshold neurons
These have a confusing name.
They compute a linear weighted sum of their inputs
The output is a non-linear function of the total input
z_j = b_j + Σ_i x_i w_ij

y_j = z_j if z_j > 0
      0 otherwise

[Figure: output is zero up to the threshold, then increases linearly with z]
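A minimal sketch of the linear threshold neuron above (today this rectification is usually called a ReLU; the weights, bias, and inputs are made-up illustrative values):

```python
# Linear threshold neuron: a linear weighted sum followed by a
# rectification that zeroes out negative totals. All numbers are
# made-up illustrative values.

def linear_threshold_neuron(x, w, b):
    z = b + sum(xi * wi for xi, wi in zip(x, w))  # z_j = b_j + sum_i x_i w_ij
    return z if z > 0 else 0.0                    # y_j = z_j if z_j > 0 else 0

print(linear_threshold_neuron([1.0, 2.0], [1.0, 1.0], b=-1.0))   # z = 2.0  -> 2.0
print(linear_threshold_neuron([1.0, 2.0], [-1.0, -1.0], b=1.0))  # z = -2.0 -> 0.0
```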
Sigmoid neurons
• These give a real-valued
output that is a smooth and
bounded function of their
total input.
– Typically they use the
logistic function
– They have nice
derivatives which make
learning easy (see lecture
4).
• If we treat y as the
probability of producing a
spike, we get stochastic
binary neurons.
z  b   xi wi
i
y
1

z
1 e
1
y
0.5
0
0
z
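A minimal sketch of the logistic neuron above; the "nice derivative" mentioned in the text is dy/dz = y(1 - y). The weights and inputs are made-up illustrative values:

```python
import math

# Logistic (sigmoid) neuron: weighted sum plus bias, squashed by the
# logistic function. Its derivative dy/dz = y * (1 - y) is what makes
# gradient-based learning convenient. All numbers are made-up values.

def sigmoid_neuron(x, w, b):
    z = b + sum(xi * wi for xi, wi in zip(x, w))  # z = b + sum_i x_i w_i
    return 1.0 / (1.0 + math.exp(-z))             # y = 1 / (1 + e^-z)

y = sigmoid_neuron([1.0, -2.0], [0.5, 0.25], b=0.0)  # z = 0 -> y = 0.5
dy_dz = y * (1.0 - y)                                # derivative at z = 0: 0.25
print(y, dy_dz)
```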
Types of connectivity
• Feedforward networks
– These compute a series of
transformations
– Typically, the first layer is the
input and the last layer is the
output.
• Recurrent networks
– These have directed cycles in
their connection graph. They
can have complicated
dynamics.
– More biologically realistic.
[Figure: feedforward network: input units → hidden units → output units]
Types of learning task
• Supervised learning
– Learn to predict output when given input vector
• Who provides the correct answer?
• Reinforcement learning
– Learn action to maximize payoff
• Not much information in a payoff signal
• Payoff is often delayed
• Unsupervised learning
– Create an internal representation of the input e.g.
form clusters; extract features
• How do we know if a representation is good?