Forward prediction error Backward predictions


Brain Meeting at the CHUV
Free energy and affordance
Karl Friston
Abstract
In this presentation, I will rehearse the free-energy formulation of action and perception, with a special focus on the
representation of uncertainty: The free-energy principle is based upon the notion that both action and perception are trying to
minimise the surprise (prediction error) associated with sensory input. In this scheme, perception is the process of optimising
sensory predictions by adjusting internal brain states and connections; while action is regarded as an adaptive sampling of
sensory input to ensure it conforms to perceptual predictions (this is known as active inference). Both action and perception
rest on an optimum representation of uncertainty, which corresponds to the precision of prediction error. Neurobiologically, this
may be encoded by the postsynaptic gain of prediction error units. I hope to illustrate the plausibility of this framework using
simple simulations of cued, sequential, movements. Crucially, the predictions driving movements are based upon a
hierarchical generative model that infers the context in which movements are made. This means that we can temporarily
confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal)
simulation of contextual uncertainty and set-switching that can be characterised in terms of behaviour and electrophysiological
responses. Interestingly, one can lesion the encoding of precision (postsynaptic gain) to produce pathological behaviours that
are reminiscent of those seen in Parkinson's disease. I will use this as a toy example of how information theoretic approaches
to uncertainty may help understand action selection and set-switching.
or precision and uncertainty, ansen seale (2005)
“Objects are always imagined as being present in the field of vision as
would have to be there in order to produce the same impression on
the nervous mechanism” - Hermann Ludwig Ferdinand von Helmholtz
Richard Gregory
Geoffrey Hinton
From the Helmholtz machine to the
Bayesian brain and self-organization
Thomas Bayes
Richard Feynman
Hermann Haken
Overview
Ensemble dynamics
  Entropy and equilibria
  Free-energy and surprise
The free-energy principle
  Perception and generative models
  Hierarchies and predictive coding
Perception
  Birdsong and categorization
  Simulated lesions and perceptual uncertainty
Action
  Cued reaching and affordance
  Simulated lesions and behavioral uncertainty
temperature
What is the difference between a
snowflake and a bird?
Phase-boundary
…a bird can move (to avoid surprises)
What is the difference between snowfall and a flock of birds?
Ensemble dynamics, clumping and swarming
…birds (biological agents) stay in the same place
They resist the second law of thermodynamics, which says that their entropy
should increase
But what is the entropy?
…entropy is just average surprise
$$H = \int_0^T dt\, L(t) = -\int p(s \mid m) \ln p(s \mid m)\, ds$$
$$L = -\ln p(s \mid m)$$
$$s = g(\vartheta)$$
High surprise (I am never here)
Low surprise (we are usually here)
This means biological agents self-organize to minimise surprise. In other words, to
ensure they occupy a limited number of states (cf homeostasis).
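A minimal sketch of "entropy is just average surprise", assuming a discrete set of states. The state names and occupancy probabilities are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import math

def surprise(p_state):
    """Surprise (negative log probability) of a state, in nats."""
    return -math.log(p_state)

def entropy(dist):
    """Entropy as the probability-weighted average of surprise."""
    return sum(p * surprise(p) for p in dist.values() if p > 0.0)

# Hypothetical occupancy probabilities for an agent that stays in a few states.
occupancy = {"usual": 0.90, "occasional": 0.09, "never_here": 0.01}
# A dispersed ensemble visits every state equally often.
uniform = {name: 1.0 / len(occupancy) for name in occupancy}

rare = surprise(occupancy["never_here"])   # high surprise
common = surprise(occupancy["usual"])      # low surprise
h_agent = entropy(occupancy)
h_uniform = entropy(uniform)
```

An agent that concentrates its occupancy on a few states has lower entropy than the uniform (dispersed) case, which is the sense in which minimising surprise keeps the number of occupied states small.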
But there is a small problem… agents cannot measure their surprise
$$s = g(\vartheta)$$
?
But they can measure their free-energy, which is always bigger than surprise
$$F(t) \ge L(t)$$
This means agents should minimize their free-energy. So what is free-energy?
What is free-energy?
…free-energy is basically prediction error
sensations – predictions
= prediction error
where small errors mean low surprise
More formally,
Sensations: $s = g(\vartheta) + \omega$
External states in the world: $\dot{\vartheta} = f(\vartheta, a) + \omega$
Internal states of the agent (m): $\mu = \arg\min_{\mu} F(s, \mu)$
Action: $a = \arg\min_{a} F(s, \mu)$
Free-energy is a function of sensations and a proposal density over hidden causes
$$F(s, \mu) = \text{Energy} - \text{Entropy} = E_q(G) + E_q(\ln q(\vartheta \mid \mu))$$
and can be evaluated, given a generative model (Gibbs Energy) or likelihood and prior:
$$G(s, \vartheta) = -\ln p(s, \vartheta \mid m) = -\ln p(s \mid \vartheta, m) - \ln p(\vartheta \mid m)$$
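The bound of free-energy on surprise can be checked numerically. The sketch below assumes a hypothetical model with two discrete hidden causes (the priors and likelihoods are invented for illustration) and evaluates free-energy as expected Gibbs energy plus the expected log of the proposal density q.

```python
import math

# Hypothetical generative model m with two hidden causes (illustrative values).
prior = {"A": 0.7, "B": 0.3}        # p(cause | m)
likelihood = {"A": 0.2, "B": 0.9}   # p(s | cause, m) for the observed sensation s

def gibbs_energy(cause):
    """G = -ln p(s | cause, m) - ln p(cause | m)."""
    return -math.log(likelihood[cause]) - math.log(prior[cause])

def free_energy(q):
    """F = E_q(G) + E_q(ln q) for a discrete proposal density q."""
    energy = sum(q[c] * gibbs_energy(c) for c in q if q[c] > 0.0)
    neg_entropy = sum(q[c] * math.log(q[c]) for c in q if q[c] > 0.0)
    return energy + neg_entropy

evidence = sum(prior[c] * likelihood[c] for c in prior)   # p(s | m)
surprise = -math.log(evidence)
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}

f_prior = free_energy(prior)       # proposal density set to the prior
f_post = free_energy(posterior)    # proposal density set to the true posterior
```

With any proposal density the free-energy is at least the surprise; when q equals the posterior over causes the two coincide, which is why minimising free-energy with respect to internal states makes the bound tight.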
So what models might the brain use?
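Hierarchical models in the brain
$$v^{(i-1)} = g(x^{(i)}, v^{(i)}, \theta^{(i)}) + \omega^{(v,i)}$$
$$D x^{(i)} = f(x^{(i)}, v^{(i)}, \theta^{(i)}) + \omega^{(x,i)}$$
[Figure: two-level hierarchy above the sensory input $s$, with states $\tilde{x}^{(1)}$, $\tilde{v}^{(1)}$, $\tilde{x}^{(2)}$, $\tilde{v}^{(2)}$; connections are labelled forward (driving), backward (modulatory) and lateral]
$$\vartheta \supset \{x(t), v(t), \theta, \gamma\}$$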
So how do prediction errors change predictions?
sensory input
Forward connections convey feedback
Adjust hypotheses
Prediction errors
Predictions
prediction
Backward connections return predictions
…by hierarchical message passing in the brain
David Mumford
More formally,
Synaptic activity and message-passing
Forward prediction error
$$\xi^{(v,i)} = \Pi^{(v,i)} \varepsilon^{(v,i)} = \Pi^{(v,i)} (\mu^{(v,i-1)} - g^{(i)})$$
$$\xi^{(x,i)} = \Pi^{(x,i)} \varepsilon^{(x,i)} = \Pi^{(x,i)} (D\mu^{(x,i)} - f^{(i)})$$
Backward predictions
$$\dot{\mu}^{(v,i)} = D\mu^{(v,i)} - \partial_v \varepsilon^{(i)T} \xi^{(i)} - \xi^{(v,i+1)}$$
$$\dot{\mu}^{(x,i)} = D\mu^{(x,i)} - \partial_x \varepsilon^{(i)T} \xi^{(i)}$$
cf., Predictive coding or Kalman-Bucy filtering
[Figure: hierarchy of error units and state units above the input $s(t)$, with arrows labelled forward prediction error and backward predictions]
[Figure: cortical layers I to VI of a low level macrocolumn and a high level macrocolumn, with superficial pyramidal cells, deep pyramidal cells, forward prediction error, backward predictions and the input $s(t)$. Inset: presynaptic terminals and a dendritic spine with excitatory (AMPA), modulatory (D1) and inhibitory (GABAA) receptors]
$$\xi^{(i,v)} = \Pi^{(i,v)} (\mu^{(i-1,v)} - f(\mu^{(i,x)}, \mu^{(i,v)}))$$
$$\xi^{(i,x)} = \Pi^{(i,x)} (D\mu^{(i,x)} - f(\mu^{(i,x)}, \mu^{(i,v)}))$$
Summary
Biological agents resist the second law of thermodynamics
They must minimize their average surprise (entropy)
They minimize surprise by suppressing prediction error (free-energy)
Prediction error can be reduced by changing predictions (perception)
Prediction error can be reduced by changing sensations (action)
Perception entails recurrent message passing in the brain to optimise predictions
Predictions depend upon the precision of prediction errors
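The perceptual half of this summary can be sketched for a single static level. This is a simplified illustration and not the scheme used for the simulations in the talk: it assumes a linear mapping g(mu) = theta * mu, Gaussian noise, no dynamics (so the D term drops out), and hypothetical values for theta, the precisions and the step size.

```python
def predictive_coding(s, eta, theta=2.0, pi_s=4.0, pi_v=1.0, rate=0.05, steps=500):
    """Gradient descent of an expectation mu on precision-weighted prediction error.

    s     : sensory input
    eta   : prior expectation from the level above
    theta : assumed slope of the linear prediction g(mu) = theta * mu
    pi_s, pi_v : assumed precisions of the sensory and prior prediction errors
    """
    mu = eta
    for _ in range(steps):
        xi_s = pi_s * (s - theta * mu)   # weighted sensory prediction error
        xi_v = pi_v * (mu - eta)         # weighted error with respect to the prior
        # The sensory error pushes mu towards explaining s; the prior error pulls it back.
        mu += rate * (theta * xi_s - xi_v)
    return mu

s, eta = 3.0, 0.0
mu = predictive_coding(s, eta)
# Closed-form minimum of the quadratic free-energy for this linear Gaussian case.
exact = (2.0 * 4.0 * s + 1.0 * eta) / (2.0 ** 2 * 4.0 + 1.0)
```

The recurrent exchange of errors and predictions settles on the precision-weighted compromise between the sensory evidence and the prior, which is how precision determines the resulting prediction.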
Making bird songs with Lorenz attractors
Causal states: $v = (v_1, v_2)^T$
Hidden states:
$$f^{(1)} = \begin{bmatrix} 18 x_2^{(1)} - 18 x_1^{(1)} \\ v_1^{(1)} x_1^{(1)} - 2 x_3^{(1)} x_1^{(1)} - x_2^{(1)} \\ 2 x_1^{(1)} x_2^{(1)} - v_2^{(1)} x_3^{(1)} \end{bmatrix}$$
[Figure: vocal centre driving the syrinx; sonogram of frequency against time (sec)]
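A minimal sketch of the Lorenz-type equations of motion on this slide, integrated with a simple Euler scheme. The values of the causal states, the initial state, the step size and the duration are assumptions for illustration; the talk does not specify an integration scheme.

```python
def flow(x, v):
    """Equations of motion f for hidden states x = (x1, x2, x3) given causes v = (v1, v2)."""
    x1, x2, x3 = x
    v1, v2 = v
    return (
        18.0 * x2 - 18.0 * x1,
        v1 * x1 - 2.0 * x3 * x1 - x2,
        2.0 * x1 * x2 - v2 * x3,
    )

def simulate(v=(32.0, 8.0 / 3.0), x0=(1.0, 1.0, 10.0), dt=0.001, steps=2000):
    """Euler integration; returns the list of visited states (illustrative settings)."""
    x = x0
    path = [x]
    for _ in range(steps):
        dx = flow(x, v)
        x = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
        path.append(x)
    return path

path = simulate()
```

Changing the causal states v changes the shape of the attractor and therefore the trajectory of the hidden states.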
Predictive coding and message passing
[Figure: stimulus sonogram $s(t)$ (frequency in Hz against time in seconds), with panels for prediction and error, hidden states and causal states; arrows labelled forward prediction error and backward predictions]
Perceptual categorization
[Figure: Song a, Song b and Song c, each plotted against time (seconds), and their locations in the space of the two causal states]
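Hierarchical (itinerant) birdsong: sequences of sequences
[Figure: neuronal hierarchy driving the syrinx; sonogram of frequency (KHz) against time (sec)]
Second level:
$$f^{(2)} = \begin{bmatrix} 18 x_2^{(2)} - 18 x_1^{(2)} \\ 32 x_1^{(2)} - 2 x_3^{(2)} x_1^{(2)} - x_2^{(2)} \\ 2 x_1^{(2)} x_2^{(2)} - \tfrac{8}{3} x_3^{(2)} \end{bmatrix}, \qquad g^{(2)} = \begin{bmatrix} x_2^{(2)} \\ x_3^{(2)} \end{bmatrix} = \begin{bmatrix} v_1^{(1)} \\ v_2^{(1)} \end{bmatrix}$$
First level:
$$f^{(1)} = \begin{bmatrix} 18 x_2^{(1)} - 18 x_1^{(1)} \\ v_1^{(1)} x_1^{(1)} - 2 x_3^{(1)} x_1^{(1)} - x_2^{(1)} \\ 2 x_1^{(1)} x_2^{(1)} - v_2^{(1)} x_3^{(1)} \end{bmatrix}, \qquad g^{(1)} = \begin{bmatrix} x_2^{(1)} \\ x_3^{(1)} \end{bmatrix} = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$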
Simulated lesions and false inference
$$\dot{\mu}^{(v,i)} = D\mu^{(v,i)} - \partial_v \varepsilon^{(i)T} \xi^{(i)} - \xi^{(v,i+1)}$$
$$\dot{\mu}^{(x,i)} = D\mu^{(x,i)} - \partial_x \varepsilon^{(i)T} \xi^{(i)}$$
[Figure: three rows, each showing the percept (sonogram, frequency in Hz against time in seconds) and the LFP (micro-volts against peristimulus time in ms). Row labels: no top-down messages (no structural priors); no lateral messages (no dynamical priors)]
From reflexes to action
[Figure: spinal reflex arc. Labels: prediction $g(\mu)$, dorsal horn, dorsal root, sensation $s(a)$, action, ventral root, ventral horn]
$$\xi = \Pi (s(a) - g(\mu))$$
$$\dot{a} = -\partial_a \varepsilon^{T} \xi$$
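A minimal sketch of action as a reflex that suppresses prediction error, assuming the error is the sensed signal minus its prediction and that action descends the gradient of the precision-weighted error. The linear plant `sensation(a)`, the precision and the step size are hypothetical choices for illustration only.

```python
def sensation(a):
    """Hypothetical plant: proprioceptive signal produced by action a."""
    return 2.0 * a

def d_sensation_da(a):
    """Sensitivity of the sensed signal to action (constant for this linear plant)."""
    return 2.0

def reflex(prediction, precision=1.0, rate=0.1, steps=200):
    """Change action until the sensed signal matches the descending prediction."""
    a = 0.0
    for _ in range(steps):
        xi = precision * (sensation(a) - prediction)   # weighted prediction error
        a -= rate * d_sensation_da(a) * xi             # descend the error gradient
    return a

a_final = reflex(prediction=1.0)
```

Here the prediction acts like a set point: action does not need an inverse model, it only needs to know how sensations change with action.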
Sensorimotor contingencies and schema
Proprioception
Exteroception
Proprioceptive forward model
Proprioceptive cues
Visual cues
Easy inverse problem
Classical reflex arc
Motor commands
Visual forward model
Autonomous behaviour and prior beliefs:
Lotka-Volterra dynamics: winnerless competition
Misha Rabinovich
$$\varphi(x, v) = A(v)\,(1 + e^{-x})^{-1} - \tfrac{1}{8} x + 1$$
$$s(x) = \frac{e^{x}}{\mathbf{1}^{T} e^{x}}$$
[Equation residue: $A(v)$ combines a 4 × 4 matrix with entries 0 and $v$ and the 4 × 4 matrix with zeros on the diagonal and ones elsewhere; a 2 × 4 matrix with entries of magnitude 1 multiplies $s(x)$. The signs of these entries do not survive in the transcript.]
Active inference
[Figure: network of prediction and error units. Labels: prefrontal cortex (changes in set), striatum (set selection), premotor cortex (affordance), motor cortex (joint positions), parietal cortex (finger location), superior colliculus (salience), motoneurones (action $a$); sensory inputs $s_v$, $s_a$, $s_p$]
Anatol Feldman
Model and real world
Set switching (hidden causes):
$$\dot{x}^{(2)} = \varphi(x^{(2)}, \tfrac{1}{16}) + \omega^{(x,2)}$$
$$v^{(1)} = s(x^{(2)}) + \omega^{(v,2)}$$
Action selection (changes in joint positions and affordance), extrinsic locations
[Equation residue: the first-level hidden states are joint positions $x_p^{(1)}$ and affordance $x_a^{(1)}$. In both the model and the real world, the sensory mapping returns joint positions $s_p = x_p^{(1)}$, finger location $s_v = \tan(x_p^{(1)})$ and target salience $s_a$, each with added noise. In the real world, joint motion depends on action through $\tanh(a)$. The remaining terms are not recoverable from the transcript.]
Dopamine and precision
$$\xi^{(v,i)} = \Pi^{(v,i)} \varepsilon^{(v,i)} = \Pi^{(v,i)} (\mu^{(v,i-1)} - g^{(i)})$$
$$\xi^{(x,i)} = \Pi^{(x,i)} \varepsilon^{(x,i)} = \Pi^{(x,i)} (D\mu^{(x,i)} - f^{(i)})$$
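To illustrate how precision acts as a gain on prediction error, the sketch below computes the steady state of a single Gaussian level as a precision-weighted average of a sensory sample and a prior expectation. This is a simplification and not the simulation from the talk; the log-precisions 5.0, 3.5 and 2.5 echo the exp(5.0), exp(3.5) and exp(2.5) labels in the slides, and the fixed prior log-precision of 3.5 is an assumption.

```python
import math

def posterior_mean(observation, prior_mean, log_pi_sensory, log_pi_prior=3.5):
    """Precision-weighted average of a sensory sample and a prior expectation."""
    pi_s = math.exp(log_pi_sensory)   # gain on the sensory prediction error
    pi_p = math.exp(log_pi_prior)     # gain on the prior prediction error (assumed)
    return (pi_s * observation + pi_p * prior_mean) / (pi_s + pi_p)

observation, prior_mean = 1.0, 0.0
estimates = {
    log_pi: posterior_mean(observation, prior_mean, log_pi)
    for log_pi in (5.0, 3.5, 2.5)
}
```

As the sensory precision falls, the same prediction error moves the estimate less, so prior beliefs dominate; this is the sense in which reducing the gain can leave an agent slow to update on new cues.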
[Figure: the active inference network with dopaminergic projections from SN/VTA. Labels: mesocortical DA projections, nigrostriatal DA projections, mesorhombencephalic pathway; motor cortex, premotor cortex, parietal cortex, prefrontal cortex, striatum, superior colliculus, motoneurones; sensory inputs $s_v$, $s_a$, $s_p$ and action $a$]
[Figure: three simulations with precision (subscript $a$, superscript $(1,v)$) set to $\exp(5.0)$, $\exp(3.5)$ and $\exp(2.5)$; each shows hidden causes against time and a trajectory plot]
Uncertainty and perseveration
[Figure: reaction times (milliseconds) against cue onset (sec) for low DA and high DA, in three panels labelled salience, proprioception and affordance; each panel is paired with a network diagram (motor cortex, premotor cortex, superior colliculus, SN/VTA)]
Uncertainty, delusions and confusion
[Figure: two network diagrams (motor cortex, premotor cortex, superior colliculus, SN/VTA) labelled perseveration and confusion]
Thank you
And thanks to collaborators:
Rick Adams
Sven Bestmann
Harriet Brown
Jean Daunizeau
Lee Harrison
Stefan Kiebel
James Kilner
Jérémie Mattout
Klaas Stephan
And colleagues:
Peter Dayan
Jörn Diedrichsen
Paul Verschure
Florentin Wörgötter
And many others
