The 27th Conference on Uncertainty in Artificial Intelligence
uai2011
Active Inference and Uncertainty
Karl Friston
Abstract
In this presentation, I will rehearse the free-energy formulation of action and perception, with a special focus on the
representation of uncertainty: The free-energy principle is based upon the notion that both action and perception are trying to
minimise the surprise (prediction error) associated with sensory input. In this scheme, perception is the process of optimising
sensory predictions by adjusting internal brain states and connections; while action is regarded as an adaptive sampling of
sensory input to ensure it conforms to perceptual predictions (this is known as active inference). Both action and perception
rest on an optimum representation of uncertainty, which corresponds to the precision of prediction error. Neurobiologically, this
may be encoded by the postsynaptic gain of prediction error units. I hope to illustrate the plausibility of this framework using
simple simulations of cued, sequential, movements. Crucially, the predictions driving movements are based upon a
hierarchical generative model that infers the context in which movements are made. This means that we can temporarily
confuse agents by changing the context (order) in which cues are presented. These simulations provide a (Bayes-optimal)
simulation of contextual uncertainty and set-switching that can be characterised in terms of behaviour and electrophysiological
responses. Interestingly, one can lesion the encoding of precision (postsynaptic gain) to produce pathological behaviours that
are reminiscent of those seen in Parkinson's disease. I will use this as a toy example of how information-theoretic approaches
to uncertainty may help us understand action selection and set-switching.
UNCERTAINTY, Ansen Seale (2005)
“Objects are always imagined as being present in the field of vision as
would have to be there in order to produce the same impression on
the nervous mechanism” - Hermann Ludwig Ferdinand von Helmholtz
Richard Gregory
Geoffrey Hinton
From the Helmholtz machine to the
Bayesian brain and self-organization
Thomas Bayes
Richard Feynman
Hermann Haken
Overview
Ensemble dynamics
Entropy and equilibria
Free-energy and surprise
The free-energy principle
Perception and generative models
Hierarchies and predictive coding
Perception
Birdsong and categorization
Simulated lesions and perceptual uncertainty
Action
Cued reaching and affordance
Simulated lesions and behavioral uncertainty
What is the difference between a snowflake and a bird?
[Figure: phase-boundary diagram with a temperature axis]
…a bird can act (to avoid surprises)
What is the difference between snowfall and a flock of birds?
Ensemble dynamics, clumping and swarming
…birds (biological agents) stay in the same place
They resist the second law of thermodynamics, which says that their entropy
should increase
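But what is the entropy?
…entropy is just average surprise
$H = \int_0^T dt\, L(t) = -\int p(\tilde{s} \mid m) \ln p(\tilde{s} \mid m)\, d\tilde{s}$
$L = -\ln p(\tilde{s} \mid m)$
[Figure: states mapped to sensations by $\tilde{s} = g(\vartheta)$, with regions of high and low surprise]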
High surprise (I am never here)
Low surprise (we are usually here)
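The claim that entropy is just average surprise can be checked numerically. The sketch below is only an illustration under stated assumptions: the three states and their probabilities are hypothetical, and independent random draws stand in for the long-run trajectory of an agent. It compares the time-averaged surprise of visited states with the entropy of the state distribution.

```python
import math
import random

# Hypothetical distribution over states (an assumption for illustration only).
STATE_PROBS = {"usually_here": 0.90, "sometimes_here": 0.09, "never_here": 0.01}


def surprise(prob):
    """Surprise (negative log probability) of a state with probability `prob`."""
    return -math.log(prob)


def entropy(probs):
    """Entropy as the expected surprise under the state distribution."""
    return sum(p * surprise(p) for p in probs.values() if p > 0.0)


def time_averaged_surprise(probs, steps, seed=0):
    """Average surprise over a sequence of visited states.

    Independent draws are an assumption standing in for an ergodic trajectory.
    """
    rng = random.Random(seed)
    states = list(probs)
    weights = [probs[s] for s in states]
    visits = rng.choices(states, weights=weights, k=steps)
    return sum(surprise(probs[s]) for s in visits) / steps


if __name__ == "__main__":
    print("entropy:", entropy(STATE_PROBS))
    print("time-averaged surprise:", time_averaged_surprise(STATE_PROBS, 100_000))
```

Rarely visited states carry high surprise but contribute little to the average, which is why an agent that stays in its usual states has low entropy.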
This means biological agents must self-organize to minimise surprise. In other words, they must
ensure they occupy a limited number of states (cf. homeostasis).
But there is a small problem… agents cannot measure their surprise
$\tilde{s} = g(\vartheta)$ ?
But they can measure their free-energy, which is always bigger than surprise
$F(t) \geq L(t)$
This means agents should minimize their free-energy. So what is free-energy?
What is free-energy?
…free-energy is basically prediction error
sensations – predictions = prediction error
where small errors mean low surprise
More formally,
Sensations: $\tilde{s} = g(\vartheta) + \omega^{(s)}$
External states in the world: $\dot{\vartheta} = f(\vartheta, a) + \omega^{(\vartheta)}$
Internal states of the agent (m): $\mu = \arg\min_{\mu} F(\tilde{s}, \mu)$
Action: $a = \arg\min_{a} F(\tilde{s}, \mu)$
Free-energy is a function of sensations and a proposal density over hidden causes
$F(\tilde{s}, \mu) = \text{Energy} - \text{Entropy} = E_q[G] + E_q[\ln q]$
and can be evaluated, given a generative model (Gibbs energy) or likelihood and prior:
$G(\tilde{s}, \vartheta) = -\ln p(\tilde{s}, \vartheta \mid m) = -\ln p(\tilde{s} \mid \vartheta, m) - \ln p(\vartheta \mid m)$
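A small discrete example may help to see why free-energy bounds surprise. The sketch below is a minimal illustration, not the scheme used in the talk: the two hidden causes, their prior and their likelihoods are hypothetical numbers. It evaluates free-energy as expected Gibbs energy minus the entropy of the proposal density and compares it with the surprise of the sensation.

```python
import math

# Hypothetical generative model (assumption): two hidden causes, one observed sensation.
PRIOR = {"cause_a": 0.7, "cause_b": 0.3}        # p(cause | m)
LIKELIHOOD = {"cause_a": 0.2, "cause_b": 0.9}   # p(sensation | cause, m)


def evidence():
    """Marginal probability of the sensation, p(sensation | m)."""
    return sum(PRIOR[c] * LIKELIHOOD[c] for c in PRIOR)


def surprise():
    """Surprise of the sensation: -ln p(sensation | m)."""
    return -math.log(evidence())


def free_energy(q):
    """F = E_q[G] + E_q[ln q], with G = -ln p(sensation | cause) - ln p(cause)."""
    total = 0.0
    for cause, q_c in q.items():
        if q_c > 0.0:
            gibbs = -math.log(LIKELIHOOD[cause]) - math.log(PRIOR[cause])
            total += q_c * (gibbs + math.log(q_c))
    return total


def posterior():
    """Exact posterior over causes; the proposal density that minimises F."""
    z = evidence()
    return {c: PRIOR[c] * LIKELIHOOD[c] / z for c in PRIOR}


if __name__ == "__main__":
    print("surprise:", surprise())
    print("F (flat proposal):", free_energy({"cause_a": 0.5, "cause_b": 0.5}))
    print("F (posterior proposal):", free_energy(posterior()))
```

The gap between free-energy and surprise is the divergence between the proposal density and the true posterior, so the bound becomes tight exactly when the proposal equals the posterior.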
So what models might the brain use?
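Hierarchical models in the brain
$v^{(i-1)} = g^{(i)} + \omega^{(v,i)}$
$D x^{(i)} = f^{(i)} + \omega^{(x,i)}$
$\vartheta \supseteq \{x(t), v(t), \theta, \gamma\}$
[Figure: two-level hierarchy in which forward (driving), backward (modulatory) and lateral connections link the states $\tilde{v}^{(i)}$ and $\tilde{x}^{(i)}$ at levels 1 and 2 to the sensory input $s$]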
So how do prediction errors change predictions?
…by hierarchical message passing in the brain
Forward connections convey feedback (prediction errors) that adjusts hypotheses
Backward connections return predictions
[Figure: loop from sensory input through prediction errors and adjusted hypotheses back to predictions; portrait of David Mumford]
More formally,
Synaptic activity and message-passing
Forward prediction error
$\xi^{(v,i)} = \Pi^{(v,i)} \varepsilon^{(v,i)} = \Pi^{(v,i)} (\mu^{(v,i-1)} - g^{(i)})$
$\xi^{(x,i)} = \Pi^{(x,i)} \varepsilon^{(x,i)} = \Pi^{(x,i)} (D\mu^{(x,i)} - f^{(i)})$
Backward predictions
$\dot{\mu}^{(v,i)} = D\mu^{(v,i)} - \partial_v \varepsilon^{(i)T} \xi^{(i)} - \xi^{(v,i+1)}$
$\dot{\mu}^{(x,i)} = D\mu^{(x,i)} - \partial_x \varepsilon^{(i)T} \xi^{(i)}$
[Figure: prediction errors $\xi^{(v,i)}$, $\xi^{(x,i)}$ passed forward from the input $s(t)$ and expectations $\mu^{(v,i)}$, $\mu^{(x,i)}$ passed backward across three levels]
cf., Predictive coding or Kalman-Bucy filtering
Summary
Biological agents resist the second law of thermodynamics
They must minimize their average surprise (entropy)
They minimize surprise by suppressing prediction error (free-energy)
Prediction error can be reduced by changing predictions (perception)
Prediction error can be reduced by changing sensations (action)
Perception entails recurrent message passing in the brain to optimise predictions
Predictions depend upon the precision of prediction errors
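The role of precision in the summary above can be illustrated with a deliberately reduced predictive coding scheme. This is a sketch under stated assumptions rather than the hierarchical dynamic model of the talk: a single static cause, a linear mapping `theta`, Gaussian noise, and no generalised motion. The names `infer_cause`, `pi_s`, `pi_v`, `theta` and `eta` are hypothetical. The expectation descends the gradient of free-energy, driven by precision-weighted prediction errors.

```python
def infer_cause(s, pi_s, pi_v, theta=2.0, eta=0.0, dt=1e-3, steps=20000):
    """Gradient descent on free-energy for a one-level linear Gaussian model.

    Assumed model: s = theta * v + sensory noise (precision pi_s),
    with a prior on v centred on eta (precision pi_v).
    """
    mu = eta  # start from the prior expectation
    for _ in range(steps):
        xi_s = pi_s * (s - theta * mu)    # precision-weighted sensory prediction error
        xi_v = pi_v * (mu - eta)          # precision-weighted prior prediction error
        mu += dt * (theta * xi_s - xi_v)  # expectation update
    return mu


def exact_posterior_mean(s, pi_s, pi_v, theta=2.0, eta=0.0):
    """Closed-form posterior mean for the same linear Gaussian model."""
    return (theta * pi_s * s + pi_v * eta) / (theta ** 2 * pi_s + pi_v)


if __name__ == "__main__":
    for pi_s in (0.1, 1.0, 16.0):
        print(pi_s, infer_cause(4.0, pi_s, 1.0), exact_posterior_mean(4.0, pi_s, 1.0))
```

With low sensory precision the expectation stays near the prior; with high sensory precision it moves towards the value that explains the sensation (here `s / theta`). In this toy setting, that is the sense in which predictions depend on the precision of prediction errors.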
Perception
Birdsong and categorization
Simulated lesions and perceptual uncertainty
Making bird songs with Lorenz attractors
[Figure: vocal centre driving the syrinx; sonogram of the resulting song, frequency against time (sec)]
causal states
$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$
hidden states
$f^{(1)} = \begin{bmatrix} 18 x_2^{(1)} - 18 x_1^{(1)} \\ v_1^{(1)} x_1^{(1)} - 2 x_3^{(1)} x_1^{(1)} - x_2^{(1)} \\ 2 x_1^{(1)} x_2^{(1)} - v_2^{(1)} x_3^{(1)} \end{bmatrix}$
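To make the Lorenz-based song generator concrete, the following sketch integrates the equations of motion for the hidden states shown on the slide. Several details are assumptions: the causal states are fixed at v = (32, 8/3), the values that appear in the second-level equations later in the talk; the step size, initial state and fourth-order Runge-Kutta integrator are arbitrary choices; and the two output channels are simply the second and third hidden states, with no mapping to sound.

```python
def lorenz_flow(x, v):
    """Equations of motion for the three hidden states, given two causal states."""
    x1, x2, x3 = x
    v1, v2 = v
    return (
        18.0 * x2 - 18.0 * x1,
        v1 * x1 - 2.0 * x3 * x1 - x2,
        2.0 * x1 * x2 - v2 * x3,
    )


def rk4_step(x, v, dt):
    """One fourth-order Runge-Kutta step (an assumed choice of integrator)."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = lorenz_flow(x, v)
    k2 = lorenz_flow(shifted(x, k1, dt / 2.0), v)
    k3 = lorenz_flow(shifted(x, k2, dt / 2.0), v)
    k4 = lorenz_flow(shifted(x, k3, dt), v)
    return tuple(
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def generate_song(v=(32.0, 8.0 / 3.0), x0=(1.0, 1.0, 1.0), dt=0.005, steps=4000):
    """Return the two output channels (second and third hidden states) over time."""
    x = x0
    channels = []
    for _ in range(steps):
        x = rk4_step(x, v, dt)
        channels.append((x[1], x[2]))
    return channels


if __name__ == "__main__":
    song = generate_song()
    print(song[:3])
```

Changing the causal states `v` changes the shape of the attractor and therefore the song, which is what allows a listener to categorise songs by inferring `v`.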
Predictive coding and message passing
[Figure: stimulus sonogram $s(t)$, frequency (Hz) against time (seconds), with "prediction and error" traces for the hidden states and the causal states; backward predictions and forward prediction error pass between the levels]
Perceptual categorization
[Figure: Song a, Song b and Song c against time (seconds), with axes for the first and second causal state (v)]
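Hierarchical (itinerant) birdsong: sequences of sequences
[Figure: neuronal hierarchy driving the syrinx; sonogram, frequency (kHz) against time (sec)]
$f^{(2)} = \begin{bmatrix} 18 x_2^{(2)} - 18 x_1^{(2)} \\ 32 x_1^{(2)} - 2 x_3^{(2)} x_1^{(2)} - x_2^{(2)} \\ 2 x_1^{(2)} x_2^{(2)} - \tfrac{8}{3} x_3^{(2)} \end{bmatrix}$
$g^{(2)} = \begin{bmatrix} x_2^{(2)} \\ x_3^{(2)} \end{bmatrix} \to \begin{bmatrix} v_1^{(1)} \\ v_2^{(1)} \end{bmatrix}$
$f^{(1)} = \begin{bmatrix} 18 x_2^{(1)} - 18 x_1^{(1)} \\ v_1^{(1)} x_1^{(1)} - 2 x_3^{(1)} x_1^{(1)} - x_2^{(1)} \\ 2 x_1^{(1)} x_2^{(1)} - v_2^{(1)} x_3^{(1)} \end{bmatrix}$
$g^{(1)} = \begin{bmatrix} x_2^{(1)} \\ x_3^{(1)} \end{bmatrix} \to \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$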
Simulated lesions and false inference
[Figure: percept (sonogram, frequency in Hz against time in seconds) and LFP (micro-volts against peristimulus time in ms) panels, including two lesion conditions: no top-down messages (no structural priors) and no lateral messages (no dynamical priors)]
Action
Cued reaching and affordance
Simulated lesions and behavioral uncertainty
Active inference
[Figure: message-passing scheme for cued reaching, with prediction errors and expectations exchanged between the regions listed below; sensory inputs $s_v$, $s_a$, $s_p$ and action $a$; named on the slide: Anatol Feldman]
Prefrontal cortex: changes in set
Striatum: set selection
Premotor cortex: affordance
Superior colliculus: salience
Motor cortex: joint positions
Parietal cortex: finger location
Motoneurones: action $a$
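The idea that action samples sensations so that they conform to predictions can be sketched with a one-dimensional toy agent. Everything here is an assumption made for illustration: the sensation equals the external state with no noise, action directly sets the velocity of that state, precisions are fixed, and the prior expectation `eta` plays the role of a desired position. It is not the cued-reaching model of the talk.

```python
def simulate(act, eta=1.0, pi_s=1.0, pi_p=1.0, dt=0.01, steps=4000):
    """Toy agent minimising F = 0.5*pi_s*(s - mu)**2 + 0.5*pi_p*(mu - eta)**2.

    act=False: perception only (the expectation mu changes, the world does not).
    act=True:  perception and action (the external state x also changes).
    """
    x = 0.0   # external state of the world
    mu = 0.0  # internal state: expectation about the cause of sensations
    for _ in range(steps):
        s = x                       # sensation (identity mapping, no noise: assumption)
        err_s = pi_s * (s - mu)     # precision-weighted sensory prediction error
        err_p = pi_p * (mu - eta)   # precision-weighted prior prediction error
        mu += dt * (err_s - err_p)  # perception: descend F with respect to mu
        if act:
            x += dt * (-err_s)      # action: descend F with respect to the sensation
    return x, mu


if __name__ == "__main__":
    print("perception only:", simulate(act=False))
    print("perception and action:", simulate(act=True))
```

Without action, the expectation settles on a compromise between the prior and the sensation. With action, the world is moved until sensations match the prior expectation, so prediction error is removed by changing what is sensed.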
Lotka-Volterra dynamics: winnerless competition
Misha Rabinovich
$f(x, v) = A(v)(1 + e^{-x})^{-1} - \tfrac{1}{8} x + 1$
$s(x) = \dfrac{e^{x}}{e^{xT} 1}$
[Matrix $A(v)$: entries depend on $v$; the layout of the entries is not recoverable here]
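Winnerless competition can be illustrated with a classical three-species Lotka-Volterra system (the May-Leonard model). This is a swapped-in textbook system, not the parameterisation on the slide: the competition matrix, the values of `alpha` and `beta`, the initial state and the Euler integration are all assumptions. With asymmetric inhibition, each state dominates in turn and is then displaced by the next, so no state wins permanently.

```python
def winnerless_competition(alpha=0.5, beta=1.8, x0=(0.5, 0.3, 0.2),
                           dt=0.01, steps=40000):
    """Euler integration of dx_i/dt = x_i * (1 - sum_j rho[i][j] * x_j).

    Returns the final state and the index of the dominant state at each step.
    """
    # Asymmetric competition matrix (assumed values, with alpha < 1 < beta).
    rho = (
        (1.0, alpha, beta),
        (beta, 1.0, alpha),
        (alpha, beta, 1.0),
    )
    x = list(x0)
    winners = []
    for _ in range(steps):
        rates = [1.0 - sum(rho[i][j] * x[j] for j in range(3)) for i in range(3)]
        x = [x[i] + dt * x[i] * rates[i] for i in range(3)]
        winners.append(max(range(3), key=lambda i: x[i]))
    return x, winners


def count_switches(winners):
    """Number of times the dominant state changes along the trajectory."""
    return sum(1 for a, b in zip(winners, winners[1:]) if a != b)


if __name__ == "__main__":
    final_state, winners = winnerless_competition()
    print("final state:", final_state)
    print("switches of the dominant state:", count_switches(winners))
```

The sequence of dominant states is one simple way to generate ordered sequences; in the talk, a context variable v selects the connectivity and hence the order.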
[Figure: the same scheme (motor, parietal, premotor and prefrontal cortex, striatum, superior colliculus, motoneurones) with dopaminergic projections from SN/VTA added: mesocortical DA projections, nigrostriatal DA projections and the mesorhombencephalic pathway]
Dopamine and precision
[Figure: three simulations with precision set to exp(5.0), exp(3.5) and exp(2.5); each shows "prediction and error" and "hidden causes" plotted against time]
Uncertainty and perseveration
[Figure: reaction times (milliseconds) against cue onset (sec) under high and low DA, in three panels titled salience, proprioception and affordance; each panel is paired with a network diagram (motor cortex, premotor cortex, superior colliculus, SN/VTA)]
Uncertainty, delusions and confusion
[Figure: two network diagrams (motor cortex, premotor cortex, superior colliculus, SN/VTA) labelled perseveration and confusion, each with sites marked X]
Thank you
And thanks to collaborators:
Rick Adams
Harriet Brown
Jean Daunizeau
Lee Harrison
Stefan Kiebel
James Kilner
Jérémie Mattout
Klaas Stephan
And colleagues:
Peter Dayan
Jörn Diedrichsen
Paul Verschure
Florentin Wörgötter
And many others