Lecture 4 - TeachLine
Decision making
Probability in games of chance
Blaise Pascal
1623 - 1662
How much should I bet on ’20’?
E[gain] = Σ_x gain(x) · Pr(x)
Decisions under uncertainty
Maximize expected value
(Pascal)
Bets should be assessed according to Σ_x p(x) · gain(x)
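As a quick numerical illustration of Pascal's rule, the Python sketch below computes the expected gain of a single-number bet; the 35:1 payout and 1/38 win probability are assumed roulette-style figures, not numbers taken from the lecture.

```python
# Expected value of a bet: E[gain] = sum_x p(x) * gain(x).
# Illustrative assumption: a $1 bet on '20' in American roulette,
# win probability 1/38, payout 35:1.
outcomes = [
    (1 / 38, +35.0),   # ball lands on 20: win 35
    (37 / 38, -1.0),   # any other number: lose the stake
]

expected_gain = sum(p * gain for p, gain in outcomes)
print(f"E[gain] = {expected_gain:.4f}")   # about -0.0526: a losing bet on average
```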
Decisions under uncertainty
The value of an alternative is a monotonic function of:
• Probability of reward
• Magnitude of reward
Do Classical Decision Variables
Influence Brain Activity in LIP?
[Figure: area LIP (lateral intraparietal area)]
Varying Movement Value
Platt and Glimcher 1999
What Influences LIP?
Related to Movement Desirability
• Value/Utility of Reward
• Probability of Reward
Varying Movement Probability
Decisions under uncertainty
Neural activity in area LIP depends on:
• Probability of reward
• Magnitude of reward
Relative or absolute reward?
Dorris and Glimcher 2004
[Figure: a set of alternatives associated with different monetary payoffs]
Maximization of utility
Consider a set of alternatives X and a binary relation ≿ on it, ≿ ⊆ X × X, interpreted as "preferred at least as much as".
Consider the following three axioms:
C1. Completeness: for every x, y ∈ X, x ≿ y or y ≿ x.
C2. Transitivity: for every x, y, z ∈ X, x ≿ y and y ≿ z imply x ≿ z.
C3. Separability.
Theorem: A binary relation ≿ can be represented by a real-valued function u if and only if it satisfies C1-C3.
Under these conditions, the function u is unique up to an increasing transformation.
(Cantor 1915)
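For a finite set of alternatives the representation is easy to make concrete. The Python sketch below (not from the lecture; the alternatives and the preference ordering are invented) builds a utility function from a complete, transitive "preferred at least as much as" relation simply by counting how many alternatives each option weakly dominates.

```python
# Hypothetical finite choice set with a complete, transitive preference
# relation. With C1-C2 on a finite set, u(x) = |{y : x weakly preferred to y}|
# represents the relation; any increasing transform of u works equally well.
alternatives = ["apple", "banana", "cherry"]

# Assumed ranking for the example: cherry over banana over apple.
rank = {"apple": 0, "banana": 1, "cherry": 2}

def weakly_prefers(x: str, y: str) -> bool:
    """True if x is preferred at least as much as y."""
    return rank[x] >= rank[y]

def utility(x: str) -> int:
    """u(x) = number of alternatives that x weakly dominates."""
    return sum(weakly_prefers(x, y) for y in alternatives)

for x in alternatives:
    print(x, utility(x))          # apple 1, banana 2, cherry 3

# The representation property: x weakly preferred to y  iff  u(x) >= u(y).
assert all(
    weakly_prefers(x, y) == (utility(x) >= utility(y))
    for x in alternatives for y in alternatives
)
```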
A face utility function?
Is there an explicit representation of the 'value' of a choice in the brain?
Neurons in the orbitofrontal cortex encode value
Padoa-Schioppa and Assad, 2006
Examples of neurons encoding the chosen value
A neuron encoding the value of A
A neuron encoding the value of B
A neuron encoding the chosen juice taste
Encoding takes place at different times
post-offer (a, d, e, blue),
pre-juice (b, cyan),
post-juice (c, f, black)
How does the brain learn the values?
The computational problem
The goal is to maximize the sum of rewards
V_t = E[ Σ_{τ=t}^{end} r_τ ]
The computational problem
The value of the state S1 depends on the policy
If the animal chooses ‘right’ at S1,
V(S_1) = R(ice cream) + V(S_2)
How to find the optimal policy in a
complicated world?
• If values of the different states are known
then this task is easy
V(S_t) = r_t + V(S_{t+1})
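If the state values are known, picking the best action reduces to a one-step look-ahead, as in the sketch below (not from the lecture); the states, rewards, and values are placeholder assumptions.

```python
# Sketch: acting optimally when V is known for every state means choosing
# the action a that maximizes r(s, a) + V(next_state(s, a)).
# The tiny world below, its rewards and values, are invented for illustration.
V = {"S1": 0.0, "S2": 1.0, "S3": 0.5}          # assumed known state values

transitions = {                                 # (immediate reward, next state)
    ("S1", "left"):  (0.0, "S3"),
    ("S1", "right"): (0.2, "S2"),
}

def greedy_action(state: str) -> str:
    """Pick the action with the largest immediate reward plus next-state value."""
    best_action, best_value = None, float("-inf")
    for (s, a), (r, s_next) in transitions.items():
        if s == state and r + V[s_next] > best_value:
            best_action, best_value = a, r + V[s_next]
    return best_action

print(greedy_action("S1"))   # 'right': 0.2 + 1.0 beats 0.0 + 0.5
```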
How can the values of the different states
be learned?
V(S_t) = r_t + V(S_{t+1})
V(S_t) = the value of the state at time t
r_t = the (average) reward delivered at time t
V(S_{t+1}) = the value of the state at time t+1
The TD (temporal difference) learning algorithm
V(S_t) ← V(S_t) + α·δ_t
where α is a learning rate (0 < α ≤ 1) and
δ_t = r_t + V(S_{t+1}) - V(S_t)
is the TD error.
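A minimal Python sketch of this update rule (not from the lecture); the states, reward, and learning rate below are placeholders chosen for illustration.

```python
# TD(0) update: V(S_t) <- V(S_t) + alpha * delta_t,
# with TD error delta_t = r_t + V(S_{t+1}) - V(S_t).
def td_update(V, s, r, s_next, alpha=0.5):
    """Update the value of state s in place and return the TD error."""
    delta = r + V[s_next] - V[s]
    V[s] = V[s] + alpha * delta
    return delta

# Illustrative two-state example (values and reward are assumptions):
V = {"A": 0.0, "B": 1.0}
delta = td_update(V, "A", r=0.0, s_next="B")
print(delta, V["A"])   # 1.0 0.5 -- the value of A moves toward r + V(B)
```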
Schultz, Dayan and Montague, Science, 1997
[Diagram: a chain of states S1-S9 traversed on each trial; a CS is presented partway through the chain and a reward of size 1 is delivered at state 8]
Before trial 1:
V(S_1) = V(S_2) = … = V(S_9) = 0

In trial 1:
• No reward in states 1-7:
δ_t = r_t + V(S_{t+1}) - V(S_t) = 0
V(S_t) ← V(S_t) + α·δ_t = 0
• Reward of size 1 in state 8:
δ_t = r_t + V(S_9) - V(S_8) = 1
V(S_8) ← V(S_8) + α·δ_t = α
Before trial 2:
V(S_1) = V(S_2) = … = V(S_7) = V(S_9) = 0, V(S_8) = α

In trial 2, for states 1-6:
δ_t = r_t + V(S_{t+1}) - V(S_t) = 0
V(S_t) ← V(S_t) + α·δ_t = 0

For state 7:
δ_t = r_t + V(S_8) - V(S_7) = α
V(S_7) ← V(S_7) + α·δ_t = α²
For state 8:
δ_t = r_t + V(S_9) - V(S_8) = 1 - α
V(S_8) ← V(S_8) + α·δ_t = α + α(1 - α) = 2α - α²
Before trial 3:
V(S_1) = V(S_2) = … = V(S_6) = V(S_9) = 0, V(S_7) = α², V(S_8) = 2α - α²

In trial 3, for states 1-5:
δ_t = r_t + V(S_{t+1}) - V(S_t) = 0
V(S_t) ← V(S_t) + α·δ_t = 0

For state 6:
δ_t = r_t + V(S_7) - V(S_6) = α²
V(S_6) ← V(S_6) + α·δ_t = α³
For state 7:
δ_t = r_t + V(S_8) - V(S_7) = (2α - α²) - α² = 2α(1 - α)
V(S_7) ← V(S_7) + α·δ_t = α² + 2α²(1 - α) = 3α² - 2α³

For state 8:
δ_t = r_t + V(S_9) - V(S_8) = 1 - (2α - α²) = (1 - α)²
V(S_8) ← V(S_8) + α·δ_t = (2α - α²) + α(1 - α)² = 1 - (1 - α)³
After many trials:
V(S_1) = V(S_2) = … = V(S_8) = 1, V(S_9) = 0
so δ_t = r_t + V(S_{t+1}) - V(S_t) = 0 at every state,
except for the CS, whose time of occurrence is unknown.
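To make the trial-by-trial propagation above concrete, the following Python sketch simulates TD(0) on a nine-state chain with a unit reward at state 8; the learning rate (α = 0.5) and the number of trials are arbitrary illustrative choices, not values from the lecture.

```python
# TD(0) on a chain S1 -> S2 -> ... -> S9, reward of 1 delivered at S8.
# alpha and the number of trials are illustrative choices.
alpha = 0.5
n_states = 9
V = [0.0] * (n_states + 1)            # V[1..9]; index 0 unused

def reward(state: int) -> float:
    return 1.0 if state == 8 else 0.0

for trial in range(1, 31):
    for s in range(1, n_states):       # visit S1..S8; S9 is terminal
        delta = reward(s) + V[s + 1] - V[s]   # TD error
        V[s] += alpha * delta                  # TD update
    if trial in (1, 2, 3, 30):
        print(trial, [round(v, 3) for v in V[1:n_states + 1]])

# Trial 1 changes only V(S8) (to alpha); trial 2 also V(S7); the value
# signal, and with it the prediction error, propagates backward until,
# after many trials, V(S1)...V(S8) approach 1 and delta is ~0 everywhere.
```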
9
Schultz, 1998
“We found that these neurons encoded the difference between
the current reward and a weighted average of previous rewards,
a reward prediction error, but only for outcomes that were
better than expected”.
Bayer and Glimcher, 2005
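The computation described in the quote can be sketched as follows; the exponential weighting and its rate are assumptions made for illustration, not Bayer and Glimcher's actual estimator.

```python
# Sketch (assumed weighting, not the published estimator): compare the
# current reward with an exponentially weighted average of previous rewards,
# and keep only better-than-expected outcomes (positive prediction errors).
def positive_prediction_errors(rewards, weight=0.3):
    """Return max(0, r_t - weighted average of earlier rewards) per trial."""
    avg, errors = 0.0, []
    for r in rewards:
        errors.append(max(0.0, r - avg))
        avg = (1 - weight) * avg + weight * r   # update the running average
    return errors

print(positive_prediction_errors([0, 1, 1, 0, 2]))
```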