Operant conditioning


Lecture 04 - main goals:
• Describe the essential difference between classical and operant conditioning
• Acquire an understanding of the neural basis of reward and its relation to the
dopaminergic system
• Understand how drugs of abuse might tap into the neural reward circuitry
• Acquire an understanding of reward prediction errors
• Button-push neurons in basal ganglia; reverse replay
•Slide 1
Lecture 04 – Circuit Motifs
1. reinforcement learning – operant conditioning
( Basal Ganglia? )
Behavior 1
(push left button)
Behavior 2
(push right button)
•Slide 2
Lecture 04 – Circuit Motifs
Behavior 1
Behavior 2
Synaptic tagging
vs
Working Memory
•Slide 3
What is free will? Do we have it?
What about disorders of
decision-making?
Nao Uchida:
Neurobiology of Perception and Decision Making - MCB 145
Drugs of abuse
• cocaine
• amphetamine
• opiate (heroin)
• nicotine
• ethanol
• cannabinoids (marijuana)
• hallucinogens
• PCP
These drugs cause “compulsive” drug-taking despite knowledge of negative outcomes.
Operant Conditioning
Operant conditioning was the dominant school in American psychology from the 1930s
through the 1950s.
(Edward Thorndike; Burrhus Frederic Skinner)
Where classical conditioning illustrates S-->R learning, operant conditioning is often
viewed as R-->S learning
•Slide 7
Law of effect
•Slide 8
Thorndike’s puzzle box
• Placed hungry cat in box
• Cat can escape and eat if it hits the foot pedal.
• Thorndike observed the behaviours of the cat.
•Slide 9
Thorndike’s observation
First trial inside of box:
Scratch at bars
Dig at floors
Howl
Push at ceiling
Pace around
Hiss
Press Lever
•Slide 10
Thorndike’s observation
A few trials later:
Scratch at bars
Dig at floors
Howl
Push at ceiling
Pace around
Hiss
Press Lever
•Slide 11
Thorndike’s observation
After many trials in the box:
Scratch at bars
Dig at floors
Howl
Push at ceiling
Pace around
Hiss
Press Lever
•Slide 12
Thorndike’s results
[Plot: time required to escape (seconds, axis 60 to 240) against successive trials in the puzzle box (axis 5 to 25)]
• Law of Effect: Responses that produce a satisfying result are more likely to be
repeated in a similar situation; responses that produce a discomforting result are
less likely to recur in similar situations.
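The Law of Effect can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Thorndike's analysis: the function `simulate_puzzle_box`, the numeric response strengths, the step sizes, and the idea that each failed behaviour counts as a mildly discomforting result are all hypothetical choices made for illustration. The behaviour names are taken from the slides.

```python
import random


def simulate_puzzle_box(n_trials=25, reward_step=0.5, penalty_step=0.05, seed=0):
    """Toy Law-of-Effect model (all numbers are illustrative assumptions)."""
    rng = random.Random(seed)
    behaviours = ["scratch at bars", "dig at floors", "howl",
                  "push at ceiling", "pace around", "hiss", "press lever"]
    # Every behaviour starts out equally likely.
    strength = {b: 1.0 for b in behaviours}
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            weights = [strength[b] for b in behaviours]
            choice = rng.choices(behaviours, weights=weights)[0]
            if choice == "press lever":
                # Satisfying result (escape and food): strengthen the response.
                strength[choice] += reward_step
                break
            # Assumed mildly discomforting result (still trapped): weaken it,
            # but keep a small floor so the behaviour never vanishes entirely.
            strength[choice] = max(0.1, strength[choice] - penalty_step)
        attempts_per_trial.append(attempts)
    return strength, attempts_per_trial


strength, attempts_per_trial = simulate_puzzle_box()
print(strength["press lever"])   # grows by reward_step on every trial
print(attempts_per_trial)        # attempts needed to escape, trial by trial
```

The number of attempts per trial plays the role of the escape time in the puzzle box: as the rewarded response gains strength, it tends to be selected sooner.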
•Slide 13
Skinner’s operant conditioning
•Slide 14
Pigeon Movies
•Slide 15
Skinner’s operant conditioning
• Operant response: Behaviour that has an effect on the environment.
• Operant conditioning: Learning associated with the above behaviour.
• Reinforcer: A stimulus that increases the likelihood of a behaviour.
-> Thorndike’s “satisfaction” is mentalistic
• Problems with the puzzle box.
-> Animal can only make one correct response per trial.
•Slide 16
The Skinner box
• Animal can respond multiple times
• Operant response: Bar pressing
• Operant conditioning: Increased bar pressing when food is delivered
following the response.
• Shaping by successive approximations
•Slide 17
Pleasure Rats
•Slide 18
•Slide 19
Dopamine
Substantia nigra pars compacta (SNc)
Ventral tegmental area (VTA)
The mesocorticolimbic dopamine pathway
• The neurons of the VTA (ventral tegmental area) contain the neurotransmitter
dopamine, which is released in the nucleus accumbens and in the prefrontal
cortex. This pathway is activated by a rewarding stimulus.
•Slide 21
The mesocorticolimbic dopamine pathway
•Slide 22
Error in prediction drives learning
Prediction
Evaluation
Blocking Paradigm Showing that Learning Depends on Prediction Error Rather than Stimulus-Reward Pairing Alone
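One way to see why blocking implies error-driven learning is a small simulation. The sketch below uses the Rescorla-Wagner rule, a standard textbook model that the slides do not name, so treat it as an assumption. The stimulus labels "A" and "X", the learning rate, and the trial counts are hypothetical.

```python
def rescorla_wagner(trials, alpha=0.2, reward=1.0, weights=None):
    """Update associative weights with the Rescorla-Wagner rule.

    trials  -- list of tuples naming the stimuli present on each rewarded trial
    weights -- optional starting weights (e.g. from an earlier training phase)
    """
    w = dict(weights or {})
    for cues in trials:
        prediction = sum(w.get(c, 0.0) for c in cues)
        error = reward - prediction          # prediction error on this trial
        for c in cues:
            # Every stimulus present shares the same error signal.
            w[c] = w.get(c, 0.0) + alpha * error
    return w


# Blocking group: A alone is paired with reward first, then A and X together.
w_block = rescorla_wagner([("A",)] * 50)
w_block = rescorla_wagner([("A", "X")] * 50, weights=w_block)

# Control group: A and X are paired with reward together from the start.
w_ctrl = rescorla_wagner([("A", "X")] * 50)

print(w_block["X"])  # stays near zero: A already predicts the reward
print(w_ctrl["X"])   # approaches 0.5: X shares the prediction with A
```

In both groups X is paired with reward 50 times, yet X gains almost no weight in the blocking group because the error is already close to zero when X is introduced.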
•Slide 24
Dopamine neurons encode reward prediction during learning
•Slide 25
W. Schultz. Getting formal with dopamine and reward. Neuron 36:241, 2002.
Activity of Dopamine Neurons depends on Prediction Error ( = Surprise)
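As a minimal sketch, the prediction error can be written as received reward minus expected reward. The function name `prediction_error` and the reward values 0 and 1 are hypothetical, chosen only to show the three cases usually discussed: an unexpected reward, a fully predicted reward, and a predicted reward that is omitted.

```python
def prediction_error(reward_received, reward_expected):
    """Prediction error = what happened minus what was predicted."""
    return reward_received - reward_expected


cases = {
    "unexpected reward": prediction_error(1.0, 0.0),         # positive surprise
    "fully predicted reward": prediction_error(1.0, 1.0),    # no surprise
    "omitted predicted reward": prediction_error(0.0, 1.0),  # negative surprise
}

for name, error in cases.items():
    print(f"{name}: {error:+.1f}")
```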
•Slide 26
Sustained activity correlates with uncertainty
•Mcb105 2003 4th
C. D. Fiorillo, et al. Discrete coding of reward probability and uncertainty by
dopamine neurons. Science 299:1898-1902, 2003.
•Slide 27
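To make "uncertainty" concrete, here is a small sketch that assumes a reward of fixed size delivered with probability p and uses the variance of that reward as the uncertainty measure. The choice of variance, the function name `reward_variance`, and the listed probabilities are assumptions for illustration; other measures such as entropy peak at the same place.

```python
def reward_variance(p, magnitude=1.0):
    """Variance of a reward of size `magnitude` delivered with probability p."""
    return magnitude ** 2 * p * (1.0 - p)


probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
uncertainty = {p: reward_variance(p) for p in probabilities}
most_uncertain = max(uncertainty, key=uncertainty.get)

for p, u in uncertainty.items():
    print(f"p = {p:.2f}  variance = {u:.4f}")
print("most uncertain at p =", most_uncertain)
```

Uncertainty is zero when the reward is certain to come or certain not to come, and it is largest when the outcome is a coin flip.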
Sustained activity correlates with uncertainty…
• Risk-taking
• Gambling
• Uncertain rewards are much more powerful (e.g. starting an old car)
… and with importance
•Slide 29
Different Temporal Operating Modes for Different Dopamine Functions
•Slide 30
Papers
B. Brembs, F. D. Lorenzetti, F. D. Reyes, D. A. Baxter, and J. H. Byrne. Operant
reward learning in Aplysia: neuronal correlates and mechanisms. Science
296:1706-1709, 2002.
J. Y. Cohen, S. Haesler, L. Vong, B. B. Lowell, and N. Uchida. Neuron-type-specific
signals for reward and punishment in the ventral tegmental area. Nature
482:85-88, 2012.
•Slide 31
Matlab Module:
Simple Model of Cue vs Reward responses during learning
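The Matlab module itself is not reproduced in these slides. As a rough stand-in, the sketch below uses temporal-difference (TD) learning in Python, which is a common model of how the error response moves from the reward to the cue; whether the module uses this model is an assumption. The function `simulate_td`, the time steps of cue and reward, and the learning rate are all hypothetical.

```python
def simulate_td(n_trials=200, n_steps=20, t_cue=5, t_reward=15,
                alpha=0.3, gamma=1.0):
    """Return the prediction-error trace of every trial (a list of lists)."""
    # value[t]: learned prediction of upcoming reward at time step t.
    value = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        delta = [0.0] * n_steps
        for t in range(1, n_steps):
            reward = 1.0 if t == t_reward else 0.0
            # Error on arriving at step t: reward plus new prediction
            # minus the prediction held one step earlier.
            delta[t] = reward + gamma * value[t] - value[t - 1]
            # Only steps from cue onset onward can carry a prediction,
            # because the cue itself arrives unpredictably.
            if t - 1 >= t_cue:
                value[t - 1] += alpha * delta[t]
        history.append(delta)
    return history


history = simulate_td()
first, last = history[0], history[-1]
print("first trial: cue", round(first[5], 3), "reward", round(first[15], 3))
print("last trial:  cue", round(last[5], 3), "reward", round(last[15], 3))
```

On the first trial the error appears at the time of the reward; after learning it appears at the time of the cue, and the fully predicted reward produces almost no error.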
•Slide 32
How do dopamine neurons compute error signals?
[Diagram labels: Input (Cue, Reward); Output; excitatory and inhibitory connections; Reward expectation]