Gluck_OutlinePPT_Ch08 short

Chapter 8
Instrumental Conditioning: Learning the Consequences of Behavior

8.1 Behavioral Processes
• The “Discovery” of Instrumental Conditioning
• Components of the Learned Association
• Putting It All Together: Building the S–R–C Association
• Learning and Memory in Everyday Life: The Problem with Punishment
• Choice Behavior

The “Discovery” of Instrumental Conditioning
• Instrumental conditioning: developing a contingency between response and outcome.
• Organism learns to make responses to obtain or avoid important consequences.
    e.g., trained circus animals, waterskiing squirrels

Free-Operant Learning
• Operant conditioning: a type of instrumental learning.
• Skinner’s free-operant paradigm:
    Replaces Thorndike’s discrete trials.
    Learner can operate the apparatus (e.g., a Skinner box) at will.
    Learner’s trial-independent responses are measured with a cumulative recorder.

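As a rough illustration of what a cumulative recorder captures, the sketch below turns a list of response times into a running total per time bin. The function name `cumulative_record`, the one-second bins, and the lever-press times are all hypothetical and chosen only for illustration; they do not come from the slides.

```python
from bisect import bisect_right


def cumulative_record(response_times, session_length, bin_width=1.0):
    """Return (time, total responses so far) pairs, one per time bin."""
    times = sorted(response_times)
    n_bins = int(session_length / bin_width)
    record = []
    for i in range(1, n_bins + 1):
        t = i * bin_width
        # Count every response that occurred at or before time t.
        record.append((t, bisect_right(times, t)))
    return record


# Hypothetical lever-press times in seconds (not real data).
presses = [0.5, 1.2, 1.8, 4.1, 4.3, 4.9]
record = cumulative_record(presses, session_length=6)
for t, total in record:
    print(f"t={t:.0f}s  total responses={total}")
```

In a record like this, a steep rise in the total means a high response rate, and a flat stretch means the learner has paused.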
[Figure: Operant Conditioning]

Free-Operant Learning
• Reinforcement: consequences that increase the probability of a behavior.
• Punishment: consequences that decrease the probability of a behavior.

Components of the Learned Association
• Three components to instrumental conditioning:
    Stimulus (S)
    Response (R)
    Consequence (C)

Stimulus
• A discriminative stimulus is a cue, not a US or CS.
    A signal for when a response will lead to a consequence.
    Examples:
        Starting whistle for racing swimmers
        Potty seat for toilet-training toddler
• Can increase behavior probability, but does NOT elicit behavior.

Response: Shaping
• Shaping: successive approximations to the desired response are reinforced.
    Collect baseline data on current behavior (establish operant level).
    Identify target behavior.
    Reinforce successive approximations of the target response.

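The three shaping steps can be sketched as a simple loop: start the criterion at the operant level, reinforce any response that meets it, then raise the criterion toward the target. This is only a toy model under stated assumptions. The function name `shape`, the numeric "response quality" scale, and the step size are hypothetical and not part of the slides.

```python
def shape(responses, operant_level, target, step):
    """Log (response, criterion, reinforced?) for each response in order."""
    criterion = operant_level  # baseline: what the learner already does
    log = []
    for r in responses:
        reinforced = r >= criterion
        log.append((r, criterion, reinforced))
        # After a reinforced response, demand a closer approximation next time.
        if reinforced and criterion < target:
            criterion = min(criterion + step, target)
    return log


# Hypothetical response-quality scores (higher = closer to the target behavior).
log = shape([1, 2, 2, 3, 5, 4, 6], operant_level=1, target=5, step=2)
for response, criterion, reinforced in log:
    print(f"response={response} criterion={criterion} reinforced={reinforced}")
```

Responses that would have earned reinforcement early in training (e.g., a score of 2) stop being reinforced once the criterion has moved past them.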
Response: Shaping
• Example: Helping autistic children learn language (Ivar Lovaas, 1987).
    Say target word (e.g., child’s name).
    Reinforce any sound with food at first, then only closer imitations.
    Introduce new words.

Response: Chaining
• Chaining: learning a complicated sequence of responses by adding one discrete “link” (step) at a time.
    Backward chaining: training steps in reverse order.
• Examples:
    Teaching pets unusual tricks.
    Teaching workers a sequential manufacturing process.

Skinner
• http://www.youtube.com/watch?v=mm5FGrQEyBY

Consequence: Primary Reinforcers
• Reinforcer: a behavioral consequence that makes future behavior more likely.
• Primary reinforcers:
    Events that are reinforcing because of their natural characteristics and inherent ability to reinforce behavior (drive reduction theory).
    Examples:
        Food or water
        Sleep
        Sex

Consequence: Secondary Reinforcers
• Secondary (conditioned) reinforcers:
    Events that function as reinforcers because they are consistently associated with one or more primary reinforcers.
    Example: Money
        No biological imperative.
        Can be exchanged for primary reinforcers (e.g., food or shelter).

Consequence: Punishers
• Punishers:
    Behavioral consequences that lead to a reduction of future behavior.
• Strong and enduring aversive stimuli are the most effective suppressors.
    Aversive stimuli of low intensity may reinforce behavior we intend to suppress!
    Apply aversive stimuli immediately after the targeted behavior.
    Delaying the punisher decreases contingency.

The Negative Contrast Effect
[Figure: data from Kobre and Lipsitt, 1972.]

Learning and Memory in Everyday Life: The Problem with Punishment
• Use of corporal punishment is controversial.
• Alternatives:
    Scolding
    Time-out
    Grounding
    Withholding allowance
• Avoid giving attention to the punished behavior.
• Reinforce appropriate behavior.

Putting It All Together: Building the S–R–C Association
• Timing affects learning:
    Immediate consequence = best learning
    Instrumental conditioning is faster if the R–C interval is short (temporal contiguity).
• Timing can also impact:
    Punishment
        Immediate punishment is more effective than delayed punishment.
    Self-control
        Forgo an immediate reward for a greater future reward.

8.1 Interim Summary
• Instrumental conditioning = learning a three-way association (S → R → C) between:
    Discriminative stimulus (S)
    Response (R)
    Consequence (C)
        C may be a reinforcer or a punisher.
• In instrumental conditioning, C occurs only if R is made, whereas in classical conditioning, the consequence (US) occurs automatically after the stimulus (CS).

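The contrast between the two kinds of conditioning comes down to what the outcome depends on. A minimal sketch, assuming an idealized trial where each event is simply present or absent (the function names are hypothetical):

```python
def instrumental_outcome(stimulus_present, response_made):
    """Instrumental: C is delivered only if R is made when S is present."""
    return stimulus_present and response_made


def classical_outcome(cs_present, response_made):
    """Classical: the US follows the CS whether or not the learner responds."""
    return cs_present


for responded in (True, False):
    print(
        f"response made={responded}: "
        f"instrumental C delivered={instrumental_outcome(True, responded)}, "
        f"classical US delivered={classical_outcome(True, responded)}"
    )
```

Only the instrumental outcome changes when the response is withheld; the classical outcome ignores the response entirely.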
8.1 Interim Summary
• Four classes of instrumental conditioning:
    Positive reinforcement
    Negative reinforcement
    Positive punishment
    Negative punishment
• “Negative” and “positive” show whether the consequence is subtracted or added.
• “Reinforcement” and “punishment” show whether the response increases or decreases with learning.

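The naming scheme combines two independent questions, so it can be written as a small lookup. A minimal sketch; the function name `classify` and the argument labels are hypothetical, and the examples in the comments are illustrative rather than taken from the slides.

```python
def classify(consequence, response):
    """Name the class of instrumental conditioning.

    consequence: "added" or "subtracted" (what happens to the stimulus)
    response:    "increases" or "decreases" (what happens to the behavior)
    """
    sign = {"added": "Positive", "subtracted": "Negative"}[consequence]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[response]
    return f"{sign} {kind}"


# Illustrative cases:
print(classify("added", "increases"))       # e.g., food follows a lever press
print(classify("subtracted", "increases"))  # e.g., a response ends something unpleasant
print(classify("added", "decreases"))       # e.g., scolding follows misbehavior
print(classify("subtracted", "decreases"))  # e.g., allowance is withheld
```

Note that “negative” never means “bad” here: negative reinforcement still increases the response.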
8.1 Interim Summary
• Operant conditioning: subclass of instrumental conditioning
    Organism responds at its own rate.
• Complex responses may be trained by:
    Shaping
        Reinforcement of progressive approximations.
    Chaining
        Training a sequence of responses, one step at a time.

8.2 Brain Substrates
• The Basal Ganglia (BG) and Instrumental Conditioning
• Mechanisms of Reinforcement in the Brain

BG and Instrumental Conditioning
• BG help connect information from the sensory and motor cortices to make a behavioral response.
• BG may serve as storage for S–R associations (especially those in which R is a movement).
• With BG lesions (in dorsolateral striatum):
    Rats learned to lever-press for food.
    But they showed impaired learning with a discriminative stimulus (S).

[Figure: Basal Ganglia]

Reinforcement in the Brain
[Figure: instrumental learning may involve the interaction of several neural systems.]

Electrical Brain Stimulation
• One of the “pleasure centers” is the ventral tegmental area (VTA) in the brainstem.
    The VTA is a major center for dopamine neuromodulation.
• VTA stimulation = powerful reinforcement

Electrical Brain Stimulation
[Figure: brain stimulation may directly activate the brain's “reinforcement” system, eliminating the need for natural reinforcers (e.g., food). Diagram labels: Stimulus S (sight of lever); Visual System (e.g., visual cortex); Motor System (e.g., basal ganglia); Reinforcement System; Taste System (e.g., brainstem gustatory nuclei); Response R (press lever); Consequence C: electrical brain stimulation; Hungry?]

Dopamine and Reinforcement
• Some VTA axons extend to the nucleus accumbens in BG.
    Nucleus accumbens sends dopamine to motor areas in the striatum.
• Dopamine may be the physiological basis for the “wanting” aspect of reinforcement.
    “Motivation” or “wanting” in chemical form
    May contribute to addictive behavior.

Reward Prediction by Dopamine Neurons
• Schultz (2002) trained monkeys to press a lever for food.
• Electrophysiological recordings indicate that dopamine neurons in a monkey’s midbrain signal reward (or omission of reward).

Reward Prediction by Dopamine Neurons
• In the study:
    Dopamine neurons in a monkey’s midbrain respond strongly after unexpected rewards.
    If a light occurs before food, dopamine neurons increase activation after the light, but not after the food.
    Dopamine neurons decrease activity when an expected reward does NOT occur (omission).
• Illustrates the reward prediction hypothesis, i.e., dopamine is involved in predicting future reward.

[Figure: (A) Unexpected juice reward. (B) Reward is predicted by light stimulus. (C) Predicted reward is omitted. Adapted from Schultz, 2002.]

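One common way to make the three cases concrete is to treat the dopamine response at the time of reward as a prediction error: received reward minus expected reward. The slides give no formula, so the simple difference, the 0-to-1 reward scale, and the error-driven update rule below are assumptions for illustration only.

```python
def prediction_error(reward, expected):
    """Positive = better than expected, zero = as expected, negative = worse."""
    return reward - expected


# The three cases at the moment the reward is (or is not) delivered.
cases = {
    "A: unexpected reward": prediction_error(reward=1.0, expected=0.0),
    "B: predicted reward delivered": prediction_error(reward=1.0, expected=1.0),
    "C: predicted reward omitted": prediction_error(reward=0.0, expected=1.0),
}
for label, error in cases.items():
    print(f"{label}: error = {error:+.1f}")


def learn_expectation(n_trials, learning_rate=0.5):
    """Hypothetical update: move the expectation toward the reward each trial."""
    expected = 0.0
    errors = []
    for _ in range(n_trials):
        error = prediction_error(1.0, expected)  # reward always delivered
        errors.append(error)
        expected += learning_rate * error
    return expected, errors


expected, errors = learn_expectation(3)
print(expected, errors)
```

Under these assumptions the error at reward time shrinks as the reward becomes predictable, which matches the qualitative pattern in case B; the response to the light itself is not modeled here.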
Opioids and Hedonic (Liking) Value
• Endogenous opioids (endorphins) may mediate “liking.”
• Opiates (heroin, morphine) bind to the brain’s natural opiate receptors.
• Opiates may provide information about “liking” that helps stimulate the VTA’s “wanting” system.

8.2 Interim Summary
• In the brain, instrumental S–R–C associations may be stored in corticocortical connections and via the basal ganglia.
• The brain’s reinforcement system may include release of dopamine from the ventral tegmental area to the basal ganglia.
• Drugs that interfere with the dopamine system disrupt instrumental conditioning.

8.2 Interim Summary
• Several hypotheses on the interaction of dopamine and reinforcement:
    Anhedonia hypothesis:
        Dopamine gives reinforcers their “goodness.”
    Incentive salience hypothesis:
        Dopamine modulates “wanting” rather than “liking” (how hard an organism is willing to work for reinforcement).
    Reward prediction hypothesis:
        Dopamine signals whether reinforcement is expected.

8.2 Interim Summary
• Whereas dopamine may be involved in “wanting,” endogenous opioids may be involved in “liking.”
    Drugs that affect brain opiate receptors affect the hedonic (“goodness”) value of primary reinforcers and punishers (e.g., food and pain).

8.3 Clinical Perspectives
• Drug Addiction
• Behavioral Addiction
• Treatments

Drug Addiction
• Pathological addiction: a strong habit maintained despite harmful consequences.
    Involves craving a “high” (euphoria) and avoiding withdrawal.
        Seeking pleasure involves positive reinforcement.
        Avoiding pain involves negative reinforcement.
• As indicated by the incentive salience hypothesis, dopamine is involved in “wanting” a drug.

[Figure: Effects of Drugs on Dopaminergic Neurons]

Behavioral Addiction
• Behavioral addiction: addiction to certain behaviors, rather than drugs.
    Produces euphoria.
• Examples:
    Compulsive gambling, eating, sex, Internet use, shopping, exercise, work
• Understanding drug addiction may help understand/treat behavioral addictions.

Behavioral Addiction
• http://www.youtube.com/watch?v=Bz2VT5Ky7Kw

Treatments
• Naltrexone (drug) treatment:
    Indirectly inhibits dopamine production; may help treat heroin addicts and compulsive gamblers.
• (Cognitive) behavior therapies:
    e.g., extinction, distancing, reinforcement of alternative behaviors, delayed reinforcement
    Based on instrumental conditioning principles.

8.3 Interim Summary
• Addictive drugs (e.g., heroin, caffeine) may hijack the brain’s reinforcement system.
    Addiction may be psychological as well as physiological.
• Behavioral addictions may reflect the same brain processes as drug addictions.

8.3 Interim Summary
• Treatment for people with addictions may include:
    Cognitive therapy
    Medication
    Behavioral therapy
        Including principles learned from instrumental conditioning.