Transcript 4 - smw15.org

PSY 445: Learning & Memory
Chapter 4:
Instrumental Conditioning: Reward
INSTRUMENTAL CONDITIONING
E. L. Thorndike (1905)
• Described the learning governed by his
"law of effect" as instrumental
conditioning because responses are
strengthened when they are instrumental
in producing rewards
• Law of Effect
• Responses that are rewarded are
more likely to be repeated and
responses that produce discomfort
are less likely to be repeated
“Rewarded behaviors are more likely to recur”
E. L. Thorndike
(1874-1949)
TRIAL-AND-ERROR LEARNING
Thorndike's Puzzle Box
• In his classic experiment, a cat was locked in the
box and enticed to escape by food placed out of
reach outside the box
• The box contained ropes, levers, and latches that the
cat could use to escape
• Trial-and-error behavior would lead to ultimate
success (usually within three minutes)
• Thorndike felt that such trial-and-error learning
occurs with awareness
GESTALT VIEWPOINT
Wolfgang Kohler
• A Gestalt psychologist who held the opposing
view that we learn things implicitly –
without awareness – through natural insight
• Example: a chimpanzee in a cage – food out of
reach – but a stick is not…
INSTRUMENTAL CONDITIONING
Operant Conditioning
• A type of learning in which voluntary
(controllable and non-reflexive) behavior is
strengthened if it is reinforced and
weakened if it is punished (or not
reinforced)
B. F. Skinner
SKINNER’S OPERANT CONDITIONING
The organism learns a response by
operating on the environment…
Note:
• The terms instrumental conditioning and operant
conditioning describe essentially the same learning process
and are often used interchangeably
• Basically, Skinner extended and formalized many of
Thorndike's ideas
CLASSICAL CONDITIONING VS.
INSTRUMENTAL CONDITIONING
Instrumental Conditioning: The response comes first and
is voluntary, unlike classical conditioning, where the stimulus
comes first and the response is involuntary
Classical: S → R
Operant: S → R → S (a discriminative stimulus sets the occasion for
a response, which produces a reinforcing stimulus)
METHODS OF STUDY
Skinner Box
• Key-pecking of round plexiglass disk
placed at eye level
Mazes
• T-shaped; straight runway
Infants
• Head turning; Leg movements 
Computer games
POSITIVE REINFORCEMENT
Behavior is strengthened when something
pleasant or desirable occurs following the
behavior
• With the use of positive reinforcement, the chance
that the behavior will occur in the future is
increased
REINFORCEMENT VARIABLES
AFFECTING ACQUISITION
Amount /Quality of Reinforcement
• Contrast Effect
• Previous experience with the reward matters
• Qualitatively
• Kobre & Lipsitt (1972)
• Infants: Water → Sucrose → Water (infants sucked less for water after tasting sucrose)
• Quantitatively
• Crespi (1942): rats running for pellets
See next slide 
QUANTITY, QUALITY, & CONTRASTS OF
REINFORCEMENT
Amount of Reinforcement Effect/Contrast Effect
Crespi (1942)
Procedure
Rats running for pellets
Group 1: Initial small reward; then switched from small reward to
larger reward
Group 2: initial large reward; then switched from large reward to
smaller reward
Group 3: served as a control (no change in reward)
Results
Initially, Group 2 > Group 1
Group 1: Started running faster (positive contrast)
Group 2: Started running slower (negative contrast)
See next slide 
CRESPI (1942)
[Figure: Running speed (ft/sec, 0–4.5) over preshift trials (2–20) and postshift trials (2–8) for three groups: 256→16 pellets, 16→16 pellets, and 1→16 pellets.]
REINFORCEMENT VARIABLES
AFFECTING ACQUISITION
Drive
• Motivational need or desire for the reward
Raymond (1954)
Procedure
• Rats were deprived of food for varying lengths of time
Results
• Rats that had been deprived longer ran faster
REINFORCEMENT VARIABLES
AFFECTING ACQUISITION
Drive
Hull (1949)
Response strength = H x D x K
Rats will run fastest when:
H: high prior reinforcement (habit strength)
D: the rat is hungry (in deprivation state; high drive state)
K: the reinforcer is appealing (incentive)
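Hull's multiplicative rule can be sketched in code. The 0-to-1 scales and the example values below are illustrative assumptions, not Hull's actual units or data:

```python
# A minimal sketch of Hull's rule: Response strength = H x D x K.
# Scales and values here are illustrative assumptions.

def response_strength(habit: float, drive: float, incentive: float) -> float:
    """Return H x D x K: habit strength times drive times incentive."""
    return habit * drive * incentive

# A well-trained, hungry rat running for an appealing reinforcer:
strong = response_strength(habit=0.9, drive=0.8, incentive=0.9)

# Because the terms multiply, a zero anywhere predicts no responding:
# a fully sated rat (drive = 0) should not run, however well trained.
sated = response_strength(habit=0.9, drive=0.0, incentive=0.9)
```

The multiplicative form is the point of the sketch: unlike an additive rule, any single factor at zero drives response strength to zero.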
SCHEDULES OF REINFORCEMENT
A schedule of reinforcement is the response
requirement that must be met in order to obtain
reinforcement
Different schedules
• Continuous: usually better for acquisition
• Partial (intermittent): less extinction
SCHEDULES OF REINFORCEMENT
Stevenson & Zigler (1958)
Procedure
• Children performed a push-button task; one of three buttons
produced the reward
• 3 groups: 100% reward; 66% reward; 33% reward
Results
• Children had highest frequency of pressing the correct
button on the continuous schedule
• Children on partial schedules tried patterns that involved all
three buttons; many errors
Interpretation
• Partial schedules interfered with learning of the response-reward contingency
PARTIAL SCHEDULES
Ratio Schedule
• When you want to reinforce based on a certain
number of responses occurring
Interval Schedule
• When you want to reinforce the first response after
a certain amount of time has passed
FOUR TYPES OF PARTIAL SCHEDULES
Ratio Schedules
Interval Schedules
• Fixed Ratio
• Fixed Interval
• Variable Ratio
• Variable Interval
FIXED RATIO SCHEDULE
On a fixed ratio schedule, reinforcement is
contingent upon a fixed, predictable number of
responses
Characteristic pattern:
• High rate of response
• Short pause following each reinforcer
FIXED RATIO SCHEDULE
Higher ratio requirements result in longer post-reinforcement pauses
• Example: The longer the chapter you read, the
longer the study break!
Ratio Strain – a disruption in responding due to
an overly demanding response requirement
• Movement from “dense/rich” to “lean” schedule
should be done gradually
FIXED INTERVAL SCHEDULES
On a fixed interval schedule, reinforcement is
contingent upon the first response after a fixed,
predictable period of time
Characteristic pattern:
• Pattern includes the response/reward, then a post-reinforcement
pause, followed by a gradually increasing rate of response as the
time interval draws to a close
VARIABLE RATIO SCHEDULE
On a variable ratio schedule, reinforcement is contingent
upon a varying, unpredictable number of responses
Characteristic pattern:
• High and steady rate of response
• Little or no post-reinforcer pausing
• Telemarketing is an example of a behavior on this type of
schedule
Casino slot machines 
VARIABLE INTERVAL SCHEDULE
On a variable interval schedule, reinforcement is
contingent upon the first response after a varying,
unpredictable period of time
Characteristic pattern:
• A moderate, steady rate of response with little or no
post-reinforcement pause
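The four partial schedules can be sketched as simple predicates answering one question: does the current response earn reinforcement? The function names, and treating the variable schedules probabilistically, are illustrative assumptions rather than a standard implementation:

```python
import random

def fixed_ratio(n, responses_since_reward):
    # FR-n: reinforce after a fixed, predictable count of n responses.
    return responses_since_reward >= n

def variable_ratio(mean_n, rng=random):
    # VR-n: each response pays off with probability 1/mean_n, so the
    # required count varies unpredictably around mean_n.
    return rng.random() < 1.0 / mean_n

def fixed_interval(t, seconds_since_reward):
    # FI-t: reinforce the first response after a fixed t seconds.
    return seconds_since_reward >= t

def variable_interval(seconds_since_reward, required_wait):
    # VI: like FI, but required_wait is redrawn (around a mean) after
    # each reinforcer, so the wait is unpredictable.
    return seconds_since_reward >= required_wait
```

The ratio predicates depend only on response counts, the interval predicates only on elapsed time, which mirrors the definitions above.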
DELAY OF REINFORCEMENT
Delayed reinforcement is usually less effective than
reinforcement given immediately after a correct
response
Problems
• Other behaviors that occur during the delay may
unintentionally become conditioned
• The response may be forgotten by the time the reinforcer arrives
DELAY OF REINFORCEMENT
Lieberman, McIntosh, & Thomas (1979)
Procedure
• T-shaped maze: rats rewarded for a correct turn; no reward
for incorrect turn
• However, before reward/no reward they were put in a delay
box
Results
• Rats were slow to learn which turn was correct
Interpretation
• Difficulty remembering
Did I make the right turn?
DELAY OF REINFORCEMENT
Self-Control: The capacity to inhibit immediate
gratification
• Choice between small immediate reward vs. delayed large
reward
• A delay that is too long, or a delayed reward that is too small, undermines the choice of the larger reward
• Gradual increases can work
Logue, Forzano, & Ackerman (1996)
• Experiment on children: age matters as 3-year-olds were
more likely to choose a smaller, immediate reward than
were 5-year-olds
DELAY OF REINFORCEMENT
Self-Control
• Food is a tough one to wait for
• Even adults have trouble waiting for the bigger reward
REINFORCERS
Primary Reinforcers
• Innately rewarding; no learning necessary
• Stimulus that naturally strengthens any response
(increases behavior) that precedes it without the need for
any learning on the part of the organism
• Food, water, etc.
Secondary Reinforcers
• A consequence that is learned by pairing with a primary
reinforcer and thus increases behavior
• For people, money, good grades, and words of praise, etc.
are often linked to basic rewards
• We need money to buy food, etc.
SECONDARY REINFORCEMENT
Social Reinforcers
• Praise, attention, physical contact, facial
expressions given by parents, teachers, or
peers can exert considerable control over our
behavior
THEORIES OF REINFORCEMENT
Reinforcers as stimuli
• Drive Reduction
• Incentive Motivation
• Brain Stimulation
DRIVE REDUCTION THEORY
(HULL, 1943)
Supporters of this theory believe that when a
need requires satisfaction, it produces drives
• These are tensions that energize behavior in
order to satisfy a need
• Hunger and thirst, for instance, are drives for
satisfying the needs of eating and drinking,
respectively
DRIVE REDUCTION THEORY
Drives have been generally established as primary and
secondary…
• Primary drives satisfy biological needs and must be fulfilled
in order to survive
• Homeostasis is the motivational phenomenon for primary
drives that preserves our internal equilibrium. This is true,
for example, for hunger or thirst
• Secondary drives satisfy needs that are not crucial to a
person's life
Critics felt that this theory was inadequate in explaining
secondary drives
INCENTIVE MOTIVATION
Sometimes, we just do things because they are
FUN!
When this happens, we can say that motivation is
coming from some property of the reinforcer itself
rather than from some kind of internal drive
• Examples include playing games and sports,
putting spices on food, etc.
INCENTIVE THEORY
Suggests that people act to obtain positive
incentives and avoid negative incentives
• Explains secondary drives much better than
drive-reduction theory
BRAIN STIMULATION
Underlying physiological basis of reinforcement
• Possible part of the brain that is activated by stimuli that
work as reinforcers
Olds & Milner (1954)
• Stimulation of reticular formation in the rat’s brain was
reinforcing
Blum et al. (1996)
• Found some evidence that a genetic anomaly is
associated with a reward-craving syndrome
• People with a certain version of this gene become easily
addicted to compulsive behaviors (smoking, gambling)
REINFORCERS AS BEHAVIORS
Rather than characterizing reinforcers as stimuli, they can
be viewed as activities and behaviors
• This view clearly expands the category of reinforcers
Premack Principle (1962)
• The idea that behaviors can be ranked in terms of their
preference or value to an individual
• Once this is determined, a more probable activity can be
used to reinforce a less probable one
Limitations
• An unrestricted opportunity to engage in the
activity is necessary to determine its baseline
frequency of occurrence
• Behaviors will vary over time (deprivation, satiation)
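A minimal sketch of the Premack principle follows. The baseline minutes are invented for illustration; in practice they come from observing the individual with unrestricted access to all activities:

```python
# Hypothetical baseline observations (minutes spent per activity
# during a free-choice period); values are illustrative assumptions.
baseline_minutes = {
    "running around": 20,   # high-probability behavior
    "screaming": 15,        # high-probability behavior
    "sitting quietly": 2,   # low-probability behavior
}

def can_reinforce(reinforcer, target, baseline=baseline_minutes):
    # Premack principle: a more probable activity can be used to
    # reinforce a less probable one, but not the reverse.
    return baseline[reinforcer] > baseline[target]
```

The ranking is the whole mechanism: any activity higher in the ranking can serve as a reinforcer for any activity lower in it.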
REINFORCERS AS BEHAVIORS
Homme et al. (1963)
• Unruly preschoolers
High probability behaviors
• Ignored teacher
• Screaming
• Pushing furniture
Low probability behavior
• Sitting quietly
Premack Principle
HOMME ET AL. (1963)
Rewarded sitting quietly with...
• 3 min of running around screaming
Results
• Sitting quietly increased
The particular behaviors varied across kids
• Different responses were effective reinforcers for
different kids
Premack Principle
REINFORCERS AS STRENGTHENERS
A reinforcer can strengthen the association
between a discriminative stimulus and an
instrumental response
Light  Bar Press  Food
Food reinforcer strengthens the association
between the light and bar press
• A question remains: do rats do this because of
the reward (food), or because food strengthened
the association between light and bar press?
REINFORCERS AS STRENGTHENERS
Huston, Mondadori, & Waser (1974)
• Mice on platform in Skinner Box; natural reaction is to step
off
• Group 1: Step off platform  shock
• Group 2: Step off platform  shock  food
• If food is reward then Group 2 should step off again
OR
• If food is strengthener it will cause stronger link between
stepping off platform and shock
• Results?
REINFORCERS AS INFORMATION
No obvious reinforcer
• Information may be positive (“Yes, I got it right”) or
negative (“no, I messed up”)
IS REINFORCEMENT NECESSARY?
Tolman & Honzik (1930)
Exp. 1 (discussed in Chapter 1): latent learning among rats
not immediately reinforced
Exp. 2
Procedure
• Group 1: Reinforced every time they found their way out of
the maze (food in goal box) for 10 days; on day 11 no food
in the goal box
Results
• Rats started taking wrong turns
Interpretation
• Taking reinforcement away leads to confusion
AWARENESS IN HUMAN
INSTRUMENTAL LEARNING
Subliminal messages were thought to be so effective
that US congress passed laws prohibiting these
commercial messages in movies
New Jersey (1957) movie theatre
• Eat popcorn
• Drink Coca-Cola
AWARENESS IN HUMAN
INSTRUMENTAL LEARNING
Greenspoon (1955)
• Researcher would mutter “umm humm” whenever a plural
noun was emitted
• While subjects were not told of this contingency they
nevertheless began to use more plural nouns throughout
the course of the experiment
• Researcher’s conclusions: Verbal conditioning took place
without awareness
Dulany (1968)
• Replication suggests different conclusion
CRITICISMS OF THE USE OF
REINFORCEMENT
1. Manipulative form of control
2. Certain behaviors should be performed without
rewards
3. Reinforcement produces transient changes
4. Intrinsic motivation is undermined by rewards
• Internally motivated desire to perform a behavior
for its own sake may be lessened
DOES REINFORCEMENT UNDERMINE
INTRINSIC MOTIVATION?
Lepper, Greene, and Nisbett (1973)
Baseline observations:
• 51 3-5 yr olds who showed intrinsic interest in a
target activity
Procedure
• Expected-Award condition (reinforcement)
• Unexpected-Award condition (reinforcement)
• No-Award condition (no reinforcement)
Results
• Those reinforced colored less than those not
reinforced
DOES REINFORCEMENT UNDERMINE
INTRINSIC MOTIVATION?
Lepper, Greene, and Nisbett (1973)
Interpretation
• Overjustification
Limitations
• Children in the experimental groups that were rewarded for
drawing may have become satiated on the activity
RESPONSE LEARNING
Shaping
• Training a behavior not in an organism’s behavior
repertoire
• Reinforcing successive approximations is usually part of
this process (see next slide)
• Skinner taught pigeons “unpigeon-like” behaviors
• Others have trained monkeys to help quadriplegics
RESPONSE LEARNING
Chaining
• Here, the response being reinforced is an entire sequence
of behaviors
• Explanations
1. Each response also acts as the discriminative stimulus
for the next response in the series
2. Each response acts as a secondary reinforcer for the
previous response
Note
• Both forward-chaining and backward-chaining appear to
work equally well
RESPONSE LEARNING
Limitations
1. Some reflex responses cannot be modified
2. Species-specific limitations
Breland & Breland (1961)
• Tried to teach pigs to put wooden coins in a piggy bank by
offering food reward
• Researchers were unsuccessful; natural instincts won out
over reinforcement
RESPONSE LEARNING
Limitations
3. Evolutionary preparedness interference
4. Behavior system interference
DISCRIMINATIVE STIMULUS CONTROL
Discriminative Stimuli (SD )
• Stimuli that signals when (or where) reinforcement is
available
Stimulus Control
• The probability of the behavior varies depending upon the
stimuli present
• The response is brought under control of the stimulus
Talwar et al. (2002)
• Remote-control rat
STIMULUS CONTROL:
GENERALIZATION
Generalization is when responses to one stimulus occur to
other, usually similar stimuli
Generally, as the training and test stimuli become more
different responding will decline, producing what is called
a generalization gradient
Guttman & Kalish (1956)
• Pigeons were reinforced for pecking a 580 nm lit key
(orange-yellow) on a VI schedule
• A test session was then given where many different
colored key lights were presented in extinction
See next slide 
STIMULUS GENERALIZATION AS A
MEASURE OF STIMULUS CONTROL
[Figure: Generalization gradient. Responses (0–400) as a function of wavelength (500–640 nm), with responding peaking at the training SD of 580 nm.]
Pigeons were trained to peck in the presence of a colored light of 580 nm
wavelength and then tested in the presence of other colors.
Guttman & Kalish (1956)
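The shape of such a gradient can be sketched with a simple function. The Gaussian form and the parameter values are illustrative assumptions, not Guttman and Kalish's fitted curve:

```python
import math

def response_rate(test_nm, train_nm=580.0, peak=350.0, width=25.0):
    """Responding peaks at the training wavelength and falls off as
    the test stimulus grows more different from it (an assumed
    Gaussian-shaped generalization gradient)."""
    return peak * math.exp(-((test_nm - train_nm) ** 2) / (2 * width ** 2))
```

The defining property of the gradient is monotone decline on either side of the training stimulus: the more the test color differs from 580 nm, the less the pigeon pecks.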
STIMULUS CONTROL:
DISCRIMINATION
Discrimination training involves presenting at least 2
stimuli but reinforcing only one of them
• Discrimination is differential responding to multiple stimuli
• Responses are reinforced in the presence of SD, but these
responses are not reinforced in the presence of S∆, a stimulus that
signals the absence of reinforcement
Limitations
• Non-reinforced responses can produce negative reactions
(frustration, agitation)
• S∆ may become aversive through association with these negative
emotions
WHAT IS LEARNED IN
INSTRUMENTAL CONDITIONING
Response-Reinforcer Learning
• Organism performs the response to get reward
Stimulus-Response Learning
• Connection is learned between SD and the response
• Reinforcer acts to condition this association but is not part of the
learned sequence
• Singh (1970): “free rewards”
Stimulus-Reinforcer Learning
• Classical conditioning can occur
• Typical sequence in an instrumental trial is:
SD → Response → Reinforcement
HABITS
Habit Slips
• Intrusion of a habit when an alternative behavior had been
intended
• SD can evoke an instrumental response even though
changed conditions suggest that different response is
currently more appropriate
Breaking Habits
• Habits are easier to correct when the eliciting stimulus (SD) is
absent or disrupted
BEHAVIOR MODIFICATION
Successful programs follow rules of instrumental conditioning
• Punishment
• Ayllon (1963): stealing food in cafeteria example
• Eliminate the reinforcer
• Ayllon (1963): towel example
• Increase rewards
• Token economy (secondary reinforcer)
BEHAVIORAL ECONOMICS
Loss Aversion: Irrational actions related to being more sensitive to
potential losses than to potential equivalent gains
Chen, Lakshminarayanan, & Santos (2006)
Procedure
• Monkeys offered slices of apple in exchange for a token
• Two options:
• Person 1: showed two slices and sometimes gave one or both
• Person 2: showed one slice but sometimes gave a second one
• On average, each person gave the same number of slices to the monkeys
Results
• Monkeys began to avoid the person showing the two slices
Interpretation
• It seems like a loss when you appear to be offered two slices and only
get one as compared to when you see one slice and get it
BEHAVIORAL ECONOMICS
The Goal Gradient Hypothesis: The effect of a reward is
weaker the further away the behavior is from the reward
Kivetz, Urminsky, & Zheng (2006)
Procedure
• Reward program punch cards at campus coffee shop
• Experiment 1: regular punch cards (10 holes)
• Experiment 2: one group of students got regular punch cards
while a second group got "illusory" punch cards (12 holes,
but 2 already punched out)
Results:
• In Exp. 1, responding increased as students got closer to free
coffee; in Exp. 2, students with the "illusory" punch cards
responded more quickly than the others
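The manipulation in Experiment 2 reduces to simple arithmetic, sketched here (the variable names are mine, not the authors'):

```python
# Both cards require the same ten additional purchases, but the
# 12-hole card starts with nonzero apparent progress toward the goal.

def progress(punched, total):
    # Apparent fraction of the card already completed.
    return punched / total

regular_progress = progress(0, 10)    # regular card: no head start
illusory_progress = progress(2, 12)   # "illusory" card: 2/12 endowed

remaining_regular = 10 - 0
remaining_illusory = 12 - 2           # identical remaining requirement
```

Under the goal gradient hypothesis, the endowed progress moves customers "closer" to the reward, so responding speeds up even though the objective requirement is unchanged.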
CREDITS
Some slides prepared with the help of the following website:
• www.radford.edu/~pjackson/ExtinctIC.ppt