
Concepts of Conditioning
DR DINESH RAMOO
Classical Conditioning
CONCEPTS
Explanations of Classical Conditioning
 What is classical conditioning, really?
 As is often the case, the process
appeared simple at first, but later
investigation found it to be a more
complex and more interesting
phenomenon.
 Pavlov noted that conditioning depended on the timing between the CS and the UCS.
Pavlov’s Hypothesis
 Pavlov surmised that presenting the
CS and UCS at nearly the same time
caused a connection to grow in the
brain so that the animal treated the
CS as if it were the UCS.
 The figure illustrates the connections
before the start of training:
 The UCS excites a UCS centre in the
brain, which immediately stimulates
the UCR centre.
Pavlov’s Hypothesis
 The figure illustrates connections
that develop during conditioning:
Pairing the CS and UCS develops a
connection between their brain
representations.
 After this connection develops, the
CS excites the CS centre, which
excites the UCS centre, which
excites the UCR centre and
produces a response.
 Later studies contradicted that idea. For example, a shock (UCS) causes rats
to jump and shriek, but a conditioned stimulus paired with shock makes them
freeze in position.
 They react to the conditioned stimulus as a danger signal, not as if they felt a
shock. Also, in trace conditioning, where a delay separates the end of the CS
from the start of the UCS, the animal does not make a conditioned response
immediately after the conditioned stimulus but instead waits until almost the
end of the usual delay between the CS and the UCS.
 Again, it is not treating the CS as if it were the UCS; it is using it as a
predictor, a way to prepare for the UCS (Gallistel & Gibbon, 2000).
 It is true, as Pavlov suggested, that the longer the delay between the CS
and the UCS, the weaker the conditioning, other things being equal.
 However, just having the CS and UCS close together in time is not
enough.
 It is essential that they occur more often together than they occur apart.
That is, there must be some contingency or predictability between them.
 Consider this experiment: For rats in both Group 1 and Group 2,
every presentation of a CS is followed by a UCS, as shown in Figure
6.9. However, for Group 2, the UCS also appears at many other times,
without the CS. In other words, for this group, the UCS happens
every few seconds anyway, and it isn’t much more likely with the CS
than without it. Group 1 learns a strong response to the CS; Group 2
does not (Rescorla, 1968, 1988).
 Now consider this experiment: One group of rats receives a light (CS)
followed by shock (UCS) until they respond consistently to the light. (The
response is to freeze in place.)
 Then they get a series of trials with both a light and a tone, again followed by
shock. Do they learn a response to the tone? No. The tone always precedes the
shock, but the light already predicted the shock, and the tone adds nothing
new. The same pattern occurs with the reverse order:
 First rats learn a response to the tone and then they get light–tone
combinations before the shock. They continue responding to the tone, but not
to the light, again because the new stimulus predicted nothing that wasn’t
already predicted (Kamin, 1969).
 These results demonstrate the blocking effect: The previously established
association to one stimulus blocks the formation of an association to the
added stimulus.
 Again, it appears that conditioning depends on more than presenting two
stimuli together in time.
 Learning occurs only when one stimulus predicts another.
 Later research has found that presenting two or more stimuli at a time often
produces complex results that we would not have predicted from the results
of single-stimulus experiments (Urushihara, Stout, & Miller, 2004).
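One standard formal account of these findings is the Rescorla-Wagner model, in which learning is driven by prediction error rather than by mere pairing: stimuli gain associative strength only to the extent that the outcome is not already predicted. The sketch below shows how that rule reproduces blocking; the learning rate, trial counts, and the asymptote of 1.0 are illustrative assumptions, not values from the studies cited above.

```python
# Minimal Rescorla-Wagner sketch of the blocking effect (illustrative).
# V[s] is the associative strength of stimulus s; on each trial, every
# stimulus that is present shares an update proportional to the
# prediction error (lam minus the total prediction).

def train(V, present, lam, alpha=0.3, trials=50):
    for _ in range(trials):
        error = lam - sum(V[s] for s in present)  # prediction error
        for s in present:
            V[s] += alpha * error

V = {"light": 0.0, "tone": 0.0}
train(V, ["light"], lam=1.0)           # Phase 1: light alone -> shock
train(V, ["light", "tone"], lam=1.0)   # Phase 2: light + tone -> shock
print(V)  # light ends near 1.0; tone stays near 0.0: it is "blocked"
```

Because the light already predicts the shock, the prediction error in Phase 2 is near zero, so the tone gains almost no strength, matching Kamin's result.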
Operant Conditioning
Shaping Behaviour
 Suppose you want to train a rat to press a lever. If you put the rat in a box and wait,
the rat might never press it.
 To avoid interminable waits, Skinner introduced a powerful technique, called
shaping, for establishing a new response by reinforcing successive
approximations to it.
 To shape a rat to press a lever, you might begin by reinforcing the rat for standing
up, a common behaviour in rats. After a few reinforcements, the rat stands up more
frequently.
 Now you change the rules, giving food only when the rat stands up while facing the
lever. Soon it spends more time standing up and facing the lever. (It extinguishes its
behaviour of standing and facing in other directions because those responses are not
reinforced.)
 Next you provide reinforcement only when the rat stands facing the correct
direction while in the half of the cage nearer the lever.
 You gradually move the boundary, and the rat moves closer to the lever. Then
the rat must touch the lever and, finally, apply weight to it.
 Through a series of short, easy steps, you shape the rat to press a lever.
Shaping works with humans too, of course.
 All of education is based on the idea of shaping: First, your parents or
teachers praise you for counting your fingers; later, you must add and
subtract to earn their congratulations; step by step your tasks become more
complex until you are doing calculus.
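Shaping is itself a simple procedure: reinforce whatever meets the current criterion, then tighten the criterion. The following toy simulation illustrates the logic; the 0-10 "closeness to the lever" scale, the 1.05 reinforcement multiplier, and the trial counts are all illustrative assumptions, not a model of real rat learning.

```python
import random

# Toy shaping simulation: behaviours are "closeness to the lever" on a
# 0-10 scale, reinforcement multiplies the weight of the emitted
# behaviour, and the criterion is raised in successive approximations.

weights = {b: 1.0 for b in range(11)}   # initially all equally likely

def emit():
    # Sample a behaviour with probability proportional to its weight.
    total = sum(weights.values())
    r = random.uniform(0, total)
    for b, w in weights.items():
        r -= w
        if r <= 0:
            return b
    return 10

for criterion in range(0, 11, 2):       # successive approximations
    for _ in range(200):
        b = emit()
        if b >= criterion:              # meets the current criterion
            weights[b] *= 1.05          # reinforcement strengthens it

print("most probable behaviour:", max(weights, key=weights.get))
# Typically ends near 10, the full lever press.
```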
Chaining Behaviour
 Ordinarily, you don’t do just one action and then stop. You do a long sequence
of actions.
 To produce sequences of learned behaviour, psychologists use a procedure
called chaining.
 Assume you want to train an animal, perhaps a guide dog or a show horse, to
go through a sequence of actions in a particular order.
 You could chain the behaviours, reinforcing each one with the opportunity to
engage in the next one. First, the animal learns the final behaviour for a
reinforcement. Then it learns the next to last behaviour, which is reinforced
by the opportunity to perform the final behaviour. And so on.
 For example, a rat might first be placed on the top
platform as shown in figure f, where it eats food.
 Then it is put on the intermediate platform with a
ladder in place leading to the top platform.
 The rat learns to climb the ladder. After it has done
so, it is placed again on the intermediate platform,
but this time the ladder is not present. It must learn
to pull a string to raise the ladder so that it can climb
to the top platform.
 Then the rat is placed on the bottom platform (figure
a).
 It now has to learn to climb the ladder to the
intermediate platform, pull a string to raise the
ladder, and then climb the ladder again.
 We could, of course, extend the chain still further.
Each behaviour is reinforced with the opportunity for
the next behaviour, except for the final behaviour,
which is reinforced with food.
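Note that the chain is trained backward: the final link is established first, and each earlier link is then reinforced by the opportunity to perform the next one. Here is a minimal sketch of that training order, using the rat example above; the train function is only a stand-in for whatever procedure establishes one link.

```python
# Backward chaining: train the last link first; each earlier link is
# then reinforced by access to the link that follows it.

chain = [
    "climb ladder to intermediate platform",
    "pull string to raise ladder",
    "climb ladder to top platform",      # final link
]

def train(behaviour, reinforcer):
    # Stand-in for the actual training of one link.
    print(f"train {behaviour!r}, reinforced by {reinforcer!r}")

reinforcer = "food"                      # only the last link earns food
for behaviour in reversed(chain):
    train(behaviour, reinforcer)
    reinforcer = behaviour               # this link reinforces the one before it
```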
 People learn to make chains of responses too.
 First, you learned to eat with a fork and spoon.
 Later, you learned to put your own food on the plate before eating.
 Eventually, you learned to plan a menu, go to the store, buy the ingredients,
cook the meal, put it on the plate, and then eat it.
 Each behaviour is reinforced by the opportunity to engage in the next
behaviour.
 To show how effective shaping and chaining can be, Skinner performed this
demonstration: First, he trained a rat to go to the centre of a cage.
 Then he trained it to do so only when he was playing a certain piece of music.
 Next he trained it to wait for the music, go to the centre of the cage, and sit up on its
hind legs.
 Step by step he eventually trained the rat to wait for the music (which happened to
be the “Star-Spangled Banner”), move to the centre of the cage, sit up on its hind
legs, put its claws on a string next to a pole, pull the string to hoist the U.S. flag, and
then salute it.
 Only then did the rat get its reinforcement. Needless to say, a display of patriotism is
not part of a rat’s usual repertoire of behaviour.
Schedules of Reinforcement
 The simplest procedure in operant conditioning is to provide reinforcement
for every correct response, a procedure known as continuous reinforcement.
 However, in the real world, unlike the laboratory, continuous reinforcement
is not common. Reinforcement for some responses and not for others is known as
intermittent reinforcement.
 We behave differently when we learn that only some of our responses will be
reinforced. Psychologists have investigated the effects of many schedules of
reinforcement, which are rules or procedures for the delivery of reinforcement.
 Four schedules for the delivery of intermittent reinforcement are fixed
ratio, fixed interval, variable ratio, and variable interval. A ratio schedule
provides reinforcements depending on the number of responses. An interval
schedule provides reinforcements depending on the timing of responses.
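Because schedules are literally rules for delivering reinforcement, each one can be stated as a short decision rule. Here is a minimal Python sketch of the four rules described on the following slides; the particular numbers (a ratio of 6, a 15-second interval) are illustrative assumptions.

```python
import random

# Each function answers one question: is this response reinforced?

def fixed_ratio(response_count, ratio=6):
    # Reinforce every 6th response.
    return response_count % ratio == 0

def variable_ratio(mean_ratio=6):
    # Reinforce each response with probability 1/6, so reinforcement
    # comes after a variable number of responses averaging about 6.
    return random.random() < 1 / mean_ratio

def fixed_interval(now, last_reinforcement, interval=15.0):
    # Reinforce the first response made after 15 seconds have elapsed.
    return now - last_reinforcement >= interval

def variable_interval(now, next_available_time):
    # Reinforce the first response after an unpredictable delay; the
    # caller redraws next_available_time after each reinforcement.
    return now >= next_available_time
```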
Fixed-Ratio Schedule
 A fixed-ratio schedule provides reinforcement only after a certain
(fixed) number of correct responses have been made—after every sixth
response, for example.
 We see this schedule at work among pieceworkers in a factory, whose pay
depends on how many pieces they turn out, and among fruit pickers who get
paid by the bushel.
 A fixed-ratio schedule tends to produce rapid and steady responding.
Researchers sometimes graph the results with a cumulative record, in
which the line is flat when the animal does not respond, and it moves up
with each response.
 For a fixed-ratio schedule, a typical result would look like the figure.
 However, if the schedule requires a large number of responses for
reinforcement, the individual pauses after each reinforced response. For
example, if you have just completed 10 calculus problems, you may pause
briefly before starting your next assignment. After completing 100
problems, you would pause even longer.
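A toy simulation shows how runs of responding and post-reinforcement pauses produce the stair-step cumulative record: the line climbs one unit per response and stays flat during pauses. The FR-10 requirement, the pause length, and the one-response-per-time-step rate are illustrative assumptions.

```python
# Toy cumulative record for a fixed-ratio (FR-10) responder.

RATIO, PAUSE = 10, 4
cumulative, count, pause_left = [], 0, 0

for t in range(80):
    if pause_left > 0:
        pause_left -= 1            # post-reinforcement pause: flat line
    else:
        count += 1                 # one response per time step
        if count % RATIO == 0:     # reinforcement after every 10th response
            pause_left = PAUSE
    cumulative.append(count)

# Crude text plot: time runs left to right, responses accumulate upward.
for level in range(max(cumulative) // 5, -1, -1):
    print("".join("#" if c // 5 >= level else " " for c in cumulative))
```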
Variable-Ratio Schedule
 A variable-ratio schedule is similar to a fixed-ratio
schedule, except that reinforcement occurs after a variable
number of correct responses.
 For example, reinforcement may come after as few as one or
two responses or after a great many. Variable-ratio
schedules generate steady response rates.
 Variable-ratio schedules, or approximations of them, occur
whenever each response has about an equal probability of
success.
 For example, when you apply for a job, you might or might
not be hired. The more times you apply, the better your
chances, but you cannot predict how many applications you
need to submit before receiving a job offer.
Fixed-Interval Schedule
 A fixed-interval schedule provides reinforcement for the first
response made after a specific time interval.
 For instance, an animal might get food for only the first response it
makes after each 15-second interval.
 Then it would have to wait another 15 seconds before another
response would be effective. Animals (including humans) on such a
schedule learn to pause after each reinforcement and begin to
respond again toward the end of the time interval.
 The cumulative record would look like the figure. Checking your
mailbox is an example of behaviour on a fixed-interval schedule. If
your mail is delivered at about 3 P.M., and you are eagerly awaiting
an important package, you might begin to check around 2:30 and
continue checking every few minutes until it arrives.
Variable-Interval Schedule
 With a variable-interval schedule, reinforcement is available after a
variable amount of time has elapsed.
 For example, reinforcement may come for the first response after 2 minutes,
then for the first response after the next 7 seconds, then after 3 minutes 20
seconds, and so forth.
 You cannot know how much time will pass before your next response is
reinforced.
 Consequently, responses on a variable-interval schedule occur slowly but
steadily. Checking your e-mail is an example: A new message could appear at
any time, so you check occasionally but not constantly.
 Stargazing is also reinforced on a variable-interval schedule. The
reinforcement for stargazing—finding a comet, for example—appears at
unpredictable intervals. Consequently, both professional and amateur
astronomers scan the skies regularly.
Extinction of Responses Reinforced on Different Schedules
 Suppose you and a friend go to a gambling casino and bet on the roulette
wheel.
 Amazingly, your first 10 bets are all winners. Your friend wins some and loses
some.
 Then both of you go into a prolonged losing streak. Presuming the two of you
have the same amount of money available and no unusual personality quirks,
which of you is likely to continue betting longer?
 Your friend is, even though you had a more favourable early experience.
Responses extinguish more slowly after intermittent reinforcement (either a
ratio schedule or an interval schedule) than after continuous reinforcement.
Extinction of Responses Reinforced on Different Schedules
 Consider another example. Your friend Beth has been highly reliable.
Whenever she says she will do something, she does it.
 Becky, on the other hand, sometimes keeps her word and sometimes
doesn’t.
 Now both of them go through a period of untrustworthy behaviour.
 With whom will you lose patience sooner? It’s Beth. One explanation is
that you notice the change more quickly. If someone has been unreliable
in the past, a new stretch of similar behaviour is nothing new.
Classical and Operant Conditioning
INTERRELATIONSHIPS
Interrelationships of Classical and Operant Conditioning
 We have been discussing classical and operant conditioning as if they
were totally separate aspects of behaviour.
 However, it should not be surprising to find that there are
interconnections between the two: after all, organisms are constantly
producing many responses, both reflex and operant.
 In this sense, the distinction between the two types of learning is partly
a way of simplifying the analysis of behaviour, by breaking it into reflex
and operant components.
Interrelationships of Classical and Operant Conditioning
 In the real world, both processes can be occurring simultaneously.
 One striking example of this is negative reinforcement.
 You may recall that negative reinforcement utilizes a negative reinforcer in
order to increase the probability of a response.
 One form of this is escape, where a negative reinforcer is presented, and is
only removed after the organism makes the desired response.
 In this circumstance, the removal of the aversive stimulus is effectively like a
reward, so the behaviour becomes more likely (hence, reinforcement).
 For example, a dog given a mild shock through an electrified floor grid
will learn to jump to another chamber to escape the shock.
 Now, if a light flashes before the start of the shock, the dog will soon
anticipate the shock, and jump before the shock begins.
 This becomes avoidance – the dog is jumping in order to avoid the
negative reinforcer.
 This leads to an interesting problem: since the dog jumps before the shock, there is no
longer any experience of the original reinforcer – a circumstance that would lead to
extinction of the response if one were looking at positive reinforcement.
 So why does the dog keep jumping each time the light goes on?
 The light, of course, has become a discriminative stimulus, enabling the dog to respond
before the shock occurs.
 Still, why should the dog persist in jumping without at least an occasional experience of
shock?
 The answer seems to be that, through classical conditioning, the light has become a CS
associated with the UCS of shock – which is a perfect scenario for creating a conditioned
fear. Thus, the dog continues to jump, not to avoid the shock, but to escape from the feared
light! (Mowrer, 1956; Rescorla & Solomon, 1967).
 Recognizing that the two processes (operant and classical conditioning) are
occurring together also adds to our understanding of conditioned fears.
 Watson, in his demonstration with little Albert, discovered that conditioned fears do
not readily extinguish.
 The reason for this seems to be that the feared stimulus (the CS) also triggers
operant escape behaviour.
 This escape response removes the individual from the situation before there is an
opportunity to determine if the UCS will follow or not – thereby preventing the
conditions necessary for extinction. (The same mixture of classical and operant
responses happens in the shower when we hear the toilet flush: while we fear the
sound, we also tend to jump away from the water spray to avoid being scalded.)
 The fact that fear stimuli can evoke an operant response is a very significant
point for understanding those everyday fears that are called phobias.
 If, as Watson argued, such fears are based on classical conditioning, then it is
also likely that the fears persist long after the original experience, because we
avoid the situations that elicit the fear.
 As a result, there is no opportunity to find out if our fear is realistic or not. For
example, a person who is afraid of flying will be reluctant to fly, and therefore
has no chance to find out that flying is safe, and that there is nothing to fear.
 In essence, until we face the fear situation, there is no opportunity to
extinguish the fear response.
 Another type of interaction can occur in which conditioned behaviours
are also sustained by reinforcement.
 For example, a phobia may arise through classical conditioning, but the
individual may also be positively reinforced by attention and sympathy
from other people. In such circumstances, the individual may be
unlikely to try to change.
Questions?