Instrumental Conditioning Foundations
Chapter 5 – Instrumental Conditioning: Foundations
• Outline 1
– Comparison of Classical and Instrumental
Conditioning
– Early Investigations of Instrumental Conditioning.
• Thorndike
– Chicks and mazes
– Cats and puzzle box
– Modern approaches to the study of instrumental
conditioning
• Discrete-trial procedures
• Free-operant procedures
– Magazine training and shaping
– Response rate as measure of operant behavior
– Instrumental Conditioning Procedures
• Positive Reinforcement
• Negative Reinforcement (Escape or Avoidance)
• Positive Punishment (Punishment)
• Negative Punishment (Omission Training/DRO)
• Comparison of Classical and Instrumental
Conditioning.
– Classical = S-S relationship
• Light-shock – elicits fear
• Tone-food – elicits salivation
• In Classical Conditioning there is no response
requirement.
– Instrumental = R-S relationship
• We will refer to it as R-O
• Behavior (Response) is instrumental in producing the
outcome
– Press lever – Get food
– Pull lever – Get money
• In Instrumental Conditioning a particular response is
required
• Keep in mind Classical and Instrumental
conditioning are approaches to
understanding learning.
– not completely different kinds of learning.
– Many learning situations could be described by
either approach.
• Child touches hot stove
– CS, US, UR, CR?
– Pavlovian = fear stove
– Instrumental = less likely to approach
• Conditioned Taste Aversion
– S-S = Taste – LiCl
– R-O= Drink liquid - get sick (punished)
• Early Investigations of Instrumental
Conditioning.
• Edward Lee Thorndike (American)
– Same time as Pavlov
• Late 1800s; early 1900s
– interested in animal intelligence.
• late 19th century - many people believed that
animals reasoned – like people do
– Romanes
– Stories of amazing abilities of animals.
• biased reporting?
– Report interesting behavior
– ignore stupid
• Thorndike (1898)
– “Dogs get lost hundreds of times and no one ever
notices it or sends an account of it to a scientific
magazine, but let one find his way home from
Brooklyn to Yonkers and the fact immediately
becomes a circulating anecdote. Thousands of
cats on thousands of occasions sit helplessly
yowling, and no one takes thought of it or writes
to his friend, the professor; but let one cat claw at
the knob of a door supposedly as a signal to be
let out, and straightway this cat becomes
representative of the cat-mind in all the books…In
short, the anecdotes give really
the…supernormal psychology of animals.”
• Thorndike attempted to understand normal
or ordinary animal intelligence.
– Chicks in a maze
– Cats in a box
• Puzzle box video
• Thorndike tested many animals
– chicks, cats, dogs, fish, and monkeys.
– little evidence for reasoning.
• Instead, learning seemed to result from trial and error, with accidental success
• Modern approaches to the study of
instrumental conditioning
– Discrete-trial procedures
• Like Thorndike’s work
• Simpler mazes though
– Figure 5.3
» Straight alleyway (running speed)
» T-Maze (errors)
– Later
» radial arm maze
» Morris Water Maze
– Note
• each run is separated by an intertrial interval
• Just like in Pavlovian Conditioning
[Slide images: 8-arm radial maze; Morris water maze]
• Free-operant procedures
– There are no “trials”
– The animal is allowed to behave freely
• Skinner Box
– an automated method for gathering data from
animals
• Skinner
• Rat in Operant Chamber
• Skinner used these boxes to study
operant conditioning
– operant
• any response that “operated” on the environment
– Defined in terms of the effect it has on the
environment
• Pressing a lever for food
– Doesn’t matter how the rat does it
• Right paw, left paw, tail
• As long as it actuated the switch
– Similar to opening a door
• Doesn’t matter which hand you use
• Or foot (carrying groceries)
• Just as long as the result is achieved
• Magazine training and shaping
– First have to train the animals about
availability of food
– training by successive approximations
(shaping)
• Shaping a rat
• Response rate as measure of operant
behavior
– In a free-operant situation you do not have
measures such as percent correct, or errors.
– Skinner used response rate as a primary
measure
– We will see later that various schedules of
reinforcement affect response rate in various
ways.
• Instrumental Conditioning Procedures
• First let’s get some terminology down
– Positive
• Behave = Stimulus applied
– Negative
• Behave = Stimulus removed
– Reinforcement
• Behavior increased
– Punishment
• Behavior decreased
• 2 x 2 table

                    Positive (applied)        Negative (removed)
  Reinforcement     Positive Reinforcement    Negative Reinforcement
  Punishment        Positive Punishment       Negative Punishment
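The four cells of the table can be captured in a small lookup. A minimal sketch in Python; the `classify` helper and its argument names are illustrative, not part of the lecture:

```python
def classify(stimulus: str, behavior: str) -> str:
    """Name the instrumental conditioning procedure.

    stimulus: 'applied' (positive) or 'removed' (negative)
    behavior: 'increases' (reinforcement) or 'decreases' (punishment)
    """
    table = {
        ("applied", "increases"): "Positive Reinforcement",
        ("removed", "increases"): "Negative Reinforcement (Escape/Avoidance)",
        ("applied", "decreases"): "Positive Punishment",
        ("removed", "decreases"): "Negative Punishment (Omission Training/DRO)",
    }
    return table[(stimulus, behavior)]

print(classify("applied", "increases"))   # Positive Reinforcement
print(classify("removed", "decreases"))   # Negative Punishment (Omission Training/DRO)
```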
• Application
– Box 5.2 in book
• Mentally disabled woman
• Head banging behavior
• Possibly to get attention - reward
– Change contingencies
• Ignore head banging
• Social rewards when not head banging
– Procedure?
• Negative Punishment
– Differential Reinforcement of Other Behavior (DRO)
– ABAB design
• Your book uses somewhat different
terminology
– Positive Reinforcement
• Same
– Escape or Avoidance
• Negative Reinforcement
– Punishment
• Positive Punishment
– Omission Training/DRO
• Negative Punishment
• We will use my terminology
• Outline 2
– Fundamental Elements of Instrumental
Conditioning
• The Response
– Behavioral variability vs. Stereotypy
– Relevance or Belongingness
– Behavior Systems and Constraints on Instrumental
conditioning
• The Reinforcer
– Quantity and Quality of the Reinforcer
– Shifts in Reinforcer Quality or Quantity
• The Response-Outcome Relation
– Temporal Relation
– Contingency
– Skinner's Superstition experiment
– The Triadic Design
– Learned helplessness hypothesis
• Behavioral variability vs. Stereotypy
– Thorndike emphasized the fact that reinforcement increases the
likelihood of the behavior being repeated in the future
• Uniformity – stereotypy
• This is often true
– You can increase response variability, however, by requiring it
• Page & Neuringer, 1985
• Two keys (50 trials per session)
– Novel group
» Peck 8 times
» Do not repeat a pattern that was used in the last 50 trials
» LRLRLRLR
» RLRLRLRL
» LLRRLLRR
» LLLLRRRR
– Control group
» RF 8 pecks
» Doesn’t matter how they do it.
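The Novel group's contingency can be sketched as a small simulation. This is an illustrative reconstruction of the rule described above (reinforce an 8-peck left/right pattern only if it has not occurred in the last 50 trials); the function names are hypothetical, not from the study:

```python
from collections import deque

def make_novelty_check(window: int = 50):
    """Return a function that reinforces an 8-peck pattern (a string of
    'L'/'R') only if it did not occur in the last `window` trials."""
    recent = deque(maxlen=window)  # sliding window of recent patterns

    def reinforce(pattern: str) -> bool:
        novel = pattern not in recent  # must differ from all recent trials
        recent.append(pattern)
        return novel

    return reinforce

check = make_novelty_check()
print(check("LRLRLRLR"))  # True  (first occurrence, reinforced)
print(check("LRLRLRLR"))  # False (repeated within the window, not reinforced)
print(check("LLRRLLRR"))  # True
```

The Control group's rule is the degenerate case: any 8 pecks are reinforced regardless of pattern.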
• Figure 5.8
• Reward and originality (creativity)? See Box 5.3
• Relevance or Belongingness
• We have discussed this in Classical
Conditioning
• Bright Noisy Tasty water
• Peck differently for water and Grain
• It has also been studied extensively in the
Instrumental literature
• Originally noted by Thorndike
• Cat – puzzle box
• Train cat to yawn to escape
• Or scratch themselves
• Did not go well
• Brelands’ “The Misbehavior of Organisms” (1961)
• Play on Skinner’s “Behavior of Organisms” (1938)
• students of Skinner
– training animals to do tricks as advertising gimmicks
• raccoon and coin(s)
– Shaped to pick up coin
– Then to place in bank
– Then 2 coins
• pig and wooden coin
• Instinctive drift
– Arbitrarily established responses drift toward
more innately organized behavior (instinct)
• The arbitrary operant
– place coin in bank
• Instinctive drift
– species specific behaviors related to food
consumption
• Wash food
• Root for food
Behavior Systems and Constraints on Instrumental conditioning
• Behavior Systems (Timberlake)
• the response that occurs in a learning episode is
related to the particular behavioral system that is
active at the time
• If you are a hungry rat and food is the reward
• behaviors related to foraging will increase
• If you are a male quail maintained on a light cycle
that indicates mating season and access to a
female is being offered
• mating behaviors will be elicited
Behavior systems continued
• The effectiveness of any procedure for increasing an
instrumental response will depend on the compatibility of that
response with behavioral system currently activated
– rats pressing levers for food?
– pigeons pecking keys for food?
– Very easy to train
• Even works for fish
– Easy
• bite a stimulus associated with a rival male
• Swim through hoops for a stimulus associated with female
– Difficult
• Bite stimulus associated with access to female
• Swim through hoops for access to rival male
The Reinforcer
• How do qualities of the Reinforcer affect
Instrumental Conditioning?
– Quantity and Quality of the Reinforcer
• Just like in Pavlovian conditioning.
– More intense US → better conditioning
• Trosclair-Lassere et al. (2008)
– Taught autistic child to press button for social reward
» Praise, hugs, stories
– Social reward
» 10 s
» 105 s
» 120 s
– Progressive ratio
• Magnitude of RF and drug abstinence
– Perhaps not surprising
• The more you pay the better addicts do.
• Shifts in Reinforcer Quality or Quantity
• How well a reinforcer works depends on what
subjects are used to receiving
– Mellgren (1972)
• Straight alleyway
– Phase 1
» Half – found 2 pellets (low reward)
» Half – found 22 pellets (high reward)
– Phase 2
» Half from each group switched to opposite condition
» Other half stay the same
• 4 conditions
• H-H: high RF control
• L-L: low RF control
• L-H: positive contrast
• H-L: negative contrast
• Results – Figure 5.10
– [Note: Domjan changed the labels to Small (S) and Large (L) rewards in the text]
• Response- Outcome Relation
– Temporal relation?
• contiguity
– Causal relation?
• Contingency
• Independent
– Contiguous doesn’t mean contingent
• You wake up – sun rises
– Contingent not always contiguous
• Submit tax returns
– Wait a few weeks for the money
• Effects of temporal relationship
– Hard to get animals to respond if long delay
between response and reward
– Delays are tough experiments to run using
free operant procedure
• Allow barpressing during the “delay”?
– Some barpresses likely close to RF
• Enforce no barpressing after initial barpress?
– Delay RF?
• In study in book (Fig. 5.11) each bar press resulted in RF at a
specified set time
• For some animals the delay was short
– 2-4s
• For some it was long
– 64 s
• Example (16 s delay)
– Bar press 1 at 1 s (RF1 = 17 s)
– Bar press again at 3 s (RF2 at 19s)
• 14 s from RF1
– Bar press again at 12 s (RF3 at 28 s)
• 5 s from RF1
– There will still be some “accidental” contiguity
• Graph shows how responding is affected by actually experienced
delays
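The experienced delays in the example above can be computed directly. A minimal sketch, assuming each press schedules a reinforcer at press time plus the programmed delay, and the delay the animal actually experiences is the time to the next reinforcer delivery (the function name is illustrative):

```python
def experienced_delays(press_times, programmed_delay):
    """For each press, find the earliest scheduled reinforcer at or after
    that press; return (press_time, experienced_delay) pairs in seconds."""
    rf_times = [t + programmed_delay for t in press_times]
    pairs = []
    for press in press_times:
        # "Accidental" contiguity: a later press may fall close to a
        # reinforcer scheduled by an earlier press.
        delay = min(rf - press for rf in rf_times if rf >= press)
        pairs.append((press, delay))
    return pairs

# The slide's 16-s example: presses at 1, 3, and 12 s schedule
# reinforcers at 17, 19, and 28 s.
print(experienced_delays([1, 3, 12], 16))  # [(1, 16), (3, 14), (12, 5)]
```

Only the first press experiences the full programmed delay; the later presses are followed by earlier-scheduled reinforcers after 14 s and 5 s, which is the accidental contiguity the graph plots against.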
• Why are animals so sensitive to the delay
between response and outcome?
– Delay makes it difficult to determine which
behavior actually caused RF
• Press lever (contingent after 20 s)
– Scratch ear
– Dig in bedding on the floor
– Rear up
– Clean face
» Reward
• All of the other behaviors are more contiguous with RF
than is lever pressing
– A marking procedure can help maintain
responding over a long delay
• Provide a light or click after “target” response
– Lever press – click .....20 s food
• Helps animals bridge the gap
• Response-Reinforcer Contingency
– Skinner thought contiguity was more
important than contingency
– Superstition experiment
– Superstition and bowling video from YouTube
• relevant content begins at 3:12
• Reinterpretation of Superstition Experiment
• Behavioral systems again
– Different kinds of responses occur when periodic RF is used
• Focal search
– Behaviors near food cup as time for RF approaches
» “I know its coming”
• Post-food focal search
– Again - activity near cup
» “Did I miss any?”
• General search
– Move away from cup
» This is probably when Skinner saw the turning and head-tossing behaviors
» “I have to wait, might as well look around”
» “I am also a bit frustrated”
– There is evidence for these patterns of responding
• Staddon and Simmelhag (1971)
– There is also evidence that slightly different patterns emerge with food
vs. water RF.
• Effects of the controllability of Reinforcers
• Is having control a good thing?
• We briefly mentioned the Brady Executive
Monkey study earlier.
• That study implied having operant control
over outcomes could be bad for the animal
• That study was confounded
• Better evidence for the effects of control over
outcomes comes from the learned
helplessness literature
• Learned helplessness hypothesis
– Animals learn that they have no control over the
shock
– Creates an expectation
• expect shock regardless of their behavior
• Implications
– 1. This expectation reduces motivation to make
an instrumental response
– 2. Even if they do escape/avoid shock, it is
harder for them to associate their behavior with
the outcome.
• Shock was independent of behavior in the past
• Similar to US preexposure effect