Punishment and Learning


Instrumental/Operant Conditioning
Thorndike’s Puzzle Box
Result
Thorndike’s Law of Effect
• “Of several responses made to the same
situation, those which are accompanied or
closely followed by satisfaction…will be
more likely to recur”
Situation: Puzzle Box
Response: Pull Loop
Outcome: Meat or Fish
S-R association
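The Law of Effect can be read as a simple update rule: a response that is followed by satisfaction gains strength and so becomes more likely to recur. The sketch below is only an illustration under assumed details; the function name, the list of candidate responses, the starting strengths, and the step size are hypothetical and do not come from Thorndike's procedure.

```python
import random

def train(responses, satisfying, trials=200, step=1.0, seed=0):
    """Strengthen whichever response is followed by a satisfying outcome.

    responses:  candidate responses in one situation (hypothetical list)
    satisfying: the one response that produces the outcome
    Returns the final response strengths.
    """
    rng = random.Random(seed)
    strength = {r: 1.0 for r in responses}  # all responses start equal
    for _ in range(trials):
        # Emit a response with probability proportional to its strength.
        names = list(strength)
        weights = [strength[name] for name in names]
        chosen = rng.choices(names, weights=weights)[0]
        if chosen == satisfying:
            # Law of Effect: satisfaction makes this response more likely.
            strength[chosen] += step
    return strength

strengths = train(["scratch", "meow", "pull loop"], satisfying="pull loop")
print(strengths)
```

In this sketch only the response that earns the outcome ever gains strength, so it comes to dominate responding in the situation.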
Two Theories
• Thorndike
– Stimulus associated with response (S-R), so the
response is a “habit” triggered by the situation
• Grandmother
– “Cat is working to get food” (R-O)
Situation: Puzzle Box
Response: Pull Loop
Outcome: Meat or Fish
R-O association
Test of Grandma’s Theory
• Stage 1: Train instrumental S-R-O
• Stage 2: Alter value of O (devalue) in the
absence of R and S
• Stage 3: Test to determine if R is reduced
[Figure: Responding (0 to 100) for Non-Devalued, Devalued, and Untrained conditions]
Shaping
• Shaping is a method for encouraging
novel behavior
– Reinforcing successive approximations to the
target behavior
Types of Reinforcers
• Primary Reinforcers satisfy a need and
reinforce behavior without any special
experiences
• Secondary Reinforcers become valuable
through association with primary
reinforcers
Delay of Reinforcement
• Delayed reinforcers are steeply discounted
• Loss of self-control and impulsivity
• Precommitment
[Figure: Reinforcer Potency (0 to 100) versus Delay (-9 to 0) for a small immediate and a large delayed reinforcer]
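A minimal sketch of why steep discounting can produce impulsive choices, and why precommitment helps. The hyperbolic form value = amount / (1 + k * delay), the parameter k, and the reward sizes and delays are illustrative assumptions, not values taken from the slide.

```python
def discounted_value(amount, delay, k=1.0):
    """Assumed hyperbolic discounting: amount / (1 + k * delay)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1.0 + k * delay)

def preferred(time_until_small, small=50.0, large=100.0, gap=3.0, k=1.0):
    """Return which reward currently has the higher discounted value.

    The small reward arrives after `time_until_small` time units;
    the large reward arrives `gap` time units after the small one.
    """
    v_small = discounted_value(small, time_until_small, k)
    v_large = discounted_value(large, time_until_small + gap, k)
    return "small" if v_small > v_large else "large"

# Far in advance, the larger reward is preferred.
print(preferred(time_until_small=9))
# Once the small reward is immediately available, preference reverses.
print(preferred(time_until_small=0))
```

With these assumed numbers the large delayed reward is preferred while both rewards are distant, but the small reward wins once it is immediately available. Precommitting while both are still distant locks in the larger reward.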
Stimulus Discrimination
Stimulus Discrimination?
Positive and Negative Reinforcement
Shuttle Box
Escape versus Avoidance Conditioning
Schedules of Reinforcement
• Continuous Reinforcement Schedule:
Reinforcer is delivered every time a
particular response occurs.
• Partial or Intermittent Reinforcement
Schedule: Reinforcement is given only
some of the time.
Partial Reinforcement Schedules
• Fixed Ratio (FR): Reinforcement occurs after a
fixed number of responses.
• Variable Ratio (VR): Reinforcement occurs after
a variable number of responses.
• Fixed Interval (FI): Reinforcement occurs for
the first response after a fixed time interval.
• Variable Interval (VI): Reinforcement occurs for
the first response after a variable time interval.
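The four schedule definitions can be sketched as small classes that decide whether a given response earns a reinforcer. This is an illustration only: the class names, the respond(t) interface, and the uniform distributions used for the variable schedules are assumptions, not part of the lecture.

```python
import random

class FixedRatio:
    """FR: reinforce every n-th response (time t is ignored)."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR: reinforce after a varying number of responses, averaging n."""
    def __init__(self, n, rng=None):
        self.n = n
        self.rng = rng if rng is not None else random.Random()
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: requirement is uniform on 1..2n-1, so its mean is n.
        return self.rng.randint(1, 2 * self.n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False

class FixedInterval:
    """FI: reinforce the first response once `interval` has elapsed."""
    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False

class VariableInterval:
    """VI: like FI, but the waiting time varies around `interval`."""
    def __init__(self, interval, rng=None):
        self.interval = interval
        self.rng = rng if rng is not None else random.Random()
        self.available_at = self._draw()

    def _draw(self):
        # Assumption: wait is uniform on 0..2*interval, so its mean is interval.
        return self.rng.uniform(0, 2 * self.interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False

# One response per time step on FR 3: every third response is reinforced.
fr = FixedRatio(3)
print([fr.respond(t) for t in range(9)])
```

On the ratio schedules, responding faster earns reinforcers faster. On the interval schedules, extra responses before the interval has elapsed earn nothing.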
Schedules and Extinction
• Failure to reinforce a response eventually
extinguishes it.
• Partially reinforced responses are more
difficult to extinguish.
– “Partial reinforcement extinction effect”
– “Superstitious behavior” is resistant to
extinction for this reason
Why Reinforcers Work
• Response deprivation hypothesis: a behavior
(drinking, eating, etc.) becomes reinforcing when
the opportunity to engage in it has been restricted
• Physiological
– James Olds and “pleasure centres”
– Nucleus Accumbens and Dopamine
Punishment and Learning
• Punishers decrease the probability of the
immediately preceding response
– Two kinds of punishment.
• Negative Reinforcement versus
Punishment
– Negative Reinforcement: Strengthens
behavior
– Punishment: Weakens behavior
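The distinction between reinforcement and punishment, and between their positive and negative forms, can be sketched as a lookup on two questions: is a stimulus added or removed after the response, and does the behavior increase or decrease afterwards? The function name and argument values are hypothetical. The labels "positive punishment" and "negative punishment" are the standard textbook terms and are assumed here to correspond to the two kinds of punishment mentioned above.

```python
def classify(stimulus_change, behavior_change):
    """Name a consequence from its effect on the stimulus and the behavior.

    stimulus_change: "added" or "removed" (what follows the response)
    behavior_change: "increases" or "decreases" (future responding)
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    key = (stimulus_change, behavior_change)
    if key not in table:
        raise ValueError(f"unrecognized combination: {key!r}")
    return table[key]

# Escaping a shock strengthens the escape response.
print(classify("removed", "increases"))  # negative reinforcement
# A fine takes money away and weakens the fined behavior.
print(classify("removed", "decreases"))  # negative punishment
```

The key point is that "negative" refers to removing a stimulus, not to weakening behavior: negative reinforcement still strengthens the response.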
Figure 5.11: Two Kinds of Punishment
Drawbacks of Punishment
• Only suppresses unwanted behavior
• Unwanted side effects
– target may become aggressive or avoidant
• Often ineffective unless a strong punisher
is given immediately after every response
(e.g., red light camera)
• Does not specify what should be done.
Guidelines for Effective Punishment
• Specify why punishment is being given
• Emphasize the behavior, not the person,
being punished
• Without being abusive, make sure the
punishment is immediate and noticeable
• Identify and positively reinforce more
appropriate responses.
Some Applications of Instrumental
Conditioning
• Classroom Management
• Token Economies in Mentally Challenged
• Autism
• Self-Control
Other Specialized Forms of Learning
• Spatial Learning
• Knowledge Attribution
• Helplessness
• Observational Learning
– Mirror neurons