Reinforcement & Punishment:
What is an SR?
Lesson 8
What is an SR?
Thorndike’s Law of Effect
 Satisfiers & annoyers
 Skinner
 determined by how B changes
 reinforcer: ↑ B
 punisher: ↓ B
 Primary reinforcers & punishers


biologically important stimuli ~
What is an SR? (continued)
Secondary reinforcers & punishers
 money
 praise
 How do they become an SR?
 Classical Conditioning
 Higher order learning ~

Drive Reduction View
(50s & 60s)
Similar to Law of Readiness
 Relative state of deprivation required
 for a basic drive
 thought to always be true
 Drive → motivation
 B → reduction of drive state (SR)
~
But...
Sometimes hard to identify drive
 What drive is this? ~

Sensory reinforcement
Sensory stimulus unrelated to
biological drive
 monkeys learn response
 reward is watching toy train
 rats learn to bar press
 reward = turning on a light
 or turning off light ~

Premack Principle
Commonly used in educational setting
 impractical or unethical to use food
 Thought of reinforcers as responses
 press bar → eating response
 wider application of I/O conditioning
 Differential probability principle
 High probability responses
reinforce low probability responses ~
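
The differential probability principle can be sketched in a few lines of Python. This is only an illustration: the baseline proportions below are hypothetical, and the function names `can_reinforce` and `candidate_reinforcers` are made up for this example rather than taken from the lecture.

```python
# Hypothetical baseline data: proportion of free time spent on each response.
# These numbers are invented for illustration only.
baseline = {
    "eating": 0.50,
    "grooming": 0.20,
    "pressing bar": 0.05,
}


def can_reinforce(baseline, reinforcer, target):
    """Differential probability principle: a response can reinforce
    another response only if its baseline probability is higher."""
    return baseline[reinforcer] > baseline[target]


def candidate_reinforcers(baseline, target):
    """Return the responses more probable than the target, most probable first."""
    higher = [b for b in baseline if can_reinforce(baseline, b, target)]
    return sorted(higher, key=lambda b: baseline[b], reverse=True)


if __name__ == "__main__":
    print(can_reinforce(baseline, "eating", "pressing bar"))
    print(candidate_reinforcers(baseline, "pressing bar"))
```

Under these assumed numbers, eating qualifies as a reinforcer for bar pressing, but bar pressing does not qualify as a reinforcer for eating.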

Premack Principle
Homme et al. (1963)
 Unruly 3-year-olds
 High probability behaviors
 ignoring teacher
 screaming
 pushing furniture
 Low probability behavior
 sitting quietly ~

Premack Principle: Homme et al.
Rewarded sitting quietly with...
 3 min of running around screaming
 Results: sitting quietly increased
 Particular behaviors varied across kids
 different responses were effective reinforcers for different kids ~

Premack Principle
Charlop, Kurtz, & Casey (1990)
 autistic children
 High probability behaviors
 echolalia
 perseveration
 Low probability behaviors
 adding up coins
 judging objects: same or different ~

Premack Principle: Charlop et al.
[Figure: % correct responses (y-axis, 40 to 100) vs. # of sessions (x-axis), comparing echolalia RFT with food RFT] ~
Premack Principle: Problems
Fluctuation of response probabilities
 e.g., sometimes a kid would rather play outside than play video games
 Solution: token economies
 Does not explain how reinforcer
increases response probability ~

Behavioral Regulation Approach
Response deprivation
 limit access to a response
 does not require high vs. low probability
 Behavioral homeostasis
 preferred distribution of activities
 operant conditioning imposes limits
 behavioral bliss point

 e.g., time spent studying vs. video games ~
Behavioral Regulation Approach

A behavior is limited below bliss point
 disturbance of behavioral homeostasis
 analogous to increased biological drive
Contingency set during I/O procedure
 establish relationship between responses
 B → move toward bliss point (baseline) ~
Behavioral Regulation Approach
Low probability behaviors as reinforcers
 observe baseline rate of behavior
 limit activity below baseline
 Require a response to engage in deprived behavior (contingency)
 Increase toward bliss point
 cost vs. benefits determines how much ~
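
The response deprivation idea above can be sketched as a simple check. This is a minimal illustration under stated assumptions: the minute values are hypothetical, and `is_deprived` and `expected_effect` are names invented for this example. It does not model the cost vs. benefit trade-off, only whether access is limited below baseline.

```python
def is_deprived(baseline_minutes, allowed_minutes):
    """An activity is deprived when access is limited below its baseline
    (bliss point) level."""
    return allowed_minutes < baseline_minutes


def expected_effect(baseline_minutes, allowed_minutes):
    """Describe whether access to the activity is expected to work as a
    reinforcer under the response deprivation view."""
    if is_deprived(baseline_minutes, allowed_minutes):
        return "can act as reinforcer"
    return "no reinforcement expected"


if __name__ == "__main__":
    # Hypothetical baseline: a student freely studies 20 min/day.
    # Studying is a low probability behavior, but limiting it to 5 min/day
    # puts it below baseline, so access to it can serve as a reinforcer.
    print(expected_effect(baseline_minutes=20, allowed_minutes=5))
    # No limit below baseline: no deprivation, so no reinforcement expected.
    print(expected_effect(baseline_minutes=20, allowed_minutes=20))
```

Note that the check never compares the activity with any other behavior, which is the sense in which this approach does not require a high vs. low probability distinction.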
What Becomes Connected?
Skinner?
 refused to consider associations
 Thorndike: S-R view (SD-B)
 association between stimulus context and response
 NOT the outcome (SR)
 no representation of reinforcer ~

S-R-O (SD-B-SR) view: Tinklepaugh (1928)
Goal-oriented responding
 respond with idea of getting reward
 The monkey and the hidden banana
 2 cups, put banana under 1
 task: choose cup with banana
 Secretly substituted rotten lettuce
 monkey became agitated
 Expected banana reward (outcome) ~

S-R vs. S-R-O
Adams & Dickinson (1981)
 Taste aversion paradigm
 Associate sucrose (sweetener)
 w/ lithium chloride (LiCl) → illness
 Will rats press bar to get something that
makes them sick? ~

S-R vs. S-R-O
Phase 1:
 Trained rats to bar press for sucrose
 Phase 2:
 associate sucrose w/ illness
 Phase 3:
 Will rats press bar now?

 No sucrose delivered ~
S-R vs. S-R-O : Results
Predictions?
 If S-R-O
 If S-R
 Results
 Rats did not press bar
 Supports S-R-O ~

S-R vs. S-R-O
Use different levels of training
 Phase 1: Same procedure but…
 some get 100 RFTs
 some get 500 RFTs ~

Results & Conclusions
Less training → low response rate
 Little training → outcome important
 S-R-O
 Extensive training → high response rate
 outcome less important
 response is well established
 S-R ~

Parallel learning in humans
Learning a skill
 e.g., to drive a car
 Early trials
 consider consequences
 must think about what you are doing
 After extensive experience
 becomes automatic
 after many trials ~

Extrinsic Reward vs Intrinsic Motivation
Early trials
 expectation of reinforcer
 extrinsic reward
 CER = positive affect
 Well-established behavior
 no expectation of reward
 intrinsic motivation
 CER = positive affect ~
