Generalized Matching Continued

Schneider (1973) and Todorov (1973)

log(B1/B2) = a log(M1/M2) + log c
• Version of the GML that describes the effect of reinforcer magnitude on choice
• Schneider and Todorov both varied reinforcer magnitude and found that the above equation did describe their data
• But sensitivity to magnitude was about 0.5 – much less than sensitivity to rate
• Reinforcer rate is more effective at influencing choice than reinforcer magnitude
The concatenated generalized matching law

log(B1/B2) = ar log(R1/R2) + am log(M1/M2) + log c
• If we put the GML descriptions of control by rate and magnitude together, we could write the above equation
• The sensitivity terms have subscripts identifying the IV they refer to
• Each IV controls choice in the same way, but sensitivity to each might differ
• Because reinforcer rate affects choice more than reinforcer magnitude does, ar is greater than am
The concatenated generalized matching law

log(B1/B2) = ar log(R1/R2) + am log(M1/M2) + ad log(D2/D1) + aA log(A2/A1) + aq log(Q1/Q2) + ... + log c
• Add a log ratio and a sensitivity term for all the other IVs that affect choice…
• Notice that reinforcer delay and response arduousness have reversed log ratios – why?
• Sensitivity term: how much influence that IV has on choice
• If an IV is constant and equal for both alternatives, its log ratio will be zero and it will drop out of the equation
• If an IV is constant and unequal it will produce a bias
The concatenated generalized matching law

log(B1/B2) = ar log(R1/R2) + am log(M1/M2) + log c
• e.g., suppose M1 is 6 s access to wheat and M2 is 3 s
• We know that am is about 0.5
• The magnitude ratio is 2, so the log magnitude ratio is 0.3, so the magnitude term is about 0.5 × 0.3 = 0.15
• So if we vary the reinforcer rates and measure bias, it should be about 0.15 towards the larger magnitude, as long as all the other IVs in the concatenated GML are constant and equal, and there's no inherent bias (see the sketch below)
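To make the arithmetic concrete, here is a minimal Python sketch (mine, not from the lecture) of the concatenated GML as a sum of sensitivity-weighted log ratios; base-10 logs follow the slide's own arithmetic:

```python
import math

def concatenated_gml(ratios, sensitivities, log_c=0.0):
    # Predicted log response ratio: the sum of sensitivity-weighted
    # log IV ratios, plus any inherent bias (log c).
    return log_c + sum(a * math.log10(r) for a, r in zip(sensitivities, ratios))

# The slide's example: M1 = 6 s, M2 = 3 s, and am is about 0.5
bias = concatenated_gml(ratios=[6 / 3], sensitivities=[0.5])
print(round(bias, 2))  # 0.15 - the predicted bias towards the larger magnitude
```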
Vollmer and Bourret (2000)

College basketball: 3-point shots vs 2-point shots. Across players, the ratio of shots attempted (responses) slightly undermatched the ratio of successful shots (reinforcer rate), with a bias in favor of 3-point shots (reinforcer magnitude).
The concatenated generalized matching law

log(B1/B2) = Σ ax log(X1/X2)
• This is an efficient way to write the concatenated GML. The symbol Σ means "the sum of"
• X is any independent variable that affects choice
• Predicted log response ratio is the sum of all the log independent-variable ratios multiplied by their sensitivities
• There's no bias term in the equation
• When all the IVs that affect choice have been discovered, there will be no inherent bias
The concatenated generalized matching law

log(B1/B2) = Σ ax log(X1/X2)
• Looking at sensitivities to different independent variables
• Already seen that sensitivity to:
• reinforcer rate is about 0.8 to 0.9
• reinforcer magnitude is about 0.5
• Now, two more IVs – reinforcer delay and response force requirement (manipulating the arduousness of a response)
Response force: Hunter and Davison (1982)

log(B1/B2) = ar log(R1/R2) + aA log(A2/A1) + log c
• Varied both reinforcer rate in concurrent VI VI and the force required for a key-peck to be counted as a response
• Response measures: ar = 0.88, aA = 0.71
• Time measures: ar = 0.98, aA = 0.41
• Force required affected choice less than reinforcer rate
• Sensitivities measured using time allocation aren't always higher than for response allocation
• Thorough test of the concatenated GML, fitted the data very well
Rf magnitude: Elliffe, Davison, and Landon (2008)

log(B1/B2) = ar log(R1/R2) + am log(M1/M2) + log c
• What about with other variables?
• Both reinforcer rate and magnitude varied thoroughly
• The concatenated GML fitted the data very well, but
• Sensitivity to rate was highest when the magnitudes were equal
• Sensitivity to magnitude was highest when the rates were equal
• Interaction between a values
• The CGML can't be exactly right
• probably near enough for predicting behavior in practical terms
Reinforcer delay

log(B1/B2) = ad log(D2/D1) + log c
• The first experiment on this (Chung & Herrnstein, 1967) reported strict matching to relative delay (ad = 1)
• But a more thorough later experiment (Williams & Fantino, 1978) found that sensitivity increased with the average delay
• This shouldn't happen according to the GML – sensitivity should stay constant for a particular IV
Assessing preference: reinforcer quality

log(B1/B2) = ar log(R1/R2) + aq log(Q1/Q2) + log c
• Testing food preference
• Arranging each as a reinforcer for a different response
• Varying reinforcer rates
• Bias will reflect preference for one or other food, because it will be affected by the log quality ratio
• This is the same logic that Hollard and Davison used with food vs EBS (electrical brain stimulation)
• Matthews and Temple (1979): cow feeds
• Preference for crushed barley
Assessing Reinforcer Quality

log(B1/B2) = ar log(R1/R2) + aq log(Q1/Q2) + log c
• Important to use dependent schedules
• If you use independent schedules, the animal will respond more for the preferred food (say, Q1), and so its obtained reinforcer rate for that food (R1) will increase
• so log(R1/R2) increases
• and that increases preference more
• so log(R1/R2) increases more
• And so on (see the sketch below). You can get an apparently very strong preference caused by the animal receiving that food at a higher rate
• Pet-food advertisements
• Reinforcer preference assessments
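A minimal simulation sketch of that feedback loop, with assumed sensitivities (ar = 0.8, aq = 0.5) and a hypothetical 2:1 quality preference; the update rule, where the obtained rate ratio drifts to match the response ratio, is a deliberate simplification of what independent scheduling allows:

```python
import math

a_r, a_q = 0.8, 0.5                  # assumed sensitivities to rate and quality
log_quality_ratio = math.log10(2)    # hypothetical 2:1 preference for food 1

log_rate_ratio = 0.0                 # start with equal obtained reinforcer rates
for step in range(5):
    # GML: preference is driven by obtained rates plus the quality term
    log_pref = a_r * log_rate_ratio + a_q * log_quality_ratio
    # Independent schedules: obtained rates drift toward the response ratio
    log_rate_ratio = log_pref
    print(step, round(log_pref, 3))  # 0.15, 0.27, 0.366, 0.443, 0.504, ...
```

The quality term alone is worth about 0.15, but the loop inflates the apparent preference toward 0.75 (its fixed point, since ar < 1). Dependent schedules break the loop by holding the obtained rate ratio fixed.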
Summary

• Generalized matching seems to describe choice well in a wide variety of situations – different species, responses, reinforcers
• It says that different IVs to do with responses and reinforcers all affect choice in the same way …
• But that some of them are more effective than others, as measured by their different sensitivity values
• It's the best, most widely applicable description of choice that we have at the moment
• But is it a theory of choice? Why should it happen?
• Is Baum's idea that it's really 'failed strict matching' right? Or is it an outcome of some other mechanism for choice, like maximization?
Theories of Matching

Experimental Analysis of Choice

• Methods: concurrent schedules, concurrent chains, delay discounting, foraging contingencies, behavioral economic contingencies.
• Models and Issues: matching/melioration, maximizing/optimality, hyperbolic discounting, behavioral economic/ecological models, behavioral momentum, the molar versus molecular issue, concepts of response strength.
• Applications: self-control, drug abuse, gambling, risk, economics, behavioral ecology, social/political decision making.
Melioration: A Theory of Matching

"To make better": behavior shifts toward the alternative with the higher return (lower cost), until local rates of reinforcement are equal.

(R1/R2) = (r1/r2), or (r1/R1) = (r2/R2) (reinforcers per response, i.e., return).
(T1/T2) = (r1/r2), or (r1/T1) = (r2/T2) (local rate of reinforcement).
Example: Conc VI 30” VI 120”

Suppose in the first hour of exposure 1000 responses were emitted to each alternative:

(VI 30”) r1/R1 = 120 rfs / 1000 resps. Return = 0.12
(VI 120”) r2/R2 = 30 rfs / 1000 resps. Return = 0.03

Ultimately behavior will shift toward the higher return. What will be the result?

120 / (1000 + x) = 30 / (1000 − x); x = 600.
120/1600 = 30/400, i.e., matching (80% of responses on the VI 30” alternative).
Return = 0.075 rfs/resp on each alternative.
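The same equilibrium can be found by brute force. A small sketch of the slide's arithmetic, shifting responses toward the richer return until the returns equalize:

```python
r1, r2 = 120, 30          # reinforcers/hour on VI 30" and VI 120"
b1, b2 = 1000.0, 1000.0   # responses/hour to each alternative in the first hour

for _ in range(2000):
    if r1 / b1 > r2 / b2:        # alternative 1 offers the higher return
        b1, b2 = b1 + 1, b2 - 1  # melioration: shift behavior toward it
    elif r2 / b2 > r1 / b1:
        b1, b2 = b1 - 1, b2 + 1
    else:
        break                    # equal returns: no further shift

print(b1, b2, r1 / b1, r2 / b2)  # 1600.0 400.0 0.075 0.075 - matching
```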
What about VR 30 VR 120?

• How does matching say behavior will be allocated?

Problem: Conc VR 30 VR 120

ALL responses will ultimately be made to the VR 30 alternative. This is consistent with matching, but the same would be true if all the responses were made to the VR 120 alternative. But melioration can predict which alternative should receive all the responses:

VR 30: r1/R1 = 1/30; VR 120: r2/R2 = 1/120.

These returns cannot change, so shifting to the higher return means all the responses will go to the VR 30 alternative.
And VR 20 – VI 60?

• What would an optimal pattern of responding be?
Overall
• Mixed bag of results
– Some optimization
– Some melioration
• Something else at play?
– Molar theories
• Matching, melioration, optimization
– Molecular theory
• Momentary maximization theory
A Thought Experiment

• 2 doors
• .1 and .2 probability of getting a dollar respectively
• Can get a dollar behind both doors on the same trial
• Dollars stay there until collected, but never more than 1 dollar per door
• What order of doors do you choose?
Patterns in the Data

• If choices are made moment by moment, there should be orderly patterns in the choices: 2, 2, 1, 2, 2, 1… (see the sketch below)
• Results are mixed, but promising when time is used as the measure
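A sketch of momentary maximizing in the two-door experiment. The bookkeeping (pick the door with the higher probability of currently holding a dollar, given how long it has gone unopened) is my formalization of the slide's setup, using its .1 and .2 probabilities:

```python
p = {1: 0.1, 2: 0.2}   # per-trial probability a dollar appears behind each door
since = {1: 1, 2: 1}   # trials since each door was last opened

choices = []
for trial in range(12):
    # Chance a dollar has arrived at least once while the door went unopened
    present = {door: 1 - (1 - p[door]) ** since[door] for door in p}
    pick = max(present, key=present.get)       # momentary maximizing
    choices.append(pick)
    for door in p:
        since[door] = 1 if door == pick else since[door] + 1

print(choices)  # [2, 2, 1, 2, 2, 1, ...] - the orderly pattern on the slide
```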
What Works Best Right Now

• Maximizing local rates with moment-to-moment choices can lower the overall reinforcement rate.
• Short-term vs. long-term
Next Time:
• Delay and self-control
Delay and Self-Control

A critical fact about self control: preference reversal

In positive self control, the further you are away from the smaller and larger reinforcers, the more likely you are to accept the larger, more delayed reinforcer. But the closer you get to the first one, the more likely you are to choose the smaller, more immediate one.

Preference thus reverses as you make your choice further and further away from the reinforcers.
Friday night:
“Alright, I am setting my alarm clock to wake me up at 6.00 am
tomorrow morning, and then I’ll go jogging.” ...
Saturday 6.00 am:
“Hmm….maybe not today.”
Ainslie-Rachlin theory – an analogy

Rachlin (1970, 1974), Ainslie (1975)

Preference reversal: the analogy is simply looking at tall buildings as you are walking towards them. From far away, you can see the tall building in the distance. But as you get closer, you can't see it any more, just the shorter building closer to you.

So, if you are looking for the tallest building, the highest hill, the largest reinforcer, then you choose the bigger one if you are far away. But if you are close, you choose the smaller one.
Assume: at the moment in time when we make the choice, we choose the reinforcer that has the highest current value...

To be able to understand why preference reversal occurs, we need to know how the value of a reinforcer changes with the time by which it is delayed...

Outside the laboratory, the majority of reinforcers are delayed. Studying the effects of delayed reinforcers is therefore very important.
Basically, the effects that reinforcers have on behaviour decrease -- rapidly -- when the reinforcers are more and more delayed after the reinforced response. This is how reinforcer value generally changes with delay:
[Figure: reinforcer value as a function of reinforcer delay (0–35 s) — a concave-upwards decay curve]
Studying preference of delayed reinforcers

Humans:
- verbal reports at different points in time
- "what if" questions

Humans AND nonhumans:
A. Concurrent chains
B. Titration

All are choice techniques.
A. Concurrent chains

Concurrent chains are simply concurrent schedules -- usually concurrent equal VI VI -- in which reinforcers are delayed. When a response is reinforced, usually both concurrent schedules stop and become unavailable, and a delay starts.

Sometimes the delays are in blackout with no response required to get the final reinforcer (an FT schedule); sometimes the delays are actually schedules, with an associated stimulus, like an FI schedule, that requires responding.
[Diagram: the concurrent-chains procedure — initial links (choice phase, concurrent VI VI on white keys) lead to terminal links (outcome phase, VI a s or VI b s), each ending in food]
An example of a concurrent-chains experiment

MacEwen (1972) investigated choice between two terminal-link FI schedules and two terminal-link VI schedules, one of which was always twice as long as the other. The initial links were always concurrent VI 60-s VI 60-s schedules.
The terminal-link schedules were:

Shorter link    Longer link
FI 5 s          FI 10 s
FI 10 s         FI 20 s
FI 20 s         FI 40 s
FI 40 s         FI 80 s
VI 5 s          VI 10 s
VI 10 s         VI 20 s
VI 20 s         VI 40 s
VI 40 s         VI 80 s

Constant reinforcer (delay and immediacy) ratio in the terminal links – all immediacy ratios are 2:1.
[Figure: Bird M6 — log response ratio (0.0–2.0) as a function of the smaller FI or VI value (0–40 s), for FI and VI terminal links]
From the generalised matching law, we would expect:

log(B1/B2) = ad log(D2/D1) + log c

If ad was constant, then because D2/D1 was kept constant, we would expect no change in choice with changes in the absolute size of the delays.
D2/D1 was kept constant throughout (the same terminal-link schedule pairs as in the table above).
But choice did change, so ad did NOT remain constant: a serious problem for the generalised matching law.

[Figure: Bird M6 — log response ratio vs. the smaller FI or VI value (0–40 s), FI and VI terminal links (same data as above)]
B. Titration — finding the point of preference reversal

The titration procedure was introduced by Mazur:
- one standard (constant) delay and
- one adjusting delay.

These may differ in what schedule they are (e.g., FT versus VT with the same size reinforcers for both), or they may be the same schedule (both FT, say) with different magnitudes of reinforcers.

What the procedure does is find the value of the adjusting delay that is equally preferred to the standard delay -- the indifference point in choice.
For example:
- reinforcer magnitudes are the same
- standard schedule is VT 30 s
- adjusting schedule is FT

How long would the FT schedule need to become to make preference equal?

Answer: probably about FT 20 s. Concurrent chains give the same result. Animals prefer the risky option (hyperbolic averaging).
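A sketch of why hyperbolic averaging favours the variable option. The five VT intervals and the decay parameter K below are illustrative assumptions, not Mazur's values, so the exact indifference point differs a little from the slide's ~20 s:

```python
K = 0.2
vt_delays = [10, 20, 30, 40, 50]     # a hypothetical VT 30-s schedule

def value(delay):                    # hyperbolic value of a single delay
    return 1.0 / (1.0 + K * delay)

vt_value = sum(value(d) for d in vt_delays) / len(vt_delays)

# Solve 1/(1 + K*d) = vt_value for the equally preferred fixed delay
ft_equivalent = (1.0 / vt_value - 1.0) / K
print(round(vt_value, 3), round(ft_equivalent, 1))  # 0.176, 23.5 - less than 30 s
```

Because the hyperbola is convex, the average of the discounted values exceeds the value of the average delay (Jensen's inequality), so an FT shorter than 30 s is needed to match the VT.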
Titration: Procedure

Trials are in blocks of 4. The first 2 are forced choice, randomly one to each alternative; the last 2 are free choice.

If, on the last 2 trials, the subject chooses the adjusting schedule twice, the adjusting delay is increased by a small amount. If it chooses the standard twice, the adjusting delay is decreased by a small amount. If choice is equal (1 of each), there is no change.

(cf. the von Békésy tracking procedure in audition)
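A minimal sketch of the staircase logic, with a deterministic simulated subject that always takes the higher hyperbolic value (real subjects are noisier). The 20-s standard, the step size, and K are assumptions, and the forced-choice trials are omitted since they never move the delay:

```python
K, step = 0.2, 1.0
standard_value = 1.0 / (1.0 + K * 20.0)  # hypothetical 20-s standard delay
adjusting = 5.0                          # starting value of the adjusting delay

for block in range(100):                 # one adjustment per 4-trial block
    adjusting_value = 1.0 / (1.0 + K * adjusting)
    if adjusting_value > standard_value:
        adjusting += step                # chosen twice: make it less attractive
    elif adjusting_value < standard_value:
        adjusting -= step                # standard chosen twice: shorten it again
    # one choice of each: no change

print(adjusting)  # settles at 20.0 s, the indifference point
```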
[Diagram: Mazur's titration procedure — ITI; trial start on white keys; a choice peck leads either to the standard delay with a red houselight, ending in 2-s food plus a blackout (BO), or to the adjusting delay with a green houselight, ending in 6-s food]

Why the post-reinforcer blackout?
Animal research: Preference reversal

Green, Fisher, Perlow, & Sherman (1981)

• Choice between a 2-s and a 6-s reinforcer.
• Larger reinforcer delayed 4 s more than the smaller.
• Choice response (across conditions) required from 2 to 28 s before the smaller reinforcer. We will call this time T.
[Timeline diagrams (three panels): the choice response occurs T s before the smaller (2-s) reinforcer; the larger (6-s) reinforcer comes 4 s later; T is moved across conditions within the 28-s span]
Green et al. (continued)

Thus, if T was 10 s, at the choice point the smaller reinforcer was 10 s away and the larger was 14 s away. So, as T is changed over conditions, we should see preference reversal.

(Note that, in this and most experiments in this area, blackouts are arranged after reinforcer deliveries so that the overall reinforcer rate is the same whichever choice is made.) Why is this important?
GREEN ET AL. (1981), MEAN DATA

[Figure: log response ratio (larger, later / smaller, sooner) as a function of T (0–25 s); positive values mark SELF CONTROL, negative values IMPULSIVITY]

Control condition: two equal-sized reinforcers were delayed, one 28 s and the other 32 s. Preference was strongly towards the reinforcer that came sooner. So, at delays that long, pigeons can still clearly tell which reinforcer is sooner and which one later.
Theories of delayed reinforcement and self control

It is very important to determine the shape of the decay curve. After all, self control is all about delay of reinforcement, and one main feature of self control is preference reversal.

What we need to do is:
(1) understand quantitatively where the point of preference reversal (the indifference point) is
(2) better – be able to predict choice between smaller-more-immediate and larger-more-delayed reinforcers
Exponential versus hyperbolic decay

It is important to understand how the effects of reinforcers decay over time, because different sorts of decay predict different effects. The two main candidates that have been looked at are:

Exponential decay -- the rate of decay remains constant over time
Hyperbolic decay -- the rate of decay decreases over time -- as in memory, too
Exponential decay

Vt = V0 e^(-bt)

Vt: value of the delayed reinforcer at time t
V0: value of the reinforcer at 0-s delay
t: delay in seconds
b: a parameter that determines the rate of decay
e: the base of natural logarithms
[Figure: exponential decay of reinforcer value (magnitudes 2 and 6) over 0–30 s from the smaller reinforcer]
Hyperbolic decay

Vt = h V0 / (h + t)

In this equation, all the variables are the same as in the exponential decay, except that h is the half-life of the decay -- the time over which the value V0 reduces to half its initial value.

Hyperbolic decay is strongly supported by Mazur's research.
[Figure: hyperbolic decay of reinforcer value (magnitudes 2 and 6) over 0–30 s from the smaller reinforcer]

Only hyperbolic decay can explain preference reversal.
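A small sketch of why. With the Green et al. layout — a 2-s reinforcer at delay t and a 6-s reinforcer at t + 4 — the exponential value ratio is constant in t, while the hyperbolic ratio grows with t and crosses 1. The parameter values b and h are illustrative:

```python
import math

b, h = 0.3, 1.0   # assumed decay parameters

def v_exp(mag, t): return mag * math.exp(-b * t)   # exponential decay
def v_hyp(mag, t): return mag * h / (h + t)        # hyperbolic decay

for t in [0.5, 1, 2, 5, 10]:          # time to the smaller reinforcer
    exp_ratio = v_exp(6, t + 4) / v_exp(2, t)   # always 3*exp(-4b): no reversal
    hyp_ratio = v_hyp(6, t + 4) / v_hyp(2, t)   # grows with t: preference reverses
    print(t, round(exp_ratio, 2), round(hyp_ratio, 2))
# exponential stays at 0.9 throughout; hyperbolic passes 1 (reversal) as t grows
```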
Two sorts of decay fitted to MacEwen's (1972) data. Note: the previous MacEwen graph plotted log response ratio; this one plots relative responses. Hyperbolic is clearly better.

[Figure: two panels of relative response rate vs. the smaller delay (0–40 s) — hyperbolic-decay fits above, exponential-decay fits below]
Hyperbolic predictions shown the same way:

[Figure: reinforcer value curves traced back in time from a 2-s reinforcer and a 4-s reinforcer (1 s to small, 3 s to large; 4 s to small, 6 s to large); choice reverses where the curves cross]
Self-control and (strict) matching theory

The concatenated strict matching law for reinforcer magnitude and delay (see the generalised matching lecture) is:

B1/B2 = (M1/M2) × (D2/D1)

where M is reinforcer magnitude and D is reinforcer delay. Note that for delay, a longer delay is less preferred, and therefore D2 is on top.

(OK, we know strict matching isn't right, and delay sensitivity isn't constant.)
We will take the situation used by Green et al. (1981) and work through what the STRICT matching law predicts.

The baseline is: M1 = 2, M2 = 6, D1 = 0, D2 = 4

B1/B2 = (M1/M2) × (D2/D1) = (2 × 4)/(6 × 0) = 8/0 = ∞

The choice ratio is infinite. Thus, the subject is predicted always to take the smaller, zero-delayed reinforcer.

(We could do all this using generalized matching, but it's more difficult. The way things change would be the same.)
Now, add T = 0.5 s, so M1 = 2, M2 = 6, D1 = 0.5, D2 = 4.5

B1/B2 = (2 × 4.5)/(6 × 0.5) = 9/3 = 3

The subject is predicted to prefer the smaller-magnitude reinforcer three times more than the larger-magnitude reinforcer, and again be impulsive. But its preference for the immediate reinforcer has decreased a lot.
Then, when T = 1,

B1/B2 = (2 × 5)/(6 × 1) = 10/6 ≈ 1.67

The choice is now less impulsive.
For T = 2, the preference ratio B1/B2 is 1 -- so now, the strict matching law predicts indifference between the two choices.

For T = 10, the preference ratio is 0.47 -- more than 2:1 towards the larger, more delayed, reinforcer. That is, the subject is now showing self control.
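The whole walkthrough is reproduced by a few lines of Python; this sketch just re-runs the strict-matching arithmetic above:

```python
M1, M2 = 2, 6                 # reinforcer magnitudes (s of food access)

for T in [0.5, 1, 2, 10]:
    D1, D2 = T, T + 4         # delays to the smaller and larger reinforcers
    ratio = (M1 / M2) * (D2 / D1)
    print(T, round(ratio, 2)) # 3.0, 1.67, 1.0, 0.47 - reversal beyond T = 2
```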
The whole function is shown next -- predictions for Green et al. (1981) assuming strict matching.

MATCHING LAW PREDICTIONS

[Figure: predicted log response ratio as a function of T (0–25 s), rising from the impulsive region into the self-control region]

This graph shows log(B2/B1), rather than log(B1/B2), because I wanted to show you how self control increases as you go back in time from when the reinforcers are due.
Green et al.'s actual data:

[Figure: GREEN ET AL. (1981), mean data — log response ratio as a function of T (0–25 s), moving from impulsivity to self control as T grows]
Commitment
Commitment in the laboratory

Rachlin & Green (1972)

Pigeons chose between EITHER allowing themselves a later choice between a small short-delay (SS) reinforcer and a large long-delay (LL) reinforcer, OR denying themselves this later choice, so that only the LL reinforcer could be obtained.
[Diagram: Rachlin & Green (1972) procedure — a commitment response offered T s before the reinforcers leads either to a later smaller-sooner vs larger-later choice or, via blackout, to the larger-later reinforcer alone]
As the time T at which the commitment response was offered was moved earlier, further from the reinforcers (from 0.5 to 16 s), preference should reverse. Indeed, Rachlin and Green found that 4 out of 5 birds developed commitment (what we might call a commitment strategy) when T was larger.
Human research

Mischel – human research. Much human data replicated with animals by Grosch & Neuringer (1981). For example, making food reinforcers visible upset self control, but an extraneous task helped self control.
Mischel & Baker (1975)

The experimenter puts one pretzel on a table and leaves the room for an unspecified amount of time. If the child rings a bell, the experimenter will come back and the child can eat the pretzel. If the child waits, the experimenter will come back with 3 pretzels.

Most children chose the impulsive option. But there is apparently a correlation with age, SES, and IQ scores. (Correlation, not causation!)
Mischel & Baker (1975)

Self control was less likely if children were instructed to think about the taste of the pretzels (e.g., how crunchy they are). Self control was more likely if they were instructed to think about the shape or colour of the pretzels.
Can nonhumans be trained to show sustained self control?

Mazur & Logue (1978) - Fading in self control

              Choice 1   Choice 2
Delay (s)         6          6
Magnitude (s)     2          6

Preferred Choice 2 (larger magnitude, same delay) -- self control.

Over 11,000 trials, they faded the delay to the smaller magnitude (Choice 1) to 0 s -- and self control was maintained!
Additionally, and this is important, self control was maintained even when the outcomes were reversed between the keys. In other words, the pigeons didn't have to be re-taught to choose the self control option, but applied it to the new situation.
Contingency contracting
A common therapeutic procedure:
e.g., “I give you my CD collection, and agree that if I don't lose
0.5 kg per week, you can chop up one of my CDs -- each
week.”
You use the facts of self control -- i.e., you say "let's start this a
couple of weeks from now" and the client will readily agree -- if
you said, "starting today", they most likely would not.
It's easy to give up anything next week...
Social dilemmas

A lot of the world's problems are problems of self control on a macro scale.
- Investment strategies

Rachlin, H. (2006). Notes on discounting. Journal of the Experimental Analysis of Behavior, 85, 425-435.

"In general, if a variable can be expressed as a function of its own maximum value, that function may be called a discount function. Delay discounting and probability discounting are commonly studied in psychology, but memory, matching, and economic utility also may be viewed as discounting processes."
FUNCTIONAL PROPERTIES AND CURVE FITTING

• What is the "real" delay function? One way to compare candidates is sketched below.

Vt = V0 / (1 + Kt)
Vt = V0 / (1 + Kt)^s
Vt = V0 / (M + Kt^s)
Vt = V0 / (M + t^s)
Vt = V0 exp(-Mt)
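One standard way to compare these candidates is least-squares curve fitting of indifference-point data. The data points below are hypothetical and only three of the five forms are fitted; this is a sketch of the method, not anyone's actual analysis:

```python
import numpy as np
from scipy.optimize import curve_fit

def simple_hyperbola(t, V0, K):   return V0 / (1 + K * t)
def hyperbola_power(t, V0, K, s): return V0 / (1 + K * t) ** s
def exponential(t, V0, M):        return V0 * np.exp(-M * t)

t = np.array([1, 2, 5, 10, 20, 30], dtype=float)   # delays (s), hypothetical
v = np.array([5.2, 4.1, 2.9, 1.8, 1.1, 0.8])       # indifference values, hypothetical

starts = {simple_hyperbola: [5, 0.2], hyperbola_power: [5, 0.2, 1], exponential: [5, 0.1]}
for f, p0 in starts.items():
    params, _ = curve_fit(f, t, v, p0=p0, maxfev=10000)
    sse = float(np.sum((v - f(t, *params)) ** 2))   # residual sum of squares
    print(f.__name__, np.round(params, 3), round(sse, 4))
```

Because the forms have different numbers of free parameters, raw SSE favours the more flexible ones; an information criterion (AIC or BIC) is a fairer comparison.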