Reinforcement - Webcourses - University of Central Florida

Basic Learning Processes
Robert C. Kennedy, PhD
University of Central Florida
[email protected]
Chapter 5 - Operant Learning:
Reinforcement
Vocabulary
• Automatic reinforcer: Another term for natural reinforcer.
• Behavioral momentum: A term used to refer to the strength of a reinforced behavior. It
is worth noting that the strength can be measured in various ways, including resistance
to non-reinforcement and aversive consequences for the behavior.
• Conditioned reinforcer: See secondary reinforcer.
• Contrived reinforcer: Reinforcing events that have been arranged by someone, usually
to modify behavior.
• Dopamine: A neurotransmitter that appears to be important in reinforcement.
• Drive-reduction theory: The theory of reinforcement that attributes a reinforcer’s
effectiveness to the reduction of a drive.
• Escape-avoidance learning: A form of negative reinforcement in which the subject first
learns to escape, and then to avoid, an aversive.
• Establishing operation: A motivating operation that increases the effectiveness of a
reinforcer. See the comment on motivating operations, below.
• Generalized reinforcer: Conditioned (or secondary) reinforcers that have been paired
with many different kinds of reinforcers.
• Instrumental learning: See operant learning.
Vocabulary
• Law of effect: The statement that behavior is a function of its consequences.
• Motivating operation: This term is now generally used to mean any procedure that makes a reinforcer
more effective (establishing operations) or less effective (abolishing operations).
• Natural reinforcer: Reinforcing events that follow automatically (naturally) from the behavior.
• Negative reinforcement: A reinforcement procedure in which a behavior is followed by the removal of,
or a decrease in the intensity of, a stimulus.
• Negative reinforcer: Any stimulus which, when removed following a behavior, increases or maintains the
strength of that behavior.
• One-process theory: The view that avoidance and punishment involve only one process, operant
learning.
• Operant learning: Any procedure in which a behavior becomes stronger or weaker (e.g., more
or less likely to occur), depending on its consequences. (Any suggestion that the term applies only to
animal learning should be corrected.)
• Positive reinforcement: A reinforcement procedure in which a behavior is followed by the presentation
or an increase in the intensity of a stimulus.
• Positive reinforcer: Any stimulus which, when presented following a behavior, increases or maintains the
frequency of that behavior.
• Premack principle: The observation that high-probability behavior reinforces low-probability behavior.
• Primary reinforcer: Any reinforcer that is not dependent on another reinforcer for its reinforcing
properties.
Vocabulary
• Reinforcement: The procedure of providing consequences for a behavior that increase or maintain the
strength of that behavior.
• Relative value theory: Theory of reinforcement that considers reinforcers to be behaviors rather than
stimuli and that attributes a reinforcer’s effectiveness to its probability relative to other behaviors.
• Response-deprivation theory: The theory of reinforcement that maintains that a behavior is reinforcing
to the extent that the organism has been deprived (relative to its baseline frequency) of performing that
behavior.
• Reward learning: Another term for positive reinforcement. (In reward learning, we learn from
reinforcing consequences.) Some instructors strongly oppose the use of this term, but it is used by
biologists and neuroscientists, so it may be a good idea to point out that what people in those fields are
talking about is positive reinforcement.
• Reward pathway: A poorly defined area in the septal region of the brain thought to provide the
physiological basis of at least some forms of reinforcement. Formerly called the reward center.
• Satiation: A sharp reduction in the reinforcing power of an event due to repeated exposure to that
event. Primary reinforcers are particularly prone to satiation.
• Secondary reinforcer: Any reinforcer that has acquired its reinforcing properties through its association
with other reinforcers.
• Sidman avoidance procedure: An escape-avoidance training procedure in which no stimulus regularly
precedes the aversive stimulus.
• Two-process theory: The view that avoidance and punishment involve both Pavlovian and operant
procedures.
Reinforcement Learning
• Behavioral approach which assumes that
reinforcement conditions behavior and that
behavior is environmentally caused.
• Doesn’t concern itself with what initiates
behavior
• Ignores feelings, expectations, and attitudes, all
cognitive variables that are known to influence
behavior
Reinforcement Learning
• Thorndike’s early work led him to formulate
the law of effect: rewarded behavior is likely
to occur again.
• Skinner built on this foundation: He invented
most of the terms of operant learning used
today, and described the basic procedures.
Reinforcement Learning
• Operant and Pavlovian procedures differ in
several ways
– operant procedures involve response-contingent
events, whereas Pavlovian conditioning involves
stimulus-contingent events
– rate of operant learning is affected by the degree of
contingency and contiguity, reinforcer characteristics,
and other factors.
– extinction of an operant response involves the
withholding of reinforcing consequences.
Reinforcement Learning
• There are three main theories of positive
reinforcement
– drive-reduction
– relative value
– response deprivation
• There are two theories of avoidance
– two-process
– one-process
Law of effect
• According to Thorndike, behavior is strengthened
or weakened by its consequences
• Four key elements:
1) Environment (the situation in which the behavior occurs)
2) Behavior that occurs
3) Change in environment following behavior
4) Change in behavior produced by consequence
Operant Learning
• Skinner called experiences where behavior is
strengthened or weakened by consequences
“operant learning”
• Reinforcement: an increase in strength of
behavior due to its consequence
Positive Reinforcement
• Also called reward learning
• Consequence of behavior is the appearance or
increase in intensity of a stimulus
• Positive reinforcer: something an individual seeks
out (e.g., success, money, approval, food)
Negative Reinforcement
• Also known as escape learning or escape-avoidance learning
• Behavior is strengthened by removal, or
decrease in intensity of stimulus
• Reinforcement of behavior by escaping from
the aversive situation
Operant Conditioning
• Both positive and negative reinforcement
increase the strength of behavior
• Positive reinforcement adds a stimulus to the situation
• Negative reinforcement removes a stimulus from the
situation
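The distinction on this slide can be sketched in a few lines of Python. The function name and the "added"/"removed" labels are illustrative assumptions, not terms from the text, and the sketch covers only consequences that strengthen behavior.

```python
def classify_reinforcement(stimulus_change: str) -> str:
    """Label a reinforcement procedure by what happens to the stimulus.

    stimulus_change is a hypothetical label: "added" means a stimulus
    appears or increases in intensity after the behavior; "removed" means
    a stimulus is removed or decreases in intensity. In both cases the
    behavior is assumed to be strengthened, as the slide states.
    """
    if stimulus_change == "added":
        return "positive reinforcement"
    if stimulus_change == "removed":
        return "negative reinforcement"
    raise ValueError("stimulus_change must be 'added' or 'removed'")


# Food appears after a bar press; a loud noise stops after a bar press.
print(classify_reinforcement("added"))
print(classify_reinforcement("removed"))
```

Note that the label depends only on whether a stimulus is added or removed, not on whether the outcome seems pleasant.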
Operant Chamber
• Using Thorndike's law of effect as a starting point, Skinner
developed the operant chamber, or Skinner box, to study
operant conditioning.
• The chamber comes with a bar or key that an animal manipulates to obtain a
reinforcer like food or water.
• The bar or key is connected to devices that record the animal’s
responses.
Walter Dawn/ Photo Researchers, Inc.
Reinforcers
• Some researchers argue that Pavlovian and operant
procedures are really different aspects of the same
phenomenon.
• Taste aversion provides an example. It is usually treated as
Pavlovian conditioning: a flavor is paired with nausea.
• However, it can also be considered a form of operant learning:
eating a particular flavor is punished by nausea.
• Thus, whether learning is Pavlovian or operant may
depend on how one looks at the events.
Types of Reinforcers
Reinforcement: Any event that strengthens the
behavior it follows. A heat lamp positively
reinforces a meerkat’s behavior in the cold.
Reuters/ Corbis
Primary & Secondary Reinforcers
• Primary Reinforcer: An innately reinforcing stimulus
like food or drink.
• Conditioned Reinforcer: A learned reinforcer that gets
its reinforcing power through association with the
primary reinforcer.
More types of reinforcers
• Primary reinforcers (unconditioned reinforcers): food,
water, sexual stimulation
- Lose effectiveness quickly with satiation
• Secondary reinforcers (conditioned reinforcers): result of
learning experiences
• Generalized reinforcers: reinforcers that have been paired
with many different kinds of reinforcers (e.g. money)
• Contrived reinforcers: events that are provided by someone
for purpose of modifying behavior (e.g. give reward when
child does something correctly)
Immediate & Delayed Reinforcers
• Immediate Reinforcer: A reinforcer that occurs instantly
after a behavior. Example: a rat gets a food pellet for a bar press.
• Delayed Reinforcer: A reinforcer that follows a behavior
only after a delay. Example: a paycheck that comes at the end
of a week.
• We may be inclined to choose small immediate
reinforcers (watching TV) over large delayed
reinforcers (getting an A in a course), which require
consistent study.
Operant Learning Influences
• Contingency: the likelihood that a reinforcer
will follow a behavior (i.e., the more reliably a
reinforcer follows a behavior, the more it strengthens
the behavior)
• Contiguity: the gap in time between a behavior and
its reinforcing consequence
• Reinforcer characteristics: some reinforcers work better
than others, depending on their size and strength
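As a rough illustration of the first two variables, the sketch below estimates contingency as the proportion of responses followed by a reinforcer, and contiguity as the average delay before the reinforcer arrives. The function names, the `delay_s` field, and the session data are all made up for this example; this is one simple way to quantify the ideas, not a standard measure from the text.

```python
from statistics import mean


def contingency(trials):
    """Proportion of responses that were followed by a reinforcer."""
    reinforced = [t for t in trials if t["delay_s"] is not None]
    return len(reinforced) / len(trials)


def mean_delay(trials):
    """Average gap in seconds between response and reinforcer,
    computed over reinforced trials only."""
    delays = [t["delay_s"] for t in trials if t["delay_s"] is not None]
    return mean(delays)


# Hypothetical session: delay_s is None when no reinforcer followed the response.
session = [
    {"delay_s": 0.5},
    {"delay_s": 1.5},
    {"delay_s": None},
    {"delay_s": 1.0},
]

print(contingency(session))  # 0.75
print(mean_delay(session))   # 1.0
```

In this made-up session, three of four responses were reinforced, with an average gap of one second.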
Neuromechanisms of Reinforcement
• Work by Olds and Milner demonstrated the
reinforcing potential of electrical stimulation
of the brain
- Implanted electrodes stimulated the “reward
pathway,” where stimulation triggers dopamine
release
Hull’s Drive Reduction Theory
• This theory states that organisms, especially humans, learn to perform
behaviors that have the effect of reducing their biological drives.
• Hull’s drive reduction theory is based upon his mathematical formulation,
known as Hull’s law.
• The equation reads as follows:
E = H × D, where
E = Energy or Response Potential: the energy for performing the behavior, which is directly related to the
probability of the behavior being completed
H = Habit: the strength of a particular stimulus-response association
D = Drive: the strength of a biologically based homeostatic need
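A small worked example of the equation as given on this slide. The numeric values and the function name are arbitrary assumptions chosen for illustration; the slide does not specify units or scales for H and D.

```python
def response_potential(habit: float, drive: float) -> float:
    """E = H x D, as stated on the slide."""
    return habit * drive


# Hypothetical values: a well-learned habit with a moderate drive.
print(response_potential(habit=0.8, drive=0.5))

# Same habit, but the drive has been fully reduced (e.g. after eating).
print(response_potential(habit=0.8, drive=0.0))
```

Because the relationship is multiplicative, a drive of zero gives a response potential of zero no matter how strong the habit is.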
Theories of Positive Reinforcement
• Relative Value Theory: high probability
behavior reinforces low-probability behavior
• Response-Deprivation Theory: behavior
becomes reinforcing when the individual is
prevented from engaging in the behavior at its
normal frequency
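The two theories can be contrasted with a minimal sketch. The function names and the baseline numbers are hypothetical, and each function reduces its theory to the one-line statement on this slide rather than any fuller formal model.

```python
def more_probable(p_candidate: float, p_target: float) -> bool:
    """Relative value view: the candidate behavior can reinforce the target
    only if it is the more probable behavior at baseline."""
    return p_candidate > p_target


def is_response_deprived(baseline: float, allowed: float) -> bool:
    """Response-deprivation view: a behavior becomes reinforcing when
    access to it is held below its baseline (normal) frequency."""
    return allowed < baseline


# Hypothetical baseline: minutes per hour spent on each activity.
baseline = {"reading": 10.0, "running": 30.0}

print(more_probable(baseline["reading"], baseline["running"]))  # False
print(is_response_deprived(baseline["reading"], allowed=2.0))   # True
```

With these made-up numbers, reading is the less probable behavior, so relative value theory would not treat it as a reinforcer for running. Under response-deprivation theory, restricting reading to 2 minutes (below its 10-minute baseline) would still make it reinforcing.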
Theories of Avoidance
• Two-process theory: Pavlovian and operant
learning experiences are involved in avoidance
learning
• One-process theory: Avoidance involves only
operant learning
Two Process Theory of Avoidance
Explains avoidance learning in terms of two necessary
processes:
First, the subject learns to associate the warning
stimulus with the aversive stimulus (SAversive).
This is a classical conditioning process; the warning
stimulus (a light) is the CS, and the aversive stimulus (shock) is
the US.
CS (light) → CR (fear)
US (shock) → UR (fear)
Two Process Theory of Avoidance
Now, the subject can be negatively reinforced during the
warning stimulus; this is the second, operant
conditioning process
R → removes CS (i.e., reduces fear) → strengthens R
Thus the two-process theory reduces avoidance
learning to escape learning; the organism learns to
escape from the CS and the fear that it elicits.
Premack Principle
• The Premack Principle, often called "grandma's rule," states
that a high-frequency activity can be used to reinforce low-frequency behavior.
• Access to the preferred activity is contingent on completing
the low-frequency, non-preferred behavior.
• The high-frequency behavior to use as a reinforcer can be
determined by:
– Asking the person what they would like to do.
– Observing the person during free time.
– Knowing the interests of the person's age group.
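The "observe during free time" approach can be sketched as a simple selection rule: pick the activity that occupies the most free time, provided it occurs more often than the target behavior. The function name and the observation data below are hypothetical.

```python
from typing import Dict, Optional


def pick_reinforcer(observed_minutes: Dict[str, float], target: str) -> Optional[str]:
    """Return the most frequent free-time activity that occurs more often
    than the target behavior, or None if no activity qualifies."""
    target_level = observed_minutes[target]
    candidates = {
        activity: minutes
        for activity, minutes in observed_minutes.items()
        if activity != target and minutes > target_level
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical free-time observation (minutes per afternoon).
observed = {"homework": 10, "video games": 45, "drawing": 20}
print(pick_reinforcer(observed, "homework"))  # video games
```

Here access to video games would be made contingent on completing homework, following the rule on this slide.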
Next Class
• Next lecture Monday, 10/19, Chapter 6
• Practice Quizzes