reinforcement

LEARNING
Prof. Dr. Rana ÖZEN KUTANİS
1. THE CONCEPT OF LEARNING
AND ITS DEFINITION
Learning is a relatively permanent change in
behavior that results from reinforced repetition or
experience.
Four points are emphasized in this
definition:
1. Learning is a change in behavior. This change
may be in a positive or negative direction.
2. For a change in behavior to count as
learning, the change has to be relatively
permanent.
3. Learning requires some kind of
repetition or experience (experiencing the event).
4. For learning to happen, the repetition or
experience has to be reinforced
in some way.
2. RESEARCH ON
LEARNING
The process of learning is under the
influence of many variables. In order to
understand learning, it is necessary to know
which variables are influential in the
learning process and under what conditions
they are influential.
The most widely used learning exercises are
word lists, problem solving and word pairs.
3. THEORIES OF
LEARNING
3.1. Behaviorist Learning Theories:
According to the behaviorist learning
theories, learning is a relationship between
stimuli and behavior. Therefore, in order
to explain learning, it is necessary to know
the nature of the relationship between
stimuli and behavior.
3.1.1. Classical Conditioning:
Pavlov’s experiments on the relationship
between stimulus and response rest on the
finding that external stimuli can influence the
emergence of reflexive behaviors. In the experiment
he conducted on dogs, Pavlov examined whether
stimuli which are normally ineffective (neutral)
in terms of the salivation reflex, such as a ringing
bell or a light being turned on, can come to cause
that reflex.
Food (Unconditioned Stimulus) → Saliva Release (Unconditioned Response)
Bell ringing (Conditioned Stimulus) → Saliva Release (Conditioned Response)
When food (unconditioned stimulus) is presented, the
unconditioned response emerges, because the
experimental subject does not need to experience
an event and learn it for this reflex to emerge. In the
second condition, however, there is a bell (conditioned stimulus)
and a conditioned response, because the experimental
subject (the dog) has been conditioned by
repeatedly experiencing the pairing of food and bell,
and has learned it.
Even fear can be taught in this way.
Experimental Neurosis: If the experimental
subject is under stress while the experiment is conducted,
certain kinds of overreaction may occur.
3.1.2. Operant (Instrumental)
Conditioning:
In learning a behavior, the reaction is not
always triggered by a stimulus. Such behaviors are
mostly voluntary, and among the behaviors tried by
trial and error, the ones that produce a result are
repeated.
Cats put into an experimental box form, by
trial and error, a relationship between what they do
to the environment (opening the box by pushing a
button) and the desired outcome (the reward of
getting out of the box and reaching the food).
After a while, they repeat only the behavior
which leads to the outcome (pushing the button),
not the others (e.g. touching the walls). The
influence of these outcomes on behavior is called
the Law of Effect.
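The Law of Effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the action names, the starting weight of 1.0 and the `boost` added after each reward are hypothetical choices made for illustration, not values taken from the experiments.

```python
import random


def simulate_law_of_effect(actions, rewarded_action, trials=200, boost=1.0, seed=0):
    """Strengthen whichever action is followed by the reward.

    Every action starts with the same weight (a hypothetical "habit strength").
    On each trial one action is drawn in proportion to its weight; if it is
    the rewarded action, its weight grows by `boost`.
    """
    rng = random.Random(seed)
    weights = {action: 1.0 for action in actions}
    history = []
    for _ in range(trials):
        names = list(weights)
        chosen = rng.choices(names, weights=[weights[n] for n in names])[0]
        if chosen == rewarded_action:
            weights[chosen] += boost  # a satisfying outcome strengthens the behavior
        history.append(chosen)
    return weights, history


if __name__ == "__main__":
    weights, history = simulate_law_of_effect(
        ["push button", "touch walls", "scratch floor"], "push button"
    )
    print("button presses in first 50 trials:", history[:50].count("push button"))
    print("button presses in last 50 trials:", history[-50:].count("push button"))
```

In this sketch only the rewarded action gains weight, so it is chosen more and more often while the unrewarded actions keep their starting weight.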
The Differences Between
Classical Conditioning and
Operant Conditioning:
1- While classical conditioning applies to
reflexes, operant conditioning applies
to voluntary behaviors. The first kind of
behavior is called respondent behavior
and the second one is called operant
(instrumental) behavior.
2- Although operant behavior is not a
reaction to a stimulus within the
environment, it is related to some
factors within this environment,
especially to rewards and punishments.
Superstitious Behavior: The behavior
and the outcome are actually independent, but the
individual behaves as if the behavior produced the outcome.
3.2. Cognitive Learning
Theory
According to the proponents of the cognitive learning
theory, learning occurs systematically and purposefully.
Associations between the stimulus and the reaction are formed
through cognitive processes.
Stimulus → Organism → Reaction
In an experiment, two groups of mice were put into a
labyrinth. Food was given to the group that exited the
labyrinth, but not to the other group. The next time they were put
into the labyrinth, the group that had been given food exited the
labyrinth within a shorter period of time and with fewer
mistakes.
While explaining how the mouse
learns the labyrinth, Tolman
stated that the mouse creates a
cognitive map in its brain…
3.3. Social Learning
Theory
Social learning theory combines the environmental
factors and experience by which the
behaviorist approaches explain learning with the internal
factors on which the cognitive approach focuses.
Stimulus → Organism → Reaction → Outcome (Reward/Punishment), with Feedback
A new concept introduced by the social learning
approach is self-reinforcement: when an individual
acts in a specific way, he/she rewards himself/herself.
4. TYPES OF LEARNING
4.1. Learning by the Trial-and-Error
Method
In this type of learning, when the individual is
confronted by a problem, he/she tries, one by one, the
options that pass through the brain’s information filter
until he/she finds the solution. This solution
then becomes permanent. That is, learning occurs.
For example, Edison, who invented the lightbulb,
tried many materials and finally got the desired
result by using carbon. This type of learning is useful
when a problem has many possible solutions. However, it is not
a quick way of learning and requires a great deal of effort
and time.
4.2. Learning by
Observation
It is a type of learning which occurs by
observing the behavior-outcome
relationships of others, without any direct
experience. For example, we can get information
by watching television, reading a newspaper or
from the words and actions of others around
us.
Learning by observation is also called
learning by imitation. However, it is not
just simple imitation; it involves envisaging
and evaluating the behaviors of the
model.
The factors that affect learning by observation are
summarized under three groups:
The Characteristics of the Observer: The
observer should have cognitive capacity
developed enough to envisage the behaviors of the
model, and he/she should be able to pay attention
to the behaviors of the model.
The Characteristics of the Model: The
more a model’s condition resembles the observer’s,
the more easily the observer imitates the behaviors
of this model. For instance, children can learn some
skills more easily by taking their peers as a model
rather than adults.
Reinforcement of the Modelled Behaviors:
If the behaviors chosen as a model result in
positive reinforcement, it becomes more probable
that the observer will imitate these behaviors.
4.3. Learning by Insight
For this type of learning, the qualities of the
intellect are very influential. The individual is
confronted with a problem and, for a while,
cannot make any progress in solving
it. Then he/she suddenly
recognizes the solution by reorganizing his/her
perceptions related to the problem. In other words,
he/she gains an insight.
In the experiment W. Köhler conducted, it was
observed that chimpanzees in a cage could pick up a
stick placed within reach outside the cage
and use it to get the bananas lying outside the
cage.
5. LEARNING
PRINCIPLES
5.1. Learning Curve
On learning curves, the vertical axis shows the
measured learning performance and the horizontal axis
shows the number of repetitions or experiences. With each
repeated trial, reaction strength gradually increases.
5.1.1. Decreasing Efficiency Curve
This curve shows decreasing efficiency, that is, negative acceleration.
If the activity is already familiar, performance increases
rapidly in the first trials and then the rate of increase slows down.
The learning of mental and motor activities follows this
model, especially the learning of routine activities.
[Figure: decreasing efficiency curve; vertical axis: learning performance, horizontal axis: trials and time]
5.1.2. Increasing
Efficiency Curve
It represents positive acceleration. It is not frequent. It is
encountered especially in situations where the individual
learns a subject or activity about which he/she
knew nothing previously. At the beginning, learning occurs
slowly. After a while, it accelerates. Engineering,
market research, work in senior positions and work
in lower positions that requires many skills are learnt
in this way.
[Figure: increasing efficiency curve; vertical axis: performance, horizontal axis: trials and time]
5.1.3. S Curve:
All types of learning have this kind of curve. It
is encountered when learning very difficult,
unfamiliar work that requires cognition (technical
work that requires many skills), when the learner
does not bring any prior knowledge to the learning
environment.
[Figure: S curve; vertical axis: performance, horizontal axis: trials or time]
5.1.4. Learning Plateau
In most cases, learning progresses at
a certain speed and then reaches a point beyond
which nothing new is learned: a
plateau occurs. It is encountered in low-position jobs and
in monotonous, dead-end work.
[Figure: learning plateau; vertical axis: performance, horizontal axis: trials or time]
5.1.5. Skill Acquisition
Curve
This curve shows the most complicated form of
learning. After the plateau, a renewed increase is
observed. The individual makes sudden
progress in learning and this enhances his/her
performance further. The individual retains
the skills he/she gained through constant repetition;
driving is an example. Even if
the individual stops driving for a while, he/she will regain
the skill within a short period of time.
[Figure: skill acquisition curve; vertical axis: performance, horizontal axis: trials and time]
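The shapes of the first three curves can be sketched numerically. The functions below are illustrative assumptions rather than formulas from the slides: a saturating exponential stands in for the decreasing efficiency curve, a rescaled growing exponential for the increasing efficiency curve, and a logistic function for the S curve. The `rate`, `midpoint` and `total_trials` parameters are hypothetical.

```python
import math


def decreasing_efficiency(trial, rate=0.3):
    """Fast early gains that level off (negative acceleration)."""
    return 1.0 - math.exp(-rate * trial)


def increasing_efficiency(trial, total_trials=20, rate=0.3):
    """Slow start that speeds up (positive acceleration), scaled to reach 1 at the end."""
    return math.expm1(rate * trial) / math.expm1(rate * total_trials)


def s_curve(trial, midpoint=10, rate=0.6):
    """Slow start, rapid middle, then levelling off (logistic shape)."""
    return 1.0 / (1.0 + math.exp(-rate * (trial - midpoint)))


if __name__ == "__main__":
    print("trial  decreasing  increasing  s-curve")
    for t in range(0, 21, 4):
        print(f"{t:5d}  {decreasing_efficiency(t):10.2f}  "
              f"{increasing_efficiency(t):10.2f}  {s_curve(t):7.2f}")
```

Comparing the gain between neighbouring trials shows the difference: in the first function the gain shrinks from trial to trial, in the second it grows, and in the third it is largest around the midpoint.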
5.2. Fading Away of
Behavior
If the conditioned reaction is not reinforced, it
fades away over the course of time (extinction). If the
satisfying event (that is, the reinforcer) is
withdrawn, the frequency of the conditioned
behavior decreases and tends to fade away. In
the course of time, the behavior will no longer be
repeated.
The repetition of the desired behavior in the
work environment is possible only when
individuals are regularly rewarded with bonuses
and praise. If there is no reward, individuals will
tend, over the course of time, not to behave in the
desired way.
5.3. Self Retrieval
After a break, the individual again starts to display the
conditioned reaction at a certain level (this is also called
spontaneous recovery). If reinforcement
continues this time, conditioning occurs again. Otherwise, the reaction
fades away. This shows that the conditioned reaction does not
disappear completely while fading away, but is
suppressed.
[Figure: conditioning (reinforced repetition), fading away, rest interval and reconditioning, with the amount of self retrieval marked]
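The sequence of conditioning, fading away, rest and reconditioning can be sketched as a toy model. Everything numeric here is an assumption made for illustration: reaction strength is kept between 0 and 1, and `learn_rate`, `fade_rate` and `recovery_fraction` are hypothetical parameters, not measured values.

```python
def run_phase(strength, trials, reinforced, learn_rate=0.3, fade_rate=0.3):
    """Update reaction strength trial by trial and return the trajectory."""
    trajectory = []
    for _ in range(trials):
        if reinforced:
            strength += learn_rate * (1.0 - strength)  # move toward the maximum of 1
        else:
            strength -= fade_rate * strength           # fade toward 0
        trajectory.append(strength)
    return trajectory


def self_retrieval(peak, current, recovery_fraction=0.5):
    """After a rest, part of the gap between current and peak strength returns."""
    return current + recovery_fraction * (peak - current)


if __name__ == "__main__":
    conditioning = run_phase(0.0, 10, reinforced=True)
    fading = run_phase(conditioning[-1], 10, reinforced=False)
    after_rest = self_retrieval(conditioning[-1], fading[-1])
    reconditioning = run_phase(after_rest, 5, reinforced=True)
    print(f"end of conditioning:   {conditioning[-1]:.2f}")
    print(f"end of fading away:    {fading[-1]:.2f}")
    print(f"after the rest:        {after_rest:.2f}")
    print(f"end of reconditioning: {reconditioning[-1]:.2f}")
```

The strength after the rest lies between the faded level and the original peak, which matches the point that the reaction is suppressed rather than erased.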
5.4. Generalization
Stimulus generalization occurs when a learned reaction
is given to a stimulus resembling the stimuli encountered
previously. Reaction generalization occurs
when different behaviors are displayed to get the
same result. The negative aspect of
generalization is that it may lead to mistakes. (An
individual who had a bad experience with one doctor
may generalize it to all doctors.)
5.5. Differentiation
Differentiation is the opposite of generalization.
Not similarities but differences between the stimuli
and reactions are taken as the basis. In
differentiation, displaying different reactions to
different stimuli or behaving differently to get
different results is learned.
5.6. Reinforcement
It is the most important principle of learning.
According to the “Law of Effect” developed on
this issue, of the reactions displayed in the same
situation, the ones which are reinforced are
repeated and the ones which are punished are not
repeated.
The Law of Effect implies that, for learning to occur,
constant reinforcement is required. However, in
latent learning there is no direct
reinforcement.
(Latent learning: Learning occurs but it is
not noticed; the information is stored in
memory and directs behavior when
required.)
The reward which accompanies behavior in the
course of learning, or follows it, is called the
reinforcer.
5.6.1. Types of Reinforcer:
5.6.1.1. Positive-Negative
Reinforcers:
If the desired result is obtained by behaving in
a specific way, this result is a positive reinforcer
and it enhances this behavior.
If an undesired result is obtained by behaving
in a specific way, this behavior is avoided or altered
in order not to get this result again. For instance,
an individual who touches a hot object such as a
stove or a fire and gets burned will not
touch it again.
Undesired behavior → non-reinforcement → fading away
5.6.1.2. External and Internal
Reinforcers:




External reinforcer: an external result rewards the behavior
(candy, a good grade, doing homework for a
present).
Internal reinforcer: when there is no external reward, a
self-rewarding internal emotion reinforces the behavior
(playing the piano, learning a new skill).
5.6.1.3. Primary and Secondary
Reinforcers:
Primary reinforcers: reinforcers which are not
learned (food, electric shock).
Secondary reinforcers: learned reinforcers.
(Social stimuli: if interest, approval,
compassion etc. are repeated, they are positive
Reinforcement Application Program:
1) Proportional (ratio) programs: The reinforcer is
given after a specific number of behaviors.
• Fixed proportion: The number of behaviors is
fixed. (piece-rate pay)
• Variable proportion: The number of behaviors
changes. (lottery ticket, slot machine)
2) Timed (interval) programs: The reinforcer is given after
a specific period of time.
• Fixed time: The time is fixed. (monthly salary,
hourly rate, weekly wage, daily wage, etc.)
• Variable time: The time is variable. (receiving a
letter, a sale made by the marketing staff, pop
quizzes)
The Results of the
Reinforcement Program:

The frequency of the behavior in
proportional programs is higher than
in timed programs. The frequency
of the behavior is directly proportional
to the number of reinforcements.
Variable programs create behaviors that are more
resistant to fading away than fixed
programs do (gambling addiction).
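The four reinforcement programs (fixed and variable, proportional and timed) can be written as small decision rules. This is a hedged sketch: the function names, the `now` time argument and the use of random draws for the variable programs are assumptions made for illustration, not part of the source material.

```python
import random


def fixed_ratio(n):
    """Proportional, fixed: reinforce every n-th behavior (e.g. piece-rate pay)."""
    count = 0

    def on_behavior(now):  # `now` is unused; kept so all programs share one signature
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return on_behavior


def variable_ratio(mean_n, rng):
    """Proportional, variable: each behavior pays off with probability 1/mean_n."""
    def on_behavior(now):
        return rng.random() < 1.0 / mean_n
    return on_behavior


def fixed_interval(period):
    """Timed, fixed: reinforce the first behavior once `period` has elapsed."""
    last = 0.0

    def on_behavior(now):
        nonlocal last
        if now - last >= period:
            last = now
            return True
        return False
    return on_behavior


def variable_interval(mean_period, rng):
    """Timed, variable: the waiting time changes, averaging mean_period."""
    next_time = rng.expovariate(1.0 / mean_period)

    def on_behavior(now):
        nonlocal next_time
        if now >= next_time:
            next_time = now + rng.expovariate(1.0 / mean_period)
            return True
        return False
    return on_behavior


if __name__ == "__main__":
    rng = random.Random(1)
    programs = {
        "fixed proportion (every 5 behaviors)": fixed_ratio(5),
        "variable proportion (about 1 in 5)": variable_ratio(5, rng),
        "fixed time (every 5 time units)": fixed_interval(5.0),
        "variable time (about every 5 time units)": variable_interval(5.0, rng),
    }
    for name, program in programs.items():
        # One behavior per time unit, for 100 time units.
        reinforcers = sum(program(float(t)) for t in range(1, 101))
        print(f"{name}: {reinforcers} reinforcers for 100 behaviors")
```

In the fixed programs the learner can predict exactly when the next reinforcer arrives. In the variable programs the learner cannot, which is one way to picture why behaviors learned under them fade away more slowly.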
5.6.3. Control Factor:
For the reinforcer to function, the experimental subject
must clearly form the relationship between behavior
and reinforcer. This sense of
control facilitates
learning and reinforces the behavior.
Learned helplessness: An individual carries
the feeling of being unable
to control the result he/she got in a certain
situation over into other environments and, even when
he/she has the possibility to control it,
becomes more passive. This leads to low self-esteem,
depression and poor performance (flea,
fish).
6. COGNITIVE LEARNING
AND MEMORY
Our capacity to process information and
make it permanent is called memory. Thanks to
our memory, we do not have to relearn all the things
we learned previously.
6.1. The Main Functions of Memory As the
Tool of Information Processing
Although memory has a more complex structure
than the most developed computer, it processes
information just like other information-processing systems, by
carrying out three basic functions. These functions
are the coding, storing and recalling of
information.
6.1.1. Coding
Information about the place, time and
frequency of repetition of some events is stored
by our memory automatically, without our being
aware of it. For instance, although we do not
make an effort to remember what we ate last night,
we can remember it easily if we want.
However, coding information
about the meaning, relations and organization
of events requires a conscious
effort. For example, in order to keep in mind the subjects
of a course that we study, we
need to make an effort. We can remember the
subjects of the course only when we do
so.
6.1.2. Storing
Information is stored in two ways within
memory. In short-term memory,
information can be stored only for a very short period of
time (20-30 seconds). When someone gives us his/her
phone number, we will forget the number
quickly if we do not make an effort to keep it in mind.
In long-term memory, however, the amount of
information that can be kept is almost
endless.
6.1.3. Recalling
To make recalling easier, we use objects
and words as cues, and we classify and organize the
events we want to remember according to their
common characteristics. For this reason, the events
we recall are not an exact copy of the original events.
7. LEARNING
STRATEGIES
7.1. Learning as a whole or by dividing into
parts
Studies indicate that when the material is long
or can be divided into parts easily, learning it in
parts is easier and more efficient.
When a university student studies a unit in his/her
course book, he/she should first read it quickly, paying
attention to the titles, then return to the details which
require special attention, and finally study the whole unit
again. One of the most suitable strategies
for university students is this whole-part-whole strategy.
7.2. Reading and Explaining
To make learning permanent and efficient, the
reading material should be read actively. One of the
easiest ways of comprehending the core of reading
material is to repeat and organize the material in
our own words.
7.3. Feedback and Programmed Learning
Another way of enhancing learning and making
it efficient is to give the learner information
about how much he/she has learned of
the core material. This process is called
“feedback.” In the Programmed Learning
Technique, the material to be learned is divided into
small steps ordered from easy to difficult, so that no step is hard
to comprehend.
THANK YOU...
