Decision Making


Neurobiology and Brain Sciences, 2010
Types of Machine Learning
1. Unsupervised Learning:
• Only network inputs are available to the learning algorithm.
• The network is given only unlabeled examples.
• Network learns to categorize (cluster) the inputs.
• Example: the Hebbian plasticity rule (a code sketch follows below)

$W_i(n+1) = W_i(n) + a \, X_i(n) \, Y(n)$

W_i – weight of the i-th synapse
X – presynaptic activity
Y – postsynaptic activity
n – number of synaptic changes (input patterns)
a – amplitude of learning
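As a concrete illustration, here is a minimal Python sketch of this update rule. The array sizes, the learning-rate value, and the toy activity patterns are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def hebbian_update(w, x, y, a=0.1):
    """One Hebbian step: W_i(n+1) = W_i(n) + a * X_i(n) * Y(n)."""
    return w + a * x * y

# Toy usage (assumed values): repeatedly co-activate two of three inputs with the output.
w = np.zeros(3)                      # three synapses, starting with no knowledge
x = np.array([1.0, 0.0, 1.0])        # presynaptic activity pattern
y = 1.0                              # postsynaptic activity
for _ in range(5):
    w = hebbian_update(w, x, y)
print(w)                             # [0.5, 0.0, 0.5]: co-active synapses strengthen
```

Only the weights whose inputs were active together with the output grow, which is the "fire together, wire together" behavior described below.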
Hebbian Rules
• In 1949, Hebb postulated that the changes in a synapse are proportional to the
correlation between firing of the neurons that are connected through the synapse
(the pre- and postsynaptic neurons):
“Neurons that fire together, wire together”
• Examples:
  - Classical conditioning
  - Spike-timing-dependent synaptic plasticity (STDP)
Synaptic Plasticity and Memory

Learning – activating an activity pattern across the cells that represents events in the world changes the strengths of synapses in a network of neurons.

Memory retrieval – reactivation of the modified connections, triggered by exposure to part of the previously learned pattern.

LTP as a Hebbian mechanism for learning and memory:
- Contains early and late phases as separate processes, each of which can be blocked pharmacologically on its own
- Specific
- Associative
- Correlation between learning and LTP (classical conditioning, fear conditioning)
- Correlation between blocking LTP (via NMDA blockade) and blocking learning and memory retrieval (Morris Water Maze)
- Artificially induced LTP (electrical stimulation alone) can substitute for the sensory stimulation that leads to learning and memory
Application of the Hebbian learning rule:
The linear associator
• The activation of each neuron in the output layer is given by a sum of
weighted inputs.
• The strength of each connection is calculated from the product of the pre- and postsynaptic activities, scaled by a "learning rate" a (which determines how fast connection weights change).
Δwij = a * g[i] * f[j].
• The linear associator stores associations
between a pattern of neural activations in the
input layer f and a pattern of activations in the
output layer g.
• Once the associations have been stored in the
connection weights between layer f and layer g,
the pattern in layer g can be “recalled” by
presentation of the input pattern in layer f.
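To make the storage and recall steps concrete, here is a minimal Python sketch of a linear associator. The specific patterns, the unit-length normalization of f, and the choice of a single stored association are assumptions made for illustration.

```python
import numpy as np

a = 1.0                                       # learning rate

# Storage: delta_w[i, j] = a * g[i] * f[j], i.e. an outer product of the patterns.
f = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)  # input pattern, normalized to unit length
g = np.array([0.0, 1.0])                      # target output pattern
W = a * np.outer(g, f)                        # connection weights from layer f to layer g

# Recall: each output neuron's activation is the sum of its weighted inputs.
recalled = W @ f
print(recalled)                               # ~[0.0, 1.0]: presenting f reproduces g
```

Because f has unit length, W @ f = g (f · f) = g; with several stored associations, recall stays exact as long as the input patterns are mutually orthogonal.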
Types of Machine Learning
2. Reinforcement Learning:
• The network is only provided with a grade, or score, which indicates network
performance.
• The network learns how to act given an observation of the world. Every action
has some impact on the environment, and the environment provides feedback in
the form of rewards that guides the learning algorithm.
• Reinforcement learning differs from supervised learning in that correct
input/output pairs are never presented, and sub-optimal actions aren’t explicitly
corrected.
• Formally, the basic reinforcement learning model consists of (see the sketch below):
- a set of environment states S
- a set of actions A
- a set of scalar "rewards" in ℝ
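The slides name only the ingredients (states, actions, scalar rewards), but a small bandit-style sketch in Python can show how a scalar score alone, without correct input/output pairs, drives learning. The toy reward table, the epsilon-greedy action choice, and every parameter value below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (assumed): 2 states, 2 actions; reward depends only on (state, action).
n_states, n_actions = 2, 2
rewards = np.array([[0.0, 1.0],      # in state 0, action 1 is rewarded
                    [1.0, 0.0]])     # in state 1, action 0 is rewarded

Q = np.zeros((n_states, n_actions))  # learned estimate of each action's value
alpha, epsilon = 0.1, 0.1            # learning rate, exploration probability

s = 0
for _ in range(2000):
    # Epsilon-greedy: the learner only ever sees the scalar reward,
    # never the correct action (unlike supervised learning).
    a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
    r = rewards[s, a]
    Q[s, a] += alpha * (r - Q[s, a])  # move the estimate toward the observed reward
    s = int(rng.integers(n_states))   # environment moves to a random next state

print(Q.round(2))                     # each row's maximum picks out the rewarded action
```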
Types of Machine Learning
3. Supervised Learning (deducing a function from training data):
• The network is provided with a set of examples of proper network behavior
(inputs/targets).
$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$
- Experimenter needs to determine the type of training examples
- The training set needs to be characteristic of the real-world use of the function.
- Determine the input feature representation of the learned function (what and how
many features in the vector).
• The network generates a function that maps inputs to desired outputs.
• Example: the Perceptron
Application of Supervised Learning:
Binary Classification
• Given learning data: (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)
• A model is constructed: X → Model → y ∈ {0, 1}
• The output y is a linear combination of the inputs:

[Diagram: inputs x₁ … xₘ, each scaled by a weight w₁ … wₘ, are summed to produce the output y]
The Perceptron

$h = \sum_j W_j X_j$

$Y = \tfrac{1}{2}\bigl(1 + \mathrm{sgn}(h)\bigr)$

Y – output; h – sum of scaled inputs; W – synaptic weight; X – input
sgn(h) = +1 if h > 0, −1 otherwise, so Y = 1 when h > 0 and Y = 0 otherwise.

Equivalently, $Y = \tfrac{1}{2}\bigl(1 + \mathrm{sgn}(W \cdot X)\bigr)$.

[Diagram: inputs x₁ … xₘ with weights w₁ … wₘ converge on a single output unit y]
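A minimal Python sketch of this forward pass; the example weights and inputs are made up for illustration.

```python
import numpy as np

def perceptron_output(w, x):
    """Y = (1 + sgn(h)) / 2 with h = sum_j W_j * X_j, mapping h to {0, 1}."""
    h = np.dot(w, x)              # h: weighted sum of the inputs
    return 1 if h > 0 else 0      # equals (1 + sgn(h)) / 2 for h != 0

w = np.array([0.5, -0.3])         # assumed synaptic weights
print(perceptron_output(w, np.array([1.0, 1.0])))  # h = 0.2  -> 1
print(perceptron_output(w, np.array([0.0, 1.0])))  # h = -0.3 -> 0
```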
Geometrical interpretation
W  X  W1  X1  W2  X 2
X1
W1
W2
X2
Y
W
X
W1
X2
W2
X1
Geometrical interpretation
W  X  W1  X 1  W2  X 2
 W cos W   X cos  X   W sin W   X sin  X 
 W X   cos W  cos  X   sin W  sin  X  
 W X  cos W   X 
W2
X2
W
X
W1
X1
Geometrical interpretation

$Y = \tfrac{1}{2}\bigl(1 + \mathrm{sgn}(W \cdot X)\bigr) = \tfrac{1}{2}\bigl(1 + \mathrm{sgn}(\cos\varphi)\bigr)$

where φ = θ_W − θ_X is the angle between W and X; since |W||X| > 0, the sign of W·X is the sign of cos φ.

[Diagram: the perceptron fires (Y = 1) exactly when the input vector X lies within 90° of the weight vector W]
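A quick numeric check of this identity in Python; the particular vectors are arbitrary.

```python
import numpy as np

w = np.array([3.0, 1.0])
x = np.array([1.0, 2.0])

dot = np.dot(w, x)                 # W1*X1 + W2*X2
theta_w = np.arctan2(w[1], w[0])   # angle of W in the plane
theta_x = np.arctan2(x[1], x[0])   # angle of X in the plane
via_angle = np.linalg.norm(w) * np.linalg.norm(x) * np.cos(theta_w - theta_x)

print(dot, via_angle)              # both ~5.0: the two expressions agree
```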
The Perceptron
• A single-layer perceptron can only learn linearly separable problems.
• A single-layer perceptron of N units can only learn N patterns.
• More than one layer of perceptrons can learn any Boolean function.
• Overtraining: accuracy usually rises, then falls.
Perceptron Learning Demonstration
Input features:
- Taste: Sweet = 1, Not_Sweet = 0
- Seeds: Edible = 1, Not_Edible = 0
- Skin: Edible = 1, Not_Edible = 0

Output:
- sweet fruit = 1
- not sweet fruit = 0
We start with no knowledge:

[Diagram: Taste, Seeds, and Skin inputs each connect to the output unit with weight 0.0; the unit fires if ∑ > 0.4]
Perceptron Learning
• To train the perceptron, we will show it each example and
have it categorize each one.
• Since it’s starting with no knowledge, it is going to make
mistakes.
• When it makes a mistake, we are going to adjust the weights
to make that mistake less likely in the future.
• When we adjust the weights, we’re going to take relatively
small steps to be sure we don’t over-correct and create new
problems.
1. We show it a banana:

[Diagram: banana input Taste = 1, Seeds = 1, Skin = 0; all weights 0.0; output 0 (the unit fires if ∑ > 0.4)]
In this case we have:
[(1 * 0) = 0] + [(1 * 0) = 0] + [(0 * 0) = 0] = 0
Since that is less than the threshold (0.4), we responded “no.”
Is that correct? No.
Since we got it wrong, we need to change the weights using the delta rule:
∆w = learning rate * (overall teacher - overall output) * node output

1. Learning rate: we set that ourselves. It has to be large enough that learning happens in a reasonable amount of time, but small enough not to go too fast. (Let's pick 0.25.)
2. (overall teacher - overall output): the teacher knows the correct answer (e.g., that a banana should be a good fruit). In this case, the teacher says 1, the output is 0, so (1 - 0) = 1.
3. Node output: that's what came out of the node whose weight we're adjusting.

First node: ∆w = 0.25 × 1 × 1 = 0.25.
The Delta Rule
∆w = learning rate * (overall teacher - overall output) * node output
• If we get the categorization right, (overall teacher - overall output) will be
zero (the right answer minus itself).
In other words, if we get it right, we won’t change any of the weights.
• If we get the categorization wrong, (overall teacher - overall output) will
either be -1 or +1:
- If we said “yes” when the answer was “no,” we’re too high
on the weights and we will get a (teacher - output) of -1 which
will result in reducing the weights.
- If we said “no” when the answer was “yes,” we’re too low
on the weights and this will cause them to be increased.
The Delta Rule
∆w = learning rate * (overall teacher - overall output) * node output
• If the node whose weight we’re adjusting is “0”, then it didn’t
participate in making the decision. In that case, it shouldn’t be
adjusted. Multiplying by zero will make that happen.
• If the node whose weight we’re adjusting is “1”, then it did
participate and we should change the weight (up or down as needed).
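Here is the delta rule as a small Python function, checked against the banana step computed above; the function and variable names are mine, not from the slides.

```python
def delta_rule(weights, lr, teacher, output, node_outputs):
    """Apply dw = lr * (teacher - output) * node_output to each weight."""
    return [w + lr * (teacher - output) * x
            for w, x in zip(weights, node_outputs)]

# Banana step: teacher = 1, output = 0, node outputs (Taste, Seeds, Skin) = (1, 1, 0)
weights = delta_rule([0.0, 0.0, 0.0], lr=0.25, teacher=1, output=0,
                     node_outputs=[1, 1, 0])
print(weights)   # [0.25, 0.25, 0.0]: only the nodes that participated are adjusted
```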
How do we change the weights for a banana?
Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            1                    1             +0.25
seeds     0.25            1                    1             +0.25
skin      0.25            1                    0             0
[Diagram: before the update the weights are all 0.0 and the banana yields output 0; after the update the weights are Taste = 0.25, Seeds = 0.25, Skin = 0.0; the unit fires if ∑ > 0.4]
2. We show it a pear:

[Diagram: pear input Taste = 1, Seeds = 0, Skin = 1; weights 0.25, 0.25, 0.0; ∑ = 0.25 < 0.4, so output 0]
We change the weights for a pear:

Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            1                    1             +0.25
seeds     0.25            1                    0             0
skin      0.25            1                    1             +0.25
Adjusted weights for a pear:

[Diagram: weights now Taste = 0.50, Seeds = 0.25, Skin = 0.25; the unit fires if ∑ > 0.4]
3. We show it a lemon:

[Diagram: lemon input Taste = 0, Seeds = 0, Skin = 0; weights 0.50, 0.25, 0.25; ∑ = 0 < 0.4, so output 0, which is correct]
We change the weights for a lemon:

Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            0                    0             0
seeds     0.25            0                    0             0
skin      0.25            0                    0             0
Adjusted weights for a lemon:

[Diagram: weights unchanged at Taste = 0.50, Seeds = 0.25, Skin = 0.25]
4. We show it a strawberry:

[Diagram: strawberry input Taste = 1, Seeds = 1, Skin = 1; weights 0.50, 0.25, 0.25; ∑ = 1.0 > 0.4, so output 1, which is correct]
We change the weights for a strawberry:

Feature   Learning rate   (teacher - output)   Node output   ∆w
taste     0.25            0                    1             0
seeds     0.25            0                    1             0
skin      0.25            0                    1             0
Adjusted weights for a strawberry:

[Diagram: weights unchanged at Taste = 0.50, Seeds = 0.25, Skin = 0.25]

The perceptron can now classify every example correctly.
5. We show it a green apple:

[Diagram: green apple input Taste = 0, Seeds = 0, Skin = 1; weights 0.50, 0.25, 0.25; ∑ = 0.25 < 0.4, so output 0]
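Putting the whole demonstration together, this short Python sketch replays all five fruits with the slides' threshold (0.4) and learning rate (0.25). The feature encoding and teacher labels follow the slides; the green apple's label of 0 (not sweet) is an assumption consistent with its output, and the code structure is mine.

```python
import numpy as np

# Fruits as (Taste, Seeds, Skin) feature vectors with the teacher's label.
examples = [
    ("banana",      [1, 1, 0], 1),
    ("pear",        [1, 0, 1], 1),
    ("lemon",       [0, 0, 0], 0),
    ("strawberry",  [1, 1, 1], 1),
    ("green apple", [0, 0, 1], 0),  # assumed label: a green apple is not sweet
]

w = np.zeros(3)                  # start with no knowledge
threshold, lr = 0.4, 0.25

for name, x, teacher in examples:
    x = np.array(x, dtype=float)
    output = 1 if w @ x > threshold else 0   # fire if the weighted sum exceeds 0.4
    w += lr * (teacher - output) * x         # delta rule; no change when correct
    print(f"{name:12s} output={output} teacher={teacher} weights={w}")
```

Running it reproduces the sequence above: the banana and pear steps each raise two weights by 0.25, and the lemon, strawberry, and green apple then pass through with no further change.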
Decision Making
• Neuroanatomical substrates of decision making:

Orbitofrontal cortex (within the prefrontal cortex):
Responsible for processing, evaluating and filtering social and emotional information for appropriate decision making. It is thought to be involved because it performs on-line, rapid evaluation of stimulus-reinforcement associations, that is, learning to link a stimulus and action with its reinforcing properties.

Anterior cingulate cortex:
Controls and selects appropriate behavior, and monitors the organism's errors and incorrect responses.

Dorsolateral prefrontal cortex (DLPFC):
Monitors errors and makes appropriate choices during decision making; performs cost-benefit analysis in working memory.

Basal ganglia-thalamocortical circuits (BGTC) and frontoparietal networks:
Direct attention toward relevant, as opposed to irrelevant, information during goal-related decision making.
Decision Making
• Neuroanatomical substrates of decision making:

The dopaminergic system: appears to be a primary substrate for the representation of decision utility. Increased firing of dopamine neurons has been documented when people are faced with unexpected rewards and in response to stimuli that predict future rewards.

The ventral striatum: the center of integration of the 'data' from the prefrontal cortex, amygdala and hippocampus. It plays a critical role in representing the magnitude of anticipated reward.

The amygdala: involved in emotion and learning; responsible for producing fear responses. Plays a key role in representing the utility of a gain or the disutility of a loss.
Decision Making
• Factors that impact decision making:

Expertise: with expertise come differences in the function and structure of the brain regions required for decision making and task completion.
- London black cab drivers, who are required to learn and memorize London's streets, show a different distribution of hippocampal volume compared to ordinary drivers.
- Physics experts use a 'working forwards' strategy to solve problems, making decisions using the information given in the problem to derive a solution. In contrast, physics novices typically employ a 'working backwards' strategy, in which they start from the perceived goal state or decision and backtrack.

Age: with age come changes in the recruitment of specific brain regions for task completion during decision making.
- Older adults often compensate for age-related declines in prefrontal structure and function by recruiting additional prefrontal regions and more posterior regions.

Sex: men tend to make faster decisions in situations of uncertainty and limited feedback.
Neural Activity Correlates of Decision Making
• Neural correlates of decision variables in parietal cortex (M.L. Platt & P.W. Glimcher, 1999):
The gain (or reward) a monkey can expect to realize from an eye-movement response modulates the activity of neurons in the lateral intraparietal area (LIP). In addition, the activity of these neurons is sensitive to the probability that a particular response will result in a gain.
Neural Activity Correlates of Decision Making
• “Neurons in the orbitofrontal cortex encode economic value” (C. Padoa-Schioppa & J.A. Assad, 2006):
- Neurons in the orbitofrontal cortex (OFC) encode the value of offered and
chosen goods.
- OFC neurons encode value independently of visuospatial factors and
motor responses. (If a monkey chooses between A and B, neurons in the
OFC encode the value of the two goods independently of whether A is
presented on the right and B on the left, or vice versa).
Conclusion: economic choice is essentially choice between goods rather than
choice between actions.
Neural Activity Correlates of Decision Making
• “Microstimulation of macaque area LIP affects decision-making in a motion discrimination task” (T.D. Hanks, J. Ditterich & M.N. Shadlen, 2006):
- In each experiment, they identified a cluster of LIP cells with overlapping
response fields (RFs)
- Choices toward the stimulated RF were faster with microstimulation,
while choices in the opposite direction were slower.
- Microstimulation never directly evoked saccades, nor did it change
reaction times in a simple saccade task.
- These results demonstrate that the discharge of LIP neurons is causally
related to decision formation in the discrimination task.