580.691 Learning Theory
Reza Shadmehr
Neural mechanisms of classification
Generalization in linear classification
Patient H.M.
A 27-year-old assembly line worker who had suffered from untreatable and debilitating
temporal lobe seizures for many years. The surgeon removed the medial portion of the temporal
lobes bilaterally (only the right lobe’s removal is shown in the figure on the right).
H.M.’s seizures improved, but there was a devastating side effect: he could no longer
form new long-term memories.
Kandel et al. Principles of Neural Science 2000 (62-1)
R. Carter (1998) Mapping the Mind
Patient H.M.
• After recovery from surgery, he maintained his vocabulary and language skills,
his high IQ, and his ability to recall facts about his life that preceded the surgery:
• He could remember the job that he held, where he had lived, and events of childhood. His
memory of public and personal events extends only to when he was 16 years old
(1942), 11 years before his operation. This is not typical of amnesic individuals,
who generally remember facts and events up to near the date of their brain damage.
• Normal immediate memory: he can retain a number for a short period of time. He
can carry on a conversation.
• He could not recognize people that he had talked to just the day before at the hospital.
He does not know where he lives, who cares for him, or what he ate at his last meal.
• He rarely complains. There could be something seriously wrong with him, but you
would have to guess. At the nursing home, when H.M. is observed to be acting
differently, the nurses question him by running through a list of possible complaints,
such as toothache, headache, stomachache, until they hit upon the correct one. He
will not spontaneously say, “I have a toothache.”
Corkin, Seminars in Neurology 4:249-259 1984.
Immediate memory is intact in amnesia
• Subjects with medial temporal lobe damage and normal individuals were read a
sequence of digits (for example, 5-7-4-1) and then asked immediately to repeat back the
sequence.
• Each time the subject was successful, the number of digits in the test sequence was
increased by one.
• Digit span: the number of digits that was successfully repeated back before a subject
failed twice at the same sequence.
• The amnesic patients and the control subjects both repeated back an average of 6.8
digits.
Cave and Squire
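The digit span procedure described above can be sketched in code. This is only an illustration of the stopping rule, not the protocol used in the cited study: the starting length, the cap on sequence length, and the `can_repeat` callback that stands in for the subject are all assumptions introduced here.

```python
import random


def digit_span(can_repeat, start_len=1, max_len=20, seed=0):
    """Return the longest sequence length repeated back successfully.

    can_repeat: hypothetical callback; takes a list of digits and returns
                True if the subject repeats it back correctly.
    start_len, max_len, seed: assumed parameters, not taken from the slides.
    """
    rng = random.Random(seed)
    span = 0  # no sequence repeated successfully yet
    length = start_len
    while length <= max_len:
        digits = [rng.randrange(10) for _ in range(length)]
        # The subject gets two attempts at the same sequence.
        if not any(can_repeat(digits) for _ in range(2)):
            break  # failed twice at this sequence: stop testing
        span = length
        length += 1  # success: lengthen the sequence by one digit
    return span


# A hypothetical subject who can hold at most 7 digits.
span = digit_span(lambda digits: len(digits) <= 7)
```

With this idealized subject the function returns 7; a real subject's responses would be noisy, which is why the procedure allows a second attempt.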
Delayed recall in H.M. became severely impaired within 1 minute
Delayed paired-comparison task.
Clicks, flashes, tones, or hues were
presented and then some seconds
later, the same or another cue was
presented and the subject was asked to
determine whether the two stimuli were
the same or different.
Average performance of H.M.
Source: Brenda Milner
Mirror tracing task in H.M.
While viewing hand in mirror, H.M. tries to trace between the
two lines. Number of errors refers to times that the border was
crossed.
• Could learn to do mirror writing: performance would
improve with practice and remain good on next day,
despite no conscious recall of prior practice.
Lesions of the temporal lobe appear to affect forms of
learning and memory that require a conscious record;
these are called declarative memories.
Kandel et al. Principles of Neural Science 2000 (62-2)
Memory systems of the brain
Non-declarative memory is expressed
through performance rather than
recollection.
Squire (2004) Neurobiology of Learning and Memory
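Review of online linear classification

Linear classification with linear encoding of feature space
Inputs $x_1, \dots, x_f$; $\mathbf{x} = [1, x_1, \dots, x_f]^T$; $y \in \{0, 1\}$
$q^{(n)} = P(y^{(n)} = 1 \mid \mathbf{x}^{(n)}) = \dfrac{1}{1 + \exp(-\mathbf{w}^{(n)T} \mathbf{x}^{(n)})}$
$\mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} + \dfrac{y^{(n)} - q^{(n)}}{q^{(n)} (1 - q^{(n)}) \, \mathbf{x}^{(n)T} \mathbf{x}^{(n)}} \, \mathbf{x}^{(n)}$

Linear classification with non-linear encoding of feature space
$\mathbf{g}(\mathbf{x}) = [g_1(\mathbf{x}), \dots, g_m(\mathbf{x})]^T$
$q^{(n)} = P(y^{(n)} = 1 \mid \mathbf{g}(\mathbf{x}^{(n)})) = \dfrac{1}{1 + \exp(-\mathbf{w}^{(n)T} \mathbf{g}(\mathbf{x}^{(n)}))}$
$\mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} + \dfrac{y^{(n)} - q^{(n)}}{q^{(n)} (1 - q^{(n)}) \, \mathbf{g}(\mathbf{x}^{(n)})^T \mathbf{g}(\mathbf{x}^{(n)})} \, \mathbf{g}(\mathbf{x}^{(n)})$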
Knowlton et al. (1996) “A neo-striatal habit learning system in humans”
Science 273:1399
Task: Individuals learned to predict
which of two outcomes would occur
on each trial, given the particular cue
that appeared.
Figure labels: cue pattern $\mathbf{x}$, $p(\mathbf{x})$, $P(s = 1 \mid \mathbf{x})$
Setting up the Knowlton et al. (1996) task in on-line learning
Cues: $\mathbf{x} = [x_1, \dots, x_4]^T$, $x_i \in \{0, 1\}$
$P(x_i = 1 \mid s = 1) = \alpha_i$, $P(x_i = 0 \mid s = 1) = 1 - \alpha_i$
$P(x_i = 1 \mid s = 0) = \beta_i$
$p(x_i \mid s = 1) = \alpha_i^{x_i} (1 - \alpha_i)^{1 - x_i}$
$p(x_i \mid s = 0) = \beta_i^{x_i} (1 - \beta_i)^{1 - x_i}$
With prior $P(s = 1) = \pi$:
$P(s = 1 \mid x_i) = \dfrac{p(x_i \mid s = 1) P(s = 1)}{p(x_i)} = \dfrac{\pi \, \alpha_i^{x_i} (1 - \alpha_i)^{1 - x_i}}{p(x_i)}$
$P(s = 0 \mid x_i) = \dfrac{p(x_i \mid s = 0) P(s = 0)}{p(x_i)} = \dfrac{(1 - \pi) \, \beta_i^{x_i} (1 - \beta_i)^{1 - x_i}}{p(x_i)}$
$P(s = 1 \mid \mathbf{x}) = ?$
Let’s begin with the simpler problem of
observing only one cue. We want to know
the probability of sunshine, given that the
one cue was observed.
$\log \dfrac{P(s = 1 \mid x_i)}{P(s = 0 \mid x_i)} = \log \pi + x_i \log \alpha_i + (1 - x_i) \log(1 - \alpha_i) - \log(1 - \pi) - x_i \log \beta_i - (1 - x_i) \log(1 - \beta_i)$
$= w_i x_i + c$
where $w_i = \log \dfrac{\alpha_i (1 - \beta_i)}{\beta_i (1 - \alpha_i)}$ and $c = \log \dfrac{\pi}{1 - \pi} + \log \dfrac{1 - \alpha_i}{1 - \beta_i}$.
$\dfrac{P(s = 1 \mid x_i)}{P(s = 0 \mid x_i)} = \exp(w_i x_i + c)$
$P(s = 1 \mid x_i) = \exp(w_i x_i + c) \, P(s = 0 \mid x_i) = \exp(w_i x_i + c) \, (1 - P(s = 1 \mid x_i))$
$P(s = 1 \mid x_i) = \dfrac{\exp(w_i x_i + c)}{1 + \exp(w_i x_i + c)} = \dfrac{1}{1 + \exp(-(w_i x_i + c))}$

Two cues, assumed independent given the outcome:
$p(x_i, x_j \mid s = 1) = p(x_i \mid s = 1) \, p(x_j \mid s = 1) = \alpha_i^{x_i} (1 - \alpha_i)^{1 - x_i} \alpha_j^{x_j} (1 - \alpha_j)^{1 - x_j}$
$p(x_i, x_j \mid s = 0) = \beta_i^{x_i} (1 - \beta_i)^{1 - x_i} \beta_j^{x_j} (1 - \beta_j)^{1 - x_j}$
$\dfrac{P(s = 1 \mid x_i, x_j)}{P(s = 0 \mid x_i, x_j)} = \dfrac{p(x_i, x_j \mid s = 1) P(s = 1)}{p(x_i, x_j \mid s = 0) P(s = 0)}$
$\log \dfrac{P(s = 1 \mid x_i, x_j)}{P(s = 0 \mid x_i, x_j)} = \log \dfrac{\pi \, \alpha_i^{x_i} (1 - \alpha_i)^{1 - x_i} \alpha_j^{x_j} (1 - \alpha_j)^{1 - x_j}}{(1 - \pi) \, \beta_i^{x_i} (1 - \beta_i)^{1 - x_i} \beta_j^{x_j} (1 - \beta_j)^{1 - x_j}} = w_i x_i + w_j x_j + c$
where now $c = \log \dfrac{\pi}{1 - \pi} + \log \dfrac{1 - \alpha_i}{1 - \beta_i} + \log \dfrac{1 - \alpha_j}{1 - \beta_j}$.
$P(s = 1 \mid x_i, x_j) = \dfrac{1}{1 + \exp(-(w_i x_i + w_j x_j + c))}$
In general:
$P(s = 1 \mid \mathbf{x}) = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x} - c)}$
Therefore, the weather
forecasting task is linear
classification in the feature
space of the cards.
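The equivalence between the Bayes posterior and a logistic function of the cards can be checked numerically. The sketch below is an illustration under assumptions: it writes `alpha[i]` for P(x_i = 1 | s = 1), `beta[i]` for P(x_i = 1 | s = 0), and `prior` for P(s = 1) (labels chosen here), treats the cards as independent given the outcome, and uses made-up probability values rather than those of the actual task.

```python
import itertools
import math


def logistic_params(alpha, beta, prior):
    """Weights w and offset c implied by the cue probabilities."""
    w = [math.log(a * (1 - b) / (b * (1 - a))) for a, b in zip(alpha, beta)]
    c = math.log(prior / (1 - prior)) + sum(
        math.log((1 - a) / (1 - b)) for a, b in zip(alpha, beta)
    )
    return w, c


def posterior_bayes(x, alpha, beta, prior):
    """P(s = 1 | x) computed directly from Bayes' rule."""
    joint1, joint0 = prior, 1 - prior
    for xi, a, b in zip(x, alpha, beta):
        joint1 *= a if xi else 1 - a
        joint0 *= b if xi else 1 - b
    return joint1 / (joint1 + joint0)


def posterior_logistic(x, w, c):
    """P(s = 1 | x) computed as 1 / (1 + exp(-w.x - c))."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + c
    return 1.0 / (1.0 + math.exp(-activation))


# Hypothetical cue probabilities for four cards (not the values used in the study).
alpha = [0.8, 0.6, 0.4, 0.2]
beta = [0.3, 0.5, 0.6, 0.7]
prior = 0.5
w, c = logistic_params(alpha, beta, prior)

# Largest disagreement between the two computations over all 16 card patterns.
max_gap = max(
    abs(posterior_bayes(x, alpha, beta, prior) - posterior_logistic(x, w, c))
    for x in itertools.product([0, 1], repeat=4)
)
```

The two computations agree to floating-point precision for every card pattern, which is the sense in which the task reduces to linear classification.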
Parkinson patients were impaired in learning the classification task, while
amnesic patients were normal
PD-star represents the PD patients with
the most severe symptoms. PD also
involves damage to the frontal lobe. They
tested frontal patients and found that they
were normal in learning the classification
problem. When PD patients were tested
on an additional 100 trials, their
performance was now comparable to
control subjects. This was a little
puzzling.
Similar to PD patients, Huntington’s disease patients exhibited impaired ability to learn the weather
prediction task (Knowlton et al. (1996) Dissociations within nondeclarative memory in Huntington’s
disease. Neuropsychology 10:538-548).
After completing the task, subjects were given eight multiple-choice questions to
determine how well they remembered the testing situation. These questions asked,
for example, about the layout of the screen, the number of cards that could appear
together on the computer screen, the number of weather prediction trials presented,
and the appearance of the cues.
Medial temporal lobe structures damaged in amnesic patients appear to support
acquisition of “declarative” memory of the training episode. In contrast, basal
ganglia structures damaged in Parkinson’s disease appear to support acquisition of
internal models for classification.
Witt et al. (2002) Dissociation of Habit-Learning in Parkinson's and
Cerebellar Disease. J. Cognitive Neurosci 14:493
Eldridge et al. (2002) Intact Implicit Habit Learning in Alzheimer's Disease.
Behavioral Neurosci 116:735
Figure legend: cerebellar damage, control, Alzheimer’s disease, Parkinson’s disease.
Brief notes on Alzheimer’s disease: In
early stages of the disease, there is
neurodegeneration in the medial temporal
lobes, similar to damage observed in
amnesic patients. In later stages, neuronal
loss extends to the neocortex.
In the post-experiment
interview (explicit memory
component), recall of AD
patients did not differ
from chance.
Poldrack et al. (2001) Interactive memory systems in the human brain.
Nature 414:546
A “block” design: one group of subjects performed the feedback (FB) task (and the baseline task), while
another performed the paired-associate (PA) task (and the baseline task). Classification ability at the end of training was
similar for the two groups.
Between subject contrast: PA vs. FB
The FB task requires that you first select the class, and then you are provided with an error signal
regarding your choice. In the PA task, there is no explicit error signal because no choices are made.
Poldrack et al. (2001) Interactive memory systems in the human brain.
Nature 414:546
Activity in caudate
Activity in hippocampus
Plot shows activity (with respect to baseline) in an event-related design during the feedback-learning task. Initially, as the task is performed, there is increased activity in the hippocampus and
decreased activity in the caudate. With further training, the caudate activity increases and the
hippocampus activity declines. This suggests there may be a competition between these two
memory systems in the brain.
Generalization properties of classifiers
Figure panels: study items; test items (prototype, low distortion, high distortion, random).
Plot: percent correct for control and amnesic subjects.
Knowlton and Squire (1993) The learning of categories: parallel
brain systems for item memory and category knowledge.
Science 262:1747.
40 examples were generated from a prototype and studied.
Subjects were instructed that all examples belonged to the
same category. Five minutes later, performance was measured
on 84 new examples generated from the same prototype.
Subjects were asked “does this belong to the same category?”
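A generalization function for a linear classifier: system identification
$P(y = 1 \mid \mathbf{x}, \mathbf{w}) = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{g}(\mathbf{x}))}$
$q^{(n)} = P(y = 1 \mid \mathbf{x}^{(n)}, \mathbf{w}^{(n)})$
“Odds”:
$o^{(n)}(\mathbf{x}) = \dfrac{P(y = 1 \mid \mathbf{x}, \mathbf{w}^{(n)})}{P(y = 0 \mid \mathbf{x}, \mathbf{w}^{(n)})} = \dfrac{P(y = 1 \mid \mathbf{x}, \mathbf{w}^{(n)})}{1 - P(y = 1 \mid \mathbf{x}, \mathbf{w}^{(n)})} = \dfrac{1}{\exp(-\mathbf{w}^{(n)T} \mathbf{g}(\mathbf{x}))} = \exp(\mathbf{w}^{(n)T} \mathbf{g}(\mathbf{x}))$
Error experienced in trial n:
$\tilde{y}^{(n)} = y^{(n)} - q^{(n)}$
$\mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} + \tilde{y}^{(n)} \dfrac{\mathbf{g}(\mathbf{x}^{(n)})}{q^{(n)} (1 - q^{(n)}) \, \mathbf{g}(\mathbf{x}^{(n)})^T \mathbf{g}(\mathbf{x}^{(n)})}$
Generalization function:
$b(\mathbf{x}, \mathbf{x}^{(n)}) = \dfrac{\mathbf{g}(\mathbf{x}^{(n)})^T \mathbf{g}(\mathbf{x})}{\mathbf{g}(\mathbf{x}^{(n)})^T \mathbf{g}(\mathbf{x}^{(n)})}$
$o^{(n+1)}(\mathbf{x}) = \exp(\mathbf{w}^{(n)T} \mathbf{g}(\mathbf{x})) \exp\left( \dfrac{\tilde{y}^{(n)}}{q^{(n)} (1 - q^{(n)})} \, b(\mathbf{x}, \mathbf{x}^{(n)}) \right) = o^{(n)}(\mathbf{x}) \exp\left( \dfrac{\tilde{y}^{(n)}}{q^{(n)} (1 - q^{(n)})} \, b(\mathbf{x}, \mathbf{x}^{(n)}) \right)$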
A generalization function for a linear classifier: system identification
“State” of the learner: log of the odds
$z^{(n)}(\mathbf{x}) = \log o^{(n)}(\mathbf{x}) = \log \dfrac{P(y = 1 \mid \mathbf{x}, \mathbf{w}^{(n)})}{P(y = 0 \mid \mathbf{x}, \mathbf{w}^{(n)})}$
$\log o^{(n+1)}(\mathbf{x}) = \log o^{(n)}(\mathbf{x}) + \dfrac{\tilde{y}^{(n)}}{q^{(n)} (1 - q^{(n)})} \, b(\mathbf{x}, \mathbf{x}^{(n)})$
State transition equation:
$z^{(n+1)}(\mathbf{x}) = z^{(n)}(\mathbf{x}) + \dfrac{\tilde{y}^{(n)}}{q^{(n)} (1 - q^{(n)})} \, b(\mathbf{x}, \mathbf{x}^{(n)})$
Here $\tilde{y}^{(n)}$ is the error in trial $n$, $b$ is the generalization function, and $\mathbf{x}^{(n)}$ is the input where the error was experienced.
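The state transition equation can be checked against the weight update directly. The sketch below is an illustration only: the quadratic encoding `g`, the starting weights, and the training and test inputs are hypothetical choices made here, not values from the lecture.

```python
import numpy as np


def g(x):
    """Hypothetical non-linear encoding of a scalar input."""
    return np.array([1.0, x, x ** 2])


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def generalization(x, x_n):
    """b(x, x_n): how much an error at x_n changes the log odds at x."""
    return (g(x_n) @ g(x)) / (g(x_n) @ g(x_n))


def update(w, x_n, y_n):
    """Weight update after observing label y_n at input x_n."""
    q = sigmoid(w @ g(x_n))
    y_tilde = y_n - q  # error experienced in this trial
    w_new = w + y_tilde * g(x_n) / (q * (1.0 - q) * (g(x_n) @ g(x_n)))
    return w_new, y_tilde, q


# Hypothetical learner state and trial.
w = np.array([0.1, -0.3, 0.2])
x_n, y_n = 0.5, 1   # input where the error is experienced
x_test = 1.5        # a different input, probed for generalization

w_new, y_tilde, q = update(w, x_n, y_n)

# Log odds at x_test predicted by the state transition equation...
z_predicted = w @ g(x_test) + y_tilde / (q * (1.0 - q)) * generalization(x_test, x_n)
# ...and computed from the updated weights.
z_actual = w_new @ g(x_test)
```

The two values coincide, and `generalization(x_n, x_n)` equals 1, so the full normalized error is applied at the trained input while other inputs receive a share set by the overlap of their encodings.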
Figure labels: early in training; after 300 trials; mean ± SD; catch trial.
Shadmehr, Brandt & Corkin, J Neurophysiol 1998
Smith and Shadmehr (2005) Intact ability to learn internal models of arm dynamics
in Huntington’s disease but not cerebellar degeneration. J. Neurophysiology
Figure panels: cerebellar patients; Huntington’s disease patients. Horizontal axis: training set (bin = 100 trials).