Bridging LSTM Architecture and the Neural Dynamics during Reading

Peng Qian, Xipeng Qiu, Xuanjing Huang
School of Computer Science, Fudan University
Motivation
• Recently, the long short-term memory neural
network (LSTM) has attracted wide interest due to
its success in many tasks.
• The LSTM architecture consists of a memory cell and
three gates, a structure that looks similar to neuronal
networks in the brain.
• However, there is still little evidence for the cognitive
plausibility of the LSTM architecture and its working
mechanism.
Methodology
Understanding an artificial neural network model via a
brain-mapping paradigm
We need:
• An appropriate cognitive task → story reading
• An interesting model → an LSTM language model
• Brain imaging data (Wehbe et al., 2014)
• 8 subjects read the ninth chapter of Harry Potter
and the Philosopher's Stone.
• The words are presented at the center of the screen
one by one, each staying for 0.5 seconds.
• Each section started with a fixation period. The total
length of the four sections was about 45 minutes
(about 5180 words).
• fMRI data is acquired every 2 seconds (see the timing
sketch below).
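As a purely illustrative sketch (the grouping scheme below is an assumption, not necessarily the authors' exact alignment procedure), the 0.5-second word presentation and the 2-second scan interval mean that roughly four words are shown per fMRI scan:

```python
# Illustration only: group word indices by fMRI scan, assuming 0.5 s per word
# and a 2 s repetition time, i.e. about four words per acquired scan.
# (Fixation periods between sections are ignored in this toy example.)
WORD_DURATION = 0.5   # seconds each word stays on screen
TR = 2.0              # seconds between fMRI acquisitions

def words_per_scan(n_words, n_scans):
    """Return, for each scan, the indices of the words shown during that scan."""
    onsets = [i * WORD_DURATION for i in range(n_words)]
    return [[i for i, t in enumerate(onsets) if scan * TR <= t < (scan + 1) * TR]
            for scan in range(n_scans)]
```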
Artificial representation – Language model
• Word embedding & hidden state: 50 dimensions.
• The other parameters are initialized by randomly
sampling from a uniform distribution over [-0.1, 0.1].
• The remaining chapters of the book are used as the
training data.
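As a rough PyTorch sketch of the model described above (the class name and all training details are assumptions, not the authors' implementation):

```python
# A minimal LSTM language model: 50-dim word embeddings and hidden state,
# all parameters initialized uniformly in [-0.1, 0.1], trained to predict
# the next word on the remaining chapters of the book.
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)
        # Initialize every parameter from U(-0.1, 0.1), as stated on the slide.
        for p in self.parameters():
            nn.init.uniform_(p, -0.1, 0.1)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) word indices
        emb = self.embed(tokens)
        h, state = self.lstm(emb, state)   # h: hidden states h_t over the sequence
        logits = self.out(h)               # next-word scores
        return logits, state               # state = (h_T, c_T)
```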
For a time step t and a certain context window size,
generate a representation a_t (sketched below).
[Example input sequence: <BOS> he pulled his broomstick]
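A minimal sketch of this step (an assumption built on the toy model above, not the authors' code): the model is re-run on only the last few words, so a small window deliberately cuts off long-term context.

```python
# Produce the internal representation a_t for time step t under a limited
# context window; a_t can be the memory vector c_t or the hidden state h_t.
import torch

def representation_at(model, word_ids, t, window):
    """Return (c_t, h_t) after the model reads word_ids[t - window + 1 : t + 1]."""
    start = max(0, t - window + 1)
    context = torch.tensor(word_ids[start:t + 1]).unsqueeze(0)  # shape (1, <=window)
    _, (h_t, c_t) = model(context)        # final hidden and memory states
    return c_t.squeeze(), h_t.squeeze()   # memory vector c_t, hidden state h_t
```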
Alignment
y_t: the brain activity at the t-th time step of
the story reading process.
a_t: the activations of the internal neurons
in the LSTM; a_t may be the memory vector c_t
or the hidden state vector h_t.
y_t = M a_t
M is the linear mapping matrix
between a_t and y_t, which is learnt by
least squares.
Evaluation: average cosine distance
between the predicted and the true
brain activity & 20-fold cross-validation
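A minimal sketch of the alignment and evaluation described above (function names and the fold splitting are assumptions): fit M by least squares on the training folds, then score the held-out fold by the average cosine distance between predicted and true brain activity.

```python
import numpy as np
from numpy.linalg import lstsq

def fit_linear_map(A_train, Y_train):
    """Solve Y ≈ A @ M.T by least squares; A: (n, d_model), Y: (n, n_voxels)."""
    M_T, *_ = lstsq(A_train, Y_train, rcond=None)
    return M_T.T                                   # M: (n_voxels, d_model)

def cosine_distance(Y_pred, Y_true):
    num = (Y_pred * Y_true).sum(axis=1)
    den = np.linalg.norm(Y_pred, axis=1) * np.linalg.norm(Y_true, axis=1)
    return 1.0 - num / den

def cross_validate(A, Y, k=20):
    """20-fold cross-validation over time steps; lower distance is better."""
    folds = np.array_split(np.arange(len(A)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(A)), test_idx)
        M = fit_linear_map(A[train_idx], Y[train_idx])
        Y_hat = A[test_idx] @ M.T
        scores.append(cosine_distance(Y_hat, Y[test_idx]).mean())
    return float(np.mean(scores))
```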
Comparison of memory and hidden layer
Gate mechanism
• ‘Remove’ a certain gate by setting it to a constant
all-one vector (sketched below).
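A rough sketch of this ablation on a single LSTM step (the function and the stacked weight layout are assumptions; the authors' implementation may differ): the chosen gate is replaced with an all-one vector so it no longer filters anything.

```python
import torch

def lstm_step_ablated(x, h_prev, c_prev, W, U, b, removed_gate=None):
    """One LSTM step; W, U, b hold the stacked [input|forget|cell|output] weights."""
    gates = x @ W.T + h_prev @ U.T + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    ones = torch.ones_like(i)
    if removed_gate == "input":       # 'remove' the input gate
        i = ones
    elif removed_gate == "forget":    # 'remove' the forget gate
        f = ones
    elif removed_gate == "output":    # 'remove' the output gate
        o = ones
    c = f * c_prev + i * g            # memory vector c_t
    h = o * torch.tanh(c)             # hidden state h_t
    return h, c
```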
Model comparison
• Neither the simple RNN hidden layer nor the LSTM
hidden layer predicts the brain activity well.
Visualization
[Figure: raw brain activity vs. activity predicted by the
memory vector and by the hidden layer]
Visualization
• Compute the voxel-wise
Pearson correlation of the
predicted and the real
brain activities for each
subject.
• Average the correlation
for each anatomical
region defined by the AAL
(Automated Anatomical
Labeling) atlas (see the
sketch below).
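A small sketch of this procedure (function names and the region-label representation are assumptions): compute the Pearson correlation per voxel, then average it within each atlas region.

```python
import numpy as np

def voxelwise_pearson(Y_pred, Y_true):
    """Y_pred, Y_true: (n_timepoints, n_voxels) -> Pearson correlation per voxel."""
    yp = Y_pred - Y_pred.mean(axis=0)
    yt = Y_true - Y_true.mean(axis=0)
    num = (yp * yt).sum(axis=0)
    den = np.sqrt((yp ** 2).sum(axis=0) * (yt ** 2).sum(axis=0))
    return num / den

def average_by_region(voxel_corr, region_labels):
    """region_labels: (n_voxels,) integer AAL region id for each voxel."""
    return {region: voxel_corr[region_labels == region].mean()
            for region in np.unique(region_labels)}
```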
Conclusion
• LSTM is able to encode the semantics of a story in
its memory vector. Compared with the simple RNN,
the overall architecture of LSTM appears more
cognitively plausible.
• The gating mechanisms, except for the forget gate,
are effective in helping LSTM filter the valuable information.
• Long-term memory is well kept by LSTM: when we
deliberately cut off the source of long-term memory
(by using a small context window size), the
prediction accuracy decreases greatly.
Thanks for Listening!
All comments are welcome