Learning linguistic structure with simple recurrent networks


From Sequential Structure to Semantic Interpretation: More Connectionist Research on Language Processing
PDP Class Lecture
February 14, 2011
The Simple Recurrent Network
• Network is trained on a stream of elements with sequential structure.
• At step n, the target for the output is the next element.
• The pattern on the hidden units is copied back to the context units.
• After learning, the network comes to retain information about preceding elements of the string, allowing expectations to be conditioned by an indefinite window of prior context.
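A minimal sketch of this processing loop, assuming a plain Elman-style network with one-hot inputs, a sigmoid hidden layer, and a softmax prediction layer; the class name, layer sizes, and step interface are illustrative rather than taken from the lecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SimpleRecurrentNetwork:
    """Elman-style SRN: the hidden pattern is copied to the context units each step."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))        # input -> hidden
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))   # context -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))      # hidden -> output
        self.context = np.zeros(n_hidden)                         # context units

    def step(self, x):
        # Hidden activation depends on the current element and the copied-back context.
        hidden = sigmoid(self.W_in @ x + self.W_ctx @ self.context)
        # Output is the network's prediction of the next element in the stream.
        prediction = softmax(self.W_out @ hidden)
        # Copy the hidden pattern back to the context units for the next step.
        self.context = hidden.copy()
        return prediction

# Forward pass over a toy one-hot stream; at step n the prediction would be
# compared with element n+1 and the error backpropagated (training omitted).
srn = SimpleRecurrentNetwork(n_in=5, n_hidden=8, n_out=5)
for word in np.eye(5)[[0, 2, 1, 3]]:
    p = srn.step(word)   # p[k] = estimated probability that element k comes next
```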
Learning about sentence structure from streams of words
Learned and imputed hidden-layer representations (average vectors over all contexts)
'Zog' representation derived by averaging vectors obtained by inserting the novel item in place of each occurrence of 'man'.
Elman (1991)
Prediction with an embedded clause
Components tracking constituents within clauses of different types.
Can we extend the approach to address comprehension?
Who did what to whom, etc.
Some factors in comprehension
• Sentence structure and constraints on events are both important:
– The boy chased the girl.
– The girl chased the boy.
– The car was parked by the attendant.
– The car was parked by the lamppost.
– We ate some food with some friends that we like.
– We found a painting in the gallery that was painted by Rembrandt.
– The horse raced past the barn...
– The horse dragged past the barn…
– The cart raced past the barn…
Alternative Approaches
• Parsing-based approaches:
– 'Syntax proposes, Semantics Disposes'
• Although data were collected that initially seemed to support this, further studies changed the picture (see next slide).
– Beam Search and Particle Filtering
• Keep several explicit alternatives active at a time; discard alternatives as they become implausible (see the sketch after this list).
• The PDP approach
– Use constituents of the sentence as they are encountered to construct a representation of the event described by the sentence directly.
– Keep a single distributed representation that implicitly represents a mixture of possibilities.
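To make the 'keep several explicit alternatives' idea concrete, here is a generic beam-search sketch; the extend function, which scores possible continuations of a partial analysis, is a hypothetical placeholder rather than a parser from any model discussed here:

```python
def beam_search(words, extend, beam_width=3):
    """Keep only the beam_width most plausible partial analyses after each word."""
    beam = [((), 0.0)]                      # (partial analysis, log score) pairs
    for word in words:
        candidates = []
        for analysis, score in beam:
            # `extend` is assumed to return (new_analysis, log_prob) continuations.
            for new_analysis, logp in extend(analysis, word):
                candidates.append((new_analysis, score + logp))
        # Discard alternatives as they become implausible: keep only the top few.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam
```

The PDP approach, by contrast, replaces this explicit list of discrete hypotheses with a single distributed pattern that implicitly blends the surviving possibilities.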
A Syntactic Parsing Principle: 'Minimal Attachment'
• The principle predicts that a prepositional phrase following a direct object will be treated as a constituent of the verb phrase of a sentence.
• This leads to the prediction that subjects will be slower to read the last word of (b) relative to (a) below:
(a) The spy shot the policeman with the revolver.
(b) The spy shot the policeman with the binoculars.
• Although this seemed to be true for the sentences used, the reverse is true for other sentences:
(a) The man read the article in the magazine.
(b) The man read the article in the bathtub.
What about the idea that everything depends on the verb?
–The spy saw the policeman with binoculars
–The spy saw the policeman with a revolver
–The bird saw the birdwatcher with binoculars
–The bird saw its prey with binoculars
–The children collected …
–The rain collected …
Additional Aspects of Sentence Comprehension
• Context helps us:
– Select the correct meaning of ambiguous words
• The boy hit the ball with the bat.
– Fill in missing information
• The boy spread the peanut butter on the bread.
– Shade and specify the ‘meaning’ of a particular word
• The container held the apples.
• The container held the coffee.
• The boy kissed someone under the mistletoe.
• The baby rolled the ball to her daddy.
• The slugger hit the ball out of the park.
• John loves Mary.
• John loves ice cream.
• The pope loves sinners.
• The {writer/student/goat} finished the book.
The Role of Situation
(Elman, 2009)
• The shopper saved…
• The lifeguard saved…
• There was a big sale at the swimshop. The lifeguard saved…
• … was skating … primes ‘arena’
• … had skated … does not
Do words have meanings, or are they clues to meaning?
• For a first approximation, the lexicon is the store of words in long-term memory from which the grammar constructs phrases and sentences.
• [A lexical entry] lists a small chunk of phonology, a small chunk of syntax, and a small chunk of semantics.
– Ray Jackendoff
• My approach suggests that comprehension, like perception, should be likened to Hebb's (1949) paleontologist, who uses his beliefs and knowledge about dinosaurs in conjunction with the clues provided by the bone fragments available to construct a full-fledged model of the original. In this case the words spoken and the actions taken by the speaker are likened to the clues of the paleontologist, and the dinosaur, to the meaning conveyed through these clues.
– David Rumelhart
The Sentence Gestalt Model
• Input consists of sequences of words.
• After each word, the net attempts to complete a set of role-filler pairs (can probe with role or filler).
• The sentence gestalt is used to constrain completion and serves as context for interpretation of the next constituent.
• Rohde (2002) extended this model to allow probes for fillers of roles with respect to particular head words (e.g. verbs) so the model could deal with embedded clauses.
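A minimal sketch of the cycle described above, assuming one-hot word and probe codings and a recurrent gestalt layer; the class, layer names, sizes, and initialization are illustrative, not details of the original model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SentenceGestalt:
    def __init__(self, n_word, n_gestalt, n_probe, n_filler, seed=0):
        rng = np.random.default_rng(seed)
        self.W_word = rng.normal(0.0, 0.1, (n_gestalt, n_word))      # word -> gestalt
        self.W_prev = rng.normal(0.0, 0.1, (n_gestalt, n_gestalt))   # previous gestalt -> gestalt
        self.W_probe = rng.normal(0.0, 0.1, (n_filler, n_probe))     # probe -> decoder
        self.W_gest = rng.normal(0.0, 0.1, (n_filler, n_gestalt))    # gestalt -> decoder
        self.gestalt = np.zeros(n_gestalt)

    def read_word(self, word_vec):
        # The gestalt updated after each word serves as context for the next constituent.
        self.gestalt = sigmoid(self.W_word @ word_vec + self.W_prev @ self.gestalt)

    def probe(self, probe_vec):
        # Query the current gestalt with a role (or filler) to retrieve its completion.
        return sigmoid(self.W_probe @ probe_vec + self.W_gest @ self.gestalt)
```

A training loop (not shown) would probe the gestalt after every word with each role of the described event and backpropagate the error in the predicted fillers, which is what forces the gestalt to carry information about the whole event.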
A probabilistic formulation of backpropagation
• Think of the activation of a unit as representing the network's estimate of the probability that the unit should be on in the given context.
• We can measure the degree to which the observed target values match their predicted values using a measure called 'Cross-Entropy':
CE_p = -Σ_i [ t_ip log(a_ip) + (1 - t_ip) log(1 - a_ip) ]
• If targets are actually probabilistic, minimizing CE_p maximizes the probability of the observed target values.
• The minimum value of the CE will occur when the activations match the target probabilities. (SSE also has the same minimum, but lacks the explicit probabilistic interpretation.)
• [CE has the practical advantage of eliminating the 'pinned output unit' problem.]
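A small numeric check of the claim that the minimum falls where activations match the target probabilities, written as a direct evaluation of the formula above (the example values are illustrative):

```python
import numpy as np

def cross_entropy(targets, activations, eps=1e-12):
    """CE_p = -sum_i [ t_ip log(a_ip) + (1 - t_ip) log(1 - a_ip) ]"""
    a = np.clip(activations, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(targets * np.log(a) + (1.0 - targets) * np.log(1.0 - a))

# If a unit's target is 1 with probability 0.7 (and 0 otherwise), the expected
# cross-entropy is smallest when the unit's activation equals 0.7.
activations = np.linspace(0.05, 0.95, 19)
expected_ce = [0.7 * cross_entropy(np.array([1.0]), np.array([a])) +
               0.3 * cross_entropy(np.array([0.0]), np.array([a]))
               for a in activations]
print(activations[int(np.argmin(expected_ce))])   # -> approximately 0.7
```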
Sentences can be active or passive; constituents can be vaguely identified or may be left out if strongly implied.
Changing interpretations of role fillers as a sentence unfolds
St. John's (1992) Story Gestalt Model
• Learns from stereotyped multi-proposition stories with slots and fillers.
• Can answer specific questions, fill in missing propositions based on typical properties of scripts, etc.
Limitations
• Can only deal with 'one level' event-structures
– Cannot handle embeddings or modifiers of constituents, as in
– 'The policeman saw that the young girl was bitten by the mean dog.'
• Two follow-on approaches
– Use fuller probes for completions of embedded propositions (Bryant and Miikkulainen, 2001)
– Use a recursively constructed compressed representation of the semantics of the sentence (Rohde, 2002).
Compressed Decodable Representation of a Head-Relation-Filler Triple
Hierarchical Compressed Representation of a Moderately Complex Sentence
Rohde's (2002) Model
• Used a common representation constrained by three-role propositions and sentences.
• Did prediction and production as well as comprehension.
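The compressed, decodable head-relation-filler representation shown on the preceding slides can be illustrated, very loosely, with a RAAM-like encoder/decoder over fixed-width vectors; this is a sketch of the general idea, not Rohde's actual architecture or training procedure:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TripleCompressor:
    """Compress a (head, relation, filler) triple into one fixed-width vector
    that can be decoded back into its parts; because a filler slot can itself
    hold a compressed triple, the representation composes hierarchically."""

    def __init__(self, width, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, (width, 3 * width))   # triple -> code
        self.W_dec = rng.normal(0.0, 0.1, (3 * width, width))   # code -> triple

    def encode(self, head, relation, filler):
        return sigmoid(self.W_enc @ np.concatenate([head, relation, filler]))

    def decode(self, code):
        out = sigmoid(self.W_dec @ code)
        w = len(code)
        return out[:w], out[w:2 * w], out[2 * w:]

# An embedded clause is handled by encoding its triple first and using the
# resulting code as the filler of a higher-level triple.
enc = TripleCompressor(width=16)
rng = np.random.default_rng(1)
inner = enc.encode(*rng.random((3, 16)))                    # e.g. the embedded clause
outer = enc.encode(rng.random(16), rng.random(16), inner)   # clause code used as filler
```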
One complaint remains
• The model presupposes propositional representations of events … that does not seem right.
• Can we get rid of the stipulation of structure and query the Gestalt with an English question?
• Can we create a target for learning based on an actual scene representation rather than a propositional representation?
Schematic of a Future Model