
• Construction of phrases and sentences from morphemes
and words. Usually the word syntax refers to the way
words are arranged together.
• Syntactic structure and rules that determine syntactic
• There are various models for computationally
modeling syntactic structure. Most of them are based on
Context-Free Grammar, a formalism powerful enough to
model many phenomena occurring in natural language, and
yet computationally tractable.
Three important notions related to syntax:
– Constituency refers to groups of words behaving as a single
unit, called a constituent.
– Grammatical relations refer to notions about the role of words in
a sentence and the relations between such roles. E.g. notions about
the subject and the object of a sentence.
– Subcategorization refers to the relations between words and
phrases and the syntactic preferences of words. E.g. the verb
want can be followed by an infinitive, but the verb find cannot.
I want to fly to Detroit
* I found to fly to Detroit
• How do words group together?
– Noun phrases:
three parties from Brooklyn
a high-class spot such as Mindy’s
the reason he comes into the Hot Box
• Certain linguistic evidence leads us to believe that these
words group together (form a constituent).
• Words belonging to the same group appear in similar syntactic
environments. E.g. noun phrases can be followed by a verb.
three parties from Brooklyn arrive...
a high-class spot such as Mindy’s attracts...
* from arrive...
* as attracts...
• Often such structures cannot be broken inside a sentence. E.g.
On September 17th, I’d like to fly from Atlanta to Denver.
I’d like to fly on September 17th from Atlanta to Denver.
* On September, I’d like to fly 17th from Atlanta to Denver.
Context-Free Grammars
• Context-free grammars (CFGs), also called phrase-structure
grammars, are a formalism for modeling constituent
structure.
• A CFG consists of a set of rules (or productions), each of
which expresses the ways that symbols of a language can
be grouped and ordered together, and a lexicon of words
and symbols.
• The symbols that correspond to words in the language are
called terminal symbols, while the symbols that express
generalizations of these are called non-terminal symbols.
• E.g.
NP -> Det Nominal (1)
NP -> ProperNoun (2)
Nominal -> Noun | Noun Nominal (3)
Det -> a (4)
Det -> the (5)
Noun -> flight (6)
• Terminals: a, the, flight
• Non-terminals: NP, Det, Nominal, ProperNoun, Noun
• A CFG is both a device for generating sentences and a device for assigning
structure to a given sentence. An arrow -> can be thought of as meaning
“rewrite the symbol on the left with the string of symbols on the right”.
A sequence of such rewrites is called a derivation, e.g.
NP (1)-> Det Nominal (3)-> Det Noun (4),(6)-> a flight
• We say that a flight can be derived from the symbol NP. This can
also be represented as a tree.
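The rewriting process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the helper name `leftmost_derivation` are hypothetical, and the sketch always rewrites the leftmost non-terminal, so the intermediate steps are ordered slightly differently from the derivation shown above while ending in the same string.

```python
# Toy grammar from the example above. Each non-terminal maps to its
# alternative right-hand sides. ProperNoun has no lexical rule on the slide.
GRAMMAR = {
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "Det": [["a"], ["the"]],
    "Noun": [["flight"]],
}
NONTERMINALS = {"NP", "Det", "Nominal", "ProperNoun", "Noun"}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    Each entry is the index of the right-hand side to apply.
    Returns the list of symbol strings produced along the way.
    """
    current = [start]
    steps = [list(current)]
    for choice in choices:
        # Locate the leftmost symbol that can still be rewritten.
        pos = next(i for i, sym in enumerate(current) if sym in NONTERMINALS)
        rhs = GRAMMAR[current[pos]][choice]
        current = current[:pos] + rhs + current[pos + 1:]
        steps.append(list(current))
    return steps


steps = leftmost_derivation("NP", [0, 0, 0, 0])
for step in steps:
    print(" ".join(step))
```

The last line printed is the derived string; the earlier lines are the intermediate sentential forms.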
Simple English Grammar Example
Example Lexicon
Example Parse Tree
[S[NP [PRO I] ] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
Context-Free Grammars
• A CFG defines a formal language. All sentences that can
be derived by the CFG, starting from a designated non-terminal
symbol (the start symbol), belong to the language and are
called grammatical. Sentences that cannot be derived are
called ungrammatical.
• The problem of mapping from a string of words to its parse
tree is called parsing.
[NP What flights] leave in the morning.
[NP What flight] leaves in the morning.
* [NP What flight] leave in the morning.
How can a CFG handle this agreement phenomenon? One
solution is to expand the grammar with multiple sets of rules, one set
for each case. E.g.
S -> Aux NP VP
is broken into
S -> 3sgAux 3sgNP VP
S -> Non3sgAux Non3sgNP VP
3sgAux -> does | has | can | ...
Non3sgAux -> do | have | can | ...
In a similar way, NP must be broken into 3sgNP and Non3sgNP.
• This method for dealing with agreement doubles the size of
the grammar. In many other languages the problem is far
more complicated. E.g. in Greek there is gender agreement
and case agreement between.
• A more elegant way to deal with the problem of agreement
is through Unification Grammars that allow the
parameterization of non-terminal symbols of the grammar
with feature structures.
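The idea of parameterizing symbols with features can be sketched as follows. This is a minimal illustration, not a full unification grammar: the feature structures are flat dictionaries, and the names `unify`, `agrees`, the `agr` feature, and the lexicon entries are all assumptions made for the example.

```python
def unify(f1, f2):
    """Unify two flat feature structures; return None if they clash."""
    result = dict(f1)
    for feature, value in f2.items():
        if feature in result and result[feature] != value:
            return None  # conflicting values, so unification fails
        result[feature] = value
    return result


# Hypothetical lexicon: each word carries a category and an agreement feature.
LEXICON = {
    "flight": {"cat": "Noun", "agr": "3sg"},
    "flights": {"cat": "Noun", "agr": "non3sg"},
    "leaves": {"cat": "Verb", "agr": "3sg"},
    "leave": {"cat": "Verb", "agr": "non3sg"},
}


def agrees(subject_head, verb):
    """Check subject-verb agreement by unifying the two agr features."""
    subject_agr = {"agr": LEXICON[subject_head]["agr"]}
    verb_agr = {"agr": LEXICON[verb]["agr"]}
    return unify(subject_agr, verb_agr) is not None


print(agrees("flights", "leave"))
print(agrees("flight", "leave"))
```

With features, a single rule such as S -> NP VP can carry the constraint that the two agr values unify, instead of being split into one rule per agreement class.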
Verb Phrase and Subcategorization
• A verb phrase consists of a verb and a number of other constituents.
VP -> Verb
VP -> Verb NP
prefer a morning flight
VP -> Verb NP PP
leave Boston in the morning
• Or a verb may be followed by more complicated complements:
You [VP [V said] [S you had a 266 dollar fare]]
[VP [V Tell] [NP me] [S how to get from the airport to downtown]]
I [VP [V want] [VP to arrange three flights]]
• But not every verb is compatible with every possible complement.
I want to fly to Detroit
* I found to fly to Detroit
• We say that verbs subcategorize for different
complements. Traditional grammars distinguish between
transitive and intransitive verbs. Modern grammars
distinguish up to 100 different categories. The possible sets
of complements are called subcategorization frames.
Frame          Verbs           Example
(none)         eat, sleep      I want to eat
NP             prefer, find    Find the flight to Boston
NP NP          show, give      Show me the flights to Boston
PPfrom PPto    fly, travel     I would like to fly from Boston to New York
VPto           prefer, want    I would like to go to Boston
S              mean            Does this mean AA has a hub in Boston
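Subcategorization frames can be represented as a lexicon that maps each verb to the complement sequences it accepts. The sketch below is an assumption-laden illustration: the frame labels (NP, VPto, PPfrom, PPto, S), the dictionary name `SUBCAT`, and the function `licenses` are chosen for the example, and the verb list covers only the verbs mentioned above.

```python
# Each verb maps to the set of complement sequences (frames) it allows.
# An empty tuple means the verb can appear with no complement.
SUBCAT = {
    "eat": {()},
    "sleep": {()},
    "find": {("NP",)},
    "prefer": {("NP",), ("VPto",)},
    "show": {("NP", "NP")},
    "give": {("NP", "NP")},
    "fly": {("PPfrom", "PPto")},
    "travel": {("PPfrom", "PPto")},
    "want": {("VPto",)},
    "mean": {("S",)},
}


def licenses(verb, complements):
    """Return True if `verb` subcategorizes for this complement sequence."""
    return tuple(complements) in SUBCAT.get(verb, set())


print(licenses("want", ["VPto"]))   # I want to fly to Detroit
print(licenses("find", ["VPto"]))   # * I found to fly to Detroit
```

A parser could consult such a table before attaching complements to a verb, which rules out the starred example without adding a separate VP rule per verb.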
Spoken Language Syntax
• There are several differences between spoken and written language
syntax. Usually in spoken language the term utterance is
used instead of the term sentence.
• In speech we deal with:
– Pauses instead of punctuation.
– Non-verbal events: [uh], [mm], [clear throat].
– Disfluencies.
Finite State vs. Context Free Grammars
• Why do we need to resort to CFG to model constituency in
syntax? Are the finite-state models we used for
morphology inadequate?
• The problem is recursion.
• Generally, it is not possible to fully model syntax using
FSAs, but it is often possible to approximate the behavior
of CFGs with FSAs (e.g. by restricting the depth of the
recursion).
Grammars and Human Processing
• Do people actually use CFGs in their mental processing of
language? We are not certain.
• Early studies showed that when people heard an
interruption (e.g. a click) in the middle of a constituent,
they often misinterpreted it as occurring at a constituent
boundary. But this might have been because the constituent
also formed a semantic unit.
• Other studies showed that when humans were presented with a certain
constituent structure, e.g.
IBM moved [NP a bigger computer] [PP to the Sears store]
they were more likely to use a similar structure afterwards, such as:
The wealthy widow gave [NP her Mercedes] [PP to the church]
instead of:
The wealthy widow gave [NP the church][NP her Mercedes]
• Some researchers claim that natural language syntax can be described by
formal languages and is separate from semantic or pragmatic
information (modularist position).
• Others claim that it is impossible to model syntactic knowledge without
including additional knowledge (e.g. semantic, intonational, pragmatic,
• Syntactic Parsing is the task of recognizing an input
sentence and assigning some syntactic structure to it. CFGs
are just a declarative formalism. In order to compute how a
parse tree will be assigned to a sentence we require a
parsing algorithm.
• Applications of parse trees: Word processing (grammar
checkers), semantic analysis, machine translation, question
answering, information extraction, speech recognition, ...
Parsing as Search
• Syntactic parsing can be seen as a search through all
possible parse trees to find the correct parse for the
sentence. The search space is defined by the grammar.
The correct parse tree for the sentence:
Book that flight
• The goal of the search is to find all trees whose root is the
start symbol S, and which cover exactly all the words in
the input. There are two kinds of constraints: one that
comes from the data and one that comes from the grammar.
• When the search is based on the grammar constraints, we
have a top-down or goal-directed search.
• When the search is based on the data constraints, we have
a bottom-up or data-directed search.
Top-Down Parsing
• A top-down parser tries to build a parse tree by building
from the root node S down to the leaves.
Bottom-Up Parsing
• A bottom-up parser starts with the input and tries to build a
tree rooted in the start symbol S, which covers all the input.
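A bottom-up search can be sketched as repeated reduction: start from the part-of-speech tags of the input words and keep replacing any substring that matches the right-hand side of a rule with its left-hand side, until only S remains. The grammar fragment, lexicon, and function name below are assumptions for illustration, and the sketch only recognizes (it answers yes or no rather than returning trees).

```python
from itertools import product

# Hypothetical grammar fragment as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
]
LEXICON = {"book": ["Noun", "Verb"], "that": ["Det"], "flight": ["Noun"]}


def bottom_up_recognize(words, start="S"):
    """Return True if the words can be reduced to the start symbol."""
    # Begin with every possible part-of-speech assignment for the input.
    agenda = [tuple(tags)
              for tags in product(*(LEXICON.get(w, []) for w in words))]
    seen = set(agenda)
    while agenda:
        symbols = agenda.pop()
        if symbols == (start,):
            return True
        # Reduce every substring that matches some rule's right-hand side.
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if symbols[i:i + n] == rhs:
                    reduced = symbols[:i] + (lhs,) + symbols[i + n:]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False


print(bottom_up_recognize(["book", "that", "flight"]))
```

The `seen` set keeps the search from revisiting the same symbol string, but many of the strings explored this way can never lead to an S, which is the weakness of a purely data-directed search.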
Top-Down vs. Bottom-Up Parsing
• Top-down does not waste time exploring trees that cannot result in an
S, or subtrees that cannot exist in an S-rooted tree. Bottom-up
generates a large number of trees that have no chance of ever leading to
an S.
• But top-down also wastes considerable time on examining S trees that
are not consistent with the input, since it starts generating trees without
examining the input. Bottom-up parsers never suggest trees that are not
(at least locally) consistent with the input.
• Each approach fails to take advantage of all the constraints of the
problem. The best results are given by parsers that incorporate features
from both top-down and bottom-up parsers.
A Basic Top-Down Parser
• When building a parser we make decisions about the search. Such
decisions affect the search strategy, the choice of which node of the
tree to expand and the order in which the grammar rules are to be
applied. We can build a simple top-down parser based on a depth-first
search strategy, by expanding the left-most node and by applying
grammar rules based on the order in which they appear in the
grammar.
• Such an algorithm maintains an agenda of search states. Each state
consists of a partially parsed tree along with a pointer to the next input
word in the sentence. The search is performed by taking a state from
the agenda and producing a new set of states by applying the possible
grammar rules.
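The agenda-based search described above can be sketched as follows. This is a simplified recognizer under stated assumptions: each state keeps only the symbols still to be matched plus the input position rather than a full partial tree, and the grammar fragment, lexicon, and function name are hypothetical. The Nominal rule is written right-recursively so that the depth-first search terminates.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "ProperNoun": {"Houston", "TWA"},
}


def top_down_recognize(words, start="S"):
    """Depth-first, left-to-right top-down recognition with an agenda."""
    # A state is (symbols still to be matched, index of the next input word).
    agenda = [((start,), 0)]
    while agenda:
        symbols, pos = agenda.pop()  # last in, first out gives depth-first
        if not symbols:
            if pos == len(words):
                return True  # every symbol matched and all input consumed
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Push in reverse so rules are tried in grammar order.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and words[pos] in LEXICON.get(first, ()):
            agenda.append((rest, pos + 1))  # part of speech matches the word
    return False


print(top_down_recognize(["book", "that", "flight"]))
```

For this input the search first tries S -> NP VP and S -> Aux NP VP, which fail on the first word, before succeeding with S -> VP.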
Bottom-Up Filtering
• The top-down parser expands nodes along the left edge of the tree until it gets to the
bottom-left of the tree. If the parse is to be successful, the current input word
must be the first word in the derivation from the node that the parser is
currently processing. This leads to the idea of bottom-up filtering.
• The parser should not consider a grammar rule if the current input
word cannot serve as the first word along the left edge of some
derivation of the rule. E.g.
S -> NP VP
S -> Aux NP VP
S -> VP
If the input word is Does (Aux), the only rule whose left edge can begin with an Aux is
the rule S -> Aux NP VP. Therefore the parser doesn’t need to examine
the other two rules.
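One way to implement this filter is to precompute, for every non-terminal, the parts of speech that can appear at the left edge of its derivations. The sketch below is an illustration under assumptions: the grammar fragment is hypothetical, it has no empty rules, and the names `left_corners` and `viable_rules` are invented for the example.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corners(grammar):
    """Map each non-terminal to the parts of speech that can start it."""
    table = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows any further
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                first = rhs[0]
                new = table[first] if first in grammar else {first}
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table


def viable_rules(nt, category, grammar, table):
    """Keep only the rules for `nt` whose left edge can be `category`."""
    keep = []
    for rhs in grammar[nt]:
        first = rhs[0]
        starts = table[first] if first in grammar else {first}
        if category in starts:
            keep.append(rhs)
    return keep


TABLE = left_corners(GRAMMAR)
print(viable_rules("S", "Aux", GRAMMAR, TABLE))
```

With the current word tagged Aux, only S -> Aux NP VP survives the filter, matching the example above.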
• Depth-first search often leads to infinite loops when
exploring infinite spaces. This occurs in top-down, depth-first
parsing when the grammar is left-recursive. A
grammar is left-recursive if it contains a non-terminal
symbol that has a derivation that includes itself anywhere
along its leftmost branch, e.g.
NP -> Det Nominal
Det -> NP ’s
• Immediately left-recursive rules are rules of the form A -> A b, e.g.
S -> S and S
• One remedy is to rewrite the grammar, eliminating left recursion. This is
theoretically possible, but the new grammar may not be
intuitive or natural in describing syntactic structures.
• Another remedy is to restrict the depth of the search.
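The grammar rewrite can be sketched for the immediate case: rules of the form A -> A b | c are replaced by A -> c A' and A' -> b A' | (empty). The function name and the base rule S -> NP VP used in the example are assumptions, and the sketch does not handle indirect left recursion such as the NP/Det example above.

```python
def eliminate_immediate_left_recursion(nt, rules):
    """Rewrite A -> A b | c as A -> c A', A' -> b A' | (empty).

    `rules` is the list of right-hand sides for the non-terminal `nt`.
    An empty list stands for the empty right-hand side.
    """
    # Tails of the left-recursive rules (the "b" parts).
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == nt]
    # Rules that do not start with the non-terminal itself (the "c" parts).
    others = [rhs for rhs in rules if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: rules}
    new_nt = nt + "'"
    return {
        nt: [rhs + [new_nt] for rhs in others],
        new_nt: [tail + [new_nt] for tail in recursive] + [[]],
    }


# S -> S and S is from the text; S -> NP VP is an assumed base rule.
rewritten = eliminate_immediate_left_recursion(
    "S", [["S", "and", "S"], ["NP", "VP"]]
)
for lhs, alternatives in rewritten.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
```

The rewritten grammar accepts the same strings, but the new symbol S' has no linguistic interpretation, which is the loss of naturalness mentioned above.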
• Structural ambiguity occurs when a grammar assigns more than one
possible parse tree to a sentence. There are various types of
structural ambiguity.
• Attachment ambiguity is when a particular constituent can be
attached to the parse tree in more than one way. E.g.
– I shot an elephant in my pajamas.
– We saw the Eiffel Tower flying to Paris.
• Coordination ambiguity is when there are different sets of phrases
that can be joined by a conjunction such as and.
– [old [men and women]] or [old men] and [women]
• Noun phrase bracketing ambiguity.
– [Dead [poets’ society]] or [[Dead poets’] society]
• Choosing the correct parse of a sentence among the
possible parses is a task that requires additional semantic
and statistical information. A parser without such
information should return all possible parses.
• However, a sentence may often lead to a huge number of
parses. Sentences with many PP attachments like
Show me the meal on Flight UA 386 from San Francisco to Denver.
lead to an exponential number of parses.
469 1430 4867
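The growth in the number of parses can be illustrated with a small counting sketch. It assumes a hypothetical grammar fragment, NP -> NP PP | Noun and PP -> Prep NP, and counts how many distinct trees cover a noun followed by a chain of prepositional phrases. The function name and tag names are invented for the example, and the counts apply to this toy grammar only.

```python
from functools import lru_cache


def count_np_parses(tags):
    """Count parses of `tags` as an NP under
    NP -> NP PP | Noun and PP -> Prep NP (an assumed toy grammar)."""
    tags = tuple(tags)

    @lru_cache(maxsize=None)
    def np(i, j):
        # NP -> Noun covers a single Noun tag.
        total = 1 if j - i == 1 and tags[i] == "Noun" else 0
        # NP -> NP PP: try every split point inside the span.
        for k in range(i + 1, j):
            total += np(i, k) * pp(k, j)
        return total

    @lru_cache(maxsize=None)
    def pp(i, j):
        # PP -> Prep NP
        if j - i >= 2 and tags[i] == "Prep":
            return np(i + 1, j)
        return 0

    return np(0, len(tags))


counts = []
for n_pps in range(1, 9):
    tags = ["Noun"] + ["Prep", "Noun"] * n_pps
    counts.append(count_np_parses(tags))
    print(n_pps, counts[-1])
```

Each additional PP can attach to any earlier noun phrase that is still open, so the count grows much faster than the length of the phrase.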
Repeated Parsing of Subtrees
• The parser often builds valid trees for a portion of the input and then
discards them during backtracking because they fail to cover all of the
input. Later, the parser has to rebuild the same trees again in the
• The table shows how many times each constituent of the example
“A flight from Indianapolis to Houston on TWA” is built.
A flight
from Indianapolis
on TWA
A flight from Indianapolis
A flight from Indianapolis to Houston
A flight from Indianapolis to Houston on TWA
The Earley Parser
• The Earley parser deals successfully with the
aforementioned problems. The Earley parser is based on the
dynamic programming paradigm, according to which a
problem is solved by solving sub-problems of the problem
and then combining their solutions to solve the whole problem.
• The core of the Earley algorithm is a chart of N+1 entries
(N is the length of the input). For each word position the
chart contains a list of states representing the partial parse
trees generated so far. Each state contains a grammar rule
corresponding to a subtree, information about the progress
in completing the subtree, and the position of the subtree
with respect to the input.
• By keeping the partial parses in the chart, the Earley parser doesn’t have
to rebuild the trees during backtracking, so there is no unnecessary
repeated parsing of subtrees.
• Additionally, the chart is filled in polynomial time, O(N^3), and
implicitly stores all the possible parses of the sentence.
• Of course, if the number of parses is exponential, the algorithm will
need exponential time to return them all.
S8: Verb -> book
S14: Det -> that
S18: Noun -> flight
S21: NP -> Det Nominal
S22: VP -> Verb NP
S23: S -> VP
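The chart-filling procedure can be sketched as a recognizer with the three usual Earley operations (predict, scan, complete). This is a minimal sketch under assumptions: the grammar fragment, lexicon, the dummy start symbol GAMMA, and all names are illustrative, the grammar has no empty rules, and the code only reports whether a parse exists instead of retrieving trees or numbering states as in the listing above.

```python
from collections import namedtuple

# lhs -> rhs with a dot position, plus the chart index where the rule started.
State = namedtuple("State", "lhs rhs dot start")

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Nominal", "Noun")],  # left recursion is allowed
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}


def earley_recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, i):
        if state not in chart[i]:  # never store the same state twice
            chart[i].append(state)

    add(State("GAMMA", (start,), 0, 0), 0)
    for i in range(len(words) + 1):
        for state in chart[i]:  # chart[i] may grow while we iterate
            if state.dot < len(state.rhs):
                nxt = state.rhs[state.dot]
                if nxt in GRAMMAR:  # predictor: expand a non-terminal
                    for rhs in GRAMMAR[nxt]:
                        add(State(nxt, rhs, 0, i), i)
                elif i < len(words) and nxt in LEXICON.get(words[i], ()):
                    # scanner: the next word has the expected part of speech
                    add(State(nxt, (words[i],), 1, i), i + 1)
            else:  # completer: advance every state waiting for state.lhs
                for prev in chart[state.start]:
                    if (prev.dot < len(prev.rhs)
                            and prev.rhs[prev.dot] == state.lhs):
                        add(prev._replace(dot=prev.dot + 1), i)
    return State("GAMMA", (start,), 1, 0) in chart[len(words)]


print(earley_recognize(["book", "that", "flight"]))
```

Because a state is added to a chart entry only once, the left-recursive Nominal rule does not cause an infinite loop, and a completed constituent is reused by every state that needs it instead of being rebuilt.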
Finite-State Parsing
• Often an application doesn’t require a full parse, and a
partial parse or shallow parse is sufficient. In such cases,
instead of using a CFG, systems use cascades of finite-state
automata. Instead of returning a full parse of a sentence,
such FSA grammars can be used to detect noun groups,
verb groups, etc.
• In cases where such systems require recursion (e.g. the
definition of NPs may require other NPs for relative
clauses), recursion is limited by using cascades of
FSAs. One level finds NPs without recursion, the next level
combines them into NPs with one level of recursion, and so
on.
Recursive Transition Networks