Transcript slides8

‫‪Text Books‬‬
‫עיבוד שפות טבעיות ‪ -‬שיעור שמונה‬
‫‪Context Free Parsing‬‬
‫אורן גליקמן‬
‫המחלקה למדעי המחשב‬
‫אוניברסיטת בר אילן‬
‫‪1‬‬
‫‪88-680‬‬
‫‪Text Books‬‬
Text Books
Problems with FS grammars
Reasons why finite-state grammars may be
inadequate:
• Cannot represent constituency adequately
• Cannot represent structural ambiguity
• Cannot deal with recursion
Recursion occurs when an expansion of a nonterminal includes the non-terminal itself,
eg.Nominal Nominal PP
88-680
Text Books
2
Text Books
88-680
Text Books
3
Text Books
88-680
Text Books
4
Text Books
Context-free grammars
• A context-free grammar consists of
– a set of rules or productions
– a lexicon
88-680
Text Books
5
Text Books
A baby CF grammar for NPs
•
•
•
•
•
•
•
•
•
•
•
•
NP Det Nominal
NP ProperNoun
NP Pronoun
Nominal Noun
Nominal Noun Nominal
Det  a
Det  the
ProperNoun oren
Pronoun  I
Noun  flight
Noun  cat
Noun  chair
88-680
Text Books
6
Text Books
Context-free rules
• 2 classes of symbols: terminal and
non-terminal symbols
• Each context-free rule is of the form:
Aγ
• a single non-terminal symbol 
an ordered list of one or more terminals
and non-terminals
88-680
Text Books
7
Text Books
Context-free grammars
• CFGs can be used for analyzing or for
generating
• „“ means: rewrite the symbol on the
left with the string of symbols on the
right
• Example:
NP  Det Nominal  Det Noun  a
Noun  a flight
88-680
Text Books
8
Text Books
Context-free grammars
• A sequence of rule expansions is called
the derivation of a string of words.
• Formally, a particular CF language is a
set of strings which can be derived
from a particular CF grammar
• A derivation is represented by a parse
tree
88-680
Text Books
9
Text Books
88-680
Text Books
10
Text Books
88-680
Text Books
11
Text Books
The combined baby CFG
•
•
•
•
•
•
•
•
•
•
•
•
•
S NP VP
NP  Pronoun |Det Nominal |ProperNoun
Nominal  Noun|Noun Nominal
VP Verb NP | Verb NP PP | Verb PP
PP Preposition NP
Noun flight|flights |trip |morning |...
Verb is |prefer |like|need |want |fly |...
Adjective  cheapest|non-stop|first|direct|...
Pronoun me|I |you|it|...
ProperNoun Los Angeles |Chicago |Alaska |...
Det a |the|an |this | these|that|...
Preposition from|to |on |near|...
Conjunction and |or|but|...
88-680
Text Books
12
Text Books
88-680
Text Books
13
Text Books
Formal definition of a CFG
• A context-free grammar is a 4-tuple:
– A set of non-terminal symbols N
– A set of terminal symbols Σ(disjoint from
N)
– A set of productions P, each of the form A
α, where A is a non-terminal and α is a
string of symbols from the infinite set of
strings (Σ.N)*
– A designated start88-680
symbol S
Text Books
14
Text Books
CF language
• If A β is a production of P and α and
γ are any strings in the set (Σ.N)*,we
say that αAγ directly derives αβγ,
or αAγ  αβγ
• Derivation is a generalization of direct
derivation
– Let α1, α2, ..., αm be strings in (Σ.N)*,
m1, such that α1  α2, α2  α3, ...,
αm-1 αm
– we say that α1 derives αm, or α1 *
αm
88-680
Text Books
15
Text Books
CF language
• The language L(G) generated by a
grammar G is the set of strings
composed of terminal symbols which
can be derived from the designed start
symbol S:
L(G)= {W| w is in Σ* and S * w}
88-680
Text Books
16
Text Books
Grammar equivalence
• Two grammars are strongly equivalent
if they generate the same set of strings
and if they assign the same phrase
structure to each sentence
• Two grammars are weakly equivalent
if they generate the same set of strings
but do not assign the same phrase
structure to each sentence
88-680
Text Books
17
Text Books
Chomsky normal form
• A CFG is in Chomsky normal form
– if it is ε-free
– if each production is either of the form
ABC or A  α
– Any CFG can be converted into a weaklyequivalent Chomsky normal form
grammar
– Example:
A BCD eq A BX, X  CD
88-680
Text Books
18
Text Books
Sentence Level Constructions
• Declarative Sentences
– I want a flight from Boston to Chicago
• Imperative Sentences
– Show me all flights available
• Yes-No Questions
– Do any of these flights have stops?
• Wh-Questions
– What Airlines fly from Boston to Chicago?
88-680
Text Books
19
Text Books
Declarative Structure
• Subject NP followed by VP
• Statements or assertions
• Examples:
– The flight should leave at eleven a.m.
– I need a flight to Seattle.
• S  NP VP
88-680
Text Books
20
Text Books
Imperative Structure
• Begin with a VP and have no subject
• Commands
• Examples:
– Show me the lowest fare.
– List all flights from Dallas to Denver.
• S  VP
88-680
Text Books
21
Text Books
Yes-No-Question Structure
• Auxiliary verb, subject NP, VP
• Questions
• Examples:
– Does this flight have stops?
– Does this flight have stops?
• S  Aux NP VP
88-680
Text Books
22
Text Books
Wh-Question Structure
• Contain a wh-phrase constituent that
includes a wh-word (who, what, when,
where, why, which, how)
• Questions
• Two classes:
– Wh-subject-question
– Wh-non-subject-question
88-680
Text Books
23
Text Books
Wh-Subject-Question
• Wh-phrase is the subject of the sentence
• Same as declarative structure except the
first NP contains some wh-word
• Examples:
– What airlines fly from Dallas to Denver?
– Which flights serve breakfast?
• S →Wh-NP VP
88-680
Text Books
24
Text Books
Wh-Non-Subject-Question
• Wh-phrase is not the subject of the sentence
• Aux appears before the subject NP
• Examples:
– What flights do you have from Dallas to
Denver?
– When does this flight leave Dallas?
• S →Wh-NP Aux NP VP
88-680
Text Books
25
Text Books
Noun Phrase (NP)
• Components of an NP:
– Head: central noun
– Pre-nominal (pre-head) modifiers
– Post-nominal (post-head) modifiers
88-680
Text Books
26
Text Books
Pre-head Modifiers
• Determiners
– a stop, the flights, this fare, those flights
• Determiners can be omitted
– Mass nouns: "Water is precious"
• Pre-determiners
– Appear before the determiners
– Number or amount: "all the flights"
88-680
Text Books
27
Text Books
Pre-head Modifiers
• Others that can appear between the determiner and
the head noun:
• Cardinal numbers: one stop, two friends
• Ordinal numbers: the first child, the last day
• Quantifiers: many stops, several friends
• Adjective phrase (AP): a nonstop flight, the least
expensive fare
• NP (Det) (Card) (Ord) (Quant) (AP) Nominal
88-680
Text Books
28
Text Books
Post-head Modifiers
• Three kinds:
– Prepositional phrases
– Non-finite clauses
– Relative clauses
88-680
Text Books
29
Text Books
Post-head Modifiers
• Prepositional phrases (PP)
• Examples:
– any stopovers [ for Delta seven fifty one ]
– all flights [ from Dallas ] [ to Denver ]
– arrival [ in Dallas ] [ before seven p.m. ]
• Nominal Nominal PP (PP) (PP)
88-680
Text Books
30
Text Books
Post-head Modifiers
• Non-finite clauses
– -ed
• the aircraft used by this flight
– infinitive
• The last flight to arrive in Dallas
– -ing (gerundive)
• flights arriving after eleven a.m.
• those leaving on Monday
88-680
Text Books
31
Text Books
Post-head Modifiers
•
•
•
•
•
•
•
To add gerundive non-finite clauses:
Nominal →NominalGerundVP
GerundVP→GerundV
GerundVP→GerundVNP
GerundVP→GerundVPP
GerundVP→GerundVNP PP
GerundV→arriving | leaving | …
88-680
Text Books
32
Text Books
Post-head Modifiers
• Relative clauses
– –A clause that begins with a relative pronoun (that,
who)
– Examples:
• a flight that serves breakfast
• flights that leave in the morning
– Nominal →NominalRelClause
– RelClause→who VP
– RelClause→that VP
88-680
Text Books
33
Text Books
Coordination
• NPs, VPs, and sentences can be joined with
conjunctions (and, or)
• NP →NP and NP
• VP →VP and VP
• S →S and S
• Examples:
– –Please repeat [NP[NP the flights] and [NP the costs]]
– –What flights do you have [VP[VP leaving Dallas] and [VP
arriving in Denver]]
– –[S[S I’m interested in a flight to Dallas] and [S I’m also
interested in going to Baltimore]]
88-680
Text Books
34
Text Books
Problems with context-free
grammars
• Agreement:
• Subject noun and verb have to agree in
person and number
– What flights leave in the morning?
– What flight leaves in the morning?
– But simple CFGs over generate:
• *What flights leaves in the morning?
• *What flight leave in the morning?
88-680
Text Books
35
Text Books
Problems with context-free
grammars
• Agreement contd.
• Possible rules which account for agreement:
– S  3sgNP 3sgVP
– S  Non3sgNP Non3sgVP
– Instead of S  NP VP
• Not satisfactory due to rule proliferation
– Massive redundancy
– Lack of linguistically significant generalizations
88-680
Text Books
36
Text Books
Problems with context-free
grammars
• Subcategorization
• Possible complements of a verb are called
the subcategorization frame of that verb
– The problem disappeared.
– He finds a solution.
– The lady wants to fly to Paris.
• But again simple CFGs over generate:
– *The teacher disappeared the problem.
– *He finds.
– *She finds to fly to Paris
88-680
Text Books
37
Text Books
88-680
Text Books
38
Text Books
CF Parsing
• Syntactic parsing means recognizing a
sentence and assigning a structure to it
• CF grammars are a declarative
formalism
• Many possible CF parsing algorithms
88-680
Text Books
39
Text Books
Parsing as a search problem
• Syntactic parsing can be viewed as a
searching through the space of all possible
parse trees to find the correct parse tree(s)
whose root is the start symbol S and which
cover exactly the input words
• The search space is defined by the grammar
• The search strategy can be
– top-down or goal-driven
– bottom-up or data-driven
88-680
Text Books
40
Text Books
Top-Down Parsing
• A top-down parsing algorithm starts
with the root S and builds trees down to
the input words
88-680
Text Books
41
Text Books
88-680
Text Books
42
Text Books
Top-down parsing
S
S
NP
VP
S
NP
S
S
VP
Det Nom
NP
PropN
VP
Aux NP
S
VP
VP
S
S
S
S
Aux NP VP
Aux NP VP
VP
VP
Det Nom
PropN
88-680
Text Books
V
V
NP
43
Text Books
Bottom-up parsing
• A bottom-up parsing algorithm starts
with the input words and builds trees up
toward the root S
88-680
Text Books
44
Text Books
88-680
Text Books
45
Text Books
Top-down vs. Bottom-up
Parsing
• Top-down parser:
– Only generates trees that result in S
– Spends much time on trees that are not consistent
with the input
• Bottom-up parser:
– Spends much time on generating trees that do not
result in S
– Only generates trees that are consistent with the
input
88-680
Text Books
46
Text Books
Top-down depth-first left-toright parser
• Depth-first strategy only explores one
state at a time
• Further choices:
–Which node of a tree to expand (e.g. leftmost unexpanded node of the current tree)
–The order in which grammar rules are
applied (e.g. according to their textual
order in the grammar)
88-680
Text Books
47
Text Books
A miniature English Grammar
88-680
Text Books
48
Text Books
TD-DF-LR parsing
• “Does this flight include a meal?”
88-680
Text Books
49
Text Books
88-680
Text Books
50
Text Books
88-680
Text Books
51
Text Books
88-680
Text Books
52
Text Books
Problems with the basic
parser
• Left-recursion:
• rules of the type: NP  NP PP
• solution: rewrite each rule of the form
A  Ab | a
using a new symbol:
A  aA’ A  bA’ | e
88-680
Text Books
53
Text Books
Left Recursion
88-680
Text Books
54
Text Books
Problems with the basic
parser
• Ambiguity: attachment ambiguity,
coordination ambiguity, noun-phrase
bracketing ambiguity
• Attachment ambiguity:
– “I saw the Grand Canyon flying to New
York”
• Coordination ambiguity:
– “old men and women”
88-680
Text Books
55
Text Books
Ambiguity
• Ambiguity is a problem to all parsers
• Parsing requires disambiguation
• Number of possible parses should be
reduced (potentially exponential number
of parses)
• e.g. Show me the meal on Flight 386
from Munich to Denver(14 parses)
88-680
Text Books
56
Text Books
Ambiguity
• Example:
President Kennedy today pushed aside other
White House business to devote all his time
and attention to working on the Berlin crisis
address he will deliver tomorrow night to the
American people over nationwide television
and radio.
• Solutions: return all parses or include
disambiguation in the parser.
88-680
Text Books
57
Text Books
88-680
Text Books
58
Text Books
88-680
Text Books
59
Text Books
88-680
Text Books
60
Text Books
88-680
Text Books
61
Text Books
Parse Tree
S
VP
NP
NP
Nom
Pronoun
Verb
Det
Noun
Noun
I
prefer
a
morning
flight
88-680
Text Books
62
Text Books
Context Free Grammars
• “Stronger” then Finite State Machines
• Capture constituency and ordering
• Modern Linguistic Theories of grammar
are only vaguely based on context-free
grammars.
88-680
Text Books
63
Text Books
• Example of a CFG:
– NP  Det Nominal
– Nominal  Noun | Noun Nominal
– Det  a | the
– Noun  flight | trip
• Generator or recognizer
88-680
Text Books
64
Text Books
CFG
• Derivation:
– NP  Det Nominal  Det Noum  a Noun
 a flight
• Parse Tree:
88-680
Text Books
65
Text Books
Context Free Grammars
G = (N,,P,S):
– N: A set of terminals (either lexical items or
parts of speech).
– : A sets of non terminals (The constituents
of the language).
– A sets of rules of the form A  a
A N, a  (  N)*
– A designated start symbol S  N
88-680
Text Books
66
Text Books
CFG
• If Ab is a production of P, and a and 
are any strings in (  N)*, then aA
directly derives ab
• If a1  a2  ….  am then way say
that a1 derives am or a1 * am
88-680
Text Books
67
Text Books
CFG
• The language generated by a grammar G,
denoted L(G), is the set of strings composed
of terminal symbols which can be derived
from the start symbol S of G
L(G) = { w | w ∈∑* and S ⇒* w }
• Parsing: Mapping from a string of words
to its Parse Tree.
88-680
Text Books
68
Text Books
CFG
• Strong equivalence: Two grammars G1 and
G2 are strongly equivalent if they generate
the same set of strings (i.e., L(G1) = L(G2))
and they assign the same phrase structure to
each string (allowing only for renaming of
non-terminal symbols)
• Weak equivalence: Generate the same set of
strings but don’t assign the same phrase
structure to each string
88-680
Text Books
69
Text Books
CNF
• Chomsky Normal Form (CNF):
– The grammar is ε-free
– Each production of the grammar is either of
the form A →B C or A →a
(i.e., either 2 non-terminal symbols or 1
terminal symbol on RHS)
• Any CFG can be converted into a weakly
equivalent CFG in Chomsky Normal Form
88-680
Text Books
70
Text Books
88-680
Text Books
71