Presentation

Download Report

Transcript Presentation

Syntactic analysis using
Context Free Grammars
Analysis of language
• Morphological analysis
– Chairs <chair, n, plural>, <chair, v, 3rd person, present>
• Part Of Speech (POS) tagging
– The/DT man/NN left/VBD the/DT room/NN
– The/DT red/ADJ block/NN on/PREP the/DT blue/ADJ
cylinder/NN was/AUX moved/VBD onto/PREP the/DT
brown/ADJ table/NN
• Any further analysis?
Analysis of language
• Part Of Speech (POS) tagging
– The/DT man/NN left/VBD the/DT room/NN
– The/DT red/ADJ block/NN on/PREP the/DT blue/ADJ
cylinder/NN was/AUX moved/VBD onto/PREP the/DT
brown/ADJ table/NN
• Any further analysis?
– chunks, clauses, syntax, semantics, word senses etc…
• Today’s lecture on analyzing syntax
What is Syntax?
• Study of structure of language
– how words can connect to each other
• Specifically, goal is to relate surface form (i.e. the
sentence) to semantics (the meaning)
• Representational device is tree structure
Structure in Strings
Proposal 1
• Some words: the a small nice big very boy girl sees likes
• Some good sentences:
– (the) boy (likes a girl)
– (the small) girl (likes the big girl)
– (a very small nice) boy (sees a very nice boy)
• Some bad sentences:
– *(the) boy (the girl)
– *(small) boy (likes the nice girl)
Structure in Strings
Proposal 2
• Some words: the a small nice big very boy girl sees likes
• Some good sentences:
– (the boy) likes (a girl)
– (the small girl) likes (the big girl)
– (a very small nice boy) sees (a very nice boy)
• Some bad sentences:
– *(the boy) (the girl)
– *(small boy) likes (the nice girl)
More Structure in Strings
Proposal 2 -- ctd
• Some words: the a small nice big very boy girl sees likes
• Some good sentences:
– ((the) boy) likes ((a) girl)
– ((the) (small) girl) likes ((the) (big) girl)
– ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl)
• Some bad sentences:
– *((the) boy) ((the) girl)
– *((small) boy) likes ((the) (nice) girl)
From Substrings to Trees
• (((the) boy) likes ((a) girl))
boy
the
likes
a
girl
Context-Free Grammars
• Terminals
– This would be the lexicon/vocabulary
• Non-Terminals
– The constituents in a language
• Like noun phrase, verb phrase, prepositional phrase and sentence
• Rules
– Rules consist of a single non-terminal on the left and any number of terminals
and non-terminals on the right.
– Describe the allowed structure of the constituents
– Express the ways in which symbols of the language can be grouped or ordered
together
27
Phrase Structure Tree
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
S
nonterminal
symbols
= constituents
NP
DetP
the
NP
likes
boy
Phrase-structure
tree
DetP
girl
a
terminal symbols = words
Context?
• The notion of context in CFGs is not the same as the ordinary meaning of
the word context in language.
• All it really means is that the non-terminal on the left-hand side of a rule is
out there all by itself
– A -> B C
– Means that I can rewrite an A as a B followed by a C regardless of the context
in which A is found
CFG: Example
• Many possible CFGs for English, here is an example (fragment):
–
–
–
–
–
–
–
–
–
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
the very small boy likes a girl
Derivations in a CFG
S
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
S
Derivations in a CFG
NP VP
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
S
NP
VP
Derivations in a CFG
DetP N VP
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
S
NP
DetP
VP
N
Derivations in a CFG
the boy VP
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
S
NP
DetP
VP
N
the boy
Derivations in a CFG
the boy likes NP
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
S
NP
DetP
VP
N
V
the boy likes
NP
Derivations in a CFG
the boy likes a girl
S  NP VP
VP  V NP
NP  DetP N | AdjP NP
AdjP  Adj | Adv AdjP
N  boy | girl
V  sees | likes
Adj  big | small
Adv  very
DetP  a | the
S
NP
DetP
VP
N
V
the boy likes
NP
DetP
N
a
girl
Simple lexicon
38
Simple grammar
39
Generativity
• We can view these rules as either analysis or synthesis
machines
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the language
40
A CFG defines a formal language
• Sentences (strings of words) that can be derived by the
grammar are in the formal language defined by the grammar
• Sentences that cannot be derived from the grammar are not
in the language
– Ungrammatical
41
Derivations
• A derivation is a sequence of
rules applied to a string that
accounts for that string
– Covers all the elements in the
string
– Covers only the elements in the
string
42
Recursion
• We’ll have to deal with rules such as the following where the non-terminal
on the left also appears somewhere on the right (directly).
– NP -> NP PP
– VP -> VP PP
[[The flight] [to Boston]]
[[departed Miami] [at noon]]
Recursion
• Of course, this is what makes syntax interesting
–
–
–
–
–
–
flights from Denver
Flights from Denver to Miami
Flights from Denver to Miami in February
Flights from Denver to Miami in February on a Friday
Flights from Denver to Miami in February on a Friday under $300
Flights from Denver to Miami in February on a Friday under $300 with lunch
The Point
• If you have a rule like
– VP -> V NP
– It only cares that the thing after the verb is an NP. It doesn’t have to know
about the internal affairs of that NP
The Point
• VP -> V NP
• I hate
–
–
–
–
–
–
flights from Denver
Flights from Denver to Miami
Flights from Denver to Miami in February
Flights from Denver to Miami in February on a Friday
Flights from Denver to Miami in February on a Friday under $300
Flights from Denver to Miami in February on a Friday under $300 with lunch
Potential Problems in CFG
• Agreement
• Subcategorization
• Movement
Agreement
•
By agreement, we have in mind constraints that hold among various constituents
that take part in a rule or set of rules
•
For example, in English, determiners and the head nouns in NPs have to agree in
their number.
This flight
Those flights
48
*This flights
*Those flight
Problem
• Our earlier NP rules are clearly deficient since they don’t capture this
constraint
– NP  Det Nominal
• Accepts, and assigns correct structures, to grammatical examples (this flight)
• But its also happy with incorrect examples (*these flight)
– Such a rule is said to overgenerate.
49
Verb Phrases
• English VPs consist of a head verb along with 0 or more following
constituents which we’ll call arguments.
50
Subcategorization
•
•
•
•
•
•
•
Sneeze: John sneezed
Find: Please find [a flight to NY]NP
Give: Give [me]NP[a cheaper fare]NP
Help: Can you help [me]NP[with a flight]PP
Prefer: I prefer [to leave earlier]TO-VP
Told: I was told [United has a flight]S
…
• *John sneezed the book
• *I prefer United has a flight
• *Give with a flight
• Subcat expresses the constraints that a predicate (verb for now)
places on the number and type of the argument it wants to take
So?
• So the various rules for VPs overgenerate.
– They permit the presence of strings containing verbs and arguments that don’t
go together
– For example
– VP -> V NP therefore
– Sneezed the book is a VP since “sneeze” is a verb and “the book” is a valid NP
• Subcategorization frames can fix this problem (“slow down” overgeneration)
•
•
This is a modern take on the traditional notion of transitive/intransitive.
Modern grammars may have 100s or such classes.
Subcategorization
•
•
•
•
•
•
•
53
Sneeze: John sneezed
Find: Please find [a flight to NY]NP
Give: Give [me]NP[a cheaper fare]NP
Help: Can you help [me]NP[with a flight]PP
Prefer: I prefer [to leave earlier]TO-VP
Told: I was told [United has a flight]S
…
Movement
• Core example
– [[My travel agent]NP [booked [the flight]NP]VP]S
• I.e. “book” is a straightforward transitive verb. It expects a
single NP arg within the VP as an argument, and a single NP arg
as the subject.
Movement
• What about?
– Which flight do you want me to have the travel agent book?
• The direct object argument to “book” isn’t appearing in
the right place. It is in fact a long way from where its
supposed to appear.
• And note that its separated from its verb by 2 other verbs.
Grammar equivalence and normal form
• Strong equivalence:
– two grammars are strongly equivalent if:
• they generate the same set of strings
• they assign the same phrase structure to each sentence
– two grammars are weakly equivalent if:
• they generate the same set of strings
• they do not assign the same phrase structure to each sentence
• Normal form
– Restrict the form of productions
– Chomsky Normal Form (CNF)
– Right hand side of the productions has either one or two terminals or nonterminals
– e.g. A -> BC A -> a
– Any grammar can be translated into a weakly equivalent CNF
– A -> B C D <=> A-> B X X -> C D
Building tree structures
• Draw tree structures for the following phrases
•
•
•
•
•
Dallas
from Denver
arriving in Washington
I need to fly between Philadelphia and Atlanta
My flight from Philadelphia to Atlanta has been cancelled