Transcript cfgs-asrx

Parsing with
Context-Free Grammars for
ASR
Julia Hirschberg
CS 4706
Slides with contributions from Owen Rambow,
Kathy McKeown, Dan Jurafsky and James Martin
What is Syntax?
• Structure of language
• How words are arranged together and related to one
another
• Goal of syntactic analysis: relate surface form (what
someone says or writes) to underlying structure, to
support semantic analysis (what the utterance or text
means)
• Syntactic representation: typically a tree structure
Structure in Strings
• A set of words, or, a lexicon: the a small nice big very boy girl
sees likes
• Some `good’ (grammatical) sentences:
– the boy likes a girl
– the small girl likes the big girl
– a very small nice boy sees a very nice boy
• Some bad (ungrammatical) sentences:
– *the boy the girl
– *small boy likes nice girl
• Can we find a way of distinguishing between the two kinds of
sequences?
• Can we identify similarities among grammatical
subsequences?
One Version of Constituent Structure
• Lexicon: the a small nice big very boy girl sees likes
• Grammatical sentences:
– (the) boy (likes a girl)
– (the small) girl (likes the big girl)
– (a very small nice) boy (sees a very nice boy)
• Ungrammatical sentences:
– *(the) boy (the girl)
– *(small) boy (likes the nice girl)
Another Constituency Hypothesis
• Lexicon: the a small nice big very boy girl sees likes
• Grammatical sentences:
– (the boy) likes (a girl)
– (the small girl) likes (the big girl)
– (a very small nice boy) sees (a very nice boy)
• Ungrammatical sentences:
– *(the boy) (the girl)
– *(small boy) likes (the nice girl)
• Better: fewer types of constituents (blue and red are of
same type)
Even More Structures
• Lexicon: the a small nice big very boy girl sees likes
• Grammatical sentences:
– ((the) boy) likes ((a) girl)
– ((the) (small) girl) likes ((the) (big) girl)
– ((a) ((very) small) (nice) boy) sees ((a) ((very) nice)
girl)
• Ungrammatical sentences:
– *((the) boy) ((the) girl)
– *((small) boy) likes ((the) (nice) girl)
From Substrings to Trees
• (((the) boy) likes ((a) girl))
boy
the
likes
a
girl
How do we Label the Nodes?
• ( ((the) boy) likes ((a) girl) )
• Choose constituents so each one has one non-bracketed
word: the head
• Group words by distribution of constituents they head
(POS)
– Noun (N), verb (V), adjective (Adj), adverb (Adv),
determiner (Det)
• Category of constituent: XP, where X is POS
– NP, S, AdjP, AdvP, DetP
Types of Nodes
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
nonterminal
symbols
= constituents
S
NP
DetP
the
boy
likes
NP
DetP
Phrase-structure
tree
girl
a
terminal symbols = words
Determining Part-of-Speech
A blue seat/a child seat: noun or adjective?
– Syntax:
• a blue seat
a child seat
• a very blue seat *a very child seat
• this seat is blue
*this seat is child
– Morphology:
• bluer
*childer
– blue and child are not the same POS
– blue is Adj, child is Noun
Determining Part-of-Speech
– Preposition or particle?
•
•
•
•
A
B
A
B
he threw out the garbage
he threw the garbage out the door
he threw the garbage out
*he threw the garbage the door out
– The two out are not same POS
• A is particle, B is Preposition
Constituency
• Some Noun phrases (NPs)
• A red dog on a blue tree
• A blue dog on a red tree
• Some big dogs and some little dogs
• A dog
•I
• Big dogs, little dogs, red dogs, blue dogs,
yellow dogs, green dogs, black dogs, and white
dogs
• How do we know these form a constituent?
NP Constituency
• NPs can all appear before a verb:
– Some big dogs and some little dogs are going
around in cars…
– Big dogs, little dogs, red dogs, blue dogs,
yellow dogs, green dogs, black dogs, and white
dogs are all at a dog party!
– I do not
• But individual words can’t always appear before
verbs:
– *little are going…
– *blue are…
– *and are
• Must be able to state generalizations like:
– Noun phrases occur before verbs
PP Constituency
• Preposing and postposing:
– Under a tree is a yellow dog.
– A yellow dog is under a tree.
• But not:
– *Under, is a yellow dog a tree.
– *Under a is a yellow dog tree.
• Prepositional phrases notable for ambiguity in
attachment
– I saw a man on a hill with a telescope.
Context-Free Grammars
• Defined in formal language theory
– Terminals: e.g. cat
– Non-terminal symbols: e.g. NP, VP
– Start symbol: e.g. S
– Rewriting rules: e.g. S  NP VP
• Start with start symbol, rewrite using rules, done
when only terminals left
A Fragment of English
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
Input: the cat is on the mat
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
S
S
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
NP VP
S
NP
VP
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
DetP N VP
S
NP
DetP
VP
N
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
the cat VP
S
NP
VP
DetP
N
the
cat
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
the cat V PP
S
NP
VP
DetP
N
the
cat
V
PP
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
the cat is Prep NP
S
NP
VP
DetP
N
V
the
cat
is
PP
Prep
NP
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
the cat is on Det N
S
NP
VP
DetP
N
V
the
cat
is
PP
Prep
NP
on DetP N
Derivations in a CFG
S  NP VP
VP  V PP
NP  DetP N
N  cat | mat
V  is
PP  Prep NP
Prep  on
DetP  the
the cat is on the mat
S
NP
VP
DetP
N
V
the
cat
is
PP
Prep
NP
on DetP
the
N
mat
A More Complicated Fragment of English
•
•
•
•
•
•
•
•
•
S  NP VP
S  VP
VP  V PP
VP  V NP
VP  V
NP  DetP NP
NP  N NP
NP  N
PP  Prep NP
Mary likes the cat bowl.
The cat ate the tasty food.
Hello. Nice talking to you.
• N  cat | mat | food | bowl
| Mary
• V  is | likes | sits
• Prep  on | in | under
• DetP  the | a
Pocket Sphinx Grammar Format
• Variables go in angle brackets, e.g. <city>
• Terminals must appear in your pronunciation
dictionary (case sensitive)
• X Y is concatenation -- e.g., I WANT
• (X | Y) means X or Y -- e.g., (WANT|NEED)
• Square brackets mean optional -- e.g., [ON]
FRIDAY
• * means that the expansion may be spoken zero or
more times -- e.g. <digit>*
• + means one or more times -- e.g. <digit>+
Example
• <city> = BOSTON | NEWYORK | WASHINGTON |
BALTIMORE;
• <time> = MORNING | EVENING;
• <day> = FRIDAY | MONDAY;
• public <query> = (((WHAT TRAINS LEAVE) |
(WHAT TIME CAN I TRAVEL) | (IS THERE A
TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>]
ON <day> [<time>]);
Hello. No. I want to go on Tuesday. When does the
train leave?
Next Class
• Language modeling for large vocabulary applications:
Ngrams