Transcript 一般性演講
Context Free Grammar (CFG): Specification for
Structures & Constituency
Parse Tree: graphical representation of structure
root node (S): a sentencial level structure
internal nodes: constituents of the sentence
arcs: relationship between parent nodes and their children (constituents)
terminal nodes: surface forms of the input symbols (e.g., words)
alternative representation: bracketed notation:
e.g., [I saw [the [girl [in [the park]]]]]
For example:
NP
NP
PP
NP
girl
in the
park
Jing-Shin Chang
1
Parse Tree: “I saw the girl in the park”
S
NP
VP
NP
NP
NP
PP
NP
pron
v
det
n
p
det
I
saw
the
girl
in
the park
Jing-Shin Chang
n
2
CFG: Components
CFG: formal specification of parse trees
G = {, N, P, S}
: terminal symbols
N: non-terminal symbols
P: production rules
S: start symbol
: terminal symbols
the input symbols of the language
pre-terminal: parts of speech (when words are regarded as terminals)
N: non-terminal symbols
programming language: tokens (reserved words, variables, operators, …)
natural languages: words or parts of speech
groups of terminals and/or other non-terminals
S: start symbol: the largest constituent of a parse tree
P: production (re-writing) rules
form: α → β (α: non-terminal, β: string of terminals and non-terminals)
meaning: α re-writes to (“consists of”, “derived into”)β, or βreduced to α
start with “S-productions” (S → β)
Jing-Shin Chang
3
CFG: Example Grammar
Grammar Rules
S → NP VP
NP → Pron | Proper-Noun | Det Norm
Norm → Noun Norm | Noun
VP → Verb | Verb NP | Verb NP PP | Verb PP
PP → Prep NP
S: sentence, NP: noun phrase, VP: verb phrase
Pron: pronoun
Det: determiner, Norm: Norminal
PP: prepositional phrase, Prep: preposition
Lexicon (in CFG form)
Noun → girl | park | desk
Verb → like | want | is | saw | walk
Prep → by | in | with | for
Det → the | a | this | these
Pron → I | you | he | she | him
Proper-Noun → IBM | Microsoft | Berkeley
Jing-Shin Chang
4
CFG: Accepted Languages
CFG Operations
derivation: applying a production rule to re-write the LHS non-terminal into
its constituents
rightmost derivation: a sequence of derivations in which the rightmost nonterminal is always re-write first
leftmost derivation: leftmost non-terminal first
Context-Free Language
Language accepted by a CFG
L(G) = {w | S =*=> w (strings of terminals that can be derived from start
symbol)}
Jing-Shin Chang
5
CFG: Expressive Power
CFG vs. Regular Expression (R.E.)
every R.E. can be recognized by a FSA
every FSA can be represented by a CFG with production rules of the
form: A -> a B | ε
therefore, L(RE) < L(CFG)
Writing a CFG for a FSA (RE)
define a non-terminal Ni for a state with state number i
start symbol S = N0 (assuming that state 0 is the initial state)
for each transition δ(i,a)=j (from state i to stet j on input alphabet a),
add a new production Ni -> a Nj to P
for each final state i, add a new production Ni -> εto P
Jing-Shin Chang
6
CFG: Expressive Power (cont.)
Writing a CFG for a FSA (RE)
define a non-terminal Ni for a state with state number i
start symbol S = N0 (assuming that state 0 is the initial state)
for each transition δ(i,a)=j (from state i to stet j on input alphabet a),
add a new production Ni -> a Nj to P
for each final state i, add a new production Ni -> εto P
For example: RE: (a|b)* a b b
a
0
a
b
1
b
2
b
Jing-Shin Chang
3
S -> a S | b S | a N1
N1 -> b N2
N2 -> b N3
N3 -> ε
7
CFG: Expressive Power (cont.)
Chomsky Hierarchy:
R.E.: regular set (FSA)
CFG: context-free (pushdown automata)
CSG: context-sensitive (linear bounded automata)
unrestricted: recursively enumerable (Tuning Machine)
Jing-Shin Chang
8
CFG: Equivalence
Chomsky Normal Form (CNF) (Chmosky, 1963):
ε-free, and
Every production rule is in either of the following form:
A -> A1 A2
A -> a (A1, A2: non-terminal, a: terminal)
two non-terminals or one terminal at the RHS
generate binary tree
good simplification for some algorithms (e.g., grammar training with
the inside-outside algorithm (Baker 1979))
Every CFG can be converted into a weakly equivalent CNF
equivalence: L(G1) = L(G2)
strong equivalent: assign the same phrase structure to each sentence
(except for renaming non-terminals)
weak equivalent: do not assign the same phrase structure to each sentence
e.g., A -> B C D == {A -> B X, X -> CD}
Jing-Shin Chang
9
CFG vs. Finite-State Machine
Inappropriateness of FAS
Constituents
Recursion
RTN (Recursive Transition Network)
FSA with augmentation of recursion
arc: terminal or non-terminal
if arc is non-terminal: call to a sub-transition network & return upon
traversal
Jing-Shin Chang
10
CFG for English
Sentence Level Constructions
Declarative (直述句): NP (Subject) VP
Imperative (命令句): VP
Yes-No Questions: Aux NP VP
WH-Questions: Wh-NP VP
Jing-Shin Chang
11
CFG for English
Noun Phrase
Head Noun
Modifiers:
Pre-nominal Modifiers:
pre-determiner: “all”
determiner: “the”
post-determiner: (ordinal) (cardinal) (quantifier) (ADJP)
pre-nominal (pre-head) and post-nominal (post-head)
ordinal: “first”,”second”,”next”
cardinal: “two”, ”three”
quantifier: “many”, “several”
Post-nominal Modifiers:
PP: prepositional phrase
non-finite clauses: VP(+ing), VP(+ed), VP(to-V) forms
relative clauses: restrictive, non-restrictive
the man whose son lives in NY (restrictive)
the man, whose son lives in NY (non-restrictive)
Jing-Shin Chang
12
CFG for English
Coordination (同位語, 對等連接詞,…)
conjunction (conj): and, or, but
X → X conj X
a big source of ambiguity: X can be almost anything
Comparison with Mathematic Operators
(left/right) association: ((a + b) + c), ( a ** (b ** c) )
(high/low) precedence: a + b x c : (a + b) x c, a + (b * c)
Jing-Shin Chang
13
CFG for English
Agreement
Subject-Verb (or Aux. Verb): person & number
I like her
He likes her
Gender Agreement (German or French): ADJ-Noun, Det-Noun
Jing-Shin Chang
14
CFG for English
Verb Phrases & Subcategorization
not every verb is compatible with every verb phrases
+ NP, +NP-NP, +to-V, +Ving…
e.g., transitive (Vt), intransitive (Vi)
subcat. frame for a verb: possible set of complements
CFG for SUBCAT Problems
Solution
2:
VP → verb
VP → verb NP
VP → verb S
Solution 1:
VP → v1
VP → v2 NP
VP → v3 S
v1 → disappear | …
v2 → find | leave | repeat
v3 → think | believe | say
verb
Jing-Shin Chang
→ disappear | … | find |… | think
15
CFG for English
Auxiliaries
modal: “can”, “may”, “must”, “will” +V(stem)
perfect: “have” +V(pp)
progressive: “be” +V(ing)
passive: “be” +V(past)
Multiple Auxiliaries
modal < perfect < progressive < passive
modal perfect: “could have been …”
modal passive: “will be married …”
perfect progressive: “have been feasting …”
modal perfect passive: “might have been prevented …”
Jing-Shin Chang
16