10. Parsing with Context-free Grammars
- Speech and Language Processing -
Presenter: 정영임
Date: 2007. 8. 7.
10.4 The Earley Algorithm
Earley Algorithm
A dynamic programming approach
A single left-to-right pass that solves the three parsing problems (left-recursion, ambiguity, and inefficient reparsing of subtrees)
Information represented by:
Chart: N+1 entries (N: number of words)
Dotted rules
e.g.) S → • VP, [0,0]
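The chart and dotted rules map naturally onto a small data structure. Here is a minimal sketch in Python; the State class and its field names are illustrative choices, not from the text:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A dotted rule plus the input span it covers, e.g. S -> . VP, [0,0]."""
    lhs: str          # left-hand side, e.g. "S"
    rhs: tuple        # right-hand side symbols, e.g. ("VP",)
    dot: int          # how many rhs symbols have been recognized so far
    start: int        # [start, end]: span of the input covered so far
    end: int

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.is_complete() else self.rhs[self.dot]

def make_chart(n_words):
    # The chart has N+1 entries: one for each position between words.
    return [[] for _ in range(n_words + 1)]

# S -> . VP, [0,0]: nothing recognized yet, starting before word 0.
initial = State("S", ("VP",), 0, 0, 0)
```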
Fig.10.16 The Earley algorithm
Predictor
Creates new states representing top-down expectations
Is applied to any state that has a non-terminal immediately to the right of its dot, when that non-terminal is not a part-of-speech category
Results in the creation of one new state for each alternative expansion of that non-terminal provided by the grammar
Each new state begins and ends at the point in the input where the generating state ends
Example
Applying Predictor to S → • VP, [0,0] adds one new state for each VP rule, e.g. VP → • Verb NP, [0,0] (a sketch follows)
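Building on the State/chart sketch above, Predictor might look like this; the toy GRAMMAR and POS_CATEGORIES are assumptions standing in for the chapter's grammar:

```python
# Toy grammar in the spirit of the chapter's examples (an assumption).
GRAMMAR = {
    "S":       [("VP",)],
    "VP":      [("Verb",), ("Verb", "NP")],
    "NP":      [("Det", "Nominal")],
    "Nominal": [("Noun",)],
}
POS_CATEGORIES = {"Verb", "Det", "Noun"}   # part-of-speech (pre-terminal) symbols

def predictor(state, chart):
    # One new state per alternative expansion of the non-terminal after the
    # dot; each new state begins and ends where the generating state ends.
    nt = state.next_symbol()
    for rhs in GRAMMAR.get(nt, []):
        new = State(nt, rhs, 0, state.end, state.end)
        if new not in chart[state.end]:
            chart[state.end].append(new)
```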
Scanner
Is called when a state has a part-of-speech category to the right of its dot; it examines the input and incorporates a state corresponding to the prediction of a word with that part of speech into the chart
Is accomplished by creating a new state from the input state, with the dot advanced over the predicted input category
Example
VP → • Verb NP, [0,0]
Scanner consults the current word in the input, since the category following the dot is a part of speech
It notes that book can be a verb, matching the expectation in the current state
This results in the creation of the new state VP → Verb • NP, [0,1]
The new state is added to the chart entry that follows the one currently being processed (a sketch follows)
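Continuing the same sketch, Scanner follows the slide's formulation and advances the dot directly over the predicted category; the toy LEXICON is an assumption:

```python
# Toy lexicon mapping words to their possible parts of speech (an assumption).
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def scanner(state, words, chart):
    # The category after the dot is a part of speech: check the current word.
    pos = state.next_symbol()
    j = state.end
    if j < len(words) and pos in LEXICON.get(words[j], set()):
        # e.g. VP -> . Verb NP, [0,0]  becomes  VP -> Verb . NP, [0,1];
        # the new state goes into the *next* chart entry.
        new = State(state.lhs, state.rhs, state.dot + 1, state.start, j + 1)
        if new not in chart[j + 1]:
            chart[j + 1].append(new)
```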
Completer
Is applied to a state when its dot has reached the right end of its rule
Its job is to find, and advance, all previously created states that were looking for this grammatical category at this position in the input
New states are then created by copying the older state, advancing the dot over the expected category, and installing the new state in the current chart entry
Example
NP → Det Nominal •, [1,3]
Completer looks for states ending at 1 that expect an NP
– VP → Verb • NP, [0,1]
This results in the addition of a new complete state VP → Verb NP •, [0,3] (a sketch of Completer, plus the full Earley loop, follows)
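Completer, plus a driver loop in the style of Fig. 10.16, rounds out the sketch; the dummy GAMMA start rule is a common convenience, not mandated by the text:

```python
def completer(state, chart):
    # `state` is complete: advance every earlier state that was waiting for
    # this category at the position where `state` begins.
    for old in list(chart[state.start]):
        if not old.is_complete() and old.next_symbol() == state.lhs:
            new = State(old.lhs, old.rhs, old.dot + 1, old.start, state.end)
            if new not in chart[state.end]:
                chart[state.end].append(new)

def earley(words):
    # Sweep the chart left to right, applying the operation each state calls for.
    chart = make_chart(len(words))
    chart[0].append(State("GAMMA", ("S",), 0, 0, 0))   # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:                  # chart[i] may grow as we iterate
            if state.is_complete():
                completer(state, chart)
            elif state.next_symbol() in POS_CATEGORIES:
                scanner(state, words, chart)
            else:
                predictor(state, chart)
    # Recognize iff the dummy start rule is complete over the whole input.
    return any(s.lhs == "GAMMA" and s.is_complete() for s in chart[len(words)])

print(earley("book that flight".split()))       # True with the toy grammar
```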
Fig.10.17 An example: “Book that flight.”
Retrieving Parse Trees from a Chart
As presented, this version of the Earley algorithm is a recognizer, not a parser
Valid sentences simply leave the state S → α •, [0,N] in the chart
Extraction of individual parses from the chart:
The representation of each state must be augmented with an additional field to store information about the completed states that generated its constituents
The only change necessary is to have COMPLETER add a pointer to the older state onto a list of constituent-states for the new state
Parse trees can then be retrieved by following these pointers, starting with the state (or states) representing a complete S in the final chart entry (a sketch follows)
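A minimal sketch of that change, extending the completer above. Storing the pointers in a side table named backpointers is an illustrative choice; the text stores the list inside the state itself:

```python
def completer_with_pointers(state, chart, backpointers):
    # Same as completer, but also record which completed state licensed the
    # dot advance, so trees can be read off the final chart.
    for old in list(chart[state.start]):
        if not old.is_complete() and old.next_symbol() == state.lhs:
            new = State(old.lhs, old.rhs, old.dot + 1, old.start, state.end)
            backpointers[new] = backpointers.get(old, []) + [state]
            if new not in chart[state.end]:
                chart[state.end].append(new)

def tree(state, backpointers):
    # Follow constituent pointers recursively, starting from a complete S.
    # Scanner steps are not recorded in this sketch, so lexical leaves are
    # omitted; only the constituent structure is recovered.
    children = [tree(s, backpointers) for s in backpointers.get(state, [])]
    return (state.lhs, children)
```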
Fig.10.18 (chart states and constituent pointers used to retrieve the parse)
Cost of the tree-retrieval process
The Earley algorithm may fill the table in O(N³) time, but if there are an exponential number of trees for a given sentence, it will require an exponential amount of time to return them all
It can't magically return them more quickly
10.5 Finite-State Parsing Methods
Efficient for partial parsing (shallow parsing)
Recognition of basic phrases (noun groups, verb groups, locations, prepositions, etc.)
Extraction of the required data into some sort of template
Finite-state rules for detecting noun groups (NG)
NG → Pronoun | Time-NP | Date-NP
NG → (DETP) (Adjs) HdNns | DETP Ving HdNns | DETP-CP (“and” HdNns)
DETP → DETP-CP | DETP-INCP
DETP-CP → ({Adv-pre-num | “another” | {Det | Pro-Poss} ({Adv-pre-num | “only” (“other”)})}) Number | Q | Q-er | (“the”) Q-est | “another” | Det-cp | DetQ | Pro-Poss-cp
DETP-INCP → {{{Det | Pro-Poss} | “only” | “a” | “an” | Det-incomp | Pro-Poss-incomp} (“other”) | (DETP-CP) “other”}
Adjs → AdjP ({“,” | (“,”) Conj} {AdjP | Vparticiple})*
AdjP → Ordinal | {{Q-er | Q-est} {Adj | Vparticiple}+ | Number (“-”) {“month” | “day” | “year”} (“-”) “old”}
Finite-state rules for detecting noun groups (NG) (cont'd)
HdNns → HdNn (“and” HdNn)
HdNn → PropN | {PreNs | PropN PreNs} N[!Time-NP] | {PropN CommonN[!Time-NP]}
PreNs → PreN (“and” PreN2)*
PreN → (Adj “-”) Common-Sing-N
PreN2 → PreN | Ordinal | Adj-noun-like
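Not the FASTUS rules themselves, but the same idea in miniature: a drastically simplified noun-group rule written as a regular expression over a part-of-speech tag sequence. The tags, lexicon, and pattern here are all assumptions for illustration:

```python
import re

# A POS-tagged toy input (tags are assumptions for illustration).
tagged = [("the", "Det"), ("local", "Adj"), ("concern", "N"),
          ("bought", "V"), ("a", "Det"), ("trading", "Adj"), ("house", "N")]

tags = " ".join(t for _, t in tagged)
# A drastically simplified NG rule: NG -> (Det) Adj* N
for m in re.finditer(r"(?:Det )?(?:Adj )*N\b", tags):
    start = tags[:m.start()].count(" ")          # token index of the match start
    end = start + len(m.group().split())
    print([w for w, _ in tagged[start:end]])     # ['the','local','concern'] ...
```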
Fig. 10.20 and Fig. 10.21 (automaton graphs for the noun-group rules)
Handling recursion of a complete English grammar
Allow only a limited amount of recursion
FASTUS does this by using its automata cascade
The second level of FASTUS finds non-recursive noun groups
The third level combines these groups into larger NP-like units by
– adding on measure phrases
» 20,000 iron and “metal wood” clubs a month
– attaching prepositional phrases
» production of 20,000 iron and “metal wood” …
– dealing with noun-group conjunction
» a local concern and a Japanese trading house
=> By splitting the parsing into two levels, an NP on the left-hand side of a rule is treated as a different kind of object from an NP on the right-hand side (a toy cascade follows this list)
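A toy cascade in that spirit (illustrative, not FASTUS's actual patterns): one non-recursive pass brackets noun groups over the tag sequence, and a second pass glues NG Prep NG and NG Conj NG sequences into larger NP-like units, so neither pass needs recursion:

```python
import re

def cascade(tags):
    s = " ".join(tags)
    # Level "2": bracket non-recursive noun groups over the tag sequence.
    s = re.sub(r"(?:Det |Num )?(?:Adj )*N\b", "NG", s)
    # Level "3": attach PPs and conjunctions to form larger NP-like units.
    s = re.sub(r"NG(?: Prep NG| Conj NG)+", "NP", s)
    return s.split()

# "production of 20,000 iron clubs": N Prep Num Adj N -> NG Prep NG -> NP
print(cascade(["N", "Prep", "Num", "Adj", "N"]))   # ['NP']
```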
Chunk-based partial parsing via a set of finite-state cascades (Abney, 1996)
Recursive Transition Network (RTN)
An RTN is defined by a set of graphs like those in Fig. 10.20 and Fig. 10.21
Each arc is labeled with a terminal or a non-terminal
Difference between an RTN and an FSA
– In an RTN, whenever the machine comes to an arc labeled with a non-terminal, it treats that non-terminal as a subroutine call
» It places its current location onto a stack
» It jumps to the non-terminal's network
» It jumps back when that non-terminal has been parsed
An RTN is exactly equivalent to a context-free grammar
– A graphical way to view a simple top-down parser for context-free rules (an interpreter sketch follows)
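A compact sketch of an RTN interpreter; the network layout, state names, and toy lexicon are all invented for illustration. Here Python's call stack plays the role of the RTN's stack, pushing on a non-terminal arc and popping when the sub-network reaches a final state:

```python
# Each network maps a state to its outgoing arcs (label, next_state); a label
# is either the name of another network (non-terminal) or a POS tag (terminal).
NETWORKS = {
    "S":  {0: [("VP", 1)], 1: []},
    "NP": {0: [("Det", 1)], 1: [("Noun", 2)], 2: []},
    "VP": {0: [("Verb", 1)], 1: [("NP", 2)], 2: []},
}
FINAL = {"S": {1}, "NP": {2}, "VP": {1, 2}}          # accepting states per network
POS_OF = {"book": "Verb", "that": "Det", "flight": "Noun"}

def traverse(net, state, words, i):
    """Yield every input position reachable after traversing `net` from `state`."""
    if state in FINAL[net]:
        yield i                                       # "jump back" to the caller
    for label, nxt in NETWORKS[net][state]:
        if label in NETWORKS:                         # non-terminal arc: subroutine
            for j in traverse(label, 0, words, i):    # push, parse the sub-network
                yield from traverse(net, nxt, words, j)   # pop, resume here
        elif i < len(words) and POS_OF.get(words[i]) == label:
            yield from traverse(net, nxt, words, i + 1)   # terminal arc: consume word

words = "book that flight".split()
print(any(j == len(words) for j in traverse("S", 0, words, 0)))   # True
```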