Transcript Slide 1

10. Parsing with Context-free Grammars
-Speech and Language Processing-
발표자: 정영임
발표일: 2007. 8. 7.
10.4 The Earley Algorithm
Earley Algorithm
 Dynamic programming
 Solution for those three parsing problems
 Information Represented by
 Chart: N+1 entries (N: Number of words)
 Dotted rule
e.g.) S → • VP [0,0]
2
10.4 The Earley Algorithm
Fig.10.16 The Earley algorighm
3
10.4 The Earley Algorithm
Predictor
 To create new states representing top-down expectations
 is applied to any state that has a non-terminal immediately to the right of its
dot that is not a part-of-speech category
 results in the creation of one new state for each alternative expansion of
that non-terminal provided by the grammar
 begins and ends at the point in the input where the generating state ends.
 Example
 S→• VP, [0,0]
Predictor
4
10.4 The Earley Algorithm
Scanner
 is called to examine the input and incorporate a state corresponding to
the prediction of a word with a particular part-of-speech into the chart.
 is accomplished by creating a new state from the input state with the
dot advanced over the predicted input category.
 Example
 VP→• Verb NP, [0,0]
 Scanner consults the current word in the input since the category following
the dot is a part-of-speech.
 It notes that book can be a verb, matching the expectation in the current
state
 This results in the creation of the new state VP → Verb• NP, [0,1].
 The new state is added to the chart entry that follows the one currently
being processed
5
10.4 The Earley Algorithm
Completer
 is applied to a state when its dot has reached the right end of the rule.
 is to find, and advance, all previously created states that were looking
for this grammatical category at this position in the input.
 New states are then created by copying the older state, advancing the
dot over the expected category, and installing the new state in the
current chart entry.
 Example
 NP→ Det Nominal• [1,3]
 Completer looks for states ending at 1 expecting an NP
– VP→ Verb • NP, [0,1]
 This results in the addition of a new complete state VP→ Verb NP •, [0,3]
6
10.4 The Earley Algorithm
Fig.10.17 An Example “Book that filght.”
7
10.4 The Earley Algorithm
Retrieving Parse Trees from a Chart
 the version of the Earley algorithm is a recognizer not a parser.
 valid sentences will leave the state S → α •, [0,N] in the chart.
 Extraction of individual parses from the chart
 the representation of each state must be augmented with an additional field
to store information about the completed states that generated its
constituents.
 change necessary is to have COMPLETER add a pointer to the older state
onto a list of constituent-states for the new state.
 following pointers starting with the state (or states) representing a complete
S in the final chart entry.
8
10.4 The Earley Algorithm
Retrieving Parse Trees from a Chart
8
14
18
19
21
22
23
14
8
18
19
21
22
10.18
9
10.4 The Earley Algorithm
Cost at tree retrieval process
 if there are an exponential number of trees for a given sentence, the
algorithm will require an exponential amount of time to return them all.
3
O(N )
The Earley algorithm may fill the table in
it can’t magically return them as quickly.
time but
10
10.5 Finite-State Parsing Methods
Efficient in a partial parse or shallow parse
 Recognition of basic phrases(noun groups, verb groups, location,
preposition and etc.)
 Extraction of some sort of template in required data
11
10.5 Finite-State Parsing Methods
Finite-state rules for detecting noun groups(NG)
NG → Pronoun|Time-NP|Date-NP
NG → (DETP)(Adjs) HdNns|DETP Ving HdNns|DETP-CP (and HdNns)
DETP → DETP-CP|DETP-INCP
DETP-CP → ({Adv-pre-num|“another”|{Det|Pro-Poss}({Adv-prenum|“only”(“other”)})})Number|Q|Q-er|(“the”)Q-est| “another”|Detcp|DetQ|Pro-Poss-cp
 DETP-INCP {{{Det|Pro-Poss}|“only”|“a”|“an”|Det-incomp|Pro-Possincomp}(“other”)|(DET-CP)“other”}
 Adjs → AdjP({ “,”|(“,”) Conj}{AdjP|Vparticiple})*
 AdjP → Ordinal|{(Q-er|Q-est}{Adj|Vparticiple}+|Number(“-”){“month”|
“day” | “year”}(“-”) “old”}




12
10.5 Finite-State Parsing Methods
Finite-state rules for detecting noun groups(NG) (Ctnd’)
 HdNns -> HdNn(“and” HdNn)
 HdNn -> PropN|{PreNs|PropNPreNs}N[!Time-NP]
|{PropN CommonN[!Time-NP]}
 PreNs -> PreN(“and” PreN2)*
 PreN -> (Adj”-”)Common-Sing-N
 PreN2 -> PreN|Ordinal|Adj-noun-like
13
10.5 Finite-State Parsing Methods
Fig. 10.20-10.21
14
10.5 Finite-State Parsing Methods
Handling recursion of complete English grammar
 Allowing only a limited amount of recursion
 FASTUS does this by using its automata cascade
 The second level of FASTUS finds non-recursive noun group
 The third level combines these groups into larger NP-like units by
– adding on measure phrases
» 20,000 iron and “metal wood” clubs a month
– Attaching preposition phrases
» Production of 20,000 iron and “metal wood” …
– Dealing with noun group conjunction
» A local concern and a Japanese trading house
=> By splitting the parsing into two levels, NP on the left side is treated as a
different kind of object from NP on the right side
15
10.5 Finite-State Parsing Methods
Handling recursion of complete English grammar
 Chunk-based partial parsing via a set of finite-set cascades(Abney, 1996)
16
10.5 Finite-State Parsing Methods
Handling recursion of complete English grammar
 Recursive Transition Network(RTN)
 RTN is defined by a set of graphs like those in Fig.10.20 and Fig.
10.21
 Each arc contains a terminal or non-terminal node
 Difference between RTN and FSA
– In an RTN, whenever the machine comes to an arc labeled with
a non-terminals, it treats that non-terminal as a subroutine
» It places its current location onto a stack
» It jumps to the non-terminal
» Then it jumps back when that non-terminal has been parsed
 RTN is exactly equivalent to a context-free grammar
– A graphical way to view a simple top-down parser for contextfree rules
17