Transcript document

recovering empty categories
Penn Treebank
• The Penn Treebank Project annotates naturally
occurring text for linguistic structure. It produces skeletal
parses showing rough syntactic and semantic
information: a bank of linguistic trees. It annotates text
with POS tags.
• Bracketing (syntax and predicates, as opposed to POS tagging alone):
(Mary) (visited a very nice boy) (1)
(A very nice boy) (visited Mary) (2)
(1)  (S (NP Mary) (VP (V visited) (NP (ART a) (ADJP
(ADV very) (ADJ nice)) (N boy))))
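The bracketed notation is easy to process mechanically. Below is a minimal Python sketch, not part of any Treebank tool. It assumes the simplified style of these slides, where the label follows "(" directly and words appear as bare tokens. It reads tree (1) into nested `Node` objects (an illustrative name), from which the words and the constituent labels can be read back.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(text: str) -> Node:
    """Read one bracketed tree, e.g. "(S (NP Mary) (VP ...))"."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)  # brackets and words
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        node = Node(tokens[pos + 1])  # the token after '(' is the label
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ')'
        return node

    root = parse_node()
    if pos != len(tokens):
        raise ValueError("extra material after the tree")
    return root


def words(node: Node) -> List[str]:
    """Left-to-right terminal string of the tree."""
    out: List[str] = []
    for child in node.children:
        out.extend(words(child) if isinstance(child, Node) else [child])
    return out


def labels(node: Node) -> List[str]:
    """Constituent labels in pre-order (top-down, left to right)."""
    out = [node.label]
    for child in node.children:
        if isinstance(child, Node):
            out.extend(labels(child))
    return out


tree = parse("(S (NP Mary) (VP (V visited) (NP (ART a) (ADJP (ADV very) "
             "(ADJ nice)) (N boy))))")
print(words(tree))   # ['Mary', 'visited', 'a', 'very', 'nice', 'boy']
print(labels(tree))  # ['S', 'NP', 'VP', 'V', 'NP', 'ART', 'ADJP', 'ADV', 'ADJ', 'N']
```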
syntactic tags
• ADJP - Adjective phrase. Example: “outrageously expensive”.
• ADVP - Adverb phrase. Examples: “rather timidly”, “very well indeed”.
• NP - Noun phrase.
• PNP - Proper noun phrase.
• PP - Prepositional phrase.
• S - Simple declarative clause.
• SBAR - Clause introduced by a subordinating conjunction.
• SBARQ - Direct question introduced by a wh-word or wh-phrase.
• SINV - Inverted declarative sentence, one in which the subject follows the verb.
• SQ - That part of an SBARQ that excludes the wh-word or wh-phrase.
• VP - Verb phrase. Phrasal category headed by a verb.
• WHADVP - Wh-adverb phrase. Examples: “how”, “where”.
• WHNP - Wh-noun phrase. Examples: “who”, “whose daughter”, “which book”.
• WHPP - Wh-prepositional phrase. Example: “on what”.
• QP - Quantifier phrase used within NP.
• X - Constituent of unknown or uncertain type.
examples
• adverb and preposition: (S (NP He) was (VP (ADVP very
hurriedly) throwing (NP clothes) (PP into (NP a
suitcase))) .)
• apposition: (NP (NP Mr. Smith) , (ADJP (NP 65 years)
old) , (NP chairman (PP of (NP the board))))
• comparative: (S (NP He) (VP is (ADJP as tall (SBAR
as (S (NP John) (VP is))))) .)
(S (SBAR (X the sooner)
(S our vans hit the road)) ,
(S (X the easier)
(S we will fulfill that obligation)) .)
function tags
• Subject and predicate NPs: (S (NP-SBJ I) (VP
consider (S (NP-SBJ Kris) (NP-PRD a fool))))
• Benefactive: (S (NP-SBJ I) (VP baked (NP a cake)
(PP-BNF for (NP Doug))))
• ADV (adverbial noun: “a little bit”), VOC (vocative), DTV
(dative), DIR (direction with PP like from-to), LOC
(locative with PP), MNR (manner), TMP (temporal), CLR
(closely related: predication adjuncts or phrasal verbs),
HLN (headline or dateline), TTL (title), etc.
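Labels such as NP-SBJ-1, PP-TMP=2 or ADVP-LOC pack the syntactic category, the function tags and any index into one string. A small sketch of splitting them, assuming the label shape CATEGORY(-TAG)*(-N)?(=N)? seen on these slides. `Label` and `parse_label` are illustrative names, and labels from the full PTB such as PRP$ or -NONE- would need a looser pattern.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Label(NamedTuple):
    category: str
    functions: Tuple[str, ...]
    index: Optional[int]      # "-N": coindexation, e.g. NP-SBJ-1
    gap_index: Optional[int]  # "=N": gapping coindexation, e.g. NP=2


# Assumed shape: CATEGORY, then any -TAG parts, then optional -N and =N.
LABEL_RE = re.compile(r"^([A-Z]+)((?:-[A-Z]+)*)(?:-(\d+))?(?:=(\d+))?$")


def parse_label(label: str) -> Label:
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unexpected label: {label!r}")
    category, tags, index, gap = m.groups()
    return Label(
        category,
        tuple(t for t in tags.split("-") if t),  # "-SBJ" -> ("SBJ",)
        int(index) if index else None,
        int(gap) if gap else None,
    )


for lab in ["NP-SBJ-1", "PP-TMP=2", "PP-BNF", "NP=2", "S-ADV"]:
    print(lab, parse_label(lab))
```

Keeping the index separate from the function tags matters later: the index links an element to its antecedent, while the tags carry the grammatical function or semantic role.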
gapping
• gap coindexing:
(S (S (NP-SBJ-1 Mary) (VP likes (NP-2 Bach))) and
(S (NP-SBJ=1 Susan) , (NP=2 Beethoven)))
• predicate-argument structure: LIKES(Mary, Bach) & LIKES(Susan, Beethoven)
• (S (NP-SBJ I)
(VP (VP eat (NP-1 breakfast)
(PP-TMP-2 in (NP the morning)))
and (VP (NP=1 lunch)
(PP-TMP=2 in (NP the afternoon)))))
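The gap coindexation is what lets the predicate-argument structure be read off. Each =N remnant in the gapped conjunct replaces the constituent marked -N in the full conjunct, and the missing verb is copied over. A minimal sketch over a hypothetical flattened representation, not over real trees: a predicate name, arguments shared outside the coordination, and index-to-phrase maps. The predicate name EAT for the second example is only notation in the style of LIKES.

```python
from typing import Dict, List, Sequence, Tuple

Prop = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)


def resolve_gapping(
    predicate: str,
    shared: Sequence[str],
    full: Dict[int, str],
    gapped: Dict[int, str],
) -> List[Prop]:
    """Build one proposition per conjunct.

    full:   index N -> phrase, for constituents marked -N in the full conjunct
    gapped: index N -> phrase, for remnants marked =N in the gapped conjunct
    shared: arguments outside the coordination (e.g. a common subject)
    """
    first = tuple(full[i] for i in sorted(full))
    # Each =N remnant replaces the -N constituent; the verb is copied.
    merged = {**full, **gapped}
    second = tuple(merged[i] for i in sorted(merged))
    return [(predicate, tuple(shared) + first),
            (predicate, tuple(shared) + second)]


def render(props: List[Prop]) -> str:
    return " & ".join(f"{p}({', '.join(args)})" for p, args in props)


mary = resolve_gapping("LIKES", [], {1: "Mary", 2: "Bach"},
                       {1: "Susan", 2: "Beethoven"})
print(render(mary))  # LIKES(Mary, Bach) & LIKES(Susan, Beethoven)

meals = resolve_gapping("EAT", ["I"], {1: "breakfast", 2: "in the morning"},
                        {1: "lunch", 2: "in the afternoon"})
print(render(meals))
# EAT(I, breakfast, in the morning) & EAT(I, lunch, in the afternoon)
```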
empty categories
• Empty categories or null elements are used for non-local
dependencies, discontinuous constituents, and missing elements.
Where they have an antecedent, they are coindexed with it in the same sentence.
• In addition, if a node has a particular grammatical function (such as
subject) or semantic role (such as location), it has a function tag
indicating that role; empty categories may also have function tags.
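• NP * - arbitrary or controlled PRO; trace of NP movement
• *T* - trace of A′ (A-bar) movement, e.g. wh-movement (WHNP)
• 0 - null complementizer (e.g. an omitted “that”); also a null wh-element (WHNP 0, WHADVP 0)
• *U* - unit: interpreted position of a unit symbol such as $ or %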
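The four kinds of empty element listed above can be spotted as leaves of a bracketed tree. A sketch, assuming the slide style where empty elements appear as bare leaves (the released PTB puts them under a -NONE- tag); note that it would misread a literal numeral 0 as the null element. The reported position counts the overt words to the left, which is the kind of string position used when scoring recovery. `Empty` and `find_empties` are illustrative names.

```python
import re
from typing import List, NamedTuple, Optional

# *T* and *U* must be tried before the bare *.
EMPTY_RE = re.compile(r"^(\*T\*|\*U\*|\*|0)(?:-(\d+))?$")


class Empty(NamedTuple):
    kind: str             # "*", "*T*", "*U*" or "0"
    index: Optional[int]  # e.g. 1 for *T*-1
    parent: str           # label of the bracket containing it
    position: int         # number of overt words to its left


def find_empties(bracketed: str) -> List[Empty]:
    stack: List[str] = []  # labels of the currently open brackets
    found: List[Empty] = []
    overt = 0
    expect_label = False
    for tok in re.findall(r"\(|\)|[^\s()]+", bracketed):
        if tok == "(":
            expect_label = True
        elif tok == ")":
            stack.pop()
        elif expect_label:
            stack.append(tok)
            expect_label = False
        elif (m := EMPTY_RE.match(tok)) is not None:
            kind, idx = m.groups()
            found.append(Empty(kind, int(idx) if idx else None, stack[-1], overt))
        else:
            overt += 1  # an ordinary word
    return found


for e in find_empties("(NP (NP a movie) (SBAR (WHNP-1 0) (S (NP-SBJ *) "
                      "(VP to (VP see (NP *T*-1))))))"):
    print(e)
```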
indexing & *T* examples
• Indices are used to express coreference, binding (wh-movement), and close association (it-extraposition).
• (S (NP-SBJ Willie)
(VP knew (SBAR (WHNP-1 who)
(S (NP-SBJ *T*-1)
(VP threw (NP the ball))))))
• (SBARQ (WHNP-1 what)
(SQ are (NP-SBJ you)
(VP thinking (PP-CLR about (NP *T*-1)))) ?)
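Resolving a *T* amounts to matching the trace's index with the constituent that carries the same index in its label. A minimal sketch, assuming each index is attached to exactly one labelled node (the node whose label ends in -N); `parse`, `words` and `resolve_traces` are illustrative names, and the tree is kept as plain (label, children) tuples.

```python
import re
from typing import List, Optional, Tuple, Union

Tree = Tuple[str, list]  # (label, children); a child is a Tree or a word


def parse(text: str) -> Tree:
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def node(i: int) -> Tuple[Tree, int]:
        label, i, kids = tokens[i + 1], i + 2, []
        while tokens[i] != ")":
            if tokens[i] == "(":
                kid, i = node(i)
            else:
                kid, i = tokens[i], i + 1
            kids.append(kid)
        return (label, kids), i + 1

    tree, _ = node(0)
    return tree


def words(t: Union[Tree, str]) -> List[str]:
    if isinstance(t, str):
        return [t]
    return [w for kid in t[1] for w in words(kid)]


INDEXED = re.compile(r"-(\d+)$")      # a label carrying an index, e.g. WHNP-1
TRACE = re.compile(r"^\*T\*-(\d+)$")  # a wh-trace leaf, e.g. *T*-1


def resolve_traces(tree: Tree) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Pair each *T*-N with the label and words of the node labelled ...-N."""
    antecedents = {}
    traces = []

    def walk(t: Tree) -> None:
        label, kids = t
        m = INDEXED.search(label)
        if m:
            antecedents[int(m.group(1))] = (label, " ".join(words(t)))
        for kid in kids:
            if isinstance(kid, str):
                tm = TRACE.match(kid)
                if tm:
                    traces.append((kid, int(tm.group(1))))
            else:
                walk(kid)

    walk(tree)
    return [(trace, *antecedents.get(i, (None, None))) for trace, i in traces]


willie = parse("(S (NP-SBJ Willie) (VP knew (SBAR (WHNP-1 who) "
               "(S (NP-SBJ *T*-1) (VP threw (NP the ball))))))")
print(resolve_traces(willie))  # [('*T*-1', 'WHNP-1', 'who')]
```

The whole tree is walked before traces are looked up, so an antecedent is found whether it precedes the trace (as with wh-movement) or not.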
NP *
• object of passive verb:
(S (NP-SBJ-1 John) (VP was (VP hit (NP *-1) (PP by
(NP a ball)))))
• reduced relative clause:
(NP (NP an agreement) (VP signed (NP *) (PP by (NP
everyone))))
• subjects of participial clauses and gerunds:
(S (NP-SBJ-1 I) (VP stopped (S (NP-SBJ *-1) (VP
eating (NP chocolate)))))
(S (NP *) (VP Having (VP carefully considered (NP
his options))))
• adverbial: (S (NP-SBJ-1 She) (VP left , (S-ADV (NP-SBJ-2 *-1) (VP offended (NP *-2) (PP by (NP their
remarks))))))
0 and *U*
• that: (S (NP-SBJ I) (VP believe (SBAR 0 (S (NP-SBJ
he) (VP will (VP stay))))))
• WHNP 0: (NP (NP a movie) (SBAR (WHNP-1 0) (S (NP-SBJ *) (VP to (VP see (NP *T*-1))))))
• WHADVP 0: (S (NP-SBJ That) (VP is (NP-PRD (NP a
good way) (SBAR (WHADVP-1 0) (S (NP-SBJ *) (VP to
(VP keep (ADJP-PRD warm) (ADVP-MNR *T*-1))))))))
• units:
(NP US$ 5 *U*)
(NP (QP between 12% to 13%) *U*)
recovery of empty categories
In [Campbell 2004], recovery refers to:
• detection: locating empty categories in the parse tree
• resolution: coindexing them with their antecedents and assigning them function tags.
The approach is NOT learning- or corpus-based, but rule-based (syntactic rules).
algorithm for recovery
• Walk the tree top-down. At each node X, try to insert every empty category c: if X meets the (rule-based) syntactic context for c, insert c and assign function tags. If c is NP *, try to find its antecedent.
• rule to insert NP *:
 if X is a passive VP and X has no complement S:
   if X has a postmodifying dangling PP Y (a PP with no object): insert NP * before the postmodifiers of Y
   else: insert NP * before the postmodifiers of X
 else if X is a non-finite S and X has no subject:
   insert NP-SBJ * after all premodifiers of X
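The top-down walk and the NP * rule can be sketched in Python as below. This is an illustration under loud assumptions, not Campbell's implementation. The head position of each node and the passive/finiteness decisions are supplied by hand (the hypothetical fields `head`, `passive`, `finite`), standing in for the detectors the method depends on. "Before the postmodifiers" is read as "right after the head", "after all premodifiers" as "right before the head", and the PP case is read as a dangling PP, i.e. one with no NP object. Coindexation is left to the resolution step.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)
    head: int = 0          # index of the head child (assumed: from a head finder)
    passive: bool = False  # assumed: set by a passive detector
    finite: bool = True    # assumed: set by a finiteness detector


def category(label: str) -> str:
    return re.split(r"[-=]", label)[0]  # "NP-SBJ-1" -> "NP"


def phrases(x: Node) -> List[Node]:
    return [c for c in x.children if isinstance(c, Node)]


def dangling_pp(x: Node) -> Optional[Node]:
    """A PP after the head of X that has no NP object, e.g. 'looked (PP into)'."""
    for c in x.children[x.head + 1:]:
        if (isinstance(c, Node) and category(c.label) == "PP"
                and not any(category(k.label) == "NP" for k in phrases(c))):
            return c
    return None


def insert_np_star(x: Node) -> bool:
    """Try the NP * rule at node X; return True if an element was inserted."""
    if (category(x.label) == "VP" and x.passive
            and not any(category(k.label) == "S" for k in phrases(x))):
        y = dangling_pp(x)
        target = y if y is not None else x
        # "before the postmodifiers" = directly after the head child
        target.children.insert(target.head + 1, Node("NP", ["*"]))
        return True
    if (category(x.label) == "S" and not x.finite
            and not any("-SBJ" in k.label for k in phrases(x))):
        # "after all premodifiers" = directly before the head (the VP)
        x.children.insert(x.head, Node("NP-SBJ", ["*"]))
        x.head += 1
        return True
    return False


def recover(x: Node) -> None:
    """Top-down walk: try the rule at X, then at each child phrase."""
    insert_np_star(x)
    for c in phrases(x):
        recover(c)


def to_str(n: Union[Node, str]) -> str:
    if isinstance(n, str):
        return n
    return f"({n.label} " + " ".join(to_str(c) for c in n.children) + ")"


# Passive VP with an ordinary PP postmodifier: NP * goes right after "hit".
hit = Node("VP", ["hit", Node("PP", ["by", Node("NP", ["a", "ball"])])],
           passive=True)
john = Node("S", [Node("NP-SBJ", ["John"]), Node("VP", ["was", hit])], head=1)
recover(john)
print(to_str(john))
# (S (NP-SBJ John) (VP was (VP hit (NP *) (PP by (NP a ball)))))

# Non-finite S without a subject: NP-SBJ * goes in front of the VP.
eating = Node("S", [Node("VP", ["eating", Node("NP", ["chocolate"])])],
              finite=False)
recover(eating)
print(to_str(eating))  # (S (NP-SBJ *) (VP eating (NP chocolate)))
```

As on the slides, the rule itself uses no lexical information; everything it needs is in the labels and the (assumed) detector outputs.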
parameters
• rules make no use of lexical information.
• only some function words (auxiliaries, infinitival to), but no content words.
• for WHADVP: check the head of the relative clause (the wh-word) and add the
corresponding function tag to *T* (why: PRP, how: MNR, when: TMP, where: LOC,
etc.)
(NP (NP the country) (SBAR (WHADVP-1 where) (S
(NP-SBJ I) (VP live (ADVP-LOC *T*-1)))))
“time to go” ?
• the method depends on the system’s ability to detect
passives, infinitives, modifiers, and functional information
such as subjects.
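The WHADVP case can be sketched as a lookup from the wh-word to a function tag, which is then added to the coindexed ADVP trace. A string-based sketch, assuming the bracket style of the "the country where I live" example (a trace written as (ADVP *T*-N) before tagging) and the mapping given on the slide; `WH_FUNCTION` and `tag_whadvp_traces` are illustrative names. A null WHADVP (0) offers no wh-word to look up, so it is left untouched, which is exactly the open “time to go” case.

```python
import re

# Mapping from the slide; "where" -> LOC follows the slide's example.
WH_FUNCTION = {"why": "PRP", "how": "MNR", "when": "TMP", "where": "LOC"}


def tag_whadvp_traces(bracketed: str) -> str:
    """Add a function tag to each ADVP *T*-N coindexed with an overt WHADVP-N."""
    for index, word in re.findall(r"\(WHADVP-(\d+) (\w+)\)", bracketed):
        tag = WH_FUNCTION.get(word.lower())
        if tag is None:
            continue  # e.g. a null WHADVP (0): no lexical cue to use
        bracketed = bracketed.replace(
            f"(ADVP *T*-{index})", f"(ADVP-{tag} *T*-{index})"
        )
    return bracketed


before = ("(NP (NP the country) (SBAR (WHADVP-1 where) (S (NP-SBJ I) "
          "(VP live (ADVP *T*-1)))))")
print(tag_whadvp_traces(before))
# (NP (NP the country) (SBAR (WHADVP-1 where) (S (NP-SBJ I) (VP live (ADVP-LOC *T*-1)))))
```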
more rules
• an extra rule inserts NP * as subject of imperative:
(S (NP-VOC Chris) ,
(NP-SBJ *) (VP go (ADVP-DIR home)) !)
• to find antecedents of NP *:
If NP * is not a subject, its antecedent is the local subject (“John was hit
(NP *) by a ball”).
If NP * is the subject of a non-finite S, search the tree for
another NP subject (“I stopped (NP-SBJ *) eating
chocolate”).
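The two antecedent rules can be sketched over a tree with parent links. Assumptions, all mine rather than the slides': an NP * is a node whose only child is *, a subject is any node whose label contains -SBJ, a clause is any node of category S, and a subject NP * is taken to sit in a non-finite S (finiteness is not checked). When an antecedent is found it receives the next free index and the NP * is rewritten as *-N, reproducing the coindexed trees on the "NP *" slide. All function names are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def parse(text: str) -> Node:
    """Read a bracketed tree, keeping a parent link on every node."""
    stack: List[Node] = []
    root, expect_label = None, False
    for tok in re.findall(r"\(|\)|[^\s()]+", text):
        if tok == "(":
            expect_label = True
        elif tok == ")":
            root = stack.pop()
        elif expect_label:
            node = Node(tok, parent=stack[-1] if stack else None)
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
            expect_label = False
        else:
            stack[-1].children.append(tok)
    return root


def to_str(n: Union[Node, str]) -> str:
    if isinstance(n, str):
        return n
    return f"({n.label} " + " ".join(to_str(c) for c in n.children) + ")"


def category(label: str) -> str:
    return re.split(r"[-=]", label)[0]


def nodes(n: Node) -> Iterator[Node]:
    yield n
    for c in n.children:
        if isinstance(c, Node):
            yield from nodes(c)


def nearest_clause(n: Optional[Node]) -> Optional[Node]:
    while n is not None and category(n.label) != "S":
        n = n.parent
    return n


def subject_of(clause: Optional[Node]) -> Optional[Node]:
    kids = clause.children if clause is not None else []
    return next((c for c in kids if isinstance(c, Node) and "-SBJ" in c.label), None)


def find_antecedent(e: Node) -> Optional[Node]:
    if "-SBJ" not in e.label:
        # non-subject NP *: the local subject
        return subject_of(nearest_clause(e.parent))
    # subject NP * (assumed non-finite S): subject of the next clause up
    return subject_of(nearest_clause(e.parent.parent))


def resolve_np_stars(root: Node) -> None:
    """Coindex each NP * with its antecedent, writing it as *-N."""
    used = [int(m.group(1)) for n in nodes(root)
            if (m := re.search(r"-(\d+)$", n.label))]
    next_index = max(used, default=0) + 1
    for e in list(nodes(root)):
        if category(e.label) != "NP" or e.children != ["*"]:
            continue
        ante = find_antecedent(e)
        if ante is None:
            continue  # e.g. reduced relatives: no clause above
        m = re.search(r"-(\d+)$", ante.label)
        if m:
            idx = int(m.group(1))
        else:
            idx, next_index = next_index, next_index + 1
            ante.label += f"-{idx}"
        e.children = [f"*-{idx}"]


passive = parse("(S (NP-SBJ John) (VP was (VP hit (NP *) (PP by (NP a ball)))))")
resolve_np_stars(passive)
print(to_str(passive))
# (S (NP-SBJ-1 John) (VP was (VP hit (NP *-1) (PP by (NP a ball)))))

gerund = parse("(S (NP-SBJ I) (VP stopped (S (NP-SBJ *) (VP eating (NP chocolate)))))")
resolve_np_stars(gerund)
print(to_str(gerund))
# (S (NP-SBJ-1 I) (VP stopped (S (NP-SBJ *-1) (VP eating (NP chocolate)))))
```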
evaluation
• perfect input: PTB w/o empty categories.
correct recovery: label + string position.
Precision (P): correct empty categories / empty categories detected
Recall (R): correct empty categories / empty categories in the corpus
F1 = 2PR / (P + R)
• perfect input w/o function tags.
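The scores can be computed by treating each empty category as a (label, string position) pair, so that a recovered item counts as correct only if both match. A minimal sketch; the pair representation and the name `precision_recall_f1` are assumptions, duplicates are handled as multisets, and the gold/detected lists are toy data made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

EmptyCategory = Tuple[str, int]  # (label, string position)


def precision_recall_f1(
    gold: Iterable[EmptyCategory], detected: Iterable[EmptyCategory]
) -> Tuple[float, float, float]:
    gold_c, det_c = Counter(gold), Counter(detected)
    # Correct = same label at the same string position (multiset intersection).
    correct = sum((gold_c & det_c).values())
    p = correct / sum(det_c.values()) if det_c else 0.0
    r = correct / sum(gold_c.values()) if gold_c else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


gold = [("*", 3), ("*T*", 4), ("0", 2)]
detected = [("*", 3), ("*T*", 5)]  # toy data: the *T* is at the wrong position
p, r, f1 = precision_recall_f1(gold, detected)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.50 R=0.33 F1=0.40
```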
• Charniak’s parser output as input (a PCFG parser trained and
tested on the PTB).
correct recovery: label + string position.
• lower results, caused by parser errors and by the lack of
function tags in parser output.
learning & lexical-based?
• we need lexical information in some cases: in VP (S (VP to …) …),
is the empty subject of S an NP * or an NP *T*?
“I’d like (NP-SBJ *) to have.”
“Everyone seems (NP-SBJ *) to dislike him.”
“John designed telescopes (NP-SBJ *T*) to sit on Kitt
Peak.”
“We bought a broom (NP-SBJ *) to sweep the floor with
(NP *T*).”
• in the last two examples, the to-infinitive expresses purpose (PRP).
• combined learning + rule-based approach for the function or subject
tags of NP * and for their antecedents (resolution).