Prepositional Phrase Attachment & Generation of Semantic Relation
Ashish Almeida (03M05601)
Guide: Pushpak Bhattacharyya
Problem Definition
• Semantics Extraction
– English to UNL:
• UNL: language-independent knowledge representation
– Some important problems
• Prepositional phrase (PP) attachment
• Semantic head detection
• PRO resolution
• Generation of semantic relations
18 July 2015
2
UNL: Semantics Representation
Universal Networking Language (UNL)
• Knowledge representation through a graph
• Concepts and the relationships among them
• Universal word (UW)
- unique concept
• Relation
- connects two UWs
– Example: He read the book on physics
[Figure: UNL graph for the example, with nodes read, He, book and physics and an edge labelled "modifier"]
Example: PP Attachment
He read the book on physics
[Figure: two parse trees of the sentence, one labelled Correct and one labelled Incorrect, over the nodes read, He, the, book, on, physics]
Overview
• Problem definition
• Previous work
• PP Attachment
• Semantic Head Detection
• PRO resolution in to-infinitival clauses
• Automatic Dictionary Enrichment
• Rules and implementation
• Results & Conclusion
• References
Previous Work
• English to UNL analysis
– P. Bhattacharyya: UNL analysis process
• PP attachment
– Ratnaparkhi: probabilistic approach
– Brill: rule-based approach
• Semantic relations
– P. Pantel: detection of different roles of prepositions
PP Attachment
The Sentence Frame [V-N-P-N]
– [ V-NP1-P-NP2 ]
• Attachment problem (V or NP1)
• NP: simple noun phrase without any embedded clause or prepositional phrase
• Sufficient context information
• Comparison with others’ work
• Example:
He [is reading]V [this book]NP1 [for]P [his exam]NP2.
Solution to PP attachment
- based on argument structure theory.
Argument Structure (AS) of Verb
• Example: He forwarded the mail to John.
– Forward (X, Y)
– Forward (the mail, John)
• The verb takes to-PP as a complement
– The verb also determines the choice of
preposition, i.e., to
• Important clue: the noun after ‘to’ attaches
to verb ‘forward’
Argument Structure: Nouns
• Example: We received [[an invitation] to
the wedding].
– noun attachment
– invitation (wedding)
• Noun ‘invitation’ demands to-PP as an
argument
• Receive (invitation (wedding) )
Augmenting the Dictionary Entries
[forward] “forward(icl>do)” (V, VOA, #_TO_AR2)
– [forward]: English word
– “forward(icl>do)”: UW
– (V, VOA, #_TO_AR2): attributes list
• V: verb
• VOA: action verb
• #_TO_AR2: 2nd argument is a to-prepositional phrase
• Through the attribute #_TO_AR2, the dictionary encodes the knowledge
that the verb ‘forward’ takes a to-PP as its second argument.
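The entry layout above, [head word] "UW" (attribute list), can be read mechanically. The following is a minimal sketch, assuming one entry per line and straight double quotes around the UW; the regular expression and the function name `parse_entry` are illustrative and not part of the UNL tools.

```python
import re

# Hypothetical pattern for one entry: [head-word] "universal word" (attr, attr, ...)
ENTRY_RE = re.compile(
    r'\[(?P<head>[^\]]+)\]\s*"(?P<uw>[^"]+)"\s*\((?P<attrs>[^)]*)\)'
)

def parse_entry(line):
    """Split a dictionary entry into head word, UW and attribute set."""
    m = ENTRY_RE.search(line)
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    attrs = {a.strip() for a in m.group("attrs").split(",") if a.strip()}
    return m.group("head"), m.group("uw"), attrs

head, uw, attrs = parse_entry('[forward] "forward(icl>do)" (V, VOA, #_TO_AR2)')
# The attachment rules only need a membership test on the attribute set.
takes_to_pp = "#_TO_AR2" in attrs
```

Keeping the attributes as a set means that adding a new attribute such as #_TO_AR2 changes neither the number of entries nor the way an entry is looked up.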
PP Attachment
• In [V-N1-P-N2] frame,
– N2 can attach to V or N1
– It depends on argument taking property of both V and N1
• 2 cases: V may or may not demand P-N2
• 2 cases: N1 may or may not demand P-N2
• While attaching N2 to V or N1, priority is given
– first to the argument-hood of V and N1
– second to the neighbor-hood of V and N1
PP Attachment Table
• Four cases, for example for the frame [V-N1-to-N2]:
Case  V demands  N1 demands  N2 attaches to  Example
1     to-PP      to-PP       N1              I can’t easily give an answer to the question.
2     to-PP      No to-PP    V               John gave a flower to Mary.
3     No to-PP   to-PP       N1              She made several minor amendments to her essay.
4     No to-PP   No to-PP    N1              I caught a bus to the coast.
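The four cases reduce to one decision: V wins only when it alone demands the PP; in every other case N1 wins, either by argument-hood or by neighbor-hood. A minimal sketch, assuming the attribute names #_TO_AR2 (verb) and #_TO_AR1 (noun) used elsewhere in these slides; the function `attach_pp` and the attribute sets given to the example words are illustrative assumptions.

```python
def attach_pp(verb_attrs, noun_attrs, prep):
    """Return 'V' or 'N1' as the attachment site of P-N2 in [V-N1-P-N2]."""
    v_demands = f"#_{prep.upper()}_AR2" in verb_attrs
    n_demands = f"#_{prep.upper()}_AR1" in noun_attrs
    # Argument-hood first: V wins only if it demands the PP and N1 does not.
    if v_demands and not n_demands:
        return "V"
    # Otherwise N1: it demands the PP itself, or it is simply the nearer neighbour.
    return "N1"

# Attribute sets below are assumed for illustration, following the table.
give, make, catch = {"V", "#_TO_AR2"}, {"V"}, {"V"}
answer, flower = {"N", "#_TO_AR1"}, {"N"}
amendment, bus = {"N", "#_TO_AR1"}, {"N"}

cases = [
    attach_pp(give, answer, "to"),     # case 1
    attach_pp(give, flower, "to"),     # case 2
    attach_pp(make, amendment, "to"),  # case 3
    attach_pp(catch, bus, "to"),       # case 4
]
```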
Automatic Dictionary
Enrichment
• Oxford Dictionary (OALD): argument
structure
• WordNet: argument structure
• Penn Treebank corpus: PRO control property of verbs
Using Oxford Dictionary
• A typical entry in OALD
– E.g., the second sense of the noun ‘addition’
add•ition noun
……
2 [C] ~ (to sth) a thing that is added to sth else: the latest
addition to our range of cars an addition to the family (=
another child) (NAmE) to build a new addition onto a
house last minute additions to the government’s package
of proposals
The pattern “addition to <something>” indicates that the word ‘addition’
takes a to-PP as an argument.
The feature #_TO_AR1 is therefore added to the attribute list of the noun ‘addition’.
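Extracting this feature from a dictionary sense can be sketched as a pattern match on the "~ (to sth)" complement code. The sketch below assumes the sense text is available as a plain string in the notation shown above; the preposition list, the regular expression and the function name are hypothetical and do not describe the actual extraction tool.

```python
import re

# Assumed notation: "~ (to sth)" or "~ to sth" marks a PP complement of the headword.
# The preposition list is a small illustrative sample, not an exhaustive one.
COMPLEMENT_RE = re.compile(r"~\s*\(?\s*(to|of|for|with|on|in|at|from)\s+sth\)?")

def pp_argument_features(sense_text, arg_position=1):
    """Return attributes such as '#_TO_AR1' for each complement code found."""
    preps = COMPLEMENT_RE.findall(sense_text)
    return {f"#_{p.upper()}_AR{arg_position}" for p in preps}

sense = ("2 [C] ~ (to sth) a thing that is added to sth else: "
         "the latest addition to our range of cars")
features = pp_argument_features(sense)
```

Only the complement code after "~" is matched; the phrase "added to sth else" inside the definition is ignored because it is not preceded by the tilde.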
Semantic Relations
• The semantic relation between a verb and its
argument is an idiosyncratic property of the verb
• Semantic relations of arguments are stored in
the lexicon as features
• Using Beth Levin’s verb classes
– Verbs in same class behave similarly
• syntactically and semantically
• Example:
– Give type verbs: give, lend, pay, sell, refund
• Give - #_TO_AR2_, #_TO_AR2_GOL
Semantic Head Detection
case study - of
Semantic Head Detection
• In case of NP involving [N1-of-N2],
• Syntactically, N1 is head
– University of Mumbai
– Bunch of sticks
• Semantically, N1 or N2 can be head
– Bunch of sticks
– Sticks is semantic head
• qua (sticks, bunch)
Example: Semantic Head
[Figure: two examples with V, N1 and N2 marked]
– Saw the book of physics
– Drank a cup of milk
Partitives
• Dictionary enrichment
• Identified and classified such quantity words
– Numbers: one-third, dozen
– Container: can, cup, bag
– Collection: bundle, group
– Measure: inch, gram
– Indefinite amount: drop, dose
• #PARTITIVE attribute is given to such words
Solution: Semantic Head
detection
• Given the sentence frame [N1 of N2], if N1
has the attribute #PARTITIVE then N2
becomes semantic head
• Quantity (qua) relation is generated.
• For example
– Cup of tea
• qua (tea, cup)
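The rule above is small enough to sketch directly. In the sketch, membership in a word set stands in for the #PARTITIVE dictionary attribute; the set holds the quantity words listed on the Partitives slide plus "bunch" from the earlier example, and the function name `semantic_head` is an assumption made for illustration.

```python
# Words assumed to carry the #PARTITIVE attribute (taken from the slides).
PARTITIVES = {
    "one-third", "dozen",       # numbers
    "can", "cup", "bag",        # containers
    "bundle", "group", "bunch", # collections
    "inch", "gram",             # measures
    "drop", "dose",             # indefinite amounts
}

def semantic_head(n1, n2, partitives=PARTITIVES):
    """For [N1 of N2], return (semantic head, qua relation or None)."""
    if n1.lower() in partitives:
        # N2 becomes the semantic head and a quantity relation is generated.
        return n2, ("qua", n2, n1)
    # Otherwise N1 stays the head; no qua relation is produced here.
    return n1, None

cup_of_tea = semantic_head("cup", "tea")
book_of_physics = semantic_head("book", "physics")
```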
PRO Resolution
in to-infinitival Clauses
What is PRO?
• PRO:
– pronominal, anaphoric
• He wants [to go]IP.
• He_i wants [PRO_i to go].
• Subject of ‘go’ is the same as the subject of ‘want’, i.e. ‘he’
• PRO is co-indexed with the subject ‘he’
PRO: Idiosyncratic
• PRO:
– Subject controlled
• He_i promised me [PRO_i to come for the party].
– Object controlled
• He ordered us_k [PRO_k to finish the work].
• Promise – subject controlled
• Order – object controlled
• Added as an attribute of the verb
PRO Resolution
• If
– the verb has the “sub/obj-controlled-PRO” property
– and has a to-infinitival clause
• Then
– copy the subject/object of the main clause to
the position of PRO and give it the same UW-id
(unique identifier).
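The copy step can be sketched as follows. The attribute name "subject-controlled-pro" appears later in these slides; "object-controlled-pro" is assumed by analogy, and the `Node` class and `resolve_pro` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass(frozen=True)
class Node:
    word: str
    uw_id: str  # unique identifier of the UW instance

def resolve_pro(verb_attrs: Set[str], subject: Node, obj: Optional[Node]) -> Optional[Node]:
    """Return the node that fills PRO in the to-infinitival clause, if any."""
    if "subject-controlled-pro" in verb_attrs:
        controller = subject
    elif "object-controlled-pro" in verb_attrs and obj is not None:
        controller = obj
    else:
        return None  # e.g. arbitrary PRO, which is not handled
    # Copy the controller with the same UW-id, so that both positions
    # refer to the same concept in the resulting graph.
    return Node(controller.word, controller.uw_id)

he, me, us = Node("he", "01"), Node("me", "02"), Node("us", "03")
pro_promise = resolve_pro({"V", "subject-controlled-pro"}, he, me)
pro_order = resolve_pro({"V", "object-controlled-pro"}, he, us)
```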
PRO Realization in UNL
• They promised Mary [to give a party]
Dictionary Enrichment: PRO
• Penn Treebank corpus
• Annotated with co-indexed PRO information
((S
  (NP-SBJ-1 investors)
  (VP continue
    (S (NP-SBJ *-1)
      (VP to
        (VP pour
          (NP cash)
          (PP-DIR into
            (NP money funds))))))
  .))
• NP-SBJ-1 is also the subject of the to-clause (*-1)
• Thus the verb ‘continue’ will get the attribute ‘subject-controlled-pro’
• English WordNet provides sentence frames for verbs,
e.g.: They ____ him to write the letter.
Such a frame indicates that the verb takes a to-infinitive as an argument
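How the control property could be read off such a bracketed tree is sketched below. This is an illustrative reconstruction, not the extraction script used in the work: it assumes that a trace of the form *-k inside the embedded NP-SBJ is co-indexed with a constituent labelled ...-k, and it calls the verb subject-controlled when that constituent is an NP-SBJ. The "object-controlled-pro" label is assumed by analogy.

```python
import re

TREE = """((S
  (NP-SBJ-1 investors)
  (VP continue
    (S (NP-SBJ *-1)
      (VP to
        (VP pour
          (NP cash)
          (PP-DIR into
            (NP money funds))))))
  .))"""

def parse(text):
    """Turn a bracketed tree into nested Python lists."""
    root = []
    stack = [root]
    for tok in re.findall(r"\(|\)|[^\s()]+", text):
        if tok == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif tok == ")":
            stack.pop()
        else:
            stack[-1].append(tok)
    return root

def walk(node):
    """Yield every labelled constituent (a list whose first item is a string)."""
    if isinstance(node, list):
        if node and isinstance(node[0], str):
            yield node
        for child in node:
            yield from walk(child)

def control_verbs(text):
    """Map each verb with a co-indexed empty subject in its to-clause to an attribute."""
    tree = parse(text)
    antecedents = {}  # index -> label of the co-indexed constituent, e.g. '1' -> 'NP-SBJ'
    for node in walk(tree):
        m = re.fullmatch(r"(.+)-(\d+)", node[0])
        if m:
            antecedents[m.group(2)] = m.group(1)
    found = {}
    for node in walk(tree):
        if node[0] != "VP":
            continue
        verb = next((c for c in node[1:] if isinstance(c, str)), None)
        for clause in node[1:]:
            if not (isinstance(clause, list) and clause and clause[0] == "S"):
                continue
            for sub in clause[1:]:
                if (isinstance(sub, list) and len(sub) == 2
                        and sub[0] == "NP-SBJ" and isinstance(sub[1], str)):
                    m = re.fullmatch(r"\*-(\d+)", sub[1])
                    if m and verb and m.group(1) in antecedents:
                        role = "subject" if "SBJ" in antecedents[m.group(1)] else "object"
                        found[verb] = f"{role}-controlled-pro"
    return found

attributes = control_verbs(TREE)
```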
Implementation
UNL system
[Figure: English sentence → Enconvertor → UNL expression; the Enconvertor uses the Dictionary and the Rule-base for English]
Enconvertor: Analysis
• Enconvertor
– Rule-based
– Similar to a Turing machine
– Two analysis heads (windows)
– Many condition heads (windows)
– The heads move over a sentence
• Usually, word by word
Rules: Shift
• Shift (can move left or right)
– Right shift over a sentence by a word
– For instance,
R{V,^#_FOR_AR2:::}{N:::}(PRE,#FOR)P60;
Move to the right (R) over the sentence,
if
the left analysis window {V,^#_FOR_AR2:::} is on a verb which does not
expect a for-PP as second argument (^ indicates negation),
and the right analysis window {N:::} is on a noun,
and the next condition window (PRE,#FOR) matches the preposition ‘for’.
The rule has an absolute priority of 60 (255 is the highest).
Rules: Reduce
• Reduce (delete a node and/or relate it to another node)
– Delete a node and create a relation
<{V,#_FOR_AR2,#_FOR_AR2_rsn:::}{N,FORRES,PRERES::rsn:}P25;
Delete the word under the right analysis window while creating a reason (rsn)
relation with the verb on its left,
if
the left analysis window {V,#_FOR_AR2,#_FOR_AR2_rsn:::} is on a verb
which expects a for-PP as second argument (#_FOR_AR2),
and the right analysis window {N,FORRES,PRERES::rsn:} is on a noun,
which also specifies the rsn relation to be created.
The rule has an absolute priority of 25 (255 is the highest).
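The window conditions and priorities of the shift and reduce rules can be mimicked with attribute sets. This is a deliberately simplified sketch and not the Enconvertor's rule syntax or engine: the rules are hand-translated into dictionaries, "^" is treated as negation as described on the Shift slide, and the highest-priority matching rule is selected. All names in the code are hypothetical.

```python
def window_matches(conditions, attributes):
    """True if every condition holds for the node under a window; '^' negates."""
    for cond in conditions:
        if cond.startswith("^"):
            if cond[1:] in attributes:
                return False
        elif cond not in attributes:
            return False
    return True

# Hand-translated versions of the two example rules (assumed encoding).
RULES = [
    {"action": "R", "left": ["V", "^#_FOR_AR2"], "right": ["N"],
     "next": ["PRE", "#FOR"], "priority": 60},
    {"action": "<", "left": ["V", "#_FOR_AR2", "#_FOR_AR2_rsn"],
     "right": ["N", "FORRES", "PRERES"], "next": [],
     "relation": "rsn", "priority": 25},
]

def select_rule(left, right, nxt, rules=RULES):
    """Return the matching rule with the highest priority, or None."""
    candidates = [
        r for r in rules
        if window_matches(r["left"], left)
        and window_matches(r["right"], right)
        and window_matches(r["next"], nxt)
    ]
    return max(candidates, key=lambda r: r["priority"], default=None)

# A verb that does not take a for-PP, followed by a noun and then 'for': shift right.
shift = select_rule({"V", "VOA"}, {"N"}, {"PRE", "#FOR"})
# A verb that takes a for-PP with reason semantics, next to the for-noun: reduce.
reduce_ = select_rule({"V", "#_FOR_AR2", "#_FOR_AR2_rsn"},
                      {"N", "FORRES", "PRERES"}, set())
```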
Limitations
• Prerequisites:
– Word sense disambiguation
– The dictionary contains all the words of the sentence
• Multiword or named entity detection is
based on dictionary lookup
• Arbitrary PRO is not handled
Results: PP attachment (of and to)
                         Sentences  Correct attachment/UNL  Incorrect  Accuracy %
V-N1-of-N2 (BNC/Oxford)  1000       886                     114        88
V-N1-of-N2 (WSJ data)    661        597                     64         90

                Sentences (Oxford/BNC)  Correct role detection  Correct UNL/attachment/PRO resolution
To preposition  100                     97                      84
To infinitival  100                     93                      77
Results
• Semantic Head Detection
Total (N1-of-N2)                 1140
Total partitives                 197 (17.3%)
Recall (partitives detection)    182 (92%)
• Temporal analysis
#Temporal prepositional phrases  1326
#Cases of correct UNL            1112
Average accuracy                 83.9%
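The percentages in the results slides follow directly from the reported counts. A short check, using only the numbers given in the slides (the helper `pct` is just for illustration):

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

of_bnc = pct(886, 1000)          # V-N1-of-N2, BNC/Oxford
of_wsj = pct(597, 661)           # V-N1-of-N2, WSJ data
partitive_share = pct(197, 1140)
partitive_recall = pct(182, 197)
temporal = pct(1112, 1326)
```

To one decimal place these come out as 88.6, 90.3, 17.3, 92.4 and 83.9, which matches the figures reported in the slides as 88, 90, 17.3%, 92% and 83.9%.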
Error analysis
• Inadequate rules
– Missing rules for common
phenomena lead to wrong UNL
• Errors in attributes assigned to entries in
the dictionary
– Spelling errors, missing attributes, etc.
• Idiomatic constructs
Conclusion
• Future work
– It can be applied to other prepositions
• Special cases like ‘of’ and ‘to’ could be investigated
– Clause attachment can similarly be handled
• Key idea
– Enrichment of the dictionary automatically/semi-automatically
• It involves adding syntactic and semantic level attributes
Resources
• A. S. Hornby. 2006. Oxford Advanced
Learner’s Dictionary of Current English. Oxford
University Press, Oxford.
• Chris Greaves. 2006. Web Concordancer,
http://www.edict.com.hk
• George Miller. 2003. WordNet 2.0.
http://wordnet.princeton.edu/
• M. Marcus, G. Kim and M. Marcinkiewicz. 1994.
The Penn Treebank: annotating predicate-argument structure. ARPA.
References
• UNDL Foundation. 2003. The Universal Networking Language
(UNL) specifications version 3.2. http://www.unlc.undl.org
• Jignashu Parikh, Jagadish Khot, Shachi Dave and Pushpak
Bhattacharyya. 2004. Predicate Preserving Parsing. European
Union Working Conference on Sharing Capability in Localization
and Human Language Technologies (SCALLA04), Kathmandu,
Nepal
• Jane Grimshaw. 1990. Argument Structure. The MIT Press,
Cambridge, Mass.
• E. Brill and P. Resnik. 1994. A Rule-based Approach to
Prepositional Phrase Attachment Disambiguation. Proc. of the
Fifteenth International Conference on Computational Linguistics.
Kyoto.
• Adwait Ratnaparkhi. 1998. Statistical Models for Unsupervised
Prepositional Phrase Attachment. Proceedings of COLING-ACL.
http://www.cis.upenn.edu/ adwait/statnlp.html
Contribution
• R. K. Mohanty, A. Almeida, Srinivas S. and P.
Bhattacharyya. 2004. The complexity of OF.
ICON, Hyderabad, India.
• A. Almeida and P. Bhattacharyya. 2007.
Semantics of ‘to’. ICCTA 2007, Kolkata, India.
• R. K. Mohanty, A. Almeida and P. Bhattacharyya.
2005. Prepositional Phrase Attachment and
Interlingua. CCLING-2005 Workshop, Mexico.
Thanks
Questions asked by reviewers and
answers
Questions - Prof. S. Kaushik
• The lexicon carries a lot of information, which will make the
development of lexicons a very difficult task. Subsequently,
this will make processing slow and inefficient. Comment
on this.
• The entries in the lexicon have the following structure:
• [Head-word] “Universal Word” (attribute list)
• In our work, we have been adding more attributes to this attribute
list. This does not complicate the dictionary. In MT-based systems it is
common practice to have many attributes for each word in the
lexicon. Adding more attributes to the words has no effect on the
number of entries in the dictionary. However, if the dictionary size
increases, dictionary access can be made faster with the help of
database storage and a proper indexing scheme.
• Also, we have tried to address the issue of creating/enriching the
lexicon automatically from an annotated corpus and the Oxford dictionary, to
simplify dictionary creation.
• Are the existing lexicons and rules scalable?
– The existing lexicon and rules are scalable.
– We can add more entries to the lexicon. It uses
indexing, so there will be little difference
in speed, since the access time is
O(log n).
– Rules can also be extended. However, for a
given language (say English) the rules will be
finite in number, so there will not be any
sizable increase in the number of rules.
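The O(log n) access time mentioned in the answer is what a sorted index with binary search provides. A minimal sketch, assuming an in-memory lexicon keyed by head word; the class `SortedLexicon` and the sample entries are illustrative and do not describe the actual dictionary storage.

```python
from bisect import bisect_left

class SortedLexicon:
    """Head words kept in sorted order so that lookup is a binary search."""

    def __init__(self, entries):
        # entries: iterable of (head_word, attribute_set) pairs
        self._items = sorted(entries, key=lambda e: e[0])
        self._keys = [head for head, _ in self._items]

    def lookup(self, head):
        """Return the attribute set of `head`, or None if it is not listed."""
        i = bisect_left(self._keys, head)  # O(log n) comparisons
        if i < len(self._keys) and self._keys[i] == head:
            return self._items[i][1]
        return None

lexicon = SortedLexicon([
    ("give", {"V", "#_TO_AR2"}),
    ("forward", {"V", "VOA", "#_TO_AR2"}),
    ("addition", {"N", "#_TO_AR1"}),
])
forward_attrs = lexicon.lookup("forward")
```

Adding attributes to an entry only enlarges its attribute set; the number of keys, and therefore the search depth, stays the same.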
• Can your approach be extended for other
languages?
– This work was done specifically for English. It
relies heavily on argument structure information
and word properties.
– But the linguistic theory can also be applied
to similar problems in other
languages. The algorithm developed for
attachment can be tried out on languages
whose structure is similar to that of English.
Questions – Prof. SasiKumar
• How significant is the UNL base for the work reported
here? If the translation framework was something else,
how much would that affect the work done?
– UNL is a well-known interlingua. Other interlinguas include
LCS (Lexical Conceptual Structure) by Dorr and Conceptual
Structures. These interlinguas do not have computer information
support, since their representation is complex compared to
UNL. There is also a universal language called Esperanto, but it
lacks precision and is hence difficult to represent on a
computer.
– Any framework will have two parts: enconversion and
deconversion. The difficulty of analysis depends on how deeply the
framework encodes the knowledge. Besides, this work is based
on argument structure theory and the semantic properties of
words. Hence any framework can be used for it.
• What was the methodology adopted for the
analysis reported in chapters 4-7?
– Our approach is based on linguistic theory and
principles. The process involves corpus lookup,
extraction of different syntactic patterns from the
corpus, and their analysis. We relied mainly on
concordance searches on the Brown corpus and the
BNC. Initially, we focused on the analysis of sentences
with only of-PPs. For testing, we used sentences from
the BNC and the WSJ data set used by Ratnaparkhi.
– For the study of partitives, we manually looked for
partitives in the corpus, in addition to using a thesaurus
and the WordNet ontologies.
– For dictionary enrichment, we referred to various
available resources. We explored them to extract
the desired features for the dictionary.
• How do you know if the categories identified for
this analysis are exhaustive? Are there
alternative ways to categorise? Is there a basis
for categorisation?
– For verbs, we used Beth Levin’s work on verb
classification and WordNet. WordNet ontologies are
used for noun categories.
– In the case of prepositions, we tried to categorize
prepositions according to their roles, i.e., temporal,
spatial, manner, etc. But except for temporal, we were
not able to do much work in this direction. We found
that unless we analyse each preposition
individually, it would be difficult to categorize them. So
we chose to do a complete analysis of individual
prepositions. This led us to select very common
prepositions such as ‘of’ and ‘to’.