Caroline Brun and Caroline Hagege
written by C. Hagège and C.Brun / July 2003/ page 1
Resources for paraphrase detection
Caroline Hagège
[email protected]
Caroline Brun
[email protected]
Resource types
1. Derivational morphology
2. Deep syntax
3. Domain-specific resources
1. Derivational morphology
Use of the CELEX database (distributed by the LDC)
http://www.kun.nl/celex/index.html
• Automatic extraction of verbs and corresponding deverbal nouns
• Suffixes: +OR, +ER, +ION
• Manual revision of the extracted pairs in order to classify the kind
of relation (predicate) that holds between them
• Predicates relating nouns and verbs from the same morphological
family (~1600 pairs)
Predicate types:
S0 : The noun paraphrases the action expressed by the verb.
e.g. S0(acceleration,accelerate)
S1H : The noun corresponds to the first actant of the action
expressed by the verb and has a human:+ feature.
e.g. S1H(writer,write)
1. Derivational morphology (cont'd)
S1NH : The noun corresponds to the first actant of the action
expressed by the verb and has a human:~ feature.
e.g. S1NH(abbreviation,abbreviate)
S2 : The noun corresponds to the second actant of the action
expressed by the verb.
e.g. S2(affirmation,affirm)
• Automatic extraction of nouns and corresponding adjectives
• Suffix: +AN
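The predicate types above can be pictured as a small lookup table from deverbal nouns to verbs. The following is only a minimal sketch, assuming a plain Python list rather than the actual CELEX-derived resource; the names `DerivPair`, `PAIRS` and `verb_for_noun` are hypothetical, and only the four example pairs from the slides are included.

```python
from typing import NamedTuple, Optional, Tuple


class DerivPair(NamedTuple):
    """One noun-verb pair typed by a predicate (hypothetical structure)."""
    predicate: str  # one of S0, S1H, S1NH, S2
    noun: str
    verb: str


# Example pairs taken from the slides; the real resource has about 1600.
PAIRS = [
    DerivPair("S0", "acceleration", "accelerate"),
    DerivPair("S1H", "writer", "write"),
    DerivPair("S1NH", "abbreviation", "abbreviate"),
    DerivPair("S2", "affirmation", "affirm"),
]


def verb_for_noun(noun: str) -> Optional[Tuple[str, str]]:
    """Return (predicate, verb) for a known deverbal noun, else None."""
    for pair in PAIRS:
        if pair.noun == noun:
            return pair.predicate, pair.verb
    return None
```

A normalization rule could then consult `verb_for_noun("writer")` to relate "the writer of X" to "write X", with the predicate type telling it which actant the noun stands for.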
2. Deep syntax
• Use of the Comlex lexicon (Grishman et al. 1994) in order to extract
the logical subjects/objects of infinitives.
Example 1 “He ordered Peter to go”
SUBJ-N(order,he), OBJ-N(order,Peter), SUBJ-N(go,Peter)
Example 2 “He promised Peter to go”
SUBJ-N(promise,he), OBJ-N(promise,Peter), SUBJ-N(go,he)
• Active-Passive transformation
• Use of verb class alternations (Levin 1993)
Example 3 “Acetone burns easily”
SUBJ-N(burn,VARIABLE), OBJ-N(burn,acetone)
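Examples 1 and 2 differ only in which argument of the main verb becomes the logical subject of the infinitive. The sketch below illustrates that choice under stated assumptions: the `CONTROL` dictionary is a hypothetical stand-in for the information extracted from Comlex, and `normalize_infinitive` is an illustrative name, not part of XIP.

```python
# Hypothetical control lexicon: which argument of the main verb is the
# logical subject of the infinitive. In the slides this comes from Comlex.
CONTROL = {
    "order": "object",     # "He ordered Peter to go"  -> Peter goes
    "promise": "subject",  # "He promised Peter to go" -> he goes
}


def normalize_infinitive(main_verb, subject, obj, infinitive):
    """Return normalized dependencies as (name, head, dependent) tuples."""
    deps = [
        ("SUBJ-N", main_verb, subject),
        ("OBJ-N", main_verb, obj),
    ]
    # Pick the controller of the infinitive from the lexicon entry.
    controller = obj if CONTROL[main_verb] == "object" else subject
    deps.append(("SUBJ-N", infinitive, controller))
    return deps
```

Calling `normalize_infinitive("order", "he", "Peter", "go")` yields the three dependencies of Example 1, and the same call with `"promise"` yields those of Example 2.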
The general normalization grammar requires about 120 rules that
exploit the derivational morphology and deep syntax resources.
3. Domain-specific resources
Hand-made resources, directly encoded as XIP rules
• Creation of specific relations between lexical items (about 30
relations)
SYNONYMY relation, e.g. odor-smell
HASN relation, e.g. evaporate-volatility
TURNTO relation, e.g. evaporate-vapor
ISAJ relation, e.g. burn-burnable
• Elaboration of XIP rules exploiting these relations and the
normalized syntactic analysis (about 150 rules)
If ( SUBJ-N(#1[lem:have],#2) & OBJ-N(#1,#3) &
HASN(#4,#3) )
PROPERTY(#2,#4)
This rule gives equivalent representations to “X has volatility” and
“X evaporates”, to “X has flammability” and “X burns”, etc.
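The effect of the PROPERTY rule can be approximated outside XIP as a pattern match over dependency tuples. This is only a rough sketch, not XIP itself: dependencies are plain `(name, head, dependent)` tuples keyed on lemmas rather than nodes, `property_rule` is a hypothetical name, and the `HASN` pair for burn-flammability is an assumption inferred from the second example (the slides list only evaporate-volatility).

```python
# HASN(predicate, noun) pairs. evaporate-volatility is from the slides;
# burn-flammability is an assumption made for this illustration.
HASN = {
    ("evaporate", "volatility"),
    ("burn", "flammability"),
}


def property_rule(deps):
    """Derive PROPERTY(x, pred) from 'x has <noun>' when HASN(pred, noun)."""
    derived = []
    for name1, verb, subj in deps:
        # Match SUBJ-N(#1[lem:have], #2)
        if name1 != "SUBJ-N" or verb != "have":
            continue
        for name2, verb2, obj in deps:
            # Match OBJ-N(#1, #3) on the same verb
            if name2 != "OBJ-N" or verb2 != verb:
                continue
            # Match HASN(#4, #3) and emit PROPERTY(#2, #4)
            for pred, noun in sorted(HASN):
                if noun == obj:
                    derived.append(("PROPERTY", subj, pred))
    return derived
```

With the input for "acetone has volatility", that is `SUBJ-N(have,acetone)` and `OBJ-N(have,volatility)`, the function derives `PROPERTY(acetone,evaporate)`, the representation that a separate rule would have to assign to "acetone evaporates" for the two sentences to be matched as paraphrases.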