Syntactic Contributions in the Entailment Task
Lucy Vanderwende,
Arul Menezes,
Rion Snow (Stanford)
RTE-1 analysis
• Recap of MSR’s manual analysis of RTE-1 test data; in
principle, 74% is achievable using syntax and thesaurus
            | Without thesaurus | Using thesaurus
True        | 69 (9%)           | 147 (18%)
False       | 197 (25%)         | 243 (30%)
Not syntax  | 534 (67%)         | 410 (51%)
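The slide does not say how the 74% figure is derived. One reading that reproduces it assumes that every item decidable by syntax (the True and False rows) is answered correctly and that the "Not syntax" items are answered at chance (50%). The function name and the `chance` parameter below are illustrative, not part of the original analysis.

```python
def syntax_upper_bound(true_items: int, false_items: int,
                       not_syntax: int, chance: float = 0.5) -> float:
    """Accuracy (%) if syntax-decidable items are all right and the rest
    are guessed at the given chance rate (an assumption, not from the slide)."""
    total = true_items + false_items + not_syntax
    return 100.0 * (true_items + false_items + chance * not_syntax) / total


with_thesaurus = syntax_upper_bound(147, 243, 410)    # 74.375
without_thesaurus = syntax_upper_bound(69, 197, 534)  # 66.625
```

Under this assumption the thesaurus column gives 74.375%, consistent with the slide's 74%, while syntax alone gives about 66.6%.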
MENT algorithm
Predicting negative entailment using syntactic features:
• Obtain syntactic dependency graphs for the T and H sentences
• Attempt to align each H node to a node in T
• Check syntactic heuristics on the aligned nodes
  – If a heuristic matches, predict false
  – If no heuristic matches, use a lexical similarity model (with a threshold)
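The steps of the MENT algorithm can be sketched as a single prediction function. This is a minimal sketch under strong simplifying assumptions: each dependency graph is reduced to its list of node words (no parser, no edges), alignment is a case-insensitive word match, heuristics are arbitrary callables, and "lexical similarity" is stood in for by the fraction of aligned H nodes. The names `align`, `ment_predict` and the 0.75 threshold are hypothetical, not MSR's implementation.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical simplification: a graph is only its list of node words.
Alignment = Dict[str, Optional[str]]  # H node -> aligned T node, or None
Heuristic = Callable[[List[str], List[str], Alignment], bool]


def align(t_nodes: List[str], h_nodes: List[str]) -> Alignment:
    """Align each H node to a T node with the same lowercased form."""
    t_index = {node.lower(): node for node in t_nodes}
    return {h: t_index.get(h.lower()) for h in h_nodes}


def lexical_similarity(alignment: Alignment) -> float:
    """Stand-in similarity: fraction of H nodes that found a T partner."""
    if not alignment:
        return 0.0
    return sum(t is not None for t in alignment.values()) / len(alignment)


def ment_predict(t_nodes: List[str], h_nodes: List[str],
                 heuristics: List[Heuristic], threshold: float = 0.75) -> bool:
    """Return True for 'entailed', False for 'not entailed'."""
    alignment = align(t_nodes, h_nodes)
    # Any matching syntactic heuristic predicts false.
    if any(rule(t_nodes, h_nodes, alignment) for rule in heuristics):
        return False
    # Otherwise fall back to the thresholded lexical similarity.
    return lexical_similarity(alignment) >= threshold


t = ["Blondlot", "was", "trying", "polarize", "X-rays", "claimed", "discovered"]
h = ["Blondlot", "discovered", "x-rays"]
print(ment_predict(t, h, []))  # True: no heuristics, every H node aligned
```

Keeping the heuristics as a list of callables mirrors the slide's split between a syntactic veto and a lexical fallback; in a real system each callable would inspect the dependency edges.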
MENT: heuristic alignment
MENT: superlative heuristic
Superlative heuristic (100% accurate, 5 test items):
– If the superlatives align, and their heads are aligned, and the
head in Text has any additional modifiers, and those modifiers
are aligned to some modifier in H, say yes, else say no.
(RTE-2 test #477)
• Crater Lake is the deepest lake in the United States, the
second deepest in the Western Hemisphere, and the
seventh deepest in the world, dropping downward to
1,932 feet just southeast of Merriam Cone.
• Crater Lake is the deepest lake in the world.
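A minimal sketch of one reading of the superlative rule, in which the heuristic predicts no when some modifier of the superlative's head in T has no aligned modifier in H. It assumes the superlative and its head noun have already been identified in both sentences, flattens each modifier to a string, and uses case-insensitive matching as a hypothetical stand-in for the real aligner; `NounHead` and `superlative_says_no` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NounHead:
    lemma: str                         # head noun, e.g. "lake"
    superlative: Optional[str] = None  # e.g. "deepest"
    modifiers: List[str] = field(default_factory=list)


def aligned(a: str, b: str) -> bool:
    # Hypothetical stand-in for the real aligner: case-insensitive match.
    return a.lower() == b.lower()


def superlative_says_no(t: NounHead, h: NounHead) -> bool:
    """True means 'predict no'."""
    if t.superlative is None or h.superlative is None:
        return False  # heuristic does not apply
    if not (aligned(t.superlative, h.superlative) and aligned(t.lemma, h.lemma)):
        return False  # superlatives or heads are not aligned
    # Every additional modifier of the T head must align to some H modifier.
    return any(not any(aligned(m, hm) for hm in h.modifiers)
               for m in t.modifiers)


t_phrase = NounHead("lake", "deepest", ["United States"])  # "...in the United States"
h_phrase = NounHead("lake", "deepest", ["world"])          # "...in the world"
print(superlative_says_no(t_phrase, h_phrase))  # True
```

For #477, the modifier "United States" has no counterpart in H's "in the world", so the sketch predicts no.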
MENT: superlative heuristic
Superlative heuristic (100% accurate, 5 test items):
– If the superlatives align, and their heads are aligned, and the
head in Text has any additional modifiers, and those modifiers
are aligned to some modifier in H, say yes, else say no.
(RTE-2 test #477)
• Crater Lake is the deepest lake in the United States, the
second deepest in the Western Hemisphere, and the
seventh deepest lake in the world, dropping downward to
1,932 feet just southeast of Merriam Cone.
• Crater Lake is the deepest lake in the world.
MENT: Counterfactual heuristic
Counterfactual heuristic (80% accurate, 15 test items):
– If there is a pair of aligned nodes and a second pair of aligned nodes,
and the dependency path between them contains a conditional or
counterfactual, say no.
(RTE-2 test #473)
• Blondlot was trying to polarize X-rays when he claimed to
have discovered this new form of radiation.
• Blondlot discovered x-rays.
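A minimal sketch of the path check, assuming T's dependency tree is given as a child-to-head map and that a short, hypothetical list of trigger words stands in for the real conditional and counterfactual cues. The hand-built parse of the Blondlot sentence is simplified and illustrative, not the parser's actual output.

```python
from __future__ import annotations

# Hypothetical trigger words; the slide does not list the actual cues.
COUNTERFACTUAL_TRIGGERS = {"claimed", "if", "would", "unless", "might"}

# Simplified, hand-built parse of the #473 Text: child -> head (root -> None).
BLONDLOT_PARSE: dict[str, str | None] = {
    "trying": None, "Blondlot": "trying", "was": "trying",
    "polarize": "trying", "X-rays": "polarize", "claimed": "trying",
    "when": "claimed", "he": "claimed", "discovered": "claimed",
    "have": "discovered", "radiation": "discovered",
}


def ancestors(parent: dict[str, str | None], node: str) -> list[str]:
    """The node followed by its chain of heads up to the root."""
    chain = []
    current: str | None = node
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def dependency_path(parent: dict[str, str | None], a: str, b: str) -> list[str]:
    """Nodes on the tree path from a to b, endpoints included."""
    up_a, up_b = ancestors(parent, a), ancestors(parent, b)
    in_b = set(up_b)
    common = next(n for n in up_a if n in in_b)  # lowest common ancestor
    down = up_b[: up_b.index(common)]
    return up_a[: up_a.index(common) + 1] + down[::-1]


def counterfactual_says_no(parent: dict[str, str | None], t_a: str, t_b: str) -> bool:
    """t_a, t_b: T sides of two aligned pairs. True means 'predict no'."""
    interior = dependency_path(parent, t_a, t_b)[1:-1]
    return any(node.lower() in COUNTERFACTUAL_TRIGGERS for node in interior)


print(dependency_path(BLONDLOT_PARSE, "Blondlot", "discovered"))
# ['Blondlot', 'trying', 'claimed', 'discovered']
```

The path from "Blondlot" to "discovered" passes through "claimed", so the sketch predicts no for #473; the path from "Blondlot" to "X-rays" contains no trigger.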
MENT: training feature weights
• “run2”: treating a syntactic heuristic match as a
yes/no vote, alignment threshold set using
training data
• “run1”: learning weights (using MaxEnt) for each
syntactic and alignment heuristic, as well as for
sub-components of these heuristics
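"run1" learns a MaxEnt weight for each heuristic and sub-component. For a binary yes/no decision, MaxEnt is equivalent to logistic regression, so a minimal sketch can train one with batch gradient ascent on the L2-regularized log-likelihood. The three feature columns (superlative fired, counterfactual fired, alignment score), the toy data and the hyperparameters are invented for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(entailment | x); the last weight is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_maxent(X: List[List[float]], y: List[int], epochs: int = 500,
                 lr: float = 0.5, l2: float = 0.01) -> List[float]:
    """Batch gradient ascent; returns feature weights followed by the bias."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            for j in range(n):
                grad[j] += err * xi[j]
            grad[n] += err
        for j in range(n + 1):
            penalty = l2 * w[j] if j < n else 0.0  # bias is not regularized
            w[j] += lr * (grad[j] / len(X) - penalty)
    return w


# Invented toy data: [superlative_fired, counterfactual_fired, alignment_score]
TOY_X = [[1, 0, 0.9], [0, 1, 0.8], [0, 0, 0.9], [0, 0, 0.95], [0, 0, 0.3], [1, 0, 0.5]]
TOY_Y = [0, 0, 1, 1, 0, 0]  # 1 = entailed
weights = train_maxent(TOY_X, TOY_Y)
```

Because every toy item where a heuristic fires is labeled no, those heuristics receive negative weights; unlike the hard veto in "run2", they then only lower the entailment probability instead of forcing the answer.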
MENT: results
Accuracy (%):
                       | Run1 (with feature weights) | Run2
Training (1717 sents)  | 67.79                       | 65.40
Dev (450 sents)        | 66.22                       | 63.77
RTE2 test (800 sents)  | 60.25                       | 58.50
            | RUN1: Yes | RUN1: No
TRUTH: Yes  | 268       | 132
TRUTH: No   | 186       | 214
MENT Run1 says no 43.25% of the time
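The accuracy and "says no" percentages quoted on these slides follow directly from the confusion matrices. A minimal sketch, with cells keyed by (truth, system output); the names `accuracy` and `says_no_rate` are illustrative.

```python
from typing import Dict, Tuple

# Cells keyed by (truth, system output).
Confusion = Dict[Tuple[str, str], int]

RUN1: Confusion = {
    ("yes", "yes"): 268, ("yes", "no"): 132,
    ("no", "yes"): 186, ("no", "no"): 214,
}


def accuracy(cm: Confusion) -> float:
    """Percentage of items where the system output matches the truth."""
    correct = cm[("yes", "yes")] + cm[("no", "no")]
    return 100.0 * correct / sum(cm.values())


def says_no_rate(cm: Confusion) -> float:
    """Percentage of items the system labels 'no', whatever the truth."""
    said_no = cm[("yes", "no")] + cm[("no", "no")]
    return 100.0 * said_no / sum(cm.values())


print(accuracy(RUN1), says_no_rate(RUN1))  # 60.25 43.25
```

Since the RTE-2 test set is balanced (400 yes, 400 no), a system that says no more often than 50% of the time is trading recall on the yes class for precision on the no class.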
MENT variations – no thresholds
• If heuristics apply, say no
• Else say yes
• 56% accurate
• System says no 35% of the time
            | SYSTEM: Yes | SYSTEM: No
TRUTH: Yes  | 284         | 116
TRUTH: No   | 236         | 164
• Say no, unless everything is aligned and no heuristics apply
• 59.25% accurate
• System says no 74.5% of the time
            | SYSTEM: Yes | SYSTEM: No
TRUTH: Yes  | 139         | 261
TRUTH: No   | 65          | 335
Note: Run2 = if no heuristics apply and the alignment score is above a threshold
trained on the training set, say yes; else say no. Accuracy: 58.50%
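The three decision rules compared on this slide differ only in what must hold before the system says yes. A minimal sketch, assuming each item has already been reduced to a boolean `fired` (some syntactic heuristic applied), an `aligned_fraction` of H nodes aligned, and an alignment `score`; these inputs and the function names are hypothetical.

```python
def heuristics_only(fired: bool) -> bool:
    """'If heuristics apply, say no; else say yes.' True means yes."""
    return not fired


def all_aligned(fired: bool, aligned_fraction: float) -> bool:
    """'Say no, unless everything is aligned and no heuristics apply.'"""
    return not fired and aligned_fraction == 1.0


def run2(fired: bool, score: float, threshold: float) -> bool:
    """Run2: yes only if no heuristic applies and the score is above the threshold."""
    return not fired and score > threshold


print(heuristics_only(False), all_aligned(False, 0.9), run2(False, 0.9, 0.7))
# True False True
```

The rules move from permissive (yes unless vetoed) to strict (no unless fully aligned), which matches the rise in the "says no" rate from 35% to 74.5%; Run2 sits between them, with the threshold controlling the trade-off.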
MENT variations – with threshold
• With learned alignment and syntactic heuristic weights, and with the
alignment threshold from training, say no
• Else say yes
• 60.25% accurate
• System says no 43% of the time
            | RUN1: Yes | RUN1: No
TRUTH: Yes  | 268       | 132
TRUTH: No   | 186       | 214
• Say no, unless the alignment score is above an Oracle threshold and no
heuristics apply
• 61.25% accurate
• System says no 70% of the time
            | SYSTEM: Yes | SYSTEM: No
TRUTH: Yes  | 168         | 232
TRUTH: No   | 75          | 325
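An oracle threshold is chosen using the gold labels of the evaluation data itself, so its accuracy is best read as an upper bound for this decision rule rather than a result a deployed system could reach. A minimal sketch of such an oracle search, assuming per-item alignment scores, heuristic flags and gold labels are available (all hypothetical inputs), tries every observed score as the cut-off.

```python
from typing import Sequence, Tuple


def oracle_threshold(scores: Sequence[float], fired: Sequence[bool],
                     gold: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximizing accuracy against gold labels.

    Rule: say yes only if no heuristic fired and score > threshold.
    """
    best_t, best_acc = float("-inf"), -1.0
    for t in [float("-inf")] + sorted(set(scores)):
        preds = [(not f) and s > t for s, f in zip(scores, fired)]
        acc = sum(p == g for p, g in zip(preds, gold)) / len(gold)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


print(oracle_threshold([0.2, 0.4, 0.6, 0.8], [False] * 4,
                       [False, False, True, True]))  # (0.4, 1.0)
```

Running the same search separately on training, dev and test data and comparing the chosen cut-offs is one simple way to see the instability of lexical-similarity thresholds noted in the lessons.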
Lessons?
• Use syntactic heuristics and sub-components as
features and apply discriminative training
• Thresholding for lexical similarity isn’t stable
across data sets
• Error analysis …
  – bad parses (e.g., RTE-2 test #550)
How far do you take syntactic heuristics?
Location:
For a pair of aligned verb nodes, if there is an argument in H and that
argument is aligned to a node in T, say no if that node is not also the
same argument of the aligned verb (applied 7 times, 5 incorrect)
• Brandenburg Gate is one of Berlin's best known
landmarks and is now regarded as one of the greatest
symbols of German unity.
• Brandenburg Gate is in Berlin.
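A minimal sketch of the location check, assuming the verb pair is already aligned and each verb's arguments are given as a role-to-word map; the role label `"loc"`, the function name and the hand-simplified analysis of the Brandenburg Gate pair are hypothetical.

```python
from typing import Dict, Optional


def location_says_no(
    t_verb_args: Dict[str, str],
    h_verb_args: Dict[str, str],
    alignment: Dict[str, Optional[str]],
    role: str = "loc",  # hypothetical label for the location argument
) -> bool:
    """True means 'predict no'. Both verbs are assumed already aligned."""
    h_arg = h_verb_args.get(role)
    if h_arg is None:
        return False  # H verb has no argument in this role
    t_node = alignment.get(h_arg)
    if t_node is None:
        return False  # H argument is unaligned, so the check does not apply
    # Say no if the aligned T node is not the same argument of the T verb.
    return t_verb_args.get(role) != t_node


# Hand-simplified: in T, "Berlin's" modifies "landmarks",
# not a location argument of the aligned verb "is".
t_args = {"nsubj": "Gate"}
h_args = {"nsubj": "Gate", "loc": "Berlin"}
alignment = {"Gate": "Gate", "Berlin": "Berlin's"}
print(location_says_no(t_args, h_args, alignment))  # True
```

Here the sketch says no even though the Text does support "Brandenburg Gate is in Berlin", illustrating how a strict argument-role check can misfire when the location is expressed as a possessive rather than as an argument of the verb.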
A great heuristic … but
Unaligned Verb: if there is an aligned subject and an aligned object but
their verb is not aligned, say no
• This heuristic was not used because of its poor
performance, for example:
– Rodriguez told detectives he never touched the burning
backpack, which was loaded with plastic pipes packed with
gunpowder and BBs.
– The burning backpack contained plastic pipes packed with
gunpowder and BBs.
• Need to learn paraphrase similarity for verbs – see
NAACL-HLT paper forthcoming.
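A minimal sketch of the unaligned-verb check, with an optional rescue step showing where learned verb paraphrase similarity could plug in. The alignment map, the list of T verbs and the `PARAPHRASES` pair are hypothetical, hand-picked for the Rodriguez example rather than learned.

```python
from typing import AbstractSet, Dict, FrozenSet, Iterable, Optional


def unaligned_verb_says_no(
    h_subj: str,
    h_verb: str,
    h_obj: str,
    alignment: Dict[str, Optional[str]],
    t_verbs: Iterable[str] = (),
    paraphrases: AbstractSet[FrozenSet[str]] = frozenset(),
) -> bool:
    """True means 'predict no'."""
    if alignment.get(h_subj) is None or alignment.get(h_obj) is None:
        return False  # needs both an aligned subject and an aligned object
    if alignment.get(h_verb) is not None:
        return False  # the verb is aligned, so nothing to flag
    # Optional rescue: a known verb paraphrase in T counts as an alignment.
    if any(frozenset({h_verb, v}) in paraphrases for v in t_verbs):
        return False
    return True


alignment = {"backpack": "backpack", "pipe": "pipe", "contain": None}
t_verbs = ["tell", "touch", "load", "pack"]
PARAPHRASES = {frozenset({"contain", "load"})}  # hypothetical, not learned
print(unaligned_verb_says_no("backpack", "contain", "pipe", alignment, t_verbs))
# True
```

Without the paraphrase pair the check says no, the error shown on the slide; with `paraphrases=PARAPHRASES` it no longer fires, which is the role a learned verb-similarity model would play.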
Directions and Plans
• MSR submission available at http://research.microsoft.com/~lucyv/
Might it be possible to have access to all sites’ submissions?
• Need to learn paraphrase similarity for verbs
• More feature engineering
• Different graph-matching strategies to avoid brittleness of syntactic
heuristics
• Find more data for training to build more stable systems
A plug for Pyramids
SCU name, given by annotator: candidate hypothesis?
Contributors (indented under each SCU name): candidate Text?
• Conservatives oppose any form of devolution.
  – The conservatives are opposed to devolution.
  – The UK’s Tory Prime Minister adamantly resisted calls for devolution of
    British rule.
• Scotts want self-rule
  – … as buoyed as most Scotts by North Ireland’s prospective self-rule
  – Wales is following Scotland, and moving towards a call for an elected
    assembly with devolved powers …
• A self-governing Wales would be part of the EU
  – … an independent Wales within the European community
  – … Wales could participate directly in forthcoming EC meetings …
  – … a fully self-governing Wales within the European Community.
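The slide suggests that an SCU name could serve as a candidate hypothesis and each contributor as a candidate Text. A minimal sketch of that pairing, assuming SCUs are available as a name plus a list of contributor strings; the function name is hypothetical, and any pairs produced this way would still need checking before use as entailment data.

```python
from typing import List, Tuple


def scu_to_candidate_pairs(scu_name: str,
                           contributors: List[str]) -> List[Tuple[str, str]]:
    """Pair each contributor (candidate Text) with the SCU name (candidate hypothesis)."""
    pairs = []
    for contributor in contributors:
        text = contributor.strip(" …")  # drop elision marks around fragments
        pairs.append((text, scu_name))
    return pairs


wales = scu_to_candidate_pairs(
    "A self-governing Wales would be part of the EU",
    ["… an independent Wales within the European community",
     "… a fully self-governing Wales within the European Community."],
)
```

Each (Text, hypothesis) pair inherits the annotator's judgment that the contributor expresses the SCU, which is what makes Pyramid annotations a possible source of additional training data.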