Recognizing Textual Entailment
Download
Report
Transcript Recognizing Textual Entailment
Recognizing Textual
Entailment Challenge
PASCAL
Suleiman BaniHani
Textual Entailment
Textual entailment recognition: is the
task of deciding, given two text
fragments, whether the meaning of
one text is entailed (can be inferred)
from another text.
Task Definition
Given pairs of small text snippets, referred to as TextHypothesis (T-H) pairs. Build a system that will decide
for each T-H pair whether T indeed entails H or not.
Results will be compared to the manual gold standard
generated by annotators.
Example:
T: Kurdistan Regional Government Prime
Minister Dr. Barham Salih was unharmed
after an assassination attempt.
Prime minister targeted for assassination
Dataset Collection and Application Settings
The dataset of Text-Hypothesis pairs was
collected by human annotators. It consists
of seven subsets
Information Retrieval (IR)
Comparable Documents (CD)
Reading Comprehension (RC)
Question Answering (QA)
Information Extraction (IE)
Machine Translation (MT)
Paraphrase Acquisition (PP)
Approaching textual entailment recognition
Solution approaches can be categorizes
as.
1. Deep analysis or “understanding”
– Using types of linguistic knowledge and
resources to accurately recognize textual
entailment
Patterns of entailment (e.g. lexical relations,
syntactic alternations)
Processing technology (word co-occurrence
statistics, thesaurus, parsing, etc.)
2. Shallow approach
Baseline
Given that half the pairs are FALSE, the
simplest baseline is to label all pairs
FALSE. This will achieve 50% accuracy.
Application of the BLEU (BiLingual
Evaluation Understudy) algorithm
Shallow based on lexical level.
It is based on calculating the percentage of n-grams
for a given translation to the human standard one, a
typical values for N are taken, i.e. 1, 2, 3, 4.
It limits each n-gram appearance to a maximum
frequency.
The result of each n-gram is combined, and a
penalty is added to short text.
Scored 54% for development set, and a 50% in the
test set.
Good results in the CD, bad results in IE and IR.
Problem, does not recognize syntactical or semantics,
such as synonyms and antonyms.
Syntactic similarities
Human annotators were asked to divide the
data set to
True by syntax
False by syntax
Not syntax
Cannot decide
Then using a robust parser to establish the
results.
A partial submission was provided. And
humans were used for the test.
Tree edit distance
The text as well as the Hypothesis is transformed to a
tree using a sentence splitter and a parser to create
the syntactic representation.
A matching module, find the beast sequence of editing
operations to obtain H from T.
Each editing operation (deletion, insertion and
substitution) is given a relative score.
Finally the total score is evaluated, if it exceeds a
certain limit them the pair is labeled as true.
High accuracy for CD but 55% overall accuracy.
Should be enriched by using resources as WordNet
and other libraries.
Dependency Analysis and WordNet
A dependency parser is used to normalized data in
appropriate tree representation.
Then a lexical entailment module is used, where the sub
branches of T an H can be entailed from the other using
Synonymy and similarity
Hyponymy and WordNet entailment, i.e. death entail kill.
Multiwords, i.e. melanoma entails skin-cancer.
Negation and antonymy, where negation is propagated
through tree leaves.
A matching between dependency trees using a matching
algorithm searching for matching branches between T and
H.
Results show high score in CD and a between 42 to 55 % in
other fields
Syntactic Graph Distance: a rule
based and a SVM based approach
Use a graph distance theory, where a graph
is used to represent the H and T pair.
Use similarity measures to determine
entailment
T semantically subsumes H, e.g. H: [The cat
eats the mouse] and T: [the cat devours the
mouse], eat generalizes devour).
T syntactically subsumes H, e.g., H: [The cat
eats the mouse] and T: [the cat eats the mouse
in the garden], T contains a specializing
prepositional phrase).
T directly implies H (e.g., H: [The cat killed the
mouse], T: [the cat devours the mouse]).
Cont.
A rule based system realize the following
Node similarity
Syntactic similarity
Semantic similarity
Applying a machine learning technique to
evaluate the parameters and make the final
decision
Results high for CD .76 and .44-.59 for
others
hierarchical knowledge
representation
A hierarchical logic passed representation o the T H
pairs, where a description logic inspired language is
used, extended feature description login (EFDL) which
is similar to concept graph.
Nodes in the graph represent words or phrases.
Manually generated rewriting rules are used for
semantic and syntactic representations.
A sentence in the text can have different alternatives
The evaluation is based if any of the sentence
representations can infer H.
Results in the system set 64.8 while in the test 56.1,
high CD lowest QA 50%
Logic like formula representation
A parser is used to transfer the pair
T and H to graph, of logical
phrases, where the nodes are the
words and the links are the
relations.
A matching score is given for each
pair of terms.
The theorem proof is used to find
the proof with the lowest coast.
The final cost is evaluated is it is
less than a threshold, then the
entailment is proved.
High results in the CD 79%, Lowest
with MT 47% average 55%.
Atomic Propositions
Find entailment relation by
comparing the atomic proposition
contained in the T and H.
The comparison of the atomic
propositions is done using a
deduction system OTTER.
The atomic propositions are
extracted from the text using a
parser.
WordNet is used for word
relations.
A semantic analyzer is used to
transform the output of the parser
to first order logic.
Low accuracy .5188 especially for
QA 47%.
Combining shallow over lapping
technique with deep theorem
proving
In the shallow stage a simple frequency test of over lapping
words is used.
In the deep stage CCG–parser is used to generate DRS,
discourse representation theory. Which is transformed to
first order logic.
Vampire theorem prover and Paradox where used for
entailment proof.
A knowledge base was used to validate results with real
world.
WordNet
Geographical knowledge from CIA
Generic axioms for, for instance, the semantics of possessives,
active-passives, and locations.
The combined system has accuracy of .562 while the
shallow approach has an accuracy of 0.55.
Applying COGEX logic prover
First use parser to convert into logic.
Then use COGEX, which is a modified version of OTTER.
The prover requires a set of clauses called the “set of
support” which is used to initiate the search for inferences.
The set of support is loaded with the negated form of the
hypothesis as well as the predicates that make up the text
passage.
Another list is required called the usable list, contains
clauses used by OTTER to generate inferences.
The usable list consists of all the axioms that have been
generated either automatically or by hand.
World Knowledge Axioms (Manually)
NLP Axioms(SS and SM)
WordNet Lexical Chains
Overall accuracy .551, a lot of errors in the parsing stage.
Comparing task accuracy
Average Accuracy By Task
80.0
73.3
Accuracy %
70.0
60.0
50.0
47.7
48.3
50.7
51.9
51.2
50.5
QA
MT
PP
IE
IR
RC
40.0
30.0
20.0
10.0
0.0
CD
Task
CD – Comparable Documents
IE – Information Extraction
QA – Question Answering
IR – Information Retrieval
MT – Machine Translation
RC – Reading Comprehension
PP – Paraphrasing
Rsullt comparison
Future work
Search for a candidate parser to
transform NL to first order logic.
Use the largest set of KB to caputre
similarity.
Search for a robust theorem prover.
References
The first PASCAL Recognising Textual
Entailment Challenge (RTE I)
Ido Dagan, Oren Glickman and Bernardo
Magnini. The PASCAL Recognising Textual
Entailment Challenge.
In Proceedings of the PASCAL Challenges
Workshop on Recognising Textual
Entailment, 2005.
http://www.cs.biu.ac.il/~glikmao/rte05/ind
ex.html