FALCON: Boosting Knowledge for Answer Engines


Semantic Inference for Question Answering
Sanda Harabagiu
Department of Computer Science
University of Texas at Dallas
and
Srini Narayanan
International Computer Science Institute
Berkeley, CA
Outline

Part I. Introduction: The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations

Part II. Extracting Semantic Relations from Questions and Texts
• Knowledge-intensive techniques
• Supervised and unsupervised techniques
Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types
Outline

Part IV. From Ontologies to Inference
• From OWL to CPRM
• FrameNet in OWL
• FrameNet to CPRM mapping

Part V. Results of Event Structure Inference for QA
• AnswerBank examples
• Current results for Inference Type
• Current results for Answer Structure
The need for Semantic Inference in QA

Some questions are complex!

Example: How can a biological weapons program be detected?
Answer: In recent months, Milton Leitenberg, an expert on
biological weapons, has been looking at this murkiest and most
dangerous corner of Saddam Hussein's armory. He says a series of
reports add up to indications that Iraq may be trying to develop a
new viral agent, possibly in underground laboratories at a military
complex near Baghdad where Iraqis first chased away inspectors
six years ago. A new assessment by the United Nations suggests
Iraq still has chemical and biological weapons - as well as the
rockets to deliver them to targets in other countries. The UN
document says Iraq may have hidden a number of Scud missiles,
as well as launchers and stocks of fuel. US intelligence believes
Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors.
Complex questions

Example: How can a biological weapons program be detected?

This question is complex because:
• It is a manner question
• All other manner questions that were evaluated in TREC asked about three things:
  - manners of dying, e.g. "How did Cleopatra die?", "How did Einstein die?"
  - manners of getting a new name, e.g. "How did Cincinnati get its name?"
  - manners of saying something in another language, e.g. "How do you say house in Spanish?"
• The answer does not contain any explicit manner-of-detection information; instead, it talks about reports that give indications that Iraq may be trying to develop a new viral agent and assessments by the United Nations suggesting that Iraq still has chemical and biological weapons
Complex questions and semantic information

Complex questions are not characterized only by a question class (e.g. manner questions):
• Example: How can a biological weapons program be detected?
• Associated with the pattern "How can X be detected?"
• And the topic X = "biological weapons program"

Processing complex questions is also based on access to the semantics of the question topic:
• The topic is modeled by a set of discriminating relations, e.g. Develop(program); Produce(biological weapons); Acquire(biological weapons) or Stockpile(biological weapons)
• Such relations are extracted from topic-relevant texts
Alternative semantic representations

Using PropBank to access a one-million-word corpus annotated with predicate-argument structures (www.cis.upenn.edu/~ace).

We can train a generative model for recognizing the arguments of each predicate in questions and in the candidate answers.

Example: How can a biological weapons program be detected?
Predicate: detect
Argument 0 = detector: Answer(1)
Argument 1 = detected: biological weapons
Argument 2 = instrument: Answer(2)
(Answer(1) and Answer(2) mark the expected answer type.)
More predicate-argument structures for questions

Example: From which country did North Korea import its missile launch pad metals?
Predicate: import
Argument 0 (role = importer): North Korea
Argument 1 (role = commodity): missile launch pad metals
Argument 2 (role = exporter): ANSWER

Example: What stimulated India's missile programs?
Predicate: stimulate
Argument 0 (role = agent): ANSWER (part 1)
Argument 1 (role = thing increasing): India's missile programs
Argument 2 (role = instrument): ANSWER (part 2)
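Structures like these lend themselves to a simple data representation in which the arguments that the answer must fill are marked explicitly. The following is only an illustrative sketch; the class and field names are invented, not the tutorial's actual code.

```python
# Minimal sketch of a question's predicate-argument structure, with the
# argument slots the answer must fill marked explicitly (names are assumptions).
from dataclasses import dataclass, field

@dataclass
class PredicateArgumentStructure:
    predicate: str                                    # e.g. "import"
    arguments: dict = field(default_factory=dict)     # arg label -> filler text
    answer_args: list = field(default_factory=list)   # arg labels the answer fills

q_pas = PredicateArgumentStructure(
    predicate="import",
    arguments={"Arg0": "North Korea",                 # importer
               "Arg1": "missile launch pad metals"},  # commodity
    answer_args=["Arg2"],                             # exporter = ANSWER
)

# An answer passage whose "import"/"export" predicate has Arg2 filled with a
# country name would then supply the expected answer.
print(q_pas.predicate, q_pas.answer_args)
```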
Additional semantic resources

Using FrameNet
• frame-semantic descriptions of several thousand English lexical items with semantically annotated attestations (www.icsi.berkeley.edu/~framenet)

Example: What stimulated India's missile programs?
Frame: STIMULATE
Frame Element CIRCUMSTANCES: ANSWER (part 1)
Frame Element EXPERIENCER: India's missile programs
Frame Element STIMULUS: ANSWER (part 2)
Frame: SUBJECT STIMULUS
Frame Element CIRCUMSTANCES: ANSWER (part 3)
Frame Element COMPARISON SET: ANSWER (part 4)
Frame Element EXPERIENCER: India's missile programs
Frame Element PARAMETER: nuclear/biological proliferation
Semantic inference for Q/A

The problem of classifying questions
• E.g. "manner questions", such as "How did Hitler die?"

The problem of recognizing answer types/structures
• Should "manner of death" be considered an answer type?
• What other manners of events/actions should be considered as answer types?

The problem of extracting/justifying/generating answers to complex questions
• Should we learn to extract "manner" relations?
• What other types of relations should we consider?
• Is relation recognition sufficient for answering complex questions? Is it necessary?
Manner-of-death

In previous TREC evaluations 31 questions asked about manner of death, e.g. "How did Adolf Hitler die?"

State-of-the-art solution (LCC):
• We considered MANNER-OF-DEATH as an answer type, pointing to a variety of verbs and nominalizations encoded in WordNet
• We developed text mining techniques for identifying such information based on lexico-semantic patterns from WordNet
• Example: [kill #sense1 (verb)] – CAUSE -> [die #sense1 (verb)]
  The troponyms of the [kill #sense1 (verb)] concept are candidates for the MANNER-OF-DEATH hierarchy, e.g. drown, poison, strangle, assassinate, shoot
Practical Hurdle

Not all MANNER-OF-DEATH concepts are lexicalized as a verb, so we set out to determine additional patterns that capture such cases.

Goal: (1) a set of patterns and (2) dictionaries corresponding to such patterns, obtained with a well known IE technique (Riloff & Jones, IJCAI'99):

Pattern: X DIE / be killed in ACCIDENT
  seed (ACCIDENT): train accident, car wreck
Pattern: X DIE / be killed {from|of} DISEASE
  seed (DISEASE): cancer, AIDS
Pattern: X DIE after suffering / suffering of MEDICAL CONDITION
  seed (MEDICAL CONDITION): stroke, complications caused by diabetes

Results: 100 patterns were discovered
Outline

Part I. Introduction: The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations
Answer types in State-of-the-art QA systems

[Pipeline diagram: the Question goes through Question Expansion and Answer Type Prediction (driven by an Answer Type Hierarchy and features); IR over the document collection returns a ranked set of passages; Answer Selection uses the predicted answer type to produce the Answer.]

Answer type prediction
• Labels questions with an answer type based on a taxonomy
• Classifies questions (e.g. by using a maximum entropy model)
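A minimal sketch of that classification step, using scikit-learn's logistic regression as a stand-in for a maximum entropy model; the training questions and labels below are invented toy data, not the tutorial's training set.

```python
# Answer-type prediction as question classification (max-ent stand-in).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_questions = [
    "How did Cleopatra die?",
    "When did the war between Iran and Iraq end?",
    "Who was Secretary of Defense during the Gulf War?",
    "From which country did North Korea import missile metals?",
]
train_answer_types = ["MANNER", "DATE", "PERSON", "COUNTRY"]

# Bag-of-words and bigram features feed a multinomial logistic regression.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_questions, train_answer_types)

print(clf.predict(["Who was the leader of Iraq?"]))  # expected 'PERSON' on this toy data
```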
In Question Answering two heads are better than one

The idea originated in IBM's PIQUANT project.

Traditional Q/A systems employ a pipeline approach:
• Question analysis
• Document/passage retrieval
• Answer selection

Questions are classified based on the expected answer type, and answers are also selected based on the expected answer type, regardless of the question class.

Motivated by the success of ensemble methods in machine learning, multiple classifiers produce the final output for an ensemble made of multiple QA agents: a multi-strategy, multi-source approach.
Multiple sources, multiple agents

[Architecture diagram: the QUESTION is handled by Question Analysis, which produces a Q-Frame, QGoals, and the Answer Type; a QPlan Generator and QPlan Executor dispatch the question to multiple answering agents (Predictive Annotation, Statistical, Definitional Q., KSP-Based, and Pattern-Based answering agents). The agents draw on knowledge sources such as WordNet, Cyc, the Web, CNS, TREC, and AQUAINT via a Knowledge Source Portal offering semantic search and keyword search. Candidate answers are combined by Answer Classification and Answer Resolution to yield the final ANSWER.]
Multiple Strategies

In PIQUANT, the answer resolution strategies assume that different combinations of question processing, passage retrieval and answer selection from different agents are ideal.

This entails that all questions are processed depending on the question class, not the question type:
• There are multiple question classes, e.g. "What" questions asking about people, "What" questions asking about products, etc.
• There are only three types of questions that have been evaluated in systematic ways so far:
  - Factoid questions
  - Definition questions
  - List questions

Another option is to build an architecture in which question types are processed differently, and the semantic representations and inference mechanisms are adapted for each question type.
The Architecture of LCC's QA System

[Architecture diagram: Question Processing (question parse, keyword extraction, named entity recognition with CICERO LITE, recognition of the expected answer type against a WordNet-based answer type hierarchy, and pattern matching) routes factoid, list, and definition questions to Document Processing, where passage retrieval over the AQUAINT document collection index produces single factoid passages, multiple list passages, or multiple definition passages. Factoid Answer Processing performs answer extraction, answer justification with a theorem prover over an axiomatic knowledge base, and answer reranking after a semantic transformation; List Answer Processing performs answer extraction with a threshold cutoff; Definition Answer Processing performs answer extraction by pattern matching against a pattern repository. The outputs are the factoid answer, list answer, and definition answer.]
Extracting Answers for Factoid Questions

In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions.
The Named Entity Recognizer was responsible for 234 of them:

QUANTITY: 55          ORGANIZATION: 15    PRICE: 3
NUMBER: 45            AUTHORED WORK: 11   SCIENCE NAME: 2
DATE: 35              PRODUCT: 11         ACRONYM: 1
PERSON: 31            CONTINENT: 5        ADDRESS: 1
COUNTRY: 21           PROVINCE: 5         ALPHABET: 1
OTHER LOCATIONS: 19   QUOTE: 5            URI: 1
CITY: 19              UNIVERSITY: 3
Special Case of Names

Questions asking for names of authored works:
1934: What is the play "West Side Story" based on?  Answer: Romeo and Juliet
1976: What is the motto for the Boy Scouts?  Answer: Driving Miss Daisy
1982: What movie won the Academy Award for best picture in 1989?  Answer: Driving Miss Daisy
2080: What peace treaty ended WWI?  Answer: Versailles
2102: What American landmark stands on Liberty Island?  Answer: Statue of Liberty
NE-driven QA

The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of Named Entities:
• Precision of recognition
• Coverage of name classes
• Mapping into concept hierarchies
• Participation in semantic relations (e.g. predicate-argument structures or frame semantics)
Concept Taxonomies

For 29% of questions the QA system relied on an off-line taxonomy with semantic classes such as:
• Disease
• Drugs
• Colors
• Insects
• Games

The majority of these semantic classes are also associated with patterns that enable their identification.
Definition Questions

They asked about:
• PEOPLE (most of them starting with "Who")
• other types of NAMES
• general concepts
People questions

• Many use the PERSON name in the format [First name, Last name]
  - examples: Aaron Copland, Allen Iverson, Albert Ghiorso
• Some names had the PERSON name in the format [First name, Last name1, Last name2]
  - example: Antonia Coello Novello
• Other names had the name as a single word -> a very well known person
  - examples: Nostradamus, Absalom, Abraham
• Some questions referred to names of kings or princes:
  - examples: Vlad the Impaler, Akbar the Great
Answering definition questions

Most QA systems use between 30 and 60 patterns. The most popular patterns (Id, Pattern, Freq., Usage, Question):

• Id 25, pattern "person-hyponym QP", freq. 0.43%
  Usage: "The doctors also consult with former Italian Olympic skier Alberto Tomba, along with other Italian athletes"
  Question 1907: Who is Alberto Tomba?
• Id 9, pattern "QP, the AP", freq. 0.28%
  Usage: "Bausch Lomb, the company that sells contact lenses, among hundreds of other optical products, has come up with a new twist on the computer screen magnifier"
  Question 1917: What is Bausch & Lomb?
• Id 11, pattern "QP, a AP", freq. 0.11%
  Usage: "ETA, a Basque language acronym for Basque Homeland and Freedom _ has killed nearly 800 people since taking up arms in 1968"
  Question 1987: What is ETA in Spain?
• Id 13, pattern "QP, an AP", freq. 0.02%
  Usage: "The kidnappers claimed they are members of the Abu Sayaf, an extremist Muslim group, but a leader of the group denied that"
  Question 2042: Who is Abu Sayaf?
• Id 21, pattern "AP such as QP", freq. 0.02%
  Usage: "For the hundreds of Albanian refugees undergoing medical tests and treatments at Fort Dix, the news is mostly good: Most are in reasonable good health, with little evidence of infectious diseases such as TB"
  Question 2095: What is TB?
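As a rough sketch of how appositive patterns like "QP, the AP" or "AP such as QP" can be applied once the question phrase (QP) is known, the regular expressions below stand in for the much larger pattern inventories real systems use; the function and pattern set are illustrative assumptions, not the patterns above.

```python
# Toy application of two definition patterns to a candidate sentence.
import re

def match_definition(question_phrase: str, sentence: str):
    qp = re.escape(question_phrase)
    patterns = [
        rf"{qp},\s+(?:the|a|an)\s+([^,_]+)",   # "QP, the AP" / "QP, a AP" / "QP, an AP"
        rf"([\w\s]+)\s+such as\s+{qp}",        # "AP such as QP"
    ]
    for pat in patterns:
        m = re.search(pat, sentence, flags=re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return None

print(match_definition(
    "Bausch Lomb",
    "Bausch Lomb, the company that sells contact lenses, among hundreds of other optical products"))
# -> "company that sells contact lenses" (real systems constrain the AP much more tightly)
```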
Complex questions

• Characterized by the need for domain knowledge
• There is no single answer type that can be identified; rather, an answer structure needs to be recognized
• Answer selection becomes more complicated, since inference based on the semantics of the answer type needs to be activated
• Complex questions need to be decomposed into a set of simpler questions
Example of Complex Question

How have thefts impacted on the safety of Russia's nuclear navy, and has the theft problem been increased or reduced over time?

Need for domain knowledge: to what degree do different thefts put nuclear or radioactive materials at risk?

Question decomposition:
• Definition questions:
  - What is meant by nuclear navy?
  - What does 'impact' mean?
  - How does one define the increase or decrease of a problem?
• Factoid questions:
  - What is the number of thefts that are likely to be reported?
  - What sort of items have been stolen?
• Alternative questions:
  - What is meant by Russia? Only Russia, or also former Soviet facilities in non-Russian republics?
The answer structure

For complex questions, the answer structure has a compositional semantics, comprising the answer structures of each simpler question into which it is decomposed.

Example:
Q-Sem: How can a biological weapons program be detected?
Question pattern: How can X be detected?
X = Biological Weapons Program

Conceptual Schemas:
• INSPECTION Schema: Inspect, Scrutinize, Monitor, Detect, Evasion, Hide, Obfuscate
• POSSESSION Schema: Acquire, Possess, Develop, Deliver

Structure of Complex Answer Type EVIDENCE: CONTENT, SOURCE, QUALITY, JUDGE, RELIABILITY
Answer Selection

Based on the answer structure.

Example:
Structure of Complex Answer Type EVIDENCE: CONTENT, SOURCE, QUALITY, JUDGE, RELIABILITY

The CONTENT is selected based on the Conceptual Schemas:
• INSPECTION Schema: Inspect, Scrutinize, Monitor, Detect, Evasion, Hide, Obfuscate
• POSSESSION Schema: Acquire, Possess, Develop, Deliver
Conceptual schemas are instantiated when predicate-argument structures or semantic frames are recognized in the text passages.

The SOURCE is recognized when the content source is identified.

The Quality of the judgements, the Reliability of the judgements and the Judgements themselves are produced by an inference mechanism.
Answer Structure

ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Answer Structure (continued)
ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
possess(Iraq, delivery systems(type : scud missiles; launchers; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
possess(Iraq, fuel stock(purpose: power launchers))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
hide(Iraq, Seeker: UN Inspectors; Hidden: CBW stockpiles & guided missiles)
Justification: DETECTION Schema Inspection status: Past; Likelihood: Medium
Answer Structure (continued)
ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Source: UN documents, US intelligence
SOURCE.Type: Assesment reports; Source.Reliability: Med-high Likelihood: Medium
Judge: UN, US intelligence, Milton Leitenberg (Biological Weapons expert)
JUDGE.Type: mixed; Judge.manner; Judge.stage: ongoing
Quality: low-medium; Reliability: low-medium;
State-of-the-art QA: Learning surface text patterns

Pioneered by Ravichandran and Hovy (ACL-2002). The idea is that, given a specific answer type (e.g. BirthDate), we learn all surface patterns that enable the extraction of the answer from any text passage. The approach relies on Web redundancy.

Patterns are learned by two algorithms:

Algorithm 1 (Generates Patterns)
Step 1: Select an answer type AT and a question Q(AT)
Step 2: Generate a query (Q(AT) & AT) and submit it to a search engine (Google, AltaVista)
Step 3: Download the first 1000 documents
Step 4: Select only those sentences that contain the question content words and the AT
Step 5: Pass the sentences through a suffix tree constructor
Step 6: Extract only the longest matching substrings that contain the AT and the question word it is syntactically connected with

Algorithm 2 (Measures the Precision of Patterns)
Step 1: Query by using only the question Q(AT)
Step 2: Download the first 1000 documents
Step 3: Select only those sentences that contain the question word connected to the AT
Step 4: Compute C(a) = #patterns matched by the correct answer; C(o) = #patterns matched by any word
Step 5: The precision of a pattern is given by C(a)/C(o)
Step 6: Retain only patterns matching > 5 examples
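A minimal sketch of the precision step in Algorithm 2, assuming the learned patterns have already been turned into regular expressions with the question term substituted; the data and regex below are toy stand-ins, not Ravichandran and Hovy's implementation.

```python
# Precision of a pattern: C(a)/C(o), keeping only patterns with > 5 matches.
import re

def pattern_precision(pattern_regex, sentences, correct_answer, min_matches=5):
    c_any, c_correct = 0, 0
    for s in sentences:
        m = re.search(pattern_regex, s)
        if m:
            c_any += 1                                   # C(o): matched anything
            if m.group("answer").strip() == correct_answer:
                c_correct += 1                           # C(a): matched the answer
    if c_any <= min_matches:                             # retain only frequent patterns
        return None
    return c_correct / c_any

# BIRTH-YEAR pattern "<NAME> (<ANSWER>- )" for NAME = Mozart, as a regex.
sentences = ["Mozart (1756-1791) was a composer."] * 6 + ["Mozart (Salzburg) hall opened."]
print(pattern_precision(r"Mozart \((?P<answer>[^)-]+)", sentences, "1756"))  # ~0.86
```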
Results and Problems

Some results:

Answer Type = INVENTOR:
  <ANSWER> invents <NAME>
  the <NAME> was invented by <ANSWER>
  <ANSWER>'s invention of the <NAME>
  <ANSWER>'s <NAME> was
  <NAME>, invented by <ANSWER>
  That <ANSWER>'s <NAME>

Answer Type = BIRTH-YEAR:
  <NAME> (<ANSWER>- )
  <NAME> was born on <ANSWER>
  <NAME> was born in <ANSWER>
  born in <ANSWER>, <NAME>
  Of <NAME>, (<ANSWER>

Limitations:
• Cannot handle long-distance dependencies
• Cannot recognize paraphrases, since no semantic knowledge is associated with these patterns (unlike the patterns used in Information Extraction)
• Cannot recognize paraphrased questions
Shallow semantic parsing

Part of these problems can be solved by using shallow semantic parsers, i.e. parsers that use shallow semantics encoded as either predicate-argument structures or semantic frames:
• Long-distance dependencies are captured
• Paraphrases can be recognized by mapping on IE architectures

In the past 4 years, several models for training such parsers have emerged.
Lexico-semantic resources are available (e.g. PropBank, FrameNet).
Several evaluations measure the performance of such parsers (e.g. SENSEVAL, CoNLL).
Outline

Part I. Introduction: The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations
Proposition Bank Overview

[Parse tree example: "The futures halt was assailed by Big Board floor traders", with PRED = assailed, ARG1 (entity assailed) = "The futures halt", and ARG0 (agent) = "Big Board floor traders"]

• A one-million-word corpus annotated with predicate-argument structures [Kingsbury, 2002]. Currently only predicates lexicalized by verbs.
• Numbered arguments from 0 to 5. Typically ARG0 = agent, ARG1 = direct object or theme, ARG2 = indirect object, benefactive, or instrument.
• Functional tags: ARGM-LOC = locative, ARGM-TMP = temporal, ARGM-DIR = direction.
The Model

[Parse tree example: "The futures halt was assailed by Big Board floor traders"; Task 1 identifies the argument constituents ("The futures halt", "Big Board floor traders"), Task 2 assigns them the roles ARG1 and ARG0 with respect to PRED = assailed]

• Consists of two tasks: (1) identifying parse tree constituents corresponding to predicate arguments, and (2) assigning a role to each argument constituent.
• Both tasks are modeled using C5.0 decision tree learning and two sets of features: Feature Set 1, adapted from [Gildea and Jurafsky, 2002], and Feature Set 2, a novel set of semantic and syntactic features [Surdeanu, Harabagiu et al., 2003].
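The two-task setup can be sketched as two classifiers over constituent feature dictionaries; scikit-learn's DecisionTreeClassifier is only a stand-in for C5.0, and the hand-written feature dicts below are toy stand-ins for Feature Sets 1 and 2.

```python
# Task 1: is this constituent an argument? Task 2: which role does it play?
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

constituents = [
    {"pt": "NP", "path": "NP>S>VP>VP", "pos": "before", "voice": "passive", "hw": "halt"},
    {"pt": "NP", "path": "NP>PP>VP",   "pos": "after",  "voice": "passive", "hw": "traders"},
    {"pt": "PP", "path": "PP>VP",      "pos": "after",  "voice": "passive", "hw": "by"},
]
is_argument = [1, 1, 0]            # Task 1 labels
roles = ["ARG1", "ARG0"]           # Task 2 labels, for the argument constituents only

task1 = make_pipeline(DictVectorizer(), DecisionTreeClassifier())
task1.fit(constituents, is_argument)

task2 = make_pipeline(DictVectorizer(), DecisionTreeClassifier())
task2.fit([c for c, a in zip(constituents, is_argument) if a], roles)

new = {"pt": "NP", "path": "NP>S>VP>VP", "pos": "before", "voice": "passive", "hw": "futures"}
if task1.predict([new])[0]:
    print(task2.predict([new])[0])   # likely "ARG1" on this toy data
```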
Feature Set 1

[Parse tree example: "The futures halt was assailed by Big Board floor traders", with ARG1, PRED and ARG0 marked]

• PHRASE TYPE (pt): type of the syntactic phrase labeled as argument. E.g. NP for ARG1.
• PARSE TREE PATH (path): path between argument and predicate. E.g. NP -> S -> VP -> VP for ARG1.
• PATH LENGTH (pathLen): number of labels stored in the predicate-argument path. E.g. 4 for ARG1.
• POSITION (pos): indicates if the constituent appears before the predicate in the sentence. E.g. true for ARG1 and false for ARG0.
• VOICE (voice): predicate voice (active or passive). E.g. passive for PRED.
• HEAD WORD (hw): head word of the evaluated phrase. E.g. "halt" for ARG1.
• GOVERNING CATEGORY (gov): indicates if an NP is dominated by an S phrase or a VP phrase. E.g. S for ARG1, VP for ARG0.
• PREDICATE WORD: the verb with morphological information preserved (verb), and the verb normalized to lower case and infinitive form (lemma). E.g. for PRED, verb is "assailed", lemma is "assail".
Observations about Feature Set 1

• Because many argument constituents are prepositional attachments (PP) and relative clauses (SBAR), the head word (hw) is often not the most informative word in the phrase.
• Due to its strong lexicalization, the model suffers from data sparsity (e.g. a given hw is used < 3% of the time). The problem can be addressed with a back-off model from words to part-of-speech tags.
• The features in Set 1 capture only syntactic information, even though semantic information like named-entity tags should help. For example, ARGM-TMP typically contains DATE entities, and ARGM-LOC includes LOCATION named entities.
• Feature Set 1 does not capture predicates lexicalized by phrasal verbs, e.g. "put up".

[Parse tree fragments for the example phrases "in last June" (PP), "that occurred yesterday" (SBAR), and the verb chain "to be declared"]
Feature Set 2 (1/2)

• CONTENT WORD (cw): lexicalized feature that selects an informative word from the constituent, other than the head. Selection heuristics are available in the paper. E.g. "June" for the phrase "in last June".
• PART OF SPEECH OF CONTENT WORD (cPos): part-of-speech tag of the content word. E.g. NNP for the phrase "in last June".
• PART OF SPEECH OF HEAD WORD (hPos): part-of-speech tag of the head word. E.g. NN for the phrase "the futures halt".
• NAMED ENTITY CLASS OF CONTENT WORD (cNE): the class of the named entity that includes the content word. 7 named entity classes (from the MUC-7 specification) are covered. E.g. DATE for "in last June".
Feature Set 2 (2/2)

• BOOLEAN NAMED ENTITY FLAGS: set of features that indicate if a named entity is included at any position in the phrase:
  - neOrganization: set to true if an organization name is recognized in the phrase.
  - neLocation: set to true if a location name is recognized in the phrase.
  - nePerson: set to true if a person name is recognized in the phrase.
  - neMoney: set to true if a currency expression is recognized in the phrase.
  - nePercent: set to true if a percentage expression is recognized in the phrase.
  - neTime: set to true if a time of day expression is recognized in the phrase.
  - neDate: set to true if a date temporal expression is recognized in the phrase.
• PHRASAL VERB COLLOCATIONS: set of two features that capture information about phrasal verbs:
  - pvcSum: the frequency with which a verb is immediately followed by any preposition or particle.
  - pvcMax: the frequency with which a verb is followed by its predominant preposition or particle.
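A rough sketch of how pvcSum and pvcMax could be counted from a POS-tagged corpus; the tiny tagged corpus and the particle tag set are assumptions for illustration, not the counts used in the paper.

```python
# pvcSum: verb immediately followed by any preposition/particle;
# pvcMax: count for the verb's most frequent following particle.
from collections import Counter

tagged_corpus = [
    [("put", "VB"), ("up", "RP"), ("a", "DT"), ("fight", "NN")],
    [("put", "VB"), ("up", "RP"), ("posters", "NNS")],
    [("put", "VB"), ("off", "RP"), ("the", "DT"), ("meeting", "NN")],
    [("put", "VB"), ("the", "DT"), ("book", "NN"), ("down", "RP")],
]

def phrasal_verb_counts(verb, corpus, particle_tags=("RP", "IN", "TO")):
    following = Counter()
    for sent in corpus:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if w1 == verb and t1.startswith("VB") and t2 in particle_tags:
                following[w2] += 1
    pvc_sum = sum(following.values())
    pvc_max = max(following.values()) if following else 0
    return pvc_sum, pvc_max

print(phrasal_verb_counts("put", tagged_corpus))   # -> (3, 2): "up" twice, "off" once
```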
Results

Features                        | Arg P | Arg R | Arg F1 | Role A
FS1                             | 84.96 | 84.26 | 84.61  | 78.76
FS1 + POS tag of head word      | 92.24 | 84.50 | 88.20  | 79.04
FS1 + content word and POS tag  | 92.19 | 84.67 | 88.27  | 80.80
FS1 + NE label of content word  | 83.93 | 85.69 | 84.80  | 79.85
FS1 + phrase NE flags           | 87.78 | 85.71 | 86.73  | 81.28
FS1 + phrasal verb information  | 84.88 | 82.77 | 83.81  | 78.62
FS1 + FS2                       | 91.62 | 85.06 | 88.22  | 83.05
FS1 + FS2 + boosting            | 93.00 | 85.29 | 88.98  | 83.74
Other parsers based on PropBank

• Pradhan, Ward et al., 2004 (HLT/NAACL and Journal of ML) report on a parser trained with SVMs which obtains an F1-score of 90.4% for argument classification and 80.8% for detecting the boundaries and classifying the arguments, when only the first set of features is used.
• Gildea and Hockenmaier (2003) use features extracted from Combinatory Categorial Grammar (CCG). The F1-measure obtained is 80%.
• Chen and Rambow (2003) use syntactic and semantic features extracted from a Tree Adjoining Grammar (TAG) and report an F1-measure of 93.5% for the core arguments.
• Pradhan, Ward et al. use a set of 12 new features and obtain an F1-score of 93.8% for argument classification and 86.7% for argument detection and classification.
Applying Predicate-Argument Structures to QA

Parsing Questions
Q: What kind of materials were stolen from the Russian navy?
PAS(Q): What [Arg1: kind of nuclear materials] were [Predicate: stolen] [Arg2: from the Russian Navy]?

Parsing Answers
A(Q): Russia's Pacific Fleet has also fallen prey to nuclear theft; in 1/96, approximately 7 kg of HEU was reportedly stolen from a naval base in Sovetskaya Gavan.
PAS(A(Q)): [Arg1(Predicate 1): Russia's Pacific Fleet] has [ArgM-DIS(Predicate 1): also] [Predicate 1: fallen] [Arg1(Predicate 1): prey to nuclear theft]; [ArgM-TMP(Predicate 2): in 1/96], [Arg1(Predicate 2): approximately 7 kg of HEU] was [ArgM-ADV(Predicate 2): reportedly] [Predicate 2: stolen] [Arg2(Predicate 2): from a naval base] [Arg3(Predicate 2): in Sovetskaya Gavan]

Result: exact answer = "approximately 7 kg of HEU"
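A minimal sketch of the alignment step: the answer is read off the argument slot that the question marks as ANSWER, from an answer predicate-argument structure with the same predicate. The dict-based representation and function name are assumptions; real systems also normalize predicates and handle paraphrases.

```python
# Extract the exact answer by aligning question and answer PAS.
def extract_exact_answer(question_pas, answer_pas_list):
    wanted_arg = question_pas["answer_arg"]
    for pas in answer_pas_list:
        if pas["predicate"] == question_pas["predicate"] and wanted_arg in pas["args"]:
            return pas["args"][wanted_arg]
    return None

question_pas = {"predicate": "steal", "answer_arg": "Arg1",
                "args": {"Arg2": "from the Russian Navy"}}
answer_pas_list = [
    {"predicate": "fall",  "args": {"Arg1": "Russia's Pacific Fleet"}},
    {"predicate": "steal", "args": {"Arg1": "approximately 7 kg of HEU",
                                    "Arg2": "from a naval base",
                                    "Arg3": "in Sovetskaya Gavan"}},
]
print(extract_exact_answer(question_pas, answer_pas_list))
# -> "approximately 7 kg of HEU"
```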
Outline

Part I. Introduction: The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations
The Model

[Parse tree example: "She clapped her hands in inspiration"; Task 1 identifies the frame element constituents, Task 2 assigns them the roles Agent ("She"), Body Part ("her hands"), and Cause ("in inspiration") with respect to PRED = clapped]

• Consists of two tasks: (1) identifying parse tree constituents corresponding to frame elements, and (2) assigning a semantic role to each frame element.
• Both tasks were introduced for the first time by Gildea and Jurafsky in 2000, using Feature Set 1, which Gildea and Palmer later used for parsing based on PropBank.
Extensions

Fleischman et al. (2003) extend the model in three ways:
• Adopt a maximum entropy framework for learning a more accurate classification model.
• Include features that look at previous tags and use previous tag information to find the highest probability for the semantic role sequence of any given sentence.
• Examine sentence-level patterns that exploit more global information in order to classify frame elements.
Applying Frame Structures to QA

Parsing Questions
Q: What kind of materials were stolen from the Russian navy?
FS(Q): What [GOODS: kind of nuclear materials] were [Target-Predicate: stolen] [VICTIM: from the Russian Navy]?

Parsing Answers
A(Q): Russia's Pacific Fleet has also fallen prey to nuclear theft; in 1/96, approximately 7 kg of HEU was reportedly stolen from a naval base in Sovetskaya Gavan.
FS(A(Q)): [VICTIM(P1): Russia's Pacific Fleet] has also fallen prey to [GOODS(P1): nuclear] [Target-Predicate(P1): theft]; in 1/96, [GOODS(P2): approximately 7 kg of HEU] was reportedly [Target-Predicate(P2): stolen] [VICTIM(P2): from a naval base] [SOURCE(P2): in Sovetskaya Gavan]

Result: exact answer = "approximately 7 kg of HEU"
Outline

Part I. Introduction: The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations
Additional types of relations

• Temporal relations
  - the TERQUAS ARDA Workshop
• Causal relations
• Evidential relations
• Part-whole relations
Temporal relations in QA

Results of the workshop are accessible from
http://www.cs.brandeis.edu/~jamesp/arda/time/documentation/TimeMLuse-in-qa-v1.0.pdf

A set of questions that require the extraction of temporal relations was created (the TimeML question corpus). E.g.:
• "When did the war between Iran and Iraq end?"
• "Who was Secretary of Defense during the Gulf War?"

A number of features of these questions were identified and annotated. E.g.:
• Number of TEMPEX relations in the question
• Volatility of the question (how often does the answer change)
• Reference to repetitive events
• Number of events mentioned in the question
Outline

Part II. Extracting Semantic Relations from Questions and Texts
• Knowledge-intensive techniques
• Unsupervised techniques
Information Extraction from texts

Extracting semantic relations from questions and texts can be solved by adapting IE technology to this new task.

What is Information Extraction (IE)?
• The task of finding facts about a specified class of events from free text
• Filling a table in a database with the information; such a database entry can be seen as a list of slots of a template
• Events are instances comprising many relations that span multiple arguments
IE Architecture Overview

[Pipeline diagram: phrasal parser (rules) -> entity coreference -> domain event rules -> domain coreference (coreference filters) -> templette merging (merge condition), with a Domain API supplying the rules]
Walk-through Example

Input: "... a bomb rigged with a trip wire that exploded and killed him ..."

Parser: ... a bomb rigged with a trip wire/NG that/P exploded/VG and/P killed/VG him/NG ...

Entity Coref: him -> "A Chinese restaurant chef"

Domain Rules: ... a bomb rigged with a trip wire that exploded/PATTERN and killed him/PATTERN ...
  TEMPLETTE: BOMB: "a bomb rigged with a trip wire"
  TEMPLETTE: DEAD: "A Chinese restaurant chef"

Domain Coref:
  TEMPLETTE: BOMB: "a bomb rigged with a trip wire"; LOCATION: "MIAMI"

Templette Merging:
  TEMPLETTE: BOMB: "a bomb rigged with a trip wire"; DEAD: "A Chinese restaurant chef"; LOCATION: "MIAMI"
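A minimal sketch of the merging step, assuming templettes are slot-to-value maps and using "no conflicting slot values" as a stand-in for the real merge condition.

```python
# Merge two templettes from the same discourse when their filled slots agree.
def can_merge(t1, t2):
    return all(t1[slot] == t2[slot] for slot in t1.keys() & t2.keys())

def merge(t1, t2):
    return {**t1, **t2} if can_merge(t1, t2) else None

bomb_templette = {"BOMB": "a bomb rigged with a trip wire", "LOCATION": "MIAMI"}
victim_templette = {"DEAD": "A Chinese restaurant chef"}

print(merge(bomb_templette, victim_templette))
# -> {'BOMB': 'a bomb rigged with a trip wire', 'LOCATION': 'MIAMI',
#     'DEAD': 'A Chinese restaurant chef'}
```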
Learning domain event rules and domain relations

Build patterns from examples; generalize from multiple examples:
• annotated text: Crystal, Whisk (Soderland '99), Rapier (Califf '99)
• learning from a corpus with relevance judgements: Riloff '96, '99
• active learning to reduce annotation: Yangarber '97
• co-learning/bootstrapping: Brin '98, Agichtein '00
Changes in IE architecture for enabling the extraction of semantic relations

• Addition of a relation layer (a Relation Recognizer after the Event Recognizer)
• Modification of NE and pronominal coreference to enable relation coreference
• Addition of a relation merging layer

[Pipeline diagram: Document -> Tokenizer -> Entity Recognizer -> Entity Coreference -> Event Recognizer -> Relation Recognizer -> Event/Relation Coreference -> Relation Merging -> EEML File Generation -> EEML Results]
Walk-through Example

"The murder of Vladimir Golovlyov, an associate of the exiled tycoon Boris Berezovsky, was the second contract killing in the Russian capital in as many days and capped a week of setbacks for the Russian leader."

Entity and event annotations: Event: Murder ("the murder", "contract killing"); Entity: Person ("Vladimir Golovlyov", "Boris Berezovsky", "the Russian leader"); Entity: City ("the Russian capital"); Entity: Time-Quantity ("as many days"); Entity: GeopoliticalEntity ("Russian").

Relation annotations: Event-Entity Relation Victim (the murder - Vladimir Golovlyov; the contract killing); Entity-Entity Relation AffiliatedWith (Vladimir Golovlyov - Boris Berezovsky); Event-Entity Relation EventOccurAt (the contract killing - the Russian capital); Entity-Entity Relation GeographicalSubregion (the Russian capital); Entity-Entity Relation hasLeader (the Russian leader).
Application to QA

• Who was murdered in Moscow this week?
  Relations: EventOccurAt + Victim
• Name some associates of Vladimir Golovlyov.
  Relations: AffiliatedWith
• How did Vladimir Golovlyov die?
  Relations: Victim
• What is the relation between Vladimir Golovlyov and Boris Berezovsky?
  Relations: AffiliatedWith
Outline

Part II. Extracting Semantic Relations from Questions and Texts
• Knowledge-intensive techniques
• Unsupervised techniques
Learning extraction rules and semantic lexicons

• Generating Extraction Patterns: AutoSlog (Riloff 1993), AutoSlog-TS (Riloff 1996)
• Semantic Lexicon Induction: Riloff & Shepherd (1997), Roark & Charniak (1998), Ge, Hale, & Charniak (1998), Caraballo (1999), Thompson & Mooney (1999), Meta-Bootstrapping (Riloff & Jones 1999), BASILISK (Thelen and Riloff 2002)
• Bootstrapping/Co-training: Yarowsky (1995), Blum and Mitchell (1998), McCallum & Nigam (1998)
Generating extraction rules

From untagged text: AutoSlog-TS (Riloff 1996)

STAGE 1
A sentence analyzer processes the pre-classified texts and produces concept nodes such as:
  Subject: World Trade Center
  Verb: was bombed
  PP: by terrorists
The AutoSlog heuristics then propose an extraction rule, e.g. "<x> was bombed by <y>".

STAGE 2
The pre-classified texts are processed again with the concept node dictionary (e.g. "<x> was killed", "<x> was bombed by <y>"). The relevance of each rule is measured by relevance rate * log2(frequency), e.g. (REL%):
  <x> was bombed    87%
  bombed by <y>     84%
  <w> was killed    63%
  <z> saw           49%
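The ranking score can be sketched directly from that formula; the counts below are invented, and the relevance rate is taken here as the fraction of a pattern's matches that occur in relevant (pre-classified) texts.

```python
# AutoSlog-TS-style ranking: relevance rate * log2(frequency).
import math

def rank_patterns(counts):
    """counts: pattern -> (matches_in_relevant_texts, total_matches)"""
    scored = []
    for pattern, (rel, total) in counts.items():
        relevance_rate = rel / total
        scored.append((relevance_rate * math.log2(total), pattern))
    return sorted(scored, reverse=True)

counts = {
    "<x> was bombed": (87, 100),
    "bombed by <y>":  (42, 50),
    "<w> was killed": (63, 100),
    "<z> saw":        (49, 100),
}
for score, pattern in rank_patterns(counts):
    print(f"{score:5.2f}  {pattern}")
```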
Learning Dictionaries for IE with mutual bootstrapping

Riloff and Jones (1999):

Generate all candidate extraction rules from the training corpus using AutoSlog.
Apply the candidate extraction rules to the training corpus and save the patterns with their extractions to EPdata.
SemLex = {seed words}
Cat_EPlist = {}

MUTUAL BOOTSTRAPPING LOOP
1. Score all extraction patterns in EPdata
2. best_EP = the highest scoring extraction pattern not already in Cat_EPlist
3. Add best_EP to Cat_EPlist
4. Add best_EP's extractions to SemLex
5. Go to step 1
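A minimal sketch of that loop; the pattern scoring used here (number of a pattern's extractions already in the lexicon) is only a simplified stand-in for Riloff & Jones's scoring function, and the data is toy.

```python
# Mutual bootstrapping: alternately grow the pattern list and the lexicon.
def mutual_bootstrapping(ep_data, seed_words, iterations=5):
    """ep_data: pattern -> set of extracted noun phrases."""
    sem_lex = set(seed_words)
    cat_ep_list = []
    for _ in range(iterations):
        candidates = [p for p in ep_data if p not in cat_ep_list]
        if not candidates:
            break
        best_ep = max(candidates, key=lambda p: len(ep_data[p] & sem_lex))
        cat_ep_list.append(best_ep)          # step 3
        sem_lex |= ep_data[best_ep]          # step 4
    return sem_lex, cat_ep_list

ep_data = {
    "headquartered in <x>": {"Baghdad", "Moscow"},
    "mayor of <x>":         {"Moscow", "Miami"},
    "<x> was bombed":       {"the embassy"},
}
print(mutual_bootstrapping(ep_data, {"Baghdad"}))
```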
The BASILISK approach (Thelen & Riloff)

BASILISK = Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge

[Diagram: starting from seed words, the corpus yields extraction patterns and their extractions; the best patterns feed a pattern pool, their extractions feed a candidate word pool, and the 5 best candidate words are added to the semantic lexicon, which seeds the next iteration]

Key ideas:
1. Collective evidence over a large set of extraction patterns can reveal strong semantic associations.
2. Learning multiple categories simultaneously can constrain the bootstrapping process.
Learning Multiple Categories Simultaneously

"One sense per domain" assumption: a word belongs to a single semantic category within a limited domain.

The simplest way to take advantage of multiple categories is to resolve conflicts when they arise:
1. A word cannot be assigned to category X if it has already been assigned to category Y.
2. If a word is hypothesized for both category X and category Y at the same time, choose the category that receives the highest score.

[Diagram contrasts bootstrapping a single category with bootstrapping multiple categories simultaneously]
Kernel Methods for Relation Extraction

• Pioneered by Zelenko, Aone and Richardella (2002)
• Uses Support Vector Machines and the Voted Perceptron algorithm (Freund and Schapire, 1999)
• Operates on the shallow parses of texts, using two functions:
  - a matching function between the nodes of the shallow parse tree; and
  - a similarity function between the nodes
• Obtains very high F1-score values for relation extraction (86.8%)
Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types
Three representations

• A taxonomy of answer types into which Named Entity classes are also mapped
• A complex structure that results from schema instantiations
• An answer type generated by inference on the semantic structures
Possible Answer Types

[Answer type hierarchy: TOP branches into PERSON, LOCATION, DATE, TIME, PRODUCT, NUMERICAL VALUE, MONEY, ORGANIZATION, MANNER, REASON. Example subtrees: TIME covers clock time / time of day (prime time, midnight); ORGANIZATION covers institution/establishment (financial institution, educational institution) and team/squad (hockey team); NUMERICAL VALUE covers DEGREE, DIMENSION (distance/length, altitude, wingspan, width/breadth, thickness), RATE, DURATION, PERCENTAGE, and COUNT (integer/whole number, numerosity/multiplicity, population, denominator).]
Examples

[The same answer type hierarchy, with two questions mapped to answer types:]

PERSON: "What is the name of the actress that played in Shine?" (keywords: name, played, actress, Shine)

PRODUCT: "What does the BMW company produce?" (keywords: produce, company, BMW)
Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types
Extended WordNet

• eXtended WordNet is an ongoing project at the Human Language Technology Research Institute, University of Texas at Dallas (http://xwn.hlt.utdallas.edu/).
• The goal of this project is to develop a tool that takes as input the current or future versions of WordNet and automatically generates an eXtended WordNet that provides several important enhancements intended to remedy the present limitations of WordNet.
• In the eXtended WordNet the WordNet glosses are syntactically parsed, transformed into logic forms, and content words are semantically disambiguated.
Logic Abduction

Motivation: goes beyond keyword-based justification by capturing:
• syntax-based relationships
• links between concepts in the question and the candidate answers

[COGEX, the LCC Logic Prover for QA: the question logic form (QLF) and answer logic form (ALF), together with XWN axioms, NLP axioms, and lexical chains, are combined by an Axiom Builder and passed to the Justification step; a successful proof leads to Answer Ranking (ranked answers and answer explanations), while a failed proof triggers Relaxation and another attempt.]
Inputs to the Logic Prover
A logic form provides a mapping of the question and candidate answer text
into first order logic predicates.
Question:
Where did bin Laden 's funding come from other than his own wealth ?
Question Logic Form:
( _multi_AT(x1) ) & bin_NN_1(x2) & Laden_NN(x3) & _s_POS(x5,x4) &
nn_NNC(x4,x2,x3) & funding_NN_1(x5) & come_VB_1(e1,x5,x11) &
from_IN(e1,x1) & other_than_JJ_1(x6) & his_PRP_(x6,x4) &
own_JJ_1(x6) & wealth_NN_1(x6)
Justifying the answer
Answer:
... Bin Laden reportedly sent representatives to Afghanistan
opium farmers to buy large amounts of opium , probably to raise
funds for al - Qaida ....
Answer Logic Form:
… Bin_NN(x14) & Laden_NN(x15) & nn_NNC(x16,x14,x15) &
reportedly_RB_1(e2) & send_VB_1(e2,x16,x17) &
representative_NN_1(x17) & to_TO(e2,x21) & Afghanistan_NN_1(x18)
& opium_NN_1(x19) & farmer_NN_1(x20) & nn_NNC(x21,x19,x20) &
buy_VB_5(e3,x17,x22) & large_JJ_1(x22) & amount_NN_1(x22) &
of_IN(x22,x23) & opium_NN_1(x23) & probably_RB_1(e4) &
raise_VB_1(e4,x22,x24) & funds_NN_2(x24) & for_IN(x24,x26) &
al_NN_1(x25) & Qaida_NN(x26) ...
Lexical Chains

Lexical chains provide an improved source of world knowledge by supplying the Logic Prover with much needed axioms to link question keywords with answer concepts.

Question: How were biological agents acquired by bin Laden?

Answer: On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in ...

Lexical Chain:
( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
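A one-step chain like this can be sketched with NLTK's WordNet interface: check whether some synset of the answer verb has a synset of the question verb among its hypernyms. This is only an illustrative probe, not the LCC lexical chain module; it requires the WordNet data (nltk.download('wordnet')), and the exact synsets returned depend on the WordNet version.

```python
# Detect a direct HYPERNYM link between two verbs via WordNet.
from nltk.corpus import wordnet as wn

def hypernym_chain(word_a, word_b, pos=wn.VERB):
    for syn_a in wn.synsets(word_a, pos=pos):
        for syn_b in wn.synsets(word_b, pos=pos):
            if syn_b in syn_a.hypernyms():
                return syn_a, "HYPERNYM", syn_b
    return None

print(hypernym_chain("purchase", "acquire"))
# e.g. (Synset('buy.v.01'), 'HYPERNYM', Synset('get.v.01')) with WordNet 3.0
```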
Axiom selection
XWN Axioms
Another source of world knowledge is a general purpose
knowledge base of more than 50,000 parsed and
disambiguated glosses that are transformed into logic form for
use during the course of a proof.
Gloss:
Kill is to cause to die
GLF:
kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) &
die_VB_1(e2,x2,x4)
Logic Prover: Axiom Selection

Lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all the keywords in the question are found in the answer.

Question: How did Adolf Hitler die?
Answer: ... Adolf Hitler committed suicide ...

The following lexical chain is detected:
( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v - kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )

The following axioms are loaded into the Usable List of the Prover:
exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).
exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).
Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types
Intentional Structure of Questions

Example: Does Iraq have biological weapons?

Predicate-argument structure: have/possess(Iraq, biological weapons), with Arg-0 = Iraq and Arg-1 = biological weapons

Question pattern: possess(x, y)

Intentional structure: the question actually asks about a coerced interpretation of possess(x, y): Evidence, Means of Finding, Source, or Consequence. Which coercion applies is unknown a priori.
Coercion of Pragmatic Knowledge

0*Evidence (1-possess (2-Iraq, 3-biological weapons))

A form of logical metonymy: Lapata and Lascarides (Computational Linguistics, 2003) allow coercion of interpretations by collecting possible meanings from large corpora.

Examples:
Mary finished the cigarette -> Mary finished smoking the cigarette.
Arabic is a difficult language -> Arabic is a language that is difficult to learn / Arabic is a language that is difficult to process automatically.
The Idea

Logical metonymy is in part processed as verbal metonymy. We model, after Lapata and Lascarides, the interpretation of verbal metonymy as:

P(e, o, v)

where:
v - the metonymic verb (enjoy)
o - its object (the cigarette)
e - the sought-after interpretation (smoking)
A probabilistic model

By choosing the ordering e, v, o, the probability may be factored as:

P(e, o, v) = P(e) * P(v | e) * P(o | e, v)

where we make the estimations:

P^(e) = f(e) / N
P^(v | e) = f(v, e) / f(e)
P^(o | e, v) = f(o, e, v) / f(e, v)  ~  P^(o | e) = f(o, e) / f(e)

so that:

P(e, o, v) = ( f(v, e) * f(o, e) ) / ( N * f(e) )

This is a model of interpretation and coercion.
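The final estimate is a simple function of corpus counts, which the sketch below makes concrete; the count tables are invented toy numbers, not actual corpus statistics.

```python
# Score candidate interpretations e of "v the o" with f(v,e)*f(o,e)/(N*f(e)).
def metonymy_score(e, o, v, f_e, f_ve, f_oe, N):
    return f_ve.get((v, e), 0) * f_oe.get((o, e), 0) / (N * f_e[e])

N = 100_000
f_e  = {"smoke": 500, "buy": 2000}                              # f(e)
f_ve = {("finish", "smoke"): 40, ("finish", "buy"): 5}          # f(v, e)
f_oe = {("cigarette", "smoke"): 120, ("cigarette", "buy"): 30}  # f(o, e)

for e in f_e:
    print(e, metonymy_score(e, "cigarette", "finish", f_e, f_ve, f_oe, N))
# "smoke" should outscore "buy" for "finish the cigarette"
```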
Coercions for intentional structures

0*Evidence (1-possess (2-Iraq, 3-biological weaponry))

P(e, 0, 1, v):
1. v = discover(1, 2, 3)
2. v = stockpile(2, 3)
3. v = use(2, 3)
4. v = 0(1, 2, 3)

P(e, 1, 3):
1. e = develop(_, 3)
2. e = acquire(_, 3)

P(e, 2, 3):
1. e = inspections(_, 2, 3)
2. e = ban(_, 2, from 3)

P(e, 3, topic): topic coercion
Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types
Answer Structure

ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Answer Structure

ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
[Highlighted on this slide: Temporal Reference/Grounding]
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Answer Structure (continued)
ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
[Highlighted on this slide: Present Progressive Perfect]
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
possess(Iraq, delivery systems(type : scud missiles; launchers; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
[Highlighted on this slide: Present Progressive Continuing]
possess(Iraq, fuel stock(purpose: power launchers))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
hide(Iraq, Seeker: UN Inspectors; Hidden: CBW stockpiles & guided missiles)
Justification: DETECTION Schema Inspection status: Past; Likelihood: Medium
Answer Structure

ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
[Highlighted on this slide: Uncertainty and Belief]
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Answer Structure

ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
[Highlighted on this slide: Uncertainty and Belief]
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
[Highlighted on this slide: Multiple sources with reliability]
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Answer Structure

ANSWER: Evidence-Combined:
Pointer to Text Source:
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
[Highlighted on this slide: Event Structure Metaphor]
A4:The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability:
difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Event Structure for semantically based QA
 Reasoning about dynamics
   Complex event structure
   • Multiple stages, interruptions, resources, framing
   Evolving events
   • Conditional events, presuppositions
   Nested temporal and aspectual references
   • Past, future event references
   Metaphoric references
   • Use of motion domain to describe complex events
 Reasoning with Uncertainty
   Combining evidence from multiple, unreliable sources
   Non-monotonic inference
   • Retracting previous assertions
   • Conditioning on partial evidence
Relevant Previous Work
Event Structure
Aspect (VDT, TimeML), Situation Calculus
(Steedman), Frame Semantics (Fillmore),
Cognitive Linguistics (Langacker, Talmy),
Metaphor and Aspect (Narayanan)
Reasoning about Uncertainty
Bayes Nets (Pearl), Probabilistic Relational Models
(Koller), Graphical Models (Jordan)
Reasoning about Dynamics
Dynamic Bayes Nets (Murphy), Distributed
Systems (Alur, Meseguer), Control Theory
(Ramadge and Wonham), Causality (Pearl)
Outline
 Part III. Knowledge representation and inference
   Representing the semantics of answers
   Extended WordNet and abductive inference
   Intentional Structure and Probabilistic Metonymy
   An example of Event Structure
   Modeling relations, uncertainty and dynamics
   Inference methods and their mapping to answer types
Structured Probabilistic Inference
Probabilistic inference
 Filtering
   • P(X_t | o_{1:t}, X_{1:t})
   • Update the state based on the observation sequence and state set
 MAP Estimation
   • argmax_{h_1..h_n} P(h_1, ..., h_n | o_{1:t}, X_{1:t})
   • Return the best assignment of values to the hypothesis variables given the observations and states
 Smoothing
   • P(X_{t-k} | o_{1:t}, X_{1:t})
   • Modify assumptions about previous states, given the observation sequence and state set
 Projection/Prediction/Reachability
   • P(X_{t+k} | o_{1:t}, X_{1:t})
   • Push the current state estimate k steps into the future
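To make the filtering and prediction operations above concrete, here is a minimal sketch in Python for a discrete-state model (an HMM-style simplification of the structured probabilistic models discussed here; the two-state transition matrix, observation matrix, and observation sequence are hypothetical illustrations, not part of the original system):

import numpy as np

# Hypothetical 2-state model: state 0 = "program dormant", state 1 = "program active".
T = np.array([[0.9, 0.1],    # P(next state | current state)
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],    # P(observation | state); obs 0 = "no report", obs 1 = "report of activity"
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])

def filter_step(belief, obs):
    """One step of filtering: predict with T, then weight by the likelihood of obs."""
    predicted = T.T @ belief              # P(X_t | o_{1:t-1})
    updated = O[:, obs] * predicted       # multiply in P(o_t | X_t)
    return updated / updated.sum()        # renormalize to get P(X_t | o_{1:t})

def predict(belief, k):
    """Projection: push the current belief k steps into the future with no new observations."""
    for _ in range(k):
        belief = T.T @ belief
    return belief

belief = prior
for obs in [1, 1, 0, 1]:                  # a hypothetical observation sequence
    belief = filter_step(belief, obs)
print("P(X_t | o_1..t)   =", belief)
print("P(X_t+3 | o_1..t) =", predict(belief, 3))

Smoothing and MAP estimation extend the same recursion with a backward pass and a max-product variant, respectively.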
Answer Type to Inference Method

ANSWER TYPE               INFERENCE              DESCRIPTION
Justify (Proposition)     MAP                    Proposition is part of the MAP
Ability (Agent, Act)      Filtering; Smoothing   Past/Current Action enabled given current state
Prediction (State)        P;R' MAP               Propagate current information and estimate best new state
Hypothetical (Condition)  S, R_I                 Smooth, intervene and compute state
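Read operationally, the table above is a dispatch from answer type to inference routine; a minimal sketch of such a dispatch table in Python (the routine names are hypothetical placeholders, not the authors' implementation):

# Hypothetical dispatch from answer type to the inference routines named in the table.
INFERENCE_FOR_ANSWER_TYPE = {
    "Justify(Proposition)":    ["map_estimation"],               # proposition must be part of the MAP
    "Ability(Agent, Act)":     ["filtering", "smoothing"],       # action enabled given current state
    "Prediction(State)":       ["projection", "map_estimation"], # propagate and estimate best new state
    "Hypothetical(Condition)": ["smoothing", "intervention"],    # smooth, intervene, compute state
}

def select_inference(answer_type: str) -> list[str]:
    return INFERENCE_FOR_ANSWER_TYPE.get(answer_type, ["filtering"])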
Outline
 Part IV. From Ontologies to Inference
   From OWL to CPRM
   FrameNet in OWL
   FrameNet to CPRM mapping
Semantic Web
 The World Wide Web (WWW) contains a large and expanding information base.
 HTML is accessible to humans but does not formally describe data in a machine-interpretable form.
 XML remedies this by allowing for the use of tags to describe data (e.g., disambiguating "crawl").
 Ontologies are useful to describe objects and their inter-relationships.
 DAML+OIL (http://www.daml.org) is a markup language based on XML and RDF that is grounded in description logic and is designed to allow for ontology development, transfer, and use on the web.
 Programmatic access to the web
 Web-accessible programs and devices
Knowledge Rep'n for the "Semantic Web"
OWL/DAML-L (Logic)
OWL (Ontology)
RDFS (RDF Schema)
XML Schema
RDF (Resource Description Framework)
XML (Extensible Markup Language)
Knowledge Rep’n for “Semantic Web
Services”
DAML-S (Services)
DAML-L (Logic)
DAML+OIL (Ontology)
RDFS (RDF Schema)
XML Schema
RDF (Resource Description Framework)
XML (Extensible Markup Language)
DAML-S: Semantic Markup for Web Services
DAML-S: A DARPA Agent Markup Language for Services
• DAML+OIL ontology for Web services:
  • well-defined semantics
  • ontologies support reuse, mapping, succinct markup, ...
• Developed by a coalition of researchers from Stanford, SRI, CMU, BBN, Nokia, and Yale, under the auspices of DARPA.
• DAML-S version 0.6 posted October 2001
  http://www.daml.org/services/daml-s
[DAML-S Coalition, 2001, 2002]
[Narayanan & McIlraith 2003]
DAML-S/OWL-S Compositional Primitives
[Diagram] A process is either an atomic process or a composite process.
  Atomic process: inputs, (conditional) outputs, preconditions, (conditional) effects.
  Composite process: composedBy control constructs such as sequence, if-then-else, fork, while, ...
The OWL-S Process
Description
PROCESS.OWL
Implementation
DAML-S translation to the modeling environment KarmaSIM
[Narayanan, 97]
(http://www.icsi.berkeley.edu/~snarayan)
Basic Program:
Input: DAML-S description of Events
Output: Network Description of Events in KarmaSIM
Procedure:
• Recursively construct a sub-network for each control
construct. Bottom out at atomic event.
• Construct a net for each atomic event
• Return network
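A minimal sketch of the recursive translation procedure above, in Python; the Network class, the dict-based process encoding, and the constructs handled ("atomic", "sequence", "fork") are illustrative assumptions rather than the actual KarmaSIM interfaces:

# Illustrative sketch of recursively flattening a DAML-S-style process tree into a network.
class Network:
    def __init__(self):
        self.nodes, self.edges = [], []
    def add_node(self, name):
        self.nodes.append(name)
        return name
    def add_edge(self, a, b):
        self.edges.append((a, b))

def translate(process, net):
    """Return the (entry, exit) nodes of the sub-network built for this process description."""
    if process["type"] == "atomic":
        n = net.add_node(process["name"])          # bottom out at an atomic event
        return n, n
    if process["type"] == "sequence":              # chain the children's sub-networks
        first_in, prev_out = translate(process["steps"][0], net)
        for step in process["steps"][1:]:
            nxt_in, nxt_out = translate(step, net)
            net.add_edge(prev_out, nxt_in)
            prev_out = nxt_out
        return first_in, prev_out
    if process["type"] == "fork":                  # run the children's sub-networks in parallel
        split, join = net.add_node("fork"), net.add_node("join")
        for branch in process["steps"]:
            b_in, b_out = translate(branch, net)
            net.add_edge(split, b_in)
            net.add_edge(b_out, join)
        return split, join
    raise ValueError(f"unsupported construct: {process['type']}")

net = Network()
translate({"type": "sequence", "steps": [
    {"type": "atomic", "name": "Arrest"},
    {"type": "atomic", "name": "Arraignment"}]}, net)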
Example of a WMD Ontology in OWL
<rdfs:Class rdf:ID="DevelopingWeaponOfMassDestruction">
  <rdfs:subClassOf rdf:resource="http://reliant.teknowledge.com/DAML/SUMO.owl#Making"/>
  <rdfs:comment>
    Making instances of WeaponOfMassDestruction.
  </rdfs:comment>
</rdfs:Class>
http://reliant.teknowledge.com/DAML/SUMO.owl
Outline
 Part IV. From Ontologies to Inference
   From OWL to CPRM
   FrameNet in OWL
   FrameNet to CPRM mapping
The FrameNet Project
C Fillmore PI (ICSI)
Co-PI’s:
S Narayanan (ICSI, SRI)
D Jurafsky (U Colorado)
J M Gawron (San Diego State U)
Staff:
C Baker Project Manager
B Cronin Programmer
C Wooters Database Designer
Frames and Understanding
 Hypothesis: People understand things by performing mental operations on what they already know. Such knowledge is describable in terms of information packets called frames.
FrameNet in the Larger Context
 The long-term goal is to reason about the world in a way that humans understand and agree with.
 Such a system requires a knowledge representation that includes the level of frames.
 FrameNet can provide such knowledge for a number of domains.
 FrameNet representations complement ontologies and lexicons.
The core work of FrameNet
1. characterize frames
2. find words that fit the frames
3. develop descriptive terminology
4. extract sample sentences
5. annotate selected examples
6. derive "valence" descriptions
The Core Data
The basic data on which FrameNet
descriptions are based take the form
of a collection of annotated
sentences, each coded for the
combinatorial properties of one word
in it. The annotation is done
manually, but several steps are
computer-assisted.
Types of Words / Frames
o events
o artifacts, built objects
o natural kinds, parts and aggregates
o terrain features
o institutions, belief systems, practices
o space, time, location, motion
o etc.
Event Frames
Event frames have temporal
structure, and generally have
constraints on what precedes them,
what happens during them, and
what state the world is in once the
event has been completed.
Sample Event Frame:
Commercial Transaction
Initial state:
Vendor has Goods, wants Money
Customer wants Goods, has Money
Transition:
Vendor transmits Goods to Customer
Customer transmits Money to Vendor
Final state:
Vendor has Money
Customer has Goods
(It’s a bit more complicated than that.)
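As a minimal sketch, the initial-state/transition/final-state reading of this frame can be written down directly; the Party class and its attribute names below are assumptions made for the example, not FrameNet data structures:

from dataclasses import dataclass, field

# Illustrative sketch of the Commercial Transaction event frame as a state change.
@dataclass
class Party:
    name: str
    has: set = field(default_factory=set)
    wants: set = field(default_factory=set)

def commercial_transaction(vendor: Party, customer: Party, goods: str, money: str):
    # Initial state: Vendor has Goods and wants Money; Customer has Money and wants Goods.
    assert goods in vendor.has and money in customer.has
    # Transition: Vendor transmits Goods to Customer; Customer transmits Money to Vendor.
    vendor.has.remove(goods);   customer.has.add(goods)
    customer.has.remove(money); vendor.has.add(money)
    # Final state: Vendor has Money, Customer has Goods.
    assert money in vendor.has and goods in customer.has

greengrocer = Party("greengrocer", has={"carrots"}, wants={"dollar"})
she = Party("she", has={"dollar"}, wants={"carrots"})
commercial_transaction(greengrocer, she, goods="carrots", money="dollar")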
Partial Wordlist for
Commercial Transactions
Verbs: pay, spend, cost, buy, sell,
charge
Nouns:
cost, price, payment
Adjectives: expensive, cheap
Meaning and Syntax
 The various verbs that evoke this frame introduce the elements of the frame in different ways.
   The identities of the buyer, seller, goods and money
 Information expressed in sentences containing these verbs occurs in different places in the sentence depending on the verb.
She bought some carrots from the greengrocer for a dollar.
  She = Customer; bought = BUY; some carrots = Goods; from the greengrocer = Vendor; for a dollar = Money
She paid a dollar to the greengrocer for some carrots.
  She = Customer; paid = PAY; a dollar = Money; to the greengrocer = Vendor; for some carrots = Goods
She paid the greengrocer a dollar for the carrots.
  She = Customer; paid = PAY; the greengrocer = Vendor; a dollar = Money; for the carrots = Goods
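The differing realizations above amount to per-verb valence patterns; a minimal sketch of such a pattern table in Python (the slot labels and the "sell" entry are illustrative assumptions, not the FrameNet database format):

# Illustrative valence patterns for the Commercial Transaction frame:
# each verb maps frame elements to the grammatical slot / marking that realizes them.
VALENCE = {
    "buy": {"Customer": "subject", "Goods": "object",
            "Vendor": "PP[from]", "Money": "PP[for]"},
    "pay": {"Customer": "subject", "Money": "object",
            "Vendor": "PP[to] or indirect object", "Goods": "PP[for]"},
    "sell": {"Vendor": "subject", "Goods": "object",
             "Customer": "PP[to]", "Money": "PP[for]"},
}

def realization(verb: str, frame_element: str) -> str:
    return VALENCE[verb][frame_element]

assert realization("buy", "Vendor") == "PP[from]"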
FrameNet Product
For every target word,
 describe the frames or conceptual structures which underlie them, and
 annotate example sentences that cover the ways in which information from the associated frames is expressed in these sentences.
Complex Frames
 With Criminal_process we have, for example,
   sub-frame relations (one frame is a component of a larger, more abstract frame) and
   temporal relations (one process precedes another)
FrameNet Entities and Relations
 Frames
   Background
   Lexical
 Frame Elements (Roles)
 Binding Constraints
   Identify
 ISA(x:Frame, y:Frame)
 SubframeOf(x:Frame, y:Frame)
 Subframe Ordering
   precedes
 Annotation
A DAML+OIL Frame Class
<daml:Class rdf:ID="Frame">
<rdfs:comment> The most general class
</rdfs:comment>
<daml:unionOf rdf:parseType="daml:collection">
<daml:Class rdf:about="#BackgroundFrame"/>
<daml:Class rdf:about="#LexicalFrame"/>
</daml:unionOf>
</daml:Class>
<daml:ObjectProperty rdf:ID="Name">
<rdfs:domain rdf:resource="#Frame"/>
<rdfs:range rdf:resource="&rdf-schema;#Literal"/>
</daml:ObjectProperty>
DAML+OIL Frame Element
<daml:ObjectProperty rdf:ID="role">
  <rdfs:domain rdf:resource="#Frame"/>
  <rdfs:range rdf:resource="&daml;#Thing"/>
</daml:ObjectProperty>
<daml:ObjectProperty rdf:ID="frameElement">
  <daml:samePropertyAs rdf:resource="#role"/>
</daml:ObjectProperty>
<daml:ObjectProperty rdf:ID="FE">
  <daml:samePropertyAs rdf:resource="#role"/>
</daml:ObjectProperty>
FE Binding Relation
<daml:ObjectProperty rdf:ID="bindingRelation">
<rdf:comment> See http://www.daml.org/services
</rdf:comment>
<rdfs:domain rdf:resource="#Role"/>
<rdfs:range rdf:resource="#Role"/>
</daml:ObjectProperty>
<daml:ObjectProperty rdf:ID="identify">
<rdfs:subPropertyOf
rdf:resource="#bindingRelation"/>
<rdfs:domain rdf:resource="#Role"/>
<daml-s:sameValuesAs rdf:resource="#rdfs:range"/>
</daml:ObjectProperty>
Subframes and Ordering
<daml:ObjectProperty rdf:ID="subFrameOf">
<rdfs:domain rdf:resource="#Frame"/>
<rdfs:range rdf:resource="#Frame"/>
</daml:ObjectProperty>
<daml:ObjectProperty rdf:ID="precedes">
<rdfs:domain rdf:resource="#Frame"/>
<rdfs:range rdf:resource="#Frame"/>
</daml:ObjectProperty>
The Criminal Process Frame

Frame Element   Description
Court           The court where the process takes place
Defendant       The charged individual
Judge           The presiding Judge
Prosecution     FE identifies the attorneys prosecuting the defendant
Defense         Attorneys defending the defendant
The Criminal Process Frame in
DAML+OIL
<daml:Class rdf:ID="CriminalProcess">
<daml:subClassOf rdf:resource="#BackgroundFrame"/>
</daml:Class>
<daml:Class rdf:ID="CP">
<daml:sameClassAs rdf:resource="#CriminalProcess"/>
</daml:Class>
DAML+OIL Representation of the
Criminal Process Frame Elements
<daml:ObjectProperty rdf:ID="court">
<daml:subPropertyOf rdf:resource="#FE"/>
<daml:domain rdf:resource="#CriminalProcess"/>
<daml:range rdf:resource="&CYC;#Court-Judicial"/>
</daml:ObjectProperty>
<daml:ObjectProperty rdf:ID="defense">
<daml:subPropertyOf rdf:resource="#FE"/>
<daml:domain rdf:resource="#CriminalProcess"/>
<daml:range rdf:resource="&SRI-IE;#Lawyer"/>
</daml:ObjectProperty>
FE Binding Constraints
<daml:ObjectProperty rdf:ID="prosecutionConstraint">
  <daml:subPropertyOf rdf:resource="#identify"/>
  <daml:domain rdf:resource="#CP.prosecution"/>
  <daml-s:sameValuesAs rdf:resource="#Trial.prosecution"/>
</daml:ObjectProperty>
• The identification constraints can be between Frame and Subframe FEs, or between Subframe FEs.
• DAML does not support the dot notation for paths.
Criminal Process Subframes
<daml:Class rdf:ID="Arrest">
<rdfs:comment> A subframe </rdfs:comment>
<rdfs:subClassOf rdf:resource="#LexicalFrame"/>
</daml:Class>
<daml:Class rdf:ID="Arraignment">
<rdfs:comment> A subframe </rdfs:comment>
<rdfs:subClassOf rdf:resource="#LexicalFrame"/>
</daml:Class>
<daml:ObjectProperty rdf:ID="arraignSubFrame">
<rdfs:subPropertyOf rdf:resource="#subFrameOf"/>
<rdfs:domain rdf:resource="#CP"/>
<rdfs:range rdf:resource="#Arraignment"/>
</daml:ObjectProperty>
Specifying Subframe Ordering
<daml:Class rdf:about="#Arrest">
  <daml:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#precedes"/>
      <daml:hasClass rdf:resource="#Arraignment"/>
    </daml:Restriction>
  </daml:subClassOf>
</daml:Class>
DAML+OIL CP Annotations
<fn:Annotation>
<tpos> "36352897" </tpos>
<frame rdf:about ="&fn;Arrest">
<time> In July last year </time>
<authorities> a German border guard
</authorities>
<target> apprehended </target>
<suspect>
two Irishmen with Kalashnikov assault rifles.
</suspect>
</frame>
</fn:Annotation>
Outline
 Part IV. From Ontologies to Inference
   From OWL to CPRM
   FrameNet in OWL
   FrameNet to CPRM mapping
Representing Event Frames
 At the computational level, we use a structured event representation of event frames that formally specifies
   The frame
   Frame Elements and filler types
   Constraints and role bindings
   Frame-to-Frame relations
     • Subcase
     • Subevent
Events and actions
schema Event
  roles
    before : Phase
    transition : Phase
    after : Phase
    nucleus
  constraints
    transition :: nucleus

schema Action
  evokes Event as e
  roles
    actor : Entity
    undergoer : Entity
  self ↔ e.nucleus

[Diagram: event phases before → transition → after, with nucleus; the Action's actor and undergoer bind into the evoked Event.]
The CommercialTransaction schema
schema Commercial-Transaction
  subcase of Exchange
  roles
    customer ↔ participant1
    vendor ↔ participant2
    money ↔ entity1 : Money
    goods ↔ entity2
    goods-transfer ↔ transfer1
    money-transfer ↔ transfer2
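A minimal sketch of this schema as data, with the role bindings to the parent Exchange schema made explicit; the dict encoding and the Exchange role list are assumptions for illustration, not the actual formalism:

# Illustrative sketch: a schema as a dict of roles plus bindings to a parent schema's roles.
EXCHANGE = {"roles": ["participant1", "participant2", "entity1", "entity2",
                      "transfer1", "transfer2"]}

COMMERCIAL_TRANSACTION = {
    "subcase_of": "Exchange",
    "roles": ["customer", "vendor", "money", "goods",
              "goods-transfer", "money-transfer"],
    # bindings identify local roles with inherited Exchange roles
    "bindings": {"customer": "participant1", "vendor": "participant2",
                 "money": "entity1", "goods": "entity2",
                 "goods-transfer": "transfer1", "money-transfer": "transfer2"},
    "type_constraints": {"money": "Money"},
}

def inherited_role(schema, local_role):
    """Follow a binding from a local role to the role it identifies in the parent schema."""
    return schema["bindings"].get(local_role)

assert inherited_role(COMMERCIAL_TRANSACTION, "customer") == "participant1"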
Implementation
DAML-S translation to the modeling environment KarmaSIM
[Narayanan, 97]
(http://www.icsi.berkeley.edu/~snarayan)
Basic Program:
Input: DAML-S description of Frame relations
Output: Network Description of Frames in KarmaSIM
Procedure:
• Recursively construct a sub-network for each control
construct. Bottom out at atomic frame.
• Construct a net for each atomic frame
• Return network
Outline
 Part V. Results of Event Structure Inference for QA
   AnswerBank
   Current results for Inference Type
   Current results for Answer Structure
AnswerBank
 AnswerBank is a collection of over 1,200 QA annotations from the AQUAINT CNS corpus.
 Questions and answers cover the different domains of the CNS data.
 Questions and answers are POS tagged and syntactically parsed.
 Question and Answer predicates are annotated with PropBank arguments and FrameNet tags (when available).
 FrameNet is annotating CNS data with frame information for use by the AQUAINT QA community.
 We are planning to add more semantic information, including temporal and aspectual information (TIMEML+) and information about event relations and figurative uses.
[Architecture diagram: Retrieved Documents → Predicate Extraction → <Pred(args), Topic Model, Answer Type> → Model Parameterization (drawing on FrameNet Frames) → <Simulation Triggering> → PRM (drawing on OWL/OWL-S Topic Ontologies) → <PRM Update> → Event Simulation; CONTEXT spans the whole pipeline.]
Answer Types for complex questions in AnswerBank

ANSWER TYPE               EXAMPLE                                                      NUMBER
Justify (Proposition)     What is the evidence that IRAQ has WMD?                      89
Ability (Agent, Act)      How can a Biological Weapons Program be detected?            71
Prediction (State)        What were the possible ramifications of India's launch
                          of the Prithvi missile?                                      63
Hypothetical (Condition)  If Musharraf is removed from power, will Pakistan
                          become a militant Islamic State?                             62
Answer Type to Inference Method

ANSWER TYPE               INFERENCE              DESCRIPTION
Justify (Proposition)     MAP                    Proposition is part of the MAP
Ability (Agent, Act)      Filtering; Smoothing   Past/Current Action enabled given current state
Prediction (State)        P;R' MAP               Propagate current information and estimate best new state
Hypothetical (Condition)  S, R_I                 Smooth, intervene and compute state
Outline
 Part V. Results of Event Structure Inference for QA
   AnswerBank
   Current results for Inference Type
   Current results for Answer Structure
AnswerBank Data
 We used 80 QA annotations from AnswerBank
 Questions were of the four complex types
   • Justification, Ability, Prediction, Hypothetical
 Answers were combined from multiple sentences (average 4.3) and multiple annotations (average 2.1)
 CNS domains covered were
   WMD related (54%)
   Nuclear Theft (25%)
   India's missile program (21%)
Building Models
 Gold Standard:
   From the hand-annotated data in the CNS corpus, we manually built CPRM domain models for inference.
 Semantic Web based:
   From FrameNet frames and from semantic web ontologies in OWL (SUMO-based, OpenCYC and others), we built CPRM models (semi-automatic).
Percent correct by inference type
[Bar chart: % correct (compared to gold standard) by inference type — Justification, Prediction, Ability, Hypothetical — for two domain models. The model manually generated from CNS data scores higher (values in the 73–87% range) than the OWL-based domain model (values in the 51–66% range) across all four inference types.]
Event Structure Inferences
 For the annotations we classified complex event structure inferences as
   Aspectual
   • Stages of events, viewpoints, temporal relations (such as start(ev1, ev2), interrupt(ev1, ev2))
   Action-Based
   • Resources (produce, consume, lock), preconditions, maintenance conditions, effects.
   Metaphoric
   • Event Structure Metaphor (ESM): events and predications (Motion => Action), objects (Motion.Mover => Action.Actor), parameters (Motion.speed => Action.rateOfProgress)
ANSWER: Evidence-Combined:
Pointer to Text Source:
Answer Structure
A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking
at this murkiest and most dangerous corner of Saddam Hussein's armory.
A2: He says a series of reports add up to indications that Iraq may be trying to develop a new
viral agent, possibly in underground laboratories at a military complex near Baghdad where
Iraqis first chased away inspectors six years ago.
A3: A new assessment by the United Nations suggests Iraq still has chemical and
biological weapons - as well as the rockets to deliver them to targets in other countries.
A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as
launchers and stocks of fuel.
A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and
guided missiles, which it hid from the UN inspectors.
Content: Biological Weapons Program:
develop(Iraq, Viral_Agent(instance_of:new))
Justification: POSSESSION Schema
Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated;
Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden
possess(Iraq, Chemical and Biological Weapons)
Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection);
Status: Hidden from Inspectors Likelihood: Medium
possess(Iraq, delivery systems(type : rockets; target: other countries))
Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors;
Status: Ongoing Likelihood: Medium
Content of Inferences

Component       Number   F-Score (Manual)   F-Score (OWL)
Aspectual       375      .74                .65
Action-Feature  459      .62                .45
Metaphor        149      .70                .62
Conclusion
 Answering complex questions requires semantic representations at multiple levels:
   NE and Extraction-based
   Predicate Argument Structures
   Frame, Topic and Domain Models
 All these representations should be capable of supporting inference about relational structures, uncertain information, and dynamic context.
 Both Semantic Extraction techniques and Structured Probabilistic KR and Inference methods have matured to the point that we understand the various algorithms and their properties.
 Flexible architectures that embody these KR and inference techniques and make use of the expanding linguistic and ontological resources (such as on the Semantic Web) point the way to the future of semantically based QA systems!
References (URL)
 Semantic Resources
   FrameNet: http://www.icsi.berkeley.edu/framenet (papers on FrameNet and computational modeling efforts using FrameNet can be found here)
   PropBank: http://www.cis.upenn.edu/~ace/
   Gildea's Verb Index: http://www.cs.rochester.edu/~gildea/Verbs/ (links FrameNet, PropBank, and VerbNet)
 Probabilistic KR (PRM)
   http://robotics.stanford.edu/~koller/papers/lprm.ps (Learning PRMs)
   http://www.eecs.harvard.edu/~avi/Papers/thesis.ps.gz (Avi Pfeffer's PRM Stanford thesis)
 Dynamic Bayes Nets
   http://www.ai.mit.edu/~murphyk/Thesis/thesis.pdf (Kevin Murphy's Berkeley DBN thesis)
 Event Structure in Language
   http://www.icsi.berkeley.edu/~snarayan/thesis.pdf (Narayanan's Berkeley PhD thesis on models of metaphor and aspect)
   ftp://ftp.cis.upenn.edu/pub/steedman/temporality/temporality.ps.gz (Steedman's article on Temporality, with links to previous work on aspect)
   http://www.icsi.berkeley.edu/NTL (publications on Cognitive Linguistics and computational models of cognitive linguistic phenomena can be found here)