
Using Information Extraction for Question Answering

By Rani Qumsiyeh

Problem

More information is added to the web every day.

Search engines exist, but they return lists of documents rather than direct answers, leaving the user to sift for the information.

This calls for a different kind of search engine.

History of QA

QA can be dated back to the 1960s.

Two common approaches to designing QA systems:
  Information Extraction (IE)
  Information Retrieval (IR)

Two conferences evaluate QA systems:
  TREC (Text REtrieval Conference)
  MUC (Message Understanding Conference)
Common Issues with QA Systems

Information retrieval deals with keywords; information extraction analyzes the question itself.

The question could have multiple variations, which means:
  Easier for IR, but broader results
  Harder for IE, but more exact results

Message Understanding Conference (MUC)

Sponsored by the Defense Advanced Research Projects Agency (DARPA), 1991-1998.

Developed methods for the formal evaluation of IE systems.

Run as a competition, where participants compared their results with each other and against human annotators' key templates.

Short system preparation time to stimulate portability to new extraction problems: only 1 month to adapt the system to the new scenario before the formal run.
Evaluation Metrics

Precision and recall:
  Precision: correct answers / answers produced
  Recall: correct answers / total possible answers
F-measure:
  F = ((β² + 1) · P · R) / (β² · P + R)
where β is a parameter representing the relative importance of P and R:
  E.g., β = 1 gives P and R equal weight; β = 0 gives only P.

Current state of the art: the F = 0.60 barrier.
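To make the metrics concrete, here is a minimal sketch of the computation in Python; the counts in the example are made up for illustration.

```python
def precision(correct, produced):
    """Precision: correct answers / answers produced."""
    return correct / produced if produced else 0.0

def recall(correct, possible):
    """Recall: correct answers / total possible answers."""
    return correct / possible if possible else 0.0

def f_measure(p, r, beta=1.0):
    """F = ((beta^2 + 1) * P * R) / (beta^2 * P + R)."""
    denom = beta * beta * p + r
    return ((beta * beta + 1) * p * r) / denom if denom else 0.0

# Made-up counts: 50 answers produced, 30 correct, 60 possible.
p = precision(30, 50)             # 0.60
r = recall(30, 60)                # 0.50
print(f_measure(p, r, beta=1.0))  # ~0.545, under the F = 0.60 barrier
```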
MUC Extraction Tasks

  Named Entity task (NE)
  Template Element task (TE)
  Template Relation task (TR)
  Scenario Template task (ST)
  Coreference task (CO)
Named Entity Task (NE)

Mark in the text each string that represents a person, organization, or location name, or a date or time, or a currency or percentage figure.
Template Element Task (TE)

Extract basic information related to organization, person, and artifact entities, drawing evidence from everywhere in the text.
Template Relation Task (TR)

Extract relational information on employee_of, manufacture_of, location_of relations, etc. (TR expresses domain-independent relationships between entities identified by TE.)
Scenario Template Task (ST)

Extract prespecified event information and relate the event information to particular organization, person, or artifact entities. (ST identifies domain- and task-specific entities and relations.)
Coreference Task (CO)

Capture information on coreferring expressions, i.e., all mentions of a given entity, including those marked in NE and TE (nouns, noun phrases, pronouns).
An Example

"The shiny red rocket was fired on Tuesday. It is the brainchild of Dr. Big Head. Dr. Head is a staff scientist at We Build Rockets Inc."

  NE: entities are rocket, Tuesday, Dr. Head, and We Build Rockets
  CO: "it" refers to the rocket; Dr. Head and Dr. Big Head are the same
  TE: the rocket is shiny red and Head's brainchild
  TR: Dr. Head works for We Build Rockets Inc.
  ST: a rocket-launching event occurred with the various participants
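As a rough illustration of what the filled output for this passage might look like as data, here is a sketch; the slot names are invented for illustration and are not the official MUC template definitions.

```python
# Hypothetical filled templates for the rocket passage. Slot names are
# illustrative only, not the official MUC template definitions.
example_output = {
    "NE": ["rocket", "Tuesday", "Dr. Big Head", "We Build Rockets Inc."],
    "CO": [
        {"mentions": ["the shiny red rocket", "It"]},
        {"mentions": ["Dr. Big Head", "Dr. Head"]},
    ],
    "TE": {"entity": "rocket",
           "attributes": ["shiny", "red"],
           "brainchild_of": "Dr. Big Head"},
    "TR": {"relation": "employee_of",
           "person": "Dr. Head",
           "organization": "We Build Rockets Inc."},
    "ST": {"event": "rocket_launching",
           "instrument": "rocket",
           "date": "Tuesday"},
}
print(example_output["TR"])  # the TR relation extracted from the text
```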




Scoring Templates

Templates are compared on a slot-by-slot basis:
  Correct: response = key
  Partial: response ≈ key
  Incorrect: response != key
  Spurious: key is blank (overgeneration = spurious / actual)
  Missing: response is blank
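A minimal sketch of turning these slot counts into scores, assuming the common MUC convention that a partial match earns half credit; the function name and example counts are my own.

```python
def muc_scores(correct, partial, incorrect, spurious, missing):
    """Score slot-by-slot counts; partial matches get half credit
    (the usual MUC convention, assumed here)."""
    actual = correct + partial + incorrect + spurious    # slots produced
    possible = correct + partial + incorrect + missing   # slots in the key
    precision = (correct + 0.5 * partial) / actual if actual else 0.0
    recall = (correct + 0.5 * partial) / possible if possible else 0.0
    overgeneration = spurious / actual if actual else 0.0
    return precision, recall, overgeneration

# Made-up counts for illustration:
print(muc_scores(correct=20, partial=4, incorrect=3, spurious=5, missing=6))
# (0.6875, 0.666..., 0.15625)
```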
Maximum Results Reported
KnowItAll, TextRunner, KnowItNow

These systems differ in implementation, but do the same thing: extract relational facts from web text.
Using Them as QA Systems

Able to handle questions that map to a single relation:
  "Who is the president of the US?" (can handle)
  "Who was the president of the US in 1998?" (fails)

They produce a huge number of facts that the user still has to go through.
Textract

Aims at resolving ambiguity in text by introducing more named entities:

  "What is Julian Werver Hill's wife's telephone number?"
    becomes equivalent to: "What is Polly's telephone number?"

  "Where is Werver Hill's affiliated company located?"
    becomes equivalent to: "Where is Microsoft located?"
Proposed System

  Determine which named entity we are looking for, using Textract.
  Use part-of-speech tagging.
  Use TextRunner as the basis for search.
  Use WordNet to find synonyms.
  Use extra entities in the text as "constraints".
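A rough sketch of how these steps could fit together; every helper below is a stub standing in for a real component (Textract, a POS tagger, TextRunner, WordNet), and none of these names come from the slides.

```python
# Hypothetical end-to-end skeleton of the proposed system; all helpers
# are illustrative stubs, not the slides' actual implementation.

def entity_type_sought(question):
    # Stub for the Textract step: map the question word to an entity type.
    first = question.split()[0].lower()
    return {"who": "PERSON", "when": "DATE", "where": "LOCATION"}.get(first, "UNKNOWN")

def build_query(question):
    # Stub for the POS-tagging step: predicate, argument, constraint words.
    return "land on", "moon", ["first"]

def retrieve_facts(predicate, argument):
    # Stub for the TextRunner step: returns (answer, source sentence) pairs.
    return [("Neil Armstrong",
             "Neil Armstrong was the first man to land on the moon.")]

def answer(question):
    if entity_type_sought(question) == "UNKNOWN":
        return []  # What/How questions are harder (see Delimitations)
    predicate, argument, constraints = build_query(question)
    # Keep only facts whose source sentence mentions every constraint word.
    return [a for a, s in retrieve_facts(predicate, argument)
            if all(c.lower() in s.lower() for c in constraints)]

print(answer("Who was the first man to land on the moon?"))  # ['Neil Armstrong']
```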
Example

(WP who) (VBD was) (DT the) (JJ first) (NN man) (TO to) (VB land) (IN on) (DT the) (NN moon)

  The verb (VB) is treated as the argument.
  The noun (NN) is treated as the predicate.
  We make sure that position is maintained.
  We keep prepositions if they have two nouns ("president of the US").
  Other non-stop words are constraints, e.g., "first".
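The tags above are Penn Treebank part-of-speech tags; here is a minimal sketch of producing them with NLTK, assuming an NLTK version where these model names apply:

```python
import nltk

# One-time model downloads (quiet if already present).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

question = "Who was the first man to land on the moon"
print(nltk.pos_tag(nltk.word_tokenize(question)))
# [('Who', 'WP'), ('was', 'VBD'), ('the', 'DT'), ('first', 'JJ'),
#  ('man', 'NN'), ('to', 'TO'), ('land', 'VB'), ('on', 'IN'),
#  ('the', 'DT'), ('moon', 'NN')]
```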




Anaphora Resolution

Use anaphora resolution to determine which entity a pronoun in a retrieved sentence refers to, so that a fact is attached to the right predicate (e.g., to wrote rather than landed).
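Anaphora resolution is a hard problem in general; as a toy illustration only, here is the simplest possible heuristic (resolve a pronoun to the most recent preceding proper name). Real resolvers are far more involved.

```python
import re

def resolve_pronouns(text):
    """Toy heuristic: replace he/she/it with the most recent proper name.
    Only a sketch of the idea, not a real anaphora resolver."""
    last_name = None
    out = []
    for tok in text.split():
        if tok.lower() in ("he", "she", "it") and last_name:
            out.append(last_name)
        else:
            if re.match(r"^[A-Z][a-z]+[.,]?$", tok) and tok not in ("He", "She", "It"):
                last_name = tok.rstrip(".,")
            out.append(tok)
    return " ".join(out)

print(resolve_pronouns("Armstrong landed on the moon. He wrote a memoir."))
# "Armstrong landed on the moon. Armstrong wrote a memoir."
```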
Use Synonyms

We use WordNet to find possible synonyms for verbs and nouns, to produce more facts.

We only consider 3 synonyms, since each additional synonym means more fact retrievals and therefore more time.
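A minimal sketch of pulling the top few synonyms from WordNet via NLTK (assuming the wordnet corpus is downloaded); the limit of 3 mirrors the slide.

```python
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def top_synonyms(word, pos, limit=3):
    """Collect up to `limit` distinct WordNet synonyms for a word."""
    synonyms = []
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemma_names():
            lemma = lemma.replace("_", " ")
            if lemma != word and lemma not in synonyms:
                synonyms.append(lemma)
            if len(synonyms) >= limit:
                return synonyms
    return synonyms

print(top_synonyms("land", wn.VERB))  # a few verb synonyms of "land"
print(top_synonyms("man", wn.NOUN))   # a few noun synonyms of "man"
```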
Using Constraints
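The slides do not spell out the constraint mechanism, so this is only a guess at its spirit: keep a retrieved fact only if its source sentence contains every constraint word (such as "first"). Names and data are invented.

```python
def satisfies_constraints(sentence, constraints):
    """Keep a fact only if its source sentence mentions every constraint
    word. A guess at the mechanism, not the slides' actual code."""
    lowered = sentence.lower()
    return all(c.lower() in lowered for c in constraints)

facts = [
    ("Buzz Aldrin", "Buzz Aldrin landed on the moon in 1969."),
    ("Neil Armstrong", "Neil Armstrong was the first man to land on the moon."),
]
answers = [who for who, sent in facts
           if satisfies_constraints(sent, ["first"])]
print(answers)  # ['Neil Armstrong']
```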
Delimitations

Works well with Who, When, and Where questions, since the target named entity is easily determined: achieves about 90% accuracy on these.

Works less well with What and How questions: achieves about 70% accuracy.

Takes about 13 seconds to answer a question.
Future Work

Build an ontology to determine the named entity and parse the question (faster).

Handle combinations of questions, e.g., "When and where did the Holocaust happen?"