Question Answering Based on Semantic Graphs

Lorand Dali
Delia Rusu
Blaž Fortuna
Dunja Mladenić
Marko Grobelnik
Overview
Motivation
System Overview
Question Answering
Document Overview
Facts
Semantic Graph
Document Summary
Conclusions
Motivation
Triplets
Facts stated in the text
The core of the sentence (subject, verb, object)
System Overview
Question Answering
Extract facts (triplets) from text
Index triplets to enable structured search on them
Analyze questions to obtain the queries for the
triplet search
Retrieve the answer and the document containing it
Browse the document overview
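The extract, index and retrieve steps above can be sketched as a small in-memory triplet index. This is only an illustration under stated assumptions: the names Triplet and TripletIndex, the exact-match lookup and the sample facts are hypothetical and are not taken from the presented system.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    subject: str
    verb: str
    obj: str
    doc_id: str  # document the fact was extracted from


class TripletIndex:
    """Toy inverted index over the three triplet slots (hypothetical design)."""

    def __init__(self):
        self._triplets = []
        self._by_slot = {
            "subject": defaultdict(set),
            "verb": defaultdict(set),
            "obj": defaultdict(set),
        }

    def add(self, triplet):
        position = len(self._triplets)
        self._triplets.append(triplet)
        for slot in self._by_slot:
            key = getattr(triplet, slot).lower()
            self._by_slot[slot][key].add(position)

    def search(self, subject=None, verb=None, obj=None):
        """Return the triplets that match every slot that is not None."""
        query = {"subject": subject, "verb": verb, "obj": obj}
        hits = set(range(len(self._triplets)))
        for slot, value in query.items():
            if value is not None:
                hits &= self._by_slot[slot].get(value.lower(), set())
        return [self._triplets[i] for i in sorted(hits)]


index = TripletIndex()
index.add(Triplet("animals", "eat", "fruit", "doc1"))
index.add(Triplet("animals", "eat", "leaves", "doc2"))
index.add(Triplet("tigers", "live", "Sumatra", "doc3"))

# "What do animals eat?": subject and verb are fixed, the object is left open.
answers = index.search(subject="animals", verb="eat")
```

Each hit carries its doc_id, which is what would let the answer be shown together with the document that supports it.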
Question Answering
Question types:
Yes/No questions (Do animals eat fruit?),
list questions (What do animals eat?),
reason questions (Why do animals eat fruit?),
quantity questions (How much fruit do animals eat?),
location questions (Where do animals eat?) and
time questions (When do animals eat?).
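As a rough illustration of how questions in a predefined form could be mapped to the types listed above, the sketch below looks only at the opening words of the question. The patterns and the function name question_type are assumptions made for this example; the slides do not describe how the system analyzes questions.

```python
import re

# Hypothetical patterns, checked in order; they only inspect the start of the question.
PATTERNS = [
    ("reason", re.compile(r"^why\b", re.IGNORECASE)),
    ("quantity", re.compile(r"^how (much|many)\b", re.IGNORECASE)),
    ("location", re.compile(r"^where\b", re.IGNORECASE)),
    ("time", re.compile(r"^when\b", re.IGNORECASE)),
    ("list", re.compile(r"^(what|which|who)\b", re.IGNORECASE)),
    ("yes/no", re.compile(r"^(do|does|did|is|are|was|were|can)\b", re.IGNORECASE)),
]


def question_type(question):
    """Return the label of the first matching pattern, or 'unknown'."""
    text = question.strip()
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "unknown"


for q in ["Do animals eat fruit?", "What do animals eat?", "Why do animals eat fruit?"]:
    print(q, "->", question_type(q))
```

Once the type is known, the remaining words can be mapped to the fixed slots of a triplet query, with the open slot being the one the question asks about.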
Document Overview
Analyze the document containing the answer:
Highlight facts described by subject – verb – object triplets
(identified in the Penn Treebank parse tree)
Obtain the document semantic graph
View the automatic document summary
Semantic Graph
Document
Plain text format
Named entity extraction
Co-reference resolution
S – V – O triplet extraction
Triplet enhancement
According to traditional Chinese medical belief,
mental problems, laziness, malaria, epilepsy,
toothache and lack of sexual appetite can be
treated with tiger parts, leading to rampant
poaching of the animal in Asia, the World Wide
Fund (WWF) said.
Named entities:
Asia - location
World Wide Fund - organization
WWF - organization
Co-reference:
WWF refers to World Wide Fund
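A minimal sketch of how extracted triplets might be merged into a semantic graph once co-references are resolved. The function build_semantic_graph, the coref mapping and the hand-written triplets are hypothetical; they only mirror the idea that WWF and World Wide Fund should end up as a single node.

```python
from collections import defaultdict


def build_semantic_graph(triplets, coref=None):
    """Build a graph whose nodes are subjects/objects and whose edges carry the verb.

    coref maps a mention to its canonical name (an assumed representation).
    """
    coref = coref or {}
    graph = defaultdict(list)  # node -> list of (verb, target node)
    for subject, verb, obj in triplets:
        source = coref.get(subject, subject)
        target = coref.get(obj, obj)
        graph[source].append((verb, target))
        graph[target]  # touch the key so objects also appear as nodes
    return dict(graph)


# Hand-written triplets for illustration, not actual system output.
triplets = [
    ("tiger parts", "treat", "malaria"),
    ("tiger parts", "treat", "toothache"),
    ("WWF", "say", "poaching"),
]
graph = build_semantic_graph(triplets, coref={"WWF": "World Wide Fund"})
```

After the merge, the mention WWF no longer has a node of its own; its edge is attached to World Wide Fund.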
Document Summary
Feature Extractor
Features: linguistic, document, graph
Linear SVM
Linear Model
Ranking
Ranked sentences:
The Kerinci conservation project, an area of around
three million hectares (7.4 million acres) in west
Sumatra, was being supported by funds from the
World Bank, Subijanto said. [10.0912]
Subijanto, a spokesman for the Forestry Ministry,
said Indonesia was committed to protecting the
tigers, which live within Sumatra's four designated
conservation areas. [9.4155]
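The ranking step can be pictured as scoring each candidate with a linear model and keeping the highest-scoring ones. In the sketch below the feature names, the weights and the candidate sentences are invented for illustration; in the presented system the model comes from a linear SVM, whose training is not shown here.

```python
def linear_score(features, weights, bias=0.0):
    """Weighted sum of feature values; features without a weight count as zero."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in features.items())


def summarize(candidates, weights, top_k=2):
    """Return the top_k sentences ordered by decreasing linear score."""
    ranked = sorted(
        candidates,
        key=lambda c: linear_score(c["features"], weights),
        reverse=True,
    )
    return [c["sentence"] for c in ranked[:top_k]]


# Hypothetical weights standing in for a trained linear model.
weights = {"location_in_doc": -1.0, "num_named_entities": 0.5, "hub_weight_subject": 2.0}

candidates = [
    {"sentence": "A", "features": {"location_in_doc": 0.0, "num_named_entities": 2, "hub_weight_subject": 0.5}},
    {"sentence": "B", "features": {"location_in_doc": 1.0, "num_named_entities": 1, "hub_weight_subject": 0.0}},
    {"sentence": "C", "features": {"location_in_doc": 0.5, "num_named_entities": 3, "hub_weight_subject": 0.25}},
]

summary = summarize(candidates, weights, top_k=2)
```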
Document Summary
"There are people wanting tiger products who didn't want them
before," Ron Lilley, coordinator for species conservation at the WWF
in Jakarta, told Reuters.
Subijanto, a spokesman for the Forestry Ministry, said Indonesia was
committed to protecting the tigers, which live within Sumatra's four
designated conservation areas.
The Kerinci conservation project, an area of around three million
hectares (7.4 million acres) in west Sumatra, was being supported
by funds from the World Bank, Subijanto said.
Conclusions
Enhanced question answering system
Question answering, where the answer is supported
by documents
Document browsing
Facts
Document semantic graph
Automatic document summary
Conclusions
Future work
System extensions: triplet extraction, named entity
recognition
Expand the search to look for answers in ontologies
Relax the requirement that the questions have a
predefined form
Improve the document overview functionality by
integrating external resources
Thank you!
Questions are
guaranteed in life,
answers aren’t.
Document Summary
Extracted features:
Linguistic Attributes (13)
• Logical form tag
• Treebank tag
• Part of speech tag
• Depth of linguistic node
• 8 semantic tags for named entities
Document Attributes (11)
• Sentence related: e.g. location of sentence within doc
• Triplet related: e.g. frequency of triplet element in sentence, in doc, …
Graph Attributes (9)
• Authority and Hub weight, Page Rank
• Node degree
• Size of weakly connected component
• Size of max length chain
• Frequency of verbs among edges
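Two of the graph attributes, node degree and the size of the weakly connected component, can be computed directly from the edge list of a semantic graph. The sketch below uses only the standard library; the function names and the sample edges are assumptions for illustration.

```python
from collections import defaultdict, deque


def node_degrees(edges):
    """Count incoming plus outgoing edges for every node."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


def weak_component_sizes(edges):
    """Map each node to the size of its weakly connected component.

    Edge direction is ignored, which is what 'weakly connected' means.
    """
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    size_of = {}
    seen = set()
    for start in neighbours:
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        for node in component:
            size_of[node] = len(component)
    return size_of


# Hypothetical (subject, object) edges of a small semantic graph.
edges = [
    ("tiger parts", "malaria"),
    ("tiger parts", "toothache"),
    ("World Wide Fund", "poaching"),
]
degrees = node_degrees(edges)
sizes = weak_component_sizes(edges)
```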
Document Summary
Rank (Information Gain):
Object - Word
Subject - Word
Verb - Word
Location Of Sentence In Document
Similarity With Centroid
Number Of Locations In Sentence
Number Of Named Entities In Sentence
Authority Weight Object
Hub Weight Subject
Size Weakly Conn Comp Object
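For reference, the information gain of a discrete feature is the entropy of the class labels minus the expected entropy after splitting on the feature. The sketch below is a generic textbook computation with made-up labels (1 = part of the summary, 0 = not); it is not the evaluation code behind the ranking above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = [1, 1, 0, 0]
informative = ["first", "first", "later", "later"]    # splits the labels perfectly
uninformative = ["first", "later", "first", "later"]  # tells nothing about the labels

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Features can then be ordered by decreasing gain to obtain a ranking of the kind listed above.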