311 - UMBC ebiquity research group


Text Understanding Agents
and the Semantic Web
Akshay Java, Tim Finin, Sergei Nirenburg
01/04/2005
Outline
• Motivation: Language Understanding Agents
• Ontological Semantics
• Bridging the Knowledge Gap
• Preliminary Evaluation
• SemNews: An Application Testbed
• Conclusion
• Q&A
Motivation
• Intelligent agents need knowledge and information.
• Most Web content is natural language (NL) text.
• The Semantic Web (SW) can benefit NLP tools in their language understanding tasks.
[Figure: the Web of documents (WWW: text, images, audio, video) holds natural language; NLP tools turn facts from NL into RDF/OWL ontologies, instances and triples, the structured information of the Semantic Web (the Web of data).]
Motivation
[Figure: language understanding agents; screenshot caption: "Provides an RDF version of the news."]
Ontological Semantics
OntoSem is a natural language processing system that processes text and converts it into facts. It is supported by a constructed world model encoded in a rich ontology.
Ontological Semantics
Static Knowledge Sources
• Ontology
  • 8000 concepts
  • Average of 16 properties each
• Lexicons
  • English: 45000 entries
  • Spanish: 40000 entries
  • Chinese: 3000 entries
• Fact repository
  • 20000 facts
• Onomasticon
  • NNNNN names
The OntoSem Ontology
ONTOLOGY ::= CONCEPT+
CONCEPT ::= ROOT | OBJECT-OR-EVENT | PROPERTY
SLOT ::= PROPERTY | FACET | FILLER
[Figure: an example concept frame with the PROPERTY, FACET and FILLER of one slot labeled.]
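The grammar above can be read as a data structure. The following is a minimal Python sketch. It assumes that a concept is a named frame holding slots, and it models each slot as a (property, facet, filler) triple, as the labeled example frame suggests. The names `Slot`, `Concept` and `fillers` are hypothetical and are not OntoSem's own representation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Slot:
    prop: str       # PROPERTY, e.g. "is-a"
    facet: str      # FACET, e.g. "value" or "sem"
    filler: tuple   # FILLER: one or more concept names or literals

@dataclass
class Concept:
    name: str
    slots: list = field(default_factory=list)

    def fillers(self, prop, facet=None):
        """Collect the fillers of `prop`, optionally only under one facet."""
        found = []
        for slot in self.slots:
            if slot.prop == prop and (facet is None or slot.facet == facet):
                found.extend(slot.filler)
        return found

# The patent frame from the class-mapping slide, in this hypothetical model.
patent = Concept("patent", [
    Slot("definition", "value", ("the exclusive right to make, use or sell an invention, "
                                 "which is granted to the inventor",)),
    Slot("is-a", "value", ("intangible-asset", "legal-right")),
])
print(patent.fillers("is-a"))  # ['intangible-asset', 'legal-right']
```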
Text Meaning Representation (TMR)
[Figure: an annotated TMR. The callouts note that word senses are disambiguated, semantic dependencies are established, and the result is a persistent fact stored in the Fact Repository (FR).]
Text Meaning Representation (TMR)
He asked the UN to authorize the war.
REQUEST-ACTION-69
    AGENT             HUMAN-72
    THEME             ACCEPT-70
    BENEFICIARY       ORGANIZATION-71
    SOURCE-ROOT-WORD  ask
    TIME              (< (FIND-ANCHOR-TIME))
ACCEPT-70
    THEME             WAR-73
    THEME-OF          REQUEST-ACTION-69
    SOURCE-ROOT-WORD  authorize
ORGANIZATION-71
    HAS-NAME          United-Nations
    BENEFICIARY-OF    REQUEST-ACTION-69
    SOURCE-ROOT-WORD  UN
HUMAN-72
    HAS-NAME          Colin Powell
    AGENT-OF          REQUEST-ACTION-69
    SOURCE-ROOT-WORD  he    ; reference resolution has been carried out
WAR-73
    THEME-OF          ACCEPT-70
    SOURCE-ROOT-WORD  war
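A TMR must become triples before it can be published on the Semantic Web. The minimal sketch below assumes the frames are held as Python dicts, which is a hypothetical encoding. It also assumes that an instance name such as HUMAN-72 is its concept name plus a numeric suffix. The sketch derives the inverse slots (AGENT-OF, THEME-OF, BENEFICIARY-OF) that the TMR lists explicitly. It illustrates the idea only and is not OntoSem2OWL's actual code.

```python
# Slot fillers copied from the TMR above; the dict encoding is an assumption.
TMR = {
    "REQUEST-ACTION-69": {"AGENT": "HUMAN-72", "THEME": "ACCEPT-70",
                          "BENEFICIARY": "ORGANIZATION-71"},
    "ACCEPT-70": {"THEME": "WAR-73"},
    "ORGANIZATION-71": {"HAS-NAME": "United-Nations"},
    "HUMAN-72": {"HAS-NAME": "Colin Powell"},
    "WAR-73": {},
}

# Inverse slot names as they appear in the TMR.
INVERSES = {"AGENT": "AGENT-OF", "THEME": "THEME-OF", "BENEFICIARY": "BENEFICIARY-OF"}

def concept_of(instance):
    """Assumes an instance id is its concept name plus a numeric suffix."""
    return instance.rsplit("-", 1)[0]

def tmr_to_triples(frames):
    """Flatten TMR frames into (subject, predicate, object) triples."""
    triples = set()
    for subj, slots in frames.items():
        triples.add((subj, "rdf:type", concept_of(subj)))
        for pred, obj in slots.items():
            triples.add((subj, pred, obj))
            # Add the inverse link only when the object is itself an instance.
            if pred in INVERSES and obj in frames:
                triples.add((obj, INVERSES[pred], subj))
    return triples

for t in sorted(tmr_to_triples(TMR)):
    print(t)
```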
Mapping OntoSem to Web-Based KR
[Figure: OntoSem analyzes NL text using its lexicon, ontology and Fact Repository, producing TMRs; OntoSem2OWL translates the OntoSem ontology into an OWL ontology and the TMRs into OWL.]
Mapping Rules for Classes
OntoSem Lisp version:
(make-frame patent
  (definition
    (value (common "the exclusive right to make, use or sell an invention, which is granted to the inventor")))
  (is-a
    (value (common intangible-asset legal-right))))
OWL version:
<owl:Class rdf:about="&ontosem;patent">
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;intangible-asset"/>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;legal-right"/>
  </rdfs:subClassOf>
  <rdfs:comment>the exclusive right to make, use or sell an invention, which is granted to the inventor</rdfs:comment>
</owl:Class>
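The class rule can be sketched as a small translator. This Python sketch assumes frames shaped like the patent frame: one facet per slot, with fillers under `common`. It handles only is-a and definition. It writes `rdfs:subClassOf rdf:resource=...`, which is RDF-equivalent to the nested owl:Class form above. The `&ontosem;` entity is assumed to be declared in the enclosing document's DOCTYPE. `parse` and `frame_to_owl_class` are hypothetical helpers and are not part of OntoSem2OWL.

```python
import re
from xml.sax.saxutils import escape

# Tokens: parentheses, double-quoted strings, or bare symbols.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse(text):
    """Parse one s-expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)

    def read(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = read(i)
                items.append(item)
            return items, i + 1
        return tokens[i], i + 1

    expr, _ = read(0)
    return expr

def frame_to_owl_class(frame):
    """Map (make-frame NAME (SLOT (FACET (common FILLER ...))) ...) to an owl:Class.

    Only is-a and definition are handled; each slot is assumed to hold a
    single facet with a single `common` filler list, as in the patent frame.
    """
    _, name, *slots = frame
    parents, comment = [], None
    for slot_name, (facet, (_common, *fillers)) in slots:
        if slot_name == "is-a" and facet == "value":
            parents.extend(fillers)
        elif slot_name == "definition":
            comment = fillers[0].strip('"')
    lines = [f'<owl:Class rdf:about="&ontosem;{name}">']
    lines += [f'  <rdfs:subClassOf rdf:resource="&ontosem;{p}"/>' for p in parents]
    if comment:
        lines.append(f"  <rdfs:comment>{escape(comment)}</rdfs:comment>")
    lines.append("</owl:Class>")
    return "\n".join(lines)

PATENT = '''(make-frame patent
  (definition
    (value (common "the exclusive right to make, use or sell an invention, which is granted to the inventor")))
  (is-a
    (value (common intangible-asset legal-right))))'''

print(frame_to_owl_class(parse(PATENT)))
```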
Mapping Rules for Properties
• Properties can be
  • object properties: owl:ObjectProperty
  • datatype properties: owl:DatatypeProperty
• The property hierarchy is defined by rdfs:subPropertyOf
• Domain maps to rdfs:domain
• Range maps to rdfs:range
• Restrictions are handled using owl:Restriction
• Numeric datatypes are handled using XSD datatypes
Mapping Rules for Properties…
(make-frame controls
  (domain
    (sem (common physical-event physical-object
                 social-event social-role)))
  (range
    (sem (common actualize artifact
                 natural-object social-role)))
  (is-a (value (common relation)))
  (inverse (value (common controlled-by)))
  (definition
    (value (common
      "A relation which relates concepts to what they can control"))))
Mapping Rules for Properties…
<owl:ObjectProperty rdf:ID="controls">
  <rdfs:domain>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#physical-event"/>
        <owl:Class rdf:about="#physical-object"/>
        <owl:Class rdf:about="#social-event"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#actualize"/>
        <owl:Class rdf:about="#artifact"/>
        <owl:Class rdf:about="#natural-object"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:subPropertyOf>
    <owl:ObjectProperty rdf:about="#relation"/>
  </rdfs:subPropertyOf>
  <owl:inverseOf rdf:resource="#controlled-by"/>
  <rdfs:label>A relation which relates concepts to what they can control</rdfs:label>
</owl:ObjectProperty>
[Figure: callouts linking the frame's make-frame, domain, range, is-a and inverse slots to the corresponding OWL constructs.]
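The property rules can be sketched the same way. The hypothetical Python helper `property_to_owl` takes the controls frame's slots as arguments and emits an owl:ObjectProperty. It uses owl:unionOf only when a domain or range lists more than one class. To confirm that the result is well-formed XML, the sketch wraps it in an rdf:RDF element with the standard RDF, RDFS and OWL namespaces. It is an illustration under those assumptions, not the translator's implementation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NS = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "owl": "http://www.w3.org/2002/07/owl#"}

def class_expr(tag, classes):
    """A single class is referenced directly; several become an owl:unionOf."""
    if len(classes) == 1:
        return f'  <{tag} rdf:resource="#{classes[0]}"/>'
    members = "".join(f'<owl:Class rdf:about="#{c}"/>' for c in classes)
    return (f'  <{tag}><owl:Class><owl:unionOf rdf:parseType="Collection">'
            f"{members}</owl:unionOf></owl:Class></{tag}>")

def property_to_owl(name, domain=(), rng=(), parents=(), inverse=None, definition=None):
    """Emit an owl:ObjectProperty: domain, range, is-a, inverse, definition."""
    parts = [f'<owl:ObjectProperty rdf:ID="{name}">']
    if domain:
        parts.append(class_expr("rdfs:domain", list(domain)))
    if rng:
        parts.append(class_expr("rdfs:range", list(rng)))
    parts += [f'  <rdfs:subPropertyOf rdf:resource="#{p}"/>' for p in parents]
    if inverse:
        parts.append(f'  <owl:inverseOf rdf:resource="#{inverse}"/>')
    if definition:
        parts.append(f"  <rdfs:label>{escape(definition)}</rdfs:label>")
    parts.append("</owl:ObjectProperty>")
    return "\n".join(parts)

def as_document(fragment):
    """Wrap a fragment in rdf:RDF with namespace declarations and parse it."""
    decls = " ".join(f'xmlns:{p}="{uri}"' for p, uri in NS.items())
    return ET.fromstring(f"<rdf:RDF {decls}>{fragment}</rdf:RDF>")

controls = property_to_owl(
    "controls",
    domain=["physical-event", "physical-object", "social-event", "social-role"],
    rng=["actualize", "artifact", "natural-object", "social-role"],
    parents=["relation"], inverse="controlled-by",
    definition="A relation which relates concepts to what they can control")
doc = as_document(controls)
print(controls)
```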
Mapping Rules for Facets
Facets are a way of restricting the fillers that can be used for a particular slot.
• SEM and VALUE
  • Mapped using an owl:Restriction on the particular property.
• RELAXABLE-TO
  • The relaxable-to classes are added to the classes in the owl:Restriction, and this information is also recorded in an annotation.
• DEFAULT
  • Not handled: there is no clear way to represent non-monotonic reasoning and closed-world assumptions in the Semantic Web.
• DEFAULT-MEASURE
  • Similar to the DEFAULT facet; not handled.
  • DEFAULT and DEFAULT-MEASURE are used relatively infrequently.
• NOT
  • The NOT facet can be handled using owl:disjointWith.
• INV
  • Need not be handled, since the inverse slot is already mapped to owl:inverseOf.
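The facet rules for a single property can be sketched in Python as follows. The slides say only that SEM and VALUE become an owl:Restriction. Choosing owl:allValuesFrom here is an assumption, and so are the helper names `restriction` and `translate_slot` and the example fillers.

```python
def restriction(prop, classes):
    """owl:Restriction on `prop`. owl:allValuesFrom is an assumed choice;
    the slides only say that SEM/VALUE map to owl:Restriction."""
    if len(classes) == 1:
        target = f'<owl:allValuesFrom rdf:resource="#{classes[0]}"/>'
    else:
        members = "".join(f'<owl:Class rdf:about="#{c}"/>' for c in classes)
        target = ('<owl:allValuesFrom><owl:Class>'
                  f'<owl:unionOf rdf:parseType="Collection">{members}</owl:unionOf>'
                  '</owl:Class></owl:allValuesFrom>')
    return f'<owl:Restriction><owl:onProperty rdf:resource="#{prop}"/>{target}</owl:Restriction>'

def translate_slot(prop, facets):
    """Apply the facet rules to one property; `facets` maps facet -> fillers.

    Returns (OWL fragments, facets that were dropped)."""
    out = []
    classes = facets.get("sem", []) + facets.get("value", [])
    relaxed = facets.get("relaxable-to", [])
    if classes or relaxed:
        # RELAXABLE-TO fillers join the restricted classes...
        out.append(restriction(prop, classes + relaxed))
    if relaxed:
        # ...and are also recorded in an annotation.
        out.append(f"<rdfs:comment>{prop} relaxable-to {' '.join(relaxed)}</rdfs:comment>")
    # NOT facet -> owl:disjointWith, one axiom per excluded class.
    out += [f'<owl:disjointWith rdf:resource="#{c}"/>' for c in facets.get("not", [])]
    # DEFAULT and DEFAULT-MEASURE are not handled; INV needs nothing.
    dropped = [f for f in facets if f in ("default", "default-measure")]
    return out, dropped

# Illustrative fillers only, not taken from the OntoSem ontology.
fragments, dropped = translate_slot(
    "agent", {"sem": ["human"], "relaxable-to": ["animal"], "default": ["human"]})
print(dropped)  # ['default']
```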
Evaluation
• The ontology translation tool was built using the Jena API.
• Tools: Swoop, Pellet, WonderWeb OWL Validator, W3C RDF Validator (http://w3c.org/RDF/Validator/)
• Total triples generated: ~102,189 (including bnodes)
• Time to build the model: ~10-40 sec
• Time to do RDFS inference: ~10 sec
• Time to do OWL Micro inference: ~40 sec
• Time to do OWL Full inference: ????
After translation (using no restrictions):
• OWL species: OWL Full
• DL expressivity: ELUIH
  • EL: conjunction and full existential quantification
  • U: union
  • I: role inverse
  • H: role hierarchy
• Classes: 7747 (defined: 7747, imported: 0)
• Datatype properties: 0 (defined: 0, imported: 0)
• Object properties: 604 (defined: 604, imported: 0)
• Annotation properties: 1 (defined: 1, imported: 0)
• Individuals: 0 (defined: 0, imported: 0)
Evaluation
• Syntactic correctness: checked using OWL/RDF validators.
• Semantic validation: full semantic validation is difficult, even for subsets of OWL.
• Meaning preservation: some native representation features, such as DEFAULTs, modality and case roles, may be underrepresented or not handled.
• Feature minimization: complex features can be difficult for reasoners to handle, so translations can be produced at each level: OWL Lite, OWL DL and OWL Full.
• Translation complexity: OntoSem is a large, extensive ontology (~8000 concepts). The translation itself is done syntactically, but in general translation may require reasoning, which could be an issue.
An Application Testbed: SemNews
• Semantically search and browse news
• Aggregators collect RSS news descriptions from various sources.
• The sentences are processed by OntoSem and converted into TMRs.
• Provides intelligent agents with the latest news in a machine-readable format
• http://semnews.umbc.edu/
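The aggregation step can be sketched with the Python standard library. The sketch assumes RSS 2.0 feeds that have already been fetched as strings, so fetching is omitted. It uses a naive regular-expression sentence splitter in place of whatever preprocessing SemNews really performs. It returns (title, sentence) pairs ready for the language analyzer. `aggregate` and the sample feed are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed standing in for a real news source.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Story one</title>
<description>He asked the UN to authorize the war. Officials declined to comment.</description>
</item></channel></rss>"""

def item_descriptions(rss_xml):
    """Yield (title, description) for each <item> in an RSS 2.0 document."""
    channel = ET.fromstring(rss_xml).find("channel")
    for item in channel.findall("item"):
        yield item.findtext("title", ""), item.findtext("description", "")

def sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def aggregate(feeds):
    """Collect (title, sentence) pairs from several feeds for the analyzer."""
    batch = []
    for rss_xml in feeds:
        for title, description in item_descriptions(rss_xml):
            batch.extend((title, s) for s in sentences(description))
    return batch

for title, sentence in aggregate([SAMPLE_RSS]):
    print(title, "|", sentence)
```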
[Figure: SemNews architecture. Areas: data aggregators, language processing, Semantic Web tools, interface. Components: news feeds, RSS aggregator, OntoSem, TMRs, Fact Repository (FR), DEKADE knowledge editor environment, OntoSem2OWL, OntoSem ontology (OWL), inferred triples, Swoogle index, text search, RDQL query, ontology & instance browser, fact repository interface, semantic RSS.]
Agent-Understandable News
Provides an RDF version of the news.
Semanticizing RSS
View a structured representation of the RSS news story. Future versions would enable editing the facts and provide provenance information.
News stories are ontologically linked
Find news stories by browsing through the OntoSem ontology.
Tracking Named Entities
Find stories on a specific named entity.
Browsing Facts
The Fact Repository explorer for the named entity ‘Mexico’ shows that it has a ‘nationality-of’ relation with CITIZEN-235.
The Fact Repository explorer for the instance CITIZEN-235 shows that the citizen is an agent of ESCAPE-EVENT.
Querying the semanticized RSS
RDQL queries provide structured querying over text represented in RDF.
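RDQL queries are built from triple patterns whose variables start with `?`. The Python sketch below evaluates a conjunction of such patterns over an in-memory set of triples. It is not Jena's RDQL engine. The triples are a hypothetical encoding of the Fact Repository example and the TMR shown earlier.

```python
# Hypothetical triples echoing the fact browser example and the TMR.
TRIPLES = {
    ("Mexico", "nationality-of", "CITIZEN-235"),
    ("CITIZEN-235", "agent-of", "ESCAPE-EVENT"),
    ("HUMAN-72", "agent-of", "REQUEST-ACTION-69"),
}

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                      # constant must match exactly
            return None
    return extended

def query(patterns, triples):
    """Join triple patterns left to right, like a conjunctive WHERE clause."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for old in bindings:
            for t in sorted(triples):
                b = match(pattern, t, old)
                if b is not None:
                    next_bindings.append(b)
        bindings = next_bindings
    return bindings

print(query([("Mexico", "nationality-of", "?c"), ("?c", "agent-of", "?e")], TRIPLES))
# [{'?c': 'CITIZEN-235', '?e': 'ESCAPE-EVENT'}]
```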
Semantic Alerts
Alerts can be specified as ontological concepts, keywords or RDQL queries.
Subscribe to the results of structured queries.
Beyond keyword search
• Conceptually searching for content
Find all news stories that have something to do with a
place and a terrorist activity.
• Context based querying
Find all events in which ‘George Bush’ was the
‘speaker’.
• Reporting facts
Find all politicians who traveled to Asia.
• Knowledge sharing
Populating instances by mapping FOAF and DC to the
OntoSem ontology.
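Conceptual search works because every instance in a TMR belongs to an ontology concept. As a result, a query for PLACE also matches stories about a city or a nation. The minimal Python sketch below assumes a hypothetical is-a table and hypothetical story annotations. Its concept names are illustrative and are not taken from the OntoSem ontology.

```python
# Hypothetical is-a links (child -> parents).
IS_A = {
    "CITY": ["PLACE"], "NATION": ["PLACE"],
    "BOMBING": ["TERRORIST-ACTIVITY"], "HIJACK": ["TERRORIST-ACTIVITY"],
    "ELECT": ["POLITICAL-EVENT"],
}

def ancestors(concept):
    """The concept itself plus everything above it in the is-a hierarchy."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(IS_A.get(c, []))
    return seen

# Story id -> concepts of the instances found in its TMRs (hypothetical).
STORIES = {
    "story-1": ["CITY", "BOMBING"],
    "story-2": ["NATION", "ELECT"],
    "story-3": ["HIJACK"],
}

def conceptual_search(required, stories):
    """Stories that mention, for every required concept, an instance under it."""
    hits = []
    for story, concepts in stories.items():
        covered = set()
        for c in concepts:
            covered |= ancestors(c)
        if set(required) <= covered:
            hits.append(story)
    return hits

print(conceptual_search(["PLACE", "TERRORIST-ACTIVITY"], STORIES))  # ['story-1']
```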
Current work
• Enron email corpus
• Profiles in terror
Conclusions
• Integrating language processing agents into the SW would let them publish SW annotations and documents that capture the text's meaning.
• Migrating from a native, non-Web-based representation to a SW representation may be lossy, but the result is still useful for many applications.
• The SemNews application testbed demonstrates several scenarios that can benefit from language understanding agents.
For More Information
• SemNews application
http://semnews.umbc.edu/
• OntoSem NLP system
http://ilit.umbc.edu/
• UMBC ebiquity research group
http://ebiquity.umbc.edu/
• This presentation
http://ebiquity.umbc.edu/paper/html/id/260/
Backup slides
Reasoning Capabilities
Finding Transitive Closures (RDFS reasoning): inferred triples for fire-engine
Buildfile: build.xml
init:
compile:
dist:
[jar] Building jar: /home/aks1/software/eclipse/workspace/ontojena/dist/lib/ontojena.jar
run:
[java] MODEL OK
[java] Resource: http://ontosem.org/#fire-engine
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#fire-engine)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#all)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#physical-object)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#inanimate)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#engine-propelled-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-engine-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#artifact)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#object)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#land-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#truck)
[java] - (http://ontosem.org/#fire-engine rdfs:label ' "a truck with equipment for fighting fires"')
[java] - (http://ontosem.org/#fire-engine rdf:type owl:Class)
[java] fire-engine recognized as subclas of vehicle
BUILD SUCCESSFUL
Total time: 10 seconds
real 0m11.144s
user 0m9.530s
sys 0m0.190s
[aks1@trishuli ontojena]$
[Figure: the superclass chain of fire-engine: vehicle, land-vehicle, engine-propelled-vehicle, wheeled-vehicle, wheeled-engine-vehicle, truck, fire-engine.]
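Here, the RDFS reasoner's job amounts to computing a reflexive, transitive closure of rdfs:subClassOf. The direct is-a links in this Python sketch are assumptions. They were chosen only so that their closure reproduces the twelve fire-engine superclasses in the output above, fire-engine itself included. The real OntoSem links may differ.

```python
# Assumed direct is-a links (child -> parents), for illustration only.
DIRECT = {
    "fire-engine": ["truck"],
    "truck": ["wheeled-engine-vehicle"],
    "wheeled-engine-vehicle": ["wheeled-vehicle", "engine-propelled-vehicle"],
    "wheeled-vehicle": ["land-vehicle"],
    "engine-propelled-vehicle": ["vehicle"],
    "land-vehicle": ["vehicle"],
    "vehicle": ["artifact"],
    "artifact": ["inanimate"],
    "inanimate": ["physical-object"],
    "physical-object": ["object"],
    "object": ["all"],
}

def superclasses(cls, direct):
    """Reflexive-transitive closure of rdfs:subClassOf starting at `cls`."""
    found, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in found:
            found.add(c)
            stack.extend(direct.get(c, []))
    return found

def inferred_triples(direct):
    """Every (sub, rdfs:subClassOf, super) triple implied by `direct`."""
    classes = set(direct) | {p for parents in direct.values() for p in parents}
    return {(c, "rdfs:subClassOf", s) for c in classes for s in superclasses(c, direct)}

for s in sorted(superclasses("fire-engine", DIRECT)):
    print("fire-engine rdfs:subClassOf", s)
```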
Mapping Rules
Property-Related Constructs
Case | Construct             | Frequency | Mapped Using
1    | domain                | 617       | rdfs:domain
2    | domain with NOT facet | 16        | owl:disjointWith
3    | range                 | 406       | rdfs:range
4    | range with NOT facet  | 5         | owl:disjointWith
5    | inverse               | 260       | owl:inverseOf
Mapping Rules
Facet-Related Constructs
Case | Facet           | Frequency | Mapped Using
1    | value           | 18217     | owl:Restriction
2    | sem             | 5686      | owl:Restriction
3    | relaxable-to    | 95        | annotation
4    | default         | 350       | Not handled
5    | default-measure | 612       | Not handled
6    | not             | 134       | owl:disjointWith
7    | inv             | 1941      | Not required
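These frequencies support the decision to leave DEFAULT and DEFAULT-MEASURE untranslated. The quick Python check below uses the counts from the facet table. It shows that the two unhandled facets account for about 3.6% of all facet occurrences.

```python
# Frequency column of the facet table above.
FACET_COUNTS = {
    "value": 18217, "sem": 5686, "relaxable-to": 95, "default": 350,
    "default-measure": 612, "not": 134, "inv": 1941,
}
NOT_HANDLED = {"default", "default-measure"}

total = sum(FACET_COUNTS.values())
lost = sum(FACET_COUNTS[f] for f in NOT_HANDLED)
print(total, lost, f"{100 * lost / total:.1f}%")  # 27035 962 3.6%
```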