
Integrating Language Understanding Agents into the Semantic Web
Akshay Java, Tim Finin, Sergei Nirenburg
11/04/2005
Outline
• Motivation: Language Understanding Agents
• Ontological Semantics
• Bridging the Knowledge Gap
• Preliminary Evaluation
• SemNews: An Application Testbed
• Conclusion
• Q&A
Motivation
• Intelligent agents need knowledge and information.
• Majority of content on the web remains in NL text.
• SW can benefit NLP tools in their language understanding
task
[Figure: NLP tools extract facts from natural language in the web of documents (WWW: text, images, audio, video) and publish them as RDF/OWL on the Semantic Web, the web of data (ontologies, instances, triples, structured information).]
Motivation
Language Understanding Agents
Provides RDF version of the news.
Ontological Semantics
OntoSem is a natural language processing system that processes
text and converts it into facts.
It is supported by a constructed world model encoded in a
rich ontology.
Ontological Semantics
[Figure: OntoSem architecture. Processing components: Input Text, Preprocessor, Syntactic Analyzer, Semantic Analyzer, Text Meaning Representation (TMR). Static Knowledge Resources: Grammar (Ecology, Morphology, Syntax), Lexicon and Onomasticon, Ontology and Fact Repository.]
Mapping OntoSem to web based KR
• OntoSem ontology is a frame based representation
ONTOLOGY ::= CONCEPT+
CONCEPT ::= ROOT | OBJECT-OR-EVENT | PROPERTY
SLOT ::= PROPERTY | FACET | FILLER
• Translating the OntoSem ontology involves mapping
its semantics into a corresponding OWL
representation.
• OntoSem's supporting fact repositories are also
mapped to OWL.
• The text meaning representation of each sentence is
then converted to OWL.
Mapping OntoSem to web based KR
[Figure: OntoSem (Lexicon, Ontology, Fact Repository) processes NL Text into TMRs; OntoSem2OWL produces the OWL Ontology and the TMRs in OWL.]
Mapping Rules for Classes
OntoSem LISP version:
(make-frame patent
  (definition
    (value (common "the exclusive right to make, use or sell an invention, which is granted to the inventor")))
  (is-a
    (value (common intangible-asset legal-right))))
OWL version:
<owl:Class rdf:about="&ontosem;patent">
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;intangible-asset">
    </owl:Class>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;legal-right">
    </owl:Class>
  </rdfs:subClassOf>
  <rdfs:label>The exclusive right to make, use or
  sell an invention, which is granted to the inventor
  </rdfs:label>
</owl:Class>
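The class mapping rule can be sketched in a few lines of Python. This is only an illustration of the rule (each is-a parent becomes an rdfs:subClassOf, the definition becomes a label), not the OntoSem2OWL implementation, which the slides say was built with the Jena API. The function name frame_to_owl_class is hypothetical, and the http://ontosem.org/# namespace is taken from the reasoner output shown on a later slide.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ONTOSEM = "http://ontosem.org/#"  # namespace seen in the reasoner output slide

for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)


def frame_to_owl_class(name, parents, definition):
    """Hypothetical helper: turn a frame's is-a and definition slots into an owl:Class."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}about": ONTOSEM + name})
    # Each is-a filler becomes one rdfs:subClassOf axiom.
    for parent in parents:
        sub = ET.SubElement(cls, f"{{{RDFS}}}subClassOf")
        ET.SubElement(sub, f"{{{OWL}}}Class", {f"{{{RDF}}}about": ONTOSEM + parent})
    # The definition slot is kept as human-readable text.
    label = ET.SubElement(cls, f"{{{RDFS}}}label")
    label.text = definition
    return cls


patent = frame_to_owl_class(
    "patent",
    ["intangible-asset", "legal-right"],
    "the exclusive right to make, use or sell an invention, "
    "which is granted to the inventor",
)
print(ET.tostring(patent, encoding="unicode"))
```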
Mapping Rules for Properties
• Properties can be
  • ObjectProperty: owl:ObjectProperty
  • Datatype property: owl:DatatypeProperty
• Property hierarchy is defined by rdfs:subPropertyOf
• Domain maps to rdfs:domain
• Range maps to rdfs:range
• Restrictions are handled using owl:Restriction
• Numeric datatypes are handled using XSD
Mapping Rules for Properties…
(make-frame controls
  (domain
    (sem (common physical-event physical-object
                 social-event social-role)))
  (range
    (sem (common actualize artifact
                 natural-object social-role)))
  (is-a (value (common relation)))
  (inverse (value (common controlled-by)))
  (definition
    (value (common
      "A relation which relates concepts to what they can control"))))
Mapping Rules for Properties…
<owl:ObjectProperty rdf:ID="controls">
  <rdfs:domain>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#physical-event"/>
        <owl:Class rdf:about="#physical-object"/>
        <owl:Class rdf:about="#social-event"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#actualize"/>
        <owl:Class rdf:about="#artifact"/>
        <owl:Class rdf:about="#natural-object"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:subPropertyOf>
    <owl:ObjectProperty rdf:about="#relation"/>
  </rdfs:subPropertyOf>
  <owl:inverseOf rdf:resource="#controlled-by"/>
  <rdfs:label> "A relation which relates concepts to what they can control" </rdfs:label>
</owl:ObjectProperty>
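The property rules (domain and range as a union of classes, is-a as rdfs:subPropertyOf, inverse as owl:inverseOf) can be illustrated with a short Python sketch. The helper names class_union and frame_to_object_property are hypothetical, and the code only mirrors the shape of the OWL shown above; it is not the authors' Jena-based translator.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

for prefix, uri in (("owl", OWL), ("rdf", RDF), ("rdfs", RDFS)):
    ET.register_namespace(prefix, uri)


def class_union(parent, tag, class_names):
    """Hypothetical helper: attach <tag> holding an anonymous class that is a union of class_names."""
    holder = ET.SubElement(parent, tag)
    anon = ET.SubElement(holder, f"{{{OWL}}}Class")
    union = ET.SubElement(
        anon, f"{{{OWL}}}unionOf", {f"{{{RDF}}}parseType": "Collection"}
    )
    for name in class_names:
        ET.SubElement(union, f"{{{OWL}}}Class", {f"{{{RDF}}}about": "#" + name})


def frame_to_object_property(name, domain, range_, parent, inverse, definition):
    """Hypothetical helper: map one relation frame to an owl:ObjectProperty."""
    prop = ET.Element(f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}ID": name})
    class_union(prop, f"{{{RDFS}}}domain", domain)
    class_union(prop, f"{{{RDFS}}}range", range_)
    sub = ET.SubElement(prop, f"{{{RDFS}}}subPropertyOf")
    ET.SubElement(sub, f"{{{OWL}}}ObjectProperty", {f"{{{RDF}}}about": "#" + parent})
    ET.SubElement(prop, f"{{{OWL}}}inverseOf", {f"{{{RDF}}}resource": "#" + inverse})
    ET.SubElement(prop, f"{{{RDFS}}}label").text = definition
    return prop


controls = frame_to_object_property(
    "controls",
    ["physical-event", "physical-object", "social-event", "social-role"],
    ["actualize", "artifact", "natural-object", "social-role"],
    "relation",
    "controlled-by",
    "A relation which relates concepts to what they can control",
)
print(ET.tostring(controls, encoding="unicode"))
```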
Mapping Rules for Facets
Facets are a way of restricting the fillers that can be used for a particular slot.
• SEM and VALUE
  • Mapped using owl:Restriction on a particular property.
• RELAXABLE-TO
  • Added to the classes present in the owl:Restriction, with this information recorded in
    the annotation.
• DEFAULT
  • There is no clear way to represent non-monotonic reasoning and closed-world
    assumptions in the Semantic Web.
• DEFAULT-MEASURE
  • Similar to the DEFAULT facet; not handled.
• DEFAULT and DEFAULT-MEASURE are used relatively infrequently.
• NOT
  • The NOT facet can be handled using owl:disjointWith.
• INV
  • Need not be handled, since the inverse slot is already mapped to owl:inverseOf.
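A rough Python sketch of the facet rules follows. The handling table is copied from the slide; the function facet_restriction is hypothetical. The slides do not say which restriction type is emitted, so the use of owl:allValuesFrom here is an assumption (owl:someValuesFrom or owl:hasValue could equally be what the translator produces), and the slot used in the example is illustrative only.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("owl", OWL)
ET.register_namespace("rdf", RDF)

# Handling per facet, as listed on the slide.
FACET_HANDLING = {
    "value": "restriction",
    "sem": "restriction",
    "relaxable-to": "annotation",
    "default": "not handled",
    "default-measure": "not handled",
    "not": "disjointWith",
    "inv": "not required",
}


def facet_restriction(prop, facet, filler):
    """Hypothetical helper: build an owl:Restriction for VALUE/SEM facets, else return None."""
    if FACET_HANDLING.get(facet) != "restriction":
        return None
    restriction = ET.Element(f"{{{OWL}}}Restriction")
    ET.SubElement(
        restriction, f"{{{OWL}}}onProperty", {f"{{{RDF}}}resource": "#" + prop}
    )
    # Assumption: allValuesFrom; the slides do not name the restriction type.
    ET.SubElement(
        restriction, f"{{{OWL}}}allValuesFrom", {f"{{{RDF}}}resource": "#" + filler}
    )
    return restriction


# Illustrative slot only: a SEM facet constraining 'controls' to 'artifact'.
sem = facet_restriction("controls", "sem", "artifact")
print(ET.tostring(sem, encoding="unicode"))
# DEFAULT facets are skipped, matching the "Not handled" entries.
print(facet_restriction("controls", "default", "artifact"))
```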
Mapping Rules
Property Related Constructs
     Case                    Frequency   Mapped Using
1    domain                  617         rdfs:domain
2    domain with not facet   16          owl:disjointWith
3    range                   406         rdfs:range
4    range with not facet    5           owl:disjointWith
5    inverse                 260         owl:inverseOf
Mapping Rules
Facet related constructs
     Case              Frequency   Mapped Using
1    value             18217       owl:Restriction
2    sem               5686        owl:Restriction
3    relaxable-to      95          annotation
4    default           350         Not handled
5    default-measure   612         Not handled
6    not               134         owl:disjointWith
7    inv               1941        Not required
Translating TMR2OWL
Translating TMRs involves instantiating the concepts
mapped in OWL.
Example:
(COME-1740
  (TIME
    (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (DESTINATION
    (VALUE (COMMON CITY-1740)))
  (AGENT (VALUE (COMMON POLITICIAN-1740)))
  (ROOT-WORDS (VALUE (COMMON (ARRIVE))))
  (WORD-NUM (VALUE (COMMON 2)))
  (INSTANCE-OF (VALUE (COMMON COME))))

<ontosem:come rdf:about="COME-1740">
  <ontosem:destination rdf:resource="#CITY-1740"/>
  <ontosem:agent rdf:resource="#POLITICIAN-1740"/>
</ontosem:come>
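The instantiation step can be sketched as follows. The TMR frame is given as a plain Python dict rather than parsed from the LISP form, and the rule for which slots survive (only fillers that look like instance identifiers such as CITY-1740) is an assumption inferred from the example output above, not a documented rule. The function name tmr_to_rdf and the regular expression are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ONTOSEM = "http://ontosem.org/#"  # namespace seen in the reasoner output slide
ET.register_namespace("rdf", RDF)
ET.register_namespace("ontosem", ONTOSEM)

# Assumption: instance identifiers are upper-case names ending in -<digits>.
INSTANCE_ID = re.compile(r"^[A-Z][A-Z-]*-\d+$")


def tmr_to_rdf(instance_id, slots):
    """Hypothetical helper: emit one typed RDF node for a TMR frame."""
    # INSTANCE-OF names the concept, which becomes the element type.
    concept = slots["INSTANCE-OF"].lower()
    node = ET.Element(f"{{{ONTOSEM}}}{concept}", {f"{{{RDF}}}about": instance_id})
    for slot, filler in slots.items():
        # Keep only slots that point at another TMR instance.
        if isinstance(filler, str) and INSTANCE_ID.match(filler):
            ET.SubElement(
                node,
                f"{{{ONTOSEM}}}{slot.lower()}",
                {f"{{{RDF}}}resource": "#" + filler},
            )
    return node


come = tmr_to_rdf(
    "COME-1740",
    {
        "TIME": "FIND-ANCHOR-TIME",
        "DESTINATION": "CITY-1740",
        "AGENT": "POLITICIAN-1740",
        "ROOT-WORDS": "ARRIVE",
        "WORD-NUM": 2,
        "INSTANCE-OF": "COME",
    },
)
print(ET.tostring(come, encoding="unicode"))
```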
Evaluation
Built ontology translation tool using the Jena API
Other tools: Swoop, Pellet, Wonderweb, http://w3c.org/RDF/Validator/

Total Triples Generated ~ 102189 (including bnode)
Time to build the Model ~ 10-40 sec
Time to do RDFS Inference ~ 10 sec
Time to do OWL Micro ~ 40 sec
Time to do OWL Full ~ ????

DL Expressivity: ELUIH
EL - Conjunction and Full Existential Quantification
U - Union
H - Role Hierarchy
I - Role Inverse

After Translation
OWL FULL
Total Number of Classes: 7747 (Defined: 7747, Imported: 0)
Total Number of Datatype Properties: 0 (Defined: 0, Imported: 0)
Total Number of Object Properties: 604 (Defined: 604, Imported: 0)
Total Number of Annotation Properties: 1 (Defined: 1, Imported: 0)
Total Number of Individuals: 0 (Defined: 0, Imported: 0)
NOTE: This is using no Restrictions
Evaluation
• Syntactic Correctness: checked using OWL/RDF validators.
• Semantic Validation: full semantic validation is difficult, even for
subsets of OWL.
• Meaning Preservation: some native representation features, such as
DEFAULTS, modality, and case roles, may be underrepresented or not handled.
• Feature Minimization: complex features can be difficult for
reasoners to handle, so the translation can be performed at each of the
levels: OWL Lite, OWL DL, OWL Full.
• Translation Complexity: OntoSem is a large and extensive
ontology (~8000 concepts). The translation itself is done syntactically, but in
general translation might require reasoning, which could be an issue.
Reasoning Capabilities
Finding Transitive Closures (RDFS reasoning)
Inferred Triples:
Buildfile: build.xml
init:
compile:
dist:
[jar] Building jar: /home/aks1/software/eclipse/workspace/ontojena/dist/lib/ontojena.jar
run:
[java] MODEL OK
[java] Resource: http://ontosem.org/#fire-engine
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#fire-engine)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#all)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#physical-object)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#inanimate)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#engine-propelled-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-engine-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#artifact)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#object)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#land-vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#vehicle)
[java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#truck)
[java] - (http://ontosem.org/#fire-engine rdfs:label ' "a truck with equipment for fighting fires"')
[java] - (http://ontosem.org/#fire-engine rdf:type owl:Class)
[java] fire-engine recognized as subclas of vehicle
BUILD SUCCESSFUL
Total time: 10 seconds
real 0m11.144s
user 0m9.530s
sys 0m0.190s
[aks1@trishuli ontojena]$
[Figure: class hierarchy containing vehicle, land-vehicle, engine-propelled-vehicle, wheeled-vehicle, wheeled-engine-vehicle, truck, fire-engine]
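The transitive closure that the RDFS reasoner computes over rdfs:subClassOf can be illustrated without Jena. In the sketch below, the direct parent edges are hypothetical: the slides list the inferred superclasses of fire-engine but not which links are asserted directly. Unlike the reasoner output above, this sketch does not add the reflexive triple (fire-engine subClassOf fire-engine).

```python
def superclasses(cls, direct_parents):
    """Return every class reachable from cls by following subClassOf edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in direct_parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical direct subClassOf edges, for illustration only.
direct = {
    "fire-engine": ["truck"],
    "truck": ["wheeled-engine-vehicle"],
    "wheeled-engine-vehicle": ["wheeled-vehicle", "engine-propelled-vehicle"],
    "wheeled-vehicle": ["land-vehicle"],
    "engine-propelled-vehicle": ["vehicle"],
    "land-vehicle": ["vehicle"],
    "vehicle": ["artifact"],
}

inferred = superclasses("fire-engine", direct)
print(sorted(inferred))
print("fire-engine recognized as subclass of vehicle:", "vehicle" in inferred)
```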
An Application Testbed: SemNews
• SemNews: semantically search and browse news
• Aggregators collect the RSS news descriptions from
various sources.
• The sentences are processed by OntoSem and
converted into Text Meaning Representations
(TMRs).
• Provides intelligent agents with the latest news in a
machine-readable format
http://semnews.umbc.edu
[Figure: SemNews architecture with numbered components (1-15). Labels: Data Aggregators, Language Processing, Semantic Web Tools, Interface, Fact Repository, Environment; News Feeds, RSS Aggregator, OntoSem, TMRs, FR, Dekade Editor, Knowledge Editor, OntoSem2OWL, OntoSem Ontology (OWL), TMR, Inferred Triples, Ontology & Instance browser, Text Search, RDQL Query, Swoogle Index, Semantic RSS.]
Agent understandable news
Provides RDF version of the news.
Semanticizing RSS
View a structured representation of the RSS news story.
Future versions would enable editing the facts and provide provenance information.
News stories are ontologically linked
Find news stories by browsing through the OntoSem ontology.
Tracking Named Entities
Find stories about a specific named entity.
Browsing Facts
Fact repository explorer for named entity 'Mexico' shows that it has a relation 'nationality-of' with CITIZEN-235.
Fact repository explorer for instance CITIZEN-235 shows that the citizen is an agent of ESCAPE-EVENT.
Querying the semanticized RSS
RDQL Queries
Provides structured querying over text converted into RDF representation.
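To give a feel for structured querying over the extracted facts, here is a small Python analogue of a triple-pattern query. It is not RDQL syntax and not the SemNews query interface; the match function is hypothetical, and the two triples restate the Mexico and CITIZEN-235 facts from the Browsing Facts slide, with the predicate name agent-of chosen for illustration.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Facts restated from the fact repository explorer example.
triples = [
    ("Mexico", "nationality-of", "CITIZEN-235"),
    ("CITIZEN-235", "agent-of", "ESCAPE-EVENT"),
]

# Which events involve an agent whose nationality is Mexico?
citizens = [o for _, _, o in match(triples, s="Mexico", p="nationality-of")]
events = [o for c in citizens for _, _, o in match(triples, s=c, p="agent-of")]
print(events)  # ['ESCAPE-EVENT']
```

Joining two patterns on a shared variable, as done here with the citizen, is the kind of query that keyword search over the original text cannot express.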
Semantic Alerts
Alerts can be specified as ontological concepts / keywords / RDQL queries.
Subscribe to results of structured queries.
Conclusions
• Integrating language processing agents into the SW
would publish SW annotations and documents that
capture the text's meaning.
• Migrating from a native, non-web-based representation
to a SW representation may be lossy but is still
useful for many applications.
• The SemNews application testbed demonstrates some
scenarios that can benefit from language
understanding agents.
Q&A
Thank you.
http://ebiquity.umbc.edu
http://semnews.umbc.edu
References
Software Used
[1] OntoSem http://ilit.umbc.edu/
[2] RDF Validation service http://w3c.org/RDF/Validator
[3] Jena Toolkit http://jena.sourceforge.net/
[4] Swoop Ontology Viewer http://www.mindswap.org/2004/SWOOP/
[5] Pellet OWL DL Reasoner http://www.mindswap.org/2003/pellet/
[6] Wonder Web OWL Validator http://phoebus.cs.man.ac.uk:9999/OWL/Validator
Papers
[1] Sergei Nirenburg and Victor Raskin, Ontological Semantics, Formal Ontology and Ambiguity
[2] Sergei Nirenburg and Victor Raskin, Ontological Semantics, MIT Press, Forthcoming
[3] Sergei Nirenburg, Ontological Semantics: Overview, Presentation CLSP JHU, Spring 2003
[4] Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics
[5] P.J. Beltran-Ferruz, P.A. Gonzalez-Calero, P. Gervas, Converting Mikrokosmos frames into Description Logics
[6] Sergei Nirenburg, Ontology Tutorial, ILIT UMBC
Mailing Lists
[1] Jena Developers
[2] Pellet users
[3] Semantic web
[4] W3C RDF Interest
[5] W3C Semantic web
Backup slides
Static Knowledge Sources
• Ontology 8000 concepts
• Avg 16 properties each
• English Lexicon 45000 entries
• Spanish Lexicon 40000 entries
• Chinese Lexicon 3000 entries
• Fact repository 20000 facts
[Sergei Nirenburg, Ontological Semantics: Overview, Presentation CLSP JHU, Spring 2003]
Text Meaning Representation (TMR)
He asked the UN to authorize the war.

REQUEST-ACTION-69
  AGENT              HUMAN-72
  THEME              ACCEPT-70
  BENEFICIARY        ORGANIZATION-71
  SOURCE-ROOT-WORD   ask
  TIME               (< (FIND-ANCHOR-TIME))
ACCEPT-70
  THEME              WAR-73
  THEME-OF           REQUEST-ACTION-69
  SOURCE-ROOT-WORD   authorize
ORGANIZATION-71
  HAS-NAME           United-Nations
  BENEFICIARY-OF     REQUEST-ACTION-69
  SOURCE-ROOT-WORD   UN
HUMAN-72
  HAS-NAME           Colin Powell
  AGENT-OF           REQUEST-ACTION-69
  SOURCE-ROOT-WORD   he   ; reference resolution has been carried out
WAR-73
  THEME-OF           ACCEPT-70
  SOURCE-ROOT-WORD   war

Example from [Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics]
The OntoSem Ontology
[Figure: ontology frame with PROPERTY, FACET, and FILLER labeled]