Semantic Web Fabric


CS690L
Ontologies Interoperability (Integration,
Mapping, Query)
Yugi Lee
STB #555
(816) 235-5932
www.sice.umkc.edu/~leeyu
CS690L - Lecture 4
Semantic Web Fabric
• Bootstrapping, Creation and Maintenance of Semantic Knowledge
– Collaborative and Sociological Processes, Statistical Techniques
– Ontology Building, Maintenance and Versioning Tools
• Re-use of Existing Semantic Knowledge (Ontologies)
• Annotation/Association/Extraction of Knowledge with/from
Underlying Data
• Information Retrieval and Analysis (Distributed Querying, Search,
Inference Middleware)
• Semantic Discovery and Composition of Services
• Distributed Computing/Communication Infrastructures
– Component based technologies, Agent based systems, Web Services
• Repositories for managing data and semantic knowledge
– Relational Databases, Content Management Systems, Knowledge Base
Systems
[V. Kashyap, 2002]
What have DB researchers done?
• Semantic Data Models
• Multi-database Schema
Heterogeneity
• Multi-database/Federated Database
Schema Integration
• Schema Evolution
• Object Oriented/XML/Deductive
Databases/Rule Based Systems
• Mediators and Wrappers
• Multidatabase/Federated Database
Query Processing
• Data Mining
• Probabilistic Databases
• Workflow-based Coordination
Systems
• Security in Database Systems
• Multimedia Databases
– Text and Information Retrieval
Systems
– Image Databases
• DB Research is well positioned to contribute to the Semantic Web, but:
– there has been little interest in issues related to Semantics in the DB community
– the Semantic Web can be the underlying theme that ties in all the disparate pieces
of work
[V. Kashyap, 2002]
What are the missing gaps?
• Ontology Integration/Interoperation
– Problem is different from Schema Integration
– Need to address “semantics” of relationships such as
“synonyms”, “hyponyms”, etc.
• Ontology Impedance/Mismatch
– Relax the requirements of consistency and completeness
– Should be able to characterize the “information error/loss” that
occurs
• Dynamic Ontologies
– Need to relax the assumption of the “staticness” of database
schemas
• Inferences based on Semantics of the Data
– Has been relatively ignored by the DB community
[V. Kashyap, 2002]
What are the missing gaps?
• Semantics of Multimedia Data
– Need to focus more on non-traditional data such as text, images,
etc.
– Need to focus on “annotation mechanisms” as an addition to
wrappers/mediators
• Semantics of Processes/Plans/Workflows
• Performance/Scalability
– A traditional strong point of DB research
• The next wave of research (esp. in the context of the Semantic Web)
will focus on re-use of pre-existing data models/schemas/ontologies
that describe the content of information sources
[V. Kashyap, 2002]
Inter-ontological relationships
• Synonyms
– lead to semantics-preserving translations
• Hyponyms/Hypernyms
– lead to semantics-altering translations
– typically result in loss of recall and precision
• List of Hyponyms
– technical-manual hyponym manual
– book hyponym book
– proceedings hyponym book
– thesis hyponym book
– misc-publication hyponym book
– technical-reports hyponym book
– press hyponym periodical-publication
– periodical hyponym periodical-publication
[V. Kashyap, 2002]
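
The effect of the two kinds of relationship on a translated query can be sketched as follows. This is only an illustration under stated assumptions: the hyponym pairs are taken from the list above, while the synonym pair, the instance identifiers, and the helper names `translate` and `precision_recall` are hypothetical.

```python
# Inter-ontology relationships as (source term, relation, target term).
# The hyponym rows come from the slide; the synonym row is a made-up example.
RELATIONSHIPS = [
    ("technical-manual", "hyponym", "manual"),
    ("thesis", "hyponym", "book"),
    ("proceedings", "hyponym", "book"),
    ("periodical", "hyponym", "periodical-publication"),
    ("article", "synonym", "paper"),
]

# Hypothetical instances stored under each term of the target ontology.
TARGET_INSTANCES = {
    "book": {"b1", "b2", "t1", "p1"},
    "paper": {"a1", "a2"},
}


def translate(term):
    """Map a source term to a target term; the flag says whether meaning is preserved."""
    for source, relation, target in RELATIONSHIPS:
        if source == term:
            return target, relation == "synonym"
    return None, False


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the truly relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


# Synonym: the translated query returns exactly the relevant instances.
target, exact = translate("article")
print(target, exact, precision_recall(TARGET_INSTANCES[target], {"a1", "a2"}))

# Hypernym substitution: asking for "book" instead of "thesis" keeps recall
# but lowers precision, because non-thesis books are returned as well.
target, exact = translate("thesis")
print(target, exact, precision_recall(TARGET_INSTANCES[target], {"t1"}))

# Hyponym substitution (the reverse direction): answering a "book" query with
# only the thesis instance keeps precision but lowers recall.
print(precision_recall({"t1"}, TARGET_INSTANCES["book"]))
```

In this toy setup the synonym translation loses nothing, whereas substituting a hypernym or a hyponym trades away precision or recall respectively, which is the loss the slide refers to.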
Role of Ontologies
• Content explication
Ontologies are used for the explicit description of the
information source
Approaches:
– Single ontology
– Multiple ontology
– Hybrid ontology
• Query model
• Verification (query containment)
[H. Wache, 2002]
Single Ontology Approach
• SIMS
• One global ontology
• Hierarchical terminological database
• Combination of several specialized
ontologies (for modularization)
• Can be used when all information
sources to be integrated provide nearly
the same view on a domain
• Minimal ontology commitment
• Susceptible to changes in the
information sources
[H. Wache, 2002]
Multiple Ontologies
• OBSERVER
• Each information source is described by
its own ontology (source ontology)
• No shared vocabulary
• No common and minimal ontology
commitment is needed
• Simplifies integration and supports
changes in sources
• Difficult to compare different source
ontologies
• Inter-ontology mapping is needed
[H. Wache, 2002]
Hybrid Approach
• COIN
• Semantics of each source is described by its own
ontology
• Built from a global shared vocabulary
• Shared vocabulary contains basic terms of a
domain
• New sources can easily be added
• Supports acquisition and evolution of ontologies
• Source ontologies are comparable because of
shared vocabulary
• Existing ontologies cannot easily be reused; they
have to be redeveloped from scratch
[H. Wache, 2002]
Query Model
• Integrated global view
• Global query schema
• User formulates query in terms of the ontology
• System reformulates queries in terms of subqueries for each source
• Structure of the query model should be more
intuitive for the user
[H. Wache, 2002]
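
The reformulation step can be sketched roughly as below. Everything here is an assumption made for illustration: the source names, the term-to-field mappings, the `reformulate` helper, and the choice of SQL-like strings as the subquery language. Real systems use far richer mapping languages, and the string building is not meant as safe SQL generation.

```python
# Hypothetical mappings from global-ontology terms to source-specific names.
SOURCE_MAPPINGS = {
    "library_db": {"Publication": "books", "title": "book_title", "year": "pub_year"},
    "journal_db": {"Publication": "articles", "title": "headline"},
}


def reformulate(concept, conditions):
    """Rewrite a query over the global ontology into one subquery per source.

    Sources that cannot express every term used in the query are skipped.
    """
    subqueries = {}
    for source, mapping in SOURCE_MAPPINGS.items():
        terms = [concept, *conditions]
        if not all(term in mapping for term in terms):
            continue
        where = " AND ".join(
            f"{mapping[attribute]} = {value!r}"
            for attribute, value in conditions.items()
        )
        query = f"SELECT * FROM {mapping[concept]}"
        if where:
            query += f" WHERE {where}"
        subqueries[source] = query
    return subqueries


# The user only sees the global terms "Publication", "title" and "year".
for source, query in reformulate("Publication", {"title": "Semantic Web"}).items():
    print(source, "->", query)
print(reformulate("Publication", {"year": 2002}))
```

The second call shows one consequence of the design: a source whose mapping lacks a queried term (here `year` for the hypothetical `journal_db`) simply contributes no subquery.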
Mappings Connecting to Information Sources
• Relate the ontologies to the actual content of an information source
• Approaches
– Structure resemblance
Produce a one-to-one copy of the structure of the database and encode it in a
language that makes automated reasoning possible
– Definition of terms
Use the ontology to define terms from the database or the database schema
– Structure enrichment (most common)
A logical model is built that resembles the structure of the information
source and contains additional definitions and concepts
Can be done using description logics (DLs)
– Meta-annotation
Add semantic information to an information source (e.g., Ontobroker, SHOE)
[H. Wache, 2002]
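
The difference between structure resemblance and structure enrichment can be sketched as below. The toy schema, the triple vocabulary (`is_a`, `attribute_of`, `subclass_of`), and both function names are hypothetical; a real system would encode the model in a description logic or a similar formalism rather than in plain tuples.

```python
# Hypothetical relational schema: table name -> column names.
SCHEMA = {
    "book": ["isbn", "title", "year"],
    "author": ["id", "name"],
}


def structure_resemblance(schema):
    """Copy the database structure one-to-one into (subject, predicate, object) triples."""
    triples = set()
    for table, columns in schema.items():
        triples.add((table, "is_a", "Concept"))
        for column in columns:
            triples.add((f"{table}.{column}", "attribute_of", table))
    return triples


def structure_enrichment(triples, extra_definitions):
    """Keep the structural copy and add definitions that the schema itself lacks."""
    return triples | set(extra_definitions)


model = structure_resemblance(SCHEMA)
enriched = structure_enrichment(model, [("book", "subclass_of", "publication")])
print(len(model), len(enriched))
print(sorted(enriched - model))
```

The enriched model is a superset of the structural copy, which mirrors the slide's description: the logical model still resembles the source, but carries additional concepts.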
Inter-Ontological Mapping
• Defined Mappings (KRAFT)
– Special customized mediator agents
– Great flexibility
– Fails to ensure preservation of semantics (no verification)
• Lexical Relations (OBSERVER)
– Extend a common DL model by quantified inter-ontology
relationships
– Synonym, hypernym, overlap, covering, disjoint
– Do not have formal semantics
[H. Wache, 2002]
Inter-Ontological Mapping
• Top-level grounding (DWQ)
– Relate all ontologies used to a single top-level ontology
– Inheriting concepts from a common top-level ontology
– Can resolve conflicts and ambiguities
• Semantic correspondences
– Rely on a common vocabulary
– Use semantic labels in order to compute correspondences
– Subsumption reasoning can be used to establish relations
between different terminologies
[H. Wache, 2002]
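
A very reduced sketch of the semantic-correspondence idea follows. It assumes each concept is described by a set of labels drawn from the common vocabulary and approximates subsumption by set inclusion, which is a drastic simplification of description-logic reasoning. The two source ontologies, the labels, and the function names are hypothetical.

```python
# Hypothetical source ontologies: concept -> semantic labels from a shared vocabulary.
SOURCE_A = {
    "Thesis": {"document", "academic", "degree-work"},
    "Book": {"document", "bound"},
}
SOURCE_B = {
    "Publication": {"document"},
    "Dissertation": {"document", "academic", "degree-work"},
}


def relate(labels_a, labels_b):
    """Relation of concept A to concept B, judged only by their label sets.

    Fewer labels means fewer constraints, i.e. a more general concept.
    """
    if labels_a == labels_b:
        return "equivalent"
    if labels_a < labels_b:
        return "subsumes"
    if labels_a > labels_b:
        return "subsumed-by"
    if labels_a & labels_b:
        return "overlap"
    return "unrelated"


def correspondences(ontology_a, ontology_b):
    """Compute the relation for every pair of concepts across two ontologies."""
    return {
        (name_a, name_b): relate(labels_a, labels_b)
        for name_a, labels_a in ontology_a.items()
        for name_b, labels_b in ontology_b.items()
    }


for pair, relation in sorted(correspondences(SOURCE_A, SOURCE_B).items()):
    print(pair, relation)
```

The comparison is only possible because both sources draw their labels from the same vocabulary; without that shared basis the label sets would not be comparable at all.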
Conclusions
• Data Models/Schemas/Ontologies will form the critical
infrastructure for the Semantic Web
• Re-use of pre-existing data models/schemas/ontologies is crucial
in describing the semantics of various information sources
• There is a need to relax consistency and completeness
requirements and estimate the “error” in the results returned.
• Semantics of information should be used to minimize “error” in
the information obtained
• The new environment is likely to be more “dynamic” in nature –
schemas, workflows, queries, etc. can no longer be assumed to be
static…
• DB research is well positioned to participate in the Semantic Web
if it “adapts” to these new requirements.
References
• Vipul Kashyap, The Semantic Web: Has the DB Community
Missed the Bus (Again)?, NSF Workshop on DB & IS Research for
Semantic Web and Enterprises, April 3, 2002
• H. Wache, T. Vogele, U. Visser, H. Stuckenschmidt, G. Schuster,
H. Neumann and S. Hubner, Ontology-Based Integration of
Information: A Survey of Existing Approaches