UCSD SDSC/KnowlRep-Ludaescher

Ontologies in Data and Application Integration – an Update
Kai Lin
Bertram Ludäscher
Knowledge-Based Information Systems Lab
Data and Knowledge Systems (DAKS)
San Diego Supercomputer Center
University of California San Diego
http://www.geongrid.org
Outline
1. Motivation
2. Ontology Cheat Sheet
3. Ontology-enabled Prototypes and Tools
4. Data & Service Registration (Structural + Semantic)
5. Scientific Workflows
GEON PI Meeting, VTech March 21—23rd 2004
Ontology Cheat Sheet (1/2)
• What is an ontology? An ontology usually …
– specifies a theory (a set of models) by …
– defining and relating …
– concepts representing features of a domain of interest
• Also an overloaded (sometimes sloppy) term for:
– Controlled vocabularies
– Database schema (relational, XML, …)
– Conceptual schema (ER, UML, …)
– Thesauri (synonyms, broader term/narrower term)
– Taxonomies
– Informal/semi-formal representations
• “Concept spaces”, “concept maps”
• Labeled graphs / semantic networks (RDF)
– Formal ontologies, e.g., in [Description] Logic (OWL)
• “formalization of a specification”
→ constrains possible interpretation of terms
A Multi-Hierarchical Rock Classification “Ontology” (GSC)
[Figure: classification hierarchies by Genesis, Fabric, Composition, Texture]
Ontology Cheat Sheet (2/2)
• What are ontologies used for?
– Conceptual models of a domain or application,
(communication means, system design, …)
– Classification of …
• concepts (taxonomy) and
• data/object instances through classes
– Analysis of ontologies e.g.
• Graph queries (reachability, path queries, …)
• Reasoning (concept subsumption, consistency checking, …)
– Targets for semantic data registration
– Conceptual indexes and views for
• searching,
• browsing,
• querying, and
• integration of registered data
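The graph-query and subsumption uses listed above can be illustrated with a minimal Python sketch. The is-a edges are taken from the LTER ontology snippet shown later in this talk; the function names `ancestors` and `is_subsumed_by` are hypothetical, and a real description-logic reasoner does far more than follow told subclass edges.

```python
# Told subclass (is-a) edges, child -> parents. Illustrative subset only.
IS_A = {
    "SBLTERSite": ["LTERSite"],
    "LTERSite": ["Location"],
    "AbundanceCollectionEvent": ["DataCollectionEvent"],
    "SpeciesAbundance": ["Measurement"],
    "SpeciesCount": ["MeasurableItem"],
}


def ancestors(concept, is_a=IS_A):
    """Reachability query: every concept reachable from `concept` via is-a edges."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(sub, sup, is_a=IS_A):
    """Told subsumption: sub ⊑ sup if sup is sub itself or reachable from it."""
    return sub == sup or sup in ancestors(sub, is_a)
```

For example, `is_subsumed_by("SBLTERSite", "Location")` follows two is-a edges, whereas the reverse direction fails.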
Application Example: Geologic Map Integration
[Figure: geologic maps (Nevada) integrated using domain knowledge]
GEON Metamorphism Equation:
Geoscientists + Computer Scientists
→ (+/- Energy, +/- a few hundred million years) →
Igneous Geoinformaticists
Geologic Map Integration in the Portal
• After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way.
Concept-Based Queries and Analysis
• After registering a source with one or more ontologies, concept-based queries and analysis can be launched
• Here: light-weight client-side processing (SVG)
Ontologies and Data Management
• Where do ontologies fit within data management architectures?
• Several answers, specifically:
– An ontology is similar to a schema or conceptual model (if one exists), but it is:
• developed independently of a particular application
• probably given in a different language
• inherently more general
• usually not a very good schema (weak structure)
Ontologies and Data Management
(watch out for Semantic Data Registration later)
[Diagram. Labels: Ontology; “use concepts from (explicitly or implicitly)”; Conceptual Model; Schema; Design Artifact; → Metadata; Data]
Creating and Sharing Concept Maps
(here: Seismology concept map & Cmap tool)
• Lock up scientists for 2+ days
• Add CS/KRDB types
• Create concept maps
• Refine
• Iterate
→ from napkin drawings, to concept maps, to ontologies
Graph (RDF) Queries on Ontologies
RQL query: Show all “products”
[Figure: visualisation of the ontology graph and the query results]
Community-Based Ontology Development
Current concept maps and emerging ontologies:
1. Igneous Rocks/Plutons
2. Seismology
3. Geochemistry
• Draft of a geochemistry ontology developed by scientists
Protégé (… not so ezOWL yet…)
Sparrow (a poor man’s OWL tool …)
Simple ASCII-based RDF and OWL entry and manipulation
Semantic Data Registration
(joint work w/ Shawn Bowers)
What is Data/Ontology/… Registration?
• A mechanism by which data sources, ontologies, services, …
• … are published in a repository/registry
• for the purpose of “smart” discovery, querying, integration
Things to Register
• Data files (individual files)
– Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g., satellite data)
• Databases (have a schema and can be queried)
– Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications
Connecting Datasets to Ontologies
Ontology (snippet)
DataCollectionEvent       ⊑ contains.Measurement
Measurement               ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
MeasurementContext        ⊑ hasTime.DateTime ⊓ hasLocation.Location
MeasurableItem            ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
SpeciesCount              ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit …
SpeciesAbundance          ⊑ Measurement ⊓ measureOf.SpeciesCount
AbundanceCollectionEvent  ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
Location                  ⊑ position.Coordinate
LTERSite                  ⊑ Location
SBLTERSite                ⊑ LTERSite ⊓ position.SBLTERCoordinate
{naples,…}                ⊑ SBLTERSite

How can we “register” the dataset to concepts in the Ontology?

Dataset
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57
Step1: Selecting Relevant Concepts
Concepts from an Ontology
• DataCollectionEvent
• AbundanceCollectionEvent
• Location
• LTERSite
• SBLTERSite
• naples
• Measurement
• Abundance
• SpeciesAbundance
• MeasurementContext
•…
• MeasurableItem
• SpeciesCount
• Species
•…
Dataset
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57
Step2: Generate Object Model
Concepts from an Ontology
• DataCollectionEvent
• AbundanceCollectionEvent
• Location
• LTERSite
• SBLTERSite
• naples
• Measurement
• Abundance
• SpeciesAbundance
• MeasurementContext
•…
• MeasurableItem
• SpeciesCount
• Species
•…
[Object model diagram. Nodes: AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, DateTime, RatioValue, Species, RatioUnit, SBLTERSite. Edge labels: contains, measureOf, hasValue, hasTime, hasLoc, hasSpecies, hasUnit]
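As a rough illustration of what registering the dataset against this object model could look like, the Python sketch below maps each dataset row to a nested object that uses the Step 2 node and edge labels. How the edges attach to the nodes is an assumption (the diagram's layout is not recoverable from the transcript), and `register_row` is a hypothetical name, not part of any GEON or SEEK tool.

```python
import csv
import io

# The dataset from the slides, written as CSV text.
DATASET = """\
Date,Site,Transect,SP_Code,Count
2000-09-08,CARP,1,CRGI,0
2000-09-08,CARP,4,LOCH,0
2000-09-08,CARP,7,MUCA,1
2000-09-22,NAPL,7,LOCH,1
2000-09-18,NAPL,1,PAPA,5
2000-09-28,BULL,1,CYOS,57
"""


def register_row(row):
    """Map one row to an object shaped like the Step 2 model.

    Assumption: contains/measureOf/hasTime/hasLoc/hasValue/hasSpecies/hasUnit
    attach as shown here; the slide's exact edge placement is unknown.
    """
    return {
        "type": "AbundanceCollectionEvent",
        "contains": {
            "type": "SpeciesAbundance",
            "hasTime": row["Date"],
            "hasLoc": {
                "type": "SBLTERSite",
                "site": row["Site"],
                "transect": int(row["Transect"]),
            },
            "hasValue": int(row["Count"]),
            "measureOf": {
                "type": "SpeciesCount",
                "hasSpecies": row["SP_Code"],
                "hasUnit": "RatioUnit",
            },
        },
    }


events = [register_row(r) for r in csv.DictReader(io.StringIO(DATASET))]
```

Once rows are lifted to this shape, a concept-level query such as "all SpeciesAbundance measurements at a given site" no longer depends on the column names of the original table.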
Applications of Semantic Registration
• Mentioned before:
– Smart data discovery, integration etc.
• New application:
– Generating data transformations semi-automatically for chaining together computational services
Problem: Service Reusability
• Unless “designed to fit,” independent services are structurally incompatible
• Generally, the source output type will not be a subtype of the target input type
[Diagram: Source Service (output Ps, Structural Type Ps) → Desired Connection → Target Service (input Pt, Structural Type Pt); the structural types are incompatible (⋠)]
Service Reusability
• A data transformation mapping (δ) is required to connect the services … artificially creating subtype compatibility
• If such a δ exists, the services are “structurally feasible”
[Diagram: Structural Type Ps and Structural Type Pt are incompatible (⋠); applying δ to the source output gives δ(Ps) ≺ Pt, which enables the Desired Connection from Source Service (Ps) to Target Service (Pt)]
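The idea of structural feasibility can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: structural types are modeled as flat records (field name to Python type) with width subtyping, whereas the slides concern XML-style structural types. The field names are borrowed from the XQuery example later in the talk; `is_subtype`, `delta`, and the sample value are hypothetical.

```python
def is_subtype(ps, pt):
    """Record subtyping: ps fits pt if ps has every field of pt with the same type."""
    return all(name in ps and ps[name] is ftype for name, ftype in pt.items())


# Hypothetical structural types of a source output and a target input.
PS = {"cnt": int, "lsp": str}
PT = {"obs": int, "phase": str}


def delta(record):
    """Hypothetical transformation: restructures a Ps record so it fits Pt."""
    return {"obs": record["cnt"], "phase": record["lsp"]}


def type_of(record):
    """Structural type of a concrete record."""
    return {name: type(value) for name, value in record.items()}


sample = {"cnt": 5, "lsp": "adult"}

print(is_subtype(PS, PT))                      # source and target do not fit directly
print(is_subtype(type_of(delta(sample)), PT))  # after applying delta they do
```

Here the two services are incompatible as given, but because a suitable `delta` exists they count as structurally feasible in the sense of the second bullet.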
Service Reusability
• Idea:
– annotate services with semantic types (concept expressions) primarily for discovery of services
[Diagram: Semantic Type Ps and Semantic Type Pt, both drawn from Ontologies (OWL), are compatible (⊑); Desired Connection from Source Service (Ps) to Target Service (Pt)]
Service Reusability
• Services can be semantically compatible, but structurally incompatible
[Diagram: Semantic Type Ps and Semantic Type Pt are compatible (⊑) with respect to Ontologies (OWL), while Structural Type Ps and Structural Type Pt are incompatible (⋠); a transformation δ with δ(Ps) ≺ Pt enables the Desired Connection from Source Service (Ps) to Target Service (Pt)]
The Ontology-Driven Framework
(work w/ Shawn Bowers, SEEK)
[Diagram: Semantic Type Ps and Semantic Type Pt, from Ontologies (OWL), are compatible (⊑). Registration mappings (input and output) link the structural types to the semantic types. From these a Correspondence between Structural Type Ps and Structural Type Pt is derived and used to generate the Transformation δ(Ps), which enables the Desired Connection from Source Service (Ps) to Target Service (Pt)]
Example Generated Data Transformation (in XQuery)
• Based on the structural correspondences and certain assumptions, we derive the transformation query:
<cohortTable>
{ for $s in /population/sample return
    <measurement>
    { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
    { for $l in $s/lsp return <phase>{$l/text()}</phase> }
    </measurement>
}
</cohortTable>
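To make the effect of the query concrete, here is a rough Python equivalent using only the standard library. The input document is a hypothetical example constructed to match the paths in the query (`/population/sample`, `meas/cnt`, `lsp`); the values and the function name `to_cohort_table` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document shaped like the paths used in the query.
SOURCE = """
<population>
  <sample><meas><cnt>5</cnt></meas><lsp>adult</lsp></sample>
  <sample><meas><cnt>12</cnt></meas><lsp>larva</lsp></sample>
</population>
"""


def to_cohort_table(population):
    """Build <cohortTable> with one <measurement> per <sample>."""
    table = ET.Element("cohortTable")
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        # Each meas/cnt value becomes an <obs> element.
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        # Each lsp value becomes a <phase> element.
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table


result = to_cohort_table(ET.fromstring(SOURCE.strip()))
print(ET.tostring(result, encoding="unicode"))
```

The two inner loops mirror the two `for` clauses of the query: the transformation only renames and regroups elements, which is why it can be derived from structural correspondences.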
Scientific Workflows
(Efrat Jaeger et al.)
Reverse Engineering a Scientific Workflow
using the KEPLER Tool (Efrat Jaeger)
A Scientific Workflow in Kepler
[Figure callouts: Extract mineral composition for row Id; Igneous Rock Diagrams information; Rock Name]
Reverse-Engineered the Geological Map
Integration in Kepler
DataMapper Sub-Workflow
Result launched via the BrowserUI actor
KEPLER and YOU
• Kepler …
– is a community-based, cross-project, open source collaboration
– for “minute made” application integration
– using web (grid) services as basic building blocks
– has a joint CVS repository, mailing lists, web site, …
– is gaining momentum thanks to contributors and contributions
• BSD-style license allows commercial spin-offs
– a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…
FIN – Questions?
Additional Material
The KEPLER GUI (Vergil from Ptolemy II)
Drag and drop utilities, director
and actor libraries.
Running the workflow
Distributed Workflows in KEPLER
• Web and Grid Service plug-ins
– WSDL
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
– SRB
– SSH, SCP
• Web Service Harvester
– Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• XSLT and XQuery transformers to link non-fitting services together
• Web Service Deployment (…ongoing work…)
A Generic Web Service Actor
Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
Configure - select service operation
Set Parameters and Commit
WS Actor after Instantiation
Web Service Harvester
• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.
Composing 3rd-Party WSs
[Figure: Output of previous web service → User interaction & Transformations → Input of next web service]
Providing DB Access through Kepler
• Database connection actor:
– Opens a database connection and passes it to all actors accessing this database.
• Database query actor:
– A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
– A new IOPort type and a token to distinguish a database connection from any general type.
Database Connection Actor
OpenDBConnection actor:
• Input: database connection information.
• Output: a DBConnectionToken (a reference to a database connection instance), sent through a DBConnection output port.
Database Query Actor
Database Query actor:
• Input: a query string (SQL) and a database connection reference.
• Parameters: output type (XML, Record, or String); whether to output each row separately or all rows at once.
• Process: execute the query and produce results according to the parameters.
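The division of labor between the two actors can be mimicked in a short Python sketch. This is an analogy using the standard `sqlite3` module, not Kepler's actor API (Kepler actors are Java classes); the function names, the `samples` table, and the `row_by_row` parameter are assumptions for illustration.

```python
import sqlite3
from xml.sax.saxutils import escape


def open_db_connection(database):
    """Stand-in for OpenDBConnection: returns a connection to hand downstream."""
    return sqlite3.connect(database)


def database_query(conn, sql, output_type="record", row_by_row=False):
    """Stand-in for the Database Query actor: run SQL, format per parameters."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    if output_type == "record":
        results = [dict(zip(columns, row)) for row in rows]
    elif output_type == "string":
        results = [", ".join(str(v) for v in row) for row in rows]
    elif output_type == "xml":
        results = [
            "<row>"
            + "".join(f"<{c}>{escape(str(v))}</{c}>" for c, v in zip(columns, row))
            + "</row>"
            for row in rows
        ]
    else:
        raise ValueError(f"unknown output type: {output_type}")
    # Either emit rows one at a time (an iterator) or all at once (a list).
    return iter(results) if row_by_row else results


# Hypothetical in-memory database so the sketch is self-contained.
conn = open_db_connection(":memory:")
conn.execute("CREATE TABLE samples (site TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?)", [("NAPL", 5), ("BULL", 57)])
```

Passing `conn` explicitly plays the role of the DBConnectionToken: several query steps can share one open connection instead of each opening its own.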
Querying Example
Resource Description Framework (RDF)
Simple data model that consists of
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)
Data organized into triples
(subject, property, value)
[Diagram: SonomaRegion (subject, resource) --locatedIn (property, resource)--> CaliforniaRegion (value, resource)]
locatedIn(SonomaRegion, CaliforniaRegion)
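The triple model can be illustrated with a tiny in-memory store in Python. The `http://example.org/geo#` namespace is a made-up placeholder, and `match` is a hypothetical helper; real RDF stores and query languages such as RQL offer much more.

```python
# Hypothetical namespace for the example resources.
EX = "http://example.org/geo#"

# Each triple is (subject, property, value); all three are resources here.
TRIPLES = {
    (EX + "SonomaRegion", EX + "locatedIn", EX + "CaliforniaRegion"),
}


def match(triples, s=None, p=None, v=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (v is None or t[2] == v)
    ]
```

Calling `match(TRIPLES, p=EX + "locatedIn")` returns every locatedIn statement, which is the simplest form of the graph queries mentioned earlier in the talk.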
RDF Schema
Adds a set of pre-defined properties to define classes and properties
Allows instances to be connected to classes
Sub-class and sub-property (is-a) relationships
Region is a class
locatedIn is a property
locatedIn connects Regions
[Diagram: SonomaRegion and CaliforniaRegion each have an rdf:type edge to the class Region; locatedIn appears once at the schema level (on Region) and once at the instance level (from SonomaRegion to CaliforniaRegion)]
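One way to state that "locatedIn connects Regions" in RDF Schema is with `rdfs:domain` and `rdfs:range`, from which the `rdf:type` of the instances follows by the standard RDFS entailment rules. The Python sketch below shows that inference on the slide's example. The slide itself draws the rdf:type edges as asserted, so treating them as inferred is an assumption made here for illustration, and `infer_types` is a hypothetical helper.

```python
# Schema-level statements (prefixed names used as plain strings).
SCHEMA = {
    ("Region", "rdf:type", "rdfs:Class"),
    ("locatedIn", "rdf:type", "rdf:Property"),
    ("locatedIn", "rdfs:domain", "Region"),
    ("locatedIn", "rdfs:range", "Region"),
}

# Instance-level statement from the slide.
DATA = {("SonomaRegion", "locatedIn", "CaliforniaRegion")}


def infer_types(schema, data):
    """Apply the RDFS domain and range rules to derive rdf:type triples."""
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema if p == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred
```

With these inputs, both SonomaRegion and CaliforniaRegion come out typed as Region.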
OWL
Adds additional pre-defined properties to further
constrain an ontology
(See http://www.w3.org/TR/owl-guide/)
Note, RDF(S) and OWL use XML
Some graphic tools exist (e.g., Protégé)
A Vintage is a class that is a subclass of an unnamed class whose instances always have one hasVintageYear property.
<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
Note the uglified XML syntax… The good news: meant for parsers, not humans!
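Since the syntax is meant for parsers, here is a minimal Python sketch that reads the restriction back out with the standard library. The enclosing `rdf:RDF` element and its namespace declarations are added so the fragment parses on its own; the function name `cardinality_restrictions` is hypothetical, and this is plain XML processing, not OWL reasoning.

```python
import xml.etree.ElementTree as ET

# Standard RDF, RDFS, and OWL namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# The slide's fragment, wrapped in rdf:RDF so that it is well-formed XML.
OWL_DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Vintage">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasVintageYear"/>
        <owl:cardinality>1</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def cardinality_restrictions(doc):
    """List (class, property, cardinality) for each restriction in the document."""
    root = ET.fromstring(doc)
    found = []
    for cls in root.findall("owl:Class", NS):
        name = cls.get("{%s}ID" % NS["rdf"])
        for restriction in cls.findall("rdfs:subClassOf/owl:Restriction", NS):
            prop = restriction.find("owl:onProperty", NS).get("{%s}resource" % NS["rdf"])
            card = int(restriction.find("owl:cardinality", NS).text)
            found.append((name, prop, card))
    return found
```

Applied to `OWL_DOC`, the function reports a single restriction on the Vintage class for the `#hasVintageYear` property.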