Ontology - San Diego Supercomputer Center

Semantic Mediation, Ontologies and Scientific Workflows
and all the rest (+/– Web Services)
Bertram Ludäscher
Knowledge-Based Information Systems Lab
San Diego Supercomputer Center
University of California San Diego
http://seek.ecoinformatics.org
http://www.geongrid.org
Outline
• Motivation (SEEK, GEON, ..)
• Ontologies 101
• Semantic Mediation, Data Registration, …
• Application Examples (Stargazing with Kepler…)
SDSC/LTER Workshop Feb’2004
Kepler Team, Projects, Sponsors
• Ilkay Altintas SDM
• Chad Berkley SEEK
• Shawn Bowers SEEK
• Jeffrey Grethe BIRN
• Christopher H. Brooks Ptolemy II
• Zhengang Cheng SDM
• Efrat Jaeger GEON
• Matt Jones SEEK
• Edward A. Lee Ptolemy II
• Kai Lin GEON
• Ashraf Memon GEON
• Bertram Ludaescher BIRN, GEON, SDM, SEEK
• Steve Mock NMI
• Steve Neuendorffer Ptolemy II
• Mladen Vouk SDM
• Yang Zhao Ptolemy II
• …
Ptolemy II
SEEK
Science Environment for Ecological Knowledge
• EcoGrid
– Uniform interfaces to manage environmental data
• Kepler
– Modeling scientific workflows
• Semantic Mediation System
– “Smart” data discovery and integration
• Knowledge Representation (SEEK-KR)
• Classification and Nomenclature (SEEK-TAXON)
• Biodiversity and Ecological Analysis and Modeling (SEEK-BEAM)
SEEK Overview
Building the EcoGrid
[Figure: the EcoGrid connecting sites such as NTL, AND, HBR, VCR, and LUQ through Metacat, VegBank, Xanthoria, SRB, and DiGIR nodes and legacy systems]
• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)
Heterogeneous Data integration
• Requires advanced metadata and processing
– Attributes must be semantically typed
– Collection protocols must be known
– Units and measurement scale must be known
– Measurement relationships must be known, e.g., that ArealDensity = Count/Area
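The requirement that attributes be semantically typed, with known units and measurement relationships, can be made concrete with a small sketch. The Python below is an illustration under assumptions, not SEEK code: the `Observation` type, the unit table, and the choice of square meters as the normalized area unit are all hypothetical. It derives ArealDensity = Count/Area only when both inputs carry the expected semantic types.

```python
from dataclasses import dataclass

# Hypothetical unit table: factors that convert an area value to square meters.
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


@dataclass(frozen=True)
class Observation:
    semantic_type: str  # ontology concept the value is labeled with
    value: float
    unit: str


def areal_density(count: Observation, area: Observation) -> Observation:
    """Derive ArealDensity = Count / Area, normalizing the area to m2 first."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("expected a Count and an Area observation")
    if area.unit not in AREA_TO_M2:
        raise ValueError(f"unknown area unit: {area.unit}")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    return Observation("ArealDensity", count.value / area_m2, "count/m2")


density = areal_density(Observation("Count", 50, "count"), Observation("Area", 0.5, "ha"))
```

Here 50 individuals counted over 0.5 ha give 0.01 count/m2; passing the arguments in the wrong order raises an error instead of silently producing a number.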
Semantic Mediation
• Label data with semantic types
• Label inputs and outputs of analytical components with semantic types
[Figure: data and workflow components, both linked to an ontology]
• Use reasoning engines to generate transformation steps
– Beware of analytical constraints
• Use reasoning engines to discover relevant components
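Discovering relevant components amounts, in its simplest form, to a subsumption test: a component fits a dataset if the component's input type is the dataset's semantic type or a more general one. The sketch below assumes a tiny hypothetical is-a hierarchy and hypothetical component names; a real reasoning engine would work over the full ontology, and this check deliberately ignores the analytical constraints mentioned above.

```python
# Hypothetical is-a hierarchy (child -> parent); not an actual SEEK ontology.
IS_A = {
    "SpeciesAbundance": "Measurement",
    "Biomass": "Measurement",
    "Measurement": "Thing",
}

# Hypothetical components, each with the semantic type of its single input.
COMPONENTS = {
    "AbundanceSummary": "SpeciesAbundance",
    "GenericStatistics": "Measurement",
    "BiomassModel": "Biomass",
}


def ancestors(concept: str) -> list[str]:
    """The concept itself followed by all of its superclasses."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def compatible_components(data_type: str) -> list[str]:
    """Components whose input type subsumes (equals or generalizes) data_type."""
    return sorted(name for name, input_type in COMPONENTS.items()
                  if input_type in ancestors(data_type))
```

For SpeciesAbundance data this finds both the specific AbundanceSummary component and the more general GenericStatistics component, but not BiomassModel.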
Ecological ontologies
• What was measured (e.g., biomass)
• Type of measurement (e.g., Energy)
• Context of measurement (e.g., Psychotria limonensis)
• How it was measured (e.g., dry weight)
• SEEK intends to enable community-created ecological ontologies using OWL
– Represents a controlled vocabulary for ecological metadata
• More about this in Bertram’s talk
Ontologies 101 (based on a tutorial by Shawn Bowers and CSE291)
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
What are ontologies?
It depends on who you ask
We focus on the data-management view
Generally speaking, an ontology specifies a theory (a model) by defining and relating generic concepts representing features of the real or abstract world (a domain of interest) [Bunge]
Concepts, Symbols, and Things
• Humans use symbols (e.g., words) to communicate
• Words are mapped to things indirectly through concepts that denote (refer to) things
[Figure: semiotic triangle linking the symbol “Jaguar”, a concept, and the thing it denotes]
Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York: Harcourt, Brace & World, Inc.
[Carole Goble, Nigel Shadbolt]
Concepts, Symbols, and Things
Symbols and concepts are not precise
– The same symbol can stand for multiple things
– The same thing can have multiple symbols
– Concepts are usually not well-defined
[Carole Goble, Nigel Shadbolt]
Concepts, Symbols, and Things
An ontology attempts to define and relate specific concepts for certain sets of things via agreed-upon symbols
What are ontologies?
Ontologies are typically created to:
– Commit to a definition (a model) of a domain
– Explicitly state assumptions concerning the definition
– Have a wide scope (be general)
– Support exchange and integration of heterogeneous data sources and applications (more on this later…)
What are ontologies?
Ontologies may be expressed
– Informally using natural language (e.g., in philosophy and sometimes biology)
– Formally using a mathematical language, e.g., first-order logic
We focus on formal ontologies
– To be precise about what the theory proposes
What are ontologies?
Formal ontologies can vary in detail; in order of increasing expressiveness:
• Controlled vocabulary (list of terms)
• Simple thesaurus (synonyms)
• Thesaurus (broader/narrower terms)
• Classification (class, instance, is-a, maybe part-of)
• Classification with value and cardinality constraints
• Classification with axioms such as disjointness, union, etc.
• Classification with general logic constraints
Class, Instance, and Is-a
[Figure: Jaguar is-a Animal]
“Every Jaguar is an Animal”: ∀x. Jaguar(x) → Animal(x)
The class Animal denotes a set of things (instances); the class Jaguar denotes a set of things (instances) that is contained in it.
Properties and Cardinality Constraints
[Figure: Jaguar is-a Carnivore is-a Animal, with an eats property]
A cardinality constraint might state that carnivores must eat at least one Animal
Question: Must Jaguars eat at least one Animal?
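Because is-a means every Jaguar is also a Carnivore, a constraint stated on Carnivore applies to every Jaguar. The sketch below makes that inheritance explicit with hypothetical individuals (`jag1`, `jag2`, `deer1`). It is a closed-world validation check, which differs from OWL's open-world semantics: an OWL reasoner would not flag `jag2`, but would instead infer that it eats some unknown Animal.

```python
# Hypothetical class hierarchy from the slide: child -> parent.
IS_A = {"Jaguar": "Carnivore", "Carnivore": "Animal",
        "MarshDeer": "Herbivore", "Herbivore": "Animal"}


def superclasses(cls):
    """Return cls and all of its ancestors along is-a edges."""
    result = {cls}
    while cls in IS_A:
        cls = IS_A[cls]
        result.add(cls)
    return result


def is_a(cls, target):
    return target in superclasses(cls)


# (constrained class, property, filler class, minimum count):
# "every Carnivore eats at least one Animal"
MIN_CARDINALITY = [("Carnivore", "eats", "Animal", 1)]

# Hypothetical individuals and facts.
TYPE = {"jag1": "Jaguar", "jag2": "Jaguar", "deer1": "MarshDeer"}
FACTS = {("jag1", "eats", "deer1")}


def violations(types, facts):
    """Closed-world check of minimum-cardinality constraints, inherited via is-a."""
    found = []
    for ind, cls in sorted(types.items()):
        for c, prop, filler, n in MIN_CARDINALITY:
            if is_a(cls, c):
                count = sum(1 for (s, p, o) in facts
                            if s == ind and p == prop and is_a(types[o], filler))
                if count < n:
                    found.append((ind, c, prop, filler, n))
    return found
```

`violations(TYPE, FACTS)` reports only `jag2`: the constraint was never stated on Jaguar, yet it applies through the is-a chain.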
Value Restrictions
[Figure: Jaguar is-a Carnivore is-a Animal, with an eats property]
A value restriction for Jaguar might restrict the eats property to the specific animals eaten by Jaguars
Value Restrictions
Jaguars restrict the eats relationship to Marsh Deer, …
[Figure: class diagram with Animal, Carnivore, Herbivore, Jaguar, and Marsh Deer, linked by subclass edges and the eats property]
Value Restrictions
Does anyone see a problem with this choice of representation?
Value Restrictions
These different representations propose the same basic underlying theory
[Figure: alternative representation introducing a class JaguarFood alongside Animal, Herbivore, Carnivore, Marsh Deer, Peccary, and Jaguar, connected by the eats property]
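A value restriction of the form Jaguar ⊑ ∀eats.JaguarFood says that everything a Jaguar eats must be JaguarFood. The sketch below assumes, as the JaguarFood representation suggests, that Marsh Deer and Peccary are both Herbivores and JaguarFood. Elephant is a made-up counterexample, and the individuals are hypothetical. It flags offending facts in closed-world style; an OWL reasoner would instead infer that the eaten individual is JaguarFood, and would report an inconsistency only if, say, Elephant were declared disjoint from JaguarFood.

```python
# Hypothetical hierarchy: each class maps to its set of direct parents.
SUPER = {
    "Carnivore": {"Animal"}, "Herbivore": {"Animal"}, "JaguarFood": {"Animal"},
    "Jaguar": {"Carnivore"},
    "MarshDeer": {"Herbivore", "JaguarFood"},
    "Peccary": {"Herbivore", "JaguarFood"},
    "Elephant": {"Herbivore"},
}


def superclasses(cls):
    """Reflexive-transitive closure of the parent relation (multiple parents allowed)."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUPER.get(c, ()))
    return seen


# Value restriction: Jaguar ⊑ ∀eats.JaguarFood
ALL_VALUES_FROM = {("Jaguar", "eats"): "JaguarFood"}


def violating_facts(types, facts):
    """Return (s, p, o) facts whose object falls outside the restricted range."""
    bad = []
    for s, p, o in sorted(facts):
        for (cls, prop), filler in ALL_VALUES_FROM.items():
            if (p == prop and cls in superclasses(types[s])
                    and filler not in superclasses(types[o])):
                bad.append((s, p, o))
    return bad
```

With a Jaguar eating a Marsh Deer, a Peccary, and an Elephant, only the Elephant fact is reported.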
What are ontologies?
An (informal) ontology of wine:
Wines are potable liquids made by wineries within regions and with specific vintages.
Wines are characterized by the type of grape they are made with, their color (white, rose, red), their sugar (dry, off-dry, or sweet), their body (light, medium, full), and their flavor (delicate, moderate, strong).
Sauvignon Blanc, Merlot, Pinot Noir, and Riesling are types of wines.
[OWL Guide]
Exercise
With a partner, take 5 minutes and try to define a “formal” ontology for the wine example
– Select two or three classes
– Identify some relationships between them
– List any constraints (cardinality or value restrictions) that exist between them
What are ontologies?
(Philosophy) An ontological theory can answer “ontological” questions:
– Is Merlot a potable liquid?
– Are there wines made of things other than grapes?
– How are Pinot Gris and Pinot Noir related?
– Are there white wines that are dry, full, and strong made in Napa Valley?
We will look at other uses later
[Bunge]
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of using ontologies
• Constructing ontologies
• Breakout Exercises
Ontologies and Data Management
Where do ontologies fit within data-management architectures?
There is no specific answer to this question…
However, an ontology is similar to a schema or conceptual model (if one exists), but is
– Developed independently of a particular application
– Probably given in a different language
– Inherently more general
– Usually not a very good schema (weak structure)
Ontologies and Data Management
(→ watch out for Semantic Data Registration later)
[Figure: design artifacts such as conceptual models and schemas use concepts from the ontology (explicitly or implicitly); metadata and data sit below the schemas]
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
Benefits of ontologies
Ontologies are often developed within a community and are interdisciplinary
They explicitly capture “knowledge” about a domain
– Standard terms (symbols) for metadata values and schema design
– Enable advanced searching techniques (via reasoning)
– Enable exchange and integration
Benefits of ontologies
Ontologies for metadata keywords
{sonoma county, wine}
{cabernet sauvignon, sonoma county, …}
{medium, red, dry, …}
Benefits of ontologies
Ontologies for metadata keywords
Find information about dry California red wines
{sonoma region, wine}
{cabernet sauvignon, sonoma region, …}
{medium, red, dry, …}
We use the ontology to “expand” and/or “focus” the query, e.g., using the facts that cabernet sauvignon is red and dry and that Sonoma is in California
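Query expansion can be sketched as closing each dataset's keyword set under the ontology's implications before matching. The implication table and keyword sets below are hypothetical and only loosely based on the slide (its elided “…” terms are left out). Note that a real system would rather follow is-a and part-of relations than a flat table.

```python
# Hypothetical ontology facts: a term implies broader or related terms.
IMPLIES = {
    "cabernet sauvignon": {"wine", "red", "dry"},
    "sonoma region": {"california"},
}

# Keyword sets loosely based on the slide (its elided terms are left out).
DATASETS = {
    "ds1": {"sonoma region", "wine"},
    "ds2": {"cabernet sauvignon", "sonoma region"},
    "ds3": {"medium", "red", "dry"},
}


def expand(keywords):
    """Add every term implied, directly or transitively, by the ontology."""
    result, stack = set(), list(keywords)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(IMPLIES.get(term, ()))
    return result


def search(query_terms):
    """Datasets whose expanded keywords contain every query term."""
    return sorted(name for name, kws in DATASETS.items()
                  if set(query_terms) <= expand(kws))
```

The query {dry, california, red, wine} matches only ds2 after expansion; without expansion, none of the three raw keyword sets contains all four terms.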
Benefits of ontologies
[Figure: three datasets (wines by regions, wine sales, region characteristics) are integrated and then analyzed to answer: What regional characteristics produce the best-selling wines?]
Integration can be extremely complex due to structural (schema and values) and semantic (ontological) differences
Ontologies can help!
Benefits of ontologies
[Figure: the same three datasets, integrated via an ontology for analysis of the question: What regional characteristics produce the best-selling wines?]
An ontology provides a uniform view of disparate sources
Registering datasets with ontologies:
– Map structure (schema) to concepts
– Map data to classes/instances (various ways to do this…)
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
Constructing ontologies
Various Web-based standards are emerging for defining ontologies
– XML Schema
• Mainly for defining “vocabularies” and less-formal ontologies (term-based is-a, some constraints)
• Mainly a structural/schema representation
– Topic Maps
• For advanced thesauri, subject indexes
– RDF/RDFS/OWL
• Formal ontologies based on description logics (fragments of first-order logic) and semantic networks (more informal)
Resource Description Framework (RDF)
Simple data model that consists of
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)
Data organized into triples (subject, property, value)
[Figure: the triple SonomaRegion (subject, a resource), locatedIn (property, a resource), CaliforniaRegion (value, a resource)]
locatedIn(SonomaRegion, CaliforniaRegion)
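The triple model is simple enough to sketch with plain Python tuples and wildcard pattern matching. This is an illustration only: plain names stand in for full URIs, the NapaRegion triple is hypothetical, and a real application would typically use an RDF library (for example rdflib) for URIs, literals, and serialization.

```python
# A triple is (subject, property, value). Plain names stand in for full URIs here.
Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),
    ("NapaRegion", "locatedIn", "CaliforniaRegion"),  # hypothetical extra triple
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))
```

`match(GRAPH, p="locatedIn", o="CaliforniaRegion")` asks “what is located in CaliforniaRegion?” and returns both region triples.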
RDF Schema
Adds a set of pre-defined properties to define classes and properties
Allows instances to be connected to classes
Sub-class and sub-property (is-a) relationships
– Region is a class
– locatedIn is a property
– locatedIn connects Regions
[Figure: at the schema level, locatedIn connects Region to Region; at the instance level, SonomaRegion and CaliforniaRegion are rdf:type Region, and SonomaRegion locatedIn CaliforniaRegion]
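RDFS's pre-defined properties come with entailment rules. The RDF Semantics specification includes rdfs2 (domain), rdfs3 (range), and rdfs9 (subClassOf). The sketch below applies just these three rules to a fixpoint. It assumes that “locatedIn connects Regions” is modeled as rdfs:domain and rdfs:range Region, and adds a hypothetical superclass Place; names are abbreviated rather than full URIs.

```python
# Triples with abbreviated names; real RDF would use full URIs.
TRIPLES = {
    ("locatedIn", "rdfs:domain", "Region"),   # assumption: "locatedIn connects Regions"
    ("locatedIn", "rdfs:range", "Region"),
    ("Region", "rdfs:subClassOf", "Place"),   # hypothetical superclass
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),
}


def rdfs_closure(triples):
    """Apply rdfs2 (domain), rdfs3 (range) and rdfs9 (subClassOf) until nothing new appears."""
    graph = set(triples)
    while True:
        new = set()
        for subj, pred, obj in graph:
            if pred == "rdfs:domain":        # subj is a property, obj its domain class
                new |= {(s, "rdf:type", obj) for (s, p, _) in graph if p == subj}
            elif pred == "rdfs:range":       # subj is a property, obj its range class
                new |= {(o, "rdf:type", obj) for (_, p, o) in graph if p == subj}
            elif pred == "rdfs:subClassOf":  # instances of subj are instances of obj
                new |= {(x, "rdf:type", obj) for (x, p, c) in graph
                        if p == "rdf:type" and c == subj}
        if new <= graph:
            return graph
        graph |= new
```

From the single locatedIn fact, the closure derives that both regions are of type Region and, through the subclass rule, of type Place.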
OWL
Adds additional pre-defined properties to further constrain an ontology
(See http://www.w3.org/TR/owl-guide/)
Note: RDF(S) and OWL are commonly written in XML (the RDF/XML syntax)
Some graphical tools exist (e.g., Protégé)
A Vintage is a class that is a subclass of an unnamed class whose instances always have exactly one hasVintageYear property.
<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
Note the uglified XML syntax… The good news: it is meant for parsers, not humans!
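Since RDF/XML is meant for parsers, a few lines with Python's standard `xml.etree.ElementTree` can pull the cardinality restriction out of the Vintage example. The sketch wraps the slide's snippet in an `rdf:RDF` element with the standard namespaces. It checks hypothetical individuals (`v1`, `v2`) against the extracted restriction under a closed-world reading, which is a data-validation view rather than OWL reasoning.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# The Vintage class from the slide, wrapped in an rdf:RDF element with namespaces.
OWL_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Vintage">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasVintageYear"/>
        <owl:cardinality>1</owl:cardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""

RDF_ID = "{%s}ID" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def cardinality_restrictions(xml_text):
    """Return (class, property, n) for each owl:cardinality restriction on a named class."""
    root = ET.fromstring(xml_text)
    found = []
    for cls in root.findall("owl:Class", NS):
        for r in cls.findall("rdfs:subClassOf/owl:Restriction", NS):
            prop = r.find("owl:onProperty", NS)
            card = r.find("owl:cardinality", NS)
            if prop is not None and card is not None:
                found.append((cls.get(RDF_ID), prop.get(RDF_RESOURCE).lstrip("#"),
                              int(card.text)))
    return found


def check(instances, restrictions):
    """Closed-world check: (individual, property, actual count) where the count differs."""
    problems = []
    for ind, (cls, values) in sorted(instances.items()):
        for c, prop, n in restrictions:
            actual = len(values.get(prop, []))
            if cls == c and actual != n:
                problems.append((ind, prop, actual))
    return problems


# Hypothetical individuals: v2 has two vintage years.
INSTANCES = {
    "v1": ("Vintage", {"hasVintageYear": [1998]}),
    "v2": ("Vintage", {"hasVintageYear": [1998, 1999]}),
}
```

Parsing yields the single restriction (Vintage, hasVintageYear, 1), and the check reports `v2` with two values.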
Protégé
Description Logic
A language and syntax for describing “concept” logics
– Concept names C (denote sets of instances)
– Class definitions D (denote sets of instances)
– Subclass definition C ⊑ D
– Equivalence definition C ≡ D
– Definition constructors
• intersection D ⊓ D
• union D ⊔ D
• Property existence ∃hasProp.D
• Property restriction ∀hasProp.D
• Cardinality =1 hasProp.D, >1 hasProp.D, <2 hasProp.D
Description Logic
Wine ⊑ PotableLiquid ⊓ ∃hasColor.{Red, Rose, White}
The class Wine is a sub-class of the PotableLiquids that have at least one (exists one) hasColor value that is Red, Rose, or White
WhiteWine ≡ Wine ⊓ ∃hasColor.{White}
WhiteWines are exactly the Wines whose color is White
WhiteBurgundy ⊑ WhiteWine ⊓ Burgundy
The set of WhiteBurgundy wines is a subset of the set of WhiteWines intersected with Burgundy wines
SauvignonBlanc ⊑ WhiteWine ⊓ =1 madeFromGrape.SauvignonBlancGrape
Every SauvignonBlanc is a WhiteWine with exactly one madeFromGrape value that is a SauvignonBlancGrape
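The set semantics behind these axioms (⊑ as ⊆, ⊓ as ∩, ∃ and =n as counts over property pairs) can be checked directly in one small, hypothetical interpretation. Note that this is model checking in a single finite world, not DL reasoning, which must consider every possible model. The individuals `w1`, `w2`, `w3`, `juice1`, and `grape1` are made up.

```python
# A hypothetical finite interpretation: classes are sets of individuals,
# properties are sets of (subject, object) pairs.
POTABLE_LIQUID = {"w1", "w2", "w3", "juice1"}
WINE = {"w1", "w2", "w3"}
BURGUNDY = {"w3"}
WHITE_BURGUNDY = {"w3"}
SAUVIGNON_BLANC = {"w1"}
SAUVIGNON_BLANC_GRAPE = {"grape1"}
HAS_COLOR = {("w1", "White"), ("w2", "Red"), ("w3", "White"), ("juice1", "Red")}
MADE_FROM_GRAPE = {("w1", "grape1")}


def exists(prop, cls):
    """Extension of ∃prop.cls: subjects with at least one prop value in cls."""
    return {s for (s, o) in prop if o in cls}


def exactly(n, prop, cls):
    """Extension of =n prop.cls, for n >= 1: subjects with exactly n prop values in cls."""
    counts = {}
    for s, o in prop:
        if o in cls:
            counts[s] = counts.get(s, 0) + 1
    return {s for s, k in counts.items() if k == n}


# WhiteWine ≡ Wine ⊓ ∃hasColor.{White}: the definition determines the extension.
WHITE_WINE = WINE & exists(HAS_COLOR, {"White"})


def axioms_hold():
    """Check the ⊑ axioms in this one interpretation (⊑ is ⊆, ⊓ is ∩)."""
    return (
        WINE <= (POTABLE_LIQUID & exists(HAS_COLOR, {"Red", "Rose", "White"}))
        and WHITE_BURGUNDY <= (WHITE_WINE & BURGUNDY)
        and SAUVIGNON_BLANC <= (WHITE_WINE & exactly(1, MADE_FROM_GRAPE, SAUVIGNON_BLANC_GRAPE))
    )
```

Because WhiteWine is defined with ≡, its members (w1 and w3) are computed rather than listed; the three ⊑ axioms are then plain subset tests.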
Constructing Ontologies
In general, creating an ontology is hard:
– Requires general agreement and understanding of a domain
– Requires a clear, concise, and unambiguous definition
– May provoke controversy
– Is a hard data-modeling problem (complex constraints, broad domain)
Outline
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises
Breakout Exercises
Divide into the same groups as yesterday
Develop an ontology for the domain you worked on:
• Define relevant concepts
• Define relationships among concepts
• If you have time, work on simple constraints (cardinality, value restrictions)
Capture (on paper, or in PPT if you feel ambitious) your ontology in whatever way makes sense to you (e.g., as circle-line drawings or as a list of terms and properties). What assumptions did you make in creating your ontology?
If you have time, develop a scenario for your ontology in terms of your workflow, for example, to show how your ontology could help integration or querying.
Some References
Mario Bunge. Treatise on Basic Philosophy, Vol. 3, Ontology I: The Furniture of the World. D. Reidel Publishing Company, 1977.
Nicola Guarino. Formal ontology and information systems. In Proc. of Formal Ontology in Information Systems, IOS Press, pp. 3-15, 1998.
Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, 1993.
Jeffrey Parsons and Yair Wand. Emancipating instances from the tyranny of classes in information modeling. ACM Transactions on Database Systems, 25(2):228-268, 2000.
Michael Smith, Chris Welty, and Deborah McGuinness. OWL Web Ontology Language Guide. W3C Proposed Recommendation. (http://www.w3.org/TR/owl-guide/). Includes Wine Ontology.
Protégé. Stanford Medical Informatics. http://protege.stanford.edu/index.html. Freely available. Lots of plug-ins.
Data Registration
What is Data Registration?
• A mechanism by which data sources are published in a repository or registry for the purpose of
– data discovery, querying, retrieval (“get”, “copy”), update, transformation, migration, application binding, query planning, concept-based rewriting, …
Things to Register
• Data files (individual files)
– e.g., shapefile as a blob (+ file type)
• Collections (of files or subcollections)
• Ontologies
• Services (web + grid services)
• Databases (have a schema and can be queried)
– e.g., shapefile as a DB with schema registered
– schemas (relational, XML, …)
– local integrity constraints
– access information (connection mechanism, protocols, query capabilities, handles to actual data)
– registration constraints to (identifiable/registered) ontologies (aka “registration mappings”)
Things to register (w/ metadata!)
aka Registration Objects
• Data files (individual files)
– Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g., satellite data)
• Databases (have a schema and can be queried)
– Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications
Connecting Datasets to Ontologies
Ontology (snippet):
DataCollectionEvent ⊑ contains.Measurement
Measurement ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
MeasurementContext ⊑ hasTime.DateTime ⊓ hasLocation.Location
MeasurableItem ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
SpeciesCount ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit …
SpeciesAbundance ⊑ Measurement ⊓ measureOf.SpeciesCount
AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
Location ⊑ position.Coordinate
LTERSite ⊑ Location
SBLTERSite ⊑ LTERSite ⊓ position.SBLTERCoordinate
{naples,…} ⊑ SBLTERSite
How can we “register” the dataset to concepts in the Ontology?
Dataset:
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57
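One way to picture registering this dataset is to map columns to concepts (structure) and cell values to ontology instances (data). The sketch below is a hypothetical illustration, not the SEEK registration system: the column-to-concept assignments in the comments are assumptions, and only NAPL is mapped, because `naples` is the only SBLTERSite instance named in the snippet. Unmapped site codes are returned for the data provider to resolve.

```python
from dataclasses import dataclass

# The dataset from the slide: (Date, Site, Transect, SP_Code, Count).
ROWS = [
    ("2000-09-08", "CARP", 1, "CRGI", 0),
    ("2000-09-08", "CARP", 4, "LOCH", 0),
    ("2000-09-08", "CARP", 7, "MUCA", 1),
    ("2000-09-22", "NAPL", 7, "LOCH", 1),
    ("2000-09-18", "NAPL", 1, "PAPA", 5),
    ("2000-09-28", "BULL", 1, "CYOS", 57),
]

# Hypothetical value mapping: site codes -> SBLTERSite instances.
# Only "naples" appears in the ontology snippet, so CARP and BULL stay unmapped.
SITE_TO_INSTANCE = {"NAPL": "naples"}


@dataclass(frozen=True)
class SpeciesAbundance:
    """One registered row; comments give the assumed column-to-concept mapping."""
    time: str      # Date     -> MeasurementContext.hasTime (DateTime)
    location: str  # Site     -> MeasurementContext.hasLocation (SBLTERSite)
    transect: int  # Transect -> finer-grained location, kept as-is
    species: str   # SP_Code  -> SpeciesCount.hasSpecies (Species)
    count: int     # Count    -> SpeciesCount.hasValue


def register(rows):
    """Map rows to SpeciesAbundance objects; return them plus unresolved site codes."""
    registered, unresolved = [], set()
    for date, site, transect, sp_code, count in rows:
        if site not in SITE_TO_INSTANCE:
            unresolved.add(site)  # left for the data provider to resolve by hand
            continue
        registered.append(
            SpeciesAbundance(date, SITE_TO_INSTANCE[site], transect, sp_code, count))
    return registered, sorted(unresolved)
```

`register(ROWS)` yields the two Naples rows as SpeciesAbundance objects and reports BULL and CARP as unresolved. This mirrors the manual mismatch resolution step in the Geology Workbench later in the talk.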
Purpose of Semantic Registration
Expose “hidden” information:
– What do attributes represent?
– What do specific values represent?
– What conceptual “objects” are in the dataset?
Capture connections between the dataset and ontology to:
– Find existing datasets (or parts of datasets) via ontological concepts (discovery)
– Enable fine-grain integration of datasets (mediation)
– Generate metadata for new data products (in a pipeline)
Semantic Registration Framework
Step 1: The data provider selects relevant ontological concepts (for the dataset)
Step 2: The semantic registration system creates a structural representation based on the chosen concepts (the data provider refines it if needed)
Step 3: The data provider maps the dataset information to the generated structural representation
Step1: Selecting Relevant Concepts
Concepts from an Ontology
• DataCollectionEvent
• AbundanceCollectionEvent
• Location
• LTERSite
• SBLTERSite
• naples
• Measurement
• Abundance
• SpeciesAbundance
• MeasurementContext
•…
• MeasurableItem
• SpeciesCount
• Species
•…
Dataset:
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57
Step2: Generate Object Model
Concepts from an Ontology
• DataCollectionEvent
• AbundanceCollectionEvent
• Location
• LTERSite
• SBLTERSite
• naples
• Measurement
• Abundance
• SpeciesAbundance
• MeasurementContext
• …
• MeasurableItem
• SpeciesCount
• Species
• …
[Figure: object model generated from the selected concepts, with nodes AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, Species, DateTime, SBLTERSite, RatioValue, and RatioUnit, linked by contains, measureOf, hasTime, hasLoc, hasSpecies, hasValue, and hasUnit]
A System for Semantic Integration of
Geologic Maps via Ontologies
Kai Lin
Bertram Ludäscher
Geologic Map Integration
• Given:
– Geologic maps from different state geological
surveys (shapefiles w/ different data schemas)
– Different ontologies:
• Geologic age ontology
• Rock classification ontologies:
– Multiple hierarchies (chemical, fabric, texture, genesis) from
Geological Survey of Canada (GSC)
– Single hierarchy from British Geological Survey (BGS)
• Problem
– Support uniform queries using different ontologies
– Support registration w/ ontology A, querying w/ ontology B
Geologic Map Integration
[Figure: geologic map integration using domain knowledge (“+/- a few hundred million years”; Nevada), and the “GEON Metamorphism Equation” relating Geoscientists + Computer Scientists, +/- Energy, and Igneous Geoinformaticists]
A Multi-Hierarchical Rock Classification Ontology (GSC)
[Figure: hierarchies for Genesis, Fabric, Composition, and Texture]
Implementation in OWL:
Not only “for the machine” …
System Overview
[Figure: system overview with three groups: data sets; ontologies A, B, and C; and applications, including the Ontology-enabled Map Integrator {A,B}, Application (B), and Application (C)]
Ontology Repository
• Accept user-defined ontologies in OWL
• Any ontology saved in the system can be imported into a user-defined ontology (→ inter-ontology references)
• Provide a tool to browse the ontologies in the repository
……………..
composition.owl
<owl:Ontology>
  <owl:imports rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl" />
</owl:Ontology>
…………….
<owl:Class rdf:ID="Ultramafite">
  <rdfs:subClassOf rdf:resource="#Ultramafic"/>
  <rdfs:subClassOf rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl#Igneous"/>
</owl:Class>
……………..
Ontology Mapping: Motivation
• Align ontologies
• Integrate data sets which are registered to different
ontologies
• Query data sets through different ontologies
• Ontology parameterization
[Figure: Data set 1 and Data set 2 are registered to Ontology 1 and Ontology 2, respectively; ontology mappings between the two ontologies support queries across both]
Ontology Mapping: Definition
An ontology mapping consists of:
• a class mapping f: a partial mapping from the class set of Oa to the class set of Ob preserving the subclass relation
• a property mapping g: a partial mapping from the property set of Oa to the property set of Ob such that if p is a property between A1 and A2 in Oa, then g(p) is a property between f(A1) and f(A2) in Ob
[Figure: in Oa, p relates A1 and A2; in Ob, g(p) relates f(A1) and f(A2)]
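The two conditions in this definition can be checked mechanically. The sketch below represents each ontology as a dict of direct superclasses plus properties with a (domain, range) pair. It reads “g(p) is a property between f(A1) and f(A2)” as an exact domain/range match, which is a simplifying assumption. The rock class names are hypothetical, not taken from the GSC or BGS ontologies.

```python
def superclass_closure(sub_of, c):
    """Reflexive-transitive superclasses of c, given direct subclass edges."""
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(sub_of.get(x, ()))
    return seen


def is_subclass(sub_of, a, b):
    return b in superclass_closure(sub_of, a)


def is_valid_mapping(oa, ob, f, g):
    """oa/ob: {'sub_of': class -> set of parents, 'props': name -> (domain, range)}."""
    # f must preserve the subclass relation wherever it is defined.
    for a1 in f:
        for a2 in f:
            if is_subclass(oa["sub_of"], a1, a2) and not is_subclass(ob["sub_of"], f[a1], f[a2]):
                return False
    # g(p) must connect f(A1) and f(A2) whenever p connects A1 and A2.
    for p, q in g.items():
        a1, a2 = oa["props"][p]
        if a1 not in f or a2 not in f:
            return False
        if ob["props"].get(q) != (f[a1], f[a2]):
            return False
    return True


# Hypothetical ontologies and mappings.
OA = {"sub_of": {"Basalt": {"VolcanicRock"}}, "props": {"hasAge": ("VolcanicRock", "GeologicAge")}}
OB = {"sub_of": {"Basalt": {"IgneousRock"}}, "props": {"age": ("IgneousRock", "Age")}}
F = {"Basalt": "Basalt", "VolcanicRock": "IgneousRock", "GeologicAge": "Age"}
F_BAD = {"Basalt": "IgneousRock", "VolcanicRock": "Basalt", "GeologicAge": "Age"}
G = {"hasAge": "age"}
```

F is accepted, while F_BAD is rejected because it would turn Basalt ⊑ VolcanicRock in Oa into IgneousRock ⊑ Basalt, which does not hold in Ob.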
Ontology Mapping: Combining Ontologies
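The result O of combining ontologies Oa and Ob is a pushout of the following ontology mappings f: Oc → Oa and g: Oc → Ob:
[Figure: commuting square in which Oc maps into Oa and Ob, and both map into O]
Example:
[Figure: pushout example over Oc, Oa, Ob, and O involving classes A, A1, A2, B1, B2 and properties p, q]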
Ontology Switching
Given an ontology mapping f from Oa to Ob, Oa can be used to query any data sets that are registered to Ob.
[Figure: Data set 1 and Data set 2 are registered to ontology Ob; an ontology mapping from Oa to Ob lets queries posed against Oa reach them]
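Ontology switching can be sketched as translating the query class through f and then collecting data registered under the translated class or any of its subclasses in Ob. This is one simple reading under stated assumptions, not the Geology Workbench implementation: the classes, the polygons, and the mapping are hypothetical, and because f is partial, an untranslatable class yields no answers.

```python
# Hypothetical ontology Ob: class -> set of direct superclasses.
OB_SUB_OF = {
    "Basalt": {"IgneousRock"},
    "Granite": {"IgneousRock"},
    "Shale": {"SedimentaryRock"},
}

# Hypothetical map polygons registered to Ob classes.
REGISTERED = {"poly1": "Basalt", "poly2": "Granite", "poly3": "Shale"}

# Hypothetical (partial) class mapping f from Oa to Ob.
F = {"Igneous": "IgneousRock", "Sedimentary": "SedimentaryRock"}


def ancestors(cls):
    """cls plus all of its superclasses in Ob."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(OB_SUB_OF.get(c, ()))
    return seen


def query_via_oa(oa_class):
    """Answer a query phrased with an Oa class against data registered to Ob."""
    target = F.get(oa_class)
    if target is None:  # f is partial: no translation, so no answers
        return []
    return sorted(item for item, cls in REGISTERED.items() if target in ancestors(cls))
```

Asking for the Oa class Igneous returns the basalt and granite polygons, even though no data set was registered to Oa.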
Geology Workbench : Initial State
click on Ontologies
click on Datasets
click on Applications
An Ontology-based Mediator
Geology Workbench: Uploading Ontologies
Click on Ontology Submission
Choose an OWL file to upload
Click to check its detail
Name Space: can be used to import this ontology into others
Geology Workbench: Data (to Ontology!) Registration
Step 1: Choose Classes
Click on Submission
Data set name
Select a shapefile
Choose an ontology class
Geology Workbench: Data Registration
Step 2: Choose Columns for Selected Classes
It contains information about geologic age
Columns: AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL
Geology Workbench: Data Registration
Step 3: Resolve Mismatches
Two terms do not match any ontology terms
Manually map algonkian into the ontology
Geology Workbench: Ontology-enabled Map Integrator
All areas with the age Paleozoic
Click on the name
Choose interesting classes
Geology Workbench: Change Ontology
Run it
New query interface
Switch from Canadian Rock Classification to British Rock Classification
Ontology mapping between British Rock Classification and Canadian Rock Classification
Submit a mapping
Back to Scientific Workflows, Kepler
(and yes, web services…)
Web Services & Scientific Workflows in Kepler
• Web services = individual components (“actors”)
• “Minute-Made” Application Integration:
– Plugging in and harvesting web service components is easy and fast
• Rich SWF modeling semantics (“directors” and more):
– Different and precise dataflow models of computation
– Clear and composable component interaction semantics
⇒ Web service composition and application integration tool
• Coming soon:
– Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8)
– SWFs with structural and semantic data types (better design support)
– Grid-enabled web services (for big data, big computations, …)
– Different deployment models (SWF WS, web site, applet, …)
Genomics Example:
Promoter Identification Workflow
Source: Matt Coleman (LLNL)
Ecology: GARP Analysis Pipeline for
Invasive Species Prediction
[Figure: GARP pipeline. EcoGrid queries retrieve species presence & absence points (a) and environmental layers (b) for the native range and the invasion area from registered EcoGrid databases; layer integration produces integrated layers (c); sampling and data calculation produce training and test samples (d); GARP produces a rule set (e); map generation produces native-range and invasion-area prediction maps (f); validation yields model quality parameters (g); the user's selected prediction maps (h) are archived to the EcoGrid with generated metadata]
Source: NSF SEEK (Deana Pennington et al., UNM)
Source: NIH BIRN (Jeffrey Grethe, UCSD)
KEPLER Core Capabilities (1/2)
• Capturing scientific workflows
– Accessing available workflows through the Grid
• Designing scientific workflows
– Composition of actors (tasks) to perform a scientific WF
• Actor prototyping
• Accessing heterogeneous data
– Data access wizard to search and retrieve Grid-based resources
– Relational DB access and query
– Ability to link to EML data sources
KEPLER Core Capabilities (2/2)
• Data transformation actors to link heterogeneous data
• Executing scientific workflows
– Distributed and/or local computation
– Various models for computational semantics and scheduling
– SDF (Synchronous Dataflow) and PN (Process Networks): most common for scientific workflows
• External computing environments:
– C++, Python, C (Perl planned)
• Deploying scientific tasks and workflows as web services (planned)
The KEPLER GUI (Vergil)
Drag and drop utilities, director
and actor libraries.
Running the workflow
Distributed SWFs in KEPLER
• Web and Grid Service plug-ins
– WSDL, GWSDL
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
• WS Harvester
– Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• WS-deployment interface (ongoing work)
• XSLT and XQuery transformers to link non-fitting services together
A Generic Web Service Actor
Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that method.
Configure: select the service operation
Set Parameters and Commit
Set parameters and commit
WS Actor after Instantiation
Web Service Harvester
• Imports the web services in a repository into the actor library.
• Can search for web services by keyword.
Composing 3rd-Party WSs
Output of previous web service
User interaction & transformations
Input of next web service
GEON Kepler Examples
• GEON Classifier (Efrat)
– A workflow for classifying igneous rocks.
• Geologic Map Information Integration
– A workflow for map rendering using web services (created by Ilkay and Ashraf).
• Database Access (Efrat)
– Generic actors for connecting to and querying a database.
Problem Description
• Classification of igneous rocks
• Data sets
– Virginia rock database (provides mineral composition).
– Igneous rock diagrams and a transition table for traversing between diagrams.
• Method
– Iterations of finer descriptive levels using a point-in-polygon algorithm.
British Classification of Igneous Rocks
Mineral Classification of Igneous Rocks
• Inputs:
– A row id from the Virginia rock database (contains mineral composition).
– A dataset of diagrams for classification.
• Outputs:
– The rock name.
– A browser display of each classification level (a new feature added in Kepler).
• Execution:
– Divided into levels; each provides a finer level of granularity.
– At each level, a point is classified within a diagram using a PointInPolygon algorithm.
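The slides name a PointInPolygon algorithm without showing it. A common choice is the even-odd (ray-casting) test, sketched below. Whether the Kepler classifier uses exactly this variant is not stated. The region names and coordinates are made up, and points exactly on a boundary are not handled specially.

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray casting: count edge crossings of a ray from pt toward +x."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(pt, regions):
    """Return the name of the first region polygon containing pt, or None."""
    for name, poly in regions.items():
        if point_in_polygon(pt, poly):
            return name
    return None


# Hypothetical diagram with two regions splitting a 100 x 100 field.
REGIONS = {
    "RegionA": [(0, 0), (50, 0), (50, 100), (0, 100)],
    "RegionB": [(50, 0), (100, 0), (100, 100), (50, 100)],
}
```

In a multi-level classification, the region returned at one level would select the next, finer diagram (the role of the transition table); that lookup is omitted here.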
Classifying with Kepler
Extract mineral composition for row Id.
Igneous Rock Diagrams information.
Rock Name.
Classifying with Kepler
Classifying with Kepler
Finer granularity
Extracted from the mineral composition and this level’s diagram coordinates.
Diagrams information and transitions between them.
Classifier: locates the point’s region.
SVG to polygons.
Displays the point in the diagram for this level.
Geologic Map Integration
• Ontology-enabled Map Integration (OMI)
– Integration of heterogeneous geological datasets
• Data sets
– State geology map datasets (Rocky Mountain area)
– State boundaries and coastlines
• Rock type ontologies
SWF Designed in Kepler
DataMapper Sub-Workflow
Result launched via the BrowserUI actor
Providing DB Access through Kepler
• Database connection actor:
– Opens a database connection and passes it to all actors accessing this database.
• Database query actor:
– A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
– A new IOPort type and a token to distinguish a database connection from any general type.
Database Connection Actor
OpenDBConnection actor:
• Input: database connection information.
• Output: A DBConnectionToken, a reference to a
database connection instance, through a
DBConnection output port.
Database Query Actor
Database Query actor:
• Input: a query string (SQL) and a database connection reference.
• Parameters: output type (XML, Record, or String); output each row separately or all at once.
• Process: execute the query and produce results according to the parameters.
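The OpenDBConnection and Database Query actors are Kepler (Java) components. The Python below only mimics their described behavior using the standard `sqlite3` module: one step opens a connection handle that is passed on, and a generic query step formats results as Record, String, or XML, either per row or all at once. The function names, the in-memory database, and the exact output formats are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_db_connection(path=":memory:"):
    """Stand-in for OpenDBConnection: returns a connection handle to pass downstream."""
    return sqlite3.connect(path)


def _row_element(row):
    """Build <row><col>value</col>...</row> for one result row."""
    el = ET.Element("row")
    for col, val in row.items():
        ET.SubElement(el, col).text = str(val)
    return el


def database_query(conn, sql, output_type="Record", each_row=False):
    """Stand-in for the Database Query actor.

    output_type: "Record" (dicts), "String" (col=value text), or "XML".
    each_row: if True, return one formatted item per row; else one combined result.
    """
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    if output_type == "Record":
        return rows  # a list of dicts serves both modes
    if output_type == "String":
        lines = [", ".join(f"{k}={v}" for k, v in r.items()) for r in rows]
        return lines if each_row else "\n".join(lines)
    if output_type == "XML":
        elements = [_row_element(r) for r in rows]
        if each_row:
            return [ET.tostring(e, encoding="unicode") for e in elements]
        root = ET.Element("result")
        root.extend(elements)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unknown output type: {output_type}")
```

Passing the connection object explicitly plays the role of the DBConnectionToken: several query steps can share one connection without reopening it.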
Querying Example
KEPLER and YOU
• Kepler …
– is a community-based, cross-project, open-source collaboration
– uses web services as basic building blocks
– has a joint CVS repository, mailing lists, web site, …
– is gaining momentum thanks to contributors and contributions
• BSD-style license allows commercial spin-offs
– a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…