ontology - geongrid

1
Ontology Enabled Data Discovery and Integration
Kai Lin
San Diego Supercomputer Center
University of California, San Diego
A. K. Sinha, Z. Malik, A. Rezgui, A. Dalton
Virginia Tech
2
Motivations
• A better way to discover and understand datasets
Use the knowledge in ontologies to find datasets
• A better way to query datasets
Query through ontologies without knowing the schemas
• A better way to integrate multiple datasets
Integrate multiple datasets on-the-fly if they are mapped to
ontologies
3
What Is an Ontology
A formal, explicit specification of a shared conceptualization
• formal: machine-readability
• explicit: unambiguous definition of all concepts, attributes and relationships
• shared: commonly accepted understanding
• conceptualization: conceptual model of a domain
4
Why Represent Domain Knowledge as an Ontology
• Separate the domain knowledge module from the operational module
• Make the knowledge module configurable
• Share and reuse domain knowledge
• Analyze domain knowledge
5
What’s Inside An Ontology?
• Concepts: Classes + Class-hierarchy
– instances
• Properties: often also called “Roles” or “Slots”
– labeled instance-value-pairs
• Axioms/Relations:
– relations between classes (disjoint, covers)
– inheritance (multiple? defaults?)
– restrictions on slots (type, cardinality)
– characteristics of slots (symmetric, transitive, …)
• reasoning tasks:
– Classification: Which classes does an instance belong to?
– Subsumption: Does a class subsume another one?
– Consistency checking: Is there a contradiction in my axioms/instances?
6
Resource Description Framework (RDF)
XML Schema is not enough for semantics
• it only describes grammar, i.e. the syntax of single documents
• it cannot express inheritance between concepts
• it has no means to express complex integrity constraints in an unambiguous way
The author of 'page.html' is Peter Morris
<document href="page.html">
<author>Peter Morris</author>
</document>
What is the “correct” way of expressing it?
<author>
<firstName>Peter</firstName>
<lastName>Morris</lastName>
<documents>
<uri>page.html</uri>
</documents>
</author>
Resource Description Framework (RDF)
an infrastructure for the encoding, exchange and reuse of structured metadata
7
RDF Idea
RDF is intended to provide a simple way of making statements about resources
Resource: an object that is uniquely identified by a URI (Uniform Resource Identifier)
• Anything can have a URI:
• an entire Web page,
• a whole collection of pages, e.g. an entire website,
• an object that is not directly accessible via the Web, such as a printed book.
Property: a specific aspect, characteristic, attribute, or relation used to describe a resource; it has a specific meaning and defines its permitted values
• Lives-In, CarColor, WorkFor, HasA, IncludedIn, hasAuthor…
Statement: a specific resource together with a named property plus the value of that property for that resource. Each RDF statement can be written down as a triple (Subject, Property, Object) or as a graph
[Figure: a Resource node linked by a property arc to a Value, which may itself be a Resource]
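The triple view of a statement can be sketched in a few lines of Python. This is only an illustration, not part of any RDF library: the `match` helper is hypothetical, and the URIs are borrowed from the RDF example in this deck.

```python
# Each RDF statement is a (subject, property, object) triple.
DC = "http://purl.org/dc/elements/1.1/"
PAGE = "http://www.polleres.net/page.html"
PETER = "http://www.polleres.net/peter"

triples = {
    (PAGE, DC + "creator", PETER),
    (PAGE, DC + "language", "English"),
    (PETER, "hasName", "Peter Morris"),
}


def match(triples, subject=None, prop=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (obj is None or t[2] == obj)
    )


# Who created the page? Follow the dc:creator arc from the page resource.
creators = [o for _, _, o in match(triples, subject=PAGE, prop=DC + "creator")]
```

Because the object of one triple can be the subject of another, a set of triples is the same thing as a labeled graph.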
8
An RDF Example
[Figure: RDF graph. http://www.polleres.net/page.html has http://purl.org/dc/elements/1.1/creator http://www.polleres.net/peter, creationDate "April 1, 2004" and http://purl.org/dc/elements/1.1/language "English"; http://www.polleres.net/peter has hasName "Peter Morris"]
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.polleres.net/page.html">
<dc:creator>
<rdf:Description rdf:about="http://www.polleres.net/peter">
<hasName>Peter Morris</hasName>
</rdf:Description>
</dc:creator>
</rdf:Description>
</rdf:RDF>
9
A General RDF Format
[Figure: nested resource graph; the value of property-A is Resource-B and the value of property-B is Resource-C]
<?xml version="1.0"?>
<Resource-A>
<property-A>
<Resource-B>
<property-B>
<Resource-C>
<property-C>
Value-C
</property-C>
</Resource-C>
</property-B>
</Resource-B>
</property-A>
</Resource-A>
Convention:
• A capital letter to start a type (class) name
• A lowercase letter to start a property name
10
RDF Schema (RDFS)
RDFS is a simple ontology language
Core Classes
• rdfs:Resource
• rdfs:Literal
• rdf:XMLLiteral
• rdfs:Class
• rdf:Property
• rdfs:Datatype
• rdfs:Container
Core Properties
• rdf:type
• rdfs:subClassOf
• rdfs:subPropertyOf
• rdfs:domain
• rdfs:range
• rdfs:label
• rdfs:comment
• RDF: triples for making assertions about resources
• RDFS extends RDF with “schema vocabulary”, e.g.:
– Class, Property
– type, subClassOf, subPropertyOf
– range, domain
→ representing simple assertions, taxonomy + typing
11
RDFS Example
[Figure: RDFS graph. Classes: Vehicle, LandVehicle, SeaVehicle, HoverVehicle, Company, Number. Properties: producedBy, numberOfEngine. The arcs are labeled type (to Class and Property) and subClassOf]
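A minimal sketch of how a subClassOf taxonomy like this one supports subsumption checks. The exact edges are an assumption, since the figure layout did not survive: LandVehicle and SeaVehicle are taken to be subclasses of Vehicle, and HoverVehicle a subclass of both. The function names are hypothetical.

```python
# Assumed subClassOf edges (child -> set of direct parents).
SUBCLASS_OF = {
    "LandVehicle": {"Vehicle"},
    "SeaVehicle": {"Vehicle"},
    "HoverVehicle": {"LandVehicle", "SeaVehicle"},
}


def superclasses(cls, subclass_of):
    """Collect every direct or indirect superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subclass(sub, sup, subclass_of):
    """subClassOf is reflexive and transitive."""
    return sub == sup or sup in superclasses(sub, subclass_of)
```

With these edges, `is_subclass("HoverVehicle", "Vehicle", SUBCLASS_OF)` holds even though no triple states it directly; it follows from the transitivity of subClassOf.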
12
Limitations of RDFS
• RDFS too weak to describe resources in sufficient detail:
– No localised range and domain constraints
• Can’t say that the range of hasChild is person when applied to persons
and elephant when applied to elephants
– No existence/cardinality constraints
• Can’t say that all instances of person have a mother that is also a
person, or that persons have exactly 2 parents
– No transitive, inverse or symmetrical properties
• Can’t say that isPartOf is a transitive property, that hasPart is the
inverse of isPartOf or that touches is symmetrical
– No in/equality
• Can’t say that a class/instance is the same as some other
class/instance, can’t say that some classes/instances are definitely
disjoint/different.
– No boolean algebra
• Can’t say that one class is the union, intersection or complement of
other classes, etc.
13
OWL Language - Overview
• Three species of OWL
– OWL DL stays in the Description Logic fragment
– OWL Lite is an “easier to implement” subset of OWL DL
– OWL Full is the union of OWL syntax and RDF
• OWL DL is based on Description Logic
– In fact it is equivalent to the SHOIN(Dn) DL
• OWL DL benefits from many years of DL research
– Well defined semantics
– Formal properties well understood (complexity, decidability)
– Known reasoning algorithms
– Implemented systems (highly optimised)
• OWL Full has all that plus all the possibilities of RDF/RDFS, which destroy
decidability
[Figure: nested layers, Lite inside DL inside Full]
14
OWL Layers (Lite, DL, Full)
• OWL Lite
– (sub)classes, individuals
– (sub)properties, domain, range
– intersection
– (in)equality
– cardinality 0/1
– datatypes
– inverse, transitive, symmetric
– hasValue
– someValuesFrom
– allValuesFrom
• OWL DL
– Negation (disjointWith, complementOf)
– unionOf
– Full cardinality
– Enumerated types (oneOf)
• OWL Full
– Allows meta-classes etc.
[Figure: nested layers labeled RDF Schema, Lite, DL and Full]
15
Ontology Inconsistency
• You may define classes whose definition no individual can fulfill. Reasoning
engines can find such definitions even in big ontologies.
– Cow ≡ Animal ⊓ Vegetarian
– Sheep ⊑ Animal
– Vegetarian ≡ ∀eats.¬Animal
– MadCow ≡ Cow ⊓ ∃eats.Sheep
16
Open/Closed World Assumption
Closed World Assumption
– The facts in the ontology completely describe what I know; everything that is
not in the ontology is assumed to be false.
Open World Assumption (used in OWL)
– There may be things that the ontology does not describe
An ontology says:
There is a train at 14:00
There is a train at 15:00
Is there a train at 17:00?
• no, by the Closed World Assumption
• unknown, by the Open World Assumption
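The train example can be written out as a small sketch. The function names are hypothetical, and Python's `None` stands in for "unknown".

```python
# The only facts the ontology states.
KNOWN_DEPARTURES = {"14:00", "15:00"}


def train_at_closed_world(time):
    """Closed world: whatever is not stated is false."""
    return time in KNOWN_DEPARTURES


def train_at_open_world(time):
    """Open world: whatever is not stated is unknown (None), not false."""
    return True if time in KNOWN_DEPARTURES else None
```

Under the open world reading, a "no" answer would need an explicit negative statement in the ontology; the absence of a 17:00 entry is not enough.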
17
Resource Discovery in GEON
• A Resource Registration System for Data Providers
– Register ontologies (domain knowledge)
– Register datasets with metadata including data access information
– Optionally register datasets to ontologies (which is crucial for data integration
and smart search)
• A Search Engine for Data Users
– Metadata based search
– Spatial coverage based search
– Temporal coverage based search
– Concept based search
• Both are available through a public portal on the web
18
GEON Data Registration System
[Figure: the Resource Registration System. Labels: resources (Excel, GeoTIFF, Shapefile) with ADN metadata; SRB; a catalog holding general information, subjects, format, keywords, spatial coverages, temporal coverages, …; GEON Search; resource schemas; ontology annotations; access control; integrated resources; log]
19
Database Registration
[Figure: an application reaches the GEON Mediator through the GEON JDBC Driver. The provider selects the tables and views of the original database to register; the published database exposes the corresponding table and view definitions]
20
Write Protection
[Figure: an UPDATE on table B is sent to the mediator, which sits in front of a database with tables A, B and C]
• Only accepts SELECT statements
• Rejects any requests other than SELECT
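A minimal sketch of such a guard, assuming a purely textual check. How the GEON mediator actually enforces the rule is not described here, and a production system would parse the SQL rather than pattern-match it. The name `accept_statement` is hypothetical.

```python
import re

# A statement qualifies only if its first keyword is SELECT.
_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


def accept_statement(sql):
    """Accept exactly one SELECT statement; reject everything else.

    Splitting on ';' is deliberately conservative: a SELECT whose string
    literal contains ';' is rejected rather than risk passing a second,
    hidden statement through.
    """
    statements = [s for s in sql.split(";") if s.strip()]
    return len(statements) == 1 and bool(_SELECT.match(statements[0]))
```

For example, `accept_statement("UPDATE B SET x = 1")` returns `False`, and so does `accept_statement("SELECT 1; DROP TABLE B")`.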
21
Read Protection on Unregistered Tables and Views
[Figure: SELECT * FROM A is sent to the mediator; the database holds tables A, B and C, but only B is registered with the mediator]
An unregistered table or view is invisible to an end user
• The data in the table can’t be viewed by a SELECT statement
• The schema can’t be fetched
22
Item Level Ontological Data Registration for Discovery
[Figure: an ontology fragment with the classes GeometricalObject_2D, Polygon, Surface, Circle and Rectangle; dataset properties are linked to the ontology by "mentions", "uses" and "has instances"]
The search engine uses ontologies to find more results. For example, the fact
that Polygon is a subclass of GeometricalObject_2D is used in searching:
• Search for GeometricalObject_2D
• Return datasets associated with Polygon
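A sketch of this query expansion, under stated assumptions: only the Polygon edge is given on the slide, the Rectangle edge is assumed for illustration, and the dataset names and their annotations are invented placeholders.

```python
# child -> parent. Only the Polygon edge is stated on the slide.
SUBCLASS_OF = {
    "Polygon": "GeometricalObject_2D",  # stated on the slide
    "Rectangle": "Polygon",             # assumed for illustration
}

# Hypothetical datasets and the ontology classes they are registered to.
DATASET_CONCEPTS = {
    "dataset_a": {"Polygon"},
    "dataset_b": {"Rectangle"},
    "dataset_c": {"Surface"},
}


def descendants(concept):
    """Return the concept together with all of its direct and indirect subclasses."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def search(concept):
    """Return the datasets registered to the concept or to any of its subclasses."""
    wanted = descendants(concept)
    return sorted(name for name, concepts in DATASET_CONCEPTS.items() if concepts & wanted)
```

Searching for `GeometricalObject_2D` returns `dataset_a` and `dataset_b`, although neither is annotated with that class directly.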
23
Data Integration Challenges: Heterogeneities
• Syntactical Heterogeneity
heterogeneous data format
e.g. 02-04-2004 vs. 02/04/04
• Structural Heterogeneity
heterogeneous data models and schemas
e.g. 02-04-2004 is saved as three columns or as one column
• Semantic Heterogeneity
fuzzy metadata, terminology, “hidden” semantics, implicit assumptions
GEON Preferred Solution:
• Datasets are semantically registered first
• Heterogeneities are resolved by registration
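As an illustration of resolving the syntactical case, the two date spellings from the slide can be normalized to one form. This sketch assumes both are month-first, which the slide does not say; the list of accepted formats is likewise an assumption.

```python
from datetime import datetime

# Assumed source formats: month-day-year with dashes, month/day/2-digit year.
FORMATS = ("%m-%d-%Y", "%m/%d/%y")


def normalize_date(text):
    """Return the date in ISO 8601 form, trying each known format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

Both `normalize_date("02-04-2004")` and `normalize_date("02/04/04")` then yield the same value, so records from the two sources can be compared.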
24
Database Integration
Integration at three levels
Level 1: Federation Based Integration
• Users need to be knowledgeable about each database
Level 2: View Based Integration
• Intended for users who perform integration on behalf of others or who want
to make integration results reusable
Level 3: Ontology Based Integration
• The easiest way for end users
25
Level 1: Federation Based Integration
• Use SQL to query the federated database
• Structural and semantic heterogeneity must be resolved by the users
themselves
[Figure: the mediator federates two backends; tables A, B, C and D from one backend and E, F and G from the other appear unchanged in the mediator]
SELECT * FROM A, E WHERE ……
26
Level 2: View Based Integration
• Allow defining views on top of the federated databases
• Allow hiding the original backend schemas
• Integration results can be shared and reused
[Figure: the mediator defines the views V and W on top of the tables A, B, C, D and E, F, G of two backends]
SELECT * FROM V, W WHERE ……
27
Level 3: Ontology Based Integration
• Require ontology annotations for backend databases
• Use simple ontology query language to query the integrated database
• Users don’t need to know the backend schemas and local semantics
[Figure: an ontology based query is sent to the mediator, which sits on top of the tables A, B, C, D and E, F, G of two backends]
28
Ontology Enabled Data Integration
• Ontology Enabled Semantic Integration
[Figure: dataset1, dataset2, dataset3 and dataset4 registered to Ontology1, Ontology2 and Ontology3]
Challenges for Computer Scientists, Domain Scientists and Data Providers
– Computer Scientists: build an integration system based on the
ontological registration of datasets
– Domain Scientists: create domain ontologies
– Data Providers: register datasets to ontologies
29
Ontological Data Registration for Data integration
• Registering a dataset to an ontology for data integration is a
procedure to generate a partial model of the ontology from the
dataset itself
[Figure: registration generates individuals from the dataset; together they form a partial model of the ontology]
Not all the constraints in the ontology are satisfied by the generated individuals
30
Registering Relational Tables to Ontology Classes
• Associate one or more columns under an optional SQL condition to a
selected class in the ontology
[Figure: a table with the columns Latitude and Longitude registered to the class Location; one row holds the values 23.5 and 47.9]
(23.5, 47.9) is the name of an individual of the class Location. The same name
indicates the same location.
• Provide a mapping method if no explicit names of individuals should be
generated
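[Figure: the column GeologicAge of the table RockSample, with values such as Jurassic/Triassic and Precambrian, mapped to individuals of the class GeologicalAge such as Precambrian, Cenozoic and Paleozoic]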
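The column-to-class registration described above can be sketched as follows. The rows, the `RockType` column and the function name are made up for illustration; the optional condition plays the role of the SQL condition.

```python
# Hypothetical rows of a registered table.
rows = [
    {"Latitude": 23.5, "Longitude": 47.9, "RockType": "basalt"},
    {"Latitude": 23.5, "Longitude": 47.9, "RockType": "granite"},
    {"Latitude": 10.0, "Longitude": 20.0, "RockType": "shale"},
]


def individuals(rows, columns, condition=lambda row: True):
    """Name one individual per distinct combination of the registered columns."""
    names = {tuple(row[c] for c in columns) for row in rows if condition(row)}
    return sorted(names)


# Register (Latitude, Longitude) to the class Location.
locations = individuals(rows, ["Latitude", "Longitude"])
```

The first two rows share the name `(23.5, 47.9)`, so they yield a single Location individual, which is what "same name indicates the same location" means.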
31
Registering Tables to Ontology Object Properties
• Associate two entities which are already registered to the domain
class and the range class of a selected object property in the
ontology
[Figure: the object property hasAge with domain Rock and range GeologicAge, registered from a table with the columns RockSampleID and PERIOD]
32
ODAL (Ontological Database Annotation Language)
• Creates a partial model of ontologies from a database
• Independent of any GUI
• Independent of any concrete implementation
• Reusable
[Figure: a GUI generates ODAL, which is passed to the ODAL processor]
<odal:NamedIndividuals odal:id="RockSample"
odal:database="VTDatabase">
<odal:Class odal:resource="http://geon.vt.edu#RockSample" />
<odal:Table>Samples</odal:Table>
<odal:Table>RockTexture</odal:Table>
<odal:Table>RockGeoChemistry</odal:Table>
<odal:Table>ModalData</odal:Table>
<odal:Table>MineralChemistry</odal:Table>
<odal:Table>Images</odal:Table>
<odal:Column>ssID</odal:Column>
</odal:NamedIndividuals>
The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry,
ModalData, MineralChemistry and Images represent instances of RockSample
33
ODAL: Import Ontologies
The Ontologies used for annotating a database can be imported as
follows:
<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:odal="http://www.sdsc.edu/odal#">
<odal:Ontology>
<odal:Imports rdf:resource="http://www.library.org/Book.owl"/>
<odal:Imports rdf:resource="http://www.writer.org/Writer.owl"/>
</odal:Ontology>
……
</odal:ODAL>
34
ODAL: Database Connection Declaration
The target database for annotation is declared as follows:
<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:odal="http://www.sdsc.edu/odal#">
……
<odal:Database odal:id="PublicationDatabase">
<odal:DatabaseProductName>Oracle</odal:DatabaseProductName>
<odal:DatabaseProductVersion>9.1.21</odal:DatabaseProductVersion>
<odal:Host>oracle.sdsc.edu</odal:Host>
<odal:Port>3456</odal:Port>
<odal:DatabaseName>Publications</odal:DatabaseName>
</odal:Database>
……
</odal:ODAL>
35
ODAL: Simple Named Individuals
Suppose the book ontology contains a class Book and the schema
Collections contains a table book-price with a column ISBN.
<odal:NamedIndividuals odal:id="BookInTableBookPrice"
odal:database="PublicationDatabase" >
<odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
<odal:Schema>Collections</odal:Schema>
<odal:Table>book-price</odal:Table>
<odal:Column>ISBN</odal:Column>
</odal:NamedIndividuals>
The statement says that each value in the column ISBN represents a book
individual.
odal:id gives a name to the declaration, and represents the set of the
individuals generated by the statement.
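To make the declaration concrete, here is a sketch of how a processor might turn it into a SQL query that enumerates the individuals. This is not the actual ODAL processor, whose behavior is not described here; the function names, the identifier quoting and the single-table restriction are assumptions. The namespace declaration is added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

ODAL_NS = "http://www.sdsc.edu/odal#"
NS = {"odal": ODAL_NS}

DECLARATION = """
<odal:NamedIndividuals xmlns:odal="http://www.sdsc.edu/odal#"
    odal:id="BookInTableBookPrice" odal:database="PublicationDatabase">
  <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
  <odal:Schema>Collections</odal:Schema>
  <odal:Table>book-price</odal:Table>
  <odal:Column>ISBN</odal:Column>
</odal:NamedIndividuals>
"""


def quote(identifier):
    """Quote an SQL identifier so names such as book-price stay valid."""
    return '"' + identifier.replace('"', '""') + '"'


def individuals_query(xml_text):
    """Return (declaration id, SQL listing one row per individual).

    Handles a single odal:Table only; an optional odal:Condition is
    appended as a WHERE clause.
    """
    root = ET.fromstring(xml_text)
    schema = root.findtext("odal:Schema", namespaces=NS)
    table = root.findtext("odal:Table", namespaces=NS)
    columns = [c.text for c in root.findall("odal:Column", NS)]
    condition = root.findtext("odal:Condition", namespaces=NS)
    target = quote(table) if schema is None else quote(schema) + "." + quote(table)
    sql = "SELECT DISTINCT " + ", ".join(quote(c) for c in columns) + " FROM " + target
    if condition:
        sql += " WHERE " + condition.strip()
    return root.get("{%s}id" % ODAL_NS), sql


name, sql = individuals_query(DECLARATION)
```

`SELECT DISTINCT` reflects the idea that each distinct column value names one individual, however many rows carry it.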
36
ODAL: Named Individuals from Multiple Columns
Suppose an ontology contains a class Location and a database table
Rock-Sample with two columns Latitude and Longitude.
<odal:NamedIndividuals odal:id="LocationInTableRockSample" >
<odal:Class odal:resource="http://www.usgs.org/Space.owl#Location"/>
<odal:Schema>California</odal:Schema>
<odal:Table>Rock-Sample</odal:Table>
<odal:Column>Latitude</odal:Column>
<odal:Column>Longitude</odal:Column>
</odal:NamedIndividuals>
The statement says that a pair of latitude and longitude gives a location
37
ODAL: Named Individuals with Conditions
<odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee" >
<odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/>
<odal:Table>employee</odal:Table>
<odal:Column>EmployeeId</odal:Column>
<odal:Condition><![CDATA[ Gender='M' ]]></odal:Condition>
</odal:NamedIndividuals>
<odal:NamedIndividuals odal:id="FemaleEmployeeInTableEmployee" >
<odal:Class odal:resource="http://www.abc.com/Employee#FemaleEmployee"/>
<odal:Table>employee</odal:Table>
<odal:Column>EmployeeId</odal:Column>
<odal:Condition><![CDATA[ Gender='F' ]]></odal:Condition>
</odal:NamedIndividuals>
A condition in an odal:Condition element should be a boolean expression that is
valid in the WHERE clause of a SQL query
38
ODAL: Data Type Property Declaration
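[Figure: the table Person with the columns SSN and age (example row: 1234-56-7890, 8); the datatype property hasAge links the class Person to the type double]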
<odal:NamedIndividuals odal:id="PersonInTablePerson" >
<odal:Class odal:resource="http://www.foo.org/Person.owl#Person"/>
<odal:Table>Person</odal:Table>
<odal:Column>ssn</odal:Column>
</odal:NamedIndividuals>
<odal:OntologyProperty>
<odal:DatatypeProperty odal:resource="http://www.foo.org/Person.owl#hasAge"/>
<odal:Table>person</odal:Table>
<odal:Domain odal:resource="PersonInTablePerson" />
<odal:Range odal:resource="age" />
</odal:OntologyProperty>
39
Conditions for Joining from Different Resources
• Usually we don’t join individuals across different resources
[Figure: for the class Rock, one resource has a column RockSampleID and another a column RockID; both contain the value 10001]
We don’t know whether 10001 represents the same rock in the two
resources. By default, we assume it does not.
• A set of datatype properties can be declared as a key for a class in the
ontology. We join across multiple resources based on keys.
e.g. {hasLatitude, hasLongitude}
can be declared as a key of Location
Two locations from different resources are the same if they have the same
latitude and longitude
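A sketch of a key-based join, with invented records and hypothetical function names. Local identifiers such as `a1` and `b7` are ignored; only the declared key properties decide whether two individuals are the same.

```python
def key_of(individual, key_properties):
    """Project an individual onto the datatype properties declared as its key."""
    return tuple(individual[p] for p in key_properties)


def join_on_key(left, right, key_properties):
    """Pair individuals from two resources that agree on every key property."""
    index = {}
    for item in right:
        index.setdefault(key_of(item, key_properties), []).append(item)
    return [
        (l, r)
        for l in left
        for r in index.get(key_of(l, key_properties), [])
    ]


LOCATION_KEY = ("hasLatitude", "hasLongitude")
resource_a = [
    {"id": "a1", "hasLatitude": 23.5, "hasLongitude": 47.9},
    {"id": "a2", "hasLatitude": 10.0, "hasLongitude": 20.0},
]
resource_b = [
    {"id": "b7", "hasLatitude": 23.5, "hasLongitude": 47.9},
]
pairs = [(l["id"], r["id"]) for l, r in join_on_key(resource_a, resource_b, LOCATION_KEY)]
```

The comparison here is exact equality; real coordinate data would probably need a tolerance, which this sketch leaves out.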
40
SOQL (Simple Ontology Query Language)
Query single or integrated resources
• via ontologies (i.e., high level logical views)
• independent of any physical representation (i.e. schemas)
[Figure: ontology fragment. RockSample has the properties location (a Location with lat and long of type float) and hasSiO2 (a ValueWithUnit with value and a unit of type string); a GUI generates SOQL, which is passed to the SOQL processor]
SELECT X.location.*
FROM RockSample X
WHERE X.location.lat > 60
AND X.location.long > 100
AND X.hasSiO2.value < 30
AND X.hasSiO2.unit = 'weightPercentage'
41
The Architecture of GEON Semantic Mediator
[Figure: architecture layers, from top to bottom]
• Portal or Application, GUI (issue SOQL)
• Mediator JDBC Driver
• SOQL Processor: SOQL Parser, Ontology Reasoner (OWL), ODAL Processor (ODAL)
• Semantic Query Rewriter (produces spatial SQL against federated schemas)
• SQL Parser, Query Planning, Query Optimization, Query Execution, Internal Database
• Backends: Oracle, DB2, SQL Server, MySQL, PostgreSQL, PostGIS
42
Question: Finding all seismic stations within 1 mile of railroads
SOQL generated by the GEON GUI:
SELECT X.code, X.location.*
FROM SeismicStation X, Railroad Y
WHERE distance(X.location, Y.geometry) < 1
SQL produced by the SOQL Processor:
SELECT X2.stationcode, X2.lat, X2.lon
FROM railroads_of_the_united_states X1,
stationdatatable X2
WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
The Schema Mediator evaluates distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
over the results of two source queries.
Railroad shapefile:
SELECT X1.the_geom
FROM railroads X1
Seismic Stations:
SELECT X2.stationcode, X2.lat, X2.lon
FROM stationdatatable X2
WHERE bounding box condition
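The role of the bounding box condition can be illustrated with a two-step filter: a cheap box test narrows the candidates, then the exact distance is computed only for the survivors. This is a sketch under simplifying assumptions, not the mediator's algorithm: coordinates are planar, the railroad is a single straight segment, and all names and data are invented.

```python
import math


def within_bbox(point, bbox, margin):
    """Cheap pre-filter: is the point inside the box grown by margin?"""
    x, y = point
    min_x, min_y, max_x, max_y = bbox
    return (min_x - margin <= x <= max_x + margin
            and min_y - margin <= y <= max_y + margin)


def point_segment_distance(p, a, b):
    """Exact planar distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def stations_near(stations, segment, limit):
    """Return the codes of stations closer than limit to the segment."""
    a, b = segment
    bbox = (min(a[0], b[0]), min(a[1], b[1]), max(a[0], b[0]), max(a[1], b[1]))
    candidates = {code: p for code, p in stations.items() if within_bbox(p, bbox, limit)}
    return sorted(code for code, p in candidates.items()
                  if point_segment_distance(p, a, b) < limit)


stations = {"S1": (5.0, 0.5), "S2": (5.0, 3.0), "S3": (10.9, 0.9)}
railroad = ((0.0, 0.0), (10.0, 0.0))
near = stations_near(stations, railroad, 1.0)
```

Here S2 is discarded by the box test alone, while S3 passes the box test but fails the exact distance check, so only S1 is returned.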
43
Questions?
44
How to Connect to GEON Databases
• Download GEON JDBC Driver
• Use the following code to create a connection
import java.sql.Connection;
import java.sql.DriverManager;

// load driver
Class.forName("org.geongrid.jdbc.driver.Driver");
// set the mediator URL
String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f";
// open the connection
Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid");
The URL has three parts:
• jdbc:geon is the GEON JDBC protocol
• geon01.sdsc.edu:2532 is the host name and port number of the GEON Mediator
• GEON-63cb404c-6038-11d9-a69f is the GEON ID
Note: the original account information is invisible to end users