Transcript RDF Part II

Part II
Reification
 We can make statements about RDF statements
themselves. This can be used to annotate information
 In science, it is common to quote someone or to provide
provenance or date-stamp information, such as who conducted
a certain experiment or simulation, and when it was done
 Explicit reification, which is used in database modeling, is
also used in RDF to write more sophisticated statements
about other statements using built-in vocabulary
 This is done by first making a reified model of the
statement, with type, subject, predicate, and object
properties
 We make a new resource to represent the entire statement
RDF Reification vocabulary
 Reification is done in RDF by using the following qualified names
to annotate the statement: rdf:Statement (the class of resources that
are statements), and the rdf:subject, rdf:predicate, and rdf:object
properties
 For example, to say that “Bill Fritz says that the Dinwoody
Formation formed in the Triassic”, we first assign a qualified
name to the statement, such as q:n1, and then use it in the four
reification statements:
q:n1  rdf:type       rdf:Statement ;
      rdf:subject    strat:Dinwoody ;
      rdf:predicate  strat:formed-in ;
      rdf:object     time:Triassic .
Person:BillFritz  s:says  q:n1 .
i.e., n1 is an RDF statement whose subject, predicate, and object
are given by the three qualified names, and Dr. Fritz made this
statement. This statement is using a bnode.
[Figure: graph of the reified statement. A node of type rdf:Statement has subject strat:Dinwoody, predicate strat:formed-in, and object time:Triassic, and is linked to Bill Fritz by says (attributed-to)]
Alternative way to reify it
 Bill Fritz says that the Dinwoody Formation
formed in the Triassic
Fritz  says           S
S      rdf:type       rdf:Statement
S      rdf:subject    DinwoodyFormation
S      rdf:predicate  formedIn
S      rdf:object     Triassic
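A minimal sketch of the reification pattern, using plain Python tuples instead of an RDF library. The helper names reify and unreify are hypothetical, and the qualified names are simply the ones from the Bill Fritz example.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.

def reify(statement_id, triple):
    """Return the four reification triples that describe `triple`."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]

def unreify(statement_id, triples):
    """Rebuild the original triple from its reification triples."""
    parts = {p: o for s, p, o in triples if s == statement_id}
    return (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])

# Reify the statement, then annotate it with who said it.
graph = reify("q:n1", ("strat:Dinwoody", "strat:formed-in", "time:Triassic"))
graph.append(("Person:BillFritz", "s:says", "q:n1"))

# Who said what? Follow every s:says edge back to the reified statement.
claims = [(s, unreify(o, graph)) for s, p, o in graph if p == "s:says"]
```

The statement itself is never asserted in the graph; only the description of it and the attribution are stored.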
SPARQL
 SPARQL (pronounced sparkle) is the standard RDF
query language
 SPARQL uses variables for the subject, predicate, and
object of an RDF triple
 Queries are made of parts called ‘triple patterns’,
which contain variables written as a name preceded
by a question mark (?), e.g., ?x.
SPARQL Queries, Example
 Which epoch precedes the Miocene? (Oligocene)
?x  time:precedes  time:Miocene .
 Which minerals are part of granite? (quartz, feldspars, micas)
petr:Mineral  ?y  petr:Granite .
 Pollutants pollute which aquifer?
hydro:Pollutant  hydro:pollute  ?z .
 The SPARQL engine needs the ontologies (in this case, Time,
Petrology, and Hydrogeology) to return the associated
responses to these queries
Graph Pattern Query
 A graph pattern query (given within {} braces) is a query made of a set of
triple patterns.
 For example, take the following two questions:
 Which orogeny deformed (tect: namespace) the Tertiary system (strat:
namespace)?
 The Zagros orogeny (tect: namespace) formed (struc: namespace) which
mountain range?
 The set of two triple patterns is given in N3 as:
{ ?orogeny            tect:deformed  strat:TertiarySystem .
  tect:ZagrosOrogeny  struc:formed   ?MtRange }
 For these queries to work, all the triple patterns must match the nodes and
edges of the ontologies in these namespaces!
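A rough sketch of how a graph pattern can be matched against a triple store, written in plain Python rather than with a real SPARQL engine. The functions match_pattern and query are hypothetical helpers, and the facts in the store (including geo:ZagrosMountains and geo:Alps) are invented only to exercise the matcher.

```python
def match_pattern(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` matches `triple`; None on failure."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):              # a variable such as ?orogeny
            if new.setdefault(term, value) != value:
                return None                   # clashes with an earlier binding
        elif term != value:                   # a constant must match exactly
            return None
    return new

def query(patterns, triples):
    """Return every set of bindings that satisfies all triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions

# Hypothetical facts, not taken from any real ontology.
store = [
    ("tect:ZagrosOrogeny", "tect:deformed", "strat:TertiarySystem"),
    ("tect:ZagrosOrogeny", "struc:formed", "geo:ZagrosMountains"),
    ("tect:AlpineOrogeny", "struc:formed", "geo:Alps"),
]
patterns = [
    ("?orogeny", "tect:deformed", "strat:TertiarySystem"),
    ("tect:ZagrosOrogeny", "struc:formed", "?MtRange"),
]
results = query(patterns, store)
```

Every pattern must find a matching triple; if any one of them fails, the whole graph pattern returns no solution.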
Inferencing
 The Semantic Web languages allow explicit expression
of the relationship between classes of objects, e.g.:
strat:Triassic  partOf  strat:Mesozoic
 Compared to databases, which require programming
to derive data from complex hierarchical structures,
these languages allow smarter integration and
connection of data, making the data easier to query and use
What is Inferencing?
 The Semantic Web languages provide ‘inferencing’, meaning that we
can derive other related, unstated information from a set of stated
information
 The mechanisms for inference are provided by language
constructs, such as rdfs:subClassOf, which make ‘inference-based
semantics’ possible
 Through inferencing, we should be able to query a broader (general)
term (e.g., FaultRock) and get information about the narrower
(specialized) subclass terms that extend it, e.g.,
Mylonite  subClassOf  FaultRock
If we know that FaultRock isA Rock, Rock is Solid, and
Solid isNot Liquid, then we can infer that Mylonite is Solid and
Mylonite isNot Liquid.
Note: isNot is modeled by saying that Liquid disjointWith Solid
 The Web Ontology Language (OWL) provides formal
meaning to constructs such as
rdfs:Class and rdfs:subClassOf
 It is inferred from the language that:
if C is a subClassOf C’, then
every member x of class C is also a member
of class C’
[Figure: classes C and C’ with members x and y]
 For example, if the Idaho batholith is a Batholith,
and Batholith rdfs:subClassOf IgneousBody, then
the Idaho batholith is also an IgneousBody
(IdahoBatholith rdf:type IgneousBody)
 So, if we search for igneous bodies in general, we may be
offered information about the narrower Batholith term,
and data about the Idaho batholith may be provided
Type Propagation Rule
 The ‘type propagation rule’ gives the definition of the
meaning of the C subClassOf C’ statement:
IF    ?C  rdfs:subClassOf  ?C’ .
AND   ?x  rdf:type         ?C .
THEN  ?x  rdf:type         ?C’ .
[Figure: classes C and C’ with members x and y]
 If C is a subclass of C’, and x is an instance of C, then x is an instance
of C’.
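The rule can be sketched as a small forward-chaining loop in plain Python. This is an illustration only, not how any particular RDFS engine is implemented; propagate_types is a hypothetical name, and the class names come from the Idaho batholith example.

```python
def propagate_types(triples):
    """Apply the type propagation rule repeatedly until no new triple appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_of = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for c, c_prime in subclass_of:        # IF   ?C rdfs:subClassOf ?C'
            for x, cls in typed:              # AND  ?x rdf:type ?C
                new = (x, "rdf:type", c_prime)  # THEN ?x rdf:type ?C'
                if cls == c and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred

asserted = {
    ("IdahoBatholith", "rdf:type", "Batholith"),
    ("Batholith", "rdfs:subClassOf", "IgneousBody"),
}
closure = propagate_types(asserted)
new_triples = closure - asserted
```

Repeating the rule until nothing changes also covers longer subclass chains, because each pass lifts an instance one level further up the hierarchy.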
Example for inference
 Suppose all porphyritic textures are igneous textures, all igneous
textures are textures, and the individual PorphyriticTexture1 is porphyritic.
 Applying predicate logic:
 If x is a porphyritic texture, then x is an igneous texture
PorphyriticTexture(x) → IgneousTexture(x)
 If x is an igneous texture, then x is a texture
IgneousTexture(x) → Texture(x)
Given the following two instances:
IgneousTexture(IgneousTexture1) and
PorphyriticTexture(PorphyriticTexture1)
Then we infer the following unasserted facts:
IgneousTexture(PorphyriticTexture1)
Texture(IgneousTexture1)
Texture(PorphyriticTexture1)
[Figure: class hierarchy Texture, IgneousTexture, PorphyriticTexture, with individuals IgneousTexture1 and PorphyriticTexture1]
Multiple Subclassing
 The Web Ontology Language (OWL)
and the languages it builds on
(RDF and RDFS) provide formal
constraints on the meaning of their
constructs to make inferencing from
combinations of terms possible
 As in object-oriented programming
(OOP) languages, multiple subclassing
(inheritance) exists in RDFS
 If A subClassOf B and A subClassOf C,
then if x is an instance (individual) of A,
x is an instance of both B and C
(which follows from the type propagation rule)
[Figure: A is a subclass of both B and C, with instance x; for example, Semibrittle is a subclass of both Brittle and Ductile]
Benefits of Inference Rules
 This inference-based semantics is very powerful for
integrating heterogeneous data from
autonomous, distributed sources on the Web, and for
making the distributed data useful
 Inference rules make data constrained by the OWL
constructs more useful because RDFS and OWL
query engines, which know these inference rules,
infer unasserted information (during a query)
from the directly asserted
triples in the RDF store
Assume the triple store contains two asserted RDF triples:
struc:FaultRock  rdfs:subClassOf  petr:Rock .
struc:Mylonite   rdf:type         struc:FaultRock .
 Suppose the following SPARQL triple pattern queries the
triple store to find
things that are of type Rock, which is
defined in the ‘petr’ namespace:
?x  rdf:type  petr:Rock .
[Figure: Rock, FaultRock, and Mylonite hierarchy]
 Although the asserted triples contain no triple with
subject struc:Mylonite, predicate rdf:type, and object
petr:Rock, an RDFS inference query engine will return the
following inferred result:
?x = struc:Mylonite
Note: struc:FaultRock is asserted as a subclass of petr:Rock, not as an instance of it, so it is not itself a result of this query.
Inferred Triples
 Inference engines, applying their set of inference rules,
return unasserted, inferred triples from asserted
triples
 The inferred triples may or may not be saved in the
triple store, and may be generated only at the time of
querying
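A sketch of the second option, where nothing inferred is saved and the reasoning happens only while the query is answered. The functions subclasses_of and instances_of are hypothetical helpers, not part of any real query engine; the two stored triples are the FaultRock and Mylonite triples from the earlier slide.

```python
def subclasses_of(cls, triples):
    """Return `cls` plus every class below it in the rdfs:subClassOf hierarchy."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found

def instances_of(cls, triples):
    """Answer the pattern `?x rdf:type cls` without storing any inferred triple."""
    classes = subclasses_of(cls, triples)
    return {s for s, p, o in triples if p == "rdf:type" and o in classes}

store = [
    ("struc:FaultRock", "rdfs:subClassOf", "petr:Rock"),
    ("struc:Mylonite", "rdf:type", "struc:FaultRock"),
]
answer = instances_of("petr:Rock", store)
```

The store still holds only the two asserted triples afterwards. Saving the inferred triples instead would trade storage for faster repeated queries.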
Example
 The following diagram shows the hierarchy
of the pyroxene minerals in the min : Mineralogy ontology
 This means that Diopside isA Pyroxene, and
Pyroxene isA Silicate, and Silicate isA Mineral
Inferred Triples
 Given the following asserted triples:
min:Diopside  rdf:type         min:Pyroxene .
min:Pyroxene  rdfs:subClassOf  min:Silicate .
min:Silicate  rdfs:subClassOf  min:Mineral .
 We can derive the following inferred triples using the type
propagation rule, together with the transitivity of rdfs:subClassOf:
min:Pyroxene  rdfs:subClassOf  min:Mineral .
min:Diopside  rdf:type         min:Silicate .
min:Diopside  rdf:type         min:Mineral .
RDF and Relational Database
 Every statement in RDF is like a value in a cell of a
database table, which requires three values for its
complete representation:
 a row identifier (subject, s)
 a column identifier (predicate, p)
 the value in the table cell (object, o)
[Figure: table with row s, column p, and cell value o]
 Note: a 3x3 table yields 9 triples!
 Recall that we refer to the ‘subject-predicate-object’
statement as a ‘triple’
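The cell-to-triple correspondence can be sketched in a few lines of plain Python. The function table_to_triples and the placeholder names s1, p1, o11, and so on are illustrative only.

```python
def table_to_triples(row_ids, columns, cells):
    """Turn every table cell into one (subject, predicate, object) triple."""
    return [
        (row_id, column, value)
        for row_id, row in zip(row_ids, cells)
        for column, value in zip(columns, row)
    ]

# A 3x3 table: three rows (subjects) by three columns (predicates).
triples = table_to_triples(
    ["s1", "s2", "s3"],
    ["p1", "p2", "p3"],
    [
        ["o11", "o12", "o13"],
        ["o21", "o22", "o23"],
        ["o31", "o32", "o33"],
    ],
)
```

The number of triples is always rows times columns, which is where the 9 triples for a 3x3 table come from.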
Triples: Building blocks for RDF
 Subject (S) is the thing about which we are making the
statement.
 In this case it is the record, i.e., the row
 Predicate (P) is the property of the
subject entity in the row
 In this case it is the column or field
 Object (O) is the value of the property in the cell
Data Federation
 RDF is designed for federating data of any kind
(database, spreadsheet, XML) originating from multiple
sources
 These data can be converted into a set of triples and put in
the RDF data store (federated graph), ready to be queried
 In the RDF triple ‘Course instructor Babaie’, Course is
the subject, instructor is the predicate, and Babaie is the
object (the value for the instructor):
Subject  Predicate   Object
Course   instructor  Babaie
Directed Graph
 An RDF store commonly has more
than one triple referring to the
same subject (s), i.e., one s, many o’s
 The picture is shown for one row only!
 This translates to one row (i.e., record)
of a relational database table
with multiple fields (columns)
[Figure: subject node s with edges p1, p2, p3 pointing to object nodes o1, o2, o3; the matching table row is s with values o1, o2, o3 under columns p1, p2, p3]
 This leads to the ‘directed graph’, which shows triples as
‘edges’ (labeled by predicates) radiating from one subject
‘node’ to different object nodes
Sample Table
      sampleID  lithology  type    purpose
S1    N235      basalt     powder  K-Ar dating
S2    N300      granite    chip    thin section
(p1, p2, p3 label the property columns of the table)
[Figure: directed graph, shown only for N235. Investigator takes SampleID N235; N235 has lithology basalt, type powder, and purpose K-Ar dating]
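A small sketch of the directed graph as a data structure: each subject node maps to its outgoing edges (predicate, object). The function outgoing_edges is a hypothetical helper, and using the sample ID as the subject node follows the figure for N235.

```python
from collections import defaultdict

def outgoing_edges(triples):
    """Group triples by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)

# One triple per property cell of the Sample table.
triples = [
    ("N235", "lithology", "basalt"),
    ("N235", "type", "powder"),
    ("N235", "purpose", "K-Ar dating"),
    ("N300", "lithology", "granite"),
    ("N300", "type", "chip"),
    ("N300", "purpose", "thin section"),
]
graph = outgoing_edges(triples)
```

Looking up graph["N235"] gives the three edges drawn in the figure, and graph["N300"] gives the edges for the second row, which the figure leaves out.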
URI (Uniform Resource Identifier)
 Merging a distributed group of directed graphs requires
mapping nodes in each graph
 Even if nodes in different graphs have the same name, it is not
guaranteed that the nodes refer to the same resource!
 To make matching of the nodes possible, we need to use the URI
(Uniform Resource Identifier), which is a generalization of the URL
(every URL is a URI, but not the other way around).
 A URI is a global identifier for a resource (it can carry information
such as protocol, server name, port number, and file name), which is
required for global networking
 A URI refers to either a Web name or a location, whereas
a URL refers only to a Web location
URI Prefix
 Nodes from two graphs can be merged if they have the
same URI
 We use a prefix to represent the long URI strings, e.g.,
‘geochem’ and ‘struc’ can be the prefixes for the Geochemistry and
Structural Geology ontologies, which may have the URIs:
http://www.usgs.org/ontologies/Geochemistry.owl#
http://www.usgs.org/ontologies/StructuralGeology.owl#
 If the Geochemistry or Structural Geology ontology has a
class called Analysis or Foliation, respectively, we designate
them as:
geochem : Analysis
struc : Foliation
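A minimal sketch of prefix expansion and of merging graphs on full URIs, in plain Python. The helpers expand and merge are hypothetical, as are the predicates struc:partOf and geochem:sampledFor and the class struc:Fabric; only the two prefix URIs and the names geochem:Analysis and struc:Foliation come from the slide.

```python
PREFIXES = {
    "geochem": "http://www.usgs.org/ontologies/Geochemistry.owl#",
    "struc": "http://www.usgs.org/ontologies/StructuralGeology.owl#",
}

def expand(qname, prefixes=PREFIXES):
    """Expand a qualified name such as 'struc:Foliation' into its full URI."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix.strip()] + local.strip()

def merge(*graphs):
    """Merge graphs after expansion; nodes with the same URI become one node."""
    merged = set()
    for graph in graphs:
        merged.update(tuple(expand(term) for term in triple) for triple in graph)
    return merged

# Two hypothetical graphs from different sources that mention the same node.
graph_a = [("struc:Foliation", "struc:partOf", "struc:Fabric")]
graph_b = [("struc:Foliation", "geochem:sampledFor", "geochem:Analysis")]
merged = merge(graph_a, graph_b)
subjects = {s for s, p, o in merged}
```

Both triples end up attached to a single subject node because the two sources use the same URI for Foliation, not merely the same local name.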
Default Namespace
 If there is only one (default) namespace, we write
a colon followed by the class name
(e.g., :Fracture).
 OWL, RDF, RDFS, and XSD each have their own standard
namespace
 Thus, rdf:type is a typing construct in the rdf
namespace. Here are some more examples:
struc:Fold       rdf:type  struc:Structure .
geochem:oxidize  rdf:type  rdf:Property .
Relational database tables and RDF
Record ID  p1   p2   p3
s1         o11  o12  o13
s2         o21  o22  o23
 Each row in a relational table represents a single record
 Each record maps to an individual entity
 This means that each row should have a unique URI,
which in the database is represented by the unique
identifier (ID column, the primary key)
Relational Database to RDF Graph
Geochem:Sample
ID  lithology  type  purpose
1
2
 The best practice is to design a URI for the table, with a prefix:
xmlns : geochem =
http://www.gsi.ir/ontologies/geochemistry.owl#Sample
 We identify each row by concatenating the table name (Sample)
with the ID of each row, for example,
geochem : Sample1, geochem : Sample2, etc.
 To make the fields also unique, we concatenate the table name
(Sample) with the column name, like:
geochem : Sample_lithology,
geochem : Sample_type, geochem : Sample_purpose
Example for RDB to RDF
 Notice that, during conversion of a relational table to
RDF, each cell in the table converts into one RDF triple
 In the table in the next slide, we have:
7 rows and 5 columns,
which lead to 35 triples
 Note: Only triples for two samples are shown!
Geochem:Sample
ID  number  location  analysis       lithology
1   N122    Neyriz    REE            Gabbro
2   N150    Neyriz    Trace element  Pyroxenite
3   Z338    Zabol     Pb Isotope     Basalt
4   R120    Rasht     Sr Isotope     Granite
5   S214    Sabzevar  XRD            Gabbro
6   R123    Rasht     XRD            Granite
7   S220    Sabzevar  Major oxides   Dunite
Column labels in the figure: Geochem:Sample number, Geochem:Sample location, Geochem:Sample analysis, Geochem:Sample lithology
Row labels in the figure: Geochem:Sample1, Geochem:Sample2, Geochem:Sample3, Geochem:Sample4, Geochem:Sample5, Geochem:Sample6, Geochem:Sample7
Relational database (RDB) to RDF
 Fields (columns) of the table become properties (predicate):
geochem : Sample_number
geochem : Sample_location
etc.
 Each row provides the subject, for example,
geochem : Sample1
geochem : Sample2
etc.
 The following table shows part of the RDF graph of the
previous Sample table in the Geochemistry database:
RDF triples for the Sample table in the
Geochemistry database (only 2 samples shown!)
Subject          Predicate                Object
geochem:Sample1  geochem:sampleId         1
geochem:Sample1  geochem:sampleNumber     N122
geochem:Sample1  geochem:sampleLocation   Neyriz
geochem:Sample1  geochem:sampleAnalysis   REE
geochem:Sample1  geochem:sampleLithology  Gabbro
geochem:Sample2  geochem:sampleId         2
geochem:Sample2  geochem:sampleNumber     N150
geochem:Sample2  geochem:sampleLocation   Neyriz
geochem:Sample2  geochem:sampleAnalysis   Trace element
geochem:Sample2  geochem:sampleLithology  Pyroxenite
…                …                        …
In this case, the objects are not class (object) resources;
they are literal values (i.e., strings).
The type of each individual (i.e., each row) is the table (in this case,
Sample).
These types are also given in the RDF graph:
Subject          Predicate  Object
geochem:Sample1  rdf:type   geochem:Sample
geochem:Sample2  rdf:type   geochem:Sample
geochem:Sample3  rdf:type   geochem:Sample
geochem:Sample4  rdf:type   geochem:Sample
geochem:Sample5  rdf:type   geochem:Sample
geochem:Sample6  rdf:type   geochem:Sample
geochem:Sample7  rdf:type   geochem:Sample
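The whole conversion can be sketched in plain Python as follows. This is an illustration under assumptions, not a standard mapping tool: table_to_rdf is a hypothetical helper, BASE is an assumed prefix URI ending in '#', and the property names follow the table-name-plus-column-name convention (Sample_number, Sample_location, and so on).

```python
BASE = "http://www.gsi.ir/ontologies/geochemistry.owl#"  # assumed prefix URI

COLUMNS = ["ID", "number", "location", "analysis", "lithology"]
ROWS = [
    [1, "N122", "Neyriz", "REE", "Gabbro"],
    [2, "N150", "Neyriz", "Trace element", "Pyroxenite"],
    [3, "Z338", "Zabol", "Pb Isotope", "Basalt"],
    [4, "R120", "Rasht", "Sr Isotope", "Granite"],
    [5, "S214", "Sabzevar", "XRD", "Gabbro"],
    [6, "R123", "Rasht", "XRD", "Granite"],
    [7, "S220", "Sabzevar", "Major oxides", "Dunite"],
]

def table_to_rdf(table, columns, rows, base=BASE):
    """Return (cell triples, type triples) for one relational table."""
    cell_triples, type_triples = [], []
    for row in rows:
        subject = f"{base}{table}{row[0]}"            # table name + row ID
        type_triples.append((subject, "rdf:type", f"{base}{table}"))
        for column, value in zip(columns, row):
            predicate = f"{base}{table}_{column}"     # table name + column name
            cell_triples.append((subject, predicate, str(value)))  # literal
    return cell_triples, type_triples

cell_triples, type_triples = table_to_rdf("Sample", COLUMNS, ROWS)
```

With 7 rows and 5 columns this gives the 35 cell triples mentioned earlier, plus 7 rdf:type triples, one per row.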