rdb2rdf_sssw11 - Department of Computer Science

Relational Databases to RDF
(a.k.a. RDB2RDF)
Juan F. Sequeda
Dept of Computer Science
University of Texas at Austin
I want RDF… but my data is in RDB!
Why RDB2RDF?
• Semantic Web
– Deep Web is 500 times bigger than Static Web (2008)
– Where do you think that the majority of the data is stored?
– If we want a Semantic Web, we need data to be on the web as RDF and interlinked!
• Where do you think this data is going to come from?
[Figure: many RDB sources, each exposed to the web through RDB2RDF]
Why RDB2RDF?
• Data Integration
– Do you know why RDF is cool?
• because it’s a graph!
– How do you link/integrate two different graphs?
• add edges between nodes or merge nodes!
Real world scenario
• Boss: Find me clients that are based in cities that have a population of less than 1 million.
• You: ???
Clients:
id    Name        c_id
10    ACME Inc    20
11    Foo Bars    21
Locations:
c_id  city        state
20    Austin      TX
21    Dallas      TX
Real world scenario
• You: I found the population information… but it’s in a different database. Can you add a column to the Locations table in order to insert the new data?
• DBA: NO!
Location (in the other database):
id    city        state   pop
1     Austin      TX      790390
2     Dallas      TX      1197816
[Figure: the two databases as RDF graphs, not yet connected. From db1: http://db1/client10 and http://db1/client11 (rdf:type ex:Client; ex:name "ACME Inc" and "Foo Bars") are ex:basedIn http://db1/loc20 and http://db1/loc21, which have ex:city (Austin, Dallas) and ex:state (TX). From db2: http://db2/loc1 and http://db2/loc2 have ex:city, ex:state and ex:pop (790390, 1197816).]
[Figure: the integrated graph after merging the location nodes. http://db1/client10 (ACME Inc) is ex:basedIn http://db2/loc1 (Austin, TX, ex:pop 790390); http://db1/client11 (Foo Bars) is ex:basedIn http://db2/loc2 (Dallas, TX, ex:pop 1197816).]
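The "add edges between nodes or merge nodes" idea can be sketched in plain Python for the two example databases. This is only an illustration, not part of the original slides: triples are ordinary tuples, the ex: names are the ones used in the example graph, and matching location nodes on (city, state) is an assumption made for this sketch.

```python
# Triples are (subject, predicate, object) tuples; the data is the slide example.
db1 = {
    ("http://db1/client10", "rdf:type", "ex:Client"),
    ("http://db1/client10", "ex:name", "ACME Inc"),
    ("http://db1/client10", "ex:basedIn", "http://db1/loc20"),
    ("http://db1/loc20", "ex:city", "Austin"),
    ("http://db1/loc20", "ex:state", "TX"),
    ("http://db1/client11", "rdf:type", "ex:Client"),
    ("http://db1/client11", "ex:name", "Foo Bars"),
    ("http://db1/client11", "ex:basedIn", "http://db1/loc21"),
    ("http://db1/loc21", "ex:city", "Dallas"),
    ("http://db1/loc21", "ex:state", "TX"),
}
db2 = {
    ("http://db2/loc1", "ex:city", "Austin"),
    ("http://db2/loc1", "ex:state", "TX"),
    ("http://db2/loc1", "ex:pop", 790390),
    ("http://db2/loc2", "ex:city", "Dallas"),
    ("http://db2/loc2", "ex:state", "TX"),
    ("http://db2/loc2", "ex:pop", 1197816),
}


def node_key(graph, node):
    """Return (city, state) for a node, or None if either value is missing."""
    city = [o for s, p, o in graph if s == node and p == "ex:city"]
    state = [o for s, p, o in graph if s == node and p == "ex:state"]
    return (city[0], state[0]) if city and state else None


def merge_locations(g1, g2):
    """Merge g1 location nodes into the g2 nodes with the same (city, state).

    Matching on (city, state) is an assumption made for this sketch.
    """
    by_key = {}
    for s, p, o in g2:
        if p == "ex:city" and node_key(g2, s) is not None:
            by_key[node_key(g2, s)] = s
    rename = {}
    for s, p, o in g1:
        if p == "ex:city" and node_key(g1, s) in by_key:
            rename[s] = by_key[node_key(g1, s)]
    merged = set(g2)
    for s, p, o in g1:
        # Rewrite both subject and object so edges point at the merged node.
        merged.add((rename.get(s, s), p, rename.get(o, o)))
    return merged


def clients_in_small_cities(graph, limit):
    """Names of clients based in a location whose ex:pop is below limit."""
    small = {s for s, p, o in graph if p == "ex:pop" and o < limit}
    names = {s: o for s, p, o in graph if p == "ex:name"}
    return sorted(
        names[s]
        for s, p, o in graph
        if p == "ex:basedIn" and o in small and s in names
    )


graph = merge_locations(db1, db2)
print(clients_in_small_cities(graph, 1_000_000))
```

No column had to be added to any table: once the location nodes are merged, the boss's question is a walk over one graph.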
A bit of history
• Relational Databases on the Web. TimBL,
1998
• W3C Workshop on RDF Access to Relational
Databases, October 2007
– Report: http://www.w3.org/2007/03/RdfRDB/report
• W3C RDB2RDF Incubator Group, 2008-2009
– Survey: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
• W3C RDB2RDF Working Group, 2009 – today
– R2RML: RDB to RDF Mapping Language
– A Direct Mapping of Relational Data to RDF
RDB and the Semantic Web
RIF
OWL
RDFS
RDF
RDB and the Semantic Web
TRIGGERS
CONSTRAINTS
TABLE DEFINITION
RELATIONAL MODEL
RDB and the Semantic Web
RDB                 Semantic Web
TRIGGERS            RIF
CONSTRAINTS         OWL
TABLE DEFINITION    RDFS
RELATIONAL MODEL    RDF
Overview
R2RML: RDB to RDF Mapping Language
• Language for expressing customized mappings from relational databases to RDF datasets
• Give precise control to the developer
– You create the structure you want
– You choose the target vocabulary
• No RDFS/OWL is created from the schema
R2RML Mapping
[Figure: RDB -> R2RML (manual) -> RDF]
Direct Mapping
• Automatic transformation from Relational Database to RDF
– Click a button… Voila!
• Generate RDFS/OWL of the database schema
• If this doesn’t get you where you want… use existing languages for mapping
– RDF to RDF with RIF or SPARQL Construct
• Semantic Web community
– Create SQL Views and directly map those
• Database community
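A minimal sketch of the "click a button" idea, assuming SQLite as a stand-in database and a hypothetical base IRI. The IRI shapes (Table/key=value for rows, Table#column for properties, the table IRI as rdf:type) loosely follow the W3C Direct Mapping draft; foreign keys, tables without a primary key, and RDFS/OWL generation are left out.

```python
import sqlite3

BASE = "http://example.com/db/"  # hypothetical base IRI, not from the slides


def direct_map(conn, base=BASE):
    """Yield (subject, predicate, object) for every row of every table.

    Assumes each table has a primary key; foreign keys are not followed.
    """
    tables = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in tables:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        names = [c[1] for c in cols]
        pk = [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0]
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            record = dict(zip(names, row))
            key = ";".join(f"{k}={record[k]}" for k in pk)
            subject = f"{base}{table}/{key}"
            yield (subject, "rdf:type", f"{base}{table}")
            for col, value in record.items():
                if value is not None:  # NULLs produce no triple
                    yield (subject, f"{base}{table}#{col}", value)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Clients (id INTEGER PRIMARY KEY, Name TEXT, c_id INTEGER);
    INSERT INTO Clients VALUES (10, 'ACME Inc', 20), (11, 'Foo Bars', 21);
    """
)
triples = list(direct_map(conn))
for t in triples:
    print(t)
```

Nothing in the function knows about Clients: the vocabulary is derived from the schema, which is why the result usually needs a second step (RIF, SPARQL Construct, or SQL Views) to reach a target vocabulary.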
Direct Mapping
[Figure: RDB (optionally through SQL Views) -> Direct Mapping (automatic) -> RDF -> RIF/SPARQL Construct -> RDF]
Hybrid
• Instead of starting from a blank R2RML file…
• 1) Direct Mapping
• 2) Manual Editing
Hybrid Mapping
[Figure: RDB -> Direct Mapping -> Direct Mapping in R2RML -> Modify -> R2RML -> RDF]
Materialize Triples
• Data is not dynamic
• Dump RDB into RDF and then insert into a triplestore
• RDF dump may not be consistent with RDB
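The consistency caveat can be illustrated with a small sketch, assuming SQLite as a stand-in for the RDB and a hypothetical expansion of the ex: prefix. The dump is written as N-Triples lines; the updated population value is made up for the example.

```python
import sqlite3

EX = "http://example.com/ns#"  # hypothetical expansion of the ex: prefix
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def dump_population(conn):
    """Serialize Location.pop as N-Triples lines (a point-in-time snapshot)."""
    lines = []
    for loc_id, pop in conn.execute("SELECT id, pop FROM Location ORDER BY id"):
        lines.append(
            f'<http://db2/loc{loc_id}> <{EX}pop> "{pop}"^^<{XSD_INTEGER}> .'
        )
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Location (id INTEGER PRIMARY KEY, city TEXT, state TEXT, pop INTEGER);
    INSERT INTO Location VALUES (1, 'Austin', 'TX', 790390), (2, 'Dallas', 'TX', 1197816);
    """
)

# This string is what would be loaded into the triplestore.
snapshot = dump_population(conn)

# The database keeps changing after the dump (800000 is a made-up value).
conn.execute("UPDATE Location SET pop = 800000 WHERE id = 1")

# The materialized triples no longer match the RDB until the next dump.
stale = snapshot != dump_population(conn)
print(snapshot)
print("stale:", stale)
```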
Materialized Triples
[Figure: RDB -> Dump -> RDF, which is then queried with SPARQL]
Virtual Triples
• Data is dynamic
• Need to query RDB with SPARQL
• Translate SPARQL to SQL
– "Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%. This is an indicator that there is still room for improving the rewriting algorithms" [Bizer and Schultz 2009]
– "Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system’s capabilities to optimize queries and process large quantities of structured data" [Gray et al. 2009]
– Ultrawrap solves this
• RDF data is consistent with RDB data
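To make "translate SPARQL to SQL" concrete, here is a deliberately tiny sketch: it rewrites a single triple pattern (?s predicate ?o), optionally with a fixed object, into SQL over the example Location table. The mapping dictionary and function names are hypothetical, no SPARQL parser is involved, and this is not how Ultrawrap or any of the cited systems work; real rewriters handle the full SPARQL algebra.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
MAPPING = {
    "ex:city": ("Location", "id", "city"),
    "ex:state": ("Location", "id", "state"),
    "ex:pop": ("Location", "id", "pop"),
}
SUBJECT_PREFIX = "http://db2/loc"


def pattern_to_sql(predicate, obj=None):
    """Translate (?s predicate ?o), or (?s predicate obj), to SQL plus params."""
    table, key, col = MAPPING[predicate]
    sql = (
        f"SELECT '{SUBJECT_PREFIX}' || {key} AS s, {col} AS o "
        f"FROM {table} WHERE {col} IS NOT NULL"
    )
    params = ()
    if obj is not None:
        sql += f" AND {col} = ?"
        params = (obj,)
    return sql + " ORDER BY s", params


def query(conn, predicate, obj=None):
    """Answer the pattern directly against the RDB; no triples are stored."""
    sql, params = pattern_to_sql(predicate, obj)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Location (id INTEGER PRIMARY KEY, city TEXT, state TEXT, pop INTEGER);
    INSERT INTO Location VALUES (1, 'Austin', 'TX', 790390), (2, 'Dallas', 'TX', 1197816);
    """
)
print(pattern_to_sql("ex:city", "Austin")[0])
print(query(conn, "ex:city", "Austin"))
print(query(conn, "ex:pop"))
```

Because every query is answered by the database at request time, an UPDATE on Location is visible to the next query, which is the consistency property the slide refers to.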
Virtual Triples
[Figure: SPARQL -> Mapping (virtual RDF) -> RDB]
RDB2RDF Space
[Figure: the RDB2RDF space as a grid. One axis: Materialized Triples vs. Virtual Triples. Other axis: Direct Mapping, Hybrid, Custom Mapping.]
Tuples to Triples
SID   NAME    AGE
1     Alice   25
2     Bob     26
The row key gives the SUBJECT, the column gives the PREDICATE, and the cell value gives the OBJECT:
http://ex.com/person1   http://ex.com/age   25
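The tuple-to-triple step can be sketched as a small Python function. The subject and predicate shapes follow the one triple shown on the slide (http://ex.com/person1, http://ex.com/age, 25); the function name, its parameters, and the http://ex.com/name predicate are assumptions made by analogy.

```python
def row_to_triples(
    row,
    key="SID",
    subject_template="http://ex.com/person{}",
    predicate_base="http://ex.com/",
):
    """Turn one row (a dict of column -> value) into a list of triples.

    The key column builds the subject; every other non-NULL column
    becomes one (subject, predicate, object) triple.
    """
    subject = subject_template.format(row[key])
    return [
        (subject, predicate_base + column.lower(), value)
        for column, value in row.items()
        if column != key and value is not None
    ]


rows = [
    {"SID": 1, "NAME": "Alice", "AGE": 25},
    {"SID": 2, "NAME": "Bob", "AGE": 26},
]
triples = [t for row in rows for t in row_to_triples(row)]
for t in triples:
    print(t)
```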
Current Status of W3C RDB2RDF WG
• R2RML: RDB to RDF Mapping Language
Working Draft
http://www.w3.org/TR/r2rml/
• A Direct Mapping of Relational Data to RDF
Working Draft
http://www.w3.org/TR/rdb-direct-mapping/
• Last Call: Sept 1 (hopefully)
Implementations
• Ultrawrap
– SPARQL and semantically equivalent SQL have equal execution time
– Commercial databases
– http://ribs.csres.utexas.edu/ultrawrap
• Spyder
– Oracle and HSQLDB
– http://www.revelytix.com/content/spyder
• Other non-standard RDB2RDF
– D2R Server, Virtuoso, Triplify, …
Publicity
• International Semantic Web Conference
– Oct 23 – 27 in Bonn, Germany
• Posters and Demos
– August 15
• Consuming Linked Data Workshop
– August 15
• Outrageous Ideas Track
– Sept 5
• Semantic Web Challenge
– Sept 30
• 2nd Linked Data-a-thon
– Oct 1
http://iswc2011.semanticweb.org/
Join the Facebook group SSSW2011
Thank You
Acknowledgments:
- RiBS @ UT Austin
- W3C RDB2RDF WG members
- David McNeil - Revelytix
@juansequeda