- Tetherless World Constellation


Rajashree Deka
Tetherless World Constellation
Rensselaer Polytechnic Institute


The majority of data underpinning the Web are
stored in Relational Databases (RDB).
Advantages:
 Secure and scalable architecture.
 Efficient storage.
 Reliability.

Disadvantages:
 Difficult to share data across large organizations
where different database schemata are used.
 Most importantly, there is no check on semantics.


As the Semantic Web matures, there is a growing need
for RDF applications to access the content of legacy
databases.
Compared to RDB, RDF is:
 More expressive.
 More easily processed and interpreted.
 Easily reasoned over by software agents.
We therefore need a way to make data in an RDBMS
available as RDF.
To generate Semantic Web content from an
RDB, Tim Berners-Lee proposed a very direct
mapping:
 Each table in the RDB is an RDF class.
 Each field (column) name is an RDF property.
 Each record is an RDF node: an instance of the RDF
class, which can therefore play the role of a subject or an
object in an RDF statement.
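
As a rough illustration of the three rules above, the N3 sketch below shows what a single row of a table named dataset might look like after a direct mapping. The vocab: and data: namespaces, the dataset_name column and the row values are all invented for this example; they do not come from the slides.

```n3
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix vocab: <http://example.org/vocab/> .   # hypothetical namespace
@prefix data:  <http://example.org/data/> .    # hypothetical namespace

# Rule 1: the table "dataset" becomes an RDF class.
vocab:dataset a rdfs:Class .

# Rule 2: each column name becomes an RDF property.
vocab:dataset_dataset_id a rdf:Property .
vocab:dataset_name       a rdf:Property .      # hypothetical column

# Rule 3: each record becomes a node that is an instance of the class.
data:dataset_42 a vocab:dataset ;
    vocab:dataset_dataset_id "42"^^xsd:int ;
    vocab:dataset_name "Example dataset" .
```

The node data:dataset_42 appears here as a subject; a row in another table that referenced it through a foreign key would use the same node as an object.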

Semi-automatic generation of ontology from RDB
 Read all records, export as RDF triples.
 Mappings are direct; complex mappings do not usually
appear.
 Need to convert to RDF regularly.
 Does not allow the population of an existing ontology –
a BIG limitation!

Map existing RDB to an existing ontology
 Customize mapping according to existing ontology.
 Complex mappings can be implemented.


D2RQ provides an integrated environment for accessing
the content of non-RDF, relational databases as
virtual, read-only RDF graphs.
Using D2RQ we can:
 Query a non-RDF database using SPARQL queries.
 Access information in a non-RDF database using the
Jena API or the Sesame API.
 Access the content of the database as Linked Data over
the Web.
 D2RQ mapping language – describes the relation
between the ontology and the RDB.
 D2RQ engine – uses mappings to rewrite Jena
and Sesame API calls to SQL queries.
 D2R Server – provides a Linked Data view, an HTML
view for debugging and a SPARQL Protocol endpoint
over the database.
 The D2RQ mapping language is formally defined by
http://www4.wiwiss.fu-berlin.de/bizer/d2rq/0.1/
 The D2RQ namespace is defined by
http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#
 Database compatibility:
   Oracle
   MySQL
   PostgreSQL
   Microsoft SQL Server
   ODBC data sources (e.g. Microsoft Access) - the mapping
  generator and automatic detection of column types do not
  work.
Two command-line tools (only on Windows and Unix
systems):
 Mapping generator:
   Analyzes the database schema.
   Generates a default mapping file.
   The resulting D2RQ map is an RDF document in N3 format.
   The mapping can be used as-is or can be customized.
 Dump script:
   Writes the content of the RDB into a single RDF file.
   Supported syntaxes are "RDF/XML" (the default),
  "RDF/XML-ABBREV", "N3", "N-TRIPLE".
An ontology is mapped to a database schema using:
 d2rq:ClassMaps – A ClassMap represents a class or a
group of similar classes in the ontology. It specifies how
instances of the class are identified.
 d2rq:PropertyBridges – A ClassMap has a set of
PropertyBridges which specify how the properties
of an instance are created.
# Table dataset (default mapping)
# ClassMap: one RDF resource per row, identified by dataset_id
map:dataset a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "dataset/@@dataset.dataset_id@@";
    d2rq:class vocab:dataset;
    d2rq:classDefinitionLabel "dataset";
    .
# PropertyBridge: rdfs:label built from a string pattern
map:dataset__label a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property rdfs:label;
    d2rq:pattern "dataset #@@dataset.dataset_id@@";
    .
# PropertyBridge: the dataset_id column as a typed literal
map:dataset_dataset_id a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property vocab:dataset_dataset_id;
    d2rq:propertyDefinitionLabel "dataset dataset_id";
    d2rq:column "dataset.dataset_id";
    d2rq:datatype xsd:int;
    .
# Table dataset (customized mapping)
map:dataset a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "http://escience.rpi.edu/ontology/BCODMO/bcodmo/2/0/DeploymentDatasetCollection_@@dataset.dataset_id@@";
    d2rq:class bcodmo:DeploymentDatasetCollection;
    d2rq:classDefinitionLabel "DeploymentDatasetCollection";
    .
map:seeAlsoStatement a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property rdfs:seeAlso;
    d2rq:uriPattern "http://osprey.bcodmo.org/dataset.cfm?id=@@dataset.dataset_id@@&flag=view";
    .
map:hasIdentifier a d2rq:PropertyBridge;
    d2rq:property bcodmo:hasIdentifier;
    d2rq:belongsToClassMap map:dataset;
    d2rq:column "dataset.dataset_id";
    d2rq:datatype xsd:int;
    .
map:dataset_dataset_id a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property bcodmo:hasParameter;
    d2rq:refersToClassMap map:parameters;
    d2rq:propertyDefinitionLabel "dataset dataset_id";
    d2rq:join "dataset.dataset_id = dataset_parameters.dataset_id";
    d2rq:join "dataset_parameters.parameters_id = parameters.parameters_id";
    .
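
The mappings above refer to map:database and map:parameters, which the slides do not show. A minimal sketch of what those two definitions might look like follows. The JDBC driver, connection string, credentials, the bcodmo: namespace and the class name bcodmo:Parameter are all placeholders assumed for illustration, not values taken from the actual BCO-DMO mapping.

```n3
@prefix d2rq:   <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix map:    <#> .
@prefix bcodmo: <http://example.org/bcodmo#> .   # placeholder namespace

# Database connection shared by all ClassMaps (placeholder values).
map:database a d2rq:Database;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:jdbcDSN "jdbc:mysql://localhost/exampledb";
    d2rq:username "user";
    d2rq:password "password";
    .

# Target of d2rq:refersToClassMap in the hasParameter bridge.
map:parameters a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "parameters/@@parameters.parameters_id@@";
    d2rq:class bcodmo:Parameter;                  # hypothetical class name
    .
```

With both in place, the two d2rq:join conditions link each dataset resource, through the dataset_parameters table, to the resources generated by map:parameters.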


Customization is very direct in the case where
a class in the ontology is represented by a
table in the database.
Mapping is complicated or sometimes not
possible when a class in the ontology is not a
table in the database, but a record in a
database table.
 Define primary keys wherever possible and
create indexes.
 Indicate directions in d2rq:join conditions.
 Set d2r:autoReloadMapping (a D2R Server option) to false
whenever it is not needed.
 Use hint properties:
   d2rq:valueMaxLength
   d2rq:valueRegex
   d2rq:valueContains
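
The fragment below sketches how join directions and hint properties might be written in a mapping. It assumes that dataset_parameters is a link table whose columns are foreign keys into dataset and parameters, and that the arrow in a join condition points from the foreign key towards the primary key; the contact_email column, the vocab: namespace and the hint values are invented for illustration. Check the D2RQ documentation for the version in use before relying on this syntax.

```n3
@prefix d2rq:   <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix map:    <#> .
@prefix vocab:  <http://example.org/vocab/> .    # hypothetical namespace
@prefix bcodmo: <http://example.org/bcodmo#> .   # placeholder namespace

# Directed joins: "=>" and "<=" replace "=" (assumed: arrow points to the primary key).
map:hasParameter a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property bcodmo:hasParameter;
    d2rq:refersToClassMap map:parameters;
    d2rq:join "dataset.dataset_id <= dataset_parameters.dataset_id";
    d2rq:join "dataset_parameters.parameters_id => parameters.parameters_id";
    .

# Hints let the engine rule out values that can never match this bridge.
map:dataset_dataset_id a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property vocab:dataset_dataset_id;
    d2rq:column "dataset.dataset_id";
    d2rq:datatype xsd:int;
    d2rq:valueMaxLength "10";        # assumed upper bound on the value length
    d2rq:valueRegex "^[0-9]+$";      # values consist of digits only
    .

map:dataset_contact a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:dataset;
    d2rq:property vocab:contact_email;
    d2rq:column "dataset.contact_email";   # hypothetical column
    d2rq:valueContains "@";                # every value contains this substring
    .
```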
 D2RQ performs reasonably well with basic triple patterns, but
performance deteriorates when SPARQL features
such as OPTIONAL, FILTER and LIMIT are used.
 Does not have reasoning capability. Reasoning can
be added by using the D2RQ engine within Jena.
 Integration of multiple databases or other data
sources using D2RQ alone is not possible.
 Read-only: cannot perform INSERT, DELETE or
UPDATE operations.
 Cannot handle complicated database structures
such as views.

 Virtuoso RDF View:
   Uses a table-to-class and column-to-predicate
  approach.
   RDB data are represented as virtual RDF graphs.
   Customization of the mapping is possible.
 Triplify:
   Maps HTTP-URI requests to relational database
  queries expressed in SQL.
   No SPARQL support.
 R2O:
   XML-based declarative mapping language.
 DartGrid Semantic Web toolkit:
   Provides a visual tool to define the mapping.
 RDBToOnto:
   User-oriented tool that creates a static mapping (RDF
  dump).
 Asio Semantic Bridge for Relational Databases
(SBRD) and Automapper:
   Uses a table-to-class approach.
 Prof. Peter Fox
 Patrick West
 Eric Rozell
 Ankesh Khandelwal
 Evan Patton
