Part 1 - LSDIS
RDF languages and storages
part 1 - expressiveness
Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004
Outline
Comparison of RDF languages
RQL
Sesame implementation
SquishQL - basis for RDQL
Redland store
Sesame
Web-based architecture
Persistent RDF store
use of traditional DBMS
use of dedicated RDF triple storage
Database independent
Scalable architecture
Query engine that implements RQL
Sesame - architecture
Written in Java
Modules:
HTTP/SOAP handler
Admin module
Query module
Export module
Repository
Abstraction Layer
Use of PostgreSQL
Sesame - modules
Admin module
incrementally add RDF/RDFS
clearing repository
schema operations
recognise ‘type’, ‘subClassOf’, ‘subPropertyOf’
consistency checking
adding inferred facts to repository
RDF Export module
export RDF to standard XML-serialized format
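The schema operations of the Admin module (recognising 'type', 'subClassOf', 'subPropertyOf' and adding inferred facts to the repository) can be illustrated with a minimal sketch. This is not Sesame's actual code: the function name `infer`, the prefixed string constants and the naive fixpoint loop are assumptions chosen for illustration, and only three RDFS rules are covered.

```python
# Hypothetical constants standing in for the full RDF/RDFS URIs.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def infer(triples):
    """Return the input triples plus the facts entailed by three RDFS rules."""
    closed = set(triples)
    changed = True
    while changed:  # naive fixpoint: repeat until no new fact appears
        changed = False
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # Transitivity of subClassOf and subPropertyOf.
                if p in (SUBCLASS, SUBPROP) and p2 == p and o == s2:
                    new.add((s, p, o2))
                # An instance of a class is an instance of its superclasses.
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # A statement with a property also holds for its superproperties.
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
        if not new <= closed:
            closed |= new
            changed = True
    return closed


facts = infer({
    ("Dog", SUBCLASS, "Mammal"),
    ("Mammal", SUBCLASS, "Animal"),
    ("rex", RDF_TYPE, "Dog"),
    ("hasPet", SUBPROP, "hasCompanion"),
    ("ann", "hasPet", "rex"),
})
```

Materialising the inferred facts at upload time keeps queries simple, at the cost of extra work on every upload.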
Sesame - modules
Query module
query plan and optimizer similar to those of existing DB solutions
query is translated to a set of simple RAL calls
each leaf of the query plan can ‘evaluate itself’
and pull data from RAL
data are returned as streams
lack of optimization on storage level
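The idea that each leaf of the query plan evaluates itself, pulls data from the RAL and returns it as a stream can be sketched with Python generators. All class and method names here (`InMemoryRAL`, `get_statements`, `PatternLeaf`, `JoinOnSubject`) are hypothetical and do not reflect Sesame's Java API.

```python
class InMemoryRAL:
    """Hypothetical repository layer that answers only simple pattern lookups."""

    def __init__(self, triples):
        self._triples = list(triples)

    def get_statements(self, s=None, p=None, o=None):
        # None acts as a wildcard; matches are yielded lazily, as a stream.
        for t in self._triples:
            if (s is None or t[0] == s) and \
               (p is None or t[1] == p) and \
               (o is None or t[2] == o):
                yield t


class PatternLeaf:
    """Leaf of the query plan: evaluates itself by pulling from the RAL."""

    def __init__(self, ral, s=None, p=None, o=None):
        self.ral = ral
        self.pattern = (s, p, o)

    def evaluate(self):
        return self.ral.get_statements(*self.pattern)


class JoinOnSubject:
    """Inner node: joins two child streams on equal subjects."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        right_rows = list(self.right.evaluate())  # materialise one side
        for l in self.left.evaluate():
            for r in right_rows:
                if l[0] == r[0]:
                    yield (l, r)


ral = InMemoryRAL([
    ("ann", "rdf:type", "Person"),
    ("ann", "name", "Ann"),
    ("bob", "rdf:type", "Robot"),
    ("bob", "name", "Bob"),
])
plan = JoinOnSubject(
    PatternLeaf(ral, p="rdf:type", o="Person"),
    PatternLeaf(ral, p="name"),
)
results = list(plan.evaluate())
```

Because the join is computed above the storage layer from simple lookups, the store itself never sees the whole query, which is one way to read the "lack of optimization on storage level" point.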
Sesame - modules
RAL - Repository Abstraction Layer
makes Sesame storage independent
API supports RDF Schema semantics (e.g. subsumption reasoning)
can be stacked one on another
interface oriented toward persistent storage (DBMS, Object-Relational DB)
data returned as streams
can even use net-based RDF services (!)
Due to poor performance, a cache was implemented as one of the RALs
cache mainly for RDFS, as it needs code support in reasoning (subClassOf, ...)
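Stacking one RAL on another, with a cache as one of the layers, might look like the sketch below. It assumes a single hypothetical `get_statements` method shared by every layer; `SlowStore` and `CachingLayer` are illustrative names, not Sesame classes.

```python
class SlowStore:
    """Hypothetical bottom layer, e.g. one backed by a DBMS."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.lookups = 0  # counts how often the expensive layer is hit

    def get_statements(self, s=None, p=None, o=None):
        self.lookups += 1
        matches = [
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]
        return iter(matches)


class CachingLayer:
    """Layer with the same interface, stacked on top of any other layer."""

    def __init__(self, below):
        self.below = below
        self._cache = {}

    def get_statements(self, s=None, p=None, o=None):
        key = (s, p, o)
        if key not in self._cache:
            # Miss: ask the layer below once and remember the answer.
            self._cache[key] = list(self.below.get_statements(s, p, o))
        return iter(self._cache[key])


store = SlowStore([
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
])
ral = CachingLayer(store)
first = list(ral.get_statements(p="rdfs:subClassOf"))
second = list(ral.get_statements(p="rdfs:subClassOf"))
```

Schema lookups such as subClassOf are repeated constantly during reasoning, so they benefit most from this kind of layer. The sketch ignores cache invalidation on upload.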
Sesame - issues
Due to portability (RAL) cannot optimize for
underlying data storage
Incremental uploads (schema) are slow due to table rebuilding in PostgreSQL
Scaled up to 400,000 statements (RDF from
Wordnet)
very loosely connected graph
took 94 minutes (71 statements per second)
Slow upload of new data due to the many database operations required
Queries run slowly for the same reasons
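The quoted upload rate follows directly from the two reported figures, as this short check shows (the variable names are illustrative only):

```python
statements = 400_000      # RDF statements loaded (Wordnet)
minutes = 94              # reported upload time

rate = statements / (minutes * 60)  # statements per second
print(f"{rate:.1f} statements per second")  # prints "70.9 statements per second"
```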
Redland, Rasqal, Raptor
Storage for RDF triples - does not implement any query language by itself
This is the main module to include in an RDF manipulation system
Implemented in pure C for portability
Rich API makes it possible to build modules on top of it
Rasqal - RDF query module
RDQL
SPARQL
Raptor - a fast RDF parser
Redland
Triple: Subject - Predicate - Object
API enables retrieval of triples
Highly optimized for performance
Indexes
SP2O - get target
PO2S - get source
SO2P - get relations between nodes
P2SO - get nodes in relation
S2P - get relations for subject
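The five indexes can be pictured as hash maps keyed on the known parts of a triple. The sketch below is an in-memory illustration only, not Redland's C implementation; the class `TripleIndex` and its method names are assumptions.

```python
from collections import defaultdict


class TripleIndex:
    """Keeps five hash indexes over (subject, predicate, object) triples."""

    def __init__(self):
        self.sp2o = defaultdict(set)  # (s, p) -> targets
        self.po2s = defaultdict(set)  # (p, o) -> sources
        self.so2p = defaultdict(set)  # (s, o) -> relations between the nodes
        self.p2so = defaultdict(set)  # p -> (s, o) pairs in the relation
        self.s2p = defaultdict(set)   # s -> relations for the subject

    def add(self, s, p, o):
        # Every triple is written to all five indexes.
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)
        self.p2so[p].add((s, o))
        self.s2p[s].add(p)

    # Lookups use .get so that a miss does not create an empty entry.
    def targets(self, s, p):
        return self.sp2o.get((s, p), set())

    def sources(self, p, o):
        return self.po2s.get((p, o), set())

    def relations_between(self, s, o):
        return self.so2p.get((s, o), set())

    def nodes_in_relation(self, p):
        return self.p2so.get(p, set())

    def relations_for_subject(self, s):
        return self.s2p.get(s, set())


idx = TripleIndex()
idx.add("ann", "knows", "bob")
idx.add("ann", "worksWith", "bob")
idx.add("bob", "knows", "eve")
```

Each lookup is a single hash access, but every triple is stored five times, so fast retrieval is paid for in space.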
Redland - RDF Model stores
Memory based
memory: double-linked list, small models
hashes - memory: basic indexes on triples
hashes - bdb memory: native storage with BDB hashes, no persistence
Persistent
hashes with BDB: BDB hashes on disk; native storage, scales to low millions of tuples
3store: triplestore from AKT project, not well supported
mysql: uses MySQL DB
Redland - class diagram
Efficient implementation of triple in memory
use of pointers
URI value separated
Strict memory management - no leaks
Abstraction of model to support different storages
Fast parser / serializer
Redland
API for manipulating triples, URI/literals, graphs
API available in different languages
C, C#, Java, Perl, Python, PHP, Ruby, Tcl
Portable - can be built on most OSes
Scalable to handle millions of triples
when using persistent storage
but indexing is very space-consuming
Support for context and hierarchy of models
RDF languages and storages
part 2 - indexing semi-structured data
Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004