Part 2 - PPT
Download
Report
Transcript Part 2 - PPT
RDF languages and storages
part 2 - indexing semi-structure data
Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004
Outline
Jena storage
Indexing techniques
Jena
Implemented in Java
One of the most popularly used RDF
storages and query engines
Supports RDF, RDFS and OWL
In memory and persistent storage
(Oracle, MySQL, PostgreSQL)
RDQL
Reasoning/inference engine
Jena - storage schema
Previous version used normalized relational
DB tables
statements
literals
resources
Taken approach to store triples as (Subject,
Predicate, Object) in denormalized tables
Optimization for common statement patterns
- grouping of properties
Jena - storage
Normalized tables
Denormalized
„Efficient RDF Storage and Retrieval in Jena2” - Wilkinson et al.
Jena - storage
Do certain trade-off
for space and search
time
Cluster properties
that are likely to be
accessed together optimize for
common patterns
Special treatment of
reified statements
Jena - graph abstraction
Graph interface is
separated from
(persistent) triple
storage layer
Special support for
different types of
graphs - optimized for
performance
Support operations like
add, delete, find.
Jena - query processing
Converting multiple patterns in query into one
query to DB
Use DB query optimizer instead of executing
multiple queries from Jena level (as it was in
Jena1)
Associate a table with pattern (best) or span
pattern between tables (requires join operation)
Query may span between different graphs, but it
can be optimized only if they are in the same
database
What to index? How to index?
Indexing semistructured data
XML cannot be indexed directly as
relational DB
Indexing may take advantage of tree
structure
depth of node
common path from the root
convert each path to string expression
precalculate the path tree
Indexing semistructured data
Idea is based on
Particia’s trie
Index should scale
with the growth of
data
Path together with leaf
is encoded into string
-> the Index Fabric
„A Fast Index for Semistructured Data” - Brian F. Cooper et al.
A Layered Index
„A Fast Index for Semistructured Data” - Brian F. Cooper et al.
Index Fabric
Index is used to accelerate path expressions - mainly
for queries that ask for root-to-leaf path
Idea of prefix encoding
xml: <A>alpha<B>beta<C>gamma</C></B></A>
paths: <A>alpha ; <A><B>beta ; <A><B><C>gamma
encoded: A alpha ; A B beta ; A B C gamma
infix (not common): A alpha B beta C gamma
Convert path to string for fast searches
Replace tags with ‘non-terminal’ characters (like in
automata)
Index Fabric - raw paths
„A Fast Index for Semistructured Data” - Brian F. Cooper et al.
Graphs - how to index?
Backbone
http://www.aisee.com/
Graphs - how to index?
Tree-type
- prefixes
- tries
http://www.aisee.com/
Graphs - how to index?
T-index
Path templates
2-index
1-index
„Index Structure for Path Expressions” - Tova Milo, Dan Suciu
Graphs - how to index?
Landmarks
http://www.aisee.com/
Indexing - summary
Indexing semistructure data
index fabric - encoding, multilayered
common prefixes - trie structure
backbone - highways between points
landmarks - county division
path templates - precalculated expressions
clustering - grouping by theme access
Indexing such data is NOT easy, solution
depends how you want to search the graph
References
„Efficient RDF Storage and Retrieval in Jena2”
- Kevin Wilkinson, Craig Sayers, Harumi Kuno,
Dave Reynolds
„A Fast Index for Semistructured Data” Brian F. Cooper, Neal Sample, Michael J.
Franklin, Gisli Hjaltason, Moshe Shadmon
„Index Structures for Path Expressions” Tova Milo, Dan Suciu