Efficient RDF Storage
and Retrieval in Jena2
Written by: Kevin Wilkinson, Craig Sayers,
Harumi Kuno, Dave Reynolds
Presented by: Umer Fareed 파리드
Outline
Introduction
Overview of Jena
Overview of RDF
Storage Schema for Jena1 and Jena2
Jena2 Architecture
Jena2 Query Processing
Miscellaneous Topics
Related and Future Work
Conclusion
Introduction
Jena is a Semantic Web programmers' toolkit
Open-source project grown out of HP Labs
Semantic Web Programme
Offers a simple abstraction of the RDF graph as
its central internal interface
Supports a number of database engines (e.g.,
Postgresql, MySQL, Oracle)
A flexible architecture that facilitates porting to
new SQL database engines
Introduction
Facilitates experimentation with different
database layouts.
Jena2 : Second generation of Jena
New internal architecture and capabilities
Minimizes changes in API
Maintains persistent storage
Addresses performance and scaling issues in
Jena1
Overview of Jena
Jena1 provided a rich API for manipulating RDF
graphs
User can choose to store RDF graphs in
memory or in databases
In Jena2, architecture was modified to achieve
two goals:
Provide a simple minimalist view of the RDF graph
Allow easy access to, and manipulation of, data in
graphs enabling the data to be exposed as triples
Overview of Jena
Jena2 Architectural Overview
Overview of Jena
At an abstract level, Jena2 storage implements three
operations:
add statement, to store an RDF statement in a
database;
delete statement, to remove an RDF statement from the
database;
find operation, to retrieve all statements that match
a pattern of the form <S,P,O> where each S, P, O is
either a constant or a don’t-care
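The three operations can be sketched as a tiny in-memory store. This is only an illustration of the add, delete and find contract described on this slide, not Jena's implementation; the class name TripleStore, the use of None as the don't-care, and the example URIs are all assumptions made for the sketch.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory store exposing only add, delete and find."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def delete(self, s: str, p: str, o: str) -> None:
        # discard() is a no-op when the statement is not present
        self._triples.discard((s, p, o))

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]:
        # None plays the role of the don't-care in an <S,P,O> pattern.
        for t in self._triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), t)):
                yield t


store = TripleStore()
store.add("ex:doc1", "dc:title", "Jena2")
store.add("ex:doc1", "dc:creator", "ex:kevin")
store.add("ex:doc2", "dc:creator", "ex:kevin")

by_kevin = sorted(store.find(p="dc:creator", o="ex:kevin"))
store.delete("ex:doc2", "dc:creator", "ex:kevin")
remaining = sorted(store.find(p="dc:creator"))
```

Every richer query form can, in principle, be built on top of these three calls, which is why the persistence layer only has to implement them.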
Overview of RDF
RDF is a W3C standard
Means of expressing and exchanging semantic
metadata
RDF was originally designed for the
representation and processing of metadata
about remote information sources
Provides a simple tuple model,
<Subject,Property,Object>, to express all
knowledge
Overview of RDF
Provides some predefined basic properties
such as type, class, subclass, etc.
RDF permits resources to be associated with
arbitrary properties
Statements associating a resource with new
properties and values may be added to an RDF
fact base at any time.
Requires an efficient and flexible mapping to provide
persistent storage
Storage Schema for Jena1 and Jena2
Storing Arbitrary RDF Statements in Jena1
Jena1 used two different database schemas:
1. Relational Databases
2. Berkeley Database
For relational databases, the schema consisted of
a statement table, a literals table and a resources
table
For Berkeley DB, all parts of a statement were
stored in a single row
Storage Schema for Jena1 and Jena2
In the Berkeley DB schema, each statement was
stored three times: once indexed by subject,
once by predicate and once by object
Berkeley DB schema used a single
access method to store statements
Jena graphs stored using Berkeley DB
were observed to be faster than graphs
stored in relational databases
Storage Schema for Jena1 and Jena2
Jena1 Schema (Normalized)
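A normalized layout of this kind can be sketched with SQLite. The table and column names below (resources, literals, statements, object_uri, object_lit) are assumptions for illustration and are not taken from the Jena1 source; the point is only that each URI or literal is stored once and the statement table holds references to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    CREATE TABLE literals  (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE statements (
        subject    INTEGER NOT NULL REFERENCES resources(id),
        predicate  INTEGER NOT NULL REFERENCES resources(id),
        object_uri INTEGER REFERENCES resources(id),
        object_lit INTEGER REFERENCES literals(id)
    );
""")


def intern(table, column, value):
    """Return the id for value, inserting it once if it is new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()[0]


def add(s, p, o, literal=False):
    s_id = intern("resources", "uri", s)
    p_id = intern("resources", "uri", p)
    if literal:
        conn.execute("INSERT INTO statements VALUES (?, ?, NULL, ?)",
                     (s_id, p_id, intern("literals", "value", o)))
    else:
        conn.execute("INSERT INTO statements VALUES (?, ?, ?, NULL)",
                     (s_id, p_id, intern("resources", "uri", o)))


add("ex:doc1", "dc:title", "Jena2", literal=True)
add("ex:doc2", "dc:title", "Jena2", literal=True)
add("ex:doc1", "dc:creator", "ex:kevin")

# Even a simple find by predicate needs joins to turn ids back into text.
titles = conn.execute("""
    SELECT s.uri, l.value
    FROM statements st
    JOIN resources s ON s.id = st.subject
    JOIN resources p ON p.id = st.predicate
    JOIN literals  l ON l.id = st.object_lit
    WHERE p.uri = 'dc:title'
    ORDER BY s.uri
""").fetchall()
literal_rows = conn.execute("SELECT COUNT(*) FROM literals").fetchone()[0]
```

The literal "Jena2" occupies a single row in the literals table although two statements use it, which saves space, but reading a statement back requires a three-way join.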
Storage Schema for Jena1 and Jena2
Storing Arbitrary RDF Statements in Jena2
Jena2 schema trades off space for time
Uses a denormalized schema in which resource URIs
and simple literal values are stored directly in the
statement table
A separate literals table is only used to store long
literal values (those whose length exceeds a threshold)
A separate resources table is used to store long URIs
Many find operations without a join are possible by
storing values directly in the statement table
Storage Schema for Jena1 and Jena2
Jena2 Schema (Denormalized)
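The denormalized idea can be sketched as follows. The threshold MAX_INLINE, the "v:" and "r:" prefixes that mark inline values and references, and all table names are assumptions made for this illustration; they are not Jena2's actual encoding.

```python
import sqlite3

MAX_INLINE = 32  # hypothetical length threshold chosen for this sketch

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT);
    CREATE TABLE long_literals (id INTEGER PRIMARY KEY, value TEXT);
""")


def encode_object(value):
    """Store short values inline; move long ones to the side table."""
    if len(value) <= MAX_INLINE:
        return "v:" + value
    cur = conn.execute("INSERT INTO long_literals (value) VALUES (?)", (value,))
    return f"r:{cur.lastrowid}"


def decode_object(encoded):
    """Resolve an encoded object back to its text value."""
    kind, _, rest = encoded.partition(":")
    if kind == "v":
        return rest
    return conn.execute("SELECT value FROM long_literals WHERE id = ?",
                        (int(rest),)).fetchone()[0]


def add(s, p, o):
    conn.execute("INSERT INTO statements VALUES (?, ?, ?)",
                 (s, p, encode_object(o)))


def find_object(s, p):
    """Single-table lookup: no join is needed to locate the statement."""
    return conn.execute(
        "SELECT object FROM statements WHERE subject = ? AND predicate = ?",
        (s, p)).fetchone()[0]


abstract = "x" * 100
add("ex:doc1", "dc:title", "Jena2")
add("ex:doc1", "dc:description", abstract)

title_encoded = find_object("ex:doc1", "dc:title")
description_encoded = find_object("ex:doc1", "dc:description")
```

Short values come straight out of the statement table; only the long description needs a second lookup through decode_object.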
Storage Schema for Jena1 and Jena2
A denormalized schema uses more database space
because the same value (literal or URI) is stored
repeatedly
Jena1 and Jena2 permit multiple graphs to be stored in a
single database instance
Jena2 supports the use of multiple statement tables in a
single database so that applications can flexibly map
graphs to different tables
Use of multiple statement tables may improve
performance through better locality and caching
Jena2 Architecture
Jena2 Persistent Architecture is
implemented using
Specialized Graph Interface
Persistence layer presents a Graph interface to the
higher levels of Jena supporting the usual Graph
operations of add, delete and find
Each logical graph is implemented using an ordered list
of specialized graphs
An operation on the entire logical graph, such as add,
delete or find, is processed by invoking add, delete, find
on each specialized graph
Jena2 Architecture
Results of the individual operations are combined and
returned as the result for the entire graph
As an optimization, a specialized graph can report that
it has completely processed an operation for the entire
graph, so the remaining specialized graphs are skipped
Each specialized graph maps the graph operations onto
appropriate tables in the database
Many-to-one mapping between specialized graphs and
database tables
Jena2 Architecture
Graphs Comprise Specialized Graphs Over Tables
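The dispatch over an ordered list of specialized graphs might look roughly like the sketch below. The class names, the accepts callback and the convention that add returns True when the operation is completely processed are assumptions for illustration, not Jena2's interfaces.

```python
class SpecializedGraph:
    """Hypothetical handler that stores only the triples it accepts."""

    def __init__(self, accepts):
        self.accepts = accepts      # callable deciding which triples belong here
        self.triples = set()

    def add(self, triple):
        """Return True if this graph completely processed the add."""
        if self.accepts(triple):
            self.triples.add(triple)
            return True
        return False

    def find(self, pattern):
        # None in the pattern is a don't-care
        return [t for t in sorted(self.triples)
                if all(w is None or w == g for w, g in zip(pattern, t))]


class LogicalGraph:
    def __init__(self, specialized):
        self.specialized = list(specialized)   # order matters

    def add(self, triple):
        for g in self.specialized:
            if g.add(triple):       # completely processed: skip the rest
                return
        raise ValueError("no specialized graph accepted the triple")

    def find(self, pattern):
        results = []
        for g in self.specialized:  # combine the individual results
            results.extend(g.find(pattern))
        return results


titles = SpecializedGraph(lambda t: t[1] == "dc:title")
generic = SpecializedGraph(lambda t: True)
graph = LogicalGraph([titles, generic])
graph.add(("ex:doc1", "dc:title", "Jena2"))
graph.add(("ex:doc1", "dc:creator", "ex:kevin"))
all_doc1 = graph.find(("ex:doc1", None, None))
```

Placing the generic graph last makes it the catch-all: a triple reaches it only when no earlier, more specific graph has claimed it.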
Jena2 Architecture
Database Driver
The driver is responsible for data definition operations
such as database initialization, table creation and
deletion, allocating database identifiers
Responsible for mapping graph objects between their
Java representation and their database encoding.
Uses a combination of static and dynamically generated
SQL for data manipulation
Maintains a cache of prepared SQL statements to
reduce the overhead of query compilation
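One way to picture dynamically generated SQL with caching is to build the query text once per pattern shape and reuse it. The sketch below caches SQL strings with functools.lru_cache rather than driver-level prepared statements, so it is an analogy to the driver's cache, not a reproduction of it; the function and table names are hypothetical.

```python
import sqlite3
from functools import lru_cache

COLUMNS = ("subject", "predicate", "object")


@lru_cache(maxsize=None)
def find_sql(bound):
    """Build the SELECT text for one pattern shape, e.g. (False, True, True)."""
    where = " AND ".join(f"{c} = ?" for c, b in zip(COLUMNS, bound) if b)
    sql = "SELECT subject, predicate, object FROM statements"
    return f"{sql} WHERE {where}" if where else sql


def find(conn, s=None, p=None, o=None):
    pattern = (s, p, o)
    bound = tuple(v is not None for v in pattern)   # which positions are constants
    params = [v for v in pattern if v is not None]
    return conn.execute(find_sql(bound), params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:title", "Jena2"),
    ("ex:doc2", "dc:title", "RDF"),
])
first = find(conn, p="dc:title", o="Jena2")
second = find(conn, p="dc:title", o="RDF")
cache_stats = find_sql.cache_info()
```

The two calls differ only in their parameter values, so the second one reuses the text generated for the first.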
Jena2 Architecture
Configuration and Meta-Graphs
Configuration parameters are specified as RDF
statements.
A meta-graph, a separate, auxiliary RDF graph
containing metadata about each logical graph is
associated with each Jena2 persistent store
Meta-graph may be queried just as any other Jena graph
but, unlike other graphs, it may not be modified and it
does not support reification.
Meta-graph may also specify additional property,
property-class tables and indexes
Jena2 Query Processing
Two forms of Jena Querying:
Find Processing
RDQL Processing
In find querying, the find operation returns all statements
satisfying a pattern.
In Jena1, a find pattern is evaluated with a single SQL
select query over the statement table.
For pattern evaluation in Jena2, the pattern is passed to
each specialized graph handler. The results are
concatenated and returned to the application
Jena2 Query Processing
An RDQL query in Jena1 is converted into a
pipeline of find patterns connected by join
variables
Query is evaluated in a nested-loops fashion: the
result of a find operation over one pattern binds
values to the join variables, which generates the
patterns for new find operations
Goal of Jena2 query processing is to convert
multiple triple patterns into a single query for
evaluation by the database engine
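The difference between the two strategies can be sketched on a single statement table. The two-pattern query, the table layout and the function names are assumptions for illustration; the sketch only contrasts issuing one find per binding with handing the database a single self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:creator", "ex:kevin"),
    ("ex:doc2", "dc:creator", "ex:craig"),
    ("ex:kevin", "foaf:name", "Kevin"),
    ("ex:craig", "foaf:name", "Craig"),
])

# Query: (?doc dc:creator ?who) (?who foaf:name ?name)


def nested_loops():
    """One inner find per binding of ?who: N + 1 queries in total."""
    out = []
    outer = conn.execute(
        "SELECT subject, object FROM statements WHERE predicate = ?",
        ("dc:creator",)).fetchall()
    for doc, who in outer:
        inner = conn.execute(
            "SELECT object FROM statements WHERE subject = ? AND predicate = ?",
            (who, "foaf:name"))
        out.extend((doc, name) for (name,) in inner)
    return sorted(out)


def single_query():
    """Both patterns as one self-join evaluated by the database engine."""
    return conn.execute("""
        SELECT a.subject, b.object
        FROM statements a JOIN statements b ON b.subject = a.object
        WHERE a.predicate = 'dc:creator' AND b.predicate = 'foaf:name'
        ORDER BY a.subject, b.object
    """).fetchall()
```

Both functions return the same rows; the single query lets the database choose the join strategy and avoids a round trip per binding.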
Miscellaneous Topics
Jena2 Performance Toolkit
Explore various layout options and understand
performance trade-offs
Jena Transaction Management
The underlying database needs to support
transactions
Bulk Load
Significant reduction in the time to load
persistent graphs
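A common way to combine transactions with bulk loading is to insert a whole batch inside one transaction, so that it either loads completely or not at all. The sketch below shows that pattern with SQLite; it is an assumption about how such a load could be written, not a description of Jena2's bulk loader, and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE statements (
    subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL)""")
conn.commit()


def bulk_load(triples):
    """Insert all triples in one transaction; all or nothing."""
    try:
        with conn:   # commits on success, rolls back on exception
            conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", triples)
        return True
    except sqlite3.IntegrityError:
        return False


def count():
    return conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]


ok = bulk_load([(f"ex:doc{i}", "dc:title", f"Title {i}") for i in range(1000)])
after_good = count()

# The second row violates NOT NULL, so the whole batch is rolled back.
bad = bulk_load([("ex:a", "dc:title", "A"), ("ex:b", "dc:title", None)])
after_bad = count()
```

The failed batch leaves the table exactly as it was, which is the guarantee that depends on the underlying database supporting transactions.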
Related Work
Jena2 schema design
Supports a denormalized schema used for
storing generic triple statements as well as
Property tables to store subject-value pairs
related by arbitrarily specified properties
Provides an efficient implementation for
reification
Most systems support only a fixed set of
underlying tables that implement a (non-schema-specific) generic store
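A property table alongside a generic statement table can be sketched as follows. The table name prop_dc_title, the PROPERTY_TABLES mapping and the routing functions are hypothetical; the sketch only shows that a property table holds subject-value pairs and leaves the property itself implicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT);
    -- Property table for dc:title: the predicate is implied by the table.
    CREATE TABLE prop_dc_title (subject TEXT, value TEXT);
""")

PROPERTY_TABLES = {"dc:title": "prop_dc_title"}   # hypothetical mapping


def add(s, p, o):
    """Route a statement to its property table, else to the generic table."""
    table = PROPERTY_TABLES.get(p)
    if table:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (s, o))
    else:
        conn.execute("INSERT INTO statements VALUES (?, ?, ?)", (s, p, o))


def find_by_predicate(p):
    """Return full triples for predicate p from whichever table holds them."""
    table = PROPERTY_TABLES.get(p)
    if table:
        rows = conn.execute(
            f"SELECT subject, value FROM {table} ORDER BY subject").fetchall()
    else:
        rows = conn.execute(
            "SELECT subject, object FROM statements WHERE predicate = ? "
            "ORDER BY subject", (p,)).fetchall()
    return [(s, p, o) for s, o in rows]


add("ex:doc1", "dc:title", "Jena2")
add("ex:doc2", "dc:title", "RDF")
add("ex:doc1", "dc:creator", "ex:kevin")
```

Callers still see ordinary triples; only the storage layer knows that dc:title statements live in their own, narrower table.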
Future Work
Performance measurements indicate that the
denormalized schema of Jena2 is twice as fast
as the normalized schema of Jena1 for many
operations
Jena2 algorithm is a modest improvement over
the Jena1 nested-loops approach to RDQL query
processing
An important enhancement in Jena2 for typed
literals will be to store them as native SQL types
rather than as strings.
Support for OWL and reasoning in Jena2.
Conclusion
Jena2 supports application-specific schema
Retains the flexibility to store arbitrary graphs
Use of property-class tables beneficial for query
languages that expose higher-level abstractions
to applications
More work needed on efficient algorithms for query
processing and optimization