High-level Data Access Based on Query
Rewritings
Ekaterina Stepalina
Higher School of Economics
High-Level Data Access
• Concentration on application domain tasks
• Abstraction from data sources
• Efficient work
Research
• This problem is actively discussed at recent conferences on knowledge
representation and ontologies: OWLED (2009), (ICDE IIMAS, 2008), and in the Semantic
Web journal (2011, the Mastro system)
• W3C developed OWL 2, OWL 2 QL (2008), etc.
Ontology-Based Data Access (OBDA)
• Large amounts of data (distributed, inconsistent)
• Main task – query answering (domain-oriented
and efficient)
What is Ontology?
• An ontology is a description of a knowledge domain in some knowledge
representation language.
• Entity-Relationship and UML Class diagrams can be seen as ontology
languages.
Logic-Based Knowledge
Representation
• Enables semantic processing of data
• Enables inference of implicit knowledge
• Well studied and actively developed
– Description logics (Baader, 1999), esp. DL-Lite
• Standardized
– OWL 2 Profiles
DL-Lite Best Suits OBDA
• Highly expressive and computationally efficient
• Allows delegating query answering to DBMSs
and using all advantages of modern relational
technologies
• Supported by the W3C standard - OWL 2 QL
Query Answering Problem
• Given a query $q(\vec{x})$ and an n-tuple of objects $\vec{a}$ from A, decide
whether $K \models q(\vec{a})$, i.e. whether the n-tuple is an answer to $q$
with respect to K.
For knowledge represented in DL-Lite, we
can formulate queries in terms of domain concepts,
translate them into ordinary SQL queries and
execute them over separate databases.
OBDA System Architecture
• Ontology Editor
• OBDA-Enabled Reasoner
• Mapping Processor
• Data Source Manager
• Consistency Checker
Query Rewritings
• The OBDA-Enabled Reasoner rewrites the initial
ontology query into a UCQ (union of
conjunctive queries).
• The Mapping Processor builds an SQL query from the UCQ
and the given mappings.
• The syntax of the initial query may differ (SPARQL,
datalog query, etc.)
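The Mapping Processor step can be sketched roughly as follows. This is a minimal illustration, not the implementation of any particular OBDA system: the `MAPPINGS` table, the predicate names (`livesIn`, `manages`) and the helper functions `cq_to_sql` and `ucq_to_sql` are all hypothetical, and each ontology predicate is assumed to map to exactly one table with one column per argument.

```python
from typing import Dict, List, Tuple

# An atom is (predicate, [variable names]); constants are not handled in this sketch.
Atom = Tuple[str, List[str]]

# Hypothetical mappings: ontology predicate -> (table, column for each argument).
MAPPINGS: Dict[str, Tuple[str, List[str]]] = {
    "Person": ("Person", ["name"]),
    "livesIn": ("Lives", ["person", "city"]),
    "manages": ("Manages", ["boss", "employee"]),
}


def cq_to_sql(answer_vars: List[str], atoms: List[Atom]) -> str:
    """Translate one conjunctive query into a SELECT-FROM-WHERE statement."""
    from_items: List[str] = []
    first_seen: Dict[str, str] = {}  # variable -> first column that binds it
    conditions: List[str] = []
    for i, (pred, terms) in enumerate(atoms):
        table, columns = MAPPINGS[pred]
        alias = f"t{i}"
        from_items.append(f"{table} {alias}")
        for var, col in zip(terms, columns):
            ref = f"{alias}.{col}"
            if var in first_seen:
                # A repeated variable becomes an equality (join) condition.
                conditions.append(f"{first_seen[var]} = {ref}")
            else:
                first_seen[var] = ref
    select = ", ".join(first_seen[v] for v in answer_vars)
    sql = f"SELECT DISTINCT {select} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


def ucq_to_sql(answer_vars: List[str], cqs: List[List[Atom]]) -> str:
    """A UCQ becomes the UNION of the SQL statements of its conjunctive queries."""
    return " UNION ".join(cq_to_sql(answer_vars, cq) for cq in cqs)


same_city = [("livesIn", ["x", "c"]), ("manages", ["b", "x"]), ("livesIn", ["b", "c"])]
print(cq_to_sql(["x"], same_city))
```

Shared variables turn into join conditions, and each conjunctive query of the UCQ contributes one branch of the final `UNION`.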
TBox and ABox in DL
• TBox is a finite set of concept and role
inclusion axioms:
• ABox is a finite set of assertions:
• Where - object’s name, A – concept name, P
– role name, q – integer.
Interpretation
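• An interpretation (a particular instance of the KB)
is a pair $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ of a non-empty domain
$\Delta^{\mathcal{I}}$ and an interpretation function $\cdot^{\mathcal{I}}$ such that,
for every object name $a$, concept name $A$ and role name $P$:
$a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$,
$A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, and
$P^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$.
• UNA (unique name assumption): $a^{\mathcal{I}} \neq b^{\mathcal{I}}$ for distinct object names $a$ and $b$.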
OWL 2 QL
• UNA is not adopted; (in)equality must be stated
explicitly
• Language expressive power reduced up to
(other designation -
).
• Basic conceptual modeling relations are
available: (A)sym, (Ir)Ref, Tran
• Main constraints of
:
– Functional relations cannot be defined
– Roles cannot be restricted to specific concepts; every role
applies to all concepts
– Disjunctive covering of the knowledge domain cannot be expressed
Query Rewriting Sample
• RDB tables: Person(name, age), Lives (person, city), Manages (boss,
employee).
• Query: Get the names and ages of all people who live in the same city as
their boss.
• UCQ:
• Simplified UCQ:
SQL query:
SELECT P.name, P.age
FROM Person P, Manages M, Lives L1, Lives L2
WHERE P.name=L1.person AND P.name=M.employee AND
M.boss=L2.person AND L1.city=L2.city
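To see the SQL query above in action, here is a small sketch that runs it on an in-memory SQLite database. The rows are made-up sample data (the names, ages and cities are assumptions for illustration only); the table layout follows the three RDB tables listed on this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, age INTEGER);
CREATE TABLE Lives (person TEXT, city TEXT);
CREATE TABLE Manages (boss TEXT, employee TEXT);
-- Hypothetical sample data: Ann manages Bob and Eve.
INSERT INTO Person VALUES ('Ann', 45), ('Bob', 30), ('Eve', 28);
INSERT INTO Lives VALUES ('Ann', 'Moscow'), ('Bob', 'Moscow'), ('Eve', 'Kazan');
INSERT INTO Manages VALUES ('Ann', 'Bob'), ('Ann', 'Eve');
""")

# The SQL query from the slide, unchanged apart from whitespace.
rows = conn.execute("""
SELECT P.name, P.age
FROM Person P, Manages M, Lives L1, Lives L2
WHERE P.name = L1.person AND P.name = M.employee AND
      M.boss = L2.person AND L1.city = L2.city
""").fetchall()

print(rows)  # [('Bob', 30)]
```

Only Bob is returned: Eve lives in a different city from her boss, and Ann has no boss in this data.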
Query Rewriting Algorithms
• CGLLR (Calvanese et al., 2007)
- Applies all suitable TBox axioms to q(x)
- Replaces each axiom containing an existential
quantification with three other axioms, which
increases the size of the UCQ
• RQR (Pérez-Urbina, Horrocks, Motik, 2009)
- Generates clauses from TBox assertions and
then resolves them with the query
- Potentially supports more expressive DLs
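The general idea of expanding a query with TBox axioms can be illustrated with a deliberately simplified sketch. It is neither CGLLR nor RQR: it handles only atomic inclusions between predicates of the same arity (no existential quantification, no unification or reduction steps), and the `TBOX` contents and the `rewrite` function are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)
CQ = FrozenSet[Atom]                # a conjunctive query as a set of atoms

# Hypothetical TBox: each pair (sub, sup) stands for the inclusion sub ⊑ sup.
TBOX: List[Tuple[str, str]] = [
    ("Professor", "Teacher"),
    ("Teacher", "Person"),
]


def rewrite(query: CQ, tbox: List[Tuple[str, str]]) -> Set[CQ]:
    """Saturate the query: replace an atom sup(args) by sub(args) whenever sub ⊑ sup."""
    result: Set[CQ] = {query}
    frontier = [query]
    while frontier:
        cq = frontier.pop()
        for atom in cq:
            pred, args = atom
            for sub, sup in tbox:
                if sup == pred:
                    new_cq = frozenset((cq - {atom}) | {(sub, args)})
                    if new_cq not in result:
                        result.add(new_cq)
                        frontier.append(new_cq)
    return result


# Rewriting Person(x) yields the UCQ Person(x) OR Teacher(x) OR Professor(x).
ucq = rewrite(frozenset({("Person", ("x",))}), TBOX)
for cq in sorted(ucq, key=sorted):
    print(sorted(cq))
```

Each member of the resulting set is one conjunctive query of the UCQ; evaluating their union over the data returns the answers implied by the inclusions without materializing them.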
Query Rewriting Benchmark
• 9 ontologies with axioms containing existential quantification:
– Vicodi (V)
– Stock exchange (S)
– University (U,UX)
– Adolena (A,AX)
– Synthetic (P1, P5,P5X)
Comparison Results
• RQR is preferable to CGLLR for implementation in
OBDA-enabled reasoners:
– Generates smaller UCQs, especially for ontologies with
a large number of existential quantifications
– May be further optimized and advanced to more
expressive DLs, than
[Charts: Produced Rewritings; Running Time, ms]
Current Work
• Preparing an ontology for a real application –
interactive television platform (IPTV) for
testing algorithms on real data
• Optimizing RQR – reducing the number of
generated clauses
• Main idea: rather than extending RQR, support
greater expressiveness and all OWL 2 QL
constructors through more powerful mappings
References
• F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider,
editors. The Description Logic Handbook: Theory, Implementation and
Applications. Cambridge University Press, 2002. ISBN 0521781760.
• F. Baader. Logic-Based Knowledge Representation. In M. J.
Wooldridge and M. Veloso, editors, Artificial Intelligence Today,
Recent Trends and Developments, number 1600 in Lecture Notes in
Computer Science, pages 13–41. Springer Verlag, 1999.
• A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev.
The DL-Lite family and relations. Journal of Artificial
Intelligence Research 36 (1), pages 1–69, 2009. ISSN 1076-9757.
• H. Pérez-Urbina, I. Horrocks, and B. Motik. Efficient Query Answering
for OWL 2. In Proceedings of the 8th International Semantic Web
Conference (ISWC 2009), Chantilly, Virginia, USA, 2009.
• H. Pérez-Urbina, B. Motik, and I. Horrocks. Tractable Query
Answering and Rewriting under Description Logic Constraints.
Journal of Applied Logic, 2009.
Questions?