No Slide Title

Download Report

Transcript No Slide Title

On Propagation of Deletions and
Annotations through Views
Wang-Chiew Tan
University of Pennsylvania
Database Group
Joint work with
Peter Buneman and Sanjeev Khanna
Data Annotations (share annotations)
• Knowledge sharing through “annotations”
• Annotations on data at various levels of granularity, annotations
on annotations
• Improve accuracy of data
– data and annotations can be reviewed by independent parties
• Annotations:
– loosely structured
• Source Data:
– proprietary
– fixed schema
• A system that overlays annotations on existing data
• “big business” in scientific databases
Wang-Chiew Tan, Penn Database Group
2
Data Annotations (share annotations)
Serves fine French Cuisine
in elegant setting. Jackets
required.
NYRestaurants (Source Table)
Cost
Restaurant
Peacock Alley
Bull & Bear
Pacifica
Soho Kitchen & Bar
Extensive wine list!
Type
Zip
$$$
$$$
French 10022
Seafood 10022
$
$
Chinese 10013
American 10022
Yummy chicken curry!!
Cheap Restaurants (View 2)
All Restaurants (View 1)
Restaurant
Peacock Alley
Bull & Bear
Pacifica
Soho Kitchen & Bar
Cost
Type
$$$
$$$
French
Seafood
$
$
Chinese
American
Restaurant
Pacifica
Soho Kitchen & Bar
Cost
$
$
Type
Chinese
American
Wang-Chiew Tan, Penn Database Group
3
Data Annotations
• Communicate “meta data” through annotations
– “bounce” or “spread” annotations around by piggybacking
annotations on data items in the source-query-view model.
Model:
Source:
Relational
Database
Query
• An annotation is placed in the view
– where do we place the annotation on source?
View : result of
query applied
on source
Not an easy
problem!
• Annotation placement problem presented in relational setting
– results carry over to fragments of XML (hierarchical model)
Wang-Chiew Tan, Penn Database Group
4
Location and Propagation Rules
• A location is a triple: (R, t, A)
relation name
tuple in R
A is an attribute in schema of R
• Propagation Rules:
– Select: R
– Project: R
– Join:
– Union:
A1
A2
A3
A1
A2
A3
A1
A2
A2
R2
A2
A3
A1
A2
A3
A1
A2
A3
A3
A3
R2
R1
R1
A1
A1
A2
A3
A1
A2
A3
Wang-Chiew Tan, Penn Database Group
5
Annotation Placement Problem
Q
V=Q(S)
S
• Annotation Placement Problem:
– Given a view V = Q(S) and an annotation A placed in the view V,
decide if there is an annotation in the source that when propagated
to the view, produces no other annotation except A.
• Q = query
• S = data source
– “side-effect-free annotation” : an annotation on the source that
produces no other annotation except A in the view
Wang-Chiew Tan, Penn Database Group
6
A Dichotomy Theorem
Q
V=Q(S)
S
Theorem:
(a) It is NP-hard to decide if there is a side-effect-free
annotation for a PJ query.
(b) There is a polynomial time algorithm for queries which
do not simultaneously contain a Project and a Join
operation.
Wang-Chiew Tan, Penn Database Group
7
Project and Join Query
• Intuition: PJ can encode 3SAT
(x1 + x2 + x3)
...
C1
...
Assignment
tuples:
All possible
satisfying
assignments
for C1
Dummy tuple
T - true
F - false
( x3 + x5 + x2)
Cm
x1
x2
x3 C1
x3
x5
x2 Cm
F
F
F
F
T
T
T
d
F
F
T
T
F
T
T
d
F
T
F
T
F
F
T
d
F
F
F
T
T
T
T
d
F
T
T
F
F
T
T
d
T
F
T
F
T
F
T
d
C1
C1
C1
C1
C1
C1
C1
C1
...
Cm
Cm
Cm
Cm
Cm
Cm
Cm
Cm
Assignment
tuples:
All possible
satisfying
assignments
for Cm
Dummy tuple
Query Output
Query:Join, then Project on C1 … Cm C1
...
Cm
Wang-Chiew Tan, Penn Database Group
8
Project and Join Query
• Intuition: PJ can encode 3SAT
T - true
F - false
(x1 + x2 + x3) … ( x3 + x5 + x2)
C1
Assignment
tuples:
All possible
satisfying
assignments
for C1
Dummy tuple
Cm
x1
x2
x3 C1
x3
x5
x2 Cm
F
F
F
F
T
T
T
d
F
F
T
T
F
T
T
d
F
T
F
T
F
F
T
d
F
F
F
T
T
T
T
d
d
F
T
T
F
F
T
T
d
d
T
F
T
F
T
F
T
d
d
C1
C1
C1
C1
C1
C1
C1
C1
...
Output
Query:
Join, then Project on C1 … Cm
C1
C1
...
...
Cm
Cm
Cm
Cm
Cm
Cm
Cm
Cm
C’m
Assignment
tuples:
All possible
satisfying
assignments
for Cm
Dummy tuples
Cm
C’m
Wang-Chiew Tan, Penn Database Group
9
Related Work on Annotations
• Superimposed Information (D. Maier, L. Delcambre [WebDB’99])
– data “placed over” existing information
eg. bookmark files, schema of a database
• Annotation Systems
– Annotea (W3C)
• annotate web pages
• location is defined with XPointer
– Multivalent Browser (R. Wilensky, T. A. Phelps. UC Berkeley DL Project)
• annotate on PDF files, HTML, etc.
• robust locations
– BioDAS (Distributed Annotation Server) (L.Stein et. al )
• annotate on genome sequences
• notion of location is genome specific
• No one has formally studied annotation placement problem
Wang-Chiew Tan, Penn Database Group
10
The classical view deletion problem
• A view tuple is to be deleted
– What changes should be made to the source?
• Many kinds of view-to-source deletion translations
– eg. deletion-to-insertion, deletion-to-modification, etc.
• Update Semantics of Relational Views
(F. Banchilon, N. Spyratos, [TODS’81])
• On the correct translation of Update Operations on Relational Views
(U. Dayal, P. Bernstein, [TODS’82])
• Algorithms for Translating View Updates to Database Updates for
Views Involving Selections, Projections and Joins
(A. M. Keller, [PODS’85])
– deletion-to-deletion
• Run-Time translations of View Tuple Deletions Using Data Lineage
(Y. Cui, J. Widom, [2001])
– exploits lineage information to find “side-effect free” deletions whenever
possible
Wang-Chiew Tan, Penn Database Group
11
View Deletion Problem
(Deletion-to-deletion translation)
• View Deletion Problem (minimize view side-effect):
– Given a view V=Q(S) and a tuple t in V, decide if there is a sideeffect free deletion for t
– “side-effect-free deletion” : a set of source tuples whose removal
from the database will only remove t from the view
Source:
Relational
Database
Query
View : result of
query applied
on source
Wang-Chiew Tan, Penn Database Group
12
A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide if there is a side-effect free
deletion for a PJ or JU query in normal form.
(b) There is a polynomial time algorithm to find the set of
source deletions with minimum side-effects for all other
queries, i.e., queries that involve only S,P,U or S,J
operators).
• Theorem (a) is true even for a constant size PJ query involving
only two relations!
PROJ A,C(R1 JOIN R2)
Wang-Chiew Tan, Penn Database Group
13
View Deletion: PJ Query
Theorem:
It is NP-hard to decide if there is a side-effect free
deletion for a PJ query in normal form.
(x1+x2+x3)(x2+x4+x5)(x4+x1+x3)
R2
R1
B C
A B
a
a
a
a
a
c2
c2
c2
x1
x2
x
x
3
x
4
x5
x2
x
4
5
x1
x2
x
x3
x4
x5
x1
x
2
x3
x4
x
1
3
c
c
c
c
c
c1
c1
c1
c3
c3
c3
PROJ A,C(R1 JOIN R2)
A C
a
a
a
c2
c2
c2
c
c1
c
c3
c
c
1
For each xi,
decide whether
to delete (a,xi) or
(xi,c).
3
Wang-Chiew Tan, Penn Database Group
14
Ongoing and Future Work
• Implementation of annotation system
– on RDBMS
• special cases of PJ queries with polynomial time algorithm
– PJ queries that do not project out key information
– on XML
– effects on query languages?
Wang-Chiew Tan, Penn Database Group
15
Do we need an “annotation-conscious” QL?
•
The same query in different languages, but different annotation behavior
Emp(Name, Sal, Dept)
Department(Dept, Manager)
[Name:”Joe”, Sal:50K , Dept:”Marketing” ] [Dept:”Marketing” , Manager:”Jane”]
Relational Algebra:
Emp JOIN Department
[Name:”Joe”, Sal:50K , Dept:”Marketing”
SQL:
SELECT e.Name, e.Sal, e.Dept, d.Manager
FROM Emp e, Department d
WHERE e.Dept ==a d.Dept
, Manager:”Jane”]
[Name:”Joe”, Sal:50K , Dept:”Marketing” , Manager:”Jane”]
•
Equivalent queries in the same language, but different annotation behavior
Q1 = SELECT e.Name, e.Sal
FROM Emp e
WHERE e.Sal = “50K”
[Name:”Joe”, Sal:50k ]
Q2 = SELECT e.Name, “50K” AS Sal
FROM Emp e
WHERE e.Sal = “50K”
[Name:”Joe”, Sal:50k]
Wang-Chiew Tan, Penn Database Group
16
Do we need an “annotation-conscious” QL?
• Relational algebra seems to suggest a natural set of propagation
rules
• SQL seems to suggest another natural propagation rule
– one that is based on variable bindings
• Not clear how we extend the semantics of query languages so
that annotation propagation is “well-behaved”.
• Should a query language be “annotation-conscious” ?
OR
• Should the user be allowed to control which annotation gets
propagated to where?
Wang-Chiew Tan, Penn Database Group
17
End of Talk
Wang-Chiew Tan, Penn Database Group
18