Frankfurt travel table

Download Report

Transcript Frankfurt travel table

Bx Tutorial, Database Flavor:
Updatable or Invertible Mappings
James F. Terwilliger
Microsoft Corporation
How Does the DB Field Use Mappings?
DB
DB
DB
DB
DB
DB
DB
DB
DB
DB
DB
DB’
DB
DB
App Model
Over Store
Data
Warehouse,
Schema
Versioning
Federated
System
DB
Exchanged
Data Between
Applications
Model (Metadata) Management
Capabilities?
S
Model?
Virtual?
M
Language?
T
Model?
Virtual?
In General, Two Approaches
S
S
M
M
T
T
The View Update Problem
S
Concrete
Database
M
T
Application Model,
External Schema
• Early work abstracted away the exact language of M, focusing
on what it means to be an updatable view
• As work progressed, focus shifted somewhat to a choice of M –
SQL – and deciding when an update policy can be computed
The View Update Problem
Query
Query
Update
S
Relational
(Concrete)
M
T
SQL
Let’s use the declarative query tool
we know and love – SQL – as a way to
express views!
(What could possibly go wrong!)
Relational
(Tables only)
(Virtual)
View Update References
• Francois Bançilhon and Nicolas Spyratos. Update
semantics of relational views. TODS, 6(4):557–575,
December 1981.
• Umeshwar Dayal and Philip A. Bernstein. On the
correct translation of update operations on relational
views. TODS, 7(3):381–416, September 1982.
• G. Gottlob, P. Paolini, and R. Zicari. Properties and
update semantics of consistent views. TODS,
13(4):486–524, 1988.
• Jens Lechtenbörger. The impact of the constant
complement approach towards view updating. PODS,
49–55, 2003.
View Updates: The Basics
View definition
V
Update
statement
f
u
u
u(V)
D
f
u(D)
(Unique) Transformed
update against the
physical database
Update translations available for some
syntactic restrictions on f
Constant Complement
(Semantics of View Updates)
D
V
V’
D
• Updates leave the view
complement unchanged
• Complement may not be
unique (must be chosen to
determine update semantics)
Great! Where Can I Get It?
• Most database vendors do not implement past
the SQL92 standard
– View must have:
•
•
•
•
No set operators
No distinct, no grouping
No joins or multiple FROM items
No smoking, talking, or chewing gum
– Basically, only simple select/project queries
View Update Limitations (Among Many)
• Large queries are hard to debug (and read!)
• Given a large query, how to report to the user
why a query is not updatable?
• DB  Tables, not DB  DB
• Syntactic restrictions are very strict
• It is assumed that a query language can make
a good view expression language
Outline
• A brief look at updatable views
• Transformation-based approaches
– PRISM/PRIMA
– Guava/Channels
• Holistic approaches
– Object-relational mappings
– Data exchange and model management
• Whole schemas on both ends
• Interesting and notable mappings
• Support for interesting update/bidirectional scenarios
Slides adapted with permission from Carlo Curino
PRISM/PRIMA (PANTA RHEI)
PRISM
S
Application
Model
Version N
M
T
Application
Model
Version N+1
PRISM
Query
Update
S
Relational
(Virtual)
M
SMOs
(Schema Modification
Operators)
T
Relational
(Concrete)
Schema Evolution: common practice
• Starting point:
• a Schema S1,
• a database DB1 under S1, and
• a set of queries Q1 formulated over S1
Schema Evolution: common practice
• Evolution in the real world:
• The DBA defines an SQL DDL script modifying S1 into S2
• The DBA defines an SQL DML script migrating data from DB1 to DB2
• Queries in Q1 might fail, the DBA adapts them manually: Q2 = Q1’
+ Q2_new (new queries added on S2)
Schema Evolution: common practice
• Evolution in the real world:
• The DBA defines an SQL DDL script modifying S2 into S3
• The DBA defines an SQL DML script migrating data from DB2 to DB3
• Queries in Q2 might fail, the DBA adapts them manually as in Q3 =
Q2’ + Q3_new (new queries added on S3)
Schema Evolution: common practice
• DB Administrator (DBA) nightmares:
• Data Migration: Data loss, redundancy, efficiency of the migration,
efficiency of the new design
• Impact on Queries and applications
• What is the real impact of schema evolution? [iceis2008]
Schema Evolution: Ideal World
• Evolution in an ideal world:
• Evolution design is assisted and predictable
• Data migration scripts are generated automatically
• Legacy Queries (and updates, views, integrity constraints,…)
are automatically adapted to fit the new schema
Prism: from 30,000 feet above
Desiderata
PRISM proposed solutions
Support Evolution Design
Schema Modification Operators (SMO)
(a language to express changes to the schema)
Increase Predictability of
evolution
SMO static analysis: foresee impact on schema,
data and queries
Automate data migration
Generate SQL scripts from SMO sequences
Automate query support
Derive from SMOs logical mapping between
schema versions
Automatic query rewriting (more recently work
integrity constraints propagation and updates
rewriting)
Schema Modification Operators
• SMOs:
• Atomic changes
• SQL-inspired syntax
• Operates on both schema and data
• Practical completeness (coverage of available evolution scenarios)
Data Migration
• SMOs to SQL translation:
• Each SMO has a clear semantics that can be represented by a set
of SQL statements (DML + DDL)
• Optimization issues (single/multi SMO)
DECOMPOSE TABLE R INTO S(x,y), T(x,z)
SMO to SQL translation
RENAME TABLE R TO S;
CREATE TABLE T(x varchar,z
varchar);
INSERT INTO T(SELECT x,z FROM S);
ALTER TABLE S DROP COLUMN z;
Logical Mapping
• SMOs to logical mapping translation:
• Language: Disjunctive Embedded Dependencies (DED)
• DED is a subset of First Order Logic with expressions in the
following form:
• Used to capture the relationships between (instances of)
consecutive schema versions
JOIN TABLE R, S INTO T WHERE cond
SMO to DED translation
Query Rewriting Semantics
• Data Migration:
• d2 is equivalent to d1 migrated
by M and modified by U2
• d2 = M(d1) ± U2
• Query Answering:
• Goal: answer q1 on d2
• q1(M-1(d2))
• Query Rewriting Semantics:
• Executing the rewritten query q1’ is equivalent to migrating d2
back to schema S1, by applying the M-1, and executing q1
• q1’(d2) = q1(M-1(d2))
Inverting SMOs
• The Invertibility problem (How do we get M-1?):
• DEDs are not invertible!
• We deal with the inversion at the SMO-level (easier but not free)
• Inverting SMOs:
• Not every SMO is perfectly invertible
• in this case we use SMO quasi-inverse1 (intuitively “the best you
can do”-inverse)
• Not every SMO has a unique inverse
• in this case the DBA must disambiguate these (rare) cases
1
we apply the notion of quasi-inverse by Fagin, Kolaitis, Popa, Tan, in [fkpt2006, fkpt2007b]
SMO Invertibility: tricky cases
• Quasi-Inverse example:
JOIN TABLE R, S INTO T WHERE cond;
a valid (and reasonable) quasi-inverse is:
DECOMPOSE TABLE T INTO R(x,y), S(x,z);
• Multiple Inverses example:
COPY TABLE R INTO S;
the system provides the following candidates:
1. DROP TABLE S;
2. DROP TABLE R; RENAME TABLE S INTO R;
3. MERGE R,S INTO R;
Query Rewriting
• The Rewriting Process:
• From the Inverse SMOs we derive the inverse DEDs mapping M-1
• Rewrite queries by chase&backchase (using M-1) [popa2000,
deutsch2006]
• MARS [mars2003] optimized chase&backchase implementation
Q1: SELECT * FROM
R
MARS Query Rewriting
Q1’: SELECT *
FROM S,T
WHERE S.x = T.x
References
Panta Rhei Website
• schemaevolution.org
PRISM
• Carlo Curino, Hyun Jin Moon, Carlo Zaniolo: Graceful database schema
evolution: the PRISM workbench. PVLDB 1(1): 761-772 (2008)
• Carlo Curino, Hyun Jin Moon, Alin Deutsch, Carlo Zaniolo: Update Rewriting
and Integrity Constraint Maintenance in a Schema Evolution Support System:
PRISM++. PVLDB 4(2): 117-128 (2010)
Slides used with permission by James Terwilliger… hey wait, that’s me!
GUAVA, CHANNELS
Channels
S
Concrete
Database
M
T
Application
Model
Channels
S
Relational
(Concrete)
M
“Channel”
Query
Update
Schema Δ
T
Relational
(Virtual)
Example: Unpivot
Employee
Employee’
ID
Name Age
Gender
1
Joe
male
2
Sue
… …
31
…
…
Exp
ID Attribute
Value
1
1
Name
Joe
4
1
Gender
male
…
1
Exp
1
2
Name
Sue
2
Age
31
2
Exp
4
Schema: eliminate Name, Age, … columns
Schema: add Attribute, Value Columns
Instance: for each row, produce one row for
each non-null attribute value, with the
attribute name placed in the Attribute
column and the value in the Value column.
… …
…
33
The unpivot CT – operationally
input schema
current input instance
output schema
with output instance
input DB
input DB
insert into Employee
delete from Employee
update Employee
add column to Employee
add domain element to
Exp domain
query that uses Employee
insert into Employee’
delete from Employee’
update Employee’
add column to Employee’
add domain element to
Exp domain
query that uses Employee’
input DML/DDL/queries
CT
output DML/DDL/queries
unpivot the Employee table
where ID is the key for the table
(unpivot all remaining attributes)
DDL/DML operations accepted by a
channel
VLDB 2010
35
CTs are composable into a channel
(CTs simply pass statements on untouched that
they don’t need to change)
Horiz.
Merge
Natural
Schema
Query,
Insert,
Update,
Delete,
Create,
Alter, Drop
statements
Vert.
Part.
Apply
Horiz.
Merge
Unpivot Unpivot
Native
Schema
Query,
Insert,
Update,
Delete,
Create,
Alter, Drop
Error, Loop36
statements
List of CTs that we have defined/implemented
• VPartition – vertically partition into two tables with a 1:1
relationship
• VMerge – vertically merge two tables with a 1:1 relationship
• HPartition – horizontally partition based on the value in a column
• HMerge – horizontally merge, and introduce a provenance
column
• Pivot/Unpivot – move data to schema (pivot) or schema to data
(unpivot) (Move from/to generic/triple form to regular form.)
• Apply – apply an invertible function to data in each tuple
• Adorn – add columns, specify data to be held in the new
columns, such as environment data
• Audit – add start/end timestamps, and ensure that data is
deprecated rather than deleted
HMerge (operationally) working on a query:
Replace P_Client to Person appropriately
P_Client:
FName LName Age
Bob
Smith
19
Person:
…
…
…
FName LName
P_Staff:
HMerge
FName LName Cert
Ted
Jones
X
…
…
…
T
Age
19
Bob
Smith
P_Client
Ted
Jones
P_Staff
…
…
…
Cert
X
…
…
πFName, Lname, Age σT=P_Client Person
Query Q = P_Client
(SELECT * FROM P_Client)
HMerge (operationally) working on DML Insert statement
P_Client:
FName LName Age
Bob
Smith
19
Person:
…
…
…
FName LName
P_Staff:
HMerge
FName LName Cert
Ted
Jones
X
Gail Brown X
…
…
…
I (P_Staff, {FName, LName, Cert},
{(Gail, Brown, X)})
T
Age
19
Cert
Bob
Smith
P_Client
Ted
Jones
P_Staff
…
Gail
…
Brown
…
P_Staff
…
…
X
…
…
…
…
…
X
I (Person,
{FName, LName, T, Age, Cert},
{Gail, Brown, P_Staff, null, X})
HMerge (operationally) working on DDL:
Create table (P_Admin)  add column and add domain
element statements
P_Client:
FName LName Age
Bob
Smith
19
Person:
…
…
…
FName LName
LName
FName
P_Staff:
HMerge
FName LName Cert
Ted
Jones
X
…
…
…
P_Admin:
FName LName Pay
…
…
…
TT
Bob
Bob
Smith
Smith
P_Client
P_Client
Ted
Ted
Jones
Jones
P_Staff
P_Staff
……
…
…
…
…
Add “P_Admin” to the
domain of the column
Age Cert
Cert
Age
19
19
XX
…
…
…
…
Add “Pay”
column
Pay
Formal definition of a CT
• A channel transformation is a 4-tuple (S, I, Q, U)
–
–
–
–
A schema transformation S
An instance transformation I
A query transformation Q
An update transformation U
• S and I define semantics
– Never implemented, but define what the CT means
• Q and U define operation
– Defines the translation of statements against the
natural schema to statements against the native
schema
Query Correctness
Instance transformation
for the channel transform
Query against the virtual
database
Query transformation for
the channel transform
d
I
d
q
q
q = Q(s, q)
Q
Transformed query against
physical database
The query transformation is correct with
respect to the instance transformation
Update Correctness for Schema
Schema transformation
for the channel transform
s
Update
statement
(DML or DDL)
S
u
u
u(s)
s
S
u(s)
Update transformation for
the channel transform
u = U(s, u)
Transformed update
against the physical
database
The update transformation is correct with
respect to the schema transformation
Update Correctness for Instance
S
s
Update
statement
against table
“t”
u
u
u(s)
qt
Query:
“SELECT * FROM t”
s
PutGet
u(s)
Q
qt
u = U(s, u)
qt = Q(u(s), qt)
The update transformation is correct with
respect to the instance transformation
References
Overall Architecture (Guava):
• James F. Terwilliger, Lois M. L. Delcambre, Judith Logan: Querying through
a user interface. Data Knowl. Eng. 63(3): 774-794 (2007)
Channels:
• James F. Terwilliger, Lois M. L. Delcambre, David Maier, Jeremy Steinhauer,
Scott Britell: Updatable and Evolvable Transforms for Virtual Databases.
PVLDB 3(1): 309-319 (2010)
OBJECT-RELATIONAL MAPPINGS
Object-Relational Mappings: Overview
• Applications written in an object-oriented
language have object-oriented data tiers
• Persistence is a relational database
• “Impedance mismatch”
– Map object constructs to relational constructs
Object-Relational Mappings
M
S
Concrete
Database
T
Application
Model
(Arrow Direction Could Go
Either Way)
Object-Relational Mappings
S
Relational
(Concrete)
M
• Specification
• Relational
equivalences
• Mapping strategies
Query
Update
(Schema Δ)
T
ObjectOriented
(Virtual)
An O-R Mapping Is…
• … generally an operational specification rather
than a declarative query or set of queries
• … tailored more to the purpose of mapping
inheritance and relationships to relations
rather than a general-purpose mapping
Mapping Patterns
(TPT)
Mapped to
(TPC)
(TPH)
Mapping Patterns:
TPH Sub-Categories
Name (string)
Salary (integer)
Name (string)
Office (integer)
Fully disjoint
Reuse by column
Reuse by domain
Name1 (string)
Name2 (string)
Salary (integer)
Office (integer)
Name (string)
Salary (integer)
Office (integer)
String1 (string)
Integer1 (integer)
Clear column
provenance
Clear name reuse
Maximum data
density
Mapping Patterns: Etc.
Horizontal Partitioning
Vertical Partitioning
Association Join Tables
0..1
Origin = ‘A’
Origin = ‘B’
OR
?
*
ORM Product Space
•
•
•
•
•
Ruby on Rails
Hibernate/NHibernate
SQLAlchemy
Microsoft Entity Framework
Oracle TopLink
• Some major tradeoffs:
– Expressiveness
– Specification style
Hibernate Example
<hibernate-mapping>
<class name="eg.hibernate.mapping.dataobject.Person" table="TB_PERSON" polymorphism="implicit">
<id name="id" column="ID">
Client Class
Store Table
<generator class="assigned"/>
</id>
<set name="rights" lazy="false">
<key column="REF_PERSON_ID"/>
<one-to-many class="eg.hibernate.mapping.dataobject.Right" />
</set>
<joined-subclass name="eg.hibernate.mapping.dataobject.Individual" TPT-Style Mapping
table="TB_INDIVIDUAL">
<key column="id"/>
<property name="firstName" column="FIRST_NAME" type="java.lang.String" />
<property name="lastName" column="LAST_NAME" type="java.lang.String" />
</joined-subclass>
<joined-subclass name="eg.hibernate.mapping.dataobject.Corporation" TPT-Style Mapping
table="TB_CORPORATION">
<key column="id"/>
<property name="name" column="NAME" type="string" />
<property name="registrationNumber" column="REGISTRATION_NUMBER" type="string" />
</joined-subclass>
</class>
XML fragments almost correspond to
</hibernate-mapping>
individual O-to-R transformations
Entity Framework (EF):
A Brief Overview
Client-side (Objects):
Mapping specified at
schema level
Classes
Mapping compiled to
views
Object Queries
(LINQ)
Object Updates
Store side (Relations):
Q1 = Q1’
Q2 = Q2’
Q3 = Q3’
…
Tables
(selectproject
only)
Query view VQ
Update view VU
Preserve fidelity of the
source data
Merge view VM
56
EF Simple Example
Client-side (Classes):
Person:
id
name
title
πid, name Person = πid, name Person1
πid, title Person = πid, title Person2
Store side (Relations):
Person1(
id integer PRIMARY KEY,
name varchar(50),
)
Person2(
id integer PRIMARY KEY,
title varchar(50),
details varchar(2000)
)
Person = πid, name, title Person1 ⋈ Person2
57
Entity Framework: Major Results
• Validation procedure ensures that a collection of
mapping fragment roundtrips
PutGet
– Each client state maps to a valid state
– Client state travel to store and back is invariant
– Guarantees query and update safety
• Mapping compilation procedure expressive
enough for common mapping scenarios, and
many uncommon ones
– All of the mapping schemes previously noted
ORM Challenges
• Generally not extensible
– Limited mapping expressiveness
– Cannot add new mapping capabilities
• EF mappings must be validated
– Not always easy to write correct mappings
– Error detection and correction is hard
References
ORM overview:
• Scott Ambler: http://www.agiledata.org/essays/mappingObjects.html
Entity Framework formalism:
• Sergey Melnik, Atul Adya, Philip A. Bernstein: Compiling mappings to bridge
applications and databases. ACM Trans. Database Syst. 33(4): (2008)
Entity Framework overview:
• Atul Adya, José A. Blakeley, Sergey Melnik, S. Muralidhar: Anatomy of the
ADO.NET entity framework. SIGMOD Conference 2007: 877-888
References
Hibernate:
hibernate.org/
Oracle TopLink:
oracle.com/technetwork/middleware/toplink/overview/index.html
SQLAlchemy:
sqlalchemy.org/
Entity Framework:
msdn.microsoft.com/en-us/library/bb399572.aspx
Ruby on Rails:
http://rubyonrails.org/
Slides adapted with permission from Jorge Perez
DATA EXCHANGE
Data Exchange
S
Concrete
Instance
M
T
Concrete
Instance
Data Exchange
-1
M
S
Relational
(Concrete)
M
Query
T
FO Predicate Calculus
(source-to-target
tuple generating
dependencies)
Relational
(Concrete)
References
Another Dagstuhl Session! All About Data Exchange!
• http://www.tks.informatik.uni-frankfurt.de/events/deis10/topics
Data Exchange Tutorial
• Marcelo Arenas, Pablo Barceló, Leonid Libkin, Filip Murlak: Relational and XML
Data Exchange Morgan & Claypool Publishers 2010
• Includes a discussion of composition and inversion of mappings
Foundations of Schema Mapping Management
• Marcelo Arenas, Jorge Pérez, Juan L. Reutter, Cristian Riveros: Foundations of
schema mapping management. PODS 2010: 227-238
Details of Maximal Recovery
• Marcelo Arenas, Jorge Pérez, Cristian Riveros: The recovery of a schema
mapping: Bringing exchanged data back. ACM Trans. Database Syst. 34(4): (2009)