Transcript USE Banks

Multidatabase manipulations
Part 2
Witold Litwin
http://ceria.dauphine.fr/witold.html
1
Multidatabase manipulations
(Kandinsky: Ligne avec Accompagnement, 1937 )
2
Multidatabase manipulations
3
MSQL
(Litwin, Abdellatif, Nicolas, Zeroual, 1989
L. Suardi, M. Rusinkiewicz, 1992)

An extension to SQL
– Contains by definition every SQL-x

Allows for non-procedural multidatabase base
manipulations
  MSQL queries impossible to formulate in SQL
 An MSQL query may replace several SQL queries

Developed in 1986-89
» INRIA, projet B A BA,

initialement sous projet du projet-pilote SIRIUS (J. Le Bihan,
puis W. Litwin)
» Dr. Thesis of MM. Abdellatif, Nicolas, Zeroual

4
Compiler implemented at Houston University
– Team od Prof. M. Rusinkiewicz, 1990-1993
MSQL
(Litwin, Abdellatif, Nicolas, Zeroual, 1989
L. Suardi, M. Rusinkiewicz, 1992)
 Research
vehicle for functions for the MBD
environment
–
–
–
–
–
to address relations in different databases
to manipulate semantically heterogeneous data
to create MDB views
to transfer data (and schemas) between DBs
to define MDB dependencies
 Present
to limited extent in most of
commercial DBMSs &DAMSs
5
MSQL
(Basic new properties)
 SQL Query
– Uses 1st order predicate calculus
– Is compiled for optimization into the relational algebra
– Result is a table
 MSQL Query
– May use higher-order predicate calculus
– Is compiled for optimization into the multirelational algebra
– Result is a multitable
» A set of relations (tables)
» May be constituted from one or no tables
6
MSQL
(More on functions specific to MDB env.)
 Addressing
of tables in different DBs
– Implicitly or by qualification by (multi)database
names
» Introduced around 1985 by relational multibase system
prototype MRDSM
– B A BA project at INRIA
» Unknown at that time of any relational language

7
See the overview of relational DBMSs existing in 1987 (M.
Brodie)
MSQL
(More on functions specific to MDB env.)
 Manipulation
of semantically heterogeneous
data
– Multiple Queries
» With multiples identifiers
» With semantic variables

Ranging over data names
– Scale and Precision
– Units of measure
– Implicit joins
 Capabilities
still unknown of SQL
 Capabilities known at present to some dialects
– Limited with respect to MSQL
8
MSQL : example
View
View
SIL = Internal
Logical Schema
9
CIC
BNP
SG
SIL
SIL
SIL
Conceptual Schemas
(the multischema)
DB bnp :
br (br#, brname, street, street#, city, zipcode, tel)
account (acc#, cl#, balance, br#)
client (cl#, clname, cltel, cltype, street, street#, city, zipcode)
spe-acc (acc#, br#, cl#, balance, curr)
DB sg :
branch (bra#, braname, street, s#, town, zip, t#, class)
acc (acc#, bra#, c#, balance)
client (c#, cname, ct#, ctype, street, s#, town, zip)
DB cic :
br (br#, brname, street, street#, city, zipcode, tel)
account (ac#, br#, cl#, balance, open_date)
client(cl#, clname, cltel, cltype, street, street#, city, zipcode)
10
Semantic Heterogeneity
In Banks
 Same
names can designate different data
 Different names can designate same data
– same client, same town..
 The
value of a primary key is valid only in one
DB
– how to identify same client in diff. banks ?
11
MSQL Commands






12
CREATE TABLE
CREATE DATABASE
CREATE MULTIDATABASE
CREATE VIEW
ALTER TABLE
ALTER VIEW
ALTER MULTIDATABASE
DROP TABLE
DROP DATABASE
DROP MULTIDATABASE
DROP VIEW
MSQL
CREATE DATABASE
> MSQL
CREATE DATABASE boulogne ;
CREATE DB |.com.org.user.boulogne ;
CREATE MULTIDATABASE Banks (bnp cic sg );
Query scope
USE Banks;
CREATE DATABASE boulogne FROM bnp ;
13
MSQL
CREATE MULTIDATABASE


MSQL
CREATE MDB EC-Banks (f-banks-i-banks, s-banks, gbanks, e-banks );
CREATE MULTIDATABASE can create :
– flat MDBs (only contain DBs)
– nested MDBs (DBs or MDBs)
» can be potentially any network of DBs or MDBs
like through the links on the WEB
 what about cycles ?

14
MSQL
CREATE TABLE
Import
use banks ;
CREATE TABLE boulogne.loan FROM bnp.loan ;
CREATE TABLE fake_checks
(Chq#
INT,
Montant_Euro CURRENCY [EURO]
.... );
One has created four (empty) tables :
Unit of
mesure
bnp. fake_checks , cic. fake_checks ... boulogne.fake_checks
CREATE TABLE boulogne.client (c#, cn, ct#)
FROM bnp.client (cl#, clname, cltel)
PRIMARY KEY (c#)
(cn, ct#) OUTER REFERENCES (clname, cltel);
15
MSQL
CREATE TABLE with References
USE AuPrintemps
/* MDB AuPrintemps
CREATE TABLE MusicDep.Inventory
….
FOREIGN KEY (Item#) REFERENCES Central.Stock(I#);
No unauthorized Item# in the inventory of the Music
Department
 Other options


PRIMARY KEY (…) REFERENCES T(…) ;
[T1(A)] [LEFT|RIGHT] REFERENCES T2(B) ;
– Generates implicit equijoin, or left or right implicit outerjoins when a
query selects attributes A and B.
16
MSQL
ALTER MULTIDATABASE
use banks ;
alter banks
include vernes
remove cic

17
Alter MDB can create
– flat MDBs (only contain DBs)
– nested MDBs
MSQL
Elementary queries
USE bnp cic
SELECT bnp.br.brname, cic.br.braname, bnp.br.street
FROM bnp.br, cic.br
WHERE bnp.br.street = cic.br.street ;
bnp.br.brname
cic.br.braname
bnp.br.street
vaugirard 3
bd. montparnasse
vaugirard
Prefixing with DB names was unknown to SQL
- and is in DB2 SQL since last year only
18
MSQL
Default DB
USE bnp
SELECT br.brname, cic.br.brname, br.street
FROM br, cic.br
WHERE br.street = cic.br.street ;
br.name
cic.br.brname
br.street
vaugirard 3
bd. montparnasse
vaugirard
Tables of the default database are not prefixed
19
MSQL
Elementary queries without prefixed names
USE bnp sg
SELECT br.brname, branch.braname, br.street
FROM br, branch
WHERE br.street = branch.street ;
20
br.name
branch.braname
br.street
vaugirard 3
bd. montparnasse
vaugirard
Table names are unique
within the query scope
Updates
USE (bnp b) sg ;
UPDATE account
SET account.balance = account.balance + 500
WHERE account.balance > acc.balance
AND b.client.clname = sg.client.cname AND
b.client.street = sg.client.street ;

21
What does it mean ?
Multiples Queries
USE Banks
SELECT *
FROM br%
WHERE street = 'champs elysées' ;
22
Multiple Queries
USE Banks
SELECT *
FROM br%
WHERE street = 'champs elysées' ;
USE bnp
SELECT *
FROM br
WHERE street = 'champs elysées' ;
USE sg
SELECT *
FROM branch
WHERE street = 'champs elysées' ;
USE cic
SELECT *
FROM br
WHERE street = 'champs elysées' ;
23
Results (a multitable)
bnp.br
brname
br#
sembat
123
sevres
456
street
champs elysées
champs elysées
cic.br
street
bra# braname
champs elysées
jaures
123
champs elysées
sevres
765
sg.branch
brname
br#
sembat
abc
gare
a1f
24
street
champs elysées
champs elysées
street#
130
120
st#
153
20
s#
110
30
city
Boulogne
Sevres
town
Boulogne
Sevres
city
Boulogne
Chaville
tel
12 34
12 56
zipcode
92100
92105
zip
92100
92105
zipcode
92100
92110
t#
3214
1243
class
A
B
tel
12.45
34.56
Multiple Updates
Begin
Use Banks
Update cl*
set street = 'Charles de Gaulle"
where street = 'Etoile'
If SQLCODE <> 0 then Rollback ;
Commit
Use Banks vital cic
Update cl*
set street = 'Charles de Gaulle"
where street = 'Etoile'

25
MSQL transaction semantics is more general than ACID
– may include COMP (compensation) statement, list of accept.
states....
Semantic Variables in MSQL
use bnp sg
let x be town city
select *
from b%
where x = 'Paris' and street = 'r. de Rivoli'
26
Semantic Variables in MSQL
use bnp sg
let x be town city
select *
from b%
where x = 'Paris' and street = 'r. de Rivoli'
use bnp
select *
from br
where town = 'Paris' and street = 'r. de Rivoli'
use sg
select *
from branch
where city = 'Paris' and street = 'r. de Rivoli'
27
Semantic Variables in MSQL
use bnp sg
let x be town city
select *
from b%
where x = 'Paris' and street = 'r. de Rivoli'
Alternatively:
use bnp sg
let x be to% city
select *
from b%
where x = 'Paris' and street = 'r. de Rivoli'
28
Semantic Variables in MSQL
use banks
let X be banks.*
select a%, balance, c%name
from X.a% a, X.c% c
where a. a% = c. c%
 The query illustrates the multitable pair-wise join
 Semantic variable a over relation name account is not
necessary, but simplifies the typing of the query
29
Semantic Variables in MSQL
 Semantic
variables can be compound and with
values selected by queries from some dictionaries
use banks
let (x, y) be :select X.attr Y.attr
from FD X, FD Y
where X.mean = tel and Y.mean = city
select * from client
where x = '123' and y = 'Paris'
FD
mean attr
tel
t#
tel
tel
city city
city town
city burgh
FD
33
banks
Semantic Variables in MSQL
 Can
be applied to MSQL DD statements
use banks
create database cic2 ;
let x be a% b% c%
create table cic2.x from cic.x ;

34
Copies cic schema except for one table
Name homogenization
The labels
USE Banks ;
LET t BE tel t#
SELECT %name branch_name, t tel#, s%# street#
FROM br% br
WHERE street = ‘Champs Elysées’ ;
The result :
multitable:
{( bnp.br.branch_name, bnp.br.tel#, bnp.br.street# ),
( sg.br.branch_name, sg.br.tel#, sg.br.street# )
( cic.br. branch_name, cic.br.tel#, cic.br.street# )}
35
Multidatabase Views
bnp
my_bank
sg
A partial view of DBs bnp and sg in DB my_bank
USE my_bank bnp sg ;
CREATE VIEW my_bank.same_street_branches
(bnp_name, bnp_s#, sg_name, sg_s#, street, city)
AS SELECT brname, street#, braname, s#, street, city
FROM bnp.br b, sg.branch s
WHERE b.street = s.street AND b.city = s.town ;
The views in my_bank
can be considered
Import Schemes
36
Multidatabase Union Views
Use Banks
Create View bnp.all-banks as
Use banks
let x be town city
let y be banks.*
Select y.br% ( y, br#, br%name branch, street, street#, x city, zip% zip, t% tel)
Union *
Union * unions all the tables of the selected multitable
 It scales to all the tables named br% of Banks, if new banks enters
the MDB Banks in the future
Current DBMS, e.g., SQL Server, require to alter the union view
definition in such a case

37
Key words and Aggregate
Functions in MSQL
 Key
words and Aggregate Functions of SQL
– par definition
» DISTINCT, GROUP BY, ORDER BY
» COUNT, AVG, SUM…
– operate at each table of a multitable
 Their extensions to multitables
» MDISTINCT, MCOUNT, MGROUP BY, MORDER BY
MAVG, MSUM...
– operate at whole multitable
– important for warehousing
38
Example
USE Banks
SELECT COUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
39
Example
USE Banks
SELECT COUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
bnp.br
2
cic.br
2
sg.br
2
40
Example
USE Banks
SELECT MCOUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
41
Example
USE Banks
SELECT MCOUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
br
6
Exercises in warehousing :
-Average balance per client in each bank
Average balance per client in BANKS
-Sum of client assets per bank
-Sum of client assets in BANKS
42
Aggregate Functions
IMPLEMENTATION ISSUES

All-in-one (traditional computation)
– Possibly in parallel
– The calculus can take long time.

Successive approximations
– Some kind of sampling
»
»
»
»
result1, from any 1st DB to come
(result1 + result2) / 2
…
sampling within each database


several ACM-Sigmod & VLDB papers dealt with query evaluation using
sampling
Precomputing
– Incremental evaluation using interdatabase dependencies
– Common to warehousing
43
Aggregate Functions
MERGE ON


form a single tuple from all the tuples of the same objet in the
multitable
– Uses outer jointures
Find millionaires in Banks and form the tuple for each
millionaire
USE Banks ;
LET x.y BE clname.cltel cname.ct#
LET z BE Banks.*
SELECT *
FROM z.a%
WHERE z.a%.c%# = z.client.c%#
AND z.a%.balance > 1 000 000
MERGE ON x y ;
44
Aggregate Functions
MERGE ON
nulls
nulls
nulls
45
USE Banks ;
LET x.y BE clname.cltel cname.ct#
LET z BE Banks.*
SELECT *
FROM z.a%
WHERE z.a%.c%# = z.client.c%#
AND z.a%.balance > 1 000 000
MERGE ON x y ;
Aggregate Functions
NAME
 Transform
a name (table, attribute..) into
attribute value
USE Banks ;
LET x.y BE br.city branch.town
SELECT %name branch_name, NAME (.x) bank
FROM x
WHERE y = 'Nice'
UNION * ;
Note: Union * unions all the tables of the selected multitable

46
the result is the table :
branch_name bank
Jaures
CIC
DeGaulle
BNP
Aggregate Functions
CHOOSE

Chooses at most n tuples among the selected ones
– the 1st found as does the function TOP (default) in any or some order,
specified by ORDER BY (default)
– strictly random (RND)
– these that were not chosen by the previous execution of the query in the
same transaction (NEW)
– preferably in the DBs listed, and in the listed order
– at most j per DB
– selecting at most m tuples sharing the values of the attr. in the list A,
supposed global key of some objet.
CHOOSE (n, (m, <A>), [<B>] | j, [<B>], [RND | NEW]
<A> ::= <list of attr.> <B> ::= <list of DBs>
47
Aggregate Functions
CHOOSE
Choose a millionaire randomly
USE Banks ;
SELECT c.*
FROM c% c, a% a
WHERE c.c%# = a.c%# AND a.ba% > 1.000.000
CHOOSE (1) RND ;


48
Function very important in MBD
environment
– information overload
Aggregate Functions
TIMEOUT
 Fix time limit of a query
– the system should possibly deliver all the relevant
tuples
– however, any query arriving to timeout is considered
executed successfully
TIMEOUT (t [unit]) ;
<unit> := ms | s | m | h | d
s - seconds (default)
USE Banks
SELECT *
FROM br%
WHERE street = 'champs elysées'
TIMEOUT (10) ;
49
Aggregate Functions
POST

Make a query continuous
– One manipulates each tuple found during the life time of the
query
– Even those created after the query start
– TIMEOUT may be used to limit the life time
USE Immo LaCentrale Orpi ;
SELECT *
FROM logem%
WHERE prix < 1,000,000 AND Ville = 'Paris'
POST ;
50
Aggregate Functions
ESTIMATE

Compute the cost of a query before the execution and can
start the execution after an authorization
ESTIMATE (type, price, time, count, size, report)
[WITH EXEC_PROMPT]

type of estimate :
– exact (can be long to compute)
– approximate
price of the query (in $, FF...).
completion time
number of tuples
size of the resultant, in bytes
report on the estimate itself
– precision...





51
Privileges in MSQL
USE bnp sg cic ;
GRANT SELECT ON client TO Nicolas Abdellatif ;
client is a multitable :
client = (bnp.client, sg.client, cic.client)
GRANT ALL ON etoile.account TO Nicolas Abdellatif
FROM bnp.account ;
GRANT ALL ON etoile.account TO Nicolas FROM Zeroual
ON bnp.account ;
52
Interdatabase Queries
Transfer data between DBs
 Source and target are multitables

INSERT...
53
Interdatabase Queries

INSERT
– insert selected tuples
» except these with the key already in the target

STORE
– insert selected tuples
» replacing these with the key already in the target

REPLACE
– insert selected tuples and delete the rest of the target

UPDATE
– update the tuples selected in the target with the values in the
source tables

COPY
– copies tuples and the source schema
54
MSQL

There are more interesting capabilities
– e.g. Multidatabase Dependencies
» referential integrity & (outer) join links
» multidatabase triggers

local autonomy
» dynamic attributes for retrieval and updates
 The
language design will never be finished
– MSQL 1, 2, 3...
MSQL : A multidatabase Language. Information Science Journal : Special
Issue on Database Systems, 48, 2, (July 1989).
Execution of Extended Multidatabase SQL. Intl. IEEE Conf. on Data Eng.
Vienna, 1992
55
O*SQL
 For
OO or RO common model, consider in
addition:
– MDB inheritance
– MDB type/subtype integration
» derived types
– OID heterogeneity & UUIDs
– Type / function value semantic heterogeneity
» dynamic type hierarchies
» higher-order OO languages
 Relations
56
with inherited attributes
Elements of MSQL in commercial DBMSs

Main DBMSs evolved to MDBSs
– yet primitive but it's better than nothing
Sybase, Oracle, Informix, MsAccess, SQL Server,....

There are also MDBSs which are only access
systems to DBMSs
EDA-SQL, DEC DB Integrator, DBJoiner (IBM),
Ingres*, UniSQL/M, Uniface, Q+E, OAdaptor (HP),
Telebase...

57
"Data Warehouses"
MSQL in commercial DBMSs
(Department Store Data Warehouse, using MsAccess, SQL Server...)
Au Printemps
Music
Dep
Music
Dep
Home Appl.
Dep
Payroll
Parly 2
Bd. Haussman
Food
Dep
Jeans
Dep
Jeans
Dep
Central
Warehouse
Home Appl.
Dep
Car
Food
Dep
Orders
58
Books
Dep
MBD Manipulations in MsAccess
 One
can perform limited MBD operations
between
– MsAccess DBs
– An DB of MsAccess and
» any other DB under a DBMS ODBC compatible
» Paradox, Btrieve, Dbase
» Any OLE compatible program

59
Excel...
MBD Manipulations in MsAccess
Paradox
Gateway
MsAccess
B1
Sybase ODBC
driver
B2
Export
Import
ODBC
Distr.
Connect.
ODBC
B3
Paradox
60
Excel
Oracle
Sybase
MsAccess & MSQL
 Open


61
B <=> USE B
ATTACH table
Open B1
attach B2.T' as T 
create view B1.T as select * from B2.T'
» DROP VIEW corresponds to Delete in MsAccess
menu
Clause IN <externalDB>
Open B1
Select a, b, c From D IN B2 
select a, b c from B2.D
Examples MsAccess
 Source DB: MsAccess
SELECT [Customer ID]
FROM Customers IN MYDATA.MDB
WHERE [Customer ID] Like "A*";
 Source DB: Paradox
SELECT [CustomerID]
FROM Customers
IN "C:\PARADOX\DATA\SALES" "Paradox 4.x;"
WHERE CustomerID Like "A*";
 Every data transfer from/to DB non-MsAccess or OLE
compatible software has data repr. conversions
– Semantic Heterogeneity oblige
62
Elementary Queries in MS-Access



Open a DB and query other DBs
– one has to define aliases in FROM
DB open here is called s-p1.mdb
– but this name has no importance here
Joins of tables in other databases
SELECT TOP 10 C.[Contact Name], C.City
FROM [c:\access\nwind2.mdb].Customers AS C,
[c:\access\ordentr2.mdb].customers AS O
WHERE (o.Id= C.[customer Id]);
63
Result
Contact Name City
Pat Parkes
Gladys Lindsay
Elizabeth Lincoln
Olivia LaMont
Terry Hargreaves
Elizabeth Brown
Sylvia Dunn
Ann Devon
Ronald Merrick
Bill Lee
64
London
Seattle
Tsawassen
San Francisco
London
London
London
London
London
Pocatello
Elementary Queries in MS-Access

Join of a local and external table
SELECT TOP 10 S.SName, C.[Contact Name], C.City
FROM S, [nwind2.mdb].Customers AS C
WHERE ((S.City= C.City))
Order by [contact name];
65
Result
66
SName Contact Name
City
Clark
Clark
Clark
Clark
Clark
Clark
Clark
Clark
Clark
Clark
London
London
London
London
London
London
London
London
London
London
Ann Devon
Archibald Langford
Cornelia Giles
David Bird
Elizabeth Brown
G.K.Chattergee
Gerald Pipps
Hari Kumar
Jane Austen
Jeffrey Jefferies
MsAccess & MSQL

Clause INTO <externalDB> dans Select INTO
ou INSERT INTO
Open B1
Select a, b, c INTO T IN B2 From D 
Use B1 ;
copy into B2.T
select a, b c from D ;
» D can be a view or a subquery
» One cannot combine clauses IN et INTO
– INSERT de MsAccess has (sub)semantics of
INSERT in MSQL
67
MsAccess & MSQL
 IMPORT
& EXPORT
– menu commands
– equivalent to MSQL query
Use B1 ;
copy into T1 * from B2.T2 ;
68
MsAccess & MSQL
Comparison
 Formulation
of MBD elementary queries and views
– first one has to define ATTACH's
– then one formulates SQL monodatabase query
– then, perhaps one needs to delete the ATTACH's
 Much
more procedurality than under MSQL
– in Banks, one would need in practice that each DB attaches
all the tables of any other DB
» Good luck DBA !
 Multiple
queries and other capabilities of MSQL
» yet unknown of MsAccess
69
SQL Server, Sybase, Interbase

MBD Architecture similar to that of MsAccess, but more
powerful
– gateways to Oracle, IMS, DB2
– ODBC

Transac-SQL support the following MSQL functions and is
the MBD dialect least procedural in the industry
– elementary queries
» to Sybase DBs at the same site
USE B ;
select * from T where B1.T1.a = T.a ;
» Only one DB per USE
» some restrictions at the level of interdatabase queries
– multidatabase CREATE VIEW, and MDB triggers
70
Oracle, RDB, Informix

Have an operation similar to ATTACH called
Create link:
Create public database link bnp
connect to bnp_unix
Create public database link cic
connect to cic_vms
SELECT br.brname, b.braname, br.street
FROM br @bnp, br@ cic b
WHERE br.street = b.street ;
71
Oracle, RDB, Informix

MBD queries are possible only once the links are
defined
– Hence these DBMSs are + procedural than Sybase
for MBD operations

Starting from V7, Oracle supports however
MBDs queries without links
» postfixing par the DB name

72
as in the last ex.
EDA-SQL, DB Integrator,
DBJoiner, Ingres* & al

SQL MDBSs for access to DBMSs



73
» but there is always one for the
MDB catalogs
 auxiliary DB
One has to create links and logical
DBs
» almost virtual DBs
only DB Integrator supports elem.
MDB queries
– called multischema queries
No other MSQL functions
logical
DB
ODBC
– in theory, without their own
DBs
logical
DB
Gateway
BD
lMS
BD
RDB
BD
Ingres
UniSQL & O-Adaptor
Similar to previous ones
except that for the logical
DB
logical
DB
– UniSQL uses RO model
– O-Adaptor an OO model

No MDB queries (other
than link creations)
logical
DB
Gateway
BD
lMS
74
ODBC

BD
RDB
BD
Ingres
Telebase (USA)

MDBMS for access to inf. retr. DBs
– 1000+ DBs at many sites
CCS
» with different local languages




STAIRS, INSPEC, DIALOG...
Drivers
Extended Common Command Set
Language (CCS)
No joins only Boolean clauses
Supports the MSQL functions
– multidatabases names
DBs
DIALOG
» Called Categories
– multiple queries
» Called Scans
75
DBs
INSPEC
DBs
STAIRS
Messidor : 1st Heterogeneous Multidatabase
Information Retrieval Access System
76
Démonstration par C. Moulinoux (STERIA), INRIA, 1987
Meta-search engines
Metacrawler, BigHub.com, AskJeewes.com,
Copernic…
 Query simultaneously several search engines

– Altavista, Yahoo, Excite, Hotbot…
 Boolean
Manipulation Langages
 Multiples Queries
– Apply the mdb aggregate functions Mdistinct Name,
Mdistinct, Choose, Timeout…
77
Data Warehouses

Data
warehouse
» elaborated DS implementation
elaborated decision support
functions

incremental propagation
» an MDB view or a DB redundant
with respect to existent ones is
created
78
ODBC
Popular new concept for MDBSs
– data warehouse <=> an MDB or
federation in an enterprise
– With elaborated management of
interdatabase dependencies
– new ideas:
Data
mart
Gateway
DB
lMS
DB
RDB
DB
Ingres
An Instructive Call for Papers
International Journal of Cooperative Information Systems
Special Issue on
Design and Management of Data Warehouses
Guest editors: Manfred A. Jeusfeld and Martin Staudt
Data Warehousing embraces technology and industrial practice to
systematically collect data from the enterprise and to use that data in a
highly aggregated form for managing the enterprise thru decisions. Little
attention is currently paid to design and manage a data warehouse (DW) in
such a way that it accomplishes its purpose, i.e. to support the management
of the enterprise. Existing solutions are focusing on technical aspects
like efficient source data extraction. Their parameters are however
incomprehensible to the stakeholders who decide on the introduction of a
data warehouse.
Data warehouses are important in managing large enterprises and in
communicating highly aggregated information between the various
departments. Interoperable tools and integrated methods to manage data
warehouses in order to fulfill the enterprise goals are desperately needed.
Such tools should cover all aspects of data warehousing:
79
- selection of data sources
- data cleaning
- conceptual/logical/physical data warehouse design
- enterprise modeling
- data warehouse quality monitoring
- data warehouse refreshment methods
- architecture design
- data mart customization, etc.
Exemple : Architecture de DB2 Data
Warehouse
80
Conclusion






81
MDB Manipulations - among most important R & D directions
Other key-words:
– Interoperability
– Integration
– Distributed Heterogeneous DBs
– Data Warehouses
MSQL is a research vehicle + advanced for relational MBDs
The root for further resarch proposals
– MSQL with Integrity Constraints, IDL, SchemaSQL…
» For the latter, see especially ACM-TODS journal, Dec. 2001
Basic MSQL capabilities are in commercials DBMSs, Information Retrieval Ssytems, MDB
Access Systems, Data Warehouses, XML standard proposals…
– Others will follow
But there is still a lot to do
– in the industry and in research
Exercises and Research Problems








82
All these in the text ; mdb queries especially.
Express various elementary mdb queries using Amos, MsAccess (SQL and QBE), SQL Server,
DB2, Oracle, Interbase…
Invent your own instructive queries to BANKS
Under MsAccess, design an MDB Form for an elementary query and for a multiple query.
Explain how you did i in a short report.
Consider 3 attributes B1.T1.a, B2.T2.b, B3.T3.c. The attribute types are INT and unit of
measures are KG, G, mG. Consider that '=' operator has the usual mathematical meaning, with
the usual rounding up of values with a different precision. Prove or disprove that the usual
associativity of equijoins
(a JOIN b) JOIN c = a JOIN (b JOIN c)
does not hold anymore. Comment on the consequences for the current relational query
optimizers.
If you had to evaluate a JOIN b using manual unit conversion, would you rather convert a to b
or vice versa ?
Propose and justify a reasonable algorithm for the multiple join evaluation.
A unit conversion algorithm A may be a long calculus. Would you rather:
– apply A to every value V of a the manipulated table
– project or order the table first, then apply A once for every different value
Exercises and Research Problems

Propose an execution tree expressed in mdb algebra for the query of slide 30

Add unit conversion to your favorite query optimizer (Ph. D Thesis)
Try to express the example queries to Banks using SchemaSQL language. For each query,
present also the result. Can it be a multirelation ?
Try to express the example queries to Banks using IDL language. For each query, present also
the result. Can it be a multirelation ?
Consider that bnp.balance is in US$ and cic.balance is in FF. Consider that the exchange rate is
in some table ExRa in DB called Currency. Is it possible to find accounts with the same
balance in both DBs using a single MSQL query ?
Consider that to perform a multidatabase join A JON B one has to bring both tables into a
database. What are your options for an elementary MDB query processing, if there are
selections, joins, and projections ?
Idem, if you consider distributed join processing ?
Consider a multitable R = (R1, R2, R3), a table T and the query Q
select * from R where R.a = T.a and T.b = '123' ;






What are your option for Q's optimization ?

83
Propose an implementation for your favorite MSQL aggregate function (Ph. D. Thesis or a
part of it at least)