Transcript Week7

Lecture 7


Distributed Data Bases:
Principles and Architectures
1
Distributed Data Base (DDB) –
Definition


A logically interrelated collection of shared
data, physically distributed over a
computer network.
Implies data description at two levels:


Global (the view of the whole);
Local (where data is actually held).
2
DDBMS (Distributed DBMS)

The software system that permits the
management of the distributed data base
and makes the distribution transparent to
users:

Transparent – users are unaware of the
underlying local structure:
Data requests do not specify distribution sites
- but they may notice performance differences (e.g.
if local data has to be moved to another site along
a slow line).

3
Characteristics of a DDB




Collection of logically related shared data.
Data is split into a number of fragments
(horizontal or vertical – restrict or project).
Fragments may be replicated.
Fragments / replicas are allocated to sites.


Fragments are in effect views;
Replicas are duplicates – only acceptable if
redundancy is controlled.
4
Why distribute?

Natural match of data with location

Can have each division, department or office
hold its own data with some degree of
autonomy.



Autonomy – to have control (self-determination,
self-rule)
Users can decide policies locally (devolved).
Still need global DBA to ensure entire system
works.
5
Why distribute? (cont.)

More flexible operation:

Improved availability
- one node failure does not bring the whole system down.

Improved reliability
- replication ensures that copies of data are still available if a
node fails.

Improved performance
- accessing most data locally reduces network overheads.

Readily handle expansion
- can add new nodes with local schema
- followed by simple adjustments to global definition.
6
Problems with Distribution

Complexity





Global and local schema must be integrated.
Design techniques involve more stages.
Replications rigorously handled.
Network needs to be robust.
Costs
More people effort needed to handle the complexity
– although cheaper to buy power with smaller
machines rather than larger ones.

7
Problems with Distribution
(cont.)

Security


Integrity


Many more potential access points for wouldbe violators.
Need to ensure that combination of local and
global constraints gives the required effect.
Experience
Fairly immature technology
– not yet translated to standards.

8
Homogeneous and
Heterogeneous DDBMS


A homogeneous DDBMS uses the same
database product at all sites.
A heterogeneous DDBMS uses different
data base products at various sites
– may arise from corporate mergers.
9
Degrees of Heterogeneity
 Same software, different hardware can be



handled fairly easily.
Oracle 9i : Oracle 8i – differences slight.
Oracle 9i : SQL Server – same underlying
relational model, different syntax in places.
Oracle 9i : Objectivity – object-relational
(SQL-1999) and ODMG respectively,
different underlying model.
10
Interoperability


Ability to work with each other.
In a loosely coupled environment:
Full details of each system not needed
– BUT need to have interfaces for reliably exchanging
messages without error or misunderstanding


Solutions:



Standardized specifications;
Mediation.
Differences in implementation

may still lead to breakdowns in communication.
11
Simple Problem in
Interoperability - 1

Two schemas in SQL-1999:
A
author varchar2(50),
title varchar2(300),
keyword1 varchar2(30),
keyword2 varchar2(30);
B
author_surname varchar2(40),
author, initials varchar2(10),
title varchar2(200),
keywd keywordarr;
CREATE TYPE keywordarr AS
VARRAY(8) OF varchar2(30);

Note: homogeneous model – both SQL-1999 – but
difficulties.
12
Different Standards

e.g. Names:




Person (surname, first_name, …)
or Person (first_name, surname, …)
or Person (name, …)
First two may easily be made equivalent
but convention in third needs to be
understood.

Note also possibilities of A.N.Other, AN
Other, A N Other, etc.
13
Possible Solutions


In schema B, define a function which amalgamates the two
parts of author into one value.
 Will need to look manually at format of author in
schema A.
 If format inconsistent, need some pre-processing.
Other inconsistencies require decisions:




Fixed two entries for keyword vs. array dimension 8.
Different name for keyword attribute.
Different size for title fields (presumably adopt higher).
In a heterogeneous environment, we need also to relate
schema constructions, e.g. is CLASS the same as TABLE?
14
Simple Problem in
Interoperability - 2

Homogeneous Models
The same information may be held as attribute
name, relation name or a value in different data
bases.
 e.g. library fines

could be held in a dedicated relation
Fine (amount, borrower_id)
– or as an attribute of Loan (id, isbn, date_out, fine)
– or as a value Charge (1.25, ‘fine’).

15
Architectures for Interoperability 1
1. Global schema integration

Produces a single new schema (C) for the
different information systems with schemas
(A, B).
C
A
B
16
Global Schema Integration

Advantages


Transparent to end users – appears as a single
information system.
Disadvantages




Difficult to perform integration – needs
human understanding.
Local autonomy lost.
Static – does not evolve automatically.
Tightly-coupled.
17
Architectures for Interoperability 2
2. Federated Data Base Systems






Less tightly coupled schema than in 1.
Each service specifies sharable objects
through an export schema.
Common data model.
Internal command language.
Decentralised control (local autonomy).
5-level architecture for federated system.
e.g. Objectivity as Federated OODBMS
18
FDBMS Terminology



IS – Internal Schema defining layout on
disk of a conceptual schema
CS – Conceptual Schema defining logical
data base (e.g. relational – tables,
attributes, domains).
ES – External Schema defining views on
conceptual schema.
19
Federated DBMS – 5-level
Architecture
Global ES
Global ES
Global CS
Local ES
Local ES
Local CS
Local CS
Local IS
Local IS
DB
DB
20
Federated Data Base:
Loosely coupled

Created by users.



AE, BE are export schemas.
V is a view.
A, B are base schemas, retaining autonomy over
those parts not exported.
V
AE
A
BE
B
21
Federated Data Bases:
Tightly coupled




Created by administrators.
Global schema integration on all export
schemas.
More formal than loosely-coupled.
Much effort to resolve semantic
inconsistencies.
22
Federated Data Base Systems –
General Advantages



Local autonomy preserved.
Not all data needs to be integrated.
Provide meta-data structures for views
(external and export schema, data
dictionary).
23
Federated Database Systems Disadvantages by Approach

Loosely coupled
Duplication by different users in building views.
 Updating data defined in views can be difficult.


Tightly coupled

Similar to global schema integration:
Complex, difficult to make changes
dynamically.
Much effort needed to resolve semantic
inconsistencies.
24
Multi-Data-Base Language
Approach





No attempt at schema integration.
All sites maintain complete autonomy.
The various schemas can be heterogeneous,
inconsistent w.r.t. services provided, and duplicate
information in different ways.
Language (e.g. MSQL) is used to integrate data
bases at run time.
Relational model used as Common Data Model.
25
Multi-Data-Base Language
Approach


A, B are schemas.
MSQL is the run-time language.
MSQL
A
B
26
Multi-Data-Base Language
Approach – Advantages




No preparatory work to understand
semantics of schema.
Dynamic – access latest versions.
Very skilled users can succeed in reaching
their goals.
Interesting work on multi-data-base
dependencies.
27
An Example Multi-Data-Base
Language

MSQL (Multidatabase SQL)
Biased towards the relational model.
 Illustrates problems.


Consider 2 data bases:
Each on publications of a computing society;
 and query:
 “What is the name, e-mail address, title for
each publication of an author appearing in both
of the society’s data bases?”

28
MSQL Schema

Schema 1 (for AIIA Database):





Contacts (PersonID, Name, Email, …)
Conference (Name, Type, …)
Attendees (ID, Conf_ID, Speaker, …)
Publ_Papers (P_ID, Title, Author_ID, …)
Schema 2 (for IFIP Database):




Member_Socs (Soc_Name, …)
Conf (Conf_ID, …)
Publ_Papers (P_Ref, Title, Conf_Ref, …)
Authors (Name, Email, Paper_ID, …)
Underlined attributes are primary keys.
Attributes in italics are foreign keys.
29
MSQL for Query
USE AIIA, IFIP
SELECT Name, Email, Title
FROM Authors,
IFIP.Publ_Papers IFIP_Paper,
Contacts,
AIIA.Publ_papers AIIA_Paper
WHERE Authors.Name = Contacts.Name
AND Contacts.Person_ID = AIIA_Paper.Author_ID
AND Authors.Paper_ID = IFIP_Paper.P_Ref;
The USE clause declares the multi-data-bases which are used as
qualifiers in the FROM clause to distinguish tables with the same name
(thereafter distinguished by aliasing).
Retrieves Name, E-mail address and Title from both data bases.
30
Potential Problems with MSQL
Are names and domains of corresponding
attributes the same?
 Can use LET command to create
equivalences of names, but this does not
solve domain incompatibility.
 What if one schema is not relational? The
E-R model is often used as a neutral schema
for translation and comparison of
heterogeneous features.

31
Multi-data-base Language –
Disadvantages in General




Distribution is not transparent.
Users must resolve inconsistencies
themselves.
Common language may restrict scope of
heterogeneity (relational bias).
Local autonomous systems may change
their schema freely (so existing queries
fail).
32
Comparison of Approaches

By coupling:

How tightly is the interoperable system
connected to its underlying systems?

By adaptability:

How freely can the interoperable system evolve
in line with the underlying schema?
By transparency:
 How much understanding of the interoperable
system do end-users need to have?

33
Comparison of Approaches
Approach Coupling
Adaptability
Transparency
Global Schema
Integration
Tight
Low
High
Federated
Data Bases
Medium
Medium
Medium
Multi-data-base
Languages
Loose
High
Low
34
Summary
Trend:
From Global Schema Integration
– through Federated Data Bases
– to Multi-data-base Language.


Towards looser coupling, higher
adaptability, and lower transparency.
35
Further Reading

Management of Heterogeneous and
Autonomous Database Systems
Elmagarmid, Ahmed
Rusinkiewicz, Marek
Sheth, Amit
Morgan Kaufmann (1999).
36