distributed_db_arch

Transcript distributed_db_arch

Advanced Databases CG096
Lecture 9: Distributed Databases –
Principles and Architectures
Nick Rossiter
1
Distributed Database (DDB) -Definition


A logically interrelated collection of shared
data physically distributed over a computer
network.
Implies data description at two levels:


Global (the view of the whole)
Local (where data is actually held)
2
Distributed DBMS


DDBMS
The software system that permits the
management of the distributed database
and makes the distribution transparent to
users

Transparent – users are unaware of the
underlying local structure
Data requests do not specify distribution sites
 But they may notice performance differences (e.g.
if local data moved to another site with slow line)

3
Characteristics of DDB




Collection of logically related shared data
Data is split into a number of fragments
(horizontal or vertical (select or project))
Fragments may be replicated
Fragments/replicas are allocated to sites


Fragments are in effect views
Replicas are duplicates – only acceptable if
redundancy is controlled
4
Why distribute?

Natural match of data with location

Can have each division, department or office
hold its own data with some degree of
autonomy



Autonomy – to have control (self-determination,
self-rule)
Users can decide policies locally (devolved)
Still need global DBA to ensure entire system
works
5
Why distribute? (continued)

More flexible operation

Improved availability


Improved reliability


Replication ensures that copies of data are still available if a
node fails
Improved performance


One node failure does not bring the whole system down
Accessing most data locally reduces network overheads
Readily handle expansion


Can add new nodes with local schema
Followed by simple adjustments to global definition
6
Problems with distribution

Complexity





Global and local schema must be integrated
Design techniques involve more stages
Replications rigorously handled
Network made robust
Costs

Although cheaper to buy power with smaller
machines rather than larger ones

More people effort in distributed than centralised approach
to handle the complexity
7
Problems with distribution
(continued)

Security


Integrity


Many more potential access points for would
be violators
Need to ensure that combination of local and
global constraints gives the required effect
Experience


Fairly immature technology
Not yet translated to standards
8
Homogeneous and
Heterogeneous DDBMS

Homogeneous DDBMS uses


the same database product at all sites
Heterogeneous DDBMS uses


different database products across its sites
may arise from corporate mergers
9
Degrees of heterogeneity vary
 Same software, different hardware can be



handled fairly easily
Oracle 9i : Oracle 8i – differences slight
Oracle 9i : SQL Server – same underlying
relational model, different syntax in places
Oracle 9i : Objectivity – object-relational
(SQL-1999) to ODMG, different
underlying model.
10
Interoperability


Ability to work with each other.
In loosely coupled environment:



Solutions:



full details of each system not needed BUT
need to have interfaces for reliably exchanging
messages without error or misunderstanding
standardized specifications
mediation
Differences in implementation:

may still lead to breakdowns in communication
11
Simple Problem in
Interoperability
Two schemas in SQL-1999
A
B
author varchar2(50),
author_surname varchar2(40),
author, initials varchar2(10),
title varchar2(300),
title varchar2(200),
keyword1 varchar2(30),
keywd keywordarr;
keyword2 varchar2(30);

CREATE TYPE keywordarr AS
VARRAY(8) OF varchar2(30);
Note: homogeneous model -- both SQL-1999 -- but
difficulties.
12
Different Standards

For example -- Names:



Person(surname, first_name, ..)
or Person(first_name, surname, …)
or Person(name, …)

First two may easily be made equivalent
but convention in third needs to be
understood.

Note also possibilities of A.N.Other, AN Other, A N Other.
13
Possible Solutions


In schema B define function which amalgamates the two
parts of author into one value.
 Will need to look manually at format of author in
schema A.
 If format inconsistent, need some pre-processing.
Other inconsistencies require decisions:




Fixed two entries for keyword versus array dimension 8.
Different name for keyword attribute
Different size for title fields (presumably adopt higher).
In heterogeneous environment, need also to relate schema
constructions. Is class same as table?
14
Simple Problem in
Interoperability 2

Homogeneous Models

The same information may be held as attribute
name, relation name or a value in different
databases

e.g. fines in library;



could be held in a dedicated relation Fine(amount,
borrowed_id)
or as an attribute Loan(id, isbn, date_out, fine)
or as a value Charge(1.25, ‘fine’)
15
Architectures for Interoperability 1
1. Global schema integration
Produces single new schema (C) for the
different information systems with schemas
(A, B).
C
A
B
16
Global Schema Integration

Advantages


Transparent to end users -- appears as single
information system
Disadvantages




Difficult -- needs human understanding to
perform integration
Local autonomy lost
Static - does not evolve automatically
Tightly-coupled
17
Architectures for Interoperability2
2. Federated Database Systems






Less tightly coupled schema (than in 1)
Each service through an export schema
specifies sharable objects
Common data model
Internal command language
Decentralised control (local autonomy)
Five-level architecture for federated system
e.g. Objectivity as Federated OODBMS
18
Federated DBMS - five-level
Architecture
Global ES
Global ES
Global CS
Local ES
Local ES
Local CS
Local CS
Local IS
Local IS
DB
DB
19
Terminology FDBMS



IS is Internal Schema defining layout on
disk of a conceptual schema
CS is Conceptual Schema defining logical
database (e.g. relational -- tables, attributes,
domains)
ES is External Schema defining views on
conceptual schema
20
Federated Databases: Looselycoupled


Created by users. AE, BE are export
schema.
V is view. A,B are base schema, autonomy
retained over that part of schema not
exported.
V
AE
A
BE
B
21
Federated Databases: TightlyCoupled




Created by administrators
Global schema integration on all export
schemas
More formal than loosely-coupled
Much effort to resolve semantic
inconsistencies
22
Federated Database Systems General Advantages



Preserves local autonomy
Not all data needs to be integrated
Provides metadata structures for views
(external and export schema, data
dictionary)
23
Federated Database Systems Disadvantages by Approach

Tightly-coupled


similar to global schema integration
1) complex, difficult to make changes
dynamically
2) much effort in resolving semantic
inconsistencies
Loosely-coupled
duplication by different users in building views
 updating data defined in views can be difficult

24
Multidatabase Language
Approach





No attempt at schema integration
All sites maintain complete autonomy
Various schema in services provided can be
heterogeneous, inconsistent and duplicate
information in different ways.
Language (e.g. MSQL) is used to integrate
databases at run time.
Relational data model used as Common Data
Model
25
Multidatabase Language
Approach - Diagram
MSQL
A,B are schema
MSQL is runtime
language
A
B
26
Multidatabase Language
Approach - Advantages




No preparatory work to understand
semantics of schema
Dynamic -- access latest versions
Very skilled users can succeed in reaching
their goals
Interesting work on multidatabase
dependencies
27
Example Multidatabase
Language

MSQL (Multidatabase SQL)
Biased towards relational model
 Illustrates problems


Consider 2 databases
Each on publications of a computing society
 And query:
 “What is the name, email, title for each
publication of an author appearing in both of
the society’s databases?”

28
MSQL - Schema

Schema 1 (for AIIA Database):
 Contacts (PersonID, Name, Email, …)
 Conference (Name, Type, …)
 Attendees(ID, Conf_ID, Speaker, …)
 Publ_Papers(P_ID, Title, Author_ID, …)

Schema 2 (for IFIP Database):




Member_Socs(Soc_Name, …)
Conf (Conf_ID, …)
Publ_Papers(P_Ref, Title, Conf_Ref, …)
Authors(Name, Email, Paper_ID, …)
Underlined attributes are primary key; attributes in italics are foreign key.
29
MSQL for Query
USE AIIA, IFIP
SELECT Name, Email, Title
FROM Authors,
IFIP.Publ_Papers IFIP_Paper,
Contacts,
AIIA.Publ_papers AIIA_Paper
WHERE Authors.Name = Contacts.Name
AND Contacts.Person_ID = AIIA_Paper. Author_ID
AND Authors.Paper_ID = IFIP_Paper.P_Ref;
The USE statement declares the multidatabases which are aliased in
the FROM statement to distinguish tables with the same name.
Retrieves Name, Email and Title from both databases.
30
Potential Problems with MSQL
Are domains on name comparable?
 Can use LET command to create
equivalencies of names but does not solve
domain mismatch.
 What if one schema not relational? EntityRelationship model often used as neutral
schema for translation and comparison of
heterogeneous features

31
Multidatabase Language Disadvantages in General




Distribution is not transparent
Users must resolve inconsistencies
themselves
Common language may restrict scope of
heterogeneity (relational bias)
Local autonomous system may change
schema freely (so that existing queries fail)
32
Comparison of Approaches

By coupling:

how tightly is the interoperable system
connected to its underlying systems

By adaptability:

the ability for the interoperable system to
evolve in line with underlying schema
By transparency:
 the need for the end-user to understand the
underlying schema

33
Comparison of Approaches
Coupling
Adaptability
Transparency
Global Schema
Integration
Tight
Low
High
Federated
Data Bases
Medium
Medium
Medium
Multidatabase
Languages
Low
High
Low
Approach
34
Summary
Trend:
From Global Schema Integration
Federated Database
Multidatabase Language
 of lower coupling, higher adaptability,
and lower transparency.

35
Further Reading

Management of Heterogeneous and
Autonomous Database Systems
Elmagarmid, Ahmed
Rusinkiewicz, Marek
Sheth, Amit
Morgan Kaufmann 1999.
36

distributed_db_arch

Transcript distributed_db_arch

Directory