Advanced Databases CG096
Lecture 10: Distributed Databases –
Replication and Fragmentation
Nick Rossiter
1
Overview
Last week:
Saw difficulty in handling logical relationships
between distributed information
Potential solutions such as federated DDBMS
This week:
Look at an area where distributed databases are
extensively used
replication
for backup
for improving reliability of service
such as for a mirror site
2
Strategies for Data Allocation 1
Centralised
Single database, users distributed across network
High communication costs
All data access by users over network
No local references
Low reliability and low availability
Failure of central site leads to no access to entire
database system
Storage costs
No duplication so minimal
Performance
Likely to be unsatisfactory
3
Strategies for Data Allocation 2
Fragmented
Database distributed by fragments (disjoint views)
Low communication costs
Reliability and availability vary depending on failed site
Failure of one part loses fragments situated there
Other fragments continue to be available
Storage costs
Fragments located near their main users (if good design)
No duplication so minimal
Performance
Likely to be satisfactory – better than centralised as less
network traffic
4
Strategies for Data Allocation 3
Complete Replication
Database completely copied to each site
Communication costs
High for update, low for read
Need to propagate updates through system
High reliability and high availability
Can switch from failed site to another
High storage costs
Complete duplication
Performance
High for reads
Potentially poor for updates with propagation of updates
5
Strategies for Data Allocation 4
Selective Replication
Fragments are selectively replicated
Communication costs
Low (if good design)
Reliability and availability vary depending on failed site
Failure of one part loses fragments situated there
Other fragments continue to be available
Storage costs
Duplication of some fragments means that it is not
minimal but less than with complete replication
Performance
Likely to be satisfactory – better than centralised as less
network traffic
6
Fragmentation -- Further Details
A fragment is a view on a table.
Two main types
Horizontal (classification by value)
subset of tuples obtained by restrict operation
(algebra) or WHERE clause (SQL)
Vertical (classification by property)
subset of columns obtained by project operation
(algebra) or SELECT clause (SQL)
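As a minimal sketch in SQL (a Staff table with columns staffNo, name, salary and branchNo is assumed purely for illustration), the two types can be defined as views:

-- Horizontal fragment: restrict (WHERE) keeps a subset of rows
CREATE VIEW Staff_London AS
SELECT staffNo, name, salary, branchNo
FROM Staff
WHERE branchNo = 'B003';

-- Vertical fragment: project (SELECT list) keeps a subset of columns,
-- always retaining the primary key
CREATE VIEW Staff_Pay AS
SELECT staffNo, salary
FROM Staff;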
7
Other Forms of Fragmentation
Mixed (classification by both value and property)
both horizontal and vertical fragmentation are
used to obtain a single fragment
Derived (association)
an expression such as a join connects the
fragments
None
The whole of a table appears without change
in a view
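Continuing the assumed Staff example above (a PropertyForRent table with a staffNo column is also assumed for illustration), a mixed fragment combines a restriction with a projection, while a derived fragment of one table is defined through a join with a fragment of another:

-- Mixed fragment: restrict by value and project by property
CREATE VIEW Staff_London_Pay AS
SELECT staffNo, salary
FROM Staff
WHERE branchNo = 'B003';

-- Derived fragment: the properties handled by staff in the
-- Staff_London fragment, connected by a join
CREATE VIEW Property_London AS
SELECT p.*
FROM PropertyForRent p
JOIN Staff_London s ON p.staffNo = s.staffNo;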
8
Why fragment?
Most applications use only part of the data
in a table
To minimise network traffic, do not send
more data than is strictly necessary to any
site
Data not required by an application is not
visible to it, enhancing security
9
Factors against fragmentation
Performance
may be affected adversely by the need for
some applications to reconstruct fragments
into larger units
Integrity
more difficult to control with dependencies
possibly scattered across fragments
10
Three rules for fragmentation R1
R1) Completeness
If a table T is decomposed into fragments
every value found in T must be found in at least
one of the fragments
Otherwise data is lost
So no loss of data as a whole in
fragmentation
11
Three rules for fragmentation R2
R2) Reconstruction
It must be possible to reconstruct T from the
fragments using a relational operation
(typically a natural join)
Otherwise decomposition into fragments is
lossy
Functional dependencies are preserved
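A minimal illustration of R1 and R2 on the assumed Staff fragments (complementary vertical fragments Staff_Pay(staffNo, salary) and Staff_Names(staffNo, name, branchNo), and complementary horizontal fragments Staff_London and Staff_NotLondon, are assumed): horizontal fragments are reconstructed by a union, vertical fragments by a join on the shared primary key.

-- Vertical reconstruction: both fragments carry the primary key
-- staffNo, so the join is lossless
SELECT n.staffNo, n.name, p.salary, n.branchNo
FROM Staff_Names n
JOIN Staff_Pay p ON p.staffNo = n.staffNo;

-- Horizontal reconstruction: the union of the disjoint row subsets
SELECT * FROM Staff_London
UNION ALL
SELECT * FROM Staff_NotLondon;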
12
Three rules for fragmentation R3
R3) Disjointness
A data item may not appear in more than one
fragment unless it is a component of a
primary key
Avoids duplication and potential
inconsistency
although transaction management should prevent the latter
Primary key duplication allows
reconstructions to be made
13
Strategy for Designing a Partially
Replicated Distributed Database 1
Design global database using standard
methodology
Examine regional distribution of business.
What data should be held by each part of
business?
Some data is only used locally (not exported, as in
Federated DDBMS)
Some data is mostly used locally
14
Strategy for Designing a Partially
Replicated Distributed Database 2
Transactions give many clues as to ideal
placement of fragments
a transaction will perform slowly if it requires data
from different sites, unless the network connecting
them is very fast
a transaction performing much replication of updates
will perform slowly if there is frequent contention for
resources (locking)
frequently used transactions should be optimised;
infrequently used ones can be ignored
15
Strategy for Designing a Partially
Replicated Distributed Database 3
Decide on which relations are not to be
fragmented. These will normally be
replicated everywhere:
as they are then easy to update and integrity is easy to maintain.
Fragment remaining relations to suit:
locality
transactions
16
Transparencies in DDBMS
Transparency hides details at lower levels
(often implementation ones) from user
Four main types:
Distribution
Transaction
Performance
DBMS
17
Distribution Transparency
The DDB is perceived by the user as a
single, logical unit even though the data is:
distributed over several sites
fragmented in various ways
18
Significance of Full Distribution
Transparency
User does not need to know anything about
the distribution techniques
User addresses global schema in queries
User will, however, not understand why
some queries take longer than others
Highest form of distribution transparency is
termed
fragmentation transparency
19
Reduced forms of distribution
transparency
Location transparency
user needs to know about fragmentation but
not about placements at sites
user does not need to know which
replicas exist
Local mapping transparency
the most limited transparency
user needs to know about fragmentation and
sites
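As a rough sketch of the three levels, assuming the Staff table is fragmented by branch with a Staff_London fragment held at a site reached by a hypothetical database link london_site, the levels differ in what the query must name:

-- Fragmentation transparency: query the global schema only
SELECT name FROM Staff WHERE branchNo = 'B003';

-- Location transparency: name the fragment, but not its site
SELECT name FROM Staff_London;

-- Local mapping transparency: name both fragment and site
-- (Oracle database-link style; syntax varies by product)
SELECT name FROM Staff_London@london_site;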
20
Transaction Transparency
Ensures that all transactions maintain the
DDB’s integrity and consistency
Each transaction is divided into
subtransactions
one subtransaction for each site
usually execute subtransactions in parallel
gains in efficiency
More complicated than in centralised
system
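A rough sketch of one logical transaction split into subtransactions, using Oracle-style database links to two hypothetical sites (Account, site_A and site_B are assumed names); the DDBMS coordinates the single COMMIT across both sites:

-- Each UPDATE becomes a subtransaction at the site holding the data
UPDATE Account@site_A SET balance = balance - 100 WHERE accNo = 1;
UPDATE Account@site_B SET balance = balance + 100 WHERE accNo = 2;

-- One COMMIT covers both subtransactions (two-phase commit underneath),
-- so either both sites apply the change or neither does
COMMIT;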
21
Forms of Transaction
Transparency
Concurrency Transparency
all concurrent transactions (centralised and
distributed) execute independently
DDBMS must ensure that:
each subtransaction is executed in the normal
spirit of transactions (ACID)
the subtransactions as a whole, forming one
transaction, are executed ACID-style
the mixture of subtransactions and whole
transactions is executed ACID-style
22
Transactions -- problems with
replication
Failure Transparency
Users are unaware of problems, such as the one below,
encountered during transaction execution
If say 6 copies of a data item (at 6 sites) need to
be updated:
problems if only 5 are currently reachable
need to delay COMMIT until all sites are processed
otherwise data is inconsistent
unless delayed asynchronous update is allowed
23
Performance transparency
Requires:
the DDBMS to determine the most cost-effective way to handle a request
which fragment to use
(if replicated) which copy of a fragment to use
which site to use
avoidance of any performance degradation
compared with a centralised system
24
DBMS transparency
Hides knowledge of which DBMS is being
used
The most difficult transparency of all
particularly with heterogeneous models
See problems highlighted in lecture 9:
Global Schema Integration
Federated Databases
Multidatabase Languages
25
Replication Servers
Copying and maintenance of data on
multiple servers
Replication -- the process of generating and
reproducing multiple copies of data at one or
more sites
Servers – provide the file resources – the
distributed database
26
Benefits of Replication
Increased reliability
Better data availability
Potential for better performance (with good
design)
Warm stand-by
As in a mirror site, shadowing the actions of the main
site and cutting in if the main site crashes
27
Timing of Replication
Synchronous
Immediate according to some common signal such as
time
Ideal as it ensures immediate consistency
Assumes availability of all sites
Asynchronous
Independently with delays ranging from a few
seconds to several days
Immediate consistency is not achieved
More flexible as at any one time not all sites need to
be available
28
Types of data replicated
Across heterogeneous data models
Object replication
Mapping required (hard)
More varied than just base data
Also auxiliary structures such as indexes
Stored procedures and functions
Scalability
No volume restrictions
29
Replication administration
Subscription mechanism
Allows a permitted user to subscribe to
replicated data/objects
Initialisation mechanism
Allows for the initialisation of a target
replication
30
Ownership of Replicated Data 1
Master/Slave
Master site
Primary owner of replicated data
Sole right to change data
Publish and subscribe procedure
Asynchronous replication as slave sites receive
copies of the data
Slave site
Receives read-only data from the master site
Slaves can be used as mobile clients
31
Ownership of Replicated Data 2
Workflow Ownership
Flexible master designation
Dynamic ownership model
Right to update data moves along the chain of
command (replicating sites)
For example, as order is processed the master
right moves to each department in turn
32
Ownership of Replicated Data 3
Update-anywhere
Peer-to-peer model
Multiple sites can update data
Conflict resolution required
More complex implementation
33
Distribution and Replication in
Oracle 9i
Materialised views
Formerly known as snapshots
Views are updated by
Refresh mechanism
Variable frequency to suit application
Fast – based on identified changes
Complete – replaces existing data
Force – tries Fast; if not possible, does
Complete
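An illustrative Oracle-style materialised view over a database link (staff_copy, Staff and head_office are assumed names; the hourly schedule is an assumption for the sketch):

-- Replicate the remote Staff table to the local site, refreshing
-- asynchronously: FORCE tries a FAST (incremental) refresh and
-- falls back to COMPLETE when that is not possible
CREATE MATERIALIZED VIEW staff_copy
REFRESH FORCE
START WITH SYSDATE NEXT SYSDATE + 1/24  -- roughly every hour
AS SELECT * FROM Staff@head_office;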
34
Oracle 9i transparency
Does not support
Fragmentation transparency
Supports
Site (location) transparency
35
Summary of Distributed DBMS
An area under keen development as it improves:
Availability of data
Overall reliability of system
Performance (with good design)
However, disadvantages remain:
Implementation can be complex (expensive)
Heterogeneity in models is poorly handled
Use for replicating data is the main application today
36