Transcript View File

Distributed Databases
RESOURCES:
Chapter 13 (Hoffer)
Chapter 10 (Rob Coronel)
Outline

Options for distributing data



Replication Schemes
Distributed DBMS Products
C.J. Date’s Twelve Commandments
Options for Distributing a
Database

Four basic strategies




Data Replication
Horizontal Partitioning
Vertical Partitioning
Combination of above
Data Replication



Data distribution option that supports a separate copies of
database stored at two or more sites
 Full database copies or fragments of database can be kept at
multiple sites
Replication may use either synchronous or asynchronous
technology
Replication Types
 Fully replicated DB


Partially replicated DB


Multiple copies of each database fragment stored at multiple sites
Multiple copies of some database fragments are stored at multiple
sites
Unreplicated DB

Each database fragment stored at a single site – there are no
duplicate fragments
Data Replication Schemes

Different schemes exist for updating data
copies



Snapshot Replication (Push strategy)
Near Real-time Replication (Push strategy)
Pull Replication
Snapshot Replication(1)

Assume multiple sites are updating the same data

Case1- Full refresh of database
 Updates from all replicated sites are periodically collected at a
master or primary site – forming consolidated result of all
changes
 With some DDBMS, this list of changes is collected in a snapshot
log - a table of row identifiers for records to go into snapshot
 A read-only snapshot of the replicated portion of the database is
taken at master site
 Finally, Snapshot is sent to each site where there is a copy

Case2 - Differential or Incremental refresh
 only those pages that have changed since the last snapshot, can
be sent.
 Snapshot log for each replicated table is joined with the
associated base to form the set of changed rows to be sent to
replicated sites
Snapshot Replication(2)


Applications (for Decision Support Sytems,
Datawarehousing and Data-mining) that do not
require current data can use this replication scheme
Cost of snapshot refresh depends on type of
snapshot


Simple Snapshot
 References all or portion of one table
 Can be handled by differential refresh in some DDBMS
Complex Snapshot
 References multiple tables usually from transactions that
involve joins
 Can be handled by time consuming full refresh in some
DDBMS
Near Real-time Replication



store and forward messages for each completed
transaction, can be broadcasted across the network
These messages inform all nodes to update data as
soon as possible without forcing a confirmation to
the originating node
One way to generate these messages is by using
triggers


A trigger can be stored at each local database
when a piece of replicated data is updated, trigger
executes corresponding update commands against remote
database replicas
Pull Replication




This is a pull strategy
The target, not the source node, controls when a
local database is updated
Local database determines when it needs to be
refreshed
For refresh, it requests a snapshot or the emptying
of an update message queue
Advantages and
Disadvantages of Replication

Advantages






Reliability
Fast Response
Possible avoidance of complicated distributed
transaction integrity routines
Node decoupling
Reduced network traffic
Disadvantages


Storage requirements
Complexity and cost of updating
See
pg 499
Replication suitability

Replication is suitable for


applications where most requests are read-only
and data are static (e.g. catalogs, telephone
directories, train schedules)
Not suitable for



Online applications (e.g. airline reservation)
ATM machine transactions
Financial activity based applications
See more on pg.501 – when to use replication
Options for Distributing a
Database

Four basic strategies




Data Replication
Horizontal Partitioning
Vertical Partitioning
Combination of above
Horizontal Partitioning


Data is distributed across different sites based on
one or more primary keys.
Rows of a relation are distributed to many sites




Rows of a table/relation are divided into subsets of rows
with each subset/fragment placed at a different node
Considering example of Bank, data specific to each branch
can be placed at the branch server
When a customer conducts transaction at home branch, it
is processed locally in minimum time
When transaction is conducted at another branch, it can be
sent for processing to home branch and response can be
sent back to initiating branch
Horizontal Partitioning


Used when an organization’s function is distributed,
but each site is concerned with only subset of data
For example, branch offices in an organization deal
mostly with a set of local customers and the related
customer data need not be accessed by other
branch offices.
Advantages of Horizontal
Partitioning

Efficiency


Local optimization


optimized performance for local access
Security


data is stored where it is used
data irrelevant to a particular site is not made
available
Ease of querying

its easy to combine rows by unions across horizontal
partitions
Disadvantages of Horizontal
Partitioning

Inconsistent access speed


Accessing data from several partitions can be significantly
different from local-only data access
Backup vulnerability


Since data are not replicated, when data at one site
becomes damaged or inaccessible, there is no alternative
site for that data
Data may be lost if proper backups are not performed
Options for Distributing a
Database

Four basic strategies




Data Replication
Horizontal Partitioning
Vertical Partitioning
Combination of above
Vertical Partitioning

Data is split by columns across multiple systems. The
primary key is replicated at each site. Hence, columns
of a relation are distributed to many sites



Columns are divided into subsets of columns with each
subset/fragment having unique columns except for the key
column.
The relations at each site must share a common domain for
original table reconstruction
Considering example of a bank, the bank can have customer
data



data columns specific to personal information can become one
subset,
the columns related to financial and account status can become
another subset
PK – Account no. will be present in both subsets
Advantages and Disadvantages of
Vertical Partitioning

Same as those for horizontal partitioning with the
exception that


Combining data across vertical partitions is more difficult
than that across horizontal partitions
Reason – the need to match PKs or other qualifications to
join rows across partitions
Options for Distributing a
Database

Four basic strategies




Data Replication
Horizontal Partitioning
Vertical Partitioning
Combination of above
Combination of Operations




There are unlimited combinations of the preceding
strategies.
Some data may be centrally stored, while other data
are replicated at various sites.
For a given relation both horizontal and vertical
partitions may be desirable for data distribution.
Figure (next slide) shows that



Engg. Parts, Accounting and Customer data are each
centralized at different locations
Standard parts data are partitioned horizontally among the
three locations
Price List is replicated at all the three locations
Combination of Operations
RegionA
RegionB
RegionA
parts
RegionB
parts
RegionC
Region
C parts
Engg
Parts
Account
ing
Custom
ers
PriceList
PriceList
PriceList
Comparison of Distributed DB
Design strategies

Reading Assignment – pg 505
Distributed DBMS Products(1)

IBM Products

DPropR (DataPropagator Relational)




Distributed Relational Database Architecture
(DRDA)


Works with DB2
Primary site and asynchronous updates
Read-only sites subscribe to primary sites
Heterogeneous databases
DataJoiner

Middleware to access non-IBM databases
Distributed DBMS Products(2)

Sybase products

Replication Server





SQL Anywhere


Primary site and distributed read-only sites
Update to read-only site as one transaction
Hierarchical replication
Data and stored procedures replicated
Mobile databases
OmniSQL

Heterogeneous databases
Distributed DBMS Products(3)

Oracle products

Table Snapshot Option


Periodic snapshots sent to read-only sites
Symmetric Replication Option




Asynchronous and synchronous with multiple
updatable copies and replication from any node to any
other node (bidirectional)
Differential refresh
DBA controls replication
Two-phase commit
Distributed DBMS Products(4)

Microsoft products

SQL Server




Primary site and distributed read-only sites
Publish and subscribe, with articles and publications
One database can pass copies of publications to other
sites
Mobile databases
Distributed DBMS Products(5)

Computer Associates products

CA-Ingres/Replicator




All database copies updatable
Hierarchical representation
DBA registers data for replication and other sites
subscribe
Ingres/Net and Ingres/Star



Decomposes query to distributed , homogeneous sites
Two-phase commit
Also used with non-ingres databases
C.J. Date’s Twelve Commandments


1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
C. J. Date’s Commandments describe a fully distributed database.
Although no current DDBMS conforms to all of them, the rules do
constitute a useful distributed database target. The rules are:
Local site Independence
Central site Independence
Failure Independence
Location Transparency
Fragmentation Transparency
Replication Transparency
Distributed Query Processing
Distributed Transaction Processing
Hardware Independence
Operating System Independence
Network Independence
Database Independence