Distributed Databases

Download Report

Transcript Distributed Databases

Distributed Databases
by Chien-Pin Hsu
CS157B Section 1
Nov 11, 2004
Dr. Sin-Min Lee
Distributed Database System
A distributed database system consists of
loosely coupled sites that share no physical
component
Appears to user as a single system
Database systems that run on each site are
independent of each other
Processing maybe done at a site other than the
initiator of request
Homogenous Distributed Database
Systems
 All sites have identical software
 They are aware of each other and agree to
cooperate in processing user requests
 It appears to user as a single system
An Homogenous Distributed Database
Systems example
 A distributed system connects three databases: hq, mfg, and sales
 An application can simultaneously access or modify the data in
several databases in a single distributed environment.
What can we do?
A single query from a Manufacturing client on
local database mfg can retrieve joined data from
the products table on the local database and the
dept table on the remote hq database.
 For a client application, the location and
platform of the databases are transparent.
Makes life easier!!
For example, if you are connected to database
mfg but want to access data on database hq,
creating a synonym on mfg for the remote dept
table enables you to issue this query:
SELECT *
FROM
dept
In this way, a distributed system gives the
appearance of native data access.
 Users on mfg do not have to know that the data
they access resides on remote databases.
Heterogeneous Distributed Database
System

In a heterogeneous distributed
database system, at least one of the
databases uses different schemas and
software.
 A database system having different schema may
cause a major problem for query processing.
 A database system having different software may
cause a major problem for transaction processing.
Distributed Data Storage
 Replication
– System maintains multiple copies of data, stored in
different sites, for faster retrieval and fault tolerance.
 Fragmentation
– Relation is partitioned into several fragments stored in
distinct sites
Replication and fragmentation can be combined
• Relation is partitioned into several fragments: system
maintains several identical replicas of each such
fragment.
Advantages of Replication
Availability: failure of site containing relation
r does not result in unavailability of r is
replicas exist.
Parallelism: queries on r may be processed
by several nodes in parallel.
Reduced data transfer: relation r is available
locally at each site containing a replica of r.
Disadvantages of Replication
Increased cost of updates: each replica of
relation r must be updated.
Increased complexity of concurrency
control: concurrent updates to distinct
replicas may lead to inconsistent data unless
special concurrency control mechanisms are
implemented.
• One solution: choose one copy as primary copy
and apply concurrency control operations on
primary copy.
Fragmentation
 Data can be distributed by storing individual
tables at different sites
 Data can also be distributed by decomposing a
table and storing portions at different sites –
called Fragmentation
 Fragmentation can be horizontal or vertical
Why use Fragmentation?
 Usage - in general applications use views so it’s appropriate to
work with subsets
 Efficiency - data stored close to where it is most frequently used
 Parallelism - a transaction can divided into several sub-queries to
increase degree of concurrency
 Security - data more secure - only stored where it is needed
Disadvantages:
Performance - may be slower
Integrity - more difficult
Horizontal Fragmentation
Each fragment, Ti , of table T contains a
subset of the rows
Each tuple of T is assigned to one or more
fragments.
Horizontal fragmentation is lossless
Horizontal Fragmentation Example
 A bank account schema has a relation
Account-schema = (branch-name, account-number, balance).
 It fragments the relation by location and stores each fragment
locally: rows with branch-name = `Hillside` are stored in the Hillside
in a fragment
Vertical Fragmentation
 Each fragment, Ti, of T contains a subset of the
columns, each column is in at least one fragment,
and each fragment includes the key:
Ti = attr_listi (T)
T = T1
T2 …..
Tn
 All schemas must contain a common candidate key (or
superkey) to ensure lossless join property.
 A special attribute, the tuple-id attribute may be added to
each schema to serve as a candidate key.
Vertical Fragmentation Example
A employee-info schema has a relation
employee-info schema = (designation, name,
Employee-id, salary).
It fragments the relation to put information in two
tables for security concern.
Commit Protocols
 Commit protocols are used to ensure
atomicity across sites
 Atomicity states that database modifications must
follow an “all or nothing” rule.
 a transaction which executes at multiple sites must
either be committed at all the sites, or aborted at all
the sites.
The Two-Phase Commit (2 PC) Protocol
What is this?
 Two-phase commit is a transaction protocol designed
for the complications that arise with distributed
resource managers.
 Two-phase commit technology is used for hotel and
airline reservations, stock market transactions,
banking applications, and credit card systems.
 With a two-phase commit protocol, the distributed
transaction manager employs a coordinator to
manage the individual resource managers. The
commit process proceeds as follows:
Phase1: Obtaining a Decision
Step 1  Coordinator asks all participants
to prepare to commit transaction Ti.
 Ci adds the records <prepare T> to the log
and forces log to stable storage (a log is a file
which maintains a record of all changes to the
database)
 sends prepare T messages to all sites where
T executed
Phase1: Making a Decision
Step 2  Upon receiving message, transaction
manager at site determines if it can commit the
transaction
 if not:
add a record <no T> to the log and send abort T
message to Ci
 if the transaction can be committed, then:
1). add the record <ready T> to the log
2). force all records for T to stable storage
3). send ready T message to Ci
Phase 2: Recording the Decision
 Step 1  T can be committed of Ci received a ready T
message from all the participating sites: otherwise T
must be aborted.
 Step 2  Coordinator adds a decision record, <commit
T> or <abort T>, to the log and forces record onto stable
storage. Once the record is in stable storage, it cannot
be revoked (even if failures occur)
 Step 3  Coordinator sends a message to each
participant informing it of the decision (commit or abort)
 Step 4  Participants take appropriate action locally.
Two-Phase Commit Diagram
Costs and Limitations
There have been two performance issues
with two phase commit:
– If one database server is unavailable, none of
the servers gets the updates.
– This is correctable through network tuning
and correctly building the data distribution
through database optimization techniques.