Distributed Databases
BUAD/American University
Distributed Databases
1
Definitions
• Distributed Database: A single logical
database that is spread physically across
computers in multiple locations (possibly
global) that are connected by a data
communications link.
• Decentralized Database: A collection of
independent databases on non-networked
computers (possibly global).
Reasons for Distributed Database
• Local business units want control over data.
• Consolidate data across local databases for
integrated decision making.
• Reduce telecommunications costs.
• Reduce the risk of telecommunications
failures.
Distributed Database Options
• Homogeneous - Same DBMS at each node.
• Heterogeneous - Different DBMSs at
different nodes.
• Systems - Supports some or all of the
functionality of one logical database.
Homogeneous, Non-Autonomous Database
• Data is distributed across all the nodes.
• Same DBMS at each node.
• All data is managed by the distributed
DBMS (no exclusively local data).
• All access is through one global schema.
• The global schema is the union of all the
local schemas.
Focus on the Following Heterogeneous Environment
• Data distributed across all the nodes.
• Different DBMSs may be used at each
node.
• Local access is done using the local DBMS
and schema.
• Remote access is done using the global
schema.
Objectives and Trade-offs
• Location Transparency - User does not have
to know the location of the data.
• Local Autonomy - Local site can operate
with its database when central site is down.
• Synchronous Distributed Database - All
copies of the same data are always identical.
• Asynchronous Distributed Database - Some
data inconsistency is tolerated.
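The difference between the synchronous and asynchronous options can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Replica`, `SynchronousDB`, and `AsynchronousDB` classes are hypothetical names, each replica is an in-memory dictionary, and network failures are ignored.

```python
class Replica:
    """One copy of the data held at a site (hypothetical class)."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class SynchronousDB:
    """Every write reaches all replicas before the write returns."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for replica in self.replicas:
            replica.data[key] = value


class AsynchronousDB:
    """A write updates the local replica; remote copies catch up later."""
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self.pending = []  # updates not yet propagated to remote sites

    def write(self, key, value):
        self.local.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Apply queued updates in order, then empty the queue.
        for key, value in self.pending:
            for replica in self.remotes:
                replica.data[key] = value
        self.pending.clear()


a, b = Replica("A"), Replica("B")
SynchronousDB([a, b]).write("price", 10)
sync_identical = a.data == b.data        # copies agree immediately

c, d = Replica("C"), Replica("D")
async_db = AsynchronousDB(c, [d])
async_db.write("price", 10)
stale_before = d.data.get("price")       # remote copy has not seen the write yet
async_db.propagate()
fresh_after = d.data.get("price")        # remote copy is now up to date
```

Between `write` and `propagate`, the asynchronous version exposes exactly the temporary inconsistency that the asynchronous option tolerates.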
Advantages of Distributed Database
• Increased reliability and availability.
• Local control over data.
• Modular growth.
• Lower communication costs.
• Faster response for certain queries.
Disadvantages of Distributed Database
• Software cost and complexity.
• Processing overhead.
• Data integrity exposure.
• Slower response for certain queries.
Options for Distributing a Database
• Data replication.
• Horizontal partitioning.
• Vertical partitioning.
• Combinations of the above.
Data Replication
• Advantages
– Reliability.
– Fast response.
– May avoid complicated distributed transaction integrity
routines (if replicated data is refreshed at scheduled
intervals).
– Decouples nodes (transactions proceed even if some
nodes are down).
– Reduced network traffic at prime time (if updates can
be delayed).
Data Replication
• Disadvantages
– Additional requirements for storage space.
– Additional time for update operations.
– Complexity and cost of updating.
– Integrity exposure: users may get incorrect data if
replicated data is not updated simultaneously.
• Therefore, replication works better for non-volatile
data.
Types of Data Replication
• Snapshot Replication
– Changes are periodically sent to a master site,
which sends an updated snapshot out to the other
sites.
• Near Real-Time Replication
– Broadcast update orders without requiring
confirmation.
• Pull Replication
– Each site controls when it wants updates.
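The three replication styles can be compared in a small Python sketch. Everything here is an assumption made for illustration: the `Site` and `Master` classes, the `pull` and `broadcast` helpers, and the sample keys are hypothetical, and all "sites" are plain in-memory objects rather than networked databases.

```python
class Site:
    """A site holding a local copy of the data (hypothetical class)."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.changes = {}  # local changes not yet sent to the master

    def update(self, key, value):
        self.data[key] = value
        self.changes[key] = value


class Master:
    """Master site used for snapshot replication (hypothetical class)."""
    def __init__(self):
        self.data = {}

    def collect(self, sites):
        # Periodically gather the changes made at each site.
        for site in sites:
            self.data.update(site.changes)
            site.changes.clear()

    def push_snapshot(self, sites):
        # Snapshot replication: the master sends a full copy to the sites.
        for site in sites:
            site.data = dict(self.data)


def pull(site, master):
    # Pull replication: the site itself decides when to refresh.
    site.data = dict(master.data)


def broadcast(key, value, sites):
    # Near real-time replication: send the update to every site
    # without waiting for any confirmation.
    for site in sites:
        site.data[key] = value


east, west, south = Site("east"), Site("west"), Site("south")
master = Master()

east.update("sku1", 5)
west.update("sku2", 7)
master.collect([east, west])
master.push_snapshot([east, west])   # south is not part of the push

south_before = dict(south.data)      # south has nothing until it pulls
pull(south, master)

broadcast("sku3", 9, [east, west, south])
```

In this sketch `south` stays empty until it chooses to call `pull`, which is the sense in which each site controls when it wants updates.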
Issues in Data Replication Use
• Data timeliness.
• Useful if DBMS cannot reference data from more than
one node.
• Batched updates can cause performance problems.
• Updates are complicated by heterogeneous DBMSs or
database designs.
• Telecommunications speeds may limit mass updates.
Horizontal Partitioning
• Different records of a file at different sites.
• Advantages
– Data stored close to where it is used.
– Local access optimization.
– Security.
• Disadvantages
– Accessing data across partitions.
– No data replication.
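As a rough illustration of horizontal partitioning, the Python sketch below splits the records of one file across sites by a partitioning column. The `customers` rows, the `region` column, and the `partition_horizontally` helper are hypothetical and chosen only for the example.

```python
customers = [
    {"id": 1, "name": "Ada", "region": "east"},
    {"id": 2, "name": "Ben", "region": "west"},
    {"id": 3, "name": "Cy", "region": "east"},
    {"id": 4, "name": "Di", "region": "west"},
]


def partition_horizontally(rows, column):
    """Place each record at the site named by its partitioning column."""
    sites = {}
    for row in rows:
        sites.setdefault(row[column], []).append(row)
    return sites


sites = partition_horizontally(customers, "region")

# Local access touches only the local partition.
east_rows = sites["east"]

# A query across partitions must visit every site and combine the results.
all_rows = [row for rows in sites.values() for row in rows]
```

Each record lives at exactly one site, which is why local access is cheap and why a cross-partition query has to combine results from every site.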
Vertical Partitioning
• Different columns of a file at different sites.
• Advantages and disadvantages are the same
as for horizontal partitioning except that
combining data across partitions is more
difficult because it requires joins.
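A minimal Python sketch of vertical partitioning follows. It assumes, as is common practice but not stated on the slide, that the primary key is repeated in every partition so the columns can be joined back together. The `employees` rows, the site names, and both helper functions are hypothetical.

```python
employees = [
    {"id": 1, "name": "Ada", "salary": 120},
    {"id": 2, "name": "Ben", "salary": 95},
]


def partition_vertically(rows, key, column_groups):
    """Store a different group of columns at each site, keeping the key in all."""
    return {
        site: [{key: row[key], **{c: row[c] for c in columns}} for row in rows]
        for site, columns in column_groups.items()
    }


def join_on_key(left, right, key):
    """Recombine two vertical partitions by matching rows on the key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


sites = partition_vertically(
    employees, "id", {"hr_site": ["name"], "payroll_site": ["salary"]}
)

# Reading a full record now needs a join across two sites.
rebuilt = join_on_key(sites["hr_site"], sites["payroll_site"], "id")
```

The join step is the extra cost that the slide points to when data has to be combined across vertical partitions.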
Five Distributed Database Organizations
• Centralized database, distributed access.
• Replication with periodic snapshot update.
• Replication with near real-time
synchronization of updates.
• Partitioned, one logical database.
• Partitioned, independent, non-integrated
segments.
Factors in Choice of Distributed Strategy
• Funding, autonomy, security.
• Site data referencing patterns.
• Growth and expansion needs.
• Technological capabilities.
• Costs of managing complex technologies.
• Need for reliable service.
Requirements for a Distributed DBMS
• Ability to locate data with a distributed data
dictionary.
• Determine the location from which to retrieve data
and the location at which to process each part of a
distributed query.
• Heterogeneous DBMS translation.
• Security, concurrency, query optimization, failure
recovery.
• Consistency of replicated data.
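The first two requirements can be sketched as a lookup against a distributed data dictionary. This is a toy illustration, not how any particular DBMS works: the `CATALOG` contents, the site names, and the `locate` and `plan` functions are all hypothetical, and the only "optimization" is preferring a local copy when one exists.

```python
# Hypothetical data dictionary: which sites hold a copy of each table.
CATALOG = {
    "customers": ["new_york"],
    "orders": ["new_york", "london"],   # replicated at two sites
    "inventory": ["tokyo"],
}


def locate(table, local_site, catalog=CATALOG):
    """Pick the site to read a table from, preferring the local copy."""
    sites = catalog.get(table)
    if not sites:
        raise KeyError(f"table {table!r} is not in the data dictionary")
    return local_site if local_site in sites else sites[0]


def plan(tables, local_site):
    """Decide, per table, where each part of a distributed query runs."""
    return {table: locate(table, local_site) for table in tables}


query_plan = plan(["orders", "inventory"], local_site="london")
```

The user names only the tables; the dictionary supplies the locations, so the query text does not change if a table later moves to another site.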