Transcript Chapter 12
Chapter 12
Distributed Database
Management Systems
Database Systems:
Design, Implementation, and Management,
Seventh Edition, Rob and Coronel
In this chapter, you will learn:
• What a distributed database management
system (DDBMS) is and what its components are
• How database implementation is affected by
different levels of data and process distribution
• How transactions are managed in a distributed
database environment
• How database design is affected by the
distributed database environment
The Evolution of Distributed Database
Management Systems
• Distributed database management system
(DDBMS)
– Governs storage and processing of logically
related data over interconnected computer
systems in which both data and processing
functions are distributed among several sites
The Evolution of Distributed Database
Management Systems (continued)
• Centralized database required that corporate
data be stored in a single central site
• Dynamic business environment and
centralized database’s shortcomings
spawned a demand for applications based on
data access from different sources at multiple
locations
DDBMS Advantages and Disadvantages
• Advantages include:
– Data are located near “greatest demand” site
– Faster data access
– Faster data processing
– Growth facilitation
– Improved communications
DDBMS Advantages and Disadvantages
(continued)
• Advantages include (continued):
– Reduced operating costs
– User-friendly interface
– Less danger of a single-point failure
– Processor independence
DDBMS Advantages and Disadvantages
(continued)
• Disadvantages include:
– Complexity of management and control
– Security
– Lack of standards
– Increased storage requirements
– Increased training cost
Characteristics of Distributed Database
Management Systems
• Application interface
• Validation
• Transformation
• Query optimization
• Mapping
• I/O interface
Characteristics of Distributed Database
Management Systems (continued)
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management
Characteristics of Distributed Database
Management Systems (continued)
• Must perform all the functions of centralized
DBMS
• Must handle all necessary functions imposed
by distribution of data and processing
– Must perform these additional functions
transparently to the end user
DDBMS Components
• Must include (at least) the following
components:
– Computer workstations
– Network hardware and software
– Communications media
– Transaction processor (application processor,
transaction manager)
• Software component found in each computer
that requests data
DDBMS Components (continued)
• Must include (at least) the following
components (continued):
– Data processor or data manager
• Software component residing on each
computer that stores and retrieves data located
at the site
• May be a centralized DBMS
Levels of Data and Process Distribution
Single-Site Processing,
Single-Site Data (SPSD)
• All processing is done on single CPU or host
computer (mainframe, midrange, or PC)
• All data are stored on host computer’s local
disk
• Processing cannot be done on end user’s
side of system
Single-Site Processing,
Single-Site Data (SPSD) (continued)
• Typical of most mainframe and midrange
computer DBMSs
• DBMS is located on host computer, which is
accessed by dumb terminals connected to it
• Also typical of first generation of single-user
microcomputer databases
Multiple-Site Processing,
Single-Site Data (MPSD)
• Multiple processes run on different computers
sharing single data repository
• MPSD scenario requires network file server
running conventional applications that are
accessed through LAN
• Many multiuser accounting applications,
running under personal computer network, fit
such a description
Multiple-Site Processing,
Multiple-Site Data (MPMD)
• Fully distributed database management
system with support for multiple data
processors and transaction processors at
multiple sites
• Classified as either homogeneous or
heterogeneous
• Homogeneous DDBMSs
– Integrate only one type of centralized DBMS
over a network
Multiple-Site Processing,
Multiple-Site Data (MPMD) (continued)
• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs
over a network
• Fully heterogeneous DDBMS
– Support different DBMSs that may even
support different data models (relational,
hierarchical, or network) running under
different computer systems, such as
mainframes and microcomputers
Distributed Database
Transparency Features
• Allow end user to feel like database’s only
user
• Features include:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
Distribution Transparency
• Allows management of physically dispersed
database as though it were a centralized
database
• Following three levels of distribution
transparency are recognized:
– Fragmentation transparency
– Location transparency
– Local mapping transparency
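The three levels differ in how much the user has to spell out when asking for data. The following is a minimal Python sketch of that idea, not how any particular DDBMS implements it. The table name EMPLOYEE, the fragment names E1 and E2, the site names, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical layout: an EMPLOYEE table split into two fragments,
# each stored at a different site.
SITES = {
    "site_a": {"E1": [{"emp_id": 1, "name": "Ann"}]},
    "site_b": {"E2": [{"emp_id": 2, "name": "Bob"}]},
}

# Catalog information the system can consult on the user's behalf.
FRAGMENTS_OF = {"EMPLOYEE": ["E1", "E2"]}
LOCATION_OF = {"E1": "site_a", "E2": "site_b"}


def query_local_mapping(fragment_sites):
    """Local mapping transparency: the caller names each fragment AND its site."""
    rows = []
    for fragment, site in fragment_sites:
        rows.extend(SITES[site][fragment])
    return rows


def query_location(fragments):
    """Location transparency: the caller names fragments; sites are looked up."""
    return query_local_mapping([(f, LOCATION_OF[f]) for f in fragments])


def query_fragmentation(table):
    """Fragmentation transparency: the caller names only the table."""
    return query_location(FRAGMENTS_OF[table])
```

All three calls return the same rows; what changes is how much knowledge of the physical layout the caller needs to supply.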
Transaction Transparency
• Ensures database transactions will maintain
distributed database’s integrity and
consistency
Distributed Requests and Distributed
Transactions
• Distributed transaction
– Can update or request data from several
different remote sites on network
• Remote request
– Lets single SQL statement access data to be
processed by single remote database
processor
• Remote transaction
– Accesses data at single remote site
Distributed Requests and Distributed
Transactions (continued)
• Distributed transaction
– Allows transaction to reference several
different (local or remote) DP sites
• Distributed request
– Lets single SQL statement reference data
located at several different local or remote DP
sites
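The four terms above can be told apart by two questions: how many SQL statements are involved, and how many DP sites each statement touches. The sketch below encodes that reading in Python. It is a simplification under stated assumptions: every site listed is treated as a DP the requester must reach, and the function name and input shape are hypothetical.

```python
def classify(requests):
    """Classify a unit of work.

    requests: non-empty list of sets; each set holds the DP sites
    touched by one SQL statement (hypothetical representation).
    """
    if not requests or any(len(sites) == 0 for sites in requests):
        raise ValueError("each request must touch at least one site")

    # One statement that spans several sites is a distributed request.
    if any(len(sites) > 1 for sites in requests):
        return "distributed request"

    # Every statement touches one site; do they all touch the same one?
    all_sites = set().union(*requests)
    if len(all_sites) > 1:
        return "distributed transaction"

    # Single site: one statement is a request, several form a transaction.
    return "remote request" if len(requests) == 1 else "remote transaction"
```

For example, `classify([{"B"}, {"C"}])` describes two statements that each go to one site, which is a distributed transaction, while `classify([{"B", "C"}])` describes one statement spanning both sites, which is a distributed request.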
Distributed Concurrency Control
• Multisite, multiple-process operations are
much more likely to create data
inconsistencies and deadlocked transactions
than are single-site systems
Two-Phase Commit Protocol
• Distributed databases make it possible for
transaction to access data at several sites
• Final COMMIT must not be issued until all
sites have committed their parts of
transaction
• Two-phase commit protocol requires each
individual DP’s transaction log entry be
written before database fragment is actually
updated
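A minimal in-memory sketch of the protocol follows, assuming the common formulation in which each DP first votes on whether it can commit (phase 1) and the coordinator then issues the final COMMIT or ABORT (phase 2). The class and function names are hypothetical, there is no real network or disk I/O, and failure handling is reduced to a single flag.

```python
class DataProcessor:
    """One DP site with a transaction log and a local database fragment."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a site that must refuse
        self.log = []                 # transaction log (written first)
        self.fragment = {}            # local data (updated only on commit)

    def prepare(self, txn_id, changes):
        """Phase 1: write the log entry, then vote."""
        if not self.can_commit:
            return False
        self.log.append((txn_id, "PREPARED", dict(changes)))
        return True

    def commit(self, txn_id):
        """Phase 2: apply the logged changes to the fragment."""
        for entry in self.log:
            if entry[0] == txn_id and entry[1] == "PREPARED":
                self.fragment.update(entry[2])
        self.log.append((txn_id, "COMMIT"))

    def abort(self, txn_id):
        """Phase 2 (failure path): record the abort, leave data untouched."""
        self.log.append((txn_id, "ABORT"))


def two_phase_commit(txn_id, work):
    """work: list of (DataProcessor, changes) pairs for one transaction."""
    votes = [dp.prepare(txn_id, changes) for dp, changes in work]
    if all(votes):
        for dp, _ in work:
            dp.commit(txn_id)
        return "COMMITTED"
    for dp, _ in work:
        dp.abort(txn_id)
    return "ABORTED"
```

Note that `prepare` appends to the log before any call to `commit` touches `fragment`, which mirrors the requirement that the log entry be written before the fragment is updated. If any site votes no, no fragment at any site changes.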
Performance Transparency
and Query Optimization
• Objective of query optimization routine is to
minimize total cost associated with execution
of request
• Costs associated with request are function of:
– Access time (I/O) cost
– Communication cost
– CPU time cost
• Must provide distribution transparency as well
as replica transparency
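As a rough illustration of the cost components, the sketch below compares candidate access plans and picks the cheapest. It assumes the total is a simple sum of the three costs, which is only one possible cost function, and the plan names and numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate access plan with its three cost components."""
    name: str
    io_cost: float    # access time (I/O) cost
    comm_cost: float  # communication cost between sites
    cpu_cost: float   # CPU time cost

    @property
    def total(self):
        # Assumption: total cost is the plain sum of the components.
        return self.io_cost + self.comm_cost + self.cpu_cost


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda p: p.total)


# Hypothetical plans for the same request.
PLANS = [
    Plan("ship whole table to requester", io_cost=10, comm_cost=80, cpu_cost=5),
    Plan("filter at remote site first", io_cost=12, comm_cost=15, cpu_cost=8),
]
```

With these made-up figures, filtering at the remote site wins because it cuts the communication cost, which tends to matter most when data must cross the network.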
Performance Transparency
and Query Optimization (continued)
• Replica transparency
– DDBMS’s ability to hide existence of multiple
copies of data from user
• Query optimization techniques include:
– Manual or automatic
– Static or dynamic
– Statistically based or rule-based algorithms
Distributed Database Design
• Data fragmentation
– How to partition database into fragments
• Data replication
– Which fragments to replicate
• Data allocation
– Where to locate those fragments and replicas
Data Fragmentation
• Breaks single object into two or more
segments or fragments
• Each fragment can be stored at any site over
computer network
• Information about data fragmentation is
stored in distributed data catalog (DDC), from
which it is accessed by TP to process user
requests
Data Fragmentation (continued)
• Strategies
– Horizontal fragmentation
• Division of a relation into subsets (fragments) of
tuples (rows)
– Vertical fragmentation
• Division of a relation into attribute (column)
subsets
– Mixed fragmentation
• Combination of horizontal and vertical
strategies
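The three strategies can be sketched on a small in-memory table. This is an illustrative Python example only: the sample rows, column names, and helper names are hypothetical, and the choice to repeat the key column in each vertical fragment is a common design practice assumed here so that rows can be matched up again.

```python
# Hypothetical sample relation.
EMPLOYEES = [
    {"emp_id": 1, "name": "Ann", "state": "TN", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "state": "FL", "salary": 42000},
    {"emp_id": 3, "name": "Cy", "state": "TN", "salary": 61000},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the subset of rows that satisfy a condition."""
    return [row for row in rows if predicate(row)]


def vertical(rows, key, columns):
    """Vertical fragment: a subset of columns, always including the key."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


# Horizontal: only the Tennessee rows, all columns.
tn_rows = horizontal(EMPLOYEES, lambda r: r["state"] == "TN")

# Vertical: only the pay-related columns, all rows.
pay = vertical(EMPLOYEES, "emp_id", ["salary"])

# Mixed: first split by rows, then by columns.
tn_pay = vertical(tn_rows, "emp_id", ["salary"])
```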
Data Replication
• Storage of data copies at multiple sites
served by computer network
• Fragment copies can be stored at several
sites to serve specific information
requirements
– Can enhance data availability and response
time
– Can help to reduce communication and total
query costs
Data Replication (continued)
• Replication scenarios
– Fully replicated database
• Stores multiple copies of each database
fragment at multiple sites
• Can be impractical due to amount of overhead
– Partially replicated database
• Stores multiple copies of some database
fragments at multiple sites
• Most DDBMSs are able to handle the partially
replicated database well
Data Replication (continued)
• Replication scenarios (continued)
– Unreplicated database
• Stores each database fragment at single site
• No duplicate database fragments
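The three scenarios can be distinguished by counting how many sites hold each fragment. The sketch below does exactly that. The function name and the placement mapping are hypothetical, introduced only to make the definitions concrete.

```python
def replication_scenario(placement):
    """Name the replication scenario for a fragment placement.

    placement: dict mapping each fragment name to the set of sites
    that store a copy of it (hypothetical representation).
    """
    copies = [len(sites) for sites in placement.values()]
    if not copies or min(copies) == 0:
        raise ValueError("every fragment must be stored at one site or more")

    if all(n == 1 for n in copies):
        return "unreplicated"          # each fragment at a single site
    if all(n > 1 for n in copies):
        return "fully replicated"      # every fragment has multiple copies
    return "partially replicated"      # only some fragments have copies
```

For example, `replication_scenario({"E1": {"A", "B"}, "E2": {"C"}})` returns "partially replicated" because only E1 is stored at more than one site.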
Data Allocation
• Deciding where to locate data
• Allocation strategies
– Centralized data allocation
• Entire database is stored at one site
– Partitioned data allocation
• Database is divided into several disjoint parts
(fragments) and stored at several sites
Data Allocation (continued)
• Allocation strategies (continued)
– Replicated data allocation
• Copies of one or more database fragments are
stored at several sites
• Data distribution over computer network is
achieved through data partition, data
replication, or combination of both
Client/Server vs. DDBMS
• Client/server architecture refers to way in
which computers interact to form system
• Features user of resources, or client, and
provider of resources, or server
• Can be used to implement a DBMS in which
client is the TP and server is the DP
Client/Server vs. DDBMS (continued)
• Client/server advantages
– Less expensive than alternate minicomputer
or mainframe solutions
– Allow end user to use microcomputer’s GUI,
thereby improving functionality and simplicity
– More people in job market have PC skills than
mainframe skills
– PC is well established in workplace
Client/Server vs. DDBMS (continued)
• Client/server advantages (continued)
– Numerous data analysis and query tools exist
to facilitate interaction with DBMSs available in
PC market
– Considerable cost advantage to offloading
applications development from mainframe to
powerful PCs
Client/Server vs. DDBMS (continued)
• Client/server disadvantages
– Creates more complex environment
• Different platforms (LANs, operating systems,
and so on) are often difficult to manage
– An increase in number of users and
processing sites often paves the way for
security problems
Client/Server vs. DDBMS (continued)
• Client/server disadvantages (continued)
– Possible to spread data access to much wider
circle of users
• Increases demand for people with broad
knowledge of computers and software
• Increases burden of training and cost of
maintaining the environment
C. J. Date’s Twelve Commandments for
Distributed Databases
• Local site independence
• Central site independence
• Failure independence
• Location transparency
• Fragmentation transparency
• Replication transparency
C. J. Date’s Twelve Commandments for
Distributed Databases (continued)
• Distributed query processing
• Distributed transaction processing
• Hardware independence
• Operating system independence
• Network independence
• Database independence
Summary
• Distributed database stores logically related data in
two or more physically independent sites connected
via computer network
• Distributed processing is division of logical database
processing among two or more network nodes
• Distributed databases require distributed processing
• Main components of DDBMS are transaction
processor and data processor
Summary (continued)
• Current database systems can be classified
by extent to which they support processing
and data distribution
• Homogeneous distributed database system
integrates only one particular type of DBMS
over computer network
• Heterogeneous distributed database system
integrates several different types of DBMSs
over computer network
Summary (continued)
• DDBMS characteristics are best described as set of
transparencies
• Transaction is formed by one or more database
requests
• Distributed concurrency control is required in network
of distributed databases
• Distributed DBMS evaluates every data request to
find optimum access path in distributed database
Summary (continued)
• The design of distributed database must
consider fragmentation and replication of data
• Database can be replicated over several
different sites on computer network
• Client/server architecture refers to way in
which two computers interact over computer
network to form a system