Design, Implementation, and Management THIRD EDITION

Database Systems: Design, Implementation, and Management
CHAPTER 10
Distributed Database Management System
Chapter Objectives
- Understand the concepts of a distributed DBMS
- Understand the transparency features of distributed databases
- Understand distributed database design issues
What Is A Distributed DBMS?
- Decentralization of business operations and globalization of businesses created a demand for distributing data and processes across multiple locations.
- Distributed database management systems (DDBMS) are designed to meet the information requirements of such multi-location organizations.
- A DDBMS manages the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.
- Distributed processing shares the database’s logical processing among two or more physically independent sites that are connected through a network.
DDBMS Advantages
- Data located near the site with greatest demand
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of single-point failure
- Processor independence
DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
- Greater difficulty in managing the data environment
- Increased training costs
Distributed Processing
Shares the database’s logical processing among physically independent, networked sites
Figure 10.1 Distributed Processing Environment
Distributed Database
A distributed database stores a logically related database over two or more physically independent sites connected via a computer network.
Figure 10.2
Distributed Database vs. Distributed Processing
- Distributed processing
  - Does not require a distributed database
  - May be based on a single database on a single computer
  - Copies or parts of the database processing functions must be distributed to all data storage sites
- Distributed database
  - Requires distributed processing
- Both
  - Require a network to connect components
Functions of DDBMS
- Application/end user interface
- Validation to analyze data requests
- Transformation to determine request components
- Query optimization to find the best access strategy
- Mapping to determine the data location
- I/O interface to read or write data
- Formatting to prepare the data for presentation
- Security to provide data privacy
- Backup and recovery
- DB administration
- Concurrency control
- Transaction management
What Is A Distributed DBMS?
Figure 10.3 Centralized Database Management System
Figure 10.4 Fully Distributed Database Management System
DDBMS Components
- Computer workstations that form the network system.
- Network hardware and software components that reside in each workstation.
- Communications media that carry the data from one workstation to another.
- Transaction processor (TP), which receives and processes the application’s data requests.
- Data processor (DP), which stores and retrieves data located at the site. Also known as the data manager (DM).
Distributed Database Components
Figure 10.5
Levels of Data & Process Distribution
Depending on the levels of data and process distribution, three practical configurations are possible; a fourth combination is logically unsound:
- SPSD: Single-site processing, single-site data (centralized)
- MPSD: Multiple-site processing, single-site data
- MPMD: Multiple-site processing, multiple-site data (fully distributed)
- SPMD: Single-site processing, multiple-site data (logically unsound)
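The four combinations above follow mechanically from two counts: how many sites do processing and how many sites hold data. A minimal Python sketch of that mapping is given here; the function name `classify` and the `NOTES` table are illustrative and not part of the chapter.

```python
def classify(process_sites: int, data_sites: int) -> str:
    """Return the configuration acronym for the given site counts."""
    if process_sites < 1 or data_sites < 1:
        raise ValueError("need at least one processing site and one data site")
    processing = "SP" if process_sites == 1 else "MP"  # single vs. multiple processing sites
    data = "SD" if data_sites == 1 else "MD"           # single vs. multiple data sites
    return processing + data


# Short descriptions taken from the list above.
NOTES = {
    "SPSD": "centralized",
    "MPSD": "multiple-site processing, single-site data",
    "MPMD": "fully distributed",
    "SPMD": "logically unsound",
}

for p, d in [(1, 1), (3, 1), (3, 3), (1, 3)]:
    label = classify(p, d)
    print(label, "-", NOTES[label])
```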
Levels of Data & Process Distribution
Single-Site Processing, Single-Site Data (SPSD)
- All processing is done on a single CPU or host computer.
- All data are stored on the host computer’s local disk.
- The DBMS is located on the host computer.
- The DBMS is accessed by dumb terminals.
- This is an example of a centralized DBMS.
Figure 10.6 Nondistributed (Centralized) DBMS
Levels of Data & Process Distribution
Multiple-Site Processing, Single-Site Data (MPSD)
- Typically, MPSD requires a network file server on which conventional applications are accessed through a LAN.
- A popular variation of the MPSD approach is known as a client/server architecture.
Figure 10.7 Multiple-Site Processing, Single-Site Data
Levels of Data & Process Distribution
Multiple-Site Processing, Multiple-Site Data (MPMD)
- A fully distributed DBMS with support for multiple DPs and TPs at multiple sites.
- A homogeneous DDBMS integrates only one type of centralized DBMS over the network.
- A heterogeneous DDBMS integrates different types of centralized DBMSs over a network.
Distributed DB Transparency
- A DDBMS ensures that database operations are transparent to the end user.
- The types of transparency are:
  - Distribution transparency
  - Transaction transparency
  - Failure transparency
  - Performance transparency
  - Heterogeneity transparency
Distribution Transparency
- Distribution transparency allows us to manage a physically dispersed database as though it were a centralized database.
- Three levels of distribution transparency:
  - Fragmentation transparency
  - Location transparency
  - Local mapping transparency
Distribution Transparency
Example:
Employee data (EMPLOYEE) are distributed over three locations: New York, Atlanta, and Miami. Depending on the level of distribution transparency support, three different cases of queries are possible.
Distributed DBMS: Employee Table
Fragment    Location
E1          New York
E2          Atlanta
E3          Miami
Distribution Transparency
When a DBMS supports fragmentation transparency, the user views a single logical database:
  SELECT *
  FROM EMPLOYEE
  WHERE SALARY > 50000;
Distribution Transparency
When the DBMS supports location transparency, the user needs to know the fragment names but need not know the actual location of the fragments:
  SELECT *
  FROM E1
  WHERE SALARY > 50000
  UNION
  SELECT *
  FROM E2
  WHERE SALARY > 50000
  UNION
  SELECT *
  FROM E3
  WHERE SALARY > 50000;
Distribution Transparency
When the DBMS supports local mapping transparency, the user needs to know the fragment names as well as the actual location of the fragments:
  SELECT *
  FROM E1 NODE NY
  WHERE SALARY > 50000
  UNION
  SELECT *
  FROM E2 NODE ATL
  WHERE SALARY > 50000
  UNION
  SELECT *
  FROM E3 NODE MIA
  WHERE SALARY > 50000;
Distribution Transparency
- Distribution transparency is supported by a distributed data dictionary, which captures the distributed global schema.
- A local transaction processor uses this global schema to translate user requests into subqueries (remote requests) that will be processed by different data processors.
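The translation from one user request into per-site subqueries can be sketched in Python with the standard `sqlite3` module. This is only an illustration under stated assumptions: three in-memory SQLite databases stand in for the New York, Atlanta, and Miami nodes, the `FRAGMENTS` and `LOCATIONS` dictionaries play the role of the data dictionary, and the column names, sample rows, and function names are hypothetical.

```python
import sqlite3

# Hypothetical data dictionary: logical table -> fragments, fragment -> node.
FRAGMENTS = {"EMPLOYEE": ["E1", "E2", "E3"]}
LOCATIONS = {"E1": "NY", "E2": "ATL", "E3": "MIA"}


def make_node(fragment, rows):
    """Create an in-memory database standing in for one remote data processor."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {fragment} (EMP_ID INTEGER PRIMARY KEY, SALARY INTEGER)")
    conn.executemany(f"INSERT INTO {fragment} VALUES (?, ?)", rows)
    return conn


# Sample data (invented for the sketch).
NODES = {
    "NY": make_node("E1", [(1, 62000), (2, 48000)]),
    "ATL": make_node("E2", [(3, 51000)]),
    "MIA": make_node("E3", [(4, 45000), (5, 70000)]),
}


def query_local_mapping(pairs, min_salary):
    """Local mapping transparency: the caller names each fragment and its node."""
    rows = []
    for fragment, node in pairs:
        # Fragment names come from the trusted dictionary, not from user input.
        sql = f"SELECT EMP_ID, SALARY FROM {fragment} WHERE SALARY > ?"
        rows.extend(NODES[node].execute(sql, (min_salary,)).fetchall())
    return sorted(rows)


def query_location(fragments, min_salary):
    """Location transparency: the caller names fragments; the dictionary finds the nodes."""
    return query_local_mapping([(f, LOCATIONS[f]) for f in fragments], min_salary)


def query_fragmentation(table, min_salary):
    """Fragmentation transparency: the caller names only the logical table."""
    return query_location(FRAGMENTS[table], min_salary)


print(query_fragmentation("EMPLOYEE", 50000))
```

Each higher level of transparency is built by consulting the dictionary on the caller's behalf, which is why all three calls return the same rows.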
Transaction Transparency
- A distributed transaction updates and/or requests data from multiple remote sites.
- Transaction transparency ensures that a transaction is completed only if all database sites involved in it complete their part of the transaction.
- It maintains the integrity of a distributed database.
- Example: giving a 5% raise to all employees in the previous example involves updating the database at multiple locations. If the transaction cannot be committed at one location, it must be rolled back at all locations.
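One common way to obtain this all-or-nothing behavior is a two-phase commit: every site first stages the change and votes, and the change is made permanent only if every vote is yes. The slides do not name a protocol, so the Python sketch below is an assumption for illustration; the `Site` class, its `can_commit` switch, and `give_raise` are hypothetical names, and salaries are whole numbers to keep the arithmetic exact.

```python
class Site:
    """One database site holding a fragment of employee salaries."""

    def __init__(self, name, salaries, can_commit=True):
        self.name = name
        self.salaries = dict(salaries)  # committed data
        self.pending = None             # staged, not yet committed
        self.can_commit = can_commit    # hypothetical switch to simulate a failure

    def prepare(self, raise_pct):
        """Phase 1: stage the update and vote yes (True) or no (False)."""
        if not self.can_commit:
            return False
        self.pending = {emp: pay * (100 + raise_pct) // 100
                        for emp, pay in self.salaries.items()}
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the staged update permanent."""
        self.salaries, self.pending = self.pending, None

    def rollback(self):
        """Phase 2 (some site voted no): discard the staged update."""
        self.pending = None


def give_raise(sites, raise_pct):
    """Apply the raise at every site, or at none of them."""
    votes = [site.prepare(raise_pct) for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.rollback()
    return False


sites = [Site("New York", {"E100": 50000}),
         Site("Miami", {"E300": 30000}, can_commit=False)]
print(give_raise(sites, 5), sites[0].salaries)
```

Because Miami cannot commit in this run, New York's staged raise is discarded as well and its committed salaries stay unchanged.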
Distributed DB Transparency
- Failure transparency ensures that the DDBMS continues to operate when a node fails.
- Performance transparency ensures that system performance does not degrade because of the distributed nature of the database.
  - Query optimization becomes very complex in a distributed database because data are fragmented and replicated across multiple remote nodes.
- Heterogeneity transparency allows the integration of different types of DBMSs (multi-vendor, multi-model) under a common global schema.
- The DDBMS transparently translates user requests from one local schema to another.
Distributed Database Design
- All design principles and concepts discussed in the context of a centralized database also apply to a distributed database.
- Three additional issues are relevant to the design of a distributed database:
  - Data fragmentation
  - Data replication
  - Data allocation
Data Fragmentation
- Data fragmentation allows us to break a single object (a database or a table) into two or more fragments.
- Three types of fragmentation strategies are available to distribute a table: horizontal, vertical, and mixed.
- Horizontal fragmentation divides a table into fragments consisting of sets of tuples (rows).
  - Each fragment has unique rows and is stored at a different node.
  - Example: A bank may distribute its customer table by location.
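Horizontal fragmentation can be sketched in Python as grouping rows by the value of one column. The sample customers, the column names, and the function name `horizontal_fragments` are made up for the illustration.

```python
from collections import defaultdict


def horizontal_fragments(rows, key):
    """Split rows into disjoint fragments, one per distinct value of `key`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


# Invented sample data for a bank's customer table.
customers = [
    {"cust_id": 1, "name": "Ann", "location": "Texas"},
    {"cust_id": 2, "name": "Bob", "location": "Florida"},
    {"cust_id": 3, "name": "Cy", "location": "Texas"},
]

by_location = horizontal_fragments(customers, "location")
for location, fragment in by_location.items():
    print(location, [row["cust_id"] for row in fragment])
```

Every row lands in exactly one fragment, so the union of the fragments reproduces the original table.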
Data Fragmentation
- Vertical fragmentation divides a table into fragments consisting of sets of columns.
  - Each fragment is located at a different node and consists of unique columns, with the exception of the primary key column, which is common to all fragments.
  - Example: The Customer table may be divided into two fragments. One fragment, consisting of Cust ID, name, and address, may be located in the Service building; the other, with Cust ID, credit limit, balance, and dues, may be located in the Collection building.
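A short Python sketch of vertical fragmentation follows. It projects each row onto the primary key plus a chosen set of columns, and shows that the shared key is what allows the fragments to be joined back together. The sample row and the function names are illustrative assumptions.

```python
def vertical_fragment(rows, pk, columns):
    """Keep the primary key plus the listed columns from every row."""
    return [{c: row[c] for c in (pk, *columns)} for row in rows]


def rejoin(left, right, pk):
    """Rebuild full rows by matching the two fragments on the primary key."""
    right_by_pk = {row[pk]: row for row in right}
    return [{**row, **right_by_pk[row[pk]]} for row in left]


# Invented sample data.
customers = [
    {"cust_id": 1, "name": "Ann", "address": "12 Elm St",
     "credit_limit": 5000, "balance": 1200, "dues": 0},
]

service = vertical_fragment(customers, "cust_id", ["name", "address"])
collection = vertical_fragment(customers, "cust_id", ["credit_limit", "balance", "dues"])
print(service)
print(collection)
print(rejoin(service, collection, "cust_id") == customers)
```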
Data Fragmentation
- Mixed fragmentation combines the horizontal and vertical strategies.
- A fragment may consist of a subset of rows and a subset of columns of the original table.
- Example: The Customer table may be divided by state and grouped by columns. The Service building in Texas will store customer service related information for customers from Texas.
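Mixed fragmentation is a row filter followed by a column projection. The Python sketch below assumes a hypothetical customer table with a `state` column; the function name `mixed_fragment` and the sample rows are invented.

```python
def mixed_fragment(rows, pk, columns, predicate):
    """Select the rows matching `predicate`, then keep only pk + columns."""
    return [{c: row[c] for c in (pk, *columns)} for row in rows if predicate(row)]


# Invented sample data.
customers = [
    {"cust_id": 1, "name": "Ann", "address": "12 Elm St", "state": "Texas", "balance": 1200},
    {"cust_id": 2, "name": "Bob", "address": "9 Bay Rd", "state": "Florida", "balance": 300},
]

# Fragment for the Service building in Texas: Texas rows, service columns only.
texas_service = mixed_fragment(
    customers, "cust_id", ["name", "address"], lambda row: row["state"] == "Texas"
)
print(texas_service)
```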
Data Replication
- Data replication involves storing multiple copies of a fragment in different locations. For example, one copy may be stored in New York and another in San Francisco.
- It improves response time and data availability.
- Data replication requires the DDBMS to maintain data consistency among the replicas.
- A fully replicated database stores multiple copies of each database fragment at multiple sites.
- A partially replicated database stores multiple copies of some database fragments at multiple sites.
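The consistency requirement can be illustrated with the simplest possible policy: apply every write to all replicas before returning, and let reads use any one copy. This write-all policy is an assumption chosen for brevity, not something the chapter prescribes, and the class name `ReplicatedFragment` is hypothetical.

```python
class ReplicatedFragment:
    """A fragment stored as identical copies at several sites."""

    def __init__(self, sites, data):
        # Each site gets its own independent copy of the data.
        self.replicas = {site: dict(data) for site in sites}

    def write(self, key, value):
        """Apply the write to every replica so the copies never diverge."""
        for copy in self.replicas.values():
            copy[key] = value

    def read(self, key, site):
        """Read from the copy at the given (for example, nearest) site."""
        return self.replicas[site][key]

    def is_consistent(self):
        """True when all replicas hold the same data."""
        copies = list(self.replicas.values())
        return all(copy == copies[0] for copy in copies)


fragment = ReplicatedFragment(["New York", "San Francisco"], {"E100": 50000})
fragment.write("E100", 52500)
print(fragment.read("E100", "San Francisco"), fragment.is_consistent())
```

Reads become local and survive the loss of one site, which is the response time and availability benefit; the cost is that every write touches every copy.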
Data Allocation
- The data allocation decision involves determining the location of the fragments so as to achieve the design goals of cost, response time, and availability.
- Three data allocation strategies are: centralized, partitioned, and replicated.
- A centralized allocation strategy stores the entire database in a single location.
- A partitioned strategy divides the database into disjoint parts (fragments) and allocates the fragments to different locations.
- In a replicated strategy, copies of one or more database fragments are stored at several sites.
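The three strategies differ only in how fragments are mapped to sites, which a small Python function can make concrete. The sketch makes two simplifying assumptions that are not from the chapter: the partitioned strategy assigns fragments round-robin, and the replicated strategy copies every fragment to every site (full replication). The function name `allocate` and the site labels are illustrative.

```python
def allocate(fragments, sites, strategy):
    """Return a mapping of site -> list of fragments stored there."""
    placement = {site: [] for site in sites}
    if strategy == "centralized":
        # Entire database at a single location (here, the first site).
        placement[sites[0]] = list(fragments)
    elif strategy == "partitioned":
        # Disjoint fragments, each stored at exactly one site (round-robin).
        for i, fragment in enumerate(fragments):
            placement[sites[i % len(sites)]].append(fragment)
    elif strategy == "replicated":
        # Simplest case: every site holds a copy of every fragment.
        for site in sites:
            placement[site] = list(fragments)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return placement


for strategy in ("centralized", "partitioned", "replicated"):
    print(strategy, allocate(["E1", "E2", "E3"], ["NY", "ATL", "MIA"], strategy))
```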