Database Systems: Design, Implementation, and Management

The Evolution of Distributed DBMS
 Social and Technical Changes in the 1980s

Business operations became more decentralized
geographically.

Competition increased at the global level.

Customer demands and market needs favored a
decentralized management style.

Rapid technological change created low-cost
microcomputers. The LANs became the basis for
computerized solutions.

The large number of applications based on DBMSs and
the need to protect investments in centralized DBMS
software made the notion of data sharing attractive.
The Evolution of Distributed DBMS
 Two Database Requirements in a Dynamic Business
Environment:

Quick ad hoc data access became crucial in the quick-response decision-making environment.
 The decentralization of management structure based on
the decentralization of business units made
decentralized multiple-access and multiple-location
databases a necessity.
 Developments in the 1990s affecting DBMSs

The growing acceptance of the Internet and the World
Wide Web as the platform for data access and
distribution.
 The increased focus on data analysis that led to data
mining and data warehousing.
The Evolution of Distributed DBMS
 DDBMS Advantages

Data are located near the “greatest demand” site.
Faster data access
Faster data processing
Growth facilitation
Improved communications
Reduced operating costs
User-friendly interface
Less danger of a single-point failure
Processor independence

 DDBMS Disadvantages

Complexity of management and control
Security
Lack of standards
Increased storage requirements
Distributed Processing
and Distributed Database
 Distributed processing shares the database’s logical
processing among two or more physically
independent sites that are connected through a
network. (See Figure 10.1)
 Distributed database stores a logically related
database over two or more physically independent
sites connected via a computer network. (See Figure
10.2)
Distributed Processing Environment
Figure 10.1
Distributed Database Environment
Figure 10.2
Distributed Processing
and Distributed Database
 Distributed processing does not require a distributed
database, but a distributed database requires
distributed processing.
 Distributed processing may be based on a single
database located on a single computer. In order to
manage distributed data, copies or parts of the
database processing functions must be distributed to
all data storage sites.
 Both distributed processing and distributed
databases require a network to connect all
components.
What Is A Distributed DBMS?
 A distributed database management system (DDBMS)
governs the storage and processing of logically
related data over interconnected computer systems
in which both data and processing functions are
distributed among several sites.
What Is A Distributed DBMS?
 Functions of a DDBMS

Application interface

Validation to analyze data requests

Transformation to determine request’s components

Query-optimization to find the best access strategy

Mapping to determine the data location

I/O interface to read or write data

Formatting to prepare the data for presentation

Security to provide data privacy

Backup and recovery

Database administration

Concurrency control

Transaction management
Centralized Database Management System
Figure 10.3
Fully Distributed Database Management System
Figure 10.4
DDBMS Components
 Computer workstations that form the network
system.
 Network hardware and software components that
reside in each workstation.
 Communications media that carry the data from one
workstation to another.
 Transaction processor (TP) receives and processes
the application’s data requests.
 Data processor (DP) stores and retrieves data located
at the site. Also known as data manager (DM).
Distributed Database System Components
Figure 10.5
DDBMS Components
 DDBMS protocol determines how the DDBMS will:

Interface with the network to transport data and
commands between DPs and TPs.

Synchronize all data received from DPs (TP side) and
route retrieved data to the appropriate TPs (DP side).

Ensure common database functions in a distributed
system -- security, concurrency control, backup, and
recovery.
Levels of Data & Process Distribution
 Single-Site Processing, Single-Site Data (SPSD)

All processing is done on a single CPU or host
computer.
 All data are stored on the host computer’s local disk.
 The DBMS is located on the host computer.
 The DBMS is accessed by dumb terminals.
 Typical of most mainframe and minicomputer DBMSs.
 Typical of the 1st generation of single-user
microcomputer databases.
Table 10.1
Nondistributed (Centralized) DBMS
Figure 10.6
Levels of Data & Process Distribution
 Multiple-Site Processing, Single-Site Data (MPSD)

Typically, the MPSD scenario requires a network file server
running conventional applications that are accessed through a LAN.

A variation of the MPSD approach is known as a
client/server architecture.
Figure 10.7
Levels of Data & Process Distribution
 Multiple-Site Processing, Multiple-Site Data (MPMD)

Fully distributed DBMS with support for multiple DPs
and TPs at multiple sites.

Homogeneous DDBMSs integrate only one type of
centralized DBMS over the network.

Heterogeneous DDBMS integrate different types of
centralized DBMSs over a network. (See Figure
10.8)
Figure 10.8 Heterogeneous Distributed Database Scenario
Distributed DB Transparency
 DDBMS transparency features have the
common property of allowing the end user
to feel like the database’s only user.

Distribution transparency

Transaction transparency

Failure transparency

Performance transparency

Heterogeneity transparency
Distribution Transparency
 Distribution transparency allows us to manage a
physically dispersed database as though it were a
centralized database.
 Three Levels of Distribution Transparency

Fragmentation transparency
 Location transparency
 Local mapping transparency
Table 10.2
Distribution Transparency
 Example (Figure 10.9):
Employee data (EMPLOYEE) are distributed over three
locations: New York, Atlanta, and Miami.
Depending on the level of distribution transparency
support, three different cases of queries are possible:
Figure 10.9 Fragment Locations
Distribution Transparency
 Case 1: DB Supports Fragmentation Transparency
SELECT *
FROM EMPLOYEE
WHERE EMP_DOB < '01-JAN-1940';
Distribution Transparency
 Case 2: DB Supports Location Transparency
SELECT *
FROM E1
WHERE EMP_DOB < '01-JAN-1940';
UNION
SELECT *
FROM E2
WHERE EMP_DOB < '01-JAN-1940';
UNION
SELECT *
FROM E3
WHERE EMP_DOB < '01-JAN-1940';
Distribution Transparency
 Case 3: DB Supports Local Mapping Transparency
SELECT *
FROM E1 NODE NY
WHERE EMP_DOB < '01-JAN-1940';
UNION
SELECT *
FROM E2 NODE ATL
WHERE EMP_DOB < '01-JAN-1940';
UNION
SELECT *
FROM E3 NODE MIA
WHERE EMP_DOB < '01-JAN-1940';
Distribution Transparency
 Distribution transparency is supported by a distributed
data dictionary (DDD) or a distributed data catalog
(DDC).
 The DDC contains the description of the entire
database as seen by the database administrator.
 The database description, known as the distributed
global schema, is the common database schema
used by local TPs to translate user requests into
subqueries.
Performance Transparency and
Query Optimization
 The objective of a query optimization routine is to
minimize the total cost associated with the execution
of a request. The costs associated with a request are
a function of the:

Access time (I/O) cost involved in accessing the
physical data stored on disk.

Communication cost associated with the transmission
of data among nodes in distributed database systems.

CPU time cost associated with the processing overhead
of managing distributed transactions.
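In simplified form, these components are often combined into a single additive estimate (a common textbook-style cost model; real optimizers weight and estimate each term in system-specific ways):

total cost ≈ access (I/O) cost + communication cost + CPU processing cost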
Performance Transparency and
Query Optimization
 Query optimization must provide distribution
transparency as well as replica transparency.
 Replica transparency refers to the DDBMS’s ability to
hide the existence of multiple copies of data from the
user.
 Most of the query optimization algorithms are based
on two principles:

Selection of the optimum execution order
 Selection of sites to be accessed to minimize
communication costs
Performance Transparency and
Query Optimization
 Operation Modes of Query Optimization

Automatic query optimization means that the DDBMS
finds the most cost-effective access path without user
intervention.
 Manual query optimization requires that the
optimization be selected and scheduled by the end user
or programmer.
 Timing of Query Optimization


Static query optimization takes place at compilation
time.
Dynamic query optimization takes place at execution
time.
Performance Transparency and
Query Optimization
 Optimization Techniques by Information Used

Statistically based query optimization uses statistical
information about the database.
 In the dynamic statistical generation mode, the
DDBMS automatically evaluates and updates the
statistics after each access.
 In the manual statistical generation mode, the
statistics must be updated periodically through a
user-selected utility.

Rule-based query optimization algorithm is based on a
set of user-defined rules to determine the best query
access strategy.
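As a concrete illustration of manual statistical generation, many relational DBMSs provide a utility command that refreshes optimizer statistics on demand; the syntax below is product-specific (PostgreSQL and SQL Server shown), and a given DDBMS may expose a different utility:

-- PostgreSQL: recompute optimizer statistics for the CUSTOMER table
ANALYZE CUSTOMER;

-- SQL Server: equivalent manual refresh of statistics
UPDATE STATISTICS CUSTOMER;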
Distributed Database Design
 The design of a distributed database
introduces three new issues:

How to partition the database into fragments.

Which fragments to replicate.

Where to locate those fragments and replicas.
Data Fragmentation
 Data fragmentation allows us to break a single object
into two or more segments or fragments.
 Each fragment can be stored at any site over a
computer network.
 Data fragmentation information is stored in the
distributed data catalog (DDC), from which it is
accessed by the transaction processor (TP) to
process user requests.
 Three Types of Fragmentation Strategies:

Horizontal fragmentation
 Vertical fragmentation
 Mixed fragmentation
A Sample CUSTOMER Table
Figure 10.16
Data Fragmentation
 Horizontal Fragmentation
Division of a relation into subsets (fragments) of tuples
(rows). Each fragment is stored at a different node, and
each fragment has unique rows. Each fragment represents
the equivalent of a SELECT statement, with the WHERE
clause on a single attribute.
Table 10.3 Horizontal Fragmentation of the CUSTOMER Table By State
Table Fragments In Three Locations
Figure 10.17
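As an illustrative sketch, assuming the CUSTOMER rows are fragmented on a state attribute (the column name CUS_STATE and the state values below are hypothetical, not taken from Table 10.3), each horizontal fragment is the row subset returned by a query such as:

-- Fragment CUST_H1, stored at the site serving Tennessee customers
SELECT * FROM CUSTOMER WHERE CUS_STATE = 'TN';

-- Fragment CUST_H2, stored at the site serving Georgia customers
SELECT * FROM CUSTOMER WHERE CUS_STATE = 'GA';

-- Fragment CUST_H3, stored at the site serving Florida customers
SELECT * FROM CUSTOMER WHERE CUS_STATE = 'FL';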
Data Fragmentation
 Vertical Fragmentation
Division of a relation into attribute (column) subsets.
Each subset (fragment) is stored at a different node,
and each fragment has unique columns -- with the
exception of the key column. This is the equivalent of
the PROJECT statement.
Table 10.4 Vertical Fragmentation of the CUSTOMER Table
Vertically Fragmented Table Contents
Figure 10.18
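A sketch of the same idea for vertical fragmentation (the column names below are illustrative, not taken from Table 10.4): each fragment is a projection that repeats the primary key so the original rows can be reassembled with a join on that key:

-- Fragment CUST_V1: customer identification columns
SELECT CUS_NUM, CUS_NAME, CUS_ADDRESS FROM CUSTOMER;

-- Fragment CUST_V2: customer account columns (key repeated)
SELECT CUS_NUM, CUS_LIMIT, CUS_BALANCE FROM CUSTOMER;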
Data Fragmentation
 Mixed Fragmentation
Combination of horizontal and vertical strategies. A
table may be divided into several horizontal subsets
(rows), each one having a subset of the attributes
(columns).
Table 10.5 Mixed Fragmentation of the CUSTOMER Table
Figure 10.19
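Using the same illustrative column and state names as above, a mixed fragment is simply a projection of a row subset, for example:

-- One mixed fragment: account columns for Tennessee customers only
SELECT CUS_NUM, CUS_LIMIT, CUS_BALANCE
FROM CUSTOMER
WHERE CUS_STATE = 'TN';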
Data Replication
 Data replication refers to the storage of data copies
at multiple sites served by a computer network.

Fragment copies can be stored at several sites to serve
specific information requirements.
 The existence of fragment copies can enhance data
availability and response time, reducing
communication and total query costs.
Figure 10.20
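As one concrete, product-specific illustration, a read-only copy of a remote fragment can be maintained as a materialized view over a database link (Oracle-style syntax; the fragment and link names are hypothetical, and a full DDBMS must also propagate updates, as discussed next):

-- Keep a refreshable local copy of the CUST_H1 fragment stored at a remote site
CREATE MATERIALIZED VIEW CUST_H1_COPY
  REFRESH COMPLETE ON DEMAND
  AS SELECT * FROM CUST_H1@TN_SITE;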
Data Replication
 Mutual Consistency Rule

Replicated data are subject to the mutual consistency
rule, which requires that all copies of data fragments be
identical.

DDBMS must ensure that a database update is
performed at all sites where replicas exist.

Data replication imposes additional DDBMS processing
overhead.
Data Replication
 Replication Conditions

A fully replicated database stores multiple copies of all
database fragments at multiple sites.
 A partially replicated database stores multiple copies of
some database fragments at multiple sites.
 Factors for Data Replication Decision

Database Size
 Usage Frequency
Data Allocation
 Data allocation describes the process of
deciding where to locate data.
 Data Allocation Strategies

Centralized
The entire database is stored at one site.

Partitioned
The database is divided into several disjoint parts
(fragments) and stored at several sites.

Replicated
Copies of one or more database fragments are stored at
several sites.
Data Allocation
 Data allocation algorithms take into
consideration a variety of factors:

Performance and data availability goals

Size, number of rows, and the number of relationships that an
entity maintains with other entities.

Types of transactions to be applied to the database and the
attributes accessed by each of those transactions.
Client/Server vs. DDBMS
 Client/server architecture refers to the way
in which computers interact to form a
system.
 It features a user of resources, or client,
and a provider of resources, or server.
 The architecture can be used to implement a
DBMS in which the client is the transaction
processor (TP) and the server is the data
processor (DP).
Client/Server Architecture
 Client/Server Advantages

Client/server solutions tend to be less expensive.

Client/server solutions allow the end user to use the
microcomputer’s graphical user interface (GUI), thereby
improving functionality and simplicity.

There are more people with PC skills than with
mainframe skills.

The PC is well established in the workplace.

Numerous data analysis and query tools exist to
facilitate interaction with many of the DBMSs.

There are considerable cost advantages to off-loading
application development from the mainframe to PCs.
Client/Server Architecture
 Client/Server Disadvantages

The client/server architecture creates a more complex
environment with different platforms.

An increase in the number of users and processing
sites often paves the way for security problems.

The burden of training a wider circle of users and
computer personnel increases the cost of maintaining
the environment.
C. J. Date’s 12 Commandments for
Distributed Database
1. Local Site Independence
2. Central Site Independence
3. Failure Independence
4. Location Transparency
5. Fragmentation Transparency
6. Replication Transparency
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence