Transcript ch13

Chapter 13
Client/Server Database
and Distributed Database
Fundamentals of Database Management Systems
by
Mark L. Gillenson, Ph.D.
University of Memphis
Presentation by: Amita Goyal Chin, Ph.D.
Virginia Commonwealth University
John Wiley & Sons, Inc.
Chapter Objectives

Describe the concepts and advantages of
the client/server database approach.

Describe the concepts and advantages of
the distributed database approach.

Explain how data can be distributed and
replicated in a distributed database.
13-2
Chapter Objectives

Describe the problem of concurrency control in
distributed database.

Describe the distributed join process.

Describe data partitioning in a distributed
database.

Describe distributed directory management.
13-3
Local Area Network (LAN)

An arrangement of
personal computers
connected together by
communications lines.

The PCs must be located
fairly close to each other.

Allows sharing of
resources such as
servers, printers, etc.
13-4
Local Area Network (LAN)

The PCs on a LAN can
certainly operate
independently, but they
can also communicate
with each other.

A gateway computer on
the LAN can link the LAN
and its PCs to other
LANs, to one or more
mainframe computers, or
to the Internet.
13-5
Two-Tiered Client/Server
Arrangement

Clients = PCs on the LAN

Server = powerful computer on the LAN

A shared database can be stored on a
LAN server so that all of the PCs on the
LAN can access it.
13-6
File Server Approach

When a LAN client needs to query, update, or
otherwise use a file on the server, the entire file
must be sent from the server to that client.

All of the querying, updating, or other processing
is then performed at the client.

If changed, the entire file is then shipped back to
the server.
13-7
DBMS Server Approach

The database is located at the server.

The processing is split between the client
and the server.

Result: there is much less data traffic on
the network.
13-8
Two-Tier Approach

Some databases can be
stored on a client PC’s
own hard drive while
other databases that the
client might access are
stored on the LAN’s
server.

Software has been
developed that makes the
location of the data
transparent to the user at
the client.
13-9
Two-Tier Approach

The user issues a query
at the client.

The software first checks
to see if the required data
is on the PC’s own hard
drive. If yes, the data is
retrieved from it.

If it is not on local drive,
the software looks for it
on the server.
13-10
Three-Tier Approach

If the software doesn’t find the data on the client PC’s
hard drive or on the LAN server, it can leave the LAN
through a gateway computer and look for the data on, for
example, a large, mainframe computer that may be
reachable from many LANs.
13-11
Three-Tier Approach

Another use of the term “Three-Tier Approach” with the
following three tiers:



The client PCs
Servers known as application servers
Other servers known as database servers
13-12
The Distributed Database
Concept

Instead of having one, centralized
database, we are going to spread the data
out among various cities on the distributed
network, each of which has its own
computer and data storage facilities.

All of this distributed data is still
considered to be a single logical database.
13-13
The Distributed Database
Concept

Location transparency - The user just
issues the query, and the result is
returned.
 A person
or process anywhere on the
distributed network queries the database.
 It
is not necessary to know where on the
network the data being sought is located.
13-14
Distributed DBMS

Distributed database management system
 Sophisticated
software
 Manages location transparency
13-15
Distributing the Data

Headquartered in NY,
a company’s
database consists of
6 large tables: A, B,
C, D, E, F.

With a centralized
database, all 6 tables
would be located in
NY.
13-16
Distributing the Data

The company has major sites in Los
Angeles, Memphis, New York, Paris, and
Tokyo.

The first and simplest idea in distributing
the data would be to disperse the six
tables among the five sites, perhaps
based on frequency of use of each table.
13-17
Distributing the Data

Tables A and B are kept
at New York

Table C is moved to
Memphis

Tables D and E are
moved to Tokyo

Table F is moved to Paris.
13-18
Distributing the Data

Paris employees can now access Table F
without incurring telecommunications
costs associated with accessing Table F in
NY.

Local autonomy - Paris employees, e.g.,
can take responsibility for Table F -- its
security, backup and recovery, and
concurrency control.
13-19
Distributing the Data:
Problems

When the database was centralized at New
York, a query issued at any of the sites that
required a join of two or more of the tables could
be handled in the standard way by the computer
at New York.

The result would then be sent to the site that
issued the query.

In the dispersed approach, a join might require
tables located at different sites!
13-20
Replicated Tables

Duplicated tables at two or more sites on
the network.

Advantages
 Availability
- during a site failure, data can still
be accessed at a replicated location.
 Local
access - Replicate table at a site
requiring frequent access.
13-21
Replicated Tables

Disadvantages
 Security
risk
 Concurrency
control - How do you keep data
consistent when it is replicated in tables on
three continents?
13-22
Full Data Replication

The maximum
approach of
replicating every table
at every site.

Great for availability

Great for joins
13-23
Full Data Replication

Worst for concurrency
control - every change to
every table has to be
reflected at every site.

Worst for security

Takes up a lot of disk
space
13-24
Partial Replication

Have a copy of the
entire database at
headquarters in New
York and have each
table replicated
exactly once at one of
the other sites.
13-25
Partial Replication

Improves availability each table is now at
two sites.

Security and
concurrency
exposures are limited.

Joins occur at NY.
13-26
Partial Replication

New York could tend to
become a bottleneck.

If a table is heavily used
in both Tokyo and Los
Angeles, it can only be
placed at one of the two
sites (plus the copy of the
entire database in New
York), leaving the other
with speed and telecom
cost problems.
13-27
Replication Principles

Place copies of tables at the sites that use
them most heavily in order to minimize
telecommunications costs.

Ensure that there are at least two copies
of important or frequently used tables to
realize the gains in availability.
13-28
Replication Principles

Limit the number of copies of any one
table to control the security and
concurrency issues.

Avoid any one site becoming a bottleneck.
13-29
Replication Principles
13-30
Concurrency Control in
Distributed Database

The “lost update” problem.

The protections that we discussed earlier
that can be put into place to handle the
problem of concurrent update in a single
table are not adequate to handle the new,
expanded problem in distributed database
systems.
13-31
Asynchronous Approach

If retrieved data does not necessarily have
to be up-to-the-minute accurate, we can
use “asynchronous” approaches to
updating replicated data.
13-32
Asynchronous Schemes

The site where the data was updated can send a
message to update to the other sites that contain
a copy of the same table.

One of the sites can be chosen to accumulate all
of the updates to all of the tables, and transmit
changes regularly.

Each table can have one of the sites be declared
the “dominant” site for that table, which
periodically transmits updates to the other sites.
13-33
Synchronous Approach

If retrieved data does have to be up-to-theminute accurate.

All data in replicated tables worldwide
must always be consistent, accurate, and
up-do-date.

Use two-phase commit.
13-34
Two-Phase Commit:
Prepare Phase

Each computer on the network has a special log file in addition to its
database tables.

The computer at the initiating site sends the updated data to the
other sites that have copies of the table to be updated.

The computers at the other sites record the changes in their logs
(but not in the actual database tables.)

These computers attempt to lock the database tables involved in the
update.

If they are successful (the tables are not busy and can be locked) they
inform the initiating site.
13-35
Two-Phase Commit:
Commit Phase

If all of the other sites reported they were
successful in logging the update and
locking the tables, the initiating site issues
instructions to transfer the update from the
logs to the actual database tables.
13-36
Two-Phase Commit

Either all of the replicated files have to be
updated or none of them must be updated.

A complex, costly, and time-consuming process.

The more volatile the data in the database, the
less attractive is this procedure for updating
replicated tables in the distributed database.
13-37
Distributed Joins

A query that is run from one of the computers in
a distributed database system that requires a
join of two or more tables that are not all at the
same computer.

The distributed DBMS must have its own built-in
expert system that is capable of figuring out an
efficient way to handle a request for a distributed
join.
13-38
Distributed Joins

The DBMS evaluates various options for
performing a join by considering:

The number and size of the records from each table
involved in the join.

The distances and costs of transmitting the records
from one site to another to execute the join.

The distance and cost of shipping the result of the join
back to the site that issued the query in the first place.
13-39
Partitioning

The purpose is to have records or columns
of a table resident at the sites that use
them the most frequently.

Horizontal Partitioning

Vertical Partitioning
13-40
Horizontal Partitioning

A relational table can
be split up so that
some records are
located at one site,
other records are
located at another
site, and so on.

e.g., partitioning of
Table G.
13-41
Vertical Partitioning

The columns of a table are divided up
among several cities on the network.

Each such partition must include the
primary key attribute(s) of the table.

Makes sense when different sites are
responsible for processing different
functions involving an entity.
13-42
Distributed Directory
Management

A distributed DBMS must include a directory that
keeps track of where the database tables, the
replicated copies of database tables (if any), and
the table partitions (if any) are located.

When a query is presented at any site on the
network, the distributed DBMS can automatically
use the directory to find out where the required
data is located and maintain location
transparency.
13-43
Directory Location

The entire directory could be stored at only
one site.

Copies of the directory could be stored at
several of the sites.

A copy of the directory could be stored at
every site. (This is generally the best
solution.)
13-44
Distributed DBMS Summary

Centralized Database Advantages
 Single
site provides high degree of security,
concurrency, and backup and recovery
control.
 No
need for a distributed directory since all of
the data is in one place.
 No
need for distributed joins since all of the
data is in one place.
13-45
Distributed DBMS Summary

Centralized Database Disadvantages
 All
data accesses from other than the site with
the database incur communications costs.
 The
site with the database can become a
bottleneck.
 Possible
availability problem: if the site with
the database goes down, there can be no
data access.
13-46
Distributed DBMS Summary

Dispersing Tables on the Network Advantages

Local autonomy.

Reduced communications costs because each table
can be located at the site that most heavily uses it.

Improved availability because portions of the
database are available even if one or some of the
sites are down.
13-47
Distributed DBMS Summary

Dispersing Tables on the Network Disadvantages
 Several sites have to be concerned with
security, concurrency, and backup and
recovery.
 Requires
a distributed directory and the
software to support location transparency.
 Requires
distributed joins.
13-48
Distributed DBMS Summary

Targeted Data Replication Advantages
 Greatly
reduced communications costs for
read-only data access because copies of
tables can be located at multiple sites that
most heavily use them.
 Greatly
improved availability because if a site
with a database table goes down, there may
be another site with a copy of that table.
13-49
Distributed DBMS Summary

Targeted Data Replication Disadvantages
 Multi-site
concurrency control when data in
replicated tables is updated.
13-50
Distributed DBMS Summary

Partitioned Tables Advantages
 Greatest
local autonomy because data at the
record or column level can be stored at the
site(s) that most heavily use it.
 Greatly
reduced communications costs
because data at the record or column level
can be stored at the site(s) that most heavily
use it.
13-51
Distributed DBMS Summary

Partitioned Tables Disadvantages
 Retrieving
all or a large portion of a table may
require multi-site accesses.
13-52
“Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or
translation of this work beyond that permitted in Section 117 of the 1976
United States Copyright Act without express permission of the copyright owner
is unlawful. Request for further information should be addressed to the
Permissions Department, John Wiley & Sons, Inc. The purchaser may make
back-up copies for his/her own use only and not for distribution or resale. The
Publisher assumes no responsibility for errors, omissions, or damages caused
by the use of these programs or from the use of the information contained
herein.”
13-53