PPT - Ajay Ardeshana
Unit - 4 : Introduction to the Other Databases
Introduction : The Distributed Database System (DDBS) is a database
physically stored on several computer systems across several
sites connected together via a communication network.
Each site is typically managed by a DBMS that is capable of
running independently of the other sites.
In other words, each site is a database system site in its own
right and has its own local users, its own local DBMS, and its
own data communication managers.
Each site has its own transaction management software,
including its own locking, logging and recovery software.
Although geographically dispersed, a distributed
database system manages and controls the entire
database as a single collection of data.
The location of all data items and the degree of autonomy of
individual sites have a significant impact on all aspects of the
system, including query optimization and processing,
concurrency control and recovery.
In a DDBS, both data and transaction processing are divided
between two or more computers connected by a network, each
computer playing a special role in the system.
The computers in the distributed system communicate with
one another via various communication media. They do not
share main memory or disk.
A DDBS allows applications to access data from local or
remote databases.
A DDBS uses client/server architecture to process information
requests. The computers in a DDBS are referred to by a number
of different names, such as sites or nodes.
Distributed database systems are located at geographically
distributed sites because there is a need to use part of the
database locally rather than through remote access.
For example, local branches of a multinational or national
bank, or of a large company, can have their localized databases
situated at the different branches.
Advances in communication and networking systems
triggered the development of the distributed database approach.
It became possible to allow these distributed systems to
communicate among themselves, so that data can be
effectively accessed among computer systems in different
geographical locations.
As a result, the different site machines are quite likely to be
heterogeneous, with entirely different individual
architectures.
General Distributed Database Architecture
Desired Properties of DDBS : A distributed database should have the following properties :
Distributed data independence.
Distributed transaction atomicity.
1. Distributed data independence : This property enables users to ask queries without specifying
where the referenced relations, or copies or fragments of the
relations, are located.
This principle is a natural extension of physical and
logical data independence.
Further, queries that span multiple sites should be optimized
systematically in a cost-based manner, taking into account
communication costs and differences in local computation
costs.
2. Distributed Transaction Atomicity : This property enables users to write
transactions that access and update data at
several sites just as they would write
transactions over purely local data.
In particular, the effects of a transaction
across sites should continue to be atomic.
That is, all changes persist if the transaction
commits, and none persist if it aborts.
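The all-or-nothing behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: each site is modelled as a plain Python dictionary, and `Site`, `stage` and `run_distributed` are hypothetical names, not part of any real DBMS. Updates are staged at every site first and applied only if every site accepts them. Real systems rely on a commit protocol such as two-phase commit, which these slides do not cover.

```python
class Site:
    """A toy site: committed data plus a staging area for one transaction."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.staged = {}

    def stage(self, key, value):
        # Assumption: a site refuses negative balances, so it can vote "no".
        if value < 0:
            return False
        self.staged[key] = value
        return True

    def commit(self):
        self.data.update(self.staged)
        self.staged = {}

    def abort(self):
        self.staged = {}


def run_distributed(updates):
    """updates: list of (site, key, value). Apply all of them or none."""
    sites = []
    for site, _, _ in updates:
        if site not in sites:
            sites.append(site)
    ok = all(site.stage(key, value) for site, key, value in updates)
    for site in sites:
        if ok:
            site.commit()
        else:
            site.abort()
    return ok


mumbai = Site("Mumbai", {"acct_1": 100})
london = Site("London", {"acct_2": 50})

# Transfer 30 from acct_1 to acct_2: both sites accept, so both change.
committed = run_distributed([(mumbai, "acct_1", 70), (london, "acct_2", 80)])

# Overdraw acct_1: Mumbai refuses, so London's staged change is discarded too.
aborted = run_distributed([(london, "acct_2", 280), (mumbai, "acct_1", -130)])
```

After the second call, neither site has changed, which is the "none persist if it aborts" half of the property.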
Types of Distributed Databases : In a distributed database system the data and software
are distributed over multiple sites connected by a
communication network.
However, the term DDBS can describe various systems that
differ from one another in many respects, depending
on factors such as the degree of homogeneity and the
degree of local autonomy.
The following two types of distributed database are most
commonly used :
Homogeneous DDBS.
Heterogeneous DDBS.
1. Homogeneous DDBS : This is the simplest form of distributed database, where there
are several sites, each running its own applications on the
same DBMS software.
All sites have identical software, are aware of one another
and agree to cooperate in processing user requests.
The applications can all see the same schema and run the
same transactions.
That is, there is location transparency in a homogeneous
DDBS. The provision of location transparency forms the core
of distributed database management system (DDBMS)
development.
In a homogeneous DDBS, the use of a single DBMS avoids any
problem of mismatched database capabilities between nodes,
since the data is all managed within a single framework.
Homogeneous Distributed Database
2. Heterogeneous DDBS : In this DDBS, different sites run under the control of
different DBMSs, essentially autonomously, and are
connected in some way to enable access to data from multiple
sites.
Different sites may use different schemas and different DBMS
software.
The sites may not be aware of one another, and they may
provide only limited facilities for cooperation in transaction
processing.
In other words, in a heterogeneous DDBS, each site is an
independent and centralized DBMS that has its own local
users, local transactions and database administrator (DBA).
Heterogeneous Distributed Database
Advantages of DDBS : Sharing of data, where users at one site may be able to
access data residing at other sites and at the same time
retain control over the data at their own site.
Increased efficiency of processing by keeping the data
close to the point where it is most frequently used.
Efficient management of distributed data with different
levels of transparency.
It enables the structure of the database to mirror the
structure of the enterprise, in which local data can be
kept locally while remote data can still be
accessed when necessary.
Increased local autonomy, where each site is able to
retain a degree of control over data that is stored locally.
Increased accessibility by allowing access to data at
several sites via the communication network.
Increased availability: if one site fails, the
remaining sites may be able to continue operating.
Increased reliability due to greater accessibility.
Improved performance.
Improved scalability.
Easier expansion with the growth of the organization in terms
of adding more data, increasing database size and adding
more CPUs.
Parallel evaluation by subdividing a query into subqueries involving data from several sites.
Disadvantages of DDBS : Recovery from failure is more complex.
Increased complexity in system design and
implementation.
Increased transparency leads to a compromise between ease of
use and the overhead cost of providing transparency.
Increased software development cost.
Greater potential for bugs.
Increased processing overhead.
Technical problems of connecting dissimilar machines.
Difficulty in database integrity control.
Security concerns from replicating data in multiple locations and over the
network.
1. Client / Server Architecture : Client / Server architectures are those in which the DBMS-related
workload is split into two logical components, namely
client and server, each of which typically executes on a different
system.
The client is the user of the resources, whereas the server is the
provider of the resources.
The architecture has one or more client processors and one or more server
processors. The applications and tools are put on client
platforms and are connected to the database
management system that resides on the server platform.
The applications and tools act as clients of a DBMS, making
requests for its services. The DBMS, in turn, serves these
requests and returns the results to the client(s).
Clients are responsible for user interface
issues, and servers manage data and execute
transactions.
In other words, the client/server architecture
can be used to implement a DBMS in which
the client is the transaction processor (TP)
and the server is the data processor (DP).
A client process could run on a personal
computer and send queries to a server
running on a mainframe computer.
Many modern information systems are based on
client/server architecture.
Client/Server database Architecture
Components of client/server architecture : Clients, in the form of workstations, as the user's contact
point.
DBMS servers as common resources performing
specialized tasks for the devices requesting their
services.
A communication network connecting the clients
and the servers.
Software applications connecting clients, servers
and network to create a single logical architecture.
Client applications issue SQL statements
for data access, just as they do in a centralized
computing environment.
The networking interface enables client
applications to connect to the server, send
the SQL statements created by the
clients to the server, and receive the result or
error return code that the server sends back
after processing each SQL
statement.
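The request/response exchange described above can be sketched with Python's standard `sqlite3` module. This is a sketch under assumptions: the "network" is replaced by a direct function call, and `server_handle`, `client_query` and the numeric return codes are hypothetical names chosen for illustration. The server executes the SQL it receives and returns either the result rows or an error code with a message.

```python
import sqlite3

# Server side: owns the data and executes SQL on behalf of clients.
server_db = sqlite3.connect(":memory:")
server_db.execute("CREATE TABLE employee (id TEXT, dept_id INTEGER, salary INTEGER)")
server_db.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E-101", 3, 12000), ("E-102", 4, 15000), ("E-103", 2, 13000)],
)


def server_handle(sql):
    """Run one SQL statement and return (return_code, payload)."""
    try:
        rows = server_db.execute(sql).fetchall()
        return 0, rows
    except sqlite3.Error as exc:
        # A non-zero code plus a message stands in for an error return code.
        return 1, str(exc)


def client_query(sql):
    """Client side: in a real system this call would cross the network."""
    return server_handle(sql)


ok_code, ok_rows = client_query("SELECT id FROM employee WHERE dept_id = 3")
err_code, err_msg = client_query("SELECT * FROM no_such_table")
```

The client never touches the data directly. It only builds SQL text and interprets the code and payload that come back.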
Benefits of Client/Server Architecture
Relatively simple to implement because of the centralized
server and clean separation of functionality.
Better adaptability of the computing environment to meet the
ever-changing business needs of the organization.
Use of a Graphical User Interface (GUI) on the microcomputer by
the user at the client improves functionality and simplicity.
It is less expensive than a minicomputer or mainframe solution.
Expensive server machines are optimally utilized because
users interact with the inexpensive client machines.
Overall productivity improvement due to decentralized
operations.
Improved performance with more processing power.
Limitations of Client/Server Architecture
The client/server architecture does not allow
a single query to span multiple servers,
because the client process would have to be
capable of breaking such a query into
appropriate sub-queries to be executed at the
different sites and then putting together
the answers to the sub-queries.
An increase in the number of users and
processing sites often creates security problems.
2. Collaborating Server Systems : In the collaborating server architecture, there are
several database servers, each capable of
running transactions against local data, which
cooperatively execute transactions spanning
multiple servers.
When a server receives a query that requires
access to data at other servers, it generates
appropriate sub-queries to be executed by the
other servers and puts the results together to
compute the answer to the original query.
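A minimal sketch of this cooperation, assuming each server holds a list of (ID, DEPT_ID, SALARY) tuples in memory. The `Server` class and its `run_local` and `run_global` methods are hypothetical names for illustration, and the sub-query here is simply the original predicate shipped unchanged to each peer.

```python
class Server:
    """A toy database server holding some local EMPLOYEE tuples."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.peers = []

    def run_local(self, predicate):
        # Sub-query executed purely against local data.
        return [row for row in self.rows if predicate(row)]

    def run_global(self, predicate):
        # The receiving server answers locally, ships the same sub-query
        # to every peer, and merges the partial results.
        result = self.run_local(predicate)
        for peer in self.peers:
            result.extend(peer.run_local(predicate))
        return sorted(result)


mumbai = Server("Mumbai", [("E-103", 2, 13000), ("E-106", 2, 15000)])
london = Server("London", [("E-102", 4, 15000), ("E-105", 4, 12000)])
mumbai.peers.append(london)
london.peers.append(mumbai)

# A query submitted at Mumbai that needs data from both servers.
high_paid = mumbai.run_global(lambda row: row[2] >= 15000)
```

Either server can accept the global query, which is the difference from the plain client/server arrangement where the client would have to split and recombine the query itself.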
3. Middleware Systems : The middleware database architecture, also called data access
middleware, is designed to allow a single query to span
multiple servers without requiring all database servers to be
capable of managing such multisite execution strategies.
Data access middleware provides users with a consistent
interface to multiple DBMSs and file systems in a transparent
manner.
Data access middleware simplifies the heterogeneous
environment for programmers and provides users with an
easier means of accessing live data in multiple sources.
It eliminates the need for programmers to code many
environment-specific requests or calls in any application that
needs access to current data rather than to copies of data.
The direct requests or calls for data movement to several
DBMSs are handled by the middleware, and hence a major
rewrite of the application program is not required.
The middleware is basically a layer of software that works
as a special server and coordinates the execution of queries
and transactions across one or more independent data
servers.
The middleware server is capable of executing joins and other
relational operations on data obtained from the other servers,
but typically does not itself maintain any data.
Middleware might be responsible for routing a local request
to one or more servers, transporting the request by
supporting various networking protocols, and converting data
from one format to another.
Middleware System
Data access middleware architecture consists of a
middleware application programming interface (API), a
middleware engine, drivers and native interfaces.
The API usually consists of a series of available function calls
as well as a series of data access statements (dynamic SQL,
QBE and so on).
The middleware engine is basically an application
programming interface for routing requests to the various
drivers and performing other functions. It handles the
data access requests that have been issued.
Drivers are used to connect to the various data sources, and
they translate the requests received from the API into the
proper format understood by the targeted data
source.
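The division of labour between engine and drivers can be sketched as below. All names (`DictDriver`, `CsvDriver`, `Middleware`, `fetch`, `join`) are hypothetical and chosen only for illustration. Each driver translates one common request, "fetch these columns", into whatever its own data source understands, and the middleware joins the results without storing any data of its own.

```python
class DictDriver:
    """Driver for a source that stores rows as dictionaries."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self, columns):
        return [tuple(row[c] for c in columns) for row in self.rows]


class CsvDriver:
    """Driver for a source that stores rows as comma-separated text."""

    def __init__(self, header, lines):
        self.header = header
        self.lines = lines

    def fetch(self, columns):
        idx = [self.header.index(c) for c in columns]
        return [tuple(line.split(",")[i] for i in idx) for line in self.lines]


class Middleware:
    """Routes requests to drivers and joins the results; stores no data."""

    def __init__(self, drivers):
        self.drivers = drivers

    def join(self, left, left_cols, right, right_cols):
        # Join on the first column requested from each side.
        right_rows = {r[0]: r[1:] for r in self.drivers[right].fetch(right_cols)}
        return [
            left_row + right_rows[left_row[0]]
            for left_row in self.drivers[left].fetch(left_cols)
            if left_row[0] in right_rows
        ]


hr = DictDriver([{"id": "E-101", "name": "XYZ"}, {"id": "E-102", "name": "XYZ"}])
payroll = CsvDriver(["id", "salary"], ["E-101,12000", "E-102,15000"])
mw = Middleware({"hr": hr, "payroll": payroll})

# Salaries stay strings because the CSV driver does no type conversion.
joined = mw.join("hr", ["id", "name"], "payroll", ["id", "salary"])
```

Adding a third kind of data source would only require one more driver class. The application code calling `join` would not change.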
1.) Data Fragmentation
2.) Data Allocation
3.) Data Replication
Data Fragmentation : This is applied to a relational database system to partition the
relations among network sites.
The technique of breaking up the database into logical units, which
may be assigned for storage at the various sites, is called
Data Fragmentation.
In fragmentation, a relation can be partitioned into
several fragments for physical storage purposes, and there may
be several replicas of each fragment.
These fragments contain sufficient information to allow
reconstruction of the original relation.
All fragments of a given relation are independent:
none of the fragments can be derived from the others.
For example, let us consider a relation EMPLOYEE :
Main Relation :- EMPLOYEE
ID      NAME   DEPT_ID   SALARY
E-101   XYZ    3         12,000
E-102   XYZ    4         15,000
E-103   XYZ    2         13,000
E-104   XYZ    3         14,500
E-105   XYZ    4         12,000
E-106   XYZ    2         15,000
Now this relation can be fragmented into three fragments as
follows :
Fragment        At Site     Based on
Mumbai_Emp      Mumbai      Dept_ID = 2
Jamsedpur_Emp   Jamsedpur   Dept_ID = 3
London_Emp      London      Dept_ID = 4
The above fragmented relation can be stored at the various sites as
shown in the table, in which the tuples for Mumbai employees
with Dept_ID = 2 are stored at the Mumbai site, the tuples for
Jamsedpur employees with Dept_ID = 3 are stored at the
Jamsedpur site, and the tuples for London employees with
Dept_ID = 4 are stored at the London site.
In this example the fragment names are Mumbai_Emp,
Jamsedpur_Emp and London_Emp.
Reconstruction of the original relation is done via suitable JOIN
and UNION operations.
A system that supports data fragmentation should also
support fragmentation independence, also called
fragmentation transparency.
That means the users should not be logically concerned about
the fragmentation.
The users should have the feeling that the data is not
fragmented at all.
In other words, fragmentation independence implies that the
users will be presented with a view of the data in which the
fragments are logically recombined by means of suitable
JOINs and UNIONs.
It is the responsibility of the system optimizer to determine
which fragments need to be physically accessed in order to
satisfy any given user request.
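The optimizer's job of choosing which fragments to touch can be sketched as below, assuming the three Dept_ID fragments from the example are held in a Python dictionary. The function `query_employee` and the `FRAGMENTS` layout are hypothetical. The user asks about EMPLOYEE as a whole, and only the fragments whose defining predicate can match are read.

```python
# Horizontal fragments of EMPLOYEE, keyed by the fragment names from the example.
FRAGMENTS = {
    "Mumbai_Emp": {
        "dept_id": 2,
        "rows": [("E-103", "XYZ", 2, 13000), ("E-106", "XYZ", 2, 15000)],
    },
    "Jamsedpur_Emp": {
        "dept_id": 3,
        "rows": [("E-101", "XYZ", 3, 12000), ("E-104", "XYZ", 3, 14500)],
    },
    "London_Emp": {
        "dept_id": 4,
        "rows": [("E-102", "XYZ", 4, 15000), ("E-105", "XYZ", 4, 12000)],
    },
}


def query_employee(dept_id=None):
    """Return (rows, fragments_accessed) for an optional Dept_ID filter."""
    rows, accessed = [], []
    for name, frag in FRAGMENTS.items():
        # Skip fragments whose defining predicate cannot match the query.
        if dept_id is not None and frag["dept_id"] != dept_id:
            continue
        accessed.append(name)
        rows.extend(frag["rows"])
    return sorted(rows), accessed


# The caller never names a fragment; it only queries "EMPLOYEE".
london_rows, london_touched = query_employee(dept_id=4)
all_rows, all_touched = query_employee()
```

A query restricted to Dept_ID = 4 reads one fragment, while an unrestricted query reads all three and returns their union.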
Following are the three different schemes for fragmenting a
relation :
Horizontal Fragmentation
Vertical Fragmentation
Mixed Fragmentation
Horizontal Fragmentation : A horizontal fragment of a relation is a subset of the
tuples, with all attributes, of that relation.
Horizontal fragmentation splits the relation horizontally by
assigning each tuple or group of tuples of a relation to one
or more fragments, where each tuple or subset has a certain
logical meaning.
These fragments can be assigned to different sites in the
distributed database system.
A horizontal fragment is produced by specifying a
predicate that performs a restriction on the tuples in the
relation.
Relation :- Mumbai_Emp
ID      NAME   DEPT_ID   SALARY
E-103   XYZ    2         13,000
E-106   XYZ    2         15,000
Relation :- Jamsedpur_Emp
ID      NAME   DEPT_ID   SALARY
E-101   XYZ    3         12,000
E-104   XYZ    3         14,500
Relation :- London_Emp
ID      NAME   DEPT_ID   SALARY
E-102   XYZ    4         15,000
E-105   XYZ    4         12,000
The horizontal fragmentation can be written in
terms of relational algebra as σ<condition>(R). For the example :
MUMBAI_EMP : σ Dept_ID = 2 (EMPLOYEE)
JAMSEDPUR_EMP : σ Dept_ID = 3 (EMPLOYEE)
LONDON_EMP : σ Dept_ID = 4 (EMPLOYEE)
In horizontal fragmentation, a UNION operation is
performed to reconstruct the original relation.
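The selections and the UNION described above can be checked with a short sketch. It assumes the EMPLOYEE relation is a list of (ID, NAME, DEPT_ID, SALARY) tuples, and `select` is a hypothetical helper standing in for σ.

```python
EMPLOYEE = [
    ("E-101", "XYZ", 3, 12000),
    ("E-102", "XYZ", 4, 15000),
    ("E-103", "XYZ", 2, 13000),
    ("E-104", "XYZ", 3, 14500),
    ("E-105", "XYZ", 4, 12000),
    ("E-106", "XYZ", 2, 15000),
]


def select(relation, predicate):
    """Relational SELECT (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# One fragment per Dept_ID value, as in the example.
mumbai_emp = select(EMPLOYEE, lambda t: t[2] == 2)
jamsedpur_emp = select(EMPLOYEE, lambda t: t[2] == 3)
london_emp = select(EMPLOYEE, lambda t: t[2] == 4)

# UNION of the fragments; a set removes duplicates, as in relational algebra.
reconstructed = set(mumbai_emp) | set(jamsedpur_emp) | set(london_emp)
```

Because the three predicates are disjoint and together cover every Dept_ID in the relation, the union gives back exactly the original tuples.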
Vertical Fragmentation : Vertical fragmentation splits the relation by decomposing it
“vertically” by columns (attributes).
A vertical fragment of a relation keeps only certain attributes of
the relation at a particular site, because each site may not
need all the attributes of the relation.
Thus vertical fragmentation groups together the attributes in
the relation that are used jointly by the important transactions.
A simple vertical fragmentation is not quite proper when the
two fragments are stored separately: if there is no common
attribute between the two fragments, we cannot put the
original EMPLOYEE relation back together.
Therefore it is necessary to include the primary key or a
candidate key attribute in every vertical fragment.
A vertical fragment can be written in relational algebra as :
π a1, a2, …, an (R)
For example, fragment the EMPLOYEE table as follows :
MUMBAI_EMP : (TID, EMP_ID, EMP_NAME)
JAMSEDPUR_EMP : (TID, DEPT_ID)
LONDON_EMP : (TID, EMP_SALARY)
MUMBAI_EMP : π TID, EMP_ID, EMP_NAME (EMPLOYEE)
JAMSEDPUR_EMP : π TID, DEPT_ID (EMPLOYEE)
LONDON_EMP : π TID, EMP_SALARY (EMPLOYEE)
The original relation is obtained by performing a JOIN
operation.
Relation :- Mumbai_Emp
TID   EMP_ID    EMP_NAME
T-1   E-10215   XYZ
T-2   E-14587   XYZ
T-3   E-45875   XYZ
T-4   E-87456   XYZ
Relation :- Jamsedpur_Emp
TID   DEPT_ID
T-1   2
T-2   3
T-3   2
T-4   3
Relation :- London_Emp
TID   EMP_SALARY
T-1   12,000
T-2   15,000
T-3   16,000
T-4   18,000
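The projections and the JOIN on the shared TID attribute can be sketched as follows. The rows are assumed to be Python dictionaries built from the values in the vertical fragmentation example, and `project` and `join_on` are hypothetical helpers standing in for π and the join.

```python
EMPLOYEE = [
    {"TID": "T-1", "EMP_ID": "E-10215", "EMP_NAME": "XYZ", "DEPT_ID": 2, "EMP_SALARY": 12000},
    {"TID": "T-2", "EMP_ID": "E-14587", "EMP_NAME": "XYZ", "DEPT_ID": 3, "EMP_SALARY": 15000},
    {"TID": "T-3", "EMP_ID": "E-45875", "EMP_NAME": "XYZ", "DEPT_ID": 2, "EMP_SALARY": 16000},
    {"TID": "T-4", "EMP_ID": "E-87456", "EMP_NAME": "XYZ", "DEPT_ID": 3, "EMP_SALARY": 18000},
]


def project(relation, attributes):
    """Relational PROJECT (pi): keep only the named attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(left, right, key):
    """Join two fragments on a key attribute that is unique in each."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


mumbai_emp = project(EMPLOYEE, ["TID", "EMP_ID", "EMP_NAME"])
jamsedpur_emp = project(EMPLOYEE, ["TID", "DEPT_ID"])
london_emp = project(EMPLOYEE, ["TID", "EMP_SALARY"])

# TID appears in every fragment, so the fragments can be joined back together.
reconstructed = join_on(join_on(mumbai_emp, jamsedpur_emp, "TID"), london_emp, "TID")
```

Dropping TID from any one fragment would leave `join_on` with nothing to match on, which is why the key must be repeated in every vertical fragment.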
Mixed Fragmentation : Sometimes, horizontal or vertical fragmentation of a database
schema by itself is insufficient to adequately distribute the
data for some applications. In that case mixed or hybrid
fragmentation is required.
Horizontal fragmentation of a relation followed by
further vertical fragmentation, or vice versa, is called Mixed
Fragmentation.
A mixed fragment is defined by combining the SELECT and
PROJECT operations of relational algebra :
π a1, a2, …, an (σ<condition>(R))
σ<condition>(π a1, a2, …, an (R))
The original relation can be obtained by performing JOIN and UNION
operations of relational algebra.
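A sketch of mixed fragmentation, assuming the EMPLOYEE relation from the earlier example with ID as the key. The choice of splitting each department's tuples into an (ID, NAME, DEPT_ID) part and an (ID, SALARY) part is an assumption made for illustration, as are the helper names `select`, `project` and `join_on`. Reconstruction joins the vertical parts within each department and then takes the union across departments.

```python
EMPLOYEE = [
    {"ID": "E-101", "NAME": "XYZ", "DEPT_ID": 3, "SALARY": 12000},
    {"ID": "E-102", "NAME": "XYZ", "DEPT_ID": 4, "SALARY": 15000},
    {"ID": "E-103", "NAME": "XYZ", "DEPT_ID": 2, "SALARY": 13000},
    {"ID": "E-104", "NAME": "XYZ", "DEPT_ID": 3, "SALARY": 14500},
    {"ID": "E-105", "NAME": "XYZ", "DEPT_ID": 4, "SALARY": 12000},
    {"ID": "E-106", "NAME": "XYZ", "DEPT_ID": 2, "SALARY": 15000},
]


def select(relation, predicate):
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    return [{a: row[a] for a in attributes} for row in relation]


def join_on(left, right, key):
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Horizontal step (sigma) per department, then vertical step (pi) on each result.
fragments = {}
for dept in (2, 3, 4):
    horizontal = select(EMPLOYEE, lambda r, d=dept: r["DEPT_ID"] == d)
    fragments[dept] = (
        project(horizontal, ["ID", "NAME", "DEPT_ID"]),
        project(horizontal, ["ID", "SALARY"]),
    )

# Reconstruction: JOIN inside each department, UNION across departments.
reconstructed = []
for personal, pay in fragments.values():
    reconstructed.extend(join_on(personal, pay, "ID"))
```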
Data Allocation : Data allocation describes the process of
deciding where to place data among
several sites.
Following are the data allocation strategies
that are used in a Distributed Database System :
Centralized
Partitioned or fragmented
Replication
1. Centralized Strategies :
In this strategy the entire single database and DBMS are
stored at one site. However, the users are geographically
distributed across the network.
Locality of reference is low for all sites except the
central site, since every data access goes to the central site.
Thus the communication costs are high.
Because the entire database resides at one site,
the entire database is lost if that
single system fails.
Hence the reliability and availability are low.
2. Partitioned Strategies :
In this strategy the database is divided into several
disjoint parts (fragments) that are stored at several
sites.
Each data item is located at the site where it is used
most frequently.
Since there is no replication, the storage cost is low.
The failure of the system at a particular site results in
the loss of only that site's data, not the entire database. Hence the
reliability and availability are higher than in the centralized strategy.
The communication cost is low and overall
performance is good compared to the centralized strategy.
3. Replication Strategies :
In this strategy copies of one or more
database fragments are stored at several
sites.
Thus the locality of reference,
reliability, availability and performance
are very high, but the communication
cost and storage cost are also very high.
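The storage and availability trade-offs of the three strategies can be compared with a small sketch. The allocations below are assumptions for illustration only: the centralized site is taken to be Mumbai, and the replicated strategy is taken to be full replication of every fragment at every site. `stored_copies` and `survivors` are hypothetical helper names.

```python
FRAGMENTS = ["Mumbai_Emp", "Jamsedpur_Emp", "London_Emp"]
SITES = ["Mumbai", "Jamsedpur", "London"]

# Which fragments each site stores under each strategy.
ALLOCATIONS = {
    "centralized": {"Mumbai": set(FRAGMENTS), "Jamsedpur": set(), "London": set()},
    "partitioned": {
        "Mumbai": {"Mumbai_Emp"},
        "Jamsedpur": {"Jamsedpur_Emp"},
        "London": {"London_Emp"},
    },
    "replicated": {site: set(FRAGMENTS) for site in SITES},
}


def stored_copies(allocation):
    """Storage cost proxy: total number of fragment copies kept."""
    return sum(len(frags) for frags in allocation.values())


def survivors(allocation, failed_site):
    """Fragments still reachable when one site is down."""
    alive = set()
    for site, frags in allocation.items():
        if site != failed_site:
            alive |= frags
    return alive


copies = {name: stored_copies(alloc) for name, alloc in ALLOCATIONS.items()}
after_mumbai_failure = {
    name: survivors(alloc, "Mumbai") for name, alloc in ALLOCATIONS.items()
}
```

With Mumbai down, the centralized allocation loses everything, the partitioned one loses only Mumbai_Emp, and the replicated one loses nothing, at the price of three times as many stored copies.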
Data Replication : Data replication is the technique that permits storage of
certain data at more than one site.
The system maintains several identical copies of a relation and
stores each copy at a different site.
Data replication is introduced to increase the availability of the system.
If a copy is not available due to a system failure, it should be
possible to access another copy.
Data can be replicated as :
REPLICATE LONDON_EMP AS
LONDON-MUMBAI_EMP AT SITE ‘Mumbai’
REPLICATE MUMBAI_EMP AS
MUMBAI-LONDON_EMP AT SITE ‘London’
Data replication should also support
replication independence, also known as
replication transparency.
That means users should be able to behave as if
the data were in fact not replicated at all.
Replication independence simplifies user
programs and terminal activities.
It is the responsibility of the system optimizer to
determine which replicas physically need to
be accessed in order to satisfy any given user
request.
Advantages of data replication : Data replication enhances the performance of read
operations by increasing speed at each site. That means
that with data replication, applications can operate on
local copies instead of having to communicate with
remote sites.
Data replication increases the availability of data to
read-only transactions. That means a given
replicated object remains available for processing, at
least for retrieval, so long as at least one copy is
available.
Disadvantages of data replication : Increased overhead for update transactions. That
means that when a given replicated object is updated, all
copies of that object must be updated.
More complexity in controlling concurrent updates
by several transactions to replicated data.
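The read and update behaviour listed above can be sketched as a "read one copy, write all copies" object. This is a simplification under stated assumptions: `ReplicatedObject` is a hypothetical class, site failures are modelled by a `down` set, and the sketch ignores concurrency control and failures during an update.

```python
class ReplicatedObject:
    """One logical object stored as identical copies at several sites."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.down = set()
        self.writes = 0

    def read(self, local_site):
        # Prefer the local copy; fall back to any other available copy.
        for site in [local_site] + sorted(self.copies):
            if site in self.copies and site not in self.down:
                return self.copies[site], site
        raise RuntimeError("no copy available")

    def update(self, value):
        # Every copy must be updated, so the cost grows with the copy count.
        for site in self.copies:
            self.copies[site] = value
            self.writes += 1


salary = ReplicatedObject(["London", "Mumbai"], 12000)

# A read at Mumbai is served by the local copy.
local_read = salary.read("Mumbai")

# With the Mumbai copy down, the read is still served, from London.
salary.down.add("Mumbai")
fallback_read = salary.read("Mumbai")

# One logical update costs one physical write per copy.
salary.update(12500)
```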