Lecture12 - Distributed Databases

COIS20026 Database Development & Management
Week 12 – Distributed Databases
Prepared by: Pramila Gupta
Updated by: Angelika Schlotzer & Satish Balmuri
Reading

Readings for this week:
    Study guide module 12
    Textbook readings as directed by the study guide

Objectives

Describe what is meant by a distributed database
Describe how this differs from a decentralised database
List the reasons for and against a distributed database
Describe the difference between homogeneous and heterogeneous distributed databases
Describe location transparency and local autonomy

Objectives (cont’d)

Explain horizontal partitioning and vertical partitioning
Define local transaction and global transaction
List & describe the 4 key objectives of a distributed database:
    Location transparency
    Replication transparency
    Failure transparency
    Concurrency transparency

Distributed vs decentralised

Distributed database:
    Appears as one database to the user
    Users should not normally be aware of the location of any given data
Decentralised database:
    Does not appear as one database to the user
    Users have to navigate manually to data at another site, so they have to know where it is

Architecture

DBMS runs on multiple sites on a network
Normally organisations will use one of the ‘big six’ DBMS:
    ORACLE, DB2, Informix, Sybase, Ingres, Microsoft
and only use 1 ‘database engine’ DBMS
    specialist knowledge (personnel) is required to manage/program them

Architecture (cont’d)

There will be problems/limitations getting 2 different DBMS to work together (standards are emerging to make this easier)
When all DBMS in a distributed database are the same, we call it a homogeneous system, as distinct from a heterogeneous system (refer to figures 13-2 and 13-3)
Each DBMS manages a collection of tables (as part of databases)

Architecture (cont’d)

These tables are exposed to (can be used by) users (end-users & programs) on other sites
The goal is:
    that users are unaware of the physical location of tables
    to a user, a distributed database looks like a local database
Distributed database systems are typically only used by large organisations

Why use a Distributed Database System

Large organisations are geographically dispersed entities
It may make sense to keep data where it is generated & most often used:
    to reduce data transfer costs/network bandwidth
    to improve access speeds

Why use a Distributed Database System (cont’d)

Politics typically plays a part
    increased local autonomy is a factor

Why not use a Distributed Database System

Expensive to buy
Even more expensive to manage/maintain
Specialised knowledge (personnel) is needed to set up, manage & maintain
More database personnel are required (to manage the different sites)

Principles & Objectives

Fundamental principle of a distributed database:
    to the user, the distributed database should look like a local database
12 objectives for a distributed database system:
Local autonomy
    local DBMS is autonomous

Principles & Objectives (cont’d)

    local DBMS can perform its functions independently of other sites
    if some other site is down, local DBMS can still function
    in practice, local DBMS must cooperate with other DBMS
    hence, it will be partly dependent on other sites for some services
        eg access to a table where the ‘primary copy’ is held on another site

Principles & Objectives (cont’d)

    so: local autonomy to the maximum extent possible
No reliance on a central site
    no site in the network should assume a special role as ‘central site’
    otherwise, the system is vulnerable to failure of this site
    actually, this is just one aspect of the local autonomy issue

Principles & Objectives (cont’d)

Continuous operation
    minimise unplanned shutdowns
    there should be no need for planned shutdowns (eg to add a new site)
Location independence
    otherwise known as location ‘transparency’
    it should be transparent to a user/programmer that some tables are held at a remote site

Principles & Objectives (cont’d)

    someone needs to know where the tables are: the database administrator(s)
    by hiding these details from the user/programmer:
        life is simpler for the user/programmer
        applications do not become dependent on the location of the tables (ie no data dependence)

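A minimal Python sketch of location independence, under stated assumptions: two in-memory SQLite databases stand in for two sites, and the names `CATALOG`, `fetch_all`, `Customer` and `Supplier` are hypothetical, chosen only for illustration. Only the catalog, which the database administrator(s) would maintain, records where each table lives; the calling code names a table and nothing else.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two sites (illustrative only).
melbourne = sqlite3.connect(":memory:")
paris = sqlite3.connect(":memory:")

melbourne.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT)")
melbourne.execute("INSERT INTO Customer VALUES (1, 'Ann')")
paris.execute("CREATE TABLE Supplier (ID INTEGER PRIMARY KEY, Name TEXT)")
paris.execute("INSERT INTO Supplier VALUES (7, 'Luc')")

# Hypothetical catalog: the only place that records which site holds a table.
CATALOG = {"Customer": melbourne, "Supplier": paris}


def fetch_all(table):
    """Return all rows of a table, ordered by ID, without the caller naming a site."""
    site = CATALOG[table]  # the lookup also limits table names to known ones
    return site.execute(f"SELECT ID, Name FROM {table} ORDER BY ID").fetchall()


customers = fetch_all("Customer")
suppliers = fetch_all("Supplier")
print(customers, suppliers)
```

If the Supplier table were moved to another site, only the catalog entry would change; the calls to `fetch_all` would stay the same, which is the point of removing location dependence from applications.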
Principles & Objectives (cont’d)

Fragmentation independence
    fragments:
        horizontal
            table rows are held in different locations (eg Australasian account records held in Melbourne (A_Account) and European account records held in Paris (E_Account))
            users see a single, unified Account table

Principles & Objectives (cont’d)

            note: relational systems are well suited to handle this fragmentation
            eg the Account virtual table can be defined in terms of the physical tables as:
                SELECT * FROM A_Account
                UNION
                SELECT * FROM E_Account

Principles & Objectives (cont’d)

            eg the specification of the fragment where a row is stored is a restriction; for Melbourne rows: WHERE Continent = 'Australasia'
        vertical
            not as many applications
            may wish to hold columns containing sensitive data or special data (eg picture, map) on a dedicated server

Principles & Objectives (cont’d)

            again: relational systems are well suited to handle this fragmentation
            the virtual table can be defined as a join of the physical vertical fragments, and
            the specification of the sensitive columns to hold on the dedicated server is a projection

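The join and projection described for vertical fragments can be sketched as follows. This is an illustration under assumptions, not the lecture's own schema: a single in-memory SQLite database stands in for both servers, and the fragment names `Account_Main` and `Account_Secure`, along with the choice of `CreditRating` as the sensitive column, are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Both vertical fragments carry the key so they can be joined back together.
    CREATE TABLE Account_Main   (ID INTEGER PRIMARY KEY, Name TEXT, Continent TEXT);
    CREATE TABLE Account_Secure (ID INTEGER PRIMARY KEY, CreditRating TEXT);
    INSERT INTO Account_Main   VALUES (1, 'Ann', 'Australasia'), (2, 'Luc', 'Europe');
    INSERT INTO Account_Secure VALUES (1, 'AAA'), (2, 'BB');

    -- The virtual table is a join of the physical vertical fragments.
    CREATE VIEW Account AS
        SELECT m.ID, m.Name, m.Continent, s.CreditRating
        FROM Account_Main AS m
        JOIN Account_Secure AS s ON s.ID = m.ID;
""")

full_rows = db.execute("SELECT * FROM Account ORDER BY ID").fetchall()

# The fragment held on the dedicated server is a projection of the virtual table.
secure_projection = db.execute(
    "SELECT ID, CreditRating FROM Account ORDER BY ID").fetchall()
stored_fragment = db.execute(
    "SELECT ID, CreditRating FROM Account_Secure ORDER BY ID").fetchall()
print(full_rows)
```

Projecting the view onto the sensitive columns returns exactly the rows stored in the secure fragment, so the fragmentation loses no information as long as both fragments keep the key.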
Principles & Objectives (cont’d)

    fragmentation should be hidden from users so that applications do not become dependent on a given fragmentation
    views will be used to hide sensitive columns from unauthorised users
    the query processor will break a query against a fragmented table into queries against its fragments

Principles & Objectives (cont’d)

    eg
        SELECT ID FROM Account
        WHERE CreditRating = 'AAA'
    becomes
        SELECT ID FROM A_Account
        WHERE CreditRating = 'AAA'
        UNION
        SELECT ID FROM E_Account
        WHERE CreditRating = 'AAA'

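The horizontal case can be checked with a short Python sketch. It assumes a single in-memory SQLite database standing in for both sites and invents a few sample rows; the table names A_Account and E_Account and the CreditRating query come from the slides, while the column layout is an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Horizontal fragments: same columns, different rows.
    CREATE TABLE A_Account (ID INTEGER PRIMARY KEY, Continent TEXT, CreditRating TEXT);
    CREATE TABLE E_Account (ID INTEGER PRIMARY KEY, Continent TEXT, CreditRating TEXT);
    INSERT INTO A_Account VALUES (1, 'Australasia', 'AAA'), (2, 'Australasia', 'B');
    INSERT INTO E_Account VALUES (3, 'Europe', 'AAA'), (4, 'Europe', 'AA');

    -- The unified Account table that users see is a union of the fragments.
    CREATE VIEW Account AS
        SELECT * FROM A_Account
        UNION
        SELECT * FROM E_Account;
""")

# The query as the user writes it, against the virtual table.
against_view = db.execute(
    "SELECT ID FROM Account WHERE CreditRating = 'AAA' ORDER BY ID").fetchall()

# The same query after decomposition into one query per fragment.
decomposed = db.execute("""
    SELECT ID FROM A_Account WHERE CreditRating = 'AAA'
    UNION
    SELECT ID FROM E_Account WHERE CreditRating = 'AAA'
    ORDER BY ID
""").fetchall()

# The restriction that defines the Melbourne fragment.
melbourne_rows = db.execute(
    "SELECT ID FROM Account WHERE Continent = 'Australasia' ORDER BY ID").fetchall()
print(against_view, decomposed, melbourne_rows)
```

Both forms of the query return the same IDs, which is why an application written against Account does not need to change if the fragmentation changes.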
Principles & Objectives (cont’d)

Replication independence
    replicas: it may make sense to replicate commonly used data on multiple sites
    replication should be hidden from users
    complications with updates:
        do all copies of an object need to be locked?
        do all copies of an object need to be updated?

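One possible answer to the update questions above is the strictest policy: update every copy, and let any single copy serve reads. The Python sketch below illustrates that policy only; it is not the only option (the ‘primary copy’ approach mentioned earlier is another), and the names `make_replica`, `update_price`, `read_price` and the Product table are hypothetical. Three in-memory SQLite databases stand in for three sites.

```python
import sqlite3


def make_replica():
    """Create one copy of a small Product table (illustrative schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Product (ID INTEGER PRIMARY KEY, Price REAL)")
    db.execute("INSERT INTO Product VALUES (1, 10.0)")
    db.commit()
    return db


replicas = [make_replica() for _ in range(3)]


def update_price(product_id, price):
    """Write-all policy: the update is applied to every copy."""
    for db in replicas:
        db.execute("UPDATE Product SET Price = ? WHERE ID = ?", (price, product_id))
    for db in replicas:
        db.commit()


def read_price(product_id):
    """Read-one policy: any copy can answer; here the first one is used."""
    row = replicas[0].execute(
        "SELECT Price FROM Product WHERE ID = ?", (product_id,)).fetchone()
    return row[0]


update_price(1, 12.5)
prices = [db.execute("SELECT Price FROM Product WHERE ID = 1").fetchone()[0]
          for db in replicas]
print(prices, read_price(1))
```

The caller of `update_price` and `read_price` never sees how many copies exist, which is what replication independence asks for; the cost is that every update touches every site.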
Principles & Objectives (cont’d)

Distributed query processing
    distributed queries are potentially very costly, so there is a need for optimisation
    distributed query optimisation is just an extension of local query optimisation for RDBMS
    so, once again, relational systems are well suited to distributed systems

Principles & Objectives (cont’d)

    Date makes the point that the set-oriented relational approach is well suited to distributed databases, as a single request (query) can be sent to a site from which data is sought; in a record-oriented system, a request must be sent for each record

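Date's point can be illustrated by counting requests in a toy Python model. Everything here is an assumption made for illustration: `RemoteSite` is a hypothetical class that only counts the calls it receives, and the 100 sample rows are invented.

```python
class RemoteSite:
    """Toy stand-in for a remote site that counts the requests it receives."""

    def __init__(self, rows):
        self.rows = rows      # list of (id, credit_rating) tuples
        self.requests = 0

    def query(self, predicate):
        """Set-oriented access: one request returns every matching row."""
        self.requests += 1
        return [row for row in self.rows if predicate(row)]

    def get_record(self, index):
        """Record-oriented access: one request returns one record, or None at the end."""
        self.requests += 1
        return self.rows[index] if index < len(self.rows) else None


rows = [(i, "AAA" if i % 4 == 0 else "B") for i in range(100)]

# Set-oriented: the whole query travels to the site once.
set_site = RemoteSite(rows)
set_result = set_site.query(lambda row: row[1] == "AAA")

# Record-oriented: the client fetches and filters one record at a time.
record_site = RemoteSite(rows)
record_result = []
index = 0
while (row := record_site.get_record(index)) is not None:
    if row[1] == "AAA":
        record_result.append(row)
    index += 1

print(set_site.requests, record_site.requests)
```

Both approaches find the same rows, but the record-oriented client makes one request per stored record (plus one to detect the end), while the set-oriented client makes a single request. On a network, each request is a round trip.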
Principles & Objectives (cont’d)

Distributed transaction management
    this is more of a requirement than an objective
    most applications will use transactions to protect data integrity
    in a distributed database, there will be a need for distributed transactions:
        transactions that involve changes to records on multiple sites

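A simplified Python sketch of a transaction that changes records on two sites. It is written in the spirit of a two-phase approach (do the work everywhere, then commit everywhere), a technique the slides do not name, and it ignores the hard case of a site failing between the two phases. Two in-memory SQLite databases stand in for the sites; `make_site`, `transfer` and `balances` are hypothetical names.

```python
import sqlite3


def make_site(balance):
    """Create a site holding one account; the CHECK forbids negative balances."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Account ("
               "ID INTEGER PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))")
    db.execute("INSERT INTO Account VALUES (1, ?)", (balance,))
    db.commit()
    return db


melbourne, paris = make_site(100), make_site(50)


def transfer(amount):
    """Move funds from Melbourne to Paris as one all-or-nothing transaction."""
    sites = (paris, melbourne)
    try:
        # Phase 1: each site does its part of the work but does not commit.
        paris.execute("UPDATE Account SET Balance = Balance + ? WHERE ID = 1", (amount,))
        melbourne.execute("UPDATE Account SET Balance = Balance - ? WHERE ID = 1", (amount,))
    except sqlite3.Error:
        # Any failure undoes the uncommitted work on every site.
        for db in sites:
            db.rollback()
        return False
    # Phase 2: every site succeeded, so every site commits.
    for db in sites:
        db.commit()
    return True


def balances():
    """Return [Melbourne balance, Paris balance]."""
    return [db.execute("SELECT Balance FROM Account WHERE ID = 1").fetchone()[0]
            for db in (melbourne, paris)]


ok = transfer(30)
after_ok = balances()
failed = transfer(500)   # would overdraw the Melbourne account
after_failed = balances()
print(ok, after_ok, failed, after_failed)
```

In the failing transfer, the credit at Paris has already been applied when the debit at Melbourne is rejected; rolling back both sites leaves the balances exactly as they were after the first transfer.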
Principles & Objectives (cont’d)

Hardware independence
    the idea is that you should be free to choose the hardware on which you implement your distributed database
    more an issue of the operating systems supported
    important for organisations with a mix of hardware/operating systems

Principles & Objectives (cont’d)

    products like Oracle are strong here
        run on a range of Unix flavours, NT, MVS (mainframe OS), etc
    Microsoft SQL Server is at the other end of the spectrum
        runs only on NT
Operating system independence
    see above

Principles & Objectives (cont’d)

Network independence
    similar sort of thing
    increasingly, the operating system hides the NOS (network operating system) from the DBMS
DBMS independence
    should be able to mix & match RDBMS
    in practice, support for advanced features like cooperative distributed transaction processing is limited

The Difficult Bits

In a distributed database, it becomes much more difficult to manage:
    The catalog
    Query processing
    Concurrent access
    Recovery