Lecture12 - Distributed Databases
COIS20026 Database
Development & Management
Week 12 – Distributed Databases
Prepared by: Pramila Gupta
Updated by: Angelika Schlotzer & Satish Balmuri
Reading
Readings for this week:
Study guide module 12
Textbook readings as directed by the study guide
Objectives
Describe what is meant by a distributed database
Describe how this differs from a decentralised database
List the reasons for and against a distributed database
Describe the difference between homogeneous and heterogeneous distributed databases
Describe location transparency and local autonomy
Objectives (cont’d)
Explain horizontal partitioning and vertical partitioning
Define local transaction and global transaction
List & describe the 4 key objectives of a distributed database:
Location transparency
Replication transparency
Failure transparency
Concurrency transparency
Distributed vs decentralised
Distributed database:
Appears as one database to the user
Users should not normally be aware of the location of any given data
Decentralised database:
Does not appear as one database to the user
User will have to manually navigate to data at another site, so will have to know where it is
Architecture
DBMS runs on multiple sites on a network
normally organisations will use one of the big six DBMS:
Oracle, DB2, Informix, Sybase, Ingres, Microsoft SQL Server
and will use only one ‘database engine’ (DBMS)
specialist knowledge (personnel) is required to manage/program them
Architecture (cont’d)
There will be problems/limitations getting 2 different DBMS to work together (standards are emerging to make this easier)
when all DBMS in a distributed database are the same, we call it a homogeneous system, as distinct from a heterogeneous system (refer to figures 13-2 and 13-3)
each DBMS manages a collection of tables (as part of databases)
Architecture (cont’d)
These tables are exposed to (can be used by) users (end-users & programs) on other sites
the goal is:
that users are unaware of the physical location of tables
to a user, a distributed database looks like a local database
distributed database systems are typically only used by large organisations
Why use a Distributed Database System
Large organisations are geographically dispersed entities
it may make sense to keep data where it is generated & most often used
to reduce data transfer costs/network bandwidth
to improve access speeds
Why use a Distributed Database System (cont’d)
Politics typically plays a part
increased local autonomy is a factor
Why not use a Distributed Database System
Expensive to buy
even more expensive to manage/maintain
specialised knowledge (personnel) is needed to set up, manage & maintain
more database personnel are required (to manage the different sites)
Principles & Objectives
Fundamental principle of a distributed database:
to the user, the distributed database should look like a local database
12 objectives for a distributed database system:
local autonomy
local DBMS is autonomous
Principles & Objectives (cont’d)
Local DBMS can perform its functions independently of other sites
if some other site is down, local DBMS can still function
in practice, local DBMS must cooperate with other DBMS
hence, will be partly dependent on other sites for some services
eg access to a table where the ‘primary copy’ is held on another site
Principles & Objectives (cont’d)
So: local autonomy to the maximum extent possible
no reliance on central site
no site in network should assume special role as ‘central site’
otherwise, system is vulnerable to failure of this site
actually, this is just one aspect of the local autonomy issue
Principles & Objectives (cont’d)
Continuous operation
minimise unplanned shutdowns
there should be no need for planned shutdowns (eg to add a new site)
location independence
otherwise known as ‘transparency’
it should be transparent to a user/programmer that some tables are held at a remote site
Principles & Objectives (cont’d)
Someone needs to know where they are: the database administrator(s)
by hiding these details from the user/programmer:
life is simpler for the user/programmer
applications do not become dependent on the location of the tables (ie no data dependence)
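Location independence can be pictured as a catalog lookup that sits between the application and the sites. The sketch below is only an illustration: the catalog layout, the site names and the functions read_table and move_table are all hypothetical, and plain dictionaries stand in for real network connections.

```python
# Hypothetical catalog: table name -> site that currently holds it.
CATALOG = {"Customer": "melbourne", "Invoice": "paris"}

# Stand-in for per-site storage; a real system would hold connections here.
SITES = {
    "melbourne": {"Customer": [("C1", "Ann"), ("C2", "Bob")]},
    "paris": {"Invoice": [("I1", "C1", 250)]},
}


def read_table(table):
    """Return all rows of a table without the caller naming a site."""
    site = CATALOG[table]  # only the catalog knows the location
    return SITES[site][table]


def move_table(table, new_site):
    """Relocate a table (a DBA task); callers of read_table are unaffected."""
    old_site = CATALOG[table]
    SITES[new_site][table] = SITES[old_site].pop(table)
    CATALOG[table] = new_site


before = read_table("Customer")
move_table("Customer", "paris")
after = read_table("Customer")
print(before == after)
```

The application code calls read_table the same way before and after the move, which is the sense in which it has no dependence on the location of the table.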
Principles & Objectives (cont’d)
Fragmentation independence
fragments:
horizontal
table rows are held in different locations (eg Australasian account records held in Melbourne (A_Account) and European account records held in Paris (E_Account))
users see a single, unified Account table
Principles & Objectives (cont’d)
Note: relational systems are well suited to handle this fragmentation;
eg - Account virtual table can be defined in terms of physical tables as:
SELECT * FROM A_Account UNION SELECT * FROM E_Account
Principles & Objectives (cont’d)
eg - specification of the fragment where a row is stored is a restriction - Melbourne rows:
WHERE Continent = 'Australasia'
vertical
has fewer applications than horizontal fragmentation
may wish to hold columns holding sensitive data or special data (eg picture, map) on a dedicated server
Principles & Objectives (cont’d)
again: relational systems are well suited to handle this fragmentation
virtual table can be defined as a join of physical vertical fragments, and
specification of sensitive columns to hold on dedicated server is a projection
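A minimal sketch of vertical fragmentation, using Python's built-in sqlite3 module. It makes several assumptions: the fragment names Account_Main and Account_Secure and the Name column are invented for illustration, and both fragments live in one in-memory database here, whereas in a distributed system they would sit on different servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Vertical fragments: each carries the key so the rows can be rejoined.
    CREATE TABLE Account_Main   (ID INTEGER PRIMARY KEY, Name TEXT, Continent TEXT);
    CREATE TABLE Account_Secure (ID INTEGER PRIMARY KEY, CreditRating TEXT);

    INSERT INTO Account_Main   VALUES (1, 'Ann', 'Australasia'), (2, 'Luc', 'Europe');
    INSERT INTO Account_Secure VALUES (1, 'AAA'), (2, 'BB');

    -- The virtual table is a join of the physical vertical fragments.
    CREATE VIEW Account AS
        SELECT m.ID, m.Name, m.Continent, s.CreditRating
        FROM Account_Main AS m
        JOIN Account_Secure AS s ON s.ID = m.ID;
""")

# Users query the unified view and never name a fragment.
rows = conn.execute(
    "SELECT ID, Name, CreditRating FROM Account ORDER BY ID"
).fetchall()
print(rows)
```

Each fragment is a projection of the logical Account table (a subset of its columns plus the key), which is why the join on ID reconstructs the full row.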
Principles & Objectives (cont’d)
Fragmentation should be hidden from users so that applications do not become dependent on a given fragmentation
views will be used to hide sensitive columns from unauthorised users
query processor will decompose a query against a fragmented table into queries against its fragments
Principles & Objectives (cont’d)
Eg
SELECT ID FROM Account WHERE CreditRating = 'AAA'
becomes
SELECT ID FROM A_Account WHERE CreditRating = 'AAA'
UNION
SELECT ID FROM E_Account WHERE CreditRating = 'AAA'
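The rewrite above can be checked on a small scale with Python's built-in sqlite3 module. This is a sketch under assumptions: the sample rows and the Continent column values are made up, and both fragments sit in one in-memory database rather than at separate sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Horizontal fragments: same columns, different rows.
    CREATE TABLE A_Account (ID INTEGER PRIMARY KEY, Continent TEXT, CreditRating TEXT);
    CREATE TABLE E_Account (ID INTEGER PRIMARY KEY, Continent TEXT, CreditRating TEXT);
    INSERT INTO A_Account VALUES (1, 'Australasia', 'AAA'), (2, 'Australasia', 'B');
    INSERT INTO E_Account VALUES (3, 'Europe', 'AAA'), (4, 'Europe', 'AA');

    -- The unified table the user sees.
    CREATE VIEW Account AS
        SELECT * FROM A_Account UNION SELECT * FROM E_Account;
""")

user_query = "SELECT ID FROM Account WHERE CreditRating = 'AAA'"
decomposed = """
    SELECT ID FROM A_Account WHERE CreditRating = 'AAA'
    UNION
    SELECT ID FROM E_Account WHERE CreditRating = 'AAA'
"""

via_view = sorted(conn.execute(user_query).fetchall())
via_fragments = sorted(conn.execute(decomposed).fetchall())
print(via_view == via_fragments)
```

In the decomposed form each fragment query can run at the site holding that fragment, so only the matching IDs need to travel over the network.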
Principles & Objectives (cont’d)
Replication independence
replicas: it may make sense to replicate commonly used data on multiple sites
replication should be hidden from users
complications - update:
do all copies of an object need to be locked?
do all copies of an object need to be updated?
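One possible answer to the two questions above is a primary-copy scheme, where only the primary copy must be reachable for an update and the other copies are brought up to date later. The slides mention a ‘primary copy’ only in passing, so the sketch below is an assumption about how such a scheme might look, not a description of any particular DBMS; the class and method names are hypothetical.

```python
class ReplicatedItem:
    """One logical data item held as copies on several sites."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.primary = sites[0]  # assumption: first site holds the primary copy
        self.stale = set()       # sites whose copy still awaits the update

    def write(self, value, reachable):
        """Update via the primary copy; unreachable replicas are refreshed later."""
        if self.primary not in reachable:
            raise RuntimeError("primary copy unavailable; write rejected")
        for site in self.copies:
            if site in reachable:
                self.copies[site] = value
                self.stale.discard(site)
            else:
                self.stale.add(site)

    def read(self, site):
        """Return the value held by the copy at the given site."""
        return self.copies[site]

    def resync(self, site):
        """Bring a recovered site up to date from the primary copy."""
        self.copies[site] = self.copies[self.primary]
        self.stale.discard(site)


item = ReplicatedItem(["melbourne", "paris"], 100)
item.write(120, reachable={"melbourne"})  # paris is down during the update
stale_read = item.read("paris")           # still the old value
item.resync("paris")
fresh_read = item.read("paris")
print(stale_read, fresh_read)
```

The trade-off is visible in the stale read: updates stay possible while a replica is down, at the cost of that replica temporarily returning old data.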
Principles & Objectives (cont’d)
Distributed query processing
distributed queries are potentially very costly, so there is a need for optimisation
distributed query optimisation is just an extension of local query optimisation for RDBMS
so, once again, relational systems are well suited to distributed systems
Principles & Objectives (cont’d)
Date makes the point that the set-oriented relational approach is well suited to distributed databases, as a single request (query) can be sent to the site from which data is sought; in a record-oriented system, a request must be sent for each record
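Date's point can be illustrated by counting requests. In this sketch the RemoteSite class is a hypothetical stand-in for a remote DBMS, and the sample rows are made up; the only thing being measured is how many requests each style sends.

```python
class RemoteSite:
    """Stand-in for a remote DBMS that counts the requests it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.requests = 0

    def fetch_where(self, predicate):
        """Set-oriented: one request returns every qualifying row."""
        self.requests += 1
        return [row for row in self.rows if predicate(row)]

    def fetch_one(self, index):
        """Record-oriented: one request returns one row, or None past the end."""
        self.requests += 1
        return self.rows[index] if index < len(self.rows) else None


def is_aaa(row):
    return row["CreditRating"] == "AAA"


rows = [{"ID": i, "CreditRating": "AAA" if i % 2 else "B"} for i in range(1, 7)]

# Set-oriented access: the predicate travels to the data.
set_site = RemoteSite(rows)
set_result = set_site.fetch_where(is_aaa)

# Record-oriented access: every record is requested, then filtered locally.
record_site = RemoteSite(rows)
record_result = []
index = 0
while (row := record_site.fetch_one(index)) is not None:
    if is_aaa(row):
        record_result.append(row)
    index += 1

print(set_site.requests, record_site.requests)
```

Both styles return the same rows, but the record-oriented loop sends one request per stored record plus one to detect the end, while the set-oriented call sends a single request.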
Principles & Objectives (cont’d)
Distributed transaction management
this is more of a requirement than an objective
most applications will use transactions to protect data integrity
in a distributed database, there will be a need for distributed transactions:
transactions that involve changes to records on multiple sites
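The slides do not name a protocol for distributed transactions. Two-phase commit is a standard choice, so it is used here as an assumption. The sketch is deliberately minimal: the Participant interface is hypothetical, and it ignores timeouts, logging and coordinator failure, all of which a real implementation must handle.

```python
class Participant:
    """A site taking part in a global transaction (hypothetical interface)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the local changes can be made durable."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def run_global_transaction(participants):
    """Commit at every site or at none; returns True if committed."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                       # phase 2: undo everywhere
        p.rollback()
    return False


sites = [Participant("melbourne"), Participant("paris", can_commit=False)]
committed = run_global_transaction(sites)
print(committed, [p.state for p in sites])
```

Because one site votes no, every site ends up aborted, which is the all-or-nothing behaviour a global transaction needs across sites.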
Principles & Objectives (cont’d)
Hardware independence
the idea is that you should be free to choose the hardware on which you implement your distributed database
more an issue of the operating system supported
important for organisations with a mix of hardware/operating systems
Principles & Objectives (cont’d)
Products like Oracle are strong here: they run on a range of Unix flavours, NT, MVS (mainframe OS), etc
Microsoft SQL Server has historically been at the other end of the spectrum:
for many years it ran only on Windows (NT); Linux support arrived with SQL Server 2017
Operating System Independence
see above
Principles & Objectives (cont’d)
Network Independence
similar sort of thing
increasingly, the operating system hides the network operating system (NOS) from the DBMS
DBMS Independence
should be able to mix & match RDBMS
in practice, support for advanced features like cooperative distributed transaction processing across different DBMS is limited
The Difficult Bits
In a distributed database, it becomes much more difficult to manage:
The catalog
Query processing
Concurrent access
Recovery