pmit-6102-15-lec3-Distributed DatabaseDesignx
Download
Report
Transcript pmit-6102-15-lec3-Distributed DatabaseDesignx
PMIT-6102
Advanced Database Systems
ByJesmin Akhter
Assistant Professor, IIT, Jahangirnagar University
Lecture 03
Distributed Database Design
Outline
Distributed Database Design
Distributed Design Problem
Distributed Design Issues
o Fragmentation
o Data Allocation
Slide 3
Distributed Design Problem
The design of a distributed computer system involves
making decisions on the placement of data and
programs across the sites of a computer network
as well as possibly designing the network itself
In the case of distributed DBMSs, the distribution of
applications involves two things:
distribution of the distributed DBMS software
distribution of the application programs that run on it.
Are not significant problem
Assume a copy of the distributed DBMS software exists
at each site where data are stored
Network has already been designed
We concentrate on distribution of data
Slide 4
Distribution Design Issues
The following set of interrelated questions covers the
entire issue.
Why fragment at all?
How should we fragment?
How much should we fragment?
How to test correctness?
How should we allocate?
What is the necessary information for fragmentation
and allocation?
Slide 5
Reasons for Fragmentation
The important issue is the appropriate unit of
distribution.
A relation is not a suitable unit, for a number of
reasons.
o First, application views are usually subsets of relations.
o Therefore, the locality of accesses of applications is
defined not on entire relations but on their subsets
o Hence consider subsets of relations as distribution
units.
Slide 6
Reasons for Fragmentation
The relation is not replicated and is stored at only one site,
results in an unnecessarily high volume of remote data
accesses
The relation is replicated at all or some of the sites where
the applications reside.
May has unnecessary replication, which causes problems in
executing updates
may not be desirable if storage is limited.
Finally, the decomposition of a relation into fragments, each
being treated as a unit, permits a number of transactions to
execute concurrently.
Thus fragmentation typically increases the level of concurrency
and therefore the system throughput.
Slide 7
Fragmentation Alternatives
Relation instances are essentially tables, so the issue
is one of finding alternative ways of dividing a table
into smaller ones.
There are clearly two alternatives for this:
dividing it horizontally or dividing it vertically.
Slide 8
Fragmentation Alternatives – Horizontal
PROJ
PROJ1 :
projects with budgets less than
$200,000
PROJ2 :
P1
P2
P3
P4
P5
projects with budgets greater
than or equal to $200,000
PROJ1
PNO
PNO
PNAME
Instrumentation
Database Develop.
CAD/CAM
Maintenance
CAD/CAM
BUDGET
LOC
150000
135000
250000
310000
500000
Montreal
New York
New York
Paris
Boston
PROJ2
PNAME
BUDGET
LOC
PNO
PNAME
BUDGET
LOC
P1 Instrumentation
150000
Montreal
P3
CAD/CAM
250000
New York
P2
135000
New York
P4
Maintenance
310000
Paris
P5
CAD/CAM
500000
Boston
Database Develop.
Example of Horizontal Partitioning
Slide 9
Fragmentation Alternatives – Vertical
PROJ
PROJ1: information about project
budgets
PROJ2: information about project
names and locations
PROJ1
PNO
PNAME
BUDGET
P1
P2
P3
P4
P5
Instrumentation
Database Develop.
CAD/CAM
Maintenance
CAD/CAM
LOC
150000 Montreal
135000 New York
250000 New York
310000 Paris
500000 Boston
PROJ2
PNO
BUDGET
PNO
PNAME
LOC
P1
P2
P3
P4
P5
150000
135000
250000
310000
500000
P1
P2
P3
P4
P5
Instrumentation
Database Develop.
CAD/CAM
Maintenance
CAD/CAM
Montreal
New York
New York
Paris
Boston
Example of Vertical Partitioning
Slide 10
Correctness of Fragmentation
Completeness
Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be
found in some Ri
This property, which is identical to the lossless
decomposition property of normalization
it ensures that the data in a global relation are mapped
into fragments without any loss
Slide 11
Correctness of Fragmentation
Reconstruction
If relation R is decomposed into fragments FR={R1, R2,
..., Rn}, then there should exist some relational operator ∇
such that
R = ∇ Ri ,
The reconstructability of the relation from its fragments
ensures that constraints defined on the data in the form
of dependencies are preserved.
Slide 12
Correctness of Fragmentation
Disjointness
If relation R is horizontally decomposed into fragments
FR={R1, R2, ..., Rn}, and data item dj is in Rj, then dj should
not be in any other fragment Rk (k ≠ j ).
This criterion ensures that the horizontal fragments are
disjoint.
If relation R is vertically decomposed, its primary key
attributes are typically repeated in all its fragments (for
reconstruction).
Therefore, in case of vertical partitioning, disjointness is
defined only on the non-primary key attributes of a
relation.
Slide 13
Allocation Alternatives
Non-replicated
partitioned : each fragment resides at only one site
Replicated
fully replicated : each fragment at each site
partially replicated : each fragment at some of the sites
Slide 14
Information Requirements
The information needed for distribution design can be
divided into four categories:
Database information
Application information
Communication network information
Computer system information
Slide 15
Fragmentation
Horizontal Fragmentation (HF)
Vertical Fragmentation (VF)
Hybrid Fragmentation (HF)
Slide 16
Horizontal Fragmentation (HF)
Horizontal fragmentation partitions a relation along its
tuples.
each fragment has a subset of the tuples of the relation.
There are two versions of horizontal partitioning:
Primary horizontal fragmentation
Derived horizontal fragmentation
Primary horizontal fragmentation of a relation is
performed
using predicates that are defined on that relation.
Derived horizontal fragmentation is the partitioning of a
relation
results from predicates being defined on another relation.
Slide 17
Information Requirements of HF
Database Information
how the database relations are connected to one another,
especially with joins.
In the relational model, these relationships are also depicted
as relations.
SKILL
TITLE, SAL
L1
PROJ
EMP
ENO, ENAME, TITLE
L2
ASG
PNO, PNAME, BUDGET, LOC
L3
ENO, PNO, RESP, DUR
Expression of Relationships Among Relations Using Links
cardinality of each relation: card(R)
Slide 18
Information Requirements of HF
Application Information
simple predicates : Given R[A1, A2, …, An], a simple
predicate pj is
pj : Ai θValue
where θ {=,<,≤,>,≥,≠}, Value Di and Di is the domain
of Ai.
For relation R we define Pr = {p1, p2, …,pm}
Example :
PNAME = "Maintenance"
BUDGET ≤ 200000
Slide 19
Information Requirements of HF
Application Information
minterm predicates : Given R and Pr = {p1, p2, …,pm}
define M = {m1,m2,…,mr} as
M = { mi | mi =
pjPr pj* }, 1≤j≤m, 1≤i≤z
where pj* = pj or pj* = ¬(pj).
Example
m1: PNAME="Maintenance" BUDGET≤200000
m2: NOT(PNAME="Maintenance") BUDGET≤200000
m3: PNAME= "Maintenance" NOT(BUDGET≤200000)
m4: NOT(PNAME="Maintenance") NOT(BUDGET≤200000)
Slide 20
Information Requirements of HF
Application Information
In terms of quantitative information about user applications,
we need to have two sets of data.
minterm selectivities: sel(mi)
o The number of tuples of the relation that would be accessed
by a user query which is specified according to a given
minterm predicate mi.
access frequencies: acc(qi)
o The frequency with which a user application qi accesses
data.
o Access frequency for a minterm predicate can also be
defined.
Slide 21
Primary Horizontal Fragmentation
Definition :
A primary horizontal fragmentation is defined by a selection operation
on the owner relations of a database schema. Therefore, given relation
R, its horizontal fragments are given by
Ri = Fi(R), 1 ≤ i ≤ w
where Fi is a selection formula used to obtain fragment Ri. If Fi is in
conjunctive normal form, it is a minterm predicate (mi).
A horizontal fragment Ri of relation R consists of all the tuples of R which
satisfy a minterm predicate mi.
Given a set of minterm predicates M, there are as many horizontal fragments
of relation R as there are minterm predicates.
Set of horizontal fragments also referred to as minterm fragments.
Slide 22
Primary Horizontal Fragmentation
PROJ1 = LOC=“Montreal” (PROJ)
PROJ2 = LOC=“New York” (PROJ)
PROJ3 = LOC=“Paris” (PROJ)
Primary Horizontal Fragmentation of Relation PROJ
Slide 23
PHF – Algorithm
Given:
A relation R, the set of simple predicates Pr
Output: The set of fragments of R = {R1, R2,…,Rw} which
obey the fragmentation rules.
Preliminaries :
Pr should be complete
Pr should be minimal
Slide 24
Completeness of Simple Predicates
A set of simple predicates Pr is said to be complete if
and only if there is an equal probability of access by
every application to any tuple belonging to any minterm
fragment that is defined according to Pr.
Example :
Assume PROJ[PNO,PNAME,BUDGET,LOC] has two
applications defined on it.
Find the budgets of projects at each location. (1)
Find projects with budgets less than $200000. (2)
Slide 25
Completeness of Simple Predicates
According to (1),
Pr={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”}
which is not complete with respect to (2).
Modify
Pr ={LOC=“Montreal”,LOC=“New York”,LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
which is complete.
Slide 26
Minimality of Simple Predicates
If a predicate influences how fragmentation is performed,
(i.e., causes a fragment f to be further fragmented into, say,
fi and fj) then there should be at least one application that
accesses fi and fj differently.
In other words, the simple predicate should be relevant in
determining a fragmentation.
If all the predicates of a set Pr are relevant, then Pr is
minimal.
Pi is relevant if and only if
acc(mi ) acc(m j )
card ( f i ) card ( f j )
acc(mi ) the access frequency of a minterm mi.
Slide 27
Minimality of Simple Predicates
Example :
Pr ={LOC=“Montreal”,LOC=“New York”, LOC=“Paris”,
BUDGET≤200000,BUDGET>200000}
is minimal (in addition to being complete). However, if we add
PNAME = “Instrumentation”
then Pr is not minimal.
Slide 28
COM_MIN Algorithm
Given:
a relation R and a set of simple predicates Pr
Output: a complete and minimal set of simple predicates
Pr' for Pr
Rule 1: a relation or fragment is partitioned into at least
two parts which are accessed differently by at
least one application.
Slide 29
COM_MIN Algorithm
Initialization :
find a pi Pr such that pi partitions R according to Rule 1
set Pr' = pi ; Pr Pr – {pi} ; F {fi}
Iteratively add predicates to Pr' until it is complete
find a pj Pr such that pj partitions some fk defined
according to minterm predicate over Pr' according to Rule 1
set Pr' = Pr' {pj}; Pr Pr – {pj}; F F {fj}
if pk Pr' which is nonrelevant then
Pr' Pr' – {pk}
F F – {fk}
Slide 30
PHORIZONTAL Algorithm
Makes use of COM_MIN to perform fragmentation.
Input:
a relation R and a set of simple predicates Pr
Output: a set of minterm predicates M according to
which relation R is to be fragmented
Pr' COM_MIN (R,Pr)
determine the set M of minterm predicates
determine the set I of implications among pi Pr‘
Iteratively eliminate the contradictory minterms from M
M M-mi
Slide 31
PHF – Example
Two candidate relations : PAY and PROJ.
Fragmentation of relation PAY
Application: Check the salary info and determine raise.
Employee records kept at two sites application run at
two sites
Simple predicates
p1 : SAL ≤ 30000
p2 : SAL > 30000
Pr = {p1,p2} which is complete and minimal Pr'=Pr
Minterm predicates
m1 : (SAL ≤ 30000)
m2 : NOT(SAL ≤ 30000) (SAL > 30000)
Slide 32
PHF – Example
PAY1
TITLE
PAY2
SAL
TITLE
SAL
Mech. Eng.
27000
Elect. Eng.
40000
Programmer
24000
Syst. Anal.
34000
Slide 33
PHF – Example
Fragmentation of relation PROJ
Applications:
o Find the name and budget of projects given their location.
– Issued at three sites
o Access project information according to budget
– one site accesses ≤200000 other accesses >200000
Simple predicates
For application (1)
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
For application (2)
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
Pr = Pr' = {p1,p2,p3,p4,p5}
Slide 34
PHF – Example
Fragmentation of relation PROJ continued
Minterm fragments left after elimination
m1 : (LOC = “Montreal”) (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) (BUDGET > 200000)
m3 : (LOC = “New York”) (BUDGET ≤ 200000)
m4 : (LOC = “New York”) (BUDGET > 200000)
m5 : (LOC = “Paris”) (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) (BUDGET > 200000)
Slide 35
PHF – Example
The result of the primary horizontal fragmentation of PROJ forms six
fragments FPROJ = {PROJ1, PROJ2, PROJ3, PROJ4, PROJ5, PROJ}
according to the minterm predicates M .
Since fragments PROJ2, and PROJ5 are empty, they are not depicted in
Figure
PROJ3
PROJ1
PNO
PNAME
BUDGET
P1
Instrumentation
150000
LOC
PNO
Montreal
P2
PROJ4
PNO
P3
PNAME
BUDGET
LOC
New York
Database Develop. 135000
PROJ6
PNAME
BUDGET
CAD/CAM
250000
LOC
PNO
New York
P4
PNAME
Maintenance
BUDGET
LOC
310000
Paris
Slide 36
PHF – Correctness
Completeness
Since Pr' is complete and minimal, the selection
predicates are complete
Reconstruction
If relation R is fragmented into FR = {R1,R2,…,Rr}
R = Ri FR Ri
Disjointness
Minterm predicates that form the basis of fragmentation
should be mutually exclusive.
Slide 37
Derived Horizontal Fragmentation
Defined on a member relation of a link according to a
selection operation specified on its owner.
Each link is an equijoin.
Equijoin can be implemented by means of semijoins.
SKILL
TITLE, SAL
L1
EMP
PROJ
ENO, ENAME, TITLE
L2
PNO, PNAME, BUDGET, LOC
L3
ASG
ENO, PNO, RESP, DUR
Slide 38
DHF – Definition
Given a link L where owner(L)=S and member(L)=R, the
derived horizontal fragments of R are defined as
Ri = R ⋉ Si, 1≤i≤w
where w is the maximum number of fragments that will be
defined on R and
Si = Fi (S)
where Fi is the formula according to which the primary
horizontal fragment Si is defined.
Slide 39
DHF – Example
Given link L1 where owner(L1)=SKILL and
member(L1)=EMP
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
where
SKILL1 = SAL≤30000(SKILL)
SKILL2 = SAL>30000(SKILL)
EMP2
EMP1
ENO
ENAME
E3
E4
E7
A. Lee
J. Miller
R. Davis
TITLE
Mech. Eng.
Programmer
Mech. Eng.
ENO
ENAME
TITLE
E1
J. Doe
Elect. Eng.
E2
E5
E6
E8
M. Smith
B. Casey
L. Chu
J. Jones
Syst. Anal.
Syst. Anal.
Elect. Eng.
Syst. Anal.
Slide 40
DHF – Correctness
Completeness
Referential integrity
o ensures that the tuples of any fragment of the member
relation are also in the owner relation.
o Let R be the member relation of a link whose owner is
relation S which is fragmented as FS = {S1, S2, ..., Sn}.
Furthermore, let A be the join attribute between R and S.
Then, for each tuple t of R, there should be a tuple t' of S
such that
t[A] = t' [A]
Reconstruction
Let a relation R with fragmentation FR = { R1,R2,……Rw }
Disjointness
Simple join graphs between the owner and the member
fragments.
Slide 41
DHF – Correctness
Slide 42
Vertical Fragmentation
More difficult than horizontal, because more alternatives
exist.
Example:
In horizontal partitioning, if the total number of simple predicates
in Pr is n, there are 2n possible minterm predicates that can be
defined on it.
some of these will contradict the existing implications, further
reducing the candidate fragments that need to be considered
In the case of vertical partitioning if a relation has m nonprimary key attributes, the number of possible fragments is
equal to B(m), which is the mth Bell number.
For large values of m;B(m)= approximately (mm)
for m=10, B(m) =115,000,
for m=15, B(m) =109,
for m=30, B(m) = 1023
Slide 43
Hybrid Fragmentation
In most cases a simple horizontal or vertical
fragmentation of a database schema will not be
sufficient to satisfy the requirements of user
applications.
In this case a vertical fragmentation may be followed
by a horizontal one, or vice versa, producing a tree
structured Partitioning.
Since the two types of partitioning strategies are
applied one after the other, this alternative is called
hybrid fragmentation.
It has also been named mixed fragmentation or nested
fragmentation.
Slide 44
Hybrid Fragmentation
It is also called mixed fragmentation or nested fragmentation.
R
HF
HF
R1
R2
VF
VF
VF
R11
R12
R21
VF
R22
VF
R23
Slide 45
Correctness of Hybrid Fragmentation
To reconstruct the original global relation in case of hybrid
fragmentation, one starts at the leaves of the partitioning
tree and moves upward by performing joins and unions.
The fragmentation is complete if the intermediate and leaf
fragments are complete.
Similarly, disjointness is guaranteed if intermediate and leaf
fragments are disjoint.
Slide 46
Allocation
Allocation Problem
Given
F = {F1, F2, …, Fn}
fragments
S ={S1, S2, …, Sm}
network sites
on which a set of applications Q = {q1, q2,…, qq} is running.
The allocation problem involves finding the “optimal” distribution of F to S.
Optimality can be defined with respect to two measures:
Minimal cost
o The cost function consists of the cost of storing each Fi at a site Sj,
o the cost of querying Fi at site Sj , the cost of updating Fi at all sites where it is
stored,
o the cost of data communication.
Performance
o minimize the response time.
o maximize the system throughput at each site.
Slide 47
Allocation Model
General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint
Decision Variable
xij
1 if fragment Fi is stored at site Sj
0 otherwise
Slide 48
Allocation Model
Total Cost
query processing cost
all queries
all sites
cost of storing a fragment at a site
all fragments
Storage Cost (of fragment Fj at Sk)
(unit storage cost at Sk) (size of Fj) xjk
We choose a different approach in our model of the database
allocation problem (DAP) and specify it as consisting of the
processing cost (PC) and the transmission cost (TC).
Thus the query processing cost (QPC) for application qi is:
processing component + transmission component
Slide 49
Allocation Model
Query Processing Cost
Processing component PC, consists of three cost factors
the access cost (AC) + the integrity enforcement cost (IE) + the
concurrency control cost (CC)
Access cost
(no. of update accesses+ no. of read accesses) xij local processing cost at a site
all sites all fragments
o The first two terms calculate the number of accesses of user query qi
to fragment Fj.
o We assume that the local costs of processing them are identical.
o The summation gives the total number of accesses for all the
fragments referenced by qi. Multiplication by LPCk gives the cost of
this access at site Sk.
o We again use xij to select only those cost values for the sites where
fragments are stored.
Integrity enforcement and concurrency control costs
o Can be similarly calculated
Slide 50
Allocation Model
Query Processing Cost
Transmission component
cost of processing updates + cost of processing retrievals
In update queries it is necessary to inform all the sites where replicas exist,
while in retrieval queries, it is sufficient to access only one of the copies.
In addition, at the end of an update request, there is no data transmission
back to the originating site other than a confirmation message, whereas the
retrieval-only queries may result in significant data transmission.
Cost of updates
update message cost acknowledgment cost
all sites all fragments
all sites all fragments
Retrieval Cost
min all sites (cost of retrieval command cost of sending back the result)
all fragments
Slide 51
Allocation Model
Constraints
Response Time
execution time of query ≤ max. allowable response time for
that query
Storage Constraint (for a site)
storage requirement of a fragment at that site
storage capacity at that site
all fragments
Processing constraint (for a site)
processing load of a query at that site
processing capacity of that site
all queries
Slide 52
Thank You
Slide 53