Chapter 17 Supplementary
Download
Report
Transcript Chapter 17 Supplementary
Chapter 17: Additional Slides
April 9, 2016
Outline
Physical Data Management
Fragments
Distributed Query Processing
Transactions
Logical Data Management
Transparency
Conceptual Data Management
Physical Data Management
Fragments
What is a Fragment?
Vertical subset (project operation)
Horizontal subset (restrict operation)
Mixed Fragment (combination of project and restrict)
A fragment may be allocated to a single or
multiple sites
Fragments may be replicated where
there is a primary fragment as a single site
Copies of the fragment are placed at multiple sites
(secondary)
Physical Data Management
Distributed Query Processing
Involves both local (intra site) and global
(inter site) optimization.
Multiple optimization objectives
The weighting of communication costs
versus local processing costs depends on
network characteristics.
There are many more possible access
plans for a distributed query.
Physical Data Management
Distributed Query Processing cont’d
Local vs. Global query processing
In Local,
Queries are performed at a central server
(single site)
In Global,
Must decide which sites to access for the fragments
May need to move fragments from site to site
Multiple optimization is needed for Global
Due to the multiple sites and access plans
Many possible access plans for Global
Choosing the best one may be difficult
Physical Data Management
Distributed Query Processing cont’d
Communication Costs
Communication Time (CT)
Fixed Message Delay (MD)
Variable Transmission Time (TT)
CT = MD + TT
MD = Number of Messages * Delay per
message
TT = Number of bits/Data rate
Physical Data Management
Distributed Query Processing cont’d
Global Query
Example (p. 632)
List the order number,
order date, product
number, product name,
product price, and order
quantity for eastern
orders with a specified
customer number, date
range, and product
color.
Four possible access
plans
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
Access Plan 1
Move the Product table
to the Tulsa site where
the query is processed
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
Access Plan 2
Restrict the Product
table at the Denver Site
Then move result to the
Tulsa site to execute
the remainder of the
query
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
Access Plan 3
Perform join and
restrictions of EasternOrders and Eastern
Order-lines fragments
at the Tulsa site
Then move result to
Denver site to join with
Product Table
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
Access Plan 4
Restrict the Product
table at the Denver site
Move product numbers
to Tulsa and do
restrict/join
Then move result back
to Denver to combine
with Product table to
get product names
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
Obviously many different access plans can
be used to answer the same query
Need to investigate
actual network costs
the local processing costs at each site to
determine which access plan is the best
Physical Data Management
Transactions – 2 Phase Commit Protocol
2 Phase Commit (2PC)
Ensures that all transactions are Atomic
One site is selected as a Coordinator while
other sites are Participants
Each Participant site execute a different part of
the transaction
Two phases: Voting Phase and Decision Phase
Figure 17.18, page 634
Physical Data Management
Transactions – 2 Phase Commit Protocol
Several Complications IF
Failures during recovery and Timeouts
Log records are lost
Coordinator fails
Etc.
Several methods to resolve these, but out
of scope for this class
Logical Data Management
Transparency
Transparency is related to data independence.
With transparency, users can write queries with
no knowledge of the distribution, and distribution
changes will not cause changes to existing
queries and transactions.
Without transparency, users must reference
some distribution details in queries and
distribution changes can lead to changes in
existing queries.
Logical Data Management
Fragmentation Transparency
Fragmentation transparency provides the
highest level of data independence.
Users formulate queries and transactions
without knowledge of fragments (locations,
or local formats).
If fragments change, queries and
transactions are not affected.
Table 17.6, p 626
Logical Data Management
Location Transparency
Location transparency provides a lesser
level of data independence than
fragmentation transparency.
Users need to reference fragments in
formulating queries and transactions.
However, knowledge of locations and local
formats is not necessary.
Table 17.7, p 627
Conceptual Data Management
Schema Integration
Multiple types of schemas may exist to
describe the same dataset
Integrate multiple schemas into a single
schema
Best explained using an Exercise
Conceptual Data Management
Schema Integration - Exercise
Engineering Database – Relational Schema
First, find the common
entities and relationships
between schemas
E(eno, ename, title), p.k. = eno
J(jno, jname, budget, loc, cname), p.k. = jno
G(eno, jno, resp, dur), p.k. = eno, jno
S(title, sal), p.k. = title
Employee Database – CODASYL Schema
Database III – E/R Model
Department(dept-name, budget, manager)
Employee(e#, name, address, title, salary)
Department Employs Employee (1:N relationship)
• Entities
Engineer(Engineer No, name, title, salary)
Project(PNo, project name, budget, location)
Client(Client Name, Address)
• Relationships
Engineer Works_In Project : (Responsibility, Duration)
Project Contract_By Client : (Contract Date)
Conceptual Data Management
Schema Integration - Exercise
Engineering Database – Relational Schema
Second, draw the conceptual
diagram for the common
entities and relationships
E(eno, ename, title), p.k. = eno
J(jno, jname, budget, loc, cname), p.k. = jno
G(eno, jno, resp, dur), p.k. = eno, jno
S(title, sal), p.k. = title
Employee Database – CODASYL Schema
Database III – E/R Model
Department(dept-name, budget, manager)
Employee(e#, name, address, title, salary)
Department Employs Employee (1:N relationship)
• Entities
Engineer(Engineer No, name, title, salary)
Project(PNo, project name, budget, location)
Client(Client Name, Address)
• Relationships
Engineer Works_In Project : (Responsibility, Duration)
Project Contract_By Client : (Contract Date)
Conceptual Data Management
Schema Integration
No single correct solution to this exercise
It may be seen that there are multiple
solutions to this problem
Considered a very HARD problem
Often hard to find the best synonyms
especially from a large set of schemas
Weekly Exercise
Questions 2, 4, and 5