Chapter 17 Supplementary

Download Report

Transcript Chapter 17 Supplementary

Chapter 17: Additional Slides
April 9, 2016
Outline
Physical Data Management
Fragments
Distributed Query Processing
Transactions
Logical Data Management
Transparency
Conceptual Data Management
Physical Data Management
Fragments
 What is a Fragment?
Vertical subset (project operation)
Horizontal subset (restrict operation)
Mixed Fragment (combination of project and restrict)
 A fragment may be allocated to a single or
multiple sites
 Fragments may be replicated where
there is a primary fragment as a single site
Copies of the fragment are placed at multiple sites
(secondary)
Physical Data Management
Distributed Query Processing
Involves both local (intra site) and global
(inter site) optimization.
Multiple optimization objectives
The weighting of communication costs
versus local processing costs depends on
network characteristics.
There are many more possible access
plans for a distributed query.
Physical Data Management
Distributed Query Processing cont’d
Local vs. Global query processing
In Local,
Queries are performed at a central server
(single site)
In Global,
Must decide which sites to access for the fragments
May need to move fragments from site to site
Multiple optimization is needed for Global
Due to the multiple sites and access plans
Many possible access plans for Global
Choosing the best one may be difficult
Physical Data Management
Distributed Query Processing cont’d
Communication Costs
Communication Time (CT)
Fixed Message Delay (MD)
Variable Transmission Time (TT)
CT = MD + TT
MD = Number of Messages * Delay per
message
TT = Number of bits/Data rate
Physical Data Management
Distributed Query Processing cont’d
 Global Query
Example (p. 632)
List the order number,
order date, product
number, product name,
product price, and order
quantity for eastern
orders with a specified
customer number, date
range, and product
color.
 Four possible access
plans
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 1
Move the Product table
to the Tulsa site where
the query is processed
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 2
Restrict the Product
table at the Denver Site
Then move result to the
Tulsa site to execute
the remainder of the
query
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 3
Perform join and
restrictions of EasternOrders and Eastern
Order-lines fragments
at the Tulsa site
Then move result to
Denver site to join with
Product Table
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 4
Restrict the Product
table at the Denver site
Move product numbers
to Tulsa and do
restrict/join
Then move result back
to Denver to combine
with Product table to
get product names
Product data
Customer-order data
Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
Obviously many different access plans can
be used to answer the same query
Need to investigate
actual network costs
the local processing costs at each site to
determine which access plan is the best
Physical Data Management
Transactions – 2 Phase Commit Protocol
2 Phase Commit (2PC)
Ensures that all transactions are Atomic
One site is selected as a Coordinator while
other sites are Participants
Each Participant site execute a different part of
the transaction
Two phases: Voting Phase and Decision Phase
Figure 17.18, page 634
Physical Data Management
Transactions – 2 Phase Commit Protocol
Several Complications IF
Failures during recovery and Timeouts
Log records are lost
Coordinator fails
Etc.
Several methods to resolve these, but out
of scope for this class
Logical Data Management
Transparency
 Transparency is related to data independence.
 With transparency, users can write queries with
no knowledge of the distribution, and distribution
changes will not cause changes to existing
queries and transactions.
 Without transparency, users must reference
some distribution details in queries and
distribution changes can lead to changes in
existing queries.
Logical Data Management
Fragmentation Transparency
Fragmentation transparency provides the
highest level of data independence.
Users formulate queries and transactions
without knowledge of fragments (locations,
or local formats).
If fragments change, queries and
transactions are not affected.
Table 17.6, p 626
Logical Data Management
Location Transparency
Location transparency provides a lesser
level of data independence than
fragmentation transparency.
Users need to reference fragments in
formulating queries and transactions.
However, knowledge of locations and local
formats is not necessary.
Table 17.7, p 627
Conceptual Data Management
Schema Integration
Multiple types of schemas may exist to
describe the same dataset
Integrate multiple schemas into a single
schema
Best explained using an Exercise
Conceptual Data Management
Schema Integration - Exercise
Engineering Database – Relational Schema
First, find the common
entities and relationships
between schemas
E(eno, ename, title), p.k. = eno
J(jno, jname, budget, loc, cname), p.k. = jno
G(eno, jno, resp, dur), p.k. = eno, jno
S(title, sal), p.k. = title
Employee Database – CODASYL Schema
Database III – E/R Model
Department(dept-name, budget, manager)
Employee(e#, name, address, title, salary)
Department Employs Employee (1:N relationship)
• Entities
Engineer(Engineer No, name, title, salary)
Project(PNo, project name, budget, location)
Client(Client Name, Address)
• Relationships
Engineer Works_In Project : (Responsibility, Duration)
Project Contract_By Client : (Contract Date)
Conceptual Data Management
Schema Integration - Exercise
Engineering Database – Relational Schema
Second, draw the conceptual
diagram for the common
entities and relationships
E(eno, ename, title), p.k. = eno
J(jno, jname, budget, loc, cname), p.k. = jno
G(eno, jno, resp, dur), p.k. = eno, jno
S(title, sal), p.k. = title
Employee Database – CODASYL Schema
Database III – E/R Model
Department(dept-name, budget, manager)
Employee(e#, name, address, title, salary)
Department Employs Employee (1:N relationship)
• Entities
Engineer(Engineer No, name, title, salary)
Project(PNo, project name, budget, location)
Client(Client Name, Address)
• Relationships
Engineer Works_In Project : (Responsibility, Duration)
Project Contract_By Client : (Contract Date)
Conceptual Data Management
Schema Integration
No single correct solution to this exercise
It may be seen that there are multiple
solutions to this problem
Considered a very HARD problem
Often hard to find the best synonyms
especially from a large set of schemas
Weekly Exercise
Questions 2, 4, and 5