Transcript CS244A1

Distributed Database Systems
1
A Distributed Database on a
Geographically Dispersed Network
2
A Distributed Database on
a Local Network
3
A Multi-Processor System
4
Types of Accesses to a
Distributed Database
5
Distributed Access Plan
1)
2)
3)
At site 1
Send sites 2 and 3 the supplier number SN
At sites 2 and 3
Execute in parallel, upon receipt of the supplier
number, the following program:
Find all PARTS records having
SUP # = SN;
Send result to site 1
At Site 1
Merge results from sites 2 and 3;
Output the result.
6
7
Components of a Commercial DDBMS
8
Data Distribution
Problem:
Choose a unit of the logical database to use for
assignment to data modules.
Possibilities:
Relations
–Distribution issues will influence
logical database design.
Columns
–Distribution issues will influence
logical database design.
Rows
–Too many; Directories become too
large.
Data Items -Too many; Directories become too
large.
9
Data Distribution
Fragments – Logically
defined rectangular
subsets of relations
Fragment
1
Fragment
2
Fragment
3
Relation 1
Fragment
1
Fragment
2
Relation 2
10
Data Distribution
Logical definition of fragments Name
Age
$
Jones
35
32K
Job-Title Supervisor
Salesman
Dept.
Black
A
$ > 30K
Fragment 1
$ < 30K
Fragment 2
Fragment 3
11
Data Distribution
Datamodules
DM1
DM2
DM3
F1
F2
F3
Personnel
F1
F2
Inventory
Assignment of Fragments to Datamodules
12
Data Distribution
Advantages of fragments as units of distribution.
Very flexible in size and definition.
Distribution choices are largely independent of logical
design.
13
System Considerations
Reliable Network
Pipelining
Logical Data Items
Database Operations:
Transactions:
Read
Write
Read Set
Write Set
Atomic – “All or Nothing” Effect
14
System Considerations (cont’d)
Each site in the DDBMS has one or both
of the following software modules:
Transaction Manager (TM)
Data Manager (DM)
TM’s
Read, Parse, and Optimize user queries
Handle all interface with the user
DM’s
Maintain physical database
Perform actual reads and writes
15
System Considerations (cont’d)
Transaction
Transaction
Transaction
Transaction
TM
DM
Data
TM
DM
Data
TM
DM
Data
TM’s communication only with DM’s
DM’s communication only with TM’s
16
Transaction Execution
Transaction
TM’s Action.
Begin
Set up temporary workspace.
Read (X)
Select a DM which stores X,
Send a message to this DM requesting X,
Place X in workspace.
Read (X)
No Action necessary
X is already in workspace.
Write (X)
Change the value of X.
Read (X)
No action necessary.
End
Send a pre-commit to each DM that stores a copy of X,
Await acknowledgements,
Send commit message
17
Optimal File Allocation In A
Distributed Database System
Given a number of computers that process
common information files, how can we:
allocate the files optimally so that the allocation
yields minimum overall operating costs (storage
and communication)?
meet access time requirements for each file?
not exceed the storage capacity of each
computer?
Note: A File may be viewed as a segment.
18
System Parameters
n Computers
m Files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each
computer during usage
Access time requirement for each file at each
computer
Storage capacity of each computer.
Cost of storage per unit file length per computer.
Cost of transmission per unit file length per second
per pair of computers.
19
Model
COSTS
Total Cost
= Storage Costs + Transmission Costs
TC
= CS + C T
Transmission Costs
= Costs for Retrievals + Cost for Updates
CT
= CTR + CTU
CONSTRAINTS
Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each
file must be less than a specified bound.
20
Mathematical Representation Model
21
22
23
24
25
26
Transmission Paths Between
Each Pair of Computers
27
28
Reliability Constraint
Assuming processors and channels each have identical reliability,
ap = availability of the processor
ac = availability of the channel
rj = # of redundant copies of the jth file
Aj = Availability of the jth file
Aj= ap [1 - (1 - acap)rj
For example ap = 0.98, ac = 0.99, then
Aj = 0.951 for rj = 1
Aj = 0.979 for rj = 2
29
30
File Directory for Distributed
Databases
31
User
Transaction
DDBMS
Transaction
Manager
Directory
Manager
To Other Nodes
Legend
Database
Manager
High-Level Request
Standard Database Call
Physical Access Call
Database
Directory
Fragment
Overview of the Directory Manager
Non-Local Request
32
Content of Directory
Global description
Fragmentation description
Allocation description
Mappings to local names
Access method description
Statistics on the database
Consistency information
33
Content of a Directory System
Physical (Static)
Logical (Dynamic)
Security
Operation
Location (Site, Copy #,
Disk, Page);
File Status (R, W)
(File, User, C);
Compression ratio
(Logical Operation Query
Data Value);
Creator;
Number of Backlog Jobs; C=Read/Write; Query Access Optimizer;
Creation Date;
Site Availability;
Version of the File Size; Resource Requirement;
Read Only;
Statistical Data
Gathering;
Write Only;
Protocols
Processing Cost;
Code Format;
Communication Cost;
Translation Cost;
Date of Last Update;
34
The Functional Objectives of
Integrated Dictionary/Directory
To support the control of data resources
Maintaining data independence, security, and
integrity
To support applications development
Offering standardized data definitions and
usage characteristics
Established program entities, DDL
To provide independence of directory
data elements
Different hardware and software
environments
Changes in these environments
35
Possible Data Types In IDD
Data names, definitions, formats and sizes.
Integrity constraints, authorization tables, and usage statistics for
transaction management.
Schemas and sub-schemas.
Description of standardized transactions and reports.
Characteristics of hardware, such as processors, lines, and terminals.
Description of users.
The IDD must support the maintenance of relationships between various
entities such as:
Associations between
Authorization tables and data,
Users and transactions
Reports
The IDD supplies version control
36
Attribute
Attribute
Attribute
Entity
Relationship
Entity
Attribute
Attribute
Attribute
Figure 1
37
Maximum Length
400 Characters
Relationship Created
820708
Entity Created
820114
Payroll Record
Contains
Social
Security
Number
Entity Created
820519
Comments
Length
9 Characters
Figure 2
38
Schema
Model
Level
Typical
Meta-Entity-Types
Schema
Level
Typical
Typical
Entity-Types,
Relationship-Types,and
Attribute-Types
Entities, Relationships, and
Attributes
Social-Security-Number
Element
Entity-Type
Agency-Name
Employee Record
Record
Payroll Record
Document
Relationship-Type
Attribute-Type
Dictionary Level
Form 1040
FIPS Guideline
Record-Contains-Element
Payroll-Record-ContainsEmployee-Name
Length
9 Characters
Creator
ADP Division
Table 1
39
Classes of Directory
Centralized Directory
Single Master Directory
Extended Centralized Directory
Multiple Master Directory
Local Directory
Distributed Directory
40
41
42
43
44
45
Causes For Directory Update
Changing the description or structure of
the user database.
Moving user database entities from one
node to another.
Changing the description of a user or node.
Changing a user view.
Changing a network node’s status.
46
Specific Drawbacks with Globally
Replicated Directories
1) Additional remote activity to maintain
directory coherence.
2) Difficulty of posting directory changes to
a down site.
3) Difficulty of integrating a new site.
4) Storage of directory entries where they
are not referenced.
5) Blurred responsibility for maintaining the
directory.
47
Performance Measure
Operating Cost/Unit Time = Communication Cost
(Query+Update)
+Storage Cost + Code Translation Cost
(Query+Update)
Response Time
48
Operating Cost for the
Centralized Directory System
49
50
Cost Trade-offs of Directory Systems
Assume
Communication cost much greater than storage cost
No Translation cost
All computers have same directory update rate
Then the cost trade-off point is at directory update rate.
P(C,EC) = 2/(N – 1)b
P(C,D) = 2/(N – 1)
P(L,D) = 1
51
52
Type
Description
Advantages
Disadvantages
Centralized
Single Master
directory
Simplicity
Transmission costs and
delays
Extended Centralized Variation of the
centralized case in
which the directory
information is
permanently appended
in the local node once
it is obtained from the
master directory
Multiple Master
Distributed Master
Localized
Ease of update
Reduces transmission
costs and delays
Coordinating updates of
local directories
Knowledge of appended
directories
Variation of the
centralized case in
which redundant copies
of the master directory
exist
Reduces transmission
costs and delays
Master at every
node
Fast Response
Local directory at
each node without
replication
Simple update procedure
Fall-soft
Characteristics
Storage requirements
Coordinating update of
redundant copies
Storage costs
Transmission costs for
updates to the directory
Directory Design Alternatives
Transmission costs for
non-local queries
53
Distributed Ingres Dictionary/Directory
Contain Four Types of Data:
Relation name and location
Information for parsing queries
(domain names, formats, etc.)
Performance information
(number of tuples, storage structures, etc.)
Consistency information
(protection, integrity constraints, etc. Does
not include control data for concurrency
control and synchronization)
54
SDD-1 Dictionary/Directory
The directory itself is defined and maintained like
any other user data. It can be logically fragmented,
distributed, and replicated across the distributed
DBMS’s.
A directory locator (a small highly static file of
directory fragment locations) is kept at every site
and is used by the TMs and DMs to plan and control
transactions and to help ensure DB integrity and
consistency across concurrent accesses of data
elements.
The transaction modules are capable of caching
remotely accessed directory data for subsequent
usage. This facility is provided on the presumption
that DB operations will exhibit the locality-ofreference characteristic.
55
PatientDB1
name
SSN
age
Vpatient : Patient Class
name
PatientDB2
SSN
name
age
SSN
patID
patID
{report}
PatReportDB2
patID
report
Note that a shaded box represents a real collection
and an unshaded box represents a virtual entity.
Figure 17: Pictorial diagram showing usefulness of keys.
56
personDB1
name
sex
age
ssn
Vperson : PersonClass
name
sex
age
personDB2
ssn
name
gender
ssn
job
job
Note that a shaded box represents a real collection
and an unshaded box represents a virtual entity.
People
V person
Virtual Collection
Figure 15: Pictorial diagram showing correspondence between virtual and real attributes.
57
Vretiree:retireClass
financeDB1
name
name
income
stockAmount
Vincome: incomeClass
financeDB2
stockAmount
name
pension
pension
Note that a shaded box represents a real collection and
an unshaded box represents a virtual entity.
Figure 18: Pictorial diagram for aggregation.
58
Vname: nameClass
first
middle
personDB1
name
last
Note that a shaded box represents a real collection
and an unshaded box represents a virtual entity.
Figure 19: Pictorial diagram of computed attribute.
59
financeDB1
name
Vretiree:retireClass
name
income
1
stockAmount
financeDB2
2
name
pension
Note that a shaded box represents a real collection
and an unshaded box represents a virtual entity.
Figure 20: Pictorial diagram of computed attribute.
60
carInsuranceDB1
carOwner
Vinsurance:insuranceClass
name
{insuranceAmounts}
amount
houseInsuranceDB2
houseOnwer
amount
Note that a shaded box represents a real collection
and an unshaded box represents a virtual entity.
Figure 21: Pictorial diagram showing grouping.
61
patientDB1
name
Vpatient : patientClass
name
{doctors}
(key)
docID
patientDB2
(pointer)
name
physician
relationship
patientDB1
Vdoctors : doctorClass
name
name
docID
docID
salary
patientDB1
name
salary
Note that a shaded box represents a real collection and
an unshaded box represents a virtual entity.
Figure 22: Pictorial diagram showing relationship.
62
VtreatedBy : treatedByClass
patient
doctor
(key)
(key)
amountOwed
patientDB1
name
docID
amountOwed
Vpatient : PatientClass
Vdoctor : DoctorClass
.
.
.
.
.
.
Note that a shaded box represents a real collection
and an unshaded box represents a virtual entity.
Figure 23: Pictorial diagram showing a named relationship.
63
VpersonPatient : personClass
name
patientDB1
name
SSN
Vpatient : patientClass
payment
patID
amount
VpersonDoctor : personClass
name
doctorDB2
name
Vdoctor : DoctorClass
docID
docID
salary
salary
Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
person
patient
VpersonPatient
Vpatient
doctor
Vdoctor
VpersonDoctor
Virtual collections
Figure 24: Pictorial diagram showing relationship.
64
ConceptSemType
conceptID
semTypeID
Vconcept
(key)
conceptID
semType
{termSet}
Vterm
termID
Concept
{stringSet}
conceptID
Vstring
termID
stringName
stringType
stringID
stringID
stringType
stringVal
Note that a shaded box represents a real collection and
an unshaded box represents a virtual entity.
Figure 30: Derivation of Virtual Entity Vconcept.
65
SemTypeDef
ID
DsemType
name
definition
ID
name
definition
{relatedTo}
SemTypeRel
DsemRelate
name1
relName
rel
semName
name2
status
status
Note that a shaded box represents a real collection and
an unshaded box represents a virtual entity.
Figure 31: Derivation of Virtual Entity VsemType.
66