Presentation file ()

Download Report

Transcript Presentation file ()

In the name of God
Distributed Database
research group
Instructor: Dr. M. Rahgouzar
Samira Tasharofi
Reza Basseda
1
Outline











Introduction
Distributed Data Storage
Distributed Transaction
Commit Protocols
Concurrency Control in Distributed Database
Availability
Distributed Query Processing
Heterogeneous Distributed Databases
Directory Systems
Conclusion
Acknowledgement
2
Introduction

Distributed computing
 Consists of a number of processing elements (not
necessarily homogeneous ) that interconnected
by a computer network and that co-operate in
performing their assigned tasks

Distributed Database
 Database whose relations reside on different sites
 Database some of whose relations are replicated at
different sites
 Database whose relations are split between different
sites
3
Introduction

Distributed database management system


Software system that permits the management of
distributed database system
Advantages





Local autonomy
Improved performance (by proper fragmentation)
Improved reliability/availability (by replication)
Greater expandability
Greater shareability
4
Introduction (Cont.)

Disadvantages





Higher complexity
Higher software and hardware cost
Synchronization and co-ordination among the
sites
Higher maintenance overhead in case of
replication
Greater security problem
5
Distributed Data Storage


On Dynamic Fragmentation of Distributed Databases Using Partial
Replication – D. Pinto, G. Torres ,2002
Fragmentation


Three kinds of fragmentation : Vertical Fragmentation, Horizontal
Fragmentation and Mixed Fragmentation
Solution


horizontal fragmentation with partial replication
RBy2(bound) Algorithm


1. For each query requested, a slave computer increments a counter (ctr) for the
user that have made the request.
2. If ctr reaches bound number (parameter of this algorithm), then this computer
is a candidate to have a set of records replicated and need to follow steps 3 and
4, else step 5.




3. Request the set of records that the user is asking for and save this information into the
slave database.
4. Reset the user local counter to cero.
5. end
Local access database before sending the query to the master computer
6
Distributed Data Storage
(Cont.)

The algorithm is based on two techniques





Slave-Master Search, to provide fast access to database
on user queries
Replication slave-master (two computers) in order to get
availability
Allows database availability even when the
connection between slave and master database is
broken
Proved on local network
Future Work

Check how the algorithm works under dial-up connection
7
Distributed Data Storage
(Cont.)

Transparent Data Relocation in Highly Available
Distributed Systems,S. Voulgaris, M.V. Steen, A.
Baggio, and G. Ballintjn, 2002



Management issue of distributed services: redistribution of
non-replicated data among the servers comprising a
distributed service
Redistribute the data without disrupting the service’s
availability
Solution Base
 Shipping the data records that need to be relocated to their
new hosting server
 Updating the servers’ mapping information to reflect the
new configuration of the distributed service
8
Distributed Data Storage
(Cont.)

Solution For a Single Redistribution

Initialization



Record Relocation
Termination


Distribute new mapping M’
Replace M with M’
Solution for Overlapping Redistributions

Per-server Sequential Redistribution


Per-server Mixed but Ordered Redistributions


Using redistribution R2 after R1 completed
The server ships each record as soon as possible, based on the
virtual mapping with first preference
Direct Shipping to Final Destination
9
Distributed Data Storage
(Cont.)

Advantages
 low delays in the servicing of client requests during a
configuration change
 Adding no significant processing requirement to the servers
involved, and terminates in a timely fashion
 conceptual simplicity

sequential concurrent versions
10
Transaction Management

Ensuring Relaxed Atomicity for Flexible Transaction in Multidatabase
Systems, A.Zhang, M.Nodine, B.Bhargava, O.Bukhres, ACM

Global transaction: A set of sub transactions, where each sub transaction is
a transaction accessing the data items at a single local site

Flexible global transaction: Specifying Definitions of
 Execution ordering dependencies between two sub transaction
 Alternating dependencies between two subsets of sub transactions
 Eliminate prepare-to-commit stage
Sub transaction
 Retriable
 Compensable
 Pivot


In global transactions at most one sub transaction can be pivot
11
Transaction Management
(Cont.)

Semi-atomicity in Flexible Transaction: Weaker than atomicity in global
transaction
 All its sub transaction in one <-rpo commit and all attempted sub
transactions not in the committed <-rpo are either aborted or have their
effects undone
 No partial effects of its sub transactions remain permanent in local
database

Ensuring semi-atomicity
 Using retry and compensetable techniques

Flexible transaction Advantages
 Enhances the scope of global transaction management beyond that
offered by the traditional global transaction model
 Blocking that caused by two-phase commit can be prevented
12
Transaction Management
(Cont.)

Global scheduling for flexible Transactions in
Heterogeneous distributed Database Systems, A. Zhang, M.
Nodine, B. Bhargava

Global Serializability
 If the projection of committed local, flexible and surplus
transactions is conflict equivalent to some serial execution of
these transactions

Compensation-interference free
 For any sub transaction tj which is serialized between a
subtransaction ti and its compensating transaction cti in S, WC(ti)
/\ AC(tj) = 0
13
Transaction Management
(Cont.)

F-serializability



Prevents the flexible transactions which are serialized
between a flexible transaction and its compensating sub
transactions to affect any data items that have been
updated by flexible transaction (global serializable and
compensation free)
Avoids unnecessary abort or compensation
Scheduling Protocols


Stored Sub transaction Execution Graph (SSEG)
Avoid cascading abort
14
Commit Protocols

A Two-Phase Commit Protocol for Mobile Wireless
Environment, N. Nouali, A. Doucet, H. Drias, 2005

IF the traditional 2PC is executed in mobile environment,
disconnections will increase the number of, may be wrong,
abortion decisions of transaction because if a FH tries to
communicate with it a disconnected MH this will cause a failure

Disconnections are not exceptions but rather are part of the
normal mode of operation, so they should not be treated as
failures
15
Commit Protocols (Cont.)

The case of mobile client and fixed servers,
Fixed Coordinator


To mitigate the unforeseeable breakdowns, the client must
force-write the identity and location information of the
coordinator (commit-BS) just before sending the commitrequest
Only one force-write is needed to record the coordinator
information during the entire execution of Atomic
commitment Protocol (ACP)
16
Commit Protocols (Cont.)

The case of mobile client and mobile servers
 The participant-agent is responsible of transmitting the result to
the participant at reconnection time and also of keeping logs and
eventually recovering in the case of failure

When participant registers to a new BS, the participant MH (or
mobile participant) informs its participant-agent about its new
location

Workload is shifted to the fixed part of the network thus
preserving processing power and communication resources and
minimizing traffic cost over the wireless links
17
Commit Protocols (Cont.)

Reducing the Latency of Non-Blocking Commitment using
Optimism and Replication, R.J.Peris, M.P.Martinez,G.Alonso,
S.Arevalo, 2001

2Phase Commit : Blocking
3Phase Commit



Sending too much messages
Low performance

Flushing log records adds to the overall latency as messages cannot be
sent or responded to before writing to the log

Delay is reduced by allowing sites to send messages instead of flushing
log records
18
Commit Protocols (Cont.)

Solution




To use the main memory of a replicated group as stable
memory instead of a mirrored log with careful writes
The group of the participating transaction managers (the
TM group)
A replicated group providing the commit service that acts
as coordinator (the CS group)
A participant does not wait to flush its log, instead it
uniformly multicasts its vote together with its log entry
19
Commit Protocols (Cont.)


If the message corresponds to the last vote, and
all were yes votes, the transaction is optimistically
committed, and the fact is communicated to the
TM group
The optimistic commit changes the locks held by
the transaction to opt-mode

Do not allow to the holding transaction to commit until
the transaction that released them, definitively
commits
20
Commit Protocols (Cont.)

The message is optimistically delivered right away without waiting for
the stabilization of the message (e.g., waiting for the message to be
received by all the members of the group)
21
Commit Protocols (Cont.)



The PROMPT Real-Time Commit Protocol, J.H.Harista,
K.Ramamritham, R.Gupta,IEEE,1999
Distributed commit processing can have considerably more effect
than distributed data processing on real-time performance
PROMPT
 New commit protocol in real-time distributed transactions
 Preventing borrowers from continuing to execute if their
associated lenders had not been received their decisions was
addressed by incorporating an additional bit and message that
informed the master about the borrowing state and the
completion of borrowing by a cohort
22
Commit Protocols (Cont.)

Features
 Controlled optimistic access to uncommitted data


Active abort


Cohorts inform the master as soon as they decide to abort locally
Silent kill


Reduce data inaccessibility and priority inversion
Aborts due to deadline misses that occur before the master has
initiated the commit protocol are implemented silently
Healthy-Lending


Health factor : deadline of transaction
Using HF to decide whether transaction can lend its data (best
choice)
23
Commit Protocols (Cont.)

One-Phase Real-Time Commit Protocols, P.Saha, 1999

Comparing one-phase protocols (e.g. EP) and PROMPT, the
best-performing two-phase real-time commit protocol
 For parallel distributed transaction, EP outperforms PROMPT
 For sequential distributed transaction, EP perform rather poorly
 For high workload cases EP performs better

Future Works
 Addressing the security considerations in Multi-Level Secure
(MLS) distributed RTDBS
 Combination of EP and PROMPT
24
Concurrency Control in
Distributed Databases

Distributed Concurrency Control Performance: A Study of
Algorithms, Distribution, and Replication
 Michael J. Carey & Miron Livny
 It express Distributed Concurrency Control Algorithms and
evaluate their performance in some of conditions. At the start, It
describe Concurrency Control classic algorithms such as 2PL,
Wound-Wait, Basic Timestamp ordering and discussed on the
structure of distributed concurrency control algorithm. Then it
suggest a basic model for DDB and has some experiments with
those algorithm and evaluate the model.
25
Concurrency Control in
Distributed Databases (Cont.)

Concurrency Control in Distributed Database Systems
 PHILIP A. BERNSTEIN & NATHAN GOODMAN
 It explain mythological proofs for Distributed Concurrency Control
Algorithms and theoretically evaluate them. It define serializablity
in DDB and define a formal language to formulate transactions in
DDB. Then it express each of classic algorithms in his formal
language and prove their correctness and completeness.
26
Concurrency Control in Distributed
Databases (Cont.)


Concurrency Control in Distributed Object-Oriented Database
Systems
 Kjetil Nørv°ag & Olav Sandst°a & and Kjell Bratbergsengen
 The simulation results in this paper is a comparison of
performance and response times for two concurrency control
algorithms, timestamp ordering and two-phase locking. The
simulations have been run with different number of nodes,
network types, data declustering and workloads. The results
show that for a mix of small and long transactions, the throughput
is significantly higher for a system with a timestamp ordering
scheduler than for a system with a two-phase locking scheduler.
Implementing Atomic Actions on Decentralized Data
 DAVID P. REED
 It’s a general survey on Concurrency Control in DDB and
describe classic method.
27
Concurrency Control in Distributed
Databases (Cont.)

Dynamic Voting Algorithms for Maintaining the Consistency of a
Replicated Database
 SUSHIL JAJODIA & DAVID MUTCHLER
 The best known pessimistic algorithm, voting, is a “static”
algorithm, meaning that all potential distinguished partitions can
be listed in advance. It presents a dynamic extension of voting
called dynamic voting. This algorithm permits updates in a
partition provided it contains more than half of the up-to-date
copies of the replicated file. It also presents an extension of
dynamic voting called dynamic voting with linearly ordered copies
(abbreviated as dynamic-linear). These algorithms are dynamic
because the order in which past distinguished partitions were
created plays a role in the selection of the next distinguished
partition.
28
Concurrency Control in Distributed
Databases (Cont.)

Deadlock Detection in Distributed Databases


EDGAR KNAPP
This paper is concerned only with the aspect of deadlock
detection. Recent developments in the area of distributed
deadlock detection algorithms are surveyed, with a special
emphasis on their relation to distributed DBSs. The paper
introduces a uniform framework for the discussion of these
algorithms. The abstraction achieved this way allows us to
talk about the algorithms in terms of the underlying
theoretical concepts, instead of just giving a phenomenonlogical description of the workings of the algorithms.
29
Concurrency Control in Distributed
Databases (Cont.)

MODELS OF A VERY LARGE DISTRIBUTED DATABASE
 Mark Blakey
 The best known pessimistic algorithm, voting, is a “static”
algorithm, meaning that all potential distinguished partitions can
be listed in advance. It presents a dynamic extension of voting
called dynamic voting. This algorithm permits updates in a
partition provided it contains more than half of the up-to-date
copies of the replicated file. It also presents an extension of
dynamic voting called dynamic voting with linearly ordered copies
(abbreviated as dynamic-linear). These algorithms are dynamic
because the order in which past distinguished partitions were
created plays a role in the selection of the next distinguished
partition.
30
Concurrency Control in Distributed
Databases (Cont.)

Performance Study of a Centralized Concurrency Control
Algorithm for Distributed Database Systems using SIMULA
 K. Viswanathan Iyer & L. M. Patnaik
 One objective of this paper is to elaborate the simulation
methodology using SIMULA. Detailed studies have been carried
out on a centralized CC algorithm and its modified version. The
results compare well with a previously reported study on these
algorithms. Here, additional results concerning the update
intensiveness of transactions and the degree of conflict are
obtained. The degree of conflict is quantitatively measured and it
is seen to be a useful performance index. It seems that, It is
going to formulate the effectiveness of Concurrency Control
Algorithm and it focused on the behavior of a class of
performance index.
31
Availability


Maintaining Availability in Partitioned Replicated Databases
 AMR EL ABBADI & SAM TOUEG
 It describes a new replica control protocol that allows the
accessing of data in spite of site failures and network partitioning.
It claims that this protocol provides the database designer with a
large degree of flexibility in deciding the degree of data
availability, as well as the cost of accessing data.
Providing High Availability Using Lazy Replication
 RIVKA LADIN & BARBARA LISKOV & SANJAY GHEMAWAT
 This paper describes a new technique that supports causal order.
An operation call is executed at just one replica; updating of other
replicas happens by lazy exchange of “gossip” messages—
hence the name “lazy replication.” The replicated service
continues to provide service in spite of node failures and network
partitions.
32
Distributed Query Processing


Query Brokers for Distributed And Flexible Query Evaluation
 Tuyet-Trinh yu & Christine Collet
 This paper provides an approach for designing query processor
of a DDB by using hierarchical mediators and using query
brokers which translate a global query in DDB context to local
queries
Query Decomposition, Optimization and Processing in
Multidatabase Systems
 Cem Evrendilek & Asuman Dogac
 This paper suggest an approach to decomposing queries in a
optimized manner. In this way, we need to dynamically calculate
cost of query processing in every sites for all of the sub queries
and using this factors in calculating minimum cost.
33
Distributed Query Processing (Cont.)

Dynamically Distributed Query Evaluation
 Trevor Jim & Dan Suciu
 This paper provides an approach for evaluation of queries over
the web and a directory system dynamically. It provides a
language for explaining information requirements over a multi
database system. It uses this language for defining a DDB and its
queries in a formal way. So it suggest an algorithm for dynamic
query evaluation in DDB and by this logic it proves its algorithms
correctness.
34
Distributed Query Processing (Cont.)

Database Connectivity Using an Agent-Based Mediator System
 Larry M. Stephenes & Michael N Huhns
 This paper provides an Agent-Based approach for managing a
DDB. It uses Agents as proactive components which include KB
about system and have reaction to topology changes to manage
query processing and concurrency control , …. It uses KQML and
a specific coordination strategy for this system.
35
Distributed Query Processing (Cont.)

Optimizing Equijoin Queries In Distributed Databases Where
Relations Are t-lash Partitioned
 DENNIS SHASHA & TSONG-LI WANG
 It studies the optimization problem that arises when the query
processor must repartition the relations and intermediate results
participating in a multi join query. Using estimates of the sizes of
intermediate relations, it shows (1) optimum solutions for closed
chain queries; (2) the NP-completeness of the optimization
problem for star, tree, and general graph queries; and (3)
effective heuristics for these hard cases.
36
Conclusion



DDB is a mature topic and many model
provided for expressing its approaches such
as concurrency control, availability , …
Various models using to prove correctness of
its approaches in concurrency controls , …
It is faced with many of DB problems in a new
viewpoint because of its distribution

Legacy system and wrapper design to …
37
Conclusion (Cont.)


High availability by increasing replication vs.
performance of transaction management and
concurrency control: A trade off
DDB system Concurrency control open
problems


Improving recent approaches to increase
availability with performance
Improving distributed query evaluation over large
distributions
38
Conclusion (Cont.)


Tuple routing over a distribution and data
distribution with high performance and
performance evaluation factors in DDB
Using autonomous components to manage a
DDB and using AI in peer-to-peer DDB

39
Acknowledgement
40
References

Silbershots et al , “Database System
Concepts” 4th edition , McGraw-Hill, 2002
41