Transcript document

Distributed databases
A Distributed Database is a partnership of multiple autonomous database
systems, interconnected by a communications network.
Advantages over a single centralised system are:
•Greater Reliability
– When one site fails, others continue to function
•Greater Throughput
– Many processors and data stores at work simultaneously, perhaps
independently
•Greater capacity for growth (especially if homogeneous)
– A new instance of identical hardware & software is easily added
•Greater autonomy for departments within an organisation
– No consent or cooperation needed from centralised DP authority to
organise, or reorganise, data needed for one purpose only
– Each site may still operate, though perhaps in a limited way, for
benefit of local users even if disconnected from other sites
A Distributed Database should support
•location transparency
– users & programs unaware of data location
•replication transparency (cf phone book)
– any given data may be stored at several distinct sites;
– each site might (or might not) have its own local copy of data
•perhaps, fragmentation transparency
– logical wholes may in fact be split across sites
Location, replication and fragmentation transparency are all aspects of
physical data independence;
A distributed system should look like a centralised system to a user ->
these are internal-level problems (not external or conceptual levels:
remember the ISO model)
Replication of data should improve availability and performance.
• Retrieval should in general use the nearest available copy to cut
communication cost.
• As usage patterns evolve, one may want to add, remove, or relocate
replicas of data: hence the need for replication transparency, where details
of locating and maintaining replicas are handled by the system.
But replication also brings a new problem:
• Updates will have to be propagated to all copies at all sites.
– This requires communication between systems.
– Special provision must be made for unavailable sites.
There are several problems to be resolved in a distributed system:
• Update propagation
• Transaction atomicity
• Efficient query processing
• Concurrency control
• Catalog management
These problems should be solved in a way that
• Does not compromise the supposed advantages of a distributed database
(reliability, throughput, ease of growth, local autonomy)
• Does not necessitate excessive communication
– Communication is always slow compared with speeds within a local
system: the difference is 2 or 3 orders of magnitude.
– A goal in distributed systems is to economise on the number of
messages, and also on the volume of data, to be transmitted.
Update propagation
An update to any logical data object should be propagated to all physical
copies of the object.
Difficulty arises where a site holding a copy has become disconnected
because of site failure or link failure: its replica cannot be accessed at the
time of update.
Possible resolutions to the difficulty:
•Immediate propagation, rollback if not possible
•Propagate immediately to accessible copies, with delay otherwise
•Primary Copy strategy
•Moving Primary Copy refinement
•Read-Only Snapshots
The strategy of propagating updates immediately is unacceptable if it
requires that an update fail whenever any replica site is unavailable.
If we have N nodes, and p (p < 1) is the probability of a successful
connection, then p^N is the probability of the update succeeding. For large
N, the probability of success becomes too low.
This strategy seriously compromises the supposed availability and autonomy
advantages of distributed databases.
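To see the scale of the problem, here is a minimal sketch in Python (the
figures are illustrative assumptions, not course data):

p = 0.99                  # assumed probability of reaching any one site
for n in (10, 50, 200):   # number of replica sites
    print(n, round(p ** n, 3))
# prints: 10 0.904, 50 0.605, 200 0.134
# Even with 99%-reliable connections, an update touching 200 sites
# succeeds immediately only about 13% of the time.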
There is an improvement if the update is done at all accessible sites
immediately, and updates for each inaccessible site are delayed for
processing when that site becomes available (at restart time).
This is better for availability and for autonomy
But when a site restarts,
•It must communicate with all other sites to determine whether any updates
need to be performed on any of its data. Comms cost!
•At that time, other sites may have become inaccessible. Does the restart
have to be deferred? Then Goodbye Autonomy!
One copy of each logical data item may be designated as the primary copy of
that item. Primary copies of different items should be stored at different
locations, not centralised.
Updates are then directed at the primary copy, and are deemed complete as
soon as they are applied to the primary copy - ie control returns to the
updating transaction.
The site holding the primary copy is then responsible for broadcasting the
update to all other sites after applying the initial update.
In this case, other sites can be doing their updates in parallel with the
transaction requesting the update.
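A minimal sketch of this flow in Python (class and queue names are
illustrative assumptions, not from the course notes): the update call
returns as soon as the primary copy is written, and a background thread
broadcasts the change to the secondary sites afterwards.

import queue, threading

class SecondarySite:
    def __init__(self):
        self.store = {}
    def apply(self, key, value):
        self.store[key] = value          # replica updated asynchronously

class PrimaryCopySite:
    def __init__(self, secondaries):
        self.store = {}                  # primary copies held here
        self.secondaries = secondaries
        self.outbox = queue.Queue()      # updates awaiting broadcast
        threading.Thread(target=self._propagate, daemon=True).start()

    def update(self, key, value):
        self.store[key] = value          # apply to the primary copy
        self.outbox.put((key, value))    # schedule the broadcast
        return "complete"                # control returns to the transaction

    def _propagate(self):                # runs in parallel with new work
        while True:
            key, value = self.outbox.get()
            for site in self.secondaries:
                site.apply(key, value)

secondaries = [SecondarySite(), SecondarySite()]
primary = PrimaryCopySite(secondaries)
print(primary.update("S1", "new status"))   # "complete", at once

Until the broadcast reaches a given secondary, a reader there can still see
the old value: exactly the stale-read problem described next.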
One problem here is that after updating the primary copy of a logical data
item X, some transaction(s) may inspect a secondary copy of X that has
not yet been updated: system has to deal with this problem somehow.
This could involve the same transaction that was doing the update,
and/or it could involve distinct (?concurrent) transactions.
Another problem is unavailability of the primary copy site.
One solution to that is the so-called “moving primary copy solution”:
• If the primary copy site becomes unavailable, some new primary copy site
is “elected” and the old primary copy site is “deposed”; it becomes just
another secondary copy site when it again becomes available.
A really difficult problem is partitioning:
• If, probably due to communication link failure, the network becomes
partitioned into two isolated sub-networks, then a single object could come
to have different values in the two partitions;
• there is no easy solution to reconciling these when the network is reunited.
In practice, many applications can often tolerate slightly out-of-date data.
If so, there can be a single master copy of each item, used by all
transactions that need to do updates, and snapshots, rather than replicas,
serving some read-only transactions.
•Snapshots are like views (cf relational model), but they are real objects,
actually stored in the database, and are read-only.
DEFINE SNAPSHOT LONDONSUPPLIERS
AS SELECT S#, SNAME, STATUS
FROM S
WHERE CITY = ‘LONDON’
REFRESH EVERY DAY
Transaction Atomicity: Two-phase commit protocol.
• Sites have distinct resource managers, vulnerable to independent failure
• No centralised resource manager since wish to preserve local autonomy
Two-Phase Commit method:
1) any agent (cohort) may decide to rollback transaction
2) when all agents complete, coordinator broadcasts “get ready”
3) each site responds “ok” (ready either way) or “not ok” (or silent)
4) coordinator broadcasts commit/rollback
5) each site does as told (eventually), and acknowledges
6) coordinator terminates
Some loss of autonomy is inevitable. Here, between points 3 & 5, sites have
relinquished some of their autonomy. Like a wedding ceremony …
The procedure is potentially expensive in terms of message count.
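A single-process sketch of the protocol in Python (names are illustrative,
and the absence of logging, timeouts and recovery is a simplification, not
the full method):

class Cohort:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit = name, will_commit
    def prepare(self):                   # step 3: vote "ok" / "not ok"
        return self.will_commit
    def commit(self):                    # step 5: do as told, acknowledge
        return "ack"
    def rollback(self):
        return "ack"

def two_phase_commit(cohorts):
    votes = [c.prepare() for c in cohorts]     # step 2: "get ready"
    decision = all(votes)                # any "not ok" (or silence) -> rollback
    for c in cohorts:                    # step 4: broadcast the decision
        (c.commit if decision else c.rollback)()
    return "commit" if decision else "rollback"

print(two_phase_commit([Cohort("A"), Cohort("B", will_commit=False)]))
# -> "rollback": one "not ok" vote aborts the whole transaction

With N cohorts, steps 2 to 5 alone cost about 4N messages (get-ready, vote,
decision, acknowledgement), which is where the expense comes from.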
Theorem: No finite procedure guarantees atomicity.
For suppose one exists, needing a minimum of M messages to be sent;
if some failure causes loss of the Mth message,
• then if the procedure still works, the first M-1 messages sufficed and M
was not the minimum;
• if it does not work, it does not guarantee atomicity.
Either way, there is a contradiction.
Is Two-Phase Commit not a finite procedure?
Or does it need a minimum of 0 messages to be sent?
Efficient Query processing
Suppose a query at site A involves data at site B.
Various possibilities:
a) process the query at B and transmit the result to A
b) move B's data to A and process the query there
c) if joining relation RA at A to relation RB at B, either
   i) move RA to B (or vice versa), or
   ii) move RA and RB to some third site C
Generally, for realistic queries, various strategies are available
Communication cost is always a major factor, but may sometimes not be the
most important one - for example when a site is overloaded, or when local
processing would misuse a scarce supercomputer.
Where data from more than one site is needed, there are generally many
different ways of distributing the computation.
Estimates of the sizes of intermediate results may allow a system to pick a
strategy which minimises the communication cost (such counts may be
maintained automatically, using e.g. triggers/daemons in integrity rules):
•global optimisation
•hill-climbing
Two components of communication cost:
• time transmitting data (volume)
• time establishing connections (number of messages)
The second component especially favours the use of “set processing” query
languages, such as the relational algebra, over one-item-at-a-time
languages.
example of a distributed computation:
query at A: “get supplier numbers for London suppliers of red parts”
SELECT S#
FROM S,SP,P
WHERE S.CITY = ‘LONDON’
AND S.S# = SP.S#
AND SP.P# = P.P#
AND P.COLOUR = ‘RED’
environment:
S is at site A, has 10k tuples
P is at site B, has 100k tuples
SP is at site A, has 1000k tuples
data rate is 1000 tuples/second
access delay is 0.1 second
Cost of each candidate strategy under two size assumptions. Each entry is
transmission time + access-delay time, in seconds (tuples/1000, plus 0.1
per message):

Strategy                        If 10 Red Parts &    If 10k Red Parts &
                                100k shipments by    1k shipments by
                                London Suppliers     London Suppliers
Move P to A                     100 + 0.2            100 + 0.2
Move S, SP to B                 1010 + 0.4           1010 + 0.4
Check part colour for each
  London shipment               200 + 20000          2 + 200
Check supplier city for each
  Red part                      0.02 + 2.2           20 + 2000.2
Move London shipments to B      100 + 0.3            1 + 0.3
Move Red Parts to A             0.01 + 0.2           10 + 0.2
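These entries follow from the stated cost model; a small Python helper
reproduces them (the per-strategy message counts are my inference from the
figures, not given in the notes):

def cost(tuples, messages, rate=1000.0, delay=0.1):
    # returns (transmission time, access-delay time), both in seconds
    return tuples / rate, messages * delay

print(cost(100_000, 2))          # Move P to A: (100.0, 0.2)
print(cost(200_000, 200_000))    # query B once per 100k shipments:
                                 # (200.0, 20000.0) - delay dominates
print(cost(10, 2))               # Move 10 Red Parts to A: (0.01, 0.2)

The spread between the best strategy (0.21 seconds) and the worst (over
five hours) is why distributed query optimisation matters.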
Concurrency control in distributed DBMS
Locking techniques offer some equivalent serialisation
• Deadlock is possible, so must be avoided or repaired
• Transactions must obey a protocol:
– to read, S lock 1 replica
– to update, X lock every (available?) replica
– commit before releasing locks
• Many messages may be required, for lock requests, lock grants, actual
updates, acknowledgements/error codes, lock releases.
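A sketch of this read-one / X-lock-all-replicas discipline in Python
(single-process and illustrative; a real system has a lock manager per site
and pays the message costs just listed):

class Replica:
    def __init__(self, value=None):
        self.value, self.s_count, self.x_held = value, 0, False

def read(replicas):
    r = replicas[0]                      # S-lock any one replica (nearest)
    assert not r.x_held
    r.s_count += 1                       # held until commit in strict 2PL
    return r.value

def update(replicas, value):
    for r in replicas:                   # X-lock every replica first
        assert not r.x_held and r.s_count == 0
        r.x_held = True
    for r in replicas:
        r.value = value
    # ... commit happens here, and only then are the X locks released
    for r in replicas:
        r.x_held = False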
Timestamping techniques offer one specific serialisation
• Deadlock is not possible.
• Globally unique timestamps are required, roughly synchronised
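One standard way to get such timestamps (a sketch; not necessarily the
scheme assumed in the course) is to pair a local counter with the site
identifier, so ties between sites are broken deterministically:

import itertools

class TimestampGenerator:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count()
    def next(self):
        # (counter, site_id) pairs compare lexicographically, so no two
        # sites can ever issue the same timestamp; keeping counters close
        # to real time gives the rough synchronisation required
        return (next(self.counter), self.site_id)

a, b = TimestampGenerator("A"), TimestampGenerator("B")
print(a.next(), b.next())   # (0, 'A') (0, 'B') - distinct, totally ordered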
Distributed locking also introduces a new problem - “global deadlock”.
• every local wait-for graph is free of cycles,
• but the global graph contains a cycle.
To detect a cycle in the global graph requires a lot of communication.
• An individual site may detect whether there is potentially a deadlock
involving locks it manages,
• But determining whether there actually is deadlock can require a lot of
communication.
Do Not Centralise This Graph:
centralising it increases vulnerability and decreases autonomy.
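The point is easy to demonstrate (a Python sketch; the transaction names
are made up). Each local wait-for graph below is acyclic, yet their union
has a cycle:

def has_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)
    visiting, done = set(), set()
    def visit(n):                        # depth-first search for a back edge
        if n in visiting:
            return True                  # back edge: a cycle exists
        if n in done:
            return False
        visiting.add(n)
        if any(visit(m) for m in graph.get(n, [])):
            return True
        visiting.discard(n)
        done.add(n)
        return False
    return any(visit(n) for n in list(graph))

site1 = [("T1", "T2")]                   # at site 1, T1 waits for T2
site2 = [("T2", "T1")]                   # at site 2, T2 waits for T1
print(has_cycle(site1), has_cycle(site2), has_cycle(site1 + site2))
# False False True - the deadlock is visible only in the combined graph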
“Conservative Timestamping” offers
1. no waste of work through restarts demanded by concurrency control
2. less communications traffic
3. but very little concurrency
• No operation is performed until it can be guaranteed not to cause a
conflict + restart in the future:
• Requests are delayed until the system knows it cannot receive conflicting
requests from older transactions.
Each site keeps queues of read requests & update requests originating from
every other site, including itself;
• It delays read requests until each update queue is non-empty and begins
with a request from a younger transaction;
• It delays update requests until every update queue is non-empty, and
then does the oldest.
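A sketch of the update rule at one site (Python; structure and names are
illustrative assumptions, not from the course notes):

class ConservativeScheduler:
    def __init__(self, site_ids):
        # one FIFO queue of (timestamp, operation) per originating site
        self.update_queues = {s: [] for s in site_ids}

    def receive_update(self, from_site, timestamp, op):
        self.update_queues[from_site].append((timestamp, op))

    def try_run_update(self):
        queues = list(self.update_queues.values())
        if any(not q for q in queues):
            return None                  # some queue empty: must keep waiting
        oldest = min(queues, key=lambda q: q[0][0])
        ts, op = oldest.pop(0)           # safe: nothing older can now arrive
        return op()

A site with no real work must still send timestamped “null requests”, or the
empty-queue test above would block everyone forever - the point made next.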
• Conservative Timestamping needs occasional “null requests” to avoid the
system getting stuck.
• It needs no timestamps in the database; but does need communication
between all pairs of sites.
• It actually serialises the execution of transactions - there is no
concurrency at all. It is extremely pessimistic.
More optimistic timestamping approaches do exist, based eg on conflict
graph analysis using transaction classes. To work, this needs
• accurate classification of transactions
• ad-hoc queries to be assigned to a global class that conflicts with all others.
Catalog management
The catalog (or data dictionary) of a database management system stores all
information about database objects. In an RDBMS:
• relation names
• attribute names, their domains; whether key or not, whether nulls allowed
• database-specific integrity constraints
• access rights of users & programs
• in a distributed system, sites holding copies (whether primary or replica)
Without this metadata, a DBMS at a site cannot mediate access to data.
Where, and in what manner, should the catalog itself be stored in a
distributed database system?
• Centralised? Fully Replicated? Partitioned? Some other?
• If centralised
– lose local autonomy
– availability problem
• If fully replicated
– must propagate local updates
– hard to add new site
• If partitioned (each site fully autonomous)
– gains autonomy, but
– need to broadcast for every remote access (see the sketch after this list)
• If both partitioned and centralised
– some autonomy lost
– no broadcast needed
– availability problem
• If some other
– each site should still at least store its own complete local catalog
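For the purely partitioned option, a lookup that misses the local catalog
has no choice but to ask everyone. A Python sketch with made-up names:

from types import SimpleNamespace

def lookup(name, local_catalog, remote_sites):
    if name in local_catalog:
        return local_catalog[name]       # the autonomous, no-message case
    for site in remote_sites:            # otherwise: broadcast, one round
        entry = site.catalog.get(name)   # trip per remote site
        if entry is not None:
            return entry
    raise KeyError(name)

site_b = SimpleNamespace(catalog={"P": "primary copy at B"})
print(lookup("P", {"S": "primary copy at A"}, [site_b]))
# -> 'primary copy at B', found only after remote messages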