Read-only Transaction

Transcript Read-only Transaction

Distributed Transaction
Processing
Some of the slides have been borrowed from courses taught at Stanford, Berkeley, Washington,
and earlier version of CS 223 at UCI
Distributed Transaction
.
Transaction T
Action:
a1,a2
Action:
a3
Action:
a4,a5
Transaction T spans multiple databases
ICS214B
Notes 11
2
Distributed Transactions Processing
• Each DBMS performs locking & logging for recovery
• How to rollback T?
– easy. Each DBMS can rollback independently.
• How to commit T?
– If DBMS1 commits T, and DBMS 2 experiences a a failure
before REDO logs of T persistent, T cannot be committed
at DBMS2.
– Loss of atomicity.
• Atomic Commit Protocols
– Protocol guarantees all sites make the same decision –
commit, or abort!
Atomic Commit Protocol
• Guarantee of Atomic commit protocol:
– Protocol resilient to communication and site failures
– despite failures, if all failures repaired, then transactions commits or
aborts at all sites.
• Metrics to compare different protocols:
– Overhead: I/O (number of records written to disk), network
(number of messages– when no failures, when there are failures).
– Latency: number of rounds of messages, amount of forced I/O to
disk.
– Blocking: unbounded waiting due to failures.
– Recovery Complexity-- how complex is the termination protocol
• Most common ACP:
– Two-phase commit (2PC)
• Centralized 2PC, Distributed 2PC, Linear 2PC, …
– Three Phase Commit (3PC)
ICS214B
Notes 11
4
Terminology
• Resource Managers (RMs)
– Usually databases
• Participants
– RMs that did work on behalf of transaction
• Coordinator
– Component that runs two-phase commit on
behalf of transaction
ICS214B
Notes 11
5
States of the Transaction
• At Coordinator:
–
–
–
–
Initiated (I) -- transaction known to system
Preparing (P) -- prepare message sent to participants
committed (C) -- has committed
Aborted (A) -- has aborted
• At participant:
–
–
–
–
ICS214B
Initiated (I)
Prepared (P) -- prepared to commit, if the coordinator so desires
committed (C)
Aborted (A)
Notes 11
6
REQUEST-TO-PREPARE
Participant
Coordinator
PREPARED*
COMMIT*
DONE
ICS214B
Notes 11
1. Local Prepare Work.
Write prepare on
logs (Forced)
2. Local Prepare Work.
(lazy)
3. Write Commit record
on logs (Forced)
4. Local commit work.
Write completion log
record on logs. Ack
when durable.
(effectively forced,
though not
immediately)
5. Write Completion log
record (lazy)
7
Protocol Database
• Coordinator maintains a protocol database (in main
memory) for each transaction
• Protocol database
– enables coordinator to execute 2PC
– answers inquiries by participants about status of
transaction
• cohorts may make such inquiries if they fail during recovery
– entry for transaction deleted when coordinator is sure that
no one will ever inquire about transaction again (when it
has been acked by all the participants)
ICS214B
Notes 11
8
Two phase commit -- normal actions
commit case (coordinator)
– make entry into protocol database for transaction marking its status as
initiated when coordinator first learns about transaction
– Add participant to the cohort list in protocol database when coordinator
learns about the cohorts
– Change status of transaction to preparing before sending prepare message. (it
is assumed that coordinator will know about all the participants before this
step)
– On receipt of PREPARE message from cohort, mark cohort as PREPARED in the
protocol database. If all cohorts PREPARED, then change status to
COMMITTED and send COMMIT message.
• must force a commit log record to disk before sending commit message.
– on receipt of ACK message from cohort, mark cohort as ACKED. When all
cohorts have acked, then delete entry of transaction from protocol database.
• Must write a completed log record to disk before deletion from protocol
database. No need to force the write though.
ICS214B
Notes 11
9
Two Phase Commit - normal actions
commit case (participant)
• On receipt of PREPARE message, write PREPARED log
record before sending PREPARED message
– needs to be forced to disk since coordinator may now
commit.
• On receipt of COMMIT message, write COMPLETION
log record before sending ACK to coordinator
– cohort must ensure log forced to disk before sending ack -but no great urgency for doing so.
ICS214B
Notes 11
10
REQUEST-TO-PREPARE
Participant
Coordinator
NO
ABORT*
DONE
ICS214B
Notes 11
1. Local Prepare Work.
Write prepare on
logs (Forced) if
voting yes. Else, local
abort work. Write
Abort log record
(lazy)
2. Local Prepare Work.
(lazy)
3. Write Abort log
record (forced)
4. Local abort work
Write Abort record
on logs. Ack when
done (forced,
though not
immediately)
5. Write completion log
record (lazy)
11
Timeout actions
•
•
•
At various stages of protocol, transaction waits from messages at both coordinator and
participants.
If message not received, on timeout, timeout action is executed:
Coordinator Timeout Actions
– waiting for votes of participants: ABORT transaction, send aborts to all, delete from protocol
database
– waiting for ack from some participant: forward the transaction to recovery process that
periodically will periodically send COMMIT to participant. When participant will recover, and
all participants send an ACK, coordinator writes a completion log record and deletes entry
from protocol database.
•
Cohort timeout actions:
– waiting for prepare: abort the transaction, write abort log record, send abort message to
coordinator. Alternatively, it could wait for the coordinator to ask for prepare and then vote
NO.
– Waiting for decision: forward transaction to recovery process. Recovery process periodically
executes status-transaction call to the coordinator. Such a transaction is blocked for recovery
of failure.
– NOTE: The recovery process could have used a different termination protocol -- e.g., polling
other participants to reduce blocking. (cooperative Termination)
ICS214B
Notes 11
12
2PC is blocking
Sample scenario:
Coord
P2
W
P1
P3
W
P4
W
ICS214B
Notes 11
13
Case I:
P1  “W”; coordinator sent commits
coord
P1  “C”
Case II:
P1
P1  NO; P1  A
w
P2
w
P3
w
P4
 P2, P3, P4 (surviving participants)
cannot safely abort or commit transaction
ICS214B
Notes 11
14
Recovery Actions (cohort)
• All sites execute REDO-UNDO pass
• Detection: A site knows it is a cohort if it finds a prepared
log record for a transaction
• If the log does not contain a completion log record:
– reacquire all locks for the transaction
– ask coordinator for the status of transaction
• The coordinator, if it has no information about the transaction in its
protocol database in memory, it returns ABORT message. If it has
information and transaction committed, it sends back a COMMIT. Else,
if it has information but transaction not yet committed, it sends back a
WAIT.
• If log contains a completion log record
– do nothing
ICS214B
Notes 11
15
Recovery Action (coordinator)
•
•
If protocol database was made fault-tolerant by logging every change, simply reconstruct the
protocol database and restart 2PC from the point of failure.
However, since we have only logged the commit and completion transitions and nothing else:
– if the log does not contain a commit. Simply abort the transaction. If a cohort asks for
status in the future, its status is not in the protocol database and it will be considered as
aborted.
– If commit log record, but no completion log record,
• recreate transactions entry committed in the protocol database and the recovery
process will ask all the participants if they are still waiting for a commit message. If
no one is waiting, the completion entry will be written.
– If commit log record + completion log record
• do nothing.
ICS214B
Notes 11
16
2PC analysis
• Count number of messages, and log writes and number of forced log
writes
• Normal Processing overhead
– Coordinator: 2 log writes (commit/Abort, complete) 1 forced + 2
messages per cohort
– Cohort
• 2 log writes both forced (prepared, committed/aborted)
• 2 messages to coordinator
• Various Optimizations to reduce overheads!
ICS214B
Notes 11
17
Read-only Transaction
• A read-only participant need only respond to phase
one. It doesn’t care what the decision is.
• It responds Prepared-Read-Only to Request-to-Prepare,
to tell the coordinator not to send the decision
• Limitation - All other participants must be fully
terminated, since the read-only participant will
release locks after voting.
– No more testing of SQL integrity constraints
– No more evaluation of SQL triggers
2/15/99
18
Presumed Abort
• After a coordinator decides Abort and sends Abort to
participants, it forgets about T immediately.
• Participants don’t acknowledge Abort (with Done)
Coordinator
Log Start2PC
Participant
Request-to-Prepare
Prepared
Log abort
(forget T)
Log prepared
Commit
Log abort (forget T)
•
If a participant times out waiting for the decision, it asks the coordinator to retry.
– If the coordinator has no info for T, it replies Abort.
– Note: Lots of savings when transaction aborts!
2/15/99
19
Transfer of Coordination
If there is one participant, you can save a round of messages
1. Coordinator asks participant to prepare and become the
coordinator.
2. The participant (now coordinator) prepares, commits, and
tells the former coordinator to commit.
3. The coordinator commits and replies Done.
Coordinator
Participant
Log prepared
Request-to-Prepare-and
-transfer-coordination
Log committed
Log commit
Commit
Done
• Supported by Transarc Encina, but not in any standards.
2/15/99
20
Reducing Blocking of 2PC
• 2PC results in blocking when the cohort is in a
prepared state
• Blocked transactions hold onto resources
causing increased contention.
• What can we do to reduce blocking?
Heuristic Commit
• Suppose a participant recovers, but the termination
protocol leaves T blocked.
• Operator can guess whether to commit or abort
– Must detect wrong guesses when coordinator recovers
– Must run compensations for wrong guesses
• Heuristic commit
– If T is blocked, the local resource manager (actually,
transaction manager) guesses
– At coordinator recovery, the transaction managers
jointly detect wrong guesses.
2/15/99
22
Cooperative Termination Protocol (CTP)
• Assume coordinator includes a list of participants in
Request-to-Prepare.
• If a participant times-out waiting for the decision,
it runs the following protocol.
1. Participant P sends Decision-Req to other participants
2. If participant Q voted No or hasn’t voted or received Abort
from the coordinator, it responds Abort
3. If participant Q received Commit from the coordinator,
it responds Commit.
4. If participant Q is uncertain, it responds Uncertain
(or doesn’t respond at all).
• If all participants are uncertain, then P remains blocked.
2/15/99
23
Cooperative Termination Issues
• Participants don’t know when to forget T,
since other participants may require CTP
– Solution 1 - After receiving Done from all participants,
coordinator sends End to all participants
– Solution 2 - After receiving a decision, a participant may
forget T any time.
• To ensure it can run CTP, a participant should
include the list of participants in the vote log
record.
2/15/99
24
Is there a non-blocking protocol?
Theorem:
If communications failure or total site failures (i.e., all sites
are down simultaneously) are possible, then every atomic
protocol may cause processes to become blocked.
Two exceptions:
if we ignore communication failures, it is possible to
design such a protocol (Skeen et. al. 83)
If we impose some restrictions on transactions (I.e., what
data they can read/write) such a protocol can also be
designed (Mehrotra et. al. 92)
ICS214B
Notes 11
25
Next…
• Three-phase commit (3PC)
– Nonblocking if reliable network (no
communications failure) and no total site
failures
– Handling communications failures
ICS214B
Notes 11
26
Why 2PC blocks?
• Since operational site on timeout in prepare
state does not know if the failed site(s) had
committed or aborted the transaction.
• Polling all operational sites does not work
since all the operational sites might be in
doubt.
ICS214B
Notes 11
27
Approach to Making ACP Non-blocking
•
•
•
For a given state S of a transaction T in the ACP, let the concurrency set of S be the set
of states that other sites could be in.
For example, in 2PC, the concurrency set of PREPARE state is {PREPARE, ABORT,
COMMIT}
To develop non-blocking protocol, one needs to ensure that:
– concurrency set of a transaction does not contain both a commit and an abort
– There exists no non-committable state whose concurrency set contains a commit. A state is
committable if occupancy of the state by any site implies everyone has voted to commit the
transaction.
•
Necessity of these conditions illustrated by considering a situation with only 1 site
operational. If either of the above violated, there will be blocking.
–
–
•
Let S be a state with concurrency set of both commit and abort. If last node is in state S, it cannot commit or
abort since we do not know the state of others.
Let S be a non-committable state whose concurrency set includes commit . If last node is in state S, it cannot
ABORT unilaterally (since S’s concurrency set contains commit). It cannot commit either, since not all sites
have voted to commit –presumably one may vote NO.
Sufficiency illustrated by designing a termination protocol that will terminate the
protocol correctly if the above assumptions hold.
ICS214B
Notes 11
28
Coordinator
Log start-3PC record
(participant list)
Participant
REQUEST-PREPARE
PREPARED
Log prepared record
(state W)
PRECOMMIT
ACK
Log commit record
(state C)
COMMIT
Log committed record
(state C)
ICS214B
Notes 11
29
Coordinator
Participant
REQUEST-PREPARE
1. Timeout: Abort
PREPARED
PRECOMMIT
2. Timeout: ignore
1. Timeout: abort
2. Timeout
Termination Protocol
ACK
COMMIT
3. Timeout
Termination Protocol
Note: Timeout failure means the corresponding cohort/coordinator failed
Notes 11
and not message failures which are assumed to not fail.
ICS214B
30
Three Phase Commit - Termination
Protocol
• Choose a backup coordinator from the remaining operational
sites.
• Backup coordinator sends messages to other operational
sites to make transition to its local state (or to find out that
such a transition is not feasible) and waits for response.
• Based on response as well as its local state, it continues to
commit or abort the transaction.
• It commits, if its concurrency set includes a commit state.
Else, it aborts.
ICS214B
Notes 11
31
Termination Protocol
Start
3PC
Decision
reached
Coordinator
fails
• Only operational processes participate in termination
• Recovered processes wait until decision is reached and
then learn decision
ICS214B
Notes 11
All sites
learn decision
protocol.
32
Coordinator
Participant
REQUEST-PREPARE
Abortable (A)
PREPARED
PRECOMMIT
Uncertain (U)
ACK
Precommitted (PC)
COMMIT
Committed (C)
ICS214B
Notes 11
33
Termination Protocol
• Elect new coordinator
– Use Election Protocol
• New coordinator sends STATE-REQUEST to
participants
• Makes decision using termination rules
• Communicates to participants
ICS214B
Notes 11
34
STATE-REQUEST*
ICS214B
ABORT*
Participant
Coordinator
ABORTABLE
Notes 11
35
STATE-REQUEST*
ICS214B
COMMIT*
Participant
Coordinator
COMMITTED
Notes 11
36
STATE-REQUEST*
ICS214B
ABORT*
Participant
Coordinator
UNCERTAIN*
Notes 11
37
STATE-REQUEST*
Participant
Coordinator
PRECOMMITTED,
NO COMMITTED
PRECOMMIT*
ACK*
COMMIT*
ICS214B
Notes 11
38
Termination Protocol
Sample scenario:
Coord
P1
W
P2
W
P3
W
ICS214B
Notes 11
39
Termination Protocol
Sample scenario:
Coord
P1
W
P2
W
P3
PC
ICS214B
Notes 11
40
Note: 3PC unsafe with communication failures!
W
P
W
P
W
abort
ICS214B
commit
Notes 11
41
Replication
• Replicate Data at multiple sites in a distributed
system
• Why?
– Better Availability
– Better response time
– Better throughput
• Key Issues
– How to ensure consistency
– How to propagate updates
Basic Technique
• Treat replicated data just as ordinary data. Ensure 1
copy serializability using 2PL, 2PC combination
• Locking Protocol
– Reads lock all copies, writes lock all copies
– Reads all one copy, writes lock all copies
– Quorum based protocols – assign weights to each copy,
ensure read and write quorums conflict
• Majority voting
– Primary Copy
• Propogation
– Eager – at time of operation
– Deferred – at commit time
Weaker Consistency Models
• Single Master Replication
– Update transactions execute at Master copy
– Logs shipped from master to backups and used to recreate
the state of the database
– Read-only transactions could execute over “older”
transaction consistent state of the data.
– If primary fails, backups could take over transaction
processing.
• Multimaster Replication
– Update anywhere, read anywhere
– Could result in inconsistent state of the database.
– Reconciliation in case of inconsistency.

Read-only Transaction

Transcript Read-only Transaction

Directory