Transcript slides

Advanced
Database Topics
Copyright © Ellis Cohen 2002-2005
Concurrency Control
Features & Mechanisms
These slides are licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 2.5 License.
For more information on how you may use them,
please see http://www.openlineconsult.com/db
Topics
Concurrency Control Features &
Mechanisms
Timestamp-Based
Concurrency Control
Client Caches
Optimistic Concurrency Control with a
Timestamped Cache
Transaction-Based Validation
Optimistic Concurrency Control with a
Simple/Snapshot Cache
Snapshot-Based
Concurrency Control
Consistency Failure &
The Constraint Violation Problem
Concurrency Control
Features & Mechanisms
Characterizing Concurrency
Features
Optimistic vs Pessimistic
Locking vs Non-Locking
Versioned vs Non-Versioned
Cache vs Non-Cache
Optimistic & Pessimistic Concurrency
Optimistic Concurrency Mechanisms
Allow a transaction to proceed to
completion.
When it attempts to commit, abort it if it
does not serialize with transactions
which have already completed.
Pessimistic Concurrency Mechanisms
Each operation in a transaction is only
allowed to proceed if it serializes with
overlapping transactions.
Otherwise it must wait (or possibly
cause its own or another transaction to
abort)
Lock-based concurrency is pessimistic
Lock-Based vs Non-Locking
Lock-Based Approaches
2PL with Deadlock Detection (in particular)
avoids unnecessary aborts, and the need for
an abort is determined as early as possible
Locking has significant constant overhead even
when there are few conflicts
Good approach when there is significant conflict
(but beware of the convoy phenomenon)
Non-Locking Approaches
Avoid overhead of locking
Susceptible to delayed or increased rate of aborts
Good approach when not too much conflict
Useful approach for distributed systems
Non-Locking Approaches
Validation-Based
Ensures serializability by pretending entire
transaction occurs at the time it commits
Optimistic: Allow entire transaction to execute,
but at commit time, validate it:
check that the data it used is not out-of-date
Timestamp-Based
Ensures serializability by pretending entire
transaction occurs at the time it starts (or an
arbitrary time!)
Assign transaction timestamp at that time
Pessimistic: Check every data access for
timestamp conflict
(Partially or Completely Optimistic if some or all
timestamp conflict checks are deferred to commit
time)
Issues for Non-Locking Protocols
Recoverability
– Occurs when a transaction T2 reads data written
by T1, and T2 commits before T1 completes.
– Major dangers:
• Durability Failure
• Non-recoverable actions taken at commit time, e.g. reports to the user, launching a nuclear missile
Avoid Cascading Aborts
– Occurs when a transaction T2 reads data written
by T1, and then T1 aborts
Phantoms
– Without index or predicate locks, how are
conflicts involving phantoms detected?
Database Versioning
Single Version
A single version of the database
state is used by all transactions
Multi-Version
The database state retains multiple
versions of each piece of data.
Each version is timestamped with
the time a new version of the
data item was written to the
database state.
How might multi-versioning be implemented using a log?
Implementing Multi-Versioning
Every modification log entry in a physiological log contains
a rowid and a before and/or after image.
Adding a timestamp to every such modification entry gives
us multi-versioning.
To find the value of a tuple at time T:
• Using an UNDO log
– Find that place in the log
– Look forwards until you find an entry for the tuple, and use
its before image.
– If you don't find such an entry in the log, use the tuple's
current state
• Using a REDO log
– Find that place in the log
– Look backwards until you find an entry for the tuple within
a committed transaction, and use its after image.
– If you don't find such an entry in the log, use the value in
the last complete DB backup (presuming the log begins
after the last backup)
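The UNDO-log walk described above can be sketched in Python (a toy model, not a real log manager: `log` is assumed to be a list of (timestamp, rowid, before_image) modification entries in log order, and `current` maps rowids to current values):

```python
def value_at(log, current, rowid, t):
    """Reconstruct the value of `rowid` as of time `t` using an UNDO log.

    `log` is a list of (timestamp, rowid, before_image) modification
    entries in log order; `current` maps rowids to current DB values.
    """
    # Scan forward from time t: the first later entry touching this
    # rowid still holds, in its before-image, the value the tuple had
    # at time t.
    for ts, rid, before in log:
        if ts > t and rid == rowid:
            return before
    # No later modification: the current state is the value at time t.
    return current[rowid]
```

For example, if `r1` was changed from 30 at t=1 and again from 100 at t=2, `value_at` at t=0 returns the first before-image (30), while at t=2 it falls through to the current state.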
Concurrency Control Mechanisms
Lock-Based
Approach: Lock-Based (Pessimistic)
DB Versioning: Single Version
Timestamp-Based Concurrency Control
Approach: Timestamp-Based
(generally Pessimistic)
DB Versioning: Single or Multi-Version
Optimistic Concurrency Control
Approach: Validation-Based (Optimistic)
DB Versioning: Single Version
Read-Consistent (not entirely serializable)
Approach:
Validation-Based (Optimistic) or
Lock-Based (Pessimistic, X locks ONLY!)
DB Versioning: Multi-Version
Timestamp-Based
Concurrency Control
Timestamp-Based Concurrency
Approach: Timestamp-Based
Use timestamps to serialize transactions
Pretends entire transaction occurs at the
time it starts
Assign transaction timestamp at that time
Pessimistic: Check every data access for
timestamp conflict (assuming no cache)
Versioning:
Single Version
A single version of the database state is used by
all transactions
Multi-Version
The database state retains multiple versions of
each piece of data.
Timestamp-Based Concurrency
(single version)
When a transaction T starts, assign a
(monotonically increasing) timestamp Tt to it
Associate each data item D with
– a read timestamp Dr
holds the timestamp of the latest reader
– a write timestamp Dw
holds the timestamp of the latest writer
Aborts some transaction (wound/die) if
– one transaction tries to perform an operation
(e.g. tries to read some data), and
– a conflicting operation (e.g. a write of the same
data) was already done by a transaction that
started later
Aborted transactions are restarted
with a new timestamp
Reading Too Late
Initially x = 30, with read timestamp Dr = t0
and write timestamp Dw = t0.

Transaction B (timestamp t2) executes
  update XT set x = 0
so x = 0 and Dw = t2.

Transaction A (timestamp t1) then executes
  select x from XT
This read needs to happen at t1, so it needs to
see x as it was at t1, but that's not available!

A is too old: abort it (or B) and
restart it with a later timestamp.
Writing Too Late
Initially x = 30, with Dr = t0 and Dw = t0.

Transaction B (timestamp t2) executes
  select x from XT
so Dr = t2 (x is still 30).

Transaction A (timestamp t1) then executes
  update XT set x = 0
This write needs to happen at t1, but a later
transaction read x before it was changed!

A is too old: abort it and
restart it with a later timestamp.
Thomas Write Rule
Initially x = 30, with Dr = t0 and Dw = t0.

Transaction B (timestamp t2) executes
  update XT set x = 70
so x = 70 and Dw = t2.

Transaction A (timestamp t1) then executes
  update XT set x = 0
This (blind) write needs to happen at t1, but a
later transaction already changed x; executing it
would mistakenly overwrite the later value.
So just ignore this operation. Don't do it!
(but what if B rolls back?)
Timestamp-Based Concurrency Rules
When T tries to read D
– if Tt < Dw, abort T
(can't read something written by a later transaction)
– else Dr ← max(Tt, Dr)
When T tries to write D
– if Tt < Dr, abort T
(can't write something read by a later transaction)
– else if Tt < Dw, ignore [Thomas Write Rule]
(already overwritten. Note: for blind writes only;
most updates are a read+write)
– else Dw ← Tt [shows it was written by T]
No locking: How do you
– Ensure Recoverability?
– Avoid Cascading Aborts?
– Prevent Phantom Read Problems?
How does it compare to 2PL?
Note that even reads require writes (of timestamps)
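The rules above can be sketched directly (a minimal model; `DataItem`, `ts_read`, and `ts_write` are hypothetical names, and `Abort` stands for "abort and restart with a new timestamp"):

```python
class Abort(Exception):
    """Transaction must abort and restart with a new timestamp."""

class DataItem:
    def __init__(self, value, r=0, w=0):
        self.value, self.r, self.w = value, r, w   # value, Dr, Dw

def ts_read(tt, d):
    if tt < d.w:                     # written by a later transaction
        raise Abort
    d.r = max(tt, d.r)
    return d.value

def ts_write(tt, d, value):
    if tt < d.r:                     # read by a later transaction
        raise Abort
    if tt < d.w:                     # Thomas Write Rule: a later blind
        return                       # write already overwrote it -- skip
    d.value, d.w = value, tt
```

The "Reading Too Late" slide is then `ts_write(2, x, 0)` followed by `ts_read(1, x)`, which raises `Abort`; the Thomas Write Rule slide is `ts_write(2, y, 70)` followed by `ts_write(1, y, 0)`, which is silently ignored.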
Recoverability & Cascading Abort
Initially x = 30, with Dr = t0 and Dw = t0.

Transaction A (timestamp t1) executes
  update XT set x = 0
so x = 0 and Dw = t1.

Transaction B (timestamp t2) executes
  update XT set x = x + 100
reading the value A wrote, so x = 100 and
Dr = Dw = t2. B then requests COMMIT.

B's commit can't be allowed yet; B must wait
until A completes. A then ROLLBACKs.

t2 reads data written by t1:
If t2 commits before t1 completes,
we may lose recoverability.
t2 must wait for t1 to commit.
If t1 aborts, t2 will need to abort as well.
Timestamp-Based
Multiversion Concurrency Control
Like single-version timestamp-based, but
avoids aborting transactions
that read earlier versions of data
 essential for long transactions
Maintain versions for each data item
– When a transaction first writes a data item, a
new version of the data item is created labeled
with that transaction's timestamp
– Subsequent writes by same transaction rewrite
that version
– A transaction simply finds the appropriate
version to read. Can't read out of order.
– Read only transactions are never aborted.
– Read-only transactions can timewarp -- use an
arbitrary timestamp in the past to obtain
information about the system at that time
Reading Correct Version
Initially there is a t0 version of x with value 30.

Transaction B (timestamp t2) executes
  update XT set x = 0
creating a t2 version of x with value 0.

Transaction A (timestamp t1) then executes
  select x from XT
This read needs to happen at t1, so it finds the
correct version to read: the t0 version, whose
old value (30) is retained by the undo log!
Writing Conflicting Version
Initially there is a t0 version of x with value 30.

Transaction C (timestamp t3) executes
  update XT set x = 0
creating a t3 version of x.

Transaction B (timestamp t2) executes
  select x from XT
reading the t0 version (value 30) and setting its
read timestamp to t2.

Transaction A (timestamp t1) then executes
  update XT set x = 70
We could create a t1 version of x.
However, B (at t2) already read the t0 version of x.
So we must either abort A or B, and
restart it with a later timestamp.
Timestamp-Based
Multiversion Concurrency Rules
Find V
The version of D
with the largest write timestamp <= Tt
When T tries to read D
- read V
- set Vr ← max(Tt, Vr)
When T tries to write D
- if Tt < Vr, abort T
(can't write something that should have been
read, but wasn't)
- else if Tt = Vw, overwrite it
- else create a new version V' with V'w = Tt
What about recoverability,
avoiding cascading aborts & phantoms?
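The multiversion rules can be sketched as follows (a minimal model with hypothetical names; `versions` is the list of versions of one data item):

```python
class Abort(Exception):
    """Transaction must abort and restart with a later timestamp."""

class Version:
    def __init__(self, value, w, r=0):
        self.value, self.w, self.r = value, w, r   # value, Vw, Vr

def find_version(versions, tt):
    # V = the version with the largest write timestamp <= Tt
    return max((v for v in versions if v.w <= tt), key=lambda v: v.w)

def mv_read(versions, tt):
    v = find_version(versions, tt)
    v.r = max(tt, v.r)
    return v.value

def mv_write(versions, tt, value):
    v = find_version(versions, tt)
    if tt < v.r:          # a later transaction read V, and should have
        raise Abort       # seen this write instead
    if tt == v.w:
        v.value = value   # rewrite this transaction's own version
    else:
        versions.append(Version(value, w=tt))
```

The "Writing Conflicting Version" slide is then: C writes at t3, B reads at t2 (marking the t0 version), and A's write at t1 raises `Abort`.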
Client Caches
Cache vs Non-Cache Mechanisms
Non-Cache-Based Mechanisms
All data reads/updates are made directly
from/to the server database state
Cache-Based Mechanisms
A separate cache is maintained by/for
each client
All data updates are made to the cache,
and only written to the server
database state on commit (often
checking first whether this is allowed)
Data needed for a query may be able to
be read directly from the cache.
What are the advantages of cache-based mechanisms?
Advantages of Caching
Efficient reads: If needed data is in the
cache, the cost of a DB query may be
avoided
Deferred updates: Updates are made to the
cache, and deferred to commit time, where
they can be done all at once (or not at all if
the transaction aborts)
Simplifies Recovery Management: Because
changes are not made to the DB state until
commit time, they don't need to be undone.
Avoid Pessimistic Overhead: While caches
are still useful with pessimistic approaches
(locking & timestamp-based concurrency),
caches support optimistic approaches where
serializability checking is deferred and done
once at commit time.
Client Cache Placement
Server-Managed Client Cache
The DB server maintains a cache on behalf
of each client.
Allows its implementation to be integrated
with the server page cache & log.
Most useful for RDBs

Client-Side Cache
Each client cache is maintained at the client.
Can significantly improve performance when
the client runs on a different machine than
the database server.
Most useful for OODBs & Distributed Systems
Client Caches
• A separate cache is maintained by/for each client
• If a transaction needs to read a data item D
(for an RDB, data items correspond to tuples)
– If D is not in the cache, get it from the server DB
state, and put it in the cache
– If D is already in the cache, get it from the cache
[though if it is old or stale, and not yet used in the
current transaction, the latest value may be obtained
from the server DB state]
• All data updates are made to the cache. [This
requires integrating items returned from server queries
with items already in the cache]
• When the transaction commits, all data updated by
the transaction is written back from the client's
cache to the server database state.
Note: Don't confuse the server page cache
(used for performance and recovery) with
client caches (used primarily for isolating transactions)
Client Cache Types for Concurrency
Simple Cache
The cache simply maintains the data
read/written by the transaction
Timestamped Cache
Each piece of data obtained from the
database server is associated with the
server's timestamp for that data (the
time that data was last updated)
Snapshot Cache
The client's cache holds a "virtual"
snapshot of the current database state
made when the transaction starts
Cache Data Lifetimes
Transaction Lifetime
– The data in a client's cache is cleared
at the end of each transaction – i.e.
every transaction starts with an empty
cache
Session Lifetime
(Timestamped Caches Only)
– The data remains in the cache at the
end of a transaction. At the start of a
transaction, the cache may contain
data used in the client's previous
transactions.
• Does require a (relatively simple) local undo
mechanism if a transaction aborts
Tuple Caching
For Relational Databases, the most natural
items to cache are tuples, along with the
server's ROWID for each.
When a client executes
SELECT * FROM Emps
WHERE deptno = 10
the tuples are returned from the server
along with the server's ROWID for each
one
The tuples (along with their ROWID's) are
retained in the cache.
Partial Queries
Suppose the cache is empty, and
the very first client operation
in a transaction is
SELECT empno, sal FROM Emps
WHERE job = 'CLERK'
How should the client cache manager
process this query?
Consider what happens if the same
request is made later in the transaction.
Partial Query Alternatives
1. Send the request to the server.
Cache nothing.
Reasonable
2. Send the request to the server.
Cache the results for reuse.
Possible, but would significantly complicate
concurrency and cache management –
it would require having field-level granularity,
not just tuple-level granularity.
3. Request & cache whole tuples from the server
SELECT * FROM Emps WHERE job = 'CLERK'
Then execute the original query locally against
the cached tuples
Makes most use of cache, cost of downloading
tuples worth it if they will be used in future queries
Tuple Overlap Problem
Suppose a client executes
SELECT * FROM Emps
WHERE deptno = 10
Later in the same transaction, the client executes
SELECT * FROM Emps
WHERE job = 'CLERK'
If there are clerks in department 10,
the tuples will overlap.
In general, if a query includes a tuple which
is already in the cache, what should be done?
Tuple Query & Replacement
The query below is shipped to the server
SELECT * FROM Emps
WHERE job = 'CLERK'
It returns a set of tuples, each with its rowid
• If the rowid is not in the cache
– add the tuple (and its rowid) to the cache, and
include it in the query result set
• If the rowid is in the cache, and its tuple has been
used in the current transaction
– throw away the tuple returned from the server;
add the tuple in the cache to the query result set
• If the rowid is in the cache, but its tuple has not yet
been used in the current transaction
– replace the tuple in the cache; put the newly
returned tuple in the query result set
Are there ways to make this more efficient?
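The three cases above can be sketched as a merge routine (a toy cache model; `merge_results` and its arguments are hypothetical names):

```python
def merge_results(cache, used, server_rows):
    """Merge server query results into the client cache.

    `cache` maps rowid -> tuple; `used` is the set of rowids already
    read or written by the current transaction; `server_rows` is a
    list of (rowid, tuple) pairs returned by the server.
    """
    result = []
    for rowid, row in server_rows:
        if rowid not in cache:
            cache[rowid] = row           # new tuple: cache it
            result.append(row)
        elif rowid in used:
            result.append(cache[rowid])  # already used: the cached
                                         # (possibly updated) copy wins;
                                         # drop the server copy
        else:
            cache[rowid] = row           # unused: refresh from server
            result.append(row)
    return result
```

One obvious efficiency gain is not shipping the contents of already-used tuples from the server at all, which is exactly what the tuple-filtering slide that follows describes.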
Tuple Filtering
Server Filtering
– The server can itself keep track of the rowids of
tuples it has sent the client during a transaction
– For tuples (that match the query) that it has
already sent the client for this transaction, it just
sends rowid (it sends the rowids + contents for
the other tuples)
Client Filtering (for Session Lifetime Caches)
– The client cache manager may be able to identify
(via a local query) tuples that were in its cache
prior to the current transaction which match the
query.
– It can send the rowids and timestamps of those
tuples to the server, and the server will only send
those tuples back if they have been updated more
recently
Predicate Caching
The client cache manager keeps track of the fact
that the results of
SELECT * FROM Emps WHERE deptno = 10
are already in the cache, specifically saved as
Cache-Emps-deptno-10.
So it just sends the server the query
SELECT * FROM Emps WHERE job = 'CLERK' AND
(deptno != 10 OR deptno IS NULL)
It also locally executes the query
SELECT * FROM Cache-Emps-deptno-10
WHERE job = 'CLERK'
and builds the result set for the original query by
union-ing these two result sets
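With tuples modeled as dicts and predicates as Python functions, the split-and-union can be sketched as follows (a real cache manager would rewrite SQL instead; all names here are hypothetical):

```python
def run_query(server_rows, cached_rows, pred, cached_pred):
    """Evaluate `pred` by querying the server only for tuples NOT
    covered by the cached predicate, then unioning with a local scan.

    `server_rows`: all tuples at the server; `cached_rows`: the cached
    tuples (those satisfying `cached_pred`); `pred` and `cached_pred`
    are boolean functions over a tuple.
    """
    # The part shipped to the server: pred AND NOT cached_pred
    from_server = [r for r in server_rows
                   if pred(r) and not cached_pred(r)]
    # The part executed locally against the cache
    from_cache = [r for r in cached_rows if pred(r)]
    return from_server + from_cache
```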
Querying Updated Data
Suppose a client executes
UPDATE Emps SET sal = sal + 100
WHERE deptno = 10
How is the update implemented?
Later in the same transaction, the client executes
SELECT * FROM Emps
WHERE job = 'CLERK'
If there are clerks in department 10,
the salary retrieved for those clerks had better be
the new salary.
What are the implications for query processing?
Updates & Queries
The client cache manager first processes the query
SELECT * FROM Emps
WHERE deptno = 10
and places any new tuples retrieved (i.e. ones not
already used in the current transaction) in the cache.
Then, the UPDATE command is applied to the local
tuples in the cache.
The problem of querying updated data should make it
clear why the client cache manager MUST make sure
that result sets do NOT include tuples used (and
specifically modified) by the current transaction.
Processing Complex Queries
Processing more complex queries, even simple
aggregate queries, can be challenging when a cache
is used. Consider
SELECT deptno, sum(sal)
FROM Emps WHERE job != 'CLERK'
GROUP BY deptno
if some employees' salaries have already been
updated in the transaction.
The simplest approach is for the client cache
manager to first process
SELECT * FROM Emps WHERE job != 'CLERK'
and place any new tuples retrieved (i.e. ones not
already used in the current transaction) in the
cache.
Then, the more complex query is applied to the local
tuples in the cache. Predicate caching allows more
complex and clever solutions.
Concurrency Control Mechanisms & Caching
Lock-Based
Approach: Lock-Based (Pessimistic)
DB Versioning: Single Version
Caching: None
(Simple Client-side caches can increase performance for
distributed clients; all locking is still done at the server)
Timestamp-Based
Approach: Timestamp-Based (Pessimistic)
DB Versioning: Single or Multi-Version
Caching: None
(Simple Cache can optionally be used;
approach then becomes completely or partly Optimistic)
Optimistic Concurrency Control
Approach: Validation-Based (Optimistic)
DB Versioning: Single Version
Caching: Simple, Timestamp
(Can use Snapshot Cache, but same effect as Simple Cache)
Read-Consistent (not entirely serializable)
Approach: Validation-Based (Optimistic) or
Lock-Based (Pessimistic, X locks ONLY!)
DB Versioning: Multi-Version
Caching: Snapshot
Caching with
Timestamp-Based Concurrency
A simple client cache can be used with
timestamp-based concurrency
– Initial read of a data item in a transaction must
contact the DB server to set that data item's read
timestamp (and get the current value if the
cached copy is stale)
– All updates are made to the cache
– At commit time, attempt is made to write all
updated data from cache to server database
state, with the client's timestamp (of course)
Characteristics
– Some transactions that completed without
cache will abort with cache (due to a read by a
later transaction between write & commit)
– Commits no longer need to wait for completion
of transactions whose writes they read (since
no writes are done until commit)
– Fully optimistic if DB is multi-versioned, else
partially pessimistic, since reads can fail.
Limitations of
Timestamp-Based Concurrency
• Transactions that try to write a data item may
need to abort because of conflicts with
– other incomplete concurrent transactions
(that might abort), or
– even transactions that have already aborted,
which had read that data item
• When a cache is used, and a data item is in the
cache, the initial read of a data item in a
transaction must still contact the DB server
– to get the latest version, if necessary, and
– to set that data item's read timestamp (adding
significant overhead)
Optimistic Concurrency Control
overcomes these limitations
Optimistic
Concurrency Control
with a
Timestamped Cache
Optimistic Concurrency Control
• Assumes (optimistically) that a
transaction will not have conflicts with
other transactions. Does not set locks;
avoids their overhead.
• Cache-Based: Reads all data from and
writes all data to its client cache.
• Validation-Based: When the transaction
commits, writes all changes back to the
DB server, but only after validating that
the data it used during the transaction
is still up-to-date.
This effectively allows the system to act
as if the entire transaction happened at
the time of the commit.
WHY?
Optimistic Concurrency Control
is Cache-Based
• A separate cache is maintained by/for each client
• If a transaction needs to read a data item D
– If D is not in the cache, get it from the server
database state, and put it in the cache. Note: it is
impossible to read another transaction's uncommitted
modifications!
– If D is already in the cache, get it from the cache
(though if it is old or stale, and not yet used in this
transaction, the latest value may optionally be
obtained from the server database state)
• All WRITEs and UPDATEs are made to the cache
• When the transaction commits, all data written or
updated by the transaction is written back from the
client's cache to the server database state
• Often, the cache is cleared at the end of each
transaction
Optimistic Concurrency Control
is Validation-Based
– When a transaction requests a commit,
the transaction must first be validated
– Optimistic Concurrency uses R/W
Validation:
• (R) Suppose T reads some data D
• (W) Suppose another transaction commits and
writes a later version of that same data, D'
• Then T commits. OOPS! If we are pretending
that the entire transaction happened at commit
time, then T should have read D', not D
– If R/W Validation finds any such R/W
inconsistencies, then the validating
transaction aborts instead of committing!
Does this ensure Recoverability, avoid Cascading
Aborts, and prevent Phantoms?
Client Cache Types (Again)
Simple Cache
The cache simply maintains the data
read/written by the transaction
Timestamped Cache
Each piece of data obtained from the
database server is associated with the
server's timestamp for that data (the
time that data was last updated)
Snapshot Cache
The client's cache holds a "virtual"
snapshot of the current database state
made when the transaction starts
Optimistic Concurrency Control
with a Timestamped Cache
Uses single version database state
But associated with each piece of data is the time
it was last committed (its server timestamp)
When a transaction reads data from the server
Get the data from the database state. Also,
get (and remember) the data's server timestamp
R/W Validation
– For each piece of data read by the validating
transaction, compare its cache time to the data's
server timestamp
– Fail validation (and abort the transaction) if any
data's server timestamp is after the cache's
timestamp for the data
– (More complexity added if validation + commit
is not atomic)
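The R/W validation check can be sketched as follows (a minimal model; validation + commit is assumed here to be atomic, and all names are hypothetical):

```python
def validate(read_timestamps, server_timestamps):
    """R/W validation with a timestamped cache.

    `read_timestamps` maps each data item read by the validating
    transaction to the server timestamp it had when it was cached;
    `server_timestamps` maps items to their current server timestamps
    (the commit time of the last writer).
    """
    for item, cached_ts in read_timestamps.items():
        if server_timestamps[item] > cached_ts:
            return False      # someone committed a later version: abort
    return True               # all reads still up to date: commit
```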
Unrelated Changes
Transaction A executes
  update XT set x = x + 100   (t1)
and COMMITs.

Transaction B, overlapping A, executes
  select y from YT   (t2)
and tries to COMMIT at t3: Validation succeeds!
A committed while B was running, but it did not
change anything B read – y has not changed.
Read After Commit
Initially x = 30 (server timestamp t0).

Transaction A executes
  update XT set x = x + 100   (t1)
so A's cached value Ax = 130, and COMMITs at t2,
setting the server's x = 130 with server timestamp t2.

Transaction B then executes
  select x from XT   (t3)
caching Bx = 130 along with its server timestamp t2.

B tries to COMMIT: Validation succeeds!
x has not changed since B read it.

x is changed by A, which overlaps with B,
but B read x after A committed.
No transaction that updated x commits
between the time B reads it and the time B commits.
Validation Causes Abort
Initially x = 30 (server timestamp t0).

Transaction A executes
  update XT set x = x + 100   (t1)
so Ax = 130 in A's cache.

Transaction B executes
  select x from XT   (t2)
caching Bx = 30 with server timestamp t0.
B reads a value that doesn't reflect A's write,
even though A commits before B.

A COMMITs at t3, setting the server's x = 130
with server timestamp t3.

B tries to COMMIT at t4, but validation fails!
B read the t0 version of x, and then tries to
commit at t4; x was last persistently updated
(by A) at time t3 => OOPS!
Why Should Read-Only Transactions Fail?
Initially x = 30 and y = 30 (both timestamped t0).

Transaction A executes
  update XT set x = x + 100   (t1)
  update YT set y = y + 100   (t2)
and COMMITs at t3, setting x = 130 and y = 130.

Transaction B, overlapping A, executes
  select x from XT
before A's commit (reading 30), and
  select y from YT
after A's commit (reading 130).

COMMIT? NO!
If B's commit succeeds,
then the parent application might tell the user
inconsistent values for x & y.
This violates View Serializability!
Lost Update Problem
Suppose transactions A & B simultaneously
deposit $1000 into a checking account,
whose initial balance is $5000
1) select balance into curbal from checking
where acctid = 30792;
2) update checking
set balance = curbal + 1000
where acctid = 30792;
3) COMMIT
Optimistic Concurrency Prevents Lost Updates!
Preventing Lost Updates
Initially balance = 2300.

Transaction B executes
  curbal ← balance            (t1: reads 2300)

Transaction A executes
  curbal ← balance            (reads 2300)
  balance ← curbal + 1000     (caches 3300)
and COMMITs at t4, setting the server's
balance = 3300.

Transaction B then executes
  balance ← curbal + 1000     (caches 3300)
and tries to COMMIT at t5: Validation fails!

Why does B's commit fail?
Lost Updates Fail R/W Validation!
Recap of the timeline:
  t1: B executes curbal ← balance (reading 2300)
  t4: A COMMITs, setting balance = 3300
  t5: B executes balance ← curbal + 1000
      and tries to COMMIT
At t1, B reads balance (and timestamps it at t1)
When B tries to commit at t5, validation discovers
that balance now has a later timestamp (t4).
In other words, this failure is due entirely to failing
R/W validation (B reads balance, A then commits,
changing balance before B commits). The fact that
B is also trying to write balance is IRRELEVANT!
In locking, this could be prevented entirely by
R/W conflicts in the grant matrix.
When do lock-based systems need to disallow W/W?
The Inconsistent Update Problem
Transaction A
1) UPDATE XT SET x = 10
2) UPDATE YT SET y = 10
Transaction B
1) UPDATE XT SET x = 80
2) UPDATE YT SET y = 80
Serial Schedule
A1 A2 B1 B2
 x:80 y:80
Serial Schedule
B1 B2 A1 A2
 x:10 y:10
NON-SERIALIZABLE
Schedule
A1 B1 B2 A2
 x:80 y:10
The non-serializable schedule is caused
by a real W/W conflict.
How does Optimistic Concurrency Control
prevent it?
Preventing Inconsistent Updates
Initially x = 30 and y = 30.

Transaction A executes (in its cache)
  x ← 10   (t1)
  y ← 10   (t3)
and COMMITs at t4, setting x = 10, y = 10.

Transaction B executes (in its cache)
  x ← 80   (t2)
  y ← 80
and COMMITs at t5, setting x = 80, y = 80.

Notice: Neither A nor B read anything, so both validate
successfully. Inconsistent Updates are prevented so long as
multiple COMMITs (i.e. Validates + Writes) do not overlap
Phantoms
Transaction B executes
SELECT * FROM T
WHERE name = 'JONES'
AND val = 111

Transaction A executes
UPDATE T SET name = 'JONES'
WHERE name = 'SMITH'
AND val = 111
COMMIT

Transaction B then executes
SELECT * FROM T
WHERE name = 'JONES'
AND val = 111
COMMIT?

If Tuple Caching is used,
what's the result of this query?
How can validation failures be
ensured when (potential)
phantom read violations occur?
Validation & Cache Placement
Server-Managed Client Cache
When the client tells the server to COMMIT, the
server validates based upon both the server data
and the data in the server-managed client cache.

Client-Side Cache
When the client tells the server to COMMIT, it
must pass along the information needed for
validation & commit:
• the timestamps of all data read in the transaction
• the contents of all data written
Actually, the timestamps of the data read during
the transaction can be kept at the server
Cache Data Lifetime
Some optimistic concurrency
algorithms automatically clear
the client cache at the end of
every transaction.
1) Explain how the decision to clear/not clear the cache
affects the correctness and performance of optimistic
concurrency control.
2) If the cache is not cleared, when should the cache
consider obtaining a newer version of a data item from the
server database state. Why?
Transaction-Based
Validation
Transaction-Based Validation
If all transactions start out with an empty
client cache
– Then a transaction can only have R/W validation
conflicts with data written by overlapping
transactions which have already completed
Rather than having the DB state retain last
timestamp of each piece of data written
– Each committed transaction keeps track of data
items it committed
R/W Validation:
– Compare read set of validating transaction with
write set of each completed overlapping
transaction
– If a completed transaction's write set has a piece of
data in common with the validating transaction's
read set, see if the commit time of the completed
transaction is later than the time the validating
transaction read the data. If so, abort.
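The transaction-based R/W comparison can be sketched as (a toy model with hypothetical names; `completed` holds (commit_time, write_set) pairs for overlapping committed transactions):

```python
def tb_validate(read_times, completed):
    """Transaction-based R/W validation.

    `read_times` maps each item the validating transaction read to the
    time it read it; `completed` is a list of (commit_time, write_set)
    pairs for overlapping transactions that already committed.
    """
    for commit_time, write_set in completed:
        for item, read_time in read_times.items():
            if item in write_set and commit_time > read_time:
                return False   # read a version that the committed
                               # transaction later overwrote: abort
    return True
```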
Using Transaction-Based Validation
Initially x = 30.

Transaction B executes
  select x from XT   (t1: reads 30)
B reads a value that doesn't reflect A's write,
even though A commits before B does.

Transaction A executes
  update XT set x = x + 100
and COMMITs at t2, setting x = 130.

B's Validation:
Overlapping committed transactions: { A }
A's commit time: t2
A's writes: { x }
B's read time of each: { x:t1 }
t1 < t2 => B aborts
Forward Validation
Initially x = 30.

Transaction A executes
  update XT set x = x + 100   (t1)

Transaction B executes
  select x from XT   (t2: reads 30)

A COMMITs at t3 (setting x = 130) and validates
forward, against transactions still in process:

A's Validation:
Transactions in Process: { B }
A's write set: { x }
B's read set: { x }
Overlap => A wounds B, or A dies

Because of the R/W conflict with A, B will
eventually fail its own validation and abort,
so just kill B as part of A's commit (B ABORTs).

How does this change the Lost Update Problem?
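The forward check at the committer can be sketched as (hypothetical names; a toy model only):

```python
def forward_validate(committer_write_set, in_process_read_sets):
    """Forward validation at commit time: return the in-process
    transactions that must be wounded because their read sets overlap
    the committer's write set.

    `in_process_read_sets` maps a transaction id to the set of items
    that transaction has read so far.
    """
    return [tid for tid, reads in sorted(in_process_read_sets.items())
            if reads & committer_write_set]
```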
Concurrent Commits & Inconsistent Updates
Initially x = 30 and y = 30.

Transaction A executes (in its cache)
  update XT set x = 10
  update YT set y = 10
then COMMITs (Validate/Write).

Transaction B, concurrently, executes (in its cache)
  update XT set x = 80
  update YT set y = 80
then COMMITs (Validate/Write).

Inconsistent Updates are prevented so long as
we do not allow COMMITs to overlap.

But if we allow commits to run concurrently
without using any kind of locking, how can we
prevent inconsistent updates?
Concurrent Commit and W/W Validation
Initially x = 30 and y = 30.

Transaction A executes (in its cache)
  update XT set x = 10
  update YT set y = 10
then COMMITs (Validate/Write).

Transaction B, concurrently, executes (in its cache)
  update XT set x = 80
  update YT set y = 80
then COMMITs.

Because B starts its commit after A started to commit,
B does R/W Validation against A.
Because B starts its commit before A completes its
COMMIT, B's write set and A's write set must NOT
overlap. They do overlap, so B's Validation FAILS!

Inconsistent Updates are prevented by W/W validation
against overlapping commits which already started
Copyright © Ellis Cohen, 2002-2005
67
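The extra W/W check for overlapping commits can be sketched as follows (names and data shapes invented for this sketch): a transaction entering its commit phase must have a write set disjoint from every commit that started earlier and has not yet finished.

```python
# W/W validation for concurrent commits (illustrative sketch).

def ww_validate(write_set, overlapping_commit_write_sets):
    """overlapping_commit_write_sets: write sets of commits that started
    before this one and have not yet completed.  Returns True if this
    commit may proceed concurrently with them."""
    return all(not (write_set & ws) for ws in overlapping_commit_write_sets)

# A is mid-commit writing {x, y}; B then starts committing {x, y}: B fails.
assert ww_validate({"x", "y"}, [{"x", "y"}]) is False
# Disjoint write sets may commit concurrently without inconsistency.
assert ww_validate({"z"}, [{"x", "y"}]) is True
```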
Optimistic Concurrency
Control with a
Simple/Snapshot
Cache
68
Optimistic Concurrency Control
with a Simple Cache
Like using a timestamped cache
But don't remember the time at which each piece of data was read
Just assume all data was read when the transaction
started
Simpler concurrency control, but leads to
unnecessary aborts
R/W Validation:
– Compare read set of validating transaction with
write set of each completed overlapping
transaction
– If a completed transaction's write set has a piece
of data in common with the validating
transaction's read set, abort (assume that the
validating transaction read the data BEFORE the
other transaction committed)
– Still do forward validation aborts at commit
and W/W Validation if commits overlap
69
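Without read timestamps, the validation rule above collapses to a pure set-intersection test. A minimal sketch (invented names): any overlap between the validating transaction's read set and a completed overlapping transaction's write set forces an abort, even if the read actually happened after that commit.

```python
# Simple-cache R/W validation (illustrative sketch): no read timestamps,
# so every read is pessimistically assumed to predate every overlapping
# commit.

def simple_validate(read_set, completed_write_sets):
    """completed_write_sets: write sets of overlapping committed txns.
    Returns True if the validating transaction may commit."""
    return all(not (read_set & ws) for ws in completed_write_sets)

# B read x; overlapping committed A wrote x: B aborts regardless of
# when the read actually happened (the unnecessary-abort case).
assert simple_validate({"x"}, [{"x"}]) is False
assert simple_validate({"y"}, [{"x"}]) is True
```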
Simple Cache Validation
Causes Unnecessary Abort

Transaction A:
  t1: update XT set x = x + 100    (Ax: 30 -> 130)
  t2: COMMIT                       (x: 30 -> 130)

Transaction B:
  t3: select x from XT             (Bx: 130, read AFTER A committed)
  ABORT

B's Validation:
  Overlapping committed transactions: { A }
  A's write set: { x }
  B's read set: { x }
  Overlap => B aborts

Even though x was read AFTER A committed, the cache doesn't retain
that information, so validation assumes it was read BEFORE!
This would succeed using a timestamped cache!
70
Abort on Post-Commit Read
Suppose
– Some transaction T reads a piece of data
D that is not in its simple cache, and
– D was written by an overlapping
committed transaction
Then when T tries to validate
– T will abort (validation will assume that T
read D BEFORE the other transaction
committed it)
So
– Can just abort T immediately!
71
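The early-abort rule above can be sketched as a check performed at read time rather than at validation time. The names (`read_item`, the dict/list shapes, the use of an exception for abort) are invented for this sketch.

```python
# Abort on post-commit read (illustrative sketch): if T reads an item
# not in its simple cache that an overlapping committed transaction has
# written, validation would abort T anyway, so abort immediately.

def read_item(item, cache, db, overlapping_committed_writes):
    if item not in cache:
        if any(item in ws for ws in overlapping_committed_writes):
            raise RuntimeError("abort: post-commit read of " + item)
        cache[item] = db[item]   # first read: fill the simple cache
    return cache[item]

db = {"x": 130}
try:
    read_item("x", {}, db, [{"x"}])   # overlapping committed A wrote x
    aborted = False
except RuntimeError:
    aborted = True
assert aborted                         # T aborts at the read, not later
assert read_item("y", {}, {"y": 5}, [{"x"}]) == 5   # unrelated item: ok
```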
Abort on Post-Commit Read Example

Transaction A:
  t1: update XT set x = x + 100    (Ax: 30 -> 130)
  t2: COMMIT                       (x: 30 -> 130)

Transaction B:
  select x from XT => ABORTs immediately

Overlapping committed transactions: { A }
A's write set: { x }
B is trying to read x, which is in the write set of an overlapping
committed transaction, so abort it!

This would succeed using a timestamped cache!
72
Optimistic Concurrency Control
with a Snapshot Cache
Snapshot Cache
The transaction's cache holds a "virtual" snapshot of
the current database state made when the
transaction started
R/W Validation
– Compare read set of validating transaction with
write set of each completed overlapping
transaction
– If a completed transaction's write set has a piece
of data in common with the validating transaction's
read set, abort (because the validating transaction's
snapshot was made BEFORE the other
transaction completed)
– Also do forward validation aborts at commit, and
W/W validation if commits overlap
– Validation ensures that current value of data read
is the same as its initial value, so the effect is
identical to use of the Simple Cache
73
Using Snapshot Cache

Transaction A:
  t1: update XT set x = x + 100    (Ax: 30 -> 130)
  t2: COMMIT                       (x: 30 -> 130)

Transaction B:
  select x from XT                 (Bx: 30, reads the snapshot value
                                    instead of the current value)
  ABORT

Result is identical to use of a simple cache.
In both cases, a transaction aborts if it tries to read a
value written by an overlapping committed transaction
74
Snapshot-Based
Concurrency Control
75
Snapshot-Based
Concurrency Controls
A snapshot cache holds a "virtual" snapshot
of the current database state made when
the transaction started
Can be used for two different kinds of
concurrency control mechanisms
• Optimistic
– Validation-Based (with R/W Validation)
Discussed above.
• Read-Consistent
– Validation-Based (with W/W Validation)
– Lock-Based (with eXclusive locks ONLY!)
76
Snapshot-Based Features
• Transactions see a "consistent"
view of the database (i.e. as it
was when the transaction started)
• Avoid Dirty Reads,
Non-Repeatable Reads
& Phantom Reads
What if we use snapshots without validating on commit?
77
Snapshots without Validation

Transaction A:
  update XT set x = x + 100    (Ax: 30 -> 130)
  COMMIT                       (x: 30 -> 130)

Transaction B (snapshot taken before A's commit):
  update XT set x = x + 100    (Bx: 30 -> 130, computed from the snapshot)
  COMMIT                       (x: 130, so A's update is lost)

If Read-Write transactions use snapshots,
but don't validate, they are prone to Lost Updates
78
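The scenario on the slide above can be replayed in a few lines of Python. This is a toy model with invented names; it only shows why both increments start from the same snapshot value.

```python
# Snapshot reads without commit-time validation lose updates: both
# transactions compute from their start-of-transaction snapshot of x.

db = {"x": 30}
snap_a = dict(db)            # A's snapshot at its start
snap_b = dict(db)            # B's snapshot at its start (A not committed)

a_write = snap_a["x"] + 100  # A: update XT set x = x + 100
db["x"] = a_write            # A commits: x = 130

b_write = snap_b["x"] + 100  # B still computes from its snapshot value 30
db["x"] = b_write            # B commits: x = 130, A's update is lost

assert db["x"] == 130        # x would be 230 under any serial order
```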
Snapshots & Lost Updates
• All Snapshot-Based concurrency
mechanisms avoid dirty reads,
non-repeatable reads &
phantom reads (i.e. avoid most
R/W conflicts)
• Read Consistent concurrency
mechanisms also avoid lost &
inconsistent updates by
ensuring there are no W/W
conflicts
79
Avoiding Lost Updates
Detect W/W Conflicts
– Approach: Validation-Based (W/W)
– Look at overlapping transactions that
already completed.
– Fail validation & abort if any of them
wrote same data that this transaction
wrote
– Neither reads nor writes wait
Prevent W/W Conflicts
– Approach: Lock-based (X only)
– Use X locks to prevent concurrent
writes of the same data
– Writes only wait for other writes
– Blocked writers ABORT if lock holder
commits
80
Oracle's Lock-Based
Read-Consistent Concurrency Control
Each client uses a snapshot cache
– holds a "virtual" snapshot of the current
database state made when the
transaction started
Transactions obtain X locks before
writing data.
– S locks are not obtained by default, so
– Reads never wait & never block writes
If a transaction must wait on a lock,
– If the transaction holding the lock aborts,
continue the waiting transaction
– Else if the transaction holding the lock
commits, abort the waiting transaction
81
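The wait rule above reduces to a simple decision at lock release. A minimal sketch (the function name and string outcomes are invented): a transaction blocked on an X lock continues only if the holder aborts; if the holder commits, the waiter is aborted instead, because its snapshot is now stale.

```python
# Wait-on-X-lock rule in lock-based read-consistent control
# (illustrative sketch).

def resolve_blocked_writer(holder_outcome):
    """holder_outcome: how the lock-holding transaction ended.
    Returns the fate of the transaction waiting on its X lock."""
    if holder_outcome == "abort":
        return "continue"   # lock released, no conflicting write remains
    if holder_outcome == "commit":
        return "abort"      # waiter's snapshot no longer reflects x
    raise ValueError("holder must end in 'commit' or 'abort'")

assert resolve_blocked_writer("commit") == "abort"    # no lost update
assert resolve_blocked_writer("abort") == "continue"  # waiter proceeds
```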
X Locks Can't Just Wait!

Transaction A:
  update XT set x = x + 100    (Ax: 30 -> 130)
  COMMIT                       (x: 30 -> 130)

Transaction B:
  update XT set x = x + 100    (blocks until A commits, but still
                                sees 30 in its snapshot instead of 130,
                                so Bx: 30 -> 130)
  COMMIT                       (x: 130, so A's update is lost)

Adding X Locks does not prevent Lost Updates if locks just block.
Lost Updates are only prevented
if lock conflicts only allow one of the conflicting
transactions to commit
82
Snapshots & Serializability
• All Snapshot-Based
concurrency mechanisms avoid
dirty reads, non-repeatable
reads & phantom reads
• Read Consistent concurrency
mechanisms avoid lost &
inconsistent updates
Does that mean that Read Consistent
concurrency mechanisms
are serializable?
83
Extended Isolation Levels

Read Uncommitted
  Lost Updates are prevented,
  but Dirty Reads are still possible
Read Committed
  Dirty Reads are also prevented,
  but Non-Repeatable Reads are still possible
Repeatable Read
  Non-Repeatable Reads are also prevented.
  Phantoms are still possible
Snapshot
  Phantoms are also prevented.
  Constraint Violations are still possible.
  This is the isolation level of Read Consistent
  transactions. Oracle calls it serializable. It's NOT!
Serializable
  All problems are prevented.
  Complete isolation
84
Read-Only Snapshot Concurrency
Read-Only Transactions in systems
which are Snapshot-Based …
• can act as if they commit when the
transaction starts (when the
snapshot is taken)
• they can "timewarp"
– choose an earlier time to start (if the
multi-versioned state goes back to
that point)
– Allows system to run queries based on
state at an earlier time. Oracle calls
these "flashback queries"
85
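The "timewarp" idea above amounts to looking up the version that was current at the chosen earlier time. A sketch of such a lookup against a multi-versioned store (the version-list layout and function name are invented; this is not Oracle's implementation):

```python
# Reading from a multi-versioned store as of an earlier time
# (flashback-style query, illustrative sketch).

import bisect

def read_as_of(versions, t):
    """versions: list of (commit_time, value) sorted by commit_time.
    Returns the value committed at or before time t."""
    times = [ct for ct, _ in versions]
    i = bisect.bisect_right(times, t)   # versions committed up to t
    if i == 0:
        raise LookupError("multi-versioned state does not go back that far")
    return versions[i - 1][1]

history = [(1, 30), (5, 130)]        # x was 30 from t=1, 130 from t=5
assert read_as_of(history, 3) == 30  # query "timewarped" to t=3
assert read_as_of(history, 7) == 130 # query at the current state
```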
Consistency Failure &
The Constraint
Violation Problem
86
Constraint Violation Problem
Suppose that
– C is a constraint (or assertion) about the
database
– Every transaction T, when run in isolation, has
the property that
C is true before T => C is true after T
If a DB uses a serializable concurrency
control
If C is true initially,
Then (looking just at the state of the committed
data), C will always remain true
This may be violated using
Snapshot Isolation (e.g. using Read
Consistent Concurrency)
Violates Consistency aspect of ACID Properties
87
Constraint Violation Example

The code below allocates resource :id to :who (either dilip or chu).
dilipOwns holds the resources owned by dilip.
chuOwns holds the resources owned by chu.
The constraint requires that a resource can only be owned by one of
them. With 2PL, this constraint is enforced, even with concurrent
execution of the code.
With Read Consistent Concurrency Control, that's not true.

Suppose T1 runs the code below with :who = 'dilip' & concurrently
T2 runs it with :who = 'chu' and the same value for :id

BEGIN
  SELECT count(*) INTO dilipknt FROM dilipOwns
    WHERE (id = :id);
  SELECT count(*) INTO chuknt FROM chuOwns
    WHERE (id = :id);
  IF (dilipknt + chuknt = 0) THEN
    IF (:who = 'dilip') THEN
      INSERT INTO dilipOwns VALUES( :id );
    ELSIF (:who = 'chu') THEN
      INSERT INTO chuOwns VALUES( :id );
    END IF;
  END IF;
END;

With 2PL, both transactions would acquire S locks on the id indices
of both tables. If both T1 and T2 get to the INSERT at the same time,
they will each try to acquire X locks on the id indices of one of the
tables. They will deadlock and one will abort.
With Read Consistent Concurrency Control, both commit & the
constraint is violated.
88
Explicit Locking Must Be Used

dilipOwns( id )
chuOwns( id )

Constraint: an id may appear in
dilipOwns or chuOwns, but not both

Suppose T1 runs the code below with :who = 'dilip' & concurrently
T2 runs it with :who = 'chu' and the same value for :id

BEGIN
  LOCK TABLE dilipOwns, chuOwns IN EXCLUSIVE MODE;
  SELECT count(*) INTO dilipknt FROM dilipOwns
    WHERE (id = :id);
  SELECT count(*) INTO chuknt FROM chuOwns
    WHERE (id = :id);
  IF (dilipknt + chuknt = 0) THEN
    IF (:who = 'dilip') THEN
      INSERT INTO dilipOwns VALUES( :id );
    ELSIF (:who = 'chu') THEN
      INSERT INTO chuOwns VALUES( :id );
    END IF;
  END IF;
END;

To ensure serializability with Read Consistent Concurrency Control,
explicit locking must be used
89
A Real Constraint Violation Problem

Constraint: There are no two reservations for
the same room with overlapping times

Consider
  rooms( roomnum, location, … )
  reservations( roomnum,
    stime, ftime, reserver, purpose )

The code below reserves :roomnum from :stime to :ftime,
assuming there is no conflicting reservation

SELECT count(*) INTO overlaps FROM reservations
  WHERE (roomnum = :roomnum)
  AND (stime BETWEEN :stime AND :ftime
    OR ftime BETWEEN :stime AND :ftime);
IF (overlaps = 0) THEN
  INSERT INTO reservations VALUES (
    :roomnum, :stime, :ftime, :reserver, :purpose );
END IF;
COMMIT;

Suppose two reservations are made concurrently for the same room,
but for overlapping times. No problem with 2PL. With Read Consistent
CC, both reservations could be inserted.
Check this is true. How would you code around it?
90
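Separate from the concurrency issue, the slide's BETWEEN-based overlap test checks only whether an existing reservation's endpoints fall inside the new interval, so it can miss an existing reservation that fully contains the new one. A pure-Python sketch (function names invented) comparing it with the standard interval-overlap test:

```python
# Comparing the slide's overlap predicate with the standard one
# (illustrative sketch; intervals are closed, times are plain numbers).

def slide_predicate(stime, ftime, new_s, new_f):
    """Mirrors: stime BETWEEN :stime AND :ftime
                OR ftime BETWEEN :stime AND :ftime"""
    return new_s <= stime <= new_f or new_s <= ftime <= new_f

def standard_predicate(stime, ftime, new_s, new_f):
    """Two closed intervals overlap iff each starts before the
    other ends: stime <= :ftime AND ftime >= :stime."""
    return stime <= new_f and ftime >= new_s

# Existing reservation [9, 17] fully contains new request [10, 16]:
assert standard_predicate(9, 17, 10, 16) is True
assert slide_predicate(9, 17, 10, 16) is False   # overlap missed!
# Partial overlaps are caught by both tests.
assert slide_predicate(9, 12, 10, 16) is True
assert standard_predicate(9, 12, 10, 16) is True
```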
Use FOR UPDATE
Consider
rooms( roomnum, location, … )
reservations(
roomnum, stime, ftime, reserver, purpose )
Code
SELECT * FROM rooms
WHERE (roomnum = :roomnum) FOR UPDATE;
SELECT count(*) INTO overlaps FROM reservations
WHERE (roomnum = :roomnum)
AND (stime BETWEEN :stime AND :ftime
OR ftime BETWEEN :stime AND :ftime);
IF (overlaps = 0) THEN
INSERT INTO reservations VALUES (
:roomnum, :stime, :ftime, :reserver, :purpose );
END IF;
COMMIT;
Using FOR UPDATE forces
acquisition of X locks
91
Constraint View Violations
Using Snapshot Isolation can
also produce Constraint View
Violations:
–This is a read anomaly that can
occur in a Read-Only transaction
using Snapshot Isolation
–It can report an inconsistency (a
constraint violation) that actually
does not (and moreover cannot)
occur with a serial schedule.
92
Read-Committed Isolation
Using Snapshot Concurrency Control
Weakened form of Read-Consistency
Default Oracle Concurrency Mechanism
A transaction takes a new virtual snapshot at
the beginning of every query
– contains any data written by own transaction
– plus all data committed by other transactions
If this transaction waits for a write lock
– then if the transaction holding the lock aborts,
continue the transaction
– else if the transaction holding the lock commits,
re-execute the query
[don't abort the transaction!]
Allows Non-Repeatable Reads & Phantoms
Weaker than SQL standard Read-Committed
93
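The difference between a per-query snapshot and a per-transaction snapshot can be shown in a toy model (all names invented): with a fresh snapshot per query, the same query run twice in one transaction can return different values, which is exactly the Non-Repeatable Read this level allows.

```python
# Statement-level vs transaction-level snapshots (illustrative sketch).

db = {"x": 30}

def query_x(snapshot):
    return snapshot["x"]

txn_snapshot = dict(db)          # transaction-level: taken once at start

stmt1 = query_x(dict(db))        # read-committed style: fresh snapshot
db["x"] = 130                    # another transaction commits x = 130
stmt2 = query_x(dict(db))        # fresh snapshot sees the committed value

assert (stmt1, stmt2) == (30, 130)   # non-repeatable read is allowed
assert query_x(txn_snapshot) == 30   # txn-level snapshot still sees 30
```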
Read Committed Problem
Suppose transactions A & B simultaneously
withdraw $1000 from a checking account
1) select balance into curbal from checking
where acctid = 30792;
2) if curbal < 1000 then raise error
3) update checking
set balance = balance - 1000
where acctid = 30792;
4) emit $1000 from ATM;
Schedule A1 A2 B1 B2 B3 B4 A3 A4 doesn’t
work using Read Committed Isolation.
How could the code be changed to work even
with Read Committed Isolation?
94
Read Committed and FOR UPDATE
Suppose transactions A & B simultaneously
withdraw $1000 from a checking account
1) select balance into curbal from checking
where acctid = 30792 FOR UPDATE;
2) if curbal < 1000 then raise error
3) update checking
set balance = balance - 1000
where acctid = 30792;
4) emit $1000 from ATM;
95