Introduction to Persistent Storage, Concurrency Control and

Transcript Introduction to Persistent Storage, Concurrency Control and

CS 766
Introduction:
1. The Age of Infinite Storage.
2. Concurrency Control.
3. Recovery
.
Section 1
#1
1. The Age
of
Infinite Storage
has begun
Many of us have enough money in our pockets right now to buy all
the storage we will be able to fill for the next 5 years.
So having the storage capacity is no longer a problem.
Managing it is a problem (especially when the volume gets large).
How much data is there?
Section 1
#2
Googi 10100
 Tera Bytes (TBs) are Here
 1 TB costs  1k$ to buy
 1 TB costs ~300k$/year to own
...
 Management and curation are the expensive part
 Searching 1 TB takes hours
 I’m Terrified by TeraBytes
We are here
 I’m Petrified by PetaBytes
 I’m completely Exafied by
ExaBytes
 I’m too old to ever be Zettafied by ZettaBytes, but you
may be in your lifetime.
 You may be Yottafied by YottaBytes.
 You may not be Googified by GoogiBytes,
Yotta
1024
Zetta
1021
Exa
1018
Peta
1015
Tera
1012
Giga
109
Mega
106
Kilo
103
but the next generation may be?
Section 1
#3
Yotta
How much information is there?
Zetta
 Soon everything can be
recorded and indexed.
 Most of it will never be
seen by humans.
 Data summarization,
trend detection, anomaly
detection, data mining,
are key technologies
Exa
Everything!
Recorded
Peta
All Books
MultiMedia
Tera
All books (words)
.Movie
Giga
A Photo
Mega
A Book
Kilo
10-24 Yocto, 10-21 zepto, 10-18 atto, 10-15 femto, 10-12 pico, 10-9 nano, 10-6 micro, 10-3 milli
Section 1
#4
First Disk, in 1956
 IBM 305 RAMAC
 4 MB
 50 24” disks
 1200 rpm
 100
(revolutions per minute)
milli-seconds (ms) access time
 35k$/year to rent
 Included computer &
accounting software
(tubes not transistors)
Section 1
#5
1.6 meters
10 years later
30 MB
Section 1
#6
In 2003, the Cost of Storage was about 1K$/TB.
It’s gone steadily down since then.
12/1/1999
Price vs disk capacity 9/1/2000
k$/TB
Price vs disk capacity
9/1/2001
Price vs disk capacity
y = 17.9x
$
IDE
IDE
SCSI
8.0
15
20
y = 13x
SCSI
20
0
40
= 2.0x
80
8.0
054.0 9.0
0 7.0 10
03.0 8.0
06.07.0
2.0
$
200
y=x
0
50
100
150
Raw Disk
unit Size
50
100
150GB
200
Raw Disk unit Size GB
20
rawSCSI 6
raw
IDE k$/TB
20 k$/TB
GB
30
40
50
40
Disk unit size GB
200
250
5.0
4.0
0
3.04.0
2.03.0
1.02.0
1.0
0.0
0.0
0
60
60
80
SCSI
6.0
0.0
50
100
150
Raw Disk unit Size GB
200
0
10.0
1.05.0
IDE
y = 2x
0 0
5
10
5.0
11/4/2003
y=x
400
10.0
7.0
IDE
raw
k$/TB
6.09.0
60
y
20
40
60
Raw Disk unit Size GB
SCSI
SCSI
10
15
y = 6.7x
SCSI
0
800
600
9.0
20
25
$
y = 7.2x
IDE
raw
k$/TB
10.0
25
30
SCSI
$
$
200
30
35
Price vs
disk capacityy = 6x
IDE
SCSI
IDE
y = 3.8x
GB
$
400
35
40
4/1/2002
Price vs disk capacity
800 200
600
40
$
$
$
$
1000
900
1000
800
900
700
800
1400 600
700
500
1200 600
400
500
300
14001000 400
200
800 300
100
12001400200
600 0
100
10001200 0 0
400
0
1000
50
SCSI
IDE
100
150
Disk unit size GB
200
IDE
0
50
50
100
150
200
Disk unit size GB
Disk100
unit size150
GB
Section 1
#7
200
250
Kilo
Mega
Disk Evolution
Giga
Tera
Peta
Exa
Zetta
Yotta
Section 1
#8
Memex
As We May Think, Vannevar Bush, 1945
“A memex is a device in which an
individual stores all his books, records,
and communications, and which is
mechanized so that it may be consulted
with exceeding speed and flexibility”
“yet if the user inserted 5000 pages of
material a day it would take him
hundreds of years to fill the repository,
so that he can enter material freely”
Section 1
#9
Can you fill a terabyte in a
year?
Item
Items/TB
Items/day
a 300 KB JPEG image
3M
9,800
a 1 MB Document
1M
2,900
a 1 hour, 256 kb/s MP3
audio file
9K
26
a 1 hour 1 MPEG video
290
0.8
Section 1
# 10
On a Personal Terabyte,
How Will We Find Anything?
 Need Queries, Indexing, Data Mining,
Scalability, Replication…
 If you don’t use a DBMS, you will
implement one of your own!
 Need for Data Mining, Machine Learning is
more important then ever!
Of the digital data in existence today,
 80% is personal/individual
DBMS
 20% is Corporate/Governmental
Section 1
# 11
We’re awash with data!

Network data:


10 exabytes by 2010
~ 1019 Bytes
10 zettabytes by 2015
~ 1022 Bytes
WWW (and other text collections)


~ 1016 Bytes
Sensor data from sensors (including Micro & Nano -sensor networks)


15 petabytes by 2007
National Virtual Observatory (aggregated astronomical data)


~ 1013 Bytes
US EROS Data Center archives Earth Observing System (near Soiux Falls SD)
Remotely Sensed satellite and aerial imagery data


10 terabytes by 2004
10 yottabytes by 2020
~ 1025 Bytes
Genomic/Proteomic/Metabolomic data (microarrays, genechips, genome sequences)

10 gazillabytes by 2030
~ 1028 Bytes?
I made up these Name! Projected data sizes are
overrunning our ability to name their orders of magnitude!

Stock Market prediction data (prices + all the above?)

10 supragazillabytes by 2040 ~ 1031 Bytes?
Useful information must be teased out of these large volumes of raw data.
AND these are some of the 1/5th of Corporate or Governmental data collections.
The other 4/5ths of data sets are personnel!
Section 1
# 12
 Parkinson’s Law (for data)
 Data expands to fill available storage
 Disk-storage version of Moore’s Law
 Available storage doubles every 9 months!
 How do we get the information we need
from the massive volumes of data we will
have?
 Querying (for the information we know is there)
 Data mining (for the answers to questions we
don't know to ask precisely).
Section 3
# 13
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY
IN DATABASE SYSTEMS



CONCURRENCY CONTROL is activity of coordinating process actions that
operate in parallel (concurrently in time), access shared data (or any other
shared resources), and potentially interfere with each other (e.g., if one
process is reading data item, x, concurrently while another is changing x,
then the reader could get the first half of the old value concatenated with the
second half of the new value).
RECOVERY is the acitivity of ensuring that software and hardware failures do
not corrupt (cause the loss of) presistent data (Data that has been
guaranteed by the DBMS to survive the ending of the program that created
it.).
A TRANSACTION is an atomic unit of database work during the execution of
a program that accesses a shared database. Transactions must be atomic (all
or nothing: Either successfully complete or leave no effects on the database
(or users) whatsoever.)


It's the job of the Concurrency Control/Recovery system to insure this.
Transactions don't interfere with each other.




(No audio is embedded in the next 15 slides)
Termination means either all effects are made permanent or the Transaction has no
effect on the database or user at all.
A DATABASE is a set of named data items (each has a value)
The collection of all values of the database at any time constitute the database STATE at
that time
An ITEM is a word of main memory, a diskpage, the entire database, an area,
a file, a subset of a file, a record or a field (GRANULE or data item). We will
assume the GRANULARITY is record level.
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 2

A DATABASE SYSTEM is a hardware and software system supporting data
access commands (operations).

The DATABASE OPERATIONS of interest to this discussion are two:


Read(x) returns value stored in the data item named x.
Write(x) changes the value of x to value.

A Database System executes these operations ATOMICALLY and as a Von
Neumann machine (as if sequentially).

Operation execution is usually CONCURRENT however = several operation
executions overlap in time, but the final effect must be the same as the
sequential ordering specified by the concurrency controller.

The TRANSACTION OPERATIONS of interest in this discussion are three:
START (or BEGIN) means "A new Transaction begins here". COMMIT
means "This transaction completed correctly (normally) and that the
effects should now be made permanent". ABORT means "This transaction
did not complete correctly and dictates that all its effects should be
obliterated.
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 3

We assume the Database System responds to a START with the return of a
unique Transaction Identifier (TID) from a linearly ordered set of TIDs (e.g.,
start time or timestamp), and that the Transaction program attaches the TID
to each operation as it is submitted.

Our Database System view of a transaction is that it consists only of:


a START,
a partially ordered set of database operations (reads and writes),


a read(x) operation returns the value found in data item, x, to the requesting transaction
a write(x, new_value) operations replaces the current_value found in data_item, x, with
new_value and returns an acknowledgement to the requesting transactions (and ACK).

a Commit or an Abort (but never both).

The User view of a transaction is that of one or more procedures including
database and transaction operations and program logic.

COMMIT or ABORT: A transaction is ACTIVE if it is started but not yet
committed or aborted and it is UNCOMMITTED if it is aborted or active.


Note: Aborts can be transaction generated or imposed by the Database System. ABORT
wipes out all transaction effects (on the database or users) Causes? System failure in the
middle of a transaction, The Database System discovers it has returned an incorrect
value to the transaction. Execution of a transaction's COMMIT (and the returning an
acknowledgement (ack)) of that commit constitutes a guarantee that the DBS will not
abort the transaction and that all the transaction's effects will survive subsequent
failures of the system (or even Operation System failures that corrupt main memory but
not media falures).
MESSAGES: We assume there is no direct communication between
separate transactions, except through writes to and reads from the Database.
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 4

Most Database System provide (guarantee) the following ACID
properties for their Transactions:

Atomicity: A Transaction's changes to the state of the Database
are atomic, that is, they either all happen or none happen.
(These changes include database changes and messages.)

Consistency: A Transaction is a correct transformation of the
Database state. The actions of a transaction, taken as a group,
do not violate any of integrity constraints associated with the
state. (This requires that the Transaction be a correct program.)

Isolation: Even though Transactions execute concurrently, it
must appear to each Transaction, T, that others execute either
before T or after T, but not both. (It must appears as though T is
the only transaction in in the system.)

Durability: Once a Transaction completes successfully (commits)
the changes (writes) it made survive failures.
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 5

T reads x from S if:





1. T reads x after S has written x
2. S does not abort before T reads x
3. every transaction which writes x between S's write and T's read, aborts
before T reads. (So the value returned to T is the value written by S.)
T reads_from S if T reads one or more data items from S.
RECOVERABILITY: An Execution (of a set of concurrent transactions) is
RECOVERABLE if, forall T that commit, T's Commit follows the Commit of
every transaction that T reads_from.

OR, said another way, if T reads_from S and T commits, then, S commits
before T commits.

Note: Recoverability can be achieved by delaying Commits (dependent ones).

Admittedely TERMINAL I/O and other external interactions between
transactions and the users who issue them, can result in indirect
communication and therefore dependencies, but, we consider any errors
caused by such interactions to be user errors (we wash our hands of them ;-)
I.e., we do not consider transaction to communicate with each other except
through database writes and reads. (DBSs can prevent some user
INDIRECT_READ problems by, DEFERRING all output until commit.
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 6

CASCADELESSNESS (Avoidance of Cascading Aborts or ACA) of an
execution is achieved if the DBMS ensures that every transaction
reads only values written by committed transactions.

The DBMS must delay Read(x) until all transaction that have
already issued a Write(x, value) abort or commit.

I.e., If T reads_from S then S must commit or abort before T
reads_from S.

The set of Cascadeless executions is CONTAINED_IN the set of
Recoverable executions.


ACA  RC
And therefore as properties of individual transactions, Reverability
implies ACA.

RC  ACA
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 7

STRICT EXECUTION: Many systems implement ABORT by restoring
the "before-image" of all writes made by the aborting transaction.
T1:
1
3
T1-COMMIT
v
|
X------ 1 ----- | -----------------------v
Y--------------- 3 ----- 1 -------------^
T2:
1
2 -------^
| --------|
2
T2 ABORT
before image of X in Write_by_T2(X,2) =1
and
before image of Y in Write_by_T2(Y,1) =3,
since if T2 had never happened :
T1:
1
3
T1-COMMIT
v
|
X------ 1 ----- | ------------------------------------v
Y--------------- 3 ------------------------------------We see that "aborting by restoring before-images" requires that the DBMS use the
WRITE-AHEAD-LOGGING (WAL) PROTOCOL: The DBMS can't overwrite a value
without logging it first.
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 8

WRITE-AHEAD-LOGGING (WAL) PROTOCOL: DBS can't overwrite a value
without logging it first. (The logged value then becomes the before-value
(bfv) and "logging" means, basically, writing it to a secure place, e.g., disk
(but a different disk from the one containing the DB))
T1:
X-----
1
v
1
----
Y----------------T2:
Log:

Abort by restoring before-images is not always correct:
T1:
X-- 1 --------T2:

3
T1-COMMIT
|
| ----------------------------------2 ---------v
^
3 --- 1 ----------------------------| ---------: ^
|
: |
|
: 1
2 T2 ABORT
v
bfv(T2,Y=3)
bfv(T2,X=1)
2
v
2
ABORT
-----
3 ------------------------^
3
Restoring before images would reset X to 1 (whereas it should be 3)
unless abort of T1 is "cascaded" to T2 (i.e., T2 aborted also).
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 9

Restoring before images would reset X to 1 (whereas it should be 3)
unless abort of T1 is "cascaded" to T2 (i.e., T2 aborted also).

If aborts are cascaded to all transactions that read_from_T1
T1:
X----- 1 ------T2:
2
v
2
ABORT-T1
-----
3 ------------------------^
3
ABORT-T2

then restoring before images would reset X to 1 (when T1 ABORTS), then
X to 2 (when T2 ABORTS).

This is also incorrect (X should be restored to 1).

The problem is that the before image was written by an ABORTED
transaction.

Solution: Write(x,val) be delayed until all transaction which have
previously written x have terminated (COMMIT/ABORT).
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 10

STRICT (ST) executions

1. Delay Read(x) until all transaction that did a previous
Write(x) either Commit or Abort (avoid cascading aborts).

2. Delay Write(x) until all transaction that did a previous
Write(x) either Commit of Abort (So that we can
implement abort by restoring before-images.).
Strictness  Cascadelessness  Recoverability
of executions:
ST  ACA  RC
as properties
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 11
RIGOROUS (RG) executions =

Strict and

3. Delay Write(x) until all transaction previously Read(x)
Commit/Abort.
Rigorous  Strictness  Cascadelessness  Recoverability
RG  ST  ACA  RC
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 12
DATABASE CONSISTENCY: States of the Database are consistent if
they satisfy the consistency contraints (eg. Bal=sum(accts)).
Designers define consistency predicates.
Transaction correctness requires also that the transaction preserve
consistency.
Serializable executions preserve consistency.
SERIAL EXECUTION means there is no interleaving of the operations
of concurrent transactions.
Serial executions are correct if each transaction is correct (takes the
database from one consistent state to another consistent state)
so the DB goes from consistent state to consistent state ...
forever (Each consistent state is produced by one isolated
Transaction.).
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 13


A SERIALIZABLE (SR) execution is one that produces the same
output and has the same effect on the Database state as SOME
serial execution.
Serializability is a definition of complete correctness in our
model. SERIAL  SR

ORDERING Transactions: If equivalence to a particular order is required, it
is the users responsibility to insure it (issue 2nd Transaction only after
commit of 1st Transaction is ack'ed by system).

Sometimes serializability is taken to be unrealistic (Editorial: I don't agree!
Especially in this age of "two" systems, an OLTP system (OnLine
Transaction Processing system, for short updates) and a DW system
(DataWarehouse system, for adhoc, long-running read-only queries) we
should be able to achieve SR!)

Averages over large amounts of data (some inconsistent retrieval is
tolerable)

Physical process control (process nonterminating so "very long running")
INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 14
From Gray-Reuter:
"Suprisingly, most system do not provide true isolation (e.g., serializability).
Originally, this was because implementors didn't understand the issues. Now
the issues are understood, but implementors make a compromise between
correctness and performance."
The typical modern system default is to guard agains problems caused by readfroms (WRITE followed by READ) and write-overs (WRITE followed by WRITE)
but ignore write-after-reads (READ followed by WRITE) dependencies.

This goes under the name "cursor stability" in most SQL systems.

The ISO and ANSI SQL standard mandates true isolation (e.g., serializability)
as a default, but few vendors follow this aspect of the standard. Rather, they
allow sophisticated users to request isolation as an option called "repeatable
reads". They also allow a third option to disable write-after-read isolation,
called "browse" access, which allows queries to scan the database without
acquiring locks and without delaying other Transactions. These options are
generally called degrees of isolation.

Full isolation is used in this treatment . None-the-less, we give the definitions
of isolation levels, but don't recomend them.

ISOLATION LEVELS
SQL defines at least three levels of isolation weaker than serializability (they do
not guarantee consistency entirely, but they are easier to achieve). We look at:
REPEATABLE READ
READ COMMITTED
READ UNCOMMITTED

INTRODUCTION TO CONCURRENCY CONTROL AND RECOVERY - 15

REPEATABLE READ ensures that a transaction, T, reads only changes made by committed
Transactions. No value read or written by T can be changed by another Transaction until T
is complete (committed or aborted).

Repeatable Read allows the PHANTOM PHENOMENON: If T is reading "all employee records in
the sales department", another transaction might enter (insert) a new employee record with
department=sales, which could be missed by T's read. Repeatable read isolation can be
achieved by using the same locking protocol as 2PL except it doesn't use index locking (index
locking will guarantee that no other transaction can complete an insert of an employee record
in the sales dept until T has completed it's read - since such a read locks the value, department
= sales of the department index and an insert involves a write to that same index record.)

READ COMMITTED ensures that T reads only changes made by committed STNs and that no
value written by T is changed by any other transaction until T is complete. However a value
read by T may well be modified by another Transaction while T is still in progress. T is
exposed to the phantom problem also. A read committed transaction obtains exclusive
locks before writing items and holds these locks until END. It also obtains shared locks
before reading but these locks are released immediately (their only effect is to guarantee
that the Transaction that last modified the object has completed that task.)

READ UNCOMMITTED ensures nothing (T can read changes made to an item by an ongoing
transaction and the item can be further changed while T is in progress. T is exposed to the
phantom problem. A read uncommitted Transaction does not obtain shared locks before
reading items. This mode represents the greatest exposure to uncommitted changes of
other transactions; so much so that SQL prohibits such a transaction from making any
changes itself - a read uncommitted transaction is required to have an access mode of
READ ONLY.

There are other definitions of reduced levels of isolation (less than serializability) in other
treatments of the subject (and in many actual commercial systems). These notes will stick
to serializability, and only make occasional comments regarding the reduced levels.
Thank
you.
Section 3
#1

Introduction to Persistent Storage, Concurrency Control and

Transcript Introduction to Persistent Storage, Concurrency Control and

Directory