Why Not Store Everything in Main Memory? Why use disks?

8. Transactions
The terminology used in this section is that all users (online interactive users or batch programs)
issue transactions to the DBMS. A TRANSACTION is an atomic unit of database work
specified by a user to the DBMS (atomic means either executed to completion or not
executed at all). Transactions are often called QUERIES when they request only read access
(i.e., QUERIES are READ-ONLY TRANSACTIONS).
A transaction is issued using the following constructs (these are the only ones we need for discussing Concurrency
Control and Recovery, the DBMS issues covered in this section):
BEGIN to initiate a transaction (most actual systems supply the BEGIN if the user doesn't; e.g., whenever a
new SQL statement is encountered, it is assumed to initiate a new transaction).
END to end a transaction (usually either COMMIT for a successful END or ABORT for an unsuccessful END).
Most actual systems supply this element if the user doesn't; e.g., if SQL statement execution is successful,
the DBMS supplies COMMIT, else ABORT.
READ whenever any data is needed from the DB (e.g., in an SQL SELECT).
WRITE whenever any data needs to be written to the DB (e.g., in an SQL INSERT or UPDATE).
In this set of notes, all other aspects of language, coding, etc. will be treated as uninterpreted.
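To make the implicit BEGIN and COMMIT/ABORT concrete, here is a minimal sketch. It assumes a Python dict stands in for the database and a list records the operations the TM issues. The names `run_statement`, `read`, `write`, `give_raise` and `bad_update` are hypothetical and chosen only for illustration; they are not a real DBMS API.

```python
def read(db, log, txn_id, item):
    """READ: fetch a data item's value and record the operation."""
    log.append(("READ", txn_id, item))
    return db[item]


def write(db, log, txn_id, item, value):
    """WRITE: store a new value for a data item and record the operation."""
    log.append(("WRITE", txn_id, item))
    db[item] = value


def run_statement(txn_id, statement, db, log):
    """Run one statement as its own transaction: the system supplies BEGIN,
    then COMMIT if the statement succeeds or ABORT if it raises an error."""
    log.append(("BEGIN", txn_id))
    before = dict(db)          # saved so that an ABORT can erase the effects
    try:
        statement(db, log, txn_id)
    except Exception:
        db.clear()
        db.update(before)      # undo every write made by the statement
        log.append(("ABORT", txn_id))
        return False
    log.append(("COMMIT", txn_id))
    return True


def give_raise(db, log, txn_id):
    salary = read(db, log, txn_id, "JONES")
    write(db, log, txn_id, "JONES", salary + 100)


def bad_update(db, log, txn_id):
    write(db, log, txn_id, "JONES", 0)
    raise ValueError("integrity constraint violated")


db, log = {"JONES": 1000}, []
run_statement("T1", give_raise, db, log)   # succeeds: system supplies COMMIT
run_statement("T2", bad_update, db, log)   # fails: system supplies ABORT
print(db)   # {'JONES': 1100}
```

Copying the whole database as a before-image is only workable for a toy. It keeps the example short, and real systems log the before-images of individual items instead.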
When a transaction arrives at the DBMS, a Transaction Manager (TM), a code segment that acts on its
behalf, is assigned to it. The TM interfaces with other components, e.g., with the Scheduler (SCHED) for
permission to access particular data items. SCHED is like a policeman, granting permission to access the
requested item(s). Its activity is called concurrency control.
Once the TM is granted permission to access data items, the Data Manager (DM) does the actual reads and
writes. There are several models for describing this interaction. We will describe two of them, Model-1
and Model-2.
Section 8
#1
Transactions Processing, Model-1
1. TM makes requests to the SCHEDULER to read/write data item(s) or to commit/abort the
transaction.
2. The Scheduler (SCHED) decides if the request can be scheduled. If yes, it schedules the request
(passes it to the DM on the TM's behalf). If no, it rejects the request and informs the TM.
3. DM reads/writes the data item or commits or aborts the transaction if possible, else returns
a reject to the SCHEDULER (which returns it to the TM).
4. DM returns the value read (or an acknowledgement (ACK) of the write or commit request)
to the SCHEDULER.
5. SCHED returns the same to the TM.
There can be one TM multithreaded by all
transactions, or an individual TM
assigned to each individual transaction.
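[Figure: Model-1 message flow. Transaction Manager(s) -> Scheduler: 1. read, write, commit, abort. Scheduler -> Data Manager: 2. read, write, commit, abort. Data Manager <-> Data on Disk: 3. read, write. Replies: 2, 3. reject; 4. value read or write/commit ACK (Data Manager -> Scheduler); 5. value read or write/commit ACK (Scheduler -> Transaction Manager).]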
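The Model-1 routing can be sketched as three small classes. The key point is that the TM only ever talks to the scheduler, and every reply from the DM is relayed back through it. The class names and the `allowed` set are hypothetical. `allowed` is a placeholder for whatever concurrency-control policy a real scheduler would apply, and abort/undo is not modeled.

```python
class DataManager:
    """Does the actual reads and writes (a dict stands in for data on disk)."""

    def __init__(self, data):
        self.data = data

    def handle(self, op, item=None, value=None):
        if op == "read":
            if item not in self.data:
                return ("reject",)              # step 3: DM cannot do it
            return ("value", self.data[item])   # step 4: value read
        if op == "write":
            self.data[item] = value
            return ("ack",)                     # step 4: write ACK
        return ("ack",)                         # commit/abort: only acknowledged here


class Scheduler:
    """Model-1: every TM request passes through the scheduler."""

    def __init__(self, dm, allowed):
        self.dm = dm
        self.allowed = allowed   # hypothetical stand-in for a real CC policy

    def request(self, op, item=None, value=None):
        if item is not None and item not in self.allowed:
            return ("reject",)                  # step 2: SCHED says no
        return self.dm.handle(op, item, value)  # steps 2-5: forward, relay reply


class TransactionManager:
    """Acts for a transaction; talks only to the scheduler."""

    def __init__(self, sched):
        self.sched = sched

    def read(self, item):
        return self.sched.request("read", item)

    def write(self, item, value):
        return self.sched.request("write", item, value)

    def commit(self):
        return self.sched.request("commit")
```

For example, `TransactionManager(Scheduler(DataManager({"X": 1}), {"X"})).read("X")` returns `("value", 1)`. That value has travelled DM -> SCHED -> TM.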
Transactions Processing, Model-2
(assumed throughout the rest of these notes)
1. TM requests permissions from SCHED.
2. SCHED accepts or rejects the TM's permission requests.
3. TM requests the DM to do read/write/commit/abort.
4. DM reads/writes the data item or commits or aborts the transaction if
possible, else returns a reject to the TM.
5. DM returns the value read (or an acknowledgement (ACK) of the write or
commit request) to the TM.
There can be one TM multithreaded by all
transactions, or an individual TM
assigned to each individual transaction.
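[Figure: Model-2 message flow. Transaction Manager(s) -> Scheduler: 1. read, write, commit, abort; Scheduler -> Transaction Manager: 2. decision: accept or reject. Transaction Manager -> Data Manager: 3. read, write, commit, abort; Data Manager -> Transaction Manager: 5. value read, ACK, or reject. Data Manager <-> Data on Disk: 4. read, write.]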
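In Model-2 the scheduler only grants or refuses permission, and the TM then goes to the DM itself. A minimal sketch under the same assumptions as before: the class names are hypothetical, and `allowed` stands in for a real concurrency-control policy.

```python
class Scheduler:
    """Model-2: only decides accept/reject; never touches the data."""

    def __init__(self, allowed):
        self.allowed = allowed   # hypothetical stand-in for a real CC policy

    def permit(self, op, item=None):
        return item is None or item in self.allowed   # step 2: accept or reject


class DataManager:
    """Does the actual reads and writes (a dict stands in for data on disk)."""

    def __init__(self, data):
        self.data = data

    def read(self, item):
        return self.data[item]        # steps 4-5: value read goes back to the TM

    def write(self, item, value):
        self.data[item] = value
        return "ack"                  # steps 4-5: ACK goes back to the TM

    def commit(self):
        return "ack"


class TransactionManager:
    """Asks SCHED for permission, then talks to the DM directly."""

    def __init__(self, sched, dm):
        self.sched, self.dm = sched, dm

    def read(self, item):
        if not self.sched.permit("read", item):      # steps 1-2
            return "reject"
        return self.dm.read(item)                     # step 3: SCHED not involved

    def write(self, item, value):
        if not self.sched.permit("write", item):
            return "reject"
        return self.dm.write(item, value)

    def commit(self):
        return self.dm.commit() if self.sched.permit("commit") else "reject"
```

Compared with the Model-1 sketch, the scheduler no longer relays data. This is what lets Model-2 keep SCHED purely a permission-granting "policeman".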
Concurrency Control (CC)
(the activity of the scheduler, SCHED)
We need concurrency control (AKA mutual exclusion) whenever there are shared system
resources that cannot be used concurrently.
An illegal concurrent use of a shared resource is a conflict. E.g., one user reads JONES while
another user changes JONES to SMITH. The read could get SMIES (the first three letters of SMITH
followed by the last two letters of JONES).
The shared DBMS resources will be called data items.
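The JONES/SMITH conflict can be reproduced with a small simulation. It assumes, for illustration only, that the field is copied one character at a time, so the writer's and reader's steps can interleave with no concurrency control in place. `run_interleaved` is a hypothetical helper.

```python
def run_interleaved(schedule, stored, new_value):
    """schedule: list of ("w", i) / ("r", i) steps on character positions.
    Returns (what the reader saw, final stored value)."""
    field = list(stored)
    seen = [""] * len(field)
    for op, i in schedule:
        if op == "w":
            field[i] = new_value[i]   # writer changes one character
        else:
            seen[i] = field[i]        # reader copies one character
    return "".join(seen), "".join(field)


# The writer gets three characters ahead, then the reader runs to completion.
schedule = [("w", 0), ("w", 1), ("w", 2),
            ("r", 0), ("r", 1), ("r", 2), ("r", 3), ("r", 4),
            ("w", 3), ("w", 4)]
print(run_interleaved(schedule, "JONES", "SMITH"))   # ('SMIES', 'SMITH')
```

If the scheduler forced the reader to wait until all five writes were done, the same code would return `('SMITH', 'SMITH')`.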
DATA ITEM GRANULARITY is the level at which we treat Concurrency Control:
field level (logical level, very fine granularity)
record level (logical level, fine granularity)
page level (physical level, medium granularity)
file level (logical level, coarse granularity)
area level (logical level, quite coarse granularity)
database level (logical level, very coarse granularity)
We assume that a data item is a record (i.e., we assume logical, record-level granularity).
This means there are many more shared resources for a DBMS to manage than anywhere
else (e.g., printers for an O/S), and thus CC is harder in a DBMS than anywhere else!
A DBMS may have 1,000,000 records or more. An O/S may have to manage ~50 printers.
The (unswitched) Ethernet medium access protocol manages ONE shared wire.
Although you may have studied mutual exclusion before (e.g., in an
Operating Systems course), it is a more complicated problem in a DBMS.
Concurrency Control cont.
In any resource management situation (Operating System, Network Operating System, DBMS, ...) there are
"shared resources" and there are "users". SHARED RESOURCE MANAGEMENT deals with
how the system can ensure correct access to shared resources among concurrently executing transactions.
All answers seem to come from traffic control! (traffic intersections, construction zones, drive-up windows).
WAITING POLICY: If a needed resource is unavailable, the requester waits until it becomes available (e.g.,
an intersection red light, the Hardee's drive-up lane). This is how print jobs are managed by an OS.
Advantages: NO RESTARTING (no unnecessary loss of progress); e.g., at Hardee's, they don't say "Go home!
Come back later!" Disadvantages: DEADLOCKS may happen unless they are managed (e.g., at a
construction zone, if the two flag women don't coordinate, both traffic lines may start into the construction
zone from opposite directions, resulting in a DEADLOCK in the middle!). Another disadvantage is
INCONSISTENT RESPONSE TIMES. At the Hardee's window, you may wait an hour or a minute. (Not
so important at Hardee's (well, maybe it is if you're very hungry ;-), but at an Emergency Room?)
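A waiting policy can be sketched as a lock table with a FIFO queue per item. Only exclusive locks are modeled. The notes do not say how deadlocks are managed, so this sketch assumes one common approach: detect a deadlock as a cycle in a wait-for graph. The class and method names are hypothetical.

```python
from collections import deque


class WaitingLockTable:
    """Waiting policy: a request for an item in use is queued (FIFO),
    never refused. Only exclusive locks are modeled in this sketch."""

    def __init__(self):
        self.holder = {}   # item -> transaction currently using it
        self.queue = {}    # item -> deque of transactions waiting for it

    def request(self, txn, item):
        if item not in self.holder:
            self.holder[item] = txn
            return "granted"
        self.queue.setdefault(item, deque()).append(txn)
        return "waiting"

    def release(self, txn, item):
        """Free the item and hand it to the longest-waiting requester, if any."""
        assert self.holder.get(item) == txn
        waiters = self.queue.get(item)
        if waiters:
            self.holder[item] = waiters.popleft()
        else:
            del self.holder[item]

    def deadlocked(self):
        """True if the wait-for graph (T -> U when T waits for an item U holds)
        contains a cycle."""
        graph = {}
        for item, waiters in self.queue.items():
            for t in waiters:
                graph.setdefault(t, set()).add(self.holder[item])
        on_path, finished = set(), set()

        def has_cycle(t):
            on_path.add(t)
            for u in graph.get(t, ()):
                if u in on_path or (u not in finished and has_cycle(u)):
                    return True
            on_path.discard(t)
            finished.add(t)
            return False

        return any(has_cycle(t) for t in list(graph) if t not in finished)


# The construction zone: each line of traffic holds one end and waits for the other.
locks = WaitingLockTable()
locks.request("T1", "A")   # granted
locks.request("T2", "B")   # granted
locks.request("T1", "B")   # T1 waits for T2
locks.request("T2", "A")   # T2 waits for T1: neither can ever proceed
print(locks.deadlocked())  # True
```

Nobody is ever sent away, which is the "no restarting" advantage. The price is that the cycle above can only be broken by some outside action, such as aborting one of the two transactions.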
RESTART POLICY: If a needed resource is unavailable, the request is terminated and restarted later (e.g.,
when someone goes before a parole board, they either get their request granted or restart the process later).
In Ethernet (unswitched) CSMA/CD, if node A wants to send a message to node B:
1. Carrier Sense (the "CS" part): the wire is checked for traffic; if it is busy (in use by another sender), A
waits (according to some "back-off algorithm") then checks again, etc. until the wire is idle, then SENDs
the message.
2. Collision Detection (the "CD" part): listen to the bus until you're certain that your message did not collide
with another concurrently sent message (the required wait time is the traversal time of the wire,
since there are terminators (absorbers) at each end).
Advantages of restart policies: simple, no deadlock
Disadvantages: lower throughput, lost progress, possibly long delays, possible livelock.
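A restart policy never queues anyone: a refused requester backs off and tries again. The notes leave the back-off algorithm unspecified. As an assumption, this sketch uses truncated binary exponential backoff, the scheme Ethernet applies after a collision. All names are hypothetical.

```python
import random


class RestartLockTable:
    """Restart policy: a request for an item in use is refused at once; the
    requester gives up and tries again later, so nobody waits in a queue."""

    def __init__(self):
        self.holder = {}   # item -> transaction currently using it

    def request(self, txn, item):
        if self.holder.get(item, txn) != txn:
            return False   # refused: the requester must restart
        self.holder[item] = txn
        return True

    def release(self, txn, item):
        if self.holder.get(item) == txn:
            del self.holder[item]


def backoff_slots(attempt, rng):
    """Truncated binary exponential backoff, as Ethernet uses after a collision:
    a random number of slot times in [0, 2**min(attempt, 10) - 1]."""
    return rng.randint(0, 2 ** min(attempt, 10) - 1)


def acquire_with_restarts(table, txn, item, rng, max_attempts=16):
    """Retry until granted. Returns the total slots spent backing off, or None
    after max_attempts refusals (Ethernet also gives up after 16 attempts)."""
    waited = 0
    for attempt in range(1, max_attempts + 1):
        if table.request(txn, item):
            return waited
        waited += backoff_slots(attempt, rng)   # a real system would sleep here
    return None
```

Because no transaction ever waits for another, no wait-for cycle can form, so there is no deadlock. The cost shows in `waited`: progress is thrown away and delays grow. If two requesters keep colliding and backing off in lockstep, the result is livelock.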
Concurrency Control cont.
A Transaction is an atomic computation or program that takes the database from one consistent
state to another (without necessarily preserving consistency at each step along the way).
The transaction is an atomic unit of database work, i.e., the DBMS executes a transaction to
completion or not at all, GUARANTEED. If only one transaction is allowed to execute at a
time and the database starts in a consistent state, then it will always end up in a consistent
state! The problem with such a SERIAL EXECUTION policy is that it is too inefficient!
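The phrase "without necessarily preserving consistency at each step" can be illustrated with a transfer between accounts. The sketch assumes a consistency constraint chosen only for this example: the balances always sum to 300. Each transaction is written as a generator, so the executor can observe the database between its writes. The function names are hypothetical.

```python
TOTAL = 300   # assumed consistency constraint: balances always sum to 300


def consistent(db):
    return sum(db.values()) == TOTAL


def transfer(src, dst, amount):
    """A correct transaction; each yield marks a point between its writes."""
    def steps(db):
        db[src] -= amount
        yield                 # mid-transaction: the constraint does NOT hold here
        db[dst] += amount
    return steps


def run_serially(db, transactions):
    """Serial execution: each transaction runs to completion before the next
    starts. Returns consistency after each transaction and at each midpoint."""
    after_each, midpoints = [], []
    for txn in transactions:
        for _ in txn(db):
            midpoints.append(consistent(db))
        after_each.append(consistent(db))
    return after_each, midpoints


db = {"A": 100, "B": 100, "C": 100}
print(run_serially(db, [transfer("A", "B", 50), transfer("B", "C", 70)]))
# ([True, True], [False, False])
```

The intermediate states are inconsistent, yet every transaction boundary is consistent. This is exactly the guarantee serial execution gives, provided no other transaction can see those midpoints.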
A DBMS must guarantee the so-called ACID PROPERTIES of transactions:
ATOMICITY: A transaction is an all-or-nothing proposition. Either a transaction is executed by
the DBMS to completion or all of its effects are erased completely. (Transaction = atomic unit
of database workload).
CONSISTENCY: Correct Transactions take the database from one consistent state to another
consistent state. Consistency is defined in terms of consistency constraints or "integrity
constraints", e.g., entity integrity, referential integrity, other integrities.
ISOLATION: Each user is given the illusion of being the sole user of the system (by the
concurrency control subsystem).
DURABILITY: The effects of a transaction are never lost after it is "committed" by the DBMS
(i.e., after a COMMIT request is ACKed by the DBMS).
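ATOMICITY requires that all effects of an unfinished transaction can be erased. One standard way to do this, assumed here because the notes defer recovery to later, is to save each item's before-image ahead of every write and restore those images on ABORT. This is an in-memory sketch with hypothetical names. It does not model durability, which would need the committed state on stable storage.

```python
class UndoLogDB:
    """Atomicity sketch: before each write, remember the item's old value
    (its before-image); ABORT restores the before-images in reverse order."""

    _MISSING = object()   # marks "item did not exist before this write"

    def __init__(self, data):
        self.data = dict(data)
        self.undo = {}    # txn -> list of (item, before-image)

    def begin(self, txn):
        self.undo[txn] = []

    def write(self, txn, item, value):
        self.undo[txn].append((item, self.data.get(item, self._MISSING)))
        self.data[item] = value

    def commit(self, txn):
        del self.undo[txn]   # effects are kept; before-images no longer needed

    def abort(self, txn):
        for item, before in reversed(self.undo.pop(txn)):
            if before is self._MISSING:
                del self.data[item]
            else:
                self.data[item] = before
```

Undoing in reverse order matters when a transaction writes the same item twice: the oldest before-image must be the last one restored.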
Execution types
SERIAL EXECUTION ensures most of the ACID properties (consistency and isolation for sure;
it also helps with atomicity and durability). That is, queue all transactions as they come in
(into a FIFO queue?) and let each transaction execute to completion before the next one even
starts. Serial execution may produce unacceptable execution delays (i.e., long response times)
and low system utilization.
SERIALIZABLE EXECUTION is much, much better! A concurrent execution of
multiple transactions is called serializable if the operations (reads and writes) within
the transactions are sequenced so that the result is equivalent to some serial execution
(i.e., it is as if it had been done by a serial execution of the transactions). Serializability
facilitates ATOMICITY, CONSISTENCY and ISOLATION of concurrent, correct transactions just
as well as SERIAL execution does, but allows much higher system throughput.
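The definition "equivalent to some serial execution" can be checked directly, though only by brute force. Run the interleaved schedule, then run every serial order from the same starting state and compare the final databases. This sketch compares final states for one given starting state, so it demonstrates the definition rather than offering a practical test. Real schedulers instead rely on conflict-based criteria (e.g., precedence graphs) or locking protocols. The step encoding and function names are assumptions made for illustration.

```python
from itertools import permutations


def run(schedule, txns, initial):
    """Execute a schedule of (txn, step) pairs against a copy of the DB.
    A step is ("r", item) or ("w", item, f), where f computes the new value
    from the values the transaction has read so far."""
    db = dict(initial)
    local = {t: {} for t in txns}       # each transaction's program variables
    for t, i in schedule:
        op = txns[t][i]
        if op[0] == "r":
            local[t][op[1]] = db[op[1]]
        else:
            _, item, f = op
            db[item] = f(local[t])
    return db


def serial(order, txns):
    """The serial schedule that runs the transactions one after another."""
    return [(t, i) for t in order for i in range(len(txns[t]))]


def is_serializable_for(schedule, txns, initial):
    """True if the final state equals that of SOME serial order."""
    result = run(schedule, txns, initial)
    return any(run(serial(order, txns), txns, initial) == result
               for order in permutations(txns))


txns = {
    "T1": [("r", "X"), ("w", "X", lambda v: v["X"] + 100)],
    "T2": [("r", "X"), ("w", "X", lambda v: v["X"] * 2)],
}
lost_update = [("T1", 0), ("T2", 0), ("T1", 1), ("T2", 1)]
print(run(lost_update, txns, {"X": 10}))                  # {'X': 20}
print(is_serializable_for(lost_update, txns, {"X": 10}))  # False
```

The serial orders give X = 220 (T1 then T2) or X = 120 (T2 then T1). The interleaving gives 20 because T1's update is lost, so it matches neither.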
RECOVERABILITY facilitates DURABILITY (more on this later). An execution is
RECOVERABLE if every transaction that commits does so only after every
other transaction it read from has committed.
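The recoverability condition can be checked by scanning a history. The scan tracks which transaction last wrote each item (this gives the "read-from" relation) and, at each commit, verifies that all of those writers have already committed. The operation encoding is assumed for illustration, and aborts are not modeled.

```python
def is_recoverable(history):
    """history: ops ("w", T, X), ("r", T, X), ("c", T). Recoverable if each
    transaction commits only after every transaction it read from has committed."""
    last_writer = {}    # item -> transaction that last wrote it
    read_from = {}      # T -> set of transactions T read from
    committed = set()
    for op in history:
        kind, t = op[0], op[1]
        if kind == "w":
            last_writer[op[2]] = t
        elif kind == "r":
            writer = last_writer.get(op[2])
            if writer is not None and writer != t:
                read_from.setdefault(t, set()).add(writer)
        elif kind == "c":
            if not read_from.get(t, set()) <= committed:
                return False    # t commits before a transaction it read from
            committed.add(t)
    return True


# T2 reads X written by T1, then commits first: if T1 later aborted,
# T2's committed result could not be undone.
print(is_recoverable([("w", "T1", "X"), ("r", "T2", "X"), ("c", "T2"), ("c", "T1")]))  # False
```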
Isolation Levels
SQL defines execution types, or levels of isolation, weaker than SERIALIZABILITY
(they do not guarantee the ACID properties entirely, but they are easier to achieve).
REPEATABLE READ ensures that no value read or written by a transaction, T, is
changed by any other transaction until T is complete; and that T can read only
changes made by committed transactions.
READ COMMITTED ensures that no value written by a transaction, T, is changed by
any other transaction until T is complete; and that T can read only changes made by
committed transactions.
READ UNCOMMITTED ensures nothing (T can read changes made to an item by an
ongoing transaction, and the item can be further changed while T is in progress).
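The three definitions above can be encoded as a checker over a history of reads, writes and commits. It reports whether the history breaks the guarantee a given level makes to transaction T. The checker follows these notes' wording, not the SQL standard's phenomenon-based definitions. The function name and operation encoding are hypothetical.

```python
LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ")


def violates(level, history, t):
    """True if `history` breaks the guarantee `level` gives transaction t,
    per the notes' definitions. Ops: ("r", T, X), ("w", T, X), ("c", T)."""
    if level not in LEVELS:
        raise ValueError(level)
    if level == "READ UNCOMMITTED":
        return False                          # ensures nothing
    last_writer, committed = {}, set()
    read_items, written_items = set(), set()
    for op in history:
        kind, who = op[0], op[1]
        if kind == "c":
            committed.add(who)
            if who == t:
                return False                  # t is complete; later ops are irrelevant
            continue
        item = op[2]
        if who == t and kind == "r":
            writer = last_writer.get(item)
            if writer not in (None, t) and writer not in committed:
                return True                   # t read an uncommitted change
            read_items.add(item)
        elif who == t:
            written_items.add(item)
        elif kind == "w":
            protected = set(written_items)    # both levels protect t's writes
            if level == "REPEATABLE READ":
                protected |= read_items       # REPEATABLE READ also protects t's reads
            if item in protected:
                return True                   # another transaction changed t's item
        if kind == "w":
            last_writer[item] = who
    return False


# T1 reads X, T2 changes X and commits, T1 reads X again (a non-repeatable read).
h = [("r", "T1", "X"), ("w", "T2", "X"), ("c", "T2"), ("r", "T1", "X"), ("c", "T1")]
print([violates(level, h, "T1") for level in LEVELS])   # [False, False, True]
```

The example history is acceptable under READ COMMITTED, because T1 only ever sees committed values. It is not acceptable under REPEATABLE READ, because a value T1 read was changed before T1 finished.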
There will be further discussion on these later in these notes. For now, please note there
are several suggested paper topics in the topics file concerning isolation levels. But
also note that I think these other isolation levels are bunk!
Concurrent Transactions
are transactions whose executions overlap in time (the individual operations, i.e., reads/writes
of particular data items, may be interleaved in time). Again, the only operations we concern
ourselves with are BEGIN, READ, WRITE, COMMIT and ABORT.
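"Interleaved in time" means that each transaction keeps its own operations in order while the two sequences are merged arbitrarily. A short sketch enumerates every such interleaving. The operation labels like `r1(X)` are just strings chosen for illustration.

```python
def interleavings(a, b):
    """All merges of two operation sequences that keep each transaction's
    own operations in their original order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])


T1 = ["r1(X)", "w1(X)"]
T2 = ["r2(X)", "w2(X)"]
for s in interleavings(T1, T2):
    print(" ".join(s))
```

Two transactions of two operations each already allow 6 executions, and only 2 of them are serial. The count grows as a binomial coefficient, (n + m choose n), which is why a scheduler cannot just try them all.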
READ and WRITE are the operations that apply to data items. A data item can be a
field, record, file, area or DB (logical granules) or page (physical granule). We
assume record-level granularity.
A read(X) operation reads the current value of the data item, X, into a program variable
(which we will also call X for simplicity). Even though we will not concern ourselves
with these details in this section, read(X) includes the following steps:
1. Find the address of the page containing X.
2. Copy that page to a main memory buffer (unless it is already in memory).
3. Copy the value of the data item, X, from the buffer to the program variable, X.
The write(X) operation writes the value of the program variable, X, into the database
item X. It includes the following steps:
1. Find the address of the page containing X.
2. Copy that page to a main memory buffer (unless it is already in memory).
3. Copy the program variable, X, to the buffer area for X.
4. Write the buffer back to disk (this can be deferred and is governed by the DM).
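The read(X) and write(X) steps can be sketched with dicts standing in for disk pages and the main-memory buffer. This is a minimal sketch: `BufferedDB`, `page_of` and `flush` are hypothetical names, and buffer replacement is ignored. Step 4 of write(X) is deferred by marking the page dirty until the DM chooses to flush it.

```python
class BufferedDB:
    """read(X)/write(X) through a main-memory buffer (hypothetical names)."""

    def __init__(self, disk, page_of):
        self.disk = disk          # page number -> {item: value}; stands in for disk
        self.page_of = page_of    # item -> number of the page containing it
        self.buffer = {}          # page number -> in-memory copy of the page
        self.dirty = set()        # buffered pages not yet written back

    def _fetch(self, item):
        page = self.page_of[item]                      # step 1: find the page
        if page not in self.buffer:                    # step 2: copy it into the
            self.buffer[page] = dict(self.disk[page])  # buffer unless already there
        return page

    def read(self, item):
        page = self._fetch(item)
        return self.buffer[page][item]    # step 3: buffer -> program variable

    def write(self, item, value):
        page = self._fetch(item)
        self.buffer[page][item] = value   # step 3: program variable -> buffer
        self.dirty.add(page)              # step 4 deferred: page marked dirty

    def flush(self):
        """Step 4: write dirty pages back to disk, whenever the DM decides to."""
        for page in self.dirty:
            self.disk[page] = dict(self.buffer[page])
        self.dirty.clear()
```

Between `write` and `flush`, the new value exists only in main memory. This gap is why the DM's write-back policy matters for recovery.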