Technical Report II
Download
Report
Transcript Technical Report II
Advanced Topics in Databases
Technical Report II : Main memory Databases
Fahime Raja
Niloofar Razavi
Melody Siadaty
Summer 2005
Outline:
Introduction
MMDB vs. DRDB
Storage Issues
Reliability Issues
Data Structures
Concurrency Control
Commit Processing
Data Representation
Access Methods
T-tree definition & structure
Query Processing
Recovery
Introduction:
In a Main Memory DBMS (MMDBMS) there is no central role for I/O
management.
Still, an important question in a MMDBMS is how to do transactions
and recovery in an efficient way:
Some of the proposed algorithms assume that a (small) stable subset
of the main memory exists,
a piece of memory whose content will not be lost in a power outage through
a battery backup.
These stable memories can be used to place, e.g., a redo log.
Others do not assume stable memories, and still use I/O to write
transaction information to stable storage.
These algorithms hence do not eliminate I/O, but minimize it, as the critical
path in a MMDBMS transaction only needs to write the log; not data pages
from the buffer manager.
Introduction:
Motivations:
Requirement of short Access/Response time
Telecommunication applications
Applications handling high traffic of data, e.g. Router.
Real-Time applications
End of popularity of MMDBMS techniques in the early
1990s
MMDBMS were thereafter only considered of specific
interest to real-time database applications, like above
Main Memory Databases
Vs.
Disk Resident Databases
M : the primary copy of data live permanently in
main memory
D: the primary copy of data is permanently disk
resident
M: there can be a backup copy ,resident on disk
D: data can be temporarily cached in main memory for
access speed up.
Main Memory Databases
Vs.
Disk Resident Databases
Storage issues:
Access time of MM, orders of magnitude less than for disks (100
nsec vs. 10 msec)
MMDBMS footprints range between 200 KB and 2 MB
MM is normally volatile
stable memory still expensive!
Disks have high fixed dcost per access independent of the amount
of the retrieved data (Block Oriented)
MM does not care of sequential access
MM data are more vulnerable to software errors, since they can
be directly accessed by the processor.
Main Memory Databases
Vs.
Disk Resident Databases
Reliability issues:
even if special hardware can enhance MM
reliability, periodic backup is necessary
MM content is lost if system crashes
If a single memory board fails, the entire machine must
be powered down
loosing all data!
Whatever power backup for main memory is, in turn, is
less reliable than passive magnetic media.
Main Memory Databases
Vs.
Disk Resident Databases
Data Structures:
MMDB are not DRDB with a
very large cache!
DRDB:
Cached data are accessed through
indexes designed for disk access.
Access is made through a buffer
manager ,which, given the disk
address, checks if the relevant
block is in the MM-Cache and
then copies it to the MM
application working area .
API for DRDB
MMDB:
data are accessed by directly
referring to their memory address.
The following pictures illustrate
this fact.
API for MMDB
MMDBMS Concurrency Control:
Lock duration is short :
Serial transaction processing
reduced contention
The advantage of small locking granules (fields or records) (when data
are memory resident) ,is removed,
Very large lock granules (e.g., relations) are most appropriate (up to the
entire database) for memory resident data.
almost eliminate the need of concurrency control
highly reduces cache flushes.
CC is still necessary when,
Mixed length transactions coexist
A multiprocessor system shares the DB among different units.
MMDBMS Concurrency Control:
Implementation :
Traditional:
a hash table that contains entries for the objects currently locked .
No lock information attached to data
MMDB:
Stuff some small number of locking information to data.
the first bit is set the object is locked,
If it is locked and the second bit is set,
there are one or more waiting transaction
else it is free.
MMDBMS Commit Processing
necessary to have a backup copy and to keep a log of
transaction activity
Before a transaction can commit, its activity records must
be written to the log.
Logging can impact response time, since each transaction
must wait for at least one stable write before committing.
Logging can also affect throughput if the log becomes a
bottleneck
In MMDBs, the logging represents the only disk
operation each transaction will require.
MMDBMS Commit Processing
Typical log length 400 bytes
stable the log tail (< 100 pages ) in a small amount of
stable main memory
40 bytes for Begin/End
360 bytes for Old/New values
Reduces response time
Doesn’t affect bottlenecks
Pre-committing:
releases transaction’s locks as soon as its log record is placed in
the log, without waiting for the information to be propagated to
the disk.
Doesn’t affect serialization
Because the log is sequential
MMDBMS Commit Processing
Doesn’t reduce response time
Enhances concurrency
Response time of others.
Group committing:
accumulates enough commit records to fill up a log
page and then flushes it to disk
Reduces the total # of disk accesses
can be used to relieve a log bottleneck .
MMDB Data Representation:
Relational data are usually represented as flat files
Tuples are stored sequentially
Attribute values are embedded in the tuples.
Access is local
Space consuming due to duplicate values
Inefficient due to sequentially of computations
Need of indexes :
Relational tuples can be represented as a set of pointers to data values.
use of pointers is space efficient when large values appear multiple
times in the database,
(the actual value needs to only be stored once.)
Pointers also simplify the handling of variable length fields since
variable length data can be represented using pointers into a heap.
MMDB Access Methods:
Hashing:
Fast lookup & updating
Not space efficient
Doesn’t support range queries
Tree indexing:
With a single pointer get access both to an attribute
value and the entire tuple
Pointers are of fixed short lengths
Suited for range queries
MMDB Access Methods:
The T-Tree :
Is a data structure whose ancestors are B-Trees and AVL-trees
It is binary like AVL-trees
search is essentially binary
A T-node contains many elements like B-tree
storage & update are done efficiently
Insertions and deletions usually move data between a single node
like in B-trees
Rebalancing is done by node rotation
Like in AVL-trees, but much less frequent.
MMDB Access Methods:
T-Tree
MMDB Query Processing:
Query processing for DRDB focus on reducing disk
access costs
Query processing for MMDB must focus on reducing
processing costs
Operation costs vary from system to system
No general optimization technique
Implementation for relational operators should benefit of
main memory data 7 index representation
Nested loop-join is preferred to sort-merge join:
Although the sorted relations could be represented easily in a
main memory database using pointer lists, there is really no need for
this since much of the motivation for sorting is already lost .
MMDBMS Backup and Recovery:
Backups of memory resident databases must be
maintained on disk or other stable storage to insure
against loss of the volatile data .
1.the procedure used during normal database operation to
keep the backup up-to-date (Checkpointing),
2.the procedure used to recover from a failure. (Failure
Recovery)
loading blocks of the database “on demand” until all of the
data has been loaded.
using disk striping or disk array
there must be independent paths from the disks to memory
MMDBMS Backup and Recovery:
3 major steps in a recovery process:
1. Logging
log information can be recorded at two different levels of
abstraction,
physical logging:
records, for each update, the state of the modified database
and the physical location (such as page id and offset) to
which the update is applied.
logical logging: records the state transition of the database
and the logical location affected by the update.
The Write Ahead Logging (WAL) is the most popular and
accepted type of logging rules
MMDBMS Backup and Recovery:
2.Checkpointing
MMDB checkpointing approaches fall into three
categories:
non-fuzzy checkpoint
the checkpointer first obtains locks on the data items to be
checkpointed so that consistency of the database is preserved
fuzzy checkpoint
the checkpointer does not secure any locks. To synchronize with
normal transactions, the checkpointer writes a record to the log
after a checkpoint is completed. In addition, the database is
generally dumped in segments.
log-driven checkpoint
the checkpointer applies the log to a previous dump to generate a
new dump rather than dumping from the main-memory database.
MMDBMS Backup and Recovery:
3. Reloading
is the last and the one of the most important parts of the recovery
process.
Reloading must ensure that the database is consistent after a crash.
The database is loaded from the last checkpointed image and the undo
and redo operations are applied to restore it to the most recent
consistent state.
The existing reloading schemes can be divided into two classes:
simple reloading
the system does not resume its normal operation until the entire database
is brought back into the main memory
concurrent reloading
reload activities and transaction processing are performed in parallel.