Technical Report I

Download Report

Transcript Technical Report I

Advanced Topics in Databases
Technical Report I
Fahime Raja
Niloofar Razavi
Melody Siadaty
Spring 2005
Outline:

Main Memory databases

Transaction Processing Monitors

Transactional Workflows

Real Time Databases
Main Memory Databases:

Memory resident database systems (MMDB’s) store their
data in main physical memory and provide very highspeed access.

Traditional database systems rely on



the disk subsystem to retrieve and update data
use an offline storage device such as magnetic tape for backup.
MMDB will use


physical memory as primary storage
a disk subsystem for backup.
Main Memory Databases:

Advantages:

Achieving significant performance improvements over
conventional database systems

Improve of processing time

Improve of throughput rates

due to elimination of I/O overhead.
Main Memory Databases:

Vs. conventional DBMS:

Volatility of main memory


High overhead for the initial load



using a small stable main memory to support recovery processing
frequently create the archives,
distribute them across several secondary storage devices
Implementation issues

Inherent differences between disk and memory storage
Main Memory Databases:

Concurrency Control:

access to main memory is so much faster than disk access,
transactions may complete more quickly in a main
memory system.




So, locks will not be held as long,
Lock contention may not be as important as it is when the data is
disk resident.
The advantage of small locking granules (fields or records)
(when data are memory resident), are removed.
Very large lock granules (e.g., relations) are most appropriate
for memory resident data.
Main Memory Databases:

Implementation of the locking mechanism :

conventional system: a hash table that contains entries for the
objects currently locked .

MMDB: a small number of bits in data to represent data’s
lock status.



the first bit is set  the object is locked,
If it is locked and the second bit is set,
 there are one or more waiting transaction
else it is free.
Main Memory Databases:

Commit Processing:






necessary to have a backup copy and to keep a log of transaction activity
Before a transaction can commit, its activity records must be written to the log.
Logging can impact response time, since each transaction must wait for at least one
stable write before committing.
Logging can also affect throughput if the log becomes a bottleneck
In MMDBs, the logging represents the only disk operation each transaction will
require.
Using a small amount of stable main memory hold a portion of the log.



eliminating the response time problem, since transactions need never wait for disk
operations.
Pre-committing: is accomplished by releasing a transaction’s locks as soon as its log
record is placed in the log, without waiting for the information to be propagated to
the disk.
Group commits: can be used to relieve a log bottleneck.
Main Memory Databases:

Access Methods:


in main memory access methods, data values on which the index
is built, need not be stored in the index itself, as is done in BTrees.
random access is fast in main memory  pointers can be
followed quickly index structures can store pointers to the
indexed data, rather than the data itself.


eliminating the problem of storing variable length fields in an
index and ,
saving space as long as the pointers are smaller than the data they
point to!
Main Memory Databases:

Data Representation:
 Relational tuples can be represented as a set of pointers to data values.
 use of pointers is space efficient when large values appear multiple
times in the database,

(the actual value needs to only be stored once.)
Pointers also simplify the handling of variable length fields since
variable length data can be represented using pointers into a heap.
Query Processing
 In MMDB: sequential access is not faster than random access .
 E.g.: sort merge join processing loses its advantage!



Although the sorted relations could be represented easily in a main memory
database using pointer lists, there is really no need for this since much of the
motivation for sorting is already lost .
Main Memory Databases:

Recovery:



Backups of memory resident databases must be maintained on
disk or other stable storage to insure against loss of the volatile
data .
1.the procedure used during normal database operation to keep
the backup up-to-date (Checkpointing),
2.the procedure used to recover from a failure. (Failure
Recovery)


loading blocks of the database “on demand” until all of the data
has been loaded.
using disk striping or disk array

there must be independent paths from the disks to memory
Main Memory Databases:


Papers:
Main Memory Database Systems: An overview
Hector Garcia-Molina, Kenneth Salem, IEEE 1992

This paper surveys the major memory residence optimizations and
briefly discusses some of the memory resident systems that have been
designed or implemented.
Transaction Management for a Main-Memory Database

Piyush Burte, Boanerges Aleman-Meza, D. Brent Weatherly, Rong Wu

In this paper, the details of thread concurrency and resource locking
protocols are examined, the deadlock prevention scheme, and the Javabased implementation of these design decisions. Also the effectiveness
of the design with performance tests that simulate typical transactions on
a highly concurrent database system are shown.
Main Memory Databases:
Main Memory Database Recovery

Margaret H. Eich, IEEE 1986


This paper examines MMDB recovery, identifies differences
from traditional DBMS recovery, composes a “wish list” of
MMDB recovery requirements, describes why previously
proposed techniques do not satisfy these requirements, and
proposes a new MMDB recovery technique which does.
* -Understanding, Modeling and Improvement Main-Memory
Database Performance
Stefan Manegold, 2002, Amsterdam university
Main Memory Databases:
Some possible future works:

Efficient loading for MMDBs including the idea of
partitioning.



examining methods for distributing log and archive
database information across multiple secondary storage
devices.
Simulation and testing the results
Transaction Processing Monitor


A Transaction Processing (TP) application is a program that performs
an administrative function by accessing a shared data based on behalf
of an online user.
A TP system is an integrated set of products that supports TP
applications ,including :




hardware such as processors, memories, disks, and communication
controllers
software, such as operating systems , DBMSs, computer networks
TP monitors
The main function of a TP monitor is to


coordinate the flow of transaction requests between terminals or other
devices and application programs that can process these requests
it imposes a certain structure on the software components of a TP system
and offers functions to support the activities of each component
Transaction Processing Monitor
Most of TP applications are structured to perform the
following steps for each terminal :

1.
2.
3.
4.
5.
6.
7.
Interact with the terminal user to collect the transaction’s
input, usually through forms and menus.
Translate the transaction input into a standard-format request
message.
Start the transaction.
Examine the request’s header to determine its type.
Execute the request type’s application program, which may in
turn invoke DBMSs and other application programs.
Commit the transaction after the application has finished.
Send the transaction’s output to the terminal.
Transaction Processing Monitor
A TP monitor divides an
application into 3
components that perform
the above steps:

Message Manager


performs steps 1, 2, and 7
Request Control


performs steps 3, 4, and 6
Application Server


performs step 5, in
collaboration with
DBMSs
Transaction Processing Monitor
Papers:


Transaction Processing Monitors
Philip.A.Bernstein
November 1990,Vol 33,No.11,Communication of ACM


In this article, it was shown that TP monitors have evolved to solve
distributed computing problems that are not solved by the
underlying OS, DBMS, and network. In particular, they support
multithreaded processes, message routing, queuing, and system
management and recovery. Sometimes, they support the transaction
abstraction
Chapter 2-(Transaction Processing systems) of Distributed
Database Systems
Michael Gertz, spring 2003
Transactional Workflows:

Workflow: is an activity in which multiple tasks are executed in a
coordinated way by different processing entities.

Task: some work to be done;
 Textual description in a file,a form,a msg,a computer program

Processing entity: that performs the tasks
 Person / sw system

A workflow process :
 directed graph P=< N, A >


a set of nodes N= {n1,n2 .... }
and a set of arcs
Transactional Workflows:

Increasingly,
workflow
management systems
(WFMSs) are being
used as the primary
technology for
organizations to
perform their daily
business processes
(workflows)
architecture of a WFMS
Transactional Workflows:

Execution:




Scheduler: program  submitting tasks, monitoring
events….
Task agents: controlling exec. of a task
Query mechanisms : for the state of the workflow
3 architectures:



Centralized: single scheduler
Partially distributed: one instance of the scheduler for each
workflow
Fully distributed: no scheduler, communicating agents
Transactional Workflows:


Papers:
Logic Bases Modeling and Analysis of Workflows
Hasan Davulcu , Michael Kifer, CR. Ramakrishnan, I.V.
Ramakrishnan,ACM 1998

In this paper Concurrent Transaction Logic (CTL) as the language for
specifying, analyzing, and scheduling of workflows is proposed. Also it
has been shown that both local and global properties of workflows can
be naturally represented as CTL formulas and reasoning can be done
with the use of the proof theory and the semantics of this logic.
Virtual Transaction Model to Support Workflow
Vasudev Krishnamoorthy,Ming-Chien Shan,ACM 2000


This paper presents a model called the virtual transaction ,to provide
transactional support to the workflow applications. and also discusses on
how ACID properties will be achieved using this model.
Transactional Workflows:
Correctness Issues in Workflow Management
Mohan Kamathy and Krithi Ramamritham
Department of Computer Science, University of
Massachusetts, Amherst MA, 1996


This paper discusses about issues happened due to failures
in workflows and then describes techniques for ensuring
correctness of workflows.
Real-Time Database systems

Real-time database systems inherit many properties from
both database systems and real-time systems.


ACID properties
Transactions process in a RTDBS are associated with
timing constraints deadlines:



Hard: serious problem if the deadline is contradicted
Firm: no worth if completed after the deadline
Soft: diminishing value if completed after the deadline
Real-Time Database systems

In traditional databases only the data consistency should
be preserved,

But in RTDBS : both timing and data consistency
constraints

 need of time-critical scheduling methods for




Concurrency control
Resource scheduling
Commit processing
Buffer management
Real-Time Database systems

Goal:

In traditional Dbs:



In RTDBS :


minimizing the transaction response time
And maximizing the throughput
Maximizing the # of transactions that satisfy their deadlines.
assigning a priority to each transaction based on its
deadline


Earliest Deadline First
Least Slack First

Slack Time: max length of the time ,the transaction can be delayed
and still satisfy its deadline.
Real-Time Database systems
Papers:

Advances in Real-Time Database Systems Research
Azer Bestavros, Sigmod 1996
Research Issues in Real-Time Database Systems
Ozgur Ulusoy,Bilkent University



Maintaining Security in Firm Real-time Systems
Quazi N. Ahmed and Susan V. Vrbsky, Department of Computer Science,
The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A.



In this paper a basic understanding of the issues in real-time databases is provided and
research efforts in this area are introduced.
In this papera new concurrency control algorithm for secure firm real-time databases is
produced. Also, the results show that the algorithm performs fairly well in terms of
security and timeliness compared to a non-secure algorithm.
*- Value-Based Scheduling in Real-Time Database Systems
Jayant R. Haritsa, Michael J. Carey, and Miron Livny ,VLDB Journal 1992