Introduction to Database Systems

Download Report

Transcript Introduction to Database Systems

Introduction to
Database Systems
Introduction to Database Systems
1
Basic Definitions





Mini-world
 Some part of the real world about which data is stored in a
database.
Data
 Known facts that can be recorded and have an implicit
meaning.
Database
 A collection of related data.
Database Management System (DBMS)
 A software package/system to facilitate the creation and
maintenance of a computerized database.
Database System
 The DBMS software together with the data itself.
Sometimes, the applications are also included.
Introduction to Database Systems
2
Data Models



A data model is a collection of concepts
for describing data.
A schema is a description of a particular
collection of data, using a given data model.
The relational data model is the most
widely used model today.


Main concept: relation, basically a table with
rows and columns.
Every relation has a schema, which describes
the columns, or fields.
Introduction to Database Systems
3
Levels of Abstraction

Many views, single
conceptual (logical)
schema and physical
schema.



Views describe how users
see the data.
Conceptual schema
defines logical structure
Physical schema describes
the files and indexes
used.
Introduction to Database Systems
View 1
View 2
View 3
Conceptual Schema
Physical Schema
Database
4
Example: University Database

Conceptual schema




Physical schema



Students(sid: string, name: string, login: string,
age: integer, gpa: real)
Courses(cid: string, cname: string, credits: integer)
Enrolled(sid: string, cid: string, grade: string)
Relations stored as unordered files.
Index on first column of Students.
External Schema (View)

Course_info(cid: string, enrollment: integer)
Introduction to Database Systems
5
File Systems vs. DBMS

Large main memory


Query processing



Special code for different queries
Concurrency control


Application must stage large datasets between main
memory and secondary storage (e.g., buffering, pageoriented access, 32-bit addressing of 4 GB, etc.)
Must protect data from inconsistency due to multiple
concurrent users
Crash recovery
Security and access control
Introduction to Database Systems
6
Advantages of the Database
Approach

Data independence








Applications insulated from how data is structured and
stored.
Logical data independence: Protection from changes in
logical structure of data.
Physical data independence: Protection from changes in
physical structure of data.
Efficient data access
Data integrity and security
Uniform data administration
Concurrent access and crash recovery
Reduced application development time
Introduction to Database Systems
7
When not to use a DBMS



Main costs of using a DBMS
 High initial investment and possible need for additional
hardware.
 Overhead for providing generality, security, concurrency
control, recovery, and integrity functions.
When a DBMS may be unnecessary
 If the database and applications are simple, well defined, and
not expected to change.
 If there are stringent real-time requirements that may not
be met.
 If access to data by multiple users is not required.
When no DBMS may suffice
 Limitation of its modeling capability.
 Special operations not supported by the DBMS.
Introduction to Database Systems
8
Query Languages


Query languages: Allow manipulation and
retrieval of data from a database.
Relational model supports simple, powerful query
languages



To specify "what" instead of "how"
A query is applied to relation instances, and the
result of a query is also a relation instance.
Several ways of expressing a given query; a query
optimizer should choose the most efficient
version.
Introduction to Database Systems
9
Concurrency Control

Concurrent execution of user programs is
essential for good DBMS performance.


Interleaving actions of different user
programs can lead to inconsistency


Disk accesses are frequent and relatively slow.
Check is cleared while account balance is being
computed.
DBMS ensures such problems don’t arise

Users can pretend they are using a single-user
system.
Introduction to Database Systems
10
Transaction


A transaction is a sequence of database actions
(reads/writes).
Desirable Properties: ACID




Atomicity: A transaction is an atomic unit of processing;
it is either performed in its entirety or not performed
at all.
Consistency preservation: A correct execution of the
transaction must take the database from one consistent
state to another.
Isolation: A transaction should not make its updates
visible to other transactions until it is committed.
Durability or permanency: Changes made by a
committed transaction must never be lost because of
subsequent failure.
Introduction to Database Systems
11
Scheduling Concurrent
Transactions

DBMS ensures that execution of {T1, ... , Tn} is
equivalent to some serial execution T1’ ... Tn’.



Before reading/writing an object, a transaction
requests a lock on the object, and waits till the DBMS
gives it the lock.
Idea: If an action of Ti (say, writing X) affects Tj
(which perhaps reads X), one of them, say Ti, will obtain
the lock on X first and Tj is forced to wait until Ti
completes; this effectively orders the transactions.
What if Tj already has a lock on Y and Ti later requests
a lock on Y? (Deadlock!)
Introduction to Database Systems
12
Ensuring Atomicity


DBMS ensures atomicity (all-or-nothing property)
even if system crashes in the middle of a
transaction.
Idea: Keep a log (history) of all actions carried
out by the DBMS while executing a set of
transactions:


Before a change is made to the database, the
corresponding log entry is forced to a safe location
(Write-Ahead Log or WAL protocol).
After a crash, the effects of partially executed
transactions are undone using the log.
Introduction to Database Systems
13
The Log

Actions to be recorded in the log





Ti writes an object: The old value and the new value.
Ti commits/aborts: A log record indicating this action.
Log records chained together by transaction id,
so it’s easy to undo a specific transaction.
Periodic checkpointing can reduce the time
needed to recover from a crash.
All log related activities (and in fact, all
concurrency control related activities such as
lock/unlock, dealing with deadlocks etc.) are
handled transparently by the DBMS.
Introduction to Database Systems
14
Structure of a DBMS
Queries
Query Optimizer
Operator Evaluator
Files and Access Methods
Buffer Management
Disk Space Management
Concurrency
Control
Recovery
Manager
Database
Introduction to Database Systems
15