Transcript Overview

CS 542 – Database
Management Systems
Summer 2004
Kajal Claypool
[email protected]
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Administrative Stuff

Class web page:


http://www.cs.wpi.edu/~kajal/courses/cs542S04/index.html
Collaboration Policy:

Homework assignments are individual efforts
• Copying from Web, friends, old assignments, any source =>
cheating!
• “Copying” -> taking anything verbatim, finding the main idea
and using it.
• 1st cheating offense -> F for assignment
• 2nd cheating offense -> F for the course
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Introduction to
Database Systems
An Overview of DBMS
Chapter 1
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Outline

Purpose of Database Systems
 View of Data
 Data Models
 Data Definition Language
 Data Manipulation Language
 Transaction Management
 Storage Management
 Database Administrator
 Database users
 Overall System Architecture
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Why study Databases?

Databases are everywhere:







DBMS brings together many different CS areas


Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
OS, Algorithms, AI, logic, languages, multimedia
DBAs make a lot of money!
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
What is a DBMS?

DBMS: Database Management System



Collection of interrelated data
Set of sophisticated programs to access that data
Database: Collection of data


Usually information relevant to an enterprise
Models the real world
• Entities (students and courses)
• Relationship between entities (a student takes a course)
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Why a DBMS?

In the early days, database applications
were built on top of file systems
 Drawbacks of using file systems to store
data:

Data redundancy and inconsistency
• Format of one file may be different from format of another file!

Difficulty in accessing data
• Need special programs to now generate a list of all customers
in a particular postal-code area

Data isolation
• Because data may be scattered over various files, it becomes
hard for the programmer to grab the information from all
locations
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Problems (contd.)

Integrity problems


Integrity constraints (e.g. account balance > 0) become part of program code
Hard to add new constraints or change existing ones
• Example: Bank decides that savings balance must always be greater than
$50.

Atomicity:

Certain actions must be treated as one. This is hard to do if data is scattered
over many files.
• Example: if you withdraw money, the bank must withdraw and then update the
balance as one operation

Concurrent Access:


Often for efficiency you may allow many users to access the files at the same
time. But you must guarantee that my money does not go into your account!
This is hard with files.
Security:

Not every user should be able to see everything! Hard to control with files.
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
A DBMS

Database systems offer solutions to all the
above problems





provide uniform structures for storage of information
provide mechanisms for manipulating the information
ensure the safety of the information stored despite system
crashes or attempts at unauthorized access
avoid anomalous results if data is to be shared among many
users
Their Goal: do all of this in a way that is both convenient
and efficient.
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Storage
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Information Storage

Remember “convenience” and “efficient” are two requirements
for a DBMS system

To efficiently retrieve data , data is often stored as complex data
structures. However, these complex data structures adversely
effect the requirement “convenience of database system users”

Solution: Data Abstraction

Simplify the user’s interaction with the DBMS
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Abstraction

Physical Level



Logical Level




Lowest level
Describes “how” data (a record) is actually stored
Next higher level
Describes “what” data are stored in the database, and what
relationships exist between those data
Logical level user does not need to be aware of the complexity of
the physical level structures
View Level



Highest level of abstraction
Only part of the entire database is visible, I.e., information that a
user may need to see
Also provides a security mechanism
• Example: a teller in a bank does not need to see the salaries of all bank
employees!
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
View of Data
An architecture for a database system
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Abstraction (contd.)

Example: a record in Pascal
type customer = record
name : string;
street : string;
city : integer;
end;

Physical level:


Logical level:


Stored as contiguous blocks
As shown above
View level:

A subset
type customer = record
name : string;
end;
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Instances and Schemas


Concept is similar to types and variables in programming
languages
Schema – the logical structure of the database





Instance – the actual content of the database at a particular point in time


e.g., the database consists of information about a set of customers and
accounts and the relationship between them
Analogous to type information of a variable in a program
Physical schema: database design at the physical level
Logical schema: database design at the logical level
Analogous to the value of a variable
Physical Data Independence – the ability to modify the physical
schema without changing the logical schema


Applications depend on the logical schema
In general, the interfaces between the various levels and components should
be well defined so that changes in some parts do not seriously influence others.
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Models

A collection of concepts for describing





data
data relationships
data semantics
data constraints
Examples of Data Models:





Entity-Relationship model
Relational model
Object-oriented model
Semi-structured data models (XML)
Older models: network model and hierarchical model
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Entity-Relationship Model

E-R model of real world

Entities (objects)
• E.g. customers, accounts, bank branch

Relationships between entities
• E.g. Account A-101 is held by customer Johnson
• Relationship set depositor associates customers with accounts

Widely used for database design

Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and
processing
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Relational Model

Attributes
Example of tabular data in the relational model
Customerid
customername
192-83-7465
Johnson
019-28-3746
Smith
192-83-7465
Johnson
321-12-3123
Jones
019-28-3746
Smith
customerstreet
customercity
accountnumber
Alma
Palo Alto
A-101
North
Rye
A-215
Alma
Palo Alto
A-201
Main
Harrison
A-217
North
Rye
A-201
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
A Sample Relational
Database
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Manipulation
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Definition Language
(DDL)

Specification notation for defining the database schema



char(10),
integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)


E.g.
create table account (
account-number
balance
e.g.: schema of a table is metadata
Data storage and definition language


language in which the storage structure and access methods used by
the database system are specified
Usually an extension of the data definition language
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Data Manipulation Language
(DML)

Language for accessing and manipulating the data organized by the
appropriate data model


DML also known as query language
It can:
•
•
•
•

Retrieve information stored in the database
Insert new information
Delete existing information
Modify existing information
Two classes of languages

Procedural –
• user specifies what data is required and how to get those data

Nonprocedural (or Declarative)
• user specifies what data is required without specifying how to get those data

The portion of DML that is involved with information retrieval is called a
query language


SQL is the most widely used query language
The DML component of SQL is nonprocedural
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
SQL

SQL: widely used non-procedural language



E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = ‘192-83-7465’
E.g. find the balances of all accounts held by the customer with customer-id 19283-7465
select account.balance
from depositor, account
where depositor.customer-id = ‘192-83-7465’ and
depositor.account-number = account.account-number
Application programs generally access databases through one of


Language extensions to allow embedded SQL
Application program interface (e.g. ODBC/JDBC) which allow SQL queries to be
sent to a database
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Safety
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Database Users


Users are differentiated by the way they expect to interact with
the system
Application programmers –


Sophisticated users –


form requests in a database query language
Specialized users –


interact with system through DML calls
write specialized database applications that do not fit into the
traditional data processing framework
Naïve users –


invoke one of the permanent application programs that have been
written previously
E.g. people accessing database over the web, bank tellers, clerical
staff
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Database Administrator


Coordinates all the activities of the database system; the
database administrator has a good understanding of the
enterprise’s information resources and needs.
Database administrator's duties include:







Schema definition
Storage structure and access method definition
Schema and physical organization modification
Granting user authority to access the database
Specifying integrity constraints
Acting as liaison with users
Monitoring performance and responding to changes in requirements
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Safe Sharing
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Concurrency Control

Concurrent execution of user programs is essential for good
DBMS performance.



Because disk accesses are frequent, and relatively slow, it is
important to keep the cpu humming by working on several user
programs concurrently
Interleaving actions of different user programs can lead to
inconsistency: e.g., check is cleared while account balance is
being computed
DBMS ensures that such problems do not arise. Users can
pretend that they are using a single-user system.
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Transaction:
An Execution of a DB Program

Key concept is transaction:


An atomic sequence of database actions (read/write)
Each transaction, executed completely, must leave the database in
a consistent state if the DB is consistent when the transaction
begins.



Users can specify some simple integrity constraints on the data, and the DBMS
will enforce these constraints
Beyond this, the DBMS does not really understand the semantics of your data
(eg. It does not understand how the interest is calculated on the account
balance)
Thus, ensuring that the transaction (run alone) preserves consistency is
ultimately the user’s responsibility!
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Scheduling Concurrent
Transactions

DBMS ensures that the execution of {T1, T2….Tn} is equivalent
to some serial execution of T1’, T2’…Tn’



Main Idea:


Before reading/writing an object, a transaction requests a lock on the object,
and waits till the DBMS gives it the lock. All locks are released at the end of
the transaction.
This is called Strict 2PL locking protocol
If an action of Ti (say writing X) affects Tj (which perhaps reads X), one of
them say Ti will obtain the lock first and Tj is forced to wait until Ti
completes; this effectively orders the transactions
What if Tj already has a lock on Y and Ti requests Y?
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Ensuring Atomicity


DBMS ensures atomicity, all or nothing, even if the system crashes
in the middle of a Xact.
Idea:

Keep a log (history) of all actions carried out by the DBMS while
executing a set of Xacts
• Before a change is made to the database, the corresponding log entry is
made and forced into a safe location
• After a crash, the effects of partially executed Xacts are undone using the
log.
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Overall System Structure
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Summary






DBMS used to maintain and query large datasets
Benefits include recovery from system crashes, concurrent
access, quick application development, data integrity and
security
Levels of abstraction give data independence
A DBMS typically has a layered architecture
DBAs are one of the highest paid computer jobs and still in high
demand!
DBMS R&D is one of biggest research areas in industry
(Microsoft, IBM, Oracle) and in academia!
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.
Where to next?

Storage and Manipulation of Data

Storage at the Logical level
• Chapter 2

Manipulation of Data
• Chapter 3
Slides adapted from Silberschatz et al. and
Ramakrishnan et al.