Transcript Intro

CS4432: Database Systems II
Course Introduction
Mohamed Eltabakh
1
What Is a Relational Database
Management System ?
Database Management System = DBMS
Relational DBMS = RDBMS
• A collection of files that store the data
– But: Files that we do not directly access or read
• A big C program written by someone else that
accesses and updates those files for you
– But: Huge program containing 100s of 1000s of lines
2
Where are RDBMS used ?
• Backend for traditional “database” applications
• Backend for large Websites
• Backend for Web services
Airline
reservation
University
registration
Bank accounts
and transactions
Movie Database
Hospital System
3
Example of a Traditional Database
Application
University
Suppose we are building a system registration
to store the information about:
• students
• courses
• professors
• who takes what, who teaches what
4
Can we do it without a DBMS ?
Sure we can! Start by storing the data in files:
students.txt
courses.txt
professors.txt
Now write C or Java programs to implement
specific tasks
File System Approach
5
Doing it without a DBMS...
• Enroll “Mary Johnson” in “CSE444”:
Write a C program to do the following:
Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
6
What Can Go Wrong
Several drawbacks of using file systems
•Data redundancy and inconsistency
– Multiple file formats, duplication of information in different files
– Multiple records formats within the same file
– No order enforced between fields
•Difficulty in accessing data
– Need to write a new program to carry out each new task
– No indexes, always scan the entire file
•Integrity problems
– Modify one file (or field in a file), and not changing the dependent
fields or files
– Integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly
7
What Can Go Wrong
• Concurrent access by multiple users
– Many users need to access/update the data at the same time
(concurrent access)
– Uncontrolled concurrent access can lead to inconsistencies
• Example: Two people are updating the same bank account at the same time
• Security problems
– Hard to provide user access to some, but not all, data
• Recovery from crashes
– While updating the data the system crashes
• Maintenance problems
– Hard to search for or update a field
– Hard to add new fields
8
Enters a DMBS
“Two tier database system”
Applications
Connection
(ODBC, JDBC)
Data files
Database server
(someone else’s
C program)
9
Functionality of a DBMS
The programmer sees SQL, which has two components:
• Data Definition Language - DDL
Frontend
• Data Manipulation Language – DML
(CS3431)
Behind the scene the DBMS has:
• Query Optimizer
Backend
• Query Engine
(CS4432)
• Storage Management
• Transaction Management (concurrency, recovery)
10
How the Programmer Sees the
DBMS
Frontend
• Start with DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...
• Continue with DML to populate tables:
INSERT INTO Students
VALUES(‘Charles’, ‘123456789’, ‘undergraduate’)
. . . .
11
How the Programmer Sees the
DBMS
Frontend
• Tables:
Students:
SSN
123-45-6789
234-56-7890
Courses:
CID
CSE444
CSE541
Takes:
Name
Charles
Dan
…
Category
undergrad
grad
…
Name
Databases
Operating systems
SSN
123-45-6789
123-45-6789
234-56-7890
CID
CSE444
CSE444
CSE142
…
Quarter
fall
winter
“data independence” = separate logical view
from physical implementation
12
What is Hidden ???
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...
May create indexes
INSERT INTO Students
VALUES(‘Charles’, ‘123456789’, ‘undergraduate’)
. . . .
May update
indexes
Backend
Creating file
(Data)
Updating catalog
tables
Reading and updating
file from disk
Check
constraints
13
Queries
Frontend
• Find all courses that “Mary” takes
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
S.ssn = T.ssn and T.cid = C.cid
• We did not specify how to execute
• We did not specify how to optimize
Should be happy
not to do it…
14
What is Hidden ???
Backend
Imperative query execution plan:
Declarative SQL query
Cname
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
S.ssn = T.ssn and T.cid = C.cid
cid=cid
sid=sid
Complex cost model &
many alternatives
name=“Mary”
Students
Takes
Courses
The optimizer chooses the best execution plan for a query
15
Transactions
Frontend
• Enroll “Mary Johnson” in “CSE444”:
BEGIN TRANSACTION;
INSERT INTO Takes
SELECT Students.SSN, Courses.CID
FROM Students, Courses
WHERE Students.name = ‘Mary Johnson’ and
Courses.name = ‘CSE444’
-- More updates here....
IF everything-went-OK
THEN COMMIT;
ELSE ROLLBACK
If system crashes, the transaction is still either committed or aborted
16
Transactions
• A transaction = sequence of statements that either
all succeed, or all fail
• Basic unit of processing
• Transactions have the ACID properties:
A = atomicity
C = consistency
I = independence (Isolation)
D = durability
17
Transaction ACID Properties
T1
T2
T3
T4
• Each transaction has a Start and End and does many things
in between
• “A”  Atomic: Either the entire transaction is done (all its
actions) or none.
• “C”  Consistency: A transaction must move the DB from
one consistent state to another consistent state
18
Transaction ACID Properties (Cont’d)
T1
T2
T3
T4
• What about interaction
– Can T2 read what T1 is writing?
– Can T3 read what T1 is reading?
– Can T4 read what T1 wrote?
• “I”  Isolation: Although running concurrently, they should
appear as if they run is a certain serial order
19
Transaction ACID Properties (Cont’d)
T1
T2
T3
T4
• If T1 failed & T2 completed  This means what?
• T1  Rolledback & T2  Committed
• “D”  Durability: The effect of a committed transaction must
be persistent (not lost)
20
Transactions
BEGIN TRANSACTION;
BEGIN TRANSACTION;
BEGIN
TRANSACTION;
INSERT
INTO
Takes
BEGIN
TRANSACTION;
INSERT
INTO
Takes Courses.CID
SELECT Students.SSN,
INSERT
INTO
Takes
SELECT
Students.SSN,
Courses.CID
FROM
Students,
Courses
INSERT
INTO
Takes
SELECT
Students.SSN,
Courses.CID
FROM
Students,
Courses
WHERE
Students.name
= ‘Mary
Johnson’ and
SELECT
Students.SSN,
Courses.CID
FROM
Students,
Courses
WHERE
Students.name
=
‘Mary
Courses.name
= ‘CSE444’Johnson’ and
FROM
Students,
Courses
WHERE
Students.name
= ‘Mary Johnson’ and
Courses.name = ‘CSE444’
WHERE
Students.name
= ‘Mary Johnson’ and
= ‘CSE444’
-- More updates Courses.name
here....
= ‘CSE444’
-- More updates Courses.name
here....
-- More updates here....
IF everything-went-OK
-- More updates here....
IFTHEN
everything-went-OK
COMMIT;
IF
everything-went-OK
THEN
COMMIT;
ELSE IF
ROLLBACK
everything-went-OK
THEN
COMMIT;
ELSE ROLLBACK
THEN
COMMIT;
ELSE ROLLBACK
ELSE ROLLBACK
Backend
Many transactions at the
same time
(Concurrency Control)
Ensure the ACID
properties
Logging and Recovery
Control
21
DBMS Backend
Components
• We will cover
several of these
components
Chapter 1 in
textbook
22
Topics To Be Covered…
• File & System Structure
Records in blocks, dictionary, buffer management,…
• Indexing & Hashing
B-Trees, hashing,…
• Query Processing
Query costs, join strategies,…
• Crash Recovery
Failures, stable storage,…
• Concurrency Control
Correctness, locks,…
• Transaction Processing
Logs, deadlocks,…
23
Database Material at WPI
B, C terms
CS 3431
D term (alternate)
CS 4432
you are here
CS525
MQP
CS 542
CS 561
Grad. DB
Advanced
DB
DON’T TAKE!
DSRG
Selected
Topics
Selected
DB Project
DB Research
at WPI
Varies
Any time
year round
24
Database Systems
• The big commercial database vendors:
–
–
–
–
Oracle
IBM (with DB2) bought Informix recently
Microsoft (SQL Server)
Sybase
• Some free database systems (Unix) :
– Postgres
– Mysql
– Predator
• In CS4432 we use Oracle & SimpleDB!
25