CS186: Introduction to Database Systems

Download Report

Transcript CS186: Introduction to Database Systems

EECS 647: Introduction to
Database Systems
Instructor: Luke Huan
Spring 2007
Administrative

The following book has been reserved in the
engineering library:

Data mining : concepts and techniques by Jiawei Han and
Micheline Kamber
7/21/2015
Luke Huan Univ. of Kansas
2
Review: Data Integrity in SQL


Constraints
Trigger
7/21/2015
Luke Huan Univ. of Kansas
3
Trigger example
CREATE TRIGGER EECS168AutoRecruit
AFTER INSERT ON Student Event
REFERENCING NEW ROW AS newStudent
FOR EACH ROW
WHEN (newStudent.GPA > 3.0) Condition
INSERT INTO Enroll
VALUES(newStudent.SID, ’EECS168’);
Action
7/21/2015
Luke Huan Univ. of Kansas
4
Today’s Topic




View
Index
Transaction
Recursion
7/21/2015
Luke Huan Univ. of Kansas
5
Views

A view is like a “virtual” table



Defined by a query, which describes how to compute the
view contents on the fly
DBMS stores the view definition query instead of view
contents
Can be used in queries just like a regular table
7/21/2015
Luke Huan Univ. of Kansas
6
Creating and dropping views

Example: EECS647roster


CREATE VIEW EECS647Roster AS
SELECT SID, name, age, GPA Called “base tables”
FROM Student
WHERE SID IN (SELECT SID FROM Enroll
WHERE CID = ’EECS647’);
To drop a view

DROP VIEW view_name;
7/21/2015
Luke Huan Univ. of Kansas
7
Using views in queries

Example: find the average GPA of EECS647 students

SELECT AVG(GPA) FROM EECS647Roster;

To process the query, replace the reference to the view by
its definition
SELECT AVG(GPA)
FROM (SELECT SID, name, age, GPA
FROM Student
WHERE SID IN (SELECT SID
FROM Enroll
WHERE CID =
’EECS647’));

7/21/2015
Luke Huan Univ. of Kansas
8
Why use views?
To hide data from users
To hide complexity from users
Logical data independence







If applications deal with views, we can change the
underlying schema without affecting applications
Recall physical data independence: change the physical
organization of data without affecting applications
To provide a uniform interface for different
implementations or sources
Real database applications use tons of views
7/21/2015
Luke Huan Univ. of Kansas
9
Modifying views




Does not seem to make sense since views are virtual
But does make sense if that is how users see the
database
Goal: modify the base tables such that the modification
would appear to have been accomplished on the view
Be careful!



There may be one way to modify
There may be many ways to modify
There may be no way to modify
7/21/2015
Luke Huan Univ. of Kansas
10
A simple case
CREATE VIEW StudentGPA AS
SELECT SID, GPA FROM Student;
DELETE FROM StudentGPA WHERE SID = 123;
translates to:
DELETE FROM Student WHERE SID = 123;
7/21/2015
Luke Huan Univ. of Kansas
11
An impossible case
CREATE VIEW HighGPAStudent AS
SELECT SID, GPA FROM Student
WHERE GPA > 3.7;
INSERT INTO HighGPAStudent
VALUES(987, 2.5);

No matter what you do on Student, the inserted row will
not be in HighGPAStudent
7/21/2015
Luke Huan Univ. of Kansas
12
A case with too many possibilities
CREATE VIEW AverageGPA(GPA) AS
SELECT AVG(GPA) FROM Student;

Note that you can rename columns in view definition
UPDATE AverageGPA SET GPA = 2.5;



Set everybody’s GPA to 2.5?
Adjust everybody’s GPA by the same amount?
Just lower Lisa’s GPA?
7/21/2015
Luke Huan Univ. of Kansas
13
SQL92 updateable views

More or less just single-table selection queries





No join
No aggregation
No subqueries
Arguably somewhat restrictive
Still might get it wrong in some cases


See the slide titled “An impossible case”
Adding WITH CHECK OPTION to the end of the view
definition will make DBMS reject such modifications
7/21/2015
Luke Huan Univ. of Kansas
14
Indexes for Performance Tuning
An index is an auxiliary persistent data structure



Search tree (e.g. B-tree, R-tree)
Lookup table (e.g., hash table)
An index on R.A can speed up accesses of the form



R.A = value (lookup)
R.A > value (range query)
An index on ( R.A1, …, R.An ) can speed up




R.A1 = value1 Æ … Æ R.An = valuen
(R.A1 , … R.An) > (value1, …, valuen)
More on indexes in the second half of this course!
7/21/2015
Luke Huan Univ. of Kansas
15
Examples of Hash Index

SELECT * FROM Student WHERE name = ‘Kevin’


Without an index on Student.name: must scan the entire table if we
store Student as a flat file of unordered rows
With index: go “directly” to rows with name = ’Kevin’
name
sid
name
age
gpa
John
1234
John
21
3.5
Mary
1123
Mary
19
3.8
Bob
1011
Bob
22
2.6
Susan
1204
Susan
22
3.4
Kevin
1306
Kevin
18
2.9
A hash table
7/21/2015
Luke Huan Univ. of Kansas
16
Creating and dropping indexes

CREATE [UNIQUE] INDEX index_name ON
table_name(column_name1, …, column_namen);


With UNIQUE, the DBMS will also enforce that
{column_name1, …, column_namen} is a key of
table_name
e.g CREATE INDEX name_index ON STUDENT
(SName);

DROP INDEX index_name;

Typically, the DBMS will automatically create indexes
for PRIMARY KEY and UNIQUE constraint declarations
7/21/2015
Luke Huan Univ. of Kansas
17
Choosing indexes to create
More indexes = better performance?
 Indexes take space
 Indexes need to be maintained when data is updated
 Indexes have one more level of indirection

Optimal index selection depends on both query and
update workload and the size of tables
7/21/2015
Luke Huan Univ. of Kansas
18
Transactions




A program may carry out many operations on the data
retrieved from the database
However, the DBMS is only concerned about what data
is read/written from/to the database.
database - a fixed set of relations (A, B, C, …)
transaction - a sequence of read and write operations
(read(A), write(B), …)

DBMS’s abstract view of a user program
7/21/2015
Luke Huan Univ. of Kansas
19
Correctness criteria: The ACID properties


A tomicity: All actions in the Xact happen, or none happen.
C onsistency: If each Xact is consistent, and the DB starts
consistent, it ends up consistent.

I solation:
Execution of one Xact is isolated from that of other
Xacts.

D urability:
If a Xact commits, its effects persist.
An Example about SQL Transaction

Consider two transactions (Xacts):
T1:
T2:
•
•
•
•
BEGIN A=A+100, B=B-100 END
BEGIN A=1.06*A, B=1.06*B END
1st xact transfers $100 from B’s account to A’s
2nd credits both accounts with 6% interest.
Assume at first A and B each have $1000. What are the
legal outcomes of running T1 and T2???
• $1100 *1.06 = $1166
There is no guarantee that T1 will execute before T2 or
vice-versa, if both are submitted together. But, the net
effect must be equivalent to these two transactions
running serially in some order.
7/21/2015
Luke Huan Univ. of Kansas
21
Example (Contd.)


Legal outcomes: A=1166,B=954 or A=1160,B=960
Consider a possible interleaved schedule:
T1:
T2:

•
A=1.06*A,
B=B-100
B=1.06*B
This is OK (same as T1;T2). But what about:
T1:
T2:
•
A=A+100,
A=A+100,
A=1.06*A, B=1.06*B
B=B-100
Result: A=1166, B=960; A+B = 2126, bank loses $6
The DBMS’s view of the second schedule:
T1:
T2:
7/21/2015
R(A), W(A),
R(A), W(A), R(B), W(B)
Luke Huan Univ. of Kansas
R(B), W(B)
22
SQL transactions



Syntax in pgSQL:
BEGIN
<database operations>
COMMIT [ROLLBACK]
A transaction is automatically started when a user executes an
SQL statement (begin is optional)
Subsequent statements in the same session are executed as part of
this transaction


Statements see changes made by earlier ones in the same transaction
Statements in other concurrently running transactions do not see
these changes
COMMIT command commits the transaction (flushing the update
to disk)
 ROLLBACK command aborts the transaction (all effects are
undone)
7/21/2015
Luke Huan Univ. of Kansas
23

Atomicity

Partial effects of a transaction must be undone when

User explicitly aborts the transaction using ROLLBACK



The DBMS crashes before a transaction commits
Partial effects of a modification statement must be
undone when any constraint is violated


E.g., application asks for user confirmation in the last step
and issues COMMIT or ROLLBACK depending on the
response
However, only this statement is rolled back; the
transaction continues
How is atomicity achieved?

Logging (to support undo)
7/21/2015
Luke Huan Univ. of Kansas
24
Isolation


Transactions must appear to be executed in a serial
schedule (with no interleaving operations)
For performance, DBMS executes transactions using a
serializable schedule



In this schedule, only those operations that can be
interleaved are executed concurrently
Those that can not be interleaved are in a serialized way
The schedule guarantees to produce the same effects as a
serial schedule
7/21/2015
Luke Huan Univ. of Kansas
25
SQL isolation levels

Strongest isolation level: SERIALIZABLE



Complete isolation
Usually se as default
Weaker isolation levels: REPEATABLE READ, READ
COMMITTED, READ UNCOMMITTED



Increase performance by eliminating overhead and
allowing higher degrees of concurrency
Trade-off: sometimes you get the “wrong” answer
pgSQL default is READ COMMITTED
7/21/2015
Luke Huan Univ. of Kansas
26
READ UNCOMMITTED

Can read “dirty” data



A data item is dirty if it is written by an uncommitted transaction
Problem: What if the transaction that wrote the dirty data
eventually aborts?
Example: wrong average

-- T1:
UPDATE Student
SET GPA = 3.0
WHERE SID = 142;
-- T2:
SELECT AVG(GPA)
FROM Student;
ROLLBACK;
COMMIT;
7/21/2015
Luke Huan Univ. of Kansas
27
READ COMMITTED

All reads see a snapshot of the database (including all committed
transactions) right before the beginning of the query



No dirty reads, but non-repeatable reads possible
Reading the same data item twice can produce different results
Example: different averages

-- T1:
-- T2:
SELECT AVG(GPA)
FROM Student;
UPDATE Student
SET GPA = 3.0
WHERE SID = 142;
COMMIT;
SELECT AVG(GPA)
FROM Student;
COMMIT;
7/21/2015
Luke Huan Univ. of Kansas
28
REPEATABLE READ

Reads are repeatable, but may see phantoms



Do not allow the modification of existing values
New rows may be inserted in the mean time
Example: different average (still!)

-- T1:
-- T2:
SELECT AVG(GPA)
FROM Student;
INSERT INTO Student
VALUES(789, ‘Nelson’, 10, 1.0);
COMMIT;
SELECT AVG(GPA)
FROM Student;
COMMIT;
7/21/2015
Luke Huan Univ. of Kansas
29
SERIALIZABLE READ


Reads see the snapshot of the database right before the
beginning of the transaction
Example: the same average

-- T1:
-- T2:
SELECT AVG(GPA)
FROM Student;
INSERT INTO Student
VALUES(789, ‘Nelson’, 10, 1.0);
COMMIT;
SELECT AVG(GPA)
FROM Student;
COMMIT;
7/21/2015
Luke Huan Univ. of Kansas
30
Summary of SQL isolation levels
Isolation level/anomaly
READ UNCOMMITTED
Dirty reads
Non-repeatable reads
Phantoms
Possible
Possible
Possible
READ COMMITTED
Impossible
Possible
Possible
REPEATABLE READ
Impossible
Impossible
Possible
SERIALIZABLE
Impossible
Impossible
Impossible

Syntax: At the beginning of a transaction,
SET TRANSACTION ISOLATION LEVEL
isolation_level [READ ONLY|READ WRITE];

READ UNCOMMITTED can only be READ ONLY
7/21/2015
Luke Huan Univ. of Kansas
31
Recursion in SQL

SQL2 had no recursion

You can find Bart’s parents, grandparents, great
grandparents, etc.
SELECT p1.parent AS grandparent
FROM Parent p1, Parent p2
WHERE p1.child = p2.parent
AND p2.child = ’Bart’;


But you cannot find all his ancestors with a single query
SQL3 introduces recursion



WITH clause
Implemented in DB2 (called common table expressions)
Unfortunately, pgSQL does not support it.
7/21/2015
Luke Huan Univ. of Kansas
32
Ancestor query in SQL3
WITH Ancestor(anc, desc) AS
base case
((SELECT parent, child FROM Parent)
recursion step
UNION
Define a
(SELECT a1.anc, a2.desc
a relation
FROM Ancestor a1, Ancestor a2
recursively
WHERE a1.desc = a2.anc))
SELECT anc
FROM Ancestor
Query using the relation
WHERE desc = ’Bart’;
defined in WITH clause
How do we compute such a recursive query?
7/21/2015
Luke Huan Univ. of Kansas
33
Summary of SQL features covered so
far

Query
Modification
Constraints
Triggers
Views
Transaction
Recursion

Next: Database programming






7/21/2015
Luke Huan Univ. of Kansas
34