2_Managing external data_2
SQL and SQAPL
Data Models
• A Database models some portion of
the real world.
• A data model is the link between the user’s
view of the world and the bits stored in the
computer.
• We will concentrate on the Relational
Model.
Data Models
• A data model is a collection of concepts for
describing data.
• A database schema is a description of a
particular collection of data, using a given
data model.
• The relational model of data is the most
widely used model today.
– Main concept: relation, basically a table
with rows and columns.
– Every relation has a schema, which
describes the columns, or fields.
Levels of Abstraction
• Views describe how
users see the data.
• Conceptual schema
defines logical structure
• Physical schema
describes the files and
indexes used.
• (sometimes called the
ANSI/SPARC model)
[Figure: Users access View 1, View 2 and View 3; the views are defined over the Conceptual Schema, which is mapped to the Physical Schema, which describes the DB]
Data Independence
• A Simple Idea:
Applications should be
insulated from how data
is structured and stored.
• Logical data independence:
Protection from changes in
logical structure of data.
• Physical data independence:
Protection from changes in
physical structure of data.
[Figure: View 1, View 2 and View 3 over the Conceptual Schema, over the Physical Schema, over the DB]
SQL
SQL consists of the following parts:
• Data Definition Language (DDL)
• Interactive Data Manipulation Language (Interactive DML)
• Embedded Data Manipulation Language (Embedded DML)
• Views
• Integrity
• Transaction Control
• Authorization
• Catalog and Dictionary Facilities
AIRPORT(airportcode, name, city, state)
FLT-SCHEDULE(flt#, airline, dtime, from-airportcode, atime, to-airportcode, miles, price)
FLT-WEEKDAY(flt#, weekday)
FLT-INSTANCE(flt#, date, plane#, #avail-seats)
AIRPLANE(plane#, plane-type, total-#seats)
CUSTOMER(cust#, first, middle, last, phone#, street, city, state, zip)
RESERVATION(flt#, date, cust#, seat#, check-in-status, ticket#)
DDL - Overview
• primitive types
• domains
• schema
• tables
DDL - Primitive Types
• numeric
– INTEGER (or INT), SMALLINT are subsets of the
integers (machine dependent)
– REAL, DOUBLE PRECISION are floating-point and
double-precision floating-point (machine
dependent)
– FLOAT(N) is floating-point with at least N digits
– DECIMAL(P,D) (or DEC(P,D), or NUMERIC(P,D)), with P digits
of which D are to the right of the decimal point
DDL - Primitive Types (cont.)
• character-string
– CHAR(N) (or CHARACTER(N)) is a fixed-length
character string
– VARCHAR(N) (or CHAR VARYING(N), or CHARACTER
VARYING(N)) is a variable-length character string
with at most N characters
• bit-strings
– BIT(N) is a fixed-length bit string
– VARBIT(N) (or BIT VARYING(N)) is a bit string with at
most N bits
DDL - Primitive Types (cont.)
• time
– DATE is a date: YYYY-MM-DD
– TIME, a time of day: HH-MM-SS
– TIME(I), a time of day with I decimal fractions
of a second: HH-MM-SS-F....F
– TIME WITH TIME ZONE, a time with a time zone
added: HH-MM-SS-HH-MM
DDL - Primitive Types (cont.)
– TIMESTAMP, date, time, fractions of a second
and an optional WITH TIME ZONE qualifier:
YYYY-MM-DD-HH-MM-SS-F...F{-HH-MM}
– INTERVAL, relative value used to increment or
decrement DATE, TIME, or TIMESTAMP: YEAR/MONTH
or DAY/TIME
DDL - Domains
• a domain can be defined as follows:
CREATE DOMAIN AIRPORT-CODE CHAR(3);
CREATE DOMAIN FLIGHTNUMBER CHAR(5);
• using domain definitions makes it easier to
see which columns are related
• changing a domain definition one place
changes it consistently everywhere it is used
• default values can be defined for domains
• constraints can be defined for domains
(later)
DDL - Domains (cont.)
• all domains contain the value, NULL.
• to define a different default value:
CREATE DOMAIN AIRPORT-CODE CHAR(3) DEFAULT ‘<literal>’;
CREATE DOMAIN AIRPORT-CODE CHAR(3) DEFAULT ‘niladic function’;
• literal, such as ‘???’, ‘NO-VALUE’,...
• niladic function, such as USER, CURRENT_USER, SESSION_USER, SYSTEM_USER, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP
• defaults defined in a column take precedence
over the above
DDL - Domains (cont.)
• a domain is dropped as follows:
DROP DOMAIN AIRPORT-CODE RESTRICT;
DROP DOMAIN AIRPORT-CODE CASCADE;
• restrict: drop operation fails if the domain is
used in column definitions
• cascade: drop operation causes columns
to be defined directly on the underlying
data type
DDL - Schema
• create a schema:
CREATE SCHEMA AIRLINE AUTHORIZATION LEO;
• the schema AIRLINE has now been created and
is owned by the user “LEO”
• tables can now be created and added to the
schema
DDL - Schema
• to drop a schema:
DROP SCHEMA AIRLINE RESTRICT;
DROP SCHEMA AIRLINE CASCADE;
• restrict: drop operation fails if schema is not
empty
• cascade: drop operation removes everything
in the schema
DDL - Tables
• to create a table in the AIRLINE schema:
CREATE TABLE AIRLINE.FLT-SCHEDULE
(FLT# FLIGHTNUMBER NOT NULL,
AIRLINE VARCHAR(25),
FROM-AIRPORTCODE AIRPORT-CODE,
DTIME TIME,
TO-AIRPORTCODE AIRPORT-CODE,
ATIME TIME,
PRIMARY KEY (FLT#),
FOREIGN KEY (FROM-AIRPORTCODE)
REFERENCES AIRPORT(AIRPORTCODE),
FOREIGN KEY (TO-AIRPORTCODE)
REFERENCES AIRPORT(AIRPORTCODE));
DDL - Tables (cont.)
CREATE TABLE AIRLINE.FLT-WEEKDAY
(FLT# FLIGHTNUMBER NOT NULL,
WEEKDAY CHAR(2),
UNIQUE(FLT#, WEEKDAY),
FOREIGN KEY (FLT#)
REFERENCES FLT-SCHEDULE(FLT#));
CREATE TABLE AIRLINE.FLT-INSTANCE
(FLT# FLIGHTNUMBER NOT NULL,
DATE DATE NOT NULL,
#AVAIL-SEATS SMALLINT,
PRIMARY KEY(FLT#, DATE),
FOREIGN KEY (FLT#)
REFERENCES FLT-SCHEDULE(FLT#));
DDL - Tables (cont.)
CREATE TABLE AIRLINE.RESERVATION
(FLT# FLIGHTNUMBER NOT NULL,
DATE DATE NOT NULL,
CUST# INTEGER NOT NULL,
SEAT# CHAR(4),
CHECK-IN-STATUS CHAR,
UNIQUE(FLT#, DATE, CUST#),
FOREIGN KEY (FLT#, DATE)
REFERENCES FLT-INSTANCE(FLT#, DATE),
FOREIGN KEY (CUST#)
REFERENCES CUSTOMER(CUST#));
DDL - Tables (cont.)
• to drop a table:
DROP TABLE RESERVATION RESTRICT;
DROP TABLE RESERVATION CASCADE;
• restrict: drop operation fails if the table is
referenced by view/constraint definitions
• cascade: drop operation removes referencing
view/constraint definitions
DDL - Tables (cont.)
• to add a column to a table:
ALTER TABLE AIRLINE.FLT-SCHEDULE
ADD PRICE DECIMAL(7,2);
• if no DEFAULT is specified, the new column
will have NULL values for all tuples already
in the database
DDL - Tables (cont.)
• to drop a column from a table
ALTER TABLE AIRLINE.FLT-SCHEDULE
DROP PRICE RESTRICT (or CASCADE);
• restrict: drop operation fails if the column
is referenced
• cascade: drop operation removes
referencing view/constraint definitions
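The DDL slides can be tried out on a small scale. The following is a minimal sketch, assuming Python's built-in sqlite3 module rather than the DBMS used in the course. SQLite has no CREATE DOMAIN or CREATE SCHEMA, and "#" and "-" are not allowed in plain identifiers, so names such as flt_no are stand-ins chosen for this example, and the sample rows are made up.

```python
import sqlite3

# Hypothetical SQLite translation of two of the slide's tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE airport (
    airportcode CHAR(3) PRIMARY KEY,
    name        VARCHAR(25)
);
CREATE TABLE flt_schedule (
    flt_no           CHAR(5) NOT NULL PRIMARY KEY,
    airline          VARCHAR(25),
    from_airportcode CHAR(3) REFERENCES airport(airportcode),
    to_airportcode   CHAR(3) REFERENCES airport(airportcode)
);
""")
conn.execute("INSERT INTO airport VALUES ('ATL', 'Atlanta')")
conn.execute("INSERT INTO airport VALUES ('CHI', 'Chicago')")
conn.execute("INSERT INTO flt_schedule VALUES ('DL212', 'DELTA', 'ATL', 'CHI')")

# A row that points at an unknown airport violates the foreign key.
try:
    conn.execute(
        "INSERT INTO flt_schedule VALUES ('XX999', 'NOBODY', 'ZZZ', 'CHI')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# ALTER TABLE ... ADD with no DEFAULT: existing rows get NULL in the new column.
conn.execute("ALTER TABLE flt_schedule ADD COLUMN price DECIMAL(7,2)")
price_of_existing_row = conn.execute(
    "SELECT price FROM flt_schedule WHERE flt_no = 'DL212'").fetchone()[0]

print(fk_rejected, price_of_existing_row)
```

The foreign key rejects the bad row, and the pre-existing row shows NULL (None in Python) for the added PRICE column, as the ALTER TABLE slide states.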
Case
• A dataset consisting of 5 tables is available
as .csv files or as an Excel file
• Let us get this data into APL
• Getdata
• Next let us create the database using SQAPL
• First you must create a database in Access
• SQAPL
Interactive DML - Overview
• select-from-where
• select clause
• where clause
• from clause
• tuple variables
• string matching
• ordering of rows
• set operations
• built-in functions
• nested subqueries
• joins
• recursive queries
• insert, delete, update
Interactive DML - select-from-where
SELECT A1, A2, ... An
FROM R1, R2, ... Rm
WHERE P
• the SELECT clause specifies the columns of the
result
• the FROM clause specifies the tables to be
scanned in the query
• the WHERE clause specifies the condition on
the columns of the tables in the FROM clause
Interactive DML
- select clause
• “Find the airlines in FLT-SCHEDULE”
SELECT AIRLINE
FROM FLT-SCHEDULE;
SELECT ALL AIRLINE
FROM FLT-SCHEDULE;
• “Find the airlines in FLT-SCHEDULE with
duplicates removed”
SELECT DISTINCT AIRLINE
FROM FLT-SCHEDULE;
Interactive DML
- select clause
• “Find all columns in FLT-SCHEDULE”
SELECT *
FROM FLT-SCHEDULE;
• “Find FLT# and price raised by 10%”
SELECT FLT#, PRICE*1.1
FROM FLT-SCHEDULE;
Interactive DML
- where clause
• “Find FLT# and price in FLT-SCHEDULE
for flights out of Atlanta”
SELECT FLT#, PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”;
Interactive DML
- from clause
• “Find FLT#, WEEKDAY, and FROM-AIRPORTCODE in FLT-WEEKDAY and FLT-SCHEDULE”
SELECT FLT-SCHEDULE.FLT#,
WEEKDAY, FROM-AIRPORTCODE
FROM FLT-WEEKDAY, FLT-SCHEDULE
WHERE FLT-WEEKDAY.FLT# = FLT-SCHEDULE.FLT#;
• dot-notation disambiguates FLT# in FLT-WEEKDAY and FLT-SCHEDULE
Interactive DML
- tuple variables
• alias definition:
SELECT S.FLT#, WEEKDAY, T.FROM-AIRPORTCODE
FROM FLT-WEEKDAY S, FLT-SCHEDULE T
WHERE S.FLT# = T.FLT#;
• S and T are tuple variables
Interactive DML
- tuple variables
• SQL’s heritage as a tuple calculus language shows
• tuple variables are useful when one relation is used
“twice” in a query:
SELECT S.FLT#
FROM FLT-SCHEDULE S, FLT-SCHEDULE T
WHERE S.PRICE > T.PRICE
AND T.FLT# = “DL212”;
Interactive DML
- string matching
• wildcard searches use:
%: matches any substring
_: matches any character
SELECT S.FLT#, WEEKDAY
FROM FLT-WEEKDAY S, FLT-SCHEDULE T
WHERE S.FLT# = T.FLT#
AND T.AIRLINE LIKE “%an%”;
• “%an%” matches American, Airtran, Scandinavian,
Lufthansa, PanAm...
• “A%” matches American, Airtran, ...
• “___%” matches any string with at least three characters
Interactive DML
- ordering of rows
• the order by clause orders the rows in a query
result in ascending (asc) or descending (desc)
order
• “Find FLT#, airline, and price from FLT-SCHEDULE for flights out of Atlanta ordered by
ascending airline and descending price:”
SELECT FLT#, AIRLINE, PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”
ORDER BY AIRLINE ASC, PRICE DESC;
Interactive DML - set operations
[Venn diagram: S union T]
• “Find FLT# for flights on Tuesdays in FLT-WEEKDAY and
FLT# with more than 100 seats in FLT-INSTANCE ”
SELECT FLT#
FROM FLT-WEEKDAY
WHERE WEEKDAY = “TU”
UNION
SELECT FLT#
FROM FLT-INSTANCE
WHERE #AVAIL-SEATS > 100;
• UNION ALL
preserves duplicates
Interactive DML - set operation
[Venn diagram: S intersect T]
• “Find FLT# for flights on Tuesdays in FLT-WEEKDAY
with more than 100 seats in FLT-INSTANCE”
SELECT FLT#
FROM FLT-WEEKDAY
WHERE WEEKDAY = “TU”
INTERSECT
SELECT FLT#
FROM FLT-INSTANCE
WHERE #AVAIL-SEATS > 100;
• INTERSECT ALL
preserves duplicates
Interactive DML - set operation
[Venn diagram: S minus T (S\T)]
• “Find FLT# for flights on Tuesdays in FLT-WEEKDAY
except FLT# with more than 100 seats in FLT-INSTANCE”
SELECT FLT#
FROM FLT-WEEKDAY
WHERE WEEKDAY = “TU”
EXCEPT
SELECT FLT#
FROM FLT-INSTANCE
WHERE #AVAIL-SEATS > 100;
• EXCEPT ALL preserves duplicates
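The three set operations can be compared side by side. This is a small sketch, assuming SQLite through Python's sqlite3 module; the table and column names (flt_no, avail_seats) are renamed stand-ins and the rows are made-up sample data. SQLite supports UNION ALL but not INTERSECT ALL or EXCEPT ALL, so only the duplicate-removing forms are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flt_weekday  (flt_no TEXT, weekday TEXT);
CREATE TABLE flt_instance (flt_no TEXT, date TEXT, avail_seats INTEGER);
INSERT INTO flt_weekday VALUES
    ('DL212', 'TU'), ('AA121', 'TU'), ('UA100', 'TH');
INSERT INTO flt_instance VALUES
    ('DL212', '1998-09-08', 150),
    ('UA100', '1998-09-10', 120),
    ('AA121', '1998-09-08', 40);
""")

# The two operand queries from the slides.
tuesday = "SELECT flt_no FROM flt_weekday WHERE weekday = 'TU'"
roomy = "SELECT flt_no FROM flt_instance WHERE avail_seats > 100"


def run(op):
    """Combine the two operand queries with the given set operator."""
    rows = conn.execute(f"{tuesday} {op} {roomy} ORDER BY flt_no")
    return [r[0] for r in rows]


union_rows = run("UNION")          # Tuesday flights or roomy flights
intersect_rows = run("INTERSECT")  # Tuesday flights that are also roomy
except_rows = run("EXCEPT")        # Tuesday flights that are not roomy

print(union_rows, intersect_rows, except_rows)
```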
Interactive DML
- built-in functions
• count (COUNT), sum (SUM), average (AVG), minimum (MIN),
maximum (MAX)
• “Count flights scheduled for Tuesdays from FLT-WEEKDAY”
SELECT COUNT( *)
FROM FLT-WEEKDAY
WHERE WEEKDAY = “TU”;
• “Find the average ticket price by airline from FLT-SCHEDULE”
SELECT AIRLINE, AVG(PRICE)
FROM FLT-SCHEDULE
GROUP BY AIRLINE;
Interactive DML
- built-in functions
• “Find the average ticket price by airline for scheduled
flights out of Atlanta for airlines with more than 5
scheduled flights out of Atlanta from FLT-SCHEDULE”
SELECT AIRLINE, AVG(PRICE)
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE = “ATL”
GROUP BY AIRLINE
HAVING COUNT (FLT#) > 5;
• “Find the highest priced flight(s) out of Atlanta from FLT-SCHEDULE”
SELECT FLT#, PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE = “ATL”
AND PRICE = (SELECT MAX(PRICE)
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE = “ATL”);
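A runnable sketch of GROUP BY with HAVING, and of finding the highest priced flight(s), assuming SQLite via Python's sqlite3. The rows are invented sample data, the identifiers are renamed to be legal in SQLite, and the HAVING threshold is lowered to 2 so that the tiny data set shows the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flt_schedule
    (flt_no TEXT, airline TEXT, from_airportcode TEXT, price REAL);
INSERT INTO flt_schedule VALUES
    ('DL1', 'DELTA',    'ATL', 100),
    ('DL2', 'DELTA',    'ATL', 300),
    ('AA1', 'AMERICAN', 'ATL', 200),
    ('DL3', 'DELTA',    'BIR', 900);
""")

# WHERE filters rows, GROUP BY forms one group per airline,
# HAVING then filters whole groups.
avg_by_airline = conn.execute("""
    SELECT airline, AVG(price)
    FROM flt_schedule
    WHERE from_airportcode = 'ATL'
    GROUP BY airline
    HAVING COUNT(flt_no) >= 2
    ORDER BY airline
""").fetchall()

# Highest priced flight(s) out of ATL: compare each price with the maximum,
# computed in a subquery, so that FLT# and PRICE come from the same row.
highest = conn.execute("""
    SELECT flt_no, price
    FROM flt_schedule
    WHERE from_airportcode = 'ATL'
      AND price = (SELECT MAX(price) FROM flt_schedule
                   WHERE from_airportcode = 'ATL')
""").fetchall()

print(avg_by_airline, highest)
```

AMERICAN has only one flight out of ATL, so its group is dropped by HAVING; the BIR flight never reaches the grouping step because WHERE removes it first.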
Interactive DML
- nested subqueries
• Set membership: IN, NOT IN
• “Find airlines from FLT-SCHEDULE where FLT# is
in the set of FLT#’s for flights on Tuesdays from
FLT-WEEKDAY”
SELECT DISTINCT AIRLINE
FROM FLT-SCHEDULE
WHERE FLT# IN
(SELECT FLT#
FROM FLT-WEEKDAY
WHERE WEEKDAY = “TU”);
Interactive DML
- nested subqueries
• “Find FLT#’s for flights on Tuesdays or Thursdays
from FLT-WEEKDAY”
SELECT DISTINCT FLT#
FROM FLT-WEEKDAY
WHERE WEEKDAY IN (“TU”, “TH”);
Interactive DML
- nested subqueries
• “Find FLT# for flights from Atlanta to Chicago
with a price that is lower than all flights from
Birmingham to Chicago”
SELECT FLT#
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”
AND TO-AIRPORTCODE=“CHI” AND PRICE <
ALL (SELECT PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“BIR”
AND TO-AIRPORTCODE=“CHI”);
Interactive DML - joins
• cross join: Cartesian product
• [inner] join: only keeps rows that satisfy the join condition
• left outer join: keeps all rows from left table; fills in nulls
as needed
• right outer join: keeps all rows from right table; fills in
nulls as needed
• full outer join: keeps all rows from both tables; fills in nulls
as needed
• natural or on-condition must be specified for all inner and
outer joins
• natural: equi-join on columns with same name; one
column preserved
Interactive DML - joins
• “Find all two-leg, one-day trips out of Atlanta; show
also a leg-one even if there is no connecting leg-two the same day”
SELECT X.FLT# LEG-ONE, Y.FLT# LEG-TWO
FROM
((FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE) X
LEFT OUTER JOIN
(FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE) Y
ON (X.TO-AIRPORTCODE=Y.FROM-AIRPORTCODE
AND X.DATE=Y.DATE AND X.ATIME<Y.DTIME))
WHERE X.FROM-AIRPORTCODE=“ATL”;
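A simplified version of the two-leg query, assuming SQLite via Python's sqlite3. To keep it short, the hypothetical table leg stands in for the slide's (FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE), the date condition is left out, and times are stored as HH:MM text so that string comparison orders them correctly. All rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leg (flt_no TEXT, src TEXT, dst TEXT, dtime TEXT, atime TEXT);
INSERT INTO leg VALUES
    ('DL1', 'ATL', 'CHI', '08:00', '10:00'),
    ('UA2', 'CHI', 'BOS', '11:00', '14:00'),
    ('DL3', 'ATL', 'BIR', '09:00', '10:00');
""")

# LEFT OUTER JOIN keeps every leg-one out of ATL; where no connecting
# leg-two exists, the right-hand columns are filled with NULL.
trips = conn.execute("""
    SELECT x.flt_no AS leg_one, y.flt_no AS leg_two
    FROM leg x
    LEFT OUTER JOIN leg y
      ON x.dst = y.src AND x.atime < y.dtime
    WHERE x.src = 'ATL'
    ORDER BY x.flt_no
""").fetchall()

print(trips)
```

DL1 connects to UA2 in Chicago, while DL3 has no onward flight from BIR and so appears with NULL (None) as leg-two.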
Interactive DML- recursive queries
• not in SQL2 (SQL-92); added in SQL3 (SQL:1999) as WITH RECURSIVE
• “Find all reachable airports for multi-leg trips out of
Atlanta”
WITH
PAIRS AS SELECT FROM-AIRPORTCODE D, TO-AIRPORTCODE
A FROM FLT-SCHEDULE,
RECURSIVE REACHES(D, A) AS /*initially empty*/
PAIRS
UNION
(SELECT PAIRS.D, REACHES.A
FROM PAIRS, REACHES
WHERE PAIRS.A=REACHES.D)
SELECT A FROM REACHES WHERE D=“ATL”;
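The reachability query can be run with the WITH RECURSIVE form of the standard. This is a sketch assuming SQLite (via Python's sqlite3), which supports recursive common table expressions; the identifiers are renamed and the four flights are made-up data. The query is written in a slightly simpler shape than the slide: REACHES holds only the airports reachable from ATL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flt_schedule
    (flt_no TEXT, from_airportcode TEXT, to_airportcode TEXT);
INSERT INTO flt_schedule VALUES
    ('F1', 'ATL', 'CHI'),
    ('F2', 'CHI', 'BOS'),
    ('F3', 'BOS', 'ATL'),
    ('F4', 'LAX', 'SEA');
""")

# Base case: airports one leg away from ATL.
# Recursive step: add any airport one more leg away from what is reached.
# UNION (not UNION ALL) drops duplicates, so the ATL-CHI-BOS-ATL cycle
# cannot make the recursion run forever.
reachable = [row[0] for row in conn.execute("""
    WITH RECURSIVE reaches(a) AS (
        SELECT to_airportcode
        FROM flt_schedule
        WHERE from_airportcode = 'ATL'
        UNION
        SELECT s.to_airportcode
        FROM flt_schedule s JOIN reaches r ON s.from_airportcode = r.a
    )
    SELECT a FROM reaches ORDER BY a
""")]

print(reachable)
```

LAX and SEA are not reachable from ATL and therefore do not appear.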
Interactive DML - insert, delete, update
INSERT INTO FLT-SCHEDULE
VALUES (“DL212”, “DELTA”, 11-15-00, “ATL”,
13-05-00, ”CHI”, 650, 00351.00);
INSERT INTO FLT-SCHEDULE(FLT#,AIRLINE)
VALUES (“DL212”, “DELTA”); /*default nulls added*/
• “Insert into FLT-INSTANCE all flights scheduled
for Thursday, 9/10/98”
INSERT INTO FLT-INSTANCE(FLT#, DATE)
(SELECT S.FLT#, 1998-09-10
FROM FLT-SCHEDULE S, FLT-WEEKDAY D
WHERE S.FLT#=D.FLT#
AND D.WEEKDAY=“TH”);
Interactive DML - insert, delete, update
“Cancel all flight instances for Delta on
9/10/98”
DELETE FROM FLT-INSTANCE
WHERE DATE=1998-09-10
AND FLT# IN
(SELECT FLT#
FROM FLT-SCHEDULE
WHERE AIRLINE=“DELTA”);
Interactive DML- insert, delete, update
“Update all reservations for customers on
DL212 on 9/10/98 to reservations on AA121
on 9/10/98”
UPDATE RESERVATION
SET FLT#=“AA121”
WHERE DATE=1998-09-10
AND FLT#=“DL212”;
Embedded DML - Overview
• host languages
• precompilation
• impedance mismatch
• database access
• cursor types
• fetch orientation
• exception handling
Embedded DML - host languages
• SQL doesn’t do iteration, recursion, report
printing, user interaction, and SQL doesn’t
do Windows
• SQL may be embedded in host languages,
like COBOL, FORTRAN, MUMPS, PL/I, PASCAL, ADA,
C, C++, JAVA
• Or used from languages like APL
Embedded DML
- impedance mismatch
• SQL is a powerful, set-oriented, declarative
language
• SQL queries return sets of rows
• host languages cannot handle large sets of
structured data
• cursors resolve the mismatch:
• Demo
Views - definition, use, update
• a view is a virtual table
• how a view is defined:
CREATE VIEW ATL-FLT
AS SELECT FLT#, AIRLINE, PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE = “ATL”;
• how a query on a view is written:
SELECT *
FROM ATL-FLT
WHERE PRICE <= 00200.00;
Views - definition, use, update
• how a query on a view is computed:
SELECT FLT#, AIRLINE, PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”
AND PRICE<=00200.00;
• how a view definition is dropped:
DROP VIEW ATL-FLT [RESTRICT|CASCADE];
Views - definition, use, update
• views inherit column names of the base
tables they are defined from
• columns may be explicitly named in the
view definition
• column names must be named if inheriting
them causes ambiguity
• views may have computed columns, e.g.
from applying built-in-functions; these must
be named in the view definition
Views - definition, use, update
these views are not updatable
CREATE VIEW ATL-PRICES
AS SELECT AIRLINE, PRICE
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”;
CREATE VIEW AVG-ATL-PRICES
AS SELECT AIRLINE, AVG(PRICE)
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”
GROUP BY AIRLINE;
this view is theoretically updatable, but
cannot be updated in SQL
CREATE VIEW FLT-SCHED-AND-DAY
AS SELECT S.*, D.WEEKDAY
FROM FLT-SCHEDULE S, FLT-WEEKDAY D
WHERE D.FLT# = S.FLT#;
Views - definition, use, update
a view is updatable if and only if:
• it does not contain any of the keywords JOIN, UNION, INTERSECT,
EXCEPT
• it does not contain the keyword DISTINCT
• every column in the view corresponds to a uniquely identifiable base
table column
• the FROM clause references exactly one table which must be a base
table or an updatable view
• the table referenced in the FROM clause cannot be referenced in the
FROM clause of a nested WHERE clause
• it does not have a GROUP BY clause
• it does not have a HAVING clause
updatable means insert, delete, update all ok
Views - definition, use, update
CREATE VIEW LOW-ATL-FARES /*updatable view*/
AS SELECT *
FROM FLT-SCHEDULE
WHERE FROM-AIRPORTCODE=“ATL”
AND PRICE<00200.00;
UPDATE LOW-ATL-FARES /* moves row outside the view */
SET PRICE = 00250.00
WHERE TO-AIRPORTCODE = “BOS”;
INSERT INTO LOW-ATL-FARES /* creates row outside the view */
VALUES (“DL222”, ”DELTA”,
”BIR”, 11-15-00, ”CHI”, 13-05-00, 00180.00);
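The "row moves outside the view" effect can be observed directly. This is a sketch assuming SQLite via Python's sqlite3, with renamed identifiers and made-up rows. SQLite views are read-only, so the update is applied to the base table here instead of through the view; the visible effect on the view's contents is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flt_schedule
    (flt_no TEXT, airline TEXT, from_airportcode TEXT, price REAL);
INSERT INTO flt_schedule VALUES
    ('DL1', 'DELTA',    'ATL', 150),
    ('DL2', 'DELTA',    'ATL', 250),
    ('AA1', 'AMERICAN', 'BIR', 100);
CREATE VIEW low_atl_fares AS
    SELECT * FROM flt_schedule
    WHERE from_airportcode = 'ATL' AND price < 200;
""")


def view_rows():
    """Flight numbers currently visible through the view."""
    rows = conn.execute("SELECT flt_no FROM low_atl_fares ORDER BY flt_no")
    return [r[0] for r in rows]


before = view_rows()

# Raising the price above the view's limit makes the row leave the view,
# although it still exists in the base table.
conn.execute("UPDATE flt_schedule SET price = 250 WHERE flt_no = 'DL1'")
after = view_rows()
still_in_base = conn.execute(
    "SELECT COUNT(*) FROM flt_schedule WHERE flt_no = 'DL1'").fetchone()[0]

print(before, after, still_in_base)
```

Because a view is a virtual table, its contents are recomputed from the base table on every query, which is why the row disappears from the view without being deleted.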
Integrity - constraints
• constraint: a conditional expression required not to
evaluate to false
• a constraint cannot be created if it is already violated
• a constraint is enforced from the point of creation forward
• a constraint has a unique name
• if a constraint is violated its name is made available to the
user
• constraints cannot reference parameters or host
variables; they are application independent
• data type checking is a primitive form of constraint
Integrity - domain constraints
• associated with a domain; applies to all columns
defined on the domain
CREATE DOMAIN WEEKDAY CHAR(2)
CONSTRAINT IC-WEEKDAY
CHECK (VALUE IN
( “MO”, “TU”, “WE”, “TH”, “FR”, “SA”, “SU”));
CREATE DOMAIN PRICE DECIMAL(7,2)
CONSTRAINT IC-PRICE
CHECK (VALUE > 00000.00 );
CREATE DOMAIN FLT# CHAR(5)
CONSTRAINT IC-FLT#
CHECK (VALUE NOT NULL);
Integrity - base table, column constraints
• associated with a specific base table
CREATE TABLE AIRLINE.FLT-SCHEDULE
(FLT# FLIGHTNUMBER NOT NULL,
AIRLINE VARCHAR(25),
FROM-AIRPORTCODE AIRPORT-CODE,
DTIME TIME,
TO-AIRPORTCODE AIRPORT-CODE,
ATIME TIME,
CONSTRAINT FLTPK PRIMARY KEY (FLT#),
CONSTRAINT FROM-AIRPORTCODE-FK
FOREIGN KEY (FROM-AIRPORTCODE)
REFERENCES AIRPORT(AIRPORTCODE)
ON DELETE SET NULL ON UPDATE CASCADE,
FOREIGN KEY (TO-AIRPORTCODE)
REFERENCES AIRPORT(AIRPORTCODE)
ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT IC-DTIME-ATIME
CHECK (DTIME < ATIME));
Integrity
- general constraints
• applies to an arbitrary combination of columns and
tables
• connecting RESERVATIONS for a customer must
make sense:
CREATE ASSERTION IC-CONNECTING-FLIGHTS
CHECK (NOT EXISTS
(SELECT *
FROM FLT-SCHEDULE FS1, FLT-SCHEDULE FS2, RESERVATION R1, RESERVATION R2
WHERE FS1.FLT#=R1.FLT#
AND FS2.FLT#=R2.FLT#
AND R1.DATE=R2.DATE
AND FS1.TO-AIRPORTCODE=FS2.FROM-AIRPORTCODE
AND FS1.ATIME+ INTERVAL “30” MINUTE
> FS2.DTIME));
Integrity - (not so) general constraints
• not all constraints can be specified
CREATE TABLE AIRLINE.FLT-WEEKDAY
(FLT#
FLIGHTNUMBER NOT NULL,
WEEKDAY CHAR(2),
.... ));
CREATE TABLE AIRLINE.FLT-INSTANCE
(FLT#
FLIGHTNUMBER NOT NULL,
DATE
DATE
NOT NULL,
.... ));
CREATE ASSERTION DATE-WEEKDAY-CHECK
CHECK (NOT EXISTS
(SELECT *
FROM FLT-INSTANCE FI, FLT-WEEKDAY FSD
WHERE FI.FLT#=FSD.FLT#
AND weekday-of(FI.DATE) <> FSD.WEEKDAY));
• weekday-of: DATE → WEEKDAY
Transaction Control
• atomic, consistent, isolated, durable (ACID)
transactions are supported by:
– COMMIT and
– ROLLBACK
EXEC SQL OPEN FLT;
WHILE TRUE DO
EXEC SQL FETCH FLT
INTO :FLT#, :AIRLINE, :PRICE;
DO YOUR THING WITH THE DATA;
END-WHILE;
EXEC SQL CLOSE FLT;
QUIT: IF SQLCODE < 0 THEN EXEC SQL ROLLBACK
ELSE EXEC SQL COMMIT;
Authorization
• Discretionary Access Control (DAC) is supported
by GRANT and REVOKE:
GRANT <privileges>
ON <table>
TO <users>
[WITH GRANT OPTION];
REVOKE [GRANT OPTION FOR] <privileges>
ON <table>
FROM <users> {RESTRICT | CASCADE};
<privileges>: SELECT, INSERT(X), INSERT,
UPDATE(X), UPDATE, DELETE
CASCADE: revoke cascades through its subtree
RESTRICT: revoke succeeds only if there is no subtree
Authorization
GRANT INSERT, DELETE
ON FLT-SCHEDULE
TO U1, U2
WITH GRANT OPTION;
GRANT UPDATE(PRICE)
ON FLT-SCHEDULE
TO U3;
REVOKE GRANT OPTION FOR DELETE
ON FLT-SCHEDULE
FROM U2 CASCADE;
REVOKE DELETE
ON FLT-SCHEDULE
FROM U2 CASCADE;
Catalog and Dictionary Facilities
• an INFORMATION_SCHEMA contains the following
tables (or rather views) for the CURRENT_USER:
– INFORMATION_SCHEMA_CATALOG_NAME: single-row, single-column table with the name of the catalog in which the
INFORMATION_SCHEMA resides
– SCHEMATA created by CURRENT_USER
– DOMAINS accessible to CURRENT_USER
– TABLES accessible to CURRENT_USER
– VIEWS accessible to CURRENT_USER
– COLUMNS of tables accessible to CURRENT_USER
– TABLE_PRIVILEGES granted by or to CURRENT_USER
– COLUMN_PRIVILEGES granted by or to CURRENT_USER
– USAGE_PRIVILEGES granted by or to CURRENT_USER
– DOMAIN_CONSTRAINTS
– TABLE_CONSTRAINTS
– REFERENTIAL_CONSTRAINTS
– CHECK_CONSTRAINTS
– and 18 others ...
Structure of a DBMS
• A typical DBMS has a layered
architecture.
• The figure does not show the
concurrency control and recovery
components.
• Each system has its own variations.
• The book shows a somewhat more
detailed version.
• You will see the “real deal” in
PostgreSQL.
– It’s a pretty full-featured
example
• Next class: we will start on this
stack, bottom up.
[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Note in the figure: these layers must consider concurrency control and recovery.]
TRANSACTION CONTROL
Transaction control
• Transaction Concept
• Concurrent Execution
• Conflict Serializability
• Locking
• SQL 92 Consistency Levels
• Oracle multi version concurrency control
• Transaction Log
• Crash Recovery
Transaction concept
• A sequence of SQL statements that form a
logical unit of work.
• Changes the database from one logically
consistent state to another. E.g. delete order:
– DB consistent
• delete from order_header where ono = 123;
– DB inconsistent (RI failure now)
• delete from order_line where ono = 123;
– DB consistent (no unmatched FKs)
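The delete-order example can be written as one unit of work with commit and rollback. This is a minimal sketch, assuming SQLite via Python's sqlite3; the function delete_order, its fail_midway flag, and the order_line layout are hypothetical names made up for this illustration. The flag simulates a failure between the two deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_header (ono INTEGER PRIMARY KEY);
CREATE TABLE order_line   (ono INTEGER, line INTEGER);
INSERT INTO order_header VALUES (123);
INSERT INTO order_line VALUES (123, 1), (123, 2);
""")


def delete_order(ono, fail_midway=False):
    """Delete an order as one transaction; undo everything on failure."""
    try:
        conn.execute("DELETE FROM order_header WHERE ono = ?", (ono,))
        if fail_midway:
            # The database is inconsistent at this point:
            # order lines exist without a header.
            raise RuntimeError("simulated failure between the two deletes")
        conn.execute("DELETE FROM order_line WHERE ono = ?", (ono,))
        conn.commit()      # both deletes become permanent together
        return True
    except RuntimeError:
        conn.rollback()    # the header delete is undone
        return False


def counts():
    """Return (number of headers, number of lines)."""
    headers = conn.execute("SELECT COUNT(*) FROM order_header").fetchone()[0]
    lines = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
    return headers, lines


failed_attempt = delete_order(123, fail_midway=True)
after_failure = counts()
good_attempt = delete_order(123)
after_success = counts()

print(failed_attempt, after_failure, good_attempt, after_success)
```

After the failed attempt the header is back, so the database never stays in the inconsistent middle state; after the successful attempt both tables are empty.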
Transaction properties
• Atomicity
– All SQL complete or none are completed.
• Consistency
– Programmer responsibility.
• Isolation
– Multiple transactions execute in isolation.
• Durability
– Successful commits survive system failures.
Transaction State Diagram
[Figure: state diagram with the states Active, Partially Committed, Committed, Failed and Rollback; transitions are marked as User Action or System Action]
Transaction states
• Active
– state whilst executing
• Partially Committed
– SQL commit statement execution started.
• Failed
– Normal execution cannot proceed
• Committed
– changes guaranteed to stay
• Rollback
– changes guaranteed undone
SQL Transaction statements
• The first SQL statement starts a
transaction that is terminated by
– commit;
• makes all changes permanent
– rollback;
• removes all changes
– A DDL command
– A system generated rollback
– Quitting the session
Commit at regular intervals
• This will keep your transactions short
– avoids wasting system resources
– prevents loss of work if system generated
rollback occurs
– allows use of rollback to undo user mistakes
Benefits from concurrent
execution of transactions
• Interleaving of CPU and I/O
– Allows different transactions to execute in
parallel.
– Increases system utilisation and throughput.
• Allows long and short running transactions to
make progress if not accessing the same data.
– Alternative is to serialise and then the short
transactions have long waits.
Database Reads and Writes
• Each transaction can be regarded as a
sequence of database reads and writes.
• Concurrent transactions only conflict when
accessing the same data.
• Local variables are in separate address
space so no conflict.
• Computation on local variables only
explains how the write value is formed.
Lost Update ( SQL version )
Time  U1                                 U2
t1    select val = current from grant;
      val = val + 500;
t2                                       select val = current from grant;
                                         val = val * 1.1;
t3    update grant set current = val;
t4                                       update grant set current = val;
Database Table grant(current): @t1 2000, @t3 2500, @t4 2200
Lost Update ( Read/Write)
Time  U1                  U2
t1    Read(current)
      val = val + 500;
t2                        Read(current)
                          val = val * 1.1;
t3    Write(current)
t4                        Write(current)
Database Table grant(current): @t1 2000, @t3 2500, @t4 2200
Lost Update Serial Schedule 1
Time  U1                  U2
t1    Read(current)
      val = val + 500;
t2    Write(current)
t3                        Read(current)
                          val = val * 1.1;
t4                        Write(current)
Database Table grant(current): @t1 2000, @t2 2500, @t4 2750
Lost Update Serial Schedule 2
U1
Time
Database Table grant(current)
U2
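The lost update and the two serial schedules can be replayed in a few lines. This is a sketch in plain Python; the step names ("read", "compute", "write") and the function run_schedule are invented for the illustration. U2's val * 1.1 is computed as val * 11 // 10 so that the arithmetic stays in whole numbers.

```python
# Each user's private computation on its local variable val.
COMPUTE = {
    "U1": lambda val: val + 500,
    "U2": lambda val: val * 11 // 10,   # val * 1.1 in integer arithmetic
}


def run_schedule(schedule, initial=2000):
    """Replay (user, step) pairs against one shared value, grant(current)."""
    current = initial          # the database value
    local = {}                 # each user's private val
    for user, step in schedule:
        if step == "read":
            local[user] = current
        elif step == "compute":
            local[user] = COMPUTE[user](local[user])
        elif step == "write":
            current = local[user]
    return current


U1 = [("U1", "read"), ("U1", "compute"), ("U1", "write")]
U2 = [("U2", "read"), ("U2", "compute"), ("U2", "write")]

# Interleaving from the Lost Update slide: both read before either writes.
interleaved = [U1[0], U1[1], U2[0], U2[1], U1[2], U2[2]]

lost_update_result = run_schedule(interleaved)
serial_1_result = run_schedule(U1 + U2)   # U1 completely before U2
serial_2_result = run_schedule(U2 + U1)   # U2 completely before U1

print(lost_update_result, serial_1_result, serial_2_result)
```

The interleaved run ends at 2200, so U1's +500 is lost. The two serial orders give 2750 and 2700; they differ from each other, but each reflects both updates.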
Definitions
• A schedule is a sequence of reads/writes of a
set of concurrent users that preserves the order
of the reads/writes of the individual users. E.g.
Lost Update (Read/Write) slide
• A serial schedule is a schedule where the
reads/writes of each user are executed
consecutively without any interleaving of
different users' reads/writes. E.g. Lost Update
Serial Schedule 1 slide
Database schedules
Case  U1        U2
1     read A    read A
2     read A    write A
3     write A   read A
4     write A   write A
Consider the effect of reversing the interleaved sequence U1 U2.
Only the two read A actions may be interchanged without effect; the rest are
conflicting actions,
i.e. the new sequence changes the value read or the value after the
write.
Read A and Write B are not conflicting, as they act on different values.
Conflict Serialisability
• Schedule S is conflict serialisable if it is
conflict equivalent to a serial schedule.
• Two schedules are conflict equivalent if
one can be transformed into the other by a
series of swaps of non conflicting actions.
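The definition above is stated in terms of swaps. A commonly used equivalent test, which is not on the slide, builds a precedence graph (an edge Ti to Tj for every pair of conflicting actions where Ti's action comes first) and checks that it has no cycle. The following is a sketch of that test in plain Python; the tuple layout (user, action, item) is an assumption made for this example.

```python
def conflicts(first, second):
    """Two actions conflict if they come from different users, touch the
    same data item, and at least one of them is a write."""
    (user1, action1, item1), (user2, action2, item2) = first, second
    return user1 != user2 and item1 == item2 and "write" in (action1, action2)


def is_conflict_serialisable(schedule):
    """Build the precedence graph and report whether it is acyclic."""
    edges = set()
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                edges.add((earlier[0], later[0]))

    # Repeatedly remove users with no incoming edge from a remaining user.
    # If some users remain but none can be removed, there is a cycle.
    remaining = {user for user, _, _ in schedule}
    while remaining:
        removable = {
            u for u in remaining
            if not any(dst == u and src in remaining for src, dst in edges)
        }
        if not removable:
            return False
        remaining -= removable
    return True


lost_update = [("U1", "read", "A"), ("U2", "read", "A"),
               ("U1", "write", "A"), ("U2", "write", "A")]
serial = [("U1", "read", "A"), ("U1", "write", "A"),
          ("U2", "read", "A"), ("U2", "write", "A")]
# An interleaving over two items in which U1 always goes first on each item.
two_items = [("U1", "read", "A"), ("U1", "write", "A"),
             ("U2", "read", "A"), ("U2", "write", "A"),
             ("U1", "read", "B"), ("U1", "write", "B"),
             ("U2", "read", "B"), ("U2", "write", "B")]

print(is_conflict_serialisable(lost_update),
      is_conflict_serialisable(serial),
      is_conflict_serialisable(two_items))
```

For the lost update schedule the graph contains both U1 to U2 and U2 to U1, so no sequence of swaps of non conflicting actions can turn it into a serial schedule.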
Is the Lost Update schedule
Conflict Serialisable ?
Conflict Schedule            Serial Schedule
U1        U2                 U1        U2
read A                       read A
          read A             write A
write A                                read A
          write A                      write A
Can the conflict schedule be transformed into the serial schedule?
A Conflict Serialisable schedule?
[Figure: an interleaved schedule of U1 and U2 over data items A and B, to be compared with a serial schedule; in each schedule both users perform read A, write A, read B, write B]
Conflict Serialisable
schedules via locking
• Each transaction must obtain a lock before accessing a
data value.
– Shared lock granted allows read access.
– eXclusive lock granted allows write access.
• Locks are requested from and granted by the Lock
Manager process.
• Ensures that any interleaving of concurrent transactions
produces a conflict serialisable schedule.
Lock Compatibility Matrix
The lock manager grants lock requests by reference to:
                             Lock already granted on data A
                             Shared      eXclusive
Lock requested   Shared      Yes         No
on data A        eXclusive   No          No
If table entry is Yes then lock types are compatible and requested
lock is granted.
If table entry is No then requesting transaction is placed into a wait
state until the lock holding transaction(s) have completed and
returned the lock.
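The matrix translates directly into a lookup. This is a small sketch in plain Python; the function name can_grant and the representation of held locks as a list of modes are assumptions made for the illustration, not part of any real lock manager.

```python
# (requested mode, already granted mode) -> compatible?
# "S" is a Shared lock, "X" is an eXclusive lock.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """A request is granted only if it is compatible with every lock that
    other transactions already hold on the same data item."""
    return all(COMPATIBLE[(requested, held)] for held in held_by_others)


print(can_grant("S", []))          # nothing held: grant
print(can_grant("S", ["S", "S"]))  # readers share
print(can_grant("X", ["S"]))       # writer must wait for the reader
print(can_grant("S", ["X"]))       # reader must wait for the writer
```

A request that cannot be granted would place the requesting transaction in a wait state, as described above; the waiting itself is not modelled here.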
Rigorous Two Phase Locking
• Growing Phase = Active state.
– Locks may be requested but not dropped.
• Shrinking Phase = Partially committed or failed
– Locks are dropped but not requested.
• Thus all locks held until end of transaction
signalled by rollback or commit
Lost Update Locking solution
T1               Lock Manager        T2
request S (A)
                 granted
read A
                                     request S (A)
                 granted
                                     read A
request X (A)
                 Waiting for lock
                                     request X (A)
                 Waiting for lock
Lost Update Locking solution
(continued from previous slide)
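T1 and T2 have both issued request X (A) and each is waiting for the other.
• Deadlock: one transaction is aborted by the system and drops its S (A) lock.
• The other transaction's X (A) request is granted; it performs write A, then commits and its locks are dropped.
• The aborted transaction is restarted by the programmer.
• Transactions have been serialised.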
Summary of Lock Requests
T1
T2
S(A)
S(A)
W(A)
W(A)
Wait for grant of lock held by T pointed to
Why use W locks on reads for
values about to be updated ?
T1
T2
W(A)
W(A)
waiting for W lock
continues
avoids deadlocking
Wait for grant of lock held by T pointed to
Effect of locking
• Lock manager prevents writers and
readers accessing the same data item
concurrently.
• Thus a schedule containing conflicting
actions cannot occur.
• Thus all allowed schedules are conflict
serialisable.
Lock starvation
• Writers may not make progress when
many Readers are accessing the same
data item.
• Prevent by not granting further Shared
lock requests when an eXclusive request
is waiting.
• Then process waiting requests in FIFO
order.
Readers Wait for Writers
T1
T2
T3
T4
read A
read A
write A start wait
read A start wait
commit
commit
end wait
commit
end wait
Time
Deadlock Detection
• Lock Manager builds wait-for graph for
all the transactions in process.
• Deadlock exists if the graph contains a
cycle. Detection algorithm run frequently.
• Lost Update Wait-for graph showing cycle.
T1
T2
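Cycle detection on a wait-for graph can be sketched with a depth-first search. This is plain Python written for illustration; the dictionary layout (transaction mapped to the set of transactions it waits for) and the name has_deadlock are assumptions, not the interface of any real lock manager.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the set of transactions whose
    locks it is waiting for.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(txn):
        state[txn] = IN_PROGRESS
        for holder in wait_for.get(txn, ()):
            holder_state = state.get(holder, UNSEEN)
            if holder_state == IN_PROGRESS:
                return True            # back edge: we came round in a circle
            if holder_state == UNSEEN and visit(holder):
                return True
        state[txn] = DONE
        return False

    return any(state.get(txn, UNSEEN) == UNSEEN and visit(txn)
               for txn in list(wait_for))


# Lost Update under locking: each transaction holds S(A) and waits for
# the other to release it before its X(A) request can be granted.
lost_update_graph = {"T1": {"T2"}, "T2": {"T1"}}

# T1 simply waits for T2, which is not waiting for anyone.
plain_wait_graph = {"T1": {"T2"}, "T2": set()}

print(has_deadlock(lost_update_graph), has_deadlock(plain_wait_graph))
```

When a cycle is found, the lock manager would pick one transaction on the cycle to abort; choosing the victim is not modelled here.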
Effect of locking on database
performance
• Readers and Writers concurrently
accessing the same data item under Two
Phase locking will be forced to serialise I.e
wait until exclusive access is possible.
• Users will experience delayed responses.
• To improve performance reduced levels of
locking (or consistency of final results) are
available. Use with caution.
SQL 92 Consistency Levels
- From highest to lowest level
• Serializable i.e Two Phase Locking
• Repeatable Read.
– phantom reads possible
• Read Committed
– read locks dropped after data has been read.
– Non repeatable & phantom reads possible.
• Read Uncommitted
(Lowest Level)
– reads data that still have write locks in place.
– dirty read possible
Transaction interactions
between T1 and T2
• dirty read
– T1 can read uncommitted data from T2
• non repeatable read
– T1 re-reads data committed by T2 and now
sees the new data value
• phantom read
– T1 re-executes a query and discovers new
data inserted or changed by committed T2
Non Repeatable Read
Tables: emp(id, dept_id); dept(dept_id, name) containing the row (d1, Sales)
T1: insert into emp select e1, dept_id from dept
where name = ‘Sales’
T2: update dept set dept_id = ‘d2’
where dept_id = ‘d1’; commit;
T1: insert into emp select e2, dept_id from dept
where name = ‘Sales’; commit;
Non Repeatable Read - locking
at Read Committed level
T1: acquire locks X(emp) S(dept)
T1: insert ……...
T1: drop lock S(dept) *
T2: acquire lock X(dept)
T2: update ……..; commit;
T2: drop lock X(dept)
T1: acquire lock S(dept)
T1: insert …...; commit;
T1: drop locks X(emp) S(dept)
Is this problem possible under
Rigorous Two Phase Locking?
T1
T2
Wait for grant of lock held by T pointed to
Inconsistent Analysis /
Phantom Read
Stock (pno, qty) at the start: (1, 2), (2, 4), (3, 1), (4, 5), (5, 1)
T2: select sum(qty) from stock;
T1: update stock set qty = 2
where pno = 2;
T1: update stock set qty = 3
where pno = 5;
T1: commit;
Actual 13
Reported 15
Inconsistent Analysis /
Phantom Read locking analysis
• T2: for each row, acquire lock S(stock),
read value, drop lock S(stock).
• T1: this allows lock X(stock) to be acquired on a
previously read row. T1 completes its updates and
drops the X(stock) lock.
• T2 acquires S(stock) again and so reads both
old and new values, causing the problem.
Is this problem possible under
Rigorous Two Phase Locking?
T1
T2
Wait for grant of lock held by T pointed to
Uncommitted / Dirty Read
account (acc_id, balance) contains the row (1, -2000)
T1: update account set balance = 10000
where acc_id = 1;
T2: select balance from account
where acc_id = 1;
T2: passes credit check
T1: rollback;
Uncommitted / Dirty Read
locking analysis
• Locking at Read Uncommitted level
effectively means the Lock Compatibility
Matrix is changed.
• The entries for Shared against eXclusive locks change
from No to Yes.
• Note Concurrent Writes are still not
allowed.
Is this problem possible under
Rigorous Two Phase Locking?
T1
T2
Wait for grant of lock held by T pointed to
Lock granularity
• Table locking provides least concurrency
• Disk Page locking
– Most common level. The number of rows
locked depends on row size/page size ratio.
• Row locking provides maximum
concurrency
– Commonly used for on line transactions
– Highest system cost as requires most locks
Transaction Locking and
Crash recovery
Transaction Log - Immediate
Update of row data on Disk
Transaction                                 Write Ahead Log on Disk
                                            T0 start
update stock set qty = 100 where p# = 1;    T0, 2.0, qty, 200, 100
delete from stock where p# = 2;             T0, 2.1, p#, 2,, qty, 100,
insert into stock values( 3, 600);          T0, 2.2, p#,, 3, qty,, 600
commit;                                     T0 commit
Disk Page 2 (holds stock values)
offset 0: p# 1, qty 100 (previously 200)
offset 1: p# 2, qty 100
offset 2: p# 3, qty 600
Transaction Log sequence
• Write Ahead Protocol means the before image (BI), after image (AI) and
other log entries are written before the
data row on disk is changed.
• If transaction ends with a commit no
further action is required.
• If ended with a rollback then system must
apply BI to row data to restore the original
values on disk.
Transaction Log entries
T0              T1
read a          read c
a = a - 50      c = c - 100
write a         write c
read b
b = b + 50
write b
Database Values: a = 1000, b = 2000, c = 700
Write out the log, assuming execution in the order T0 then T1, and show the Database
Values assuming Immediate Update of database values.
Transaction log entries
Log
Database Values
a
b
c
1000
2000
700
Transaction log entries
Log               Database Values
                  a       b       c
T0 start          1000    2000    700
T0 a 1000 950     950
T0 b 2000 2050            2050
T0 commit
T1 start
T1 c 700 600                      600
T1 commit
Immediate Update of row data in memory
update stock set qty = qty * 2
Memory
Row Data Buffer Page (p#, qty): 1: 10 changed to 20; 2: 40 changed to 80
Log Buffer Page: T1, 9.1, qty, 10, 20; T1, 9.2, qty, 40, 80
Disk
Stock Database Page (p#, qty): 1: 10; 2: 40
Log File Page: T1, 9.1, qty, 10, 20; T1, 9.2, qty, 40, 80
The log buffer is flushed on commit/page full, and before the row data buffer is flushed.
Log Based Recovery
• System crash loses memory buffer pages.
• Committed changes may not be on disk.
–
Row data Buffer pages not yet flushed out.
• Non committed changes may be on disk.
–
Row data Buffer pages already flushed out.
• WAL ensures that Log pages are already
on disk ( a different disk from the table
data!) before row data buffer pages
flushed to disk
Recovery Processing
• Read log forwards from the start putting
all transactions found onto one of the lists:
– Undo: transactions with no commit
terminator
– Redo: transactions with a commit terminator
• Go backwards from crash point ‘undoing’
– Using Before Images
• Go forwards from start ‘redoing’
– Using After Images
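The recovery procedure can be sketched on the log from the Transaction Log slides. This is plain Python written for illustration; the tuple layout of the log records and the name recover are assumptions made for the example, and checkpoints are left out.

```python
def recover(log, disk_values):
    """Undo uncommitted work backwards, then redo committed work forwards.

    Log records are either (txn, "start"), (txn, "commit") or
    (txn, item, before_image, after_image).
    """
    db = dict(disk_values)
    started = {rec[0] for rec in log if len(rec) == 2 and rec[1] == "start"}
    redo_list = {rec[0] for rec in log if len(rec) == 2 and rec[1] == "commit"}
    undo_list = started - redo_list
    updates = [rec for rec in log if len(rec) == 4]

    # Go backwards from the crash point, applying Before Images.
    for txn, item, before, _after in reversed(updates):
        if txn in undo_list:
            db[item] = before

    # Go forwards from the start, applying After Images.
    for txn, item, _before, after in updates:
        if txn in redo_list:
            db[item] = after

    return db, undo_list, redo_list


# Case i: crash after write b (T0 has not committed).
log_i = [("T0", "start"), ("T0", "a", 1000, 950), ("T0", "b", 2000, 2050)]
db_i, undo_i, redo_i = recover(log_i, {"a": 950, "b": 2050, "c": 700})

# Case ii: crash after write c (T0 committed, T1 has not).
log_ii = log_i + [("T0", "commit"), ("T1", "start"), ("T1", "c", 700, 600)]
db_ii, undo_ii, redo_ii = recover(log_ii, {"a": 950, "b": 2050, "c": 600})

print(db_i, db_ii)
```

In case i everything T0 did is undone, so the values return to 1000, 2000, 700. In case ii T0's changes are redone and T1's change to c is undone, giving 950, 2050, 700.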
Database recovery
Using the data from the Transaction Log slide, show the recovered
database values. Assume failure occurred
i) after write b
ii) after write c
Database recovery
after write b
Log
Database Values
T0 start
a
T0 a 1000 950
1000 950
b
c
2000
700
T0 b 2000 2050
2050
recovery starts by reading from start Undo list
Redo list
Database recovery
after write b
Log               Database Values
                  a       b       c
T0 start          1000    2000    700
T0 a 1000 950     950
T0 b 2000 2050            2050
Recovery starts by reading the log from the start:
Undo list: T0
Redo list: no entries, as no transaction has ended
Undo T0 b 2000 2050 by applying BI: b goes from 2050 back to 2000
Undo T0 a 1000 950: a goes from 950 back to 1000
Database values as at start
Database recovery
after write c
recovery starts by reading from start Undo list
Redo list
T0 start
T0 a 1000 950
T0 b 2000 2050
Database Values
T0 commit
a
b
c
T1 start
950
2050
600
T1 c 700 600
Database recovery
after write c
Log: T0 start; T0 a 1000 950; T0 b 2000 2050; T0 commit; T1 start; T1 c 700 600
Recovery starts by reading the log from the start:
Undo list: T1
Redo list: T0
Database Values
a: redo 950
b: redo 2050
c: undo 700
Why undo backwards & redo
forwards through log?
Log
BI AI
Database Values
T0 a
10 20
a
T0 a
20 10
10
20
10
if undo backwards using BI
?
20
10
if redo forwards using AI
?
20
10
Checkpoint
• Flushes all log buffer pages to disk.
• Flushes all modified data buffer pages to
disk.
• Writes a log record with list of open
transactions.
• Recovery takes place from the latest
checkpoint record in the log.
– Reduces recovery time
Disk Crash Recovery
• Assumes database dumps are taken
periodically.
– Flushes all buffers; copies database files;
writes out a dump record.
• Load latest database dump onto disk.
• Using the log started immediately after the
dump which has been loaded, roll forward,
‘redoing’ all committed transactions.
The End