SQL Introduction

Download Report

Transcript SQL Introduction

Lecture #4
October 19, 2000
SQL
Administration
• Exam date officially moved to December
7th, 6:30pm, here.
• Homework #3 – will be on the web site
tomorrow. Due in two weeks. SQL.
• Accounts are being set up.
• Projects, projects, projects.
SQL Introduction
Standard language for querying and manipulating data
Structured Query Language
Many standards out there: SQL92, SQL2, SQL3, SQL99
Vendors support various subsets of these, but all of what we’ll
be talking about.
Basic form: (many many more bells and whistles in addition)
Select attributes
From relations (possibly multiple, joined)
Where conditions (selections)
Selections
SELECT *
FROM Company
WHERE country=“USA” AND stockPrice > 50
You can use:
attribute names of the relation(s) used in the FROM.
comparison operators: =, <>, <, >, <=, >=
apply arithmetic operations: stockprice*2
operations on strings (e.g., “||” for concatenation).
Lexicographic order on strings.
Pattern matching: s LIKE p
Special stuff for comparing dates and times.
Projections
Select only a subset of the attributes
SELECT name, stock price
FROM Company
WHERE country=“USA” AND stockPrice > 50
Rename the attributes in the resulting table
SELECT name AS company, stockprice AS price
FROM Company
WHERE country=“USA” AND stockPrice > 50
Ordering the Results
SELECT name, stock price
FROM Company
WHERE country=“USA” AND stockPrice > 50
ORDERBY country, name
Ordering is ascending, unless you specify the DESC keyword.
Ties are broken by the second attribute on the ORDERBY list, etc.
Joins
SELECT name, store
FROM
Person, Purchase
WHERE per-name=buyer AND city=“Seattle”
AND product=“gizmo”
Product ( pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person( per-name, phone number, city)
Disambiguating Attributes
Find names of people buying telephony products:
SELECT Person.name
FROM
Person, Purchase, Product
WHERE
Person.name=buyer
AND product=Product.name
AND Product.category=“telephony”
Product ( name, price, category, maker)
Purchase (buyer, seller, store, product)
Person( name, phone number, city)
Tuple Variables
Find pairs of companies making products in the same category
SELECT product1.maker, product2.maker
FROM
Product AS product1, Product AS product2
WHERE
product1.category=product2.category
AND product1.maker <> product2.maker
Product ( name, price, category, maker)
First Unintuitive SQLism
SELECT R.A
FROM R,S,T
WHERE R.A=S.A OR R.A=T.A
Looking for R  (S  T)
But what happens if T is empty?
Union, Intersection, Difference
(SELECT name
FROM
Person
WHERE City=“Seattle”)
UNION
(SELECT name
FROM
Person, Purchase
WHERE buyer=name AND store=“The Bon”)
Similarly, you can use INTERSECT and EXCEPT.
You must have the same attribute names (otherwise: rename).
Exercises
Product ( pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person( per-name, phone number, city)
Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products
Ex #3: Find names of people who bought American products and did
not buy French products
Ex #4: Find names of people who bought American products and they
live in Seattle.
Ex #5: Find people who bought stuff from Joe or bought products
from a company whose stock prices is more than $50.
Subqueries
SELECT Purchase.product
FROM Purchase
WHERE buyer =
(SELECT name
FROM Person
WHERE social-security-number = “123 - 45 - 6789”);
In this case, the subquery returns one value.
If it returns more, it’s a run-time error.
Subqueries Returning Relations
Find companies who manufacture products bought by Joe Blow.
SELECT Company.name
FROM
Company, Product
WHERE Company.name=maker
AND Product.name IN
(SELECT product
FROM Purchase
WHERE buyer = “Joe Blow”);
You can also use: s > ALL R
s > ANY R
EXISTS R
Conditions on Tuples
SELECT Company.name
FROM
Company, Product
WHERE Company.name=maker
AND (Product.name,price) IN
(SELECT product, price)
FROM Purchase
WHERE buyer = “Joe Blow”);
Correlated Queries
Find movies whose title appears more than once.
SELECT title
FROM Movie AS Old
WHERE year < ANY
(SELECT year
FROM Movie
WHERE title = Old.title);
Movie (title, year, director, length)
Movie titles are not unique (titles may reappear in a later year).
Note scope of variables
Removing Duplicates
SELECT DISTINCT Company.name
FROM
Company, Product
WHERE Company.name=maker
AND (Product.name,price) IN
(SELECT product, price)
FROM Purchase
WHERE buyer = “Joe Blow”);
Conserving Duplicates
The UNION, INTERSECTION and EXCEPT operators
operate as sets, not bags.
(SELECT name
FROM
Person
WHERE City=“Seattle”)
UNION ALL
(SELECT name
FROM
Person, Purchase
WHERE buyer=name AND store=“The Bon”)
Grouping and Aggregation
Product
Date
Price
Quantity
Bagel
10/21
0.85
15
Banana
10/22
0.52
7
Banana
10/19
0.52
17
Bagel
10/20
0.85
20
Example 1: find total sales for the entire database
Simple Aggregation
SELECT Sum(price * quantity)
FROM
Purchase
SELECT Sum(price * quantity)
FROM
Purchase
WHERE product = ‘bagel’
SQL supports several aggregation operations:
SUM, MIN, MAX, AVG, COUNT
Except COUNT, all aggregations apply to a single attribute
Grouping and Aggregation
Product
Date
Price
Quantity
Bagel
10/21
0.85
15
Banana
10/22
0.52
7
Banana
10/19
0.52
17
Bagel
10/20
0.85
20
Example 2: find total sales per product.
Solution: Two Steps
First: group the entries by product.
Product
Date
Price
Quantity
Banana
10/19
0.52
17
Banana
10/22
0.52
7
Bagel
10/20
0.85
20
Bagel
10/21
0.85
15
Example 2: find total sales per product.
Then, aggregate
Product
TotalSales
Bagel
$29.75
Banana
$12.48
SELECT
product, Sum(price * quantity) AS TotalSales
FROM
Purchase
GROUP BY product
Another Example
Product
SumSales
MaxQuantity
Banana
$12.48
17
Bagel
$29.75
20
For every product, what is the total sales and max quantity sold?
SELECT
product, Sum(price * quantity) AS SumSales
Max(quantity) AS MaxQuantity
FROM
Purchase
GROUP BY product
Grouping and Aggregation:
Summary
SELECT
product, Sum(price)
FROM
Product, Purchase
WHERE
Product.name = Purchase.product
GROUP BY Product.name
1. Compute the relation (I.e., the FROM and WHERE).
2. Group by the attributes in the GROUP BY
3. Select one tuple for every group (and apply aggregation)
SELECT can have (1) grouped attributes or (2) aggregates.
HAVING Clause
Same query, except that we consider only products that had
at least 100 buyers.
SELECT
product, Sum(price * quantity)
FROM
Purchase
GROUP BY product
HAVING
Sum(quantity) > 30
HAVING clause contains conditions on aggregates.
Exercises
Product ( pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person( per-name, phone number, city)
Ex #6: What is the total sales of products by country of manufacture?
Ex #7: Find the total revenue from products of which 100,000
units or more have been sold.
Modifying the Database
We have 3 kinds of modifications: insertion, deletion, update.
Insertion: general form -INSERT INTO R(A1,…., An) VALUES (v1,…., vn)
Insert a new purchase to the database:
INSERT INTO Purchase(buyer, seller, product, store)
VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’,
‘The Sharper Image’)
If we don’t provide all the attributes of R, they will be filled with NULL.
We can drop the attribute names if we’re providing all of them in order.
More Interesting Insertions
INSERT INTO PRODUCT(name)
SELECT DISTINCT product
FROM Purchase
WHERE product NOT IN
(SELECT name
FROM Product)
The query replaces the VALUES keyword.
Note the order of querying and inserting.
Deletions
DELETE FROM
WHERE
PURCHASE
seller = ‘Joe’ AND
product = ‘Brooklyn Bridge’
Factoid about SQL: there is no way to delete only a single
occurrence of a tuple that appears twice
in a relation.
Updates
UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN
(SELECT product
FROM Purchase
WHERE Date =‘Oct, 25, 2000’);
Data Definition in SQL
So far, SQL operations on the data.
Data definition: defining the schema.
• Create tables
• Delete tables
• Modify table schema
But first:
Define data types.
Finally: define indexes.
Data Types in SQL
• Character strings (fixed of varying length)
• Bit strings (fixed or varying length)
• Integer (SHORTINT)
• Floating point
• Dates and times
Domains will be used in table declarations.
To reuse domains:
CREATE DOMAIN address AS VARCHAR(55)
You wish you had richer types, no? (hang on…)
Creating Tables
CREATE
TABLE Person(
name
social-security-number
age
city
gender
Birthdate
);
VARCHAR(30),
INTEGER,
SHORTINT,
VARCHAR(30),
BIT(1),
DATE
Deleting or Modifying a Table
Deleting: DROP Person;
Altering: (adding or removing an attribute).
ALTER TABLE Person
ADD phone CHAR(16);
ALTER TABLE Person
DROP age;
What happens when you make changes to the schema?
Default Values
The default of defaults: NULL
Specifying default values:
CREATE
TABLE Person(
name
VARCHAR(30),
social-security-number INTEGER,
age
SHORTINT DEFAULT 100,
city VARCHAR(30) DEFAULT ‘Seattle’,
gender
CHAR(1) DEFAULT ‘?’,
Birthdate
DATE
Indexes
REALLY important to speed up query processing time.
Suppose we have a relation
Person (name, SSN, age, city)
An index on SSN enables us to fetch a tuple
for a given SSN very efficiently (not have to scan the whole relation).
The problem of deciding which indexes to put on the relations is
very hard! (it’s called: physical database design).
Creating Indexes
CREATE INDEX ssnIndex ON Person(SSN)
Indexes can be created on more than one attribute:
CREATE INDEX doubleindex ON
Person (age, city)
Why not create indexes on all the attributes?
Defining Views (!!)
Views are relations, except that they are not physically stored.
They are used to:
• simplify complex queries, and
• define distinct conceptual interfaces for different classes
of users.
Example view: purchases of telephony products.
CREATE VIEW telephony-purchases AS
SELECT product, buyer, seller, store
FROM Purchase, Product
WHERE Purchase.product = Product.pname
AND Product.category = ‘telephony’
The view is materialized when its results are stored in the DBMS.
A Different View
CREATE VIEW Seattle-view AS
SELECT buyer, seller, product, store
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.per-name = Purchase.buyer
We can later use the views:
SELECT name, store
FROM
Seattle-view, Product
WHERE Seattle-view.product = Product.name AND
Product.category = ‘shoes’
What’s really happening when we query a view?? It’s unfolded.
Updating Views
How can I insert a tuple into a table that doesn’t exist?
CREATE VIEW bon-purchase AS
SELECT store, seller, product (note: buyer is not selected)
FROM
Purchase
WHERE store = ‘The Bon Marche’
If we make the following insertion:
INSERT INTO bon-purchase
VALUES (‘the Bon Marche’, ‘Joe’, ‘Denby Mug’)
We can simply add a tuple
(‘the Bon Marche’, ‘Joe’, NULL, ‘Denby Mug’)
to relation Purchase.
Non-Updatable Views
CREATE VIEW Seattle-view AS
SELECT seller, product, store
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.name = Purchase.buyer
How can we add the following tuple to the view?
(‘Joe’, ‘Shoe Model 12345’, ‘Nine West’)
In principle, two tuples should be added to the database:
Person: (foo, NullPhoneNumber, ‘Seattle’)
Purchase: (foo, ‘Joe’, ‘Nine West’, ‘Shoe Model 12345’)
But it’s very hard to manage the foo’s later, so this update is not legal.
Reusing a Materialized View
• Suppose I have only the result of SeattleView:
SELECT buyer, seller, product, store
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.per-name = Purchase.buyer
• and I want to answer the query
SELECT buyer, seller
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.per-name = Purchase.buyer AND
Purchase.product=‘gizmo’.
Then, I can rewrite the query using the view.
Query Rewriting Using Views
Rewritten query:
SELECT buyer, seller
FROM
SeattleView
WHERE product= ‘gizmo’
Original query:
SELECT buyer, seller
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.per-name = Purchase.buyer AND
Purchase.product=‘gizmo’.
Another Example
• I still have only the result of SeattleView:
SELECT buyer, seller, product, store
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.per-name = Purchase.buyer
• but I want to answer the query
SELECT buyer, seller
FROM Person, Purchase
WHERE Person.city = ‘Seattle’ AND
Person.per-name = Purchase.buyer AND
Person.Phone LIKE ‘206 543 %’.
The General Problem
• Given a set of views V1,…,Vn, and a query
Q, can we answer Q using only the answers to
V1,…,Vn?
• Why do we care?
– We can answer queries more efficiently.
– We can query data sources on the WWW in a
principled manner.
• Many, many papers on this problem.
• The best performing algorithm: The MiniCon
Algorithm, (Pottinger & Levy, 2000).
• Great survey on the topic: (Halevy, 2000).
Querying the WWW
• Assume a virtual schema of the WWW, e.g.,
– Course(number, university, title, prof, quarter)
• Every data source on the web contains the
answer to a view over the virtual schema:
UW database: SELECT number, title, prof
FROM Course
WHERE univ=‘UW’ AND quarter=‘4/99’
Stanford database: SELECT number, title, prof, quarter
FROM Course
WHERE univ=‘Stanford’
User query: find all professors who teach “database systems”
SQL: Constraints and Triggers
• Chapter 6 Ullman and Widom
• Certain properties we’d like our database to
hold
• Modification of the database may break
these properties
• Build handlers into the database definition
• Key constraints
• Referential integrity constraints.
Keys: Fundamental Constraint
• In the CREATE TABLE statement, use:
– PRIMARY KEY, UNIQUE
CREATE TABLE MovieStar (
name CHAR(30) PRIMARY KEY,
address VARCHAR(255),
gender CHAR(1));
• Or, list at end of CREATE TABLE
PRIMARY KEY (name)
Keys...
• Can use the UNIQUE keyword in same way
– …but for any number of attributes
– foreign key only reference PRIMARY KEY
• Indexing Keys
CREATE UNIQUE INDEX YearIndex ON Movie(year)
• Makes insertions easier to check for key
constraints
• Subtle differences between PRIMARY KEY and UNIQUE
Foreign Key Constraints
ActedIn(ActorName, MovieName)
Movies(MovieName, year)
If MovieName in ActedIn is a foreign key for
Movies, then:
1) Foreign Key must be a reference to a valid
value in the referenced table.
2) … must be a PRIMARY KEY in the
referenced table.
Yes, this is a referential integrity constraint.
Declaring Foreign Key
Constraints
• REFERENCES keyword...
CREATE TABLE ActedIn (
Name CHAR(30) PRIMARY KEY,
MovieName CHAR(30)
REFERENCES Movies(MovieName));
• Or, summarize at end of CREATE TABLE
FOREIGN KEY MovieName REFERENCES
Movies(MovieName)
• MovieName must be a PRIMARY KEY
How do we Maintain them?
• Given a change to DB, there are several
possible violations:
– Insert new tuple with bogus foreign key value
– Update a tuple to a bogus foreign key value
– Delete a tuple in the referenced table with the
referenced foreign key value
– Update a tuple in the referenced table that
changes the referenced foreign key value
How to Maintain?
• Recall, ActedIn has FK MovieName...
Movies(MovieName, year)
(Fatal Attraction, 1987)
ActedIn(ActorName, MovieName)
(Michael Douglas, Fatal Attraction)
insert: (Rick Moranis, Strange Brew)
How to Maintain?
• Policies for handling the change…
– Reject the update (default)
– Cascade (example: cascading deletes)
– Set NULL
• Can set update and delete actions
independently in CREATE TABLE
MovieName CHAR(30)
REFERENCES Movies(MovieName))
ON DELETE SET NULL
ON UPDATE CASCADE
Constraining Attribute Values
• Constrain invalid values
– NOT NULL
– gender CHAR(1)
CHECK (gender IN (‘F’, ‘M’))
– MovieName CHAR(30)
CHECK (MovieName IN
(SELECT MovieName FROM Movies))
• The constraint is checked whenever the value of
the attribute is changed or added.
• Isn’t this the same as REFERENCE?
Constraining Values with User
Defined ‘Types’
• Can define new domains to use as the attribute
type...
CREATE DOMAIN GenderDomain CHAR(1)
CHECK (VALUE IN (‘F’, ‘M’));
• Then update our attribute definition...
gender GenderDomain
• Note use of VALUE to refer to the attribute value.
• Deferred constraints: deal with circular
referencial integrity constraints.
More Complex Constraints...
• …Among several attributes in one table
– Specify at the end of CREATE TABLE
CHECK (gender = ‘F’ OR name NOT LIKE
‘Ms.%’)
• Checked whenever a tuple of the relation is added
or updated.
• Note, changes in other places in the database may
cause the constraint to be violated. If this is
important, use assertions.
Declaring Assertions
• CREATE ASSERTION <name> CHECK
(<condition>)
CREATE ASSERTION RichPres CHECK
(NOT EXISTS
(SELECT *
FROM Studio, MovieExec
WHERE presC# = cert#
AND netWorth < 10000000))
Checked whenever any change occurs to the database.
Another Example
CREATE ASSERTION SumLength
CHECK
(10000 < ALL
(SELECT SUM(length)
FROM Movie
GROUP BY studioName)).
Can we also write this as a tuple constraint?
Different Constraint Types
Type
Where Declared
When activated Guaranteed
to hold?
Attribute
CHECK
with attribute
on insertion
or update
not if
subquery
Tuple
CHECK
relation schema
insertion or
update to
relation
not if
subquery
Assertion
database schema
on change to Yes
any relation
mentioned
Triggers
Enable the database programmer to specify:
• when to check a constraint,
• what exactly to do.
A trigger has 3 parts:
• An event (e.g., update to an attribute)
• A condition (e.g., a query to check)
• An action (deletion, update, insertion)
When the event happens, the system will check the constraint, and
if satisfied, will perform the action.
NOTE: triggers may cause cascading effects.
Database vendors did not wait for standards with triggers!
Elements of Triggers (in SQL3)
• Timing of action execution: before, after or instead of triggering
event
• The action can refer to both the old and new state of the database.
• Update events may specify a particular column or set of columns.
• A condition is specified with a WHEN clause.
• The action can be performed either for
• once for every tuple, or
• once for all the tuples that are changed by the database operation.
Example: Row Level Trigger
CREATE TRIGGER
NoLowerPrices
AFTER UPDATE OF price ON Product
REFERENCING
OLD AS OldTuple
NEW AS NewTuple
WHEN (OldTuple.price > NewTuple.price)
UPDATE Product
SET price = OldTuple.price
WHERE name = NewTuple.name
FOR EACH ROW
Statement Level Trigger
CREATE TRIGGER average-price-preserve
INSTEAD OF UPDATE OF price ON Product
REFERENCING
OLD_TABLE AS OldStuff
NEW_TABLE AS NewStuff
WHEN (1000 <
(SELECT AVG (price)
FROM ((Product EXCEPT OldStuff) UNION NewStuff))
DELETE FROM Product
WHERE (name, price, company) IN OldStuff;
INSERT INTO Product
(SELECT * FROM NewStuff)
Bad Things Can Happen
CREATE TRIGGER Bad-trigger
AFTER UPDATE OF price IN Product
REFERENCING OLD AS OldTuple
NEW AS NewTuple
WHEN (NewTuple.price > 50)
UPDATE Product
SET price = NewTuple.price * 2
WHERE name = NewTuple.name
FOR EACH ROW
Embedded SQL
• Direct SQL is rarely used: usually, SQL is
embedded in some application code.
• We need some method to reference SQL
statements.
• But: there is an impedance mismatch problem.
• So: we use cursors.
• Many things can be explained with the impedance
mismatch.
Programs with SQL
Host language + Embedded SQL
Preprocessor
Preprocessor
Host Language + function calls
Host
Host language
language compiler
compiler
Host language program
The Impedance Mismatch
Problem
The host language manipulates variables, values, pointers
SQL manipulates relations.
There is no construct in the host language for
manipulating relations.
Why not use only one language?
• Forgetting SQL: “we can quickly dispense with this idea”
[Ullman & Widom, pg. 363].
• SQL cannot do everything that the host language can do.
Interface: SQL / Host Language
Values get passed through shared variables.
Colons precede shared variables when they occur within the
SQL statements.
EXEC SQL: precedes every SQL statement in the host language.
The variable SQLSTATE provides error messages and status
reports (e.g., 00000 says that the operation completed with no
problem).
EXEC SQL BEGIN DECLARE SECTION;
char productName[30];
EXEC SQL END DECLARE SECTION;
Using Shared Variables
Void simpleInsert() {
EXEC SQL BEGIN DECLARE SECTION;
char productName[20], company[30];
char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;
/* get values for productName and company
somehow */
EXEC SQL INSERT INTO Product(name, company)
VALUES (:productName, :company);
}
Single-Row Select Statements
Void getPrice() {
EXEC SQL BEGIN DECLARE SECTION;
char productName[20], company[30];
integer price;
char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;
/* read value of product name */
EXEC SQL SELECT price
INTO :price
FROM Product
WHERE Product.name = :productName;
/* print out value of price */
}
Cursors
EXEC SQL DECLARE cursorName CURSOR FOR
SELECT …. FROM …. WHERE …. ;
EXEC SQL OPEN cursorName;
while (true) {
EXEC SQL FETCH FROM cursorName INTO :variables;
if (NO_MORE_TUPLES) break;
/* do something with values */
}
EXEC SQL CLOSE cursorName;
More on Cursors
• cursors can modify a relation as well as read it.
• We can determine the order in which the cursor will get
tuples by the ORDER BY keyword in the SQL query.
• Cursors can be protected against changes to the
underlying relations.
• The cursor can be a scrolling one: can go forward, backward
+n, -n, Abs(n), Abs(-n).