FROM Movie WHERE

Download Report

Transcript FROM Movie WHERE

Subqueries Example
Find the name of the producer of ‘Star Wars’.
Movie(title, year, length, inColor, studioName, producerC)
MovieExec(name, address, cert, netWorth)
We can do:
SELECT name
FROM Movie, MovieExec
WHERE title ='Star Wars' AND producerC =cert;
Or we can have a subquery:
SELECT name
FROM MovieExec
WHERE cert =(SELECT producerC
FROM Movie
WHERE title = 'Star Wars');
If we can deduce that there
will be only a single value
produced by the subquery,
then we can use this
expression, surrounded by
parentheses, as if it were a
constant.
Conditions Involving Relations
There are a number of SQL operators that we can apply
to a relation R and produce a Boolean result.
Typically used in the WHERE clause.
1. EXISTS R is a condition that is true if R is not empty.
2. s IN R is true if s is equal to one of the tuples in R.
Likewise, s NOT IN R is true if and only if s is equal to no
tuple in R.
Example
Give all the producers of movies in which Julia Roberts stars .
SELECT name
FROM MovieExec
WHERE cert IN
(SELECT producerC
FROM Movie
WHERE (title, year) IN
(SELECT movieTitle, movieYear
FROM StarsIn
WHERE starName = 'Julia Roberts'));
Remark
The previous nested query can, like many nested
queries, be written as a single SELECT-FROM-WHERE
expression.
SELECT name
FROM MovieExec, Movie, StarsIn
WHERE cert = producerC AND
title = movieTitle AND
year = movie Year AND
starName = 'Julia Roberts';
(Continued)
3. s > ALL R is true if s is greater than every value in the unary
(one column) relation R.
Similarly, the > operator could be replaced by any other
comparison operators with the analogous meaning. For
instance, s <> ALL R is the same as s NOT IN R.
4. s > ANY R is true if s is greater than at least one value in
unary relation R.
Similarly we can use any other comparison operators in
place of >. For instance, s =ANY R is the same as s IN R.
EXISTS, ALL, and ANY operators can be negated by putting
NOT in front of the entire expression.
Bag Semantics and
Union, Intersection and Difference
• Although the SELECT-FROM-WHERE statement uses
bag semantics, the default for union, intersection, and
difference is set semantics.
Motivation?
• When doing projection in relational algebra, it is easier to
avoid eliminating duplicates. Just work tuple-at-a-time.
• When doing intersection or difference, it is most
efficient to sort the relations first. At that point you may
as well eliminate the duplicates anyway.
Controlling Duplicate Elimination
• Force the result to be a set by
– SELECT DISTINCT . . .
• Force the result to be a bag (i.e., don’t eliminate duplicates) by ALL, as in
– UNION ALL . . .
• Only UNION ALL supported in ORACLE.
Example
• Find all the different studios producing movies:
Movie(title, year, length, inColor, studioName, producerC),
SELECT DISTINCT studioname
FROM Movie;
• Notice that without DISTINCT, a studioname would be listed as many
times as there were movies from that studio.
Aggregations
•
SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT
clause to produce that aggregation on the column.
Example
• Find the average length of movies from Disney.
SELECT AVG(length)
FROM Movie
WHERE studioName = 'Disney';
Remark
• We can also use COUNT(*) which counts the number of tuples in the relation
constructed from the FROM and WHERE clauses of the query.
Eliminating Duplicates in an Aggregation
•
DISTINCT inside an aggregation causes duplicates to be eliminated before
the aggregation.
Example
• Find the number of different producers for Disney movies.
SELECT COUNT(DISTINCT producerc)
FROM Movie
WHERE studioname = 'Disney';
This is not the same as:
DISTINCT here is useless!
Why?
SELECT DISTINCT COUNT(producerc)
FROM Movie
WHERE studioname = 'Disney';
Not only in COUNT…
SELECT AVG(DISTINCT length)
FROM Movie
WHERE studioname = 'Disney';
• This will produce the average of only the distinct
values for length.
NULL’s Ignored in Aggregation
• NULL never contributes to a sum, average, or count, and
can never be the minimum or maximum of a column.
SELECT SUM(networth)
FROM moviestar NATURAL FULL OUTER JOIN movieexec;
• But if there are no non-NULL values in a column, then the
result of the aggregation is NULL.
Example: Effect of NULL’s
SELECT count(*)
FROM Movie
WHERE studioName = 'Disney';
SELECT count(length)
FROM Movie
WHERE studioName = 'Disney';
The number of movies
from Disney.
The number of movies
from Disney with a
known length.
Grouping
• We may follow a SELECT-FROM-WHERE expression by GROUP BY
and a list of attributes.
• The relation that results from the SELECT-FROM-WHERE
– is grouped according to the values of all the listed attributes
in GROUP BY, and
– any aggregation is applied only within each group.
Example
• From the Movie relation, find the average length for each studio.
SELECT studioName, AVG(length)
FROM Movie
GROUP BY studioName;
Another Example
• From Movie and MovieExec, find the producer’s total
length of film produced:
SELECT name, SUM(length)
FROM Movie, MovieExec
WHERE producerc = cert
GROUP BY name;
Compute
those
tuples first,
then group
by name.
Restriction on SELECT Lists With
Aggregation
•
If any aggregation is used, then each element of the
SELECT list must be either:
1.
2.
Aggregated, or
An attribute on the GROUP BY list.
Illegal Query Example
• We might think we could find the shortest movie of Disney as:
SELECT title, MIN(length)
FROM Movie
WHERE studioName = 'Disney';
• But this query is illegal in SQL.
• Because title is neither aggregated nor on the GROUP BY list.
• We should do instead:
SELECT title, length
FROM Movie
WHERE studioName = 'Disney' AND length =
(SELECT MIN(length)
FROM Movie
WHERE studioName = 'Disney');
Or…
SELECT title, length
FROM Movie NATURAL JOIN
(SELECT MIN(length) AS length
FROM Movie
WHERE studioName = 'Disney')
WHERE studioName = 'Disney';
…resembling Relational Algebra.
HAVING Clauses
•
HAVING <condition> may follow a GROUP BY clause.
– If so, the condition applies to each group, and groups not satisfying the
condition are eliminated.
Example
• Consider again the query
SELECT name, SUM(length)
FROM Movie, MovieExec
WHERE producerc = cert
GROUP BY name;
• Suppose we didn’t wish to include all the producers in our table of aggregated
lengths. We want those producers
– with networth less than 1,000,000, and
– that have at least one movie before 1973.
• Solution
SELECT name, SUM(length)
FROM MovieExec, Movie
WHERE producerc = cert AND networth<1000000
GROUP BY name
HAVING MIN(year) < 1973;
Requirements on HAVING Conditions
•
•
These conditions may refer to any relation in the FROM
clause.
They may refer to attributes of those relations, as long
as the attribute makes sense within a group; i.e., it is
either:
1. A grouping attribute, or
2. Aggregated attribute.
“Having” is a special kind of
•

The previous query can also be written as:
SELECT name, sumLength
FROM (
SELECT name, MIN(year) AS minYear, SUM(length) AS sumLength
FROM MovieExec, Movie
WHERE producerc = cert AND networth < 1000000
GROUP BY name)
WHERE minYear < 1973;
Correlated Subqueries
•
Suppose StarsIn relation has an additional attribute “salary”
StarsIn(movieTitle, movieYear, starName, salary)
•
Now, find the stars who were paid for some movie more than the average
salary for that movie.
SELECT starName, movieTitle, movieYear
FROM StarsIn X
WHERE salary >
(SELECT AVG(salary)
FROM StarsIn
WHERE movieTitle = X.movieTitle
AND movieYear=X.movieYear);
Remarks
1. Outer query cannot reference any columns in the subquery.
2. Subquery references the tuple in the outer query.
3. Value of the tuple changes by row of the outer query, so the database must
rerun the subquery for each row comparison.
Another Solution (Nesting in FROM)
SELECT X.starName, X.movieTitle, X.movieYear
FROM StarsIn X, (SELECT movieTitle, movieYear, AVG(salary) AS avgSalary
FROM StarsIn
GROUP BY movieTitle, movieYear) Y
WHERE X.salary>Y.avgSalary AND
X.movieTitle=Y.movieTitle AND X.movieYear=Y.movieYear;
Exercise
Product(maker, model, type)
PC(model, speed, ram, hd, rd, price)
Laptop(model, speed, ram, hd, screen, price)
Printer(model, color, type, price)
1. Find those manufacturers that sell Laptops, but not PC's.
2. Find those hard-disk sizes that occur in two or more PC's.
3. Find those manufacturers of at least two different computers (PC or
Laptops) with speed of at least 700.
4. Find the manufacturers who sell exactly three different models of PC.