SQL Queries - subqueries and joining

Download Report

Transcript SQL Queries - subqueries and joining

SQL queries
subqueries and joining
SQL query – beyond one table
• So far, we have only applied queries to a
single table
• It is possible – and very useful – to make
queries applied to several tables
• Can answer more complex questions…
• …but queries also tend to become more
complex themselves 
RHS – SOC
2
SQL query – beyond one table
• We use a movie information database
as example
Movie
movieid
title
country
prodyear
genre
oscars
Actor
Casting
movieid
actorid
RHS – SOC
actorid
name
country
birth
living
oscars
3
SQL query – beyond one table
actorid
name
country
birth
1
John Wayne
USA
04-06-1911 No
2
2
Chow-Yun Fat
HK
12-11-1959 Yes
0
3
Karl Stegger
Denmark
30-01-1922 No
0
4
Jean Reno
France
10-10-1949 Yes
0
5
Julia Roberts
USA
28-09-1964 Yes
1
6
Natalie Portman
Israel
02-07-1980 Yes
0
7
Marc Duret
France
01-08-1966 Yes
1
RHS – SOC
living
oscars
4
SQL query – beyond one table
movieid
actorid
movieid
actorid
1
1
4
4
1
4
4
6
1
5
5
2
1
7
5
4
2
4
5
6
2
6
6
5
2
7
6
6
3
1
7
2
3
2
7
4
3
3
7
5
RHS – SOC
5
SQL query – subqueries
• Suppose we wish to answer a (complex)
question like this:
• How many movies are from the country
that Marc Duret is from
• This question cannot be answered by a
single query (as we know it…)
• Information is spread out in multiple tables
RHS – SOC
6
SQL query – subqueries
SELECT country
FROM Actor
WHERE name = ’Marc Duret’
Result is
’France’
SELECT COUNT(title) AS MovieCount
FROM Movie
WHERE Movie.country = ’France’
RHS – SOC
’France’
used as
input
7
SQL query – subqueries
• We can run these two
queries ”manually”, or
use output from first
query as input to second
query – not optimal…
• Better to make first
query a subquery to the
second query
RHS – SOC
8
SQL query – subqueries
Outer
query
SELECT COUNT(title) AS MovieCount
FROM Movie
WHERE country = (SELECT country
FROM Actor
WHERE name = ’Marc Duret’)
Inner
query
RHS – SOC
9
SQL query – subqueries
• Result from inner query (the subquery) is
used as input to outer query
• The inner query produces a result table –
just as any other query – which is then
used by the outer query
• Very convenient – but some complications
– Name clash
– Multiple results
RHS – SOC
10
SQL query – subqueries
• When we query more than one table, field
names from different tables might be
identical – a name clash
• We can qualify a field name by prefixing it
with the table name
– Movie.country (country field in Movie table)
– Actor.country (country field in Actor table)
RHS – SOC
11
SQL query – subqueries
• What if a subquery produces more than
one result?
SELECT country
FROM Actor
WHERE (oscars > 0)
RHS – SOC
Result:
USA
France
12
SQL query – subqueries
• In such case, we must use the IN keyword
SELECT COUNT(title) AS MovieCount
FROM Movie
WHERE country IN
(SELECT country
FROM Actor
WHERE (oscars > 0))
Try formulating
this query in
plain English…
RHS – SOC
13
Exercise 6 – SQL queries
•
•
•
•
Use the MovieInformation database, defined in exercise 1
Add records to the Actor table, as defined in the presentation
Add records to the Casting table, as defined in the presentation
With the data in place, run the below queries on the database
– SELECT title, prodYear, oscars FROM Movie WHERE oscars > (SELECT
SUM(oscars)/5 FROM Movie)
– SELECT * FROM Movie WHERE prodYear > (SELECT MAX(prodYear) FROM
Movie WHERE (oscars > 0))
• Now formulate queries yourself, in order to retrieve the below data:
– Get Name and Country for actors from the same country that ’E.T’ was made in
– Get Title and Oscars for movies that have won at least as many Oscars as John
Wayne has won
– Get Name and Birth for actors having won more than twice the average of
Oscars won for all actors
RHS – SOC
14
SQL query – joining
• Another approach to multi-table queries is
to use join
• To ”join” tables is to make a kind of
”multiplication” of the tables
• Simple case: Join without any conditions:
SELECT *
FROM Movie, Actor
RHS – SOC
15
SQL query – joining
• If the previous query is run against the
MovieInformation database, we get a
result with 49 rows…
• 49 = 7 x 7. All combinations of Movie
records (7) and Actor records (7)
• All fields from both tables are included
• This is rarely what we want…
RHS – SOC
16
SQL query – joining
• In almost all cases, we wish to pick out
those rows where certain fields match
• Example: For each movie, find the title of
the movie, and the names of the actors
appearing in the movie
• We already have this information in the
Casting table (20 records), but only as
identifiers…
RHS – SOC
17
SQL query – joining
• We clearly need information from all three
tables in the MovieInformation database,
so first attempt at a query could be:
SELECT *
FROM Movie, Casting, Actor
• Result: 980 records with a lot of fields…
RHS – SOC
18
SQL query – joining
• First of all, we only need the Title field
from the Movie table, and the Name field
from the Actor table
SELECT Movie.Title, Actor.Name
FROM Movie, Casting, Actor
• Result: 980 records with correct fields
RHS – SOC
19
SQL query – joining
• Next, we only need those records where
the relevant information match
• We are trying to ”substitute” identifiers in
the Casting table with real data (titles and
names)
• In order to get the right titles and names,
the identifiers for this data must match!
RHS – SOC
20
SQL query – joining
• Match identifier for titles and names:
SELECT Movie.Title, Actor.Name
FROM Movie, Casting, Actor
WHERE ((Movie.MovieId = Casting.MovieId)
AND (Actor.ActorId = Casting.ActorId))
• Result: 20 records with correct fields!
RHS – SOC
21
SQL query – joining
• This is a quite common ”pattern” for multitable queries
– Tables representing relations only contain
identifiers (keys in other tables)
– ”Real data” is contained in tables representing
entities (identifiers are keys)
– Obtaining ”real data” for relations requires use
of join, with matching of key fields
RHS – SOC
22
Exercise 7 – SQL queries
•
•
Use the MovieInformation database, defined in exercise 1, with data in all three
tables
With the data in place, run the below queries on the database
1) SELECT Movie.TITLE, COUNT(Actor.ActorId) AS Roles FROM Movie, Actor, Casting
WHERE (Actor.COUNTRY = 'France') AND (Casting.ACTORID = Actor.ACTORID)
AND (Casting.MOVIEID = Movie.MOVIEID)
GROUP BY Movie.TITLE
2) SELECT Movie.Genre, SUM(Actor.Oscars) AS TotalOscars FROM Movie, Actor, Casting
WHERE (Actor.Living = 'Yes') AND (Casting.ACTORID = Actor.ACTORID)
AND (Casting.MOVIEID = Movie.MOVIEID)
GROUP BY Movie.Genre
•
Now formulate queries yourself, in order to retrieve the below data:
– Get actor name, country and movie title for those actors appearing in a movie
made in their own country
– Get actor name and movie title for those actors that have not won an Oscar, but
have appeared in Oscar-winning movies
RHS – SOC
23