Lecture 7 - California State University, Los Angeles

Download Report

Transcript Lecture 7 - California State University, Los Angeles

Using Relational Databases and SQL
Lecture 7:
Department of Computer Science
California State University, Los Angeles
Topics for Today
GROUP BY
HAVING
Set Operations
Scripts
Views
Aggregating in Groups
Can we calculate this with a single query?
Yes, but we need a way to group data together
Solution: Use the GROUP BY clause
Grouping Data
You can make an aggregate function return
multiple values per table by grouping the table
-- Count the number of male and female members.
SELECT Gender, COUNT(*)
FROM Members
GROUP BY Gender;
Because there are two gender groups, male and
female, the COUNT function will return two
values, one for each gender group
How GROUP BY Works
GROUP BY begins by sorting the table based on
the grouping attributes (in our case, Gender)
If any aggregate functions are present, GROUP
BY causes each aggregate to be applied per-group
rather than per-table
GROUP BY then condenses the table so that each
group only appears once in the table (if listed)
and displays any aggregated values along with it
Grouping on Multiple Fields
GROUP BY can use multiple fieldnames (similar
to how you can sort using multiple fieldnames)
SELECT Genre, ArtistID, COUNT(*) FROM
Titles GROUP BY Genre, ArtistID order by Genre
Notice that the more fields you group by, the
more results you get!
GROUP BY AND DISTINCT
In some cases, using GROUP BY outside an
aggregate would produce the same result as using
DISTINCT
SELECT Gender FROM Members
GROUP BY Gender;
SELECT DISTINCT Gender
FROM Members;
This is sloppy practice!
GROUP BY and DISTINCT
GROUP BY works in conjunction with aggregate
functions, DISTINCT does not
GROUP BY affects how data is aggregated and
thus has more work to do. DISTINCT just checks
to see whether the value is a repeat
In other words, if you are using GROUP BY, you
had better have an aggregate function
somewhere!
From now on, using GROUP BY as a DISTINCT
replacement is an error!
GROUP BY and Primary Keys
Let’s say you want to display each title and the
number of members who played on it.
We could write
SELECT Title, COUNT(M.FirstName) FROM
titles JOIN XRefArtistsMembers USING(artistID)
join members M using(memberID) GROUP BY
Title
However, if there are multiple albums with the
same title, we will get garbage!
GROUP BY and Primary Keys
The solution is to GROUP BY the primary key,
TitleID, followed by Title
SELECT Title, COUNT(M.FirstName) FROM
titles JOIN XRefArtistsMembers USING(artistID)
join members M using(memberID) GROUP BY
titleID, title
We now have a query that will always work, even if
there are multiple titles with the same name
Filtering Aggregated Results
Using the previous example, once we have our
aggregated result table, is it possible to filter out
certain groups, say where COUNT(*) = 1?
The HAVING Clause
Yes, but we must have a way of filtering results
AFTER aggregation!
Solution is to use the HAVING clause
The HAVING clause filters AFTER aggregation
(this is why you CAN use aggregate functions in
the HAVING clause)
The WHERE clause filters BEFORE aggregation
(this is why you CANNOT use aggregate
functions in the WHERE clause)
HAVING Summary
Using HAVING to filter out groups in an
aggregated table
In a HAVING clause, you may use:
aggregate functions
regular functions
constant values
grouping attributes
HAVING Clause Example
Example:
SELECT Genre, COUNT(*) FROM Titles J
GROUP BY Genre HAVING COUNT(*) > 1;
Aggregating Distinct Values
A normal SELECT DISTINCT query filters out
duplicates after aggregation
Therefore, if a field contains duplicate values, and
you aggregate on that field, SELECT
DISTINCT WILL NOT filter out duplicate
values from being aggregated.
This could produce incorrect answers with, for
example, count() or avg()
Aggregating Distinct Values
The solution is to use the DISTINCT keyword
within the aggregate function
Example:
WRONG: SELECT DISTINCT COUNT(Firstname)
FROM Members;
There is only one value, the number 23!
RIGHT: SELECT COUNT(DISTINCT(Firstname))
FROM Members;
Aggregating Distinct Values
create table musicians(
id int primary key,
name varchar(30)
);
create table bands(
id int primary key,
name varchar(30)
);
create table xref(
musicianId int,
bandId int,
instrument varchar(30),
primary key ( musicianid, bandId, instrument)
);
insert into musicians values(1, "Mick");
insert into musicians values(2, "Keith");
insert into musicians values(3, "Charlie");
insert into musicians values(4, "Ron");
insert into musicians values(5, "Jan");
insert into musicians values(6, "Dean");
insert into bands values(1, "Rolling Stones");
insert into bands values(2, "Jan and Dean");
insert into xref values(1, 1, "Vocals");
insert into xref values(2, 1, "Guitar");
insert into xref values(3, 1, "Drums");
insert into xref values(4, 1, "Guitar");
insert into xref values(5, 2, "Vocals");
insert into xref values(6, 2, "Vocals");
insert into xref values(6, 2, "Vuvuzuela");
Aggregating Distinct Values
Find the number of different musicians in each band
-- WRONG!!!
SELECT b.name, COUNT(m.id) AS 'Musicians’FROM
bands b JOIN xref x ON b.id = x.bandId JOIN musicians
m ON x.musicianId = m.id GROUP BY bandID;
This is incorrect since some musicians in this DB
are listed more than once in xref because they play
more than one instrument
Aggregating Distinct Values
-- CORRECT!!!
SELECT b.name, COUNT(distinct m.id) AS
'Musicians’ FROM bands b JOIN xref x ON b.id =
x.bandId JOIN musicians m ON x.musicianId =
m.id GROUP BY bandID;
This is the correct answer!
Views
Views are stored queries
Views are part of the database schema
Use views as though they were actual tables
With complex DBs, may be better to query views
than tables
Views
Syntax:
CREATE [OR REPLACE]
VIEW view_name
AS select_statement
• It’s a good practice to include the primary
keys of the tables you select from.
• You may want to get the SELECT right first,
then build the CREATE VIEW around it.
Views
Example:
CREATE VIEW SalespersonMembers AS
SELECT s.lastname AS "salespersonName",
s.salesID , m.lastname AS "memberName",
m.memberID FROM salespeople s JOIN members
m USING(salesID);
Views
Select from views just as if they were tables:
SELECT * FROM SalespersonMembers;
Views
Benefits:
Consider an application with queries and DML based
on a third-party database
Database manager changes the table definitions: application
code breaks!
Any changes require careful coordination between DB
manager and application programmers
Consider an app that queries against database views
DB manager can change the tables but rewrite the views to
maintain an unchanged interface
Application programmers don’t have to worry about
underlying changes
Union
List all regions that contain members, studios, or
both
This is a SET operation, so there will be no
duplicates.
SELECT region FROM members UNION SELECT
region FROM studios;
Summary Statistics with UNION
Select 1, '< 2 minutes' As Length,
Count(*) As NumTracks From Tracks Where
LengthSeconds<120
Union
Select 2, '2-3 minutes', Count(*)From
Tracks Where LengthSeconds Between 120 and
180
Union
Select 3, '>3 minutes', Count(*)
From Tracks Where LengthSeconds>180;
UNION ALL
• Counts duplicates multiple time; breaks set definition of
UNION
• Get number of electronic versions available for all tracks:
Select Count(SoundFile) As Num_Electronic_Files
from (Select MP3 As SoundFile From Tracks Where
MP3=1 Union All Select RealAud From Tracks Where
RealAud=1) as elec;
In Class Exercise
Write SQL queries to find the following:
By how many seconds does the length of the longest
track in the db exceed that of the average track?
List the title names and IDs with the number of tracks
on each one