```SQL
Aggregation







Significant extension of relational algebra
Count(*)
“Count the number of tuples in Sailors”
SELECT count(*)
3
FROM Sailors
“What is the average age of sailors with rating 10?”
avg(age)
SELECT avg(age)
FROM Sailors
35
WHERE rating = 10
“How many different boats have been reserved?”
SELECT count(DISTINCT bid)
FROM Reserves
Syntax: COUNT, SUM, AVG, MAX, MIN apply to single
Result is a relation with only one tuple
421B: Database Systems - SQL Queries II
2
Aggregation (contd).


“Give the names of the sailors with the highest rankings”
SELECT sname
FROM Sailors S1
WHERE S1.rating = (SELECT MAX(S2.rating)
FROM Sailors S2)
(Note also the = in the where clause. We can use = when it is assured
that the relation resulting from the subquery has only one tuple.)
“Give the name of the sailor that is the first in the alphabet”
SELECT min(sname)
FROM Sailors S1
sid sname rating age
22 debby 7
31 debby 8
58 lilly
10
421B: Database Systems - SQL Queries II
45
55
35
min(sname)
debby
3
Grouping
So far, we have applied aggregate operators to all (qualifying) tuples.
Sometimes, we want to apply them to each of several groups of tuples.
 Example: “Find the average age of the sailors in each rating level”

 In general, we don’t know how many rating levels exist, and what the rating
values for these levels are.
 Suppose, we know that the rating levels go from 1 to 10; then we can write 10
queries that look like this:
for i=1,2,…10
 Using Grouping
SELECT avg(age)
FROM Sailors
GROUP BY rating
SELECT avg(age)
FROM Sailors
WHERE rating = i
sid sname rating age
22 debby 7
31 debby 7
58 lilly
10
421B: Database Systems - SQL Queries II
45
55
35
avg(age)
50
35
4
Queries with GROUP BY
SELECT
target-list
FROM relation list
WHERE
qualification
GROUP BY grouping list
 A group is defined as a set of tuples that have the same value for all
attributes in the grouping list
 One answer tuple is generated per group.
 The target-list contains attributes and/or aggregation terms
 The attributes of the target-list must be a subset of the grouping list:
Since each answer tuple corresponds to one group, we can only depict
attributes, for which all tuples in the group have the same value
 Example:
SELECT rating, avg(age)
FROM Sailors
GROUP BY rating
sid sname rating age
rating, avg(age)
22 debby 7
45
7
50
31 debby 7
55
10 35
58 lilly
10
35
421B: Database Systems - SQL Queries II
5
Evaluation
SELECT
target-list
FROM relation list
WHERE
qualification
GROUP BY grouping list

Conversion to Relational Algebra
 Compute the cross-product of relations in FROM clause, consider
only tuples that fulfill the qualification in WHERE clause, project
on fields that are needed (in SELECT or GROUP BY)
 Partition the remaining tuples into groups by the value of
attributes in grouping-list
 Return all attributes in the SELECT clause (must also be in the
group list) plus the calculated aggregation terms per group.
421B: Database Systems - SQL Queries II
6
SELECT lists with aggregation

If any aggregation is used, then each element in the attribute list of the
SELECT clause must either be aggregated or appear in a group-by clause
SELECT rating, avg(age)
FROM Sailors
GROUP BY rating
sid sname rating age
22 debby 7
31 debby 7
58 lilly
10

45
55
35
rating avg(age)
7
10
50
35
Look at a wrong (left) and correct (right) way to find the name of the oldest
sailor
SELECT sname, MAX(age)
FROM Sailors
421B: Database Systems - SQL Queries II
SELECT S1.sname, S1.age
FROM Sailors S1
WHERE S1.age = (SELECT MAX(S2.age)
FROM Sailors S2)
7
HAVING CLAUSE


HAVING clauses are selections on groups, just as WHERE
clauses are selections on tuples
Example: “For each rating level, find the average age of
all sailors over 18; only consider rating levels for which
there exist at least two sailors over 18”
SELECT rating, avg(age)
FROM Sailors
WHERE age > 18
GROUP BY rating
HAVING COUNT(*) >= 2
421B: Database Systems - SQL Queries II
8
Example
sid sname rating age
1
A
9
20
2
B
1
30
3
C
6
22
4
D
9
15
5
E
1
10
6
F
6
26
7
G
8
15
rating age
9
1
6
6
20
30
22
26
SELECT rating, avg(age)
FROM Sailors
WHERE age > 18
GROUP BY rating
HAVING COUNT(*) >= 2
 Select upon WHERE and
project to necessary
attributes
 Partition by GROUP and check
whether they fulfill HAVING
 Second column of result is
unnamed
rating
6
24
421B: Database Systems - SQL Queries II
9
Evaluation
SELECT rating, avg(age)
SELECT target-list
FROM Sailors
FROM relation list
WHERE age > 18
WHERE qualification
GROUP BY rating
GROUP BY grouping list
HAVING COUNT(*) >= 2
HAVING group-qualification
 Conversion to Relational Algebra
 Compute the cross-product of relations in FROM clause, consider only tuples that
fulfill the qualification in WHERE clause, project on fields that are needed (in
SELECT or GROUP BY)
 Partition the remaining tuples into groups by the value of attributes in grouping-list
 For each group, the group qualification is then applied selecting only those groups
that fulfill the qualification. Expressions in group-qualification must have a single
value per group. Hence, for each attribute in the group qualification, either
 the attribute also appears in the grouping list
 or it is argument of an aggregation
421B: Database Systems - SQL Queries II
10
Example II


For each red boat, find the number of reservations for this
boat
SELECT B.bid, COUNT (*) AS scount
FROM Boats B, Reserves R
WHERE R.bid=B.bid AND B.color=‘red’
GROUP BY B.bid
Grouping over a join of two relations.
421B: Database Systems - SQL Queries II
11
Example III

Find the age of the youngest sailor with age > 18, for each
rating with at least 2 sailors (of any age)
SELECT
FROM
WHERE
GROUP BY
HAVING
S1.rating, MIN (S1.age)
Sailors S1
S1.age > 18
S1.rating
1 < (SELECT COUNT(*)
FROM Sailors S2
WHERE S1.rating=S2.rating)
Shows HAVING clause can also contain a subquery.
 Compare this with the query where we considered only
ratings with 2 sailors over 18!
 What if HAVING clause is replaced by:
 HAVING COUNT(*) >1

421B: Database Systems - SQL Queries II
12
Example IV
Find those ratings for which the average age is the minimum
over all ratings
 Aggregate operations cannot be nested! WRONG
SELECT S.rating
FROM
Sailors S
WHERE S.age =(SELECT MIN (AVG (S2.age))
FROM Sailors S2)
• Correct solution (in SQL2):
SELECT Temp.rating, Temp.avgage
FROM (SELECT S.rating, AVG (S.age) AS avgage
FROM Sailors S
GROUP BY S.rating) AS Temp
WHERE Temp.avgage = (SELECT MIN (Temp.avgage)
FROM Temp)
• Or use view as intermediate relation

421B: Database Systems - SQL Queries II
13
NULL Values

Meaning of a NULL value
 Unknown/missing
 Inapplicable (e.g., no spouse’s name)

Comparing NULLs to values
 E.g., how to evaluate condition rating>7 if tuple has a NULL in rating?
 When we compare a NULL value and any other value (including NULL) using
a comparison operator like > or =, the result is “unknown”.
 If we want to check whether a value is NULL, SQL provides the special
comparison operator IS NULL

Arithmetic Operations (*, +, etc):
 When at least one operand has a NULL value (the other operands can have
any value including NULL) then the result is NULL (consequence
0*NULL=NULL !)
 We cannot use NULL as an operand (e.g., rating < NULL).
421B: Database Systems - SQL Queries II
14
NULL Values (contd.)

3-valued logic necessary: true, false, unknown
 NOT unknown = unknown
 A OR B = true if either A=true or B=true
 A OR B = false if A=false and B=false
 A OR B = unknown if (A=false and B=unknown) or (A=unknown and
B=false) or (A=unknown and B=unknown)
 A AND B = true if A=true and B=true
 A AND B = false if either A=false or B=false
 A AND B = unknown if (A=true and B=unknown) or (A=unknown and
B=true) or (A=unknown and B=unknown)
421B: Database Systems - SQL Queries II
15
Query evaluation considering
NULL values

Evaluation in SQL
 The qualification in the WHERE clause eliminates rows for
which the qualification does not evaluate true (i.e., rows that
evaluate to false or unknown are eliminated)
 SQL defines that rows are duplicates if corresponding columns
are either equal or both contain NULL (in contrast to the usual
on previous slide where the comparison of the NULLs results in
unknown)
 COUNT(*) handles NULLs like other values, I.e., they are
counted
 All other aggregate operations simply discard NULL values
421B: Database Systems - SQL Queries II
16
Outer Join


R1 C R2 = R1 C R2 with dangling tuples padded with
nulls and included in the result
Example:
 The result of Sailors
Reserves does only contain tuples
refering to sailors who have reserved at least one boat (at
least one tuple of this sailor in Reserves).
 The result of Sailors Reserves contains for each sailor
without matching Reserves tuple exactly one row, with the
result columns inherited from Reserves assigned NULL values
SELECT S.sid, S.sname, R.bid
FROM Sailors S NATURAL LEFT OUTER JOIN Reserves S
sid
421B: Database Systems - SQL Queries II
22
58
sname
bid
debby 103
lilly
NULL
17
Views

A view is just a unmaterialized relation: we store a
definition rather than a set of tuples.
CREATE VIEW ActiveSailors (sid,name)
AS SELECT DISTINCT S.sid, S.name
FROM Sailors S, Reserved R
WHERE S.sid = R.sid

Views can be used to present necessary information (or a
summary), while hiding details in underlying relation(s).

Given ActiveSailors, we know the names of the sailors who have
reserved boats (good for accounting), but not the age of the sailors
(uninteresting for accounting).
421B: Database Systems - SQL Queries II
18
Views (contd)




Views can be treated as if they were materialized
relations
The system translates a SELECT on a view into
SELECTS on the materialized relations
Modifications are problematic
Views can be dropped using the DROP VIEW command
 How to handle DROP TABLE if there’s a view on the table?
 DROP TABLE command has options to let the user specifiy this.
421B: Database Systems - SQL Queries II
19
Levels of Abstraction





Single conceptual (logical) schema defines
logical structure
 Conceptual database design
Physical schema describes the files and
indexes used
 Physical database design
Different views describe how users see the
data (also referred to as external schema)
 generated on demand from the real data
Physical data independence: the conceptual
schema protects from changes in the physical
structure of data
Logical data independence: external schema
protects from changes in conceptual schema of
data
421B: Database Systems - SQL Queries II
View 1
View 2
Conceptual Schema
Physical Schema
20
Limits of Queries in SQL


SQL is not TURING complete
Examples
 Calculate the variance of the ages of the sailors
 Build the following categories within the Sailors: all sailors younger
than 20, all sailors between 20 and 29, etc. For each category, give
the total number of sailors and their average rating.
 Assume a relation with all direct flights. Determine all cities that are
reachable from Montreal
 Flights(fid, dep-city, arr-city)
SELECT Fn.arr-city
FROM Flights F1, Flights F2, … Flights Fn
WHERE F1.dep-city = ‘Montreal’ AND
fid Dep-city Arr-city
F1.arr-city = F2.dep-city AND
…
22 Montreal Zurich
Fn-1.arr-city = Fn.dep-city
25 Zurich
Munich
31 Munich Bombay
…
421B: Database Systems - SQL Queries II
21
DB Modifications:
insert/delete/update

Insert values for all attributes in the order attributes were declared or
values for only some attributes
 INSERT INTO Sailors VALUES (68,’Jacky’,10, 40)
 INSERT INTO Sailors (sid,name) VALUES (68, ‘Jacky’)

Insert the result of a query
 ActiveSailors(sid,name)
 INSERT INTO ActiveSailors (
SELECT Sailors.sid Sailors.name
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid)

Delete some or all tuples of a relation
 DELETE FROM Boats WHERE bid = 103 / DELETE FROM Boats

Update some of the attributes of some of the tuples
 UPDATE Sailors
SET ranking = 10, age = age + 1
WHERE name = ‘debby’ OR name = ‘lilly’

SQL2 semantics: all conditions in a modification statement must
be evaluated by the system BEFORE any modifications occur.
421B: Database Systems - SQL Queries II
22
```