Learning Through Writing - Western Carolina University

Download Report

Transcript Learning Through Writing - Western Carolina University

Banner and the SQL Select
Statement:
Part Three (Joins)
Mark Holliday
Department of Mathematics and
Computer Science
Western Carolina University
4 November 2005 and 11 November 2005
(updated: 11 November 2005)
Outline
 The Goal
 The Concepts




A First Example
Single Table Selects
Joins
Multiple Connected Select Statements
A First Example
 Outline
 The Relational Model: Single Table
 Lab 1: TOAD, Schema Browser
 Some Structured Query Language (SQL)
Basics
 Lab 2: TOAD, SQL Editor
Single Table Selects
 Outline
 WHERE clause: single condition, multiple
conditions
 Lab 3:
 Order By; Aggregate Functions
 Lab 4:
 Group By; Having
 Lab 5:
Joins
 Outline





Why Multiple Tables?
Inner Joins
Lab 6:
Outer joins
Lab 7:
Why Multiple Tables?
(franz)
One Table database: keeps track of all purchases at our
store (known as the ‘flat file’)
Why Multiple Tables?
(franz)
Every time a new row is inserted into the table, all columns will be
updated. This results in unnecessary "redundant data". For example,
every time Wolfgang Schultz purchases something, the following rows
will be inserted into the table:
Why Multiple Tables?
What happens if we ONE DAY find out Wolfgang Schultz’s last name
is really spelled “S-h-o-o-l-t-z-e” instead of “S-c-h-u-l-t-z”? (franz)
He has purchased LOTS of stuff from us.
And…. He has been making purchases at our store for over five years. In
fact, he is our very BEST customer. His purchases provide for most of
our gross sales.
But now… He says that he will take his business elsewhere if we don’t get
this problem corrected!
And YOU have to get it corrected NOW!!!
Why Multiple Tables?
(franz)
Hold that thought !
This helps to exemplify WHY relational
database table structures are so nice ...
Why Multiple Tables?
Original ‘flat file’ table:
(franz)
Why Multiple Tables?
(franz)
For this example, an ideal database would have two
tables:

One for keeping track of customer information
(customer_info)

And the other to keep track of what they purchase
(purchases).
customer_info table:
purchases table:
Our new tables:
The ‘id’ number
is the common
element that
ties the tables
together.
(franz)
Why Multiple Tables?
(franz)
After redesigning our data structure
into two tables,
whenever a change for the name or
address needs to be made to our
repeating customer,
we update the “customer_info” table
only.
Only the second table, “purchases”,
needs to be updated when purchases
are made.
Why Multiple Tables?
(franz)
We've just eliminated the space-taking,
time-consuming, useless redundant
data of our flat file.
That is, we've just NORMALIZED this
database!
This is the basis for relational database
structure.
Joins: Introduction
Unfortunately, having multiple tables
introduces a problem.
Problem: Information you need for a query is
often in more than one table.
Example: Find the zip code of the person
whose first name is “Ab” and last name is
“Mazlan”.
 need the spraddr table for the zip code and
 need the spriden table for the name
Joins: Introduction
Solution: Need a way to temporarily
(just in the query) connect a row in
one table with the row (or rows) in
another table that “match” it.
Refined Solution: Use some type of
JOIN operation
Joins: Introduction
Problem: How do we find the “matching” rows
in the other table?

In other words, how do we do a join?
Solution:
1. Conceptually, match each row in the left
table with every row in the right table.
•
take the Cartesian Product of the two tables
2. Only keep the row pairs that “match”.
Joins: Introduction (Step One of Join)
pidm last_name
1
2
“smith”
“jones”
--
pidm1
1
1
2
2
Cartesian
Product
Last_name
“smith”
“smith”
“jones”
“jones”
pidm
zip
1
20111
3
20311
pidm2
1
3
1
3
zip
20111
20311
20111
20311
Joins: Introduction (Step Two of Join)
pidm1
1
1
2
2
Last_name
“smith”
“smith”
“jones”
“jones”
pidm2
1
3
1
3
zip
20111
20311
20111
20311
only keep rows that have pidm1 = pidm2
pidm1 Last_name
1
“smith”
pidm2 zip
1
20111
Joins: Introduction
Questions:
1) When does a row from the left table
“match” a row from the right table?
2) What to do with a row in one table
that does not match a row in the
other table?
Joins: Introduction
Question One: When does a row from the left
table “match” a row from the right table?
Answer: The two rows match if the condition
expression is TRUE.
 The expression is a sequence of conditions
connected by AND.
 Each condition is a comparison of a left
table column and a right table column
 The comparison operator is equality.
Joins: Introduction
Example condition expression
 just one condition
spriden.spriden_pidm =
spraddr.spraddr_pidm
Joins: Introduction
Good relational database design
=> the only common column in two
tables is the key for each table
=> use the key of each table to get
from a left table row to the matching
right table row
•
they have the same key value
Joins: Introduction
Question Two: What to do with a row in
one table that does not match a row
in the other table?
 ignore them: inner join
 don’t ignore them: outer join
 This section covers inner joins
INNER Join
(franz)
An example using reserved words to indicate a join:
SELECT spriden_last_name, spriden_first_name,
spraddr_city, spraddr_stat_code, spraddr_zip
FROM spriden INNER JOIN spraddr
ON spriden_pidm = spraddr_pidm
Implicitly,
1) take cartesian product of spriden and spraddr
2) only keep the rows that meet the condition
INNER Join
(franz)
Syntax for a join using reserved
words
SELECT "list-of-columns“
FROM table1 [“type” JOIN] table2
ON “field matching”
WHERE "search-condition(s)"
INNER Join
Table ‘spriden’:
Table ‘spraddr’:
(franz)
INNER Join
(franz)
SELECT spriden_last_name, spriden_first_name,
spraddr_city, spraddr_stat_code, spraddr_zip
FROM spriden INNER JOIN spraddr
ON spriden_pidm = spraddr_pidm
INNER Join
(franz)
Task: Select from ‘spriden’ for the active record
(spriden_change_ind is null). Pull the permanent mailing
address types (‘MA’) in the US (spraddr_natn_code is null).
This will pull a large amount of data.
Let’s create a smaller subset by picking zip codes on the
northern East Coast (zip codes starting with ‘0’).
Let’s also order our output by the last name then first
name.
INNER Join
(franz)
To accomplish this we would take the previous SQL and
make these additions:
SELECT spriden_last_name, spriden_first_name,
spraddr_city, spraddr_stat_code,
spraddr_zip
FROM spriden INNER JOIN spraddr
ON spriden_pidm = spraddr_pidm
WHERE spriden_change_ind is null
and spraddr_atyp_code = 'MA'
and spraddr_zip between '000%' and '1%'
and spraddr_natn_code is null
ORDER BY spriden_last_name, spriden_first_name
INNER Join
(franz)
SELECT substr(spriden_last_name,1,12) || ', ' ||
substr(spriden_first_name,1,12),
rtrim(spraddr_city) || ', ' || spraddr_stat_code || '
' || spraddr_zip
FROM spriden INNER JOIN spraddr
ON spriden_pidm = spraddr_pidm
WHERE spriden_change_ind is null
and spraddr_atyp_code = 'MA'
and spraddr_zip between '000%' and '1%'
and spraddr_natn_code is null
ORDER BY spriden_last_name, spriden_first_name
INNER Join
Part of the Resultant Subset Would Be:
(franz)
INNER Join
(franz)
If we had left out the check for a null change indicator, we would
have received duplicate rows of data. (We would get the Gary
Abbot record with a null indicator, and also the one with an “I”
indicator.)
SELECT substr(spriden_last_name,1,12) || ', ' ||
substr(spriden_first_name,1,12), rtrim(spraddr_city) || ', '
|| spraddr_stat_code || ' ' || spraddr_zip
FROM spriden INNER JOIN spraddr
ON spriden_pidm = spraddr_pidm
WHERE spraddr_atyp_code = 'MA'
and spraddr_zip between '000%' and '1%'
and spraddr_natn_code is null
…spriden_change_ind is null
ORDER BY spriden_last_name, spriden_first_name
INNER Join
(franz)
SELF Join
The left table and the right table in an
inner join might be the same table.
This is called a SELF Join.
When would this be useful?
SELF Join
(franz)
‘spiffy’ table:
Syntax:
SELECT e.first_name EMPLOYEE_FIRST, e.last_name EMPLOYEE_LAST,
m.first_name MANAGER_FIRST, m.last_name MANAGER_LAST
FROM spiffy e INNER JOIN spiffy m
ON e.manager_id = m.employee_id
SELF Join
SELECT e.first_name EMPLOYEE_FIRST, e.last_name EMPLOYEE_LAST,
m.first_name MANAGER_FIRST, m.last_name MANAGER_LAST
FROM spiffy e INNER JOIN spiffy m
ON e.manager_id = m.employee_id
Previous Inner Join examples:
 instances of two different tables joining on the
same column in both tables (e.g. pidm)
The Self Inner Join example:
 two instances of the same table joining on
different columns of the same table (e.g.
manager_id and employee_id)
SELF Join
Result:
(franz)
Laboratory Six
 Objectives:
 Develop competence with inner joins
 Steps:
 First Query
Laboratory Six
 Problem: Find the area code and phone
number for every one whose last name is
“Holliday.”
 Hint: Use the spriden and sprtele tables.
Laboratory Six
Solution One:
SELECT sprtele_phone_area, sprtele_phone_number
FROM spriden INNER JOIN sprtele
ON spriden_pidm = sprtele_pidm
WHERE spriden_last_name = ‘Holliday’
Solution Two:
SELECT sprtele_phone_area,
sprtele_phone_number
FROM spriden, sprtele
WHERE spriden_pidm = sprtele_pidm and
spriden_last_name = ‘Holliday’
OUTER Join
(franz)
Notice that ‘spriden’ has records for Bagnall, Baker and
Barksdale.
‘Spbpers’ has records for Bagnall and Barksdale, but not
Baker.
OUTER Join
(franz)
If we join the tables by an inner join on pidm, we would
only receive rows for Bagnall and Barksdale.
To pick up all the records in ‘spriden’, and join it with the
data in ‘spbpers’ we would use an outer join.
OUTER Join
(franz)
OUTER Joins can be split into three types:
 LEFT outer join
 RIGHT outer join
 FULL outer join
“Left” or “Right” designates what your
base table is for the join.
“Full” returns all rows in both tables.
LEFT OUTER Join
SELECT spriden_last_name, spriden_first_name,
spbpers_sex, spbpers_birth_date
FROM spriden LEFT OUTER JOIN spbpers
ON spriden_pidm = spbpers_pidm
WHERE spriden_change_ind is null
and spriden_entity_ind = 'P'
ORDER BY spriden_last_name, spriden_first_name
(franz)
LEFT OUTER Join
Result:
(franz)
RIGHT OUTER Join
(franz)
SELECT spriden_last_name, spriden_first_name,
spbpers_sex, spbpers_birth_date
FROM spriden RIGHT OUTER JOIN spbpers
ON spriden_pidm = spbpers_pidm
WHERE spriden_change_ind is null
and spriden_entity_ind = 'P'
ORDER BY spriden_last_name, spriden_first_name
RIGHT OUTER Join
Result:
(franz)
FULL OUTER Join
(franz)
SELECT spriden_last_name, spriden_first_name, spbpers_sex,
spbpers_birth_date
FROM spriden FULL OUTER JOIN spbpers
ON spriden_pidm = spbpers_pidm
WHERE spriden_change_ind is null
and spriden_entity_ind = 'P'
ORDER BY spriden_last_name, spriden_first_name
FULL OUTER Join
(franz)
Result:
PIDM added to indicate which rows were selected.
Laboratory Seven
 Objectives:
 Develop competence with outer joins
 Steps:
 Outer Join Query
JOIN Query
(franz)
Write a JOIN. Use either an “INNER”, “OUTER” or “FULL”.
Use tables SPRIDEN and SPRADDR.
Include both PIDMs, last name, first name, first address line,
address type.
Join on PIDM
Select the Active name record
Select entity indicator for a Person
Select PIDMs less than 70,000
Select last names that start with “Bea”
Order the result set by SPRIDEN PIDM, then SPRADDR PIDM
Answer
(franz)
SELECT spriden_pidm, spraddr_pidm,
spriden_last_name,
spriden_first_name,
spraddr_atyp_code,
spraddr_street_line1
FROM spriden LEFT OUTER JOIN spraddr
ON spriden_pidm = spraddr_pidm
WHERE spriden_change_ind is null
AND spriden_entity_ind = 'P'
AND spriden_pidm < 70000
AND spriden_last_name like 'Bea%'
ORDER BY spriden_pidm, spraddr_pidm
Multiple Connected Select Statements
 Outline
 Union, intersection, minus
 Lab 8:
 Subqueries
 Use directly: FROM clause
 Use as a set: new operators
 Use as a single value: aggregate functions
 Lab 9:
 A Query Development Methodology