Managing Business Intelligence
Download
Report
Transcript Managing Business Intelligence
Advanced SQL
Programming
Mark Holm
Centerfield Technology
Goals
Introduce
some useful advanced SQL
programming techniques
Show you how to let the database do more
work to reduce programming effort
Go over some basic techniques and tips to
improve performance
2
Notes
V4R3
and higher syntax used in examples
Examples show only a small subset of what
can be done!
3
Agenda
files - techniques, do’s and don’ts
Query within a query - Subqueries
Stacking data - Unions
Simplifying data with Views
Referential Integrity and constraints
Performance, performance, performance
Joining
4
Joining files
Joins
are used to relate data from different tables
Data can be retrieved with one “open file” rather
than many
Concept is identical to join logical files without an
associated permanent object (except if the join is
done with an SQL view)
5
Join types
Inner
Join
– Used to find related data
Left
Outer (or simply Outer) Join
– Used to find related data and ‘orphaned’ rows
Exception
Join
– Used to only find ‘orphaned’ rows
Cross
Join
– Join all rows to all rows
6
LastName
Doe
Smith
Dept
397
450
Sally
Anderson
250
Department table
FirstName
John
Cindy
Employee table
Sample tables
Dept
397
550
Area
Development
Marketing
250
Sales
Inner Join
• Method #1 - Using the WHERE Clause
SELECT LastName, Division FROM Employee, Department
WHERE Employee.Dept = Department.Dept
• Method #2 - Using the JOIN Clause
SELECT LastName, Division FROM Employee INNER
JOIN Department ON Employee.Dept = Department.Dept
NOTE: This method is useful if you need to influence the order of the tables are
joined in for performance reasons. Only works on releases prior to V4R4.
8
Results
• Return list of employees that are in a valid
department.
Result table
• Employee ‘Smith’ is not returned because she is not
in a department listed in the ‘Department’ table
LastName
Doe
Anderson
Area
Development
Sales
9
Left Outer Join
• Must use Join Syntax
SELECT LastName, Area FROM Employee
LEFT OUTER JOIN Department
ON Employee.Dept = Department.Dept
10
Results
• Return list of employees even if they are not in a
valid department
Result table
• Employee ‘Smith’ has a NULL Area because it could
not be associated with a valid Dept
LastName
Doe
Smith
Area
Development
-
Anderson
Sales
11
Exception Join
• Must use Join Syntax
SELECT LastName, Area FROM Employee
EXCEPTION JOIN Department
ON Employee.Dept = Department.Dept
12
Results
• Return list of employees only if they are NOT in a
valid department
Result table
• Employee ‘Smith’ is only one without a valid
department
LastName
Smith
Area
-
13
WARNING!
The
order tables are listed in the FROM
clause is important
For OUTER and EXCEPTION joins, the
database must join the tables in that order.
The result may be horrible
performance…more on this topic later
14
Observations
Joins
provide one way to bury application
logic in the database
Each join type has a purpose and can be
used to not only get the data you want but
identify “incomplete” information
With some exceptions, if joined properly
performance should be at least as good as
an application
15
Subqueries
Subqueries
are a powerful way to select
only the data you need without separate
statements.
Example: List employees making a higher
than average salary
16
Subquery Example
SELECT FNAME, LNAME FROM EMPLOYEE
WHERE SALARY > (SELECT AVG(SALARY)
FROM EMPLOYEE)
SELECT FNAME, LNAME FROM EMPLOYEE
WHERE SALARY > (SELECT AVG(SALARY)
FROM EMPLOYEE
WHERE LNAME = ’JONES’)
17
Subqueries - types
Correlated
– Inner select refers to part of the outer (parent)
select (multiple evaluations)
Non-Correlated
– Inner select does not relate to outer query (one
evaluation)
18
Subquery Tips 1
Subquery
optimization (2nd statement will
be faster)
– SELECT name FROM employee WHERE
salary > ALL (SELECT salary FROM salscale)
– SELECT name FROM employee WHERE
salary > (SELECT max(salary) FROM salscale)
19
Subquery Tips 2
Subquery
optimization (2nd statement will
be faster)
– SELECT name FROM employee WHERE
salary IN (SELECT salary FROM salscale)
– SELECT name FROM employee WHERE
EXISTS (SELECT salary FROM salscale
WHERE employee.salid = salscale.salid)
20
UNIONs
Unions
provide a way to append multiple
row sets files in one statement
Example: Process all of the orders from
January and February
SELECT * FROM JanOrders WHERE SKU = 199976
UNION
SELECT * FROM FebOrders WHERE SKU = 199976
21
Unions
Each
SELECT statement that is UNIONed
together must have the same number of
result columns and have compatible types
Two forms of syntax
– UNION ALL -- allow duplicate records
– UNION -- return only distinct rows
22
Views
Views
provide a convenient way to
permanently put SQL logic
Create once and use many times
Also make the database more
understandable to users
Can put simple business rules into views to
ensure consistency
23
Views
Example:
Make it easy for the human
resources department to run a report that
shows ‘new’ employees.
CREATE VIEW HR/NEWBIES (EMPLOYEE_NAME, DEPARTMENT, HIRE_DATE) AS
SELECT concat(concat(strip(last_name),','),strip(first_name)),
department,
hire_date
FROM
WHERE
HR/EMPLOYEE
(year(current date)-year(hire_date)) < 2
24
Performance
SQL performance
is harder to predict and
tune than native I/O.
SQL provides a powerful way to manipulate
data but you have little control over HOW it
does it.
Query optimizer takes responsibility for
doing it ‘right’.
25
Performance - diagnosis
Getting
information about how the
optimizer processed a query is crucial
Can be done via one or all of the following:
–
–
–
–
STRDBG: debug messages in job log
STRDBMON: optimizer info put in file
QAQQINI: can be used to force messages
CHGQRYA: messages put out when time limit
set to 0
26
Performance tips
Create
indexes
– Over columns that significantly limit data in
WHERE clause
– Over columns that join tables together
– Over columns used in ORDER BY and
GROUP BY clauses
27
Performance tips
Create
Encoded Vector Indexes (EVI’s)
– Most useful in heavy query environments with
a lot of data (e.g. large data warehouses)
– Helps queries that process between 20-60% of a
table’s data
– Create over columns with a modest number of
distinct values and those with data skew
– EVI’s bridge the gap between traditional
indexes and table scans
28
Performance tips
Encourage
optimizer to use indexes
– Use keyed columns in WHERE clause if
possible
– Use ANDed conditions as much as possible
– OPTIMIZE FOR n ROWS
– Don’t do things that eliminate index use
Data
conversion (binary-key = 1.5)
LIKE clause w/leading wildcard (NAME LIKE
‘%JOE’)
29
Performance tips
Keep
statements simple
– Complex statements are much more difficult to
optimize
– Provide more opportunity for the optimizer to
choose a sub-optimal plan of attack
30
Performance tips
Enable
DB2 to use parallelism
– Query processed by many tasks (CPU
parallelism) or by getting data from many disks
at once (I/O parallelism)
– CPU parallelism requires IBM’s SMP feature
and a machine with multiple processors
– Enabled via the QQRYDEGREE system value,
CHGQRYA, or the QAQQINI file
31
Other useful features
CASE
clause - conditional calculations
ALIAS - access to multi-member files
Primary/Foreign keys - referential integrity
Constraints
32
CASE
Conditional
calculations with CASE
SELECT Warehouse, Description,
CASE RegionCode
WHEN 'E' THEN 'East Region'
WHEN 'S' THEN 'South Region'
WHEN 'M' THEN 'Midwest Region'
WHEN 'W' THEN 'West Region'
END
FROM Locations
33
CASE
Avoiding
calculation errors (e.g. division by 0)
SELECT Warehouse, Description,
CASE NumInStock
WHEN 0 THEN NULL
ELSE CaseUnits/NumInStock
END
FROM Inventory
34
ALIAS names
The
CREATE ALIAS statement creates an alias
on a table, view, or member of a database file.
– CREATE ALIAS alias-name FOR table member
Example:
Create an alias over the second
member of a multi-member physical file
– CREATE ALIAS February FOR MonthSales
February
35
Referential Integrity
Keeps
two or more files in synch with each
other
Ensures that children rows have parents
Can also be used to automatically delete
children when parents are deleted
36
Referential Integrity Rules
A row
inserted into a child table must have
a parent row (typically in another table).
Parent rules
– A parent row can not be deleted if there are
dependent children (Restrict rule) OR
– All children are also deleted (Cascade rule) OR
– All children’s foreign keys are changed (Set
Null and Set Default rules)
37
Parent table
Child table
38
Foreign
Key
Primary
Key
Primary key must
be unique
Referential Integrity syntax
ALTER TABLE
Hr/Employee ADD
CONSTRAINT EmpPK PRIMARY KEY
(EmployeeId)
ALTER TABLE Hr/Department ADD
CONSTRAINT EmpFK FOREIGN KEY
(EmployeeId) REFERENCES Hr/Employee
(EmployeeId) ON DELETE CASCADE
ON UPDATE RESTRICT
39
Check Constraints
Rules
which limit the allowable values in one or
more columns:
CREATE TABLE Employee
(FirstName CHAR(20),
LastName CHAR(30),
Salary CHECK (Salary>0 AND Salary<200000))
40
Check Constraints
Effectively
does data checking at the database
level.
Data checking done with display files or
application logic can now be done at the
database level.
Ensures that it is always done and closes “back
doors” like DFU, ODBC, 3-rd party utilities….
41
Other resources
Database Design and Programming for DB2/400 - book by Paul Conte
SQL for Smarties - book by Joe Celko
SQL Tutorial - www.as400network.com
AS/400 DB2 web site at http://www.as400.ibm.com/db2/db2main.htm
Publications at http://publib.boulder.ibm.com/pubs/html/as400/
Our web site at http://www.centerfieldtechnology.com
42
Summary
SQL is
a powerful way to access and
process data
Used effectively, it can reduce the time it
takes to build applications
Once tuned, it can perform very close (and
sometimes better) than HLL’s alone
43
Good Luck
and
Happy SQLing