Managing Business Intelligence

Download Report

Transcript Managing Business Intelligence

Advanced SQL
Programming
Mark Holm
Centerfield Technology
Goals
 Introduce
some useful advanced SQL
programming techniques
 Show you how to let the database do more
work to reduce programming effort
 Go over some basic techniques and tips to
improve performance
2
Notes
 V4R3
and higher syntax used in examples
 Examples show only a small subset of what
can be done!
3
Agenda
files - techniques, do’s and don’ts
 Query within a query - Subqueries
 Stacking data - Unions
 Simplifying data with Views
 Referential Integrity and constraints
 Performance, performance, performance
 Joining
4
Joining files
 Joins
are used to relate data from different tables
 Data can be retrieved with one “open file” rather
than many
 Concept is identical to join logical files without an
associated permanent object (except if the join is
done with an SQL view)
5
Join types
 Inner
Join
– Used to find related data
 Left
Outer (or simply Outer) Join
– Used to find related data and ‘orphaned’ rows
 Exception
Join
– Used to only find ‘orphaned’ rows
 Cross
Join
– Join all rows to all rows
6
LastName
Doe
Smith
Dept
397
450
Sally
Anderson
250
Department table
FirstName
John
Cindy
Employee table
Sample tables
Dept
397
550
Area
Development
Marketing
250
Sales
Inner Join
• Method #1 - Using the WHERE Clause
SELECT LastName, Division FROM Employee, Department
WHERE Employee.Dept = Department.Dept
• Method #2 - Using the JOIN Clause
SELECT LastName, Division FROM Employee INNER
JOIN Department ON Employee.Dept = Department.Dept
NOTE: This method is useful if you need to influence the order of the tables are
joined in for performance reasons. Only works on releases prior to V4R4.
8
Results
• Return list of employees that are in a valid
department.
Result table
• Employee ‘Smith’ is not returned because she is not
in a department listed in the ‘Department’ table
LastName
Doe
Anderson
Area
Development
Sales
9
Left Outer Join
• Must use Join Syntax
SELECT LastName, Area FROM Employee
LEFT OUTER JOIN Department
ON Employee.Dept = Department.Dept
10
Results
• Return list of employees even if they are not in a
valid department
Result table
• Employee ‘Smith’ has a NULL Area because it could
not be associated with a valid Dept
LastName
Doe
Smith
Area
Development
-
Anderson
Sales
11
Exception Join
• Must use Join Syntax
SELECT LastName, Area FROM Employee
EXCEPTION JOIN Department
ON Employee.Dept = Department.Dept
12
Results
• Return list of employees only if they are NOT in a
valid department
Result table
• Employee ‘Smith’ is only one without a valid
department
LastName
Smith
Area
-
13
WARNING!
 The
order tables are listed in the FROM
clause is important
 For OUTER and EXCEPTION joins, the
database must join the tables in that order.
 The result may be horrible
performance…more on this topic later
14
Observations
 Joins
provide one way to bury application
logic in the database
 Each join type has a purpose and can be
used to not only get the data you want but
identify “incomplete” information
 With some exceptions, if joined properly
performance should be at least as good as
an application
15
Subqueries
 Subqueries
are a powerful way to select
only the data you need without separate
statements.
 Example: List employees making a higher
than average salary
16
Subquery Example
SELECT FNAME, LNAME FROM EMPLOYEE
WHERE SALARY > (SELECT AVG(SALARY)
FROM EMPLOYEE)
SELECT FNAME, LNAME FROM EMPLOYEE
WHERE SALARY > (SELECT AVG(SALARY)
FROM EMPLOYEE
WHERE LNAME = ’JONES’)
17
Subqueries - types
 Correlated
– Inner select refers to part of the outer (parent)
select (multiple evaluations)
 Non-Correlated
– Inner select does not relate to outer query (one
evaluation)
18
Subquery Tips 1
 Subquery
optimization (2nd statement will
be faster)
– SELECT name FROM employee WHERE
salary > ALL (SELECT salary FROM salscale)
– SELECT name FROM employee WHERE
salary > (SELECT max(salary) FROM salscale)
19
Subquery Tips 2
 Subquery
optimization (2nd statement will
be faster)
– SELECT name FROM employee WHERE
salary IN (SELECT salary FROM salscale)
– SELECT name FROM employee WHERE
EXISTS (SELECT salary FROM salscale
WHERE employee.salid = salscale.salid)
20
UNIONs
 Unions
provide a way to append multiple
row sets files in one statement
 Example: Process all of the orders from
January and February
SELECT * FROM JanOrders WHERE SKU = 199976
UNION
SELECT * FROM FebOrders WHERE SKU = 199976
21
Unions
 Each
SELECT statement that is UNIONed
together must have the same number of
result columns and have compatible types
 Two forms of syntax
– UNION ALL -- allow duplicate records
– UNION -- return only distinct rows
22
Views
 Views
provide a convenient way to
permanently put SQL logic
 Create once and use many times
 Also make the database more
understandable to users
 Can put simple business rules into views to
ensure consistency
23
Views
 Example:
Make it easy for the human
resources department to run a report that
shows ‘new’ employees.
CREATE VIEW HR/NEWBIES (EMPLOYEE_NAME, DEPARTMENT, HIRE_DATE) AS
SELECT concat(concat(strip(last_name),','),strip(first_name)),
department,
hire_date
FROM
WHERE
HR/EMPLOYEE
(year(current date)-year(hire_date)) < 2
24
Performance
 SQL performance
is harder to predict and
tune than native I/O.
 SQL provides a powerful way to manipulate
data but you have little control over HOW it
does it.
 Query optimizer takes responsibility for
doing it ‘right’.
25
Performance - diagnosis
 Getting
information about how the
optimizer processed a query is crucial
 Can be done via one or all of the following:
–
–
–
–
STRDBG: debug messages in job log
STRDBMON: optimizer info put in file
QAQQINI: can be used to force messages
CHGQRYA: messages put out when time limit
set to 0
26
Performance tips
 Create
indexes
– Over columns that significantly limit data in
WHERE clause
– Over columns that join tables together
– Over columns used in ORDER BY and
GROUP BY clauses
27
Performance tips
 Create
Encoded Vector Indexes (EVI’s)
– Most useful in heavy query environments with
a lot of data (e.g. large data warehouses)
– Helps queries that process between 20-60% of a
table’s data
– Create over columns with a modest number of
distinct values and those with data skew
– EVI’s bridge the gap between traditional
indexes and table scans
28
Performance tips
 Encourage
optimizer to use indexes
– Use keyed columns in WHERE clause if
possible
– Use ANDed conditions as much as possible
– OPTIMIZE FOR n ROWS
– Don’t do things that eliminate index use
 Data
conversion (binary-key = 1.5)
 LIKE clause w/leading wildcard (NAME LIKE
‘%JOE’)
29
Performance tips
 Keep
statements simple
– Complex statements are much more difficult to
optimize
– Provide more opportunity for the optimizer to
choose a sub-optimal plan of attack
30
Performance tips
 Enable
DB2 to use parallelism
– Query processed by many tasks (CPU
parallelism) or by getting data from many disks
at once (I/O parallelism)
– CPU parallelism requires IBM’s SMP feature
and a machine with multiple processors
– Enabled via the QQRYDEGREE system value,
CHGQRYA, or the QAQQINI file
31
Other useful features
 CASE
clause - conditional calculations
 ALIAS - access to multi-member files
 Primary/Foreign keys - referential integrity
 Constraints
32
CASE
 Conditional
calculations with CASE
SELECT Warehouse, Description,
CASE RegionCode
WHEN 'E' THEN 'East Region'
WHEN 'S' THEN 'South Region'
WHEN 'M' THEN 'Midwest Region'
WHEN 'W' THEN 'West Region'
END
FROM Locations
33
CASE
 Avoiding
calculation errors (e.g. division by 0)
SELECT Warehouse, Description,
CASE NumInStock
WHEN 0 THEN NULL
ELSE CaseUnits/NumInStock
END
FROM Inventory
34
ALIAS names
 The
CREATE ALIAS statement creates an alias
on a table, view, or member of a database file.
– CREATE ALIAS alias-name FOR table member
 Example:
Create an alias over the second
member of a multi-member physical file
– CREATE ALIAS February FOR MonthSales
February
35
Referential Integrity
 Keeps
two or more files in synch with each
other
 Ensures that children rows have parents
 Can also be used to automatically delete
children when parents are deleted
36
Referential Integrity Rules
 A row
inserted into a child table must have
a parent row (typically in another table).
 Parent rules
– A parent row can not be deleted if there are
dependent children (Restrict rule) OR
– All children are also deleted (Cascade rule) OR
– All children’s foreign keys are changed (Set
Null and Set Default rules)
37
Parent table
Child table
38
Foreign
Key
Primary
Key
Primary key must
be unique
Referential Integrity syntax
 ALTER TABLE
Hr/Employee ADD
CONSTRAINT EmpPK PRIMARY KEY
(EmployeeId)
 ALTER TABLE Hr/Department ADD
CONSTRAINT EmpFK FOREIGN KEY
(EmployeeId) REFERENCES Hr/Employee
(EmployeeId) ON DELETE CASCADE
ON UPDATE RESTRICT
39
Check Constraints
 Rules
which limit the allowable values in one or
more columns:
CREATE TABLE Employee
(FirstName CHAR(20),
LastName CHAR(30),
Salary CHECK (Salary>0 AND Salary<200000))
40
Check Constraints
 Effectively
does data checking at the database
level.
 Data checking done with display files or
application logic can now be done at the
database level.
 Ensures that it is always done and closes “back
doors” like DFU, ODBC, 3-rd party utilities….
41
Other resources

Database Design and Programming for DB2/400 - book by Paul Conte

SQL for Smarties - book by Joe Celko

SQL Tutorial - www.as400network.com
AS/400 DB2 web site at http://www.as400.ibm.com/db2/db2main.htm
Publications at http://publib.boulder.ibm.com/pubs/html/as400/
Our web site at http://www.centerfieldtechnology.com



42
Summary
 SQL is
a powerful way to access and
process data
 Used effectively, it can reduce the time it
takes to build applications
 Once tuned, it can perform very close (and
sometimes better) than HLL’s alone
43
Good Luck
and
Happy SQLing