Query DIFF Utility

Download Report

Transcript Query DIFF Utility

MDCFUG
December 10, 2002
Joan Falcão
[email protected]
Query DIFF Utility
Comparing the data
In 2 tables (or queries)
-- show the differences --
“Why would you want to do that?”
 Specifically – verify that data transferred from
one system to another is identical before
deleting it at the source.
 Generally – have a quality-assurance tool to
analyze data differences:



Verify expected matches
Report on expected differences
Identify surprise differences and matches
Applications
 Find Differences in 2 similar tables
 See how a table has changed over the week
 QA for data entry effort (before and after)
 Verify expected changes due to code mods
 Compare the column names from MS-SQL
and Oracle schemas

Or compare length of text columns
 Analyze whether duplicate keys are duplicate
rows (heavy hand on the “submit” button?)
 See the results of modifying a query

Inner join vs outer join
 Null predicate
More Applications: QA for Ported
Systems or Regression Testing
dbms
data
conversion
routine
new
system
update
old
system
update
updated
dbms
ported
dbms
data
conversion
routine
updated
ported
dbms
Difference?
ported
updated
dbms
Reinventing the wheel?
 http://devex.macromedia.com/developer/gallery/index
.cfm



Cf_Venn: Remember Venn diagrams from school? This
Custom Tag compares two lists, performing any or all of
these operations: Union, Intersection, and Differences.
TDataSourceCompare: Compares all tables, columns,
datatypes of 2 MS SQL database objects that are registered
as datasources on your server. Used for checking your
Production database and Development database to make
sure your columns all match before a critical launch. DB1
First Datasource name. DB2 Second Datasource name.
CF_List_Compare is a custom tag that will compare the
values of two given lists (List 1, and List 2)
Reinventing the wheel?
http://www.databasejournal.com/features/mssql/article.php/1441271
August 7, 2002
SQL Script for Comparing the Contents of Two Tables
Here's some SQL code from Eli Leiba that will all you to compare two
tables -- say, table A and table B -- to determine if their content is the
same. Assuming that A and B have the same structure, here's how it
works. First, from set theory, recall that:
If ((|A| = |B|) && |A U B| = |A|)) ====>>> A = B
|A| = NUMBER of rows in A
|B| = NUMBER of rows in B
Reinventing the wheel?
http://www.databasejournal.com/features/mssql/article.php/1441271
(continued)
Here's the SQL code
(with T-SQL syntax):
declare @cnt1 int
declare @cnt2 int
declare @cnt3 int
declare @res bit
select @cnt1 = count(*) from A
select @cnt2 = count(*) from B
select @cnt3 = count('x')
from (select * from A
UNION
select * from B) as t
if (@cnt1 = @cnt2)
and (@cnt2 = @cnt3)
begin
set @res = 1
print 'A = B'
end
else
begin
set @res = 0
print 'A <> B'
end
go
Previous implementation
– UNION/JOIN query in Access
 Query is messy:
 query for each column name
 query for extra rows in each of the 2 tables
 extra predicates for nulls
 extra predicates to flag duplicate rows

(should table not have a primary key)
 Access permits union across dissimilar
columns (different data types)
 Access DOES permit inner join and union
across linked tables from different DBMS, but
I have not tested this.
Previous implementation
– UNIVAC’s Symbolic Stream Generator
 Generates source code (to be compiled
and executed) according to parameters
 Old technology (no longer available)
 Required a lot of parameters
 One
spec per column (location and type)
 Compared 2 files – very limited
 A query
is much more flexible
General ColdFusion Approach
 Define 2 queries
 With the same column names



rename with AS, if needed
With a common key (possibly multi-column)
Sort on the common key
 Merge queries on the common key
 Report extra keys
 Report data differences when keys match
 Feedback for user
 Column names
 Notification of errors
Parameters
in General-Purpose Differencing
 In a FORM, prompt for:
 Datasource
 Table
specification (query)
 Key specification
 Sort order
 Login parameters (username/password)
Demos
Challenges
in General-Purpose Programming
 Column name(s) not known for
SELECT *
 Number of columns not known
 Number of key columns not known
 Column name syntax in ORDER BY
clause differs across databases
ColdFusion essential features
 CFQUERY’s returned variable:
 queryname.ColumnList

ListGetAt(ISQLquery1.ColumnList,ii)
 CFQUERY column indexing:
 queryname.columnname[index]
 Evalute function:
Evaluate(‘queryname.columnname’)
Evaluate("ISQLquery1."&ListGetAt(ISQLquery1,ii)&"["&q1&"]")
Unexpected problems
 False matches
 SQL Case-sensitive vs CF case-insensitive
 ColdFusion null same as empty string

(unless query uses COALESCE to give a value to nulls)
 False differences
 Extra rows on both sides (“ships passing in the night”)
 Trailing blanks (do not count in SQL, do in CF)
 Sort sequences differ
 Across database products
 Across database configurations of same product
 ColdFusion has its own sort sequence
 SQL query special characters hard to store
 need URLEncodedFormat
More
Demos
Workarounds to sort problems
 Sort by UPPER(columnname)
 Replace known special characters with
something that sorts the same
 UPPER(REPLACE(columnname,’_’,’
 ReSort in ColdFusion,
 via
CF5 query-on-query
’))
Additional Problems
 In CFFORM my own javascript
inadvertantly disables CFINPUT
validation

onSubmit="return SubmitDiffForm(_CF_this);“
 Column names with special characters
won’t “evaluate” (I warn and abort)
 My javascript function to uppercase
character sort columns was too
confusing (now disabled)
Additional Implementation
Notes
 <CFSETTING EnableCFOUTPUTONLY="Yes">

To limit extraneous white space (lots of it)
 Hidden radio button trick – form variable is





always present and has values "Y,N“ or “N”
thisTag.generatedContent gives “playback”
Empty username/password attributes mean
“use defaults” (instead of ODBC error)
PreserveSingleQuotes needed in CFQUERY
Sandbox security may limit DIFFing
Written/tested in CF 4.5; enhanced with
optional CF5 features
Security concerns
 This utility must be protected;
 Otherwise, it’s a hacker’s dream
 NOTE: I use a session variable to hold a login
 Security can be disabled
 (in application.cfm, set BypassSecurity=“Yes”)
 OK, iff protection already exists
 Security must be configured
 Use an existing username/password to login


Avoids password management
Provide parameters in LoginParam.cfm
 Database passwords for queries are omitted
in stored DIFF specification
Feature Creep (extra funcionality)
 Javascript function cascades merge key
to sort key input boxes (less data entry)
 Option to save entire DIFF specification
in a file
 Batch execution (can be scheduled in CF admin)
 Email the differences report
 Resort (CF5 query-on-query) to avoid sort
sequence variance
Future Enhancements?
 Case-sensitive search
 Parameters to limit output
 Numeric tolerances in a match
 Encrypted passwords for datasources
 CfScript to speed it up