X-Data: Test Data Generation for Killing SQL Mutants

Transcript X-Data: Test Data Generation for Killing SQL Mutants

Generating Test Data for
Killing SQL Mutants:
A Constraint Based Approach
Shetal Shah, S. Sudarshan,
Suhas Kajbaje, Sandeep Patidar
Bhanu Pratap Gupta, Devang Vira
CSE Department, IIT Bombay
ICDE 2011
1
Testing SQL Queries: A
Challenge


Complex SQL queries hard to get right
Question: How to check if an SQL query is
correct?


Formal verification is not applicable since we do
not have a separate specification and an
implementation
State of the art solution: manually generate test
databases and check if the query gives the
intended result

Often misses errors
2
ICDE 2011
Example

Find number of high paid instructors in each
department
deptnameGcount(instrID)
deptnameGcount(instrID)
σ sal>100000
dept
σ sal>100000
instructor
dept
instructor
Problem: does not output departments with no high paid instructors
Test dataset should be able to distinguish these two alternatives
ICDE 2011
3
Generating Test Data: Prior
Work

Automated Test Data generation


Initial work based on database constraints alone
Later work: DB constraints + SQL query
 Agenda, Reverse Query Processing (RQP), QAGen
 Our earlier work in ICDE 2010 (poster) on killing SQL
mutants
 More recent (independent) de la Riva et al [AST10]
consider test data generation for killing mutants




But limited space of mutations, no guarantees of
completeness
None of the above guarantee anything about detecting
errors in SQL queries
Question: How do you model SQL errors?
Answer: Query Mutation
4
ICDE 2011
Query Mutations

Mutations: model common programming errors


Join used instead of outerjoin (or vice versa)
Join/selection condition errors




Wrong aggregate (min vs. max)
Mutant: syntactic variation of the given query Q


< vs. <=, missing or extra condition
Mutant may be the intended query
A dataset D kills a mutant M of query Q if output of
Q and M differ on D
For a mutation space, a dataset suite is complete if
all non-equivalent mutants are killed
5
ICDE 2011
Mutation Testing of Queries

Mutation testing can be used to check coverage of a
test suite


Given a test suite, generate mutants in a pre-defined
mutation space; evaluate on a given test suite
Our goal is different:



Generate datasets for the given query, based on possible
mutations
Each dataset and query result on the dataset shown to a
human, who verifies that the query result is what is
expected on that dataset
We do not need to actually generate and execute mutants
6
ICDE 2011
Query and Mutation Space
Query: Single Block SQL Query with unconstrained
aggregation at the top
Mutation Space:
 Join Type Mutation : change of any of inner join, left
outer join, right outer join, full outer join to another



Across all possible join orders (unlike earlier work)
Comparison Operator Mutations: An occurrence of one
of (=, <, >, <=, >=, <>) in the WHERE clause
replaced by another
Unconstrained Aggregation Mutations:

MIN ↔ MAX, SUM ↔ SUM(DISTINCT)
AVG ↔ AVG(DISTINCT), COUNT ↔ COUNT(DISTINCT)
7
ICDE 2011
Killing Join Mutants: Basic Idea

Schema: r(A), s(B)

To kill this mutant: ensure that for some r tuple there is no matching
s tuple
Generated test case



r(A) ={(1)}; s(B) ={}
Basic idea, version 1 [ICDE 2010]
 run query on given database,
 from result extract matching tuples for r and s
 delete s tuple to ensure no matching tuple for r

Limitation: foreign keys, repeated relations
8
ICDE 2011
Need to Ensure Difference in Final
Query Result

Schema: r(A,B), s(C,D), t(E); Extra join above mutated node

To kill this mutant we must ensure that for an r tuple there is no
matching s tuple, but there is a matching t tuple
Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}
Our approach: generate not-matching constraints of the form
∃ r1 in r, t1 in t s.t. (r1.B=t1.E and ∃ si in s s.t. (si.C=r1.A))
and solve using a solver
 Informally, we “nullify” s to kill the above mutation


9
ICDE 2011
Working around Foreign Key
Constraints




teaches
instructor
is equivalent to teaches
instructor if there is a
foreign key from teaches.ID to instructor.ID
BUT: teaches
σ dept=CS(instructor)
is not equivalent to teaches
σ dept=CS(instructor)
Key idea: have a teaches tuple with an instructor not
from CS
Selections and joins can be used to kill mutations

We show that it comes for free when we “nullify” relations
to kill other join mutations or generate data to kill selection
mutations
10
ICDE 2011
Join Orders and Equivalent
Mutations

Schema: r(A,B), s(C,D), t(E), Mutation: r

Any result with a r.B being null will be removed by join with t
Similarly equivalence can result due to selections
But mutation of
s to
s would be non equivalent with join
order (r
t)
s


s

SQL Query may not specify join orders; many join trees possible; the
datasets should kill mutants across all join orders

Note: de la Riva [AST10] do not consider alternative join orders
11
ICDE 2011
Equivalent Trees and Equivalence
Classes of Attributes

Whether query conditions written as




A.x = B.x AND B.x = C.x or as
A.x = B.x AND A.x = C.x
should not affect set of mutants generated
Solution: Equivalence classes of attributes
12
ICDE 2011
Data Generation Algorithm
Assumptions

Only PK/FK constraints, single block SQL queries, only simple
arithmetic expressions, predicates are purely conjunctive (and a
few more in paper)

Algorithm Overview

For each dataset generate constraints.

1.
2.
3.
DB constraints : primary key, foreign key constraints
Query constraints: join conditions, selection conditions, other
than conditions inconsistent with non-matching constraints (see
below)
Constraints to kill mutations: not-matching constraints derived
from join/selection conditions

One constraint per dataset
Generate Test Data

1.
Using a constraint solver (currently CVC3) on above constraints
13
ICDE 2011
Main Procedure:
generateDataSet(query q)

preprocess query tree



build equivalence classes
foreign key closure
generateDataSetForOriginalQuery()
A constraint set with only DB and Query constraints such
that the result set for q is non-empty




killEquivalenceClasses()
killOtherPredicates()
killComparisonOperators()
killAggregates()
14
ICDE 2011
Killing Join Mutants: Equijoin
(killEquivalenceClasses)

Query : r



t where r.A = s.B and s.B =t.C
r.A, s.B, t.C are in one equivalence class
Generate one dataset for each element of each
equivalence class
Three datasets generated, with not-matching
mutation constraints:




s
D1: ∃r1εr ∃s1εs (r1.A = s1.B ∧ ∃tiεt (ti.C=r1.A))
D2: ∃s1εs ∃t1εt (s1.B = t1.C ∧ ∃riεr (ri.A=s1.B))
D3: ∃r1εr ∃t1εt (r1.A = t1.C ∧ ∃siεs (si.B=t1.C))
But if t.C is an FK referencing s.B, instead generate

D3’: ∃r1εr (∃tiεt, siεs (si.B=r1.A ∧ ti.C=r1.A))
15
ICDE 2011
Comparison Operation
Mutations



Example of comparison operation mutations:
A < 5 vs. A <= 5 vs. A > 5 vs A >= 5 vs. A=5,
vs A <> 5
Idea: generate three datasets (leaving rest of
query unchanged), one each for:




A<5
A=5
A>5
These datasets will kill all above mutations
16
ICDE 2011
Aggregation Operation
Mutations





Aggregation operations
 count(A) ↔ count(distinct A); sum(A) ↔ sum(distinct A)
 avg(A) ↔ avg(distinct A); min(A) ↔ max(A)
Eg: select min (salary) from instructor group by dept_name;
Dataset Generated:
[CS, Srinivasan, 80000]
One group with two values equal (and <> 0),
[CS, Wu, 80000]
Third different
[CS, Alex, 65000]
will kill each of the above pairs of mutations
Issues:
 Database/query constraints forcing A to be unique
 Database/query constraints forcing G to be unique
Carefully crafted set of constraints, which are relaxed to handle
such cases
 Details in paper
17
ICDE 2011
On Completeness and Complexity




With respect to query size:
 Number of datasets generated: linear
 Number of mutants: exponential
Theorem: The problem of generating data to kill a mutant is NPhard
 reduction from query containment; polynomial in special cases.
Theorem: For the class of queries, and the space of join-type
and selection mutations considered, the suite of datasets
generated by our algorithm is complete, i.e., the datasets kill all
non-equivalent mutations of a given query
Also complete for a restricted classes of aggregation mutations
with aggregation as top operation of tree, under some restrictions
on joins in input
18
ICDE 2011
Performance Results


University database schema from Database System
Concepts 6th Ed consisting of 7 relations
Further Optimizations : Unfolding of bounded first
order quantified constraints


E.g. “for all i in 1 to 4, P(i)” unfolded as
P(1) ∧ P(2) ∧ P(3) ∧ P(4)
Also tested with addition of input database
constraints

Forcing output tuples to come from input database
19
ICDE 2011
Results for inner join queries




# Datasets generated small; # mutants (killed) grows exponentially
Unfolding of constraints gives substantial benefits
Increasing # foreign keys reduces the total time taken
Results similar for queries with left/right outer join
20
ICDE 2011
Results for queries with
selections, aggregations


Unfolding reduces times taken, epsecially when the number of
tuples are large (aggregations with joins)
(Not shown above) using input database increases time taken
but not by much
21
ICDE 2011
Conclusions and Future Work

Presented a principled approach to data generation
for testing queries


We’ve only made first steps; more work to be done:





And showed it works efficiently for a large class of queries
Constrained aggregations and sub-queries
Other mutations: e.g., missing/extra join and selection
conditions
Multiple queries
Query and form parameters
Acknowledgements


Bhupesh Chawda for discussions/implementation help
Microsoft Research India for their generous travel support
22
ICDE 2011
Questions
Thank You
ICDE 2011
23
Related Work



Tuya and Suarez-Cabal [IST07], Chan et al.
[QSIC05] defined a class of SQL query mutations
Shortcoming: do not address test data generation
More recently (and independent of our work) de la
Riva et al [AST10] address data generation using
constraints, with the Alloy solver

Do not consider alternative join orders, No
completeness results, Limitations on constraints
24
ICDE 2011
Dataset for Original Query





Array of tuples of constraint variables, per relation
One or more constraint tuple from array, for each occurrence of a
relation
 plus occurrences to ensure f.k. constraints
Equality conditions between variables based on equijoins
Other selection and join conditions become constraints
Constraints for primary and foreign keys
 f.k. from R.A to S.B


p.k. on R.A



FORALL i EXISTS j (O_R[i].A = O_S[j].B);
FORALL i FORALL j (O_R[i].A = O_R[j].A]) => “all other attrs equal”
since range is over finite domain, p.k. and f.k. constraints can be
unfolded
See sample set of constraints in file cvc3_0.cvc
25
ICDE 2011
Killing Join Mutants: Equijoin


killEquivalenceClasses()
for each equivalence class ec do


Let allRelations := Set of all <rel, attr> pairs in ec
for each element e in allRelations do
 conds := empty set
 Let e := R.a
 S := (set of elements in ec which are foreign keys
referencing R.a directly or indirectly) UNION R:a
 P := ec - S
 if P:isEmpty() then


continue
else … main code for generating constraints (see next
slide)
26
ICDE 2011
Killing Join Mutants:
EquiJoins




conds.add(generateEqConds(P))
conds:add(
“NOT EXISTS i: R[i].a = ” + cvcMap(P[0]))
for all other equivalence classes oe do


for each other predicate p do




conds.add(generateEqConds(oe))
conds:add(cvcMap(p))
conds.add(genDBConstraints()) /*P.K. and F.K*/
callSolver(conds)
if solution exists then

create a dataset from solver output
27
ICDE 2011
Killing Other Predicates


Create separate dataset for each attribute in
predicate
e.g. For Join condition B.x = C.x + 10

Dataset 1 (nullifying B:x):


ASSERT NOT EXISTS (i : B_INT) : (B[i].x = C[1].x +
10);
Dataset 2 (nullifying C:x):

ASSERT NOT EXISTS (i : C_INT) : (B[1].x = C[i].x +
10);
28
ICDE 2011

X-Data: Test Data Generation for Killing SQL Mutants

Transcript X-Data: Test Data Generation for Killing SQL Mutants

Directory