PPT - Personal Web Pages
Download
Report
Transcript PPT - Personal Web Pages
Generating Program Inputs for
Database Application Testing
Kai Pan, Xintao Wu
University of North Carolina at Charlotte
Tao Xie
North Carolina State University
26th IEEE/ACM International Conference on Automated Software Engineering
Nov 11, 2011 Lawrence, Kansas
Background
Functional Testing
Test Generation
Program Inputs
2
Background
Functional Testing
Test Generation
Program Inputs
3
Database States
An Example
Program inputs
Database
4
Motivation
5
Benefits to use an existing database state
Represent real-world objects’ characteristics,
helping detect
faults that could cause failures in real-world
settings
Reduce cost of generating new database
records
6
Dynamic Symbolic Execution (DSE)
Execute the program in both concrete and
symbolic way (also called concolic testing)
Collect constraints along executed path as
path condition
Negate part of the path condition and solve
the new path condition to lead to new path
DSE tools for various program languages
Pex for .NET from Microsoft Research
7
Motivation
Path Condition:
C1: Query construction constraints
8
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
9
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
10
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
C1 ^ C2 ^ C3
11
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
C1 ^ C2 ^ C3
12
A hard part
Motivation
How to derive high-covering
program input values based on
a given database state?
13
Outline
Background
Approach
Evaluation
Conclusion and future work
14
SQL query forms
Fundamental structure: SELECT, FROM, WHERE,
GROUP BY, and HAVING clauses.
SELECT select-list
FROM from-list
WHERE qualification
(GROUP BY grouping-list)
(HAVING group-qualification)
15
SQL query forms (cont’d)
Nested query: a query with another query embedded
within it
Nested query can be unnested into equivalent single level
canonical queries
transoformation rules
SELECT S.sname
SELECT S.sname
FROM Sailors S
FROM
WHERE EXISTS ( SELECT *
WHERE R.sid=S.sid AND R.bid=103
Sailors S, Reserves R
FROM Reserves R
WHERE R.bid=103 AND
R.sid=S.sid)
16
A nested query
Its canonical form
SQL query forms of focus
WHERE clause consisting of a disjunction of
conjunctions
SELECT C1, C2, ..., Ch
FROM from-list
WHERE (A11 AND ... AND A1n) OR ...
OR (Am1 AND ... AND Amn)
17
Outline
Background
Approach
Evaluation
Conclusion and future work
18
Illustrative example
19
Apply DSE on the existing database
Step1: DSE chooses “ type=0, zip=0 ”
executed query:
Q1: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=1
AND C.SSN=M.SSN
Execution of Q1 zero record,
not covering loop body
20
Apply DSE on the existing database (cont’d)
Step2: DSE flips “type == 0” to “type != 0”
“type=1, zip=0”
executed query:
Q2: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=30 AND C.zipcode=1
AND C.SSN=M.SSN
Execution of Q2 zero record
not covering loop body
21
Apply DSE on the existing database (cont’d)
However,
An input like “type=0, zip=27694”
executed query:
Q3: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND
C.zipcode=27695
AND C.SSN=M.SSN
Execution of Q3 one record
{C.SSN = 001, C.income = 50000, M.balance =
20000}.
22
Covering Line14=true and
Line18=false
Apply DSE on the existing database (cont’d)
Furthermore,
An input like “type=0, zip=28222”,
executed query:
Q4: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND
C.zipcode=28223
AND C.SSN=M.SSN
Execution of Q4 one record
{C.SSN = 002, C.income = 150000, M.balance
= 30000}.
As a result, Line14=true and
23
Line18=true
Assist DSE to generate program inputs
How to derive high-covering
program input values based on
a given database state?
24
Our idea: construct auxiliary queries
Auxiliary query :
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
e.g., result set includes “fzip=27695”.
From “fzip=zip+1”, we derive “zip=27694”!
25
Our idea: construct auxiliary queries (cont’d)
Auxiliary query :
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
e.g., result set includes “fzip=27695”.
true
From “fzip=zip+1”, we derive “zip=27694”!
false
Cover Line14=true and Line18=false!
26
Our idea: construct auxiliary queries (cont’d)
Auxiliary query :
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
e.g., result set includes “fzip=27695”.
true
From “fzip=zip+1”, we derive “zip=27694”!
false
Cover Line14=true and Line18=false!
27
Act like “Constraint Solver” for
Program Constraints +DB State Constraints
Approach
Collect query construction
constraints
on program variables used in
the executed queries from
the program code
28
Approach (cont’d)
Collect query construction
constraints
on program variables used in
the executed queries from
the program code
Collect result manipulation
constraints
on comparing with record
values in the query’s result
set (such as “if (diff>100000)”)
29
Construct auxiliary queries
For path “Line04=true, Line14=true”,
construct the abstract query:
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
true
30
Construct auxiliary queries
For path “Line04=true, Line14=true”,
construct the abstract query:
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Our target
true
31
Construct auxiliary queries
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Construct auxiliary query
SELECT C.zipcode
32
true
Construct auxiliary queries
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Construct auxiliary query
SELECT C.zipcode
FROM customer C, mortgage M
33
true
Construct auxiliary queries
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Construct auxiliary query
SELECT C.zipcode
FROM customer C, mortgage M
WHERE M.year=15 AND C.SSN=M.SSN
34
true
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
fzip:27695 or 28223
35
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
fzip: 27695 or 28223
zip: 27694 or 28222
36
Generate program input values
Input combinations:
type: 0 or !0
X
zip: 27694 or 28222
true
“type=0, zip=27694” covers
Line04=true, Line14=true, but
Line18=false
true
false
37
Approach (cont’d)
Not enough!
Program variables in
branch condition after
executing the query may
be data-dependent on
returned record values.
How to cover Line18 true
branch?
38
Approach (cont’d)
To cover path
Line04=true,
Line14=true,
Line18=true
We need to extend
previous auxiliary
query
39
true
true
true
Construct auxiliary queries
We extend the WHERE clause
true
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
(----how to extend?----)
true
true
40
Construct auxiliary queries
We extend the WHERE clause
true
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
(----how to extend?----)
true
true
41
Construct auxiliary queries
We extend the WHERE clause
true
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN AND
C.income - 1.5 * M.balance > 100000
true
true
42
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
AND C.income - 1.5 *
M.balance > 100000
fzip=28223
43
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
AND C.income - 1.5 *
M.balance > 100000
fzip=28223
44
zip=28222
Other issues (aggregate calculation)
Extend auxiliary
query with GROUP
BY and HAVING
clauses.
Involve multiple records
45
Other issues (aggregate calculation)
SELECT C.zipcode,
sum(M.balance)
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
AND C.income - 1.5 *
M.balance > 100000
GROUP BY C.zipcode
HAVING sum(M.balance) >
500000
46
Other issues (cardinality constraints)
SELECT C.zipcode
FROM customer C, mortgage M
WHERE M.year=15 AND C.SSN=M.SSN
AND C.income - 1.5 * M.balance > 100000
GROUP BY C.zipcode
HAVING COUNT(*) >= 3
Use a special DSE technique for dealing with inputdependent loops
47
P. Godefroid and D. Luchaup. Automatic partial loop summarization in dynamic test
generation. In ISSTA 2011.
Outline
Background
Approach
Evaluation
Conclusion and future work
48
Research questions
RQ1 (Effectiveness): What is the percentage
increase in code coverage by the program inputs
generated by Pex with our approach’s
assistance?
RQ2 (Cost): What is the cost of our approach’s
assistance?
49
Evaluation subjects
Two open source database applications
RiskIt
4.3K LOC, database: 13 tables, 57 attributes,
and >1.2 million records
17 DB-interacting methods selected for testing
UnixUsage
2.8K LOC, database: 8 tables, 31 attributes, and
50
>0.25 million records
28 DB-interacting methods selected for testing
Evaluation setup
Measurement for test generation
effectiveness: code coverage
cost: number of runs/paths, execution time
Procedure
run Pex w/o our approach’s assistance
perform our algorithms to generate new
additional test inputs
51
Evaluation results: RiskIt
Higher code coverage
52
Evaluation results: RiskIt
Low additional cost
53
Pex (only) timeout: 120 seconds
Even given longer time, no new coverage observed for Pex (only)
Evaluation results: RiskIt
54
Pex (only) timeout: 120 seconds
Even given longer time, no new coverage observed for Pex (only)
Evaluation results: UnixUsage
Preliminary Evaluation(cont’d)
Summary of evaluation results
RQ1: Effectiveness
RiskIt: 26% higher block coverage over Pex only
UnixUsage: 35% higher block coverage over Pex only
RQ2: Cost
RiskIt:
#runs/paths: 131 more over 1135 (Pex)
execution time: 517 secs more over 1781 (Pex)
UnixUsage
#runs/paths: 93 more over 1197 (Pex)
execution time: 580 secs more over 1718 (Pex)
56
Outline
Background
Approach
Evaluation
Conclusion
57
Conclusion
A new approach that formulates auxiliary queries
to bridge gap between program/DB constraints.
Act like a “constraint solver” for
program constraints + DB constraints
Empirical evaluations on 2 open source DB apps
our approach can assist DSE to generate program
inputs effectively achieving higher code coverage
with low additional cost.
58
Future Work
To construct auxiliary queries directly from
embedded complex queries (e.g., nested
queries), rather than from their
transformed norm forms.
To handle complex program context such
as multiple queries.
59
Thank you! Questions?
Acknowledgment: This work was supported in part by U.S. National
Science Foundation under CCF-0915059 for Kai Pan and Xintao Wu,
and under CCF-0915400 for Tao Xie.
60
Related Work
All previous related work addresses a
different problem: constructing both
program inputs and database states (from
scratch)
M. Emmi, R. Majumdar, and K. Sen. Dynamic test input
generation for database applications. In ISSTA, 2007.
K. Taneja,Y. Zhang, and T. Xie. MODA: Automated test generation
for database applications via mock objects. In ASE, 2010.
61