PPT - Personal Web Pages

Download Report

Transcript PPT - Personal Web Pages

Generating Program Inputs for
Database Application Testing
Kai Pan, Xintao Wu
University of North Carolina at Charlotte
Tao Xie
North Carolina State University
26th IEEE/ACM International Conference on Automated Software Engineering
Nov 11, 2011 Lawrence, Kansas
Background
Functional Testing
Test Generation
Program Inputs
2
Background
Functional Testing
Test Generation
Program Inputs
3
Database States
An Example
Program inputs
Database
4
Motivation
5
Benefits to use an existing database state
 Represent real-world objects’ characteristics,
helping detect
 faults that could cause failures in real-world
settings
 Reduce cost of generating new database
records
6
Dynamic Symbolic Execution (DSE)
 Execute the program in both concrete and
symbolic way (also called concolic testing)
 Collect constraints along executed path as
path condition
 Negate part of the path condition and solve
the new path condition to lead to new path
 DSE tools for various program languages
 Pex for .NET from Microsoft Research
7
Motivation
Path Condition:
C1: Query construction constraints
8
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
9
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
10
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
C1 ^ C2 ^ C3
11
Motivation
Path Condition:
C1: Query construction constraints
C2: Query/DB constraints
C3: Result manipulation constraints
C1 ^ C2 ^ C3
12
A hard part
Motivation
How to derive high-covering
program input values based on
a given database state?
13
Outline
 Background
 Approach
 Evaluation
 Conclusion and future work
14
SQL query forms
 Fundamental structure: SELECT, FROM, WHERE,
GROUP BY, and HAVING clauses.
SELECT select-list
FROM from-list
WHERE qualification
(GROUP BY grouping-list)
(HAVING group-qualification)
15
SQL query forms (cont’d)
 Nested query: a query with another query embedded
within it
 Nested query can be unnested into equivalent single level
canonical queries
transoformation rules
SELECT S.sname
SELECT S.sname
FROM Sailors S
FROM
WHERE EXISTS ( SELECT *
WHERE R.sid=S.sid AND R.bid=103
Sailors S, Reserves R
FROM Reserves R
WHERE R.bid=103 AND
R.sid=S.sid)
16
A nested query
Its canonical form
SQL query forms of focus
 WHERE clause consisting of a disjunction of
conjunctions
SELECT C1, C2, ..., Ch
FROM from-list
WHERE (A11 AND ... AND A1n) OR ...
OR (Am1 AND ... AND Amn)
17
Outline
 Background
 Approach
 Evaluation
 Conclusion and future work
18
Illustrative example
19
Apply DSE on the existing database
Step1: DSE chooses “ type=0, zip=0 ”
 executed query:
Q1: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=1
AND C.SSN=M.SSN
Execution of Q1  zero record,
not covering loop body
20
Apply DSE on the existing database (cont’d)
Step2: DSE flips “type == 0” to “type != 0”
 “type=1, zip=0”
 executed query:
Q2: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=30 AND C.zipcode=1
AND C.SSN=M.SSN
Execution of Q2  zero record
not covering loop body
21
Apply DSE on the existing database (cont’d)
However,
An input like “type=0, zip=27694”
 executed query:
Q3: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND
C.zipcode=27695
AND C.SSN=M.SSN
Execution of Q3  one record
{C.SSN = 001, C.income = 50000, M.balance =
20000}.
22
Covering Line14=true and
Line18=false
Apply DSE on the existing database (cont’d)
Furthermore,
An input like “type=0, zip=28222”,
 executed query:
Q4: SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND
C.zipcode=28223
AND C.SSN=M.SSN
Execution of Q4  one record
{C.SSN = 002, C.income = 150000, M.balance
= 30000}.
As a result, Line14=true and
23
Line18=true
Assist DSE to generate program inputs
How to derive high-covering
program input values based on
a given database state?
24
Our idea: construct auxiliary queries
Auxiliary query :
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
e.g., result set includes “fzip=27695”.
From “fzip=zip+1”, we derive “zip=27694”!
25
Our idea: construct auxiliary queries (cont’d)
Auxiliary query :
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
e.g., result set includes “fzip=27695”.
true
From “fzip=zip+1”, we derive “zip=27694”!
false
Cover Line14=true and Line18=false!
26
Our idea: construct auxiliary queries (cont’d)
Auxiliary query :
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
e.g., result set includes “fzip=27695”.
true
From “fzip=zip+1”, we derive “zip=27694”!
false
Cover Line14=true and Line18=false!
27
Act like “Constraint Solver” for
Program Constraints +DB State Constraints
Approach
 Collect query construction
constraints
 on program variables used in
the executed queries from
the program code
28
Approach (cont’d)
 Collect query construction
constraints
 on program variables used in
the executed queries from
the program code
 Collect result manipulation
constraints
 on comparing with record
values in the query’s result
set (such as “if (diff>100000)”)
29
Construct auxiliary queries
For path “Line04=true, Line14=true”,
construct the abstract query:
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
true
30
Construct auxiliary queries
For path “Line04=true, Line14=true”,
construct the abstract query:
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Our target
true
31
Construct auxiliary queries
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Construct auxiliary query
SELECT C.zipcode
32
true
Construct auxiliary queries
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Construct auxiliary query
SELECT C.zipcode
FROM customer C, mortgage M
33
true
Construct auxiliary queries
true
SELECT C.SSN, C.income, M.balance
FROM customer C, mortgage M
WHERE M.year=15 AND C.zipcode=‘fzip’
AND C.SSN=M.SSN
Construct auxiliary query
SELECT C.zipcode
FROM customer C, mortgage M
WHERE M.year=15 AND C.SSN=M.SSN
34
true
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
 fzip:27695 or 28223
35
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
 fzip: 27695 or 28223
 zip: 27694 or 28222
36
Generate program input values
Input combinations:
type: 0 or !0
X
zip: 27694 or 28222
true
“type=0, zip=27694” covers
Line04=true, Line14=true, but
Line18=false
true
false
37
Approach (cont’d)
Not enough!
Program variables in
branch condition after
executing the query may
be data-dependent on
returned record values.
How to cover Line18 true
branch?
38
Approach (cont’d)
 To cover path
Line04=true,
Line14=true,
Line18=true
We need to extend
previous auxiliary
query
39
true
true
true
Construct auxiliary queries
We extend the WHERE clause
true
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
(----how to extend?----)
true
true
40
Construct auxiliary queries
We extend the WHERE clause
true
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
(----how to extend?----)
true
true
41
Construct auxiliary queries
We extend the WHERE clause
true
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN AND
C.income - 1.5 * M.balance > 100000
true
true
42
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
AND C.income - 1.5 *
M.balance > 100000
 fzip=28223
43
Generate program input values
Run auxiliary query:
SELECT C.zipcode,
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
AND C.income - 1.5 *
M.balance > 100000
 fzip=28223
44
 zip=28222
Other issues (aggregate calculation)
 Extend auxiliary
query with GROUP
BY and HAVING
clauses.
Involve multiple records
45
Other issues (aggregate calculation)
SELECT C.zipcode,
sum(M.balance)
FROM customer C, mortgage M
WHERE M.year=15 AND
C.SSN=M.SSN
AND C.income - 1.5 *
M.balance > 100000
GROUP BY C.zipcode
HAVING sum(M.balance) >
500000
46
Other issues (cardinality constraints)
SELECT C.zipcode
FROM customer C, mortgage M
WHERE M.year=15 AND C.SSN=M.SSN
AND C.income - 1.5 * M.balance > 100000
GROUP BY C.zipcode
HAVING COUNT(*) >= 3
Use a special DSE technique for dealing with inputdependent loops
47
P. Godefroid and D. Luchaup. Automatic partial loop summarization in dynamic test
generation. In ISSTA 2011.
Outline
 Background
 Approach
 Evaluation
 Conclusion and future work
48
Research questions
 RQ1 (Effectiveness): What is the percentage
increase in code coverage by the program inputs
generated by Pex with our approach’s
assistance?
 RQ2 (Cost): What is the cost of our approach’s
assistance?
49
Evaluation subjects
Two open source database applications
 RiskIt
 4.3K LOC, database: 13 tables, 57 attributes,
and >1.2 million records
 17 DB-interacting methods selected for testing
 UnixUsage
 2.8K LOC, database: 8 tables, 31 attributes, and
50
>0.25 million records
 28 DB-interacting methods selected for testing
Evaluation setup
 Measurement for test generation
 effectiveness: code coverage
 cost: number of runs/paths, execution time
 Procedure
 run Pex w/o our approach’s assistance
 perform our algorithms to generate new
additional test inputs
51
Evaluation results: RiskIt
Higher code coverage
52
Evaluation results: RiskIt
Low additional cost
53
Pex (only) timeout: 120 seconds
Even given longer time, no new coverage observed for Pex (only)
Evaluation results: RiskIt
54
Pex (only) timeout: 120 seconds
Even given longer time, no new coverage observed for Pex (only)
Evaluation results: UnixUsage
Preliminary Evaluation(cont’d)
Summary of evaluation results
 RQ1: Effectiveness
 RiskIt: 26% higher block coverage over Pex only
 UnixUsage: 35% higher block coverage over Pex only
 RQ2: Cost
 RiskIt:
 #runs/paths: 131 more over 1135 (Pex)
 execution time: 517 secs more over 1781 (Pex)
 UnixUsage
 #runs/paths: 93 more over 1197 (Pex)
 execution time: 580 secs more over 1718 (Pex)
56
Outline
 Background
 Approach
 Evaluation
 Conclusion
57
Conclusion
 A new approach that formulates auxiliary queries
to bridge gap between program/DB constraints.
 Act like a “constraint solver” for
program constraints + DB constraints
 Empirical evaluations on 2 open source DB apps
 our approach can assist DSE to generate program
inputs effectively achieving higher code coverage
with low additional cost.
58
Future Work
 To construct auxiliary queries directly from
embedded complex queries (e.g., nested
queries), rather than from their
transformed norm forms.
 To handle complex program context such
as multiple queries.
59
Thank you! Questions?
Acknowledgment: This work was supported in part by U.S. National
Science Foundation under CCF-0915059 for Kai Pan and Xintao Wu,
and under CCF-0915400 for Tao Xie.
60
Related Work
All previous related work addresses a
different problem: constructing both
program inputs and database states (from
scratch)
 M. Emmi, R. Majumdar, and K. Sen. Dynamic test input
generation for database applications. In ISSTA, 2007.
 K. Taneja,Y. Zhang, and T. Xie. MODA: Automated test generation
for database applications via mock objects. In ASE, 2010.
61