powerpoint slides

Download Report

Transcript powerpoint slides

A Framework for Testing
Database Applications
David Chays
Polytechnic University
Brooklyn, NY
Joint work with
Phyllis G. Frankl (Polytechnic)
Saikat Dan (Polytechnic)
Filippos Vokolos (Lucent Technologies)
Elaine J. Weyuker (AT&T Labs - Research)
Motivation
• Database systems play an important role in
virtually every modern organization
• Faults can be very costly
• Programmers/testers may lack experience
and/or time
• Little attention has been paid to DB
application program correctness
Outline of Talk
•
•
•
•
•
•
Background
Aspects of DB system correctness
Issues in testing DB application programs
Architecture of tool set
Tool for generating database states
Additional issues and approaches
DBMS and DB application
Database
Management
System
DB
DB schema, eg.,
Emp(ssn, name, addr, sal)
Dept(id, dept-name)
DB application, eg.,
/* C program with
embedded SQL*/
Relational databases
• Data is viewed as a collection of relations
– relation schema
– relation (relation state)
• Tables, tuples, attributes, constraints
for example, create table S (ssn char(11) primary key,
name char(25) not null)
Table S
ssn
001-00-0356
012-34-5678
036-54-5555
051-88-9911
name
Johnson
Smith
Jones
Blake
Aspects of Correctness
•
•
•
•
Does the DBMS perform all operations correctly?
Is concurrent access handled correctly?
Is the system fault-tolerant?
...
Does the application program behave as intended?
Traditional vs. DB programs
input
input
DB state
output
output
DB state
• function
• imperative nature
• function
• declarative nature
Example of an Informal Specification
• Customer-feature
table:
–
–
–
–
customerID
address
features
...
• Billing table
– customerID
– billing plan
– ...
Input customer ID and name of feature to which the
customer wishes to subscribe.
Invalid ID: return 0
feature unavailable in that area: return code 2
feature available but incompatible with existing features:
return code 3
else update customer’s feature record, update billing
table, return code 1
What are the Input/Output Spaces?
• Naïve approach
– I = {customer-IDs} X {feature-names}
– 0 = {0,1,2,3}
• More suitable approach:
– I = {customer-IDs} X {feature-names} X
{database-states}
– 0 = {0,1,2,3} X {database-states}
• Problem:
– must control and observe the DB state
DB Application Testing Goal
• Select “interesting” DB states along with
user inputs that exercise “interesting”
behavior
• Cover wide variety of situations that could
arise in practice
• Do so in a way that facilitates checking of
output to user and resulting DB state
Situations to Explore
• Customer already subscribes to that feature
• Feature not available in customer’s area
• Feature available, but incompatible with
other features customer already has
• Feature available and compatible with
existing features
• Customer doesn’t yet subscribe to any
features
• ...
May involve interplay between several tables
• Table 1:
incompatible features
feature incompatible_feature
F1
F2
...
...
• Table 2: features available
in various areas
• Table 3: customers
and features
ID
011
...
feature area
F1
11235
F2
11235
...
...
area
F1 F2 ... FN
11235
...
Will Live Data Suffice?
• May not reflect sufficiently wide variety of
situations
• May be difficult to find the situations of
interest
• May violate privacy or security constraints
Generating Synthetic Data
• DB state is a collection of relation states,
each of which is a subset of the Cartesian
product of some domains
• Generating domain elements and gluing
them together isn’t enough, since
constraints must be honored
• We attempt to generate interesting data that
obey integrity constraints
• Use schema and user supplied info
DB schema
App source
Suggestions
from tester
State Generator
Input Generator
DB state
State Checker
User input
Output Checker
App exec
Output
Results
DB state generator
• Inputs DB schema (in SQL)
• Parses schema to derive info about
– attributes
– tables
– constraints : uniqueness, not-NULL,
referential integrity
– inputs additional info from user
– suggested attribute values, divided into groups,
similar to Category-Partition Testing [OstrandBalcer]
– additional annotations
Example Schema
create table s (sno char(5), sname char(20), status decimal(3),
city char(15), primary key(sno));
create table p (pno char(6) primary key, pname char(20),
color char(6), weight decimal(3), city char(15));
create table sp (sno char(5), pno char(6), qty decimal(5),
primary key(sno,pno),
foreign key(sno) references s,
foreign key(pno) references p);
Stmt
Create table s( sno char(5), primary key(sno) );
Column Definition
Table Constraint
Create Stmt
Nodetag type = T_ColumnDef
Nodetag type = T_CreateStmt
colname = “sno”
relname = “s”
type name = “bpchar”
Nodetag type = T_Constraint
contype = CONSTR_PRIMARY
keys
Constraints = NIL
T_IDENT
name = “sno”
Stmt
Create table s( sno char(5) primary key );
Column Definition
Create Stmt
Nodetag type = T_ColumnDef
Nodetag type = T_CreateStmt
colname = “sno”
relname = “s”
type name = “bpchar”
Constraints
contype =
CONSTR_PRIMARY
0
globalTablePointer
1
2
3
sno | F|
| SF|| char
F| F||F|prF| |F|un | ~nn
sname | F|
| SF|| char
F| F|| F|
~pr
F||F|
~un | ~nn
status | |F|SF|
| dec
F| F|| ~pr
F| F|| F|
~un | ~nn
city
City | F|
| SF|| char
F| F|| F|
~pr
F| |F|
~un | ~nn
cp
cp
cp
cp
pno || F|
P |F|char
F| F|| F|
pr F|| F|un | ~nn
pname | F|
P |F|
char
F| F|
| ~pr
F| F|| ~un
F| | ~nn
color
color || F|
P |F|char
F| F|| ~pr
F| F|| ~un
F| | ~nn
weight
weight|| F|
P |F|dec
F| F|
| ~pr
F| F|| ~un
F| | ~nn
city | P | char | ~pr | ~un | ~nn
cp
cp
cp
cp
cp
S |4|
0
1
P |5|
2
3
4
SP | 3 |
0
1
Null
2
sno |SP | char | pr | un | ~nn | foreign cp
pno |SP | char | pr | un | ~nn | foreign cp
qty |SP | dec | ~pr | ~un | ~nn
cp
Selecting Attribute Values
• Initial prototype queries tester for suggested
values and guidance on how to use those
values
• Values may be partitioned into data groups
(choices)
• Tester may specify probabilities for data
groups
--choice_name: low
10
20
30
-----choice_name: medium
300
400
-----choice_name: high
5000
6000
Each category (column) can have a list
of choices pointed to by cp.
cp
low
medium
high
10
300
5000
20
400
6000
30
DB table generation
• Tester specifies table sizes
• Tool generates tuples for insertion
– select data group or NULL, guided by annotations
– select value from data group, obeying constraints
– keep track of values used
• Outputs sequence of SQL insert statements
Input files for Parts-Supplier database
sno:
--choice_name: sno
S1
S2
S3
S4
S5
sname:
--choice_name: sname
Smith
Jones
Blake
Clark
Adams
pname:
--choice_name: interior
seats
airbags
dashboard
-----choice_name: exterior
doors
wheels
bumper
pno:
--choice_name: pno
P1
P2
P3
P4
P5
status:
--choice_name: status
--null_prob: 50
0
1
2
3
color:
--choice_name: color
blue
green
yellow
city:
--choice_name: domestic
--choice_prob: 90
Brooklyn
Florham-Park
Middletown
-----choice_name: foreign
--choice_prob: 10
London
Bombay
weight:
--choice_name: weight
100
300
500
city:
--choice_name: domestic
--choice_prob: 90
Brooklyn
Florham-Park
Middletown
-----choice_name: foreign
--choice_prob: 10
London
Bombay
status:
--choice_name: status
--null_prob: 50
0
1
2
3
A database state produced by the tool
Table s
sno
S1
S2
S3
S4
sname
NULL
Smith
Jones
Blake
status
0
1
NULL
NULL
Table sp
city
Brooklyn
Florham-Park
London
Middletown
Table p
pno
P1
P2
P3
pname
NULL
Seats
airbags
color
blue
green
yellow
weight
100
300
500
city
Brooklyn
Florham-Park
Middletown
sno
S1
S1
S1
S2
S2
S2
S3
S3
S3
S4
pno
P1
P2
P3
P1
P2
P3
P1
P2
P3
P1
qty
5000
300
10
6000
400
5000
20
300
30
6000
Related work
• Lyons-77, DB-Fill, TestBase
• Like our approach, rely on user to supply
attribute values
• Do not handle integrity constraints as
completely
• Require tester to describe tables in specialpurpose language (rather than SQL)
Testing Techniques in DB literature
• Focus on DB system performance, rather
than DB application correctness
• Benchmarks
• Performance of SQL processor
– Generation of large number of DML statements
[Slutz]
• Generation of huge tables with given
statistical properties [Grey et al]
Summary
• Issues
• Framework
• Prototype
Future Work
• Refinement based on feedback from DB
application developers / testers
• Other DB state generation heuristics
– boundary values
– “missing” constraints
– difficult SQL features
• Interplay between DB state and user inputs
• Checking DB state after test execution
• Checking application outputs