Generating Meaningful Test Databases

Download Report

Transcript Generating Meaningful Test Databases

Multi-RQP
Generating Test Databases for the
Functional Testing of OLTP Applications
Carsten Binnig
Joint work with: Donald Kossmann, Eric Lo
DBTest Workshop, SIGMOD 2008, Vancouver
Motivation
• Today’s testing techniques are not very efficient
– 20-70% of the costs of a SW project are spent for testing
 Costs caused by software errors in the US in 2000: ~$20-$60 bn
• Test automation is not a trivial problem
– Writing “test programs” which verify the application behavior
 Maintainability / Quality of “test programs” is not easy
• Test automation for DB apps is even harder
– Testing a certain behavior needs a particular database state
 Existing tools generate test databases independent from test cases
 How to generate relevant test databases for OLTP apps?
(other work: [ICDE07], [SIGMOD07])
2
Example: Online-Library (1)
Use Case: „Reservation of a book“
1. User enters the inventory number of the book
2. System shows the details of the book
•
Exception 1: Book belongs to the closed stacks
3. User enter her account information
4. System checks account info and finishes reservation
•
•
Exception 2: User account is disabled
Exception 3: User account that has overdue fines
3
Example: Online-Library (2)
Test Case (Expected Behavior) + Test Database
•
Test Case 1 (Exception 1):
– A book that belongs to the closed stack
•
Test Case 2 (Exception 2):
– A book that does not belong to the closed stacks +
– A user account that is disabled
•
Test Case 3 (Exception 3) : …
•
Test Case 4 (Successful Reservation) :
– A book that does not belong to the closed stacks +
– A valid user account with no overdue fines
4
Outline
• Introduction
• State of the Art / Requirements
• (Multi-) Reverse Query Processing (RQP)
• Conclusions / Future Work
5
State of the Art
• General-purpose Database Generators => Random data
over database schema (size of tables, data distributions)
– Low test coverage => Data does not enable execution of all test cases
– High maintenance costs => Manual adaptation of data necessary
• Script-based Database Generators => Application-specific
data (e.g., a bunch of SQL INSERT statements)
– High initial costs => Writing code to generate test database
– Hard to extend => Hard to analyze side-effects
• Data Extractors => Extract data from existing applications
– Test coverage strongly depends on existing data
– High initial effort => Test data needs to be anonymized
6
Requirements for the
Generation of Test Databases
• Specify a test database individually per test case
– High test coverage is possible
– Good extensibility for new test cases
• Allow a declarative specification of the test database
– Maintainability of the specification is good (automatic evolution?)
– Data generation can be optimized (runtime, amount of data)
• Specify only relevant the data for each test case
– Initial costs to specify the test database are low
– Changeability of specification is good
• Enable logical data independence of test data specification
– Database schema can be changed without changing the
specification of the test database
7
Outline
• Introduction
• State of the Art / Requirements
• (Multi-) Reverse Query Processing (RQP)
• Conclusions / Future Work
8
Reverse Query Processing (RQP)
• Problem Statement:
– Given: SQL Query Q, Result R, Database Schema S
– Output: Database D with Q(D)=R and D satisfies S
• Example (Test Case 1): There exists at least one book
which does belong to the closed stacks
Application
Test Case 1
S: CREATE TABLE book (
b_id INTEGER PRIMARY KEY,
b_title VARCHAR (20),
b_closed BOOLEAN NOT NULL,
... )
Q: SELECT COUNT(*) AS cnt
FROM book
WHERE b_closed = 1
R: {< >=1 >}
9
b_title
b_closed
…
1
Title A
1
…
2
Title B
0
…
D: b_id
RQP
RQP Basic Idea
Query Processing:
Reverse Query Processing:
• Input: Database D, Query Q
• Output: Result R
• Input: Query Q, Result R
• Output: Database D
Result
Result
Query Processor
Query
Reverse Query Processor
Query
Database
Database
=> RQP can generate many different databases
10
Query Processing (Simplified)
SQL Query:
χCOUNT(*)
Q: SELECT COUNT(*)
FROM book
WHERE b_closed = 1
Query Result:
Query Plan
(Relational Algebra):
R:
1
as cnt
σb_closed
= 1
book
11
cnt
D:
b_id
b_title
b_closed
…
1
Title A
1
…
b_id
b_title
b_closed
…
1
Title A
1
…
2
Title B
0
…
Reverse Query Processing
(Query Compilation)
Reverse Query Plan
(Reverse Relational Algebra):
χ-1COUNT(*)
Q: SELECT COUNT(*) AS cnt
FROM book
WHERE b_closed = 1
as cnt
σ-1b_closed
book
12
= 1
Data flow
SQL Query:
Reverse Query Processing
(Top-Down Data Generation)
R: {< >=1 >}
R: cnt
1
χ-1COUNT(*)
as cnt
σ-1b_closed
= 1
Data flow
Q:
book
b_id
b_title
b_closed
…
1
Title A
1
…
b_title
b_closed
…
1
Title A
1
…
2
Title B
0
…
D: b_id
S: CREATE TABLE book (
b_id INTEGER PRIMARY KEY,
b_title VARCHAR (20) NOT NULL,
b_closed BOOLEAN NOT NULL,
... )
13
Multi Reverse Query Processing
• Problem:
– One query + result are often not sufficient to specify a test
database for more complex test cases
– Multiple queries + result are necessary
• Example (Test Case 4):
– Book that does not belong to the closed stacks (Q1 und R1) and
– Valid user account without overdue fines (Q2 und R2)
• Idea: Restrict input query classes such that …
– MRQP can be solved efficiently by using RQP
– User can still specify any test database
14
RQP-disjoint Queries
• Idea: Q1/R1 and Q2/R2 specify disjoint data sets
• Example (Test Case 4):
– Book that does not belong to the closed stacks (Q1 and R1) and
– Valid user account without overdue fines (Q2 and R2)
Q1: SELECT COUNT(*) AS cnt
FROM book
WHERE b_closed=0
Q2: SELECT COUNT(*) AS cnt
FROM user
WHERE u_fines=0
R1: {< 1 >}
R2: {< 1 >}
D1:
b_id
b_title
b_closed
…
1
Title A
0
…
D2:
Table ‘book’
u_id
u_name
u_fines
…
1
User A
0
…
Table ‘user’
15
Query-Refinement
• Idea: Q1/R1 specifies a „subset“ of Q2/R2
• Example (Test Case for Use Case “Book Search”):
– Ten books of author ‘Grisham’ (Q1 and R1)
– One of these books should belong to the closed stacks (Q2 and R2)
QQ
1: SELECT COUNT(*) AS cnt
1’:
FROM book
WHERE b_author='Grisham'
AND b_closed <> 1
10>}
>}
RR
1’:
1: {< 9
Q2: SELECT COUNT(*) AS cnt
FROM book
WHERE b_author='Grisham'
AND b_closed = 1
R2: {< 1 >}
D1’:
D2:
b_id
b_author
b_closed
…
2
Grisham
0
…
…
…
…
…
10
Grisham
0
…
Table ‘book’
b_id
b_author
b_closed
…
1
Grisham
1
…
Table ‘book’
16
Outline
• Introduction
• State of the Art / Requirements
• (Multi-) Reverse Query Processing (RQP)
• Conclusions / Future Work
17
Conclusions / Future Work
• Problems of existing test database generators
– Test databases are generated independent of test cases
– Low test coverage, high maintainability costs, …
• (M)RQP to specify and generate test databases
– Test data specification: declarative, minimal, …
– Data generation based on database techniques (e.g., algebra
operators, …)
• Open Research Problems
– Evolution of test databases
– Study usability of MRQP
– …
18
Questions
? snoitseuQ
?
19