Bug Localization with Machine Learning Techniques
Wujie Zheng
[email protected]
Background of Bug Localization

- Software is far from bug-free.
- Debugging is the methodical process of finding and correcting the bugs in a program.
- Manual debugging is laborious and expensive; it is possible to automate or semi-automate this process.
- Bug localization is the task of finding a set or a ranking of source code locations that are likely buggy through automatic analysis.
Techniques of Bug Localization

- Program Slicing
- Experimental Methods
- Machine Learning Methods
Program Slicing

- Debugging with program slicing
- Program Dependence Graph (PDG)
  - Control dependence
  - Data dependence
Static Slice vs. Dynamic Slice

- Start from the inputs or the variables that exhibit the failure (not necessarily the root cause) and find all statements that may cause the failure.
- Static slice: all statements that may affect the value of a variable at a program point for any arbitrary execution of the program.
- Dynamic slice: all statements that actually affect the value of a variable at a program point for a particular execution of the program.

Forward Slice vs. Backward Slice

- Forward slice: the statements that are affected by the value of variable v at statement s.
- Backward slice: the statements that affect the value of variable v at statement s.
PDG

#include <stdio.h>

static int add(int a, int b);

int main(void) {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("sum=%d\n", sum);
    printf("i=%d\n", i);
    return 0;
}

static int add(int a, int b) {
    return (a + b);
}
In the PDG illustration (not reproduced in this transcript), control dependence edges are shown in blue and data dependence edges are shown in green. Intraprocedural dependences are shown with solid lines, while interprocedural dependences are shown with dashed lines.
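As a concrete example of a backward slice (my annotation, not from the original slides), the backward slice of sum at the first printf of the program above contains:

int sum = 0;            // directly defines sum
int i = 1;              // affects the loop condition and add(sum, i)
while (i < 11) { ... }  // control dependence: decides whether sum is updated
sum = add(sum, i);      // data dependence on sum
i = add(i, 1);          // feeds later iterations through the loop condition
return (a + b);         // inside add(): computes the value assigned to sum
// Not in the slice: printf("i=%d\n", i); it neither affects sum nor is affected by it.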
Experimental Methods

Delta Debugging

- Generalize and simplify a failing test case to a minimal test case that still produces the failure.
- Separate the test case into two sub-tests, choose the sub-test that fails, and repeat the process.
- When both sub-tests pass, divide into smaller pieces and test again.
- Essentially a binary search algorithm (see the sketch below).
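A minimal sketch of the binary-split loop described above (my illustration: the oracle fails() and the bad-byte setup are invented, and Zeller's full ddmin algorithm additionally raises the granularity when both halves pass):

#include <stdio.h>

/* Toy oracle: the "failure" reproduces whenever the kept slice of the
 * input still contains the bad byte at position BAD. */
#define BAD 13
static int fails(int lo, int hi) { return lo <= BAD && BAD < hi; }

/* Repeatedly split the failing input [lo, hi) in half and keep a
 * failing half, narrowing down to a minimal failing slice. */
int main(void) {
    int lo = 0, hi = 100;
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (fails(lo, mid))      hi = mid;  /* first half still fails  */
        else if (fails(mid, hi)) lo = mid;  /* second half still fails */
        else break;  /* both halves pass: full ddmin would split finer */
    }
    printf("minimal failing slice: [%d, %d)\n", lo, hi);  /* [13, 14) */
    return 0;
}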
Machine Learning Methods

Problem Setting

- A set of failing and passing test cases is observed (the test cases are the samples).
- The statements are instrumented and execution traces are collected (the statements are the features).
- Find a set or a ranking of the statements that are likely buggy.
Machine Learning Methods

- Direct correlation analysis: build a classification model of the test cases (feature selection)
  - Tarantula
  - LIBLIT05
  - Logistic regression
  - Decision tree
- Build a behavior model of the statements from passing test cases, and then check for violations in failing test cases
  - Dynamic invariants
  - SOBER
  - PPDG
Tarantula

- Visualization: each statement is colored on a green-to-red scale according to the ratio of passing to failing test cases that execute it, so the reddest statements are the most suspicious.
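Underlying the colors is a suspiciousness score; a minimal sketch of the standard Tarantula formula (the formula is from Jones and Harrold's papers rather than spelled out on this slide, and the counts in main are invented):

#include <stdio.h>

/* Tarantula suspiciousness of one statement, given how many passing
 * and failing test cases execute it; statements are ranked by score. */
static double suspiciousness(int passed_s, int failed_s,
                             int total_passed, int total_failed) {
    double p = (double)passed_s / total_passed;  /* %passed(s) */
    double f = (double)failed_s / total_failed;  /* %failed(s) */
    if (p + f == 0.0) return 0.0;                /* never executed */
    return f / (p + f);
}

int main(void) {
    /* Executed by 1 of 8 passing runs and 4 of 4 failing runs. */
    printf("%.3f\n", suspiciousness(1, 4, 8, 4));  /* prints 0.889 */
    return 0;
}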
Logistic regression

- The logistic function maps a weighted sum of the statement features to a probability of failure.
- Fit the coefficients by maximizing the likelihood of the training set.
- A regularization term penalizes large coefficients.
- The input features are normalized first.
- The larger the resulting coefficient of a statement is, the more suspicious the statement is.
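The formulas referenced above are the standard ones (my reconstruction, with x_i the normalized coverage vector of test case i, y_i = 1 for a failing run, and w the per-statement coefficients):

sigma(z) = 1 / (1 + exp(-z))

maximize over w:
    sum_i [ y_i * log sigma(w . x_i) + (1 - y_i) * log(1 - sigma(w . x_i)) ] - lambda * ||w||^2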

Dynamic invariants

- Given a set of test cases, mine the implicit rules over the variables.
- The templates:
  - Invariants over a single variable: x = a, ...
  - Invariants over two numeric variables: y = ax + b, ...
  - Invariants over three numeric variables: z = ax + by + c, ...
  - ...
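A minimal sketch of the simplest template, x = a, in the style of Daikon (the runs and values are invented; real invariant detectors track many templates per program point and check the learned invariants against the failing runs):

#include <stdio.h>

/* Learn the "x = a" template from passing runs, then report a
 * violation if the failing run breaks the learned invariant. */
int main(void) {
    int passing_x[] = {7, 7, 7, 7};  /* x at some program point */
    int failing_x = 9;
    int n = sizeof passing_x / sizeof passing_x[0];

    int constant = 1;
    for (int i = 1; i < n; i++)
        if (passing_x[i] != passing_x[0]) constant = 0;

    if (constant && failing_x != passing_x[0])
        printf("invariant x = %d violated (x = %d in failing run)\n",
               passing_x[0], failing_x);
    return 0;
}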
SOBER

- SOBER [Liu05] estimates the probability density function of the evaluation bias of a predicate P on passing runs and on failing runs respectively.
- The bug relevance score of P is then defined as the difference between the two distributions.
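A sketch of the key quantity, the per-run evaluation bias of P, with an intentionally crude score (SOBER proper compares the two bias distributions with a statistical hypothesis test, and the counts below are invented):

#include <stdio.h>

/* Evaluation bias of predicate P in one run: the fraction of its
 * evaluations in that run that were true. */
static double bias(int n_true, int n_false) {
    return (double)n_true / (n_true + n_false);
}

int main(void) {
    /* (n_true, n_false) counts of P in 3 passing and 3 failing runs. */
    int pass[3][2] = {{1, 9}, {2, 8}, {0, 10}};
    int fail[3][2] = {{9, 1}, {8, 2}, {7, 3}};
    double mp = 0, mf = 0;
    for (int i = 0; i < 3; i++) mp += bias(pass[i][0], pass[i][1]) / 3;
    for (int i = 0; i < 3; i++) mf += bias(fail[i][0], fail[i][1]) / 3;
    printf("mean-bias gap of P: %.2f\n", mf - mp);  /* prints 0.70 */
    return 0;
}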
Probabilistic Program Dependence Graph (PPDG)

- Conditional independence
  - The state of a statement is independent of the previous statements conditioned on its immediate parents.
- Dependence network
  - Allows cycles, suitable for loops in programs.
- Usage in debugging
  - The distributions of conditional probabilities are learned from the passing runs.
  - The lower the conditional probability of a state is, the more suspicious the statement it belongs to is.
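A toy version of the debugging step for a single dependence edge (my simplification: a real PPDG learns a conditional distribution per node over abstract statement states, and the observations below are invented):

#include <stdio.h>

/* Estimate P(child = c | parent = p) for binary states by counting
 * over passing runs, then flag a rarely-seen state in a failing run. */
int main(void) {
    /* (parent_state, child_state) observations from passing runs. */
    int obs[][2] = {{0,0},{0,0},{0,0},{1,1},{1,1},{0,0},{1,1},{0,0}};
    int n = sizeof obs / sizeof obs[0];
    int count[2][2] = {{0,0},{0,0}}, parent[2] = {0,0};
    for (int i = 0; i < n; i++) {
        count[obs[i][0]][obs[i][1]]++;
        parent[obs[i][0]]++;
    }
    int p = 1, c = 0;  /* state observed in the failing run */
    double prob = (double)count[p][c] / parent[p];
    printf("P(child=%d | parent=%d) = %.2f -> %s\n", c, p, prob,
           prob < 0.1 ? "suspicious" : "ordinary");
    return 0;
}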
Our Work

- Belongs to the group "build a classification model of the test cases (feature selection)".
- Association Rule Mining
- The procedure (a minimal sketch follows the list):
  - Mine all the strong (high-support and high-confidence) association rules X => fail.
  - Select the rule X' => fail with the highest confidence (the best classification model of this kind).
  - Output the statement set X' as the result.
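The sketch below mines only single-statement rules {s executed} => fail over an invented coverage matrix; the actual procedure mines multi-statement itemsets (e.g., with an Apriori-style search), which is omitted here for brevity:

#include <stdio.h>

#define T 6  /* test cases  */
#define S 4  /* statements  */

int main(void) {
    /* cov[t][s] = 1 if test t executes statement s. */
    int cov[T][S] = {{1,1,0,1},{1,0,1,1},{1,1,1,0},
                     {1,1,0,0},{1,0,1,1},{1,1,1,1}};
    int fails[T] = {0, 0, 0, 1, 0, 1};

    for (int s = 0; s < S; s++) {
        int covered = 0, both = 0;
        for (int t = 0; t < T; t++) {
            covered += cov[t][s];
            both    += cov[t][s] && fails[t];
        }
        /* support = P(s executed and fail); confidence = P(fail | s executed) */
        double support    = (double)both / T;
        double confidence = covered ? (double)both / covered : 0.0;
        printf("{s%d} => fail  support=%.2f  confidence=%.2f\n",
               s, support, confidence);
    }
    return 0;
}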
- Problems:
  - It is hard to justify that such a rule-based model (a single conjunction) is suitable.
  - The resulting rule may contain many statements; a further ranking scheme is still needed.
  - High computational overhead.
Our Work

- Belongs to the group "build a classification model of the test cases (feature selection)".
- Feature Subset Selection
- The procedure (a sketch follows the list):
  - Iteratively evaluate a candidate subset of features, then modify the subset and evaluate whether the new subset is an improvement over the old one.
  - Greedy forward selection.
  - Evaluating the subsets requires a scoring metric that grades a subset of features.
  - F-measure (the harmonic mean of precision and recall, 2 / (1/precision + 1/recall)).
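A sketch of greedy forward selection scored by F-measure; the classifier here (predict "fail" iff a test covers every selected statement) is my assumption to keep the example concrete, and the coverage data is invented:

#include <stdio.h>

#define T 6
#define S 4

static int cov[T][S] = {{1,1,0,1},{1,0,1,1},{1,1,1,0},
                        {1,1,0,0},{1,0,1,1},{1,1,1,1}};
static int fails[T] = {0, 0, 0, 1, 0, 1};

/* F-measure of "predict fail iff the test covers every selected statement". */
static double fmeasure(const int sel[S]) {
    int tp = 0, fp = 0, fn = 0;
    for (int t = 0; t < T; t++) {
        int pred = 1;
        for (int s = 0; s < S; s++)
            if (sel[s] && !cov[t][s]) pred = 0;
        if (pred && fails[t]) tp++;
        else if (pred)        fp++;
        else if (fails[t])    fn++;
    }
    if (tp == 0) return 0.0;
    double prec = (double)tp / (tp + fp), rec = (double)tp / (tp + fn);
    return 2.0 / (1.0 / prec + 1.0 / rec);
}

int main(void) {
    int sel[S] = {0};
    double best = fmeasure(sel);
    for (;;) {  /* greedily add the statement that improves F the most */
        int pick = -1;
        for (int s = 0; s < S; s++) {
            if (sel[s]) continue;
            sel[s] = 1;
            double f = fmeasure(sel);
            sel[s] = 0;
            if (f > best) { best = f; pick = s; }
        }
        if (pick < 0) break;  /* no remaining statement improves the score */
        sel[pick] = 1;
        printf("selected s%d, F = %.2f\n", pick, best);
    }
    return 0;
}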
- Problems:
  - Too simple; it is just a traditional feature selection method.
How to improve the work?

- Adopt the association rule mining method to mine the behavior models.
  - Association rule mining is better suited to mining frequent patterns than to building classification models.
- Difficulties:
  - Usage patterns may dominate, while failure-related patterns may not be frequent.
  - If we consider relatively frequent patterns for every statement, there may be too many results to mine and use effectively.
  - Considering the program dependence graph may help.
How to improve the work?

- Consider the debugging problem for specific kinds of bugs, such as data races.
  - A data race is an important kind of bug, and it is difficult to debug.
  - Sequential pattern mining?
- Combine dynamic analysis with machine learning.
  - Existing dynamic analysis methods can provide valuable features, e.g., the potential data races in a certain execution (with many false positives).
How to improve the work?

…
Other topics of automated debugging

- Failure classification
  - NLP
  - Failure trace clustering
  - NLP + failure trace clustering
- Failure replaying
  - Record failures at user sites compactly and reproduce them at developer sites
- Debugging with replaying
  - Traverse to any point of an execution without restarting the program
  - Query the trace database
Thank you!