Data Mining Static Code Attributes to Learn Defect Predictors


Software Fault Prediction
Supervisor: Dr. Keyvan Pour
By: K. Shahryari
Fall 2014
1
Recall
Software Defect Prediction can be formulated as a binary classification
problem, where software modules are classified as defect-prone or defect-free,
using a set of software metrics.
2
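To make the formulation concrete, here is a minimal sketch of defect prediction as binary classification, assuming a hypothetical metrics table X (one row of static code attributes per module) and a binary label vector y (1 = defect-prone, 0 = defect-free); scikit-learn and the synthetic data are used purely for illustration, not the tooling from the papers.

```python
# Sketch: defect prediction as binary classification over static code metrics.
# X and y are synthetic stand-ins; any metrics table with binary labels would do.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 4))                                # made-up module metrics
y = (X[:, 1] + rng.normal(scale=2, size=500) > 3).astype(int)   # 1 = defect-prone (made-up rule)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = GaussianNB().fit(X_train, y_train)    # any binary classifier fits this formulation
print("predicted labels:", clf.predict(X_test)[:10])
```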
Feature Vector
3
Selected Papers to Present
• Data Mining Static Code Attributes to Learn Defect Predictors
• A General Software Defect-Proneness Prediction Framework
• Using Coding-Based Ensemble Learning to Improve Software Defect Prediction
4
Data Mining Static Code Attributes to Learn Defect Predictors
T. Menzies, J. Greenwald, and A. Frank
IEEE Transactions on Software Engineering
vol. 33, no. 1, pp. 2-13, Jan. 2007
5
Data Mining Static Code Attributes to Learn Defect Predictors
• How the attributes are used to build predictors is much more important than which particular attributes are used.
• The baseline experiment of this article shows that the rule-based or decision-tree learning methods used in prior works are clearly outperformed by a naive Bayes data miner with a log-filtering preprocessor on the numeric data.
6
Base Idea
• The baseline experiment of this article shows that the rule-based or decision-tree learning methods used in prior works are clearly outperformed by a naive Bayes data miner with a log-filtering preprocessor on the numeric data.
• For this study, we learn defect predictors from static code attributes defined by McCabe and Halstead. The McCabe and Halstead measures are “module”-based metrics, where a module is the smallest unit of functionality.
• Defect predictors are learned from static code attributes since these attributes are useful, easy to use, and widely used.
7
Data Set
• NASA MDP (Metrics Data Program)
8
Metrics
9
NASA CM1
10
• An interesting repeated pattern in our data sets is the presence of exponential distributions in the numeric attributes.
11
Data preprocessing
• All the data was passed through one of two filters:
1. none: no change, or
2. logNums: logarithmic filtering.
12
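A minimal sketch of the logNums filter, assuming a plain NumPy array of module metrics; following the description above, each numeric value n is replaced by its logarithm, and a small floor is applied so zero-valued counts stay finite (how the original study handled zeros is an assumption here).

```python
import numpy as np

def log_filter(X, floor=1e-6):
    """logNums-style preprocessing: replace every numeric n with ln(n).
    Values below `floor` are clipped first so zero counts stay finite
    (the original study's exact handling of zeros is not stated here)."""
    return np.log(np.maximum(np.asarray(X, dtype=float), floor))

X = np.array([[1.0, 10.0, 0.0],
              [3.0, 250.0, 7.0]])
print(log_filter(X))   # heavy right tails get squashed toward a more even spread
```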
LEARNERS
 The data was passed to three learners from the WEKA data mining toolkit:
 OneR,
 J48,
 naïve Bayes.
13
OneR
• Holte’s OneR algorithm builds prediction rules using one or more values from a single attribute. For example, OneR executing on the kc4 data set can return:
  ◦ EDGE_COUNT < 2.99 → defect-free
  ◦ EDGE_COUNT >= 2.99 → defective
• One way to view OneR’s defect prediction rules is as a decision tree of maximum depth 1 whose leaves are either the label defective or defect-free.
14
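Since the slide describes OneR as a decision tree of maximum depth 1, a depth-1 tree makes a reasonable stand-in for a sketch (true OneR may discretize an attribute into several intervals, so this is only an approximation); the EDGE_COUNT data below is invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
edge_count = rng.exponential(scale=3.0, size=300)      # hypothetical kc4-like attribute
y = (edge_count > 3.0).astype(int)                     # 1 = defective (made-up ground truth)
X = edge_count.reshape(-1, 1)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)  # "decision tree of maximum depth 1"
print(export_text(stump, feature_names=["EDGE_COUNT"]))
# Prints a single threshold rule of the same shape as the slide's example:
#   EDGE_COUNT <  t  -> defect-free
#   EDGE_COUNT >= t  -> defective
```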
J48
• The J48 learner builds decision trees of any depth. For example, J48 executing on the kc4 data set can return a tree which may be read as follows:
  “A module is defective if it has nonzero call-pairs and has more than 3.12 lines and does not have a low normalized cyclomatic complexity (0.02), or it has a low normalized cyclomatic complexity and a low node-count (up to 3.47).”
15
Naïve Bayes
16
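The Naïve Bayes slide itself is a figure; as a reminder of the underlying idea, here is a small hand-rolled Gaussian naive Bayes scoring step (per-class prior plus a sum of independent per-attribute log Gaussian likelihoods) on made-up numbers. Real experiments would use the WEKA or an equivalent library implementation; all values below are illustrative assumptions.

```python
import numpy as np

# Hand-rolled Gaussian naive Bayes scoring (illustrative only).
# For each class c: score(c) = log P(c) + sum_i log N(x_i; mean_ci, std_ci)
def nb_log_score(x, prior, means, stds):
    return np.log(prior) + np.sum(
        -0.5 * np.log(2 * np.pi * stds ** 2) - (x - means) ** 2 / (2 * stds ** 2)
    )

x = np.array([2.0, 40.0])                       # one module's (made-up) attribute values
params = {                                      # per-class priors and Gaussian parameters
    "defect-free": (0.8, np.array([1.5, 30.0]), np.array([1.0, 10.0])),
    "defective":   (0.2, np.array([4.0, 80.0]), np.array([2.0, 30.0])),
}
scores = {c: nb_log_score(x, *p) for c, p in params.items()}
print(max(scores, key=scores.get), scores)      # class with the highest posterior score wins
```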
Subsetting method
 Various attribute subset selection algorithms find what attributes can be deleted
without damaging the performance of the learned predictor.
 Subsetting can be used independently of the learning technique of choice as a
general method for data reduction.
 The simplest and fastest subsetting method is to rank attributes from the most
informative to least informative.
17
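A sketch of the "rank attributes from most to least informative" idea, using mutual information between each attribute and the defect label as the informativeness score; the original study ranks with an InfoGain-style measure inside WEKA, so this is only a close stand-in, and the data below is synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(2)
X = rng.lognormal(size=(400, 5))                      # 5 hypothetical static code attributes
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=400) > 2).astype(int)

scores = mutual_info_classif(X, y, random_state=0)    # informativeness of each attribute
ranking = np.argsort(scores)[::-1]                    # most informative first
print("attribute ranking:", ranking, "scores:", np.round(scores, 3))
# Subsetting then keeps only the top-k attributes before training the predictor.
```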
Performance Measures
• Ideally, we seek predictors that maximize acc% (accuracy), pd% (probability of detection), and notPf% (100 - pf%, the complement of the probability of false alarm).
20
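A sketch of these measures computed from a confusion matrix, following the standard definitions of pd and pf used here; the labels below are toy values.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 1])   # 1 = defective (toy labels)
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
acc    = (tp + tn) / (tp + tn + fp + fn)   # accuracy
pd     = tp / (tp + fn)                    # probability of detection (recall on defectives)
pf     = fp / (fp + tn)                    # probability of false alarm
not_pf = 1 - pf                            # the "notPf" the slide refers to
print(f"acc={acc:.2f} pd={pd:.2f} pf={pf:.2f} notPf={not_pf:.2f}")
```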
Results
• This study used an (M=10)*(N=10)-way cross-evaluation.
• These results strongly endorse building defect predictors using naive Bayes with logNums.
• The combination of learner + filter generated predictors with average results of pd = 71% and pf = 25%.
  ◦ pd: probability of detection
  ◦ pf: probability of false alarm
21
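A sketch of the (M=10)*(N=10) cross-evaluation for the winning naive Bayes + logNums combination; the data here is synthetic (standing in for an MDP table), so it will not reproduce the paper's 71%/25% figures, and the zero-flooring in the log filter is an assumption.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
X = rng.lognormal(size=(600, 6))                        # synthetic stand-in for an MDP table
y = (X[:, 2] + rng.normal(scale=2, size=600) > 3).astype(int)
X_log = np.log(np.maximum(X, 1e-6))                     # logNums filter (zeros floored, an assumption)

pds, pfs = [], []
for m in range(10):                                     # M = 10 repetitions with reshuffled data
    for train, test in StratifiedKFold(n_splits=10, shuffle=True, random_state=m).split(X_log, y):
        pred = GaussianNB().fit(X_log[train], y[train]).predict(X_log[test])
        tp = np.sum((pred == 1) & (y[test] == 1)); fn = np.sum((pred == 0) & (y[test] == 1))
        fp = np.sum((pred == 1) & (y[test] == 0)); tn = np.sum((pred == 0) & (y[test] == 0))
        pds.append(tp / max(tp + fn, 1)); pfs.append(fp / max(fp + tn, 1))
print(f"mean pd = {np.mean(pds):.2f}, mean pf = {np.mean(pfs):.2f}")
```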
QUARTILE CHARTS OF PERFORMANCE DELTAS
22
A General Software Defect-Proneness
Prediction Framework
• The proposed framework consists of two parts:
  ◦ scheme evaluation, and
  ◦ defect prediction.
• The scheme evaluation focuses on evaluating the performance of a learning scheme, while the defect prediction focuses on building a final predictor from historical data according to the chosen learning scheme; the predictor is then used to predict the defect-prone components of a new (or unseen) software system.
26
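A runnable skeleton of the two parts, under heavy assumptions: the synthetic data, the two candidate schemes, and the "pick the scheme with the best mean cross-validation accuracy" rule are all placeholders for the framework's actual evaluation and Performance Report. The following slides make each part more precise.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
hist_X = rng.lognormal(size=(300, 5)); hist_y = (hist_X[:, 0] > 1.5).astype(int)
new_X = rng.lognormal(size=(20, 5))                      # unseen system, no labels yet

schemes = {"NB": GaussianNB(), "Tree": DecisionTreeClassifier(max_depth=3)}

# Part 1: scheme evaluation, using historical data only.
report = {name: cross_val_score(clf, hist_X, hist_y, cv=10).mean()
          for name, clf in schemes.items()}

# Part 2: defect prediction -- rebuild the best scheme on ALL historical data.
best = max(report, key=report.get)
print(report, "->", best)
print(schemes[best].fit(hist_X, hist_y).predict(new_X))
```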
Proposed Framework
27
Learning Scheme
• A learning scheme is composed of:
  ◦ a data preprocessor,
  ◦ an attribute selector, and
  ◦ a learning algorithm (see the combined sketch below).
28
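One way to sketch a learning scheme as a single object is a pipeline of the three components. The concrete choices below (log transform, forward-selection wrapper, naive Bayes) are just one possible combination and use scikit-learn rather than the WEKA setup of the paper; the data is synthetic.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB

# learning scheme = data preprocessor + attribute selector + learning algorithm
scheme = Pipeline([
    ("preprocessor", FunctionTransformer(lambda X: np.log(np.maximum(X, 1e-6)))),  # Log
    ("selector", SequentialFeatureSelector(GaussianNB(), direction="forward",
                                           n_features_to_select=3, cv=5)),         # wrapper FS
    ("learner", GaussianNB()),                                                      # NB
])

rng = np.random.default_rng(5)
X = rng.lognormal(size=(300, 6)); y = (X[:, 1] > 1.5).astype(int)
print(scheme.fit(X, y).score(X, y))   # fit the whole scheme end to end
```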
Scheme Evaluation
29
In this framework, an M×N-way cross-validation is used for estimating the performance of each predictive model; that is, each data set is first divided into N bins, a predictor is learned on (N-1) bins and then tested on the remaining bin. This is repeated for the N folds so that each bin is used for training and testing while minimizing the sampling bias, and the whole procedure is repeated M times with the data reshuffled each time.
32
Data preprocessing
• This is an important part of building a practical learner. In this step, the training data are preprocessed, for example by removing outliers, handling missing values, and discretizing or transforming numeric attributes. In our experiment, we use a log-filtering preprocessor which replaces all numerics n with their logarithms log n.
• Two data preprocessors:
  ◦ None: data unchanged;
  ◦ Log: all of the numeric values are replaced by their logarithmic values, as used by MGF (Menzies, Greenwald, and Frank).
33
Attribute selection
• The data sets may not have originally been intended for defect prediction; even if all of the attributes were useful for their original task, not all may be helpful for defect prediction. Therefore, attribute selection has to be performed on the training data.
34
Attribute selection
Two attribute selectors:
The standard wrapper method is employed to choose attributes. This means the performances of the learning algorithms are used to evaluate the selected attributes.
Two different search strategies (based on greedy algorithms) are used as follows:
• Forward selection
• Backward elimination
Attribute selection
• Forward selection: starts from an empty set and evaluates each attribute individually to find the best single attribute. It then tries each of the remaining attributes in conjunction with the best one to find the best pair of attributes. In the next iteration, each of the remaining attributes is tried in conjunction with the best pair to find the best group of three attributes. This process continues until no single attribute addition improves the evaluation of the subset (a hand-rolled sketch of this loop follows below).
• Backward elimination: starts with the whole set of attributes and eliminates one attribute in each iteration until no single attribute elimination improves the evaluation of the subset.
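A hand-rolled sketch of the wrapper-style forward selection loop described above; backward elimination is the mirror image, starting from the full set and greedily removing attributes. The learner, synthetic data, and 5-fold CV scoring are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(6)
X = rng.lognormal(size=(300, 6))
y = (X[:, 0] + 0.5 * X[:, 4] + rng.normal(size=300) > 2).astype(int)

def forward_selection(X, y, learner):
    """Greedy wrapper-style forward selection: keep adding the single attribute
    that most improves the learner's CV score; stop when no addition helps."""
    selected, best_score = [], -np.inf
    while True:
        candidates = [i for i in range(X.shape[1]) if i not in selected]
        if not candidates:
            break
        scored = [(cross_val_score(learner, X[:, selected + [i]], y, cv=5).mean(), i)
                  for i in candidates]
        score, attr = max(scored)
        if score <= best_score:
            break
        selected.append(attr); best_score = score
    return selected, best_score

print(forward_selection(X, y, GaussianNB()))
```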
Learner construction
Three learning algorithms:
Naive Bayes (NB), J48, and OneR.
37
Learning Schemes
Twelve learning schemes: the combination of two data preprocessors (None, Log), two attribute selectors (FS: forward selection, BE: backward elimination), and three learning algorithms (NB, J48, OneR) yields a total of 12 different learning schemes:
• NB Log FS;
• J48 Log FS;
• OneR Log FS;
• NB Log BE;
• J48 Log BE;
• OneR Log BE;
• NB None FS;
• J48 None FS;
• OneR None FS;
• NB None BE;
• J48 None BE;
• OneR None BE.
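A tiny sketch of how the twelve combinations arise; the strings are just labels matching the naming above (the enumeration order differs slightly from the slide).

```python
from itertools import product

learners      = ["NB", "J48", "OneR"]
preprocessors = ["Log", "None"]
selectors     = ["FS", "BE"]

schemes = [" ".join(combo) for combo in product(learners, preprocessors, selectors)]
print(len(schemes))   # 12
print(schemes)        # e.g. 'NB Log FS', 'NB Log BE', 'NB None FS', ...
```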
Defect Prediction
• The defect prediction part of the framework is straightforward; it consists of predictor construction and defect prediction.
• During predictor construction:
1. A learning scheme is chosen according to the Performance Report.
2. A predictor is built with the selected learning scheme and the whole historical data (whereas, while evaluating a learning scheme, a learner is built with only the training data and tested on the test data).
• After the predictor is built, new data are preprocessed in the same way as the historical data; the constructed predictor can then be used to predict defects on the preprocessed new data.
39
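A sketch of the prediction part, assuming the chosen scheme happens to be naive Bayes with the Log preprocessor and a previously selected attribute subset; the data, the selected attribute indices, and the helper names are hypothetical. The point the slide makes is captured in the last three lines: fit on all historical data and preprocess the new data identically.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def log_filter(X):                      # same preprocessing for historical and new data
    return np.log(np.maximum(X, 1e-6))

rng = np.random.default_rng(7)
hist_X = rng.lognormal(size=(400, 5))                   # historical modules with known labels
hist_y = (hist_X[:, 1] > 1.5).astype(int)
new_X = rng.lognormal(size=(30, 5))                     # new (unseen) system, no labels
selected = [1, 3, 4]                                    # attributes picked by the chosen scheme (assumed)

predictor = GaussianNB().fit(log_filter(hist_X)[:, selected], hist_y)   # whole historical data
defect_prone = predictor.predict(log_filter(new_X)[:, selected])        # same preprocessing
print("predicted defect-prone modules:", np.flatnonzero(defect_prone))
```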
Defect Prediction
40
Data Set
There are 17 data sets in total: 13 from NASA and the remaining 4 from the PROMISE repository.
41
Performance Measures
• The receiver operating characteristic (ROC) curve, which in this setting plots pd against pf, is often used to evaluate the performance of binary predictors.
• The point (pf = 0, pd = 1) is the ideal position, where every defective module is detected and no false alarm is raised.
Framework Comparison for All of the Data Sets
43
Balance
AUC
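To close, a sketch of the measures behind the last two charts: the ROC curve and its AUC in pd/pf space, and the balance measure, which, as defined by Menzies et al., is 1 minus the normalized Euclidean distance from the ideal point (pf = 0, pd = 1). The scores and labels below are made up; only the pd = 0.71, pf = 0.25 operating point comes from the results quoted earlier.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])               # toy labels (1 = defective)
y_score = np.array([.1, .4, .35, .8, .2, .7, .55, .9, .6, .3])   # predictor's defect scores

pf, pd, _ = roc_curve(y_true, y_score)       # ROC: pd (true positive rate) vs pf (false alarm rate)
print("sample (pf, pd) points:", list(zip(np.round(pf, 2), np.round(pd, 2)))[:4])
print("AUC =", roc_auc_score(y_true, y_score))

def balance(pd, pf):
    """1 - normalized distance from the ideal point (pf=0, pd=1)."""
    return 1 - np.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / np.sqrt(2)

# e.g. balance at the pd = 0.71, pf = 0.25 operating point reported for NB + logNums
print("balance =", balance(0.71, 0.25))
```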