Software Engineering Research

Software Metrics and Defect Prediction
Ayşe Başar Bener
Problem 1

How to tell if the project is on schedule and within budget?

Earned-value charts.
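
A minimal C sketch of the two indices usually read off an earned-value chart. The struct and function names are hypothetical; the slides only name the technique.

```c
/* Earned-value inputs, all expressed in the same currency unit. */
typedef struct {
    double planned_value;  /* PV: budgeted cost of work scheduled */
    double earned_value;   /* EV: budgeted cost of work performed */
    double actual_cost;    /* AC: actual cost of work performed   */
} EarnedValue;

/* Schedule performance index: below 1.0 means behind schedule. */
double ev_spi(EarnedValue e) { return e.earned_value / e.planned_value; }

/* Cost performance index: below 1.0 means over budget. */
double ev_cpi(EarnedValue e) { return e.earned_value / e.actual_cost; }

/* Convenience checks built on the two indices. */
int ev_on_schedule(EarnedValue e)   { return ev_spi(e) >= 1.0; }
int ev_within_budget(EarnedValue e) { return ev_cpi(e) >= 1.0; }
```

With PV = 100, EV = 80 and AC = 100, both indices are 0.8: the project is behind schedule and over budget.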
Problem 2

How hard will it be for another organization to maintain this software?

McCabe Complexity
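
A short C sketch of the two usual ways to obtain McCabe's cyclomatic complexity: from the control-flow graph (E - N + 2P) or, for structured code, from the number of decision points plus one. Function names are hypothetical.

```c
/* Cyclomatic complexity from a control-flow graph:
 * edges - nodes + 2 * connected components. */
int cc_from_graph(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}

/* Shortcut for structured code: one more than the number of
 * decision points (if, while, for, case, ...). */
int cc_from_decisions(int decision_points)
{
    return decision_points + 1;
}
```

A function with a single if statement has complexity 2; a straight-line function has complexity 1.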
Problem 3

How to tell when the subsystems are ready to be integrated?

Defect density metrics.
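
A minimal C sketch, assuming defect density is measured as defects per thousand lines of code (KLOC). The readiness threshold is a hypothetical parameter, not something the slides specify.

```c
/* Defects per thousand lines of code; returns 0 for an empty module. */
double defect_density_per_kloc(int defects, int loc)
{
    return loc > 0 ? 1000.0 * defects / loc : 0.0;
}

/* Hypothetical integration gate: a subsystem is considered ready when
 * its density is at or below a threshold chosen by the team. */
int ready_for_integration(double density, double max_density)
{
    return density <= max_density;
}
```

For example, 5 defects in 2000 lines give a density of 2.5 defects per KLOC.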
Problem Definition

• Software development lifecycle:
  - Requirements
  - Design
  - Development
  - Test (takes ~50% of overall time)
• Detect and correct defects before delivering software.
• Test strategies:
  - Expert judgment
  - Manual code reviews
  - Oracles/predictors as secondary tools
Testing
Defect Prediction

2-class classification problem:
• Non-defective: if error = 0
• Defective: if error > 0

Two things needed:
• Raw data: source code
• Software metrics -> static code attributes
Static Code Attributes














void main()
{
    //This is a sample code
    //Declare variables
    int a, b, c;
    // Initialize variables
    a=2;
    b=5;
    //Find the sum and display c if greater than zero
    c=sum(a,b);
    if (c < 0)
        printf("%d\n", a);
    return;
}

int sum(int a, int b)
{
    // Returns the sum of two numbers
    return a+b;
}

Slide callouts on main(): the condition should be c > 0, and the value printed should be c. These are the two errors counted for main().
Module    LOC    LOCC    V    CC    Error
main()    16     4       5    2     2
sum()     5      1       3    1     0

LOC: Lines of code
LOCC: Lines of commented code
V: Number of unique operands and operators
CC: Cyclomatic complexity
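
The two sample modules and the two-class labelling rule (defective when error > 0) can be sketched in C as follows. The struct layout and names are hypothetical; the values are the ones given for main() and sum().

```c
/* Static code attributes of one module, plus its recorded error count. */
typedef struct {
    const char *name;
    int loc;     /* lines of code                          */
    int locc;    /* lines of commented code                */
    int v;       /* number of unique operands and operators */
    int cc;      /* cyclomatic complexity                  */
    int errors;  /* number of recorded errors              */
} ModuleMetrics;

/* The two sample modules from the slide. */
static const ModuleMetrics SAMPLE_MODULES[] = {
    { "main()", 16, 4, 5, 2, 2 },
    { "sum()",   5, 1, 3, 1, 0 },
};

/* Class label for the 2-class problem: 1 = defective, 0 = non-defective. */
int is_defective(const ModuleMetrics *m)
{
    return m->errors > 0;
}
```

Here main() is labelled defective and sum() non-defective; a predictor is trained to recover this label from the static attributes alone.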
Research on Defect Prediction




• Defect prediction using machine learning techniques
• How effectively can we estimate defect density?
  - Regression models
  - First classification, then regression
• Defect prediction in multi-version software
• Defect prediction in embedded software







B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 11-12, 2007.
A.D. Oral and A. Bener, "Defect Prediction for Embedded Software", ISCIS 2007, Ankara, Turkey, November 9-11, 2007.
E. Ceylan, O. Kutlubay and A. Bener, "Software Defect Identification Using Machine Learning Techniques", EUROMICRO SEAA, Dubrovnik, Croatia, August 28 - September 1, 2006.
B. Turhan and O. Kutlubay, "Mining Software Data", Data Mining and Business Intelligence Workshop in ICDE'07, İstanbul, April 2007.
O. Kutlubay, B. Turhan and A. Bener, "A Two-Step Model for Defect Density Estimation", EUROMICRO SEAA, Lübeck, Germany, August 2007.
Y. Kastro and A. Bener, "A Defect Prediction Method for Software Versioning", Software Quality Journal (in print).
O. Kutlubay, B. Turhan and A. Bener, "Software Defect Density Estimation Using Static Code Attributes: A Two Step Model", Eng. App. of AI (under review).
Constructing Predictors



• Baseline: Naive Bayes.
  - Why? Best reported results so far (Menzies et al., 2007).
• Remove assumptions and construct different models.
  - Independent attributes -> multivariate distribution
  - Attributes of equal importance -> weighted Naive Bayes

B. Turhan and A. Bener, "Software Defect Prediction: Heuristics for Weighted Naïve Bayes", ICSOFT 2007, Barcelona, Spain, July 2007.
B. Turhan, "Software Defect Prediction Modeling", IDOESE 2007, Madrid, Spain, September 2007.
"Yazılım Hata Kestirimi için Kaynak Kod Ölçütlerine Dayalı Bayes Sınıflandırması" [Bayesian Classification Based on Source Code Metrics for Software Defect Prediction], UYMS 2007, Ankara, September 2007.
B. Turhan and A. Bener, "A Multivariate Analysis of Static Code Attributes for Defect Prediction", QSIC 2007, Portland, USA, October 2007.
Weighted Naive Bayes
Naive Bayes:

\[ g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} \left( \frac{x_j^t - m_{ij}}{s_j} \right)^2 + \log(P(C_i)) \]

Weighted Naive Bayes:

\[ g_i(x) = -\frac{1}{2} \sum_{j=1}^{d} w_j \left( \frac{x_j^t - m_{ij}}{s_j} \right)^2 + \log(P(C_i)) \]
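
A minimal C sketch of the weighted discriminant, g_i(x) = -1/2 * sum_j w_j ((x_j - m_ij) / s_j)^2 + log P(C_i). The function names and the choice to pass the log prior precomputed are assumptions made for illustration. Setting every weight to 1 gives the plain Naive Bayes discriminant.

```c
#include <stddef.h>

/* Weighted Naive Bayes discriminant for one class.
 *   x         attribute values of the module being classified (length d)
 *   m         per-attribute means for this class (length d)
 *   s         per-attribute standard deviations (length d, all non-zero)
 *   w         per-attribute weights (length d); all 1.0 = plain Naive Bayes
 *   log_prior log P(C_i), precomputed by the caller
 */
double wnb_discriminant(const double *x, const double *m, const double *s,
                        const double *w, size_t d, double log_prior)
{
    double sum = 0.0;
    for (size_t j = 0; j < d; ++j) {
        double z = (x[j] - m[j]) / s[j];  /* standardized distance */
        sum += w[j] * z * z;
    }
    return -0.5 * sum + log_prior;
}

/* Two-class decision: pick the class with the larger discriminant. */
int predict_defective(double g_defective, double g_non_defective)
{
    return g_defective > g_non_defective;
}
```

A weight of 0 removes an attribute from the decision entirely, so the weighting scheme acts as a soft form of attribute selection.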
Datasets
Name    # Features    # Modules    Defect Rate (%)
CM1     38            505          9
PC1     38            1107         6
PC2     38            5589         0.6
PC3     38            1563         10
PC4     38            1458         12
KC3     38            458          9
KC4     38            125          40
MW1     38            403          9
Performance Measures
                   Actual defects
                   no      yes
Predicted   no     A       B
            yes    C       D

Accuracy: (A+D)/(A+B+C+D)
Pd (hit rate): D/(B+D)
Pf (false alarm rate): C/(A+C)
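
The three measures can be sketched in C directly from the confusion-matrix cells A, B, C and D. The struct and function names are hypothetical.

```c
/* Confusion-matrix counts, named as on the slide:
 *   a = predicted no,  actual no     b = predicted no,  actual yes
 *   c = predicted yes, actual no     d = predicted yes, actual yes */
typedef struct {
    long a, b, c, d;
} ConfusionMatrix;

/* Fraction of all modules classified correctly. */
double cm_accuracy(ConfusionMatrix k)
{
    return (double)(k.a + k.d) / (double)(k.a + k.b + k.c + k.d);
}

/* Pd (hit rate): fraction of defective modules that were flagged. */
double cm_pd(ConfusionMatrix k)
{
    return (double)k.d / (double)(k.b + k.d);
}

/* Pf (false alarm rate): fraction of clean modules that were flagged. */
double cm_pf(ConfusionMatrix k)
{
    return (double)k.c / (double)(k.a + k.c);
}
```

With A = 60, B = 10, C = 20, D = 10, accuracy is 0.70 while Pd is only 0.50 and Pf is 0.25. On data sets with low defect rates accuracy alone can look good while half the defects are missed, which is why Pd and Pf are reported separately.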
Results: InfoGain & GainRatio

        WNB+IG (%)         WNB+GR (%)         IG+NB (%)
Data    pd    pf    bal    pd    pf    bal    pd    pf    bal
CM1     82    39    70     82    39    70     83    32    74
PC1     69    35    67     69    35    67     40    12    57
PC2     72    15    77     66    20    72     72    15    77
PC3     80    35    71     81    35    72     60    15    70
PC4     88    27    79     87    24    81     92    29    78
KC3     80    27    76     83    30    76     48    15    62
KC4     77    35    70     78    35    71     79    33    72
MW1     70    38    66     68    34    67     44    7     60
Avg:    77    31    72     77    32    72     65    20    61
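
The bal column is not defined on the slides. Its values are consistent with the balance measure of Menzies et al. (2007), the normalized distance from the ideal point (pf = 0, pd = 1). Treating that as an assumption:

\[ bal = 1 - \frac{\sqrt{(0 - pf)^2 + (1 - pd)^2}}{\sqrt{2}} \]

For CM1 with WNB+IG (pd = 0.82, pf = 0.39):

\[ bal = 1 - \frac{\sqrt{0.39^2 + 0.18^2}}{\sqrt{2}} \approx 1 - \frac{0.4295}{1.4142} \approx 0.70 \]

which matches the reported 70.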
Results: Weight Assignments
WC (within-company) vs CC (cross-company) data?
• When to use WC or CC?
• How much data do we need to construct a model?
ICSOFT’07
Thank You
http://softlab.boun.edu.tr
