Empirical Evaluation of Defect Projection Models for Widely

Empirical Evaluation of
Defect Projection Models
for Widely-deployed
Production Software Systems
FSE 2004
Paul Li, Mary Shaw, Jim Herbsleb
Bonnie Ray, P. Santhanam
Institute for Software Research Intl.,
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Center for Software Engineering
IBM T.J. Watson Research Center
Hawthorne, NY 10532
Overview one



Defect occurrences are problems
Methods that deal with the economic
consequences require accurate defect
occurrence rate projections
Defect occurrence rate projection for
widely-deployed production software
systems has novel problems
Overview two

We have two empirical results that can
help defect occurrence rate
projections:


Part 1: The Weibull model is better than other
previously published models
Part 2: Naïve parameter extrapolation methods
that do not consider changes in characteristics
are inadequate
The real world problem

Methods to deal with the economic
consequences:




Maintenance resource allocation
Service contracts
Software insurance
All require accurate defect occurrence
rate projections:

Projections, for each distinct release, of the rate of
user-reported defect occurrences after the release
becomes available
The context

Widely-deployed production software
systems:




Many software and hardware configurations in use
Unknown deployment and usage patterns
Constrained development process
Contents that evolve over time across multiple releases
The research problem

How do you predict the rate of defect
occurrences?
[Figure: defect occurrences over months for Release N-2, Release N-1, Release N, and Release N+1, with a "Now" marker]
The research questions

Is there a model that describes the defect occurrence pattern?
Given this model, how can we predict the model parameters for the next release?
[Figure: defect occurrences over months, with a "Now" marker and two "?" marks]
The research approach

In the context of widely-deployed
production software:


Perform analysis to develop hypotheses
concerning models/methods
Use real world data to empirically test
hypotheses
The data

User-reported defects in 22 releases of
four widely-deployed production
software systems:




8 releases of a commercial operating system
3 releases of a commercial middleware system
8 releases of an open source operating system
(OpenBSD)
3 releases of an open source middleware system
(Jakarta Tomcat)
Relation to prior work

Software reliability modeling and software certification:
  Assume software and hardware configurations and deployment and usage patterns are known
Total number of defects prediction and defect-prone module identification:
  Produce results that are insufficient for maintenance planning and software insurance
No work on projecting defect occurrence rates for open source software systems
Part 1: which model to use?
[Figure: defect occurrences over months, with a "Now" marker and a "?"]
Previously published models

Model type    Reference                      Model form
Exponential   Goel & Okumoto [1979]          λ(t) = N α e^{-α t}
Weibull       Schick-Wolverton [1978]        λ(t) = N α β t^{α-1} e^{-β t^α}
Gamma         Yamada, Ohba, & Osaki [1983]   λ(t) = N β^α t^{α-1} e^{-β t}
Power         Duane [1964]                   λ(t) = α β e^{-β t}
Logarithmic   Musa-Okumoto [1975]            λ(t) = α (α β t + 1)^{-1}

N: total number of defect occurrences
Model shape: an increasing component (t^{α-1}) dominates when t is small; a decreasing component (the exponential term) dominates when t is large
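The rise-then-fall shape of the Weibull form can be checked numerically. The sketch below evaluates the Weibull and exponential forms from the table; the parameter values (N = 100, α = 2, β = 0.05) are hypothetical and chosen only for illustration, not taken from the study's data.

```python
import math


def weibull_rate(t, n_total, alpha, beta):
    """Weibull form from the table: N * alpha * beta * t^(alpha-1) * exp(-beta * t^alpha)."""
    return n_total * alpha * beta * t ** (alpha - 1) * math.exp(-beta * t ** alpha)


def exponential_rate(t, n_total, alpha):
    """Exponential form from the table: N * alpha * exp(-alpha * t)."""
    return n_total * alpha * math.exp(-alpha * t)


# Hypothetical parameters, for illustration only.
N_TOTAL, ALPHA, BETA = 100.0, 2.0, 0.05

# Rate at months 1..12: the t^(alpha-1) factor drives the early rise,
# the exponential factor drives the later decline.
rates = [weibull_rate(t, N_TOTAL, ALPHA, BETA) for t in range(1, 13)]
peak_month = rates.index(max(rates)) + 1

for month, rate in enumerate(rates, start=1):
    print(f"month {month:2d}: {rate:6.2f}")
print("peak month:", peak_month)
```

With α = 1 the Weibull form reduces to N β e^{-β t}, which is the exponential form with rate β, so the exponential model is a special case that cannot represent the initial rise.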
Model comparison
[Figure: defect occurrences over months]

Model               AIC Score
Exponential model   110
Power model         113
Logarithmic model   112
Gamma model         90
Weibull model       83
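A lower AIC score indicates a better fit, so the comparison amounts to picking the minimum. The sketch below ranks the scores from this slide; the threshold of 4 for "close to the best" is an assumption based on the note on the AIC slide later in the deck, and how the authors applied it is not stated here.

```python
# AIC scores as listed on the model comparison slide.
aic_scores = {
    "Exponential": 110,
    "Power": 113,
    "Logarithmic": 112,
    "Gamma": 90,
    "Weibull": 83,
}

# Lower AIC is better.
best_model = min(aic_scores, key=aic_scores.get)

# Assumption: a model counts as "close" if its AIC is within 4 of the best.
THRESHOLD = 4
close_models = sorted(
    name for name, score in aic_scores.items()
    if score - aic_scores[best_model] <= THRESHOLD
)

print("best:", best_model)
print("within threshold of best:", close_models)
```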
Conclusion: Weibull is better



Has the best AIC score in 73% of the
releases
Is within the 95% C.I. of the best AIC
score in 95% of the releases
Is good despite differences in the type of
system, style of development, and the
kind of data
Part 2: How to extrapolate model parameters?
Weibull: λ(t) = N α β t^{α-1} e^{-β t^α}
[Figure: defect occurrences over months, with a "Now" marker and a "?"]
Parameter extrapolation methods
Tomcat 3.3 β: 15.4439
Tomcat 4.0 β: 16.8946
Moving averages (2 releases), weights .5 and .5: estimate of Tomcat 4.1 β: 16.16925
Exponential smoothing (2 releases), weights .41 and .59: estimate of Tomcat 4.1 β: 16.29725
No consideration of similarities and differences in characteristics between historical releases and current release.
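Both methods are weighted averages of the historical β values. The sketch below reproduces the Tomcat 4.1 estimates; the helper name `weighted_estimate` is hypothetical, and assigning the larger weight (.59) to the more recent release (Tomcat 4.0) is an inference from the arithmetic rather than something stated on the slide.

```python
def weighted_estimate(values, weights):
    """Weighted average of historical parameter values (oldest release first)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Fitted beta for Tomcat 3.3 and Tomcat 4.0, oldest first.
betas = [15.4439, 16.8946]

# Moving average: equal weights.
moving_average = weighted_estimate(betas, [0.5, 0.5])

# Exponential smoothing: the slide shows rounded weights .41 and .59.
# Assumption: the larger weight goes to the more recent release.
smoothed = weighted_estimate(betas, [0.41, 0.59])

print(f"moving average:        {moving_average:.5f}")
print(f"exponential smoothing: {smoothed:.5f}")
```

The moving average matches the slide's 16.16925. The smoothing result agrees with 16.29725 only to about two decimal places because the weights shown are rounded.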
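Extrapolation process
[Figure: parameters shown for three releases (α=2.79, β=6.83; α=2.28, β=4.66; α=2.51, β=5.69); N known, α projected, β projected. Plot of defect occurrences over months with "Now", t1, and t2 marked and the labels: previous, actual, projected, uninformed guess, projection difference, baseline difference.]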
Defect projection evaluation

Theil statistics for forecasting experiments using moving averages method

System \ Releases     one    two    three  four   five   six    seven
Open source OS R2.8   1.06   0.70
Open source OS R2.9   1.32   0.93   1.04
Open source OS R3.0   0.87   0.42   0.43   0.44
Open source OS R3.1   0.72   0.70   0.73   0.71   0.73
Open source OS R3.2   0.76   0.91   0.87   0.99   0.97   1.02
Open source OS R3.3   1.56   1.10   0.85   0.86   0.66   0.66   0.57
Conclusion: Naïve methods
are inadequate



In 50% of forecasting experiments, more
information did not improve projections
In 44% of forecasting experiments, Theil
statistics are greater than or equal to 1
Methods that consider changes in
characteristics of widely-deployed
production software systems should be
considered
Summary

Results
Weibull model is the preferred model:
  May allow us to quantify effects of changes in characteristics by examining changes in parameter values
Naïve parameter extrapolation methods are inadequate:
  Motivates further work to capture and account for changes in characteristics to improve projections
Accurate defect occurrence rate projections may aid better planning and may enable software insurance
The end
Questions, suggestions, comments
Email: [email protected]
The AIC model selection criterion
AIC = n log σ^2 + 2 |S|
n: number of observations
σ: residual standard error
|S|: number of model parameters
Compares model fits with different numbers of parameters.
Accounts for variance and bias.
Approximately follows a χ^2 (chi-squared) distribution.
An AIC difference of about 4 corresponds to a 95% confidence interval.
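The criterion can be computed directly from a model's residuals. The sketch below is a minimal version under one assumption: σ^2 is estimated as the mean squared residual (residual sum of squares divided by n), which the slide does not specify. The residual values are hypothetical.

```python
import math


def aic(residuals, num_params):
    """AIC = n * log(sigma^2) + 2 * |S|.

    Assumption: sigma^2 is the mean squared residual (RSS / n).
    Raises ValueError if all residuals are zero, since log(0) is undefined.
    """
    n = len(residuals)
    sigma_sq = sum(r * r for r in residuals) / n
    return n * math.log(sigma_sq) + 2 * num_params


# Hypothetical residuals (observed minus fitted defect counts).
residuals = [1.0, -1.0, 1.0, -1.0]

# Same fit quality, different parameter counts: each extra parameter costs 2.
aic_two_params = aic(residuals, num_params=2)
aic_three_params = aic(residuals, num_params=3)
print(aic_two_params, aic_three_params)
```

The first term rewards a closer fit and the second penalizes extra parameters, which is what allows models with different numbers of parameters to be compared on one scale.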
The Theil forecasting statistic
[Diagram: Historical releases → Parameter extrapolation method → Current release, with values labeled A1, A2, and P2]
Theil forecasting statistic:
√(Σ(Actual - Predicted)^2) / √(Σ(Actual)^2)
Actual = (A2 - A1)
Predicted = (P2 - A1)
Special cases:
Perfect forecast: P2 = A2
(Actual - Predicted) = ((A2 - A1) - (P2 - A1)) = ((A2 - A1) - (A2 - A1)) = 0 → Theil statistic of 0
Uninformed forecast: P2 = A1
(Actual - Predicted) = ((A2 - A1) - (P2 - A1)) = ((A2 - A1) - (A1 - A1)) = ((A2 - A1) - 0) = Actual → Theil statistic of 1
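The two special cases can be verified with a short computation. The sketch below implements the statistic as defined on this slide; the function name `theil_statistic` and the sample values are hypothetical, and treating A1, A2, and P2 as equal-length sequences of observations is an assumption about how the sums are formed.

```python
import math


def theil_statistic(a1, a2, p2):
    """sqrt(sum((Actual - Predicted)^2)) / sqrt(sum(Actual^2)),
    with Actual = A2 - A1 and Predicted = P2 - A1."""
    actual = [y - x for x, y in zip(a1, a2)]
    predicted = [p - x for x, p in zip(a1, p2)]
    numerator = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    denominator = math.sqrt(sum(a ** 2 for a in actual))
    return numerator / denominator


# Hypothetical values, for illustration only.
a1 = [10.0, 12.0, 15.0]
a2 = [14.0, 11.0, 20.0]

perfect = theil_statistic(a1, a2, p2=a2)      # P2 = A2
uninformed = theil_statistic(a1, a2, p2=a1)   # P2 = A1
worse = theil_statistic(a1, a2, p2=[6.0, 13.0, 10.0])  # projects the wrong direction

print(perfect, uninformed, worse)
```

A statistic below 1 means the projection beat the uninformed guess that nothing changes; a value of 1 or more means it did no better.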
