Transcript Slide 1
Prediction of Software Defects
SASQAG March 2004
by
Steve Neuendorf
• Affiliations
– Independent Consultant
www.serv.net/~steve
– David Consulting Group
www.davidconsultinggroup.com
– Human Systems Knowledge Networks
www.hskni.com
Faults, Defects and Failures
There is considerable disagreement about the definitions of defects, errors, faults and failures. (Fenton)
[Diagram: faults introduce defects in products, and defects are observed as failures when discovered. Defects discovered before release are internal defects, or errors; defects discovered after release are external defects.]
Defect Prediction
[Diagram: the quantities a defect prediction must relate, each carrying a question mark: # Created? # Found? # Fixed? # Left?]
Defect Addition Rates
Phase            Defects per FP
Requirements     1.00
Design           1.25
Coding           1.75
Documentation    0.60
Bad Fixes        0.40
Total            5.00
(Jones 5)
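As a worked example using these rates, a 100 function point project would be expected to create about 100 × 5.00 = 500 defects in total.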
Defect Removal Efficiency
[Chart: Defects per FP (0 to 10) plotted against Defect Removal Efficiency (55% to 100%), with bands marked, from worst to best, for Malpractice, US Average, and Best in Class. (Jones 5)]
Other Considerations
• Not much information about the relationship between defects and failures
• No relationship between defect(s) and failure severity
• No consistency in terminology (BSOD = Bug?)
Defect Prediction
• Many types of Models
– Process Models
– Multivariate Models
– Size and Complexity metrics
– Belief Models
–...
The Components of a Process
Product    Procedure   Attributes   Example
Fixed      Fixed       Fixed        Repetitive Manufacturing
Fixed      Fixed       Variable     Steel Building Erection
Fixed      Variable    Fixed        Manufacturing/Rework
~Fixed     Variable    Variable     Craft
Variable   Fixed       ~Fixed       Art
Variable   ~Fixed      Variable     "Megalithic" Projects
Variable   ~Variable   ~Fixed       Software Engineering (>= SEI 2)
Variable   Variable    Variable     Computer Programming (< SEI 2)
Process Models
[Diagram: a Process Model takes Management, Teams, Tools, Techniques, Technology, and Environment as inputs, together with Current Capability, and yields Predicted Performance (e.g., E, Q or T) for comparison with Actual Performance.]
Process Models
• Phase Containment Models
– Rely on history to identify
• How many defects were produced in each phase,
• How many defects from that phase were discovered and
corrected (Phase Containment)
– Predict defects for each phase and track discovery and removal; assume that defects predicted but not found were passed to the next phase (see the sketch below).
– Simple, easy to implement with common tools.
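A minimal sketch, in Python, of how a phase containment model might be implemented. The per-phase injection rates are taken from the Jones table earlier in this deck; the containment percentages are assumed purely for illustration and would come from your own history:

# Phase containment sketch: defects predicted but not found in a phase
# are assumed to pass to the next phase.
PHASES = ["Requirements", "Design", "Coding", "Documentation"]
INJECTED_PER_FP = {"Requirements": 1.00, "Design": 1.25,
                   "Coding": 1.75, "Documentation": 0.60}   # Jones rates
CONTAINMENT = {"Requirements": 0.50, "Design": 0.55,
               "Coding": 0.60, "Documentation": 0.65}       # assumed history

def predict(size_fp):
    """Print expected defects injected, found, and escaped per phase."""
    escaped = 0.0
    for phase in PHASES:
        injected = INJECTED_PER_FP[phase] * size_fp
        at_risk = injected + escaped           # new defects plus inherited
        found = CONTAINMENT[phase] * at_risk   # expected to be found here
        escaped = at_risk - found              # passed to the next phase
        print(f"{phase:14s} injected={injected:6.1f} "
              f"found={found:6.1f} escaped={escaped:6.1f}")
    print(f"Predicted defects escaping to test/release: {escaped:.1f}")

predict(100.0)   # e.g., a 100 FP project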
Multivariate Models
• Analogous to parametric estimating models (e.g., COQUALMO).
• Uses any of many variables, analyzing the relationships between their values and the results observed in historic projects (see the sketch below).
• Good if the projects from which the model was created are a good match for yours.
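A hypothetical sketch of the multivariate idea in Python: fit a linear model relating project attributes to defect counts observed in historic projects, then apply it to a new project. The attributes and data are illustrative only, not drawn from any of the models named above:

import numpy as np

# Historic projects: [size in FP, staff experience in years, complexity 1-5]
X = np.array([[100, 3, 2], [250, 5, 3], [400, 2, 4],
              [150, 4, 2], [300, 1, 5]], dtype=float)
defects = np.array([480, 1100, 2300, 660, 1900], dtype=float)

# Least-squares fit with an intercept column appended.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, defects, rcond=None)

# Predict defects for a new project with a similar profile.
new_project = np.array([200, 3, 3, 1], dtype=float)  # trailing 1 = intercept
print(f"Predicted defects: {new_project @ coef:.0f}")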
Size and Complexity metrics
• Given enough “Defects per” data, all that
is needed is the “per What” to predict
defects – Right?
• Size and Complexity do not cause defects
– faults do.
Using Testing Metrics
• Look at total defects found in each phase, pre- and post-release, including analysis of where each defect originated.
• Use statistics to determine total defects introduced in each phase and use that as a predictor (see the sketch below).
• With a stable environment, very high confidence (95%) is claimed.
• Good opportunity for a benchmarking approach.
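A minimal sketch, assuming hypothetical defect records tagged with both the phase where each defect was found and the phase where it originated; tallying by origin gives the per-phase "defects introduced" figures that serve as the predictor:

from collections import Counter

# (found_in, originated_in) pairs from a defect tracker; data is illustrative.
records = [("Design", "Requirements"), ("Coding", "Design"),
           ("Test", "Coding"), ("Test", "Requirements"),
           ("Post-release", "Coding"), ("Test", "Coding"),
           ("Coding", "Requirements"), ("Post-release", "Design")]

introduced = Counter(origin for _found, origin in records)
for phase, count in introduced.most_common():
    print(f"{phase:14s} defects introduced: {count}")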
Using Process Quality Data
SEI CMM Level   Defect Potential (per FP)   Removal Efficiency   Delivered Defects (per FP)
1               5                           85%                  0.75
2               4                           90%                  0.40
3               3                           95%                  0.15
4               2                           97%                  0.06
5               1                           99%                  0.01
(Jones 5)
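The last column follows from the first two: delivered defects = defect potential × (1 − removal efficiency). At Level 3, for example, 3 × (1 − 0.95) = 0.15 defects per FP.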
Bayesian Belief Networks (BBN)
• Seems to be the area of most interest.
• Combines history with reality.
• Old concept (1763), new life (we now have the tools).
• Graphical network of the probabilistic relationships between variables.
– Uses expert beliefs about the relationships to assess the impact of evidence on the probabilities of outcomes.
Example BBN
[Diagram: example BBN with nodes for Problem Complexity, Staff Experience, Use of Standard, Coders' Performance, Code Complexity, # of Latent Defects, Use, and Reliability. Adapted from Fenton 4.]
Node probability table (NPT) for the node 'Reliability'

                     Use: low               Use: med               Use: high
Defects:          low    med    high     low    med    high     low    med    high
Reliability low   0.10   0.20   0.33     0.20   0.33   0.50     0.20   0.33   0.70
Reliability med   0.20   0.30   0.33     0.30   0.33   0.30     0.30   0.33   0.20
Reliability high  0.70   0.50   0.33     0.50   0.33   0.20     0.50   0.33   0.10
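A minimal sketch, in Python, of how such an NPT can be used. The conditional probabilities are transcribed from the table above, while the prior beliefs about Use and latent Defects are assumed purely for illustration:

STATES = ["low", "med", "high"]

# P(Reliability | Use, Defects), transcribed from the NPT above:
# NPT[use][defects] = [P(rel=low), P(rel=med), P(rel=high)]
NPT = {
    "low":  {"low": [0.10, 0.20, 0.70], "med": [0.20, 0.30, 0.50],
             "high": [0.33, 0.33, 0.33]},
    "med":  {"low": [0.20, 0.30, 0.50], "med": [0.33, 0.33, 0.33],
             "high": [0.50, 0.30, 0.20]},
    "high": {"low": [0.20, 0.30, 0.50], "med": [0.33, 0.33, 0.33],
             "high": [0.70, 0.20, 0.10]},
}

# Assumed prior beliefs for the parent nodes (illustrative only).
p_use = {"low": 0.2, "med": 0.5, "high": 0.3}
p_defects = {"low": 0.3, "med": 0.4, "high": 0.3}

# With no evidence, marginalize over both parents:
# P(rel) = sum over u, d of P(rel | u, d) * P(u) * P(d)
p_rel = [sum(NPT[u][d][i] * p_use[u] * p_defects[d]
             for u in STATES for d in STATES) for i in range(3)]
print("No evidence:     ", {s: round(p, 3) for s, p in zip(STATES, p_rel)})

# With hard evidence (Use = high, Defects = high), the belief in
# Reliability is read straight from the corresponding NPT column.
print("High use/defects:", dict(zip(STATES, NPT["high"]["high"])))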
References
1. A Bayesian Approach to Deriving Parameter Values for a Software Defect Predictive Model; Robert Stoddard, John Henderson, Ph.D.; Sixth International Conference on the Applications of Software Measurement.
2. A Critique of Software Defect Prediction Models; Norman Fenton, Martin Neil; Centre for Software Reliability.
3. Predicting Software Errors and Defects; Mark Criscione, et al.
4. Predicting Software Quality Using Bayesian Belief Networks; Martin Neil, Norman Fenton.
5. Estimating Software Costs; Jones; McGraw-Hill, 1999.