Perspectives on fault data quality

Tracy Hall
Reader in Software Engineering
Brunel University
Two short talks on this topic…
Schedule

- Why are we interested in fault data?
- What are the problems with the quality of fault data?
- How did we investigate the quality of fault data?
- What does all this mean?

The 3rd CREST Open Workshop,
27th January 2010

Why are we interested in fault data?

- The analysis of historical fault data could enable us to predict potential fault hotspots in code
- Lots of previous studies have analysed fault data:
  - OSS repositories, NASA data, industrial data
- Only a few directly address the problem of extracting reliable fault data:
  - Zimmermann & Zeller group (Saarland)
  - Ostrand & Weyuker

What are the problems with the quality of fault data?

- Very little direct fault data is available
- Little use is made of bug reporting repositories
- Mining faults from change repositories is problematic:
  - Identifying all elements of one change
  - Separating fault-fixing changes from other changes
  - Indirect relationship between fault fixes and faults
- Problems are exacerbated by:
  - Size of the change repository
  - Reliability of the data in the repository

How did we investigate the quality of fault data?

- Performed a small study using the open-source Barcode project
- Chose Barcode because it was used by Meyers & Binkley to investigate the use of program slicing metrics
- Identified sets of fault data using three methods:
  1. Manual analysis of change diffs
  2. Keyword search
  3. Size of change search

1. Manual analysis of change diffs

- Manually analysed 199 change diffs
- Three researchers independently classified each as either:
  - Fault fix (F)
  - Not fault fix (NF)
  - Don’t know (DK)
- Inter-rater reliability score (kappa) computed for agreement level
- Planned to use this as the baseline fault data set
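
Pairwise agreement between researchers (F = fault fix, DK = don’t know, NF = not fault fix)

1. Researcher 2 (rows) vs Researcher 1 (columns)

           F   DK   NF
     F    37    0   33
     DK   13    0   31
     NF    8    0   77

   114/199 agreements, kappa .28

2. Researcher 2 (rows) vs Researcher 3 (columns)

           F   DK   NF
     F    31    7   32
     DK   15    5   24
     NF    8   39   38

   74/199 agreements, kappa .027

3. Researcher 3 (rows) vs Researcher 1 (columns)

           F   DK   NF
     F    29    0   25
     DK    8    0   43
     NF   21    0   73

   102/199 agreements, kappa .17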
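
The agreement figures above can be checked with a short calculation. The sketch below assumes the reported statistic is Cohen's kappa for two raters, which the slides do not state explicitly, and uses the counts from the comparison that reports 114/199 agreements. The function name `cohen_kappa` and the row/column layout are illustrative choices, not taken from the talk.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square agreement matrix.

    Rows hold one rater's categories, columns the other's,
    both in the same category order.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of the two raters' marginal totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Counts from the first comparison; category order is F, DK, NF.
first_comparison = [
    [37, 0, 33],
    [13, 0, 31],
    [8, 0, 77],
]

agreements = sum(first_comparison[i][i] for i in range(3))
kappa = cohen_kappa(first_comparison)
print(agreements, round(kappa, 2))
```

With these counts the diagonal sums to 114 and kappa rounds to .28, matching the figures reported for that comparison.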
2. Keyword search

Keyword search results

Diff results (rows) vs change log analysis (columns)

           F   NF
     F    21   26
     NF    6   78

99/131 agreements, kappa .4
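
A keyword search over change log messages might look like the sketch below. The slides do not list the keywords used in the study, so the keyword list here is purely hypothetical, as are the names `FAULT_KEYWORDS` and `classify_log_entry`. It only illustrates the general idea of labelling a change as a fault fix (F) or not (NF) from its log text.

```python
import re

# Hypothetical keyword list; the study's actual keywords are not given.
FAULT_KEYWORDS = ("fix", "bug", "fault", "defect")

# Match a keyword only at the start of a word, so "fixed" and "bugfix"
# match but "prefix" does not.
_PATTERN = re.compile(r"\b(" + "|".join(FAULT_KEYWORDS) + r")", re.IGNORECASE)


def classify_log_entry(message):
    """Return "F" if the log message mentions a fault keyword, else "NF"."""
    return "F" if _PATTERN.search(message) else "NF"


# Made-up log messages for illustration only.
labels = [
    classify_log_entry("Fixed crash when encoding an empty string"),
    classify_log_entry("Add prefix option to command line"),
    classify_log_entry("bugfix for checksum digit"),
]
print(labels)
```

Labels produced this way can then be cross-tabulated against the manual diff classification to give an agreement table like the one above.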
3. Size of change search

Diff results (rows) vs fixed window (columns)

           F   NF
     F    25   22
     NF   19   65

90/131 agreements, kappa .3

Diff results (rows) vs sliding window (columns)

           F   NF
     F    29   18
     NF   21   63

92/131 agreements, kappa .3
What does all this mean?