Transcript Document

Week 3 Presentation
Istehad Chowdhury
CISC 864
Mining Software Engineering Data
Research Paper
Who Should Fix This Bug?
John Anvik, Lyndon Hiew and Gail C. Murphy
Department of Computer Science
University of British Columbia
{janvik, lyndonh, murphy}@cs.ubc.ca
Problem with Open Bug Repository

Overall: coping with the surge of bug reports in large open-source projects.

- Many bug reports are invalid or duplicates of other reports (Eclipse: 36%)
- "Everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle."
- Every bug report should be triaged:
  - to check validity and duplication
  - to assign the bug to an appropriate developer
Problem (cont.)

- The triager may not be sure whom to assign the bug to.
- A lot of time is wasted in reassigning and re-triaging bugs.
  - 24% of reports in Eclipse are reassigned.
The Research Work

- Goal: suggest whom to assign a bug to.
- Technique: data mining and machine learning.
- Result: 60% precision and 10% recall.
Precision and Recall
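
For reference, the standard measures, phrased here in recommendation terms (this phrasing is mine, not the authors'):

\[
\text{precision} = \frac{|\text{relevant} \cap \text{recommended}|}{|\text{recommended}|},
\qquad
\text{recall} = \frac{|\text{relevant} \cap \text{recommended}|}{|\text{relevant}|}
\]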
Life Cycle of a Bug Report
Roles

- Reporter/Submitter
- Resolver
- Contributor
- Triager

The roles are overlapping.
Approach to the Problem

Semi-automated:
1. Characterizing bug reports
2. Assigning a label to each report
3. Choosing reports to train the supervised machine learning algorithm
4. Applying the algorithm to create the classifier for recommending assignments

A sketch of this pipeline appears below.
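
A minimal sketch of the pipeline, assuming TF-IDF text features and a linear SVM in scikit-learn; the paper used SVMs, but the library choice, the toy reports, and the developer names below are my own illustrative assumptions.

```python
# Sketch only: train a text classifier on resolved reports labeled with
# the developer who fixed them, then rank developers for a new report.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: report summary/description text plus the
# developer label produced by the heuristics (next slide).
reports = [
    "crash when opening large project in editor",
    "null pointer exception in debugger step-over",
    "toolbar icons render blurry on high-dpi displays",
    "crash while indexing workspace on startup",
]
developers = ["alice", "bob", "carol", "alice"]

clf = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
clf.fit(reports, developers)

# Recommend assignees for an untriaged report by ranking the classifier's
# per-developer decision scores and taking the top few.
scores = clf.decision_function(["editor crashes opening a project"])[0]
ranked = sorted(zip(clf.classes_, scores), key=lambda p: -p[1])
print("suggested assignees:", [dev for dev, _ in ranked[:3]])
```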
Heuristics for Labeling Bug Reports

- FIXED: whoever provided the last approved patch (Firefox)
- FIXED: whoever marked the report as resolved (Eclipse)
- DUPLICATE: whoever resolved the report this one duplicates (Eclipse and Firefox)
- WORKSFORME: unclassifiable (Firefox)

A code sketch of these heuristics follows.
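
A hedged sketch of these heuristics as code; the record fields (resolution, project, resolver, last_approved_patch_author, duplicate_of) are hypothetical names, not the paper's actual schema.

```python
# Sketch only: map a resolved report to the developer label used for training.
def label_report(report, reports_by_id):
    """Return the developer to credit for a resolved report, or None."""
    if report["resolution"] == "FIXED":
        if report["project"] == "Firefox":
            # Firefox: credit whoever provided the last approved patch
            return report["last_approved_patch_author"]
        # Eclipse: credit whoever marked the report as resolved
        return report["resolver"]
    if report["resolution"] == "DUPLICATE":
        # Credit whoever resolved the report this one duplicates
        return label_report(reports_by_id[report["duplicate_of"]], reports_by_id)
    # WORKSFORME and other resolutions: unclassifiable
    return None

reports_by_id = {
    1: {"resolution": "FIXED", "project": "Eclipse", "resolver": "alice"},
    2: {"resolution": "DUPLICATE", "duplicate_of": 1},
}
print(label_report(reports_by_id[2], reports_by_id))  # -> alice
```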
Experimental Results

[Figure: recommender accuracy and recall]
Validating Results with GCC

- Why is the result so poor?
- Why is recall low in all cases, especially for gcc?
- Suggests the technique requires similarity in the nature of the projects.
Trying Alternatives

- Unsupervised machine learning
- Incremental machine learning (sketch below)
- Incorporating additional sources of data
- Component-based classifier
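
A minimal sketch of the incremental alternative, assuming an online linear model updated with scikit-learn's partial_fit as new resolved reports arrive; the paper names the idea, but this concrete realization is my assumption.

```python
# Sketch only: fold newly resolved reports into the model without retraining
# from scratch.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16, stop_words="english")
clf = SGDClassifier(loss="hinge")  # linear SVM trained online
all_developers = ["alice", "bob", "carol"]  # hypothetical label universe

def learn_batch(texts, labels):
    """Update the classifier with one batch of newly labeled reports."""
    X = vectorizer.transform(texts)
    # partial_fit requires the full class list (on the first call)
    clf.partial_fit(X, labels, classes=all_developers)

learn_batch(["crash when opening large project"], ["alice"])
```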
Component-Based Classifier
Points to Ponder

- Are new developers assigned any bugs?
- "Needs further study of the context to which it can be applied" (an empirical-research caveat)
Points to Ponder (cont.)

- Were there enough instances to evaluate using cross-validation? (see the sketch below)
  - 75% of Firefox developers and 86% of gcc developers have fewer than 100 reports.
- Why was the labeling mechanism more successful for gcc and Eclipse than for Firefox?
  - 1% for Eclipse vs. 47% for Firefox.
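
A small sketch of the instance-count concern: before trusting k-fold cross-validation, check how many labeled reports each developer actually has. The threshold and data are my own illustration.

```python
# Sketch only: count labeled reports per developer and keep only those with
# enough instances for a meaningful cross-validation split.
from collections import Counter

def developers_with_enough_reports(labels, min_reports=100):
    """Return developers having at least min_reports labeled reports."""
    counts = Counter(labels)
    return {dev for dev, n in counts.items() if n >= min_reports}

labels = ["alice", "bob", "alice", "carol", "alice"]  # hypothetical
print(developers_with_enough_reports(labels, min_reports=3))  # {'alice'}
```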
Points in Favor

- The research work was very intensive and thoroughly conducted.
- Honest in identifying the limitations, and smart in pointing out future work.
- It opens up interesting doors for future research.
Points Against

- The study may not be suitable for an environment where the active set of developers changes frequently.
- The findings are quite project-specific, and the approach works well only on reports of "actual bugs".
Points Against (cont.)

- Any naivety in the heuristics propagates into the heuristics-based filtering used to train the classifier.
- I liked the inclusion of the lessons-learned section; however, the authors should have explained in more detail how the mappings were done.
Concluding Remarks

- The approach shows promise for improving bug assignment in open-source projects.
- "Coordinating bug reports and CVS is challenging."
- The effort is worth praising.
- It identifies the need for further research.
Questions and Comments?