Transcript Slide 0

Process Mining Software Repositories
Wouter Poncin
Alexander Serebrenik
Mark van den Brand
Data sources: versioning system, mail archive, bug repository, wiki, Twitter, …
FRASR: FRamework for Analysing Software Repositories
Log (example entry): Sept. 23, 2010: Alice sends a bill to the owner of the vehicle 12-ST-VH.
Process mining: analysis and visualization tools
Model of software development
• Separation of preprocessing and analysis
• Reuse of existing techniques
⇒ Flexibility
/ Department of Mathematics and Computer Science
20-7-2015
PAGE 1
Business processes vs. Software processes
Business processes
• One data source
• “Natural case” (association of different events): claim id, person id, vehicle id
• Unique data representation
• Explicit events
Software processes
• Multiple data sources
• Many different options for the case: files, developers, topics…
• Different representations in different data sources
• Implicit events: is the mail relevant to the bug report?
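The "implicit events" point can be made concrete with a small sketch. It assumes that a mail counts as relevant to a bug report when its text mentions a known bug id. The pattern and the function name are hypothetical illustrations, not FRASR's actual rule.

```python
import re

# Hypothetical pattern: mentions such as "bug 123", "bug #123" or "#123".
BUG_REF = re.compile(r"(?:\bbug\s*#?|#)(\d+)", re.IGNORECASE)

def referenced_bugs(mail_text, known_bug_ids):
    """Return the known bug ids that the mail text mentions explicitly."""
    mentioned = {int(m) for m in BUG_REF.findall(mail_text)}
    # Keep only ids that really exist in the bug repository.
    return sorted(mentioned & set(known_bug_ids))
```

A mail that mentions no known id stays unlinked, which is exactly the ambiguity the slide points at.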
PAGE 2
How do we apply FRASR?
S.E. question → FRASR → ProM → Answer
FRASR steps: Define data sources → Define case mapping → Attach event bindings → Calculate developer matching → Export
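A minimal sketch of how data sources, event bindings and a case mapping could combine into an event log. The record layouts, function names and sample data are all hypothetical and do not reflect FRASR's real interfaces.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records from two data sources.
commits = [{"author": "alice", "date": "2010-01-05", "action": "file added"}]
mails = [{"sender": "alice", "date": "2010-01-03"},
         {"sender": "bob", "date": "2010-01-04"}]

# Event bindings: one extraction function per data source type.
def bind_commit(record):
    return {"who": record["author"],
            "time": datetime.fromisoformat(record["date"]),
            "activity": "Versioning: " + record["action"]}

def bind_mail(record):
    return {"who": record["sender"],
            "time": datetime.fromisoformat(record["date"]),
            "activity": "Mail"}

def build_log(sources, case_of):
    """Group bound events into traces (one per case), ordered by time."""
    log = defaultdict(list)
    for records, bind in sources:
        for record in records:
            event = bind(record)
            log[case_of(event)].append(event)
    for trace in log.values():
        trace.sort(key=lambda e: e["time"])
    return dict(log)

# Case mapping: here the case is the developer.
log = build_log([(commits, bind_commit), (mails, bind_mail)],
                case_of=lambda e: e["who"])
```

Changing `case_of` (for example to a file or a topic) yields a different log from the same sources, which is the flexibility the workflow aims for.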
PAGE 3
Case study 1: Developer roles
• Multiple sources + process mining (analysis)
• S.E. question
• Classify developers according to their roles
• Classification of Nakakoji et al. IWPSE 2002: 8 roles
• Core member: involved for a relatively long period and made significant contributions to the development and evolution of the system
− ≥ 3 years (project run: ≈ 8 years)
− Version control: file added, file modified
− More version control events than average
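The three core-member criteria above could be checked along these lines. This is a sketch under assumptions: the 3-year threshold is read as a lower bound, the activity period is measured between the first and last relevant event, and the function name and input layout are hypothetical.

```python
from datetime import date

def is_core_member(vc_events, average_vc_events, min_years=3):
    """vc_events: list of (date, kind) version-control events of one developer."""
    # Only "file added" and "file modified" events count, as on the slide.
    relevant = [d for d, kind in vc_events
                if kind in ("file added", "file modified")]
    if not relevant:
        return False
    # Assumption: the period of involvement spans first to last relevant event.
    active_years = (max(relevant) - min(relevant)).days / 365.25
    return active_years >= min_years and len(relevant) > average_vc_events
```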
PAGE 4
Case study 1: System under investigation
• aMSN: instant messaging application
• 38 million downloads, 20th most popular at SourceForge
• February 26, 2002 – July 9, 2010
• 7 bug repositories: 3137 bug reports
• 3 mail archives: 34947 messages
• Subversion: 12062 commits
PAGE 5
Case study 1: FRASR configuration
• We are interested in developers ⇒ case = developer
• Each data source type requires a specific extraction technique ⇒ event binding = type-specific
• 1725 developers ⇒ matching = heuristic
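The slides do not spell out the matching heuristic, so the following is only one possible sketch. It assumes that identities from different sources belong together when a normalised key (the e-mail local part if present, otherwise the name) coincides. All names here are hypothetical.

```python
import re
from collections import defaultdict

def identity_key(name, email):
    """Hypothetical normalisation: prefer the e-mail local part, else the name."""
    base = email.split("@")[0] if email else name
    return re.sub(r"[^a-z0-9]", "", base.lower())

def match_developers(identities):
    """Group (source, name, email) identities that share a normalised key."""
    groups = defaultdict(list)
    for source, name, email in identities:
        groups[identity_key(name, email)].append((source, name))
    return dict(groups)
```

A heuristic like this can merge distinct people or split one person, so its output would need manual review on a real project.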
PAGE 6
Case study 1: Results
[Figure: chart with axes Time and Developers]
PAGE 7
Case study 1: Results
ProM Dotted Chart visualization
Legend:
• Versioning: file added
• Versioning: file modified, renamed or deleted
• Bug ticket created
• Other bug events
• Mail
PAGE 8
Case study 1: Results
Core developers (examples)
Problem in the original classification
Peripheral developers
Bug reporter
PAGE 9
Case study 1: Classification
Role                   #developers
Bug reporter           1443
Bug fixer              3
Peripheral developer   29
Active developer       6
Core member            7
Project leader         3
Other                  234
Total                  1725
• Bugs are usually fixed by peripheral developers
• Only ticket-commented or mail-reply
PAGE 10
Case study 2: Bug life cycle in Bugzilla
Theory according to the Bugzilla Guide
• One source + process mining (mining)
• S.E. question: Is Bugzilla used the way it is supposed to be?
PAGE 11
Case study 2: Bug life cycle in Bugzilla
Practice vs. Theory
Process model mined from GCC Bugzilla (42373 bugs)
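Comparing practice with theory can be sketched as counting directly-follows transitions in the observed bug status traces and flagging those that a reference model does not allow. The reference model below is a simplified, hypothetical subset and not the full life cycle from the Bugzilla Guide. ProM's mining algorithms are more elaborate than this count.

```python
from collections import Counter

# Simplified, hypothetical reference model (not the full Bugzilla Guide life cycle).
ALLOWED = {
    ("UNCONFIRMED", "NEW"), ("NEW", "ASSIGNED"), ("ASSIGNED", "RESOLVED"),
    ("NEW", "RESOLVED"), ("RESOLVED", "VERIFIED"), ("RESOLVED", "REOPENED"),
    ("VERIFIED", "CLOSED"), ("REOPENED", "ASSIGNED"),
}

def directly_follows(traces):
    """Count how often status a is directly followed by status b."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts

def deviations(traces, allowed=ALLOWED):
    """Return observed transitions that the reference model does not allow."""
    return {t: n for t, n in directly_follows(traces).items()
            if t not in allowed}
```

Each trace is the ordered list of statuses one bug went through. Frequent deviations suggest that the tool is used differently from the documented life cycle.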
PAGE 12
Conclusions
Process mining software repositories
= separation of preprocessing and analysis
⇒ flexibility and reuse of the existing techniques
? → FRASR (sources, case, events, developers, export) → ProM → !
PAGE 13