Transcript Document

Predicting
Change Propagation
in Software Systems
Ahmed E. Hassan and Richard C. Holt
Software Architecture Group (SWAG)
University Of Waterloo
Change Propagation
• Central aspect of software development
• Mis-propagating likely to introduce bugs
• Many tools and approaches to assist:
– Dependency browsers / Visualization tools
– Slicing
– OO/AOP
• No good understanding of how changes propagate
The Change Propagation
Process
New Req., Bug Fix
“How does a change in one source code
entity propagate to other entities?”
Determine
Initial Entity
To Change
Change
Entity
Determine
Other Entities
To Change
Consult
Guru for
Advice
For Each Entity
Suggested Entity
No More
Changes
Change Propagation Example
Given a Set where A, B, C,
D changed together
A
C
D
B
Change Propagation Example
Given a Set where A, B, C,
D changed together
A
C
D
B
Change Propagation Example
Given a Set where A, B, C,
D changed together
X
A
C
D
B
Change Propagation Example
Given a Set where A, B, C,
D changed together
X
C
A
D
B
W
Y
Change Propagation Example
Given a Set where A, B, C,
D changed together
X
C
A
D
B
W
Y
Change Propagation Example
Given a Set where A, B, C,
D changed together
Precision= 2/5 = 40%
X
C
A
D
Recall = 2/3 = 66%
B
W
Y
Measuring the Performance
Pr edicted _ Entities_ That _ Changed
Pr ecision 
Pr edicted _ Entities
Pr edicted _ Entities_ That _ Changed
Re call 
Changed_ Entities
• We want:
– High Precision to avoid wasting time
– High Recall to avoid bugs
What are Good Predictors
(Heuristics)?
• Entity data:
– Historical co-change
– Code structure relations: (Call/Use/Define)
– Code layout (same file, directory)
• Developer data: prior/frequent changes
• Process data: recent changes
• Name Similarity data
Pruning Techniques
•
•
•
•
•
Frequency
Recency
Hybrid
Random
No pruning
Empirical Study
• Used change sets from 5 open source
projects with over 40 years of development:
– NetBSD, FreeBSD, OpenBSD, Postgres, GCC
• Change sets recovered from source control
repositories (CVS) using a specialized
extractor called C-REX
• Measure average Precision and Recall
Extracted Data
• CVS stores information at the line level
• We recover information at the entity level:
– Entities (func/var) added/removed
– Dependencies added/removed
• At any moment in time, we can track:
– Changed entities (change sets)
– The static dependency graph
– All developers who worked on an entity
Change Sets
• Using lexical technique we remove all
General Maintenance (indentation/copyright
update) sets
• We divide change sets into:
– Sets with new entities added
– Sets with no new entities
Studied Heuristics
• Developer (DEV)
• Entity based historical co-change (HIS)
• Entity based code structure: Call, Use,
Define (CUD)
• Entity based code layout (FIL)
Results
• CUD worst predictor:
– Only 42% of propagations are due to static depend.
• DEV has high recall (74%), very low precision
– Code ownership is not strictly enforced
• FIL has high recall (83%), low precision 12%
– But limited to same file
• HIS best predictor for propagation
– Recall 87%, low precision 6%
Results (Using Hybrid Heuristic)
• Combine HIS and FIL heuristic along with pruning:
Given a changed entity “E”, suggest all entities which
– Changed A% of time with “E”
– Changed A-B% of time with “E” AND are in the same file
Recall
51%
Precision 49%
We can suggest half of entities that should change and
half of our suggestions are correct (A=80, B=10)
Conclusion
• Simple Static dependencies are not a good
predictor of change propagation
• Historical co-change can successfully assist
in propagating changes
• Other heuristics are possible by combining
basic heuristics and pruning techniques
Special Issue of
IEEE TSE on MSR
• MSR: Mining Software Repositories
• Editors:
– Ahmed Hassan, Ric Holt, Audris Mockus, Philip Johnson
• Deadlines:
–
–
–
–
Intent to Submit: October 1, 2004
Submission Deadline: October 22, 2004
Notification of Acceptance: April 18, 2005
Publication Date: October 2005