Memories of Bug Fixes
Download
Report
Transcript Memories of Bug Fixes
Memories of Bug Fixes
Sunghun Kim, Kai Pan, and E. James Whitehead Jr.,
University of California, Santa Cruz
Presented By Gleneesha Johnson
CMSC 838P, Fall 2006
Source Code Repositories
• Collection of code changes
• Most long developed projects have one
• Typically used to store histories and make
backups
• Knowledge contained within hasn’t been
fully leveraged
– Previous development experience
– Changes that fix bugs are particularly
interesting
Horizontal Bug Finding Techniques
• Applicable across all projects
• Based on pre-defined bug patterns,
theorem proving, and model checking
• ESC/Java, FindBugs, JLint, PMD
Vertical Bug Finding Techniques
• Aim for project-specific bugs
• Different project requirements, business logic, and
semantics
• BugMem
– Learn bug patterns over time
Bug Fix Memories
• Project-specific bug and fix knowledge
base
– Developed by analyzing history of bug fixes
– Can be used to detect bugs and suggests
fixes
Building Bug Fix Memories
• Extract source code, change logs, and deltas
– Kenyon
• File change
– Contains list of region pairs that show differences
between two file versions.
– Regions called hunks
• Consist of source code lines
• Identify changes where a bug was fixed
– Keyword search
– Search for references to bug reports
Hunks
• Hunk pair (HP) → deleted hunk (DH) (bug
hunk (BH)) & added hunk (AH) (fix hunk
(FH))
Extracting Components From
Hunks
• Extracts syntax patterns, components,
from hunks
• Parsing – extracts syntax components
• Normalizing – generalizes syntax to match
similar code
• Filtering – eliminates noise
– Information filtering
– Diff filtering
Raw Component Extraction
(Parsing)
• Preprocessing
• Basic syntax lines
– Differentiate between composite statements (if, for,
while) and simple statements (method call,
assignment, etc.)
– all multi-line simple statements are on a single line.
– conditional predicates of if, for, while, etc. all lie on a
single line.
• Raw components
– Based on abstract syntax tree of basic syntax line
– 4 kinds: static Java call, Java call, user-defined call,
and non-call
Normalization
• If the variable type of a raw component is
known, normalize variable
– i.e., foo.flag → Foo.flag
• Generate two components for raw
components that contain numeric,
boolean, char or string literals
– One with and one without normalization for
the literal
– i.e., i = 1 → int = 1 & int = int
Normalization Continued
• For a method call, actual parameters are
normalized to the type of the parameter
• Normalization level
– Indicates a component’s degree of
normalization amongst others extracted from
same raw component
– Level increases with degree of normalization
Examples
Information Filtering
• Information value – indicates how much
unique information a component carries
– Determined by summing information value of
its elements
• Information value threshold – used to filter
components with little unique information
– Two is used as a threshold in the paper
Diff Filtering
• Removes code unchanged between bug
and fix hunks
Searching Memories
• Bugs found by searching for matching
patterns in bug hunks
• Suggestions made by returning code in
corresponding fix hunks
• Several options for component searching
– Adjust degree of matching and omission of
very common components
Options
Evaluation
Evaluation Continued
• Determine half and full hit rates
– Half hit – bug previously seen
– Full hit – bug and fix previously seen
• Five open source projects
Evaluation Continued
Evaluation Continued
Evaluation Continued
Comparison With PMD
Limitations
• Missing Memories
• Not applicable in initial stage of a project
• Doesn’t catch cross-file relationships
Uses
• Bug Finding Tool
– BugMem
– Can be integrated into Eclipse
• Code example repository
– Useful to developers new to a project
The End!
Questions?