Transcript Document
Automatic Program Repair
With Evolutionary Computation
Westley Weimer
Computer Science Dept.
University of Virginia
Charlottesville, VA 22904
[email protected]
Stephanie Forrest
Claire Le Goues
ThanhVu Nguyen
Dept. of Computer Science
University of New Mexico
Albuquerque, NM 87131
[email protected]
Computer Science Dept.
University of Virginia
Charlottesville, VA 22904
[email protected]
Dept. of Computer Science
University of New Mexico
Albuquerque, NM 87131
[email protected]
Presented by:
Teodoro Rosati
CIS 601, Spring 2014
March 4, 2014
The material in this paper is taken from two original
publications, titled “A Genetic Programming Approach to
Automated Software Repair” (Genetic and Evolutionary
Computation Conference, 2009) and “Automatically Finding
Patches Using Genetic Programming” (Proceedings of the
2009 IEEE 31st International Conference on Software
Engineering, IEEE Computer Society).
What’s the Problem?
Finding bugs is relatively easy…
• Famous Costly Bugs
– FDIV Intel Pentium processor (1994): $500 million
– floating point unit had a flawed division table
– Y2K (1999). Cost: $500 billion
– 2 digit YR storage (e.g. 95) and “00” = 1900
– Mars Climate Crasher (1998). Cost: $125 million
– Imperial Units (lbs of force) Metric Units (Newtons)
• Many techniques and software solutions for
detecting and mitigating software errors:
–
–
–
–
Syntactic Bug Pattern Detection
Decompilation and Data Flow Analysis
Automated Theorem Proving
Model Checking
2
I Found The Bug,
Here’s the catch…now I have to fix it
3
One solution…
Manual Program Repair
• Up to 90% of total cost of a software project for
maintenance after delivery
– Modifying existing code, repairing defects,
maintaining code during its lifecycle
– Products often are
shipped with known
and unknown bugs
because lack of
development resources
4
A better solution…
Automatic Program Repair:
Traditional Program Analysis Methods
Evolutionary Computation (Genetic Programming)
5
Experimental Design
• Genetic Programming
• Evolves computer programs tailored to a task
• Meaning a program is modified using similar
pathways to genetic evolution (mutation/crossover)
• GP techniques have been applied to
unannotated off-the-shelf legacy C programs
• Individual variants V with the highest fitness are
selected for continued evolution
6
Evolutionary Computation
Genetic Programming
• Inspired by biological natural selection
– Endler’s guppy experiment:
• Diverse source population
– Guppies variously colored
• Natural selection on population
– Habitat variation, coarse vs. fine gravel
– Predator presence, sight based
– Evolution of drug resistant bacteria
• Source = normal + resistant bacteria
• Antibiotics promote resistant bacteria
7
Evolutionary Computation
Genetic Programming
• Natural selection and programs
– Population of Program Variants
• Variants/Individuals
8
Genetic Programming Repair
Technical Approach
1. What is it doing wrong?
– Input a set of negative (-) test cases that characterizes a
fault
2. What is it supposed to do?
– Input a set of positive (+) test cases that encode
functionality requirements
3. Where should we change it?
– Program locations of the (-) test cases
4. How should we change it?
– Insert, delete and swap program statements and control
flow. Insertions are preferred
5. When are we finished?
– First variant that passes (+) and (-) cases
– Minimize differences between variant and original
9
AST
Automatic Program Repair
Representation
• Abstract Syntax Tree (AST)
– C programs
• Genes: Statements are basic units
– Conditional “if (x>y) {max = x;}”
• Expressions within a statement
– “{max = x;}”
– Selection Actions:
Insert, Delete, Swap of Genes
____________________________
If ( x > y )
{
max = x;
}
If
>
Gene
x
=
y
max
10 x
Genetic Programming
Mutation Operators
Mutation Operator
– Insert, Delete or Swap a gene
11
Genetic Programming
Crossover Operators
Crossover Operator
– Between 2 Parent Sub-Trees
Crossback Operator
– Between Variant V and Parent
Measuring Fitness
• Variants are compiled
• Testcase evaluated in virtual machine/sandbox
• Fitness measured using formula:
fitness(P) = WPosT { t PosT P passes t }
+
WNegT { t NegT P passes t }
Note:
WPosT = weight of each successful positive test
WNegT = weight of each successful negative test
13
For example…
• December 31, 2008
– A bug was reported in Microsoft Zune
media players
• Zune would freeze when the value of the
input days is the last day of a leap year
(e.g. 10,593) INFINITE LOOP!
14
1.
What is it doing wrong? Negative
Test Cases
• Negative Test Case
– input days set to 10,593 (last day of leap year)
– program executes lines 1 – 16
– then repeats lines 3, 4, 8 and 11 infinitely
2.
What’s it supposed to do? Positive
Test Cases
• Positive Test Case
– input days set to 1,000 (non-leap year)
– program executes lines 1–8, 11-18 once as
expected
3.
Where should we change it?
• Program Locations visited when
executing the negative test cases
– lines 3, 4, 8 and 11
4.
How should we change it? INSERT
• Insert an entire statement or
gene
– stmtj is added to stmti
stmtj
4.
How should we change it? DELETE
• Delete an entire statement
or gene
– stmti is transformed into an
empty block statement
stmti
stmti
4.
How should we change it? SWAP
• Swap an entire statement
or gene
– Second statement stmtj is
chosen uniformly at random
from anywhere in the
program to replace stmti
stmti
stmtj
stmti
stmtj
5.
When we are finished? Minimize
Differences
• Variant V program passes all the test cases
– Minimization Step to discard unnecessary changes
– Average repair time = 42 seconds
21
Performance of the Zunebug Repair…
• Evolution of the Zunebug repair 1 GP trial
– The darker curve plots the average fitness of the population
– The lighter curve plots the fitness of the individual V primary repair
22
Performance of the Zunebug Repair…
• Evolution of the Zunebug repair with 20 positive
and 4 negative test cases (equally weighted)
– The boxes represented the average over 70 distinct trials
– The error bars represent one standard deviation.
23
Performance of the GP Algorithm…
Eleven defects repaired by Genetic Programming
24
Performance of the GP Algorithm…
• GP search time scales with weighted path size.
– 18 programs successfully repaired by GP (ave. of 100 runs)
– x-axis log10 of the weighted path length
– y-axis log10 of the total number of fitness evaluations
25
Caveats…
• Limitations (assumptions)
•
•
•
•
•
defect is reproducible
program behaves deterministically on test cases
postive test cases encode program requirements
no overlap in path taken by negative and postive test cases
existing program can provide repair statements
• Evolution
• Not rigorously tested (parameter values, selection strategies,
and operator design)
• Fault Localization
• critical to find viable fixes, but poorly understood
• Fitness Function
• Oversimplification
• Repair Quality
• dependent on a high-quality set of positive test cases
26
Future work…
• Generic set of repair templates for GP as source
code for mutations
• Extend with data structure definitions and variable
declarations
• Assembly- and bytecode-level repairs
• Testing on more sophisticated errors
– Race conditions
• Assessing size and distribution of bugs for
targeting
27
Questions
28