Program Slicing on Java bytecode (with Soot
Download
Report
Transcript Program Slicing on Java bytecode (with Soot
Program Slicing on Java byte-code
for Locating Functional Concerns
Takashi Ishio† Ryusuke Niitani †
Gail Murphy‡ Katsuro Inoue †
†
Osaka University, Japan
‡ University of British Columbia, Canada
Department of Computer Science,
Graduate School of Information Science & Technology,
Osaka University
Concern Location
A functional concern is code that helps fulfill a
functional requirement.
A software
maintenance task usually focuses on a
functional concern.
Concern location comprises “Search and Explore.”
Search
grep or other feature location tools
Explore
“interesting” methods
the interaction among the methods
call graph, class hierarchy tree, cross reference
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Example: Autosave function in jEdit
jEdit periodically saves the contents of text area.
A user
can specify the frequency.
We can easily find
Autosave class,
Buffer.autosave() method and
BufferIORequest.autosave() method.
How the classes and methods are interacting?
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Exploring Interaction among methods
Important information: control-flow and data-flow.
Which
method triggers the autosave function.
Which class has a necessary data (e.g. filename).
How a method saves the contents to a text file.
We have to read following classes:
Autosave, Buffer, BufferIORequest,
PerspeciveManager, VFSManager, FileVFS …
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Automated Concern Location
We are trying to extract
a concern graph from
code fragments specified
by a developer.
Our
approach is based on
program slicing.
Our tool is based on Soot,
a Java bytecode analysis
framework.
Code fragments
related to a functionality
Program Slicing
with Heuristics
a program slice
Slice-to-ConcernGraph
Translation
A concern graph
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Autosave concern graph
Input = Autosave.*(), Buffer.autosave(),
BufferIORequest.autosave()
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Program Slicing
Slicing extracts statements
related to criteria statements
specified by a user.
1.
A program P is converted to
a program dependence graph.
2.
use
vertices: statements in P
edges: control/data dependence relations
A user specifies “slicing criteria” statements in P.
3.
data
dependence
definition
1 i = 3;
2 if (a > 0) { control
3 print i;
dependence
<3,i>
4 }
The statements are translated into “criteria vertices” in the PDG.
A program slice, a set of statements that affect or depend on
criteria, is extracted by graph traversal from criteria vertices.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slice including unrelated concerns
Slicing usually extracts many statements.
A functional
unit is connected to other units by
control/data-flow.
28% on average in C program†
slicing
activate
Autosave
reset
set/reset
autosave_dirty
flag
UndoManager
set
CompleteWord
† Binkley, D., Gold, N. and Harman, M.: An Empirical Study of Static
Program Slice Size. ACM TOSEM Vol.16, No.2, Article 8, April 2007.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slicing with Barriers
A barrier is a vertex or an edge that
terminate graph traversal†.
slicing
activate
Autosave
reset
A barrier blocks graph traversal.
set/reset
autosave_dirty
flag
UndoManager
set
CompleteWord
† Krinke, J.: Slicing, Chopping, and Path Conditions with Barriers.
Software Quality Journal, Vol.12, No.4, pp.339-360,December 2004.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Similarity-based Barrier
The key idea is following:
if two methods are contributing to the same functionality,
the methods use similar methods, fields and classes.
Name Set NS(m) = a set of types, classes, methods and fields referred in m.
A long name is “tokenized”.
e.g. “java.io.File” “java”, “io”, “File”, “java.io.File”
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Example of Similarity
package org.gjt.sp.util;
NS(IntegerArray.add)
class IntegerArray {
org.gjt.sp.util.IntegerArray,
private int[] array;
org, gjt, sp, util, integer, array,
private int len;
void, add, int, len, int[],
public void add(int num) {
java.lang.System, java, lang, system,
if(len >= array.length) {
arraycopy
int[] arrayN = new int[len * 2];
System.arraycopy(array,0,arrayN,0,len);
sim = 0.639
array = arrayN;
NS(IntegerArray.getSize)
}
org.gjt.sp.util.IntegerArray,
array[len++] = num;
org, gjt, sp, util, integer, array,
}
getSize, get, size, int, len
public final int getSize() {
return len;
NS(IntegerArray.setSize)
sim = 0.801
}
org.gjt.sp.util.IntegerArray,
public final void setSize(int len) {
org, gjt, sp, util, integer, array, len,
this.len = len;
setSize, set, size, void, int
}}
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Identifying Barriers
Program slicing is blocked at method m
if m is not related to slicing criteria
Similarity(m, C) ≦ threshold
C = a set of methods that contain slicing criteria vertices.
A method
m is related to slicing criteria if
slicing criteria includes a method n
such that m is similar to n.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slicing algorithm
Slicing with summary edges
and barriers
defined
by Horwitz
extended by Krinke
PDG based on Jimple code
“jimple”
is an intermediate
representation for bytecode.
Code fragments
related to a functionality
Calculate similarity
for each method
Identify barriers
3-address code
Slicing with Barriers
Simple control-flow: “if” and “goto”
Independent of JVM stack operation
a program slice
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Visualizing a slice as a concern graph
Concern Graph
A vertex is a class, a method or a field.
An edge represents a relation between two vertices.
call, create, check, read, write, superclass, …
applied rule-based translation.†
We
Slice
v1 in m1
Concern Graph
v2 in m2
call
m2
m1
call or parameter
v1 in m1
read
m1
field
READ obj.field
† Kameda, D. and Takimoto, M.: Building Cocnern Graph Based on
Program Slicing. IPSJ Transactions on Programming, Vol.46, No.11 (Pro
26), pp.45-56.
inScience,
Japanese.
Department
of Computer
Graduate School of Information Science & Technology, Osaka University
A graphical output with Graphviz
We omit intra-class edges in graphical format.
Detail
is provided in textual format.
e.g. “Autosave.setInterval(interval) calls
new Timer(interval, Autosave).”
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
The effectiveness of barriers
Barriers reduced concern graph size:
methods 20 methods
Printable on an A3 or A4-sized paper
1000
Comparing extracted graphs
with hand-made concern
graphs (not finished yet).
concern graph size on
6 maintenance tasks
on jEdit and our Slicer
Our previous experiment is reported in:
仁井谷竜介,石尾隆,井上克郎:
プログラムスライシングを用いた機能的
関心事の抽出手法の提案と実装.
PPL 2007. in Japanese.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Information extracted from Java program
To construct a dependence graph
Control
dependence relation
Data dependence relation
Call Graph (with dynamic binding information)
To identify barriers
a
set of types, methods, fields referred in each
method m
To slice the dependence graph
Mapping
source code to vertices
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slicing Tool Overview
Soot Framework (http://www.sable.mcgill.ca/soot/)
Java
Class
Files
Jimple
Translator
Jimple
3-Address
Code
SPARK
Points-to Set
Analysis
Control-Flow
Data-flow
Analysis
Call Graph
Points-to Set
Annotated
Jimple
PDG
Constructor
Slicing
Criteria
Concern
Graph
Slicer
PDG
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Our effort to implement the system
The program size
PDG Construction: 2731 LOC (without comments)
Slicing: 9296 LOC (without comments)
slicing algorithms, heuristic functions and concern graph translation
We could implement the PDG construction phase
in two weeks:
One week to understand how Soot works.
The other week to implement code.
Soot enabled us to focus on the essential part of
the research idea.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Advantage of Soot
A rich
analysis toolkit
Soot
provides control-flow and data-flow for each
method.
Jimple is simpler than source code and bytecode.
Complex
Java statements are simplified during
compilation.
Control-flow
Exceptional
UnitGraph
Data-flow
use Method
Body
use
SmartLocalDefs
1
n
Unit
1
n
Value
is-a
Stmt
(Jimple code)
is-a
Expr
Local
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Limitation of Soot
Soot is not a program analysis framework.
Soot
keeps all data in memory to compile Jimple code to
bytecode after the optimization.
Soot requires 2-4GB RAM to analyze jEdit and JDK.
Soot
supports only the simple workflow: whole program
analysis (call-graph construction) followed by local
program analysis.
We cannot implement a statistics tool (whole-program analysis)
that uses the result of method-local analysis.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Summary
Concern location based on program slicing
We
introduced heuristics in order to extract a functional
concern of interest to a developer.
Input is the same as a traditional program slicing.
Most
of graphs can be printed on an A3-sized paper.
Soot framework reduced the implementation effort.
Soot
is a good framework, but
we hope a framework specialized for program analysis.
easy-to-learn, extensible and scalable
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University