Program Slicing on Java bytecode (with Soot

Download Report

Transcript Program Slicing on Java bytecode (with Soot

Program Slicing on Java byte-code
for Locating Functional Concerns
Takashi Ishio† Ryusuke Niitani †
Gail Murphy‡ Katsuro Inoue †
†
Osaka University, Japan
‡ University of British Columbia, Canada
Department of Computer Science,
Graduate School of Information Science & Technology,
Osaka University
Concern Location

A functional concern is code that helps fulfill a
functional requirement.
 A software
maintenance task usually focuses on a
functional concern.

Concern location comprises “Search and Explore.”
 Search

grep or other feature location tools
 Explore

“interesting” methods
the interaction among the methods
call graph, class hierarchy tree, cross reference
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Example: Autosave function in jEdit

jEdit periodically saves the contents of text area.
 A user
can specify the frequency.

We can easily find
Autosave class,
Buffer.autosave() method and
BufferIORequest.autosave() method.

How the classes and methods are interacting?
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Exploring Interaction among methods

Important information: control-flow and data-flow.
 Which
method triggers the autosave function.
 Which class has a necessary data (e.g. filename).
 How a method saves the contents to a text file.

We have to read following classes:
Autosave, Buffer, BufferIORequest,
PerspeciveManager, VFSManager, FileVFS …
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Automated Concern Location

We are trying to extract
a concern graph from
code fragments specified
by a developer.
 Our
approach is based on
program slicing.
 Our tool is based on Soot,
a Java bytecode analysis
framework.
Code fragments
related to a functionality
Program Slicing
with Heuristics
a program slice
Slice-to-ConcernGraph
Translation
A concern graph
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Autosave concern graph
Input = Autosave.*(), Buffer.autosave(),
BufferIORequest.autosave()
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Program Slicing

Slicing extracts statements
related to criteria statements
specified by a user.
1.
A program P is converted to
a program dependence graph.


2.
use
vertices: statements in P
edges: control/data dependence relations
A user specifies “slicing criteria” statements in P.

3.
data
dependence
definition
1 i = 3;
2 if (a > 0) { control
3 print i;
dependence
<3,i>
4 }
The statements are translated into “criteria vertices” in the PDG.
A program slice, a set of statements that affect or depend on
criteria, is extracted by graph traversal from criteria vertices.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slice including unrelated concerns

Slicing usually extracts many statements.
 A functional
unit is connected to other units by
control/data-flow.
 28% on average in C program†
slicing
activate
Autosave
reset
set/reset
autosave_dirty
flag
UndoManager
set
CompleteWord
† Binkley, D., Gold, N. and Harman, M.: An Empirical Study of Static
Program Slice Size. ACM TOSEM Vol.16, No.2, Article 8, April 2007.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slicing with Barriers

A barrier is a vertex or an edge that
terminate graph traversal†.
slicing
activate
Autosave
reset
A barrier blocks graph traversal.
set/reset
autosave_dirty
flag
UndoManager
set
CompleteWord
† Krinke, J.: Slicing, Chopping, and Path Conditions with Barriers.
Software Quality Journal, Vol.12, No.4, pp.339-360,December 2004.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Similarity-based Barrier

The key idea is following:
if two methods are contributing to the same functionality,
the methods use similar methods, fields and classes.
Name Set NS(m) = a set of types, classes, methods and fields referred in m.
A long name is “tokenized”.
e.g. “java.io.File”  “java”, “io”, “File”, “java.io.File”
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Example of Similarity
package org.gjt.sp.util;
NS(IntegerArray.add)
class IntegerArray {
org.gjt.sp.util.IntegerArray,
private int[] array;
org, gjt, sp, util, integer, array,
private int len;
void, add, int, len, int[],
public void add(int num) {
java.lang.System, java, lang, system,
if(len >= array.length) {
arraycopy
int[] arrayN = new int[len * 2];
System.arraycopy(array,0,arrayN,0,len);
sim = 0.639
array = arrayN;
NS(IntegerArray.getSize)
}
org.gjt.sp.util.IntegerArray,
array[len++] = num;
org, gjt, sp, util, integer, array,
}
getSize, get, size, int, len
public final int getSize() {
return len;
NS(IntegerArray.setSize)
sim = 0.801
}
org.gjt.sp.util.IntegerArray,
public final void setSize(int len) {
org, gjt, sp, util, integer, array, len,
this.len = len;
setSize, set, size, void, int
}}
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Identifying Barriers

Program slicing is blocked at method m
if m is not related to slicing criteria
Similarity(m, C) ≦ threshold
C = a set of methods that contain slicing criteria vertices.
 A method
m is related to slicing criteria if
slicing criteria includes a method n
such that m is similar to n.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slicing algorithm

Slicing with summary edges
and barriers
 defined
by Horwitz
 extended by Krinke

PDG based on Jimple code
 “jimple”
is an intermediate
representation for bytecode.



Code fragments
related to a functionality
Calculate similarity
for each method
Identify barriers
3-address code
Slicing with Barriers
Simple control-flow: “if” and “goto”
Independent of JVM stack operation
a program slice
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Visualizing a slice as a concern graph

Concern Graph


A vertex is a class, a method or a field.
An edge represents a relation between two vertices.
call, create, check, read, write, superclass, …
applied rule-based translation.†


We
Slice
v1 in m1
Concern Graph
v2 in m2
call
m2
m1
call or parameter
v1 in m1
read
m1
field
READ obj.field
† Kameda, D. and Takimoto, M.: Building Cocnern Graph Based on
Program Slicing. IPSJ Transactions on Programming, Vol.46, No.11 (Pro
26), pp.45-56.
inScience,
Japanese.
Department
of Computer
Graduate School of Information Science & Technology, Osaka University
A graphical output with Graphviz

We omit intra-class edges in graphical format.
 Detail
is provided in textual format.
e.g. “Autosave.setInterval(interval) calls
new Timer(interval, Autosave).”
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
The effectiveness of barriers

Barriers reduced concern graph size:
methods  20 methods
 Printable on an A3 or A4-sized paper
 1000

Comparing extracted graphs
with hand-made concern
graphs (not finished yet).
concern graph size on
6 maintenance tasks
on jEdit and our Slicer
Our previous experiment is reported in:
仁井谷竜介,石尾隆,井上克郎:
プログラムスライシングを用いた機能的
関心事の抽出手法の提案と実装.
PPL 2007. in Japanese.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Information extracted from Java program

To construct a dependence graph
 Control
dependence relation
 Data dependence relation
 Call Graph (with dynamic binding information)

To identify barriers
a
set of types, methods, fields referred in each
method m

To slice the dependence graph
 Mapping
source code to vertices
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Slicing Tool Overview
Soot Framework (http://www.sable.mcgill.ca/soot/)
Java
Class
Files
Jimple
Translator
Jimple
3-Address
Code
SPARK
Points-to Set
Analysis
Control-Flow
Data-flow
Analysis
Call Graph
Points-to Set
Annotated
Jimple
PDG
Constructor
Slicing
Criteria
Concern
Graph
Slicer
PDG
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Our effort to implement the system

The program size
 PDG Construction: 2731 LOC (without comments)
 Slicing: 9296 LOC (without comments)


slicing algorithms, heuristic functions and concern graph translation
We could implement the PDG construction phase
in two weeks:
 One week to understand how Soot works.
 The other week to implement code.

Soot enabled us to focus on the essential part of
the research idea.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Advantage of Soot
 A rich
analysis toolkit
 Soot
provides control-flow and data-flow for each
method.
 Jimple is simpler than source code and bytecode.
 Complex
Java statements are simplified during
compilation.
Control-flow
Exceptional
UnitGraph
Data-flow
use Method
Body
use
SmartLocalDefs
1
n
Unit
1
n
Value
is-a
Stmt
(Jimple code)
is-a
Expr
Local
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Limitation of Soot

Soot is not a program analysis framework.
 Soot
keeps all data in memory to compile Jimple code to
bytecode after the optimization.

Soot requires 2-4GB RAM to analyze jEdit and JDK.
 Soot
supports only the simple workflow: whole program
analysis (call-graph construction) followed by local
program analysis.

We cannot implement a statistics tool (whole-program analysis)
that uses the result of method-local analysis.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Summary

Concern location based on program slicing
 We
introduced heuristics in order to extract a functional
concern of interest to a developer.

Input is the same as a traditional program slicing.
 Most

of graphs can be printed on an A3-sized paper.
Soot framework reduced the implementation effort.
 Soot
is a good framework, but
we hope a framework specialized for program analysis.

easy-to-learn, extensible and scalable
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University