Slice - Trusted Cloud Group
Download
Report
Transcript Slice - Trusted Cloud Group
Intraprocedural Static
Slicing of Binary
Executables
Cristina Cifuentes
Antoine Fraboulet
University of Queensland
International Conference on Software
Maintenance(ICSM ’97)
INDEX
ICSM
Author
Motivation
Background
Basic knowledge
Algorithm
1st ICSM
Since its start in 1983, ICSM (International Conference on
Software Maintenance) has grown and developed into an
international forum for software maintenance researchers
and practitioners to examine key issues facing the
software maintenance community.
LINK: http://conferences.computer.org/icsm/
2nd AUTHOR
Cristina Cifuentes
Antoine Fraboulet
Cristina Cifuentes
Senior Staff Engineer for Sun Microsystems
Laboratories working in the Static Program
Analysis group
ACM SIGPLAN Executive Committee,
Treasurer and member-at-large, 2007-2012
Chair of the IEEE Reverse Engineering and
Reengineering Committee, Technical Council
on Software Engineering, 2002-2003
Technical Report
User-Input Dependence Analysis via Graph Reachability By: Bernard
Scholz, Chenyi Zhang and Cristina Cifuentes Report Number:TR-2008-171 Mar 31,
2008
Partitioning of Code for a Massively Parallel Machine By: Michael Ball, Cristina
Cifuentes and Deepankar Bairagi Report Number:TR-2004-134 Nov 1, 2004
A Transformational Approach to Binary Translation of Delayed Branches with
Applications to SPARC® and PA-RISC Instructions Sets By: Cristina
Cifuentes and Norman Ramsey Report Number:TR-2002-104 Jan 1, 2002
Experience in the Design, Implementation and Use of a Retargetable Static
Binary Translation Framework By: Cristina Cifuentes, Mike Van Emmerik, Brian
T.Lewis and Norman Ramsey Report Number:TR-2002-105 Jan 1, 2002
Walkabout-A Retargetable Dynamic Binary Translation Framework By: Brian T.
Lewis, David Ung and Cristina Cifuentes Report Number:TR-2002-106 Jan 1, 2002
Antoine Fraboulet
Associate Professor at INSA Lyon and a member of the
CITI Lab and of the INRIA Amazones group
Member of the program committee of t2pWSN
07 workshop
Member of the program committee of MSN
07 conference
Member of the program committee of MSN
06 conference
Member of the commitee of the TSI Journal
Organization committee of the RECAP 2006 workshop
Paper
Assembly to High-Level Language Translation Cifuentes Cristina; Simon Doug;
Fraboulet Antoine In International Conference on Software Maintenance (11/1998)
228-237
Loop Alignment for Memory Accesses Optimization Fraboulet Antoine ; Huard
Guillaume; Mignotte AnneIn International Symposium on System synthesis
(ISSS) (10/1999) 71-77
Memory Optimization of Data Flow Applications at the Codesign Level Fraboulet
Antoine ; Just-Meunier Laurence; Mignotte AnneIn Sophia Antipolis Forum on
MicroElectronics (SAME) (10/2000) 16-21
Loop fusion for memory space optimization Fraboulet Antoine ; Godary Karen ;
Mignotte AnneIn International Symposium on System Synthesis (10/2001) 95 – 100
Source Code Loop Transformations for Memory Hierarchy Optimizations
Fraboulet Antoine ; Mignotte AnneIn International Conference on Parallel Architectures
and Compilation Techniques. Workshop on MEmory access DEcoupled Architectures
(MEDEA) (2001) 6
Recommend
Ákos Kiss, Judit Jász, Gábor Lehotai, Tibor Gyimóthy. Interprocedural Static Slicing
of Binary Executables. 3rd IEEE International Workshop on Source Code Analysis
and Manipulation (SCAM 2003), 26-27 September 2003, Amsterdam, The Netherlands
2003
Ákos Kiss, Judit Jász, Tibor Gyimóthy. Using Dynamic Information in the
Interprocedural Static Slicing of Binary Executables. Software Quality Journal
2005, Volume 13
3rd MOTIVATION
Primary initial goal of slicing was to assist with debugging
Programmers naturally form program slices,mentally, when
they debug and understand programs
A program slice consists of the parts of a program that
potentially affect the values computed at some point of
interest (slicing criteria)
4th Background
Tools for executable programs
Decode the information stored in the binary file
Decode the machine instructions and translate them to an assembly representation or
an equivalent intermediate representation
Problem?
Separation of data and code
Huge amount of codes when debugging
cannot determine what paths to traverse next when faced with an indexed jump
instruction, or an indirect call or jump instruction on the value of a register
Q: Then solution?
Representation of Binaries
The general format of a binary executable varies widely based on the binary-file
format used by the OS
When running a program, the binary-file format is decoded by
the operating system’s loader, which loads the program into
memory and passes control to the program via its entry point
Workflow
Step1(binaryassembly or IL):
dasm not only decodes the binary file and its machine instructions, but also stores the
instructions in terms of low-level instructions(icode) and control flow graphs for each
procedure
Step2(assembly or IL high-level language)
Dateflow analysis is performed in the CFGs to recover high-level information
Dasm:the disassembler of the dcc decompiler, which implements a decoder of the
EXE format and creates an intermediate representation of the program.
Icode:resemble assembly instructions with property of only performing one operation
at time
5th BASIC KNOWLEDGE
Program Slice
Classification:
Backward Slicing
Forward Slicing
Or
Static Slicing
Dynamic Slicing
Backward slice consists of all statements that the
computation at the slicing criteria may depend on
public class SimpleExample {
static int add(int a, int b){
return(a+b);
}
public static void main(final String[] arg){
int i = 1;
int sum = 0;
while (i < 11) {
sum = add(sum, i);
i = add(i, 1);
}
System.out.println("sum = " + sum);
System.out.println("i = " + i);
}
}
Slicing
Criterion
Forward slice includes all statements depending on the
slicing criterion
public class SimpleExample {
static int add(int a, int b){
return(a+b);
}
Slicing
public static void main(final String[] arg){
Criterion
int i = 1;
int sum = 0;
while (i < 11) {
sum = add(sum, i);
i = add(i, 1);
}
System.out.println("sum = " + sum);
System.out.println("i = " + i);
}
}
Static slice: all possible executions of the program are
taken into account.
Dynamic slice is constructed with respect to only one
execution of the program (iteration number is taken into
account)
Slicing Method
The most popular approaches are based on dependency
graphs (non-executable slices)
Q: How does a specific statement affect the others?
How to Determine a Slice
Construct a Program Dependence Graph
A Combination of Data Dependency Graph and Control Dependency Graph
Identify Data Dependency
1. a:=3
2. b:=a
b depends on a
Identify Control Dependency
1.
2.
3.
4.
if a=true then
b:=1
else
c:=0
Both assignments depend on if statement
movl $0x0,0xfffffff8(%ebp)
cmpl $0x0,0xfffffff8(%ebp)
jne
0x8048475 <main+49>
movl $0x1,0xfffffffc(%ebp)
mov
$0x5,%eax
sub
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff4(%ebp)
jmp
0x8048485 <main+65>
movl $0x7,0xfffffffc(%ebp)
mov
0xfffffffc(%ebp),%eax
sub
$0x5,%eax
mov
%eax,0xfffffff4(%ebp)
mov
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff8(%ebp)
Control
Data Dependence
Dependence
Graph
Graph
movl $0x0,0xfffffff8(%ebp)
cmpl $0x0,0xfffffff8(%ebp)
jne
0x8048475 <main+49>
movl $0x1,0xfffffffc(%ebp)
mov
$0x5,%eax
sub
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff4(%ebp)
jmp
0x8048485 <main+65>
movl $0x7,0xfffffffc(%ebp)
mov
0xfffffffc(%ebp),%eax
sub
$0x5,%eax
mov
%eax,0xfffffff4(%ebp)
mov
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff8(%ebp)
Control Dependency Graph
A node V is post-dominated by a node W if every directed
path from V to Stop contains W
An instruction Y is control dependent on another instruction
X iff
There exists a directed path P from X to Y with another instruction Z in P, postdominated by Y
X is not post-dominated by Y
A
CFG
C
B
D
STOP
Post
Dominator
Tree
A
D
B
C
6th ALGORITHM
Determine the slice using the conventional
algorithm
Add unconditional jumps and returns to the slice
Fix jump labels
BASIC COMPONENTS
Lvalue
An lvalue is expressed as a pair of a base plus an offset. The
base address can be either the starting address for the storage
for a variable (local or global) or any pointer expression.
Step1
Start point
Using basic blocks as the node granularity
Control dependency
Control dependencies are based on the PDT of a procedure
Control flow analysis of the CFG for the purposes of recovering the
underlying control structures of a graph and their nesting level
Data dependency
Data dependencies are presented as ud-chains at the procedure level
Ud-chains are generated for each register and condition code used in an
instructon
Idiom analysis & dead-condition code elimination
Step2 & Step 3
Lexical successor tree is used to represent the next high-level statement at
the same nesting level of a given statement
An unconditional jump and a return instruction introduce a break in the flow of
control of the program so that they are added to the final slice if they are in
the path to the instructions in the slice
Fix target labels by checking all the jumps that belong to the slice
Thank you!