Slice - Trusted Cloud Group

Download Report

Transcript Slice - Trusted Cloud Group

Intraprocedural Static
Slicing of Binary
Executables
Cristina Cifuentes
Antoine Fraboulet
University of Queensland
International Conference on Software
Maintenance(ICSM ’97)
INDEX

ICSM

Author

Motivation

Background

Basic knowledge

Algorithm
1st ICSM
Since its start in 1983, ICSM (International Conference on
Software Maintenance) has grown and developed into an
international forum for software maintenance researchers
and practitioners to examine key issues facing the
software maintenance community.
LINK: http://conferences.computer.org/icsm/
2nd AUTHOR

Cristina Cifuentes

Antoine Fraboulet
Cristina Cifuentes



Senior Staff Engineer for Sun Microsystems
Laboratories working in the Static Program
Analysis group
ACM SIGPLAN Executive Committee,
Treasurer and member-at-large, 2007-2012
Chair of the IEEE Reverse Engineering and
Reengineering Committee, Technical Council
on Software Engineering, 2002-2003
Technical Report





User-Input Dependence Analysis via Graph Reachability By: Bernard
Scholz, Chenyi Zhang and Cristina Cifuentes Report Number:TR-2008-171 Mar 31,
2008
Partitioning of Code for a Massively Parallel Machine By: Michael Ball, Cristina
Cifuentes and Deepankar Bairagi Report Number:TR-2004-134 Nov 1, 2004
A Transformational Approach to Binary Translation of Delayed Branches with
Applications to SPARC® and PA-RISC Instructions Sets By: Cristina
Cifuentes and Norman Ramsey Report Number:TR-2002-104 Jan 1, 2002
Experience in the Design, Implementation and Use of a Retargetable Static
Binary Translation Framework By: Cristina Cifuentes, Mike Van Emmerik, Brian
T.Lewis and Norman Ramsey Report Number:TR-2002-105 Jan 1, 2002
Walkabout-A Retargetable Dynamic Binary Translation Framework By: Brian T.
Lewis, David Ung and Cristina Cifuentes Report Number:TR-2002-106 Jan 1, 2002
Antoine Fraboulet




Associate Professor at INSA Lyon and a member of the
CITI Lab and of the INRIA Amazones group
Member of the program committee of t2pWSN
07 workshop
Member of the program committee of MSN
07 conference
Member of the program committee of MSN
06 conference

Member of the commitee of the TSI Journal

Organization committee of the RECAP 2006 workshop
Paper





Assembly to High-Level Language Translation Cifuentes Cristina; Simon Doug;
Fraboulet Antoine In International Conference on Software Maintenance (11/1998)
228-237
Loop Alignment for Memory Accesses Optimization Fraboulet Antoine ; Huard
Guillaume; Mignotte AnneIn International Symposium on System synthesis
(ISSS) (10/1999) 71-77
Memory Optimization of Data Flow Applications at the Codesign Level Fraboulet
Antoine ; Just-Meunier Laurence; Mignotte AnneIn Sophia Antipolis Forum on
MicroElectronics (SAME) (10/2000) 16-21
Loop fusion for memory space optimization Fraboulet Antoine ; Godary Karen ;
Mignotte AnneIn International Symposium on System Synthesis (10/2001) 95 – 100
Source Code Loop Transformations for Memory Hierarchy Optimizations
Fraboulet Antoine ; Mignotte AnneIn International Conference on Parallel Architectures
and Compilation Techniques. Workshop on MEmory access DEcoupled Architectures
(MEDEA) (2001) 6
Recommend


Ákos Kiss, Judit Jász, Gábor Lehotai, Tibor Gyimóthy. Interprocedural Static Slicing
of Binary Executables. 3rd IEEE International Workshop on Source Code Analysis
and Manipulation (SCAM 2003), 26-27 September 2003, Amsterdam, The Netherlands
2003
Ákos Kiss, Judit Jász, Tibor Gyimóthy. Using Dynamic Information in the
Interprocedural Static Slicing of Binary Executables. Software Quality Journal
2005, Volume 13
3rd MOTIVATION



Primary initial goal of slicing was to assist with debugging
Programmers naturally form program slices,mentally, when
they debug and understand programs
A program slice consists of the parts of a program that
potentially affect the values computed at some point of
interest (slicing criteria)
4th Background

Tools for executable programs

Decode the information stored in the binary file

Decode the machine instructions and translate them to an assembly representation or
an equivalent intermediate representation

Problem?

Separation of data and code

Huge amount of codes when debugging

cannot determine what paths to traverse next when faced with an indexed jump
instruction, or an indirect call or jump instruction on the value of a register
Q: Then solution?



Representation of Binaries
The general format of a binary executable varies widely based on the binary-file
format used by the OS
When running a program, the binary-file format is decoded by
the operating system’s loader, which loads the program into
memory and passes control to the program via its entry point
Workflow


Step1(binaryassembly or IL):
dasm not only decodes the binary file and its machine instructions, but also stores the
instructions in terms of low-level instructions(icode) and control flow graphs for each
procedure

Step2(assembly or IL  high-level language)

Dateflow analysis is performed in the CFGs to recover high-level information


Dasm:the disassembler of the dcc decompiler, which implements a decoder of the
EXE format and creates an intermediate representation of the program.
Icode:resemble assembly instructions with property of only performing one operation
at time
5th BASIC KNOWLEDGE
Program Slice
Classification:

Backward Slicing

Forward Slicing
Or

Static Slicing

Dynamic Slicing
Backward slice consists of all statements that the
computation at the slicing criteria may depend on
public class SimpleExample {
static int add(int a, int b){
return(a+b);
}
public static void main(final String[] arg){
int i = 1;
int sum = 0;
while (i < 11) {
sum = add(sum, i);
i = add(i, 1);
}
System.out.println("sum = " + sum);
System.out.println("i = " + i);
}
}
Slicing
Criterion
Forward slice includes all statements depending on the
slicing criterion
public class SimpleExample {
static int add(int a, int b){
return(a+b);
}
Slicing
public static void main(final String[] arg){
Criterion
int i = 1;
int sum = 0;
while (i < 11) {
sum = add(sum, i);
i = add(i, 1);
}
System.out.println("sum = " + sum);
System.out.println("i = " + i);
}
}
Static slice: all possible executions of the program are
taken into account.
Dynamic slice is constructed with respect to only one
execution of the program (iteration number is taken into
account)
Slicing Method

The most popular approaches are based on dependency
graphs (non-executable slices)
Q: How does a specific statement affect the others?
How to Determine a Slice
 Construct a Program Dependence Graph
A Combination of Data Dependency Graph and Control Dependency Graph
 Identify Data Dependency
1. a:=3
2. b:=a
b depends on a
 Identify Control Dependency
1.
2.
3.
4.
if a=true then
b:=1
else
c:=0
Both assignments depend on if statement
movl $0x0,0xfffffff8(%ebp)
cmpl $0x0,0xfffffff8(%ebp)
jne
0x8048475 <main+49>
movl $0x1,0xfffffffc(%ebp)
mov
$0x5,%eax
sub
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff4(%ebp)
jmp
0x8048485 <main+65>
movl $0x7,0xfffffffc(%ebp)
mov
0xfffffffc(%ebp),%eax
sub
$0x5,%eax
mov
%eax,0xfffffff4(%ebp)
mov
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff8(%ebp)
Control
Data Dependence
Dependence
Graph
Graph
movl $0x0,0xfffffff8(%ebp)
cmpl $0x0,0xfffffff8(%ebp)
jne
0x8048475 <main+49>
movl $0x1,0xfffffffc(%ebp)
mov
$0x5,%eax
sub
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff4(%ebp)
jmp
0x8048485 <main+65>
movl $0x7,0xfffffffc(%ebp)
mov
0xfffffffc(%ebp),%eax
sub
$0x5,%eax
mov
%eax,0xfffffff4(%ebp)
mov
0xfffffffc(%ebp),%eax
mov
%eax,0xfffffff8(%ebp)
Control Dependency Graph
 A node V is post-dominated by a node W if every directed
path from V to Stop contains W
 An instruction Y is control dependent on another instruction
X iff
There exists a directed path P from X to Y with another instruction Z in P, postdominated by Y
X is not post-dominated by Y
A
CFG
C
B
D
STOP
Post
Dominator
Tree
A
D
B
C
6th ALGORITHM

Determine the slice using the conventional
algorithm

Add unconditional jumps and returns to the slice

Fix jump labels
BASIC COMPONENTS

Lvalue
An lvalue is expressed as a pair of a base plus an offset. The
base address can be either the starting address for the storage
for a variable (local or global) or any pointer expression.
Step1

Start point

Using basic blocks as the node granularity

Control dependency

Control dependencies are based on the PDT of a procedure

Control flow analysis of the CFG for the purposes of recovering the
underlying control structures of a graph and their nesting level

Data dependency

Data dependencies are presented as ud-chains at the procedure level

Ud-chains are generated for each register and condition code used in an
instructon
Idiom analysis & dead-condition code elimination
Step2 & Step 3



Lexical successor tree is used to represent the next high-level statement at
the same nesting level of a given statement
An unconditional jump and a return instruction introduce a break in the flow of
control of the program so that they are added to the final slice if they are in
the path to the instructions in the slice
Fix target labels by checking all the jumps that belong to the slice
Thank you!