Software Reverse Engineering Education
http://www.reversingproject.info
Teodoro Cipresso
San José State University, Spring 2009
Advisor: Dr. Mark Stamp
Committee: Dr. Robert Chun, Dr. David Taylor
Background Information
Introduction to Software Reverse Engineering



Software Reverse Engineering (SRE) can be described as
the practice of analyzing a software system to create
abstractions that identify the individual components and
their dependencies, and, if possible, the overall system
architecture [1].
Once the components and design of an existing system
have been recovered, it becomes possible to repair and
even enhance them.
Reverse engineering skills are also used to detect and
neutralize viruses, worms and other malware, as well as
to protect intellectual property [1].
Importance of SRE Education


“More emphasis is needed in SE [and CS] undergraduate
and graduate programs on the issue of software evolution
and change. Students need to be educated on the theory
and practice of software comprehension, maintenance and
reengineering. They need to learn how to live with the
monsters from the past and tame them” [2].
“Most of the time, students are trained in developing very
small programs starting from scratch. This approach is
really misleading since most students learn to believe that
software engineering is just about developing brand new
software. In fact many students will be involved in
evolution-related activities after completion of their
studies” [3].
Student Feedback on SRE Education


Incorporation of software reverse engineering techniques
and methodologies into regular course work was tried at
the University of Missouri-Rolla [1].
The results of this experiment were quite positive:


77% of students thought that the incorporation of SRE
techniques and methodologies reinforced concepts
taught during lectures.
82% of students wanted SRE to be included in future
courses, especially those that deal with software
design.
Development-Related Reversing Scenarios
Figure 1. Development-related software reverse engineering scenarios.
Security-Related Reversing Scenarios
Figure 2. Security-related software reverse engineering scenarios.
Legacy Software Development Process
Figure 3. Software development process in a typical enterprise software system.
Project Overview
Baseline Education in Software Reverse Engineering
Computer programmers with an improved ability to understand,
evolve, and secure software.
Educate programmers on software reversing, anti-reversing,
and patching.
Educate programmers on software reengineering and reuse.
Educate programmers on software security and malware
detection.
Figure 4. Activities related to providing a baseline SRE education.
Materials and Methods



More than ten peer-reviewed articles on the topics of
software reverse engineering, re-engineering,
maintenance, reuse, and security were selected and used
to address the research questions.
Of the articles selected, three were chosen for their
specific coverage of experiences with teaching courses in
software reversing, reengineering, and maintenance.
I also drew upon just under a decade of experience
designing and developing legacy software modernization
tools at IBM.
Results
Overview of Developed SRE Course Modules

Reversing and Patching Wintel Machine Code

Reversing and Patching Java Bytecode

Applying Anti-Reversing Techniques to Machine Code

Applying Anti-Reversing Techniques to Java Bytecode

Reengineering and Reuse of Legacy Software

Identifying, Monitoring, and Reporting Malware
Reversing and Patching Wintel Machine Code



An introduction to the compilation of high-level languages
to machine code is provided. Assembly language is contrasted
as having a one-to-one mapping to machine code.
The negative results of experimentation with two
decompilers (Boomerang and REC) for machine code are
documented. Given the current state of decompiler
technology, it was concluded that working with
disassembly is the most feasible approach.
A Wintel machine code reversing and patching exercise
was developed against Password Vault, a non-trivial
application that is provided with the exercise to avoid any
legal concerns with reversing software written by others.



The machine code reversing and patching exercise asks
the learner to create a new executable version of the
application that no longer has a trial limitation of five
password records per user.
A reliable and repeatable reversing strategy is used:
place a breakpoint on a memory artifact and trace back
stack frames to locate the section in the disassembly.
For instructional purposes, an animated solution that
demonstrates the application of this reversing strategy
using OllyDbg, an interactive debugger-disassembler, was
developed using Qarbon Viewlet Builder.
Figures 5-14. Animated solution to the Wintel reversing and patching exercise.
Idea for an advanced Wintel machine code (**) exercise:

It should be feasible to patch additional functionality into
the Password Vault machine code:



The GCC compiler can generate assembly language
instead of machine code, so the programmer can work
in a high-level language.
Patching in the generated assembly code would require
some significant amount of time spent in the program
understanding phase.
Final integration of the new code would require
modifying the Windows PE header to increase the
size of the .code section, and of the .rdata and .data
sections as well if new variables and constants are added.
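Growing a section means several size fields in the PE headers have to stay consistent with the image's alignment values. The following is a minimal sketch of that arithmetic only. The struct and function names are hypothetical, the fields mirror VirtualSize and SizeOfRawData from the section header, and it assumes the raw size is simply the file-aligned virtual size, which is common for code sections but not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to the next multiple of alignment (a power of two).
constexpr std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical, simplified view of the section-header fields that change
// when bytes are appended to a section.
struct SectionSizes {
    std::uint32_t virtualSize;    // bytes occupied in memory
    std::uint32_t sizeOfRawData;  // bytes on disk, a multiple of FileAlignment
};

// Return the updated sizes after appending extraBytes to the section.
SectionSizes growSection(SectionSizes s, std::uint32_t extraBytes,
                         std::uint32_t fileAlignment) {
    assert(fileAlignment != 0 && (fileAlignment & (fileAlignment - 1)) == 0);
    s.virtualSize += extraBytes;
    s.sizeOfRawData = alignUp(s.virtualSize, fileAlignment);
    return s;
}
```

If the raw size crosses an alignment boundary, every later section moves in the file, so their file offsets, and possibly their virtual addresses and the image size, would also need to be recomputed. That part is not shown here.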
Reversing and Patching Java Bytecode




An introduction to interpreted/intermediate executable
formats such as Java bytecode is provided. These formats
are contrasted with machine code and assembly language.
Java bytecode “disassembly” using javap is covered for
help with analysis of bytecode generated by javac.
The positive results of experimentation with the Jad Java
bytecode decompiler are documented; it is concluded that
direct reading/writing of bytecode is not necessary.
A Java bytecode reversing and patching exercise was
developed against a Java version of Password Vault.



The Java bytecode reversing and patching exercise asks
the learner to create a new executable version of the
application that no longer has a trial limitation of five
password records per user.
Since the Password Vault application consists of a small
number of classes in a single package, a simple reversing
strategy of unpacking the Jar archive, batch decompiling
the classes, modifying the generated Java source, and
recompiling is used.
For instructional purposes, an animated solution that
demonstrates the application of this reversing strategy
using FrontEnd Plus, a graphical interface to Jad, was
developed using Qarbon Viewlet Builder.
Figures 15-22. Animated solution to the Java bytecode reversing and patching exercise.
Idea for an advanced Java bytecode (**) exercise:

Use available Java class libraries, such as jclasslib, to
directly read and write Java bytecode.



Write a Java program that scans through the bytecode
for the Java Password Vault application and locates the
instructions for the trial limitation.
Once the instructions are located, overwrite them with
a sequence that disables the trial limitation.
This can be good practice for getting a feel for writing
code that patches an executable.
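The locate-and-overwrite step can be sketched independently of any class library. The example below is written in C++ to match the other listings in this deck, although the exercise itself suggests Java with jclasslib. The function name is hypothetical, and it assumes the replacement has the same length as the pattern so that branch offsets elsewhere in the method stay valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Overwrite the first occurrence of pattern in code with replacement.
// Returns the offset that was patched, or -1 if the pattern is absent.
std::ptrdiff_t patchFirst(Bytes& code, const Bytes& pattern,
                          const Bytes& replacement) {
    assert(!pattern.empty() && pattern.size() == replacement.size());
    auto hit = std::search(code.begin(), code.end(),
                           pattern.begin(), pattern.end());
    if (hit == code.end()) {
        return -1;
    }
    std::copy(replacement.begin(), replacement.end(), hit);
    return hit - code.begin();
}
```

As an illustration only, a limit check might compile to `iload_1` (0x1b), `iconst_5` (0x08), `if_icmplt` (0xa1). Replacing the first byte with `iconst_0` (0x03) would make the comparison always see zero records. The opcode values are those of the JVM instruction set, but the actual sequence in Password Vault is not shown in this deck and would have to be found by inspection.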
Applying Anti-Reversing Techniques to Machine Code



A brief introduction to basic anti-reversing techniques is
provided: Eliminating Symbolic Information, Obfuscating
the Program, and Embedding Anti-Debugger Code.
Machine code typically has very little symbolic information
that can be eliminated altogether; a discussion therefore
illustrates how debuggers insert quite a bit of information
that makes machine code easier to reverse.
The technique Obfuscating the Program is demonstrated
in a Wintel machine code anti-reversing exercise where
data, computation, and control flow obfuscations are
applied to the C++ source code for Password Vault.


Commercial tools such as EXECryptor (www.strongbit.com)
fully obfuscate and pack Windows executables using
advanced algorithms that are based on the elementary
techniques described in this module.
It is difficult to provide a “before and after” illustration of
machine code that is obfuscated using EXECryptor, so the
examples and exercise in this module are implemented
first at the source code level and then confirmed in the
machine code using live and static analysis.

In the case of control-flow obfuscation, only static
analysis is used, where subsequent run traces are
compared using an edit-distance measurement.

The Wintel machine code anti-reversing exercise asks the
learner to create a new executable version of the
Password Vault application where the following
transformations are applied:



Encryption of string literals (data obfuscation).
Obfuscation of the numeric representation of the
password record limit (computation obfuscation).
Obfuscation of the method that performs the record
limit check (control flow obfuscation).
Encryption of String Literals (data obfuscation):
Figure 23. Strings are decrypted each time they are used using a bundled cipher.
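A minimal sketch of the idea, assuming a repeating-key XOR as a stand-in for the bundled cipher (the deck does not say which cipher the module uses). The function name and key are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Symmetric XOR transform: applying it twice with the same key
// restores the original text. This is a placeholder cipher, not a secure one.
std::string xorCipher(const std::string& text, const std::string& key) {
    assert(!key.empty());
    std::string out = text;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

In an obfuscated build, only the encrypted bytes would be stored in the executable, and each use site would call the transform just before the string is needed, so the plain text never appears in a static string listing.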
Obfuscation of the numeric representation of the
password record limit (computation obfuscation):
Figure 24. Complex evaluations obscure the actual condition.
Figure 25. Testing for a function of a number can slow a reverser down.
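The contents of Figures 24 and 25 are not recoverable from this transcript, so the following is only a sketch of the general technique: instead of comparing the record count with the literal 5, compare a strictly increasing function of the count with a precomputed value. The function names and the choice of n^3 + 3n are assumptions made for illustration.

```cpp
#include <cassert>

// Plain check: the constant 5 is easy to spot in a disassembly.
bool limitReachedPlain(int records) {
    return records >= 5;
}

// Obfuscated check: f(n) = n^3 + 3n is strictly increasing, so
// f(records) >= f(5) is equivalent to records >= 5, yet the literal 5
// no longer appears in the comparison.
bool limitReachedObfuscated(int records) {
    assert(records >= 0 && records <= 1000000);  // keeps n^3 within long long
    const long long n = records;
    const long long f = n * n * n + 3 * n;
    return f >= 140;  // f(5) = 125 + 15
}
```

An optimizing compiler may still simplify parts of such an expression, so the result should be confirmed in the generated machine code, as the module does for its own examples.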
Obfuscation of the method that performs the record limit
check (control flow obfuscation):



We introduce some non-essential, recursive, and
randomized logic to the password limit check to make
it more difficult for a reverser to perform static and/or
live analysis.
Since no standards exist for control flow obfuscation, a
custom algorithm was designed to hinder live and
static analysis through use of recursive and
randomized procedure calls.
Recursion grows the stack considerably, making
stepping through the code difficult, while
randomization makes execution unpredictable
(breakpoints may not trigger & run traces differ).
Depth of the recursion is randomized on each check of the
limit.
Random procedure call targets generate and return a number
that is added to an instance variable, preventing the
procedures from being identified as NOOPs by a code
optimizer.
Figure 26. A control flow obfuscation algorithm for the record limit check.
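The algorithm in Figure 26 is not reproduced in this transcript. The sketch below only illustrates the two properties described above, a randomized recursion depth and randomly chosen decoy procedures whose results feed an instance variable. The class, its members, and the depth range are all hypothetical.

```cpp
#include <cassert>
#include <random>

class LimitChecker {
public:
    explicit LimitChecker(unsigned seed) : rng_(seed) {}

    // Each call recurses to a freshly randomized depth before the real test.
    bool limitReached(int records) {
        std::uniform_int_distribution<int> depth(1, 32);
        return check(records, depth(rng_));
    }

    long noise() const { return noise_; }

private:
    bool check(int records, int depth) {
        if (depth == 0) {
            return records >= kLimit;  // the only essential comparison
        }
        // Pick a decoy at random; its result is accumulated so the call
        // cannot be discarded as having no effect.
        noise_ += (rng_() % 2 == 0) ? decoyA() : decoyB();
        return check(records, depth - 1);
    }

    long decoyA() { return static_cast<long>(rng_() % 97); }
    long decoyB() { return static_cast<long>(rng_() % 89) + 1; }

    static constexpr int kLimit = 5;
    std::mt19937 rng_;
    long noise_ = 0;
};
```

A compiler may turn the tail call into a loop, which would remove the stack growth, so a real implementation would need to verify the effect in the disassembly.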



To measure the effectiveness of the control flow algorithm
in hindering analysis, three execution traces of the section
of the code containing the record limit check were
compared.
The Levenshtein Distance (LD) was computed between each
pair of the three traces. LD was modified to compare whole
lines (one instruction per line) as opposed to individual
characters.
The execution traces were collected using OllyDbg and
had to be cleaned of disassembly artifacts such as line
numbers, base addresses, and comments in order to
ensure that the analysis was fair.
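The line-based variant of the distance can be sketched as follows. This is the standard dynamic-programming formulation with each trace line treated as a single symbol; the type alias and function name are hypothetical, and the sample instructions in the usage note are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;  // one cleaned instruction per element

// Levenshtein distance where the unit of comparison is a whole line.
std::size_t traceDistance(const Trace& a, const Trace& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;  // cost of inserting the first j lines of b
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // cost of deleting the first i lines of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

Two traces that differ by a single extra `call` line have a distance of 1, while identical traces have a distance of 0. A larger distance between runs on identical input indicates less predictable execution.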
Figure 27. Comparison of executions of record limit check on identical program input.



The Wintel anti-reversing module also demonstrates
source code obfuscation, an anti-reversing technique that
is useful when the source code itself must be distributed.
There may exist a requirement to ship the source code of
an application so that the machine code can be generated
on the end user’s computer.
If the source code contains intellectual property that is
worth protecting, one can perform transformations to the
source code which make it difficult to read, but have no
impact on the machine code that would ultimately be
generated when the program is compiled.
Demonstration of the COBF source code obfuscator:
VerifyPassword.cpp:
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    const char *password = "jup!ter";
    string specified;
    cout << "Enter password: ";
    getline(cin, specified);
    // Compare the entered text against the hard-coded password.
    if (specified.compare(password) == 0)
    {
        cout << "[OK] Access granted." << endl;
    } else
    {
        cout << "[Error] Access denied." << endl;
    }
}
COBF invocation:
C:\cobf_1.06\src\win32\release\cobf.exe @C:\cobf_1.06\src\setup_cpp_tokens.inv -o cobfoutput -b -p C:\cobf_1.06\etc\pp_eng_msvc.bat VerifyPassword.cpp
COBF obfuscated source for VerifyPassword.cpp:
#include"cobf.h"
ls lp lk;lf lo(lf ln,ld*lj[]){ll ld*lc="\x6a\x75\x70\x21\x74\x65\x72";lh la;
lb<<"\x45\x6e\x74\x65\x72\x20\x70\x61\x73\x73\x77\x6f\x72\x64""\x3a\x20";li(lq,la);
lm(la.lg(lc)==0){lb<<"\x5b\x4f\x4b\x5d\x20\x41" "\x63\x63\x65\x73\x73\x20\x67\x72\x61\x6e\x74\x65\x64\x2e"<<le;}
lr{lb<<"\x5b\x45\x72\x72\x6f\x72\x5d\x20\x41\x63\x63\x65\x73\x73\x20\x64" "\x65\x6e\x69\x65\x64\x2e"<<le;}}
COBF generated header (cobf.h):
#define ls using
#define lp namespace
#define lk std
#define lf int
#define lo main
#define ld char
#define ll const
#define lh string
#define lb cout
#define li getline
#define lq cin
#define lm if
#define lg compare
#define le endl
#define lr else
Applying Anti-Reversing Techniques to Java Bytecode



While experiments with decompiling machine code were
not successful, decompilation of Java bytecode to Java
source code yielded acceptable results.
Given these results, one does need to be concerned with
protecting Java bytecode from decompilation if there is
significant intellectual property in the program.
Obfuscating bytecode is inherently easier than obfuscating
source code because bytecode has a significantly stricter
and more organized representation than source code.



Variable, class, and method names are all left intact when
compiling Java source code to Java bytecode. This is a
stark difference from machine code, where variable and
local method names are not preserved.
A high level of protection can be achieved for Java
bytecode by applying three transformations: Name
Obfuscation, String Encryption, and Control Flow
Obfuscation.
Zelix Klassmaster, a commercial product, is capable of
performing all three. Unfortunately, no open-source or
free tool exists that can perform all three.




The trial version of Zelix Klassmaster is restricted to 30
days, and the company will only e-mail a trial version to
“non-free” e-mail addresses.
Not much is learned by having everything done for us, so
this module sees how far one can get with open-source
and free software.
ProGuard and RetroGuard are free Java bytecode
obfuscators capable of Name Obfuscation.
SandMark, a Java bytecode watermarking and obfuscation
tool from the University of Arizona, is capable of String
Encryption and some weak control flow obfuscations.




A Java bytecode anti-reversing exercise was developed
against the Java version of Password Vault.
Since the learner will have already experienced manually
applying obfuscations in the Wintel machine code
anti-reversing exercise, this exercise focuses on the use of tools.
In the exercise, it is expected that the Java bytecode for
the Password Vault application will be incrementally
obfuscated using two or more tools.
For instructional purposes, an animated solution that
demonstrates obfuscating the Password Vault Java
bytecode to the point of inhibiting decompilation was
developed using Qarbon Viewlet Builder.
Figures 28-50. Animated solution to the Java bytecode anti-reversing exercise.
Reengineering and Reuse of Legacy Software



The question of whether to reengineer or reuse
components of a software system most often arises in the
context of large business or government organizations.
Over time the processes and procedures of a business or
organization will inevitably be reflected in the software
systems that enable efficient, day-to-day operations [5].
While reverse engineering of legacy software is inherently
intractable, some of us will inevitably find ourselves in a
situation where no other option is available because the
cost of rewriting a large, complex software system is
prohibitive [6].
If good development practices were followed, legacy
software is typically composed of three layers [5]:
Figure 51. Layers of a well-structured legacy software application.



Legacy applications that are not componentized enough for
their general organization to resemble the three layers
are not good candidates for reengineering and reuse.
The most widely accepted technique to reuse legacy
application components is that of Wrappering [5], where a
new piece of code provides an interface to a legacy
application component or layer without requiring code
changes to it.
Typically, candidate applications should be well-structured
such that the business logic can be isolated, encapsulated,
and made into reusable components.



Unless enough of an application's source code remains
such that it's possible to identify the names of reusable
entry points (procedures) and their I/O data structures,
attempting to reuse the application may be difficult.
While it is possible to learn the names of entry points that
have been explicitly exported by an application in the case
of a DLL, the names don't indicate the layout of the
expected I/O data structures.
One way to discover the entry points and I/O data
structures in legacy machine code is to read the source
code of other applications which depend on it.



The COBOL programming language is most often
associated with legacy software applications.
Normally, COBOL programs have a single entry point;
additional “alternate” entry points are rare.
Legacy COBOL programs often include functional
discriminators in their I/O data structures.
Figure 52. Mapping legacy functional discriminators to an object-oriented design.



In a real-world situation, we would be looking to reuse
legacy components whose machine code is the result of
thousands of lines of high-level language statements
(COBOL) that implement a particular business process.
Since our focus is more on reuse and reengineering of
legacy code at a basic level, it's not necessary to
encumber ourselves with a very large program in order to
learn strategies for reuse and reengineering.
Included with this module is a small COBOL “calculator”
that we wish to make reusable from Java. This program is
assumed to be something from the business logic layer.
      ******************************************************************
      ** Simple COBOL program that performs integer arithmetic        **
      ******************************************************************
       IDENTIFICATION DIVISION.
       PROGRAM-ID. 'SMPLCALC'.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       77  MSG-NUMERIC-OVERFLOW  PIC X(25)
               VALUE 'Numeric overflow occurred'.
       77  MSG-SUCCESSFUL        PIC X(22)
               VALUE 'Completed successfully'.
       LINKAGE SECTION.
      * Input/Output data structure
       01  SMPLCALC-INTERFACE.
           02  SI-OPERAND-1        PIC S9(9) COMP-5.
           02  SI-OPERAND-2        PIC S9(9) COMP-5.
           02  SI-OPERATION        PIC X.
               88  DO-ADD          VALUE '+'.
               88  DO-SUB          VALUE '-'.
               88  DO-MUL          VALUE '*'.
           02  SI-RESULT           PIC S9(18) COMP-3.
           02  SI-RESULT-MESSAGE   PIC X(128).
       PROCEDURE DIVISION USING
               BY REFERENCE SMPLCALC-INTERFACE.
       MAINLINE SECTION.
      * Perform requested arithmetic
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)
           INITIALIZE SI-RESULT SI-RESULT-MESSAGE
           EVALUATE TRUE
             WHEN DO-ADD
               COMPUTE SI-RESULT = SI-OPERAND-1 + SI-OPERAND-2
                 ON SIZE ERROR
                   PERFORM HANDLE-SIZE-ERROR
               END-COMPUTE
             WHEN DO-SUB
               COMPUTE SI-RESULT = SI-OPERAND-1 - SI-OPERAND-2
                 ON SIZE ERROR
                   PERFORM HANDLE-SIZE-ERROR
               END-COMPUTE
             WHEN DO-MUL
               COMPUTE SI-RESULT = SI-OPERAND-1 * SI-OPERAND-2
                 ON SIZE ERROR
                   PERFORM HANDLE-SIZE-ERROR
               END-COMPUTE
           END-EVALUATE
      * Successful return
           MOVE MSG-SUCCESSFUL TO SI-RESULT-MESSAGE
           MOVE 2 TO RETURN-CODE
           GOBACK
           .
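The control flow of SMPLCALC can be restated in Java to make its contract easier to see from the calling side. This is only a sketch: the class and method names are hypothetical, the operands are assumed to span the full 32-bit range of COMP-5, and the overflow branch assumes that HANDLE-SIZE-ERROR (not shown in the listing) reports the overflow message and returns.

```java
/** Java model of the SMPLCALC control flow; an illustration, not generated code. */
final class SmplCalcModel {
    // Largest magnitude that fits in PIC S9(18): eighteen decimal digits.
    static final long MAX_RESULT = 999_999_999_999_999_999L;

    /** Mirrors the SI-RESULT and SI-RESULT-MESSAGE fields. */
    static final class Outcome {
        final long result;
        final String message;
        Outcome(long result, String message) {
            this.result = result;
            this.message = message;
        }
    }

    static Outcome calculate(int operand1, int operand2, char operation) {
        long result = 0; // INITIALIZE SI-RESULT
        switch (operation) {
            case '+': result = (long) operand1 + operand2; break;
            case '-': result = (long) operand1 - operand2; break;
            case '*': result = (long) operand1 * operand2; break;
            default:  break; // EVALUATE has no WHEN OTHER, so nothing is computed
        }
        if (Math.abs(result) > MAX_RESULT) {
            // Assumed behavior of HANDLE-SIZE-ERROR; on a size error COBOL
            // leaves the receiving field unchanged, so the result stays 0.
            return new Outcome(0, "Numeric overflow occurred");
        }
        return new Outcome(result, "Completed successfully");
    }

    public static void main(String[] args) {
        Outcome ok = calculate(6, 7, '*');
        System.out.println(ok.result + " " + ok.message);
        Outcome overflow = calculate(2_000_000_000, 2_000_000_000, '*');
        System.out.println(overflow.result + " " + overflow.message);
    }
}
```

Only multiplication can exceed eighteen digits with 32-bit operands, which is why the second call in main uses two large factors.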
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)


Many commercial tools support importing a COBOL data
structure and generating Java marshalling classes.
These marshalling classes are intended to be used with
the J2EE Connector Architecture (JCA), where a Java
application wraps a legacy software application.
Figure 53. Example JCA implementation for accessing a legacy application.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)



A popular alternative to using the JCA architecture to
reengineer and reuse legacy applications is to implement
a Service Oriented Architecture (SOA).
SOA components become capable of communicating
without the tight and fragile coupling of traditional binary
interfaces because they are wrapped with a
platform-neutral interface such as XML and Web services.
When XML is used as envisioned, all data, both character
and numeric, is represented as printable text, completely
divorced from any platform-specific representation or
encoding.
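The difference between a binary interface and a text interface can be seen in a few lines of Java. In this sketch the class and method names are hypothetical; the element name SI-OPERAND-1 is borrowed from the COBOL data structure shown earlier. The same integer produces different bytes depending on byte order, while its textual form is the same on every platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class TextVersusBinary {
    /** Encodes a value as four bytes in the given byte order, as a binary interface would. */
    static byte[] toBinary(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Encodes a value as printable text inside an element, as an XML interface would. */
    static String toXml(String element, int value) {
        return "<" + element + ">" + value + "</" + element + ">";
    }

    /** Recovers the value from the element text without any knowledge of byte order. */
    static int fromXml(String element, String xml) {
        int start = xml.indexOf('>') + 1;
        int end = xml.lastIndexOf("</" + element + ">");
        return Integer.parseInt(xml.substring(start, end).trim());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toBinary(258, ByteOrder.BIG_ENDIAN)));
        System.out.println(Arrays.toString(toBinary(258, ByteOrder.LITTLE_ENDIAN)));
        System.out.println(toXml("SI-OPERAND-1", 258));
    }
}
```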
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)



The net effect of this is that two entities or programs can
interact without having to know the data structures that
comprise each other's binary interface.
Of course, the XML that is exchanged cannot be arbitrary,
so industry standards such as XML Schema (XSD) and
Web Services Description Language (WSDL) fill this gap.
A Web service is considered to be WS-I compliant, or
generally interoperable, if it meets many criteria, one of
which is the use of XML for the input and output of each
operation exposed by the service.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)



This particular requirement of WS-I, where XML is the
interoperable interface of choice, sets the stage for a
meaningful exercise.
A Legacy Software Reengineering and Reuse Exercise was
developed for this module where the focus is on
wrapping a COBOL program so that it is reusable from
Java using XML in a local environment.
The learner is asked to create a language-neutral XML
interface to the COBOL calculator program and invoke it
from a Java program, which incidentally makes it reusable
from other Java programs.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Overview of the architecture for the exercise:
Figure 54. Architecture for legacy application reengineering and reuse from Java.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Steps in the reengineering and reuse exercise:




Create an XML Schema which represents all of the
data in the SMPLCALC-INTERFACE COBOL data
structure.
Write a Java interface ISimpleCalculator.java for three
computation types supported by SMPLCALC.cbl.
Write a Java class JSimpleCalculator.java that
implements the interface defined in
ISimpleCalculator.java and provides a user interface.
Use the Java command-line utility xjc, in combination
with the XML Schema, to generate Java-to-XML
marshalling code (JAXB).
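The marshalling step can be pictured with a small hand-written example. The classes that xjc generates are not reproduced here; this sketch uses the DOM and Transformer APIs that ship with the JDK as a stand-in for JAXB. The class name CalcXml is hypothetical, and using SMPLCALC-INTERFACE as the root element name is an assumption based on the COBOL data structure.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class CalcXml {
    /** Builds the request document that would be handed to the native bridge. */
    static String marshal(int operand1, int operand2, char operation) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("SMPLCALC-INTERFACE");
            doc.appendChild(root);
            append(doc, root, "SI-OPERAND-1", Integer.toString(operand1));
            append(doc, root, "SI-OPERAND-2", Integer.toString(operand2));
            append(doc, root, "SI-OPERATION", String.valueOf(operation));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not marshal request", e);
        }
    }

    /** Reads the text of the first element with the given name from an XML document. */
    static String field(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(name).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read field " + name, e);
        }
    }

    private static void append(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        String xml = marshal(6, 7, '*');
        System.out.println(xml);
        System.out.println(field(xml, "SI-OPERATION"));
    }
}
```

Generated JAXB classes would replace both methods with annotated Java beans, but the XML exchanged with the COBOL side would have the same general shape.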
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Steps in the reengineering and reuse exercise (cont’d):

Write a small C/C++ JNI program
Java2CblXmlBridge.cpp which exports a method
Java2SmplCalc that:


Invokes XML2CALC.cbl, passing the XML document
received from JSimpleCalculator.java.
Returns the XML generated by XML2CALC.cbl to
JSimpleCalculator.java.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Steps in the reengineering and reuse exercise (cont’d):

Write a COBOL program XML2CALC.cbl:




Marshalls XML from Java2CblXmlBridge.cpp into
SMPLCALC-INTERFACE.
Invokes SMPLCALC.cbl, passing SMPLCALC-INTERFACE
by reference.
Marshalls SMPLCALC-INTERFACE back to XML
before returning to Java2CblXmlBridge.cpp.
Compile XML2CALC.cbl and link it with the object code
for SMPLCALC.cbl (SMPLCALC.obj).
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Steps in the reengineering and reuse exercise (cont’d):


Create a DLL to be loaded by JSimpleCalculator.java by
compiling and linking Java2CblXmlBridge.cpp with the
object code for XML2CALC.cbl.
Update JSimpleCalculator.java to use the JAXB
marshalling code to send/receive XML through the JNI
layer and display the results.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Highlights of the solution code:

SimpleCalculator.xsd
<element name="SI-OPERAND-1">
<simpleType>
<restriction base="integer">
<totalDigits value="9" />
</restriction>
</simpleType>
</element>
. . .
<element name="SI-OPERATION">
<simpleType>
<restriction base="string">
<enumeration value="+" />
<enumeration value="-" />
<enumeration value="*" />
</restriction>
</simpleType>
</element>
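One way to see what these restrictions enforce is to validate small documents against them with the JDK's javax.xml.validation API. In the sketch below the two element declarations are copied from the fragment above; the enclosing schema root, the class name CalcSchemaCheck, and the sample inputs are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class CalcSchemaCheck {
    // Element declarations from the slide, wrapped in a minimal <schema> root (assumed).
    static final String XSD =
          "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"SI-OPERAND-1\"><simpleType><restriction base=\"integer\">"
        + "<totalDigits value=\"9\"/></restriction></simpleType></element>"
        + "<element name=\"SI-OPERATION\"><simpleType><restriction base=\"string\">"
        + "<enumeration value=\"+\"/><enumeration value=\"-\"/><enumeration value=\"*\"/>"
        + "</restriction></simpleType></element>"
        + "</schema>";

    /** Returns true if the document satisfies the schema, false if validation fails. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<SI-OPERATION>+</SI-OPERATION>"));
        System.out.println(isValid("<SI-OPERATION>/</SI-OPERATION>"));
        System.out.println(isValid("<SI-OPERAND-1>1234567890</SI-OPERAND-1>"));
    }
}
```

The totalDigits facet mirrors the nine digits of PIC S9(9), and the enumeration mirrors the three level-88 condition names on SI-OPERATION.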
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Highlights of the solution code (cont’d):

ISimpleCalculator.java
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Highlights of the solution code (cont’d):

JSimpleCalculator.java
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Highlights of the solution code (cont’d):

JSimpleCalculator.java (cont’d)
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Highlights of the solution code (cont’d):

Java2CblXmlBridge.c
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Highlights of the solution code (cont’d):

XML2CALC.cbl
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Sample run of solution code:
Figure 55. Reuse of COBOL from Java using JAXB, JNI, and COBOL XML Support.
Results (cont’d)
Reengineering and Reuse of Legacy Software (cont’d)

Sample run of solution code:
Figure 56. Reuse of COBOL from Java using JAXB, JNI, and COBOL XML Support.
Results (cont’d)
Overview of Developed SRE Course Modules

Reversing and Patching Wintel Machine Code

Reversing and Patching Java Bytecode

Applying Anti-Reversing Techniques to Machine Code

Applying Anti-Reversing Techniques to Java Bytecode

Reengineering and Reuse of Legacy Software

Identifying, Monitoring, and Reporting Malware
Results (cont’d)
Identifying, Monitoring, and Reporting Malware



Malware describes a category of software that does not
always operate in a way that benefits the user.
Of course, those of us who have ever used software might
contend that this definition of malware will cause
programs that we use every day to be categorized as
malware.
So let's qualify it a bit: the malicious or annoying
behaviors of malware are intentional, not the result of one
or more bugs.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)

There are currently five types of malware that affect
computer systems [6] [7]:





Viruses: require some deliberate action to help them
spread.
Worms: similar to viruses but can spread by themselves
over computer networks.
Trojan Horses: functional software that performs
hidden malicious or annoying operations.
Backdoor: a vulnerability purposely embedded in
software.
Rabbit: a program that exhausts system resources.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)



Malware usually isn't of just one type; for example, 3 of
the top 10 malicious code families reported in 2008 were
Trojans with a backdoor component [8].
Using the machine code and bytecode reversing
experiences gained from the previous modules, one could
try reversing malware.
Using virtualization tools such as VMware to create
secondary operating system images on which to analyze
malware can still result in infection of the primary
operating system.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)



The goal of this module is to help the learner become
familiar with using tools to identify, monitor, and report
software that might be malicious.
Since it's not practical to ask a learner to install a virus,
worm, backdoor, or rabbit, we are left with the possibility
of a benign software Trojan (discussed later).
In 1996, Mark Russinovich founded a company called
“Winternals Software” where he was the chief software
architect on a comprehensive suite of tools for diagnosing,
debugging, and repairing Windows® systems and
applications [9].
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)



Mark's company has since been purchased by Microsoft,
and his suite of tools has been rebranded “Windows
Sysinternals” and is offered for free on Microsoft
TechNet.
Mark's story is an interesting one because he is
recognized as an expert on the internals of Windows®
even though he did not participate in its development—a
true testament to what can be learned about software
through reverse engineering.
The Sysinternals suite contains 66 different utilities, but
we'll focus on the most useful one in this context of
analyzing the behavior of malware: Process Monitor.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)

The Process Monitor can capture detailed information
about any running process in a Windows® system
including: file system, registry, and network activity.
Figure 57. Process Monitor session for the Password Vault application.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)



Of course, Process Monitor itself doesn't identify malware;
it simply reports what a process is doing.
With a little bit of ingenuity, one can identify Trojan
Horses by looking for activities that don't seem to fit with
the advertised functionality of a program.
It's common practice to download free software from the
Internet, and because we've been convinced that
open-source software, which is sometimes confused with free
software, should have the fewest vulnerabilities, we do it
without a second thought.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)



Incidentally, the data on the number of vulnerabilities
found in popular Internet browsers does not support this
belief.
“Mozilla browsers were affected by 99 new vulnerabilities
in 2008, more than any other browser; there were 47 new
vulnerabilities identified in Internet Explorer, 40 in Apple
Safari, 35 in Opera™, and 11 in Google® Chrome” [8].
It seems counter-intuitive that an open-source browser
would have twice as many security holes as a
closed-source browser like Internet Explorer.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)




Becoming familiar with the Windows® Sysinternals suite
can help you evaluate whether the software on your
Windows® machine is acting in your best interest.
If you suspect a particular program to be malware, it can
be submitted online to a service called ThreatExpert.
ThreatExpert is a Web-based tool that supports
submission of software executables that are to be
evaluated against an on-line malware database.
Matching against existing malware is just one part of
ThreatExpert's automated engine; the service tries to
execute suspected malware in an isolated environment in
order to perform heuristic analysis of its actions.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)
Figure 59. Example ThreatExpert report summary for submitted malware.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)



A Malware Identification and Monitoring Exercise was
developed against a Java Alarm Clock application. This
program was written to be a benign software Trojan.
The exercise asks the learner to identify the behaviors of
the Alarm Clock application that make it a software Trojan
using the Windows Sysinternals tool suite.
The Alarm Clock application bytecode has been
aggressively obfuscated to discourage the use of
decompilation as a strategy for learning the program’s
behavior.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)

The Alarm Clock application is a benign software Trojan
that, in addition to being a rudimentary alarm clock,
performs unadvertised functions on background threads:

Logs information from the Windows® registry

Logs locations of office documents in the file system.

Scans for computers that respond to an ICMP ping.

Paced background threads are used.
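Spotting behaviors like these in a trace amounts to comparing what a process did with what its advertised functionality would require. The sketch below shows that comparison on a simplified list of events. The operation names are modelled on those shown by Process Monitor, but the Event type, the sample paths, and the expected-operation set are all hypothetical and are not part of the exercise materials.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnexpectedActivity {
    /** One simplified trace event: the operation name and the path it touched. */
    static final class Event {
        final String operation;
        final String target;
        Event(String operation, String target) {
            this.operation = operation;
            this.target = target;
        }
    }

    /** Returns the events whose operation is not expected for the advertised functionality. */
    static List<Event> flag(List<Event> trace, Set<String> expectedOperations) {
        List<Event> flagged = new ArrayList<>();
        for (Event e : trace) {
            if (!expectedOperations.contains(e.operation)) {
                flagged.add(e);
            }
        }
        return flagged;
    }

    /** Hypothetical trace for an alarm clock that should only read its own files. */
    static List<Event> sampleTrace() {
        return Arrays.asList(
            new Event("ReadFile", "C:\\AlarmClock\\alarm.wav"),
            new Event("RegQueryValue", "HKLM\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion"),
            new Event("QueryDirectory", "C:\\Users"),
            new Event("ReadFile", "C:\\AlarmClock\\settings.txt"));
    }

    public static void main(String[] args) {
        Set<String> expected = new HashSet<>(Arrays.asList("ReadFile"));
        for (Event e : flag(sampleTrace(), expected)) {
            System.out.println(e.operation + " " + e.target);
        }
    }
}
```

Filtering on the operation name alone is deliberately crude; a real review would also consider which paths and registry keys are touched and how often.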
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)
Figure 60. Background threads log information about the user’s system.
Results (cont’d)
Identifying, Monitoring, and Reporting Malware (cont’d)
Figure 61. Process Monitor session for the Alarm Clock application.
Conclusions




Since programmers would benefit from reverse
engineering education, instructors need to be able to
teach it to them.
At the present time, computer science instructors will be
hard pressed to find materials for teaching a course that
are compatible with classroom delivery.
Several books exist on reverse engineering that cater to
industry professionals or those interested in self-study.
However, in a university setting, instructors engage
students in ordered learning through exercises, quizzes,
and exams.
Conclusions



Universities should continue to work toward establishing
standard content for software reverse engineering and
software maintenance courses.
Software Reverse Engineering is an activity that relies
heavily on tools. Better tools can only make this activity
more feasible and reliable.
The market for reverse engineering tools does not seem
saturated; there appear to be some opportunities for
either new open-source projects or commercial products.
Thank you!
References
[1] M. R. Ali, “Why teach reverse engineering?” ACM SIGSOFT SEN, v.30, n.4,
pp.1-4, Jul 2005.
[2] M. El-Ramly, “Experience in teaching a software reengineering course,” in
Proceedings of the 28th International Conference on Software Engineering
(ICSE). Shanghai, China, 2006, pp. 699-702.
[3] A. V. Deursen, J. Favre, R. Koschke, and J. Rilling, “Experiences in Teaching
Software Evolution and Program Comprehension,” in Proceedings of the
11th IEEE international Workshop on Program Comprehension, Washington,
DC, 2003, pp. 2834-284.
[4] B. W. Weide, W. D. Heym, J. E. Hollingsworth, “Reverse engineering of
legacy code exposed,” in Proceedings of the 17th International Conference
on Software Engineering, Seattle, WA, 1995, pp. 327-331.
[5] H. M. Sneed, “Encapsulation of legacy software: A technique for reusing
legacy software components,” in Annals of Software Engineering, v.9, n.4,
pp.293-313, 2000.
References (cont’d)
[6] E. Eilam, Reversing: Secrets of Reverse Engineering, Indianapolis, IN:
Wiley, 2005.
[7] M. Stamp, Information Security: Principles and Practice, Hoboken, NJ: John
Wiley & Sons, 2006.
[8] Symantec Corp. (2009, Apr.). Symantec Global Internet Security Threat
Report. [Online]. Available:
http://eval.symantec.com/mktginfo/enterprise/white_papers/bwhitepaper_
internet_security_threat_report_xiv_04-2009.en-us.pdf. (Accessed April
26th, 2009).
[9] Microsoft Corporation, Windows Sysinternals: utilities to help manage,
troubleshoot and diagnose Windows systems and applications. [Online].
Available: http://technet.microsoft.com/en-us/sysinternals/default.aspx.
(Accessed April 30th, 2009).