Eureka: A Framework for Enabling Static Malware Analysis
13th European Symposium on Research in Computer Security
(ESORICS), 2008
WANG Zhi
Outline
1. Overview of Generic Unpacker
2. System Call Level Heuristic
3. Statistics-Based Unpacking
4. Evaluation Metrics
Overview of Unpacker
Static analyses: decompile and analyze
the logical structure, flow, and data
stored within the binary itself.
Dynamic analyses: monitor the
behavior of the malware binary at
runtime.
Fine-grained monitor (Instruction-level)
Coarse-grained monitor (page-level)
Generic Automatic Unpackers
Unpacker     Granularity        Trigger                    Speed
PolyUnpack   Instruction-level  Model-based                slow
Renovo       Instruction-level  Heuristic                  slow
OmniUnpack   Page-level         Heuristic                  fast
Eureka       System call level  Heuristic and statistical  fast
The variability in unpacking strategies comes from the
granularity at which unpacking behavior is tracked.
Eureka
Figure: Eureka components: coarse-grained execution tracing
(NtTerminateProcess, NtCreateProcess) and statistical bigram analysis.
Coarse-grained Execution Tracing
Eureka uses the event of program exit
as a trigger.
NtTerminateProcess implies that the
unpacked malicious payload has been
successfully decrypted.
A large fraction of current malware uses a new
process (NtCreateProcess) to execute the
unpacked malicious payload.
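The two triggers above can be illustrated with a minimal sketch. It assumes the execution trace is already available as a list of system call names; the function name find_snapshot_trigger and the sample trace are hypothetical and are not part of Eureka itself.

```python
from typing import Iterable, Optional, Tuple

# System calls treated as snapshot triggers by the two heuristics above.
TRIGGER_SYSCALLS = {"NtTerminateProcess", "NtCreateProcess"}


def find_snapshot_trigger(trace: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (index, name) of the first trigger system call, or None."""
    for index, syscall in enumerate(trace):
        if syscall in TRIGGER_SYSCALLS:
            return index, syscall
    return None


# Hypothetical trace, for illustration only.
trace = [
    "NtOpenFile",
    "NtAllocateVirtualMemory",
    "NtCreateProcess",
    "NtTerminateProcess",
]
print(find_snapshot_trigger(trace))
```

In this sketch the memory snapshot would be taken at the first matching event, which is why a packer that creates spurious processes can defeat it.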
Problems
Not all malware exits; some keeps an
executing version resident in memory.
Packers can generate spurious
process-creation events.
Malware authors can simply avoid exiting the
malware process.
Although the two simple heuristics above may work for
a large fraction of today's malware (as much as
80%), the same may not hold for future
malware.
Evaluation
Statistical bigram analysis
Mining statistical patterns in x86 code
Use simple n-gram analysis.
Use IDA Pro to extract regions from
executables that were marked as functions.
Look for the most common bigrams
(opcode pairs or 2-byte opcodes) and spaced
bigrams (byte pairs separated by 1 or more
bytes).
Found that FF 15 (call), FF 75 (push), E8---00, and
E8---FF are prevalent in x86 code.
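A rough sketch of counting these four bigrams over a raw byte buffer follows. It assumes that "E8---00" and "E8---FF" mean the byte E8 followed by three arbitrary bytes and then 00 or FF; that reading, the function name count_bigrams, and the sample bytes are assumptions for illustration. It scans raw bytes and does no disassembly.

```python
from typing import Dict


def count_bigrams(code: bytes) -> Dict[str, int]:
    """Count the four bigrams named above in a raw byte buffer."""
    counts = {"FF 15": 0, "FF 75": 0, "E8---00": 0, "E8---FF": 0}

    # Adjacent bigrams: FF 15 (call) and FF 75 (push).
    for i in range(len(code) - 1):
        if code[i] == 0xFF:
            if code[i + 1] == 0x15:
                counts["FF 15"] += 1
            elif code[i + 1] == 0x75:
                counts["FF 75"] += 1

    # Spaced bigrams: E8, three skipped bytes, then 00 or FF (assumed reading).
    for i in range(len(code) - 4):
        if code[i] == 0xE8:
            if code[i + 4] == 0x00:
                counts["E8---00"] += 1
            elif code[i + 4] == 0xFF:
                counts["E8---FF"] += 1

    return counts


# Hypothetical byte sequence containing one instance of each bigram.
sample = bytes.fromhex("ff1500104000" "e810000000" "ff7508" "e8f0ffffff")
print(count_bigrams(sample))
```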
Occurrence summary of bigrams
Bigram           calc  explorer  notepad  ping  shutdown
FF 15 (call)      246      3045      415    58       132
FF 75 (push)      235      2494      245    41        85
E8---FF (call)   1583      2201      180    87        49
E8---00 (call)    746      1091      108    57        66
Bigram Counts
Figure: bigram counts during execution of a goat file packed with ASPack
Figure: bigram counts during execution of a goat file packed with MoleBox
Figure: bigram counts during execution of a goat file packed with Armadillo
Bigram Counts
There are consistent and significant
shifts in the bigram counts.
The simple bigram counting approach
had over a 95% success rate in
distinguishing between packed and
unpacked malware instances.
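One way to picture a "significant shift" is a check on a series of bigram counts sampled during execution. The sketch below is only an illustration: the function name first_significant_shift, the factor and min_increase thresholds, and the sample series are all hypothetical and are not taken from Eureka.

```python
from typing import Optional, Sequence


def first_significant_shift(
    counts: Sequence[int],
    factor: float = 2.0,      # hypothetical: count must at least double
    min_increase: int = 50,   # hypothetical: and grow by at least this much
) -> Optional[int]:
    """Return the index of the first sample that jumps well above the first one."""
    if not counts:
        return None
    baseline = counts[0]
    for index, value in enumerate(counts[1:], start=1):
        grew_enough = value - baseline >= min_increase
        scaled_enough = value >= factor * max(baseline, 1)
        if grew_enough and scaled_enough:
            return index
    return None


# Made-up series: low counts while packed, a jump once code is unpacked.
series = [10, 12, 11, 400, 420]
print(first_significant_shift(series))
```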
Evaluation Metrics
Code-to-data ratio
An observable difference between packed
code and unpacked code is the amount of
identifiable code and data found in the binary
Use IDA Pro to identify valid code sequences.
In IDA Pro, data are represented by db, dw or
dd.
In packed executables, the ratio is below 3%.
In unpacked executables, the ratio is above
50%.
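The two thresholds above can be turned into a small check. This sketch assumes the ratio is measured as code bytes over total identified bytes (code plus data), which the slides do not state explicitly; the function names and byte counts are hypothetical.

```python
def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Fraction of identified bytes that are code (assumed definition)."""
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0


def classify(ratio: float) -> str:
    """Apply the thresholds from the slide: below 3% packed, above 50% unpacked."""
    if ratio < 0.03:
        return "likely packed"
    if ratio > 0.50:
        return "likely unpacked"
    return "inconclusive"


# Hypothetical byte counts, for illustration only.
print(classify(code_to_data_ratio(code_bytes=2, data_bytes=98)))
print(classify(code_to_data_ratio(code_bytes=60, data_bytes=40)))
```

Ratios between the two thresholds are reported as inconclusive here because the slides give no guidance for that range.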
Code-to-data ratio
Figure: code-to-data ratio, packed vs. unpacked
Code-to-data ratio
Figure: original notepad.exe memory space vs. packed notepad.exe memory space.
Grey areas stand for data; blue areas stand for code.