Eureka: A Framework for Enabling Static Malware Analysis

13th European Symposium on Research in Computer Security (ESORICS), 2008
WANG Zhi
Outline
1. Overview of Generic Unpacker
2. System Call Level Heuristic
3. Statistics-Based Unpacking
4. Evaluation Metrics
Overview of Unpacker
Static analyses: decompile the binary and analyze the logical structure, flow, and data stored within it.
Dynamic analyses: monitor the behavior of the malware binary at runtime.
- Fine-grained monitoring (instruction-level)
- Coarse-grained monitoring (page-level)
Generic Automatic Unpackers
Unpacker    Granularity        Trigger                    Speed
PolyUnpack  Instruction-level  Model-based                slow
Renovo      Instruction-level  Heuristic                  slow
OmniUnpack  Page-level         Heuristic                  fast
Eureka      System call level  Heuristic and statistical  fast
The variability in unpacking strategies comes from the granularity at which unpacking behavior is tracked.
Eureka
[Figure: Eureka architecture. Components: coarse-grained execution tracing (NtTerminateProcess, NtCreateProcess) and statistical bigram analysis]
Coarse-grained Execution Tracing
Eureka uses the program-exit event as a trigger.
- A call to NtTerminateProcess implies that the unpacked malicious payload has been successfully decrypted.
- A large fraction of current malware uses a new process (NtCreateProcess) to execute the unpacked malicious payload.
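The two heuristic triggers above can be sketched as a scan over a system call trace. This is a minimal illustration, not Eureka's implementation: the function name first_trigger, the trace format (a list of system call names), and the sample trace are all assumptions made for the example.

```python
from typing import Iterable, Optional

# System calls that the heuristic treats as "unpacking is done" triggers.
TRIGGER_SYSCALLS = {"NtTerminateProcess", "NtCreateProcess"}


def first_trigger(syscalls: Iterable[str]) -> Optional[int]:
    """Return the index of the first trigger system call, or None if absent.

    Hypothetical helper: a real tracer would take a memory snapshot at this
    point instead of returning an index.
    """
    for index, name in enumerate(syscalls):
        if name in TRIGGER_SYSCALLS:
            return index
    return None


# Invented sample trace, for illustration only.
trace = [
    "NtOpenFile",
    "NtAllocateVirtualMemory",
    "NtCreateProcess",
    "NtTerminateProcess",
]
trigger_index = first_trigger(trace)
```

Here the scan stops at NtCreateProcess, the earlier of the two triggers in the sample trace.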
Problems
Not all malware exits; some samples keep an executing version resident in memory.
- Packers can generate spurious process-creation events.
- Malware authors can simply avoid exiting the malware process.
- Although the two simple heuristics above may work for a large fraction of malware today (as much as 80%), this may not hold for future malware.
Evaluation
Statistical bigram analysis
Mining statistical patterns in x86 code
- Use simple n-gram analysis.
- Use IDA Pro to extract the regions of an executable that are marked as functions.
- Look for the most common bigrams (opcode pairs or 2-byte opcodes) and spaced bigrams (byte pairs separated by one or more bytes).
- Finding: FF 15 (call), FF 75 (push), E8---00, and E8---FF are prevalent in x86 code.
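Counting these four patterns in a byte buffer can be sketched as follows. This is a rough illustration under two assumptions: that "---" stands for exactly three intervening bytes (so the second byte is the high byte of a little-endian 32-bit call displacement), and that a plain byte scan is acceptable, which can also count coincidental matches inside operands. The function name count_bigrams and the sample bytes are invented for the example.

```python
from typing import Dict


def count_bigrams(code: bytes) -> Dict[str, int]:
    """Count the four bigram patterns with a simple byte-by-byte scan."""
    counts = {"FF 15": 0, "FF 75": 0, "E8---00": 0, "E8---FF": 0}
    for i in range(len(code) - 1):
        if code[i] == 0xFF:
            # Adjacent bigrams: FF 15 (call) and FF 75 (push).
            if code[i + 1] == 0x15:
                counts["FF 15"] += 1
            elif code[i + 1] == 0x75:
                counts["FF 75"] += 1
        elif code[i] == 0xE8 and i + 4 < len(code):
            # Spaced bigrams: E8, three skipped bytes, then 00 or FF
            # (the three-byte gap is an assumption about the notation).
            if code[i + 4] == 0x00:
                counts["E8---00"] += 1
            elif code[i + 4] == 0xFF:
                counts["E8---FF"] += 1
    return counts


# Invented sample: one occurrence of each pattern.
sample = bytes.fromhex("ff1500104000" "e810000000" "ff7508" "e8f0ffffff")
sample_counts = count_bigrams(sample)
```

A scan restricted to the regions that IDA Pro marks as functions, as described above, would reduce coincidental matches compared with scanning the whole buffer.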
Occurrence summary of bigrams
Bigram          calc  explorer  notepad  ping  shutdown
FF 15 (call)     246      3045      415    58       132
FF 75 (push)     235      2494      245    41        85
E8---FF (call)  1583      2201      180    87        49
E8---00 (call)   746      1091      108    57        66
Bigram Counts
[Figure: bigram counts during execution of a goat file packed with ASPack]
[Figure: bigram counts during execution of a goat file packed with MoleBox]
[Figure: bigram counts during execution of a goat file packed with Armadillo]
There are consistent and significant shifts in the bigram counts.
The simple bigram counting approach had a success rate of over 95% in distinguishing between packed and unpacked malware instances.
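One way to turn such a shift into a decision is to compare each snapshot's bigram count with the first one. The sketch below is only an illustration: the slides do not state the decision rule, so the growth factor of 2.0, the function name first_shift, and the sample counts are all assumptions.

```python
from typing import Optional, Sequence


def first_shift(counts: Sequence[int], factor: float = 2.0) -> Optional[int]:
    """Return the index of the first snapshot whose bigram count is at
    least `factor` times the initial count, or None if no such snapshot.

    The threshold rule is hypothetical, not taken from the source.
    """
    if not counts:
        return None
    baseline = max(counts[0], 1)  # avoid a zero baseline
    for index in range(1, len(counts)):
        if counts[index] >= factor * baseline:
            return index
    return None


# Invented counts for successive snapshots of a running packed binary.
snapshots = [12, 13, 15, 140, 150]
shift_index = first_shift(snapshots)
```

In this invented series the count jumps at the fourth snapshot, which the rule would treat as the point where unpacked code has appeared in memory.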
Evaluation Metrics
Code-to-data ratio
- An observable difference between packed and unpacked code is the amount of identifiable code and data found in the binary.
- Use IDA Pro to identify valid code sequences.
- In IDA Pro, data is represented by the db, dw, or dd directives.
- In packed executables, the ratio is below 3%.
- In unpacked executables, the ratio is above 50%.
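The metric can be sketched from a simplified disassembly listing. This is a minimal example under stated assumptions: the input is a list of (mnemonic, size in bytes) pairs rather than real IDA Pro output, anything that is not db, dw, or dd counts as code, and the ratio is taken as code bytes divided by data bytes. Whether the percentages above use exactly this definition is not stated in the slides.

```python
from typing import Iterable, Tuple

# Directives that mark an item as data in the listing.
DATA_DIRECTIVES = {"db", "dw", "dd"}


def code_to_data_ratio(items: Iterable[Tuple[str, int]]) -> float:
    """Return code bytes divided by data bytes for (mnemonic, size) pairs.

    Hypothetical helper; returns infinity when the listing has no data.
    """
    code_bytes = 0
    data_bytes = 0
    for mnemonic, size in items:
        if mnemonic.lower() in DATA_DIRECTIVES:
            data_bytes += size
        else:
            code_bytes += size
    if data_bytes == 0:
        return float("inf")
    return code_bytes / data_bytes


# Invented listing: 7 bytes of code and 7 bytes of data.
listing = [("mov", 2), ("call", 5), ("db", 1), ("dd", 4), ("dw", 2)]
ratio = code_to_data_ratio(listing)
```

A packed binary, where most bytes fail to disassemble and appear as db, dw, or dd items, would yield a small value under this definition.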
Code-to-data ratio
[Figure: code-to-data ratio, packed vs. unpacked]
[Figure: original notepad.exe memory space vs. packed notepad.exe memory space; grey areas stand for data, blue areas stand for code]