Lecture 7 - The University of Texas at Dallas


Detecting Malicious Executables
Mr. Mehedy Masud (PhD Student)
Prof. Latifur Khan
Prof. Bhavani Thuraisingham
Department of Computer Science
The University of Texas at Dallas
Lecture #7

Introduction: Detecting Malicious Executables
● What are malicious executables?
  – Harm computer systems
  – Virus, Exploit, Denial of Service (DoS), Flooder, Sniffer, Spoofer, Trojan, etc.
  – Exploit software vulnerabilities on a victim
  – May remotely infect other victims
  – Incur great losses. Example: the Code Red epidemic cost $2.6 billion
Malicious code detection: Traditional approach
● Signature-based
  – Requires signatures to be generated by human experts
  – So, not effective against "zero-day" attacks
State of the Art: Automated Detection
Automated detection approaches:
● Behavioural: analyse behaviours such as source/destination address, attachment type, statistical anomaly, etc.
● Content-based: analyse the content of the malicious executable
  – Autograph (H. Ah-Kim, CMU): based on an automated signature generation process
  – N-gram analysis (Maloof, M. A. et al.): based on mining features and using machine learning
New Ideas
● Content-based approaches consider only machine code (byte code).
● Is it possible to consider higher-level source code for malicious code detection?
● Yes: disassemble the binary executable and retrieve the assembly program
● Extract important features from the assembly program
● Combine with machine-code features
Feature Extraction
● Binary n-gram features
  – Sequence of n consecutive bytes of the binary executable
● Assembly n-gram features
  – Sequence of n consecutive assembly instructions
● System API call features
  – DLL function call information
The Hybrid Feature Retrieval Model
● Collect training samples of normal and malicious executables
● Extract features
● Train a classifier and build a model
● Test the model against test samples

Hybrid Feature Retrieval (HFR): Training

Hybrid Feature Retrieval (HFR): Testing
Feature Extraction
● Binary n-gram features
  – Features are extracted from the byte code in the form of n-grams, where n = 2, 4, 6, 8, 10 and so on.
● Example:
  Given an 11-byte sequence 0123456789abcdef012345,
  the 2-grams (2-byte sequences) are: 0123, 2345, 4567, 6789, 89ab, abcd, cdef, ef01, 0123, 2345
  the 4-grams (4-byte sequences) are: 01234567, 23456789, 456789ab, ..., ef012345 and so on.
● Problem:
  – Large dataset. Too many features (millions!).
● Solution:
  – Use secondary memory and efficient data structures
  – Apply feature selection
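
To make the sliding-window idea concrete, here is a minimal Python sketch of the extraction step (illustrative only; it leaves out the secondary-memory structures and feature selection mentioned above):

    def binary_ngrams(data: bytes, n: int):
        """Return the overlapping n-byte sequences of 'data' as hex strings."""
        return [data[i:i + n].hex() for i in range(len(data) - n + 1)]

    # The 11-byte sequence from the slide:
    seq = bytes.fromhex("0123456789abcdef012345")
    print(binary_ngrams(seq, 2))  # ['0123', '2345', ..., 'ef01', '0123', '2345']
    print(binary_ngrams(seq, 4))  # ['01234567', '23456789', ..., 'ef012345']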
Feature Extraction
● Assembly n-gram features
  – Features are extracted from the assembly programs in the form of n-grams, where n = 2, 4, 6, 8, 10 and so on.
● Example:
  Given the three instructions "push eax"; "mov eax, dword[0f34]"; "add ecx, eax";
  the 2-grams are:
  (1) "push eax"; "mov eax, dword[0f34]";
  (2) "mov eax, dword[0f34]"; "add ecx, eax";
● Problem:
  – Same problem as binary
● Solution:
  – Same solution
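
A similarly minimal sketch for assembly n-grams, assuming the instructions have already been obtained from the disassembler as a list of strings:

    def assembly_ngrams(instructions, n):
        """Return tuples of n consecutive assembly instructions."""
        return [tuple(instructions[i:i + n]) for i in range(len(instructions) - n + 1)]

    insns = ["push eax", "mov eax, dword[0f34]", "add ecx, eax"]
    print(assembly_ngrams(insns, 2))
    # [('push eax', 'mov eax, dword[0f34]'), ('mov eax, dword[0f34]', 'add ecx, eax')]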
Feature Selection
● Select the best K features
● Selection criterion: Information Gain
● The gain of an attribute A on a collection of examples S is given by

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
Experiments
● Dataset
  – Dataset1: 838 malicious and 597 benign executables
  – Dataset2: 1082 malicious and 1370 benign executables
  – Malicious code collected from VX Heavens (http://vx.netlux.org)
● Disassembly
  – Pedisassem (http://www.geocities.com/~sangcho/index.html)
● Training, Testing
  – Support Vector Machine (SVM)
  – C-Support Vector Classifiers with an RBF kernel
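
As a rough illustration of the training step, a C-support vector classifier with an RBF kernel can be set up as follows using scikit-learn; the toy vectors below merely stand in for the real n-gram feature matrix:

    import numpy as np
    from sklearn.svm import SVC

    # One row per executable, one 0/1 column per selected n-gram feature (toy data)
    X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])
    y = np.array([1, 1, 0, 0])  # 1 = malicious, 0 = benign

    clf = SVC(C=1.0, kernel="rbf", gamma="scale")  # C-SVC with an RBF kernel
    clf.fit(X, y)
    print(clf.predict([[1, 0, 0]]))  # classify an unseen feature vector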
Results
● HFS = Hybrid Feature Set
● BFS = Binary Feature Set
● AFS = Assembly Feature Set
Future Plans
● System call features:
  – Seem to be very useful
  – Need to consider the frequency of calls
  – Call sequence patterns (following the program path)
  – Actions immediately preceding or following a call
● Detect malicious code by program slicing
  – Requires further analysis
Buffer Overflow Attack Detection
Mohammad M. Masud,
Latifur Khan,
Bhavani Thuraisingham
Department of Computer Science
The University of Texas at Dallas
Introduction
● Goal
  – Intrusion detection
  – e.g.: worm attacks, buffer overflow attacks
● Main contribution
  – 'Worm' code detection by data mining coupled with 'reverse engineering'
  – Buffer overflow detection by combining data mining with static analysis of assembly code
Background
● What is a 'buffer overflow'?
  – A situation in which a fixed-size buffer is overflowed by a larger input.
● How does it happen?
  – Example:
    ........
    char buff[100];
    gets(buff);
    ........
  (Diagram: the input string being written past the end of buff on the stack.)
Background (cont...)
● Then what?
  – The input overruns buff and the return address on the stack is overwritten.
  – The new return address points to the memory location holding the attacker's code.
  (Diagram: the stack before and after the overflow, showing the return address redirected to the attacker's code.)
Background (cont...)
● So what?
  – The program may crash, or
  – The attacker can execute his arbitrary code
● It can now
  – Execute any system function
  – Communicate with some host, download some 'worm' code, and install it!
  – Open a backdoor to take full control of the victim
● How to stop it?
Background (cont...)
● Stopping buffer overflow
  – Preventive approaches
  – Detection approaches
● Preventive approaches
  – Finding bugs in source code. Problem: only works when source code is available.
  – Compiler extensions. Same problem.
  – OS/HW modification
● Detection approaches
  – Capture code running symptoms. Problem: may require a long running time.
  – Automatically generate signatures of buffer overflow attacks.
CodeBlocker (Our approach)
● A detection approach
● Based on the observation:
  – Attack messages usually contain code while normal messages contain data.
● Main idea
  – Check whether a message contains code
● Problem to solve:
  – Distinguishing code from data
Severity of the problem
● It is not easy to recover the actual instruction sequence from a given string of bits
Our solution
● Apply data mining
● Formulate the problem as a classification problem (code vs. data)
● Collect a set of training examples containing both kinds of instances
● Train on the data with a machine learning algorithm to obtain a model
● Test this model against a new message
CodeBlocker Model

Feature Extraction

Disassembly
● We apply the SigFree tool
  – implemented by Xinran Wang et al. (Penn State)
Feature extraction
● Features are extracted using
  – N-gram analysis
  – Control flow analysis
N-gram analysis
● What is an n-gram?
  – A sequence of n instructions
● Traditional approach:
  – Flow of control is ignored
  – The 2-grams are: 02, 24, 46, ..., CE
  (Figure: an assembly program and its corresponding instruction flow graph (IFG).)
Feature extraction (cont...)
● Control-flow based n-gram analysis
  – What is an n-gram? A sequence of n instructions
  – Proposed control-flow based approach: flow of control is considered
  – The 2-grams are: 02, 24, 46, ..., CE, E6
  (Figure: the same assembly program and its corresponding IFG.)
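
A rough sketch of the difference, assuming the disassembled program is represented as a map from each instruction address to its possible successors (the IFG edges); the addresses and structure below are hypothetical, not the figure's exact program:

    # Illustrative: 2-grams that follow control-flow edges instead of only the
    # sequential instruction order. 'ifg' maps each instruction address to the
    # addresses it can transfer control to (fall-through and jump targets).
    def control_flow_2grams(ifg):
        """Return every pair (a, b) such that instruction a may be followed by b."""
        return [(src, dst) for src, dsts in ifg.items() for dst in dsts]

    # Toy IFG: the instruction at 0xE6 jumps back to 0x02, so (E6, 02) also becomes
    # a 2-gram, which a purely sequential scan would miss.
    ifg = {0x02: [0x24], 0x24: [0x46], 0x46: [0xCE], 0xCE: [0xE6], 0xE6: [0x02]}
    print([(hex(a), hex(b)) for a, b in control_flow_2grams(ifg)])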
Feature extraction (cont...)
● Control flow analysis. Generated features:
  – Invalid Memory Reference (IMR)
  – Undefined Register (UR)
  – Invalid Jump Target (IJT)
● Checking IMR
  – Memory is referenced using register addressing and the register value is undefined
  – e.g.: mov ax, [dx + 5]
● Checking UR
  – Check whether the register value is set properly
● Checking IJT
  – Check that the jump target does not violate an instruction boundary
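
A simplified sketch of these checks over a toy instruction representation; the field names, and folding the IMR and UR conditions into one test, are assumptions made for illustration, not the paper's disassembler output:

    # Illustrative: flag IMR/UR (memory addressed through a never-defined register)
    # and IJT (jump target that is not a valid instruction boundary).
    def control_flow_features(insns):
        defined = set()
        imr = ur = ijt = False
        starts = {i["addr"] for i in insns}  # valid instruction boundaries
        for ins in insns:
            for reg in ins.get("addr_regs", []):   # registers used for addressing
                if reg not in defined:
                    imr = ur = True
            defined.update(ins.get("defines", []))
            tgt = ins.get("jump_to")
            if tgt is not None and tgt not in starts:
                ijt = True
        return {"IMR": imr, "UR": ur, "IJT": ijt}

    toy = [
        {"addr": 0, "defines": ["ax"]},       # mov ax, 5
        {"addr": 2, "addr_regs": ["dx"]},     # mov ax, [dx + 5]  (dx never set)
        {"addr": 4, "jump_to": 3},            # jump into the middle of an instruction
    ]
    print(control_flow_features(toy))  # {'IMR': True, 'UR': True, 'IJT': True}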
Feature extraction (cont...)
● Why n-gram analysis?
  – Intuition: in general, disassembled executables should have a different pattern of instruction usage than disassembled data.
● Why control flow analysis?
  – Intuition: in legitimate code there should be no invalid memory references or invalid jump targets.
Putting it together
● Compute all possible n-grams
● Select the best k of them
● Compute the feature vector (binary vector) for each training example
● Supply these vectors to the training algorithm
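
Putting these steps into one hedged sketch, with assumed helper names and frequency-based selection standing in for information gain:

    from collections import Counter

    def ngrams(data: bytes, n: int):
        return [data[i:i + n].hex() for i in range(len(data) - n + 1)]

    def top_k_ngrams(samples, n, k):
        """Pick the k most frequent n-grams across all training samples (for brevity;
        information gain could be used instead)."""
        counts = Counter(g for s in samples for g in ngrams(s, n))
        return [g for g, _ in counts.most_common(k)]

    def feature_vector(sample, selected, n):
        """Binary vector: 1 if the selected n-gram occurs in the sample, else 0."""
        present = set(ngrams(sample, n))
        return [1 if g in present else 0 for g in selected]

    samples = [bytes.fromhex("0123456789"), bytes.fromhex("8912345678")]
    selected = top_k_ngrams(samples, n=2, k=4)
    vectors = [feature_vector(s, selected, n=2) for s in samples]
    # 'vectors' (plus labels) can now be supplied to the training algorithm, e.g. an SVM.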
Experiments
● Dataset
  – Real traces of normal messages
  – Real attack messages
  – Polymorphic shellcodes
● Training, Testing
  – Support Vector Machine (SVM)
Results
● CFBn: Control-flow based n-gram feature
● CFF: Control-flow feature
Novelty / contribution
● We introduce the notion of control-flow based n-grams
● We combine control flow analysis with data mining to distinguish code from data
● Significant improvement over other methods (e.g., SigFree)
Advantages
1) Fast testing
2) Signature-free operation
3) Low overhead
4) Robust against many obfuscations
Limitations
● Needs samples of attack and normal messages
● May not be able to detect a completely new type of attack
Future Work
● Find more features
● Apply dynamic analysis techniques
● Semantic analysis
Reference / suggested readings
– X. Wang, C. Pan, P. Liu, and S. Zhu. SigFree: A signature-free buffer overflow attack blocker. In USENIX Security, July 2006.
– J. Z. Kolter and M. A. Maloof. Learning to detect malicious executables in the wild. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, pages 470-478, 2004.
Email Worm Detection (behavioural approach)

The Model
(Diagram: outgoing emails → feature extraction → training and test data → machine learning → classifier → "clean or infected?")
Feature Extraction
● Per-email features
  – Binary-valued features: presence of HTML; script tags/attributes; embedded images; hyperlinks; presence of binary and text attachments; MIME types of file attachments
  – Continuous-valued features: number of attachments; number of words/characters in the subject and body
● Per-window features
  – Number of emails sent; number of unique email recipients; number of unique sender addresses; average number of words/characters per subject and body; average word length; variance in the number of words/characters per subject and body; variance in word length
  – Ratio of emails with attachments
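
As an illustration, a few of the per-email features can be pulled out with Python's standard email module; the exact field choices here are assumptions for the sketch, not the lecture's feature set:

    from email import message_from_string

    def per_email_features(raw_msg: str):
        """Extract a handful of binary- and continuous-valued per-email features."""
        msg = message_from_string(raw_msg)
        body_parts, n_attachments, has_html = [], 0, False
        for part in msg.walk():
            if part.get_content_maintype() == "multipart":
                continue
            if part.get_filename():                      # treated as an attachment
                n_attachments += 1
            elif part.get_content_type() == "text/html":
                has_html = True
            else:
                body_parts.append(part.get_payload())
        body = " ".join(body_parts)
        return {
            "has_html": int(has_html),
            "num_attachments": n_attachments,
            "num_words_subject": len((msg.get("Subject") or "").split()),
            "num_words_body": len(body.split()),
        }

    raw = "Subject: hello\nContent-Type: text/plain\n\nhi there friend"
    print(per_email_features(raw))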
Feature Reduction & Selection
● Principal Component Analysis (PCA)
  – Reduces higher-dimensional data to a lower dimension
  – Helps reduce noise and overfitting
● Decision Tree
  – Used to select the best features
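
A brief sketch of both options on toy data, using scikit-learn for illustration (not the lecture's experimental setup):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier

    X = np.random.rand(20, 10)          # 20 emails, 10 features (toy data)
    y = np.random.randint(0, 2, 20)     # 0 = clean, 1 = infected

    # Option 1: PCA - project the data onto the top principal components
    X_reduced = PCA(n_components=3).fit_transform(X)

    # Option 2: decision tree - rank features by importance and keep the best ones
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    best = np.argsort(tree.feature_importances_)[::-1][:3]
    X_selected = X[:, best]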
Experiments
● Dataset
  – Contains instances of both normal and viral emails
  – Six worm types: bagle.f, bubbleboy, mydoom.m, mydoom.u, netsky.d, sobig.f
  – Collected from UC Berkeley
● Training, Testing:
  – Decision tree: C4.5 algorithm (J48) in the Weka system
  – Support Vector Machine (SVM) and Naïve Bayes (NB)
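
For illustration, the three classifier families can be compared on toy feature vectors using scikit-learn analogues (the lecture itself used Weka's J48 for the decision tree):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    X = np.random.rand(60, 8)          # toy per-email feature vectors
    y = np.array([0, 1] * 30)          # 0 = clean, 1 = infected

    for name, clf in [("Decision tree", DecisionTreeClassifier()),
                      ("SVM", SVC(kernel="rbf")),
                      ("Naive Bayes", GaussianNB())]:
        scores = cross_val_score(clf, X, y, cv=3)
        print(name, scores.mean())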
Results
Conclusion & Future Work
● Three approaches have been tested:
  – Apply the classifier directly
  – Apply dimension reduction (PCA) and then classify
  – Apply feature selection (decision tree) and then classify
● The decision tree has the best performance
● Future plans
  – Combine content-based with behavioural approaches
  – Offensive operations: honeypots, information operations