IMDS: Intelligent Malware Detection System

Yanfang Ye
Dingding Wang
Tao Li
Dongyi Ye
Motivation
Watch Out!
Virus!
Threat to the security of computer systems
Signature-based anti-virus systems fail to detect polymorphic or new malware
Some data mining techniques have shown promising results on small collections of malicious executables
[Figure: signature-based detection fails (X) on polymorphic or new malware]
Our goal:
Develop more effective and efficient data mining solutions for large collections of malicious executables
OOA mining based classification
Data Collection and Preprocessing
• PE viruses make up the majority of viruses that have emerged in recent years
• 17366 malicious executables provided by the Anti-virus Laboratory of KingSoft Corporation
• 12214 benign executables gathered from Windows system files
• A PE parser was developed to construct API execution sequences
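The step from parser output to mining input can be sketched as follows. This is a minimal illustration, not the IMDS parser itself: it assumes the parser has already produced a list of (DLL, API) pairs per executable, and the helper name `encode_sample` and the ID scheme (next free integer) are hypothetical choices made for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

ApiCall = Tuple[str, str]  # (DLL name, API name), e.g. ("KERNEL32.DLL", "WriteFile")


def encode_sample(calls: List[ApiCall], api_ids: Dict[ApiCall, int]) -> FrozenSet[int]:
    """Map one executable's API calls to a set of integer item IDs.

    Unseen (DLL, API) pairs get the next free ID (an assumed scheme).
    DLL names are upper-cased because Windows treats them case-insensitively.
    """
    items = set()
    for dll, api in calls:
        key = (dll.upper(), api)
        if key not in api_ids:
            api_ids[key] = len(api_ids)
        items.add(api_ids[key])
    return frozenset(items)


# Hypothetical parser output for two executables (not real data).
api_ids: Dict[ApiCall, int] = {}
sample_a = encode_sample(
    [("kernel32.dll", "OpenProcess"), ("kernel32.dll", "WriteFile"),
     ("kernel32.dll", "WriteFile")],
    api_ids,
)
sample_b = encode_sample(
    [("KERNEL32.DLL", "WriteFile"), ("user32.dll", "MessageBoxA")],
    api_ids,
)
```

Each executable becomes one transaction (a set of item IDs), which is the shape association mining expects. Repeated calls collapse into a single item in this sketch.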
System Architecture
[Figure: system architecture; visible labels: OOA_Fast_FP_Growth algorithm, association rule based classification]
Objective-Oriented Association Mining
• OOA Mining -- models association patterns relating to a user’s objective
  e.g. Obj1 = (Group = Malicious)
  Algorithms: OOA_Apriori, OOA_FP-Growth [1]
• OOA_Fast_FP-Growth algorithm [4] -- a modification of OOA_FP-Growth [2,3]
  - Paths are directed, so fewer pointers are needed and less memory is required
  - Each node stores the sequence number of an item, which is determined by the item's support count
• Example
  (Kernel32.dll, OpenProcess;CopyFileA;CloseHandle;GetVersionExA;GetModuleFileNameA;WriteFile)
  -> Obj = (Group = Malicious) (os = 0.29, oc = 0.99)
• Associative Classification
  CBA [5] -- builds a classifier from rules with high support and confidence
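To make the two numbers attached to a rule concrete, here is a small sketch. It reads os and oc as objective support and objective confidence: os is the fraction of all files that contain the itemset and belong to the objective class, and oc is that same count divided by the number of files containing the itemset. The classifier is a simplified first-matching-rule scheme in the spirit of CBA, not the exact IMDS classifier. All function names and the toy database are assumptions made for illustration.

```python
from typing import FrozenSet, List, Tuple

Transaction = Tuple[FrozenSet[str], str]         # (API calls in one file, class label)
Rule = Tuple[FrozenSet[str], str, float, float]  # (itemset, objective, os, oc)


def objective_support_confidence(
    itemset: FrozenSet[str], objective: str, db: List[Transaction]
) -> Tuple[float, float]:
    """Return (os, oc) for the rule `itemset -> objective` over `db`."""
    # Labels of all transactions that contain every item of the itemset.
    covered = [label for items, label in db if itemset <= items]
    hits = sum(1 for label in covered if label == objective)
    os_value = hits / len(db) if db else 0.0
    oc_value = hits / len(covered) if covered else 0.0
    return os_value, oc_value


def classify(sample: FrozenSet[str], rules: List[Rule], default: str) -> str:
    """Predict with the highest-ranked rule whose itemset the sample contains."""
    # Rank by confidence first, then support (a simplified CBA-style ordering).
    ranked = sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)
    for itemset, objective, _os, _oc in ranked:
        if itemset <= sample:
            return objective
    return default  # no rule fires


# Toy database (made up for illustration, not the paper's data).
db: List[Transaction] = [
    (frozenset({"OpenProcess", "WriteFile", "CloseHandle"}), "malicious"),
    (frozenset({"OpenProcess", "WriteFile"}), "malicious"),
    (frozenset({"OpenProcess", "WriteFile", "CreateWindowExA"}), "benign"),
    (frozenset({"CreateWindowExA", "CloseHandle"}), "benign"),
]
pattern = frozenset({"OpenProcess", "WriteFile"})
os_value, oc_value = objective_support_confidence(pattern, "malicious", db)
rules: List[Rule] = [(pattern, "malicious", os_value, oc_value)]
prediction = classify(
    frozenset({"OpenProcess", "WriteFile", "CopyFileA"}), rules, "benign"
)
```

In the toy data the pattern appears in three of four files, two of them malicious, so os = 2/4 and oc = 2/3. A real rule generator would keep only rules above chosen os and oc thresholds before classification.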
Experimental results (1)
• Efficiency
Running time of different OOA mining algorithms
Efficiency of different scanners
(sample: 3393 malicious / 2217 benign)
(sample: 500 malicious / 1500 benign)
Legend: N: Norton AntiVirus; M: McAfee; D: Dr.Web; K: Kaspersky; SAVE [6]: Static Analyzer of Vicious Executables
False positives of different scanners (1000 benign files)
Experimental results (2)
• Detection Ability
Polymorphic malware detection
Unknown malware detection
Experimental results (3)
• Detection accuracy with different data mining solutions
Results using different classifiers. TP, TN, FP, FN, DR, and ACY refer to True Positive, True Negative, False Positive, False Negative, Detection Rate, and Accuracy, respectively.
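The two derived metrics in the caption can be computed from the four counts as sketched below, assuming the usual definitions: DR = TP / (TP + FN) and ACY = (TP + TN) / (TP + TN + FP + FN). The counts in the example are hypothetical and are not results from the slides.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Fraction of malicious files that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all files classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for illustration only.
dr = detection_rate(tp=90, fn=10)
acy = accuracy(tp=90, tn=80, fp=20, fn=10)
```

DR ignores benign files entirely, so a scanner can reach a high DR while still producing many false positives; ACY accounts for both kinds of error.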
Conclusion
• Summary
  - IMDS is an integrated malware detection system consisting of a PE parser, an OOA rule generator, and a rule-based classifier
  - It is the first attempt to apply associative mining to detecting malicious code in a large collection of executables
  - IMDS outperforms many widely used anti-virus products and other data mining based malware detection methods in both effectiveness and efficiency
• Future Work
  - Conduct further study that takes the sequence of API calls into consideration
Selected References
[1] Y. Shen, Q. Yang, and Z. Zhang. Objective-oriented utility-based association mining. In Proceedings of ICDM’02.
[2] J. Han and M. Kamber. Data mining: Concepts and techniques, 2nd edition. Morgan Kaufmann, 2006.
[3] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of SIGMOD, pages 1-12, May 2000.
[4] M. Fan and C. Li. Mining frequent patterns in an FP-tree without conditional FP-tree generation. Journal of Computer Research and Development, 40:1216-1222, 2003.
[5] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proceedings of KDD’98.
[6] A. Sung, J. Xu, P. Chavez, and S. Mukkamala. Static analyzer of vicious executables (SAVE). In Proceedings of the 20th Annual Computer Security Applications Conference, 2004.
[7] J. Xu, A. Sung, P. Chavez, and S. Mukkamala. Polymorphic malicious executable scanner by API sequence analysis. In Proceedings of the International Conference on Hybrid Intelligent Systems, 2004.
[8] J. Wang, P. Deng, Y. Fan, L. Jaw, and Y. Liu. Virus detection using data mining techniques. In Proceedings of IEEE International Conference on Data Mining, 2003.