1
DroidScope: Seamlessly Reconstructing
the OS and Dalvik Semantic Views for
Dynamic Android Malware Analysis
Lok Kwong Yan and Heng Yin
Syracuse University
Air Force Research Laboratory
USENIX Security 2012
Presentation: 2012-09-11 曾毓傑
2
Outline
• Introduction
• Background
• Architecture
• Interface & Plugins
• Evaluation
• Discussion & Conclusion
3
INTRODUCTION
4
Introduction
• Malicious applications appear in both official and unofficial
marketplaces, at rates of 0.02% and 0.2% respectively
• Virtualization-based analysis approach
• Analysis runs underneath the entire virtual machine
• Difficult for an attack within the VM to disrupt the analysis
• Semantic context is lost when the analysis component is
moved out of the box
• We need to intercept certain kernel events and parse
kernel data structures to reconstruct the semantic
knowledge
5
DroidScope
• Reconstruct two levels of semantic knowledge
• OS-level: to understand the activities of the malware process and
its native components
• Java-level: comprehend the behaviors in the Java components
• Built on top of the QEMU emulator
• Analysis tools built with it
• Native instruction tracer
• Dalvik instruction tracer
• API tracer
• Taint tracker
6
BACKGROUND
7
Android System Overview
[Figure: Android system architecture, with callouts]
• Parent process for all Android processes
• libdvm.so provides the Java-level abstraction
• Kernel data structures
8
DroidScope Overview
9
ARCHITECTURE
10
Architecture
• All changes are integrated into the QEMU emulator
• The emulator comes from the Android SDK
• The Android system itself is left unchanged
• Different virtual devices can therefore be loaded
• Reconstruct OS-level and Java-level views
• Monitors how malware’s Java components communicate with
Android Java Framework
• Monitors how malware’s native components interact with the Linux
Kernel
• Monitors how malware’s Java components and native components
communicate through the JNI interface
11
Reconstructing OS-level View
• Basic Instrumentation
• Insert extra instructions during the code translation phase to
observe system state
[Figure: target instructions pass through the Tiny Code Generator (TCG),
which adds code for detection, and come out as native instructions]
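The translation-time instrumentation described on this slide can be illustrated with a toy model. This is a minimal sketch in Python, not QEMU or TCG code: `translate_block`, the `hook` callback, and the instruction strings are hypothetical names chosen for illustration, and fixed 4-byte instructions are assumed.

```python
from typing import Callable, List

# Hypothetical hook type: called with the guest address of each instruction.
Hook = Callable[[int], None]

def translate_block(guest_insns: List[str], base: int, hook: Hook) -> List[Callable[[], None]]:
    """Toy translator: emits one host operation per guest instruction,
    preceded by an extra operation that reports to the analysis hook."""
    host_ops: List[Callable[[], None]] = []
    for i, _insn in enumerate(guest_insns):
        addr = base + 4 * i  # fixed-width 4-byte instructions assumed
        # Extra operation inserted at translation time; addr is bound per instruction.
        host_ops.append(lambda a=addr: hook(a))
        # Stand-in for the translated instruction itself (does nothing here).
        host_ops.append(lambda: None)
    return host_ops

seen: List[int] = []
ops = translate_block(["mov r0, #1", "svc #0"], base=0x8000, hook=seen.append)
for op in ops:
    op()
print([hex(a) for a in seen])
```

The point of the design is that the analysis code is added once, when a block is translated, and then runs every time the translated block executes.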
12
Reconstructing OS-level View (Cont.)
• For example, a context switch on the ARM architecture changes
the c2_base0 and c2_base1 registers, which store the page
table addresses
• Extract semantic knowledge
• System calls
• Running processes, threads
• Memory maps
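The context switch example above can be sketched as follows. This is a simplified model under stated assumptions: the class `ContextSwitchMonitor`, its method names, and the `pgd_to_comm` table are hypothetical, and the emulator is assumed to call `on_page_table_write` whenever the page table base register is written.

```python
from typing import Dict, Optional

class ContextSwitchMonitor:
    """Tracks the page table base to notice when a different address space runs."""

    def __init__(self) -> None:
        self.current_pgd: Optional[int] = None
        # Would be filled from the reconstructed process list (pgd -> comm).
        self.pgd_to_comm: Dict[int, str] = {}
        self.switches = 0

    def on_page_table_write(self, new_pgd: int) -> Optional[str]:
        """Call whenever the emulated page table base register is written.
        Returns the name of the process now running, if it is known."""
        if new_pgd != self.current_pgd:
            self.current_pgd = new_pgd
            self.switches += 1
        return self.pgd_to_comm.get(new_pgd)

mon = ContextSwitchMonitor()
mon.pgd_to_comm[0x4000] = "zygote"
print(mon.on_page_table_write(0x4000))  # address space already in the table
print(mon.on_page_table_write(0x8000))  # address space not seen before
```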
13
Reconstructing OS-level View (Cont.)
• System calls
• On ARM, system calls are made with the instruction svc #0,
and the system call number is passed in register R7
• Processes and Threads
• Read the task_struct structure for process information
• pid, tgid, pgd, uid, gid, euid, egid, comm, cmdline, thread_info
• sys_fork, sys_execve, sys_clone, and sys_prctl system calls
trigger the information update
• Memory maps
• mm_struct
• sys_mmap2 triggers the information update
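A minimal sketch of the system call interception described on this slide. The class and table names are hypothetical, only the ARM-mode encoding of `svc #0` is handled (Thumb code uses a different encoding), and the system call numbers are taken from the ARM EABI Linux table as recalled here, so they should be checked against the kernel headers of the target image.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SVC_0 = 0xEF000000  # ARM-mode encoding of `svc #0`

# System calls that trigger a refresh of the reconstructed views.
# Numbers assumed from the ARM EABI syscall table; verify before relying on them.
UPDATE_PROCESSES = {2: "sys_fork", 11: "sys_execve", 120: "sys_clone", 172: "sys_prctl"}
UPDATE_MEMORY_MAP = {192: "sys_mmap2"}

@dataclass
class SyscallMonitor:
    log: List[str] = field(default_factory=list)

    def on_instruction(self, raw_insn: int, regs: Dict[str, int]) -> None:
        """Called for every executed instruction with its raw encoding."""
        if raw_insn != SVC_0:
            return
        number = regs["r7"]  # system call number is in R7
        if number in UPDATE_PROCESSES:
            self.log.append(f"{UPDATE_PROCESSES[number]}: refresh process list")
        elif number in UPDATE_MEMORY_MAP:
            self.log.append(f"{UPDATE_MEMORY_MAP[number]}: refresh memory map")

sysmon = SyscallMonitor()
sysmon.on_instruction(0xEF000000, {"r7": 120})  # svc #0 with R7 = clone
sysmon.on_instruction(0xE1A00000, {"r7": 192})  # not an svc, ignored
sysmon.on_instruction(0xEF000000, {"r7": 192})  # svc #0 with R7 = mmap2
print(sysmon.log)
```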
14
Reconstructing Java-level View
• Dalvik Instructions
• Goal: know which Dalvik instruction is executing at any moment
• Register R15 (the ARM program counter) identifies, via the mterp
handler being executed, the current Dalvik instruction
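One way to recover the executing Dalvik opcode is to map the native program counter onto the interpreter's handler table. The sketch below assumes, as in Dalvik's ARM mterp, that opcode handlers are laid out contiguously at a fixed 64 bytes each, starting at the mterp base address. The function name and the example addresses are hypothetical.

```python
from typing import Optional

HANDLER_SIZE = 64   # bytes per opcode handler (assumption, see the text above)
NUM_OPCODES = 256   # one handler per 8-bit Dalvik opcode

def opcode_from_pc(r15: int, mterp_base: int) -> Optional[int]:
    """Return the Dalvik opcode being interpreted, or None if R15 lies
    outside the handler table (native code, JIT output, and so on)."""
    offset = r15 - mterp_base
    if 0 <= offset < HANDLER_SIZE * NUM_OPCODES:
        return offset // HANDLER_SIZE
    return None

base = 0x40010000  # made-up mterp base address
print(opcode_from_pc(base + 0x6E * HANDLER_SIZE + 8, base))
print(opcode_from_pc(base - 4, base))
```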
15
Reconstructing Java-level View (Cont.)
• Just-In-Time Compiler
• Hot, heavily used instruction sequences are compiled into native
machine code
• Execution of that compiled code skips the mterp component
[Figure callouts]
• dvmGetCodeAddr() is called for the address of compiled code
• To disable JIT for a function: flush the JIT cache, return NULL,
and reset the counter
16
Reconstructing Java-level View (Cont.)
• Dalvik Virtual Machine States
• Registers R4 to R8 hold the DVM state and are recorded
R4: Program Counter
R5: Stack Frame Pointer
R6: InterpState Structure
R7: Current Instruction (first 16-bit code unit)
R8: mterp Base Address
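The register mapping above can be captured in a small snapshot structure. This is a sketch under stated assumptions: `DvmState`, `snapshot`, and `read_vreg` are hypothetical names, guest memory is modelled as a flat little-endian byte string, and Dalvik virtual registers are assumed to be 32-bit slots starting at the stack frame pointer.

```python
import struct
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class DvmState:
    pc: int             # R4: Dalvik program counter
    frame_pointer: int  # R5: stack frame pointer
    interp_state: int   # R6: pointer to the InterpState structure
    inst: int           # R7: instruction register
    mterp_base: int     # R8: mterp base address

def snapshot(regs: Dict[str, int]) -> DvmState:
    """Copy the DVM-related registers out of the emulated CPU state."""
    return DvmState(regs["r4"], regs["r5"], regs["r6"], regs["r7"], regs["r8"])

def read_vreg(mem: bytes, mem_base: int, state: DvmState, n: int) -> int:
    """Read Dalvik virtual register vN, assuming 32-bit slots at frame_pointer + 4*N."""
    offset = state.frame_pointer + 4 * n - mem_base
    return struct.unpack_from("<I", mem, offset)[0]

regs = {"r4": 0x5000, "r5": 0x1000, "r6": 0x2000, "r7": 0x106E, "r8": 0x40010000}
state = snapshot(regs)
frame = struct.pack("<III", 10, 20, 30)  # toy frame holding v0, v1, v2
print(read_vreg(frame, 0x1000, state, 2))
```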
17
Reconstructing Java-level View (Cont.)
• Java Objects
• Obtain data stored inside Java objects, such as string contents
18
Symbol Information
• Native library symbols
• Use objdump to retrieve symbol information
• Malware is often stripped of all symbol information
• Dalvik or Java symbols
• Use dexdump to retrieve symbol information
• DVM data structures also contain some symbol information
• The InterpState structure (register R6) has a method field that
points to the Method structure for the currently executing method
• The Method structure has a name field that points to the method name
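The pointer chain from R6 to the method name can be sketched as below. The field offsets are hypothetical placeholders (real values depend on the Dalvik build), and `GuestMemory` is a toy stand-in for reading emulated memory, with 32-bit little-endian pointers assumed.

```python
import struct
from typing import Dict

# Hypothetical field offsets; real values depend on the Dalvik build.
INTERPSTATE_METHOD_OFF = 8
METHOD_NAME_OFF = 16

class GuestMemory:
    """Toy guest memory: a set of byte regions keyed by start address."""

    def __init__(self) -> None:
        self.regions: Dict[int, bytes] = {}

    def write(self, addr: int, data: bytes) -> None:
        self.regions[addr] = data

    def read(self, addr: int, size: int) -> bytes:
        for base, data in self.regions.items():
            if base <= addr and addr + size <= base + len(data):
                return data[addr - base:addr - base + size]
        raise ValueError(f"unmapped guest address {addr:#x}")

    def read_u32(self, addr: int) -> int:
        return struct.unpack("<I", self.read(addr, 4))[0]

    def read_cstring(self, addr: int, limit: int = 256) -> str:
        out = bytearray()
        for i in range(limit):
            byte = self.read(addr + i, 1)
            if byte == b"\0":
                break
            out += byte
        return out.decode("utf-8", errors="replace")

def current_method_name(mem: GuestMemory, r6: int) -> str:
    """Follow InterpState.method, then Method.name, and read the string."""
    method_ptr = mem.read_u32(r6 + INTERPSTATE_METHOD_OFF)
    name_ptr = mem.read_u32(method_ptr + METHOD_NAME_OFF)
    return mem.read_cstring(name_ptr)

mem = GuestMemory()
mem.write(0x1000, bytes(8) + struct.pack("<I", 0x2000))   # InterpState
mem.write(0x2000, bytes(16) + struct.pack("<I", 0x3000))  # Method
mem.write(0x3000, b"onCreate\0")                          # method name
print(current_method_name(mem, 0x1000))
```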
19
INTERFACE & PLUGINS
20
Interface & Plugins
• APIs for analysis customization
• The instrumentation logic in DroidScope is complex and dynamic
• An event-based interface facilitates custom analysis tool
development
21
Sample Plugin
• Select which program to analyze and print information on every
Dalvik opcode it executes
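The plugin code from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of an event-based plugin in Python. It is not the DroidScope API: `EventBus`, `OpcodePrinter`, the `dalvik_insn` event name, and the package name are all hypothetical, chosen only to show the register-then-callback pattern.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class EventBus:
    """Toy event interface: plugins register callbacks for named events."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def register(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, **info) -> None:
        for handler in self._handlers[event]:
            handler(**info)

class OpcodePrinter:
    """Plugin: prints every Dalvik opcode executed by one target program."""

    def __init__(self, bus: EventBus, target: str) -> None:
        self.target = target
        self.lines: List[str] = []
        bus.register("dalvik_insn", self.on_dalvik_insn)

    def on_dalvik_insn(self, comm: str, pc: int, opcode: int) -> None:
        if comm != self.target:
            return  # ignore every other process
        line = f"{pc:#010x}: opcode {opcode:#04x}"
        self.lines.append(line)
        print(line)

bus = EventBus()
plugin = OpcodePrinter(bus, target="com.example.app")
bus.emit("dalvik_insn", comm="com.example.app", pc=0x4000, opcode=0x6E)
bus.emit("dalvik_insn", comm="system_server", pc=0x9000, opcode=0x0E)
```

Filtering on the target inside the callback keeps the sketch short. A real tool would more likely enable instrumentation only for the selected process so that other processes run without overhead.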
22
API Implementation
• API tracer
• Instrument the invoke* and execute* Dalvik bytecodes to identify
and log method invocations
• Native instruction tracer
• Record each executed instruction: the raw instruction, its operands,
and their values
• Dalvik instruction tracer
• Decode instructions into dexdump format, including values and all
available symbol information
• Taint Tracker
• Monitor sensitive information and track how it propagates
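The idea behind the taint tracker can be sketched with a shadow map from locations to taint labels. This is a simplified illustration, not the tool's implementation: locations are plain strings, the class and method names are hypothetical, and only direct data flow (copies and computations) is modelled.

```python
from typing import Dict, Set

class TaintTracker:
    """Shadow state: maps a location (register or address name) to taint labels."""

    def __init__(self) -> None:
        self.shadow: Dict[str, Set[str]] = {}

    def set_source(self, loc: str, label: str) -> None:
        """Mark loc as holding sensitive data from the named source."""
        self.shadow.setdefault(loc, set()).add(label)

    def move(self, dst: str, src: str) -> None:
        """dst = src: dst takes exactly the taint of src."""
        self.combine(dst, src)

    def combine(self, dst: str, *srcs: str) -> None:
        """dst = op(srcs...): dst takes the union of the sources' taint."""
        labels: Set[str] = set().union(*(self.shadow.get(s, set()) for s in srcs))
        if labels:
            self.shadow[dst] = labels
        else:
            self.shadow.pop(dst, None)  # overwritten with clean data

    def check_sink(self, loc: str) -> Set[str]:
        """Labels reaching a sink such as a network send."""
        return set(self.shadow.get(loc, set()))

tracker = TaintTracker()
tracker.set_source("v0", "IMEI")
tracker.move("v1", "v0")
tracker.combine("v2", "v1", "v3")
print(tracker.check_sink("v2"))
```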
23
EVALUATION
24
Evaluation
• Benchmarks to evaluate efficiency and capability
• 7 benchmark apps
• AnTuTu Benchmark
• AnTuTu CaffeineMark
• CaffeineMark
• CF-Bench
• Mobile Processor Benchmark
• Benchmark by Softweg
• Linpack
25
Evaluation
• Performance
• Capability
• Analysis of DroidKungFu
• Analysis of DroidDream
26
DISCUSSION & CONCLUSION
27
Discussion
• Limited Code Coverage
• An inherent drawback of dynamic analysis
• Manipulating the return values of function calls may increase
code coverage
• Other Dalvik Analysis Tools
• Dalvik/Java Static Analysis: Woodpecker, DroidMOSS
• Native Static Analysis: IDA, binutils, BAP
• Android Dynamic Analysis: TaintDroid, DroidRanger
• Linux Kernel Dynamic Analysis: logcat, adb
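The code coverage point above, manipulating return values to reach more code, can be illustrated with a toy model. Everything here is hypothetical: `run_app` stands in for the analysed program, and forcing a return value is modelled by passing stub functions, whereas an emulator would overwrite the return register instead.

```python
from itertools import product
from typing import Callable, Set

def run_app(is_rooted: Callable[[], bool], has_network: Callable[[], bool]) -> str:
    """Stand-in for analysed code whose behaviour depends on two calls."""
    if not is_rooted():
        return "exit"
    if has_network():
        return "contact_server"
    return "wait"

def explore(app: Callable[..., str], num_calls: int) -> Set[str]:
    """Run app once for every combination of forced boolean return values."""
    paths: Set[str] = set()
    for forced in product([False, True], repeat=num_calls):
        stubs = [lambda v=v: v for v in forced]  # each stub returns its forced value
        paths.add(app(*stubs))
    return paths

print(sorted(explore(run_app, 2)))
```

A single unmodified run in which the first check fails would only ever observe the "exit" path. Forcing both outcomes of each call exposes the other two behaviours, at the cost of possibly exercising paths that cannot occur on a real device.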
28
Conclusion
• We presented DroidScope, a fine-grained dynamic binary
instrumentation tool for Android that rebuilds two levels of
semantic information