Finding Diversity in Remote Code Injection Exploits

Finding Diversity in
Remote Code Injection Exploits
University of California, San Diego
Justin Ma, Stefan Savage, Geoffrey M. Voelker
and
Microsoft Research
John Dunagan, Helen J. Wang
Presented by
Kenneth Poon Fai Yiu
2.4.2007
1
Outline
 Introduction / Objectives
 Benefits of Malware Family Tree
 A Remote Code Injection Attack
 Shellcode
 Methodology for Measuring Diversity
 Analysis of Exploit Diversity
 NIDS vs Polymorphism
 Factors Driving the Evolution
 Observations
2
Introduction
 Internet users face increasing threats of online crime
due to the large amount of malware active on the Internet
 Previous studies focused on methods for defending against
such attacks
 Little research has been done on the malware ecosystem, such as
• the relationship between different pieces of malware
• the factors that drive the structural and functional evolution of
malware
3
Objectives
 To develop a measurement methodology for identifying and
measuring the diversity among remote code injection exploits
 Use the measured data to
• understand the diversity of today’s malware, and
• construct a shellcode phylogeny (i.e. a malware family tree)
for selected vulnerabilities
4
Benefits of Malware Family Tree
 Simplify the categorization and analysis of malware
 Provide insight into the factors influencing malware development
and evolution
 Help in estimating the market-share and vigor of different cybercriminal organizations
5
Glossary
 Vulnerability
• A system bug or design flaw allowing an attacker to misuse an application (e.g.
executing commands on the system)
 Malware (“Malicious Software”)
• Software designed to infiltrate or damage a computer system, e.g. computer viruses,
worms, spyware and adware
 Exploit
• Software that attacks a vulnerability of a system in order to gain control of it
• A “remote exploit” is an exploit that works over a network
• A kind of malware; used interchangeably with “malware” throughout the presentation
 Code injection
• A technique for adding code to a computer program to modify its functionality
 Shell code
• A piece of machine code used as the payload of an exploit
• May contain mechanisms to avoid detection by anti-intrusion systems
 Phylogeny
• A biological term - the study of evolutionary relationship among organisms
• The classification of exploits according to their relationship in the evolutionary history
6
(Source: Wikipedia)
A Remote Code Injection Attack
(Diagram: Malware attacking a Windows XP host)
First, there exists software (e.g. MS Windows XP) with a vulnerability (e.g. a stack-based
buffer overflow) and corresponding malware targeted at that vulnerability
Second, there is a computer with an Internet connection that has this vulnerable software
installed without any patch applied
Third, the malware attacks the computer by injecting exploit code (shellcode, data and
random character fillers) into the vulnerable software
7
A Remote Code Injection Attack
(Diagram labels: Exploit Packets, Host Memory, Vulnerable Buffer, Return Address)
Fourth, the code overwrites the data in the buffer beyond its boundaries and changes the
contents of the memory locations adjacent to the buffer, which may be used by other buffers
and variables. If the buffer is a stack-based buffer, the return address of the calling
function can also be changed (e.g. to the address of the shellcode)
Fifth, the shellcode gains control of the computer and executes
Sixth, the exploit shellcode may (1) download additional software to the computer,
(2) join a centralized “botnet” or (3) reconfigure the operating system to evade detection
8
Shellcode
 Small, simple, hand-coded machine programs
 Initial payload of an exploit that first executes on a newly
compromised machine
 Polymorphism (variation in the style of construction)
 May be encrypted and only decrypted just before execution
 XOR encoding is a commonly used encoding scheme
 May contain anti-debugging code (including self-modifying code)
to complicate disassembly and analysis of the shellcode
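As a minimal sketch of why XOR encoding is popular, the snippet below encodes and decodes a harmless stand-in byte string with a single-byte key. The function name `xor_bytes`, the key value and the sample bytes are illustrative assumptions, not taken from any exploit in the study.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


payload = b"harmless demo payload"  # stand-in bytes, not real shellcode
key = 0x5A                          # hypothetical single-byte key

encoded = xor_bytes(payload, key)   # what would travel on the wire
decoded = xor_bytes(encoded, key)   # XOR is its own inverse, so this restores the input
```

Because the same operation both encodes and decodes, a decoder stub only needs a short loop, which fits the "small, simple" character of shellcode described above.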
9
Methodology for Measuring Diversity
Exploit collection
(To collect exploit samples)

Extracting shellcodes
(To extract shellcodes from the collected exploit samples)

Exploit emulation
(To run the extracted shellcodes and recover the executed instruction bytes)

Clustering
(To group the instruction bytes into families)
10
Methodology for Measuring Diversity
“Exploit Collection”
 Examine network traces of traffic using a fully-patched
Windows XP computer connected to a residential DSL network
 Capture exploit attempts from the DSL network against 4 well-known vulnerabilities for 2 days starting from 6/9/2006 5:00 pm
 The 4 vulnerabilities are
• SQL Name Resolution (Slammer)
• LSASS (Sasser)
• MS RPC ISystemActivator (Blaster)
• MS RPC RemoteActivation (Blaster)
11
Methodology for Measuring Diversity
“Extracting Shellcodes”
 Extract shellcodes directly from the collected network trace using
“Shield”
 “Shield”
• A tool originally designed for filtering exploits for known
vulnerabilities
• But modified to collect data that is beyond the buffer boundary
12
Methodology for Measuring Diversity
“Exploit Emulation”
 Most shellcodes are encrypted; decoding is needed to reveal the actual
executable code
 The solution is restricted binary emulation, i.e. allowing the exploit
decoding routines to execute in order to reveal the actual instruction
codes
 Implement the emulator on a Linux platform
 Load an encoded shellcode, declare it as a statically allocated buffer,
treat the buffer as a function and allow it to run
 Overcome the issue with non-executable prefixes by iteratively retrying
failed emulations at subsequent offsets
 Mark the executed instruction bytes for later analysis
 Emulation stops when the control flow makes an absolute jump to a
location outside the buffer
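The retry-at-next-offset idea can be illustrated with a toy model. The sketch below is not the emulator from the study: it assumes a made-up machine that understands only two real x86 encodings (0x90 NOP and 0xEB short relative jump) and treats every other byte as a failure, and it uses a relative rather than absolute jump as the stop condition. The names `emulate` and `emulate_with_retries` are hypothetical.

```python
NOP, JMP_SHORT = 0x90, 0xEB  # the only opcodes this toy machine accepts


def emulate(buf: bytes, start: int, max_steps: int = 1000):
    """Run from `start`; return the set of executed offsets if control
    jumps outside the buffer, or None if the run fails."""
    executed = set()
    pc = start
    for _ in range(max_steps):
        if not 0 <= pc < len(buf):
            return None                      # fell off the buffer without a jump
        op = buf[pc]
        if op == NOP:
            executed.add(pc)
            pc += 1
        elif op == JMP_SHORT and pc + 1 < len(buf):
            executed.update((pc, pc + 1))    # mark opcode and operand bytes
            operand = buf[pc + 1]
            rel = operand - 256 if operand >= 128 else operand  # signed 8-bit offset
            pc = pc + 2 + rel
            if not 0 <= pc < len(buf):
                return executed              # jump left the buffer: stop successfully
        else:
            return None                      # invalid opcode in this toy model
    return None                              # step limit reached (e.g. endless loop)


def emulate_with_retries(buf: bytes):
    """Retry failed emulations at each subsequent offset, as on the slide."""
    for start in range(len(buf)):
        executed = emulate(buf, start)
        if executed is not None:
            return start, executed
    return None


# Two non-executable prefix bytes, then NOP, NOP, and a jump out of the buffer.
sample = bytes([0x41, 0x42, 0x90, 0x90, 0xEB, 0x10])
result = emulate_with_retries(sample)
```

Here the first two offsets fail and the run that starts at offset 2 succeeds, so the marked bytes are offsets 2 through 5.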
13
Methodology for Measuring Diversity
“Clustering” (1)
 A datamining technique for grouping objects with similar
characteristics
 Perform clustering on the shellcode instruction bytes using exedit
distance - a metric for measuring the similarity between 2 sets of
shellcode instruction bytes generated by binary emulation
 Construct a dendrogram to visualize the clustering results
 Evaluate the resulting clusters manually to confirm the constructed
family tree is a sensible representation of the phylogeny of the
exploit families
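The clustering step can be sketched as bottom-up (agglomerative) merging over a pairwise distance matrix, stopping once the closest pair of clusters is farther apart than a chosen threshold. The slides do not state which linkage rule was used, so single linkage below is an assumption, and the function name `agglomerate` and the example distances are made up for illustration.

```python
def agglomerate(dist, threshold):
    """Merge clusters bottom-up until the closest pair exceeds `threshold`.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns (families, merges), where merges lists (left, right, distance)
    in merge order, i.e. the information a dendrogram draws.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []

    def linkage(a, b):
        # Single linkage (an assumption): distance of the closest members.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters), merges


# Hypothetical distances: samples 0/1 are close, 2/3 are close, the pairs are far apart.
example = [
    [0.00, 0.02, 0.90, 0.85],
    [0.02, 0.00, 0.88, 0.90],
    [0.90, 0.88, 0.00, 0.05],
    [0.85, 0.90, 0.05, 0.00],
]
families, merges = agglomerate(example, threshold=0.10)
```

With a 10% threshold the four hypothetical samples fall into two families; raising the threshold to 1.0 would merge everything into one, which is why the choice of threshold matters.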
14
Methodology for Measuring Diversity
“Clustering” (2)
 Exedit Distance
 Relative edit distance over the shellcode instruction bytes, which
is the number of edit operations (insertion, deletion, substitution)
used to transform one string to another
 For each sample,
• Mark the executed instruction bytes
• Concatenate the marked bytes in the order they appear in the
payload (i.e., memory order) to construct a string
representation
• Compress each consecutive run of the NOP (No operation)
instructions into one single NOP instruction
 Compute the relative edit distance over all exploits using these
strings
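The steps above can be sketched as follows. The slides do not say how the edit distance is normalized, so dividing by the length of the longer string is an assumption; the function names are hypothetical, and treating every executed 0x90 byte as a NOP is a simplification (a 0x90 operand byte would be compressed too).

```python
NOP = 0x90  # x86 single-byte NOP


def executed_string(payload: bytes, executed: set) -> bytes:
    """Concatenate executed bytes in memory order, collapsing runs of NOPs."""
    out = bytearray()
    for offset in sorted(executed):
        byte = payload[offset]
        if byte == NOP and out and out[-1] == NOP:
            continue  # keep only one NOP per consecutive run
        out.append(byte)
    return bytes(out)


def edit_distance(a: bytes, b: bytes) -> int:
    """Classic Levenshtein distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def relative_edit_distance(a: bytes, b: bytes) -> float:
    """Edit distance divided by the longer length (normalization is assumed)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0
```

Under this normalization, two 4-byte strings that differ in one byte have a relative distance of 0.25, and identical strings have a distance of 0.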
15
Analysis of Exploit Diversity
“SQL Name Resolution” (1)
 Malware: Slammer worm
• First noticed on 25.1.2003; infected 75,000 computers in 10
minutes
• Exploited two buffer overflow bugs in Microsoft's SQL Server
and Desktop Engine database products
• Spread by generating random IP addresses and sending itself to those
addresses
• Dramatically slowed down general Internet traffic
• Patch was available six months before the worm’s first launch
(Source: Wikipedia)
16
Analysis of Exploit Diversity
“SQL Name Resolution” (2)
 767 exploit samples were collected
 2 apparent variations of Slammer were detected
 766 exploits with the exact same payload and 1 outlier
 The outlier was identical to all the other payloads except for the
last 91 bytes; evidence shows that the payload was likely
corrupted on the network before being captured in the trace
 By discarding the outlier sample, there was only 1 Slammer
exploit in the DSL trace; so, no exploit diversity
17
Analysis of Exploit Diversity
“LSASS” (1)
 Malware: Sasser worm
• First noticed on 30.4.2004; disrupted operations for airlines, banks,
and government offices globally
• Exploited a buffer overflow vulnerability in LSASS (Local Security Authority
Subsystem Service) of MS Windows 2000 and XP
• Spread by scanning different ranges of IP addresses and connecting to victims’
computers primarily through TCP port 139 or 445
• Patch was available in 4.2004, prior to the release of the worm
• Written by an 18-year-old CS student in Germany, who was arrested and
received a 21-month suspended sentence
(Source: Wikipedia)
18
Analysis of Exploit Diversity
“LSASS” (2)
Histogram of shellcode instances
 1769 exploit samples were collected
 56 distinct payloads were identified
19
Analysis of Exploit Diversity
“LSASS” (3)
Dendrogram
 Each x-axis position represents a unique shellcode
 The y-axis shows relative edit distance
 A horizontal line segment at y-axis value ‘y’ indicates that two subclusters had cluster distance ‘y’ when they were merged into one
cluster
20
Analysis of Exploit Diversity
“LSASS” (4)
Dendrogram
Evolution diagram
 Most cluster merges occurred at a small exedit distance of 10%; use 10% as
threshold for defining families among the exploits
 5 families of shellcodes can be identified
 Manual examination of the shellcodes concluded that the identified
families were indeed 5 separate code bases
 LSASS-2, 3 and 4 had sufficient similarity to conclude that they were
evolved from the same code base
21
Analysis of Exploit Diversity
“LSASS” (5)
Dendrogram
Evolution diagram
 Shellcodes within each family exhibit a small amount of variation, which
corresponds to the phone-home/connect-back IP addresses encoded in the payload, which
the victims use to connect to a specified host and download additional code or files
• Connect-back refers to connecting to the victim’s immediate parent in the
infection chain
• Phone-home refers to connecting to a central location
22
Analysis of Exploit Diversity
“ISystemActivator” (1)
 Malware: Blaster worm
• First noticed on 11.8.2003; infected hundreds of thousands of
computers within the first 24 hours, and several millions more in the
following few months
• Exploited a buffer overflow in the RPC service of Windows 2000 and
XP
• Launched a DDoS attack against MS’s “windowsupdate.com”
• The worm contains a hidden string, which reads “billy gates why do
you make this possible? Stop making money and fix your software!!”
• Patch was available 1 month before the release of the worm
• Written by an 18-year-old US resident, who was arrested and sentenced to an
18-month prison term
(Source: Wikipedia)
23
Analysis of Exploit Diversity
“ISystemActivator” (2)
Histogram of
shellcode instances
 1561 exploit samples were collected
 90 distinct payloads were identified
 10 variations were responsible for most of the observed exploits,
while 80 distinct shellcodes appeared only once
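A histogram of shellcode instances like the one summarized above can be sketched by counting identical payloads. The helper name `payload_histogram` and the tiny sample list are illustrative assumptions, not data from the trace.

```python
from collections import Counter


def payload_histogram(payloads):
    """Return (payload, count) pairs, most frequent first."""
    return Counter(payloads).most_common()


# Hypothetical captured payloads: two popular variants and two one-off payloads.
captured = [b"A"] * 5 + [b"B"] * 3 + [b"C", b"D"]

histogram = payload_histogram(captured)
distinct = len(histogram)                                 # number of distinct payloads
singletons = sum(1 for _, n in histogram if n == 1)       # payloads seen only once
```

The same two numbers, distinct payloads and payloads seen only once, are the ones the slide reports for the ISystemActivator trace.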
24
Analysis of Exploit Diversity
“ISystemActivator” (3)
Dendrogram
 Most cluster merges happened below a distance of 10%; this distance
value is used as the threshold to define families among the exploits
 6 families of shellcodes can be identified
 The low initial threshold distance of 10% and the large gap between cluster
merges at distance of 85% indicate that exploits within a family are similar,
but vary substantially between families
25
Analysis of Exploit Diversity
“ISystemActivator” (4)
Dendrogram
Evolution diagram
 Manual examination of the shellcodes confirmed that the clusters
reflected 6 different code bases
 Slight differences among exploits within each family were due to variations in
data constants
 Relatively low 10% exedit distance between ISys-2 and ISys-3 implied a
close relationship
26
Analysis of Exploit Diversity
“ISystemActivator” (5)
Dendrogram
Evolution diagram
 The only difference was that ISys-3 contained a connect, while ISys-2 contained
a bind, listen, and accept; the authors believe that these two families were derived
from the same code base except that
• ISys-3 required the newly-infected host to connect back to the
infecting host, while
• ISys-2 required the newly-infected host to bind on a socket and wait
for a connection attempt from the infecting host
27
Analysis of Exploit Diversity
“RemoteActivation” (1)
Histogram of
shellcode instances
 Malware: Blaster worm
 RemoteActivation was the original MS RPC vulnerability that Blaster
and its variants exploited before also targeting ISystemActivator
 338 distinct exploit payloads were identified; each exploit attempt used a
unique payload
28
Analysis of Exploit Diversity
“RemoteActivation” (2)
Dendrogram
 Exedit distance among the shellcodes was very small; most cluster
merges occur below a distance of 1%
 Using this value as the threshold results in 2 distinct families; the 1.3%
inter-family exedit distance indicates that the families are closely
related
29
Analysis of Exploit Diversity
“RemoteActivation” (3)
Dendrogram
Evolution diagram
 Manual examination of the shellcodes reveals that the last third of the payload contained
randomly generated characters which accounted for the variation within each family
 Two very similar but functionally different types of RemoteActivation exploits in the
trace; 10% belonged to Remact-0, the bind version, while the other 90% belonged to
family Remact-1, the connect-back version
 All payloads shared the same prefix which resembles part of the Metasploit Framework
but this cannot be proven (Metasploit is a toolkit for generating exploits, and includes
options for generating encoded shellcodes and random filler characters)
30
NIDS vs Polymorphism
 To what extent will exploit polymorphism limit the effectiveness of
Network Intrusion Detection Systems (NIDS)?
 Tried to generate the signatures required to exhaustively cover all
exploits observed for each vulnerability in the DSL residential trace
 For each individual vulnerability except LSASS, one signature sufficed
to cover the set of exploits; the size of each signature is 100 bytes
 Tested the signatures against a 5-GB trace of network traffic and none
of the signatures yielded false positives
 The results indicate that polymorphism was not effective for evading
detection
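One way to picture a byte-string signature that covers every observed exploit is the longest substring shared by all samples, checked afterwards against benign traffic. The slides do not describe how the signatures were actually generated, so this brute-force approach, the function names and the sample byte strings are all assumptions for illustration.

```python
def common_signature(samples, min_len=4):
    """Longest byte string present in every sample, or None if none reaches min_len."""
    shortest = min(samples, key=len)
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in samples):
                return candidate
    return None


def false_positives(signature, benign_packets):
    """Benign packets that the signature would wrongly flag."""
    return [p for p in benign_packets if signature in p]


# Hypothetical exploit payloads sharing a fixed core with varying surroundings.
exploits = [b"aaSIGNATURE01", b"bSIGNATURE2", b"SIGNATUREccc"]
benign = [b"GET / HTTP/1.1", b"hello world"]

signature = common_signature(exploits)
flagged = false_positives(signature, benign)
```

If the variation between exploits is confined to a few data constants, a long shared core survives, which is consistent with the slide's finding that one signature per vulnerability usually sufficed.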
31
Factors Driving the Evolution
 Having reviewed the relationship between different pieces of malware,
what are the factors that drive the structural and functional
evolution of malware?
 Two hypotheses are:
• The malware authors wish to use polymorphism to prevent the
malware from being caught by NIDS signatures (perhaps they do
not realize that their polymorphism is ineffective for evasion),
or
• Today’s polymorphism is unrelated to evading NIDS signatures; the
variation in shellcodes was due to functional variation (e.g., the
bind and connect-back varieties)
32
Observations (1)
 About 4,500 exploit samples were collected on a DSL connection in 2 days; this indicates that once a computer is connected to the Internet, it
is exposed to a huge number of malware attacks (roughly one attack every 40 seconds)
 For all the Microsoft vulnerabilities studied in the paper, Microsoft had in
fact released the relevant patches before the exploit attacks were first
launched
 Users should be able to protect their machines from such attacks if patches for
the vulnerabilities are applied promptly
 The public announcement of patch releases by Microsoft advertises the
existence of vulnerability to the malware authors, who can perform reverse
engineering on the patch to discover the vulnerability and write the
malware
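The "one attack every 40 seconds" figure in the first observation can be checked with a quick calculation from the per-vulnerability sample counts given on the earlier slides (767, 1769, 1561 and 338), assuming the capture window was exactly two days.

```python
# Per-vulnerability exploit counts reported on the earlier slides.
samples = 767 + 1769 + 1561 + 338      # Slammer, LSASS, ISystemActivator, RemoteActivation
window_seconds = 2 * 24 * 60 * 60      # assumed capture window of exactly two days

seconds_per_attack = window_seconds / samples
```

The total is 4,435 samples, which gives a little under 39 seconds between attacks and matches the rounded figures on the slide.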
33
Observations (2)
 Identification of exploit families based on a cluster-merge threshold seems
arbitrary; choosing a different threshold value will result in a different
number of families and different compositions
 Though the exploit families can be verified by manual examination of the
shellcodes, such a methodology does not scale if the samples
number in the millions
 Simple relationships are built for some shellcode instances; the
relationships of the other shellcode instances remain unknown, so a complete
family tree (phylogeny) cannot be built
 Unlike the phylogeny of organisms, the correctness of the constructed
shellcode phylogeny is difficult to prove
 Recommend repeating the research using other data mining techniques and
distance metrics to see their effects on the resulting exploit families
34
~ End ~
35
