A Multifaceted Approach to Understanding the Botnet Phenomenon
Moheeb Abu Rajab, Jay Zarfoss, Fabian Monrose, Andreas Terzis
Computer Science Department
Johns Hopkins University
A MULTIFACETED APPROACH TO
UNDERSTANDING THE BOTNET
PHENOMENON (2006)
Jonathan Brant
CAP 6135 – Spring 2010
Overview
Introduction
Background
Measurement Methodology
Results and Analysis
Malware Collection
Graybox testing
Longitudinal Tracking of Botnets
Botnet Prevalence
Spreading Methods
Growth Patterns
Botnet Structures
Effective Botnet Size
Lifetime
“Insider’s view”
Conclusion
Introduction
Botnets – “networks of infected end-hosts that are under the control of a human operator”
Bots – end-hosts
Botmaster – human operator
Command and Control (C2) channels facilitate botmaster commands to bots in the botnet
Channels can use different communication mechanisms (e.g. P2P)
Most modern botnets use Internet Relay Chat (IRC)
Originally used to form large chat rooms
Introduction
Botnets are almost always used for illegal activities
Extortion
E-mail spamming
Identity theft
Software piracy
Introduction
Paper attempts to address inquiries such as:
Number of botnet “species”
Behavioral categorization of different species
Evolution of a botnet
Background
Step 1 – Botnets commandeer victims by remotely exploiting a vulnerability in software running on the victim
Infection strategies include:
Self-replicating worms
E-mail viruses
Social engineering
Convincing victims to run malicious code on their machine
Background
Step 2 – Victim executes shellcode and image of bot binary is fetched from location within botnet
When fetch is complete, the binary installs itself on target machine and automatically starts on each reboot
Background
Step 3 – Bot attempts to contact IRC server (address stored in executable)
Using a DNS name instead of IP address allows botmaster to retain control if IP is blacklisted by ISP
Background
Step 4 – Bot attempts to establish IRC session and join C2 channel
Three authentication steps:
Bot authenticates itself to the IRC server using PASS message
This is the IRC session password
Bot issues C2 channel password
This password and session password are in bot binary
Botmaster authenticates to bot population
This prevents other botmasters from seizing control of botnet
Background
Step 5 – Channel topic is parsed and executed
Contains default command that every bot executes
Future commands coming from botmaster can vary widely
Wide variety of available commands/responses increases difficulty of classifying botnet behaviors
Measurement Methodology
Data collection includes three phases:
Malware collection
Binary analysis via gray-box testing
Tracking of IRC botnets through IRC and DNS trackers
Measurement| Malware Collection
Goal is to collect as many bot binaries as possible
Must support a wide array of data collection endpoints and
be highly scalable
Distributed darknet
14 distributed nodes using PlanetLab testbed
Locally deployed darknet
Allocated but unused portion of IP address space
Measurement| Malware Collection
Modified nepenthes platform
Mimics replies generated by vulnerable services
Collects first-stage exploit (shellcode)
Raw packets from PlanetLab nodes translated using translation module written in Click
Packets were injected into local tunneling interface
Measurement | Malware Collection
On-line download modules in nepenthes disabled to prevent excessive downloads
Binaries retrieved by generating list of URL targets and sending to download station
Download station filtered entries in list and extracted unique sources/URLs
Measurement | Malware Collection
Honeynet catches exploits missed by nepenthes
Composed of honeypots running unpatched, virtual instances of Windows XP
Each honeypot assigned private static IP on separate VLAN
Infected honeypots sustain IRC connections until the VMs are reimaged
Suspect binaries retrieved by comparing VM contents to clean Windows image
Measurement | Malware Collection
Gateway routes darknet traffic to various parts on internal network
Half of darknet prefixes directed to local responder and other half to honeynet
NAT used to map each honeypot to 128 darknet IP addresses
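The NAT step above can be sketched as a simple lookup table. This is only an illustration of mapping 128 darknet addresses to each honeypot, not the authors' gateway configuration; the prefix `192.0.2.0/24`, the honeypot addresses, and the function name are hypothetical.

```python
import ipaddress

def map_darknet_to_honeypots(darknet_prefix, honeypot_ips, per_honeypot=128):
    """Assign consecutive blocks of darknet addresses to honeypots.

    Assumption: addresses are handed out in order, 128 per honeypot,
    and any darknet address beyond the honeypot capacity is left unmapped.
    """
    mapping = {}
    for index, addr in enumerate(ipaddress.ip_network(darknet_prefix)):
        slot = index // per_honeypot
        if slot >= len(honeypot_ips):
            break  # more darknet addresses than honeypot capacity
        mapping[str(addr)] = honeypot_ips[slot]
    return mapping

# Hypothetical example: one /24 darknet prefix shared by two honeypots.
nat_table = map_darknet_to_honeypots("192.0.2.0/24", ["10.0.1.2", "10.0.2.2"])
```

With this layout, traffic sent to any of the first 128 darknet addresses would reach the first honeypot and the remaining 128 would reach the second.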
Measurement | Malware Collection
Gateway serves as firewall, preventing honeypots from conducting outbound attacks or infecting each other
Cross-infection prevented by:
Placing each honeypot on separate VLAN and terminating cross-VLAN traffic
Blocking outbound traffic on popular vulnerable ports (135, 139, 445, etc.)
Measurement | Malware Collection
Gateway runs IRC detection module
Application-level traffic searched for common IRC protocol strings
NICK, JOIN, USER
Once IRC connection witnessed, detection module establishes record for IRC session
When honeypot attempts to reconnect, connection allowed to proceed to IRC server
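A minimal sketch of the string-matching idea behind the IRC detection module, assuming the gateway can hand over the raw application payload as bytes. The function names and the shape of the session record are assumptions made for illustration; the slides do not describe the real module's internals.

```python
IRC_KEYWORDS = {b"NICK", b"JOIN", b"USER"}

def looks_like_irc(payload: bytes) -> bool:
    """Return True if any line of the payload starts with a common IRC command."""
    for line in payload.splitlines():
        tokens = line.split()
        # IRC lines may start with an optional ":prefix" token; skip it.
        if tokens and tokens[0].startswith(b":"):
            tokens = tokens[1:]
        if tokens and tokens[0].upper() in IRC_KEYWORDS:
            return True
    return False

def record_session(sessions, honeypot_ip, server_ip, server_port, payload):
    """Create a session record the first time IRC traffic to a server is seen.

    Assumption: a record is keyed by (server IP, port) and remembers the
    first honeypot observed talking to that server.
    """
    if looks_like_irc(payload):
        sessions.setdefault((server_ip, server_port), honeypot_ip)
        return True
    return False
```

Matching on the first token of each line, rather than searching for the strings anywhere in the payload, avoids flagging unrelated text such as an HTTP `User-Agent` header.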
Measurement | Malware Collection
Detection module only allows one honeypot to connect to an IRC server at given point in time
Gateway detects when honeypot is infected
Rules inserted to block inbound attacks to that honeypot
Measurement | Malware Collection
Gateway also performs miscellaneous tasks
Triggering honeypot re-imaging
Loading clean Windows images
Pre-filtering for download station
Running local DNS server to resolve DNS queries from honeypots
Measurement | Graybox Testing
Graybox testing used to extract features of
suspicious binaries
Analysis spans two distinct phases (performed on
isolated network segment)
First phase derives network fingerprint of binary
Second phase extracts binary’s IRC-specific features
Measurement | Graybox Testing
Phase 1: Creation of a network fingerprint
Server acts as network sink
All network activity initiated by malware will be detected
Traffic logs automatically processed to extract network
fingerprint
$f_{net} = \langle \text{DNS}, \text{IPs}, \text{Ports}, \text{scan} \rangle$
DNS – target of DNS requests
IPs – destination IP addresses
Ports – contacted ports and protocols
Scan – whether or not default scanning behavior was detected
Default scanning behavior – any attempt to contact more than 20
distinct destinations on the same port during the monitored period
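The fingerprint and the scanning rule above can be sketched as follows. The input format (a list of DNS names plus a list of `(dst_ip, dst_port, proto)` tuples taken from the traffic logs) is an assumption; only the four fields and the "more than 20 distinct destinations on the same port" rule come from the slide.

```python
from collections import defaultdict

SCAN_THRESHOLD = 20  # from the slide: more than 20 distinct destinations on one port

def network_fingerprint(dns_queries, flows):
    """Build a (DNS, IPs, Ports, scan) fingerprint from logged activity.

    dns_queries: iterable of queried names.
    flows: iterable of (dst_ip, dst_port, proto) tuples (assumed log format).
    """
    flows = list(flows)
    destinations_per_port = defaultdict(set)
    for dst_ip, dst_port, proto in flows:
        destinations_per_port[(dst_port, proto)].add(dst_ip)
    scan = any(len(ips) > SCAN_THRESHOLD for ips in destinations_per_port.values())
    return {
        "DNS": sorted(set(dns_queries)),
        "IPs": sorted({dst_ip for dst_ip, _, _ in flows}),
        "Ports": sorted(destinations_per_port),
        "scan": scan,
    }
```

A binary that contacts 21 different hosts on TCP port 445 would be marked `scan = True`, while one that contacts exactly 20 would not.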
Measurement | Graybox Testing
Phase 2: Extraction of IRC-related features
Modified version of UnrealIRC daemon instantiated on
network sink
IRC daemon listens on all ports observed in the network fingerprint
Upon detecting an IRC connection, IRC-fingerprint is created
$f_{irc} = \langle \text{PASS}, \text{NICK}, \text{USER}, \text{MODE}, \text{JOIN} \rangle$
PASS – initial password to establish IRC session
NICK – nickname
USER – username
MODE – modes set
JOIN – IRC channels to be automatically joined (and their
associated passwords)
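As a rough illustration, the IRC fingerprint could be assembled from the lines a bot sends to the sink server. The parsing below follows the standard IRC client command syntax; the function name and the dictionary layout are assumptions, not the authors' data format.

```python
def irc_fingerprint(lines):
    """Extract PASS, NICK, USER, MODE and JOIN values from client-sent IRC lines."""
    fp = {"PASS": None, "NICK": None, "USER": None, "MODE": [], "JOIN": []}
    for raw in lines:
        parts = raw.strip().split()
        if not parts:
            continue
        command, args = parts[0].upper(), parts[1:]
        if command in ("PASS", "NICK", "USER") and args and fp[command] is None:
            fp[command] = args[0]  # keep the first value the bot presents
        elif command == "MODE" and args:
            fp["MODE"].append(" ".join(args))
        elif command == "JOIN" and args:
            # JOIN takes comma-separated channels and, optionally, matching keys.
            channels = args[0].split(",")
            keys = args[1].split(",") if len(args) > 1 else []
            for i, channel in enumerate(channels):
                fp["JOIN"].append((channel, keys[i] if i < len(keys) else None))
    return fp
```

Each `JOIN` entry pairs a channel with its password (or `None`), which is the information a tracker would need to enter the same channel later.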
Measurement | Graybox Testing
(Phase 2 continued…)
To learn botnet “dialect”, bot connects to local IRC server and enters default channel
IRC query engine plays role of botmaster
Bot behavior is learned by subjecting it to series of
commands
Command set includes:
IRC commands observed in honeynet traces
Commands extracted from publicly available bot source
code
Measurement | Longitudinal Tracking
Botnet tracking is performed by two means:
The use of a custom, lightweight IRC tracker
Probing DNS caches across the globe
Measurement | Longitudinal Tracking
IRC Tracker
“A modified IRC client that can join a specified IRC channel and automatically answer directed queries based on the template created by the graybox testing technique”
IRC tracker instantiates new IRC session to IRC server using fingerprint and template
IRC trackers need to appear responsive
Measurement | Longitudinal Tracking
In order to appear “real”, the following must be performed:
Traffic filtered so inappropriate information is not included in template
Filtering performed automatically while bot is executing
Computer specifications (e.g. memory, disk space) are changed to resemble specifications of a real machine
IRC query engine issues a set of commands that require stateful responses
Emulates a bot’s stateful software
Measurement | Longitudinal Tracking
DNS Tracking
Most bots issue DNS queries to resolve IP addresses of IRC servers
Caches of DNS servers are probed to determine number of DNS servers giving cache hits
“Cache hit” implies at least one client queried DNS server during lifetime of its DNS entry
Measurement | Longitudinal Tracking
Original list contained 1.6 million DNS servers
First filter removed top level domains (.gov, .mil, etc.)
Second filter checked consistency of replies using two consecutive DNS queries
First query was recursive and forced DNS server to completely
resolve query
Second query was not recursive and obtained local answers
from server cache
TTL field in second response should be smaller than first
After filtering, master list consisted of 800,000 name servers
For a given IRC server, the caches of all DNS servers were
probed and any associated cache hits recorded
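The two-query probe can be sketched at the packet level. The code below only builds the query messages and applies the TTL rule from the slide; it sends nothing over the network. The DNS header layout is standard, while the function names and the fixed query ID are illustrative assumptions.

```python
import struct

def build_dns_query(name, recursion_desired, query_id=0x1234):
    """Build a DNS query for an A record, with the RD flag set or cleared."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD is bit 8 of the flags word
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def passes_consistency_filter(first_ttl, second_ttl):
    """Slide's rule: the cached (second) answer should carry a smaller TTL.

    None stands for "no answer" (an assumption of this sketch).
    """
    if first_ttl is None or second_ttl is None:
        return False
    return second_ttl < first_ttl
```

A non-recursive query (`recursion_desired=False`) asks the server to answer only from what it already holds, which is what makes a positive answer a cache hit.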
Results and Analysis
Results include:
Traffic traces captured on local darknet
3 month period
IRC logs gathered
3 month period
DNS cache hit results from tracking 65 IRC servers
45 day period
Results| Botnet Prevalence
Botnet Traffic share
Two week snapshot of total incoming SYN packets to local
darknet vs. packets originating from botnet spreaders
A botnet spreader is any source that delivered a bot executable
27% of incoming SYNs attributed to botnet spreaders
76% come from botnet spreaders if only the ports they target are considered
Results| Botnet Prevalence
More than 90% of all traffic during peaks targeted ports
used by botnet spreaders
More than 70% of sources during peak periods sent shell
exploits
This suggests the total amount of botnet-related traffic is far greater than 27%
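One way to read the two traffic-share figures is as the same numerator over two different denominators: all incoming SYNs versus only SYNs to ports that spreaders target. The sketch below encodes that reading, which is an interpretation of the slide rather than the authors' exact computation; the input format and names are hypothetical.

```python
def spreader_share(syn_packets, spreaders):
    """Return (share of all SYNs, share of SYNs on spreader-targeted ports).

    syn_packets: list of (src_ip, dst_port) tuples (assumed trace format).
    spreaders: set of source IPs known to have delivered a bot executable.
    """
    from_spreaders = [p for p in syn_packets if p[0] in spreaders]
    target_ports = {dst_port for _, dst_port in from_spreaders}
    on_target_ports = [p for p in syn_packets if p[1] in target_ports]
    overall = len(from_spreaders) / len(syn_packets) if syn_packets else 0.0
    targeted = len(from_spreaders) / len(on_target_ports) if on_target_ports else 0.0
    return overall, targeted
```

Because the second denominator is a subset of the first, the targeted-port share is never lower than the overall share.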
Results| Botnet Prevalence
11% (85,000) of probed servers were involved in at least one botnet activity
55% of servers in dataset are for .com domains
82% of DNS cache hits from name servers in that domain
29% of .com servers had at least 1 cache hit
.cn servers only 0.2% of total servers
95% of them exhibited botnet activity
Results|Spreading Methods
Botnets use a variety of means to spread and recruit
new victims
Email
Web
Active scanning (most prevalent)
Botnets can be grouped into two types:
Type-I: Worm-like
Continuously scan ports following target selection algorithm
Type-II: Variable scanning behavior
Uses a number of scanning algorithms
Uniform, non-uniform, localized
Results|Spreading Methods
192 botnets captured
34 botnets were Type-I
Upon infection, bot starts scanning IP space for new victims
Initiates connection to IRC servers (identified by hard-coded
list of DNS names)
All IRC servers/channels bots tried to join were unreachable
Channel was banned by public IRC server
DNS name did not resolve to valid IP address
Still, botnet grew over time due to persistence of scanning
Results|Spreading Methods
Type-II botnets were the most prevalent class
Scanning triggered by a command
More difficult to track due to continuously changing behavior
Localized and targeted scanning were the most prevalent techniques
Localized scanning focused on Class B address space
Targeted scanning focused on Class A address space
Results|Growth Patterns
In order to examine botnet growth patterns, two
approaches were taken:
Cumulative number of unique DNS cache hits for distinct botnets over time was plotted
Growth pattern was compared to behavior learned
from IRC tracker
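The growth curve described above is a running count of DNS servers that have produced at least one cache hit. A minimal sketch, assuming each probe result is recorded as a `(day, dns_server)` pair; the function name and record format are hypothetical.

```python
def cumulative_unique_hits(observations):
    """Return [(day, number of distinct DNS servers with a hit so far), ...].

    observations: iterable of (day, dns_server) cache-hit records (assumed format).
    """
    seen = set()
    per_day = {}
    for day, server in sorted(observations):
        seen.add(server)
        per_day[day] = len(seen)  # value after the last record of each day
    return sorted(per_day.items())
```

A server that hits on several days is counted once, so the curve can only stay flat or rise, and its slope reflects how quickly the botnet reaches new networks.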
Results|Growth Patterns
Botnets with semi-exponential growth patterns exhibit
persistent random scanning activity (unchanging over time)
Example: for one botnet, topic of the corresponding channel was
set to randomly scan port 445 indefinitely for one month
Related to worm infections
Results|Growth Patterns
Also representative of botnets with intermittent activity profiles
Example: Botnet III corresponds to botnet that infected honeypots on
3/13/2006
IRC server went down between 4/12/2006 – 4/30/2006
When IRC server became available, growth slope increased and honeypots
were re-infected by the same botnet
Results|Growth Patterns
Predominantly used time-scoped scanning commands
As opposed to continuous scanning like the previous two
Results|Growth Patterns
Botnet evolution estimated by counting unique
sources for message broadcast to the channel
Only plotted botnets of comparable size on a given plot
Trends confirm heterogeneity in botnets
Results | Botnet Structures
60% of 318 collected malicious binaries were IRC bots
Four predominant IRC structures were revealed
All bots connected to a single IRC server
Prevalent among smaller classes of botnets (few hundred users)
70% of observed botnets fell into this category
IRC servers can be connected to form an IRC network supporting large numbers of users
30% of botnets bridged on multiple servers
50% bridged between two servers only
Seemingly unrelated botnets appear more similar when comparing their naming conventions, channel names, and operators’ user IDs
These botnets may seem to belong to the wrong botmaster
Selected group of bots commanded to download an updated binary
Results in bots being moved to a different IRC server
Results | Effective Botnet Size
Botnet footprint can become fairly large (> 15,000
bots)
Predominant structures were botnets managed by a single or few servers
Distinction drawn between
Botnet’s footprint
Effective size – number of bots connected to IRC channel at a given time
Results | Effective Botnet Size
Some “chatty” IRC servers broadcast join/leave information for members on channel
Number of online bots versus time for these IRC servers is plotted in figure 9
Maximum size of online population is significantly smaller than botnet’s footprint
Footprint greater than 10,000
No more than 3,000 bots online at the same time
Effective size has little impact on long term activity; however, it affects number of bots available to execute commands in a timely manner
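The gap between footprint and effective size can be illustrated with the join/leave messages that "chatty" servers broadcast. This sketch treats each distinct nickname as one bot, which is a simplifying assumption (bots may change nicknames); the event format and function name are also hypothetical.

```python
def footprint_and_effective_size(events):
    """Return (footprint, peak effective size) from time-ordered channel events.

    events: iterable of (timestamp, nick, action) with action "join" or "leave".
    Footprint here means every nick ever seen; effective size is the number
    of nicks online at a given time, of which the peak is reported.
    """
    online, ever_seen = set(), set()
    peak_online = 0
    for _, nick, action in events:
        if action == "join":
            online.add(nick)
            ever_seen.add(nick)
        elif action == "leave":
            online.discard(nick)
        peak_online = max(peak_online, len(online))
    return len(ever_seen), peak_online
```

When bots come and go quickly, the set of nicks ever seen keeps growing while the online set stays small, which matches the slide's contrast between a footprint above 10,000 and at most 3,000 bots online at once.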
Results | Lifetime
Discrepancy between footprint and effective size
likely due to the long lifetime of a typical botnet
Bot death rates and high churn rates can affect botnet’s effective size
Results | Lifetime
High churn rates
Bots do not stay long on IRC channel
Average stay time: 25 minutes
90% stay less than 50 minutes
Likely causes include
Client instability (as a result of infection)
Machine hibernation
Botmasters commanding bots to leave the channel
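Churn figures such as the average stay time and the share of short stays could be derived from the same join/leave logs. A minimal sketch, assuming timestamps are in minutes and that a nickname identifies one bot; both are assumptions, as are the function names.

```python
def stay_durations(events):
    """Pair each join with the next leave of the same nick and return durations."""
    joined_at = {}
    durations = []
    for timestamp, nick, action in events:
        if action == "join":
            joined_at[nick] = timestamp
        elif action == "leave" and nick in joined_at:
            durations.append(timestamp - joined_at.pop(nick))
    return durations

def churn_summary(durations, threshold):
    """Return (mean stay, fraction of stays shorter than threshold)."""
    if not durations:
        return 0.0, 0.0
    mean_stay = sum(durations) / len(durations)
    short_fraction = sum(d < threshold for d in durations) / len(durations)
    return mean_stay, short_fraction
```

Bots still online when the log ends have no matching leave and are ignored here, so a real analysis would need to decide how to treat those open-ended stays.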
Results | Botnet Software Taxonomy
183 of 192 confirmed IRC-based bot executables responded to
probes of IRC query engine
49% of bots run AV/FW killer – a utility that disables anti-virus and
firewall processes
43% run identd server which performs user identification
40% run system security monitor which tightens bot security
Ensures only intended bots join a given IRC channel
E.g. disables DCOM service and file sharing
38% run a registry monitor which alerts the bot of any attempts to
disable it
Results | Botnet Software Taxonomy
Number of exploits within bot binaries varied from
3 to 29
Average of 15 exploits per binary
Most popular exploits (appeared in over 75% of
binaries)
DCOM135
LSASS445
NTPASS
Results | Botnet Software Taxonomy
Authors evaluated effectiveness of ClamAV and
Norton anti-virus on 192 malicious binaries
ClamAV classified 137 binaries as malicious
Norton anti-virus classified 179 binaries as malicious
Windows XP service pack 2 still not immune
Results | “Insider’s view”
Traces show that:
Botmasters share information concerning what prefixes should not be scanned
Bots are tweaked to minimize chatter on C2 channel
Bots are probed to detect and isolate “misbehavers”
Also look for “super-bots” with high bandwidth network links and large storage capacities
Results | “Insider’s view”
Bots migrate from one IRC channel to another, instructed by:
Command from botmaster
Download of replacement software that points to a different C2
server
Results | “Insider’s view”
Control commands include channel joins and leaves
Mining category includes commands that collect
machine specifications
Attack category includes commands from
botmasters to attack other network computers
Results | “Insider’s view”
Small botnets receive larger portion of control and mining
commands
Hands-on botmasters that devote large amounts of time to
manually control their botnet
Medium and large botnets have a larger percentage of cloning and download commands
Cloning could include the use of one botnet to attack another botnet by overloading its IRC server with join requests
Conclusion
Botnets are a major contributor to overall unwanted internet
traffic
Most botnet traffic can be attributed to scans used to recruit new
bots
IRC is still the dominant protocol used for C2 communications
Effective sizes of botnets can range from a few hundred to
a few thousand
Botnet footprints are usually much larger than effective size
This is due to high churn rate within a botnet
Bot’s average channel occupancy is less than half an hour
Graybox testing revealed sophistication of modern bot software
E.g. Self-protection measures
Contributions
Established empirical measurements for botnet
prevalence
Particularly in considering DNS cache hits by IRC botnets
that were tracked
Characterized typical features of bot binaries
Registry monitoring tactics
Locking down host vulnerabilities
Classified most prevalent botnet activities as a function
of botnet size
Distinguished between botnet footprint and “effective size.”
Large experiment samples further solidified results
Critique
Focused mainly on Windows-based systems
It would be interesting to see the effectiveness of noted
infection strategies on Unix systems
Only evaluated two anti-virus applications
Perhaps include other popular anti-virus applications
McAfee, Symantec Corporate, AVG, etc.
Authors noted 60% of binaries collected were IRC bots
Did the other 40% use a different communication
mechanism?
If so, it would be interesting to know how they were structured and
if the authors evaluated them in any way
References
[1] Rajab, M.A., Zarfoss, J., Monrose, F., & Terzis, A. (2006). A multifaceted approach to understanding the botnet phenomenon. Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, Rio de Janeiro, Brazil