Automatically Generating Models for Botnet Detection
Presenter: 葉倚任
Authors: Peter Wurzinger, Leyla Bilge, Thorsten Holz,
Jan Goebel, Christopher Kruegel, Engin Kirda
European Symposium on Research in Computer Security
(ESORICS'09)
Outline
Introduction
System Overview
Model Generation Data
Generating Detection Models
Evaluation
Conclusion
Introduction
Two main kinds of network-based detection systems
Vertical correlation technique
Detects individual bots
Checks traffic patterns, the content of C&C traffic, and bot-related activities
Requires prior knowledge of the bot's C&C channels and propagation vectors
Horizontal correlation technique
Detects a group of bots
Based on network traffic
Requires that at least two bots are present in the monitored network
Introduction (cont’d)
Characteristic behavior of a bot
Receives commands from the botmaster
Carries out some actions in response to these commands
This paper proposed a two-stage detection model that leverages these two characteristics
In the experiments, the authors generated detection models for 18 different bot families
16 controlled via IRC
One via HTTP (Kraken)
One via a peer-to-peer network (Storm Worm)
System Overview
Input of the system
A collection of bot binaries
Launch a bot in a controlled environment and
record its network activities (traces)
Identify the commands that this bot receives as
well as its corresponding responses
Translate observations into detection models
Output of the system
Detection models for different bot families
Detecting Procedure
Stateful model (two-stage detection)
1. Check whether a bot command is sent
2. If so, check whether the response behavior exceeds a threshold (sketched below)
(e.g., the number of new connections opened by a host)
Use content-based specifications to model
commands
(comparable to intrusion detection signatures)
Use network-based specifications to model
responses
(comparable to anomaly detection)
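To make the two-stage, stateful procedure above concrete, here is a minimal sketch in Python. It assumes a hypothetical token-sequence matcher for stage 1 and counts new outgoing connections for stage 2; the names, threshold, and observation window are illustrative and not taken from the paper.

```python
import time
from collections import defaultdict

# Illustrative values, not taken from the paper.
RESPONSE_CONN_THRESHOLD = 20   # new connections that count as a bot response
RESPONSE_WINDOW_SECONDS = 100  # how long to watch a host after a command match


def matches_command_signature(payload: bytes, token_sequence: list[bytes]) -> bool:
    """Stage 1: content-based check, all tokens must appear in order."""
    pos = 0
    for token in token_sequence:
        idx = payload.find(token, pos)
        if idx < 0:
            return False
        pos = idx + len(token)
    return True


class TwoStageDetector:
    """Stateful detector: a host is only flagged if a command match is
    followed by response behavior above a threshold."""

    def __init__(self):
        self.watch_until = {}                    # host -> deadline for a response
        self.new_connections = defaultdict(int)  # host -> connections since match

    def on_packet(self, host, payload, token_sequence, now=None):
        """Feed an inbound payload; start watching the host on a match."""
        now = time.time() if now is None else now
        if matches_command_signature(payload, token_sequence):
            self.watch_until[host] = now + RESPONSE_WINDOW_SECONDS
            self.new_connections[host] = 0

    def on_new_connection(self, host, now=None):
        """Feed an outbound connection; return True if an alert is raised."""
        now = time.time() if now is None else now
        deadline = self.watch_until.get(host)
        if deadline is None or now > deadline:
            return False
        self.new_connections[host] += 1
        return self.new_connections[host] >= RESPONSE_CONN_THRESHOLD
```

The essential design point is the statefulness: response activity is only evaluated for hosts that previously matched a command signature, so neither check alone triggers an alert.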
Model Generation Data
Run each bot binary for a period of
several days
Locating bot responses
Finding bot commands
Extracting model generation data
Locating bot responses
Assumption: bot responses lead to a change in network behavior
Partition the network traffic into consecutive time intervals of equal length
For each time interval, compute 8 normalized features (called a traffic profile); see the sketch below
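The transcript does not list the eight features, so the sketch below assumes a plausible set (packet count, byte count, distinct destination IPs and ports, UDP/SMTP/HTTP packet counts, non-ASCII payload bytes) purely to illustrate how per-interval traffic profiles could be built and normalized.

```python
import numpy as np

INTERVAL_SECONDS = 50  # interval length reported as best in the tests


def traffic_profile(packets):
    """Build one feature vector for a single time interval.

    `packets` is a list of dicts with keys 'size', 'dst_ip', 'dst_port',
    'proto', and 'payload'; the chosen features are an assumption, not
    the paper's exact list.
    """
    return np.array([
        len(packets),                                              # packets
        sum(p["size"] for p in packets),                           # bytes
        len({p["dst_ip"] for p in packets}),                       # distinct IPs
        len({p["dst_port"] for p in packets}),                     # distinct ports
        sum(1 for p in packets if p["proto"] == "udp"),            # UDP packets
        sum(1 for p in packets if p["proto"] == "smtp"),           # SMTP packets
        sum(1 for p in packets if p["proto"] == "http"),           # HTTP packets
        sum(sum(b > 127 for b in p["payload"]) for p in packets),  # non-ASCII bytes
    ], dtype=float)


def normalize(profiles):
    """Scale each feature to [0, 1] across all intervals of one trace."""
    profiles = np.asarray(profiles, dtype=float)
    maxima = profiles.max(axis=0)
    maxima[maxima == 0] = 1.0  # avoid division by zero for unused features
    return profiles / maxima
```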
Locating bot responses (cont’d)
Convert the traffic profiles (vectors) into a time series d(t), where ε is the sliding window size used in the conversion
Locate bot responses by applying the CUSUM change-point detection algorithm to d(t); see the sketch below
ε = 5 and an interval length of 50 seconds delivered the best results in the tests
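The slide's exact formula for d(t) is not reproduced in this transcript. The sketch below assumes d(t) is the Euclidean distance between the profile at interval t and the mean profile over the previous ε intervals, and pairs it with a basic one-sided CUSUM test; the drift and threshold parameters are illustrative.

```python
import numpy as np

EPSILON = 5  # sliding window size reported as best in the tests


def distance_series(profiles, eps=EPSILON):
    """Turn per-interval traffic profiles into a 1-D time series d(t).

    Assumption: d(t) is the Euclidean distance between the profile at
    interval t and the mean profile over the previous `eps` intervals.
    """
    profiles = np.asarray(profiles, dtype=float)
    d = np.zeros(len(profiles))
    for t in range(eps, len(profiles)):
        window_mean = profiles[t - eps:t].mean(axis=0)
        d[t] = np.linalg.norm(profiles[t] - window_mean)
    return d


def cusum_change_points(d, drift=0.1, threshold=1.0):
    """One-sided CUSUM: report intervals where the cumulative positive
    deviation from the series mean exceeds `threshold` (illustrative
    parameters)."""
    change_points, s = [], 0.0
    mean = d.mean() if len(d) else 0.0
    for t, value in enumerate(d):
        s = max(0.0, s + value - mean - drift)
        if s > threshold:
            change_points.append(t)
            s = 0.0  # reset after reporting a change
    return change_points
```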
Finding bot commands
After locating bot responses, a small section of
network traffic (snippet) is extracted for each
response
Cluster those traffic snippets that lead to
similar responses
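The transcript does not say how response similarity is measured when clustering snippets. As a rough stand-in, the sketch below groups snippets by the feature that dominates their response profile; this is only one plausible way to form behavior clusters, not the paper's algorithm.

```python
from collections import defaultdict

import numpy as np

# Assumed feature order, matching the earlier traffic-profile sketch.
FEATURE_NAMES = [
    "packets", "bytes", "distinct_ips", "distinct_ports",
    "udp", "smtp", "http", "non_ascii",
]


def cluster_by_dominant_feature(snippets_with_profiles):
    """Group (snippet, response_profile) pairs by the feature with the
    largest value in the normalized response profile. A rough stand-in
    for the paper's behavior clustering, not its actual algorithm."""
    clusters = defaultdict(list)
    for snippet, profile in snippets_with_profiles:
        dominant = FEATURE_NAMES[int(np.argmax(profile))]
        clusters[dominant].append(snippet)
    return clusters
```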
Extracting model generation data
Extract two pieces of information for the subsequent model generation step
A snippet
Contains 90 seconds of traffic, plus the last 30 seconds of the previous period and the first 10 seconds of the following one (see the sketch below)
The average of the traffic profile vectors
Computed over the period from the start of the current response to the next change in behavior
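A small helper that encodes the snippet boundaries stated above. How the 90-second window is positioned relative to the detected change is not spelled out in the transcript, so the placement below is an assumption.

```python
SNIPPET_SECONDS = 90  # traffic captured around a detected behavior change
LEAD_SECONDS = 30     # last 30 s of the previous period
TAIL_SECONDS = 10     # first 10 s of the following period


def snippet_bounds(change_time):
    """Return (start, end) in seconds for the traffic snippet of a
    behavior change detected at `change_time`.

    Assumption: the 90-second window starts at the detected change and
    is extended by 30 s before and 10 s after, matching the slide's
    figures (90 + 30 + 10 seconds in total).
    """
    return change_time - LEAD_SECONDS, change_time + SNIPPET_SECONDS + TAIL_SECONDS
```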
Generating Detection Models
Command model generation
Response model generation
Command model generation
The goal is to identify common elements in a
particular behavior cluster
First, apply a second clustering refinement step
that groups similar network packet payloads
within each behavior cluster
The longest common subsequence algorithm is
applied to each set of similar payloads
Generate one token sequence per set
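A sketch of deriving a token sequence from a set of similar payloads. It runs a standard longest-common-subsequence dynamic program on one pair of payloads, keeps contiguous matched runs as tokens, and then drops tokens that do not occur in every payload of the set; this pairwise reduction is a simplification, not necessarily the paper's exact procedure.

```python
def lcs_tokens(a: bytes, b: bytes, min_len: int = 3) -> list[bytes]:
    """Run a standard LCS dynamic program on two payloads and keep
    contiguous matched runs of at least `min_len` bytes as tokens."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table, grouping consecutive matches into tokens.
    tokens, run, i, j = [], bytearray(), 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            run.append(a[i])
            i, j = i + 1, j + 1
        else:
            if len(run) >= min_len:
                tokens.append(bytes(run))
            run = bytearray()
            if dp[i + 1][j] >= dp[i][j + 1]:
                i += 1
            else:
                j += 1
    if len(run) >= min_len:
        tokens.append(bytes(run))
    return tokens


def token_sequence(payloads: list[bytes]) -> list[bytes]:
    """Derive one token sequence for a set of similar payloads: tokens
    from the first pair, keeping only those that occur in every payload
    of the set."""
    if len(payloads) < 2:
        return []
    candidates = lcs_tokens(payloads[0], payloads[1])
    return [t for t in candidates if all(t in p for p in payloads)]
```

For instance, two made-up IRC-style payloads such as b".advscan lsass 150" and b".advscan lsass 200" would yield the single token b".advscan lsass ".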
Response model generation
Compute the element-wise average of the
individual behavior profiles for a behavior
cluster
Give minimal bounds for certain network
features
1,000 for UDP packets
100 for HTTP packets
10 for SMTP packets
20 for different IPs
If a response profile exceeds none of these thresholds, no detection model is generated
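A sketch of the response-model step: element-wise averaging of a cluster's behavior profiles, followed by the minimal-bounds check above. The feature layout and names are assumptions carried over from the earlier sketches.

```python
import numpy as np

# Minimal bounds from the slides: a model is only kept if the averaged
# response profile exceeds at least one of them.
MIN_BOUNDS = {
    "udp": 1000,          # UDP packets
    "http": 100,          # HTTP packets
    "smtp": 10,           # SMTP packets
    "distinct_ips": 20,   # different IPs contacted
}


def response_model(profiles, feature_index):
    """Element-wise average of one cluster's behavior profiles.

    `feature_index` maps feature names to positions in the profile
    vector (an assumed layout). Returns None when no minimal bound is
    exceeded, i.e. no detection model is generated."""
    avg = np.asarray(profiles, dtype=float).mean(axis=0)
    exceeds_any = any(avg[feature_index[name]] > bound
                      for name, bound in MIN_BOUNDS.items())
    return avg if exceeds_any else None
```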
Evaluation
Collected a set of 416 different bot samples (distinguished by MD5 hash)
From the Anubis malware analysis platform
The collection period was more than 8 months
Each bot produced a traffic trace five days in length
Divided into families of bots
16 different IRC bot families (with 356 traffic traces)
One HTTP bot family (with 60 traffic traces)
One p2p bot family (Storm Worm, with 30 traffic
traces)
Detection Capability
Split the set of 446 network traces into training sets and test sets
Each training set contained 25% of one bot family's traces
This procedure was performed four times per
family (four-fold cross validation)
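A sketch of the evaluation protocol: each fold uses 25% of a family's traces for model generation and the remaining 75% for testing, rotated over four folds. The fold layout below is a straightforward reading of the slide, not code from the paper.

```python
def four_fold_splits(traces):
    """Yield (training, test) splits for one bot family: 25% of the
    traces generate the models, the remaining 75% are used for testing,
    rotated over four folds."""
    fold_size = len(traces) // 4
    for k in range(4):
        start, end = k * fold_size, (k + 1) * fold_size
        training = traces[start:end]
        test = traces[:start] + traces[end:]
        yield training, test
```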
Real-World Deployment
Deployed sensors at two sites
In front of the residential network of RWTH Aachen University
At a Greek university network
The total traffic was on the order of 94 billion network packets, collected over a period of more than three months at the two European sites
Real-World Deployment (cont'd)
In the Greek network, most of the alerts were false positives
BotHunter w/o Blacklist means BotHunter
without blacklists of known DNS names and IP
addresses
The detection rate of BotHunter w/o Blacklist in the detection capability experiment drops to 39%
Conclusion
This paper proposed a two-stage detection method consisting of a command model and a response model
The system automatically derives signatures for the bot commands and network-level specifications for the bot responses
It can generate models for IRC bots, HTTP bots, and even P2P bots such as Storm Worm