Rishi: Identify Bot Contaminated Hosts by IRC Nickname Evaluation
In Proceedings of USENIX Workshop on Hot Topics in Understanding Botnets (HotBots), 2007
Reporter: Fong-Ruei, Li
2009/9/15
Machine Learning and Bioinformatics Lab

Outline
- Introduction
- Background
- Communication Channel Detection
- Results and Evaluation
- Conclusion

Introduction
- Currently
  - the way to stop a given botnet is to disable the communication channel for the bots
- However
  - the hosts stay infected and are in most cases still backdoored, allowing an attacker to reclaim the machine at any time

Background
- Internet Relay Chat (IRC)
  - Each of the different servers hosts a number of different chat rooms, called channels
  - Every user connected to an IRC server has its own unique username, called nickname

Background
- BotMaster
  - the way to communicate with the botnet is to use IRC
- Bots
  - join a specific channel on a public or private IRC server
  - to receive further instructions

Communication Channel Detection
- All bots have one characteristic in common:
  - they need a communication channel
- Our approach focuses on
  - detecting the communication channel between the bot and the botnet controller
  - it is possible to detect a bot even before it performs any malicious actions

Project Rishi
- From every captured packet, Rishi extracts:
  - Time of suspicious connection
  - IP address and port of suspected source host
  - IP address and port of destination IRC server
  - Channels joined
  - Utilized nickname

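The five extracted items can be pictured as one record per suspicious connection. The following is only a minimal sketch, not Rishi's actual implementation: the class name `ConnectionRecord`, the helper `update_from_payload`, and the sample addresses are hypothetical, and it assumes the packet payload has already been decoded to text. It relies only on the standard IRC client commands NICK and JOIN.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConnectionRecord:
    """One suspicious IRC connection, holding the five items listed on the slide."""
    timestamp: float                 # time of suspicious connection
    src_ip: str                      # suspected source host
    src_port: int
    dst_ip: str                      # destination IRC server
    dst_port: int
    channels: List[str] = field(default_factory=list)  # channels joined
    nickname: Optional[str] = None   # utilized nickname


def update_from_payload(record: ConnectionRecord, payload: str) -> None:
    """Scan client-to-server IRC lines for NICK and JOIN commands (hypothetical helper)."""
    for line in payload.splitlines():
        parts = line.strip().split()
        if len(parts) < 2:
            continue
        command = parts[0].upper()
        if command == "NICK":
            record.nickname = parts[1].lstrip(":")
        elif command == "JOIN":
            # JOIN may carry a comma-separated list of channels.
            for channel in parts[1].lstrip(":").split(","):
                if channel and channel not in record.channels:
                    record.channels.append(channel)


# Made-up example values, for illustration only.
record = ConnectionRecord(1190000000.0, "10.0.0.5", 1042, "192.0.2.10", 6667)
update_from_payload(record, "NICK RBOT|DEU|XP-1234\r\nJOIN #cmd,#scan secret\r\n")
print(record.nickname, record.channels)
```

Keeping the nickname and channels on the same record as the addresses means a later scoring step has everything it needs to report the suspected host.
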
Figure: Network setup of Rishi

Figure: Basic Concept - Rishi

Scoring Function
- Checks for the occurrence of several criteria:
  - suspicious substrings
    - the name of a bot (e.g., RBOT or l33t-)
  - special characters
    - like [ , ] , and |
  - long numbers
- If the nickname consists of many digits:
  - 1 point for each two consecutive digits

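The criteria above can be sketched as a small function. This is an illustration under stated assumptions, not the scoring function from the paper: the name `base_score` is hypothetical, the substring list contains only the two examples named on the slide, and every criterion is assumed to be worth 1 point per hit (the slide states the 1-point weight explicitly only for digit pairs).

```python
import re

# Assumed lists: the slide names only RBOT and l33t- as suspicious
# substrings and [, ], | as special characters.
SUSPICIOUS_SUBSTRINGS = ["rbot", "l33t-"]
SPECIAL_CHARACTERS = "[]|"


def base_score(nickname: str) -> int:
    """Score a nickname on substrings, special characters, and digit pairs."""
    lowered = nickname.lower()
    score = 0
    # 1 point for each suspicious substring found (assumed weight).
    score += sum(1 for s in SUSPICIOUS_SUBSTRINGS if s in lowered)
    # 1 point for each occurrence of a special character (assumed weight).
    score += sum(1 for ch in nickname if ch in SPECIAL_CHARACTERS)
    # 1 point for each two consecutive digits: a run of n digits gives n // 2.
    score += sum(len(run) // 2 for run in re.findall(r"[0-9]+", nickname))
    return score


print(base_score("[0|1234]"))  # 3 special characters + 2 digit pairs = 5
print(base_score("alice"))     # no criterion matches = 0
```

Counting digit pairs per run means a single stray digit adds nothing, while a four-digit suffix adds two points.
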
Scoring Function
- True signs for an infected host raise the final score by more than one point (>1 points):
  - a match with one of the regular expressions
  - a connection to a blacklisted server
  - the use of a blacklisted nickname

Regular Expression
- Each nickname is tested against several regular expressions
  - which match known bot names
- For example, the following expression:
  \[[0-9]\|[0-9]{4,}
  - matches nicknames like [0|1234]
  - the part \|[0-9]{4,} matches text like |1234
  - 10 points for a match

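The example expression can be tried directly with Python's `re` module. This is a small sketch: the helper name `matches_known_bot_name` is hypothetical, and searching anywhere in the nickname (rather than anchoring at the start) is an assumption, since the slides do not say how Rishi applies its expressions.

```python
import re

# The expression from the slide: "[", one digit, "|", then four or more digits.
BOT_NAME_PATTERN = re.compile(r"\[[0-9]\|[0-9]{4,}")


def matches_known_bot_name(nickname: str) -> bool:
    """Return True if the pattern occurs anywhere in the nickname."""
    return BOT_NAME_PATTERN.search(nickname) is not None


for nick in ["[0|1234]", "[7|00042]", "|1234", "alice"]:
    print(nick, matches_known_bot_name(nick))
```

Note that `|1234` on its own does not match, because the full expression also requires the leading `[` and a digit before the `|`.
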
Whitelisting
- The software utilizes:
  - hard coded whitelist
  - dynamic whitelist
- Each nickname which receives zero points
  - is added to the dynamic whitelist

Blacklisting
- Two blacklists:
  - the first blacklist is hard coded
    - in the configuration file
  - the second one is a dynamic list
    - with nicknames added to it automatically according to the final score

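How the two dynamic lists might be maintained can be sketched as follows. The zero-point rule for the whitelist comes from the slides; the cut-off `BLACKLIST_THRESHOLD` and the function name `update_dynamic_lists` are assumptions made for illustration, since the slides only say that nicknames are blacklisted "according to the final score".

```python
from typing import Set

# Hypothetical cut-off; the slides do not state the exact value.
BLACKLIST_THRESHOLD = 10


def update_dynamic_lists(nickname: str, score: int,
                         whitelist: Set[str], blacklist: Set[str]) -> None:
    """Add the nickname to the dynamic whitelist or blacklist based on its score."""
    if score == 0:
        # Stated on the slide: zero points means dynamic whitelist.
        whitelist.add(nickname)
    elif score >= BLACKLIST_THRESHOLD:
        # Assumed rule: a high final score means dynamic blacklist.
        blacklist.add(nickname)


dynamic_whitelist: Set[str] = set()
dynamic_blacklist: Set[str] = set()
update_dynamic_lists("alice", 0, dynamic_whitelist, dynamic_blacklist)
update_dynamic_lists("bob42", 1, dynamic_whitelist, dynamic_blacklist)
update_dynamic_lists("RBOT|DEU|XP-1234", 17, dynamic_whitelist, dynamic_blacklist)
print(dynamic_whitelist, dynamic_blacklist)
```

Nicknames with a score between the two rules land on neither list and are simply scored again the next time they are seen.
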
Example
- Imagine that the nickname
  - RBOT|DEU|XP-1234 was added to the blacklist
- The next captured nickname
  - RBOT|CHN|XP-5678
- 7 points
  - 1 point each due to the suspicious substrings RBOT, CHN, and XP
  - 1 point each due to the two occurrences of the special character |
  - 1 point each due to two occurrences of consecutive digits
- 10 points for more than 50% congruence with a name stored on the dynamic blacklist
- 17 points in total

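The arithmetic of this example can be reproduced in a few lines. This is a sketch only: the slides do not define how congruence is measured, so Python's `difflib.SequenceMatcher` is swapped in here as a stand-in similarity measure, and the names `congruence` and `blacklist_bonus` are hypothetical. The base score is taken from the slide's breakdown rather than recomputed.

```python
from difflib import SequenceMatcher
from typing import Iterable

BLACKLIST_BONUS = 10         # points stated on the slide
CONGRUENCE_THRESHOLD = 0.5   # "more than 50% congruence"


def congruence(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not necessarily the measure Rishi uses."""
    return SequenceMatcher(None, a, b).ratio()


def blacklist_bonus(nickname: str, dynamic_blacklist: Iterable[str]) -> int:
    """Return the bonus if the nickname resembles any blacklisted name."""
    if any(congruence(nickname, known) > CONGRUENCE_THRESHOLD
           for known in dynamic_blacklist):
        return BLACKLIST_BONUS
    return 0


dynamic_blacklist = {"RBOT|DEU|XP-1234"}
nickname = "RBOT|CHN|XP-5678"

# Breakdown from the slide: 3 substrings + 2 occurrences of "|" + 2 digit pairs.
base = 3 + 2 + 2
total = base + blacklist_bonus(nickname, dynamic_blacklist)
print(congruence(nickname, "RBOT|DEU|XP-1234"))  # 0.5625
print(total)                                     # 17
```

With this stand-in measure the two nicknames share the blocks `RBOT|` and `|XP-`, 9 of 16 characters, which is just over the 50% mark.
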
Results and Evaluation
- RWTH Aachen University
  - 30,000 computer users to support
  - Rishi runs on a Quad-CPU Intel Xeon 3.2 GHz system with 3 GB of memory installed
  - we are monitoring a 10 GBit network

Conclusion
- Based on characteristics of the communication channel
  - observe protocol messages
  - use n-gram analysis together with a scoring function
  - black-/whitelists

Bot Nicknames
The end
Thank you for listening