Botnet Detection – Distinguishing Between Bots and Human Activities
Speaker: Li-Ming Chen
2010/11/26

What is a Botnet?

• Bots: compromised hosts ("zombies")
• Botnet: a network of bots under the control of a human operator (the botmaster)
• Generally looks like: worm + C&C channel
  • Command and control (C&C) channel: disseminates the botmaster's commands to the bot army; communication over IRC, HTTP, … (can be encrypted)
  • Worm: propagation (vulnerabilities, file sharing, P2P, …) and attack (DoS, spamming, phishing sites, …)

Lifecycle of a Typical Botnet Infection
Uses of Botnets:
• Phishing attacks
• Spam
• ID/information theft
• DDoS
• Distributing other malware
Why are Botnets so Daunting?

• Underground economics!
• Multilayered/multifunction C&C architecture; botnet structures change (e.g., to P2P)
• Always behind the mirror: fast-flux and secure communication (hide C&C servers or other bots behind an ever-changing network)
• Multi-vector exploitation + social engineering techniques
Botnet Detection

• Using honeypots or infiltration techniques
  • To understand the basic behavior of botnets
• Passive anomaly analysis
  • Detect malicious activities
  • Detect C&C traffic (traffic signatures or statistical features; response-crowd phenomena)
• Graph-based analysis
  • Detect botnets' centralized/P2P structures
Fundamental Problems

• How to detect newly appearing botnets?
  • Botnet structures have moved from centralized to decentralized
  • A botnet may use its own custom-developed C&C protocol
• How to identify the applications behind network traffic?
  • Investigating a huge amount of unknown traffic is inevitable in botnet detection
Automatic Discovery of Botnet Communities on Large-Scale Communication Networks
Wei Lu, Mahbod Tavallaee, and Ali A. Ghorbani (Univ. of New Brunswick, Canada)
ASIACCS 2009

Two-leveled Botnet Detection

• Proposes a hierarchical framework for automatic botnet discovery
  • Higher level: unknown network traffic → different network application communities
  • Lower level: within each application community, differentiate malicious botnet behavior from normal application traffic
Traffic Classification

• Current techniques:
  • Transport-layer port numbers, payload signatures, statistical signatures, machine learning & clustering
• Proposed approach:
  • (Hybrid) combine (1) payload signatures with (2) a cross-association clustering algorithm
Traffic Classification – (Step 1) Using Payload Signatures

• Set up 470 application signatures, each composed of 10 fields
• Apply them to a one-day trace of the Fred-eZone WiFi network
  • Flows that match no signature remain unknown and require further analysis
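
To make the signature step concrete, here is a minimal sketch of payload-signature matching; the byte patterns and the classify_flow helper are hypothetical stand-ins, since the paper's 470 signatures and their 10 fields are not reproduced in the slides.

```python
# Minimal sketch of payload-signature matching (step 1). The patterns below are
# illustrative examples only; the real signature set has 470 entries with 10 fields each.
APP_SIGNATURES = {
    "HTTP":       [b"GET ", b"POST ", b"HTTP/1."],
    "IRC":        [b"NICK ", b"USER ", b"PRIVMSG "],
    "BitTorrent": [b"\x13BitTorrent protocol"],
}

def classify_flow(payload: bytes) -> str:
    """Return an application label for a flow payload, or 'unknown' if no signature matches."""
    head = payload[:64]                      # only the first bytes of the flow payload are inspected
    for app, patterns in APP_SIGNATURES.items():
        if any(p in head for p in patterns):
            return app
    return "unknown"                         # unknown flows go on to step 2 (cross-association clustering)
```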
Traffic Classification – (Step 2) Identify the Applications of Unknown Traffic

• Apply cross-association clustering to both unknown and known flows
  • First cluster on (SrcIP, DstIP); then, within each cluster, cluster on (DstIP, DstPort) to obtain the exact applications underlying a general application category
• After clustering, each application community needs to be labeled
  • Assign labels to unknown flows based on the probability of known flows in the same community (sketched below)
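
The slides leave the exact labeling rule open; one plausible reading is a simple probability/majority rule over each cluster, sketched below. The input format and the label_unknown_flows helper are assumptions for illustration.

```python
from collections import Counter

def label_unknown_flows(clusters):
    """clusters: iterable of flow lists, each flow a (flow_id, app_label) pair where
    unclassified flows carry the label 'unknown'. Unknown flows are relabeled with the
    most probable label among the known flows of the same cluster."""
    labeled = {}
    for flows in clusters:
        known = Counter(label for _, label in flows if label != "unknown")
        if not known:
            continue                                   # a cluster with no known flows stays unknown
        best_label, count = known.most_common(1)[0]
        confidence = count / sum(known.values())       # probability of the dominant known label
        for flow_id, label in flows:
            if label == "unknown":
                labeled[flow_id] = (best_label, confidence)
    return labeled
```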
Botnet Detection

• Measure the 1-gram byte distribution (frequencies of the 256 byte values) of each flow over N time bins, giving feature vectors F1 … FN
• Repeatedly find and merge the closest pair of feature vectors until only 2 clusters remain
• Calculate σ1 and σ2 for the two clusters; the botnet cluster is the one with the smaller σ
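
A minimal sketch of this step, assuming per-time-bin payload aggregation and using SciPy's agglomerative clustering in place of the repeated closest-pair merging; the helper names and the average-linkage choice are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def byte_distribution(payload: bytes) -> np.ndarray:
    """1-gram byte distribution: relative frequency of each of the 256 byte values."""
    counts = np.bincount(np.frombuffer(payload, dtype=np.uint8), minlength=256)
    return counts / max(counts.sum(), 1)

def split_botnet_cluster(payloads_per_bin):
    """Build one 256-dim feature vector per time bin, merge the closest pairs until two
    clusters remain, and flag the cluster with the smaller standard deviation as botnet-like."""
    F = np.vstack([byte_distribution(p) for p in payloads_per_bin])   # N x 256 feature matrix
    labels = fcluster(linkage(F, method="average"), t=2, criterion="maxclust")
    sigmas = {c: F[labels == c].std() for c in (1, 2)}
    botnet_cluster = min(sigmas, key=sigmas.get)                      # smaller sigma => botnet cluster
    return labels, botnet_cluster, sigmas
```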
Performance Evaluation

• Accuracy > 85%
• Accuracy (for inserted C&C flows) ~ 100%
• [Figure: detection results reporting TP and FP rates]

My Comments

• The proposed approach supports automatic traffic classification
• However, botnets are analyzed only in the IRC and HTTPWeb communities
• The classification rules are unclear…
• Botnet classification focuses only on the "byte distribution" of payloads
Are Your Hosts Trading or Plotting? Telling P2P File-Sharing and Bots Apart
Ting-Fang Yen, Michael K. Reiter (CMU, UNC)
ICDCS 2010 (International Conference on Distributed Computing Systems)

Problem and Motivation

• P2P bots are more and more popular
  → Botnet C&C traffic will tend to blend into a background of P2P file sharing
• Problem: differentiate bots (plotters) from other P2P hosts (traders)
• P2P bot characteristics:
  • Volume (not for file sharing)
  • Persistence (maintain connectivity)
  • Peer churn (less churn in peer membership)
  • Human-driven vs. machine-driven (botnet traffic is more regular and periodic)
Dataset

• CMU dataset (the basis)
• Trader dataset
  • Known P2P traffic (Gnutella, eMule, BitTorrent) in the CMU dataset
• Plotter dataset
  • Collected from honeypots (Storm & Nugache bots)
  • Spamming and scanning activities are ignored; botnet control traffic is preserved
  • Inserted into the CMU dataset for evaluation
Approach

• Volume test: [Figure: CDF of the average number of bytes sent per flow]
• Peer churn test: [Figure: percentage of newly connected IPs vs. hour index]
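
The volume and peer-churn tests can be approximated with simple per-host statistics over flow records. The sketch below assumes records of the form (timestamp, dst_ip, bytes_sent) grouped by host; the function names, the one-hour bin, and the record format are illustrative, not the paper's exact definitions.

```python
from collections import defaultdict

def volume_score(flows_by_host):
    """Volume test: average bytes sent per flow for each host. Plotters exchanging only
    C&C messages tend to send far less per flow than traders sharing files."""
    return {host: sum(b for _, _, b in flows) / len(flows)
            for host, flows in flows_by_host.items() if flows}

def peer_churn_score(flows_by_host, bin_seconds=3600):
    """Peer-churn test: per hour, the fraction of contacted peers never seen before.
    Traders see high churn in peer membership; plotters keep a more stable peer set."""
    scores = {}
    for host, flows in flows_by_host.items():
        seen = set()
        bins = defaultdict(lambda: [0, 0])            # hour index -> [new peers, contacted peers]
        for ts, dst, _ in sorted(flows):
            hour = int(ts // bin_seconds)
            bins[hour][1] += 1
            if dst not in seen:
                bins[hour][0] += 1
                seen.add(dst)
        scores[host] = [new / total for new, total in bins.values() if total]
    return scores
```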

Approach (cont’d)

• Observe the interstitial time distribution of flows to the same destination IP for each host
  • [Figure: seconds between flows vs. flow index, human-driven vs. machine-driven]
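
The paper's statistical test for human- vs. machine-driven traffic is not reproduced here; as a stand-in, the sketch below flags (host, destination) pairs whose inter-flow delays have a very low coefficient of variation, i.e., look periodic and machine-driven rather than bursty and human-driven. The threshold and function name are assumptions.

```python
import statistics
from collections import defaultdict

def periodicity_candidates(flows_by_host, cv_threshold=0.1, min_flows=10):
    """Flag (host, dst_ip) pairs whose inter-flow delays are suspiciously regular:
    a low coefficient of variation (std/mean) of the delays suggests machine-driven traffic."""
    candidates = []
    for host, flows in flows_by_host.items():
        times_by_dst = defaultdict(list)
        for ts, dst, _ in flows:                      # same (timestamp, dst_ip, bytes) records as above
            times_by_dst[dst].append(ts)
        for dst, times in times_by_dst.items():
            if len(times) < min_flows:
                continue
            times.sort()
            gaps = [b - a for a, b in zip(times, times[1:])]
            mean = statistics.mean(gaps)
            if mean > 0 and statistics.pstdev(gaps) / mean < cv_threshold:
                candidates.append((host, dst))
    return candidates
```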
Performance Evaluation

• Initial data reduction:
  • Filter out hosts (and their flows) that have relatively low failed-connection rates!! (such a host is neither a trader nor a plotter)
• Identifying plotters:
  • [Figure: combining the tests reduces the FP rate, while the TP rate degrades somewhat]
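
The data-reduction step can be sketched as a simple threshold on each host's failed-connection rate; the 0.2 cutoff and the input format below are illustrative assumptions, not values from the paper.

```python
def p2p_like_hosts(conn_attempts_by_host, min_failed_rate=0.2):
    """Keep only hosts whose failed-connection rate is high enough to look P2P-like:
    both traders and plotters probe many unreachable peers, so hosts with few failed
    connections are dropped before the tests are applied."""
    keep = set()
    for host, attempts in conn_attempts_by_host.items():    # attempts: list of bools, True = failed
        if attempts and sum(attempts) / len(attempts) >= min_failed_rate:
            keep.add(host)
    return keep
```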
My Comments

• The authors develop a series of tests for separating plotters from traders
  • They focus on flow characteristics (instead of packet-level information)
  • They evaluate the effectiveness of the three tests and of their combination
• Compared to BotGrep:
  • BotGrep only detects a P2P communication structure in the network
  • This work can distinguish P2P bots from normal P2P users
Other Bots

• Chat bots
  • e.g., (good) help operate chat rooms, entertain chat users
  • e.g., (bad) distribute chat spam ("spim"), malware
• Input data modification attacks
  • e.g., online game cheating, click fraud, auctions, …
→ Problem: is a human in control, or is it a bot (computer)?
Measurement and Classification of Humans and Bots in Internet Chat
Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang (The College of William and Mary)
USENIX Security Symposium 2008

Detecting Chat Bots

• Observation:
  • A series of measurements on Yahoo! chat was performed to study the behaviors of chat bots and humans
  • Human behavior is more complex than bot behavior
• → Motivation: propose a classification system to accurately distinguish chat bots from human users
  • (1) An entropy classifier, based on message time and size
  • (2) A machine-learning classifier, based on message content
Measurement & Pre-labeling

• Input: public messages posted to Yahoo! chat rooms
• Time: Aug.–Nov. 2007
• Pre-labeling:
  • The examiner observes a long conversation between a test subject and one or more third parties, and then decides whether the subject is a human or a chat bot
  • Criteria: lack of intelligent responses, repetition of similar phrases, presence of spam or malware URLs
• Labeling results:
  • Human; bots {periodic, random, responder, or replay bots}; ambiguous
[Figure: probability distributions of inter-message delay (sec.) and message size (bytes) for humans, periodic bots, random bots, responder bots, and replay bots]

Approach

• Motivation: human behavior is more complex than bot behavior
• Entropy classifier: based on message time and size (a rough sketch follows this list)
  • Define a cutoff score (entropy, entropy rate)
  • If the test score > cutoff score, classify as human
• Machine-learning classifier: based on the content of chat messages
  • Use a Bayesian classifier to decide P(bot | M)
  • M is a feature vector <f1, f2, …, fn>
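
As a rough sketch of the entropy classifier's cutoff rule: the histogram-based entropy estimate and the cutoff values below are assumptions, and the entropy-rate (conditional entropy) variant used in the paper is not reproduced.

```python
import math
from collections import Counter

def entropy(samples, bins=20):
    """Shannon entropy (bits) of numeric samples (e.g., inter-message delays or message
    sizes), estimated from a simple fixed-width histogram."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0                     # avoid zero width when all samples are equal
    counts = Counter(int((s - lo) / width) for s in samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def classify_chat_user(delays, sizes, delay_cutoff=3.0, size_cutoff=3.5):
    """Cutoff rule: humans show higher entropy in both message timing and message size,
    so scores above both cutoffs are classified as human, otherwise as a bot."""
    return "human" if (entropy(delays) > delay_cutoff and
                       entropy(sizes) > size_cutoff) else "bot"
```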
Performance Evaluation

• EN: entropy test
• CCE: corrected conditional entropy test
• [Table: classification results; an annotation marks the bot types that are more difficult to detect]
My Comments

• Presents measurement results and a chat bot classification system
  • Chat bot behavior is very different from that of human users
    → motivates the use of entropy-based classification
  • Also proposes a machine learning-based classification scheme
    → the entropy-based classifier can detect unknown bots, while the machine-learning classifier is more efficient
• A complete piece of work
• Can this approach be extended to detect other bots?
Other Bots

• Chat bots
  • e.g., (good) help operate chat rooms, entertain chat users
  • e.g., (bad) distribute chat spam ("spim"), malware
• Input data modification attacks
  • e.g., online game cheating, click fraud
→ Problem: is a human in control, or is it a bot (computer)?
Is a Bot at the Controls? Detecting Input Data Attacks
Travis Schluessler, Stephen Goglin, Erik Johnson (Intel Corporation)
NetGames 2007 (Workshop on Network and Systems Support for Games)

Detecting Input Data Modification Attacks

• Current methods:
  • CAPTCHAs, anti-cheat software
• Proposed approach:
  • Ensure that input data enters the system through a physically present human input device (HID)
  • Host-based approach, independent of the OS/"software stack"
  • (Idea) "input data generated by HIDs" must be the same as "input data consumed by the application"
    → if the two data streams differ, some form of illicit modification has occurred! (see the sketch below)
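
The core comparison can be sketched as checking that the event stream authenticated at the HID side matches what the application reports having consumed; the shared key handling and event format below are illustrative assumptions, not the paper's actual IVS protocol.

```python
import hashlib
import hmac

SHARED_KEY = b"ivs-demo-key"        # illustrative only; a real IVS would keep keys in protected hardware

def digest_events(events):
    """HMAC over a canonical encoding of an ordered stream of input events."""
    mac = hmac.new(SHARED_KEY, digestmod=hashlib.sha256)
    for ts, device, code in events:                  # e.g., (timestamp, "keyboard", key_code)
        mac.update(f"{ts}|{device}|{code}".encode())
    return mac.digest()

def input_was_tampered(hid_events, app_consumed_events) -> bool:
    """True if the input the application consumed differs from what the HID generated,
    i.e., some form of illicit injection or modification occurred in between."""
    return not hmac.compare_digest(digest_events(hid_events),
                                   digest_events(app_consumed_events))
```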
System Architecture

• [Figure: system architecture; input from HIDs travels over a channel that is tamper-evident to modification, so illicit modification between the HID and the application can be detected]

Steady State Operation
Evaluation/Discussion

• Performance overhead (measured with Quake 3)
• Detection limitations:
  • Platform hardware modification
  • Programmable hardware
  • Attacks that alter the timing of the arrival of input data
• Cost:
  • Implementation/deployment cost
  • Cost of circumvention
My Comments

• This paper describes a method to detect attacks that modify input data coming from HIDs
• The idea is simple and useful, but hard to implement
  • The "input verification service (IVS)" needs to be isolated
  • Software applications need to register with, and work together with, the IVS
• Suitable for checking software/system inputs, not outputs (e.g., packets sent)