BotMiner: Clustering Analysis of Network Traffic for Protocol
Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee
College of Computing, Georgia Institute of Technology
USENIX Security '08
Presented by Lei Wu
April 13th, 2009
Motivation and Background
System description
Experimental analysis
Conclusion
Motivation and Background
This paper proposes BotMiner, a general detection framework that is
independent of botnet Command and Control (C&C) protocol and structure,
and requires no a priori knowledge of botnets
Bot: a malware instance that runs autonomously and automatically on a
compromised computer (zombie) without the owner’s consent
Botnet: a network of bots controlled by criminals
Definition: “A coordinated group of malware instances that are
controlled by a botmaster via some C&C channel”
By one widely cited estimate, 25% of Internet PCs are part of a botnet
Why BotMiner?
Traditional methods are not enough. Botnets can change their C&C content
(encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.),
C&C servers, and infection models
BotMiner clusters similar communication traffic and similar malicious
traffic, then performs cross-cluster correlation to identify the hosts
that share both similar communication patterns and similar malicious
activity patterns
Revisit the definition of a botnet
“A coordinated group of malware instances that are
controlled by a botmaster via some C&C channel”
We need to monitor two planes
C-plane (C&C communication plane): “who is talking to whom”
A-plane (malicious activity plane): “who is doing what”
Horizontal correlation
Bots are for long-term use
Botnet: communication and activities are
coordinated/similar
System description
[Figure: BotMiner architecture. Network traffic feeds the A-plane and C-plane monitors and clustering; cross-plane correlation of the two produces the report.]
A-plane monitor and clustering
Log information on who is doing what
Monitor four types of malicious activities
Scanning
Spamming
Binary downloading
Exploit attempts
Built on Snort, adapting some existing intrusion detection techniques
(e.g. BotHunter, PEHunter)
Two-layer clustering on activity logs
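The two-layer idea can be sketched roughly as follows. This is an illustration only, not the BotMiner implementation: it assumes the first layer groups hosts by activity type and the second layer groups them by one activity-specific feature (here, a hypothetical scanned port or mail protocol). The log format and the function name `two_layer_clusters` are made up for the example.

```python
from collections import defaultdict

# Hypothetical activity log entries: (host, activity_type, feature).
# The feature column (a scanned port, a mail protocol) is an assumption
# made for illustration.
activity_log = [
    ("10.0.0.5", "scan", 445),
    ("10.0.0.7", "scan", 445),
    ("10.0.0.9", "scan", 22),
    ("10.0.0.5", "spam", "smtp"),
    ("10.0.0.7", "spam", "smtp"),
]


def two_layer_clusters(log):
    """Layer 1 groups hosts by activity type, layer 2 by activity feature."""
    by_type = defaultdict(lambda: defaultdict(set))
    for host, activity_type, feature in log:
        by_type[activity_type][feature].add(host)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {t: dict(features) for t, features in by_type.items()}


a_clusters = two_layer_clusters(activity_log)
for activity_type, features in a_clusters.items():
    for feature, hosts in features.items():
        print(activity_type, feature, sorted(hosts))
```

Each innermost set of hosts plays the role of one A-cluster in the later cross-plane step.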
C-plane monitor and clustering
Captures network flows and records information on
who is talking to whom
Adapts an efficient network flow capture tool named
fcapture, which is based on the Judy library
Each flow record contains the following information:
time, duration, source IP, source port, destination IP,
destination port, and the number of packets and bytes
transferred in both directions
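A flow record with these fields could be modelled as below. This is a minimal sketch: the class name, field names, and units are assumptions, and the `protocol` field is added only because the later aggregation step groups flows by protocol. It does not reflect fcapture's actual record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    time: float        # flow start time in seconds (assumed unit)
    duration: float    # flow duration in seconds (assumed unit)
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets_out: int   # packets sent from source to destination
    packets_in: int    # packets sent from destination to source
    bytes_out: int
    bytes_in: int
    protocol: str = "tcp"  # assumption: not in the field list above

    @property
    def total_packets(self) -> int:
        return self.packets_out + self.packets_in

    @property
    def total_bytes(self) -> int:
        return self.bytes_out + self.bytes_in


flow = FlowRecord(
    time=0.0, duration=2.5,
    src_ip="10.0.0.5", src_port=40001,
    dst_ip="192.0.2.10", dst_port=6667,
    packets_out=10, packets_in=8,
    bytes_out=1200, bytes_in=900,
)
print(flow.total_packets, flow.total_bytes)
```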
Architecture of the C-plane clustering
The first two steps are not critical; however, they can reduce the
traffic workload and make the actual clustering process
more efficient
In the third step, given an epoch E (typically one day), all
TCP/UDP flows that share the same protocol, source IP,
destination IP and port are aggregated into the same C-flow
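The aggregation step amounts to grouping flows by a four-part key. A minimal sketch, assuming flows are plain dictionaries with hypothetical key names (`proto`, `src_ip`, `dst_ip`, `dst_port`) and that all flows passed in belong to one epoch:

```python
from collections import defaultdict


def aggregate_c_flows(flows):
    """Group flows by (protocol, source IP, destination IP, destination port)."""
    c_flows = defaultdict(list)
    for flow in flows:
        key = (flow["proto"], flow["src_ip"], flow["dst_ip"], flow["dst_port"])
        c_flows[key].append(flow)
    return dict(c_flows)


# Illustrative flows from one epoch; the source port is ignored by the key.
flows = [
    {"proto": "tcp", "src_ip": "10.0.0.5", "src_port": 40001,
     "dst_ip": "192.0.2.10", "dst_port": 6667},
    {"proto": "tcp", "src_ip": "10.0.0.5", "src_port": 40002,
     "dst_ip": "192.0.2.10", "dst_port": 6667},
    {"proto": "udp", "src_ip": "10.0.0.5", "src_port": 5353,
     "dst_ip": "192.0.2.10", "dst_port": 53},
]

c_flows = aggregate_c_flows(flows)
for key, members in c_flows.items():
    print(key, len(members))
```

The two TCP flows differ only in source port, so they land in the same C-flow.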
Extract a number of statistical features from each C-flow
and translate them into d-dimensional pattern vectors:
compute the discrete sample distribution of
(currently) four random variables
the number of flows per hour (fph)
the number of packets per flow (ppf)
the average number of bytes per packet (bpp)
the average number of bytes per second (bps)
Temporal statistical distribution information: FPH and BPS
Spatial statistical distribution information: BPP and PPF
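The four variables can be computed from the flows of one C-flow roughly as follows. This sketch assumes each flow is a `(start_seconds, duration_seconds, packets, bytes)` tuple with the start time measured from the beginning of the epoch, and that hours with no flows count as zero-valued fph samples; both are assumptions made for the example.

```python
from collections import Counter


def c_flow_samples(flows, epoch_hours=24):
    """Return the sample lists for fph, ppf, bpp and bps of one C-flow."""
    # fph: how many flows started in each hour of the epoch.
    per_hour = Counter(int(start // 3600) for start, _, _, _ in flows)
    fph = [per_hour.get(hour, 0) for hour in range(epoch_hours)]
    # ppf: packets in each flow.
    ppf = [packets for _, _, packets, _ in flows]
    # bpp: average bytes per packet within each flow.
    bpp = [nbytes / packets for _, _, packets, nbytes in flows if packets > 0]
    # bps: average bytes per second within each flow.
    bps = [nbytes / duration for _, duration, _, nbytes in flows if duration > 0]
    return {"fph": fph, "ppf": ppf, "bpp": bpp, "bps": bps}


flows = [(0, 2.0, 10, 1000), (1800, 4.0, 20, 4000), (7200, 1.0, 5, 250)]
samples = c_flow_samples(flows)
print(samples["fph"][:3], samples["ppf"], samples["bpp"], samples["bps"])
```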
Compute the overall discrete sample distribution of
the random variable considering all the C-flows in the
traffic for an epoch E, then describe each C-flow's
(approximate) distribution of that random variable as a
vector of 13 elements.
Apply the same algorithm for all four random
variables, and therefore we map each C-flow into a
pattern vector of d = 52 elements
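One way to obtain 13 elements per variable is to take 12 cut points from the overall distribution and record the fraction of a C-flow's samples in each resulting bin. The sketch below does this with nearest-rank quantiles; the specific quantile levels and all function names are assumptions chosen for illustration, not a statement of the exact binning used in the paper.

```python
from bisect import bisect_right

VARIABLES = ("fph", "ppf", "bpp", "bps")
# Assumed quantile levels in percent: 12 cut points give 13 bins.
QUANTILE_PERCENTS = [5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90]


def quantile_cuts(all_samples, percents=QUANTILE_PERCENTS):
    """Nearest-rank cut points taken from the overall sample distribution."""
    ordered = sorted(all_samples)
    n = len(ordered)
    return [ordered[min(n - 1, pct * n // 100)] for pct in percents]


def bin_vector(samples, cuts):
    """Fraction of one C-flow's samples in each of the len(cuts) + 1 bins."""
    counts = [0] * (len(cuts) + 1)
    for value in samples:
        counts[bisect_right(cuts, value)] += 1
    total = len(samples) or 1
    return [count / total for count in counts]


def pattern_vector(c_flow, cuts_by_var):
    """Concatenate the four 13-element vectors into one 52-element vector."""
    vector = []
    for name in VARIABLES:
        vector.extend(bin_vector(c_flow[name], cuts_by_var[name]))
    return vector


# Illustrative overall distributions and one C-flow.
overall = {name: list(range(100)) for name in VARIABLES}
cuts_by_var = {name: quantile_cuts(values) for name, values in overall.items()}
c_flow = {"fph": [1, 2, 3], "ppf": [10, 20], "bpp": [50, 60, 70], "bps": [5, 95]}
vec = pattern_vector(c_flow, cuts_by_var)
print(len(vec))
```

Because the cut points come from all C-flows in the epoch, the bins are comparable across C-flows, which is what makes the vectors usable as clustering features.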
Why multi-step?
Coarse-grained clustering
Uses a reduced feature space: mean and variance of the distribution of
FPH, PPF, BPP, BPS for each C-flow (2*4=8)
Efficient clustering algorithm: X-means
Fine-grained clustering
Uses the full feature space (13*4=52)
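The reduced 8-dimensional feature vector for the coarse-grained step can be sketched as below. The use of population variance and the function name `coarse_features` are assumptions. The clustering itself is not shown: X-means is not part of the Python standard library, so this example stops at feature extraction.

```python
from statistics import mean, pvariance

VARIABLES = ("fph", "ppf", "bpp", "bps")


def coarse_features(c_flow):
    """Mean and variance of each of the four variables: 2 * 4 = 8 features."""
    features = []
    for name in VARIABLES:
        samples = c_flow[name]
        features.append(mean(samples))
        # Population variance is an assumption; sample variance would also work.
        features.append(pvariance(samples))
    return features


c_flow = {
    "fph": [2.0, 4.0],
    "ppf": [10.0, 10.0],
    "bpp": [100.0, 300.0],
    "bps": [50.0, 150.0],
}
print(coarse_features(c_flow))
```

Clustering first on 8 features and then refining each coarse cluster on the full 52 features keeps the expensive step confined to smaller groups of C-flows.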
Cross-plane correlation
Botnet score s(h) for every host h
h will receive a high score if it has performed multiple types
of suspicious activities, and if other hosts that were clustered
with h also show the same multiple types of activities
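A score with this behaviour can be sketched using set overlap. The version below is a simplified stand-in and not the paper's exact formula: it uses the Jaccard overlap between clusters, treats all clusters as equally weighted, and the function names and data layout are assumptions.

```python
def jaccard(a, b):
    """Overlap of two host sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters):
    """a_clusters: list of (activity_type, host_set); c_clusters: list of host sets."""
    mine = [(kind, hosts) for kind, hosts in a_clusters if host in hosts]
    score = 0.0
    # Reward pairs of the host's A-clusters with different activity types
    # that contain largely the same hosts.
    for i in range(len(mine)):
        for j in range(i + 1, len(mine)):
            if mine[i][0] != mine[j][0]:
                score += jaccard(mine[i][1], mine[j][1])
    # Reward overlap between the host's A-clusters and its C-clusters.
    for _, a_hosts in mine:
        for c_hosts in c_clusters:
            if host in c_hosts:
                score += jaccard(a_hosts, c_hosts)
    return score


a_clusters = [("scan", {"h1", "h2"}), ("spam", {"h1", "h2"}), ("scan", {"h3"})]
c_clusters = [{"h1", "h2"}, {"h3", "h4"}]
for host in ("h1", "h3", "h4"):
    print(host, botnet_score(host, a_clusters, c_clusters))
```

Host h1 scans and spams alongside h2 and also shares a communication cluster with it, so it scores well above h3 (one activity type) and h4 (no logged activity).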
Similarity score between hosts h_i and h_j
Two hosts that appear in the same A-clusters and in at least one common
C-cluster are clustered together
Use the Davies-Bouldin (DB) validation index to find the
best dendrogram cut, which produces the most compact
and well separated clusters
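The Davies-Bouldin index itself is straightforward to compute for a candidate partition: for each cluster, take the worst ratio of combined within-cluster scatter to centroid distance against any other cluster, then average. Lower values mean more compact, better separated clusters, so the cut with the lowest index would be preferred. The sketch below assumes Euclidean distance, at least two clusters, and distinct centroids; the example points are made up.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists."""
    centers = [centroid(cluster) for cluster in clusters]
    # Scatter: mean distance from each point to its cluster centroid.
    scatter = [
        sum(dist(point, center) for point in cluster) / len(cluster)
        for cluster, center in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two candidate partitions of the same four points.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```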
Conclusion
Evading C-plane monitoring and clustering
Misuse whitelist
Manipulate communication patterns
Evading A-plane monitoring and clustering
Very stealthy activity
Individualize bots’ communication/activity
Evading cross-plane analysis
Extremely delayed task
Propose a detection framework which is independent
of botnet C&C protocol and structure, and requires no
a priori knowledge of specific botnets
Build a prototype system based on the general
detection framework, and evaluate it with multiple
real-world network traces including normal traffic and
several real-world botnet traces
Offline system
Long-term data collection and analysis
No incremental analysis capability
The experiment is not convincing enough
Only shows the system performance on day 2; what
about the other days?
Not a real “real world experiment”
Fast detection and online analysis
More efficient clustering, more robust features
More experiments in different, real network
environments
Slides of the paper in USENIX Security ’08
http://faculty.cs.tamu.edu/guofei/paper/botMiner-Security08-slides.pdf
Sad Planet, Kayak Adventure. Botnets on the Rampage
http://birdhouse.org/blog/2006/11/16/botnets-on-the-rampage/
Beware of Potential Confickor BotNet Chaos
http://thejunction.net/2009/03/25/april-1st-beware-of-potential-botnet-chaos/
Oracle Data Mining Techniques and Algorithms
http://www.oracle.com/technology/products/bi/odm/odm_techniques_algorithms.html
Questions?