BotMiner: Clustering Analysis of Network Traffic for Protocol

Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee
College of Computing, Georgia Institute of Technology
USENIX Security '08
Presented by Lei Wu
April 13th, 2009
- Motivation and Background
- System description
- Experimental analysis
- Conclusion
- This paper proposes BotMiner, a general detection framework that is independent of botnet Command and Control (C&C) protocol and structure, and requires no a priori knowledge of botnets
- Bot
  - A malware instance that runs autonomously and automatically on a compromised computer (zombie) without the owner’s consent
- Botnet: a network of bots controlled by criminals
  - Definition: “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
- By one widely quoted estimate, 25% of Internet PCs are part of a botnet
- Why BotMiner?
  - Traditional methods are not enough. Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, and infection models
  - BotMiner clusters similar communication traffic and similar malicious traffic, then performs cross-cluster correlation to identify the hosts that share both similar communication patterns and similar malicious activity patterns
- Revisit the definition of a botnet
  - “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
- We need to monitor two planes
  - C-plane (C&C communication plane): “who is talking to whom”
  - A-plane (malicious activity plane): “who is doing what”
- Horizontal correlation
  - Bots are for long-term use
  - Botnet: communication and activities are coordinated/similar
[Figure: BotMiner architecture. Network traffic feeds the A-plane monitor and clustering and the C-plane monitor and clustering; cross-plane correlation combines their results into a report.]
- A-plane monitor: logs information on who is doing what
- Monitors four types of malicious activity
  - Scanning
  - Spamming
  - Binary downloading
  - Exploit attempts
- Based on Snort; adapts some existing intrusion detection techniques (e.g. BotHunter, PEHunter)
- Two-layer clustering on activity logs
- C-plane monitor: captures network flows and records information on who is talking to whom
- Adapts an efficient network flow capture tool named fcapture, which is based on the Judy library
- Each flow record contains the following information: time, duration, source IP, source port, destination IP, destination port, and the number of packets and bytes transferred in both directions
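The flow record listed above could be modelled as follows. This is only an illustration of the fields, not fcapture's actual record layout; the class name, field names and units are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One captured flow; names and units are hypothetical."""
    time: float       # flow start time, seconds (assumed unit)
    duration: float   # flow duration, seconds (assumed unit)
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    pkts_out: int     # packets, source -> destination
    pkts_in: int      # packets, destination -> source
    bytes_out: int    # bytes, source -> destination
    bytes_in: int     # bytes, destination -> source

    @property
    def total_packets(self) -> int:
        return self.pkts_out + self.pkts_in

    @property
    def total_bytes(self) -> int:
        return self.bytes_out + self.bytes_in


flow = FlowRecord(0.0, 2.0, "10.0.0.1", 40001, "198.51.100.7", 6667,
                  pkts_out=6, pkts_in=4, bytes_out=600, bytes_in=400)
```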
- Architecture of the C-plane clustering
  - The first two steps are not critical; however, they reduce the traffic workload and make the actual clustering process more efficient
  - In the third step, given an epoch E (typically one day), all TCP/UDP flows that share the same protocol, source IP, destination IP and port are aggregated into the same C-flow
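The aggregation step can be sketched as a group-by over the flows of one epoch. The `Flow` tuple, its field names and the sample addresses are hypothetical; the grouping key (protocol, source IP, destination IP, destination port) follows the description above.

```python
from collections import defaultdict, namedtuple

# Hypothetical minimal flow tuple; field names are assumptions.
Flow = namedtuple("Flow", "time proto src_ip src_port dst_ip dst_port")

EPOCH_SECONDS = 24 * 3600  # one day, the typical epoch length


def aggregate_cflows(flows, epoch_start):
    """Group the flows of one epoch by (proto, src_ip, dst_ip, dst_port)."""
    cflows = defaultdict(list)
    for f in flows:
        if epoch_start <= f.time < epoch_start + EPOCH_SECONDS:
            # The source port is deliberately not part of the key.
            cflows[(f.proto, f.src_ip, f.dst_ip, f.dst_port)].append(f)
    return dict(cflows)


flows = [
    Flow(10.0, "tcp", "10.0.0.1", 40001, "198.51.100.7", 6667),
    Flow(3700.0, "tcp", "10.0.0.1", 40002, "198.51.100.7", 6667),
    Flow(50.0, "udp", "10.0.0.2", 5353, "198.51.100.9", 53),
    Flow(90000.0, "tcp", "10.0.0.1", 40003, "198.51.100.7", 6667),  # next epoch
]
cflows = aggregate_cflows(flows, epoch_start=0.0)
```

The first two flows use different source ports but fall into the same C-flow; the last one is outside the epoch and is ignored.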
- Extract a number of statistical features from each C-flow and translate them into d-dimensional pattern vectors: compute the discrete sample distribution of (currently) four random variables
  - the number of flows per hour (fph)
  - the number of packets per flow (ppf)
  - the average number of bytes per packet (bpp)
  - the average number of bytes per second (bps)
- Temporal statistical distribution information: fph and bps
- Spatial statistical distribution information: bpp and ppf
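As an illustration, the samples of the four random variables for one C-flow might be collected like this. The `Flow` fields are hypothetical, and counting hours with zero flows in fph is an assumption of this sketch.

```python
from collections import namedtuple

# Hypothetical per-flow fields: start time (s), duration (s), packets, bytes.
Flow = namedtuple("Flow", "time duration pkts bytes")


def cflow_samples(flows, epoch_start=0.0, hours=24):
    """Return the sample lists of fph, ppf, bpp and bps for one C-flow."""
    fph = [0] * hours  # one count per hour of the epoch (zeros kept: assumption)
    for f in flows:
        hour = int((f.time - epoch_start) // 3600)
        if 0 <= hour < hours:
            fph[hour] += 1
    ppf = [f.pkts for f in flows]                                   # packets per flow
    bpp = [f.bytes / f.pkts for f in flows if f.pkts > 0]           # bytes per packet
    bps = [f.bytes / f.duration for f in flows if f.duration > 0]   # bytes per second
    return {"fph": fph, "ppf": ppf, "bpp": bpp, "bps": bps}


flows = [
    Flow(10.0, 2.0, 10, 1000),
    Flow(20.0, 4.0, 20, 1000),
    Flow(3700.0, 1.0, 5, 500),
]
samples = cflow_samples(flows)
```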
- For each random variable, compute the overall discrete sample distribution over all the C-flows in the traffic for an epoch E, then describe each C-flow's (approximate) distribution of that variable as a vector of 13 elements
- Apply the same algorithm to all four random variables, so that each C-flow maps to a pattern vector of d = 4 * 13 = 52 elements
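One way to obtain a 13-element vector is to derive 12 cut points from the pooled distribution and histogram each C-flow's samples over the resulting 13 intervals. The slide gives only the vector length, so the quantile levels below, the simple empirical quantile rule and all function names are assumptions of this sketch.

```python
from bisect import bisect_left

# Assumed quantile levels in percent: 12 cut points -> 13 intervals.
LEVELS_PCT = [5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90]
VARIABLES = ("fph", "ppf", "bpp", "bps")


def cut_points(pooled_samples, levels_pct=LEVELS_PCT):
    """Empirical quantiles of the samples pooled over all C-flows in the epoch."""
    xs = sorted(pooled_samples)
    n = len(xs)
    return [xs[min(n - 1, p * n // 100)] for p in levels_pct]


def to_vector(samples, cuts):
    """Fraction of one C-flow's samples in each of the len(cuts) + 1 intervals."""
    counts = [0] * (len(cuts) + 1)
    for x in samples:
        counts[bisect_left(cuts, x)] += 1  # interval (k_{i-1}, k_i]
    n = len(samples) or 1
    return [c / n for c in counts]


def pattern_vector(samples_by_var, cuts_by_var):
    """Concatenate the four 13-element vectors into one 52-element vector."""
    vec = []
    for name in VARIABLES:
        vec.extend(to_vector(samples_by_var[name], cuts_by_var[name]))
    return vec


pooled = list(range(1, 101))          # toy pooled distribution
cuts = cut_points(pooled)
vec = to_vector([1, 6, 7, 100], cuts)  # one toy C-flow
```

Because the cut points come from the pooled data, every C-flow is described on the same 13 intervals and the vectors are directly comparable.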
- Why multi-step?
- Coarse-grained clustering
  - Uses a reduced feature space: the mean and variance of the distribution of fph, ppf, bpp and bps for each C-flow (2 * 4 = 8 features)
  - Efficient clustering algorithm: X-means
- Fine-grained clustering
  - Uses the full feature space (13 * 4 = 52 features)
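The two steps can be sketched as below, assuming each coarse cluster is refined separately in the full feature space. X-means is not in the Python standard library, so the clustering routine is passed in as a callable; `by_first_feature` is a toy stand-in, not X-means. Using the population variance is also an assumption.

```python
from statistics import fmean, pvariance

VARIABLES = ("fph", "ppf", "bpp", "bps")


def reduced_features(samples):
    """8-element vector: (mean, variance) of each of the four variables."""
    vec = []
    for name in VARIABLES:
        xs = samples[name]
        vec.extend([fmean(xs), pvariance(xs)])  # population variance: assumption
    return vec


def two_step(items, coarse_cluster, fine_cluster):
    """items: list of (cflow_id, reduced_vec, full_vec).
    Each clusterer maps a list of vectors to a list of labels."""
    coarse_labels = coarse_cluster([reduced for _, reduced, _ in items])
    groups = {}
    for (cid, _, full), label in zip(items, coarse_labels):
        groups.setdefault(label, []).append((cid, full))
    result = {}
    for label, members in groups.items():
        # Refine each coarse cluster using the full feature vectors.
        fine_labels = fine_cluster([full for _, full in members])
        for (cid, _), fine in zip(members, fine_labels):
            result[cid] = (label, fine)
    return result


def by_first_feature(vectors):
    """Toy stand-in for X-means: label 1 if the first feature exceeds 1.0."""
    return [int(v[0] > 1.0) for v in vectors]


# Toy vectors, shortened for readability (real ones would have 8 and 52 elements).
items = [
    ("cflow-a", [0.5], [0.2, 0.0]),
    ("cflow-b", [0.6], [3.0, 0.0]),
    ("cflow-c", [9.0], [0.1, 0.0]),
]
labels = two_step(items, by_first_feature, by_first_feature)
```

The expensive 52-dimensional clustering then runs only within each coarse cluster rather than over all C-flows at once.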
- Cross-plane correlation: compute a botnet score s(h) for every host h
  - h receives a high score if it has performed multiple types of suspicious activities, and if other hosts that were clustered with h also show the same multiple types of activities
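The slide does not reproduce the score formula. The sketch below is one plausible reading of the description: for the clusters that contain h, it sums weighted Jaccard overlaps between A-clusters of different activity types and between A-clusters and C-clusters. The function names, the data layout and the unit weights are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two host sets, |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def botnet_score(h, a_clusters, c_clusters, weight):
    """a_clusters: list of (activity_type, host_set); c_clusters: list of host sets."""
    a_h = [(t, hosts) for t, hosts in a_clusters if h in hosts]
    c_h = [hosts for hosts in c_clusters if h in hosts]
    score = 0.0
    # Co-membership in A-clusters of different activity types.
    for (t1, a1), (t2, a2) in combinations(a_h, 2):
        if t1 != t2:
            score += weight[t1] * weight[t2] * jaccard(a1, a2)
    # Co-membership in an A-cluster and a C-cluster.
    for t, a in a_h:
        for c in c_h:
            score += weight[t] * jaccard(a, c)
    return score


a_clusters = [("scan", {"h1", "h2", "h3"}), ("spam", {"h1", "h2"}), ("scan", {"h4"})]
c_clusters = [{"h1", "h2"}, {"h3", "h4"}]
weight = {"scan": 1.0, "spam": 1.0}  # hypothetical activity weights
scores = {h: botnet_score(h, a_clusters, c_clusters, weight)
          for h in ("h1", "h2", "h3", "h4")}
```

In this toy data h1 and h2 scan and spam together and share a C-cluster, so they score far higher than h3 or h4.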
- Similarity score between hosts h_i and h_j
  - Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
- Use the Davies-Bouldin (DB) validation index to find the best dendrogram cut, which produces the most compact and well-separated clusters
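For reference, the standard Davies-Bouldin index can be computed as below; a lower value means more compact, better separated clusters. The toy 2-D points and the idea of passing in a list of candidate partitions (one per dendrogram cut) are assumptions of this sketch, which uses Euclidean distance rather than a host similarity score.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """clusters: list of point lists. Needs >= 2 clusters with distinct centroids."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


def best_cut(candidate_partitions):
    """Pick the partition (dendrogram cut) with the lowest DB index."""
    return min(candidate_partitions, key=davies_bouldin)


good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]   # tight, far apart
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]    # stretched, close together
chosen = best_cut([bad, good])
```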
- Evading C-plane monitoring and clustering
  - Misuse the whitelist
  - Manipulate communication patterns
- Evading A-plane monitoring and clustering
  - Very stealthy activity
  - Individualize bots’ communication/activity
- Evading cross-plane analysis
  - Extremely delayed task
- Propose a detection framework that is independent of botnet C&C protocol and structure, and requires no a priori knowledge of specific botnets
- Build a prototype system based on the general detection framework, and evaluate it with multiple real-world network traces including normal traffic and several real-world botnet traces
- Offline system
  - Long-time data collection and analysis
  - No incremental analysis capability
- The experiment is not convincing enough
  - Only shows the system performance on day 2; what about the other days?
  - Not a truly “real-world” experiment
- Fast detection and online analysis
- More efficient clustering, more robust features
- More experiments in different, real network environments
Questions?