SMITE_GaTech_0309

Botnet and Spam Detection in
High-Speed Networks
Wenke Lee and Nick Feamster
Georgia Tech
Overview
• Problem: Botnet and Spam Detection in
high-speed networks
• Common theme: Examine network-level
properties and build classifier
• Two systems: BotMiner and SNARE
– Overview
– Integration with SMITE architecture
• Current integration status and plan
BotMiner: Structure and Protocol Independent
• Botnets can change their C&C content
(encryption, etc.), protocols (IRC, HTTP, etc.),
structures (P2P, etc.), C&C servers, infection
models …
[Figure: botnet structure diagrams, panels (a) and (b), with a C&C node and bot nodes]
Definition of a Botnet
• “A coordinated group of malware instances that
are controlled by a botmaster via some C&C
channel”
– Hosts that have similar C&C-like traffic and similar
malicious activities
• We need to monitor two planes
– C-plane (C&C communication plane): “who is talking
to whom”
– A-plane (malicious activity plane): “who is doing what”
BotMiner Architecture
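[Figure: BotMiner architecture in three columns (Sensors, Algorithms, Correlation). Network traffic feeds an A-Plane Monitor (scan, spam, binary downloading, exploit, ...) that produces an activity log, and a C-Plane Monitor that produces a flow log. A-plane clustering and C-plane clustering feed cross-plane correlation, which produces reports.]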
BotMiner C-plane Clustering
• What characterizes a communication flow (C-flow) between a local host and a remote service?
– <protocol, srcIP, dstIP, dstPort>
– Distributions of temporal statistics
– E.g., BPS (bytes per second), FPH (flows per hour)
– Distributions of spatial statistics
– E.g., BPP (bytes per packet), PPF (packets per flow)
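The four C-flow statistics named above can be sketched as follows. This is a minimal illustration, not BotMiner's code: the record fields (`start`, `duration`, `packets`, `bytes`) are hypothetical, and it reports a single average per C-flow where the slides refer to distributions.

```python
from collections import defaultdict


def cflow_features(flows):
    """Aggregate flow records into per-C-flow averages.

    Each record is a hypothetical dict with keys: protocol, src_ip, dst_ip,
    dst_port, start (seconds), duration (seconds), packets, bytes.
    """
    groups = defaultdict(list)
    for f in flows:
        # C-flow key as on the slide: <protocol, srcIP, dstIP, dstPort>
        key = (f["protocol"], f["src_ip"], f["dst_ip"], f["dst_port"])
        groups[key].append(f)

    features = {}
    for key, recs in groups.items():
        total_bytes = sum(r["bytes"] for r in recs)
        total_packets = sum(r["packets"] for r in recs)
        total_duration = sum(r["duration"] for r in recs)
        # Observation window in hours, floored at one hour (an assumption)
        # so that very short traces do not inflate flows per hour.
        span = max(r["start"] + r["duration"] for r in recs) - min(r["start"] for r in recs)
        hours = max(span / 3600.0, 1.0)
        features[key] = {
            "bps": total_bytes / total_duration if total_duration > 0 else 0.0,
            "fph": len(recs) / hours,
            "bpp": total_bytes / total_packets if total_packets > 0 else 0.0,
            "ppf": total_packets / len(recs),
        }
    return features
```

A real sensor would likely keep histograms of these values per time window instead of one average.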
A-plane Clustering
• Capture “similar activity patterns”
Cross-plane Correlation
• Botnet score s(h) for every host h
– A host scores higher if it appears in more activity
clusters and in both activity and
communication clusters
– A host with a high score is likely a bot
• Similarity score between bot hosts h_i and h_j
– Two hosts that share A-clusters and at
least one common C-cluster are clustered
together
– Each resulting cluster is a botnet
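The scoring and grouping described above might be sketched as below. The weighting (one point per activity cluster plus one point per A-cluster/C-cluster pair containing the host), the threshold in the test, and both function names are illustrative assumptions; the slides do not give BotMiner's actual formula.

```python
def botnet_scores(a_clusters, c_clusters):
    """Score each host seen in any activity cluster.

    a_clusters, c_clusters: lists of sets of host identifiers.
    The weighting here is an assumption made for illustration.
    """
    hosts = set().union(*a_clusters) if a_clusters else set()
    scores = {}
    for h in hosts:
        n_a = sum(1 for a in a_clusters if h in a)
        n_c = sum(1 for c in c_clusters if h in c)
        # More activity clusters raise the score; appearing in both planes
        # raises it further.
        scores[h] = n_a + n_a * n_c
    return scores


def group_bots(bots, a_clusters, c_clusters):
    """Group suspected bots that share an A-cluster and a C-cluster."""
    def shares(h1, h2, clusters):
        return any(h1 in c and h2 in c for c in clusters)

    groups = []
    for h in sorted(bots):
        linked = [g for g in groups
                  if any(shares(h, o, a_clusters) and shares(h, o, c_clusters) for o in g)]
        merged = {h}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups
```

Hosts whose score exceeds a chosen threshold would be passed to `group_bots`, and each returned group is a candidate botnet.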
SMITE Integration: BotMiner
Integrating BotMiner and SMITE
• Sensors
– Feature extraction for C-Plane and A-Plane
clustering
– C-Flow temporal and statistical features
• Counting packets and connections between each
pair of endpoints: bytes per second, flows per
hour, bytes per packet, packets per flow
– A-Plane header and payload features
• Destination IP addresses and ports, payload
bytes/strings
– These sensors are not specific to BotMiner
Integrating BotMiner and SMITE
• Algorithms
– C-plane clustering
• Multi-step clustering based on statistical and temporal C-flow
features
– A-plane clustering
• Based on activity-specific similarity measures: e.g., spread of
destination IP addresses and ports, Dice’s coefficient of
string similarity, and byte frequency or entropy of payload
– Bot scoring and botnet clustering methods
• Scoring based on participation in C-plane and A-plane
clusters
• Clustering based on common memberships in the C-plane
and A-plane clusters
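Two of the A-plane similarity measures named above can be sketched in a few lines. This assumes Dice's coefficient is taken over character bigrams and that entropy means Shannon entropy of byte frequencies; the slides do not specify either detail.

```python
import math
from collections import Counter


def dice_coefficient(a, b):
    """Dice's coefficient over character bigram multisets (assumed variant)."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared bigram.
    overlap = sum((bigrams_a & bigrams_b).values())
    return 2.0 * overlap / total


def byte_entropy(payload):
    """Shannon entropy of the byte frequencies, in bits per byte."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Encrypted or compressed payloads tend toward 8 bits per byte, while plain-text commands score lower.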
Integrating BotMiner and SMITE
• Correlation
– Botnet detection involves both vertical and horizontal
analysis/clustering:
• Vertical: what activities a host has been involved in
– Bot detection
• Horizontal: what other hosts have similar (vertical) behavior
patterns
– Botnet detection
– Similar analysis can be applied to other alerts
• Improve botnet detection
• Understand malicious activities and plans of attacks
• Measure the scale of attacks
Network-Based Spam Detection
• Filter email based on how it is sent, in
addition to simply what is sent.
• Network-level properties are less
malleable
– Hosting or upstream ISP (AS number)
– Membership in a botnet (spammer, hosting
infrastructure)
– Network location of sender and receiver
– Set of target recipients
Finding the Right Features
• Goal: Sender reputation from a single packet
header?
– Low overhead
– Fast classification
– In-network
– Perhaps more evasion resistant
• Key challenge
– What features satisfy these properties and can
distinguish spammers from legitimate senders?
Network-Level Features
• Single-Packet
– AS of sender’s IP
– Distance to k nearest senders
– Status of email service ports
– Geodesic distance
– Time of day
• Single-Message
– Number of recipients
– Length of message
• Aggregate (Multiple Message/Recipient)
Sender-Receiver Geodesic Distance
90% of legitimate
messages travel 2,200
miles or less
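A sender-receiver distance of this kind could be computed as sketched below, assuming both IP addresses have already been geolocated to latitude and longitude (the geolocation step is not shown). The haversine formula treats the Earth as a sphere, so it only approximates a true geodesic; `geodesic_miles` is an illustrative name.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

The resulting distance is one input feature to the classifier, not a decision rule by itself.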
Density of Senders in IP Space
For spammers, k
nearest senders
are much closer
in IP space
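One way to measure this density is the mean numeric distance from a sender to its k nearest neighbours in IPv4 space, sketched here. Treating addresses as 32-bit integers and averaging the k smallest gaps is an assumption about how the feature is defined; `knn_ip_distance` is an illustrative name.

```python
import ipaddress


def knn_ip_distance(sender, others, k):
    """Mean numeric distance from sender to its k nearest IPv4 neighbours.

    Returns None when no other senders are known.
    """
    target = int(ipaddress.IPv4Address(sender))
    dists = sorted(
        abs(target - int(ipaddress.IPv4Address(o)))
        for o in others
        if o != sender
    )
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else None
```

A small value means the sender sits in a densely populated block of recently seen senders.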
Local Time of Day at Sender
Spammers “peak” at
different local times
of day
Other Network-Level Features
• Time-of-day at sender
• Upstream AS of sender
• Message size (and variance)
• Number of recipients (and variance)
Combining Features: RuleFit
• Put features into the RuleFit classifier
• 10-fold cross-validation on one day of query logs
from a large spam filtering appliance provider
• Comparable performance to Spamhaus
– Incorporating into the system can further reduce false positives (FPs)
• Using only network-level features
• Completely automated
Benefits of Whitelisting
Whitelisting top 50 ASes:
False positives reduced to 0.14%
Integrating SNARE and SMITE
[Figure: SNARE components mapped to SMITE sensors and algorithms/correlation]
Integration with SMITE
• Sensors
– Extract network features from traffic
– IP addresses
– Combine with auxiliary data (routing, time, etc.)
• Algorithms
– Clustering algorithm to identify behavioral fingerprints
– Learning algorithm to classify based on multiple features
• Correlation
– Clusters formed by aggregating sending behavior observed
across multiple sensors
– Various features also require input from data collected across
collections of IP addresses
SMITE Integration Challenges
• Sources of labeled data
– SNARE requires clean sources of labeled
data for training
• Data collection
– SNARE’s performance improves when
behavior can be observed across multiple
domains
Overall SMITE Integration
SMITE Integration: Current Work
• Study pipeline architecture and code
• Modify flow-analyzer to dump 5-tuple flow
information
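For readers unfamiliar with the term, 5-tuple flow information means packets grouped by source IP, destination IP, source port, destination port and protocol. The sketch below shows that grouping on hypothetical packet records; it does not reflect the flow-analyzer's real input or output format.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packet records into unidirectional 5-tuple flows.

    Each packet is a hypothetical dict with keys: src_ip, dst_ip, src_port,
    dst_port, protocol, length (bytes), ts (seconds).
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["protocol"])
        f = flows[key]
        f["packets"] += 1
        f["bytes"] += p["length"]
        # Track the first and last timestamps seen for the flow.
        f["first"] = p["ts"] if f["first"] is None else min(f["first"], p["ts"])
        f["last"] = p["ts"] if f["last"] is None else max(f["last"], p["ts"])
    return dict(flows)
```

Each direction of a connection gets its own key here, which matters when only one direction is visible at a sensor.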
SMITE Integration: Phase I
• Modify flow-analyzer with SMITE team to
generate 5-tuple flow information (mid-March)
• Spam/scan detection, flow aggregation in
BotMiner; Spam feature extraction in SNARE
(end of March)
• Clustering and correlation in BotMiner; Classifier
in SNARE (end of April)
SMITE Integration: Phase II
• Evaluate performance of BotMiner and SNARE
– How many hours does it take to process one day of traffic, or what is
the “lag” time between event and detection?
• Design real-time detection algorithms
– A two-tier system: the off-line module outputs lists of suspicious
hosts, and the real-time module inspects all packets of these
hosts; or the off-line module outputs clusters
• Design algorithms to handle asymmetric traffic
– Cluster on each direction of traffic and cross-correlate
Thank You!