A Spam Mail-based Solution for
Botnet Detection and Network
Bandwidth Protection
Fu-Hau Hsu (許富皓)
Department of Computer Science and Information Engineering
National Central University

Outline







Introduction
Background
System Design
Work Flow
Evaluation
Related Work
Conclusion
Spam Mails and Bots


In 2009, research showed that more than 80% of spam mails were sent by the bots of botnets, hereafter called spam bots.
Spam mails take up more than 50% of network bandwidth.

Objectives



Detect members of botnets
Filter spam mails
Save network bandwidth

Observation

We observe that the majority of spam bots are not e-mail servers: spam bots usually only send mails and do not receive them.

E-mail Architecture
Botnets & Spam Mails
System Layout
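[Figure: system layout. End users and the mail server exchange SMTP, POP3, and IMAP traffic through the Packet Analyzer. The Packet Analyzer asks the confirmation host to confirm suspect hosts and can redirect traffic to a honeypot or block it.]
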
System Component – Packet Analyzer (PA) (1)

Located at a router.
Detects spam bots based on the IP ack-packets
 that use the SMTP(S), POP3(S), or IMAP(S) protocol and
 whose sizes are less than 200 bytes (described later).
Uses a credit number to record the mail transmission status of an IP address.

IP Threat Level Table (IPTLT)

IP | Credit | Action

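The table can be pictured as a map from IP address to a small record. The following is only a minimal Python sketch: the field names follow the entry fields printed in the evaluation logs of this talk (ip, credit, nat, mail server, action), while the class names, the dictionary layout, the `get_or_create` helper, and the 0/1 encoding of `action` are assumptions made for illustration, not SpamFinder's kernel data structures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IptltEntry:
    """One row of the IP Threat Level Table (field names follow the logged entries)."""
    ip: str
    credit: int = 0            # mail transmission status of this IP
    nat: bool = False          # set when the IP is judged to be a NAT gateway
    mail_server: bool = False  # set when the Confirmer finds an SMTP service
    action: int = 0            # hypothetical encoding: 0 = accept, 1 = drop


class Iptlt:
    """Hypothetical in-memory table keyed by source IP address."""

    def __init__(self) -> None:
        self.entries: Dict[str, IptltEntry] = {}

    def get_or_create(self, ip: str) -> IptltEntry:
        # Every IP seen as the source of a mail packet gets an entry.
        if ip not in self.entries:
            self.entries[ip] = IptltEntry(ip=ip)
        return self.entries[ip]
```

A caller would fetch the entry for a packet's source IP with `get_or_create` and then adjust its `credit` field.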
System Component – Packet Analyzer (PA) (2)

The IPTLT is cleaned periodically to handle dynamically allocated IPs (for example, addresses assigned by DHCP).
A NAT detection mechanism is added to avoid harming innocent hosts behind a NAT.

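Periodic cleaning can be sketched as follows. This assumes the whole table is emptied once a fixed interval has elapsed. The interval value, the class name, and the `maybe_clean` helper are hypothetical, since the talk does not state how often SpamFinder cleans the table or how the cleanup is triggered.

```python
import time
from typing import Dict, Optional


class CleanableTable:
    """Hypothetical credit table that is emptied at a fixed interval."""

    def __init__(self, clean_interval_s: float) -> None:
        self.clean_interval_s = clean_interval_s  # assumed tunable, not from the talk
        self.credits: Dict[str, int] = {}
        self.last_clean = time.monotonic()

    def maybe_clean(self, now: Optional[float] = None) -> bool:
        """Drop all entries once the interval has elapsed; return True if cleaned."""
        now = time.monotonic() if now is None else now
        if now - self.last_clean >= self.clean_interval_s:
            self.credits.clear()
            self.last_clean = now
            return True
        return False
```

Emptying the table means that a credit earned by one host is not carried over to another host that later receives the same dynamically assigned address.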
Credit Number


A credit number is a property of an IP address.
The PA assigns a credit number to every IP address that has appeared as the source IP address of a mail packet (SMTP/POP/IMAP).

Operations of Credit Number



Increasing operation
 When the PA detects an SMTP mail packet, the credit number of the packet's source IP is increased by 1.
Decreasing operation
 When the PA detects a POP/IMAP mail packet, the credit number of the packet's source IP is decreased by a higher value, 3.
Note: analysis of real-world traffic shows that the ratio of sent mails to received mails in a network is 1:3.

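The two operations can be written down in a few lines. This is a sketch only: the +1 and -3 values come from the slide, while the function name, the protocol strings, and the use of a `defaultdict` as the credit store are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict

SMTP_INCREMENT = 1   # from the slide: +1 per SMTP mail packet
FETCH_DECREMENT = 3  # from the slide: -3 per POP/IMAP mail packet


def update_credit(credits: DefaultDict[str, int], src_ip: str, protocol: str) -> int:
    """Apply the increasing or decreasing operation and return the new credit."""
    proto = protocol.upper()
    if proto == "SMTP":
        credits[src_ip] += SMTP_INCREMENT
    elif proto in ("POP", "POP3", "IMAP"):
        credits[src_ip] -= FETCH_DECREMENT
    # Any other protocol leaves the credit unchanged.
    return credits[src_ip]
```

A host that only sends mail accumulates credit with every SMTP packet, while a host that also fetches mail has its credit pulled back down.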
Approach to Reduce Packet Analyzer Performance Overhead

Sampling

A router-based solution should avoid high performance overhead; hence, the Packet Analyzer can sample packets instead of inspecting every one.

The sample rate is an adjustable parameter.

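One simple way to realize an adjustable sample rate is a counter that lets through one packet out of every n. The talk does not say whether SpamFinder samples deterministically or at random, so the counter-based approach, the class name, and the method name below are assumptions.

```python
class OneInNSampler:
    """Inspect one packet out of every `n` (n = 1 inspects every packet)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self._seen = 0

    def should_inspect(self) -> bool:
        # Count packets and select every n-th one for analysis.
        self._seen += 1
        if self._seen >= self.n:
            self._seen = 0
            return True
        return False
```

With `n = 500`, which corresponds to a sample rate of 1/500, only 20 packets out of 10,000 reach the analysis code.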
Avoid Noise Created by Large Size Mails

Regardless of a mail's size, the number of protocol-related packets exchanged between the sender and the receiver is similar.
Protocol-related packets are usually smaller than 200 bytes.
To avoid counting a large mail sent by a normal user many times, we filter out e-mail related packets larger than 200 bytes.

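The filter amounts to a port check plus a size check. In the sketch below the 200-byte limit comes from the slides, but the port set is an assumption: it lists the well-known ports for SMTP(S), POP3(S), and IMAP(S), and the talk does not say which ports SpamFinder actually watches. The function name is hypothetical as well.

```python
# Well-known ports (assumed set): SMTP 25, SMTPS 465, submission 587,
# POP3 110, POP3S 995, IMAP 143, IMAPS 993.
MAIL_PORTS = {25, 465, 587, 110, 995, 143, 993}
SIZE_LIMIT_BYTES = 200  # from the slides


def is_counted_mail_packet(dst_port: int, total_len: int) -> bool:
    """Keep only small, protocol-related packets addressed to a mail port."""
    return dst_port in MAIL_PORTS and total_len < SIZE_LIMIT_BYTES
```

A 1500-byte data segment of a large attachment is ignored, so a long mail adds roughly as many counted packets as a short one.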
System Component – Confirmer (1)

Located at a confirmation host.
Checks whether a host is a mail server, because a mail server may exhibit the same behavior as a bot.
Connects to the SMTP port of a host to check whether it is a mail server.

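A probe of this kind might look like the sketch below. The talk only says that the Confirmer connects to the SMTP port. Treating a `220` greeting as the sign of a mail server, the default timeout, and the function name are assumptions added here.

```python
import socket


def looks_like_mail_server(host: str, port: int = 25, timeout_s: float = 3.0) -> bool:
    """Return True if `host` accepts a TCP connection and sends an SMTP 220 greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.settimeout(timeout_s)
            banner = conn.recv(512)
    except OSError:
        # Refused, unreachable, or timed out: not treated as a mail server.
        return False
    return banner.startswith(b"220")
```

A host that refuses the connection or stays silent is reported as not being a mail server, so a bot that merely sends mail does not pass the check.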
System Component – Confirmer (2)

The record that an IP belongs to a confirmed host is kept in the IPTLT until the IPTLT is cleaned up; hence, an IP needs to be confirmed only once between cleanups.

Work Flow
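[Figure: work flow. On the Linux router, packets enter kernel space through the NetFilter PREROUTING hook. For e-mail related traffic, the Packet Analyzer updates the Credit field of the IP threat level table, fetches the Action for the IP, and accepts or drops the packet. The table is cleaned up periodically. A kernel thread passes each suspect IP to the Confirmer on the confirmation host, which checks SMTP on the suspect host and returns the check result used to fill the action field.]
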
Performance Evaluation

Scenario



Send 10,000 mails.
Mail size: 3 KB.
Transmit the mails through the router with and without SpamFinder.
End user (E-mail client)
Host: ASUS Desktop AS-D672
CPU: Intel Pentium 4 Dual Core 3.2 GHz
RAM: 4 GB
LAN: Gigabit Ethernet NIC
OS: Windows 7
SpamFinder
SMTP server
Host: ASUS Desktop AS-D360
CPU: Intel Pentium 4 3.0 GHz
RAM: 512 MB
LAN1: 10 Mb/100 Mb Ethernet Controller
LAN2: 10 Mb/100 Mb Ethernet Controller
OS: Fedora 10, kernel 2.6.27

Performance Evaluation

Evaluation Result

Overhead, O(n%), where n is the sample rate:
O(100%) = 4.13 %
O(0.2%) = 3.8 %

Avg time to send mail:
without SpamFinder: 0.1295395
with SpamFinder: 0.1348885
with SpamFinder and sample rate 1/500: 0.1344632

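The reported overheads are consistent with taking the relative increase in average send time over the run without SpamFinder. The short check below assumes that definition, which the slides do not state explicitly. The function and constant names are hypothetical, and the three timings are copied from the results above.

```python
def overhead_percent(baseline: float, measured: float) -> float:
    """Relative increase of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


WITHOUT_SPAMFINDER = 0.1295395
WITH_SPAMFINDER = 0.1348885          # every packet inspected, O(100%)
WITH_SAMPLING_1_IN_500 = 0.1344632   # sample rate 1/500, O(0.2%)

full = overhead_percent(WITHOUT_SPAMFINDER, WITH_SPAMFINDER)
sampled = overhead_percent(WITHOUT_SPAMFINDER, WITH_SAMPLING_1_IN_500)
print(f"O(100%) = {full:.2f} %, O(0.2%) = {sampled:.1f} %")
```

Under this definition the two values round to 4.13 % and 3.8 %, matching the figures on the slide.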
Effectiveness Evaluation

Scenario



Analyze real-world traffic (about 1,300 computers in the NCTU dorm network) provided by NBL (Network Benchmarking Lab) @ NCTU.
Analyze the whole-day traffic of 6/13/2010 (about 2 TB).
Replay the traffic at 250 to 350 Mb/s.
[Figure: real-world traffic logs are replayed to SpamFinder.]
SpamFinder
Host: HP CQ-45 Notebook
CPU: Intel Core 2 Duo P7450 / 2.13 GHz
RAM: 4 GB
LAN: 10/100/1000 Gigabit Ethernet LAN
OS: Fedora 12, kernel 2.6.32

Effectiveness Evaluation

The analysis yields the following information:

The ratio of sent data to received data is 1:3.
With a credit threshold of 150, SpamFinder can save 25% of e-mail related traffic.
Average packet drop ratio: 0.31%

NBL uses a Cisco 7609 router to collect packet traces.
We use a notebook to perform our analysis.

Effectiveness Evaluation

The analysis yields the following information:

SpamFinder detected 2 spam bots after analyzing 1 day of traffic from the NCTU dorm network.
Note: according to reports from tyc.edu.tw, an average of 4.1 hosts per day on the NCU campus are reported as spam hosts.

Effectiveness Evaluation

[Figure: save ratio of e-mail related traffic (y-axis, 23.6% to 25.2%) versus credit number threshold (x-axis: 150, 200, 300, 450, 650).]

Effectiveness Evaluation

[5068] ip: 140.#.#.135, credit: 18147, nat: 0, mail server: 0, action: 0
...
Jun 17 14:11:51 [SEND] 140.#.#.135:4552 -> 74.125.157.27:25, total_len:1470
Jun 17 14:11:51 [SEND] 140.#.#.135:4552 -> 74.125.157.27:25, total_len:1470
Jun 17 14:11:51 [SEND] 140.#.#.135:4552 -> 74.125.157.27:25, total_len:1326
Jun 17 14:11:52 [SEND] 140.#.#.135:4839->165.131.174.40:25, total_len:1500
Jun 17 14:11:52 [SEND] 140.#.#.135:4839->165.131.174.40:25, total_len:1500
Jun 17 14:11:52 [SEND] 140.#.#.135:4839->165.131.174.40:25, total_len:896
Jun 17 14:11:53 [SEND] 140.#.#.135:4832 -> 74.86.7.196:25, total_len:1500
Jun 17 14:11:53 [SEND] 140.#.#.135:4832 -> 74.86.7.196:25, total_len:1500
Jun 17 14:11:53 [SEND] 140.#.#.135:4832 -> 74.86.7.196:25, total_len:811
...
repeat this action 3628 times

Effectiveness Evaluation

[3370] ip: 140.#.#.148, credit: 8203, nat: 0, mail server: 0, action: 0
...
Jun 14 16:24:14 [SEND] 140.#.#.148:6508 -> 148.123.15.75:25, total_len:1500
Jun 14 16:24:14 [SEND] 140.#.#.148:6508 -> 148.123.15.75:25, total_len:1500
Jun 14 16:24:14 [SEND] 140.#.#.148:6508 -> 148.123.15.75:25, total_len:1142
Jun 14 16:24:17 [SEND] 140.#.#.148:6534->75.126.136.141:25, total_len:1500
Jun 14 16:24:17 [SEND] 140.#.#.148:6534->75.126.136.141:25, total_len:1500
Jun 14 16:24:17 [SEND] 140.#.#.148:6534->75.126.136.141:25, total_len:1500
Jun 14 16:24:17 [SEND] 140.#.#.148:6526 -> 74.125.43.27:25, total_len:1470
Jun 14 16:24:17 [SEND] 140.#.#.148:6526 -> 74.125.43.27:25, total_len:1470
Jun 14 16:24:17 [SEND] 140.#.#.148:6526 -> 74.125.43.27:25, total_len:1470
...
repeat this action 2050 times

Related Work

BotGraph: Large Scale Spamming Botnet Detection, USENIX’09
 Webmail botnet account detection
BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection, USENIX’08
 Network behavior based detection
Wide-scale botnet detection and characterization, HotBots’07, USENIX

Limitation


If a host's e-mail sending traffic passes through the router but its e-mail receiving traffic does not, the host will be considered a spam bot.
SpamFinder cannot detect e-mails that are sent and received through Webmail, but popular Webmail services have their own effective anti-spam mechanisms to filter spam mails.

Attack Analysis and Future Work

Attackers might send forged IP packets to defame target hosts; we could check for the existence of the related connections to detect this behavior.
In the future, the spam mails (or IP packets) sent from bots will be redirected to a honeypot for further analysis.

Conclusion



We propose a network-level spam bot detection mechanism, SpamFinder.
We implemented it on a Linux router and evaluated it using real-world traffic provided by NBL (Network Benchmarking Lab) @ NCTU.
The evaluation results show that SpamFinder has low performance overhead and can detect spam bots and protect network bandwidth effectively.

End

Q&A