Transcript Defense
Understanding the Network-Level
Behavior of Spammers
By Anirudh Ramachandran and Nick Feamster
Defense Team:
Mike Delahunty
Bryan Lutz
Kimberly Peng
Kevin Kazmierski
John Thykattil
Agenda
Introduction
Background and Related Work
Data Collection
Network-level Characteristics of Spammers
Spam from Botnets
Spam from Transient BGP Announcements
Lessons for Better Spam Mitigation
Conclusion
Introduction
Spam
Multiple emails sent to many recipients
Unsolicited commercial messages
Study based on network level behavior of
spammers
IP address ranges
Spamming modes (route hijacking, bots, etc.)
Temporal persistence of spamming hosts
Characteristics of spamming botnets
Much attention has been paid to studying the
content of spam
Introduction Cont.
Study posits that Network Level properties need to
be investigated in order to determine creative
ways to mitigate spam
Paper analyzes network properties of spam that is
observed at a large spam “sinkhole”
BGP route advertisements
Traces of command and control messages of a Bobax botnet
Legitimate emails
Surprising Conclusions
Most spam comes from a small IP address space (but so does
legitimate email)
Most spam comes from Microsoft Windows hosts – bots
Small set of spammers use short-lived route announcements to
remain untraceable
Background
Methods and Mitigation
Spamming Methods
Direct Spamming – via spam friendly ISPs or dial-up IPs
Open Relays and Proxies – mail servers that allow
unauthenticated hosts to relay email
Botnets – hijacked machines acting under the control of
a centralized ‘botmaster’
BGP Spectrum Agility – spammers make short-lived route announcements
for the IP addresses from which they send spam; hampers traceability
Mitigation Techniques
Filtering: Content based and IP Blacklists
Related Work
Related Work – Previous Studies
Packet traces to determine bandwidth
bottlenecks from spam sources
Project Honeypot
Sink for email traffic and hands out trap email
addresses to determine harvesting behavior and
identity of spammers
Time monitoring from harvesting to receipt of first
spam message
Countries where harvesting infrastructure is located
Persistence of spam harvesters
Related Work Cont.
Mitigation
SpamAssassin Project – reverse engineering via mail
content analysis
DNS blacklist – 80% of IPs sending spam were in the
blacklist
Unusual Route Announcements
Bogus Well-Known addresses
Suggestions of short lived route announcements
Data Collection
Reserve a “sinkhole”
Registered domain with no legitimate email
addresses
Establish a DNS Mail Exchange record for it.
All emails received by the server are spam
Run metrics on incoming emails
IP address of the relay; also run a traceroute
TCP fingerprint to get the source OS
Results of DNS blacklist from 8 different blacklist servers
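The blacklist check above can be illustrated with a small sketch. A DNS blacklist is conventionally queried by reversing the octets of the IPv4 address and appending the blacklist zone. The zone names and the `lookup` callable below are hypothetical placeholders, not the eight blacklists used in the study, and no real DNS query is made.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def listed_count(ip: str, zones, lookup) -> int:
    """Count how many zones list the address.

    `lookup` is a caller-supplied function (an assumption of this sketch)
    that takes a query name and returns True if the name resolves.
    """
    return sum(1 for zone in zones if lookup(dnsbl_query_name(ip, zone)))


if __name__ == "__main__":
    # Hypothetical zone name; 192.0.2.1 is a documentation address.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example"))
```

Passing the resolver in as a function keeps the sketch testable offline; a real deployment would plug in an actual DNS lookup.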
Data Collection Cont.
Spam received per day at sinkhole (Aug. 2004 – Dec. 2005)
Data Collection Cont.
“Hijack” the DNS server for the domain running a botnet
Have botnet commands go to a known machine instead.
Monitor BGP updates at the network where
the spam is received
Collect logs from a large email provider (40 million
mailboxes)
Allows analysis of network characteristics for spam and
non-spam
Data Analysis
Study focuses on network level characteristics
Distribution of spam across IP address space is
similar to legitimate emails (although not exact)
Spam over IP address range is not uniform
12% of all received spam comes from two
Autonomous Systems (AS)
37% comes from the top 20 ASes.
Offers insight into spam prevention
Classifying spam by country: China, Korea, & US
dominate
Defense suggestion
Correlate originating country with IP range to
estimate probability of spam.
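One way to picture the defense suggestion is to estimate, per address range, the fraction of observed mail that was spam. The sketch below is only illustrative: it groups by the first octet (a /8) as a stand-in for "IP range", ignores country, and the function names and sample addresses are made up for the example.

```python
from collections import Counter


def slash8(ip: str) -> int:
    """Return the first octet of a dotted-quad IPv4 address (its /8)."""
    return int(ip.split(".")[0])


def spam_fraction_by_prefix(spam_ips, legit_ips):
    """Map each /8 seen in either list to spam / (spam + legitimate)."""
    spam = Counter(slash8(ip) for ip in spam_ips)
    legit = Counter(slash8(ip) for ip in legit_ips)
    # Every prefix in the union has at least one message, so no zero division.
    return {
        prefix: spam[prefix] / (spam[prefix] + legit[prefix])
        for prefix in set(spam) | set(legit)
    }


if __name__ == "__main__":
    spam_ips = ["60.1.1.1", "60.2.2.2", "60.3.3.3", "12.0.0.1"]
    legit_ips = ["60.9.9.9", "12.0.0.2", "12.0.0.3", "12.0.0.4"]
    print(spam_fraction_by_prefix(spam_ips, legit_ips))
```

Because legitimate mail is distributed similarly to spam across the address space, such a score would be one input to a filter rather than a decision on its own.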
Cumulative Distribution Function (CDF) of Spam and Legitimate Email
[Figure annotations: "Greater probability of legitimate emails"; "Big increase in probability of received spam"]
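For readers unfamiliar with a CDF over IP address space, the following is a minimal sketch of how such a curve could be computed: each sender address is treated as a 32-bit integer, and the CDF at an address is the fraction of messages sent from addresses at or below it. The helper name and sample addresses are assumptions for illustration, not the study's data.

```python
import ipaddress
from bisect import bisect_right


def ip_cdf(sender_ips):
    """Return a function giving the empirical CDF over IPv4 address space."""
    values = sorted(int(ipaddress.IPv4Address(ip)) for ip in sender_ips)
    if not values:
        raise ValueError("need at least one sender address")
    total = len(values)

    def cdf(ip: str) -> float:
        # Fraction of messages whose sender address is <= the given address.
        return bisect_right(values, int(ipaddress.IPv4Address(ip))) / total

    return cdf


if __name__ == "__main__":
    cdf = ip_cdf(["10.0.0.1", "60.0.0.1", "80.0.0.1", "200.0.0.1"])
    print(cdf("60.255.255.255"))
```

A steep rise in the curve over some range means a large share of mail comes from that range; comparing the spam and legitimate curves shows where the two distributions diverge.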
Spam Persistence
85% of unique spammers send 10 emails or fewer
If this is true for all, what’s the value in filtering by a specific IP address?
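The persistence figure can be thought of as a simple count over the sinkhole log. The sketch below assumes the log has been reduced to one sender IP per received message; the function name, the threshold parameter, and the sample data are illustrative only.

```python
from collections import Counter


def fraction_low_volume(sender_ips, max_messages=10):
    """Fraction of distinct sender IPs that sent at most `max_messages`."""
    per_ip = Counter(sender_ips)
    if not per_ip:
        return 0.0
    low = sum(1 for count in per_ip.values() if count <= max_messages)
    return low / len(per_ip)


if __name__ == "__main__":
    # Made-up log: one entry per message, keyed by sender address.
    log = ["192.0.2.1"] * 3 + ["192.0.2.2"] * 11 + ["192.0.2.3"] * 10 + ["192.0.2.4"]
    print(fraction_low_volume(log))
```

A high value here is what motivates the question on the slide: if most senders appear only a handful of times, a per-address blacklist entry has little time to pay off.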
Effectiveness of Blacklists
About 80% of spam came from IP addresses listed in at least one major blacklist
Effectiveness of Blacklists Cont.
Most spam bots are detected by at least one DNSRBL
Only 50% of spammers using transient BGP announcements detected by
one DNSRBL
Spam from Botnets
Circumstantial evidence suggests that most
spam originates from bots
Spamming hosts and Bobax drones have very
similar distributions across IP address space
Suggests that much spam received may be due to
botnets such as Bobax
More on Bots
Most bots send a low volume of spam individually
Operating Systems Used by Spammers
Used OS fingerprinting tool “p0f” in Mail
Avenger
Able to identify OS of 75% of hosts that sent
spam
Of this 75% identifiable segment, 95% run
Windows
Consistent with percentage of hosts on Internet
that run Windows
Only about 4% run other operating systems, but they are
responsible for 8% of received spam.
This goes against common perception that most
spam originates from Windows botnet drones
Spam from Transient BGP Announcements
Some spammers briefly hijack large portions
of IP address space (that do not belong to
them), send spam, and withdraw routes
immediately after spamming
Not much known, not well defended against
Very difficult to trace
Allows spammer to evade DNSRBLs
Used 10% or less of the time, as
complementary spamming tactic
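The announce, spam, withdraw pattern described above could be searched for by joining route lifetimes with spam arrival times. The sketch below is a simplified illustration: it assumes announcements have already been paired with their withdrawals and that each spam message has already been mapped to its covering prefix. The `Announcement` type, the 600-second threshold, and the sample prefixes are assumptions of the example, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Announcement:
    prefix: str          # announced prefix, e.g. "192.0.2.0/24"
    announced_at: float  # seconds since some reference time
    withdrawn_at: float  # seconds since the same reference time


def short_lived_with_spam(announcements, spam_events, max_lifetime=600.0):
    """Return short-lived announcements during which spam arrived.

    `spam_events` is a list of (prefix, timestamp) pairs.
    `max_lifetime` is an arbitrary illustrative threshold in seconds.
    """
    flagged = []
    for ann in announcements:
        if ann.withdrawn_at - ann.announced_at > max_lifetime:
            continue  # route stayed up too long to count as transient
        if any(
            prefix == ann.prefix and ann.announced_at <= ts <= ann.withdrawn_at
            for prefix, ts in spam_events
        ):
            flagged.append(ann)
    return flagged


if __name__ == "__main__":
    anns = [
        Announcement("192.0.2.0/24", 0, 300),
        Announcement("198.51.100.0/24", 0, 86400),
    ]
    spam = [("192.0.2.0/24", 120), ("198.51.100.0/24", 50)]
    print([a.prefix for a in short_lived_with_spam(anns, spam)])
```

Only the first prefix is flagged in the usage example, since the second route stays up far longer than the threshold.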
Lessons on Spam Mitigation
Why should we use network-level
information?
Information is less malleable
More constant than spam email contents, which
content-based filters monitor
Information is observable in the middle of the
network
Closer to the source of the spam than other
techniques
Will result in more effective spam filters
When combined with other techniques
Has potential to stop spam that other techniques miss
More Lessons
Improves knowledge of host identity
Bases detection techniques on aggregate
behavior
Protects against route hijacking
“BGP spectrum agility”
Other techniques do not
Uses network-level properties to detect and
filter
Conclusion
Studying the network-level behavior of
spammers
Designing better spam filters with network-level filters
Network-level behavior filters vs. content-based filters
Should not replace content-based filters, but
complement them
Questions?