Spamato
An Extendable Spam Filter System
by
Keno Albrecht
Nicolas Burri
Roger Wattenhofer
Motivation
• A huge number of different spam filters
– Google: 1,740,000 hits (hits, not distinct spam filters)
– Freshmeat/Sourceforge: 404/420 projects
– Several "once-only" research projects
• Client-side filtering (vs. server-side)
– Email Client Add-On: Outlook (Express), …
– Proxy: Mediator between Client and Server
– Stand-alone: Proprietary “email clients”
Spamato - Keno Albrecht - Second Conference on Email and Anti-Spam - July 21 & 22, 2005
Project Goal
• Build an extendable spam filter system to…
– ease the development of filters and provide a filter container
– help implement tools for common tasks
– support as many email clients as possible
• Encourage filter developers to use our framework
Subject: Free Spam Filter System
To: [email protected]
From: [email protected]
Dear Spam Filter Developer,
This is your once-in-a-lifetime opportunity to use the free spam filter
system Spamato. Spamato aims to bring a practical, easy-to-use, and
effective spam filter technology to the user’s desktop. It has been
designed to be used primarily as an add-on for several email clients. The
combination of multiple filtering techniques leads to a high spam detection
rate and a low false-positive rate. It offers a variety of features that
simplify your life as a spam filter developer.
Do not reinvent the wheel!
Write your filter in an instant!
Use Spamato!
Visit our homepage at http://www.spamato.net. To unsubscribe click here.
The Spamato-Team
Architecture
Add-on implementation language depends on the email client:
• Java
• Visual Basic
• JavaScript
• …
Spamato core: platform independent
Filtering Process
Emails are processed in
five phases:
(1) Initialization
(2) Pre-Check
(3) Check
(4) Decision
(5) Post-Check
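The five phases suggest a plugin interface in which each filter hooks into the phases it takes part in. The sketch below is a minimal, hypothetical Python rendering; the names `Phase`, `FilterPlugin`, `pre_check`, `check`, `post_check` and the toy `SubjectFilter` are illustrative assumptions, not Spamato's actual (Java) API.

```python
from abc import ABC, abstractmethod
from enum import IntEnum


class Phase(IntEnum):
    """The five filtering phases, numbered as on the slide."""

    INITIALIZATION = 1
    PRE_CHECK = 2
    CHECK = 3
    DECISION = 4
    POST_CHECK = 5


class FilterPlugin(ABC):
    """Hypothetical filter interface with one hook per phase a filter joins.

    In this sketch the Spamato Base itself handles Initialization and
    Decision, so a filter only sees Pre-Check, Check and Post-Check.
    """

    def pre_check(self, msg: dict, context: dict) -> bool:
        """Return True to veto further processing; the default never vetoes."""
        return False

    @abstractmethod
    def check(self, msg: dict, context: dict) -> float:
        """Return this filter's spam probability in [0, 1]."""

    def post_check(self, msg: dict, is_spam: bool) -> None:
        """Receive the global decision; the default ignores it."""


class SubjectFilter(FilterPlugin):
    """Toy example filter (hypothetical): flags an all-caps subject."""

    def check(self, msg: dict, context: dict) -> float:
        subject = msg.get("subject", "")
        return 1.0 if subject.isupper() else 0.0
```

Default no-op hooks let a filter implement only the phase it needs, which is one way a "filter container" can keep individual filters small.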
Filtering Process
(1) Initialization
• The email client receives an email, forwards it to
Spamato, and waits for the check result.
[Figure: the email client calls isSpam(msg) on the Spamato Base and waits for the result.]
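A minimal sketch of phase (1) from the add-on's point of view, assuming the add-on hands the message to Spamato and blocks until a boolean verdict comes back. `spamato_base_is_spam`, `on_email_received`, the single worker thread, and the 5-second timeout are all hypothetical; the keyword stub only stands in for the remaining four phases.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# One worker thread standing in for the Spamato process (an assumption; a
# real add-on might talk to Spamato over a socket or an in-process API).
_SPAMATO = ThreadPoolExecutor(max_workers=1)


def spamato_base_is_spam(msg: str) -> bool:
    """Hypothetical stand-in for isSpam(msg) on the Spamato Base.

    The real base runs Pre-Check, Check, Decision and Post-Check; this stub
    only looks for one keyword so the example stays runnable.
    """
    return "viagra" in msg.lower()


def on_email_received(
    msg: str,
    is_spam: Callable[[str], bool] = spamato_base_is_spam,
    timeout_s: float = 5.0,
) -> bool:
    """Client add-on side of phase (1): forward msg and wait for the result."""
    future = _SPAMATO.submit(is_spam, msg)
    # Blocks until the base answers; raises concurrent.futures.TimeoutError
    # if no answer arrives within timeout_s seconds.
    return future.result(timeout=timeout_s)
```

The timeout is a defensive choice so that a slow filter cannot freeze the email client; the slides do not say whether Spamato's add-ons do this.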
Filtering Process
(2) Pre-Check
• Veto against further processing
(configuration, sender whitelist)
• Gain information for other plugins
(URL extractor)
[Figure: the Spamato Base passes msg to Filter 1 … Filter N; each runs PreCheck(msg) and returns veto1(msg), veto2(msg), …, vetoN(msg).]
Checkpoint PreCheck
veto(msg) = veto1(msg) || veto2(msg) || … || vetoN(msg)
veto(msg) == true: ignore this msg
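The Pre-Check checkpoint is a logical OR over the filters' vetoes. The sketch below assumes messages are dicts with `from` and `body` keys and that plugins share a `context` dict; `make_sender_whitelist`, `url_extractor`, and `checkpoint_pre_check` are hypothetical names. It evaluates every hook instead of short-circuiting so that information-gathering plugins such as the URL extractor always run; whether Spamato does the same is an assumption.

```python
import re
from typing import Callable

# A pre-check hook takes (msg, context) and returns True to veto.
PreCheck = Callable[[dict, dict], bool]

URL_HOST = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)


def make_sender_whitelist(whitelist: set[str]) -> PreCheck:
    """Veto (stop processing) when the sender is whitelisted."""
    allowed = {address.lower() for address in whitelist}

    def pre_check(msg: dict, context: dict) -> bool:
        return msg["from"].lower() in allowed

    return pre_check


def url_extractor(msg: dict, context: dict) -> bool:
    """Never vetoes; stores the hosts it finds for later Check-phase filters."""
    context["hosts"] = sorted({h.lower() for h in URL_HOST.findall(msg["body"])})
    return False


def checkpoint_pre_check(msg: dict, pre_checks: list[PreCheck], context: dict) -> bool:
    """veto(msg) = veto1(msg) || veto2(msg) || ... || vetoN(msg)."""
    # Build the full list first so every hook runs, then OR the votes.
    votes = [pre_check(msg, context) for pre_check in pre_checks]
    return any(votes)
```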
Filtering Process
(3) Check
• Each filter calculates the spam probability
[Figure: messages that pass Checkpoint PreCheck go to Filter 1 … Filter N; each runs Check(msg) and returns isSpam1(msg), isSpam2(msg), …, isSpamN(msg).]
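In the Check phase each filter independently returns its own isSpam value for the message. A hedged sketch, assuming each filter is a plain function returning a probability in [0, 1]; `keyword_filter`, `domain_blacklist_filter`, `run_checks`, and the domain `spam.example` are toy placeholders, not Spamato filters.

```python
from typing import Callable

# A check hook maps a message to its spam probability isSpam_i(msg).
Check = Callable[[dict], float]

SPAM_PHRASES = ("free", "winner", "click here")


def keyword_filter(msg: dict) -> float:
    """Toy rule-style filter: each phrase adds 0.5, capped at 1.0."""
    body = msg["body"].lower()
    hits = sum(phrase in body for phrase in SPAM_PHRASES)
    return min(1.0, hits / 2)


def domain_blacklist_filter(msg: dict) -> float:
    """Toy URL filter: any blacklisted domain means probability 1.0."""
    blacklist = {"spam.example"}
    return 1.0 if set(msg.get("domains", ())) & blacklist else 0.0


def run_checks(msg: dict, checks: dict[str, Check]) -> dict[str, float]:
    """Collect every filter's probability, clamped to [0, 1]."""
    return {name: max(0.0, min(1.0, check(msg))) for name, check in checks.items()}
```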
Filtering Process
(4) Decision
• The overall spam probability is calculated
and returned to the email client
[Figure: the Decision step combines the results of Filter 1 … Filter N.]
isSpam(msg) = globalDecision(isSpam1(msg), isSpam2(msg), …, isSpamN(msg))
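The slides leave globalDecision unspecified (dynamic weighting is listed as future work). One simple, hypothetical choice is a weighted mean of the filters' probabilities compared against a threshold; the function name `global_decision`, the default weight of 1.0, and the 0.5 threshold below are assumptions.

```python
from typing import Optional


def global_decision(
    probabilities: dict[str, float],
    weights: Optional[dict[str, float]] = None,
    threshold: float = 0.5,
) -> bool:
    """Hypothetical globalDecision: weighted mean of per-filter probabilities.

    Filters without an explicit weight count with weight 1.0; the message is
    spam when the weighted mean reaches the threshold.
    """
    if not probabilities:
        return False  # no filter produced a result: treat the message as ham
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in probabilities)
    if total_weight <= 0:
        return False
    score = sum(p * weights.get(name, 1.0) for name, p in probabilities.items())
    return score / total_weight >= threshold
```

With equal weights, probabilities 0.9 and 0.2 average to 0.55 and yield spam; giving the second filter weight 3.0 pulls the score down to 0.375 and the verdict to ham.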
Filtering Process
(5) Post-Check
• Learn from the global decision
• Collect statistics
• Play a sound
[Figure: after the Decision, isSpam(msg) is passed to Filter 1 … Filter N in the Post Check phase.]
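Post-Check plugins receive the global decision after the fact. A minimal sketch of two such hypothetical plugins: one collects per-filter agreement statistics, the other learns token counts from the global decision. Class and method names are illustrative, and boolean per-filter verdicts (e.g. obtained by thresholding each filter's probability) are an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AgreementStats:
    """Hypothetical statistics plugin: counts how often each filter's own
    verdict matched the global decision."""

    agreed: Counter = field(default_factory=Counter)
    disagreed: Counter = field(default_factory=Counter)

    def post_check(self, verdicts: dict[str, bool], is_spam: bool) -> None:
        for name, verdict in verdicts.items():
            counter = self.agreed if verdict == is_spam else self.disagreed
            counter[name] += 1

    def agreement_rate(self, name: str) -> float:
        total = self.agreed[name] + self.disagreed[name]
        return self.agreed[name] / total if total else 0.0


@dataclass
class TokenLearner:
    """Hypothetical learning filter: trains token counts on the global decision."""

    spam_tokens: Counter = field(default_factory=Counter)
    ham_tokens: Counter = field(default_factory=Counter)

    def post_check(self, body: str, is_spam: bool) -> None:
        target = self.spam_tokens if is_spam else self.ham_tokens
        target.update(body.lower().split())
```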
Filters
• Bayesianato: Naïve Bayesian-based filter
• Ruleminator: Rule-based filter
• Razor (Ephemeral): Hash-based filter
» Vipul’s Razor: http://razor.sourceforge.net
• URL-based filters:
– Domainator: Search engine (“Google”) filter
– Earlgrey: Our collaborative multi-domain filter
– Razor (Whiplash): Collaborative single-domain filter
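To illustrate what a naïve Bayesian filter such as Bayesianato computes, here is a toy classifier. It is a sketch of the textbook technique (equal class priors, add-one smoothing, only tokens present in the message, combined as ∏p / (∏p + ∏(1 − p))), not Bayesianato's actual implementation; the class name and tokenizer regex are assumptions.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9$'-]+")


class NaiveBayesFilter:
    """Toy naive Bayesian spam filter (a sketch, not Bayesianato's code)."""

    def __init__(self) -> None:
        self.spam_docs = Counter()  # token -> number of spam mails containing it
        self.ham_docs = Counter()   # token -> number of ham mails containing it
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def tokens(text: str) -> set[str]:
        return set(TOKEN.findall(text.lower()))

    def train(self, text: str, is_spam: bool) -> None:
        if is_spam:
            self.spam_docs.update(self.tokens(text))
            self.n_spam += 1
        else:
            self.ham_docs.update(self.tokens(text))
            self.n_ham += 1

    def token_probability(self, token: str) -> float:
        """p = P(t|spam) / (P(t|spam) + P(t|ham)), with add-one smoothing."""
        p_spam = (self.spam_docs[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham_docs[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, text: str) -> float:
        """Combine prod(p) / (prod(p) + prod(1 - p)) via summed log-odds."""
        log_odds = 0.0
        for token in self.tokens(text):
            p = self.token_probability(token)
            log_odds += math.log(p) - math.log(1.0 - p)
        # Numerically stable logistic function of the summed log-odds.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)
```

Working in log-odds avoids underflow when a message contains many tokens; unseen tokens get p = 0.5 and therefore do not move the result.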
URL/URI/Domain Filtering
• About 70,000 spam emails investigated
– ~76% with at least one domain, of these…
• ~20% with more than one distinct domain
• ~2% with ten or more distinct domains
• Spammers obfuscate their messages for
the (sole) purpose of misleading URL filters!
• How should “fake” domains (including ham domains)
be handled? How can the actual “spam” domains be found?
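Statistics like the ones above require extracting the distinct domains of each message. A simplified, hypothetical sketch: it only recognizes `http(s)://` and `www.` hosts and reduces each host to its last two labels, which mishandles suffixes such as `.co.uk` and ignores the obfuscation tricks mentioned above. The function names and the regex are assumptions.

```python
import re

# Simplified host pattern: "http(s)://host" or "www.host"; real spam often
# obfuscates URLs in ways this does not catch.
URL_HOST = re.compile(r"(?:https?://|www\.)([a-z0-9.-]+\.[a-z]{2,})", re.IGNORECASE)


def distinct_domains(body: str) -> set[str]:
    """Return the distinct domains, naively reduced to their last two labels."""
    hosts = {host.lower() for host in URL_HOST.findall(body)}
    # "a.b.example.com" -> "example.com"; wrong for suffixes like ".co.uk".
    return {".".join(host.split(".")[-2:]) for host in hosts}


def domain_stats(bodies: list[str]) -> dict[str, float]:
    """Share of messages with at least one domain and, among those, the
    shares with more than one and with ten or more distinct domains."""
    counts = [len(distinct_domains(body)) for body in bodies]
    with_domain = [c for c in counts if c >= 1]
    n = len(with_domain)
    return {
        "with_domain": n / len(counts) if counts else 0.0,
        "more_than_one": sum(c > 1 for c in with_domain) / n if n else 0.0,
        "ten_or_more": sum(c >= 10 for c in with_domain) / n if n else 0.0,
    }
```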
URL-Filters in Comparison
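D = Domainator, E = Earlgrey, R/W = Razor (Whiplash)
Share of all spam messages identified by the row filter but not by the column filter; NOT = not identified by the row filter, ONLY = identified solely by the row filter:

         not D    not E    not R/W   NOT      ONLY
D          -      26.5%     1.1%     27.3%    0.6%
E        11.7%      -       2.5%     42.1%    2.0%
R/W      25.2%    41.4%      -        3.1%   15.6%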
Example: 26.5% (1.1%) of all spam messages were identified by the Domainator but not by
the Earlgrey (Razor/Whiplash) filter. 27.3% of all spam messages were not identified by
the Domainator, and 0.6% of all spam messages were identified by it alone.
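The table entries are plain set differences over the messages each filter flagged. A minimal sketch, assuming each filter's detections are available as a set of message IDs; `compare_filters` and the toy message IDs in the test are hypothetical and do not reproduce the study's data.

```python
def compare_filters(
    detected: dict[str, set[str]], all_spam: set[str]
) -> dict[str, dict[str, float]]:
    """Per filter X: % of all spam caught by X but not by each other filter Y,
    plus NOT (missed by X) and ONLY (caught by X and no other filter)."""
    if not all_spam:
        raise ValueError("all_spam must not be empty")

    def pct(count: int) -> float:
        return 100.0 * count / len(all_spam)

    table: dict[str, dict[str, float]] = {}
    for x, caught_x in detected.items():
        others = [caught_y for y, caught_y in detected.items() if y != x]
        row = {
            f"not {y}": pct(len(caught_x - caught_y))
            for y, caught_y in detected.items()
            if y != x
        }
        row["NOT"] = pct(len(all_spam - caught_x))
        row["ONLY"] = pct(len(caught_x - set().union(*others)))
        table[x] = row
    return table
```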
Conclusion & Future Work
• Spamato eases the implementation and
deployment of spam filters and tools. It can be
used with all email clients. It is open source.
• A multi-faceted (URL-) filtering approach is
reasonable.
• TODO:
– Integration of more filters and improved analysis tools
– Decision module (dynamic weighting of filter results)
– Trust system for collaborative filters
Thank you!
Questions?
Comments?
(Un)subscribe?
[email protected]
[email protected]
http://www.spamato.net
http://sf.net/projects/spamato