Anti-Phishing - Columbia University

Anti-Phishing
Approaches
Lifeng Hu
What is Phishing?

- A social engineering attack
- An attempt to trick individuals into revealing personal credentials (username, password, credit card information, etc.)
- Based on fake emails and websites
- A threat to Internet users

Damages
- 73 million US adults received more than 50 phishing emails a year
- $2.8 billion in losses a year

Phishing Methods

- Set up websites whose interface or URL resembles that of a well-known website
- Set up fraudulent websites to collect users' personal information
- Set up a transparent relay website between the original website and its users
- Send emails containing malicious URLs
- Send emails containing embedded malicious Flash or picture files to evade anti-phishing text checks
False positive/negative rate of
Anti-Phishing Approaches

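False negative rate: the fraction of all phishing websites that are classified as good

$$f_n = \frac{N_{\text{phish} \to \text{good}}}{N_{\text{phish} \to \text{good}} + N_{\text{phish} \to \text{phish}}}$$

False positive rate: the fraction of all good websites that are classified as phishing

$$f_p = \frac{N_{\text{good} \to \text{phish}}}{N_{\text{good} \to \text{phish}} + N_{\text{good} \to \text{good}}}$$

Here $N_{a \to b}$ is the number of websites of true class $a$ that are classified as $b$.

So the lower both rates are, the better the anti-phishing approach is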
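The two rates defined above can be computed directly from four classification counts. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from the slides.

```python
def false_rates(phish_as_good, phish_as_phish, good_as_phish, good_as_good):
    """Return (fn, fp) from the four classification counts.

    fn: share of phishing sites that were classified as good.
    fp: share of good sites that were classified as phishing.
    """
    total_phish = phish_as_good + phish_as_phish
    total_good = good_as_phish + good_as_good
    if total_phish == 0 or total_good == 0:
        raise ValueError("need at least one phishing site and one good site")
    return phish_as_good / total_phish, good_as_phish / total_good


# Hypothetical evaluation: 100 phishing sites (8 missed),
# 200 good sites (3 wrongly flagged).
fn, fp = false_rates(phish_as_good=8, phish_as_phish=92,
                     good_as_phish=3, good_as_good=197)
print(f"fn = {fn:.3f}, fp = {fp:.3f}")  # prints: fn = 0.080, fp = 0.015
```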
Anti-Phishing Approaches
for Specific Websites

- Typically designed by the company that runs the website
- An example is the SiteKey mechanism of Bank of America online banking
- Pro: False negative rate is low
  False positive rate can be zero
- Con: Not applicable to phishing emails
Anti-Phishing Approaches
Based on Database

- Anti-phishing firewall: Kaspersky
- Anti-phishing toolbar: Netcraft
- Both are based on an online database
- The toolbar can provide statistics about a URL in advance
- Pro: Applicable to both websites and emails
  False negative rate can be low
  False positive rate is low
- Con: Needs frequent updates
  Relatively hard to implement
  False negative rate increases if the database is not up to date
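A database-based check reduces to looking a URL's host up in a list of known phishing domains. The sketch below assumes a local snapshot of such a list; the `BLACKLIST` entries and the `is_blacklisted` helper are hypothetical and do not reflect how Kaspersky or Netcraft actually work.

```python
from urllib.parse import urlsplit

# Hypothetical local snapshot of an online phishing database (made-up entries).
BLACKLIST = {"phishingsite.com", "login-update.example"}


def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True if the URL's host, or any parent domain of it, is listed."""
    # hostname is None when the URL has no scheme; treat that as "no host".
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "www.phishingsite.com", then "phishingsite.com", then "com".
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


print(is_blacklisted("http://www.phishingsite.com/login"))  # listed host
print(is_blacklisted("http://brand-new-phish.example/"))    # not in snapshot
```

The second call returns False even though the site is meant to be malicious, which illustrates why the false negative rate grows when the database is stale.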
Anti-Phishing Approaches
Based on Content

PILFER: phishing email detection based on machine learning, combining 10 filters, including:
- IP-based URL: 192.168.0.1/paypal.cgi?fix=account
- Domain age from whois.net
- Non-matching URL: <a href="phishingsite.com"> paypal.com</a>
- HTML email: hidden URLs
- Malicious JavaScript
- <More>…
Pro: In practice, false positive and false negative rates are relatively low
     Machine learning methods make it possible to improve accuracy
     No constant update is needed
Con: Training data and filters still need updates to adapt to new styles of phishing emails
     Network cost is a problem
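Two of the filters listed above, the IP-based URL and the non-matching URL, can be sketched with the Python standard library. This is an illustrative approximation and not PILFER's actual implementation; all function and class names here are hypothetical, and the host comparison is deliberately crude (for example, it treats www.paypal.com and paypal.com as different hosts).

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def is_ip_based(url):
    """True if the URL's host is a raw IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host_of(url))
        return True
    except ValueError:
        return False


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def has_nonmatching_link(html):
    """True if a link's visible text looks like a host that differs from its href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        looks_like_host = "." in text and " " not in text
        if looks_like_host and host_of(text) != host_of(href):
            return True
    return False


print(is_ip_based("192.168.0.1/paypal.cgi?fix=account"))
print(has_nonmatching_link('<a href="phishingsite.com"> paypal.com</a>'))
```

In a PILFER-style system, boolean features such as these would be combined with the other filters and passed to a trained classifier rather than used as hard rules.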
Anti-Phishing Approaches
Based on Content (cont.)

CANTINA: phishing website detection based on TF-IDF weights
- TF: the number of times a given term appears in a specific document
- IDF: a measure of the general importance of the term across all documents
- TF-IDF = TF × IDF, which is high for terms that are frequent in a given document but rare across all documents
- Search for the five top TF-IDF words of the current web page in a search engine such as Google
- The current web page should be in the top N (30) search results to be considered legitimate

CANTINA also uses filters similar to PILFER's to reduce false positives

Pro: False positive and false negative rates are very low
     No constant update is needed
     Search engine ranking is relatively hard to cheat
Con: Network cost is a problem
     Too many searches for phishing websites may affect the phishing websites' ranking
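The term-selection and ranking steps can be sketched as follows. The sketch assumes the common weighting TF × log(N / DF), which may differ from the exact formula CANTINA uses, and it takes the search results as a plain list supplied by the caller instead of querying a real search engine. The function names and the tiny corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_tfidf_terms(page, corpus, k=5):
    """Return the k terms of `page` with the highest TF-IDF weight.

    `corpus` is a list of background documents used to estimate IDF.
    """
    background = [set(tokenize(doc)) for doc in corpus]
    tf = Counter(tokenize(page))
    n_docs = len(background) + 1  # background documents plus the page itself
    scores = {}
    for term, count in tf.items():
        df = 1 + sum(term in doc for doc in background)  # the page has the term
        scores[term] = count * math.log(n_docs / df)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def is_legitimate(page_domain, result_domains, n=30):
    """True if the page's domain appears among the top n search results."""
    return page_domain in result_domains[:n]


page = "paypal account login paypal secure paypal"
corpus = ["account login for your bank", "secure login page", "weather news today"]
print(top_tfidf_terms(page, corpus, k=2))

# Hypothetical search results for the top terms, as a list of domains.
results = ["paypal.com", "example.org"]
print(is_legitimate("paypal.com", results))
print(is_legitimate("phishingsite.com", results))
```

A phishing copy of a well-known page tends to share that page's top terms, so the search returns the genuine site and the copy's own domain is missing from the results.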

Summary of mentioned
Anti-Phishing Approaches

Anti-Phishing Approaches   | False Positive | False Negative | Implementation Effort | Adaptation        | Update Cycle
For Specific Websites      | Zero           | Low            | Easy                  | Specific Website  | None
Firewall Based on Database | Low            | Medium         | Medium                | General Web/Email | Very Frequently
Toolbar Based on Database  | Low            | Low            | Hard                  | General Web/Email | Very Frequently
PILFER                     | Low            | Low            | Medium                | General Email     | Sometimes
CANTINA                    | Very Low       | Low            | Medium                | General Websites  | Few

Thanks!