Heat-seeking Honeypots: Design and Experience


John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy, and Martin Abadi
WWW 2011
Presented by Elias P. Papadopoulos
Compromising Web Servers
• Compromised servers host phishing and malware pages and redirect
user traffic to malicious sites
• Almost 90% of Web attacks take place through legitimate sites that
have been compromised
• Over 50% of popular search keywords return at least one malicious
link to a compromised site
• Compromised servers let attackers communicate with clients behind
NATs and firewalls
Honeypots
• A honeypot is a computer security mechanism set to detect, deflect,
or counteract attempts to gain unauthorized access to information
systems.
• Client-based
- Detect malicious servers that attack clients
• Server-based
- Emulate vulnerable services/software and passively
wait for attackers
Heat-seeking Honeypots
1. Actively attract attackers
2. Dynamically generate and deploy honeypot pages
3. Advertise honeypot pages to attackers via search
engines
4. Analyze the honeypot logs to identify attack patterns
Heat-seeking Honeypot Architecture
Attacker queries
How attackers find vulnerable Web servers:
1. Brute-force port scanning across the Internet
2. Searching for signatures of vulnerable software via search engines
Malicious queries can be identified in the Bing search log,
e.g. “phpizabi v0.848b c1 hfp1”
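As a rough illustration of what this matching step could look like, here is a minimal sketch that flags search queries matching signatures of vulnerable software. The signature list and function name are hypothetical; the paper identifies such queries by analyzing the Bing log, not with a fixed list.

```python
import re

# Hypothetical signatures of vulnerable Web applications that attackers
# search for; the first entry is the example query from this slide.
VULN_SIGNATURES = [
    r"phpizabi v0\.848b c1 hfp1",
    r"powered by \S+ [\d.]+",  # generic "Powered by <app> <version>" footprint
]

def looks_like_attacker_query(query: str) -> bool:
    """Return True if a search query matches a vulnerable-software signature."""
    return any(re.search(sig, query, re.IGNORECASE) for sig in VULN_SIGNATURES)

print(looks_like_attacker_query("phpizabi v0.848b c1 hfp1"))  # True
print(looks_like_attacker_query("weather in seattle"))        # False
```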
Creation of Honeypot Pages
• Deployment:
- Issue each attacker query to the search engines (Bing and Google)
- Take the top three results
- The crawler fetches the Web pages at these URLs
- Strip all JavaScript content and rewrite all links of the page to
point to the local copy (see the sketch after this list),
e.g. http://path/to/honeypot/includes/joomla.php
• Install a few common Web applications
- Run each application in a separate VM
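A minimal sketch of the strip-and-rewrite step, assuming BeautifulSoup is available; the honeypot root path and function name are illustrative, not the authors' actual code.

```python
from urllib.parse import urlparse
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def make_honeypot_page(html: str, honeypot_root: str = "/path/to/honeypot") -> str:
    """Strip JavaScript from a fetched result page and rewrite its links
    so they resolve under the local honeypot directory."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove all <script> elements so the honeypot serves no active content.
    for script in soup.find_all("script"):
        script.decompose()

    # Rewrite each hyperlink to point at the local copy, e.g.
    # http://site/includes/joomla.php -> /path/to/honeypot/includes/joomla.php
    for a in soup.find_all("a", href=True):
        a["href"] = honeypot_root + urlparse(a["href"]).path

    return str(soup)
```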
Advertising Honeypot Pages
• Submit the URLs of the honeypot pages to the search engines and wait
for the crawlers to visit them
• Increase the chance that honeypot pages rank highly in search
results (PageRank)
• Add hidden links (not visible to regular users) pointing to the
honeypot pages on other public Web sites (see the sketch below)
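One common way to add such a link is with CSS that keeps it out of the rendered page while crawlers still follow it. The markup below is a sketch of that general technique; the paper does not specify the exact mechanism used.

```python
def hidden_link(honeypot_url: str) -> str:
    """Build an anchor tag that browsers do not render but crawlers still
    follow. (An illustrative hiding technique, not necessarily the one used.)"""
    return f'<a href="{honeypot_url}" style="display:none">archive</a>'

print(hidden_link("http://example.org/path/to/honeypot/includes/joomla.php"))
```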
Detecting Malicious Traffic
• Process the visitor log and automatically extract attack traffic
• Identifying crawlers
- Well-known crawlers: Google’s crawler uses Mozilla/5.0
(compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Characterizing the behavior of known crawlers
- Identifying unknown crawlers
• Identifying malicious traffic
Identifying Crawlers 1/2
 Known crawlers
- Look at the user-agent string and verify that the IP address
belongs to the claimed organization (see the sketch below)
- A single search engine uses multiple IP addresses to crawl
different pages, so IPs are grouped by AS number
- Most crawlers visit only static links
- Only one crawler visited dynamic links
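A sketch of one standard verification for the first check: reverse-resolve the claimed crawler's IP and confirm the name resolves back to the same address. The domain table here is illustrative; the paper additionally uses AS-level information.

```python
import socket

# Illustrative reverse-DNS suffixes for two well-known crawlers.
CRAWLER_DOMAINS = {
    "googlebot": ("googlebot.com", "google.com"),
    "bingbot": ("search.msn.com",),
}

def verify_crawler_ip(ip: str, claimed: str) -> bool:
    """Return True if `ip`, claiming in its user-agent string to be the
    crawler `claimed`, reverse-resolves to that crawler's domain and the
    name resolves back to the same IP (forward-confirmed reverse DNS)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname.endswith(CRAWLER_DOMAINS[claimed]):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]    # forward check
    except (OSError, KeyError):
        return False
```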
Identifying Crawlers 2/2
 Unknown crawlers
- Remaining IPs are also grouped by AS number
- Classified as crawlers if they behave like the known crawlers
- Threshold test: |P| / |C| ≥ K, where P is the set of pages the
group visited and C is the set of crawlable pages (sketch below)
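In code, the threshold test might look like the following sketch; the variable names and the value of K are assumptions for illustration.

```python
def likely_crawler_ases(visits_by_as, crawlable_pages, k=0.5):
    """Flag an AS-level group of unknown IPs as a likely crawler when the
    fraction of crawlable honeypot pages it visited, |P| / |C|, is at
    least the threshold k (k = 0.5 here is illustrative)."""
    crawlers = set()
    for asn, visited in visits_by_as.items():
        coverage = len(visited & crawlable_pages) / len(crawlable_pages)
        if coverage >= k:
            crawlers.add(asn)
    return crawlers

# Example: AS 64496 covers 3 of 4 crawlable pages and is flagged.
pages = {"/a.html", "/b.html", "/c.html", "/d.html"}
visits = {64496: {"/a.html", "/b.html", "/c.html"}, 64511: {"/a.html"}}
print(likely_crawler_ases(visits, pages))  # {64496}
```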
Identifying Malicious Traffic
• Attackers do not target static pages
• They try to access non-existent or private files
• Build a whitelist of all the dynamic and static links for each site:
- The exact set of links present in the honeypot pages
- Files visited by well-behaved crawlers (e.g. robots.txt)
• Flag visits to links not contained in the whitelist (sketch below)
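A minimal sketch of the whitelist check; the paths and log format are hypothetical.

```python
def find_attack_requests(log_entries, whitelist):
    """Return log entries requesting links outside the whitelist (the links
    present in the honeypot pages plus files fetched by well-behaved
    crawlers, such as robots.txt)."""
    return [e for e in log_entries if e["path"] not in whitelist]

whitelist = {"/honeypot/index.html", "/robots.txt"}
log = [
    {"ip": "203.0.113.5", "path": "/honeypot/admin/config.php"},  # flagged
    {"ip": "198.51.100.7", "path": "/honeypot/index.html"},       # benign
]
print(find_attack_requests(log, whitelist))
```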
Results
• Experiment duration: 3 months
• Placement: a personal home page in the University of Washington CS
department
• 96 automatically generated honeypot Web pages
• 4 manually installed Web application software packages
• Received 54,477 visits from 6,438 distinct IP addresses
Distinguishing malicious visits
• Honeypot pages have low PageRank
• Only one crawler visited the dynamic links in the installed software
Crawler Visits
Bi-modal distribution:
- 16 ASes crawled more than 75% of the hosted pages
- 18 ASes visited less than 25% of the pages
Attacker Visits
[Figure: attacker visits; Joomla]
Geographic Locations & Discovery Time
Comparing Honeypots
1. Web server
- No hostname
- No hyperlinks pointing to it
2. Vulnerable software
- Pages accessible on the Internet
- Search engines can find them
3. Heat-seeking honeypot pages
- Generated as simple static HTML pages
Comparison of the total number of visits
and the number of distinct IP addresses
Attack Types
Applying whitelists to the Internet
● Random set of 100 Web servers whose HTTP access logs are indexed by
search engines
● A request is classified as coming from an attacker (sketch below) if:
- The link is not present in the whitelist
 i.e. it was never accessed by a crawler
- The link is not present at all
 i.e. the request results in an HTTP 404 error
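Combining the two rules from this slide, a sketch of the per-request classifier applied to a third-party server's access log; the function signature is an assumption.

```python
def is_attacker_request(path: str, status: int, whitelist: set) -> bool:
    """Classify one access-log request as attacker traffic: either the link
    is absent from the whitelist (never fetched by a crawler), or the
    request ended in an HTTP 404 error."""
    return path not in whitelist or status == 404

print(is_attacker_request("/setup.php", 404, {"/index.html"}))  # True
```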
Applying whitelists to the Internet
For 20% of the sites, almost 90% of the traffic came from attackers.
Conclusion
• Heat-seeking honeypots:
○ Deploy honeypot pages corresponding to pages of vulnerable Web
applications
○ Attract attackers
• Detect malicious IP addresses solely through their Web access
patterns
• Achieve a false-negative rate of at most 1%