All Your iFRAMEs Point to Us

Niels Provos, Panayiotis Mavrommatis (Google Inc.)
Moheeb Abu Rajab, Fabian Monrose (Johns Hopkins University)
17th USENIX Security Symposium, August 2008
Speaker: Yi-Ning Chen
1
Outline
• Background
• Infrastructure and Methodology
• Result analysis
– Prevalence of Drive-by Download
– Malicious content injection
– Drive-by download via ads
– Malware distribution infrastructure
– Post infection impact
• Conclusion
2
BACKGROUND
3
Web-based attack types
• Push-based
– Traditional scanning and exploiting attack
– Can be blocked by firewalls, NATs, etc.
• Pull-based
– Web-based malware infection
• Social engineering technique
• Drive-by download
• This paper focuses on drive-by download attacks.
4
Drive-by download
• Attackers inject content under their control into benign websites.
• When a user visits the website, it automatically downloads spyware, a computer virus, or other malware without the user's knowledge.
• Landing pages (malicious URLs): URLs that initiate drive-by downloads when users visit them.
• Landing sites: landing pages grouped by domain name.
• Distribution site: a remote site that hosts malicious payloads.
5
Drive-by download – Content injection
• Web server compromise
– Exploit the web server via a vulnerable scripting application and inject new content into the compromised website.
– Injected content: usually a hidden IFRAME containing a link that redirects the visitor to a URL hosting a script crafted to exploit the browser.
• Third-party contributed content
– Attackers can inject the exploit URL through a posting function without compromising the web server.
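The hidden-IFRAME injection described above can be spotted with a simple parser. The following is a minimal sketch, not the authors' detector: the class name `HiddenIframeFinder`, the helper `find_hidden_iframes`, and the "hidden" rules (width or height of 0 or 1, `display:none`, `visibility:hidden`) are all illustrative assumptions.

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collect src URLs of IFRAMEs that look hidden (illustrative heuristic)."""

    def __init__(self):
        super().__init__()
        self.hidden_sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "iframe":
            return
        a = {name: (value or "") for name, value in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        # Assumption: a 0- or 1-pixel frame, or one styled invisible, is "hidden".
        tiny = a.get("width") in ("0", "1") or a.get("height") in ("0", "1")
        invisible = "display:none" in style or "visibility:hidden" in style
        if tiny or invisible:
            self.hidden_sources.append(a.get("src", ""))


def find_hidden_iframes(html):
    """Return the src of every IFRAME in `html` that matches the heuristic."""
    finder = HiddenIframeFinder()
    finder.feed(html)
    return finder.hidden_sources
```

A real system would combine such a signal with others, since legitimate pages also use small or invisible frames.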
6
Drive-by download – Exploit
• The user visits a web site, which triggers the automatic execution of exploit code.
• The exploit instructs the browser to connect to a malware distribution site and fetch malware executable(s).
• The executables are automatically installed and started.
7
Drive-by download – Evade detection
• Use randomly seeded obfuscated JavaScript in the
exploit code
• Use many redirection steps before the browser
eventually contacts the malware distribution site.
8
INFRASTRUCTURE AND
METHODOLOGY
9
Methodology
• Pre-processing phase
– Input data: Google’s web repository
– Goal: identify URLs that trigger drive-by downloads
• Verification phase
– Input data: URLs from pre-processing phase
– Goal: determine whether a candidate URL is malicious
10
Pre-processing phase
• Scoring features
– “Out of place” IFRAMEs
– Obfuscated JavaScript
– IFRAMEs to known distribution sites
• Translate these features into a likelihood score.
• Employ five-fold cross-validation to measure the quality of the machine-learning framework.
• Use the average ROC curve to estimate the FPR and TPR at different thresholds.
– FPR: 0.001
– TPR: 0.9
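To make the FPR/TPR trade-off concrete, here is a minimal sketch of how the two rates follow from likelihood scores at a chosen threshold. The function name `rates_at_threshold` and the toy scores are assumptions for illustration; the slides do not describe the classifier itself.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when every URL scoring >= threshold is flagged.

    scores: likelihood scores, one per URL
    labels: True if the URL is really malicious, False otherwise
    """
    tp = fp = fn = tn = 0
    for score, is_malicious in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_malicious:
            tp += 1
        elif flagged and not is_malicious:
            fp += 1
        elif not flagged and is_malicious:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of malicious URLs caught
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of benign URLs flagged
    return tpr, fpr
```

Sweeping the threshold over all observed scores yields the points of a ROC curve; averaging the curves from the five folds gives the kind of estimate quoted on the slide.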
11
Verification phase (1/2)
• A large-scale web honeynet simultaneously runs a large number of Microsoft Windows images.
• To inspect a candidate URL, the system
1. loads a clean Windows image
2. automatically starts an unpatched IE
3. runs the virtual machine for two minutes
12
Verification phase (2/2)
• Heuristics score candidate URLs based on
– # of created processes, # of observed registry changes, # of file system changes
• A URL is
– Malicious if it meets the threshold and at least one of the incoming HTTP responses is marked as malicious by at least one AV scanner.
– Suspicious if it meets the threshold but passes the AV scanners.
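The labeling rule above can be sketched as follows. This is a hedged illustration only: `VisitObservation`, `heuristic_score`, `classify_url`, the unweighted sum, the default threshold of 1, and the "below-threshold" label are assumptions, since the slides give neither the scoring formula nor the threshold.

```python
from dataclasses import dataclass


@dataclass
class VisitObservation:
    """What the honeynet VM recorded while visiting one candidate URL."""
    created_processes: int
    registry_changes: int
    file_system_changes: int
    av_flagged_responses: int  # HTTP responses flagged by at least one AV scanner


def heuristic_score(obs: VisitObservation) -> int:
    # Assumption: an unweighted sum; the real formula is not given in the slides.
    return obs.created_processes + obs.registry_changes + obs.file_system_changes


def classify_url(obs: VisitObservation, threshold: int = 1) -> str:
    """Apply the malicious / suspicious rule from the slide."""
    if heuristic_score(obs) < threshold:
        return "below-threshold"  # hypothetical label for everything else
    if obs.av_flagged_responses > 0:
        return "malicious"
    return "suspicious"
```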
13
Constructing the Malware Distribution Network
• A distribution network is defined as the set of malware delivery trees from all the landing sites that lead to a particular distribution site.
• A malware delivery tree consists of the landing site (leaf), the malware distribution site (root), and all nodes the browser visits in between.
• The delivery tree is constructed by extracting the Referer header from the recorded HTTP requests the browser makes after visiting the landing site.
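As a rough illustration of the Referer-based reconstruction, the sketch below walks one branch of a delivery tree backwards from the distribution site to the landing page. The function name `delivery_path` and the `(url, referer)` record format are assumptions made for this example, not the authors' data model.

```python
def delivery_path(requests, distribution_url):
    """Rebuild the chain landing page -> ... -> distribution site.

    requests: iterable of (url, referer) pairs recorded from the browser,
              where referer is None for the initial landing-page request.
    """
    referer_of = {url: referer for url, referer in requests}
    path = [distribution_url]
    seen = {distribution_url}
    while True:
        previous = referer_of.get(path[-1])
        # Stop at the landing page (no referer) or if the chain loops.
        if previous is None or previous in seen:
            break
        path.append(previous)
        seen.add(previous)
    path.reverse()  # leaf (landing page) first, root (distribution site) last
    return path
```

Merging the paths from every landing site that reaches the same distribution site would give that site's distribution network.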
14
PREVALENCE OF DRIVE-BY
DOWNLOAD
15
Prevalence of Drive-by Download
• Based on data collected from Google over Jan 2007 –
Oct 2007
16
Malicious URL in Google search (1/2)
• Percentage of Google search queries that resulted in at least one URL labeled as malicious: 1.3%
17
Malicious URL in Google search (2/2)
• Among the top one million URLs appearing in search engine results, 6,000 are verified as malicious.
• Highest rank of a landing page → 1,588
• Geographic locality: top 5 hosting countries
• In China, 96% of the landing sites point to distribution sites that are also hosted in China.
18
Impact of browsing habits
• Random sample of about 7.2 million URLs
• Use DMOZ to categorize URLs → 3.6 million URLs
• Malicious websites are present in all website categories.
19
Web server software
• Collect all the “Server” and “X-Powered-By” header tokens from landing pages.
• The results reflect weak security practices on the part of web site administrators.
• Running unpatched software increases the risk of attackers gaining control via server exploitation.
20
DRIVE-BY DOWNLOAD VIA ADS
21
Drive-by download via Ads (1/2)
• Even if a web page itself contains no exploits, insecure ad content puts the sites that display the ads at risk.
• Adversaries can inject content into websites without having to compromise any web server.
• For each malware delivery tree, if any intermediary node belongs to one of 2,000 well-known advertising networks, the landing site is counted as infectious via ads.
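The ad-attribution rule above amounts to a membership test on the intermediary nodes of a delivery chain. Below is a minimal sketch under stated assumptions: the chain is a list of URLs from landing page to distribution site, `ad_network_domains` is a hypothetical set standing in for the list of known advertising networks, and matching is by domain suffix.

```python
from urllib.parse import urlparse


def is_delivered_via_ads(path, ad_network_domains):
    """True if any intermediary node of the chain belongs to an ad network.

    path: URLs ordered landing page -> ... -> distribution site
    ad_network_domains: set of registered domains of known ad networks
    """
    # Intermediaries exclude the landing page (first) and distribution site (last).
    for url in path[1:-1]:
        host = urlparse(url).hostname or ""
        for domain in ad_network_domains:
            if host == domain or host.endswith("." + domain):
                return True
    return False
```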
22
Drive-by download via Ads (2/2)
• 2% of the unique landing sites were delivering malware via ads.
• Counting each appearance of an ad, however, the percentage is 12%.
– The effect is quick and short-lived.
23
Redirection steps for Ads
• CDF of the # of redirection steps for Ads that successfully
delivered malware.
• Malware delivered via ads exhibits longer delivery chains, in 50% of all cases.
24
Ad network’s position in delivery tree
• Select the 5 ad networks that appear in 75% of all malware delivery trees.
• The deeper a network's relative position, the more closely it is related to the malware distribution site.
25
MALWARE DISTRIBUTION
INFRASTRUCTURE
26
Size of malware distribution network
• Two main types of malware distribution networks
– Networks that use only one landing site → 45%
– Networks that have multiple landing sites → 21,000 landing sites
• Some networks use only a single landing site to avoid detection.
27
IP distribution of malware distribution
server
• 50% of the landing sites fell in above ranges.
28
AS location of Malware distribution
sites
• Malware distribution sites’ IP addresses fall into only
500/2,517 ASes.
• 95% of these sites map to only 210 ASes.
29
Unique binaries downloaded
• 42% of the distribution sites delivered a single malware
binary.
30
Overlapping landing sites (1/2)
• Many landing sites are shared among multiple distribution
networks.
• Assume a distribution network i with a set of landing sites Xi.
• The normalized pair-wise intersection of two networks i and j, Ci,j, is calculated from their landing-site sets Xi and Xj.
• 80% of the distribution networks share at least one landing
page.
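The formula for Ci,j did not survive in this transcript. One plausible reading, stated here as an assumption rather than as the authors' definition, normalizes the overlap by the size of the first network: Ci,j = |Xi ∩ Xj| / |Xi|. A minimal sketch under that assumption (the function name is illustrative):

```python
def normalized_intersection(x_i, x_j):
    """Share of network i's landing sites that also appear in network j.

    Assumes C_ij = |X_i & X_j| / |X_i|; the slide's formula is missing,
    so this normalization is a guess and is not symmetric in i and j.
    """
    if not x_i:
        return 0.0
    return len(x_i & x_j) / len(x_i)
```

With this normalization, a value of 1.0 means every landing site of network i is also used by network j.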
31
Overlapping landing sites (2/2)
32
Content replication across malware
distribution sites
• Uses the normalized pair-wise intersection function defined earlier.
• In 25% of the malware distribution sites, at least one binary
is shared between a pair of sites.
33
POST INFECTION IMPACT
34
Download executables
• The average # of downloaded executables after
visiting a malicious URL is 8.
35
Processes started by executables
36
Registry changes
• Browser Helper Object: access privileged state of the browser.
• Preferences: change home page, default search engine or
name server.
• Security: change firewall settings or disable automatic
software updates.
• Startup: persist across reboots.
37
Network activity
• HTTP connections originating from the browser are
omitted.
– HTTP: due to “downloader” binaries that fetch, in some cases, up to 60 binaries over HTTP.
– IRC: adding the compromised machine to an IRC botnet.
38
Anti-Virus Engine Detection Rates (1/2)
• If visiting the URL causes the creation of at least one new process on the VM → suspicious
• Subject each binary to each of the AV scanners.
• Detection rate = (# of detected samples) / (total # of suspicious samples)
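The detection-rate ratio can be computed per engine as in the sketch below. The function name `detection_rate` and the input format (a mapping from binary identifier to whether that engine flagged it) are assumptions for illustration.

```python
def detection_rate(scan_results):
    """Detected samples divided by all suspicious samples, for one AV engine.

    scan_results: dict mapping each suspicious binary's id to True if the
                  engine flagged it and False otherwise.
    """
    if not scan_results:
        return 0.0
    detected = sum(1 for flagged in scan_results.values() if flagged)
    return detected / len(scan_results)
```

Running this once per engine over the same set of suspicious binaries gives comparable rates across engines.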
39
Anti-Virus Engine Detection Rates (2/2)
40
False Positives
• Assume all suspicious binaries will eventually be
discovered by the AV vendors.
• Re-scan all undetected binaries two months later using the latest virus definitions.
• All binaries still undetected after the rescanning step are considered false positives. (FPR < 10%)
• Use a white-list to exclude popular installers
exhibiting behavior similar to that of drive-by
downloads.
41
Conclusion
• Malicious URLs that initiate drive-by download are
spread far and wide.
• 1.3% of search queries to Google’s search engine
return at least one link to a malicious site.
• Syndication relationships in ad networks are being abused to deliver malware through ads.
• Anti-virus engines are lacking in their ability to
protect against drive-by download.
42
Comment
• Accuracy?
– The verification phase uses anti-virus engines to verify malicious URLs.
– The final results are then used to judge those same anti-virus engines.
43