
Detecting Fraudulent Clicks
From BotNets 2.0
Adam Barth
Joint work with Dan Boneh, Andrew Bortz, Collin Jackson,
John Mitchell, Weidong Shao, and Elizabeth Stinson
BotNets, Current and Future
Traditional BotNets
Permanent malware
• Infect host
– Email attachments
– Drive-by downloads
• Click-fraud, Spam, DDoS, Key-logging
• ~100,000 members
BotNets 2.0
Ephemeral
• Browser-based
– Malicious advertisements
– Popular web sites
• Click-fraud, Spam, (maybe DDoS)
• Much larger
Browser Security Model
• Same-origin policy for network access
– Origin is scheme://host:port
• Write HTTP anywhere on the network
– Easy using HTML forms
– Except restricted ports, like 25 (SMTP)
• Read from origin only
– Can read some “library” formats from anywhere
• JavaScript, CSS, Images, Applets, etc.
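The origin comparison described on this slide can be sketched as follows. This is a minimal illustration, not a browser's actual implementation: the helper names `origin` and `same_origin` are hypothetical, and filling in default ports 80 and 443 for `http` and `https` is an assumption made so that URLs with and without an explicit port compare equal.

```python
from urllib.parse import urlsplit

# Assumed default ports for the two schemes covered by this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(url_a) == origin(url_b)


print(same_origin("http://example.com/a", "http://example.com:80/b"))
print(same_origin("http://example.com/", "https://example.com/"))
```

Note that the comparison is on the host name, not on the IP address the name resolves to.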
Desired Properties of Policy
• Can’t send spam
– Writes to port 25 blocked
• Can’t click advertisements
– Need to READ a token to make a click count
• Unfortunately…
DNS Rebinding Attacks
• Circumvent browser network access policy
• attacker.com points to attacker and target
<policy-file-request/>
<allow-access-from domain="*" to-ports="*" />
[Diagram: attacker’s server, DNS rebind, target server]
• Can read and write sockets to anywhere
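Why a name-based policy is circumvented can be shown with a toy model. Nothing here touches real DNS or sockets: `ToyResolver`, `hostname_check`, and `pinned_check` are hypothetical names, `attacker.example` stands in for attacker.com, and the addresses come from the reserved documentation ranges. The pinning-style check is a commonly discussed mitigation, included only for contrast; it is not taken from these slides.

```python
# Toy model only: no real DNS lookups or network connections are made.
class ToyResolver:
    """Returns a scripted sequence of answers for each hostname."""

    def __init__(self, answers):
        self._answers = {name: list(ips) for name, ips in answers.items()}

    def resolve(self, name):
        ips = self._answers[name]
        # Hand out answers in order, then keep repeating the last one.
        return ips.pop(0) if len(ips) > 1 else ips[0]


def hostname_check(page_host, request_host):
    """Name-based policy: compares host names only."""
    return page_host == request_host


def pinned_check(pinned_ip, resolved_ip):
    """Pinning-style policy: also requires the address to stay the same."""
    return pinned_ip == resolved_ip


resolver = ToyResolver({"attacker.example": ["192.0.2.10", "198.51.100.20"]})
first_ip = resolver.resolve("attacker.example")   # attacker's server
second_ip = resolver.resolve("attacker.example")  # rebound to the target

name_ok = hostname_check("attacker.example", "attacker.example")
pin_ok = pinned_check(first_ip, second_ip)
print(name_ok, pin_ok)
```

The host name never changes, so the name-based check passes even though the second connection reaches a different machine.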
An Experiment
• We ran a Flash ad (gains socket access)
– Paid $30
– 50,951 impressions from 44,924 unique IP addresses
• 90.6% of browsers vulnerable
– More if we include other rebinding attacks
• $100 to hijack 100,000 IP addresses
– No click required
– Impressions are cheap
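The cost figures on this slide can be checked with a few lines of arithmetic. This is a rough sketch: the function name is hypothetical, and applying the 90.6% vulnerability rate directly to unique IP addresses is an assumption, since the slide states it per browser.

```python
def cost_per_100k_vulnerable_ips(spend, unique_ips, vulnerable_fraction):
    """Dollars needed to reach 100,000 vulnerable IPs at the observed rates."""
    vulnerable_ips = unique_ips * vulnerable_fraction
    return spend / vulnerable_ips * 100_000


# Figures reported on the slide.
spend = 30.0
impressions = 50_951
unique_ips = 44_924

per_impression = spend / impressions
per_100k = cost_per_100k_vulnerable_ips(spend, unique_ips, 0.906)

print(f"cost per impression: ${per_impression:.5f}")
print(f"cost per 100,000 vulnerable IPs: ${per_100k:.2f}")
```

Under these assumptions the result is about $74, which is consistent with the slide's rounded figure of $100.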
Duration of IP Hijacking
A Long Tail
• Some impressions last for days
Using Rebinding for Click-Fraud
• Enroll as a publisher with ad network A
– Publish pay-per-click ads on your site
• Enroll as an advertiser with ad network B
– Buy pay-per-impression Flash ads
• Buy bots for $0.001 each
– Use 99% just to generate impressions on your site
– Use 1% to generate ad clicks on $0.50-per-click ads
– Multiply your money by 5, repeat
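The factor of 5 follows directly from the numbers above. A minimal sketch, assuming the publisher keeps the full $0.50 per click (the ad network's cut is ignored, as on the slide); the function name is hypothetical.

```python
def click_fraud_return(bot_cost, click_fraction, payout_per_click):
    """Revenue earned per dollar spent on bots."""
    revenue_per_bot = click_fraction * payout_per_click
    return revenue_per_bot / bot_cost


# $0.001 per bot, 1% of bots click, $0.50 per click.
multiplier = click_fraud_return(0.001, 0.01, 0.50)
print(f"money multiplier per round: {multiplier:.1f}")
```

Put differently, 1,000 bots cost $1, and the 10 of them that click earn $5.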
Implications for Click-Fraud Defense
• Simulates IP distribution exactly
– Each bot an independent sample from web visitors
– Blacklisting IPs as bot-infested is meaningless
• Traffic time-appropriate for IP
– Human at that IP actually surfing the web right now
• HTTP headers appropriate for IP
– Grab real headers from request for Flash ad
– Can’t get cookies, but many networks don’t use them
Distinguish Bots from Humans
• Bots cannot simulate human cognition
• Can’t use traditional CAPTCHAs
– Too disruptive to the user experience
– User has no interest in proving their humanity
• Click-fraud detection a different problem
– CAPTCHAs determine if this client is a human
– We just need to estimate the proportion of humans
A Straw-Man Design
• Humans click “Yes!”
• Bots click at random
• Ad network stats:
– 3487 Yes clicks
– 1271 No clicks
• How many bots?
– Expectation: 2542
– High-probability bound an exercise for the reader
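The expectation of 2542 comes from the straw-man assumptions: humans always click "Yes!", and bots split evenly between the two buttons, so the "No" clicks are about half the bots. A minimal sketch of that estimate; the function names are hypothetical, and the spread shown is a rough plug-in standard error under a binomial model, not the high-probability bound.

```python
import math


def estimate_bots(yes_clicks, no_clicks):
    """Every "No" is a bot, and bots click "No" half the time."""
    total = yes_clicks + no_clicks
    return min(2 * no_clicks, total)


def estimate_summary(yes_clicks, no_clicks):
    total = yes_clicks + no_clicks
    bots = estimate_bots(yes_clicks, no_clicks)
    return {
        "bots": bots,
        "humans": total - bots,
        "bot_fraction": bots / total,
        # If No ~ Binomial(B, 1/2), then Var(2 * No) = B; plug in the estimate.
        "std_error": math.sqrt(bots),
    }


summary = estimate_summary(3487, 1271)
print(summary)
```

With the slide's numbers, the estimate implies that more than half of the 4758 clicks came from bots.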
A Real Advertisement
• Where will humans click?
• Bots cannot simulate
• Can’t trick humans into clicking
– Actually need to process the ad
Image Recognition Doesn’t Help
• Suppose the bot can identify the hot spots
– Say by segmenting the image using vision techniques
• In what ratio should the bot click?
– Depends on the relative appeal of the hot spots
– Requires human-level AI to get right
• Any error is a signal of bot proportion
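One way to turn "error" into a number: if observed clicks are a mixture of a human distribution and some bot distribution, the total variation distance between the observed and human distributions is a lower bound on the bot fraction, whatever the bots do. This is a sketch under that mixture assumption, ignoring sampling noise. The hot-spot names, counts, and function names are hypothetical, and this is not necessarily the estimator the authors use.

```python
def normalize(counts):
    """Convert click counts per hot spot into proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clicks recorded")
    return {spot: n / total for spot, n in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two distributions over hot spots."""
    spots = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in spots)


def bot_fraction_lower_bound(human_baseline, observed_counts):
    """If observed = (1 - f) * human + f * bot, then TV(observed, human) <= f."""
    return total_variation(human_baseline, normalize(observed_counts))


# Hypothetical data: built as 60% humans plus 40% bots clicking 25/25/50.
human_baseline = {"logo": 0.6, "button": 0.3, "photo": 0.1}
observed_counts = {"logo": 460, "button": 280, "photo": 260}

bound = bot_fraction_lower_bound(human_baseline, observed_counts)
print(f"at least {bound:.0%} of clicks are not human")
```

The bound is tight only when the bot distribution shares no mass with the human one; the closer the bot's guess, the weaker the signal.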
Fraudster Has to Click on Many Ads
Ad Network can Measure Humans
• At first, run ads on trusted partners
– Record distribution of human click location
– Easy to record (x, y) coordinates of click on web
• Cheap for ad network
– Was going to run ad anyway
• Expensive for attacker to influence
– Must use valuable bot clicks without payout
– Must be clicking everywhere all the time
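Recording the human baseline could look like the following. A minimal sketch, assuming clicks arrive as (x, y) pixel coordinates relative to the ad's top-left corner and are bucketed into a coarse grid; the 4x4 grid, the 300x250 ad size, and the function names are illustrative choices, not taken from the talk.

```python
from collections import Counter


def click_cell(x, y, ad_width, ad_height, cols=4, rows=4):
    """Map a click at pixel (x, y) to a (row, col) grid cell."""
    if not (0 <= x < ad_width and 0 <= y < ad_height):
        raise ValueError("click outside the ad")
    col = int(x * cols / ad_width)
    row = int(y * rows / ad_height)
    return (row, col)


def click_distribution(clicks, ad_width, ad_height, cols=4, rows=4):
    """Fraction of clicks landing in each grid cell."""
    counts = Counter(
        click_cell(x, y, ad_width, ad_height, cols, rows) for x, y in clicks
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


# Hypothetical clicks collected from a trusted partner site.
trusted_clicks = [(10, 10), (20, 15), (290, 240)]
baseline = click_distribution(trusted_clicks, ad_width=300, ad_height=250)
print(baseline)
```

A coarser grid needs fewer trusted clicks to estimate well but gives a bot more room to guess correctly.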
A Work in Progress
• Need to validate diversity in distribution
– Will run real ads and measure click location
– How does distribution vary by screen location of ad?
• Experiment with ad design
– Objective: human click location hard for bot to predict
• Text ads?
– Less area to click and less enticing visuals
– There still might be a valuable signal in click location
Conclusions
• BotNets 2.0 are coming
– Cheap, large-scale, ephemeral bots in the browser
– Don’t require full-machine compromise
– Heuristic click-fraud detection’s days are numbered
• Click location can divide humans from bots
– Accurate simulation requires human cognition
– Easy for ad networks to deploy
– More science needed to determine effectiveness
Thanks!