Transcript Slides

Defending against large-scale crawls in
online social networks
Mainack Mondal†
Peter Druschel†
†MPI-SWS
Bimal Viswanath†
Krishna Gummadi†
‡Northeastern
Allen Clement†
Alan Mislove‡
University
CoNEXT, December 2012
Ansley Post†*
*Now at Google
Lots of personal data on Online Social Networks (OSNs)
CoNEXT, December 2012
2
What is the concern with aggregation of this large data?
 Aggregators can mine this large data
 To infer attributes missing in the data, e.g. sexual orientation
 Aggregators can republish this data in easily accessible form
 Neither user nor OSN has control over usage of crawled data
 Problem for OSN operators


In 2010, 171 M Facebook user’s data
User data is valuable asset to OSN operatorspublished in BitTorrent
OSN operators are blamed for misuse of user data [NYTimes ’10]
OSNs need to limit large-scale aggregation of user data
CoNEXT, December 2012
3
Challenge
 We are defending against a crawler who
 Wants to crawl as many accounts as possible
 Wants to crawl as fast as possible
 Our goal is
 Limit the rate of crawling
 Make the crawlers as slow as possible
CoNEXT, December 2012
4
Existing solution: Simple rate-limiting
 OSNs rate-limit on per-account or per IP address basis
 Crawlers can defeat rate-limit using multiple accounts
Or, the
crawlers
usemultiple
compromised
accountsor Sybils
The
crawlers
cancan
create
fake accounts
CoNEXT, December 2012
5
Our solution: Genie
 Assumption: Social links to good users are harder to get
than accounts
 Replace user-account-based rate-limiting with link-based
rate-limiting
CoNEXT, December 2012
6
Outline
 Background and key idea
 Genie design



Credit networks
How to use credit networks to defend against crawlers
Using difference between user and crawler activity
 Genie evaluation
CoNEXT, December 2012
7
Credit Networks [EC ‘11]
 Nodes trust each other by providing pair-wise credit
 Credit is used to pay for the services received
1
2
A
4
5
CoNEXT, December 2012
B
8
Credit Networks [EC ‘11]
 Nodes trust each other by providing pair-wise credit
 Credit is used to pay the services received
A
2
3
5
6
2
3
4
3
C
B
To obtain a service, find path(s) with sufficient credits
CoNEXT, December 2012
9
How can we map OSN to credit networks ?
 OSN operator forms credit network from the social network
 Operator replenishes credit on each link at a fixed rate
 Credit deducted from links to view another user’s profile
A
2
3
5
6
3
4
2
3
2
3
C
3
CoNEXT, December 2012
D
4
B
10
How do credit network defend against crawlers?
Rest of the
Network
(normal users)
Sybil accounts
Compromised accounts
Attack cut is small
Attack cut may be larger
(SybilRank, NSDI 2012)
Amount of crawling is proportional to attack cut
CoNEXT, December 2012
11
1
Difference between normal users and crawlers
 Reciprocity in profile views
 Normal users are more reciprocal than crawlers
 Repeated profile views
 Normal users repeatedly visit the same set of profiles
 Locality of views
CoNEXT, December 2012
12
Difference in locality between normal users and crawlers
 Renren graph and user browsing trace [IMC ‘10]

100
90
80
70
% of60
views
50
40
30
20
10
0
33 K users, 96 K activities (2 weeks)
normal user activity
crawler activity
1-hop
2-hop
> 2-hop
Flickr: Mislove et al. [WOSN ‘08]
Most of the normal views are local
Orkut: Cha et al. [IMC ‘09]
CoNEXT, December 2012
13
Genie design principles
 Use a credit network to rate limit links
 Exploit difference between normal and crawler activity to
discriminate crawlers

Charge more for views further away
CoNEXT, December 2012
14
Genie design
 New charging model: Pay more to view profiles far away
Credit charged per link = Shortest path distance between two nodes -1
A
1
3 -2
4
6-2
2
4- 2
2+2
4
2+2
4
3+ 2
5
C
D
B
Rate of crawling decreases with increased path length
CoNEXT, December 2012
15
Outline
 Background and key idea
 Genie design



Credit networks
How to use credit networks to defend against crawlers
Using difference between user and crawler activity
 Genie evaluation
CoNEXT, December 2012
16
Genie evaluation
 Does Genie limit attackers while allowing normal users?
 The parameter to tweak: Credit replenishment rate per link
 Replenishment rate too high: Crawlers will be allowed
 Replenishment rate too low: Users will be heavily penalized
CoNEXT, December 2012
17
Experimental setup
 Genie simulator written in C++
 Input: social graph and user activity trace
 Output: allowed/flagged for each activity
 Normal user activity trace from Renren
 Generated multiple synthetic traces for other graphs
 We model a strong and efficient crawler
 Crawler controls compromised user accounts
 Each good user profile is crawled once
 Crawlers try to crawl as many profiles as possible
CoNEXT, December 2012
18
Does Genie limit crawlers?
% of users
crawled
per week
Only 2.7% of the network is crawled in 1 week
Credits/week per link
The crawlers are slowed down ~3000 times
CoNEXT, December 2012
19
Does Genie penalize good users?
% of user
activity
flagged
2.6% of total activities from 0.8 %users flagged
Credit/week per link
CoNEXT, December 2012
20
Does Genie penalize good users?
10
8
% of user
activity
flagged
Trade-off point
6 % of users
crawled
4 per week
2
0
Credit/week per link
CoNEXT, December 2012
21
Who are these flagged users?
 3 Users with very high number of random profile views
 Shows crawler like behavior
 70% of the flagged activity are by these users
 Users with normal # of profile views but very few friends
 99% of flagged users have less than 5 friends
 Adding 4 more friends unflags 97% of these users
CoNEXT, December 2012
22
Efficiency of Genie
 In our Genie simulator
 To scale up Genie we used Canal library [EuroSys ’12]
 Multithreaded implementation
 Used a 24-core, 48 GB physical memory machine for evaluation
 For a million node social graph
 Memory overhead 5 GB
 Each view request processed in 0.65 ms on average
CoNEXT, December 2012
23
Summary
 We propose rate-limiting links to defend against crawlers
 We strengthen our defense using difference between
normal user and crawler activities
 We evaluated Genie on real world user activity trace
CoNEXT, December 2012
24
Thank you
CoNEXT, December 2012
25