Transcript Document

Defending Sybil Attack
in Peer2Peer Networks
Distributed Search Techniques
Md. Tanvir Al Amin 04 09 05 2064
Shah Md. Rifat Ahsan 10 09 05 2060
Adviser : Dr. Reaz Ahmed
Sybil Attack

A fundamental problem in
distributed systems.
honest
malicious

Single user assumes many
fake/sybil identities
 Already observed in real-world
p2p systems

launch
sybil
attack
Sybil identities can become a
large fraction of all identities
 “Out-vote” honest users in
collaborative tasks
2
Sybil attack




Present in both Application level and P2P Networking
Attacker creates many fake/sybil identities
Many cases of real world attacks : Digg, Youtube
Several research works shown how easy it was to
subvert DHT like Chord or Kademlia using Sybil
Attack
Automated sybil attack on
Youtube for $147!
Defending against Sybil attacks

Traditional solutions rely on central trusted
authorities
 Runs counter to open membership policies of OSNs

Recent proposals leverage social networks
 Lots of research activity recently
 Each optimized under assumptions about the graph
structure
 Each evaluated on different datasets
All schemes analyze the graph
structure to isolate Sybils
SybilGuard [SIGCOMM’06]
SybilLimit [Oakland’08]
Ostra [NSDI’08]
SumUp [NSDI’09]
SybilInfer [NDSS’09]
Whanau [NSDI’10]
MobID [INFOCOM’10]
Defending against Sybil attacks

Recent proposals leverage social networks





Key Insight: Social links are hard to acquire in abundance
Look for small cuts in the graph
Conversely, look for communities around known trusted nodes
Dunbar’s Number
Power law node degrees
Links difficult to create
HOW DO SOCIAL NETWORKS LOOK LIKE
SybilGuard: Defending Against Sybil Attacks
via Social Networks
Sybilguard is a system for detecting Sybil nodes in social graphs.
Features of Sybil Guard
 SybilGuard enables an honest node to identify other nodes
 Verifier node V can verify if suspect node S is malicious
 Guaranteed bound on number of sybil groups
 Guaranteed bound on size of sybil groups
 Completely decentralize
Key Insight:
1. Use a social network to limit Sybils
2.Social links are hard to acquire in abundance
3.Look for small cuts in the graph
DBLP Network
Dunbar’s number

Limits the # of stable social relationships a user can have

To less than a couple of hundred
Linked to size of neo-cortex region of the brain
Observed throughout history since hunter-gatherer societies

Roughly reported to be



150
Also observed repeatedly in studies of OSN user activity


Users might have a large number of contacts
But, regularly interact with less than a couple of hundred of them
Power-law node degrees
U.S. highways
U.S. Airlines
9
Path lengths and diameter


all major networks have short path length from 4.25
– 5.88
six degrees of separation
Facebook, 4.2
million for Octorber
2007, 6.12 from
http://blog.paulwalk.net/2007/10/
08/no-degrees-of-separation/
10
Implications of Path lengths and diameter
The small diameter and path lengths of social
networks are likely to impact the design of techniques
for finding paths in such networks
11
Link degree correlations




high-degree nodes tend to connect to other high-degree nodes ? OR
high-degree nodes tend to connect to low-degree nodes ?
In real society: the former theory is true.
By virtue of two metrics: the scale-free metric and the assortativity.

Suggests that there exists a tightly-connected “core” of the
high-degree nodes which connect to each other, with the
lower-degree nodes on the fringes of the network.

The next question: How big the core is
12
Implications of Link degree correlations
Spread of Information
“A Measurement-driven Analysis of Information Propagation in the Flickr
Social Network” [WWW’ 09]
13
Densely connected core

the graphs have a densely connected core comprising of
between 1% and 10% of the highest degree nodes such that
removing this core completely disconnects the graph.
Sub logarithmic growth
14
Densely connected core

the graphs have a densely connected core comprising of
between 1% and 10% of the highest degree nodes such that
removing this core completely disconnects the graph.
Sub logarithmic growth
15
Implications of densely connected core

Network contains dense core of users





Core necessary for connectivity of 90% of users
Most short paths pass through core
Could be used for quickly disseminating information
So 10% at core
What about remaining nodes (90% at fringe)
16
What does the structure look like
the networks contain a
densely connected core of
high-degree nodes;
octopus
and that this core links
small groups of strongly
clustered, low-degree
nodes at the fringes of
the network.
Mixing time



Random walk: choose each hop randomly
Mixing time: #hops until uniform probability
Fast mixing network: mixing time = O(log n)
Sampling by random walks

A random walk has o(1) chance of escaping*
 True when g bounded by o(n/log n)
 Of r walks, (1-o(1))r = Ω(r) end nodes are good!
 Can’t distinguish good from bad nodes in set
Honest
region
Sybil
region
non-escaping path
escaping
paths
Creating Social Link Is Hard
Social links maintained over Internet
Social network
…
Social network
Honest
region
Attack edges
A malicious user fools an honest user
Creates an attack edge
Sybil
region
…
Sybil resilience & group attachment theory




Sybil schemes find bond groups around a trusted node
But, these are only a fraction of all honest nodes
Bond groups are hard for Sybils to infiltrate
Not the case with identity groups
Yu, Kaminsky, Gibbons, Flaxman, Sigcomm 2006
SYBILGUARD
Problem Formulation and Objective

Social network
» n honest human users
» 1+ malicious users : multiple sybil identities
» SybilGuard enables an honest node to identify other
nodes
» Verifier node V can verify if suspect node S is malicious
SybilGuard

Guaranteed bound on number of sybil groups
»
»

Guaranteed bound on size of sybil groups
»

Divides n nodes into m equivalence classes
A group is sybil if it contains 1+ sybil nodes
In a group, at most w sybil nodes
Completely decentralized
»
»
»
An honest node accepts honest nodes with high probability
Rejects malicious nodes with high probability
Accepts bounded number of sybil nodes
Random Routes



Foundation of SybilGuard: different from random walk
Random route begins at a random edge of a node
At every node
» For an incoming edge i, there is a unique outgoing edge j
» Thus, input to output is one-to-one mapped


A node A with d neighbors uniformly randomly
chooses a permutation “x1,x2, . . . ,xd” among all
permutations of 1,2, . . . ,d.
If a random route comes from the ith edge, A uses
edge xi as the next hop.
SybilGuard Algorithm

Attack Model



node A: verify node B






n honest users: One identity/node each
Malicious users: Multiple identities each (sybil nodes)
A computes d random routes (length w)
B computes d random routes (length w)
If d/2 random routes intersects, accept S
Else reject S
If few attack edges, then a sybil node’s random route is less likely to
reach honest region
And vice-versa
Main Assumptions of SybilGuard
Attack
edges
Honest
Nodes
Sybil
Nodes
Properties of Random Routes

Convergence
» Once two routes merge, they will remain merged




Routes are back-traceable
There can be only one route with length w that
traverses e along the given direction at its ith hop
If two random routes ever share an edge in the same
direction, then one of them must start in the middle
of the other
Cycles can exist, but with low probability
» Prob. (diameter k cycle) = 1/d(k-2)
Sybilguard Algorithm
B
Steps: 2
Step
1:
Choose a verifier (A)
and a suspect
Bootstrap
the(B).
network.
A and B send out
random
of a
All
userswalks
exchange
certain length
signed
keys. (2).
Lookexchange
for intersections.
Key
implies
that both parties are
A knows B is not a Sybil
human
and
because multiple paths
trustworthy.
intersect and they do so
at different nodes.
A
32
SybilGuard Algorithm, cont.
A
B
33
SybilGuard Caveats


Bootstrapping requires human interaction.
Assumes short random walks lie mostly in the honest
region
 Results in poor threshold to colluding attackers.
 In a million node network ,each attack edge
accepts nearly 2000 sybil nodes.
 In million node network , SybilGuard cannot
bound the number of sybils at all if there are >
15,000 attack edges .
SybilLimit
A Near-Optimal Social Network Defense Against Sybil Attacks
SybilLimit
A Near-Optimal Social Network Defense Against Sybil Attacks



Motivation : To mitigate the problems of SybilGuard.
Basic insight : Social network (same as SybilGuard)
SybilLimit Novelity :
1. use many random routes but shorter ones.
2. intersect edges not nodes
3. limit how often each edge is used.
Identity Registration

Each node (honest or sybil) has a locally generated public/private key pair
 “Identity”: V accepts S means
V accepts S’s public key KS
 NO assumption/need PKI

Every suspect S “registers” KS on some other nodes
Registration Goals

Ensure that sybil nodes (collectively)
register only on limited number of
honest nodes

Still provide enough “registration
opportunities” for honest nodes
K: registered keys of
sybil nodes
K: registered keys of
honest nodes
K
K
K
K
K
K
K
K
K
K
K
K
K
K
honest region
K
K
sybil region
Acceptance Criteria

Accept S only if KS is register on
K: registered keys of
sybil nodes
K: registered keys of
honest nodes
sufficiently many honest nodes
K
K
K
K
K
K
K
K
K
K
K
K
K
K
honest region
K
K
sybil region
Key Idea

Take random “walks” of w=  (logn)
hops
 Honest nodes: likely to remain in honest region*
 Sybil nodes: must cross an attack edge to reach honest region
K
K
K
• Register key at
last hop of “walk”
K
K
K
K
K
K
K
K
K
K
K
K
K
honest region sybil region
Verification Procedure
AB
1. request S’s set of tails
2. I have three tails
AB; CD; EF
V
S
3.common tail: EF
4. Is KS registered?
5. Yes.
V accepts S
EF
F
CD
4 messages involved
Tails intersect + key registered
Sybil nodes accepted
g Attack edges

g  O n / log n


g between  n / log n
and On / log n
g  n / log n

SybilGuard
SybilLimit
 ( g  n log n)
 ( g  log n)
unbounded
 ( g  log n)
unbounded
unbounded
SybilInfer: How to Win
the Zombie Wars!
Prateek Mittal, George Danezis
(MSRC Intern) (MSR
Cambridge)
SybilInfer




Work from UIUC and Microsoft Research
A centralized algorithm
Uses the fast mixing properties of social network to
design a Bayesian Classifier
Classify nodes
Formal Model

Assign probabilities of cuts being honest
P( X  Honest| T )

Using Bayes Theorem, we have that :
P ( X  Honest | T ) 
P(T |X  Honest ) P ( X  Honest )
Z
Z  X V P(T | X  Honest)  P( X  Honest)

Next Challenge: Model
P(T | X  Honest)
Formal Model
probX X
probXX 
1
 EXX
|V |
probX X 
1
 EX X
|V |
P(T | X  honest) 
probX X
X
probXX
X
probX X
 prob N  prob N  prob N  prob N
XX
XX
XX
XX
XX
XX
XX
XX
SYBIL PROOF DHT
Distributed Hash Table


Interface: PUT(key, value), GET(key)→value
Route to peer responsible for key
GET( sip://alice@foo )
PUT( sip://alice@foo, 18.26.4.9 )
DHTs are subject to the Sybil attack


Attacker creates many
pseudonyms
Disrupts routing or
stabilization
s
t
{IDt}
The Sybil attack on open DHTs
Brute-force attack
Clustering attack
Sybil Proof DHT

How to build a sybil resilient DHT ?
Works from MIT PDOS Group


Parallel and Distributed Operating Systems
Quest to build Sybil Proof DHT
 Sybil-resistant DHT routing 2005
A Sybil proof One hop DHT SocialNets 2008
 Whanau NSDI 2010

A Sybil proof one hop DHT
Motivation:
 SybilGuard/SybilLimit:Not a DHT, but a “general” Sybil defense
 Honest node accepts at most O(g log n) Sybils
Features :
 DHTs are subject to the Sybil attack
 Social networks provide useful information
 Created a Sybil-resistant one-hop DHT
 Resistant to g = o(n/log n) attack edges
 Table sizes and routing BW O(√n log n)
 Uses O(1) messages to route
Basic one-hop DHT design


Construct finger table by r random walks
Route to t by asking all fingers about t
 If r = Ω(√n log n), some finger knows t WHP

Adversary cannot interfere with routing
s
{t’s IP address}
{know t?}
{know t?}
{t?}
{t?} {t?}
{t?}
{t?}
{forwarded message from s}
t
r
r
Properties of this solution




Finger table size: r = O( nlog n)
Bandwidth to construct: O(r log n) bits
Bandwidth to query: O(r) messages
Probability of failure: 1/poly(n)
Chris Lesniewski-Laas M. Frans Kaashoek NSDI 2010
WHĀNAU: A SYBIL-PROOF
DISTRIBUTED HASH TABLE
Contribution

Whānau: an efficient Sybil-proof DHT protocol
 GET cost: O(1) messages, one RTT latency
 Cost to build routing tables: O(√N log N)
storage/bandwidth per node (for N keys)
 Oblivious to number of Sybils!



Proof of correctness
PlanetLab implementation
Large-scale simulations vs. powerful attack
Social network
Honest
region
Attack edges
Sybil
regio
n
…
Random walks
c.f. SybilLimit [Yu et al 2008]
Building tables using random walks
c.f. SybilLimit [Yu et al 2008]
What have we accomplished?
• Small fraction (e.g. < 50%) of
bad nodes in routing tables
• Bad fraction is independent
of number of Sybil nodes
key
value
PUT(key, value)
PUT Queue
key
SETUP
Social
Network
LOOKUP
Routing Tables
value
Routing table structure



O(√n) fingers and O(√n) keys stored per node
Fingers have random IDs, cover all keys WHP
Lookup: query closest finger to target key
Zyzzyva
Finger tables:
(ID, address)
Aardvark
Key tables:
Kelvin (key,value)
Keynes
From social network to routing tables


Finger table: randomly sample O(√n) nodes
Most samples are honest
ID
IP address
Honest nodes pick IDs uniformly
A
B
Z
C
Y
D
X
E
W
F
V
U
G
T
H
I
S
J
R
K
Q
L
P
O
N
M
Plenty of fingers near key
Sybil ID clustering attack
A
B
Z
C
Y
D
X
E
W
F
V
U
G
T
H
I
S
J
R
K
Q
Many bad fingers near key
L
P
O
N
M
[Hypothetical scenario: 50% Sybil IDs, 50% honest IDs]
Honest layered IDs mimic Sybil IDs
Layer 0
Layer 1
A
A
D
X
C
Y
C
Y
B
Z
B
Z
E
W
D
X
F
V
E
W
F
V
U
G
U
G
T
H
T
H
I
S
J
R
K
Q
L
P
O
N
M
I
S
J
R
K
Q
L
P
O
N
M
Every range is balanced in some layer
Layer 0
Layer 1
A
A
C
Y
B
Z
B
Z
D
X
C
Y
E
W
D
X
F
V
E
W
F
V
U
G
U
G
T
H
T
H
I
S
J
R
K
Q
L
P
O
N
M
I
S
J
R
K
Q
L
P
O
N
M
Two layers is not quite enough
Layer 0
Layer 1
A
A
B
Z
C
Y
S
R
Q
L
P
O
N
M
I
J
K
E
W
F
V
T
D
X
E
W
Ratio =
1 honest
:
10
Sybils
C
Y
D
X
U
B
Z
F
V
G
U
H
T
Ratio =
10 honest
:
100
Sybils
S
R
Q
L
P
O
N
M
G
H
I
J
K
Log n parallel layers is enough
Layer 0
X
Y Z
A
B C
D
E
F
G
H
I
J
W
V
U
T
S
R
Q
Layer 1
P O


L
N M
K
W
V
U
T
S
R
X
Y Z
A
B C
Layer 2
D
E
F
G
H
I
Q
P O
L
N M
K
J
W
V
U
T
S
R
X
Y Z
A
B C
Layer L
D
E
F
G
H
I
Q
P O
L
N M
K
J
log n layered IDs for each node
Lookup steps:
1.
2.
3.
Pick a random layer
Pick a finger to query
GOTO 1 until success or timeout
…
W
V
U
T
S
R
X
Y Z
A
B C
D
E
F
G
H
I
Q
P O
L
N M
K
J
From Social relations to Routing Tables
key
value
PUT(key, value)
PUT
Queue
key
SETUP
Social Network
LOOKUP
Routing Tables
value
Problems



Whanau’s goal is to create a Sybil proof DHT
Which ensures delivery
Whanau uses the idea of random walk in fast mixing
graphs
 Whanau has changed the basic structure of DHT
 Tables contain O(√n log n) entries !!
 The DHT has become a one hop DHT
 But O(√n) entries are insane !!
 Think of a DHT with 100000000 users

How to handle churn ??
OUR IDEA OF A SYBIL PROOF DHT
Our Idea



We are given a social graph
Each node knows about their friends in the social
graph
Same assumptions about SybilGuard or Whanau
 Fast mixing graphs
 Small cut around attack edges
 o(n/log n) attack edges at most
Our Motivation for DHT




Isn’t it possible to keep the basic routing features of
a DHT while making it sybil resilient?
O(log n) table size
Lookup should take O(log n)
We should use social information to build the DHT
Bootstrapping the DHT

Here comes the fundamental question
 How to convert a given social graph into a DHT
 So that the socially connected nodes are near
 Socially far nodes are far in the DHT
 Sybil nodes require significant amount of social
engineering to be strongly connected members of a social
group
A new type of DHT

We want to build a DHT
 Where distance between two nodes in the DHT-Space is
related to their social-distance
 i.e, two friends in the social graph are expected to be onehop distant in the DHT-Space
 Most of the queries will be through friends
 Hence, the probability of reaching a Sybil node is less
 We use the idea of


Plexus
A novel DHT routing based on linear block codes
Plexus: A Scalable Peer-to-Peer Protocol Enabling Efficient Subset
Search : Reaz Ahmed and Raouf Boutaba

ACM/ IEEE TON Feb 2009
Plexus: Index Clustering
Cluster
C = set of cluster heads
Pattern
Linear code, C <n,k,d>
Cluster head
Cluster head  Codeword
Generator matrix based routing
Advertisement, P
advSet(P)  C
Query, Q
qSet(Q)  C
Q  P  qSetQ  advSetP  
85/15
 g47
111 0 g120
1  g
 g  g
2
 23
 021 1 g 220
    
G
 15 0 0 1
   
0gE
  0 0 g k02
k  g k 1

0 1 1 g11n 
  g 2 n 
0 0 1 1
   
0 1 0 1
   

1
1
1
0
  g kn 
<7, 4, 3> Hamming code
Linear Binary Code

C = <n, k, d> linear binary code
 n: number of bits in a codeword
 k: dimension  2k codeword in code
 d: minimum distance between any pair of codeword
G
 e.g., 24=24, 12, 8


Generator Matrix G,
 g1   g11 g12
g  g
 2   21 g 22
G      
  
   
 g k   g k1 g k 2
  g1n 
  g 2 n 
   

   
  g kn 
2k codeword can be formed by applying XOR to
any combination of these k codewords.
86/15
Plexus: Routing Table


In a complete network each peer is responsible for a
codeword
Peer with codeword X maintains links to k+1 peers
with IDs computed as:
X  gi
1 i  k
 Xk+1 = X  g1  g2  …  gk
 Xi =

Xk+1 is used for:
 Replication
 Reducing routing cost
87/15
Plexus: Routing

Observation: C is closed under  operation
X , Y  C  Y  X  gi1  gi2  git
X21=X2g1
…
X23=X2g3
…
X2k=X2gk
X1=Xg1
X2=Xg2
…
Xk=Xgk
Example: Route from X to Y
where,
X231=X23g1
…
X235=X23g5
…
X23k=X23gk
Y  X  g 2  g3  g5
X2
g3
g2
X
g3
X2
g5
X23
g2
X3
g5
X25
X23
X235=Y
g5
g3
Y
X k 1  X  g1  g2  gk
g5
X5
g2
g3
X35
g2
88/15
Strengths of Plexus Routing

Hamming distance based clustering & indexing

Maximum routing hops (within a subnet)


89/15

½ K in normal condition

½ K +2 in presence of failure.
Disjoint routing paths

Source X destination Y

XY is disjoint from XYK+1
X
Y’
Y
Y’K+1
YK+1
Alternate routing paths

Suitable for Multicasting

Improved fault resilience

Improved load balancing
Y
X
YK+1
Social Network to Plexus


Now, the problem reduces to assigning appropriate
linear block codes to the nodes
How to do that ?
Naïve Idea


All nodes u know their friends F1(u). All nodes u send
F1(u) to all of their friends.
At this point, Every node u, in addition to F1(u), can
calculate its "mutual friend list" for each of its
friends. For any two friends u, v :
 Their mutual friend set is M1 (u, v)  F1 (u)  F1 (v)

Every node u, can also calculate F2(u), its exact twohop distant friend list.
F2 (u)  F1 ( F1 (u))  F1 (u)  {u}
Naïve Idea


Each node u, sorts their friends according to an
"influence metric.“ For each friend v of a node u,
Influence(u,v) = Influence of v on u = I (u, v)
| M 1 (u, v) |
I (u, v) 
| F1 (u ) |
it is highly probable that a sybil node will have very
low influence on an honest node via attack edge due
to very small number of mutual friends.
 However not only sybils, but also a common friend of two
groups will have low influence on both group (however,
this case is not handled in any algorithms)
Naïve Idea

Each node u, calculates I(u,v) and I(v,u) for all its
friends. There are 2*deg(u) such quantities.
 C(u) = Those nodes for which u has more influence on v
than v has on u
 P(u) = Those nodes for which v has more influence on u
than u has on v.
 and R(u) = Those nodes for which u and v both has same
influence on each other
C (u)  {v : v  F1 (u), I (v, u)  I (u, v)}
P(u)  {v : v  F1 (u), I (u, v)  I (v, u)}
R(u)  {v : v  F1 (u), I (u, v)  I (v, u)}
Naïve Idea

max{ C(u) } = x = The friend, on which u is maximum
influential.
 However, it doesn’t mean x doesn’t have a friend more
influential than u. It means, u does not have a friend on
which it has more influence than it has on x.

max { P(u) } = y = The friend which has the highest
influence on u.
 It also doesn’t mean y doesn’t have friends on which it has
more influence than it has on u.

max { R(u) } = z = The friends which has same
influence on u as u has on them.
Naïve Idea

lx = I(x,u) ; Iy = I(u,y) , Iz = I(u,z) = I(z,u)
MI = { Ix, Iy, Iz}
If Max { MI } = Ix : u is an “influencial” node
Iy : u is an “influenced” node
Iz : u is an “neutral” node
Naïve Idea

Action D : If u is “influenceD”, it decides not to
generate any ID, and decides to take command from
y. It sends a message to y that it has come into his
control.
Action L : If u is an influentiaL node, it decides to
generate ID for u and some of F1(u) and F2(u)
Action N : If u is Neutral, then decides Action L or
Action D by a uniform bernoulli trial.




Now, u generates ID for itself, for those of Gang(u).
It will try to keep friend IDs as close as possible, also
those of Gang(u) which are friends themselves will
get close ID as possible.
u will inform all of Gang(u) all the ids generated by it.
Members of Gang(u) will take care of id generation
of their neighbors


But how to handle collision ?
Some gossip protocols needed !!
Naïve Idea

Thus Id’s will be assigned in the code space
according to their “Social Groups”