Privacy-Preserving Social Network Publication Against Friendship
Download
Report
Transcript Privacy-Preserving Social Network Publication Against Friendship
PRIVACY-PRESERVING
RELEASES OF SOCIAL
NETWORKS
Chih-Hua Tai
Dept. of CSIE, National Taipei University,
New Taipei City, Taiwan
DATA MINING
The primary task in data mining: development of
models about aggregated data.
Finding frequent patterns
Finding rules
2
FINDING PATTERNS
3
FINDING RULES
4
DATA MINING VS. PRIVACY
The primary task in data mining: development of
models about aggregated data.
Finding frequent patterns
Finding rules
Can we develop accurate models without access to
precise information in individual data records?
Why?
5
PRIVACY ISSUES IN DATA
6
PRIVACY ISSUES IN DATA
7
PRIVACY ISSUES IN DATA
8
DATA MINING AND PRIVACY
The primary task in data mining: development of
models about aggregated data.
Can we develop accurate models without access to
precise information in individual data records?
Answer: yes, by randomization.
R. Agrawal, R. Srikant “Privacy Preserving Data Mining,”
SIGMOD 2000
How about the data utility?
9
DATA MINING VS. SOCIAL NETWORKS
Attributes:
Name, Salary, …
Links:
Friends, Neighborhood, …
Communities:
Interests, Activities, …
10
PRIVACY ISSUES ON SOCIAL NETWORKS
Personal information leaked, even if the vertex
identifies are hidden…
Many information can be
used to re-associate the
vertex with its identity.
Vertex degree:
k-degree anonymity , …
Neighborhood configuration:
k-neighborhood anonymity,
k-automorphism anonymity,
k-isomorphism anonymity,
grouping-and- collapsing, …
11
C.-H. Tai, P. S. Yu, D.-N. Yang and M.-S. Chen, "Privacypreserving Social Network Publication Against Friendship
Attacks," In KDD, 2011.
13
FRIENDSHIP ATTACK
Still there are another type of information for
vertex re-identification – friendship attack
14
FRIENDSHIP ATTACK
Given a target individual A and the degree pair
information D2 = (d1,d2), a friendship attack (D2,A)
exploits D2 to identify a vertex v1 corresponding
to A in a published social network where v1
connects to another vertex v2 with the degree
pair (dv1,dv2) = (d1,d2).
1
2
4
6
Alice
3
5
7
9
8
10
Ex 1. Assume that an
attacker knows that Alice
has 3 connections, Bob has 2
connections, and Alice and
Bob are friends. The attacker
identifies v9 as Alice with
100% confidence.
15
FRIENDSHIP ATTACK
In DBLP data set, the percentages of vertices that can be
re-identified with a probability larger than 1/k by degree
and friendship attacks.
Original
Social Network
k-degree anonymized
Social Network
k
Degree
Attack
Friendship Attack
Friendship Attack
5
0.28%
5.37%
2.89%
10
0.53%
10.69%
4.65%
15
0.73%
14.71%
5.82%
20
0.93%
18.44%
7.23%
16
NEW PRIVACY MODEL AGAINST
FRIENDSHIP ATTACK
k2-degree Anonymity
A social network is k2-degree anonymous if, for
every vertex with an incident edge of degree
pair (d1,d2), there exist at least k − 1 other
vertices, each of which also has an incident
edge of the same degree pair.
1
2
4
6
3
5
7
9
8
10
Ex 2. Even with the knowledge
(D2,A)=((3,2),Alice), the probability
that an attacker can re-identify
Alice in the 22-degree anonymous
social network is limited to ½.
17
THE ANONYMIZATION
Problem formulation:
Given a graph G(V, E) and an integer k, the problem is to
anonymize G to satisfy k2-degree anonymity such that
information distortion is minimized.
The challenges:
Any alteration on an edge will affect the degrees of two
vertices.
18
GRAPH ANONYMIZATION ALGORITHMS
Integer Programming formulation
Obtain the optimal solution with bad scalability
DEgree SEqence ANonymization (DESEAN)
Step1. Degree Sequence Anonymization.
- determine the groups of vertices protecting each
others
Step2. Privacy Constraint Satisfaction.
- eliminate the advantage of knowing friendship
information
Step3. Anonymous Degree Realization.
- have the vertices in the same group share the
same vertex degree
19
ALGORITHM DESEAN
Step1. Degree Sequence Anonymization.
Cluster vertices with similar degrees and select a
target degree dx for each cluster x s. t. each cluster
contains at least k vertices and the weighted degree
difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as
small as possible.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
5
7
9
8
10
20
ALGORITHM DESEAN
Step1. Degree Sequence Anonymization.
Cluster vertices with similar degrees and select a
target degree dx for each cluster x s. t. each cluster
contains at least k vertices and the weighted degree
difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as
small as possible.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
8
10
2
4
5
7
9
1
6
3
5
7
8
21
9
10
ALGORITHM DESEAN
Step2. Privacy Constraint Satisfaction.
Add or delete edges between clusters to ensure that,
for each pair of clusters (x,y), the number of vertices
in x directly connected to the vertices in y is either
zero or not less than k.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
5
7
8
22
9
10
ALGORITHM DESEAN
Step2. Privacy Constraint Satisfaction.
Add or delete edges between clusters to ensure that,
for each pair of clusters (x,y), the number of vertices
in x directly connected to the vertices in y is either
zero or not less than k.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
1
4
5
7
2
8
6
3
5
7
8
23
9
10
9
10
ALGORITHM DESEAN
Step3. Anonymous Degree Realization.
Adjust edges in G s. t. the vertices in each cluster x
meet the target degree dx selected in Step 1.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
5
7
8
24
9
10
ALGORITHM DESEAN
Step3. Anonymous Degree Realization.
Adjust edges in G s. t. the vertices in each cluster x
meet the target degree dx selected in Step 1.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
8
10
2
4
5
7
9
1
6
3
5
7
9
8
10
25
C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, ”Structural
diversity for privacy in publishing social networks,” In SDM, 2011.
26
COMMUNITY IDENTIFICATION
Vertex identification is considered to be an important
privacy issue in publishing social networks.
◦
k-degree anonymity, k-neighborhood anonymity, …
In addition to a vertex identity, each individual is also
associated with a community identity.
◦
Could be used to infer the political party affiliation or disease
information sensitive to the public.
◦ Is a kind of structural information
27
COMMUNITY IDENTIFICATION
Community information is explicitly given:
Ex.
Alice knows recently…
Bob is sick
Bob participates in this social network
Bob makes 5 friends. (vertex degree attack)
AIDS Com.
Bob has AIDS!
SLE Com.
28
COMMUNITY IDENTIFICATION
Community information is not given:
Ex.
Alice knows Bob participates in this social network and
has 5 friends. (vertex degree attack)
Alice can know the approximation of Bob’s
neighborhood.
29
COMMUNITY IDENTIFICATION
% of vertices violating k-structural diversity
◦
(a) DBLP
◦ (b) ca-CondMat
k-degree anonymization is insufficient
30
COMMUNITY IDENTIFICATION
structural diversity
◦
(a) original DBLP
◦ (b) 10-degree anonymized DBLP
Vertices with large degrees appear in a small set of
communities
31
NEW PRIVACY MODEL AGAINST
COMMUNITY IDENTIFICATION
k-Structural Diversity
To protect against vertex degree attack, for each vertex, there
should be other vertices with the same degree located in at
least k-1 other communities.
If a graph satisfies k-structural diversity, then it also
satisfies k-degree anonymity.
32
THE ANONYMIZATION
Problem formulation:
Given a graph G(V, E, C) and an integer k, 1≦ k ≦ |C|, the
problem is to anonymize G to satisfy k-structural diversity such
that information distortion is minimized.
The challenges:
How to preserve community structures, even in the
implicit cases, while preserving privacy.
33
PROBLEM FORMULATION
Operation Adding Edge
◦
Connect two vertices belonging to the same community.
◦ Can avoid destroying the communities.
34
PROCEDURE MERGENCE
◦
To protect a vertex v in an existing anonymous group, in
which all the vertices have the same degree d
Com. 1
Com. 2
v
Com. 3
35
PROCEDURE MERGENCE
◦
To protect a vertex v in an existing anonymous group, in
which all the vertices have the same degree d
Com. 1
Com. 2
v
Com. 3
36
PROCEDURE CREATION
To create a new anonymous group for a vertex v, such
that all the vertices in the group locate in at least k
difference communities and have the same degree as v
Com. 1
Com. 2
Com. 3
v
37
PROCEDURE CREATION
To create a new anonymous group for a vertex v, such
that all the vertices in the group locate in at least k
difference communities and have the same degree as v
Com. 1
Com. 2
Com. 3
v
38
THE EDGE-REDITECTION MECHANISM
◦
Is defined on w, v, x in the same community
◦
w: an anonymized vertex
v and x : two not-yet-anonymized vertices
Is to replace the edge (w, v) with the edge (w, x)
v
v
w
w
x
39
THE EDGE-REDITECTION MECHANISM
Q. Why needs the Edge-Reditection mechanism?
By mergence
Com. 1
Com. 3
By mergence
Com. 1
Com. 2
v
Com. 3
w
Com. 2
v
w40
THE EDGE-REDITECTION MECHANISM
Q. Why needs the Edge-Reditection mechanism?
By creation
By creation
Com. 1
v
Com. 1
v
w
Com. 2
Com. 3
w
Com. 2
Com. 3
41
ALGORITHM EDGECONNECT (EC)
Procedures Mergence and Creation
◦
Let Rv be the set of edges that could be redirected away from v
The Edge-Reditection mechanism
42
PROBLEM FORMULATION
Operation Adding Edge
◦
Connect two vertices belonging to the same community.
◦ Can avoid destroying the communities.
Operation Splitting Vertex
◦
Replace a vertex v with a set of substitute vertices, such that
each substitute vertex is connected with at least one edge
incident to v originally.
◦ Each substitute vertex presents partial truth of the vertex v.
43
PROCEDURE CREATIONBYSPLIT
Split a set of vertices, including v, to create a new
anonymous group
Com. 1
v
Com. 3
Com. 2
Com. 1
v2
Com. 2
v1
Com. 3
44
PROCEDURE MERGEBYSPLIT
Split v into a set of substitute vertices s.t. each substitute
vertex is protected in some existing anonymous group
Com. 1
Com. 2
v3
Com. 3
v
v2
v1
45
C.-H. Tai, P.-J. Tseng, P. S. Yu and M.-S. Chen, "Identities
Anonymization in Dynamic Social Networks," In ICDM-11, 2011.
46
THE PROBLEM IN DYNAMIC
SCENARIOS…
A dynamic social network
will be sequentially
released.
G1
G2
An attacker can monitor a
victim for a period w.
Therefore, the adversary
knowledge includes:
The releases G t-w+1,
G t-w+2, …, G t during w
A degree sequence Δvw=(dvtw+1,
dvt-w+2, …, dvt) of a victim v
Ex. John has two friends at time 1,
and three friends at time 2.
47
PRIVACY MODEL: KW-STRUCTURAL
DIVERSITY ANONYMITY
Base case of w=1
A group θd, consisting of all
vertices of degree d, is a
k-shielding group if
there is a vertex subset θ ⊆ θd s. t.
The adversary
knowledge includes:
1. The release social
network G t
2. A degree sequence
Δv1=(dvt) of a
victim v
G
(1) |θ |≥ k, and
(2) any two vertices u and v in θ,
Cv ∩ Cv = ø, where C is the
community identity.
???
48
Ex. Mary has four friends.
PRIVACY MODEL: KW-STRUCTURAL
DIVERSITY ANONYMITY
Dynamic scenarios of w>1
A consistent group ΘΔ is the set of vertices
that always share the same degree during w.
The adversary
knowledge of includes:
1. The releases G tw+1, G t-w+2, …, G t
during w
2. A degree sequence
Δvw=(dvt-w+1, dvtw+2, …, d t) of a
v
victim v during w
A consistent group ΘΔ is a k-shielding if
at each time instant t in w,
there is a vertex subset Θ t ⊆ ΘΔ s. t.
(1) |Θ t |≥ k, and
(2) any two vertices u and v in Θ t, Cv t ∩ Cv t = ø, where C t
is the community identity at time t.
…
49
THE ANONYMIZATION
Problem formulation:
Suppose that every vertex in a series of sequential
releases G t-w+1, G t-w+2, …, G t-1 is protected.
Given G t-w+1, G t-w+2, …, G t-1 and k, anonymize the
current social network Gt s. t. every vertex is protected
in a k-shielding consistent group.
The challenges:
The anonymization is depended on not only the current
social network but also previous w-1 releases.
Searching through all the w-1 releases to eliminate
privacy leak is time consuming.
50
ANONYMIZATION ALGORITHM
Construct CS (Clustering Sequence) -Table to
prevent the search through w graphs.
CS-Table
Summary the vertex information in w-1 previous
releases.
Fetch v’ info. in w graphs without scanning the
graphs.
According to the degree sequence during w, sort the
vertices in hierarchical clustering manner.
Vertices in the same k-consistent shielding group
are close in CS-Table.
CS-Table can be incrementally updated.
According to the vertex ranking in CS-Table,
anonymize each vertex to be protected.
51
THANK YOU~!