Privacy-Preserving Social Network Publication Against Friendship

Download Report

Transcript Privacy-Preserving Social Network Publication Against Friendship

PRIVACY-PRESERVING
RELEASES OF SOCIAL
NETWORKS
Chih-Hua Tai
Dept. of CSIE, National Taipei University,
New Taipei City, Taiwan
DATA MINING

The primary task in data mining: development of
models about aggregated data.
Finding frequent patterns
 Finding rules

2
FINDING PATTERNS
3
FINDING RULES
4
DATA MINING VS. PRIVACY

The primary task in data mining: development of
models about aggregated data.
Finding frequent patterns
 Finding rules


Can we develop accurate models without access to
precise information in individual data records?

Why?
5
PRIVACY ISSUES IN DATA
6
PRIVACY ISSUES IN DATA
7
PRIVACY ISSUES IN DATA
8
DATA MINING AND PRIVACY
The primary task in data mining: development of
models about aggregated data.
 Can we develop accurate models without access to
precise information in individual data records?


Answer: yes, by randomization.


R. Agrawal, R. Srikant “Privacy Preserving Data Mining,”
SIGMOD 2000
How about the data utility?
9
DATA MINING VS. SOCIAL NETWORKS
Attributes:
Name, Salary, …
Links:
Friends, Neighborhood, …
Communities:
Interests, Activities, …
10
PRIVACY ISSUES ON SOCIAL NETWORKS

Personal information leaked, even if the vertex
identifies are hidden…
Many information can be
used to re-associate the
vertex with its identity.
Vertex degree:
k-degree anonymity , …
Neighborhood configuration:
k-neighborhood anonymity,
k-automorphism anonymity,
k-isomorphism anonymity,
grouping-and- collapsing, …
11
C.-H. Tai, P. S. Yu, D.-N. Yang and M.-S. Chen, "Privacypreserving Social Network Publication Against Friendship
Attacks," In KDD, 2011.
13
FRIENDSHIP ATTACK

Still there are another type of information for
vertex re-identification – friendship attack
14
FRIENDSHIP ATTACK

Given a target individual A and the degree pair
information D2 = (d1,d2), a friendship attack (D2,A)
exploits D2 to identify a vertex v1 corresponding
to A in a published social network where v1
connects to another vertex v2 with the degree
pair (dv1,dv2) = (d1,d2).
1
2
4
6
Alice
3
5
7
9
8
10
Ex 1. Assume that an
attacker knows that Alice
has 3 connections, Bob has 2
connections, and Alice and
Bob are friends. The attacker
identifies v9 as Alice with
100% confidence.
15
FRIENDSHIP ATTACK
In DBLP data set, the percentages of vertices that can be
re-identified with a probability larger than 1/k by degree
and friendship attacks.
Original
Social Network
k-degree anonymized
Social Network
k
Degree
Attack
Friendship Attack
Friendship Attack
5
0.28%
5.37%
2.89%
10
0.53%
10.69%
4.65%
15
0.73%
14.71%
5.82%
20
0.93%
18.44%
7.23%
16
NEW PRIVACY MODEL AGAINST
FRIENDSHIP ATTACK

k2-degree Anonymity
 A social network is k2-degree anonymous if, for
every vertex with an incident edge of degree
pair (d1,d2), there exist at least k − 1 other
vertices, each of which also has an incident
edge of the same degree pair.
1
2
4
6
3
5
7
9
8
10
Ex 2. Even with the knowledge
(D2,A)=((3,2),Alice), the probability
that an attacker can re-identify
Alice in the 22-degree anonymous
social network is limited to ½.
17
THE ANONYMIZATION

Problem formulation:


Given a graph G(V, E) and an integer k, the problem is to
anonymize G to satisfy k2-degree anonymity such that
information distortion is minimized.
The challenges:

Any alteration on an edge will affect the degrees of two
vertices.
18
GRAPH ANONYMIZATION ALGORITHMS

Integer Programming formulation


Obtain the optimal solution with bad scalability
DEgree SEqence ANonymization (DESEAN)
Step1. Degree Sequence Anonymization.
- determine the groups of vertices protecting each
others
 Step2. Privacy Constraint Satisfaction.
- eliminate the advantage of knowing friendship
information
 Step3. Anonymous Degree Realization.
- have the vertices in the same group share the
same vertex degree

19
ALGORITHM DESEAN

Step1. Degree Sequence Anonymization.

Cluster vertices with similar degrees and select a
target degree dx for each cluster x s. t. each cluster
contains at least k vertices and the weighted degree
difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as
small as possible.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
5
7
9
8
10
20
ALGORITHM DESEAN

Step1. Degree Sequence Anonymization.

Cluster vertices with similar degrees and select a
target degree dx for each cluster x s. t. each cluster
contains at least k vertices and the weighted degree
difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as
small as possible.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
8
10
2
4
5
7
9
1
6
3
5
7
8
21
9
10
ALGORITHM DESEAN

Step2. Privacy Constraint Satisfaction.

Add or delete edges between clusters to ensure that,
for each pair of clusters (x,y), the number of vertices
in x directly connected to the vertices in y is either
zero or not less than k.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
5
7
8
22
9
10
ALGORITHM DESEAN

Step2. Privacy Constraint Satisfaction.

Add or delete edges between clusters to ensure that,
for each pair of clusters (x,y), the number of vertices
in x directly connected to the vertices in y is either
zero or not less than k.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
1
4
5
7
2
8
6
3
5
7
8
23
9
10
9
10
ALGORITHM DESEAN

Step3. Anonymous Degree Realization.

Adjust edges in G s. t. the vertices in each cluster x
meet the target degree dx selected in Step 1.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
5
7
8
24
9
10
ALGORITHM DESEAN

Step3. Anonymous Degree Realization.

Adjust edges in G s. t. the vertices in each cluster x
meet the target degree dx selected in Step 1.
Ex 3. Given k = 2 and ω = 0.5.
1
2
4
6
3
8
10
2
4
5
7
9
1
6
3
5
7
9
8
10
25
C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, ”Structural
diversity for privacy in publishing social networks,” In SDM, 2011.
26
COMMUNITY IDENTIFICATION

Vertex identification is considered to be an important
privacy issue in publishing social networks.
◦

k-degree anonymity, k-neighborhood anonymity, …
In addition to a vertex identity, each individual is also
associated with a community identity.
◦
Could be used to infer the political party affiliation or disease
information sensitive to the public.
◦ Is a kind of structural information
27
COMMUNITY IDENTIFICATION

Community information is explicitly given:
Ex.
Alice knows recently…
Bob is sick
 Bob participates in this social network
 Bob makes 5 friends. (vertex degree attack)

AIDS Com.
 Bob has AIDS!
SLE Com.
28
COMMUNITY IDENTIFICATION

Community information is not given:
Ex.
Alice knows Bob participates in this social network and
has 5 friends. (vertex degree attack)
 Alice can know the approximation of Bob’s
neighborhood.
29
COMMUNITY IDENTIFICATION

% of vertices violating k-structural diversity
◦
(a) DBLP
◦ (b) ca-CondMat

k-degree anonymization is insufficient
30
COMMUNITY IDENTIFICATION

structural diversity
◦
(a) original DBLP
◦ (b) 10-degree anonymized DBLP

Vertices with large degrees appear in a small set of
communities
31
NEW PRIVACY MODEL AGAINST
COMMUNITY IDENTIFICATION

k-Structural Diversity


To protect against vertex degree attack, for each vertex, there
should be other vertices with the same degree located in at
least k-1 other communities.
If a graph satisfies k-structural diversity, then it also
satisfies k-degree anonymity.
32
THE ANONYMIZATION

Problem formulation:


Given a graph G(V, E, C) and an integer k, 1≦ k ≦ |C|, the
problem is to anonymize G to satisfy k-structural diversity such
that information distortion is minimized.
The challenges:

How to preserve community structures, even in the
implicit cases, while preserving privacy.
33
PROBLEM FORMULATION

Operation Adding Edge
◦
Connect two vertices belonging to the same community.
◦ Can avoid destroying the communities.
34
PROCEDURE MERGENCE
◦
To protect a vertex v in an existing anonymous group, in
which all the vertices have the same degree d
Com. 1
Com. 2
v
Com. 3
35
PROCEDURE MERGENCE
◦
To protect a vertex v in an existing anonymous group, in
which all the vertices have the same degree d
Com. 1
Com. 2
v
Com. 3
36
PROCEDURE CREATION

To create a new anonymous group for a vertex v, such
that all the vertices in the group locate in at least k
difference communities and have the same degree as v
Com. 1
Com. 2
Com. 3
v
37
PROCEDURE CREATION

To create a new anonymous group for a vertex v, such
that all the vertices in the group locate in at least k
difference communities and have the same degree as v
Com. 1
Com. 2
Com. 3
v
38
THE EDGE-REDITECTION MECHANISM
◦
Is defined on w, v, x in the same community


◦
w: an anonymized vertex
v and x : two not-yet-anonymized vertices
Is to replace the edge (w, v) with the edge (w, x)
v
v
w
w
x
39
THE EDGE-REDITECTION MECHANISM
Q. Why needs the Edge-Reditection mechanism?
By mergence
Com. 1
Com. 3
By mergence
Com. 1
Com. 2
v
Com. 3
w
Com. 2
v
w40
THE EDGE-REDITECTION MECHANISM
Q. Why needs the Edge-Reditection mechanism?
By creation
By creation
Com. 1
v
Com. 1
v
w
Com. 2
Com. 3
w
Com. 2
Com. 3
41
ALGORITHM EDGECONNECT (EC)

Procedures Mergence and Creation
◦

Let Rv be the set of edges that could be redirected away from v
The Edge-Reditection mechanism
42
PROBLEM FORMULATION

Operation Adding Edge
◦
Connect two vertices belonging to the same community.
◦ Can avoid destroying the communities.

Operation Splitting Vertex
◦
Replace a vertex v with a set of substitute vertices, such that
each substitute vertex is connected with at least one edge
incident to v originally.
◦ Each substitute vertex presents partial truth of the vertex v.
43
PROCEDURE CREATIONBYSPLIT

Split a set of vertices, including v, to create a new
anonymous group
Com. 1
v
Com. 3
Com. 2
Com. 1
v2
Com. 2
v1
Com. 3
44
PROCEDURE MERGEBYSPLIT

Split v into a set of substitute vertices s.t. each substitute
vertex is protected in some existing anonymous group
Com. 1
Com. 2
v3
Com. 3
v
v2
v1
45
C.-H. Tai, P.-J. Tseng, P. S. Yu and M.-S. Chen, "Identities
Anonymization in Dynamic Social Networks," In ICDM-11, 2011.
46
THE PROBLEM IN DYNAMIC
SCENARIOS…



A dynamic social network
will be sequentially
released.
G1
G2
An attacker can monitor a
victim for a period w.
Therefore, the adversary
knowledge includes:
The releases G t-w+1,
G t-w+2, …, G t during w
 A degree sequence Δvw=(dvtw+1,
dvt-w+2, …, dvt) of a victim v

Ex. John has two friends at time 1,
and three friends at time 2.
47
PRIVACY MODEL: KW-STRUCTURAL
DIVERSITY ANONYMITY

Base case of w=1

A group θd, consisting of all
vertices of degree d, is a
k-shielding group if
there is a vertex subset θ ⊆ θd s. t.
The adversary
knowledge includes:
1. The release social
network G t
2. A degree sequence
Δv1=(dvt) of a
victim v
G
(1) |θ |≥ k, and
 (2) any two vertices u and v in θ,
Cv ∩ Cv = ø, where C is the
community identity.
???

48
Ex. Mary has four friends.
PRIVACY MODEL: KW-STRUCTURAL
DIVERSITY ANONYMITY

Dynamic scenarios of w>1


A consistent group ΘΔ is the set of vertices
that always share the same degree during w.
The adversary
knowledge of includes:
1. The releases G tw+1, G t-w+2, …, G t
during w
2. A degree sequence
Δvw=(dvt-w+1, dvtw+2, …, d t) of a
v
victim v during w
A consistent group ΘΔ is a k-shielding if
at each time instant t in w,
there is a vertex subset Θ t ⊆ ΘΔ s. t.
(1) |Θ t |≥ k, and
 (2) any two vertices u and v in Θ t, Cv t ∩ Cv t = ø, where C t
is the community identity at time t.

…
49
THE ANONYMIZATION

Problem formulation:
Suppose that every vertex in a series of sequential
releases G t-w+1, G t-w+2, …, G t-1 is protected.
 Given G t-w+1, G t-w+2, …, G t-1 and k, anonymize the
current social network Gt s. t. every vertex is protected
in a k-shielding consistent group.


The challenges:
The anonymization is depended on not only the current
social network but also previous w-1 releases.
 Searching through all the w-1 releases to eliminate
privacy leak is time consuming.

50
ANONYMIZATION ALGORITHM
Construct CS (Clustering Sequence) -Table to
prevent the search through w graphs.
 CS-Table

Summary the vertex information in w-1 previous
releases.
 Fetch v’ info. in w graphs without scanning the
graphs.
 According to the degree sequence during w, sort the
vertices in hierarchical clustering manner.
 Vertices in the same k-consistent shielding group
are close in CS-Table.
 CS-Table can be incrementally updated.


According to the vertex ranking in CS-Table,
anonymize each vertex to be protected.
51
THANK YOU~!