Personalized Privacy Preservation: beyond k
Download
Report
Transcript Personalized Privacy Preservation: beyond k
Personalized Privacy Preservation:
beyond k-anonymity and ℓ-diversity
SIGMOD 2006
Presented By Hongwei Tian
Outline
What is Privacy Breaching?
Drawback of K-Anonymity and ℓ-Diversity
Personalized Anonymous
How Adversary attacks?
How Data owner defeats the attacks?
Experiments
What is Privacy Breaching
Mainly, there are two classes
Compare prior belief and posterior belief
prior belief < posterior belief : help adversary
50% --> 80% “Bob has cancer”
prior belief > posterior belief : reasonable?
80% --> 50% “Bob has cancer”
=>I don’t think so. And many others have the same
thought with me.
What is Privacy Breaching
Only posterior belief
K-Anonymity: posterior belief ≤ 1/ k
ℓ-Diversity: posterior belief = p ≤ threshold
In a QI group, there are at least k tuples.
In a QI group, p percent of the tuples appear in the largest
sub-group
Personalized: posterior belief = Prbreach ≤ threshold
Prbreach : Breaching probability
Drawback of K-Anonymity and ℓ-Diversity
a k-anonymous table only prevents association
between individuals and tuples.
ℓ-Diversity and Personalized methods both prevent association
between individuals and sensitive values.
Drawback of K-Anonymity and ℓ-Diversity
a k-anonymous table may lose considerable
information
ℓ-Diversity also has this problem.
Drawback of K-Anonymity and ℓ-Diversity
Consider such a situation:
In one QI group, all tuples come from the same individual v
Adversary only knows v in this QI group from external datasets
I am Bob and I
am unlucky
that I have so
many
diseases
Bob must
be here
Aha, I know
Bob has
four
diseases
Drawback of K-Anonymity and ℓ-Diversity
Do not take into account personal anonymity
requirements
Personalized Anonymous
personalized anonymity: a person can specify
the degree of privacy protection for her/his
sensitive values.
So far, the literature has focused on a
universal approach that exerts the same
amount of privacy preserving for all persons,
without catering for their concrete needs.
Personalized Anonymous
Personalized Anonymous
BREACH PROBABILITY:
For a tuple t ∈ T, its breach probability Pbreach(t) equals the
probability that an adversary can infer from T∗ that any of the
associations {o, v1},..., {o, vx} exists in T, where v1, ..., vx are the
leaf values in SUBTR(t.GN).
Personalized Anonymous
BREACH PROBABILITY
Data owner and Adversary both can compute it
Data owner want to Pbreach(t) < threshold, then the
privacy of the individual corresponding to t holds
Adversary hope to get a Pbreach(t) > threshold, which
breaches the privacy of the individual.
How the adversary do (attack)?
???
How Adversary attacks
Adversary know One Individual One Tuple (Primary Case)
Possible reconstruction
P(5,4) ×3 ×3=1080;
Breaching reconstruction
2 × P(4,3) × 3 ×3=432;
Pbreach(t) = 432/1080=2/5
How Adversary attacks
Adversary know One Individual Multiple Tuples (Non-Primary Case)
Possible reconstruction
54 ×3 ×3=5625;
Breaching reconstruction
2 × 53 × 3 ×3
- 52× 3 ×3 =2025;
Pbreach(t) =
2025/5625=9/25
How Data owner defeats the attacks
The formal computation for Pbreach(t).
Primary Case
Non-Primary Case
Overlap
disjoint
n=5,b=2
n=2,b=2,
c=1/3
How Data owner defeats the attacks
Utility Measure: Information Loss
How Data owner defeats the attacks
Algorithm Picture
Group1
Split
New Table
…
Table
SA-Generalization
Group N
SA-Generalization
Replace if having more utility
How Data owner defeats the attacks
Algorithm
Start from all QI values are roots and
SA values have been generalized to
satisfy every Pbreach (t)< threshold
Top-down split QI attributes, in order
to increase utility information
“Single Split” means every time, only
one attribute can be split into its direct
children
SA-Generalization guarantee in every
QI group, every Pbreach (t)< threshold,
then the whole table prevents privacy
Every iteration, it should find a “Split”
and after SA-Generalization, the utility
information increases; otherwise, it
quits
How Data owner defeats the attacks
Algorithm
Bottom-up generalize SA values to
improve privacy
If tuples in Sprob satisfy privacy
requirement, means all tuples in G
satisfy privacy requirement
All SA values of the tuples which
dissatisfy privacy requirement will
be generalized to the parent of the
Guarding Node which approaches
the root mostly
Finish when no tuples in Sprob
dissatisfy privacy requirement; or
no possibility to generalize.
Experiments
Adult dataset (http://www.ipums.org)
5 QI Attributes, 1 SA Attribute
Pri-leaf, Pri-mixed, Nonpri-leaf, Nonpri-mixed
Breaching Threshold =0.25
All Weight for attributes =1
Experiments
Breaching Probability
References
Xiaokui Xiao and Yufei Tao. Personalized Privacy Preservation. In
SIGMOD 2006.
A. Machanavajjhala, J. Gehrke, and D. Kifer. l-diversity:
Privacybeyond k-anonymity. In ICDE, 2006.
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient
full-domain k-anonymity. In SIGMOD, 2005.
A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaching
in privacy preserving data mining. In ACM Symposium on Principles
of Database Systems, 2003.