PRIVACY CRITERIA

Roadmap

Privacy in data mining
Mobile privacy
(k, e)-anonymity
(c, k)-safety
Privacy skyline

Privacy in data mining

Random Perturbation (quantitative data)
  Given a value x, return the value x + r, where r is a random value drawn from a distribution
  Construct a decision-tree classifier on the perturbed data such that its accuracy is comparable to that of classifiers built on the original data
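A minimal sketch of the perturbation step, assuming uniform noise on [-scale, scale]. The slide does not name the distribution, so the noise model, the function name `perturb`, and the sample ages are illustrative assumptions.

```python
import random

def perturb(values, scale, rng):
    """Return x + r for each x, with r drawn uniformly from [-scale, scale].

    The uniform noise model is an assumption; any distribution could be used.
    """
    return [x + rng.uniform(-scale, scale) for x in values]

rng = random.Random(0)           # fixed seed so the sketch is reproducible
ages = [23, 35, 41, 52, 67]      # made-up sample data
perturbed = perturb(ages, scale=10.0, rng=rng)
```

Each released value differs from the original by at most `scale`, while averages over many records stay close to the original averages because the noise has zero mean.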
Randomized Response (categorical data)
  Basic idea: disguise the data by probabilistically changing the value of the sensitive attribute to another value
  The distribution of the original data can be reconstructed from the disguised data
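The idea can be sketched for a binary attribute, assuming each value is kept with probability p and flipped otherwise. This keep-or-flip scheme, the function names, and the sample data are assumptions for illustration; the slide does not specify a particular scheme.

```python
import random

def disguise(values, p, rng):
    """Keep each binary value with probability p; otherwise flip it."""
    return [v if rng.random() < p else 1 - v for v in values]

def estimate_true_fraction(disguised, p):
    """Estimate the original fraction of 1s from the disguised data.

    If pi is the true fraction, the expected observed fraction is
    lam = p * pi + (1 - p) * (1 - pi), hence pi = (lam - (1 - p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 leaves no information to reconstruct")
    lam = sum(disguised) / len(disguised)
    return (lam - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
original = [1] * 3000 + [0] * 7000     # made-up data: 30% hold the sensitive value
released = disguise(original, p=0.8, rng=rng)
estimate = estimate_true_fraction(released, p=0.8)
```

No single released value can be trusted, yet `estimate` lands close to the true 30% because the flipping probability is known.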
Mobile privacy

Spatial cloaking: cloaked region
  Contains location q and at least k-1 other user locations
  Circular region containing location q
  Contains location q and a number of dummy locations generated by the client
Transformation-based matching
  Transform the region through Hilbert curves, using Hilbert keys
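To make "Hilbert key" concrete, here is a sketch of the standard mapping from a grid cell (x, y) to its position along a Hilbert curve. The grid size and the function name `hilbert_key` are assumptions; how a particular scheme chooses or hides the curve parameters is not covered by the slide and is not modelled here.

```python
def hilbert_key(n, x, y):
    """Position of cell (x, y) along the Hilbert curve over an n x n grid.

    n must be a power of two; x and y are integers in [0, n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Keys for every cell of a 4 x 4 grid (grid size chosen for illustration).
keys = {(x, y): hilbert_key(4, x, y) for x in range(4) for y in range(4)}
```

Cells with consecutive keys are always neighbours on the grid, which is why a one-dimensional key preserves much of the spatial locality.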
Casper: the user registers with a (k, Amin) profile
  k: the user is k-anonymous
  Amin: minimum acceptable resolution of the cloaked spatial region
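A rough sketch of what a (k, Amin) profile asks for, assuming Amin is an area and the cloaked region is an axis-aligned rectangle. This is not Casper's actual algorithm; the nearest-neighbour strategy, the function name `cloak`, and the coordinates are all illustrative assumptions.

```python
import math

def cloak(q, others, k, a_min):
    """Return (xmin, ymin, xmax, ymax) covering q and its k-1 nearest other
    users, enlarged if necessary so that its area is at least a_min."""
    if len(others) < k - 1:
        raise ValueError("not enough users to reach k-anonymity")
    nearest = sorted(others, key=lambda u: math.dist(q, u))[:k - 1]
    xs = [q[0]] + [u[0] for u in nearest]
    ys = [q[1]] + [u[1] for u in nearest]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    # Enlarge around the box centre; width * height >= a_min by construction.
    width = max(xmax - xmin, math.sqrt(a_min))
    height = max(ymax - ymin, a_min / width) if width > 0 else ymax - ymin
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)

# Made-up locations: user at q, four other users, profile (k=3, Amin=16).
q = (5.0, 5.0)
others = [(6.0, 5.0), (5.0, 7.0), (20.0, 20.0), (1.0, 1.0)]
region = cloak(q, others, k=3, a_min=16.0)
```

The server sees only `region`, which contains at least k users and is never smaller than the requested area.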
(k, e)-anonymity

Privacy protection for numerical sensitive attributes
GOAL: group the sensitive attribute values such that each group has
  No fewer than k distinct values
  A range larger than threshold e
Permutation-based technique to support aggregate queries
  Constructing a help table
Aggregate Query Answering on Anonymized Tables @ ICDE 2007
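The two group conditions and the permutation step can be sketched as follows. The function names, the salary figures, and the strict "larger than e" comparison (taken from the slide's wording) are assumptions; the help table is not modelled here.

```python
import random

def is_ke_anonymous(groups, k, e):
    """groups: list of lists of numeric sensitive values.

    Each group needs at least k distinct values and a range larger than e
    (strict comparison, following the slide's wording).
    """
    return all(len(set(g)) >= k and max(g) - min(g) > e for g in groups)

def permute_group(rows, rng):
    """rows: list of (quasi_identifier, sensitive_value) pairs of one group.

    Shuffles the sensitive values among the rows of the group, breaking the
    link between a row and its own value while keeping group aggregates.
    """
    sensitive = [s for _, s in rows]
    rng.shuffle(sensitive)
    return [(qi, s) for (qi, _), s in zip(rows, sensitive)]

# Made-up salary groups.
groups = [[30000, 42000, 55000], [61000, 75000, 90000]]
ok = is_ke_anonymous(groups, k=3, e=20000)

rows = [("zip 130**", 30000), ("zip 131**", 42000), ("zip 132**", 55000)]
released = permute_group(rows, random.Random(0))
```

Because the multiset of values in each group is unchanged, aggregates such as SUM or AVG over a whole group are answered exactly.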
[Figure: Original Table; Table after Permutation; Help Table]
(c, k)-safety

Goal:
  Quantify the background knowledge k of the attacker
  The maximum disclosure w.r.t. k is less than threshold c
Express background knowledge through a language
Worst-Case Background Knowledge for Privacy-Preserving Data Publishing @ ICDE 2007
(c, k)-safety

Create buckets and randomly permute the sensitive attribute values within each bucket
[Figure: Original Table; Bucketized Table]
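A minimal sketch of bucketization, assuming buckets are formed from consecutive rows of fixed size. How rows are assigned to buckets is not stated on the slide, so that choice, the function name, and the sample records are assumptions.

```python
import random

def bucketize(rows, bucket_size, rng):
    """rows: list of (name, sensitive_value) pairs.

    Returns a list of buckets, each a pair (names, permuted_values): the
    names stay in order while the sensitive values are randomly permuted
    within the bucket.
    """
    buckets = []
    for start in range(0, len(rows), bucket_size):
        chunk = rows[start:start + bucket_size]
        names = [name for name, _ in chunk]
        values = [value for _, value in chunk]
        rng.shuffle(values)
        buckets.append((names, values))
    return buckets

# Made-up records.
rows = [("Jack", "flu"), ("Charlie", "flu"), ("Ann", "cancer"),
        ("Tom", "AIDS"), ("Bob", "flu"), ("Cary", "cold")]
buckets = bucketize(rows, bucket_size=3, rng=random.Random(0))
```

Within a bucket, an observer knows which values occur but not which person holds which value.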
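(c, k)-safety

Bound the background knowledge, i.e., the attacker knows k basic implications
Atom: $t_p[S] = s$, with $s \in S$, $p \in \text{Person}$
  e.g. $t_{\text{Jack}}[\text{Disease}] = \text{flu}$
Basic implication: $(A_1 \wedge \dots \wedge A_m) \rightarrow (B_1 \vee \dots \vee B_n)$
  for some m, n and atoms $A_i$, $B_j$
  e.g. $t_{\text{Jack}}[\text{Disease}] = \text{flu} \rightarrow t_{\text{Charlie}}[\text{Disease}] = \text{flu}$
$\mathcal{L}^k_{\text{basic}}$ is the language consisting of conjunctions of k basic implications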
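One way to make atoms and basic implications concrete is to evaluate them against a candidate assignment of sensitive values. The representation below (tuples and a dict) and all names are assumptions chosen for illustration, reading a basic implication as "if all antecedent atoms hold, at least one consequent atom holds".

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, str]                         # (person, value): t_person[S] = value
Implication = Tuple[List[Atom], List[Atom]]    # (antecedent atoms, consequent atoms)

def atom_holds(atom: Atom, table: Dict[str, str]) -> bool:
    person, value = atom
    return table.get(person) == value

def implication_holds(imp: Implication, table: Dict[str, str]) -> bool:
    """If every antecedent atom holds, some consequent atom must hold."""
    antecedents, consequents = imp
    if all(atom_holds(a, table) for a in antecedents):
        return any(atom_holds(b, table) for b in consequents)
    return True  # vacuously true when an antecedent fails

def knowledge_holds(implications: List[Implication], table: Dict[str, str]) -> bool:
    """A conjunction of k basic implications holds if each one holds."""
    return all(implication_holds(i, table) for i in implications)

# "If Jack has flu then Charlie has flu", checked on made-up assignments.
jack_implies_charlie: Implication = ([("Jack", "flu")], [("Charlie", "flu")])
consistent = knowledge_holds([jack_implies_charlie], {"Jack": "flu", "Charlie": "flu"})
violated = knowledge_holds([jack_implies_charlie], {"Jack": "flu", "Charlie": "cancer"})
```

An attacker holding this implication can rule out every assignment for which `knowledge_holds` is false.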
(c, k)-safety

Find a bucketization B of the original table such that B is (c, k)-safe:
  The maximum disclosure of B w.r.t. the language of k basic implications, $\mathcal{L}^k_{\text{basic}}$, is less than threshold c
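As a baseline only, the sketch below covers the special case k = 0, where the attacker has no background knowledge and the chance that a person holds a value is that value's frequency in the person's bucket. It does not attempt the worst case over k > 0 implications; the function names and sample buckets are assumptions.

```python
from collections import Counter

def max_disclosure_no_knowledge(buckets):
    """buckets: list of lists of sensitive values.

    With no background knowledge (k = 0), disclosure is the largest relative
    frequency of any single value inside any bucket.
    """
    return max(max(Counter(values).values()) / len(values) for values in buckets)

def is_safe_without_knowledge(buckets, c):
    """Check the threshold condition for the k = 0 special case only."""
    return max_disclosure_no_knowledge(buckets) < c

# Made-up buckets of disease values.
buckets = [["flu", "flu", "cancer", "cold"], ["flu", "cancer"]]
disclosure = max_disclosure_no_knowledge(buckets)
```

Any background knowledge can only raise this number, so a bucketization that fails here cannot be (c, k)-safe for larger k either.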
Privacy skyline



Original data are transformed into generalized or bucketized data
Quantify external knowledge through a skyline for each sensitive value
External knowledge for each individual
  Having a single sensitive value
  Having multiple sensitive values
Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge @ VLDB 2007
Privacy skyline

Three types of knowledge (l, k, m), e.g. (2, 3, 1)

l: knowledge about the target individual t
  flu ∉ Tom[S] and cancer ∉ Tom[S] (obtained from Tom's friend)
k: knowledge about individuals (u1, …, uk) other than t
  flu ∈ Bob[S] and flu ∈ Cary[S] and cancer ∈ Frank[S] (obtained from another hospital)
m: knowledge about the relationship between t and other individuals (v1, …, vm)
  AIDS ∈ Ann[S] → AIDS ∈ Tom[S] (because Ann is Tom's wife)
Privacy skyline

Example: knowledge threshold (1, 5, 2) and confidence c = 50% for sensitive value AIDS

Adversary knows l ≤ 1 sensitive values that t does not have
Adversary knows the sensitive values of k ≤ 5 other individuals
Adversary knows m ≤ 2 members of t's same-value family
When the above hold, the adversary cannot predict that individual t has AIDS with a confidence of 50% or more
Privacy skyline


If the transformed data D* is safe for (1, 5, 2), it is safe for any (l, k, m) with l ≤ 1, k ≤ 5, m ≤ 2, i.e., the shaded region
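The region covered by one safe point can be sketched as a componentwise comparison. The function names are assumptions; the point (1, 5, 2) is the one used in the example.

```python
from itertools import product

def is_covered(point, safe_point):
    """True if point = (l, k, m) is componentwise <= safe_point."""
    return all(a <= b for a, b in zip(point, safe_point))

def covered_points(safe_point):
    """All integer (l, k, m) triples covered by safe_point."""
    l, k, m = safe_point
    return list(product(range(l + 1), range(k + 1), range(m + 1)))

inside = is_covered((1, 3, 2), (1, 5, 2))    # weaker adversary: covered
outside = is_covered((2, 0, 0), (1, 5, 2))   # l exceeds the threshold: not covered
region = covered_points((1, 5, 2))
```

Safety therefore needs to be checked only at the corner point, not at every weaker knowledge level inside the region.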
Privacy skyline

Skyline for a set of incomparable points, e.g. {(1, 1, 5), (1, 3, 4), (1, 5, 2)}
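A short sketch of how a skyline can be extracted from a set of (l, k, m) points: keep only the points that no other point dominates. The function names and the two extra points in the input are assumptions added for illustration.

```python
def dominates(p, q):
    """p dominates q if p >= q in every coordinate and p != q."""
    return p != q and all(a >= b for a, b in zip(p, q))

def skyline(points):
    """Points not dominated by any other point (mutually incomparable)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# The three points from the slide plus two dominated points (made up).
points = [(1, 1, 5), (1, 3, 4), (1, 5, 2), (1, 2, 4), (0, 5, 2)]
result = skyline(points)
```

Here (1, 2, 4) is dropped because (1, 3, 4) dominates it, and (0, 5, 2) is dropped because (1, 5, 2) dominates it.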
Privacy skyline

Given a skyline {(l1, k1, m1, c1), …, (lr, kr, mr, cr)}, a release candidate D* is safe for sensitive value σ iff, for i = 1 to r,
$$\max \Pr\big(\sigma \in t[S] \mid \mathcal{L}_{t,\sigma}(l_i, k_i, m_i),\, D^*\big) < c_i$$
i.e., the maximum probability that σ is a sensitive value of individual t, given the external knowledge and the release candidate, is below the confidence threshold ci
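The safety condition reduces to one comparison per skyline point once the maximum probability is available. The sketch below assumes a hypothetical callable `max_breach_probability(l, k, m)` that returns that maximum for the release candidate; computing it is the hard part and is not shown. The probabilities in the lookup table are made up.

```python
from typing import Callable, Iterable, Tuple

SkylinePoint = Tuple[int, int, int, float]   # (l, k, m, c)

def is_safe(skyline: Iterable[SkylinePoint],
            max_breach_probability: Callable[[int, int, int], float]) -> bool:
    """True iff the maximum probability stays strictly below c at every point."""
    return all(max_breach_probability(l, k, m) < c for (l, k, m, c) in skyline)

# Hypothetical, made-up maximum probabilities for one release candidate.
made_up = {(1, 1, 5): 0.42, (1, 3, 4): 0.47, (1, 5, 2): 0.49}
skyline_points = [(1, 1, 5, 0.5), (1, 3, 4, 0.5), (1, 5, 2, 0.5)]
safe = is_safe(skyline_points, lambda l, k, m: made_up[(l, k, m)])
```

A single skyline point at or above its threshold is enough to reject the release candidate.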
[Figure: Original Table; Bucketized Table; Generalized Table]