Differential Privacy
Download
Report
Transcript Differential Privacy
Differential Privacy (2)
Outline
Using differential privacy
Database queries
Data mining
Non interactive case
New developments
Definition
Mechanism: K(x) = f(x) + D, D is some noise.
It is an output perturbation method.
Sensitivity function
How to design the noise D? It is actually linked back to the function f(x)
Captures how great a difference must
be hidden by the additive noise
Adding LAP noise
Why does this work?
Proof sketch
Let K(x) = f(x) + D =r. Thus, r-f(x) has Lap distribution with the scale df/e.
Similarly, K(x’) = f(x’)+D=r, and r-f(x’) has the same distribution
P(K(x) = r) = exp(-|f(x)-r|(e/df))
P(K(x’)= r) = exp(-|f(x’)-r|(e/df))
P(K(x)=r)/P(K(x’)=r) = exp( (|f(x’)-r|-|f(x)-r|)(e/df))
apply triangle inequality <= exp( |f(x’)-f(x)|(e/df)) = exp(e)
Composition
Sequential composition
Parallel composition
--for disjoint sets, the ultimate
privacy guarantee depends only on
the worst of the guarantees of each
analysis, not the sum.
Database queries (PINQ)
Basic aggregate operations
Noisy count
Noisy sum
Noisy average
composition rule
Stable transformation
|T(A) - T(B)| <= c|A-B|, and M provides ediff privacy
=> Composite computation M(T(x)) is cediff privacy
Data mining with differential
privacy (paper)
Decision tree
Basic operation: scan through the
domain to find the split that maximizes
some classification measure
Basic idea of the diff-privacy version
Users interact with the data server to
find out required information
These operations can be transformed to
counting operations -- apply NoisyCount
Sensitivity of the function is determined
by the classification measure
Privacy budget e
User specified total budget e
Composite operations need a specific e’
for each operation
Tradeoff between utility and
privacy
Non interactive differential
privacy
Noisy histogram release
Sampling and filtering
Partitioning
New settings
Against an adversary who has access
to the algorithm’s internal state
Differential privacy under continual
observation