Transcript Slides

How may Auditors Inadvertently
Compromise Your Privacy
Kobbi Nissim
Microsoft
With Nina Mishra HP/Stanford
Work in progress
PORTIA Workshop on Sensitive Data in Medical, Financial, and ContentDistribution Systems
The Setting
q = (f ,i1,…,ik)
f (di1,…,dik)
Statistical
database
• Dataset: d={d1,…,dn}
– Entries di: Real, Integer, Boolean
• Query: q = (f ,i1,…,ik)
– f : Min, Max, Median, Sum, Average, Count…
• Bad users will try to breach the privacy of
individuals
2
The Data Privacy Game:
an Information-Privacy Tradeoff
• Private functions:
i
f
f f
– Want to hide i(d)=di
• Information functions:
– Want to reveal query answers f(di1,…,dik)
• Major question: what may be computed over d (and
given to users) without breaching privacy?
• Confidentiality control methods
– Perturbation methods: give `noisy’ answers
– Query restriction methods: limit the queries users may post,
usually imposing some structure (e.g. size/overlap
restrictions)
3
Auditing
• [AW89] classify auditing as a query restriction method:
– “Auditing of an SDB involves keeping up-to-date
logs of all queries made by each user (not the data
involved) and constantly checking for possible
compromise whenever a new query is issued”
• Partial motivation: May allow for more queries
to be posed, if no privacy threat occurs
• Early work: Hofmann 1977, Schlorer 1976,
Chin, Ozsoyoglu 1981, 1986
• Recent interest: Kleinberg, Papadimitriou, Raghavan
2000, Li, Wang, Wang, Jajodia 2002, Jonsson, Krokhin
2003
4
Auditing
Here’s the answer
OR
Here’s a new query: qi+1
Query denied (as the answer
would cause privacy loss)
Auditor
Query log
q1,…,qi
Statistical
database
5
Design choices in Prior Work (1)
1. Privacy definition:
–
–
Privacy breached (only) when a database entry
may be deduced fully, or within some  accuracy
These privacy guarantees do not generally
suffice:
•
Should take into account: Adversary’s computational
power, prior knowledge, access to other databases…
2. Exact answers given
–
Auditors viewed as a way to give `quality’
answers???
6
Design choices in Prior Work (2)
3. Which information is taken into account in
the auditor decision procedure:
–
•
Decision made based on queries q1,…,qi, qi+1
and their answers a1,…,ai, ai+1
Denials ignored
4. Offline vs. Online:
•
Offline auditing: queries and answers checked
for compromise at the end of the day
•
•
Only detect breaches
Online auditing: answer/deny queries on the
fly
•
Prevent breaches just before they happen
7
Example 1: Sum/Max auditing
di real, sum/max queries, privacy breached if some di learned
q1 = sum(d1,d2,d3)
sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3)
Denied (the answer would
cause privacy loss)
Oh well…
Auditor
8
Some Prior Work on Auditors
Sum/Max
[Chin]
Boolean
[KPR00]
Data
real
Queries
Breach
Sum/max di learned
Complexity
NP-hard
0/1
Sum
--”--
NP-hard*
Max [KPR00]
Real
Max
--”--
PTIME
Interval based
[LWWJ02]
Generalized
results [JK03]
di [a,b]
sum
di within
accuracy 
PTIME
NP-hard /
PTIME
* Approx version in PTIME
9
Can we use the offline version for online auditing?
… After Two Minutes …
di real, sum/max queries, privacy breached if some di learned
q1 = sum(d1,d2,d3)
sum(d1,d2,d3) = 15
q2 = max(d1,d2,d3)
There
must beiffa
q2 is denied
Ohreason
well…
for the
d1=d2=d3
=5
denial…
I win!
Denied (the answer would
cause privacy loss)
Auditor
10
Example 2: Interval Based Auditing
di  [0,100], sum queries,  =1 (PTIME)
q1 = sum(d1,d2)
Sorry, denied
q2 = sum(d2,d3)
sum(d2,d3) = 50
d1,d2 [0,1]
Denial
d3  [49,50]
d1,d2[0,1]
or
[99,100]
Auditor
11
Sounds Familiar?
Colonel Oliver North, on the Iran-Contra Arms Deal:
On the advice of my counsel I respectfully and
regretfully decline to answer the question based on
my constitutional rights.
David Duncan, Former auditor for Enron and
partner in Andersen:
Mr. Chairman, I would like to answer the
committee's questions, but on the advice of
my counsel I respectfully decline to answer
the question based on the protection afforded
me under the Constitution of the United
States.
12
Max Auditing
d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn
di real
q1 = max(d1,d2,d3,d4)
M1234
q2 = max(d1,d2,d3)
If denied: d4=M1234
M123 / denied
q2 = max(d1,d2)
If denied: d3=M123
M12 / denied
Auditor
13
Adversary’s Success
q1 = max(d1,d2,d3,d4)
If denied: d4=M1234
Denied with probability 1/4
q2 = max(d1,d2)
If denied: d3=M123
Denied with probability 1/3
Success probability: 1/4 + (1- 1/4)·1/3 = 1/2
q2 = max(d1,d2,d3)
Recover 1/8 of the database!
Auditor
14
Boolean Auditing?
d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn
q1 = sum(d1,d2)
di Boolean
1 / denied
q2=sum(d2,d3)
…
1 / denied
qi denied iff di = di+1  learn database/complement
Let di,dj,dk not all equal, where qi-1, qi, qj-1, qj, qk-1, qk all denied
q2=sum(di,dj,dk)
1/2
Recover the entire database!
Auditor
15
Two Problems
• Obvious problem: denied queries ignored
– Algorithmic problem: not clear how to incorporate denials in
the decision
• Subtle problem:
– Query denials leak (potentially sensitive) information
• Users cannot decide denials by themselves
Possible assignments to {d1,…,dn}
Assignments consistent
with (q1,…qi, a1,…,ai)
qi+1 denied
16
A Spectrum of Auditors
“Safe”
“Safe”
Size overlap restriction
Algebraic structure
q1,…,qi, qi+1
<
utility
“Unsafe”
q1,…,qi, qi+1
a1,…,ai
>
q1,…,qi, qi+1
a1,…,ai, ai+1
privacy
*Note: can work in “unsafe” region, but need to prove denials do not
leak crucial information
17
Simulatable Auditing*
An auditor is simulatable if a simulator exists s.t.:
qi+1
q1,…,qi
Statistical
database
Auditor
Deny/answer
qi+1
q1,…,qi
a1,…,ai

Simulator
Deny/answer
Simulation  denials do not leak information
* `self auditors’ in [DN03]
18
Why Simulatable Auditors do not
Leak Information?
Possible assignments to {d1,…,dn}
Assignments consistent
with (q1,…qi, a1,…,ai )
qi+1 denied/allowed
19
Summary
• Improper usage of auditors may lead to privacy
breaches, due to information leakage in the decision
procedure.
– Cell suppression / some k-anonymity methods should be
checked similarly
– Should make sure offline auditors do not leak information in
decision
• Simulatable auditors provably don’t leak information
– Give best utility while still “safe”
– A launching point for further research on auditors
• Further research:
– Auditors with more reasonable privacy guarantees
20