A case for an interactive approach to privacy

Download Report

Transcript A case for an interactive approach to privacy

Lower Bounds on Noise
Sergey Yekhanin
Institute for Advanced Study
Setting
 Database of information about individuals
 E.g. Medical history, Census data, Customer info.
 Need to guarantee confidentiality of individual entries
 Want to make deductions about the database; learn large
scale trends.
 E.g. Learn that a drug V increases likelihood of heart
disease
 Do not leak info about individual patients
Message
 Two approaches to database privacy:
 Interactive: Analyst asks questions; curator returns
approximate answers
Curator
Analyst
Message
 Two approaches to database privacy:
 Interactive: Analyst asks questions; curator returns
approximate answers
 Non-interactive: Publish a “summary” of the database;
analyst can use summary to get answers
Curator
Summary
Analyst
Message
 Two approaches to database privacy:
 Interactive: Analyst asks questions; curator returns
approximate answers
 Non-interactive: Publish a “summary” of the database;
analyst can use summary to get answers
 Thesis: The interactive approach is the right way to give
good accuracy for a given level of privacy
 Any non-interactive solution permitting “too accurate”
answers to “too many” questions leaks private information.
Plan
 Mathematical model of database and queries
 Attacks
 Somewhat accurate answers to all queries lead to privacy
leakage. (Fourier analysis) [Y] (extends [DiNi]).
 Somewhat accurate answers to a fraction of queries lead to
privacy leakage. (Linear programming / Polynomial
interpolation) [DMT,DY]
 Study of privacy leads to a variety of mathematical
challenges!
Model
 [Dinur-Nissim] Simple Model (easily justifiable)
 Database: n-bit binary vector x
 Query: vector a
 True answer: Dot product ax
 Response is ax+ e = True Answer + Noise
 Privacy Leakage: Attacker learns a certain bit of x.
 Blatant Non-Privacy: Attacker learns n−o(n) bits of x.
Fourier attack
Theorem: If a curator adds o(√n) noise to every response;
then an attacker can ask n questions, perform O(n log n)
computation and recover n-o(n) bits of the database.
 Put database records in one-to-one correspondence
with elements of a group Z 2k .
 Think of a database as a function D from Z 2k to {0,1}.
 Choose queries to ask for Fourier coefficients of D.
 Noisy Fourier coefficients approximately determine the
Boolean function D! (Parseval identity).
Linear programming attack
 Theorem: If a curator adds o(√n) noise to 0.773 fraction of
responses; then an attacker can ask O(n) questions,
perform O(n3) computation and recover n-o(n) bits of the
database.
 Arbitrarily large error on arbitrary and unknown 0.239
fraction on answers.
Linear programming attack
 Ask O(n) random +1/-1 questions
 Obtain y=Ax+e, where e is the error vector
 A natural approach to recover x from y:
Solve: min |e'|0 such that y=Ax'+e‘, x' in Rn (hard!)
 Solve a linear program [D, CT, MT]:
min |e'|1
such that
y=Ax'+e'
x' in Rn
Ax'
y
Polynomial interpolation attack
 Model: Questions have O(c) large coefficients
 Theorem: If a curator adds o(c) noise to 0.501 fraction of
responses; then an attacker can ask c questions, perform
O(c4) computation and reliably recover any particular bit
of the database.
 Arbitrarily large error on arbitrary and unknown 0.499
fraction on answers.
Polynomial interpolation attack
 Assume c is prime.
 Think of the space of queries as a linear space
Fcn .
 To obtain a reliable answer to query x = (1,0, … , 0) ,
draw a degree two curve through x.
 Ask all queries that correspond to points on the curve.
 Use polynomial interpolation to carefully combine the
answers.
x
q1
q4
q2
q3
q5
q6
Implications
 Privacy has a Price
 There is no safe way to avoid increasing the noise as the
number of queries increases
 Applies to Non-Interactive Setting
 Any non-interactive solution permitting answers that are “too
accurate” to “too many” questions is vulnerable to attack.
 Cannot just output a noisy table.
 Non-interactive approach has inherent limitations
 Interactive approach works
 Can also publish a summary, as long as its clear which stats
are accurate, and which ones are not.
 Future directions:
 Fewer queries
 Understand what can and what cannot be done privately