Transcript pirtalk3

Private Information Retrieval
Yuval Ishai, Technion
Private Information Retrieval (PIR)
[CGKS95]
• Goal: allow user to query database while hiding
the identity of the data-items she is after.
• Motivation: patent databases, web searches, etc.
• Paradox(?): imagine buying in a store without the
seller knowing what you buy.
(Encrypting requests is useful against third parties; not against
owner of data.)
Modeling
• Database: n-bit string x
n should be thought of as being large
• User: wishes to
– retrieve xi
– keep i private
and
x {0,1}n
Server
??
?
xi
User
i  {1,..., n}
Some “solutions”
1. User downloads entire database.
Drawback: n communication bits (vs. logn+1 w/o privacy).
Main research goal: minimize communication complexity.
2. User masks i with additional random indices.
Drawback: gives a lot of information about i.
3. Enable anonymous access to database.
Note: addresses a different concern: hides identity of user, not
the fact that xi is retrieved.
Fact: PIR as described so far requires (n) communication bits.
Two Approaches
Computational PIR
[CG97,KO97,CMS99,...]
Computational privacy, based on cryptographic
assumptions.
Information-Theoretic PIR
[CGKS95,Amb97,...]
Replicate database among k servers.
Unconditional privacy against t servers.
Default: t=1
Computational PIR for Dummies
Tool: homomorphic encryption
b
 b’
Protocol:
=
b+b’
•User sends E(ej)
E(0) E(0) E(1) E(0) (=c1 c2 c3 c4)
n1/2
0110
1 1 1 0 n1/2
X= 1 1 0 0
0001
•Server replies with E(X·ej)
c2c3
c1 c2c3
c1c2
c4
• User recovers jth column of X
j
 PIR with ~ O(n1/2) communication
Information-Theoretic PIR for Dummies
n1/2
X
S1
n1/2
S2
j
q1
q2
a1=X·q1
q1,q2Z2n
q1 + q2 = ej
a2=X·q2
1/2
a1+a2=X·ej
j
U
 2-server PIR with O(n1/2) communication
Bounds for Computational PIR
servers
comm.
assumption
[CG97]
2
O(n)
[KO97]
1
O(n)
one-way function
homomorphic
QRA /
encryption
[CMS99]
1
polylog(n)
-hiding
[KO00]
1
n-o(n)
trapdoor permutation
Necessary assumptions:
– Nontrivial 1-server PIR implies OWF [BIKM99]
– … and even OT [DMO00]
Bounds for i.t. PIR
Upper bounds:
– O(log n / loglog n) servers, polylog(n) [BF90,BFKR91,CGKS95]
– 2 servers, O(n1/3); k servers, O(n1/k) [CGKS95]
– k servers, O(n1/(2k-1)) [Amb97,Ito99, IK99, BI01]; t-private, O(n t/(2k-1)) [BI01]
– Today: k servers, O(ncloglogk /(klogk)) [BIKR02]. Better for k3.
Lower bounds:
– log n +1 (no privacy)
– 2 servers, ~4log n ; k servers, ck log n
[Man98,KT00]
Restricted lower bounds:
– 2 servers, 1 round, 1-bit answers:
• Linear: 2n [CGKS95]
• Nonlinear: (n) [KW03, BFG02]
1/(m+1)) [GKST02]
– 2 servers, 1 round, linear,
user
reads
m
bits:
(n
m
(more generally, |q|=(n/|a| )).
Why Information-Theoretic PIR?
Cons:
• Requires multiple servers
• Privacy against limited collusions only
• Worse asymptotic complexity (with const. k):
O(nc) vs. no(1) or even polylog(n)
Pros:
•
•
•
•
•
Neat question
Unconditional privacy
Better “practical” efficiency
Allows very short queries or very short answers (+apps) [DIO98,BIM99]
Essentially equivalent to a natural coding question [KT00]
Locally Decodable Codes [KT00]
x
y
i
k-server PIR  k-query LDC:
y = answers to all possible queries on database x
Code length = 2PIR_Query_Length
Alphabet = {0,1}PIR_Answer_Length
#queries = k
Binary LDC  PIR with one answer bit per server
PIR-Related Work
• Extensions
– Symmetric PIR [GIKM98,NP99]
– Private storage [OS98]
– Retrieval by keywords [CGN99]
• Additional settings for PIR [GGM98,DIO98,BIM00,...]
• PIR as a building-block
– Private statistical queries [CIKRRW01]
– Private approximations [FIMNSW01]
– Communication-preserving secure function evaluation [NN01]
Time Complexity of PIR
Focus so far: communication complexity
Obstacle: time complexity
• Protocols require (at least) linear computation per query.
• Thm: servers must spend at least linear expected time.
Workarounds:
• PIR with preprocessing [BIM00]
– some savings possible in multi-server case
– single-server case seems hopeless
• Amortizing computation over multiple queries
– single-user case essentially solved [IKOS02]
– multi-user case open
Open Questions
• Communication complexity of i.t. PIR:
– 2 servers
– 3 servers, binary answers (  3-query binary LDC)
– t>1
• Sufficient assumptions for 1-server PIR
– OT  nontrivial PIR ?
– Trapdoor permutation  good PIR?
– Your favorite assumption  great PIR?
• Time complexity of PIR
– Better bounds for PIR with preprocessing
– Better amortization in a multi-user setting
– Does 1-server PIR require many “public-key” operations?
Breaking the O(n1/(2k-1)) Barrier for
Information-Theoretic PIR
Joint work with
A. Beimel, E. Kushilevitz, J.F. Raymond
communication in
k-server PIR
previous
new
k=2
k=3
k=4
k=5
k=6
O(n1/3)
O(n1/5)
O(n1/7)
O(n1/9)
O(n1/11)
O(n1/5.25)
O(n1/7.87)
O(n1/10.83)
O(n1/13.78)
length of k-query
binary LDC
previous
new
2^O(n)
2^O(n1/2)
2^O(n1/3)
2^O(n1/4)
2^O(n1/5)
2^O(n3/10)
2^O(n1/5)
2^O(n1/7)
Model
X
X
S1
S2
??
?
??
?

X
Sk
??
?
xi
User
i
Arithmetization
x  Px  F [Z1,…,Zm]
i  zi  Fm
 i[n], Px(zi) = xi
P
S1
???
P
S2

???
P
Sk
???
P(z)
User
z
Parameters
Field F = GF(2)
Degree d = const.
#vars m s.t.  m   n  m= O(n1/d) suffices
d 
8
Ex. d=3, m=8, n= 3 
z1=11100000 z2=11010000 …. zn=00000111
M1= Z1Z2Z3 M2= Z1Z2Z4
Mn= Z6Z7Z8
n
Px   xi M i
i 1
Effect of Degree Reduction
degree d, m variables
degree d/c, m variables
size
n
size O(n1/c)
Degree Reduction Using Partial Information
[BKL95,BI01]
P
S1
P
S2 
P
Q
Sk
S1
Q
S2 
P(z)
Q(y)
User
User
z
z is hidden from servers
Q
Sk
y
Each entry of y is known to
all but one server.
k=3,d=6
S1
S2
S3
Q=
+
+
S1
S3
Q1
+
+
S1
Q3
S2
Q2
Q(y)=Q1(y)+Q2(y)+Q3(y)
degQj  d/k = 2
S3
Back to PIR
P
P
S1
S2
P

Sk
Q
Q
S1
S2
Q

P(z)
Q(y)
User
User
z
z is hidden from servers
n comm. bits
Sk
y
Each entry of y is known to
all but one server.
O(n1/k) comm. bits
Let z=y1+…+ yk , where the yj are otherwise random
Q(Y1,…,Yk)= P(Y1+… +Yk)
Initial Protocol
O(m) =
O(n1/d)
O(n1/k)
• User picks random y1,…, yk s.t. y1+…+ yk= z,
and sends to Sj all y’s except yj.
• Servers define an mk-variate degree-d
polynomial Q(Y1,…,Yk)= P(Y1+… +Yk) .
• Each Sj computes degree-(d/k) poly. Qj ,
such that Q(y)= Q1(y)+…+Qk(y).
• Sj sends a description of Qj to User.
• User computes Qj(y)=xi .
A Closer Look
•  M  Sj missing at most d/k variables.
 deg Qj  d/k

Best
previous
binary
PIR
Best
previous
PIR
Useful parameters:
• d=k-1  query length O(n1/(k-1))
d/k =0  answer length 1
• d=2k-1  query length O(n1/(2k-1))
d/k =1  answer length O(n1/(2k-1))
Boosting the Integer Truncation Effect
• Idea: apply multiple “partial” degree reduction steps.
• Generalized degree reduction:
Assign each monomial to the k’ servers V which jointly
miss the least number of variables.
k=3,d=6, k’=2
S1
S2
Q(y ) 
S3
Q=
+
+
+
+
 QV (y )
|V |  k '
S1S2 S2S3 S1S2 S1S2 S1S3
replication
reduction k
k’
reduction
k”
reduction …
1
degree
d
size
n
n’=O(n d’/d)
n”=O(n’ d ”/d’)
…
d’
d”
…
d/k
n d/k /d
The missing operator:
conversion
k
k
d
n
d’
#vars
m
nm’=O(md/d’)
Additional cost: re-distribute new point y’
Example: k=3
replication
reduction 3
2
conversion
2
reduction
1
degree
#vars
7
O(n1/7)
4
O(n1/7)
3
O(n4/21)
1
O(n4/21)
Queries
size
n
O(n4/7)
O(n4/7)
O(n4/21)
Answers
 O(n4/21) communication
In Search of the Missing Operator
P’
P
d’, m’
P(y)=P’(y’)
d, m
y
y’
Must have m’=(md/d’).
Question: For which d’<d can get m’=O(md/d’)?
•Possible when d’|d .
•Open otherwise. Positive result  PIR with complexity
nO(1/k^2)
•Simplest open case: d=3, d’=2, m’=O(m3/2)
A Workaround
• Can’t solve the polynomial conversion problem in
general.
• … but easy to solve given the promise
weight(y)=const.
Q(y1reduction:
,..., y k )   QV (y1 ,..., y k )
• Stronger degree
|V |  k '
Q(y1 ,..., y k )   QV (y1  ...  y k )
V
• Main technical lemma:
good parameters for strong degree reduction.
Open Problems
• Better upper bounds
– What is the true limit of our technique?
• Tight bounds for polynomial conversion
• The case t>1
• Lower bounds