
Locally Decodable Codes
Uri Nadav
Contents
• What is a Locally Decodable Code (LDC)?
• Constructions
• Lower Bounds
• Reduction from Private Information Retrieval (PIR) to LDC
Minimum Distance
A code C has minimum distance δ if every x ≠ y satisfies d(C(x),C(y)) ≥ δ.
• The error-correction problem is solvable for fewer than δ/2 errors
• The error-detection problem is solvable for fewer than δ errors
Error-correction
[Figure: the error-correction pipeline. The input x is encoded into the codeword C(x); worst-case errors turn it into the corrupted codeword y; given y and an index i (the bit to decode), the decoder outputs the decoded bit x[i].]
Query Complexity
• The number of indices the decoder is allowed to read from the (corrupted) codeword
• Trivially, decoding can always be done with query complexity |C(x)| (reading the whole codeword)
• We are interested in constant query complexity
Adversarial Model
We can view the error model as an adversary that chooses which positions to destroy, and that has access to the decoding/encoding scheme (but not to its random coins).
The adversary is allowed to insert at most δm errors.
Why not decode in blocks?
The adversary is worst case, so it can destroy more than a δ fraction of some blocks and less of others.
[Figure: 'nice' errors are spread evenly over the blocks; in the worst case, many errors land in the same block.]
Ideal Code C : {0,1}^n → {0,1}^m
• Constant information rate: n/m > c
• Resilient against a constant fraction of errors (linear minimum distance)
• Efficient decoding (constant query complexity)
No Such Code!
Definition of LDC
C : {0,1}^n → {0,1}^m is a (q,δ,ε) locally decodable code if there exists a probabilistic algorithm A such that:
∀x ∈ {0,1}^n, ∀y ∈ {0,1}^m with distance d(y,C(x)) < δm, and ∀i ∈ {1,..,n}:
Pr[ A(y,i) = x_i ] > ½ + ε
The probability is over the coin tosses of A.
A reads at most q indices of y (of its choice); A has oracle access to y. Queries are not allowed to be adaptive.
A must be probabilistic if q < m.
Example: Hadamard Code
• Hadamard is a (2, δ, ½ − 2δ) LDC
• Construction: the codeword has an entry <x,a> (the inner product mod 2) for every a ∈ {0,1}^n; relative minimum distance ½
[Figure: the source word x1 x2 … xn is encoded into the codeword <x,1>, <x,2>, …, <x,2^n − 1>.]
Example: Hadamard Code
Reconstruction (2 queries): pick a ∈_R {0,1}^n, and let e_i = (0,…,0,1,0,…,0) with a 1 in the i'th entry. The reconstruction formula is
x_i = <x,a> + <x,a+e_i>
[Figure: decoding queries the codeword entries <x,a> and <x,a+e_i>.]
If less than a δ fraction of the entries are in error, then the reconstruction probability is at least 1 − 2δ: each of the two queries is uniformly distributed, so by a union bound both are uncorrupted with probability at least 1 − 2δ.
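A minimal Python sketch of this encoder and 2-query decoder (function names are mine; for simplicity the codeword below is indexed by all a ∈ {0,1}^n, including a = 0):

    import random

    def inner(x, a):
        # Inner product over GF(2) of two n-bit vectors packed into ints.
        return bin(x & a).count("1") % 2

    def hadamard_encode(x, n):
        # Codeword entry at position a is <x,a>, for every a in {0,1}^n.
        return [inner(x, a) for a in range(2 ** n)]

    def decode_bit(y, n, i):
        # Two queries: x_i = y[a] + y[a xor e_i] (mod 2) for a random a.
        a = random.randrange(2 ** n)
        return (y[a] + y[a ^ (1 << i)]) % 2

    n, x = 4, 0b1011
    y = hadamard_encode(x, n)
    y[3] ^= 1                     # corrupt one position (delta = 1/16)
    print(decode_bit(y, n, 0))    # prints bit 0 of x (= 1) w.p. >= 1 - 2*delta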
Another Construction…
View x as a √n × √n matrix. For every pair of subsets A ⊆ {1,…,√n} and B ⊆ {1,…,√n}, the codeword stores the bit
⊕_{i∈A, j∈B} x_{i,j}
so the codeword is a 2^√n × 2^√n table.
Reconstruction of bit x_{i,j} (4 queries): pick random subsets A, B and query
1) A, B
2) A⊕{i}, B
3) A, B⊕{j}
4) A⊕{i}, B⊕{j}
The XOR of the four answers is x_{i,j}; decoding is correct when all four queried entries 1-4 are uncorrupted.
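A small sketch of this 2-dimensional construction (the frozenset-keyed table and function names are my own illustrative choices):

    import random
    from itertools import chain, combinations

    def subsets(universe):
        # All subsets of a finite set, as frozensets.
        s = list(universe)
        return [frozenset(c) for c in chain.from_iterable(
            combinations(s, r) for r in range(len(s) + 1))]

    def encode(matrix, side):
        # Entry (A, B) of the codeword is the XOR of x[i][j]
        # over i in A and j in B.
        return {(A, B): sum(matrix[i][j] for i in A for j in B) % 2
                for A in subsets(range(side)) for B in subsets(range(side))}

    def decode_bit(code, side, i, j):
        # 4 queries: (A,B), (A xor {i}, B), (A, B xor {j}), (A xor {i}, B xor {j}).
        A = frozenset(k for k in range(side) if random.random() < 0.5)
        B = frozenset(k for k in range(side) if random.random() < 0.5)
        Ai, Bj = A ^ {i}, B ^ {j}
        return code[(A, B)] ^ code[(Ai, B)] ^ code[(A, Bj)] ^ code[(Ai, Bj)]

    x = [[1, 0], [1, 1]]                       # sqrt(n) = 2
    print(decode_bit(encode(x, 2), 2, 0, 1))   # recovers x[0][1] = 0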
Generalization…
View x as an n^{1/k} × ⋯ × n^{1/k} cube; the codeword is a 2^{n^{1/k}} × ⋯ × 2^{n^{1/k}} table, so m = 2^{k·n^{1/k}}. Decoding takes 2^k queries.
Smoothly Decodable Code
C : {0,1}^n → {0,1}^m is a (q,c,ε) smoothly decodable code if there exists a prob. algorithm A such that:
1. ∀x ∈ {0,1}^n and ∀i ∈ {1,..,n}, Pr[ A(C(x),i) = x_i ] > ½ + ε
(The probability is over the coin tosses of A; A has access to a non-corrupted codeword.)
2. A reads at most q indices of C(x) (of its choice). Queries are not allowed to be adaptive.
3. ∀i ∈ {1,..,n} and ∀j ∈ {1,..,m}, Pr[ A(·,i) reads j ] ≤ c/m
(The event is: A reads index j of C(x) to reconstruct index i.)
LDC is also a Smooth Code
Claim: Every (q,δ,ε) LDC is a (q,q/δ,ε) smooth code.
Intuition – if the code is resilient against a linear number of errors, then no bit of the codeword can be queried too often (or else the adversary would corrupt exactly those bits).
Proof: LDC is Smooth
A – a reconstruction algorithm for a (q,δ,ε) LDC.
S_i = { j | Pr[A queries j] > q/δm } – the set of indices read 'too' often.
There are at most q queries, so the sum of these probabilities over all j is at most q; thus |S_i| < δm.
Proof: LDC is Smooth (cont.)
A' – uses A as a black box and returns whatever A returns as x_i.
A' gives A oracle access to a corrupted codeword C(x)', in which the indices in S_i are fixed:
[C(x)']_j = 0 if j ∈ S_i, and C(x)_j otherwise.
A reconstructs x_i with probability at least ½ + ε, because there are at most |S_i| < δm errors.
A' is a (q, q/δ, ε) smooth decoding algorithm.
Proof: LDC is Smooth
[Figure: what A wants is oracle access to C(x); what A gets is C(x)', in which the indices that A reads too often have been fixed to 0 by A'.]
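The wrapper A' is essentially an oracle filter; a minimal sketch (names hypothetical):

    def smooth_oracle(codeword, S_i):
        # A' answers A's queries from the true codeword, except that the
        # positions in S_i (read 'too often' by A) are fixed to 0.
        def read(j):
            return 0 if j in S_i else codeword[j]
        return read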
Smooth Code is LDC
• A bit can be reconstructed with ε advantage using q uniformly distributed queries, when there are no errors.
• With probability at least 1 − qδ all the queries land on non-corrupted indices (a union bound over the q queries), so against δm errors the success probability is still at least ½ + ε − qδ.
Remember: the adversary does not know the decoding procedure's random coins.
Lower Bounds
• Non-existence for q = 1 [KT]
• Super-linear codeword length for q ≥ 2 [KT]
• Exponential codeword length for linear codes, q = 2 [Goldreich et al]
• Exponential codeword length for every code, q = 2 [Kerenidis, de Wolf] (using quantum arguments)
Information Theory basics
• Entropy
H(x) = -∑Pr[x=i] log(Pr[x=i])
• Mutual Information
I(x,y) = H(x)-H(x|y)
Information Theory cont…
• The entropy of multiple variables is at most the sum of the individual entropies (with equality when the variables are mutually independent):
H(x1 x2 … xn) ≤ ∑ H(xi)
• The highest entropy is achieved by a uniformly distributed random variable.
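A quick numeric sketch of the binary entropy function and of the bound used on the next slide (values are illustrative):

    from math import log2

    def H(p):
        # Binary entropy of a {p, 1-p} distribution, in bits.
        if p in (0, 1):
            return 0.0
        return -p * log2(p) - (1 - p) * log2(1 - p)

    # Subadditivity on a toy pair: two independent fair bits have
    # joint entropy 2 = H(x1) + H(x2); dependence only lowers it.
    print(H(0.5) + H(0.5))          # 2.0

    # The [KT] bound: recovering every bit w.p. 1/2 + eps forces
    # log|R| >= (1 - H(1/2 + eps)) * n.
    eps, n = 0.1, 1000
    print((1 - H(0.5 + eps)) * n)   # roughly 29 bits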
IT result from [KT]
Thm: Let C : {0,1}^n → R be a function. Assume there exists an algorithm A s.t.
∀i ∈ [n], Pr_x[ A(C(x),i) = x_i ] > 1/2 + ε
(probability taken over the random coins of A and over uniform x). Then
log(|R|) ≥ (1 − H(1/2 + ε))·n
Proof
• I(x; C(x)) ≤ H(C(x)) ≤ log(|R|)
• I(x; C(x)) = H(x) − H(x | C(x))
≥ H(x) − ∑_{i=1}^{n} H(x_i | C(x))
≥ (1 − H(1/2 + ε))·n
(Here H(x) = n since x is uniform, and H(x_i | C(x)) < H(1/2 + ε) because x_i can be predicted from C(x) with probability > 1/2 + ε.)
Combined: log(|R|) ≥ (1 − H(1/2 + ε))·n
Single query (q=1)
Claim: If C:{0,1}  is (1,δ,ε) locally
decodable then:
n
m,
log | Σ |
n≤
δ(1 - H(1/2 + ε))
No such family of codes!
Good Index
Index j is said to be 'good' for i if
Pr[ A(C(x),i) = x_i | A reads j ] > ½ + ε
Single query (q=1)
1/2 + ε < Pr_x[ A(C(x),i) = x_i ]      (by definition of LDC)
= ∑_{j∈{1..m}} Pr_x[ A(C(x),i) = x_i | A(·,i) reads j ]·Pr[ A(·,i) reads j ]
(conditional probability, summing over disjoint events)
So there exists at least a single index j1 which is good for i.
Perturbation Vector
Def: The perturbation vector Δ_{j1,j2,…} takes values uniformly distributed over Σ in positions j1, j2, …, and is 0 otherwise.
It destroys the specified indices in the most unpredictable way.
[Figure: a vector that is 0 everywhere except positions j1, j2, …, which hold uniformly random symbols from Σ.]
Adding perturbation
1/2 + ε < Pr_x[ A(C(x)⊕Δ_{j1}, i) = x_i ]      (A is resilient against at least 1 error)
= ∑_{j∈{1..m}} Pr_x[ A(C(x)⊕Δ_{j1}, i) = x_i | A(·,i) reads j ]·Pr[ A(·,i) reads j ]
So there exists at least one index j2 'good' for i.
j2 ≠ j1, because j1 can not be good: position j1 now holds a uniformly random symbol, independent of x.
Single query (q=1)
1/2 + ε < ∑_{j∈[m]} Pr_x[ A(C(x)⊕Δ_{j1,j2,..,j_{δm}}, i) = x_i | A(·,i) reads j ]·Pr[ A(·,i) reads j ]
(A is resilient against δm errors)
So there are at least δm indices of the codeword 'good' for every i.
By the pigeonhole principle, there exists an index j' in {1..m} 'good' for at least δn input indices.
Single query (q=1)
Think of the codeword projected on j' as a function of the δn input indices for which j' is good. The range is Σ, and each of these input bits can be reconstructed from it w.p. ½ + ε. Thus, by the IT result:
δn < log(|Σ|) / (1 − H(1/2 + ε))
Case q ≥ 2
m = Ω(n^{q/(q-1)})
Constant-query reconstruction procedures are impossible for codes having constant rate!
Case q ≥ 2 – Proof Sketch
• An LDC C is also smooth.
• A q-query smooth codeword has a small enough subset of indices that still encodes a linear amount of information.
• So, by the IT result, m^{(q-1)/q} = Ω(n).
Applications?
• Better locally decodable codes have applications to PIR.
• Applications to the practice of fault-tolerant data storage/transmission?
What about Locally Encodable?
• A 'respectable code' is resilient against Ω(m) errors (a constant fraction).
• We expect each bit of the input to influence many bits of the encoding.
Otherwise, there exists an input bit which influences less than a 1/n fraction of the encoding; flipping it changes too few codeword positions for the code to be respectable.
Open Issues
• Adaptive vs non-adaptive queries
(adaptive queries can be simulated non-adaptively by guessing the first q−1 answers, which succeeds with probability |Σ|^{−(q−1)})
• Closing the gap between the upper and lower bounds
Logarithmic number of queries
• View the message as a polynomial p : F^k → F of degree d (F is a field, |F| >> d)
• Encode the message by evaluating p at all |F|^k points
• To encode an n-bit message, we can have |F| polynomial in n, and d, k around polylog(n)
To reconstruct p(x)
• Pick a random line in F^k passing through x;
• evaluate p on d+1 points of the line;
• by interpolation, find the degree-d univariate polynomial that agrees with p on the line;
• use the interpolated polynomial to estimate p(x).
The algorithm reads p at d+1 points, each uniformly distributed.
[Figure: the line through x in direction y, with query points x+y, x+2y, …, x+(d+1)y.]
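A runnable sketch of this line-decoding procedure over a small prime field (the field size, the example polynomial, and all names are illustrative choices, not from the slides):

    import random

    P = 53   # prime field F_P, chosen with |F| >> d

    def eval_poly(coeffs, point):
        # coeffs maps exponent tuples to coefficients, e.g. {(2, 1): 3}
        # stands for 3 * x0^2 * x1; evaluates the polynomial mod P.
        total = 0
        for exps, c in coeffs.items():
            term = c
            for v, e in zip(point, exps):
                term = term * pow(v, e, P) % P
            total = (total + term) % P
        return total

    def decode_at(oracle, x, d, k):
        # Random line through x: read p at x + t*y for t = 1..d+1,
        # then Lagrange-interpolate the univariate restriction at t = 0.
        y = [random.randrange(P) for _ in range(k)]
        ts = list(range(1, d + 2))
        vals = [oracle(tuple((xi + t * yi) % P for xi, yi in zip(x, y)))
                for t in ts]
        result = 0
        for t, v in zip(ts, vals):
            num, den = 1, 1
            for s in ts:
                if s != t:
                    num = num * (0 - s) % P
                    den = den * (t - s) % P
            result = (result + v * num * pow(den, P - 2, P)) % P
        return result

    # p(x0, x1) = 3*x0^2*x1 + 5*x1 + 7 has degree d = 3 in k = 2 variables.
    coeffs = {(2, 1): 3, (0, 1): 5, (0, 0): 7}
    x = (4, 9)
    print(decode_at(lambda pt: eval_poly(coeffs, pt), x, 3, 2))
    print(eval_poly(coeffs, x))     # the two outputs agree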
Private Information Retrieval (PIR)
• Query a public database without revealing which record is queried.
• Example: a broker needs to query the NASDAQ database about a stock, but doesn't want anyone to know which one he is interested in.
PIR
A one-round, k-server PIR scheme for a database of length n consists of:
• k query functions Q_1,…,Q_k : [n] × {0,1}^{l_rnd} → {0,1}^{l_q}
• k answer functions A_1,…,A_k : {0,1}^n × {0,1}^{l_q} → {0,1}^{l_a}
• a reconstruction function R : [n] × {0,1}^{l_rnd} × ({0,1}^{l_a})^k → {0,1}
PIR – definition
These functions should satisfy:
Correctness: ∀x ∈ {0,1}^n, i ∈ [n], r ∈ {0,1}^{l_rnd}:
R(i, r, A_1(x, Q_1(i,r)), …, A_k(x, Q_k(i,r))) = x_i
Privacy: ∀i, j ∈ [n], s ∈ [k], q ∈ {0,1}^{l_q}:
Pr_r[ Q_s(i,r) = q ] = Pr_r[ Q_s(j,r) = q ]
Simple Construction of PIR
• 2 servers, one round.
• Each server holds the bits x_1,…,x_n.
• To request bit i, choose a uniformly random subset A of [n].
• Send the first server A.
• Send the second server A⊕{i} (add i to A if it is not there, remove it if it is).
• Each server returns the XOR of the bits at the indices of its request S ⊆ [n].
• XOR the two answers to obtain x_i. (See the sketch below.)
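A minimal simulation of this scheme, with both servers played by the same function (all names mine):

    import random

    def server_answer(x, S):
        # Each server XORs the database bits at the requested indices.
        ans = 0
        for idx in S:
            ans ^= x[idx]
        return ans

    def pir_retrieve(x, i, n):
        # Client side: random subset A to server 1, A xor {i} to server 2.
        # Each query alone is a uniformly random subset of [n], so neither
        # server learns anything about i.
        A = {k for k in range(n) if random.random() < 0.5}
        return server_answer(x, A) ^ server_answer(x, A ^ {i})

    x = [1, 0, 1, 1, 0, 1]
    print(pir_retrieve(x, 2, len(x)))   # recovers x[2] = 1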
Lower Bounds On Communication Complexity
• To achieve privacy in the case of a single server, we need an n-bit message.
• (This is not too far from the one-round 2-server scheme we suggested.)
Reduction from PIR to LDC
• The codeword is a concatenation of all possible answers from both servers.
• The query procedure consists of 2 queries to this codeword (one per server's answer).
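As a toy instantiation of this reduction, applying it to the 2-server scheme above yields an (exponentially long) 2-query code; in fact the result is essentially the Hadamard code again (sketch, names mine):

    import random
    from itertools import chain, combinations

    def all_subsets(n):
        return [frozenset(c) for c in chain.from_iterable(
            combinations(range(n), r) for r in range(n + 1))]

    def encode(x):
        # The codeword concatenates every possible server answer:
        # one XOR per possible query subset.
        return {S: sum(x[k] for k in S) % 2 for S in all_subsets(len(x))}

    def decode(code, i, n):
        # The decoder makes the PIR query pair (A, A xor {i}): 2 queries.
        A = frozenset(k for k in range(n) if random.random() < 0.5)
        return code[A] ^ code[A ^ {i}]

    x = [0, 1, 1, 0]
    print(decode(encode(x), 1, len(x)))   # recovers x[1] = 1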