Transcript Document

Sublinear-Time
Error-Correction and
Error-Detection
Luca Trevisan
U.C. Berkeley
[email protected]
Contents
• Survey of results on error-correcting
codes with sub-linear time checking
and decoding procedures
• Most of the results not proved by the
speaker
• Some of the results not yet proved
by anybody
Error-correction
x → Encoding → C(x) → Errors → y
(y, i) → Decoding → x[i]
Error-detection
x → Encoding → C(x) → Errors → y
y → Checking → yes/no
Minimum Distance
If for every x ≠ y, d(C(x), C(y)) ≥ δ,
then:
• the error-correction problem is solvable for < δ/2 errors
• the error-detection problem is solvable for < δ errors
Also vice versa.
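As a toy illustration of the two distance bounds (hypothetical, not from the talk): the 3-fold repetition code has minimum distance 3, so nearest-codeword decoding corrects any single error, while checking detects up to two.

```python
# Toy sketch: 3-fold repetition code, minimum distance 3.
# Corrects < 3/2 errors (one); detects < 3 errors (two).

def encode(bit):
    """3-fold repetition encoding of a single bit."""
    return [bit] * 3

def decode(word):
    """Nearest-codeword (majority) decoding: corrects one error."""
    return int(sum(word) >= 2)

def detect(word):
    """Error-detection: flag anything that is not a codeword."""
    return word not in ([0, 0, 0], [1, 1, 1])

print(decode([1, 0, 1]))  # one error, corrected back to 1
print(detect([1, 1, 0]))  # two errors, detected
```

Note that [1, 1, 0] is detected but would be mis-corrected, matching the δ/2 versus δ gap on the slide.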
Ideally
• Constant information rate
|C(x)| = O(|x|)
• Linear minimum distance
for all x ≠ x', d(C(x), C(x')) = Ω(|C(x)|)
• Very efficient decoding
Sipser-Spielman: linear-time
deterministic procedure
Sub-linear time decoding?
• Must be probabilistic
• Must have some probability of
incorrect decoding
• Even so, is it possible?
Reasons to be interested
• Sub-linear time decoding useful for
worst-case to average-case
reductions, and in information-theoretic
Private Information Retrieval
• Sub-linear time checking arises in PCP
• Useful in practice?
Hadamard Code
To encode x ∈ {0,1}^n we write down ⟨x, a⟩
for all a ∈ {0,1}^n.
Length of encoding is 2^n; relative min. distance is 1/2.
“Constant time” decoding
• Want to compute x_i = ⟨x, e_i⟩
• Pick random a
• Read ⟨x, a⟩ and ⟨x, a ⊕ e_i⟩ from the encoding
• Xor the two results
Analysis
• Each of the two queries is uniform.
• If there is a δ fraction of errors, then with
probability ≥ 1 − 2δ both answers are right,
and so x_i is decoded correctly.
Blum-Luby-Rubinfeld
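The two-query decoder above can be sketched in Python (a minimal sketch, not the authors' code; the codeword is stored as a table indexed by a ∈ {0,1}^n encoded as an integer):

```python
import random

def hadamard_decode(y, i, n):
    """Two-query local decoder for the Hadamard code.

    y maps each a in {0,1}^n (as an integer 0..2^n - 1) to the bit
    <x, a> mod 2, possibly corrupted in a delta fraction of positions.
    Returns a guess for x[i]; correct with prob >= 1 - 2*delta.
    """
    a = random.randrange(2 ** n)   # uniformly random query point
    e_i = 1 << i                   # indicator vector e_i
    # Both queries, a and a XOR e_i, are individually uniform.
    return y[a] ^ y[a ^ e_i]

# Usage: build the clean encoding of x = 1011 and decode each bit.
n, x = 4, 0b1011
y = [bin(x & a).count("1") % 2 for a in range(2 ** n)]  # <x, a> mod 2
print([hadamard_decode(y, i, n) for i in range(n)])
```

With no errors the XOR of the two answers is exactly ⟨x, e_i⟩, so the decoder is always correct; the 1 − 2δ bound is a union bound over the two queries.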
A Lower Bound
• If: the code is linear, the alphabet is
small, and the decoding procedure
uses two queries
• Then exponential encoding length is
necessary
Goldreich-Trevisan, Samorodnitsky
More trade-offs
• For k queries and binary alphabet:
encoding length 2^{n^{1/(k−1)}}
is possible (no polynomials)
BFLS, CGKS, A, Kushilevitz-Ishai
• Encoding length Ω(n^{k/(k−1)}) is necessary
Katz-Trevisan
• More complicated formulas for bigger
alphabets
Construction without
polynomials
• View x as a √n × √n matrix
• For each pair of subsets A, B ⊆ [√n]
write ⊕_{a∈A, b∈B} x[a,b]
• Exercise: reconstruct x[i,j] with 4 uniformly
distributed queries
• If δ fraction of errors, then prob. 1 − 4δ
of correct decoding
• Note: encoding length is 2^{2√n}
• It's possible to do better
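One solution to the exercise, as a hypothetical Python sketch: represent subsets A, B ⊆ [√n] as bitmasks, and query the four positions (A, B), (A△{i}, B), (A, B△{j}), (A△{i}, B△{j}); their XOR telescopes down to x[i,j], and each query alone is uniform.

```python
import random

def subset_code_encode(matrix, m):
    """Encode an m x m 0/1 matrix: one parity bit per pair of
    subsets (A, B) of [m], with subsets stored as m-bit masks."""
    enc = {}
    for A in range(2 ** m):
        for B in range(2 ** m):
            parity = 0
            for a in range(m):
                if A >> a & 1:
                    for b in range(m):
                        if B >> b & 1:
                            parity ^= matrix[a][b]
            enc[(A, B)] = parity
    return enc

def subset_code_decode(enc, i, j, m):
    """Reconstruct matrix[i][j] with 4 uniformly distributed queries."""
    A = random.randrange(2 ** m)
    B = random.randrange(2 ** m)
    Ai = A ^ (1 << i)   # A with element i toggled (symmetric difference)
    Bj = B ^ (1 << j)
    return enc[(A, B)] ^ enc[(Ai, B)] ^ enc[(A, Bj)] ^ enc[(Ai, Bj)]
```

The three copies of the "background" parity over A × B cancel in the XOR, leaving only x[i,j]; a union bound over the four queries gives the 1 − 4δ success probability.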
Negative result 1
• Suppose C:{0,1}^n -> {0,1}^m
is a code with a decoding procedure that reads
only k bits of the corrupted encoding
• Pick random x, compute C(x), project C(x)
onto m^{(k-1)/k} coordinates, prove that it
still contains Ω(n) bits of info. about x
• Then it must be that m = Ω(n^{k/(k-1)})
Katz-Trevisan
Negative Result 2
• Suppose C:{0,1}^n -> {0,1}^m
is a linear code with a decoding procedure
that reads only 2 bits of the corrupted encoding
• Then there are vectors a1…am in {0,1}^n
such that for each i=1,…,n there are Ω(m)
disjoint pairs j1,j2 such that
aj1 xor aj2 = ei
• Then it must be that m = exp(Ω(n))
Goldreich-Trevisan, Samorodnitsky
Checking polynomial codes
• Consider encoding with multivariate
low-degree polynomials
• Given p, pick random z, do the
decoding for p(z), compare with
actual value of p(z)
• “Simple” case of low-degree test.
Rejection prob. proportional to
distance from code. Rubinfeld-Sudan
Bivariate Low Degree Test
• A degree-d bivariate polynomial
p: F x F -> F is represented as 2|F|
univariate degree-d polynomials (the
polynomial qa(y) = p(a,y) for each a,
and the polynomial rb(x) = p(x,b) for
each b)
• Test: pick random a and b, read qa
and rb, check that qa(b) = rb(a)
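One round of this test can be sketched as follows (a minimal Python sketch under the assumption that the field is the integers mod a prime and each univariate polynomial is stored as a coefficient list, lowest degree first):

```python
import random

def bivariate_test(q, r, F):
    """One round of the bivariate consistency test.

    q[a] is the coefficient list of the claimed polynomial y -> p(a, y),
    r[b] is the coefficient list of the claimed polynomial x -> p(x, b),
    over the prime field of size F. Accept iff q_a(b) == r_b(a)
    for uniformly random a, b.
    """
    def ev(coeffs, z):
        # Horner evaluation mod F
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * z + c) % F
        return acc
    a = random.randrange(F)
    b = random.randrange(F)
    return ev(q[a], b) == ev(r[b], a)

# Usage: the (consistent) tables for p(x, y) = x*y + 2 over GF(7).
F = 7
q = {a: [2, a] for a in range(F)}   # q_a(y) = a*y + 2
r = {b: [2, b] for b in range(F)}   # r_b(x) = b*x + 2
print(bivariate_test(q, r, F))
```

A consistent table always passes; the cited analyses show the rejection probability of inconsistent tables is proportional to the distance from the code.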
Analysis
• If |F| is a constant factor bigger
than d, then rejection probability is
proportional to distance from code
Arora-Safra, ALMSS,
Polishuck-Spielman
Efficiency of Decoding vs
Checking
• Can encode n bits of information using
√(n/log n) elements of an alphabet of size
2^{√(n log n)}, and do checking with 2 queries
• Regardless of alphabet size, impossible to
achieve constant information rate if decoding
uses o(log n / log log n) queries
Tensor Product Codes
• Suppose we have a linear code C with
codewords in {0,1}^m.
• Define new code C’ with codewords in
{0,1}^(mxm);
• a “matrix” is a codeword of C’ if each row
and each column is a codeword of C
• If C has many codewords and large
minimum distance, the same is true for C’
Generalization of the
Bivariate Low Degree Test
• Suppose C has K codewords
• Define code C’’ over alphabet [K], with
codewords of length 2m
• C’’ has as many codewords as C’
• For each codeword y of C’, corresponding
codeword in C’’ contains value of each row
and each column of y
• Test: pick a random “row” and a random
“column”, check that they agree at the intersection
• Analysis?
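One round of that row/column test might look like this (a hypothetical sketch; the C'' codeword is stored as a table of claimed rows and columns of the underlying matrix):

```python
import random

def product_test(codeword, m):
    """One round of the consistency test for the generalized code.

    codeword[('row', i)] and codeword[('col', j)] are the claimed
    i-th row and j-th column (length-m tuples) of an m x m matrix.
    Accept iff the claimed row and column agree at position (i, j).
    """
    i = random.randrange(m)
    j = random.randrange(m)
    return codeword[('row', i)][j] == codeword[('col', j)][i]

# Usage: build the honest C'' codeword for a 2 x 2 matrix.
M = [[1, 0], [0, 1]]
m = 2
cw = {('row', i): tuple(M[i]) for i in range(m)}
cw.update({('col', j): tuple(M[a][j] for a in range(m))
           for j in range(m)})
print(product_test(cw, m))
```

An honest codeword always passes; whether cheating tables are rejected with probability proportional to their distance from C'' is exactly the open "Analysis?" question on the slide.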
Negative Results?
• No known lower bound for locally
checkable codes
• Possible to get encoding length
n^(1+o(1)) and checking with O(1)
queries and {0,1} alphabet?
• Possible to get encoding length O(n)
with O(1) queries and small alphabet?
Applications?
• Better locally decodable codes have
applications to PIR
• General/simple analysis of checkable
proofs could have application to PCP
(linear-length PCP, simple proof of
the PCP theorem)
• Applications to the practice of fault-tolerant data storage/transmission?