Sublinear-Time
Error-Correction and
Error-Detection
Luca Trevisan
U.C. Berkeley
[email protected]
Contents
• Survey of results on error-correcting
codes with sub-linear time checking
and decoding procedures
• Results that originated in complexity
theory
Error-correction
x → Encoding → C(x) → Errors → y
Given y and an index i, Decoding outputs x[i]
Error-detection
x → Encoding → C(x) → Errors → y
y → Checking → no;  C(x) → Checking → yes
Minimum Distance
If for every x ≠ y, d(C(x), C(y)) ≥ δ
Then:
• the error-correction problem is solvable for < δ/2 errors
• the error-detection problem is solvable for < δ errors
Also vice versa
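As a concrete sanity check (my own illustration, not from the talk), the 3-fold repetition code has minimum distance 3 between distinct codewords, so it can detect up to 2 errors and correct 1 error per block:

```python
# Illustrative example: the 3-fold repetition code.
# Minimum distance 3 => detect up to 2 errors, correct 1 per block.

def encode(bits):
    return [b for b in bits for _ in range(3)]

def detect(word):
    # a valid codeword repeats each bit 3 times
    return all(word[i] == word[i + 1] == word[i + 2]
               for i in range(0, len(word), 3))

def correct(word):
    # majority vote within each block of 3 corrects any single error
    return [1 if word[i] + word[i + 1] + word[i + 2] >= 2 else 0
            for i in range(0, len(word), 3)]

c = encode([1, 0])           # [1, 1, 1, 0, 0, 0]
c[1] ^= 1                    # flip one bit
assert not detect(c)         # the error is detected...
assert correct(c) == [1, 0]  # ...and corrected
```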
Ideally
• Constant information rate: |C(x)| = O(|x|)
• Linear minimum distance: ∀ x ≠ x′, d(C(x), C(x′)) = Ω(|C(x)|)
• Very efficient decoding
Sipser-Spielman: linear-time
deterministic procedure
Sub-linear time decoding?
• Must be probabilistic
• Must have some probability of
incorrect decoding
• Even so, is it possible?
Motivations & Context
• Sub-linear time decoding useful for
worst-case to average-case
reductions, and in information-theoretic
Private Information Retrieval
• Sub-linear time checking arises in PCP
• Useful in practice?
Error-correction
Hadamard Code
To encode x ∈ {0,1}^n we write down ⟨x, a⟩
for all a ∈ {0,1}^n
Length of encoding is 2^n, relative min. distance is 1/2
Example
Encoding of…   is…
0 0 0          0 0 0 0 0 0 0 0
0 0 1          0 1 0 1 0 1 0 1
0 1 0          0 0 1 1 0 0 1 1
0 1 1          0 1 1 0 0 1 1 0
1 0 0          0 0 0 0 1 1 1 1
1 0 1          0 1 0 1 1 0 1 0
1 1 0          0 0 1 1 1 1 0 0
1 1 1          0 1 1 0 1 0 0 1
“Constant time” decoding
Want to compute x_i = ⟨x, e_i⟩
Pick a random a,
read ⟨x, a⟩ and ⟨x, a ⊕ e_i⟩
from the encoding
Xor the two results
Analysis
Each of the two queries is uniform.
If there is a δ fraction of errors,
there is prob. ≥ 1 − 2δ of getting
both answers right, and so
getting x_i right.
Goldreich-Levin; Blum-Luby-Rubinfeld
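A minimal simulation of this decoder (an illustrative sketch; the parameters and names are my own choices). Each of the two queries is uniformly distributed, so with a δ fraction of errors both are answered correctly with probability at least 1 − 2δ; repeating and taking a majority vote drives the failure probability down:

```python
import random

# Sketch of the 2-query local decoder for the Hadamard code.
# Messages and query points are packed as integer bitmasks.

def parity(v):                       # <x, a> mod 2 for bitmask integers
    return bin(v).count("1") % 2

def hadamard(x, n):                  # codeword indexed by a = 0 .. 2^n - 1
    return [parity(x & a) for a in range(2 ** n)]

def local_decode(word, i, n):
    a = random.randrange(2 ** n)     # uniformly random query point
    return word[a] ^ word[a ^ (1 << i)]   # XOR the two symbols read

n, x = 8, 0b10110101
word = hadamard(x, n)
for pos in random.sample(range(2 ** n), 2 ** n // 10):   # ~10% errors
    word[pos] ^= 1
# majority vote over independent runs recovers bit 0 of x
votes = sum(local_decode(word, 0, n) for _ in range(101))
assert (votes > 50) == bool(x & 1)
```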
A Lower Bound
• If: the code is linear, the alphabet is
small, and the decoding procedure
uses two queries
• Then exponential encoding length is
necessary
Goldreich-Trevisan, Samorodnitsky
More trade-offs
• For k queries and binary alphabet:
Encoding length 2^(n^(1/(k−1)))
is possible (no polynomials)
BFLS, CGKS, A, Kushilevitz-Ishai
Encoding length n^(k/(k−1)) is necessary
Katz-Trevisan
• More complicated formulas for bigger
alphabet
Construction without
polynomials
View x as a √n × √n matrix
For each pair of subsets A, B ⊆ [√n] write ⊕_{a∈A, b∈B} x[a,b]
Exercise: reconstruct x[i,j] with 4 uniformly
distributed queries
If δ fraction of errors, then prob. ≥ 1 − 4δ
of correct decoding
Note: encoding length 2^(2√n)
It's possible to do better
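The exercise can be checked mechanically. The sketch below (my own illustration) represents subsets as bitmasks and verifies the 4-query identity — toggle i in A and j in B in all four combinations and XOR the four symbols — for every choice of (A, B):

```python
from itertools import product

# Sketch of the subset-XOR construction: view x as an m-by-m bit matrix
# and store, for every pair of subsets (A, B) of [m], the XOR of
# x[a][b] over a in A, b in B.  Subsets are encoded as bitmasks.

def code_symbol(x, m, A, B):
    return sum(x[a][b] for a in range(m) if A >> a & 1
                       for b in range(m) if B >> b & 1) % 2

def reconstruct(x, m, i, j, A, B):
    # XOR of the four symbols indexed by (A, B) with i, j toggled
    return (code_symbol(x, m, A, B)
            ^ code_symbol(x, m, A ^ (1 << i), B)
            ^ code_symbol(x, m, A, B ^ (1 << j))
            ^ code_symbol(x, m, A ^ (1 << i), B ^ (1 << j)))

m = 3
x = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
# the identity holds for *every* (A, B) -- which is why a random (A, B)
# yields four uniformly distributed queries
assert all(reconstruct(x, m, i, j, A, B) == x[i][j]
           for i in range(m) for j in range(m)
           for A, B in product(range(2 ** m), repeat=2))
```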
Construction with
polynomials
• View message as polynomial p: F^k → F
of degree d (F is a field, |F| >> d)
• Encode message by evaluating p at all
|F|^k points
• To encode an n-bit message, can have
|F| polynomial in n, and d, k around
(log n)^O(1)
To reconstruct p(x)
• Pick a random line in F^k passing through x;
• evaluate p on d+1 points of the line;
• by interpolation, find the degree-d univariate
polynomial that agrees with p on the line;
• use the interpolating polynomial to estimate p(x)
• The algorithm reads p at d+1 points, each
uniformly distributed
Beaver-Feigenbaum; Lipton;
Gemmell-Lipton-Rubinfeld-Sudan-Wigderson
[Figure: the line through x in direction y, with sample points x+y, x+2y, …, x+(d+1)y]
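The interpolation step can be sketched as follows (an illustrative choice of field, degree, and sample polynomial; not code from the talk). The restriction q(t) = p(x + t·y) is univariate of degree ≤ d, so d+1 values on the line determine p(x) = q(0):

```python
import random

# Sketch of the line decoder: sample q(t) = p(x + t*y) at t = 1..d+1
# and recover p(x) = q(0) by Lagrange interpolation over F_P.

P = 101                                   # an illustrative prime field

def p(u, v):                              # an example degree-3 polynomial
    return (2 * u ** 3 + u * v + 3 * v ** 2 + 7) % P

def lagrange_at_zero(pts):
    # interpolate q through pts = [(t, q(t))] and evaluate at t = 0
    total = 0
    for t_i, q_i in pts:
        num, den = 1, 1
        for t_j, _ in pts:
            if t_j != t_i:
                num = num * (-t_j) % P
                den = den * (t_i - t_j) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat)
        total = (total + q_i * num * pow(den, P - 2, P)) % P
    return total

x = (5, 9)
y = (random.randrange(1, P), random.randrange(1, P))   # random direction
d = 3
pts = [(t, p((x[0] + t * y[0]) % P, (x[1] + t * y[1]) % P))
       for t in range(1, d + 2)]          # d+1 points on the line, t != 0
assert lagrange_at_zero(pts) == p(*x)
```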
Error-detection
Checking polynomial codes
• Consider encoding with multivariate
low-degree polynomials
• Given p, pick a random z, run the
decoding procedure for p(z), and compare
with the actual value of p(z)
• “Simple” case of low-degree test.
• Rejection prob. proportional to
distance from code. Rubinfeld-Sudan
Bivariate Code
• A degree-d bivariate polynomial p: F × F → F can
be represented as 2|F| univariate degree-d
polynomials (the “rows” and the “columns”)
2x² + xy + y² + 1  (mod 5)

               x=0    x=1      x=2      x=3      x=4
               y²+1   y²+y+3   y²+2y+4  y²+3y+4  y²+4y+3
y=0  2x²+1      1      3        4        4        3
y=1  2x²+x+2    2      0        2        3        3
y=2  2x²+2x     0      4        2        4        0
y=3  2x²+3x     0      0        4        2        4
y=4  2x²+4x+2   2      3        3        2        0
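The table can be verified with a few lines (an illustrative sketch): evaluate p over F_5 and compare a row and a column against the stated univariate polynomials:

```python
# Checking the example: the evaluation table of
# p(x, y) = 2x^2 + xy + y^2 + 1 over F_5, whose rows (y fixed) and
# columns (x fixed) are degree-2 univariate polynomials.

F = 5
p = lambda x, y: (2 * x * x + x * y + y * y + 1) % F

table = [[p(x, y) for x in range(F)] for y in range(F)]
assert table[0] == [1, 3, 4, 4, 3]        # row y = 0 is 2x^2 + 1
# column x = 1 is y^2 + y + 3
assert [p(1, y) for y in range(F)] == [(y * y + y + 3) % F
                                       for y in range(F)]
```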
Bivariate Low-Degree Test
• Pick a random row and a random
column. Check that they agree on the
intersection
• If |F| is a constant factor bigger
than d, then the rejection probability is
proportional to the distance from the code
Arora-Safra, ALMSS,
Polishchuk-Spielman
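A toy version of the test (my own sketch, reusing the bivariate example above): store the codeword as |F| row vectors and |F| column vectors, read one of each at random, and compare the intersection. An honest table always passes; a corrupted intersection is caught whenever the test happens to pick it:

```python
import random

# Sketch of the row/column consistency test for a bivariate codeword
# presented as |F| "row" and |F| "column" univariate polynomials
# (stored here as their value vectors).

F = 5
p = lambda x, y: (2 * x * x + x * y + y * y + 1) % F  # example polynomial

rows = {y: [p(x, y) for x in range(F)] for y in range(F)}   # y fixed
cols = {x: [p(x, y) for y in range(F)] for x in range(F)}   # x fixed

def consistency_test(rows, cols):
    y = random.randrange(F)          # random row
    x = random.randrange(F)          # random column
    return rows[y][x] == cols[x][y]  # agree on the intersection?

assert all(consistency_test(rows, cols) for _ in range(20))  # honest: pass

cols[2][3] = (cols[2][3] + 1) % F    # corrupt one intersection
# the pair (row y=3, column x=2) now disagrees and would be rejected
assert rows[3][2] != cols[2][3]
```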
Efficiency of Decoding vs
Checking
Can encode n bits of information using
n/log n elements of an alphabet of size 2^(√(n log n)),
and do checking with 2 queries
Regardless of alphabet size, impossible to
achieve constant information rate if decoding
uses o(log n / log log n) queries
Tensor Product Codes
• Suppose we have a linear code C with
codewords in {0,1}^m.
• Define new code C’ with codewords in
{0,1}^(mxm);
• a “matrix” is a codeword of C’ if each row
and each column is a codeword of C
• If C has lots of codewords and large
minimum distance, the same is true for C’
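A small instance (my own sketch, taking C to be the even-weight single-parity-check code of length m) illustrates both claims: C′ retains many codewords, and its minimum distance is the square of C's:

```python
from itertools import product

# Sketch of the tensor product construction with C = the even-weight
# (single parity check) code of length m: a matrix is in C' iff every
# row and every column is a codeword of C.

m = 3
in_C = lambda w: sum(w) % 2 == 0          # even parity

def in_C2(M):
    return (all(in_C(row) for row in M)
            and all(in_C(col) for col in zip(*M)))

words = [M for M in product(product((0, 1), repeat=m), repeat=m)
         if in_C2(M)]
# "lots of codewords": the top-left (m-1)x(m-1) block is free
assert len(words) == 2 ** ((m - 1) ** 2)
# minimum distance squares: C has distance 2, C' has distance 2*2 = 4
nonzero = [sum(map(sum, M)) for M in words if any(map(any, M))]
assert min(nonzero) == 4
```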
Generalization of the
Bivariate Low Degree Test
• Suppose C has K codewords
• Define code C’’ over alphabet [K], with
codewords of length 2m
• C’’ has as many codewords as C’
• For each codeword y of C’, corresponding
codeword in C’’ contains value of each row
and each column of y
• Test: pick a random “row” and a random
“column”, check that they agree at the intersection
• Analysis?
Negative Results?
• No known lower bound for locally
checkable codes
• Possible to get encoding length
n^(1+o(1)) and checking with O(1)
queries and {0,1} alphabet?
• Possible to get encoding length O(n)
with O(1) queries and a small alphabet?
Applications?
• Better locally decodable codes have
applications to PIR
• General/simple analysis of checkable
proofs could have application to PCP
(linear-length PCP, simple proof of
the PCP theorem)
• Applications to the practice of fault-tolerant data storage/transmission?