Transcript pizza05

Asymptotic Enumerators of
Protograph LDPCC Ensembles
Jeremy Thorpe
Joint work with Bob McEliece, Sarah Fogal
Outline

- Motivation
  - What causes error floors in codes?
  - How can we predict error floors in advance of simulation?
- "Bad Sets"
  - Codewords
  - Stopping sets
  - Other sets
- Protograph Ensembles and Asymptotic Enumerators
- Computation of Enumerators
Why error-correcting codes?

- Error-correcting codes are designed to transmit information
  - as efficiently as possible, and
  - with very low probability of error.
What is an Error Floor?

- An error floor occurs when the probability of error doesn't improve "fast enough" as the channel improves.
What's going on?

- Error floors occur when the typical mode of failure changes.
- In the "waterfall" region, performance depends on global channel statistics.
- In the "error floor" region, performance depends on the channel output near:
  - Low-weight codewords
  - Low-weight stopping sets [FKV 98]
  - Low-weight "trapping sets" [R. 04]
Can we predict error floors without simulation?

- Predict the "bad sets":
  - Low-weight codewords
  - Low-weight stopping sets
  - Low-weight "trapping sets"
- This is difficult to do for particular codes.
- However, for ensembles of codes the problem becomes feasible, and has been solved for:
  - Codeword WE, regular ensembles [G 63]
  - Codeword WE, unstructured irregular ensembles [LS 98]
  - Codeword and stopping-set WE, unstructured irregular ensembles [Di 04]
Weight Enumerators Defined
Code Basics

- Recall that a code is a set of vectors of length n.
- The code below is the (7,4) Hamming code.

  C = { 0000000, 0000111, 0011001, 0011110,
        0101010, 0101101, 0110011, 0110100,
        1111111, 1111000, 1100110, 1100001,
        1010101, 1010010, 1001100, 1001011 }
Linear Codes

- Linear codes can be represented by their parity-check matrices:

  C = {x : H x^T = 0}

      1010101
  H = 0011110
      1100110
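As a quick check of the definition C = {x : H x^T = 0}, here is a minimal sketch (not from the slides) verifying that all 16 listed vectors satisfy every parity check of the H shown above:

```python
# Parity-check matrix H of the (7,4) Hamming code, as given on the slide.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
]

# The 16 codewords listed on the "Code Basics" slide.
codewords = [
    "0000000", "0000111", "0011001", "0011110",
    "0101010", "0101101", "0110011", "0110100",
    "1111111", "1111000", "1100110", "1100001",
    "1010101", "1010010", "1001100", "1001011",
]

def in_code(word):
    """True iff H x^T = 0 (mod 2), i.e. every parity check is satisfied."""
    x = [int(b) for b in word]
    return all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)

print(all(in_code(w) for w in codewords))  # True: all 16 vectors pass
print(in_code("1000000"))                  # False: first check sees a single 1
```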
Representation by a Graph

- A parity-check matrix can be represented by a (Tanner) graph.

      1010101
  H = 0011110
      1100110
Codeword Weight Enumerator for the (7,4) Hamming Code

[Figure: bar plot of the weight enumerator A(w) versus w, 0 <= w <= 7, for the 16 codewords listed earlier: A(0) = 1, A(3) = 7, A(4) = 7, A(7) = 1.]
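The plotted enumerator can be reproduced directly from the codeword list; a short sketch (not from the slides) tabulating A(w), the number of codewords of Hamming weight w:

```python
from collections import Counter

# The 16 codewords of the (7,4) Hamming code from the earlier slide.
codewords = [
    "0000000", "0000111", "0011001", "0011110",
    "0101010", "0101101", "0110011", "0110100",
    "1111111", "1111000", "1100110", "1100001",
    "1010101", "1010010", "1001100", "1001011",
]

# A(w) = number of codewords of Hamming weight w.
A = Counter(w.count("1") for w in codewords)
print(sorted(A.items()))  # [(0, 1), (3, 7), (4, 7), (7, 1)]
```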
Protograph Ensembles

- The protograph is expanded by a factor "N" to obtain the code graph.
- Randomness comes from the permutations applied to each edge type.

[Figure: a protograph and its N = 4 expansion.]
Average Codeword Weight Enumerator of the Expanded Code, N = 2

- For N = 2, there are (2!)^|E| = 4096 codes in the ensemble.
- The "ensemble average" weight enumerator is shown.

[Figure: average A(w) versus w, 0 <= w <= 14.]
Asymptotic Average Codeword Weight Enumerator

- Plot on a log scale...

[Figure: log A(w) versus w.]
Asymptotic Average Codeword Weight Enumerator

- Plot on a log scale.
- Make the quantities intrinsic: plot (1/n) log A(w) against w/n.

[Figure: (1/n) log A(w) versus w/n.]
Asymptotic Average Codeword Weight Enumerator

- Plot on a log scale.
- Make the quantities intrinsic.
- Take the limit:

  lim_{n -> infinity} (1/n) log A(θn)

[Figure: the limiting curve versus θ = w/n.]
Codewords, Stopping Sets,
Trapping Sets
Codewords (on a graph)

- A codeword is an assignment x of the variables of a graph such that each check node is adjacent an even number of times to variables assigned 1.
- Example: x = (1, 0, 0, 1, 1, 0, 0).
- If half of the variables of a codeword are "flipped" in a BSC, ML decoding fails.
Stopping Sets

- A stopping set is an assignment x of the variables of a graph such that each check node is adjacent 0 times, or 2 or more times, to variables assigned 1.
- Examples: x = (1, 0, 0, 1, 1, 0, 0) and y = (1, 0, 1, 0, 1, 0, 0).
- If all of a stopping set is erased, BP decoding cannot continue.
Stopping Set Enumerators

- On the right is a "sneak peek" at a stopping-set enumerator.
- Stopping-set enumerators are uniformly larger than codeword enumerators, because every codeword is a stopping set.

[Figure: lim_{n -> infinity} (1/n) log A(θn) versus θ = w/n, for codewords and stopping sets.]
Trapping Sets

- Trapping sets are sets of variables that cause quantized decoders to get "stuck".
- Usually, trapping sets have a small number of checks connected exactly once to variables assigned 1.
- Unfortunately, this is not a good combinatorial characterization, so we set trapping sets aside for now.
Trapping Set Enumerators

- Words like "usually" and "small number" usually don't lead to useful combinatorial characterizations...
Formalizing "Bad Sets"

- For a given "word" x, define the vector of assignments adjacent to check c as x_c.
- For each check node c, define a set Ω_c ⊆ {0,1}^deg(c).
- Define Ω = {x : x_c ∈ Ω_c for every check c}.
- If the Ω_c are the sets of even-weight vectors, then Ω is the set of codewords.
- If the Ω_c are the sets of vectors of non-unit weight, then Ω is the set of stopping sets.
Example

- If the Ω_c are the sets of even-weight vectors, then x is not in Ω, because x_{c1} is not in Ω_{c1}.
- If the Ω_c are the sets of vectors of non-unit weight, then x is in Ω.

  x = (1, 0, 1, 0, 1, 0, 0)

  x_{c1} = (1, 1, 1, 0)
  x_{c2} = (1, 0, 1, 0)
  x_{c3} = (1, 0, 1, 0)
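The two membership tests above can be sketched in code. This is a minimal sketch, not from the slides: the adjacency lists below assume the three checks of the Hamming graph shown earlier, and since both example choices of Ω_c depend only on the weight of x_c, membership is tested on that weight:

```python
# Variable indices adjacent to each check node (rows of the Hamming H above).
checks = [
    [0, 2, 4, 6],  # c1
    [2, 3, 4, 5],  # c2
    [0, 1, 4, 5],  # c3
]

def in_omega(x, omega_c):
    """x is in Omega iff x_c is in Omega_c for every check c."""
    return all(omega_c(sum(x[v] for v in c)) for c in checks)

def even_weight(w):      # Omega_c = even-weight vectors  -> Omega = codewords
    return w % 2 == 0

def non_unit_weight(w):  # Omega_c = vectors of weight != 1 -> Omega = stopping sets
    return w != 1

x = [1, 0, 1, 0, 1, 0, 0]            # the example word above
print(in_omega(x, even_weight))      # False: x_c1 = (1,1,1,0) has odd weight
print(in_omega(x, non_unit_weight))  # True: no check sees exactly one 1
```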
Types of Words

- Consider an arbitrary vector x.
- Define the vector θ, where θ_v is the fraction of 1's assigned to variables of type v.
- Here x has weight 10, so the overall weight fraction is 10/28.

  x = 1111 0000 1000 0000 0011 1100 0100   (grouped by variable type, N = 4)

  θ = (1, 0, 0.25, 0, 0.5, 0.5, 0.25)
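The type computation can be sketched as follows, assuming (as in the example above) that the variables are ordered by type with N = 4 copies of each type:

```python
N = 4
# The example word x, grouped so that consecutive runs of N variables
# share a type: 1111 0000 1000 0000 0011 1100 0100.
x = [1,1,1,1, 0,0,0,0, 1,0,0,0, 0,0,0,0, 0,0,1,1, 1,1,0,0, 0,1,0,0]

# theta_v = fraction of 1's among the N variables of type v.
theta = [sum(x[v*N:(v+1)*N]) / N for v in range(len(x) // N)]
print(theta)  # [1.0, 0.0, 0.25, 0.0, 0.5, 0.5, 0.25]
print(sum(x), "/", len(x))  # overall weight fraction: 10 / 28
```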
A Lemma about Types

- Lemma: All words of a given type are equally likely to be a member of Ω, with respect to our ensemble.
- Sketch of proof: Suppose x and y have the same type. If x is in Ω for a certain set of permutations (= a code in the ensemble), then y is in Ω for some other set of permutations.
Applying the Method of Types

- There is usually a "typical" type θ* among the words of any weight Θ.
- Even if all types are equally numerous, there are only polynomially many types, so this cannot change the exponent.

  E(Θ) = lim_{n -> infinity} (1/n) log A(Θn)

  E(Θ) = max_{θ : θ̄ = Θ} E(θ)

  E(θ) = lim_{N -> infinity} (1/n) log A(Nθ)
Computing E(θ)

- A(Nθ) can be broken down into a counting term and a probability term:

  A(Nθ) = #(Nθ) · Pr(x ∈ Ω)

- It is easy to compute the number of words of a given type:

  #(Nθ) ≈ e^{N Σ_v H(θ_v)}

- The exponent of the probability that all checks of a given type will be satisfied is defined as Φ:

  (1/N) log A(Nθ) -> Σ_v H(θ_v) − Σ_c Φ(θ_c)
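The counting term can be checked numerically: #(Nθ) is a product of binomial coefficients, prod_v C(N, N·θ_v), so its exponent per N approaches Σ_v H(θ_v). A sketch (not from the slides), using the θ of the earlier example and binary entropy H in nats:

```python
from math import comb, log

def H(t):
    """Binary entropy in nats."""
    return 0.0 if t in (0.0, 1.0) else -t * log(t) - (1 - t) * log(1 - t)

theta = [1.0, 0.0, 0.25, 0.0, 0.5, 0.5, 0.25]

for N in (4, 100, 10000):
    # Exact exponent of #(N*theta) = prod_v C(N, N*theta_v), per N.
    exact = sum(log(comb(N, round(N * t))) for t in theta) / N
    estimate = sum(H(t) for t in theta)  # entropy approximation
    print(N, exact, estimate)  # the two columns converge as N grows
```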
Computing Φ(θ_c)

- Sanov's theorem (or something like it) tells us that Φ(θ_c) is the minimum KL-distance to a certain distribution q_{θ_c}.
- P_{θ_c} is the set of distributions on Ω_c with marginals equal to θ_c.

  Φ(θ_c) = min_{p ∈ P_{θ_c}} D(p || q_{θ_c})
Finding "p"

- A constrained minimization shows that the optimal p must have a "Boltzmann distribution" with parameter s:

  p_ω = e^{s·ω} / Σ_{ω'} e^{s·ω'}

- It is easy to compute θ_c from s, but difficult in reverse.
- It is easy to compute D(p||q) from s.

[Diagram: s -> θ_c; s -> p -> D(p||q).]
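The "easy" direction s -> (θ_c, D(p||q)) can be sketched as below. Two loud assumptions not in the slides: Ω_c is taken as the even-weight subset of {0,1}^4, and q is taken uniform on Ω_c (the slides only call it "a certain distribution q_{θc}"):

```python
from itertools import product
from math import exp, log

d = 4
# ASSUMPTION: Omega_c = even-weight vectors in {0,1}^4 (the codeword case).
omega_c = [w for w in product((0, 1), repeat=d) if sum(w) % 2 == 0]

def boltzmann(s):
    """p_w proportional to exp(s . w) over w in Omega_c."""
    weights = [exp(sum(si * wi for si, wi in zip(s, w))) for w in omega_c]
    Z = sum(weights)
    return [wt / Z for wt in weights]

def marginals(p):
    """theta_c: per-coordinate probability of a 1 under p -- easy given s."""
    return [sum(p[i] * w[j] for i, w in enumerate(omega_c)) for j in range(d)]

def kl_to_uniform(p):
    """D(p || q) with q ASSUMED uniform on Omega_c."""
    q = 1.0 / len(omega_c)
    return sum(pi * log(pi / q) for pi in p if pi > 0)

p = boltzmann([0.0, 0.0, 0.0, 0.0])
print(marginals(p))      # s = 0 gives uniform p: all marginals 0.5
print(kl_to_uniform(p))  # 0.0

p = boltzmann([3.0, 0.0, 0.0, 0.0])  # bias coordinate 0 toward 1
print(marginals(p)[0], kl_to_uniform(p))  # marginal > 0.5, positive divergence
```

Computing s from a target θ_c would require inverting `marginals`, e.g. by numeric root-finding, which is the "difficult in reverse" direction the slide mentions.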
Optimizing over θ

- E(θ) is not a convex function, so in principle we have to evaluate it everywhere to find its maximum E(Θ).
- In practice, we still use convex-optimization techniques.
Using Weight Enumerators to Design Codes!
Why are Enumerators Interesting?

- The zero-crossings of weight enumerators give us lower bounds on the typical size of bad sets.
- In this example, the zero-crossing of the stopping-set enumerator is about 0.1, so we would expect a code of size 10^6 to have a minimum stopping set of size 10^5 (or possibly bigger).
Optimizing Protograph Ensembles

- Currently, many ensembles are designed only for their density-evolution threshold.
- Such optimized codes are often ignored by actual implementers, because the desired error rates are around 10^-10.
- By optimizing simultaneously for threshold and asymptotic enumerator, we may be able to find efficient codes that can be used in these applications.
Open Questions

- How fast can we compute the weight enumerator (or just its zero-crossing)?
- What is the achievable region of threshold and zero-crossing for protograph ensembles?