Transcript: Ph.D. candidacy talk

Optimizing LDPC Codes for Message-Passing Decoding
Jeremy Thorpe
Ph.D. Candidacy
2/26/03
Overview
• Research Projects
• Background to LDPC Codes
• Randomized Algorithms for Designing LDPC Codes
• Open Questions and Discussion
Data Fusion for Collaborative Robotic Exploration
• Developed a version of the Mastermind game as a model for autonomous inference.
• Applied the Belief Propagation algorithm to solve this problem.
• Showed that the algorithm had an interesting performance-complexity tradeoff.
• Published in JPL's IPN Progress Reports.
Dual-Domain Soft-In Soft-Out Decoding of Convolutional Codes
• Studied the feasibility of using the dual SISO algorithm for high-rate turbo codes.
• Showed that the reduction in state complexity was offset by an increase in required numerical accuracy.
• Report circulated internally at the DSDD/HIPL S&S Architecture Center, Sony.
Short-Edge Graphs for Hardware LDPC Decoders
• Developed criteria to predict the performance and implementation simplicity of graphs of regular (3,6) LDPC codes.
• Optimized the criteria via a randomized algorithm (simulated annealing).
• Achieved codes with lower complexity and better performance than random codes.
• Published in the ISIT 2002 proceedings.
Evaluation of Probabilistic Inference Algorithms
• Characterize the performance of probabilistic algorithms based on observable data
• Axiomatic definition of "optimal characterization"
• Existence, non-existence, and uniqueness proofs for various axiom sets
• Unpublished
Optimized Coarse Quantizers for Message-Passing Decoding
• Mapped 'additive' domains for variable and check node operations
• Defined quantized message-passing rules in these domains
• Optimized quantizers for 1-bit to 4-bit messages
• Submitted to ISIT 2003
Graph Optimization using Randomized Algorithms
• Introduce the proto-graph framework
• Use approximate density evolution to predict the performance of particular graphs
• Use randomized algorithms to optimize graphs (extends the short-edge work)
• Achieves a new asymptotic performance-complexity mark
Background to LDPC Codes
The Channel Coding Strategy
• The encoder chooses the mth codeword in codebook C and transmits it across the channel.
• The decoder observes the channel output y and generates m' based on knowledge of the codebook C and the channel statistics.
x C  X n
m {0,1}k
Encoder
Channel
Decoder
m'{0,1}k
y Y n
Linear Codes
• A linear code C (over a finite field) can be defined in terms of either a generator matrix or a parity-check matrix.
• Generator matrix G (k×n):
  C = { mG : m ∈ {0,1}^k }
• Parity-check matrix H ((n−k)×n):
  C = { c : cH' = 0 }
LDPC Codes
• LDPC codes are linear codes defined in terms of H.
• H has a small average number of non-zero elements per row or column.

        0 1 1 0 0 1 0 0 0 0
        1 0 1 1 0 0 0 0 1 1
    H = 1 1 0 0 1 0 1 1 1 0
        0 1 0 1 1 1 0 0 1 0
        1 0 0 1 0 0 1 1 0 1
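
As a quick illustration of both definitions (my own sketch, not part of the talk; numpy assumed), the code below checks the sparsity of the H above and recovers C = { c : cH' = 0 } by brute force:

    import numpy as np
    from itertools import product

    # The parity-check matrix from the slide above.
    H = np.array([[0,1,1,0,0,1,0,0,0,0],
                  [1,0,1,1,0,0,0,0,1,1],
                  [1,1,0,0,1,0,1,1,1,0],
                  [0,1,0,1,1,1,0,0,1,0],
                  [1,0,0,1,0,0,1,1,0,1]])

    # "Low density": row and column weights are small relative to n = 10.
    print("row weights:", H.sum(axis=1))   # [3 5 6 5 5]
    print("col weights:", H.sum(axis=0))

    # C = {c : cH' = 0 (mod 2)}, found by brute force over all 2^10 vectors.
    C = [c for c in product([0, 1], repeat=10)
         if not (np.array(c) @ H.T % 2).any()]
    print(len(C), "codewords")             # 32, since rank(H) = 5 over GF(2)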
Graph Representation of LDPC Codes
• H is represented by a bipartite graph with variable nodes v and check nodes c.
• There is an edge from v to c if and only if H(v, c) ≠ 0.
• A codeword is an assignment of the v's such that, for every check c:
  Σ_{v|c} x_v = 0
Message-Passing Decoding of LDPC Codes
• Message Passing (or Belief Propagation) decoding is a low-complexity algorithm which approximately answers the question "what is the most likely x given y?"
• MP recursively defines messages m_{v,c}(i) and m_{c,v}(i) between each variable node v and each adjacent check node c, for iterations i = 0, 1, ...
Two Types of Messages...
• Likelihood ratio:
  λ_{x,y} = p(y | x=1) / p(y | x=0)
• Probability difference:
  δ_{x,y} = p(x=0 | y) − p(x=1 | y)
• For y₁, ..., yₙ conditionally independent given x:
  λ_{x,y} = Π_i λ_{x,y_i}
• For x₁, ..., xₙ independent:
  δ_{x,y} = Π_i δ_{x_i,y_i}
...Related by the Bilinear Transform
• Definition:
  B(x) = (1 − x) / (1 + x)
• Properties:
  B(B(x)) = x
  B(λ_{x,y}) = δ_{x,y}
  B(δ_{x,y}) = λ_{x,y}
• Derivation (assuming a uniform prior on x):
  B(λ_{x,y}) = B( p(y|x=1) / p(y|x=0) )
             = [p(y|x=0) − p(y|x=1)] / [p(y|x=0) + p(y|x=1)]
             = [2p(x=0|y)p(y) − 2p(x=1|y)p(y)] / 2p(y)
             = p(x=0|y) − p(x=1|y)
             = δ_{x,y}
Message Domains

    Probability Difference                  Likelihood Ratio
    P(x=0|y) − P(x=1|y)    ←── B(·) ──→    p(y|x=1) / p(y|x=0)
       ↓ log(·) ↑ e^(·)                       ↓ log(·) ↑ e^(·)
    Log Prob. Difference   ←── B'(·) ──→   Log Likelihood Ratio

where B'(·) is the bilinear transform carried into the log domains (B' = log ∘ B ∘ exp).
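
As a numeric sanity check of these relations (my own sketch; the channel numbers are made up, and numpy is assumed):

    import numpy as np

    def B(x):
        # Bilinear transform relating likelihood ratios and probability differences.
        return (1 - x) / (1 + x)

    xs = np.linspace(0.1, 5.0, 50)
    assert np.allclose(B(B(xs)), xs)         # involution: B(B(x)) = x

    p_y_given_1, p_y_given_0 = 0.2, 0.6      # made-up channel likelihoods
    lam = p_y_given_1 / p_y_given_0          # likelihood ratio
    p0 = p_y_given_0 / (p_y_given_0 + p_y_given_1)   # posterior, uniform prior
    delta = p0 - (1 - p0)                    # probability difference
    assert np.isclose(B(lam), delta)         # B(lambda) = delta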
Variable to Check Messages
• On any iteration i, the message from v to c is:
  m_{v,c}(i) = λ_v · Π_{c'|v, c'≠c} B(m_{c',v}(i−1))
• In the additive domain:
  m_{v,c}(i) = log(λ_v) + Σ_{c'|v, c'≠c} B'(m_{c',v}(i−1))
Check to Variable Messages
• On any iteration i, the message from c to v is:
  m_{c,v}(i) = Π_{v'|c, v'≠v} B(m_{v',c}(i))
• In the additive domain:
  m_{c,v}(i) = Σ_{v'|c, v'≠v} B'(m_{v',c}(i))
Decision Rule
• After sufficiently many iterations, threshold the full log-likelihood ratio:
  x̂_v = 0, if log(λ_v) + Σ_{c|v} B'(m_{c,v}(i−1)) ≤ 0
  x̂_v = 1, otherwise
Theorem about MP Algorithm
• If the algorithm stops after r iterations, then it returns the maximum a posteriori estimate of x_v given the observations y within radius r of v.
• However, this holds only if the variables within radius r of v are constrained only through the equations within radius r of v.
[Figure: the radius-r neighborhood of a variable node v]
Regular (λ,ρ) LDPC Codes
• Every variable node has degree λ; every check node has degree ρ.
• The best rate-1/2 regular code is (3,6), with threshold 1.09 dB.
• Robert Gallager had already invented this code by 1962.
Regular LDPC codes look the same from anywhere!
• The neighborhood of every edge looks the same.
• If the all-zeros codeword is sent, the distribution of any message depends only on its neighborhood.
• We can calculate a single message distribution once and for all for each iteration.
Analysis of Message-Passing Decoding (Density Evolution)
• We assume that the all-zeros codeword was transmitted (requires a symmetric channel).
• We compute the distribution of likelihood ratios coming from the channel.
• For each iteration, we compute the message distributions from variable to check and check to variable.
D.E. Update Rule
• The update rule for Density Evolution is defined in the additive domain of each type of node.
• Whereas in B.P. we add (log) messages:
  m_{c,v}(i) = Σ_{v'|c, v'≠v} B'(m_{v',c}(i))
  m_{v,c}(i) = log(λ_v) + Σ_{c'|v, c'≠c} B'(m_{c',v}(i−1))
• in D.E. we convolve message densities:
  P_{M_{c,v}(i)} = ⊛_{v'|c, v'≠v} P_{B'(M_{v',c}(i))}
  P_{M_{v,c}(i)} = P_{log(λ_v)} ⊛ ⊛_{c'|v, c'≠c} P_{B'(M_{c',v}(i−1))}
Familiar Example
• If one die has the uniform density function on {1, ..., 6}:
  [Plot: uniform density on faces 1–6]
• The density function for the sum of two dice is given by the convolution:
  [Plot: triangular density on totals 2–12]
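
The same computation in code (a sketch, assuming numpy):

    import numpy as np

    die = np.ones(6) / 6.0            # uniform density on faces 1..6
    two_dice = np.convolve(die, die)  # density of the sum, on totals 2..12
    for total, p in enumerate(two_dice, start=2):
        print(total, round(p, 4))     # peaks at 7 with probability 6/36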
D.E. Threshold
• Fixing the channel message density, the message densities will either "converge" to minus infinity, or they won't.
• For the Gaussian channel, the smallest SNR for which the densities converge is called the density evolution threshold.
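
In practice the threshold can be located by bisection over SNR; a minimal sketch, assuming a hypothetical predicate densities_converge(snr) that runs density evolution at the given SNR:

    def de_threshold(densities_converge, lo=0.0, hi=3.0, tol=1e-3):
        # Bisect for the smallest SNR (dB) at which D.E. converges.
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if densities_converge(mid):
                hi = mid    # converges: threshold is at or below mid
            else:
                lo = mid    # does not converge: threshold is above mid
        return hi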
D.E. Simulation of (3,6) Codes
• The threshold for regular (3,6) codes is 1.09 dB.
• Set SNR to 1.12 dB (0.03 dB above threshold).
• Watch the fraction of "erroneous messages" from check to variable.
Improvement vs. Current Error Fraction for Regular (3,6)
• The improvement per iteration is plotted against the current error fraction.
• Note there is a single bottleneck which took most of the decoding iterations.
Irregular (λ, ρ) LDPC Codes
• A fraction λ_i of variable nodes have degree i; a fraction ρ_i of check nodes have degree i.
• Edges are connected through a single random permutation π.
• Nodes have become specialized.
[Figure: variable nodes of degrees 2 through n, connected through the permutation π to check nodes of degrees 4 through m]
D.E. Simulation of Irregular Codes (Maximum Degree 10)
• Set SNR to 0.42 dB (about 0.03 dB above threshold).
• Watch the fraction of erroneous check-to-variable messages.
• This code was designed by Richardson et al.
Comparison of Regular and Irregular Codes
• Notice that the irregular code's curve is much flatter.
• Note: capacity-achieving LDPC codes for the erasure channel were designed by making this line exactly flat.
Constructing LDPC Code Graphs from a Proto-Graph
• Consider a bipartite graph G, called a "proto-graph".
• Generate a graph G^α, called an "expanded graph":
  • replace each node by α copies of that node;
  • replace each edge by α edges, connecting the copies through a random permutation.
[Figure: a proto-graph G and its expansion G² for α = 2]
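
A minimal sketch of the expansion step (my illustration, not the talk's code): each proto-edge becomes a bundle of α edges joined through its own random permutation.

    import random

    def expand_protograph(proto_edges, alpha, seed=0):
        # proto_edges: list of (v, c) pairs; parallel proto-edges are allowed.
        # Returns edges between expanded nodes (v, i) and (c, j), 0 <= i, j < alpha.
        rng = random.Random(seed)
        expanded = []
        for (v, c) in proto_edges:
            perm = list(range(alpha))
            rng.shuffle(perm)          # one random permutation per proto-edge
            for i in range(alpha):
                expanded.append(((v, i), (c, perm[i])))
        return expanded

    # Tiny example: 2 proto-variables, 1 proto-check, 3 proto-edges, alpha = 2.
    print(expand_protograph([(0, 0), (1, 0), (1, 0)], alpha=2))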
Local Structure of G^α
• The structure of the neighborhood of any edge in G^α can be found by examining G.
• The radius-r neighborhood of a random edge is loop-free with probability approaching 1 as α → ∞.
Density Evolution on G
• For each edge (c, v) in G, compute:
  P_{M_{v,c}(i)} = P_{log(λ_v)} ⊛ ⊛_{c'|v, c'≠c} P_{B'(M_{c',v}(i−1))}
• and:
  P_{M_{c,v}(i)} = ⊛_{v'|c, v'≠v} P_{B'(M_{v',c}(i))}
Density Evolution without Convolution
• A one-dimensional approximation to D.E., which requires:
  • a statistic that is approximately additive for check nodes;
  • a statistic that is approximately additive for variable nodes;
  • a way to go between these two statistics;
  • a way to characterize the message distribution from the channel.
Optimizing a Proto-Graph using Simulated Annealing
• Simulated annealing is an iterative algorithm that approximately minimizes an energy function.
• Requirements:
  • a space S over which to find the optimum point;
  • an energy function E(s): S → R;
  • a random perturbation function p(s): S → S;
  • a "temperature profile" t(i).
Optimization Space
• Graphs with a fixed number of variable and check nodes (rate is fixed)
• Optionally, we can add untransmitted (state) variables to the code
• Typical parameters:
  • 32 transmitted variables
  • 5 untransmitted variables
  • 21 parity checks
Energy Function
• Ideal: the density evolution threshold.
• Practical:
  • an approximate density evolution threshold;
  • the number of iterations to converge to a fixed error probability at a fixed SNR.
Perturbations
• Types of operation:
  • add an edge;
  • delete an edge;
  • swap two edges.
• Note: the edge-swapping operation is not necessary to span the space.
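
A sketch of these operations on an edge-list representation of the graph (my own illustration; endpoint choices are uniform for simplicity):

    import random

    def perturb(edges, n_vars, n_checks, rng=random):
        # Apply one random perturbation: add, delete, or swap edges.
        edges = list(edges)
        op = rng.choice(["add", "delete", "swap"])
        if op == "add":
            edges.append((rng.randrange(n_vars), rng.randrange(n_checks)))
        elif op == "delete" and edges:
            edges.pop(rng.randrange(len(edges)))
        elif op == "swap" and len(edges) >= 2:
            i, j = rng.sample(range(len(edges)), 2)   # exchange check endpoints
            (v1, c1), (v2, c2) = edges[i], edges[j]
            edges[i], edges[j] = (v1, c2), (v2, c1)
        return edges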
Basic Simulated Annealing Algorithm
• Take s₀ = a random point in S.
• For each iteration i, define sᵢ' = p(sᵢ).
• If E(sᵢ') < E(sᵢ), set sᵢ₊₁ = sᵢ'.
• If E(sᵢ') ≥ E(sᵢ), set sᵢ₊₁ = sᵢ' with probability e^(−(E(sᵢ') − E(sᵢ)) / t(i)), and sᵢ₊₁ = sᵢ otherwise.
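
The same loop in Python (a sketch; the energy and perturbation functions are whatever graph-level functions were described above):

    import math
    import random

    def simulated_annealing(s0, energy, perturb, temperature, n_iters, seed=0):
        # Downhill moves are always accepted; uphill moves with prob exp(-dE/t(i)).
        rng = random.Random(seed)
        s, e = s0, energy(s0)
        for i in range(n_iters):
            s_new = perturb(s)
            e_new = energy(s_new)
            dE = e_new - e
            if dE < 0 or rng.random() < math.exp(-dE / temperature(i)):
                s, e = s_new, e_new
        return s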
Degree Profile of Optimized Code
• The optimized graph has a large fraction of degree-1 variable nodes.
• Check nodes range from degree 3 to degree 8.
• (Recall that the graph is not defined by the degree profile alone.)
[Bar chart: number of variable and check nodes at each degree, 1 through 8]
Threshold vs. Complexity
• Designed codes of rate 1/2 with threshold 8 mB from channel capacity on the AWGN channel
• Low complexity (maximum degree = 8)
Improvement vs. Error Fraction: Comparison to Regular (3,6)
• The regular (3,6) code has a dramatic bottleneck.
• The irregular code with maximum degree 10 is flatter, but still has a bottleneck.
• The optimized proto-graph-based code is nearly flat for a long stretch.
Simulation Results
• n = 8192, k = 4096
• Achieves a bit error rate of about 4×10⁻⁴ at SNR = 0.8 dB.
• Beats the performance of the n = 10000 code in [1] by a small margin.
• There is evidence of an error floor.
Review
• We introduced the idea of LDPC graphs based on a proto-graph.
• We designed proto-graphs using the simulated annealing algorithm, with a fast approximation to density evolution.
• The designs handily beat other published codes of similar maximum degree.
Open Questions
• What is the ultimate limit of the performance vs. maximum degree tradeoff?
• Can we find a way to achieve the same tradeoff without randomized algorithms?
• Why does optimizing degree distributions sometimes force the codes to have low-weight codewords?
A Big Question
• Can we derive the Shannon limit in the context of MP decoding of LDPC codes, so that we can meet the inequalities with equality?
Free Parameters within S.A.
• Rate
• Maximum check and variable degrees
• Proto-graph size
• Fraction of untransmitted variables
• Channel parameter (SNR)
• Number of iterations in simulated annealing
Performance of Designed MET Codes
• Shows performance competitive with the best published codes
• Block error probability < 10⁻⁵ at 1.2 dB
• A soft error floor is observed at very high SNR, but it is not due to low-weight codewords
Multi-edge-type construction
• Edges of a particular "color" are connected through a permutation.
• Edges become specialized. Each edge type has a different message distribution each iteration.
MET D.E. vs. decoder
simulation