
Recent Results in
Non-Asymptotic Shannon Theory
Dror Baron
Supported by AFOSR, DARPA, NSF, ONR, and Texas Instruments
Joint work with M. A. Khojastepour, R. G. Baraniuk, and S. Sarvotham
“we may someday see the end of wireline”
• S. Cherry, “Edholm’s law of bandwidth,” IEEE
Spectrum, vol. 41, no. 7, July 2004, pp. 58-60
But will there ever be enough data rate?
• R. Lucky, 1989:
“We are not very good at predicting uses until the
actual service becomes available. I am not worried;
we will think of something when it happens.”
⇒ There will always be new applications that gobble
up more data rate!
How much can we improve wireless?
• Spectrum is limited natural resource
• Information theory says we need lots of power for
high data rates - even with infinite bandwidth!
• Solution: transmit more power BUT
– Limited by environmental concerns
– Will batteries support all that power?
• Sooner or later wireless rates will hit a wall!
Where can we improve?
• Algorithms and hardware gains
– Power-efficient computation
– Efficient power amplifiers
– Advances in batteries
– Directional antennas
• Communication gains
– Channel coding
– Source coding
– Better source and channel models
Where will “the last dB” of
communication gains come from?
Network information theory (Shannon theory)
Traditional point-to-point information theory
• Single source
• Single transmitter
• Single receiver
• Single communication stream
[Diagram: Encoder → Channel → Decoder]
• Most aspects are well-understood
Network information theory
[Diagram: two encoders, each feeding a channel, into a single decoder]
• Network of:
– Multiple sources
– Multiple transmitters
– Multiple receivers
– Multiple communication streams
• Few results
• My goal: understand various costs of network
information theory
What costs has information
theory overlooked?
Channel coding has arrived…
Encoder
Channel
Decoder
• Turbo codes [Berrou et al., 1993]
– ~0.5 dB gap to capacity (rate R below capacity)
– BER = 10^-5
– Block length n = 6.5×10^4
• Regular LDPC codes [Gallager, 1963]
• Irregular LDPC [Richardson et al., 2001]
– 0.13 dB gap to capacity
– BER = 10^-6
– n = 10^6
Distributed source coding has also arrived…
x
Encoder
Decoder
y
• Encoder for x based on syndrome of channel code
• Decoder for x has correlated side information y
• Various types of channel codes can be used
• Slepian-Wolf via LDPC codes [Xiong et al., 2004]
– H(X|Y)=0.47
– R=0.5 (rate above Slepian-Wolf limit)
– BER = 10^-6
– Block length n = 10^5
Hey! Did you notice those block lengths?
• Information theory provides results in the
asymptotic regime
– Channel coding: 8>0, rate R=C- achievable
with !0 as n!1
– Slepian-Wolf coding: 8>0, rate R=H(X|Y)+
achievable with !0 as n!1
• Best practical results achieved for n¸105
• Do those results require large n?
But we live in a finite world…
• Real world data doesn’t always have n ≥ 10^6
– IP packets
– Emails, text messages
– Sensornet applications
(small battery → small n)
• How do those methods perform for n = 10^4? 10^3?
• How quickly can we approach the
performance limits of information theory?
And we don’t know the statistics either!
• Lossless coding (single source):
– Length-n input x~Ber(p)
– Encode with wrong parameter q
– K-L divergence penalty with variable rate codes
⇒ Performance loss (minor bitrate penalty)
• Channel coding, distributed source coding:
– Encode with wrong parameter q<p<0.5
– Fixed rate codes based on joint-typicality
– Typical set Tq for q is smaller than Tp for p
• As n→∞, Pr(error)→1
⇒ Performance collapse!
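The collapse is easy to check numerically. Below is a minimal sketch (my own illustration, not from the slides; the parameters p, q and the three-standard-deviation edge of the typical set are arbitrary choices): a fixed-rate code built for q < p only covers sequences whose Hamming weight is typical for q, and a true Ber(p) source escapes that set with probability approaching 1.

```python
# Sketch: fixed-rate joint-typicality decoding with the wrong parameter q < p.
# T_q roughly covers weights up to n*q + 3*sqrt(n*q*(1-q)); a Ber(p) source
# falls outside T_q with probability -> 1 as n grows.  (Hypothetical values.)
from scipy.stats import binom

p, q = 0.2, 0.1                                     # true vs. assumed parameters
for n in (10**2, 10**3, 10**4, 10**5):
    edge = n * q + 3 * (n * q * (1 - q)) ** 0.5     # rough upper edge of T_q
    pr_error = 1 - binom.cdf(edge, n, p)            # Pr[x falls outside T_q]
    print(f"n={n:>6}  Pr(error) ~ {pr_error:.3f}")
```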
Main challenges
• How quickly can we approach the
performance limits of information theory?
• Will address for channel coding and Slepian-Wolf
• What can we do when the source statistics
are unknown?
• Will address for Slepian-Wolf
But first . . .
What does the prior art indicate?
Underlying problem
• Shannon [1958]:
“This inverse problem is perhaps the more natural
in applications: given a required level of probability
of error, how long must the code be?”
– Motivation may have been phone and space
communication
– “Small” probability of codeword error ε
• Wireless paradigm:
Given k bits, what are the minimal channel
resources to attain probability of error ε?
– Can retransmit packet ⇒ fix “large” ε
– n depends on packet length
– Need to characterize R(n,ε)
Error exponents
• Fix rate R<C and codeword length n
– Bounds on probability of error
– Random coding: Pr[error] ≤ 2^(-n·Er(R))
– Sphere packing: Pr[error] ≥ 2^(-n·Esp(R)+o(n))
– Er(R)=Esp(R) for R near C
Error exponents
• Fix rate R<C and codeword length n
– Bounds on probability of error
– Random coding: Pr[error] ≤ 2^(-n·Er(R))
– Sphere packing: Pr[error] ≥ 2^(-n·Esp(R)+o(n))
– Er(R)=Esp(R) for R near C
• Shannon’s regime:
“This inverse problem is perhaps the more natural
in applications: given a required level of probability
of error, how long must the code be?”
– Fix R<C
⇒ E(R) = O(1)
⇒ log(ε) = O(n), good for “small” ε
Error exponents
• Fix rate R<C and codeword length n
– Bounds on probability of error
– Random coding: Pr[error] ≤ 2^(-n·Er(R))
– Sphere packing: Pr[error] ≥ 2^(-n·Esp(R)+o(n))
– Er(R)=Esp(R) for R near C
• Wireless paradigm:
Given k bits, what are the minimal channel
resources to attain probability of error ε?
– Fix ε
⇒ nE(R) = O(1)
– o(n) term dominates
⇒ Bounds diverge
Error exponents fail for R = C - K/n^0.5
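A quick numerical check of this point (my own sketch, not the authors' code): using Gallager's random-coding exponent for the BSC with uniform inputs, the exponent term n·Er(R) grows linearly in n for a fixed rate R < C, but stays O(1) when R = C - K/n^0.5. The crossover probability p and the constant K below are arbitrary choices.

```python
# Gallager random-coding exponent for the BSC (uniform input distribution):
#   E0(rho) = rho - (1+rho) * log2( p^(1/(1+rho)) + (1-p)^(1/(1+rho)) )
#   Er(R)   = max over 0 <= rho <= 1 of [ E0(rho) - rho*R ]
import math

def Er(R, p, grid=2000):
    best = 0.0
    for i in range(grid + 1):
        rho = i / grid
        s = 1.0 / (1.0 + rho)
        e0 = rho - (1 + rho) * math.log2(p ** s + (1 - p) ** s)
        best = max(best, e0 - rho * R)
    return best

p = 0.11
C = 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # C = 1 - H2(p)
K = 1.0                                                  # arbitrary constant
for n in (10**3, 10**4, 10**5, 10**6):
    print(n,
          round(n * Er(C - 0.05, p), 1),           # fixed gap: grows like n
          round(n * Er(C - K / n ** 0.5, p), 1))   # gap K/sqrt(n): stays O(1)
```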
How quickly can we approach the
channel capacity?
(known statistics)
Binary symmetric channel (BSC) setup
[Diagram: s → Encoder f → x = f(s) → BSC with noise z ~ Ber(n,p) → y → Decoder g → s’ = g(y)]
• s ∈ {1,…,M} input message
• x, y, and z binary length-n sequences
• z ~ Bernoulli(n,p) implies crossover probability p
• Code (f,g,n,M,ε) includes:
– Encoder x = f(s) ∈ {0,1}^n
– Rate R = log(M)/n
– Channel y = x ⊕ z
– Decoder g reconstructs s by s’ = g(y)
– Error probability Pr[g(y) ≠ s] ≤ ε
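As a concrete illustration of this setup (a toy sketch with made-up parameters, not part of the slides), the channel simply XORs the codeword with an i.i.d. Bernoulli noise sequence:

```python
import random

n, p = 16, 0.1                                           # toy block length and crossover probability
x = [random.randint(0, 1) for _ in range(n)]             # some codeword x = f(s)
z = [1 if random.random() < p else 0 for _ in range(n)]  # noise z ~ Ber(n,p)
y = [xi ^ zi for xi, zi in zip(x, z)]                    # channel output y = x XOR z
```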
Non-asymptotic capacity
• Definition: CNA(n,ε) = max over codes (f,g,n,M,ε) of log(M)/n
• Theorem: Capacity of BSC is C=1-H(Z)=1-H2(p)
• Prior art by Wolfowitz [1978]
– Converse result: CNA(n,ε) ≤ C - KC(ε)/n^0.5
– Achievable result: CNA(n,ε) ≥ C - KA(ε)/n^0.5
– Bounds are loose: KA(ε) > KC(ε)
[Figure: CNA(n,ε) vs. n approaching C from below, with the gap between the converse and achievable bounds marked “looseness of bounds”]
• Can we tighten Wolfowitz’s bounds?
Key to solution – Packing typical sets
• Need to encode typical set TZ for z
– Code needs to “cover” z ∈ TZ
⇒ Need Pr(z ∈ TZ) ≈ 1-ε
⇒ Probability ε of codeword error
[Figure: typical set TZ inside the 2^n-point output space]
• What about rate?
– Output space = 2^n possible sequences
– Can’t pack more than 2^n/|TZ| sets into output ⇒ M ≤ 2^n/|TZ|
– Minimal cardinality Tmin covers 1-ε ⇒ CNA ≤ 1 - log(|Tmin|)/n
What’s the cardinality of Tmin?
• Consider empirical statistics nz = Σi zi, PZ = nz/n
– p < 0.5 ⇒ Pr(z) monotone decreasing in nz
⇒ Minimal Tmin has form Tmin = {z: PZ ≤ γ(ε)}
• Determine γ(ε) with central limit theorem (CLT)
– E[PZ] = p, Var(PZ) = p(1-p)/n
⇒ PZ ~ N(p, p(1-p)/n)
• Asymptotic
– γ = p + δ
– LLN: δ → 0
• Non-asymptotic
– γ = p + τ[p(1-p)/n]^0.5
– CLT: τ → τ(ε)
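A small sketch of this step (my own, under the reading above in which γ(ε) is the weight threshold and Φ^-1 the inverse Gaussian CDF): pick γ so that Pr(PZ ≤ γ) ≈ 1-ε, and sanity-check the CLT approximation against the exact binomial tail.

```python
from scipy.stats import binom, norm

def gamma(n, p, eps):
    # CLT threshold: Pr(P_Z <= gamma) ~ 1 - eps for P_Z = n_z/n, z ~ Ber(p)
    return p + norm.ppf(1 - eps) * (p * (1 - p) / n) ** 0.5

n, p, eps = 10**4, 0.1, 1e-2
g = gamma(n, p, eps)
print(g, binom.cdf(n * g, n, p))   # coverage should be close to 1 - eps = 0.99
```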
Tight non-asymptotic capacity
• Theorem:
– CNA(n,ε) = C - K(ε)/n^0.5 + o(n^-0.5)
– K(ε) = Φ^-1(1-ε)·[p(1-p)]^0.5·log((1-p)/p)
– Gap to capacity is K(ε)/n^0.5 + o(n^-0.5)
• Note: o(n^-0.5) asymptotically negligible w.r.t. K/n^0.5
[Figure: tight bound on CNA(n,ε) approaching C as n grows]
⇒ Tightened Wolfowitz bounds up to o(n^-0.5)
⇒ Gap to capacity of LDPC codes 2-3× greater
⇒ We know how quickly we can approach C
Non-asymptotic capacity of BSC
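To get a feel for the theorem, here is a sketch that evaluates the gap K(ε)/n^0.5 directly (my own code under the reconstructed formula above, with Φ^-1 the inverse Gaussian CDF; the parameter values are arbitrary):

```python
import math
from scipy.stats import norm

def H2(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def K(eps, p):
    # K(eps) = Phi^-1(1-eps) * sqrt(p(1-p)) * log2((1-p)/p), as read above
    return norm.ppf(1 - eps) * (p * (1 - p)) ** 0.5 * math.log2((1 - p) / p)

p, eps = 0.1, 1e-2
C = 1 - H2(p)
for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n={n:>7}  C={C:.4f}  gap to capacity ~ {K(eps, p) / n ** 0.5:.4f}")
```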
Gaussian channel results
[Diagram: s → Encoder f → x = f(s) → channel adds z ~ N(0,σ²) → y → Decoder g → s’ = g(y)]
• Continuous channel
• Power constraint Σi (xi)² ≤ nP
• Shannon [1958] derived CNA(n,ε) for Gaussian
channel via cone packing (non-i.i.d. codebook)
• Information spectrum bounds on probabilities of
error indicate Gaussian codebooks are sub-optimal
⇒ i.i.d. codebooks aren’t good enough!
Excess power of Gaussian channel
How quickly can we approach the
Slepian-Wolf limit?
(known statistics)
But first . . .
Slepian-Wolf Review
Slepian-Wolf setup
[Diagram: x → Encoder fX → fX(x); y → Encoder fY → fY(y); Decoder g outputs gX(fX(x),fY(y)) and gY(fX(x),fY(y))]
• x and y are correlated length-n sequences
• Code (fX,fY,gX,gY,n,MX,MY,εX,εY) includes:
– Encoders fX(x) ∈ {1,…,MX}, fY(y) ∈ {1,…,MY}
– Rates RX = log(MX)/n, RY = log(MY)/n
– Decoder g reconstructs x and y by gX(fX(x),fY(y))
and gY(fX(x),fY(y))
– Error probabilities Pr[gX(fX(x),fY(y)) ≠ x] ≤ εX and
Pr[gY(fX(x),fY(y)) ≠ y] ≤ εY
Slepian-Wolf theorem
• Theorem: [Slepian&Wolf,1973]
– RX ≥ H(X|Y) (conditional entropy)
– RY ≥ H(Y|X)
– RX + RY ≥ H(X,Y) (joint entropy)
[Figure: Slepian-Wolf rate region in the (RX, RY) plane, bounded by RX = H(X|Y), RY = H(Y|X), and RX + RY = H(X,Y), with corner points at (H(X|Y), H(Y)) and (H(X), H(Y|X))]
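A minimal membership check for the rate region (my own sketch; the conditional and joint entropies are taken as given inputs rather than computed from a particular source):

```python
def in_sw_region(Rx, Ry, H_x_given_y, H_y_given_x, H_xy):
    # A rate pair (Rx, Ry) is achievable iff all three Slepian-Wolf inequalities hold
    return Rx >= H_x_given_y and Ry >= H_y_given_x and Rx + Ry >= H_xy
```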
Slepian-Wolf with binary symmetric
correlation structure
(known statistics)
Binary symmetric correlation setup
[Diagram: y ~ Ber(p) passes through correlation channel z ~ Ber(q) to give x ~ Ber(r); Encoder f produces f(x) ∈ {1,…,M}; Decoder g outputs g(f(x),y) ∈ {0,1}^n]
• x, y, and z are length-n Bernoulli sequences
• Correlation channel z is independent of y
• Bernoulli parameters p, q ∈ [0,0.5), r = p(1-q)+(1-p)q
• Code (f,g,n,M,ε) includes:
– Encoder f(x) ∈ {1,…,M}
– Rate R = log(M)/n
– Decoder g(f(x),y) ∈ {0,1}^n
– Error probability Pr[g(f(x),y) ≠ x] ≤ ε
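For this correlation structure (a sketch of the model as read from the slide, with x = y ⊕ z and z independent of y; the parameter values are arbitrary), the marginal of x and the Slepian-Wolf limit for x follow directly:

```python
import math

def H2(t):
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

p, q = 0.3, 0.1                  # hypothetical Bernoulli parameters of y and z
r = p * (1 - q) + (1 - p) * q    # x = y XOR z  =>  x ~ Ber(r)
print(r, H2(q))                  # H(X|Y) = H(Z) = H2(q): the Slepian-Wolf limit for x
```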
Relation to general Slepian-Wolf setup
• x, y, and z are Bernoulli
• Correlation z independent of y
⇒ H(X|Y) = H(Z)
• Focus on encoding x at rate approaching H(Z)
• Neglect well-known encoding of y at rate RY=H(Y)
[Figure: Slepian-Wolf rate region with our operating point at RX → H(Z) = H(X|Y), RY = H(Y)]
Non-asymptotic Slepian-Wolf rate
• Definition: RNA(n,ε) = min over codes (f,g,n,M,ε) of log(M)/n
• Prior art [Wolfowitz,1978]
– Converse result: RNA(n,ε) ≥ H(X|Y) + KC(ε)/n^0.5
– Achievable result: RNA(n,ε) ≤ H(X|Y) + KA(ε)/n^0.5
– Bounds are loose: KA(ε) > KC(ε)
RNA(n,)
H(X|Y)
looseness of
bounds
n
• Can we tighten Wolfowitz’s bounds?
Tight non-asymptotic rate
• Theorem:
– RNA(n,)=H(Z)+K()/n0.5+o(n-0.5)
– K()=-1() [q(1-q)]0.5 log((1-q)/q)
– Redundancy rate is K()/n0.5+o(n-0.5)
• Note: o(n-0.5) decays faster than K/n0.5
RNA(n,)
H(X|Y)
tight bound
n
⇒ Tightened Wolfowitz bounds up to o(n^-0.5)
⇒ We know how quickly we can approach H(Z) with
known statistics
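The same formula also answers “how large must n be?” in the other direction. A sketch (my own, under the reconstructed K(ε) above; the target redundancies are arbitrary) that solves K(ε)/n^0.5 = target for n:

```python
import math
from scipy.stats import norm

def K(eps, q):
    # K(eps) = Phi^-1(1-eps) * sqrt(q(1-q)) * log2((1-q)/q), as read above
    return norm.ppf(1 - eps) * (q * (1 - q)) ** 0.5 * math.log2((1 - q) / q)

q, eps = 0.1, 1e-2
for target in (0.05, 0.01, 0.001):            # redundancy targets in bits/symbol
    n_needed = (K(eps, q) / target) ** 2      # solve K/sqrt(n) = target
    print(f"redundancy {target}: need n ~ {n_needed:.0f}")
```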
What can we do when the source
statistics are unknown?
(universality)
Universal setup
[Diagram: y ~ Ber(p), correlation channel z ~ Ber(q), x ~ Ber(r); Encoder f observes x and ny = Σi yi and produces f(x) ∈ {1,…,M_{nx,ny}}; Decoder g outputs g(f(x),y) ∈ {0,1}^n]
• Unknown Bernoulli parameters p, q, r
• Encoder observes x and ny = Σi yi
• Communication of ny requires log(n) bits
• Variable rate used
– Need distribution for nz
– Distribution depends on nx and ny (not x)
– Codebook size M_{nx,ny}
Distribution of nz
• CLT was key to solution with known statistics
• How can we apply CLT when q is unknown?
• Consider a numerical example
– p=0.3, q=0.1, r=p(1-q)+(1-p)q
– PX=r, PY=p, PZ=q (empirical = true)
– We plot Pr(nz|nx,ny) as function of nz ∈ {0,…,n}
[Plots: Pr(nz|nx,ny) as a function of nz for n = 10^2, 10^3, 10^4]
Universal rate
• Theorem:
– RNA(n,)=H(P*Z)+K’()/n0.5+o(n-0.5)
– K’()=f(PY)K()
– f(PY)=2[PY(1-PY)]0.5/|1-2PY|
[Plot of f(PY): f(PY) → ∞ as PY → 0.5 and f(PY) → 0 as PY → 0]
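A sketch of the universal penalty factor (my own code, evaluating the f(PY) and K’(ε) = f(PY)·K(ε) expressions from the theorem above; the parameter values are arbitrary):

```python
import math
from scipy.stats import norm

def f(Py):
    # Universal penalty factor: f(Py) = 2*sqrt(Py*(1-Py)) / |1 - 2*Py|
    return 2 * (Py * (1 - Py)) ** 0.5 / abs(1 - 2 * Py)

def K(eps, q):
    return norm.ppf(1 - eps) * (q * (1 - q)) ** 0.5 * math.log2((1 - q) / q)

q, eps = 0.1, 1e-2
for Py in (0.1, 0.3, 0.4, 0.45, 0.49):
    print(f"Py={Py}: f(Py)={f(Py):.2f}, K'={f(Py) * K(eps, q):.2f}")   # blows up near Py=0.5
```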
Why is f(PY) small when PY is small?
• Known statistics ⇒ Var(nz) = nq(1-q) regardless of
empirical statistics
• PY → 0 ⇒ can estimate nZ with small variance
⇒ Universal scheme outperforms known statistics
when PY is small
• Key issue: variable rate coding (universal) beats
fixed rate coding (known statistics)
⇒ Can cut down expected redundancy (known
statistics) by communicating ny to encoder
• log(n) bits for ny will save O(n^0.5)
Redundancy for PY ≈ 0.5
• f(PY) blows up as PY approaches 0.5
• Redundancy is O(n^-0.5) with enormous constant
• Another scheme has O(n^-1/3) redundancy
• Better performance for PY = 0.5 + O(n^-1/6)
⇒ Universal redundancy can be huge!
• Ongoing research: improvement of O(n^-1/3)
Numerical example
• n = 10^4
• q=0.1
• Slepian-Wolf requires nH2(q)=4690 bits
• Non-asymptotic approach (known statistics) with
ε = 10^-2 requires nRNA(n,ε) = 4907 bits
• Universal approach with PY=0.3 requires 5224 bits
• With PY=0.4 we need 5863 bits
• In practice, penalty for universality is huge!
Summary
• Network information theory (Shannon theory) may
enable increased wireless data rates
• Practical channel codes and distributed source
codes approach the limits, but rely on large n
• How quickly can we approach the
performance limits of information theory?
 CNA=C-K()/n1/2+o(n-1/2)
 RNA=H(Z)+K()/n1/2+o(n-1/2)
 Gap to capacity of LDPC codes 2-3x greater
Universality
• What can we do when the source statistics
are unknown? (Slepian Wolf)
⇒ PY < 0.5: H(P*Z) + K’(ε)/n^1/2 + o(n^-1/2)
⇒ PY ≈ 0.5: H(P*Z) + O(n^-1/3) – can be huge!
• Universal channel coding with feedback for BSC
– Capacity-achieving code requires PY=0.5
⇒ Universality with current scheme is O(n^-1/3)
[Diagram: Encoder → Channel → Decoder, with a feedback link from decoder back to encoder]
Further directions
• Gaussian channel (briefly discussed)
– Shannon [1958] derived CNA(n,ε) for Gaussian
channel with cone packing (non-i.i.d. codebook)
– Gaussian codebooks are sub-optimal!
• Other channels:
– CNA(n,) ¸ C-KA()/n0.5 via information spectrum
– Gaussian codebook distribution sub-optimal
⇒ Must consider non-i.i.d. codebook constructions
• Penalties for finite n and unknown statistics
exist everywhere in Shannon theory!!
www.dsp.rice.edu