Dalsa talk - McMaster University


Context-based
Data Compression
Xiaolin Wu
Polytechnic University
Brooklyn, NY
Part 3. Context modeling
Context model – estimated symbol probability
Variable-length coding schemes need estimates of the probability of each symbol - a model.
The model can be:
Static - a fixed global model for all inputs (e.g. English text)
Semi-adaptive - computed for the specific data being coded and transmitted as side information (e.g. C programs)
Adaptive - constructed on the fly (any source!); see the sketch below
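As a minimal illustration of the adaptive case (a sketch added here, not code from the talk; the class name AdaptiveModel is made up), the model below keeps symbol counts that encoder and decoder update identically after each coded symbol, so the probability estimates track the data in a single pass with no side information.

# Minimal sketch of an adaptive (order-0) probability model.
# Counts start at 1 so no symbol ever receives zero probability.
class AdaptiveModel:
    def __init__(self, alphabet_size):
        self.counts = [1] * alphabet_size
        self.total = alphabet_size

    def prob(self, symbol):
        # Current estimate handed to the entropy coder.
        return self.counts[symbol] / self.total

    def update(self, symbol):
        # Called after coding each symbol; encoder and decoder perform
        # the same update, so no side information is transmitted.
        self.counts[symbol] += 1
        self.total += 1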
Adaptive vs. Semi-adaptive
Advantages of semi-adaptive
Simple decoder
Disadvantages of semi-adaptive
Overhead of specifying the model can be high
Two passes over the data are required
Advantages of adaptive
One pass, universal, and as good if not better
Disadvantages of adaptive
Decoder is as complex as the encoder
Errors propagate
Adaptation with Arithmetic and Huffman Coding
Huffman coding - manipulate the Huffman tree on the fly. Efficient algorithms are known, but they remain complex.
Arithmetic coding - update the cumulative probability distribution table. Efficient data structures and algorithms are known (see the sketch below); the rest of the coder stays essentially the same.
The main advantage of arithmetic coding over Huffman coding is the ease with which it can be combined with adaptive modeling techniques.
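One data structure commonly used for such a cumulative table is a Fenwick (binary indexed) tree, which supports both count updates and cumulative-frequency queries in O(log N) time. The sketch below is one such structure, offered as an illustration rather than the implementation the talk has in mind.

# Sketch of a Fenwick (binary indexed) tree holding the cumulative
# frequency table an adaptive arithmetic coder queries and updates.
class CumulativeCounts:
    def __init__(self, alphabet_size):
        self.n = alphabet_size
        self.tree = [0] * (alphabet_size + 1)
        for s in range(alphabet_size):   # start every symbol at count 1
            self.update(s, 1)

    def update(self, symbol, delta):
        # Add delta to the count of symbol in O(log N).
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def cumulative(self, symbol):
        # Total count of symbols 0..symbol (inclusive), in O(log N).
        i = symbol + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)
        return total

The coder derives the interval of a symbol s from cumulative(s - 1) and cumulative(s), and calls update(s, 1) after coding it.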
Context models
If the source is not i.i.d., there are complex dependences between symbols in the sequence.
In most practical situations the pdf of a symbol depends on neighboring symbol values - i.e. on its context.
Hence we condition the encoding of the current symbol on its context.
How to select contexts? A rigorous answer is beyond our scope.
Practical schemes use a fixed neighborhood, as in the sketch below.
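As a concrete (hypothetical) example of a fixed neighborhood, the sketch below conditions each symbol on its k immediate predecessors and keeps separate occurrence counts per context, i.e. an order-k Markov model.

from collections import defaultdict

# Sketch: the context of sequence[i] is the tuple of its k previous
# symbols; each context owns its own symbol-occurrence counts.
def context_counts(sequence, k=3):
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(sequence)):
        ctx = tuple(sequence[i - k:i])
        counts[ctx][sequence[i]] += 1
    return counts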
Context dilution problem
The minimum code length of sequence x1 x2  xn
achievable by arithmetic coding, if P( xi | xi 1xi 2  x1 )
is known.
The difficulty of estimating P( xi | xi 1xi 2  x1 ) due
to insufficient sample statistics prevents the use
of high-order Markov models.
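As a worked statement of that bound (standard information theory, restated here rather than quoted from the slide), the achievable code length is the ideal code length

\[
L_{\min}(x_1 x_2 \cdots x_n) = -\sum_{i=1}^{n} \log_2 P(x_i \mid x_{i-1} x_{i-2} \cdots x_1)\ \text{bits},
\]

which arithmetic coding attains to within a small constant. With an order-$k$ model over an alphabet of size $N$ there are $N^k$ contexts, so each context sees too few samples for reliable estimation - the context dilution problem.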
Estimating probabilities in different contexts
Two approaches:
Maintain symbol occurrence counts within each context
the number of contexts must then be modest to avoid context dilution
Assume the pdf shape within each context is the same (e.g. Laplacian) and only its parameters (e.g. mean and variance) differ (see the sketch below)
estimation may not be as accurate, but a much larger number of contexts can be used
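As a sketch of the second approach (added here; the function names are illustrative, not from the talk), assume a zero-mean Laplacian within each context and estimate only its scale from the residuals seen in that context so far.

import math

# Parametric estimation within a context: assume a zero-mean Laplacian
# p(e) = exp(-|e| / b) / (2b) and estimate only the scale b.
def laplacian_scale(residuals):
    # The maximum-likelihood estimate of b is the mean absolute value.
    return sum(abs(e) for e in residuals) / max(len(residuals), 1)

def laplacian_prob(e, b, bin_width=1.0):
    # Probability mass of an integer residual bin, using the
    # density-times-width approximation (adequate unless b is tiny).
    b = max(b, 1e-6)
    return (bin_width / (2.0 * b)) * math.exp(-abs(e) / b)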
Entropy (Shannon 1948)
Consider a random variable $X$ with source alphabet $\{x_0, x_1, \ldots, x_{N-1}\}$ and probability mass function $P_X(x_i) = \Pr(X = x_i)$.
The self-information of $x_i$ is $i(x_i) = -\log P(x_i)$, measured in bits if the log is base 2.
An event of lower probability carries more information.
Self-entropy is the weighted average of the self-information:
$H(X) = -\sum_i P(x_i) \log P(x_i)$
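A short numeric check of the definition (illustrative, not part of the slides):

import math

# Self-entropy of a discrete source from its probability mass function.
def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit/symbol (fair coin)
print(entropy([0.9, 0.1]))   # ~0.47 bits/symbol (biased source)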
Conditional Entropy
Consider two random variables $X$ and $C$.
Alphabet of $X$: $\{x_0, x_1, \ldots, x_{N-1}\}$
Alphabet of $C$: $\{c_0, c_1, \ldots, c_{M-1}\}$
The conditional self-information of $x_i$ is $i(x_i \mid c_m) = -\log P(x_i \mid c_m)$.
Conditional entropy is the average value of the conditional self-information:
$H(X \mid C) = -\sum_{i=0}^{N-1} \sum_{m=0}^{M-1} P(x_i \mid c_m) P(c_m) \log_2 P(x_i \mid c_m)$
Entropy and Conditional Entropy
The conditional entropy $H(X \mid C)$ can be interpreted as the amount of uncertainty remaining about $X$, given that we know the random variable $C$.
The additional knowledge of $C$ should reduce the uncertainty about $X$:
$H(X \mid C) \le H(X)$
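A small numeric illustration (added, not from the slides): let $X$ and $C$ be binary with $P(C=0) = P(C=1) = 1/2$ and $P(X=0 \mid C=0) = P(X=1 \mid C=1) = 0.9$. Then

\[
H(X) = 1\ \text{bit}, \qquad H(X \mid C) = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.47\ \text{bits},
\]

so knowing the context cuts the uncertainty about $X$ by more than half.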
Context Based Entropy Coders
Consider a binary sequence of symbols S:
0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0 1 1 0 1 1 0 ...
The symbols preceding the symbol to code form the context space C; with three binary neighbors there are eight contexts:
C0 = 0 0 0, C1 = 0 0 1, ..., C7 = 1 1 1
Each context $C_i$ feeds its own entropy coder (EC) operating at rate $H(C_i)$, and the overall rate is
$R = \sum_i P(C_i) H(C_i)$
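The same computation as a sketch (illustrative code, not the talk's implementation): split the symbols of S by their 3-symbol context, estimate each $H(C_i)$ from empirical counts, and combine them into the weighted rate $R$.

import math
from collections import Counter, defaultdict

# Weighted rate R = sum_i P(C_i) H(C_i) for a binary sequence,
# with contexts formed by the k previous symbols.
def context_rate(bits, k=3):
    per_context = defaultdict(list)
    for i in range(k, len(bits)):
        per_context[tuple(bits[i - k:i])].append(bits[i])
    n = len(bits) - k
    rate = 0.0
    for ctx, symbols in per_context.items():
        p_ctx = len(symbols) / n
        counts = Counter(symbols)
        h = -sum((c / len(symbols)) * math.log2(c / len(symbols))
                 for c in counts.values())
        rate += p_ctx * h
    return rate

# Example with the sequence above:
# s = [0,0,1,0,0,1,1,0,1,1,1,0,1,0,0,1,0,0,0,0,1,0,0,1,1,1,0,1,0,1,1,0,1,1,0,1,1,0]
# print(context_rate(s))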
Decorrelation techniques to exploit sample smoothness
Transforms
DCT, FFT
wavelets
Differential Pulse Code Modulation (DPCM)
predict the current symbol from past observations:
$\hat{x}_i = \sum_{t=1}^{d} a_t x_{i-t}$
code the prediction residual rather than the symbol:
$e_i = x_i - \hat{x}_i$, so the decoder recovers $x_i = \hat{x}_i + e_i$
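A minimal DPCM sketch under the model above (added illustration; the coefficients a are assumed fixed and known to both encoder and decoder):

# Predict x[i] from the previous d samples with fixed coefficients a,
# code the residual e[i] = x[i] - x_hat[i], and reconstruct losslessly
# at the decoder as x[i] = x_hat[i] + e[i].
def dpcm_encode(x, a):
    d = len(a)
    residuals = []
    for i in range(len(x)):
        past = x[max(0, i - d):i][::-1]   # x[i-1], x[i-2], ...
        x_hat = round(sum(at * xt for at, xt in zip(a, past)))
        residuals.append(x[i] - x_hat)
    return residuals

def dpcm_decode(residuals, a):
    d = len(a)
    x = []
    for i, e in enumerate(residuals):
        past = x[max(0, i - d):i][::-1]   # same prediction as the encoder
        x_hat = round(sum(at * xt for at, xt in zip(a, past)))
        x.append(x_hat + e)
    return x

# dpcm_decode(dpcm_encode(x, a), a) == x for any integer sequence x,
# because both sides form the identical (rounded) prediction.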
Benefits of prediction and transform
A priori knowledge exploited to reduce the self-entropy
of the source symbols
Higher coding efficiency due to
 Fewer parameters to be estimated adaptively
 Faster convergence of adaptation
Further Reading
Text Compression - T. Bell, J. Cleary and I. Witten, Prentice Hall. Good coverage of statistical context modeling, though the focus is on text.
Articles in IEEE Transactions on Information Theory by Rissanen and Langdon.
Digital Coding of Waveforms: Principles and Applications to Speech and Video - Jayant and Noll. Good coverage of predictive coding.