Data Compression


Increasing Information per Bit
• Information in a source
  – Mathematical Models of Sources
  – Information Measures
• Compressing information
  – Huffman encoding
    • Optimal Compression for DMS?
  – Lempel-Ziv-Welch Algorithm
    • For Stationary Sources?
    • Practical Compression
• Quantization of analog data
  – Scalar Quantization
  – Vector Quantization
  – Model Based Coding
  – Practical Quantization
    • μ-law encoding
    • Delta Modulation
    • Linear Predictive Coding (LPC)
Huffman encoding
• Variable length binary code for DMS
– finite alphabet, fixed probabilities
• Code satisfies the Prefix Condition
– Code words are instantaneously and unambiguously decodable as they arrive
  e.g., 0, 10, 110, 111 is OK
  0, 01, 011, 111 is not OK: 0 is a prefix of 01 and 011, so the decoder cannot tell where one code word ends and the next begins
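As an illustration (not from the original slides), a minimal Python sketch that checks whether a candidate code satisfies the prefix condition:

```python
def satisfies_prefix_condition(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# The two sets from the slide: the first is instantaneously decodable,
# the second is not (0 is a prefix of 01 and 011).
print(satisfies_prefix_condition(["0", "10", "110", "111"]))   # True
print(satisfies_prefix_condition(["0", "01", "011", "111"]))   # False
```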
Huffman encoding
• Use probabilities to order the coding priorities of the letters
– Low-probability letters get code words first (more bits)
– This smooths out the information per bit
Huffman encoding
• Use a code tree to make the code
• Combine the two symbols with the lowest probabilities to make a new block symbol
• Assign a 1 to one of the old symbols' code words and a 0 to the other's
• Now reorder and combine the two lowest-probability symbols of the new set
• Each time the synthesized block symbol has the lowest probability, the code words get shorter (see the example and sketch below)
[Code tree figure: symbols x1–x7 are merged pairwise into block symbols D0–D4; the resulting code words are:]
x1 → 00
x2 → 01
x3 → 10
x4 → 110
x5 → 1110
x6 → 11110
x7 → 11111
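A minimal Python sketch of this construction, using a heap in place of the manual reordering. The probabilities are the ones from the comparison slide below; the exact code words may differ from the table above depending on how ties are broken, but the average length of 2.21 bits/letter comes out the same:

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a binary Huffman code; probs maps symbol -> probability."""
    counter = itertools.count()          # tie-breaker so heapq never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # two lowest-probability entries
        p1, _, code1 = heapq.heappop(heap)
        # prepend 0 to one branch and 1 to the other, merge into a block symbol
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

# Probabilities used in the comparison slide (x1..x7)
probs = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
         "x5": 0.04, "x6": 0.005, "x7": 0.005}
print(huffman_code(probs))
```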
Huffman encoding
• Result:
– Self Information or Entropy
• H(X) = 2.11 (The best possible average number of bits)
– Average number of bits per letter
$\bar{R} = \sum_{k=1}^{7} n_k P(x_k) = 2.21$, where $n_k$ = number of bits for symbol $x_k$
So the efficiency = $H(X) / \bar{R} \approx 95.5\%$
Huffman encoding
• Let's compare to a simple 3-bit code

Symbol | P(x_k) | Huffman code | n_k | 3-bit code | n_k
x1     | 0.35   | 00           | 2   | 000        | 3
x2     | 0.30   | 01           | 2   | 001        | 3
x3     | 0.20   | 10           | 2   | 010        | 3
x4     | 0.10   | 110          | 3   | 011        | 3
x5     | 0.04   | 1110         | 4   | 100        | 3
x6     | 0.005  | 11110        | 5   | 101        | 3
x7     | 0.005  | 11111        | 5   | 110        | 3

Huffman code: R = 2.21, efficiency ≈ 95%
3-bit code: R = 3, efficiency = 70%
Huffman encoding
• Another example
[Code tree figure: symbols x1–x8 merged into block symbols D0–D3; the resulting code words are:]
x1 → 00
x2 → 010
x3 → 011
x4 → 100
x5 → 101
x6 → 110
x7 → 1110
x8 → 1111
Huffman encoding
• Multi-symbol Block codes
– Use symbols made of original symbols
x1x1, x1x2, …, x7x7  (J = 2 gives L^2 = 49 block symbols)
Can show the new code's average bits per original letter, $\bar{R}_J / J$, satisfies:
$H(X) \le \frac{\bar{R}_J}{J} \le H(X) + \frac{1}{J}$
$\lim_{J \to \infty} \frac{\bar{R}_J}{J} = H(X)$
So a large enough block code gets you as close to H(X) as you want
Huffman encoding
• Let's consider a J = 2 block code example (a sketch follows below)
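A sketch of what such a J = 2 block code looks like for the same 7-letter DMS. It repeats the huffman_code() helper from the earlier sketch so it is self-contained, builds a Huffman code over all 49 letter pairs, and compares R_J/J with H(X):

```python
import heapq, itertools, math
from itertools import product

def huffman_code(probs):
    """Binary Huffman code for a dict symbol -> probability (same as the earlier sketch)."""
    counter = itertools.count()
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

letter_probs = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
                "x5": 0.04, "x6": 0.005, "x7": 0.005}

# J = 2: every ordered pair of letters becomes one block symbol (DMS => independent)
pair_probs = {(a, b): letter_probs[a] * letter_probs[b]
              for a, b in product(letter_probs, repeat=2)}

code = huffman_code(pair_probs)
R_J = sum(len(code[s]) * p for s, p in pair_probs.items())   # bits per block
H = -sum(p * math.log2(p) for p in letter_probs.values())    # entropy per letter

# Check H(X) <= R_J/J <= H(X) + 1/J for J = 2
print(f"H(X) = {H:.3f}  R_J/J = {R_J / 2:.3f}")
```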
Encoding Stationary Sources
• Now there are joint probabilities of blocks of symbols that depend on the previous symbols
$P(x_1 x_2 \cdots x_k) = P(x_1) P(x_2 | x_1) P(x_3 | x_1 x_2) \cdots P(x_k | x_1 \cdots x_{k-1})$
$\ne P(x_1) P(x_2) P(x_3) \cdots P(x_k)$   unless DMS
Can show the joint entropy is:
$H(X_1 X_2 \cdots X_k) = H(X_1) + H(X_2 | X_1) + \cdots + H(X_k | X_1 X_2 \cdots X_{k-1}) \le \sum_{m=1}^{k} H(X_m)$
Which means fewer bits are needed than with a symbol-by-symbol code
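A small sketch with a made-up two-symbol stationary Markov source (the transition probabilities are illustrative, not from the slides), showing the chain rule H(X1X2) = H(X1) + H(X2|X1) coming in below H(X1) + H(X2):

```python
import math

def H(ps):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Hypothetical 2-symbol Markov source; P(X1) is the stationary distribution
# of the transition probabilities P(X2 | X1), so the source is stationary.
p_x1 = {0: 2/3, 1: 1/3}
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1},    # P(X2 | X1 = 0)
                 1: {0: 0.2, 1: 0.8}}    # P(X2 | X1 = 1)

# Chain rule: H(X1, X2) = H(X1) + H(X2 | X1)
H_x1 = H(p_x1.values())
H_x2_given_x1 = sum(p_x1[a] * H(p_x2_given_x1[a].values()) for a in p_x1)
H_joint = H_x1 + H_x2_given_x1

# Marginal P(X2) and the independent-coding bound H(X1) + H(X2)
p_x2 = {b: sum(p_x1[a] * p_x2_given_x1[a][b] for a in p_x1) for b in (0, 1)}
print(H_joint, H_x1 + H(p_x2.values()))   # joint entropy <= sum of marginals
```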
Encoding Stationary Sources
• H(X | Y) is the conditional entropy
$H(X|Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i | y_j) P(y_j) \log_2 P(x_i | y_j)$
Here $P(x_i | y_j) P(y_j)$ is the joint (total) probability of $x_i$ and $y_j$, and $-\log_2 P(x_i | y_j)$ is the information in $x_i$ given $y_j$
Can show:
$H(X|Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(y_j | x_i) P(x_i) \log_2\left(\frac{P(y_j | x_i)\, P(x_i)}{P(y_j)}\right)$
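A sketch evaluating both expressions for H(X|Y) on a made-up binary example (the probabilities are illustrative, not from the slides):

```python
import math

# Hypothetical binary example: P(X) and the conditional P(Y | X)
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

# Marginal P(Y) and posterior P(X | Y) via Bayes' rule
p_y = {y: sum(p_x[x] * p_y_given_x[x][y] for x in p_x) for y in (0, 1)}
p_x_given_y = {y: {x: p_x[x] * p_y_given_x[x][y] / p_y[y] for x in p_x} for y in p_y}

# First form: H(X|Y) = -sum_ij P(x_i|y_j) P(y_j) log2 P(x_i|y_j)
H1 = -sum(p_x_given_y[y][x] * p_y[y] * math.log2(p_x_given_y[y][x])
          for x in p_x for y in p_y)

# Second form: H(X|Y) = -sum_ij P(y_j|x_i) P(x_i) log2( P(y_j|x_i) P(x_i) / P(y_j) )
H2 = -sum(p_y_given_x[x][y] * p_x[x] * math.log2(p_y_given_x[x][y] * p_x[x] / p_y[y])
          for x in p_x for y in p_y)

print(H1, H2)   # the two expressions agree
```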
Conditional Entropy
• Plotting this for n = m = 2 we see that when
Y depends strongly on X then H(X|Y) is low
[Figure: Conditional Entropy H(X|Y) vs P(X=0), with curves parameterized by P(Y=0|X=0) and P(Y=1|X=1) (values 0.05, 0.15, 0.35, 0.5, 0.75, 0.9999)]
Conditional Entropy
• To see how P(X|Y) and P(Y|X) relate consider:
• They are very similar when P(X=0) ≈ 0.5
[Figure: P(X=0|Y=0) vs P(X=0), with curves for P(Y=0|X=0) = 0.05, 0.15, 0.35, 0.5, 0.75, 0.95]
Optimal Codes for Stationary Sources
• Can show that for large blocks of symbols
Huffman encoding is efficient
Define:
$H_J(X) = \frac{1}{J} H(X_1 X_2 \cdots X_J)$
Then the Huffman code gives:
$H_J(X) \le \bar{R} \le H_J(X) + \frac{1}{J}$
Now if $J \to \infty$, get:
$H(X) \le \bar{R} \le H(X) + \epsilon$,  where $\epsilon \to 0$ as $J \to \infty$
i.e., Huffman is optimal
Lempel-Ziv-Welch Code
• Huffman encoding is efficient, but it needs the joint probabilities of large blocks of symbols
• Finding joint probabilities is hard
• LZW is independent of source statistics
• Is a universal source code algorithm
• Is not optimal
Lempel-Ziv-Welch
• Build a table from strings not already in the table
• Output table location for strings in the table
• Build the table again to decode
Input String = /WED/WE/WEE/WEB/WET

Characters Input | Code Output | New code value | New String
/                | /           | 256            | /W
W                | W           | 257            | WE
E                | E           | 258            | ED
D                | D           | 259            | D/
/W               | 256         | 260            | /WE
E                | E           | 261            | E/
/WE              | 260         | 262            | /WEE
E/               | 261         | 263            | E/W
WE               | 257         | 264            | WEB
B                | B           | 265            | B/
/WE              | 260         | 266            | /WET
T                | T           |                |
Source: http://dogma.net/markn/articles/lzw/lzw.htm
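A minimal Python sketch of the encoder described above; it reproduces the code sequence in the table (initializing the table with single characters at codes 0–255 is the usual convention and is assumed here):

```python
def lzw_encode(data):
    """LZW compression: emit a code each time the current string + next char is new."""
    table = {chr(i): i for i in range(256)}   # start with all single characters
    next_code = 256
    current = ""
    output = []
    for ch in data:
        if current + ch in table:
            current += ch                      # keep extending the match
        else:
            output.append(table[current])      # output code for the matched string
            table[current + ch] = next_code    # add the new string to the table
            next_code += 1
            current = ch
    output.append(table[current])              # flush the final match
    return output

codes = lzw_encode("/WED/WE/WEE/WEB/WET")
# Print codes < 256 as their character, as in the slide: / W E D 256 E 260 261 257 B 260 T
print(" ".join(chr(c) if c < 256 else str(c) for c in codes))
```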
Lempel-Ziv-Welch
• Decode
Input Codes: / W E D 256 E 260 261 257 B 260 T
Input/NEW_CODE | OLD_CODE | STRING/Output | CHARACTER | New table entry
/              |          | /             |           |
W              | /        | W             | W         | 256 = /W
E              | W        | E             | E         | 257 = WE
D              | E        | D             | D         | 258 = ED
256            | D        | /W            | /         | 259 = D/
E              | 256      | E             | E         | 260 = /WE
260            | E        | /WE           | /         | 261 = E/
261            | 260      | E/            | E         | 262 = /WEE
257            | 261      | WE            | W         | 263 = E/W
B              | 257      | B             | B         | 264 = WEB
260            | B        | /WE           | /         | 265 = B/
T              | 260      | T             | T         | 266 = /WET
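A matching sketch of the decoder, which rebuilds the table as it goes; it is fed the code sequence from the slide directly so it stands on its own:

```python
def lzw_decode(codes):
    """LZW decompression: rebuild the string table while decoding."""
    table = {i: chr(i) for i in range(256)}     # start with all single characters
    next_code = 256
    old = codes[0]
    result = table[old]
    for new in codes[1:]:
        # special case: the code may refer to an entry we are about to create
        entry = table[new] if new in table else table[old] + table[old][0]
        result += entry
        # new table entry = previous string + first character of the current string
        table[next_code] = table[old] + entry[0]
        next_code += 1
        old = new
    return result

# The code sequence from the slide: / W E D 256 E 260 261 257 B 260 T
codes = [ord(c) for c in "/WED"] + [256, ord("E"), 260, 261, 257, ord("B"), 260, ord("T")]
print(lzw_decode(codes))    # -> /WED/WE/WEE/WEB/WET
```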
Lempel-Ziv-Welch
• Typically it takes hundreds of table entries before compression occurs
• Some nasty patents make licensing an issue