
Canonical Huffman trees:
Goals: a scheme for large alphabets with
• Efficient decoding
• Efficient coding
• Economic use of main memory
A non-Huffman same-cost tree

symbol   frequency   Code 1   Code 2 (Huffman)   decimal
a        10          000      000                0
b        11          001      001                1
c        12          100      010                2
d        13          101      011                3
e        22          01       10                 4
f        23          11       11                 5
Code 1: lca(e,b) = 0
Code 2: lca(e,b) = the root
Code 2: successive integers (going down from the longest codes)
tree for code 2:
[Tree diagram for Code 2: leaves a, b, c, d at depth 3, labelled 0–3; leaves e, f at depth 2, labelled 4, 5]
Lemma: the #(nodes) in each level of a Huffman tree is even
Proof: a parent with a single child is impossible
General approach:
let the maximal length be L
let n_i = #(leaves) of length i, i = 1,…,L (some possibly zero)
allocate to length L the numbers 0,…,n_L − 1, in binary
(pad with zeros on the left to length L, as needed)
these occupy nodes 0,…,n_L/2 − 1 (n_L/2 nodes) on level L−1
allocate to length L−1 the numbers n_L/2,…,n_L/2 + n_{L−1} − 1
(pad with zeros on the left to length L−1, as needed)
now nodes 0,…,(n_L/2 + n_{L−1})/2 − 1 on level L−2 are occupied
and so on, down to length 1
Canonical Huffman Algorithm:
• compute the lengths of the codes and the number of symbols for each length (as for regular Huffman)

L = max length
first(L) = 0
for i = L-1 downto 1 {
    first(i) = ( first(i+1) + n_{i+1} ) / 2
    assign to the symbols of length i codes of this length, starting at first(i)
}
Q: What happens when there are no symbols of length i?
Q: Does first(L) = 0 < first(L-1) < … < first(1) always hold?
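A minimal Python sketch of this assignment (not the lecture's code): it assumes the code lengths were already computed by a regular Huffman construction; the names canonical_codes, lengths and first are illustrative.

def canonical_codes(lengths):
    """lengths: dict symbol -> code length; returns (dict symbol -> code string, first)."""
    L = max(lengths.values())
    n = [0] * (L + 2)                      # n[i] = #(symbols) with code length i
    for l in lengths.values():
        n[l] += 1
    first = [0] * (L + 2)
    first[L] = 0
    for i in range(L - 1, 0, -1):          # i = L-1 downto 1
        first[i] = (first[i + 1] + n[i + 1]) // 2   # always an integer (even-level lemma)
    nxt = list(first)                      # next free code value for each length
    codes = {}
    for s in sorted(lengths, key=lambda s: (lengths[s], s)):   # blocks ordered by length
        l = lengths[s]
        codes[s] = format(nxt[l], '0' + str(l) + 'b')
        nxt[l] += 1
    return codes, first

# The slides' example: a-d have length 3, e and f have length 2
print(canonical_codes({'a': 3, 'b': 3, 'c': 3, 'd': 3, 'e': 2, 'f': 2}))
# codes: a=000 b=001 c=010 d=011 e=10 f=11; first(3)=0, first(2)=2, first(1)=2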
Decoding:
(assume we start now on new symbol)
i=1;
v = nextbit(); // we have read the first bit
while v<first(i) { // small codes start at large numbers!
i++;
v=2*v + nextbit();
}
/* now, v is the code, of length i, of a symbol s;
   s is in position v − first(i) in the block of symbols with code length i (positions from 0) */
Decoding can be implemented by shift/compare
(very efficient)
Data structures for decoder:
• The array first(i)
• Arrays S(i) of the symbols with code length i,
ordered by their code
(v-first(i) is the index value to get the symbol for code v)
Thus, decoding uses efficient arithmetic operations + array look-up – more efficient than storing a tree and traversing pointers
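A Python sketch of such a decoder over these structures (next_bit, first and S are illustrative names for the bit source and the two arrays just described):

def decode_symbol(next_bit, first, S):
    """next_bit(): returns the next input bit (0/1); first[i]: first code value of
    length i; S[i]: the symbols with code length i, ordered by their code."""
    i = 1
    v = next_bit()
    while v < first[i]:            # short codes start at large numbers!
        i += 1
        v = 2 * v + next_bit()     # shift in the next bit
    return S[i][v - first[i]]      # position inside the block of length-i symbols

# Code 2 of the first example: decode the bit stream 10 000 -> e, a
bits = iter([1, 0, 0, 0, 0])
first = {1: 2, 2: 2, 3: 0}
S = {2: ['e', 'f'], 3: ['a', 'b', 'c', 'd']}
print(decode_symbol(lambda: next(bits), first, S))   # e
print(decode_symbol(lambda: next(bits), first, S))   # a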
What about coding (for large alphabets, where
symbols = words or blocks)?
The problem: millions of symbols → a large Huffman tree, …
Construction of canonical Huffman:
(sketch)
assumption: we have the symbol frequencies
Input: a sequence of (symbol, freq)
Output: a sequence of (symbol, length)
Idea: use an array to represent a heap for
creating the tree, and the resulting tree and
lengths
We illustrate by example
Example: frequencies 2, 8, 11, 12
(each cell with a freq. also contains a symbol – not shown)
[Array diagram: one array of 2n = 8 cells; cells 5–8 hold the frequencies 2, 12, 11, 8, and cells 1–4 hold a heap of pointers #5, #8, #7, #6 to them]
Now the reps of 2, 8 (the smallest) go out, the rest percolate
[Array diagram: the heap now holds only #7, #6]
The sum 10 is put into cell 4, and its rep into cell 3
Cell 4 is the parent (“sum”) of cells 5, 8
[Array diagram: heap = #7, #6, #3; cell 4 holds 10; cells 5 and 8 hold the parent pointer #4]
after one more step:
[Array diagram: heap = #6, #3; cell 3 holds 21; cells 4 and 7 hold #3; cell 6 holds 12; cells 5 and 8 hold #4]
Finally, a representation of the Huffman tree:
[Array diagram: cell 2 holds the total weight 33 (the root); cells 3–8 hold the parent pointers #2, #3, #4, #2, #3, #4]
Next, going over i = 2 to 8, assign lengths
[Array diagram, shown after i = 4: cells 2–4 hold the depths 0, 1, 2; cells 5–8 still hold #4, #2, #3, #4]
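A hedged Python sketch of the length computation; for brevity it uses a standard heapq of (freq, node) pairs with explicit parent pointers instead of the in-place 2n-cell array illustrated above, but it yields the same lengths (the symbols w–z are made up).

import heapq

def huffman_code_lengths(freqs):
    """freqs: list of (symbol, freq); returns dict symbol -> code length."""
    n = len(freqs)
    parent = {}                              # node id -> parent node id
    heap = [(f, i) for i, (_, f) in enumerate(freqs)]
    heapq.heapify(heap)
    nxt = n                                  # ids of internal ("sum") nodes
    while len(heap) > 1:
        f1, a = heapq.heappop(heap)          # the two smallest weights go out
        f2, b = heapq.heappop(heap)
        parent[a] = parent[b] = nxt
        heapq.heappush(heap, (f1 + f2, nxt)) # their sum re-enters the heap
        nxt += 1
    root = heap[0][1]
    depth = {root: 0}
    for node in sorted(parent, reverse=True):     # parents always have larger ids,
        depth[node] = depth[parent[node]] + 1     # so each parent's depth is known
    return {sym: depth[i] for i, (sym, _) in enumerate(freqs)}

print(huffman_code_lengths([('w', 2), ('x', 8), ('y', 11), ('z', 12)]))
# -> {'w': 3, 'x': 3, 'y': 2, 'z': 1}  (the frequencies 2, 8, 11, 12 of the example)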
Summary:
• Insertion of (symbol,freq) into array – O(n)
• Creation of heap – kn log n
• Creating tree from heap: each step is k log n, total is kn log n
• Computing lengths O(n)
• Storage requirements: 2n (compare to tree!)
Entropy H: a lower bound on compression
How can one still improve?
Huffman works for given frequencies, e.g., for the
English language –
static modeling
Plus: No need to store model in coder/decoder
But, one can construct a frequency table for each file – semi-static modeling
Minus:
– Need to store model in compressed file (negligible for
large files)
– Takes more time to compress
Plus: may provide for better compression
3rd option: start compressing with default freqs
As coding proceeds, update frequencies:
After reading a symbol:
– compress it
– update freq table*
Adaptive modeling
Decoding must use precisely the same algorithm for updating the freqs → it can follow the coding
Plus:
• Model need not be stored
• May provide compression that adapts to file,
including local changes of freqs
Minus: less efficient than the previous models
* May use a sliding window to better reflect local changes
Adaptive Huffman:
• Construction of Huffman after each symbol: O(n)
• Incremental adaptation in O(log n) is possible
Both are too expensive for practical use (large alphabets)
We illustrate adaptive modeling with arithmetic coding (soon)
Higher-order modeling: use of context
e.g.: for each block of 2 letters, construct a freq. table for the next letter (2nd-order compression)
(uses conditional probabilities – hence the improvement)
This also can be static/semi-static/adaptive
Arithmetic coding:
Can be static, semi-static, adaptive
Basic idea:
• Coder: start with the interval [0,1)
• 1st symbol selects a sub-interval, based on its
probability
• i’th symbol selects a sub-interval of (i-1)’th
interval, based on its probability
• When file ends, store a number in the final
interval
• Decoder: reads the number, reconstructs the
sequence of intervals, i.e. symbols
• Important: Length of file stored at beginning
of compressed file
(otherwise, decoder does not know when to stop)
Example: (static)
a ~ 3/4, b ~ 1/4
The file to be compressed: aaaba
The sequence of intervals (& symbols creating them):
[0,1), a → [0, 3/4), a → [0, 9/16), a → [0, 27/64), b → [81/256, 108/256), a → [324/1024, 405/1024)
Assuming this is the end, we store:
• 5 –length of file
• Any number in final interval, say 0.011 (3 digits)
(after first 3 a’s, one digit suffices!)
(for a large file, the length will be negligible)
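A small Python sketch (exact fractions, illustrative names) that reproduces the interval sequence of this example:

from fractions import Fraction

ranges = {'a': (Fraction(0), Fraction(3, 4)),     # a ~ 3/4
          'b': (Fraction(3, 4), Fraction(1))}     # b ~ 1/4

low, high = Fraction(0), Fraction(1)
for sym in "aaaba":
    lo, hi = ranges[sym]
    width = high - low
    low, high = low + lo * width, low + hi * width    # shrink to the sub-interval
    print(sym, low, high)
# a [0, 3/4), a [0, 9/16), a [0, 27/64),
# b [81/256, 27/64), a [81/256, 405/1024)   (27/64 = 108/256, 81/256 = 324/1024)
# 0.011 (binary) = 3/8 lies in the final interval, so it can be stored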
Why is it a good approach in general?
For a symbol with large probability, the # of binary digits needed to represent an occurrence is smaller than 1 → poor compression with Huffman (which must spend at least one bit per symbol)
But arithmetic coding represents such a symbol by a small shrinkage of the interval, hence the extra number of digits is smaller than 1!
Consider the example above, after aaa
Arithmetic coding – adaptive – an example:
The symbols: {a, b, c}
Initial frequencies: 1,1,1 (= initial accumulated freqs)
(0 is illegal, cannot code a symbol with probability 0!)
b: model passes to coder the triple (1, 2, 3):
– 1 : the accumulated freqs up to, not including, b
– 2 : the accumulated freqs up to, including, b
– 3 : the sum of freqs
Coder notes new interval [1/3, 2/3)
Model updates freqs to 1, 2, 1
c: model passes (3,4,4) (upper quarter)
Coder updates interval to [7/12,8/12)
Model updates freqs to (1,2,2)
And so on ….
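A Python sketch of the model/coder interaction just described (illustrative names; the model hands the coder the triple and then updates its counts, exactly as the decoder will):

from fractions import Fraction

freqs = {'a': 1, 'b': 1, 'c': 1}                  # initial frequencies (never 0!)
order = ['a', 'b', 'c']

def model_triple(sym):
    """(accumulated freq below sym, accumulated freq including sym, total freq)."""
    below = sum(freqs[s] for s in order[:order.index(sym)])
    return below, below + freqs[sym], sum(freqs.values())

def narrow(interval, triple):
    """Coder step: shrink the current interval according to the triple."""
    (low, high), (lo, hi, total) = interval, triple
    width = high - low
    return (low + Fraction(lo, total) * width, low + Fraction(hi, total) * width)

interval = (Fraction(0), Fraction(1))
for sym in "bc":
    interval = narrow(interval, model_triple(sym))
    freqs[sym] += 1                               # update AFTER coding; decoder mirrors this
    print(sym, interval)
# b -> (1/3, 2/3)      c -> (7/12, 2/3), i.e. [7/12, 8/12)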
Practical considerations:
• Interval ends are held as binary numbers
• # of bits in number to be stored proportional
to size of file – impractical to compute it all
before storing
solution: as the interval gets small, the first bit of any number in it becomes determined. This bit is written by the coder into the compressed file, and “removed” from the interval ends (= multiply by 2)
Example: in the 1st example, when the interval becomes [0, 27/64) ≈ [0.000000, 0.011011) (after 3 a’s), output 0 and update to [0.00000, 0.11011)
Decoder sees the 1st 0: knows the first three are a’s, computes the interval, “throws” the 0
• Practically, the (de)coder maintains a machine word for each number, so the computations are approximate
→ some (very small) loss of compression
Both sides must perform the same approximations at the “same time”
• What about the initial assignment of freq. 1 to low-freq. (unseen) symbols?
Solution: assign 1 to all symbols not seen so far (as a group);
if k symbols were not seen yet and one of them now occurs, give it 1/k
• Since coder does not know when to stop, file
length must be stored in compressed file
• Frequencies data structure: need to allow both updates and sums of the form Σ_{i≤k} f_i
(expensive for large alphabets)
Solution: a tree-like structure:

cell   binary   stored sum
1      1        f1
2      10       f1+f2
3      11       f3
4      100      f1+f2+f3+f4
5      101      f5
6      110      f5+f6
7      111      f7
8      1000     f1+…+f8

If k, the binary cell number, ends with i 0’s, the cell contains f_k + f_{k−1} + … + f_{k−2^i+1}
Q: What is the algorithm to compute Σ_{i≤k} f_i?
O(log n) accesses!
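A Python sketch of this structure (it is the binary indexed / Fenwick tree); both operations follow the trailing-zeros pattern of the table and touch O(log n) cells. Names are illustrative.

class CumFreqs:
    """Cell k stores f_k + f_{k-1} + … + f_{k-2^i+1}, where k ends with i binary 0's."""
    def __init__(self, n):
        self.cell = [0] * (n + 1)            # cells 1..n
    def update(self, k, delta):              # f_k += delta
        while k < len(self.cell):
            self.cell[k] += delta
            k += k & -k                      # move to the next cell covering k
    def prefix_sum(self, k):                 # sum of f_1 .. f_k
        s = 0
        while k > 0:
            s += self.cell[k]
            k -= k & -k                      # drop the lowest set bit
        return s

c = CumFreqs(8)
for i in range(1, 9):
    c.update(i, 1)                           # all 8 symbols start with frequency 1
print(c.prefix_sum(5))                       # 5, using O(log n) cell accesses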
Dictionary-based methods
Huffman is a dictionary-based method:
Each symbol in dictionary has associated code
But, adaptive Huffman is not practical
Famous adaptive methods:
LZ77, LZ78 (Lempel-Ziv)
We describe LZ77 (basis of gzip in Unix)
Basic idea: The dictionary is the set of sequences of symbols in a window before the current position
(typical window size: 2^12 – 2^14)
• When the coder is at position p, the window is the symbols in positions p−w,…,p−1
• The coder searches the window for the longest sequence that matches the one starting at position p
• If one is found, of length l, put (n, l) into the file (n = offset, l = length) and advance l positions; else output the current symbol
Example:
• input is: a b a a b a b…b (11 b’s)
• Code is : a b (2,1) (1,1) (3,2) (2,1) (1,10)
• Decoding: a → a, b → b, (2,1) → a, (1,1) → a,
current known string: a b a a
(3,2) → b a, (2,1) → b
current known string: a b a a b a b
(1, 10) → go back one step to the b;
do 10 times: output the scanned symbol, advance one
(note: run-length encoding hides here)
Note: decoding is extremely fast!
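A Python sketch of the decoder for this token stream; here literals and (offset, length) pairs are told apart by their Python type, whereas a real coder spends a flag bit (issue 3 on the next slide).

def lz77_decode(tokens):
    """tokens: literal symbols (strings) or (offset, length) pairs."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            offset, length = t
            for _ in range(length):          # copy one symbol at a time:
                out.append(out[-offset])     # this is what makes (1, 10) a run of b's
        else:
            out.append(t)
    return ''.join(out)

print(lz77_decode(['a', 'b', (2, 1), (1, 1), (3, 2), (2, 1), (1, 10)]))
# -> abaaba followed by 11 b's, the input of the example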
Practical issues:
1) Maintenance of window: use cyclic buffer
2) Searching for the longest matching sequence
→ expensive coding
3) How to distinguish a pair (n,l) from a symbol?
4) Can we save on the space for (n,l)?
The gzip solution for 2-4:
2: a hash table of 3-sequences, with lists of
positions where a sequence starting with
them exists (what about short matches?)
An option: limit the search in the list (save time)
Does not always find the longest match, but loss is very
small
3: one bit suffices (but see below)
4: offsets are integers in range [1,2^k], often
smaller values are more frequent
Semi-static solution: (gzip)
• Divide file into segments of 64k; for each:
• Find the offsets used and their frequencies
• Code using canonical Huffman
• Do same for lengths
• Actually, add the symbols (issue 3) to the set of lengths, code them together using one code, and put this code in the file before the offset code (why?)
One last issue (for all methods): synchronization
Assume you want to start decoding in mid-file?
E.g.: a db of files, coded using one code
• Bit-based addresses for the files – these addresses occur in many ILs, which are loaded into main memory;
32 bits/address is OK, 64 bits/address may be costly
• Byte/word-based addresses allow for much larger dbs. It may even pay to use addresses based on k-word blocks
how does one synchronize?
Solution: pad the last block with 01…1
(if the code exactly fills the last block, add a block)
Since the file addresses/lengths are known, the padding can be removed
Does this work for Huffman? Arithmetic? LZ77?
What is the cost?
Summary of file compression:
• Large db’s → compression helps reduce storage
• Fast query processing requires synchronization
and fast decoding
• Db is often given, so statistics can be collected –
semi-static is a viable option
(plus regular re-organization)
• Context-based methods give good compression,
but expensive decoding
word-based Huffman is recommended (semi-static)
• Construct two models: one for words, another for non-words
Compression of inverted lists
• Introduction
• Global, non-parametric methods
• Global parametric methods
• Local parametric methods
Introduction:
Important parameters (values for TREC, 1999):
• N – # of documents in the db: 741,856
• n – # of (distinct) words: 535,346
• F – # of word occurrences: 333,338,738
• f – # of inverted list entries: 134,994,414
Total size: 2 GB
The index contains: the lexicon (in main memory, if possible) and the ILs (on disk)
IL compression helps to reduce the size of the index and the cost of I/O
The IL for a term t contains f_t entries
An entry: d (= doc. id), {in-doc freq. f_{d,t}, in-doc position, …}
For ranked answers, the entry is usually (d, f_{d,t})
We consider each component separately – independent compressions, which can be composed
Compression of doc numbers:
A (sorted) sequence of numbers in [1..N]; how can it be compressed?
Most methods use gaps:
g1 = d1, g2 = d2 − d1, …
We know that
• Σ g_i ≤ N
• For long lists, most gaps are small.
These facts can be used for compression
(Each method has an associated probability distribution on the gaps, defined by the code lengths: l_i = log(1/p_i))
Global, non-parametric methods
Binary coding:
represent each gap by a fixed-length binary number
Code length for g: ⌈log N⌉
Probability: uniform distribution: p(g) = 1/N
Unary coding:
represent each g > 0 by g−1 digits 1, then 0
1 → 0, 2 → 10, 3 → 110, 4 → 1110, …
Code length for g: g
Σ g_i = d_{f_t} ≤ N, so the worst case for the total is N (hence, for all ILs: nN)
– is this a nice bound?
P(g) = 2^{−g}
Exponential decay; if this does not hold in practice → compression penalty
Gamma (γ) code:
a number g is represented by
• Prefix: unary code for 1 + ⌊log g⌋ (*)
• Suffix: binary code, with ⌊log g⌋ digits, for g − 2^⌊log g⌋
Examples:
1: 1 + ⌊log 1⌋ = 1, prefix is 0, suffix is empty, code is 0
7: 1 + ⌊log 7⌋ = 3, prefix is 110, suffix is 11, code is 11011
18: 1 + ⌊log 18⌋ = 5, prefix is 11110, suffix is 0010, code is 111100010
Code length for g: 1 + 2⌊log g⌋
Probability: p(g) = 2^{−(1 + 2⌊log g⌋)} ≈ 1/(2g²)
(* Why not ⌈log g⌉?)
Delta (δ) code:
prefix: represent 1 + ⌊log g⌋ in the gamma code
suffix: represent g − 2^⌊log g⌋ in binary (as in gamma)
Example:
7: 1 + ⌊log 7⌋ = 3, 1 + ⌊log 3⌋ = 2, prefix is 101, suffix is 11, code is 10111
Code length for g: 1 + 2⌊log(1 + ⌊log g⌋)⌋ + ⌊log g⌋ ≈ 1 + 2 log log g + log g
Probability: p(g) = 2^{−(1 + 2⌊log(1 + ⌊log g⌋)⌋ + ⌊log g⌋)} ≈ 1/(2g(log g)²)
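Hedged Python sketches of the three codes just defined (producing bit strings; the function names are made up). The printed codewords match the slides' examples.

def unary(g):                       # g-1 ones, then a zero
    return '1' * (g - 1) + '0'

def gamma(g):                       # prefix: unary(1 + floor(log g)); suffix: floor(log g) bits
    k = g.bit_length() - 1          # = floor(log2 g)
    suffix = format(g - (1 << k), '0' + str(k) + 'b') if k else ''
    return unary(k + 1) + suffix

def delta(g):                       # prefix: gamma(1 + floor(log g)); suffix as in gamma
    k = g.bit_length() - 1
    suffix = format(g - (1 << k), '0' + str(k) + 'b') if k else ''
    return gamma(k + 1) + suffix

print(unary(3), gamma(1), gamma(7), gamma(18), delta(7))
# 110  0  11011  111100010  10111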
Interim summary:
We have codes with the following probability distributions:
unary: 2^{−g}
gamma: 1/(2g²)
delta: 1/(2g(log g)²)
binary: uniform, 1/N (code length ⌈log N⌉)
Q: can you prove that the (exact) formulas for the probabilities of gamma and delta sum to 1?
Golomb code:
Semi-static, uses db statistics
→ a global, parametric code
1) Select a basis b (based on db statistics – later)
2) For g > 0 we represent g−1
• Prefix: let q = ⌊(g−1)/b⌋ (integer division); represent, in unary, q+1
• Suffix: the remainder is (g−1) − qb (in [0..b−1]); represent it by a binary tree code
– some leaves at distance ⌊log b⌋
– the others at distance ⌈log b⌉
The binary tree code:
let k = ⌈log b⌉, j = 2^k − b
cut 2j leaves from the full binary tree of depth k
assign the leaves, in order, to the values in [0..b−1]
Example: b = 6
[Tree diagram: k = 3, j = 2, so two leaves remain at depth 2 and four at depth 3; the leaves are assigned, in order, to the values 0–5]
Summary of Golomb code:
Prefix: unary code of 1 + ⌊(g−1)/b⌋
Suffix: code of length between ⌊log b⌋ and ⌈log b⌉
length ≈ 1 + ⌊(g−1)/b⌋ + log b
probability ≈ (1/(2b)) · (1/2)^⌊(g−1)/b⌋ ≈ (1/(2b)) · (2^{1/b})^{−(g−1)}
Exponential decay as with unary, but at a slower rate, controlled by b
Q: what is the underlying theory?
Q: how is b chosen?
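A Python sketch of the Golomb coder as just described; truncated_binary implements the "binary tree code" for the remainder, and the function names are illustrative.

def truncated_binary(r, b):
    """The binary tree code for a remainder r in [0..b-1]."""
    if b == 1:
        return ''                            # only one possible remainder
    k = (b - 1).bit_length()                 # = ceil(log2 b)
    j = (1 << k) - b                         # j leaves stay at depth k-1
    if r < j:
        return format(r, '0' + str(k - 1) + 'b')
    return format(r + j, '0' + str(k) + 'b')

def golomb(g, b):
    """Golomb code of a gap g >= 1 with basis b: unary(q+1), then the remainder code."""
    q, r = divmod(g - 1, b)
    return '1' * q + '0' + truncated_binary(r, b)

print([golomb(g, 6) for g in range(1, 8)])
# ['000', '001', '0100', '0101', '0110', '0111', '1000']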
Infinite Huffman trees:
Example: Consider p(i) = 2^{−i}, i = 1, 2, …
The code
(*) 0, 10, 110, 1110, …
seems natural, but the Huffman algorithm is not applicable! (why?)
For each m, consider the (finite) m-approximation
1/2, 1/4, …, 1/2^m, 1/2^m, where the last entry is 1/2^m = Σ_{i>m} 1/2^i
• each has a Huffman tree code: 0, 10, …, 1…10
• the code for m+1 refines that of m
• the sequence of codes converges to (*)
[Figure: the approximation trees, with leaf probabilities 1/2, 1/4, 1/8, 1/16, …
approximation 1, code words: 0, 1
approximation 2, code words: 0, 10, 11
approximation 3, code words: 0, 10, 110, 111
approximation 4, code words: 0, 10, 110, 1110, 1111]
A more general approximation scheme:
Given: the sequence a_1, a_2, …
An m-approximation with skip b is the finite sequence a_1, a_2, …, a_m, a'_{m+1}, …, a'_{m+b}, where
a'_{m+i} = Σ_{j≥0} a_{m+i+jb}, i = 1, …, b
for example, with b = 3: the last three entries a'_7, a'_8, a'_9 accumulate the approximated tail
Fact: refining the m-approximation by splitting a'_{m+1} into a_{m+1} and a'_{m+b+1} gives the (m+1)-approximation
A sequence of m-approximations is good if
(*) a_{m+1}, a'_{m+b+1} are the smallest in the sequence a_1, a_2, …, a_m, a_{m+1}, a'_{m+2}, …, a'_{m+b}, a'_{m+b+1},
so they are the 1st pair merged by Huffman
(why is this important?)
(*) depends on the a_i and on b
i 1
Let ai  (1  p )  p -- the Bernoulli distribution
A decreasing sequence a1  a2  ........
 am 1 is smallest among a1 ,..., am 1
am b1 is smallest among am  2 ,..., am b1
a1
am  2
am  b 1
am 1
to prove (*), need to show:
(i) am 1  am b
(ii) am b1  am
For which b do they hold?
am i   am i  jb   (1  p ) m  i  jb 1  p  (1  p ) m i 1  p   (1  p ) jb
j 0
j 0
 (1  p ) m i 1  p 
j 0
1
1  (1  p )b
Hence, (i)  (1  p ) m 11  p  (1  p ) m  b 1  p 
1
1  (1  p )b
 1  (1  p ) b 1  (1  p) b
And, (ii)  (1  p )
m  b 11
1
m 1
 p

(1

p
)
p
b
1  (1  p )
 (1  p )b  (1  p )b 1  1
together (1  p)b  (1  p)b1  1  (1  p)b1  (1  p)b
We select < on the right (useful later):
(**) (1−p)^b + (1−p)^{b+1} ≤ 1 < (1−p)^{b−1} + (1−p)^b
has a unique solution
To solve, from the left side we obtain
(1−p)^b · [1 + (1−p)] ≤ 1 ⇔ (1−p)^b ≤ 1/(2−p)
⇔ b · log(1−p) ≤ −log(2−p) ⇔ b ≥ log(2−p) / (−log(1−p))
Hence the solution is (b is an integer):
b = ⌈ log(2−p) / (−log(1−p)) ⌉
Next: what do these Huffman trees look like?
Start with the 0-approximation a'_1, …, a'_b
Facts:
1. It is a decreasing sequence (so the last two are the smallest)
2. a'_{b−1} + a'_b ≥ a'_1 (when b > 3)
follows from a'_i = p · (1−p)^{i−1} / (1 − (1−p)^b) and (**)
3. The previous two properties are preserved when the last two entries are replaced by their sum
4. The Huffman tree for the sequence assigns codes of lengths ⌊log b⌋ or ⌈log b⌉ – of the same cost as the Golomb code for the remainders
Proof: induction on b
Now, expand the approximations to obtain the infinite tree:
To obtain the code for a_{qb+j}, start from a'_j and split q times
[Diagram: a'_j splits into a_j (bit 0) and a'_{j+b} (bit 1); a'_{j+b} splits into a_{j+b} and a'_{j+2b}; a'_{j+2b} into a_{j+2b} and a'_{j+3b}; and so on]
The code for a_{qb+j} is that of a'_j followed by q 1's, then 0
This is the Golomb code (with the places of prefix/suffix exchanged)!!
Last question:
where do we get p, and why Bernoulli?
Assume an equal probability p for t to be in d
For a given t, the probability of a gap g from one doc to the next is then (1−p)^{g−1} · p
For p: there are f pairs (t, d), so estimate p by p ≈ f / (nN)
Since N is large, this is a reasonable estimate
For TREC:
p ≈ 135·10^6 / (500·10^3 × 750·10^6) ≈ 0.00036
To estimate b = ⌈ log(2−p) / (−log(1−p)) ⌉ for a small p:
log(2−p) ≈ log 2, log(1−p) ≈ −p
⇒ b ≈ (log 2)/p ≈ 0.69 · nN/f = 1917
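A quick numeric check of this estimate in Python, using the slide's rounded figures; the exact formula for b gives a slightly larger value than the 0.69/p shortcut.

from math import log, ceil

p = 135e6 / (500e3 * 750e6)                  # ~ 0.00036
b_exact = ceil(log(2 - p) / -log(1 - p))     # full formula: 1925
b_approx = round(0.69 / p)                   # the slide's approximation: 1917
print(p, b_exact, b_approx)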
End of (global) Golomb
Global observed frequency:
(a global method)
• Construct all the ILs and collect statistics on the frequencies of gaps
• Construct a canonical Huffman tree for the gaps
The model/tree needs to be stored
(gaps are in [1..N]; for TREC this is up to ~3/4 M gap values → the storage overhead may not be so large)
Practically, not far from gamma and delta, but local methods are better
Local (parametric) methods:
Coding of IL(t) based on statistics of IL(t)
Local observed frequency:
Construct canonical Huffman for IL(t) based on
its gap frequencies
Problem: in small ILs the # of distinct gaps is close to the # of gaps
→ the size of the model is close to the size of the compressed data
Example: 25 entries, 15 gap values
Model: 15 gaps, 15 lengths (or freqs)
Way out: construct model for groups of IL’s
(see book for details)
Local Bernoulli/Golomb:
Assumption: f_t – the # of entries of IL(t) – is known (to both coder & decoder)
take p ≈ f_t / N, estimate b & construct the Golomb code
Note:
Large f_t → larger p → smaller b → the code gets close to unary (reasonable: many small gaps)
Small f_t → large b → most of the coding costs ~log b bits
For example: f_t = 2 (one gap) → b ≈ 0.69·N;
for a gap < 0.69·N, code it in ⌈log(0.69·N)⌉ bits; for a larger gap, one more bit
Interpolative coding:
Uses original d’s , not g’s
Let f = f_t; assume the d's are stored in an array L[0,…,f−1]
(each entry is at most N)
• Standard binary for middle d, with # of bits
determined by its range
• Continue as in binary search; each d in binary,
with # of bits determined by modified range
Example: L[3,8,9,11,12,13,18] (f=7) N=20
• H ← 7 div 2 = 3
L[3] = 11 (4’th d)
smallest d is 1, and there are 3 to left of L[3]
largest d is 20, there are 3 to right of L[3]
size of interval is (20-3)-(1+3)=17-4=13
 code 11 in 4 bits
• For sub-list left of 11: 3, 8, 9
h ← 3 div 2 = 1
L[1] = 8
bounds: lower: 1+1 = 2; upper = 10-1=9
code using 3 bits
• For L[2] = 9, range is [9..10], use 2 bits
• For sub-list right of 11 – do on board
(note the element that is coded in 0 bits!)
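A recursive Python sketch of interpolative coding; it records each document number with the number of bits (⌈log₂(range size)⌉) it would be coded in, rather than emitting actual bits. (With this count L[2] = 9 needs 1 bit; the lecture's example counts 2 there, so treat the exact bit count as one possible variant.)

from math import ceil, log2

def bits_needed(lo, hi):
    """# of bits to code a value known to lie in [lo..hi]."""
    return ceil(log2(hi - lo + 1)) if hi > lo else 0

def interpolative(docs, lo, hi, out):
    """Recursively code the sorted doc numbers in docs, all known to lie in [lo..hi]."""
    if not docs:
        return
    h = len(docs) // 2
    d = docs[h]
    left = lo + h                         # h doc numbers lie to the left of d
    right = hi - (len(docs) - h - 1)      # and len(docs)-h-1 lie to its right
    out.append((d, bits_needed(left, right)))
    interpolative(docs[:h], lo, d - 1, out)
    interpolative(docs[h + 1:], d + 1, hi, out)

out = []
interpolative([3, 8, 9, 11, 12, 13, 18], 1, 20, out)
print(out)
# [(11, 4), (8, 3), (3, 3), (9, 1), (13, 3), (12, 0), (18, 3)]  -- note the 0-bit entry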
Advantages:
• Relatively easy to code, decode
• Very efficient for clusters (a word that occurs in
many documents close to each other)
Disadvantage: more complex to implement,
requires a stack
And cost of decoding is a bit more than Golomb
Summary of methods: show Table 3.8
An entry in IL(t) also contains f_{d,t} – the frequency of t in d
Compression of f_{d,t}:
In TREC, F/f ≈ 2.7 → these are small numbers
Unary: the total overhead is Σ f_{d,t} = F
cost per entry is F/f (for TREC: 2.7)
Gamma: shorter than unary, except for 2 and 4
(for TREC: ~2.13)
It does not pay, in complexity, to choose another code
Total cost of compression of an IL: 8–9 bits/entry