Transcript Lecture 15

Functional Programming
Lecture 15 Case Study:
Huffman Codes
The Problem
Design a coding/decoding scheme and implement it in
Haskell.
This requires:
- an algorithm to encode a message,
- an algorithm to decode a message,
- an implementation.
Fixed and Variable Length Codes
A fixed length code assigns the same number of bits to
each code word.
E.g. ASCII letter -> 7 bits (up to 128 code words)
So to encode the string “at” we need 14 bits.
A variable length code may assign different numbers of
bits to different code words, depending on how frequently
each code word occurs. Frequent words are assigned short
codes; infrequent words are assigned long codes.
E.g. a tree to encode and decode the letters a, b and t:
0 for go left
1 for go right
“at” is encoded by 011
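Coding
[Figure: code tree. From the root, 0 leads to leaf a and 1 leads to an internal node; from that node, 0 leads to leaf b and 1 leads to leaf t.]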
a is encoded by 1 bit, 0
b is encoded by 2 bits, 10
t is encoded by 2 bits, 11
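The saving over a fixed length code can be checked with a little arithmetic. The sketch below is only an illustration: the names codeLength, fixedBits and variableBits are made up here, and the code lengths are the three listed above.

```haskell
-- Code lengths (in bits) for the example code: a = 0, b = 10, t = 11.
-- codeLength is a hypothetical helper, defined only for these three letters.
codeLength :: Char -> Int
codeLength 'a' = 1
codeLength 'b' = 2
codeLength 't' = 2
codeLength c   = error ("no code for " ++ show c)

-- Bits needed by a fixed length code of the given width (7 for ASCII).
fixedBits :: Int -> String -> Int
fixedBits width msg = width * length msg

-- Bits needed by the variable length code above.
variableBits :: String -> Int
variableBits = sum . map codeLength
```

For the string "at", fixedBits 7 "at" gives 14 and variableBits "at" gives 3, matching the figures quoted earlier.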
An important property of a Huffman code is that the
codes are prefix codes: no code of a letter (code word)
is the prefix of the code of another letter (code word).
E.g. 0 is not a prefix of 10 or 11
10 is not a prefix of 0 or 11
11 is not a prefix of 0 or 10
So, “aa” is encoded by 00.
“ba” is encoded by 100.
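The prefix property can be tested mechanically. A minimal sketch, assuming code words are written as strings of '0' and '1'; isPrefixFree is a name invented for this illustration.

```haskell
import Data.List (isPrefixOf)

-- True if no code word is a prefix of a different code word.
-- Words are paired with their positions so that a word is never
-- compared with itself.
isPrefixFree :: [String] -> Bool
isPrefixFree codes =
  and [ not (x `isPrefixOf` y)
      | (i, x) <- indexed, (j, y) <- indexed, i /= j ]
  where
    indexed = zip [0 :: Int ..] codes
```

isPrefixFree ["0", "10", "11"] is True for the code above, while a set such as ["0", "01", "11"] fails the test because 0 is a prefix of 01.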
Decoding
[Figure: the same code tree, with a at 0, b at 10 and t at 11.]
The encoded message 1001111011 is decoded as:
10 - b
0 - a
11 - t
11 - t
0 - a
11 - t
giving the message “battat”.
In view of the frequency of t, this is probably not a
good code. t should be encoded by 1 bit!
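The remark about t can be made concrete by counting. The sketch below assumes the decoded message is "battat" and compares the code from the slides with a hypothetical alternative (t = 0, a = 10, b = 11) that gives t the 1-bit code; all names are illustrative.

```haskell
import Data.List (group, sort)

-- Count how often each character occurs in a message.
frequencies :: String -> [(Char, Int)]
frequencies msg = [ (c, length g) | g@(c:_) <- group (sort msg) ]

-- Total bits for a message, given the code length of each character.
-- A character missing from the list is an error in this sketch.
totalBits :: [(Char, Int)] -> String -> Int
totalBits lengths = sum . map len
  where
    len c = maybe (error ("no code for " ++ show c)) id (lookup c lengths)

-- Code from the slides: a = 0, b = 10, t = 11.
lectureLengths :: [(Char, Int)]
lectureLengths = [('a', 1), ('b', 2), ('t', 2)]

-- Hypothetical alternative: t = 0, a = 10, b = 11.
alternativeLengths :: [(Char, Int)]
alternativeLengths = [('t', 1), ('a', 2), ('b', 2)]
```

frequencies "battat" shows t occurring three times, a twice and b once. The slide code needs 10 bits for this message; the alternative needs 9.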
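ps. Morse code is also a variable length code (frequent letters such as e get short codes), but it is not a prefix code and so not a Huffman code: it relies on pauses between letters.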
A Haskell Implementation
Types
-- codes --
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
-- Huffman coding tree --
-- characters at leaf nodes, plus frequencies --
-- frequencies as well at internal nodes --
data Tree = Leaf Char Int | Node Int Tree Tree
Assume that codes are kept in a table (rather than read
off a tree).
-- table of codes --
type Table = [(Char, Hcode)]
Encoding
-- encode a message according to code table --
-- encode each character and concatenate --
codeMessage :: Table -> [Char] -> Hcode
codeMessage tbl = concat . map (lookupTable tbl)
-- lookup the code for a character in code table --
lookupTable :: Table -> Char -> Hcode
lookupTable [] c = error "lookupTable"
lookupTable ((ch,code):tbl) c
  | ch == c = code
  | otherwise = lookupTable tbl c
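To try the encoder, a table is needed. The sketch below writes one out by hand for the example code (a = 0, b = 10, t = 11, with 0 as L and 1 as R); exampleTable is a name assumed for this illustration, and the other definitions are repeated so that the block loads on its own.

```haskell
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
type Table = [(Char, Hcode)]

-- Encode each character and concatenate the results.
codeMessage :: Table -> [Char] -> Hcode
codeMessage tbl = concat . map (lookupTable tbl)

-- Look up the code for a character; fails if it is not in the table.
lookupTable :: Table -> Char -> Hcode
lookupTable [] c = error "lookupTable"
lookupTable ((ch, code) : tbl) c
  | ch == c   = code
  | otherwise = lookupTable tbl c

-- Hand-written table for the example code (an assumption of this sketch).
exampleTable :: Table
exampleTable = [('a', [L]), ('b', [R, L]), ('t', [R, R])]
```

codeMessage exampleTable "at" gives [L,R,R], i.e. 011, and codeMessage exampleTable "battat" gives [R,L,L,R,R,R,R,L,R,R], i.e. 1001111011.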
Decoding
-- decode a message according to code tree
-- if at a leaf node, then character is decoded,
-- start again at root
-- if at an internal node, then follow sub-tree
-- according to next code bit
-- if no code bits are left, then stop
decode :: Tree -> Hcode -> [Char]
decode tr = decodetree tr
  where
    decodetree (Node f t1 t2) (L:rest)
      = decodetree t1 rest
    decodetree (Node f t1 t2) (R:rest)
      = decodetree t2 rest
    decodetree (Leaf ch f) rest
      = ch:(decodetree tr rest)
    decodetree t []
      = []
Example
codetree = Node 3 (Leaf 'a' 0)
                  (Node 3 (Leaf 'b' 1) (Leaf 't' 2))
-- assume 'a' is most frequent, denoted by smallest --
-- number --
message = [R,L,L,R,R,R,R,L,R,R]
decode codetree message
=> decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..))
       (R:[L,L,R,R,R,R,L,R,R])
=> decodetree (Node 3 (Leaf 'b' 1) (Leaf 't' 2))
       (L:[L,R,R,R,R,L,R,R])
=> decodetree (Leaf 'b' 1) (L:[R,R,R,R,L,R,R])
=> 'b' : decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..))
       (L:[R,R,R,R,L,R,R])
=> 'b' : decodetree (Leaf 'a' 0) [R,R,R,R,L,R,R]
=> 'b' : 'a' : decodetree (Node 3 (Leaf 'a' 0) (Node 3 ..))
       [R,R,R,R,L,R,R]
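The rest of the evaluation can be left to the machine. The sketch below repeats the definitions so that it loads on its own; it includes a clause for the empty bit list so that decoding stops when the input runs out. Trailing bits that end at an internal node are silently dropped, which is a choice made for this sketch.

```haskell
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
data Tree = Leaf Char Int | Node Int Tree Tree

-- Follow the bits down the tree; at a leaf, emit the character and
-- restart from the root tr. Assumes tr is not a single leaf
-- (otherwise the leaf clause would never terminate).
decode :: Tree -> Hcode -> [Char]
decode tr = decodetree tr
  where
    decodetree (Node _ t1 _) (L : rest) = decodetree t1 rest
    decodetree (Node _ _ t2) (R : rest) = decodetree t2 rest
    decodetree (Leaf ch _)   rest       = ch : decodetree tr rest
    decodetree _             []         = []

codetree :: Tree
codetree = Node 3 (Leaf 'a' 0)
                  (Node 3 (Leaf 'b' 1) (Leaf 't' 2))

message :: Hcode
message = [R, L, L, R, R, R, R, L, R, R]
```

Evaluating decode codetree message gives "battat", the same string obtained by hand in the Decoding example.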
We still have to make:
- the code tree,
- the code table.
(Next lecture!)
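As a preview of the second item, a table can be read off a tree by recording the path to each leaf. This is only a minimal sketch and not necessarily the construction used in the next lecture; tableFromTree and exampleTree are names invented here.

```haskell
data Bit = L | R deriving (Eq, Show)
type Hcode = [Bit]
data Tree = Leaf Char Int | Node Int Tree Tree
type Table = [(Char, Hcode)]

-- Walk the tree, recording the path (L = left, R = right) to each leaf.
-- The path is built in reverse and flipped when a leaf is reached.
tableFromTree :: Tree -> Table
tableFromTree = go []
  where
    go path (Leaf ch _)    = [(ch, reverse path)]
    go path (Node _ t1 t2) = go (L : path) t1 ++ go (R : path) t2

-- The three-letter example tree.
exampleTree :: Tree
exampleTree = Node 3 (Leaf 'a' 0)
                     (Node 3 (Leaf 'b' 1) (Leaf 't' 2))
```

tableFromTree exampleTree gives [('a',[L]),('b',[R,L]),('t',[R,R])], which agrees with the codes 0, 10 and 11.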