Applications of Trees

Download Report

Transcript Applications of Trees

Trees (Ch. 9.2)
Longin Jan Latecki
Temple University
based on slides by
Simon Langley and
Shang-Hua Teng
Basic Data Structures - Trees

Informal: a tree is a
structure that looks
like a real tree (upside-down)

Formal: a tree is a
connected graph
with no cycles.
Trees - Terminology
size=7
subtree
value
x
b
c
root
e
d
m
a
nodes
leaf
Every node must have its value(s)
Non-leaf node has subtree(s)
Non-root node has a single parent node
A parent may have 1 or more children
height=2
Types of Tree
Binary Tree
Each node has at most 2 sub-trees
m-ary Trees
Each node has at most m sub-trees
Binary Search Trees
A binary search tree:
 … is a binary tree.
 if a node has value N, all values in its
left sub-tree are less than N, and all
values in its right sub-tree are greater
than N.
This is a binary search tree
This is NOT a binary search tree
5
4
3
7
2
8
9
Searching a binary search tree
search(t, s) {
If(s == label(t))
return t;
If(t is leaf) return null
If(s < label(t))
search(t’s left tree, s)
else
search(t’s right tree, s)}
Time per level
O(1)
O(1)
h
Total O(h)
Searching a binary search tree
search( t, s )
{ while(t != null)
{ if(s == label(t)) return t;
if(s < label(t)
t = leftSubTree(t);
else
t = rightSubTree(t);
}
return null;
Time per level
O(1)
O(1)
h
Total O(h)

Here’s another function that does the
same (we search for label s):
TreeSearch(t, s)
while (t != NULL and
if (s < label[t])
t = left[t];
else
t = right[t];
return t;
s != label[t])
Insertion in a binary search tree:
we need to search before we insert
Insert 6
6
Insert 11
11
5
6 11
3
2
always insert to a leaf
8
6
4
7
6
9
11
Time complexity ? O(height_of_tree)
O(log n) if it is balanced
n = size of the tree
Insertion
insertInOrder(t, s)
{ if(t is an empty tree) // insert here
return a new tree node with value s
else if( s < label(t))
t.left = insertInOrder(t.left, s )
else
t.right = insertInOrder(t.right, s)
return t }
Try it!!

Build binary search trees for the
following input sequences
• 7, 4, 2, 6, 1, 3, 5, 7
• 7, 1, 2, 3, 4, 5, 6, 7
• 7, 4, 2, 1, 7, 3, 6, 5
• 1, 2, 3, 4, 5, 6, 7, 8
• 8, 7, 6, 5, 4, 3, 2, 1
Comparison –
Insertion in an ordered list
insertInOrder(list, s) {
loop1: search from beginning of list, look for an item >= s
loop2: shift remaining list to its right, start from the end of list
insert s
}
Insert 6
6
2
6
3
6
4
6
5
6
7
6
Time complexity?
8
7
9
8
O(n) n = size of the list
9
Data Compression

Suppose we have 3GB character data file that we wish
to include in an email.
Suppose file only contains 26 letters {a,…,z}.
Suppose each letter a in {a,…,z} occurs with frequency
fa.
Suppose we encode each letter by a binary code
If we use a fixed length code, we need 5 bits for each
character
The resulting message length is 5 f a  f b   f z 

Can we do better?





Data Compression: A Smaller Example

Suppose the file only has 6 letters
{a,b,c,d,e,f} with frequencies
a
b
c
d
e
f
.45 .13 .12 .16 .09 .05
000 001 010 011 100 101
0 101 100 111 1101 1100


Fixed length
Variable length
Fixed length 3G=3000000000 bits
Variable length
.45 1  .13  3  .12  3  .16  3  .09  4  .05  4  2.24G
How to decode?

At first it is not obvious how
decoding will happen, but this is
possible if we use prefix codes
Prefix Codes



No encoding of a
character can be the prefix
of the longer encoding of
another character:
We could not encode t as
01 and x as 01101 since
01 is a prefix of 01101
By using a binary tree
representation we
generate prefix codes with
letters as leaves
0
1
0
1
e
a
1
0
0
t
n
1
s
Prefix codes allow easy decoding
Decode:
0
1
0
11111011100
1
e
s 1011100
a
1
0
0
t
n
1
s
sa 11100
san 0
sane
Prefix codes

A message can be decoded uniquely.

Following the tree until it reaches to a
leaf, and then repeat!

Draw a few more trees and produce
the codes!!!
Some Properties




Prefix codes allow easy decoding
An optimal code must be a full binary tree (a
tree where every internal node has two
children)
For C leaves there are C-1 internal nodes
The number of bits to encode a file is
B(T )   f (c)  length T c 
cC
where f(c) is the freq of c, lengthT(c) is the tree depth of c,
which corresponds to the code length of c
Optimal Prefix Coding Problem

Input: Given a set of n letters (c1,…, cn)
with frequencies (f1,…, fn).

Construct a full binary tree T to define a
prefix code that minimizes the average
code length
Average( T )  i 1 fi  length T ci 
n
Greedy Algorithms

Many optimization problems can be solved
using a greedy approach
• The basic principle is that local optimal decisions
•
•

may be used to build an optimal solution
But the greedy approach may not always lead to an
optimal solution overall for all problems
The key is knowing which problems will work with
this approach and which will not
We study
• The problem of generating Huffman codes
Greedy algorithms

A greedy algorithm always makes the choice
that looks best at the moment
• My everyday examples:
• Driving in Los Angeles, NY, or Boston for that matter
• Playing cards
• Invest on stocks
• Choose a university
• The hope: a locally optimal choice will lead to a
•

globally optimal solution
For some problems, it works
Greedy algorithms tend to be easier to code
David Huffman’s idea

A Term paper at MIT

Build the tree (code) bottom-up in a
greedy fashion
Each tree has a weight in its root and symbols as its leaves.
We start with a forest of one vertex trees representing
the input symbols.
We recursively merge two trees whose sum of weights
is minimal until we have only one tree.
Building the Encoding Tree
Building the Encoding Tree
Building the Encoding Tree
Building the Encoding Tree
Building the Encoding Tree
Building the Encoding Tree
Building the Encoding Tree
Building the Encoding Tree