Transcript Tree

CS121 Data Structures
Trees
Each entry in a List (Stack, Queue) has at most one
predecessor and one successor.
In a Tree each entry has at most one predecessor, but can
have more than one successor.
CS121 © JAS 2005
2-1
CS121 Data Structures
In general trees can have any number of successors, but
usually we restrict ourselves to a maximum of two
successors - BINARY TREES.
A tree is either
• Empty, or
• Consists of a value and two sub-trees (left and right)
CS121 © JAS 2005
2-2
CS121 Data Structures
For each non-empty tree there is a unique node with no
predecessor – the root node
Nodes which have no successors are termed leaf nodes
All other nodes have exactly one predecessor and either one
or two successors – internal nodes
CS121 © JAS 2005
2-3
CS121 Data Structures
Various subsets of trees can be identified –
• Binary Trees (maximum of two successors)
• Strictly Binary Trees (nodes have either 0 or 2 successors)
• Full Binary Trees (all leaves at same depth)
• Complete Binary Trees (all levels except last one are full,
and last level is filled from left)
CS121 © JAS 2005
2-4
CS121 Data Structures
Various characteristics of trees can be measured
• Number of nodes - Nodes
• Number of leaves - Leaves
• Depth (longest path from root to leaf) – Depth
Nodes(EmptyTree) = 0
Nodes(Tree) = 1 + Nodes(Tree.Left) + Nodes(Tree.Right)
CS121 © JAS 2005
2-5
CS121 Data Structures
Leaves(EmptyTree) = 0
Leaves(RootOnly) = 1
Leaves(Tree) = Leaves(Tree.Left) + Leaves(Tree.Right)
Depth(Tree) = 1 + maximum(Depth(Tree.Left), Depth(Tree.Right))
Depth(EmptyTree) = – 1
CS121 © JAS 2005
2-6
CS121 Data Structures
Example Application – Representing an Arithmetic Expression
Nodes represent operators
Leaves represent operands
( b ^ 2 – 4 * a * c ) / ( 2 * a )
/
–
*
^
b
*
2
*
4
2
a
c
a
CS121 © JAS 2005
2-7
CS121 Data Structures
Traversing a tree (visit all nodes in a tree)
• Left, Root, Right
• Left, Right, Root
• Root, Left, Right
- Inorder or Symmetric
- Postorder
- Preorder or Depth-First
• Level by Level
- Breadth-First
Each of these traversals can be reversed
CS121 © JAS 2005
2-8
CS121 Data Structures
Defining a tree by operations/methods
Create a tree
either an empty a tree
or, a tree consisting solely of a root
IsEmpty – determine if a tree is empty
CS121 © JAS 2005
2-9
CS121 Data Structures
Accessing/Moving round a tree
GetData – view data in a node
GetLeft – move to/access left sub-tree
GetRight – move to/access right sub-tree
Adding to a tree
can be application dependent
simplest methods add a sub-tree to an empty left (or right)
subtree
Adding elsewhere involves reshaping the tree
CS121 © JAS 2005
2-10
CS121 Data Structures
Defining a Tree Class
public class BTNode
{ private Object data;
private BTNode left, right;
public BTNode(Object initialData,
BTNode initialLeft,
BTNode initialRight)
{ data = initialData;
left = initialLeft;
right = initialRight;
}
CS121 © JAS 2005
2-11
CS121 Data Structures
public Object getData()
{ return data; }
public BTNode getLeft()
{ return left; }
public BTNode getRight()
{ return right; }
CS121 © JAS 2005
2-12
CS121 Data Structures
public Object getLeftmostData()
{ if (left == null)
return data;
else
return left.getLeftmostData();
}
public Object getRightmostData()
{
………
}
CS121 © JAS 2005
2-13
CS121 Data Structures
public void inorderPrint()
{ if (left != null)
left.inorderPrint();
System.out.println(data);
if (right != null)
right.inorderPrint();
}
CS121 © JAS 2005
2-14
CS121 Data Structures
public BTNode removeLeftmost()
{ if (left == null)
return right; //leftmost node is root
else
{ left = left.removeLeftmost();
return this;
}
}
CS121 © JAS 2005
2-15
CS121 Data Structures
public void setData(Object newData)
{ data = newData; }
public void setleft(BTNode newLeft)
{ left = newLeft; }
CS121 © JAS 2005
2-16
CS121 Data Structures
public static BTNode treeCopy(BTNode src)
{ BTNode leftCopy, RightCopy;
if (src == null)
return null;
else
{ leftCopy = treeCopy(src,left);
rightCopy = treeCopy(src.right);
return new
BTNode(src.data,leftCopy,rightCopy);
}
}
CS121 © JAS 2005
2-17
CS121 Data Structures
Binary Search Trees
In a binary search tree, a node N with a key K is inserted so
that
• the keys in the left subtree of N are less than K, and
• the keys in the right subtree of N are greater than K.
To summarise, for each node N in a binary search tree:
Keys in left subtree of N
<
Key K in node N
<
Keys in right subtree of N.
CS121 © JAS 2005
2-18
CS121 Data Structures
public BTNode addEntry(Object newData)
{ if (this == null)
return (new BTNode(newData,null,null));
else
{ if newData.lessthan(data)
left = left.addEntry(newData);
else
right = right.addEntry(newData);
}
return this;
}
CS121 © JAS 2005
2-19
CS121 Data Structures
public boolean findEntry(Object wanted)
{ if this == null
return false;
else
if wanted.lessthan(data)
return left.findEntry(wanted);
else
if wanted.greaterthan(data)
return right.findEntry(wanted);
else //assume equal to data
return true;
}
CS121 © JAS 2005
2-20
CS121 Data Structures
Deleting From a Binary Search Tree
• If node to be deleted is a leaf – no problem, just delete
• If node has only one subtree – replace node with child
• If node has two subtrees
– Determine larger sub-tree (left/right)
– Select least/greatest node of larger sub-tree
– Move that node up
– This will always be a leaf node or a node with only one
sub-tree so process ends
CS121 © JAS 2005
2-21
CS121 Data Structures
h
d
e
f
b
a
l
c e
j
g i
CS121 © JAS 2005
n
k m
o
2-22
CS121 Data Structures
Binary Tree Application - Huffman Codes
Aim is to encode more frequent letters with shorter bit sequences
Algorithm
• Count frequencies of letters
• Arrange into ascending order (linked list or binary search tree?)
• Using two least frequent letters construct binary tree such that
– Left leaf is least frequent letter; label left branch 1
– Right leaf is next least frequent letter; label right branch 0
– Root is given value = sum of leafs
• Replace first two entries in frequency list with entry representing
sum (ensure list remains sorted)
• Repeat
2-23
CS121 © JAS 2005
CS121 Data Structures
Encoding a string
Replace each letter with bit code derived from path from
root of tree to leaf containing that letter
Decoding a string
Read binary code such that
1
– descend left
0
– descend right
at leaf – output letter
CS121 © JAS 2005
2-24
CS121 Data Structures
This is a simple message
0
1
1
1
m
0
1
g
0 1
h l
0
1
0
s
mgh 4
lpat 5
ei 6
smgh 9
1
1
e
0
0 1
p a
0
t
CS121 © JAS 2005
0
i
a
e
g
h
i
l
m
p
s
t
gh
lp
at
2
3
1
1
3
1
2
1
5
1
2
2
3
2-25
CS121 Data Structures
Binary Tree Application - Priority Queues
A heap is a complete binary tree such that no node has a
value bigger than its parent.
This can provide a representation of a priority queue.
The highest priority entry will always be at the root.
Removal of the root does, however, require restructuring the
tree.
To maintain the complete tree property – move the last item
in the tree (rightmost leaf on bottom level) to the root.
To re-establish the priority property swap new root with
highest priority child and repeat until it is moved to the
correct position
CS121 © JAS 2005
2-26
CS121 Data Structures
Implementation of Heaps
Obviously Heaps can be implemented using references, but as
(almost) all the operations are based on swapping values
and the trees are complete it is also feasible to use arrays to
implement a heap tree.
CS121 © JAS 2005
2-27
CS121 Data Structures
Trees implemented using an array
a
b
c
d
e
f
g h
i
j
k
l
m
n
o
a
b
d
h
c
e
i j
f
k l
CS121 © JAS 2005
g
m n
o
2-28
CS121 Data Structures
If a Tree is represented by array ATree
Tree.root == ATree[0]
If
Tree == ATree[i]
then Tree.left == ATree[2i+1]
and
Tree.right == ATree[2i+2]
CS121 © JAS 2005
2-29
CS121 Data Structures
Analysis of Binary Tree
For a complete binary tree every comparison halves the size
of the tree to be searched.
Searching is thus said to be O(logn)
n is the number of items to be searched
log is the logarithm to base 2
If we inserted sorted data into a binary search tree we would
end up with a linear list
For a linear list the search time is O(n)
CS121 © JAS 2005
2-30
CS121 Data Structures
Balanced binary search trees have the best search times –
O(logn)
In the worst case, unbalanced trees can have O(n) search
times
AVL trees – as each new key is inserted, the tree remains
balanced
Guarantees O(logn) search times
To keep tree balanced insertion worst case is O(n)
CS121 © JAS 2005
2-31
CS121 Data Structures
AVL (Adelson-Velskii and Landis) trees are almost balanced binary
trees
They have both O(logn) insertion (for one key) and search times
Height of binary tree = length of longest path from the root to some
leaf
(The empty binary tree is considered to have a height of -1)
The AVL property for a node, N, in a binary tree is that the heights
of the left and right subtrees of node N are either equal or if they
differ by 1
An AVL tree is a binary tree in which each of its nodes has the AVL
property.
CS121 © JAS 2005
2-32
CS121 Data Structures
If inserting into an AVL tree causes the AVL property to be
lost at a node, we can apply some shape-changing tree
transformations, rotations, to restore the AVL property
There are four different rotations
Single left
Single right
Double left = single right then single left
Double right = single left then single right
CS121 © JAS 2005
2-33
CS121 Data Structures
private AVLNode RotateRight( AVLNode tree)
{
AVLNode newTree = tree.Left;
tree.Left = newTree.Right;
newTree.Right = tree;
return newTree;
}
CS121 © JAS 2005
2-34
CS121 Data Structures
newTree
tree
ORY
210 0 210
JFK
10 0 10
BRU
CS121 © JAS 2005
Right Rotation
2-35
CS121 Data Structures
JFK
BRU
ARN
ORY
DUS
ZRH
MEX
ORD
GLA
Right then Left
Double Left
GCM
NRT
CS121 © JAS 2005
2-36
CS121 Data Structures
AVL Tree Summary
Difference in height between the left and right subtrees of every node
is 0 or 1
If this property is lost by inserting a new key, rotations can be
performed on the subtree to regain the property
An extra field will be required in the node record to retain the height
of the subtrees
It can be shown that in the worst case insertions, searches and
deletions take at most O(logn) time
Remember binary trees have a worst case of O(n) when a chain
occurs. In fact it can be proven using fibonacci trees that AVL trees
are at worst 44% less efficient than complete trees.
Proof not required for course, but can be found in text books.
CS121 © JAS 2005
2-37
CS121 Data Structures
Two-Three Trees (2-3 Trees)
Another option is to arrange for all subtrees to be perfectly
balanced with respect to their heights, but to permit the
number of search keys stored in nodes to vary
Each node in a 2-3 tree is permitted to contain either one or
two search keys, and to have either two or three
descendants
All leaves of the tree are empty trees that lie on exactly one
bottom level.
CS121 © JAS 2005
2-38
CS121 Data Structures
H<L
L
H
J<L<N
J
N
D
A
E
F
I
CS121 © JAS 2005
K
L
O
P
2-39
CS121 Data Structures
B<H
B
H
B<D
D
AA B
E
F
J
I
CS121 © JAS 2005
K
N
L
O
P
2-40
CS121 Data Structures
H<M
M
H H L
J<M<N
D
J
J
N
N
L^
A
E
F
I
CS121 © JAS 2005
KK
LM
O
P
2-41
CS121 Data Structures
The worst case 2-3 tree is one in which every node has 1 key
and two children – a complete binary tree
For a 2-3 tree with k levels and n elements we have
(k+1)
n=2
-1
Hence k+1=log(n+1)
As splits are passed up the tree to parent nodes, there can only
ever be O(logn) splits during the insertion
Thus we can determine that search and deletions take O(logn)
in the worst cases.
CS121 © JAS 2005
2-42
CS121 Data Structures
B-Trees
Increase the number of children in a non-root node of a 2-3 tree to
be, say, from 100 to 200, then we have a B-tree of order 200
Such a tree can store 8 million records in just 3 levels
2003 = 8 million or log2008000000=3
Therefore when storing records on disc we would only need 3 disc
accesses in order to find any record in our tree of 8 million
records
A binary tree would require around 23 disc access
A B-tree of order 200 would require 3 or in the worst case 5
accesses
CS121 © JAS 2005
2-43
CS121 Data Structures
Binary trees may be fast when stored in fast primary memory,
but trees with higher branching factors are far more
efficient when stored on slow external memory devices.
For efficiency, all of an ordered key sequence in a B-tree is
read into internal memory at once, and a fast binary search
is used to find a key in this ordered sequence. This either
gives the key itself, or the node which must be read from
disc next.
In general, in a B-tree of order m , each internal node except
the root and the leaves must have between upper(m/2) and
m children. The root can either be a leaf or it can contain
from 2 to m children. All leaves lie on the same bottommost level and are empty.
CS121 © JAS 2005
2-44
CS121 Data Structures
Summary
Concepts and terminology of trees– branches, nodes, children,
parents, ancestors, descendants, internal nodes, leaves, levels,
paths
Binary trees, definition, and recursive definition, complete binary
tree definition, representing expressions using binary trees,
traversing binary trees
Implementing trees in Java
Binary search trees
Huffman coding tree
Deleting from a binary tree
CS121 © JAS 2005
2-45
CS121 Data Structures
Analysis of the binary tree – big-O notation
Priority queues; represented by heaps; reheapifying
AVL trees – definition, how to calculate the heights and
balance factors, how to identify AVL and non-AVL trees,
how to insert, how to perform rotations, benefits
2-3 trees – how to insert into, benefits
B-trees – benefits, general idea of how they work
CS121 © JAS 2005
2-46
CS121 Data Structures
CS121 © JAS 2005
2-47