Transcript tree

Binary Search Trees
15-211
Fundamental Data Structures and
Algorithms
Margaret Reid-Miller
January 20, 2004
Plan
 Today
Review of trees, and some analysis
 Reading:
For today: Chapters 18, 19
For next time: Chapter 17
 Reminder: HW2 is available!
Trees are Everywhere
CS is upside down
root
leaves
Trees
root
node label
a
nodes
parent
b
e
children
depth=1
height=0
c
d
f
leaves
Trees, more abstractly
 A tree is a graph (usually directed)
with the following characteristics:
There is a distinguished node called the
root node.
Every non-root node has exactly one
parent node (the root has none).
Unique parents
a
b
e
c
f
root
d
Trees are everywhere
 Tree structures are everywhere in
life.
 As a result, in computer programs,
trees turn out to be one of the most
commonly used data structures.
Organization charts
ignore this
Origins of natural languages
Origins of life
===========================================
|
|
==================================
|
|
|
| ===============================
|
| |
|
| |
======
|
| |
|
|
| |
===|=====
|
| |
| |
| ======| |
======| ======
| |
| |
|
|
| |
| |
|
| ======
| |
| |
|
===|
<<===| |
| |
|
| ===
| |
| |
|
===|
| |
| |
===|
===
| |
| |
| |
| |
===|
| |==============
| |
|
| |
| |
|
===| |==============
| |
|
| | |
| |
|
| | ===============
| |
|
==P=| |
| |
|
|
| ==================
| |
|
|
|
| |
|
|
=====================
| |
|
|
===|
| ===|
============
|
| | |
===|
|
| | |
===| ============
|
| | |
| |
|
| | |
===| ===============
|
===| |
| |
|
| ==D=| ==================
|
|
|
Porifera
(sponges)
Cnidaria
(jellyfish, anemones, corals, etc.)
Ctenophora
(comb-jellies)
Arthropoda
(insects, spiders, crabs, etc.)
Onychophora
Tardigrada
Annelida
(velvet worms)
(water bears)
(segmented worms)
Pogonophora
Vestimentifera
Echiura
Mollusca
(snails, clams, squids, etc.)
Sipuncula
Nemertea
(ribbon worms)
Platyhelminthes
Chordata
(flatworms)
(vertebrates and relatives)
Hemichordata
lophophorates
Chaetognatha
Tournament structure
Arithmetic Expressions
+
*
2
5
7
Directory structure
/afs
cs
usr
andrew
acs
course
15
113
18
211
usr
Game trees
Binary Search Trees
Trees, inductively
 A tree is
empty, or
A node containing a set of trees (its
children)
 Alternatively, a tree is
A leaf node (no children), or
A node containing a set of trees (its
children)
Binary trees
Let's focus on binary trees
A binary tree is either
• empty (we'll write nil for clarity), or
• looks like (x,L,R) where
x is an object, and
L, R are binary subtrees
In pictures
x
nil
R
L
Binary tree nodes in Java
From Weiss, pg 579:
class BinaryNode {
private Object element;
private BinaryNode left, right;
public BinaryNode() {…}
public
public
public
public
public
public
…
}
Object getElement() {…}
BinaryNode getLeft() {…}
BinaryNode getRight() {…}
void setElement(Object x) {…}
BinaryNode setLeft() {…}
BinaryNode setRight() {…}
Quick exercise
Write the code for a new
“flat” method (also called
“inorder”):
a
T
public LinkedList flat(T);
(you may assume a “join”
operation on LinkedLists)
e
flat(T) = e,b,f,a,d,d
b
d
f
d
A solution
Remember to think inductively!
public LinkedList flat(BinaryNode t) {
if (t == null)
// empty tree base case
return null;
else {
LinkedList l = flat(t.getLeft());
LinkedList r = flat(t.getRight());
return l.join(List(element,r));
}
}
Binary search trees (BSTs)
A binary tree T is a binary search tree iff
flat(T) is an ordered sequence.
Equivalently, for all nodes (x,L,R) in T all the
nodes in L are less than x, and all the nodes
in R are larger than x.
Example
5
3
2
7
4
6
flat(T) = 2,3,4,5,6,7,9
9
Binary search
How does one search in a BST?
Inductively:
search(x,nil) = false
search(x,(x,L,R)) = true
search(x,(a,L,R)) = search(x,L) x<a
search(x,(a,L,R)) = search(x,R) x>a
Searching
Search for 6:
5
3
2
7
4
6
9
Searching
5 = 6?
3
2
7
4
6
9
Searching
5
3
2
7 = 6?
4
6
9
Searching
5
3
2
7
4
6 = 6? 9
Decisions, decisions
The tree just implements the possible
sequences of decisions we would make
in ordinary binary search.
<
5
>
=
3
7
It's binary since the = part causes
immediate termination.
Binary search
search(x,nil) = false
search(x,(x,L,R)) = true
search(x,(a,L,R)) = search(x,L) x<a
search(x,(a,L,R)) = search(x,R) x>a
should return something
useful…
Real life
In most applications, we do not store simple
elements, but pairs
(key,value)
We search for a key, and want the
corresponding value returned if the key is
found, and "No" otherwise.
The ADT is called a dictionary.
Correctness
Clearly, search() can never return a false
positive answer.
But search() only walks down one branch, so
how do we know we don't get false negative
answers?
Suppose T is a BST that contains x.
Claim: search(x,T) properly returns "true".
Proof (by induction)
Base Case: T has one node (x, nil, nil)
Clearly search(x, T) returns true.
Inductive Case: Suppose search(x, T)
correctly returns true if T has n-1 or fewer
nodes.
We will show it returns true if T has n nodes.
Proof (inductive case)
Suppose T = (a,L,R) has n nodes.
Case 1: x = a:
done.
Case 2: x < a: Since T is a BST, x must be
in L.
But by induction (on tree size),
search(x,L) returns true. Done.
Case 3: x > a: same as case 2.
An Interesting
Application of Search
Trees
Sample problem:
How to do instant
messaging on a
telephone keypad?
Typing “I love you”:
444 * 555 666 888
33 * 999 666 88 #
What is the worstcase number of
keystrokes?
What might be the
average number of
keystrokes?
Better approach:
4* 5 6 8 3* 9 6 8#
Can be done by
using tries.
Tries
 A tree structure that encodes the
possible sequences of symbols in a
dictionary.
Invented by Fredkin in 1960.
 For example, suppose we have the
alphabet {a,b}, and want to store
the sequences
aaa, aab, baa, bbab
Tries
a
aaa, aab, baa, bbab
b
a
a
b
a
b
a
a
b
No information stored in nodes per se.
Shape of trie determines the items.
Tries
Nodes can be quite large: N pointers to
children, where N is the size of the
alphabet.
Operations are very fast:
search, insert, delete are all take k “steps”
where k is the length of the sequence in
question.
Keypad IM Trie
4
5
9
…
I
4
6
6
5
3
8
8
3
like
you
love
5
…
9
lovely
Trie analysis
 Tries are an extremely nice data
structure.
 Searching for an item of length m
requires time O(m), independent of
the size of the dictionary itself.
Usually, the dictionary is much larger
than the any of the words it contains
 A disadvantage: Each node has
many children (e.g., 26 children for
each node, in a standard dictionary).
Homework 2 alert!
 You will likely find tries to be an
extremely useful data structure in
HW2…
Hint: Instead of paths from root to each
node representing a sequence of
letters, consider using paths to
represent sequences of words
Back to Binary Search
Trees
Insertions
Insertions in a BST are very similar to
searching: find the right spot, and then
put down the new element as a new
leaf.
We will not allow multiple insertions of
the same element, so there is always
exactly one place for the new element.
How Many?
How many decisions do we have to make
before we have either found the element,
or know it's not in the tree?
Why do we care?
versus
What is the
height?
A bad tree
2
3
4
5
6
7
How might we get this tree?
9
Logarithms and exponents
 Logarithms and exponents are
everywhere in algorithm analysis
logba = c
if
a = bc
Logarithms and exponents
 Usually will leave off the base b
when b=2, so for example
log 1024 = 10
210 = 1024
Some useful equalities
logbac = logba + logbc
logba/c = logba - logbc
logbac = clogba
logba = (logca) / logcb
(ba)c = bac
babc = ba+c
ba/bc = ba-c
Logarithms and trees
In a “perfect” BST containing n nodes…
•What is the height?
•How many nodes are there at level i, at each height?
•How many steps does it take to search?
Balanced trees
 Intuitively, one way to ensure good
behavior is to make “balanced”
search trees.
If always balanced, then operations
such as search and insert will require at
most log(N) comparisons.
Example
5
3
2
7
4
6
flat(T) = 2,3,4,5,6,7,9
9
Forcing good behavior
It is clear (?) that for any n inputs, there
always is a BST containing these elements
of logarithmic depth.
But if we just insert the standard way, we
may build a very unbalanced, deep tree.
Can we somehow force the tree to remain
shallow?
At low cost?
AVL Trees
AVL trees
 Adelson-Velskii-Landis (AVL) trees
are binary search trees that impose
an additional representation
invariant on BSTs:
Every node most have left and right
subtrees of similar height.
“Similar height” means the difference is
at most 1.
AVL-Trees
An AVL-tree is a BST with the property that
at every node the difference in the depth of
the left and right subtree is at most 1.
5
3
6
7
5
2
4
6
2
9
2
7
1
1
4
5
8
3
4
6
7
9
4
5
8
3
OK
not OK
AVL implies shallow
Claim: Any AVL-tree of depth d has at least
Fd+3-1 nodes where Fn is the nth Fibonacci
number.
Why? Because we may assume it's true for
the left and right subtrees.
And, Fn is approximately
The depth is < 1.44log(n+2)-1.328 <= log(n)
Implementing AVL trees
 The main complications are insertion
and deletion of nodes.
 For deletion:
Don’t actually delete nodes.
Just mark nodes as deleted.
Called lazy deletion.
Lazy deletion
5
3
2
7
4
6
9
On average, deleting even half of the nodes by
marking leads to depth penalty of only 1.
Nodes
 AVL nodes contain the following
information.
Node value.
Left and right children.
Lazy deletion flag.
Height of this tree.
BST nodes in Java
Based on Weiss, pg 607:
class BinaryNode {
Comparable element;
BinaryNode left, right;
int height;
boolean deleted;
public BinaryNode(Comparable theElement) {
element = theElement;
left = right = null;
}
…
}
The Comparable interface
 Objects that are Comparable support
the method compareTo().
 v.compareTo(w) returns
-1 if v < w,
0 if v == w,
1 if v > w,
or throws ClassCastException if
neither v nor w can be cast into a class
appropriate for the comparison.
Naïve insert method
public BinaryNode insert (Comparable x,
BinaryNode t) {
if (t==null)
t = new BinaryNode(x);
else if (x.compareTo(t.element) < 0)
t.left = insert(x, t.left);
else if (x.compareTo(t.element) > 0)
t.right = insert(x, t.right);
else
throw new DuplicateItemException();
return t;
}
The insertion problem
5
2
1
9
4
3
8
7
Insertions can
violate the balance
condition.
Nodes from the
insertion point up
to the root might
be out of balance.
Maintaining balance
 In order to maintain balance, we will
maintain an integer height in each
node.
This allows us to detect when a node
goes out of balance.
 When we detect an out-of-balance
condition, we rebalance the deepest
out-of-balance node.
This will be enough to regain balance.
 But can we do this fast enough?
Quiz Break
Manipulating BSTs
 Write two Java methods. First:
 static BinaryNode rotate1 (BinaryNode t);
t
Z
Y
X
X
Y
Z
Part 2
 Write two Java methods. Second:
 static BinaryNode rotate2 (BinaryNode t);
t
Z
X
X
Y1
Y2
Y1
Y2
Z
Quiz
static BinaryNode rotate1 (
BinaryNode t);
class BinaryNode {
Comparable element;
BinaryNode left, right;
t
Z
X
Y
X
Y
…
}
Z
static BinaryNode rotate2 (
BinaryNode t);
t
Z
X
X
Y1
Y2
Y1 Y2
Z
Solution
static BinaryNode rotate1 (BinaryNode t)
{
BinaryNode newRight = new BinaryNode();
newRight.left = t.left.right;
newRight.right = t.right;
t.left = t.left.left;
t.right = newRight;
t
}
Z
X
Y
X
Can you find a solution that avoids creating
a new BinaryNode?
Y
Z
How trees lose balance
 Suppose that a node n goes out of
balance.
node n
height a
height b
abs(a-b)  1
This is the AVL invariant!!!
How trees lose balance
 Suppose that a node n goes out of
balance.
node n
height a
height b
abs(a-b) > 1
AVL invariant violated
How trees lose balance, cont’d
 In fact, in this case the left subtree
must not have been a leaf.
Not previously
a leaf, unless
right subtree
was empty.
How trees lose balance, cont’d
 In fact, in this case the left subtree
must not have been a leaf.
The deepest node in these subtrees
has depth 2 greater than the deepest
node in this subtree.
How trees lose balance, cont’d
 Two cases to consider:
How trees lose balance, cont’d
 Two cases to consider:
Inserted into left
subtree of left child.
How trees lose balance, cont’d
 Two cases to consider:
Inserted into left
subtree of left child.
Inserted into right
subtree of left child.
How trees lose balance, cont’d
 Note that there are two symmetric
cases for the right child.
Regaining balance
 In an AVL tree, the balance invariant
is regained by performing a rotation
on out-of-balance nodes.
 The rotations require O(d) where d is
the depth of the deepest out-ofbalance node.
 Thus, regaining balance via rotations
requires O(log N) time, since
d=log(N)+1.
The single rotation
 For the case of insertion into left
subtree of left child:
Depth
increased by 1
Depth reduced
by 1
Z
Y
X
X
Y
Deepest node of X has depth 2
greater than deepest node of Z.
Z
Double rotation
 For the case of insertion into the
right subtree of the left child.
Z
X
Single rotation fails!
X
Y
Z
Y
Double rotation
 For the case of insertion into the
right subtree of the left child.
Z
X
Y1
Y2
Right subtree is definitely
non-empty and has depth
2 greater than Z.
Double rotation
 For the case of insertion into the
right subtree of the left child.
Z
X
Z
X
Y1
Y2
Y1
Y2
Double rotation
 For the case of insertion into the
right subtree of the left child.
Z
X
Z
X
Y1
Y2
Y1
Y2
Double rotation
 For the case of insertion into the
right subtree of the left child.
Z
X
X
Y1
Y2
Y1
Y2
Z
Symmetry
 And don’t forget that there are two
symmetric cases for insertions into
the left and right subtrees of the
right child.
Examples
 We don’t need examples!
 The rotations clearly restore the AVL
representation invariant!
 But try to play with the demo at
http://www.seanet.com/users/arsen/avltree.html