Tree Data Structures
Heaps for searching

Search in a heap?

Start by looking at the root.
If the search item is smaller than the root, look at both the left and the right child.
In general, whenever the search item is smaller than a node, both of that node's children must be searched.

Search in a (max-)heap is therefore O(n) in the worst case: if the value is smaller than every node in the tree, every node may have to be examined.
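The search just described can be sketched as follows. This is a minimal illustration, not code from the lecture: it assumes a max-heap stored in an array with the children of index i at 2*i + 1 and 2*i + 2, and the function name heapContains is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Search an array-based max-heap for `target`, starting at index i.
// Assumption: the children of index i live at 2*i + 1 and 2*i + 2.
bool heapContains(const std::vector<int>& heap, int target, std::size_t i = 0)
{
    if (i >= heap.size()) return false;   // walked off the bottom of the tree
    if (heap[i] == target) return true;   // found it
    if (heap[i] < target) return false;   // everything below is smaller still
    // target is smaller than this node, so it may be in either subtree
    return heapContains(heap, target, 2 * i + 1)
        || heapContains(heap, target, 2 * i + 2);
}
```

The only pruning available is "stop when the node is already smaller than the target", so a target smaller than every stored value forces a visit to all n nodes.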

Total time:

n nodes inserted into the heap, O(log2 n) per insertion
n searches, O(n) each
Total: O(n log2 n + n^2)
Heaps for Searching



A heap is well suited to problems where a specific element (the largest or the smallest) has to be removed.
To allow O(log2 n) search for arbitrary elements, we need to exploit binary tree properties (maximum height log2 n) better than the naive heap search does.
But if the largest element is pulled out every time, the result is a reverse-sorted list: Heapsort.
Cost: about O(n log2 n) to build the heap, plus O(n log2 n) to extract the items back out.
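As a rough sketch of "pull out the largest element every time", the following uses the standard library heap algorithms rather than a hand-written heap. The function name extractLargestFirst is hypothetical. Note that std::make_heap builds the heap in linear time, whereas the O(n log2 n) build cost quoted above corresponds to n separate insertions.

```cpp
#include <algorithm>
#include <vector>

// Build a max-heap, then repeatedly remove the largest element.
// The extracted sequence comes out largest first (reverse sorted).
std::vector<int> extractLargestFirst(std::vector<int> values)
{
    std::make_heap(values.begin(), values.end());      // max-heap by default
    std::vector<int> out;
    out.reserve(values.size());
    while (!values.empty()) {
        std::pop_heap(values.begin(), values.end());   // largest moves to the back
        out.push_back(values.back());
        values.pop_back();
    }
    return out;
}
```

Each extraction costs O(log2 n), so emptying the heap costs O(n log2 n) in total.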

A Better Way to Search: Binary Search Trees

Binary Search Tree:

A binary tree (each node has at most 2 children, left and right) that is either empty (zero nodes) or, if it has more than zero nodes, satisfies:

Every element has a key and no two elements have the same key (unique keys).
All keys (if any) in the left subtree are smaller than the key in the root.
All keys (if any) in the right subtree are larger than the key in the root.
The left and right subtrees are also binary search trees.
Binary Search Trees
[Figure: two example binary search trees; node keys shown: 60, 30, 5, 70, 40, 2, 65, 80]
Both trees satisfy:
Unique keys
Left nodes < root
Right nodes > root
Left and right are also binary search trees
Binary Search Trees
[Figure: a binary tree with node keys 20, 15, 12, 25, 18, 22]
Not a binary search tree: the right child of 25 is not larger than 25.
Unique keys, left nodes < root, and right nodes > root all hold at the root, but the left and right subtrees are not both binary search trees.
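The recursive definition can be turned into a checker. The sketch below is an illustration under assumptions: it uses a hypothetical minimal Node struct with int keys (not the lecture's BSTNode), and it passes down the open interval of keys that a subtree is allowed to contain.

```cpp
#include <climits>

struct Node {                 // hypothetical minimal node, int keys only
    int key;
    Node* left;
    Node* right;
};

// True if every key in the subtree lies strictly between lo and hi
// and both subtrees are themselves binary search trees.
bool isBST(const Node* n, long long lo = LLONG_MIN, long long hi = LLONG_MAX)
{
    if (n == nullptr) return true;                     // an empty tree qualifies
    if (n->key <= lo || n->key >= hi) return false;    // violates an ancestor's bound
    return isBST(n->left, lo, n->key)                  // left keys must be < n->key
        && isBST(n->right, n->key, hi);                // right keys must be > n->key
}
```

Applied to a tree where 22 sits as the right child of 25, the check fails at 22 because 22 is not greater than 25. The strict comparisons also reject duplicate keys.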
Binary Search Trees

Note that there is no requirement to be a complete binary tree; an arbitrary binary tree shape is allowed.
This suggests that a linked-node implementation may be more useful than an array.
The shape of the tree may affect the cost of searching.

The recursive definition of a binary search tree leads naturally to recursive algorithms.
Binary Search Trees: Search

Search for a key x, taking advantage of the binary search tree properties:

Begin at the root.
If root == 0, return: the tree is empty.
Otherwise,
  Compare x to the root key.
  If x == root key, return the root node.
  Else if x < root key, x cannot be in the right subtree because of the binary search tree properties, so recursively search the left child.
  Else recursively search the right child.
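Binary Search Trees: BSTNode Definition
template <class Type> class BST;   // forward declaration, needed for the friend line

template <class Type>
class Element
{
public:                            // public so that BST can compare keys
    Type key;
    // ... other data fields go here
};

template <class Type>
class BSTNode
{
    friend class BST<Type>;        // lets BST reach the private links
private:
    BSTNode* leftChild;
    BSTNode* rightChild;
    Element<Type> data;
};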
Binary Search Tree: Search Implementation
// Public entry point: search the whole tree for x.
template <class Type>
BSTNode<Type>* BST<Type>::Search(const Element<Type>& x)
{ return Search(root, x); }

// Recursive helper: search the subtree rooted at b.
template <class Type>
BSTNode<Type>* BST<Type>::Search(BSTNode<Type>* b, const Element<Type>& x)
{
    if (b == 0) return 0;                                     // empty subtree: not found
    if (x.key == b->data.key) return b;                       // found
    if (x.key < b->data.key) return Search(b->leftChild, x);  // must be on the left
    return Search(b->rightChild, x);                          // otherwise on the right
}
Binary Search Trees: Search Example
Find 15
[Figure: tree with root 30; children of 30 are 5 and 40; children of 5 are 2 and 15]
Is root == 0? No
Compare 15 to root (30)
15 < 30, so recurse on left child
Compare 15 to 5
15 > 5, so recurse on right child
Compare 15 to 15
15 == 15, so return node with 15
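The same walk can also be written as a loop. The sketch below is an alternative to the recursive Search, not the lecture's code: it assumes a hypothetical minimal Node struct with int keys, and performs one comparison per level just as in the trace above.

```cpp
struct Node {                 // hypothetical minimal node, int keys only
    int key;
    Node* left;
    Node* right;
};

// Iterative form of the recursive search: move down one level per step.
const Node* search(const Node* root, int x)
{
    const Node* current = root;
    while (current != nullptr && current->key != x)
        current = (x < current->key) ? current->left : current->right;
    return current;           // nullptr if x is not in the tree
}
```

On the example tree, searching for 15 visits 30, then 5, then 15, and returns the node holding 15.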
Binary Search Trees: Big Oh Analysis

At the root, we do one comparison.
Depending on the result (x > root or x < root), we move to one child of the root [moving down a level] and do one comparison there.
The maximum number of times we can do this is the height of the tree (the maximum number of levels): O(h).
Thus the cost of a search depends on the shape of the tree:
Skewed, expensive: O(n)
Balanced, cheap: O(log2 n)
Binary Search Trees: Insertion

Rules: insertion must preserve
Unique keys
Right child > parent
Left child < parent
Self-similarity (subtrees are also binary search trees)

How do we check for uniqueness?
Look at all the nodes?
Binary Search Trees: Insertion

We don't need to look at all the nodes.
Take advantage of the fact that the tree was already a binary search tree before the addition.
To see whether a value is in the tree, search for it.

[Figure: tree with root 30; children of 30 are 5 and 40; children of 5 are 2 and 15]
Add 15: search for 15
15 ? 30, 15 < 30 => Left
15 ? 5, 15 > 5 => Right
15 ? 15, 15 == 15 => Not unique, so do not insert
Binary Search Trees: Insertion

Search not only performs the test for uniqueness, but also puts us in the right place to insert: where the input value should be in the tree.

[Figure: tree with root 30; children of 30 are 5 and 40; 5 has left child 2 and no right child]
Add 15: search for 15
15 ? 30, 15 < 30 => Left
15 ? 5, 15 > 5 => Right
No right child, so not present
Add 15 as right child of 5
Binary Search Trees: Insertion Implementation
template <class Type>
bool BST<Type>::Insert(const Element<Type>& x)
{
    // Search for x, remembering the last node visited.
    BSTNode<Type>* current = root;
    BSTNode<Type>* parent = 0;
    while (current) {
        parent = current;
        if (x.key == current->data.key) return false;    // duplicate key: reject
        if (x.key < current->data.key) current = current->leftChild;
        else current = current->rightChild;
    }
    // Not found: create the new node and attach it below parent.
    current = new BSTNode<Type>;
    current->leftChild = 0; current->rightChild = 0; current->data = x;
    if (!root) root = current;                            // tree was empty
    else if (x.key < parent->data.key) parent->leftChild = current;
    else parent->rightChild = current;
    return true;
}
Binary Search Trees: Insertion Big Oh Analysis

The core of the insertion function is the search, whose cost depends on the shape and size of the tree.
The actual insertion is constant time.
Cost is bounded by the search cost, which we have said is:
O(n) worst case
~O(log2 n) average case with a well balanced tree.
Binary Search Trees: Deletion

Rules: deletion must preserve
Unique keys
  No work to do here. If keys were unique before the delete, they are unique afterwards, since a delete cannot introduce a new key.
We do need to ensure:
  Right child > parent, left child < parent
  Self-similarity (subtrees are also binary search trees)
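Binary Search Trees: Deletion
Three cases:
1) Leaf node (15)
Remove the leaf node.
Set the parent's pointer to where the leaf node was to zero.
[Figure: before, root 30 with children 5 and 40, and 5 with children 2 and 15; after, root 30 with children 5 and 40, and 5 with left child 2 only]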
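Binary Search Trees: Deletion
2) Non-leaf, one child (5)
Set the parent's link to the current node's only child.
Remove the current node.
[Figure: before, root 30 with children 5 and 40, and 5 with left child 2; after, root 30 with children 2 and 40]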
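Binary Search Trees: Deletion
3) Non-leaf, two children (30)
Replace the node's value with the largest element of the left subtree or the smallest element of the right subtree.
Delete the node from which you copied the value.
This then becomes case 1 or case 2.
[Figure: before, root 30 with children 5 and 40, and 5 with left child 2; the value 5 is copied into the root and the old 5 node is marked toDelete; after, root 5 with children 2 and 40]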
Binary Search Trees: Deletion

The rule was:
“Replace value with largest element of left subtree or smallest element of right subtree”

Is this guaranteed to work?

Yes, because of the binary search tree properties. The largest element of the left side is:
Bigger than anything else in the left side
Smaller than anything in the right side

The smallest element of the right side is:
Bigger than anything in the left side
Smaller than anything else in the right side

These are exactly the roles that must be fulfilled when moving to become the root of that subtree.
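The three deletion cases can be sketched in one recursive function. This is an illustration under assumptions, not the lecture's implementation: it uses a hypothetical minimal Node struct with int keys, hypothetical helper names insertKey and removeKey, and always picks the largest element of the left subtree for case 3.

```cpp
struct Node {                                  // hypothetical minimal node
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Insert helper so the example is self-contained (duplicates are ignored).
Node* insertKey(Node* root, int key)
{
    if (!root) return new Node(key);
    if (key < root->key) root->left = insertKey(root->left, key);
    else if (key > root->key) root->right = insertKey(root->right, key);
    return root;
}

// Remove `key` from the subtree rooted at `root`; returns the new subtree root.
Node* removeKey(Node* root, int key)
{
    if (!root) return nullptr;                 // key not present: nothing to do
    if (key < root->key) { root->left = removeKey(root->left, key); return root; }
    if (key > root->key) { root->right = removeKey(root->right, key); return root; }

    // Cases 1 and 2: zero or one child.
    // Hand the (possibly null) child to the parent and free this node.
    if (!root->left || !root->right) {
        Node* child = root->left ? root->left : root->right;
        delete root;
        return child;
    }

    // Case 3: two children. Copy in the largest key of the left subtree,
    // then delete that key from the left subtree (now case 1 or case 2).
    Node* pred = root->left;
    while (pred->right) pred = pred->right;
    root->key = pred->key;
    root->left = removeKey(root->left, pred->key);
    return root;
}
```

Returning the new subtree root lets the caller update the parent's link with a plain assignment, which covers "set the parent's pointer to zero" (case 1) and "set the parent's link to the current node's child" (case 2) with the same code.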
Binary Search Trees: Height

The worst case height for a binary search tree is the number of elements in the tree.

Skewed tree
[Figure: skewed tree in which each node has only a left child: 40, 30, 5, 2]
Binary search tree operation costs are bounded by the height of the tree, so in these cases they become O(n).
How easy is it to get a skewed tree?
Very easy: insert sorted or nearly sorted data.
Binary Search Trees: Height
template <class Type>
bool BST<Type>::Insert(const Element<Type>& x)
{
    // Search for x, remembering the last node visited.
    BSTNode<Type>* current = root;
    BSTNode<Type>* parent = 0;
    while (current) {
        parent = current;
        if (x.key == current->data.key) return false;    // duplicate key: reject
        if (x.key < current->data.key) current = current->leftChild;
        else current = current->rightChild;
    }
    // Not found: create the new node and attach it below parent.
    current = new BSTNode<Type>;
    current->leftChild = 0; current->rightChild = 0; current->data = x;
    if (!root) root = current;                            // tree was empty
    else if (x.key < parent->data.key) parent->leftChild = current;
    else parent->rightChild = current;
    return true;
}
Insert: 3, 4, 6, 5, 8
[Figure: resulting tree. The root is 3; 4 is the right child of 3; 6 is the right child of 4; 5 and 8 are the left and right children of 6]
Binary Search Trees: Height

If insertions are made in random order, the height is O(log n) on average.
When the data arrives in random order, which is the common case, O(log n) height will be achieved most of the time.
There are ways to guarantee O(log n) height; they require modifications to the insert and delete functions to maintain balance.
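The effect of insertion order on height can be measured directly. The sketch below is an illustration under assumptions: a hypothetical minimal Node struct with int keys, hypothetical helper names, and an arbitrary random seed. It builds a plain (unbalanced) tree from a key sequence and reports its height in levels.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

struct Node {                                  // hypothetical minimal node
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Plain (unbalanced) BST insertion: walk down, then attach at the empty link.
void insertKey(Node*& root, int key)
{
    Node** link = &root;
    while (*link) {
        if (key == (*link)->key) return;       // keep keys unique
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node(key);
}

// Height counted in levels: an empty tree has height 0.
int height(const Node* n)
{
    return n ? 1 + std::max(height(n->left), height(n->right)) : 0;
}

void destroy(Node* n)
{
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Height of the tree obtained by inserting `keys` in the given order.
int heightAfterInserting(const std::vector<int>& keys)
{
    Node* root = nullptr;
    for (int k : keys) insertKey(root, k);
    int h = height(root);
    destroy(root);
    return h;
}

// Keys 1..n in sorted order.
std::vector<int> sortedKeys(int n)
{
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 1);
    return keys;
}

// Keys 1..n in a pseudo-random order (the seed is arbitrary).
std::vector<int> shuffledKeys(int n, unsigned seed)
{
    std::vector<int> keys = sortedKeys(n);
    std::mt19937 rng(seed);
    std::shuffle(keys.begin(), keys.end(), rng);
    return keys;
}
```

Inserting sortedKeys(n) always gives height n, since every new key goes to the right of the previous one. Inserting shuffledKeys(n, seed) typically gives a height that is a small multiple of log2 n, though the exact value depends on the seed and the library's shuffle.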
TreeSort:

Insertion into a binary search tree places a specific ordering on the elements.
[Figure: tree with root 30; children of 30 are 5 and 40; children of 5 are 2 and 15; children of 40 are 35 and 50]
For the root,
Everything in the left subtree is < root
Everything in the right subtree is > root
For each subtree,
Everything on the left is < subtree root
Everything on the right is > subtree root
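TreeSort:

Theoretically, we should be able to construct an ordering of all elements from the tree:
Generate an array of size equal to the number of elements in the tree
Root goes in the middle of the array (more precisely, right after all the elements of its left subtree)
Left subtree fills in the part of the array to the left of the root
Right subtree fills in the part of the array to the right of the root
And recurse

[Figure: array layout. Top level: < 30 | 30 | > 30. Next level: < 5 | 5 | > 5 | 30 | < 40 | 40 | > 40]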
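TreeSort:

Extracting an ordered array from a binary search tree:
Perform an in-order traversal (LVR: left subtree, visit node, right subtree). This ensures that all smaller items are visited first and larger items last.
[Figure: tree with root 30; children of 30 are 5 and 40; children of 5 are 2 and 15; children of 40 are 35 and 50]
LVR ordering:
2, 5, 15, 30, 35, 40, 50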
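Putting the two steps together (n insertions, then an LVR traversal) gives a small tree sort. This is a sketch under assumptions: a hypothetical minimal Node struct with int keys and hypothetical function names. Because keys in the tree are unique, duplicate input values are dropped.

```cpp
#include <vector>

struct Node {                                  // hypothetical minimal node
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Plain (unbalanced) BST insertion; duplicate keys are ignored.
void insertKey(Node*& root, int key)
{
    Node** link = &root;
    while (*link) {
        if (key == (*link)->key) return;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node(key);
}

// LVR traversal: left subtree, then the node itself, then the right subtree.
void inorder(const Node* n, std::vector<int>& out)
{
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}

void destroy(Node* n)
{
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}

// Sort by building a tree and reading it back in order.
std::vector<int> treeSort(const std::vector<int>& input)
{
    Node* root = nullptr;
    for (int k : input) insertKey(root, k);    // n insertions
    std::vector<int> out;
    out.reserve(input.size());
    inorder(root, out);                        // O(n) traversal
    destroy(root);
    return out;
}
```

Inserting 30, 5, 40, 2, 15, 35, 50 in that order reproduces the example tree, and the traversal returns the keys in increasing order.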
TreeSort:

Analysis of TreeSort:
Given an array of size n, we have to build a binary search tree with n elements
  Requires n insertions
Given a binary search tree with n elements, we have to traverse the tree in LVR order to extract the sorted order
Construction: O(n log2 n) if balanced, O(n^2) if not balanced
Traversal: O(n) always
Average case: O(n log2 n), worst case: O(n^2)
TreeSort:

Very similar to quicksort!
Same average case [O(n log n)] and worst case [O(n^2)] times
The roots of the subtrees of the binary search tree are the pivots
Place data smaller than the pivot on the left of the pivot (leftChild), place larger data on the right of the pivot (rightChild)
The better the pivot is, the more balanced the tree is (same for the quicksort recursion)
Nearly sorted or already sorted data leads both into trouble: bad partitioning for quicksort, bad construction for treesort
Rank Information

Often when working with lists of data, we are interested in rank information:
What is the largest item?
What is the smallest?
What is the median?
What is the fifth smallest item?
Largest and smallest are easy: a single O(n) scan
What if we want to ask a lot of questions about rank, or want to know about something other than the largest or smallest?
Rank Information

Sorting approach to rank information:
Sort the list
Return list[rankOfInterest]
O(n log n) [sort] + O(1) [value retrieval]

With dynamic data, we may not have an array to work with; a linked list would be more likely.
Rank Information

Linked list approach:
Sort the list (assuming mergesort for linked lists)
Traverse the list to find the rankOfInterest element
O(n log n) [sort] + O(rankOfInterest) [traversal]
Can handle dynamic data, but slower!
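A minimal sketch of the linked list approach using std::list, whose sort member runs in O(n log n) and is typically implemented as a merge sort. The function name elementAtRank and the 1-based rank convention are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Return the element of the given 1-based rank.
// Assumption: 1 <= rank <= data.size(); the list is taken by value,
// so the caller's list is left unsorted.
int elementAtRank(std::list<int> data, std::size_t rank)
{
    data.sort();                                           // O(n log n)
    auto it = std::next(data.begin(),
                        static_cast<std::ptrdiff_t>(rank - 1));  // O(rank) hops
    return *it;
}
```

The traversal step is what makes this slower than the array version: a linked list has no O(1) indexing, so reaching rank r takes r - 1 pointer hops.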
Rank Information

Binary tree approach:
Insert the elements into a binary search tree
Do an in-order traversal up to the rankOfInterest node (the traversal goes through the elements in sorted order)
O(n log n) on average [building tree] + O(rankOfInterest) [traversal]
Same cost as the linked list approach (and probably easier, since we don't have to write a sort for linked lists).
Rank Information:

Binary tree approach II:
Add a new variable to each node in the tree
  leftSize = number of elements in the node's left subtree + 1 for the node itself
Insert the elements into the binary search tree
  Initially set each new node's leftSize to 1 (for self)
  While passing parent nodes in searching for the appropriate place, store a reference to each parent at which the search goes left
  If the insertion happens, increment the leftSize value of each of those parents
  If no insertion happens (non-unique key), make no updates to leftSize
Search by rank using the traditional binary tree search on the leftSize value
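Rank Information:
// Return the node holding the rank-th smallest key (rank 1 = smallest),
// or 0 if no element has that rank.
template <class Type>
BinaryTreeNode<Type>* BinarySearchTree<Type>::search(int rank)
{
    BinaryTreeNode<Type>* current = root;
    while (current)
    {
        if (rank == current->leftSize) return current;
        else if (rank < current->leftSize) current = current->leftChild;
        else {
            // Skip this node and its whole left subtree.
            rank = rank - current->leftSize;
            current = current->rightChild;
        }
    }
    return 0;   // ran off the tree: rank is out of range
}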
Rank Information: Example
[Figure: tree with root Mike; children of Mike are John and Thomas; children of John are Georgia and Kylie; children of Thomas are Shelley and Tyler]
leftSize values: Mike 4, John 2, Thomas 2, Georgia 1, Kylie 1, Shelley 1, Tyler 1
Real ranks for data [first is rank 1, last is 7]:
Georgia, John, Kylie, Mike, Shelley, Thomas, Tyler

What is the 2nd element?
Rank 2 < leftSize(Mike) [4]
Move to root->leftChild
Rank 2 == leftSize(John) [2]
Return John node

What is the 5th element?
Rank 5 > leftSize(Mike) [4]
Move to root->rightChild
Rank = 5 - 4 = 1 < leftSize(Thomas) [2]
Move to leftChild of Thomas
Rank 1 == leftSize(Shelley) [1]
Return Shelley node
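The example can be reproduced with a small self-contained sketch. It is an illustration under assumptions: a hypothetical RankNode struct with std::string keys and hypothetical function names insertKey and findByRank. During insertion it remembers only the ancestors at which the walk turned left, since those are the nodes whose left subtree grows.

```cpp
#include <string>
#include <vector>

struct RankNode {                              // hypothetical node type
    std::string key;
    int leftSize = 1;                          // nodes in left subtree + 1 for self
    RankNode* left = nullptr;
    RankNode* right = nullptr;
    explicit RankNode(const std::string& k) : key(k) {}
};

// Insert `key`, incrementing leftSize on every ancestor we branched left from.
// Returns false (and changes nothing) if the key is already present.
bool insertKey(RankNode*& root, const std::string& key)
{
    std::vector<RankNode*> wentLeftAt;         // ancestors whose left subtree grows
    RankNode** link = &root;
    while (*link) {
        RankNode* cur = *link;
        if (key == cur->key) return false;     // non-unique: no leftSize updates
        if (key < cur->key) { wentLeftAt.push_back(cur); link = &cur->left; }
        else link = &cur->right;
    }
    *link = new RankNode(key);
    for (RankNode* a : wentLeftAt) ++a->leftSize;
    return true;
}

// Find the node of 1-based rank `rank`, or nullptr if the rank is out of range.
const RankNode* findByRank(const RankNode* root, int rank)
{
    const RankNode* cur = root;
    while (cur) {
        if (rank == cur->leftSize) return cur;
        if (rank < cur->leftSize) cur = cur->left;
        else { rank -= cur->leftSize; cur = cur->right; }
    }
    return nullptr;
}
```

Inserting Mike, John, Thomas, Georgia, Kylie, Shelley, Tyler in that order produces the example tree with leftSize 4 at Mike and 2 at John and Thomas; findByRank then returns John for rank 2 and Shelley for rank 5.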
Rank Information: Analysis

Searching (traversal) is now bounded by the height of the tree
  On average O(log n)
Building the tree was O(n log n), but we added more work
  The original n log n comes from n insertions, log n cost each
  Now we have to update the parents' leftSize values
  However, the maximum number of parents = height of tree = on average log n
  So the cost for a single insertion is now about 2 log n, and the cost of all insertions is still bounded by O(n log n)
So for dynamic data, we can do rank information in:
O(n log n) [building] + O(log n) [searching]
Better than approaches that sort and traverse to the rank position
Threaded Trees: General Trees
[Figure: binary search tree of names: Mike, John, Georgia, Fred, Kylie, Thomas, Shelley, Tyler, Hall]
A lot of links are wasted in this tree: every terminal (leaf) node wastes 2 links! Can we make use of those links? Yes.
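Threaded Trees
[Figure: the same tree of names (Mike, Fred, John, Georgia, Hall, Thomas, Kylie, Shelley, Tyler) drawn as a threaded tree; nodes are labeled with flag pairs ff or tt, and one link ends at NULL]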
Threaded Trees: Insertion
[Figure: before and after views of a threaded tree with nodes Mike, John, Kylie, Hall, Bill]
Threaded Trees: Insertion
[Figure: before and after views of a threaded tree with nodes Mike, John, Hall, Bill, Fred, Kylie, Kate, Jane]