Lists_Hash_BST

Download Report

Transcript Lists_Hash_BST

Lists, Hash Tables, Trees
CS 1302
Fall 1999
Contents of Lecture
• Linked lists
– Linked lists in general
– What classes are needed to implement?
– Code example
– Linked lists versus arrays: when to use which?
• Hash table basics
• Binary search trees
Linked Lists in Java
We will now show the basics for a Java linked list holding
data elements of some specific class.
Our example:
• A linked list of class RingList. It’s a class, so we may create
any number of instances of RingList.
• Any and all objects of class Ring contain nodes that are
of class RingNode which manages data about a single ring.
• Note that the list itself is one class, the items in that list are
of another class, and the program using them is a 3rd class.
head
1 instance
of RingList
width
width
2 instances
of RingNode
The RingNode Class
class RingNode {
width
private int width;
private RingNode next;
public RingNode (int width) {
this.width = width;
next = null; // reserved word for
// nonexistent obj
}
public void setNext (RingNode
next) {
this.next = next;
}
public RingNode getNext () {
return (next);
}
public void print () {
System.out.println (title);
}
}//class BookNode
class RingList {
private RingNode head;
The List Class
head
RingList () {
head = null;
}
public void add (RingNode newRing) {
RingNode currRing;
if (head == null)
head = newRing;
else {
currRing = head;
while (currRing.getNext() != null)
currRing = currRing.getNext();
currRing.setNext (newRing);
}
}//add
public void print () {
printHelper (head);
}//print
private void printHelper
(RingNode probe) {
if (probe != null) {
probe.print ();
printHelper
(probe.getNext());
}
} //printHelper
}//class RingList
The class that uses the list
public class ListUser {
public static void main (String args[ ]) {
RingList rings = new RingList ();
/* the RingList.add method can do the "new RingNode”
for you. Which is better, do you think? */
rings.add
(new RingNode (6));
rings.add
(new RingNode (5));
rings.add
(new RingNode (4));
rings.add
(new RingNode (3));
rings.print ();
}//main
head
}//class ListUser
rings
6
5
Using Linked Lists in Java
Three crucial things to remember:
1. No pointers in Java. Manipulate references
• main difference in code from 1301/1501: no need for
de-referencing operator ( ^ )
2. No direct access to data:
• assignment of references only
• data access via accessor/modifier
methods
3. Multiple classes required:
• A type of node is defined in one class.
• A list that uses nodes is defined in another class.
• The program that uses the list is in yet another class.
Q: Why bother with different Collection types?
Designing is largely choosing among alternatives based on design goals
Arrays vs. Linked Lists
Same Cost/Benefit Trade-offs as in Pseudocode:
Arrays are statically sized, so you have to commit to length in advance
Linked lists are dynamically sized, so you can hedge your bets
Arrays require O(N) work to insert at front (need to shift rest)
Linked lists require O(1) (constant) work (need to “splice” in place)
Arrays provide random access to each data element
Linked lists require that you traverse list to get to search node
Arrays require space for data objects
Linked lists also require space for “next” references
Linked Structures & Dynamic
Memory Allocation: Summary Points
Problems with static memory allocation
• you run out of memory if you underestimate
• you waste memory if you overestimate
• you have to manage the memory manually (e.g., closing gaps in
the data structure)
Dynamic memory allocation
• all data (objects) is allocated in the heap; you work with
reference to those objects
• you request new memory when you need it (using "new")
• java collects memory that you don't need any more and returns it
to the heap (garbage collection)
Examples
• linked lists, ordered lists, queues, stacks, doubly-linked lists, etc.
• trees, binary trees, binary search trees, height-balanced binary
search trees, graphs (to come)
Lists: Summary
• Linked lists
– Class that uses the list (e.g. driver)
– Class representing the list itself (NodeList)
– Class representing nodes (Node)
• One or more data fields
• One field is “pointer” to next node: type is Node
– null is “zero”/non-existent object
• Never reference a null object!!!
• Lists -v- arrays: Design as choosing among options
– Early commitment to length in arrays
– Insertion cost constant for lists
– Space overhead for next field in list
•
•
Hashtable basics
Binary search trees
Contents of Lecture
•
Linked lists
• Hash table basics
– Purpose
– Hash functions and collisions
– Collision resolution strategies
•
Binary search trees
Hashing
Problem:
How to quickly access items in a list?
Possible Solution: Hashing
Idea:
Shrink the address space to fit the population size.
For example:
Use some function to reduce the address space of a billion
possible SocSecNums to the population size of 100 students.
Hash Function:
The function by which you shrink the address space, e.g,
index = SocSecNum % 100
Hash Functions
The Perfect Hash Function:
• would be very fast (used for all data access)
• would return a unique result for each key, i.e., would result in
zero collisions
• in general case, perfect hash doesn’t exist (we can create one
for a specific population, but as soon as that population changes... )
• provides an ideal point of reference
Common Hash Functions:
• Digit selection: e.g., last 4 of phone num
• Division: modulo
•Character keys: use ASCII num values for chars (e.g., ‘R’ is 82)
Cost of Hash
Two costs of hashing:
1. loss of natural order
• side effect of desired random shrinking
• lose any ordering of original indices
2. collision will occur
• no perfect hash function
• when (not “if”) collision, how to handle it?
Collision Resolution strategies:
• Multiple record buckets: small for each index, but . . .
• Open address methods: look for next open address, but . . .
• Coalesced chaining: use cellar for overflow (~34..40% of size)
• External chaining: linked list at each location
Collision Resolution
Technique: Multiple element buckets
• Idea: have extra spaces there for overflow
• if population of 8, and if hash fuction of mod 8, then:
1st
hash
1st
2nd
collision collision
0
1
2
3
4
5
6
7
Problems: using 3N space; “what if 3rd collision at any one locale?”
Collision Resolution
Technique: Open address methods
• Idea: upon collision, look for an empty spot
• if population of 8, and if hash fuction of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D
0
1
2
3
D hashes to 2
W hashes to 1
C hashes to 1
X hashes to 3
4
5
6
7
Y hashes to 4
Z hashes to 3
A hashes to 6
B hashes to 5
D belongs at 2, but C
already there
W already at 1, so C
to next available slot
X already at 3, so Z
to next available slot
B belongs at 5, but Z
already there
Problem: Deteriorates to an unsorted list (e.g., O(N) )
Collision Resolution
Cellar
Technique: Coalesced chaining:
• Idea: have small extra “cellar” to handle collision
• if population of 8, and if hash fuction of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D
0
1
W hashes to 1
2
D hashes to 2
3
4
5
X hashes to 3
Y hashes to 4
B hashes to 5
6
7
8
9
10
A hashes to 6
C hashes to 1
Z hashes to 3
9
10
Works
well with
cellar of
35..40% of N
if good hash
function;
cellar can
overflow if
need be
Cellar bottom
is now 8
Collision Resolution
Technique: External chaining:
• Idea: have pointers to all items at given hash,
handle collision as normal event.
• if population of 8, and if hash fuction of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D
0
1
2
3
4
5
6
7
W hashes to 1
D hashes to 2
X hashes to 3
Y hashes to 4
B hashes to 5
A hashes to 6
C hashes to 1
Z hashes to 3
Hashing with Chaining: Example
public class HashChain {
private Node[] bucket;
private int TableSize;
public HashChain(int TableSize) {
this.TableSize = TableSize;
bucket = new Node[TableSize];
for (int i=0; i< TableSize; i++)
bucket[i] = new Node();
} // HashChain
private int getHashKey(int
newElement) {
return newElement % TableSize;
} // getHashKey
public void addElement
(int newElement) {
int index = getHashKey(newElement);
bucket[index]
.insertNode(newElement);
} //addElement
public Node getElement(int iData) {
int index = getHashKey(iData);
Node item =
bucket[index]
.locateNode(iData);
return item;
} // getElement
} // HashChain
public Node locateNode
(int iData, Node current) {
if (iData == current.getData())
return current;
else if (current.getNextNode()== null)
return null;
else
return locateNode
(iData, current.getNextNode());
}
public class Node {
int iData;
Node nextNode;
public Node() { ; }
public Node(int iData) {
this.iData = iData;
}
public void insertNode(int iData) {
insertNode (iData, this);
}
public int getData() {
return iData;
}
public void insertNode
(int iData, Node current) {
if (current.getNextNode() == null)
current.setNextNode
(new Node(iData));
else
insertNode
(iData, current.getNextNode());
}
public Node locateNode(int iData) {
return locateNode(iData, this);
}
public Node getNextNode() {
return nextNode;
}
public void setNextNode(Node nextNode) {
this.nextNode = nextNode;
}
public String toString() {
return "Node: " + iData;
}
}
class Driver{
public static void main(String arg[]) {
HashChain hash = new HashChain(10);
for (int i=0; i< 100; i++)
{
hash.addElement(i);
} // for
for (int i=0; i<100; i++)
{
System.out.println (hash.getElement(i));
} // for
} // main
} // Driver
Summary of Hash Tables
• Purpose: Fast searching of lists by reducing address
space to approx. population size.
• Hash function: the reduction function
• Collision: hash(a) == hash(b), but a!=b
• Collision resolution strategies
– Multiple element buckets still risk collisions
– Open addressing quickly deteriorates to
unordered list
– Chaining is most general solution
Contents of Lecture
•
•
Linked lists
Hash table basics
• Binary search trees
– BSTs in Java
– Balancing a BST
– Search strategies
•
and there’s a bit more coming up, so don’t pack up yet
Trees--Definitions
Recall our previous discussion of Trees.
Defined: A tree is a multiply-linked data structure
path
Internal
node (non
terminal)
fox
branch
hen
height
cat
dog
elf
hat
hog
Leaf
(terminal)
Binary Trees
Binary Trees:
same basic idea
as from
Pseudocode.
A few neat properties of binary trees:
* There exists at most one path between
any two nodes.
* A tree with N nodes has N-1 edges.
* A full binary tree with N internal nodes
has N+1 external nodes.
A
B
D
C
E
F
* The height of a full binary tree with N
nodes is about log2N.
Binary Trees:
Part of a Class
for the Tree’s
Nodes
data
Object
a TreeNode
class TreeNode {
private SomeClass dataObject; // reference to data
private TreeNode left; // reference to left subtree
private TreeNode right; // reference to right subtree
// constructor for leaf node with null references
public TreeNode(SomeClass newObject) {
this( newObject, null, null );
}
// accessor: returns reference to current data object
public SomeClass getObject( ) {
return dataObject ;
}
// accessor: returnes reference to left subtree
public TreeNode getLeft( ) {
return left;
}
// accessor: return reference to right subtree
public TreeNode getRight( ) {
return right;
}
} // class TreeNode
Insertion into a Binary Search Tree
In Class TreeNode:
public void insert (SomeClass newObject) {
if ( newObject.lessThan( dataObject ) ) {
if ( left == null )
left = new TreeNode(newObject);
else
left.insert( newObject );
}
else // treating duplicates as if they’re greaterThan
{
if ( right == null )
right = new TreeNode (newObject);
else
right.insert( newObject );
}
} // insert
More Insertion, plusTraversal
a TreeNode
In Class Tree:
root
public class Tree {
private TreeNode root;
public Tree( ) {
root = null;
}
2
possible
Trees
public void insertInBST
(SomeClass newObject) {
if (root == null)
root = new TreeNode
(newObject);
else
root.insert(newObject );
} // insertInBST
public void inorderTraversal( ) {
inorderHelper (root);
}
root
data
Object
public void inorderHelper
(TreeNode node) {
if (node != null) {
inorderHelper (node.left);
System.out.println
(node.dataObject );
inorderHelper (node.right);
} // if
} // inorderHelper
} // class Tree
Binary Tree Balancing
If unbalanced, binary tree can become linked list:
In best-case, tree search
time is O(log N).
A
B
Problem:
As tree grows out of balance search time deteriorates:
worst-case, the search time
of O(N).
C
Tree as
linked list
creates
worst
case
search time
D
E
Solved by keeping tree
F
in balance!
Binary Tree Balancing
Two general methods of tree balancing
1) Rebalance from sorted list.
E.g., perform inorder traversal followed by tree
reconstruction of sorted array.
Cost: O(N) time to reconstruct global tree
2) Create ‘almost balanced’ binary tree and use local tree balancing
(AVL trees, covered in algorithms courses)
Cost : O(log N) insertion time.
Binary Tree Balancing
Reconstruction Technique:
place nodes in array (in order traversal).
Sort the array
Combine
into a
recursive
call
Take midpoint as new tree head.
Take midpoint of each remaining half as the
left and right child; repeat
New Head
9 12 14 22 25 31 44 47 64 74 84
Left & Right Children
DFS: Simple Example
A DFS of a binary tree is similar to a recursive pre-order traversal,
except that:
1. Pre-order visits entire tree, while DFS stops at at a goal node.
2. The right tree gets visited only if the left fails to hold the goal
node.
A
B
D
C
E
F
To find C, traverse: A-B-D-E-C (but not F)
To find E, traverse: A-B-D-E, etc.
Tree Searching: Recursive
public boolean searchTree(Node item, Node current) {
boolean found = false;
Terminates
if (current == null)
recursive
found = false;
else if (current.equals(item))
call
found = true;
else if (searchTree(current.getLeft())
found = true;
else if (searchTree(current.getRight())
found = true;
else
Note: If binary search tree,
one achieves improved search
found = false;
performance by comparing the
return found;
current node to item, limiting
} // searchTree
search to left or right subtree.
Contents of Lecture
•
•
Linked lists
Hash table basics
• Binary search trees
– Terms, revisited: branch, leaf, etc.
– Java example for Node class (cf. List
Node)
– Search cost degenerates to that for list
unless tree is balanced
– Depth-first search is like preorder traversal,
but terminates when target node is found
•
and there’s really is a bit more coming up, so don’t pack up just yet
Designing with (== choosing among)
Data Structures
Example: Maintaining Very Large
Distributed Databases
Issues
How fast can we search?
How fast can we insert and delete entries?
How large a data structure do we need?
How large a data structure do we need in main memory to
work with?
Very Large Databases
• How large is an English dictionary?
/usr/dict/words: 25,000 words
ARTFL Project, Webster’s Revised Unabridged Dictionary, 1913
Edition: 110,000 words
(http://humanities.uchicago.edu/forms_unrest/webster.form.html)
• How large is the web?
• How fast can we search databases this size?
Searching and Maintaining
Very Large Databases
How fast are these techniques?
Data representation
Search Algorithm
Sorted Array
Binary Search
Ordered Linked List
Linear Search
Binary Search Tree
Tree Search
Balanced Binary Search Tree
Tree Search
O(log n), but how long do they really take on real machines?