Power Point 2000
Download
Report
Transcript Power Point 2000
Assignment 2:
A Simple Address Book
Andy Wang
Data Structures, Algorithms, and
Generic Programming
Learning Objectives
Gain further experience in C++
Manage project develop using make
Apply the concept of hash functions
Mission
Write a simple address book
Use hash functions to access a table
Insertion
Lookup
Deletion
Deliverables (Due 9/26/2003)
makefile
hashtable.h
hashtable.cpp
main.cpp
Development logs
More on Development Log
Due in each class
Cumulative
9/17, turn in a log that covers 9/15 – 9/17
9/22, turn in a log that covers 9/15 – 9/22
And so on…
Requirements
Create a proj2 directory
makefile
hashtable.h
hashtable.cpp
main.cpp
hashtable.h
Interface of the class HashTable
Required public interface:
bool Insert(const string &name, const string &addr);
bool Lookup(const string &name, string *addr) const;
string Remove(const string &name);
Constructor
Hashtable(unsigned int size);
Default size: 5
Insert()
bool Insert(const string &name, const string &addr);
• Return value
•
•
true on success
false on failure
•
•
Duplicate name
Table is full
Insert()
Name: Michael Jackson
Address:
0
1
2
3
Hash Table
Insert()
Name: Michael Jackson
Address: Never-Never Land
Hash(name)
0
1
2
3
Hash Table
Insert()
Name: Michael Jackson
Address: Never-Never Land
Hash(name)
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
Insert()
Name: Mickey Mouse
Address: Disneyland
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
Insert()
Name: Mickey Mouse
Address: Disneyland
Oh, Boy!
Hash(name)
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
One problem
with hashing—collision
Collision handling techniques:
Increase the hash table size
Chaining
Linear probing
Quadratic probing
Double hashing
Increase the Hash Table Size
If collision, double the size, rehash all entries
+ Conceptually simple
Unbounded growth of table size
Birthday Paradox
In a group of 60 people, you are very likely to find
two people with the same birthday.
The probability of collisions is higher than you think!
Chaining
Requires a data structure from a later lecture
Basic idea: each table entry is associated
with multiple (key, value) pairs
Sequentially look through those pairs
+ incremental growth of the table
- poor performance when an entry is associated
with too many (key, value) pairs
Linear Probing
Sequentially find the next empty table entry
(Hash(key) + 1) % size
(Hash(key) + 2) % size …
+ Simple
+ Use all table entries before size increase
- Clustering (nonuniform distribution of table
entries)
- Can degenerate into sequential searches
Quadratic Probing
Try to avoid clustering
(Hash(key) + 12) % size
2
(Hash(key) + 2 ) % size…
+ Simple
- Secondary clustering (not as severe)
Double Hashing
Use another hash function to determine the
skipping distance
If Hash1(key) points to an occupied entry
Use (Hash1(key) + Hash2(Hash1(key))) % size
(Hash1(key) + 2*Hash2(Hash1(key)) % size…
+ Avoids secondary clustering
+ Widely used in compilers
Lookup()
bool Lookup(const string &name, string *address) const;
• Return value
•
true if found
•
•
address contains the address
false if not found
Lookup()
Name: Michael Jackson
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
Lookup()
Name: Michael Jackson
Hash(name)
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
Remove()
string Remove(const string &name) const;
• Return value
•
The address associated with the name
Remove()
Name: Michael Jackson
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
Remove()
Name: Michael Jackson
Hash(name)
0
1 Michael Jackson Never-Never Land
2
3
Hash Table
Remove()
Name: Michael Jackson
Hash(name)
0
1
2
3
Hash Table
Your Hash Function
Hash(name) // may contain spaces
Develop/choose your own hash function(s)
Use all characters in the name
Use the ordering information of each character
Short names should not cluster in the table
Milestones
1. Create all files
Add #include into .cpp files
Add #ifndef into .h files
Add files names into the makefile
make clean
make
Milestones
2. Add the class definitions to the header files
Try to compile
3. Write the skeleton code in your .cpp files
Nothing but empty functions
Try to get the interface to compile
4. Write pseudo code (comments) in each
skeleton function
Milestones
5. Implement the constructor/destructor
routines
Add private functions as needed
Compile and test those routines
6. Implement a dump routine to visualize the
hash table
Milestones
7. Implement the accessor routines that only
read your private data members
Compile and test those routines
8. Implement the accessor routines that write to
your private data members
Compile and test those routines
Milestones
9. Implement the Insert routine
Implement the associated hash function(s)
Test the hash function in isolation
Check for clustering
Check for duplicate names
10. Implement collision handling
Milestones
11. Implement the Lookup routine
Compile and test the routine
12. Implement the Remove routine. Compile
and test the routine
Pace Yourself
Try to accomplish 1 to 2 milestones per day
Start your project early!