Maps, Sets, Hash tables

Download Report

Transcript Maps, Sets, Hash tables

Hash Tables, Maps and Sets
Chapter 15
Wherein we throw all the data into
random array slots and somehow obtain
O(1) retrieval time
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
1
Hash Tables
• Recall order of magnitude of searches
– Linear search O(n)
– Binary search O(log2n)
– Balanced binary tree search O(log2n)
– Unbalanced binary tree can degrade to O(n)
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
2
Hash Tables
• In some situations faster search is needed
– Solution is to use a hash function
– Value of key field given to hash function
– Location in a hash table is calculated
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
3
Hash Functions
• Simple function could be to mod the value of
the key by some arbitrary integer
int h(int i)
{ return
i % someInt;
}
• Note the max number of locations in the
table will be same as someInt
• Note that we have traded speed for wasted
space
– Table must be considerably larger than number
of items anticipated
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
4
Hash Functions
• Observe the problem with same value
returned by h(i) for different values of i
– Called collisions
• A simple solution is linear probing
– Linear search begins at
collision location
– Continues until empty
slot found for insertion
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
5
Hash Functions
• When retrieving a value
linear probe until found
– If empty slot encountered
then value is not in table
• If deletions permitted
– Slot can be marked so
it will not be empty and cause an invalid linear
probe
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
6
Hash Functions
• Strategies for improved performance
– Increase table capacity (less collisions)
– Use different collision resolution technique
– Devise different hash function
• Hash table capacity
– Size of table must be 1.5 to 2 times the size of
the number of items to be stored
– Otherwise probability of collisions is too high
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
7
Collision Strategies
• Linear probing can result in primary
clustering
• Consider quadratic probing
– Probe sequence from location i is
i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …
– Secondary clusters can still form
• Double hashing
– Use a second hash function to determine probe
sequence
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
8
Collision Strategies
• Chaining
– Table is a list or vector of head nodes to linked
lists
– When item hashes to location, it is added to that
linked list
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
9
Improving the Hash Function
• Ideal hash function
– Simple to evaluate
– Scatters items uniformly throughout table
• Modulo arithmetic not so good for strings
– Possible to manipulate numeric (ASCII) value of
first and last characters of a name
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
10
Implementations
Resizable
Hash Table
Array
Set
Interfaces
HashSet
HashMap
Linked
List
TreeSet
ArrayList
List
Map
Balanced
Tree
Hash Table
+ Linked
List
LinkedHashSet
LinkedList
TreeMap
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
LinkedHashMap
11
• HashMap -- an associative data structure
• HashSet -- a container for holding values
• TreeMap – just like HashMap only items are
ordered by key (internal red-black tree)
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
12
the Map Interface
• Map ADT methods:
– get(k): if the map M has an entry with key k, return
its associated value; else, return null
– put(k, v): insert entry (k, v) into the map M; if key k
is not already in M, then return null; else, return
old value associated with k
– remove(k): if the map M has an entry with key k,
remove it from M and return its associated value;
else, return null
– size(), isEmpty()
– keys(): return an iterator of the keys in M
– values(): return an iterator of the values in M
Hash Tables
13
the Set Interface
• Set ADT methods:
– add(o): Adds the specified element to this set if it
is not already present. Returns true if the set does
not already contain element (new item added)
– contains(o): returnts true if the set contains the
specified element
– remove(o): Removes the specified element from
this set if it is present. Returns true if the set
contained the specified element
– size(), isEmpty()
Hash Tables
14
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
15
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
16
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
17
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
18
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
19
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
20
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
21
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
22
Using “Chaining” to resolve
collisions
• A linked list in every cell of array
• No rehashing required
• Unlimited collisions can be handled
• Lousy hash codes will still reduce performance
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
23
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
24