Lecture 6 Notes

CS203 LECTURE 6
John Hurley
Cal State LA
2
Hashing
An object may contain an arbitrary amount of data, and
searching a data structure that contains many large objects
is expensive.
• Suppose your collection of Strings stores the text of various
books, you are adding a book, and you need to make sure
you are preserving the Set definition, i.e. that no book
already in the Set has the same text as the one you are
adding.
A hash function maps each datum to a value of a fixed and
manageable size. This reduces the search space and makes
searching less expensive.
Hash functions must be deterministic, since when we search
for an item we will search for its hashed value. If an identical
item is in the list, it must have received the same hash value.
3
Hashing
Any function that maps larger data to smaller ones must map more than
one possible original datum to the same mapped value
Diagram from Wikipedia
When more than one item in a collection receives the same hash value, a
collision is said to occur. There are various ways to deal with this. The
simplest is to create a list of all items with the same hash code, and do a
sequential or binary search of the list if necessary
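The list-per-hash-code approach described above can be sketched roughly as follows. This is only an illustration, not how java.util.HashSet is actually implemented; the class name ChainedStringSet and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical, minimal set of Strings that resolves collisions by chaining.
class ChainedStringSet {
    // One list ("bucket") per possible bucket index.
    private final List<List<String>> buckets = new ArrayList<>();

    ChainedStringSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    private List<String> bucketFor(String s) {
        return buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
    }

    // Returns false (and changes nothing) if an equal string is already present.
    boolean add(String s) {
        List<String> bucket = bucketFor(s);
        if (bucket.contains(s)) { // sequential search of one bucket only
            return false;
        }
        bucket.add(s);
        return true;
    }

    boolean contains(String s) {
        return bucketFor(s).contains(s);
    }

    public static void main(String[] args) {
        ChainedStringSet set = new ChainedStringSet(4);
        System.out.println(set.add("Aa"));      // true
        // "Aa" and "BB" both have hashCode 2112, so they collide and share a bucket
        System.out.println(set.add("BB"));      // true
        System.out.println(set.add("Aa"));      // false: duplicate
        System.out.println(set.contains("BB")); // true
    }
}
```

Only the items in one bucket are compared with equals(), which is what keeps the search cheap when collisions are rare.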
4
Hashing
Hash functions often use the data to calculate some numeric
value, then perform a modulo operation.
• This results in a bounded-length hash value. If the
calculation ends in % n, the maximum hash value is n-1, no
matter how large the original data is.
• It also tends to produce hash values that are relatively
uniformly distributed, minimizing collisions.
• Modulo may be used again within a hash-based data
structure in order to scale the hash values to the number of
buckets (slots) in the structure's table
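A minimal sketch of "calculate a number, then take it modulo n" is shown below. The name toyHash and the character-sum scheme are assumptions made for illustration; this is not how String.hashCode() works, and a plain character sum distributes values poorly (anagrams always collide).

```java
// Hypothetical toy hash function, for illustration only.
class ModuloHashSketch {
    // Add up the character codes, then reduce modulo n.
    // The result is always in the range 0 .. n-1, however long the input is.
    static int toyHash(String data, int n) {
        int sum = 0;
        for (int i = 0; i < data.length(); i++) {
            sum += data.charAt(i);
        }
        // Math.floorMod keeps the result non-negative even if sum overflowed.
        return Math.floorMod(sum, n);
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println(toyHash("AB", n)); // 65 + 66 = 131, and 131 % 10 = 1
        System.out.println(toyHash("BA", n)); // 1 again: same characters, so a collision
        System.out.println(toyHash("AB", n) == toyHash("AB", n)); // true: deterministic
    }
}
```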
5
Hashing
The more sparse the data, the more useful hashing is
Sparse:
A, AA, AB, AC, AD, AE, AF, AG, AH, AI, AJ, AK, AL, AM,
AN, AO, AP, AQ, AR, AS, AT, AU, AV, AW, AX, AY, AZ
Not Sparse:
Juror 1, Juror 2, Juror 3, Juror 4
6
Hashing
Hashing is used for many purposes in programming. The one
we are interested in right now is that it makes it easier to look
up data or memory addresses in a table
7
Sets
A Set differs from a List in these ways:
• a Set does not have any inherent order
• a Set may not contain any duplicate elements
• This means that a set may not contain any two elements e1 and e2
such that e1.equals(e2)
There are several types of Set in the Java Collections
Framework. The most important ones are HashSet and
TreeSet
8
The Collection interface is the root interface
for manipulating a collection of objects.
The Set Interface
Hierarchy
9
10
The AbstractSet Class
The AbstractSet class is a convenience class that extends
AbstractCollection and implements Set. The AbstractSet class
provides concrete implementations for the equals method and
the hashCode method.
The hash code of a set is the sum of the hash codes of all the
elements in the set. Since the size method and iterator
method are not implemented in the AbstractSet class,
AbstractSet is an abstract class.
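As a quick check of the "sum of the element hash codes" rule, the sketch below recomputes the sum by hand and compares it with what a HashSet (which inherits hashCode from AbstractSet) reports. The helper name sumOfElementHashes is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class SetHashCodeSketch {
    // Hypothetical helper: adds up the hash codes of the elements (null counts as 0).
    static int sumOfElementHashes(Set<?> set) {
        int sum = 0;
        for (Object o : set) {
            sum += (o == null) ? 0 : o.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add("Brutus");
        names.add("Cicero");
        System.out.println(names.hashCode() == sumOfElementHashes(names)); // true
    }
}
```

Because addition does not depend on order, two sets with the same elements get the same hash code no matter how they were built.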
11
The HashSet Class
The HashSet class is a concrete class
that implements Set.
Objects added to a hash set need to
implement the hashCode() method
consistently with equals(): objects that are
equal must return the same hash code
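Every class inherits a hashCode() from Object, so the practical requirement is to override it together with equals(). The sketch below uses a hypothetical GridPoint class to show one common way of doing that; it is an illustration, not the only valid approach.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used only for this illustration.
class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Built from the same fields as equals(), so equal points get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<GridPoint> points = new HashSet<>();
        points.add(new GridPoint(1, 2));
        points.add(new GridPoint(1, 2)); // equal to the first point, so it is ignored
        System.out.println(points.size()); // 1
    }
}
```

Without the hashCode() override, the two equal points would usually land in different buckets and the set would keep both.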
12
Duplicates
Attempts to add duplicate records to a set are ignored:
package demos;
import java.util.HashSet;
import java.util.Set;
public class Demo {
public static void main(String[] args) {
Set<String> nameSet = new HashSet<String>();
nameSet.add("Brutus");
nameSet.add("Cicero");
nameSet.add("Spartacus");
printAll(nameSet);
nameSet.add("Spartacus");
printAll(nameSet);
nameSet.add("Spartacus");
printAll(nameSet);
}
public static <T> void printAll(Set<T> set){
System.out.println(" set contains these records: ");
for(T t: set){
System.out.println(t);
}
}
}
13
Hash Set Demo
package demos;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class HashSetDemo {
    // heavily adapted from http://www-inst.eecs.berkeley.edu/~cs61c/sp13/labs/06/ //
    public static void main(String args[]) {
        String input = "The right of the people to be secure in their persons, houses, papers, and "
                + "effects, against unreasonable searches and seizures, shall not be violated, and no "
                + "Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and "
                + "particularly describing the place to be searched, and the persons or things to be seized.";
        Set<String> stringSet = new HashSet<String>();
        // \\W+ means "one or more characters that are not alphanumeric or underscores"
        String[] words = input.split("\\W+");
        System.out.println("Number of words in the input is " + words.length);
        for (String s : words) {
            stringSet.add(s.toLowerCase());
        }
        System.out.println("The number of unique words in the input is "
                + stringSet.size());
        Iterator<String> myIterator = stringSet.iterator();
        while (myIterator.hasNext()) {
            System.out.println(myIterator.next());
        }
    }
}
14
LinkedHashSet
Note that the words in the output from the
last example are not in the same order as
they appeared in the original input.
LinkedHashSet preserves insertion order
by maintaining a linked list that runs
through the elements of a hash-based set
Change the HashSet in the example to
LinkedHashSet and compare the output.
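For a smaller experiment than the full demo, the sketch below removes duplicates from a short word list while keeping first-seen order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkedHashSetOrderSketch {
    // Returns the distinct words in the order in which they first appeared.
    static List<String> distinctInOrder(String[] words) {
        Set<String> seen = new LinkedHashSet<>(Arrays.asList(words));
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String[] words = {"the", "right", "of", "the", "people"};
        System.out.println(distinctInOrder(words)); // [the, right, of, people]
    }
}
```

With a plain HashSet the same elements would be kept, but the iteration order would not be guaranteed.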
15
SortedSet Interface and TreeSet Class
• SortedSet is a subinterface of Set, which guarantees that
the elements in the set are sorted.
• TreeSet is a concrete class that implements the
SortedSet interface.
• You can use an iterator to traverse the elements in the
sorted order.
• The elements can be sorted in two ways.
• One way is to use the Comparable interface.
• The other is to specify a comparator for the elements in the set.
This approach is referred to as order by comparator.
• To see a TreeSet in action, return to the last demo and
replace the HashSet with a TreeSet.
• We will discuss the implementation of trees next week.
16
SortedSet Interface and TreeSet Class
• To order by Comparator, write a class
that implements Comparator and pass
it to the TreeSet constructor
• Be careful: ordering by comparator
results in only one entry per group of
inputs that are equal according to the
compare() method in the Comparator
17
SortedSet Interface and TreeSet Class
package demos;
import java.util.Comparator;
public class StringSortByLength implements Comparator<String> {
@Override
public int compare(String s1, String s2) {
return s1.length() - s2.length();
}
}
18
SortedSet Interface and TreeSet Class
package demos;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
public class TreeSetDemo {
// heavily adapted from http://www-inst.eecs.berkeley.edu/~cs61c/sp13/labs/06/ //
public static void main(String args[]) {
String input = "The right of the people to be secure in their persons, houses, papers, "
+"and effects, against unreasonable searches and seizures, shall not be "
+"violated, and no Warrants shall issue, but upon probable cause, supported "
+"by Oath or affirmation, and particularly describing the place to be "
+"searched, and the persons or things to be seized.";
Set<String> stringSet = new TreeSet<String>(new StringSortByLength());
// \\W+ means "one or more characters that are not alphanumeric or underscores"
String[] words = input.split("\\W+");
System.out.println("Number of words in the input is " + words.length);
for (String s: words) {
stringSet.add(s.toLowerCase());
}
System.out.println("The number of distinct word lengths in the input is "
+ stringSet.size());
Iterator<String> myIterator = stringSet.iterator();
while (myIterator.hasNext()) {
String nextString = myIterator.next();
System.out.println(nextString.length() + ": First example added: " + nextString);
}
}
}
19
Comparator
package booksdemo;
public class Book implements Comparable<Book>{
String author;
String title;
String isbn;
public Book(String author, String title, String isbn) {
super();
this.author = author;
this.title = title;
this.isbn = isbn;
}
public String getAuthor() {
return author;
}
public String getTitle() {
return title;
}
public String getIsbn() {
return isbn;
}
@Override
public int compareTo(Book otherBook) {
int authDiff = author.compareTo(otherBook.getAuthor());
if(authDiff != 0) return authDiff;
else return title.compareTo(otherBook.getTitle());
}
public String toString(){
return "Author: " + author + " Title: " + title + " ISBN: " + isbn;
}
}
Comparator
package booksdemo;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;
public class BookTreeSetDemo {
TreeSet<Book> theSet;
public BookTreeSetDemo(){
theSet = new TreeSet<Book>();
}
public BookTreeSetDemo(Comparator<Book> comp){
theSet = new TreeSet<Book>(comp);
}
public void addAll(Collection <Book> c){
theSet.addAll(c);
}
public void printAll(){
System.out.println("The number of items in the set is " + theSet.size());
Iterator<Book> myIterator = theSet.iterator();
while (myIterator.hasNext()) {
System.out.println(myIterator.next().toString());
}
}
public static void main(String[] args){
Book b1 = new Book("Smith", "Basketweaving 101", "1234-5678-9012");
Book b2 = new Book("Smith", "Basketweaving 101", "2345-6789-0123");
Book b3 = new Book("Smith", "Basketweaving 101", "2345-6789-0123");
Book b4 = new Book("Jones", "Basketweaving 102", "3456-7890-1234");
List<Book> l = new LinkedList<Book>();
l.add(b1);
l.add(b2);
l.add(b3);
l.add(b4);
BookTreeSetDemo c1 = new BookTreeSetDemo();
c1.addAll(l);
c1.printAll();
Comparator<Book> comp = new BookISBNComparator();
BookTreeSetDemo c2 = new BookTreeSetDemo(comp);
c2.addAll(l);
c2.printAll();
}
}
20
21
Comparator
package booksdemo;
import java.util.Comparator;
public class BookISBNComparator implements Comparator<Book>{
@Override
public int compare(Book b1, Book b2) {
return b1.getIsbn().compareTo(b2.getIsbn());
}
}
22
Maps
A List or array can be thought of as a set of key-value pairs in
which the keys are integers (the indexes) and the values are
the data being stored.
Suppose we want to be able to look up values using a key
other than an integer index. For example, we need to look up
friends' addresses based on their names. We could write a
class with instance variables for name and address and then
construct a List or Set. When we need to look up an address,
we iterate through the list looking for a match for the name of
the person whose address we want to look up.
Maps provide a simpler alternative by mapping keys of any
type to values of any other type.
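The two approaches can be compared side by side in a rough sketch. The Friend class, the method names, and the sample names and addresses are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AddressLookupSketch {
    // Hypothetical class pairing a name with an address.
    static class Friend {
        final String name;
        final String address;

        Friend(String name, String address) {
            this.name = name;
            this.address = address;
        }
    }

    // List approach: iterate until the name matches.
    static String findInList(List<Friend> friends, String name) {
        for (Friend f : friends) {
            if (f.name.equals(name)) {
                return f.address;
            }
        }
        return null; // not found
    }

    // Map approach: build a name -> address map once, then look up directly.
    static Map<String, String> index(List<Friend> friends) {
        Map<String, String> byName = new HashMap<>();
        for (Friend f : friends) {
            byName.put(f.name, f.address);
        }
        return byName;
    }

    public static void main(String[] args) {
        List<Friend> friends = new ArrayList<>();
        friends.add(new Friend("Ana", "12 Elm St"));
        friends.add(new Friend("Raj", "34 Oak Ave"));
        System.out.println(findInList(friends, "Raj")); // 34 Oak Ave
        Map<String, String> byName = index(friends);
        System.out.println(byName.get("Raj"));          // 34 Oak Ave
        System.out.println(byName.get("Zed"));          // null: no such key
    }
}
```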
23
The Map Interface
The Map interface maps keys to elements (values). The keys are
like indexes. In a List, the indexes are integers. In a Map, the keys
can be any objects.
24
Map Interface and Class Hierarchy
An instance of Map represents a group of objects,
each of which is associated with a key. You can get
the object from a map using a key, and you have to
use a key to put the object into the map.
25
The Map Interface UML Diagram
Concrete Map Classes
26
Entry
27
28
HashMap and TreeMap
The HashMap and TreeMap classes are two
concrete implementations of the Map
interface.
• HashMap is efficient for locating a value,
inserting a mapping, and deleting a
mapping.
• TreeMap, which implements SortedMap, is
efficient for traversing the keys in a sorted
order.
29
HashMap
Map<String, String> myDict = new HashMap<String, String>();
myDict.put("evacuate", "remove to a safe place");
myDict.put("descend", "move or fall downwards");
myDict.put("hypochondriac", "a person who is abnormally anxious about their health");
myDict.put("injunction", "an authoritative warning or order");
myDict.put("creek", "a stream, brook, or a minor tributary of a river");
myDict.put("googol", "10^100");
String defString = "The definition of ";
System.out.println(defString + "descend : " + myDict.get("descend"));
System.out.println(defString + "injunction : " + myDict.get("injunction"));
System.out.println(defString + "googol : " + myDict.get("googol"));
// http://www-inst.eecs.berkeley.edu/~cs61c/sp13/labs/06/
30
LinkedHashMap
• The entries in a HashMap are not ordered.
• LinkedHashMap extends HashMap with a linked list
implementation that supports an ordering of the
entries in the map. Entries in a LinkedHashMap can
be retrieved in the order in which they were inserted
into the map (known as the insertion order), or the
order in which they were last accessed, from least
recently accessed to most recently (access order).
The no-arg constructor constructs a LinkedHashMap
with the insertion order.
• To construct a LinkedHashMap with the access order, use
LinkedHashMap(initialCapacity, loadFactor, true).
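Access order is easiest to see with a small experiment. In the sketch below, the class and method names are hypothetical; 16 and 0.75f are the documented default capacity and load factor, and the third argument (true) selects access order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderSketch {
    static List<String> keysAfterAccess() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // reading "a" makes it the most recently accessed entry
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Least recently accessed first, most recently accessed last
        System.out.println(keysAfterAccess()); // [b, c, a]
    }
}
```

With the no-arg constructor the same calls would print [a, b, c], since insertion order ignores reads.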
31
Example: LinkedHashMap
// adapted from http://www.tutorialspoint.com/java/java_linkedhashmap_class.htm
// (the imports and the class name LinkedHashMapDemo are added here so the example compiles)
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map.Entry;
import java.util.Set;
public class LinkedHashMapDemo {
    public static void main(String args[]) {
        // Create a linked hash map
        LinkedHashMap<String, Double> lhm = new LinkedHashMap<String, Double>();
        // Put elements into the map (autoboxing replaces the deprecated new Double(...))
        lhm.put("Zara", 3434.34);
        lhm.put("Mahnaz", 123.22);
        lhm.put("Ayan", 1378.00);
        lhm.put("Daisy", 99.22);
        lhm.put("Qadir", -19.08);
        // Get a set of the entries
        Set<Entry<String, Double>> set = lhm.entrySet();
        // Get an iterator
        Iterator<Entry<String, Double>> i = set.iterator();
        // Display elements in insertion order
        while (i.hasNext()) {
            Entry<String, Double> me = i.next();
            System.out.print(me.getKey() + ": ");
            System.out.println(me.getValue());
        }
        System.out.println();
        // Deposit 1000 into Zara's account
        double balance = lhm.get("Zara");
        lhm.put("Zara", balance + 1000);
        System.out.println("Zara's new balance: " + lhm.get("Zara"));
    }
}
32
TreeMap
package demos;
import java.util.TreeMap;
public class Demo {
//adapted from http://www.roseindia.net/java/jdk6/TreeMapExample.shtml
public static void main(String[] args) {
TreeMap<Integer, String> tMap = new TreeMap<Integer, String>();
// inserting data in alphabetical order by entry value
tMap.put(6, "Friday");
tMap.put(2, "Monday");
tMap.put(7, "Saturday");
tMap.put(1, "Sunday");
tMap.put(5, "Thursday");
tMap.put(3, "Tuesday");
tMap.put(4, "Wednesday");
// data ends up sorted by key value
System.out.println("Keys of tree map: " + tMap.keySet());
System.out.println("Values of tree map: " + tMap.values());
System.out.println("Key: 5 value: " + tMap.get(5) + "\n");
System.out.println("First key: " + tMap.firstKey() + " Value: "
+ tMap.get(tMap.firstKey()) + "\n");
System.out.println("Last key: " + tMap.lastKey() + " Value: "
+ tMap.get(tMap.lastKey()) + "\n");
System.out.println("All values with an enhanced for loop: ");
for(String s: tMap.values())
System.out.println(s);
System.out.println("First three values: ");
for(String s: tMap.headMap(4).values())
System.out.println(s);
System.out.println("Values starting with fourth value: ");
for(String s: tMap.tailMap(4).values())
System.out.println(s);
System.out.println("Removing first datum: " +
tMap.remove(tMap.firstKey()));
System.out.println("Now the tree map Keys: " + tMap.keySet());
System.out.println("Now the tree map contain: " + tMap.values() +
"\n");
System.out.println("Removing last entry: " +
tMap.remove(tMap.lastKey()));
System.out.println("Now the tree map Keys: " + tMap.keySet());
System.out.println("Now the tree map contain: " + tMap.values());
}
}
34
Case Study: Counting the Occurrences of
Words in a Text
This program counts the occurrences of words in a text and displays the words
and their occurrences in ascending alphabetical order.
The program uses a hash map to store a pair consisting of a word and its
count.
Algorithm for handling input:
For each word in the input file
Remove non-letters (such as punctuation marks) from the word.
If the word is already present in the frequencies map
Increment the frequency.
Else
Set the frequency to 1
To sort the map, convert it to a tree map.
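The algorithm above, including the final conversion to a tree map, can be sketched on an in-memory string instead of a file. The class name, the method name, and the use of replaceAll with an ASCII-only letter class are assumptions made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequencies = new HashMap<>();
        for (String token : text.split("\\s+")) {
            // Remove non-letters (ASCII letters only in this sketch) and normalize case
            String word = token.replaceAll("[^A-Za-z]", "").toLowerCase();
            if (word.isEmpty()) {
                continue; // the token contained no letters
            }
            Integer count = frequencies.get(word);
            // If the word is new, set the frequency to 1; otherwise increment it
            frequencies.put(word, count == null ? 1 : count + 1);
        }
        // Copying into a TreeMap sorts the entries by key
        return new TreeMap<>(frequencies);
    }

    public static void main(String[] args) {
        System.out.println(countWords("To be, or not to be")); // {be=2, not=1, or=1, to=2}
    }
}
```

Counting in a HashMap and sorting once at the end avoids paying the tree's ordering cost on every update; using a TreeMap from the start is simpler and gives the same output.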
35
package frequency;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
public class WordFrequency {
public static void main(String[] args) throws FileNotFoundException {
Map<String, Integer> frequencies = new TreeMap<String, Integer>();
Scanner in = new Scanner(new File("romeojuliet.txt"));
while (in.hasNext()) {
String word = clean(in.next());
// Skip tokens with no letters; use isEmpty() because != compares references, not contents
if (!word.isEmpty()) {
// Get the old frequency count
Integer count = frequencies.get(word);
// If there was none, put 1; otherwise, increment the count
if (count == null) {
count = 1;
} else {
count = count + 1;
}
frequencies.put(word, count);
}
}
// Print all words and counts
for (String key : frequencies.keySet()) {
System.out.println(key + ": " + frequencies.get(key));
}
}
/**
* Removes characters from a string that are not letters.
*
* @param s a string
* @return a lowercase string containing only the letters from s
*/
public static String clean(String s) {
String r = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (Character.isLetter(c)) {
r = r + c;
}
}
return r.toLowerCase();
}
}
37
Pattern Matching
Programming often requires testing strings for particular
patterns
• Search text for instances of a particular word, like using
<ctrl> f in a word processor
• Search for instances of a word that might have variant
spelling, like color and colour
• Search for instances of a range of different substrings, like
any integer
• Test a string to see whether it matches a complex pattern,
like a url or a Social Security Number (xxx-xx-xxxx, where
each x stands for a single digit)
38
Regular Expressions
• In many cases we can specify exactly the characters we
want to match, like 'c'
• In others, a pattern includes a single character which can
match any other character
• Sometimes a substring can match any string, as when
searching a Windows directory for all files with a particular
filename extension
• In yet other cases, a substring must match any of a range of
possible substrings, e.g. validating input for an email address
• Regular Expressions provide a way to specify
characteristics of strings that is flexible enough to use in all
these cases.
39
Regular Expressions
• You will use RegExs in many contexts, but a very common
one is the matches() method of the String class.
• Consider two variants of the same Irish name, Hurley and
O'Hurley. "John Hurley".equals("John O'Hurley") is false.
However, we can use matches() with a RegEx if we are
searching for either one.
String h1 = "John Hurley";
String h2 = "John O'Hurley";
System.out.println(h1.equals(h1)); // true
System.out.println(h1.equals(h2)); // false
System.out.println(h1.matches(h2)); // false: h2 is used as a regex that matches only itself
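One pattern that accepts both spellings makes the "O'" prefix optional. The sketch below is an illustration; the class name, constant name, and method name are hypothetical.

```java
class NameMatchSketch {
    // (O')? means the two characters O' may appear zero or one time
    static final String HURLEY_PATTERN = "John (O')?Hurley";

    static boolean isEitherVariant(String name) {
        // matches() succeeds only if the whole string fits the pattern
        return name.matches(HURLEY_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(isEitherVariant("John Hurley"));   // true
        System.out.println(isEitherVariant("John O'Hurley")); // true
        System.out.println(isEitherVariant("John Burley"));   // false
    }
}
```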
40
RegExs
Pattern         Matches                             Example
x               a specified character x             Java matches Java
.               any single character                Java matches J..a
(ab|cd)         ab or cd                            ten matches t(en|im)
[abc]           a, b, or c                          Java matches Ja[uvwx]a
[^abc]          any character except a, b, or c     Java matches Ja[^ars]a
[a-z]           a through z                         Java matches [A-M]av[a-d]
[^a-z]          any character except a through z    Java matches Jav[^b-d]
[a-e[m-p]]      a through e or m through p          Java matches [A-G[I-M]]av[a-d]
[a-e&&[c-p]]    intersection of a-e with c-p        Java matches [A-P&&[I-M]]av[a-d]
41
RegExs
Pattern   Matches                                    Example
\d        a digit, same as [0-9]                     Java2 matches "Java[\\d]"
\D        a non-digit                                $Java matches "[\\D][\\D]ava"
\w        a word character                           Java1 matches "[\\w]ava[\\w]"
\W        a non-word character                       $Java matches "[\\W][\\w]ava"
\s        a whitespace character                     "Java 2" matches "Java\\s2"
\S        a non-whitespace char                      Java matches "[\\S]ava"
p*        zero or more occurrences of pattern p      Java and av match "[A-z]*"
                                                     aaa matches "a*"; bbb does not
p+        one or more occurrences of pattern p       b, aa, and ZZZ match "[A-z]+"
p?        zero or one occurrence of pattern p        Java and ava match "J?ava"
p{n}      exactly n occurrences of pattern p         Java matches "Ja{1}va"
                                                     Java does not match "Ja{2}va"
p{n,}     at least n occurrences of pattern p        Java and Jaaava match "Ja{1,}va"
                                                     Java does not match "Ja{2,}va"
p{n,m}    between n and m occurrences of pattern p   a matches "a{1,9}"
                                                     aaaaaaaaaa does not match "a{1,9}"
                                                     Java does not match "Ja{2,9}va"
42
RegExs
• Backslash is a special character that starts an escape sequence in a string. So
you need to use "\\d" in Java to represent \d.
• A whitespace (or a whitespace character) is any character which does not
display itself but does take up space. The characters ' ', '\t', '\n', '\r', '\f' are
whitespace characters. In Java, \s is the same as [ \t\n\x0B\f\r] (the set also
includes the vertical tab, \x0B), and \S is the same as [^\s].
• *, +, ?, {n}, {n,}, and {n,m} in the table above are quantifiers that specify how many times
the pattern before a quantifier may repeat. For example, A* matches zero or
more A’s, A+ matches one or more A’s, A? matches zero or one A’s, A{3}
matches exactly AAA, A{3,} matches at least three A’s, and A{3,6} matches
between 3 and 6 A’s. * is the same as {0,}, + is the same as {1,}, and ? is the
same as {0,1}.
• Do not use spaces in the repeat quantifiers. For example, A{3,6} cannot be
written as A{3, 6} with a space after the comma.
• You may use parentheses to group patterns. For example, (ab){3} matches
ababab, but ab{3} matches abbb
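The quantifier and grouping rules above can be tried directly with String.matches(). The class name and the helper check are hypothetical; the patterns are the ones from the notes.

```java
class QuantifierSketch {
    // Thin wrapper so each case reads as "does this input match this pattern?"
    static boolean check(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("AAA", "A{3}"));       // true: exactly three A's
        System.out.println(check("AAAA", "A{3,6}"));    // true: between 3 and 6 A's
        System.out.println(check("AA", "A{3,}"));       // false: fewer than three A's
        System.out.println(check("", "A*"));            // true: zero occurrences are allowed
        System.out.println(check("ababab", "(ab){3}")); // true: the group ab repeats 3 times
        System.out.println(check("abbb", "ab{3}"));     // true: only b repeats
        System.out.println(check("ababab", "ab{3}"));   // false
    }
}
```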
43
RegExs
String ssNum = "123-45-6789";
String notSsNum = "123456789";
String ssPat = "[\\d]{3}-[\\d]{2}-[\\d]{4}"; // Social Security Number
System.out.println(ssNum.matches(ssPat)); // true
System.out.println(ssNum.matches(notSsNum)); // false
• Always provide users with guidance when they must match a regex!
44
RegExs
// requires: import javax.swing.JOptionPane;
public static void main(String[] args) {
    String ssNum = null;
    String ssPatt = "[\\d]{3}-[\\d]{2}-[\\d]{4}";
    String input;
    String basePrompt = "Please enter your Social Security Number in the format XXX-XX-XXXX";
    String prompt = basePrompt;
    do {
        input = JOptionPane.showInputDialog(null, prompt);
        // showInputDialog returns null if the user cancels, so check before calling matches()
        if (input != null && input.matches(ssPatt)) ssNum = input;
        else prompt = "Invalid input. " + basePrompt;
    } while (ssNum == null);
    JOptionPane.showMessageDialog(null, "Thanks, taxpayer # " + ssNum);
}
45
RegExs
String tmi = "My Social Security number is 123-45-6789 and "
+ "my telephone number is (123)456-7890";
String sanitized = tmi.replaceAll("[\\d]{3}-[\\d]{2}-[\\d]{4}",
"XXX-XX-XXXX").replaceAll("\\([\\d]{3}\\)[\\d]{3}-[\\d]{4}", "(XXX)XXX-XXXX");
System.out.println(sanitized);