out.println(ssNum.matches(ssPat))

Download Report

Transcript out.println(ssNum.matches(ssPat))

LECTURE 8
CS203
2
Maps
A List or array can be thought of as a set of key-value pairs in
which the keys are integers (the indexes) and the values are
the data being stored.
Suppose we want to be able to look up values using a key
other than an integer index. For example, we need to look up
friends' addresses based on their names. We could write a
class with instance variables for name and address and then
construct a List or Set. When we need to look up an address,
we iterate through the list looking for a match for the name of
the person whose address we want to look up.
Maps provide a simpler alternative by mapping keys of any
type to values of any other type.
3
The Map Interface
The Map interface contains methods that map keys to the
elements. The keys serve the same function as array or lists
indices. In List, the indexes are integer, but in Map, the keys can be
any objects.
An instance of Map represents a group of objects (values) and a
group of keys. You get the object from a Map by searching for the
key. Correspondingly, you supply the key when you add a value to
the map.
A Map is parameterized with two types, the type of the key and the
type of the value. For example, a Map <Long, Student> would be
convenient for looking up students by CIN
4
The Map Interface
5
The Map Interface UML Diagram
6
Concrete Map Classes
7
Entry
8
HashMap and TreeMap
The HashMap and TreeMap classes are two
concrete implementations of the Map
interface.
• HashMap is efficient for locating a value,
inserting a mapping, and deleting a
mapping.
• TreeMap, which implements SortedMap, is
efficient for traversing the keys in a sorted
order.
9
HashMap
Map<String, String> myDict = new HashMap<String, String>();
myDict.put("evacuate", "remove to a safe place");
myDict.put("descend", "move or fall downwards");
myDict.put("hypochondriac", "a person who is abnormally anxious about their health");
myDict.put("injunction", "an authoritative warning or order");
myDict.put("creek", "a stream, brook, or a minor tributary of a river");
myDict.put("googol", "10e100");
String defString = "The definition of ";
System.out.println(defString + "descend : " + myDict.get("descend"));
System.out.println(defString + "injunction : " + myDict.get("injunction"));
System.out.println(defString + "googol : " + myDict.get("googol"));
// http://www-inst.eecs.berkeley.edu/~cs61c/sp13/labs/06/
10
LinkedHashMap
• The entries in a HashMap are not ordered in a way
that is typically useful to the programmer.
• LinkedHashMap extends HashMap with a linked list
implementation that supports an ordering of the
entries in the map. Entries in a LinkedHashMap can
be retrieved in the order in which they were inserted
into the map (known as the insertion order), or the
order in which they were last accessed, from least
recently accessed to most recently (access order).
The no-arg constructor constructs a LinkedHashMap
with the insertion order.
• LinkedHashMap<key type, value type>(initialCapacity,
loadFactor, true).
11
Example: LinkedHashMap
// adapted from http://www.tutorialspoint.com/java/java_linkedhashmap_class.htm
public static void main(String args[]) {
// Create a hash map
LinkedHashMap<String, Double> lhm = new LinkedHashMap<String, Double>();
// Put elements to the map
lhm.put("Zara", new Double(3434.34));
lhm.put("Mahnaz", new Double(123.22));
lhm.put("Ayan", new Double(1378.00));
lhm.put("Daisy", new Double(99.22));
lhm.put("Qadir", new Double(-19.08));
// Get a set of the entries
Set<Entry<String, Double>> set = lhm.entrySet();
// Get an iterator
Iterator<Entry<String, Double>> i = set.iterator();
// Display elements
while (i.hasNext()) {
Entry<String, Double> me = i.next();
System.out.print(me.getKey() + ": ");
System.out.println(me.getValue());
}
System.out.println();
// Deposit 1000 into Zara's account
double balance = lhm.get("Zara").doubleValue();
lhm.put("Zara", new Double(balance + 1000));
System.out.println("Zara's new balance: " + lhm.get("Zara"));
}
12
TreeMap
package demos;
import java.util.TreeMap;
public class Demo {
//adapted from http://www.roseindia.net/java/jdk6/TreeMapExample.shtml
public static void main(String[] args) {
TreeMap<Integer, String> tMap = new TreeMap<Integer, String>();
// inserting data in alphabetical order by entry value
tMap.put(6, "Friday");
tMap.put(2, "Monday");
tMap.put(7, "Saturday");
tMap.put(1, "Sunday");
tMap.put(5, "Thursday");
tMap.put(3, "Tuesday");
tMap.put(4, "Wednesday");
// data ends up sorted by key value
System.out.println("Keys of tree map: " + tMap.keySet());
System.out.println("Values of tree map: " + tMap.values());
System.out.println("Key: 5 value: " + tMap.get(5) + "\n");
System.out.println("First key: " + tMap.firstKey() + " Value: "
+ tMap.get(tMap.firstKey()) + "\n");
System.out.println("Last key: " + tMap.lastKey() + " Value: "
+ tMap.get(tMap.lastKey()) + "\n");
13
TreeMap
System.out.println("All values with an enhanced for loop: ");
for(String s: tMap.values())
System.out.println(s);
System.out.println("First three values: ");
for(String s: tMap.headMap(4).values())
System.out.println(s);
System.out.println("Values starting with fourth value: ");
for(String s: tMap.tailMap(4).values())
System.out.println(s);
System.out.println("Removing first datum: " +
tMap.remove(tMap.firstKey()));
System.out.println("Now the tree map Keys: " + tMap.keySet());
System.out.println("Now the tree map contain: " + tMap.values() +
"\n");
System.out.println("Removing last entry: " +
tMap.remove(tMap.lastKey()));
System.out.println("Now the tree map Keys: " + tMap.keySet());
System.out.println("Now the tree map contain: " + tMap.values());
}
}
14
Counting the Occurrences of Words in a Text
This program counts the occurrences of words in a text and displays the words and their
occurrences in ascending alphabetical order. The program uses a hash map to store a
pair consisting of a word and its count.
Algorithm for handling input:
For each word in the input file
Remove non-letters (such as punctuation marks) from the word.
If the word is already present in the frequencies map
Increment the frequency.
Else
Set the frequency to 1
To sort the map, convert it to a tree map.
Source: Liang
15
package frequency;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;
public class WordFrequency {
public static void main(String[] args) throws FileNotFoundException {
Map<String, Integer> frequencies = new TreeMap<String, Integer>();
// in your labs, ask the user for the path to the input file; don’t hardcode it like this.
Scanner in = new Scanner(new File("romeojuliet.txt"));
while (in.hasNext()) {
String word = clean(in.next());
// Get the old frequency count
if (word != "") {
Integer count = frequencies.get(word);
// If there was none, put 1; otherwise, increment the count
if (count == null) {
count = 1;
} else {
count = count + 1;
}
frequencies.put(word, count);
}
}
// Print all words and counts
for (String key : frequencies.keySet()) {
System.out.println(key + ": " + frequencies.get(key));
}
}
16
/**
* Removes non-letter characters from a String
*
* @param s
*
a string
* @return a string with all the letters from s
*/
public static String clean(String s) {
String r = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (Character.isLetter(c)) {
r = r + c;
}
}
return r.toLowerCase();
}
}
17
Pattern Matching
Programming often requires testing strings for particular
patterns
• Search text for instances of a particular word, like using
<ctrl> f in a word processor
• Search for instances of a word that might have variant
spelling, like color and colour
• Search for instances of a range of different substrings, like
any integer
• Test a string to see whether it matches a complex pattern,
like a url or a Social Security Number (xxx-xx-xxxx, where
each x stands for a single digit)
18
Regular Expressions
• In many cases we can specify exactly the characters we
•
•
•
•
want to match, like 'c'
In others, a pattern includes a single character which can
match any other character
Sometimes a substring can match any string, as when
searching a Windows directory for all files with a particular
filename extension
In yet other cases, a substring must match any of a range of
possible substrings, eg validating input for an email address
Regular Expressions provide a way to specify
characteristics of strings that is flexible enough to use in all
these cases.
19
Regular Expressions
• You will use RegExs in many contexts, but a very common
one is the matches() method of the String class.
• Consider two variants of the same Irish name, Hurley and
O'Hurley. "John Hurley".equals("John O'Hurley") is false.
However, we can use matches() with a RegEx if we are
searching for either one.
String h1 = "John Hurley";
String h2 = "John O'Hurley";
System.out.println(h1.equals(h1));
System.out.println(h1.equals(h2));
System.out.println(h1.matches(h2));
System.out.println(h1.matches("John (O')?Hurley"));
System.out.println(h2.matches("John (O')?Hurley"));
20
RegExs
• x a specified character x: Java matches Java
• . any single character: Java matches J..a
• (ab|cd) ab or cd: ten matches t(en|im)
• [abc] a, b, or c: Java matches Ja[uvwx]a
• [^abc] any character except a, b, or c: Java matches
Ja[^ars]a
• [a-z] a through z in Unicode order:
• Java matches [A-M]av[a-d]
• [^a-z] any character except a through z:
• Java matches Jav[^b-d]
• [a-e[m-p]] a through e or m through p:
• Java matches [A-G[I-M]]av[a-d]
• [a-e&&[c-p]] intersection of a-e with c-p:
• Java matches [A-P&&[I-M]]av[a-d]
21
RegExs
• \d a digit, same as [0-9]: Java2 matches "Java[\\d]"
• \D a non-digit: $Java matches "[\\D][\\D]ava"
• \w a word character (alphanumeric or underscore): Java1 matches "[\\w]ava[\\w]"
• \W a non-word character: $Java matches "[\\W][\\w]ava"
• \s a whitespace character: "Java 2" matches "Java\\s2"
• \S a non-whitespace char: Java matches "[\\S]ava"
• p* zero or more occurrences of pattern p ; Java and av match "[A-z]*"; bbb
matches "b*"
• p+ one or more occurrences of pattern p: b, aa, and ZZZ match "[A-z]+"
• p? zero or one occurrence: Java and ava match "J?ava"
• p{n} exactly n occurrences: \
• Java matches "Ja{1}va" but does not match "Ja{2}va"
• p{n,} at least n occurrences of pattern p:
• Java and Jaaava match "Ja{1,}va" but Java does not match "Ja{2,}va"
• p{n,m} between n and m:
• a matches "a{1,9}" but aaaaaaaaaa does not match "a{1,9}"
• Java does not match "Ja{2,9}va"
22
RegExs
• With the String match methods, backslash is a special character that starts an
•
•
•
•
•
•
escape sequence in a string. So you need to use "\\d" in Java to represent \d.
A whitespace (or a whitespace character) is any character which does not
display itself but does take up space. The characters ' ', '\t', '\n', '\r', '\f' are
whitespace characters. So \s is the same as [ \t\n\r\f], and \S is the same as [^
\t\n\r\f\v].
*, +, ?, {n}, {n,}, and {n, m} in Table 1 are quantifiers that specify how many times
the pattern before a quantifier may repeat. For example, A* matches zero or
more A’s, A+ matches one or more A’s, A? matches zero or one A’s, A{3}
matches exactly AAA, A{3,} matches at least three A’s, and A{3,6} matches
between 3 and 6 A’s. * is the same as {0,}, + is the same as {1,}, and ? is the
same as {0,1}.
Do not use spaces in the repeat quantifiers. For example, A{3,6} cannot be
written as A{3, 6} with a space after the comma.
You may use parentheses to group patterns. For example, (ab){3} matches
ababab, but ab{3} matches abbb
^ can also mean the beginning of a line
$ end of a line
23
RegExs
• String ssNum = "123-45-6789";
• String notSsNum = "123456789";
• String ssPat = "[\\d]{3}-[\\d]{2}-[\\d]{4}";
// Social Security Number
• System.out.println(ssNum.matches(ssPat));
• System.out.println(ssNum.matches(notSsNum));
• Always provide users with guidance when they must match a regex with input!
24
RegExs
public static void main(String[] args) {
String ssNum = null;
String ssPatt = "[\\d]{3}-[\\d]{2}-[\\d]{4}";
String input;
String prompt = "Please enter your Social Security Number in the format XXX-XXXXXX";
do{
input = JOptionPane.showInputDialog(null, prompt);
if(input.matches(ssPatt)) ssNum = input;
else prompt = "Invalid input. " + prompt;
} while(ssNum == null);
JOptionPane.showMessageDialog(null, "Thanks, taxpayer # " + ssNum );
}
25
RegExs
String tmi = "My Social Security number is 123-45-6789 and "
+ "my telephone number is (123)456-7890";
String sanitized = tmi.replaceAll("[\\d]{3}-[\\d]{2}-[\\d]{4}",
"XXX-XX-XXXX").replaceAll("\\([\\d]{3}\\)[\\d]{3}-[\\d]{4}", "(XXX)XXX-XXXX");
System.out.println(sanitized);
26
RegExs
• A Pattern object is a compiled representation of a regular expression.
The Pattern class provides no public constructors. To create a pattern,
you must first invoke one of its public static compile methods, which will
then return a Pattern object. These methods accept a regular
expression as the first argument; the first few lessons of this trail will
teach you the required syntax.
• A Matcher object is the engine that interprets the pattern and performs
match operations against an input string. Like the Pattern class,
Matcher defines no public constructors. You obtain a Matcher object by
invoking the matcher method on a Pattern object.
• Regexs that require two backslashes with the String match methods
only require one backslash with Matcher. For example, a digit is
represented by \d, not \\d
27
RegExs
private static String inputString = "The thrill is gone\nThe thrill is gone away\nThe thrill is gone baby\nThe thrill is gone
away\n "You know you done me wrong baby\nAnd you'll be sorry someday";
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
while (true) {
System.out.println("Enter regex: ");
Pattern pattern = Pattern.compile(sc.nextLine());
Matcher matcher = pattern.matcher(inputString);
boolean found = false;
while (matcher.find()) {
System.out.println("I found the text" + " \"" + matcher.group() + "\"" + "starting at "
+ "index " + matcher.start() + " and ending at index " + matcher.end());
found = true;
}
if (!found) {
System.out.println("No match found.");
}
}
}
// adapted from http://docs.oracle.com/javase/tutorial/essential/regex/test_harness.html
28
RegExs
Regex: T
// any upper case T
Results:
I found the text "T" starting at index 0 and ending at index 1
I found the text "T" starting at index 19 and ending at index 20
I found the text "T" starting at index 43 and ending at index 44
I found the text "T" starting at index 67 and ending at index 68
Regex: [Tt]
// any t, upper case or lower case
Results:
I found the text "T" starting at index 0 and ending at index 1
I found the text "t" starting at index 4 and ending at index 5
I found the text "T" starting at index 19 and ending at index 20
I found the text "t" starting at index 23 and ending at index 24
I found the text "T" starting at index 43 and ending at index 44
I found the text "t" starting at index 47 and ending at index 48
I found the text "T" starting at index 67 and ending at index 68
I found the text "t" starting at index 71 and ending at index 72
Regex: (^[Tt][A-z]+|\W+[Tt][A-z]+) // words that begin with T or t
Results:
I found the text "The" starting at index 0 and ending at index 3
I found the text " thrill" starting at index 3 and ending at index 10
I found the text "
The" starting at index 18 and ending at index 22
I found the text " thrill" starting at index 22 and ending at index 29
I found the text "
The" starting at index 42 and ending at index 46
I found the text " thrill" starting at index 46 and ending at index 53
I found the text "
The" starting at index 66 and ending at index 70
I found the text " thrill" starting at index 70 and ending at index 77
29
RegExs
Regex: [A-z]+y([^A-z]+|$|\n) // words that end with y
Results:
I found the text "away
" starting at index 38 and ending at index 43
I found the text "baby
" starting at index 62 and ending at index 67
I found the text "away
" starting at index 86 and ending at index 91
I found the text "baby
" starting at index 118 and ending at index 123
I found the text "sorry " starting at index 137 and ending at index 143
I found the text "someday" starting at index 143 and ending at index 150
Regex: [A-z]+w+[A-z]+
// words with w in the middle
Results:
• I found the text "away" starting at index 38 and ending at index 42
• I found the text "away" starting at index 86 and ending at index 90