Session 05
Java Strings and Files
Exercise
Complete the “quick-and-dirty” class CharacterCounter
containing only a main() method that displays the number
of non-space characters on the command line after the
command. For example:
$ java CharacterCounter
0
$ java CharacterCounter a
1
$ java CharacterCounter a bc def ghij
10
CharacterCounter template
public class CharacterCounter {
    public static void main( String[] args ) {
        int characterCount = 0 ;
    } // end main
} // end class CharacterCounter
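One possible way to fill in the template is sketched here. It is only one approach, not necessarily the intended solution. The class name CharacterCounterSketch and the helper countCharacters() are made up for this illustration. The shell already splits the command line on spaces, so the helper only needs to skip spaces that arrive inside a quoted argument.

```java
public class CharacterCounterSketch {

    // Hypothetical helper (not part of the original template):
    // counts every character in every argument except the space character.
    static int countCharacters( String[] args ) {
        int characterCount = 0;
        for ( String arg : args ) {
            for ( int i = 0; i < arg.length(); i++ ) {
                if ( arg.charAt( i ) != ' ' ) {
                    characterCount++;
                }
            }
        }
        return characterCount;
    }

    public static void main( String[] args ) {
        System.out.println( countCharacters( args ) );
    } // end main
} // end class CharacterCounterSketch
```

With the arguments a bc def ghij this prints 10, matching the example run in the exercise.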
StringTokenizer
• Useful tool for processing a String object
• Allows you to sequentially walk down a
String and extract “words”/tokens that are
delimited by specified characters
• What delimiter normally aids us in parsing a
long string into words?
StringTokenizer
General usage of a StringTokenizer:
– create one using a constructor that takes a
string argument to process
– send one of two messages: hasMoreTokens()
and nextToken()
– use a stereotypical loop to process a sequence
of strings
A default StringTokenizer uses whitespace (space, tab,
newline, carriage return, and form feed) as delimiters.
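The usage pattern above can be sketched on a fixed string before moving to command-line input. The class name TokenLoopSketch, the helper tokensOf(), and the sample text are assumptions made for this illustration. The sample mixes a space, a tab, and a newline to show what the one-argument constructor splits on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenLoopSketch {

    // Hypothetical helper: collects the tokens of text using the
    // default delimiters of the one-argument constructor.
    static List<String> tokensOf( String text ) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer( text );
        while ( tokens.hasMoreTokens() ) {        // any tokens left?
            result.add( tokens.nextToken() );     // take the next one
        }
        return result;
    }

    public static void main( String[] args ) {
        // prints [one, two, three, four]
        System.out.println( tokensOf( "one two\tthree\nfour" ) );
    } // end main
} // end class TokenLoopSketch
```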
StringTokenizer Example
import java.util.StringTokenizer;
public class EchoWordsInArgumentV1 {
    public static void main( String[] args ) {
        StringTokenizer words = new StringTokenizer( args[0] );
        while( words.hasMoreTokens() ) {
            String word = words.nextToken();
            System.out.println( word );
        } // end while
    } // end main
} // end class EchoWordsInArgumentV1
StringTokenizer Example
$ java EchoWordsInArgumentV1 "StringTokenizer, please process me."
StringTokenizer,
please
process
me.
• Notice the quotes (“”) in the command line so the
whole string is read as args[0].
• The comma (“,”) and period (“.”) are part of the
words, not delimiters, by default.
StringTokenizer Example 2
• Fortunately, we can construct a StringTokenizer that
uses specified characters for delimiters.
• The designer of the StringTokenizer was planning
ahead for future usage!!!
$ java EchoWordsInArgumentV2 "StringTokenizer, please process me."
StringTokenizer
please
process
me
StringTokenizer Example 2
import java.util.StringTokenizer;
public class EchoWordsInArgumentV2 {
    public static void main( String[] args ) {
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        StringTokenizer words = new StringTokenizer( args[0], delimiters );
        while( words.hasMoreTokens() ) {
            String word = words.nextToken();
            System.out.println( word );
        } // end while
    } // end main
} // end class EchoWordsInArgumentV2
UNIX/Linux pipe
• “|” character on the command line
• Allows the output of one program to be sent as input
to another program, like the UNIX “sort” utility.
$ java EchoWordsInArgumentV2 "StringTokenizer, please process me." | sort
StringTokenizer
me
please
process
• Is this sorted? How can we fix this?
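The ordering above is what a comparison by character code produces: every uppercase letter (“S” is 83) comes before every lowercase letter (“m” is 109). How sort itself orders text depends on the locale settings, so the following is only a Java illustration of the same effect, not a description of how sort works. The class name SortOrderSketch and the helper sortedCopy() are made up for this sketch.

```java
import java.util.Arrays;

public class SortOrderSketch {

    // Hypothetical helper: returns a sorted copy of words, either by
    // character code (String's natural order) or ignoring case.
    static String[] sortedCopy( String[] words, boolean ignoreCase ) {
        String[] copy = words.clone();
        if ( ignoreCase ) {
            Arrays.sort( copy, String.CASE_INSENSITIVE_ORDER );
        } else {
            Arrays.sort( copy );
        }
        return copy;
    }

    public static void main( String[] args ) {
        String[] words = { "StringTokenizer", "please", "process", "me" };
        // prints [StringTokenizer, me, please, process]
        System.out.println( Arrays.toString( sortedCopy( words, false ) ) );
        // prints [me, please, process, StringTokenizer]
        System.out.println( Arrays.toString( sortedCopy( words, true ) ) );
    } // end main
} // end class SortOrderSketch
```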
StringTokenizer Example 3
import java.util.StringTokenizer;
public class EchoWordsInArgumentV3 {
    public static void main( String[] args ) {
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        StringTokenizer words = new StringTokenizer( args[0], delimiters );
        while( words.hasMoreTokens() ) {
            String word = words.nextToken();
            word = word.toLowerCase();
            System.out.println( word );
        } // end while
    } // end main
} // end class EchoWordsInArgumentV3
StringTokenizer Example 3
$ java EchoWordsInArgumentV3 "StringTokenizer, please process me." | sort
me
please
process
stringtokenizer
Java File I/O
• Allows us to write and read “permanent”
information to and from disk
• How would file I/O help improve the
capabilities of the MemoPadApp?
Java File I/O Example: Echo.java
• echoes all the words in one file to an output
file, one per line.
$ java Echo hamlet.txt hamlet.out
$ less hamlet.out
1604
the
tragedy
of
hamlet
prince
of
denmark
by
william
shakespeare ...
Study Echo.java’s File I/O
• BufferedReader and PrintWriter have constructors
that allow convenient and flexible processing
• send input message: readLine()
• send output messages: print() and println()
• use a stereotypical loop to process a file of
lines
• use of the stereotypical StringTokenizer
loop as inner loop
import java.io.*;
import java.util.StringTokenizer;
public class Echo {
    public static void main( String[] args ) throws IOException {
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );
        PrintWriter outputFile = new PrintWriter( new FileWriter( args[1] ) );
        String buffer = null;
        while( true ) {
            buffer = inputFile.readLine();
            if ( buffer == null ) break;   // readLine() returns null at end of file
            buffer = buffer.toLowerCase();
            StringTokenizer tokens = new StringTokenizer( buffer, delimiters );
            while( tokens.hasMoreTokens() ) {
                String word = tokens.nextToken();
                outputFile.println( word );
            } // end while
        } // end while(true)...
        inputFile.close();
        outputFile.close();   // flushes buffered output; without it the file may be incomplete
    } // end main
} // end class Echo
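A variation worth knowing: try-with-resources (Java 7 and later) closes both files automatically, even if an exception is thrown part-way through. The sketch below is an alternative arrangement of the same loop, not the original Echo.java. The class name EchoSketch, the helper echoWords(), and the usage message are assumptions made for this illustration. The delimiter string is copied from the slides.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.StringTokenizer;

public class EchoSketch {

    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    // Hypothetical helper: writes each lower-cased word of the input file
    // to the output file, one word per line.
    static void echoWords( String inputName, String outputName ) throws IOException {
        try ( BufferedReader inputFile = new BufferedReader( new FileReader( inputName ) );
              PrintWriter outputFile = new PrintWriter( new FileWriter( outputName ) ) ) {
            String buffer;
            while ( ( buffer = inputFile.readLine() ) != null ) {
                StringTokenizer tokens =
                    new StringTokenizer( buffer.toLowerCase(), DELIMITERS );
                while ( tokens.hasMoreTokens() ) {
                    outputFile.println( tokens.nextToken() );
                }
            }
        } // both files are closed here, in reverse order of opening
    }

    public static void main( String[] args ) throws IOException {
        if ( args.length != 2 ) {
            System.err.println( "usage: java EchoSketch <input file> <output file>" );
            return;
        }
        echoWords( args[0], args[1] );
    } // end main
} // end class EchoSketch
```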
wc - UNIX/Linux utility
• wc prints the number of lines, words, and
characters in a file to standard output.
• For example:
$ wc hamlet.txt
4792 31957 196505 hamlet.txt
Exercise
• Using Echo.java as your starting point,
create a WordCount.java program that does
the same thing as wc, i.e., prints the number
of lines, words, and characters in a file to
standard output. For example:
$ java WordCount hamlet.txt
4792 32889 130156
import java.io.*;
import java.util.StringTokenizer;
public class WordCount {
    public static void main( String[] args ) throws IOException {
        String delimiters = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";
        BufferedReader inputFile = new BufferedReader( new FileReader( args[0] ) );
        String buffer = null;
        int    chars  = 0;
        int    words  = 0;
        int    lines  = 0;
        while( true ) {
            buffer = inputFile.readLine();
            if ( buffer == null ) break;   // readLine() returns null at end of file
            lines++;
            buffer = buffer.toLowerCase();
            StringTokenizer tokens = new StringTokenizer( buffer, delimiters );
            while( tokens.hasMoreTokens() ) {
                String word = tokens.nextToken();
                words++;
                chars += word.length();   // counts only characters inside tokens
            } // end while
        } // end while( true )...
        inputFile.close();
        System.out.println( "" + lines + " " + words + " " + chars );
    } // end main
} // end class WordCount
Why the difference in the number
of words and number of characters?
$ wc hamlet.txt
4792 31957 196505 hamlet.txt
$ java WordCount hamlet.txt
4792 32889 130156
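One likely explanation can be tried out on a single line of text. wc splits words on whitespace only and counts every character, including spaces, punctuation, and the newline. WordCount also splits on punctuation and counts only the characters inside tokens. The sketch below compares the two styles. The class name CountComparisonSketch, its helpers, and the sample line are made up for this illustration, and the wc-style character count assumes one-byte characters and a single newline at the end of the line.

```java
import java.util.StringTokenizer;

public class CountComparisonSketch {

    static final String DELIMITERS = " .?!()[]{}|?/&\\,;:-\'\"\t\n\r";

    // wc-style: words are separated by whitespace only.
    static int whitespaceWords( String line ) {
        return new StringTokenizer( line ).countTokens();
    }

    // wc-style: every character counts, plus the newline ending the line.
    static int allChars( String line ) {
        return line.length() + 1;
    }

    // WordCount-style: punctuation also separates words.
    static int punctuationWords( String line ) {
        return new StringTokenizer( line, DELIMITERS ).countTokens();
    }

    // WordCount-style: only characters inside tokens count.
    static int tokenChars( String line ) {
        int chars = 0;
        StringTokenizer tokens = new StringTokenizer( line, DELIMITERS );
        while ( tokens.hasMoreTokens() ) {
            chars += tokens.nextToken().length();
        }
        return chars;
    }

    public static void main( String[] args ) {
        String line = "well-known, isn't it?";
        // prints: wc-style: 3 words, 22 chars
        System.out.println( "wc-style: " + whitespaceWords( line )
            + " words, " + allChars( line ) + " chars" );
        // prints: WordCount-style: 5 words, 15 chars
        System.out.println( "WordCount-style: " + punctuationWords( line )
            + " words, " + tokenChars( line ) + " chars" );
    } // end main
} // end class CountComparisonSketch
```

The sample line shows the same pattern as the hamlet.txt numbers: more words but fewer characters under the WordCount rules.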