Line-based file processing

Reading: Section 6.3
Use line-based processing when some parts of your program need to work
with whole lines and others with individual tokens. Line-based processing
can do both; token-based processing cannot.
Line-based Processing (Section 6.3)
Line-based Scanners
Method          Description
nextLine()      returns the next entire line of input (from the cursor to \n).
                This is the only non-token-based Scanner method. It also
                consumes the terminating \n.
hasNextLine()   returns true if there are any more lines of input to read
                (always true for console input); returns false when EOF
                (end of file) is the only thing left.
Scanner input = new Scanner(new File("file name"));
while (input.hasNextLine()) {
    String line = input.nextLine();
    // process this line, e.g. scan it using a String
    // Scanner (see Scanners on Strings, below)
}
File processing question
• Write a program that reads a text file and "quotes" it by
putting a > in front of each line. Example input: file msg.txt
Chris,

 Can you please modify the a5/turnin settings
 to make CSE 142 Homework 5 due Wednesday,
 July 27 at 11:59pm instead of due tomorrow
 at 6pm?

 Thanks, Pat
• Example output:
>Chris,
>
> Can you please modify the a5/turnin settings
> to make CSE 142 Homework 5 due Wednesday,
> July 27 at 11:59pm instead of due tomorrow
> at 6pm?
>
> Thanks, Pat
File processing answer
import java.io.*;    // for File
import java.util.*;  // for Scanner
public class QuoteMessage {
    public static void main(String[] args)
            throws FileNotFoundException {
        Scanner input = new Scanner(new File("msg.txt"));
        while (input.hasNextLine()) {
            String line = input.nextLine();
            System.out.println(">" + line);
        }
    }
}
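The same quoting loop can be tried without a msg.txt file on disk. The following is a minimal sketch, assuming the text is supplied as a String; the class name QuoteSketch and the method quote are hypothetical names introduced only for this illustration.

```java
import java.util.Scanner;

class QuoteSketch {
    // Returns the text with ">" placed in front of every line.
    // Assumption: the input comes from a String rather than a File.
    public static String quote(String text) {
        Scanner input = new Scanner(text);
        StringBuilder out = new StringBuilder();
        while (input.hasNextLine()) {
            String line = input.nextLine();          // the \n is consumed, not returned
            out.append(">").append(line).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(quote("Chris,\n\n Can you please modify the settings\n at 6pm?\n"));
    }
}
```

Blank lines in the input come back from nextLine() as empty Strings, which is why they appear in the result as a lone >.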
Consuming lines of input
Input file contents:
23   3.14 John Smith   "Hello" world
        45.2 19
 47
• The Scanner consumes a line through the \n but returns everything except the \n.
• This is why it is called line processing.
• Examples (the ^ marks the cursor position):
23\t3.14 John Smith\t"Hello" world\n\t\t45.2 19\n 47
^
– String line = input.nextLine();
23\t3.14 John Smith\t"Hello" world\n\t\t45.2 19\n 47
                                    ^
line now contains 23\t3.14 John Smith\t"Hello" world
– String line2 = input.nextLine();
23\t3.14 John Smith\t"Hello" world\n\t\t45.2 19\n 47
                                                 ^
line2 now contains \t\t45.2 19
– Each \n character is consumed but not returned as part of the string.
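The cursor walk-through above can be checked with a short program. This is a sketch only, assuming the file contents are held in a String; ConsumeLinesSketch and readLines are hypothetical names, not part of the original slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ConsumeLinesSketch {
    // Collects every line that nextLine() returns from the given text.
    public static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        Scanner input = new Scanner(text);
        while (input.hasNextLine()) {
            lines.add(input.nextLine());   // each \n is consumed but not returned
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same character stream as in the example; \t and \n are real tab/newline here.
        String text = "23\t3.14 John Smith\t\"Hello\" world\n\t\t45.2 19\n 47";
        for (String line : readLines(text)) {
            System.out.println("[" + line + "] length " + line.length());
        }
    }
}
```

The last line, " 47", has no terminating \n, yet hasNextLine() still reports it because input remains before the end of the text.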
Scanners on Strings
• A Scanner can tokenize each line (String) in a file:
Scanner name = new Scanner(String);
– Example:
String text = "15 3.2 hello 9 27.5";
Scanner scan = new Scanner(text);
int num = scan.nextInt();
System.out.println(num);        // 15
double num2 = scan.nextDouble();
System.out.println(num2);       // 3.2
String word = scan.next();
System.out.println(word);       // hello
As with a Scanner on the console:
• The token-based methods skip all white space: spaces, new lines and tabs.
• If the line of text is read from the console, there will never be a \n in one of
these strings because the input.nextLine() method never returns a \n.
• input.nextLine() reads the next line, returns the string up to the \n, and then
consumes the \n.
Advantage of using line processing with a String Scanner:
Line processing brings in one line, stopping at the \n.
Token processing reads in tokens, ignoring \n’s.
The end of the line is often an important marker that token processing
ignores but line processing retains.
• Methods of a Scanner on a String are the same as we have seen for
file and console input:
Method            Description
nextInt()         reads and returns an int value
nextDouble()      reads and returns a double value
next()            reads and returns the next token* as a String
Method Name       Description
hasNext()         whether any more tokens remain
hasNextDouble()   whether the next token can be interpreted as type double
hasNextInt()      whether the next token can be interpreted as type int
* A token is any contiguous run of input characters separated by white space. We will see
examples later. For a String or file Scanner, hasNext() and hasNextLine() can each
return false; for console input they could not.
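The hasNext methods in the table can be combined to decide how to read each token. The sketch below is one possible arrangement, not the slides' own code; TokenKindsSketch and classify are hypothetical names, and Locale.US is set as an assumption so that "3.2" is parsed as a double regardless of the machine's default locale. Note that hasNextInt() is tested first, because a token such as 15 can also be interpreted as a double.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class TokenKindsSketch {
    // Labels each token of the line as an int, a double, or a plain word.
    public static List<String> classify(String line) {
        List<String> kinds = new ArrayList<>();
        Scanner scan = new Scanner(line).useLocale(Locale.US);
        while (scan.hasNext()) {
            if (scan.hasNextInt()) {               // check int before double
                kinds.add("int " + scan.nextInt());
            } else if (scan.hasNextDouble()) {
                kinds.add("double " + scan.nextDouble());
            } else {
                kinds.add("word " + scan.next());
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classify("15 3.2 hello 9 27.5"));
    }
}
```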
Breaking input lines into Strings, then tokens.
Input file input.txt:
The quick brown fox jumps over
the lazy dog.
Output to console:
Line 1 contains 6 words
Line 2 contains 3 words
Note: The last word is “dog.” (the period is part of the token).
// Counts the words on each line of a file
int lineCount = 0;
Scanner input = new Scanner(new File("input.txt"));
while (input.hasNextLine()) {
    String line = input.nextLine();        // works with a line
    Scanner lineScan = new Scanner(line);
    lineCount++;
    // process the contents of this line
    int count = 0;
    while (lineScan.hasNext()) {
        String word = lineScan.next();     // works with a token
        count++;
    }
    System.out.println("Line " + lineCount
            + " contains " + count + " words");
}
Hours question
• A file (hours.txt) contains each employee’s ID number, their name (first name only)
and the number of hours they worked on each day they worked. Print each employee’s total
number of hours worked and average hours worked per day. Note that different
employees work a different number of days.
123 Kim 12.5 8.1 7.6 3.2
456 Eric 4.0 11.6 6.5 2.7 12
789 Stef 8.0 8.0 8.0 8.0 7.5
– It should produce the following output:
Kim   (ID# 123) worked  31.4 hours (7.9 hours/day)
Eric  (ID# 456) worked  36.8 hours (7.4 hours/day)
Stef  (ID# 789) worked  39.5 hours (7.9 hours/day)
With token-based processing alone we would not know where the end of each line is.
For example, we would read Kim’s ID, her name and the 4 hour amounts she
worked. But the 456 (Eric’s ID) would then also be read as hours,
because nextDouble() skips over the \n. Kim would be given
an extra 456 hours! The program would then stop with an input mismatch
exception, because it expects another number but the next token is Eric’s name.
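The failure described above can be reproduced directly. This is a sketch under stated assumptions: the file contents are held in a String, Locale.US is set so the decimal points parse on any machine, and TokenOnlyPitfallSketch, firstEmployeeHours and nextIdFails are hypothetical names used only here.

```java
import java.util.InputMismatchException;
import java.util.Locale;
import java.util.Scanner;

class TokenOnlyPitfallSketch {
    // Token-based attempt: reads one ID and name, then every double that follows,
    // without any knowledge of where the line ends.
    public static double firstEmployeeHours(Scanner input) {
        input.nextInt();                 // ID, e.g. 123
        input.next();                    // name, e.g. "Kim"
        double sum = 0.0;
        while (input.hasNextDouble()) {  // also accepts the next employee's ID
            sum += input.nextDouble();
        }
        return sum;
    }

    // Returns true if reading the next employee's ID throws InputMismatchException.
    public static boolean nextIdFails(Scanner input) {
        try {
            input.nextInt();
            return false;
        } catch (InputMismatchException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String fileText = "123 Kim 12.5 8.1 7.6 3.2\n456 Eric 4.0 11.6 6.5 2.7 12\n";
        Scanner input = new Scanner(fileText).useLocale(Locale.US);
        System.out.printf(Locale.US, "Kim's total: %.1f%n", firstEmployeeHours(input));
        System.out.println("Reading the next ID fails: " + nextIdFails(input));
    }
}
```

Kim's total comes out as 31.4 + 456, and the following read of an ID meets the token Eric instead of a number.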
Hours answer
// Processes an employee input file and outputs each employee's
// hours.
import java.io.*;    // for File
import java.util.*;  // for Scanner
public class Hours {
    public static void main(String[] args)
            throws FileNotFoundException {
        Scanner input = new Scanner(new File("hours.txt"));
        while (input.hasNextLine()) {
            String line = input.nextLine();
            Scanner lineScan = new Scanner(line);
            int id = lineScan.nextInt();          // e.g. 456
            String name = lineScan.next();        // e.g. "Eric"
            double sum = 0.0;
            int count = 0;
            while (lineScan.hasNextDouble()) {
                sum = sum + lineScan.nextDouble(); // e.g. 11.6
                count++;
            }
            double average = sum / count;
            System.out.printf("%-5s (ID# %d) worked %5.1f hours (%.1f hours/day)\n",
                    name, id, sum, average);
        }
    }
}
The point is that we now only process the data on one line at a time. We do
not cross line boundaries.
Output:
Kim   (ID# 123) worked  31.4 hours (7.9 hours/day)
Eric  (ID# 456) worked  36.8 hours (7.4 hours/day)
Stef  (ID# 789) worked  39.5 hours (7.9 hours/day)
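The per-line work can also be pulled into a method that takes one line and returns the formatted result, which makes it easy to try on a single String. This is a sketch, not the slides' solution: HoursLineSketch and summarize are hypothetical names, Locale.US is an assumption to keep the decimal point fixed, and the guard for a line with no hours is an addition (the original would divide by a count of zero in that case).

```java
import java.util.Locale;
import java.util.Scanner;

class HoursLineSketch {
    // Summarizes one line of the form: ID name hours hours ...
    public static String summarize(String line) {
        Scanner lineScan = new Scanner(line).useLocale(Locale.US);
        int id = lineScan.nextInt();
        String name = lineScan.next();
        double sum = 0.0;
        int count = 0;
        while (lineScan.hasNextDouble()) {   // stops at the end of this line only
            sum += lineScan.nextDouble();
            count++;
        }
        if (count == 0) {
            // Assumption: report the missing data instead of dividing by zero.
            return String.format(Locale.US, "%-5s (ID# %d) has no hours recorded", name, id);
        }
        return String.format(Locale.US,
                "%-5s (ID# %d) worked %5.1f hours (%.1f hours/day)",
                name, id, sum, sum / count);
    }

    public static void main(String[] args) {
        System.out.println(summarize("456 Eric 4.0 11.6 6.5 2.7 12"));
        System.out.println(summarize("789 Stef 8.0 8.0 8.0 8.0 7.5"));
    }
}
```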
Line processing example
• Example: Read in a file containing HTML text, and surround all
uppercase tokens with < and > . Assume any token that is all upper
case is an HTML tag, so BODY is translated to <BODY>.
– Retain the original order of the tokens on each line.
Input file:
HTML HEAD
TITLE My web page /TITLE
/HEAD BODY
P There are pics of my cat here,
as well as my B cool /B blog,
with pics from my Vegas trip.
/BODY /HTML
Output to console:
<HTML> <HEAD>
<TITLE> My web page </TITLE>
</HEAD> <BODY>
<P> There are pics of my cat here,
as well as my <B> cool </B> blog,
with pics from my Vegas trip.
</BODY> </HTML>
Scanner input = new Scanner(new File("page.html"));
while (input.hasNextLine()) {
    String line = input.nextLine();
    Scanner lineScan = new Scanner(line);
    while (lineScan.hasNext()) {
        String token = lineScan.next();
        if (token.equals(token.toUpperCase())) {
            // an HTML tag
            System.out.print("<" + token + "> ");
        } else {
            System.out.print(token + " ");
        }
    }
    System.out.println();
}
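The inner loop above can be tried on one line at a time. The following is a sketch with hypothetical names (TagLineSketch, markTags); it builds the result with a StringJoiner, so unlike the loop above it leaves no trailing space. It also makes one consequence of the uppercase test visible: a token with no letters, such as 42, equals its own uppercase form and is therefore treated as a tag.

```java
import java.util.Scanner;
import java.util.StringJoiner;

class TagLineSketch {
    // Surrounds every token that equals its own uppercase form with < and >.
    public static String markTags(String line) {
        StringJoiner out = new StringJoiner(" ");
        Scanner lineScan = new Scanner(line);
        while (lineScan.hasNext()) {
            String token = lineScan.next();
            if (token.equals(token.toUpperCase())) {
                out.add("<" + token + ">");   // assumed to be an HTML tag
            } else {
                out.add(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markTags("TITLE My web page /TITLE"));
        System.out.println(markTags("as well as my B cool /B blog,"));
        System.out.println(markTags("room 42"));
    }
}
```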
This just shows the entire HTMLTokenizer program.
import java.io.*;    // for File
import java.util.*;  // for Scanner
public class HTMLTokenizer {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner input = new Scanner(new File("page.html"));
        while (input.hasNextLine()) {
            String line = input.nextLine();
            Scanner lineScan = new Scanner(line);
            while (lineScan.hasNext()) {
                String token = lineScan.next();
                if (token.equals(token.toUpperCase())) { // an HTML tag
                    System.out.print("<" + token + "> ");
                } else {
                    System.out.print(token + " ");
                }
            }
            System.out.println();
        }
    }
}
Contents of page.html:
HTML HEAD
TITLE My web page /TITLE
/HEAD BODY
P There are pics of my cat here,
as well as my B cool /B blog,
with pics from my Vegas trip.
/BODY /HTML
Output:
<HTML> <HEAD>
<TITLE> My web page </TITLE>
</HEAD> <BODY>
<P> There are pics of my cat here,
as well as my <B> cool </B> blog,
with pics from my Vegas trip.
</BODY> </HTML>
Complex input question
• Write a program that searches the following file for a particular person and prints
that person's total hours worked and average hours per day. If the name is not
found, it displays a statement to that effect.
• It searches for only one person on each run of the program.
– Input file contents:
123 Susan 12.5 8.1 7.6 3.2
456 Brad 4.0 11.6 6.5 2.7 12
789 Jennifer 8.0 8.0 8.0 8.0 7.5 7.0
– Example log of execution:
Enter a name: Brad
Brad  (ID# 456) worked  36.8 hours (7.4 hours/day)
– Example log of execution:
Enter a name: Harvey
Harvey was not found
Complex input answer 1
This program keeps getting lines from the file even after finding a match.
// This program searches an input file of employees' hours worked
// for a particular employee and outputs that employee's hours data.
// It reads the entire file to find the name.
import java.io.*;    // for File
import java.util.*;  // for Scanner
public class HoursWorked {
    public static void main(String[] args)
            throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("Enter a name: ");
        String searchName = console.nextLine();   // e.g. "BRAD"
        boolean found = false;                    // a boolean flag
        Scanner input = new Scanner(new File("hours.txt"));
        while (input.hasNextLine()) {
            String line = input.nextLine();
            Scanner lineScan = new Scanner(line);
            int id = lineScan.nextInt();          // e.g. 456
            String name = lineScan.next();        // e.g. "Brad"
            if (name.equalsIgnoreCase(searchName)) {
                processLine(lineScan, name, id);
                found = true;                     // we found that employee!
            }
        }
        if (!found) {   // found will be true if we ever found the person
            System.out.println(searchName + " was not found");
        }
    }
    // more ... (the class continues below)
Complex input answer 1 (continued)
    // totals the hours worked by one person and outputs their info
    public static void processLine(Scanner lineScan, String name, int id) {
        double sum = 0.0;
        int count = 0;
        while (lineScan.hasNextDouble()) {
            sum += lineScan.nextDouble();
            count++;
        }
        double average = sum / count;
        System.out.printf("%-5s (ID# %d) worked %5.1f hours (%.1f hours/day)\n",
                name, id, sum, average);
    }
}
Complex input answer 2: more complex while loop
// This program searches an input file of employees' hours worked
// for a particular employee and outputs that employee's hours data.
import java.io.*;    // for File
import java.util.*;  // for Scanner
public class HoursWorked {
    public static void main(String[] args)
            throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("Enter a name: ");
        String searchName = console.nextLine();   // e.g. "BRAD"
        boolean found = false;                    // a boolean flag
        Scanner input = new Scanner(new File("hours.txt"));
        while (input.hasNextLine() && !found) {   // now stops when name is found
            String line = input.nextLine();
            Scanner lineScan = new Scanner(line);
            int id = lineScan.nextInt();          // e.g. 456
            String name = lineScan.next();        // e.g. "Brad"
            if (name.equalsIgnoreCase(searchName)) {
                processLine(lineScan, name, id);  // same method as in previous answer
                found = true;                     // we found that employee!
            }
        }
        if (!found) {   // found will be true if we ever found the person
            System.out.println(searchName + " was not found");
        }
    }
    // processLine is the same method as in answer 1
}
This can be much more efficient than the previous version if the file is
large, because the loop stops reading lines as soon as the name is found.
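The difference between the two loops can be made concrete by counting how many lines each one reads. This is a sketch with hypothetical names (EarlyStopSketch, linesRead) that assumes the file contents are held in a String; the stopWhenFound flag switches between the loop condition of answer 1 and that of answer 2.

```java
import java.util.Scanner;

class EarlyStopSketch {
    // Returns how many lines are read while searching for searchName.
    public static int linesRead(String fileText, String searchName, boolean stopWhenFound) {
        Scanner input = new Scanner(fileText);
        boolean found = false;
        int count = 0;
        while (input.hasNextLine() && !(stopWhenFound && found)) {
            Scanner lineScan = new Scanner(input.nextLine());
            count++;
            lineScan.nextInt();                           // skip the ID
            if (lineScan.next().equalsIgnoreCase(searchName)) {
                found = true;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String fileText = "123 Susan 12.5 8.1 7.6 3.2\n"
                + "456 Brad 4.0 11.6 6.5 2.7 12\n"
                + "789 Jennifer 8.0 8.0 8.0 8.0 7.5 7.0\n";
        System.out.println("Without early stop: " + linesRead(fileText, "Susan", false));
        System.out.println("With early stop:    " + linesRead(fileText, "Susan", true));
    }
}
```

When the name is absent, or is on the last line, both versions read the whole file, so the saving depends on where the match occurs.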
Complex input answer 3: use break statement
// This program searches an input file of employees' hours worked
// for a particular employee and outputs that employee's hours data.
import java.io.*;    // for File
import java.util.*;  // for Scanner
public class HoursWorkedBreak {
    public static void main(String[] args)
            throws FileNotFoundException {
        Scanner console = new Scanner(System.in);
        System.out.print("Enter a name: ");
        String searchName = console.nextLine();   // e.g. "BRAD"
        boolean found = false;                    // a boolean flag
        Scanner input = new Scanner(new File("hours2.txt"));
        while (input.hasNextLine()) {
            String line = input.nextLine();
            Scanner lineScan = new Scanner(line);
            int id = lineScan.nextInt();          // e.g. 456
            String name = lineScan.next();        // e.g. "Brad"
            if (name.equalsIgnoreCase(searchName)) {
                processLine(lineScan, name, id);
                found = true;
                break;                            // we found them!
            }
        }
        if (!found) {   // found will be true if we ever found the person
            System.out.println(searchName + " was not found");
        }
    }
    // processLine is the same method as in answer 1
}
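A further variation, which is not in the slides and is offered only as a sketch, moves the search into its own method and leaves the loop with return instead of a flag or break. The names FindEmployeeSketch and findLine are hypothetical, and returning null for "not found" is an assumption made for brevity.

```java
import java.util.Scanner;

class FindEmployeeSketch {
    // Returns the first line whose name matches searchName (ignoring case),
    // or null if no line matches. Reading stops at the first match.
    public static String findLine(Scanner input, String searchName) {
        while (input.hasNextLine()) {
            String line = input.nextLine();
            Scanner lineScan = new Scanner(line);
            lineScan.nextInt();                       // skip the ID
            String name = lineScan.next();
            if (name.equalsIgnoreCase(searchName)) {
                return line;                          // leaves the loop and the method
            }
        }
        return null;                                  // assumption: null means not found
    }

    public static void main(String[] args) {
        String fileText = "123 Susan 12.5 8.1 7.6 3.2\n"
                + "456 Brad 4.0 11.6 6.5 2.7 12\n"
                + "789 Jennifer 8.0 8.0 8.0 8.0 7.5 7.0\n";
        String match = findLine(new Scanner(fileText), "BRAD");
        System.out.println(match == null ? "BRAD was not found" : match);
    }
}
```

The caller can then build a fresh Scanner on the returned line to total the hours, as processLine does.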