Reading and Writing Text Files in Java

John Lamertina
(Deitel Java 5.0 Chp 14, 19, 29)
April 2007
Content
Reading and Writing Data Files (chp 14)
String Tokenizer to Parse Data (chp 29)
Comma Separated Value (CSV) Files – an exercise which applies:
 Multi-dimensional arrays (chp 7)
 Exception Handling (chp 13)
 Files (chp 14)
 ArrayList Collection (chp 19)
 Tokenizer (chp 29)
Data Hierarchy
Field – a group of characters or bytes that conveys meaning
Record – a group of related fields
File – a group of related records
Record key – identifies a record as belonging to a particular person or entity – used for easy retrieval of specific records
Sequential file – file in which records are stored in order by the record-key field
Reading & Writing Files
Java Streams and Files
Each file is a sequential stream of bytes
 Operating system provides mechanism to determine end of file
  End-of-file marker
  Count of total bytes in file
Java program processing a stream of bytes receives an indication from the operating system when program reaches end of stream
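A minimal sketch of how that end-of-stream indication shows up in code: InputStream.read() returns -1 once no bytes remain. The class name EndOfStreamDemo and the in-memory stream standing in for a file are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EndOfStreamDemo {
    // Counts bytes by reading until read() returns -1, the end-of-stream signal.
    static int countBytes(InputStream in) {
        int count = 0;
        try {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for a file here (assumption for the demo).
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println("Bytes read: " + countBytes(in));
    }
}
```

The same loop would work on a FileInputStream, since both are InputStream subclasses.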
File - Object - Stream
Java opens file by creating an object and associating a stream with it
Standard streams – each stream can be redirected
 System.in – standard input stream object, can be redirected with method setIn
 System.out – standard output stream object, can be redirected with method setOut
 System.err – standard error stream object, can be redirected with method setErr
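As a rough illustration of redirection, the sketch below points System.out at an in-memory buffer with System.setOut and then restores it. The class RedirectDemo and its capture helper are hypothetical names, not part of the course material.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class RedirectDemo {
    // Runs the task with System.out redirected to a buffer and returns what it printed.
    static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            task.run();
        } finally {
            System.out.flush();
            System.setOut(original); // always restore the standard stream
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = capture(() -> System.out.print("redirected"));
        System.out.println("Captured: " + text);
    }
}
```

System.setIn and System.setErr follow the same pattern with an InputStream and a PrintStream respectively.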

Classes related to Files
java.io classes
 FileInputStream and FileOutputStream – byte-based I/O
 FileReader and FileWriter – character-based I/O
 ObjectInputStream and ObjectOutputStream – used for input and output of objects or variables of primitive data types
 File – useful for obtaining information about files and directories
Classes Scanner and Formatter
 Scanner – can be used to easily read data from a file
 Formatter – can be used to easily write data to a file
File Class
Common File methods
 exists – returns true if file exists where it is specified
 isFile – returns true if File is a file, not a directory
 isDirectory – returns true if File is a directory
 getPath – returns file path as a string
 list – retrieves contents of a directory
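The File methods listed above can be tried out with a short sketch like the following. The class FileInfoDemo and its describe helper are hypothetical, and the current working directory is used only as a convenient path that should exist.

```java
import java.io.File;

public class FileInfoDemo {
    // Classifies a path using exists, isDirectory and isFile.
    static String describe(File f) {
        if (!f.exists()) return "missing";
        if (f.isDirectory()) return "directory";
        if (f.isFile()) return "file";
        return "other";
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("user.dir"));
        System.out.println(dir.getPath() + " is a " + describe(dir));
        String[] names = dir.list(); // null if the path is not a directory or cannot be read
        if (names != null) {
            for (String name : names) {
                System.out.println("  " + name);
            }
        }
    }
}
```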
Write with Formatter Class
Formatter class can be used to open a text file for writing
 Pass name of file to constructor
 If file does not exist, will be created
 If file already exists, contents are truncated (discarded)
 Use method format to write formatted text to file
 Use method close to close the Formatter object (if method not called, OS normally closes file when program exits)
 Example: see figure 14.7 (p 686-7)
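The steps above might look like the sketch below. It is not the textbook's figure 14.7; the class FormatterWriteDemo, the file name demo_records.txt and the two sample records are all made up for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Formatter;

public class FormatterWriteDemo {
    // Writes two sample records to the file; returns true on success.
    static boolean writeRecords(File file) {
        Formatter output = null;
        try {
            output = new Formatter(file); // creates the file, or truncates an existing one
            output.format("%d,%s,%s%n", 100, "Ann", "Lee");
            output.format("%d,%s,%s%n", 200, "Bob", "Ray");
            return true;
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open or create " + file.getPath());
            return false;
        } catch (SecurityException e) {
            System.err.println("No write permission for " + file.getPath());
            return false;
        } finally {
            if (output != null) {
                output.close(); // flushes and releases the file
            }
        }
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "demo_records.txt");
        System.out.println("Written: " + writeRecords(file));
    }
}
```

Closing in a finally block means the file is released even if format throws.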
Possible Exceptions
 SecurityException – occurs when opening file using Formatter object, if user does not have permission to write data to file
 FileNotFoundException – occurs when opening file using Formatter object, if file cannot be found and new file cannot be created
 NoSuchElementException – occurs when invalid input is read in by a Scanner object
 FormatterClosedException – occurs when an attempt is made to write to a file using an already closed Formatter object
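One of these exceptions is easy to provoke on purpose. The sketch below, with the hypothetical class ClosedFormatterDemo, writes to a Formatter after closing it; an in-memory StringBuilder target is assumed so that no file is needed.

```java
import java.util.Formatter;
import java.util.FormatterClosedException;

public class ClosedFormatterDemo {
    // Returns true if writing after close() raises FormatterClosedException.
    static boolean writeAfterCloseFails() {
        Formatter output = new Formatter(new StringBuilder()); // in-memory target, no file needed
        output.format("%s", "first");
        output.close();
        try {
            output.format("%s", "second"); // the Formatter is already closed
            return false;
        } catch (FormatterClosedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Exception raised: " + writeAfterCloseFails());
    }
}
```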
Read with Scanner Class
Scanner object can be used to read data sequentially from a text file
 Pass File object representing file to be read to Scanner constructor
 FileNotFoundException occurs if file cannot be found
 Data read from file using same methods as for keyboard input – nextInt, nextDouble, next, etc.
 IllegalStateException occurs if attempt is made to read from closed Scanner object
 Example: see Figure 14.11 (p 690-1)
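A small sketch of sequential reading with Scanner follows. It is not the textbook's Figure 14.11; the class ScannerReadDemo, the file name orders.txt and the "id name quantity" record layout are assumptions for illustration only.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.NoSuchElementException;
import java.util.Scanner;

public class ScannerReadDemo {
    // Reads whitespace-separated "id name quantity" records and sums the quantities.
    // Returns -1 if a record is malformed or incomplete.
    static int totalQuantity(Scanner input) {
        int total = 0;
        try {
            while (input.hasNext()) {
                int id = input.nextInt();
                String name = input.next();
                int quantity = input.nextInt();
                System.out.printf("%d %s %d%n", id, name, quantity);
                total += quantity;
            }
        } catch (NoSuchElementException e) { // also covers InputMismatchException
            return -1;
        }
        return total;
    }

    // Opens the file with a Scanner; returns -1 if it cannot be found.
    static int totalQuantity(File file) {
        try (Scanner input = new Scanner(file)) {
            return totalQuantity(input);
        } catch (FileNotFoundException e) {
            System.err.println("File not found: " + file.getPath());
            return -1;
        }
    }

    public static void main(String[] args) {
        // "orders.txt" is a hypothetical file name; a missing file simply reports -1.
        System.out.println("Total: " + totalQuantity(new File("orders.txt")));
    }
}
```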
Tokens: Fields of a Record
Tokenization breaks a statement, sentence, or line of data into individual pieces
Tokens are the individual pieces
 Words from a sentence
 Keywords, identifiers, operators from a Java statement
 Individual data items or fields of a record (that were separated by white space, tab, new line, comma, or other delimiter)
String Tokenizer
String Classes
Class java.lang.String
 Class java.lang.StringBuffer
 Class java.util.StringTokenizer

StringTokenizer
Breaks a string into component tokens
Default delimiters: " \t\n\r\f" (space, tab, new line, return, or form feed)
Specify other delimiter(s) at construction or in method nextToken:
String delimiter = " ,\n";
StringTokenizer tokens = new StringTokenizer(sentence, delimiter);
-or-
String newDelimiterString = "|,";
tokens.nextToken(newDelimiterString);

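Changing delimiters through nextToken has one subtlety worth a sketch: the delimiter that ended the previous token has not been consumed, so the new delimiter set usually needs to include it. The class DelimiterSwitchDemo and the "key:v1,v2" line format are hypothetical, chosen only to show the switch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSwitchDemo {
    // Splits a line of the form "key:v1,v2,..." into [key, v1, v2, ...].
    static List<String> parse(String line) {
        List<String> parts = new ArrayList<String>();
        StringTokenizer tokens = new StringTokenizer(line, ":");
        if (!tokens.hasMoreTokens()) {
            return parts;
        }
        parts.add(tokens.nextToken()); // key, read with ':' as the only delimiter
        if (tokens.hasMoreTokens()) {
            // Switch delimiters. ':' stays in the set so the separator that
            // ended the key is skipped rather than returned as part of a token.
            parts.add(tokens.nextToken(":,"));
        }
        while (tokens.hasMoreTokens()) {
            parts.add(tokens.nextToken()); // the new set ":," remains in effect
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("name:a,b,c"));
    }
}
```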
Example 29.18
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenTest {
    public static void main (String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter a sentence to tokenize and press Enter:");
        String sentence = scan.nextLine();
        // default delimiter is " \t\n\r\f"
        String delimiter = " ,\n";
        StringTokenizer tokens = new StringTokenizer(sentence, delimiter);
        System.out.printf("Number of elements: %d\n", tokens.countTokens());
        System.out.println("The tokens are:");
        while (tokens.hasMoreTokens())
            System.out.println(tokens.nextToken());
    }
}
(Refer to p 1378)
Comma Separated Value (CSV) Data Files
 Fields are separated by commas
 For data exchange between disparate systems
 Pseudo standard used by Microsoft Excel and other systems

Comma Separated Values
CSV File Format Rules
1. Each record is one line
2. Fields are separated by comma delimiters
3. Leading and trailing white space in a field is ignored unless the field is enclosed in double quotes
4. First record in a CSV may be a header of field names. A CSV application needs some boolean indication of whether first record is a header.
5. Empty fields are indicated by consecutive comma delimiters. Thus every record should have the same number of delimiters
6. Fields with embedded commas must be enclosed in double quotes
For more information:
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
CSV Format vs StringTokenizer
StringTokenizer with a comma delimiter will read most CSV files, but does not account for empty fields or a quoted field with embedded commas:
 Empty fields in a CSV file are indicated by consecutive commas. Example:
  123, John ,, Doe (Middle Name field is blank)
 Fields with embedded commas are enclosed in quotes. Example:
  456 , "King , the Gorilla" , Kong
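The empty-field limitation can be seen by counting fields in the slide's first example record. The sketch below compares StringTokenizer with String.split using a negative limit, which keeps empty fields; the class EmptyFieldDemo is a hypothetical name, and split is offered only as one possible alternative. It still does not handle quoted fields with embedded commas.

```java
import java.util.StringTokenizer;

public class EmptyFieldDemo {
    // StringTokenizer treats consecutive commas as a single separator.
    static int countWithTokenizer(String record) {
        return new StringTokenizer(record, ",").countTokens();
    }

    // split with a negative limit keeps empty fields, including trailing ones.
    static String[] splitKeepingEmpty(String record) {
        return record.split(",", -1);
    }

    public static void main(String[] args) {
        String record = "123, John ,, Doe";
        System.out.println("StringTokenizer fields: " + countWithTokenizer(record));
        System.out.println("split fields: " + splitKeepingEmpty(record).length);
    }
}
```

For this record the tokenizer reports three fields and split reports four, the third being the empty Middle Name.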
Exercise Part 1
Develop and test classes to read and write CSV data files, satisfying the first four "CSV File Format Rules" (listed on a previous slide). Your completed classes must:
 Handle the usual possible file exceptions
 Read CSV-formatted data from one or more files into a single array
 Print the data array
 Write data from the array to a single file in CSV format
Test your CSV reader to read and print sample files:
 TestFile1.csv
 TestFile2.csv
Multi-dimensional Arrays
Java implements multi-dimensional arrays as arrays of 1-dimensional arrays.
 Rows can actually have different numbers of columns. Example:

int b[][];
b = new int[ 2 ][ ];   // create 2 rows
b[ 0 ] = new int[ 5 ]; // create 5 columns for row 0
b[ 1 ] = new int[ 3 ]; // create 3 columns for row 1
(Refer to p 311-315)
Array Dimension: Length
Recall that for a one-dimensional array:
int a[ ] = new int[ 10 ];
int size = a.length;

For a two-dimensional array:
int b[][] = new int[ 10 ][ 20 ];
int size1 = b.length;      // number of rows
int size2 = b[ i ].length; // number of cols for i-th row
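Putting the two ideas together, a short sketch can walk a ragged array using the length of each row rather than a fixed column count. The class RaggedArrayDemo and its totalCells helper are hypothetical names.

```java
public class RaggedArrayDemo {
    // Sums the lengths of all rows; each row may have a different number of columns.
    static int totalCells(int[][] table) {
        int total = 0;
        for (int i = 0; i < table.length; i++) {
            total += table[i].length;
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] b = new int[2][]; // 2 rows, columns not yet allocated
        b[0] = new int[5];        // 5 columns for row 0
        b[1] = new int[3];        // 3 columns for row 1
        for (int i = 0; i < b.length; i++) {
            System.out.println("row " + i + " has " + b[i].length + " columns");
        }
        System.out.println("total cells: " + totalCells(b));
    }
}
```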
TestFile1.csv
987, Thomas ,Jefferson,7 Estate Ave.,Loretto, PA, 15940
413, Martha,Washington,1600 Penna Ave,Washington, DC,20002
123, Martin , Martina ,777 Williams Ct.,Smallville, PA,15990
990, Shelby, Roosevelt,15 Jackson Pl,NYC,NY, 12345
TestFile2.csv
ID, FName, LName, StreetAddress, City, State, Zip
123, John ,Dozer,120 Main st.,Loretto, PA, 15940
107, Jane,Washington,220 Hobokin Ave.,Philadelphia, PA,0911
123, William , Adams ,120 Jefferson St.,Johnstown, PA,15904
451, Brenda, Bronson,127 Terrace Road,Barrows,AK, 99789
729, Brainfield,Blanktowm, PA, 16600
Exercise Part 2
Develop an application that uses your CSV reader and writer classes
Read the test files (or create your own test files) and perform data validity checks by displaying an appropriate error message and the offending record(s):
 If any fields are missing
 If extra fields are found
 If any records have duplicate IDs
 If any record has an invalid zip code (i.e. not exactly 5 digits)
Write all records to a single CSV file (i.e. concatenate the multiple test files in a single file)
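Two of the validity checks could be sketched as below. This is one possible approach, not a reference solution: the class RecordValidator is hypothetical, and it assumes the ID is the first field of each record, following the TestFile2.csv header.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecordValidator {
    // A zip code is valid only if it is exactly 5 digits (surrounding white space ignored).
    static boolean isValidZip(String zip) {
        return zip.trim().matches("\\d{5}");
    }

    // Returns each ID that appears in more than one record, in order of first repeat.
    // Assumption: the ID is the first field of every record.
    static List<String> duplicateIds(String[][] data) {
        Set<String> seen = new HashSet<String>();
        List<String> duplicates = new ArrayList<String>();
        for (String[] row : data) {
            if (row.length == 0) {
                continue; // skip empty records
            }
            String id = row[0].trim();
            if (!seen.add(id) && !duplicates.contains(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String[][] data = { {"123", "John"}, {"107", "Jane"}, {"123", "William"} };
        System.out.println("Duplicate IDs: " + duplicateIds(data));
        System.out.println("Zip 0911 valid? " + isValidZip("0911"));
    }
}
```

Missing or extra fields can be detected in a similar way by comparing each row's length with the expected field count.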
Exercise Part 3 (extra credit)
Extend your classes to be fully compliant with the "CSV File Format Rules".
 Hint: Review some existing CSV Java libraries online.

Hints 1.a
CSVFile
- boolean hasHeaderRow;
- String fileName;
- Scanner input;
- List<String> records;
- String data[][];
- int numRecords;
- int maxNumFields;
+ CSVFile(String fileName)
+ CSVFile(boolean headerRow, String fileName)
+ boolean getHasHeaderRow()
+ String getFileName()
+ int getNumRecords()
+ int getMaxNumFields()
+ void getData(String a[][])
+ void openFile()
+ void readRecords()
+ void parseFields()
+ void printData()
Hints 1.b
import java.io.File;
import java.util.Scanner;
import java.io.FileNotFoundException;
import java.lang.IllegalStateException;
import java.util.NoSuchElementException;
import java.util.List;
import java.util.ArrayList;
import java.util.StringTokenizer;
Hints 1.c
public void openFile() {
    try {
        input = new Scanner(new File(fileName));
    }
    catch (FileNotFoundException fileNotFound) {
        ...
public void readRecords() {
    // Read all lines (records) from the file into an ArrayList
    records = new ArrayList<String>();
    try {
        while (input.hasNext())
            records.add( input.nextLine() );
        ...
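For experimenting outside the CSVFile class, the open-and-read steps can be combined into a standalone sketch. This is not the elided remainder of the hint; the class ReadLinesDemo is hypothetical, and returning an empty list when the file is missing is just one possible error policy.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadLinesDemo {
    // Reads every remaining line from the Scanner into a list, one record per line.
    static List<String> readAll(Scanner input) {
        List<String> records = new ArrayList<String>();
        while (input.hasNextLine()) {
            records.add(input.nextLine());
        }
        return records;
    }

    // Opens the file and reads its lines; returns an empty list if it cannot be found.
    static List<String> readLines(File file) {
        try (Scanner input = new Scanner(file)) {
            return readAll(input);
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open " + file.getPath());
            return new ArrayList<String>();
        }
    }

    public static void main(String[] args) {
        // Uses the exercise's sample file name; it may not exist where this is run.
        List<String> records = readLines(new File("TestFile1.csv"));
        System.out.println("Records read: " + records.size());
    }
}
```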
Hints 1.d
public void parseFields() {
    String delimiter = ",\n";
    // Create two-dimensional array to hold data (see Deitel, p 313-315)
    int rows = records.size();  // #rows for array = #lines in file
    data = new String[rows][];  // create the rows for the array
    int row = 0;
    for (String record : records) {
        StringTokenizer tokens = new StringTokenizer(record,delimiter);
        int cols = tokens.countTokens();
        data[row] = new String[cols]; // create columns for current row
        int col = 0;
        while (tokens.hasMoreTokens()) {
            data[row][col] = tokens.nextToken();
            col++;
        }
        …
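A standalone variant of the parsing step is sketched below for experimentation. It departs from the hint in two ways, both assumptions: each token is trimmed (in the spirit of rule 3 for unquoted fields), and a maxNumFields helper is added. The class ParseFieldsDemo is hypothetical, and empty fields are still skipped because StringTokenizer is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class ParseFieldsDemo {
    // Splits each record on commas into a ragged array, trimming white space around fields.
    static String[][] parse(List<String> records) {
        String[][] data = new String[records.size()][];
        int row = 0;
        for (String record : records) {
            StringTokenizer tokens = new StringTokenizer(record, ",");
            data[row] = new String[tokens.countTokens()];
            for (int col = 0; tokens.hasMoreTokens(); col++) {
                data[row][col] = tokens.nextToken().trim();
            }
            row++;
        }
        return data;
    }

    // Returns the largest number of fields found in any record.
    static int maxNumFields(String[][] data) {
        int max = 0;
        for (String[] row : data) {
            max = Math.max(max, row.length);
        }
        return max;
    }

    public static void main(String[] args) {
        String[][] data = parse(Arrays.asList("123, John ,Dozer", "729, Brainfield"));
        for (String[] row : data) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println("Max fields: " + maxNumFields(data));
    }
}
```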
Hints 1.e
public static void main (String[] args) {
    CSVFile file1 = new CSVFile(true,"TestFile1.csv");
    file1.openFile();
    file1.readRecords();
    file1.parseFields();
    file1.printData();
    String fileData[][] =
        new String[file1.getNumRecords()][file1.getMaxNumFields()];
    file1.getData(fileData);
    …
CSV Libraries


http://ostermiller.org/utils/CSV.html
http://opencsv.sourceforge.net/