Transcript Slide Set 7

operating
systems
Kernel File Interface
(or programming I/O in Unix)
Input and Output
When programming in C on Unix, there are two
very different I/O libraries you can use:
The C language libraries:
o Buffered
o Part of the C language
o The basic unit is a FILE*
The Kernel I/O calls
o Unbuffered
o System calls – not part of C
o The basic unit is a File Descriptor
[Diagram: an application program can reach the kernel in two ways. It can call the Standard I/O Library (high level I/O, streams), which itself sits on top of low level I/O (file descriptors), or it can call low level I/O (file descriptors) directly.]
Standard C I/O
As in C++, the fundamental notion used
in doing I/O is the stream (but it is not an
object as it is in C++ ... it is a data structure)
When a file is created or opened in C, the
system associates a stream with the file.
When a stream is opened, the fopen( ) call
returns a pointer to a FILE data structure.
The FILE data structure contains all of the
information necessary for the I/O library to
manage the stream:
* a file descriptor
* a pointer to the I/O buffer
* error flags
* etc
The original C I/O library was written around 1975 by
Dennis Ritchie. Little has changed since then.
Standard Streams
Three streams are predefined and available to a
process. These standard streams are referenced
through the predefined FILE pointers stdin, stdout,
and stderr. These pointers are defined in <stdio.h>.
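A minimal sketch of how the standard streams are typically divided up: results go to stdout and diagnostics go to stderr. The helper name report_total and its parameters are made up for this illustration; the streams are passed in as arguments so the same code works with any FILE*.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print a result to `out`, or a diagnostic to `err`
   when the value is negative. Returns 0 on success, -1 on a bad value. */
int report_total(FILE *out, FILE *err, int total)
{
    assert(out != NULL && err != NULL);
    if (total < 0) {
        fprintf(err, "error: total %d is negative\n", total);
        return -1;
    }
    fprintf(out, "total = %d\n", total);
    return 0;
}
```

A typical call would be report_total(stdout, stderr, 42). Keeping diagnostics on stderr means they still reach the terminal when stdout is redirected to a file.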
Buffering I/O
One of the keys of the C I/O library is that I/O is
normally buffered to minimize the number of read and write system calls.
Fully Buffered: I/O takes place when a buffer is full.
Disk files are normally fully buffered. The buffer is
allocated by the I/O library itself by doing a malloc.
Line Buffered: I/O takes place when a new line
character is encountered. Line buffering is used for
terminal I/O. Note that I/O may take place before a
new line character is encountered because of the
size of the buffer.
Unbuffered:
No buffering is done. Data is output immediately.
Most Unix systems default to the following:
Standard Error is always un-buffered.
All streams referring to a terminal device are
line buffered (stdin and stdout).
All other streams are fully buffered.
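These defaults can be overridden for a single stream with the standard setvbuf function, which should be called after the stream is opened and before any other operation on it. The sketch below is only an illustration; the two helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: switch a freshly opened stream to line buffering.
   setvbuf returns 0 on success and nonzero on failure. Passing NULL as
   the buffer asks the library to allocate the buffer itself. */
int make_line_buffered(FILE *fp)
{
    assert(fp != NULL);
    return setvbuf(fp, NULL, _IOLBF, BUFSIZ);
}

/* Hypothetical helper: turn buffering off entirely, as for stderr. */
int make_unbuffered(FILE *fp)
{
    assert(fp != NULL);
    return setvbuf(fp, NULL, _IONBF, 0);
}
```

The third argument selects the mode: _IOFBF for fully buffered, _IOLBF for line buffered, and _IONBF for unbuffered.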
Flushing a Stream
You can force a stream to be flushed,
(all unwritten bytes are passed to the kernel)
#include <stdio.h>
int fflush (FILE *fp);
I’ve not seen an issue in Windows, but in Unix, you
may not see output when you expect to if you don’t
flush the buffers.
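A common case where this matters is a prompt that does not end in a newline: on a line-buffered terminal it may stay in the buffer until something else forces it out. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a prompt that does not end in a newline and
   flush it, so the bytes are passed to the kernel right away.
   Returns 0 on success, EOF on failure. */
int show_prompt(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    if (fputs(text, out) == EOF)
        return EOF;
    return fflush(out);
}
```

For example, show_prompt(stdout, "Name: ") followed by a read from stdin.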
fopen
#include <stdio.h>
FILE *fopen (const char *filename, const char *mode);
Returns: a pointer to the FILE structure holding the internal state
information about the connection to the associated file.
Returns a NULL pointer if the open fails.
filename: path to the file to be opened
mode: a string saying how the file is to be opened (for example "r")
When a file is opened for reading and writing:
* input cannot immediately follow output
  without an intervening fflush, fseek, fsetpos, or rewind.
* output cannot immediately follow input
  without an intervening fseek, fsetpos, or rewind.
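The two rules above can be sketched as follows for a stream opened for update (for example with "w+"). The helper name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `text` to a stream opened for update,
   then read the first line back into `out`.
   Returns 0 on success, -1 on failure. */
int write_then_read(FILE *fp, const char *text, char *out, int outsize)
{
    assert(fp != NULL && text != NULL && out != NULL && outsize > 0);
    if (fputs(text, fp) == EOF)
        return -1;
    /* Switching from output to input: reposition first. rewind moves to
       the start of the file; buffered output is written out first. */
    rewind(fp);
    if (fgets(out, outsize, fp) == NULL)
        return -1;
    /* Switching from input back to output: reposition again, here to
       the end of the file, so later writes are legal. */
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    return 0;
}
```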
Mode bits
"r"     open text file for reading
"rb"    open binary file for reading
"w"     open text file for writing - truncate
"wb"    open binary file for writing - truncate
"a"     open text file for writing - append
"ab"    open binary file for writing - append
"r+"    open text file to read & write (file must exist)
"rb+"   open binary file to read & write - ditto
"w+"    open text file to read & write - truncate
"wb+"   open binary file to read & write - truncate
"a+"    open text file to read & write - append
"ab+"   open binary file to read & write - append
Opening a Stream
Restriction                          r    w    a    r+   w+   a+
file must already exist              *              *
previous contents are discarded           *              *
stream can be read                   *              *    *    *
stream can be written                     *    *    *    *    *
stream can only be written at end              *              *
You cannot set permissions when a file is opened with w or a.
Example of using fopen
FILE *in;
if ((in = fopen("file1.txt", "r")) == NULL)
    perror("could not open file1.txt");
Related Calls
FILE *freopen (const char *pathname,
const char *mode, FILE *fp);
Opens a specified file on a specified stream. Closes the file first, if it is
already open. Most typically used with stdin, stdout, and stderr to open
a file as one of these streams.
FILE *fdopen (int filedes, const char *mode);
Takes a file descriptor as a parameter and associates an
I/O stream with it. Used with pipes and network
connections, because these are created as file
descriptors.
fclose
#include <stdio.h>
int fclose (FILE *stream);
Returns zero if the close is successful.
Otherwise it returns EOF.
All files are closed when the program terminates
normally, but this allows no opportunity to do error
recovery if termination is not normal. Therefore, it
is recommended that all files be closed explicitly.
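A short sketch of closing explicitly and checking the result. Because fclose flushes any buffered output, a write failure may only become visible at that point. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `text` to an open stream and close it.
   Returns 0 only if both the write and the close succeed.
   The stream must not be used after this call. */
int write_and_close(FILE *fp, const char *text)
{
    assert(fp != NULL && text != NULL);
    int status = 0;
    if (fputs(text, fp) == EOF)
        status = -1;
    /* fclose flushes buffered data, so an error such as a full disk
       may be reported here rather than by fputs. */
    if (fclose(fp) == EOF)
        status = -1;
    return status;
}
```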
Binary I/O
Binary I/O is commonly used to read or write arrays
or to read and write structures, because both deal with
fixed size blocks of information.
Note: Binary files are not necessarily
interchangeable across systems!
* compilers change how data is packed
* binary formats are different on different cpu
architectures.
Unformatted I/O
There are three types of unformatted I/O:
* Character at a time
* Line at a time
* Direct I/O (fread and fwrite for binary data)
Stream Positioning
#include <stdio.h>
long ftell (FILE *fp);
    returns the current byte offset or -1L
    (for binary files and text files on GNU systems)
int fseek (FILE *fp, long offset, int whence);
    returns 0 if successful, nonzero on error
    whence is one of:
    SEEK_SET – from beginning of file
    SEEK_CUR – from the current position
    SEEK_END – from the end of the file
void rewind (FILE *fp);
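As an illustration of these calls working together, the sketch below finds the size of a file by seeking to the end and asking for the offset. It assumes byte offsets as on Unix, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the size in bytes of the file behind `fp`,
   or -1L on error, leaving the current position unchanged. */
long file_size(FILE *fp)
{
    assert(fp != NULL);
    long here = ftell(fp);               /* remember where we are */
    if (here == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)    /* jump to the end */
        return -1L;
    long size = ftell(fp);               /* offset of the end is the size */
    if (fseek(fp, here, SEEK_SET) != 0)  /* go back to where we were */
        return -1L;
    return size;
}
```

This only works on seekable streams such as disk files; on a pipe or terminal the first ftell fails and the function returns -1L.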
For portability across systems use:
int fgetpos (FILE *fp, fpos_t *pos);
int fsetpos (FILE *fp, const fpos_t *pos);
Both return 0 if successful, nonzero on error.
The position is passed in the pos parameter; fpos_t is
a data type defined by the ISO C standard.
The position value in an fsetpos must have been
obtained in a previous fgetpos call.
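A minimal sketch of the save-and-restore pattern these two calls support: look at the next line, then put the stream back where it was. The helper name is hypothetical, and the stream is assumed to be seekable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the next line into `buf` without consuming
   it, by saving the position first and restoring it afterwards.
   Returns 0 on success, -1 on end of file or error. */
int peek_line(FILE *fp, char *buf, int n)
{
    assert(fp != NULL && buf != NULL && n > 0);
    fpos_t mark;
    if (fgetpos(fp, &mark) != 0)
        return -1;
    char *got = fgets(buf, n, fp);
    if (fsetpos(fp, &mark) != 0)    /* mark came from fgetpos above */
        return -1;
    return got != NULL ? 0 : -1;
}
```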
fread
fread is used to read binary data and
text in fixed sized blocks
#include <stdio.h>
size_t fread (void *ptr, size_t size, size_t nblocks,
              FILE *stream);
ptr: address of where the first byte is to be stored
size: the size of each block or record
nblocks: the number of blocks to read
stream: the stream to read from
Returns: the number of items read. It could be less than
nblocks if there is an error or eof is reached.
Interpreting Binary Data
If the data that you are reading from the file has
some record structure …
struct record_fmt
...
data_buf;
fread(&data_buf, sizeof(char), sizeof(data_buf), file_handle);
struct record_fmt
{
int a;
float b;
char id[8];
char pw[8];
};
data_buf
[Figure: the raw bits read from the file into data_buf, which are interpreted through the fields of struct record_fmt.]
cout << data_buf.id;
fwrite
#include <stdio.h>
size_t fwrite (const void *ptr, size_t size, size_t nblocks,
               FILE *stream);
ptr: address of the first byte to write
size: the size of each block or record
nblocks: the number of blocks to write
stream: the stream to write to
Returns: the number of blocks written. If not the
same as nblocks, some error has occurred.
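To show fwrite and fread used as a pair, here is a sketch that writes an array of records and reads it back. The struct repeats the record_fmt layout shown earlier; the two helper names are hypothetical. As noted above, a file written this way is only safe to read back on the same kind of system with the same compiler settings.

```c
#include <assert.h>
#include <stdio.h>

struct record_fmt {
    int   a;
    float b;
    char  id[8];
    char  pw[8];
};

/* Hypothetical helper: write `count` records; returns how many were written. */
size_t write_records(FILE *fp, const struct record_fmt *recs, size_t count)
{
    assert(fp != NULL && recs != NULL);
    return fwrite(recs, sizeof recs[0], count, fp);
}

/* Hypothetical helper: read up to `max` records; returns how many were read.
   A short count means end of file or an error; feof and ferror tell which. */
size_t read_records(FILE *fp, struct record_fmt *recs, size_t max)
{
    assert(fp != NULL && recs != NULL);
    return fread(recs, sizeof recs[0], max, fp);
}
```

Here the block size is one whole record and the count is the number of records, so the return value counts complete records.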
Character at a time Input
#include <stdio.h>
int fgetc (FILE *stream);
fgetc gets the next character in the stream as an unsigned char and
returns it as an int. If an eof or an error is encountered, the constant
EOF (usually -1) is returned instead. This call is guaranteed to be
written as a function.
Character at a time Input
int getc (FILE *stream);
highly optimized – best function for reading a
single character. Usually implemented as a
macro.
int getchar ( void );
Equivalent to getc(stdin)
In most implementations, each stream maintains
* an error flag
* an end-of-file flag
To distinguish between EOF and an error call one
of the following functions:
#include <stdio.h>
int ferror (FILE *fp);
returns nonzero (true) if error flag
is set, otherwise returns 0
int feof (FILE *fp);
returns nonzero (true) if eof flag
is set, otherwise returns 0
Clear the flags by calling
void clearerr (FILE *fp);
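A minimal sketch of the usual pattern: read until EOF is returned, then ask the stream which of the two flags caused it. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in `fp`.
   Returns 0 if the loop stopped at end of file, -1 if it stopped
   because of a read error. */
int count_bytes(FILE *fp, long *count)
{
    assert(fp != NULL && count != NULL);
    int c;                  /* int, not char, so EOF stays distinguishable */
    *count = 0;
    while ((c = getc(fp)) != EOF)
        (*count)++;
    /* getc returned EOF: ask the stream why. */
    if (ferror(fp)) {
        clearerr(fp);       /* reset the flags before any further use */
        return -1;
    }
    return 0;               /* the end-of-file flag is set here */
}
```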
After reading a character from a stream, it can
be pushed back into the stream.
#include <stdio.h>
int ungetc (int c, FILE *fp);
the character to push back. Note
that it is not required that you push
back the same character that you read.
You cannot pushback EOF.
Implementations are not required to support more than
a single character of pushback, so don’t count on it.
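Pushback is most often used to look one character ahead. A small sketch, with a hypothetical helper name, that relies on only the single character of pushback mentioned above:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the next character without consuming it,
   or EOF at end of file or on error. */
int peek_char(FILE *fp)
{
    assert(fp != NULL);
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);   /* put it back so the next read sees it again */
    return c;
}
```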
Character Output
int fputc (int c, FILE *stream);
fputc converts c to an unsigned char
and writes it to the stream. EOF is
returned if there is an error.
int putc (int c, FILE *stream);
optimized for single character output
int putchar( int c );
assumes stdout is the output stream
Line at a Time Input
#include <stdio.h>
char *fgets (char *buf, int n, FILE *fp);
Returns buf if successful and NULL on end of file or error.
Reads up through and including the next newline character,
but no more than n-1 characters. The buffer is terminated with
a null byte. If the line is longer than n-1, a partial line is returned.
The buffer is still null terminated. If the input contains a null byte,
you can’t tell where the data really ends.
char *gets (char *buf);
Warning
gets has been deprecated (and was removed in C11) because it does
not allow the size of the buffer to be specified.
This allows buffer overflow!
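A sketch of using fgets safely: the buffer size is passed in, the trailing newline is stripped, and a missing newline is reported as a partial line. The helper name and its return convention are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into `buf` and strip the newline.
   Returns 1 for a complete line, 0 if only part of a long line fit
   (or the last line had no newline), -1 on end of file or error. */
int read_line(FILE *fp, char *buf, int n)
{
    assert(fp != NULL && buf != NULL && n > 0);
    if (fgets(buf, n, fp) == NULL)
        return -1;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* fgets keeps the newline; drop it */
        return 1;
    }
    return 0;                   /* no newline seen: a partial line */
}
```

When 0 is returned, calling the function again continues with the rest of the same input line.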
String Output
#include <stdio.h>
int fputs (const char *str, FILE *fp);
writes a null-terminated string to the stream.
It does not write the null terminating character.
It does not write a newline character. Returns EOF
if the function fails.
int puts (const char *str);
writes the null-terminated string to standard out, followed by a
new-line character. The terminating null character is not written.
If successful, the function returns a non-negative value.
If the function fails, it returns EOF.
I/O Efficiency
Char at a time
#include <stdio.h>
#include <stdlib.h>   /* for exit */
int main (void)
{
    int c;
    while ( (c = getc(stdin)) != EOF)
        if (putc(c, stdout) == EOF)
            perror("Error writing output");
    if (ferror(stdin))
        perror("Error reading input");
    exit(0);
}
At a terminal, ctrl-D signals end of file.
Line at a time
#include <stdio.h>
#include <stdlib.h>   /* for exit */
#define MAXLINE 4096
int main (void)
{
    char buf[MAXLINE];
    while (fgets(buf, MAXLINE, stdin) != NULL)
        if (fputs(buf, stdout) == EOF)
            perror("Output Error");
    if (ferror(stdin))
        perror("Input Error");
    exit(0);
}
for copying a file of 1.5M bytes in 30,000 lines
Function        user CPU       loop is executed
fgets, fputs    2.2 seconds    30,000 times
getc, putc      4.3 seconds    1.5M times
fgetc, fputc    4.6 seconds    1.5M times
Formatted Output
int printf (const char *format-spec, print-data … );
    writes to stdout
int fprintf (FILE *fp, const char *format-spec, print-data … );
    writes to the stream fp
int sprintf (char *s, const char *format-spec, print-data … );
    writes to the buffer s and appends a null byte at the end.
A format-specification has the following format:
%[flags] [width] [.precision] type
%  introduces the format-spec
flags:
  -        left align, default is to right align
  +        prefix value with a sign
  0        pad output with zeros
  (blank)  prefix positive values with a blank
width: minimum field width. If width is prefixed with 0,
  add zeros until minimum width is reached.
.precision: digits after decimal point. This can truncate data.
type:
  d  signed decimal integer
  i  signed decimal integer
  u  unsigned decimal integer
  o  unsigned octal integer
  x  unsigned hex integer
  f  double in fixed point notation
  e  double in exponent notation
  c  single character, an int
  s  a string
Example Format Specification
"%-10.8f"
%     introduces the format specification
-     left justify the output
10    output field is 10 chars wide as a minimum. Padded if fewer
      characters in the output. Data is never truncated.
.8    print 8 digits after the decimal point
f     double in fixed point notation
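The effect of flags, width, and precision is easiest to see by formatting into a buffer and inspecting the result. The sketch below uses snprintf, the size-limited relative of sprintf from C99, rather than sprintf itself; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format `value` with "%-10.3f" into `out`.
   Returns the number of characters the full result needs. */
int format_left(char *out, size_t size, double value)
{
    assert(out != NULL && size > 0);
    return snprintf(out, size, "%-10.3f", value);
}

/* Hypothetical helper: format `value` with "%05d" into `out`. */
int format_zero_padded(char *out, size_t size, int value)
{
    assert(out != NULL && size > 0);
    return snprintf(out, size, "%05d", value);
}
```

With these, 3.14159 becomes "3.142" followed by five blanks (left justified in a field of 10), and 42 becomes "00042". A value that needs more than 10 characters is printed in full, because the width is only a minimum.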
Example
int n = 3;
double cost_per_item = 3.25;
printf("Cost of %3d items at $%4.2f each = $%6.2f\n",
       n, cost_per_item, n*cost_per_item);
Output:
Cost of   3 items at $3.25 each = $  9.75
%3d    first field is 3 characters wide, data is right justified
%4.2f  second field is 4 characters wide with two characters after decimal point
%6.2f  third field is 6 characters wide with 2 characters after decimal point, right justified
Formatted Input
#include <stdio.h>
int scanf (const char* format-spec, data fields);
int fscanf (FILE *fp, const char *format-spec, data fields);
int sscanf (const char *buf, const char *format-spec,
data fields);
scanf reads formatted data from stdin into the data fields
given in the argument list. Each argument must be a
pointer to a variable that corresponds to a type specifier
in the format specification.
The format specification can contain:
* white space characters. A white space character causes scanf
to read in, but not store all consecutive white space characters
in the input stream, up to the next non-white space character.
* non-white space characters, except % sign. Causes scanf to
read but not store a matching non-white space character. If the
character does not match, scanf terminates.
* format specification, introduced by %. Causes scanf to read in
and convert characters in the input into values of the specified type.
The resulting value is assigned to the next data field in the arg list.
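The three kinds of format items can be seen together in a small sscanf sketch. The helper name and the input format ("3, 4") are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse text such as "3, 4" into two ints.
   Returns 1 if both values were converted, 0 otherwise. */
int parse_pair(const char *text, int *x, int *y)
{
    assert(text != NULL && x != NULL && y != NULL);
    /* "%d"  converts an integer (leading white space is skipped)
       " "   skips any white space before the comma
       ","   must match a literal comma in the input */
    int converted = sscanf(text, "%d ,%d", x, y);
    return converted == 2;
}
```

The return value of the scanf family is the number of fields converted and assigned, so comparing it with the number of fields expected is the usual way to detect malformed input.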
Temporary Files
#include <stdio.h>
FILE *tmpfile (void);
creates a temporary file (type wb+) that is
automatically deleted when the file is closed
or the program terminates.
Sample Program
Write a simple version of the cat command.
It takes an optional parameter, a file name.
It copies the file to stdout.
- if no file name is given, it copies stdin to stdout
Preliminaries
#include <stdio.h>     /* header file for I/O */
#include <stdlib.h>    /* defines NULL and declares exit( ) */
#define LINELEN 256
void send_to_stdout( FILE*);    /* function prototype */
C programmers use #define to define constants. It works like a
macro … the value 256 gets inserted wherever the name
LINELEN appears in the code. There is no type checking!
Main declaration
int main (int argc, char* argv[ ])
{
    ...
}
argc: the number of arguments on the command line
argv: array contains the command line arguments
Body of main
int main (int argc, char* argv[ ])
{
    /* Declare a FILE* to hold the file handle */
    FILE *fp;
    /* If there is just one command line argument
       it is the command. Copy from stdin. */
    if (argc == 1)
        send_to_stdout ( stdin);
int main (int argc, char* argv[ ])
{
    FILE *fp;
    if (argc == 1)
        send_to_stdout ( stdin);
    /* If there are two command line arguments,
       the second one is the file name. */
    else if (argc == 2)
    {
        if ( (fp = fopen(*++argv, "r") ) != NULL)
        {
            send_to_stdout ( fp );
            fclose ( fp );
        }
int main (int argc, char* argv[ ])
{
    FILE *fp;
    if (argc == 1)
        send_to_stdout ( stdin);
    else if (argc == 2)
    {
        if ( (fp = fopen(*++argv, "r") ) != NULL)
        {
            send_to_stdout ( fp );
            fclose ( fp );
        }
        else
        {
            /* handle file won't open error */
            perror("could not open the file.");
            exit(1);
        }
    }
    /* Handle the case where there are too many
       arguments on the command line. */
    else
    {
        /* no system call failed here, so errno is not meaningful;
           write the message directly instead of using perror */
        fprintf(stderr, "Invalid command - too many arguments\n");
        exit(1);
    }
    return 0;
}
send_to_stdout function
void send_to_stdout(FILE *fp)
{
    char line[LINELEN];
    while ( fgets (line, LINELEN, fp) )
    {
        if (fputs ( line, stdout ) == EOF )
        {
            perror("Write to stdout failed");
            exit(1);
        }
    }
}