Transcript FileSystem

Other filesystem system calls
• pipe
• dup
• mount
• umount
• link
• unlink
• system
• popen
Operating System Design
File system 3.1
Laface 2007
Unnamed and named pipes
• Pipes and FIFOs (also known as named pipes) provide a unidirectional interprocess communication channel.
• The difference between them is the manner in which they are created and opened.
• I/O on pipes and FIFOs has exactly the same well-known bounded-buffer producer-consumer semantics.
[Figure: pipe buffer drawn as an array of slots numbered 0 to 9, with a write pointer and a read pointer.]
Unnamed pipe
[Figure: process tree of processes A to E. The process that calls pipe() and its descendants share the pipe; a process outside that subtree cannot share it.]
Unnamed pipe
pipe (fdptr);
• fdptr is a pointer to an array of two ints that will be filled with two file descriptors, used to read and write the unnamed pipe.
• An inode is assigned to the new pipe.
• Two entries in the user file descriptor table, and two in the file table, are allocated.
• The inode reference count indicates how many times the pipe has been opened, for reading and for writing (initially 2).
• The kernel stores the read and write reference counts in each file table entry.
• The inode also holds the offsets for the next read and the next write (they cannot be modified by means of lseek).
• Storing the offsets in the inode rather than in the file table allows more than one process to share the pipe for both reading and writing (every process updates the same offsets).
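As an illustration of the two descriptors returned by pipe, here is a minimal sketch in which a child process writes a message and the parent reads it back. The helper name pipe_roundtrip is hypothetical and not part of the course material.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg from a child process to the parent through an unnamed pipe.
 * Returns the number of bytes the parent read, or -1 on error.
 * pipe_roundtrip is a hypothetical helper name used for illustration. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsize)
{
    int fds[2];                     /* fds[0]: read end, fds[1]: write end */

    if (outsize == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: writer */
        close(fds[0]);              /* the child does not read */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);                  /* parent: reader, closes its write end */
    ssize_t total = 0, n;
    while ((size_t)total < outsize - 1 &&
           (n = read(fds[0], out + total, outsize - 1 - (size_t)total)) > 0)
        total += n;                 /* read returns 0 once the child closes */
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

The pipe works across fork because the child inherits both descriptors; each side closes the end it does not use.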
read and write
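#include <unistd.h>

char string[] = "hello";

int main(void)
{
    char buf[1024];
    char *cp1, *cp2;
    int fds[2];

    /* copy "hello" into buf, including the terminating '\0' (6 bytes) */
    cp1 = string;
    cp2 = buf;
    while (*cp1)
        *cp2++ = *cp1++;
    *cp2 = '\0';

    pipe(fds);
    for (;;) {
        /* the same process writes and then reads, so it never blocks */
        write(fds[1], buf, 6);
        read(fds[0], buf, 6);
    }
}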
Pipe open
• A process opening a FIFO for reading is suspended until another process opens it for writing (and vice versa)
• It is possible to open a FIFO using the flags
• O_NONBLOCK or O_NDELAY
• O_ASYNC
− Setting the O_ASYNC flag for the read end of a pipe causes a signal (SIGIO by default) to be generated when new input becomes available on the pipe
• Non-blocking I/O is also possible by using the fcntl F_SETFL operation to enable the O_NONBLOCK open file status flag.
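The fcntl route can be sketched as follows. The helper names set_nonblocking and errno_of_empty_read are hypothetical; the second one only demonstrates that a read on an empty non-blocking pipe fails instead of suspending the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Adds O_NONBLOCK to the open file status flags of fd.
 * Returns 0 on success, -1 on error. (Hypothetical helper name.) */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Reads once from an empty non-blocking pipe.
 * Returns the errno value of the failed read, 0 if the read did not
 * fail, or -1 if the pipe could not be set up. */
int errno_of_empty_read(void)
{
    int fds[2];
    char c;
    int result = -1;

    if (pipe(fds) == -1)
        return -1;
    if (set_nonblocking(fds[0]) == 0)
        result = (read(fds[0], &c, 1) == -1) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Reading the current flags with F_GETFL first preserves any status flags that were already set on the descriptor.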
Named pipe
Example of an open that blocks the calling process until another process opens the other end of a named pipe:
npipe-r.c
npipe-w.c
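The files npipe-r.c and npipe-w.c are not reproduced in this transcript. The following is an independent minimal sketch of the same idea, with reader and writer in one program: each open blocks until the other side has opened the FIFO. The name fifo_roundtrip and the FIFO path passed to it are hypothetical.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a FIFO at path, lets a child write msg into it and reads it
 * back in the parent. Returns the bytes read, or -1 on error.
 * (Hypothetical helper, not one of the course files.) */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outsize)
{
    if (outsize == 0 || mkfifo(path, 0600) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                          /* child: writer */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd == -1 || write(wfd, msg, strlen(msg)) < 0)
            _exit(1);
        close(wfd);
        _exit(0);
    }

    ssize_t total = -1;
    int rfd = open(path, O_RDONLY);          /* blocks until a writer opens */
    if (rfd == -1) {
        kill(pid, SIGKILL);                  /* do not leave the child blocked */
    } else {
        ssize_t n;
        total = 0;
        while ((size_t)total < outsize - 1 &&
               (n = read(rfd, out + total, outsize - 1 - (size_t)total)) > 0)
            total += n;
        out[total] = '\0';
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);                            /* remove the FIFO name */
    return total;
}
```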
Pipe write
• Writes of at most PIPE_BUF bytes (4 KB on Linux) are atomic
n <= PIPE_BUF
• O_NONBLOCK disabled
– All n bytes are written atomically;
– write may block if there is not room for n bytes to be written immediately.
• O_NONBLOCK enabled
– If there is room to write n bytes to the pipe, then write succeeds immediately, writing all n bytes;
– otherwise it fails, with errno set to EAGAIN.
Pipe write
• Writes of more than PIPE_BUF bytes may be non-atomic
n > PIPE_BUF
• O_NONBLOCK disabled
– the write is non-atomic: the data given to write may be interleaved with writes by other processes;
– the write blocks until n bytes have been written.
• O_NONBLOCK enabled
– If the pipe is full, then write fails, with errno set to EAGAIN.
– Otherwise, a "partial write" of up to n bytes may occur, and these bytes may be interleaved with writes by other processes.
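The non-blocking case with n <= PIPE_BUF can be observed with a small experiment: keep writing fixed-size chunks to a pipe that nobody reads until write fails. This is a sketch under assumptions: the helper name fill_pipe is hypothetical, and the chunk size of 512 bytes is chosen because POSIX guarantees PIPE_BUF is at least 512.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 512   /* assumed chunk size; never larger than PIPE_BUF */

/* Writes CHUNK-byte blocks to a non-blocking pipe that is never read.
 * Returns the number of bytes accepted before write failed with EAGAIN,
 * or -1 on any other outcome. (Hypothetical helper name.) */
long fill_pipe(void)
{
    int fds[2];
    char chunk[CHUNK];
    long total = 0;
    ssize_t n;

    memset(chunk, 'x', sizeof chunk);
    if (pipe(fds) == -1)
        return -1;

    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* each write is all-or-nothing because CHUNK <= PIPE_BUF */
    while ((n = write(fds[1], chunk, sizeof chunk)) > 0)
        total += n;
    int saved = errno;              /* close may overwrite errno */

    close(fds[0]);
    close(fds[1]);
    return (n == -1 && saved == EAGAIN) ? total : -1;
}
```

The returned value is the pipe capacity seen by this writer; it is system dependent, so the sketch does not assume a particular number.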
Pipe close
• If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read from the pipe returns 0 (end-of-file).
• If all file descriptors referring to the read end of a pipe have been closed, then a write will cause a SIGPIPE signal to be generated for the calling process.
• If the calling process is ignoring this signal, then write fails with the error EPIPE.
• An application that uses pipe and fork should close unnecessary duplicate file descriptors to ensure that end-of-file and SIGPIPE/EPIPE are delivered when appropriate.
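Both rules can be checked in a single process. The two helper names below are hypothetical; the second one ignores SIGPIPE temporarily so that the failed write reports EPIPE instead of terminating the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns the result of read() on a pipe whose write end is closed
 * (0, i.e. end-of-file, is expected), or -1 on error. */
ssize_t read_after_writer_closed(void)
{
    int fds[2];
    char c;

    if (pipe(fds) == -1)
        return -1;
    close(fds[1]);                      /* no writer is left */
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    return n;
}

/* Writes to a pipe whose read end is closed while SIGPIPE is ignored.
 * Returns the errno value of the failed write (EPIPE is expected),
 * 0 if the write succeeded, or -1 if the pipe could not be created. */
int errno_after_reader_closed(void)
{
    int fds[2];

    if (pipe(fds) == -1)
        return -1;
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    close(fds[0]);                      /* no reader is left */
    int result = (write(fds[1], "x", 1) == -1) ? errno : 0;
    close(fds[1]);
    if (old != SIG_ERR)
        signal(SIGPIPE, old);           /* restore the previous disposition */
    return result;
}
```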
Other examples of pipe use
• Client-server with named pipe
– pipe1.c
– client_fifo.c
– server-fifo.c
• Use of pipe, dup, exec, getenv
– pipe2.c
Other examples of pipe use
Client- server using an unnamed pipe (pipe1.c server1.c client1.c)
[Figure: PARENT and CHILD connected by two pipes, PIPE1 and PIPE2; the ends of each pipe are labelled 1 (write) and 0 (read).]
dup – dup2
newfd = dup (fd);
• Duplicates the fd pointer in the first free entry of the user file descriptor table
newfd = dup2 (fd1, fd2);
• Duplicates the fd1 pointer in the fd2 entry of the user file descriptor table
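Because dup copies the pointer to the same file table entry, the original and the duplicate share one file offset. The sketch below checks this on a temporary file; the helper name second_byte_via_dup and the /tmp template are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes "ab" to a temporary file, then reads one byte through fd and
 * one byte through dup(fd). Returns the byte read through the duplicate
 * ('b' when the offset is shared), or -1 on error. */
int second_byte_via_dup(void)
{
    char path[] = "/tmp/dupdemoXXXXXX";  /* assumed location for the demo */
    char c1, c2;
    int result = -1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                        /* the file lives while it is open */

    int fd2 = dup(fd);                   /* same file table entry as fd */
    if (fd2 != -1 &&
        write(fd, "ab", 2) == 2 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, &c1, 1) == 1 &&         /* reads 'a', offset becomes 1 */
        read(fd2, &c2, 1) == 1)          /* continues from offset 1 */
        result = (unsigned char)c2;

    if (fd2 != -1)
        close(fd2);
    close(fd);
    return result;
}
```

Opening the same file twice with open would instead create two file table entries, each with its own offset.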
Comparison between open and dup
[Figure: user file descriptor table (entries 0 to 6), file table and inode table. File table entries have reference counts C=2, C=1 and C=1; inode table entries have C=3 (/etc/passwd) and C=1 (local).]
dup example
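#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int i, j;
    char buf1[512], buf2[512];

    i = open("/etc/passwd", O_RDONLY);
    j = dup(i);                     /* i and j share one file table entry */
    read(i, buf1, sizeof(buf1));    /* first 512 bytes */
    read(j, buf2, sizeof(buf2));    /* next 512 bytes: the offset is shared */
    close(i);
    read(j, buf2, sizeof(buf2));    /* still valid: j keeps the file open */
    return 0;
}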
Output redirection
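fd = open("file_output", O_CREAT | O_WRONLY, 0644);  /* O_CREAT needs a mode argument */
close(1);                     /* free descriptor 1 (standard output) */
dup(fd);                      /* the lowest free descriptor is 1: it now refers to file_output */
close(fd);
write(1, buf, sizeof(buf));   /* goes to file_output */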
[Figure: user file descriptor table, entries 0 to 4, before and after the redirection; the entry freed by close(fd) is Null.]
mount
mount (dev pathname, dir pathname, options);
• dev pathname is
– the name of the device special file corresponding to the disk partition formatted with a file system
– a directory name
• dir pathname is the directory (mount point), in the current directory tree, where the filesystem will be mounted.
• options indicates the mode of mounting (e.g. read-only)
mount
[Figure: two directory trees.
Root file system: / contains bin (cc, date, sh), etc (getty, passwd) and usr.
File system on /dev/dsk1: / contains bin (awk, banner, yacc), include (stdio.h) and src (uts).]
Mount table
[Figure: data structures after mount. Inode table: the mounted-on inode (marked as mount point, reference count = 1) and the root inode of the mounted file system (reference count = 1). Mount table entry: buffer holding the superblock, pointer to the mounted-on inode, pointer to the root inode.]
mount
algorithm mount
input:  file name of a block special file
        directory name of the mount point
        options (read-only)
output: none
{
    if (not superuser)
        return (error);
    get the inode of the block special file (namei);
    make legality checks;
    get the inode of the "mounted on" directory name (namei);
    if (not a directory or reference count > 1) {
        release the inodes (iput);
        return (error);
    }
    find a free entry in the mount table;
    open the block device;
    get a free buffer from the buffer cache (getblk);
    read the superblock into the buffer;
    initialize the superblock fields;
    get the root inode of the new filesystem (iget) and store it in the mount table;
    mark the inode of the "mounted on" directory as a mount point;
    release the inode of the special file (iput);
    unlock the in-memory inode of the mount point directory;
}
umount
umount (special file name);
• Before unmounting a filesystem, the kernel checks that no file on it is still in use (open), by searching the inode table for files whose device field equals the device of the filesystem being unmounted.
Virtual File System
[Figure: Virtual File System. Generic inodes point to file-system-specific inodes (Unix, PCFS, Remote); each file system type supplies its own operations: open, close, read, write, ... for Unix, and ropen, rclose, rread, rwrite for Remote.]
link (source name, target name);
link("/usr/src/uts/sys", "/usr/include/sys");
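link("/usr/include/realfile.h", "/usr/src/uts/sys/testfile.h");
[Figure: directory tree after the two link calls. /usr/src/uts/sys contains inode.h and testfile.h; /usr/include contains sys (the same directory as /usr/src/uts/sys) and realfile.h (the same file as testfile.h).]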
unlink (pathname)
• Deletes a name from the filesystem.
• If that name was the last link to a file and no process has the file open, the file is deleted (reference count and link count = 0).
• If the name was the last link to a file, but a process still has the file open (reference count > 0), the file remains in existence until the last file descriptor referring to it is closed.
• If the name referred to a symbolic link, the link is removed.
• If the name referred to a socket, FIFO or device, the name is removed, but processes which have the object open may continue to use it.
unlink
unlink (pathname);
• The kernel releases the file blocks in this order:
– Direct blocks
– Direct blocks pointed to by indirect blocks
– Indirect blocks
• Sets to 0 the block entries in the inode
• Sets the file size to 0
• Updates the disk copy of the inode
unlink
get the inode of the file to be removed (iget);
update the parent directory;
set to 0 the status field of the inode of the removed file;
release the inode of the parent directory (iput);
decrement the file link count;
release the file inode (iput);
// iput tests the link count: if it is zero, it calls free and ifree
unlink - close
• A process can unlink a file while it, or another process, still has the file open.
• Any process that has the file open can still access it: since open incremented the file's inode reference count, the kernel does not remove the data blocks and the inode; it just decrements the link count.
• When the last close system call is executed, the reference count becomes 0 and close calls free and ifree.
Example with unlink – stat – fstat
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int fd;
    ssize_t n;
    char buf[1024];
    struct stat statbuf;

    if (argc != 2) exit(1);
    if ((fd = open(argv[1], O_RDONLY)) == -1) exit(1);
    unlink(argv[1]);                        /* unlink of the file just opened */
    if (stat(argv[1], &statbuf) == -1)      /* stat by name */
        printf("stat %s fails\n", argv[1]);
    else
        printf("stat %s succeeds !!!\n", argv[1]);
    if (fstat(fd, &statbuf) == -1)          /* stat through fd */
        printf("fstat %s fails\n", argv[1]);
    else
        printf("fstat %s succeeds\n", argv[1]);
    fflush(stdout);                         /* keep stdio output ahead of write */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(1, buf, n);                   /* print up to 1 KB at a time */
    return 0;
}
Advisory / Mandatory locking
• Advisory locking (flock in BSD, fcntl in POSIX)
– read and write are protected by an access protocol
• Mandatory locking (non-POSIX)
– kernel-managed locking
• Behaviour similar to the Readers & Writers problem
– shared or read lock → excludes other writes
– exclusive or write lock → excludes other reads and writes
File locking
• A set of processes open a file that stores a sequence number
• The processes
– read the sequence number
– print their process identifier followed by the sequence number
– increment and store the sequence number
File locking example
/* Concurrent processes updating the same file */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>
#define SEQFILE "seqno"
#define MAXBUFF 100
/* err_sys (not shown) is assumed to print an error message and terminate;
   my_lock and my_unlock are defined in the following slides */
int main(void) {
    int fd, i, n, pid, seqno;
    char buff[MAXBUFF+1];
    pid = getpid();
    if ((fd = open(SEQFILE, O_RDWR)) < 0)
        err_sys("can't open %s", SEQFILE);
    for (i = 0; i < 10; i++) {
        my_lock(fd);
        lseek(fd, 0L, SEEK_SET);            /* rewind before reading */
        if ((n = read(fd, buff, MAXBUFF)) <= 0) err_sys("read error");
        buff[n] = '\0';
        if ((n = sscanf(buff, "%d\n", &seqno)) != 1)
            err_sys("sscanf error");
        printf("pid = %d, seq# = %d\n", pid, seqno);
        seqno++;
        sprintf(buff, "%03d\n", seqno);
        n = strlen(buff);
        lseek(fd, 0L, SEEK_SET);            /* rewind before writing */
        if (write(fd, buff, n) != n) err_sys("write error");
        my_unlock(fd);
    }
    return 0;
}
No locking errors
/* no locking at all: both functions do nothing */
void my_lock(int fd)
{
    (void)fd;
}
void my_unlock(int fd)
{
    (void)fd;
}
pid = 692, seq# = 0
pid = 692, seq# = 1
pid = 693, seq# = 0
pid = 692, seq# = 2
pid = 692, seq# = 3
pid = 693, seq# = 1
pid = 692, seq# = 4
pid = 692, seq# = 5
pid = 693, seq# = 2
pid = 692, seq# = 6
pid = 692, seq# = 7
pid = 693, seq# = 2
pid = 692, seq# = 6
pid = 692, seq# = 7
pid = 693, seq# = 3
pid = 693, seq# = 4
pid = 693, seq# = 5
pid = 693, seq# = 6
pid = 693, seq# = 7
pid = 693, seq# = 8
BSD file locking operations
LOCK_SH    read
LOCK_EX    write
LOCK_UN    unlock
LOCK_NB    no_blocking
BSD 4.3 solution (flock)
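/* BSD 4.3 */
#include <sys/file.h>
#include <unistd.h>
/* err_sys (not shown) is assumed to print an error message and terminate */
void my_lock(int fd)
{
    lseek(fd, 0L, SEEK_SET);
    if (flock(fd, LOCK_EX) == -1)       /* blocks until the lock is granted */
        err_sys("can't LOCK_EX");
}
void my_unlock(int fd)
{
    if (flock(fd, LOCK_UN) == -1)       /* flock takes exactly two arguments */
        err_sys("can't LOCK_UN");
}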
pid = 1165, seq# = 0
pid = 1165, seq# = 1
pid = 1165, seq# = 2
pid = 1165, seq# = 3
pid = 1165, seq# = 4
pid = 1164, seq# = 5
pid = 1164, seq# = 6
pid = 1165, seq# = 7
pid = 1164, seq# = 8
pid = 1165, seq# = 9
pid = 1164, seq# = 10
pid = 1165, seq# = 11
pid = 1164, seq# = 12
pid = 1165, seq# = 13
pid = 1164, seq# = 14
pid = 1165, seq# = 15
pid = 1164, seq# = 16
pid = 1164, seq# = 17
pid = 1164, seq# = 18
pid = 1164, seq# = 19
Advisory locking
int fcntl(int fd, int cmd, struct flock *lock);
• F_GETLK, F_SETLK and F_SETLKW are used to acquire, release, and test for the existence of record locks
struct flock {
    short l_type;    // Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
    short l_whence;  // SEEK_SET, SEEK_CUR, SEEK_END
    off_t l_start;   // Starting offset for lock
    off_t l_len;     // Number of bytes to lock
    pid_t l_pid;     // PID of process blocking the lock (F_GETLK only)
};
• Bytes past the end of the file may be locked, but not bytes before the start of the file.
• Specifying 0 for l_len has a special meaning: lock all bytes starting at the location specified by l_whence and l_start through to the end of file, no matter how large the file grows.
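A minimal sketch of filling struct flock to lock a whole file, using the l_len = 0 convention described above. The helper name lock_whole_file is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Applies a lock of the given type (F_RDLCK, F_WRLCK or F_UNLCK) to the
 * whole file, waiting if a conflicting lock is held by another process.
 * Returns 0 on success, -1 on error. (Hypothetical helper name.) */
int lock_whole_file(int fd, short type)
{
    struct flock fl = {
        .l_type   = type,
        .l_whence = SEEK_SET,   /* l_start is relative to the file start */
        .l_start  = 0,
        .l_len    = 0,          /* 0: through end of file, however it grows */
    };
    return fcntl(fd, F_SETLKW, &fl) == -1 ? -1 : 0;
}
```

With this helper, a POSIX version of the sequence-number example could implement my_lock as lock_whole_file(fd, F_WRLCK) and my_unlock as lock_whole_file(fd, F_UNLCK).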
Record locking: F_SETLK, F_SETLKW
• F_SETLK, F_SETLKW
– Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock (when l_type is F_UNLCK).
→ F_SETLK: if a conflicting lock is held by another process, the call returns -1 and sets errno to EACCES or EAGAIN
→ F_SETLKW: if a conflicting lock is held by another process, the call waits until the lock is released
Record locking: F_GETLK
• F_GETLK
– On input to this call, lock describes a lock we would like to place on the file.
→ If the lock could be placed, fcntl does not actually place it, but returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged.
→ If one or more incompatible locks would prevent this lock being placed, then fcntl returns details about one of these locks in the l_type, l_whence, l_start, and l_len fields of lock and sets l_pid to be the PID of the process holding that lock.
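F_GETLK only reports locks held by other processes, so a demonstration needs two of them. In this sketch a child write-locks the first 10 bytes of the file and the parent then probes the same region. The helper name getlk_sees_child_lock is hypothetical, and fd is assumed to be open for reading and writing.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if F_GETLK reports the child as holder of a conflicting lock,
 * 0 if no conflict is reported, -1 on error. */
int getlk_sees_child_lock(int fd)
{
    int ready[2];                        /* child tells parent the lock is set */
    char c;

    if (pipe(ready) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: take the lock and hold it */
        struct flock wl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 10 };
        close(ready[0]);
        if (fcntl(fd, F_SETLK, &wl) == -1 || write(ready[1], "r", 1) != 1)
            _exit(1);
        pause();                         /* keep the lock until killed */
        _exit(0);
    }

    close(ready[1]);
    int result = -1;
    if (read(ready[0], &c, 1) == 1) {    /* the child now holds the lock */
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                               .l_start = 0, .l_len = 10 };
        if (fcntl(fd, F_GETLK, &probe) == 0)
            result = (probe.l_type != F_UNLCK && probe.l_pid == pid) ? 1 : 0;
    }
    close(ready[0]);
    kill(pid, SIGKILL);                  /* terminating the child drops its lock */
    waitpid(pid, NULL, 0);
    return result;
}
```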
Record locking
• Record locks are automatically released when the process terminates or if it closes any file descriptor referring to a file on which locks are held.
– A process can lose its locks on a file when some library function it calls decides to open, read and close that file.
• Record locks are not inherited by a child created via fork, but are preserved across an execve.
• Because of the buffering performed by the stdio library, avoid the use of record locking with stdio functions; use read and write instead.
POSIX file & record locking
#include <unistd.h>
int lockf(int fd, int cmd, off_t len);
• lockf in Linux is just an interface to fcntl
Mandatory locking (Non-POSIX)
• Mandatory locks are enforced for all processes.
• If a process tries to perform an incompatible access on a file region that has an incompatible mandatory lock, then the result depends upon whether the O_NONBLOCK flag is enabled for its open file description.
– If the O_NONBLOCK flag is not enabled, then the system call is blocked until the lock is removed or converted to a mode that is compatible with the access.
– If the O_NONBLOCK flag is enabled, then the system call fails with the error EAGAIN or EWOULDBLOCK.
• To make use of mandatory locks, mandatory locking must be enabled both on the file system that contains the file to be locked, and on the file itself.
• Mandatory locking is enabled on a file system using the "-o mand" option to mount, or the MS_MANDLOCK flag for the mount system call.
• Mandatory locking is enabled on a file by disabling group execute permission on the file and enabling the set-group-ID permission bit (octal 02000).
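The per-file step can be sketched with fstat and fchmod. The helper name mark_for_mandatory_locking is hypothetical, and the sketch only changes the permission bits; whether the kernel then enforces the locks still depends on the file system being mounted as described above.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Clears group execute permission and sets the set-group-ID bit on the
 * file referred to by fd. Returns 0 on success, -1 on error.
 * (Hypothetical helper name.) */
int mark_for_mandatory_locking(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    /* keep the other permission bits, drop S_IXGRP, add S_ISGID (02000) */
    mode_t mode = (st.st_mode & 07777 & ~(mode_t)S_IXGRP) | S_ISGID;
    return fchmod(fd, mode) == -1 ? -1 : 0;
}
```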
Record locking examples
flock [-h] [-s start] [-l len] [-w|-r] filename
-h          print this help
-s start    region starting byte
-l len      region length (0 means all file)
-w          write lock
-r          read lock
-b          block when locking impossible
-f          enable BSD semantic
Record locking examples
• flock -r flock.c
• flock -w flock.c
• flock -r flock.c
• flock -w flock.c
• flock -w -s0 -l10 flock.c
• flock -r -s0 -l10 flock.c
• flock -w -s5 -l15 flock.c
• flock -w -s11 -l15 flock.c
• flock -r -s10 -l20 flock.c
Blocking record locking
• flock -r -b -s0 -l10 flock.c
• flock -w -s0 -l10 flock.c
• Warning!! BSD and POSIX file locking structures are independent
• flock -r -b -s0 -l10 flock.c
• flock -f -w flock.c (BSD)
Use of link for locking
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define LOCKFILE "seqno.lock"
/* err_sys (not shown) is assumed to print an error message and terminate */
void my_lock(int fd) {
    int tempfd; char tempfile[30];
    sprintf(tempfile, "LCK%d", getpid());
    /* create a private temporary file, then try to link it to the lock name */
    if ((tempfd = creat(tempfile, 0444)) < 0)
        err_sys("can't creat temp file");
    close(tempfd);
    while (link(tempfile, LOCKFILE) < 0) {      /* link fails if LOCKFILE exists */
        if (errno != EEXIST) err_sys("Link error");
        sleep(1);                               /* lock held elsewhere: retry */
    }
    if (unlink(tempfile) < 0) err_sys("Unlink error for temp file");
}
void my_unlock(int fd) {
    if (unlink(LOCKFILE) < 0) err_sys("Unlink error for LOCKFILE");
}
tmpfile and mkstemp
#include <stdio.h>
FILE *tmpfile(void);
• Opens a unique temporary file in binary read/write (w+b) mode.
• The file will be automatically deleted when it is closed or the program terminates.
#include <stdlib.h>
int mkstemp(char *template);
• Generates a unique temporary filename from template, then creates and opens the file. The last six characters of template must be XXXXXX; they are replaced with a string that makes the filename unique.
• The file is created with mode read/write and permissions 0600.
• template must be declared as a character array, because mkstemp modifies it.
• The file is opened with the open O_EXCL flag, which guarantees that the caller is the process that creates the file.
mkstemp
#include <stdlib.h>
char template[] = "/tmp/fileXXXXXX";    /* array, not a string literal: mkstemp overwrites the XXXXXX */
int fd;
fd = mkstemp(template);                 /* returns an open descriptor, or -1 on error */
system
/* creates a directory */
#include <stdio.h>
#include <stdlib.h>
#define MAXLINE 1024
/* err_sys (not shown) is assumed to print an error message and terminate */
int main(void)
{
    char line[MAXLINE], command[MAXLINE+10];
    /* read the directory name from standard input */
    if (fgets(line, MAXLINE, stdin) == NULL)
        err_sys("filename read error");
    sprintf(command, "mkdir %s", line);
    if (system(command) != 0)       /* runs the command through the shell */
        err_sys("system error");
    exit(0);
}
popen
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define MAXLINE 1024
/* err_sys (not shown) is assumed to print an error message and terminate */
int main(void) {
    char line[MAXLINE], command[MAXLINE+10];
    int n; FILE *fp;
    /* read the file name from standard input */
    if (fgets(line, MAXLINE, stdin) == NULL)
        err_sys("filename read error");
    sprintf(command, "cat %s", line);
    /* run the command and read its standard output through a pipe */
    if ((fp = popen(command, "r")) == NULL) err_sys("popen error");
    while ((fgets(line, MAXLINE, fp)) != NULL)
    {
        n = strlen(line);
        if (write(1, line, n) != n) err_sys("data write error");
    }
    if (ferror(fp)) err_sys("fgets error");
    pclose(fp);
    exit(0);
}