System Calls
Operating Systems, 142
TAs:
Vadim Levit, Dan Brownstein, Ehud Barnea,
Matan Drory and Yerry Sofer
Practical Session 1, System Calls
A few administrative notes…
• Course homepage: http://www.cs.bgu.ac.il/~os142/
• Contact staff through the dedicated email:
[email protected]
(format the subject of your email according to the
instructions listed in the course homepage)
• Assignments:
Extending xv6 (a pedagogical OS)
Submission in pairs.
Frontal checking:
1. Assume the grader may ask anything.
2. Must register to exactly one checking session.
System Calls
• A System Call is an interface between a user
application and a service provided by the
operating system (or kernel).
• These can be roughly grouped into five major categories:
1. Process control (e.g. create/terminate process)
2. File Management (e.g. read, write)
3. Device Management (e.g. logically attach a device)
4. Information Maintenance (e.g. set time or date)
5. Communications (e.g. send messages)
System Calls - motivation
• A process is not supposed to access the kernel: it cannot access kernel memory or call kernel functions directly.
• This is strictly enforced (‘protected mode’) for good reasons. Unrestricted access could:
• Jeopardize other running processes.
• Cause physical damage to devices.
• Alter system behavior.
• The system call mechanism provides a safe way to request specific kernel operations.
System Calls - interface
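• Calls are usually made with C/C++ library functions. The path of a getpid() call:
1. User-Space: the user application calls getpid() in the C library.
2. The C library loads the arguments, sets eax to __NR_getpid and switches to kernel mode (int 0x80).
3. Kernel-Space: the system call handler calls sys_call_table[eax], which is sys_getpid().
4. sys_getpid() returns, and the kernel leaves through syscall_exit and resume_userspace.
5. User-Space: the C library returns the result to the user application.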
Remark: Invoking int 0x80 is common although newer techniques for “faster” control transfer are provided
by both AMD’s and Intel’s architecture.
System Calls – tips
• Kernel behavior can be enhanced by altering the
system calls themselves: imagine we wish to
write a message (or add a log entry) whenever a
specific user is opening a file. We can re-write the
system call open with our new open function and
load it to the kernel (need administrative rights).
Now all “open” requests are passed through our
function.
• We can examine which system calls are made by
a program by invoking strace <arguments>.
Process control
• Fork
• pid_t fork(void);
• Fork is used to create a new process. It creates a duplicate of
the original process (including all file descriptors, registers,
instruction pointer, etc.).
• Once the call is finished, the process and its copy go their
separate ways. Subsequent changes to one should not affect
the other.
• The fork call returns a different value to the original process
(parent) and its copy (child): in the child process this value is
zero, and in the parent process it is the PID of the child
process.
• When fork is invoked the parent’s information should be
copied to its child – however, this can be wasteful if the child
will not need this information (see exec()…). To avoid such
situations, Copy On Write (COW) is used for the data section.
Copy On Write (COW)
• How does Linux manage COW?
• After fork(), the parent process and the child process each have their own
data structure (task_struct).
• Copying is expensive, so the child process will point to the parent’s
pages. The shared pages are changed from RW to RO.
• When a process writes information to such a page, a protection fault occurs.
There is then no other choice but to allocate a new RW copy of each
required page.
Process control
An example:
int i = 3472;
pid_t fork_id;
printf("my process pid is %d\n", getpid());
fork_id = fork();
if (fork_id == 0) {
    i = 6794;
    printf("child pid %d, i=%d\n", getpid(), i);
}
else
    printf("parent pid %d, i=%d\n", getpid(), i);
return 0;
Output:
my process pid is 8864
child pid 8865, i=6794
parent pid 8864, i=3472
Program flow:
• Before fork(): PID = 8864, i = 3472
• Parent after fork(): PID = 8864, fork_id = 8865, i = 3472
• Child after fork(): PID = 8865, fork_id = 0, i = 6794
Is this the only possible output?
Running the above code on some systems will almost
always produce this output. Why?
Process control - zombies
• When a process ends, the memory and resources
associated with it are deallocated.
• However, the entry for that process is not
removed from the process table.
• This allows the parent to collect the child’s exit
status.
• When this data is not collected by the parent the
child is called a “zombie”. Such a leak is usually
not worrisome in itself, however, it is a good
indicator for problems to come.
Process control - zombies
• On some (rare) occasions, a zombie is actually
desired: it may, for example, prevent the
creation of another child process with the same
pid.
• Zombies are not the same as orphan processes (a
process whose parent ended and is then adopted
by init (process id 1)).
• Zombies can be detected with ps -el (marked
with ‘Z’).
• Zombies can be collected with the wait system
call.
Process control
• Wait
• pid_t wait(int *status);
• pid_t waitpid(pid_t pid, int *status, int options);
• The wait system call is used for waiting on child processes
whose state changed (the process terminated, for example).
• The process calling wait will suspend execution until one of
its children (or a specific one) terminates.
• With waitpid, waiting can be done for a specific process, a group of
processes or any arbitrary child.
• Once the status of a process is collected, that process is
removed from the process table.
• Linux kernel 2.6.9 introduced waitid(…), which gives
finer control.
Process control
• exec*
• int execv(const char *path, char *const argv[]);
• int execvp(const char *file, char *const argv[]);
• exec….
• The exec() family of functions replaces the current process
image with a new process image (text, data, bss, stack,
etc.).
• Since no new process is created, the PID remains the same.
• Exec functions do not return to the calling process
unless an error occurred (in which case -1 is returned
and errno is set to indicate the error).
• The underlying system call is execve(…).
errno
• The <errno.h> header file includes the integer errno variable.
• This variable is set by many functions (including sys calls) in the
event of an error to indicate what went wrong.
• errno's value is only relevant when the call returned an error
(usually -1).
• A successful call to a function may also change the errno value.
• errno may be a macro.
• errno is thread local meaning that setting it in one thread does not
affect its value in any other thread.
• Be wary of mistakes such as:
• if (call()==-1){
printf("failed…"); /* printf itself may change errno */
if (errno==…..)
}
• Code defensively! Use errno often!
Process control – simple shell
#define…
…
int main(int argc, char **argv){
    …
    while(1){
        type_prompt();
        read_command(command, params);
        pid=fork();
        if (pid<0){
            if (errno==EAGAIN)
                printf("ERROR cannot allocate sufficient memory\n");
            continue;
        }
        if (pid>0)
            wait(&status);
        else{
            execvp(command,params);
            exit(1); /* reached only if execvp failed */
        }
    }
}
File management
• In POSIX operating systems files are accessed via a file descriptor
(Microsoft Windows uses a slightly different object: the file handle).
• A file descriptor is an integer specifying the index of an entry in the
file descriptor table.
• Each process holds its own file descriptor table, which contains details
of all its open files. The following is an example of such a table:
FD   Name                       Other information
0    Standard Input (stdin)     …
1    Standard Output (stdout)   …
2    Standard Error (stderr)    …
• File descriptors can refer to files, directories, sockets and a few
more data objects.
File management
• Open
• int open(const char *pathname, int flags);
• int open(const char *pathname, int flags, mode_t mode);
• Open returns a file descriptor for a given pathname.
• This file descriptor will be used in subsequent system calls
(according to the flags and mode)
• Flags define the access mode: O_RDONLY (read only),
O_WRONLY (write only), O_RDWR (read write). These can be
bitwise OR'ed with more creation and status flags such as
O_APPEND, O_TRUNC, O_CREAT.
• Close
• int close(int fd);
• Closes a file descriptor so it no longer refers to a file.
• Returns 0 on success or -1 in case of failure (errno is set).
File management
• Read
• ssize_t read(int fd, void *buf, size_t count);
• Attempts to read up to count bytes from the file descriptor fd, into the
buffer buf.
• Returns the number of bytes actually read (can be less than requested
if read was interrupted by a signal, close to EOF, reading from pipe or
terminal).
• On error -1 is returned (and errno is set).
• Note: The file position advances according to the number of bytes
read.
• Write
• ssize_t write(int fd, const void *buf, size_t count);
• Writes up to count bytes to the file referenced by fd, from the
buffer positioned at buf.
• Returns the number of bytes actually written, or -1 (and sets errno) on error.
File management
• lseek
• off_t lseek(int fd, off_t offset, int whence);
• This function repositions the offset of the file position of the
file associated with fd to the argument offset according to
the directive whence.
• Whence can be set to SEEK_SET (directly to offset),
SEEK_CUR (current+offset), SEEK_END (end+offset).
• Positioning the offset beyond file end is allowed. This does
not change the size of the file.
• Writing to a file beyond its end results in a “hole” filled with
‘\0’ characters (null bytes).
• Returns the resulting location as measured in bytes from the
beginning of the file, or -1 in case of error (and sets errno).
File management
• Dup
• int dup(int oldfd);
• int dup2(int oldfd, int newfd);
• The dup calls create a copy of the file descriptor oldfd.
• After a successful dup call, the old and
new file descriptors may be used interchangeably.
• They refer to the same open file description and thus share
information such as offset and status flags. That means that using
lseek on one will also affect the other!
• They do not share descriptor flags (FD_CLOEXEC).
• dup uses the lowest-numbered unused file descriptor, and
dup2 uses newfd (closing the current newfd first if necessary).
• Returns the new file descriptor, or -1 in case of an error (and
sets errno).
File management
Consider the following example:
fileFD = open("file.txt"…);
close(1);          /* closes file descriptor 1, which is stdout. */
fd = dup(fileFD);  /* will create another file descriptor.
                      File descriptor 1 is free, so it will be allocated. */
close(fileFD);     /* don't need this descriptor anymore. */
printf("this did not go to stdout");
As a result (abstract):
Before:
FD   Name       Other information
0    stdin      …
1    stdout     …
2    stderr     …
3    file.txt   …
After:
FD   Name       Other information
0    stdin      …
1    file.txt   …
2    stderr     …
File management - example
#define…
…
#define RW_BLOCK 10
int main(int argc, char **argv){
    int fdsrc, fddst;
    ssize_t readBytes, wroteBytes;
    char buf[RW_BLOCK];
    char *source = argv[1];
    char *dest = argv[2];
    fdsrc=open(source,O_RDONLY);
    if (fdsrc<0){
        /* perror() produces a message on the standard error output
           describing the last error encountered during a call to a
           system call. Use with care: the error value is not cleared
           when non-erroneous calls are made. */
        perror("ERROR while trying to open source file");
        exit(-1); /* exit() system call. */
    }
    /* Bitwise OR: open for both reading and writing, create the file
       if it does not exist, and truncate it so it always starts at
       length 0. */
    fddst=open(dest,O_RDWR|O_CREAT|O_TRUNC, 0666);
    if (fddst<0){
        perror("ERROR while trying to open destination file");
        exit(-2);
    }
File management - example
    /* Change the offset to 20: start writing at offset 20. */
    lseek(fddst,20,SEEK_SET);
    do{
        readBytes=read(fdsrc, buf, RW_BLOCK);
        if (readBytes<0){
            if (errno == EIO){ /* Using errno directly. */
                printf("I/O errors detected, aborting.\n");
                exit(-10);
            }
            exit (-11);
        }
        wroteBytes=write(fddst, buf, readBytes);
        if (wroteBytes<0){ /* errno is meaningful only after a failed call */
            if (errno == EDQUOT)
                printf("ERROR: out of quota.\n");
            else if (errno == ENOSPC)
                printf("ERROR: not enough disk space.\n");
        }
    } while (readBytes>0);
    /* If the file is opened with hexedit at this point, the
       first 20 bytes will be 00. */
    lseek(fddst,0,SEEK_SET);
    /* Adding an extra comment at the beginning of the file. */
    write(fddst,"\\*WRITE START*\\\n",16);
    close(fddst);
    close(fdsrc);
    return 0;
}
Fork – example (1)
How many lines of “Hello” will be printed
in the following example:
int main(int argc, char **argv){
int i;
for (i=0; i<10; i++){
fork();
printf("Hello \n");
}
return 0;
}
[Figure: program flow of example (1) as a process tree for i=0, i=1, i=2, with the total number of printf calls.]
Fork – example (2)
How many lines of “Hello” will be printed
in the following example:
int main(int argc, char **argv){
int i;
for (i=0; i<10; i++){
printf("Hello \n");
fork();
}
return 0;
}
[Figure: program flow of example (2) as a process tree for i=0, i=1, i=2, with the total number of printf calls.]
Fork – example (3)
How many lines of “Hello” will be printed
in the following example:
int main(int argc, char **argv){
int i;
for (i=0; i<10; i++)
fork();
printf("Hello \n");
return 0;
}
[Figure: program flow of example (3) as a process tree for i=0, i=1, i=2, with the total number of printf calls.]
Tips
• Information sources are abundant:
• The internet.
• Man pages (apropos).
• In Linux it is often useful (and easy) to examine the
included header files. You can easily find their location
by using the whereis command (you may also find
which useful).
• MSDN – this is less relevant to our course, but it also
includes code examples.
• A list of system calls on CS dept. computers:
/usr/include/asm/unistd_32.h
Overview
ASSIGNMENT 1
Assignment 1
• Divided into four parts:
1. Get to know xv6, and brush up your C skills
2. Create and modify system calls
3. Implement different scheduling algorithms
4. Write user space programs to test previous sections
• You can start working on sections 1, 2 and 4 immediately.
General scheduling algorithms will be discussed in
practical session 3.
Assignment 1
Hello xv6
• xv6 is a simplistic educational OS. It is used in
universities such as MIT and Yale.
• xv6 is a re-implementation of Unix Version 6,
but offers only a partial implementation.
• We will use QEMU, which is a generic and
open source machine emulator and virtualizer
to run xv6.
• Everything you need is installed on lab
computers.
Assignment 1
Details
• We will use svn to retrieve the initial xv6 version, and
modify that version.
• There are two main files that are built by the
makefile: xv6.img and fs.img, one is the OS and the
other is the file system.
• In the first task you will add to xv6 the ‘PATH’
environment variable, and support for the
arrow keys (‘←’, ‘→’, ‘↑’, ‘↓’).
• In the second task you will add scheduling algorithms
and helpful system calls.
• In the last task you will add user space programs.
Notice that they are different from regular C
programs because they use xv6 libraries.
XV6 CODE
THE SHELL
int
main(void)
{
  static char buf[100];
  int fd;

  // Assumes three file descriptors open.
  while((fd = open("console", O_RDWR)) >= 0){
    if(fd >= 3){
      close(fd);
      break;
    }
  }

  // Read and run input commands.
  while(getcmd(buf, sizeof(buf)) >= 0){
    if(buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' '){
      // Clumsy but will have to do for now.
      // Chdir has no effect on the parent if run in the child.
      buf[strlen(buf)-1] = 0;  // chop \n
      if(chdir(buf+3) < 0)
        printf(2, "cannot cd %s\n", buf+3);
      continue;
    }
    if(fork1() == 0)
      runcmd(parsecmd(buf));
    wait();
  }
  exit();
}
THE SCHEDULER
// Per-CPU process scheduler.
// Each CPU calls scheduler() after setting itself up.
// Scheduler never returns. It loops, doing:
//  - choose a process to run
//  - swtch to start running that process
//  - eventually that process transfers control
//      via swtch back to the scheduler.
void
scheduler(void)
{
  struct proc *p;

  for(;;){
    // Enable interrupts on this processor.
    sti();

    // Loop over process table looking for process to run.
    acquire(&ptable.lock);
    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
      if(p->state != RUNNABLE)
        continue;

      // Switch to chosen process. It is the process's job
      // to release ptable.lock and then reacquire it
      // before jumping back to us.
      proc = p;
      switchuvm(p);
      p->state = RUNNING;
      swtch(&cpu->scheduler, proc->context);
      switchkvm();

      // Process is done running for now.
      // It should have changed its p->state before coming back.
      proc = 0;
    }
    release(&ptable.lock);
  }
}
THE KILL SYSTEM CALL
/*** sysproc.c ***/
int
sys_kill(void)
{
  int pid;

  if(argint(0, &pid) < 0)
    return -1;
  return kill(pid);
}
/*** proc.c ***/
// Kill the process with the given pid.
// Process won't exit until it returns
// to user space (see trap in trap.c).
int
kill(int pid)
{
  struct proc *p;

  acquire(&ptable.lock);
  for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
    if(p->pid == pid){
      p->killed = 1;
      // Wake process from sleep if necessary.
      if(p->state == SLEEPING)
        p->state = RUNNABLE;
      release(&ptable.lock);
      return 0;
    }
  }
  release(&ptable.lock);
  return -1;
}
/*** syscall.c ***/
static int (*syscalls[])(void) = {
[SYS_chdir]   sys_chdir,
[SYS_close]   sys_close,
[SYS_dup]     sys_dup,
[SYS_exec]    sys_exec,
[SYS_exit]    sys_exit,
[SYS_fork]    sys_fork,
[SYS_fstat]   sys_fstat,
[SYS_getpid]  sys_getpid,
[SYS_kill]    sys_kill,
[SYS_link]    sys_link,
[SYS_mkdir]   sys_mkdir,
[SYS_mknod]   sys_mknod,
[SYS_open]    sys_open,
[SYS_pipe]    sys_pipe,
[SYS_read]    sys_read,
[SYS_sbrk]    sys_sbrk,
[SYS_sleep]   sys_sleep,
[SYS_unlink]  sys_unlink,
[SYS_wait]    sys_wait,
[SYS_write]   sys_write,
[SYS_uptime]  sys_uptime,
};