Transcript ppt

CS252: Systems Programming
Ninghui Li
Based on Slides by Prof. Gustavo Rodriguez-Rivera
Topic 1: Introduction to the Course
and Program Structure
General Information
Web Page:
http://www.cs.purdue.edu/homes/cs252
Office: LWSN2142K
Office hours:
Tentative: Tuesdays and Fridays between
1:30pm and 2:30pm.
Talk with me after class, drop by for short
questions, or email to make appointment
E-mail: [email protected]
Textbooks

No required textbook. Primary source is lecture notes
Recommended (not required):

Introduction to Systems Programming: a Hands-on
Approach by by Gustavo A. Junipero Rodriguez-Rivera and
Justin Ennen
https://www.cs.purdue.edu/homes/grr/SystemsProgrammingBook/




Operating Systems: Three Easy Pieces
http://pages.cs.wisc.edu/~remzi/OSTEP/
The Linux Programming Interface: A Linux and UNIX
System Programming Handbook
 Useful for some projects, and good as a reference book
How Linux Works (2nd Edition) by Brian Ward
The Linux Command Line by William E. Shotts, Jr.
Course Communications
Use Piazza for questions/answers, announcements,
etc.
Students have been added to Piazza
Use Blackboard Learn for grade distribution
Labs
There is no lab the first week.
The projects will be explained in the lab
sessions.
E-mail administrative questions to
[email protected]
Post project questions on Piazza
TAs office hours will be posted in the web page.
Grading
Grade allocation
Midterm: 18% (Mon Mar 6, 8pm to 9:30pm,
PHYS 114, PHYS 112)
 Final:
24%
 Projects/labs: 48%
 Six labs, weighted approximately by #of
weeks
 Attendance as measured by clicker quizs: 10%

Exams also include questions about the labs.
Policy Regarding Cheating
• Cheating defined as
•
Copy your answers or code for the lab or exam from
another source other than provided by the course
• Consequence: receives −𝒙 in the lab, where x is the
original grade not considering impact of cheating
• Give your code to others to copy
•
𝒙
𝟐
Consequence: Receives in the lab
• Every cheating case will be referred to Dean of
Students and recorded
• What happened in the past?
What is Systems Programming?
• How do computers differ from other
machines?
• Computers are general-purpose machines that
can be programmed to do many, many things
• System programming is about
implementing software systems to support
programming, i.e., how to provide
abstraction and virtualization.
Roles of This Course
• Provide introduction to the following core
CS topics
•
•
•
•
•
•
•
Compilers (CS 352)
Programming language (CS 456)
Operating systems (CS 354)
Networking (CS 422)
Security (CS 426)
Information systems (CS 348)
Theory of computing (CS 483)
Roles of This Course
• Learn what to expect in upper-division CS
courses
• Ability to complete (which includes debug)
large programming projects independently
Course Organization: Part 1: C
Programming Review
•
•
•
•
Process of Compiling, Assembling,
Linking, Loading, Runtime Linker, Static
and Shared Libraries; Address space.
Structure of a Program.
Review of pointers, arrays, pointers to
functions, memory management
Debugging using GDB.
Stack frames and buffer-overflow
vulnerabilities
Lab 1 (2 weeks): Implementing memory allocation
Course Organization: Part 2:
Language Interpreter
•
•
•
•
Learn a simple function programming language
that uses only non-negative integers, and has 5
keywords:
•
inc, dec, ifz, define, halt
Learn a simple fragment of LISP/Scheme
Learn concept of regular expressions, finite state
automata, grammars
Learn to use Lex and YACC to create parsers
Lab 2 (2 weeks): Extend a language interpreter for a
simple functional language.
Course Organization: Part 3:
Unix Systems Programming I
•
•
•
•
•
Unix history & file system fundamentals
Common Unix utilities
Unix Shell scripting
Structure of a Unix Shell
Useful system calls: file creation, read, write,
close, file mode, IO redirection, pipes, fork,
wait, waitpid, signals, etc.
Lab 3 (3 weeks): Writing a Unix shell.
Course Organization: Part 4:
Unix Systems Programming II
•
•
•
•
•
Introduction to operating systems
OS kernel fundamentals: user mode,
kernel mode, interrupts, system calls
Processes, scheduling,
Programming with threads, thread
creation.
Race Conditions, Mutex locks. Other
synchronization methods
Lab 4 (1 week): Introduction to Threads.
Course Organization: Part 5:
Unix Network Programming
•
•
•
Internet overview: ARP, IP, DNS, TCP,
UDP, NAT
Socket Programming.
Server architecture
Lab 5 (3 weeks): Implementing a Web Server
Course Organization: Part 6:
Miscellaneous topics
• More UNIX Systems Programming
• Introduction to SQL
• Concept of Turing Machines,
Computability, Deterministic and Nondeterministic Turing Machines
• Concept of P=NP
Lab 6 (Team Project, 3 weeks): Implement a nondeterministic Turing Machine
Problem of the Week
• Questions taken from the books
• Cracking the Coding Interview: 150
Programming Questions and Solutions
• Elements of Programming Interviews
• Not officially part of the grade equation
From Source Files to Programs
and Processes
What Happens From a C Source
Program to Program Execution
Building a program, i.e, generating an executable
file from source code
What are the steps?
What does an executable file look like?
Loading a program
Each time when you execute a program, a
process is created.
In Unix/Linux, use “ps” command to show
processes in the system
Building a Program
The programmer writes a program hello.c
The preprocessor expands #define, #include,
#ifdef etc preprocessor statements and generates a
hello.i file.
The compiler compiles hello.i, optimizes it and
generates an assembly instruction listing hello.s
The assembler (as) assembles hello.s and
generates an object file hello.o
The compiler (cc or gcc) by default hides all these
intermediate steps. You can use compiler options
to run each step independently.
Original file hello.c
#include <stdio.h>
main()
{
printf("Hello\n");
}
After preprocessor
gcc -E hello.c > hello.i
(-E stops compiler after running
preprocessor)
hello.i:
/* Expanded /usr/include/stdio.h */
typedef void *__va_list;
typedef struct __FILE __FILE;
typedef int
ssize_t;
struct FILE {…};
extern int fprintf(FILE *, const char *, ...);
extern int fscanf(FILE *, const char *, ...);
extern int printf(const char *, ...);
/* and more */
main()
{
printf("Hello\n");
}
After compiling
gcc -S hello.c
(-S stops compiler after
generating assembly code)
Resulting file is hello.s
Actual code depends on the system
LC0:
.ascii "Hello\0"
.text
.globl _main
.def
_main; .scl
_main:
pushl
movl
andl
subl
call
movl
call
leave
ret
%ebp
%esp, %ebp
$-16, %esp
$16, %esp
___main
$LC0, (%esp)
_puts
2; .type
32;
.endef
After compiling & assembling
“gcc -c hello.c” generates hello.o
The main function already has a value in the object
file hello.o
hello.o has undefined symbols, like the _puts
function call that we don’t know where it is placed.
The command “nm” can lists the symbols from
object files
Output of “nm hello.o”
0000000000000000 b .bss
0000000000000000 d .data
0000000000000000 t .text
U __main
0000000000000000 T main
U puts
// uninitilized data
// Global and static vars
// entry point of program
// main function defined in code
__main and puts are undefined in “hello.o”
They are provided by the libraries
Building a program (continued)
The linker puts together all object files as well as the
object files in static libraries.
The linker also takes the definitions in shared libraries
and verifies that the symbols (functions and variables)
needed by the program are completely satisfied.
If there is symbol that is not defined in either the
executable or shared libraries, the linker will give an
error.
Static libraries (.a files) are added to the executable.
shared libraries (.so files) are not added to the
executable file.
Static and Shared Libraries
Shared libraries are shared across different
processes.
There is only one instance of each shared
library for the entire system.
Static libraries are not shared.
There is an instance of an static library for
each process.
After linking
“gcc –o hello hello.c” generates the hello
executable
The hello.o object code is statically linked with libraries
to include code of library functions
In linking, “static = compilation/building time”
Sometimes, not all functions’ code are included, some
code are stored in shared libraries and dynamically
linked.
In linking, “dynamic = loading/execution time”
Building a Program
hello.c
hello.i
C
Preprocessor
Editor
Programmer
Optimizer
hello.s
hello.o
Assembler
(as)
Compiler
(cc)
(static)
Linker (ld)
Other .o files
Static libraries (.a files)
They add to the size of
the executable.
Executable
File (hello)
What is a program?
A program is a file in a special format that
contains all the necessary information to load an
application into memory and make it run.
A program file includes:





machine instructions
initialized data
List of library dependencies
List of memory sections that the program will use
List of undefined values in the executable that will be
known until the program is loaded into memory.
Executable File Formats
There are different executable file formats

ELF – Executable Link File
It is used in most UNIX systems (Solaris, Linux)
Can use elfdump to see information in binary file

COFF – Common Object File Format
It is used in Windows systems

a.out – Used in BSD (Berkeley Standard Distribution)
and early UNIX
It was very restrictive. It is not used anymore.
Note: BSD UNIX and AT&T UNIX are the
predecessors of the modern UNIX flavors like
Solaris and Linux.
Loading a Program
After one types ./hello in a shell, the shell
creates a new process and load the file hello.
The loader is a program that is used to run an
executable file in a process.
Before the program starts running, the loader
allocates space for all the sections of the
executable file (text, data, bss etc)
It loads into memory the executable and
shared libraries (if not loaded yet)
Loading a Program
It also writes (resolves) any values in the
executable to point to the functions/variables in
the shared libraries.(E.g. calls to printf in hello.c)
Once memory image is ready, the loader jumps to
the _start entry point that calls init() of all libraries
and initializes static constructors. Then it calls
main() and the program begins.
_start also calls exit() when main() returns.
The loader is also called “runtime linker”.
Loading a Program
Executable
File
Loader
(runtime linker)
(/usr/lib/ld.so.1)
Shared libraries (.so, .dll)
Executable
in memory
Static and Shared Libraries
Shared libraries are shared across different
processes.
There is only one instance of each shared
library for the entire system.
Static libraries are not shared.
There is an instance of an static library for
each process.
Static and Dynamic
Static – Events that happen during program
building.
Example: Static linker
Dynamic – Events that happen while
program is running.
Also called “Runtime”.
Example: Dynamic linker
Memory Structure of a Process
Memory of a Program
A program sees memory as an array of
bytes that goes from address 0 to 264-1
(0x0000000000000000-0xFFFFFFFFFFFFFFFF)
That is assuming a 64-bit architecture.
264-1
0xFFFFFFFFFFFFFFFF
0
0x0000000000000000
Memory Sections
The memory is organized into sections
called “memory mappings”.
264-1
Stack
Shared Libs
Heap
Bss
Data
0
Text
Memory Sections
Each section has different permissions:
read/write/execute or a combination of them.
Text- Instructions that the program runs
Data – Initialized global variables.
Bss – Uninitialized global variables. They are
initialized to zeroes.
Heap – Memory returned when calling
malloc/new. It grows upwards.
Stack – It stores local variables and return
addresses. It grows downwards.
Memory Sections
Dynamic libraries – They are libraries shared with
other processes.
Each dynamic library has its own text, data, and
bss.
Each program has its own view of the memory
that is independent of each other.
Virtual memory, mapped by OS to physical memory
This view is called the “Address Space” of the
program.
If a process modifies a byte in its own address space,
it will not modify the address space of another
process.
Where things are located
Program hello.c
int a = 5;
// Stored in
int b[20];
// Stored in
int main() { // Stored in
int x;
// Stored in
int *p =(int*)
malloc(sizeof(int));
}
data section
bss
text
stack
//In heap
Memory Gaps
Between each memory section there may be gaps
that do not have any memory mapping.
If the program tries to access a memory gap, the
OS will send a SEGV signal that by default kills
the program and dumps a core file.
The core file contains the value of the variables
global and local at the time of the SEGV.
The core file can be used for “post mortem”
debugging.
gdb program-name core
gdb> where
Review
Steps of building a program
Static vs. shared library
Static vs. dynamic linking
Memory structure of a process:
text, data, stack, heap
where different variables occur
Upcoming
• Topic 2: Memory Management