OSPP: The Kernel Abstraction

The Kernel Abstraction
Main Points
• Process concept
– A process is an OS abstraction for executing a
program with limited privileges
• Dual-mode operation: user vs. kernel
– Kernel-mode: execute with complete privileges
– User-mode: execute with fewer privileges
• Safe control transfer
– How do we switch from one mode to the other?
Processes
• Fundamental abstraction of program execution
– memory
– processor(s)
• each processor abstraction is a thread
– “execution context”
Process Concept
• Process: an instance of a program, running
with limited rights
– Process control block: the data structure the OS
uses to keep track of a process
– Two parts to a process:
• Thread: a sequence of instructions within a process
– Potentially many threads per process (for now 1:1)
– Thread aka lightweight process
• Address space: set of rights of a process
– Memory that the process can access
– Other permissions the process has (e.g., which procedure calls
it can make, what files it can access)
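As a rough illustration of the two parts of a process, the sketch below models a process control block in C. The struct layout, field names, and the helpers `pcb_init` and `pcb_may_access` are hypothetical; a real kernel tracks far more state (open files, credentials, scheduling data).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
struct address_space {
    uintptr_t base;   /* lowest address the process may touch */
    size_t    length; /* size of the accessible region in bytes */
};

struct thread_context {
    uintptr_t pc;     /* saved program counter */
    uintptr_t sp;     /* saved stack pointer */
};

struct pcb {
    int                   pid;
    struct thread_context thread; /* one thread per process, for now */
    struct address_space  space;
};

/* Fill in a PCB for a new process whose code starts at `entry`. */
void pcb_init(struct pcb *p, int pid, uintptr_t entry,
              uintptr_t base, size_t length)
{
    p->pid = pid;
    p->thread.pc = entry;
    p->thread.sp = base + length; /* assume the stack grows down from the top */
    p->space.base = base;
    p->space.length = length;
}

/* True if `addr` lies inside the memory this process may access. */
bool pcb_may_access(const struct pcb *p, uintptr_t addr)
{
    return addr >= p->space.base && addr - p->space.base < p->space.length;
}
```

A single contiguous region is the simplest possible "set of rights"; the later slides on virtual addresses replace it with a translation table.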
A Program
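const int nprimes = 100;   /* array bound: valid C++; plain C needs #define or enum */
int prime[nprimes];        /* primes found so far */

int main() {
    int i;
    int current = 2;
    prime[0] = current;
    for (i = 1; i < nprimes; i++) {
        int j;
    NewCandidate:
        current++;
        /* trial division by each known prime up to sqrt(current) */
        for (j = 0; prime[j] * prime[j] <= current; j++) {
            if (current % prime[j] == 0)
                goto NewCandidate;
        }
        prime[i] = current;
    }
    return 0;
}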
The Unix Address Space
stack
dynamic
bss
data
text
Typical Process Layout
• Libraries provide the
glue between user
processes and the OS
– libc linked in with all C
programs
– Provides printf, malloc,
and a whole slew of
other routines necessary
for programs
[Figure: process memory layout, top to bottom]
Stack: Activation Records
Heap: OBJECT1, OBJECT2
Data: HELLO WORLD, GO BIG RED CS!
Library (text): printf(char * fmt, …) { create the string to be printed; SYSCALL 80 }, malloc() { … }, strcmp() { … }
Program (text): main() { printf("HELLO WORLD"); printf("GO BIG RED CS"); }
Full System Layout
• The OS is omnipresent and steps in where necessary to
aid application execution
– Typically resides in high memory
• When an application needs to perform a privileged
operation, it needs to invoke the OS
[Figure: full memory layout, top to bottom]
OS Stack: Kernel Activation Records
OS Heap: USER OBJECT1, OBJECT2
OS Data: LINUX
OS Text: syscall_entry_point() { … }
Stack: Activation Records
Heap: OBJECT1, OBJECT2
Data: HELLO WORLD, GO BIG RED CS!
Library: printf(char * fmt, …)
Program: main() { … }
Process Concept
Multiple Processes
[Figure: four processes, each with its own "other stuff" and kernel stack, sharing a single kernel text]
Memory Protection
Virtual Addresses
• Translation done in hardware, using a table
• Table set up by operating system kernel
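A minimal sketch of table-based translation, assuming a single-level table, 4 KiB pages, and a tiny 16-page address space. The names `map_page` and `translate` are hypothetical; on real hardware the lookup is done by the MMU, not by a C function.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                 /* assumed tiny address space */

struct pte {
    bool     valid;  /* set by the kernel when the page is mapped */
    uint32_t frame;  /* physical frame number */
};

/* Kernel side: record that virtual page `vpn` maps to `frame`. */
bool map_page(struct pte table[NUM_PAGES], uint32_t vpn, uint32_t frame)
{
    if (vpn >= NUM_PAGES)
        return false;
    table[vpn].valid = true;
    table[vpn].frame = frame;
    return true;
}

/* "Hardware" side: translate vaddr; returns false on a fault. */
bool translate(const struct pte table[NUM_PAGES], uint32_t vaddr,
               uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);

    if (vpn >= NUM_PAGES || !table[vpn].valid)
        return false;                  /* unmapped: would trap to the kernel */
    *paddr = (table[vpn].frame << PAGE_SHIFT) | offset;
    return true;
}
```

Because only the kernel calls `map_page`, user code can reach only the frames the kernel chose to map.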
Privileged Instructions
Privileged instructions
• Examples?
• What should happen if a user program
attempts to execute a privileged instruction?
Thought Experiment
• How can we implement execution with limited
privilege?
– Execute each program instruction in a simulator
– If the instruction is permitted, do the instruction
– Otherwise, stop the process
– Basic model in JavaScript, …
• How do we go faster?
– Run the unprivileged code directly on the CPU?
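The simulator idea can be sketched in a few lines of C. The three-instruction machine, the opcode names, and the choice to treat `OP_HALT_CPU` as privileged are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-instruction machine, for illustration only. */
enum opcode { OP_ADD, OP_HALT_CPU, OP_END };

struct instr {
    enum opcode op;
    int         operand;
};

enum outcome { FINISHED, KILLED };

/* Policy check: here, halting the CPU is the only privileged operation. */
static bool is_permitted(enum opcode op)
{
    return op != OP_HALT_CPU;
}

/* Interpret up to n instructions, accumulating OP_ADD operands into *acc. */
enum outcome simulate(const struct instr *prog, size_t n, int *acc)
{
    for (size_t i = 0; i < n; i++) {
        if (!is_permitted(prog[i].op))
            return KILLED;            /* not permitted: stop the process */
        if (prog[i].op == OP_END)
            return FINISHED;
        *acc += prog[i].operand;      /* permitted: do the instruction */
    }
    return FINISHED;
}
```

Every instruction pays for the `is_permitted` check, which is why the slide asks how to go faster by running unprivileged code directly on the CPU.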
Privilege Levels
• Some processor functionality cannot be made
accessible to untrusted user applications
– e.g. HALT, Read from disk, set clock, reset devices,
manipulate device settings, …
• Need to have a designated mediator between
untrusted/untrusting applications
– The operating system (OS)
• Need to delineate between untrusted applications
and OS code
– Use a “privilege mode” bit in the processor
– 0 = Untrusted = user, 1 = Trusted = OS
Privilege Mode
• Privilege mode bit indicates if the current program
can perform privileged operations
– On system startup, privilege mode is set to 1, and the
processor jumps to a well-known address
– The operating system (OS) boot code resides at this
address
– The OS sets up the devices, initializes the MMU, loads
applications, and resets the privilege bit before invoking
the application
• Applications must transfer control back to OS for
privileged operations
Hardware Support:
Dual-Mode Operation
• Kernel mode
– Execution with the full privileges of the hardware
– Read/write to any memory, access any I/O device,
read/write any disk sector, send/read any packet
• User mode
– Limited privileges
– Only those granted by the operating system kernel
• On the x86, the current privilege level is stored in the low two bits
of the CS register (EFLAGS holds the I/O privilege level)
Hardware Support:
Dual-Mode Operation
• Privileged instructions
– Available to kernel
– Not available to user code
• Limits on memory accesses
– To prevent user code from overwriting the kernel
• Timer
– To regain control from a user program in a loop
• Safe way to switch from user mode to kernel
mode, and vice versa
Atomic Instructions
• Hardware needs to provide special
instructions to enable concurrent programs to
operate correctly
Hardware Timer
Switch between hardware and kernel
Hardware Timer
• Hardware device that periodically interrupts
the processor
– Returns control to the kernel timer interrupt
handler
– Interrupt frequency set by the kernel
• Not by user code!
– Interrupts can be temporarily deferred
• Not by user code!
• Interrupt deferral is crucial for implementing mutual exclusion
Context switch between usermode and kernel
Mode Switch
• From user-mode to kernel
– Interrupts
• Triggered by timer and I/O devices
– Exceptions
• Triggered by unexpected program behavior
• Or malicious behavior!
– System calls (aka protected procedure call)
• Request by program for kernel to do some operation on
its behalf
• Only limited # of very carefully coded entry points
Mode Switch
• From kernel-mode to user
– New process/new thread start
• Jump to first instruction in program/thread
– Return from interrupt, exception, system call
• Resume suspended execution
– Process/thread context switch
• Resume some other process
– User-level upcall
• Asynchronous notification to user program
Context switch
Interrupts
Basic Computer Organization
Memory
CPU
?
Keyboard
• Let’s build a keyboard
– Lots of mechanical switches
– Need to convert to a
compact form (binary)
• We’ll use a special
mechanical switch that,
when pressed, connects
two wires simultaneously
Keyboard
• When a key is pressed, a 7-bit key identifier is computed
[Figure: key switches feed a 4-bit encoder (16 to 4) and a 3-bit encoder (4 to 3); not all 16 wires are shown]
Keyboard
• A latch can store the keystroke indefinitely
[Figure: the 4-bit encoder (16 to 4) and 3-bit encoder (4 to 3) feed a latch; not all 16 wires are shown]
Keyboard
• The keyboard can then appear to the CPU as
if it is a special memory address
[Figure: the 4-bit encoder (16 to 4) and 3-bit encoder (4 to 3) feed a latch, which is connected to the CPU; not all 16 wires are shown]
Device Interfacing Techniques
• Memory-mapped I/O
– Device communication goes over the memory bus
– Reads/Writes to special addresses are converted into I/O operations
by dedicated device hardware
– Each device appears as if it is part of the memory address space
• Programmed I/O (port-mapped I/O)
– CPU has dedicated, special instructions
– CPU has additional input/output wires (I/O bus)
– Instruction specifies device and operation
• Memory-mapped I/O is the predominant device interfacing
technique in use
Polling vs. Interrupts
• One design has the CPU repeatedly read the keyboard
latch memory location to see if a key is pressed
– Called polling
– Inefficient
• An alternative is to add extra circuitry so the keyboard can
alert the CPU when there is a keypress
– Called interrupt driven I/O
• Interrupt driven I/O enables the CPU and devices to perform
tasks concurrently, increasing throughput
– Only needs a tiny bit of circuitry and a few extra wires to implement
the “alert” operation
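The contrast can be sketched with a simulated keyboard latch. The struct `kbd_latch`, the functions `poll_key` and `kbd_interrupt`, and the globals are hypothetical; on real hardware the latch would be a memory-mapped register at a device-specific address, and the handler would be entered by the hardware rather than called directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated keyboard latch (stands in for a memory-mapped register). */
struct kbd_latch {
    volatile bool    ready; /* set by the device when a key is latched */
    volatile uint8_t code;  /* 7-bit key identifier */
};

/* Polling: check the latch up to max_spins times.
 * Returns the number of checks used, or -1 if no key arrived. */
int poll_key(struct kbd_latch *kbd, int max_spins, uint8_t *out)
{
    for (int spins = 1; spins <= max_spins; spins++) {
        if (kbd->ready) {
            *out = kbd->code & 0x7f;
            kbd->ready = false;
            return spins;
        }
    }
    return -1;              /* every check was wasted CPU time */
}

/* Interrupt-driven: state updated by the handler. */
uint8_t last_key;
int     keys_seen;

/* Runs only when the device signals a keypress, so the latch is read once. */
void kbd_interrupt(struct kbd_latch *kbd)
{
    last_key = kbd->code & 0x7f;
    kbd->ready = false;
    keys_seen++;
}
```

With polling, the cost grows with how long the CPU waits; with interrupts, the CPU does other work until `kbd_interrupt` runs.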
Interrupt Driven I/O
[Figure: memory and CPU, with an interrupt controller driving the CPU’s "intr" and "dev id" lines]
• An interrupt controller mediates between competing devices
– Raises an interrupt flag to get the CPU’s attention
– Identifies the interrupting device
– Can disable (aka mask) interrupts if the CPU so desires
Interrupt Management
• Interrupt controllers manage interrupts
– Maskable interrupts: can be turned off by the CPU for
critical processing
– Nonmaskable interrupts: signal serious errors (e.g.,
unrecoverable memory error, power out warning)
• Interrupts contain a descriptor of the interrupting
device
– A priority selector circuit examines all interrupting devices,
reports highest level to the CPU
• Interrupt controller implements interrupt priorities
– Can optionally remap priority levels
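A priority selector can be modeled as a scan over a bitmask of pending lines. This is a sketch under stated assumptions: eight interrupt lines, line 0 as the highest priority, and a hypothetical function name `select_interrupt`; real controllers do this in hardware and may remap priorities.

```c
#include <stdint.h>

#define NUM_LINES 8 /* assumed number of interrupt lines */

/* Return the highest-priority deliverable line, or -1 if there is none.
 * Bit i of each argument refers to line i; line 0 is highest priority.
 * Lines flagged in `nonmaskable` are delivered even when masked. */
int select_interrupt(uint8_t pending, uint8_t masked, uint8_t nonmaskable)
{
    uint8_t deliverable = pending & (uint8_t)(~masked | nonmaskable);

    for (int line = 0; line < NUM_LINES; line++) {
        if (deliverable & (1u << line))
            return line;
    }
    return -1;
}
```

For example, with lines 2 and 3 pending and line 2 masked, the selector reports line 3.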
How do we take interrupts safely?
• Interrupt vector
– Limited number of entry points into kernel
• Kernel interrupt stack
– Handler works regardless of state of user code
• Interrupt masking
– Handler is non-blocking
• Atomic transfer of control
– Single instruction to change:
• Program counter
• Stack pointer
• Memory protection
• Kernel/user mode
• Transparent restartable execution
– User program does not know interrupt occurred
Interrupt Vector
• Table set up by OS kernel; pointers to code to
run on different events
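In C terms, an interrupt vector is an array of function pointers indexed by event number. The sketch below assumes a four-entry table and hypothetical event numbers and handler names; counters stand in for real handler work.

```c
#include <stddef.h>

#define VECTOR_SIZE 4 /* assumed; real tables are larger */

typedef void (*handler_fn)(void);

/* Hypothetical event numbers. */
enum { VEC_TIMER = 0, VEC_DIV_ZERO = 1, VEC_SYSCALL = 2, VEC_KEYBOARD = 3 };

int timer_ticks;     /* counters stand in for real handler work */
int syscalls_seen;

void timer_handler(void)   { timer_ticks++; }
void syscall_handler(void) { syscalls_seen++; }

static handler_fn vector[VECTOR_SIZE]; /* all entries start out NULL */

/* Kernel setup: point entry n at fn. Returns 0, or -1 if n is out of range. */
int install_handler(int n, handler_fn fn)
{
    if (n < 0 || n >= VECTOR_SIZE)
        return -1;
    vector[n] = fn;
    return 0;
}

/* What the hardware does on event n: index the table and jump.
 * Returns 0 on success, -1 if no handler is installed. */
int dispatch(int n)
{
    if (n < 0 || n >= VECTOR_SIZE || vector[n] == NULL)
        return -1;
    vector[n]();
    return 0;
}
```

Because user code cannot write the table, the only ways into the kernel are the entry points the kernel installed.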
Interrupt Masking
• Interrupt handler runs with interrupts off
– Reenabled when interrupt completes
• OS kernel can also turn interrupts off
– E.g., when determining the next process/thread to
run
– If interrupts are deferred too long, I/O events can be dropped
Interrupt Handlers
• Non-blocking, run to completion
– Minimum necessary to allow device to take next
interrupt
– Any waiting must be limited duration
– Wake up other threads to do any real work
• Pintos: sema_up
• Rest of device driver runs as a kernel thread
– Queues work for interrupt handler
– (Sometimes) wait for interrupt to occur
At end of handler
• Handler restores saved registers
• Atomically return to interrupted
process/thread
– Restore program counter
– Restore program stack
– Restore processor status word/condition codes
– Switch to user mode
Exceptional Situations
• System calls are control transfers to the OS, performed under the control of the
user application
• Sometimes, need to transfer control to the OS at a time when the user program
least expects it
– Division by zero
– Alert from the power supply that electricity is about to go out
– Alert from the network device that a packet just arrived
– Clock notifying the processor that the clock just ticked
• Some of these causes for interruption of execution have nothing to do with the
user application
• Need a (slightly) different mechanism, that allows resuming the user application
Interrupts & Exceptions
• On an interrupt or exception
– Switches the stack pointer to the kernel stack
– Saves the old (user) SP value
– Saves the old (user) Program Counter value
– Saves the old privilege mode
– Saves cause of the interrupt/exception
– Sets the new privilege mode to 1
– Sets the new PC to the kernel interrupt/exception handler
• Kernel interrupt/exception handler handles the event
– Saves all registers
– Examines the cause
– Performs operation required
– Restores all registers
– Performs a “return from interrupt” instruction, which restores the privilege mode, SP
and PC
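The entry and return steps can be modeled on a toy CPU state. The structs `cpu` and `trap_frame` and the functions `take_interrupt` and `return_from_interrupt` are hypothetical stand-ins for what the hardware does atomically; saving the general-purpose registers is left to the kernel handler and is not shown.

```c
#include <stdint.h>

enum mode { USER_MODE = 0, KERNEL_MODE = 1 };

struct cpu {
    uint32_t  pc;
    uint32_t  sp;
    enum mode mode;
};

/* State saved on entry (hypothetical layout). */
struct trap_frame {
    uint32_t  user_pc;
    uint32_t  user_sp;
    enum mode old_mode;
    int       cause;
};

/* Entry: save the user state, then switch stack, mode, and PC. */
void take_interrupt(struct cpu *cpu, struct trap_frame *tf, int cause,
                    uint32_t kernel_sp, uint32_t handler_pc)
{
    tf->user_sp = cpu->sp;
    tf->user_pc = cpu->pc;
    tf->old_mode = cpu->mode;
    tf->cause = cause;

    cpu->sp = kernel_sp;        /* now running on the kernel stack */
    cpu->mode = KERNEL_MODE;    /* privilege mode set to 1 */
    cpu->pc = handler_pc;       /* jump to the kernel handler */
}

/* "Return from interrupt": restore the privilege mode, SP, and PC. */
void return_from_interrupt(struct cpu *cpu, const struct trap_frame *tf)
{
    cpu->mode = tf->old_mode;
    cpu->sp = tf->user_sp;
    cpu->pc = tf->user_pc;
}
```

After the return, the CPU state matches what it was before the interrupt, which is what lets the user program resume without noticing.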
Before
After
Interrupt Stack
• Per-processor, located in kernel (not user)
memory
– Usually a thread has both: kernel and user stack
• Why can’t interrupt handler run on the stack
of the interrupted user process?
Interrupt Stack
Context switch
System Calls
• A system call is a controlled transfer of execution
from unprivileged code to the OS
– A potential alternative is to make OS code read-only, and
allow applications to just jump to the desired system call
routine. Why is this a bad idea?
• A SYSCALL instruction transfers control to a system
call handler at a fixed address
System Calls
[Figure: write(fd, buf, len) in the user portion of the address space traps into the kernel portion of the address space, which holds kernel text, kernel stack, and other stuff]
System Calls
• Sole interface between user and kernel
• Implemented as library routines that
execute trap instructions to enter kernel
• Errors indicated by returns of -1; error
code is in errno
    if (write(fd, buffer, bufsize) == -1) {
        // error!
        printf("error %d\n", errno);
        // see perror
    }
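A slightly fuller version of the same check, using the POSIX `write` call and `strerror` for a readable message. The wrapper name `checked_write` is hypothetical, and the sketch ignores short writes (a successful `write` may transfer fewer bytes than requested).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write len bytes to fd. Returns 0 on success, or the errno value
 * on failure after printing a message to stderr. */
int checked_write(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) == -1) {
        int saved = errno;   /* save it: later calls may overwrite errno */
        fprintf(stderr, "write failed: %s\n", strerror(saved));
        return saved;
    }
    return 0;
}
```

Calling `checked_write(-1, "x", 1)` fails because -1 is not an open file descriptor, and the function returns `EBADF`.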
Sample System Calls
• Print character to screen
– Needs to multiplex the shared screen resource
between multiple applications
• Send a packet on the network
– Needs to manipulate the internals of a device
whose hardware interface is unsafe
• Allocate a page
– Needs to update page tables & MMU
Syscall vs. Interrupt
• The differences lie in how they are initiated, and how
much state needs to be saved and restored
• Syscall requires much less state saving
– Caller-save registers are already saved by the application
• Interrupts typically require saving and restoring the
full state of the processor
– Because the application got struck by a lightning bolt
without anticipating the control transfer
System Calls
Kernel System Call Handler
• Locate arguments
– In registers or on user(!) stack
• Copy arguments
– From user memory into kernel memory
– Protect kernel from malicious code evading checks
• Validate arguments
– Protect kernel from errors in user code
• Copy results back
– into user memory
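The copy-then-validate pattern can be sketched against a simulated user region. Everything here is an assumption for illustration: `user_mem` stands in for the user address space, and `copy_from_user` and `sys_write_sketch` are hypothetical names. The point of copying first is that the kernel then works on its own copy, which user code cannot change after the checks have passed. Copying results back to user memory would need the same range check and is omitted.

```c
#include <stddef.h>
#include <string.h>

#define USER_MEM_SIZE 64   /* assumed size of the simulated user region */
#define KBUF_SIZE     16   /* assumed kernel buffer limit */

static char user_mem[USER_MEM_SIZE]; /* stands in for user memory */

/* Range-check a user address, then copy into a kernel buffer.
 * Returns 0 on success, -1 if the range is not inside user memory. */
int copy_from_user(char *kdst, size_t uaddr, size_t len)
{
    if (uaddr > USER_MEM_SIZE || len > USER_MEM_SIZE - uaddr)
        return -1;
    memcpy(kdst, user_mem + uaddr, len);
    return 0;
}

/* Hypothetical handler: returns the byte count accepted, or -1. */
long sys_write_sketch(size_t ubuf, size_t len)
{
    char kbuf[KBUF_SIZE];

    if (len > KBUF_SIZE)
        return -1;                       /* validate the length argument */
    if (copy_from_user(kbuf, ubuf, len) != 0)
        return -1;                       /* bad user pointer */
    /* ... from here on the handler would use only kbuf ... */
    return (long)len;
}
```

The subtraction form of the range check avoids overflow when `uaddr + len` would wrap around.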
SYSCALL instruction
• SYSCALL instruction does an atomic jump to a controlled location
– Switches the sp to the kernel stack
– Saves the old (user) SP value
– Saves the old (user) PC value (= return address)
– Saves the old privilege mode
– Sets the new privilege mode to 1
– Sets the new PC to the kernel syscall handler
• Kernel system call handler carries out the desired system call
– Saves callee-save registers
– Examines the syscall number
– Checks arguments for sanity
– Performs operation
– Stores result in v0
– Restores callee-save registers
– Performs a “return from syscall” instruction, which restores the privilege mode, SP and PC
Web Server Example
System Boot
• Operating system must be made available to
hardware so hardware can start it
– Small piece of code – bootstrap loader, locates
the kernel, loads it into memory, and starts it
– Sometimes two-step process where boot block at
fixed location loads bootstrap loader
– When power initialized on system, execution
starts at a fixed memory location
• Firmware used to hold initial boot code
Booting
Virtual Machines
• A virtual machine takes the layered approach
to its logical conclusion. It treats hardware
and the operating system kernel as though
they were all hardware
• A virtual machine provides an interface
identical to the underlying bare hardware
• The operating system host creates the
illusion that a process has its own processor
and (virtual) memory
• Each guest provided with a (virtual) copy of
underlying computer
Virtual Machine
Virtual Machines (Cont)
[Figure: (a) nonvirtual machine, (b) virtual machine]
User-Level Virtual Machine
• How does VM Player work?
– Runs as a user-level application
– How does it catch privileged instructions, interrupts,
device I/O, …
• Installs kernel driver, transparent to host kernel
– Requires administrator privileges!
– Modifies interrupt table to redirect to kernel VM code
– If interrupt is for VM, upcall
– If interrupt is for another process, reinstalls interrupt
table and resumes kernel
Context switch
System Upcalls
Upcall: User-level interrupt
• AKA UNIX signal
– Notify user process of event that needs to be handled
right away
• Time-slice for user-level thread manager
• Interrupt delivery for VM player
• Direct analogue of kernel interrupts
– Signal handlers – fixed entry points
– Separate signal stack
– Automatic save/restore registers – transparent resume
– Signal masking: signals disabled while in signal handler
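A UNIX signal handler shows the upcall pattern from user code. This sketch uses the POSIX `sigaction` and `raise` calls; the function names `install_upcall` and `upcall_delivered` and the choice of `SIGUSR1` are assumptions made for illustration. It does not set up a separate signal stack.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal; /* set by the handler */

/* Fixed entry point: the kernel upcalls here when SIGUSR1 is delivered. */
static void on_usr1(int signo)
{
    (void)signo;
    got_signal = 1;   /* keep handler work minimal and async-signal-safe */
}

/* Register the handler. Returns 0 on success, -1 on error. */
int install_upcall(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask); /* SIGUSR1 itself stays blocked in the handler */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Nonzero once the handler has run. */
int upcall_delivered(void)
{
    return got_signal;
}
```

After `install_upcall()`, a call to `raise(SIGUSR1)` runs `on_usr1` and then resumes the interrupted code, mirroring the transparent resume of a kernel interrupt.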
Upcall: Before
Upcall: After
Terminology
• Trap
– Any kind of a control transfer to the OS
• Syscall
– Synchronous, program-initiated control transfer from user to the OS to
obtain service from the OS
– e.g. SYSCALL
• Exception
– Synchronous, program-triggered control transfer from user to the OS
in response to an exceptional event the program did not anticipate
– e.g. Divide by zero, segmentation fault
• Interrupt
– Asynchronous, device-initiated control transfer from user to the OS
– e.g. Clock tick, network packet