Self stabilizing Linux Kernel Mechanism

Download Report

Transcript Self stabilizing Linux Kernel Mechanism

Self stabilizing Linux Kernel
Mechanism
Doron Mishali, Alex Plits
Supervisors:
Prof. Shlomi Dolev
Dr. Reuven Yagel
The Linux Kernel
• The kernel is the central component of most
computer operating systems. Its
responsibilities include managing the system's
resources (the communication between
hardware and software components).
• The main task of the kernel is to allow the
execution of applications and support them
with features such as hardware abstractions.
The Kernel (Cont.)
• The kernel also responsible of the high level
scheduler, the one who decides which
processes will be in the memory and which on
the disk.
• The process is the main kernel abstraction, it
defines which memory portions the application
can access.
The Scheduler
The low level scheduler decides which of the
ready, in-memory processes are to be executed
(allocated a CPU) next following a clock
interrupt, an IO interrupt, an operating system
call or another form of signal.
In the project we will deal with the Completely
Fair Scheduler implementation of the Linux
Scheduler.
RB-Tree as Scheduler
data structure
The run queue is kept sorted by the Runnable
threads' virtual runtimes by storing it in a RedBlack Tree, which is a variant of a binary
search tree. When the scheduler decides to
switch threads, it switches to the leftmost
thread in the red-black tree, that is, the one
with the earliest virtual runtime.
The RB-Tree
The RB-Tree (Cont.)
The RB-Tree data structure must keep on the
following properties:
• Every node is either black or red.
• All leafs are black.
• Both children of a red node are black.
• Every path from a node to a leaf should have
the same count of black nodes.
The RB-Tree (Cont.)
Each violation of one of those properties can
cause a crash in the next tree operation which
means, possible corruption of data!
In order to keep track on these properties we are
executing our mechanism which keeps a close
eye on this data structure by running frequent
tests on it upon a corruption detection, The
program can self stabilize the system which
means – Auto recovery.
Self Stabilizing
• Self-stabilization is a concept of fault-tolerance in
distributed computing.
• A self-stabilizing system will end up in a correct
state no matter what state it is initialized with, and
no matter what execution steps it will take.
• The ability to recover without external
intervention is very desirable in modern computer
since it would enable them to repair errors and
return to normal operations on their own.
Computers can thus be made fault-tolerant.
The project
Goal
Detect & Recover from a corruption of the Scheduler’s RB-Tree
data structure.
Method
• Perform series of tests on the Scheduler at the following scenarios:


Periodic (i.e. every couple of time units).
On a Page Fault occurrence(which may indicate memory corruption).
The tests include legitimacy tests and memory tests.
• In case of a failure we will stabilize the system – by Auto recovering
which is done by rebuilding the data structure from currently
running processes which in turn will let the processes run normally
on the next scheduling.
The project – Some examples
In this example one of the nodes in the tree points to some garbage in memory, A
thing that definitely can happen in an operating system.
The project – Some examples
The next figure demonstrates a case when the root changed it’s color from black
(mandatory on RB-tree) to red. This will probably cause a corrupt on the next insertion of
the process to the tree which can end up in a wrong structure of the tree in the best case or
a system crash in the worst case.
Running the recovery
• After detecting the inconsistency, we run the
recovery procedure which yields the following
results for the above examples:
For example 1
For example 2
Project Process
The project was divided into 3 stages:
• In depth understanding on how the CFS
kernel scheduler works and the data structures
it uses.
• Understanding the self stabilizing mechanism
• “Hacking” the Linux Kernel, changing/adding
data structures and interaction with existing
kernel code.
Difficulties:
• Coding inside the Kernel
– Data structure's encapsulation (Modules vs. Embedded
Code).
– Synchronization (Spin Locks, Scheduler preemption).
– Enormous amount of interconnectivity.
• Debugging the kernel
–
–
–
–
(our advice - AVOID IT IF YOU CAN)
Embedded Debugger KDB ( new versions support KGDB ).
UML (User Mode Linux) vs. VMware.
Asynchronous system ( Interrupts & Exceptions ).
Demonstration – corruption
without recovery
Demonstration – corruption
with recovery
References and Utilities:






S. Dolev - Self stabilization, the MIT Press,
2000.
Understanding the Linux Kernel - Daniel P.
Bovet & Marco Cesati.
Kernel Hacking - Kong, Joseph.
KDB – embedded kernel debugger.
VMware – virtual machine for the
simulations.
UML – User Mode Linux (embedded VM).
Q&A