Improving the Reliability of Commodity Operating Systems
Michael M. Swift, Brian N. Bershad, Henry M. Levy
Presented by Ya-Yun Lo
EECS 582 – W16
Outline
• Introduction
• Nooks
• Implementation
• Evaluating Reliability
• Performance
Device Driver
• A module that translates high-level OS requests to device-specific requests
• Programmers writing device drivers are often less experienced
• Device drivers: 70% of Linux kernel code!
[Figure: applications above the kernel; kernel components: Virtual Memory, File Systems, Networking, Scheduling, Device Drivers, …]
Motivation
• Kernel extensions are a major source of system failures
• Device drivers: 70% of Linux kernel code!
Goal
• Eliminate downtime caused by drivers
• Isolation - Prevent system crashes
• Recovery - Keep applications running
[Figure: Application, Driver, Kernel]
Nooks
• A reliability subsystem that isolates extensions from the kernel
• For fault resistance, not fault tolerance
• System must prevent and recover from most extension mistakes
• For mistakes, not abuse
• Exclude malicious behavior
Nooks
• Isolation
• Isolate kernel from extension failures
• Detect extension failures before they corrupt kernel
• Backward-compatible
• with existing systems and extensions
• Practical
• Efficient
Nooks Isolation Manager (NIM)
• Transparent OS layer inserted between the kernel and kernel extensions
Nooks Isolation Manager (NIM)
• Isolation
• Lightweight kernel protection domain
• Extension Procedure Call (XPC): Communication between kernel and extensions must go through this new kernel service
• Interposition
• Control flow: XPC
• Data transfer: Object tracking
• All interfaces are done through Wrappers (similar to stubs in RPC)
Nooks Isolation Manager (NIM)
• Object Tracking
• Control all modifications of data structures by each extension
• Extensions cannot directly modify kernel data structures
• Recovery
• Detect and recover from various extension faults
• Recovery helped by Nooks isolation mechanisms
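The link between object tracking and recovery can be illustrated with a small sketch. It is a user-space Python toy, not the Nooks code: the class `ObjectTracker`, its method names, and the labels `"e1000"`, `"skb-1"` and `"irq-9"` are all hypothetical. It assumes that each tracked object comes with a release callback, so that when an extension fails, everything it still holds can be reclaimed.

```python
class ObjectTracker:
    """Toy tracker: records which kernel objects each extension holds."""

    def __init__(self):
        # extension name -> {object id: release callback}
        self._held = {}

    def track(self, extension, obj_id, release):
        """Record that `extension` now holds `obj_id`."""
        self._held.setdefault(extension, {})[obj_id] = release

    def untrack(self, extension, obj_id):
        """Forget an object the extension gave back normally."""
        self._held.get(extension, {}).pop(obj_id, None)

    def recover(self, extension):
        """Release everything a failed extension still holds.

        Returns the ids that were released, in the order they were tracked.
        """
        held = self._held.pop(extension, {})
        for obj_id, release in held.items():
            release(obj_id)
        return list(held)


# Hypothetical usage: a driver takes two objects, returns one, then fails.
freed = []
tracker = ObjectTracker()
tracker.track("e1000", "skb-1", freed.append)
tracker.track("e1000", "irq-9", freed.append)
tracker.untrack("e1000", "skb-1")    # returned normally by the driver
released = tracker.recover("e1000")  # driver failed: reclaim the rest
```

Keeping the bookkeeping outside the extension is the point of the design: recovery does not have to trust the failed extension to say what it was holding.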
Implementation of Nooks
• Inside Linux 2.4.18 kernel on Intel x86 architecture
• Linux kernel
• over 700 functions callable by extensions
• over 650 extension-entry functions callable by the kernel
• Most interactions between kernel and extensions go through function calls
Isolation
• Memory management
• Lightweight protection domains with virtual memory protection
• Read-only access to kernel
• Read-write access to its own domain
• Extension Procedure Call (XPC)
• Transfer control safely between extensions and the kernel
• Similar to Remote Procedure Call (RPC)
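The access rules above can be written down as a tiny permission check. This is only a model in Python, not how Nooks enforces them (the slides say that is done with virtual memory protection). The names `may_read`, `may_write`, `write_page`, `ProtectionFault` and the domain label `"sb16"` are hypothetical, and the sketch covers extension domains only. Denying access to pages owned by other extensions is an assumption; the slides do not cover that case.

```python
KERNEL = "kernel"


class ProtectionFault(Exception):
    """Raised when a domain attempts an access it is not allowed."""


def may_read(domain, page_owner):
    # An extension may read kernel pages and its own pages.
    # Pages of other extensions: denied here (assumption, not from the slides).
    return page_owner in (KERNEL, domain)


def may_write(domain, page_owner):
    # An extension may write only to pages of its own domain.
    return page_owner == domain


def write_page(domain, page_owner, memory, addr, value):
    """Perform a write on behalf of `domain`, or raise ProtectionFault."""
    if not may_write(domain, page_owner):
        raise ProtectionFault(f"{domain} may not write to {page_owner} memory")
    memory[addr] = value


# Hypothetical usage: a sound driver domain writes to its own memory,
# then attempts to write to kernel memory.
own_memory = {}
kernel_memory = {0x10: "kernel data"}
write_page("sb16", "sb16", own_memory, 0x0, "sample buffer")
try:
    write_page("sb16", KERNEL, kernel_memory, 0x10, "garbage")
    blocked = False
except ProtectionFault:
    blocked = True
```

Because kernel memory stays readable, an extension can still follow kernel data structures without a copy; only stray writes are stopped.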
Interposition
• Bind extensions to wrappers when the extensions are loaded
• Enable the extension to execute within its lightweight protection domain
• Wrapper
• Check parameters for validity
• Implement call by value and result
• Perform an XPC to execute the desired function
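The three wrapper steps can be sketched as follows. This is an illustrative Python toy rather than the Nooks implementation, which lives inside the Linux kernel. The names `make_wrapper`, `xpc`, `ExtensionFault` and the two example driver functions are hypothetical, kernel objects are modeled as plain dictionaries, and the XPC is reduced to a guarded function call.

```python
import copy


class ExtensionFault(Exception):
    """Raised when a wrapper rejects a call or the extension fails."""


def xpc(target, *args):
    """Stand-in for an XPC: run `target`, turning any error into a fault."""
    try:
        return target(*args)
    except Exception as exc:
        raise ExtensionFault(str(exc)) from exc


def make_wrapper(func, validate):
    """Wrap an extension entry point `func` that takes one dict argument."""
    def wrapper(kernel_obj):
        # Step 1: check parameters for validity.
        if not validate(kernel_obj):
            raise ExtensionFault("invalid parameter")
        # Step 2a: call by value, the extension only sees a private copy.
        private = copy.deepcopy(kernel_obj)
        # Step 3: perform the (simulated) XPC.
        result = xpc(func, private)
        # Step 2b: call by result, copy back only after a clean return.
        kernel_obj.clear()
        kernel_obj.update(private)
        return result
    return wrapper


def is_valid_packet(packet):
    return isinstance(packet.get("data"), bytes)


def driver_send(packet):       # well-behaved hypothetical driver function
    packet["sent"] = True
    return len(packet["data"])


def buggy_send(packet):        # corrupts its argument, then fails
    packet["data"] = None
    raise RuntimeError("null dereference")


good = {"data": b"hello", "sent": False}
sent_bytes = make_wrapper(driver_send, is_valid_packet)(good)

bad = {"data": b"hello", "sent": False}
try:
    make_wrapper(buggy_send, is_valid_packet)(bad)
    fault_seen = False
except ExtensionFault:
    fault_seen = True
```

In this sketch the kernel's copy of `bad` is untouched after the failure, because changes are copied back only when the call returns normally.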
Implementation Limitations
• Does not provide complete isolation or fault tolerance for all possible extension errors
• Current implementation of Recovery assumes that extensions can be killed and restarted safely
Evaluating Reliability
• Tested eight extensions
  • Two sound card drivers
  • Four Ethernet drivers
  • A Win95 compatible file system (VFAT)
  • An in-kernel Web server
• Injected 400 faults
• 317 resulted in extension failures
Reliability Results
• Nooks eliminated 99% of the crashes observed with native Linux
Reliability Results
• Overall, Nooks eliminated 55% of non-fatal extension failures caused by fault injection trials
Performance
• Dell 1.7 GHz Pentium 4 PC running Linux 2.4.18
  • 890 MB RAM
  • SoundBlaster 16 sound card
  • Intel Pro/1000 Gigabit Ethernet adapter
  • Single 7200 RPM, 41 GB IDE hard disk drive
Performance
• Relative performance is determined by
• Comparing latency: Play-mp3, Compile-local
• Throughput: Send/Receive-stream, Serve-simple/complex-web-page
Conclusion
• Nooks focuses on achieving backward compatibility
• Cannot provide complete isolation and fault tolerance
• With modest engineering effort, isolation and recovery can dramatically improve the system’s reliability
• Performance loss ranging from 0 to 60%