Live Updating Operating Systems Using Virtualization
Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang
Fudan University
Pen-Chung Yew
University of Minnesota at Twin-Cities
Motivation
Operating systems are far from perfect:
Security holes, design flaws, bugs, new features ...
Result: continuous patches and upgrades are required
Difficulties in applying patches and upgrades:
Disruptive: loss of availability
Irreversible: risk of system crash
A live update feature is highly desirable and, very often, critical.
What Do Contemporary Operating Systems (COS) Miss?
Requirements to live update an OS:
Define an updatable unit
Difficult: a COS is monolithic
Apply the patch at a safe point
Some hot spots have no safe point (e.g., root file system, network modules)
Consistency
Difficult for an OS to update itself
What is LUCOS?
"Any problem in computer science can be solved with another level of indirection."
David Wheeler, quoted in Butler Lampson's 1992 ACM Turing Award speech.
Live Updating Contemporary Operating Systems using virtualization
Use Virtual Machine Monitors (VMMs) to patch operating systems (e.g., Linux)
Avoid the need for a safe point; allow the old and new versions of data structures to coexist.
The VMM maintains coherence between the versions and tracks when a live update can finish.
What is LUCOS?
A practical live updating system
Applies a broad range of real-life Linux patches on the fly
Requires no safe points and retains OS transparency
Supports patches for recovering tainted state (e.g., a deadlock situation)
Allows rolling back committed patches
Requires minimal update time (< 1 ms) and incurs negligible performance overhead (less than 1%)
Some Existing Efforts
Dynamic Software Update
Focuses on live update of application software
LUCOS: live update of operating systems
K42 (Baumann et al., USENIX '05)
A new operating system designed to support live update
Tightly bound to object-oriented design techniques
A safe point is desirable
LUCOS: transparently supports existing OSes (including non-object-oriented ones), requires no safe point
LUCOS Architecture
Two Types of Live Updates
Updates to only code:
Only code is modified.
Updates to code with data changes:
Including global, single-instance data, or multiple-instance data.
Live Update to Code Only
Live update to code only:
(1) The Update Server replaces the head of the original function with a jump instruction to the patch function address.
(2) The OS executes the jump instruction in the original function and jumps to the patch function.
Figure: in Linux, (1) the Update Server writes jmp patch_func_vaddr at orig_func_vaddr; (2) execution jumps from the original function code to the patch function code at patch_func_vaddr.
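The jump written over the head of the original function can be sketched as follows. This is a minimal illustration, not the LUCOS implementation: it assumes a 32-bit-displacement x86 near jump (opcode 0xE9), which the slides do not specify, and the function name encode_jmp_rel32 is hypothetical. It only builds the instruction bytes; writing them into kernel text is left to the Update Server.

```c
#include <stdint.h>

#define JMP_REL32_LEN 5  /* 1 opcode byte + 4 displacement bytes */

/*
 * Build the bytes of an x86 near jump (opcode 0xE9, rel32) that, when
 * placed at orig_func_vaddr, transfers control to patch_func_vaddr.
 * The displacement is relative to the address of the instruction that
 * follows the jump. Returns 0 on success, -1 if the target is out of
 * rel32 range.
 */
int encode_jmp_rel32(uint8_t out[JMP_REL32_LEN],
                     uint64_t orig_func_vaddr,
                     uint64_t patch_func_vaddr)
{
    int64_t disp = (int64_t)(patch_func_vaddr -
                             (orig_func_vaddr + JMP_REL32_LEN));
    if (disp > INT32_MAX || disp < INT32_MIN)
        return -1;

    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    /* x86 is little-endian: least-significant byte first. */
    out[1] = (uint8_t)(rel & 0xFF);
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
    return 0;
}
```

For an original function at 0x1000 and a patch function at 0x2000, the displacement is 0x2000 - 0x1005 = 0xFFB, so the encoded bytes are E9 FB 0F 00 00.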
Live Update to Code with Data Changes
Live update to code with data changes:
(1) The Update Server replaces the beginning of the original function with a jump instruction to the patch function address, and write-protects the related instances of the data structures.
(2) The OS executes the jump instruction in the original function and jumps to the patch function.
(3) An update to a protected instance of the data structures triggers an interrupt, and the Update Server executes the state transfer function.
Figure: in Linux, (1) the Update Server writes jmp patch_func_vaddr at orig_func_vaddr; (2) execution jumps from the original function code to the patch function code at patch_func_vaddr; (3) updating instances of the data structures raises an interrupt into Xen, which runs the state transfer function at state_transfer_func_vaddr.
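Step (3) can be illustrated with a small userspace simulation. Everything here is an assumption made for illustration: the item_v1 and item_v2 layouts, the write_protected flag standing in for a read-only page mapping, and all function names are hypothetical, and a real VMM would intercept a protection fault rather than check a flag.

```c
#include <stdbool.h>

/* Hypothetical old and new layouts of one data structure. */
struct item_v1 { int count; };
struct item_v2 { int count; int flags; };  /* field added by the patch */

/* One tracked instance: both versions coexist while the update is live. */
struct tracked_item {
    struct item_v1 old_ver;
    struct item_v2 new_ver;
    bool write_protected;   /* stands in for a read-only page mapping */
    unsigned transfers;     /* how many times state transfer has run */
};

/* State transfer keeps the two versions coherent in either direction. */
void transfer_old_to_new(struct tracked_item *t)
{
    t->new_ver.count = t->old_ver.count;
    t->transfers++;
}

void transfer_new_to_old(struct tracked_item *t)
{
    t->old_ver.count = t->new_ver.count;
    t->transfers++;
}

/* A write issued by a thread still running original code. */
void write_from_original(struct tracked_item *t, int count)
{
    t->old_ver.count = count;
    if (t->write_protected)       /* the VMM would take a fault here */
        transfer_old_to_new(t);
}

/* A write issued by a thread running patched code. */
void write_from_patch(struct tracked_item *t, int count, int flags)
{
    t->new_ver.count = count;
    t->new_ver.flags = flags;
    if (t->write_protected)       /* the VMM would take a fault here */
        transfer_new_to_old(t);
}
```

While the instance is protected, a write through either version is mirrored into the other; once protection is dropped, writes no longer trigger a transfer.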
Termination of a Live Update
Terminate when all threads have left the original functions
Stack inspection (Altekar, USENIX Security '05):
Maintain a list of threads executing in original functions
Remove threads as they leave the original functions
Terminate the live update when the list is empty
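The termination rule above can be sketched as a small bookkeeping structure. This is an illustrative assumption, not the LUCOS data structure: the fixed-size array, the integer thread IDs, and the names update_tracker, tracker_add, tracker_remove, and update_can_terminate are all hypothetical, and how threads are discovered (stack inspection) is outside the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 64

/* Threads known to be executing inside original (unpatched) functions. */
struct update_tracker {
    int tids[MAX_TRACKED];
    size_t count;
};

/* Record a thread found inside an original function. */
bool tracker_add(struct update_tracker *u, int tid)
{
    if (u->count == MAX_TRACKED)
        return false;
    u->tids[u->count++] = tid;
    return true;
}

/* Drop a thread that has left the original functions.
 * Returns true if the thread was being tracked. */
bool tracker_remove(struct update_tracker *u, int tid)
{
    for (size_t i = 0; i < u->count; i++) {
        if (u->tids[i] == tid) {
            /* Swap-remove: order of the list does not matter. */
            u->tids[i] = u->tids[--u->count];
            return true;
        }
    }
    return false;
}

/* The live update may terminate once no thread remains in old code. */
bool update_can_terminate(const struct update_tracker *u)
{
    return u->count == 0;
}
```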
Patches for Recovering Tainted State
Vision:
Some bugs can leave the system in a tainted state, e.g., a deadlock situation
Simple patching alone cannot repair that state
spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;

void foo(void) {
    ...;
    spin_lock(&demo_lock);
    ...;
    if (condition) { return; }   /* BUG: returns with demo_lock still held */
    ...;
    spin_unlock(&demo_lock);
}
Code 1: a buggy function with a potential for deadlocks.

spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;

void foo_patch(void) {
    ...;
    spin_lock(&demo_lock);
    ...;
    if (condition) {
        spin_unlock(&demo_lock);   /* fix: release the lock before returning */
        return;
    }
    ...;
    spin_unlock(&demo_lock);
}
Code 2: a patch function to fix the deadlock problem.

void state_transfer(void) {
    /* Release the lock if a buggy early return left it held. */
    if (spin_is_locked(&demo_lock))
        spin_unlock(&demo_lock);
}
Code 3: a callback function to recover from a deadlocked situation.
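Why the patch function alone is not enough can be shown with a single-threaded userspace analogue of the three listings. This is a sketch under stated assumptions: demo_spinlock_t and the demo_spin_* helpers are hypothetical stand-ins for the kernel spinlock API, and a failed try-lock stands in for a thread that would spin forever.

```c
#include <stdbool.h>

/* Userspace stand-in for a kernel spinlock (single-threaded demo only). */
typedef struct { bool locked; } demo_spinlock_t;

demo_spinlock_t demo_lock = { false };

/* Returns false where a real spin_lock() would spin forever. */
bool demo_spin_trylock(demo_spinlock_t *l)
{
    if (l->locked)
        return false;
    l->locked = true;
    return true;
}

void demo_spin_unlock(demo_spinlock_t *l) { l->locked = false; }
bool demo_spin_is_locked(const demo_spinlock_t *l) { return l->locked; }

/* Analogue of Code 1: the early return leaks the lock. */
bool foo(bool condition)
{
    if (!demo_spin_trylock(&demo_lock))
        return false;               /* would deadlock */
    if (condition)
        return true;                /* BUG: demo_lock is still held */
    demo_spin_unlock(&demo_lock);
    return true;
}

/* Analogue of Code 2: every return path releases the lock. */
bool foo_patch(bool condition)
{
    if (!demo_spin_trylock(&demo_lock))
        return false;               /* would deadlock */
    if (condition) {
        demo_spin_unlock(&demo_lock);
        return true;
    }
    demo_spin_unlock(&demo_lock);
    return true;
}

/* Analogue of Code 3: repair state already tainted by the buggy code. */
void state_transfer(void)
{
    if (demo_spin_is_locked(&demo_lock))
        demo_spin_unlock(&demo_lock);
}
```

If foo(true) has already run, the lock is left held, so a later foo_patch() call still cannot acquire it even though the patched code is correct. Only after state_transfer() releases the leaked lock does foo_patch() succeed.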
Patches for Recovering Tainted State
Solutions:
Allow callbacks in live update
Three types of callbacks in LUCOS:
function callbacks
thread callbacks
data callbacks
Example: use thread callbacks to resolve the deadlock situation
Patch Rollback
A special type of patch:
Uses the original code and data to patch the committed ones
Transfers state from the new data back to the original data
Resource overhead:
The original code and data must be kept in memory
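The resource overhead of rollback can be seen in a minimal sketch of the code-only case. This is an assumption for illustration, not the LUCOS mechanism: it operates on an ordinary byte buffer rather than kernel text, assumes a 5-byte jump at the function head, does not cover transferring data state back, and the names code_patch, patch_apply, and patch_rollback are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_LEN 5   /* assumed size of the jump written over the function head */

struct code_patch {
    uint8_t *site;              /* head of the original function */
    uint8_t saved[PATCH_LEN];   /* original bytes, kept in memory for rollback */
    int applied;
};

/* Save the original bytes, then overwrite them with the jump. */
void patch_apply(struct code_patch *p, uint8_t *site,
                 const uint8_t jump[PATCH_LEN])
{
    p->site = site;
    memcpy(p->saved, site, PATCH_LEN);
    memcpy(site, jump, PATCH_LEN);
    p->applied = 1;
}

/* Restore the saved bytes. Returns -1 if nothing was applied. */
int patch_rollback(struct code_patch *p)
{
    if (!p->applied)
        return -1;
    memcpy(p->site, p->saved, PATCH_LEN);
    p->applied = 0;
    return 0;
}
```

The saved array is the cost named on the slide: the original bytes must stay resident for as long as the patch may be rolled back.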
Experimental Setup
Implemented on Linux 2.6.10 running on Xen 2.0.5.
System:
Fedora Core 2 distribution
3.0 GHz Pentium 4 with 1 GB RAM
Intel Pro 100/1000 Ethernet NIC on a 100 Mb/s LAN
A single 250 GB 7200 RPM SATA disk
Workloads
SPEC INT 2000:
Measures the performance of CPU-intensive workloads
Linux build time:
Measures the overall time to build a Linux 2.6.10 kernel with gcc 3.3.3
Open Source Database Benchmark suite (OSDB):
Information Retrieval (IR)
Online Transaction Processing (OLTP)
Experience with Real-Life Patches
Five typical patches selected from Linux upgrades:
upgrade of Linux kernel from 2.6.10 to 2.6.11
upgrade of backend block device drivers in Xen-Linux
No. | Patch type | Description
1 | Type 1 | Fixing the page reading bug
2 | Type 1 | Removal of livelock avoidance
3 | Type 2 | Upgrading the process scheduler
4 | Type 2 | Reconstruction of the IRQ descriptors
5 | Type 2 | Upgrading backend block device drivers in Xen-Linux
Time to Apply and Roll Back Live Updates
Note: OSDB-IR/OLTP runs in the background while the patches are applied and rolled back.
Relative Performance (Normal Execution)
Conclusions
Existing operating systems can be live updated
No safe point is required
Patches should recover tainted state
Rollback of a live update is supported
Time overhead to apply a live update is minimal
Performance overhead is negligible
Future Work
Avoid the performance overhead of virtualization
Integrate it with our self-virtualization system
Virtualize operating systems on demand
Questions?
Our contact information:
Parallel Processing Institute, Fudan University, China
Phone: +86-21-51355363
Fax: +86-21-65646571
Patch File Format in LUCOS
Follows the format of Linux kernel modules, and adds:
New declarations of data structures
*Callback functions
*Patch startup and patch cleanup functions
*State transfer
Fine-grained memory protection
Facilitating ECC memory (Qin et al., HPCA '05)
Cache-line granularity
Mondrian memory protection (Witchel et al., ASPLOS-X)
Word-level memory protection
Self-virtualization: architecture
• The OS can switch between the three modes quickly, on the fly
• Applications are completely unaware of the mode switch
• Hosting mode is used to host other OSes
• Migrating mode prepares the OS to self-migrate to another machine