Live Updating Operating Systems Using Virtualization

Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang
Fudan University
Pen-Chung Yew
University of Minnesota at Twin-Cities
Motivation

Operating systems are far from perfect:
  Security holes, design flaws, bugs, new features ...
  Result: continuous patches and upgrades are required
Difficulties in applying patches and upgrades:
  Disruptive: loss of availability
  Irreversible: risk of system crash
A live update feature is highly desirable and, very often, critical.
What Is Missing in COS?

Requirements to live update a contemporary operating system (COS):
  Define an updatable unit
    Difficult: a COS is monolithic
  Apply the patch at a safe point
    Some hot spots do not have a safe point (e.g. root file system, network modules)
  Consistency
    Difficult for an OS to update itself
What is LUCOS?

"Any problem in computer science can be solved with another level of indirection."
  David Wheeler, quoted in Butler Lampson's 1992 ACM Turing Award speech

Live Updating Contemporary Operating Systems using virtualization:
  Use Virtual Machine Monitors (VMMs) to patch operating systems (e.g. Linux)
  Avoid the need for a safe point; allow the old and new versions of data structures to co-exist
  The VMM maintains coherence and tracks when to finish a live update
What is LUCOS?

A practical live updating system:
  Applies a broad range of real-life Linux patches on-the-fly
  Requires no safe points and retains OS transparency
  Supports patches for recovering tainted state (e.g. a deadlock situation)
  Allows rolling back committed patches
  Requires minimal update time (< 1 ms) and incurs negligible performance overhead (less than 1%)
Some Existing Efforts

Dynamic Software Update:
  Focuses on live update of application software
  LUCOS: live update of operating systems
K42 (Baumann et al., USENIX '05):
  A new operating system built to support live update
  Tightly bound to object-oriented design techniques
  A safe point is desirable
  LUCOS: transparently supports existing OSes (including non-object-oriented ones) and requires no safe point
LUCOS Architecture
Two Types of Live Updates

Updates to code only:
  Only code is modified.
Updates to code with data changes:
  Includes global, single-instance data or multiple-instance data.
Live Update to Code Only
Live update to code only:
(1) The Update Server replaces the head of the original function with a jump instruction to the patch function address.
(2) The OS executes the jump instruction in the original function and jumps to the patch function.
[Figure: within Linux, (1) "jmp patch_func_vaddr" is written at orig_func_vaddr, the start of the original function code; (2) execution jumps to the patch function code at patch_func_vaddr.]
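The redirection in step (1) can be illustrated with a minimal sketch. It assumes a 32-bit x86 near jump (opcode 0xE9 followed by a 32-bit relative displacement), which the slides do not specify; the helper names encode_jmp and write_jump are hypothetical, and a byte buffer stands in for the head of the original function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_LEN 5   /* assumed encoding: opcode 0xE9 + 32-bit displacement */

/* Build the bytes of "jmp patch_func_vaddr" as they would sit at
 * orig_func_vaddr. Returns 0 on success, -1 if the target is out of
 * range for a 32-bit relative displacement. */
static int encode_jmp(uint8_t out[JMP_REL32_LEN],
                      uint64_t orig_func_vaddr,
                      uint64_t patch_func_vaddr)
{
    assert(out != NULL);

    /* The displacement is relative to the end of the jump instruction. */
    int64_t disp = (int64_t)patch_func_vaddr
                 - (int64_t)(orig_func_vaddr + JMP_REL32_LEN);
    if (disp > INT32_MAX || disp < INT32_MIN)
        return -1;

    uint32_t rel = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xFF);          /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
    return 0;
}

/* Step (1): overwrite the head of (a copy of) the original function. */
static int write_jump(uint8_t *func_head, size_t head_len,
                      uint64_t orig_func_vaddr, uint64_t patch_func_vaddr)
{
    uint8_t jmp[JMP_REL32_LEN];

    if (head_len < JMP_REL32_LEN ||
        encode_jmp(jmp, orig_func_vaddr, patch_func_vaddr) != 0)
        return -1;
    memcpy(func_head, jmp, JMP_REL32_LEN);
    return 0;
}
```

In a real system the write would target kernel text from the VMM side; here the buffer only demonstrates the byte layout.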
Live Update to Code with Data Changes
Live update to code with data changes:
(1) The Update Server replaces the beginning of the original function with a jump instruction to the patch function address and write-protects the related instances of the data structures.
(2) The OS executes the jump instruction in the original function and jumps to the patch function.
(3) The OS triggers an interrupt when instances of the data structures are updated, and the Update Server executes the state transfer function.
[Figure: (1) "jmp patch_func_vaddr" is written at orig_func_vaddr, the start of the original function code; (2) execution jumps to the patch function code at patch_func_vaddr; (3) updating instances of the data structures raises an interrupt that runs the state transfer function at state_transfer_func_vaddr. Layers shown: Xen, Linux.]
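Step (3) can be sketched in user space as follows. This is only an illustration under stated assumptions: the two struct layouts, the field names, and the write_fault_* helpers are hypothetical, and an ordinary function call stands in for the interrupt raised by a write to a write-protected instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical old and new layouts of one kernel data structure. */
struct item_v1 { int32_t count; };
struct item_v2 { int64_t count; int32_t flags; };

enum writer { WRITER_OLD_CODE, WRITER_NEW_CODE };

/* One instance tracked during the update: both versions co-exist. */
struct tracked_item {
    struct item_v1 old_ver;
    struct item_v2 new_ver;
};

/* State transfer function: propagate the version that was just written
 * to the other version so that the two stay coherent. */
static void state_transfer(struct tracked_item *t, enum writer who)
{
    assert(t != NULL);
    if (who == WRITER_OLD_CODE)
        t->new_ver.count = t->old_ver.count;           /* widen; flags untouched */
    else
        t->old_ver.count = (int32_t)t->new_ver.count;  /* narrow; assumes it fits */
}

/* Stand-ins for the trapped write: perform the write, then transfer state. */
static void write_fault_old(struct tracked_item *t, int32_t value)
{
    t->old_ver.count = value;
    state_transfer(t, WRITER_OLD_CODE);
}

static void write_fault_new(struct tracked_item *t, int64_t value)
{
    t->new_ver.count = value;
    state_transfer(t, WRITER_NEW_CODE);
}
```

Whichever code version writes, the other version observes the same logical value afterwards, which is the coherence property the slides attribute to the VMM.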
Termination of a Live Update


A live update terminates when all threads have left the original functions.
Stack inspection (Altekar, USENIX Security '05):
  Maintain a list of threads executing in original functions
  Remove threads that leave original functions
  Terminate the live update when the list is empty
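The bookkeeping described above might look like the following sketch. The names (code_range, update_tracker, and the helper functions) are hypothetical, thread stacks are modelled as plain arrays of return addresses, and the fixed-size list is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address range [start, end) of one original function. */
struct code_range { uintptr_t start, end; };

/* Stack inspection: true if any saved return address lies in the range,
 * meaning the thread is still executing inside the original function. */
static bool stack_in_range(const uintptr_t *ret_addrs, size_t n,
                           struct code_range r)
{
    for (size_t i = 0; i < n; i++)
        if (ret_addrs[i] >= r.start && ret_addrs[i] < r.end)
            return true;
    return false;
}

#define MAX_TRACKED 64

/* List of threads still executing in original functions. */
struct update_tracker { int tids[MAX_TRACKED]; size_t count; };

static bool tracker_add(struct update_tracker *t, int tid)
{
    assert(t != NULL);
    if (t->count == MAX_TRACKED)
        return false;
    t->tids[t->count++] = tid;
    return true;
}

/* Remove a thread that has left the original functions. */
static void tracker_remove(struct update_tracker *t, int tid)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->tids[i] == tid) {
            t->count--;
            t->tids[i] = t->tids[t->count];   /* swap-remove */
            return;
        }
    }
}

/* The live update can terminate once the list is empty. */
static bool update_finished(const struct update_tracker *t)
{
    return t->count == 0;
}
```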
Patches for Recovering Tainted State

Vision:
  Some bugs could cause a tainted state:
    Deadlock situation
  Simple patching could not solve the problem

spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
void foo(void){
    ...;
    spin_lock(&demo_lock);
    ...;
    if(condition){return;}
    ...;
    spin_unlock(&demo_lock);
}
Code 1: a buggy function with a potential for deadlocks.

spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;
void foo_patch(void){
    ...;
    spin_lock(&demo_lock);
    ...;
    if(condition){
        spin_unlock(&demo_lock);
        return;
    }
    ...;
    spin_unlock(&demo_lock);
}
Code 2: a patch function to fix the deadlock problem.

void state_transfer(void){
    if(spin_is_locked(&demo_lock))
        spin_unlock(&demo_lock);
}
Code 3: a callback function to recover from a deadlocked situation.
Patches for Recovering Tainted State

Solutions:
  Allow callbacks in live update
  Three types of callbacks in LUCOS:
    function callbacks
    thread callbacks
    data callbacks
  Example: use thread callbacks to resolve the deadlock situation
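One way to picture the three callback types is a table of function pointers attached to a patch. This is a sketch under assumptions: the struct patch_callbacks layout and run_callbacks are hypothetical, the slides do not say when each callback fires (invoking all of them together is a simplification), and an int stands in for the spinlock so the example runs in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for demo_lock: 1 = held, 0 = free. */
static int demo_lock;

/* The three callback kinds named on the slide; any may be NULL. */
struct patch_callbacks {
    void (*function_cb)(void);
    void (*thread_cb)(void);
    void (*data_cb)(void *instance);
};

/* Thread callback mirroring Code 3: release the lock if the buggy
 * function returned while still holding it. */
static void release_leaked_lock(void)
{
    if (demo_lock)
        demo_lock = 0;
}

/* Invoke whichever callbacks the patch supplies. */
static void run_callbacks(const struct patch_callbacks *cb, void *instance)
{
    assert(cb != NULL);
    if (cb->function_cb)
        cb->function_cb();
    if (cb->thread_cb)
        cb->thread_cb();
    if (cb->data_cb)
        cb->data_cb(instance);
}
```

A patch for the deadlock example would then supply only the thread callback, for instance `struct patch_callbacks cb = { NULL, release_leaked_lock, NULL };`.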
Patch Rollback

A special type of patch:
  Uses the original code and data to patch the committed ones
  Changes state held in the new data back to the original data
Resource overhead:
  The original code and data have to be kept in memory
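The memory cost of rollback can be made concrete with a small sketch of the code half of the problem. It assumes the patch overwrote a 5-byte function head (an assumption, not stated in the slides); struct applied_patch and both helpers are hypothetical names, and transferring data state back is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAD_LEN 5   /* assumed size of the overwritten function head */

/* Record kept in memory for as long as rollback must remain possible. */
struct applied_patch {
    uint8_t *target;            /* where the head was overwritten */
    uint8_t  saved[HEAD_LEN];   /* original bytes, kept for rollback */
    int      active;
};

/* Apply: save the original head, then overwrite it. */
static void patch_apply(struct applied_patch *p, uint8_t *target,
                        const uint8_t new_head[HEAD_LEN])
{
    assert(p != NULL && target != NULL);
    p->target = target;
    memcpy(p->saved, target, HEAD_LEN);
    memcpy(target, new_head, HEAD_LEN);
    p->active = 1;
}

/* Roll back: patch the committed code with the saved original bytes.
 * Returns -1 if there is nothing to roll back. */
static int patch_rollback(struct applied_patch *p)
{
    if (!p->active)
        return -1;
    memcpy(p->target, p->saved, HEAD_LEN);
    p->active = 0;
    return 0;
}
```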
Experimental Setup

Implemented on Linux 2.6.10 running on Xen 2.0.5.
System:
  Fedora Core 2 distribution
  3.0 GHz Pentium IV with 1 GB RAM
  Intel Pro 100/1000 Ethernet NIC in a 100 Mb/s LAN
  A single 250 GB 7200 RPM SATA disk
Workloads

SPEC INT 2000:
  Measures the performance of CPU-intensive workloads
Linux build time:
  Measures the overall time to build a Linux kernel 2.6.10 with gcc-3.3.3
Open Source Database Benchmark suite (OSDB):
  Information Retrieval (IR)
  Online Transaction Processing (OLTP)
Experience with Real-Life Patches

Five typical patches selected from Linux upgrades:
  Upgrade of the Linux kernel from 2.6.10 to 2.6.11
  Upgrade of backend block device drivers in Xen-Linux

No.  Patch type  Description
1    Type 1      Fixing the page reading bug
2    Type 1      Removal of livelock avoidance
3    Type 2      Upgrading the process scheduler
4    Type 2      Reconstruction of the IRQ descriptors
5    Type 2      Upgrading backend block device drivers in Xen-Linux
Time to Apply and Roll Back Live Updates
Note: OSDB-IR/OLTP are running in the background when the patches are applied and rolled back.
Relative Performance (Normal Execution)
Conclusions

Existing operating systems can be live updated:
  No safe point is required
  Patches should recover tainted state
  Rollback of a live update is supported
  Time overhead to apply a live update is minimal
  Performance overhead is negligible
Future Work

Avoid the performance overhead of virtualization:
  Integrate LUCOS with our self-virtualization system
  Virtualize operating systems on demand
Questions?

Our contact information:
  Parallel Processing Institute, Fudan University, China
  Phone: +86-21-51355363
  Fax: +86-21-65646571
Patch File Format in LUCOS

Follows the format of Linux kernel modules, and adds:
  New declarations of data structures
  *Callback functions
  *Patch startup and patch cleanup functions
  *State transfer
Fine-grained memory protection

Using ECC memory (Qin et al., HPCA '05):
  cache line granularity
Mondrian memory protection (Witchel et al., ASPLOS-X):
  word level memory protection
Self-virtualization: architecture
• The OS can quickly switch between the three modes on-the-fly
• Applications are completely unaware of the mode switch
• Hosting mode is used to host other OSes.
• Migrating mode prepares the OS to self-migrate to another machine.