Dynamic and Adaptive Updates of Non-Quiescent

Download Report

Transcript Dynamic and Adaptive Updates of Non-Quiescent

Runtime Mutation of Commodity
Operating System Kernels
or
Please, No More Rebooting!
Kyung Dong Ryu <[email protected]>
IBM T.J. Watson Research Center
Kristis Makris <[email protected]>
Arizona State University
1
Oct 26, 2008
DynAMOS -- KOCSEA '08
IBM Research Worldwide
2700 Researchers in eight labs around the world
2
Oct 26, 2008
DynAMOS -- KOCSEA '08
Culture of Innovation
External Recognition
Scanning Tunneling Microscope
High
Temperature
Superconductivity
Copper Chip
Technology
Silicon-onInsulator
SiGe
DRAM
Nuclear Magnetic
Resonance
Techniques
Electron
Tunneling
Effect
5 Nobel Laureates
8 National Medals of
Technology
Basis for
MRI today
First woman
recipient in the
history of this
prestigious
ACM award
High
Performance
Computing
5 National Medals of Science
6 Turing Awards
59 Members in National
Academy of Engineering
10 Inductees in National
Inventors Hall of Fame
• AAAS • ECS
• ACM
• IEEE
• ACS
• IOP
• APS
• OSA
• AVS
21 Members in National
Academy of Sciences
3
More than 300 Professional
Society Fellows
Oct 26, 2008
DynAMOS -- KOCSEA '08
IBM Patent Leadership – 2006
4,000
3,621
3,500
3,000
2,451
2,500
2,366
2,228
2,110
1,959
2,000
1,771
1,731
1,671
Hitachi
Toshiba
1,610
1,500
1,000
500
0
IBM
4
Samsung
Canon Matsushita
HPQ
Oct 26, 2008
Intel
Sony
Micron
DynAMOS -- KOCSEA '08
Overview







5
Motivation
Dynamic Kernel Updates Categorization
System Architecture
Adaptive Function Cloning
Synchronized Updates
Applications
Conclusion
Oct 26, 2008
DynAMOS -- KOCSEA '08
Motivation



Dynamic kernel updates are essential
Existing updating methods are inadequate
Two approaches
–
Build adaptable OS


–
Dynamic code instrumentation



6
Specially crafted (K42, VINO, Synthetix)
Require OS and application restructuring
No kernel source modification (KernInst, GILK)
Basic block code interposition
Currently limited
– No procedure replacement
– No autonomous kernel adaptability
– No safe, complete subsystem update guarantees
Oct 26, 2008
DynAMOS -- KOCSEA '08
Dynamic Updates Categorization (1)

Updating variable values
–
–
Update an entry in system call table
Update owner (uid) of an inode

–
Count number of system calls of a process


Needs synchronized update
Needs state tracking
Updating datatypes
–
Add new fields in Linux PCB for process checkpointing


Update all functions that use the old datatype, or
Maintain new fields in separate data structure
–
7
Does not need state transfer
Oct 26, 2008
DynAMOS -- KOCSEA '08
Dynamic Updates Categorization (2)

Updating single function
–

Correct a defect
Updating kernel threads
–
Update memory paging subsystem


Updating function groups
–
Update pipefs subsystem

8
Needs update during infinite loop
Needs synchronized update
Oct 26, 2008
DynAMOS -- KOCSEA '08
Our Approach

DynAMOS
–

Dynamic code instrumentation
–
–

–
–
Concurrent execution of multiple versions
State tracking
Autonomous kernel adaptability
Safe updates of complete subsystems
–
–
–
–
9
No kernel source modification or reboot
Procedure replacement
Adaptive updates
–

Prototype for i386 Linux 2.2-2.6
Quiescence detection
Update synchronization (non-quiescent subsystems)
Datatype updates
State transfer
Oct 26, 2008
DynAMOS -- KOCSEA '08
DynAMOS System Architecture
object
file
make
insert
module
ld
gcc
vmlinux
update
source
10
Unmodified kernel
in memory
new
function
images
original
function
images
kernel
source
Oct 26, 2008
DynAMOS -- KOCSEA '08
DynAMOS System Architecture
Unmodified kernel
in memory
new
function
images
original
function
images
load DynAMOS
11
Oct 26, 2008
DynAMOS
kernel module
DynAMOS -- KOCSEA '08
DynAMOS System Architecture
Unmodified kernel
in memory
new
function
images
original
function
images
initiate update
Update tool
/dev/dynamos
DynAMOS
kernel module
version manager
12
Oct 26, 2008
DynAMOS -- KOCSEA '08
DynAMOS System Architecture
Unmodified kernel
in memory
new
function
images
prepare update
cloned new
function
images
original
function
images
image relocation
Update tool
/dev/dynamos
copy
DynAMOS
kernel module
disassembler
version manager
13
Oct 26, 2008
DynAMOS -- KOCSEA '08
DynAMOS System Architecture
Unmodified kernel
in memory
new
function
images
cloned new
function
images
Update tool
cloned new
function
images
/dev/dynamos
original
function
images
DynAMOS
kernel module
version manager
14
Oct 26, 2008
DynAMOS -- KOCSEA '08
DynAMOS System Architecture
Unmodified kernel
in memory
new
function
images
cloned new
function
images
original
function
images
redirection
activate update
Update tool
/dev/dynamos
DynAMOS
kernel module
version manager
15
Oct 26, 2008
DynAMOS -- KOCSEA '08
Execution Flow Redirection
caller
...
call schedule
...

Apply Linger-Longer scheduler
–
–
Unobtrusive fine-grain cycle stealing
Implemented in schedule_LL as a
scheduling policy
schedule
16
step 1
Oct 26, 2008
DynAMOS -- KOCSEA '08
Execution Flow Redirection
caller
redirection handler
...
call schedule
...
schedule
trampoline
jmp *


Trampoline installation
– Disable processor interrupts
– Flush I-cache
Indirect jump
–
17
Don’t modify page permissions
step 2
Oct 26, 2008
DynAMOS -- KOCSEA '08
Execution Flow Redirection
caller
redirection handler
...
call schedule
...
preserve state
perform bookkeeping
execute adaptation handler
restore state
call
schedule
trampoline

adaptation handler

ret
18
Bookkeeping
– Maintain use counters
User-defined adaptation handler
– Execute if available
– Select active version of function
step 2
Oct 26, 2008
DynAMOS -- KOCSEA '08
Execution Flow Redirection
caller
redirection handler
...
call schedule
...
jump to active function
schedule
trampoline
jmp *
adaptation handler
19
schedule_clone
schedule_LL_clone
step 3
Oct 26, 2008
DynAMOS -- KOCSEA '08
Execution Flow Redirection
caller
redirection handler
...
call schedule
...
jump to active function
schedule
trampoline
jmp *
adaptation handler
20
schedule_clone
schedule_LL_clone
jump back
jump back
step 4
Oct 26, 2008
DynAMOS -- KOCSEA '08
Execution Flow Redirection
caller
redirection handler
...
call schedule
...
schedule
trampoline
ret
adaptation handler
21
jump to active function
preserve state
perform bookkeeping
restore state
return to caller
schedule_clone
schedule_LL_clone
jump back
jump back
step 5
Oct 26, 2008
DynAMOS -- KOCSEA '08
Adaptive Function Cloning Benefits

No processor state saved on stack
–

Autonomous kernel determination of update timeliness
–

Using adaptation handler
Function-level updates
–
–
22
Function arguments accessed directly
Basic blocks can be bypassed (no control-flow graph
needed)
Function modifications developed in original source
language
Oct 26, 2008
DynAMOS -- KOCSEA '08
Function Relocation Issues

Replace ret (1-byte) with jmp * (6-byte) back to handler
– Adjust inbound (jmp) and outbound (call) relative offsets

Safely detect
– Backward branches: jmp to code overwritten by trampoline
– Outbound branches: jmp to code outside function image
– Indirect outbound branches: jmp * from indirection table
– Data-in-code
 Need user verification
– Multiple entry-points: e.g. produced by Intel C Compiler
23
Oct 26, 2008
DynAMOS -- KOCSEA '08
Overhead


Small memory footprint (42k)
Indirect addressing (jmp *) hurts branch prediction
–
–
–
24
Can use direct addressing (jmp)
Overhead not correlated to path length
Mostly 1-8%
Oct 26, 2008
DynAMOS -- KOCSEA '08
Quiescence Detection

Needed to
–
Atomically update function groups

–

Safely reverse updates
Implemented by
–
Usage counters

–
On entry and exit
Stack walk-through

For non-returning calls (do_exit in Linux; no ret instruction)

Examine stack and program counter of all processes
Default kernel compilation (works without frame pointers)

25
e.g. Count number of processes using a filesystem
Oct 26, 2008
DynAMOS -- KOCSEA '08
Non-quiescent Subsystems
reader and writer are
synchronized with each other
wait for
new data
in buffer
pipe_read()
{
...
acquire Sem
while (buffer_empty) {
...
release Sem
L1:
sleep
acquire Sem
}
read from data buffer
release Sem
return
}
pipe_write()
{
...
acquire Sem
while (buffer_full) { wait for
...
more room
release Sem
in buffer
L2:
sleep
acquire Sem
}
write in data buffer
release Sem
return
}
Adaptively enlarge pipefs 4k copy buffer
during large data transfers
26
Oct 26, 2008
DynAMOS -- KOCSEA '08
Non-quiescent Subsystems
non-quiescent; sleeping
pipe_read()
{
...
acquire Sem
while (buffer_empty) {
...
release Sem
L1:
sleep
acquire Sem
}
read from data buffer
release Sem
return
}
quiescent
pipe_write()
{
...
acquire Sem
while (buffer_full) {
...
release Sem
L2:
sleep
acquire Sem
}
write in data buffer
release Sem
return
}
subsystem may never quiesce
cannot update atomically
27
Oct 26, 2008
DynAMOS -- KOCSEA '08
Synchronized update of pipefs
pipe_read() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 4k_buffer
release Sem
return
}
pipe_read_v3() {
acquire Sem
while (1mb_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 1mb_buffer
release Sem
return
}
Phase 1
28
Oct 26, 2008
DynAMOS -- KOCSEA '08
Synchronized update of pipefs
pipe_read() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 4k_buffer
release Sem
return
}
pipe_read_v2() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
if (must_update) {
phase = 3
STATE TRANSFER
goto new
}
}
read data from 4k_buffer
release Sem
return
pipe_read_v3() {
acquire Sem
while (1mb_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 1mb_buffer
release Sem
return
}
Semantically
equivalent version at
source code level
Wait for pipe_read to
become inactive
new:
Phase 2
29
}
Oct 26, 2008
DynAMOS -- KOCSEA '08
Synchronized update of pipefs
pipe_read() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 4k_buffer
release Sem
return
}
pipe_read_v2() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
if (must_update) {
phase = 3
STATE TRANSFER
goto new
}
}
read data from 4k_buffer
release Sem
return
while (1mb_buffer_empty) {
release Sem
sleep
acquire Sem
new:
}
read data from 1mb_buffer
release Sem
return
Phase 2
30
}
Oct 26, 2008
pipe_read_v3() {
acquire Sem
while (1mb_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 1mb_buffer
release Sem
return
}
Inline updated version
DynAMOS -- KOCSEA '08
Synchronized update of pipefs
pipe_read() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 4k_buffer
release Sem
return
}
pipe_read_v2() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
if (must_update) {
phase = 3
STATE TRANSFER
goto new
}
}
read data from 4k_buffer
release Sem
return
while (1mb_buffer_empty) {
release Sem
sleep
acquire Sem
new:
}
read data from 1mb_buffer
release Sem
return
Phase 3
31
pipe_read_v3() {
acquire Sem
while (1mb_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 1mb_buffer
release Sem
return
}
}
Oct 26, 2008
DynAMOS -- KOCSEA '08
Synchronized update of pipefs
Adaptive update
pipe_read_v2() {
acquire Sem
while (4k_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
if (must_update) {
phase = 3
STATE TRANSFER
goto new
}
pipe_read_adaptation_handler() {
}
if (phase == 3)
read data from 4k_buffer
activate pipe_read_v3
release Sem
else
return
activate pipe_read_v2
while (1mb_buffer_empty) {
if (this process read
release Sem
more than 64k)
sleep
must_update = 1
acquire Sem
}
new:
}
read data from 1mb_buffer
release Sem
return
}
30-90% improvement
in Linux 2.6
3.2% overhead when
not adapting
pipe_read_v3() {
acquire Sem
while (1mb_buffer_empty) {
release Sem
L1:
sleep
acquire Sem
}
read data from 1mb_buffer
release Sem
return
}
Sleep in original version
Awake in new version
Multi-phase approach
Phase 3
32
Oct 26, 2008
DynAMOS -- KOCSEA '08
Adaptive Memory Paging For Efficient
Gang Scheduling

Kernel thread update (kswapd), Linux 2.2
–
–
–
Infinite loop
Awaken by other subsystems
Goes back to sleep
 e.g.

To update
–
Activate interruptible_sleep_on_v2


33
calls interruptible_sleep_on in Linux
Save state, exit
Start new version of kernel thread, restore state
Oct 26, 2008
DynAMOS -- KOCSEA '08
Kernel-Assisted Process Checkpointing

Datatype update for EPCKPT in Linux 2.4
– Compact datatypes in commodity kernel. No extra room



Shadow data structures
– Instantiation (do_fork, sys_open): map memory address of
original variable to shadow using hash table
– Removal (do_exit, fput): free shadow too
– Already instantiated variables

–
Shadow missing: idempotent use of new fields
Update only functions that use new fields

34
struct task_struct: semaphores, pipes, memory mapped
files
struct file: checkpoint filename
No state transfer needed
Oct 26, 2008
DynAMOS -- KOCSEA '08
Related Work

K42
–
–

Ginseng
–

User-level software updates; requires recompilation
KernInst, GILK, Detours, ATOM, EEL
–
–
35
Specially designed with hot-swappable capabilities
Guarantees quiescence
Do not facilitate adaptive execution
Do not safely replace complete subsystems
Oct 26, 2008
DynAMOS -- KOCSEA '08
On-going and Future Work

Automatically produce updates given a patch
–
–
–

Multiprocessor support
–

Safely install trampoline: freeze other processors using
single-byte trap instruction (ud2)
Kernel module port
–
36
Apply MOSIX, Superpages: parallel applications
Apply Nooks: OS reliability
Upgrade Linux kernel
FreeBSD, OpenSolaris
Oct 26, 2008
DynAMOS -- KOCSEA '08
Conclusion

Dynamic Kernel Updates
–
–

Adaptive function cloning
–


37
Scheduler, kernel threads, synchronized updates
Datatype updates
Demonstrated updates
–

Concurrent execution of multiple function versions
Safe updates of non-quiescent subsystems
–

Dynamic code instrumentation
Commodity operating system (prototype for i386 Linux 2.2-2.6)
Synchronized pipefs adaptation, process checkpointing, adaptive
memory paging for efficient gang-scheduling, unobtrusive finegrain cycle stealing, public security fixes
Small memory footprint (42k), 1-8% overhead
Oct 26, 2008
DynAMOS -- KOCSEA '08
Questions ?
38
Oct 26, 2008
DynAMOS -- KOCSEA '08