ECI, July 2005


Process Migration
Checkpoint/Restart
Process Migration

Process migration benefits:




Tool for load balancing
Data access locality
Improved system administration
Mobile computing
Process Migration Issues




Execution model: home, remote
Migrating virtual memory
Minimizing downtime
Cost of migration



Run time cost (home, remote)
Migration operation
Limitations of migration
Checkpoint / Restart

Checkpoint/restart benefits:







Like migration plus …
Fault resilience
Fault recovery
High availability
Gang scheduling
Debugging, testing, developing
Security (honey-pot)
Checkpoint/restart goals


Transparency
Support parallel programs





Multi-process
Multi-node
Security
Minimize required state
Minimize required storage
CKPT: Application Level

Application level






Efficient
Non-preemptive
Lack of common API
Source code changes
Possible compiler support
Examples ?
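As one possible illustration of application-level checkpointing, the sketch below shows a loop that saves its own state at a point it knows to be consistent and resumes from that state when restarted. The function name `run_sum`, the JSON file format, and the `stop_after` parameter (used only to simulate a failure) are assumptions made for this example, not part of any common API.

```python
import json
import os
import tempfile

def run_sum(n, ckpt_path, stop_after=None):
    """Sum 0..n-1, saving state at a safe point after every iteration."""
    # Resume from an earlier checkpoint if one exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"i": 0, "total": 0}

    steps = 0
    while state["i"] < n:
        state["total"] += state["i"]
        state["i"] += 1
        # Non-preemptive: the application decides when its state is consistent.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)  # atomic rename: never leaves a torn file
        steps += 1
        if stop_after is not None and steps >= stop_after:
            return None  # simulate a failure part-way through
    return state["total"]

ckpt = os.path.join(tempfile.mkdtemp(), "sum.ckpt")
first = run_sum(10, ckpt, stop_after=4)   # "fails" after 4 iterations
second = run_sum(10, ckpt)                # restarts from the saved state
```

Only the two values the loop needs are saved, which is why this approach is efficient, and the save and restore logic lives in the source code, which is the cost.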
CKPT: Library Level

Library level





Typically use a signal handler (callback)
Common API
Restricts functionality (e.g., no IPC)
Relatively portable
Examples…
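A minimal sketch of the signal-handler (callback) approach, assuming a POSIX system where `SIGUSR1` is available. `CheckpointLib`, `register`, and `restore` are hypothetical names invented for this example and do not correspond to any real library. The application only registers its state; the moment of the checkpoint is chosen from outside by sending the signal.

```python
import os
import pickle
import signal
import tempfile

class CheckpointLib:
    """Hypothetical checkpoint library; not the API of any real package."""

    def __init__(self, path, signum=signal.SIGUSR1):
        self.path = path
        self.state = None
        signal.signal(signum, self._on_signal)  # install the callback

    def register(self, state):
        """Common API: the application hands over the object to save."""
        self.state = state

    def _on_signal(self, signum, frame):
        # Invoked asynchronously; the application does not choose the moment.
        with open(self.path, "wb") as f:
            pickle.dump(self.state, f)

def restore(path):
    """Load the state written by the most recent checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "app.ckpt")
lib = CheckpointLib(path)
state = {"i": 0}
lib.register(state)
for _ in range(100):
    state["i"] += 1
    if state["i"] == 3:
        os.kill(os.getpid(), signal.SIGUSR1)  # stand-in for an external request
saved = restore(path)
```

The sketch saves only a registered Python object. State held outside it, such as open pipes or sockets, is not captured, which mirrors the restricted functionality noted above.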
CKPT: Library (contd)

Libckpt
  Memory exclusion, incremental, forked
  Modify source code, link statically
Condor
  Support memory mapping, shared libraries
  Relink to special library (needs object file)
Score, co-check
  Parallel applications
  Modify communication layer
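The idea behind incremental checkpointing and memory exclusion can be sketched as follows. This is an illustration only: it detects changed pages by comparing hashes, which is a stand-in chosen for simplicity and is not claimed to be how Libckpt detects modified pages. The page size and all function names are assumptions.

```python
import hashlib

PAGE = 4096  # assumed page size

def pages(mem):
    """Split a memory image into page-sized chunks."""
    return [bytes(mem[i:i + PAGE]) for i in range(0, len(mem), PAGE)]

def incremental_checkpoint(mem, prev_hashes, exclude=()):
    """Return (delta, hashes); delta holds only pages changed since prev_hashes."""
    delta, hashes = {}, {}
    for n, page in enumerate(pages(mem)):
        if n in exclude:          # memory exclusion: caller says page is not needed
            continue
        h = hashlib.sha256(page).digest()
        hashes[n] = h
        if prev_hashes.get(n) != h:
            delta[n] = page
    return delta, hashes

def apply_delta(base, delta):
    """Rebuild an image by laying saved pages over a base image."""
    out = bytearray(base)
    for n, page in delta.items():
        out[n * PAGE:n * PAGE + len(page)] = page
    return out

mem = bytearray(4 * PAGE)
full, hashes = incremental_checkpoint(mem, {})       # first checkpoint: all pages
mem[PAGE + 10] = 0xFF                                 # dirty page 1 only
delta, hashes = incremental_checkpoint(mem, hashes)  # second: changed pages only
restored = apply_delta(apply_delta(bytearray(4 * PAGE), full), delta)
```

Restoring requires the full checkpoint plus every later delta, which is the storage and restart-time price paid for the cheaper checkpoints.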
Implementation (contd)

Kernel level






Loadable kernel module vs. change kernel
Preemptive / cooperative
Access to entire process state
Complex, less portable
Examples: Sprite, Zap
Virtual machines

(soon)
Multi-process Checkpoint

Global state


A set of states from all processes
Consistent global state


If the state of A reflects a message
received from B, then the state of B
reflects sending it
If the state of A reflects a message sent
to B but not yet received, the message must
be part of the channel state
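The two conditions above can be checked mechanically. The sketch below assumes each process numbers its local events from 0 and that a global state is described by how many events of each process it includes. The `Message` record and both function names are introduced only for this example.

```python
from typing import NamedTuple

class Message(NamedTuple):
    sender: str
    send_seq: int    # index of the send event in the sender's local history
    receiver: str
    recv_seq: int    # index of the receive event in the receiver's local history

def is_consistent(cut, messages):
    """cut[p] = number of local events of p included in the global state."""
    for m in messages:
        received = m.recv_seq < cut[m.receiver]
        sent = m.send_seq < cut[m.sender]
        if received and not sent:
            return False   # orphan message: received but never sent
    return True

def channel_state(cut, messages):
    """Messages sent inside the cut but not yet received: still in transit."""
    return [m for m in messages
            if m.send_seq < cut[m.sender] and not m.recv_seq < cut[m.receiver]]

msgs = [Message("B", 0, "A", 1), Message("A", 2, "B", 3)]
good = {"A": 3, "B": 2}   # A has received B's message; A's own message is in flight
bad = {"A": 2, "B": 0}    # A has received a message that B has not yet sent
```

For `good`, the second message belongs to the channel state. For `bad`, no channel state can repair the cut, because the receive is recorded without its send.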
Consistent Global State
Multi-process Checkpoint

Uncoordinated checkpoint



Inspect data to find recovery line
Processes are independent, efficient
Domino effect, much storage
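Finding the recovery line, and the domino effect it can trigger, can be sketched with a small rollback-propagation loop. The data layout is an assumption made for this example: each process has a sorted list of local checkpoint times that always includes 0 (the initial state), event times are positive, and a checkpoint taken at time t covers all local events with time at most t.

```python
def recovery_line(checkpoints, messages):
    """checkpoints[p]: sorted local checkpoint times of p (0 = initial state).
    messages: (sender, send_time, receiver, recv_time) tuples.
    Returns the latest consistent set of checkpoints, one per process."""
    line = {p: times[-1] for p, times in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, s_time, receiver, r_time in messages:
            # Orphan: the receive is recorded, but the send happened after
            # the sender's chosen checkpoint.
            if r_time <= line[receiver] and s_time > line[sender]:
                # Roll the receiver back to its last checkpoint before the receive.
                line[receiver] = max(t for t in checkpoints[receiver] if t < r_time)
                changed = True
    return line

checkpoints = {"P": [0, 3], "Q": [0, 2, 5]}
messages = [
    ("P", 1, "Q", 1),   # before Q's checkpoint at 2
    ("Q", 3, "P", 2),   # before P's checkpoint at 3
    ("P", 4, "Q", 4),   # before Q's checkpoint at 5
]
line = recovery_line(checkpoints, messages)
```

In this example each rollback exposes a new orphan message, so both processes fall back to their initial state even though three later checkpoints were stored. That is the domino effect, and it is why every checkpoint has to be kept.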
Multi-process Checkpoint

Coordinated checkpoint


Centrally managed
Blocking



All processes suspended
Flush communication channels
Non-blocking

Delay in triggers may yield inconsistency
Multi-process Checkpoint

Communication-induced



Piggyback process checkpoint status and
requests on messages
May require enforcing global checkpoint
Unpredictable checkpoint times
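One simple index-based scheme, chosen here only for illustration, shows how piggybacked status can force a checkpoint: each process tags outgoing messages with the sequence number of its latest checkpoint, and a receiver that is behind takes a checkpoint before delivering the message. The `Process` class and its fields are hypothetical.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.index = 0              # sequence number of the latest local checkpoint
        self.state = 0
        self.checkpoints = {0: 0}   # checkpoint index -> saved state

    def checkpoint(self, index=None):
        """Take a local checkpoint, or a forced one with the given index."""
        self.index = self.index + 1 if index is None else index
        self.checkpoints[self.index] = self.state

    def send(self, payload):
        self.state += 1
        # Piggyback the checkpoint status on the application message.
        return {"payload": payload, "ckpt_index": self.index}

    def receive(self, msg):
        # Forced checkpoint: taken before delivery if the sender is ahead.
        if msg["ckpt_index"] > self.index:
            self.checkpoint(msg["ckpt_index"])
        self.state += msg["payload"]

p, q = Process("p"), Process("q")
p.checkpoint()            # p decides locally to checkpoint (index 1)
msg = p.send(5)
q.receive(msg)            # q is behind, so it checkpoints before delivery
```

Here `q` did not plan a checkpoint at that moment; the incoming message imposed it, which is the source of the unpredictable checkpoint times.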
Multi-process Checkpoint

Summary:

                        Uncoordinated   Coordinated   Communication-induced
Domino effect           Possible        No            No
Management overhead     None            More          Less
Decision making         Local           Central       Local/central
Checkpoint data stored  All             Latest only   Several
Virtual Machines
“Any problem in computer science can be
solved by another layer of indirection”
What is a Virtual Machine?




An indirection layer below the execution
environment seen by applications and OS
Decouple architecture and user perceived
behavior of SW and HW resources from
their physical implementation
Provide a uniform view of the underlying
resources
Multiplex multiple virtual systems on a
single (physical) resource
VM History

1960’s – Hypervisors (mainframes)
  Time-share expensive hardware
  No change to legacy software
1980-90’s – Obsolete
  Proliferation of cheap hardware
  Hardware support neglected
Later 1990’s – Reincarnation
  For complex MPP lacking OS infrastructure
2000 - Today: Renaissance
  Consolidation, isolation, reliability
VM Benefits

Performance





Server consolidation
Efficient HW utilization
Adaptive resource balancing
Checkpoint/restart and migration
Security



Simple (reduced complexity)
Encapsulation and isolation
Mediation
VM benefits (contd)

Reliability




Redundancy through replication
Disaster recovery
Deployment testing
And…




Quality of service
Transparent (for legacy SW)
Enhanced interoperability
Development & testing
Server utilization
Cumulative usage of 28 servers:
Memory


45% of RAM not used 99.9% of time
25% of RAM never used concurrently
CPU


85% of CPU not used 99.9% of time
81% of CPU never used concurrently
Disk

68% of storage space never used
Virtualization levels

HOST entity: encapsulates the guest
GUEST entity: managed by the host

Layers, top to bottom, with the interface between each pair (figure):
Application programs
  API
Libraries
  ABI
Operating system
  ISA
Hardware
Process & System VM

Process VM (figure, top to bottom): Application / VMM / OS / Hardware; the application sees a process virtual machine
System VM (figure, top to bottom): Application / OS / VMM / Hardware; the application and OS see a virtual machine
VM at different levels

HW level
  VMware, Xen, Denali, Virtual PC, UML
OS level
  Virtual Servers, BSD Jail, Zap
Programming language level
  Java, .NET
Network
  VLAN, VPN
VM Taxonomy

Process VM - virtual platform that
exists solely to support the process



Unix
Emulators (interpreters)
Dynamic binary translators


Optimize by block translation and caching
Java – “compile once run everywhere”


Intermediate machine code
Optimize by native compilation on-the-fly
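Block translation with caching, as used by dynamic binary translators, can be sketched on a toy instruction set. Everything here is an assumption for illustration: the four instructions (`add`, `dec`, `jnz`, `halt`), the register names, and the choice to "translate" each block into generated Python source. A block is translated the first time its entry address is reached and reused from the cache afterwards.

```python
def translate_block(program, pc):
    """Translate the straight-line run starting at pc into one Python function.
    The function updates the registers and returns the next pc (None = halt)."""
    lines = ["def block(regs):"]
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "add":
            lines.append(f"    regs['acc'] += {arg}")
        elif op == "dec":
            lines.append("    regs['n'] -= 1")
        elif op == "jnz":   # a branch ends the block
            lines.append(f"    return {arg} if regs['n'] != 0 else {pc}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
    namespace = {}
    exec("\n".join(lines), namespace)   # compile the generated code once
    return namespace["block"]

def run(program, regs):
    cache = {}           # block entry pc -> translated function
    translations = 0
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            translations += 1
        pc = cache[pc](regs)
    return translations

program = [("add", 5), ("dec", None), ("jnz", 0), ("halt", None)]
regs = {"acc": 0, "n": 3}
translations = run(program, regs)
```

The loop body executes three times but is translated only once, which is where the caching pays off.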
VM Taxonomy (contd)

System VM - complete persistent
system environment providing access to
virtual hardware


Classic - bare HW
Hosted VM



Easy install and maintenance
Leverage native services of underlying OS
Multiprocessor virtualization
Hardware Virtualization

Challenges to build virtual machines

Performance isolation






Scheduling priority
Memory demand
Network traffic
Disk Access
Support for various OS platforms
Small performance overhead
Lack of Hardware Support


Ring aliasing
Non-faulting access to privileged state


Address space compression



Where does the VMM reside ?
Impact on transitions


Does the guest see the right state ?
Traps, SYSENTER, SYSEXIT
Interrupts masking
Hidden state
Now What?

Hardware extensions
  Change semantics to support VM
  Intel, AMD
Software virtualization
  Translate code to emulate desired behavior
  VMware
Paravirtualization
  Xen, Denali
Hardware Extensions for VM

Root mode
  Runs VMM
  Like ring-0 before
Non-Root mode
  Runs guest OS
  Less privileged
  Mask of events to trap
VMware

Hardware virtualization



CPU, memory, I/O
Suspend/resume
Live migration
Design goals:



Compatibility
Performance
Simplicity
VMware: CPU Virtualization

CPU Virtualization
  Execute guest on bare hardware while retaining control by the VMM
  Traps privileged ops & emulates their action
Challenge: lack of HW support
  POPF and read access to privileged state
Solution: fast binary translation
  Only kernel mode code
  Eliminate unnecessary traps
VMware: Memory Virtualization

Memory virtualization
  Shadow page tables
Challenges:
  Inefficient page replacement
  Oversized due to replication
Solutions:
  Ballooning
  Content based sharing
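Content-based sharing can be sketched as a table of physical frames keyed by a hash of their content, with a reference count per frame and copy-on-write when a guest modifies a shared page. The class and method names are hypothetical, and the sketch is not a description of VMware's actual data structures.

```python
import hashlib

class SharedMemory:
    def __init__(self):
        self.frames = {}     # content hash -> [page bytes, reference count]
        self.mapping = {}    # (vm, guest page number) -> content hash

    def _release(self, vm, gpn):
        """Drop the mapping for a guest page and free the frame if unused."""
        h = self.mapping.pop((vm, gpn), None)
        if h is not None:
            self.frames[h][1] -= 1
            if self.frames[h][1] == 0:
                del self.frames[h]

    def write(self, vm, gpn, data):
        data = bytes(data)
        self._release(vm, gpn)          # copy-on-write: detach from the old frame
        h = hashlib.sha256(data).hexdigest()
        frame = self.frames.get(h)
        if frame is None:
            self.frames[h] = [data, 1]
        else:
            if frame[0] != data:        # full compare before sharing
                raise RuntimeError("hash collision")
            frame[1] += 1
        self.mapping[(vm, gpn)] = h

    def read(self, vm, gpn):
        return self.frames[self.mapping[(vm, gpn)]][0]

mem = SharedMemory()
zero = bytes(4096)
mem.write("vm1", 0, zero)
mem.write("vm2", 0, zero)             # identical content: one frame, two users
frames_when_shared = len(mem.frames)
mem.write("vm2", 0, b"x" * 4096)      # vm2 diverges and gets its own frame
frames_after_write = len(mem.frames)
```

Two guests holding identical pages consume one frame until either of them writes, which addresses the replication problem listed under the challenges.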
VMware: I/O Virtualization


Challenge: wide variety of devices and
interfaces
Solution:



Hosted architecture
Trap through the VMM
Export special devices
Xen: Paravirtualization

Provide some exposure to the
underlying hardware



Better performance
Must modify OS to adapt
No modifications to applications
Xen (contd)





Downgrade privilege of guest OS
Guest registers syscall and page-fault
handlers with Xen
Partial access to page tables
Fast handlers for most exceptions
Expose set of simple device abstractions
Xen (contd)

The cost of porting an OS to Xen:





Privileged instructions
Page table access
Network driver
Block device driver
<2% of code-base
Denali

Lightweight protection domains


Minimalistic method geared for performance
Changes:






Idle loops - avoid busy wait
Interrupt queueing - save context switch
Interrupt semantics – “just”/“recent”
No virtual memory (!)
No BIOS – no legacy “crap”
Generic I/O devices
Virtual Machine Migration

Optimizations:
  Reduce memory state before snapshot
    ballooning
  Reduce total cost by incremental updates
    COW hierarchy
  Reduce start-up time by paging on-demand
  Reduce transfer time relying on common data
    Use hash functions to identify common blocks
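Using hashes to avoid sending blocks the destination already holds can be sketched as follows. The block contents, the in-memory `store` dictionary standing in for the receiver's disk, and the function names are all assumptions for this example.

```python
import hashlib

def digest(block):
    return hashlib.sha256(block).hexdigest()

def plan_transfer(blocks, receiver_store):
    """Return (recipe, payload). The recipe lists block hashes in order; the
    payload contains only blocks the receiver does not already hold."""
    recipe, payload = [], {}
    for block in blocks:
        h = digest(block)
        recipe.append(h)
        if h not in receiver_store and h not in payload:
            payload[h] = block
    return recipe, payload

def reconstruct(recipe, payload, receiver_store):
    """Receiver side: add the new blocks, then rebuild the image from the recipe."""
    receiver_store.update(payload)
    return [receiver_store[h] for h in recipe]

base = [b"kernel", b"libc", b"app-v1"]
store = {digest(b): b for b in base}            # data already at the destination
image = [b"kernel", b"libc", b"app-v2", b"libc"]
recipe, payload = plan_transfer(image, store)
rebuilt = reconstruct(recipe, payload, store)
```

Only the one block that differs from the data already present crosses the network; the rest is referenced by hash.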
Virtual Machine Migration

Minimizing down time




Reduce size of VM state
Pre-copy static parts (or..)
Demand-copy static parts
Hot-copy dynamic parts
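The pre-copy option can be illustrated with a small simulation: pages are copied while the guest keeps running, pages dirtied during a round are re-sent in the next round, and the guest is stopped only for the small remainder. The representation of memory as a dictionary, the scripted `dirty_rounds`, and the `threshold` parameter are assumptions made for this sketch.

```python
def precopy_migrate(source, dirty_rounds, threshold=1):
    """source: dict page -> value, mutated as the guest keeps running.
    dirty_rounds: list of dicts, the writes the guest makes during each round.
    Returns (destination, pages sent while running, pages sent while stopped)."""
    dest = {}
    to_send = set(source)            # first round: every page
    sent_live = 0
    rounds = iter(dirty_rounds)
    while len(to_send) > threshold:
        snapshot = {p: source[p] for p in to_send}
        writes = next(rounds, {})    # the guest keeps running during the copy
        source.update(writes)
        dest.update(snapshot)
        sent_live += len(snapshot)
        to_send = set(writes)        # only re-send what was dirtied meanwhile
    # Stop the guest and copy the small remainder: this is the downtime.
    dest.update({p: source[p] for p in to_send})
    return dest, sent_live, len(to_send)

source = {i: 0 for i in range(8)}
dirty_rounds = [{0: 1, 1: 1, 2: 1}, {0: 2}]
dest, sent_live, sent_stopped = precopy_migrate(source, dirty_rounds)
```

In this run eleven page copies happen while the guest is live and a single page is copied during the stop, so more data is sent in total in exchange for a shorter downtime.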
OS Virtualization


Confine applications in containers
Advantages:




Fine granularity
Low overhead
Easier maintenance
Challenges




Transparency
Correctness
Extend OS:
Modify kernel, loadable module, library
Isolation – BSD Jail




Create an isolated execution environment
via software means.
Uses chroot (private root per jail)
Processes in a jail are isolated from
files, processes, or network services in
other jails.
A jail can be restricted to a single IP
address.
Specialized Virtualization –
Linux VServer






Hosting (consolidation)
Experimentation
Education (do you trust students…?)
Personal security box
Manage several “versions”
Applications




Virtual servers
Per user firewall
Fail over servers
Honey-pots
Specialized Virtualization – Linux VServer

Isolation
  Processes, file system, IPC, network, super user capabilities
Kernel patch
  Add a “context” tag per process/resource
  syscalls to handle contexts (irreversible)
Challenges
  Capture all holes (indirect access!)
  Efficient storage
General Virtualization – Zap

Virtualization for isolation



POD – PrOcess Domain
Private namespace
Virtualization for migration


Decouple process from OS
Capture state and reconstruct state
Zap – virtualization

Process environment
  Interpose on system calls
File system
  Rely on “chroot” environment
Network
  Per protocol methods
Challenges
  Race conditions (SMP)
  Life-span of objects
  Fast translation
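System call interposition with a private namespace can be sketched for process identifiers: processes inside a pod see virtual PIDs that stay fixed, and an interposed call translates them to whatever real PIDs the current host assigned. The `Pod` class, `pod_kill`, and the injected `real_kill` callable are hypothetical names for this illustration and are not Zap's interfaces.

```python
class Pod:
    """Private namespace: virtual PIDs stay fixed across hosts."""

    def __init__(self):
        self.v2r = {}        # virtual pid -> real pid on the current host
        self.r2v = {}        # reverse map, for translating return values quickly
        self.next_vpid = 1

    def attach(self, real_pid):
        vpid = self.next_vpid
        self.next_vpid += 1
        self.v2r[vpid] = real_pid
        self.r2v[real_pid] = vpid
        return vpid

    def rebind(self, vpid, new_real_pid):
        """After restart on another host the real pid differs; the virtual one does not."""
        del self.r2v[self.v2r[vpid]]
        self.v2r[vpid] = new_real_pid
        self.r2v[new_real_pid] = vpid

def pod_kill(pod, vpid, sig, real_kill):
    """Interposed kill(): translate the argument, hide processes outside the pod."""
    if vpid not in pod.v2r:
        raise ProcessLookupError(vpid)
    return real_kill(pod.v2r[vpid], sig)

calls = []
record = lambda pid, sig: calls.append((pid, sig))   # stands in for the real syscall
pod = Pod()
v = pod.attach(4321)
pod_kill(pod, v, 15, record)
pod.rebind(v, 9876)          # the process was restarted elsewhere
pod_kill(pod, v, 15, record)
```

Dictionary lookups in both directions keep the translation cheap. The race conditions and object life-span issues listed above arise when entries are added or removed concurrently, which this single-threaded sketch does not address.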
Zap – Migration

Checkpoint – outside process context
  Capture process tree
  Capture pod state
  Capture per-process state
Restart – inside process context
  Restore process tree
  Restore processes
Example issues
  Sharing
  Deleted files