Transcript PPT
Virtualization
Part 1 – Concepts & XEN
Virtualization
Concepts
References and Sources
James Smith, Ravi Nair, “The Architecture of Virtual Machines,” IEEE Computer, May 2005, pp. 32-38.
Mendel Rosenblum, Tal Garfinkel, “Virtual Machine Monitors: Current Technology and Future Trends,” IEEE Computer, May 2005, pp. 39-47.
L.H. Seawright, R.A. MacKinnon, “VM/370 – a study of multiplicity and usefulness,” IBM Systems Journal, vol. 18, no. 1, 1979, pp. 4-17.
S.T. King, G.W. Dunlap, P.M. Chen, “Operating System Support for Virtual Machines,” Proceedings of the 2003 USENIX Technical Conference, June 9-14, 2003, San Antonio TX, pp. 71-84.
A. Whitaker, R.S. Cox, M. Shaw, S.D. Gribble, “Rethinking the Design of Virtual Machine Monitors,” IEEE Computer, May 2005, pp. 57-62.
G.J. Popek, R.P. Goldberg, “Formal requirements for virtualizable third generation architectures,” Communications of the ACM, vol. 17, no. 7, 1974, pp. 412-421.
CS 5204 – Fall, 2008
Definitions
Virtualization
A layer mapping its visible interface and resources onto the
interface and resources of the underlying layer or system on
which it is implemented
Purposes
Abstraction – to simplify the use of the underlying resource (e.g., by
removing details of the resource’s structure)
Replication – to create multiple instances of the resource (e.g., to simplify
management or allocation)
Isolation – to separate the uses which clients make of the underlying
resources (e.g., to improve security)
Virtual Machine Monitor (VMM)
A virtualization system that partitions a single physical
“machine” into multiple virtual machines.
Terminology
Host – the machine and/or software on which the VMM is implemented
Guest – the OS which executes under the control of the VMM
Origins - Principles
“an efficient, isolated duplicate of the real machine”
Efficiency
Innocuous instructions should execute directly on the hardware
Resource control
Executed programs may not affect the system resources
Equivalence
The behavior of a program executing under the VMM should be the same as if the program were executed directly on the hardware (except possibly for timing and resource availability)
Communications of the ACM, vol. 17, no. 7, 1974, pp. 412-421
Origins - Principles
Instruction types
Privileged – an instruction that traps in unprivileged (user) mode but not in privileged (supervisor) mode
Sensitive
Control sensitive – attempts to change the memory allocation or privilege mode
Behavior sensitive
Location sensitive – execution behavior depends on location in memory
Mode sensitive – execution behavior depends on the privilege mode
Innocuous – an instruction that is not sensitive
Theorem
For any conventional third generation computer, a virtual machine monitor may be constructed if
the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
Significance
The IA-32/x86 architecture is not virtualizable: some of its sensitive instructions are not privileged, so they do not trap in user mode.
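The theorem's condition can be read as a simple set test. The following is a minimal sketch, assuming a toy model in which an architecture is described only by two sets of instruction names. The function names and the "clean" machine are hypothetical; the x86 sample is a small illustrative subset, not a complete listing.

```python
# Toy model: an architecture is described by two sets of instruction names.
def unvirtualizable_instructions(sensitive, privileged):
    """Return the sensitive instructions that would not trap in user mode."""
    return set(sensitive) - set(privileged)


def is_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: the sensitive set is a subset of the privileged set."""
    return not unvirtualizable_instructions(sensitive, privileged)


# Hypothetical machine in which every sensitive instruction is privileged.
clean_privileged = {"LOAD_PSW", "SET_TIMER", "START_IO"}
clean_sensitive = {"LOAD_PSW", "SET_TIMER"}

# Small x86-flavored sample: POPF is sensitive but does not trap in user mode.
x86_privileged = {"LGDT", "LIDT", "HLT"}
x86_sensitive = {"LGDT", "LIDT", "POPF"}

print(is_virtualizable(clean_sensitive, clean_privileged))
print(sorted(unvirtualizable_instructions(x86_sensitive, x86_privileged)))
```

A single offending instruction such as POPF is enough to violate the condition, which is why a VMM for such an architecture needs techniques beyond plain trapping.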
Origins - Technology
IBM Systems Journal, vol. 18, no. 1, 1979, pp. 4-17.
Concurrent execution of multiple production operating systems
Testing and development of experimental systems
Adoption of new systems with continued use of legacy systems
Ability to accommodate applications requiring special-purpose OS
Introduced notions of “handshake” and “virtual-equals-real mode” to allow
sharing of resource control information with CP
Leveraged ability to co-design hardware, VMM, and guestOS
VMMs Rediscovered
[Figure: three virtual machines, each running a Guest OS and an application, hosted by a VMM on the real machine]
Server/workload consolidation (reduces “server sprawl”)
Compatible with evolving multi-core architectures
Simplifies software distributions for complex environments
“Whole system” (workload) migration
Improved data-center management and efficiency
Additional services (workload isolation) added “underneath” the OS
security (intrusion detection, sandboxing,…)
fault-tolerance (checkpointing, roll-back/recovery)
Architecture & Interfaces
Architecture: formal specification of a system’s interface and the logical
behavior of its visible resources.
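[Figure: layered stack from top to bottom: Applications, Libraries, Operating System, Hardware. The API sits at the library boundary, the ABI comprises system calls plus the user ISA, and the ISA comprises the system ISA plus the user ISA]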
API – application programming interface
ABI – application binary interface
ISA – instruction set architecture
VMM Types
System
Provides ABI interface
Efficient execution
Can add OS-independent services (e.g., migration, intrusion detection)
Process
Provides API interface
Easier installation
Leverage OS services (e.g., device drivers)
Execution overhead (possibly mitigated by just-in-time compilation)
System-level Design Approaches
Full virtualization (direct execution)
Exact hardware exposed to OS
Efficient execution
OS runs unchanged
Requires a “virtualizable” architecture
Example: VMware
Paravirtualization
OS modified to execute under VMM
Requires porting OS code
Execution overhead
Necessary for some (popular)
architectures (e.g., x86)
Examples: Xen, Denali
Design Space (level vs. ISA)
API interface
ABI interface
Variety of techniques and approaches available
Critical technology space highlighted
System VMMs
Structure
Type 1: runs directly on host hardware
Type 2: runs on HostOS
Primary goals
Type 1: High performance
Type 2: Ease of construction/installation/acceptability
Examples
Type 1: VMware ESX Server, Xen, VM/370
Type 2: User-mode Linux
Hosted VMMs
Structure
Hybrid between Type 1 and Type 2
Core VMM executes directly on hardware
I/O services provided by code running on HostOS
Goals
Improve performance overall
Leverages I/O device support on the HostOS
Disadvantages
Incurs overhead on I/O operations
Lacks performance isolation and performance guarantees
Example: VMware Workstation
Whole-system VMMs
Challenge: GuestOS ISA differs
from HostOS ISA
Requires full emulation of
GuestOS and its applications
Example: VirtualPC
Strategies
De-privileging (aka trap-and-emulate)
VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM
Typically achieved by running GuestOS at a lower hardware privilege level than the VMM
Problematic on some architectures where sensitive instructions do not trap when executed at a deprivileged level
[Figure: a privileged instruction issued by the GuestOS traps into the VMM, which emulates the change to the resource]
Primary/shadow structures
VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS (e.g., page tables)
Primary copies needed to ensure correct environment visible to GuestOS
Memory traces
Controlling access to memory so that the shadow and primary structures remain coherent
Common strategy: write-protect primary copies so that update operations cause page faults which can be caught, interpreted, and emulated
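The de-privileging strategy can be illustrated with a toy interpreter. This is only a sketch under stated assumptions: the opcodes, the `ToyVMM` class, and the exception type are hypothetical names invented for the example, and a Python exception stands in for the hardware trap.

```python
class PrivilegedInstructionTrap(Exception):
    """Stands in for the hardware trap raised when deprivileged code
    issues a privileged instruction."""

    def __init__(self, opcode, operand):
        super().__init__(opcode)
        self.opcode = opcode
        self.operand = operand


# Hypothetical privileged opcodes for the toy machine.
PRIVILEGED = {"SET_INTERRUPT_MASK", "LOAD_PAGE_TABLE_BASE"}


def execute_deprivileged(opcode, operand, registers):
    """Run one guest instruction in user mode; privileged ones trap."""
    if opcode in PRIVILEGED:
        raise PrivilegedInstructionTrap(opcode, operand)
    if opcode == "ADD":  # innocuous: executes directly
        registers["acc"] += operand
    else:
        raise ValueError(f"unknown opcode {opcode}")


class ToyVMM:
    def __init__(self):
        # Virtual copies of the privileged state the guest believes it owns.
        self.virtual_state = {name: 0 for name in PRIVILEGED}
        self.registers = {"acc": 0}
        self.traps = 0

    def run(self, program):
        for opcode, operand in program:
            try:
                execute_deprivileged(opcode, operand, self.registers)
            except PrivilegedInstructionTrap as trap:
                self.traps += 1
                # Emulate: update the guest's virtual resource, not the real one.
                self.virtual_state[trap.opcode] = trap.operand


vmm = ToyVMM()
vmm.run([("ADD", 2), ("SET_INTERRUPT_MASK", 1), ("ADD", 3)])
print(vmm.registers["acc"], vmm.virtual_state["SET_INTERRUPT_MASK"], vmm.traps)
```

Only the privileged instruction costs a trap; the two ADD instructions run without VMM involvement, which matches the efficiency principle.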
Virtualizing the IA-32 (x86) architecture
Architecture has protection rings 0..3 with OS normally in ring 0 and
applications in ring 3…
…and VMM must run in ring 0 to maintain its integrity and control
…but GuestOS not running in ring 0 is problematic:
Some sensitive instructions take full effect only in ring 0 but do not fault when executed outside ring 0 (remember privileged vs. sensitive?)
Instructions for low latency system calls (SYSENTER/SYSEXIT) always transition to ring 0, forcing the VMM into unwanted emulation or overhead
For the Itanium architecture, interrupt registers only accessible in ring 0;
forcing VMM to intercept each device driver access to these registers has
severe performance consequences
Masking interrupts can only be done in ring 0
Ring compression: paging does not distinguish privilege levels 0-2,
GuestOS must run in ring 3 but is then not protected from its applications
also running in ring 3
Cannot be used for 64-bit guests on IA-32
The fact that it is not running in ring 0 can be detected (is this important?)
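One way the ring can be detected: on x86 the low two bits of the code-segment (CS) selector hold the current privilege level, and reading CS does not trap. The sketch below only decodes a selector value; the function names and the sample selector values are assumptions chosen for illustration.

```python
def privilege_level(cs_selector):
    """Return the ring encoded in the low two bits of a code-segment selector."""
    return cs_selector & 0b11


def runs_in_ring0(cs_selector):
    return privilege_level(cs_selector) == 0


# Illustrative selector values (assumed): same descriptor index, different ring bits.
native_kernel_cs = 0x60  # ring 0
guest_kernel_cs = 0x61   # ring 1, as seen by a deprivileged guest kernel
user_cs = 0x73           # ring 3

print(runs_in_ring0(native_kernel_cs), runs_in_ring0(guest_kernel_cs))
```

A guest kernel that inspects its own selector this way would observe that it is not in ring 0, which is the equivalence concern raised on the slide.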
Memory Management
Entity and its address space:
process – virtual
OS – physical
VMM – machine
GuestOS: page tables
VMM: “shadow” page tables
Isolation/protection of Guest OS address spaces
Efficient MM address translation
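The relationship between the three address spaces can be sketched as a composition of two mappings. This is a minimal illustration, assuming page tables are flat dictionaries keyed by page number; `build_shadow` and the sample numbers are hypothetical.

```python
def build_shadow(guest_page_table, physical_to_machine):
    """Compose the guest's virtual->physical table with the VMM's
    physical->machine map to obtain a virtual->machine shadow table."""
    shadow = {}
    for vpage, ppage in guest_page_table.items():
        if ppage not in physical_to_machine:
            # The guest referenced a frame it does not own: refuse to map it.
            raise PermissionError(f"guest frame {ppage} not allocated to this VM")
        shadow[vpage] = physical_to_machine[ppage]
    return shadow


guest_page_table = {0: 7, 1: 3}      # maintained by the GuestOS
physical_to_machine = {3: 40, 7: 41}  # maintained by the VMM
shadow = build_shadow(guest_page_table, physical_to_machine)
print(shadow)
```

The ownership check is where isolation comes from in this sketch: a guest cannot reach machine frames that the VMM has not assigned to it, while the hardware walks only the composed table.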
XEN: paravirtualization
Computer Laboratory
References and Sources
Paul Barham, et al., “Xen and the Art of Virtualization,” Symposium on Operating Systems Principles 2003 (SOSP’03), October 19-22, 2003, Bolton Landing, New York.
Presentation by Ian Pratt available at http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt
Xen - Structure
Employs paravirtualization strategy
Deals with machine architectures that cannot be virtualized
Requires modifications to guest OS
Allows optimizations
“Domain 0”
Has special access to control interface for platform management
Has back-end device drivers
Xen VMM
Entirely event driven
No internal threads
[Figure: Xen 3.0 Architecture]
MMU Virtualization : Shadow-Mode
[Figure: the Guest OS reads and writes its own virtual → physical page tables; the VMM propagates updates into the virtual → machine tables used by the hardware MMU, and accessed and dirty bits are reflected back to the guest]
MMU Virtualization : Direct-Mode
[Figure: the Guest OS reads the virtual → machine page tables directly; guest writes pass through the Xen VMM before reaching the tables used by the hardware MMU]
Writeable Page Tables : 1 – write fault
[Figure: guest reads go directly to the virtual → machine page tables; the first guest write causes a page fault that enters the Xen VMM]
Writeable Page Tables : 2 - Unhook
[Figure: the page-table page is unhooked (marked X) from the virtual → machine tables; the guest then reads and writes it directly]
Writeable Page Tables : 3 - First Use
[Figure: the first use of the unhooked part (marked X) of the virtual → machine tables causes a page fault that enters the Xen VMM]
Writeable Page Tables : 4 – Re-hook
[Figure: the Xen VMM validates the guest's writes and re-hooks the page into the virtual → machine tables used by the hardware MMU]
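The four writeable-page-table steps (write fault, unhook, first use, re-hook) can be sketched as a small state machine. This is a toy model, not Xen's implementation: the class name and fields are hypothetical, page faults are simulated by method calls, and validation here simply checks that every mapped frame belongs to the guest.

```python
class WritablePageTablePage:
    """Toy model of one guest page-table page under writeable page tables."""

    def __init__(self, entries, owned_frames):
        self.entries = dict(entries)       # virtual page -> machine frame
        self.owned_frames = set(owned_frames)
        self.hooked = True                 # part of the active page tables
        self.write_protected = True
        self.log = []

    def guest_write(self, vpage, frame):
        if self.write_protected:
            # Step 1: the first write faults into the VMM.
            self.log.append("write fault")
            # Step 2: the VMM unhooks the page and lets the guest write freely.
            self.hooked = False
            self.write_protected = False
            self.log.append("unhook")
        self.entries[vpage] = frame

    def translate(self, vpage):
        if not self.hooked:
            # Step 3: the first use of the unhooked page faults into the VMM.
            self.log.append("first-use fault")
            self._rehook()
        return self.entries[vpage]

    def _rehook(self):
        # Step 4: validate the entries, then reconnect and re-protect the page.
        bad = {v: f for v, f in self.entries.items() if f not in self.owned_frames}
        if bad:
            raise PermissionError(f"invalid mappings: {bad}")
        self.hooked = True
        self.write_protected = True
        self.log.append("validate and re-hook")


page = WritablePageTablePage({0: 10}, owned_frames={10, 11, 12})
page.guest_write(1, 11)
page.guest_write(2, 12)   # no second fault: the page is already unhooked
frame = page.translate(1)
print(frame, page.log)
```

The point of the design is batching: two guest writes cost one write fault and one validation pass instead of one VMM entry per update.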
I/O
Safe hardware interfaces
I/O Spaces
Restricts access to I/O registers
Isolated Device Driver
Driver isolated from VMM in its own “domain” (i.e., VM)
Communication between domains via device channels
Unified interfaces
Common interface for group of similar devices
Exposes raw device interface (e.g., for specialized devices like sound/video)
Separate request/response from event notification
I/O descriptor rings
Used to communicate I/O requests and
responses
For bulk data transfer devices (DMA,
network), buffer space allocated out of
band by GuestOS
Descriptor contains unique identifier
to allow out of order processing
Multiple requests can be added before
hypercall made to begin processing
Event notification can be masked by
GuestOS for its convenience
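The descriptor-ring behavior described above can be sketched as follows. This is a toy model under stated assumptions: `DescriptorRing` and its methods are hypothetical names, `notify_backend` stands in for the hypercall, and the backend deliberately completes requests in reverse order to show why each descriptor carries an identifier.

```python
from collections import deque


class DescriptorRing:
    """Toy request/response ring shared by a guest and a backend."""

    def __init__(self, size):
        self.size = size
        self.requests = deque()
        self.responses = deque()
        self.next_id = 0
        self.notifications = 0

    def add_request(self, buffer_ref):
        """Queue a descriptor; the buffer itself is allocated out of band."""
        if len(self.requests) + len(self.responses) >= self.size:
            raise BufferError("ring is full")
        request_id = self.next_id
        self.next_id += 1
        self.requests.append((request_id, buffer_ref))
        return request_id

    def notify_backend(self):
        """Stand-in for the hypercall: one notification covers all queued requests."""
        self.notifications += 1
        batch = list(self.requests)
        self.requests.clear()
        # Service in reverse order to demonstrate out-of-order completion.
        for request_id, buffer_ref in reversed(batch):
            self.responses.append((request_id, f"done:{buffer_ref}"))

    def collect_responses(self):
        completed = list(self.responses)
        self.responses.clear()
        return completed


ring = DescriptorRing(size=4)
first = ring.add_request("guest-page-A")
second = ring.add_request("guest-page-B")
ring.notify_backend()
completed = ring.collect_responses()
by_id = dict(completed)  # match responses to requests by identifier
print(completed, ring.notifications)
```

Because responses are matched by identifier rather than by position, the guest gets the right result for each request even though the backend finished them in a different order.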
Device Channels
Connects “front end” device drivers in
GuestOS with “native” device driver
Is an I/O descriptor ring
Buffer page(s) allocated by GuestOS and
“granted” to Xen
Buffer page(s) is/are pinned to prevent
page-out during I/O operation
Pinning allows zero-copy data transfer
System Performance
[Figure: bar chart of relative performance (vertical axis 0.0 to 1.1) for SPEC INT2000 (score), Linux build time (s), OSDB-OLTP (tup/s), and SPEC WEB99 (score), each running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)]
Benchmark suites
SPEC INT2000: compute intensive workload
Linux build time: extensive file I/O, scheduling, memory management
OSDB-OLTP: transaction processing workload, extensive synchronous disk I/O
SPEC WEB99: web-like workload (file and network traffic)
Fair comparison?
I/O Performance
Systems
L: Linux
IO-S: Xen using IO-Space access
IDD: Xen using isolated device driver
Benchmarks
Linux build time: file I/O, scheduling, memory management
PM: file system benchmark
OSDB-OLTP: transaction processing workload, extensive synchronous disk I/O
httperf: static document retrieval
SpecWeb99: web-like workload (file and network traffic)