Xen and the Art of Virtualization
By Paul Barham, Boris Dragovic, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield.
Presented by Diana Carroll
Virtual Machines
One hardware system with memory, processors, I/O
devices.
Multiple execution environments that each map to an
identical representation of the physical system.
An OS running on a virtual machine is not aware that it is
sharing the machine.
Virtual machines must be isolated from each other even though
they share the same hardware.
The execution of one can’t stall or corrupt the others.
The performance overhead needs to be acceptably small.
The virtual machines must share the hardware as equally as
possible.
A Virtual Machine Monitor is needed to accomplish
this.
Virtual Machine Monitors
Also known as a hypervisor.
Provides an interface for multiple virtual machines to
coexist together.
Can run multiple operating systems on a single
computer.
Provides stability, since even if one OS crashes, the rest of the
machine remains functional.
Can eliminate the need for multiple machines dedicated to
different operating systems.
Provides isolation between operating system instances
and multiplexes physical resources across the running
virtual machines.
Much like an OS does with processes.
Xen
Xen is a Virtual Machine Monitor (VMM).
Allows users to dynamically instantiate an operating system.
Hosts operating systems such as Linux and Windows.
Some source code modifications are necessary.
In the paper, XenoLinux was complete, Windows XP and NetBSD still in
progress.
Now, NetBSD, Linux, FreeBSD, Plan9, and NetWare are complete.
WindowsXP port was successful, but licensing prohibitions prevent it from
being released. (1)
Multiple operating systems can run simultaneously and perform
different tasks.
Is completely software-based and requires no special hardware
support.
Full virtualization, in which the virtual hardware is identical to the underlying physical hardware, is virtually impossible on the x86 architecture.
Xen provides a similar, but not quite identical, view of the hardware.
Xen Design Principles
Support unmodified application binaries.
Support fully functional, multi-application operating systems as guests.
Use paravirtualization to provide high performance
and good resource isolation.
Necessary to ensure that it is useful for users.
The guest operating system has to be modified to run on the
Virtual Machine Monitor.
Specifically, the guest OS can no longer execute in ring 0,
because that ring is now occupied by the VMM.
The guest OS has to be modified to run outside of ring 0.
Sometimes more correct behavior and better
performance are achieved when the resource
virtualization is not completely hidden.
Xen versus Disco
Disco uses true virtualization (almost)
True virtualization does not require any modification of the
guest OS.
The virtual machine is indistinguishable from the real hardware.
Xen uses paravirtualization
The guest OS has to be modified, or ported, onto the Xen
hypervisor.
Xen virtual machines resemble the real hardware but do not
attempt to be an exact match.
When appropriate, the guest OS makes calls to the hypervisor rather than to the hardware, e.g. for memory management and I/O.
Solves the problem of architectures like the x86 that do not support true virtualization.
On the x86, for example, the TLB is hardware-managed rather than software-managed.
The Virtual Machine Interface
A paravirtualized version of the x86 interface.
In this case, the x86 architecture is a worst-case environment.
Divided into memory management, CPU, and I/O.
Guest operating systems execute within domains.
A domain is a running virtual machine.
Memory Management
Guest OSes are responsible for allocating and
managing the hardware page tables.
Xen exists in a 64MB section at the top of each address
space.
This avoids the TLB being flushed each time the execution path
enters or leaves the hypervisor.
Guest OS allocates and initializes a new page table
from its own memory and then registers it with Xen.
Minimal involvement from Xen is required to ensure safety and
isolation.
Necessary since x86 does not have a software-managed TLB,
which could be efficiently virtualized.
All subsequent updates must be validated by Xen.
Updates can be batched to improve efficiency.
Segment descriptors are also validated. They must
have lower privilege than Xen, and cannot allow access
to the Xen-reserved portion of the address space.
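To make the registration flow concrete, here is a minimal C sketch of a guest creating a page table and handing it to the hypervisor for validation. The hypercall wrapper is a stub standing in for the real trap into Xen, and all names are illustrative assumptions rather than the actual Xen ABI.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE     4096
    #define PTES_PER_PAGE (PAGE_SIZE / sizeof(uint64_t))

    /* Stub for the hypercall that asks Xen to validate and register
     * (pin) a page as a page table; illustration only. */
    static int hypervisor_pin_page_table(uint64_t machine_addr)
    {
        (void)machine_addr;
        return 0;
    }

    /* The guest allocates and zero-fills a page-table page from its own
     * memory, then registers it with Xen. After registration the guest
     * keeps read-only access; all writes must be validated by Xen. */
    static int guest_register_page_table(uint64_t *pt_page, uint64_t pt_maddr)
    {
        for (size_t i = 0; i < PTES_PER_PAGE; i++)
            pt_page[i] = 0;               /* start with no mappings */
        return hypervisor_pin_page_table(pt_maddr);
    }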
Virtualizing the CPU
Code runs at different privilege levels (rings).
Typically, on x86, an OS runs at ring 0, as the most privileged entity in the system; applications usually run at ring 3.
With a virtualized CPU, the OS no longer runs at ring 0, because that privilege level is now reserved for the VMM.
The guest OS must be modified to run at a lower privilege level.
Since most OS implementations do not use rings 1 and 2, the guest OS can be ported to ring 1.
This prevents the guest OS from executing privileged
hypervisor code, but keeps it safely isolated from
applications that are still running in ring 3.
CPU virtualization continued
Privileged instructions are required to be validated and executed within Xen, e.g. installing a new page table or yielding the processor.
Attempts to execute a privileged instruction directly fail, since only Xen operates at the highest privilege level.
Exceptions are managed using a table of exception handlers.
The page fault handler is the only one that has to be modified, to read the faulting address from an extended stack frame instead of a privileged register.
For system calls, each guest OS can register a 'fast' exception handler, since it is not necessary for system calls to run through ring 0.
All exception handlers are validated by Xen.
Checked to ensure that the handler code does not specify execution in ring 0.
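A minimal C sketch of the exception-handler table idea. The structure is modeled on Xen's trap table, but the field names and the hypercall wrapper here are illustrative assumptions, with the wrapper stubbed out.

    #include <stdint.h>

    /* One entry per exception vector; fields are illustrative. */
    struct trap_info {
        uint8_t       vector;   /* exception vector, e.g. 14 = page fault */
        uint8_t       flags;    /* e.g. privilege needed to raise the trap */
        uint16_t      cs;       /* code segment selector for the handler */
        unsigned long address;  /* handler entry point, never in ring 0 */
    };

    static void guest_page_fault_handler(void)
    {
        /* Reads the faulting address from the extended stack frame that
         * Xen supplies instead of the privileged CR2 register. */
    }

    /* Stub for the hypercall that hands the table to Xen; Xen checks
     * that no handler specifies execution in ring 0 before accepting. */
    static int hypervisor_set_trap_table(const struct trap_info *t, int n)
    {
        (void)t; (void)n;
        return 0;
    }

    static int install_guest_handlers(void)
    {
        struct trap_info table[] = {
            /* vector 14 (page fault); selector value is illustrative */
            { 14, 0, 0x11, (unsigned long)guest_page_fault_handler },
        };
        return hypervisor_set_trap_table(table, 1);
    }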
I/O
Xen uses a set of device abstractions instead of emulating existing hardware devices.
I/O data is transferred to and from each domain via Xen.
"Shared memory, asynchronous buffer-descriptor rings" are used to pass I/O buffer information vertically through the system.
Asynchronous notifications of I/O events are made to a domain.
Made by updating a bitmap of pending event types, and possibly calling an event handler specified by the guest OS.
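A small C sketch of the notification scheme just described: Xen sets bits in a shared bitmap of pending event types, and the guest later dispatches the handlers its OS registered. The layout and names are illustrative, and a production version would need an atomic read-and-clear.

    #include <stdint.h>

    #define NUM_EVENT_TYPES 32

    /* Bitmap shared with Xen: Xen sets bits, the guest clears them. */
    static volatile uint32_t pending_events;

    typedef void (*event_handler_t)(void);
    static event_handler_t handlers[NUM_EVENT_TYPES];  /* set by guest OS */

    /* Run when the domain is notified (cf. a Unix signal): scan the
     * bitmap and call the handler for each pending event type. */
    static void dispatch_pending_events(void)
    {
        uint32_t pending = pending_events;
        pending_events = 0;   /* real code: atomic exchange, not two steps */
        for (int i = 0; i < NUM_EVENT_TYPES; i++)
            if ((pending & (1u << i)) && handlers[i] != 0)
                handlers[i]();
    }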
Porting an OS to Xen
Requires less than 2% of the total lines of code to be
modified.
User software runs on the guest OS without requiring modification.
Separating Policy from Mechanism
The hypervisor only provides basic control operations.
Authorized domains can export these operations through a
control interface.
An initial domain, Domain0, is created at boot time and can
access the control interface.
It can then use the control interface to create and manage additional
domains.
Responsible for building the domain and initial structures to
support each guest OS.
Can be specialized to handle the varying requirements of different
OSes.
The control interface also supports virtual I/O devices.
Virtual Network Interfaces (VIF) and Virtual Block Devices (VBD).
Additional administrative tools may be added to Domain0 in the
future.
Control Transfer
Hypercalls are made from a domain to Xen.
A synchronous software trap into the hypervisor.
e.g. to request a set of page-table updates or another privileged operation.
Control is returned to the calling domain when the call is completed.
Notifications from Xen to a domain are made using an asynchronous event mechanism.
Replaces the delivery mechanism for device interrupts.
Allows lightweight notification of events.
Similar to Unix signals.
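A sketch of what a hypercall looks like from the guest side. Early 32-bit Xen entered the hypervisor through software interrupt vector 0x82, with the hypercall number in EAX and arguments in the other registers; the wrapper below follows that convention (x86, GCC inline assembly) but is illustrative rather than the exact ABI.

    /* Synchronous software trap into the hypervisor: execution blocks
     * in Xen and control returns here when the call is completed. */
    static inline long hypercall2(long nr, long arg1, long arg2)
    {
        long ret;
        __asm__ volatile ("int $0x82"       /* trap gate installed by Xen */
                          : "=a" (ret)
                          : "0" (nr), "b" (arg1), "c" (arg2)
                          : "memory");
        return ret;
    }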
Device Transfer
I/O descriptor rings are circular queues of descriptors.
Descriptors are allocated by a domain, but accessible from within Xen.
Access to the ring is controlled by two pairs of producer/consumer pointers.
The virtual machine monitor is an extra protection domain between the guest OS and the I/O device, so data needs to be transferred between them with as little overhead as possible.
Domains produce requests and advance the request producer pointer.
Xen removes requests and advances the request consumer pointer.
Xen produces responses and advances the response producer pointer.
Domains remove responses and advance the response consumer pointer.
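The sketch below renders this structure in C: one shared ring of descriptors with a producer/consumer pointer pair for each direction. The layout is illustrative; real Xen rings share slots between requests and responses and use unique descriptor IDs so responses can be reordered.

    #include <stdint.h>

    #define RING_SIZE 64   /* power of two, indexed by masking */

    struct io_descriptor {
        uint64_t buffer_addr;   /* address of the I/O buffer to use */
        uint32_t length;
        uint32_t id;            /* lets out-of-order responses be matched */
    };

    struct io_ring {
        /* Request direction: the domain produces, Xen consumes. */
        volatile uint32_t req_prod, req_cons;
        /* Response direction: Xen produces, the domain consumes. */
        volatile uint32_t rsp_prod, rsp_cons;
        struct io_descriptor slot[RING_SIZE];
    };

    /* Domain side: place a request and advance the request producer;
     * Xen consumes it and advances req_cons asynchronously. */
    static int domain_post_request(struct io_ring *r, struct io_descriptor d)
    {
        if (r->req_prod - r->req_cons == RING_SIZE)
            return -1;                        /* ring is full */
        r->slot[r->req_prod % RING_SIZE] = d;
        r->req_prod++;
        return 0;
    }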
Virtualization of System Components
CPU scheduling is done using the Borrowed Virtual Time
algorithm.
Thread execution is monitored in terms of virtual time.
The scheduler selects the thread with the earliest effective virtual time.
A thread can borrow virtual time by warping back to appear earlier and
gain dispatch priority.
But it then goes to the end of the line after execution.
Protects against low-latency threads using excessive processing cycles.
CPU resources are allocated dynamically, no need to predict processing
requirements in advance.
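A compact sketch of the BVT selection rule from Duda and Cheriton's paper: effective virtual time is actual virtual time minus the warp while a domain is borrowing, and the scheduler dispatches the runnable domain with the earliest effective virtual time. Field names here are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    struct sched_dom {
        uint64_t avt;      /* actual virtual time; advances while running */
        uint64_t warp;     /* how far back a warped domain appears */
        int      warped;   /* currently borrowing virtual time? */
        int      runnable;
    };

    static uint64_t effective_vt(const struct sched_dom *d)
    {
        return d->warped ? d->avt - d->warp : d->avt;
    }

    /* Pick the runnable domain with the earliest effective virtual time.
     * A warped, latency-sensitive domain wins dispatch now, but its avt
     * keeps advancing, so it later "goes to the end of the line". */
    static struct sched_dom *bvt_pick(struct sched_dom *doms, size_t n)
    {
        struct sched_dom *best = NULL;
        for (size_t i = 0; i < n; i++) {
            if (!doms[i].runnable)
                continue;
            if (best == NULL || effective_vt(&doms[i]) < effective_vt(best))
                best = &doms[i];
        }
        return best;
    }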
Guest OSes are given three ways of interpreting time.
Virtual time only advances while the domain is executing.
Real time is the time in nanoseconds since machine boot (can be locked to an external time source).
Wall-clock time is real time plus an offset.
Components Continued: Virtual Address Translation
The x86 architecture uses hardware page tables, which
makes memory virtualization more difficult.
Xen only deals with page table updates.
Guest OS page tables are registered directly with the MMU.
Guest OSes have read-only access.
No need to use shadow page tables.
A guest OS passes Xen its page table updates using a
hypercall.
Requests are validated, and then applied.
A type and reference count are kept for each machine page
frame, and are used to validate updates.
Frames that have already been validated are marked so they do not
have to be revalidated.
Hypercall requests can be batched to improve
performance.
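A sketch of the batching idea: updates are queued as (PTE machine address, new value) pairs and flushed to Xen in one hypercall, amortizing the cost of entering the hypervisor. The request format follows the paper's description, but the wrapper is a stub and the names are illustrative.

    #include <stdint.h>

    typedef struct {
        uint64_t ptr;   /* machine address of the page-table entry */
        uint64_t val;   /* new contents for that entry */
    } mmu_update_t;

    /* Stub: a real guest would trap into Xen here, and Xen would
     * validate each request against frame types before applying it. */
    static int hypervisor_mmu_update(mmu_update_t *reqs, int count)
    {
        (void)reqs; (void)count;
        return 0;
    }

    #define BATCH 16
    static mmu_update_t queue[BATCH];
    static int queued;

    /* Queue one update; flush the whole batch in a single hypercall
     * once it fills (or when the guest needs the updates visible). */
    static int queue_pte_update(uint64_t pte_maddr, uint64_t new_val)
    {
        queue[queued].ptr = pte_maddr;
        queue[queued].val = new_val;
        if (++queued == BATCH) {
            int rc = hypervisor_mmu_update(queue, queued);
            queued = 0;
            return rc;
        }
        return 0;
    }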
Physical Memory and Disk
Each domain receives an initial reservation of memory.
Memory is statically divided between domains.
A balloon driver passes memory pages from Xen to the guest OS’s page
allocator.
A domain may claim additional memory pages up to its reservation limit.
A domain may also release pages back to Xen.
Mapping from physical to hardware addresses is left to the OS.
Xen provides a shared translation array that is readable by all domains.
Updates are validated by Xen first.
Only Domain0 has direct access to all physical drives. All other
domains access a virtual block device (VBD).
Domain0 manages the virtual block devices, using the I/O ring
queuing mechanism to control access.
A VBD is composed of a list of extents with associated ownership and
access control information.
To a guest OS, the VBD behavior is very similar to that of a SCSI disk.
Xen keeps the translation table, and may reorder requests or process them in batches.
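A sketch of the balloon-driver idea in C: growing the domain claims frames from Xen (up to the reservation limit) and feeds them to the guest's page allocator, and shrinking does the reverse. All functions here are stubs or hypothetical hooks, not Xen's actual memory interface.

    #include <stddef.h>
    #include <stdint.h>

    /* Stubs for the hypercalls that move page frames between Xen and
     * the domain; Xen enforces the domain's reservation limit. */
    static int xen_claim_frames(uint64_t *f, size_t n)   { (void)f; (void)n; return 0; }
    static int xen_release_frames(uint64_t *f, size_t n) { (void)f; (void)n; return 0; }

    /* Hypothetical hooks into the guest OS's own page allocator. */
    static void free_into_allocator(uint64_t *f, size_t n) { (void)f; (void)n; }
    static void take_from_allocator(uint64_t *f, size_t n) { (void)f; (void)n; }

    static int balloon_grow(uint64_t *frames, size_t n)
    {
        if (xen_claim_frames(frames, n) != 0)
            return -1;                      /* over the reservation limit */
        free_into_allocator(frames, n);     /* pages now usable by the OS */
        return 0;
    }

    static int balloon_shrink(uint64_t *frames, size_t n)
    {
        take_from_allocator(frames, n);     /* pull pages out of use */
        return xen_release_frames(frames, n);  /* hand them back to Xen */
    }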
Performance
Five implementations compared in total.
Compared 3 VMMs:
VMware Workstation 3.2
User-Mode Linux (runs the Linux OS in user-mode on a Linux host)
Xen with XenoLinux port
Also Native Linux
All used the RedHat 7.2 distribution with the Linux 2.4.21 kernel, i686
architecture, and the ext3 file system.
All used Dell 2650 dual processor 2.4GHz systems, 2GB
RAM, gigabit Ethernet, and 146GB SCSI drive.
Hyperthreading disabled.
Also tested VMware ESX Server, which replaces the host OS with a
dedicated kernel, but they were unable to report the results (EULA
restrictions).
Performance Results
Cluster 1: SPEC CPU suite.
Computationally intensive applications with very little I/O and OS interaction.
Cluster 2: Time taken to build a default configuration of the Linux 2.4.21 kernel with gcc v2.96.
Cluster 3: Open Source Database Benchmark suite in default configuration.
Information Retrieval workload, shown in tuples per second.
Cluster 4: Open Source Database Benchmark suite in default configuration.
Online Transaction Processing workload, shown in tuples per second.
Cluster 5: dbench program emulating load placed on a file server.
Cluster 6: SPEC WEB99, a web server benchmark.
Operating System Benchmarks
Measured using the lmbench program,
version 3.0-a3
L-UP is native Linux uniprocessor.
L-SMP is native Linux multiprocessor.
Xen is running XenoLinux, their port of the
Linux OS.
VMW is VMware.
UML is user-mode Linux.
Further Performance Measures
Multiple instances of PostgreSQL in separate domains
OSDB-IR = Open Source Database Benchmark Information Retrieval.
OSDB-OLTP = Open Source Database Benchmark On-line Transaction Processing.
Performance Isolation
They couldn’t find another OS-based implementation of performance isolation to compare it
with.
They tested Xen using 4 domains running with equal resource allocations.
2 domains running previously-measured workloads.
2 domains running disruptive processes (e.g. a disk bandwidth hog, a fork bomb, a memory grabber).
The impact of the disruptive processes was only a 2-4% decrease in performance.
The same processes effectively shut down a native Linux system.
Scalability
Xen's target was to scale to 100 domains.
They were able to configure a guest OS for server functionality, running in only 4MB of memory, with swap.
When an incoming request was received, it could request more memory
from Xen.
Compared to native Linux, they found a tradeoff.
Long time slices give the highest throughput, but less responsiveness.
Xen running with 50ms time slices had similar throughput to Linux.
Short time slices lowered throughput but improved responsiveness.
5ms time slices resulted in 7.5% lower throughput.
With 128 domains running, Xen still provided a response time of 5.4ms.
Conclusion
Xen is a software-based Virtual Machine Monitor (hypervisor).
Allows multiple OSes to be hosted simultaneously on the same
machine.
Requires the OS to be modified (ported) in order to run on the VMM.
Provides the protection of performance isolation between domains.
Xen today…
Open-source project published under the GPL.
Currently on version 3.0.
NetBSD, Linux (several distros, including SuSE, Fedora, RHEL, Mandrake),
FreeBSD, Plan9, and NetWare are complete. WindowsXP port was
successful, but licensing prohibitions prevent it from being released.
Hardware support for virtualization
Intel is releasing a new line of processors that support virtualization.
Two forms of CPU operation: root and non-root.
In addition to rings 0-3, there is a root mode where the VMM can run.
Guest OSes can still run at ring 0, so porting is no longer required.
A Virtual Machine Control Structure (VMCS) manages VM entries and exits.
References
University of Cambridge Xen page: http://www.cl.cam.ac.uk/Research/SRG/netos/xen/
Wikipedia entry for Xen: http://en.wikipedia.org/wiki/Xen_%28virtual_machine_monitor%29
Intel Virtualization Technology, by Rich Uhlig, Gil Neiger, Dion Rodgers, Amy Santoni, Fernando Martins, Andrew Anderson, Steven Bennett, Alain Kagi, Felix Leung, and Larry Smith. Published in Computer magazine, May 2005 (Vol. 38, No. 5), ISSN 0018-9162.
Borrowed-Virtual-Time (BVT) scheduling: Supporting Latency-sensitive Threads in a General-purpose Scheduler, by Kenneth J. Duda and David R. Cheriton.