Xen and the Art of Virtualization

Paul Barham*, Boris Dragovic, Keir Fraser, Steven Hand, Tim
Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield
*Microsoft Research Cambridge, UK
University of Cambridge Computer Laboratory
19th ACM Symposium on Operating Systems Principles (SOSP '03)
Introduction
• Resurgence of interest in VM technology (2003)
– Modern computers are sufficiently powerful to
use virtualization.
• In this paper we present Xen:
– a high-performance, resource-managed virtual
machine monitor (VMM)
Problems to Solve
• VM isolation:
– It is not acceptable for the execution of one VM to
adversely affect the performance of another.
• Support for different operating systems:
– To accommodate the heterogeneity of popular
applications.
• Performance overhead:
– The performance overhead introduced by
virtualization should be small.
XEN: APPROACH & OVERVIEW
• Traditional approach (full virtualization):
– The virtual hardware exposed is functionally
identical to the underlying machine.
– Benefit:
• allowing unmodified operating systems to be hosted.
– Drawback:
• Support for full virtualization was never part of the x86
architectural design.
XEN: APPROACH & OVERVIEW (cont'd)
• Improvement:
– Paravirtualization:
• Presents a virtual machine abstraction that is similar
but not identical to the underlying hardware.
• Improves performance.
• Requires modifications to the guest operating system.
– But no modifications are required to guest applications,
because the application binary interface (ABI) is unchanged.
» ABI: the low-level interface between an application
(or any other type of program) and the operating system
or another application.
The Virtual Machine Interface
• Overview of the paravirtualized x86 interface:
– Memory management
– CPU
– Device I/O
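• Why x86?
– x86 represents a worst case: for example, its hardware page
tables are harder to virtualize efficiently than a software-managed TLB.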
Memory management
• Software-managed TLB vs. hardware-managed
TLB (x86 uses the latter)
• Two design decisions:
– Guest OSes are responsible for allocating and managing the
hardware page tables.
• Xen is involved minimally, only to ensure safety and isolation.
– Xen exists in a 64MB section at the top of every address
space.
• This avoids a TLB flush when entering and leaving the
hypervisor.
Memory management (cont'd)
• Method:
1. A guest OS requires a new page table.
• Ex: a new process is being created.
2. It allocates and initializes a page from its own
memory reservation and registers it with Xen.
3. The guest OS relinquishes direct write privileges to the
page-table memory.
• All subsequent updates must be validated by Xen.
– Note:
• Guest OSes may batch update requests to amortize the
overhead of entering the hypervisor.
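The registration and batched-update steps on this slide can be pictured with a small toy model. This is only a sketch under stated assumptions: the class `ToyHypervisor`, its methods, and the frame numbers are hypothetical and are not Xen's actual interface. The only check modeled is that a page table and the frames it maps belong to the requesting domain.

```python
# Toy model of validated, batched page-table updates.
# All names and numbers are hypothetical; this is not Xen's real interface.

class ToyHypervisor:
    def __init__(self, owned_frames):
        # owned_frames: domain id -> set of frame numbers in its reservation
        self.owned_frames = owned_frames
        self.page_tables = {}  # (domain, page-table frame) -> {index: frame}
        self.hypercalls = 0

    def register_page_table(self, domain, pt_frame):
        """The guest registers a page it allocated from its own memory."""
        if pt_frame not in self.owned_frames[domain]:
            raise PermissionError("frame is outside the domain's reservation")
        self.page_tables[(domain, pt_frame)] = {}

    def update_batch(self, domain, updates):
        """One hypercall carrying many (pt_frame, index, target) requests."""
        self.hypercalls += 1
        applied = 0
        for pt_frame, index, target in updates:
            table = self.page_tables.get((domain, pt_frame))
            # Validate: the table must have been registered by this domain
            # and the target frame must belong to the same domain.
            if table is None or target not in self.owned_frames[domain]:
                continue  # reject this update, keep processing the batch
            table[index] = target
            applied += 1
        return applied


hv = ToyHypervisor({1: {10, 11, 12}, 2: {20}})
hv.register_page_table(1, 10)
# Three updates in one hypercall; the second maps a frame owned by domain 2.
applied = hv.update_batch(1, [(10, 0, 11), (10, 1, 20), (10, 2, 12)])
```

Batching matters here because the cost of entering the hypervisor is paid once per call to `update_batch`, not once per page-table entry.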
CPU
• To paravirtualize the CPU, the hypervisor
must run at a higher privilege level than the guest OS.
– This prevents the guest OS from directly executing
privileged instructions.
• Isolation.
• Ex: the memory management design discussed earlier.
– On x86, the processor has 4 privilege levels in hardware.
• The x86 privilege levels are generally described as rings.
– From ring 0 to ring 3 (0 is the most privileged)
• Therefore, the hypervisor runs in ring 0, the guest OS in ring 1,
and applications in ring 3.
– Any OS which follows the common arrangement (OS in ring 0,
applications in ring 3, rings 1 and 2 unused) can be ported to
Xen by modifying it to execute in ring 1.
CPU (cont'd)
• Exception handling:
– Ex: page faults and software exceptions.
– A table describing the handler for each type of
exception is registered with Xen for validation.
• Overhead: exceptions are normally delivered via Xen.
• Safety is ensured by validating exception handlers.
– Xen checks that the handler's code segment does not specify
execution in ring 0.
CPU (cont'd)
• Exception handling (cont'd):
– To deal with overhead:
• Only two types of exception occur frequently enough to
affect system performance:
– system calls (usually implemented via a software exception)
– page faults (no fast path; they must always be delivered via Xen)
• System calls can be registered with a 'fast' exception
handler.
– It is accessed directly by the processor without indirecting via ring
0.
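The handler validation described on these slides can be sketched as a check on each handler's code-segment selector. The low two bits of an x86 segment selector hold the requested privilege level; everything else here (the function names, the table layout, the selector values and addresses) is a hypothetical illustration, not Xen's real registration call.

```python
# Sketch of exception-table validation; names and values are hypothetical.

def selector_ring(selector):
    """Return the privilege level encoded in the low two bits of a selector."""
    return selector & 0x3


def register_trap_table(table):
    """Accept a {vector: (code_selector, handler_address)} table.

    Rejects the whole table if any handler asks to execute in ring 0,
    which is reserved for the hypervisor in this model.
    """
    accepted = {}
    for vector, (selector, address) in table.items():
        if selector_ring(selector) == 0:
            raise ValueError(f"vector {vector}: handler requests ring 0")
        accepted[vector] = (selector, address)
    return accepted


# Vector 14 is the x86 page fault; 0x80 is the classic Linux system-call
# vector. Selector 0x19 has its low bits set to 1, i.e. ring 1.
accepted = register_trap_table({
    14: (0x19, 0xC0101000),
    0x80: (0x19, 0xC0102000),
})
```

Because the check happens once at registration time, a validated system-call handler can later be entered without Xen re-checking it on every call.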
Device I/O
• Fully virtualized environments:
– Emulate existing hardware devices.
• Paravirtualized:
– Xen exposes a set of clean and simple device
abstractions.
• Objective: protection and isolation.
– I/O data is transferred to and from each VM via
Xen (described later).
• This lets Xen perform validation checks.
– Ex: checking that buffers are contained within a domain's
memory reservation.
Detailed Design
• Control Transfer:
– Hypercalls and Events
• Data Transfer:
– I/O Rings
• Subsystem Virtualization
– CPU scheduling
– Virtual address translation
– Network
– Disk
Control Transfer:
Hypercalls and Events
• Hypercalls:
– Synchronous calls from a VM to Xen.
• Used to perform a privileged operation.
• Ex: a VM requests a set of page-table updates.
• Events:
– Notifications are delivered from Xen to a VM using an
asynchronous event mechanism.
• Replaces the usual delivery mechanisms for device
interrupts.
• Ex: indicating that new data has been received over the
network.
• The guest OS may specify an event-callback handler to respond
to the notification.
Control Transfer:
Hypercalls and Events
• Events (cont'd):
– Pending events:
• Stored in a per-domain bitmask which is updated by
Xen.
– How does a guest defer event handling?
• It sets a Xen-readable software flag.
– This is analogous to disabling interrupts on a real processor.
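The pending bitmask and the hold-off flag can be illustrated with a short sketch. The class `EventChannel` and its method names are hypothetical. Delivering deferred events as soon as the flag is cleared, and clearing the bitmask on delivery, are assumptions of this model and are not stated on the slide.

```python
# Sketch of a per-domain pending-event bitmask with a guest-controlled
# hold-off flag. Class and method names are hypothetical.

class EventChannel:
    def __init__(self, callback):
        self.pending = 0        # bitmask, updated by the hypervisor side
        self.masked = False     # software flag set by the guest, read by Xen
        self.callback = callback

    def send(self, event_bit):
        """Hypervisor side: mark an event pending and notify if allowed."""
        self.pending |= 1 << event_bit
        if not self.masked:
            self._deliver()

    def set_masked(self, masked):
        """Guest side: analogous to disabling or enabling interrupts."""
        self.masked = masked
        if not masked and self.pending:
            self._deliver()     # assumption: deferred events fire on unmask

    def _deliver(self):
        # Hand the whole bitmask to the guest callback and reset it.
        pending, self.pending = self.pending, 0
        self.callback(pending)


seen = []
channel = EventChannel(seen.append)
channel.send(0)             # delivered immediately
channel.set_masked(True)
channel.send(1)             # held off
channel.send(3)             # held off
held = channel.pending
channel.set_masked(False)   # both deferred events arrive in one callback
```

A single callback for several accumulated events is the point of the bitmask: the guest pays for one notification rather than one per event.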
Data Transfer: I/O Rings
• Main idea of the data transfer mechanism:
– Allow data to move vertically through the system
with as little overhead as possible.
– Minimize the work required to demultiplex data to
a specific VM when an interrupt is received from a
device.
Data Transfer: I/O Rings
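[Figure: I/O ring diagram with four numbered callouts (1-4)]
• I/O data buffers are allocated out-of-band by the guest OS.
– Zero copy: only a reference to the buffer is transferred and its
access permissions are changed; the data itself is not copied.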
Data Transfer: I/O Rings
• Order:
– There is no requirement that requests be
processed in order.
• The guest OS associates a unique identifier with each
request, which is reproduced in the associated response.
• Reason: Xen may reorder I/O operations due to scheduling or
priority considerations.
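A minimal sketch of such a ring, assuming a fixed-size circular buffer shared by requests and responses with separate producer and consumer indices for each. The class `IORing`, its method names, and the buffer labels are hypothetical. The example answers requests in reverse order to show why each response must carry the identifier of its request.

```python
# Sketch of a descriptor ring shared by a guest and a back end.
# Names are hypothetical; buffers are referenced, never copied.

class IORing:
    def __init__(self, size=4):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # advanced by the guest
        self.req_cons = 0   # advanced by the back end
        self.rsp_prod = 0   # advanced by the back end
        self.rsp_cons = 0   # advanced by the guest

    def put_request(self, req_id, buffer_ref):
        if self.req_prod - self.rsp_cons == self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = (req_id, buffer_ref)
        self.req_prod += 1

    def take_requests(self):
        taken = []
        while self.req_cons < self.req_prod:
            taken.append(self.slots[self.req_cons % self.size])
            self.req_cons += 1
        return taken

    def put_response(self, req_id, status):
        # A response reuses a slot whose request was already consumed.
        if self.rsp_prod >= self.req_cons:
            raise RuntimeError("no consumed request slot available")
        self.slots[self.rsp_prod % self.size] = (req_id, status)
        self.rsp_prod += 1

    def take_responses(self):
        taken = []
        while self.rsp_cons < self.rsp_prod:
            taken.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return taken


ring = IORing()
for req_id, ref in enumerate(["buf-a", "buf-b", "buf-c"]):
    ring.put_request(req_id, ref)

requests = ring.take_requests()
for req_id, _ref in reversed(requests):   # back end chooses its own order
    ring.put_response(req_id, "ok")

responses = ring.take_responses()
by_id = dict(responses)                   # guest matches responses by id
```

Only descriptors travel through the ring; the strings standing in for buffers are references, which mirrors the out-of-band buffer allocation described earlier.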
CPU scheduling
• Scheduling algorithm:
– Borrowed Virtual Time (BVT) scheduling
algorithm [11]
• Work-conserving
• Has a special mechanism for low-latency wake-up when
a VM receives an event
[11] K. J. Duda and D. R. Cheriton. Borrowed-Virtual-Time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler. In
Proceedings of the 17th ACM SIGOPS Symposium on Operating Systems Principles, volume 33(5) of ACM Operating Systems Review, pages 261-276, Kiawah
Island Resort, SC, USA, Dec. 1999.
BVT scheduling
• Main idea:
– Virtual time.
– Dispatch the runnable thread with the earliest effective virtual time
(EVT).
• The EVT for a thread is computed as:
– $E_i \leftarrow A_i - (\mathit{warp}\ ?\ W_i : 0)$
– The scheduler runs thread $i$ if it has the minimum $E_i$ of all the
runnable threads.
• Parameters:
– $E_i$ = EVT of thread $i$
– $A_i$ = actual virtual time (AVT) of thread $i$
– $W_i$ = virtual time warp of thread $i$
– $\mathit{warpBack}_i$ = set if warp is enabled
BVT scheduling
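• Context switch decision (switch from thread $i$ to thread $j$ when):
– $A_j \le A_i - C / w_i$
• Weight-fair sharing:
– $A_i \leftarrow A_i + \dfrac{k \cdot mcu}{w_i}$
(note: $k \cdot mcu$ = $t$ ms, the time thread $i$ has been running)
• Parameters:
– $mcu$ = minimum charging unit
– $C$ = context switch allowance
– $w_i$ = weight of thread $i$ (its share of the processor)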
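The two rules on this slide can be exercised with a small simulation. This is a sketch, not the scheduler used in Xen: it ignores warp (so the effective virtual time equals $A_i$), and the thread names, the weights 2 and 1, `mcu = 1`, and `C = 2` are all arbitrary assumptions chosen for illustration.

```python
# Sketch of BVT weight-fair sharing with a context switch allowance.
# Warp is ignored here, so effective virtual time equals A_i.
# Thread names and all parameter values are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    weight: float      # w_i
    avt: float = 0.0   # A_i, actual virtual time
    ran: int = 0       # number of mcu-sized slices received


def simulate(threads, steps, mcu=1.0, allowance=2.0):
    current = None
    for _ in range(steps):
        best = min(threads, key=lambda t: t.avt)
        if current is None:
            current = best
        elif (best is not current
              and best.avt <= current.avt - allowance / current.weight):
            current = best                      # A_j <= A_i - C / w_i
        current.avt += mcu / current.weight     # A_i <- A_i + mcu / w_i
        current.ran += 1
    return {t.name: t.ran for t in threads}


shares = simulate([Thread("heavy", 2.0), Thread("light", 1.0)], steps=3000)
ratio = shares["heavy"] / shares["light"]
```

Because a heavier thread's virtual time advances more slowly per slice, it stays the earliest candidate for longer, and `ratio` comes out close to the weight ratio of 2.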
BVT scheduling
[Figure: virtual time ($E_i$) versus real time for two threads with weights w = 2/3 and w = 1/3]
BVT scheduling
• Sleeping adjustment:
– $A_i \leftarrow \max(A_i, SVT)$
– Scheduler virtual time (SVT):
• Scheduler variable indicating the minimum $A_i$ of any runnable thread.
[Figure: virtual time ($E_i$) versus real time. gcc sleeps for 15 real-time units; when gcc wakes up, $A_{gcc}$ is brought up to SVT.]
Low latency dispatch
• Recall:
– $E_i \leftarrow A_i - (\mathit{warp}\ ?\ W_i : 0)$
– $\mathit{warpBack}_i$ can be set directly by a system call.
– Larger warp values provide lower-latency dispatch
than smaller values.
• Additional parameters:
– $L_i$ = warp time limit
– $U_i$ = unwarp time requirement
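A worked instance of the EVT formula shows how warping lets a latency-sensitive thread jump the queue. This is a minimal sketch: the function names and the virtual-time values are made up for illustration, and the limits $L_i$ and $U_i$ are not modeled.

```python
# Sketch of warp-based dispatch: E_i = A_i - (W_i if warp is enabled else 0).
# The thread table and its numbers are illustrative assumptions;
# the warp time limit L_i and unwarp requirement U_i are not modeled.

def effective_virtual_time(avt, warp, warp_enabled):
    return avt - (warp if warp_enabled else 0)


def pick_next(threads):
    """threads: name -> (A_i, W_i, warp flag). Returns the minimum-EVT name."""
    return min(threads, key=lambda name: effective_virtual_time(*threads[name]))


threads = {
    "gcc": (100, 0, False),    # E = 100
    "mpeg": (120, 50, True),   # E = 120 - 50 = 70, so it runs first
}
with_warp = pick_next(threads)

threads["mpeg"] = (120, 50, False)   # warp disabled: E = 120
without_warp = pick_next(threads)
```

The warped thread is only borrowing against its future allocation: its $A_i$ still advances normally while it runs, so its long-term share is unchanged.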
Low latency dispatch
[Figure: virtual time ($E_i$) versus real time. mpeg wakes at t = 5 and t = 15 and executes for 2.5 time units each time.]
• mpeg runs first because it is warped back 50 virtual time units.
Low latency dispatch
[Figure: virtual time ($E_i$) versus real time, marking the point where the warp time limit $L_i$ is exceeded]
Virtual address translation
• Xen need only be involved in page-table
updates.
– This prevents guest OSes from making unacceptable
changes.
• Approach:
– Xen registers guest OS page tables directly with the
memory management unit (MMU) and restricts
guest OSes to read-only access.
– Page-table updates are passed to Xen via a
hypercall.
Network
• Each VM has one or more virtual network interfaces (VIFs).
– VIFs are attached to a virtual firewall-router (VFR).
– Domain0 is responsible for inserting and removing rules on the VFR.
• A VIF contains:
– two I/O rings of buffer descriptors, one for transmit and one for
receive.
– Zero copy:
• The guest OS exchanges an unused page frame for each packet it
receives.
• Fairness:
– Xen implements a simple round-robin packet scheduler.
Disk
• Only Domain0 has direct unchecked access to
physical (IDE and SCSI) disks.
– Other VMs access persistent storage through the abstraction
of virtual block devices (VBDs).
• A VBD is accessed via the I/O ring mechanism.
– A translation table is maintained within the
hypervisor for each VBD.
• It maps a VBD identifier and offset to the corresponding
sector address and physical device.
– Xen services batches of requests from competing
domains in a simple round-robin fashion.
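The per-VBD translation table can be sketched as a lookup from a virtual offset to a physical device and sector. Representing a VBD as an ordered list of extents is an assumption of this sketch, and the class `VBD`, the identifier `vbd0`, the device names, and the sector numbers are all hypothetical.

```python
# Sketch of a VBD translation table: (VBD id, offset) -> (device, sector).
# The extent-list layout and every name and number here are assumptions.

from bisect import bisect_right


class VBD:
    def __init__(self, extents):
        # extents: ordered list of (device, start_sector, length_in_sectors)
        self.extents = extents
        self.starts = []          # virtual offset at which each extent begins
        offset = 0
        for _device, _start, length in extents:
            self.starts.append(offset)
            offset += length
        self.size = offset

    def translate(self, offset):
        if not 0 <= offset < self.size:
            raise ValueError("offset is outside the virtual block device")
        i = bisect_right(self.starts, offset) - 1
        device, start, _length = self.extents[i]
        return device, start + (offset - self.starts[i])


translation_table = {
    "vbd0": VBD([("sda", 1000, 100), ("sdb", 0, 50)]),
}

first = translation_table["vbd0"].translate(0)
last_of_first_extent = translation_table["vbd0"].translate(99)
start_of_second_extent = translation_table["vbd0"].translate(100)
```

Rejecting out-of-range offsets in `translate` is what keeps a guest confined to the storage it was given, even though it never sees physical sector numbers.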
Evaluation Environment
• Hardware:
– Dell 2650 dual-processor 2.4GHz Xeon server
– 2GB RAM
– a Broadcom Tigon 3 Gigabit Ethernet NIC
– a single Hitachi DK32EJ 146GB 10k RPM SCSI disk
• OS:
– Linux version 2.4.21
Evaluation
Relative Performance
• Compares the performance of a VM with that of “bare metal”.
• Bare metal: native Linux installed directly on the physical machine.
Evaluation
Concurrent Virtual Machines
Conclusion
• This paper presents Xen, an x86 virtual
machine monitor that
– allows multiple commodity operating systems to
share conventional hardware
– without sacrificing either performance or
functionality,
• as our experimental results show.
• Ongoing work:
– Porting BSD and Windows XP kernels to operate
over Xen.
Comment
• Paravirtualization does deliver good
performance.
• However, Domain-0 may become a bottleneck:
– much of the work must be validated or executed by Domain-0.
• A guest OS must be modified before it can be installed in
a Xen VM.
Introduction to Domain-0
• Domain-0
– A special privileged domain (VM)
– Serves as an administrative interface to Xen
– The first domain launched when the system is
booted
• Note:
– Domain-0 (Dom0) = privileged domain
– Domain-U (DomU) = unprivileged domain
A simple Xen architecture
[Figure: Xen architecture diagram with three numbered callouts]
1. Direct physical access to all hardware
2. Dom0 exports the simplified generic class devices to each DomU
3. Configuration and monitoring interface
TCP workflow example
[Figure: TCP data path in four numbered steps (1-4), one of which is labeled "Zero Copy"]