Live WAN VM Migration

LIVE WIDE-AREA MIGRATION OF VIRTUAL MACHINES INCLUDING LOCAL PERSISTENT STATE
Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, Harald Schiöberg
Presented by Kit Cischke
Differences in Contributions
 The first paper discussed transferring the run-time memory of a VM within a LAN.
 Cool.
 This paper expands on that to also transfer the VM’s disk image, its persistent state and its ongoing network connections over a WAN.
 By combining pre-copying, write throttling and a custom block driver, we can achieve this.
Introduction
 In this project, the authors want to extend live VM
migration to include:
 Persistent state (file systems used by the VM)
 Open network connections
 Why?
 Many apps running on a VM need that storage, and NAS
systems may not be available in the new location.
 Moving across a WAN will almost certainly involve an IP
change, and we don’t want to (overly) disrupt TCP
connections.
 Contribution
 A system that enables live migration of VMs that use local storage and open network connections, without severely disrupting their live services.
Highlights
 Some highlights of this work:
 Built upon the Xen Live Migration facility as part
of XenoServer.
 Enables:
 Live migration
 Consistency
 Minimal Service Disruption
 Transparency
 Utilizes:
 Pre-copying, write-throttling and IP tunneling.
System Design - Environment
 Both the source and destination run Xen, with
the VM running XenLinux.
 Uses blktap to export block devices into the migrated VM.
 Block devices are file-backed, meaning the contents of the block device are stored in an ordinary file in the file system of the dedicated storage VM.
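
To make this concrete, a file-backed device exported through blktap might be declared in a Xen xm domain configuration like the fragment below (xm configs use Python syntax); the image path and device name are hypothetical, not from the slides.

# Hypothetical xm config fragment: 'tap:aio' selects the blktap
# user-space driver; the path and device name are made up.
disk = ['tap:aio:/var/lib/xen/images/vm1.img,xvda,w']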
System Design - Architecture


 The initialization stage starts things off by prepping the migration.
 The bulk transfer stage pre-copies the disk image of the VM to the destination while the VM continues to run.
 Xen transfer is then initiated, which performs incremental migration, again without stopping the VM.
 While the transfers are occurring, all disk writes are intercepted as deltas that will be forwarded to the destination (a sketch of a delta record follows this slide).
 Deltas include the data written, the location written and the size of the data.
 The deltas are recorded into a queue that will be transferred later.
 If write activity is too high and too many deltas are being generated, write-throttling is engaged to slow down the VM.
 In parallel with the Xen transfer, the deltas are applied to the destination VM.
 At some point, the source VM is paused, the destination is started, and a temporary network redirect is created to handle the potential IP change.
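
As a rough illustration (my own sketch, not code from the paper), a delta record and the queue it is recorded into could look like this in Python:

import collections
import typing

class Delta(typing.NamedTuple):
    """One intercepted disk write: what was written, where, and how big."""
    offset: int   # byte location written on the block device
    size: int     # size of the data
    data: bytes   # the data written

delta_queue: "collections.deque[Delta]" = collections.deque()

def record_write(offset: int, data: bytes) -> None:
    # Called for each write while the driver is in record mode; the
    # queued deltas are shipped to the destination later.
    delta_queue.append(Delta(offset, len(data), data))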
Implementation - Initialization
 Authentication, authorization and access control are
handled by XenoServer.
 The migration client forks, creating a listener
process that signals the block-driver to enter record
mode.
 In record mode, the driver copies the writes to the listener
process, which transfers them to the destination.
 The other half of the migration client begins the bulk
transfer.
 At the destination, the daemon also forks: one process receives the bulk transfer, the other receives the deltas.
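
A minimal sketch of the client-side fork described above; the structure follows the slide, but the helper names are hypothetical stubs:

import os

# Stubs standing in for the real machinery (names are made up).
def signal_block_driver_record_mode(): print("block driver -> record mode")
def forward_deltas_to_destination():   print("forwarding deltas")
def bulk_transfer_disk_image():        print("bulk pre-copy of disk image")

def start_migration() -> None:
    if os.fork() == 0:
        # Child: the listener; puts the block driver into record mode
        # and streams each recorded delta to the destination.
        signal_block_driver_record_mode()
        forward_deltas_to_destination()
        os._exit(0)
    else:
        # Parent: the other half of the migration client begins the
        # bulk transfer while the VM keeps running.
        bulk_transfer_disk_image()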
Implementation – Bulk Xfer
 The VM’s disk image is transferred from the
source to the destination.
The XenoServer platform supports copy-on-write along with immutable template disk images, so we can transfer just the changes rather than the whole image.
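
One naive way to find "just the changes" is a block-by-block comparison against the immutable template; this sketch is my own illustration, not the platform's actual copy-on-write mechanism:

BLOCK = 4096  # assumed block size

def changed_blocks(template_path: str, image_path: str):
    """Yield (offset, data) for each block of the VM image that differs
    from the immutable template, i.e. the only blocks worth sending."""
    with open(template_path, "rb") as t, open(image_path, "rb") as i:
        offset = 0
        while True:
            a, b = t.read(BLOCK), i.read(BLOCK)
            if not b:
                break
            if a != b:
                yield offset, b
            offset += BLOCK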
Implementation – Xen Migration
 The system here relies on the built-in
migration mechanism of Xen.
 Xen logs dirty memory pages and copies
them to the destination without stopping the
source VM.
 Eventually, we will have to pause the source
and copy the remaining memory pages.
 Then we start up the migrated VM.
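
The general pre-copy idea can be shown as a toy simulation; this is the textbook algorithm, not Xen's actual implementation:

def pre_copy(memory: dict, dirtied_this_round, max_rounds: int = 30,
             stop_threshold: int = 8) -> dict:
    """Toy pre-copy loop: repeatedly resend pages dirtied since the last
    round, pausing the VM only when the dirty set is small enough."""
    sent = dict(memory)                 # round 0: copy all pages
    for _ in range(max_rounds):
        dirty = dirtied_this_round()    # pages written while we copied
        sent.update({p: memory[p] for p in dirty})
        if len(dirty) <= stop_threshold:
            return sent                 # VM paused; last few pages copied
    return sent                         # didn't converge: stop-and-copy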
Implementation - Intercepting
 The blkfront device driver communicates with the dedicated storage VM via a ring buffer.
 The blktap framework
intercepts requests, but
does it in user space.
 Once a disk request makes
it to the backend, it is both
committed to the disk and
sent to the migration
client.
 The client then packages
the write up as a delta.
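
The "committed to the disk and sent to the migration client" step might look like the following sketch; the wire framing and socket are my assumptions:

import os
import struct

HEADER = struct.Struct("!QI")  # offset (8 bytes) + size (4 bytes)

def handle_write(disk_fd: int, sock, offset: int, data: bytes) -> None:
    # 1. Commit the write to the local backing file as usual.
    os.pwrite(disk_fd, data, offset)
    # 2. Package the same write up as a delta for the migration client:
    #    a location/size header followed by the raw data.
    sock.sendall(HEADER.pack(offset, len(data)) + data)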
Implementation - Application
 After the bulk transfer, and in
parallel with the Xen transfer,
the deltas are transferred and
applied to the migrated VM by
the migration daemon in the
storage VM at the destination.
 If the Xen migration finishes while the delta queue is still non-empty, I/O requests are put on hold until the current crop of deltas has been applied.
 The authors found that delta
application was normally
finished before Xen migration,
adding zero time to the overall
migration.
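
The destination side would mirror the sender's framing and replay each delta onto the local image; a sketch under the same assumed wire format as before:

import os
import struct

HEADER = struct.Struct("!QI")  # must match the sender's framing

def recv_exact(sock, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            break                         # sender closed the connection
        buf += chunk
    return buf

def apply_deltas(sock, disk_fd: int) -> None:
    """Replay deltas onto the destination image until the queue drains."""
    while True:
        hdr = recv_exact(sock, HEADER.size)
        if len(hdr) < HEADER.size:
            return                        # no more deltas; migration done
        offset, size = HEADER.unpack(hdr)
        os.pwrite(disk_fd, recv_exact(sock, size), offset)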
Implementation – Write Throttling
 If the VM attempts to complete more writes than a given threshold, subsequent write attempts are delayed by the block driver (see the sketch after this slide).
 This process repeats, with the delay and threshold doubling each time.
 Experimentally, a threshold of 16384 writes with an initial delay of 2048 μs works well.
 Enforcement is separated from policy for
extensibility.
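
A sketch of that doubling policy; the constants come from the slide, but the structure and the exact escalation rule are my guesses:

import time

class WriteThrottle:
    """Delay writes once a write-count threshold is exceeded; each time
    the count doubles past the threshold, threshold and delay double."""
    def __init__(self, threshold: int = 16384, delay_us: int = 2048):
        self.threshold = threshold
        self.delay_us = delay_us
        self.writes = 0

    def on_write(self) -> None:
        self.writes += 1
        if self.writes > self.threshold:
            time.sleep(self.delay_us / 1_000_000)  # slow the VM down
            if self.writes > 2 * self.threshold:
                self.threshold *= 2                # escalate
                self.delay_us *= 2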
Implementation – WAN Redirection
 If the IP of the VM changes, we use IP tunneling and Dynamic DNS to prevent dropped network connections.
 Just before the source VM is paused, an IP tunnel is created from the source to the destination using iproute2 (a hypothetical command sequence follows this slide).
 Once the destination VM is capable of responding to requests at its new IP, Dynamic DNS forwards new requests to the new IP.
 Packets that arrive during the final stage of migration are simply dropped.
 Once no connections exist that use the old IP, the tunnel is torn down.
 Practically, this works because the source server only needs to cooperate for a short time, most network connections are short-lived, and if nothing else, it’s no worse than what you get if the VM doesn’t even try.
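
The slides only say iproute2 is used; one plausible reconstruction of the tunnel setup, with all addresses and the interface name made up:

import subprocess

# 203.0.113.5 = source host, 198.51.100.7 = destination host,
# 192.0.2.10 = the migrated VM's old IP. All hypothetical.
for cmd in (
    "ip tunnel add mig0 mode ipip remote 198.51.100.7 local 203.0.113.5",
    "ip link set mig0 up",
    "ip route add 192.0.2.10/32 dev mig0",  # old-IP traffic -> tunnel
):
    subprocess.run(cmd.split(), check=True)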
Evaluation - Metrics

 Want to evaluate the disruption of the system, as perceived by users.
 Spoiler: results look good.
 Want to show the system handles diabolical workloads, defined in this paper as heavy disk accessors rather than heavy memory accessors.
 Downtime: time between pausing the VM on the source and resuming it on the destination.
 Disruption Time: time during which clients observe a reduction in service responsiveness.
 Additional Disruption: the difference between disruption time and downtime.
 Migration Time: time from the migration request to the VM running at the destination.
 Number of Deltas and Delta Rate: how many file system changes occur and how often.
Eval – Workload Overview
 Three workloads: a web server serving static content, a dynamic web application, and video streaming.
 Chosen for realistic usage scenarios and because
they neatly trifurcate the spectrum of disk I/O
patterns:
 Dynamic app generates lots of bursty writes
 Static workload generates a medium amount of
constant writes
 Streaming video causes few writes, but is very
sensitive to disruption.
Eval – Experimental Setup

 Three hosts:
 Dual Xeon 3.2 GHz, 4 GB DDR RAM, mirrored RAID array of U320 SCSI disks.
 The migrated VM was provided with 512 MB RAM and a single CPU.
 All hosts were connected by a 100 Mbps switched Ethernet network.
 The migrated VM was running Debian on a 1 GB ext3 disk.
 Host C is the client.
 To emulate WAN transfers, traffic shaping was used to limit the bandwidth to 5 Mbps with 100 ms of latency (a possible recipe follows this slide).
 Representative of hosts in London and on the U.S. east coast.
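
The slides don't say which tool did the shaping; with standard Linux traffic control, one plausible recipe would be (interface name and buffer sizes assumed):

import subprocess

# netem adds the 100 ms WAN latency; tbf caps bandwidth at 5 Mbit/s.
for cmd in (
    "tc qdisc add dev eth0 root handle 1: netem delay 100ms",
    "tc qdisc add dev eth0 parent 1: handle 2: "
    "tbf rate 5mbit burst 32kbit latency 400ms",
):
    subprocess.run(cmd.split(), check=True)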
Results – LAN Migration
 Measured disruption is 1.04 seconds and “practically unnoticeable by a
human user.”
 Few deltas, mostly from the web server’s log files.
 Even shooting for “5 nines” uptime, you could still perform 289 such migrations a year.
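 (Back-of-envelope check: five nines of uptime leaves a downtime budget of roughly 300 s per year, and 300 s ÷ 1.04 s per migration ≈ 289.)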
Results – LAN Migration




 phpBB driven by a script posting at random intervals.
 Disruption is 3.09 seconds, due to more deltas.
 HTTP throughput is almost unaffected, and total migration time is shorter.
 A “5 nines” uptime budget still allows 98 migrations a year.
Results – LAN Migration
 Streamed a large video file, viewed by a human on host C.
 Disruption is 3.04 seconds, largely masked by the video player’s buffering.
 No packets are lost, but there is a lot of retransmission.
Comparison to Freeze-and-Copy
 Clearly, freeze-and-copy causes much worse disruption than live migration.
Results – WAN Migration
 Longer migration time leads to more deltas.
 The tunneling let the connections persist.
 68 total seconds of disruption, which is a lot, but much less than freeze-and-copy.
Results – Diabolical Workload
 Ran the Bonnie benchmark as a diabolical process, generating lots of disk writes.
 Throttling was needed twice; initially the bulk transfer was severely impeded until throttling kicked in.
 The overall migration takes 811 s; without throttling, the transfer would have taken 3 days.
Results – I/O Overhead
 The overhead of intercepting deltas is pretty low
and only noticeable during the migration.
Conclusions
 Demonstrated a VM migration scheme that
includes persistent state, maintains open
network connections and therefore works
over a WAN without major disruption.
 It can handle high I/O workloads too.
 Works much better than freeze-and-copy.
 Future work includes batching of deltas, data
compression and “better support for sparse
files.”