Live Migration-prex - Communication and Multimedia Lab
Live Migration
of Virtual Machines
Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm
Hansen†, Eric Jul†, Christian Limpach, Ian Pratt, Andrew Warfield
University of Cambridge Computer Laboratory
† Department of Computer Science, University of Copenhagen,
Denmark
USENIX NSDI ’05
Introduction
Operating system virtualization has attracted
considerable interest in recent years,
particularly in data centers and cluster computing
Allows many OS instances to run concurrently on a
single physical machine
Migrating an entire OS and all of its applications
as one unit
◦ avoids the residual dependencies that complicate process migration
Introduction
Live Migration
Migrates without disrupting active network connections
Allows a separation of concerns between the
users and operators of a datacenter or cluster
Allows a clean separation of hardware and software
considerations
Introduction
Downtime
◦ the period during which the service is entirely unavailable
Total migration time
◦ the period during which state on both machines is synchronized,
and which hence may affect reliability
This paper uses the “pre-copy” approach to achieve
live migration and targets reducing
downtime (implemented on Xen)
Design
Network
Generate an unsolicited ARP reply from the migrated host,
advertising that the IP has moved to a new location
(assumes source and destination share a single switched LAN).
Storage
Use network-attached storage (NAS) rather than local disk
Disk storage therefore does not need to be migrated
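A minimal sketch of the unsolicited ARP reply from the Network item, in Python. It only builds the 42-byte Ethernet frame (before padding) and is not Xen's code; the helper name build_unsolicited_arp_reply and the example MAC and IP values are hypothetical. Sender and target protocol addresses both carry the migrated VM's IP so that neighbours refresh their ARP caches; implementations differ on what goes in the target hardware field.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_OP_REPLY = 2


def build_unsolicited_arp_reply(vm_mac: bytes, vm_ip: str) -> bytes:
    """Build an Ethernet frame carrying an unsolicited ARP reply.

    Hypothetical helper: announces that `vm_ip` is now reachable at `vm_mac`.
    """
    if len(vm_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip = socket.inet_aton(vm_ip)  # 4-byte IPv4 address in network byte order

    # Ethernet header: broadcast destination, VM's MAC as source, ARP EtherType
    eth_header = BROADCAST_MAC + vm_mac + struct.pack("!H", ETHERTYPE_ARP)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # 6-byte hardware addresses, 4-byte protocol addresses, opcode 2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_OP_REPLY)
    arp += vm_mac + ip          # sender hardware and protocol address
    arp += BROADCAST_MAC + ip   # target: broadcast MAC (convention varies), same IP
    return eth_header + arp


# Hypothetical VM addresses
frame = build_unsolicited_arp_reply(bytes.fromhex("00163e000001"), "10.0.0.5")
print(len(frame), frame.hex())
```

On Linux such a frame could be sent through an AF_PACKET raw socket, which requires root privileges; that step is left out here.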
Design
Memory Transfer
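◦ Push phase: the source VM keeps running while pages are pushed
to the destination; pages modified during the transfer are re-sent
◦ Stop-and-copy phase: the source VM is stopped, the remaining pages
are copied, then the new VM is started
◦ Pull phase: the new VM runs, and pages it touches that have not yet
been copied are fetched from the source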
Most practical solutions select one or two of the
three phases
◦ e.g., pure stop-and-copy or pure demand migration
This paper combines an iterative push phase with a
typically very short stop-and-copy phase (pre-copy).
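To make the iterative push plus stop-and-copy scheme concrete, here is a toy Python model. It counts pages rather than bytes and ignores bandwidth; the workload function, the 64-page stop threshold and the 30-round cap are hypothetical choices, and the actual implementation uses the rate-based stopping rule under Dynamic Rate-Limiting.

```python
from typing import Callable, List, Set, Tuple


def pre_copy(
    num_pages: int,
    dirtied_during_round: Callable[[int], Set[int]],
    stop_threshold: int = 64,   # hypothetical: pause the VM once this few pages remain
    max_rounds: int = 30,       # hypothetical cap on push rounds
) -> Tuple[List[int], int]:
    """Toy pre-copy model.

    Returns the number of pages sent in each push round and the number sent
    in the final stop-and-copy phase (the part that contributes to downtime).
    """
    to_send: Set[int] = set(range(num_pages))  # first round: all of memory
    sent_per_round: List[int] = []
    for round_no in range(max_rounds):
        if len(to_send) <= stop_threshold:
            break
        # Push phase: the VM keeps running while these pages are copied...
        sent_per_round.append(len(to_send))
        # ...so any page it writes meanwhile is dirty and must be re-sent.
        to_send = dirtied_during_round(round_no)
    # Stop-and-copy phase: VM paused, the remaining dirty pages are copied.
    return sent_per_round, len(to_send)


def example_workload(round_no: int) -> Set[int]:
    """Hypothetical guest: 40 pages rewritten every round, plus a set of
    other dirtied pages that halves each round as rounds get shorter."""
    hot = set(range(40))
    warm = set(range(1000, 1000 + (2000 >> round_no)))
    return hot | warm


rounds, final = pre_copy(10_000, example_workload)
print(rounds)  # [10000, 2040, 1040, 540, 290, 165, 102, 71]
print(final)   # 55
```

The 40 always-dirty pages never leave the to-send set, so they end up in the stop-and-copy phase regardless of how many push rounds run.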
Related Work
Shut down the VM
Pre-Copy
VMware
Related Work
Post-Copy Live Migration of Virtual
Machines
Michael R. Hines, Umesh Deshpande, and Kartik Gopalan
Computer Science, Binghamton University (SUNY)
ACM SIGPLAN/SIGOPS VEE’09
Design Overview
Writable Working Sets
Some pages will seldom or never be modified and
hence are good candidates for pre-copy
Some will be written often and so should best be
transferred via stop-and-copy
=> These frequently written (“hot”) pages form the Writable Working Set (WWS)
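The hot/cold split above can be sketched as a frequency count over a trace of dirtied pages. This is only an illustration, assuming a per-interval list of dirtied page numbers is available; the 50% threshold and the function name writable_working_set are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Set


def writable_working_set(dirty_sets: Iterable[Set[int]], min_fraction: float = 0.5) -> Set[int]:
    """Return pages dirtied in at least `min_fraction` of the observed intervals.

    Such 'hot' pages are poor pre-copy candidates: they are likely to be
    written again before the VM is paused, so they are left to stop-and-copy.
    """
    intervals = list(dirty_sets)
    if not intervals:
        return set()
    counts = Counter(page for interval in intervals for page in interval)
    return {page for page, n in counts.items() if n / len(intervals) >= min_fraction}


trace = [{1, 2, 3}, {1, 2, 7}, {1, 2, 9}, {1, 4}]  # hypothetical dirty-page trace
print(sorted(writable_working_set(trace)))        # [1, 2]
print(sorted(writable_working_set(trace, 1.0)))   # [1]
```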
Dynamic Rate-Limiting
Dynamically adapt the bandwidth limit during each
pre-copying round
The administrator selects a minimum (m) and a
maximum (M) bandwidth limit
The first pre-copy round transfers pages at the
minimum bandwidth m
Dynamic Rate-Limiting
Dirtying rate =
(number of pages dirtied in the previous round)
/ (duration of the previous round)
Bandwidth limit for the next round =
dirtying rate + 50 Mbit/sec
Stop pre-copying when
◦ the calculated rate > M, or
◦ less than 256KB remains to be transferred
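The rate rule above as a minimal Python sketch. Assumptions not on the slides: 4KB pages, 1 Mbit = 10^6 bits, and that the data still to send equals the pages dirtied in the previous round; the function names are hypothetical. The first round would simply run at the minimum m.

```python
from typing import Tuple

PAGE_SIZE = 4096           # assumed page size in bytes (x86)
INCREMENT_MBIT = 50.0      # constant added to the dirtying rate (from the slide)
STOP_BYTES = 256 * 1024    # stop pre-copy when less than 256KB remains


def dirtying_rate_mbit(pages_dirtied: int, round_seconds: float) -> float:
    """Pages dirtied in the previous round divided by its duration, in Mbit/s."""
    return pages_dirtied * PAGE_SIZE * 8 / 1e6 / round_seconds


def plan_next_round(pages_dirtied: int, round_seconds: float,
                    max_mbit: float) -> Tuple[float, bool]:
    """Return (bandwidth limit for the next round in Mbit/s, stop_pre_copy)."""
    limit = dirtying_rate_mbit(pages_dirtied, round_seconds) + INCREMENT_MBIT
    remaining_bytes = pages_dirtied * PAGE_SIZE
    stop = limit > max_mbit or remaining_bytes < STOP_BYTES
    return limit, stop


# 2500 pages dirtied during a 2-second round, maximum M = 500 Mbit/s
print(plan_next_round(2500, 2.0, max_mbit=500.0))
```

With these numbers the dirtying rate is about 41 Mbit/s, so the next round is limited to about 91 Mbit/s and pre-copying continues.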
Some implementation issues
Rapid Page Dirtying
◦ Skip pages already re-dirtied in the current round; hot pages need not be sent every round
Freeing Page Cache Pages
◦ The guest returns free and cold page-cache pages to Xen, shortening the first (full) round
Stunning Rogue Processes
◦ Limit each process to 40 write faults before moving it to a wait queue (“stunning” it)
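Two of the implementation heuristics above, sketched in Python under loose assumptions: dirty pages are sets of page numbers, and write faults are reported per process ID. The names pages_to_send and StunPolicy are hypothetical; the real mechanisms live in Xen and the guest kernel and are not reproduced here.

```python
from typing import Dict, Set


def pages_to_send(dirtied_last_round: Set[int], dirty_now: Set[int]) -> Set[int]:
    """Rapid page dirtying: skip pages that have already been dirtied again
    in the current round; they would have to be re-sent later anyway."""
    return dirtied_last_round - dirty_now


class StunPolicy:
    """Stunning rogue processes: a process may take `limit` write faults
    during migration; the next fault stuns it (the slide describes moving
    such a process to a wait queue)."""

    def __init__(self, limit: int = 40) -> None:
        self.limit = limit
        self.faults: Dict[int, int] = {}
        self.stunned: Set[int] = set()

    def on_write_fault(self, pid: int) -> bool:
        """Record a write fault for `pid`; return True if it may keep running."""
        if pid in self.stunned:
            return False
        self.faults[pid] = self.faults.get(pid, 0) + 1
        if self.faults[pid] > self.limit:
            self.stunned.add(pid)
            return False
        return True


print(sorted(pages_to_send({1, 2, 3, 4}, {2, 4, 9})))  # [1, 3]
policy = StunPolicy()
allowed = [policy.on_write_fault(pid=7) for _ in range(41)]
print(allowed.count(True), 7 in policy.stunned)  # 40 True
```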
Evaluation
Dell PE-2650 server-class machines, each with
dual Xeon 2GHz CPUs
2GB memory
connected via Gigabit Ethernet
Storage: NAS accessed via the iSCSI protocol
XenLinux 2.4.27
a. Simple Web Server
Apache 1.3 web server
Continuously serving a single 512KB file
memory allocation: 800MB
Migration initially rate-limited to 100Mbit/sec
776MB of memory transferred in the
first round
Downtime: 165ms
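A back-of-the-envelope check on why the first round dominates total migration time: sending 776MB at a constant 100Mbit/sec takes about a minute. This assumes decimal units (1MB = 8Mbit) and a rate held at the initial limit for the whole round; it is an estimate, not a measured figure.

```python
def transfer_seconds(megabytes: float, rate_mbit_per_s: float) -> float:
    """Time to send `megabytes` of data at a constant rate (1 MB = 8 Mbit)."""
    return megabytes * 8 / rate_mbit_per_s


# First pre-copy round of the simple web server test at the initial limit
print(round(transfer_seconds(776, 100), 2))  # 62.08
```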
b. Complex Web Workload: SPECweb99
memory allocation: 800MB
30% of requests require dynamic content generation
16% are HTTP POST operations
0.5% execute a CGI script
The server generates access and POST logs
Downtime: 210ms
c. Low-Latency Server: Quake 3
a multiplayer online game server
a virtual machine with 64MB of memory
Six players joined the game and started to
play within a shared arena
the final stop-and-copy round transfers only
148KB
Downtime: 60ms
d. A Diabolical Workload: MMuncher
a virtual machine writing to memory
faster than it can be transferred
Memory: 512MB
a simple C program that writes constantly
to a 256MB region of memory
Downtime: 3.5 seconds
Conclusion
A pre-copy live migration method on Xen
Analysis of the writable working set (WWS)
Dynamic network-bandwidth adaptation
realistic server workloads such as
SPECweb99 can be migrated with just
210ms downtime
a Quake 3 game server is migrated with an
imperceptible 60ms outage