Live Migration-prex - Communication and Multimedia Lab


Live Migration
of Virtual Machines
Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm
Hansen†, Eric Jul†, Christian Limpach, Ian Pratt, Andrew Warfield
University of Cambridge Computer Laboratory
† Department of Computer Science, University of Copenhagen,
Denmark
USENIX NSDI '05
Introduction

Operating system virtualization has attracted
considerable interest in recent years
◦ particularly in the data center and cluster computing communities

It allows many OS instances to run concurrently on a
single physical machine

Migrating an entire OS and all of its applications
as one unit
◦ Avoids the residual dependencies that process-level migration suffers from
Introduction

Live Migration

Without interrupting active network connections

Allows a separation of concerns between the
users and operators of a data center or cluster

Allows separation of hardware and software
considerations
Introduction

Downtime
◦ the period during which services are entirely unavailable

Total migration time
◦ the period during which state on both machines is synchronized,
and which hence may affect reliability

This paper uses the “pre-copy” approach to achieve
live migration and focuses on reducing
downtime (implemented on Xen)
Design

Network
Generate an unsolicited ARP reply from the migrated host,
advertising that the IP address has moved to a new location.

Storage
Use a network-attached storage (NAS) device,
so disk storage does not need to be migrated
Design

Memory Transfer
◦ Push phase
◦ Stop-and-copy phase
◦ Pull phase

Most practical solutions select one or two of the
three phases
◦ e.g. pure stop-and-copy, pure demand-migration

This paper uses an iterative push phase followed by a typically
very short stop-and-copy phase.
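
The iterative push phase followed by a short stop-and-copy can be sketched as a small simulation. This is only an illustrative model, not the Xen implementation: the function name, the constant dirtying rate, the round cap, and the stop threshold are all assumptions made for the example.

```python
def simulate_precopy(total_pages, dirty_rate, bandwidth,
                     max_rounds=30, stop_threshold=64):
    """Toy model of iterative pre-copy (all parameters are hypothetical).

    total_pages    -- pages of VM memory
    dirty_rate     -- pages dirtied per second (assumed constant)
    bandwidth      -- pages transferred per second
    Returns (rounds, remaining_pages, total_time, downtime).
    """
    to_send = total_pages          # round 1 pushes all of memory
    elapsed = 0.0
    rounds = 0
    while rounds < max_rounds and to_send > stop_threshold:
        duration = to_send / bandwidth
        elapsed += duration
        rounds += 1
        # Pages dirtied while this round was running must be re-sent;
        # the set can never be larger than memory itself.
        to_send = min(total_pages, int(dirty_rate * duration))
    # Stop-and-copy: the VM is paused while the remainder is sent.
    downtime = to_send / bandwidth
    return rounds, to_send, elapsed + downtime, downtime


if __name__ == "__main__":
    # Dirtying slower than the link: the remainder shrinks every round.
    print(simulate_precopy(100_000, dirty_rate=1_000, bandwidth=10_000))
    # Dirtying faster than the link: pre-copy never converges.
    print(simulate_precopy(1_000, dirty_rate=20_000, bandwidth=10_000))
```

In this model the downtime depends only on what is left for the stop-and-copy phase, which is why a workload that dirties memory faster than the link can carry it ends up with a long pause.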
Related Work
Shut down the VM
◦ Pre-Copy
◦ VMware

Related Work

Post-Copy Live Migration of Virtual
Machines

Michael R. Hines, Umesh Deshpande, and Kartik Gopalan
Computer Science, Binghamton University (SUNY)
ACM SIGPLAN/SIGOPS VEE’09
Design Overview
Writable Working Sets

Some pages will seldom or never be modified and
hence are good candidates for pre-copy

Some will be written often and so should best be
transferred via stop-and-copy
=> Writable Working Sets
Dynamic Rate-Limiting

Dynamically adapt the bandwidth limit during each
pre-copying round

The administrator selects a minimum (m) and a
maximum (M) bandwidth limit

The first pre-copy round transfers pages at the
minimum bandwidth m
Dynamic Rate-Limiting

Dirtying rate =
(the number of pages dirtied in the previous round)
/ (duration of the previous round)

Bandwidth limit for the next round =
Dirtying rate + 50 Mbit/sec

Stop pre-copy when
◦ the calculated rate exceeds M
◦ less than 256KB remains to be transferred
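
The rate rule and the two stop conditions above can be written out as a short sketch. The function names and the 4 KB page size are assumptions made for the example; only the 50 Mbit/sec increment and the 256KB threshold come from the slides.

```python
PAGE_BYTES = 4096            # assumed page size (hypothetical for this sketch)
INCREMENT_MBIT = 50.0        # constant increment from the slides
STOP_BYTES = 256 * 1024      # 256KB threshold from the slides


def next_round_rate(pages_dirtied, round_seconds):
    """Bandwidth limit (Mbit/sec) for the next pre-copy round."""
    dirtied_mbit = pages_dirtied * PAGE_BYTES * 8 / 1e6
    dirtying_rate = dirtied_mbit / round_seconds
    return dirtying_rate + INCREMENT_MBIT


def should_stop_precopy(rate_mbit, max_mbit, remaining_bytes):
    """True when pre-copy should give way to stop-and-copy."""
    return rate_mbit > max_mbit or remaining_bytes < STOP_BYTES


if __name__ == "__main__":
    rate = next_round_rate(pages_dirtied=25_000, round_seconds=8.192)
    print(rate, should_stop_precopy(rate, max_mbit=500, remaining_bytes=10**6))
```

Starting at m and only raising the limit as far as the observed dirtying rate requires keeps the early rounds from competing with the running service for network bandwidth.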
Some implementation issues

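Rapid Page Dirtying
◦ Pages that are dirtied repeatedly (hot pages) need not be transferred in every round

Freeing Page Cache Pages
◦ Reduces the amount of memory to transfer in the first round

Stunning Rogue Processes
◦ Limit each process to 40 write faults before it is moved to a wait queue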
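
The write-fault limit can be illustrated with a user-space sketch. This is not the kernel mechanism itself: the class, its method names, and the `release_all` step that resets the counters are hypothetical, and only the limit of 40 write faults comes from the slides.

```python
from collections import defaultdict, deque


class WriteFaultLimiter:
    """Toy model: stun a process once it exceeds its write-fault budget."""

    def __init__(self, limit=40):
        self.limit = limit
        self.faults = defaultdict(int)   # pid -> write faults so far
        self.wait_queue = deque()        # pids that have been stunned

    def on_write_fault(self, pid):
        """Record one write fault; return True if the process may keep running."""
        if pid in self.wait_queue:
            return False
        self.faults[pid] += 1
        if self.faults[pid] > self.limit:
            self.wait_queue.append(pid)  # budget exhausted: stun the process
            return False
        return True

    def release_all(self):
        """Assumed reset step: wake stunned processes and clear all counters."""
        released = list(self.wait_queue)
        self.wait_queue.clear()
        self.faults.clear()
        return released


if __name__ == "__main__":
    limiter = WriteFaultLimiter()
    outcomes = [limiter.on_write_fault(7) for _ in range(41)]
    print(outcomes.count(True), list(limiter.wait_queue))
```

Slowing the processes that dirty memory fastest shrinks the writable working set, so less remains for the final stop-and-copy phase.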
Evaluation
Dell PE-2650 server-class machines
◦ dual Xeon 2GHz CPUs
◦ 2GB memory
◦ connected via Gigabit Ethernet
◦ Storage: NAS accessed via the iSCSI protocol
◦ XenLinux 2.4.27

a. Simple Web Server
Apache 1.3 web server
◦ Continuously serving a single 512KB file
◦ Memory allocation: 800MB
◦ Initially rate-limited to 100Mbit/sec
◦ 776MB of memory to be transferred in the first round
◦ 165ms outage
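
As a rough sanity check on the figures above, the time needed for the first round can be estimated from the data volume and the initial rate limit. The helper name is hypothetical, and the estimate assumes decimal units (1MB = 8 Mbit) and ignores protocol overhead.

```python
def transfer_seconds(megabytes, mbit_per_sec):
    """Idealized transfer time: 8 bits per byte, no protocol overhead."""
    return megabytes * 8 / mbit_per_sec


if __name__ == "__main__":
    # 776MB pushed at the initial 100Mbit/sec limit.
    print(round(transfer_seconds(776, 100), 2))
```

Under these assumptions the first round alone takes about a minute, which illustrates why total migration time is much longer than the 165ms outage.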

b. Complex Web Workload: SPECweb99
Memory allocation: 800MB
◦ 30% of requests require dynamic content generation
◦ 16% are HTTP POST operations
◦ 0.5% execute a CGI script

The server generates access and POST logs

210ms outage
c. Low-Latency Server: Quake 3
A multiplayer online game server
◦ Runs in a virtual machine with 64MB of memory
◦ Six players joined the game and started to play within a shared arena
◦ Only 148KB of data is transferred in the final round
◦ Downtime: 60ms

d. A Diabolical Workload: MMuncher
A virtual machine that writes to memory
faster than it can be transferred
◦ Memory: 512MB
◦ A simple C program that writes constantly to a 256MB region of memory
◦ Downtime: 3.5 seconds

Conclusion
A pre-copy live migration method on Xen
◦ Takes the writable working set (WWS) into account
◦ Dynamic network-bandwidth adaptation
◦ Realistic server workloads such as SPECweb99 can be migrated with just 210ms downtime
◦ A Quake 3 game server is migrated with an imperceptible 60ms outage
