Real-Time Issues in Live Migration of Virtual Machines
Presented by : Ran Koretzki
Basic Introduction
What are VMs?
What is migration?
What is Live migration?
What are VMs?
VMs (Virtual Machines): "a completely isolated guest operating system installation within a normal host operating system". Modern virtual machines are implemented with software emulation, hardware virtualization, or (in most cases) both together.
This makes it possible to run multiple independent VMs on a single physical machine.
A VM's operating system is not hardware dependent.
What are VMs?
Traditional Architecture vs. Virtual Architecture (diagram)
What are VMs? Benefits
Hardware independence.
Encapsulation: a VM can be described in a file.
Possible to 'snapshot'.
Easy to move and back up.
Easy to clone, and to scale out a server application.
Many VM vendors: VMware, Microsoft, Citrix…
Enables running multiple operating systems
Consolidation & use of unused computation power.
Resource management.
High availability & disaster recovery.
Easy management.
Migration – next on the agenda.
Migration
Definition: the ability to move VMs from one physical host (PH) to another.
In the past, to move a VM between two physical hosts, it was necessary to shut down the VM, allocate the needed resources on the new host, move the VM files, and start the VM on the new host.
The resources that need to be transferred are: memory, and the internal state of the devices and of the virtual CPU. The most time-consuming to transfer is memory.
The problem: downtime.
The solution was at first automation, but the real improvement came with Live Migration.
Live Migration
Wikipedia definition: live migration "allows a server administrator to move a running virtual machine or application between different physical machines without disconnecting the client or application. For a successful live migration, the memory, storage, and network connectivity of the virtual machine need to be migrated to the destination."
In other words, it allows the server admin to move VMs between physical hosts transparently to the clients.
It is usually done for load balancing between physical hosts and for migration in case of a hardware failure.
• Live migration of virtual machines
• Zero downtime
By: Fabio Checconi, Tommaso Cucinotta, Manuel Stein
Presented by: Ran Koretzki
1. Show a heuristic to reduce the downtime of
a VM during live migration by scheduling
which memory pages to transmit first.
$VM \equiv \{p_1, \dots, p_N\}$, where $p_i$ is a memory page of size $P$.
Available bandwidth for the transfer: $b$ (bps).
Time needed to transfer a single page: $T = \frac{P+H}{b}$, where $H$ is the per-page protocol overhead.
Each page $p_i$ is accessed for "write" with probability $\pi_i$ during each time frame $T$.
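As a worked example (using the 16 KB pages and 100 Mbit/s link from the simulations later on, and neglecting $H$): $T \approx \frac{16384 \cdot 8}{100 \times 10^6} \approx 1.3$ ms per page.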
1. At time $t_1$ (start of migration), $D_1$ is the set of pages to be transmitted; it is initialized to the entire page set used by the VM, with $|D_1| = n_1$.
2. For $k = 1, \dots, K$, all the pages in $D_k$ are transferred according to the order specified by $\varphi_k : \{1, \dots, n_k\} \to \{1, \dots, N\}$; the transfer ends at $t_{k+1} = t_k + n_k T$; $n_{k+1}$ pages in $D_{k+1}$ are found dirty again.
3. Stop the VM and transfer the last $n_{K+1}$ pages, up to the migration finishing time $t_f = t_{K+1} + n_{K+1}\,\frac{P+H}{b_d}$, using a bandwidth of $b_d$ bps, with $b_d \ge b$.
Down time (the interval during which the VM is not available): $t_d = t_f - t_{K+1} = n_{K+1}\,\frac{P+H}{b_d}$.
Overall migration time: $t_{tot} = t_f - t_1 = \frac{P+H}{b}\sum_{k=1}^{K} n_k + t_d$.
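A minimal Python sketch of this pre-copy loop, simulating page dirtying from the per-page probabilities $\pi_i$ and reporting the resulting $t_d$ and $t_{tot}$ (the page counts, probabilities, bandwidths, and function name below are illustrative assumptions, not values or code from the paper):

```python
import random

def simulate_precopy(pi, P, H, b, b_d, K, seed=0):
    """Simulate K live pre-copy rounds followed by one stop-and-copy round.

    pi   : per-page write probabilities (one entry per page, per time frame T)
    P, H : page size and per-page overhead, in bits
    b    : bandwidth during the live rounds (bps); b_d : bandwidth while stopped
    Returns (downtime, total_migration_time) in seconds.
    """
    rng = random.Random(seed)
    T = (P + H) / b                          # time to send one page while live
    n_pages = len(pi)
    dirty = set(range(n_pages))              # D_1: initially the whole page set
    t_tot = 0.0
    for _ in range(K):                       # live rounds: the VM keeps running
        n_k = len(dirty)
        t_tot += n_k * T
        # Simplification: every page is exposed to n_k time frames this round
        # (the paper's analysis accounts for each page's position in the order).
        dirty = {i for i in range(n_pages)
                 if rng.random() < 1 - (1 - pi[i]) ** n_k}
    downtime = len(dirty) * (P + H) / b_d    # stop-and-copy with the VM paused
    return downtime, t_tot + downtime

# Illustrative parameters only:
pi = [0.00005] * 6000 + [0.05] * 500         # mostly cold pages plus some hot ones
t_d, t_tot = simulate_precopy(pi, P=16 * 1024 * 8, H=64 * 8,
                              b=100e6, b_d=100e6, K=3)
print(f"downtime ~ {t_d:.2f} s, total migration time ~ {t_tot:.2f} s")
```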
1. The probability that a page $p_i$ that is not dirty at time $t_1$ (start of migration) becomes dirty, and thus needs to be transmitted in the final migration round, is:
$\Pr\left[p_i \in D_2 \mid p_i \notin D_1\right] = 1 - (1 - \pi_i)^{n_1}$
2. The probability that a page $p_i$ that is dirty at time $t_1$ (start of migration) becomes dirty again, and thus needs to be transmitted in the final migration round, is:
$\Pr\left[p_i \in D_2 \mid p_i \in D_1\right] = 1 - (1 - \pi_i)^{n_1 + 1 - \varphi_1^{-1}(i)}$
(Where $\varphi_1^{-1}(\cdot) : \{1, \dots, N\} \to \{1, \dots, n_1\}$ denotes the inverse of the $\varphi_1(\cdot)$ function.)
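As an illustration with assumed numbers: a page with $\pi_i = 0.01$ in a round of $n_1 = 100$ pages ends up dirty with probability $1 - 0.99^{100} \approx 0.63$ if it was clean at $t_1$ or if it was sent first ($\varphi_1^{-1}(i) = 1$), but only with probability $0.01$ if it was sent last ($\varphi_1^{-1}(i) = n_1$); this is why the transmission order matters.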
With a single live round ($K = 1$), the expected overall migration time is:
$E[t_{tot}] = \frac{P+H}{b}\,n_1 + \frac{P+H}{b_d}\left[n_1 - \sum_{j=1}^{n_1}\bigl(1 - \pi_{\varphi_1(j)}\bigr)^{n_1+1-j} + \Bigl(N - n_1 - \sum_{i \notin D_1}(1 - \pi_i)^{n_1}\Bigr)\right]$
The bracketed term is the expected number of pages that must be retransmitted in the final stop-and-copy round.
The order of transmission of the pages that minimizes the expected number of dirty pages found at the end of the $k$-th live migration step must satisfy the following condition:
$\forall j:\ \pi_{\varphi_k(j)}\,(1 - \pi_{\varphi_k(j)})^{n_k - j} \le \pi_{\varphi_k(j+1)}\,(1 - \pi_{\varphi_k(j+1)})^{n_k - j}$
Conclusion: if the probabilities $\pi_i$ are all lower than $\frac{1}{n_k + 1}$, the optimum ordering is obtained for increasing values of the probabilities $\pi_i$. On the other hand, if the probabilities are all greater than $\frac{1}{2}$, the optimum ordering is obtained for decreasing values of the $\pi_i$.
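A small Python illustration of this result (the probability values are invented; with all $\pi_i$ below $\frac{1}{n_k+1}$, transmitting in increasing order of $\pi_i$ leaves fewer pages dirty in expectation):

```python
def expected_redirtied(pi_in_order):
    """Expected number of pages found dirty again at the end of a live round,
    given write probabilities listed in transmission order: the page sent at
    position j (1-based) is exposed to n_k + 1 - j further time frames."""
    n_k = len(pi_in_order)
    return sum(1 - (1 - p) ** (n_k + 1 - j)
               for j, p in enumerate(pi_in_order, start=1))

pi = [0.001, 0.003, 0.01, 0.05, 0.002]               # illustrative values only
print(expected_redirtied(sorted(pi)))                 # increasing order: ~0.092
print(expected_redirtied(sorted(pi, reverse=True)))   # decreasing order: ~0.280
```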
All pages are equal, but some are more equal
Problem: it is wasteful to retransmit frequently written pages at every step, since they will very likely be dirtied again.
Solution: wait until the end, when the VM is down.
Algorithm: among the $n_k$ pages found dirty at the start of step $k$, delay the transmission of a subset $F_k \subset D_k$ until the VM is stopped.
Which pages?
$F_k \triangleq \{p_i \in D_k \mid \pi_i \ge \pi\}$ (where $\pi$ is a threshold value).
Writing $t'_d$ and $t'_{tot}$ for the delayed-transmission variant, the expected down time and overall migration time satisfy:
$E[t'_d] \le E[t_d] + |F_1|\,(1 - \pi)\,\frac{P+H}{b_d}$
$E[t'_{tot}] \le E[t_{tot}] - |F_1|\,(1 - \pi)\,\frac{P+H}{b} + |F_1|\,(1 - \pi)\,\frac{P+H}{b_d}$
Conclusion: it is possible to achieve a negligible increase in down time together with a substantial decrease of the overall migration time.
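A minimal sketch of the threshold split in Python (the helper name is an assumption; the 0.30 default mirrors the threshold used in the experiments below):

```python
def split_for_delayed_transmission(dirty_pages, pi, threshold=0.30):
    """Partition the dirty set D_k: pages with pi_i >= threshold form F_k and
    are delayed to the stop-and-copy phase; the rest are transmitted live now."""
    send_now = [i for i in dirty_pages if pi[i] < threshold]
    f_k = [i for i in dirty_pages if pi[i] >= threshold]   # the 'hot' pages F_k
    return send_now, f_k
```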
Problem: $\pi_i$ needs to be known precisely for each page $p_i$.
Solution: gather this information at run time (statistics).
Problem: non-negligible overheads.
Solution: an LRU, frequency-based approach.
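One way such an LRU / frequency estimate could look in Python (a sketch with assumed names; the paper's actual mechanism lives inside the hypervisor, as described next):

```python
from collections import OrderedDict

class LRUWriteTracker:
    """Approximate page 'hotness' by recency and frequency of observed writes,
    instead of estimating each pi_i exactly."""

    def __init__(self):
        self._writes = OrderedDict()          # page -> write count, most recent last

    def record_write(self, page):
        count = self._writes.pop(page, 0)     # move page to the most-recent position
        self._writes[page] = count + 1

    def coldest_first(self):
        """Candidate transmission order: least recently written pages first
        (pages never seen as written would go before all of these)."""
        return list(self._writes)
```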
Computational resources:
A scheduling guarantee from the kernel, (Q, P): a CPU share Q within every period P.
Network resources:
b needs to be constant, and it must be possible to reserve it.
An unstable network is not part of this model; a migration will still succeed, but it may no longer qualify as live migration.
The authors have modified the KVM hypervisor.
Page tracing mechanism:
Page accesses are traced within the hypervisor, using a bitmap.
The implementation exploits this information to modify the transfer order.
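A rough sketch of how per-page write-frequency counters could be accumulated from periodic dirty-bitmap snapshots (plain Python lists stand in for the hypervisor's bitmap; this is purely illustrative, not KVM's actual interface):

```python
def accumulate_write_frequencies(freq, dirty_bitmap):
    """Add 1 to the counter of every page whose dirty bit is set in this
    sampling period, then return a cleared bitmap for the next period."""
    for page, bit in enumerate(dirty_bitmap):
        if bit:
            freq[page] += 1
    return [0] * len(dirty_bitmap)
```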
Simulations:
Virtualized VideoLAN Client (VLC) used as a streaming server.
6500 mapped pages (16 KB/page).
Transfer rate of 100 Mbit/s: about 8 s to transfer the whole memory image once (!).
Guaranteed bandwidth of 50 Mbit/s.
Standard vs. LRU
570 -> 300 (47%) (K=1)
360 -> 290 (19.4%) (K=3)
4800 -> 4500 (6.25%) (K=1)
5500 -> 5000 (9.1%) (K=3)
LRU with delayed transmission
LRU vs. Improved LRU (𝜋 = 0.30)
300 -> 220 (27%) (K=3)
5000 -> 4400 (12%) (K=3)
• It is possible to minimize downtime and improve QoS with simple page-ordering algorithms.
• With a guaranteed bandwidth, LRU has been shown to be an effective aid for page ordering, achieving good results.
• Further work needs to be done…