Survey of State-of-the-art in Inter-VM
Communication Mechanisms
Jian Wang
Introduction
Shared memory research
Scheduler optimization research
Challenges and problems
[Figure: Virtual Machine A and Virtual Machine B running side by side on one physical machine, on top of the hypervisor (or virtual machine monitor)]
Virtualization technology is mainly focused on building the isolation barrier between co-located VMs.
However, applications often wish to talk across this isolation barrier.
E.g. high-performance grid apps, web services, virtual network appliances, transaction processing, graphics rendering.
Transparent to applications, BUT high communication overhead between co-located VMs:

                  Flood Ping RTT    TCP Bandwidth    UDP Bandwidth
                  (microsecs)       (Mbps)           (Mbps)
Native Loopback   6                 4666             4928
Xen Inter-VM      140               2656             707
Communication data path between co-located VMs
[Figure: every packet (PKT) between co-located VMs passes through Domain 0]
[Figure: default Xen inter-VM data path. VM 1 puts each packet into a page and asks Xen to transmit; the packet is routed through Domain-0, which asks Xen to swap/copy the pages into VM 2]
Advantages of using Shared Memory:
No need for per-packet processing
Pages reused in a circular buffer (a minimal ring sketch follows below)
Writes are visible immediately
Fewer hypercalls (only for signaling)
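To make the circular-buffer pattern above concrete, here is a minimal single-producer/single-consumer ring sketch in C. It is a generic illustration rather than code from any surveyed system, and it omits the memory barriers and event-channel signaling a real inter-VM channel needs.

```c
/* Minimal single-producer/single-consumer ring over a shared region.
 * Illustrative only: real channels add memory barriers and signal the
 * peer over an event channel when the ring becomes non-empty/non-full. */
#include <stdint.h>

#define RING_SIZE 4096u                 /* must be a power of two */

struct ring {
    volatile uint32_t head;             /* advanced by the producer */
    volatile uint32_t tail;             /* advanced by the consumer */
    uint8_t buf[RING_SIZE];             /* lives in the shared pages */
};

/* Producer: copy len bytes in; returns 1 on success, 0 if full. */
static int ring_put(struct ring *r, const uint8_t *data, uint32_t len)
{
    uint32_t used = r->head - r->tail;  /* free-running indices */
    if (RING_SIZE - used < len)
        return 0;
    for (uint32_t i = 0; i < len; i++)
        r->buf[(r->head + i) & (RING_SIZE - 1)] = data[i];
    r->head += len;                     /* publish: visible to the peer */
    return 1;
}

/* Consumer: drain up to len bytes; returns the number actually read. */
static uint32_t ring_get(struct ring *r, uint8_t *data, uint32_t len)
{
    uint32_t avail = r->head - r->tail;
    if (len > avail)
        len = avail;
    for (uint32_t i = 0; i < len; i++)
        data[i] = r->buf[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += len;                     /* frees space for the producer */
    return len;
}
```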
[Figure: shared-memory setup. VM 1 allocates one pool of pages and asks Xen to share those pages with VM 2]
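As a rough sketch of this setup step, the fragment below allocates a small pool of pages in a Linux guest kernel and grants a peer domain read-write access using the classic Linux/Xen grant-table call gnttab_grant_foreign_access(). The pool size, header paths, and the virt_to_mfn() helper follow older x86 Xen-enabled kernels and may differ by version; error handling and teardown (gnttab_end_foreign_access) are elided.

```c
/* Sketch: build a shared pool of pages for an inter-VM channel.
 * Assumes a classic x86 Xen-enabled Linux kernel; names may vary. */
#include <linux/gfp.h>
#include <linux/errno.h>
#include <xen/grant_table.h>
#include <asm/xen/page.h>              /* virt_to_mfn() */

#define POOL_PAGES 64                  /* illustrative pool size */

static grant_ref_t pool_refs[POOL_PAGES];

static int share_pool(domid_t peer)
{
    for (int i = 0; i < POOL_PAGES; i++) {
        void *page = (void *)__get_free_page(GFP_KERNEL);
        if (!page)
            return -ENOMEM;
        /* grant 'peer' read-write access to this machine frame */
        pool_refs[i] = gnttab_grant_foreign_access(peer,
                                                   virt_to_mfn(page), 0);
    }
    /* pool_refs[] would then be advertised to the peer (e.g. via
     * XenStore), which maps them with GNTTABOP_map_grant_ref. */
    return 0;
}
```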
1. Performance:
High throughput, low latency, and acceptable CPU consumption.
2. Transparency:
Don't change the app. Don't change the kernel.
3. Dynamism:
On-the-fly setup/teardown of channels.
Auto discovery. Migration support.
[Figure: VM scheduling timeline over time slices t1 to t4. Dom1 and other domains (DomX) occupy the early slices while Dom2 waits]
Scheduler induced delays
[Figure: a JBoss VM sends query1 and query2 to a DB VM and waits for reply1 and reply2. Running on dedicated servers, only network latency separates query and reply; running on a consolidated server, scheduler-induced delays are added on top of the network latency]
Lack of communication awareness in VCPU scheduler
Lacks knowledge of timing requirements of tasks/applications within each VM.
Absence of support for real-time inter-VM interactions
Unpredictability of current VM scheduling mechanisms
Low latency
Independent of other domains’ workloads
Predictable
Shared Memory Research
XenSocket (Xiaolan Zhang, Suzanne McIntosh)
Shared memory between two domains
One-way communication pipe
Below socket layer
Bypasses the TCP/IP stack
No auto discovery, no migration support, no transparency
Standard INET sockets:
Server: socket(); bind(sockaddr_inet); listen(); accept();  (needs local port #)
Client: socket(); connect(sockaddr_inet);  (needs remote address, remote port #)

XenSocket:
Server: socket(); bind(sockaddr_xen);  (needs remote VM #; the system returns a grant # for the client)
Client: socket(); connect(sockaddr_xen);  (needs remote VM #, remote grant #)
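In C, the two call sequences line up as below. This is a hedged sketch: the AF_XEN constant and the struct sockaddr_xen layout are illustrative assumptions, not the exact definitions from the XenSocket code.

```c
/* Illustrative XenSocket-style call sequence; AF_XEN and struct
 * sockaddr_xen are assumed placeholders, not the real definitions. */
#include <stdint.h>
#include <sys/socket.h>

#define AF_XEN 21                       /* hypothetical family number */

struct sockaddr_xen {
    sa_family_t sxen_family;            /* AF_XEN */
    uint16_t    sxen_domid;             /* peer VM # */
    uint32_t    sxen_gref;              /* grant # (client side) */
};

int main(void)
{
    /* Server: bind toward the peer VM; the kernel sets up the shared
     * ring and returns a grant # to hand to the client out of band. */
    int s = socket(AF_XEN, SOCK_STREAM, 0);
    struct sockaddr_xen srv = { .sxen_family = AF_XEN, .sxen_domid = 2 };
    bind(s, (struct sockaddr *)&srv, sizeof(srv));

    /* Client (in the peer VM): connect with the VM # and grant #. */
    int c = socket(AF_XEN, SOCK_STREAM, 0);
    struct sockaddr_xen dst = { .sxen_family = AF_XEN,
                                .sxen_domid = 1, .sxen_gref = 42 };
    connect(c, (struct sockaddr *)&dst, sizeof(dst));
    return 0;
}
```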
XWay (Kangho Kim, Cheiyol Kim)
Bi-directional communication
Transparent to applications
Below socket layer
Significant kernel modifications, no migration support, TCP only
[Figure: XWay channel between Domain A and Domain B. Each side has a send queue (SQ) and a receive queue (RQ) with head/tail pointers in shared memory, plus an event channel for signaling]
IVC (Wei Huang, Matthew Koop)
IVC library providing efficient intra-physical-node communication through shared memory
Provides auto discovery and migration support
User transparency and kernel transparency are not fully supported; only the MPI protocol is supported
IVC consists of two parts:
A user-space communication library
A kernel driver
Uses a general socket-style interface.
MMNet (Prashanth Radhakrishnan, Kiran Srinivasan)
Maps in the entire physical memory of the peer VM
Zero copy between guest kernels
On-the-fly setup/teardown of channels not supported
In their model, VMs need to fully trust each other, which is not practical.
XenLoop (Jian Wang, Kartik Gopalan)
Enables direct traffic exchange between co-located VMs
Transparency for Applications and Libraries
Kernel Transparency
Automatic discovery of co-located VMs
On-the-fly setup/teardown XenLoop channels
Migration transparency
XenLoop Architecture
[Figure: in each VM, applications sit on the socket, transport, and network layers; a XenLoop layer below the network layer uses a netfilter hook to capture and examine outgoing packets. Co-located VMs exchange data through two lockless producer/consumer FIFO circular buffers (FIFO A-to-B and FIFO B-to-A) plus a one-bit bidirectional event channel used to notify the other endpoint that data is available in the FIFO. A domain-discovery module in Domain 0 identifies co-located VMs; traffic to non-co-located hosts still flows through netfront and the software bridge in Domain 0]
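The netfilter interception can be sketched as a kernel hook like the one below. This uses the modern Linux hook signature and is an illustration of the technique, not XenLoop's actual source; is_colocated() and fifo_send() are placeholders for XenLoop's guest lookup and FIFO channel code.

```c
/* Sketch of a XenLoop-style netfilter hook that steals packets destined
 * for a co-located guest. Placeholder helpers, not XenLoop's source. */
#include <linux/types.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <linux/skbuff.h>

bool is_colocated(__be32 daddr);        /* placeholder: co-located VM lookup */
void fifo_send(struct sk_buff *skb);    /* placeholder: copy into shared FIFO */

static unsigned int xenloop_out_hook(void *priv, struct sk_buff *skb,
                                     const struct nf_hook_state *state)
{
    struct iphdr *iph = ip_hdr(skb);

    if (is_colocated(iph->daddr)) {
        fifo_send(skb);                 /* bypass netfront/netback */
        return NF_STOLEN;               /* XenLoop now owns the skb */
    }
    return NF_ACCEPT;                   /* normal path through Domain 0 */
}

static struct nf_hook_ops xenloop_ops = {
    .hook     = xenloop_out_hook,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_POST_ROUTING,   /* last hook below the IP layer */
    .priority = NF_IP_PRI_FIRST,
};
/* Registered with nf_register_net_hook(&init_net, &xenloop_ops). */
```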
                       XenSocket      XWay           IVC             MMNet        XenLoop
User Transparent       X              √              X               √            √
Kernel Transparent     √              X              X               √            √
Transparent            X              X              Not fully       X            √
Migration Support                                    transparent
Standard protocol      X              Only TCP       Only MPI or     √            √
support                                              app protocols
Auto VM Discovery      X              X              √               √            √
& Conn. Setup
Complete memory        √              √              √               X            √
isolation
Location in            Below socket   Below socket   User library    Below IP     Below IP
Software Stack         layer          layer          + syscalls      layer        layer
Copying Overhead       2 copies       2 copies       2 copies        2 copies     4 copies
                                                                                  at present
Scheduler Optimization Research
Preferentially scheduling communication-oriented domains
Introduces short-term unfairness
Performance vs. fairness
Address inter-VM communication characteristics
Sriram Govindan, Arjun R. Nath
Prefer the VM with the most pending network packets
Both to be sent and received
Predict pending packets
Receive prediction
Send prediction
Fairness
Still preserve reservation guarantees over a coarser time scale (PERIOD); a minimal sketch of this policy follows.
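The sketch below illustrates the selection policy just described: prefer the runnable domain with the most predicted pending packets, but skip any domain whose reservation for the current PERIOD is exhausted. Field and function names are illustrative, not the authors' implementation.

```c
/* Sketch: network-aware pick with coarse-grained fairness. A domain
 * that has spent its reservation is skipped until the PERIOD resets. */
struct dom {
    int pending;        /* predicted packets to send + receive */
    int credits_left;   /* reservation remaining in this PERIOD */
};

static int pick_next(const struct dom *doms, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (doms[i].credits_left <= 0)
            continue;   /* preserve the reservation guarantee */
        if (best < 0 || doms[i].pending > doms[best].pending)
            best = i;
    }
    return best;        /* -1: everyone exhausted; wait for PERIOD reset */
}
```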
Packet Reception
[Figure: guest domains Domain 1 through Domain n above the hypervisor. A packet arrives at the NIC and raises an interrupt; the hypervisor increments Domain0.pending and schedules Domain0. Domain0 demultiplexes the packet, decrementing Domain0.pending and incrementing Domain1.pending; Domain 1 is then scheduled and Domain1.pending is decremented]
Diego Ongaro, Alan L. Cox
Boosting I/O domains
Used when an idle domain is sent a virtual interrupt
Run-queue ordering
Within each state, sorts domains by credits remaining
Tickling too soon
Don’t tickle while sending virtual interrupts
Hwanju Kim, Hyeontaek Lim
Use task info to determine whether a domain that gets an event notification is I/O-bound
Give the domain a partial boost if it is I/O-bound.
Partial boosting
A partially boosted VCPU can preempt a running VCPU and handle the pending event.
Whenever it is inferred to be non-I/O-bound, the VMM revokes the CPU from the partially boosted VCPU.
Use correlation information to predict whether an event is directed at I/O tasks
Block I/O
Network I/O
Jian Wang, Kartik Gopalan
[Figure: scheduling timeline over time slices t1 to t4: Dom1 and other domains (DomX) occupy the early slices, so Dom2 cannot get a time slice as early as it needs]
[Figure: within one time slice (30 ms), Dom1 hands the remainder of its slice to Dom2 (one-way AICT), or the slice passes Dom1 to Dom2 and back to Dom1 (two-way AICT)]
Basic Idea
Donate unused time slices to the target domain
Proper Accounting
When the source domain donates a time slice to the target guest, charge credits to the source domain instead of the target domain, as sketched below.
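A minimal sketch of the accounting rule: when the source domain donates the remainder of its slice, the credits are charged to the donor rather than the target. The structure and helper are illustrative, not the Xen credit scheduler's internals.

```c
/* Sketch: time-slice donation with donor-side accounting. */
struct dom_acct {
    int credits;                        /* credits left this period */
};

void run_for(struct dom_acct *d, int ms);   /* placeholder dispatcher */

static void donate_slice(struct dom_acct *src, struct dom_acct *dst,
                         int remaining_ms)
{
    src->credits -= remaining_ms;       /* charge the source domain */
    run_for(dst, remaining_ms);         /* target runs on donated time */
}
```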
Real-time Guarantee
Coordinate with the guest scheduler
Compositional VM systems
[Figure: a three-tier pipeline of Web Server (Dom1), Application Server (Dom2), and Database Server (Dom3)]
For co-located inter-VM communication:
Shared memory greatly improves performance
Optimizing the scheduler brings significant benefits
Thank You.
Questions?
Backup slides
XenLoop Performance
[Figure: Netperf UDP_STREAM throughput results]
XenLoop Performance (contd.)
Migration Transparency
[Figure: throughput over time as the VMs are co-located, then separated, then separated again]
Future Work
Compatibility with routed-mode Xen setup
Implemented. Under testing.
Packet interception b/w socket and transport layers
Do this without changing the kernel.
Will reduce 4 copies to 2 (as others), significantly improving bandwidth performance.
XenLoop for Windows guests?
[Figure: a Windows guest and a Linux guest connected by a XenLoop channel]
XenLoop architecture is mostly OS-agnostic.