CSE 543 - Computer Security (Fall 2004)


CSE 598c Virtual Machines
“Diagnosing Performance Overheads in the Xen Virtual Machine
Environment”
Aravind Menon, Jose Renato Santos, Yoshio Turner, G. Janakiraman, Willy Zwaenepoel
Lisa Johansen
March 13, 2006
CSE598c - Virtual Machines - Spring 2006 - Diagnosing Performance Overheads in the Xen Virtual Machine Environment
Motivation
• Performance of an application in a VM environment is affected by:
– The operating system
– Other processes
– The underlying VMM
– Other VMs
• We want a way to measure the factors that affect
performance in a virtual machine environment (Xen)
Outline
• Overview of Statistical Analysis in VMs
• Xenoprof
• Performance Debugging
• Performance Overhead Analysis in Xen
Issues in VM Statistical Analysis
[Figure: a traditional OS stack (processes P1 to P4 on a single kernel over the hardware) beside a virtualized stack (several VMs, each with its own processes and kernel, running on a VMM over the hardware)]
• Profiling spans several VMs, much like distributed computing
– It therefore requires distributed profiling
• VMs don’t have access to hardware performance events
– Only the VMM does
Xenoprof
• To handle distributed profiling, each VM runs its own
instance of OProfile to profile itself
• To monitor the hardware, Xenoprof accepts
hypercalls from OProfile and returns samples through
interrupts
[Figure: OProfile runs in Dom0 and in each guest domain, above that domain's kernel and processes; Xenoprof runs inside the VMM, directly above the hardware]
How it works
• Each profiling domain queries Xenoprof to find
out whether it should be the initiator
– If multiple domains are profiled, Dom0 must be the initiator
• The initiator collects the profiling requirements from the
participating domains and forwards them to
Xenoprof
• Xenoprof collects program counter (PC) samples
according to those requirements
• The samples are then passed to each domain’s OProfile
instance, which maps them to the correct process
• Individual or system-wide performance can then be
determined
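The coordination steps above can be sketched as a toy Python model. This is not Xenoprof's or OProfile's actual interface: the names DomainProfiler, choose_initiator and run_session are hypothetical, the samples the VMM would record on counter overflow are simply passed in as (domain, PC) pairs, and each domain's symbol table is assumed to be a list of address ranges.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DomainProfiler:
    """Hypothetical stand-in for the OProfile instance running in one domain."""
    name: str
    events: set[str]                     # hardware events this domain asks to sample
    symbols: list[tuple[int, int, str]]  # assumed (start_pc, end_pc, "process:function")
    samples: list[int] = field(default_factory=list)

    def attribute(self) -> Counter:
        """Map each sampled PC to the process/function whose range contains it."""
        counts: Counter = Counter()
        for pc in self.samples:
            label = next((s for lo, hi, s in self.symbols if lo <= pc < hi), "unknown")
            counts[label] += 1
        return counts


def choose_initiator(profilers: list[DomainProfiler]) -> DomainProfiler:
    """A lone profiling domain initiates; with several domains, Dom0 must."""
    if len(profilers) == 1:
        return profilers[0]
    for p in profilers:
        if p.name == "dom0":
            return p
    raise ValueError("Dom0 must take part when several domains are profiled")


def run_session(profilers: list[DomainProfiler], pc_samples: list[tuple[str, int]]):
    """One hypothetical session; pc_samples stands in for what the VMM records."""
    initiator = choose_initiator(profilers)
    # The initiator gathers every participant's requirements and forwards them.
    requested = set().union(*(p.events for p in profilers))
    # The VMM delivers each sample to the domain that was running when it was taken.
    by_name = {p.name: p for p in profilers}
    for domain, pc in pc_samples:
        if domain in by_name:
            by_name[domain].samples.append(pc)
    # Each domain attributes its own samples; merging them gives a system-wide view.
    per_domain = {p.name: p.attribute() for p in profilers}
    system_wide: Counter = Counter()
    for name, counts in per_domain.items():
        system_wide.update({f"{name}/{label}": n for label, n in counts.items()})
    return initiator.name, requested, per_domain, system_wide
```

In this sketch, sampling happens centrally while symbol resolution happens inside each domain, since only the domain knows which of its processes a PC belongs to.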
Performance Debugging - Networking
• The motivating example was a comparison of
receiver throughput between Linux
and XenoLinux
– Varying the size of the user-level buffer greatly affects
XenoLinux throughput. Why?
• Using Xenoprof they found:
– The XenoLinux kernel was the source of the increase in
execution time
– skb_copy_bits, skbuff_ctor, and tcp_collapse were the
culprit functions
• The extra time is spent defragmenting socket buffer memory,
much of which is taken up by empty buffer space
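A toy model can make this tcp_collapse behaviour concrete. It is a hypothetical simplification, not the Linux algorithm: each queued packet is assumed to be charged its allocated buffer size against a receive-buffer limit, the queue is compacted into tightly packed collapse_buf-byte buffers whenever that charge exceeds the limit, and the application is assumed to drain the whole queue once read_size bytes of payload are waiting. All parameter names and sizes are assumptions for illustration.

```python
def simulate_receive(num_packets: int, payload: int, alloc_size: int,
                     read_size: int, rcvbuf_limit: int, collapse_buf: int = 4096):
    """Hypothetical model; returns (number of compactions, payload bytes copied)."""
    queue: list[tuple[int, int]] = []  # (payload bytes, bytes charged to the limit)
    collapses = copied = 0
    for _ in range(num_packets):
        # Each arriving packet is charged its allocated size, not its payload.
        queue.append((payload, alloc_size))
        if sum(charged for _, charged in queue) > rcvbuf_limit:
            # Compact: copy all queued payload into tightly packed buffers.
            total = sum(p for p, _ in queue)
            full, rest = divmod(total, collapse_buf)
            queue = [(collapse_buf, collapse_buf)] * full + ([(rest, rest)] if rest else [])
            copied += total
            collapses += 1
        if sum(p for p, _ in queue) >= read_size:
            queue.clear()  # the application copies everything queued into its buffer
    return collapses, copied
```

With 1000-byte payloads in 4096-byte buffers, a 16384-byte limit and a 10000-byte read size, ten packets trigger two compactions that copy 13000 bytes; with a 3000-byte read size the same packets trigger none. In this model, a larger user-level read lets more over-allocated buffers accumulate before they are drained, which is one way the buffer size could drive the extra copying.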
Performance Overhead Evaluation
• Given this new tool, apply it to determine
performance overheads
– Specifically in network communication, an
important element of VM workloads
• Evaluate:
– Receiver workload
– Sender workload
– Web server workload
• In three configurations:
– Xen-domain0 (application runs in Dom0)
– Xen-guest0 (guest and Dom0 on the same CPU)
– Xen-guest1 (guest and Dom0 on different CPUs)
Receiver workload
• Domain0
– Degraded performance compared to Linux
– Instruction and data TLB misses are much more
frequent than in Linux (the primary cause)
• Possibly due to TLB flushing or an increase in working set size
– Instruction cost is higher in XenoLinux due to overheads
within Xen
• Guest0 & Guest1
– Degraded performance compared to Dom0
– Significant increase in instructions executed
• Due to page remapping and transfer from Dom0 to the DomUs
– More L2 cache misses, caused by the larger working
set size
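Raw event counts are hard to compare across Linux, Dom0 and guests when each configuration moves a different amount of data. A minimal sketch of one possible normalization, assuming per-configuration event totals are already available (for example from Xenoprof) under hypothetical labels such as itlb_miss: divide each count by the work done (packets or bytes), then take ratios against a baseline configuration. The function names and event labels are hypothetical.

```python
def per_unit(counts: dict[str, float], units: float) -> dict[str, float]:
    """Normalize raw event counts by the amount of work done (e.g. packets received)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return {event: n / units for event, n in counts.items()}


def relative_to(baseline: dict[str, float], config: dict[str, float]) -> dict[str, float]:
    """Ratio of each normalized event rate to the baseline (1.0 means no difference)."""
    return {
        event: config[event] / rate
        for event, rate in baseline.items()
        if event in config and rate != 0
    }
```

Normalizing first means a configuration with lower throughput is not credited with fewer misses simply because it did less work.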
Sender workload
• Domain0
– No throughput difference compared to Linux
• Guest0
– Large throughput degradation due to high instruction
cost (maximum 706 Mb/s compared to 3764 Mb/s)
– The TCP stack processes more packets than in
Dom0 to transfer the same amount of data
• Due to the lack of TCP segmentation offload (TSO) support
• The guest also computes checksums over large payloads itself
• The driver domain model prevents these operations from being offloaded to the
physical interface
• When the same offload capabilities are disabled for Dom0, it shows similar results
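Two pieces of arithmetic behind this slide, sketched in Python under stated assumptions. segments_to_process counts the segments the TCP stack itself must handle for a transfer: one per MSS without TSO, or one per large TSO segment (assumed here to be about 64 KB) when the NIC does the splitting. internet_checksum is the standard RFC 1071 one's-complement checksum, the per-byte work a guest performs in software when checksum offload is unavailable. The function names and the 1448-byte MSS in the note below are assumptions.

```python
from typing import Optional


def segments_to_process(total_bytes: int, mss: int, tso_max: Optional[int] = None) -> int:
    """Segments the TCP stack handles; with TSO it hands tso_max-byte chunks to the NIC."""
    unit = tso_max if tso_max else mss
    return -(-total_bytes // unit)  # integer ceiling division


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For a 1,000,000-byte transfer with a 1448-byte MSS, the stack handles 691 segments without TSO and 16 with 65536-byte TSO segments, and it must checksum every byte itself unless the NIC does. For a payload smaller than one MSS, both cases give a single segment, so TSO saves nothing.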
Web server workload
• Overall, results are very similar to the receiver and sender workloads
• Domain0
– Higher TLB miss rate than Linux
• Guest0
– Higher instruction costs
– Highest L2 cache miss rates
– Highest computational overhead
– TSO does not matter because the payloads are small
• Guest1
– Higher instruction costs
– Higher L2 cache miss rates
– TSO does not matter because the payloads are small
Conclusion
• Xenoprof is a tool for examining performance within Xen
• Xenoprof has been used to examine the factors that
affect network communication performance in
Xen
• It can also be used to evaluate other aspects of
performance in Xen