
Journées Informatiques de l'IN2P3
17-20 May 2010, Aussois, France
P. Mato / CERN

Brief introduction to Virtualization
◦ Taxonomy
◦ Hypervisors

Uses of Virtualization
The CernVM project
◦ Application Appliance
◦ Specialized file system

CernVM as a job hosting environment
◦ Clouds, Grids and Volunteer Computing

Summary


Credit for bringing virtualization into computing goes to IBM
IBM VM/370, a reimplementation of CP/CMS, was made available in 1972
◦ added virtual memory hardware and operating systems to the System/370 series

Even in the 1970s anyone with any sense could see
the advantages virtualization offered
◦ It separates applications and OS from the hardware
◦ In spite of that, VM/370 was not a great commercial success

The idea of abstracting computer resources continued
to develop

Virtualization of system resources such as:
◦ Memory virtualization
 Aggregates RAM resources from networked systems
into a virtualized memory pool
◦ Network virtualization
 Creation of a virtualized network addressing space
within or across network subnets
 Using multiple links combined to work as though
they offered a single, higher-bandwidth link
◦ Virtual memory
 Allows uniform, contiguous addressing of physically
separate and non-contiguous memory and disk areas
◦ Storage virtualization
 Abstracting logical storage from physical storage
 RAID, disk partitioning, logical volume management

This is what most people today identify with the term
“virtualization”
◦ Also known as server virtualization
◦ Hides the physical characteristics of the computing platform
from the users
◦ Host software (hypervisor or VMM) creates a simulated
computer environment, a virtual machine, for its guest OS
◦ Enables server consolidation

Platform virtualization approaches
◦ Operating system-level virtualization
◦ Partial virtualization
◦ Paravirtualization
◦ Full virtualization
◦ Hardware-assisted virtualization


Virtual machine simulates enough hardware to allow an
unmodified "guest" OS to run in isolation
A key challenge for full virtualization is the interception
and simulation of privileged operations
◦ The effects of every operation performed within a given virtual
machine must be kept within that virtual machine
◦ The instructions that would "pierce the virtual machine"
cannot be allowed to execute directly; they must instead be
trapped and simulated.
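A minimal, purely conceptual Python sketch of the trap-and-simulate idea (not the code of any real VMM; the instruction names and VirtualCPU state are invented for illustration):

    # Privileged guest instructions are trapped and emulated against
    # per-VM virtual state; everything else runs directly.
    PRIVILEGED = {"cli", "sti", "hlt", "mov_to_cr3"}   # hypothetical names

    class VirtualCPU:
        def __init__(self):
            self.interrupts_enabled = True
            self.page_table_base = 0
            self.halted = False

    def emulate(vcpu, instr, operand=None):
        """Apply the effect of a privileged instruction to the virtual CPU only."""
        if instr == "cli":
            vcpu.interrupts_enabled = False
        elif instr == "sti":
            vcpu.interrupts_enabled = True
        elif instr == "hlt":
            vcpu.halted = True
        elif instr == "mov_to_cr3":
            vcpu.page_table_base = operand   # switches the guest's page tables only

    def run_guest(instruction_stream):
        vcpu = VirtualCPU()
        for instr, operand in instruction_stream:
            if instr in PRIVILEGED:
                emulate(vcpu, instr, operand)   # instructions that would "pierce" the VM are trapped
            else:
                pass                            # unprivileged code would run directly on the CPU
        return vcpu

    state = run_guest([("cli", None), ("mov_to_cr3", 0x1000), ("add", None)])
    print(state.interrupts_enabled, hex(state.page_table_base))   # False 0x1000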

Examples
◦ Parallels Workstation, Parallels Desktop for Mac, VirtualBox,
Virtual Iron, Oracle VM, Virtual PC, Virtual Server, Hyper-V,
VMware Workstation, VMware Server (formerly GSX Server),
QEMU

To create several virtual servers on
one physical machine we need a hypervisor
or Virtual Machine Monitor (VMM).
◦ The most important role is to arbitrate the
access to the underlying hardware, so that
guest OSes can share the machine.
◦ VMM manages virtual machines (Guest OS
+ applications) like an OS manages
processes and threads.

Most modern operating systems work
with two modes:
◦ kernel mode
 allowed to run almost any CPU instruction, including "privileged"
instructions that deal with interrupts, memory management…
◦ user mode
 allows only instructions that are necessary to calculate and process
data, applications running in this mode can only make use of the
hardware by asking the kernel to do some work (a system call).
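As a small generic illustration (not from the talk) of the user-mode/kernel-mode split, the Python snippet below reaches the hardware only through system calls that the kernel executes on its behalf:

    import os

    # A user-mode process never drives the disk or terminal directly;
    # each call below ends up in a system call handled by the kernel.
    fd = os.open("/tmp/demo.txt", os.O_WRONLY | os.O_CREAT, 0o644)  # open(2)
    os.write(fd, b"hello from user mode\n")                         # write(2)
    os.close(fd)                                                    # close(2)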

A technique that all (software-based)
virtualization solutions
use is ring deprivileging:
◦ the operating system that originally runs
on ring 0 is moved to
another, less privileged ring such as
ring 1.
◦ This allows the VMM to control
the guest OS's access to resources.
◦ It avoids one guest OS kicking
another out of memory, or a
guest OS controlling the
hardware directly.

Virtualization technique that presents a software
interface to virtual machines that is similar but not
identical to that of the underlying hardware.
◦ Guest kernel source code modification instead of binary
translation
◦ Paravirtualization provides specially defined 'hooks' that
allow the guest(s) and host to request and acknowledge these
tasks, which would otherwise be executed in the virtual
domain (where execution performance is worse)
◦ A paravirtualized platform may allow the virtual machine monitor
(VMM) to be simpler (by relocating execution of critical tasks
from the virtual domain to the host domain) and faster

Paravirtualization requires the guest operating system
to be explicitly ported to the para-API
◦ a conventional OS distribution which is not paravirtualization-aware
cannot be run on top of a paravirtualized VMM.
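A purely hypothetical Python sketch of the paravirtualization idea: the guest kernel is modified to call explicit hooks ('hypercalls') offered by the VMM instead of issuing privileged instructions that would have to be trapped. The class and hypercall names are invented and do not correspond to Xen or any real para-API:

    class Hypervisor:
        def __init__(self):
            self.page_tables = {}              # guest_id -> page-table base

        def hypercall_set_page_table(self, guest_id, new_base):
            # The VMM validates and applies the change on the guest's behalf.
            self.page_tables[guest_id] = new_base
            return 0                           # success

    class ParavirtGuestKernel:
        def __init__(self, guest_id, hypervisor):
            self.guest_id = guest_id
            self.hv = hypervisor

        def switch_address_space(self, new_base):
            # Instead of writing CR3 directly (which would trap under full
            # virtualization), the ported kernel asks the VMM explicitly.
            return self.hv.hypercall_set_page_table(self.guest_id, new_base)

    hv = Hypervisor()
    guest = ParavirtGuestKernel(guest_id=1, hypervisor=hv)
    guest.switch_address_space(0x2000)
    print(hv.page_tables)                      # {1: 8192}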

With hardware-assisted virtualization,
the VMM can efficiently virtualize the
entire x86 instruction set by handling
these sensitive instructions using a
classic trap-and-emulate model in
hardware, as opposed to software
◦ System calls do not automatically result
in VMM interventions: as long as system
calls do not involve critical instructions,
the guest OS can provide kernel services
to the user applications.

Intel and AMD came up with distinct implementations of
hardware-assisted x86 virtualization, Intel VT-x and AMD-V,
respectively
Journées Informatiques de l'IN2P3 17-20 May 2010, Aussois, France
P. Mato/CERN
10

Two strategies to reduce total overhead
◦ Total Overhead = Frequency of "VMM to VM" events × Latency of event
◦ Reducing the number of cycles that the VT-x instructions take
 VMentry latency was reduced from 634 cycles (Xeon 70xx) to 352 cycles
(Xeon 51xx, Xeon 53xx, Xeon 73xx)
◦ Reducing frequency of VMM to VM events
 Virtual Machine Control Block contains the state of the virtual CPU(s)
for each guest OS allowing them to run directly without interference
from the VMM.
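A back-of-the-envelope illustration of the overhead formula above, using the VMentry latencies quoted on the slide; the CPU frequency and event rate are made-up example numbers:

    def virtualization_overhead(events_per_second, cycles_per_event, cpu_hz):
        """Fraction of CPU time spent in VMM<->VM transitions."""
        return events_per_second * cycles_per_event / cpu_hz

    cpu_hz = 2.0e9        # assumed 2 GHz core
    events = 50000        # assumed VM entries per second

    for name, latency in [("Xeon 70xx", 634), ("Xeon 51xx/53xx/73xx", 352)]:
        frac = virtualization_overhead(events, latency, cpu_hz)
        print("%s: %.2f%% of CPU time in VMentry" % (name, frac * 100))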

Software virtualization is very mature, but there is
very little headroom left for improvement
◦ Second-generation hardware virtualization (VT-x+EPT and
AMD-V+NPT) is promising
◦ but it is not guaranteed to improve performance across
all applications, due to the heavy TLB-miss cost

The smartest way is to use a hybrid approach like
VMware ESX
 paravirtualized drivers for the most critical I/O
components
 emulation for the less important I/O
 Binary Translation to avoid the high "trap and emulate"
performance penalty
 hardware virtualization for 64-bit guests


Virtual machines can cut time and money out of the
software development and testing process
Great opportunity to test software in a large variety of
‘platforms’
◦ Each platform can be realized by a differently configured
virtual machine
◦ Easy to duplicate the same environment in several virtual machines
◦ Testing installation procedures from a well-defined 'state'
◦ Etc.

Example: Execution Infrastructure in ETICS (a spin-off of
the EGEE project)
◦ Set of virtual machines that run a variety of platforms attached
to an Execution Engine where Build and Test Jobs are executed
on behalf of the submitting users

Installing the "complete software environment" on
the physicist's desktop/laptop [or Grid] to be able
to do data analysis for any of the LHC experiments
is complex and manpower-intensive
◦ In some cases not even possible if the desktop/laptop OS
does not match any of the supported platforms
◦ Application software versions change often
◦ Only a tiny fraction of the installed software is actually used


High cost to support a large number of compiler-platform combinations
The system infrastructure cannot evolve
independently from the evolution of the application
◦ The coupling between OS and application is very strong

Traditional model
◦ Horizontal layers
◦ Independently developed
◦ Maintained by the different
groups
◦ Different lifecycle

Application is deployed
on top of the stack
◦ Breaks if any layer changes
◦ Needs to be certified every
time something changes
◦ Results in a deployment and
support nightmare
Application driven approach
◦ Analyzing application requirements and dependencies
◦ Adding required tools and libraries
◦ Building a minimal OS
◦ Bundling all this into a Virtual Machine image
[Diagram: Virtual Machine stack (Application / Libraries / Tools / Databases / OS)]

Virtual Machine images
should be versioned just like
the applications
◦ Assuring accountability to
mitigate possible negative
aspects of newly acquired
application freedom

Emphasis on the ‘Application’
Virtual Machine
◦ The application dictates the platform,
not the other way around

Application (e.g. simulation) is
bundled with its libraries, services
and bits of OS
◦ Self-contained, self-describing, deployment
ready

What makes the Application ready to run
in any target execution environment?
◦ e.g. Traditional, Grid, Cloud
 Virtualization is the enabling technology

Aims to provide a complete, portable and easy-to-configure user
environment for developing and running LHC data analysis locally
and on the Grid, independently of the underlying software and
hardware platform (Linux, Windows, MacOS)
◦ Code check-out, editing, compilation,
small local tests, debugging, …
◦ Grid submission, data access…
◦ Event displays, interactive data analysis, …
◦ Suspend, resume…


Decouple application lifecycle
from evolution of system infrastructure
Reduce the effort to install, maintain and keep
the experiment software up to date
http://cernvm.cern.ch

R&D Project in CERN Physics Department
◦ Hosted in the SFT Group (http://sftweb.cern.ch/sft )
◦ The same group that takes care of ROOT & Geant4, looks
for common projects and seeks synergy between
experiments

CernVM Project started on 01/01/2007, funded for
4 years
◦ Good collaboration with ATLAS, LHCb and starting with CMS
Starting from experiment software…

…ending with a custom Linux specialised for a given task,
available in many image formats:
◦ Installable CD/DVD
◦ Stub Image
◦ Raw Filesystem Image
◦ Netboot Image
◦ Compressed Tar File
◦ Demo CD/DVD (Live CD/DVD)
◦ Raw Hard Disk Image
◦ VMware® Virtual Appliance
◦ VMware® ESX Server Virtual Appliance
◦ Microsoft® VHD Virtual Appliance
◦ Xen Enterprise Virtual Appliance
◦ Virtual Iron Virtual Appliance
◦ Parallels Virtual Appliance
◦ Amazon Machine Image
◦ Update CD/DVD
◦ Appliance Installable ISO
Every build and every file installed on the system
is automatically versioned and accounted for in a database
1. Login to the Web interface
2. Create a user account
3. Select experiment, appliance flavor and preferences

CernVM defines a common platform that can be used by all
experiments/projects
◦ Minimal OS elements (Just-enough-OS)
◦ Same CernVM virtual image for ALL experiments

It downloads only what is really needed from the experiment
software and puts it in the cache
◦ Does not require a persistent network connection (offline mode)
◦ Minimal impact on the network

CernVM comes with the read-only file system
(CVMFS) optimized for software distribution
◦ Only a small fraction of the experiment software is actually
used (~10%)
◦ Very aggressive local caching, web proxy cache (squids)
◦ Transparent file compression
◦ Integrity checks using checksums, signed file catalog
◦ Operational in off-line mode

No need to install any experiment software
◦ ‘Virtually’ all versions of all applications are already
installed
◦ The user just needs to start using it to trigger the download
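A conceptual Python sketch of the CVMFS idea described above (not the real CVMFS client or its on-wire format): files are fetched over HTTP only on first use, verified against a catalogue checksum, and then served from a local cache, so later accesses also work offline. The URL, path and catalogue entries are placeholders:

    import hashlib, os, urllib.request

    CACHE_DIR = "/tmp/cvmfs-cache"
    REPO_URL = "http://cernvm.example.org/sw"                         # placeholder server
    CATALOG = {"lib/libDemo.so": "d41d8cd98f00b204e9800998ecf8427e"}  # path -> md5 (placeholder)

    def open_software_file(path):
        cached = os.path.join(CACHE_DIR, hashlib.md5(path.encode()).hexdigest())
        if not os.path.exists(cached):                        # download only on first access
            os.makedirs(CACHE_DIR, exist_ok=True)
            data = urllib.request.urlopen(REPO_URL + "/" + path).read()
            if hashlib.md5(data).hexdigest() != CATALOG[path]:
                raise IOError("checksum mismatch for " + path)
            with open(cached, "wb") as f:                     # later (even offline) use hits the cache
                f.write(data)
        return open(cached, "rb")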
[Usage maps: ~1000 different IP addresses and ~2000 different IP addresses]
[Diagram: CernVM clients reaching central HTTP servers through a hierarchy of proxy servers]
Proxy and slave servers could be deployed at strategic locations to
reduce latency and provide redundancy.
Working with the ATLAS & CMS Frontier teams to reuse the already
deployed squid proxy infrastructure.

[Diagram: CernVM clients on a LAN, proxy servers, HTTP servers and a
Content Distribution Network mirroring the central repository across the WAN]
CROWD: a P2P-like mechanism for discovery of nearby CernVMs and cache
sharing between them. No need to manually set up proxy servers (but
they can still be used where they exist).
Use a Content Delivery Network (such as SimpleCDN) to remove a single
point of failure and fully mirror the central distribution to at least
one more site.
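An illustrative sketch of the redundancy idea in the diagrams above: a client tries a nearby replica first and falls back to mirrors and the central server. All host names are hypothetical:

    import urllib.request

    SOURCES = [
        "http://replica.local.example.org",       # nearby cache/replica
        "http://cvmfs-mirror1.example.org",       # CDN / mirror site
        "http://cvmfs-stratum0.example.org",      # central server, last resort
    ]

    def fetch(path):
        last_error = None
        for base in SOURCES:
            try:
                return urllib.request.urlopen(base + "/" + path, timeout=5).read()
            except OSError as err:                # try the next source on failure
                last_error = err
        raise last_error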


Cloud computing is the convergence of three major trends
◦ Virtualization – Applications separated from infrastructure
◦ Utility Computing – Capacity shared across the grid
◦ Software as a Service – Applications available on demand
Commercial Cloud offerings can be integrated for
several types of work, such as simulations or compute-bound applications
◦ Pay-as-you-go model
◦ The question remains whether their data access capabilities can match
our requirements
◦ Good experience from pioneering experiments (e.g. STAR MC
production on Amazon EC2)
◦ Ideal to absorb computing peak demands (e.g. before
conferences)

Science Clouds start to provide compute cycles in the
cloud for scientific communities

CernVM as job hosting
environment on Cloud/Grid
◦ Ideally, users would like to run
their applications on the grid (or
cloud) infrastructure in exactly the
same conditions in which they
were originally developed

CernVM already provides
development environment and
can be deployed on cloud
(EC2)
◦ One image supports all four LHC
experiments
◦ Easily extensible to other
communities


Exactly the same environment for development
(user desktop/laptop), large-scale job execution
(grid) and final analysis (local cluster)
Software can be efficiently installed using CVMFS
◦ An HTTP proxy ensures very fast access to software even if the VM
cache is cleared


Can accommodate multi-core jobs
Deployment on EC2 or alternative clusters
◦ Nimbus, Elastic
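A hedged sketch of starting a CernVM-like worker on EC2, assuming the boto Python library; the AMI id, key pair name and credentials are placeholders, and this is not the project's actual deployment procedure:

    import time
    import boto

    conn = boto.connect_ec2()                  # AWS credentials taken from the environment
    reservation = conn.run_instances(
        "ami-00000000",                        # placeholder: a CernVM batch-node AMI
        min_count=1, max_count=1,
        instance_type="m1.large",
        key_name="my-keypair",                 # placeholder SSH key pair
    )
    instance = reservation.instances[0]
    while instance.update() != "running":      # poll until the VM has booted
        time.sleep(10)
    print("CernVM worker running at", instance.public_dns_name)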

BOINC
◦ Open-source software for volunteer computing and grid
computing (http://boinc.berkeley.edu/ )
◦ Ongoing development to use VirtualBox running CernVM as
a job container
 http://boinc.berkeley.edu/trac/wiki/VirtualBox
◦ Adds the possibility to run unmodified user applications
◦ Better security due to guest OS isolation
[Diagram: BOINC / LHC@HOME with a PanDA Pilot]
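An illustrative sketch of the "VM as a job container" idea (plain Python calling the VirtualBox command line, not the actual BOINC wrapper); the appliance file and VM name are placeholders:

    import subprocess

    APPLIANCE = "cernvm-appliance.ova"         # placeholder appliance image
    VM_NAME = "CernVM-job"

    # Import the appliance and run it headless for the duration of a job.
    subprocess.check_call(["VBoxManage", "import", APPLIANCE,
                           "--vsys", "0", "--vmname", VM_NAME])
    subprocess.check_call(["VBoxManage", "startvm", VM_NAME, "--type", "headless"])
    # ... the job runs inside the guest; when it has finished:
    subprocess.check_call(["VBoxManage", "controlvm", VM_NAME, "poweroff"])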

Cloud computing (IaaS, Infrastructure as a Service)
should enable us to 'instantiate' all sorts of virtual
clusters effortlessly
◦ PROOF clusters for individuals or for small groups
◦ Dedicated Batch clusters with specialized services
◦ Etc.

Turnkey, tightly-coupled cluster
◦ Shared trust/security context
◦ Shared configuration/context information

IaaS tools such as Nimbus would allow one-click
deployment of virtual clusters
◦ E.g. the OSG STAR cluster: OSG head-node (gridmapfiles, host
certificates, NFS, Torque), worker nodes: SL4 + STAR

Virtualization is a broad term that refers to the
abstraction of computer resources
◦ Old technology making a comeback thanks to the breakdown in
frequency scaling and the appearance of multi- and many-core
CPU technology
◦ Enabling vertical software integration
◦ Enabling technology of Cloud computing
◦ Virtualization is here to stay for the foreseeable future

CernVM
◦ A way to simplify software deployment and jump on the
Cloud bandwagon
◦ The user environment is pretty well understood, evolving towards a
job hosting environment (grid, cloud, volunteer
computing)