Transcript - VMware Labs

Storage Virtualization
Discussion led by Larry Rudolph
MIT IAP 2010 The Science of Virtualization
Virtualize Storage device
 Guest OS Device Driver for Storage
 Device is emulated
 Virtual Disk is emulated as a single file on the real file system
[Diagram: Guest OS (guest disk, device driver) -> device emulation in the VMM, which maps the virtual disk to a file on the physical disk -> host device driver -> real disk]
 Simple
• Use direct file access
• Guest issues block reads & writes
• Guest issues some other random stuff, but it is just detail
 Details, details, details
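To make "map virtual disk to a file" concrete, here is a minimal sketch in Python (purely illustrative, not VMware's device emulation): a guest block read or write is translated into a read or write at the corresponding byte offset of a single backing file. The class name and the 512-byte block size are assumptions.

```python
import os

BLOCK_SIZE = 512  # assume the guest issues 512-byte block I/O

class FlatVirtualDisk:
    """A virtual disk emulated as a single file on the host file system."""

    def __init__(self, path, num_blocks):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, num_blocks * BLOCK_SIZE)  # set the disk's logical size

    def read_block(self, block_num):
        # A guest "read block N" becomes a read at byte offset N * BLOCK_SIZE.
        return os.pread(self.fd, BLOCK_SIZE, block_num * BLOCK_SIZE)

    def write_block(self, block_num, data):
        # Likewise, a guest write lands at the corresponding offset in the backing file.
        assert len(data) == BLOCK_SIZE
        os.pwrite(self.fd, data, block_num * BLOCK_SIZE)
```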
VM’s file system size?
 Does each virtual disk have a fixed size?
• If it is a file on the host file system, then how do we do that?
 Should all the space be preallocated?
 What does the Guest OS tell the user?
• How much space is left on the disk?
 Guest OS makes use of a buffer cache
• Cache of blocks already read, plus read-ahead
• Host OS also maintains a block cache
• Silly to have multiple copies
• What if there are several VMs?
• Host block cache may thrash
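The preallocation question can be illustrated by the two ways a host could create the backing file; this is a hedged sketch (function name, block size, and the zero-fill approach are assumptions, not any product's behavior). A sparse file lets the guest "see" more space than the host has actually committed, which is exactly why "how much space is left?" becomes ambiguous.

```python
import os

BLOCK_SIZE = 512

def create_backing_file(path, num_blocks, preallocate=False):
    """Back a virtual disk with either a fully allocated or a sparse host file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    size = num_blocks * BLOCK_SIZE
    if preallocate:
        # Reserve every host block up front (in chunks for a real disk):
        # what the guest sees then matches host reality.
        os.write(fd, b"\0" * size)
    else:
        # Sparse file: the logical size is set now, but host blocks are only
        # allocated when the guest first writes them, allowing over-commitment.
        os.ftruncate(fd, size)
    return fd
```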
Guest OS does disk I/O scheduling
 Guest thinks it has a physical disk attached
 Guest schedules I/O to optimize arm movement
• Is this a problem? Does it matter?
• Guest does not know the truth
 Guest issues a command to the virtual disk via its disk driver
 Captured by the VMM’s emulated device
 Command (translated &) issued to the real device
 Result returned back up to the Guest OS, which issues the next command
• But it might be too late
Optimization: simple sharing
 Assume all n VMs running same version of Windows XP
• Do not need to keep n copies of Windows read-only code
• Can use virtualization to save disk space
• How? Virtualization is a level of indirection; just like shared pages
• How can this be done efficiently?
 What about other blocks?
• COW – copy on write
 Can / should storage be shared?
• Can host “see” files in the guest?
• What if guest is not running?
• Think about how virus protection works
• Can guest “see” files on the host?
• What happened to Isolation?
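A copy-on-write delta over a shared base image can be sketched as follows (illustrative Python, not VMware's disk format): all n VMs read common read-only blocks from one shared base, and each VM's first write to a block creates a private copy.

```python
class CowDisk:
    """Copy-on-write virtual disk: a shared read-only base plus a per-VM delta."""

    def __init__(self, base_blocks):
        self.base = base_blocks   # shared by all VMs, e.g. the common Windows XP image
        self.delta = {}           # private to this VM: block number -> data

    def read_block(self, n):
        # A privately written copy wins; otherwise fall through to the shared base.
        return self.delta.get(n, self.base[n])

    def write_block(self, n, data):
        # First write to a shared block creates a private copy; the base never changes.
        self.delta[n] = data
```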
Consider VMs in a data center
 Two broad models of storage:
 Local disks vs Storage Area Network (SAN)
 Google, for example, uses the local disk model
• Query sent to lots of servers, each searches its local disk
• To access data on a remote disk, ask its server
 What happens when (virtualized) servers move from one physical machine to another?
• Matters if there is a real connection between the two
SANs: Storage Area Networks
 Not as simple as it seems
 Servers and disks connected to a network
 SANs need not be commodities – they can provide many services
• These services are physically based
 Logical Unit Number – LUN
• Think file system or file volume
• SANs deal with these things
 Protocols: Fibre Channel & SCSI
 What services can storage arrays provide?
• RAID (e.g. use 9 disks to provide fault tolerance)
• Lots of other stuff, but at LUN granularity
SAN Quality of Service – Multiqueue
 Many VMs sharing one physical connection
 If the VMM has only a single queue, then one VM can dominate, leaving another VM starved for service
• Want to provide quality of service to all VMs
 Potential solution
• Multiple queues
• VMM does appropriate queue popping to maintain QoS
• Best effort vs fairness in scheduling
 Resource sharing
• Bandwidth, time, ports, …
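One way to realize the multiple-queue idea is weighted round-robin over per-VM queues, sketched below (illustrative; the class name and share values are assumptions, not VMware's scheduler): each VM gets its own queue, and the VMM pops at most a VM's share of requests per round, so a flood from one VM cannot starve the others.

```python
from collections import deque

class MultiQueueScheduler:
    """Per-VM I/O queues with weighted round-robin dispatch."""

    def __init__(self, shares):
        # shares: vm_id -> relative QoS weight
        self.shares = shares
        self.queues = {vm: deque() for vm in shares}

    def submit(self, vm, request):
        self.queues[vm].append(request)

    def next_round(self):
        # Pop up to shares[vm] requests from each VM's queue per round.
        batch = []
        for vm, q in self.queues.items():
            for _ in range(self.shares[vm]):
                if not q:
                    break
                batch.append(q.popleft())
        return batch

# Example: VM "a" gets twice the service of VM "b".
sched = MultiQueueScheduler({"a": 2, "b": 1})
```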
De-duplication
 Why store multiple copies of the same block? Just use pointers
 Good idea, let’s do it in S/W
 Details?
• Deletion?
• Migration?
• Backup?
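A software sketch of block-level de-duplication (illustrative only): index blocks by a content hash, keep one stored copy per digest, and make per-disk block pointers reference it. The reference counting hints at why deletion is one of the tricky details.

```python
import hashlib

class DedupStore:
    """Block store that keeps a single copy of identical blocks."""

    def __init__(self):
        self.blocks = {}   # digest -> (data, refcount)
        self.index = {}    # (disk_id, block_num) -> digest

    def write(self, disk_id, block_num, data):
        if (disk_id, block_num) in self.index:
            self.delete(disk_id, block_num)          # drop the reference to the old contents
        digest = hashlib.sha256(data).hexdigest()
        stored, refs = self.blocks.get(digest, (data, 0))
        self.blocks[digest] = (stored, refs + 1)     # duplicate data is stored only once
        self.index[(disk_id, block_num)] = digest    # the "pointer"

    def delete(self, disk_id, block_num):
        # Free the shared copy only when its last reference disappears.
        digest = self.index.pop((disk_id, block_num))
        data, refs = self.blocks[digest]
        if refs <= 1:
            del self.blocks[digest]
        else:
            self.blocks[digest] = (data, refs - 1)
```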
Snapshots (versioning)
 Application consistent versioning
• VMM must flush all block caches / buffers
• Guest OS must flush all block caches / buffers
• Guest Application must flush all block caches / buffers
• Everyone must be told when it is safe to proceed
• Who does the new mapping?
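The flush ordering implied above can be written down explicitly; this is a hedged sketch with made-up app/guest_os/vmm/disk interfaces (not a real API): quiesce from the top of the stack down, freeze the block map, then tell everyone it is safe to proceed.

```python
def take_consistent_snapshot(app, guest_os, vmm, disk):
    """Application-consistent snapshot: flush top-down, freeze, then resume.

    `app`, `guest_os`, `vmm`, and `disk` are placeholder objects for illustration.
    """
    app.quiesce()        # application flushes its buffers and pauses new writes
    guest_os.sync()      # guest OS flushes its block caches / buffers
    vmm.flush(disk)      # VMM flushes its own caches for the virtual disk
    snapshot = disk.freeze_block_map()   # new writes now go to a fresh COW delta
    # Everyone is told it is safe to proceed, in reverse order.
    vmm.resume(disk)
    guest_os.resume()
    app.resume()
    return snapshot
```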
Atomicity – locking (reservations)
 There are times when it is desirable to lock the whole file system
• Why?
 SANs provide locking on a LUN basis
Replication
 Want to maintain two copies of the file system
• Geographic separation in case of disaster
• What other reasons?
 SANs provide this on a LUN basis
• Multicast the write-block messages
 What are the problems with replication at the file system level?
Disaster Recovery
 Replication
• Asynchronous
• Synchronous (higher latency)
 Fail-over Mechanism
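The synchronous/asynchronous trade-off can be sketched at the write path (illustrative Python; `local` and `remote` stand in for two LUN-granularity copies): synchronous replication acknowledges a write only after the remote copy has it, while asynchronous replication acknowledges immediately and ships the write in the background, so the remote copy may lag at fail-over time.

```python
import queue
import threading

class ReplicatedLun:
    """Keep a second copy of every block write, synchronously or asynchronously."""

    def __init__(self, local, remote, synchronous=True):
        self.local, self.remote = local, remote      # e.g. dicts: block number -> data
        self.synchronous = synchronous
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write_block(self, n, data):
        self.local[n] = data
        if self.synchronous:
            self.remote[n] = data        # higher latency: wait for the remote copy
        else:
            self._pending.put((n, data)) # acknowledge now, replicate in the background

    def _drain(self):
        while True:
            n, data = self._pending.get()
            self.remote[n] = data
```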
Thin Provisioning
 Allocate blocks on demand
• Map virtual block to a physical block on some disk in the storage array
• Permits over-commitment of storage
• When usage gets close to real capacity, add more physical disks to the storage array
 Good idea, let’s also do it in S/W
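A minimal thin-provisioning sketch (illustrative, using a plain dict as the block map): the LUN advertises a large logical size, but a physical block is bound to a virtual block only on first write, which is what makes over-commitment possible and why the array must be grown before physical capacity runs out.

```python
class ThinLun:
    """Thin-provisioned LUN: physical blocks are allocated on first write."""

    def __init__(self, logical_blocks, physical_capacity):
        self.logical_blocks = logical_blocks        # size advertised to the VM
        self.physical_capacity = physical_capacity  # blocks actually available today
        self.block_map = {}                         # virtual block -> physical block
        self.next_free = 0

    def write_block(self, vblock, data, backing):
        if vblock not in self.block_map:
            if self.next_free >= self.physical_capacity:
                # Over-commitment has caught up: add physical disks to the array.
                raise RuntimeError("physical capacity exhausted")
            self.block_map[vblock] = self.next_free
            self.next_free += 1
        backing[self.block_map[vblock]] = data

    def read_block(self, vblock, backing):
        pblock = self.block_map.get(vblock)
        return backing[pblock] if pblock is not None else b"\0" * 512  # unwritten reads as zeros
```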
Predictive read and other (common case) optimizations
 There are non-volatile buffers in front of the disk
 Can do read-ahead to avoid latency costs
• E.g. during boot-up, it is easy to predict which blocks are read next
• Buffer is finite, cannot read everything into it
• Assume some logical flow of reads
 The pattern of reads is at the file system level, not the LUN level
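Predictive read-ahead can be sketched as a small cache that, on a miss, fetches the requested block plus the next few, under the assumption of a logical (sequential) flow of reads. The window and capacity values are arbitrary, and eviction is simple FIFO because the buffer is finite.

```python
class ReadAheadCache:
    """On a miss, prefetch the next `window` blocks into a bounded buffer."""

    def __init__(self, disk, window=8, capacity=1024):
        self.disk = disk          # any object with read_block(n)
        self.window = window
        self.capacity = capacity
        self.cache = {}
        self.order = []           # FIFO eviction order

    def read_block(self, n):
        if n not in self.cache:
            # Assume sequential access: fetch n and the next `window` blocks.
            for b in range(n, n + self.window + 1):
                if b not in self.cache:
                    self._insert(b, self.disk.read_block(b))
        return self.cache[n]

    def _insert(self, n, data):
        if len(self.cache) >= self.capacity:
            self.cache.pop(self.order.pop(0))   # the buffer is finite: evict the oldest
        self.cache[n] = data
        self.order.append(n)
```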
vMotion – migrate VM and its storage
 Can migrate VM from server to server while it is still running
 Want to do the same for the (virtual) file system
 Want to migrate live
• Eager or lazy?
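The eager-or-lazy question for live storage migration can be sketched as two strategies (illustrative; real Storage VMotion is more involved): eager copies every block before switching the VM over, while lazy switches over immediately and faults blocks across from the source on first access.

```python
def migrate_eager(src, dst, num_blocks):
    """Eager: copy the whole virtual disk, then switch the VM to the destination."""
    for n in range(num_blocks):
        dst.write_block(n, src.read_block(n))
    return dst

class LazyMigratedDisk:
    """Lazy: switch over immediately; pull blocks from the source on demand."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.copied = set()

    def read_block(self, n):
        if n not in self.copied:                       # first touch: fault it over
            self.dst.write_block(n, self.src.read_block(n))
            self.copied.add(n)
        return self.dst.read_block(n)

    def write_block(self, n, data):
        self.dst.write_block(n, data)                  # new writes go straight to dst
        self.copied.add(n)
```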
Summary
 Storage is a big part of the data center (servers, network, & storage)
 Today, the physical side deals in terms of LUNs, the virtual side in files
• Agree on a standard set of APIs
 VMFS (virtual machine file system)
• Provides APIs to be supported by physical SANs
• Block copy
• Zeroing
• Atomic test & set
• Thin provisioning
• Block release
• …
Storage virtualization for virtual machines
Christos Karamanolis, VMware, Inc.
VMware Products
 “Hosted” Products
• VMware Workstation (WS), VMware
Player – free
• ACE
• VMware GSX Server, VMware Server –
free
 “Bare metal” Product
• VMware ESX Server
• Virtual Center
• Services: HA, DRS, Backup
ESX Server Architecture
 Platform for running
virtual machines (VMs)
 Virtual Machine Monitor
(VMM)
 VMkernel takes full
control of physical
hardware
• Resource management
• High-performance network
and storage stacks
• Drivers for physical hardware
 Service console
• For booting machine and
external interface
vSphere Storage Virtualization overview
Storage virtualization: Virtual Disks
• Virtual disks are presented as SCSI disks to the VM
• Guest OS sees a h/w Host Bus Adapter (HBA)
• BusLogic or LSI Logic
• Proprietary “PVSCSI” adapter (optimized for virtualization)
• HBA functionality emulated by the VMM
• Virtual disks stored on either
• Locally attached disks
• Storage Area Network (SAN)
• Network Attached Storage (NFS, iSCSI)
VMkernel: virtual disks -> physical storage
 Support for virtual disks
• Process SCSI cmds from VM HBA
• Hot-add virtual disks to VMs
• Snapshots (COW): versioning, rollback
 Map virtual disk I/O to physical storage I/O
• Handle physical heterogeneity
• Different phys storage type per virtual disk
 Advanced storage features
• Disk scheduling
• Multipathing, LUN discovery
• Pluggable architecture for 3rd parties
 Device drivers for real h/w
[Diagram: ESX VMkernel storage stack — virtual SCSI (vSCSI) and virtual disk support (COW); virtual disks mapped to physical storage via VMFS, raw LUN, or NFS; core SCSI layer with LUN rescan, disk scheduling, and multipathing; FC and iSCSI device drivers]
VMware File System (VMFS)
 Efficient FS for virtual disks on SANs
• Performance close to native
 Physical storage (LUN) management
• VMFS file systems automatically recognized when LUNs discovered
• Lazy provisioning of virtual disks
 Clustered: sharing across ESX hosts
• Per-file locking
• Uses only the SAN, not the IP network
 Supports mobility & HA of VMs
 Scalability: 100s of VMs on 10s of hosts sharing a VMFS file system
[Diagram: multiple ESX hosts sharing a VMFS volume on a SAN LUN holding 1.vmdk, 2.vmdk, 2.vmdk.redo, and 3.vmdk]
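VMFS's per-file locking across hosts rests on an atomic test & set primitive on an on-disk lock record (atomic test & set also appears in the API list earlier in the talk). Below is an illustrative sketch, with a threading.Lock standing in for the array performing the operation atomically; the lock-record layout and helper names are assumptions, not VMFS's on-disk format.

```python
import threading

class AtomicTestAndSet:
    """Stand-in for an array-offloaded atomic test & set on an on-disk lock record."""

    def __init__(self):
        self._atomic = threading.Lock()   # models the array doing the op atomically
        self._records = {}                # lock-record offset -> current owner

    def test_and_set(self, offset, expected, new):
        with self._atomic:
            if self._records.get(offset) == expected:
                self._records[offset] = new
                return True
            return False

def acquire_file_lock(ats, lock_offset, host_id):
    # Succeeds only if no host currently holds this file's on-disk lock.
    return ats.test_and_set(lock_offset, expected=None, new=host_id)

def release_file_lock(ats, lock_offset, host_id):
    # Succeeds only if this host is the current owner.
    return ats.test_and_set(lock_offset, expected=host_id, new=None)
```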
VMFS performance – sequential read (old graphs)
[Graph: sequential-read workload, throughput in MBps vs block size (4k–64k), physical vs virtual]
 ESX 2.5 on HP Proliant DL 580, 4 processors, 16 GB mem.
 VM/OS: Windows 2003, Ent. Ed., uni-processor, 3.6 GB mem, 4 GB virtual disk on one 5-disk RAID5 LUN, on Clariion CX500
 Testware: IOmeter, 8 outstanding IOs
VMFS performance – random read (old graphs)
[Graph: random-read workload, throughput in MBps vs block size (4k–64k), physical vs virtual]
 ESX 2.5 on HP Proliant DL 580, 4 processors, 16 GB mem.
 VM/OS: Windows 2003, Ent. Ed., uni-processor, 3.6 GB mem, 4 GB virtual disk on one 5-disk RAID5 LUN, on Clariion CX500
 Testware: IOmeter, 8 outstanding IOs
Raw access to LUNs
 Clustering applications in VMs
• Sharing between VMs using SCSI reservations
• Sharing between VMs and physical machines
 In-band control of array features from VMs
• Some disk arrays provide snapshots, mirroring, etc. that is controllable via special SCSI commands
• Similar for access to tape drives
• We provide pass-through mode that allows VM to control this functionality
[Diagram: VMFS volume (1.vmdk) alongside a raw device mapping (4.rdm) to a SAN LUN]
Lower layers of storage stack
 Disk scheduling
• Control share of disk bandwidth to each VM
 Multipathing
• Failover to new path when existing storage path fails
• Transparently provides reliable connection to
storage for all VMs
• Improves portability & ease of management
• No multipathing software required in Guest OS
 LUN rescanning and VMFS file system
discovery
• Dynamically add/remove LUNs on ESX
 Storage Device Drivers
• May need to copy/map guest data to low memory
because of driver DMA restrictions
[Diagram: ESX VMkernel storage stack — vSCSI and virtual disk support (COW); virtual disks mapped to physical storage via VMFS, raw LUN, or NFS; core SCSI layer with LUN rescan, disk scheduling, and multipathing; FC and iSCSI device drivers]
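A bare-bones sketch of the multipathing idea (illustrative, not ESX's pluggable multipathing architecture): issue each command on the active path, and on failure rotate to another path, so the VMs above keep a reliable connection to storage without running any multipathing software themselves.

```python
class MultipathDevice:
    """Issue I/O on the active path; fail over transparently when it breaks."""

    def __init__(self, paths):
        self.paths = paths     # path objects exposing issue(cmd), raising IOError on failure
        self.active = 0

    def issue(self, cmd):
        for _ in range(len(self.paths)):
            try:
                return self.paths[self.active].issue(cmd)
            except IOError:
                # Existing path failed: fail over to the next one and retry.
                self.active = (self.active + 1) % len(self.paths)
        raise IOError("all paths to the LUN are down")
```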
Storage Solutions
Enabling solutions
 ESX’s storage virtualization platform
• VM state encapsulation
• Generic snapshot/rollback tool
• Highly available access to phys storage
• Safe sharing of the phys storage
 Enabler for virtualization solutions
• Ease of management in the data center
• VM High-Availability and mobility
• Protection of VMs: application + data
Future directions: IO resource control
 Typical usage scenario: single-application VM
• Leverage knowledge of application/workload?
• Provisioning, resource management, metadata, format
 Advanced IO scheduling
• Adapt to dynamic nature of SAN, workloads
• Combine with control of CPU, mem, net
• Local resource control vs. global goals
• Target high-level ‘business’ QoS goals
Storage VMotion
• Migrate VM disks
• Non-disruptive while VM is running
• VM granularity, LUN independent
• Can be combined with VMotion
• Uses:
• Upgrade to new arrays
• Migrate to different class of storage (up/down tier)
• Change VM -> VMFS volume mappings
[Diagram: VM disks migrated from LUN A1 and LUN A2 on Array A (off lease) to LUN B1 and LUN B2 on Array B (new)]
Business continuity
 Virtual machine (configuration, disks) encapsulates all state needed for recovery
 Safe VM state sharing on the SAN allows immediate recovery when physical hardware fails -> high availability
• Fail over the VM upon failure; fully automated
• No h/w dependencies, no re-installation, no idle h/w
 Extend this model to non-shared and even remote storage
Consolidated Backup (VCB) Framework
 Backup framework for VMs
 Proxy backup server
• May be in VM or not
• Copy virtual disk off SAN/NAS
 Disk consistency
• Delta disks for consistent disk image
• Opt. Application quiescing (VSS)
 3rd party backup software
• Number of vendors
 Restore virtual disks from backup and start VM
• Use VMware Converter
Disaster Recovery (DR)
Site Recovery Manager (SRM)
• Automates disaster recovery
workflows:
• Setup, testing, failover, failback
• Central management of recovery
plans from Virtual Center
• Manual recovery processes ->
automated recovery plans
• Uses 3rd-party storage replication
• Array-based replication
• Setup replication at VMFS volume
granularity (group of LUNs)
Breaking virtual disk encapsulation?
 Share/clone/manage VM state
• encapsulation vs. manageability
 Opaque virtual disks
• Pros: encapsulation, isolation, generic versioning and
rollback, simple to manage
• Cons: too coarse granularity, proliferation of VM versions,
redundancy, no sharing
 Explicit support for VM “templates”, share VM
boot virtual disks
 VM state-aware FS that allows transparent
sharing (NSDI-06)
Storage trends and challenges
Storage Trends: Converged Data Center Networks
Driving Forces
• need to reduce costs
• need to simplify and
consolidate management
• need to support next-gen
data center architectures
Industry Responses
• converged fabrics
• data center ethernet
• FCoE
• new data center network
topologies
• new rack architectures
• new management paradigms
Storage Trends - Information Growth
Driving Forces
• 60% raw growth
• Shift to rich content vs.
transactions
• Consumed globally and via
mobile
• Longer retention periods
• More use of “archival”
information in business
processes
• Desire to move beyond
tape
Industry Responses
• Increased focus on policy
• new governance functions
• Categorization software
• ILM, archiving, etc.
• Larger disk drives
• 1TB -> 2TB -> 3TB -> 4TB
• Data deduplication
• removing redundancy
• “Content” clouds
• moving information closer to
users
Storage vision
Today:
• Manual configuration and management of physical storage based on vendors’ tools
• Multiple tiers of magnetic media
• Manual mapping of workload/app to storage
• Manual performance optimization
• Disk and tape co-existing
Tomorrow:
• Zero physical storage management
• Manage apps / workloads, not media
• Automatic, dynamic storage tiering
• Scale-out (distributed) storage architectures based on commodity h/w
• Policy-based automated placement and movement of data
Towards the vision
Storage awareness and integration
[Diagram: VMware Infrastructure / Virtual Datacenter OS from VMware — vServices (vCompute, vStorage, vNetwork, vCloud) built on infrastructure from storage partners]
 Storage operations
• VMFS
 Storage management
• Storage VMotion
• Storage Virtual Appliances
• vStorage APIs
• VM storage management
• Linked Clones
• Thin Provisioning
• VMDirectPath
Consolidated I/O Fabrics
 Common storage and network
• Built on low-latency, lossless 10GbE
• FCoE, iSCSI, NAS
• Both cost-effective bandwidth and increased bandwidth per VM
 Simplified infrastructure and management
• Simplify physical infrastructure
• Unify LAN, SAN and VMotion networks
• Scale VM performance to 10GbE
[Diagram: separate LAN, SAN A/B, and VMotion networks consolidated onto one converged fabric]
Scale-out storage architectures
[Diagram: ESX hosts as clients of a scale-out storage cluster built from white-box storage h/w over a fabric. A Virtual Name Space (VNS) hides object placement details; access is via an asymmetric protocol (e.g. pNFS); management is single-pane, with per-object policy specification and enforcement. The storage cluster provides object mobility, availability, reliability, snapshots, thin provisioning, dedup, etc.; storage containers provide different tiers of storage with different capabilities; provisioning and tiering are automated and adaptive.]