
OpenNebula in production
The infrastructure
Stefano Lusso – INFN Torino
OpenNebula infrastructure
Providing persistent storage
Storage infrastructure
Network infrastructure
OpenNebula Storage Provisioning
A Datastore is any storage medium used to store
disk images for VMs
$ onedatastore list
 ID  NAME            SIZE    AVAIL  CLUSTER   IMAGES  TYPE  DS     TM
  0  system_servic   1.8T    91%    Services       0  sys   -      shared
  1  default         2T      68%    -             48  img   fs     shared
  2  files           115.1G  22%    -             15  fil   fs     ssh
100  cached_qcow     2T      68%    -             12  img   fs     qcow2
101  cached_raw      2T      68%    -              2  img   fs     shared
103  persistent_da   115.1G  22%    -              3  img   fs     ssh
104  persistent_da   2.2T    45%    -              6  img   fs     ssh
105  persistent_da   2.1T    34%    -             19  img   iscsi  iscsi
106  persistent_da   0M      -      -              7  img   iscsi  iscsi
109  system_worker   -       -      Workers        0  sys   -      ssh
OpenNebula Storage Provisioning
Persistent storage
The storage space should survive the VM lifecycle
Different solutions adopted:
• Shared partition (nfs, GlusterFS…)
• Transferred partition (ssh)
• Block devices (iSCSI)
OpenNebula Storage Provisioning
Persistent storage from shared partitions
The space is located on a dedicated server
A separate infrastructure is needed
The shared partition can be mounted at
contextualization level
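For instance, a start script passed to the VM through contextualization can mount the share at boot; a minimal sketch, assuming a hypothetical NFS server nfs-srv.example exporting /export/data (names are placeholders, not taken from the slides):

#!/bin/bash
# Hypothetical contextualization start script: mount the shared partition at boot.
# Server name, export path and mount point are placeholders.
mkdir -p /data
mount -t nfs nfs-srv.example:/export/data /data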
OpenNebula Storage Provisioning
Persistent storage transferred via ssh
The image is copied via ssh – time consuming
The image is transferred back to the Datastore only after the VM is successfully shut down – inconsistency may occur
OpenNebula Storage Provisioning
Persistent storage via iSCSI
The iSCSI driver provides the possibility of using block devices for VM images instead of the default file form – improved performance
It works with linux tgtd
It can also be made to work with SAN appliances
OpenNebula Storage Provisioning
Datastore example
$ onedatastore show 103
DATASTORE 103 INFORMATION
ID        : 103
NAME      : persistent_data
USER      : oneadmin
GROUP     : oneadmin
CLUSTER   :
TYPE      : IMAGE
DS_MAD    : fs
TM_MAD    : ssh
BASE PATH : /var/lib/one/datastores/103

$ onedatastore show 105
DATASTORE 105 INFORMATION
ID        : 105
NAME      : persistent_data_iscsi
USER      : oneadmin
GROUP     : oneadmin
CLUSTER   :
TYPE      : IMAGE
DS_MAD    : iscsi
TM_MAD    : iscsi
BASE PATH : /var/lib/one/datastores/105
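For reference, a datastore with these drivers can be registered from a template file with onedatastore create; a sketch (attribute values copied from the output above, the template file name is illustrative):

$ cat persistent_data.ds
NAME   = "persistent_data"
DS_MAD = fs
TM_MAD = ssh
$ onedatastore create persistent_data.ds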
OpenNebula Storage Provisioning
Persistent storage example
[Figure: a Head Node VM and Client (WN) VMs sharing persistent storage: /home, size 10GB, and /data, size 4TB]
OpenNebula Storage Provisioning
Persistent storage example
$ onevm show 18082
VM DISKS
 ID  TARGET  IMAGE                   TYPE  SAVE  SAVE_AS
  0  vda     ubuntu-server-14.04-v3  file  NO    -
  1  vdb     raw - 117.2G            fs    NO    -
  3  vdd     Giunti-Home             file  YES   -
  4  vde     Giunti-Data             file  YES   -

$ oneimage show Giunti-Home
IMAGE 446 INFORMATION
ID            : 446
NAME          : Giunti-Home
USER          : giunti
GROUP         : ec2
DATASTORE     : persistent_data_iscsi
TYPE          : DATABLOCK
REGISTER TIME : 07/08 12:05:55
PERSISTENT    : Yes
SOURCE        : iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446
FSTYPE        : xfs
SIZE          : 19.5G
STATE         : used
RUNNING_VMS   : 1
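A persistent datablock like Giunti-Home can be created and attached with the standard CLI; a sketch (size in MB and filesystem type are illustrative, and exact option names may differ between OpenNebula versions):

$ oneimage create --name Giunti-Home --type DATABLOCK --persistent \
    --size 20480 --fstype xfs --datastore persistent_data_iscsi
$ onevm disk-attach 18082 --image Giunti-Home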
OpenNebula Storage Provisioning
About iSCSI
iSCSI
iSCSI (Internet Small Computer System Interface) is a data transport protocol used
to carry block-level data over IP networks
Initiator
Typically a server, host or device driver that initiates (i.e. begins) iSCSI command sequences.
Target
iSCSI targets break down iSCSI command sequences from initiators and process the
SCSI commands.
From the hypervisor perspective (KVM) the iSCSI block device is seen as a local disk (e.g. sd*)
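Outside of OpenNebula, the same initiator workflow can be reproduced by hand with open-iscsi; a sketch, using the portal and IQN shown on the next slide (the commands are standard iscsiadm usage, not taken from the slides):

# discover the targets exported by the portal
iscsiadm -m discovery -t sendtargets -p 192.168.1.202:3260
# log in; the block device then appears under /dev/disk/by-path/
iscsiadm -m node -T iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446 \
    -p 192.168.1.202:3260 --login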
OpenNebula Storage Provisioning
iSCSI initiator on KVM
[root@one-kvm-63 ~]# ls -alh /var/lib/one/datastores/0/18082
total 5.1G
-rw-rw-r-- 1 oneadmin oneadmin 1.1K Aug 6 20:02 deployment.0
-rw-rw---- 1 oneadmin oneadmin 3.7G Sep 7 12:32 disk.0
-rw-rw-r-- 1 oneadmin oneadmin 118G Sep 7 12:17 disk.1
-rw-r--r-- 1 oneadmin oneadmin 384K Aug 6 20:02 disk.2
lrwxrwxrwx 1 oneadmin oneadmin   40 Aug 6 20:02 disk.2.iso -> /var/lib/one/datastores/109/18082/disk.2
lrwxrwxrwx 1 oneadmin oneadmin  117 Aug 7 16:37 disk.3 -> /dev/disk/by-path/ip-192.168.1.202:3260-iscsi-iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446-lun-1
lrwxrwxrwx 1 oneadmin oneadmin  104 Aug 7 16:37 disk.4 -> /dev/disk/by-path/ip-192.168.1.215:3260-iscsi-iqn.2004-04.com.qnap:ts-809u:iscsi.giuntidata.c60347-lun-0

Breakdown of the disk.3 symlink target:
ip-192.168.1.202:3260        ( IP addr target ) ( tcp port 3260 )
iqn.2013-10.org.opennebula   (iSCSI qualified name) (yyyy-mm.naming-authority)
one-iscsi.to.infn.it         (hostname)
vg-one.lv-one-446-lun-1      (LVM partition)
OpenNebula Storage Provisioning
iSCSI target on linux
Dedicated server with tgtd running and storage partition available
scsi3 : mpp virtual bus adapter :version:09.03.0C05.0652,timestamp:Tue Jan 8 05:52:48 CST 2013
scsi 3:0:0:0: Direct-Access     SUN      VirtualDisk      0760  PQ: 0 ANSI: 5
scsi(3:0:0:0): Enabled tagged queuing, queue depth 30.
sd 3:0:0:0: Attached scsi generic sg3 type 0
sd 3:0:0:0: [sda] 4294967296 512-byte logical blocks: (2.19 TB/2.00 TiB)

[root@one-iscsi ~]# blkid | grep sda
/dev/sda: UUID="1sebp8-e22F-RraL-tYp2-1e6N-boD3-NqwgYq" TYPE="LVM2_member"

[root@one-iscsi ~]# fdisk -l /dev/mapper/vg--one-lv--one--446
Disk /dev/mapper/vg--one-lv--one--446: 21.0 GB, 20971520000 bytes
255 heads, 63 sectors/track, 2549 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

[root@one-iscsi ~]# blkid | grep vg--one-lv--one--446
/dev/mapper/vg--one-lv--one--446: UUID="59150b9e-d713-4e71-a3e7-a67215083386" TYPE="xfs"
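A logical volume like the one above can be exported by tgtd through /etc/tgt/targets.conf; a sketch using the IQN shown earlier (the initiator-address restriction is illustrative):

<target iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446>
    # export the LVM logical volume as a LUN
    backing-store /dev/mapper/vg--one-lv--one--446
    # allow only the hypervisor network to connect
    initiator-address 192.168.1.0/24
</target>

$ tgt-admin --update ALL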
OpenNebula Storage Provisioning
iSCSI target on NAS
The iSCSI LUNs are provided by a QNAP TS-809U NAS
Targets are configured by hand
OpenNebula Storage Provisioning
iSCSI target on NAS network restrictions
This QNAP iSCSI NAS has poor granularity in network filtering.
Network restrictions are needed to allow access only to the KVM hypervisors
OpenNebula Infrastructure
Storage Infrastructure
OpenNebula Infrastructure
Storage Infrastructure
Storage requirements:
• Available and performant shared area for service VMs
• Lots of TB for certain applications
Solutions:
• GlusterFS
• NFS
• iSCSI
OpenNebula Infrastructure
GlusterFS
GlusterFS Volume
The volume is the collection of bricks and most of the Gluster file system operations
happen on the volume. Gluster file system supports different types of volumes based
on the requirements. Some volumes are good for scaling storage size, some for
improving performance and some for both.
A Volume can be Distributed, Replicated, Striped, Dispersed in any intelligent
combination.
A volume is typically an ext4/xfs partition mounted on the server
It is also possible to attach a hot tier to a volume (with promotion/demotion policy)
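As a sketch, the replicated VmDir volume shown on a later slide would be created, started and restricted to the cloud network roughly as follows (standard gluster CLI usage; only the names already shown in these slides are reused):

$ gluster volume create VmDir replica 2 \
    one-san-01.to.infn.it:/bricks/VmDir01 one-san-02.to.infn.it:/bricks/VmDir02
$ gluster volume start VmDir
$ gluster volume set VmDir auth.allow "192.168.5.*"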
OpenNebula Infrastructure
GlusterFS DHT
DHT stands for Distributed Hash Table.
The way GlusterFS's DHT works is based on a few basic principles:
• All operations are driven by clients, which are all equal. There are no special nodes
with special knowledge of where files are or should be.
• Directories exist on all subvolumes (bricks or lower-level aggregations of bricks); files
exist on only one.
• Files are assigned to subvolumes based on consistent hashing.
Two bricks simplified example:
Brick A: Hash range from 0x00000000 to 0x7fffffff
Brick B: Hash range from 0x80000000 to 0xffffffff
A new file is created and the corresponding hash is 0xabad1dea
The hash falls between 0x80000000 and 0xffffffff, so the file's hashed location is on
Brick B
OpenNebula Infrastructure
GlusterFS Rebalance
As bricks are added or removed, or files are renamed, many files can end up
somewhere other than at their hashed locations. When this happens, the volumes
need to be rebalanced.
This process consists of two parts:
• fix-layout - Calculate new layouts, according to the current set of bricks (and
possibly their characteristics)
• migrate-data - Migrate any "misplaced" files to their correct (hashed) locations
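Both steps are driven from the gluster CLI; for example, on the VmDir volume:

# recompute the layout only
$ gluster volume rebalance VmDir fix-layout start
# recompute the layout and migrate misplaced files
$ gluster volume rebalance VmDir start
# follow the progress
$ gluster volume rebalance VmDir status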
OpenNebula Infrastructure
GlusterFS self-healing
In a replicated volume [minimum replica count 2] it can happen, due to some failures,
that one or more of the replica bricks go down for a while.
If a user deletes a file in the meantime, only the online brick is affected.
When the offline brick comes back online at a later time, it is necessary to have that
file removed from this brick as well.
The synchronization between the replica bricks is called healing.
The pro-active self-heal daemon runs in the background, diagnoses issues and
automatically initiates self-healing periodically on the files which require healing.
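The healing state can also be inspected and triggered by hand, e.g. (standard gluster CLI usage, shown here on the VmDir volume):

# list the entries still pending heal on each replica brick
$ gluster volume heal VmDir info
# trigger a heal of the pending entries, or crawl the whole volume
$ gluster volume heal VmDir
$ gluster volume heal VmDir full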
OpenNebula Infrastructure
Ceph
OpenNebula can be integrated with Ceph, a distributed object store and file system.
The Ceph datastore driver provides OpenNebula users with the possibility of using Ceph
block devices as their Virtual Images.
With some limitations:
• This driver only works with libvirt/KVM drivers. Xen is not (yet) supported.
• This driver requires that the OpenNebula nodes using the Ceph driver be part of a
running Ceph cluster.
• The hypervisor nodes need to be part of a working Ceph cluster and the Libvirt and
QEMU packages need to be recent enough to have support for Ceph.
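As a sketch of what such an integration looks like, a Ceph datastore is registered from a template along these lines (attribute values are placeholders, following the OpenNebula Ceph datastore documentation; this setup is not the one used in Torino):

NAME        = ceph_ds
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_HOST   = "ceph-mon-1 ceph-mon-2"
CEPH_USER   = libvirt
BRIDGE_LIST = "ceph-node-1 ceph-node-2"

$ onedatastore create ceph.ds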
OpenNebula Infrastructure
GlusterFS Volumes
GlusterFS is flexible depending on the layout configuration.
It can be used for many purposes:
• Storage backend for running machines
• Storage backend for image repository
• Remote partition for VM (or group of VMs)
OpenNebula Infrastructure
Torino GlusterFS storage setup
[Diagram: the OpenNebula master, KVM-SERVICES and KVM-WORKERS hosts mounting GlusterFS volumes served by SRV-01 and SRV-02]
The OpenNebula server and KVM use GlusterFS shared filesystem
[oneadmin@one-master ~]$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
one-san-01:/VmDir               917G  188G  683G  22% /var/lib/one/datastores/0
one-san-01:/PERSISTENT-STORAGE  2.3T  1.2T  1.1T  53% /var/lib/one/datastores/104
one-san-01:/IMAGEREPO           2.0T  643G  1.4T  32% /var/lib/one/datastores/1
one-san-01:/HOMECLOUD           574G   42G  504G   8% /users
OpenNebula Infrastructure
Torino GlusterFS Volumes
Volume Name: PERSISTENT-STORAGE
Type: Distribute
Volume ID: 1b5751dd-7d21-40dc-a214-15b8e87c6299
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: one-san-02.to.infn.it:/mnt/brick-persistor-ext4
Options Reconfigured:
auth.allow: 192.168.5.*
Volume Name: VmDir
Type: Replicate
Volume ID: 8636b8be-89e2-45b9-aabd-03cf8fa33539
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: one-san-01.to.infn.it:/bricks/VmDir01
Brick2: one-san-02.to.infn.it:/bricks/VmDir02
Options Reconfigured:
auth.allow: 192.168.5.*
OpenNebula Infrastructure
Torino nfs storage setup
[Diagram: servers SRV-A and SRV-B exporting NFS to a VM group and to a single VM]
If a large amount of data has to be exported to a small number of VMs, NFS is the
simplest way
OpenNebula Infrastructure
nfs storage and ebtables
Using ebtables, a MAC address filter is applied at the bridge interface level
The server exports to a specific virtual network
[root@one-dsrv-98 ~]# cat /etc/exports
/disk/cmps-home 172.16.219.0/24(rw,fsid=0,no_root_squash)
/disk/cmps-data 172.16.219.0/24(rw,no_root_squash)
The server must have an IP address in VM client VNET (172.16.219.0)
[root@one-dsrv-98 ~]# ifconfig eth0:1
eth0:1    Link encap:Ethernet  HWaddr 02:00:AC:10:DB:64
          inet addr:172.16.219.100  Bcast:172.16.219.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
And the MAC address of the interface is forced into the SAME range (02:00:AC:10:DB:**)
[root@one-dsrv-98 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
MACADDR=02:00:AC:10:DB:64
ONBOOT=yes
IPADDR=192.168.1.98
NETMASK=255.255.248.0
NM_CONTROLLED=no
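The bridge-level filter is of this general form (a sketch; the OpenNebula ebtables driver generates equivalent per-interface rules, and vnet0, the prefix mask and the VM MAC are illustrative):

# drop frames forwarded to the VM port whose source MAC is outside the VNET prefix
ebtables -A FORWARD -s ! 02:00:AC:10:DB:00/FF:FF:FF:FF:FF:00 -o vnet0 -j DROP
# drop frames coming from the VM port with a spoofed source MAC
# (02:00:AC:10:DB:0A is an illustrative VM MAC)
ebtables -A FORWARD -s ! 02:00:AC:10:DB:0A -i vnet0 -j DROP

Inside a client VM on the 172.16.219.0/24 VNET the export is then mounted as usual, e.g. (the mount point is illustrative):

mount -t nfs 172.16.219.100:/disk/cmps-data /data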
OpenNebula Infrastructure
Storage comparison
            simplicity  performance  redundancy  capacity
ssh         ✔
nfs         ✔
iSCSI                   ✔                        ✔
GlusterFS                            ✔           ✔
The above considerations depend on the available hardware
OpenNebula Infrastructure
Network infrastructure
Network requirements for a production infrastructure:
• Performance: many 10Gbps servers should work together.
• Reliability: modular / redundant switch
• Isolation: VLAN + ebtables
And different physical LANs:
o Public access (193.205.66.128/25)
o Private access (192.168.0.0/21 172.16.*.0/24)
o Remote administration (10.10.1.0/24)
OpenNebula Infrastructure
Network infrastructure
Network services:
DNS + DHCP: dnsmasq + cobbler
NAT: iptables, OpenWrt
KVM hosts are installed via cobbler and puppet
OpenNebula Infrastructure
Network infrastructure
Although the OpenNebula design and installation guide recommends installing a
dedicated network to manage and monitor the hypervisors and to move image files,
in the running installation the service and instance networks coexist.
The tenants' VNets are also on the same physical LAN, and the “Vrouters” share the
same KVM network interface
OpenNebula Infrastructure
Public Network Connection
[Diagram: public network reaching the KVM hosts; WNs for UsrA and UsrB behind a NAT Vrouter, other tenants behind their own Vrouters on KVM-SRV; private network(s) below]
Network-demanding tenants (WN) have NAT access
Other tenants have their own Vrouter that shares the KVM-SRV network interface