OpenStack Mitaka Summit Tokyo 2015


How Adobe Built An OpenStack Cloud
Jun Park (Ph.D, MBA), Solutions Architect At Adobe
Arghya Banerjee, Sr. Systems Engineer At Adobe
OpenStack Mitaka Summit At Tokyo, Oct 2015
Swiss Cheese Model
Flaws In Defense layers
If aligned, flaws would allow an accident to occur
From Wikipedia
Two More Factors That Complicate Things
SpaceTime Continuum - Einstein
Interactions, Higgs Field & Boson
From Wikipedia
From Youtube
Our Template To Analyze
Components
In Red: Bugs or Issues
In Green: Fix or Stable
Dependencies
Time
OpenStack Survey, May 2015
Adobe OpenStack Architecture
Storage: Ceph RBD
VM1, VM2, VM3, each with eth0 and eth1
Private Networks: VxLAN-based
External Provider Networks: VLAN-based
Adobe Network Firewall
Adobe Corporate Networks
What Happened At Networking?
A New Bug: OVS Sporadically Crashes In Adding A Port
(https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555 and 1449012)
Neutron
Open vSwitch (OVS)
Restarting agents re-establishes entire flows
Fix ready, not added
Security Group
O(N^2) Issue
OVS 2.0.1
OVS 2.3.0
Released:
Bug Fix OVS 2.1.3
Mega Flow In all OVS 2.x
OVS 2.0.2
Multiprocessing
Released
This Bug
Introduced with
OVS Mega Flow
Ubuntu 14.04
Bug Report
Trusty Released With OVS 2.0.1
With OVS 2.0.1 In Ubuntu 14.04
Ubuntu
14.04
Jun ‘13
Dec ‘13
Apr ‘14
Jul ‘14 Aug ‘14
Enhancement Patch Not Yet Integrated
(e.g., 270 secs to 3 secs for 25K rules)
Cherry-Pick
On OVS 2.0.2
In Ubuntu 14.04.2
May ‘15
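The "Security Group O(N^2) Issue" noted on this slide can be illustrated with a small counting sketch. It assumes a security group whose rule allows traffic from other members of the same group, expanded naively into one entry per peer on every port. The function names are hypothetical and this is not Neutron code; it only shows why the rule count grows quadratically with group size.

```python
def per_port_rules(members: int) -> int:
    """Entries one port needs when every other group member is listed individually."""
    return max(members - 1, 0)


def total_rules(members: int) -> int:
    """Entries across all ports of the group: N * (N - 1), i.e. O(N^2)."""
    return members * per_port_rules(members)


if __name__ == "__main__":
    # Illustrative group sizes only; they are not taken from the slides.
    for n in (10, 100, 1000):
        print(f"{n:>5} members -> {total_rules(n):>7} rules")
```

Growing the group tenfold multiplies the total by roughly a hundred, which is the kind of growth that makes rule programming time a scalability concern.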
What Happened At Networking?
A New Bug: OVS Sporadically Crashes In Adding A Port
(https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555 and 1449012)
OVS
OVS 2.0.1
Released:
Mega Flow
Multiprocessing
OpenStack
Summits
• Some companies reverted OVS to LinuxBridge
• Some pundits spread FUD about Neutron!
Atlanta
IceHouse
Paris
Juno
Cherry-Pick
Onto OVS 2.0.2
In Ubuntu 14.04
Ubuntu 14.04
Trusty Released
With OVS 2.0.1
Ubuntu
14.04
Dec ‘13
Apr ‘14 May ‘14
Vancouver
Kilo
Nov ‘14
May ‘15
What Happened At Storage?
Ceph Operational Instability,
Cinder Scalability Issue
Cinder
Ceph
Ubuntu
14.04
Cinder is stuck when Ceph is stuck
(e.g., use local drive for copying an image)
Failover Instability
With FireFly
Ubuntu 14.04
Trusty Released
With Ceph FireFly 0.79
Apr ‘14 May ‘14
Enhancement Solution Not Yet Integrated
(e.g., APIs Stacked Up -> Multiprocessing)
Hammer?
Ubuntu 14.04 Updates
With Ceph FireFly 0.80.10
July ‘15
What Happened At Data Node?
Kernel Memory Bug,
Security Issue
KVM
Kernel
Security Issue
XFS
Deadlock
Bug
Security Patch
Bug Fix
Ubuntu 14.04
Trusty Released
With Kernel…
Ubuntu
14.04
Dec ‘13
Ubuntu 14.04
Trusty Released
With Kernel…
Apr ‘14 May ‘14
Nov ‘14
May ‘15 July ‘15
Check List
Networks
• Understand OVS and find stable OVS
• Cherry-pick for Neutron Scalability: firewall rules
• Our own out-of-band rate limiting on networks, e.g., 200 Mbps
• Set up right MTU size on OVS structure
• Turn off GRO/LRO on hosts
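The MTU item above matters because the private networks are VxLAN-based, so every guest packet is wrapped in extra headers before it reaches the physical network. A minimal arithmetic sketch, assuming an IPv4 underlay and no additional encapsulation; the 1500 and 9000 byte physical MTUs are example values, not figures from the slides.

```python
# Bytes added around each guest IP packet by VXLAN over an IPv4 underlay.
INNER_ETHERNET = 14   # guest Ethernet header carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20


def vxlan_guest_mtu(physical_mtu: int) -> int:
    """Largest guest MTU that still fits in one underlay packet."""
    overhead = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4
    return physical_mtu - overhead


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"physical MTU {mtu} -> guest MTU {vxlan_guest_mtu(mtu)}")
```

If the guest MTU is left larger than this value, full-size packets either fragment or get dropped along the tunnel path.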
Storage
• Decouple Storage system from OpenStack API services
• Cinder Scalability
• Ceph Stability: Hammer, reconfigure towards optimal
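One way to picture the "decouple storage from API services" item is to stop an API handler from waiting indefinitely on a storage backend. The sketch below is a generic pattern, not Cinder's implementation: `call_backend`, `fast_backend`, and `stuck_backend` are hypothetical names, and the backend is simulated with a sleep.

```python
import concurrent.futures
import time

# Shared worker pool so that API handlers never call the backend directly.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_backend(fn, timeout_s: float):
    """Run a backend call in the pool and give up waiting after timeout_s seconds."""
    future = _POOL.submit(fn)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # The caller gets a prompt answer; the worker thread itself keeps running.
        return ("backend_busy", None)


def fast_backend() -> str:
    return "volume-created"


def stuck_backend() -> str:
    time.sleep(0.3)  # stands in for a storage cluster that is not responding
    return "too late"


if __name__ == "__main__":
    print(call_backend(fast_backend, timeout_s=1.0))
    print(call_backend(stuck_backend, timeout_s=0.05))
```

A timeout only protects the caller. The stuck call still occupies a worker, so a real deployment would likely also need queueing or separate processes, in the spirit of the multiprocessing enhancement mentioned earlier.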
How To Test at Scale
• Emulate future production env
  • Create hundreds of VMs, inject workloads, and destroy all
  • Recycle this entire test over and over again
  • Findings: dead tokens stacked up
• Each component scalability
  • Neutron: OVS
  • Cinder: Ceph
  • Nova: KVM
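The create, load, destroy, repeat cycle described above can be sketched as a small soak-test loop. `FakeCloud` is a hypothetical in-memory stand-in, not an OpenStack client, and its token counter is a toy model of the "dead tokens stacked up" finding: every API call issues a token that is never cleaned up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory cloud used only to exercise the loop."""
    vms: set = field(default_factory=set)
    tokens: int = 0       # tokens issued and never expired in this toy model
    next_id: int = 0

    def create_vm(self) -> int:
        self.tokens += 1
        self.next_id += 1
        self.vms.add(self.next_id)
        return self.next_id

    def inject_workload(self, vm_id: int) -> None:
        if vm_id not in self.vms:
            raise KeyError(vm_id)

    def delete_vm(self, vm_id: int) -> None:
        self.tokens += 1
        self.vms.remove(vm_id)


def run_cycle(cloud: FakeCloud, vm_count: int) -> dict:
    """Create vm_count VMs, load each one, destroy them all, then report residue."""
    ids = [cloud.create_vm() for _ in range(vm_count)]
    for vm_id in ids:
        cloud.inject_workload(vm_id)
    for vm_id in ids:
        cloud.delete_vm(vm_id)
    return {"leftover_vms": len(cloud.vms), "tokens": cloud.tokens}


def soak(cloud: FakeCloud, cycles: int, vm_count: int) -> list:
    """Repeat the whole cycle and keep one residue report per cycle."""
    return [run_cycle(cloud, vm_count) for _ in range(cycles)]


if __name__ == "__main__":
    for report in soak(FakeCloud(), cycles=3, vm_count=200):
        print(report)
```

Comparing the reports across cycles is the point of recycling the test: anything that keeps growing while the VM count returns to zero is a leak candidate.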
Have We Done Enough?
3?
4?
It's not that I'm so smart, it's just that I stay with problems longer.
- Albert Einstein
New Efforts In OpenStack
• OpenStack Product Working Group
• Governance/DefCore Committee
• Defining OpenStack Core
• Large Deployment Team
• Link up between contributors and users
• Operational issues for large deployments
• Open Virtual Network (OVN)
• In-kernel Conntrack, DPDK, etc. Will run atop OVS
APPENDIX
USE CASE: Mesos Cluster
Possible Combinations
Bare Metals Containers
Containers In Containers
VMs
VMs
Mesos Cluster Via Heat
Host1
Host2
VM2: mesos slave1
VM1: mesos master
-> Ubuntu-mesos image available via diskimage-builder
-> Post configuration for master
-> starting services
Host3
VM3: mesos slave2
http server
-> Ubuntu-mesos image
-> Post configuration for slave using mesos master IP.
-> starting services
http server
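The master-then-slaves layout on this slide can be sketched as a Heat (HOT) template built as a Python dictionary. This is an illustration under assumptions, not the template used in the talk: the image name follows the slide's "Ubuntu-mesos image", while the flavor, the `user_data` script contents, and the use of the `first_address` server attribute to obtain the master IP are placeholders to check against your Heat release.

```python
import json


def mesos_stack_template(image="ubuntu-mesos", flavor="m1.small", slaves=2):
    """Return a HOT-style dict with one master and N slaves that reference it."""
    resources = {
        "master": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": image,
                "flavor": flavor,
                # Placeholder script: post configuration for master, start services.
                "user_data": "#!/bin/bash\n# configure and start master services\n",
            },
        }
    }
    for i in range(1, slaves + 1):
        resources[f"slave{i}"] = {
            "type": "OS::Nova::Server",
            "properties": {
                "image": image,
                "flavor": flavor,
                "user_data": {
                    "str_replace": {
                        # Placeholder script: MASTER_IP is substituted by Heat.
                        "template": "#!/bin/bash\n# point slave at MASTER_IP\n",
                        "params": {
                            "MASTER_IP": {"get_attr": ["master", "first_address"]}
                        },
                    }
                },
            },
        }
    return {"heat_template_version": "2013-05-23", "resources": resources}


if __name__ == "__main__":
    print(json.dumps(mesos_stack_template(), indent=2))
```

Because each slave's `user_data` reads an attribute of the master resource, Heat would have to create the master first, which matches the "using mesos master IP" step.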
Mesos Cluster with Marathon
Request to run a micro-service via REST API
Marathon
Mesos Master
With ZooKeeper
Mesos Slave1
Mesos Slave2
http server
http server
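The "request to run a micro-service via REST API" step can be sketched as an HTTP POST to Marathon's `/v2/apps` endpoint. The host name, app id, command, and resource sizes below are hypothetical examples. The request is only built here, not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request


def build_marathon_request(base_url: str, app_id: str, cmd: str, instances: int = 2):
    """Build (but do not send) a POST that asks Marathon to run an app."""
    app = {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.1,       # example sizing, not from the slides
        "mem": 32,
        "instances": instances,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_marathon_request(
        "http://mesos-master.example:8080",   # hypothetical Marathon address
        "/http-server",
        "python3 -m http.server 8000",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit it. With two instances, Marathon could place one HTTP server on each slave, as in the diagram.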
Master + 2 slaves: Heat Stacks
Topology of Slave2
Marathon: Two Apps on Slave1
App Running On Slave
Mesos UI
Heat Template
Components
Dependencies
Time
Adobe OpenStack Architecture
VM1
eth0
eth1
Linux Bridge
Open vSwitch
bond0
External Provider Networks: VLAN-based
Physical VLANs
Adobe Network Firewall
Adobe Corporate Networks
Volume Management in OpenStack
Set of Images
1. Copy
2. Snapshot
3. Volumes
Copy-On-Write (COW)
Ceph Volume
Base Volume For All Three VMs
Individual COW Volumes
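The copy-on-write arrangement on this slide, one base volume shared by three VMs plus an individual COW volume per VM, can be modeled in a few lines. This is a toy in-memory model with hypothetical class names. It does not use Ceph or RBD APIs.

```python
class BaseVolume:
    """Read-only base image shared by every clone."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, index: int) -> bytes:
        return self._blocks[index]


class CowVolume:
    """Per-VM volume that stores only the blocks it has overwritten."""

    def __init__(self, base: BaseVolume):
        self._base = base
        self._overlay = {}

    def read(self, index: int) -> bytes:
        # Prefer the private copy; fall back to the shared base block.
        if index in self._overlay:
            return self._overlay[index]
        return self._base.read(index)

    def write(self, index: int, data: bytes) -> None:
        # The base is never modified; the change lands in this clone only.
        self._overlay[index] = data

    def private_blocks(self) -> int:
        return len(self._overlay)


if __name__ == "__main__":
    base = BaseVolume([b"boot", b"os", b"data"])
    vm1, vm2, vm3 = (CowVolume(base) for _ in range(3))
    vm1.write(2, b"vm1-data")
    print(vm1.read(2), vm2.read(2), vm3.read(2))
    print(vm1.private_blocks(), vm2.private_blocks(), vm3.private_blocks())
```

Each clone costs storage only for the blocks it changes, which is why many volumes can be created quickly from one base image.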