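VMware Virtualization Best Practices and Planning
Agenda
Application scope considerations
Server procurement considerations
Virtual machine deployment considerations
Management and maintenance considerations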
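General Principles for Application Scope
Applications not suited to virtualization
Applications with special hardware access requirements
High-performance graphics cards --- not suitable for virtualization
Special serial/parallel encryption devices --- not suitable for virtualization
USB device connectivity --- possibly unsuitable; an external USB device can be used as a substitute, subject to testing
Applications that remain heavily loaded even on high-end servers --- possibly unsuitable; analyze the current server configuration first
Applications that can be virtualized
All applications other than the unsuitable ones listed above
The order of virtualization can follow migration complexity
Applications that are easy to P2V can be migrated first, for example those that the Converter tool can migrate directly
Applications that are hard or impossible to P2V can be reinstalled in a virtual machine and then migrated
Whether to virtualize also depends on management needs
How much the transition to virtualization affects current business
How much virtualization changes current management practice
How difficult coordination between departments is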
Types of Host Systems
Vertically scaled vs commodity server
Different resource pool “quantization” models
[Diagram: stacks of App / OS / VM running on a Hypervisor and Physical Host, shown for several host types]
Different Types of Resource “Pools”
Vertically scaled hosts provide
a bigger contiguous pool
Workloads dovetail
Typically scales to higher %
utilization
Clusters of commodity servers
are more like a collection of
little pools
“Buckets” of capacity
Necessitates motioning
Model has advantages and
disadvantages
Constraints Affecting Virtualization
[Diagram: constraints plotted against two axes, labeled "Increasing Difficulty"]
Diversity of Function: Farms (Horizontally Scaled Servers, Common Services, etc.) to Singletons (Back Office, homegrown apps, etc.)
Criticality of Server: Non-Critical (Standalone, local storage, etc.) to Critical (Clustered, Multi-Homed, etc.)
Constraint types shown: Workload Constraints, Technical Constraints, Business Constraints
Workload (Utilization) Constraints
Different “food groups” must be considered
CPU Utilization
Disk I/O
Network I/O
Memory Utilization
Overhead of virtualization should be factored in
Disk and Network I/O can create CPU load
iSCSI can load CPU as well
Must also consider operational history
Month End
Year End (if very patient)
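A minimal sketch of the "food groups" check above: it sums peak demand per resource, adds a virtualization overhead factor, and reports which resources fit on a host. The `Load` type, the field names, and the 10% default overhead are assumptions made for illustration, not figures from this deck. Peaks should come from a window long enough to include month-end activity.

```python
from dataclasses import dataclass

@dataclass
class Load:
    cpu_mhz: float
    mem_mb: float
    disk_mbps: float
    net_mbps: float

RESOURCES = ("cpu_mhz", "mem_mb", "disk_mbps", "net_mbps")

def fits(host: Load, workloads: list[Load], overhead: float = 0.10) -> dict[str, bool]:
    """Report, per resource, whether summed peak demand plus overhead fits the host."""
    result = {}
    for name in RESOURCES:
        # overhead is an assumed flat factor, applied equally to every resource
        demand = sum(getattr(w, name) for w in workloads) * (1 + overhead)
        result[name] = demand <= getattr(host, name)
    return result
```

A workload set can pass on CPU and still fail on memory or I/O, which is why each resource is reported separately instead of being collapsed into one score.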
Technical Constraints
Technical constraints generally deal with:
Compatibility (are systems/apps compatible)
Affinity (are systems part of a logical group)
In most environments these constraints include:
Network connectivity (subnet-level)
Application Interconnect
Incumbent storage technologies
Hardware and Peripherals in use
Software Support/Certification
These are different for above vs below kernel
Shared vs separate OS images
Business and Process Constraints
These are often overlooked if sizing is focus
You can get away with this in labs but not in production
Common constraints on virtualization include:
Maintenance Windows and Change Freezes
Location and other physical constraints
Operational Environments, Security Zones, App Tiers
Business Groups, Departments, Customers
Political Considerations and Constraints
Ignoring these tends to “randomize” an environment
If diversity is goal then do it on purpose
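Agenda
Application scope considerations
Server procurement considerations
Virtual machine deployment considerations
Management and maintenance considerations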
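Hardware used for virtualization should meet the compatibility lists
All equipment used to build a VMware VI3 virtual infrastructure, including server systems, storage systems, and I/O cards, should meet the VMware VI3 compatibility lists. The latest lists are available at:
http://www.vmware.com/resources/techresources/cat/119
Server system compatibility list
"HCL: Systems Compatibility Guide For ESX Server 3.5 and ESX Server 3i"
Storage system compatibility list
"HCL: Storage / SAN Compatibility Guide For ESX Server 3.5 and ESX Server 3i"
I/O card compatibility list, covering NICs, FC HBAs, iSCSI HBAs, and so on
"HCL: I/O Compatibility Guide For ESX Server 3.5 and ESX Server 3i"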
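ESX Server Hardware Configuration Considerations - CPUs
ESX schedules CPU cycles to serve processing requests from the virtual machines and the Service Console
The more CPU targets available, the better ESX can manage this scheduling (8 or more CPU cores per server gives the best results)
Hyper-Threading does not provide the same benefit as multi-core processors; disabling Hyper-Threading on the CPU (if present) is recommended
CPUs with Intel VT (with EM64T) or AMD-V can run 32-bit and 64-bit virtual machines at the same time
Clusters built from servers whose processors share the same vendor, product family, and generation get the best VMotion compatibility
Enhanced VMotion Compatibility in ESX 3.5 U2 widens the original VMotion compatibility; see "Alleviating Constraints with Resource Pools Live Migration with Enhanced VMotion"
--- See "Best Practices for Successful VI Design"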
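ESX Server Hardware Configuration Considerations - Memory
Memory is more likely than CPU to become the bottleneck
At times, the memory used by the virtual machines may exceed physical memory:
Host swap file (use as little as possible for best performance)
Transparent Page Sharing (multiple virtual machines share identical memory pages)
Note server-specific memory configuration requirements
DIMM sizes, bank pairing, parity, upgrade considerations (mix and match or forklift replacement)
Configure servers with as much memory as possible, using the largest-capacity DIMMs (especially when not all DIMM slots are populated)
--- See "Best Practices for Successful VI Design"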
ESX Server Hardware Configuration Considerations - Network
Basic network connectivity components of the virtual infrastructure
Port Group (Management virtual machine)
Port Group (VMotion, iSCSI, NFS)
Port Group (VM connectivity)
--- See "Best Practices for Successful VI Design"
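ESX Server Hardware Configuration Considerations - Network - Virtual Switches and Port Groups
Configure at least one virtual switch; test environments can use 2, and production environments should have at least 3
A virtual switch can carry all 3 port group types at once (Service Console, VMkernel, VM)
Giving the Service Console, VMkernel, and virtual machine port groups a virtual switch each is recommended
VLANs can be used to separate port groups
For server clusters that use VMotion and DRS, the network configuration should match across hosts (the same number of virtual switches and the same network label names)
Give the ESX Server Service Console a static IP address and set speed and duplex correctly.
--- See "Best Practices for Successful VI Design"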
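ESX Server Hardware Configuration Considerations - Basic Network Components
ESX Server, virtual switches, physical NICs
For redundancy, assign at least two physical NICs to each virtual switch
The number of physical NICs/ports per ESX Server depends on the number of virtual switches planned
If the 3 port group types (SC, VMkernel, VM) each sit on a separate virtual switch, a production host should have at least 6 physical NICs/ports
Assigning more physical NICs/ports to the virtual switch that holds the virtual machine port groups adds load balancing
--- See "Best Practices for Successful VI Design"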
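The NIC count rule above is simple arithmetic: two physical NICs per virtual switch for redundancy, plus any extra uplinks added to the VM switch for load balancing. A small sketch follows; the function and parameter names are illustrative only.

```python
def min_physical_nics(num_vswitches: int, nics_per_vswitch: int = 2,
                      extra_vm_nics: int = 0) -> int:
    """Minimum physical NIC ports for a host, following the two-per-vSwitch rule."""
    if num_vswitches < 1 or nics_per_vswitch < 1 or extra_vm_nics < 0:
        raise ValueError("counts must be positive (extra_vm_nics may be zero)")
    return num_vswitches * nics_per_vswitch + extra_vm_nics

# Three separate vSwitches (SC, VMkernel, VM) with redundant uplinks each
print(min_physical_nics(3))
```

With three separate virtual switches this gives the 6 ports recommended for production hosts.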
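ESX Server Hardware Configuration Considerations - Connecting to the Physical Network
Physical NICs/ports and physical switches
The physical NICs/ports of one virtual switch should connect to different physical switches
Connect the physical NICs/ports used by the VMotion port groups of all servers in a cluster to the same set of physical switches (still following the first rule)
--- See "Best Practices for Successful VI Design"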
Example 1: Blade Server with 2 NIC Ports
Candidate Design:
Team both NIC ports
Create one virtual switch
Create three port groups:
Portgroup1: Service Console (SC), VLAN 10
Portgroup2: VMotion, VLAN 20
Portgroup3: VM traffic, VLAN 30
Use Active/Standby policy for each portgroup
Use VLAN trunking
Trunk VLANs 10, 20, 30 on each uplink
[Diagram: one vSwitch with uplinks vmnic0 and vmnic1, marked Active and Standby, carrying VLAN trunks for VLANs 10, 20, 30]
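One way to sanity-check a design like Example 1 is to write it down as data and verify that every port group's VLAN is trunked on both its active and its standby uplink. The sketch below is illustrative: the `PortGroup` type and `validate` function are hypothetical, and the choice of which uplink is active for which port group is an assumption, since the slide does not state it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PortGroup:
    name: str
    vlan: int
    active: str   # uplink used in normal operation
    standby: str  # uplink used after a failover

def validate(port_groups, trunks):
    """Return a list of problems; an empty list means the design is consistent."""
    problems = []
    for pg in port_groups:
        if pg.active == pg.standby:
            problems.append(f"{pg.name}: active and standby are both {pg.active}")
        for uplink in (pg.active, pg.standby):
            if pg.vlan not in trunks.get(uplink, set()):
                problems.append(f"{pg.name}: VLAN {pg.vlan} is not trunked on {uplink}")
    return problems

# Example 1; the active/standby assignment per port group is assumed
EXAMPLE_1 = [
    PortGroup("Service Console", 10, "vmnic0", "vmnic1"),
    PortGroup("VMotion", 20, "vmnic1", "vmnic0"),
    PortGroup("VM traffic", 30, "vmnic1", "vmnic0"),
]
TRUNKS = {"vmnic0": {10, 20, 30}, "vmnic1": {10, 20, 30}}
print(validate(EXAMPLE_1, TRUNKS))
```

The same check applies to the 4-port examples by listing four uplinks and restricting each trunk to the VLANs it carries.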
Example 2: Server with 4 NIC Ports
Candidate Design:
Create two virtual switches
Team two NICs to each vSwitch
vSwitch0 (use active/standby for each portgroup):
Portgroup1: Service Console (SC), VLAN 10
Portgroup2: VMotion, VLAN 20
vSwitch1 (use Originating Virtual PortID)
Portgroup3: VM traffic #1, VLAN 30
Portgroup4: VM traffic #2, VLAN 40
Use VLAN trunking
vmnic1 and vmnic3: Trunk VLANs 10, 20
vmnic0 and vmnic2: Trunk VLANs 30, 40
[Diagram: vSwitch0 and vSwitch1 with uplinks vmnic0 through vmnic3, Active and Standby paths marked]
Example 3: Server with 4 NIC Ports (Slight Variation)
Candidate Design:
Create one virtual switch
Create two NIC teams
vSwitch0 (use active/standby for portgroups 1 & 2):
Portgroup1: Service Console (SC), VLAN 10
Portgroup2: VMotion, VLAN 20
Use Originating Virtual PortID for Portgroups 3 & 4
Portgroup3: VM traffic #1, VLAN 30
Portgroup4: VM traffic #2, VLAN 40
Use VLAN trunking
vmnic1 and vmnic3: Trunk VLANs 10, 20
vmnic0 and vmnic2: Trunk VLANs 30, 40
[Diagram: vSwitch0 with uplinks vmnic0 through vmnic3, Active and Standby paths marked]
Servers with More NIC Ports
More than 4 NIC Ports—Design Considerations
With Trunks (VLAN tagging):
Use previous approach and scale up to meet additional bandwidth and redundancy
requirements
Add NICs to NIC team supporting VM traffic
VLAN tagging is always recommended, but these options apply if NICs are available:
Dedicated NIC for VMotion
At least one NIC
Dedicated NICs for IP Storage (NFS and/or iSCSI)
Usually two teamed NICs (consider IP-hash & etherchannel if multiple destinations
and Multi-Chassis Etherchannel employed on physical switches)
Dedicated NIC(s) for Service Console
At least two for availability
Note: easy to consume many physical NICs and switch ports if
not using VLAN tagging
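ESX Server Hardware Configuration Considerations - Disks
Store virtual machine files on external shared disk arrays wherever possible
Internal ESX Server disks need adequate redundancy; RAID 1 is recommended
Disk requirements of ESX Server itself, partitioning at install time:
Automatic partitioning during installation is not recommended, because /, /var, and /home end up on the same partition, and ESX Server runs into serious problems when / (root) fills up. Recommended layout:
/boot: 50 to 100 MB (Primary Partition)
/: 8.0 to 18 GB (Primary Partition)
(swap): 2x the Service Console memory; a fixed 1.6 GB is recommended
/var: 4 GB or larger
About 18 GB is recommended as adequate space for the ESX Server software
Also allow space for local ISO images and other text files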
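Recommended ESX Server Configuration - New Purchases
To get the most from virtualization and make full use of each server's resources, host servers for virtualization should meet or exceed the following:
Server CPU sockets | 2-socket | 4-socket | 8-socket
CPU (2 GHz or faster recommended) | 2-socket quad-core | 4-socket dual-core or quad-core | 4-socket dual-core or quad-core+
Memory | 16GB+ | 32GB+ | 64GB+
Gigabit NIC ports, no external storage | 4+ / 6+ | 4+ / 6+ | 4+ / 6+
Gigabit NIC ports, FC storage | 4+ / 6+ | 4+ / 6+ | 4+ / 6+
Gigabit NIC ports, IP storage | 6+ / 8+ | 6+ / 8+ | 6+ / 8+
FC HBA ports (4Gb or 8Gb products recommended) | 2 | 2 | 2
Internal disks (when an external disk array is used) | 2 | 2 | 2
Power supplies | Dual redundant | Dual redundant | Dual redundant
For cost-effectiveness and availability reasons, deploying virtualization on single-socket servers is not recommended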
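Recommended Virtualization Host Configuration - Existing Servers
For the 4-socket servers that are currently common in the industry:
4-socket single-core servers: limited compute capacity; keep the VM count under 10, with 12GB-16GB of memory;
4-socket dual-core servers: moderate compute capacity; roughly 10-15 VMs, with 16GB-24GB of memory;
4-socket quad-core servers: strong compute capacity; roughly 15-30 VMs, with 24GB-32GB of memory.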
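Recommended VC Server Configuration
Processor: 2.0GHz or faster Intel or AMD x86 processor; VC supports multiprocessing, up to 2 CPUs.
Memory: 2GB minimum; if the database runs on the same machine as VC, increasing to 4GB is recommended.
Disk space: 560MB minimum, 2GB recommended.
NIC: Gigabit recommended.
Minimum hardware configuration --- one 2GHz CPU, 2GB memory, Gigabit NIC port
Supports about 20 concurrent connections, managing about 50 physical hosts and 1000 virtual machines
Recommended configuration --- dual CPUs, 4GB memory, Gigabit NIC port
Supports about 50 concurrent connections, managing about 200 physical hosts and 2000 virtual machines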
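Agenda
Application scope considerations
Server procurement considerations
Virtual machine deployment considerations
Management and maintenance considerations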
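Planning the Number of Virtual Machines
Factors that determine how many virtual machines a single server can support:
Server hardware configuration
CPU performance --- with multi-core, high-clock-speed processors, CPU is less and less likely to be the bottleneck
Memory size --- memory is the hard limit; the more memory, the more virtual machines a server can support
Network ports --- Gigabit networks are now common and bandwidth is mostly assured, so plan ports mainly from a management standpoint
HBA cards --- disk access performance has some effect on VM count; 4Gb or 8Gb HBAs are recommended to reduce link impact
Local disks --- internal disks are weak in both availability and I/O throughput; storing virtual machines on them is not recommended, use an external high-performance disk array instead
Application load
Because a physical server's resources are finite, the heavier the application load, the fewer virtual machines can run at once
Mix applications with different access patterns on the same physical server
Flexible use of DRS and VMotion can tune the ratio of physical hosts to virtual machines toward the optimum
Given the spare capacity that HA and DRS need, keeping overall resource usage at or below two-thirds with all virtual machines under normal load is appropriate
Rule of thumb: about 10 virtual machines on 2-socket quad-core, 15-30 on 4-socket quad-core (for reference only)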
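As a rough illustration of the two-thirds guideline above, the sketch below estimates how many equally sized virtual machines fit in a host's memory. It is a simplification under stated assumptions: memory is treated as the only limit, and virtualization overhead and page sharing are ignored. The function name and parameters are hypothetical.

```python
from fractions import Fraction
import math

def max_vms(host_mem_gb: int, vm_mem_gb: int,
            target_utilization: Fraction = Fraction(2, 3)) -> int:
    """Number of equally sized VMs whose memory stays within the utilization target."""
    if host_mem_gb <= 0 or vm_mem_gb <= 0:
        raise ValueError("memory sizes must be positive")
    usable = Fraction(host_mem_gb) * target_utilization  # exact arithmetic, no float rounding
    return math.floor(usable / vm_mem_gb)

# A 32 GB host with 2 GB virtual machines
print(max_vms(32, 2))
```

For a 32 GB host and 2 GB virtual machines this gives 10, in line with the rule of thumb for a 2-socket quad-core server.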
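Allocating Virtual Machine Resources --- CPU and Memory
CPU allocation principles:
Use as few vCPUs as possible; if the application is single-threaded and cannot use multiple threads, do not use virtual SMP
The number of virtual CPUs in a VM should not equal or exceed the number of physical CPU cores; for example, a VM on a 2-socket dual-core server should use at most two virtual CPUs
When configuring virtual machines, remember that ESX Server itself has some overhead. Take care not to overcommit CPU, considering both the combined utilization of all virtual machines and the total number of vCPUs.
Watch the "idle loop spin" parameter; some operating systems do not truly release the virtual CPU when idle.
Make sure single-processor virtual machines use a "UP HAL/kernel" and multiprocessor virtual machines use an "SMP HAL/kernel".
Memory allocation principles:
Total memory is the sum of the physical memory the virtual machines actually need according to the resource assessment; extra memory demand from applications can be covered by ESX disk-backed memory (swap)
For critical applications, consider fixing (reserving) memory to keep performance stable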
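The vCPU sizing rule above can be sketched as a small helper that returns the largest vCPU count strictly below the host's core count. The list of supported vCPU counts (1, 2, 4) is an assumption about VI3-era Virtual SMP, not something this deck states, and the function name is hypothetical.

```python
# Assumption: VI3-era virtual machines can be configured with 1, 2, or 4 vCPUs
SUPPORTED_VCPU_COUNTS = (1, 2, 4)

def max_vcpus_per_vm(sockets: int, cores_per_socket: int) -> int:
    """Largest supported vCPU count that stays below the physical core count."""
    cores = sockets * cores_per_socket
    allowed = [n for n in SUPPORTED_VCPU_COUNTS if n < cores]
    if not allowed:
        raise ValueError("host has too few cores to satisfy the rule")
    return max(allowed)

# 2-socket dual-core host: 4 cores, so at most 2 vCPUs per VM
print(max_vcpus_per_vm(2, 2))
```

This is an upper bound only; the first principle still applies, so a single-threaded application should get one vCPU regardless of what the host allows.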
DRS Best Practices: Hardware Configuration
Ensure hosts are CPU compatible
Intel vs AMD
Similar CPU family/SSE3 status
Enhanced VMotion Compatibility (EVC)
“VMware VMotion and CPU Compatibility” whitepaper
CPU incompatibility => limited DRS VM migration options
Larger Host CPU and memory size preferred for VM
placement (if all equal)
Differences in cache or memory architecture =>
inconsistency in performance
DRS Best Practices: Cluster Configuration
Higher number of hosts => more DRS balancing options
Recommend up to 32 hosts/cluster
May vary with VC server configuration and VM/host ratio
Network configuration on all hosts
VMotion network: Security policies, VMotion nic enabled, GigE
network, etc
Virtual Machine network present on all hosts
VM datastore shared across all hosts
VM floppy/CD connected to host device
DRS Best Practices: VM Resource Settings
Reservations, Limits, and Shares
Shares take effect during resource contention
Low limits can lead to wasted resources
High VM reservations may limit DRS balancing
Overhead memory
Use resource pools (RP) for better manageability
Virtual CPUs and memory size
Large memory size and many virtual CPUs => fewer migration opportunities
Configure VMs based on need
DRS Best Practices: Algorithm Settings
Aggressiveness threshold
Moderate threshold (default) works well for most cases
Aggressive thresholds recommended if
Homogenous clusters and
VM demand relatively constant and
Few affinity/anti-affinity rules
Use affinity/anti-affinity rules only when needed
Affinity rules: closely interacting VMs
Anti-affinity rules: I/O intensive workloads, availability
Automatic DRS mode recommended (cluster-wide)
Manual/Partially automatic mode for location-critical VMs (per VM)
Per VM setting overrides cluster-wide setting
HA Best Practices - Setup & Networking
Proper DNS & Network settings are needed for initial configuration
After configuration DNS resolutions are cached to /etc/FT_HOSTS (minimizing the
dependency on DNS server availability during an actual failover)
DNS on each host is preferred (manual editing of /etc/hosts is error prone)
Redundancy to ESX Service Console networking is essential (several options)
Choose the option that minimizes single points of failure
Gateways/isolation addresses should respond via ICMP (ping)
Enable PortFast (or equivalent) on network switches to avoid spanning tree related
isolations
Network maintenance activities should take into account dependencies on the
ESX Service Console network(s)
VMware HA can be temporarily disabled through the Cluster->Edit Settings dialog
Valid VM network label names required for proper failover
Virtual machines use them to re-establish network connectivity upon restart
HA Network Configuration
Network redundancy between the ESX service consoles is essential
for reliable detection of host failures & isolation conditions
A single service console network with underlying redundancy is usually sufficient:
Use a team of 2 NICs connected to different physical switches to avoid a single point of failure
Configure vNics in vSwitch for Active/Standby configuration (rolling failover = “yes”, default load balancing = route based on originating port ID)
Consider extending timeout values & adding multiple isolation addresses (*see appendix)
Timeouts of 30-60 seconds will slightly extend recovery times, but will also allow for intermittent network outages
HA Network Configuration (Continued)
Beyond NIC teaming, a secondary service console network can be
configured to provide redundant heartbeating & isolation detection
HA will detect and use a secondary
service console network
Adding a secondary service console portgroup
to an existing VMotion vSwitch avoids having to
dedicate an additional subnet & NIC for this
purpose
Also need to specify an additional isolation
address for the cluster to account for the added
redundancy (*see appendix)
Continue using the primary service
console network & IP address for
management purposes
Be careful with network maintenance that
affects the primary service console network and
the secondary / VMotion network
HA Best Practices – Resource Management
Larger groups of homogenous servers will allow higher levels of
utilization across an HA/DRS enabled cluster (on average)
More nodes per cluster (current maximum is 16) can tolerate multiple host
failures while still guaranteeing failover capacities
Admission control heuristics are conservatively weighted (so that large servers
with many VMs can failover to small servers)
To define the sizing estimates used for admission control, set
reasonable reservations as the minimum resources needed
Admission control will exceed failover capacities when reservations are not set;
otherwise HA will use largest reservation specified as the “slot” size.
At a minimum, set reservations for a few virtual machines considered “average”
Admission control may be too conservative when host and VM sizes
vary widely
Perform your own capacity planning by choosing “Allow virtual machines to be
powered on even if they violate availability constraints”. HA will still try to restart
as many virtual machines as it can.
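The slot idea above can be illustrated with a simplified model: the slot size is the largest CPU and memory reservation, each host holds as many slots as fit, and the cluster tolerates a failure if the surviving hosts still hold one slot per virtual machine. This is a sketch under assumptions, not the actual HA admission control algorithm; the function names are hypothetical and it requires every VM to have a positive reservation.

```python
def slot_size(reservations):
    """Largest CPU and memory reservation across all VMs, as (cpu_mhz, mem_mb)."""
    if not reservations or any(c <= 0 or m <= 0 for c, m in reservations):
        raise ValueError("every VM needs a positive CPU and memory reservation")
    return max(c for c, _ in reservations), max(m for _, m in reservations)

def survives(hosts, reservations, host_failures=1):
    """True if the VMs still fit after the largest hosts fail (worst case)."""
    slot_cpu, slot_mem = slot_size(reservations)
    slots = sorted(min(cpu // slot_cpu, mem // slot_mem) for cpu, mem in hosts)
    surviving = slots[: max(len(slots) - host_failures, 0)]  # keep the smallest hosts
    return sum(surviving) >= len(reservations)

hosts = [(8000, 16384)] * 3                            # (cpu_mhz, mem_mb) per host
mixed = [(1000, 2048)] * 10 + [(2000, 4096)]           # one large reservation
print(slot_size(mixed), survives(hosts, mixed))
```

A single large reservation doubles the slot size for the whole cluster here, which shows why admission control can look too conservative when VM sizes vary widely.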
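Agenda
Application scope considerations
Server procurement considerations
Virtual machine deployment considerations
Management and maintenance considerations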
Impact of VirtualCenter Downtime
Component | Impact Experienced
Virtual Machines | Unaffected, management requires direct connections to ESX Servers
ESX Servers | Unaffected, management requires direct connections to ESX Servers
Performance & Monitoring Statistics | Historical records will have gaps during outages, still available via ESX Servers
VMotion | Unavailable
VMware DRS | Unavailable
VMware HA | Agents unaffected & provide failover functionality, admission control unavailable
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
VirtualCenter Components
Database Server
AD Domain Controller
License Server
VirtualCenter Server
DNS Server
Web Access
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
VirtualCenter – Recommended Collocation
Collocation of VirtualCenter components is desirable for most environments
Focus of this session is on providing protection for these components
Industry standard solutions assumed for other components
[Diagram: VirtualCenter Server, Database Server, License Server, AD Domain Controller, Web Access, and DNS Server, with a box labeled "One Server, Physical or Virtual" around the collocated components]
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
VirtualCenter Components (Additional Details)
VirtualCenter Service: almost stateless
Information about inventory stored in the database
Some state files stored locally on VirtualCenter server
Web Access
No state information
License Server
License file stored locally
14 day Grace period if unavailable
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
VirtualCenter – Local Configuration Files
[Diagram: VirtualCenter Server, Database Server, Web Access, and License Server on one server, physical or virtual, with their local files: SSL certificate, upgrade files, config file, license file]
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
Step 1 for High Availability: Protect the Database
Database outage will terminate VirtualCenter service
As of VirtualCenter 2.0.1 Patch 2, Windows Service Manager
will automatically attempt to restart it every 5 minutes,
indefinitely
VirtualCenter Database should be independently
installed and managed
For local availability use the preferred mechanism for the type of
database being used (VMware HA, MSCS, Database specific
mechanisms)
For disaster recovery, database should be replicated to a remote
site as part of an overall DR plan
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
Step 2 for High Availability: Protect VirtualCenter
VMware HA and Microsoft Cluster Services (MSCS) are the two most popular options
Other 3rd party solutions possible*
*Supported directly by 3rd party
Option a): VMware HA
Virtual instances only
Subject to shared storage / network constraints
Only requires single OS & application instance; no explicit replication
Option b): MSCS
VirtualCenter 2.0.2 patch 2 or beyond
Physical or virtual instances
Requires 2 identical OS & application installations; explicit replication of files
Involves additional configuration efforts & ongoing maintenance
[Diagram: a VC instance fails ("Bang!") and fails over to a second VC instance]
--- See "Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter"
VirtualCenter: Physical vs. Virtual
Physical
Backups done using traditional
tools
Dedicated server required
Performance limited only by
server hardware
Virtual
Backups possible through
traditional tools, VCB, snapshots,
cloning, etc.
Dedicated server not required,
resources can be shared with
other virtual machines
Performance from shared
resources; tuning may be needed
For additional details refer to the following documentation:
http://www.vmware.com/pdf/vi3_vc_in_vm.pdf
VirtualCenter with VMware HA: Out-of-Band
Two approaches
Two VirtualCenter instances manage each other (pictured)
Both run in HA cluster
Each manages the other’s HA cluster
Separate VirtualCenter instance is used to manage 2-node HA cluster (not pictured)
VirtualCenter with VMware HA: In-Band
VirtualCenter Server manages the VMware HA cluster providing its protection
When the ESX host running the VirtualCenter VM fails, the VM is restarted automatically by HA
Failover functionality provided by HA is independent from VirtualCenter (post-configuration)
VirtualCenter with MSCS – Physical
Best practice: use Majority Node Set quorum with witness share
May be used as geographically dispersed cluster for disaster recovery solution
VCDB may be used in another cluster group on the same cluster
Requires a third node
[Diagram: cluster nodes running vpxd and vcdb, connected over an Ethernet network]
For additional details refer to the following documentation:
http://www.vmware.com/pdf/VC_MSCS.pdf
VirtualCenter with MSCS – Virtual
Requires use of quorum disk clustering:
Quorum disk on the shared storage
System disks for both clustered virtual machines on local storage
Incompatible with VMotion or VMware HA
[Diagram: clustered virtual machines running vpxd, connected to LAN and SAN]
For additional details refer to the following documentation:
http://www.vmware.com/pdf/VC_MSCS.pdf
Rise of the Phoenix - Disaster Recovery
[Diagram: Primary and Standby sites, each with a VirtualCenter Server and a Database Server. VI Services: replication of state files between the VirtualCenter Servers. VI Inventory: standard DR solution between the Database Servers]
VirtualCenter Disaster Recovery Overview
Disaster recovery solution consists of three pieces:
VI Inventory data
Use the standard DR solution of the database vendor
VI Services
Cold Standby – Re-install VirtualCenter and restore local
configuration files
Warm Standby – Pre-install VirtualCenter and synchronize local
configuration files, but keep 2nd instance disconnected
All other infrastructure services: AD, DNS, etc.
Use existing product-specific solutions
Cold Standby Recovery Procedure
Maintain separate up to date copy of local configuration files
Disaster!
Install fresh VirtualCenter instance
Install config files; connect to standby DB
Able to assign primary’s IP to standby?
yes: ESX Server hosts reconnect automatically
no: Run script to reconnect all ESX Server hosts
Done
For additional details refer to the new VI perl toolkit:
http://www.vmware.com/go/viperl/scripts
Warm Standby Recovery Procedure
Keep local configuration files synchronized between primary and standby
Replication of configuration files through host-based replication or backup tools
Also allows for scripting and automation
Disaster!
Able to assign primary’s IP to standby?
yes: ESX Server hosts reconnect automatically
no: Run script to reconnect all ESX Server hosts
Done
Faster end-to-end recovery (RTO)
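Monitoring Individual VirtualCenter Components
Entity | Sub entity | Component | Metric | Tool to use
VirtualCenter Server | Virtual Center | Virtual Center Service - vpxd | vpxd.exe found | WMI
VirtualCenter Server | Virtual Center | Virtual Center Service - vpxd | vpxd is running as service | WMI
VirtualCenter Server | Virtual Center | Virtual Center Certificates | files exist | WMI
VirtualCenter Server | Virtual Center | Virtual Center Certificates | size | WMI
VirtualCenter Server | Virtual Center | Virtual Center Certificates | modification date | WMI
VirtualCenter Server | License Server | License Server Service | exe file exists | WMI
VirtualCenter Server | License Server | License Server Service | service is up and running | WMI
VirtualCenter Server | License Server | License Files | files exist | WMI
VirtualCenter Server | License Server | License Files | size | WMI
VirtualCenter Server | License Server | License Files | modification date | WMI
VirtualCenter Server | Web Service | Web Service | exe file exists | WMI
VirtualCenter Server | Web Service | Web Service | service is up and running | WMI
VirtualCenter Server | Web Service | Web Page | http://localhost is reachable | Perl or vbs script – HTTP GET
Host System (where VC is running on) | System | | critical files (?) are found | WMI
Host System (where VC is running on) | System | | CPU load < 90% | WMI
Host System (where VC is running on) | System | | System disk <80% FULL | WMI
Host System (where VC is running on) | System | | Network OK | WMI
Database server | | ODBC Connection | Connection works | WMI
Database server | | Database integrity | Some "select" statement on critical tables | sql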
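The Web Page check in the table ("http://localhost is reachable", via an HTTP GET script) can be written in a few lines. The deck names Perl or VBScript; the sketch below uses Python's standard library instead, and the function name and timeout are illustrative choices.

```python
import urllib.error
import urllib.request

def web_page_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on url succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP errors and refused connections,
        # ValueError covers malformed URLs
        return False

if __name__ == "__main__":
    print(web_page_reachable("http://localhost"))
```

A monitoring job would run this on a schedule and raise an alert after several consecutive failures, so that a single slow response does not trigger a false alarm.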
VirtualCenter – Important Files (Details)
File | Contents | Location
Flex License files | License keys for all server-based licensed features | C:\Program Files\VMware\VMware License Server\Licenses
VirtualCenter Configuration file | Governs the behavior of VirtualCenter Server and its interaction with ESX Server hosts as well as client programs, such as the VI Client. | C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg
SSL certificate | The certificate used to authenticate all communication with ESX Server hosts. | %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\SSL
Upgrade files | In special circumstances, it might be necessary to make custom changes to the agent which gets pushed to ESX Server hosts as they are added to the environment | C:\Program Files\VMware\VMware VirtualCenter 2.0\upgrade
VirtualCenter Communication Ports
Communication Channel | Default Port Number and Protocol | Configurable?
VirtualCenter to ESX Server host | 902; TCP and UDP | Not easily (requires numerous manual config file changes)
VirtualCenter to VI Client | 902; TCP and 443; TCP for initiation | Yes, via VI Client
Web browser to VirtualCenter | 80 & 443; TCP | Yes, via VI Client
3rd party SDK client to VirtualCenter | 80 & 443; TCP | Yes, via VI Client
Web browser to virtual machine console | 903; TCP | No
VI Client to virtual machine console | 903; TCP | No
VI Client to ESX Server host | 902; TCP and 443; TCP for initiation | No
Web browser to ESX Server host | 443; TCP | No
3rd party SDK client to ESX Server host | 443; TCP | No
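When a firewall sits between clients, VirtualCenter, and the ESX Server hosts, a quick TCP connect test against the default ports above helps confirm they are open. The sketch below uses only the Python standard library; the port labels and function names are illustrative, and it checks TCP only, so the UDP side of port 902 is not covered.

```python
import socket

# Default TCP ports toward an ESX Server host, taken from the table above
ESX_HOST_TCP_PORTS = {
    "management (VirtualCenter, VI Client)": 902,
    "HTTPS (web browser, SDK, initiation)": 443,
    "virtual machine console": 903,
}

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_ports(host: str, ports=ESX_HOST_TCP_PORTS) -> dict:
    """Map each labeled port to whether it accepted a connection."""
    return {label: tcp_port_open(host, port) for label, port in ports.items()}
```

A successful connect only shows that something is listening on the port; it does not prove that the expected service is the one answering.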
Overview of Performance Statistics
VPXD rolls up real time performance statistics to historical five minutes statistics and sends to database
[Diagram: Hostd on each host performs real time statistics collection; VirtualCenter stores historical statistics in the database]
Past Day: 5 min samples; rollup every 30 minutes
Past Week: 30 min samples; rollup every 2 hours
Past Month: 2 hr samples; rollup every 24 hours
Past Year: 24 hr samples; purge after one year
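The rollup schedule above determines how many samples the database keeps per counter. A short calculation makes that concrete; the 30-day month and 365-day year are assumptions made for the arithmetic, and the names are illustrative.

```python
from datetime import timedelta

# (sample interval, retention period) per historical interval
INTERVALS = {
    "Past Day": (timedelta(minutes=5), timedelta(days=1)),
    "Past Week": (timedelta(minutes=30), timedelta(weeks=1)),
    "Past Month": (timedelta(hours=2), timedelta(days=30)),   # assumed 30-day month
    "Past Year": (timedelta(hours=24), timedelta(days=365)),  # assumed 365-day year
}

def samples_retained(sample: timedelta, retention: timedelta) -> int:
    """Number of samples kept per counter for one historical interval."""
    return retention // sample

per_interval = {name: samples_retained(*spec) for name, spec in INTERVALS.items()}
print(per_interval, sum(per_interval.values()))
```

Multiplying the total by the number of counters collected and the number of monitored entities gives a rough row count, which is why lower statistics levels shrink the database.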
Database Sizing for Optimal Performance
Memory - databases are most efficient when given enough
memory to cache their working data set
Disk I/O - sufficient disk device bandwidth for log devices to
prevent transactions from bottlenecking on disk I/O
CPU - SQL Server is designed to use parallel processing
whenever possible
SQL Server’s TEMPDB - used extensively in VMware
VirtualCenter 2.0 not as much in VMware VirtualCenter 2.5
VMware VirtualCenter 2.5 Performance Tips
Configure statistics collection levels (see next slide)
Configure past day and past week statistics at level 3
Configure past month and past year statistics at level 1
This keeps fewer performance statistics for historical data
Benefits
Significant reduction in the size of the VMware VirtualCenter database
Significant reduction in processing of rollups
Thank you
Reference Document Index
1. Application scope considerations
a) Classify application systems sensibly from the standpoint of VMware virtualization technology
b) Consider both applications suited to deployment and those not suited
c) How different types of applications consume system resources
PO2570-Advanced Capacity Planning in Virtual Environments.ppt
2. Server procurement considerations
a) Given VMware software support, what to watch for when choosing CPUs at purchase time
TA2641-Turning Apples into Oranges VMware Enhanced VMotion Compatibility.ppt
TA2823-Alleviating Constraints with Resource Pools Live Migration with Enhanced VMotion.ppt
b) Best practices for the ratio of CPU to memory
VI2189-Best Practices for Successful VI Design.ppt
c) When and where I/O, network, PCI cards, and other devices need special configuration, and how to approach it
TA2441-VI3 Networking Concepts and Best Practices.ppt
TA2554-VI3 Networking Advanced Configurations and Troubleshooting.ppt
PO3580-Solving SAN Connectivity and Management Challenges in VI3.ppt
3. Virtual machine deployment considerations
a) How to plan the number of virtual machines and find the best balance (VM count vs. VM performance)
TA2421-DRS Technical Overview and Best Practices.ppt
drs_performance_best_practices_wp.pdf
b) Which principles or strategies (such as mixing different application types) give better results
TA2550-ESX Server Best Practices for Performance.ppt
c) How to monitor virtual machine and physical host performance data over time, and which key factors to analyze in order to tune the virtualized environment
PO1882-Advanced Log Analysis for VMware ESXESXi 3.5.ppt
PO2708-Advanced Performance Tuning for VMware Infrastructure 3.ppt
TA2375-Interpreting Performance Statistics in VMware VI3.ppt
4. Management and maintenance considerations
a) What to consider and watch for in physical-to-virtual migration
b) What to consider and watch for in virtual-to-virtual migration
VC2.5 Guided Consolidation and Converter.ppt
c) Effectively monitoring virtual machine and physical host status and performance
TA2375-Interpreting Performance Statistics in VMware VI3.ppt
TA1401-Understanding “Host” and “Guest” Memory Usage and Related Memory Management Concepts.ppt
PO3008-Timekeeping and Time-sensitive Applications Best Practices.ppt
PO3078-Optimizing the Network and Management Systems to Support Your Virtual Infrastructure.ppt
d) Which foundational and critical data of the virtualized environment must be backed up
PO2601-VMware VirtualCenter 2.5 Database Best Practices.ppt
PS_BC23_Bulletproof VirtualCenter - A Guide to Protecting VirtualCenter.ppt