Transcript NIC Teaming
TA05
VMware
Infrastructure 3 Networking –
Advanced Configuration
and Troubleshooting
Jean Lubatti
Product Support Engineer, VMware
Agenda
Components of the Networking Stack
Virtual NIC overview and troubleshooting
VSwitch overview
PortGroups overview
VLANs
VST, EST and VGT
The native VLAN
NIC Teaming
Port Id based, IP hash based
Reverse teaming
Beaconing and shotgun
Rolling Failover
VSwitch advanced options
Security settings
Notify switch
VMKernel Network Traffic
Command Line Utilities
Advanced Troubleshooting summary
Q&A
Virtual NICs overview
A virtual NIC is an emulated layer 2 device used to connect to
the vSwitch
Each virtual NIC has a MAC address of its own and does address based
filtering
No need for implementation of a PHY (Physical Layer)
No auto-negotiation
Speed/Duplex/Link are irrelevant
Ignore speed/duplex reported in the guest OS
Actual speed of operation depends on the CPU cycles available and
speed of the uplinks
Different types of Virtual NICs
Virtual adapter for VMs
VLance, vmxnet, enhanced vmxnet (for ESX 3.5.0) and E1000
Vswif for Service console
Vmknic for VMKernel
Troubleshooting Virtual NICs
Check the VM configuration
Make sure the guest OS
recognizes the virtual adapter
and loads the appropriate driver
Use utilities like lspci, lsmod, Device
Manager etc
Check the guest OS and VM logs
for any obvious errors
MAC address conflict can occur
only if
You manually set conflicting MAC
addresses
After manually copying VMs, you
choose not to regenerate a new UUID
when prompted
Unplugging / replugging a vNIC
changes the virtual port ID!
Troubleshooting Virtual NICs
It is possible to manually turn
off advanced vNIC features
This may help troubleshooting
But do not jump to conclusions!
0x0: All disabled
0x1: Zero copy enabled
0x2: TSO enabled
0x3 (or nothing set): Zero copy and TSO enabled (default)
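The values above read as a two-bit mask. A minimal sketch of how such a mask decodes, assuming bit 0x1 is zero copy and bit 0x2 is TSO as the table suggests; the function name `describe_vnic_features` is hypothetical, and the slide does not name the configuration option that carries the value.

```python
# Bit meanings taken from the table above; names are illustrative only.
FEATURE_BITS = {0x1: "zero copy", 0x2: "TSO"}

def describe_vnic_features(mask=None):
    """Return the features enabled by a mask; None means the option is not set."""
    if mask is None:
        mask = 0x3  # per the slide, "nothing" behaves like the default 0x3
    enabled = [name for bit, name in FEATURE_BITS.items() if mask & bit]
    return enabled or ["all disabled"]

for value in (0x0, 0x1, 0x2, 0x3, None):
    print(value, describe_vnic_features(value))
```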
VSwitch overview
Software implementation of an ethernet switch
How is it similar to a physical switch?
Does MAC address based forwarding
Provides standard VLAN segmentation
Configurable
Uplink aggregation
How is it different?
Does not need to learn MAC addresses
It knows the MAC addresses of the virtual NICs connecting to it
Packets not destined for a VM are forwarded outside
Single tier topology
No need to participate in Spanning Tree Protocol
Can do rate limiting
VSwitch overview: Spanning Tree Protocol
STP is a link management
protocol that prevents
network loops
Loops are not possible
within the same vSwitch
No packet entering a
vSwitch will ever be
allowed to go back to the
physical network
Two vSwitches cannot be
connected
Single level topology
Loops are not possible
inside ESX without a
layer 2 bridging VM
[Diagram: ESX Server with virtual machines, virtual NICs, a VMKernel NIC and vSwitches connected through physical NICs to physical switches; link labels 1000 Mbps and 100 Mbps]
PortGroups overview
PortGroups are configuration templates for ports on the
vSwitch
Efficient way to specify the type of network connectivity needed
by a VM
PortGroups specify
VLAN Configuration
Teaming policy (can override vSwitch setting)
Layer 2 security policies (can override vSwitch setting)
Traffic shaping parameters (can override vSwitch setting)
PortGroups are not VLANs
PortGroups do not segment the vSwitch into separate broadcast
domains unless they have different VLAN IDs
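A minimal sketch of the "can override vSwitch setting" idea: a PortGroup inherits each vSwitch policy unless it sets its own value. The dictionary keys and the function name `effective_policy` are hypothetical and are not ESX configuration names.

```python
def effective_policy(vswitch_policy, portgroup_overrides):
    """Merge PortGroup overrides onto vSwitch defaults; None means 'inherit'."""
    policy = dict(vswitch_policy)
    for key, value in portgroup_overrides.items():
        if value is not None:
            policy[key] = value
    return policy

# Hypothetical policy keys, for illustration only.
vswitch = {"teaming": "port-id", "promiscuous": False, "shaping_kbps": None}
portgroup = {"teaming": "ip-hash", "promiscuous": None}

print(effective_policy(vswitch, portgroup))
```

Only the teaming policy changes in this example; the security and shaping settings stay at the vSwitch values.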
PortGroup overview: Configurations
VLANs: Virtual Switch Tagging
Most commonly deployed configuration and recommended setup
The vSwitch does the tagging/untagging
Physical switch port should be a trunk port
Number of VLANs per VM is limited to the number of vNICs
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitches carrying VLAN 104, VLAN 105 and VLAN 106 to the physical switch. Callouts: 802.1Q tagged frames on the physical NIC; vSwitch tags and strips the frames]
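To make the tagging and stripping concrete, here is a minimal sketch of what inserting and removing an 802.1Q tag does to an Ethernet frame. It illustrates the standard tag layout only and is not the vSwitch implementation; the function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag

def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    tci = (priority << 13) | vlan_id
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]

def strip_tag(frame: bytes):
    """Return (vlan_id, untagged_frame); vlan_id is None if no tag is present."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None, frame
    return tci & 0x0FFF, frame[:12] + frame[16:]

# Dummy frame: zeroed MAC addresses, IPv4 EtherType, short payload.
untagged = bytes(12) + b"\x08\x00" + b"payload"
tagged = tag_frame(untagged, 105)
print(strip_tag(tagged)[0])
```

In VST mode this work happens on the vSwitch, so the guest only ever sees the untagged form.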
VLANs: External Switch Tagging
No configuration required on the ESX Server
VLAN tagging and stripping is done by the physical switch
The vSwitch does not tag or strip the frames
Number of VLANs supported is limited to the number of physical NICs on the ESX server
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitches; physical NICs (100 Mbps and 1000 Mbps links) connect to a physical switch carrying VLAN 105 and VLAN 106 to the rest of the network. Callouts: vSwitch receives untagged frames; physical switch is responsible for the tagging and stripping]
VLANs: Virtual Guest Tagging
PortGroup VLAN ID is set to 4095
Tagging and stripping of VLAN IDs happens in the guest VM
802.1q software/driver in the VM
In VGT mode guest can send/receive any VLAN tagged frame
Number of VLANs per guest is not limited to the number of vNICs
VMware does not ship an 802.1q vmxnet driver
Windows: Only with E1000
Linux: dot1q module
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitches (PortGroup VLAN 4095) connected to the physical switch. Callouts: VLAN tagging and stripping software/driver needed in the VM; vSwitch does not tag or strip the frames]
VLANs: Native VLAN
Using the native VLAN is fully supported on ESX
However, it is important to remember which part of the network infrastructure is tagging and untagging the frames!
Default native VLAN is often VLAN 1
If you have to use default native VLAN on a VST configuration
Use a PortGroup with no VLAN ID set
The vSwitch won’t deliver untagged frames to the VM unless the PortGroup has no VLAN specified.
[Diagram: VM with VLAN ID 1 on a virtual switch (VLAN 1); frames not tagged between the virtual switch and a physical switch with native VLAN ID 1; physical machine with VLAN ID 1]
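The delivery rule on this slide can be sketched as a small predicate. This is an illustration of the rule as stated, not ESX code; the name `delivers` is hypothetical, and treating VLAN ID 4095 as "pass everything, including untagged frames" is an assumption.

```python
VGT_VLAN_ID = 4095  # PortGroup VLAN ID used for Virtual Guest Tagging

def delivers(portgroup_vlan, frame_vlan):
    """Decide whether a frame from the uplink reaches ports in a PortGroup.

    portgroup_vlan: None when the PortGroup has no VLAN ID set.
    frame_vlan: None when the frame arrives untagged.
    """
    if portgroup_vlan == VGT_VLAN_ID:
        return True  # assumption: VGT PortGroups see every frame
    return portgroup_vlan == frame_vlan

# The physical switch sends native VLAN 1 untagged, so a PortGroup
# configured with VLAN ID 1 never matches those frames.
print(delivers(1, None))     # False
print(delivers(None, None))  # True
```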
VLANs: Troubleshooting
Remember “who” should tag.
The ESX or the physical switch?
It cannot be both!
Trunk encapsulation should be
set to 802.1q
No ISL, LANE etc.
Trunking should be static and unconditional
No Dynamic Trunking Protocol (DTP)
Manually specify all the VLANs to be trunked
No VLAN Trunking Protocol (VTP)
Disallow unnecessary VLAN IDs on the physical switch port
ESX won’t spend time processing unnecessary broadcasts
[Diagram callouts: The physical switch sees multiple VLAN IDs on the same port; configure the switch to expect frames with VLAN ID 105 and 106 on this port; the physical switch port needs to be configured as a trunk port]
NIC Teaming
Allows for multiple active NICs to be used in a teaming
configuration
User can choose the policy for distribution of traffic across the NICs
Standby uplinks replace active uplinks when active uplinks fail to
meet specified criteria
[Diagram: vSwitch with VM ports 1 to 14 and uplink ports A to F; A and B are marked Active, C and D are marked Standby]
NIC Teaming: Failure criteria
Use vimsh
hostsvc/net/portgroup_set
Conservative defaults:
Speed > 10Mb
Duplex = full
Beacons received
Other possible settings
Percentage of errors
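A minimal sketch of the conservative default criteria listed above, written as a check on one uplink. The `UplinkState` fields and the function name are hypothetical; the percentage-of-errors setting is left out because the slide gives no threshold for it.

```python
from dataclasses import dataclass

@dataclass
class UplinkState:
    speed_mbps: int
    full_duplex: bool
    beacons_received: bool

def meets_default_criteria(uplink: UplinkState) -> bool:
    """Apply the three defaults from the slide: speed, duplex, beacons."""
    return (
        uplink.speed_mbps > 10
        and uplink.full_duplex
        and uplink.beacons_received
    )

print(meets_default_criteria(UplinkState(1000, True, True)))   # True
print(meets_default_criteria(UplinkState(10, True, True)))     # False
```

An active uplink that fails this check is the one a standby uplink would replace.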
NIC Teaming:
PortGroup based Teaming Configuration
Teaming policy attributes can vary by PortGroups on a single vSwitch
Four load balancing policies
Originating Port ID based
Source MAC address based
IP hash based
Explicit failover order
[Diagram: VM ports 1 to 14 grouped into PortGroups on one vSwitch, each PortGroup with its own Active and Standby selection among uplink ports A to F]
NIC Teaming: Port Id (or MAC Hash)
Both policies rely on a given VM MAC address always using the same outgoing physical NIC
Port-ID is the default and is recommended over MAC hash
Load balancing on a per vNIC basis
Both allow teaming across physical switches in the same broadcast domain
Requires the physical switch not to be aware of the teaming
The physical switch learns the MAC/switch port association
Inbound traffic is received on the same NIC
Power operations or connect operations on a vNIC will increment the port ID!
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch; physical NICs connected to the physical switch]
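A minimal sketch of per-vNIC load balancing by port ID. The modulo mapping is an assumption used for illustration (the slides do not state the exact formula), and the uplink names are hypothetical.

```python
def uplink_for_port(port_id: int, active_uplinks: list) -> str:
    """Pin a virtual port to one uplink; assumed mapping is port ID modulo team size."""
    return active_uplinks[port_id % len(active_uplinks)]

team = ["vmnic0", "vmnic1"]

# The same port ID always maps to the same uplink...
print(uplink_for_port(4, team))
# ...but a power or connect operation that moves the vNIC to port 5
# can move its traffic to the other uplink.
print(uplink_for_port(5, team))
```

Because each vNIC stays on one uplink, the physical switch sees each MAC address on a single port, which is why it does not need to be aware of the team.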
NIC Teaming: IP hash
Uplink chosen based on Source and Destination IP Address
Load balancing on a per connection basis
Requires physical switch to be aware of the teaming
Does not allow teaming across physical switches
Inbound traffic can be received on any one of the uplinks
Need to enable Link Aggregation on the physical switch ports
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch; three physical NICs connected to one physical switch. Callout: The switch sees VM2’s MAC address on all three ports]
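A minimal sketch of uplink selection by IP hash. Combining the two addresses with XOR and taking the result modulo the team size is an assumption made for illustration; the slides only say the choice depends on source and destination IP address. Addresses and uplink names are hypothetical.

```python
import ipaddress

def uplink_for_flow(src: str, dst: str, active_uplinks: list) -> str:
    """Pick an uplink from the source/destination pair (assumed XOR-modulo hash)."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return active_uplinks[(s ^ d) % len(active_uplinks)]

team = ["vmnic0", "vmnic1", "vmnic2"]

# One VM talking to three destinations can use three different uplinks,
# so the physical switch sees its MAC address on several ports.
for dst in ("10.0.0.9", "10.0.0.8", "10.0.0.11"):
    print(dst, uplink_for_flow("10.0.0.5", dst, team))
```

This is why the physical switch ports must be aggregated: without link aggregation, the switch would see the same MAC address moving between ports.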
NIC teaming: Reverse Teaming
VMs can receive duplicate broadcast/multicast packets
Reverse teaming eliminates this
Receive frames only from an uplink port we would have used to transmit
Optimizes local traffic on the vSwitch
Drop external frames with local source MAC addresses
If using port id or MAC hash based teaming don’t enable link aggregation on the physical switch
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch; physical NICs connected to the physical switch]
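The two filtering rules above can be sketched as a single inbound check. This illustrates the rules as stated on the slide and is not the vSwitch implementation; the function name, parameters and MAC addresses are hypothetical.

```python
def accept_inbound(src_mac, arrival_uplink, transmit_uplink, local_macs):
    """Reverse-teaming filter for one destination port.

    transmit_uplink is the uplink this port would have used to transmit.
    """
    if src_mac in local_macs:
        return False  # external frame carrying a local source MAC: drop
    return arrival_uplink == transmit_uplink

local = {"00:50:56:00:00:01", "00:50:56:00:00:02"}

# A broadcast copy arriving on the "other" uplink is dropped, so the VM
# receives it only once.
print(accept_inbound("00:11:22:33:44:55", "vmnic1", "vmnic0", local))  # False
print(accept_inbound("00:11:22:33:44:55", "vmnic0", "vmnic0", local))  # True
```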
NIC Teaming: Link redundancy
Failure detection
Link status
Beacon Probing
Rolling Failover (equivalent to Failback set to `No`)
NIC Teaming: Beacon Probing
Beacon probing attempts to
detect failures which don’t
result in a link state failure
for the NIC
Broadcast frames sent
from each NIC in the team
should be seen by other
NICs in the team (no IP
hash!)
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch; physical NICs connected to physical switches, which connect to the core switch / upstream infrastructure]
NIC Teaming: Beacon Probing and “shotgun”
NICs not receiving beacons no longer meet the minimum criteria and are discarded
If all NICs are discarded, all NICs will be used!
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch; physical NICs connected to physical switches, with a failure somewhere between them and the core switch / upstream infrastructure]
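A minimal sketch of the "shotgun" fallback: uplinks that miss beacons are discarded, but when every uplink has been discarded the whole team is used again. The function name and uplink names are hypothetical.

```python
def usable_uplinks(beacon_ok: dict) -> list:
    """beacon_ok maps uplink name to whether it still receives beacons."""
    healthy = [nic for nic, ok in beacon_ok.items() if ok]
    # Shotgun: with nothing left to choose from, fall back to every NIC.
    return healthy if healthy else list(beacon_ok)

print(usable_uplinks({"vmnic0": True, "vmnic1": False}))
print(usable_uplinks({"vmnic0": False, "vmnic1": False}))
```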
NIC Teaming:
Rolling failover (3.0.X) and Failback (3.5.0)
For it to have any effect, rolling failover requires at least one standby NIC
Does not make sense with IP hash teaming
Named differently in 3.0.X and 3.5.X
Example case scenario:
Service Console PortGroup
HA
VMKernel PortGroup
iSCSI/NAS
Use link state tracking as an alternative
[Animated diagram, recoverable labels: Switch goes down; Switch comes back; Active NIC; Standby NIC; New active NIC; New standby NIC; Isolated!]
But STP still blocks the uplink!
NIC Teaming: Troubleshooting
The switch ports should have consistent VLAN
configuration
Multi-switch configurations
Make sure the NICs are in the same broadcast domain
Do not use IP hash based teaming policy across multiple physical
switches
Link Aggregation needs to be enabled on the switch ports for IP hash
based teaming
Configure physical switch LA to be static and
unconditional
No support for PAgP or LACP negotiation
NIC Teaming: Tips
Use port-id based NIC teaming in a multi-switch configuration
Use different types of NICs in a team. E.g.
Intel and Broadcom
Onboard and PCI card
For faster failovers
Disable Link Auto-negotiation
Follow STP recommendations
Use standby adapters and rolling failover when availability is an absolute must
Beaconing
Upgrade to 3.0.2
Use Link State Tracking as an alternative
Not needed on fat tree topology
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch; an onboard Intel NIC and a Broadcom PCI card connected to physical switches and the rest of the network]
vSwitch advanced options: Security settings
Promiscuous Mode
If allowed, guest receives all
frames on the vSwitch
Some applications need
promiscuous mode
Network sniffers
Intrusion detection systems
MAC Address Change
If allowed, malicious guests
can spoof MAC addresses
Forged Transmits
If allowed, malicious guests can spoof MAC addresses or cause MAC
Flooding
Security settings should reflect application requirements
Some applications might need to forge or change MAC addresses
E.g.: Microsoft NLB in unicast mode works by forging MAC addresses.
Vswitch advanced option: Notify Switch
Client MAC address is notified
to the switch via RARP packet
Allows the physical switch to
learn the MAC address of the
client immediately
Why RARP?
L2 broadcast reaches all switches
L3 information not required
Switch notified whenever
New client comes into existence
MAC address changes
Teaming status changes
Settings should reflect application requirements
[Diagram: ESX Server with virtual machines, virtual NICs, VMKernel NIC and vSwitch sending a RARP packet through the physical NICs to the physical switch. Callout: The switch learns the MAC address and updates its tables]
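To show why a RARP packet suits this purpose, here is a minimal sketch that builds a broadcast RARP frame for a given client MAC address. The EtherType and header layout follow the standard RARP format; the specific field choices (zeroed IP addresses, client MAC as both sender and target) are assumptions and are not confirmed as what ESX sends. The MAC address is hypothetical.

```python
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_RARP = 0x8035

def build_rarp_notification(client_mac: bytes) -> bytes:
    """Build an L2 broadcast RARP request whose source is the client MAC."""
    if len(client_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ethernet = BROADCAST_MAC + client_mac + struct.pack("!H", ETHERTYPE_RARP)
    # Hardware type 1 (Ethernet), protocol 0x0800 (IPv4), lengths 6 and 4,
    # opcode 3 (reverse request).
    header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 3)
    # No L3 information is needed, so both IP fields are left as zeros.
    body = client_mac + bytes(4) + client_mac + bytes(4)
    return ethernet + header + body

frame = build_rarp_notification(bytes.fromhex("005056000001"))
print(len(frame), frame[12:14].hex())
```

Every switch that floods the broadcast sees the client MAC as the source address and can update its forwarding table at once.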
Vmkernel Network Traffic
VMKernel TCP/IP Stack routing table determines packet flow
Put IP Storage and VMotion on separate subnets for isolation
Else traffic will go through the same vmknic: No Isolation
If multiple vmknics in a subnet are connected to the same vSwitch
Outgoing traffic is seen only on one vmknic
Only limited load balancing based on IP hash
VLAN segmentation won’t help isolate outgoing traffic between the vmknics
[Diagram: ESX Server where iSCSI, VMotion and NFS traffic pass through the VMKernel TCP/IP stack and its routing table to the vmknics, then through vSwitches to the physical NICs]
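A minimal sketch of why two vmknics in the same subnet give no isolation: a routing-table lookup returns one interface per destination subnet. The longest-prefix, first-match behaviour shown here is an assumption for illustration, and the interface names and addresses are hypothetical.

```python
import ipaddress

def pick_vmknic(dest: str, vmknics: list, default=None):
    """Return the first vmknic with the longest matching prefix for dest.

    vmknics is a list of (name, "address/prefix") pairs.
    """
    addr = ipaddress.ip_address(dest)
    best = None
    for name, cidr in vmknics:
        net = ipaddress.ip_interface(cidr).network
        if addr in net and (best is None or net.prefixlen > best[1]):
            best = (name, net.prefixlen)
    return best[0] if best else default

vmknics = [
    ("vmk0", "10.0.1.10/24"),  # e.g. IP storage
    ("vmk1", "10.0.1.11/24"),  # same subnet: never selected for outgoing traffic
    ("vmk2", "10.0.2.10/24"),  # separate subnet: isolated
]

for dest in ("10.0.1.50", "10.0.1.99", "10.0.2.50"):
    print(dest, pick_vmknic(dest, vmknics))
```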
Vmkernel Traffic: Troubleshooting
cat /proc/vmware/net/tcpip/ifconfig
Use vmkping
Ping uses Service Console TCP/IP stack
Vmkping uses VMKernel TCP/IP stack
Command Line Utilities
esxcfg-vswitch
esxcfg-nics
esxcfg-vswif
esxcfg-vmknic
Command Line Utilities: vimsh
Shell interface
Low-level interface to VI
Use tab for completion
Powerful command line
interface
Advanced troubleshooting:
Key principles
Always remember which equipment is supposed to do the VLAN tagging
Always remember what an L2 infrastructure is. A given MAC should only
be advertised/used at a single point of the infrastructure.
Always remember what the failure criteria on a NIC are, and how
ESX can respond to the failure.
Rule out one layer after the other
Several aggregation types are possible
Several types of VLAN tagging are possible (even if VST is preferred)
Several types of physical NICs are supported and use different drivers
Several virtual NICs are available
Virtual NIC features can be individually disabled
Failover can be fine tuned
Advanced troubleshooting:
Check the network hint
Every NIC collects a trace of the type of traffic seen on it
The hint is purely informational
Wildly different hints on two cards in the same vSwitch, especially for EST, are
usually a good sign that the two cards are not in the same broadcast domain
Can also be obtained on the command line (see vimsh)
Advanced Troubleshooting:
Collecting Network Traces on the vSwitch
Run tcpdump/wireshark/netmon inside a VM or in the Service
Console
Traffic visibility depends on the PortGroup policy settings
Allow Promiscuous Mode
VLAN segmentation rules apply
Use VGT by setting VLAN ID to 4095
Intra VM traffic is captured.
Q&A
Session ID: TA05
VI3 Networking: Advanced
Configurations and Troubleshooting
Jean Lubatti, VMware
Special thanks to:
Srinivas Neginhal, VMware
Emiliano Turra, VMware