FCoE Overview
IEEE CommSoc/SP Chapter
Austin, Texas, May 21 2009
Tony Hurson
Networked Storage History
NAS (Network Attached Storage): clients and servers reach a fileserver over an Ethernet (TCP/IP) network
File-based data transfer (e.g., NFS, CIFS)
Fabric characteristics:
• Packet drop on buffer full
• High-low latency
• High-low throughput
• No multipathing
SAN (Storage Area Network): servers reach an FC Target System over a Fibre Channel Storage Area Network
SCSI block data transfer
Fabric characteristics:
• Lossless
• Low latency
• High throughput
• Reliable
• Redundant paths (failover)
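SCSI Read, Write over FC
[Diagram: frame flows between Host and Target for a SCSI Read and a SCSI Write]
SCSI Read: Host sends FCP_CMND; Target returns FCP_DATA frames, then FCP_RSP.
SCSI Write: Host sends FCP_CMND, followed by unsolicited FCP_DATA (modest amount); Target replies with FCP_XFER_RDY; Host sends further FCP_DATA frames; Target returns FCP_RSP.
Exchange: the complete command-to-response flow.
Sequence: a run of FCP_DATA frames (may be out of order).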
FC Fabric Port Terminology
[Diagram: hosts and a target attach through N_Ports to switch F_Ports; the two switches interconnect through E_Ports]
N_Port - Host or Target endpoint
F_Port - Endpoint-facing switch port
E_Port - Inter-switch port
Virtualization adds a ‘V’ prefix to all of these
FC Routing
Fabric Shortest Path First (FSPF)
• Based on OSPF (IP)
• “Static” Routing Tables per switch
• Chooses shortest paths (hop counts)
• Load balances multiple paths
• Handles link failover automatically
[Diagrams: a mesh of switches, shown for shortest-path selection, load balancing, and link failover]
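The hop-count and multipath bullets can be sketched in a few lines. This is only an illustration, assuming a breadth-first search over an undirected switch graph; `shortest_next_hops` and the switch names A to D are hypothetical, and real FSPF exchanges link-state records between switches rather than running on a prebuilt link list.

```python
from collections import deque

def shortest_next_hops(links, src):
    """Return {dest: (hop_count, [equal-cost next hops])} as seen from src."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {src: 0}
    next_hops = {src: set()}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj.get(node, ())):
            # First hop toward nbr: nbr itself if adjacent to src, else inherited.
            hops = {nbr} if node == src else next_hops[node]
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                next_hops[nbr] = set(hops)
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another shortest path: keep its first hop for load balancing.
                next_hops[nbr] |= hops
    return {d: (dist[d], sorted(next_hops[d])) for d in dist if d != src}

# Hypothetical four-switch square: A-B, A-C, B-D, C-D.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
table = shortest_next_hops(links, "A")
# Failover: recompute after the A-B link goes down.
table_after_failure = shortest_next_hops([l for l in links if l != ("A", "B")], "A")
```

With both links up, switch A has two equal-cost next hops toward D and can spread traffic across them; after the A-B failure the table collapses to the surviving path.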
Ethernet Routing
• Dynamic Scheme: Source Learning
• If unicast DstMAC is not in lookup table, flood frame to all ports except its source port.
• Note source port of SrcMAC in lookup table, if not already present.
• Age/invalidate lookup entries.
• Similar flooding behavior for multicast.
• Precludes loops in fabric (a flooded frame would circulate endlessly).
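A minimal sketch of the source-learning scheme above, for illustration only. The class name `LearningSwitch`, the 300-second default age, and the injectable `clock` are assumptions made for this example, not part of the talk.

```python
import time

def is_multicast(mac):
    """Group bit = least significant bit of the first octet."""
    return bool(int(mac.split(":")[0], 16) & 1)

class LearningSwitch:
    def __init__(self, ports, max_age=300.0, clock=time.monotonic):
        self.ports = list(ports)
        self.max_age = max_age      # assumed ageing time, in seconds
        self.clock = clock
        self.table = {}             # MAC -> (port, last_seen)

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source, then return the list of ports to forward on."""
        now = self.clock()
        # Age/invalidate stale lookup entries.
        self.table = {m: (p, t) for m, (p, t) in self.table.items()
                      if now - t <= self.max_age}
        # Note (or refresh) the source port of SrcMAC.
        self.table[src_mac] = (in_port, now)
        entry = self.table.get(dst_mac)
        if entry is not None and not is_multicast(dst_mac):
            out_port = entry[0]
            return [] if out_port == in_port else [out_port]
        # Unknown unicast or multicast: flood to all ports except the source.
        return [p for p in self.ports if p != in_port]

# Usage with a fake clock so ageing can be demonstrated.
now = [0.0]
sw = LearningSwitch([1, 2, 3], max_age=300.0, clock=lambda: now[0])
HOST_A, HOST_B = "02:00:00:00:00:0a", "02:00:00:00:00:0b"
first = sw.receive(1, HOST_A, HOST_B)    # B unknown: flood
reply = sw.receive(2, HOST_B, HOST_A)    # A was learned on port 1
second = sw.receive(1, HOST_A, HOST_B)   # B now known on port 2
now[0] = 1000.0                          # both entries age out
aged = sw.receive(3, "02:00:00:00:00:0c", HOST_A)
```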
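FC Frame Format
Frame layout: SOF | Frame Header | Opt. header | Payload (2KB + Markers) | CRC | EOF
Frame Header (32-bit words, bits 31..0):
Word 0: R_CTL (bits 31-24), D_ID (bits 23-0)
Word 1: S_ID (bits 23-0)
Word 2: Type, F_CTL
Word 3: SEQ_ID, DF_CTL, SEQ_CNT
Word 4: OX_ID, RX_ID
Word 5: Parameter
S_ID, D_ID - Fabric-assigned (Fabric Login) source, destination [V]N_Port identifiers
SEQ_ID, SEQ_CNT - Sequence trackers
OX_ID, RX_ID - Local, Remote Exchange Identifiers, used to look up Exchange state at endpoints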
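To make the header layout concrete, here is a hedged sketch that packs and unpacks the six header words with Python's `struct` module. The function names and the example field values are illustrative assumptions. The `cs_ctl` byte (top byte of word 1) does not appear on the slide; it is included here as an assumption so that word 1 fills 32 bits.

```python
import struct

# Field widths in bits. cs_ctl is an assumption (not shown on the slide).
WIDTHS = {"r_ctl": 8, "d_id": 24, "cs_ctl": 8, "s_id": 24, "type": 8,
          "f_ctl": 24, "seq_id": 8, "df_ctl": 8, "seq_cnt": 16,
          "ox_id": 16, "rx_id": 16, "parameter": 32}

def pack_fc_header(f):
    """Pack a dict of header fields into 24 big-endian bytes."""
    for name, bits in WIDTHS.items():
        if not 0 <= f[name] < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    words = (
        (f["r_ctl"] << 24) | f["d_id"],
        (f["cs_ctl"] << 24) | f["s_id"],
        (f["type"] << 24) | f["f_ctl"],
        (f["seq_id"] << 24) | (f["df_ctl"] << 16) | f["seq_cnt"],
        (f["ox_id"] << 16) | f["rx_id"],
        f["parameter"],
    )
    return struct.pack(">6I", *words)

def unpack_fc_header(data):
    """Inverse of pack_fc_header for the first 24 bytes of data."""
    w0, w1, w2, w3, w4, param = struct.unpack(">6I", data[:24])
    return {
        "r_ctl": w0 >> 24, "d_id": w0 & 0xFFFFFF,
        "cs_ctl": w1 >> 24, "s_id": w1 & 0xFFFFFF,
        "type": w2 >> 24, "f_ctl": w2 & 0xFFFFFF,
        "seq_id": w3 >> 24, "df_ctl": (w3 >> 16) & 0xFF,
        "seq_cnt": w3 & 0xFFFF,
        "ox_id": w4 >> 16, "rx_id": w4 & 0xFFFF,
        "parameter": param,
    }

# Illustrative values only; the identifiers are made up for this example.
example = {"r_ctl": 0x06, "d_id": 0x010200, "cs_ctl": 0, "s_id": 0x010300,
           "type": 0x08, "f_ctl": 0x290000, "seq_id": 1, "df_ctl": 0,
           "seq_cnt": 0, "ox_id": 0x1234, "rx_id": 0xFFFF, "parameter": 0}
header = pack_fc_header(example)
```

An endpoint would typically key its Exchange state on the OX_ID or RX_ID recovered by `unpack_fc_header`.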
Protocol Stack History and Comparison
Three stacks, in chronological order of development (top layer first):
FC: SCSI; FCP and FC-3 (Mapping, Discovery, Services, Recovery); FC-2V (Transport); FC-1 (Link); FC-0 (PHY)
iSCSI: SCSI; iSCSI (Mapping); TCP (Transport); IP (Network); Ethernet (Link, PHY)
FCoE: SCSI; FCP and FC-3 (Mapping, Discovery, Services, Recovery); FC-2V (Transport); FCoE (Encap/decap); Lossless Ethernet (Link, PHY)
Lossless Ethernet – via PAUSE
When port receive buffer fills to a high watermark (HWM), issue PAUSE XOFF to link peer; when buffer drains to low watermark (LWM), issue PAUSE XON to peer.
[Diagram: two link peers (each a Switch or Endpoint) joined by an Ethernet link. Each side has a port receive packet buffer with HWM and LWM thresholds driving an outbound PAUSE generator, and a port transmit buffer gated by inbound PAUSE.]
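The watermark rule is a simple hysteresis loop. The sketch below is an illustration under assumed names (`PausingReceiveBuffer`, and the strings "XOFF" and "XON" standing in for real PAUSE frames); it is not taken from any particular MAC implementation.

```python
class PausingReceiveBuffer:
    """Receive buffer that asks the link peer to stop and restart sending."""

    def __init__(self, capacity, hwm, lwm):
        if not 0 <= lwm < hwm <= capacity:
            raise ValueError("need 0 <= lwm < hwm <= capacity")
        self.capacity, self.hwm, self.lwm = capacity, hwm, lwm
        self.level = 0
        self.peer_paused = False

    def enqueue(self, nbytes):
        """A frame arrived. Returns 'XOFF' when the high watermark is crossed."""
        if self.level + nbytes > self.capacity:
            raise OverflowError("buffer overrun: frame would be dropped")
        self.level += nbytes
        if not self.peer_paused and self.level >= self.hwm:
            self.peer_paused = True
            return "XOFF"
        return None

    def dequeue(self, nbytes):
        """Data drained. Returns 'XON' once the low watermark is reached."""
        self.level = max(0, self.level - nbytes)
        if self.peer_paused and self.level <= self.lwm:
            self.peer_paused = False
            return "XON"
        return None

# Units are arbitrary here (think of them as kilobytes).
buf = PausingReceiveBuffer(capacity=10, hwm=8, lwm=2)
events = [buf.enqueue(5), buf.enqueue(3), buf.enqueue(1),
          buf.dequeue(4), buf.dequeue(3)]
```

The gap between `hwm` and `capacity` matters in practice: frames already in flight when XOFF is issued still need somewhere to land.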
FCoE Early Deployment Example
[Diagram: a Lossless, Converged Ethernet Fabric connecting the elements below]
To/From internet, through a Firewall
Presentation Tier: 20, 4-way SMP diskless blades
Application Tier: 8, 16-way SMP diskless blades
Database Tier: Large SMP
FCoE-FC gateway to the FC fabric and FC Storage Array
FCoE Frame Format
Fields, in order (32-bit words, bits 31..0):
EtherType = FCoE_TYPE
Version
SOF
Encapsulated FC Frame (n words)
EOF
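A hedged sketch of the encapsulation follows. It assumes the FC-BB-5 layout as I understand it: EtherType 0x8906, a 14-byte FCoE header (4-bit version, reserved bits, 1-byte SOF), and a 4-byte trailer (1-byte EOF plus reserved bytes). VLAN tagging and the Ethernet FCS are left out. The function names are hypothetical, and the SOF/EOF code points in the usage lines should be checked against the standard before being relied on.

```python
import struct

FCOE_ETHERTYPE = 0x8906

def fcoe_encapsulate(dst_mac, src_mac, fc_frame, sof, eof, version=0):
    """Wrap an FC frame (header + payload + CRC) in an FCoE Ethernet frame."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if len(fc_frame) % 4:
        raise ValueError("FC frame must be a whole number of 32-bit words")
    eth_header = dst_mac + src_mac + struct.pack(">H", FCOE_ETHERTYPE)
    # Version in the top 4 bits, then reserved bits, then the SOF byte.
    fcoe_header = bytes([version << 4]) + bytes(12) + bytes([sof])
    trailer = bytes([eof]) + bytes(3)
    return eth_header + fcoe_header + fc_frame + trailer

def fcoe_decapsulate(frame):
    """Return (fc_frame, sof, eof); assumes no VLAN tag, padding, or FCS."""
    (ethertype,) = struct.unpack(">H", frame[12:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    return frame[28:-4], frame[27], frame[-4]

# Illustrative usage. 0x2E and 0x42 are believed to be the SOFi3 and EOFt
# code points; treat them as assumptions.
fc_frame = bytes(range(28))
wire = fcoe_encapsulate(bytes.fromhex("0efc00010203"),
                        bytes.fromhex("020000000001"),
                        fc_frame, sof=0x2E, eof=0x42)
```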
FCoE Endpoint Model
Stack, top to bottom: FC-3/FC-4; FC-2V; VN_Port; FCoE_LEP (alongside the FIP mgmt. protocol); Lossless Ethernet MAC; to lossless Eth. Fabric
FIP - FCoE Initialization Protocol; initiates Fabric Logins with the FCoE switch (FCF).
Each Fabric Login establishes a VN_Port and a VN_Port - VF_Port logical connection.
Each VN_Port has a unique MAC address, server- or fabric-provided.
FCoE_LEP - link endpoint, performs encapsulation/decapsulation of FC frame.
FCoE Switch Functional Model
[Diagram: an FC Switch (FC-SW-5) element with a native E_Port (to FC fabric) and F_Port (to FC endpoint), plus virtual VF_Ports and a VE_Port. The virtual ports sit on FCoE_LEPs. Each of three Lossless Ethernet MACs (FCF-MAC) has its own FIP mgmt. protocol entity and connects to the lossless Eth. Fabric.]
Converged Ethernet
• AKA Data Center Bridging (DCB). Run up to four major traffic classes on a single 10 GbE fabric. In order of market prevalence:
  - Networking (TCP/IP, lossy).
  - Block Storage (lossless FCoE, or lossless/lossy iSCSI).
  - Management (“heartbeat” traffic, low bandwidth, but must get through).
  - Inter-Process Communication (clustered computing: high bandwidth, low latency, lossless preferred).
Groundwork for DCB
• IEEE 802.1Qaz – ETS & DCBX – bandwidth allocation to major traffic classes (Priority Groups); plus DCB management protocol.
• IEEE 802.1Qbb – Priority PAUSE. Selectively PAUSE traffic on link by Priority Group.
• IEEE 802.1Qau – Dynamic Congestion Notification.
IEEE 802.1Qaz Enhanced Transmission Selection
• Support at least 3 Priority Groups/traffic classes.
• PGs identified by Priority field of existing 802.1Q VLAN Tag.
• Configured Bandwidth per PG has 1% resolution.
• PG15 has limitless bandwidth (use sparingly; intended for Management).
• Work Conservation – if the wire’s free, use it.
ETS Configuration Example
• PG0 (Storage): 40% of port b/w
• PG1 (Networking): 20% of port b/w
• PG2 (IPC): 40% of port b/w
• PG15 (mgmt): limitless
• If a PG underutilizes, others can fill the space.
• Typical implementation: Deficit Weighted Round Robin (DWRR).
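The 40/20/40 split and the work-conservation rule can be tried out with a small DWRR sketch. The function `dwrr`, its `quantum_unit` parameter, and the fixed 1000-byte packets are assumptions made for this example. PG15 is left out; a limitless group would typically be served ahead of the weighted groups.

```python
from collections import deque

def dwrr(queues, weights, budget_bytes, quantum_unit=100):
    """Serve queues of packet sizes by weight; return bytes sent per group.

    queues:  {pg: deque of packet sizes in bytes}
    weights: {pg: configured share in percent}
    """
    deficit = {pg: 0 for pg in queues}
    sent = {pg: 0 for pg in queues}
    total = 0
    while total < budget_bytes and any(queues.values()):
        for pg, q in queues.items():
            if not q:
                deficit[pg] = 0      # idle groups bank no credit
                continue
            deficit[pg] += weights[pg] * quantum_unit
            while q and q[0] <= deficit[pg] and total < budget_bytes:
                size = q.popleft()
                deficit[pg] -= size
                sent[pg] += size
                total += size
    return sent

weights = {"PG0": 40, "PG1": 20, "PG2": 40}

# All three groups backlogged: shares follow the configured weights.
busy = {pg: deque([1000] * 1000) for pg in weights}
sent_busy = dwrr(busy, weights, budget_bytes=1_000_000)

# PG1 (Networking) idle: PG0 and PG2 fill the space it leaves.
idle_pg1 = {"PG0": deque([1000] * 1000), "PG1": deque(),
            "PG2": deque([1000] * 1000)}
sent_idle = dwrr(idle_pg1, weights, budget_bytes=1_000_000)
```

Skipping empty queues is what makes the scheduler work-conserving: the link is never held idle for a group with nothing to send.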
IEEE 802.1Qbb Priority PAUSE
[Diagram: two link peers (each a Switch or Endpoint) on a 10 GbE link. Sender: output queues, by traffic class (PG0 - Storage, PG1 - Networking, PG2 - IPC, PG15 - Management), feeding an ETS DWRR scheduler. Receiver: receive buffers, by traffic class, with a lossless buffer for Storage and a lossy buffer for Networking. The receiver issues a Priority PAUSE for PG0 only.]
Generally, Networking (TCP/IP) should NEVER be PAUSEd.
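For readers who want to see what a Priority PAUSE looks like on the wire, here is a hedged sketch of a frame builder. It assumes the usual MAC Control layout: destination 01-80-C2-00-00-01, EtherType 0x8808, opcode 0x0101, a priority-enable vector, and eight 16-bit timers. The function name is hypothetical, and mapping the Storage group to 802.1p priority 3 is an assumption for the example, not something the slide states.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101

def build_pfc_frame(src_mac, pause_quanta):
    """Build a Priority PAUSE frame (without FCS).

    pause_quanta: {priority 0-7: pause time in quanta, 0-65535}.
    A time of 0 resumes that priority (XON).
    """
    enable_vector = 0
    timers = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7 or not 0 <= quanta <= 0xFFFF:
            raise ValueError("priority must be 0-7, quanta 0-65535")
        enable_vector |= 1 << prio
        timers[prio] = quanta
    body = struct.pack(">HH8H", PFC_OPCODE, enable_vector, *timers)
    frame = (PFC_DST_MAC + src_mac
             + struct.pack(">H", MAC_CONTROL_ETHERTYPE) + body)
    return frame.ljust(60, b"\x00")   # pad to minimum frame size

STORAGE_PRIORITY = 3   # assumed priority for the Storage group
xoff = build_pfc_frame(bytes.fromhex("020000000001"),
                       {STORAGE_PRIORITY: 0xFFFF})
xon = build_pfc_frame(bytes.fromhex("020000000001"),
                      {STORAGE_PRIORITY: 0})
```

Only the Storage priority is named in the enable vector, so Networking traffic on the same link keeps flowing.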
IEEE 802.1Qau Dynamic Congestion Control Background
• Lossless fabrics are prone to congestion spreading (congestion trees).
• Ethernet-FC gateways with their different port speeds (10 GbE; 8 Gbps) are natural bottlenecks.
• ETS Work Conservation model adds fuel to fire.
• Solution: switches/endpoints notify traffic sources of incipient congestion, via feedback messages; sources reduce rates accordingly.
Congestion Notification in Action
1. Source endpoint, supporting ‘n’ Congestion Controlled Flows, tags each outbound packet with CCF#.
2. Switch (or dest. Endpoint) detects incipient congestion; issues Congestion Notification Message (CNM) back to data source.
3. Source reacts to CNM, reducing tx rate. Source recovers its rate over time and via byte counting.
[Diagram: Data flows from the source toward the destination endpoint; the CNM flows back to the source]
Congestion Control at Endpoint Transmit
IEEE 802.1Qaz/Qau Endpoint
[Diagram: per-class transmit queues (PG0 - Storage, PG1 - Networking, PG2 - IPC, PG15 - Management), 802.1Qau Congestion Control rate limiters (“Reaction Points”), and the 802.1Qaz ETS DWRR scheduler feeding the 10 GbE link]
Typical implementation: byte-based token buckets.
Shallow buckets (2 - 6 packets) for rapid CNM response.
Shallow queues (2 - 6 packets) for rapid CNM response.
CNMs only slow down RPs. Rate recovery is internal (byte- and time-based).
Incoming Congestion Notification Messages (CNMs) - “Slow Down!”
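A rough sketch of one Reaction Point, loosely following the description above: a shallow byte-based token bucket whose rate is cut by an incoming CNM and then recovers internally by byte counting. The class name, the parameters, the 50% maximum cut, and the halve-the-gap recovery step are all assumptions chosen for illustration; the time-based recovery path is omitted, and this is not the algorithm as specified in 802.1Qau.

```python
class ReactionPoint:
    def __init__(self, line_rate_bps, bucket_bytes=6 * 1500,
                 recovery_bytes=150_000, min_rate_bps=10_000_000):
        self.rate = line_rate_bps        # current permitted rate
        self.target = line_rate_bps      # rate to recover toward
        self.bucket_bytes = bucket_bytes # shallow: about 6 packets
        self.tokens = bucket_bytes
        self.recovery_bytes = recovery_bytes
        self.min_rate = min_rate_bps
        self.bytes_since_step = 0

    def add_tokens(self, elapsed_s):
        """Refill at the current rate, capped at the bucket depth."""
        refill = self.rate * elapsed_s / 8
        self.tokens = min(self.bucket_bytes, self.tokens + refill)

    def try_send(self, size):
        """Spend tokens for one packet; recover rate by byte counting."""
        if size > self.tokens:
            return False
        self.tokens -= size
        self.bytes_since_step += size
        while self.bytes_since_step >= self.recovery_bytes:
            self.bytes_since_step -= self.recovery_bytes
            # Close half of the remaining gap to the pre-CNM rate.
            self.rate = (self.rate + self.target) / 2
        return True

    def on_cnm(self, severity):
        """CNM received; severity in (0, 1] scales a cut of up to 50%."""
        self.target = self.rate
        self.rate = max(self.min_rate, self.rate * (1 - 0.5 * severity))
        self.bytes_since_step = 0

rp = ReactionPoint(10e9, recovery_bytes=3000)
rp.on_cnm(1.0)                     # "Slow Down!": 10 Gbps becomes 5 Gbps
rate_after_cnm = rp.rate
sent = [rp.try_send(1500), rp.try_send(1500)]   # 3000 bytes: one recovery step
rate_after_recovery = rp.rate
blocked = rp.try_send(9000)        # only 6000 bytes of tokens remain
```

Nothing in the sketch ever raises the rate in response to a message; as on the slide, CNMs only slow the Reaction Point down.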
FCoE Summary
• Presents new, but very familiar, PHY and Link Layers for FC.
• Core switching discipline remains FC-SW-5.
• Higher FC layers almost completely unchanged (that’s the legacy value!)
• Biggest Ethernet-level requirement: lossless fabric.
• Part of Converged Ethernet initiative – lots of ancillary activity at IEEE.
Further Reading
• FCoE: www.t11.org
• IEEE 802.1Q(az|au|bb): www.ieee.org
• Thank you! Questions?