iSER on InfiniBand

(and SCTP)
Problem Statement
• Currently defined IB Storage I/O protocol
– SRP (SCSI RDMA Protocol)
– SRP does not have a discovery or management
protocol
– SRP does not have a wide following
• The RDMA Consortium voted overwhelmingly to
create iSER instead of porting SRP to IP
• SRP is missing the newer functions of iSCSI & iSER
– Immediate Data
– Unsolicited Data
• Version 2 is at Level 0 & has not been updated for 1.5
years
• SCTP is not defined for iSER
Reason for iSER over IB or SCTP
• Would like to have the same basic Storage
Protocol across all RDMA Networks
– Easier to train staff
– Easier to create bridging products
– Motivate storage industry into an iSCSI/iSER mentality
– May help the acceptance of iSCSI/iSER on IP networks
• Desire for a common Discovery and Management
protocol across iSCSI, iSER/iWARP, and IB
– Want the same Management and discovery process
and Software to handle IP networks and IB networks
Similarities (iWARP / IB)
• Local STags / L_Key
• Remote STags / R_Key
• RDMA SendSE
• RDMA SendInvSE (New)
• RDMA Read/Write
• Shared RQs (New)
• ZBTOs / ZBVA (New)
Proposed New Logical Structure
           +-------------------------------------+
           |                SCSI                 |
           +-------------------------------------+
           |                iSCSI                |
DI ------> +-------------------------------------+
           |                iSER                 |
           +-------------------------------------+
           |                RDMAP                |
           +------------------------+------------+
           |          DDP           |            |
           +--------------+---------+ InfiniBand |
           |     MPA      |         |    (RC)    |
           +--------------+  SCTP   |            |
           |     TCP      |         |            |
           +--------------+---------+------------+
Example of iSCSI/iSER Layering in Full Feature Mode
Clarify the Term iWARP
• Update the iSER Draft
– Use term iWARP to mean either TCP or SCTP
implementations
– Use the term iWARP/TCP to mean iWARP
over a TCP/IP base
– Use the term iWARP/SCTP to mean iWARP
over an SCTP base
Clarify the Term RDMAP
• Update the iSER Draft:
– Use the term RDMAP to mean any RDMA
protocol over iWARP, InfiniBand, or any other
carrier of RDMA Protocols
– Use the term RDMAP/iWARP to mean an
implementation using iWARP
– Use the term RDMAP/IB to mean the
implementation using InfiniBand
– Etc.
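The proposed terminology and the layering diagram can be sketched as a small lookup. This is only an illustration: the names STACKS and rdmap_term are hypothetical, and the layer lists are read off the diagram rather than taken from the draft text.

```python
# Layer stacks as read off the layering diagram (top to bottom).
# The dictionary and function names are illustrative assumptions.
STACKS = {
    "iWARP/TCP": ["SCSI", "iSCSI", "iSER", "RDMAP", "DDP", "MPA", "TCP"],
    "iWARP/SCTP": ["SCSI", "iSCSI", "iSER", "RDMAP", "DDP", "SCTP"],
    "IB": ["SCSI", "iSCSI", "iSER", "RDMAP", "InfiniBand (RC)"],
}


def rdmap_term(carrier: str) -> str:
    """Return the qualified RDMAP term for a carrier, per the proposed usage."""
    if carrier not in STACKS:
        raise ValueError(f"unknown carrier: {carrier}")
    # "iWARP" covers both the TCP and the SCTP implementations.
    family = "iWARP" if carrier.startswith("iWARP") else carrier
    return f"RDMAP/{family}"


if __name__ == "__main__":
    for carrier, layers in STACKS.items():
        print(rdmap_term(carrier), "<-", " / ".join(layers))
```

Everything from iSER upward is identical in all three stacks, which is the point of the proposal.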
Things to be addressed for
iSER on IB or SCTP
1. Defining, Addressing and Discovery of IB
Storage Nodes
2. Handling of Login (SCTP or IB)
3. Selection of one path to storage vs. others
4. Handling older IB networks
– (Network equipment with pre-1.2 architecture)
I. Addressing
• Background: IB has IP addressing
– Part of IP-over-IB (IPoIB)
– Proposal for Mapping Port to IB ServiceID
• IETF IPS WG should validate that:
– iSCSI and iSER Discovery and Management
can operate with IB via IPoIB
– If validated, may not even require normative
changes to draft
• IBTA (InfiniBand Trade Association) should
standardize the Port-to-ServiceID mapping
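To make the idea of a Port-to-ServiceID mapping concrete, here is a minimal sketch. The slides leave the actual mapping to the IBTA, so the layout below (a fixed prefix in the upper 48 bits and the port in the lower 16 bits of the 64-bit ServiceID) and the prefix value are placeholders for illustration only.

```python
# PLACEHOLDER_PREFIX and the bit layout are assumptions, not the IBTA mapping.
PLACEHOLDER_PREFIX = 0x0000_0000_0001  # upper 48 bits of the 64-bit ServiceID
ISCSI_PORT = 3260                      # well-known iSCSI TCP port


def port_to_service_id(port: int) -> int:
    """Embed a 16-bit port number in a 64-bit IB ServiceID."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return (PLACEHOLDER_PREFIX << 16) | port


def service_id_to_port(service_id: int) -> int:
    """Recover the port, rejecting ServiceIDs outside the placeholder range."""
    if service_id >> 16 != PLACEHOLDER_PREFIX:
        raise ValueError("ServiceID is not in the mapped range")
    return service_id & 0xFFFF


if __name__ == "__main__":
    sid = port_to_service_id(ISCSI_PORT)
    print(f"port {ISCSI_PORT} -> ServiceID 0x{sid:016x}")
```

A reversible mapping of this kind would let a target listening on an IP port be reached over IB without new discovery fields.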
II. Login
• SCTP and IB need a way to send the iSCSI
Login PDUs
• SCTP and IB need a way to transition to full iSER
mode
• IETF IPS WG discussion needed to ascertain
the best way to do this
– iSCSI assumes that TCP/IP streaming is used
– But iSER does not care, as long as it can transition into
Full RDMA mode
– iSER Spec needs language to permit this
• No need to define details, just language to permit
• Leave details up to implementations
– May have examples in Appendix, or separate
informational drafts
III. Path Selection
• A target could have several types of portal groups
– iSCSI, iSER/TCP, iSER/SCTP, IB, …
– Some Host Systems may prefer one type vs. others
• Can leave this completely up to implementation
– Therefore not an IETF IPS issue (except informational)
– For IB let IBTA standardize connection approach
• Preference for direct Endport connection
• Preference for iSER Gateway vs. IPoIB
And/Or
• Can add TPG type information to:
– SendTargets, SLP, iSNS
– Would be an IETF IPS issue
IV. Handling older RDMA Networks
• May be an IETF IPS Workgroup issue
Or
• May be out of scope as a compatibility Hack
However:
• Some applications have requested these features:
– VA Based TO
– Explicit Invalidates only
• Toleration Language and Hello Flags permit both
Reference
• http://www.haifa.il.ibm.com/satran/ips/iSER-in-an-IB-network-V9.pdf
Backups
I. Defining, Addressing and Discovery
• IB nodes are addressed via a GID (Global ID)
• With IP-over-IB (IPoIB) all nodes have Normal IP
addresses
• IP Addresses are converted to GIDs via ARP
– Returned like MAC Address
• Therefore, SendTargets, SLP and iSNS can
continue to function in the same way
• SendTargets, SLP, and iSNS can all use normal
TCP/IP via IPoIB
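The "returned like MAC address" step can be illustrated by decoding an IPoIB link-layer address. This sketch assumes the 20-octet layout from the IPoIB encapsulation specification (RFC 4391) as I understand it: one reserved/flags octet, a 24-bit QP number, then the 128-bit GID. The function name and sample values are hypothetical.

```python
import ipaddress


def parse_ipoib_hwaddr(hw: bytes):
    """Split a 20-octet IPoIB link-layer address into (QPN, GID).

    Assumed layout: 1 reserved/flags octet, 3-octet QP number, 16-octet GID.
    """
    if len(hw) != 20:
        raise ValueError("IPoIB link-layer address must be 20 octets")
    qpn = int.from_bytes(hw[1:4], "big")
    # A GID has the same 128-bit format as an IPv6 address.
    gid = ipaddress.IPv6Address(hw[4:20])
    return qpn, gid


if __name__ == "__main__":
    # Made-up address, as an ARP reply might carry it.
    sample = bytes([0x00]) + (0x48).to_bytes(3, "big") + bytes.fromhex(
        "fe80000000000000" "0002c90300001234"
    )
    qpn, gid = parse_ipoib_hwaddr(sample)
    print(f"QPN=0x{qpn:06x} GID={gid}")
```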
II. Handling of Login
• iSCSI Login depends on the value of
MaxRecvDataSegmentLength = 8192
• iSCSI Login & Login Reply is basically a half duplex
process
• IB (and SCTP) can send Login PDUs to Target with <= 8K
data
– IB Node will work with RC connections using “RDMA Sends”
– Flow control is not an issue (the exchange is half duplex), and the
receive buffer to be queued needs at most 8192 bytes + iSCSI header
– Transition to iSER mode is not something special in IB
• Therefore, words are being proposed for the Login to be
done in IB with Sends (or normal SCTP messages)
– iSCSI Login PDUs remain unchanged
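The buffer sizing argument above can be sketched as follows. It assumes the 48-byte iSCSI Basic Header Segment and ignores AHS and digests; the helper names are hypothetical and this is not taken from the proposed draft language.

```python
ISCSI_BHS_LEN = 48     # iSCSI Basic Header Segment, in bytes
LOGIN_MAX_DATA = 8192  # MaxRecvDataSegmentLength in effect during Login


def login_recv_buffer_size(max_data: int = LOGIN_MAX_DATA) -> int:
    """Size of the one receive buffer that must be posted before each Send."""
    return ISCSI_BHS_LEN + max_data


def split_login_text(text: bytes, max_data: int = LOGIN_MAX_DATA) -> list:
    """Split Login key=value text so each chunk fits one Login PDU.

    Login is half duplex, so the peer only ever needs one buffer of
    login_recv_buffer_size() bytes outstanding at a time.
    """
    chunks = [text[i:i + max_data] for i in range(0, len(text), max_data)]
    return chunks or [b""]


if __name__ == "__main__":
    print("receive buffer:", login_recv_buffer_size(), "bytes")
    parts = split_login_text(b"k=v\x00" * 5000)
    print("PDUs needed:", len(parts), [len(p) for p in parts])
```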
III. Selection of Paths to Storage
• In an IB environment it is useful to have a way to
select an IB Storage Endpoint in preference to
– An IB-to-iSCSI or IB-to-iSER/iWARP Gateway,
or
– An iSCSI/TCP IPoIB Gateway to IP Network
• And a way to select an IB-to-iSCSI or
IB-to-iSER/iWARP Gateway in preference to
– An iSCSI/TCP IPoIB Gateway to IP Network
• This is done via IB defined connection process
– Being addressed in the IBTA
– Not an IETF issue
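The preference order described on this slide can be sketched as a simple ranking. The enum and function names are hypothetical; the real selection would happen in the IB-defined connection process, which this does not model.

```python
from enum import IntEnum


class PathKind(IntEnum):
    """Lower value means more preferred, following the order on the slide."""
    IB_ENDPOINT = 0    # native IB storage endpoint
    IB_GATEWAY = 1     # IB-to-iSCSI or IB-to-iSER/iWARP gateway
    IPOIB_GATEWAY = 2  # iSCSI/TCP over IPoIB gateway to the IP network


def pick_path(available):
    """Return the most preferred path kind among those offered."""
    if not available:
        raise ValueError("no path to storage available")
    return min(available)


if __name__ == "__main__":
    offered = [PathKind.IPOIB_GATEWAY, PathKind.IB_GATEWAY]
    print("chosen:", pick_path(offered).name)
```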
IV. Handling Older IB Networks
(ZBTOs vs. VABTOs)
• Some IB Networks will not support ZBTO
– They require a VA (VABTOs)
• By using a previously reserved bit in the
Hello/HelloReply message Initiators can
request VABTOs
– Can treat the Actual STag and VABTO as a
Virtual STag (96 bits instead of 32 bits) in
iSER Headers (only)
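A minimal sketch of the 96-bit Virtual STag idea: the 32-bit STag and a 64-bit VA-based TO are carried together as one 12-byte field. The field order and big-endian encoding here are assumptions for illustration; the slides do not specify the wire layout.

```python
import struct

# Assumed layout: 32-bit STag followed by 64-bit VA, network byte order.
_VIRTUAL_STAG = struct.Struct(">IQ")  # 4 + 8 = 12 bytes = 96 bits


def pack_virtual_stag(stag: int, va: int) -> bytes:
    """Combine the actual STag and the VABTO into one 96-bit field."""
    return _VIRTUAL_STAG.pack(stag, va)


def unpack_virtual_stag(buf: bytes):
    """Recover (STag, VA) from a 96-bit Virtual STag."""
    return _VIRTUAL_STAG.unpack(buf)


if __name__ == "__main__":
    field = pack_virtual_stag(0x1234ABCD, 0x00007F00DEADB000)
    print(len(field) * 8, "bits:", field.hex())
    print(unpack_virtual_stag(field))
```

Because the combined field only appears in iSER headers, the layers above iSER would still see a single opaque tag.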
IV. Handling Older IB Networks
(Missing Auto HW Invalidate)
• Some IB network nodes cannot issue
SendInvSE type messages
– Can just get by with SendSE type message
– iSER requires Initiator side invalidates
• Some IB network nodes cannot receive
SendInvSE and then automatically invalidate
STags
– Initiator tells Target by using previously reserved bit in
Hello Message
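How a Target might act on the Hello bits can be sketched as below. The bit positions, flag names, and the plan_target_behavior helper are all hypothetical; the slides only say that previously reserved bits are used.

```python
from enum import IntFlag


class HelloFlags(IntFlag):
    """Hypothetical bit assignments for the previously reserved Hello bits."""
    VABTO = 0x1               # Initiator requests VA-based tagged offsets
    NO_AUTO_INVALIDATE = 0x2  # Initiator cannot auto-invalidate on SendInvSE


def plan_target_behavior(flags: HelloFlags, target_can_send_inv: bool = True) -> dict:
    """Decide which offset style and response message type the Target uses."""
    use_inv = target_can_send_inv and not (flags & HelloFlags.NO_AUTO_INVALIDATE)
    return {
        "offsets": "VABTO" if flags & HelloFlags.VABTO else "ZBTO",
        "response_message": "SendInvSE" if use_inv else "SendSE",
        # Without SendInvSE the Initiator must invalidate its STags itself.
        "initiator_invalidates_explicitly": not use_inv,
    }


if __name__ == "__main__":
    print(plan_target_behavior(HelloFlags(0)))
    print(plan_target_behavior(HelloFlags.VABTO | HelloFlags.NO_AUTO_INVALIDATE))
```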