
100 Gb/s InfiniBand Transport
over up to 100 km
Klaus Grobe and Uli Schlegel, ADVA Optical Networking, and
David Southwell, Obsidian Strategics,
TNC2009, Málaga, June 2009
Agenda
 InfiniBand in Data Centers
 InfiniBand Distance Transport
InfiniBand in Data Centers
Connectivity performance
[Figure: fiber link capacity (b/s) vs. time, 1990–2010 (100M to 10T, log scale), for WDM, InfiniBand, Ethernet, and FC; inset: bandwidth per direction (Gb/s) vs. time, 2008–2011, for IB QDRx1 through HDRx12 (10 to 640 Gb/s). Adapted from: Ishida, O., “Toward Terabit LAN/WAN” Panel, iGRID2005]
 Bandwidth requirements follow Moore’s law (number of transistors on a chip)
 So far, both Ethernet and InfiniBand have outpaced Moore’s growth rate
InfiniBand Data Rates
InfiniBand               IBx1        IBx4        IBx12
Single Data Rate, SDR    2.5 Gb/s    10 Gb/s     30 Gb/s
Double Data Rate, DDR    5 Gb/s      20 Gb/s     60 Gb/s
Quad Data Rate, QDR      10 Gb/s     40 Gb/s     120 Gb/s
IB uses 8B/10B coding, e.g., IBx1 DDR has 4 Gb/s throughput
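A minimal sketch (Python, illustrative only) of the arithmetic behind the table above: data rate = per-lane signalling rate × lane count × 8B/10B coding efficiency.

```python
# Effective IB payload throughput: lane rate x lane count x 8B/10B efficiency.
SIGNAL_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}  # per lane
CODING_EFFICIENCY = 8 / 10                                 # 8B/10B line code

def data_rate_gbps(rate: str, lanes: int) -> float:
    """Payload throughput of an IBx<lanes> link at the given signalling rate."""
    return SIGNAL_RATE_GBPS[rate] * lanes * CODING_EFFICIENCY

for rate in ("SDR", "DDR", "QDR"):
    for lanes in (1, 4, 12):
        print(f"IBx{lanes:<2} {rate}: {data_rate_gbps(rate, lanes):5.1f} Gb/s data")
# e.g. IBx1 DDR -> 4.0 Gb/s, IBx4 QDR -> 32.0 Gb/s, IBx12 QDR -> 96.0 Gb/s
```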
Copper
 Serial (x1; rarely seen on the market)
 Parallel copper cables (x4, x12)
Fiber Optic
 Serial for x1 and SDR x4 LX (serialized interface)
 Parallel for x4, x12
Converged Architectures
[Figure: storage/interconnect protocol stacks below the operating system / application and the Small Computer System Interface (SCSI) layer:
iSCSI: iSCSI / TCP / IP / Ethernet (lossy)
FCIP: FCP / FCIP / TCP / IP / Ethernet (lossy)
iFCP: FCP / iFCP / TCP / IP / Ethernet (lossy)
DCB: FCP / FCoE / DCB Ethernet (lossless)
InfiniBand: SRP / IB (lossless)
Latency decreases and performance increases from the TCP/IP-based stacks toward DCB and InfiniBand]
SRP – SCSI RDMA Protocol
HPC Networks today
[Figure: server clusters attached via IB HCAs to an InfiniBand fabric, via FC HBAs to an FC SAN, and via GbE HBAs to an Ethernet LAN]
Relevant Parameters
 LAN HBAs based on GbE/10GbE
 SAN HBAs based on 4G/8G-FC
 IB HCAs based on IBx4 DDR/QDR
Typical HPC Data Center today
 Dedicated networks / technologies for LAN, SAN, CPU (server) interconnect
 Consolidation required (management complexity, cables, cost, power)
InfiniBand Distance Transport
Generic NREN
Large, dispersed metro campus, or cluster of campuses
[Figure: data centers (DC) across the campus with dedicated (P2P) connections to large data centers and a connection to the backbone (NREN); network elements: core (backbone) routers, layer-2 switches, OXC / ROADMs; DC = large data center]
InfiniBand-over-Distance
Difficulties and solution considerations
Technical difficulties:
 IB-over-copper – limited distance (<15 m)
 IB-to-XYZ conversion – high latency
 No IB buffer credits in today’s switches for distance transport
 High-speed serialization and E-O conversion needed
Requirements:
 Lowest latency, and hence highest throughput, is a must
 Interworking must be demonstrated
InfiniBand Flow Control
 InfiniBand is credit-based per virtual lane (up to 16 VLs)
 On initialization, each fabric end-point declares its capacity to receive data
 This capacity is described as its buffer credit
 As buffers are freed up, end-points post messages updating their credit status
 InfiniBand flow control happens before transmission, not after it – lossless transport
 Optimized for short signal flight time; the small buffers inside the ICs limit the effective range to ~300 m (see the sketch below)
[Figure: credit loop between HCA A and HCA B – (1) data is read from system memory at A, (2) transferred across the IB link, (3) written into system memory at B, (4) a credit update is returned to A]
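To make the credit mechanism concrete, here is a minimal sketch (Python, illustrative only, not the IB wire protocol) of a sender that may only transmit while it holds credits and a receiver that returns credits as its buffer drains.

```python
# Credit-based, lossless flow control on one virtual lane (simplified model).
from collections import deque

class Receiver:
    def __init__(self, buffer_packets):
        self.capacity = buffer_packets   # advertised to the sender at link init
        self.buffer = deque()

    def accept(self, packet):
        # Lossless by construction: the sender never exceeds our credits,
        # so the buffer can never overflow.
        assert len(self.buffer) < self.capacity
        self.buffer.append(packet)

    def drain(self):
        """Move one packet into 'system memory'; return the freed credit."""
        if self.buffer:
            self.buffer.popleft()
            return 1
        return 0

class Sender:
    def __init__(self, initial_credits):
        self.credits = initial_credits   # learned from the receiver at link init

    def try_send(self, packet, rx):
        if self.credits == 0:
            return False                 # stall (back-pressure), never drop
        self.credits -= 1
        rx.accept(packet)
        return True

# Toy run: a small buffer forces the sender to stall until credits return.
rx = Receiver(buffer_packets=4)
tx = Sender(initial_credits=rx.capacity)
sent = stalled = 0
for i in range(12):
    if tx.try_send(f"pkt{i}", rx):
        sent += 1
    else:
        stalled += 1
    if i % 3 == 0:                       # receiver drains only every few cycles
        tx.credits += rx.drain()
print(f"sent={sent} stalled={stalled}")  # -> sent=8 stalled=4
```

The invariant credits + buffered packets ≤ receiver capacity is what makes the transport lossless: a sender stalls instead of dropping when credits run out.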
InfiniBand Throughput vs. Distance
[Figure: throughput vs. distance – with sufficient B2B credits throughput stays flat; without additional B2B credits it drops sharply after a short distance]
 Only sufficient buffer-to-buffer (B2B) credits, in conjunction with error-free optical transport, can ensure maximum InfiniBand performance over distance
 Without additional B2B credits, throughput drops significantly after a few tens of meters because receive credits cannot be restored fast enough to keep the pipe full
 The required buffer credit size depends directly on the desired distance (see the sizing sketch below)
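A rough sizing sketch (Python, assumed numbers: one 10 Gb/s channel, ~5 µs/km propagation in standard fiber) of how the bandwidth-delay product, and hence the required B2B credit buffer, grows with distance:

```python
# Data "in flight" = line rate x round-trip time; the receiver must advertise
# at least that much buffer (B2B credit) to keep the pipe full.
LINE_RATE_BPS = 10e9            # one 10 Gb/s serial optical channel (assumption)
FIBER_DELAY_S_PER_KM = 5e-6     # ~5 us/km in standard SMF (assumption)

def credit_buffer_bytes(distance_km: float) -> float:
    rtt_s = 2 * distance_km * FIBER_DELAY_S_PER_KM   # the credit loop is a round trip
    return LINE_RATE_BPS * rtt_s / 8

for km in (0.3, 10, 50, 100):
    print(f"{km:6.1f} km -> {credit_buffer_bytes(km) / 1e6:6.3f} MB of buffer credits")
# ~0.3 km needs only a few kB (fits the on-chip buffers mentioned earlier);
# 100 km needs roughly 1.25 MB per 10G channel, hence dedicated credit memory.
```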
InfiniBand-over-Distance Transport
[Figure: two data centers, each with a CPU/server cluster, IB HCAs, an InfiniBand switch fabric (IB SF), an FC SAN, and an Ethernet LAN, interconnected via WDM terminals over 80 x 10G DWDM channels (redundant); a 10GbE…100GbE gateway provides the connection to the NREN]
 Point-to-point
 Typically <100 km, but can be extended to any arbitrary distance
 Low latency (distance!)
 Transparent infrastructure (should support other protocols)
IB SF – InfiniBand Switch Fabric
IB Transport Demonstrator Results
Obsidian Campus C100
 4x SDR copper to serial 10G optical
 840 ns port-to-port latency
 Buffer credits for up to 100 km (test equipment ready for 50 km)
ADVA FSP 3000 DWDM
 Up to 80 x 10 Gb/s transponders
 <100 ns latency per transponder
 Max. reach 200/2000 km
[Test setup: Campus C100 units (B2B credits, SerDes) at both ends, interconnected via DWDM terminals over 80 x 10G DWDM]
[Charts: SendRecV throughput (GB/s, 0–1) vs. message length (0–4000 kB) for distances of 0.4, 25.4, 50.4, 75.4, and 100.4 km; SendRecV throughput (GB/s, 0–1) vs. distance (0–100 km) for message lengths of 32, 128, 512, and 4096 kB]
N x 10G InfiniBand transport over >50 km distance demonstrated
Solution Components
WCA-PC-10G WDM Transponder
Bit rates: 4.25 / 5.0 / 8.5 / 10.0 / 10.3 / 9.95 / 10.5 Gb/s
Applications: IBx1 DDR/QDR, IBx4 SDR, 10GbE WAN/LAN PHY, 4G-/8G-/10G-FC
Dispersion tolerance: up to 100 km w/o compensation (see the estimate below)
Wavelengths: DWDM (80 channels) and CWDM (4 channels)
Client port: 1 x XFP (850 nm MM, or 1310/1550 nm SM)
Latency <100 ns
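A back-of-the-envelope estimate (assumed fiber parameters, not a datasheet value) of the accumulated chromatic dispersion a 100 km uncompensated span presents to the transponder's receiver:

```python
# Accumulated dispersion = dispersion coefficient x span length.
DISPERSION_PS_PER_NM_KM = 17.0   # typical G.652 SMF near 1550 nm (assumption)
SPAN_KM = 100.0

accumulated_ps_per_nm = DISPERSION_PS_PER_NM_KM * SPAN_KM
print(f"~{accumulated_ps_per_nm:.0f} ps/nm over {SPAN_KM:.0f} km")  # -> ~1700 ps/nm
```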
Campus C100 InfiniBand Reach Extender
Optical bit rate 10.3 Gb/s (850 nm MM, 1310/1550 nm SM)
InfiniBand bit rate 8 Gb/s (4x SDR v1.2 compliant port)
Buffer credit range up to 100 km (depending on model)
InfiniBand node type: 2-port switch
Small-packet port-to-port latency: 840 ns
Packet forwarding rate: 20 Mp/s
Solution 8x10G InfiniBand Transport
FSP 3000 DWDM System (~100 km, dual-ended)
Chassis, PSUs, Controllers        ~€10.000,-
10G DWDM Modules                  ~€100.000,-
Optics (Filters, Amplifiers)      ~€10.000,-
Sum (budgetary)                   ~€120.000,-
16 x Campus C100 (100 km)         ~€300.000,-
System total (budgetary)          ~€420.000,-
An Example…
NASA’s largest supercomputer uses 16 Longbow C102 devices to span two buildings, 1.5 km apart, at a link speed of 80 Gb/s and a memory-to-memory latency of just 10 µs.
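A rough sanity check (assuming ~5 µs/km propagation delay in standard fiber, an assumption rather than a figure from the slide) shows that the quoted latency is dominated by the fiber itself:

```python
# One-way fiber propagation delay over the 1.5 km span.
FIBER_DELAY_US_PER_KM = 5.0   # assumption for standard SMF
span_km = 1.5
one_way_us = span_km * FIBER_DELAY_US_PER_KM
print(f"one-way propagation ~ {one_way_us:.1f} us of the ~10 us total")  # ~7.5 us
```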
Thank you
[email protected]
IMPORTANT NOTICE
The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the content,
material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly prohibited.
The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or representations
of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any liability for any loss or
damages, including without limitation, direct, indirect, incidental, consequential and special damages, alleged to have been caused by
or in connection with using and/or relying on the information contained in this presentation.
Copyright © for the entire content of this presentation: ADVA Optical Networking.