LOFAR – Wide Area Network
LOFAR – Network Experiences
Klaas Stuurwold
Roel Gloudemans
Peter Maat
LOFAR Network Experiences 20-9-2007
LOFAR Network Overview
LOFAR – Arms
~ 45 Stations
2.5Gb/s per station in 4 approx. 600 Mb/s data streams
Max distance Station to core is 80km
LOFAR – Core
~ 32 Stations
2 - 12Gb/s per station in max 12 1Gb/s data streams
Max distance Station to core is 80km
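As a rough cross-check of the figures above, the per-station and aggregate rates can be worked out directly. This is only a sketch: the station counts are approximate ("~ 45", "~ 32"), the core figure assumes every station uses the maximum of 12 streams, and the function names are illustrative rather than part of any LOFAR tooling.

```python
# Figures from the overview slide; station counts are approximate.
ARM_STATIONS = 45
ARM_STREAMS = 4
ARM_STREAM_MBPS = 600

CORE_STATIONS = 32
CORE_MAX_STREAMS = 12
CORE_STREAM_MBPS = 1000


def station_rate_mbps(streams, stream_mbps):
    """Data rate of one station in Mb/s."""
    return streams * stream_mbps


def aggregate_gbps(stations, streams, stream_mbps):
    """Total rate arriving at the central site in Gb/s."""
    return stations * station_rate_mbps(streams, stream_mbps) / 1000


# One arm station: 4 streams of about 600 Mb/s, close to the quoted 2.5 Gb/s.
arm_station_mbps = station_rate_mbps(ARM_STREAMS, ARM_STREAM_MBPS)

# Upper bounds on the aggregate traffic (assumes all stations send at once).
arm_total_gbps = aggregate_gbps(ARM_STATIONS, ARM_STREAMS, ARM_STREAM_MBPS)
core_total_gbps = aggregate_gbps(CORE_STATIONS, CORE_MAX_STREAMS, CORE_STREAM_MBPS)

print(arm_station_mbps, arm_total_gbps, core_total_gbps)
```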
Data characteristics
Raw Ethernet or IP/UDP
Unidirectional
Jumbo frames
Per station a few large data streams
Point to point connections
Monitoring and control ~ 100Mb/s per station
LOFAR Partner data ~ 100Mb/s per station
Max latency: 10ms
Availability min. 95%
Upgradeable:
Longer arms
Higher data rates
Initial approach: I
Use COTS equipment
Experience readily available
Cheap
Short order to deployment time
Flexible
Use 1Gb/s technology combined with CWDM
Cheapest option
Use switches and no media converters
Optimal flexibility, no replugging needed on RFP board or port
Cheap, one switch with multiple ports costs about as much as one media converter
However, networks for different goals not isolated by default
Use VLANs to separate data from management and partners
Separate IP address spaces possible
Separation of network traffic
Bandwidth control or Quality of Service per VLAN possible on most switches
Use multi-link trunking to combine several 1Gb/s ports into one logical port
No need for Spanning tree protocol which generates network overhead
One single logical interface to configure
Initial Approach: II
See Poster
Unexpected Challenges: I
Configuration errors might make a remote station inaccessible.
In the beginning it was thought that not much configuration would be needed
In practice, changes were happening every week, if not every day
Result: Regular occurrence of “terminal” misconfiguration.
Solution: Add “out-of-band” management equipment for switches and servers
For network equipment a separate data path is needed
RS-232 to Ethernet boxes are readily available, often with extended features which prove to be extremely useful, e.g. the possibility to connect door switches and temperature sensors
The Jumbo frame “standard”
Jumbo frames can be defined according to the Ethernet II standard or the IEEE 802.3 standard
The difference: in the IEEE standard the VLAN tag is included in the 9000 bytes, so the data payload must be smaller. Result: no data transmission if part of the network uses the Ethernet II definition with the maximum data frame size.
Solution: Check specs carefully before buying.
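The mismatch described above can be illustrated with a small calculation. This is a minimal sketch, assuming a 9000-byte limit on both devices and a 4-byte 802.1Q VLAN tag; the function names and the flag `tag_counts_against_limit` are hypothetical and only model the two ways of counting.

```python
VLAN_TAG_BYTES = 4  # length of an 802.1Q tag


def max_payload(limit_bytes, tag_counts_against_limit, tagged=True):
    """Largest payload a device accepts under its own way of counting."""
    if tagged and tag_counts_against_limit:
        return limit_bytes - VLAN_TAG_BYTES
    return limit_bytes


def frame_passes(payload_bytes, limit_bytes, tag_counts_against_limit, tagged=True):
    """True if a payload of this size fits through the device."""
    return payload_bytes <= max_payload(limit_bytes, tag_counts_against_limit, tagged)


# A sender fills the full 9000 bytes on a tagged VLAN.
payload = 9000

# Device that does not count the tag: the frame passes.
passes_untagged_counting = frame_passes(payload, 9000, tag_counts_against_limit=False)

# Device that counts the tag inside the 9000 bytes: the frame is dropped.
passes_tagged_counting = frame_passes(payload, 9000, tag_counts_against_limit=True)

print(passes_untagged_counting, passes_tagged_counting)
```

Under these assumptions, lowering the sender's payload to 8996 bytes would let the frame through both kinds of device.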
Unexpected Challenges: II
Real Unidirectional traffic
The data-receiving stations do not transmit any Ethernet traffic. This prevents the switches from learning their MAC addresses.
Thus all data is broadcast on all ports
So when one station starts transmitting, the station becomes unreachable (without VLANs the whole network would become unreachable)
Solutions:
Create a static MAC address table on the switch. Surprisingly few switch brands support this. It is also error-prone.
Schedule a network ping on the data stations. Works very well, but is not possible with a Blue Gene
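The flooding behaviour and the effect of both solutions can be reproduced with a toy model of a learning switch. This is a simplified sketch, not the behaviour of any particular product: the class `LearningSwitch`, the port numbers and the MAC labels `"tx"` and `"rx"` are all made up for illustration, and VLANs and table ageing are ignored.

```python
class LearningSwitch:
    """Toy model of a switch that learns source MACs and floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port

    def add_static(self, mac, port):
        """Static table entry, as in the first solution."""
        self.mac_table[mac] = port

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the incoming one.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []
        return [out_port]


switch = LearningSwitch(ports=[1, 2, 3, 4])

# The receiver on port 4 never transmits, so its MAC is unknown: flooding.
flooded = switch.receive(in_port=1, src_mac="tx", dst_mac="rx")

# A scheduled ping from the receiver teaches the switch its MAC address.
switch.receive(in_port=4, src_mac="rx", dst_mac="tx")
after_ping = switch.receive(in_port=1, src_mac="tx", dst_mac="rx")

print(flooded, after_ping)
```

A static entry has the same effect as the ping in this model: calling `add_static("rx", 4)` on a fresh switch makes the first data frame go out on port 4 only.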
The tree is not fair
We want our switches to be able to operate at full speed on all ports
Nowadays this is possible on most 1Gb/s switches if you connect to the right ports (Cisco is the most famous exception)
On the switch mainboard there is often one ASIC per group of a few ports. These ports can communicate with each other at full speed all the time. However, these ASICs are grouped under another ASIC, which limits ASIC-to-ASIC traffic.
Some of these switches are sold as Full Speed Non-Blocking!
Solution: Read switch specs very carefully and agree on a return policy. Don’t trust cheap switches!
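The ASIC-to-ASIC limitation can be expressed as an oversubscription ratio. The sketch below uses invented numbers (4 ports of 1 Gb/s per ASIC, a 2 Gb/s link to the parent ASIC); real values differ per switch model and have to come from the vendor's specification.

```python
def oversubscription(ports_per_asic, port_gbps, uplink_gbps):
    """Ratio of possible port traffic to the ASIC's uplink capacity."""
    return ports_per_asic * port_gbps / uplink_gbps


def cross_asic_rate_per_port(ports_per_asic, port_gbps, uplink_gbps):
    """Worst-case rate per port when all ports send to another ASIC at once."""
    return min(port_gbps, uplink_gbps / ports_per_asic)


# Hypothetical switch: 4 x 1 Gb/s ports per ASIC, 2 Gb/s towards the parent ASIC.
ratio = oversubscription(ports_per_asic=4, port_gbps=1.0, uplink_gbps=2.0)
worst_case_gbps = cross_asic_rate_per_port(ports_per_asic=4, port_gbps=1.0, uplink_gbps=2.0)

print(ratio, worst_case_gbps)
```

With these assumed numbers the ports are non-blocking only among themselves; traffic crossing to another ASIC can drop to half the port speed.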
Unexpected Challenges: III
Multi-link trunking: is the pain worth the gain?
One sender-receiver data stream cannot be balanced over the links in a trunk
Thus, on a trunk consisting of two 1Gb/s links, it is only possible to carry two 600Mb/s data streams, not three.
Data streams are not balanced over the links according to the current load of each link. Instead, an algorithm based on sender and receiver MAC/IP addresses or protocol numbers is used.
Solutions:
Use 10Gb/s networking
Choose source and destination addresses wisely. Obtain the balancing algorithm from the vendor. (Very often this is classified information!)
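The effect of address-based balancing can be sketched with a toy hash. The real algorithm is vendor-specific and, as noted above, often not disclosed; the XOR of the last MAC byte used here is purely an assumption for illustration, as are the MAC addresses.

```python
LINK_CAPACITY_MBPS = 1000


def pick_link(src_mac, dst_mac, n_links):
    """Hypothetical balancing rule: XOR of the last MAC bytes, modulo the link count."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


def link_loads(streams, n_links):
    """Sum the stream rates per link; a stream is never split over links."""
    loads = [0] * n_links
    for src_mac, dst_mac, mbps in streams:
        loads[pick_link(src_mac, dst_mac, n_links)] += mbps
    return loads


DST = "00:00:00:00:00:10"

# Three 600 Mb/s streams on two links: one link must carry 1200 Mb/s.
three_streams = [(f"00:00:00:00:00:0{i}", DST, 600) for i in (1, 2, 3)]
loads_three = link_loads(three_streams, n_links=2)

# Two streams with well-chosen addresses spread evenly.
loads_good = link_loads(three_streams[:2], n_links=2)

# Two streams with badly chosen addresses land on the same link.
loads_bad = link_loads([three_streams[0], three_streams[2]], n_links=2)

overloaded = [load > LINK_CAPACITY_MBPS for load in loads_three]
print(loads_three, loads_good, loads_bad, overloaded)
```

Under this assumed rule the outcome depends only on the addresses, never on the current load, which is why the choice of source and destination addresses matters.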
Buy cheap, pay later
In our experience the firmware of cheap switches often contains blocking bugs
The performance of cheap switches is not always what you’d expect from the documentation: fair-tree and/or CPU issues
The better (more expensive) switches often have features that help you
overcome unexpected challenges
Price and MTBF are related
The 10G story: I
The results are preliminary. We had some problems with our special “Glow-in-the-dark” ™ fibre. This limited the connection speeds to max 8Gb/s.
10G networking, a solution for:
Trunking; not needed at station level for LOFAR. One 10G lightpath is an ideal mechanism to connect stations abroad.
The amount of fibres/muxes needed per station
Built-in headroom for future growth
Less maintenance/less complicated infrastructure
At the same price (or less!). Short range (10km) optical at 2000 euro per switch port
Modern premium brand switches are able to do 10Gb/s full speed non-blocking.
The addition of two 10Gb/s switches between two computer systems which were connected directly before only added 0.3ms extra delay
Modern PCI-X based systems are able to almost saturate a 10Gb/s link with TCP/UDP traffic if protocol offloading cards are used (e.g. Myrinet cards; use Jumbo frames!).
The 10G story: II
Connections between networks often require filtering or some form of address translation
We were not able to find any differences in network performance when filter rules were added to the switch.
When filtering was done at the PC level, fewer than 10 rules made no difference, but with 255 rules the performance dropped to 7.8Gb/s
Fibre quality is important. A fibre which performs well at 1Gb/s might not perform well at 10Gb/s. No clear problems could be detected except for a few bad packets and inconsistent test results.