
DataTAG Project Update
CGW’2003 workshop, Kraków (Poland)
October 28, 2003
Olivier Martin, CERN, Switzerland
CGW03, Kraków, 28 October 2003
DataTAG partners
http://www.datatag.org
Funding agencies
Cooperating Networks
DataTAG Mission
TransAtlantic Grid

EU  US Grid network research

High Performance Transport protocols

Inter-domain QoS

Advance bandwidth reservation

EU  US Grid Interoperability

Sister project to EU DataGRID
Main DataTAG achievements
(EU-US Grid interoperability)

GLUE Interoperability effort with DataGrid, iVDGL & Globus

GLUE testbed & demos

VOMS design and implementation in collaboration with DataGrid
 VOMS evaluation within iVDGL underway

Integration of GLUE-compliant components in DataGrid and VDT middleware
Main DataTAG achievements
(Advanced networking)

Internet land speed records have been broken one after the other by
DataTAG project members and/or teams closely associated with DataTAG:
 Atlas Canada lightpath experiment (iGRID2002)
 New Internet2 Land Speed Record (I2 LSR) by a NIKHEF/Caltech team (SC2002)
 Scalable TCP, HSTCP, GridDT & FAST experiments (DataTAG partners & Caltech)
 Intel 10GigE tests between CERN (Geneva) and SLAC (Sunnyvale)
(Caltech, CERN, Los Alamos NL, SLAC)
 New I2 LSR (Feb 27-28, 2003): 2.38 Gb/s sustained rate, single TCP/IPv4 flow, 1 TB in one hour
 Caltech-CERN
 The latest IPv4 & IPv6 I2 LSRs were awarded live from
Indianapolis during Telecom World 2003:
 May 6, 2003: 987 Mb/s, single TCP/IPv6 stream
 Oct 1, 2003: 5.44 Gb/s sustained rate, single TCP/IPv4 stream,
1.1 TB in 26 minutes, i.e. one 680 MB CD per second
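A quick sanity check of the Oct 1 record figures; a sketch assuming decimal units (1 Gb = 10^9 bits, 1 TB = 10^12 bytes):

```python
# Sanity-check the Oct 1, 2003 I2 LSR figures (decimal units assumed).

rate_bps = 5.44e9          # 5.44 Gb/s sustained, single TCP/IPv4 stream
duration_s = 26 * 60       # 26 minutes

bytes_moved = rate_bps / 8 * duration_s
print(f"Data transferred: {bytes_moved / 1e12:.2f} TB")   # ~1.06 TB, i.e. ~1.1 TB

# 5.44 Gb/s is 680 MB/s -- one full 680 MB CD every second.
cd_per_second = rate_bps / 8 / 680e6
print(f"680 MB CDs per second: {cd_per_second:.2f}")      # 1.00
```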
Significance of I2LSR to the Grid?

Essential to establish the feasibility of multi-Gigabit/second single-stream IPv4 & IPv6 data transfers:
 Over dedicated testbeds in a first phase
 Then across academic & research backbones
 Last but not least, across campus networks
 Disk to disk rather than memory to memory
 Study impact of high-performance TCP on disk servers

Next steps:
 Above 6 Gb/s expected soon between CERN and Los Angeles (Caltech/CENIC PoP) across DataTAG & Abilene
 Goal is to reach 10 Gb/s with new PCI Express buses
 Study alternatives to standard TCP
 Non-TCP transport
 HSTCP, FAST, Grid-DT, etc.
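The case for alternatives follows from simple AIMD arithmetic: after a single loss, standard TCP halves its congestion window and then regains it at one MSS per RTT. A sketch with illustrative numbers (10 Gb/s path, 120 ms RTT, 1500-byte packets; these figures are assumptions, not from the talk):

```python
# Why Reno-style AIMD struggles on long fat pipes: recovery after one loss
# takes roughly (window / 2) RTTs of additive increase.

rate_bps = 10e9      # assumed path capacity
rtt_s = 0.120        # assumed transatlantic RTT
mss_bytes = 1500

window_pkts = rate_bps * rtt_s / (mss_bytes * 8)   # packets in flight at full rate
recovery_s = (window_pkts / 2) * rtt_s             # additive-increase phase duration

print(f"Window at full rate: {window_pkts:.0f} packets")      # 100000 packets
print(f"Recovery after one loss: {recovery_s / 60:.0f} min")  # 100 min
```

A single lost packet thus costs well over an hour of reduced throughput, which is the standard motivation for HSTCP, FAST and similar stacks.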
Impact of high performance flows
across A&R backbones?
Possible solutions:
 Use of “TCP friendly” non-TCP (i.e. UDP) transport
 Use of Scavenger (i.e. less than best effort) services
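For the Scavenger option, the QBone Scavenger Service marks packets with DSCP CS1 (decimal 8) so that routers along the path can serve them with less-than-best-effort priority. A minimal sketch in Python, assuming a Linux host (the DSCP occupies the upper six bits of the IP TOS byte):

```python
import socket

# Mark a UDP socket's traffic for "scavenger" (less-than-best-effort) service.
DSCP_CS1 = 8
TOS_SCAVENGER = DSCP_CS1 << 2          # DSCP in upper 6 bits of TOS -> 0x20

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_SCAVENGER)

# Packets sent from this socket now carry DSCP CS1; whether they are actually
# deprioritized depends on the routers along the path honoring the marking.
print(hex(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)))  # 0x20 on Linux
sock.close()
```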
DataTAG testbed overview
(phase 1: 2.5G & phase 2: 10G)
Layer1/2/3 networking
(1)

Conventional layer 3 technology is no longer fashionable because of:
 High associated costs, e.g. 200-300 KUSD for a 10G router interface
 Implied use of shared backbones

The use of layer 1 or layer 2 technology is very attractive because it helps to solve a number of problems, e.g.
 1500-byte Ethernet frame size limit (layer 1)
 Protocol transparency (layers 1 & 2)
 Minimum functionality, hence, in theory, much lower costs (layers 1 & 2)
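One reason the 1500-byte frame limit hurts at these speeds is the resulting per-packet rate, and hence per-packet CPU and interrupt cost, at the end hosts. A rough comparison of the standard MTU with a common 9000-byte jumbo MTU (illustrative arithmetic, payload sizes only, headers ignored):

```python
# Per-packet rates at 10 Gb/s for standard vs. jumbo Ethernet frames.

rate_bps = 10e9

for frame_bytes in (1500, 9000):       # standard MTU vs. common jumbo MTU
    pps = rate_bps / (frame_bytes * 8)
    print(f"{frame_bytes:>5}-byte frames: {pps / 1e3:.0f} kpkt/s")
# 1500-byte frames: 833 kpkt/s; 9000-byte frames: 139 kpkt/s
```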
Layer1/2/3 networking
(2)

So-called “lambda Grids” are becoming very popular.
 Pros:
 Circuit-oriented model, like the telephone network, hence no need for complex transport protocols
 Lower equipment costs (i.e. typically a factor of 2 or 3 per layer)
 The concept of a dedicated end-to-end light path is very elegant
 Cons:
 “End to end” is still very loosely defined, i.e. site to site, cluster to cluster, or really host to host
 High cost, scalability & additional middleware required to deal with circuit set-up, etc.
Multi vendor 2.5Gb/s
layer 2/3 testbed
[Testbed diagram: 2.5G (later 10G) layer 1/2/3 connectivity between CERN and STARLIGHT, with partner sites INRIA (via VTHD), PPARC (via SuperJanet), INFN/CNAF (via GEANT/GARR) and UvA, and onward connections to Abilene, ESnet and CANARIE. Equipment includes an Alcatel 1670 multiplexer, Alcatel 7770 and Cisco 7606 routers, a Procket 8801, a Cisco ONS 15454 and a Juniper M10, plus GigE switches and layer 2/layer 3 server farms.]
State of 10G deployment
and beyond

Still little deployed, because of lack of demand, hence:
 Lack of products
 High costs, e.g. 150 KUSD for a 10GigE port on a Juniper T320 router
 Even switched, layer 2, 10GigE ports are expensive; however, prices should come down to 10 KUSD/port towards the end of 2003

40G deployment, although more or less technologically ready, is unlikely to happen in the near future, i.e. before LHC starts
10G DataTAG testbed extension
to Telecom World 2003 and Abilene/Cenic
On September 15, 2003, the DataTAG project became the first transatlantic testbed offering direct 10GigE access, using Juniper’s layer 2 VPN / 10GigE emulation.
Sponsors: Cisco, HP, Intel, OPI (Geneva’s Office for the Promotion of Industries & Technologies), Services Industriels de Genève, Telehouse Europe, T-Systems

Impediments to high E2E
throughput across LAN/WAN
infrastructure
For many years the Wide Area Network has been the bottleneck; this is no longer the case in many countries, thus, in principle, making the deployment of data-intensive Grid infrastructures possible!
 Recent I2 LSR records show, for the first time ever, that the network can be truly transparent and that throughputs are limited by the end hosts

The dream of abundant bandwidth has now become a reality in large, but not all, parts of the world!

The challenge has shifted from getting adequate bandwidth to deploying adequate LAN and cybersecurity infrastructure, as well as making effective use of it!

Major transport protocol issues still need to be resolved; however, there are many encouraging signs that practical solutions may now be in sight.
Single TCP stream performance
under periodic losses
[Figure: effect of packet loss — bandwidth utilization (%) vs. packet loss frequency (%), for a WAN path (RTT = 120 ms) and a LAN path (RTT = 0.04 ms), with 1 Gbps of available bandwidth. At a loss rate of 0.01%, LAN bandwidth utilization is 99% while WAN utilization is only 1.2%.]
 TCP throughput is much more sensitive to packet loss in WANs than in LANs
 TCP’s congestion control algorithm (AIMD) is not suited to gigabit networks
 Poor, limited feedback mechanisms
 The effect of even very small packet loss rates is disastrous
 TCP is inefficient in high bandwidth*delay networks
 The future performance of data-intensive grids looks grim if we continue to rely on the widely deployed TCP Reno stack
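The LAN/WAN gap in the figure is what the standard Mathis et al. approximation for steady-state TCP Reno throughput predicts: rate ~ C * MSS / (RTT * sqrt(p)), with C ~ 1.22. A sketch using the figure’s parameters (an MSS of 1460 bytes is assumed):

```python
import math

# Steady-state TCP Reno throughput under random loss (Mathis approximation).
# Figure parameters: 1 Gbps available, RTT 0.04 ms (LAN) vs. 120 ms (WAN),
# loss rate p = 0.01%. MSS of 1460 bytes is an assumption.

C = 1.22
mss_bits = 1460 * 8
link_bps = 1e9
p = 1e-4                      # 0.01% packet loss

for name, rtt_s in (("LAN", 0.04e-3), ("WAN", 120e-3)):
    rate_bps = C * mss_bits / (rtt_s * math.sqrt(p))
    utilization = min(rate_bps / link_bps, 1.0) * 100
    print(f"{name}: utilization ~ {utilization:.1f}%")
# LAN caps at 100%; WAN comes out ~1.2%, matching the figure.
```

The 3000-fold RTT difference enters squared through the window dynamics, which is why the same tiny loss rate is harmless on a LAN and disastrous across the Atlantic.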