Networks and Grids for High Energy Physics
and Global e-Science, and the Digital Divide
Standing Committee on Inter-regional Connectivity
Harvey B. Newman
California Institute of Technology
ICFA SCIC Report
IHEPCCC Meeting, January 10, 2007
The LHC Data “Grid Hierarchy” Evolved:
MONARC → DISUN, ATLAS & CMS Models
CERN/Outside Ratio ~1:4; T0/(T1)/(T2) ~1:2:2; ~40% of Resources in Tier2s
US T1s and T2s Connect to US LHCNet PoPs
Outside/CERN Ratio Larger; Expanded Role of Tier1s & Tier2s: Greater Reliance on Networks
Emerging Vision: A Richly Structured, Global Dynamic System
[Diagram: Tier0 (CERN, online system) linked at 10 and 10-40 Gbps to Tier1s (e.g. BNL, CC-IN2P3) and Tier2s over USLHCNet + ESnet, GEANT2 + NRENs, and UltraLight/DISUN]
“Onslaught of the LHC”
0.2 to 1.1 PBytes/mo. over 2 months (Apr.-June 2006)
SC4 (2006): CMS PhEDEx tool used to transfer 1-2 PBytes/mo. for 5 months
[Charts: petabytes per month transferred in 2006, by destination and by source (FNAL, CERN, …)]
FNAL at 1 GByte/sec; UCSD Tier2 at 200-300 MB/sec
Recently other US Tier2s at 250-300 MB/sec;
Fermilab working to bring Tier2s in Europe “up to speed”
Caltech/CERN & HEP at SC2006:
Petascale Transfers for Physics
~200 CPUs, 56 10GE switch ports, 50 10GE NICs, 100 TB disk
New disk-speed WAN transport applications for science (FDT, LStore)
Research partners: FNAL, BNL, UF, UM, ESnet, NLR, FLR, Internet2, AWave, SCInet, Qwest, UERJ, UNESP, KNU, KISTI
Corporate partners: Cisco, HP, Neterion, Myricom, DataDirect, BlueArc, NIMBUS
FDT: Fast Data Transport
Results 11/14 – 11/15/06
 Stable disk-to-disk flows Tampa-Caltech: stepping up to 10-to-10 and 8-to-8 1U server-pairs, 9 + 7 = 16 Gbps; then solid overnight, using one 10G link
 Efficient data transfers: reading and writing at disk speed over WANs (with TCP) for the first time
 Highly portable: runs on all major platforms
 Based on an asynchronous, multithreaded system, using Java NIO libraries
 Streams a dataset (list of files) continuously, from a managed pool of buffers in kernel space, through an open TCP socket
 Smooth data flow from each disk to/from the network; no protocol start-phase between files
New capability level: 40-70 Gbps per rack of low-cost 1U servers
LHCNet, ESnet Plan 2006-2009:
20-80 Gbps US-CERN, ESnet MANs, IRNC
 US-LHCNet: wavelength triangle to quadrangle (NY-CHI-GVA-AMS); 2007-10: 30, 40, 60, 80G
 US-LHCNet data network: 3 to 8 x 10 Gbps US-CERN
 ESnet production IP core: ≥10 Gbps enterprise IP traffic
 ESnet Science Data Network (SDN) core: 30-50G, for 40-60 Gbps circuit transport
 ESnet MANs to FNAL & BNL; dark fiber to FNAL
 NSF/IRNC circuit; GVA-AMS connection via SURFnet or GEANT2
[Map: ESnet hubs, new hubs, metropolitan area rings, major DOE Office of Science sites, high-speed cross-connects with Internet2/Abilene, lab-supplied links, and major international links to Europe (GEANT2, SURFnet, IN2P3/CERN), Japan, Australia and Asia-Pacific]
HEP Major Links: Bandwidth Roadmap in Gbps (US LHC NWG)

                                           2005   2006   2007    2008   2009     2010
CERN-BNL (ATLAS)                            0.5      5     15      20     30       40
CERN-FNAL (CMS)                             7.5     15     20      20     30       40
Other (Tier2, Tier3, inter-regional …)        2     10     10   10-15     20    20-30
TOTAL US-CERN BW requirements estimate       10     30     45   50-55     80  100-110
US LHCNet bandwidth                          10     20     30      40     60       80
Backup: other BW (GEANT2, SURFnet,
  IRNC, GLORIAD …)                            -     10     10   10-20     20    20-30

Moderating a trend: from >~1000X to ~100X BW growth per decade
Roadmap may be modified once 40-100 Gbps channels appear.
Note the role of other networks across the Atlantic, notably GEANT2.
ESnet4 [W. Johnston, ESnet]
Core networks: 50-60 Gbps by 2009-2010; 200-600 Gbps by 2011-2012
 Production IP core: 10 Gbps
 SDN (Science Data Network) core: 20-30-40-50 Gbps
 MANs (20-60 Gbps) or backbone loops for site access
 Core network fiber path: ~14,000 miles / 24,000 km
 International connections: Canada (CANARIE), Europe (GEANT), GLORIAD (Russia and China), Asia-Pacific, Australia, South America (AMPATH), CERN (30+ Gbps)
[Map: IP core hubs, SDN hubs, possible hubs, primary DOE Labs, and high-speed cross-connects with Internet2/Abilene]
LHCOPN: Overlay T0-T1 Network (CERN-NA-EU) [E. Martelli]
[Diagram: lightpaths from CERN (Tier0) to the Tier1s (IN2P3, GridKa, CNAF, SARA, PIC, RAL, NORDUnet/NORDUgrid, BNL, FNAL, TRIUMF, ASCC), carried over Renater, RedIRIS, DFN and GEANT2 lightpaths, UKlight, SURFnet/Netherlight, US LHCNet, ESnet, CANARIE and ASnet; the legend distinguishes main, backup and T1-T1 paths, and L1/L2 vs. L3 networks]
Next Generation LHCNet:
Add Optical Circuit-Oriented Services
Based on CIENA “Core Director” optical multiplexers (also used by Internet2)
(GEANT2: GFP, VCAT & LCAS on Alcatel)
 Robust fallback, at the optical layer
 Circuit-oriented services: guaranteed-bandwidth Ethernet Private Line (EPL)
 New standards-based software: VCAT/LCAS: virtual, dynamic channels
Internet2’s “NewNet” Backbone
Level(3) Footprint;
Infinera 10 X 10G Core;
CIENA Optical Muxes
Initial deployment – 10 x 10 Gbps wavelengths over the footprint
First round maximum capacity – 80 x 10 Gbps wavelengths; expandable
Scalability – potential migration to 40 Gbps or 100 Gbps capability
Reliability – carrier-class standard assurances for wavelengths
The community will transition to NewNet over a period of 15 months
+ Paralleled by initiatives in: nl, ca, jp, uk, kr; pl, cz, sk,
pt, ie, gr, hu, si, lu, no, is, dk … + >30 US states
ICFA Standing Committee on
Interregional Connectivity (SCIC)
 Created in July 1998 in Vancouver, following the ICFA-NTF
CHARGE:
 Make recommendations to ICFA concerning the connectivity
between the Americas, Asia and Europe
 As part of the process of developing these
recommendations, the committee should
 Monitor traffic on the world’s networks
 Keep track of technology developments
 Periodically review forecasts of future
bandwidth needs, and
 Provide early warning of potential problems
 Main focus since 2002: the Digital Divide in the HEP
Community
SCIC in 2005-2006
http://cern.ch/icfa-scic
Three 2006 Reports; Update for 2007 Soon:
Rapid Progress, Deepening Digital Divide
 Main Report: “Networking for HENP” [H. Newman, et al.]
Includes Updates on the Digital Divide, World
Network Status; Brief updates on Monitoring
and Advanced Technologies
29 Appendices: A World Network Overview
Status and Plans for the Next Few Years of Nat’l &
Regional Networks, HEP Labs, & Optical Net Initiatives
 Monitoring Working Group Report
[L. Cottrell]
Also See:
 TERENA (www.terena.nl) 2005 and 2006 Compendiums:
In-depth Annual Survey on R&E Networks in Europe
 http://internetworldstats.com: Worldwide Internet Use
 SCIC 2003 Digital Divide Report
[A. Santoro et al.]
ICFA Report 2006 Update:
Main Trends Deepen and Accelerate
 Current generation of 10 Gbps network backbones and major Int’l
links arrived in 2001-5 in US, Europe, Japan, Korea; Now China
 Bandwidth Growth: from 4 to 2500 Times in 5 Years; >> Moore’s Law
 Rapid Spread of “Dark Fiber” and DWDM: the emergence of
Continental, Nat’l, State & Metro “Hybrid” Networks in Many Nations
Cost-effective 10G or N X 10G Backbones, complemented by
Point-to-point “Light-paths” for “Data Intensive Science”, notably HEP
First large scale 40G project: CANARIE (Ca): 72 waves and ROADMs
 Proliferation of 10G links across the Atlantic & Pacific; Use of
multiple 10G Links (e.g. US-CERN) along major paths began in Fall 2005
On track for ~10 X 10G networking for LHC, in production by 2007-8
 Technology evolution continues to drive performance higher
and equipment costs lower
Commoditization of Gigabit and now 10-Gigabit Ethernet on servers
Use of new buses (PCI Express) in PCs and network interfaces in 2006
Improved Linux kernel for high-speed data transport; multi-CPU servers
 2007 Outlook: Continued growth in bandwidth deployment & use
Transition to Community-Owned or
Operated Optical Infrastructures
Example: National LambdaRail (NLR), www.nlr.net
 Each link to 32 x 10G
 Cost-recovery model
 Supports: Cisco Research Wave, UltraScience Net, Atlantic & Pacific Wave; initiatives with HEP
A Network of Networks
 WaveNet: point-to-point lambdas
 FrameNet: Ethernet-based services
 PacketNet: IP routed nets
GÉANT2 November 2006
Multi-Wavelength Core + 0.6-10G Loops
Dark Fiber Connections
Among 16 Countries:
 Austria
 Belgium
 Bosnia-Herzegovina
 Czech Republic
 Denmark
 France
 Germany
 Hungary
 Ireland
 Italy
 Netherlands
 Slovakia
 Slovenia
 Spain
 Switzerland
 United Kingdom
Amsterdam Internet Exchange Point, 1/09/07:
Traffic Doubled (to 226 Gbps Peak) in <1 Year
[Chart: 5-minute maximum and average traffic over the year, in the 100-200 G range, peaking at 226 Gbps]
Some annual growth spurts, typically in summer-fall; “acceleration” last summer
The rate of HENP network usage growth (80-100+% per year) is matched by the growth of traffic in the world at large
Internet Growth in the World At Large
Work on the Digital Divide
from Several Perspectives
 Share Information: Monitoring, Tracking BW Progress;
Dark Fiber Projects & Pricing
 Model Cases: Poland, Slovakia, Brazil, Czech Rep., China …
 Encourage Access to Dark Fiber
 Encourage, and Work on Inter-Regional Projects
 GLORIAD, Russia-China-Korea-US-Europe Optical Ring
 Latin America: CHEPREO/WHREN (US-Brazil); RedCLARA
 Mediterranean: EUMEDConnect; Asia-Pacific: TEIN2
 India Link to US, Japan and Europe
 Technical Help with Modernizing the Infrastructure:
 Provide Tools for Effective Use: Data Transport, Monitoring,
Collaboration
 Design, Commissioning, Development
 Raise Awareness: Locally, Regionally & Globally
Digital Divide Workshops
Diplomatic Events: WSIS, RSIS, Bilateral: e.g. US-India
SCIC Monitoring WG:
PingER (also IEPM-BW) [R. Cottrell]
Monitoring & Remote Sites (1/06)
 Measurements from 1995 on; reports link reliability & quality
 Countries monitored contain 90% of the world's population and 99% of Internet users
 3700 monitor-remote site pairs; 35 monitors in 14 countries (new: Capetown, Rawalpindi, Bangalore); 1000+ remote sites in 120 countries
 Countries: N. America (2), Latin America (18), Europe (25), Balkans (9), Africa (31), Mid East (5), Central Asia (4), South Asia (5), East Asia (4), SE Asia (6), Russia including Belarus & Ukraine (3), China (1) and Oceania (5)
SCIC Monitoring WG - Throughput
Improvements 1995-2006
Progress: but the Digital Divide is Mostly Maintained
 ~40% annual improvement: a factor of ~10 in 7 years
 Years behind Europe: Russia and Latin America 6; Mid-East and SE Asia 7; South Asia 10; Central Asia 11; Africa 12
 India, Central Asia, and Africa are in danger of falling even farther behind
Bandwidth of TCP < MSS / (RTT * sqrt(Loss))
[Mathis et al., Computer Communication Review 27(3), July 1997]
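As an illustration of what this bound implies for intercontinental paths (the MSS, RTT and loss values below are assumed for the example, not measurements from the report):

$$ \mathrm{BW} < \frac{\mathrm{MSS}}{\mathrm{RTT}\,\sqrt{p}} = \frac{1460 \times 8\ \mathrm{bits}}{0.2\ \mathrm{s}\times\sqrt{10^{-3}}} \approx 1.8\ \mathrm{Mbps} $$

A single standard-MSS TCP stream on a 200 ms path with 0.1% packet loss is thus limited to a few Mbps regardless of link capacity; cutting the loss rate by a factor of 100 raises the bound only tenfold, which is why the low-loss paths (or parallel streams and large frames) discussed elsewhere in this report matter.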
SCIC Digital Divide
Workshops and Panels
 2002-2005:
An effective way to raise awareness of the problems, and
discuss approaches and opportunities for solutions
with national and regional communities, and gov’t officials
 ICFA Digital Divide Workshops: Rio 2/2004; Daegu 5/2005
 CERN & Internet2 Workshops on R&E Networks in Africa
 February 2006
 CHEP06 Mumbai: Digital Divide Panel, Network Demos,
& Workshop [SCIC, TIFR, CDAC, Internet2, Caltech]
“Moving India into the Global Community Through Advanced
Networking”
 October 9-15 2006:
 ICFA Digital Divide Workshops in Cracow & Sinaia
 April 14-17 2007: “Bridging the Digital Divide”: Sessions at
APS Meeting in Jacksonville; sponsored by Forum for
International Physics
International ICFA Workshop on
HEP Networking, Grids, and Digital
Divide Issues for Global e-Science
May 23-27, 2005, Daegu, Korea
http://chep.knu.ac.kr/HEPDG2005
Dongchul Son, Center for High Energy Physics; Harvey Newman, California Institute of Technology
Workshop Missions
 Review the status and outlook, and focus on issues in data-intensive Grid computing, inter-regional connectivity and Grid-enabled analysis for high energy physics
 Relate these to the key problem of the Digital Divide
 Promote awareness of these issues in various regions, focusing on the Asia Pacific, Latin America, Russia, and Africa
 Develop approaches to eliminate the Divide, and
 Help ensure that the basic requirements for global collaboration are met, related to all of these aspects
International ICFA Workshop on
HEP Networking, Grid and
Digital Divide Issues for Global E-Science
National Academy of Arts and Sciences
Cracow, October 9-11, 2006
http://icfaddw06.ifj.edu.pl/index.html
Sinaia, Romania
October 13-18, 2006
http://niham.nipne.ro/events2006/
Highest Bandwidth Link in European NRENs'
Infrastructure; The Trend to Dark Fiber
[Chart: for each of ~27 European NRENs, from Austria (ACOnet), Belgium (BELNET), Cyprus (CYNET), Czech Republic (CESNET), Denmark (UNI.C), Estonia (EENET), Finland (FUNET), France (RENATER), Germany (DFN), Greece (GRNET), Hungary (HUNGARNET), Iceland (RHnet), Ireland (HEAnet), Italy (GARR), Latvia (LANET, LANET-2), Lithuania (LITNET), Luxembourg (RESTENA), Netherlands (SURFnet), Norway (UNINETT), Poland (PIONIER), Portugal (FCCN), Slovenia (ARNES), Slovakia (SANET), Spain (RedIRIS) and Sweden (SUNET) to Switzerland (SWITCH): the highest backbone link speed (0.01 to 10 Gbps, log scale) and the percentage of dark fiber in the backbone. Dark-fiber bands: 80-100% (7 NRENs), 50-80% (2), 5-50% (4), 0-5% (12), no data for the rest; countries labelled on the chart include Dk, Cz, Is, It, Nl, Pl, No, Ei, Ch, Pt, Si, Sk, Lu]
NRENs with dark fiber can deploy light paths to support
separate communities and/or large applications:
up to a 100X gain in some cases, at moderate cost.
New with >50% dark fiber in '06: de, gr, se; az, bl, sb/mn. More planned.
[Source: TERENA, www.terena.nl]
SLOVAK Academic Network (SANET),
May 2006: All Switched Ethernet
http://www.sanet.sk/en/index.shtm [T. Weis]
 1660 km of dark fiber CWDM links
 August 2002: dark fiber link to Austria
 April 2003: dark fiber link to Czech Republic
 2004: dark fiber link to Poland
 2/16/05: 1 GE over 120 km of cross-border dark fiber (cost 4 k per month)
 11/2006: 10 GbE cross-border dark fiber to Austria & Czech Republic; 8 x 10G over 224 km with nothing in-line shown
2500x growth: 2002-2006
Czech Republic: CESNET2
 2500 km of leased dark fibers (since 1999)
 1 GbE lightpaths in CzechLight; 1 GbE to Slovakia; 1 GbE to Poland
 2005-6: 32-wavelength software-configurable DWDM ring + more 10GE connections installed
Poland: PIONIER 20G + 10G Cross-Border Dark Fiber Network (Q4 2006)
 6000 km of owned fiber; multi-lambda core
 21 academic MANs, 5 HPCCs
 Moved to 20G on all major links; GÉANT2 10+10 Gb/s (+ 20G to GEANT; 10G to Internet2)
 Cross-border dark fibers: 20G to Germany; 10G each to Cz, Sk (CESNET, SANET 2 x 10 Gb/s); moved to connect all neighbors at 10G in 2006; BASNET at 155 Mb/s
[Map: PIONIER core linking the 21 MANs (Gdańsk, Poznań, Warszawa, Kraków, Wrocław, …) with 2 x 10 Gb/s (2 λ) links and 10 Gb/s (1-2 λ) cross-border dark fiber]
 CBDF in Europe: 8 links now, including the CC-IN2P3 link to CERN; 12 more links planned in the near future [source: TERENA]
Romania: RoEduNet Topology [N. Tapus]
155 Mbps inter-city; 2 x 622 Mbps to GEANT
Connects 610 institutions to GEANT:
 38 universities
 32 research institutes
 500 colleges & high schools
 40 others
RoGrid plans for 2006:
 10G experimental link UPB-RoEduNet
 Upgrade 3-4 local centers to 2.5G
2007 plan: dark fiber infrastructure with 10G light-paths (help from Caltech and CERN)
Brazil: RNP2 Next-Generation Backbone, New vs. Old [M. Stanton]
 A factor of 70 to 300 in bandwidth
 2006: buildout of dark fiber nets in the 27 cities with RNP PoPs underway
 200 institutions connected at 1 GbE in 2006-7 (well advanced)
 2.5G (to 10G) WHREN (NSF/RNP) link to US; 622M link to GEANT
 Plan: extend to the Northwest; dark fiber across the Amazon jungle to Manaus
President of India Collaborating with US, CERN, Slovakia via VRVS/EVO
Coincident with data transfers of ~500 Mbps; 15 TBytes to/from India in 2 days
TIFR to Japan Connectivity: Mumbai-Japan-US
International IPLC (4 x STM-1 links)
 TIFR link to Japan + onward to US & Europe: link loaned from VSNL at CHEP06 [Caltech, TIFR, CDAC, JGN2, World Bank, IEEAF, Internet2, VSNL]
 End-to-end bandwidth: 4 x 155 Mbps on the SeMeWe3 cable; goal is to move to 10 Gbps on SeMeWe4
 Sparked planning for a next-generation R&E network in India
[Diagram: TIFR Mumbai (Juniper M10, STM-4 interface) connected via TTML dark fibre and the VSNL Chennai and Prabhadevi POPs and landing stations, across the TIC and EAC cables through the Singapore and Japan landing stations, to the VSNL Shinagawa PoP and the NTT Otemachi Bldg, Japan (Foundry BI15000, OC-12 interface), and onward to the US and Europe]
The HEP Community: Network
Progress and Impact
 The national, continental and transoceanic networks used
by HEP and other fields of DIS are moving to the
N X 10G range
 Growth rate much faster than Moore’s Law
 40 – 100G tests; Canada moving to first N X 40G network
 “Dark Fiber”-based, hybrid networks, owned and/or
operated by the R&E community are emerging, and
fostering rapid progress, in a growing list of nations:
 ca, nl, us, jp, kr; pl, cz, fr, br, no, cn, pt, ie, gr, sk, si, …
 HEP is learning to use long range networks effectively
 7-10 Gbps TCP flows over 10-30 kkm; 151 Gbps Record
 Fast Data Transport Java Application: 1-1.8 Gbps per
1U Node, disk to disk; i.e. 40-70 Gbps per 40U rack
Working to Close the Digital Divide, for
Science, Education and Economic Development
 HEP groups in US, EU, Japan, Korea, Brazil, Russia are
working with int’l R&E networks and advanced net projects,
and Grid Organizations; helping by
 Monitoring connectivity worldwide to/from HEP groups
and other sites (SLAC’s IEPM project)
 Co-developing and deploying next generation
Optical nets, monitoring and management systems
 Developing high throughput tools and systems
 Adapting the tools & best practices for broad use
in the science and Grid communities
 Providing education and training in state of the art
technologies & methods
 A Long Road Ahead Remains:
Eastern Europe, Central & SE Asia, India, Pakistan, Africa
Extra Slides Follow
SCIC Main Focus Since 2002
As we progress we are in danger of leaving the
communities in the less-favored regions of the
world behind
We must Work to Close the Digital Divide
To make physicists from all world regions full
partners in the scientific discoveries
This is essential for the health of our global
collaborations, and our field
Digital Divide Illustrated by Network
Infrastructures: TERENA Core Capacity
Source: www.terena.nl
 Core capacity goes up in leaps: 1 to 2 to N x 10 Gbps; 1-2.5 to 10 Gbps; 0.6-1 to 2.5 Gbps
 Leading NRENs: N x 10G lambdas by ~2007
 SE Europe, Mediterranean, FSU, Mid East: slower progress with older technologies (10-622 Mbps). The Digital Divide will not be closed
[Chart: current core capacity and expected increase in two years for each NREN, from UzSciNet, ANA (Albania), NITC (Jordan), KRENA (Kyrgyzstan), AMREJ, URAN, IRANET, RBNet and others at the low end, up to PIONIER, FCCN, BELNET, DFN, the United Kingdom, SURFnet and SUNET at the high end; core capacity axis from 1 Mb/s to 20 Gb/s]
SURFnet6 in the Netherlands [K. Neggers]
 5300 km of owned dark fiber
 Optical layer: 5 rings, up to 72 wavelengths
 Support for HEP, radio astronomers, medical research
4 Years Ago:
4 Mbps was the highest
bandwidth link in Slovakia
HENP Bandwidth Roadmap for Major
Links (in Gbps): Updated 12/06

Year      Production            Experimental             Remarks
2001      0.155                 0.622-2.5                SONET/SDH
2002      0.622                 2.5                      SONET/SDH; DWDM; GigE integration
2003      2.5                   10                       DWDM; 1 + 10 GigE integration
2005-6    10-20                 2-4 X 10                 λ switch; λ provisioning
2007-8    30-40                 ~100 or 2 X 40 Gbps      1st gen. λ grids
2009-10   60-80                 ~5 X 40 or ~2 X 100      40 or 100 Gbps λ switching
2011-12   ~5 X 40 or ~2 X 100   ~25 X 40 or ~10 X 100    2nd gen. λ grids; Terabit networks
2013-14   ~Terabit              ~MultiTbps               ~Fill one fiber

Continuing trend: ~400 times bandwidth growth per decade
Paralleled by the ESnet roadmap for data-intensive sciences
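A quick consistency check of the "~400 times per decade" trend, using only the Production column of the roadmap above:

$$ \frac{60\text{-}80\ \mathrm{Gbps}\ (2009\text{-}10)}{0.155\ \mathrm{Gbps}\ (2001)} \approx 400\text{-}500\times \ \text{over roughly a decade} $$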
Internet2 Land Speed Records &
SC2003-2005 Records
 SC2003-5: 23, 101, 151 Gbps
 SC2006: FDT app.: stable disk-to-disk at 16+ Gbps on one 10G link
 IPv4 multi-stream record: 6.86 Gbps x 27 kkm (Nov 2004)
 PCI-X 2.0: 9.3 Gbps Caltech-StarLight (Dec 2005)
 PCI Express: 9.8 Gbps Caltech-Sunnyvale (July 2006)
 Concentrate now on reliable Terabyte-scale file transfers
 Disk-to-disk marks: 536 Mbytes/sec (Windows); 500 Mbytes/sec (Linux)
 System issues: PCI bus, network interfaces, disk I/O controllers, Linux kernel, CPU
[Chart: Internet2 LSR single-stream IPv4 TCP marks, Apr 2002 to Nov 2004 (blue = HEP), rising from 0.4 Gbps x 12272 km through 2.5, 4.2, 5.4, 5.6 and 6.6 Gbps marks to 7.21 Gbps x 20675 km, i.e. up to ~150 Petabit-m/sec]
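For reference, the Land Speed Record metric on the chart's vertical axis is simply throughput multiplied by terrestrial path length; checking it against the final mark quoted above:

$$ 7.21\ \mathrm{Gbps} \times 20675\ \mathrm{km} \approx 1.5\times10^{17}\ \mathrm{bit\cdot m/s} \approx 149\ \mathrm{Petabit\cdot m/s}, $$

consistent with the ~150 Petabit-m/sec scale of the chart.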
Fast Data Transport Across the
WAN: Solid 10.0 Gbps
[Chart: throughput holding at 10.0 Gbps; annotated "12G + 12G"]
FDT – Fast Data Transport
A New Application for Efficient Data Transfers
 Capable of reading and writing at disk speed over wide area networks
(with standard TCP) for the first time
 Highly portable and easy to use: runs on all major platforms.
 Based on an asynchronous, flexible multithreaded system, using
the Java NIO libraries, that:
 Streams a dataset (list of files) continuously, from a managed pool
of buffers in kernel space, through an open TCP socket
 Smooth flow of data from each disk
 No protocol start phase between files
 Uses independent threads to read and write on each physical device
 Transfers data in parallel on multiple TCP streams, when necessary
 Uses appropriate-sized buffers for disk I/O and for the network
 Restores the files from buffers asynchronously
 Resumes a file transfer session without loss, when needed
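Before the measured results below, here is a minimal sketch of the streaming pattern the bullets above describe: a list of files sent back-to-back through one open TCP socket using Java NIO's FileChannel.transferTo, so the kernel moves the data and there is no per-file protocol phase. This is an illustration only, not FDT's actual source; the class, host and file names are invented.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Illustrative sketch: stream a list of files continuously over one TCP socket. */
public class ContinuousFileStreamer {

    public static void send(String host, int port, List<Path> files) throws IOException {
        try (SocketChannel sock = SocketChannel.open(new InetSocketAddress(host, port))) {
            sock.setOption(StandardSocketOptions.SO_SNDBUF, 8 * 1024 * 1024); // large socket buffer
            for (Path file : files) {
                try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                    long pos = 0, size = in.size();
                    while (pos < size) {
                        // transferTo lets the OS copy file pages straight to the socket
                        // (zero-copy where supported), with no per-file handshake.
                        pos += in.transferTo(pos, size - pos, sock);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical receiver and file names, for illustration only.
        send("receiver.example.org", 9000,
             List.of(Paths.get("data1.root"), Paths.get("data2.root")));
    }
}
```

The real FDT additionally provides, per its description above, a managed pool of kernel-space buffers, independent read/write threads per physical device, parallel TCP streams when needed, and resumable sessions.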
 Memory to memory (/dev/zero to /dev/null), using two 1U systems with Myrinet 10GbE PCI Express NIC cards:
Tampa-Caltech (RTT 103 msec): 10.0 Gbps, stable indefinitely
Long-range WAN path (CERN – Chicago – New York – Chicago – CERN VLAN, RTT 240 msec): ~8.5 Gbps
 Disk to disk: performs very close to the limit for the disk or network speed
1U disk server at CERN sending data to a 1U server at Caltech (each with 4 SATA disks):
~0.85 TB/hr per rack unit = 9 Gbytes/sec per rack
FDT Test Results 11/14-11/15: 10.0 Gbps Overnight
FDT Test Results (2) 11/14-11/15:
 Stable disk-to-disk flows Tampa-Caltech: stepping up to 10-to-10 and 8-to-8 1U server-pairs, 9 + 7 = 16 Gbps; then solid overnight
 Cisco 6509E counters: 16 Gbps disk traffic and 13+ Gbps FLR memory traffic
 Maxing out the 20 Gbps Etherchannel (802.3ad) between our two Cisco switches?
L-Store: File System Interface to
Global Storage
 Provides a file system interface to (globally) distributed storage devices (“depots”)
 Parallelism for high performance and reliability
 Uses IBP (from UTenn) for the data transfer & storage service
 Generic, high-performance, wide-area-capable storage virtualization service; transport plug-in support
 Write: break the file into blocks and upload the blocks simultaneously to multiple depots (reverse for reads); see the sketch below
 Multiple metadata servers increase performance & fault tolerance
 L-Store supports beyond-RAID6-equivalent encoding of stored files for reliability and fault tolerance
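The write path sketched in the bullets above (split a file into blocks and upload the blocks simultaneously to multiple depots) can be illustrated generically as follows. This is not L-Store's or IBP's actual API: the Depot interface, the round-robin placement and all names are invented for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Generic sketch of a striped write: blocks of one file are pushed to several depots in parallel. */
public class StripedUploader {

    /** Stand-in for an L-Store/IBP depot client; hypothetical, for illustration only. */
    public interface Depot {
        void putBlock(String fileId, long blockIndex, byte[] data) throws IOException;
    }

    public static void upload(String fileId, String path, List<Depot> depots, int blockSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(depots.size());
        List<Future<?>> pending = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long size = f.length();
            long nBlocks = (size + blockSize - 1) / blockSize;
            for (long i = 0; i < nBlocks; i++) {
                byte[] buf = new byte[(int) Math.min(blockSize, size - i * blockSize)];
                f.seek(i * (long) blockSize);
                f.readFully(buf);
                Depot target = depots.get((int) (i % depots.size())); // round-robin striping
                long idx = i;
                pending.add(pool.submit(() -> { target.putBlock(fileId, idx, buf); return null; }));
            }
            for (Future<?> p : pending) p.get(); // wait until every block has landed
        } finally {
            pool.shutdown();
        }
    }
}
```

In L-Store itself the depots are IBP servers (from UTenn), reads reverse the process, and the beyond-RAID6-equivalent encoding provides the reliability noted above.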
L-Store Performance
 Multiple simultaneous writes to 24 depots; each depot is a 3 TB disk server in a 1U case
 30 clients on separate systems uploading files
 Rate has scaled linearly as depots were added: 3 Gbytes/sec so far; continuing to add
 A REDDnet deployment of 167 depots can sustain 25 GB/s
[Chart: aggregate rate holding at ~3 GB/s over 30 minutes]
Science Network Requirements Aggregation Summary [W. Johnston, ESnet]

Advanced Light Source
  End-to-end reliability: -
  Connectivity: DOE sites, US universities, industry
  Bandwidth today: 1 TB/day, 300 Mbps; in 5 years: 5 TB/day, 1.5 Gbps
  Traffic characteristics: bulk data; remote control
  Network services: guaranteed bandwidth; PKI / Grid

Bioinformatics
  End-to-end reliability: -
  Connectivity: DOE sites, US universities
  Bandwidth today: 625 Mbps, 12.5 Gbps in two years; in 5 years: 250 Gbps
  Traffic characteristics: bulk data; remote control; point-to-multipoint
  Network services: guaranteed bandwidth; high-speed multicast

Chemistry / Combustion
  End-to-end reliability: -
  Connectivity: DOE sites, US universities, industry
  Bandwidth today: -; in 5 years: tens of gigabits per second
  Traffic characteristics: bulk data
  Network services: guaranteed bandwidth; PKI / Grid

Climate Science
  End-to-end reliability: -
  Connectivity: DOE sites, US universities, international
  Bandwidth today: -; in 5 years: 5 PB/year, 5 Gbps
  Traffic characteristics: bulk data; remote control
  Network services: guaranteed bandwidth; PKI / Grid

High Energy Physics (LHC) [immediate requirements]
  End-to-end reliability: 99.95+% (< 4 hrs per year)
  Connectivity: US Tier1s (FNAL, BNL), US Tier2s, international (Europe, Canada)
  Bandwidth today: 10 Gbps; in 5 years: 60 to 80 Gbps (30-40 Gbps per US Tier1)
  Traffic characteristics: bulk data; coupled computational processes
  Network services: guaranteed bandwidth; traffic isolation; PKI / Grid
Data Samples and Transport Scenarios

10^7 Event    Data Volume   Transfer Time (Hrs)   Transfer Time (Hrs)   Transfer Time (Hrs)
Samples       (TBytes)      @ 0.9 Gbps            @ 3 Gbps              @ 8 Gbps
AOD           0.5-1         1.2-2.5               0.37-0.74             0.14-0.28
RECO          2.5-5         6-12                  1.8-3.7               0.69-1.4
RAW+RECO      17.5-21       43-86                 13-26                 4.8-9.6
MC            20            98                    30                    11

 10^7 events: a typical data sample for analysis or reconstruction development [Ref.: MONARC]; equivalent to just ~1 day's running
 Can only transmit ~2 RAW+RECO or MC such samples/day on a 10G path
 Movement of ~10^8 event samples (e.g. refreshing disk caches on T2s; re-distribution after re-reconstruction) would take ~1 day (RECO) to ~1 week (RAW, MC) with a 10G link running at high occupancy
 Requirements outlook for 2008-2009 (including peak needs):
 10-40 Gbps per Tier1 [US: underway; Europe: OPN at 10 Gbps]
 2-10 Gbps per Tier2 [US: 10 Gbps links underway]
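The transfer times in the table are straightforward volume-over-bandwidth arithmetic; taking the upper end of the AOD row as a worked example:

$$ t = \frac{V}{R} = \frac{1\ \mathrm{TB}\times 8\ \mathrm{bits/byte}}{0.9\ \mathrm{Gbps}} \approx 8.9\times10^{3}\ \mathrm{s} \approx 2.5\ \mathrm{hours} $$

At 8 Gbps the same terabyte moves in about 1000 s (~0.28 hours), matching the table.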
US LHC Network WG Issues
US LHC Network Working Group Meeting
23-24 October 2006 - FNAL
US LHC Network Working Group
Production Network Issues (1/2)
 Implementing, maintaining and evolving the Roadmap
 Bandwidth and cost evolution
 Funding
 Which links can we rely on for mission-oriented use,
and backup ?
 Business Model: AUPs, policies and costs (?) related to peerings,
permitted flows, preferred flows, limits of use, etc.
 The old business model of general shared infrastructures
may well not apply
 We need to establish the necessary TransAtlantic
partnerships ?
 Developing a coordinated set of milestones for Production
Networking
 Incorporating necessary testing and integration steps
US LHC Network Working Group
Production Network Issues (2/2)
 Operational Model: developed in concert with the evolving
Computing Models
Specifying T0/T1, T1/T1, T1/T2, T2/T2 and Other network
usage scenarios
Priorities and limits of network resource usage for
various classes of activities
Policies and allowed data flow paths
 Authorized peerings, routing and transit traffic
(e.g. Tier1/Tier2, Tier2/Tier2, Tier2/Tier3, Tier4/Tier4
flows; US LHCNet/GEANT peering)
 Access to and methods of implementing preferred
paths: e.g. routing with diffserv; layer2 VLANs;
VCAT/LCAS end-to-end circuit-oriented paths
 Other mechanisms (e.g. T2/T1/T1’/T2’ store
and forward ?)
US LHC Network Working Group
Network Development Issues
 Developing a coherent Network Development plan, supporting
the Production Networking plan
Incorporating the necessary R&D activities
and milestones
 System-level Infrastructures
Authentication, Authorization, Accounting
Problem reporting, tracking and mitigation
End-to-end monitoring and tracking, of networks
and end-systems (e.g. PerfSONAR; MonALISA)
Multi-domain circuit-oriented path construction
 Automation to assist with operations and first-line management
of the network and end-systems:
Configurations
Error trapping, reporting, diagnosis, mitigation
Deploying agents for diagnosis, problem mitigation,
and optimization of operations
[to discuss]
US LHC Network Working Group
Network Integration Issues
 Integrating operations among networks:
USLHCNet, Internet2, NLR, ESnet, GEANT; and
HEP Lab site networks: CERN, FNAL, BNL, SLAC, LBL, ANL, …
 Integrating with the experiments’ production software stacks for
dataset distribution and transport (e.g. PhEDEx)
 Integrating with grid software stacks, especially where they include
transport services linking storage systems (e.g. dCache/SRM)
 Inserting modern network-aware data transport and configuration-optimization
tools into the above stacks, as they are developed
US LHC Network WG: Long-Term
Mission-Driven Bandwidth Issues
 LHC bandwidth usage pattern is non-statistical; over-provisioning
of general IP backbone infrastructures will not meet the need
This also applies to US Tier2s. Eventually Tier3s ?
 Bandwidth exchange and (significant) backup is difficult between
mission-oriented and general purpose networks
Operational match (e.g. large backup on demand) ?
 Are there costs for peerings, and preferred traffic flows,
beyond the costs for dedicated links per se?
What is the cost/charging model?
[Will this change by 2010 in next-generation general nets?]
 Need to plan for sufficient bandwidth (and equipment), in different
cost-scenarios; explore alternatives.
[Note that current pricing information is well-understood]
Some NRENs still charge a lot for 1 GbE links, or less:
e.g. Russia, Portugal
 Funding outlook and source(s)