Slides - TERENA Networking Conference 2004


Networking Challenges to Deploying
a Worldwide LHC Computing Grid
David Foster & Olivier Martin, CERN
10 June 2004
TNC 2004 (Terena Networking Conference), Rhodes

CERN’s Member States
Founding Members: Belgium, Denmark, France, Germany, Greece, Italy, Norway, Sweden, Switzerland, The Netherlands, United Kingdom, Yugoslavia (left 1961)
New Members: Austria (1959), Spain (1961), Portugal (1985), Finland (1991), Poland (1991), Hungary (1992), Czech Republic (1993), Slovak Republic (1993), Bulgaria (1999)

CERN in Numbers
• The organisation has 2,400 staff members
  – Around 6,500 scientists come to CERN each year
• CERN has 475 collaborating institutes
  – 267 in Europe and 208 around the world
• CERN’s budget is around 1,000 million CHF
  – The IT department’s budget is 6% of the total: 60 MCHF
• As of 2007, the LHC will host four experiments: ALICE, ATLAS, CMS and LHCb

LHC – the Large Hadron Collider
• 27 km of magnets with a field of 8.4 Tesla
• Super-fluid helium cooled to 1.9 K
• Two counter-circulating proton beams
• Collision energy 7 + 7 TeV
• The world’s largest super-conducting structure

The Large Hadron Collider
Requirements for data storage and analysis
• 4 large detectors (ALICE, ATLAS, CMS, LHCb)
• Storage:
  – Raw recording rate of 0.1-1 GByte/sec
  – Accumulating data at 10-14 PetaBytes/year (~20 million CDs each year)
  – 10 PetaBytes of disk
• Processing:
  – 100,000 of today’s fastest PCs

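A quick back-of-the-envelope check of these figures, sketched in Python; the ~700 MB CD capacity and the ~1e7 seconds of data taking per year are assumptions, not figures from the slide:

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumptions (not on the slide): ~700 MB per CD and ~1e7 seconds of
# data taking per year.

raw_rate_bytes_s = 1e9        # upper end of the 0.1-1 GByte/sec recording rate
seconds_per_year = 1e7        # typical accelerator live time per year (assumed)
cd_capacity_bytes = 700e6     # assumed CD capacity

bytes_per_year = raw_rate_bytes_s * seconds_per_year
print(f"raw data: ~{bytes_per_year / 1e15:.0f} PB/year")            # ~10 PB/year

petabytes_per_year = 14       # upper end of the quoted 10-14 PB/year
cds_per_year = petabytes_per_year * 1e15 / cd_capacity_bytes
print(f"equivalent to ~{cds_per_year / 1e6:.0f} million CDs/year")  # ~20 million
```
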
Computing Challenges:
Petabytes, Petaflops, Global VOs
• Geographical dispersion: of people and resources
• Complexity: the detector and the LHC environment
• Scale:
  – Tens of Petabytes per year of data
  – 5,000+ physicists
  – ~500 institutes
  – 60+ countries
• Major challenges associated with:
  – Communication and collaboration at a distance
  – Managing globally distributed computing & data resources
  – Cooperative software development and physics analysis
• New forms of distributed systems: Data Grids

CERN’s LHC Needs: Very High Aggregate Requirements for…
• Computational power: roughly 100,000 of today’s fastest PCs
• Data storage: 10-15 Petabytes a year (roughly 20 million CDs a year)
The Problem
CERN can only provide a fraction of the necessary resources.

The Solution
Computing centres, which were isolated in the past, will now be connected, uniting the computing resources of collaborating institutions around the world using Grid technologies.
This Creates an Additional Requirement
• Network throughput: 10 to 40 Gbps network links between LHC centres
Thus, computational power and data storage become distributed, using a very high speed worldwide network infrastructure.

CERN External Networking
Main Internet Connections
[Diagram of CERN’s main Internet connections: mission-oriented links (WHO at 45 Mbps, IN2P3 at 1 Gbps); general-purpose European A&R connectivity via SWITCH, the Swiss National Research Network (1 Gbps), and GEANT (10 Gbps); the CIXP (CERN Internet Exchange Point); USLIC (10 Gbps) providing general-purpose North American A&R connectivity, combined with DataTAG (2.5-10 Gbps) and NetherLight for network research; and ATRIUM / VTHD (FR) at 2.5 Gbps.]

LHC Data Grid Hierarchy
[Diagram of the tiered data grid: the Online System at the experiment delivers ~100-400 MBytes/sec (out of a ~PByte/sec raw rate) to the Tier 0+1 centre at CERN (~700k SI95, ~1 PB of disk, tape robot); Tier 1 centres such as IN2P3, INFN, RAL and FNAL (200k SI95, 600 TB) connect at 10 Gbps; Tier 2 centres at 2.5/10 Gbps; Tier 3 institutes (~0.25 TIPS, physics data cache) at ~2.5 and 0.1-1 Gbps; Tier 4 is the physicists’ workstations.]
CERN/outside resource ratio ~1:2; Tier 0 : (sum of Tier 1s) : (sum of Tier 2s) ~1:1:1.
Physicists work on analysis “channels”; each institute has ~10 physicists working on one or more channels.

Deploying the LHC Grid
[Diagram: CERN Tier 0 and the CERN Tier 1 at the centre, connected to national Tier 1 computing centres (UK, France, Italy, Germany, USA, Japan, Taipei?); Tier 2 centres grouping universities and labs (Uni a/b/n/x/y, Lab a/b/c/m); Tier 3 physics departments; and desktops. Overlapping regions mark a grid for a regional group and a grid for a physics study group.]

What you may get!
[Diagram: the same tiered picture seen from a single physicist’s perspective: CERN Tier 0 and Tier 1, national Tier 1 and Tier 2 centres (UK, France, Italy, Germany, USA, Japan), university and lab sites, the local physics department, and the physicist’s own desktop.]

Modes of Use
• Connectivity requirements are composed of a number of “modes”:
  – “Buffered real-time” for the T0 to T1 raw data transfer.
  – “Peer services” between T1-T1 and T1-T2 for the background distribution of data products.
  – “Chaotic” for the submission of analysis jobs to T1 and T2 centres, which may imply some “on-demand” data transfer.

T0 – T1 Buffered Real Time Estimates (MB/sec)

                     Karlsruhe  Fermilab  Brookhaven     RAL   IN2P3    CNAF  PIC (Barcelona)  T0 Total
ATLAS                   106.87      0.00      173.53  106.87  106.87  106.87           106.87    707.87
CMS                      71.67     71.67        0.00   71.67   71.67   71.67            71.67    430.00
ALICE                   101.41      0.00        0.00  101.41  101.41  101.41             0.00    405.63
LHCb                      6.80      0.00        0.00    6.80    6.80    6.80             6.80     34.00
T1 Totals (MB/sec)      286.74     71.67      173.53  286.74  286.74  286.74           185.33   1577.49
T1 Totals (Gb/sec)        2.29      0.57        1.39    2.29    2.29    2.29             1.48     12.62
Estimated T1 bandwidth needed, Gb/sec
 = (Totals × 1.5 headroom) × 2 capacity
                          6.88      1.72        4.16    6.88    6.88    6.88             4.45     37.86
Assumed bandwidth provisioned (Gb/sec)
                         10.00     10.00       10.00   10.00   10.00   10.00            10.00     70.00

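The two derived rows are simple arithmetic; the Python sketch below (per-site MB/s figures and site order copied from the table above) reproduces the Gb/s totals and the (× 1.5 headroom) × 2 capacity estimate.

```python
# Reproduces the derived rows of the T0-T1 table above.
# Per-site MB/s figures and site order are copied from the table.

sites = ["Karlsruhe", "Fermilab", "Brookhaven", "RAL",
         "IN2P3", "CNAF", "PIC (Barcelona)"]

rates_mb_s = {                 # MB/s per experiment, one value per site
    "ATLAS": [106.87, 0.00, 173.53, 106.87, 106.87, 106.87, 106.87],
    "CMS":   [71.67, 71.67, 0.00, 71.67, 71.67, 71.67, 71.67],
    "ALICE": [101.41, 0.00, 0.00, 101.41, 101.41, 101.41, 0.00],
    "LHCb":  [6.80, 0.00, 0.00, 6.80, 6.80, 6.80, 6.80],
}

HEADROOM = 1.5                 # headroom factor from the slide
CAPACITY = 2.0                 # capacity factor from the slide

for i, site in enumerate(sites):
    total_mb_s = sum(rates_mb_s[exp][i] for exp in rates_mb_s)
    total_gb_s = total_mb_s * 8 / 1000        # MB/s -> Gb/s
    needed_gb_s = total_gb_s * HEADROOM * CAPACITY
    print(f"{site:16s} {total_mb_s:8.2f} MB/s  {total_gb_s:5.2f} Gb/s  "
          f"needs ~{needed_gb_s:5.2f} Gb/s (10 Gb/s assumed provisioned)")
```
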
Some Milestones (mid-2004 through end-2006)
• 10 Gbit “end-to-end” tests with Fermilab
• First version of the LHC Community Network proposal
• 10 Gbit “end-to-end” test complete with a European partner
• Measure performance variability and understand H/W and S/W issues to ALL sites
• Document circuit-switched options and costs; first real test if possible
• Circuit/packet switch design completed
• LHC Community Network proposal completed
• All T1 fabric architecture documents completed
• LCG TDR completed
• Sustained throughput test achieved to some sites: 2-4 Gb/sec for 2 months; H/W and S/W problems solved
• All CERN bandwidth provisioned
• All T1 bandwidth in production (10 Gb links)
• Sustained throughput tests achieved to most sites
• Verified performance to all sites for at least 2 months
• Experiments are executing “Data Challenges” to exercise the software chains and, increasingly, the whole end-to-end infrastructure

Main Networking Challenges (1)
• Fulfill the as-yet-unproven assertion that the network can be « nearly » transparent to the Grid
• Deploy suitable Wide Area Network infrastructure (50-100 Gb/s)
• Deploy suitable Local Area Network infrastructure (matching or exceeding that of the WAN)
• Seamless interconnection of LAN & WAN infrastructures
  – firewall?
• End-to-end issues (transport protocols, PCs, NICs, etc.). Where are we today?
  – memory to memory: 6.5 Gb/s
  – disk to disk: 400 MB/s (Linux), 1.2 GB/s (Windows 2003 Server/NewiSys)

Main TCP issues
• Does not scale to some environments:
  – High speed, high latency
  – Noisy
• Unfair behaviour with respect to:
  – RTT
  – MSS
  – Bandwidth
• Widespread use of multiple streams to compensate for inherent TCP/IP limitations (e.g. GridFTP, bbFTP):
  – A bandage rather than a cure
• New TCP/IP proposals aim to restore performance in single-stream environments:
  – Not clear if/when they will have a real impact
  – In the meantime, packet-loss-free infrastructures are required, preferably without packet re-ordering!

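A rough illustration (not from the slides) of why multiple streams are only a bandage: in the standard AIMD model, splitting the pipe across N parallel streams means a single loss halves only one stream’s (N-times smaller) window, so the pipe refills roughly N times faster.

```python
# Rough AIMD model (illustrative, not from the slides) of why parallel
# TCP streams mask single-stream limitations on long fat pipes.

def recovery_time_s(bandwidth_bps, rtt_s, packet_bytes=1500, streams=1):
    """Time for AIMD congestion avoidance (1 packet per RTT) to refill
    the pipe after one loss when the window is shared by `streams`."""
    pipe_packets = bandwidth_bps * rtt_s / (8 * packet_bytes)
    per_stream_window = pipe_packets / streams
    # Only the stream that saw the loss halves its window; it must win
    # back per_stream_window / 2 packets at one packet per RTT.
    return rtt_s * per_stream_window / 2

for n in (1, 4, 16, 64):
    t = recovery_time_s(10e9, 0.100, streams=n)
    print(f"{n:3d} stream(s): ~{t / 60:6.1f} min to recover from a single loss")
```
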
TCP dynamics
(10 Gbps, 100 ms RTT, 1500-byte packets)
• Window size (W) = Bandwidth × Round Trip Time
  – W(bits) = 10 Gbps × 100 ms = 1 Gb
  – W(packets) = 1 Gb / (8 × 1500) = 83,333 packets
• Standard Additive Increase Multiplicative Decrease (AIMD) mechanism:
  – W = W/2 (halving the congestion window on a loss event)
  – W = W + 1 (increasing the congestion window by one packet every RTT)
• Time to recover from W/2 to W (congestion avoidance) at 1 packet per RTT:
  – RTT × W(packets)/2 = 1.157 hours
  – In practice, 1 packet per 2 RTTs because of delayed ACKs, i.e. 2.31 hours
• Packets per second:
  – W(packets) / RTT = 833,333 packets/sec

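The same numbers, reproduced as a small Python sketch (values as on the slide):

```python
# Reproduces the TCP/AIMD arithmetic on the slide above.

bandwidth_bps = 10e9       # 10 Gbps
rtt_s = 0.100              # 100 ms round-trip time
packet_bytes = 1500        # packet size

# Bandwidth-delay product: window needed to keep the pipe full
w_bits = bandwidth_bps * rtt_s                 # 1e9 bits = 1 Gb
w_packets = w_bits / (8 * packet_bytes)        # ~83,333 packets

# AIMD congestion avoidance: halve on loss, +1 packet per RTT
# (or per 2 RTTs with delayed ACKs)
recovery_s = rtt_s * w_packets / 2             # ~4,167 s  ~ 1.157 h
recovery_delayed_s = 2 * recovery_s            # ~2.31 h

packets_per_second = w_packets / rtt_s         # ~833,333 pkt/s

print(f"window: {w_packets:,.0f} packets ({w_bits / 1e9:.0f} Gb)")
print(f"recovery after one loss: {recovery_s / 3600:.2f} h "
      f"({recovery_delayed_s / 3600:.2f} h with delayed ACKs)")
print(f"packet rate at 10 Gbps: {packets_per_second:,.0f} packets/s")
```
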
10G DataTAG testbed extension to Telecom World 2003 and Abilene/CENIC
On 15 September 2003, the DataTAG project was the first transatlantic testbed offering direct 10 GigE access, using Juniper’s VPN layer 2 / 10 GigE emulation.

Internet2 land speed record history (IPv4 & IPv6), period 2000-2003
[Charts from the Final DataTAG Review, 24 March 2004: evolution of the Internet2 Land Speed Record (I2LSR) in terabit-meters/second and in Gigabit/second, for IPv4 and IPv6, over the months Mar-00 to Nov-03; plus the impact of a single multi-Gb/s flow on the Abilene backbone.]

Internet2 land speed record history (IPv4 & IPv6)
[Charts from the Final DataTAG Review, 24 March 2004: evolution of the I2LSR in Gigabits/second and in terabit-meters/second, for IPv4 and IPv6, single and multiple streams, over the months Mar-00 to May-04; plus the impact of a single multi-Gb/s flow on the Abilene backbone.]

Additional Challenges
• Real bandwidth estimates, given the chaotic nature of the requirements
• End-to-end performance, given the whole chain involved
  – (disk-bus-memory-bus-network-bus-memory-bus-disk)
• Provisioning over complex network infrastructures (GEANT, NRENs, etc.)
• Cost model for options (packet + SLAs, circuit switched, etc.)
• Consistent performance (dealing with firewalls)
• Merging leading-edge research with production networking

Layer 1/2/3 networking (1)
Conventional layer 3 technology is no longer fashionable because of:
– High associated costs, e.g. 200-300 KUSD for a 10G router interface
– The implied use of shared backbones
The use of layer 1 or layer 2 technology is very attractive because it helps to solve a number of problems, e.g.:
– The 1500-byte Ethernet frame size (layer 1)
– Protocol transparency (layers 1 & 2)
– Minimum functionality and hence, in theory, much lower costs (layers 1 & 2)

Layer 1/2/3 networking (2)
« Lambda Grids » are becoming very popular:
– Pros:
  • A circuit-oriented model like the telephone network, hence no need for complex transport protocols
  • Lower equipment costs (i.e. « in theory » a factor of 2 or 3 per layer)
  • The concept of a dedicated end-to-end light path is very elegant
– Cons:
  • « End to end » is still very loosely defined, i.e. site to site, cluster to cluster or really host to host?
  • Higher circuit costs, scalability, additional middleware to deal with circuit set-up/tear-down, etc.
  • Extending dynamic VLAN functionality is a potential nightmare!

« Lambda Grids »
What does it mean?
Clearly different things to different people, hence the apparently easy consensus!
Conservatively, on-demand « site to site » connectivity:
– Where is the innovation?
– What does it solve in terms of transport protocols?
– Where are the savings?
  • Fewer interfaces needed (customer), but more standby/idle circuits needed (provider)
  • Economics from the service provider vs the customer perspective?
    – Traditionally, switched services have been very expensive:
      » Usage vs flat charge
      » Break-even point between switched and leased circuits at a few hours/day
      » Why would this change?
  • If there are no savings, why bother?
More advanced: cluster to cluster
– Implies even more active circuits in parallel
  • Is it realistic?
Even more advanced: host to host
– All optical
– Is it realistic?

Thank You!