HENP Grids and Networks
Global Virtual Organizations
Harvey B Newman
FAST Meeting, Caltech
July 1, 2002
http://l3www.cern.ch/~newman/HENPGridsNets_FAST070202.ppt
Computing Challenges:
Petabytes, Petaflops, Global VOs
 Geographical dispersion: of people and resources
 Complexity: the detector and the LHC environment
 Scale:
Tens of Petabytes per year of data
5000+ Physicists
250+ Institutes
60+ Countries
Major challenges associated with:
Communication and collaboration at a distance
Managing globally distributed computing & data resources
Cooperative software development and physics analysis
New Forms of Distributed Systems: Data Grids
Four LHC Experiments: The
Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb
Higgs + New particles; Quark-Gluon Plasma; CP Violation
Data stored: ~40 Petabytes/Year and UP; CPU: 0.30 Petaflops and UP
0.1 Exabyte (2007) to 1 Exabyte (~2012?) for the LHC Experiments
(1 EB = 10^18 Bytes)
LHC: Higgs Decay into 4 muons
(Tracker only); 1000X LEP Data Rate
(+30 minimum bias events)
All charged tracks with pt > 2 GeV
Reconstructed tracks with pt > 25 GeV
10^9 events/sec; selectivity: 1 in 10^13 (1 person in a thousand world populations)
LHC Data Grid Hierarchy
CERN/Outside Resource Ratio ~1:2
Tier0 : (Sum of Tier1s) : (Sum of Tier2s) ~ 1:1:1
[Diagram: the tiered LHC computing model]
  Online System (Experiment): ~PByte/sec event data;
    ~100-400 MBytes/sec to the Tier 0 +1 centre
  Tier 0 +1 (CERN): 700k SI95, ~1 PB Disk, Tape Robot
  Tier 1 centres (IN2P3, INFN, RAL, FNAL: 200k SI95, 600 TB; ...):
    ~2.5-10 Gbps links to Tier 0
  Tier 2 centres: ~2.5-10 Gbps links to Tier 1
  Tier 3 (Institutes): ~0.25 TIPS, physics data cache; 0.1-10 Gbps links
  Tier 4: Workstations
Physicists work on analysis "channels"
Each institute has ~10 physicists working on one or more channels
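As an aside not on the original slide, the resource ratios quoted above can be checked in a couple of lines; the capacity figure is the CERN number from the diagram, and treating the Tier1 and Tier2 sums as exactly equal to Tier0 is an assumption for illustration only.

    # Minimal sketch (illustrative): the Tier0 : Sum(Tier1) : Sum(Tier2) ~ 1:1:1
    # rule above implies the quoted CERN : Outside resource ratio of ~1:2.
    cern_tier0 = 700e3               # CERN Tier 0+1 capacity in SI95 (from the diagram)
    sum_tier1 = 1.0 * cern_tier0     # assumption: all Tier1s together ~ Tier0
    sum_tier2 = 1.0 * cern_tier0     # assumption: all Tier2s together ~ Tier0
    outside = sum_tier1 + sum_tier2
    print(f"CERN : Outside ~ 1 : {outside / cern_tier0:.0f}")   # -> 1 : 2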
Emerging Data Grid
User Communities
Grid Physics Network (GriPhyN): ATLAS, CMS, LIGO, SDSS
Particle Physics Data Grid (PPDG)
Int'l Virtual Data Grid Lab (iVDGL)
NSF Network for Earthquake Engineering Simulation (NEES)
  Integrated instrumentation, collaboration, simulation
Access Grid; VRVS: supporting group-based collaboration
And:
Genomics, Proteomics, ...
The Earth System Grid and EOSDIS
Federating Brain Data
Computed MicroTomography ...
Virtual Observatories
HENP Related Data Grid Projects

Project        Where     Funding                Period
PPDG I         USA       DOE  $2M               1999-2001
GriPhyN        USA       NSF  $11.9M + $1.6M    2000-2005
EU DataGrid    EU        EC   €10M              2001-2004
PPDG II (CP)   USA       DOE  $9.5M             2001-2004
iVDGL          USA       NSF  $13.7M + $2M      2001-2006
DataTAG        EU        EC   €4M               2002-2004
GridPP         UK        PPARC  >$15M           2001-2004
LCG (Ph1)      CERN      MS   30 MCHF           2002-2004
Many Other Projects of interest to HENP
 Initiatives in US, UK, Italy, France, NL, Germany, Japan, …
 Networking initiatives: DataTAG, AMPATH, CALREN-XD…
 US Distributed Terascale Facility:
($53M, 12 TeraFlops, 40 Gb/s network)
Daily, Weekly, Monthly and Yearly
Statistics on 155 Mbps US-CERN Link
20 - 100 Mbps Used Routinely in '01;
BW Upgrades Quickly Followed by Upgraded Production Use
BaBar: 600 Mbps Throughput in '02 (Tier A)
"Physicists have indeed planned to test the GRID
principles, starting first from the Computing
Centres in Lyon and Stanford (California). A
first step towards the ubiquity of the GRID."
  Pierre Le Hir, Le Monde, 12 April 2001
Two centers are trying to work as one (D. Linglin, LCG Workshop):
  Data not duplicated
  Internationalization
  Transparent access, etc.
Links: CERN-US Line + Abilene (3/2002); Renater + ESnet;
RNP Brazil (to 20 Mbps); FIU Miami/So. America (to 80 Mbps)
Transatlantic Net WG (HN, L. Price)
Bandwidth Requirements [*] (Mbps)

           2001    2002    2003    2004    2005    2006
CMS         100     200     300     600     800    2500
ATLAS        50     100     300     600     800    2500
BaBar       300     600    1100    1600    2300    3000
CDF         100     300     400    2000    3000    6000
D0          400    1600    2400    3200    6400    8000
BTeV         20      40     100     200     300     500
DESY        100     180     210     240     270     300
CERN BW  155-310    622    2500    5000   10000   20000

[*] Installed BW. Maximum Link Occupancy 50% Assumed
See http://gate.hep.anl.gov/lprice/TAN
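A small sketch, added here and not part of the talk, of the footnote's rule of thumb: with maximum link occupancy assumed to be 50%, the installed bandwidth must be at least twice the required sustained throughput. The 1250 Mbps input is a hypothetical example.

    # Convert a required sustained throughput into installed bandwidth under
    # the table's "maximum link occupancy 50%" assumption.
    def installed_bw_mbps(required_mbps: float, max_occupancy: float = 0.5) -> float:
        return required_mbps / max_occupancy

    # Hypothetical example: 1250 Mbps of sustained traffic needs a 2500 Mbps link.
    print(installed_bw_mbps(1250))   # -> 2500.0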
MONARC: CMS Analysis Process
Hierarchy of Processes (Experiment, Analysis Groups, Individuals)

Experiment-Wide Activity (10^9 events), on RAW Data:
  Reconstruction: 3000 SI95-sec/event; 1 job/year
  Re-processing: 3000 SI95-sec/event; 3 jobs per year
    (3 times per year, for new detector calibrations or understanding)
  Monte Carlo: 5000 SI95-sec/event
~20 Groups' Activity (10^9 → 10^7 events):
  Selection: 25 SI95-sec/event; ~20 jobs per month
    (iterative selection; trigger-based and physics-based refinements, once per month)
~25 Individuals per Group, Activity on 10^6 - 10^7 events:
  Analysis: 10 SI95-sec/event; ~500 jobs per day
    (different physics cuts, algorithms & MC comparison applied to data,
    ~once per day to get results)
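Purely as an illustration, the per-event costs and job rates above can be folded into a rough annual CPU budget. The events-per-job figures below are assumptions read off the activity sizes on the slide (they are not stated explicitly), and Monte Carlo production is left out because its event count is not given.

    # Rough, assumption-laden estimate of annual CPU implied by the table above.
    processes = {
        # name:           (SI95-sec/event, events/job (assumed), jobs/year)
        "Reconstruction": (3000, 1e9, 1),
        "Re-processing":  (3000, 1e9, 3),
        "Selection":      (25,   1e9, 20 * 12),    # ~20 jobs/month, 10^9 -> 10^7 events
        "Analysis":       (10,   1e6, 500 * 365),  # ~500 jobs/day on ~10^6-10^7 events
    }
    total = 0.0
    for name, (cost, events, jobs) in processes.items():
        si95_sec = cost * events * jobs
        total += si95_sec
        print(f"{name:14s} {si95_sec:9.2e} SI95-sec/year")
    print(f"Sustained capacity ~ {total / 3.15e7:.0f} SI95")   # 1 year ~ 3.15e7 s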
Tier0-Tier1 Link Requirements
Estimate: for Hoffmann Report 2001

1) Tier1 ↔ Tier0 Data Flow for Analysis               0.5 - 1.0 Gbps
2) Tier2 ↔ Tier0 Data Flow for Analysis               0.2 - 0.5 Gbps
3) Interactive Collaborative Sessions (30 Peak)       0.1 - 0.3 Gbps
4) Remote Interactive Sessions (30 Flows Peak)        0.1 - 0.2 Gbps
5) Individual (Tier3 or Tier4) data transfers         0.8 Gbps
   (Limit to 10 Flows of 5 Mbytes/sec each)
TOTAL Per Tier0 - Tier1 Link                          1.7 - 2.8 Gbps
NOTE:
Adopted by the LHC Experiments; given in the Steering
Committee Report on LHC Computing: “1.5 - 3 Gbps per
experiment”
Corresponds to ~10 Gbps Baseline BW Installed
on US-CERN Link
Report also discussed the effects of higher bandwidths
 For example all-optical 10 Gbps Ethernet + WAN by 2002-3
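For convenience (not on the slide), a short check that the per-flow estimates above really sum to the quoted 1.7 - 2.8 Gbps:

    # Sum the low/high ends of the per-flow estimates (Gbps) from the table above.
    flows = [(0.5, 1.0), (0.2, 0.5), (0.1, 0.3), (0.1, 0.2), (0.8, 0.8)]
    low, high = sum(f[0] for f in flows), sum(f[1] for f in flows)
    print(f"Total per Tier0-Tier1 link: {low:.1f} - {high:.1f} Gbps")   # 1.7 - 2.8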
Tier0-Tier1 BW Requirements
Estimate: for Hoffmann Report 2001
Does Not Include more recent ATLAS Data Estimates
  270 Hz at 10^33 Instead of 100 Hz
  400 Hz at 10^34 Instead of 100 Hz
  2 MB/Event Instead of 1 MB/Event?
Does Not Allow Fast Download to Tier3+4
of "Small" Object Collections
  Example: Download 10^7 Events of AODs (10^4 Bytes) → 100
  GBytes; at 5 MBytes/sec per person (above) that's 6 Hours!
  (see the sketch below)
This is still a rough, bottom-up, static, and hence
Conservative Model.
  A Dynamic distributed DB or "Grid" system with Caching,
  Co-scheduling, and Pre-Emptive data movement
  may well require greater bandwidth
Does Not Include "Virtual Data" operations;
Derived Data Copies; Data-description overheads
Further MONARC Model Studies are Needed
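The sketch promised above: a quick, purely illustrative check of the AOD download arithmetic.

    # 10^7 AOD events of 10^4 bytes each, pulled at the 5 MBytes/sec per-person cap.
    volume_bytes = 1e7 * 1e4            # = 1e11 bytes = 100 GBytes
    rate_bytes_per_s = 5e6
    hours = volume_bytes / rate_bytes_per_s / 3600
    print(f"{volume_bytes / 1e9:.0f} GBytes, about {hours:.1f} hours")   # ~5.6 h, i.e. "6 Hours"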
Maximum Throughput on
Transatlantic Links (155 Mbps)
8/10/01   105 Mbps reached with 30 Streams: SLAC-IN2P3
9/1/01    102 Mbps in One Stream: CIT-CERN
11/5/01   125 Mbps in One Stream (modified kernel): CIT-CERN
1/09/02   190 Mbps for One Stream shared on 2 155 Mbps links
3/11/02   120 Mbps Disk-to-Disk with One Stream on 155 Mbps link (Chicago-CERN)
5/20/02   450 Mbps SLAC-Manchester on OC12 with ~100 Streams
6/1/02    290 Mbps Chicago-CERN One Stream on OC12 (mod. kernel)
Also see http://www-iepm.slac.stanford.edu/monitoring/bulk/;
and the Internet2 E2E Initiative: http://www.internet2.edu/e2e
Some Recent Events:
Reported 6/1/02 to ICFA/SCIC
 Progress in High Throughput: 0.1 to 1 Gbps
 Land Speed Record: SURFNet – Alaska (IPv6)
(0.4+ Gbps)
 SLAC – Manchester (Les C. and Richard H-J)
(0.4+ Gbps)
 Tsunami (Indiana) (0.8 Gbps UDP)
 Tokyo – KEK (0.5 – 0.9 Gbps)
 Progress in Pre-Production and Production Networking
 10 Mbytes/sec FNAL-CERN (Michael Ernst)
 15 Mbytes/sec disk-to-disk Chicago-CERN
(Sylvain Ravot)
KPNQwest files for Chapter 11; stopped its network yesterday.
 Near Term Pricing of Competitor (DT) ok.
 Unknown impact on prices and future planning
in the medium and longer term
Baseline BW for the US-CERN Link:
HENP Transatlantic WG (DOE+NSF)
Transoceanic Networking Integrated with the Abilene, TeraGrid, Regional Nets
and Continental Network Infrastructures in US, Europe, Asia, South America
[Chart: Baseline evolution typical of major HENP links 2001-2006]
Link Bandwidth (Mbps): FY2001: 310; FY2002: 622; FY2003: 2500;
FY2004: 5000; FY2005: 10000; FY2006: 20000
US-CERN Link: 622 Mbps this month
DataTAG 2.5 Gbps Research Link in Summer 2002
10 Gbps Research Link by Approx. Mid-2003
Total U.S. Internet Traffic
[Chart: U.S. Internet traffic, 1970-2010, log scale from 10 bps to 100 Pbps.
ARPA & NSF data to '96, plus new measurements; growth rates of 4X/Year and
2.8X/Year indicated; traffic projected forward at 3X/Year. Voice crossover:
August 2000; upper limit shown at the same % of GDP as voice.
Source: Roberts et al., 2001]
Internet Growth Rate
Fluctuates Over Time
[Chart: U.S. Internet edge traffic growth rate per year (6-month lagging
measure), Jan 00 - Jan 02. Reported growth of 4.0/year and 3.6/year over
six-month periods in 2000-01; average 3.0/year.
Source: Roberts et al., 2002]
AMS-IX Internet Exchange Throughput
Accelerating Growth in Europe (NL)
Monthly Traffic: 2X Growth from 8/00 - 3/01; 2X Growth from 8/01 - 12/01
Hourly Traffic, 3/22/02 [chart; scale 2.0 - 6.0 Gbps]
ICFA SCIC Meeting March 9
at CERN: Updates from Members
 Abilene Upgrade from 2.5 to 10 Gbps
Additional scheduled lambdas planned for targeted
applications: Pacific and National Light Rail
 US-CERN
 Upgrade On Track: to 622 Mbps in July;
Setup and Testing Done in STARLIGHT
 2.5G Research Lambda by this Summer: STARLIGHT-CERN
 2.5G Triangle between STARLIGHT (US), SURFNet (NL),
CERN
 SLAC + IN2P3 (BaBar)
 Getting 100 Mbps over 155 Mbps CERN-US Link
 50 Mbps Over RENATER 155 Mbps Link, Limited by ESnet
 600 Mbps Throughput is BaBar Target for this Year
 FNAL
 Expect ESnet Upgrade to 622 Mbps this Month
 Plans for dark fiber to STARLIGHT underway, could be
done in ~4 Months; Railway or Electric Co. provider
ICFA SCIC: A&R Backbone and
International Link Progress
 GEANT Pan-European Backbone (http://www.dante.net/geant)
 Now interconnects 31 countries
 Includes many trunks at 2.5 and 10 Gbps
 UK
 2.5 Gbps NY-London, with 622 Mbps to ESnet and Abilene
 SuperSINET (Japan): 10 Gbps IP and 10 Gbps Wavelength
 Upgrade to Two 0.6 Gbps Links, to Chicago and Seattle
 Plan upgrade to 2 X 2.5 Gbps Connection to
US West Coast by 2003
 CA*net4 (Canada): Interconnect customer-owned dark fiber
nets across Canada at 10 Gbps, starting July 2002
 “Lambda-Grids” by ~2004-5
 GWIN (Germany): Connection to Abilene Upgraded
to 2 X 2.5 Gbps early in 2002
 Russia
 Start 10 Mbps link to CERN and ~90 Mbps to US Now
2.5 → 10 Gbps Backbone
210 Primary Participants
All 50 States, D.C. and Puerto Rico
80 Partner Corporations and Non-Profits
22 State Research and Education Nets
15 “GigaPoPs” Support 70% of Members
Caltech Connection with GbE
to New Backbone
National R&E Network Example
Germany: DFN Transatlantic Connectivity
Q1 2002
 2 X OC12 Now: NY-Hamburg
and NY-Frankfurt
 ESNet peering at 34 Mbps
 Upgrade to 2 X OC48 expected
in Q1 2002
 Direct Peering to Abilene and
Canarie expected
 UCAID will add another 2 OC48’s;
Proposing a Global Terabit
Research Network (GTRN)
 FSU Connections via satellite:
STM 16
Yerevan, Minsk, Almaty, Baikal
 Speeds of 32 - 512 kbps
 SILK Project (2002): NATO funding
 Links to Caucasus and Central
Asia (8 Countries)
Currently 64-512 kbps
Propose VSAT for 10-50 X BW:
NATO + State Funding
National Research Networks
in Japan
SuperSINET
  Started operation January 4, 2002
  Support for 5 important areas: HEP, Genetics, Nano-Technology,
  Space/Astronomy, GRIDs
  Provides 10 λ's:
    10 Gbps IP connection
    7 Direct intersite GbE links
    Some connections to 10 GbE in JFY2002
HEPnet-J
  Will be re-constructed with MPLS-VPN in SuperSINET
[Map: SuperSINET topology - IP routers, OXCs and WDM paths linking Tohoku U,
KEK, NII Chiba, NII Hitot., U Tokyo, ISAS, NAO, IMS, NIG, NIFS, Nagoya U,
Kyoto U (ICR), Osaka U, and the Internet]
Proposal: Two TransPacific 2.5 Gbps Wavelengths, and
Japan-CERN Grid Testbed by ~2003
DataTAG Project
[Map: DataTAG transatlantic testbed - New York, STARLIGHT, GENEVA; Abilene,
ESnet, STAR-TAP and CALREN in the US; GEANT plus the national networks
SuperJANET4 (UK), GARR-B (It), SURFnet (NL) and Renater (Fr) in Europe]
EU-Solicited Project. CERN, PPARC (UK), Amsterdam (NL), and INFN (IT);
and US (DOE/NSF: UIC, NWU and Caltech) partners
Main Aims:
  Ensure maximum interoperability between US and EU Grid Projects
  Transatlantic Testbed for advanced network research
2.5 Gbps Wavelength Triangle 7/02 (10 Gbps Triangle in 2003)
TeraGrid (www.teragrid.org)
NCSA, ANL, SDSC, Caltech
A Preview of the Grid Hierarchy
and Networks of the LHC Era
[Map: TeraGrid sites - NCSA/UIUC (Urbana), ANL, Starlight/NW Univ, Univ of
Chicago, Ill Inst of Tech, UIC and multiple carrier hubs (Chicago),
Indianapolis (Abilene NOC), Caltech and San Diego - linked by Abilene OC-48
(2.5 Gb/s), multiple 10 GbE (Qwest) and multiple 10 GbE (I-WIRE dark fiber)]
Idea to extend the TeraGrid to CERN
Source: Charlie Catlett, Argonne
CA ONI, CALREN-XD + Pacific Light Rail Backbones (Proposed)
Also: LA-Caltech Metro Fiber; National Light Rail
Key Network Issues &
Challenges
Net Infrastructure Requirements for High Throughput
Packet Loss must be ~Zero (at and below 10^-6)
 I.e. No “Commodity” networks
 Need to track down uncongested packet loss
 No Local infrastructure bottlenecks
 Multiple Gigabit Ethernet “clear paths” between
selected host pairs are needed now
 To 10 Gbps Ethernet paths by 2003 or 2004
TCP/IP stack configuration and tuning Absolutely Required
Large Windows (see the window-sizing sketch below); Possibly Multiple Streams
 New Concepts of Fair Use Must then be Developed
 Careful Router, Server, Client, Interface configuration
Sufficient CPU, I/O and NIC throughput
 End-to-end monitoring and tracking of performance
 Close collaboration with local and “regional” network staffs
TCP Does Not Scale to the 1-10 Gbps Range
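To make the "Large Windows" requirement concrete, here is a short sketch, not from the talk, of the bandwidth-delay product a single TCP stream must keep in flight; the 120 ms round-trip time is an assumed transatlantic value.

    # A TCP stream must keep roughly bandwidth x RTT of data in flight.
    def tcp_window_mbytes(bandwidth_bps: float, rtt_s: float = 0.120) -> float:
        return bandwidth_bps * rtt_s / 8 / 1e6

    for gbps in (0.155, 1.0, 10.0):
        print(f"{gbps:6.3f} Gbps -> ~{tcp_window_mbytes(gbps * 1e9):6.1f} MBytes of window")
    # ~2.3, ~15 and ~150 MBytes respectively: far beyond default TCP window
    # sizes, hence the stack tuning (and fair-use questions) noted above.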
A Short List: Revolutions in
Information Technology (2002-7)
 Managed Global Data Grids (As Above)
 Scalable Data-Intensive Metro and Long Haul
Network Technologies
DWDM: 10 Gbps then 40 Gbps per λ;
1 to 10 Terabits/sec per fiber
 10 Gigabit Ethernet (See www.10gea.org)
10GbE / 10 Gbps LAN/WAN integration
 Metro Buildout and Optical Cross Connects
Dynamic Provisioning → Dynamic Path Building
 “Lambda Grids”
 Defeating the “Last Mile” Problem
(Wireless; or Ethernet in the First Mile)
 3G and 4G Wireless Broadband (from ca. 2003);
and/or Fixed Wireless “Hotspots”
 Fiber to the Home
 Community-Owned Networks
A Short List: Coming Revolutions
in Information Technology
 Storage Virtualization
 Grid-enabled Storage Resource Middleware (SRM)
iSCSI (Internet Small Computer System Interface);
Integrated with 10 GbE → Global File Systems
 Internet Information Software Technologies
 Global Information “Broadcast” Architecture
E.g. the Multipoint Information Distribution
Protocol ([email protected])
 Programmable Coordinated Agent Architectures
E.g. Mobile Agent Reactive Spaces (MARS)
by Cabri et al., University of Modena
 The “Data Grid” - Human Interface
 Interactive monitoring and control of Grid resources
By authorized groups and individuals
By Autonomous Agents
HENP Major Links: Bandwidth
Roadmap (Scenario) in Gbps

Year   Production         Experimental         Remarks
2001   0.155              0.622-2.5            SONET/SDH
2002   0.622              2.5                  SONET/SDH; DWDM; GigE Integ.
2003   2.5                10                   DWDM; 1 + 10 GigE Integration
2005   10                 2-4 X 10             λ Switch; λ Provisioning
2007   2-4 X 10           ~10 X 10; 40 Gbps    1st Gen. λ Grids
2008   ~10 X 10           ~5 X 40 or           40 Gbps λ Switching
       or 1-2 X 40        ~20-50 X 10
2010   ~5 X 40 or         ~25 X 40 or          2nd Gen λ Grids;
       ~20 X 10           ~100 X 10            Terabit Networks
2012   ~Terabit           ~MultiTerabit        ~Fill One Fiber or
                                               Use a Few Fibers

One Long Range Scenario (Ca. 2008-12)
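As a quick consistency check, added here rather than taken from the slide, the production column implies a compounded growth of roughly 2x per year, i.e. about an order of magnitude every three years or so.

    # Compound annual growth implied by the production-bandwidth roadmap above.
    start_gbps, end_gbps = 0.155, 1000.0        # 2001 -> ~Terabit in 2012
    years = 2012 - 2001
    factor = (end_gbps / start_gbps) ** (1 / years)
    print(f"~{factor:.1f}x per year, ~{factor ** 3:.0f}x every 3 years")   # ~2.2x, ~11x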
HENP As a Driver of Optical Networks
Petascale Grids with TB Transactions
Problem: Extract "Small" Data Subsets of 1 to 100 Terabytes
from 1 to 1000 Petabyte Data Stores
Survivability of the HENP Global Grid System, with
hundreds of such transactions per day (circa 2007),
requires that each transaction be completed in a
relatively short time.
Example: Take 800 secs to complete the transaction. Then:

Transaction Size (TB)    Net Throughput (Gbps)
          1                       10
         10                      100
        100                     1000 (Capacity of Fiber Today)
 Summary: Providing Switching of 10 Gbps wavelengths
within ~3 years; and Terabit Switching within 5-10 years
would enable “Petascale Grids with Terabyte transactions”,
as required to fully realize the discovery potential of major
HENP programs, as well as other data-intensive fields.
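The example rows above follow from a one-line calculation; this small sketch (illustrative only) reproduces them for an 800-second transaction.

    # Net throughput needed to move a transaction of `tb` Terabytes in `seconds`.
    def required_gbps(tb: float, seconds: float = 800.0) -> float:
        return tb * 1e12 * 8 / seconds / 1e9

    for tb in (1, 10, 100):
        print(f"{tb:4d} TB in 800 s -> {required_gbps(tb):5.0f} Gbps")
    # 1 TB -> 10 Gbps, 10 TB -> 100 Gbps, 100 TB -> 1000 Gbps, as in the table.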
Internet2 HENP WG [*]
Mission: To help ensure that the required
  National and international network infrastructures (end-to-end),
  Standardized tools and facilities for high performance
  and end-to-end monitoring and tracking, and
  Collaborative systems
are developed and deployed in a timely manner, and used
effectively to meet the needs of the US LHC and other major
HENP Programs, as well as the at-large scientific community.
  To carry out these developments in a way that is
  broadly applicable across many fields
Formed an Internet2 WG as a suitable framework: Oct. 26, 2001
[*] Co-Chairs: S. McKee (Michigan), H. Newman (Caltech);
Sec'y J. Williams (Indiana)
Website: http://www.internet2.edu/henp; also see the Internet2
End-to-end Initiative: http://www.internet2.edu/e2e
True End to End Experience
 User perception
 Application
 Operating system
 Host IP stack
 Host network card
 Local Area Network
 Campus backbone
network
 Campus link to regional
network/GigaPoP
 GigaPoP link to Internet2
national backbones
 International
connections
[Diagram: the end-to-end path, from the "eyeball" and application, through
the host stack and "jack", out across the networks]
HENP Scenario Limitations:
Technologies and Costs
 Router Technology and Costs
(Ports and Backplane)
 Computer CPU, Disk and I/O Channel
Speeds to Send and Receive Data
 Link Costs: Unless Dark Fiber (?)
 MultiGigabit Transmission Protocols
End-to-End
 “100 GbE” Ethernet (or something else) by
~2006: for LANs to match WAN speeds
Throughput quality improvements:
  BW_TCP < MSS / (RTT * sqrt(p)), with p the packet loss rate [*]
80% Improvement/Year → Factor of 10 In 4 Years
[Chart annotations:]
Eastern Europe Far Behind
China Improves But Far Behind
[*] See "Macroscopic Behavior of the TCP Congestion Avoidance Algorithm,"
Mathis, Semke, Mahdavi, Ott, Computer Communication Review 27(3), 7/1997
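A small sketch using the Mathis et al. bound quoted above: inverting it gives the packet-loss rate a single standard-MSS TCP stream can tolerate while sustaining a given throughput. The 1460-byte MSS and 120 ms transatlantic RTT are assumed values.

    # Mathis et al.: BW < MSS / (RTT * sqrt(p))  =>  p < (MSS / (RTT * BW))^2
    def max_loss_rate(bw_bps: float, mss_bytes: int = 1460, rtt_s: float = 0.120) -> float:
        mss_bits = mss_bytes * 8
        return (mss_bits / (rtt_s * bw_bps)) ** 2

    for gbps in (0.1, 1.0, 10.0):
        print(f"{gbps:5.1f} Gbps needs loss below ~{max_loss_rate(gbps * 1e9):.1e}")
    # ~1e-6, ~1e-8 and ~1e-10 respectively: why packet loss must be ~zero,
    # and why single-stream TCP does not scale to the 1-10 Gbps range.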
VRVS: 11900 Hosts; 6620 Registered Users in 61 Countries;
43 (7 on I2) Reflectors; Annual Growth 2 to 3X
Networks, Grids and HENP
 Next generation 10 Gbps network backbones are
almost here: in the US, Europe and Japan
 First stages arriving, starting now
 Major transoceanic links at 2.5 - 10 Gbps in 2002-3
 Network improvements are especially needed in
Southeast Europe, So. America; and some other regions:
 Romania, Brazil; India, Pakistan, China; Africa
Removing regional, last mile bottlenecks and
compromises in network quality is now
all on the critical path
Getting high (reliable; Grid) application performance
across networks means:
 End-to-end monitoring; a coherent approach
 Getting high performance (TCP) toolkits in users’ hands
 Working in concert with AMPATH, Internet E2E, I2
HENP WG, DataTAG; the Grid projects and the GGF