International Networks and the US

Transcript: International Networks and the US

Global Lambdas and Grids for
Particle Physics in the LHC Era
Harvey B. Newman
California Institute of Technology
SC2005
Seattle, November 14-18 2005
Beyond the SM: Great Questions of Particle Physics and Cosmology
1. Where does the pattern of particle families and masses come from?
2. Where are the Higgs particles; what is the mysterious Higgs field?
3. Why do neutrinos and quarks oscillate?
4. Is Nature supersymmetric?
5. Why is any matter left in the universe?
6. Why is gravity so weak?
7. Are there extra space-time dimensions?
[Pie chart: composition of the universe; "You Are Here."]
Other elements: 0.03%; Neutrinos: 0.3%; Stars: 0.5%; Free H and He: 4%; Dark matter: 23%; Dark energy: 72%
We do not know what makes up 95% of the universe.
Large Hadron Collider
CERN, Geneva: 2007 Start
 pp s =14 TeV L=1034 cm-2 s-1
 27 km Tunnel in Switzerland & France
CMS
TOTEM
Atlas
pp, general
purpose; HI
5000+ Physicists
250+ Institutes
60+ Countries
ALICE : HI
LHCb: B-physics
Higgs,
SUSY,Analyze
Extra Dimensions,
Violation,
QG
Plasma, …
Challenges:
petabytes ofCP
complex
data
cooperatively
Harnessthe
global
computing, data & network resources
Unexpected
LHC Data Grid Hierarchy
CERN/Outside Resource Ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1

[Diagram: tiered data flow]
- Online System (Experiment) → CERN Center (Tier 0 +1; PBs of Disk; Tape Robot): ~PByte/sec at the detector; ~150-1500 MBytes/sec into the CERN Center
- Tier 0 +1 → Tier 1 centers (IN2P3, INFN, RAL, FNAL): 10 - 40 Gbps
- Tier 1 → Tier 2 centers: ~10 Gbps
- Tier 2 → Tier 3 (institutes, with physics data caches): ~1-10 Gbps
- Tier 3 → Tier 4 (workstations): 1 to 10 Gbps

Tens of Petabytes by 2007-8. An Exabyte ~5-7 Years later.
Emerging Vision: A Richly Structured, Global Dynamic System
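To put the tier-to-tier link speeds above in perspective, here is a minimal back-of-the-envelope sketch in Python; the 10-40 Gbps range comes from the diagram, while the 100 TB dataset size and the 80% usable-link assumption are illustrative only.

```python
# Rough time to replicate a dataset from Tier 0 to a Tier 1 center at the
# link speeds quoted in the hierarchy diagram. The dataset size and the
# assumed usable fraction of the link are hypothetical, not from the talk.

def transfer_hours(dataset_tb, link_gbps, efficiency=0.8):
    """Hours to move dataset_tb terabytes over a link_gbps link."""
    bits = dataset_tb * 1e12 * 8                 # TB -> bits (decimal units)
    return bits / (link_gbps * 1e9 * efficiency) / 3600.0

for gbps in (10, 40):                            # Tier 0+1 -> Tier 1 range
    print(f"100 TB over {gbps} Gbps: ~{transfer_hours(100, gbps):.1f} hours")
# -> ~27.8 hours at 10 Gbps, ~6.9 hours at 40 Gbps
```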
Long Term Trends in Network Traffic Volumes: 300-1000X/10Yrs

[Charts: ESnet Monthly Accepted Traffic through May 2005, 1990 - 2005, in TByte/Month (W. Johnston); and SLAC traffic in Terabytes per month, growing in steps toward the 10 Gbit/s level (R. Cottrell)]

- ESnet Accepted Traffic 1990 - 2005: Exponential Growth: +82%/Year for the Last 15 Years; 400X Per Decade
- Traffic Growth in Steps: ~10X/4 Years; Projected: ~2 Terabits/s by ~2014
- "Summer" '05: 2x10 Gbps links: one for production, one for R&D
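A quick arithmetic check of the growth figures quoted above, as a minimal Python sketch; the only inputs are the +82%/year and ~10X/4-years rates from this slide.

```python
# Consistency check of the growth rates quoted on this slide.
annual = 1.82                      # +82% per year (ESnet, last 15 years)
print(f"+82%/yr over a decade -> ~{annual ** 10:.0f}X")   # ~399X, matching "400X Per Decade"

step = 10 ** (10 / 4)              # ~10X every 4 years (traffic growth in steps)
print(f"~10X/4yr over a decade -> ~{step:.0f}X")          # ~316X, within the 300-1000X/10Yrs band
```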
Internet2 Land Speed Record (LSR)

- NB: Manufacturers' Roadmaps for 2006: One Server Pair to One 10G Link
- Nov. 2004 Record Network Throughput: 7.21 Gbps over 20675 km (7.2G X 20.7 kkm), single IPv4 TCP stream
- IPv6 record: 5.11 Gbps between Geneva and Starlight: Jan. 2005
- IPv4 Multi-stream record with FAST TCP: 6.86 Gbps X 27 kkm: Nov 2004
- Disk-to-disk Marks: 536 Mbytes/sec (Windows); 500 Mbytes/sec (Linux)
- End System Issues: PCI-X Bus, Linux Kernel, NIC Drivers, CPU

[Chart: Internet2 LSRs for a single IPv4 TCP stream, Feb 2003 - Nov 2004; throughput in Petabit-m/sec (0-160); Blue = HEP. Milestone marks at 0.4, 0.9, 2.5, 4.2, 5.4, 5.6, 6.6 and 7.21 Gbps over distances between about 7067 and 20675 km.]
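The LSR is scored as a bandwidth-distance product; here is a minimal check of the Nov. 2004 single-stream mark, using only the 7.21 Gbps and 20675 km figures quoted above.

```python
# Internet2 LSR metric: throughput times path length, in petabit-meters/sec.
throughput_bps = 7.21e9            # Nov. 2004 single-stream IPv4 record
distance_m = 20675e3               # 20675 km path
print(f"{throughput_bps * distance_m / 1e15:.0f} petabit-m/sec")   # ~149 Pb-m/s ("7.2G X 20.7 kkm")
```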
HENP Bandwidth Roadmap
for Major Links (in Gbps)
Year | Production | Experimental | Remarks
2001 | 0.155 | 0.622-2.5 | SONET/SDH
2002 | 0.622 | 2.5 | SONET/SDH; DWDM; GigE Integ.
2003 | 2.5 | 10 | DWDM; 1 + 10 GigE Integration
2005 | 10 | 2-4 X 10 | λ Switch; λ Provisioning
2007 | 2-4 X 10 | ~10 X 10; 40 Gbps | 1st Gen. λ Grids
2009 | ~10 X 10 or 1-2 X 40 | ~5 X 40 or ~20-50 X 10 | 40 Gbps λ Switching
2011 | ~5 X 40 or ~20 X 10 | ~25 X 40 or ~100 X 10 | 2nd Gen. λ Grids; Terabit Networks
2013 | ~Terabit | ~MultiTbps | ~Fill One Fiber

Continuing Trend: ~1000 Times Bandwidth Growth Per Decade;
HEP: Co-Developer as well as Application Driver of Global Nets
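A rough check of the ~1000X-per-decade trend against the table's production column, reading the 2011 "~20 X 10" option as ~200 Gbps (an interpretation for the purpose of the check, not a figure stated in the table itself).

```python
# Production bandwidth growth implied by the roadmap: 0.155 Gbps (2001) vs ~200 Gbps (2011).
print(f"~{200 / 0.155:.0f}X per decade")   # ~1290X, the same order as "~1000 Times ... Per Decade"
```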
LHCNet, ESnet Plan 2006-2009:
20-80 Gbps US-CERN, ESnet MANs, IRNC

[Map: planned ESnet/LHCNet topology. ESnet IP core (≥10 Gbps) with hubs at SEA, SNV, DEN, ALB, ELP, SDG, CHI, NYC, DC and ATL, plus new hubs, Metro Rings and a 2nd core at 30-50G; IRNC links to AsiaPac, Europe, Australia and Japan; connections to GEANT2, SURFNet and IN2P3; major DOE Office of Science sites including BNL and FNAL; CERN reached via the LHCNet Data Network.]

LHCNet US-CERN Wavelength Triangle:
- 10/05: 10G CHI + 10G NY
- 2007: 20G + 20G
- 2009: ~40G + 40G

Legend: ESnet hubs; New ESnet hubs; Metropolitan Area Rings; Major DOE Office of Science Sites; High-speed cross connects with Internet2/Abilene; Production IP ESnet core, 10 Gbps enterprise IP traffic; Science Data Network core, 40-60 Gbps circuit transport; Lab supplied; Major international; LHCNet Data Network (2 to 8 x 10 Gbps US-CERN); NSF/IRNC circuit; GVA-AMS connection via SURFnet or GEANT2.

ESnet MANs to FNAL & BNL; Dark fiber (60 Gbps) to FNAL.
Global Lambdas for Particle Physics
Caltech/CACR and FNAL/SLAC Booths
- Preview global-scale data analysis of the LHC Era (2007-2020+), using next-generation networks and intelligent grid systems
- Using state-of-the-art WAN infrastructure and Grid-based Web service frameworks, based on the LHC Tiered Data Grid Architecture
- Using a realistic mixture of streams: organized transfer of multi-TB event datasets, plus numerous smaller flows of physics data that absorb the remaining capacity
- The analysis software suites are based on the Grid-enabled Analysis Environment (GAE) developed at Caltech and U. Florida, as well as Xrootd from SLAC and dCache from FNAL
- Monitored by Caltech's MonALISA global monitoring and control system
Global Lambdas for Particle Physics
Caltech/CACR and FNAL/SLAC Booths
- We used Twenty Two [*] 10 Gbps waves to carry bidirectional traffic between Fermilab, Caltech, SLAC, BNL, CERN and other partner Grid Service sites including: Michigan, Florida, Manchester, Rio de Janeiro (UERJ) and Sao Paulo (UNESP) in Brazil, Korea (KNU), and Japan (KEK)
- Results:
  - 151 Gbps peak, 100+ Gbps of throughput sustained for hours: 475 Terabytes of physics data transported in < 24 hours
  - 131 Gbps measured by the SCInet BWC team on 17 of our waves
- Using real physics applications and production as well as test systems for data access, transport and analysis: bbcp, xrootd, dcache, and gridftp; and grid analysis tool suites
- Linux kernel for TCP-based protocols, including Caltech's FAST
- Far surpassing our previous SC2004 BWC Record of 101 Gbps
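A minimal sketch relating the sustained rate to the daily volume; only the 100 Gbps and 475 TB figures from this slide are used.

```python
# How the sustained rate and the daily volume on this slide relate.
sustained_gbps = 100
tb_per_day = sustained_gbps * 1e9 * 86400 / 8 / 1e12
print(f"{sustained_gbps} Gbps sustained ~ {tb_per_day:.0f} TB/day (~1 PB/day)")

avg_gbps = 475e12 * 8 / 86400 / 1e9
print(f"475 TB in 24 h ~ {avg_gbps:.0f} Gbps averaged over the whole day")   # ~44 Gbps
```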
Monitoring NLR, Abilene/HOPI, LHCNet, USNet, TeraGrid, PWave, SCInet, Gloriad, JGN2, WHREN, other Int'l R&E Nets, and 14000+ Grid Nodes Simultaneously (I. Legrand)
Switch and Server Interconnections at the Caltech Booth (#428)
- 15 10G Waves
- 72 nodes with 280+ Cores
- 64 10G Switch Ports: 2 Fully Populated Cisco 6509Es
- 45 Neterion 10 GbE NICs
- 200 SATA Disks
- 40 Gbps (20 HBAs) to StorCloud
- Thursday - Sunday Setup
http://monalisa-ul.caltech.edu:8080/stats?page=nodeinfo_sys
Fermilab
- Our BWC data sources are the Production Storage Systems and File Servers used by:
  - CDF
  - DØ
  - US CMS Tier 1
  - Sloan Digital Sky Survey
- Each of these produces, stores and moves multi-TB to PB-scale data: tens of TB per day
- ~600 gridftp servers (of 1000s) directly involved
bbcp ramdisk to ramdisk transfer (CERN to Chicago)
(3 TBytes of Physics Data transferred in 2 Hours)
16MB window, 2 streams
[Chart: transfer rate vs. time in units of 5 seconds; throughput holds between roughly 370,000 and 440,000 kBytes/sec]
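A minimal consistency check of the bbcp figures above; the 3 TBytes, 2 hours and ~400,000 kBytes/sec values all come from this slide.

```python
# Does "3 TBytes in 2 hours" match the ~400,000 kBytes/sec level on the chart?
tbytes, hours = 3, 2
kbytes_per_sec = tbytes * 1e9 / (hours * 3600)       # TB -> kBytes, over the elapsed seconds
print(f"~{kbytes_per_sec:,.0f} kBytes/sec average")  # ~416,667 kBytes/sec, consistent with the chart
```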
Xrootd Server Performance (A. Hanushevsky)

[Chart: Single Server Linear Scaling; network I/O in MB/sec, percent CPU remaining, and events/sec processed vs. number of concurrent jobs (50-400). Excellent across WANs.]

- Scientific Results: Ad hoc analysis of multi-TByte archives; immediate exploration; spurs novel discovery approaches
- Linear Scaling: Hardware performance; deterministic sizing
- High Capacity: Thousands of clients; hundreds of parallel streams
- Very Low Latency: 12 us + transfer cost; device + NIC limited
Xrootd Clustering

[Diagram: a client sends "open file X" to the Redirector (Head Node) and is told "go to C"; C is a Supervisor (sub-redirector) inside the cluster, which answers the repeated "open file X" with "go to F"; data server F returns the result. Data servers A-F form the cluster, and the client sees all servers as xrootd data servers. A minimal sketch of this flow follows below.]

- Unbounded clustering; self organizing
- Total fault tolerance; automatic realtime reorganization
- Minimum admin overhead
- Better client CPU utilization
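As referenced in the diagram placeholder, here is a minimal Python sketch of the two-level redirect flow; the class and method names are illustrative only (the real xrootd protocol is binary, and the client itself re-issues the open at each redirect rather than the redirector calling through).

```python
# Illustrative sketch of the two-level redirect in an xrootd-style cluster.
# Names mirror the diagram (Redirector, Supervisor, data servers A-F), not real xrootd code.

class DataServer:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)
    def open(self, path):
        return f"{self.name}: serving {path}" if path in self.files else None

class Supervisor:
    """Sub-redirector: knows which of its data servers hold which files ("go to F")."""
    def __init__(self, name, servers):
        self.name, self.servers = name, servers
    def locate(self, path):
        return next((s for s in self.servers if path in s.files), None)

class Redirector:
    """Head node: asks its supervisors, then points the client onward ("go to C")."""
    def __init__(self, supervisors):
        self.supervisors = supervisors
    def open(self, path):
        for sup in self.supervisors:
            server = sup.locate(path)
            if server:
                return server.open(path)   # collapsed: the real client contacts the server itself
        raise FileNotFoundError(path)

cluster = Redirector([Supervisor("C", [DataServer("D", []), DataServer("E", []),
                                       DataServer("F", ["/store/fileX"])])])
print(cluster.open("/store/fileX"))        # -> "F: serving /store/fileX"
```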
More results at remote sites: Caltech, UFL, Brazil, ...

[Diagram: ROOT Analysis clients at the remote sites, each connected through GAE Services]

- Authenticated users automatically discover, and initiate multiple transfers of, physics datasets (ROOT files) through secure Clarens-based GAE services.
- Transfers are monitored through MonALISA.
- Once data arrives at the target (remote) sites, analysis can start by authenticated users, using the ROOT analysis framework.
- Using the Clarens ROOT viewer or the COJAC event viewer, data from remote sites can be presented transparently to the user.
SC|05 Abilene and HOPI Waves
GLORIAD: 10 Gbps Optical Ring Around the Globe by March 2007
China, Russia, Korea, Japan, US, Netherlands Partnership; US: NSF IRNC Program

GLORIAD Circuits Today:
- 10 Gbps Hong Kong-Daejeon-Seattle
- 10 Gbps Seattle-Chicago-NYC (CANARIE contribution to GLORIAD)
- 622 Mbps Moscow-AMS-NYC
- 2.5 Gbps Moscow-AMS
- 155 Mbps Beijing-Khabarovsk-Moscow
- 2.5 Gbps Beijing-Hong Kong
- 1 GbE NYC-Chicago (CANARIE)
ESLEA/UKLight SC|05 Network Diagram
[Diagram: links include 6 X 1 GE and OC-192]
KNU (Korea) Main Goals
- Uses the 10 Gbps GLORIAD link from Korea to the US, called BIG-GLORIAD, also part of UltraLight
- Try to saturate this BIG-GLORIAD link with servers and cluster storage connected at 10 Gbps
- Korea is planning to be a Tier-1 site for the LHC experiments

[Diagram: Korea - BIG-GLORIAD - U.S.]
KEK (Japan) at SC05
10GE Switches on the KEK-JGN2-StarLight Path

JGN2: 10G Network Research Testbed
- Operational since 4/04
- 10 Gbps L2 between Tsukuba and Tokyo Otemachi
- 10 Gbps IP to StarLight since August 2004
- 10 Gbps L2 to StarLight since September 2005

The Otemachi-Chicago OC-192 link was replaced by 10GE WAN PHY in September 2005.
Brazil HEPGrid:
Rio de Janeiro (UERJ)
and Sao Paulo (UNESP)
“Global Lambdas for Particle Physics”
A Worldwide Network & Grid Experiment
- We have Previewed the IT Challenges of Next Generation Science at the High Energy Frontier (for the LHC and other major programs)
  - Petabyte-scale datasets
  - Tens of national and transoceanic links at 10 Gbps (and up)
  - 100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data
- We set the scale and learned to gauge the difficulty of the global networks and transport systems required for the LHC mission
  - But we set up, shook down and successfully ran the system in < 1 week
- We have substantive take-aways from this marathon exercise:
  - An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport, after 7 full kernel-build cycles in 4 days
  - A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
  - Extension of Xrootd, an optimized low-latency file access application for clusters, across the wide area
  - Understanding of the limits of 10 Gbps-capable systems under stress
“Global Lambdas for Particle Physics”
A Worldwide Network & Grid Experiment
- We are grateful to our many network partners: SCInet, LHCNet, Starlight, NLR, Internet2's Abilene and HOPI, ESnet, UltraScience Net, MiLR, FLR, CENIC, Pacific Wave, UKLight, TeraGrid, Gloriad, AMPATH, RNP, ANSP, CANARIE and JGN2
- And to our partner projects: US CMS, US ATLAS, D0, CDF, BaBar, US LHCNet, UltraLight, LambdaStation, Terapaths, PPDG, GriPhyN/iVDGL, LHCNet, StorCloud, SLAC IEPM, ICFA/SCIC and Open Science Grid
- Our Supporting Agencies: DOE and NSF
- And for the generosity of our vendor supporters, especially Cisco Systems, Neterion, HP, IBM, and many others, who have made this possible
- And the Hudson Bay Fan Company…
Extra Slides Follow
Global Lambdas for Particle Physics Analysis
SC|05 Bandwidth Challenge Entry
Caltech, CERN, Fermilab, Florida,
Manchester, Michigan, SLAC, Vanderbilt,
Brazil, Korea, Japan, et al
CERN's Large Hadron Collider experiments:
Data/Compute/Network Intensive
Discovering the Higgs, SuperSymmetry, or
Extra Space-Dimensions - with a Global Grid
Worldwide Collaborations of Physicists
Working Together; while
Developing Next-generation Global Network
and Grid Systems
[Diagram: Clarens architecture. An Analysis Sandbox, 3rd-party applications, Catalog, Storage (datasets) and other Services sit behind the Clarens layer (ACL, X509, Discovery) on a Web server; a Clarens Client reaches it across the network over http/https using XML-RPC, SOAP, Java RMI or JSON-RPC.]
Clarens services (a minimal call sketch follows this list):
- Authentication
- Access control on Web Services
- Remote file access (and access control on files)
- Discovery of Web Services and Software
- Shell service: shell-like access to remote machines (managed by access control lists)
- Proxy certificate functionality
- Virtual Organization management and role management
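As referenced above, here is a minimal sketch of what calling a Clarens-style service over HTTPS could look like from Python. The endpoint URL and the commented-out file-access method name are hypothetical placeholders, not the actual Clarens API, and the X509/proxy-certificate authentication described above is omitted for brevity.

```python
# Minimal sketch of an XML-RPC call to a Clarens-style web service over HTTPS.
# The URL and the commented-out method name are hypothetical, not the actual Clarens API.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("https://clarens.example.org:8443/clarens/")

# Standard XML-RPC introspection (available if the server implements it):
print(server.system.listMethods())

# A hypothetical access-controlled remote file listing might then look like:
# print(server.file.ls("/store/datasets"))
```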
User's point of access to a Grid system. Provides an environment where the user can:
- Access Grid resources and services
- Execute and monitor Grid applications
- Collaborate with other users
A one-stop shop for Grid needs.

[Diagram: start a (remote) analysis; select a dataset]

Portals can lower the barrier for users to access Web Services and to use Grid-enabled applications.