Transcript 20080122-barczyk

Dynamic Circuit Services in US LHCNet
Artur Barczyk, Caltech
Joint Techs Workshop, Honolulu, 01/23/2008
US LHCNet Overview
Mission-oriented network: provide the trans-Atlantic network infrastructure to support the US LHC program.
Four PoPs:
 CERN
 Starlight (→ Fermilab)
 Manlan (→ Brookhaven)
 SARA
2008: 30 (40) Gbps trans-Atlantic bandwidth (roadmap: 80 Gbps by 2010)
Large Hadron Collider @ CERN
Start in 2008
 pp collisions at √s = 14 TeV, L = 10³⁴ cm⁻² s⁻¹
 27 km tunnel in Switzerland & France
6000+ physicists & engineers, 250+ institutes, 60+ countries
Experiments: ATLAS, LHCb, ALICE, CMS
Physics: Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
Challenges:
 Analyze petabytes of complex data cooperatively
 Harness the global computing, data & network resources
The LHC Data Grid Hierarchy
CERN/Outside ratio ~1:4; T0/(T1)/(T2) ~1:2:2
~40% of resources in Tier2s
US T1s and T2s connect to US LHCNet PoPs
[Hierarchy diagram: CERN Online system (Tier0) linked to Tier1s at 10–40 Gbps, e.g. Germany T1 via GEANT2+NRENs and BNL T1 via US LHCNet + ESnet; Tier2s at 10 Gbps]
Outside/CERN ratio larger; expanded role of Tier1s & Tier2s: greater reliance on networks
Emerging vision: a richly structured, global dynamic system
The Roles of Tier Centers
11 Tier1s, over 100 Tier2s
→ LHC Computing will be more dynamic & network-oriented
The division of roles across the hierarchy defines the dynamism of the data transfers, and with it the requirements for Dynamic Circuit Services in US LHCNet:
 Tier 0 (CERN): prompt calibration and alignment; reconstruction; store the complete set of RAW data
 Tier 1: reprocessing; store part of the processed data
 Tier 2: Monte Carlo production; physics analysis
 Tier 3: physics analysis
CMS Data Transfer Volume (May – Aug. 2007):
10 PetaBytes transferred over 4 months = 8.0 Gbps average (15 Gbps peak)
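As a rough consistency check on the quoted average (assuming the full May–August window of ~123 days): 10 PB ≈ 8 × 10^16 bits and 123 days ≈ 1.06 × 10^7 s, giving 8 × 10^16 / 1.06 × 10^7 ≈ 7.5 Gbps, consistent with the 8.0 Gbps figure (the exact value depends on the precise measurement window).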
End-system capabilities growing
40 G in + 40 G out: 88 Gbps peak; 80+ Gbps sustainable for hours, storage-to-storage
Managed Data Transfers
 The scale of the problem and the capabilities of the end-systems require a
managed approach with scheduled data transfer requests
 The dynamism of the data transfers defines the requirements for
scheduling
 Tier0 → Tier1, linked to duty cycle of the LHC
 Tier1 → Tier1, whenever data sets are reprocessed
 Tier1 → Tier2, distribute data sets for analysis
 Tier2 → Tier1, distribute MC produced data
 Transfer Classes
 Fixed allocation
 Preemptible transfers
 Best effort
All of this will happen “on demand” from the experiments’ Data Management systems
 Priorities
 Preemption
 Use LCAS to squeeze low(er)-priority circuits (a sketch of this squeeze logic follows this slide)
 Interact with End-Systems
 Verify and monitor capabilities
Needs to work end-to-end:
collaboration in GLIF,
DICE
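The interplay of transfer classes, priorities, preemption and LCAS-style squeezing can be illustrated with a minimal sketch. This is not the VINCI implementation; the class names, rates and the admit() helper below are illustrative assumptions only.

```python
# Hypothetical sketch: admit a scheduled transfer on a link of fixed capacity
# by squeezing lower-priority, non-fixed circuits, in the spirit of
# LCAS-style in-service bandwidth adjustment.
from dataclasses import dataclass
from enum import Enum

class TransferClass(Enum):
    FIXED = 1        # fixed allocation, never squeezed
    PREEMPTIBLE = 2  # may be squeezed or preempted
    BEST_EFFORT = 3  # uses whatever capacity is left

@dataclass
class Circuit:
    name: str
    klass: TransferClass
    priority: int               # higher number = higher priority
    rate_gbps: float            # currently allocated rate
    min_rate_gbps: float = 1.0  # floor when squeezed

def admit(circuits, request_gbps, link_capacity_gbps):
    """Try to free enough capacity for a new request by squeezing
    non-fixed circuits, lowest priority first, down to their floor."""
    free = link_capacity_gbps - sum(c.rate_gbps for c in circuits)
    squeezable = sorted((c for c in circuits if c.klass != TransferClass.FIXED),
                        key=lambda c: c.priority)
    for c in squeezable:
        if free >= request_gbps:
            break
        reclaim = min(c.rate_gbps - c.min_rate_gbps, request_gbps - free)
        c.rate_gbps -= reclaim   # LCAS-style resize of a live circuit
        free += reclaim
    return free >= request_gbps

# Example: a 10 Gbps link with a fixed 4 Gbps circuit and a 5 Gbps
# best-effort circuit; a new 3 Gbps request squeezes the best-effort one.
link = [Circuit("T0->T1 raw", TransferClass.FIXED, 10, 4.0),
        Circuit("T1->T2 data sets", TransferClass.BEST_EFFORT, 1, 5.0)]
print(admit(link, request_gbps=3.0, link_capacity_gbps=10.0))  # True
```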
Managed Network Services: Operations Scenario
 Receive request, check capabilities, schedule network resources
 “Transfer N Gigabytes from A to B with target throughput R1”
 Authenticate/authorize/prioritize
 Verify end-host rate capabilities R2 (achievable rate)
 Schedule bandwidth B > R2; estimate time to complete T(0)
 Schedule path with priorities P(i) on segment S(i)
 Check progress periodically
 Compare the achieved rate R(t) to R2; update the time-to-complete estimate from T(i-1) to T(i) (a sketch of this check follows this slide)
 Trigger on behaviours requiring further action
 Error (e.g. segment failure)
 Performance issues (e.g. poor progress, channel underutilized, long
waits)
 State change (e.g. new high priority transfer submitted)
 Respond dynamically, to match policies and optimize throughput
 Change channel size(s)
 Build alternative path(s)
 Create new channel(s) and squeeze others in class
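A rough sketch of the periodic progress check described above: compare the achieved rate R(t) to the verified end-host capability R2, refresh the time-to-complete estimate, and flag conditions that call for a dynamic response. The function names, polling interval and the 50% underuse threshold are illustrative assumptions, not the actual VINCI services.

```python
import time

def time_to_complete(bytes_remaining, rate_bps):
    """T(i): remaining volume divided by the current rate."""
    return float("inf") if rate_bps <= 0 else 8 * bytes_remaining / rate_bps

def monitor_transfer(total_bytes, r2_bps, get_transferred_bytes,
                     notify, poll_s=60, underuse=0.5):
    """get_transferred_bytes() returns cumulative bytes moved so far;
    notify(event, detail) is invoked for conditions needing further action."""
    prev_bytes, prev_time = 0, time.time()
    estimate = time_to_complete(total_bytes, r2_bps)          # T(0) at admission
    while prev_bytes < total_bytes:
        time.sleep(poll_s)
        now, done = time.time(), get_transferred_bytes()
        rate = 8 * (done - prev_bytes) / (now - prev_time)    # R(t)
        estimate = time_to_complete(total_bytes - done, rate) # T(i)
        if rate < underuse * r2_bps:
            # Poor progress or underutilized channel: candidate for resizing
            # the channel or building an alternative path.
            notify("performance", {"rate_bps": rate, "eta_s": estimate})
        prev_bytes, prev_time = done, now
    notify("complete", {"bytes": prev_bytes})
```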
Managed Network Services: End-System Integration
Required for a robust end-to-end production system
 Integration of network services and end-systems
 Requires end-to-end view of the network and end-systems, real-time
monitoring
 Robust, real-time and scalable messaging infrastructure
 Information extraction and correlation
 e.g. network state, end-host state, transfer queue state
 Obtain via network services ↔ end-host agent (EHA) interactions
 Provide sufficient information for decision support
 Cooperation of EHAs and network services
 Automate some operational decisions using accumulated experience
 Increase level of automation to respond to: increases in usage,
number of users, and competition for scarce network resources
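A minimal sketch of the kind of end-host and transfer-queue state an end-host agent (EHA) could publish for correlation with network state. The real system is built on the MonALISA framework, so the transport, endpoint and field names below are purely illustrative assumptions.

```python
import json
import socket
import time

def collect_host_state():
    """Gather a small snapshot of end-host capabilities and the local transfer queue."""
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        # Illustrative metrics an EHA could expose for decision support:
        "nic_speed_gbps": 10,
        "disk_read_gbps": 8,              # achievable storage read rate
        "transfer_queue": [               # transfer requests known locally
            {"id": "req-001", "dest": "T1_US_FNAL", "size_tb": 20},
        ],
    }

def publish(report, endpoint=("collector.example.org", 9000)):
    """Send the report to a (hypothetical) monitoring collector over UDP."""
    payload = json.dumps(report).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, endpoint)

if __name__ == "__main__":
    publish(collect_host_state())
```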
Lightpaths in the US LHCNet domain
Dynamic setup and reservation of lightpaths has been successfully demonstrated by the VINCI project (Virtual Intelligent Networks for Computing Infrastructures in Physics) controlling the optical switches.
[Diagram: VINCI control plane managing the lightpath data plane]
Planned Interfaces
 Most, if not all, LHC data transfers will cross more than one domain
 E.g. in order to transfer data from CERN to Fermilab:
 CERN → US LHCNet → ESnet → Fermilab
 VINCI Control Plane for intra-domain,
 DCN (DICE/GLIF) IDC for inter-domain provisioning
[Diagram of planned interfaces. I-NNI: VINCI (custom) protocols; UNI: DCN IDC? LambdaStation? TeraPaths?; E-NNI and UNI: Web Services (DCN IDC); UNI: VINCI custom protocol, client = EHA]
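A sketch of how an end-to-end request could be decomposed along the domain chain of the CERN → Fermilab example above. The data structure and the provision_segment callback are hypothetical stand-ins for the intra-domain (VINCI) and inter-domain (DCN IDC) provisioning calls.

```python
# Ordered domain chain for the example transfer path.
DOMAIN_CHAIN = ["CERN", "USLHCNet", "ESnet", "Fermilab"]

def provision_end_to_end(src, dst, bandwidth_gbps, provision_segment):
    """Walk the domain chain from src to dst and ask each adjacent pair of
    domains, via provision_segment(), to set up its piece of the circuit."""
    i, j = DOMAIN_CHAIN.index(src), DOMAIN_CHAIN.index(dst)
    segments = list(zip(DOMAIN_CHAIN[i:j], DOMAIN_CHAIN[i + 1:j + 1]))
    return [provision_segment(a, b, bandwidth_gbps) for a, b in segments]

# Example with a stub provisioning call:
print(provision_end_to_end("CERN", "Fermilab", 10,
                           lambda a, b, bw: f"{a}->{b} @ {bw} Gbps"))
# ['CERN->USLHCNet @ 10 Gbps', 'USLHCNet->ESnet @ 10 Gbps', 'ESnet->Fermilab @ 10 Gbps']
```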
Protection Schemes
 Mesh-protection at Layer 1
 US LHCNet links are assigned to
primary users
 CERN – Starlight for CMS
 CERN – Manlan for ATLAS
 In case of a link failure, we cannot blindly use bandwidth belonging to the other collaboration
 Carefully choose protection links, e.g. use the indirect path (CERN – SARA – Manlan)
 Designated Transit Lists (DTLs) and DTL-Sets
 High-level protection features
implemented in VINCI
 Re-provision lower priority circuits
 Preemption, LCAS
Needs to work end-to-end:
collaboration in GLIF,
DICE
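A simplified sketch of the protection-path choice described on this slide: on a link failure, prefer a designated transit list (DTL) that avoids links whose primary user is the other collaboration, e.g. the indirect CERN – SARA – Manlan path. The topology, ownership table and function are illustrative, not the VINCI implementation.

```python
# Primary-user assignment of US LHCNet links (illustrative).
PRIMARY_USER = {
    ("CERN", "Starlight"): "CMS",
    ("CERN", "Manlan"): "ATLAS",
    ("CERN", "SARA"): None,       # not assigned to a single collaboration
    ("SARA", "Manlan"): None,
}

# Candidate DTLs (a DTL-Set) for the CERN-Manlan connection.
DTL_SET = [
    ["CERN", "Manlan"],           # direct path (primary for ATLAS)
    ["CERN", "SARA", "Manlan"],   # indirect protection path
]

def protection_path(collaboration, failed_link):
    """Pick the first DTL that avoids the failed link and does not use any
    link whose primary user is a different collaboration."""
    for dtl in DTL_SET:
        links = list(zip(dtl, dtl[1:]))
        if failed_link in links:
            continue
        owners = {PRIMARY_USER.get(link) for link in links}
        if owners <= {collaboration, None}:
            return dtl
    return None

# ATLAS loses the direct CERN-Manlan link: fall back to CERN-SARA-Manlan.
print(protection_path("ATLAS", ("CERN", "Manlan")))  # ['CERN', 'SARA', 'Manlan']
```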
Basic Functionality To-Date
 Semi-automatic intra-domain circuit provisioning
 Bandwidth adjustment (LCAS)
 End-host tuning by the End-Host Agent
 End-to-end monitoring
Pre-production (R&D) setup:
 Local domain: routing of private IP subnets onto tagged VLANs (sketched after the diagram)
 Core network (TDM): VLAN-based virtual circuits
[Diagram: high-performance servers connected through US LHCNet and Ultralight routers to the Ciena CoreDirectors]
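A minimal sketch of the local-domain mapping used in the pre-production setup (private IP subnets routed onto tagged VLANs that the TDM core carries as virtual circuits). The subnets and VLAN IDs are made-up examples, not the actual configuration.

```python
import ipaddress

# Illustrative mapping: destination subnet -> VLAN tag of the circuit.
VLAN_MAP = {
    ipaddress.ip_network("10.1.0.0/24"): 3001,  # e.g. circuit towards Starlight
    ipaddress.ip_network("10.2.0.0/24"): 3002,  # e.g. circuit towards Manlan
}

def vlan_for(destination_ip):
    """Return the VLAN tag whose subnet contains the destination address."""
    addr = ipaddress.ip_address(destination_ip)
    for subnet, vlan_id in VLAN_MAP.items():
        if addr in subnet:
            return vlan_id
    return None  # no dedicated circuit: fall back to routed IP service

print(vlan_for("10.2.0.17"))  # -> 3002
```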
MonALISA: Monitoring the US LHCNet Ciena CDCI Network
[MonALISA map showing the US LHCNet PoPs: CERN (Geneva), SARA, Starlight, Manlan]
Roadmap Ahead
 The current capabilities include
 End-to-End monitoring
 Intra-domain circuit provisioning
 End-host tuning by the End-Host Agent
 Towards a production system (intra-domain)
 Integrate existing end-host agent, monitoring and measurement services
 Provide a uniform user/application interface
 Integration with the experiments’ Data Management Systems
 Automated fault handling
 Priority-based transfer scheduling
 Include Authorisation, Authentication and Accounting
 Towards a production system (inter-domain)
 Interface to DCN IDC
 Work with DICE, GLIF on IDC protocol specification
 Topology exchange, routing, end-to-end path calculation
 Extend AAA infrastructure to multi-domain
Summary and Conclusions
 Movement of LHC data will be highly dynamic
 Follows the LHC data grid hierarchy
 Different data sets (size, transfer speed and duration), different priorities
 Data Management requires network-awareness
 Guaranteed bandwidth end-to-end (storage-system to storage-system)
 End-to-end monitoring including end-systems
 We are developing the intra-domain control plane for US LHCNet
 VINCI project, based on the MonALISA framework
 Many services and agents are already developed or in an advanced state
 Use Internet2’s IDC protocol for inter-domain provisioning
 Collaboration with Internet2, ESnet, LambdaStation, TeraPaths on end-to-end circuit provisioning