20050503-HENP-McKee


Richard Cavanaugh
University of Florida
HENP SIG
Spring Internet2 Meeting
03.05.2005
R. Cavanaugh, HENP SIG, Spring
Internet2 Meeting
1
Outline
• The Project
• Science Drivers
• UltraLight Network
• Grid-enabled Analysis
• Summary
The Project
• UltraLight is
– A four-year, $2M NSF ITR (Information Technology Research) project funded by MPS
– Application-driven Network R&D
• Two Primary, Synergistic Activities
– Network “Backbone”: Perform network R&D / engineering
– Applications “Driver”: System Services R&D / engineering
• Ultimate goal: Enable physics analysis and discoveries that could not otherwise be achieved
NSF/PHY-EPP: Caltech, UF, UM, FIU, FNAL, SLAC et al. Partnership
• A New Class of Integrated Information Systems
– Includes the Network as an Actively Managed Resource
– Based on a “Hybrid” packet-switched and circuit-switched optical network infrastructure
• Ultrascale Protocols (e.g. FAST) and Dynamic Optical Paths
– Monitor, Manage and Optimize the Use of the Network and Grid Systems in real time
• Using a set of Agent-Based Intelligent Global Services
– Built on Top of an already-existing, developing software infrastructure in round-the-clock operation:
• MonALISA, GEMS, GAE/Clarens
• Exceptional Support from Cisco Advanced Research and Technology Infrastructure (ARTI) Group & Calient
• Exceptional NLR, CENIC, Internet2/Abilene, ESnet Support
UltraLight Activities
TEAMS: Physicists, Computer Scientists, Network Engineers
• High Energy Physics Application Services
– Integrate and Develop physics applications into the UltraLight Fabric:
Production Codes, Grid-enabled analysis, User Interfaces to Fabric
• E-VLBI Application Services
– Integrate and Develop eVLBI applications into the UltraLight Fabric:
Specific Protocols and Bandwidth Management Techniques
• Global System Services
– Critical “Upperware” software components in the UltraLight Fabric:
Monitoring, Scheduling, Agent-based Services, etc.
• Network Engineering
– Routing, Switching, Dynamic Path Construction Ops., Management
• Testbed Deployment and Operations
– Including Optical Network, Compute Cluster, Storage, Kernel and
UltraLight System Software Configs.
• Education and Outreach
Evolving Quantitative Science Requirements for Networks (DOE High Perf. Network Workshop)
End2End throughput: today, in 5 years, and in 5-10 years

Science Area                 | Today                       | 5 Years                           | 5-10 Years                          | Remarks
High Energy Physics          | 0.5 Gb/s                    | 100 Gb/s                          | 1000 Gb/s                           | High bulk throughput
Climate (Data & Computation) | 0.5 Gb/s                    | 160-200 Gb/s                      | N x 1000 Gb/s                       | High bulk throughput
SNS NanoScience              | Not yet started             | 1 Gb/s                            | 1000 Gb/s + QoS for Control Channel | Remote control and time critical throughput
Fusion Energy                | 0.066 Gb/s (500 MB/s burst) | 0.198 Gb/s (500 MB/20 sec. burst) | N x 1000 Gb/s                       | Time critical throughput
Astrophysics                 | 0.013 Gb/s (1 TByte/week)   | N*N multicast                     | 1000 Gb/s                           | Computational steering and collaborations
Genomics Data & Computation  | 0.091 Gb/s (1 TByte/day)    | 100s of users                     | 1000 Gb/s + QoS for Control Channel | High throughput and steering
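Two of the Today entries are quoted both as a rate and as the data volume behind it. The rate is the volume times 8 bits per byte, divided by the period. The Python sketch below reproduces that arithmetic. It assumes decimal units (1 TByte = 10^12 bytes, 1 Gb/s = 10^9 bit/s) and a transfer spread evenly over the period. The helper name `sustained_gbps` is hypothetical.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY
TBYTE = 1e12  # decimal terabyte (assumption)


def sustained_gbps(volume_bytes: float, period_s: float) -> float:
    """Average rate in Gb/s needed to move volume_bytes in period_s seconds.

    Assumes decimal units and a transfer spread evenly over the whole period.
    """
    return volume_bytes * 8 / period_s / 1e9


astro = sustained_gbps(TBYTE, SECONDS_PER_WEEK)    # 1 TByte/week
genomics = sustained_gbps(TBYTE, SECONDS_PER_DAY)  # 1 TByte/day

if __name__ == "__main__":
    print(f"Astrophysics: {astro:.3f} Gb/s")
    print(f"Genomics:     {genomics:.3f} Gb/s")
```

Under these assumptions the Astrophysics entry comes out at 0.013 Gb/s, matching the table. The figure of 1 TByte/day gives about 0.093 Gb/s, which is close to the 0.091 Gb/s listed for Genomics.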
History of Bandwidth Usage – One Large Network; One Large Research Site
• ESnet Accepted Traffic 1990 – 2004 (W. Johnston): Exponential Growth Since ’92; Annual Rate Increased from 1.7 to 2.0X Per Year In the Last 5 Years
• SLAC (L. Cottrell): Traffic ~400 Mbps; Growth in Steps (ESnet Limit): ~10X/4 Years
– July 2005: 2 x 10 Gbps links, one for production and one for research
– Projected: ~2 Terabits/s by ~2014
[Figure: ESnet Monthly Accepted Traffic Through Nov, 2004 (TByte/Month, Jan 1994 to Nov 2004); SLAC traffic shown as progress in steps]
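The two growth figures on this slide use different time bases. SLAC is quoted at ~10X per 4 years, and ESnet at 1.7X to 2.0X per year. The Python sketch below puts them on a common annual basis. It assumes smooth compound growth, although the slide notes that SLAC actually grows in steps.

```python
import math


def annual_factor(total_factor: float, years: float) -> float:
    """Per-year multiplier equivalent to total_factor growth over `years` (compound)."""
    return total_factor ** (1 / years)


def doubling_time_years(annual: float) -> float:
    """Years needed to double at a constant per-year multiplier."""
    return math.log(2) / math.log(annual)


if __name__ == "__main__":
    slac = annual_factor(10, 4)  # ~10X over 4 years
    print(f"SLAC: {slac:.2f}X per year, doubling every {doubling_time_years(slac):.1f} years")
    for esnet in (1.7, 2.0):
        print(f"ESnet at {esnet}X/year: doubling every {doubling_time_years(esnet):.1f} years")
```

On this basis SLAC's ~10X per 4 years is roughly 1.78X per year, or a doubling about every 1.2 years. That falls inside the 1.7X to 2.0X annual range quoted for ESnet.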
101 Gigabit Per Second Mark!
[Figure: SC04 Bandwidth Challenge throughput: 101 Gbps peak; 2.95 Gbps to+from Rio de Janeiro; 1.0 Gbps to Seoul; unstable end-systems; end of demo]
Source: SC04 Bandwidth Challenge committee
MonALISA: SC04 Monitoring
Monitoring SCINet, NLR, Abilene, LHCNet, CHEPREO, Int’l NRNs and 9000 Grid Nodes Simultaneously
The UltraLight Network
• UltraLight has a non-standard core network with dynamic links and varying bandwidth interconnecting our nodes.
– Optical Hybrid Global Network
• Core of UltraLight will dynamically evolve with available
resources on other backbones
– such as NLR, HOPI, Abilene or ESnet.
• The main resources for UltraLight:
– LHCnet (IP, L2VPN, CCC)
– Abilene (IP, L2VPN)
– ESnet (IP, L2VPN)
– Cisco NLR wave (Ethernet)
– HOPI NLR waves (Ethernet; provisioned on demand)
– UltraLight nodes: Caltech, SLAC, FNAL, UF, UM,
StarLight, CENIC PoP at LA, CERN
UltraLight Network Engineering
• GOAL: Determine an effective mix of bandwidth-management techniques for the LHC application-space, particularly:
– Best-effort and “scavenger” using “effective” protocols
– MPLS with QoS-enabled packet switching
– Dedicated paths arranged with TL1 commands, GMPLS
• PLAN: Develop and test the most cost-effective combination of network technologies:
– Exercise UltraLight applications on NLR, Abilene and campus networks, as well as LHCNet and our international partners
• Progressively enhance Abilene with QoS support to protect production traffic
• Incorporate emerging NLR and RON-based lightpath and lambda facilities
– Deploy and study ultrascale protocol stacks (such as FAST) addressing issues of performance & fairness
– Use MPLS/QoS and other forms of BW management, and adjustments of optical paths, to optimize end-to-end performance among a set of virtualized disk servers
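The slide does not say how a given transfer would be assigned to one of the three techniques. The sketch below shows, purely as an illustration, what such a mix could look like. It uses a hypothetical rule with four cases:
- Interactive traffic gets MPLS/QoS.
- Background transfers with no deadline go scavenger.
- Deadline-driven bulk transfers needing at least an assumed 1 Gb/s get a dedicated path.
- Everything else gets MPLS/QoS.

The class names, fields and threshold are invented for this example and are not UltraLight's actual policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Technique(Enum):
    SCAVENGER = "best-effort / scavenger"
    MPLS_QOS = "MPLS with QoS-enabled packet switching"
    DEDICATED_PATH = "dedicated path (TL1 / GMPLS)"


@dataclass
class TransferRequest:
    size_gbytes: float
    deadline_hours: Optional[float] = None  # None: no deadline, pure background traffic
    interactive: bool = False


# Hypothetical cut-off: deadline-driven flows needing at least this rate get a dedicated path.
DEDICATED_PATH_MIN_GBPS = 1.0


def required_gbps(req: TransferRequest) -> float:
    """Average rate needed to finish the transfer by its deadline (decimal units)."""
    return req.size_gbytes * 8 / (req.deadline_hours * 3600)


def choose_technique(req: TransferRequest) -> Technique:
    """Map a request to a bandwidth-management technique (illustrative rule only)."""
    if req.interactive:
        return Technique.MPLS_QOS  # latency-sensitive: protect it with QoS
    if req.deadline_hours is None:
        return Technique.SCAVENGER  # no deadline: only use spare capacity
    if required_gbps(req) >= DEDICATED_PATH_MIN_GBPS:
        return Technique.DEDICATED_PATH  # large, time-bound bulk transfer
    return Technique.MPLS_QOS


if __name__ == "__main__":
    for req in (TransferRequest(500),
                TransferRequest(10_000, deadline_hours=10),
                TransferRequest(100, deadline_hours=10),
                TransferRequest(0.01, interactive=True)):
        print(req, "->", choose_technique(req).value)
```

A real deployment would also have to account for path availability and cost, which this toy rule ignores.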
UltraLight Network Infrastructure Elements
• Trans-US 10G λs Riding on NLR, Plus CENIC, FLR, MiLR
– LA – CHI (2 Waves): HOPI and Cisco Research Waves
– CHI – JAX (Florida Lambda Rail)
• Dark Fiber Caltech – L.A.: 2 X 10G Waves (One to WAN In Lab);
10G Wave L.A. to Sunnyvale for UltraScience Net Connection
• Dark Fiber with 10G Wave: StarLight – Fermilab
• Dedicated Wave StarLight – Michigan Light Rail
• SLAC: ESnet MAN to Provide 2 X 10G Links (from July):
One for Production, and One for Research
• Partner with Advanced Research & Production Networks
– LHCNet (Starlight- CERN), Abilene/HOPI, ESnet, NetherLight, GLIF,
UKLight, CA*net4
– Intercont’l extensions: Brazil (CHEPREO/WHREN), GLORIAD, Tokyo,
AARNet, Taiwan, China
UltraLight Testbed
Now to the Driving Application Work…
Grid Analysis Environment
UltraLight Analysis Architecture
[Figure: Analysis Clients (discovery, ACL management, certificate-based access) communicate over HTTP, SOAP or XML-RPC with the Grid Services Web Server. Other components shown: Scheduler, Catalogs, Metadata, Virtual Data, Replica, Fully Abstract Planner, Partially Abstract Planner, Fully Concrete Planner, Data Management, Monitoring, Execution Priority Manager, Grid Wide Execution Service, Applications]
• Clients talk standard protocols to the “Grid Services Web Server”
• A simple Web service API allows simple or complex analysis clients
• Typical clients: ROOT, Web Browser, …
• The Clarens portal hides complexity
• Key features: Global Scheduler, Catalogs, Monitoring, Grid-wide Execution service
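XML-RPC is one of the standard protocols listed above, so any language with an XML-RPC client can act as an analysis client. The sketch below illustrates that interaction with Python's standard `xmlrpc` modules. A local server stands in for the Grid Services Web Server. The method name `catalog.list_datasets` and the dataset names are hypothetical and are not Clarens' actual API.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Made-up catalog contents: dataset name -> number of files.
DATASETS = {"higgs_sim_2005": 120, "minbias_run42": 45}


def list_datasets(pattern):
    """Return the sorted names of datasets whose name contains `pattern`."""
    return sorted(name for name in DATASETS if pattern in name)


def start_server():
    """Start a local XML-RPC server on a free port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_introspection_functions()  # adds system.listMethods etc.
    # Hypothetical dotted method name in a "service.method" style.
    server.register_function(list_datasets, "catalog.list_datasets")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    client = ServerProxy(f"http://{host}:{port}/")
    print(client.system.listMethods())
    print(client.catalog.list_datasets("higgs"))
    server.shutdown()
    server.server_close()
```

The client only needs the server URL and method names. That is the property that lets very different clients, from ROOT to a web browser, share the same backend.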
Peer 2 Peer System
• Allow a “Peer-to-Peer” configuration to be built, with associated robustness and scalability features.
• Discovery of Services
• No Single point of failure
• Robust file download
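One way to avoid a single point of failure in discovery is to let a client ask any of several peers and skip the ones that are down. The sketch below illustrates this in Python. Each peer is modelled as an in-memory callable rather than a network service. The function names, the service name `file.download` and the example.org URL are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# A peer is modelled as a callable mapping a service name to endpoint URLs;
# an unreachable peer raises ConnectionError.
Peer = Callable[[str], List[str]]


def discover(service: str, peers: Sequence[Peer]) -> List[str]:
    """Query peers in turn and return the first non-empty list of endpoints.

    Returns [] if every reachable peer answers but none knows the service;
    raises ConnectionError only if no peer can be reached at all.
    """
    errors = []
    reached_any = False
    for peer in peers:
        try:
            endpoints = peer(service)
        except ConnectionError as exc:
            errors.append(str(exc))
            continue  # a dead peer is skipped, not fatal
        reached_any = True
        if endpoints:
            return list(endpoints)
    if not reached_any:
        raise ConnectionError(f"no peer reachable while resolving {service!r}: {errors}")
    return []


def make_peer(registry: Dict[str, List[str]]) -> Peer:
    """Build an in-memory peer backed by a fixed service -> endpoints table."""
    def peer(service: str) -> List[str]:
        return registry.get(service, [])
    return peer


def dead_peer(service: str) -> List[str]:
    """A peer that is always unreachable."""
    raise ConnectionError("peer down")


if __name__ == "__main__":
    peers = [dead_peer,
             make_peer({}),
             make_peer({"file.download": ["http://peer-c.example.org/clarens"]})]
    print(discover("file.download", peers))
```

The same retry-over-alternatives pattern can be applied to downloads, by fetching a file from the next replica when one source fails.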
UltraLight Application Services
Make UltraLight Available to Physics Applications & their Environments
[Figure: physics applications (ROOT, IGUANA, COBRA, ATHENA, other apps.) reach the UltraLight Infrastructure (Networking, Storage and Computation Resources) through an Application Interface, the UltraLight Global Services and the Application-layer Services. Services shown: Request Planning Services, Network Management, Network Access, Workflow Management, Storage Access, Execution Services, Intelligent Agents, End-to-end Monitoring]
• Application Frameworks Augmented to Interact Effectively with the Global Services (GS)
– GS Interact in Turn with the Storage Access & Local Execution Service Layers
• Apps. Provide Hints to High-Level Services About Requirements
– Interfaced also to managed Net and Storage services
– Allows effective caching, pre-fetching; opportunities for global and local optimization of throughput
GAE and UltraLight
Make the Network an Integrated Managed Resource
[Figure: Application Interfaces, Request Planning, Network Planning, Monitor, Network Resources]
• Unpredictable multi-user analysis
• Overall demand typically fills the capacity of the resources
• Real-time monitor systems for networks, storage, computing resources, …: E2E monitoring
Support data transfers ranging from the (predictable) movement of large-scale (simulated and real) data to the highly dynamic analysis tasks initiated by rapidly changing teams of scientists
MonALISA: Monitoring Agents using a Large Integrated Services Architecture
• MonALISA is able to dynamically
– register & discover
• Based on a multi-threaded engine
– Very scalable
• Services are self-describing
• Code updates
– Automatic & secure
• Dynamic config for services
– Secure Admin Interface
• Active filter agents
– Process data
– Application-specific monitoring
• Mobile agents
– Decision support
– Global optimisations
Fully distributed, no single point of failure!
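A common way to get dynamic registration and discovery without stale entries is lease-based registration. Each service must keep renewing its entry, so a crashed service disappears on its own once its lease runs out. The toy registry below sketches that idea in Python. It is a generic illustration under that assumption, not MonALISA's own implementation, and every name in it is hypothetical. The attributes dictionary stands in for a service's self-description.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Registration:
    name: str
    attributes: Dict[str, str]  # the service's own self-description
    expires_at: float


class LeaseRegistry:
    """Toy lookup service: an entry disappears unless renewed before its lease ends."""

    def __init__(self, lease_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.lease_s = lease_s
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Registration] = {}

    def register(self, name: str, attributes: Dict[str, str]) -> None:
        """Register a service, or renew its lease if it is already present."""
        self._entries[name] = Registration(name, dict(attributes), self.clock() + self.lease_s)

    def discover(self, **required: str) -> List[str]:
        """Names of live services whose attributes match all `required` key/values."""
        now = self.clock()
        # A service that stopped renewing (e.g. its host crashed) is dropped here.
        self._entries = {n: r for n, r in self._entries.items() if r.expires_at > now}
        return sorted(
            n for n, r in self._entries.items()
            if all(r.attributes.get(k) == v for k, v in required.items())
        )
```

For example, with `reg = LeaseRegistry(lease_s=10)`, a service that calls `reg.register("farm-a", {"type": "cluster"})` at least every 10 seconds stays visible to `reg.discover(type="cluster")`. A service that stops renewing drops out.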
MonALISA Example: Monitoring Network Topology, Latencies, Routers
MonALISA Services Integrated with Network Services
• Dedicated modules to monitor and control optical switches
• Used to control
– CALIENT switch @ CIT
– GLIMMERGLASS switch @ CERN
• ML agent system
– Used to create global path
– Algorithm can be extended to include prioritisation and preallocation
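The slide says that the MonALISA agent system creates a global path, but it does not give the algorithm. The sketch below assumes, for illustration only, that the goal is a lowest-latency path, and finds it with Dijkstra's shortest-path algorithm over per-link latencies. The topology and latency values are made up. The prioritisation and preallocation extensions mentioned above are not modelled.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: link latency in ms}


def lowest_latency_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: return (total latency, node list) from src to dst."""
    queue = [(0.0, src, [src])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path  # first time dst is popped, its cost is minimal
        if node in settled:
            continue
        settled.add(node)
        for nxt, latency in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + latency, nxt, path + [nxt]))
    raise ValueError(f"no path from {src} to {dst}")


# Made-up topology and latencies, for illustration only.
TOPOLOGY: Graph = {
    "Caltech": {"LA": 1.0},
    "LA": {"Caltech": 1.0, "Chicago": 25.0, "Sunnyvale": 5.0},
    "Sunnyvale": {"LA": 5.0, "Chicago": 24.0},
    "Chicago": {"LA": 25.0, "Sunnyvale": 24.0, "CERN": 55.0},
    "CERN": {"Chicago": 55.0},
}

if __name__ == "__main__":
    print(lowest_latency_path(TOPOLOGY, "Caltech", "CERN"))
```

A prioritised variant could, for instance, inflate the cost of links already reserved by higher-priority flows before running the same search.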
(Physics) Analysis on the Grid
Move from Existing Components to a Coherent System
• Catalogs to select datasets
• Resource & Application Discovery
• Schedulers guide jobs to resources
• Policies enable “fair” access to resources
• Robust (large size) data (set) transfer
• Feedback to users (e.g. status of their jobs)
• Crash recovery of components (identify and restart)
• Provide secure authorized access to resources and services
[Figure: numbered (1-9) request flow linking Client Application, Discovery, Dataset service, Catalogs, Planner/Scheduler, Job Submission, Execution, Storage Management, Policy, Steering, Monitor Information, Data Transfer]
UltraLight core: data transfer, planning, scheduling, (sophisticated) policy management on VO level, integration
More sophisticated components & services in years 3-4
Clarens Web-Service Backbone
The glue which holds everything together
• X.509 certificate-based access
• Good Performance
• Access Control Management
• Remote File Access
• Dynamic Discovery of Services on a Global Scale
• Available in Python and Java
• Easy to install and part of VDT distribution:
– wget -q -O - http://hepgrid1.caltech.edu/clarens/setup_clump.sh | sh
– export opkg_root=/opt/openpkg
• Interoperability with other web service environments such as Globus, through SOAP
• Interoperability with MonALISA
– Monitoring Clarens parameters
– Service publication
Many GAE Services Integrated with Network Services
• Core Clarens services (including a shell service and remote file access)
• File catalog service (contains dataset information)
• Sphinx Scheduler (UFL): service-based scheduler
• Job Submission: BOSS (collaboration with INFN)
• ROOT Clarens client
• Caves (UFL): analysis code and command sharing environment
• Steering service: first prototype of steering service
• Discovery Service
UltraLight focuses on integration and sophisticated automated decisions based on monitor information
Example: Remote Data File Access via Clarens
Example: Sphinx Scheduling Service
• Functions like a Nerve Centre
• Data Warehouse
– Policies, Account Information, Grid Weather, Resource Properties and Status, Request Tracking, Workflows, etc.
• Applies Data Mining methods
• Flexible Framework:
– Client (request/job submission)
• Clarens Web Service
• Grid Clients
– Scheduling Service
• Clarens Web Service
• MonALISA Monitoring Repository
– Grid Resource
• MonALISA Monitoring Service
• Grid Services
[Figure: Grid Client and Grid Resources linked through a Recommendation Engine via the Clarens WS Backbone and the MonALISA Monitoring Backbone]
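The slide lists the inputs Sphinx draws on (policies, account information, grid weather, resource status) but not how it weighs them. The Python sketch below shows one hypothetical recommendation rule built from those kinds of inputs. It applies a per-user CPU quota first, then keeps only sites with enough free CPUs. Among those it prefers sites that already hold the input dataset, and then the least loaded. All field names, numbers and the scoring rule are assumptions, not Sphinx's actual algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteStatus:
    name: str
    free_cpus: int     # from monitoring ("grid weather")
    queued_jobs: int   # from monitoring
    has_dataset: bool  # whether the job's input data is already at the site


def recommend(sites: List[SiteStatus], user_quota: Dict[str, int],
              user: str, cpus_needed: int) -> Optional[str]:
    """Return the recommended site name, or None if policy or capacity forbids the job."""
    if user_quota.get(user, 0) < cpus_needed:
        return None  # policy check: user has not enough remaining allocation
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    if not candidates:
        return None  # no site can currently host the job
    # Prefer data locality first, then the lightest net load.
    best = max(candidates, key=lambda s: (s.has_dataset, s.free_cpus - s.queued_jobs))
    return best.name


if __name__ == "__main__":
    # Made-up site status, for illustration only.
    sites = [SiteStatus("UFL", 40, 10, False),
             SiteStatus("Caltech", 20, 2, True),
             SiteStatus("FNAL", 4, 0, True)]
    print(recommend(sites, {"alice": 100}, "alice", cpus_needed=8))
```

Putting policy first means a job that violates it is rejected before any monitoring data is consulted. Ranking data locality above load reflects the cost of moving large datasets, but the opposite ordering is an equally plausible choice.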
UltraLight Plans
• UltraLight envisions a 4 year program to deliver a new,
high-performance, network-integrated infrastructure:
• Phase I will last 12 months and focus on deploying the
initial network infrastructure and bringing up first services
• Phase II will last 18 months and concentrate on
implementing all the needed services and extending the
infrastructure to additional sites (We are entering this
phase starting approximately this summer)
• Phase III will complete UltraLight and last 18 months. The focus will be on a transition to production in support of LHC Physics and eVLBI Astronomy
Summary
• For many years the Wide Area Network has been the bottleneck;
this is no longer the case in many countries
– Deployment of a data intensive Grid infrastructure is now possible!
– Recent Internet2 Land Speed Records (I2LSR) show for the first time ever that the network can be truly transparent; throughputs are limited by the end-hosts
– Challenge shifted from getting adequate bandwidth to deploying
adequate infrastructure to make effective use of it!
• UltraLight promises to deliver the critical missing component for
future eScience: the integrated, managed network
– Next generation Network and Grid system
– Extend and augment existing grid computing infrastructures
(currently focused on CPU/storage) to include the network
as an integral component