Lambda Station

Matt Crawford, Fermilab
co-PI: Don Petravick, Fermilab
co-PI: Harvey Newman, Caltech
HEP Computing
• Labs plus University Community
• Vast ensembles of commodity equipment
• Something like a petabyte of IDE disk
• Storage system to storage system transfer
• Refresh of 200 TB of state at universities
• Structured production, “chaotic” analysis
HEP Networking
• Office of High Energy Physics funds LHCNet (an OC-192 triangle linking Starlight, CERN, and MANLAN)
• Interested in switched optical networking
– UltraLight (Caltech)
– UltraScience Net (ORNL)
– OSCARS MPLS tunnels (ESnet: FNAL-BNL, etc.)
– FNAL-CERN 875 MB/s storage-system-to-storage-system (SS-SS) service challenge
• Interest in, testing of, and tracking of improvements to TCP at high bandwidth × delay
• Given the directions of HEP computing, the ends of “pipes” are likely to be locally and competently engineered networks.
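The following worked figure shows why TCP at high bandwidth × delay is a concern. It computes the bandwidth-delay product for one OC-192 leg. The line rate is rounded to 10 Gb/s, and the 100 ms round-trip time is an illustrative assumption rather than a measured LHCNet value:

```latex
\mathrm{BDP} = C \times \mathrm{RTT} \approx 10\ \mathrm{Gb/s} \times 0.1\ \mathrm{s} = 1\ \mathrm{Gb} = 125\ \mathrm{MB}
```

A single TCP connection would therefore need roughly 125 MB of unacknowledged data in flight, and a window at least that large, to keep such a path full. A single loss event on a path like this takes standard TCP a long time to recover from.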
Problem statement
• Experiments and applications now running, or starting soon, will benefit from data movement capabilities now available only on bleeding-edge networks.
• These systems are connected to production site networks. Duplicating site infrastructure to connect them to special-purpose networks is an expense to be avoided if possible.
• Multihoming the endpoints to multiple networks is complicated and expensive, and it (nearly) precludes graceful failover when one path is lost.
• Applications (and operating systems) should not have to be re-customized for every new network technology or high-performance path.
Additional complications
• Rates are not predictable for real data sources and sinks.
– Memory-to-memory is somewhat deterministic, but disk-to-disk has several uncontrolled variables.
• Applications may use multiple streams for maximum exploitation of high-speed links. Lambda Station must be able to deal in aggregates.
• Straggler flows persist after the bulk of a transfer has completed, and continued use of the high-volume path may be wasteful at that point.
• Aggressive protocols for the wide area may have negative impacts on the last-mile (site or site’s “uplink”) network.
Lambda Station
• Function
– Schedule use of one or more reservable network paths
– Arrange for traffic to be forwarded onto such paths
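The scheduling half of this function can be illustrated with a simple admission check. This is a hypothetical sketch, not Lambda Station's scheduler. It assumes that each reservable path has a single capacity figure and that a reservation is a half-open time interval at a constant rate. `Reservation` and `fits` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    start: float      # seconds since some epoch (inclusive)
    end: float        # seconds since some epoch (exclusive)
    rate_mbps: float  # bandwidth committed for the whole interval


def fits(existing: list[Reservation], new: Reservation, capacity_mbps: float) -> bool:
    """Return True if `new` can be admitted without ever exceeding capacity."""
    # Committed load only rises where some reservation starts, so the peak
    # inside the new window occurs at new.start or at a later start time.
    check_points = {new.start}
    check_points |= {r.start for r in existing if new.start <= r.start < new.end}
    for t in check_points:
        load = sum(r.rate_mbps for r in existing if r.start <= t < r.end)
        if load + new.rate_mbps > capacity_mbps:
            return False
    return True


if __name__ == "__main__":
    booked = [Reservation(0, 3600, 6000)]
    print(fits(booked, Reservation(1800, 5400, 5000), 10000))  # overlap would need 11 Gb/s
    print(fits(booked, Reservation(3600, 7200, 5000), 10000))  # back-to-back, no overlap
```

Checking only start times keeps the test exact while avoiding a scan over every second of the window.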
Interfaces to other systems
• To application (or to manual request system)
• To authentication/authorization infrastructure
• To site’s internal network (dynamic reconfiguration of packet forwarding rules)
– Operate at any granularity, down to single flows
• Site’s border/connection point to reservable path
• Peer site’s Lambda Station
• Talk to advanced WANs, through network operator-defined setup protocol, as needed*
• Monitoring, accounting, status reporting
Block Diagram
Client application interface
• Application describes the traffic that is to be routed over an alternative path.
– Traffic selectors: 6-tuples [ IP version, {src CIDR(s)}, {dst CIDR(s)}, protocol, {src port(s)}, {dst port(s)} ]
– Transfer rate, total volume, duration, direction
– Earliest desired start
• LS and host agree on a packet-selection method; we lean toward DSCP.
• LS informs the application of the actual bandwidth allocated and of the setup status.
• Host or LS should inform the other of early termination, if it occurs.
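A 6-tuple traffic selector can be represented as a small immutable record with a matching rule. This is a hypothetical representation, not the actual wire format. The class and method names are illustrative, and "empty port set means any port" is an assumed convention. Allowing several CIDRs and ports per field is one way a selector can cover the multi-stream aggregates mentioned earlier.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficSelector:
    """One 6-tuple selector; an empty port set means "any port" (assumed convention)."""
    ip_version: int                          # 4 or 6
    src_cidrs: tuple[str, ...]
    dst_cidrs: tuple[str, ...]
    protocol: int                            # IP protocol number, e.g. 6 = TCP
    src_ports: frozenset[int] = frozenset()
    dst_ports: frozenset[int] = frozenset()

    def matches(self, src: str, dst: str, protocol: int, sport: int, dport: int) -> bool:
        """Return True if a packet with these header fields falls under this selector."""
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        if s.version != self.ip_version or d.version != self.ip_version:
            return False
        if protocol != self.protocol:
            return False
        if not any(s in ipaddress.ip_network(c) for c in self.src_cidrs):
            return False
        if not any(d in ipaddress.ip_network(c) for c in self.dst_cidrs):
            return False
        if self.src_ports and sport not in self.src_ports:
            return False
        if self.dst_ports and dport not in self.dst_ports:
            return False
        return True


if __name__ == "__main__":
    # Documentation address ranges; port 2811 (GridFTP control) is only an example.
    sel = TrafficSelector(4, ("198.51.100.0/24",), ("192.0.2.0/24",), 6,
                          dst_ports=frozenset({2811}))
    print(sel.matches("198.51.100.7", "192.0.2.20", 6, 40001, 2811))
```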
Site network interface
• Configure the local site’s internal routing to divert traffic to the alternate path.
• Graceful teardown – resume normal internal routing before the WAN path is torn down.
• Different versions of this module will deal with different varieties of site network.
– Each site might plug in its own scripts.
Site-edge router interface
• Graceful setup – enable the reserved WAN path before internal routing directs traffic onto it.
• An ACL may be in effect on this device to prevent unauthorized use.
• An ACL is very likely to be in effect with respect to incoming traffic from the WAN.
– At some sites, this is a path which bypasses firewalls!
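The graceful setup and teardown rules of the site network and site-edge router modules form a fixed ordering. The sketch below encodes that ordering and adds a rollback if diversion fails. `EdgeRouter` and `SiteNetwork` are hypothetical stand-ins that only record what they would do; real modules would reconfigure routers.

```python
class EdgeRouter:
    """Hypothetical stand-in for the site-edge router module; records actions."""

    def __init__(self, log: list[str]) -> None:
        self.log = log

    def enable_wan_path(self) -> None:
        self.log.append("edge: enable WAN path")

    def disable_wan_path(self) -> None:
        self.log.append("edge: disable WAN path")


class SiteNetwork:
    """Hypothetical stand-in for the site network module; can simulate a failure."""

    def __init__(self, log: list[str], fail_divert: bool = False) -> None:
        self.log = log
        self.fail_divert = fail_divert

    def divert(self) -> None:
        if self.fail_divert:
            raise RuntimeError("site router rejected the new forwarding rule")
        self.log.append("site: divert selected traffic")

    def restore(self) -> None:
        self.log.append("site: restore default routing")


def graceful_setup(edge: EdgeRouter, site: SiteNetwork) -> None:
    edge.enable_wan_path()        # the WAN path must exist before traffic is sent to it
    try:
        site.divert()
    except Exception:
        edge.disable_wan_path()   # roll back; traffic never left the default path
        raise


def graceful_teardown(edge: EdgeRouter, site: SiteNetwork) -> None:
    site.restore()                # pull traffic back onto the default path first
    edge.disable_wan_path()       # only then release the WAN path
```

Teardown is the exact mirror of setup. As a result, at no moment does internal routing point traffic at a WAN path that is not there.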
LS-to-LS protocol
• Exchange traffic selectors
• Coordinate setup & teardown
• Verify path continuity
– Implies that LS can communicate simultaneously over reserved and commodity network paths.
• Inform of early traffic termination
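The four LS-to-LS functions can be pictured as a per-session state machine on each side. The sketch below is hypothetical: the message names and states are illustrative, not the actual protocol. The assumed arrangement is that session messages travel over the commodity path, while `PROBE` stands for a continuity check sent over the reserved path.

```python
from enum import Enum, auto


class Msg(Enum):
    SELECTORS = auto()   # exchange traffic selectors
    SETUP = auto()       # both ends have configured their side of the path
    PROBE = auto()       # path-continuity check over the reserved path
    EARLY_END = auto()   # traffic finished before the scheduled end
    TEARDOWN = auto()    # coordinated release of the path


# (current state, message) -> next state; any other pair is a protocol error.
TRANSITIONS = {
    ("idle", Msg.SELECTORS): "negotiated",
    ("negotiated", Msg.SETUP): "active",
    ("negotiated", Msg.TEARDOWN): "idle",
    ("active", Msg.PROBE): "active",
    ("active", Msg.EARLY_END): "closing",
    ("active", Msg.TEARDOWN): "idle",
    ("closing", Msg.TEARDOWN): "idle",
}


class PeerSession:
    """One Lambda Station's view of a session with its peer (hypothetical)."""

    def __init__(self) -> None:
        self.state = "idle"

    def handle(self, msg: Msg) -> str:
        try:
            self.state = TRANSITIONS[(self.state, msg)]
        except KeyError:
            raise ValueError(f"{msg.name} not allowed in state {self.state!r}") from None
        return self.state
```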
Advanced WAN interface
• Multiple flavors of high-performance WANs are anticipated.
– Some WANs may require forwarding state to be created before use.
– Some may have their own reservation system, which end systems need not learn to use if they reserve through Lambda Station instead.
• Lambda Station’s WAN module will parameterize and adapt to each sort of WAN, providing an abstract view.
– DOE UltraScience Net, ESnet, LHCNet, UltraLight.
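The "abstract view" can be sketched as one interface with a subclass per kind of WAN. All class names here are hypothetical, and the signalling calls are placeholders rather than any real network's setup protocol. No mapping from these classes to the specific WANs listed above is implied.

```python
import itertools
from abc import ABC, abstractmethod


class WanModule(ABC):
    """Abstract view of one kind of advanced WAN (hypothetical interface)."""

    @abstractmethod
    def reserve(self, rate_mbps: float, start: float, duration_s: float) -> str:
        """Reserve capacity and return an opaque handle."""

    @abstractmethod
    def release(self, handle: str) -> None:
        """Give the capacity back."""


class PreprovisionedWan(WanModule):
    """Path already carries traffic; only local bookkeeping is needed."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.active: set[str] = set()

    def reserve(self, rate_mbps, start, duration_s):
        handle = f"static-{next(self._ids)}"
        self.active.add(handle)
        return handle

    def release(self, handle):
        self.active.discard(handle)


class SignalledWan(WanModule):
    """WAN whose forwarding state must be created before use."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.circuits: dict[str, dict] = {}

    def reserve(self, rate_mbps, start, duration_s):
        handle = f"circuit-{next(self._ids)}"
        # Placeholder for the operator-defined setup protocol.
        self.circuits[handle] = {"rate_mbps": rate_mbps, "start": start,
                                 "end": start + duration_s}
        return handle

    def release(self, handle):
        # Placeholder for the matching teardown request.
        self.circuits.pop(handle, None)
```

End systems then deal only with `reserve` and `release`, whichever WAN sits behind them.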
Requirements for Production
• Robustness
– LS must enable production systems to make trial use of advanced networks, and cleanly restore default forwarding behavior upon completion or path failure.
• Monitoring
– Lambda Station must present its own state and history.
– Currently it serves this info through its web server.
– Investigating MonALISA (OSG component).
• Accounting
– In many environments, different sub-organizations share the network resource. LS must gather usage information to support accounting.
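The usage information gathered for accounting might be rolled up per sub-organization as below. This is a minimal sketch that assumes one record per completed transfer, tagged with the organization charged for it. The record fields and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    org: str           # sub-organization charged for the transfer
    bytes_moved: int   # bytes carried over the reserved path


def usage_by_org(records: list[UsageRecord]) -> dict[str, int]:
    """Total bytes moved per sub-organization."""
    totals: defaultdict[str, int] = defaultdict(int)
    for r in records:
        totals[r.org] += r.bytes_moved
    return dict(totals)


def shares(records: list[UsageRecord]) -> dict[str, float]:
    """Each sub-organization's fraction of all bytes moved (empty if none)."""
    totals = usage_by_org(records)
    grand = sum(totals.values())
    return {org: b / grand for org, b in totals.items()} if grand else {}
```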
Provide sample integration
• With storage systems that are components of the USCMS software and computing project.
• Currently these are:
– Managed storage elements.
• SRM / GridFTP protocols.
• Now implementing LS client calls in SRM/dCache.
Current status
• Release 1.0 – today.
– A stable, usable snapshot of a work in progress.
– Based on Perl with SOAP::Lite
– Dynamically reconfigures site routers to send traffic over alternate paths
– End systems applied DSCP tags to special-treatment flows.
– Traffic path changed cleanly, unnoticed by the application; throughput hiccuped at each change.
Path switching effects
Deployment Scenarios
Client capabilities: identifying high-impact traffic ...
1. Specify src & dst address groups, but no more.
2. Specify src and/or dst ports as well as addresses.
3. Apply DSCP label selected by client.
4. Apply DSCP label as directed by Lambda Station.
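Capabilities 3 and 4 require the end system to put a DSCP value on its own packets. On hosts that expose the `IP_TOS` socket option (Linux, for example), one way to do this for IPv4 is shown below. The helper name is hypothetical, and DSCP 8 is an arbitrary example rather than a value Lambda Station prescribes. IPv6 would use the traffic-class option instead, which is not shown.

```python
import socket


def open_marked_socket(dscp: int) -> socket.socket:
    """Create a TCP/IPv4 socket whose outgoing packets carry the given DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # DSCP occupies the upper six bits of the former IPv4 TOS byte;
    # the lower two bits are ECN, so shift the codepoint left by two.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock
```

Under capability 4, the `dscp` argument would be the value Lambda Station returned to the client rather than one the client chose.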
Client capabilities: Lambda Station integration level ...
1. Lambda Station called manually via web interface
2. SOAP call by wrapper around client application
3. SOAP calls from within the client application
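Integration level 2, a wrapper around an unmodified client application, reduces to "request path, run program, release path". The sketch below uses a context manager so that the release, including an early-termination flag, happens even if the program fails. `LambdaStationClient` and its `open_path` and `close_path` methods are hypothetical stand-ins for the real SOAP calls, whose names and signatures are not given here.

```python
import subprocess
import sys
from contextlib import contextmanager


class LambdaStationClient:
    """Hypothetical stand-in for a SOAP client; records calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def open_path(self, selector: dict) -> str:
        self.calls.append(("open", selector))
        return "ticket-1"

    def close_path(self, ticket: str, early: bool) -> None:
        self.calls.append(("close", ticket, early))


@contextmanager
def alternate_path(client: LambdaStationClient, selector: dict):
    """Hold an alternate path for the duration of a `with` block."""
    ticket = client.open_path(selector)
    completed = False
    try:
        yield ticket
        completed = True
    finally:
        # Report early termination if the body raised (e.g. the transfer failed).
        client.close_path(ticket, early=not completed)


def run_wrapped(client: LambdaStationClient, selector: dict, argv: list[str]) -> None:
    """Run an unmodified application between path setup and release."""
    with alternate_path(client, selector):
        subprocess.run(argv, check=True)   # non-zero exit raises CalledProcessError


if __name__ == "__main__":
    c = LambdaStationClient()
    run_wrapped(c, {"dst_ports": [2811]}, [sys.executable, "-c", "print('transfer')"])
    print(c.calls)
```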
Site network capabilities ...
1. Static router config w/ fixed PBR based on DSCP
2. Router ACLs activated and inactivated by LS
3. Lambda Station constructs and applies ACLs for PBR
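For option 3, Lambda Station must turn a traffic selector into router configuration. The sketch below renders a Cisco IOS-style named extended ACL for IPv4, which takes wildcard (inverse) masks. The function and ACL names are hypothetical, and other vendors (e.g. Force10) use different syntax. On IOS, the ACL would then be referenced from a route-map (`match ip address`, `set ip next-hop`) applied with `ip policy route-map`; that part is not generated here.

```python
import ipaddress


def pbr_acl(name: str, protocol: str, src_cidrs: list[str],
            dst_cidrs: list[str], dst_ports: list[int]) -> list[str]:
    """Render an IOS-style named extended ACL (IPv4 only) for one selector."""
    lines = [f"ip access-list extended {name}"]
    for src in src_cidrs:
        s = ipaddress.IPv4Network(src)
        for dst in dst_cidrs:
            d = ipaddress.IPv4Network(dst)
            for port in dst_ports:
                # hostmask is the wildcard mask, e.g. 0.0.0.255 for a /24.
                lines.append(f" permit {protocol} {s.network_address} {s.hostmask} "
                             f"{d.network_address} {d.hostmask} eq {port}")
    return lines


if __name__ == "__main__":
    print("\n".join(pbr_acl("LS-DEMO", "tcp", ["198.51.100.0/24"],
                            ["192.0.2.0/24"], [2811])))
```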
Directions
• Next version being built on Apache Axis
– probably will use jClarens
– WSDL is sure to evolve
• IPv6 support is as yet a mere placeholder
• Adding support for Force10 site routers
• Looking forward to speaking to your lightpath WAN directly!
Summary
• Lambda Station’s role in data-intensive science is to dynamically connect production end-systems to advanced high-performance wide-area networks.
– Bring the systems to the network
– Bring the network to the systems
• Prototyping has shown the feasibility of using dynamically selected network paths for traffic between production site networks.