
An Architectural Approach to
Managing Data in Transit
Micah Beck
Director & Associate Professor
Logistical Computing and Internetworking Lab
Computer Science Department
University of Tennessee
DOE Data Management Workshop
3/17/2004
“Data in Transit”
» After being generated by an instrument or
supercomputer
» Not stored in a permanent archive
» Serving the diverse purposes of a community of
users and applications
» Being transferred, processed and stored to meet
changing and unanticipated needs
• Visualization
• Data Mining
• Collaboration
• Distributed Computing
Interoperability via a Common Interface
» Span heterogeneous physical resources, operating
systems, local management schemes
» Serve changing and unexpected application
requirements; enable application autonomy
» We measure success in terms of infrastructure
deployment scalability
• In networks and distributed systems, this
means number, distribution, global reach,
spanning administrative domains…
• The Internet is the gold standard of
infrastructure deployment scalability
Layering as An Architectural Approach
» Abstractions at each layer can hide differences at
lower layers
» Exposed approaches avoid creating overly complex
mechanisms at lower layers
» The E2E Principle: Attributes of lower layers
implemented on shared infrastructure enable
deployment scalability
• Generality: Serve diverse application needs,
model diverse lower layer resources
• Weak semantics: Don’t give too much away at
one time!
The IP Network Stack
Application
Transport
Network (common interface: IP)
Link
Physical
…
IP’s Failure of Scalability
» Today, IP is failing as a common interface
» The design of IP is out of date
• Application communities are more diverse
• Link layer technologies violate IP assumptions
» Application communities are defining their own
common interfaces for general resource sharing,
deploying their own infrastructure (e.g. the Grid)
» Some networking communities have abandoned
interoperability at the network layer between
widely divergent link layer technologies
(e.g. optical switching & IP)
The Transit Layer:
A New Location for Interoperability
» Expand the link layer to a local layer to model
transfer, storage and processing resources
» Insert a new transit layer between the local and
network layers to implement a common interface
to diverse technologies at the local layer
» Adopt a highly general common interface at the
transit layer, providing a uniform view of all of the
resources of the network node
» Build diverse network services on top of this
common interface to model diverse application
requirements
» “Locating Interoperability in the Network Stack”,
Micah Beck & Terry Moore, UT-CS-04-520, University of
Tennessee Computer Science Department Technical Report
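The idea of one common interface over diverse local resources can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LocalStore`, `RamStore`, `LogStore`, `TransitNode` and their methods are hypothetical names invented for this sketch, not part of IBP or any existing library. The point is that clients see the same store, transfer and process operations whatever technology sits at the local layer.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class LocalStore(Protocol):
    """Hypothetical local-layer backend; each technology implements it its own way."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...


class RamStore:
    """Local resource backed by an in-memory dictionary."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._items[name] = bytes(data)

    def get(self, name: str) -> bytes:
        return self._items[name]


class LogStore:
    """Local resource backed by an append-only log; the latest record wins."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, bytes]] = []

    def put(self, name: str, data: bytes) -> None:
        self._log.append((name, bytes(data)))

    def get(self, name: str) -> bytes:
        for key, value in reversed(self._log):
            if key == name:
                return value
        raise KeyError(name)


class TransitNode:
    """Uniform view of one node's storage, transfer and processing resources."""

    def __init__(self, local: LocalStore) -> None:
        self._local = local

    def store(self, name: str, data: bytes) -> None:
        self._local.put(name, data)

    def load(self, name: str) -> bytes:
        return self._local.get(name)

    def transfer(self, name: str, dest: "TransitNode") -> None:
        # Move data to another node through the same interface.
        dest.store(name, self.load(name))

    def process(self, name: str, out: str, op: Callable[[bytes], bytes]) -> None:
        # Apply an operation to stored data and keep the result on this node.
        self.store(out, op(self.load(name)))


def demo() -> bytes:
    a, b = TransitNode(RamStore()), TransitNode(LogStore())
    a.store("frame", b"simulation output")
    a.transfer("frame", b)
    b.process("frame", "frame.upper", bytes.upper)
    return b.load("frame.upper")
```

Services higher in the stack would be written against `TransitNode` only, so swapping `RamStore` for `LogStore` changes nothing above the transit layer.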
The Transit Network Stack
Application
Transport
…
Network
Transit (common interface)
Local (transfer, storage, processing)
Physical
Transit Networking: A Unified View
“… memory locations … are just wires turned sideways in time”
Dan Hillis, 1982, Why Computer Science is No Good
Logistical Networking: An Overlay
Implementation of the Transit Layer
» Logistical Networking is an overlay implementation
of transit layer functionality built on top of the IP
network
» The Internet Backplane Protocol is the common
transit layer interface for Logistical Networking
» Network nodes are IBP “depots” that run as user-level
processes and communicate using TCP/IP as well
as other link and network layer protocols
» Depots also serve storage and processing
resources to Logistical Networking clients
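The role of a depot can be pictured with a toy in-memory model. Everything here is an assumption made for illustration: the `Depot` class, its `allocate`, `store`, `load` and `copy` methods, the random allocation keys, and the time-limited lease are hypothetical and do not reproduce the IBP client API. The lease is included to show one way the "weak semantics" principle might look in practice: storage is granted for a bounded time rather than promised forever. Time is passed in explicitly as `now` so the behavior is deterministic.

```python
import secrets
from typing import Dict


class Depot:
    """Toy in-memory stand-in for a storage depot (hypothetical, not the IBP API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._allocs: Dict[str, dict] = {}

    def _used(self, now: float) -> int:
        # Expired allocations are reclaimed before counting used space.
        self._allocs = {k: a for k, a in self._allocs.items() if a["expires"] > now}
        return sum(a["size"] for a in self._allocs.values())

    def allocate(self, size: int, duration: float, now: float) -> str:
        """Reserve `size` bytes until `now + duration`; return an unguessable key."""
        if self._used(now) + size > self.capacity:
            raise MemoryError("depot full")
        key = secrets.token_hex(8)
        self._allocs[key] = {"size": size, "expires": now + duration, "data": bytearray()}
        return key

    def _live(self, key: str, now: float) -> dict:
        alloc = self._allocs.get(key)
        if alloc is None or alloc["expires"] <= now:
            raise KeyError("unknown or expired allocation")
        return alloc

    def store(self, key: str, data: bytes, now: float) -> None:
        """Append data to an allocation, refusing to exceed its reserved size."""
        alloc = self._live(key, now)
        if len(alloc["data"]) + len(data) > alloc["size"]:
            raise ValueError("allocation too small")
        alloc["data"] += data

    def load(self, key: str, now: float) -> bytes:
        return bytes(self._live(key, now)["data"])

    def copy(self, key: str, dest: "Depot", dest_key: str, now: float) -> None:
        """Depot-to-depot transfer: the client never handles the bytes itself."""
        dest.store(dest_key, self.load(key, now), now)


def demo():
    a, b = Depot(1024), Depot(1024)
    src = a.allocate(64, duration=10.0, now=0.0)
    a.store(src, b"checkpoint", now=1.0)
    dst = b.allocate(64, duration=5.0, now=1.0)  # lease ends at t = 6.0
    a.copy(src, b, dst, now=2.0)
    copied = b.load(dst, now=3.0)
    try:
        b.load(dst, now=7.0)
        expired = False
    except KeyError:
        expired = True
    return copied, expired
```

In this sketch a client that wants data to persist must renew or re-copy it before the lease ends, which keeps the depot's own guarantees small and easy to deploy widely.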
LN Tools and Deployment
» The Logistical Runtime System (LoRS) is a set of
tools based on IBP that enable users to take
advantage of the resources of IBP depots
» The Logistical Distribution Network (LoDN) is a data
directory, monitoring and management system
» The Logistical Backbone (L-Bone) is a resource discovery
service and global experimental IBP testbed
• Over 35 TB of storage available
• Over 300 depots in 21 countries
• Leverages the resources of PlanetLab
» Additional depots deployed at ORNL & NERSC
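As a rough picture of what client-side tools built over many depots might do, the sketch below splits data into blocks, places replicas of each block on different depots, and reassembles the data from whichever replicas survive. It is an assumption-laden illustration, not a description of how LoRS works: depots are modeled as plain dictionaries, and `upload`, `download`, the block naming scheme and the round-robin placement are all hypothetical.

```python
from typing import Dict, List, Tuple

Depot = Dict[str, bytes]                # hypothetical depot: block id -> bytes
BlockMap = List[List[Tuple[int, str]]]  # per block: (depot index, block id) replicas


def upload(data: bytes, depots: List[Depot], block_size: int, copies: int) -> BlockMap:
    """Split data into blocks and place `copies` replicas round-robin across depots."""
    if copies > len(depots):
        raise ValueError("need at least one depot per replica")
    block_map: BlockMap = []
    for n, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        replicas = []
        for r in range(copies):
            idx = (n + r) % len(depots)  # consecutive depots, so replicas differ
            block_id = f"blk-{n}-{r}"
            depots[idx][block_id] = block
            replicas.append((idx, block_id))
        block_map.append(replicas)
    return block_map


def download(block_map: BlockMap, depots: List[Depot]) -> bytes:
    """Reassemble the data, taking the first replica of each block that still exists."""
    out = bytearray()
    for replicas in block_map:
        for idx, block_id in replicas:
            block = depots[idx].get(block_id)
            if block is not None:
                out += block
                break
        else:
            raise IOError("no surviving replica for a block")
    return bytes(out)


def demo() -> bool:
    depots: List[Depot] = [{}, {}, {}]
    data = b"terascale supernova dataset"
    block_map = upload(data, depots, block_size=8, copies=2)
    depots[0].clear()  # simulate losing one depot
    return download(block_map, depots) == data
```

The block map plays the part of a small directory entry: it is the only state the client needs to keep in order to find the data again.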
L-Bone: August 2003 (20TB)
Example LN Applications
» Astrophysics: Terascale Supernova Initiative
(A. Mezzacappa, ORNL; J. Blondin, NCSU)
• Management of massive datasets
» Fusion Energy Research (S. Klasky, PPPL)
• Streaming of simulation data during generation
» Viewset-Based Visualization
• Prestaging & caching of distant data
» Content Distribution
• Heroic data distribution problems (Linux ISOs)
» Multimedia Networking
• Creation, management & delivery of high-value content
LN Futures and Directions
» Storage
• Implementation of file system services
• Moving data through firewalls at line speed
• QoS in highly controlled environments
» Networking
• Interoperability at ultrascale
• Advanced services (e.g. multicast)
» Computation
• Offloading visualization to IBP depots
• Developing sets of operations to support
application communities
Thank you!
[email protected]
http://loci.cs.utk.edu