Transcript sigcomm-02

An End-to-End Approach to
Scalable Network Storage
Micah Beck, Associate Professor
Director, Logistical Computing &
Internetworking (LoCI) Lab
Terry Moore, Associate Director
James S. Plank, Associate Professor & Director
Computer Science Department
SIGCOMM 2002, Pittsburgh
August 23, 2002
A Generalized
Communication Scenario
» A quantum of data originates
• from a node N_s
• at time t_s
» and either does or does not arrive
• at a destination at a node N_d
• at time t_d
» and if it does arrive it may be corrupted.
Scenario: Networking
» Characteristics
• N and N can be members of a globally scalable
network
• t-t is a delay we seek in general to minimize
fairly
» Fits the characteristics of layers 1 through 3 of the
Internet stack.
» When delivering data in a network, one cannot
count on: low delay, high probability of correct
delivery
Scenario: Storage
» Characteristics
• N and N are identical or part of a non-scalable
network
• There is no a priori bound on t-t
» Fits the characteristics of directly connected or
closely coupled storage
» When storing data in a closely coupled network,
we count on: low delay, high probability of correct
delivery
The End-to-End Approach
to Networking
» No reliance on the timely or accurate delivery of
any particular quantum of data
» High delay and corruption must only be of
sufficiently low probability
» Fairness between competing network participants.
» This allows a high degree of autonomy and faulty
behavior in the operation of the network
» Scalability!
End-to-End is Unnecessary for Closely
Coupled Storage
» If a storage device can be relied on to operate
with
• predictable delay
• high accuracy and
• high availability
» Then it can be used without the burden of
implementing layered end-to-end services.
» But the assumption of reliability can impose a cost
when the assumption fails to hold true and the
resource fails.
Scenario: Scalable Network Storage
» Characteristics
• N and N can be members of a globally scalable
network
• There is no a priori bound on t-t
» Fits the characteristics of storage accessed over a
globally scalable Internet
» When storing data in a network, one cannot count
on: low delay, high probability of correct delivery
Scalable Network Services Are Like
the Network Itself
» Intermittently inaccessible
» Vulnerable to partition
» Prone to corruption in transit
» Unpredictable latencies/jitter
» End-to-End: Never require a network service to be
bigger, better or more complex than wide area
access allows
An End-to-End Approach to Storage
» No reliance on the timely or accurate delivery of
any particular packet
» High delay and corruption must only be of
sufficiently low probability
» Fairness between competing network participants.
» This allows a high degree of autonomy and faulty
behavior in the operation of the network
» Scalability!
Internet Backplane Protocol (IBP)
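[Figure: three IBP operations. allocate: node Na requests an allocation from a depot and receives a capability. store: node Nw sends data to the depot. load: node Nr retrieves data from the depot.]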
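The allocate/store/load sequence can be sketched with a toy in-memory depot. This is only an illustration under stated assumptions: the `Depot` class, its method signatures, and the split into separate read and write capabilities are hypothetical and are not the real IBP client API.

```python
import secrets


class Depot:
    """Toy in-memory stand-in for an IBP depot (hypothetical, not the real API)."""

    def __init__(self):
        self._arrays = {}   # allocation id -> (max_size, bytearray)
        self._caps = {}     # capability string -> (allocation id, right)
        self._next_id = 0

    def allocate(self, max_size):
        """Reserve a byte array and return one capability per access right."""
        alloc_id = self._next_id
        self._next_id += 1
        self._arrays[alloc_id] = (max_size, bytearray())
        caps = {}
        for right in ("write", "read"):
            cap = secrets.token_hex(16)  # unguessable token acts as the capability
            self._caps[cap] = (alloc_id, right)
            caps[right] = cap
        return caps

    def _resolve(self, cap, right):
        """Map a capability to its byte array, or refuse if the right is missing."""
        alloc_id, granted = self._caps.get(cap, (None, None))
        if granted != right:
            raise PermissionError("capability does not grant " + right)
        return self._arrays[alloc_id]

    def store(self, write_cap, data):
        """Append data to the allocation named by a write capability."""
        max_size, buf = self._resolve(write_cap, "write")
        if len(buf) + len(data) > max_size:
            raise ValueError("allocation is full")
        buf.extend(data)

    def load(self, read_cap, offset, length):
        """Read a byte range from the allocation named by a read capability."""
        _, buf = self._resolve(read_cap, "read")
        return bytes(buf[offset:offset + length])


depot = Depot()
caps = depot.allocate(max_size=1024)          # role of node Na
depot.store(caps["write"], b"hello, depot")   # role of node Nw
payload = depot.load(caps["read"], 0, 5)      # role of node Nr
```

Whoever holds a capability can use the allocation, so the three roles can be three different nodes that only exchange capability strings.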
Allocation Attributes
» Duration (  permanent)
» Hard vs. Soft
» Read/Write semantics:
• Linear Append (write to end)
• Linear Truncate (write to start)
• Circular FIFO (with interlock)
• Circular Queue (no interlock)
» Depots implemented using disk and RAM
• same API and semantics
• performance differs
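A minimal sketch of how these attributes might shape an allocation's behavior. The `Allocation` class and its field names are hypothetical, and it assumes that "soft" means the depot may revoke the allocation early; only linear append and the circular queue without interlock are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    size: int                 # maximum number of bytes held
    expires_at: float         # end of the lease, in seconds on the caller's clock
    soft: bool = False        # assumption: a soft allocation may be revoked early
    circular: bool = False    # False: linear append; True: circular queue, no interlock
    data: bytearray = field(default_factory=bytearray)

    def is_live(self, now):
        """An allocation is usable only until its duration runs out."""
        return now < self.expires_at

    def write(self, chunk, now):
        if not self.is_live(now):
            raise TimeoutError("allocation has expired")
        if self.circular:
            # No interlock: once the array is full, the oldest bytes are overwritten.
            self.data = (self.data + chunk)[-self.size:]
        elif len(self.data) + len(chunk) > self.size:
            raise ValueError("linear allocation is full")
        else:
            self.data += chunk


linear = Allocation(size=8, expires_at=60.0)
linear.write(b"abcdef", now=0.0)

ring = Allocation(size=8, expires_at=60.0, circular=True)
ring.write(b"abcdef", now=0.0)
ring.write(b"ghij", now=1.0)   # the two oldest bytes are dropped
```

Passing `now` explicitly keeps the sketch deterministic; a real depot would consult its own clock.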
Internet Backplane Protocol (IBP)
» Depots (servers) that make allocation of primitive “byte arrays” available to clients
» A depot is implemented as a daemon, protocol is RPC over TCP
» Byte arrays are not blocks (more abstract)
• Network capabilities (primitive security)
• Variable extents
» Byte arrays are not files (weaker semantics)
• Size & duration are limited
• “Volatile” allocations
• Best effort reliability and availability
• No directory structure, accounting
• No caching, replication
Building on IBP
» Many applications assume file semantics
• Unbounded size & duration
• High reliability & availability
• Caching & replication
» In a layered architecture, these are implemented
through aggregation and additional intelligence at
the next level
» Resource discovery: Logistical Backbone
• Directory of depots, active probing
• Client library
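One way to picture "aggregation and additional intelligence at the next level" is client-side replication with an end-to-end checksum. The sketch below is an assumption-laden illustration, not the LoCI libraries: `FlakyDepot`, `replicated_put`, and `verified_get` are hypothetical names, and SHA-256 is simply a convenient choice of checksum.

```python
import hashlib


class FlakyDepot:
    """Hypothetical best-effort store: may be down or return damaged bytes."""

    def __init__(self, up=True, corrupt=False):
        self.up, self.corrupt, self.blob = up, corrupt, None

    def put(self, data):
        if not self.up:
            raise ConnectionError("depot unreachable")
        self.blob = data

    def get(self):
        if not self.up or self.blob is None:
            raise ConnectionError("depot unreachable or empty")
        # Reversing the bytes simulates corruption on a faulty depot.
        return self.blob[::-1] if self.corrupt else self.blob


def replicated_put(depots, data):
    """Write to every reachable depot; return the checksum and the replicas written."""
    written = []
    for d in depots:
        try:
            d.put(data)
            written.append(d)
        except ConnectionError:
            pass  # best effort: skip depots that are down
    return hashlib.sha256(data).hexdigest(), written


def verified_get(replicas, digest):
    """Return the first replica whose checksum matches, verified at the client."""
    for d in replicas:
        try:
            data = d.get()
        except ConnectionError:
            continue
        if hashlib.sha256(data).hexdigest() == digest:
            return data
    raise IOError("no intact replica found")


depots = [FlakyDepot(up=False), FlakyDepot(corrupt=True), FlakyDepot()]
digest, replicas = replicated_put(depots, b"file contents")
result = verified_get(replicas, digest)
```

No single depot is trusted to be available or correct; reliability comes from the endpoint checking and choosing among weak replicas.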
The Network Storage Stack
• Our adaption of the network stack architecture for storage
• Like the IP Stack
• Each level encapsulates details from the lower levels, while still exposing details to higher levels
[Figure: stack layers, top to bottom: Applications; Logistical File System; Logistical Tools; L-Bone; exNode; IBP; Local Access; Physical]
ExNode vs inode
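[Figure: side-by-side comparison. The exNode (user space) holds capabilities that point to IBP allocations in the network; the inode (kernel) holds block addresses that point to disk blocks on the local system.]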
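The analogy can be sketched as a data structure: where an inode maps file offsets to block addresses, an exNode maps file extents to capabilities. The classes, field names, depot names, and capability strings below are hypothetical and do not reflect the actual exNode library or its serialization.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    offset: int       # where this extent starts in the logical file
    length: int       # extent length in bytes
    depot: str        # hypothetical depot name
    read_cap: str     # hypothetical capability string


class ExNode:
    """Aggregates byte-array allocations into one logical file."""

    def __init__(self):
        self.mappings = []

    def add(self, mapping):
        self.mappings.append(mapping)

    def lookup(self, position):
        """Return every mapping covering a byte position (replicas may overlap)."""
        return [m for m in self.mappings
                if m.offset <= position < m.offset + m.length]

    def size(self):
        """Logical file size: the furthest byte covered by any mapping."""
        return max((m.offset + m.length for m in self.mappings), default=0)


ex = ExNode()
ex.add(Mapping(0, 100, "depot-a.example", "cap-1"))
ex.add(Mapping(100, 50, "depot-b.example", "cap-2"))
ex.add(Mapping(0, 100, "depot-c.example", "cap-3"))   # replica of the first extent
candidates = ex.lookup(10)
```

Unlike an inode, overlapping mappings are allowed here, which is how replication could be expressed above the byte-array layer.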
IBP-Mail: SMTP attachments
by reference
» The Problem: How to attach huge files?
1. Store the file on an IBP depot
2. Send capability with the mail message.
3. The receiver gets the file from the depot.
» Future work: Asynchronous routing
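The three steps can be sketched with Python's standard `email` package. Everything IBP-specific here is an assumption made for illustration: the `mail_depot` dictionary stands in for a depot, and the `X-IBP-Capability` header is a hypothetical way to carry the capability, not how IBP-Mail actually encodes it.

```python
import secrets
from email.message import EmailMessage

mail_depot = {}  # capability -> stored bytes; stands in for an IBP depot


def send_by_reference(sender, recipient, filename, data):
    """Steps 1 and 2: store the file, then mail only the capability."""
    cap = secrets.token_hex(16)
    mail_depot[cap] = data                     # step 1: store on the depot
    msg = EmailMessage()
    msg["From"], msg["To"] = sender, recipient
    msg["Subject"] = "File by reference: " + filename
    msg["X-IBP-Capability"] = cap              # step 2: hypothetical header
    msg.set_content("The attachment is stored on a depot; fetch it by capability.")
    return msg


def fetch_attachment(msg):
    """Step 3: the receiver uses the capability to get the file."""
    return mail_depot[str(msg["X-IBP-Capability"])]


big_file = b"\x00" * 1_000_000
message = send_by_reference("alice@example.org", "bob@example.org",
                            "dataset.bin", big_file)
received = fetch_attachment(message)
```

The mail message stays small regardless of the file size, since only the capability travels over SMTP.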
IBP Mail
[Figure: IBP Mail data flow. Labels: sender, receiver, SMTP, exNode, IBP write, IBP read, IBP copy.]
Logistical Networking
Application Areas
» Source routing
» Bandwidth adaption
» Reducing (BW × delay)
» Reliable multicast
» Content Distribution
» Remote access to structured data
» Managing computation state
» Temporary storage
» Very large data sets
» Multimedia
» Collaborative computing & visualization
Software & Infrastructure
» Tools open source, multiplatform
» IBP Depot (server) and C client library
» exNode and end-to-end services library
» Logistical Backbone server (LDAP-based)
» Linux/C is primary development platform
• Java clients are under development
» Command-line utilities, GUI
» Public L-Bone deployment
• Currently 1.6 TB in North America and Europe
» http://loci.cs.utk.edu
Lbone + exNode + GUI: Download
Logistical Networking is a TCP
» Storage is a fundamental element of
communication
» The end-to-end approach can apply to services
other than data transmission
» Logistical Networking achieves the benefits of
adherence to end-to-end principles:
• Application autonomy, network transparency
• Aggressive innovation
» Logistical Networking is a
Transformative Community Project
Some Further Thoughts
(it’s a position paper, after all)
IBP depots vs. IP routers
» IBP enables an intermediate node in a scalable
network to implement high-performance storage
» What about putting storage on IP routers?
• That other E2E principle tells us not to add
functionality to the network in order to serve
particular applications
• Current IP applications have no use for storage
at intermediate nodes
» This would interfere with the IP fast path in order
to support a subset of applications…
… On The Other Hand
» IP datagrams are stored, then forwarded
» Every router implements substantial RAM buffers
» The management of this storage is highly
specialized:
• Limited size allocation (MTU)
• Fast forwarding (FIFO, fair queuing, pipelining)
» This specialization of buffer management supports
interactive & near-real time applications
» Hypothesis: a requirement of fast forwarding at IP
intermediate nodes violates end-to-end
Are We In For a Tussle?
» An intermediate node with storage can support an
“MTU” the size of its maximum allocation: O(1GB)
• IPv6 has a 32-bit datagram size field (Jumbo Payload option)
» Low latency forwarding may be incompatible with
such a monstrous MTU at intermediate nodes.
» A network that abandoned low-latency forwarding
as a requirement would be more truly “best effort”
and would allow greater autonomy, generality.
» Asynchronous applications are important!
» Does the need to support interactive applications
limit the scalability of the Internet?
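A rough calculation shows why such an "MTU" sits badly with low-latency forwarding. The 1 Gb/s link speed and the 1500-byte comparison point are assumptions chosen for illustration; the delay computed is only the time to receive a whole datagram before it can be forwarded, ignoring queuing and propagation.

```python
def store_and_forward_delay(size_bytes, link_bits_per_second):
    """Seconds needed to receive a full datagram before forwarding it."""
    return size_bytes * 8 / link_bits_per_second


LINK = 1e9                 # assumed 1 Gb/s link
ethernet_mtu = 1500        # bytes, a common Ethernet MTU
huge_mtu = 10**9           # bytes, the O(1GB) allocation from the slide
max_32bit = 2**32 - 1      # largest length a 32-bit size field can express

small = store_and_forward_delay(ethernet_mtu, LINK)   # about 12 microseconds
large = store_and_forward_delay(huge_mtu, LINK)       # 8 seconds

print(f"1500 B datagram: {small * 1e6:.0f} microseconds per hop")
print(f"1 GB datagram:   {large:.0f} seconds per hop")
print(f"32-bit size field allows up to {max_32bit} bytes")
```

Under these assumptions each store-and-forward hop adds seconds rather than microseconds, which is tolerable for asynchronous applications but not for interactive ones.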
Let’s not give up on end-to-end
until we’ve really given it a try!