
I2.1: In-Network Storage
PI Presentation by
Arun Iyengar
NS-CTA INARC Meeting
23-24 March 2011
Cambridge, MA
Key Aspects of this Work
• Make better use of space within network storage
– Remove redundant content
• From communications between network nodes
• From network storage itself
• From images which may have similar content
– Only store most relevant content
• Use proper cache replacement policies
• Handle disruptions and failures
– Nodes may fail, become unreachable
– Packet losses may occur
• Data Issues
– Dealing with data provenance
– Data consistency
• How provenance affects data consistency decisions
Key Problem
• Nodes within a network need adequate
storage and memory
– Mobile devices may have limited storage/memory
– Content may be replicated for high availability,
increasing storage/memory requirements
– Even if persistent storage is sufficient, keeping as much
content as possible in main memory may be desirable for
performance reasons
• Our work
– Redundancy elimination at multiple levels
– Making better use of existing memory/storage
space
Redundancy Elimination at
Multiple Levels
Redundancy elimination at different levels can cause
mutual interference and complicated behavior:
• Semantic Level (e.g., redundancy in visual images)
• Communication Protocol Level
• Application Level (e.g., cached items)
• Page Level (main memory)
• File Level (disk)
Redundancy Elimination at
Communication Protocol Level
• In-network caching algorithms reduce network
utilization by removing redundant bytes from
communications between nodes
[Figure: in-network caching at a gateway (GW)]
Motivation
• Several in-network caching algorithms previously suggested
(e.g., Spring, Wetherall, SIGCOMM 2000)
• However, evaluations mainly conducted on packet traces
• When deploying those algorithms on protocols
(TCP/UDP/IP), several issues can arise
– E.g., protocol correctness (e.g., termination)
• In addition, packet-trace studies are restrictive
– Can only analyze certain metrics (e.g., byte savings)
– Cannot evaluate other metrics (e.g., TCP delay) and interactions
between mechanisms (e.g., TCP exponential backoff)
Goals
• Analyze Protocol Correctness
• Conduct a more comprehensive evaluation of
performance in real environments
– Packet losses, packet re-ordering, etc.
– Bytes, Delay
• Design new algorithms that are more robust
• Develop analytical models to rigorously analyze and
predict performance
Redundancy Elimination at
Communication Protocol Level
• Communications between nodes over a network may
contain redundant content
– Removing those redundancies can reduce network
utilization and congestion
• Fingerprint: number computed from a byte string
using a one-way function.
– Fingerprint consumes less space than byte string
– Low probability of different strings having same
fingerprints
• Rabin fingerprint for a string of bytes t_1, t_2, ..., t_n:
– RF(t_1, t_2, ..., t_n) = (t_1·p^(n-1) + t_2·p^(n-2) + ... + t_(n-1)·p + t_n) mod M
– p and M are constant integers
• Computationally cheap to compute the next fingerprint
from the previous one:
– RF(t_(i+1), ..., t_(n+i)) = ((RF(t_i, ..., t_(n+i-1)) − t_i·p^(n-1))·p + t_(n+i)) mod M
– For faster execution, all values of t_i·p^(n-1) can be
precomputed and stored in a table
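As a concrete illustration, here is a minimal Python sketch of the rolling Rabin fingerprint defined above; the choices of P, M, and the window size n are illustrative assumptions, not values from this work.

```python
# Minimal sketch of the rolling Rabin fingerprint defined above.
# The base P and modulus M are illustrative assumptions; real systems
# choose them carefully (M is typically a large prime) to keep the
# collision probability low.

P = 31               # assumed base p
M = (1 << 61) - 1    # assumed modulus M (a Mersenne prime)

def rabin_fingerprint(window: bytes) -> int:
    """RF(t_1..t_n) = (t_1*p^(n-1) + ... + t_(n-1)*p + t_n) mod M."""
    fp = 0
    for byte in window:          # Horner's rule evaluation
        fp = (fp * P + byte) % M
    return fp

def slide(fp: int, outgoing: int, incoming: int, n: int) -> int:
    """Advance the fingerprint by one byte using the update rule above.
    In practice, outgoing * p^(n-1) mod M is read from a precomputed table."""
    top = (outgoing * pow(P, n - 1, M)) % M
    return ((fp - top) * P + incoming) % M

# Sliding an n-byte window over a buffer reproduces the
# independently computed fingerprints.
data = b"in-network storage"
n = 8
fp = rabin_fingerprint(data[:n])
for i in range(len(data) - n):
    fp = slide(fp, data[i], data[i + n], n)
    assert fp == rabin_fingerprint(data[i + 1 : i + 1 + n])
```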
Redundancy Elimination
Algorithm
• Maintain caches (at both sender and receiver) of
recent packets indexed by fingerprints
• Generate representative fingerprints for packets
being sent
• Look up each fingerprint in cache
• If match found, compare bytes to make sure that
there was no fingerprint collision
• If bytes match, expand match to largest possible
matching string
• Update sender cache
• For byte strings with matching content, send tokens
identifying the fingerprint instead of the actual bytes
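The following sender-side sketch puts these steps together; it reuses rabin_fingerprint from the sketch above, and the names (PacketCache, Token, WINDOW) are hypothetical simplifications rather than the actual implementation. The receiver maintains a synchronized cache and substitutes the cached bytes back in for each token.

```python
# Sender-side sketch of the redundancy elimination algorithm above.
# Cache synchronization and eviction are omitted for brevity.

from dataclasses import dataclass

WINDOW = 8  # fingerprint window size (illustrative)

@dataclass
class Token:
    fingerprint: int
    offset: int   # where the match starts in the cached packet
    length: int   # length of the matched byte string

class PacketCache:
    def __init__(self):
        self.by_fp = {}  # fingerprint -> (cached packet, offset)

    def index(self, packet: bytes):
        # Index fingerprints at fixed positions; real systems pick
        # "representative" fingerprints by value instead.
        for off in range(0, len(packet) - WINDOW + 1, WINDOW):
            fp = rabin_fingerprint(packet[off:off + WINDOW])
            self.by_fp[fp] = (packet, off)

def encode(packet: bytes, cache: PacketCache) -> list:
    out, i = [], 0
    while i <= len(packet) - WINDOW:
        fp = rabin_fingerprint(packet[i:i + WINDOW])
        hit = cache.by_fp.get(fp)
        # Compare bytes to rule out a fingerprint collision.
        if hit and hit[0][hit[1]:hit[1] + WINDOW] == packet[i:i + WINDOW]:
            cached, off = hit
            # Expand the match to the largest possible matching string.
            length = WINDOW
            while (i + length < len(packet) and off + length < len(cached)
                   and packet[i + length] == cached[off + length]):
                length += 1
            out.append(Token(fp, off, length))  # send a token, not the bytes
            i += length
        else:
            out.append(packet[i:i + 1])  # literal byte
            i += 1
    out.append(packet[i:])   # trailing literal bytes
    cache.index(packet)      # update the sender cache
    return out
```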
Our Implementation
• The original Spring-Wetherall paper did not implement a
system with this redundancy scheme
• Our implementation encountered the following issues
Illustration of Interactions
• RE creates dependencies between packets
• Wireless links are loss-prone
• Chains of packets may be undecodable
• TCP performance may be severely affected (e.g., exponential backoff)
[Figure: a packet loss at the gateway (GW) leaves IP 2 and IP 3 undecodable.]
In-network caching algorithms create dependencies between IP packets,
which increase correlated losses and trigger TCP algorithms (e.g.,
exponential backoff).
Illustration of Protocol
Correctness Violation
[Figure: IP 1 is lost at the gateway (GW) and repeatedly re-transmitted;
IP 2, IP 3, and IP 4, which carry the same object, cannot be decoded.]
A single lost (or re-ordered) packet can stall a TCP connection!
New Algorithms
• Algorithm 1:
– Check TCP sequence numbers
– Packets only encoded with previous packets
– Variant: I-P frames (similar to MPEG)
• Algorithm 2:
– Flush caches upon packet retransmissions
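The two ideas can be sketched as follows; the function names and cache structures here are hypothetical simplifications, and the actual algorithms (including the I/P-frame variant) are more involved.

```python
# Hedged sketches of the two robustness ideas above.

def may_encode_against(candidate_seq: int, current_seq: int) -> bool:
    # Algorithm 1: encode a packet only against packets that come earlier
    # in TCP sequence-number space, so no packet ever depends on a later
    # (possibly still-undelivered) packet.
    return candidate_seq < current_seq

def on_retransmission(sender_cache: dict, receiver_cache: dict) -> None:
    # Algorithm 2: a retransmission signals that packets the caches may
    # depend on were lost; flushing both caches restores a consistent
    # (if temporarily less space-efficient) state.
    sender_cache.clear()
    receiver_cache.clear()
```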
Results
[Results figures]
Results: I/P frames
[Results figure]
Memory Deduplication
• Memory system, cache, file system may have
multiple entities which are identical
– For example, two pages in a memory system may be
identical
• Challenge is to identify duplicate entities, combine
them so that only one copy is stored and shared
• Maintain a hash of stored entities
• When two items hash to the same value, do a byte-by-byte
comparison to verify that the entities are in fact identical
– Use of a hash function significantly reduces the number of
comparison operations needed to check for identical entities
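A minimal sketch of this scheme in Python (the class name and layout are illustrative): hash each entity, and on a hash match fall back to a byte-by-byte comparison before sharing a single copy.

```python
# Deduplication sketch: hash each stored entity; on a hash match,
# verify byte-by-byte before sharing the single stored copy.

import hashlib

class DedupStore:
    def __init__(self):
        self.by_hash = {}  # digest -> distinct entities with that digest

    def store(self, entity: bytes) -> bytes:
        digest = hashlib.sha256(entity).digest()
        for existing in self.by_hash.setdefault(digest, []):
            if existing == entity:   # byte-by-byte check rules out collisions
                return existing      # share the one stored copy
        self.by_hash[digest].append(entity)
        return entity

# Two identical pages resolve to the same stored object:
store = DedupStore()
page_a = store.store(b"\x00" * 4096)
page_b = store.store(b"\x00" * 4096)
assert page_a is page_b
```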
Delta Encoding
• Multiple objects within a cache may be similar but
not identical
– Deduplication will not work
• Identify similar cached objects by taking Rabin
fingerprints of parts of cached objects and looking for
objects with similar Rabin fingerprints
• For objects with similar content, some of the objects
o1, o2,…, on can be stored as differences from one or
more base objects
– No need to store complete data for o1, o2,…, on
• If the overhead of decoding a delta-encoded object is an
issue, delta encoding can be restricted to cached objects
which are infrequently requested
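As an illustration, here is a delta-encoding sketch built on Python's standard difflib; a real system would use a binary delta format and, as described above, pick base objects by Rabin-fingerprint similarity.

```python
# Delta-encoding sketch: store an object as copy/insert operations against
# a similar base object, so only the differing bytes consume space.

import difflib

def delta_encode(base: bytes, obj: bytes) -> list:
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=base, b=obj).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # byte range shared with base
        else:
            ops.append(("insert", obj[j1:j2]))  # bytes unique to obj
    return ops

def delta_decode(base: bytes, ops: list) -> bytes:
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, i1, i2 = op
            parts.append(base[i1:i2])
        else:
            parts.append(op[1])
    return b"".join(parts)

base = b"GET /report/2011/inarc HTTP/1.1"
obj  = b"GET /report/2012/inarc HTTP/1.0"
assert delta_decode(base, delta_encode(base, obj)) == obj
```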
Compression
• Cached objects can be compressed to further
decrease memory/storage consumption
• If computational overhead of compression is a
problem, compression should only be applied
to cached objects which are infrequently
accessed.
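A minimal sketch of this policy, assuming zlib and an illustrative hotness flag supplied by the cache:

```python
# Compress only cold (infrequently accessed) cache entries so the CPU
# cost of compression stays off the hot path.

import zlib

def store_entry(value: bytes, hot: bool):
    """Return (compressed?, payload)."""
    return (False, value) if hot else (True, zlib.compress(value))

def load_entry(entry) -> bytes:
    compressed, payload = entry
    return zlib.decompress(payload) if compressed else payload
```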
Cache Replacement
• Determine how to make best use of cache
space
• LRU, Greedy-dual-size have been used in the
past
• We have developed new cache replacement
algorithms which are superior for DTNs
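For reference, here is a compact sketch of Greedy-Dual-Size, one of the baseline policies named above (the new DTN-specific algorithms are described on the following slides, not here). Each object gets credit H = L + cost/size; eviction removes the smallest H and raises the inflation value L, so recency, fetch cost, and size all influence replacement.

```python
# Greedy-Dual-Size sketch with lazy heap deletion; assumes an object's
# size does not change across re-accesses.

import heapq

class GreedyDualSize:
    def __init__(self, capacity: int):
        self.capacity, self.used, self.L = capacity, 0, 0.0
        self.entries = {}   # key -> (H, size)
        self.heap = []      # (H, key); stale entries skipped lazily

    def _make_room(self, need: int):
        while self.used + need > self.capacity and self.heap:
            h, victim = heapq.heappop(self.heap)
            entry = self.entries.get(victim)
            if entry and entry[0] == h:   # skip stale heap entries
                self.L = h                # inflate the baseline credit
                self.used -= entry[1]
                del self.entries[victim]

    def access(self, key, size: int, cost: float = 1.0):
        if key not in self.entries:
            self._make_room(size)
            self.used += size
        H = self.L + cost / size          # credit for this object
        self.entries[key] = (H, size)
        heapq.heappush(self.heap, (H, key))
```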
Caching: Basic Idea
• Utility-based data placement
– A unified probabilistic framework
– Ensure that the more popular data is always
cached nearer to the brokers
• Data utility is calculated based on
– Its chance to be forwarded to the brokers
– Its popularity in the network
• Each pair of caching nodes optimizes its data
placement upon contact
Data Utility
• Data specifications
– Data i is destined for brokers B1, …, Bk
– Data i has a finite lifetime T
• The utility uij of data i at node j evaluates the
benefit of caching data i at node j
– cij: The probability that i can be forwarded to the
brokers within T
– wi: The probability that data i will be retrieved in
the future
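The slide defines the two components of u_ij but not how they combine; a natural reading, and only an assumption here, is the product of the two probabilities:

```python
def utility(c_ij: float, w_i: float) -> float:
    """u_ij = c_ij * w_i (assumed combination): the probability that data i
    cached at node j reaches the brokers within its lifetime T, weighted by
    the probability that i will be retrieved in the future."""
    return c_ij * w_i
```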
Cached Data Placement
• Whenever two nodes contact, they exchange
their cached data to maximize the utilities of
the data they cache
– Hard replacement: only for data which is currently
requested by the brokers
– Soft replacement: for the other unrequested data
• Hard replacement is prioritized to ensure that
the requested data is forwarded to the brokers
Unified Knapsack Formulation
• When nodes A and B contact, put the data
cached on both A and B into the same
selection pool with size k
– Choose x_i ∈ {0, 1} to maximize Σ_i u_iA·x_i subject to Σ_i s_i·x_i ≤ S_A
– u_iA: utility of data i at A
– s_i: size of data i
– S_A: buffer size of A
• Similar for node B
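A sketch of the per-contact placement at node A as a 0/1 knapsack over the shared selection pool (node B is handled symmetrically); the dynamic program below is the textbook solver, not necessarily the paper's.

```python
# 0/1 knapsack over the pooled data: maximize total utility at A subject
# to A's buffer size S_A. pool holds (u_iA, s_i) pairs.

def place(pool: list, S_A: int) -> list:
    best = [(0.0, [])] * (S_A + 1)  # best[c] = (utility, chosen) within capacity c
    for idx, (u, s) in enumerate(pool):
        for c in range(S_A, s - 1, -1):   # descending: each item used at most once
            cand = best[c - s][0] + u
            if cand > best[c][0]:
                best[c] = (cand, best[c - s][1] + [idx])
    return best[S_A][1]

# Example: four data items (u_iA, s_i) competing for a buffer of size 10.
chosen = place([(0.9, 4), (0.5, 3), (0.8, 5), (0.3, 2)], 10)
```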
[Results figures: hit rate, data access delay, and overhead]
Military Relevance
• Our In-Network storage techniques are important for
communications between soldiers in real
deployments
– Handling disruptions
– Dealing with failed/captured nodes and unreliable network
links
– Handling lost communications, packet losses
• Collaborations with Robert Cole (CERDEC), John
Hancock (ArtisTech/IRC), Matthew Aguirre
(ArtisTech/IRC), Alice Leung (BBN/IRC)
– They are studying how our techniques can be applied to
military scenarios
– They are studying our DTN caching techniques to run experiments
on traces typical of military scenarios
• Joint work with Guohong Cao’s group, CNARC
Impact and Collaborations
• ICDCS 2011 paper co-authored with Guohong
Cao (Penn State) of CNARC
• Collaborations with Robert Cole (CERDEC),
John Hancock (ArtisTech/IRC), Matthew
Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC)
– They have used our DTN caching code to run
experiments on traces typical of military scenarios
Future Work
• Refine techniques on redundancy elimination
in network communications
• Study redundancy at other levels in the
system
• Study interactions, interferences, and
synergies between redundancy elimination at
different levels of the system
Summary and Conclusion
• New techniques for redundancy elimination
– Reduce space requirements for in-network
storage
• Redundancy elimination being performed at
multiple levels of the entire system
• New methods for making better use of caches
in DTNs
– These methods are of interest to other
collaborators in NS-CTA
Arun Iyengar’s NS-CTA
Contributions
• I2.1 - In-Network Storage is the only task funding Arun's research
• Research Contributions in caching, redundancy elimination summarized in
previous slides
• Publications:
– "Supporting Cooperative Caching in Disruption Tolerant Networks". W. Gao, G. Cao, A. Iyengar, M. Srivatsa. Accepted at ICDCS 2011.
– "Social-Aware Data Diffusion in Delay Tolerant MANETs". Yang Zhang, Wei Gao, Guohong Cao, Tom La Porta, Bhaskar Krishnamachari, Arun Iyengar. Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer.
– "Provenance driven data dissemination in disruption tolerant networks". M. Srivatsa, W. Gao, A. Iyengar. Under submission, Fusion 2011.
– "Resolving Negative Interferences between In-Network Caching Methods". F. Le, M. Srivatsa, A. Iyengar, G. Cao. Under preparation.
• Patent Application:
– "System and method for caching provenance information", Wei Gao, Arun
Iyengar (lead inventor), Mudhakar Srivatsa, IBM patent application.
• Initiated and established collaboration with Guohong Cao’s group
(CNARC).
– Mentor for Wei Gao (Guohong Cao’s PhD student) for internship at IBM.
• Initiated and established collaboration with Robert Cole (CERDEC), John
Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung
(BBN/IRC)
Data Consistency
• Problem: How to make sure cached data is
current
– Resolving inconsistencies between different copies
• Data consistency in DTNs
– Limited connectivity can make it difficult to
achieve strong consistency
– Expiration times can be used for maintaining
cache consistency
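A minimal sketch of expiration-based consistency: each cached copy carries an expiration time, and a stale entry is treated as a miss so the caller refetches the current copy from the source.

```python
import time

class ExpiringCache:
    def __init__(self):
        self.data = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds: float):
        self.data[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None or time.time() >= entry[1]:
            self.data.pop(key, None)   # expired: treat as a cache miss
            return None                # caller refetches the current copy
        return entry[0]
```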