User-level Internet Path Diagnosis
Ratul Mahajan, Neil Spring, David Wetherall and Thomas Anderson
Designed by Yao Zhao
A distributed system is one in which the failure of a
computer you didn’t even know existed can render
your own computer unusable.
L. Lamport
Motivation
Can end users, with no special privileges, identify and pinpoint
faults inside the network that degrade the performance of their
applications?
Why (unprivileged) end users?
Operators do not share the users’ view of the network
Operators may have no more insight than unprivileged
users for problems inside other administrative domains
Users can directly contact the responsible ISP, leading to
faster problem resolution
Many techniques are more effective and scalable when guided by
fault localization than when blindly trying all possibilities
Outline
Diagnosis architecture
Diagnosis Tool: Tulip
Evaluation
Recommendations
Conclusion
Problem
An Ideal Trace-based Solution
Routers log packet activity and make
these traces available to users.
The log at each router is recorded for
both input and output interfaces.
Impractical for deployment
Packet-based Solutions
Complete Embedding
Reduced Embedding
Remove the step of embedding the complete input packet in
the output packet
Constant Space Embedding
Each router along the path records information into each
packet that it forwards.
Barring two exceptions, the scheme above is equivalent to
the path trace.
Sample TTL
Real Clocks
Unsynchronized clock
Finite precision
New Packet Header Fields in the Architecture
Internet Approximations
Out-of-band measurement probes
ICMP timestamp requests to access time at the router
IP identifiers instead of per-flow counters
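The ICMP timestamp probe mentioned above can be illustrated with a minimal sketch that only builds the request bytes; it does not send anything. The message layout (type 13, code 0, checksum, identifier, sequence number, then three 32-bit timestamps in milliseconds since midnight UT) follows RFC 792. The function names are hypothetical and are not taken from Tulip.

```python
import struct

ICMP_TIMESTAMP_REQUEST = 13  # RFC 792 message type for a timestamp request


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ms_since_midnight_utc(epoch_seconds: float) -> int:
    """Timestamp format carried in ICMP timestamp messages."""
    return int(epoch_seconds * 1000) % 86_400_000


def build_timestamp_request(ident: int, seq: int, originate_ms: int) -> bytes:
    """Hypothetical helper: returns the 20-byte ICMP timestamp request."""
    # Receive and transmit timestamps are left at zero for the router to fill.
    body = struct.pack("!III", originate_ms, 0, 0)
    header = struct.pack("!BBHHH", ICMP_TIMESTAMP_REQUEST, 0, 0, ident, seq)
    checksum = internet_checksum(header + body)
    header = struct.pack("!BBHHH", ICMP_TIMESTAMP_REQUEST, 0, checksum, ident, seq)
    return header + body
```

Recomputing the checksum over the finished packet should give zero, which is a convenient sanity check before handing the bytes to a socket.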
Packet Reordering
Assumptions for Packet Loss
IP-IDs are consecutive
Small packets usually have a low loss rate
80% of the time from over 90% of the routers
In over 60% of the cases when any packet in the
triplet was lost, only the data packet was lost.
ICMP rate-limiting will not be mistaken for
packet loss
One more check packet
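One way to read the assumptions above is as a classification rule for a probe triplet (check packet, data packet, check packet). The sketch below is an illustration under assumptions, not Tulip's code: it assumes the router increments its IP-ID by one per generated response, so a gap of one between the two check responses suggests the data packet never reached the router, while a gap of two suggests the router answered and the response was lost on the way back. All names are hypothetical.

```python
from typing import Optional

IPID_MODULUS = 1 << 16  # the IP identifier is a 16-bit field and wraps around


def ipid_gap(first: int, second: int) -> int:
    """Distance from first to second IP-ID, allowing for wraparound."""
    return (second - first) % IPID_MODULUS


def classify_triplet(
    check1_ipid: Optional[int],
    data_ipid: Optional[int],
    check2_ipid: Optional[int],
) -> str:
    """Each argument is the IP-ID of a response, or None if none arrived."""
    if check1_ipid is None or check2_ipid is None:
        return "inconclusive"  # a check packet was lost, nothing to compare
    if data_ipid is not None:
        return "no_loss"
    gap = ipid_gap(check1_ipid, check2_ipid)
    if gap == 1:
        return "forward_loss"  # assumed: router never saw the data packet
    if gap == 2:
        return "reverse_loss"  # assumed: router replied, reply was lost
    return "inconclusive"  # other traffic consumed IP-IDs in between
```

For example, `classify_triplet(100, None, 102)` falls into the reverse-loss case under these assumptions.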
Packet Loss
Packet Queuing
Similar to cing
Two practical problems:
ICMP generation time
Cable modems and wireless links
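Because the local and router clocks are not synchronized, a single timestamp difference says little on its own. A common workaround, sketched here as an assumption rather than as Tulip's exact method, is to treat the smallest observed difference as the no-queuing baseline and report each sample relative to it. The function name and the sample format are hypothetical, and the sketch ignores the midnight wraparound of ICMP timestamps.

```python
from typing import List, Sequence, Tuple


def queuing_delays(samples: Sequence[Tuple[int, int]]) -> List[int]:
    """Estimate per-probe queuing delay in milliseconds.

    Each sample is (local_send_ms, router_timestamp_ms). The unknown clock
    offset is the same in every sample, so it cancels when the minimum
    difference is subtracted.
    """
    if not samples:
        return []
    raw = [router_ms - sent_ms for sent_ms, router_ms in samples]
    baseline = min(raw)  # assumed to be a probe that met no queue
    return [value - baseline for value in raw]
```

The estimate is only as fine as the millisecond resolution of the router timestamps, and it absorbs the ICMP generation time noted above.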
Tulip
Network Load
Diagnosis time
BL/W
10 ~ 30 min per path
Parallel search vs Binary search
Two or more faults?
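The binary search option can be sketched as follows. This is a minimal illustration, assuming the end-to-end path already shows the fault and that a fault visible on the path to hop h is also visible on the path to every later hop. The predicate `prefix_is_faulty` is a hypothetical stand-in for a round of measurements to one hop.

```python
from typing import Callable


def locate_fault(path_length: int, prefix_is_faulty: Callable[[int], bool]) -> int:
    """Return the first hop (1-based) whose path prefix shows the fault."""
    low, high = 1, path_length
    while low < high:
        mid = (low + high) // 2
        if prefix_is_faulty(mid):
            high = mid  # fault is at or before mid
        else:
            low = mid + 1  # fault lies beyond mid
    return low
```

Under these assumptions the search measures about log2 of the path length in hops, one after another, whereas a parallel search would measure every hop at once, trading extra network load for a shorter diagnosis time.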
Methodology
Evaluate applicability
Diagnosis granularity
Three sources: MIT, U Washington and London
Destinations from Skitter
Validation
Diagnosis granularity (1)
Diagnosis granularity (2)
Validation
IP-IDs and ICMP timestamps vs. end-to-end measurement
Tulip vs Sting
Consistency of Tulip’s inferences
Consistency between Tulip and Paths
Two facts
Locating Loss and Delay in the Internet
Persistence of Faults
Limitations of Tulip
Out-of-band measurements
Stable routing path
IP-ID counters
Limitations of ICMP timestamps
In-band vs Out-of-band Diagnosis
Priority of protocols
Packet drop
Packet size
Loss rate
Reordering
Other Recommendations
Path Verification
IP Identifiers
Router Timestamps
Related Work
Diagnosis Approaches
Measurement Primitives
Magpie
SPIE
NetFlow
Overlay primitives
IPMP
Measurement Tools
PING, Traceroute, pathchar, Sting
Conclusion
Tulip
Practical tool to diagnose packet reordering, loss and queuing
Diagnosis architecture
In-band
Lightweight
Questions?