User-level Internet Path Diagnosis

Ratul Mahajan, Neil Spring, David Wetherall, and Thomas Anderson
Designed by Yao Zhao
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
L. Lamport
Motivation

Can end users, with no special privileges, identify and pinpoint faults inside the network that degrade the performance of their applications?

Why (unprivileged) end users?




Operators do not share the users’ view of the network
Operators may have no more insight than unprivileged users into problems inside other administrative domains
Users can directly contact the responsible ISP, leading to faster problem resolution
Many techniques are more effective and scalable with fault localization than with blindly trying all possibilities
Outline





Diagnosis architecture
Diagnosis Tool: Tulip
Evaluation
Recommendations
Conclusion
Problem
An Ideal Trace-based Solution



Routers log packet activity and make these traces available to users.
The log at each router is recorded for both input and output interfaces.
Impractical for deployment
Packet-based Solutions

Complete Embedding



Reduced Embedding


Remove the step of embedding the complete input packet in the output packet
Constant Space Embedding


Each router along the path records information into each packet that it forwards.
Barring two exceptions, the scheme above is equivalent to the path trace.
Sample TTL
Real Clocks


Unsynchronized clock
Finite precision
New Fields of Packet Header in the Architecture
Internet Approximations



Out-of-band measurement probes
ICMP timestamp requests to access time at the router
IP identifiers instead of per-flow counters
Packet Reordering
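
A minimal sketch of how IP identifiers could stand in for per-flow counters when testing for reordering. It assumes the router stamps each response with an incrementing 16-bit IP-ID, so the IDs reveal the order in which the router handled two probes. The function and parameter names are hypothetical and are not taken from Tulip's code.

```python
IPID_MODULUS = 1 << 16  # the IP identification field is 16 bits wide


def ipid_before(a: int, b: int) -> bool:
    """Return True if IP-ID a was generated before b, allowing for wraparound."""
    return 0 < (b - a) % IPID_MODULUS < IPID_MODULUS // 2


def classify_reordering(id_resp1: int, id_resp2: int,
                        resp1_arrived_first: bool) -> dict:
    """Classify reordering for two probes sent back to back (probe 1 first).

    id_resp1 / id_resp2 are the IP-IDs of the responses to probe 1 and probe 2.
    Assumption: the router increments its IP-ID for every packet it generates.
    """
    # Forward reordering: the router answered probe 2 before probe 1.
    forward = ipid_before(id_resp2, id_resp1)
    # Reverse reordering: responses arrived in a different order than the
    # router generated them.
    resp1_generated_first = not forward
    reverse = resp1_arrived_first != resp1_generated_first
    return {"forward": forward, "reverse": reverse}
```

For example, `classify_reordering(11, 10, False)` reports forward reordering only: the router saw probe 2 first, and its responses came back in the order they were generated.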
Assumptions for Packet Loss

IP-IDs are consecutive (80% of the time from over 90% of the routers)
Small packets usually have a low loss rate (in over 60% of the cases when any packet in the triplet was lost, only the data packet was lost)
ICMP rate-limiting will not be mistaken for packet loss (one additional check packet)
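
The assumptions above can be turned into a rough classification rule. This is a sketch only: it assumes a probe triplet of a small control packet, a data packet, and a second small control packet, each eliciting a response that carries the router's IP-ID. The name `classify_triplet` and the returned labels are hypothetical, and the extra check packet for ICMP rate-limiting is not modeled.

```python
from typing import Optional

IPID_MODULUS = 1 << 16  # the IP identification field is 16 bits wide


def classify_triplet(first_id: Optional[int],
                     data_id: Optional[int],
                     last_id: Optional[int]) -> str:
    """Classify one probe triplet from the IP-IDs of its responses.

    None means the corresponding response never arrived.
    Assumption: the router's IP-ID counter advances by one per generated packet.
    """
    if first_id is None or last_id is None:
        # A control response is missing, so the IP-ID gap cannot be computed.
        return "inconclusive"
    if data_id is not None:
        return "no_loss"
    gap = (last_id - first_id) % IPID_MODULUS
    if gap == 1:
        # The router never generated a response: the data packet was lost
        # on the way to the router.
        return "forward_loss"
    if gap == 2:
        # The router generated a response that was lost on the way back.
        return "reverse_loss"
    # Other traffic advanced the counter; the consecutive-ID assumption failed.
    return "inconclusive"
```

A gap larger than 2 is treated as inconclusive rather than guessed at, which matches the reliance on consecutive IP-IDs stated above.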
Packet Loss
Packet Queuing


Similar to cing
Two practical problems:
ICMP generation time
Cable modems and wireless links
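
A small sketch of the queuing estimate under stated assumptions: each sample pairs the local send time with the router's ICMP timestamp reply, both in milliseconds, and the clock offset between host and router is constant over the measurement. Subtracting the smallest observed difference then removes the offset and the fixed path delay, leaving the variable (queuing) part. The function name is hypothetical. Clock skew, the midnight wrap of ICMP timestamps, and the two practical problems listed above are not handled.

```python
from typing import List, Sequence, Tuple


def queuing_delays_ms(samples: Sequence[Tuple[int, int]]) -> List[int]:
    """Estimate forward queuing delay per sample.

    samples: (local_send_ms, router_timestamp_ms) pairs.
    Assumption: the unknown clock offset is the same for every sample.
    """
    if not samples:
        return []
    # Raw difference = clock offset + propagation delay + queuing delay.
    raw = [router_ms - send_ms for send_ms, router_ms in samples]
    # The minimum is taken as the sample that saw no queuing.
    baseline = min(raw)
    return [r - baseline for r in raw]
```

The result is only as good as the baseline: if every probe was queued, the estimate understates the true delay.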
Tulip

Network Load


Diagnosis time


BL/W
10 ~ 30 min per path
Parallel search vs Binary search

Two or more faults?
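
The trade-off between the two search strategies can be sketched as follows. The sketch assumes a hypothetical measurement callback `prefix_is_faulty(i)` that probes the path up to hop `i` and reports whether the fault is visible there, and it assumes a single fault, so that every prefix beyond the faulty hop also looks faulty. With two or more faults the binary search still finds the first one, but nothing after it.

```python
from typing import Callable, Optional, Sequence


def binary_search_fault(hops: Sequence[str],
                        prefix_is_faulty: Callable[[int], bool]) -> Optional[int]:
    """Return the index of the first hop at which the fault is visible."""
    if not hops or not prefix_is_faulty(len(hops) - 1):
        return None  # the full path looks healthy
    lo, hi = 0, len(hops) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix_is_faulty(mid):
            hi = mid      # fault is at mid or earlier
        else:
            lo = mid + 1  # fault is after mid
    return lo


def parallel_search_fault(hops: Sequence[str],
                          prefix_is_faulty: Callable[[int], bool]) -> Optional[int]:
    """Probe every prefix (conceptually at once) and return the first faulty hop."""
    results = [prefix_is_faulty(i) for i in range(len(hops))]
    for i, faulty in enumerate(results):
        if faulty:
            return i
    return None
```

Binary search issues roughly log2(n) measurements one after another, so it loads the network less. Parallel search issues n measurements but can run them concurrently, which shortens diagnosis time.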
Methodology

Evaluate applicability




Diagnosis granularity
Three sources: MIT, U Washington, and London
Destinations from Skitter
Validation
Diagnosis granularity (1)
Diagnosis granularity (2)
Validation




IP-IDs and ICMP timestamps vs. end-to-end measurement
Tulip vs Sting
Consistency of Tulip’s inferences
Consistency between Tulip and Paths
Two facts


Locating Loss and Delay in the Internet
Persistence of Faults
Limitations of Tulip




Out-of-band measurements
Stable routing path
IP-ID counters
Limitations of ICMP timestamps
In-band vs Out-of-band Diagnosis

Priority of protocols


Packet drop
Packet size


Loss rate
Reordering
Other Recommendations

Path Verification

IP Identifiers

Router Timestamps
Related Work

Diagnosis Approaches




Measurement Primitives



Magpie
SPIE
NetFlow
Overlay primitives
IPMP
Measurement Tools

PING, Traceroute, pathchar, Sting
Conclusion

Tulip


Practical tool to diagnose packet reordering, loss, and queuing
Diagnosis architecture


In-band
Lightweight
Questions?