Network diagnostics made easy

Matt Mathis
3/17/2005
The Wizard Gap
• The non-experts are falling behind

Year    Experts     Non-experts   Ratio
1988    1 Mb/s      300 kb/s      3:1
1991    10 Mb/s
1995    100 Mb/s
1999    1 Gb/s
2003    10 Gb/s     3 Mb/s        3000:1
2004    40 Gb/s

• Why?
TCP tuning requires expert knowledge
• By design TCP/IP hides the ‘net from upper layers
– TCP/IP provides basic reliable data delivery
– The “hourglass” between applications and networks
• This is a good thing, because it allows:
– Old applications to use new networks
– New applications to use old networks
– Invisible recovery from data loss, etc.
• But then (nearly) all problems have the same symptom
– Lower than expected performance
– The details are hidden from nearly everyone
TCP tuning is really debugging
• Six classes of bugs limit performance
– Too small TCP retransmission or reassembly buffers (see the sketch after this list)
– Packet losses, congestion, etc.
– Packets arriving out of order or even duplicated
– “Scenic” IP routing or excessive round-trip times
– Improper packet sizes (MTU/MSS)
– Inefficient or inappropriate application designs
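To make the first class concrete, here is a minimal sketch (mine, not one of the tools discussed here) that prints a socket's default buffer sizes; defaults far below the path's bandwidth-delay product are the classic instance of that bug:

/* Minimal sketch: print the default socket buffer sizes that bound
 * TCP's window. Values far below bandwidth * RTT for the intended
 * path indicate the "too small buffers" bug class. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    int sndbuf = 0, rcvbuf = 0;
    socklen_t len = sizeof(sndbuf);

    getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    len = sizeof(rcvbuf);
    getsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
    printf("default SO_SNDBUF = %d bytes, SO_RCVBUF = %d bytes\n",
           sndbuf, rcvbuf);
    return 0;
}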
TCP tuning is painful debugging
• All problems reduce performance
– But the specific symptoms are hidden
• But any one problem can prevent good performance
– Completely masking all other problems
• Trying to fix the weakest link of an invisible chain
– General tendency is to guess and “fix” random parts
– Repairs are sometimes “random walks”
– Repair one problem at a time, at best
The Web100 project
• When there is a problem, just ask TCP
– TCP has the ideal vantage point
• In between the application and the network
– TCP already “measures” key network parameters
• Round Trip Time (RTT) and available data capacity
• Can add more
– TCP can identify the bottleneck
• Why did it stop sending data?
– TCP can even adjust itself
• “autotuning” eliminates one of the six classes of bugs
See: www.web100.org
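As a small taste of “asking TCP” on a stock system, a hedged sketch using the mainline Linux TCP_INFO socket option, which exposes a tiny subset of the instrumentation Web100 adds:

/* Sketch: read TCP's own measurements for a connected socket fd.
 * struct tcp_info is the stock-Linux cousin of Web100's instruments. */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_INFO, struct tcp_info */
#include <sys/socket.h>

void report(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
        printf("rtt = %u us, cwnd = %u pkts, retransmits = %u\n",
               ti.tcpi_rtt, ti.tcpi_snd_cwnd, ti.tcpi_total_retrans);
}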
Key Web100 components
• Better instrumentation within TCP
– 120 internal performance monitors
– Poised to become an Internet-standard “MIB”
• TCP Autotuning
– Selects the ideal buffer sizes for TCP
– Eliminates the need for user expertise
• Basic network diagnostic tools
– Requires less expertise than prior tools
• Excellent for network admins
• But still not useful for end users
Web100 Status
• Two year no-cost extension
– Standardization can only be pushed after most of the work is done
– Ongoing support of research users
• Partial adoption
– Current Linux includes (most of) autotuning
• John Heffner is maintaining patches for the rest of Web100
– Microsoft
• Experimental TCP instrumentation
• Working on autotuning (to support FTTH)
– IBM “z/OS Communications Server”
• Experimental TCP instrumentation
The next step
• Web100 tools still require too much expertise
– They are not really end-user tools
– Too easy to overlook problems
– Current diagnostic procedures are still cumbersome
• New insight from Web100 experience
– Nearly all symptoms scale with round trip time
• New NSF funding
– Network Path and Application Diagnosis
– 3 years; we are at the midpoint
Nearly all symptoms scale with RTT
• For example
– TCP Buffer Space, Network loss and reordering, etc
– On a short path TCP can compensate for the flaw
• Local Client to Server: all applications work
– Including all standard diagnostics
• Remote Client to Server: all applications fail
– Leading to other components being falsely implicated
Examples of flaws that scale
• Chatty application (e.g., 50 transactions per request, each costing one round trip)
– On a 1 ms LAN, this adds 50 ms to user response time
– On a 100 ms WAN, this adds 5 s to user response time
• Fixed TCP socket buffer space (e.g., 32kBytes)
– On a 1 ms LAN, limits throughput to 200 Mb/s
– On a 100 ms WAN, limits throughput to 2 Mb/s
• Packet Loss (e.g., 1% loss with 9kB packets)
– On a 1ms LAN, models predict 500 Mb/s
– On a 100ms WAN, models predict 5 Mb/s
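A back-of-the-envelope sketch of these scalings, using the well-known Mathis et al. throughput model rate ≈ (MSS/RTT) · C/sqrt(p). The constant C ≈ 0.7 is my assumption; it reproduces the loss figures above, and the buffer arithmetic comes out slightly above the slide's rounded 200 Mb/s:

/* Sketch: how each flaw's symptom scales with RTT.
 * Constants are illustrative; link with -lm. */
#include <math.h>
#include <stdio.h>

static void scale(double rtt)   /* rtt in seconds */
{
    double chatty = 50 * rtt;                          /* 50 round trips, s */
    double buffer = 32e3 * 8 / rtt;                    /* 32 kB window, b/s */
    double loss = (9000 * 8 / rtt) * 0.7 / sqrt(0.01); /* 9 kB MSS, 1% loss */

    printf("RTT %5.1f ms: chatty +%4.2f s, buffer %7.2f Mb/s, "
           "loss model %7.2f Mb/s\n",
           rtt * 1e3, chatty, buffer / 1e6, loss / 1e6);
}

int main(void)
{
    scale(0.001);   /* 1 ms LAN   */
    scale(0.100);   /* 100 ms WAN */
    return 0;
}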
Review
• For nearly all network flaws
– The only symptom is reduced performance
– But the reduction is scaled by RTT
• On short paths many flaws are undetectable
– False pass for even the best conventional diagnostics
– Leads to faulty inductive reasoning about flaw locations
– This is the essence of the “end-to-end” problem
– Current state of the art relies on tomography and complicated inference techniques
Our new technique
• Specify target performance from the server (S) to the remote client (RC)
• Measure the performance from S to a local client (LC)
• Use Web100 to collect detailed statistics
– Loss, delay, queuing properties, etc
• Use models to extrapolate results to RC
– Assume that the rest of the path is ideal
• Pass/Fail on the basis of extrapolated performance
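A hedged sketch of the extrapolation step; the structure and numbers are illustrative (chosen to echo the example output on the next slide), not the tool's actual code:

/* Sketch: extrapolate statistics measured on the short S-to-LC
 * section to the full target path, assuming the rest is ideal.
 * Link with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double loss_rate = 0.00025;   /* measured on the short section */
    double mss_bits  = 1448 * 8;  /* 1448-byte MSS                 */
    double target_rate = 4e6;     /* goal: 4 Mb/s ...              */
    double target_rtt  = 0.200;   /* ... over a 200 ms path        */

    /* Mathis et al. model with the full-path RTT substituted in */
    double predicted = (mss_bits / target_rtt) * 0.7 / sqrt(loss_rate);

    printf("%s: extrapolated %.2f Mb/s vs target %.2f Mb/s\n",
           predicted >= target_rate ? "Pass" : "Fail",
           predicted / 1e6, target_rate / 1e6);
    return 0;
}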
Example diagnostic output
End-to-end goal: 4 Mb/s over a 200 ms path including this section
Tester at IP address: xxx.xxx.115.170 Target at IP address: xxx.xxx.247.109
Warning: TCP connection is not using SACK
Fail: Received window scale is 0, it should be 2.
Diagnosis: TCP on the test target is not properly configured for this path.
> See TCP tuning instructions at http://www.psc.edu/networking/perf_tune.html
Pass data rate check: maximum data rate was 4.784178 Mb/s
Fail: loss event rate: 0.025248% (3960 pkts between loss events)
Diagnosis: there is too much background (non-congested) packet loss.
The events averaged 1.750000 losses each, for a total loss rate of 0.0441836%
FYI: To get 4 Mb/s with a 1448 byte MSS on a 200 ms path the total
end-to-end loss budget is 0.010274% (9733 pkts between losses).
Warning: could not measure queue length due to previously reported bottlenecks
Diagnosis: there is a bottleneck in the tester itself or test target
(e.g. insufficient buffer space or too much CPU load)
> Correct previously identified TCP configuration problems
> Localize all path problems by testing progressively smaller sections of the full path.
FYI: This path may pass with a less strenuous application:
Try rate=4 Mb/s, rtt=106 ms
Or if you can raise the MTU:
Try rate=4 Mb/s, rtt=662 ms, mtu=9000
Some events in this run were not completely diagnosed.
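The two FYI alternatives fall out of the same loss-model scaling: achievable rate is roughly proportional to MSS / (RTT · sqrt(p)). With the measured loss held fixed, the 4 Mb/s goal becomes feasible if the RTT drops to about 106 ms; raising the MTU to 9000 multiplies the MSS by roughly 9000/1448 ≈ 6.2, relaxing the tolerable RTT by the same factor (106 ms × 6.2 ≈ 660 ms, in line with the 662 ms shown).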
Key features
• Results are specific and less technical
– Provides a list of action items to be corrected
– Provides enough detail for escalation
• Eliminates false pass test results
• Test becomes more sensitive on shorter paths
– Conventional diagnostics, by contrast, become less sensitive
– Depending on models, ours is perhaps too sensitive
• The new problem is false fail
• Flaws no longer mask other flaws
– A single test often detects several flaws
– They can be repaired in parallel
Some demos
wget http://www.psc.edu/~mathis/src/diagnostic-client.c
cc diagnostic-client.c -o diagnostic-client
./diagnostic-client kirana.psc.edu 70 90
Local server information
• Current servers are single-threaded
– Silent wait if busy
• Kirana.psc.edu
– GigE attached directly to 3ROX
– Outside the PSC firewall
– Optimistic results to .61., .58. and .59. subnets
• Scrubber.psc.edu
– GigE attached in WEC
– Interfaces on .65. and .66. subnets
• Can be run on other Web100 systems
– E.g. Application Gateways
The future
• Collect (local) network pathologies
– Raghu Reddy is coordinating
– Keep archived data to improve the tool
– Harden the diagnostic server
• Widen testers to include attached campuses
– 3ROX (Three Rivers Optical Exchange) customers
– CMU, Pitt, PSU, etc
– Expect to find many more “interesting” pathologies
• Replicate server at NCAR (FRGP) for their campuses
Related work
• Also looking at finding flaws in applications
– An entirely different set of techniques
• But symptom scaling still applies
– Provide LAN tools to emulate ideal long paths
• Support local bench testing
• For example, classic ssh
– Long known performance problems
– Recently diagnosed to be due to internal flow control
– Chris Rapier developed a patch
• Already running on many PSC systems
– See: http://www.psc.edu/networking/projects/hpn-ssh/
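For context on why internal flow control hurts (my gloss, not from the slides): ssh enforces its own per-channel flow-control window on top of TCP, so it behaves exactly like the fixed-socket-buffer example earlier, capping throughput near window/RTT no matter how well TCP itself is tuned; the hpn-ssh patch addresses that limit.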