The Performance Bottleneck Application, Computer, or Network

Richard Carlson
eVLBI Workshop – Performance Tuning Tutorial
September 17, 2006
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
Basic Premise
• An application’s performance should meet
your expectations!
• If it doesn’t, you should complain!
• But you have to complain effectively.
Questions
• How many times have you said:
• What’s wrong with the network?
• Why is the network so slow?
• Do you have any way to find out?
• Tools to check local host
• Tools to check local network
• Tools to check end-to-end path
Unfortunate Reality
• Every problem, regardless of cause,
exhibits the same symptom
• The application performance doesn’t meet
the user’s expectations!
Possible Bottlenecks
• Network infrastructure
• Host computer/appliance
• Application design
Simple Network Picture
[Figure: Bob’s host and Carol’s host connected through shared network infrastructure: switches 1 to 4 and routers R1 to R9]
Network Infrastructure Bottlenecks
• Links too small
• Using FastEthernet instead of Gigabit Ethernet
• Links congested
• Too many hosts crossing this link
• Scenic routing
• End-to-end path is longer than it needs to be
• Broken equipment
• Bad NIC, broken wire/cable, cross-talk
• Administrative restrictions
• Firewalls, filters, shapers, restrictors
Host Computer Bottlenecks
• CPU utilization
• What else is the processor doing?
• Memory limitations
• Main memory and network buffers
• I/O bus speed
• Getting data into and out of the NIC
• Disk access speed
Application Behavior Bottlenecks
• Chatty protocol
• Lots of short messages between peers
• High reliability protocol
• Send packet and wait for reply before
continuing
• No run-time tuning options
• Use only default settings
• Blaster protocol
• Ignore congestion control feedback
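To see why a chatty or send-and-wait protocol suffers over distance, here is a minimal Python sketch. It assumes that the send-and-wait sender has exactly one message outstanding per round trip and that the windowed sender is idealized. The 1,460-byte message size and the 64 KB window are illustrative assumptions, not values from this talk.

```python
def stop_and_wait_bps(message_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Throughput when every message must be answered before the next is sent.

    Each message costs its own transmission time plus one full round trip.
    """
    bits = message_bytes * 8
    return bits / (bits / link_bps + rtt_s)


def windowed_bps(window_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Idealized throughput when up to `window_bytes` may be in flight per round trip."""
    return min(window_bytes * 8 / rtt_s, link_bps)


if __name__ == "__main__":
    rtt = 0.050   # 50 ms round trip, the value used in the example later in this talk
    link = 100e6  # 100 Mbps Fast Ethernet
    # Hypothetical message and window sizes, chosen only for illustration.
    print(f"stop-and-wait, 1460 B messages: {stop_and_wait_bps(1460, rtt, link) / 1e6:.3f} Mbps")
    print(f"64 KB window:                   {windowed_bps(65536, rtt, link) / 1e6:.2f} Mbps")
```

On a 50 ms path the send-and-wait sender stays well under 1 Mbps on a 100 Mbps link. The wire sits idle while each reply travels back.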
Problems, Problems, Problems
• Problems can exist at multiple levels
• Network infrastructure
• Host computer
• Application design
• Multiple problems can exist at the same
time
• All problems must be found and fixed
before things get better
Transport Protocols 101
• Transmission Control Protocol (TCP)
• Provides applications with a reliable, in-order
delivery service
• The most widely used Internet transport
protocol
• Web, file transfers, email, P2P, remote login
• User Datagram Protocol (UDP)
• Provides applications with an unreliable delivery
service
• RTP, DVTS, DNS
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
Remote Image Processing
• Carol is analyzing astronomical images.
Bob needs to send a data file containing
digital images (50 MB per file) to Carol
every ½ hour. Bob and Carol are 2,000
miles apart. How long should each
transfer take?
• 5 minutes?
• 1 minute?
• 5 seconds?
What should we expect?
• Assumptions:
• 100 Mbps Fast Ethernet is the slowest link
• 50 msec round trip time
• Bob & Carol calculate:
• 50 MB * 8 = 400 Mbits
• 400 Mb / 100 Mb/sec = 4 seconds
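Bob and Carol’s estimate can be written as a small Python helper. Like the slide, this sketch treats the slowest link as the only limit and ignores protocol overhead and TCP start-up. It also backs out the effective rate from a measured time, which is the same calculation the talk applies once test results come in.

```python
def ideal_transfer_s(file_mb: float, bottleneck_mbps: float) -> float:
    """Best-case transfer time: file size in megabits over the slowest link's rate."""
    file_megabits = file_mb * 8             # 50 MB * 8 = 400 Mb
    return file_megabits / bottleneck_mbps  # 400 Mb / 100 Mb/s = 4 s


def effective_mbps(file_mb: float, measured_s: float) -> float:
    """Effective rate implied by a measured transfer time."""
    return file_mb * 8 / measured_s


if __name__ == "__main__":
    print(ideal_transfer_s(50, 100))             # 4.0
    print(f"{effective_mbps(50, 18 * 60):.2f}")  # 0.37 (the 18-minute result)
    print(effective_mbps(50, 40))                # 10.0 (the 40-second result)
```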
Initial Test Results
• 18 Minutes!!! This is unacceptable!
• First, look for a network infrastructure
problem
• Use the NDT tester to examine both hosts
[Figure: Initial NDT testing shows a duplex mismatch at one end]
NDT Found Duplex Mismatch
• Investigation shows that the switch port is
configured for 100 Mbps full-duplex operation.
• The network administrator corrects the
configuration and asks for a re-test
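On a Linux host, you can read an interface’s negotiated speed and duplex from sysfs. This gives the host’s side of a possible mismatch. A common cause of this symptom is a switch port forced to full duplex while the host auto-negotiates and falls back to half duplex. The minimal Python sketch below assumes Linux sysfs (`/sys/class/net/<iface>/speed` and `duplex`). The interface name `eth0` is hypothetical, and the switch port’s setting has to come from the network administrator.

```python
from pathlib import Path
from typing import Tuple


def read_link_settings(iface: str, sysfs_root: str = "/sys/class/net") -> Tuple[int, str]:
    """Return (speed in Mb/s, duplex) as reported by the Linux kernel for `iface`.

    Reading these files raises OSError if the interface is down or the driver
    does not report link settings.
    """
    base = Path(sysfs_root) / iface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()  # "full", "half" or "unknown"
    return speed, duplex


def duplex_mismatch(host_duplex: str, switch_duplex: str) -> bool:
    """True when the two ends of the link disagree about duplex mode."""
    return host_duplex.lower() != switch_duplex.lower()


if __name__ == "__main__":
    iface = "eth0"           # hypothetical interface name
    switch_setting = "full"  # hypothetical: as reported by the network administrator
    try:
        speed, duplex = read_link_settings(iface)
    except (OSError, ValueError) as exc:
        print(f"could not read link settings for {iface}: {exc}")
    else:
        print(f"{iface}: {speed} Mb/s, {duplex} duplex")
        if duplex_mismatch(duplex, switch_setting):
            print(f"duplex mismatch: host={duplex}, switch={switch_setting}")
```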
[Figure: Duplex Mismatch Corrected]
[Figure: SCP results after duplex mismatch corrected]
Intermediate Results
• Time dropped from 18 minutes to 40
seconds.
• Is this acceptable???
• Remember your calculations said it
should take 4 seconds.
• 400 Mb / 40 sec = 10 Mbps
• Why are we limited to 10 Mbps?
• Are you satisfied with 1/10th of the possible
performance?
[Figure: Default TCP window size]
Calculating the Window Size
• Remember Bob found the round-trip
time was 50 msec
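• Calculate the throughput limit set by the
default window (1 KB = 1,024 B)
• 85.3 KB * 8 b/B = 698,777 b
• 698,777 b / 0.050 s = 13.98 Mbps
• Stated another way
• Sending one window takes 698,777 b / 100 Mb/s = 6.99 msec
• The link then sits idle for 43 msec of every 50 msec RTT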
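You can check the window arithmetic above with a short Python sketch. Like the slide, it takes 1 KB = 1,024 bytes and 1 Mbps = 10^6 b/s. The 85.3 KB figure appears to match the 87,380-byte value that Linux long used as the default (middle) entry of `net.ipv4.tcp_rmem`, but treat that connection as an assumption.

```python
KB = 1024  # bytes per KB, matching the slide's 85.3 KB


def window_limited_mbps(window_bytes: float, rtt_s: float) -> float:
    """A sender can deliver at most one window of data per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def idle_ms_per_rtt(window_bytes: float, rtt_s: float, link_bps: float) -> float:
    """Time per RTT that the link sits idle after the window has been sent."""
    send_s = window_bytes * 8 / link_bps
    return max(rtt_s - send_s, 0.0) * 1000


if __name__ == "__main__":
    window = 85.3 * KB
    print(f"{window_limited_mbps(window, 0.050):.2f} Mbps")        # 13.98 Mbps
    print(f"{idle_ms_per_rtt(window, 0.050, 100e6):.0f} ms idle")  # 43 ms idle
```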
Calculating the Window Size
• Calculate the new window size (the bandwidth-delay product)
• (100 Mb/s * 0.050 s) / 8 b/B = 625,000 B = 610.3 KB
• Use 8 MB for testing purposes
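The new window size is the bandwidth-delay product (BDP): the amount of data that must be in flight to keep a 100 Mb/s, 50 ms path full. Below is a minimal Python sketch. The second helper inverts the formula to show how much headroom the 8 MB test value gives on this path.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bits in flight on a full pipe, converted to bytes."""
    return link_bps * rtt_s / 8


def max_rate_for_window_mbps(window_bytes: float, rtt_s: float) -> float:
    """Highest rate a given window can sustain at this RTT."""
    return window_bytes * 8 / rtt_s / 1e6


if __name__ == "__main__":
    bdp = bdp_bytes(100e6, 0.050)
    print(f"BDP: {bdp:,.0f} B = {bdp / 1024:.2f} KB")  # BDP: 625,000 B = 610.35 KB
    print(f"8 MB window supports {max_rate_for_window_mbps(8 * 1024 * 1024, 0.050):,.0f} Mbps")
```

The 8 MB window could sustain more than ten times the 100 Mb/s link rate at 50 ms, so it will not be the limit in these tests.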
[Figure: Resetting Window Buffer]
Intermediate Results
• Use application specific options to
manually reset buffer size
• Fixes problem for this application
• Doesn’t fix problem for other applications
• Need better ‘default behavior’ for all
applications
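An application with no tuning flag can get the same effect in its own code by requesting larger socket buffers. The minimal Python sketch below uses the standard `SO_SNDBUF`/`SO_RCVBUF` socket options. On Linux, the kernel doubles the requested value for bookkeeping, caps it at `net.core.wmem_max`/`net.core.rmem_max`, and turns off buffer auto-tuning for that socket. As a result, the value read back will not equal the request.

```python
import socket


def make_tuned_socket(buf_bytes: int) -> socket.socket:
    """Create a TCP socket with larger send/receive buffers.

    Set the buffers before connect() or listen(): the TCP window scale option
    is negotiated in the SYN exchange, so a later increase may not be usable.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    return s


if __name__ == "__main__":
    s = make_tuned_socket(8 * 1024 * 1024)  # the 8 MB test value from the slide
    try:
        granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        print(f"requested 8 MB receive buffer, kernel granted {granted:,} bytes")
    finally:
        s.close()
```

This helps only the program that makes the call, which is the limitation noted above.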
[Figure: With TCP window size tuned]
Steps so far
• Found and fixed Duplex Mismatch
• Network Infrastructure problem
• Found and fixed TCP window size values
• Host configuration problem
• Are we done yet?
[Figure: SCP results with auto-tuning enabled]
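Auto-tuning supplies the better "default behavior" that the earlier slide asks for: the kernel grows each connection’s buffers as needed, up to a system-wide ceiling. The minimal sketch below assumes Linux:

* `/proc/sys/net/ipv4/tcp_rmem` and `tcp_wmem` each hold three integers (min, default, max bytes).
* `tcp_moderate_rcvbuf` set to 1 enables receive auto-tuning.

The sketch roughly checks whether the ceilings cover a path’s bandwidth-delay product. Part of each buffer holds overhead, so the usable window is somewhat smaller. Other operating systems expose these settings differently.

```python
from pathlib import Path
from typing import Tuple

PROC_IPV4 = "/proc/sys/net/ipv4"


def read_triplet(name: str, root: str = PROC_IPV4) -> Tuple[int, int, int]:
    """Parse a min/default/max sysctl such as tcp_rmem or tcp_wmem."""
    low, default, high = (int(v) for v in (Path(root) / name).read_text().split())
    return low, default, high


def ceiling_covers_bdp(max_bytes: int, link_bps: float, rtt_s: float) -> bool:
    """Rough check: can the largest auto-tuned buffer hold one BDP of data?"""
    return max_bytes >= link_bps * rtt_s / 8


if __name__ == "__main__":
    try:
        moderate = (Path(PROC_IPV4) / "tcp_moderate_rcvbuf").read_text().strip()
        rmem = read_triplet("tcp_rmem")
        wmem = read_triplet("tcp_wmem")
    except (OSError, ValueError) as exc:
        print(f"could not read Linux TCP settings: {exc}")
    else:
        print(f"receive auto-tuning enabled: {moderate == '1'}")
        for name, (_, _, high) in (("tcp_rmem", rmem), ("tcp_wmem", wmem)):
            ok = ceiling_covers_bdp(high, 100e6, 0.050)
            print(f"{name} max {high:,} B covers a 100 Mb/s, 50 ms path: {ok}")
```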
Intermediate Results
• SCP still runs slower than expected
• Hint: SSH uses internal buffers
• Design choices by the application developers
limit performance
• Patch available from PSC
[Figure: SCP results with tuned SCP]
Final Results
• Fixed infrastructure problem
• Fixed host configuration problem
• Fixed Application configuration problem
• Achieved target time of 4 seconds to
transfer a 50 MB file over 2,000 miles
Follow-up questions
• What would have happened if I tried the
patched SCP version before fixing the
TCP buffer problem?
• Would not have been able to see
improvement.
• Discard patch because “it didn’t work”?
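The follow-up question is easier to see with a simple model in which the tightest of several independent limits caps throughput. The 85.3 KB default window, 8 MB tuned window, 100 Mbps link, and 50 ms RTT come from this talk. The 256 KB fixed SSH buffer is a hypothetical value chosen only for illustration, not OpenSSH’s actual buffer size. This is a minimal Python sketch, and real transfers can fall below every one of these caps.

```python
KB = 1024
RTT_S = 0.050
LINK_MBPS = 100.0


def window_cap_mbps(window_bytes: float) -> float:
    """At most one window of data per round trip."""
    return window_bytes * 8 / RTT_S / 1e6


def bottleneck(tcp_window_bytes: float, app_buffer_bytes: float) -> tuple:
    """Return (name, Mbps) of whichever limit is tightest."""
    limits = {
        "link": LINK_MBPS,
        "tcp_window": window_cap_mbps(tcp_window_bytes),
        "ssh_buffer": window_cap_mbps(app_buffer_bytes),
    }
    name = min(limits, key=limits.get)
    return name, limits[name]


DEFAULT_TCP = 85.3 * KB
TUNED_TCP = 8 * KB * KB
FIXED_SSH = 256 * KB       # hypothetical fixed application buffer
PATCHED_SSH = 8 * KB * KB  # hypothetical: the patched buffer no longer limits

if __name__ == "__main__":
    cases = [
        ("defaults", DEFAULT_TCP, FIXED_SSH),
        ("SCP patched, TCP untuned", DEFAULT_TCP, PATCHED_SSH),
        ("TCP tuned, SCP unpatched", TUNED_TCP, FIXED_SSH),
        ("both fixed", TUNED_TCP, PATCHED_SSH),
    ]
    for label, tcp, app in cases:
        name, mbps = bottleneck(tcp, app)
        print(f"{label:26s} limited by {name:10s} {mbps:7.2f} Mbps")
```

If you patch SCP first, the TCP window remains the binding limit, so the patch appears to do nothing.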
Why is it hard to Find/Fix Problems?
• Network infrastructure is complex
• Network infrastructure is shared
• Network infrastructure consists of
multiple components
Shared Infrastructure
• Other applications accessing the
network
• Remote disk access
• Automatic email checking
• Heartbeat facilities
• Other computers are attached to the
closet switch
• Uplink to facility infrastructure
• Other users on and off site
• Uplink from facility to gigapop/backbone
Other Network Components
• DHCP (Dynamic Host Configuration Protocol)
• At least 2 packets exchanged to configure your
host
• DNS (Domain Name System)
• At least 2 packets exchanged to translate an FQDN
into an IP address
• Multiple addresses require a sequential search
• Network Security Devices
• Intrusion Detection, VPN, Firewall
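One place where "multiple addresses require a sequential search" shows up in practice: when a name resolves to several addresses, a client typically tries them one after another. Every unreachable address can cost a full timeout before the next one is attempted. Below is a minimal Python sketch of that loop; the standard library’s `socket.create_connection` behaves similarly. The 3-second timeout and the `localhost:22` target are arbitrary assumptions.

```python
import socket
import time


def connect_sequentially(host: str, port: int, timeout_s: float = 3.0) -> socket.socket:
    """Try each resolved address in order and return the first connected socket."""
    last_error = None
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout_s)
        try:
            s.connect(sockaddr)
            return s
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next address
            s.close()
    raise last_error if last_error else OSError(f"no addresses found for {host}")


if __name__ == "__main__":
    start = time.perf_counter()
    try:
        conn = connect_sequentially("localhost", 22)  # hypothetical target service
    except OSError as exc:
        print(f"all addresses failed: {exc}")
    else:
        print(f"connected to {conn.getpeername()} after {time.perf_counter() - start:.3f} s")
        conn.close()
```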
Why is it hard to Find/Fix Problems?
• Computers have multiple components
• Each Operating System (OS) has a
unique set of tools to tune the network
stack
• Network Interface Cards also have
tuning options
• Application Appliances come with few
knobs and limited options
Computer Components
• Main CPU (clock speed)
• Front & back side bus
• Main memory
• I/O bus (ATA, SCSI, SATA)
• Disk (access speed and size)
Computer Issues
• Lots of internal components with a multitasking OS
• Lots of tunable TCP/IP parameters that
need to be ‘right’ for each possible
connection
Why is it hard to Find/Fix Problems?
• Applications depend on default system
settings
• Problems scale with distance
• More access to remote resources
• 80/20 rule: since the early 1990s, 80% of
your traffic leaves your local network
Default System Settings
• For Linux 2.6.13 there are:
• 11 tunable IP parameters
• 45 tunable TCP parameters
• 148 Web100 variables (TCP MIB)
• Currently no OS ships with default settings that work
well over trans-continental distances
• Some applications allow run-time setting of
some options
• 30 settable/viewable IP parameters
• 24 settable/viewable TCP parameters
• There are no standard ways to set run-time option
‘flags’
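On Linux, most of these tunables appear as files under `/proc/sys`, so counting them on a given kernel is easy. Below is a minimal Python sketch. The counts it reports will differ from the 2.6.13 figures above, because each kernel version adds or retires parameters. The Web100 variables require a patched kernel and are not listed here.

```python
from pathlib import Path
from typing import List


def list_tunables(prefix: str, root: str = "/proc/sys/net/ipv4") -> List[str]:
    """Names of sysctl files under `root` whose names start with `prefix`."""
    base = Path(root)
    if not base.is_dir():
        return []  # not Linux, or /proc not mounted
    return sorted(p.name for p in base.iterdir() if p.is_file() and p.name.startswith(prefix))


if __name__ == "__main__":
    tcp = list_tunables("tcp_")
    print(f"{len(tcp)} tcp_* parameters in /proc/sys/net/ipv4")
    for name in tcp:
        print(" ", name)
```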
Application Issues
• Setting tunable parameters to the ‘right’
value
• Getting the protocol ‘right’
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
Tools, Tools, Tools
• Ping
• Traceroute
• Iperf
• Tcpdump
• Tcptrace
• BWCTL
• NDT
• OWAMP
• AMP
• Advisor
• Thrulay
• Web100
• MonaLisa
• pathchar
• NPAD
• Pathdiag
• Surveyor
• Ethereal
• CoralReef
• MRTG
• Skitter
• Cflowd
• Cricket
• Net100
Active Measurement Tools
• Tools that inject packets into the
network to measure some value
• Available Bandwidth
• Delay/Jitter
• Loss
• May require bi-directional traffic or
synchronized hosts
• May require running test program on
both hosts
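As a small example of an active measurement, the sketch below times TCP connection setup to a listening service. Each sample approximates one round trip, and something must be listening on the far host. This is a minimal Python illustration, not a substitute for tools such as ping, OWAMP, or Iperf. The target address and port in the usage are hypothetical. The target is given as an IP address so that DNS lookup time is not included.

```python
import socket
import statistics
import time
from typing import List


def tcp_connect_rtts_ms(ip: str, port: int, count: int = 5, timeout_s: float = 2.0) -> List[float]:
    """Time `count` TCP handshakes; each sample is roughly one round trip."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        with socket.create_connection((ip, port), timeout=timeout_s):
            pass  # connection established; close immediately
        samples.append((time.perf_counter() - start) * 1000)
    return samples


if __name__ == "__main__":
    target = ("192.0.2.10", 5001)  # hypothetical test server (documentation address)
    try:
        rtts = tcp_connect_rtts_ms(*target)
    except OSError as exc:
        print(f"probe failed: {exc}")
    else:
        print(f"min {min(rtts):.1f} ms, median {statistics.median(rtts):.1f} ms, "
              f"spread {statistics.pstdev(rtts):.1f} ms")
```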
Passive Measurement Tools
• Tools that monitor existing traffic on the
network and extract some information
• Bandwidth used
• Jitter
• Loss rate
• May generate some privacy and/or
security concerns
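Passive tools derive these numbers from traffic that is already flowing. As an illustration, the sketch below computes two values:

* the loss rate, from gaps in the observed sequence numbers
* the interarrival jitter estimator defined in RFC 3550 (the RTP specification), J += (|D| - J) / 16

This minimal Python sketch assumes the capture supplies a sequence number, a sender timestamp, and an arrival time for each packet, all in seconds. It ignores sequence-number wraparound.

```python
from typing import Iterable, Tuple


def loss_and_jitter(packets: Iterable[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Return (loss fraction, jitter in seconds) for (seq, sent_s, arrived_s) records.

    Records must be in arrival order, as a capture would see them.
    """
    seen = set()
    jitter = 0.0
    prev = None
    for seq, sent, arrived in packets:
        seen.add(seq)
        if prev is not None:
            # D: change in one-way transit time between consecutive arrivals
            d = (arrived - prev[1]) - (sent - prev[0])
            jitter += (abs(d) - jitter) / 16
        prev = (sent, arrived)
    if not seen:
        return 0.0, 0.0
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected, jitter
```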
How do you set realistic
Expectations?
• Assume network bandwidth exists or
find out what the limits are
• Local LAN connection
• Site Access link
• Monitor the link utilization occasionally
• Weathermap
• MRTG graphs
• Look at your host config/utilization
• What is the CPU utilization?
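Tools like MRTG graph utilization by polling an interface’s byte counter at intervals and dividing the growth by the link capacity. Below is a minimal Python sketch of that arithmetic. It assumes a Linux host (`/sys/class/net/<iface>/statistics/rx_bytes`), a hypothetical interface name `eth0`, and a hypothetical 100 Mb/s link. The modulo handles counters that wrap, such as 32-bit SNMP octet counters.

```python
import time
from pathlib import Path


def utilization(bytes_before: int, bytes_after: int, interval_s: float,
                link_bps: float, counter_bits: int = 64) -> float:
    """Fraction of link capacity used between two counter samples."""
    delta = (bytes_after - bytes_before) % (2 ** counter_bits)  # tolerate wraparound
    return delta * 8 / (interval_s * link_bps)


def read_rx_bytes(iface: str) -> int:
    """Inbound byte counter for `iface` from Linux sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/rx_bytes").read_text())


if __name__ == "__main__":
    iface, link_bps, interval = "eth0", 100e6, 1.0  # hypothetical values
    try:
        first = read_rx_bytes(iface)
        time.sleep(interval)
        second = read_rx_bytes(iface)
    except (OSError, ValueError) as exc:
        print(f"could not read counters for {iface}: {exc}")
    else:
        print(f"{iface} inbound utilization: {utilization(first, second, interval, link_bps):.1%}")
```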
Distance Matters
• It’s harder to go fast over a long
distance
• TCP congestion control requires numerous
round trips to prevent flooding network
• TCP buffer limits can stop sender from
injecting new data into the network
• Application can exhibit poor behavior when
used over long distances
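The "numerous round trips" come mainly from slow start, in which the congestion window roughly doubles each RTT until it reaches the size needed to fill the path. The minimal Python sketch below works under idealized assumptions: no loss, no delayed ACKs, a 1,460-byte MSS, and an initial window of 2 segments. These are assumptions, not values from this talk. A longer path needs a bigger window, so it pays for more round trips, and each round trip also takes longer.

```python
import math


def slow_start_rtts(target_window_bytes: float, mss_bytes: int = 1460,
                    initial_cwnd_segments: int = 2) -> int:
    """Round trips for an idealized slow start to grow cwnd to the target window."""
    target_segments = math.ceil(target_window_bytes / mss_bytes)
    cwnd, rtts = initial_cwnd_segments, 0
    while cwnd < target_segments:
        cwnd *= 2  # idealized: every segment is ACKed, so the window doubles per RTT
        rtts += 1
    return rtts


if __name__ == "__main__":
    link_bps = 100e6
    for rtt_s in (0.005, 0.050):
        bdp = link_bps * rtt_s / 8  # window needed to fill the path
        n = slow_start_rtts(bdp)
        print(f"RTT {rtt_s * 1000:.0f} ms: BDP {bdp:,.0f} B, {n} round trips, "
              f"{n * rtt_s * 1000:.0f} ms of ramp-up")
```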
Ethernet, FastEthernet, Gigabit
Ethernet, 10 GE
• 10/100/1000 auto-sensing NICs are
common today
• Most facilities have installed 10/100
switched infrastructure
• Access network links are currently the
limiting factor in most networks
• Backbone networks are 10 Gigabit/sec
Wireless LANs
• 802.11b: 11 Mbps (expect 5)
• 802.11a: 54 Mbps (expect 15)
• 802.11g: 54 Mbps (expect 25)
• Expect large variations in speed due to
radio signal propagation
Focus on 2 tools
• Existing NDT tool
• Allows users to test network path for a
limited number of common problems
• Emerging PerfSonar tool
• Allows users to retrieve network path data
from major national and international research and
education networks (RENs)
Network Diagnostic Tool
(NDT)
• Measure performance to the user’s desktop
• Identify real problems for real users
• Network infrastructure is the problem
• Host tuning issues are the problem
• Make tool simple to use and understand
• Make tool useful for users and network
administrators
• Web-based Java applet allows testing
from any browser
Installing your own server
• All Internet2 tools are FREE
• Visit http://e2epi.internet2.edu/ for details
• Workshops are available to help your
administrator get them up and running
( http://e2epi.internet2.edu/net-perf-wkshp/ )
• Encourage your peers to start testing
• Encourage your vendors to include the client
programs
NPToolkit Bootable CD
• Knoppix-based Live CD
• Contains listed tools
• Download from Internet2
• Ask for a pre-built CD-ROM
http://e2epi.internet2.edu/network-performance-toolkit/network-performance-toolkit.iso
PerfSonar – Next Steps in
Performance Monitoring
• New initiative involving multiple partners
• ESnet (DOE labs)
• GEANT (European Research and
Education Network)
• Internet2 (Abilene and connectors)
• Sample tool (Joe Metzger, ESnet)
https://performance.es.net/cgi-bin/perfsonar-trace.cgi
[Figure: Traceroute Visualizer]
[Figure: Abilene Weather Map]
http://loadrunner.uits.iu.edu/weathermaps/abilene/
[Figure: Windows XP Performance]
Google it!
• Enter “tuning tcp” into the Google search
engine.
• The top 2 hits are:
http://www.psc.edu/networking/perf_tune.html
http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html
[Figure: PSC Tuning Page]
[Figure: LBNL Tuning Page]
Conclusions
• Applications can fully utilize the network
• All problems have a single symptom
• All problems must be found and fixed before
things get better
• Some people stop investigating before finding all
problems
• Tools exist, and more are being developed,
to make it easier to find problems