The Performance Bottleneck: Application, Computer, or Network


The Performance Bottleneck
Application, Computer, or Network
Richard Carlson
Internet2
Part 1
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
• Ramblings on what's next
Basic Premise
• An application's performance should meet
your expectations!
• If it doesn't, you should complain!
Questions
• How many times have you said:
• What’s wrong with the network?
• Why is the network so slow?
• Do you have any way to find out?
• Tools to check local host
• Tools to check local network
• Tools to check end-to-end path
Underlying Assumption
• When problems exist, it's the network's
fault!
NDT Demo First
Simple Network Picture
[Diagram: Bob's Host and Carol's Host, each behind its own network infrastructure, connected across a mesh of switches (Switch 1-4) and routers (R1-R9)]
Possible Bottlenecks
• Network infrastructure
• Host computer
• Application design
Network Infrastructure Bottlenecks
• Links too small
• Using standard Ethernet instead of Fast Ethernet
• Links congested
• Too many hosts crossing this link
• Scenic routing
• End-to-end path is longer than it needs to be
• Broken equipment
• Bad NIC, broken wire/cable, cross-talk
• Administrative restrictions
• Firewalls, filters, shapers, restrictors
Host Computer Bottlenecks
• CPU utilization
• What else is the processor doing?
• Memory limitations
• Main memory and network buffers
• I/O bus speed
• Getting data into and out of the NIC
• Disk access speed
Application Behavior Bottlenecks
• Chatty protocol
• Lots of short messages between peers (see the sketch after this list)
• High-reliability protocol
• Send a packet and wait for the reply before continuing
• No run-time tuning options
• Use only default settings
• Blaster protocol
• Ignore congestion control feedback
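To see why a chatty, stop-and-wait design hurts, note that every short request/reply exchange costs at least one round trip, regardless of link speed. A small C sketch of that arithmetic (the message count and RTT are hypothetical illustration values):

```c
#include <stdio.h>

int main(void)
{
    double rtt_s = 0.070;   /* hypothetical 70 msec cross-country RTT    */
    int    msgs  = 1000;    /* hypothetical count of request/reply pairs */

    /* A chatty protocol pays at least one full round trip per exchange,
     * so its runtime is dominated by latency, not bandwidth. */
    printf("chatty: %d exchanges x %.0f msec = %.1f sec minimum\n",
           msgs, rtt_s * 1000, msgs * rtt_s);
    return 0;
}
```

At 70 msec per round trip, 1,000 exchanges need at least 70 seconds even on a 10 Gbps path, which is why batching messages helps far more than adding bandwidth.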
TCP 101
• Transmission Control Protocol (TCP)
• Provides applications with a reliable in-order
delivery service
• The most widely used Internet transport
protocol
• Web, File transfers, email, P2P, Remote login
• User Datagram Protocol (UDP)
• Provides applications with an unreliable delivery
service
• RTP, DNS
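At the sockets API, the choice between these two services comes down to a single argument; a minimal C sketch:

```c
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    /* Reliable, in-order byte stream (TCP): web, file transfer, email */
    int tcp_fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Unreliable datagram service (UDP): RTP, DNS */
    int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);

    printf("tcp fd = %d, udp fd = %d\n", tcp_fd, udp_fd);
    return 0;
}
```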
Summary – Part 1
• Problems can exist at multiple levels
• Network infrastructure
• Host computer
• Application design
• Multiple problems can exist at the same
time
• All problems must be found and fixed
before things get better
Summary – Part 2
• Every problem exhibits the same
symptom
• The application's performance doesn't meet
the user's expectations!
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
• Ramblings on what's next
Real Life Examples
• I know what the problem is
• Bulk transfer with multiple problems
Example 1 - SC’04 experience
• Booth having trouble getting an application to run from Amsterdam to Pittsburgh
• Tests between remote SGI and local PC showed throughput limited to < 20 Mbps
• Assumption: PC buffers are too small
• Question: How do we set the WinXP send/receive window size?
SC’04 Determine WinXP info
http://www.dslreports.com/drtcp
SC’04 Confirm PC settings
• DrTCP reported 16 MB buffers, but the test program was still slow. Q: How to confirm?
• Run test to the SC NDT server (PC has a Fast Ethernet connection)
• Client-to-Server: 90 Mbps
• Server-to-Client: 95 Mbps
• PC send/recv window size: 16 MBytes (wscale 8)
• NDT send/recv window size: 8 MBytes (wscale 7)
• Reported TCP RTT: 46.2 msec
• Approximately 600 KBytes of data in TCP buffer
• Min window size / RTT: 1.3 Gbps
SC’04 Local PC Configured OK
• No problem found
• Able to run at line rate
• Confirmed that the PC's TCP window values were set correctly
SC’04 Remote SGI
• Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected)
• Client-to-Server: 17 Mbps
• Server-to-Client: 16 Mbps
• SGI send/recv window size: 256 KBytes (wscale 3)
• NDT send/recv window size: 8 MBytes (wscale 7)
• Reported RTT: 106.7 msec
• Min window size / RTT: 19 Mbps
SC’04 Remote SGI Results
• Needed to download and compile the command-line client
• SGI TCP window is too small to fill the transatlantic pipe (19 Mbps max)
• User reluctant to make changes to the SGI network interface from the SC show floor
• NDT client tool lets the application change its buffers via the setsockopt() function call (see the sketch below)
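A minimal C sketch of how an application can enlarge its own buffers with setsockopt(), as the NDT client does; the 2 MB value matches the setting used in the tuned re-test below, and error handling is abbreviated:

```c
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int bufsize = 2 * 1024 * 1024;   /* 2 MB, the SGI's maximum */

    /* Set larger send/receive buffers BEFORE connect(): TCP negotiates
     * its window-scale option during the handshake, so buffers enlarged
     * afterward may not take full effect. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_SNDBUF");
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_RCVBUF");

    /* ... connect(fd, ...) and transfer data as usual ... */
    return 0;
}
```

The OS may silently clamp the request to its configured maximum, which is exactly the 2 MB limit the team ran into on the SGI.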
SC’04 Remote SGI (tuned)
• Re-run test from remote SGI to SC show floor
• Client-to-Server: 107 Mbps
• Server-to-Client: 109 Mbps
• SGI send/recv window size: 2 MBytes (wscale 5)
• NDT send/recv window size: 8 MBytes (wscale 7)
• Reported RTT: 104 msec
• Min window size / RTT: 153.8 Mbps
SC’04 Debugging Results
• Team spent over 1 hour looking at the WinXP config, trying to verify the window size
• A single NDT test verified this in under 30 seconds
• 10 minutes to download and install the NDT client on the SGI
• 15 minutes to discuss options and run the client test with the set-buffer option
SC’04 Debugging Results
• 8 minutes to find the SGI limits and determine the maximum allowable window setting (2 MB)
• Total time: 34 minutes to verify that the problem was the remote SGI's TCP send/receive window size
• Network path verified, but the application still performed poorly until it was also tuned
Example 2 – SCP file transfer
• Bob and Carol are collaborating on a
project. Bob needs to send a copy of
the data (50 MB) to Carol every ½ hour.
Bob and Carol are 2,000 miles apart.
How long should each transfer take?
• 5 minutes?
• 1 minute?
• 5 seconds?
What should we expect?
• Assumptions:
• 100 Mbps Fast Ethernet is the slowest link
• 50 msec round trip time
• Bob & Carol calculate:
• 50 MB * 8 = 400 Mbits
• 400 Mb / 100 Mb/sec = 4 seconds
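That estimate is simple enough to script; a small C sketch of the same arithmetic:

```c
#include <stdio.h>

int main(void)
{
    double file_mbits = 50.0 * 8;   /* 50 MB = 400 Mbits           */
    double link_mbps  = 100.0;      /* slowest link: Fast Ethernet */

    /* Ideal time ignores protocol overhead and TCP slow start */
    printf("ideal transfer time: %.1f seconds\n", file_mbits / link_mbps);
    return 0;
}
```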
Initial SCP Test Results
Initial Test Results
• This is unacceptable!
• First look for network infrastructure
problem
• Use NDT tester to examine both hosts
Initial NDT testing shows
Duplex Mismatch at one end
NDT Found Duplex Mismatch
• Investigating this, it was found that the switch port was configured for 100 Mbps full-duplex operation, while the host auto-negotiated to half-duplex
• Network administrator corrected the configuration and asked for a re-test
Duplex Mismatch Corrected
SCP results after
Duplex Mismatch Corrected
Intermediate Results
• Time dropped from 18 minutes to 40
seconds.
• But our calculations said it should take 4
seconds!
• 400 Mb / 40 sec = 10 Mbps
• Why are we limited to 10 Mbps?
• Are you satisfied with 1/10th of the possible
performance?
Default TCP window settings
Calculating the Window Size
• Remember Bob found the round-trip time was 50 msec
• Calculate the window-size limit:
• 85.3 KB × 8 b/B = 698,777 b
• 698,777 b / 0.050 s = 13.98 Mbps
• Calculate the new window size:
• (100 Mb/s × 0.050 s) / 8 b/B = 610.3 KB
• Use 1 MB as a minimum
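Both calculations are instances of the bandwidth-delay product rule: achievable rate = window / RTT, and the window needed to fill a pipe = bandwidth × RTT. A small C sketch reproducing the numbers above:

```c
#include <stdio.h>

int main(void)
{
    double rtt_s    = 0.050;        /* Bob's measured RTT: 50 msec */
    double link_bps = 100e6;        /* Fast Ethernet bottleneck    */
    double window_B = 85.3 * 1024;  /* default window: 85.3 KB     */

    /* A TCP connection can never move more than one window per RTT */
    printf("window-limited rate: %.2f Mbps\n", window_B * 8 / rtt_s / 1e6);

    /* The window needed to fill the pipe is the bandwidth-delay product */
    printf("window needed: %.2f KB\n", link_bps * rtt_s / 8 / 1024);
    return 0;
}
```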
Resetting Window Value
With TCP windows tuned
Steps so far
• Found and fixed Duplex Mismatch
• Network Infrastructure problem
• Found and fixed TCP window values
• Host configuration problem
• Are we done yet?
SCP results with tuned windows
Intermediate Results
• SCP still runs slower than expected
• Hint: SCP uses its own internal flow-control buffers, which also cap throughput
• Patch available from PSC
SCP Results with tuned SCP
Final Results
• Fixed infrastructure problem
• Fixed host configuration problem
• Fixed Application configuration problem
• Achieved target time of 4 seconds to
transfer 50 MB file over 2000 miles
Why is it hard to Find/Fix Problems?
• Network infrastructure is complex
• Network infrastructure is shared
• Network infrastructure consists of
multiple components
Shared Infrastructure
• Other applications accessing the
network
• Remote disk access
• Automatic email checking
• Heartbeat facilities
• Other computers are attached to the
closet switch
• Uplink to campus infrastructure
• Other users on and off site
• Uplink from campus to gigapop/backbone
Other Network Components
• DHCP (Dynamic Host Configuration Protocol)
• At least 2 packets exchanged to configure your host
• DNS (Domain Name System)
• At least 2 packets exchanged to translate a FQDN into an IP address
• Network security devices
• Intrusion detection, VPN, firewall
Network Infrastructure
• Large complex system with potentially
many problem areas
Why is it hard to Find/Fix Problems?
• Computers have multiple components
• Each Operating System (OS) has a
unique set of tools to tune the network
stack
• Application Appliances come with few
knobs and limited options
Computer Components
• Main CPU (clock speed)
• Front & back side bus
• Main memory
• I/O bus (ATA, SCSI, SATA)
• Disk (access speed and size)
Computer Issues
• Lots of internal components with a multitasking OS
• Lots of tunable TCP/IP parameters that
need to be ‘right’ for each possible
connection
Why is it hard to Find/Fix Problems?
• Applications depend on default system
settings
• Problems scale with distance
• More access to remote resources
Default System Settings
• For Linux 2.6.13 there are:
• 11 tunable IP parameters
• 45 tunable TCP parameters
• 148 Web100 variables (TCP MIB)
• Currently no OS ships with default settings that work
well over trans-continental distances
• Some applications allow run-time setting of
some options
• 30 settable/viewable IP parameters
• 24 settable/viewable TCP parameters
• There are no standard ways to set run-time option
‘flags’
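On Linux these tunables are exposed under /proc/sys. A minimal C sketch that prints a few of the most relevant ones, assuming the standard 2.6-series /proc layout:

```c
#include <stdio.h>

/* Print one tunable, if the /proc entry exists on this kernel */
static void show(const char *path)
{
    char line[256];
    FILE *f = fopen(path, "r");
    if (f && fgets(line, sizeof(line), f))
        printf("%-40s %s", path, line);
    if (f)
        fclose(f);
}

int main(void)
{
    show("/proc/sys/net/ipv4/tcp_rmem");           /* recv buffer: min/default/max */
    show("/proc/sys/net/ipv4/tcp_wmem");           /* send buffer: min/default/max */
    show("/proc/sys/net/core/rmem_max");           /* hard cap on SO_RCVBUF        */
    show("/proc/sys/net/ipv4/tcp_window_scaling"); /* RFC 1323 window scaling      */
    return 0;
}
```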
Application Issues
• Setting tunable parameters to the ‘right’
value
• Getting the protocol ‘right’
How do you set realistic expectations?
• Assume network bandwidth exists or
find out what the limits are
• Local LAN connection
• Site Access link
• Monitor the link utilization occasionally
• Weathermap
• MRTG graphs
• Look at your host config/utilization
• What is the CPU utilization?
Ethernet, Fast Ethernet, Gigabit Ethernet
• 10/100/1000 auto-sensing NICs are
common today
• Most campuses have installed 10/100
switched infrastructure
• Access network links are currently the
limiting factor in most networks
• Backbone networks are 10 Gigabit/sec
Site Access and Backbone
• Campus access via Regional ‘GigaPoP’
• Confirm with campus admin
• Abilene Backbone
• 10 Gbps POS links coast-to-coast
• Other Federal backbone networks
• Other commercial networks
• Other institutions, sites, and networks
Tools, Tools, Tools
• Ping
• Traceroute
• Iperf
• Tcpdump
• Tcptrace
• BWCTL
• NDT
• OWAMP
• AMP
• Advisor
• Thrulay
• Web100
• MonaLisa
• pathchar
• NPAD
• Pathdiag
• Surveyor
• Ethereal
• CoralReef
• MRTG
• Skitter
• Cflowd
• Cricket
• Net100
Active Measurement Tools
• Tools that inject packets into the
network to measure some value
• Available Bandwidth
• Delay/Jitter
• Loss
• Requires bi-directional traffic or
synchronized hosts
Passive Measurement Tools
• Tools that monitor existing traffic on the
network and extract some information
• Bandwidth used
• Jitter
• Loss rate
• May generate some privacy and/or
security concerns
Abilene Weather Map
MRTG Graphs
Windows XP Performance
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
• Ramblings on what's next
Focus on 3 tools
• Existing NDT tool
• Allows users to test network path for a
limited number of common problems
• Existing NPAD tool
• Allows users to test local network
infrastructure while simulating a long path
• Emerging PerfSonar tool
• Allows users to retrieve network path data from major national and international REN networks
Network Diagnostic Tool
(NDT)
• Measure performance to the user's desktop
• Identify real problems for real users
• Network infrastructure is the problem
• Host tuning issues are the problem
• Make the tool simple to use and understand
• Make the tool useful for users and network administrators
NDT user interface
• Web-based Java applet allows testing from any browser
• Command-line client allows testing from a remote login shell
NDT test suite
• Looks for specific problems that affect a
large number of users
• Duplex mismatch
• Faulty cables
• Bottleneck link capacity
• Achievable throughput
• Ethernet duplex setting
• Congestion on this network path
Duplex Mismatch Detection
• Developing an analytical model to describe how the network operates (no prior art?)
• Expanding the model to describe UDP and TCP flows
• Testing models in LAN, MAN, and WAN environments
• NIH/NLM grant funding
Four Cases of Duplex Setting
• FD-FD (both full-duplex: correct)
• HD-FD (mismatch)
• FD-HD (mismatch)
• HD-HD (both half-duplex: correct)
Bottleneck Link Detection
• What is the slowest link in the end-to-end path?
• Monitors packet arrival times using the libpcap routine
• Uses TCP dynamics to create packet pairs
• Quantizes results into link-type bins (no fractional or bonded links)
• Cisco URP grant work
Normal congestion detection
• Shared network infrastructures will cause periodic congestion episodes
• Detect/report when TCP throughput is limited by cross traffic
• Detect/report when TCP throughput is limited by its own traffic
Faulty Hardware/Link Detection
• Detect non-congestive loss due to:
• Faulty NIC/switch interface
• Bad Cat-5 cable
• Dirty optical connector
• Preliminary work shows that it is possible to distinguish between congestive and non-congestive loss
Full/Half Link Duplex setting
• Detect half-duplex links in the E2E path
• Identify when throughput is limited by half-duplex operation
• Preliminary work shows detection is possible when the link transitions between blocking states
Finding Results of Interest
• Duplex Mismatch
• This is a serious error and nothing will work right. Reported on the main page and on the Statistics page
• Packet Arrival Order
• Inferred value based on TCP operation. Reported on the Statistics page (with loss statistics) and as the order: value on the More Details page
Finding Results of Interest
• Packet Loss Rates
• Calculated value based on TCP operation. Reported on the Statistics page (with out-of-order statistics) and as the loss: value on the More Details page
• Path Bottleneck Capacity
• Measured value based on TCP operation. Reported on the main page
Additional Functions and Features
• Provide basic tuning information
• Basic features:
• Basic configuration file
• FIFO scheduling of tests
• Simple server discovery protocol
• Federation mode support
• Command-line client support
• Created sourceforge.net project page
NPAD/pathdiag
• A new tool from researchers at the Pittsburgh Supercomputing Center
• Finds problems that affect long network paths
• Uses a Web100-enhanced Linux-based server
• Web-based Java client
Long Path Problem
• E2E application performance is dependent on the distance between hosts
• Full-size frame time at 100 Mbps:
• Frame = 1500 Bytes
• Time = 0.12 msec
• In flight for 1 msec RTT = 8 packets
• In flight for 70 msec RTT = 583 packets
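The in-flight figures follow directly from the frame serialization time; a small C sketch of the arithmetic:

```c
#include <stdio.h>

int main(void)
{
    double frame_bits = 1500.0 * 8;                   /* full-size frame    */
    double link_bps   = 100e6;                        /* Fast Ethernet      */
    double frame_ms   = frame_bits / link_bps * 1000; /* serialization time */

    printf("frame time: %.2f msec\n", frame_ms);
    printf("in flight at  1 msec RTT: %.0f packets\n",  1.0 / frame_ms);
    printf("in flight at 70 msec RTT: %.0f packets\n", 70.0 / frame_ms);
    return 0;
}
```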
Long Path Problem
[Diagram: host H1 reaches host H2 over a 1 msec LAN path and host H3 over a 70 msec WAN path, crossing switches (Switch 1-4) and routers (R1-R9)]
TCP Congestion Avoidance
• Cut the number of packets in flight by ½ after a loss
• Increase by 1 packet per RTT
• LAN (RTT = 1 msec)
• In flight drops to 4 packets
• Time to increase back to 8 is 4 msec
• WAN (RTT = 70 msec)
• In flight drops to 292 packets
• Time to increase back to 583 is 20.4 seconds
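The recovery times follow from TCP's additive increase: after the window is halved, the sender regains one packet per RTT, so recovery takes roughly (window / 2) round trips. A small C sketch:

```c
#include <stdio.h>

/* After a loss, TCP Reno halves the congestion window, then regains
 * one packet per RTT, so recovery takes (window / 2) round trips. */
static void recovery(const char *label, double rtt_ms, double window_pkts)
{
    double rtts = window_pkts / 2;   /* packets to regain = RTTs needed */
    printf("%s: window %.0f -> %.0f, recovery time %.3f sec\n",
           label, window_pkts, window_pkts / 2, rtts * rtt_ms / 1000);
}

int main(void)
{
    recovery("LAN", 1.0, 8.0);     /* 4 RTTs    -> 0.004 sec */
    recovery("WAN", 70.0, 583.0);  /* ~292 RTTs -> ~20.4 sec */
    return 0;
}
```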
PerfSonar – Next Steps in
Performance Monitoring
• New Initiative involving multiple partners
• ESnet (DOE labs)
• GEANT (European Research and
Education network)
• Internet2 (Abilene and connectors)
PerfSonar – Router stats on a
path
• Demo ESnet tool: https://performance.es.net/cgi-bin/perfsonar-trace.cgi
• Paste the output from traceroute into the window and view the MRTG graphs for the routers in the path
• Author: Joe Metzger, ESnet
Traceroute Visualizer
The Wizard Gap*
* Courtesy of Matt Mathis (PSC)
Google it!
• Enter “tuning tcp” into the Google search engine
• Top 2 hits are:
http://www.psc.edu/networking/perf_tune.html
http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html
PSC Tuning Page
LBNL Tuning Page
Internet2 Land Speed Record
• Challenge to the community to demonstrate how to run fast, long-distance flows
• 2000 record – 751 Mbps over 5,262 km
• 2005 record - 7.2 Gbps over 30,000 km
Conclusions
• Applications can fully utilize the network
• All problems have a single symptom
• All problems must be found and fixed before
things get better
• Some people stop investigating before finding all
problems
• Tools exist, and more are being developed,
to make it easier to find problems
Extra Material
Outline
• Why there is a problem
• What can be done to find/fix problems
• Tools you can use
• Ramblings on what's next
Introduction
• Where have we been and where are we
headed?
• Technology and hardware
• Transport Protocols
Basic Assumption
• The Internet was designed to improve
communications between people
What does the future hold?
• Moore’s Law shows no signs of slowing
down
• The original law says the number of transistors
on a chip doubles every 18 months
• Now it simply means that everything gets faster
PC Hardware
• CPU processing power (flops) is increasing
• Front/back side bus clock rate is increasing
• Memory size is increasing
• HD size is increasing too
• For the past 10 years, every HD I've purchased cost $130
Scientific Workstation
• PC or Sparc class computer
• Fast CPU
• 1 GB RAM
• 1 TB disk
• 10 Gbps NIC
• Today’s cost ~ $5,000
Network Capability
• LAN networks (includes campus)
• MAN/RON network
• WAN network
• Remember the 80/20 rule
Network NIC costs
• 10 Mbps NICs were $50 - $150 circa 1985
• 100 Mbps NICs were $50 - $150 circa 1995
• 1,000 Mbps NICs are $50 - $150 circa 2005
• 10 Gbps NICs are $1,500 - $2,500 today
• Note: today 10/100/1000 cards are common and 10/100 cards are < $10
Ethernet Switches
• Unmanaged 5-port 10/100 switch: ~$25
• Unmanaged 5-port 10/100/1000 switch: ~$50
• Managed switches have more ports and are more expensive ($150 - $400 per port)
Network Infrastructure
• Campus
• Regional
• National
• International
Campus Infrastructure
• Consists of switches, routers, and
cables
• Limited funds make it hard to upgrade
Regional Infrastructure
• Many states have optical networks
• Illinois has I-Wire
• Metro area optical gear is ‘reasonably’
priced
• Move by some to own fiber
• Flexible way to cut operating costs, but
requires larger up-front investment
National Infrastructure
• Commercial vendors have pulled fiber
to major metro areas
• NLR – n x 10 Gbps
• Abilene - 1 x 10 Gbps (Qwest core)
• FedNets - (DoE, DoD, and NASA all run
national networks)
• CA*net – n x 10 Gbps
• Almost 500 Gbps into SC|05 conference
in Seattle
International Infrastructure
• Multiple trans-Atlantic 10 Gbps links
• Multiple trans-Pacific 10 Gbps links
• Gloriad
Interesting sidebar
• China's demand for copper, aluminum, and steel has caused an increase in theft of:
• Manhole covers
• Street lamps
• Parking meters
• Phone cable
• One possible solution is to replace
copper wires with FTTH solutions
Transport Protocol
• TCP Reno has known problems with loss at high speeds
• Linear growth following packet loss
• No memory of past achievements
• TCP research groups are actively working on solutions:
• HighSpeed-TCP, Scalable-TCP, Hamilton-TCP, BIC, CUBIC, FAST, UDT, Westwood+
• Linux (2.6.13) has run-time support for these stacks (see the sketch below)
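Since Linux 2.6.13 an application can select one of these stacks per connection with the TCP_CONGESTION socket option. A minimal C sketch ("bic" is just an example name; the corresponding kernel module must be available):

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_CONGESTION
#define TCP_CONGESTION 13    /* socket option added in Linux 2.6.13 */
#endif

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    const char *algo = "bic";    /* example name; kernel module must exist */

    /* Select a congestion-control stack for this one connection */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
        perror("TCP_CONGESTION");

    /* Read back what the kernel is actually using */
    char in_use[16] = "";
    socklen_t len = sizeof(in_use);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, in_use, &len) == 0)
        printf("congestion control: %s\n", in_use);
    return 0;
}
```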
What drives prices?
• Electronic component prices are driven by units produced
• Try buying a brand NEW i386 CPU
• Try upgrading your PC's CPU
• NICs are no different