Internet Monitoring, March 2005
Internet Monitoring
Les Cottrell – SLAC
Presented at NUST Institute of Information Technology (NIIT) Rawalpindi, Pakistan,
March 15, 2005
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end
Performance Monitoring (IEPM), also supported by IUPAP
1
Overview
• Why is measurement difficult yet important?
• LAN vs WAN
• SNMP
• Effects of measurement interval
• Passive
• Active
– Tools including some results on Digital Divide
• Trouble shooting
– Tools, how to find things & who to tell
• New challenges
2
Why is measurement difficult?
• Internet's evolution as a composition of independently
developed and deployed protocols, technologies, and core
applications
• Diversity, highly unpredictable, hard to find “invariants”
• Rapid evolution & change, no equilibrium so far
– Findings may be out of date
• Measurement not high on vendors' list of priorities
– Resources/skill focus on more interesting and profitable issues
– Tools lacking or inadequate
– Implementations poor & not fully tested with new releases
• ISPs worried about providing access to core, making results
public, & privacy issues
• The phone connection oriented model (Poisson distributions
of session length etc.) does not work for Internet traffic
(heavy tails, self similar behavior, multi-fractals etc.)
3
Add to that …
• Distributed systems are very hard
– A distributed system is one in which I can't get my work done because a
computer I've never heard of has failed. Butler Lampson
• Network is deliberately transparent
• The bottlenecks can be in any of the following components:
– the applications
– the OS
– the disks, NICs, bus, memory, etc. on sender or receiver
– the network switches and routers, and so on
• Problems may not be logical
– Most problems are operator errors, configurations, bugs
• When building distributed systems, we often observe unexpectedly
low performance, the reasons for which are usually not obvious
• Just when you think you’ve cracked it, in steps security
4
Why is measurement important?
• End users & network managers need to be able to identify &
track problems
• Choosing an ISP, setting a realistic service level agreement,
and verifying it is being met
• Choosing routes when more than one is available
• Setting expectations:
– Deciding which links need upgrading
– Deciding where to place collaboration components such as a
regional computing center, software development
– How well will an application work (e.g. VoIP)
• Application steering (e.g. forecasting)
– Grid middleware, e.g. replication manager
5
LAN vs WAN
• Measuring the LAN
– Network admin has control so:
• Can read MIBs from devices
• Can within limits passively sniff traffic
• Know the routes between devices
– Manually for small networks
– Automated for large networks
• Measuring the WAN
– No admin control, unless you are an ISP
• Can’t read information out of routers
• May not be able to sniff/trace traffic due to privacy/security concerns
• Don’t know route details between points, may change, not under your
control, may be able to deduce some of it
– So typically have to make do with what can be measured from end
to end with very limited information from intermediate equipment
hops.
6
SNMP (Simple Network Management Protocol)
• Example of an Application, usually built on UDP
• De facto standard for network management
• Created by IETF to address short term needs of TCP/IP
• Consists of:
– Management Information Bases (MIBs)
• Store information about managed object (host, router, switch etc.) – system
& status info, performance & configuration data
– Remote Network Monitoring (RMON) is a management tool for
passively watching line traffic
– SNMP communication protocol to read out data and set parameters
• Polling protocol, manager asks questions & agent responds
7
SNMP Model
[Figure: Agents, each with a MIB, connected over a TCP/IP net to the Network Management Station (NMS)]
• NMS contains manager software to send & receive SNMP
messages to Agents
• Agent is a software component residing on a managed node,
responds to SNMP queries, performs updates & reports
problems
• The MIB resides on nodes and at the NMS and is a logical
description of all network management data.
8
SNMP version 1 limitations
• Authentication is inadequate:
– Password (community string) placed in clear in SNMP messages
• MIB variables must be polled separately, i.e. entire MIB
cannot be fetched with single command
• SNMPv2 and v3 attempt to address these and other
limitations
• Despite limitations, SNMP has been a huge success
– Provides device and link utilization (byte, packets) and errors
– Lot of facilities/tools built around SNMP to provide reports for
sites
– Security concerns limit access typically to very limited set of
owner/admins
• E.g. ISPs won’t let you poll their devices
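The link utilization that SNMP tools report is typically derived from two successive readings of an interface octet counter (for example `ifInOctets` in the standard interfaces MIB). The sketch below shows that arithmetic only; it does not talk SNMP. The readings, the 300 second polling interval and the 155 Mbit/s link speed are made-up illustrative values, and the code assumes the counter wrapped at most once between polls.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def utilization(prev_octets, curr_octets, interval_s, link_bps, bits=32):
    """Average link utilization (0.0 to 1.0) between two polls."""
    delta_bits = counter_delta(prev_octets, curr_octets, bits) * 8
    return delta_bits / (interval_s * link_bps)


if __name__ == "__main__":
    # Hypothetical readings taken 300 s apart on a 155 Mbit/s link;
    # the counter wrapped past 2**32 between the two polls.
    u = utilization(4_294_000_000, 1_000_000_000, 300, 155_000_000)
    print(f"average utilization: {u:.1%}")
```

With a 32 bit counter on a fast link the polling interval has to be short enough that the counter cannot wrap twice, which is one reason 64 bit counters exist.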
9
SNMP Examples
• Using MRTG to display Router bits/s MIB variable
[Figure: MRTG plot of CERN transatlantic traffic]
10
Averaging/Sampling intervals
• Typical measurements of utilization are made for 5
minute intervals or longer in order not to create
much impact.
• Human interactions require second or
sub-second response
• So it is interesting to see the difference between
measurements made with different time frames.
11
Utilization with different averaging times
• Same data, measured Mbits/s every 5 secs
• Average over different time intervals
• Does not get a lot smoother
• May indicate multi-fractal behavior
[Figure: the same utilization data averaged over 5 secs, 5 mins and 1 hour]
12
Averages vs maxima
• Maximum of all 5 sec samples can be a factor of 2 or more
greater than the average over 5 minute intervals
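The comparison between averages and maxima can be reproduced on any series of short-interval samples. The sketch below groups samples into fixed windows (60 samples of 5 secs make one 5 minute window) and reports the average and the maximum of each. The sample values are invented for illustration and are not the SLAC data.

```python
def window_stats(samples, window):
    """Return (average, maximum) for each consecutive full window of samples."""
    stats = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        stats.append((sum(chunk) / window, max(chunk)))
    return stats


if __name__ == "__main__":
    # Invented Mbits/s readings: mostly quiet with a short burst.
    samples = [10.0] * 55 + [80.0] * 5
    for average, maximum in window_stats(samples, window=60):
        print(f"avg {average:.1f} Mbits/s, max {maximum:.1f} Mbits/s, "
              f"ratio {maximum / average:.1f}")
```

A bursty series gives a large max/average ratio, while steady bulk transfer gives a ratio close to 1.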
13
Lot of heavy FTP activity
• The difference depends on traffic type
• Only 20% difference in max & average
14
Passive vs. Active Monitoring
• Active injects traffic on demand
• Passive watches things as they happen
– Network device records information
• Packets, bytes, errors … kept in MIBs retrieved by SNMP
– Devices (e.g. probe) capture/watch packets as they pass
• Router, switch, sniffer, host in promiscuous mode (tcpdump)
• Complementary to one another:
– Passive:
• does not inject extra traffic, measures real traffic
• Polling to gather data generates traffic, also gathers large amounts of data
– Active:
• provides explicit control on the generation of packets for measurement
scenarios
• testing what you want, when you need it.
• Injects extra artificial traffic
• Can do both, e.g. start active measurement and look at
it passively
15
Passive tools
• SNMP
• Hardware probes e.g. Sniffer, NetScout, can be stand-alone
or remotely accessed from a central management station
• Software probes: snoop, tcpdump, require promiscuous
access to NIC, i.e. root/sudo access
• Flow measurement: netramet, OCxMon/CoralReef, Netflow
16
Example: Passive site border monitoring
• Use Cisco Netflow in Catalyst 6509 with MSFC, on
SLAC border
• Gather about 200MBytes/day of flow data
• The raw data records include source and destination
addresses and ports, the protocol, packet, octet and
flow counts, and start and end times of the flows
– Much less detailed than saving headers of all packets, but
good compromise
– Top talkers history and daily (from & to), tlds, vlans,
protocol and application utilization
• Use for network & security
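A "top talkers" report of the kind described above is an aggregation over flow records. The sketch below assumes each record has already been decoded into a dictionary. The field names `src` and `octets` and the addresses are hypothetical, and this is not the Netflow export format or the SLAC reporting code.

```python
from collections import Counter


def top_talkers(flows, n=3):
    """Sum octets per source address and return the n largest senders."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["octets"]
    return totals.most_common(n)


if __name__ == "__main__":
    # Hypothetical decoded flow records (documentation address range).
    flows = [
        {"src": "192.0.2.10", "dst": "198.51.100.5", "octets": 5_000_000},
        {"src": "192.0.2.11", "dst": "198.51.100.5", "octets": 1_200},
        {"src": "192.0.2.10", "dst": "198.51.100.9", "octets": 7_000_000},
        {"src": "192.0.2.12", "dst": "198.51.100.5", "octets": 40_000},
    ]
    for address, octets in top_talkers(flows):
        print(f"{address:15s} {octets / 1e6:10.3f} MBytes")
```

The same pattern keyed on destination, protocol or port gives the "to", protocol and application views.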
17
SLAC Traffic profile
SLAC offsite links:
OC3 to ESnet, 1Gbps to Stanford U & thence OC12 to I2
OC48 to NTON
Profile: bulk-data xfer dominates
[Figure: Mbps in by application (HTTP, iperf, FTP, SSH, bbftp), for 2 days and for the last 6 months]
18
Top talkers by protocol
Volume dominated by single application - bbcp
[Figure: top talkers by protocol, MBytes/day on a log scale from 1 to 10000]
19
Flow sizes
[Figure: flow size distributions, with features labeled SNMP, Real A/V and AFS file server]
Heavy tailed, in ~ out, UDP flows shorter than TCP, packet~bytes
75% TCP-in < 5kBytes, 75% TCP-out < 1.5kBytes (<10pkts)
UDP 80% < 600Bytes (75% < 3 pkts), ~10 * more TCP than UDP
Top UDP = AFS (>55%), Real(~25%), SNMP(~1.4%)
20
Flow lengths
• 60% of TCP flows less than 1 second
• Would expect TCP streams longer lived
– But 60% of UDP flows over 10 seconds, maybe due to
heavy use of AFS
21
Some Active Measurement Tools
• Ping connectivity, RTT & loss
– flavors of ping, fping, Linux vs Solaris ping
– but blocking & rate limiting
• Alternative synack, but can look like DoS attack
• Sting: measures one way loss
• Traceroute
– How it works, what it provides
– Reverse traceroute servers
– Traceroute archives
• Combining ping & traceroute,
– traceping, pingroute
• Pathchar, pchar, pipechar, bprobe, abing etc.
• Iperf, netperf, ttcp, FTP …
22
Ping
• ICMP client/server application built on IP
– Client sends ICMP echo request, server sends reply
– Server usually in kernel, so reliable & fast
• User can specify number of data bytes. Client puts
timestamp in data bytes. Compares timestamp with
time when echo comes back to get RTT
• Many flavors (e.g. fping) and options
– packet length, number of tries, timeout, separation …
• Ping localhost (127.0.0.1) first, then gateway IP
address etc.
23
Ping example
syrup:/home$ ping -c 6 -s 64 thumper.bellcore.com
PING thumper.bellcore.com (128.96.41.1): 64 data bytes
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
--- thumper.bellcore.com ping statistics ---
6 packets transmitted, 5 packets received, 16% packet loss
round-trip min/avg/max = 482.1/880.5/1447.4 ms
• Annotations: -c 6 is the repeat count, -s 64 the packet size, thumper.bellcore.com the remote host, time= the RTT; icmp_seq=1 is a missing seq #; the last three lines are the summary
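The summary figures of a ping run like the one above can be recomputed from the individual reply lines, which is useful when collecting results from many hosts. The sketch below parses the reply lines of this example and derives the missing sequence numbers, the loss and the min/avg/max RTT. The regular expression is written for the output format shown here. Other ping flavors print differently, and some start sequence numbers at 1 rather than 0, so treat both as assumptions.

```python
import re

# Matches reply lines in the format of the example above.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")

SAMPLE = """\
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
"""


def summarize(ping_output, sent):
    """Recompute loss and RTT statistics; assumes at least one reply."""
    replies = {int(seq): float(rtt) for seq, rtt in REPLY.findall(ping_output)}
    rtts = list(replies.values())
    return {
        "received": len(rtts),
        # Assumes sequence numbers run from 0 to sent - 1.
        "missing": [seq for seq in range(sent) if seq not in replies],
        "loss_pct": 100.0 * (sent - len(rtts)) / sent,
        "min": min(rtts),
        "avg": sum(rtts) / len(rtts),
        "max": max(rtts),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE, sent=6))
```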
24
Traceroute
• UDP/ICMP tool to show route packets take from local to remote host
• In the example: -q 1 sets probes/hop, -m 20 sets max hops, lhr.comsats.net.pk is the remote host
• A "*" (hop 13) means no response: lost packet or router ignores
17cottrell@flora06:~>traceroute -q 1 -m 20 lhr.comsats.net.pk
traceroute to lhr.comsats.net.pk (210.56.16.10), 20 hops max, 40 byte packets
1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2) 0.642 ms
2 RTR-MSFC-DMZ.SLAC.Stanford.EDU (134.79.135.21) 0.616 ms
3 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.66) 0.716 ms
4 snv-slac.es.net (134.55.208.30) 1.377 ms
5 nyc-snv.es.net (134.55.205.22) 75.536 ms
6 nynap-nyc.es.net (134.55.208.146) 80.629 ms
7 gin-nyy-bbl.teleglobe.net (192.157.69.33) 154.742 ms
8 if-1-0-1.bb5.NewYork.Teleglobe.net (207.45.223.5) 137.403 ms
9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms
10 207.45.205.18 (207.45.205.18) 128.648 ms
11 210.56.31.94 (210.56.31.94) 762.150 ms
12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms
13 *
14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms
25
Reverse traceroute servers
• Reverse traceroute server runs as CGI script in web
server
• Allows measurement of route from other end. Important
for asymmetric routes. See e.g.
– www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
• CAIDA map of reverse traceroute servers
– www.caida.org/analysis/routing/reversetrace/
26
Pingroute
• Run traceroute, then ping each router n times
– helps identify where in route the problems start to occur
• Routers may not respond to pings, or may treat
pings directed at them differently from other packets
27
Path characterization
• Pathchar
– sends multiple packets of varying sizes to each router
along route
– measures minimum response time
– plot min RTT vs packet size to get bandwidth
– calculate differences to get individual hop characteristics
– measures for each hop: BW, queuing, delay/hop
– can take a long time
• Pipechar/abing
– Also sends back-to-back packets and measures separation on return
– Much faster
– Finds bottleneck
[Figure: packet pair dispersion; min spacing at bottleneck, spacing preserved on higher speed links]
28
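Both estimation ideas on the path characterization slide reduce to short calculations. The sketch below is not the pathchar or pipechar code. It assumes the minimum RTT to each hop grows linearly with probe size, so the slope of a least-squares line is the summed seconds per byte of all links up to that hop, and the difference between successive slopes isolates one hop. For a packet pair, the bottleneck bandwidth is taken as packet size divided by the spacing measured on return. All function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def hop_bandwidths(sizes_bytes, min_rtts_per_hop):
    """Per-hop bandwidth in bits/s from min RTT (seconds) vs probe size.

    min_rtts_per_hop[i] holds the minimum RTT to hop i+1 for each size.
    """
    bandwidths = []
    previous_slope = 0.0
    for min_rtts in min_rtts_per_hop:
        _, slope = fit_line(sizes_bytes, min_rtts)  # seconds per byte
        bandwidths.append(8.0 / (slope - previous_slope))
        previous_slope = slope
    return bandwidths


def packet_pair_bandwidth(size_bytes, spacing_s):
    """Bottleneck bandwidth in bits/s from back-to-back packet spacing."""
    return size_bytes * 8.0 / spacing_s


if __name__ == "__main__":
    sizes = [100, 500, 1000, 1500]
    # Synthetic, noise-free data: a 10 Mbits/s hop, then a 1 Mbits/s hop.
    hop1 = [0.001 + s * 8 / 10e6 for s in sizes]
    hop2 = [0.005 + s * 8 / 10e6 + s * 8 / 1e6 for s in sizes]
    print(hop_bandwidths(sizes, [hop1, hop2]))
    print(packet_pair_bandwidth(1500, 0.0012))
```

With real measurements the slope differences are noisy and can even come out negative, which is why pathchar needs many probes per size and takes a long time.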
Network throughput
• Iperf
– Client generates & sends UDP or TCP packets
– Server receives packets
– Can select port, maximum window size, duration,
Mbytes to send etc.
– Client/server communicate packets seen etc.
– Reports on throughput
• Requires server to be installed at remote site, i.e. friendly
administrators or logon account and password
29
Iperf example
• Annotations: -p 5008 is the TCP port, -w 512K the max window size, -P 3 asks for 3 parallel streams, -c names the remote host
25cottrell@flora06:~>iperf -p 5008 -w 512K -P 3 -c sunstats.cern.ch
------------------------------------------------------------
Client connecting to sunstats.cern.ch, TCP port 5008
TCP window size: 512 KByte
------------------------------------------------------------
[ 6] local 134.79.16.101 port 57582 connected with 192.65.185.20 port 5008
[ 5] local 134.79.16.101 port 57581 connected with 192.65.185.20 port 5008
[ 4] local 134.79.16.101 port 57580 connected with 192.65.185.20 port 5008
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec
• Total throughput = 3*15.3 Mbits/s = 45.9 Mbits/s
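The total over parallel streams can be computed from the per-stream report lines rather than by hand. The sketch below sums the bandwidth column of the three stream lines from this example. The regular expression targets the report format shown here and only accepts Mbits/sec units, so it is an assumption about the output and not a general iperf parser.

```python
import re

# Per-stream report lines such as: [ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
STREAM = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-\s*[\d.]+\s+sec\s+[\d.]+\s+MBytes\s+([\d.]+)\s+Mbits/sec"
)

SAMPLE = """\
[ 4] local 134.79.16.101 port 57580 connected with 192.65.185.20 port 5008
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec
"""


def total_mbits_per_sec(report):
    """Sum the bandwidth of every per-stream line in an iperf report."""
    return sum(float(rate) for rate in STREAM.findall(report))


if __name__ == "__main__":
    print(f"total throughput: {total_mbits_per_sec(SAMPLE):.1f} Mbits/s")
```

Because the pattern requires a numeric stream ID, header lines, connection lines and any aggregate line labelled with text instead of a number are skipped.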
30
Active Measurement Projects
• PingER – running at NIIT
• AMP – coming soon to NIIT
• One way delay:
– Surveyor (now defunct), RIPE (mainly Europe), owamp
• IEPM-BW – running at NIIT
• NIMI (mainly a design infrastructure)
• NWS (mainly for forecasting)
• Skitter
• All projects measure routes
• For a detailed comparison see:
– www.slac.stanford.edu/comp/net/wan-mon/iepm-cf.html
– www.slac.stanford.edu/grp/scs/net/proposals/infra-mon.html
31
AMP
• http://amp.nlanr.net/AMP/
– AMP uses dedicated PCs as monitors, ~ 150 (June, 2005)
– Today mainly does pings
– Oriented to Internet 2, ~ 10 countries
– Does mainly full mesh pinging
– Being re-written to provide support for more probes
32
PingER
• Measure the network performance for developing regions
– From developed to developing & vice versa
– Between developing regions & within developing regions
• Use simple tool (PingER/ping)
– Ping installed on all modern hosts, low traffic interference,
– 21 pings each 30 mins to remote hosts (< 100bits/s average)
• Provides very useful measures
• Originated in High Energy Physics, now focused on DD
• Persistent (data goes back to 1995), interesting history
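The "< 100bits/s average" figure for 21 pings each 30 mins can be checked with a few lines of arithmetic. The slide does not give the packet sizes, so the split used below (11 pings of 100 bytes and 10 pings of 1000 bytes) is an assumption made for illustration, and IP/ICMP header overhead is ignored.

```python
def average_bits_per_second(packet_sizes_bytes, interval_s):
    """Average one-way load of a set of probes sent once per interval."""
    return sum(packet_sizes_bytes) * 8 / interval_s


# Assumed packet sizes (not stated on the slide): 11 x 100 bytes + 10 x 1000 bytes.
ASSUMED_SIZES = [100] * 11 + [1000] * 10
INTERVAL_S = 30 * 60  # one set of pings every 30 minutes

if __name__ == "__main__":
    rate = average_bits_per_second(ASSUMED_SIZES, INTERVAL_S)
    print(f"{len(ASSUMED_SIZES)} pings per {INTERVAL_S} s: {rate:.1f} bits/s per remote host")
```

Under these assumptions the load per monitor-remote pair stays well below 100 bits/s in each direction.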
[Figure: PingER coverage map, Feb 2005, showing monitoring sites and remote sites]
33
Examples: World View
• C. Asia, Russia, S.E. Europe, L. America, M. East, China: 4-5 yrs behind
• India, Africa: 7 yrs behind
• S.E. Europe, Russia: catching up
• Latin Am., Mid East, China: keeping up
• India, Africa: falling behind
• Important for policy makers
• Many institutes in developing world have less performance than a
household in N. America or Europe
34
Losses
• US residential broadband users have better access than sites
in many regions
35
Loss to Africa (example of variability)
From PingER project
36
Compare with TAI
• UN Technology Achievement Index (TAI)
– Measures creation & diffusion of technology and building human skills
Note how bad Africa is
37
E2E Troubleshooting
• Solving the E2E performance problem is the critical
problem for the user
– Improve e2e throughput for data intensive apps in high-speed WANs
– Provide ability to do performance analysis & fault
detection in Grid computing environment
– Provide accurate, detailed, & adaptive monitoring of all
distributed components including the network
38
Anatomy of a Problem
How do you solve a problem along a path?
[Figure: an end-to-end path through Applications Developer, System Administrator, LAN Administrator, Campus Networking, Gigapop and Backbone, with the same roles repeated at the far end. The developer says "Hey, this is not working right!"; the replies along the path are "Others are getting in ok", "Not our problem", "Talk to the other guys", "Everything is AOK", "The computer is working OK", "No other complaints", "Looks fine", "All the lights are green", "We don't see anything wrong" and "The network is lightly loaded"]
From an Internet2 E2E presentation by Russ Hobby
39
Needs
• Measurement tools to quickly, accurately and
automatically identify problems
– Automatically take action to investigate and gather
information, on-demand measurements
• Standard ways to discover, request and report results
of measurements, for applications
– GGF/NMWG schemas
– Share information with people and apps across a
federation of measurement infrastructures
40
Trouble shooting
• Ping to localhost, ping to gateway & to remote host
– Use IP address to avoid nameserver problems
– Look for connectivity, loss & RTT
– May need to run for a long time to see some pathologies
(e.g. bursty loss due to DSL loss of sync)
– Use synack or sting if ICMP blocked
• Traceroute to remote host
• Reverse traceroute from remote host to you
• Ping routers along route
• Look at history plots (PingER, AMP), when did
problem start, how big an effect is it?
41
Trouble shooting
• Try user application
• Iperf to test throughput
42
Where is a host?
• Name server lookup to find hostname given IP address
47cottrell@netflow:~>nslookup 210.56.16.10
Server: localhost
Address: 127.0.0.1
Name: lhr.comsats.net.pk
Address: 210.56.16.10
• Triangulate position based on RTT measurements made to
unknown host from several hosts at known locations.
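The triangulation idea can be sketched as a set of distance bounds. An RTT measured from a host at a known location limits how far away the unknown host can be, because signals cannot travel faster than light in fiber. The code below assumes a propagation speed of about 200 km per ms (roughly two thirds of the speed of light in vacuum), treats half the RTT as the one-way time, and ignores queuing and indirect routing, so the bound is loose. Landmark coordinates and RTTs are invented for illustration.

```python
import math

FIBER_KM_PER_MS = 200.0   # assumed propagation speed in fiber
EARTH_RADIUS_KM = 6371.0


def max_distance_km(rtt_ms):
    """Upper bound on distance implied by a round trip time."""
    return rtt_ms / 2.0 * FIBER_KM_PER_MS


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def feasible(candidate, measurements):
    """True if candidate lies inside the RTT bound of every landmark.

    measurements is a list of ((lat, lon), rtt_ms) pairs.
    """
    return all(haversine_km(candidate, landmark) <= max_distance_km(rtt_ms)
               for landmark, rtt_ms in measurements)


if __name__ == "__main__":
    # Invented landmarks and RTTs.
    measurements = [((0.0, 0.0), 20.0), ((0.0, 20.0), 15.0)]
    for candidate in [(0.0, 10.0), (0.0, 30.0)]:
        print(candidate, feasible(candidate, measurements))
```

Intersecting the bounds from several landmarks narrows the region in which the host can be; a low RTT from one nearby landmark is usually the most informative measurement.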
43
Where is a host
• Do a Google search on IP address to location, e.g.
– http://www.geobytes.com/IpLocator.htm
44
Hi-perf Challenges
• Packet loss hard to measure by ping
– For 10% accuracy on BER 1/10^8 ~ 1 day at 1/sec
– Ping loss ≠ TCP loss
• Iperf/GridFTP throughput at 10Gbits/s
– To measure stable (congestion avoidance) state for 90% of test
takes ~ 60 secs ~ 75GBytes
– Requires scheduling, which implies authentication etc.
• Using packet pair dispersion can use only few tens or
hundreds of packets, however:
– Timing granularity in host is hard (sub μsec)
– NICs may buffer (e.g. coalesce interrupts, or TCP offload) so need
info from NIC or before
• Security: blocked ports, firewalls, keys vs. one time
passwords, varying policies … etc.
45
Dedicated Optical Circuits
• Could be whole new playing field, today’s tools no
longer applicable:
– No jitter (so packet pair dispersion no use)
– Instrumented TCP stacks a la Web100 may not be
relevant
– Layer 1 & 2 switches make traceroute less useful
– Losses so low, ping not viable to measure
– High speeds make some current techniques fail or more
difficult (timing, amounts of data etc.)
46
More Information
• Tutorial on monitoring
– www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• RFC 2151 on Internet tools
– www.freesoft.org/CIE/RFC/Orig/rfc2151.txt
• Network monitoring tools
– www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
• Ping
– http://www.ping127001.com/pingpage.htm
• IEPM/PingER home site
– www-iepm.slac.stanford.edu/
• IEEE Communications, May 2000, Vol 38, No 5, pp
130-136
47
Simplified SLAC DMZ Network, 2001
[Figure: network diagram showing rtr-msfc-dmz, Swh-dmz, swh-root, slac-rt1.es.net and the SLAC Internal Network, with offsite connections to ESnet (155Mbps OC3 link (*)), NTON (2.4Gbps OC48 link (#)), Stanford and Internet2 (622Mbps OC12 link), and dial up & ISDN. Link legend: Etherchannel 4 Gbps, 1Gbps Ethernet, 100Mbps Ethernet, 10Mbps Ethernet]
(*) Upgrade to OC12 has been requested
(#) This link will be replaced with a OC48
POS card for the 6500 when available
48
Flow lengths
• Distribution of netflow lengths for SLAC border
– Log-log plots, linear trendline = power law
– Netflow ties off flows after 30 minutes
– TCP, UDP & ICMP “flows” are ~log-log linear for
longer (hundreds to 1500 seconds) flows (heavy-tails)
– There are some peaks in TCP distributions, timeouts?
• Web server CGI script timeouts (300s), TCP connection
establishment (default 75s), TIME_WAIT (default 240s),
tcp_fin_wait (default 675s)
[Figure: log-log plots of flow length distributions for ICMP, TCP and UDP]
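A straight trendline on a log-log plot corresponds to a power law, count = a * length^b, so the two fit parameters can be obtained with a least-squares line through the logarithms of the data. The sketch below shows that fit on synthetic data. It is not the analysis code behind these plots, and the parameter names a and b are illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Fit ys = a * xs**b by least squares on log10 values; return (a, b).

    All xs and ys must be positive.
    """
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
         / sum((x - mean_x) ** 2 for x in log_x))
    a = 10 ** (mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Synthetic flow-length histogram: counts fall off as length**-1.5.
    lengths = [100, 200, 400, 800, 1500]
    counts = [1e6 * length ** -1.5 for length in lengths]
    a, b = fit_power_law(lengths, counts)
    print(f"a = {a:.3g}, b = {b:.3f}")
```

Fitting binned histograms this way is simple but can bias the exponent; it is shown here because it matches the "linear trendline" description on the slide.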
49
Traceroute technical details
Rough traceroute algorithm
ttl=1; #To 1st router
port=33434; #Starting UDP port
while we haven’t got UDP port unreachable {
    send UDP packet to host:port with ttl
    get response
    if time exceeded: note roundtrip time
    else if UDP port unreachable: note roundtrip time, print output, quit
    print output
    ttl++; port++
}
• Can appear as a port scan
– SLAC gets about one complaint every 2 weeks.
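The rough algorithm above can be written as a small loop once the packet sending is separated from the control logic. In the sketch below the `probe` callable stands in for "send UDP packet with this TTL and wait for the ICMP response". A real probe needs raw sockets and usually root privileges, so a fake probe over a made-up path is used here. The function names and the reply labels "time_exceeded" and "port_unreachable" are illustrative choices, not part of any traceroute implementation.

```python
FIRST_PORT = 33434  # starting UDP port, as in the rough algorithm


def trace(probe, max_hops=30):
    """Step the TTL until the destination answers or max_hops is reached.

    probe(ttl, port) returns (kind, address, rtt_ms), or None on timeout,
    where kind is "time_exceeded" or "port_unreachable".
    """
    hops = []
    for ttl in range(1, max_hops + 1):
        port = FIRST_PORT + ttl - 1
        reply = probe(ttl, port)
        if reply is None:
            hops.append((ttl, "*", None))  # lost packet or silent router
            continue
        kind, address, rtt_ms = reply
        hops.append((ttl, address, rtt_ms))
        if kind == "port_unreachable":     # reached the remote host
            break
    return hops


def make_fake_probe(path):
    """Build a probe over a made-up path; None entries never answer."""
    def probe(ttl, port):
        if ttl > len(path) or path[ttl - 1] is None:
            return None
        kind = "port_unreachable" if ttl == len(path) else "time_exceeded"
        return kind, path[ttl - 1], 10.0 * ttl
    return probe


if __name__ == "__main__":
    path = ["192.0.2.1", "192.0.2.2", None, "192.0.2.9"]
    for ttl, address, rtt_ms in trace(make_fake_probe(path), max_hops=20):
        print(ttl, address, "" if rtt_ms is None else f"{rtt_ms:.1f} ms")
```

Incrementing the destination port with each probe is what lets replies be matched to probes, and it is also why a traceroute can look like a port scan to the receiving site.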
50
Time series
[Figure: time series of incoming and outgoing TCP and UDP traffic; annotation: Cat 4000 802.1q vs. ISL]
51
Power law fit parameters by time
Just 2 parameters provide a reasonable description of the flow
size distributions
52
Not your normal Internet site
Ames IXP: approximately 60-65% was HTTP, about 13% was NNTP
Uwisc: 34% HTTP, 24% FTP, 13% Napster
53
PingER cont.
• Monitor timestamps and sends ping to remote site at
regular intervals (typically about every 30 minutes)
• Remote site echoes the ping back
• Monitor notes current time, subtracts send time and gets RTT
• Discussing installing monitor site in Pakistan
– provide real experience of using techniques
– get real measurements to set expectations, identify
problem areas, make recommendations
– provide access to data for developing new analysis
techniques, for statisticians etc.
54
PingER
• Measurements from
– 38 monitors in 14 countries
– Over 600 remote hosts
– Over 120 countries
– Over 3300 monitor-remote site pairs
– Measurements go back to Jan-95
– Reports on RTT, loss, reachability, jitter, reorders,
duplicates …
• Uses ubiquitous “ping” facility of TCP/IP
• Countries monitored
– Contain over 80% of world population
– 99% of online users of Internet
55
Surveyor & RIPE, NIMI
• Surveyor & RIPE use dedicated PCs with GPS
clocks for synchronization
– Measure 1 way delays and losses
– Surveyor mainly for Internet 2
– RIPE mainly for European ISPs
• NIMI (National Internet Measurement
Infrastructure) more of an infrastructure for
measurements and some tools (i.e. currently does
not have publicly available, regularly updated data)
– Mainly full mesh measurements on demand
56
Skitter
• Makes ping & route measurements to tens of
thousands of sites around the world. Site selection
varies based on web site hits.
– Provide loss & RTTs
– Skitter & PingER are the 2 main projects monitoring the
developing world.
57
“Where is” a host – cont.
• Find the Autonomous System (AS) administering the host
– Use reverse traceroute server with AS identification, e.g.:
• www.slac.stanford.edu/cgi-bin/nph-traceroute.pl
…
14 lhr.comsats.net.pk (210.56.16.10) [AS7590 - COMSATS] 711 ms (ttl=242)
– Get contacts for ISPs (if know ISP or AS):
• http://puck.nether.net/netops/nocs.cgi
• Gives ISP name, web page, phone number, email, hours etc.
– Review list of AS's ordered by Upstream AS Adjacency
• www.telstra.net/ops/bgp/bgp-as-upsstm.txt
• Tells what AS is upstream of an ISP
– Look at real-time information about the global routing system from
the perspectives of several different locations around the Internet
• Use route views at www.antc.uoregon.edu/route-views/
• Triangulate RTT measurements to unknown host from
multiple places
58
Who do you tell
• Local network support people
• Internet Service Provider (ISP), usually contacted by local
networker
– Use puck.nether.net/netops/nocs.cgi to find ISP
– Use www.telstra.net/ops/bgp/bgp-as-upsstm.txt to find
upstream ISPs
• Give them the ping and traceroute results
59
Achieving throughput
• User can’t achieve throughput available (Wizard gap)
• Big step just to know what is achievable
60