The netperf.net
Inter-provider Network
Performance Monitoring Project
Avi Freedman
[email protected]
Traditional Performance Tools
• ping
• traceroute
• Q: What should you ping/traceroute to?
• Problem - routers de-prioritize responses to
ping/traceroute; may even shape ICMP.
Current ping-based Tools
• MCI’s traffic.mci.com page
– ping-monitoring of MCI routers
– always all-green
• “Internet weather” sites
– ping-monitoring of name servers
Then, the Keynote Study
• Clearly, they understood that ping-monitoring to routers is not the way to go.
• So they said (presumably) let’s do HTTP - that’s what people on the ‘net do.
• Attitude has been “don’t give us technical
excuses; we know you’re just trying to
hoodwink us”.
Keynote Study (more)
• So, what are they doing?
• The study consists of millions of HTTP
queries from collocated machines, querying
the web servers of each backbone.
• Machines are collocated on sites that are
often multi-homed.
• Also, they do URL performance monitoring,
not too badly, but negative data is not
useful.
What the Study Shows
• Wow, the Internet tends to be “worst” from
certain cities. No, actually, from certain
networks, we believe.
• Their written analysis frequently says “the
Internet was slow in Philadelphia”, or “the
Internet was slow in Pittsburgh because of
under-provisioning of fiber to Pittsburgh”.
Issues with the Keynote Test
• Little scientific method
– Machines are not all similar, much less
identical.
– Questionable statistics handling.
– Some people have special 10k files, some don’t.
• Claim is “network performance is a much
bigger factor than web server performance”
• No data about how no-responses are
counted.
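
Why the no-response question matters can be shown with a minimal sketch. The sample times, the 30-second cutoff, and the function names below are all hypothetical; nothing here reflects how Keynote actually counts failures, which is exactly the point.

```python
from statistics import mean

TIMEOUT_S = 30.0  # hypothetical cutoff; not a published Keynote value

# Made-up fetch times in seconds; None marks a query that got no response.
samples = [0.8, 1.2, None, 0.9, None, 1.1]

def mean_dropping_failures(times):
    """Average only the queries that completed."""
    return mean([t for t in times if t is not None])

def mean_charging_timeout(times, timeout=TIMEOUT_S):
    """Average with every no-response counted as a full timeout."""
    return mean([timeout if t is None else t for t in times])

def loss_rate(times):
    """Fraction of queries that got no response."""
    return sum(t is None for t in times) / len(times)
```

With these made-up samples the same server looks like a roughly 1-second site under the first rule and a roughly 10.7-second site under the second, while a third of its queries failed either way.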
Issues with Keynote Study, ctd.
• “The Internet is slow in Pittsburgh” - not
how the Internet is architected. The
backbone that a site is on is much more
important than what city...
• MEASURING BACKWARDS!
• The Keynote study may be able to show
how good it is to be hosted on a given
network, but not how good it is to be a
browser.
Issues with Keynote, ctd.
• Study always delayed 3-6 months - it’s a 3-month study released 3 months later. No
real-time (i.e. useful) data.
• Strange padding of numbers by multiplying
by factors that are not well-defined, so people
can’t correlate them with real-world performance.
• Unwillingness to release sampling code
when validity challenged.
Keynote Study as SAT
• With the SAT, a highly positive score is a
good indicator of high potential
<something>. A negative score isn’t a good
indicator of anything in particular.
• The Keynote study is the SAT of Internet
measurement - a highly positive score is a
good indicator but a negative score could
mean many things.
Beating Keynote
• Many providers spend lots of time trying to
beat the Keynote test - dedicated servers,
special 10k files, replicated servers, special
peering, and announcing a Keynote-only
web server IP/route with different connectivity.
• We were going to deploy 15 optimized
Sparc 1+s running Linux, with a specialized
web server in the kernel that pre-computes
the 10 response packets. So we said “since
we have all of these machines...”
The netperf.net Study
• We assert that the main issue is what
network one is on, not what city one is in.
• Scientific method is important.
• Plan - put 2 machines on every network, or
on a SINGLE-homed customer of that
network. All are 16MB/200MB Sparc 1+s.
• Each Query machine queries all remote
Responder machines.
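
The query plan can be sketched as a list of (querier, responder) pairs. This assumes one Query machine and one Responder machine per network, and reads "remote" as excluding the same-network pair; the network labels and the function name are hypothetical.

```python
from itertools import product

# Hypothetical network labels, one Query/Responder pair assumed per network.
networks = ["net-a", "net-b", "net-c", "net-d"]

def query_plan(nets):
    """Return (querier_network, responder_network) pairs, skipping same-network pairs."""
    return [(q, r) for q, r in product(nets, repeat=2) if q != r]

plan = query_plan(networks)
```

For N networks this gives N x (N - 1) measurements per round, so the load on each machine grows linearly as networks are added.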
The Study’s Products
• Our short-term goal is to have a 30-minute
delayed NxN matrix set:
– One of UDP packet loss (not pings)
– One of TCP session establishment (10-byte
HTTP request)
– One of 1k-HTTP requests
– One of 10k-HTTP requests
• Longer-term, URL performance monitoring.
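
A rough sketch of what one cell of each matrix might measure, using only the Python standard library. This is not the project's sampling code: the function names, timeouts, and HTTP/1.0 request are assumptions, and the handshake timer only approximates the "10-byte HTTP request" measurement named above.

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Seconds to complete the TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None

def http_fetch_time(host, port, path, timeout=5.0):
    """Return (seconds, bytes received) for one HTTP/1.0 GET, or (None, 0) on failure."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            received = 0
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # server closed the connection: response complete
                    break
                received += len(chunk)
        return time.perf_counter() - start, received
    except OSError:
        return None, 0

def udp_loss(sent, received):
    """Fraction of UDP probes that were never answered."""
    if sent == 0:
        raise ValueError("no probes sent")
    return (sent - received) / sent
```

The 1k and 10k matrices would come from calling http_fetch_time against files of those sizes on each Responder. Failures are returned as None rather than dropped, so the caller has to decide explicitly how no-responses are counted.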
The Study’s Products (ctd)
• Backbone measurement data will be semi-real-time and free.
• We may charge for URL performance
monitoring. The raw data shows both “how
good to be a server” and “how good to be a
browser” on a network.
The Study’s Products (ctd)
• Main goal, though - to provide useful data
to network operators.
Current Status
• 9 pairs of machines are deployed; data is
coming in.
• 10 more pairs will be sent out by the end of
June.
• Many backbones have welcomed such a
study - Concentric, Savvis, Epoch, IBM, ...
Current Status (ctd)
• Still looking for PSI, ANS, AGIS, Digex.
• Offering reciprocal collo to single-homed
ISPs.
• Still have software work (to subtract out
congested sites).
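
One possible way to subtract out congested sites, offered only as a sketch since the slides do not say how this will be done. It assumes that a Responder showing high loss from most Queriers has a local problem rather than an inter-provider one; the 10% threshold, the median rule, the sample numbers, and all names are hypothetical.

```python
from statistics import median

# loss[q][r] = packet-loss fraction from querier q to responder r (made-up numbers).
loss = {
    "net-a": {"net-b": 0.01, "net-c": 0.20},
    "net-b": {"net-a": 0.00, "net-c": 0.25},
    "net-c": {"net-a": 0.02, "net-b": 0.01},
}

def congested_responders(matrix, threshold=0.10):
    """Responders whose median loss across all queriers exceeds the threshold."""
    by_responder = {}
    for row in matrix.values():
        for responder, value in row.items():
            by_responder.setdefault(responder, []).append(value)
    return {r for r, vals in by_responder.items() if median(vals) > threshold}

def without_sites(matrix, excluded):
    """Copy of the matrix with excluded sites dropped as querier and as responder."""
    return {
        q: {r: v for r, v in row.items() if r not in excluded}
        for q, row in matrix.items()
        if q not in excluded
    }

bad = congested_responders(loss)
cleaned = without_sites(loss, bad)
```

Here net-c is flagged because both other networks see heavy loss toward it, and the cleaned matrix keeps only the net-a and net-b cells.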
Contact Info
[email protected]