slides - The Fengs

Download Report

Transcript slides - The Fengs

Internet Tomography and
Geography
What is this area all about?
Related work in the area
Main Paper



WEBMAPPER’s features
How WEBMAPPER works
WEBMAPPER results summarized
Internet Tomography /
Geography
Where, geographically, are my clients located
(or servers)?
What’s the best server for me to get content
from?
Given two IP’s, what’s the latency between
them?
What IP’s all come from the same geographic
location?
Generally, show me a complete and accurate
map of the Internet
Related Work
B. Krishnamurthy and J. Wang, “On network-aware clustering of
webclients,” in Proceedings of SIGCOMM ’00, August 2000




Problem: Which IP’s are the same
Does BGP-table based clustering
Groups IP’s by common administrative control
Passive
S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, “On the
placement of intenet instrumentation,” in Proceedings of
INFOCOM ’00, March 2000, pp. 295–304.



Problem: What’s latency between 2 IP’s
Clusters the internet based on BGP address prefixes
Places “Tracers” everywhere in the Internet that actively ping each
other
Related work
An Investigation of Geographic Mapping
Techniques for Internet Hosts



Padmanabhan, Subramanian, SIGCOMM 2001
Problem: geo-locate from IP
3 solutions
 GeoTrack: infer location by DNS / traceroute
 GeoPing: probe target from known locations, triangulate
[active]
 GeoCluster: do passive BGP, IP-prefix based clustering
Related work
Predicting Internet Network Distance with
Coordinate-Based Approaches




Problem: Given 2 IP’s, what’s latency
Ng, Zhang, CMU, INFOCOM 2002
Active network of landmarks, pinging each other
Better math model (GNP) for interpolating
distance than IDMaps
Proprietary Solutions (Akamai, etc)

Problem: What’s the closest server to an IP?
Clustering and Server
Selection Using Passive
Monitoring
M. Andrews, B. Sheperd, A.
Srinivasan, P. Winkler, F. Zane
(Bell Labs), INFOCOM 2002
Problem: Server Selection
Given a client, and a set of servers, all
with identical content, tell me the “best”
server
For the client, “best” means lowest
latency or highest throughput
For the whole system, it may mean
something else
What is Passive Monitoring?
Content Servers don’t ping each other
Content Servers don’t ping clients
No pinging is done: no additional
network traffic is introduced
Instead, servers record the Round-Trip
Time of TCP handshake from a client
WEBMAPPER
WEBMAPPER: Clusterer
Clusterer uses the client-server latency pairs
reported by the servers
Determines which address prefixes
correspond to the same network location, and
which don’t
Does more than just find the closest server to
each cluster; assigns probabilities to each to
balance the network flow
WEBMAPPER Output
Client
cluster
Content
server 1
Content
server 2
Content
server 3
25.135.64.0/19 0.9
0.05
0.05
132.0.0.0/8
0
0.15
0.85
WEBMAPPER: Big Tree
Giant Binary Tree of all IP addresses
Well, not all… assume last 8 bits always
clustered
Root of tree: 0.0.0.0/0


Children 0.0.0.0/1, 128.0.0.0/1
Leaves 123.123.123.123/24
WEBMAPPER: Big Tree
Leaves also store



Sum of recorded distances (per server)
Squared sum of recorded distances
Number of recorded distances
Leaf data is periodically aged
exponentially (I = I * 0.9)
WEBMAPPER: Small Tree
Clusters are formed from big tree by
folding children into their parents when
they’re “similar” to their siblings
Uses statistical test to determine
“similar” (two-sample t-test)
Threshold based
Assigning Clusters to Servers
Assigning servers to clusters is
complicated

1. Testing Index
 Futzes with the probability of a cluster being
assigned to a server
 Based on multi-armed bandit solution
Assigning Clusters to Servers
Assigning Clusters to Servers
Calculate server capacity
Each cluster assignment has a latency
cost
Try to minimize the total cost without
breaking any server’s capacity
Graph theory to the rescue: min-cost
flow
Results
Experiment 1:




28 day log of busy web traffic
Recorded client IP, time, RTT
Clustering produced 17,270 clusters
Here are 12 example clusters for high
traffic days
Results
Experiment 2:



Set up a west-coast and east-coast server
Force clients to download something from
each (to get actual measurments)
Have clients download something from the
“either-or” server (WEBMAPPER powered)
Opinions
Statistics and weighing formulae are
cool
What’s a good way to tell if the
clustering is any good aside from eyeing
samples?
Two server test is all we can get out of
Bell Labs?