On the Power of Off-line Data in Approximating Internet Distances
Danny Raz
Technion - Israel Institute of Technology
and
Prasun Sinha
Bell Labs, Lucent Technologies
Outline
• Internet Distance
• Off-line metrics
– Geographic distance, # hops, # AS, depth
• Linear regression for Internet distance estimation
• Multi-variable linear regression
• Accuracy of picking closest mirror site
• The next step
Internet Distance
• Internet Distance: one-way delay between hosts
• Components of Internet Distance
– Dynamic
• Server Load
• Network Congestion / Router Load
– Static
• Propagation delay over the links
• Router processing delay
• Edge-router processing delay
Goal: To study the power of estimating the Static Internet Distance using off-line metrics
Importance of Internet Distance Estimation
• Picking closest mirror-site/cache
• For use in Content Distribution Networks
Approaches
• Dynamic
– Dynamic probing [Dykes et al. Infocom ’00]
– Passive monitoring [Andrews et al. Infocom ’02]
• Static
– Semi-active probing (IDMAPs) [Jamin et al. Infocom ’00]
• Other relevant work:
– Geographic Distance and RTT: [Padmanabhan Sigcomm ’02]
Static Internet Distance
[Figure: three autonomous systems (AS #1, AS #2, AS #3) with core routers and edge routers; AS: Autonomous System]
• Propagation delay: geographical distance
• Router processing delay: # hops
• Edge-router processing delay: # AS
Static Internet Distance = geo-distance + hop-count + AS-count ?
Data Collection
• Clients: 2500 public libraries in the US
• Servers (mirrors/caches): 8 traceroute locations in the US
• The location (latitude, longitude) is known for every host.
• For every client-server pair
– Run multiple (10) traceroutes
– Pick the traceroute result with the smallest RTT
– Compute
• Geo-distance: based on latitude and longitude
• Hop-count: from traceroute
• AS-count: from traceroute, based on names of routers and IP address prefixes
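The per-pair steps above can be sketched as follows. This is a minimal sketch, not the authors' tooling: the slides do not say how geo-distance was computed, so the haversine formula, the Earth radius constant, and the names geo_distance_km and best_traceroute are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def best_traceroute(runs):
    """Return the run with the smallest RTT.

    `runs` is a list of (rtt_ms, hop_count) tuples, one per traceroute run;
    this tuple layout is hypothetical.
    """
    return min(runs, key=lambda run: run[0])
```

For a client-server pair, best_traceroute would be applied to the 10 runs, and its RTT and hop count paired with geo_distance_km of the two host locations.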
Linear Regression
(Geo-distance and Hop-count)
minRTT vs. Geo-distance
SE (Std. Error) = 26.93
minRTT vs. Hop-count
SE (Std. Error) = 25.71
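A minimal sketch of a single-variable fit like the ones plotted above. It assumes that "SE" means the residual standard error of an ordinary least-squares fit, which the slides do not define. The name fit_line is illustrative.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y ~ a + b*x.

    Returns (a, b, se), where se is the residual standard error
    computed with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))
    return a, b, se
```

Here x would hold one metric (geo-distance or hop-count) per client-server pair and y the corresponding minRTT.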
Multiple Linear Regression
(Multiple metrics)
minRTT vs. Geo-distance, Hop-count
SE = 21.52
minRTT vs. Geo-distance, AS-count
SE = 23.80
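A minimal sketch of a multi-variable fit, using only the standard library. It solves the normal equations by Gaussian elimination, which is an assumption of this sketch and not necessarily how the authors computed their fits. The names solve and fit_multi are illustrative.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multi(rows, y):
    """Least-squares fit y ~ c0 + c1*x1 + c2*x2 + ...

    `rows` is a list of predictor tuples, e.g. (geo_distance, hop_count).
    Returns (coefficients, se), with se the residual standard error
    computed with n - k degrees of freedom (k = number of coefficients).
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    Xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    coeffs = solve(XtX, Xty)
    sse = sum((yi - sum(c * v for c, v in zip(coeffs, xr))) ** 2
              for xr, yi in zip(X, y))
    se = math.sqrt(sse / (len(y) - k))
    return coeffs, se
```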
minRTT = geo-distance + hop-count + AS-count ?
Term           Coefficient   p-value
Geo-distance   12.53         <0.0001
Hop-count      2.45          <0.0001
AS-count       -0.64         0.0387
• High correlation between hop-count and AS-count (higher than for any other pair of metrics)
• Hop-count and AS-count should not be used together
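A minimal sketch of how the correlation between two metrics could be checked before using them together. Strongly correlated predictors tend to make regression coefficients unstable, which may be related to the negative AS-count coefficient above. The slides do not say which correlation measure was used; the Pearson coefficient and the name pearson are assumptions of this sketch.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)
```

Here u and v would hold, for example, the hop-count and AS-count of every client-server pair; a value close to 1 signals that the two metrics carry largely the same information.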
A new Off-line metric: Depth
• Hop-count: requires dynamic probing
• Introduce an alternate metric: Depth
– Average hop-count to the nearest backbone network (a hand-made list of 30 big core networks)
– Constant per host (client/server)
– Alternatively, measure in units of time rather than hops
– (Client depth + Server depth) as a metric
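A minimal sketch of the Depth metric as described above. The slides do not give the exact procedure, so this assumes that each traceroute path is already mapped to a list of network names per hop, and that depth is the average number of hops until a path first enters a network on the backbone list. The names hops_to_backbone, depth, and pair_depth are illustrative.

```python
def hops_to_backbone(path, backbone_networks):
    """Hops until `path` first enters a backbone network, or None if it never does.

    `path` is a list of network names, one per hop, starting at the host.
    """
    for hop_index, network in enumerate(path, start=1):
        if network in backbone_networks:
            return hop_index
    return None


def depth(paths, backbone_networks):
    """Average hop-count to the nearest backbone network over a host's paths."""
    counts = [hops_to_backbone(p, backbone_networks) for p in paths]
    counts = [c for c in counts if c is not None]
    if not counts:
        return None  # no path reached a backbone network
    return sum(counts) / len(counts)


def pair_depth(client_depth, server_depth):
    """Combined metric for a client-server pair: client depth + server depth."""
    return client_depth + server_depth
```

Because depth is computed once per host, the combined metric for any new client-server pair needs no probing between the two hosts.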
Linear Regression
(Depth)
minRTT vs. Depth
SE = 41.02
minRTT vs. Depth and Geo-distance
SE = 24.52
Squared Errors in Estimating minRTT
Metric                    SE (Standard Error)
Geo-distance, Hop-count   21.52
Geo-distance, AS-count    23.80
Geo-distance, Depth       24.52
Hop-count                 25.71
Geo-distance              26.93
Depth                     41.02
Accuracy of picking the nearest mirror site
Allowed Delta   Random   Geodistance   Hop-count   Geodistance, Hop-count   Geodistance, Depth
0               12.50%   37.84%        44.32%      38.41%                   33.98%
10ms            21.15%   53.07%        58.98%      55.91%                   50.45%
20ms            33.75%   73.18%        76.70%      74.89%                   70.91%
30ms            46.25%   90.91%        88.75%      91.36%                   89.43%
880 clients and 8 servers
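A minimal sketch of how such an accuracy figure could be computed. The slides do not define "Allowed Delta"; this assumes a pick counts as correct when the measured minRTT of the chosen server is within the allowed delta of the best measured minRTT for that client. The name selection_accuracy and the data layout are illustrative.

```python
def selection_accuracy(measured, estimated, allowed_delta_ms):
    """Fraction of clients for which the estimated-best server is close enough.

    measured[c][s]  : measured minRTT (ms) from client c to server s
    estimated[c][s] : estimated distance from client c to server s
                      (any metric, only the ordering matters)
    """
    hits = 0
    for rtts, estimates in zip(measured, estimated):
        chosen = min(range(len(estimates)), key=lambda s: estimates[s])
        if rtts[chosen] - min(rtts) <= allowed_delta_ms:
            hits += 1
    return hits / len(measured)
```

With 8 servers and a delta of 0, a uniformly random pick is correct one time in eight, which matches the 12.50% in the Random column.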
Summary
• Combination of hop-count and geographic distance improves over individual metrics (SE 21.52 vs. 25.71 and 26.93)
• Using Depth along with Geo-distance improves over Geo-distance alone (SE 24.52 vs. 26.93) and is completely off-line
• For closest mirror selection with 30 ms allowed deviation, almost any metric gives about 90% accuracy
Is there much room to improve?
The Next Step
• Global Data
– Collection and analysis of data based on clients and servers spread across the globe
• Using both off-line and on-line estimation
– Techniques to combine the power of off-line estimation with on-line estimation