Presentation (PowerPoint File)

Download Report

Transcript Presentation (PowerPoint File)

On the Geographic Location of Internet
Resources
or
Where on Earth is the Internet?
Mark Crovella
Boston University Computer Science
with
Anukool Lakhina, John Byers, and Ibrahim Matta
Some observations about the Internet
• Rapid, decentralized growth:
– 90% of Internet systems were added in the last four
years
– Connecting to the network can be a purely local operation
• This rapid, decentralized growth has opened
significant questions about the physical structure
of the network; e.g.,
–
–
–
–
–
The number of hosts connected to the network
The properties of network links (delay, bandwidth)
The interconnection pattern of hosts and routers
The interconnection relationships of ISPs
The geographic locations of hosts, routers and links
Internet Science
• Engineering or Science?
Engineering: study of things made
Science: study of things found
• Although the Internet is a engineered
artifact, it now presents us with questions
that are better approached from a
scientific posture
For example: where is the Internet?
?
?
?
?
?
What is the relationship between geography and the network?
Why does this matter?
• Our motivation is in developing better
generators for “representative network
topologies”
– Many simulation based results in networking
depend critically on network topology
– Topology generation is still fairly ad hoc
• However, there is a scientific goal as well!
– We need to understand what drives Internet
growth
– Basic investigations will pay off in unexpected
ways
Assumptions and Definitions
• We will treat the Internet as an undirected
graph embedded on the Earth’s surface
– Nodes correspond to routers or interfaces
– Edges correspond to physical router-router links
– Not concerned with hosts (end systems)
• We will ignore many higher and lower level
questions
– Autonomous systems
– Link layers
Some Basic Questions
• What is the relationship between
population and network geometry?
• What effect does distance have on
network topology?
Our Basic Approach
• Obtain IP-level router maps
Mercator and Skitter
• Find geographic locations of those routers
Ixia’s IxMapping service
Mercator: Govindan et al., USC/ISI, ICSI
• Based on active probing from a single site
• Resolves aliases
• Uses loose source routing to explore alternate
paths
Skitter: Moore et al., CAIDA
• Traceroutes from 19 monitors to large set of
destinations
• Does not resolve aliases
• Destinations attempt to cover IP address space
Datasets
Mercator
• Collected August 1999
• 228,263 routers
• 320,149 links
Skitter
• Collected January 2002
• 704,107 interfaces
• 1,075,454 links
IxMapping: Moore et al., CAIDA
• Given an IP address, infers geographic location
based on a variety of heuristics
– Hostnames, DNS LOC, whois
e.g., 190.ATM8-0-0.GW3.BOS1.ALTER.NET is in Boston
• Able to map
– 99% of Mercator routers
– 98.5% of Skitter interfaces
• Similar to GeoTrack [Padmanabhan] which exhibits
reasonable accuracy
– Median error of 64 mi
– 90% queries within 250 mi
– for well connected nodes
Where are the routers?
USA
Europe
Routers and People: World
(Grid size: ~150 mi x 150 mi)
Ugh!
People Per Interface (Skitter)
Africa
Pop
Intfs
(M)
(K)
837
8,379
People Online Online
/ Intfc (M)
/ Intfc
100,011
4.1
495
S. America
341
10,131 33,752
21.9
2,161
Mexico
154
4,361 35,534
3.4
784
W. Europe
366
95,993
3,817
143.1
1,489
Japan
136
37,649
3,631
47.1
1,250
18
18,277
975
10.0
552
299 282,048
1,061
166.1
588
5,653 563,521 10,032
513.2
910
Australia
USA
World
Interfaces and People: USA, Skitter
Grid size: ~90 mi x 90 mi
Routers and People
Upper, Mercator; Lower, Skitter
USA
Europe
Japan
Router Location: Summary
• Router location is strongly driven by both
population density and economic
development
• Superlinear relationship between router
and population density:
R  k Pa
k varies with economic development (users online)
a is greater than one
• More routers per person in more densely populated
areas
Models for Network Topology
1988
1996
1999
Spatial Models
Structural Models
Degree-based Models
Spatial model: Waxman, 1988
• Nodes are distributed randomly (uniformly)
in the plane.
• Probability that two nodes separated by
distance d are connected:
P[C|d] =  exp(-d/L)
0  ,   1; L = diameter of region
: degree of distance sensitivity
: edge density
• A spatially imbedded random graph
Structural Models
• Real networks have structure
– Always connected!
– Formed by interconnection of component
networks
– Distinction between transit and stub networks
imposes a hierarchy on resulting graph
• Tiers: Doar, 1996
• GT-ITM: Calvert, Doar, Zegura, 1997
Degree-based Models
• Faloutsos, Faloutsos, & Faloutsos, 1999:
– Empirically measured networks show a power
law in degree distribution:
P[node has degree d] = k d-a
• Barabasi & Albert, 1999:
– This property will be present in a graph where:
• Nodes and links are added incrementally
• Probability of connecting to a node is proportional to
its degree (preferential connectivity)
• BRITE: Medina, Matta, Byers, 2001
Empirical Evidence
• Interested in influence of distance on link
formation:
f(d) = P[C|d]
i.e., Probability two nodes separated by
distance d are connected
• Estimated as:
number links of length d
f(d) = ------------------------------------------number of router pairs separated by d
f(d) for USA (Skitter)
Distance Sensitive
Distance Insensitive
Link Distance Preference for USA
Skitter, d < 250, semi-log plot
L  140 mi.
Link distance preference: all regions
Upper, Mercator; Lower, Skitter; small d
USA
L  140 mi
Europe
L  80 mi
Japan
L  140 mi
Large d: distance insensitivity
USA data, Skitter
d
F(d) =  f(u)
u=1
Distance insensitivity, all regions
Upper, Mercator; Lower, Skitter; large d
USA
Europe
Japan
Limits to distance sensitivity
Mercator
Skitter
Limit
USA
% Links
Limit
% Links
< Limit
< Limit
820 mi.
82.1% 818 mi.
77.2%
Europe 383 mi.
97.3% 366 mi.
95.4%
Japan
91.5%
92.8%
165 mi.
116 mi.
Link Formation: Summary
• Link formation seems to be a mixture of
distance-dependent and –independent
processes
• Waxman (exponential) model remarkably
good for large fraction of all links!
– But, crucial difference is that we are using a
very irregular spatial distribution of nodes
• Small fraction of non-local links are very
important (structural)
Generating Topologies: a new recipe
1. Good models for population density exist
–
–
–
CIESIN’s Gridded Population of the World
2.5 arc minutes (<5 mi2) … very high quality
Time trends also available
2. Router density then follows from
population relationship
3. Link formation driven by hybrid process
–
Distance-dependent and –independent
Related Work
• Matrix.net:
– Uses DNS hostname allocations
– proprietary location methods
• Akamai
– Extensive peering and measurement infrastructure
• Padmanabhan and Subramanian, 2000:
– assessed accuracy of geographic mapping techniques
• Yook, Jeong, and Barabasi, 2001:
– Similar goals
– Find linear (not exponential) distance dependence
Final Thoughts
• The Internet has
fully
interpenetrated
human society
• Scientific
understanding of
the net is essential
• Applications:
–
–
–
–
Simulation
Security
Reliability
Planning
Thanks!
• CAIDA:
– David Moore
– k claffy
– Andre Broido
• Notre Dame:
– Lazslo Barabasi
• USC:
– Ramesh Govindan
– Hongsuda Tangmunarunkit
Routers and People: North America
Subdividing the Data
N. US
S. US
C. A.
Economic Heterogeneity
What Influences the Formation of Links?
• Waxman, 1988: spatial model
• Zegura et al., 1997: structural model
– Explicitly captures hierarchical structure
• Barabasi et al., 1999: degree-based
preferential connectivity
– Matches observed power-law node degree
– Inspired by Faloutsos et al., 1999
• Strogatz & Watts, 1998: small-world
properties
– Captures “six degrees of separation”
Routers and Economics
Matrix.net: hosts
Mercator: routers
Penultimate Geographic Map, Oct 1980
Last Complete Geographic Map, Aug 1982