Landmarks - Sándor Laki
Download
Report
Transcript Landmarks - Sándor Laki
Geolocation by IP address
Locating Internet hosts
Sándor Laki
[email protected]
http://lakis.web.elte.hu
Sándor Laki (C) Geolocation by IP
address
1
Outline
RADAR a wireless solution
IP2Geo on the Internet
Constraint-Based Geolocation
(GeoLim)
OCTANT framework
Topology-Based Geolocation
Sándor Laki (C) Geolocation by IP
address
2
RADAR
Sándor Laki (C) Geolocation by IP
address
3
RADAR, a wireless approach
Focus on the indoor environment
GPS does not work indoors
Dedicated technologies
Goals
Leverage existing infrastucture
Use wireless LAN
Software solution
Better scalability and lower cost than dedicated
technology
Sándor Laki (C) Geolocation by IP
address
4
RADAR
Key idea: Signal strength matching
Offline calibration
Construct radio map (<location, Sstr>)
Real-time location and tracking
Extract SStr from beacons
Find table entry that best matches the
measured SStr
Sándor Laki (C) Geolocation by IP
address
5
RADAR: Determine location
Find nearest neighbor in signal space
(NNSS)
1st solution
Physical position of NNSS gives the user
location
2nd solution
K-NNSS
Average the coordinates of k nearest
neighbor gives the wanted position
Sándor Laki (C) Geolocation by IP
address
6
Correlation between physical
location and signal strength
•Base system:
•INFOCOM 2000 paper
•Enhanced system:
•Microsoft Technical Report MSR-TR-2000-12
Sándor Laki (C) Geolocation by IP
address
7
IP2Geo
Single-point localization
Sándor Laki (C) Geolocation by IP
address
8
IP2Geo - Motivation
Much focus on location-aware services in wireless
and mobile contexts
Such services are relevant in the Internet context
too
targeted advertising
event notification
territorial rights management
network diagnostics
It is a challenging problem
IP address does not inherently contain an indication
of location
Sándor Laki (C) Geolocation by IP
address
9
IP2Geo
Multi-pronged approach that exploits various “properties” of
the Internet
DNS names of router interfaces often indicate location
network delay tends to correlate with geographic distance
GeoTrack
determine location of closest router with a recognizable DNS name
GeoPing
hosts that are aggregated for the purposes of Internet routing also
tend to be clustered geographically
use delay measurements to estimate location
GeoCluster
extrapolate partial (and possibly inaccurate) IP-to-location
mapping information using BGP prefix clusters
Sándor Laki (C) Geolocation by IP
address
10
GeoTrack – main idea
Extract geographical information from DNS
names of routers on the path
Localizes the target to the last router whose
position is known
Example
ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle
184.atm6-0.xr2.ewr1.alter.net => New York
dnvr-scrm.abilene.ucaid.edu => Denver
Sándor Laki (C) Geolocation by IP
address
11
GeoTrack
GeoTrack operation
do a traceroute to the target IP address
determine location of last recognizable router along the path
Key ideas in GeoTrack
partitioned city code database to minimize chance of false match
ISP-specific parsing rules
delay-based correction
Limitations
routers may not respond to traceroute
DNS name may not contain location information or lookup may
fail
target host may be behind a proxy or a firewall
Sándor Laki (C) Geolocation by IP
address
12
GeoPing - Delay based
localization
Delay-based triangulation is conceptually simple
delay to distance
distance from 3 or more non-colinear points =>
target location
But there are practical difficulties
network path may be circuitous
transmission & queuing delays may corrupt delay
estimate
OWD is hard to measure
OWD ≠ RTT/2 because of routing asymmetry
Sándor Laki (C) Geolocation by IP
address
13
GeoPing - details
Measure the network delay to the target host from several
geographically distributed probes
typically more than 3 probes are used
round-trip delay measured using ping utility
small-sized packets => transmission delay is negligible
pick minimum among several delay samples
Nearest Neighbor in Delay Space (NNDS)
akin to Nearest Neighbor in Signal Space (NNSS) in RADAR
construct a delay map containing (delay vector,location) tuples
given a vector of delay measurements, search through the delay
map for the NNDS
location of the NNDS is our estimate for the location of the target
host
More robust that directly trying to map from delay to distance
Sándor Laki (C) Geolocation by IP
address
14
GeoPing – Delay tends to increase
with geographic distance
Sándor Laki (C) Geolocation by IP
address
15
GeoPing – Estimation error
Sándor Laki (C) Geolocation by IP
address
16
GeoCluster
A passive method
Sándor Laki (C) Geolocation by IP
address
17
GeoCluster
A passive technique unlike GeoTrack and GeoPing
Basic idea:
breaks the IP address space into clusters
assign a geographical location to each cluster based on
IP-to-location third party databases
given a target IP address, first find the matching cluster
using longest-prefix match.
location of matching cluster is our estimate of host
location
Sándor Laki (C) Geolocation by IP
address
18
GeoCluster
Example:
consider the cluster 128.95.0.0/16 (containing 65536
IP addresses)
suppose we know that the location corresponding to
a few IP addresses in this cluster is Seattle
then given a new address, say 128.95.4.5, we
deduce that it is likely to be in Seattle too
Sándor Laki (C) Geolocation by IP
address
19
GeoCluster – Clustering IP
addresses
Exploit the hierarchical nature of
Internet routing
inter-domain routing in the Internet uses
the Border Gateway Protocol (BGP)
BGP operates on address aggregates
we treat these aggregates as clusters
in all we had about 100,000 clusters of
different sizes
Sándor Laki (C) Geolocation by IP
address
20
IP-to-location mapping
Data sources
e-mail service, business web-hosting companies,
etc.
requires a large, fine-grain and fresh database!
Information
partial information (i.e., only for a small subset
of addresses)
possibly inaccurate (e.g., manual input from
user)
Sándor Laki (C) Geolocation by IP
address
21
Extrapolating IP-to-location
mapping
Determine location most likely to correspond to a cluster
majority polling
“average” location
dispersion is an indicator of our confidence in the location estimate
What if there is a large geographic spread in locations?
some clusters correspond to large ISPs and the internal
subdivisions are not visible at the BGP level
sub-clustering algorithm: keep sub-dividing clusters until there is
sufficient consensus in the individual sub-clusters
some clients connect via proxies or firewalls (e.g., AOL clients)
sub-clustering may help if there are local or regional proxies
otherwise large dispersion => no location estimate made
many tools fail in this regard
Sándor Laki (C) Geolocation by IP
address
22
Performance of GeoCluster
Median errors:
GeoCluster ~30km
GeoPing ~300km
GeoTrack ~100km
Sándor Laki (C) Geolocation by IP
address
23
Other database-oriented
applications
NetGeo and IP2LL
Quova
based on WHOIS DB
not closely regulated
the address information often indicates the
head office of the owner which may be far from
the actual target
Commercial service with thier own database
Gtrace
using DNS LOC entries
Sándor Laki (C) Geolocation by IP
address
24
Octant framework
A very impressive solution
Sándor Laki (C) Geolocation by IP
address
25
Octant overview
Combine very different techniques
Active and passive
Constraint-based
Weighted positive and negative constraints
Constraint => region
Using Bézier-regions
Efficient implementations of clipping and union
operations are available
Sándor Laki (C) Geolocation by IP
address
26
Octant - Notations
bi : the region in which the target
node is located
gj : a constraint
It is a region where the node might be
reside associated with weight
Set of nodes
Landmarks: physical locations are at
least partially known (Lj)
Every Lj has an „estimated” location bLj
Sándor Laki (C) Geolocation by IP
address
27
Octant – Landmarks and
constraints
Primary landmark
GPS, street address
Low error
Secondary landmark
Position computed by Octant itself
Positive constraints ( set: )
Node A is within d miles of Lk
g = (x,y) in bk c(x,y,d), where c(x,y,d) is a disc.
Negative constraints( set: )
Node A is further than d miles from Lk
g = (x,y) in bk c(x,y,d)
Sándor Laki (C) Geolocation by IP
address
28
Estimated location
bi = XiXi \ XiXi
Sándor Laki (C) Geolocation by IP
address
29
Mapping latencies to distances
Latency between a target and a landmark
bounds thier maximum distance
Calculate with speed of light
delay*2/3*c
Low precision
Octant’s way
Dynamic calibration
For each L landmark compute two bounds RL(d) and
rL(d)
where d is the ping time of node i
rL(d) || loc(L) – loc(i) || RL(d)
When queuing delays are dominant then rL(d) = 0.
Sándor Laki (C) Geolocation by IP
address
30
Mapping latencies to distances
Each landmark periodically pings all other landmarks =>
creating a correlation table
Determines the convex hull around the points => R(d)
and r(d)
It is sufficient when the target has a direct and
congestion-free path to the landmark
Octant introduce a cut off at latency p
a tunable percentage of landmark lie to the left of p
discard the others
(z is a fictitious datapoint,
placed far away)
Sándor Laki (C) Geolocation by IP
address
31
Mapping latencies to distances
Sándor Laki (C) Geolocation by IP
address
32
Last hop delays
Mapping is further complicated by queuing
and transmission delays associated with the
last hop
Cable and DSL connections
Overloaded PlanetLAB nodes
Goal: isolate the delay components which
artificially inflate latencies
Detailed maps of the underlying physical
network, as in network tomography (not in
Octant)
Octant introduce a simple metric called height
Sándor Laki (C) Geolocation by IP
address
33
Last hop delays in Octant
Based on pair-wise latency measurements
between landmarks
Primary landmarks: a, b, c
Measure thier latencies(RTT): [a,b], [a,c],
[b,c]
The positions of primary landmarks are
known -> we can estimate the transmission
delays: (a,b), (a,c), (b,c)
Lasthop delay(a,b) = [a,b] - (a,b)
Landmark coordinates (alon, alat),…
Sándor Laki (C) Geolocation by IP
address
34
Last hop delays in Octant
How much of the delays can be attributed to each
landmark?
Denoted by a’, b’ and c’ // height
Similarly, for a target t, we can compute t’, as an
estimation…
We can solve for
t’, tlon, tlat
Sándor Laki (C) Geolocation by IP
address
35
Last hop delays
tlon and tlat has relatively high error
not used in the later stages
Given the target and landmark heights
Each landmark can shift its
RL up if t’ < heights of the other landmarks
rL down if t’ > heights of the other landmarks
Sándor Laki (C) Geolocation by IP
address
36
Indirect routes
The preceding assumption
Route lengths are proportional to great circle
distances
not the case in practise, due to policy
routing
Example: a subscriber Ithaca, NY ->
Cornell Univ. (Ithaca)
Syracuse, NY -> Brockport, IL -> New York City
-> Cornell Univ.
~1 mile physical distance VS. ~800 miles length
path
Sándor Laki (C) Geolocation by IP
address
37
Indirect routes discovery
Landmark’s heigth can
indicate…
Localizing routers on the
network path
Secondary landmarks
Localization by latencies
Extract location from router
names
Reverse DNS lookup – undns
tool
Using ZIP code to determine
geographical location
Sándor Laki (C) Geolocation by IP
address
38
Handling uncertainty
Filter out errorneous constraint
Latency based constraints
Weight system that decreases
exponentially with increasing latency
Weight threshold
Sándor Laki (C) Geolocation by IP
address
39
Iterative refinement
Two phase:
First, we use accurate and mostly
conservative constraints
Second, less acurate and more
aggressive constraints to obtain a better
estimation (inside the initial estimated
region)
…
Sándor Laki (C) Geolocation by IP
address
40
Results
Sándor Laki (C) Geolocation by IP
address
41
Results
Sándor Laki (C) Geolocation by IP
address
42
Topology-based Geolocation
Sándor Laki (C) Geolocation by IP
address
43
Motivations
Problems with CBG
Use constraints that are less than speed of light
Risk of underestimates
When an underestimate occurs, the final region
does not contain the true location
Topology based geolocation
using the speed of light to generate constraints
inspired by Sensor Network Localization
Sándor Laki (C) Geolocation by IP
address
44
Summary of techniques
Traceroute from landmarks
Estimate hop latency
Increase structuring
Validate location hints
Improve accuracy
Cluster network interfaces
Map topology
Incorporate location hints
Constraint optimization
Geolocate targets
Sándor Laki (C) Geolocation by IP
address
45
Estimate hop latencies
Using traceroute tool to infer link
latency
Estimate hop latency from the
difference in RTT to adjacent routers
Accurate only if the link is traversed both
directions (symmetric routing)
How can we discover this property?
Three different techniques
Sándor Laki (C) Geolocation by IP
address
46
Estimate hop latencies
First, observing the reverse TTL values
Most routers initialize the TTL values for thier packets from a small set.
If TTL values changes significantly from one node to the next => discard the
link estimate
Second, measuring paths in both direction between pairs of landmarks
30,32,64,128,150,255
If both paths traverse a particular link => taking the differences of
measurements to the two endpoints
This estimation has high confidence
Third, increasing vantage points from which we probe a certain link
For every link on a path from a landmark we probes to both endpoints from
all other landmarks…
If these probes pass over the link => estimate for the link
Sándor Laki (C) Geolocation by IP
address
47
Clustering interfaces
Clustering interfaces that belong to the same router (IP
aliases)
Sándor Laki (C) Geolocation by IP
address
48
Clustering interfaces
Two IP-aliases techniques
Mercator
UDP probes are send to high-numbered ports on a set of
interfaces
Routers send back a port-unreachable ICMP message with the
source address
If two diff. interfaces replie with the same source address =>
aliases
Ally
Used on pairs of interfaces
Sends probes to the two if.
Examines the IP-ID
Most routers generate the IP-ID using a single counter that has
incremented after each packet has been created
Sándor Laki (C) Geolocation by IP
address
49
Validating location hints
DNS names -> locations
Some names are incorrect…
Missnamed, reconfig, reassignment of IP
addresses
Topology constraints can be used to verify
location hints
RTT measurements -> upper bounds…
Clustering -> aliases
Hop latencies
Sándor Laki (C) Geolocation by IP
address
50
Constraint optimization
Targets:X = {xi} Landmarks:L = {li}
Distance bw i and j: d(i,j)
Hard delay constraint
D(li xj) <= cij
Set of hdc = Cd
// cij Speed of light
Soft Link Latency Constraints
Hop latency bw i and j: hij
D(xi,xj) = hij + eij
Where eij is some error
Set of SLLC = Cl
Sándor Laki (C) Geolocation by IP
address
51
Constraint optimization
Minimize:
Subject to: Cd , Cl
Not a convex optimization problem
i,j in Cl|eij|
But we can recast it as a semidefinite
program
Using fast solvers
SeDuMi
Vivaldi
Sándor Laki (C) Geolocation by IP
address
52
Results
Sándor Laki (C) Geolocation by IP
address
53
Results
Sándor Laki (C) Geolocation by IP
address
54
References
RADAR and IP2Geo
http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf
Bernard Wong, Ivan Stoyanov and Emin Gün Sirer.
Octant: A Comprehensive Framework for the Geolocalization of Internet
Hosts.
In Proceedings of the Symposium on Networked System Design and
Implementation, Cambridge, Massachusetts, April 2007.
Katz-Bassett, E., John, J. P., Krishnamurthy, A., Wetherall, D., Anderson, T., and
Chawathe, Y.
Towards IP geolocation using delay and topology measurements.
In Proceedings of the 6th ACM SIGCOMM on internet Measurement (Rio de
Janeriro, Brazil, October 25 - 27, 2006). IMC '06. ACM Press, New York, NY, 7184.
Sándor Laki (C) Geolocation by IP
address
55