Landmarks - Sándor Laki

Download Report

Transcript Landmarks - Sándor Laki

Geolocation by IP address
Locating Internet hosts
Sándor Laki
[email protected]
http://lakis.web.elte.hu
Sándor Laki (C) Geolocation by IP
address
1
Outline

RADAR a wireless solution

IP2Geo on the Internet

Constraint-Based Geolocation


(GeoLim)

OCTANT framework
Topology-Based Geolocation
Sándor Laki (C) Geolocation by IP
address
2
RADAR
Sándor Laki (C) Geolocation by IP
address
3
RADAR, a wireless approach


Focus on the indoor environment

GPS does not work indoors

Dedicated technologies
Goals

Leverage existing infrastucture

Use wireless LAN

Software solution

Better scalability and lower cost than dedicated
technology
Sándor Laki (C) Geolocation by IP
address
4
RADAR

Key idea: Signal strength matching

Offline calibration


Construct radio map (<location, Sstr>)
Real-time location and tracking


Extract SStr from beacons
Find table entry that best matches the
measured SStr
Sándor Laki (C) Geolocation by IP
address
5
RADAR: Determine location


Find nearest neighbor in signal space
(NNSS)
1st solution


Physical position of NNSS gives the user
location
2nd solution


K-NNSS
Average the coordinates of k nearest
neighbor gives the wanted position
Sándor Laki (C) Geolocation by IP
address
6
Correlation between physical
location and signal strength
•Base system:
•INFOCOM 2000 paper
•Enhanced system:
•Microsoft Technical Report MSR-TR-2000-12
Sándor Laki (C) Geolocation by IP
address
7
IP2Geo
Single-point localization
Sándor Laki (C) Geolocation by IP
address
8
IP2Geo - Motivation



Much focus on location-aware services in wireless
and mobile contexts
Such services are relevant in the Internet context
too

targeted advertising

event notification

territorial rights management

network diagnostics
It is a challenging problem

IP address does not inherently contain an indication
of location
Sándor Laki (C) Geolocation by IP
address
9
IP2Geo

Multi-pronged approach that exploits various “properties” of
the Internet

DNS names of router interfaces often indicate location

network delay tends to correlate with geographic distance


GeoTrack


determine location of closest router with a recognizable DNS name
GeoPing


hosts that are aggregated for the purposes of Internet routing also
tend to be clustered geographically
use delay measurements to estimate location
GeoCluster

extrapolate partial (and possibly inaccurate) IP-to-location
mapping information using BGP prefix clusters
Sándor Laki (C) Geolocation by IP
address
10
GeoTrack – main idea



Extract geographical information from DNS
names of routers on the path
Localizes the target to the last router whose
position is known
Example

ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle

184.atm6-0.xr2.ewr1.alter.net => New York

dnvr-scrm.abilene.ucaid.edu => Denver
Sándor Laki (C) Geolocation by IP
address
11
GeoTrack



GeoTrack operation

do a traceroute to the target IP address

determine location of last recognizable router along the path
Key ideas in GeoTrack

partitioned city code database to minimize chance of false match

ISP-specific parsing rules

delay-based correction
Limitations



routers may not respond to traceroute
DNS name may not contain location information or lookup may
fail
target host may be behind a proxy or a firewall
Sándor Laki (C) Geolocation by IP
address
12
GeoPing - Delay based
localization

Delay-based triangulation is conceptually simple



delay to distance
distance from 3 or more non-colinear points =>
target location
But there are practical difficulties


network path may be circuitous
transmission & queuing delays may corrupt delay
estimate

OWD is hard to measure

OWD ≠ RTT/2 because of routing asymmetry
Sándor Laki (C) Geolocation by IP
address
13
GeoPing - details


Measure the network delay to the target host from several
geographically distributed probes

typically more than 3 probes are used

round-trip delay measured using ping utility

small-sized packets => transmission delay is negligible

pick minimum among several delay samples
Nearest Neighbor in Delay Space (NNDS)

akin to Nearest Neighbor in Signal Space (NNSS) in RADAR

construct a delay map containing (delay vector,location) tuples



given a vector of delay measurements, search through the delay
map for the NNDS
location of the NNDS is our estimate for the location of the target
host
More robust that directly trying to map from delay to distance
Sándor Laki (C) Geolocation by IP
address
14
GeoPing – Delay tends to increase
with geographic distance
Sándor Laki (C) Geolocation by IP
address
15
GeoPing – Estimation error
Sándor Laki (C) Geolocation by IP
address
16
GeoCluster
A passive method
Sándor Laki (C) Geolocation by IP
address
17
GeoCluster

A passive technique unlike GeoTrack and GeoPing

Basic idea:




breaks the IP address space into clusters
assign a geographical location to each cluster based on
IP-to-location third party databases
given a target IP address, first find the matching cluster
using longest-prefix match.
location of matching cluster is our estimate of host
location
Sándor Laki (C) Geolocation by IP
address
18
GeoCluster

Example:



consider the cluster 128.95.0.0/16 (containing 65536
IP addresses)
suppose we know that the location corresponding to
a few IP addresses in this cluster is Seattle
then given a new address, say 128.95.4.5, we
deduce that it is likely to be in Seattle too
Sándor Laki (C) Geolocation by IP
address
19
GeoCluster – Clustering IP
addresses

Exploit the hierarchical nature of
Internet routing

inter-domain routing in the Internet uses
the Border Gateway Protocol (BGP)

BGP operates on address aggregates

we treat these aggregates as clusters

in all we had about 100,000 clusters of
different sizes
Sándor Laki (C) Geolocation by IP
address
20
IP-to-location mapping

Data sources



e-mail service, business web-hosting companies,
etc.
requires a large, fine-grain and fresh database!
Information


partial information (i.e., only for a small subset
of addresses)
possibly inaccurate (e.g., manual input from
user)
Sándor Laki (C) Geolocation by IP
address
21
Extrapolating IP-to-location
mapping


Determine location most likely to correspond to a cluster

majority polling

“average” location

dispersion is an indicator of our confidence in the location estimate
What if there is a large geographic spread in locations?


some clusters correspond to large ISPs and the internal
subdivisions are not visible at the BGP level
sub-clustering algorithm: keep sub-dividing clusters until there is
sufficient consensus in the individual sub-clusters

some clients connect via proxies or firewalls (e.g., AOL clients)

sub-clustering may help if there are local or regional proxies

otherwise large dispersion => no location estimate made

many tools fail in this regard
Sándor Laki (C) Geolocation by IP
address
22
Performance of GeoCluster
Median errors:
GeoCluster ~30km
GeoPing ~300km
GeoTrack ~100km
Sándor Laki (C) Geolocation by IP
address
23
Other database-oriented
applications

NetGeo and IP2LL


Quova


based on WHOIS DB

not closely regulated

the address information often indicates the
head office of the owner which may be far from
the actual target
Commercial service with thier own database
Gtrace

using DNS LOC entries
Sándor Laki (C) Geolocation by IP
address
24
Octant framework
A very impressive solution
Sándor Laki (C) Geolocation by IP
address
25
Octant overview

Combine very different techniques



Active and passive
Constraint-based

Weighted positive and negative constraints

Constraint => region
Using Bézier-regions

Efficient implementations of clipping and union
operations are available
Sándor Laki (C) Geolocation by IP
address
26
Octant - Notations


bi : the region in which the target
node is located
gj : a constraint


It is a region where the node might be
reside associated with weight
Set of nodes


Landmarks: physical locations are at
least partially known (Lj)
Every Lj has an „estimated” location bLj
Sándor Laki (C) Geolocation by IP
address
27
Octant – Landmarks and
constraints


Primary landmark

GPS, street address

Low error
Secondary landmark



Position computed by Octant itself
Positive constraints ( set:  )

Node A is within d miles of Lk

g = (x,y) in bk c(x,y,d), where c(x,y,d) is a disc.
Negative constraints( set:  )

Node A is further than d miles from Lk

g = (x,y) in bk c(x,y,d)
Sándor Laki (C) Geolocation by IP
address
28
Estimated location

bi = XiXi \ XiXi
Sándor Laki (C) Geolocation by IP
address
29
Mapping latencies to distances

Latency between a target and a landmark

bounds thier maximum distance

Calculate with speed of light



delay*2/3*c
Low precision
Octant’s way



Dynamic calibration
For each L landmark compute two bounds RL(d) and
rL(d)
where d is the ping time of node i

rL(d)  || loc(L) – loc(i) ||  RL(d)

When queuing delays are dominant then rL(d) = 0.
Sándor Laki (C) Geolocation by IP
address
30
Mapping latencies to distances



Each landmark periodically pings all other landmarks =>
creating a correlation table
Determines the convex hull around the points => R(d)
and r(d)
It is sufficient when the target has a direct and
congestion-free path to the landmark

Octant introduce a cut off at latency p

a tunable percentage of landmark lie to the left of p

discard the others

(z is a fictitious datapoint,

placed far away)
Sándor Laki (C) Geolocation by IP
address
31
Mapping latencies to distances
Sándor Laki (C) Geolocation by IP
address
32
Last hop delays

Mapping is further complicated by queuing
and transmission delays associated with the
last hop



Cable and DSL connections
Overloaded PlanetLAB nodes
Goal: isolate the delay components which
artificially inflate latencies


Detailed maps of the underlying physical
network, as in network tomography (not in
Octant)
Octant introduce a simple metric called height
Sándor Laki (C) Geolocation by IP
address
33
Last hop delays in Octant






Based on pair-wise latency measurements
between landmarks
Primary landmarks: a, b, c
Measure thier latencies(RTT): [a,b], [a,c],
[b,c]
The positions of primary landmarks are
known -> we can estimate the transmission
delays: (a,b), (a,c), (b,c)
Lasthop delay(a,b) = [a,b] - (a,b)
Landmark coordinates (alon, alat),…
Sándor Laki (C) Geolocation by IP
address
34
Last hop delays in Octant

How much of the delays can be attributed to each
landmark?


Denoted by a’, b’ and c’ // height
Similarly, for a target t, we can compute t’, as an
estimation…

We can solve for

t’, tlon, tlat
Sándor Laki (C) Geolocation by IP
address
35
Last hop delays

tlon and tlat has relatively high error

not used in the later stages

Given the target and landmark heights

Each landmark can shift its

RL up if t’ < heights of the other landmarks

rL down if t’ > heights of the other landmarks
Sándor Laki (C) Geolocation by IP
address
36
Indirect routes

The preceding assumption



Route lengths are proportional to great circle
distances
not the case in practise, due to policy
routing
Example: a subscriber Ithaca, NY ->
Cornell Univ. (Ithaca)


Syracuse, NY -> Brockport, IL -> New York City
-> Cornell Univ.
~1 mile physical distance VS. ~800 miles length
path
Sándor Laki (C) Geolocation by IP
address
37
Indirect routes discovery


Landmark’s heigth can
indicate…
Localizing routers on the
network path

Secondary landmarks

Localization by latencies



Extract location from router
names
Reverse DNS lookup – undns
tool
Using ZIP code to determine
geographical location
Sándor Laki (C) Geolocation by IP
address
38
Handling uncertainty

Filter out errorneous constraint

Latency based constraints


Weight system that decreases
exponentially with increasing latency
Weight threshold
Sándor Laki (C) Geolocation by IP
address
39
Iterative refinement

Two phase:



First, we use accurate and mostly
conservative constraints
Second, less acurate and more
aggressive constraints to obtain a better
estimation (inside the initial estimated
region)
…
Sándor Laki (C) Geolocation by IP
address
40
Results
Sándor Laki (C) Geolocation by IP
address
41
Results
Sándor Laki (C) Geolocation by IP
address
42
Topology-based Geolocation
Sándor Laki (C) Geolocation by IP
address
43
Motivations

Problems with CBG

Use constraints that are less than speed of light



Risk of underestimates
When an underestimate occurs, the final region
does not contain the true location
Topology based geolocation

using the speed of light to generate constraints

inspired by Sensor Network Localization
Sándor Laki (C) Geolocation by IP
address
44
Summary of techniques

Traceroute from landmarks


Estimate hop latency


Increase structuring
Validate location hints


Improve accuracy
Cluster network interfaces


Map topology
Incorporate location hints
Constraint optimization

Geolocate targets
Sándor Laki (C) Geolocation by IP
address
45
Estimate hop latencies


Using traceroute tool to infer link
latency
Estimate hop latency from the
difference in RTT to adjacent routers


Accurate only if the link is traversed both
directions (symmetric routing)
How can we discover this property?

Three different techniques
Sándor Laki (C) Geolocation by IP
address
46
Estimate hop latencies

First, observing the reverse TTL values

Most routers initialize the TTL values for thier packets from a small set.



If TTL values changes significantly from one node to the next => discard the
link estimate
Second, measuring paths in both direction between pairs of landmarks



30,32,64,128,150,255
If both paths traverse a particular link => taking the differences of
measurements to the two endpoints
This estimation has high confidence
Third, increasing vantage points from which we probe a certain link


For every link on a path from a landmark we probes to both endpoints from
all other landmarks…
If these probes pass over the link => estimate for the link
Sándor Laki (C) Geolocation by IP
address
47
Clustering interfaces

Clustering interfaces that belong to the same router (IP
aliases)
Sándor Laki (C) Geolocation by IP
address
48
Clustering interfaces

Two IP-aliases techniques

Mercator




UDP probes are send to high-numbered ports on a set of
interfaces
Routers send back a port-unreachable ICMP message with the
source address
If two diff. interfaces replie with the same source address =>
aliases
Ally

Used on pairs of interfaces

Sends probes to the two if.

Examines the IP-ID

Most routers generate the IP-ID using a single counter that has
incremented after each packet has been created
Sándor Laki (C) Geolocation by IP
address
49
Validating location hints

DNS names -> locations

Some names are incorrect…


Missnamed, reconfig, reassignment of IP
addresses
Topology constraints can be used to verify
location hints

RTT measurements -> upper bounds…

Clustering -> aliases

Hop latencies
Sándor Laki (C) Geolocation by IP
address
50
Constraint optimization



Targets:X = {xi} Landmarks:L = {li}
Distance bw i and j: d(i,j)
Hard delay constraint



D(li xj) <= cij
Set of hdc = Cd
// cij Speed of light
Soft Link Latency Constraints




Hop latency bw i and j: hij
D(xi,xj) = hij + eij
Where eij is some error
Set of SLLC = Cl
Sándor Laki (C) Geolocation by IP
address
51
Constraint optimization



Minimize:
Subject to: Cd , Cl
Not a convex optimization problem


i,j in Cl|eij|
But we can recast it as a semidefinite
program
Using fast solvers


SeDuMi
Vivaldi
Sándor Laki (C) Geolocation by IP
address
52
Results
Sándor Laki (C) Geolocation by IP
address
53
Results
Sándor Laki (C) Geolocation by IP
address
54
References

RADAR and IP2Geo
http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf

Bernard Wong, Ivan Stoyanov and Emin Gün Sirer.
Octant: A Comprehensive Framework for the Geolocalization of Internet
Hosts.
In Proceedings of the Symposium on Networked System Design and
Implementation, Cambridge, Massachusetts, April 2007.

Katz-Bassett, E., John, J. P., Krishnamurthy, A., Wetherall, D., Anderson, T., and
Chawathe, Y.
Towards IP geolocation using delay and topology measurements.
In Proceedings of the 6th ACM SIGCOMM on internet Measurement (Rio de
Janeriro, Brazil, October 25 - 27, 2006). IMC '06. ACM Press, New York, NY, 7184.
Sándor Laki (C) Geolocation by IP
address
55