Transcript Geocoding
Geocoding, Simple
Interpolation, Sampling
Peter Fox
GIS for Science
ERTH 4750 (98271)
Week 4, Tuesday, February 14, 2012
1
Contents
• Reading review
• Assignment 1 status?
• Geocoding
• Interpolation
• Sampling
• Lab on Friday
• Next week
2
Reading review for last week
• Lab on Friday? (Max can close his ears)…
• Video Tutorials
• MapInfo User Guide Chapter 3 (Basics, esp.
Working with Layers in the Layer Control, p. 57) - layering
• MapInfo User Guide Chapter 7 (Drawing and
Editing Objects, esp. Editing, p. 170) - digitizing
• MapInfo User Guide Chapter 10 (Buffering and
Working with Objects, p. 268) - buffering
• MapInfo User Guide Chapter 12 (Registering Raster
Images, p. 324) - registering
3
Geocoding
• “Geocoding is the process of finding associated
geographic coordinates (often expressed as latitude
and longitude) from other geographic data, such as
street addresses, or zip codes (postal codes). With
geographic coordinates the features can be
mapped and entered into a GIS, or the coordinates
can be embedded into media such as digital
photographs via geotagging.”
• “Reverse geocoding is the opposite: finding an
associated textual location such as a street
address, from geographic coordinates.”
• “A geocoder is a piece of software or a (web)
service that helps in this process” (could be YOU)
4
Geocoding – the world has changed
• http://geocoder.us/
• http://code.google.com/apis/maps/documentation/geocoding/
• http://www.ffiec.gov/geocode/
• Currently 40% of U.S. addresses are
geocoded!
– How about the rest of the world?
5
Geocoding 001
• A regular geocode
may show us the
address of the office,
or the street entrance
to the complex.
• The rooftop geocode
shows us the exact
location.
• Delivering a pizza to
Unit 701 would be
much easier with
rooftop geocoding.
6
Geocoding…
• Often you will have data for which the
position is known only by its street or
address.
• Such data can be ‘spatially enabled’ by
geocoding with the addresses or street
names. This requires a street database. (!)
• Often the street database contains
geographic coordinates for the centroids of
the street segments. If you enter just the
street name, MapInfo will geocode to the
centroid of the street segment.
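A minimal sketch of centroid geocoding, assuming a street table that holds one centroid per street name. The table, its coordinates, and the function name are invented for illustration; this is not how MapInfo stores or searches its street data.

```python
# Hypothetical street table: street name -> centroid (x, y) of the segment.
# The name and coordinates are invented for illustration.
STREET_CENTROIDS = {
    "FIRST STREET": (500.0, 100.0),
}

def geocode_to_centroid(address):
    """Return the centroid of the street named in `address`.

    The house number is ignored, because the table holds only one
    point per street segment.
    """
    number, _, street = address.partition(" ")
    if not number.isdigit():
        # No leading house number: treat the whole string as the street name.
        street = address
    return STREET_CENTROIDS[street.strip().upper()]

first = geocode_to_centroid("4029 First Street")
second = geocode_to_centroid("4040 First Street")
```

Both addresses return the same point, because only the street name takes part in the lookup.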
7
Example
If your street database has only street names and their centroids, you will use "First
Street" to describe your first point, at 4029 First Street, when geocoding.
If First Street extends from Second Ave. to Fourth Ave. your position will be given as the
‘street centroid’.
If First St. extends farther east, its centroid could be even farther from your actual
observation point.
If First St. extends only between 2nd Ave. and 3rd Ave., your point will be assigned to the
location of the ‘segment centroid’ between 2nd and 3rd Avenues.
8
Thus…
• By this method your second observation, at 4040
First St., would be assigned the same geographic
coordinates as the first.
• Clearly this method is only suitable when the
points to be geocoded are widely spaced.
• Your street database may contain coordinates and
address numbers for the intersections.
– For example, at the intersection of 1st St. and 2nd Ave.
(I1) the address numbers for 1st St start at 4000 and
increase eastward to 4100 at the intersection of 1st St.
and 3rd Ave (I2).
– When you geocode the address 4029 First St, MapInfo
will interpolate between the two intersections.
9
• Let’s say the geographic coordinates of the
intersection I1 are (x1, y1) and the
geographic coordinates of the intersection I2
are (x2, y2).
• The interpolation that gives the coordinates
(x, y) of 4029 1st St is:
• x = x1 + (4029 - 4000)/(4100 - 4000) * (x2 - x1)
• y = y1 + (4029 - 4000)/(4100 - 4000) * (y2 - y1)
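The same interpolation can be sketched in Python. The intersection coordinates below are made-up values, and the function name is only for illustration; it is a sketch of the arithmetic, not MapInfo's own routine.

```python
def interpolate_address(number, start, end):
    """Linearly interpolate a house number between two intersections.

    `start` and `end` are (address_number, x, y) tuples for the
    intersections at either end of the street segment.
    """
    n1, x1, y1 = start
    n2, x2, y2 = end
    t = (number - n1) / (n2 - n1)  # fraction of the way from I1 to I2
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# Hypothetical coordinates for I1 (address 4000) and I2 (address 4100).
I1 = (4000, 0.0, 0.0)
I2 = (4100, 200.0, 50.0)

p1 = interpolate_address(4029, I1, I2)  # 29% of the way from I1 to I2
p2 = interpolate_address(4040, I1, I2)  # 40% of the way from I1 to I2
```

With these assumed coordinates the two addresses land at different points along the segment, unlike the centroid method.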
10
Simply put
• In simple terms, because the street number
4029 is 29% of the way from I1 to I2, its
geographic coordinates are placed 29% of
the distance from I1 to I2.
• Similarly, the second observation point, at
4040 1st St, is placed 40% of the way
from I1 to I2.
• Importantly, the second point will not coincide
with the first.
11
Uh-huh, it’s that easy…
• What about 2300 Downing St, Denver? Is
that North Downing or South Downing?
• Street, road, way, circle, court, avenue…
• Centroids for large ‘regions’
• GPS – authenticated?
• Tell me more…
12
Interpolation
• Interpolation is the process of estimating the
unknown value of a function based on the
known values at neighboring points.
• Because in general we cannot sample every
point on the ground, we sometimes
interpolate between our observed points to
predict, or make an estimate of, the value at
some point we did not or could not sample.
• In contrast to extrapolation…
13
Random samples in a region
• For example, let’s imagine that our job is to measure radon
concentrations in people’s basements for an entire county with the aim of
finding ‘hot-spots’ where levels are dangerously high.
• It would be impossible to sample every home but we want to be able to
warn folks whose homes are possibly ‘hot’ even though we did not
sample there.
• Hence, we want to be able to make accurate predictions at unsampled
points based only on our sampled points.
14
How to?
• Numerous ways to go about making these
predictions.
• We will start with the inverse-distance
weighting (IDW) method, used by MapInfo
Professional.
• In IDW, the value at an unknown point is the
weighted average of its neighbors, where the
weighting is the inverse of distance raised to
some power.
15
The IDW…
• In general, the weighted-average estimate of the
unknown value $z_j$ at the point $(x_j, y_j)$ is given by:
– $z_j = \dfrac{\sum_{i=1}^{n} w_i z_i}{\sum_{i=1}^{n} w_i}$
– $n$ is the number of sampled points and $w_i$ is the
weight assigned to sampled point $i$.
• You can see that if each w = 1, this equation
represents the common definition of the
average; i.e., the sum of the values divided
by the number of values.
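A short sketch of the weighted average, using made-up sample values, shows that equal weights reproduce the ordinary mean. The function name is illustrative only.

```python
def weighted_average(values, weights):
    """Return sum(w_i * z_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * z for w, z in zip(weights, values)) / sum(weights)

z = [2.0, 4.0, 9.0]                            # hypothetical sample values
equal = weighted_average(z, [1.0, 1.0, 1.0])   # same as the ordinary mean
skewed = weighted_average(z, [1.0, 1.0, 4.0])  # pulled towards the value 9.0
```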
16
Choices…
• In IDW, the value of w for a particular
sampled point is determined by how far that
sampled point is from the point to be
estimated.
• The weight decreases as distance $d$
increases such that $w = d^{-k}$, where $k$ is some
chosen number (in MapInfo $k$ is between 1
and 10).
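A minimal IDW sketch follows, assuming every sampled point is used as a neighbor and that a query exactly on a sample simply returns that sample's value. Both choices, the function name, and the radon readings are assumptions for illustration; MapInfo's interpolator has its own settings.

```python
import math

def idw(samples, x, y, k=2.0):
    """Estimate z at (x, y) by inverse-distance weighting.

    `samples` is a list of (xi, yi, zi) tuples and `k` is the exponent
    in w = d**-k.
    """
    num = 0.0
    den = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            # The weight is undefined at zero distance; returning the
            # sample value itself is an assumption of this sketch.
            return zi
        w = d ** -k
        num += w * zi
        den += w
    return num / den

# Hypothetical radon readings as (x, y, value).
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 4.0, 30.0)]
estimate = idw(samples, 1.0, 0.0, k=2.0)
sharper = idw(samples, 1.0, 0.0, k=6.0)
```

Raising k from 2 to 6 pulls the estimate closer to the nearest sample's value of 10, which is the effect of the exponent on how strongly each sample dominates its surroundings.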
17
Sampling stations
18
Weighted Interpolation
19
Weights: distance and k…
• The exponent determines how smooth the map will be.
• The larger the exponent, the more important each sample
becomes for the estimated points around it.
20
Other ways…
• Another method is to use a Gaussian function
for the distance weighting if you suspect
some correlation distance between the data.
• The Gaussian function is:
– $w_i = \dfrac{\exp\left[-0.5\,(d_i / d_o)^2\right]}{d_o \sqrt{2\pi}}$
– where $d_o$ is the correlation distance, $d_i$ is the
distance to point $i$, and $\pi = 3.1415926\ldots$
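The Gaussian weighting can be sketched in the same way. The function names are illustrative, and the sketch assumes every sample contributes to the estimate.

```python
import math

def gaussian_weight(d, d0):
    """Gaussian distance weight with correlation distance d0."""
    return math.exp(-0.5 * (d / d0) ** 2) / (d0 * math.sqrt(2.0 * math.pi))

def gaussian_estimate(samples, x, y, d0):
    """Weighted average of (xi, yi, zi) samples using Gaussian weights.

    If every sample is many correlation distances away, the weights can
    underflow to zero; this sketch does not guard against that case.
    """
    weights = [gaussian_weight(math.hypot(x - xi, y - yi), d0)
               for xi, yi, _ in samples]
    total = sum(weights)
    return sum(w * zi for w, (_, _, zi) in zip(weights, samples)) / total

# Hypothetical samples as (x, y, value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = gaussian_estimate(samples, 1.0, 0.0, d0=1.0)
```

Because the same factor 1/(do sqrt(2 pi)) multiplies every weight, it cancels in the weighted average; only the exponential term changes the estimate.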
21
The correlation distance determines how smooth the map will be.
The larger the distance, the broader and flatter the weighting function.
22
Uncertainties…
• When the measurements at the sampled points
have uncertainties with standard deviation $s_i$,
the weighted average also includes them:
– $z_j = \dfrac{\sum_{i=1}^{n} z_i w_i s_i^{-2}}{\sum_{i=1}^{n} w_i s_i^{-2}}$
• NB. the superscript $-2$ indicates that the
weighting is according to $1/s_i^2$, or one over the
variance.
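A small sketch of the uncertainty-weighted average, with made-up values, weights, and standard deviations. The function name is illustrative.

```python
def weighted_estimate(values, weights, sigmas):
    """Weighted average in which each term is also scaled by 1/s_i**2."""
    num = sum(z * w / s ** 2 for z, w, s in zip(values, weights, sigmas))
    den = sum(w / s ** 2 for w, s in zip(weights, sigmas))
    return num / den

# Two hypothetical samples with equal distance weights; the second
# measurement has three times the standard deviation of the first.
estimate = weighted_estimate([10.0, 20.0], [1.0, 1.0], [1.0, 3.0])
# With equal standard deviations the uncertainties cancel out.
plain = weighted_estimate([10.0, 20.0], [1.0, 1.0], [2.0, 2.0])
```

The less certain measurement counts for one ninth as much, so the estimate sits much nearer the first value than the plain average does.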
23
Leads us to sampling
• Interpolation is the process of estimating, or
predicting, values of some spatial property at points
between the points at which the property has been
measured (sampled).
24
Sampling methods
• Random – samples are randomly distributed
in the region of interest
• Regular – sampling on a grid
• Transect – samples are along lines crossing
the region
• Cluster – samples are clustered
• Contour – samples are made along contours
(often used when digitizing from a map)
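Three of these layouts (random, regular, transect) can be generated with a few lines of Python. The region size, point counts, and function names are assumptions for illustration; cluster and contour sampling depend on the data and are left out of this sketch.

```python
import random

def random_samples(n, width, height, rng):
    """n points placed uniformly at random in a width x height region."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def regular_samples(nx, ny, width, height):
    """nx * ny points at the cell centres of a regular grid."""
    dx, dy = width / nx, height / ny
    return [((i + 0.5) * dx, (j + 0.5) * dy)
            for j in range(ny) for i in range(nx)]

def transect_samples(n, start, end):
    """n (at least 2) evenly spaced points on a straight line from start to end."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            for t in (i / (n - 1) for i in range(n))]

rng = random.Random(0)  # fixed seed so the layout is repeatable
pts_random = random_samples(25, 100.0, 100.0, rng)
pts_grid = regular_samples(5, 5, 100.0, 100.0)
pts_line = transect_samples(5, (0.0, 0.0), (100.0, 100.0))
```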
25
Choices, choices…
• The sampled points can be distributed in a
regular or irregular grid.
• Often interpolation is used to produce a
regular grid out of an unevenly sampled
distribution.
• A regular grid (raster model) is necessary to
examine certain properties of the data, such
as slope and curvature, or to produce a more
understandable presentation in the form of
surface plots.
26
Factors for sampling…
• Sampling is often dictated by economics or
by logistics but also must reliably account for
the spatial variations in the quantity of
interest.
• Widely spaced samples may miss the
short-wavelength variations and can lead to
aliasing of the signal.
27
Sampling theorem
• In essence, the theorem shows that a band-limited
analog signal that has been sampled
can be perfectly reconstructed from an infinite
sequence of samples if the sampling rate
exceeds 2B samples per second, where B is
the highest frequency of the original signal.
• If a signal contains a component at exactly B
hertz, then samples spaced at exactly 1/(2B)
seconds do not completely determine the
signal… (wikipedia)
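Both statements can be checked numerically. The sketch below uses made-up frequencies (9 Hz and 1 Hz sampled 10 times per second, and a 1 Hz sine sampled exactly twice per second); the function name is illustrative.

```python
import math

def sample_cosine(freq, rate, n):
    """Return n samples of cos(2*pi*freq*t) taken at `rate` samples per second."""
    return [math.cos(2.0 * math.pi * freq * i / rate) for i in range(n)]

# A 9 Hz signal needs more than 18 samples per second; 10 is too few.
undersampled = sample_cosine(9.0, 10.0, 20)
# A 1 Hz signal sampled at the same rate gives the same numbers,
# so the samples alone cannot tell the two signals apart.
alias = sample_cosine(1.0, 10.0, 20)
max_difference = max(abs(a - b) for a, b in zip(undersampled, alias))

# A sine at exactly B = 1 Hz sampled every 1/(2B) seconds is caught
# only at its zero crossings, so the samples say nothing about it.
B = 1.0
at_limit = [math.sin(2.0 * math.pi * B * i / (2.0 * B)) for i in range(10)]
```

Up to floating-point rounding, `max_difference` is zero and every value in `at_limit` is zero.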
28
Aliased Sampling
• In this plot the sampled points (triangles) indicate a
decreasing trend, whereas the actual trend is
approximately sinusoidal.
29
30
Tips
• Sampling on a regular grid can lead to
aliasing if the grid spacing is too wide.
• If you want to sample on a grid (regular
sampling), first estimate the wavelength of
the spatial variations by making some close
samples.
• Random sampling allows you to estimate the
wavelengths of the variations in your
sampling but can leave large un-sampled
regions.
31
Summary
• Three more topics for GIS (for Science)
– Geocoding
– Interpolation
– Sampling
• For learning purposes remember:
– Demonstrate proficiency in using geospatial applications and tools
(commercial and open-source).
– Present verbally relational analysis and interpretation of a variety of
spatial data on maps.
– Demonstrate skill in applying database concepts to build and manipulate a
spatial database, SQL, spatial queries, and integration of graphic and tabular
data.
– Demonstrate intermediate knowledge of geospatial analysis methods
and their applications.
32
Reading for this week
• Chapter 13: Putting your Data on a Map
(Geocoding, pp. 353-364)
• Chapter 9: … Thematic maps and Grid
surface maps (Interpolation, pp. 264-266)
• Sampling theorem (self directed)
33
Friday Feb. 17th
• Lab session – with a walk through of
examples first
• 12pm-~1:40pm (attendance will be <obvious>
;-) )
• Hands on (as well as layering, etc.)
– Geocoding
– Interpolation
– Sampling
• Assignment due 5pm
34
Next classes
• Introduction to geostatistics.
• Interpolation techniques continued
(trend surfaces, Thiessen polygons,
splines)
• Lab on Friday (24th)
35