Spatial Data Analysis

Why Geography is important.
What is spatial analysis?
• From Data to Information
– beyond mapping: added value
– transformations, manipulations and application of
analytical methods to spatial (geographic) data
• Lack of locational invariance
– analyses where the outcome changes when the
locations of the objects under study change
» median center, clusters, spatial autocorrelation
– where matters
• In an absolute sense (coordinates)
• In a relative sense (spatial arrangement, distance)
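
A minimal sketch of what lack of locational invariance means, using made-up coordinates and attribute values (all hypothetical). It uses the attribute-weighted mean center as a simpler stand-in for the median center mentioned above: the ordinary mean of the values is unchanged when the same values are reassigned to different sites, while the spatial statistic moves.

```python
from statistics import mean

# Hypothetical data: four sites with (x, y) coordinates and one value each.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
values = [1.0, 2.0, 3.0, 10.0]

def weighted_mean_center(coords, weights):
    """Attribute-weighted mean of the coordinates."""
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(coords, weights)) / total
    cy = sum(w * y for (_, y), w in zip(coords, weights)) / total
    return cx, cy

# Reassign the same values to different locations (reverse the order).
swapped = list(reversed(values))

# The non-spatial mean ignores location, so it is identical in both cases.
print(mean(values), mean(swapped))
# The weighted mean center depends on which value sits where.
print(weighted_mean_center(coords, values))
print(weighted_mean_center(coords, swapped))
```

With the large value at the remote site the center is pulled toward (4, 4); after the swap it sits near the origin, even though the set of values is the same.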
Components of Spatial Analysis
• Visualization
– Showing interesting patterns
• Exploratory Spatial Data Analysis (ESDA)
– Finding interesting patterns
• Spatial Modeling, Regression
– Explaining interesting patterns
Implementation of Spatial Analysis
• Beyond GIS
– Analytical functionality not part of typical commercial
GIS
» Analytical extensions
– Exploration requires interactive approach
» Training requirements
» Software requirements
– Spatial modeling requires specialized statistical
methods
» Explicit treatment of spatial autocorrelation
» Space-time is not space + time
• ESDA and Spatial Econometrics
What Is Special About Spatial Data?
• Location, Location, Location
– “where” matters
• Dependence is the rule
– spatial interaction, contagion, externalities,
spill-overs, copycatting
– First Law of Geography (Tobler)
• everything depends on everything else, but closer
things more so
• Spatial heterogeneity
– Lack of stationarity in first-order statistics
• Pertains to the spatial or regional
differentiation observed in the value of a
variable
– Spatial drift (e.g., a trend surface)
– Spatial association
Nature of Spatial Data
• Spatially referenced (“georeferenced”) data
» “attribute” data associated with location
» where matters
• Example: Spatial Objects
– points: x, y coordinates
» cities, stores, crimes, accidents
– lines: arcs, from node, to node
» road network, transmission lines
– polygons: series of connected arcs
» provinces, cities, census tracts
GIS Data Model
• Discretization of geographical reality
necessitated by the nature of computing
devices (Goodchild)
– raster (grid) vs. vector (polygon)
– field view (regions, segments) vs. object view
(objects in a plane)
• Data model implies spatial sampling and
spatial errors
3 Classes of Spatial Data
• Geostatistical Data
– points as sample locations (“field” data as
opposed to “objects”)
• Continuous variation over space
• Lattice/Regional Data
– polygons or points (centroids)
• Discrete variation over space, observations
associated with regular or irregular areal units
• Point Patterns
– points on a map (occurrences of events at
locations in space)
• Observations of a variable are made at location X
• Assumption that the spatial arrangement is directly
related to the interaction between units of
observation
Visualization and ESDA
• Objective
– highlighting and detecting pattern
• Visualization
– mapping spatial distributions
– outlier detection
– smoothing rates
• ESDA
– dynamically linked windows
– linking and brushing
Mapping patterns
http://www.cdc.gov/nchs/data/gis/atmapfh.pdf
ESDA
http://www.public.iastate.edu/~arcview-xgobi/
Spatial Process
• Spatial Random Field
– { Z(s): s ∈ D }
» s ∈ R^d : generic data location (vector of
coordinates)
» D ⊂ R^d : index set
(subset of potential locations)
» Z(s) random variable at s, with realization z(s)
– Examples
• s are x, y coordinates of house sales, Z sales price
at s
• s are counties, Z is crime rate in s
Point Pattern Analysis
• Objective
– assessing spatial randomness
• Interest in location itself
– complete spatial randomness
– clustering, dispersion
• Distance-based statistics
– nearest neighbors
– number of events within given radius
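
The two distance-based statistics listed above can be sketched as follows. The point sets and the 10 x 10 study area are hypothetical, and the Clark-Evans ratio is used here as one common way to compare nearest-neighbor distances against complete spatial randomness (CSR); the slides do not name a specific statistic. Edge effects are ignored.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def counts_within_radius(points, radius):
    """Number of other events within `radius` of each event."""
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance over its CSR expectation.

    Under CSR with intensity n / area, the expected mean nearest-neighbor
    distance is 1 / (2 * sqrt(intensity)). Ratios below 1 suggest
    clustering; ratios above 1 suggest dispersion.
    """
    n = len(points)
    observed = sum(nearest_neighbor_distances(points)) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical patterns in a 10 x 10 study area.
clustered = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8)]
dispersed = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]

print(clark_evans_ratio(clustered, area=100.0))   # well below 1
print(clark_evans_ratio(dispersed, area=100.0))   # above 1
print(counts_within_radius(clustered, radius=1.0))
```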
Point Patterns
• Spatial process
– index set D is a point process; s is random
• Data
– mapped pattern
» examples: location of disease, gang shootings
• Research question
– interest focuses on detecting absence of
spatial randomness (cluster statistics)
– clustered points vs dispersed points
Geostatistical Data
• Spatial Process
– index set D is a fixed subset of R^d (continuous)
• Data
– sample points from underlying continuous surface
» examples: mining, air quality, house sales price
• Research Question
– interest focuses on modeling continuous spatial
variation
– spatial interpolation (kriging)
Variogram Modeling
(Geostatistics)
• Objective
– modeling continuous variation across space
• Variogram
– estimating how spatial dependence varies
with distance
– modeling distance decay
• Kriging
– optimal spatial prediction
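
As an illustration of how dependence is summarized as a function of distance, here is a sketch of the classical (Matheron) empirical semivariogram: for each distance bin, half the average squared difference between values at pairs of sample points. The transect data and the bin width are hypothetical, and fitting a variogram model or kriging is not attempted.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, bin_width):
    """Return {bin midpoint distance: semivariance estimate}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            b = int(h // bin_width)          # index of the distance bin
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return {
        (b + 0.5) * bin_width: sums[b] / (2 * counts[b])
        for b in sorted(counts)
    }

# Hypothetical transect: six samples at unit spacing.
points = [(float(i), 0.0) for i in range(6)]
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

gamma = empirical_semivariogram(points, values, bin_width=1.0)
for distance, semivariance in gamma.items():
    print(distance, semivariance)
```

Semivariance that rises with distance indicates that nearby samples are more alike than distant ones, which is the distance decay a variogram model would then describe.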
Lattice or Regional Data
• Spatial process
– index set D is a fixed collection of countably many
points in R^d
– finite, discrete spatial units
• Data
– fixed points or discrete locations (regions)
» examples: county tax rates, state unemployment
• Research question
– interest focuses on statistical inference
– estimation, specification tests
Spatial Autocorrelation
• Objective
– hypothesis test on spatial randomness of
attributes: value and location combined
• Global and local autocorrelation statistics:
Moran’s I, Geary’s c, G(d), LISA
• Visualization of spatial autocorrelation
– Moran scatterplot
– LISA maps
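
A sketch of global Moran's I with a permutation-based pseudo p-value, to make the idea of testing spatial randomness concrete. The six units on a line, the binary contiguity weights, and the attribute values are all hypothetical; the one-sided permutation test is one common inference approach, not necessarily the one intended in the slides.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for `values` and spatial weights matrix `weights`
    (weights[i][j] > 0 when i and j are neighbors, zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

def chain_weights(n):
    """Binary contiguity weights for n units arranged along a line."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random reassignments of values
    to locations whose Moran's I is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

w = chain_weights(6)
clustered = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]    # similar values are neighbors
alternating = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]  # dissimilar values are neighbors

print(morans_i(clustered, w))     # positive spatial autocorrelation
print(morans_i(alternating, w))   # negative spatial autocorrelation
print(permutation_p_value(clustered, w))
```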
Spatial process models
• How is the spatial association generated?
– Spatial autoregressive process (SAR)
• Y = ρWY + ε
– Spatial moving average process (SMA)
• Y = (I + ρW) ε
– ε: vector of independent errors
– W: distance-based spatial weights matrix
– In SAR, correlation is fairly persistent with increasing
distance, whereas in SMA it decays to zero fairly
quickly.
• Spatial process—the rule governing the
trajectory of the system as a chain of changes in
state.
• Spatial pattern—the map of a single realization
of the underlying spatial process (the data
available for analysis).
• Say you conduct a regression analysis. If the
residuals do not display spatial autocorrelation,
then there is no need to add “space” to the
model. Examine spatial autocorrelation in the residuals
using Moran’s I, Geary’s c, or G(d).
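
The contrast between the SAR and SMA processes defined above can be checked numerically from their implied covariance matrices. With unit-variance independent errors, Y = ρWY + ε gives Cov(Y) = (I − ρW)^-1 (I − ρW)^-T, and Y = (I + ρW)ε gives Cov(Y) = (I + ρW)(I + ρW)^T. The sketch below assumes nine units on a line, row-standardized contiguity weights, and ρ = 0.5 (all illustrative choices); the inverse is approximated by the truncated series Σ ρ^k W^k, which converges for |ρ| < 1 under row standardization.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def row_standardized_chain(n):
    """Row-standardized contiguity weights for n units along a line."""
    w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
         for i in range(n)]
    return [[x / sum(row) for x in row] for row in w]

def sar_covariance(w, rho, terms=100):
    """Cov(Y) for Y = rho*W*Y + eps, using a truncated power series
    for (I - rho*W)^-1."""
    n = len(w)
    inv = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = [[rho * x for x in row] for row in matmul(power, w)]
        inv = [[inv[i][j] + power[i][j] for j in range(n)]
               for i in range(n)]
    return matmul(inv, transpose(inv))

def sma_covariance(w, rho):
    """Cov(Y) for Y = (I + rho*W) eps."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) + rho * w[i][j] for j in range(n)]
         for i in range(n)]
    return matmul(a, transpose(a))

w = row_standardized_chain(9)
sar = sar_covariance(w, rho=0.5)
sma = sma_covariance(w, rho=0.5)

center = 4
for lag in range(5):
    # Covariance between the central unit and the unit `lag` steps away.
    print(lag, round(sar[center][center + lag], 4),
          round(sma[center][center + lag], 4))
```

In this setup the SMA covariance is exactly zero beyond second-order neighbors, because (I + ρW)(I + ρW)^T only involves W up to its second power, whereas the SAR covariance involves all powers of W and stays positive at every lag.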
Perspectives on spatial process models
• Finding out how the variable Y relates to its
value in surrounding locations (the spatial lag)
while controlling for the influence of other
explanatory variables.
• When the interest is in the relation between the
explanatory variables X and the dependent
variable, after the spatial effect has been
controlled for (this is referred to as spatial
filtering or spatial screening).
• The expected value of the dependent
variable at each location is a function not
only of explanatory variables at that
location, but of the explanatory variables
at all other locations as well.
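
This last statement follows from the reduced form of a spatial lag model. As a sketch, assume the SAR process is extended with explanatory variables X and coefficients β (an assumption; the slides give only the pure SAR form), with |ρ| < 1 and a row-standardized W so that the inverse exists and the series converges:

```latex
\begin{aligned}
Y &= \rho W Y + X\beta + \varepsilon \\
(I - \rho W)\,Y &= X\beta + \varepsilon \\
E[Y \mid X] &= (I - \rho W)^{-1} X\beta
  = \left(I + \rho W + \rho^{2} W^{2} + \cdots\right) X\beta
\end{aligned}
```

The terms ρWXβ, ρ²W²Xβ, and so on carry the explanatory variables of neighbors, neighbors of neighbors, and beyond into the expected value at each location.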