CHAPTER III
GENERAL CONCEPTS IN SPATIAL
DATA ANALYSIS
METU, GGIT 711
OUTLINE (Last Week)
Review of Basic Statistical Concepts
2.1. Random Variables and Probability Distributions
2.1.1. The Binomial Distribution
2.1.2. The Poisson Distribution
2.1.3. The Normal Distribution
2.2. Expectation
2.3. Maximum Likelihood Estimation
2.4. Stationarity and Isotropy
2.5. Introductory Spatial Statistics
2.5.1. Points
OUTLINE
GENERAL CONCEPTS IN SPATIAL DATA ANALYSIS
3.1. Introduction
3.2. Visualizing Spatial Data
3.3. Exploring Spatial Data
3.3.1. Distinction between visualizing and exploring
spatial data
3.3.2. Distinction between exploring and modeling
spatial data
3.4. Modeling Spatial Data
3.5. Practical Problems of Spatial Data Analysis
3.6. Computers and Spatial Data Analysis
3.6.1. Methods of coupling GIS and spatial data
analysis
3.1. Introduction
Spatial data analysis involves:
 Accurate description of data relating to a process in
space.
 Exploration of patterns and relationships in data
 Search for explanations of such patterns and
relationships
These relate to:
  Visualizing spatial data
  Exploring spatial data
  Modeling spatial data
3.2. Visualizing Spatial Data
An essential requirement in any data analysis is the
ability to “see” the data being analyzed.
Plots of data and other graphical displays of various
descriptions are fundamental tools for:
Seeking patterns
Generating hypotheses
Assessing the fit of proposed models
Determining the validity of predictions derived
from models
Maps are the tools for visualizing spatial data.
Hence GIS can provide an environment to create maps
of spatial data and to explore spatial patterns and
relationships quickly and easily.
Cartographic considerations are important when using
maps in spatial data analysis, because bad choices of
map type or of the scaling used for data values can:
 Lead to misleading conclusions drawn from the display
 Suggest inappropriate models for the process
under study
3.3. Exploring Spatial Data
Exploratory methods for spatial data may be in the
form of:
 Maps
or
 Conventional plots
 E.g. Some exploratory techniques, when applied to
point events, result in a contour map of the estimated
intensity of events over the whole study area; others,
applied to the same set of events, result in a graph
that throws light on the degree of spatial dependence
between event locations.
Exploring spatial data:
 Provides good descriptions of the data
 Helps to develop hypotheses
 Helps to establish appropriate models
If many exploratory spatial techniques result in
different forms of maps, then how do they really differ
from visualization techniques?
3.3.1. Distinction between visualizing and exploring
spatial data
The dividing line between visualization of spatial data
and exploratory data analysis is somewhat artificial. The
distinction is based on the degree of data
manipulation.
 E.g.
Suppose that we have age-standardized, cause-specific
death rates for a number of administrative zones.
Visualizing spatial data involves:
 A map of death rates
 Simple transformation of the rates
(No data manipulation)
Exploring spatial data involves:
 A map of the spatial moving average of the rates, which
smooths out local variations so that global trends can
be seen clearly (each rate is replaced by the average
of itself and the rates of the neighboring districts)
(Data manipulation)
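The spatial moving average described above can be sketched in a few lines. This is only an illustration: the zone names, the rates and the neighbor lists are hypothetical, and a neighbor is assumed to be any zone sharing a boundary.

```python
# Hypothetical age-standardized death rates for five zones (illustrative values).
rates = {"A": 10.0, "B": 14.0, "C": 9.0, "D": 20.0, "E": 12.0}

# Hypothetical neighbor lists: zones assumed to share a boundary.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

def spatial_moving_average(rates, neighbors):
    """Replace each rate by the mean of itself and its neighbors' rates."""
    smoothed = {}
    for zone, rate in rates.items():
        values = [rate] + [rates[n] for n in neighbors[zone]]
        smoothed[zone] = sum(values) / len(values)
    return smoothed

smoothed = spatial_moving_average(rates, neighbors)
for zone in sorted(smoothed):
    print(zone, rates[zone], round(smoothed[zone], 2))
```

Mapping the smoothed values instead of the raw rates damps local extremes (zone D here) and makes any broad trend easier to see.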
3.3.2. Distinction between exploring and modeling spatial
data
Exploratory methods do not involve any explicit model
for the data. However, several exploratory techniques
involve informal comparison of some summary of the data
with what a simple model would predict. Hence models do
enter into exploratory techniques. The distinction is one
of degree: the extent to which formal comparison is made
between the data and a model. Moreover, models
depend on certain assumptions.
 E.g.
Stan Openshaw (a quantitative geographer) tried to detect clusters
in the point distribution of cases of childhood leukemia. For this
purpose he used a technique that exhaustively examines the
observed intensity of events in circles of varying radius centered
on a fine grid imposed over the study area. The aim was to test
whether the cases within each circle were consistent with a random
pattern. Circles with significant discrepancies are identified and
retained for later display and investigation. This technique
involves a model of a random pattern and performs repeated formal
statistical comparisons with this model.
However, the validity of such comparisons does not depend on the
assumption of any specific alternative model. The technique
detects clusters; it does not search for an explanation of the
process by which such clusters occur.
Therefore, this form of analysis makes few a priori assumptions
about the data and is fully in line with exploratory methods.
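A minimal sketch of a circle-scanning search of this kind is given below. It is not Openshaw's implementation: it assumes a square study area, a uniform population at risk, and a Poisson model for the count in each circle. The function names, the grid step, the radii, the threshold and the simulated cases are all hypothetical.

```python
import math
import random

def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), obtained from the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)
    cdf = term
    for i in range(1, k):
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def scan_circles(cases, side, grid_step, radii, alpha):
    """Return (cx, cy, r, count, expected) for circles with an excess of cases."""
    intensity = len(cases) / (side * side)  # cases per unit area under randomness
    flagged = []
    steps = int(side / grid_step) + 1
    for i in range(steps):
        for j in range(steps):
            cx, cy = i * grid_step, j * grid_step
            for r in radii:
                # Skip circles that extend beyond the study area.
                if cx - r < 0 or cy - r < 0 or cx + r > side or cy + r > side:
                    continue
                count = sum(1 for (x, y) in cases
                            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
                expected = intensity * math.pi * r * r
                if poisson_upper_tail(count, expected) < alpha:
                    flagged.append((cx, cy, r, count, expected))
    return flagged

# Simulated data: 80 scattered cases plus a tight group of 15 near (7, 3).
rng = random.Random(0)
background = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(80)]
cluster = [(7 + rng.uniform(-0.3, 0.3), 3 + rng.uniform(-0.3, 0.3))
           for _ in range(15)]
cases = background + cluster

flagged = scan_circles(cases, side=10.0, grid_step=0.5,
                       radii=[0.5, 1.0], alpha=0.001)
print(len(flagged), "circles flagged")
```

Because many overlapping circles are tested, the flagged circles are best treated as prompts for display and further investigation rather than as formal findings.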
3.4. Modeling Spatial Data
Models are mathematical abstractions of reality and not reality
itself. A statistical model combines:
 Data
 Reasonable assumptions about the nature of the phenomena
being modeled
The assumptions arise from:
Background theoretical knowledge about the behavior of the
phenomena
The results of previous analyses of the same or similar
phenomena
The judgement and intuition of the modeler
A statistical model for a stochastic process consists of
specifying a probability distribution for the random
variable or variables that represent the phenomenon. Once
a probability distribution is fully specified, there is
effectively nothing further that can be said about the
behavior of the process. A fitted model is then evaluated,
and the results may lead to modifying the assumptions,
updating the existing model, or using a different one.
 E.g.
Consider modeling levels of ozone in a large rural area.
The ozone level at each location s in R will vary during
the day and from day to day. A model can be fitted to
explain the distribution of ozone level based on a linear
regression.
Figure 3.1. Ozone levels
Basic Assumptions:
1. The random variables { Y(s), s ∈ R } are independent.
2. The probability distributions of the random variables
Y(s) differ only in their mean values.
3. The mean value is a simple linear function of
location.
4. Y(s) has a normal distribution about this mean with
the same constant variance, σ².
The model:
Y(s) = β0 + β1 s1 + β2 s2 + ε(s),  ε(s) ~ N(0, σ²)
where
s1 and s2 are the spatial coordinates of s.
The assumptions provide a framework under which the
final model specification reduces to a problem of
estimating unknown parameters.
The βi can be estimated by the Maximum Likelihood
Estimation method.
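As a rough illustration, assume the linear trend model Y(s) = β0 + β1 s1 + β2 s2 + ε(s) with independent normal errors of constant variance. Under these assumptions the maximum likelihood estimates of the βi coincide with the least-squares estimates, and the maximum likelihood estimate of σ² is the residual sum of squares divided by n. The sketch below fits the model to simulated data; the coordinates, the "true" coefficients and the function names are hypothetical.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear_trend(sites, y):
    """ML fit of Y(s) = b0 + b1*s1 + b2*s2 + error (independent normal errors)."""
    rows = [(1.0, s1, s2) for (s1, s2) in sites]
    # Normal equations (X'X) beta = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    residuals = [yi - sum(b * v for b, v in zip(beta, r))
                 for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / len(y)  # ML estimate divides by n
    return beta, sigma2

# Simulated ozone levels at 50 hypothetical locations in a 10 x 10 region.
rng = random.Random(1)
true_beta = (40.0, 2.0, -1.0)  # hypothetical values used only to simulate data
sites = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(50)]
ozone = [true_beta[0] + true_beta[1] * s1 + true_beta[2] * s2 + rng.gauss(0, 0.5)
         for (s1, s2) in sites]

beta_hat, sigma2_hat = fit_linear_trend(sites, ozone)
print("estimated coefficients:", [round(b, 3) for b in beta_hat])
print("estimated variance:", round(sigma2_hat, 3))
```

If the independence assumption fails, as it often does for spatial data, these estimates are no longer maximum likelihood estimates for the true process.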
The next step is to test the reliability of the model, or
the goodness of fit. This can be achieved by using
hypothesis-testing methods. Hypothesis testing, which
involves comparing the fit of a hypothesized model
with that of an alternative, is in fact one facet of
statistical modeling. At this step we ask:
 Does a model in which certain parameters have prespecified values fit the data significantly worse than an alternative in which those parameters are free?
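One possible way to make such a comparison concrete is a likelihood ratio test. The sketch below assumes normal errors and a restricted model that fixes exactly two parameters (for example β1 = β2 = 0 in a linear trend model), so that the statistic n ln(RSS0 / RSS1) is approximately chi-square with 2 degrees of freedom. The function name and the residual sums of squares are hypothetical.

```python
import math

def likelihood_ratio_test(rss_restricted, rss_full, n):
    """Likelihood ratio test for a restriction on exactly two parameters.

    With normal errors and ML variance estimates, twice the difference in
    maximized log-likelihoods equals n * ln(RSS_restricted / RSS_full).
    """
    lr = n * math.log(rss_restricted / rss_full)
    # Survival function of the chi-square distribution with 2 df: exp(-x / 2).
    p_value = math.exp(-lr / 2.0)
    return lr, p_value

# Hypothetical residual sums of squares from n = 50 observations.
lr_stat, p_value = likelihood_ratio_test(rss_restricted=500.0, rss_full=400.0, n=50)
print("LR statistic:", round(lr_stat, 3), "p-value:", round(p_value, 5))
```

A small p-value indicates that the restricted model fits significantly worse than the alternative. The chi-square reference distribution is a large-sample approximation.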
Figure 3.2. Analysis of spatial data
3.5. Practical Problems of Spatial Data Analysis
There are basically four types of problem that an
analyst can face:
1. Problem of geographical scale
2. Lack of spatial indexing
3. Problem of edge or boundary effects
4. Problem of modifiable areal unit
Problem 1: Geographical scale at which analyses are
performed.
Spatial data analysis is concerned with detecting and
modeling spatial pattern. However, a pattern at one
geographical scale may be simply random variation within
another pattern at a different scale.
 E.g.
Local variations in disease rates may disappear when
viewed at the national scale.
The scale to which spatial analysis relates depends on:
Phenomena under study
Objective of the analysis
Scale at which the data are collected
Problem 2: Lack of spatial indexing or ordering in
space.
An indexing implies that we have a natural notion of
what is next or previous. On a regular grid there is
a reasonably natural ordering of locations. However,
most spatial data are not indexed. While some data
(e.g. those from satellites) come in the form of a
regular grid or lattice, much spatial data are provided
for a patchwork quilt of areal units or an irregularly
distributed set of sites.

E.g. For areal units we can speak of the neighborhood of
a zone only in terms of the units that share a common
boundary with it.
Problem 3: Problem of edge or boundary effect.
In the middle of a study area, a site or zone is likely
to be surrounded by others; i.e. the zone has neighbors
on all sides. However, at the edge of the map or study
region, the neighbors extend in one direction only. In
the spatial domain a potentially large share of the
observations lies near the edge of the map. Therefore
edge effects play a critical role. This problem can be
reduced by leaving a guard area: a strip along the
boundary whose observations are used only as neighbors
of the interior sites.
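The guard area idea can be sketched as follows, assuming a rectangular study region and nearest-neighbor distance as the quantity of interest. Sites inside the guard strip are never analyzed themselves but remain available as neighbors of interior sites. The function name, the coordinates and the guard width are hypothetical.

```python
import math

def nearest_neighbor_distances(points, xmin, ymin, xmax, ymax, guard):
    """Nearest-neighbor distance for interior sites only.

    Sites within `guard` of the boundary are excluded from the results,
    but they still count as candidate neighbors of interior sites.
    """
    results = {}
    for i, (x, y) in enumerate(points):
        interior = (xmin + guard <= x <= xmax - guard
                    and ymin + guard <= y <= ymax - guard)
        if not interior:
            continue
        results[i] = min(math.hypot(x - u, y - v)
                         for j, (u, v) in enumerate(points) if j != i)
    return results

# Hypothetical sites in a 10 x 10 region with a guard strip of width 1.
points = [(0.4, 5.0), (1.5, 5.0), (5.0, 5.0), (5.0, 6.0), (9.8, 5.0)]
nn = nearest_neighbor_distances(points, 0.0, 0.0, 10.0, 10.0, guard=1.0)
for index in sorted(nn):
    print(points[index], round(nn[index], 3))
```

Here the site at (1.5, 5.0) finds its nearest neighbor at (0.4, 5.0), which lies in the guard strip. The price of a guard area is that fewer observations contribute to the analysis.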
Problem 4: Problem of modifiable areal unit.
When data are measurements on a set of zones, they are
often aggregates of measurements on the households or
individuals living in each zone. For the sake of
confidentiality, the data are released for arbitrary
areal units. The important point is that any
result from the analysis of these area aggregations is
usually conditional on the set of zones. With a
different aggregation into areas, the result is subject
to change.
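A small numerical sketch of this dependence on zoning follows. Eight hypothetical basic cells carry two variables; the correlation between the variables is computed for the cells themselves and for zone means under two alternative zonings. All values and zone definitions are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def zone_means(values, zones):
    """Mean of the cell values inside each zone (a zone is a list of cell indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical values of two variables on eight basic cells in a row.
x = [1, 3, 2, 4, 5, 7, 6, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two hypothetical zonings built from the same contiguous cells.
zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
zoning_b = [[0], [1, 2], [3, 4], [5, 6], [7]]

r_cells = pearson(x, y)
r_a = pearson(zone_means(x, zoning_a), zone_means(y, zoning_a))
r_b = pearson(zone_means(x, zoning_b), zone_means(y, zoning_b))
print(round(r_cells, 3), round(r_a, 3), round(r_b, 3))
```

The same underlying cells give a different correlation at each level of aggregation and under each zoning, which is why a result for areal data should be read as conditional on the zones used.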
3.6. Computers and Spatial Data Analysis
Q: Given that some spatial analysis capabilities are
available in widely used systems, is there a need for
spatial analysis functions beyond those currently
provided by GIS?
A: At present yes!

E.g. A GIS will currently be able to overlay a set of
points (childhood cancer) onto a set of polygons
(buffer zones constructed along high voltage power
lines). The GIS will then be able to count how many
points lie within particular polygons by performing a
“point-in-polygon” operation.
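For illustration, a point-in-polygon count can be sketched with the standard ray-casting test. This is not the routine of any particular GIS; the buffer polygons and case locations are hypothetical, and points lying exactly on a polygon boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a ray running from (x, y) to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical buffer zones along power lines, given as lists of vertices.
buffers = {
    "buffer_a": [(0, 0), (4, 0), (4, 2), (0, 2)],
    "buffer_b": [(5, 1), (9, 1), (9, 3), (5, 3)],
}

# Hypothetical case locations.
cases = [(1, 1), (3, 1.5), (6, 2), (4.5, 1), (8, 5)]

counts = {name: sum(point_in_polygon(x, y, poly) for (x, y) in cases)
          for name, poly in buffers.items()}
print(counts)
```

The counts alone say nothing about whether the number of cases inside the buffers is larger than would be expected by chance.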
However, it is hard to find a system that evaluates
the statistical significance of the association between
the set of points and the set of polygons.
If we want to know whether there is a statistically
significant association between the incidence of
childhood cancer and proximity to high voltage power
lines, we cannot do this readily by using GIS.
There are several ways to use computers in
spatial data analysis. Most of the time spatial analysis
techniques are coupled with GIS.
3.6.1. Methods of coupling GIS and spatial data analysis
There are 4 different methods to use spatial analysis
techniques with GIS:
 Full integration
 Loose coupling
 Close coupling
 Special combinations
Full integration: Every method for exploratory spatial
analysis and modeling is available within a GIS.
Loose coupling: Data are exported from GIS for use
within a spatial statistical framework (i.e. GIS and
separate spatial analysis software talk to each other).
Close coupling: Spatial analysis routines are called from
within GIS (which requires use of the macro language
capabilities of GIS).
Special combinations: A self-contained spatial analysis
system for a specific purpose is developed (Case I),
OR
spatial analysis and GIS functions are added to a
standard statistical package (Case II).
 E.g. (Case I)
GAM (Geographical Analysis Machine, developed for
detecting the existence of clusters)
SpaceStat (developed for exploratory data analysis and
model fitting in spatial econometrics)
 E.g. (Case II)
REGARD (by John Haslett from Trinity College, Dublin);
operates on Mac
S-Plus (used by professional statisticians);
operates on IBM
SPLANCS (it can be coupled with Arc/Info, i.e. close
coupling); operates on Unix
XLISP-STAT; operates on Unix