Transcript Ch12McGrew

A Summary of An Introduction to
Statistical Problem Solving in
Geography
Chapter 12: Inferential Spatial
Statistics
Prepared by W. Bullitt Fitzhugh
Geography 3000 - Advanced Geographic Statistics
Dr. Paul Sutton
Winter Quarter, 2010
Inferential Spatial Statistics: Point
Pattern and Area Pattern Analysis
• It is common in geography to find points and areas
used to represent spatial phenomena
• Methods exist to determine whether a sample
point pattern or sample area pattern follows a
random spatial distribution
• Spatial patterns that are distributed randomly
are not usually of interest because the underlying
phenomenon has “no spatial logic”
Types of Spatial Patterns





• Spatial patterns may be clustered, random, or
dispersed
• Clustered points or areas have high densities in some
locations and low or zero densities in other locations
• Dispersed points or areas are (nearly) uniformly
distributed across a study area
• Random points or areas do not tend towards a
clustered or dispersed spatial distribution
• In real life, the types of patterns encountered in a
research problem may be ambiguous, with elements of
multiple pattern types
Spatial Patterns Example: Lakeside
Trees and Condo Development
Spatial Autocorrelation



• Positive spatial autocorrelation occurs when
nearby points or areas have similar values
(clustered patterns)
• Negative spatial autocorrelation occurs when
nearby points or areas have dissimilar values
(dispersed patterns)
• No spatial autocorrelation exists when point or
area patterns are randomly distributed
Point Pattern Analysis
• Point Pattern Analyses are useful analytic tools for
geographic research problems where the variable(s) at
hand are represented by points on a map
• “Nearest Neighbor Analysis” is a statistical procedure for
understanding the spacing of points on a map
• “Quadrat Analysis” focuses on the nature of the spatial
distribution of the point pattern within the overall study area
• Both methods aim to shed light on the underlying
process(es) behind the geographic results
Nearest Neighbor Analysis
• Nearest Neighbor Analysis (NNA) can be used as a descriptive
statistic or as a method to test hypotheses about the population from
which the sample points were taken
• NNA uses the average nearest neighbor distance as an index of point
spacing (descriptive)
– For a random spatial pattern, average nearest neighbor distance NND(r) =
1/[2*SQRT(Density)]
– For a perfectly dispersed (uniform) spatial pattern, average nearest
neighbor distance NND(d) = 1.07453/SQRT(Density)
– For a perfectly clustered spatial pattern, average nearest neighbor distance
NND(c) = 0 (all the points are at the same coordinates)
– Density = Number of Points/Study Area
• A standardized nearest neighbor index is used to compare results
from different data sets
– R = Observed Mean Nearest Neighbor Distance/ Random Mean Nearest
Neighbor Distance
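The reference distances and the standardized index above can be sketched in a few lines of Python. This is a minimal illustration, not McGrew's own procedure: the function names and the 100-point, 10 x 10 study area are hypothetical, and only the formulas come from the slide.

```python
import math

def nna_reference_distances(n_points, area):
    """Return density and the theoretical mean nearest neighbor distances."""
    density = n_points / area                      # Density = Number of Points/Study Area
    nnd_random = 1 / (2 * math.sqrt(density))      # NND(r)
    nnd_dispersed = 1.07453 / math.sqrt(density)   # NND(d)
    nnd_clustered = 0.0                            # NND(c): all points coincide
    return density, nnd_random, nnd_dispersed, nnd_clustered

def nearest_neighbor_index(observed_mean_nnd, n_points, area):
    """R = observed mean NND / random mean NND."""
    _, nnd_random, _, _ = nna_reference_distances(n_points, area)
    return observed_mean_nnd / nnd_random

# Hypothetical inputs: 100 points in a 10 x 10 study area (density = 1)
density, nnd_r, nnd_d, nnd_c = nna_reference_distances(100, 100.0)
print(nnd_r, nnd_d)                                # 0.5 1.07453
print(nearest_neighbor_index(0.75, 100, 100.0))    # 1.5
```

Because NND(c) = 0 and NND(d)/NND(r) = 1.07453/0.5, the index R runs from 0 (perfectly clustered) through 1 (random) to about 2.149 (perfectly dispersed), which is what makes it comparable across data sets.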
Nearest Neighbor Analysis, Contd.
• The observed average nearest neighbor distance of a
data set can be compared to a theoretical
average nearest neighbor distance (assuming random
spatial distribution) to test the hypothesis that
points are randomly distributed
• Ho: NND = NND(r)
• Ha: NND not= NND(r) OR NND > NND(r) OR
NND < NND(r)
• Choice of Ha depends on having a rationale for
thinking the pattern may be clustered or dispersed
Nearest Neighbor Analysis,
Example
Point  X   Y   Dist. to A  Dist. to B  Dist. to C  Dist. to D  Dist. to E  Dist. to F
A      27  46  0           36.67       28.64       19.85       6           15.3
B      34  10  36.67       0           21.47       31.83       38.28       39.81
C      15  20  28.64       21.47       0           13.34       26.68       39.62
D      12  33  19.85       31.83       13.34       0           15.81       34
E      21  46  6           38.28       26.68       15.81       0           21.21
F      42  49  15.3        39.81       39.62       34          21.21       0
Nearest Neighbor Analysis,
Example
Point  Nearest neighbor  Distance
A      E                 6.00
B      C                 21.47
C      D                 13.34
D      C                 13.34
E      A                 6.00
F      A                 15.30

NND (observed mean) = 12.58
NNDe = 6.98
s(NNDe) = 1.49
z(NND) = 3.75

• Area of Study Area = (Max(X) – Min(X)) * (Max(Y) – Min(Y)) = 30 * 39 = 1170
• Ho: NND = NNDe (point pattern is random)
• Ha: NND > NNDe (point pattern is more dispersed than
random)
• z(NND) = 3.75 => p-value < .001, Ho rejected
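The example can be recomputed directly from the six coordinates as a check. The sketch below is illustrative only: the variable names are hypothetical, and the standard error formula 0.26136/SQRT(n * Density) is an assumption that does not appear on the slide (it is used here because it reproduces s(NNDe) = 1.49 for these data).

```python
import math

points = {"A": (27, 46), "B": (34, 10), "C": (15, 20),
          "D": (12, 33), "E": (21, 46), "F": (42, 49)}

def nearest_neighbors(pts):
    """Map each point to (label of its nearest neighbor, distance to it)."""
    result = {}
    for name, (x, y) in pts.items():
        others = ((math.hypot(x - ox, y - oy), other)
                  for other, (ox, oy) in pts.items() if other != name)
        dist, nearest = min(others)
        result[name] = (nearest, dist)
    return result

nn = nearest_neighbors(points)
n = len(points)
xs = [x for x, _ in points.values()]
ys = [y for _, y in points.values()]
area = (max(xs) - min(xs)) * (max(ys) - min(ys))   # bounding rectangle, as on the slide
density = n / area

nnd_obs = sum(dist for _, dist in nn.values()) / n  # observed mean NND
nnd_e = 1 / (2 * math.sqrt(density))                # expected NND under randomness
se = 0.26136 / math.sqrt(n * density)               # assumed standard error formula
z = (nnd_obs - nnd_e) / se

for name, (nearest, dist) in nn.items():
    print(name, nearest, round(dist, 2))
print(round(nnd_obs, 2), round(nnd_e, 2), round(se, 2), round(z, 2))
```

With only six points the test is a toy illustration; the bounding-rectangle definition of the study area also strongly affects the density, and therefore NNDe.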
Quadrat Analysis
• Quadrat Analysis focuses on the frequency of
points occurring in a defined part of the study
area
• Quadrats (cells) are superimposed over the study
area, and the number of points in each quadrat
is examined
• Based on quadrat (cell) frequency variability
• Point pattern in the whole study area is
described through the analysis of point
frequencies in each quadrat
Quadrat Analysis, Contd.
• Variance Mean Ratio (VMR) = (Variance of Cell
Frequencies)/(Mean Cell Frequency)
• Dispersed distribution of points: cell frequencies should
be similar
– VMR is near zero
• Clustered distribution of points: cell frequencies should
be low or zero for most cells, with a few cells having
many points
– VMR is large
• Random distribution of points: variance of cell
frequencies should be near the mean cell frequency
– VMR is near 1
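A short Python sketch shows how the VMR separates the three cases. The cell counts are made up for illustration (18 points over 9 cells each time), and the use of the sample variance (dividing by m - 1) is an assumption, since the slide does not say which variance is meant.

```python
def variance_mean_ratio(cell_counts):
    """VMR = variance of cell frequencies / mean cell frequency."""
    m = len(cell_counts)
    mean = sum(cell_counts) / m
    # Sample variance (m - 1 in the denominator) is assumed here
    variance = sum((c - mean) ** 2 for c in cell_counts) / (m - 1)
    return variance / mean

dispersed = [2, 2, 2, 2, 2, 2, 2, 2, 2]    # identical counts in every cell
clustered = [18, 0, 0, 0, 0, 0, 0, 0, 0]   # all points in one cell
mixed     = [1, 3, 2, 0, 4, 2, 1, 3, 2]    # moderate variation

print(variance_mean_ratio(dispersed))   # 0.0
print(variance_mean_ratio(clustered))   # 18.0
print(variance_mean_ratio(mixed))       # 0.75
```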
Quadrat Analysis, Contd.
• VMR can be used as an inferential test statistic
– Chi-square
– Function of VMR and number of cells m
– X^2 = VMR*(m-1)
• Ho: VMR = 1 (point pattern is random)
• Ha: VMR not= 1 OR VMR > 1 OR VMR < 1
• Need a large number of points spread across a
large number of cells for Quadrat Analysis to
be a worthwhile approach
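The test statistic can be sketched as follows. This is a toy illustration under stated assumptions: the cell counts are made up, the degrees of freedom are taken to be m - 1, and the critical value 15.507 is the standard upper-tail chi-square table value for 8 degrees of freedom at the .05 level (appropriate for Ha: VMR > 1).

```python
def quadrat_chi_square(cell_counts):
    """Return (VMR, chi-square statistic, degrees of freedom)."""
    m = len(cell_counts)
    mean = sum(cell_counts) / m
    variance = sum((c - mean) ** 2 for c in cell_counts) / (m - 1)
    vmr = variance / mean
    chi_sq = vmr * (m - 1)          # X^2 = VMR*(m-1)
    return vmr, chi_sq, m - 1       # df = m - 1 is an assumption

# Made-up counts: 18 points, all in one of 9 cells
counts = [18, 0, 0, 0, 0, 0, 0, 0, 0]
vmr, chi_sq, df = quadrat_chi_square(counts)

CRITICAL_05_DF8 = 15.507            # table value, upper tail, df = 8
reject_random = chi_sq > CRITICAL_05_DF8
print(vmr, chi_sq, df, reject_random)   # 18.0 144.0 8 True
```

Nine cells is far fewer than the slide recommends; the example only shows the arithmetic, not a sample size on which the test would be reliable.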
Area Pattern Analysis
• Aspects of Area Pattern Analysis are analogous to Point
Pattern Analysis
• Basic statistic for analysis of area patterns is the “join
count”
– Join is operationally defined as two areas with common boundary
– Measure of spatial autocorrelation for nominal, areal data
– Familiar GIS function
• Binary Categories used in pattern analysis
– Each individual area assigned black or white value
– Clustering occurs if areas with same binary value are contiguous
– Dispersion occurs if the number of black-white (dissimilar) joins is greater
than the number of same-category joins
– Randomness occurs if the numbers of similar and dissimilar joins are roughly
equal
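Counting joins is easy to sketch on a regular grid. The example below is a simplified illustration: it assumes square cells and treats a "common boundary" as a shared edge (rook contiguity), and both 4 x 4 grids are made up. Real study areas with irregular polygons would need an adjacency list instead.

```python
def count_joins(grid):
    """Count BB, WW, and BW joins on a rectangular grid of 'B'/'W' cells."""
    counts = {"BB": 0, "WW": 0, "BW": 0}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look right and down only, so each join is counted once
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    a, b = grid[r][c], grid[nr][nc]
                    counts[a + b if a == b else "BW"] += 1
    return counts

clustered = ["BBWW", "BBWW", "BBWW", "BBWW"]      # black left, white right
checkerboard = ["BWBW", "WBWB", "BWBW", "WBWB"]   # perfectly dispersed

print(count_joins(clustered))      # {'BB': 10, 'WW': 10, 'BW': 4}
print(count_joins(checkerboard))   # {'BB': 0, 'WW': 0, 'BW': 24}
```

Both grids have 24 joins in total; the clustered grid has very few black-white joins and the checkerboard has nothing else.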
Area Pattern Analysis, Contd.
• How do you determine the number of expected
black-white joins?
– “Free sampling” approach relies on theoretical
background to inform probability of a given area having
black or white value
• Probability of black or white value for a cell
corresponds to binomial distribution, probability p for
taking one value and q = 1-p for the other value
– “Non-free sampling” approach does not rely on any
information outside of the study area
• Probability of black or white cell based on number of
black and white cells in study area
– When unsure which approach to take, take non-free
sampling
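The difference between the two approaches shows up in the probability that any single join is black-white. The sketch below is an illustration under assumptions: the function names are hypothetical, the free-sampling probability 2pq follows from treating the two areas as independent draws, and the non-free probability treats the two areas as drawn without replacement from the B black and W white areas actually present.

```python
def p_bw_free(p):
    """Free sampling: each area is independently black with probability p."""
    q = 1 - p
    return 2 * p * q                # black-white or white-black

def p_bw_nonfree(n_black, n_white):
    """Non-free sampling: colors are fixed by the observed counts."""
    n = n_black + n_white
    return 2 * n_black * n_white / (n * (n - 1))

print(p_bw_free(0.5))               # 0.5
print(p_bw_nonfree(8, 8))           # 0.5333... (128/240)
```

Multiplying either probability by the total number of joins gives an expected number of black-white joins under that approach.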
Area Pattern Analysis: Non-free
Sampling Test
• Ho: Observed number of black-white joins
(OBW) = expected number of random black-white joins (EBW)
• Ha: OBW not= EBW OR OBW > EBW OR
OBW < EBW
• Choice of Ha depends on having a rationale
– If there is no rationale to choose Ha in either
direction, choose two-tailed test
Area Pattern Analysis: Non-free
Sampling Test
• EBW = 2*J*B*W/[N*(N – 1)]
– J = Total Number of Joins
– B = Number of Black Areas
– W = Number of White Areas
– N = Total Number of Areas
• Test statistic Z = (OBW – EBW)/s(BW)
• s(BW) is standard error of expected number of
black-white joins
– Given by formula 12.12 in McGrew (messy)
• Worked example found on pages 185 – 189 of
text
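The expected count and the test statistic can be sketched as below. The numbers are made up (a 4 x 4 grid with 24 joins, 8 black and 8 white areas, and 4 observed black-white joins), and because formula 12.12 is not reproduced here, s(BW) is passed in as an argument; the value 2.0 is a placeholder, not a computed standard error.

```python
def expected_bw_joins(j, b, w):
    """EBW = 2*J*B*W / [N*(N - 1)] under non-free sampling."""
    n = b + w                       # every area is either black or white
    return 2 * j * b * w / (n * (n - 1))

def join_count_z(obw, ebw, s_bw):
    """Z = (OBW - EBW) / s(BW); s(BW) must come from formula 12.12."""
    return (obw - ebw) / s_bw

ebw = expected_bw_joins(j=24, b=8, w=8)
print(ebw)                          # 12.8

# Placeholder standard error, for illustration only
z = join_count_z(obw=4, ebw=ebw, s_bw=2.0)
print(round(z, 2))                  # -4.4
```

A strongly negative Z (fewer black-white joins than expected) points toward clustering, while a strongly positive Z points toward dispersion.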