Pattern Statistics - University of California, Santa Barbara
Pattern Statistics
Michael F. Goodchild
University of California
Santa Barbara
Outline
Some examples of analysis
Objectives of analysis
Cross-sectional analysis
Point patterns
What are we trying to do?
Infer process
– processes leave distinct fingerprints on the landscape
– several processes can leave the same fingerprints
  • enlist time to resolve ambiguity
  • invoke Occam's Razor
  • confirm a previously identified hypothesis
Alternatives
Expose aspects of pattern that are otherwise invisible
– Openshaw
– Cova
Expose anomalies, patterns
Convince others of the existence of patterns, problems, anomalies
Cross-sectional analysis
Social data collected in cross-section
– longitudinal data are difficult to construct
– difficult for bureaucracies to sustain
– compare temporal resolution of process to temporal resolution of bureaucracy
Cross-sectional perspectives are rich in context
– can never confirm process
– though they can perhaps falsify
– useful source of hypotheses, insights
What kinds of patterns are of interest?
Unlabeled objects
– how does density vary?
– do locations influence each other?
– are there clusters?
Labeled objects
– is the arrangement of labels random?
– or do similar labels cluster?
– or do dissimilar labels cluster?
First-order effects
Random process: complete spatial randomness (CSR)
– all locations are equally likely
– an event does not make other events more likely in the immediate vicinity
First-order effect
– events are more likely in some locations than others
– events may still be independent
– varying density
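To make the difference between CSR and a first-order effect concrete, the Python sketch below simulates both. It is a minimal illustration under stated assumptions: it fixes the number of events n rather than drawing n from a Poisson distribution, the west-to-east intensity surface is made up, and the function names are hypothetical. In both cases each event is placed independently; only in the second are some locations more likely than others.

```python
import random

def csr_events(n, width, height, rng):
    """CSR given n: each event placed uniformly and independently in the rectangle."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def first_order_events(n, width, height, intensity, max_intensity, rng):
    """n independent events whose density is proportional to intensity(x, y).

    Rejection sampling: propose a uniform location and keep it with probability
    intensity(x, y) / max_intensity. Events do not influence each other; only
    the likelihood of each location varies.
    """
    events = []
    while len(events) < n:
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if rng.random() < intensity(x, y) / max_intensity:
            events.append((x, y))
    return events

rng = random.Random(1)
flat = csr_events(500, 10.0, 10.0, rng)
# Hypothetical intensity surface rising linearly from west (x = 0) to east (x = 10).
trend = first_order_events(500, 10.0, 10.0, lambda x, y: x, 10.0, rng)

# Share of events in the western half: near 0.5 under CSR, near 0.25 for the trend.
print(sum(x < 5 for x, _ in flat) / 500, sum(x < 5 for x, _ in trend) / 500)
```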
Second-order effects
Event makes others more or less likely in the immediate vicinity
– clustering
– but is a cluster the result of first- or second-order effects?
– is there a prior reason to expect variation in density?
Testing methods
Counts by quadrat
– Poisson distribution
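$$P(r) = \frac{m^{r} e^{-m}}{r!}$$
– where r is the number of events in a quadrat and m is the mean count per quadrat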
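The quadrat method can be sketched as follows: count events in each cell of a regular grid, then set the observed frequency of each count beside the frequency the Poisson distribution predicts with m set to the mean count. This is a minimal sketch; `quadrat_counts` and `observed_vs_expected` are hypothetical helper names, and the grid is assumed to be an nx-by-ny division of a rectangle with its origin at (0, 0).

```python
import math
from collections import Counter

def poisson_pmf(r, m):
    """P(r) = m**r * exp(-m) / r! for r = 0, 1, 2, ..."""
    return m ** r * math.exp(-m) / math.factorial(r)

def quadrat_counts(points, width, height, nx, ny):
    """Number of (x, y) points falling in each cell of an nx-by-ny grid of quadrats."""
    counts = [0] * (nx * ny)
    for x, y in points:
        # Clamp so points on the far edge fall in the last column/row.
        i = min(int(x / width * nx), nx - 1)
        j = min(int(y / height * ny), ny - 1)
        counts[j * nx + i] += 1
    return counts

def observed_vs_expected(counts):
    """Rows of (r, quadrats observed with r events, quadrats expected under Poisson)."""
    k = len(counts)
    m = sum(counts) / k                      # mean count per quadrat
    freq = Counter(counts)
    return [(r, freq.get(r, 0), k * poisson_pmf(r, m))
            for r in range(max(counts) + 1)]
```

For example, `quadrat_counts([(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (1.5, 1.5)], 2.0, 2.0, 2, 2)` returns `[2, 1, 0, 1]`. A fuller version would pool the upper tail into one class before testing.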
Deaths by horse-kick in the Prussian army
Mean m = 0.61, n = 200
Deaths per year   Probability   Number of years expected   Number of years observed
0                 0.543         108.7                      109
1                 0.331          66.3                       65
2                 0.101          20.2                       22
3                 0.021           4.1                        3
4                 0.003           0.6                        1
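The expected column follows directly from the Poisson formula: estimate m from the observed counts, then multiply each probability by n. A short Python check, using only the observed counts given on this slide:

```python
import math

# Horse-kick data as given on the slide: deaths per year -> number of years observed.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}
n = sum(observed.values())                        # 200 years
m = sum(r * f for r, f in observed.items()) / n   # mean deaths per year, 0.61

# Poisson probability of r deaths, and the number of years expected with r deaths.
prob = {r: m ** r * math.exp(-m) / math.factorial(r) for r in observed}
expected = {r: n * p for r, p in prob.items()}

for r in observed:
    print(f"{r}  {prob[r]:.3f}  {expected[r]:6.1f}  {observed[r]:4d}")
```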
Towns in Iowa
1173 towns, 154 quadrats, each 20 mi by 10 mi

Towns per quadrat   Quadrats observed   Quadrats expected
0                    3                   2.4
1                   10                   9.9
2                   11                  20.6
3                   31                  28.7
4                   35                  30.0
5                   28                  25.0
6                   23                  17.4
7                    6                  10.4
8                    6                   5.4
9+                   1                   4.0

Chi-square with 8 df = 12.7
Accept H0
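The chi-square statistic compares observed and expected quadrat counts class by class. The sketch below recomputes it from the rounded values in the Iowa table. Because the expected counts are rounded to one decimal place, it gives about 12.0 rather than the 12.7 shown, with the same conclusion. The 5% significance level is an assumption, since the slide does not state one.

```python
# Iowa quadrat table as given on the slide: towns per quadrat -> (observed, expected).
table = {
    "0": (3, 2.4), "1": (10, 9.9), "2": (11, 20.6), "3": (31, 28.7),
    "4": (35, 30.0), "5": (28, 25.0), "6": (23, 17.4), "7": (6, 10.4),
    "8": (6, 5.4), "9+": (1, 4.0),
}

# Pearson chi-square: sum over classes of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in table.values())

# Degrees of freedom: 10 classes, minus 1, minus 1 for the mean estimated from the data.
df = len(table) - 1 - 1

# Upper 5% point of chi-square with 8 df (from standard tables); 5% is an assumed level.
CRITICAL_8DF_5PCT = 15.507

print(f"chi-square = {chi_square:.2f} with {df} df")
print("accept H0" if chi_square < CRITICAL_8DF_5PCT else "reject H0")
```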
Distance to nearest neighbor
Observed mean distance $r_o$
Expected mean distance $r_e = 1/(2\sqrt{d})$
– where d is density per unit area
Test statistic:
$$z = \frac{r_o - r_e}{0.26136 / \sqrt{n d}}$$
– where n is the number of points
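The nearest-neighbor test can be sketched directly from these definitions. This sketch assumes points are (x, y) pairs in the same length units as the area, uses a brute-force neighbor search, and, like the formula above, makes no allowance for edge effects. `nearest_neighbor_test` is an illustrative name. A positive z means points are farther apart than CSR predicts (dispersion); a negative z means they are closer (clustering).

```python
import math

def nearest_neighbor_test(points, area):
    """Return (r_o, r_e, z) for a list of (x, y) points in a region of the given area.

    r_o: observed mean distance from each point to its nearest neighbor
    r_e: expected mean distance under CSR, 1 / (2 * sqrt(d)), with d = n / area
    z:   (r_o - r_e) / (0.26136 / sqrt(n * d))
    """
    n = len(points)
    d = n / area
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Brute-force O(n^2) search; adequate for a few thousand points.
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    r_o = total / n
    r_e = 1.0 / (2.0 * math.sqrt(d))
    z = (r_o - r_e) / (0.26136 / math.sqrt(n * d))
    return r_o, r_e, z
```

For the four corners of a unit square (area 1), r_o = 1.0, r_e = 0.25, and z is about 11.5, indicating strong dispersion.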
Towns in Iowa
622 points tested
643 per unit area
Observed mean distance 3.52
Expected mean distance 3.46
Test statistic 0.82
Accept H0
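As an arithmetic check on the Iowa figures, z can be recomputed from the reported distances alone. This sketch assumes the density is recovered from the expected distance, d = 1/(2 r_e)^2, rather than taken from the density figure above. It gives z of about 0.83, matching the reported 0.82 up to rounding of the two distances, and well inside ±1.96 (the bounds of a two-sided test at an assumed 5% level).

```python
import math

n = 622      # points tested
r_o = 3.52   # observed mean nearest-neighbor distance
r_e = 3.46   # expected mean distance under CSR

# Assumption: density recovered by inverting r_e = 1 / (2 * sqrt(d)).
d = 1.0 / (2.0 * r_e) ** 2
z = (r_o - r_e) / (0.26136 / math.sqrt(n * d))
print(f"d = {d:.4f} per unit area, z = {z:.2f}")
```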
But what about scale?
A pattern can be clustered at one scale and random or dispersed at another
Poisson test
– scale reflected in quadrat size
Nearest-neighbor test
– scale reflected in choosing nearest neighbor
– higher-order neighbors could be analyzed
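Higher-order neighbors can be handled by generalizing the nearest-neighbor distance to the k-th nearest. The sketch below is an illustration rather than a method from these slides: it computes the observed mean k-th neighbor distance and, for comparison, the CSR expectation Γ(k + 1/2) / (Γ(k) √(π d)), which reduces to 1/(2√d) when k = 1. The function names are hypothetical and edge effects are ignored.

```python
import math

def mean_kth_neighbor_distance(points, k):
    """Mean distance from each (x, y) point to its k-th nearest neighbor (k = 1 is the nearest)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i)
        total += dists[k - 1]
    return total / len(points)

def csr_expected_kth(k, d):
    """Expected k-th nearest-neighbor distance under CSR at density d (no edge effects)."""
    return math.gamma(k + 0.5) / (math.gamma(k) * math.sqrt(math.pi * d))
```

Comparing the two for k = 1, 2, 3, ... shows whether a pattern that looks random at the nearest-neighbor scale departs from CSR at larger scales.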
Weaknesses of these simple methods
Difficulty of dealing with scale
Second-order effects only
– density assumed uniform
Better methods are needed
K-function analysis
K(h) = expected number of events within h of an arbitrarily chosen event, divided by d
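How to estimate K?
– take an event i
– for every event j lying within h of i:
  • score 1
  • allowing for edge effects, score more than 1 when part of the circle centered on i and passing through j lies outside the study area (in Ripley's correction, the score is the inverse of the proportion of that circle lying inside)
– sum the scores over all events i, then divide by n and by d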
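A minimal estimator following these steps, with every pair scoring 1 (no edge correction). Because events near the boundary lose neighbors that lie outside the study area, this naive version tends to underestimate K at larger h. `k_function` is an illustrative name; points are assumed to be (x, y) pairs and `area` the area of the study region.

```python
import math

def k_function(points, area, h):
    """Naive estimate of K(h): every pair within distance h scores 1, no edge correction."""
    n = len(points)
    d = n / area                     # density per unit area
    score = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if j != i and math.hypot(xi - xj, yi - yj) <= h:
                score += 1
    # Sum over all events i, then divide by n and by d.
    return score / (n * d)
```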
The K function
In CSR, $K(h) = \pi h^2$
So instead plot $L(h)$, which is zero under CSR:
$$L(h) = \left(\frac{K(h)}{\pi}\right)^{0.5} - h$$
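The transformation is a one-liner. This sketch (with an illustrative function name) shows that a K value equal to πh² gives L(h) = 0, while a larger K, meaning more events within h than CSR predicts, gives a positive L(h), a sign of clustering at that distance. Negative values indicate dispersion.

```python
import math

def l_function(k_value, h):
    """L(h) = sqrt(K(h) / pi) - h: zero under CSR, positive for clustering at distance h."""
    return math.sqrt(k_value / math.pi) - h

print(l_function(math.pi * 2.0 ** 2, 2.0))   # K equal to the CSR value at h = 2
print(l_function(math.pi * 3.0 ** 2, 2.0))   # K larger than the CSR value at h = 2
```

Plotting L(h) against h turns departures from CSR into departures from a horizontal line at zero, which are easier to read than departures from a parabola.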
What about labeled points?
How are the points located?
– random, clustered, dispersed
How are the values assigned among the points?
– among possible arrangements
– random
– clustered
– dispersed
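One way to ask whether labels are randomly arranged is a permutation test: keep the point locations fixed, shuffle the labels many times, and see how often a random arrangement looks as clustered as the observed one. The sketch below uses a simple same-label join count as the statistic, with neighbors given as an explicit list of index pairs. The statistic, the helper names, and the number of trials are assumptions for illustration.

```python
import random

def same_label_joins(labels, pairs):
    """Number of neighboring pairs (i, j) whose labels are equal."""
    return sum(1 for i, j in pairs if labels[i] == labels[j])

def permutation_test(labels, pairs, trials=999, seed=0):
    """Return (observed joins, share of random arrangements with at least as many).

    A small share suggests similar labels cluster; a share near 1 suggests
    dissimilar labels cluster (a dispersed arrangement).
    """
    rng = random.Random(seed)
    observed = same_label_joins(labels, pairs)
    shuffled = list(labels)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if same_label_joins(shuffled, pairs) >= observed:
            at_least += 1
    # Count the observed arrangement as one of the possibilities.
    return observed, (at_least + 1) / (trials + 1)
```

For eight points in a row labeled AAAABBBB, with each point joined to the next, the observed count is 6 and only about 3% of random arrangements match it; for ABABABAB the count is 0 and every arrangement matches or exceeds it.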
Moran and Geary indices
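$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\right) \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$c = \frac{(n-1) \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - x_j)^2}{2 \left(\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\right) \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

– where $x_i$ is the value at point i, $\bar{x}$ is the mean value, and $w_{ij}$ is the weight given to the pair (i, j)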
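Both indices translate directly into code. In this sketch, `w` is assumed to be an n-by-n weight matrix given as nested lists (for example, 1 for neighboring pairs and 0 otherwise, with zeros on the diagonal); the function names are illustrative.

```python
def morans_i(x, w):
    """Moran's I for values x and an n-by-n weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                      # deviations from the mean
    s0 = sum(sum(row) for row in w)                  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return n * num / (s0 * den)

def gearys_c(x, w):
    """Geary's c for values x and an n-by-n weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((xi - mean) ** 2 for xi in x)
    return (n - 1) * num / (2 * s0 * den)

# Four points in a row, each adjacent to the next.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], w), gearys_c([1, 2, 3, 4], w))
```

For this example I = 1/3 and c = 0.3. I is above its expected value under randomness, -1/(n - 1), and c is below 1; both point to positive spatial autocorrelation, with similar values next to each other.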