Pattern Statistics - University of California, Santa Barbara

Pattern Statistics
Michael F. Goodchild
University of California
Santa Barbara
Outline
Some examples of analysis
Objectives of analysis
Cross-sectional analysis
Point patterns

What are we trying to do?

Infer process
– processes leave distinct fingerprints on the landscape
– several processes can leave the same fingerprints
• enlist time to resolve ambiguity
• invoke Occam's Razor
• confirm a previously identified hypothesis
Alternatives

Expose aspects of pattern that are otherwise invisible
– Openshaw
– Cova
Expose anomalies, patterns
Convince others of the existence of patterns, problems, anomalies

Cross-sectional analysis

Social data collected in cross-section
– longitudinal data are difficult to construct
– difficult for bureaucracies to sustain
– compare temporal resolution of process to
temporal resolution of bureaucracy

Cross-sectional perspectives are rich in
context
– can never confirm process
– though they can perhaps falsify
– useful source of hypotheses, insights
What kinds of patterns are of interest?

Unlabeled objects
– how does density vary?
– do locations influence each other?
– are there clusters?

Labeled objects
– is the arrangement of labels random?
– or do similar labels cluster?
– or do dissimilar labels cluster?
First-order effects

Random process (complete spatial randomness, CSR)
– all locations are equally likely
– an event does not make other events more likely in the immediate vicinity

First-order effect
– events are more likely in some locations than
others
– events may still be independent
– varying density
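
The contrast between CSR and a first-order effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the unit-square study area, the function names, and the west-to-east intensity surface are all hypothetical choices made for illustration, not part of the original material. In both cases the events are independent of one another; only the density differs.

```python
import random

def simulate_csr(n, rng):
    """n independent events, every location in the unit square equally likely."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def simulate_first_order(n, intensity, max_intensity, rng):
    """n independent events whose density follows intensity(x, y).

    Uses rejection sampling: propose a uniform location and keep it with
    probability intensity / max_intensity.
    """
    points = []
    while len(points) < n:
        x, y = rng.random(), rng.random()
        if rng.random() * max_intensity <= intensity(x, y):
            points.append((x, y))
    return points

def density_rising_east(x, y):
    # Hypothetical intensity surface: density rises linearly from west to east.
    return x

rng = random.Random(42)  # fixed seed so the sketch is repeatable
csr = simulate_csr(2000, rng)
trend = simulate_first_order(2000, density_rising_east, 1.0, rng)

# Share of events falling in the eastern half of the square.
east_share_csr = sum(1 for x, _ in csr if x > 0.5) / len(csr)
east_share_trend = sum(1 for x, _ in trend if x > 0.5) / len(trend)
```

Under CSR about half the events should fall in the eastern half. With this particular intensity surface the share should be near three quarters, even though no event influences any other.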
Second-order effects

Event makes others more or less likely
in the immediate vicinity
– clustering
– but is a cluster the result of first- or second-order effects?
– is there a prior reason to expect variation in
density?
Testing methods

Counts by quadrat
– Poisson distribution

$P(r) = \frac{m^r e^{-m}}{r!}$
Deaths by horse-kick in the Prussian army

Mean m = 0.61, n = 200

Deaths per yr               0      1      2      3      4
Probability                 0.543  0.331  0.101  0.021  0.003
Number of years expected    109.0  66.3   20.2   4.1    0.6
Number of years observed    109    65     22     3      1
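
The probabilities and expected counts for the horse-kick data can be recomputed directly from the Poisson formula. The following is a minimal sketch; the function and variable names are illustrative only.

```python
import math

def poisson_pmf(r, m):
    """Probability of exactly r events when the mean count is m."""
    return m ** r * math.exp(-m) / math.factorial(r)

m = 0.61       # mean deaths per year, from the slide
n_years = 200  # n, from the slide

probs = [poisson_pmf(r, m) for r in range(5)]
expected = [n_years * p for p in probs]

for r, (p, e) in enumerate(zip(probs, expected)):
    print(f"{r} deaths: probability {p:.3f}, expected years {e:.1f}")
```

The probabilities round to the tabulated values. The expected count for zero deaths comes out near 108.7 rather than the 109.0 shown in the table; the other expected counts agree to one decimal place.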
Towns in Iowa

1173 towns, 154 quadrats 20mi by 10mi

Towns per quadrat   Observed   Expected
0                   3          2.4
1                   10         9.9
2                   11         20.6
3                   31         28.7
4                   35         30.0
5                   28         25.0
6                   23         17.4
7                   6          10.4
8                   6          5.4
9+                  1          4.0

Chi-square with 8 df = 12.7
Accept H0
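
A chi-square goodness-of-fit statistic of this kind can be recomputed from the tabulated quadrat counts. This sketch assumes that the integer column holds the observed counts and the decimal column the Poisson expected counts, and that the test is made at the 5% level; the critical value is taken from standard chi-square tables. Because the tabulated expected counts are rounded, the result need not match the slide's 12.7 exactly.

```python
# Observed and expected numbers of quadrats containing 0, 1, ..., 8, 9+ towns.
observed = [3, 10, 11, 31, 35, 28, 23, 6, 6, 1]
expected = [2.4, 9.9, 20.6, 28.7, 30.0, 25.0, 17.4, 10.4, 5.4, 4.0]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: classes - 1, less 1 more for the mean estimated from the data.
df = len(observed) - 2

# ASSUMPTION: 5% significance level; 15.507 is the tabulated critical value for 8 df.
CRITICAL_5PCT_8DF = 15.507
reject_h0 = chi_square > CRITICAL_5PCT_8DF

print(f"chi-square = {chi_square:.2f} with {df} df, reject H0: {reject_h0}")
```

With these rounded inputs the statistic is close to 12, below the critical value, so the conclusion is the same: H0 is not rejected.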
Distance to nearest neighbor
Observed mean distance $r_o$
Expected mean distance $r_e = \frac{1}{2\sqrt{d}}$

– where d is density per unit area

Test statistic:
$z = \frac{r_o - r_e}{0.26136 / \sqrt{n d}}$
Towns in Iowa
622 points tested
643 per unit area
Observed mean distance 3.52
Expected mean distance 3.46
Test statistic 0.82
Accept H0
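
The nearest-neighbor figures for Iowa can be checked with a few lines of code. This sketch rests on an assumption that is not stated on the slide: that the "unit area" is the full study region of 154 quadrats of 20 mi by 10 mi (30,800 square miles), with distances measured in miles. Here n is taken to be the number of points tested. The function name is illustrative only.

```python
import math

def nearest_neighbor_z(r_obs, n, d):
    """Return (expected mean distance, z) for the nearest-neighbor test.

    r_obs: observed mean nearest-neighbor distance
    n:     number of points tested
    d:     density of points per unit area
    """
    r_exp = 1.0 / (2.0 * math.sqrt(d))
    std_err = 0.26136 / math.sqrt(n * d)
    return r_exp, (r_obs - r_exp) / std_err

# ASSUMPTION: 643 towns spread over 154 quadrats of 20 mi x 10 mi.
study_area_sq_mi = 154 * 20 * 10
density = 643 / study_area_sq_mi  # towns per square mile

r_exp, z = nearest_neighbor_z(r_obs=3.52, n=622, d=density)
print(f"expected mean distance = {r_exp:.2f}, z = {z:.2f}")
```

Under that assumption the sketch gives an expected mean distance of about 3.46 and a test statistic of about 0.82, in line with the slide.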

But what about scale?
A pattern can be clustered at one scale and random or dispersed at another

Poisson test
– scale reflected in quadrat size

Nearest-neighbor test
– scale reflected in choosing nearest neighbor
– higher-order neighbors could be analyzed
Weaknesses of these simple methods
Difficulty of dealing with scale
Second-order effects only
– density assumed uniform

Better methods are needed
K-function analysis
K(h) = expected number of events within h of an arbitrarily chosen event, divided by d
How to estimate K?

– take an event i
– for every event j lying within h of i:
• score 1
• allowing for edge effects: score < 1
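
The scoring procedure can be sketched as follows. This is a naive illustration that scores 1 for every neighbor within h and applies no edge correction at all, so it will tend to underestimate K near the boundary of the study area. The function name, the sample points, and the study area of 100 square units are hypothetical.

```python
import math

def k_estimate(points, h, area):
    """Naive estimate of K(h) with no edge correction.

    Averages, over all events i, the number of other events j within
    distance h, then divides by the density d = n / area.
    """
    n = len(points)
    total = 0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= h:
                total += 1  # score 1 for each event j lying within h of i
    # (total / n) / (n / area) simplifies to total * area / n**2
    return total * area / (n * n)

# Hypothetical example: three nearby events and one distant event.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
k_at_1 = k_estimate(pts, h=1.0, area=100.0)
```

For these four points, two pairs lie within distance 1 of each other, so the average score per event is 1 and the estimate is 1 divided by the density of 0.04, that is 25.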
The K function
In CSR $K(h) = \pi h^2$
So instead plot:

$L(h) = \left[\frac{K(h)}{\pi}\right]^{0.5} - h$
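
The transformation is simple enough to write down directly. In this sketch the function name is illustrative; with this sign convention L(h) is zero under CSR, and positive values suggest more neighbors within h than CSR would produce, that is, clustering at that scale.

```python
import math

def l_function(h, k):
    """Transform a K(h) value so that CSR plots as zero at every h."""
    return math.sqrt(k / math.pi) - h

# Under CSR, K(h) = pi * h^2, so L(h) should vanish.
l_csr = l_function(2.0, math.pi * 2.0 ** 2)

# A hypothetical K value well above the CSR expectation at h = 1.
l_clustered = l_function(1.0, 25.0)
```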
What about labeled points?

How are the points located?
– random, clustered, dispersed

How are the values assigned among the
points?
– among possible arrangements
– random
– clustered
– dispersed
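Moran and Geary indices

Moran's I:
$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - a)(x_j - a)}{\sum_{i=1}^{n} (x_i - a)^2 \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}$

Geary's c:
$c = \frac{(n - 1) \sum_{i} \sum_{j} w_{ij} (x_i - x_j)^2}{2 \sum_{i} \sum_{j} w_{ij} \sum_{i} (x_i - a)^2}$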
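
Both indices can be computed with a few sums. The sketch below takes a to be the mean of the values x and w to be a matrix of weights linking object i to object j. The binary adjacency weights and the four-object example are hypothetical, chosen only to show the arithmetic.

```python
def moran_geary(x, w):
    """Return (Moran's I, Geary's c) for values x and weight matrix w."""
    n = len(x)
    a = sum(x) / n                        # mean of the values
    dev = [xi - a for xi in x]            # deviations from the mean
    w_sum = sum(sum(row) for row in w)    # sum of all weights
    ss = sum(d * d for d in dev)          # sum of squared deviations
    cross = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    sq_diff = sum(w[i][j] * (x[i] - x[j]) ** 2
                  for i in range(n) for j in range(n))
    moran_i = n * cross / (ss * w_sum)
    geary_c = (n - 1) * sq_diff / (2 * w_sum * ss)
    return moran_i, geary_c

# Hypothetical example: four objects in a row, each linked to its immediate
# neighbors, with values that rise steadily along the row.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran_i, geary_c = moran_geary(values, weights)
```

For this example I is 1/3 and c is 0.3. A positive I together with c below 1 indicates that similar values tend to be neighbors.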