Spatial Clusters and Pattern Analysis


SPATIAL CLUSTERS
AND PATTERN
ANALYSIS
CHRIS JOCHEM
GEOG 5161 – SPRING 2011
WHEN YOU KNOW ‘WHERE’, YOU
CAN START TO ASK ‘WHY’
John Snow’s map of cholera
deaths in London, 1854.
Need to move beyond simply
mapping events and beyond
general point pattern analysis.
Water pump locations
GOALS OF CLUSTER ANALYSIS
• Examine “unusual” groupings of events in space
and/or time (Cromley and McLafferty 2002)
• Both confirmatory and exploratory of hypotheses
• Different ways to operationalize unusual or
unexpected patterns using probability distributions
• Common Questions (Waller and Gotway 2004, 155):
• Do cases tend to occur near other cases? (possible
infectious agent)
• Does a particular area within the study region seem
to contain a significant excess of observed events?
(possible environmental risk factor)
• Where is the most unusual collection of cases?
(possible cluster)
DIFFERENT
METHODOLOGIES
… for different levels of analysis
• Point Pattern Analysis: density and distance
measurements
• Ex: density map of cholera cases
• Clustering requires different statistical tests
• Tests are often used sequentially or as part of a larger study to
select areas for more detailed field work.
• Three main categories of tests:
1. Global
2. Local
3. Focal
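As a small illustration of the "distance measurements" used in general point pattern analysis, the sketch below computes a nearest-neighbor ratio (the Clark-Evans ratio) in plain Python. The coordinates and the 10 x 10 study area are hypothetical, and no edge correction is applied, so treat it as a teaching sketch rather than a full analysis.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the value
    expected under complete spatial randomness (0.5 / sqrt(density)).

    Ratios well below 1 suggest clustering; ratios above 1 suggest
    a dispersed (regular) pattern. No edge correction is applied.
    """
    density = len(points) / area
    expected = 0.5 / math.sqrt(density)
    return mean_nearest_neighbor_distance(points) / expected

# Hypothetical event locations in a 10 x 10 study area.
clustered = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8), (8.0, 8.0), (8.2, 8.1)]
regular = [(2.5, 2.5), (2.5, 7.5), (7.5, 2.5), (7.5, 7.5)]

print("clustered:", round(clark_evans_ratio(clustered, area=100.0), 3))
print("regular:  ", round(clark_evans_ratio(regular, area=100.0), 3))
```

A ratio like this describes the pattern as a whole; it does not say where any cluster is, which is why the tests in the three categories above are needed.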
3 MAIN CATEGORIES
OF TESTS
1. Global: a single test for general patterns and spatial
autocorrelation over an entire study region
• Moran’s I
• Geary’s C
2. Local: search for specific regions or areas where clustering
is observed above expected levels
• Example: areas of high crime or terrorist attacks
• Local Moran’s I
• Getis-Ord Gi*
• Spatial Scan Statistic
3. Focal: specialized statistics searching only in regions
around fixed locations
• Example: cancers around nuclear reactors
• Stone’s Test
• Bithell
• Tango
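To make the global statistics concrete, here is a minimal sketch of Moran's I and Geary's C in plain Python. The four regions, their chain adjacency, and the attribute values are hypothetical; the binary weights matrix is one common choice among several, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
    where d_i is the deviation of value i from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)

def gearys_c(values, weights):
    """Global Geary's C: ((n - 1) / (2 * S0)) * sum_ij w_ij * (x_i - x_j)^2
    divided by sum_i (x_i - mean)^2."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return ((n - 1) / (2 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)

# Hypothetical example: four regions in a row, each adjacent to the next.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient = [1, 2, 3, 4]      # similar values sit next to each other
alternating = [1, 4, 1, 4]   # neighbors are as different as possible

for name, vals in (("gradient", gradient), ("alternating", alternating)):
    print(name, round(morans_i(vals, chain_weights), 3),
          round(gearys_c(vals, chain_weights), 3))
```

Positive Moran's I (and Geary's C below 1) points to similar values being neighbors; negative Moran's I (and Geary's C above 1) points to dissimilar neighbors. Either way the result is a single number for the whole study region.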
CONSIDER YOUR DATA
Point Data
• Events (diseases, crimes, conflicts, etc.)
• Cases/Controls
• Measurement locations
Considerations
• Point level accuracy
Polygon Data
• Census or social attributes (poverty, unemployment, income, etc.)
• Aggregate counts of individual-level events
Considerations
• Modifiable areal unit problem (MAUP)
SPATIAL SCAN STATISTIC
As implemented in SaTScan™ software
• Input point or area data for events and background
population, can vary over time
• Pass a circular or elliptical filter of varying radii across
study area
• Count observed cases and test likelihood ratio against
expected cases given the population or person-time
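The steps above can be sketched in a few lines of Python. This is not SaTScan's implementation: it is a simplified, purely spatial illustration that uses circular windows centered on area centroids and the Poisson log likelihood ratio in the form given by Kulldorff (1997). The area data, the radii, and the 50% population cap on window size are assumptions chosen for the example, and the Monte Carlo step used to obtain a p-value is omitted.

```python
import math

def poisson_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio for one window under a Poisson model.
    Returns 0 unless the window holds more cases than expected."""
    if cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr

def most_likely_cluster(areas, radii):
    """areas: list of (x, y, cases, population) tuples.
    Returns (llr, center_index, radius, member_indices) for the
    window with the largest log likelihood ratio."""
    total_cases = sum(a[2] for a in areas)
    total_pop = sum(a[3] for a in areas)
    best = (0.0, None, None, [])
    for center, (cx, cy, _, _) in enumerate(areas):
        for radius in radii:
            members = [
                i for i, (x, y, _, _) in enumerate(areas)
                if math.hypot(x - cx, y - cy) <= radius
            ]
            pop_in = sum(areas[i][3] for i in members)
            if pop_in > total_pop / 2:
                continue  # assumed cap: window holds at most half the population
            cases_in = sum(areas[i][2] for i in members)
            expected_in = total_cases * pop_in / total_pop
            llr = poisson_llr(cases_in, expected_in, total_cases)
            if llr > best[0]:
                best = (llr, center, radius, members)
    return best

# Hypothetical areas: (x, y, observed cases, population at risk).
areas = [
    (0, 0, 12, 1000), (1, 0, 10, 1000), (0, 1, 11, 1000),
    (5, 5, 2, 1000), (6, 5, 3, 1000), (5, 6, 1, 1000),
    (9, 9, 2, 1000), (9, 0, 3, 1000),
]
radii = [0.5, 1.5, 3.0]

llr, center, radius, members = most_likely_cluster(areas, radii)
print("most likely cluster:", members, "radius", radius, "LLR", round(llr, 2))
```

In practice the largest observed ratio is compared with the largest ratios from many data sets simulated under the null hypothesis, which is what turns the most likely cluster into a tested one.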
Pros
• Spatial, temporal, or space-time clusters
• Controls for risk factors and covariates
Cons
• Learning curve for set-up and interpretation
• No graphical output
EXAMPLES
1. O’Loughlin, John and Frank D. W. Witmer. 2011. The
Localized Geographies of Violence in the North
Caucasus of Russia, 1999-2007. The Annals of the
Association of American Geographers 101, no. 1
(January): 178 – 201.
Using a spatial scan statistic to find local clusters of conflicts in space and time.
EXAMPLES
2. Kulldorff M, Athas W, Feuer E, Miller B, Key C. Evaluating
cluster alarms: A space-time scan statistic and brain
cancer in Los Alamos. American Journal of Public
Health, 1998; 88:1377-1380.
See the demo!
Many additional examples:
http://www.satscan.org/references.html
CONSIDERATIONS
AND CRITIQUES
• Must consider data limitations and accuracies
• How do you define a ‘cluster’? Relative to expected outcomes
• Possibility of occurring by chance, especially with small numbers
• Based on theory or hypothesized relationships
“Texas Sharpshooter Fallacy”
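A short worked example shows why chance matters with small numbers. The sketch assumes case counts follow a Poisson distribution and that areas are independent; the figures (1 expected case, 4 observed, 100 areas) are hypothetical.

```python
import math

def poisson_tail(k, expected):
    """P(X >= k) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below

def chance_of_any_excess(k, expected, n_areas):
    """Probability that at least one of n independent areas
    shows k or more cases purely by chance."""
    return 1.0 - (1.0 - poisson_tail(k, expected)) ** n_areas

# One area examined on its own: 4 cases where 1 was expected.
print("single area:", round(poisson_tail(4, 1.0), 4))

# The same excess looked for across 100 areas after the fact.
print("any of 100 areas:", round(chance_of_any_excess(4, 1.0, 100), 4))
```

Four cases against one expected looks unusual in a single pre-specified area, yet some area out of a hundred will very probably show it. Drawing the target around the hits afterward is the sharpshooter's mistake.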
CONSIDERATIONS
AND CRITIQUES
• Must consider underlying population at risk
• People are not evenly distributed
• Complete spatial randomness is usually not valid
• Difficult to link causality to clusters (Elliott et al. 2000, Elliott
and Wakefield 2001)
• Usually requires further studies
• What matters is scientific, not statistical, significance
(Gould 1970)
• See also O’Sullivan and Unwin (2003), and Harvey
(1966, 1967)
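The point about the population at risk can be shown with simple arithmetic. The sketch below compares observed counts with the counts expected if every area shared the overall rate; the three areas and their figures are hypothetical, and no adjustment for age or other covariates is made.

```python
def expected_cases(populations, total_cases):
    """Cases expected in each area if risk were the same everywhere."""
    total_pop = sum(populations)
    return [total_cases * p / total_pop for p in populations]

def observed_to_expected(observed, populations):
    """Ratio of observed to expected cases for each area."""
    expected = expected_cases(populations, sum(observed))
    return [o / e for o, e in zip(observed, expected)]

# Hypothetical areas: a city, a small town, and a suburb.
observed = [30, 6, 4]
populations = [30000, 2000, 8000]

for name, obs, ratio in zip(
    ("city", "town", "suburb"),
    observed,
    observed_to_expected(observed, populations),
):
    print(name, "observed", obs, "ratio", round(ratio, 2))
```

The city has by far the most cases but exactly as many as its population predicts, while the town has three times its expected count. A map of raw counts would point to the wrong place.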
RESOURCES
Free Software:
• SaTScan: http://www.satscan.org/
• CrimeStat: http://www.icpsr.umich.edu/CrimeStat/
• GeoDa: http://geodacenter.asu.edu/projects/opengeoda
• R packages: http://cran.r-project.org/web/views/Spatial.html
Broad Street Cholera Data:
http://www.asdar-book.org/datasets.php?dataset=4
REFERENCES
Anselin, Luc. 2006. How (not) to lie with spatial statistics. American Journal of
Preventive Medicine 30: s3-s6.
Cromley, Ellen K. and Sara L. McLafferty. 2002. GIS and Public Health. New York:
Guilford Press.
Elliott, Paul and Jon Wakefield. 2001. Disease clusters: Should they be investigated,
and, if so, when and how? Journal of the Royal Statistical Society A 164, 1: 3-12.
Elliott, Paul, Jon Wakefield, Nicola Best, and David Briggs. 2000. Spatial
Epidemiology: Methods and Applications. Oxford University Press.
Gould, Peter. 1970. Is statistix inferens the geographic name for a wild goose?
Economic Geography 46 (June): 439-448.
Harvey, David W. 1966. Geographical processes and the analysis of point patterns:
Testing models of diffusion by quadrat sampling. Transactions of the Institute of
British Geographers 40: 81-95.
Harvey, David W. 1967. Some methodological problems in the use of Neyman type A
and negative binomial distribution for the analysis of point patterns. Transactions of
the Institute of British Geographers 44: 81-95.
Kulldorff, Martin, and Neville Nagarwalla. 1995. Spatial disease clusters: Detection and
inference. Statistics in Medicine 14: 799-810.
Kulldorff, Martin, W. Athas, E. Feuer, B. Miller, and C. Key. 1998. Evaluating cluster
alarms: A space-time scan statistic and brain cancer in Los Alamos. American Journal of
Public Health 88:1377-1380.
Kulldorff, Martin. 1997. A spatial scan statistic. Communications in Statistics – Theory
and Methods 26, no. 6: 1481-1496.
Kulldorff, Martin, and Information Management Services, Inc. 2009. SaTScan™ v8.0:
Software for the spatial and space-time scan statistics. http://www.satscan.org/.
O’Loughlin, John and Frank D. W. Witmer. 2011. The Localized Geographies of Violence
in the North Caucasus of Russia, 1999-2007. The Annals of the Association of American
Geographers 101, no. 1 (January): 178 – 201.
O’Sullivan, David and David J. Unwin. 2003. Geographic Information Analysis. Hoboken,
New Jersey: John Wiley and Sons.
Olsen, Sjurdur F., Marco Martuzzi, and Paul Elliott. 1996. Cluster analysis and disease
mapping – why, when, and how? A step by step guide. British Medical Journal 313
(October): 863-866.
Waller, Lance A. and Carol A. Gotway. 2004. Applied Spatial Statistics for Public Health
Data. Hoboken, New Jersey: John Wiley and Sons.