Exploratory Spatial Data Analysis (ESDA)


Analysis through Visualization
Data Normalization
• Values (attributes) by themselves are sometimes
misleading.
• Normalization refers to the division of multiple sets of
data by a common variable in order to negate that
variable's effect on the data.
• Normalization can help to compare samples.
• Example: The number of people in a county does not
tell us about the relative density of the people. What
we may want is the # of people per area.
Density = (# of people in county / county area)
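The density formula above can be sketched in a few lines. The county names, populations, and areas below are made-up illustration values, not data from the slides, and `density` is a hypothetical helper.

```python
# Hypothetical county records: (name, population, area in square km).
counties = [
    ("A", 500_000, 2_000.0),
    ("B", 50_000, 100.0),
]

def density(population, area):
    """People per unit area; area must be positive."""
    if area <= 0:
        raise ValueError("area must be positive")
    return population / area

# Normalize each raw count by the area of its county.
densities = {name: density(pop, area) for name, pop, area in counties}
```

In this sketch county A has ten times the population of county B but half the density, which is the kind of difference a raw count hides.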
Data Normalization
Approaches
• Density – divide count by area
• Divide an area-based count variable by another area-based count variable
X = Area in wheat / Total area in crops
A higher X indicates that wheat is more important
• Compute ratio of two count variables
X = $ of Wheat Sold / $ of all Crops Sold
A higher X indicates that wheat contributed more of the area's income
• Compute summary numerical measures for each unit (sum, mean, SD, etc.)
Data Normalization
Raw - # of Hispanics per Tract
Normalized - # of Hispanics / Total # of people per Tract
Mapping
Common ESDA Methods
• Quantile - Each class contains an equal number of features.
• Percentile - Sort values in numerical order, compute % of
total observations. Note that the median is the 50th percentile.
• Standard Deviation – good for normally distributed data
• Box Map – Shows outliers as a function of quartiles.
IQR = Q75 – Q25
Lower Outlier = Q25 – Hinge * IQR
Upper Outlier = Q75 + Hinge * IQR
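A minimal sketch of the box map outlier rule above, using NumPy. The hinge value of 1.5 is an assumption (the slides do not give one), the sample values are invented, and `box_map_fences` is a hypothetical helper name.

```python
import numpy as np

def box_map_fences(values, hinge=1.5):
    """Return (lower, upper) outlier fences computed from the quartiles."""
    q25, q75 = np.percentile(values, [25, 75])
    iqr = q75 - q25  # IQR = Q75 - Q25
    return q25 - hinge * iqr, q75 + hinge * iqr

# Hypothetical % values for nine tracts.
pct = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float)
lower, upper = box_map_fences(pct)

# Values beyond the fences are the outlier classes of the box map.
low_outliers = pct[pct < lower]
high_outliers = pct[pct > upper]
```

With these values Q25 = 3 and Q75 = 7, so the fences are -3 and 13 and only the value 100 is flagged as an upper outlier. A larger hinge flags fewer, more extreme values.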
Mapping (%Hispanic)
Exploration of Data
• Histogram - examine distribution
• Scatter Plot - examine correlation between variables
• Box Plot - compare distribution between variables
• Parallel Coordinate Plot - examine relation between
variables
Box Plots and Quantile
Spatial Autocorrelation
• First law of geography: “everything is related to
everything else, but near things are more related than
distant things” – Waldo Tobler
• Spatial Autocorrelation – correlation of a variable
with itself through space.
– If there is any systematic pattern in the spatial distribution of a variable,
it is said to be spatially autocorrelated.
– If nearby or neighboring areas are more alike, this is positive spatial
autocorrelation.
– Negative autocorrelation describes patterns in which neighboring areas
are unlike.
– Random patterns exhibit no spatial autocorrelation.
Why spatial autocorrelation is important
• Most statistics are based on the assumption that the
values of observations in each sample are
independent of one another
• Positive spatial autocorrelation may violate this if the
samples were taken from nearby areas
• Goals of spatial autocorrelation analysis
– Measure the strength of spatial autocorrelation in a map
– Test the assumption of independence or randomness
Moran’s I
• One of the oldest indicators of spatial
autocorrelation (Moran, 1950). Still a
de facto standard for determining spatial
autocorrelation.
• Applied to zones or points with continuous
variables associated with them.
• Compares the value of the variable at any one
location with the value at all other locations.
Moran’s I
$$I = \frac{N \sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\left(\sum_i \sum_j W_{ij}\right) \sum_i (X_i - \bar{X})^2}$$
Where N is the number of cases
Xi is the variable value at a particular location
Xj is the variable value at another location
X-bar is the mean of the variable
Wij is a weight applied to the comparison between
location i and location j. Weights are based either
on distance or adjacency.
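Moran's I as defined above can be computed directly with NumPy. This is a sketch under stated assumptions: the weights are a plain binary adjacency matrix for four zones in a row (each zone neighbors the next), the variable values are invented, and `morans_i` is a hypothetical helper, not a library function.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x and an N x N weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    z = x - x.mean()                       # deviations X_i - X-bar
    num = n * np.sum(w * np.outer(z, z))   # N * sum_i sum_j W_ij z_i z_j
    den = w.sum() * np.sum(z ** 2)         # (sum_i sum_j W_ij) * sum_i z_i^2
    return num / den

# Assumed adjacency weights: four zones in a line, neighbors share a border.
w = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

trend = morans_i([1, 2, 3, 4], w)        # neighbors alike
alternating = morans_i([1, 4, 1, 4], w)  # neighbors unlike
```

For these inputs the smoothly increasing values give I = 1/3 (positive spatial autocorrelation) and the alternating values give I = -1 (negative spatial autocorrelation).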