Statistical Data Analysis
Chapter 9 - Montello and Sutton
An Introduction to Scientific
Research Methods in Geography
Overview
Statistical data analysis
Statistical description
Statistical inference
Geospatial Analysis
Data Analysis
Set of display and mathematical
techniques
Logical and conceptual considerations
Allows us to:
Extract meaning from systematically
collected measurements
Communicate that meaning to others
Geographers and Data
Geographers view data as statistical
(complex and imperfect) rather than
deterministic
Three reasons:
Imperfect sample of larger population
Measurement involves error
Phenomena are expressions of complex
sets of many interacting variables
Statistical Description
Goal: summarize potentially important
properties of our data using
Statistics - summary indices that describe
the sample
Parameters - summary indices that describe
the population
Properties:
Central tendency
Variability / dispersion
Form / shape of distribution
Relationships
Central Tendency
Average or representative value
Three most common:
Mode - most frequent
Median - middle value
Mean (“average”)
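As an illustration of the three measures, the following minimal sketch uses Python's standard `statistics` module. The `distances` list is a made-up sample (hypothetical commute distances in km), not data from the chapter.

```python
import statistics

# Hypothetical sample: commute distances in km (illustrative only)
distances = [2, 3, 3, 5, 8, 13, 36]

mode = statistics.mode(distances)      # most frequent value
median = statistics.median(distances)  # middle value of the sorted data
mean = statistics.mean(distances)      # arithmetic average

print(mode, median, mean)
```

In this sample the single large value (36) pulls the mean well above the median, which is one reason to report more than one measure.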
Variability / Dispersion
Tells how data points differ from the central
tendency
Indicates how representative the central tendency is
Representativeness is greater when variability is low
Three common:
Range - distance between high and low
Variance - average of squared deviations from the mean
Standard deviation - square root of the variance
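The three dispersion measures can be sketched as follows. The data are a made-up sample, and the population forms (`pvariance`, `pstdev`) are used here so that the variance is literally the average squared deviation; the sample forms (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead.

```python
import statistics

# Hypothetical sample: commute distances in km (illustrative only)
distances = [2, 3, 3, 5, 8, 13, 36]

data_range = max(distances) - min(distances)   # distance between high and low

# Variance written out by hand: average of squared deviations from the mean
mean = sum(distances) / len(distances)
variance = sum((d - mean) ** 2 for d in distances) / len(distances)

std_dev = variance ** 0.5                      # square root of the variance

# The standard library's population variance gives the same result
library_variance = statistics.pvariance(distances)

print(data_range, round(variance, 2), round(std_dev, 2))
```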
Form / Distribution I
Shape of entire data set
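Modality - number of local modes
Unimodal - one mode, as in the normal or “bell-shaped” curve
Bimodal - two modes
Skewness - asymmetry of the distribution
Positive - mostly low and medium scores, with a tail of high scores
Negative - mostly medium and high scores, with a tail of low scores
Symmetry - the two halves mirror each other around the central tendency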
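Skewness can also be given a number. The sketch below uses one common definition, the moment coefficient of skewness (third central moment divided by the 1.5 power of the second); the chapter may use a different index, and the three data sets are invented for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
positive = [1, 1, 2, 2, 3, 10]        # mostly low scores, long high tail
negative = [-v for v in positive]     # mirror image: long low tail

skew_symmetric = skewness(symmetric)
skew_positive = skewness(positive)
skew_negative = skewness(negative)

print(skew_symmetric, skew_positive, skew_negative)
```

A value near zero suggests symmetry, a positive value a tail of high scores, and a negative value a tail of low scores.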
Form / Distribution II
Derived scores
Describe the value of individual scores
relative to the rest of the data set
Three common:
Rank - 1, 2, 3, etc.
Percentile rank - percentage of the data
that is less than the score in question
z-score - standard deviation units above or
below the mean of the data set
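The three derived scores can be computed for a single value as in this minimal sketch. The `scores` list and the `target` value are hypothetical, rank 1 is assumed to mean the highest score, and the population standard deviation is assumed for the z-score.

```python
import statistics

# Hypothetical data set and the individual score of interest
scores = [55, 60, 65, 70, 75, 80, 85]
target = 80

# Rank: position when scores are ordered from highest to lowest
rank = sorted(scores, reverse=True).index(target) + 1

# Percentile rank: percentage of the data below the target score
percentile_rank = 100 * sum(s < target for s in scores) / len(scores)

# z-score: distance from the mean in standard deviation units
z_score = (target - statistics.mean(scores)) / statistics.pstdev(scores)

print(rank, round(percentile_rank, 1), round(z_score, 2))
```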
Relationships I
Systematic (consistent) patterns of high
or low values across pairs of variables
Linear relationship - paired values of two
variables fall along a straight line when graphed
Positive (or direct) - high value A has high
value B; low value A has low value B
Negative (or indirect) - high value A has
low value B; low value A has high value B
Relationships II
Relationship strength - degree that
patterns hold across all cases
Correlation coefficient - measure of
relationship strength and direction; its
square gives the proportion of shared variance
Regression analysis - expresses
relationship as an equation that predicts
the values of Y (criterion variable) as a
function of X (predictor variable)
Monotonic relationship - goes up or
down; not necessarily in a straight line
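The ideas on this slide can be sketched in code under some assumptions: Pearson's r is used as the correlation coefficient, ordinary least squares as the regression method, and a rank-based correlation (Spearman's, computed as Pearson's r on ranks) as one way to capture a monotonic relationship. All data and function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ranks(values):
    """Rank 1 = smallest value; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

x = [1, 2, 3, 4, 5]            # predictor variable (hypothetical)
y = [2, 4, 5, 4, 5]            # criterion variable (hypothetical)

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)   # predicted Y = intercept + slope * X

# A monotonic but non-linear relationship: Y always rises with X
y_cubic = [v ** 3 for v in x]
r_linear = pearson_r(x, y_cubic)                 # below 1: not a straight line
r_rank = pearson_r(ranks(x), ranks(y_cubic))     # 1: perfectly monotonic

print(round(r, 3), round(slope, 2), round(intercept, 2))
print(round(r_linear, 3), round(r_rank, 3))
```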
Statistical Inference I
Goal: Make informed guesses about
likely patterns in population, based on
sample data evidence
Assign probabilities to guesses
Sampling distribution - distribution of a
sample statistic based on all possible
samples of a given size, from a given
population
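For a very small population the sampling distribution can be listed in full. The sketch below enumerates every possible sample of size 2 (drawn without replacement, an assumption made here) from a made-up population of four values and computes the mean of each sample.

```python
from itertools import combinations
import statistics

population = [2, 4, 6, 8]   # hypothetical population
sample_size = 2

# Every possible sample of the given size, and the mean of each one
all_samples = list(combinations(population, sample_size))
sample_means = [statistics.mean(s) for s in all_samples]

# The sampling distribution of the mean is the distribution of these values
mean_of_means = statistics.mean(sample_means)
population_mean = statistics.mean(population)

print(sorted(sample_means))
print(mean_of_means, population_mean)
```

Here the sample means cluster around the population mean, and their average equals it.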
Statistical Inference II
Assumptions:
Distribution is normal and variances are
equal
Data values are independent
Model is correctly specified (for example, the relationship
is linear and relevant predictor variables are included)
Statistical Inference III
Two approaches:
Estimation
Point estimate - guess about specific
parameter value
Confidence interval - range of values
around the point estimate, stated with
a confidence level (such as 95%)
Hypothesis Testing
Null hypothesis (H0) states an exact value
for the parameter
Alternative hypothesis (HA) states that the
parameter does not equal the null value
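Both approaches can be sketched for a population mean. This example makes several assumptions that are not in the slides: the sample is simulated, the null value `mu0` is invented, and a normal (z) approximation is used, which is reasonable only for fairly large samples (a t distribution is the usual choice for small ones).

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical sample of 100 measurements (simulated for illustration)
rng = random.Random(42)
sample = [rng.gauss(52, 10) for _ in range(100)]

# Estimation: point estimate and 95% confidence interval for the mean
n = len(sample)
point_estimate = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)
z_crit = NormalDist().inv_cdf(0.975)          # about 1.96
ci_low = point_estimate - z_crit * std_error
ci_high = point_estimate + z_crit * std_error

# Hypothesis testing: H0 says the population mean equals mu0 exactly
mu0 = 50                                       # hypothetical null value
z = (point_estimate - mu0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided test
reject_h0 = p_value < 0.05

print(round(point_estimate, 2), (round(ci_low, 2), round(ci_high, 2)))
print(round(p_value, 4), reject_h0)
```

With this setup the two approaches agree: H0 is rejected at the 0.05 level exactly when `mu0` falls outside the 95% confidence interval.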
Statistical Inference IV
Four possible outcomes, based on:
Two possible truths (H0 is true; H0 is false)
Two possible decisions (reject H0 and
accept HA; fail to reject H0)
Two types of errors:
Type I - reject H0 when H0 is true
Type II - fail to reject H0 when H0 is false
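A small simulation can make the Type I error concrete. In this sketch H0 is true by construction, so every rejection is a Type I error; the sample size, number of trials, known standard deviation, and significance level are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(0)
alpha = 0.05                                   # chosen Type I error rate
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value

mu0, sigma = 0.0, 1.0    # H0: mean equals mu0; sigma assumed known
n, trials = 30, 2000

rejections = 0
for _ in range(trials):
    # Draw a sample from a population where H0 really is true
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1        # rejecting a true H0: a Type I error

type_i_rate = rejections / trials
print(type_i_rate)
```

Over many repetitions the share of false rejections should settle close to `alpha`.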
Geospatial Analysis
Geography data are different:
They are spatially distributed
Have location, extent or size, shape,
pattern, connectivity, etc.
They represent natural and human earth-surface features and processes
Spatiality is the focus or is central to the
analysis
Spatiality
Influences the accuracy of inferential
statistical analyses of nonspatial variables
Spatial autocorrelation exists when there
are patterns of spatial dependence – places
are “like” other places
Distance decay – near things are “more like”
each other than things further away
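Spatial autocorrelation is often summarized with Moran's I, an index the slides do not name; it is used here as one common choice. This sketch assumes the simplest possible layout, values measured along a line (a transect) where each location's only neighbors are the adjacent ones, each with weight 1. The two data sets are invented.

```python
def morans_i(values):
    """Moran's I for values along a line with adjacent-neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum of deviation products over all ordered neighbor pairs (i, j)
    cross = sum(
        dev[i] * dev[j]
        for i in range(n)
        for j in (i - 1, i + 1)
        if 0 <= j < n
    )
    total_weight = 2 * (n - 1)   # each adjacent pair counted in both directions
    return (n / total_weight) * cross / sum(d * d for d in dev)

clustered = [1, 1, 1, 5, 5, 5]     # neighbors are alike
alternating = [1, 5, 1, 5, 1, 5]   # neighbors are unlike

i_clustered = morans_i(clustered)
i_alternating = morans_i(alternating)

print(round(i_clustered, 2), round(i_alternating, 2))
```

Positive values indicate that nearby places tend to be alike, and negative values that nearby places tend to differ.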
Areal Units
Which areal units to use?
Problems:
Data come from a continuous source but are treated with
discrete spatial analysis techniques
Politicization of unit determination (like
gerrymandering)
Modifiable Areal Unit Problem (MAUP) – effect
that theoretically arbitrary areal geometries have
on geographic analysis
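A toy example of the MAUP: the same eight measurements along a line are grouped into zones in two different ways. The values and both zoning schemes are hypothetical, and `zone_means` is an illustrative helper.

```python
def zone_means(values, zones):
    """Mean of the values inside each zone, given (start, end) index pairs."""
    return [sum(values[a:b]) / (b - a) for a, b in zones]

# Hypothetical measurements at eight adjacent locations
values = [1, 9, 2, 8, 3, 7, 4, 6]

# Two equally arbitrary ways of drawing zone boundaries
zoning_a = [(0, 2), (2, 4), (4, 6), (6, 8)]
zoning_b = [(0, 1), (1, 3), (3, 5), (5, 7), (7, 8)]

means_a = zone_means(values, zoning_a)
means_b = zone_means(values, zoning_b)

print(means_a)
print(means_b)
```

Under the first zoning every zone has the same mean and the area looks uniform; shifting the boundaries by one location produces zones that differ, although the underlying data have not changed.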
Questions
Why is data analysis in geography usually
conceptualized in statistical (probabilistic) terms?
What is meant by strength and form of statistical
relationships?
What is the purpose of statistical inference? Why are
statistical inferences necessarily and ultimately
uncertain?
What are two types of correct decisions and two
types of errors possible when hypothesis testing?
What is spatial autocorrelation, what forms can it
take, and why is it so important to geographic data
analysis?