Chapter 5 - FMCC Academic Computing

Processing Geographic Data:
Common Measures Useful In
Thematic Mapping
SP 240 Cartography
Alex Chaucer
Data Processing
1. processing geographic data before
mapping in order to reduce, enhance,
retain key features, or show primary
spatial patterns
2. preliminary activity of data symbolization
Spatial Information Technology Center
The Need for Data Processing
• Data reduction
• Enhancement
• Retention of Key Features
• Simplification
Allows you to convey thought in graphic
form as simply as possible
Mathematical and Statistical Methods
Mathematics
Dealing with magnitudes and relationships that can be expressed
as numbers and symbols

Simple

Ratios, Proportions, and Percentages
Statistics
To draw conclusions where uncertainty and variation are present

To summarize observations

To describe relationships between two variables

To make inferences, both estimations and tests of
significance
Ratio, Proportion, and Percent
Ratio
A fraction used to express the relationship between two variables
Example: Population Density
Proportion
A special ratio that expresses the relationship between the
amount in one class and the total in all classes
Percentage
The proportion developed by multiplying the decimal fraction by
100
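The three measures above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample numbers (50,000 people, 200 square kilometers, 12 of 48 observations) are hypothetical and do not come from the slides.

```python
def ratio(a, b):
    """Relationship between two variables expressed as a fraction."""
    return a / b


def proportion(class_count, total_count):
    """Amount in one class relative to the total in all classes."""
    return class_count / total_count


def percentage(class_count, total_count):
    """Proportion multiplied by 100."""
    return proportion(class_count, total_count) * 100


# Hypothetical figures: population density as people per square kilometer.
density = ratio(50_000, 200)

# Hypothetical figures: 12 observations in one class out of 48 in all classes.
share = proportion(12, 48)
share_pct = percentage(12, 48)

print(density, share, share_pct)
```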
Variables, Values, and Arrays
Variables
Raw data for statistical analysis.
Examples: height, rainfall, elevation
Notation: X, Y
Values
An individual numerical observation of a variable
Examples of rainfall variable: 16, 18, 25, 34
Notation: x1 x2 x3 xi
Array
In statistical terminology, an ordered arrangement of values
Examples: ascending, descending
Can be nominal, ordinal, interval
Frequency Distributions and
Histograms
Frequency Distribution
An ordered array that shows the frequency of
occurrence of each value
Frequency Table
Constructed by dividing the total range of the data into classes
and counting the values that fall in each class; the counts can
then be displayed in a histogram
Relative Frequency
The proportion of observations of a particular class
Histogram
A graphic way of presenting the frequency or relative frequency of
occurrence of a variable
http://blacklandgrass.tamu.edu/images/latest/histogram.gif
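A frequency table, its relative frequencies, and a rough text histogram can be sketched as follows. The rainfall values are the ones used as an example earlier in the slides; the class limits (10, 20, 30, 40) and the function names are assumptions made for illustration.

```python
def frequency_table(values, class_breaks):
    """Count values per class. Classes are [lower, upper), except the
    last class, which also includes its upper limit."""
    counts = [0] * (len(class_breaks) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            lower, upper = class_breaks[i], class_breaks[i + 1]
            if lower <= v < upper or (i == last and v == upper):
                counts[i] += 1
                break
    return counts


def relative_frequencies(counts):
    """Proportion of all observations that fall in each class."""
    total = sum(counts)
    return [c / total for c in counts]


rainfall = [16, 18, 25, 34]   # values from the rainfall example
breaks = [10, 20, 30, 40]     # hypothetical class limits

counts = frequency_table(rainfall, breaks)
rel = relative_frequencies(counts)

# Text histogram: one '#' per observation in the class.
for i, count in enumerate(counts):
    label = f"{breaks[i]}-{breaks[i + 1]}"
    print(f"{label:>7} | {'#' * count} ({rel[i]:.2f})")
```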
Summarizing Data Distributions
Often, it is better to work with predefined statistical
divisions for which data may already exist.
Example: Metropolitan Statistical Areas or Census Tracts from the
Census
Aggregated
one numerical descriptor has already been calculated for
observed values within the bounded area
Example: The median family income in the Little Rock, Arkansas
Metropolitan Statistical Area (MSA) in 1969 was $8,285
Nominal Scale: The Mode and the
Variation Ratio
Nominal Scale Data
data that results from counting; the simplest level of data
Example: 10 hats, 5 coats, 22 pants, and 12 pairs of shoes
Mode
Name of the class in a nominal distribution with the highest
frequency (pants is the mode above)
Variation Ratio
The proportion of occurrences not in the modal class
Modes and variation ratios are important indices of central tendency
and dispersion for nominally scaled data.
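Using the clothing counts from the example above, the mode and variation ratio can be computed as in this short sketch. The function names are illustrative, and the variation ratio is taken here as 1 minus the modal class's share of all occurrences.

```python
# Counts from the nominal-scale example above.
counts = {"hats": 10, "coats": 5, "pants": 22, "pairs of shoes": 12}


def mode(class_counts):
    """Name of the class with the highest frequency."""
    return max(class_counts, key=class_counts.get)


def variation_ratio(class_counts):
    """Proportion of occurrences that are not in the modal class."""
    total = sum(class_counts.values())
    modal_count = max(class_counts.values())
    return 1 - modal_count / total


modal_class = mode(counts)
vr = variation_ratio(counts)   # 27 of the 49 items are not pants

print(modal_class, round(vr, 3))
```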
Ordinal Scale: The Median and
Percentiles
Ordinal data
data that provides a ranking
less than, more than decisions
Median
that place in a ranked ordinal data set that neither exceeds nor is
exceeded in rank by more than 50 percent of the observations;
used to describe ordinal data
Percentile
the place in a ranked data set that divides the number of
observations into a specified portion of all the observations
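A minimal sketch of the median and a percentile for ranked data follows. The scores are hypothetical, and the nearest-rank rule used for percentiles is only one of several common conventions; it is an assumption here, not something the slides specify.

```python
import math
import statistics


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: the smallest ranked value such that at
    least p percent of the observations are at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical ordinal scores (for example, rankings from 1 to 5).
scores = [3, 1, 4, 2, 5, 2, 4]

median_score = statistics.median(scores)
p25 = percentile_nearest_rank(scores, 25)
p50 = percentile_nearest_rank(scores, 50)
p90 = percentile_nearest_rank(scores, 90)

print(median_score, p25, p50, p90)
```

With an odd number of observations, as here, the 50th percentile under this rule coincides with the median.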
Interval and Ratio Scales: The Arithmetic
Mean and the Standard Deviation
Interval/Ratio Data
data resulting from measurement
most sophisticated data
Interval
arbitrary starting points (Fahrenheit temperature scale)
Ratio
have zero as a starting point (Kelvin scale)
Arithmetic Mean
add all the values and divide by the number of values
Standard Deviation
the square root of the variance; used to describe dispersion
around the arithmetic mean in an interval/ratio data set
Variance
the statistic used most commonly to describe dispersion around
the arithmetic mean in an interval/ratio data set
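The arithmetic mean, variance, and standard deviation can be worked through with the rainfall values from the earlier example. This sketch assumes the population form of the variance (dividing by n); a sample variance would divide by n - 1 instead. The function names are illustrative.

```python
import math


def arithmetic_mean(values):
    """Add all the values and divide by the number of values."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: mean squared deviation from the mean.
    Assumption: divides by n, not n - 1."""
    mean = arithmetic_mean(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


rainfall = [16, 18, 25, 34]   # values from the rainfall example

mean_rain = arithmetic_mean(rainfall)
var_rain = variance(rainfall)
sd_rain = standard_deviation(rainfall)

print(mean_rain, var_rain, round(sd_rain, 3))
```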
Normal Distribution
frequency distribution represented by a bell-shaped curve; used
as a basis for comparison in many statistical measures
http://www.duncanwil.co.uk/norm_files/image009.jpg
Skewness
numeric value of deviation from the normal or
bell-shaped, frequency distribution
Kurtosis
numeric value indicating “peakiness” in a
frequency distribution
http://www.pqsystems.com/eline/2001/02/kurtosis.gif
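One common way to put numbers on skewness and kurtosis uses central moments, as sketched below. The moment-based formulas and both sample data sets are assumptions chosen for illustration; the slides do not give a formula. Under this definition a normal distribution has skewness 0 and kurtosis 3.

```python
def central_moment(values, k):
    """k-th central moment: mean of (value - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth central moment scaled by the squared variance."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]       # hypothetical, symmetric about 3
right_skewed = [1, 1, 1, 2, 10]   # hypothetical, long tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```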
Measures of Areal Concentration
and Association
Cartographers and Geographers


Look for patterns of spatial variation
The following slides look at some of the methods for describing
spatial patterns
The Coefficient of Areal
Correspondence
Coefficient of Areal Correspondence
method of comparing areal spatial
distributions; defined as a ratio
May suggest causal relationships as values
approach 1.0
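The slide defines the coefficient only as a ratio. A commonly used form, assumed here, divides the area covered by both distributions by the area covered by either one. The sketch below represents each distribution as a set of equal-area grid cell IDs, which is a simplifying assumption; the cell IDs are hypothetical.

```python
def areal_correspondence(cells_a, cells_b):
    """Area covered by both distributions divided by the area covered
    by either. Assumes every cell has the same area."""
    union = cells_a | cells_b
    if not union:
        raise ValueError("at least one distribution must cover some area")
    return len(cells_a & cells_b) / len(union)


# Hypothetical grid cells occupied by two mapped phenomena.
distribution_a = {1, 2, 3, 4}
distribution_b = {3, 4, 5, 6}

coefficient = areal_correspondence(distribution_a, distribution_b)
print(round(coefficient, 3))
```

A value of 0 means the two distributions never coincide, and a value of 1.0 means they cover exactly the same area.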
Areal Means
Areal Mean
method of calculating the spatial balance of a set of data points
Over time, these centers of areas can give insight into a spatially
dynamic pattern.
Standard Distance
measure for depicting dispersion around an areal mean
1. Both include all events (points) used in the analysis.
2. Because all observations are used, both are extremely sensitive to
change in any one observation.
3. Because squares of distance are used in computing the standard
distance, its value is strongly affected by points at extreme distances
from the areal mean.
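The areal mean and standard distance can be sketched for a small set of points. The coordinates are hypothetical, the points are unweighted, and the standard distance is taken here as the square root of the mean squared distance from the areal mean.

```python
import math


def areal_mean(points):
    """Mean x and mean y of a set of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Square root of the mean squared distance from the areal mean."""
    cx, cy = areal_mean(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / len(points))


# Hypothetical point locations in map units.
points = [(0, 0), (4, 0), (4, 3), (0, 3)]

center = areal_mean(points)
spread = standard_distance(points)
print(center, spread)

# Moving a single point far away shifts both measures noticeably.
moved = [(0, 0), (4, 0), (4, 3), (40, 3)]
print(areal_mean(moved), round(standard_distance(moved), 2))
```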
Location Quotient
Location Quotient
a measure of geographical concentration; illustrates deviation
from assumed proportional share
Location quotient can show geographical localization.
Can be useful by guiding further studies.
compares the local economy to a reference economy, in the
process attempting to identify specializations in the local economy
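A location quotient is usually computed as the local share of an activity divided by the reference share of the same activity. The sketch below assumes that form, and the employment figures are hypothetical.

```python
def location_quotient(local_industry, local_total, ref_industry, ref_total):
    """Local share of an industry divided by its share in the
    reference economy."""
    local_share = local_industry / local_total
    reference_share = ref_industry / ref_total
    return local_share / reference_share


# Hypothetical employment counts: 200 of 1,000 local jobs are in the
# industry, against 1,000 of 10,000 jobs in the reference economy.
lq = location_quotient(200, 1_000, 1_000, 10_000)
print(lq)
```

A value above 1 suggests the activity is more concentrated locally than in the reference economy, and a value below 1 suggests it is less concentrated.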
Measuring Spatial Association by
Regression Methods
Residuals from regression
differences between observed y (or x) values and
those estimated by the regression line.
Idea behind regression
Looking for a causal relationship between two
variables, one dependent and one independent
Scattergram
diagram containing a plot of data points, each of which has a
value in two dimensions; a graphic way to illustrate
mathematical correlation; also called a scatterplot
Regression line
drawn on a graph to depict the relationship between two
variables; linear regression is a common form
Correlation
method of showing the mathematical association between two
or more variables
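The regression line, its residuals, and the correlation between two variables can be sketched with ordinary least squares. The x and y values are hypothetical, and simple linear regression with Pearson's correlation coefficient is assumed as the form intended.

```python
import math


def linear_regression(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def residuals(xs, ys, slope, intercept):
    """Observed y minus the y estimated by the regression line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def correlation(xs, ys):
    """Pearson correlation coefficient between x and y."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations: x is the independent variable.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

slope, intercept = linear_regression(xs, ys)
res = residuals(xs, ys, slope, intercept)
r = correlation(xs, ys)
print(slope, intercept, [round(e, 2) for e in res], round(r, 3))
```

Mapping the residuals by area is one way to see where the observed values depart most from what the regression line estimates.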
Data Classification
Classification
Scientific reduction of a large number of individual
observations, events, or numbers into smaller groups to
facilitate explanation
aka Taxonomy
Classification is More
Classification
By grouping data, generalizations can be made
Can compute an average, and then classify each value as
being above the average or below the average
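The above/below-average grouping can be sketched as a two-class classification. The area names and values are hypothetical, and treating a value exactly equal to the average as "above" is an arbitrary choice made for this example.

```python
def classify_by_average(values_by_area):
    """Assign each area to 'above' or 'below' the arithmetic mean.
    Assumption: a value equal to the mean is placed in 'above'."""
    average = sum(values_by_area.values()) / len(values_by_area)
    classes = {}
    for area, value in values_by_area.items():
        classes[area] = "above" if value >= average else "below"
    return average, classes


# Hypothetical values for four enumeration areas.
data = {"Area A": 10, "Area B": 20, "Area C": 30, "Area D": 40}

average, classes = classify_by_average(data)
print(average, classes)
```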