Data Classification Methods


Cartographic Principles
Thematic Maps (Slightly modified from the
original – the original is the PP referenced
on the web)
Copyright © 2001.
2003. The Polis Center
Techniques for Effective Map Design
Graduated Color Maps
What are they?
Data are grouped
into classes.
Each class has a
separate color.
Colors get
progressively darker or
lighter to indicate
increase/decrease of
value.
Graduated Color Maps
The most
important
assumption in
choropleth
mapping is that
the value in the
enumeration
unit is spread
uniformly
throughout the
unit.
Graduated Color Maps
It is traditional to use
ratios instead of total
values when creating
graduated color maps.
Most mapping areas are
unequal in size. When totals
are mapped, the varying sizes
alter the impression
of the distribution.
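The point about ratios can be illustrated with a small sketch. The tract names, totals, and areas below are hypothetical values chosen for illustration, and dividing by area is only one possible ratio (a rate per capita would work the same way).

```python
# Hypothetical enumeration units: (name, total population, area in square miles).
units = [
    ("Tract A", 4000, 2.0),
    ("Tract B", 4000, 0.5),
    ("Tract C", 1500, 0.25),
]


def density(total, area_sq_mi):
    """Return the total divided by the area, a ratio suitable for shading."""
    return total / area_sq_mi


# Map the ratio, not the raw total, to each unit.
densities = {name: density(total, area) for name, total, area in units}

for name, value in densities.items():
    print(name, value)
```

Tract A and Tract B have the same total, yet Tract B is four times as dense, and a map of totals would hide that difference.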
Graduated Color Maps
Geographical phenomena such as
temperature, pressure, and elevation should not be mapped using
graduated color mapping
techniques, since their distributions
are not controlled by political or
administrative subdivisions.
Surface maps are an alternative for these types of
data. These can
be created with ArcGIS Spatial
Analyst.
They are made of very small choropleths (pixels) so that the eye thinks they are smooth.
Surface maps are also an effective way to display many types of cultural information when used properly.
[Map: street map with point symbols; labels include 86th St, 38th St, 10th St, Washington St, Raymond St, Keystone Ave, Meridian St, Kentucky Ave, Southport Rd]
Dot Density Maps
What are they?
Dot density
maps use a
dot to indicate
one or more
occurrences of
a phenomenon.
Dot Density Maps
Choose a dot value
that results in two
or three dots
being placed in
the area with the
least mapped
quantity.
Select a dot
value that is
easily understood
such as 5, 100,
1000, etc.
The dots should coalesce in the
statistical area that has the highest
density of the mapped value.
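The guideline for choosing a dot value can be sketched as follows. The list of "easily understood" candidate values and the birth counts are assumptions made for illustration, and the function names are hypothetical.

```python
def dots_per_area(quantities, dot_value):
    """Number of whole dots drawn in each area for a given dot value."""
    return {name: int(q // dot_value) for name, q in quantities.items()}


def candidate_dot_values(quantities, candidates=(5, 10, 50, 100, 500, 1000, 5000)):
    """Return the candidates that give two or three dots in the smallest area."""
    smallest = min(quantities.values())
    return [v for v in candidates if 2 <= smallest // v <= 3]


# Hypothetical birth counts per statistical area.
births = {"Area 1": 30, "Area 2": 120, "Area 3": 455}

print(candidate_dot_values(births))
print(dots_per_area(births, 10))
```

Whether the dots coalesce in the densest area still has to be judged on the drawn map, since that depends on dot size and map scale as well as on the dot value.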
Dot Density Maps
Advantages
Easily understood
by the reader
Illustrates spatial
density
Original data can be
recovered from the
map if the dots
represent the actual
locations of the
phenomena
Example: 1 dot = 5 births, therefore 6 dots = 30 births.
Dot Density Maps
Disadvantages
A dot map that is
computer generated
typically involves a
random distribution of
dots within an
enumeration area.
Solution: use smaller enumeration units, for example census
blocks rather than tracts, or counties rather than states.
[Map legend: Population, 1 dot = 5000 persons]
Proportional and Graduated
Symbol Maps
What are they?
Proportional Symbol
• The size of a point
symbol varies from place
to place in proportion to
the quantity that it
represents.
Graduated Symbol
• The size of a point symbol is
based on which class
the feature's value falls
within.
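The difference between the two symbol types can be sketched in a few lines. This is an illustration under stated assumptions: the slides do not say how size is scaled, so scaling the circle area (rather than the radius) with the value is an assumed convention, and the class bounds and radii are hypothetical.

```python
import math
from bisect import bisect_left


def proportional_radius(value, max_value, max_radius=20.0):
    """Radius such that the circle AREA is proportional to value (assumed convention)."""
    return max_radius * math.sqrt(value / max_value)


def graduated_radius(value, upper_bounds, radii):
    """Radius chosen by the class the value falls in; upper bounds are inclusive."""
    index = min(bisect_left(upper_bounds, value), len(radii) - 1)
    return radii[index]


# Proportional: every distinct value gets its own size.
print(proportional_radius(25, 100))

# Graduated: three hypothetical classes (up to 25, up to 50, up to 100).
print(graduated_radius(30, [25, 50, 100], [4, 8, 12]))
```

With proportional symbols each value produces a distinct size, while graduated symbols give every value in a class the same size.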
Proportional and Graduated Symbol
Maps
Guidelines
Circles are the most common symbol used due to
the ease with which they are interpreted.
All symbols should generally be the same color.
The difference between the largest and smallest
symbols should be great enough to show
differences in data values.
Largest symbols should not overlap so much that
they obscure patterns on the map.
Data Classification Methods
What's the Difference?
[Four maps of the same data: Standard Deviation, Equal Interval, Natural Breaks, Quantiles]
These 4 maps are all quite different and
each conveys a different message.
Data Classification
Values are grouped into
classes in order to
simplify mapped
patterns for the reader.
No more than 6 and no
fewer than 4 classes are
recommended (3 < number of classes < 7).
Data Classification Methods
Equal Interval
Encloses equal amounts of the range of the
mapped data within each class interval.
For example, if you need four
classes for a dataset with values
that fall within a range of 0 to
100, the classes would be:
0 to 24
25 to 49
50 to 74
75 to 100
[Map: Equal Interval Example]
Data Classification Methods
Equal Interval Classes – How it Works
The GIS subtracts the lowest value in the data set from
the highest to get the range.
It divides the range by the number of
classes you specified to get the class interval.
It adds the interval to the lowest value to get
the maximum for the first class.
It then adds the interval to each class maximum in turn to set the
breaks for the rest of the classes.
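The steps above can be sketched as a short function. This is a minimal illustration, not the code any particular GIS uses; the function name and the sample values are hypothetical.

```python
def equal_interval_breaks(values, num_classes):
    """Return the upper bound of each class, following the steps above."""
    lowest, highest = min(values), max(values)
    # Range of the data divided by the requested number of classes.
    interval = (highest - lowest) / num_classes
    # The first class maximum is lowest + interval; each later break adds
    # the interval again.
    return [lowest + interval * i for i in range(1, num_classes + 1)]


print(equal_interval_breaks([0, 12, 37, 58, 100], 4))
```

For data ranging from 0 to 100 with four classes the breaks fall at 25, 50, 75, and 100. A legend that lists whole-number classes such as 0 to 24 and 25 to 49 is a display rounding of the same breaks.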
Data Classification Methods
Equal Interval Classes – What Are They Good For?
Presenting information to a non-technical
audience: equal intervals are easier to
interpret because the range of each class is
equal.
Mapping continuous data – such as
precipitation or temperature.
Data Classification Methods
Equal Interval Classes - Disadvantages
If data values are clustered rather than evenly
distributed, there may be many features in one
or two classes and some classes may have few
or no features.
Data Classification Methods
Standard Deviation
This classification method should only be
used when the data set approximates a
normal distribution.
Normal distribution has
values clustered around
the mean
Class intervals created with this method
should only be used when the reader
understands them
[Map: Standard Deviation Example]
Data Classification Methods
Standard Deviation: How It Works
1. The GIS finds the mean value by adding all data
values and dividing by the number of features.
2. It then calculates the standard deviation (s) by
subtracting the mean from each value and squaring
it (to make it positive), sums the numbers and
divides by the number of features.
3. It then takes the square root to find the final
standard deviation.
4. The GIS then creates classes above and below the mean
based on the number of standard deviations you
specify, such as ½ or 1 standard deviation.
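The four steps above can be sketched as follows. The function names and sample values are hypothetical, and placing a break exactly at the mean is an assumption; some software centers a class on the mean instead.

```python
import math


def mean_and_std(values):
    """Steps 1 to 3: mean, then the root of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    # Divide by the number of features, as described in step 2.
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance)


def std_dev_breaks(values, step=1.0, max_steps=3):
    """Step 4: breaks at mean + k * step * S for k from -max_steps to max_steps."""
    mean, s = mean_and_std(values)
    offsets = [k * step for k in range(-max_steps, max_steps + 1)]
    return [mean + offset * s for offset in offsets]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical attribute values
print(mean_and_std(data))
print(std_dev_breaks(data, step=1.0, max_steps=2))
```

Passing `step=0.5` would give breaks every half standard deviation, matching the ½ option mentioned in step 4.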
In symbols, with $\bar{x}$ the mean and $n$ the number of features:
$$ S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} $$
The data classes are
centered on the mean.
Classes: -3S  -2S  -S  0  S  2S  3S
Data Classification Methods
Standard Deviation: What is it good for?
Seeing what features are above or below an
average value.
Displaying data that has many values around
the mean and few further from the mean
(this is statistically called a bell curve or
normal distribution).
Data Classification Methods
Standard Deviation: Disadvantages
Map does not show actual values of features,
only how far they are from the mean.
Very high or low values (called outliers) can
skew the mean so that most features fall in
the same class.
Data Classification Methods
Quantiles
Quantiles produce irregular intervals.
Developing class boundaries with quantiles
assures an equal number of values in each
class and minimizes the importance of the
class boundaries.
[Map: Quantiles Example]
Data Classification Methods
Quantiles: How It Works
1. The GIS orders the features, based on
attribute value, from top to bottom and
sums the number of features as it goes.
2. It divides the total number of features by
the number of classes you specify to get the
number of features in each class.
3. It assigns the first features in the order to
the lowest class until that class is filled and
then moves on to the next lowest class, and
so forth.
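The steps above can be sketched as a short function. This is a minimal illustration with a hypothetical function name and sample values; how leftover features are shared out when the count does not divide evenly is an assumption, and GIS packages may handle it differently.

```python
def quantile_classes(values, num_classes):
    """Assign sorted values to classes of (nearly) equal size, lowest class first."""
    ordered = sorted(values)
    n = len(ordered)
    classes = []
    for i in range(num_classes):
        # Integer arithmetic spreads any remainder across the classes.
        start = (i * n) // num_classes
        end = ((i + 1) * n) // num_classes
        classes.append(ordered[start:end])
    return classes


print(quantile_classes([7, 1, 3, 9, 5, 11, 2, 8], 4))
```

Each class holds two of the eight values, even though the class ranges differ in width.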
Data Classification Methods
Quantiles: Disadvantages
Features with close values may end up in
different classes, especially if values cluster.
This exaggerates the differences between features.
Conversely, a few widely ranging features may end up in the
same class, thus minimizing the differences
between features.
If areas vary greatly in size, a quantile
distribution may skew the patterns on the
map.
Data Classification Methods
Natural Breaks
Natural breaks are determined with a
frequency histogram. Class boundaries are
identified as troughs in the data.
Selection of class boundaries tends to place
large numbers of similar values in the same
class.
Many data sets will not have obvious natural
breaks which means that a map created with
this method would tend to show breaks
where none really exist.
[Map: Natural Breaks Example]
Data Classification Methods
Natural Breaks: How it Works
1. The GIS finds groupings and patterns that
are inherent in your data.
2. Data values that cluster are put in a single
class.
3. Class breaks are defined where there is a
gap between clusters of values.
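The idea of breaking at gaps between clusters can be sketched with a deliberately simplified stand-in. GIS software commonly uses a more elaborate optimization for natural breaks; the function below, with its hypothetical name and sample values, only splits the sorted values at the largest gaps and should not be read as that method.

```python
def gap_breaks(values, num_classes):
    """Split sorted values at the (num_classes - 1) largest gaps between neighbours."""
    ordered = sorted(values)
    # Record each gap together with the index of the value just before it.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    largest = sorted(gaps, reverse=True)[: num_classes - 1]
    cut_after = sorted(i for _, i in largest)

    classes, start = [], 0
    for i in cut_after:
        classes.append(ordered[start : i + 1])
        start = i + 1
    classes.append(ordered[start:])
    return classes


print(gap_breaks([1, 2, 3, 10, 11, 12, 30, 31], 3))
```

Values that cluster together stay in one class, and the breaks land in the two widest gaps of the sample data.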
Data Classification Methods
Natural Breaks: Disadvantages
Because the class ranges are specific to the
individual data set, it is difficult to compare the
map to other maps.
Choosing the number of classes can be
difficult, especially if the data are evenly
distributed.
What to Do About Outliers
Put them in their own class
Group them into a single class
Group them with the next closest class
Draw them using a special symbol
Population Density
Indianapolis, Indiana, 1990
Which one communicates population density more effectively?
How could each be improved?
[Map 1: dot map of population density, 1 dot = 1000]
[Map 2: classed map of population density (per square mile); classes: 0 - 1,500; 1,500 - 2,999; 3,000 - 4,999; 5,000 - 7,999; Over 8,000]
Scale bar: 1 0 1 2 Miles
Indiana East Stateplane Coordinate System (NAD 83)
Source: Social Assets and Vulnerability Indicators Project (SAVI), 1990.
Map created by Kevin Mickey, The Polis Center
Housing Unit Density
Indianapolis, Indiana, 1990
Which map communicates housing density more effectively? Why?
[Map 1: total units; classes: 0 - 433; 433 - 834; 834 - 1085; 1085 - 1328; 1328 - 1542; 1542 - 1735; 1735 - 1924; 1924 - 2174; 2174 - 2588; 2588 - 3445]
[Map 2: housing units (per square mile); classes: 0 - 499; 500 - 999; 1000 - 1499; 1500 - 2499; Over 2500]
Scale bar: 0.7 0 0.7 1.4 Miles
Indiana East Stateplane Coordinate System (NAD 83)
Source: Social Assets and Vulnerability Indicators Project (SAVI), 1990.
Map created by Kevin Mickey, The Polis Center
Conclusions and Questions