Image Classification: Supervised Classification

Image Classification: Redux
Lecture 7
Prepared by R. Lathrop 11/99
Updated 11/02
Readings:
ERDAS Field Guide 5th Ed. Ch 6:234-260
Supervised vs. Unsupervised
Approaches
• Supervised - image analyst "supervises" the
selection of spectral classes that represent
patterns or land cover features that the analyst
can recognize
Prior Decision
• Unsupervised - statistical "clustering"
algorithms used to select spectral classes
inherent to the data, more computer-automated
Posterior Decision
Supervised:
Select training fields ---> Edit/evaluate signatures --->
Classify image ---> Evaluate classification

vs.

Unsupervised:
Run clustering algorithm ---> Identify classes --->
Edit/evaluate signatures ---> Evaluate classification
Supervised vs. Unsupervised
Supervised Prior Decision: from Information classes in the
Image to Spectral Classes in Feature Space
[Figure: image pixels plotted in NIR vs. Red reflectance feature space]
Unsupervised Posterior Decision: from Spectral Classes in
Feature Space to Information Classes in the Image
Training
• Training: the process of defining criteria by
which spectral patterns are recognized
• Spectral signature: result of training that defines
a training sample or cluster
parametric - based on statistical
parameters that assume a normal
distribution (e.g., mean, covariance matrix)
nonparametric - not based on statistics but
on discrete objects (polygons) in feature
space
Supervised Training Set Selection
• Objective - selecting a homogeneous (unimodal)
area for each apparent spectral class
• Digitize polygons - high degree of user control;
often results in overestimate of spectral class
variability
• Seed pixel - region growing technique to reduce
within-class variability; works by the analyst setting
a threshold of acceptable variance, the total # of
pixels, and adjacency criteria (horiz/vert, diagonal);
see the sketch below
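To make the seed-pixel idea concrete, here is a minimal Python/NumPy sketch that grows a training region from an analyst-chosen seed under a spectral-distance threshold, a cap on the total # of pixels, and a 4- or 8-neighbor adjacency rule. This is not the ERDAS implementation; the function name, the use of Euclidean distance to the running region mean, and the parameter names are illustrative assumptions.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, spectral_threshold, max_pixels=500, diagonal=False):
    """Grow a training region from a seed pixel (illustrative sketch).

    image: array of shape (bands, rows, cols) of DN values
    seed: (row, col) location chosen by the analyst
    spectral_threshold: maximum Euclidean distance (in DN) from the running
        region mean for a neighboring pixel to be accepted
    diagonal: if True use 8-neighbor adjacency, otherwise 4-neighbor
    Returns a boolean (rows, cols) mask of the grown training region.
    """
    _, rows, cols = image.shape
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    in_region = np.zeros((rows, cols), dtype=bool)
    in_region[seed] = True
    region_sum = image[:, seed[0], seed[1]].astype(float)
    count = 1
    frontier = deque([seed])

    while frontier and count < max_pixels:   # max_pixels acts as an approximate cap
        r, c = frontier.popleft()
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not in_region[nr, nc]:
                candidate = image[:, nr, nc].astype(float)
                if np.linalg.norm(candidate - region_sum / count) <= spectral_threshold:
                    in_region[nr, nc] = True
                    region_sum += candidate
                    count += 1
                    frontier.append((nr, nc))
    return in_region
```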
Supervised Training Set Selection
Whether using the digitized polygon or seed pixel technique, the analyst
should select multiple training sites to identify the many possible
spectral classes in each information class of interest.
Training Stage
• Training set ---> training vector
• Training vector for each spectral class represents a sample
in n-dimensional measurement space, where n = # of bands
For a given spectral class j:
    Xj = [ X1 ]     X1 = mean DN band 1
         [ X2 ]     X2 = mean DN band 2
         [ ...]     (one mean per band)
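A minimal sketch of turning a training set into a signature, assuming the training pixels have been extracted into a NumPy array with one row per pixel and one column per band; the function name and array layout are illustrative.

```python
import numpy as np

def signature(training_pixels):
    """training_pixels: array of shape (n_pixels, n_bands) of DN values
    for one spectral class.

    Returns the training (mean) vector Xj -- one mean DN per band -- and
    the band-to-band covariance matrix used by the statistical classifiers
    discussed later.
    """
    training_pixels = np.asarray(training_pixels, dtype=float)
    mean_vector = training_pixels.mean(axis=0)          # Xj
    covariance = np.cov(training_pixels, rowvar=False)  # n_bands x n_bands
    return mean_vector, covariance
```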
Classification Training Aids
• Goal: evaluate spectral class separability
• 1) Graphical plots of training data
- histograms
- coincident spectral plots
- scatter plots
• 2) Statistical measures of separability
- divergence
- Mahalanobis distance
• 3) Training Area Classification
• 4) Quick Alarm Classification
- parallelepiped
Training Aids
• Graphical portrayals of training data
– histogram (check for normality)
– ranges (coincident spectral plots)
– scatter plots (2D or 3D)
• Statistical measures of separability: expressions
of statistical distance that are sensitive to
both mean and variance
- divergence
- Mahalanobis distance
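Of the two separability measures, Mahalanobis distance is the simpler to write down; a minimal sketch is below (divergence is likewise computed from the class means and covariance matrices). The function name and arguments are illustrative.

```python
import numpy as np

def mahalanobis_distance(x, mean_vector, covariance):
    """Statistical distance from a pixel (or another class mean) to a
    class signature.

    Unlike plain Euclidean distance, the inverse covariance matrix weights
    each band by the class variance and accounts for band-to-band
    covariance, so the measure is sensitive to both mean and variance.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mean_vector, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(covariance) @ diff))
```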
Training Aids
• Scatter plots: each training set sample
constitutes an ellipse in feature space
• Provides 3 pieces of information
- location of ellipse: mean vector
- shape of ellipse: covariance
- orientation of ellipse: slope & sign of
covariance
• Need training vector and covariance matrix
[Figure: Spectral Feature Space - signature ellipses for grass, mix of
grass/trees, broadleaf trees, conifer, impervious surface & bare soil, and
water, plotted as NIR reflectance vs. red reflectance]
Examine the ellipses for gaps and overlaps. Overlapping ellipses are OK
within information classes; we want to limit overlap between information
classes.
Training Aids
• Training/Test Area classification: look for
misclassification between information
classes; training areas can be biased,
better to use independent test areas
• Quick alarm classification: on-screen
evaluation of all pixels that fall within
the training decision region (e.g., a
parallelepiped)
Classification Decision Process
• Decision Rule: mathematical algorithm that,
using data contained in the signature,
performs the actual sorting of pixels into
discrete classes
• Parametric vs. nonparametric rules
Parallelepiped or box classifier
• Decision region is the rectangular area bounded
by the highest and lowest DN's in each band;
specified by range (min/max) or # of standard deviations
• Pro: Takes variance into account but lacks
sensitivity to covariance (Con)
• Pro: Computationally efficient, useful as first
pass
• Pro: Nonparametric
• Con: Decision regions may overlap; some pixels
may remain unclassified
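A minimal NumPy sketch of the parallelepiped rule, assuming the per-class box limits have already been set from each signature (min/max range or mean +/- k standard deviations). The -1 code for unclassified pixels and the first-match tie-break for overlapping boxes are illustrative choices, not part of any particular package.

```python
import numpy as np

def parallelepiped_classify(image, lows, highs):
    """image: (bands, rows, cols); lows, highs: (n_classes, bands) box limits.

    Returns a (rows, cols) array of class indices; -1 marks pixels that fall
    in no box (unclassified).  Where boxes overlap, the first class whose
    box contains the pixel wins -- one simple way to resolve the overlap.
    """
    lows, highs = np.asarray(lows), np.asarray(highs)
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T               # (n_pixels, bands)
    labels = np.full(pixels.shape[0], -1, dtype=int)
    for k in range(lows.shape[0]):                    # first match wins
        inside = np.all((pixels >= lows[k]) & (pixels <= highs[k]), axis=1)
        labels[(labels == -1) & inside] = k
    return labels.reshape(rows, cols)
```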
[Figure: Spectral Feature Space - Parallelepiped or Box Classifier; class
boxes plotted as NIR reflectance vs. red reflectance]
The upper and lower limit of each box is set by either the range (min/max)
or a # of standard deviations. Note the overlap in the Red band but not
in the NIR band.
Parallelepipeds have “corners”
[Figure: a candidate pixel falls inside the rectangular parallelepiped
boundary but outside the signature ellipse, in NIR vs. red reflectance
feature space. Adapted from ERDAS Field Guide]
Parallelepiped or Box Classifier:
problems
[Figure: parallelepiped boxes for Veg 1-3, Soil 1-3, and Water 1-2 in NIR
vs. red reflectance feature space, showing unclassified pixels, a
misclassified pixel, and an overlap region. Adapted from Lillesand &
Kiefer, 1994]
Minimum distance to means
• Compute mean of each desired class and
then classify unknown pixels into class with
closest mean using simple Euclidean
distance
• Con: insensitive to variance & covariance
• Pro: computationally efficient
• Pro: all pixels classified, can use
thresholding to eliminate pixels far from
means
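A minimal sketch of the minimum-distance-to-means rule, assuming an image of shape (bands, rows, cols) and one mean vector per class; the optional distance threshold shows how pixels far from every mean can be left unclassified (-1). Names and layout are illustrative.

```python
import numpy as np

def min_distance_classify(image, class_means, threshold=None):
    """image: (bands, rows, cols); class_means: (n_classes, bands).

    Assigns each pixel to the class with the nearest mean (simple Euclidean
    distance in DN space); an optional threshold leaves distant pixels
    unclassified.
    """
    class_means = np.asarray(class_means, dtype=float)
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T.astype(float)          # (n_pixels, bands)
    # Euclidean distance from every pixel to every class mean
    dists = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    if threshold is not None:
        labels[dists.min(axis=1) > threshold] = -1
    return labels.reshape(rows, cols)
```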
Minimum Distance to Means Classifier
[Figure: class means for Veg 1-3, Soil 1-3, and Water 1-2 plotted in NIR
vs. red reflectance feature space. Adapted from Lillesand & Kiefer, 1994]
Minimum Distance to Means Classifier:
Euclidean Spectral Distance
[Figure: worked example with two pixels in a two-band (X, Y) feature space]
Pixel A = (92, 153), Pixel B = (180, 85)
Xd = 180 - 92 = 88
Yd = 85 - 153 = -68
Distance = sqrt(Xd^2 + Yd^2) = sqrt(88^2 + (-68)^2) = 111.2
Statistically-based classifiers
• Defines a probability density (statistical) surface
• Each pixel is evaluated for its statistical
probability of belonging in each category,
assigned to class with maximum probability
• The probability density function for each
spectral class can be completely described by
the mean vector and covariance matrix
Parametric Assumption: each spectral
class exhibits a unimodal normal
distribution
[Figure: bimodal histogram (# of pixels vs. digital number, 0-255) showing
a mix of Class 1 & Class 2]
Spectral classes as probability surfaces
[Figure: Spectral Feature Space - equiprobability contours plotted as NIR
reflectance vs. red reflectance]
Ellipses are defined by the class mean and covariance; this creates
likelihood contours around each spectral class.
Sensitive to large covariance values
[Figure: Spectral Feature Space - likelihood contours plotted as NIR
reflectance vs. red reflectance]
Some classes may have large variance and greatly overlap other spectral
classes.
Maximum likelihood classifier
• Pro: potentially the most accurate classifier
as it incorporates the most information
(mean vector and COV matrix)
• Con: Parametric procedure that assumes the
spectral classes are normally distributed
• Con: sensitive to large values in the
covariance matrix
• Con: computationally intensive
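A minimal sketch of the maximum likelihood rule under the parametric assumption (each spectral class multivariate normal, equal prior probabilities); the mean vectors and covariance matrices come from the training signatures. Array layout and function name are illustrative.

```python
import numpy as np

def maximum_likelihood_classify(image, means, covariances):
    """image: (bands, rows, cols); means, covariances: one mean vector and
    one covariance matrix per spectral class (from the signatures).

    Assigns each pixel to the class with the highest Gaussian log-likelihood.
    """
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T.astype(float)        # (n_pixels, bands)
    scores = []
    for mean, cov in zip(means, covariances):
        diff = pixels - np.asarray(mean, dtype=float)
        inv = np.linalg.inv(cov)
        # squared Mahalanobis distance of every pixel to this class
        mahal = np.einsum('ij,jk,ik->i', diff, inv, diff)
        # log of the normal density, dropping terms common to all classes
        scores.append(-0.5 * (np.log(np.linalg.det(cov)) + mahal))
    return np.argmax(np.stack(scores, axis=1), axis=1).reshape(rows, cols)
```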
Hybrid classification
• Can easily mix various classification algorithms
in a multi-step process
• First pass: some non-parametric rule (feature
space or parallelepiped) to handle the most
obvious cases; those pixels remaining
unclassified or in overlap regions fall to the
second pass
• Second pass: some parametric rule to handle the
difficult cases; the training data can be derived
from unsupervised or supervised techniques
GIS Rule-based approaches
• Unsupervised or supervised techniques to define
spectral classes
• Use of additional geo-spatial data sets to
pre-stratify the image data set, to include as
additional band data in the classification
algorithm, or for post-processing
• Develop set of boolean rules or conditional
statements
example: if spectral class = conifer and
soil = sand, then pitch pine
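A minimal sketch of the conifer/sand rule above, assuming the spectral classification and the soils layer are co-registered arrays of integer class codes; the function and argument names are illustrative, and a real rule set would chain many such conditionals.

```python
import numpy as np

def apply_pitch_pine_rule(spectral_class, soil_class, conifer, sand, pitch_pine):
    """Boolean GIS rule: where the spectral classification says 'conifer'
    and the soils layer says 'sand', relabel the pixel as 'pitch pine'.

    spectral_class, soil_class: co-registered (rows, cols) arrays of codes;
    conifer, sand, pitch_pine: whatever integer codes the legend uses.
    """
    refined = spectral_class.copy()
    refined[(spectral_class == conifer) & (soil_class == sand)] = pitch_pine
    return refined
```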
Region-based classification approaches
• As an alternative to “per-pixel” classification
approaches, region-based approaches attempt to
include the local spatial context
• Textural channels approach: inclusion of texture
(local variance) as an additional channel in
classification; con: tends to blur edges
• Region growing: Image segmented into spectrally
homogenous, spatially contiguous regions first, then
these regions are classified using a spectral
classification approach; conceptually very promising
but robust operational algorithms scarce
Object-oriented classification:
eCognition example
From the eCognition website: “Image analysis with eCognition is
based upon contiguous, homogeneous image regions which are
generated by an initial image segmentation. Connecting all the
regions, the image content is represented as a network of image
objects. These image objects act as the building blocks for the
subsequent image analysis. In comparison to pixels, image objects
carry much more useful information. Thus, they can be
characterised by far more properties than pure spectral or spectral-derivative information, such as their form, texture, neighbourhood or
context.”
To download free trial version, go to:
http://www.definiens-imaging.com/down/index.htm
Post-classification “smoothing”
• Most classifications have a problem with “salt and
pepper”, i.e., single or small groups of mis-classified
pixels, as they are “point” operations that operate on
each pixel independent of its neighbors
• Majority filtering: replaces central pixel with the
majority class in a specified neighborhood (3 x 3
window); con: alters edges
• Eliminate: clumps “like” pixels and replaces clumps
under size threshold with majority class in local
neighborhood; pro: doesn’t alter edges
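A minimal sketch of majority filtering over a 3 x 3 window; the edge padding and function name are illustrative assumptions (the clump-based Eliminate operation is not shown).

```python
import numpy as np

def majority_filter(class_map, size=3):
    """Replace each pixel with the majority class in a size x size window,
    a simple way to suppress salt-and-pepper noise (note that, unlike a
    clump-and-eliminate approach, this can alter class edges).
    """
    rows, cols = class_map.shape
    pad = size // 2
    padded = np.pad(class_map, pad, mode='edge')
    out = np.empty_like(class_map)
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + size, c:c + size].ravel()
            values, counts = np.unique(window, return_counts=True)
            out[r, c] = values[counts.argmax()]   # most frequent class in window
    return out
```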
Accuracy Assessment
• Various techniques to assess the "accuracy" of
the classified output by comparing the “true”
identity of land cover derived from reference
data (observed) vs. the classified (predicted) for
a random sample of pixels
• Contingency table: m x m matrix where m = #
of land cover classes
– Columns: usually represent the reference data
– Rows: usually represent the remote sensed
classification results
Accuracy Assessment Contingency Matrix
                              Reference Data
Class.
Data        1.10   1.20   1.40   1.60   2.00   2.10   2.40   2.50   Row Total
1.10         109     11      4      0      0      0      1      1       126
1.20           2     82      2      3      0      0      0      0        89
1.40           3      4    123      0      0      0      0      0       130
1.60           2      1      0     22      0      0      1      1        27
2.00           0      0      0      0      0      0      0      0         0
2.10           0      0      0      0      0      9      0      0         9
2.40           0      2      1      0      0      0     74      0        77
2.50           0      1      0      0      0      0      0     41        42
Col Total    116    101    130     25      0      9     76     43       500
Accuracy Assessment
• Sampling Approaches: to reduce analyst bias
– simple random sampling: every pixel has equal
chance
– stratified random sampling: # of points will be
stratified to the distribution of thematic layer classes
(larger classes more points)
– equalized random sampling: each class will have
equal number of random points
• Sample size: at least 30 samples per land cover class
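A minimal sketch of the three sampling schemes, assuming the classified map is a 2-D array of class codes; the function name, the 500-point default, and the rounding used for proportional allocation are illustrative choices.

```python
import numpy as np

def sample_points(class_map, n_total=500, scheme='stratified', seed=None):
    """Draw accuracy-assessment sample locations from a classified map.

    scheme='simple'     : every pixel has an equal chance
    scheme='stratified' : points allocated in proportion to class area
    scheme='equalized'  : the same number of points in every class
    Returns a list of (row, col) locations.
    """
    rng = np.random.default_rng(seed)
    rows, cols = class_map.shape
    if scheme == 'simple':
        idx = rng.choice(rows * cols, size=n_total, replace=False)
        return [(i // cols, i % cols) for i in idx]
    samples = []
    classes = np.unique(class_map)
    for k in classes:
        locs = np.argwhere(class_map == k)
        if scheme == 'stratified':
            n_k = max(1, round(n_total * len(locs) / class_map.size))
        else:                                    # equalized
            n_k = n_total // len(classes)
        picks = rng.choice(len(locs), size=min(n_k, len(locs)), replace=False)
        samples.extend(map(tuple, locs[picks]))
    return samples
```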
Accuracy Assessment Issues
• What constitutes reference data?
- higher spatial resolution imagery (with
visual interpretation)
- “ground truth”: GPSed field plots
- existing GIS maps
• Problem with "mixed" pixels: one option is to
sample only homogeneous regions (e.g., a 3x3
window), but this introduces a subtle bias
Errors of Omission vs.
Commission
• Error of Omission: pixels in class 1
erroneously assigned to class 2; from the
class 1 perspective, these pixels should have
been classified as class 1 but were omitted
• Error of Commission: pixels in class 2
erroneously assigned to class 1; from the
class 1 perspective, these pixels should not
have been classified as class 1 but were
included
Errors of Omission vs. Commission:
from a Class 2 perspective
[Figure: overlapping Class 1 and Class 2 histograms (# of pixels vs.
digital number, 0-255). Omission error: pixels in Class 2 erroneously
assigned to Class 1. Commission error: pixels in Class 1 erroneously
assigned to Class 2.]
Accuracy Assessment Measures
• Overall accuracy: divide total correct (sum of the
major diagonal) by the total number of sampled
pixels; can be misleading, should judge individual
categories also
• Producer’s accuracy: measure of omission error; total
number of correct in a category divided by the total # in that
category as derived from the reference data
• User’s accuracy: measure of commission error;
total number of correct in a category divided by the total #
that were classified in that category
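A minimal sketch of computing these measures from the contingency matrix (rows = classified data, columns = reference data); classes with no samples, such as class 2.00 in the matrix above, come out as undefined (NaN).

```python
import numpy as np

def accuracy_measures(matrix):
    """matrix: m x m contingency table, rows = classified, cols = reference.

    Returns overall accuracy plus per-class producer's accuracy
    (1 - omission error) and user's accuracy (1 - commission error).
    """
    matrix = np.asarray(matrix, dtype=float)
    correct = np.diag(matrix)
    overall = correct.sum() / matrix.sum()
    with np.errstate(invalid='ignore', divide='ignore'):
        producers = correct / matrix.sum(axis=0)   # correct / reference (column) total
        users = correct / matrix.sum(axis=1)       # correct / classified (row) total
    return overall, producers, users
```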
Accuracy Assessment Measures
• Kappa coefficient: provides a difference
measurement between the observed agreement of
two maps and agreement that is contributed by
chance alone
• A Kappa coefficient of 90% may be interpreted as
90% better classification than would be expected by
random assignment of classes
• Allows for statistical comparisons between matrices
(Z statistic); useful in comparing different
classification approaches to objectively decide
which gives best results
Kappa coefficient
Khat = [ (n * SUM Xii) - SUM (Xi+ * X+i) ] / [ n^2 - SUM (Xi+ * X+i) ]
where SUM = sum across all rows in the matrix
Xii = # correct for class i (the diagonal elements)
Xi+ = marginal row total (row i)
X+i = marginal column total (column i)
n = # of observations
Takes into account the off-diagonal elements of the
contingency matrix (errors of omission and commission)
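A minimal sketch of the Khat formula above; applied to the contingency matrix shown earlier it reproduces the overall Kappa of 0.9005 reported in the summary table below. The function name is illustrative.

```python
import numpy as np

def kappa_coefficient(matrix):
    """Khat from an m x m contingency matrix (rows = classified,
    columns = reference)."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.sum()                                            # # of observations
    observed = np.trace(matrix)                                 # SUM Xii
    chance = (matrix.sum(axis=1) * matrix.sum(axis=0)).sum()    # SUM (Xi+ * X+i)
    return (n * observed - chance) / (n ** 2 - chance)
```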
Accuracy Assessment Measures
Code  Land Cover Description                   Number    Producer's  User's     Kappa
                                               Correct   Accuracy    Accuracy
1.10  Developed                                   109       94.0       86.5     0.8243
1.20  Cultivated/Grassland                         82       81.2       92.1     0.9014
1.40  Forest/Scrub/Shrub                          123       94.6       94.6     0.9272
1.60  Barren                                       22       88.0       81.5     0.8051
2.00  Unconsolidated Shore                         ---        ---        ---       ---
2.10  Estuarine Emergent Wetland                    9      100.0      100.0        **
2.40  Palustrine Wetland: Emergent/Forested        74       97.4       96.1     0.9541
2.50  Water                                        41       95.4       97.6     0.9740
      Totals                                      460

** Sample Size for this Land Cover Class Too Small (< 25) for valid Kappa measure
Overall Classification Accuracy = 92.0%    Overall Kappa = 0.9005