Chapter 7
Maximum likelihood classification of
remotely sensed imagery
ZHANG Jingxiong (张景雄)
School of Remote Sensing Information Engineering
Wuhan University
Introduction
Thematic classification: an important technique in remote sensing for providing
spatial information.
For example, land cover information is a key input to biophysical
parameterization, environmental modeling, resource management, and other
applications.
Historically, visual interpretation was applied to identify homogeneous
areal classes on aerial photographs.
With digital images, it is possible to substantially automate this process
using two kinds of methods:
Unsupervised
Supervised
The maximum likelihood classification (MLC)
The most common method for supervised classification.
Classes: {k, k = 1, …, c}
Measurement vector (data): z(x)
Posterior probability (the "likelihood" of class k given the data): p(k|z(x))
Classification rule:
x → class i, if p(i|z(x)) > p(j|z(x)) for all j ≠ i
Bayes’ theorem
For disjoint events B1, B2, …, Bn, i.e., Bi ∩ Bj = ∅ for i ≠ j, with
P(Bi) > 0 for all i, and whose union is the whole sample space,
then, for any event A,
P(Bj | A) = P(A ∩ Bj) / P(A)
          = P(A | Bj) P(Bj) / Σi=1..n P(A | Bi) P(Bi)
(numerator by the multiplication law;
denominator by the law of total probability)
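As a quick numeric check of the formula, here is a small Python sketch (the probability values are made up for illustration, not from the lecture):

    # B1, B2 partition the sample space; A is the observed event.
    p_B = [0.6, 0.4]             # P(B1), P(B2): made-up priors
    p_A_given_B = [0.2, 0.5]     # P(A | B1), P(A | B2): made-up likelihoods

    # Law of total probability gives the denominator P(A).
    p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))   # 0.32
    # Bayes' theorem gives the posterior P(B1 | A).
    p_B1_given_A = p_A_given_B[0] * p_B[0] / p_A             # 0.375
    print(p_A, p_B1_given_A)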
Bayes’ theorem at work …
Candidate cover types {k, k = 1, …, c}, c being the total number of classes
considered, such as {urban, forest, agriculture}.
Bayes’ theorem then allows evaluation of the posterior probability p(k|z(x)),
i.e., the probability of class k conditional on the data z(x),
Based on:
1) prior probability – the expected occurrence of candidate class k,
p(k) or pk (before measurement)
2) class-conditional probability density – the occurrence of measurement
vector z(x) conditional on class k,
p(z(x) | k) or pk(z(x))
(derived from training data)
p(k|z(x)) = p(k, z(x)) / p(z(x)) = pk pk(z(x)) / p(z(x)),
where p(z(x)) = Σk p(k, z(x)) = Σk pk pk(z(x))
To assign observation z(x) to the class k that has the highest posterior
probability p(k|z(x)), or equivalently the class that returns the highest
product pk pk(z(x)), since p(z(x)) does not change the relative magnitude
of p(k|z(x)) across k.
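The rule is straightforward in code. A minimal NumPy sketch for one observation (the prior and density values below are made-up illustration numbers):

    import numpy as np

    priors = np.array([0.3, 0.5, 0.2])           # p(k) for, e.g., urban/forest/agriculture
    densities = np.array([0.011, 0.042, 0.025])  # pk(z(x)) evaluated at one z(x)

    joint = priors * densities                   # numerators pk * pk(z(x))
    posteriors = joint / joint.sum()             # normalize by p(z(x))
    winner = posteriors.argmax()                 # class with the highest posterior
    print(posteriors, winner)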
Computing p(z(x) | k) with normally distributed data
The occurrence of measurement vector z(x) conditional on class k, assuming a
multivariate normal distribution:
pk(z(x)) = (2π)^(-b/2) det(covZ|k)^(-1/2) exp(-dis²/2)
where dis² is the squared Mahalanobis distance between an observation z(x)
and the mean of the class-specific Z variable, mZ|k:
dis² = (z(x) - mZ|k)ᵀ covZ|k⁻¹ (z(x) - mZ|k)
covZ|k: the variance–covariance matrix of variable Z conditional on class k
mZ|k: the mean vector of the class-specific Z variable
b: the number of features (e.g., spectral bands)
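These two formulas translate directly into NumPy. A sketch (the function name is mine; for whole images one would precompute the inverse and determinant per class):

    import numpy as np

    def class_conditional_density(z, mean, cov):
        # Multivariate normal density pk(z) for one class with b features.
        b = len(mean)
        diff = z - mean
        dis2 = diff @ np.linalg.inv(cov) @ diff        # squared Mahalanobis distance
        norm = (2 * np.pi) ** (-b / 2) * np.linalg.det(cov) ** (-0.5)
        return norm * np.exp(-dis2 / 2)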
Mathematical detail
The objective/criterion: to minimize the conditional average loss
loss(i | z(x)) = Σj=1..c mis(i, j) p(j | z(x))
mis(i, j): the cost resulting from labeling a pixel actually belonging to
class j as class i.
With the “0-1 loss function”:
0 - no cost for correct classification,
1 - unit cost for misclassification,
the expected loss incurred if pixel x with observation z(x) is classified
as i becomes
loss(i | z(x)) = Σj≠i p(j | z(x)) = 1 - p(i | z(x))
Minimizing this loss means maximizing the posterior p(i | z(x)), which by
Bayes’ theorem is proportional to p(z(x) | i) p(i). A decision rule that
minimizes the loss is therefore:
decide z(x) → class i if and only if
p(z(x) | i) p(i) = maxj p(z(x) | j) p(j)
The Bayesian classification rule thus works, in effect, as a kind of maximum
likelihood classification.
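The loss-based view also accommodates unequal misclassification costs. A small sketch with the 0-1 cost matrix (posterior values taken from the worked example below):

    import numpy as np

    posteriors = np.array([0.53, 0.47])   # p(j | z(x)) for two classes
    mis = np.ones((2, 2)) - np.eye(2)     # 0 on the diagonal, 1 off it

    expected_loss = mis @ posteriors      # loss(i | z(x)) = sum_j mis(i, j) p(j | z(x))
    print(expected_loss)                  # [0.47, 0.53] = 1 - p(i | z(x))
    print(expected_loss.argmin())         # same class as posteriors.argmax()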
Examples
Two classes 1 and 2, with means and dispersion matrices:
m1 = [4 2], m2 = [4 4]
COV1 = | 3  4 |        COV2 = | 4  5 |
       | 4  6 |               | 5  7 |

COV1⁻¹ = |  3   -2  |   COV2⁻¹ = |  7/3  -5/3 |
         | -2   1.5 |            | -5/3   4/3 |
To decide to which class the measurement vector z = (4, 3) belongs, compute
the squared Mahalanobis distances between this measurement vector and the
two class means:
dis1² = 3/2, dis2² = 4/3
The class-conditional probability densities:
p1(z) = 1/(2π·√2) exp(-3/4) = 0.0532
p2(z) = 1/(2π·√3) exp(-2/3) = 0.0472
Assuming equal prior class probabilities, p1 = 1/2 and p2 = 1/2, the
posterior probabilities are:
p(1|z) = 0.0532 × (1/2) / (0.0266 + 0.0236) = 0.53
p(2|z) = 0.0472 × (1/2) / (0.0266 + 0.0236) = 0.47
so the measurement z is classified into class 1.
When p1 = 1/3 and p2 = 2/3, the posterior probabilities are:
p(1|z) = 0.0532 × (1/3) / (0.0177 + 0.0315) = 0.36
p(2|z) = 0.0472 × (2/3) / (0.0177 + 0.0315) = 0.64
Then class 2 is favored for z over class 1.
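The whole example can be checked in a few lines of NumPy (a verification sketch, reproducing the densities and posteriors above):

    import numpy as np

    means = [np.array([4, 2]), np.array([4, 4])]
    covs = [np.array([[3, 4], [4, 6]]), np.array([[4, 5], [5, 7]])]
    z = np.array([4, 3])

    def density(z, m, c):
        # Bivariate normal density via the squared Mahalanobis distance.
        dis2 = (z - m) @ np.linalg.inv(c) @ (z - m)
        return np.exp(-dis2 / 2) / (2 * np.pi * np.sqrt(np.linalg.det(c)))

    dens = np.array([density(z, m, c) for m, c in zip(means, covs)])
    print(dens.round(4))                       # [0.0532 0.0472]
    for priors in ([0.5, 0.5], [1/3, 2/3]):
        joint = np.array(priors) * dens
        print((joint / joint.sum()).round(2))  # [0.53 0.47], then [0.36 0.64]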
Looking back …
Pixel-based methods
Parcel-based methods
• adaptations to the MLC?
• object-oriented approaches?
[Figure: extraction of impervious surfaces using object-oriented image
segmentation, applied to a USGS NAPP 1 m × 1 m DOQQ of an area in
North Carolina.]
Methods for thematic mapping
Parametric
• MLC
Non-parametric
• Artificial neural networks
Non-metric
• Expert systems
• Decision-tree classifiers
• Machine learning
Questions
1. Discuss the importance of statistics in
thematic classification of remotely
sensed imagery.