
Chapter 7
Maximum likelihood classification of
remotely sensed imagery
遥感影像分类的最大似然法
ZHANG Jingxiong (张景雄)
School of Remote Sensing Information Engineering
Wuhan University
Introduction
Thematic classification is an important technique in remote sensing for the provision of spatial information.
For example, land cover information is a key input to biophysical parameterization, environmental modeling, resource management, and other applications.

Historically, visual interpretation (目视判读) was applied to identify homogeneous areal classes from aerial photographs.
With digital images, it is possible to substantially automate this process using two kinds of methods:

Unsupervised (非监督)
Supervised (监督)
The maximum likelihood classification
(极大似然分类) - MLC
The most common method for supervised
classification
Classes (类别): $\{k,\ k = 1, \ldots, c\}$
Measurement vector (观测向量/数据): $z(x)$
Likelihood (似然): $p(k \mid z(x))$
Classification rule (分类规则): $x \rightarrow i$ if $p(i \mid z(x)) > p(j \mid z(x))$ for all $j \neq i$

Bayes’ theorem (贝叶斯定理)
Let events $B_1, B_2, \ldots, B_n$ be disjoint, i.e., $B_i \cap B_j = \emptyset$ for $i \neq j$, with $P(B_i) > 0$ for all $i$, and let their union be the whole sample space. Then, for any event $A$,

$$P(B_j \mid A) = \frac{P(A \cap B_j)}{P(A)} = \frac{P(A \mid B_j)\, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)}$$

numerator by the multiplication law; denominator by the law of total probability.
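A small numeric check of the theorem, with made-up values for the $P(B_i)$ and $P(A \mid B_i)$:

```python
import numpy as np

# Made-up priors P(B_i) for three disjoint events whose union is the
# sample space, and made-up conditional probabilities P(A|B_i).
p_B = np.array([0.5, 0.3, 0.2])
p_A_given_B = np.array([0.9, 0.5, 0.1])

# Denominator: law of total probability, P(A) = sum_i P(A|B_i) P(B_i).
p_A = np.sum(p_A_given_B * p_B)

# Bayes' theorem: P(B_j|A) = P(A|B_j) P(B_j) / P(A).
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)        # posterior probabilities of the B_j given A
print(p_B_given_A.sum())  # -> 1.0
```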
Bayes’ theorem at work …
Candidate cover types $\{k,\ k = 1, \ldots, c\}$, $c$ being the total number of classes considered, such as {urban, forest, agriculture}.
Bayes' theorem then allows evaluation of the posterior probability (验后概率) $p(k \mid z(x))$, the probability of class $k$ conditional on the data $z(x)$.


Based on:
1) prior probability (先验概率) – the expected occurrence of candidate class $k$, $p(k)$ or $p_k$ (before measurement);
2) class-conditional probability density – the occurrence of measurement vector $z(x)$ conditional on class $k$, $p(z(x) \mid k)$ or $p_k(z(x))$ (derived from training data).
Bayes' theorem then gives the posterior probability as

$$p(k \mid z(x)) = \frac{p(k, z(x))}{p(z(x))} = \frac{p_k\, p_k(z(x))}{p(z(x))}$$

where $p(z(x)) = \sum_k p(k, z(x)) = \sum_k p_k\, p_k(z(x))$.

Assign observation $z(x)$ to the class $k$ that has the highest posterior probability $p(k \mid z(x))$, or, equivalently, to the class that returns the highest product $p_k\, p_k(z(x))$, since $p(z(x))$ does not change the relative magnitude of $p(k \mid z(x))$ across $k$.
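In code, the normalization by $p(z(x))$ is a single division that leaves the ranking across classes unchanged; the priors and densities below are hypothetical:

```python
import numpy as np

# Hypothetical priors p_k and class-conditional densities p_k(z(x))
# evaluated at one observation z(x), for c = 3 classes.
priors = np.array([0.3, 0.5, 0.2])
densities = np.array([0.010, 0.004, 0.020])

joint = priors * densities        # p_k * p_k(z(x))
posteriors = joint / joint.sum()  # divide by p(z(x)) = sum_k p_k p_k(z(x))

# p(z(x)) does not change the relative magnitudes: same winner either way.
assert np.argmax(posteriors) == np.argmax(joint)
print(posteriors)
```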
Computing p(z(x) | k) with normally
distributed data

The occurrence of measurement vector $z(x)$ conditional on class $k$, assuming a multivariate normal distribution:

$$p_k(z(x)) = (2\pi)^{-b/2}\, \det(\mathrm{cov}_{Z|k})^{-1/2}\, \exp(-\mathrm{dis}^2/2)$$

The squared Mahalanobis distance between an observation $z(x)$ and the mean of the class-specific $Z$ variable, $m_{Z|k}$:

$$\mathrm{dis}^2 = (z(x) - m_{Z|k})^T\, \mathrm{cov}_{Z|k}^{-1}\, (z(x) - m_{Z|k})$$

$\mathrm{cov}_{Z|k}$: the variance-covariance matrix of variable $Z$ conditional on class $k$
$m_{Z|k}$: the mean vector of the class-specific $Z$ variable
$b$: the number of features (e.g., spectral bands)
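A sketch of this density evaluation with NumPy, following the formula above; the test values reuse the class-1 statistics from the worked example later in this chapter:

```python
import numpy as np

def class_conditional_density(z, mean, cov):
    """Multivariate normal density p_k(z(x)) for one class k."""
    z = np.asarray(z, dtype=float)
    mean = np.asarray(mean, dtype=float)
    b = z.size                               # number of features (bands)
    diff = z - mean
    dis2 = diff @ np.linalg.inv(cov) @ diff  # squared Mahalanobis distance
    norm = (2 * np.pi) ** (-b / 2) * np.linalg.det(cov) ** (-0.5)
    return norm * np.exp(-dis2 / 2)

# Two-band check with the class-1 statistics used in the example below.
print(class_conditional_density([4, 3], [4, 2], np.array([[3.0, 4.0],
                                                          [4.0, 6.0]])))
# -> ~0.0532
```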

Mathematical detail

The objective/criterion (准则): to minimize the conditional average loss

$$\mathrm{loss}(i \mid z(x)) = \sum_{j=1}^{c} \mathrm{mis}(i, j)\, p(j \mid z(x))$$

where $\mathrm{mis}(i, j)$ is the cost of labeling a pixel that actually belongs to class $j$ as class $i$.
The “0-1 loss function”:
0 - no cost for correct classification
1 - unit cost for misclassification
The expected loss incurred if pixel $x$ with observation $z(x)$ is classified as $i$ is then

$$\mathrm{loss}(i \mid z(x)) = 1 - p(i \mid z(x))$$

Minimizing this loss is the same as maximizing the posterior $p(i \mid z(x)) \propto p(z(x) \mid i)\, p(i)$, so a decision rule that leads to minimized loss is: decide $z(x) \rightarrow i$ if and only if

$$p(z(x) \mid i)\, p(i) = \max_{j}\ p(z(x) \mid j)\, p(j)$$

The Bayesian classification rule thus works, in effect, as a kind of maximum likelihood classification.
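A quick check that, under the 0-1 loss, minimizing the conditional average loss selects the same class as maximizing the posterior; the posterior values are made up:

```python
import numpy as np

posteriors = np.array([0.36, 0.64])  # made-up p(j|z(x)) for two classes

# mis(i, j): 0 on the diagonal (correct label), 1 off the diagonal.
c = len(posteriors)
mis = np.ones((c, c)) - np.eye(c)

# loss(i|z(x)) = sum_j mis(i, j) p(j|z(x)), which equals 1 - p(i|z(x)).
loss = mis @ posteriors
assert np.allclose(loss, 1 - posteriors)

# Minimizing the loss and maximizing the posterior pick the same class.
assert np.argmin(loss) == np.argmax(posteriors)
print(loss)  # -> [0.64 0.36]
```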
Examples
Two classes, 1 and 2, with means and dispersion (variance-covariance) matrices:

$$m_1 = [4\ \ 2]^T, \qquad m_2 = [3\ \ 3]^T$$

$$\mathrm{COV}_1 = \begin{pmatrix} 3 & 4 \\ 4 & 6 \end{pmatrix}, \qquad \mathrm{COV}_1^{-1} = \begin{pmatrix} 3 & -2 \\ -2 & 1.5 \end{pmatrix}$$

$$\mathrm{COV}_2 = \begin{pmatrix} 4 & 5 \\ 5 & 7 \end{pmatrix}, \qquad \mathrm{COV}_2^{-1} = \begin{pmatrix} 7/3 & -5/3 \\ -5/3 & 4/3 \end{pmatrix}$$
To decide to which class the measurement vector $z = (4, 3)$ belongs:
The squared Mahalanobis distances between this measurement vector and the two class means:

$$\mathrm{dis}_1^2 = 3/2, \qquad \mathrm{dis}_2^2 = 7/3$$

The class-conditional probability densities:

$$p_1(z) = \frac{1}{2\pi \times 1.414} \exp(-3/4) = 0.0532, \qquad p_2(z) = \frac{1}{2\pi \times 1.732} \exp(-7/6) = 0.0286$$

Assume equal prior class probabilities, $p_1 = 1/2$ and $p_2 = 1/2$. The posterior probabilities are

$$p(1 \mid z) = \frac{0.0532 \times (1/2)}{0.0266 + 0.0143} = 0.65, \qquad p(2 \mid z) = \frac{0.0286 \times (1/2)}{0.0266 + 0.0143} = 0.35$$

so the measurement $z$ is classified into class 1.

When $p_1 = 1/3$ and $p_2 = 2/3$, the posterior probabilities are

$$p(1 \mid z) = \frac{0.0532 \times (1/3)}{0.0177 + 0.0191} = 0.48, \qquad p(2 \mid z) = \frac{0.0286 \times (2/3)}{0.0177 + 0.0191} = 0.52$$

Then class 2 is favored for $z$ over class 1.

Looking back …
Pixel-based methods
Parcel-based methods
Adaptations to the MLC? Object-oriented approaches?

Example: extraction of impervious surfaces using object-oriented image segmentation.
[Figure: USGS NAPP 1 m × 1 m DOQQ of an area in North Carolina, with extracted impervious surfaces]
Methods for thematic mapping
Parametric
• MLC
Non-parametric
• Artificial neural networks
Non-metric
• Expert systems
• Decision-tree classifiers
• Machine learning

Questions
1. Discuss the importance of statistics in
thematic classification of remotely
sensed imagery.