
Environmental Remote Sensing
Lecture 5:
Image Classification
"
Purpose:
–
categorising data
–
data abstraction / simplification
–
data interpretation
–
mapping
"
"
for land cover mapping
use land cover class as a surrogate for other information of interest
(ie assign relevant information/characteristics to a land cover class)
Basis for 'classifying'
"
"
method: pattern recognition
use any/all of the following properties in an image to
differentiate between land cover classes:–
spectral
–
spatial
–
temporal
–
directional
–
[time / distance-resolved (LIDAR)]
Spatial Pattern Recognition
– attempt to make use of spatial context to distinguish between different classes
– e.g. measures of image texture, shape descriptors derived from image data, spatial context of 'objects' derived from the data
Temporal pattern recognition
– the ability to distinguish based on spectral or spatial considerations may vary over the year
– use variations in image DN (or derived data) over time to distinguish between different cover types
– e.g. variations in VI over agricultural crops
Directional pattern recognition
– surfaces with different structures will tend to give different trends in reflectance as a function of view and illumination angles
– recent work indicates that we may be able to make use of such information to distinguish between surfaces with different structural attributes, even if they are not clearly separable on the basis of spectral properties alone
Spectral pattern recognition
– most typically used
– relies on being able to distinguish between different land cover classes from differences in the spectral reflectance (or, more typically, image DN) in a number of wavebands
Approaches to Classification
Whatever the particular feature(s) of a surface we are attempting to use to distinguish it from other surfaces, we need some form of automated classification algorithm to achieve this on a computer.
– Supervised Classification
– Unsupervised Classification
Supervised Classification
training stage
– preprocessing of the images, e.g. using Principal Components Analysis, may sometimes be advisable to determine the intrinsic dimensionality of the data
– identify areas of the cover types of interest (map, ground survey, spectral characteristics) in bands of an image
– typically, an FCC image is used, with three bands which will enhance discrimination between the cover types of interest
– the utility of an FCC is dependent on display capabilities; individual bands can also be used (or band transformations, such as NDVI, which will enhance vegetation, or various principal component bands)
– areas of interest are delineated by the user
– spectral information on the cover types is gathered for these areas, which can then be plotted in 'feature space'
– typically, only a two-dimensional scatterplot (two bands) can be displayed at one time
– the scatterplot(s) should be examined to determine the potential separability of the land cover classes
Supervised classification - classification stage
minimum distance to means
– for each land cover class, calculate the mean vector in feature space (i.e. the mean value in each waveband)
– segment the feature space into regions within which a pixel will lie closest to a particular class mean
– we can typically define some limit beyond which a pixel is defined as unclassified
– a simple and fast technique
– major limitation: insensitive to different degrees of variance in the data (so not typically used if classes are close together or have different degrees of variance)
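The two steps above (compute per-class mean vectors, then assign each pixel to the nearest mean, with an optional distance limit for the unclassified case) can be sketched as follows; the class names, DN values, and the `max_dist` limit are all invented for illustration:

```python
import numpy as np

# Hypothetical training data: rows are pixels, columns are wavebands (DN).
train = {
    "water":      np.array([[12.0, 8.0], [14.0, 9.0], [11.0, 7.0]]),
    "vegetation": np.array([[30.0, 80.0], [33.0, 85.0], [28.0, 78.0]]),
}

# Training stage: one mean vector per class (mean DN in each waveband).
means = {name: samples.mean(axis=0) for name, samples in train.items()}

def classify(pixel, means, max_dist=None):
    """Assign a pixel to the class with the nearest mean in feature space.
    If max_dist is given, a pixel further than that from every class
    mean is returned as 'unclassified'."""
    dists = {name: np.linalg.norm(pixel - m) for name, m in means.items()}
    best = min(dists, key=dists.get)
    if max_dist is not None and dists[best] > max_dist:
        return "unclassified"
    return best

print(classify(np.array([13.0, 8.0]), means))             # close to the water mean
print(classify(np.array([200.0, 200.0]), means, max_dist=50.0))
```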
"
"
"
parallelepiped ('box')
–
segment feature space by assigning boundaries around the spread of a class in feature space (taking
into account variations in the variance of classes)
–
N-dimensional equivalent of applying (upper and lower) thresholds to a single band of data
–
typically use minimum/maximum of DN in a particular class to define limits, giving a rectangular 'box'
for each class (in 2D)
–
pixels outside of these regions are unclassified (which is good or bad, depending on what you want!!)
problems if class regions overlap
problems if there is a high degree of covariance in the data between different
bands (a rectangular box shape is inappropriate)
–
can modify algorithm by using stepped boundaries with a series of rectangles to partially overcome
such problems
"
simple and fast technique
"
takes some account of variations in the variance of each class
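A minimal sketch of the box classifier, with per-class limits taken from the min/max of hypothetical training samples (all values invented). Note it makes no attempt to resolve overlapping boxes, the first match wins:

```python
import numpy as np

# Hypothetical training samples per class (rows: pixels, cols: wavebands).
train = {
    "water":      np.array([[10.0, 6.0], [14.0, 9.0], [12.0, 7.0]]),
    "vegetation": np.array([[28.0, 78.0], [33.0, 85.0], [30.0, 80.0]]),
}

# Training stage: per-class min/max DN in each band defines the 'box'.
boxes = {name: (s.min(axis=0), s.max(axis=0)) for name, s in train.items()}

def classify(pixel, boxes):
    """Return the first class whose box contains the pixel, else 'unclassified'."""
    for name, (lo, hi) in boxes.items():
        if np.all(pixel >= lo) and np.all(pixel <= hi):
            return name
    return "unclassified"

print(classify(np.array([11.0, 8.0]), boxes))
print(classify(np.array([50.0, 50.0]), boxes))
```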
"
"
"
(Gaussian) maximum likelihood
based on the assumption that data in a class is (unimodal)
Gaussian distributed (Normal)
–
distribution of pixels in a class can then be defined through a mean
vector and covariance matrix
–
calculate the probability of a pixel belonging to any class using probability
density functions defined from this information
–
we can represent this as ellipsoidal equiprobability contours
–
assign a pixel to the class for which it has the highest probability of
belonging to
–
can apply a threshold where, if the probability is below this, it is assigned
as unclassified
relatively computationally-intensive
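A sketch of this decision rule using log-likelihoods (numerically safer than raw densities); the training samples are invented, and a real implementation would also apply the unclassified threshold:

```python
import numpy as np

# Hypothetical training samples (rows: pixels, cols: wavebands).
train = {
    "water":      np.array([[10.0, 6.0], [14.0, 9.0], [12.0, 7.0], [11.0, 8.0]]),
    "vegetation": np.array([[28.0, 78.0], [33.0, 85.0], [30.0, 80.0], [31.0, 83.0]]),
}

# Training stage: mean vector and covariance matrix per class.
stats = {name: (s.mean(axis=0), np.cov(s, rowvar=False))
         for name, s in train.items()}

def log_likelihood(pixel, mean, cov):
    """Log of the multivariate Gaussian density at `pixel`."""
    d = pixel - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(pixel) * np.log(2 * np.pi) + logdet + d @ inv @ d)

def classify(pixel, stats):
    """Assign the pixel to the class with the highest (log-)likelihood."""
    scores = {name: log_likelihood(pixel, m, c) for name, (m, c) in stats.items()}
    return max(scores, key=scores.get)

print(classify(np.array([12.0, 7.5]), stats))
```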
Bayesian
– an extension of the maximum likelihood classifier
– assign two weighting factors in determining the probabilities:
– determine an a priori probability of the occurrence of a class in the scene
– assign a 'cost' of misclassification for each class
– combine these factors to determine the probability contours
– preferable if relevant information is available
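The effect of the a priori weighting can be shown with invented numbers: with equal priors the maximum likelihood rule would pick 'vegetation' here, but a strong prior for 'water' reverses the decision (a misclassification 'cost' term could be folded in the same way):

```python
import numpy as np

# Hypothetical class-conditional log-likelihoods at one pixel, standing in
# for Gaussian densities evaluated by a maximum likelihood classifier.
log_likelihoods = {"water": -4.1, "vegetation": -3.8}
priors = {"water": 0.8, "vegetation": 0.2}   # a priori occurrence (invented)

# Log posterior (up to a constant): log-likelihood weighted by the prior.
log_post = {c: log_likelihoods[c] + np.log(priors[c]) for c in priors}
best = max(log_post, key=log_post.get)
print(best)
```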
Decision tree
– apply the classification in a series of steps, where the classifier has only to be able to distinguish between two or more classes at each step
– can combine various types of classifiers as appropriate using such methods
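A minimal sketch of such a stepwise classifier; the band names, thresholds, and classes are all hypothetical:

```python
# Two-step decision tree: each step only has to separate two groups of classes.
def classify(nir, ndvi):
    """Hypothetical tree: split off water first, then split the 'land' pixels."""
    if nir < 20:             # step 1: water absorbs strongly in the NIR
        return "water"
    if ndvi > 0.4:           # step 2: vegetated vs non-vegetated land
        return "vegetation"  # (this step could equally use a different classifier)
    return "bare soil"

print(classify(nir=10, ndvi=0.0))
print(classify(nir=90, ndvi=0.7))
```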
application of classification stage
– the selected algorithm is run on a computer
accuracy assessment stage
– class separation (at the training stage)
– useful to have some quantitative estimate of class separability in all bands
– e.g. divergence:
– the distance between class means, weighted by the covariance of the classes
– the larger the divergence, the larger the 'statistical distance' between the classes, and the higher the likelihood of correct classification
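One common form of the divergence for a pair of classes modelled as Gaussians combines a covariance-difference term with a mean-difference term weighted by the inverse covariances; this is a sketch under that assumption, with invented class statistics:

```python
import numpy as np

def divergence(m1, c1, m2, c2):
    """Divergence between two Gaussian classes with means m1, m2 and
    covariance matrices c1, c2 (one common formulation)."""
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    dm = (m1 - m2).reshape(-1, 1)
    term1 = 0.5 * np.trace((c1 - c2) @ (i2 - i1))      # covariance difference
    term2 = 0.5 * np.trace((i1 + i2) @ dm @ dm.T)      # weighted mean distance
    return term1 + term2

# Invented statistics: classes 1 and 2 well separated, 1 and 3 very close.
m1, c1 = np.array([12.0, 8.0]),  np.array([[2.0, 0.5], [0.5, 1.0]])
m2, c2 = np.array([30.0, 80.0]), np.array([[4.0, 1.0], [1.0, 9.0]])
m3, c3 = np.array([13.0, 9.0]),  np.array([[2.0, 0.5], [0.5, 1.0]])

# Well-separated classes give a much larger 'statistical distance'.
print(divergence(m1, c1, m2, c2) > divergence(m1, c1, m3, c3))
```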
"
"
Contingency table (confusion matrix)
–
classification is applied to the training set pixels
–
since the cover type of these pixels is 'known', we can get an indication of
the degree of error (misclassification) in the algorithm used
–
it does not therefore provide an indication of the overall classification
accuracy, rather, it tells us how well the classifier can classify the training
areas
–
this information may be used to indicate that the user should go back and
reinvestigate, retrain, or reclassify.
ideally, one should not in fact use the same pixels in the
classification 'accuracy' check as in the training; an
independent set of samples should be used, which can be
used to give a better 'overall' accuracy estimate
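A contingency table can be built by tallying the 'known' (reference) class against the assigned class for each checked pixel; the labels here are invented:

```python
import numpy as np

classes   = ["water", "vegetation", "urban"]
known     = ["water", "water", "vegetation", "vegetation", "urban", "urban"]
predicted = ["water", "water", "vegetation", "urban",      "urban", "urban"]

idx = {c: i for i, c in enumerate(classes)}
matrix = np.zeros((len(classes), len(classes)), dtype=int)
for k, p in zip(known, predicted):
    matrix[idx[k], idx[p]] += 1   # rows: reference class, cols: assigned class

# Correctly classified pixels lie on the diagonal.
accuracy = np.trace(matrix) / matrix.sum()
print(matrix)
print(accuracy)
```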
Unsupervised Classification (clustering)
– cluster pixels in feature space based on some measure of their proximity
interpretation of results / assigned classes
– need to compare the clustered classes with reference data to interpret them (or, in ideal conditions, use spectral information to infer)
– clusters occurring in feature space might not be those a human operator would select
– can be useful, e.g. in picking up variations within what would otherwise be distinguished as a single class (e.g. stressed / unstressed fields of the same crop)
– can sometimes be difficult to interpret, especially if a cluster represents data from a mixture of cover types which just happen not to be spectrally separable
– the clusters can sometimes be of little intrinsic value in themselves; e.g. sunlit trees vs shaded trees is perhaps not a useful discrimination if one simply wants to classify 'trees', and so clusters may have to be combined
A large number of clustering algorithms exist.

K-means
– input the number of clusters desired
– the algorithm is typically initiated with arbitrarily-located 'seeds' for the cluster means
– each pixel is then assigned to the closest cluster mean
– revised mean vectors are then computed for each cluster
– iterate on these two operations until some convergence criterion is met (the cluster means don't move between iterations)
– the method is computationally expensive, because it is iterative
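The iteration described above might be sketched as follows, on hypothetical two-band pixel data (a plain implementation, not a production algorithm):

```python
import numpy as np

def kmeans(pixels, k, n_iter=100, seed=0):
    """Plain K-means: alternate assignment and mean-update steps until the
    cluster means stop moving (or n_iter is reached)."""
    rng = np.random.default_rng(seed)
    means = pixels[rng.choice(len(pixels), k, replace=False)]  # arbitrary seeds
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        # step 1: assign each pixel to the closest cluster mean
        d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 2: recompute the mean vector of each cluster
        new = np.array([pixels[labels == i].mean(axis=0) if np.any(labels == i)
                        else means[i] for i in range(k)])
        if np.allclose(new, means):   # convergence: means don't move
            break
        means = new
    return means, labels

# Two obvious hypothetical clusters in a two-band feature space.
pixels = np.array([[10., 8.], [12., 9.], [11., 7.],
                   [80., 60.], [82., 61.], [79., 59.]])
means, labels = kmeans(pixels, k=2)
print(labels)
```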
Hybrid Approaches
– useful when there is a large degree of variability in the DN of individual classes
– use clustering concepts from unsupervised classification to derive subclasses for individual classes, followed by standard supervised methods
– can apply e.g. the K-means algorithm to (test) subareas to derive class statistics, and use the derived clusters to classify the whole scene
– there is a requirement that all classes of interest are represented in these test areas
– clustering algorithms may not always determine all relevant classes in an image, and so the results may have to be combined with supervised information; e.g. linear features (roads etc.) may not be picked up by the textural methods described above
Postclassification filtering
– the result of a classification from RS data can often appear rather 'noisy'
– it is sometimes desirable to aggregate local information in some way, e.g. in order to gain an overall picture of the local cover type, or the most common local cover type
– a variety of aggregation techniques exist, the simplest and perhaps most common of which is majority filtering:
– a kernel is passed over the classification result and the class which occurs most commonly in the kernel is used
– although this is the most common method, it may not always be appropriate; the particular method for spatial aggregation of categorical data of this sort depends on the particular application to which the data are to be put
– e.g. successive aggregations will typically lose scattered data of a certain class, but keep tightly-clustered data
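A sketch of majority filtering with a 3×3 kernel; edge pixels are simply left unchanged here, and the integer class codes are invented:

```python
import numpy as np
from collections import Counter

def majority_filter(classified, size=3):
    """Pass a size x size kernel over a classified image and replace each
    interior pixel with the most common class in the kernel."""
    out = classified.copy()
    r = size // 2
    rows, cols = classified.shape
    for i in range(r, rows - r):
        for j in range(r, cols - r):
            window = classified[i - r:i + r + 1, j - r:j + r + 1].ravel()
            out[i, j] = Counter(window).most_common(1)[0][0]
    return out

# Hypothetical classified image (integer class codes) with one 'noisy' pixel.
img = np.array([[1, 1, 1, 1],
                [1, 2, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 1, 1]])
print(majority_filter(img))
```

Successive applications of such a filter illustrate the caveat above: isolated pixels of a class disappear quickly, while tight clusters survive.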