Chapter 5
Image Classification
Analysis and applications of remote
sensing imagery
Instructor: Dr. Cheng-Chien Liu
Department of Earth Sciences
National Cheng Kung University
Last updated: 24 May 2005
Introduction
 Overall objective of classification
• Automatically categorize all pixels in an image into land cover
classes or themes
 Three types of pattern recognition
• Spectral pattern recognition → the emphasis of this chapter
• Spatial pattern recognition
• Temporal pattern recognition
 Selection of a classification approach
• No single “right” approach
• Depends on
 The nature of the data being analyzed
 The computational resources available
 The intended application of the classified data
Supervised classification
 Fig 7.37
• A hypothetical example
Five bands: B, G, R, NIR, TIR
Six land cover types: water, sand, forest, urban, corn, hay
 Three basic steps (Fig 7.38)
• Training stage
• Classification stage
• Output stage
Supervised classification (cont.)
 Classification stage
• Fig 7.39
Pixel observations from selected training sites plotted on scatter
diagram
Two bands are used for demonstration; the method applies to any number of bands
Clouds of points → multidimensional descriptions of the spectral
response patterns of each category of cover type to be interpreted
• Minimum-Distance-to-Mean classifier
Fig 7.40
 Mean vector for each category
 Pt 1 → Corn
 Pt 2 → Sand ?!!
Advantage: mathematically simple and computationally efficient
Disadvantage: insensitive to different degrees of variance in the spectral
response data
Not widely used when the spectral classes are close to one another in the
measurement space and have high variance
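The minimum-distance-to-mean rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-band training values are hypothetical (they are not read from Fig 7.40), and the names `class_means`, `classify_min_distance` and `max_distance` are invented for this sketch.

```python
import math

def class_means(training):
    """Return the mean vector of each class from {class: [pixel vectors]}."""
    means = {}
    for name, pixels in training.items():
        n_bands = len(pixels[0])
        means[name] = [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]
    return means

def classify_min_distance(pixel, means, max_distance=None):
    """Assign the pixel to the class with the nearest mean (Euclidean distance)."""
    best_name, best_dist = None, math.inf
    for name, mean in means.items():
        dist = math.dist(pixel, mean)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Optional threshold: pixels far from every mean stay unclassified.
    if max_distance is not None and best_dist > max_distance:
        return "unknown"
    return best_name

# Hypothetical two-band digital numbers for two cover types.
training = {
    "corn": [(40, 60), (42, 64), (44, 62)],
    "sand": [(90, 95), (110, 120), (130, 145)],
}
means = class_means(training)
```

Only the class means enter the decision, which is why the spread of the "sand" training pixels has no influence on the result.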
Supervised classification (cont.)
 Classification stage (cont.)
• Parallelepiped classifier
Fig 7.41
 Range for each category
 Pt 1 → Hay ?!!
 Pt 2 → Urban
Advantage: mathematically simple and computationally
efficient
Disadvantage: classes whose bands are strongly correlated (high
covariance) are poorly described by rectangular decision regions
 Positive covariance: Corn, Hay, Forest
 Negative covariance: Water
Alleviated by using stepped decision region boundaries (Fig
7.42)
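A minimal sketch of the parallelepiped rule, assuming the range of each category is simply the per-band minimum and maximum of its training pixels. The training values and the function names are hypothetical. Returning every matching class makes the overlap problem of rectangular decision regions visible.

```python
def class_ranges(training):
    """Return per-band (min, max) ranges for each class."""
    ranges = {}
    for name, pixels in training.items():
        n_bands = len(pixels[0])
        ranges[name] = [
            (min(p[b] for p in pixels), max(p[b] for p in pixels))
            for b in range(n_bands)
        ]
    return ranges

def classify_parallelepiped(pixel, ranges):
    """Return every class whose box contains the pixel.

    An empty list means "unknown"; more than one name means the boxes overlap.
    """
    return [
        name
        for name, box in ranges.items()
        if all(lo <= value <= hi for value, (lo, hi) in zip(pixel, box))
    ]

# Hypothetical two-band training pixels with positive covariance.
training = {
    "hay": [(50, 70), (60, 82), (70, 95)],
    "corn": [(55, 60), (65, 72), (75, 85)],
}
ranges = class_ranges(training)
```

With these values the pixel (60, 75) falls inside both boxes, so a tie-breaking rule or stepped boundaries would be needed to decide between hay and corn.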
Supervised classification (cont.)
 Classification stage (cont.)
• Gaussian maximum likelihood classifier
Assumption: the cloud of points for each category follows a Gaussian
(normal) distribution
Probability density functions → described by the mean vector and covariance matrix
(Fig. 7.43)
Fig 7.44: Ellipsoidal equiprobability contours
Bayesian classifier
 A priori probability (anticipated likelihood of occurrence)
 Two weighting factors
 If suitable data exist for these factors, the Bayesian implementation of the classifier is
preferable
Disadvantage: computationally intensive
 Look-up table approach
 Reduce the dimensionality (principal or canonical components transform)
 Simplify the classification computation by separating certain classes a priori
 e.g. water is easier to separate by using the NIR/Red ratio
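The Gaussian maximum likelihood rule can be sketched for the two-band case, where the 2 x 2 covariance matrix can be inverted by hand. This is an illustration, not the implementation of any particular package: the training pixels are hypothetical, and the optional `priors` argument stands in for the a priori probabilities of the Bayesian variant.

```python
import math

def gaussian_stats(pixels):
    """Mean vector and 2x2 sample covariance (c00, c01, c11) of two-band pixels."""
    n = len(pixels)
    m0 = sum(p[0] for p in pixels) / n
    m1 = sum(p[1] for p in pixels) / n
    c00 = sum((p[0] - m0) ** 2 for p in pixels) / (n - 1)
    c11 = sum((p[1] - m1) ** 2 for p in pixels) / (n - 1)
    c01 = sum((p[0] - m0) * (p[1] - m1) for p in pixels) / (n - 1)
    return (m0, m1), (c00, c01, c11)

def log_discriminant(pixel, mean, cov, prior=1.0):
    """ln(prior) - 0.5*ln|C| - 0.5*(x-m)^T C^-1 (x-m) for a 2x2 covariance."""
    c00, c01, c11 = cov
    det = c00 * c11 - c01 * c01
    d0, d1 = pixel[0] - mean[0], pixel[1] - mean[1]
    mahal_sq = (c11 * d0 * d0 - 2 * c01 * d0 * d1 + c00 * d1 * d1) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * mahal_sq

def classify_max_likelihood(pixel, stats, priors=None):
    """Pick the class with the largest log discriminant (equal priors by default)."""
    priors = priors or {name: 1.0 for name in stats}
    return max(
        stats,
        key=lambda name: log_discriminant(pixel, *stats[name], priors[name]),
    )

# Hypothetical training pixels: a compact class and a widely spread class.
training = {
    "corn": [(40, 60), (42, 64), (44, 62), (46, 66)],
    "sand": [(90, 95), (110, 125), (130, 140), (150, 120)],
}
stats = {name: gaussian_stats(pixels) for name, pixels in training.items()}
```

With these values the pixel (60, 75) is closer to the corn mean in Euclidean distance, yet it is assigned to sand because the sand class has a much larger variance. This is the sensitivity to variance that the minimum-distance classifier lacks.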
Supervised classification (cont.)
 Training stage
• Classification → automatic work
• Assembling the training data → manual work
Both an art and a science
Substantial reference data
Thorough knowledge of the geographic area
You are what you eat!
 Results of classification are what you train!
• Training data
Both representative and complete
 All spectral classes constituting each information class must be adequately represented in
the training set statistics used to classify an image
 e.g. water (turbid or clear)
 e.g. crop (date, type, soil moisture, …)
 It is common to acquire data from 100+ training areas to represent the spectral variability
Supervised classification (cont.)
 Training stage (cont.)
• Training area
 Delineate boundaries (Fig 7.45)
 Carefully located boundaries → no edge pixels
 Seed pixel
 Choose seed pixel → statistically based criteria → contiguous pixels → cluster
• Training pixels
 Number
 At least n+1 pixels for n spectral bands
 In practice, 10n to 100n pixels are used
 Dispersion → representative
• Training set refinement
 Make sure the sample size is sufficient
 Assess the overall quality
 Check if all data sets are normally distributed and spectrally pure
 Avoid redundancy
 Delete or merge
Supervised classification (cont.)
 Training stage (cont.)
• Training set refinement process
Graphical representation of the spectral response patterns
 Fig 7.46: Histograms for data points included in the training areas of “hay”
 Visual check on the normality of the spectral response distribution
 Two subclasses: normal and bimodal
 Fig 7.47: Coincident spectral plot
 Corn/hay overlap for all bands
 Band 3 and 5 for hay/corn separation (use scatter plot)
 Fig 7.48: SPOT HRV multi-spectral images
 Fig 7.49 scatter plot of band 1 versus band 2
 Fig 7.50: scatter plot of band 2 versus band 3 → less correlated → adequate
Quantitative expressions of category separation
 Transformed divergence: a covariance-weighted distance between category means
 Table 7.1: Portion of a divergence matrix (<1500 → spectrally similar classes)
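As an illustration of such a separability measure, the sketch below computes transformed divergence for the single-band special case, using the widely quoted textbook form TD = 2000 [1 - exp(-D / 8)]. It is an assumption that Table 7.1 uses exactly this scaling, although the 1500 threshold suggests a 0 to 2000 range. The function names and the class statistics in the usage note are hypothetical.

```python
import math

def divergence_1band(mean_i, var_i, mean_j, var_j):
    """Divergence between two Gaussian classes described by one band."""
    # Term driven by the difference in variances.
    cov_term = 0.5 * (var_i - var_j) * (1.0 / var_j - 1.0 / var_i)
    # Term driven by the variance-weighted distance between the means.
    mean_term = 0.5 * (1.0 / var_i + 1.0 / var_j) * (mean_i - mean_j) ** 2
    return cov_term + mean_term

def transformed_divergence(divergence):
    """Rescale divergence so that it saturates at 2000."""
    return 2000.0 * (1.0 - math.exp(-divergence / 8.0))
```

For example, two classes with variance 16 and means 40 and 44 give a value well below 1500 (spectrally similar), whereas means of 40 and 100 give a value close to 2000.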
Supervised classification (cont.)
 Training stage (cont.)
• Training set refinement process (cont.)
Self-classification of training set data
 Error matrix → for the training areas, not for the test areas or the overall scene
 Tells us how well the classifier can classify the training areas and nothing more
 Overall accuracy is assessed after the classification and output stages
Interactive preliminary classification
 Plate 29: sample interactive preliminary classification procedure
Representative subscene classification
 Complete the classification for the test area → verify and improve
Summary
 Revise with merger, deletion and addition to form the final set of statistics used in
classification
 Accept lower accuracy for a class that occurs rarely in the scene to preserve the
accuracy over extensive areas
 Alternative methods for separating two spectrally similar classes → GIS data, visual
interpretation, field check, multi-temporal or spatial pattern recognition procedures, …
Supervised classification (cont.)
 Training stage (cont.)
• Implementation → region of interest (ROI)
• Three sources of ROI
Manually from an image using the mouse
From pixel scatter plots
From vector layers
Exercise 1
 Quick classification using interactive 2-D
scatter plots
• Rationale
Sufficient information to determine appropriate training areas may not
exist
2-D scatter plot → first step in determining the training set
• Data: ca_coast.dat (TMS data)
• Create 2D scatter plot
Tools → 2-D Scatter Plots…
Adjacent bands are usually highly correlated
Choose band 3 for X-axis and band 8 for Y-axis
Check dancing pixels
 hold the left button in the image window
 hold the right button in the image window
Options → Density slice
Self test 1
 File: ca_coast.dat
• Use 2D scatter plot to define 5 ROIs
 Note:
• Selection of bands for 2D scatter plot
• The least number of pixels required for each
class
• Dispersion of ROIs
• Give each ROI an appropriate name
• Output the ROIs into a file
Exercise 2
 Perform classification
• File: ca_coast.dat
• Use the same ROIs that were defined earlier
• Classification method:
Maximum likelihood method
Minimum distance method
• Try various threshold value(s)
• Use Preview function
Change the extent by selecting the Change View button
• Examine the rule image
Exercise 3
 Examine class images
• Load results of classification in previous exercise
• Link the displays and examine the differences
• Answer the following questions
Which regions receive the same classification?
Which regions receive different classifications?
Which result is better?
Do your ROIs seem to be appropriate?
How can the classification be improved by changing the ROIs?
• Check the header and data type of the classified result
• Change the class color mapping
Exercise 4
 Examine rule images
• Display the rule images from the previous exercise
• Link the displays and examine the differences
• Plot the z profile for each rule image
• Move to an arbitrary pixel, check the value
and determine which class this pixel should be
Exercise 5
 Perform post classification using the rule
classifier
• Classification → Post Classification → Rule Classifier
• File: dist_rule.img
• Change the thresholds and press Quick Apply
• Examine the result
• Examine the rule image histograms to determine the
appropriate threshold for each class
Press the Hist button for the open ocean class
Set a threshold to encompass the first peak of the bimodal histogram
Repeat for the other classes
Exercise 6
 Overlay classes
• Display band 7 of ca_coast.dat in gray
• Overlay → Classification
• File: max_class.img
• Interactive Class Tool dialog
Turn on and off class(es)
Options → Class distribution
Change active class
Options → Associated stats data file
Options → Stats for all classes
 Examine the min, max, mean, standard deviation for each class
• Display band 7 of ca_coast.dat in a new window
• Overlay dist_class.img
• Link two displays and examine the differences
Exercise 6 (cont.)
 Overlay classes (cont.)
• Repeat setting the Interactive Class Tool dialog for
the new file: dist_class.img
Turn on and off class(es)
Options → Class distribution
Change active class
Options → Associated stats data file
Options → Stats for all classes
 Examine the min, max, mean, standard deviation for each class
• Compare the class distribution and stats plots
• Editing pixels of classification using the Interactive
Class Tool
Exercise 7
 Convert classes to ROIs
• Using Band Threshold to ROI tool
Overlay → Regions of Interest
Options → Band Threshold to ROI
• Options → Report area of ROIs
Unsupervised classification
 Unsupervised vs. supervised
• Supervised → define useful information
categories → examine their spectral
separability
• Unsupervised → determine spectral classes →
define their informational utility
 Illustration: Fig 7.51
• Advantage: the spectral classes are found
automatically (e.g. stressed class)
Unsupervised classification (cont.)
 Clustering algorithms
• K-means
 Locate centers of seed clusters → assign all pixels to the cluster with the closest
mean vector → revise the mean vector of each cluster → reclassify the image →
iterate until there is no significant change
• Iterative self-organizing data analysis (ISODATA)
 Permits the number of clusters to change from one iteration to the next by
 Merging: distance between clusters < some predefined minimum distance
 Splitting: standard deviation > some predefined maximum value
 Deleting: pixel number in a cluster < some specified minimum number
 Table 7.2
• Outcome 1: ideal result
• Outcome 2: subclasses → classes
• Outcome 3: a more troublesome result
 The information categories are spectrally similar and cannot be differentiated in
the given data set
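The K-means loop described on this slide can be sketched as follows. The seed centers are passed in explicitly so that the result is reproducible. The function name, the `tol` stopping tolerance and the pixel values in the usage note are assumptions for illustration; the merging, splitting and deleting steps of ISODATA are not included.

```python
import math

def kmeans(pixels, centers, max_iter=100, tol=1e-6):
    """Cluster pixel vectors, starting from the given seed centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign every pixel to the cluster with the closest mean vector.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in pixels
        ]
        # Revise the mean vector of each cluster from its current members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster where it was
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # no significant change: stop iterating
            break
    return centers, labels
```

Calling `kmeans(pixels, [(0, 0), (60, 60)])` on two well-separated groups of two-band pixels returns one center per group and a label (0 or 1) for every pixel. The analyst still has to decide which information class each spectral cluster represents.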
Exercise 8
 Unsupervised classification
• File: ca_coast.dat
• Method: K-means and ISODATA
• Parameter:
• Overlay the result of classification onto the
original true-color image
• Examine the result of classification
• Save both results for exercise 10
Hybrid classification
 Unsupervised training areas
• Image sub-areas chosen intentionally to be quite different from
supervised training areas
 Supervised → regions of homogeneous cover type
 Unsupervised → contain numerous cover types at various locations throughout
the scene
 To identify the spectral classes
 Guided clustering
• Delineate training areas for class X
• Cluster all class X pixels into spectral subclasses X1, X2, …
• Merge or delete class X signatures
• Repeat for all classes
• Examine all class signatures and merge/delete signatures
• Perform maximum likelihood classification
• Aggregate spectral subclasses
Classification of mixed pixels
 Mixed pixels
• IFOV includes more than one type/feature
 Low resolution sensors → more serious
 Subpixel classification
• Spectral mixture analysis
 A deterministic method (not a statistical method)
 Pure reference spectral signatures
 Measured in the lab, in the field, or from the image itself
 Endmembers
 Basic assumption
 The spectral variation in an image is caused by mixtures of a limited number of surface materials
 Linear mixture → satisfy two basic conditions simultaneously
 The sum of the fractional proportions of all potential endmembers: $\sum_{i=1}^{N} F_i = 1$
 The observed $DN_\lambda$ for each pixel in band $\lambda$:
$DN_\lambda = F_1 DN_{\lambda,1} + F_2 DN_{\lambda,2} + \dots + F_N DN_{\lambda,N} + E_\lambda$
 B bands → B equations
 B+1 equations (with the sum-to-one condition) → solve for up to B+1 endmember fractions
 Fig 7.52: example of a linear spectral mixture analysis
 Drawback: multiple scattering → nonlinear mixture model
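For the simplest case of two endmembers, the linear mixture model with the sum-to-one condition has a closed-form least-squares solution, sketched below. Substituting F2 = 1 - F1 into the band equations leaves a single unknown. The function name and the endmember spectra in the usage note are hypothetical, and the fractions are not forced to lie between 0 and 1.

```python
def unmix_two_endmembers(pixel, endmember_1, endmember_2):
    """Least-squares fractions (F1, F2) with F1 + F2 = 1, plus per-band residuals."""
    # DN - DN_2 = F1 * (DN_1 - DN_2) + E in every band.
    diff = [a - b for a, b in zip(endmember_1, endmember_2)]
    num = sum((dn - b) * d for dn, b, d in zip(pixel, endmember_2, diff))
    den = sum(d * d for d in diff)
    f1 = num / den
    f2 = 1.0 - f1
    # Residual E for each band after removing the modelled mixture.
    residuals = [
        dn - (f1 * a + f2 * b)
        for dn, a, b in zip(pixel, endmember_1, endmember_2)
    ]
    return f1, f2, residuals
```

For example, with a hypothetical vegetation spectrum (30, 20, 90) and soil spectrum (60, 70, 80), the pixel (52.5, 57.5, 82.5) unmixes to 25% vegetation and 75% soil with zero residual. Large residuals would suggest a missing endmember or nonlinear mixing.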
Classification of mixed pixels (cont.)
 Subpixel classification (cont.)
• Fuzzy classification
A given pixel may have partial membership in more than
one category
Fuzzy clustering
 Conceptually similar to the K-means unsupervised classification approach
 Hard boundaries → fuzzy regions
 Membership grade
Fuzzy supervised classification
 A classified pixel is assigned a membership grade with respect to its
membership in each information class
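One common way to compute membership grades is the fuzzy c-means formula, sketched below. The slides do not name a specific fuzzy algorithm, so using fuzzy c-means here is an assumption, as are the function name and the fuzziness exponent `m` (2 is a conventional default).

```python
import math

def fuzzy_memberships(pixel, centers, m=2.0):
    """Membership grade of one pixel in every cluster; the grades sum to 1."""
    dists = [math.dist(pixel, c) for c in centers]
    if 0.0 in dists:  # pixel sits exactly on a center: full membership there
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # Grade for cluster i shrinks as its distance grows relative to the others.
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in dists)
        for d_i in dists
    ]
```

A pixel halfway between two cluster centers receives a grade of 0.5 in each, while a hard classifier would have to pick one of them.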
Exercise 9
 Linear spectral unmixing
• File: ca_coast.dat
• Display the image in true color
• Set 5 ROIs, each containing one pure pixel
• Spectral → Mapping Methods → Endmember
Collection
• Import five endmembers from ROIs
• Algorithms → Linear Spectral Unmixing
• Set constrained
• Apply and examine the results
The output stage
 Image classification → output products → end users
• Graphic products
Plate 30, Fig 3 of the paper “IKONOS imagery for resource
management”
• Tabular data
• Digital information files
Postclassification smoothing
 Salt-and-pepper appearance
• A low-pass filter cannot be used
• Smoothing must operate on the basis of logical operations, rather than
simple arithmetic computations
 Majority filter
• Fig 7.53
 (a) original classification → salt-and-pepper appearance
 (b) 3 x 3 pixel-majority filter
 (c) 5 x 5 pixel-majority filter
 Smoothing embedded in the classification algorithm
• Limited
• Needs the technique of spatial pattern recognition
• Future development
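A majority filter works on class labels by counting, not by averaging, which is why it is a logical operation. The sketch below is one plausible reading of the filter: a pixel is relabeled only when one class holds a strict majority of the window, and windows are clipped at the image edges. Both choices, and the function name, are assumptions.

```python
from collections import Counter

def majority_filter(classes, size=3):
    """Replace each label by the majority label of its size x size window."""
    rows, cols = len(classes), len(classes[0])
    half = size // 2
    out = [row[:] for row in classes]
    for r in range(rows):
        for c in range(cols):
            # Collect the labels in the window, clipped at the image edges.
            window = [
                classes[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            label, count = Counter(window).most_common(1)[0]
            # Only relabel when one class holds a strict majority of the window.
            if count > len(window) / 2:
                out[r][c] = label
    return out
```

An isolated "forest" pixel surrounded by "water" is relabeled as water, while a straight boundary between two large regions is left unchanged. Passing `size=5` gives the stronger smoothing of the 5 x 5 case.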
Exercise 10
 Postclassification smoothing
• File: results from exercise 8
• Clump and Sieve
For generalizing classification images, Sieve is usually run
first to remove the isolated pixels based on a size (number
of pixels) threshold. Clump is run to add spatial coherency
to existing classes by combining adjacent similar classified
areas
Classification → Post Classification → Sieve Classes
Classification → Post Classification → Clump Classes
• Combine Classes
Classification → Post Classification → Combine Classes
Classification accuracy assessment
 Significance
• A classification is not complete until its accuracy is assessed
 Classification error matrix
• Error matrix (confusion matrix, contingency table)
 Table 7.3
 Omission (exclusion) errors: pixels that should be in a class but are missing from it
 Non-diagonal column elements (e.g. 16 sand pixels were omitted)
 Commission (inclusion) errors: pixels that are in a class but should not be
 Non-diagonal row elements (e.g. 38 urban pixels + 79 hay pixels were included in corn)
 Overall accuracy
 Producer’s accuracy
 Indicates how well training set pixels of the given cover type are classified
 User’s accuracy
 Indicates the probability that a pixel classified into a given category actually represents that category on
the ground
 Training area accuracies are sometimes used in the literature as an indication of
overall accuracy. They should not be!
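The three accuracy measures can be computed directly from an error matrix. The sketch below assumes the layout implied above: rows hold the classification result and columns hold the reference data. The counts are hypothetical and are not the values of Table 7.3; the function name is also invented for this sketch.

```python
def accuracy_report(matrix, class_names):
    """matrix[i][j] = pixels classified as class i whose reference class is j."""
    n = len(class_names)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[k][k] for k in range(n)]
    row_totals = [sum(matrix[k]) for k in range(n)]
    col_totals = [sum(matrix[i][k] for i in range(n)) for k in range(n)]
    return {
        # Correctly classified pixels divided by all pixels.
        "overall": sum(diagonal) / total,
        # Diagonal divided by column total: 1 minus the omission error.
        "producers": {name: diagonal[k] / col_totals[k] for k, name in enumerate(class_names)},
        # Diagonal divided by row total: 1 minus the commission error.
        "users": {name: diagonal[k] / row_totals[k] for k, name in enumerate(class_names)},
    }

names = ["water", "forest", "corn"]
matrix = [
    [90, 0, 10],   # classified as water
    [5, 60, 35],   # classified as forest
    [5, 15, 80],   # classified as corn
]
report = accuracy_report(matrix, names)
```

In this made-up matrix the forest class has a producer's accuracy of 80% but a user's accuracy of only 60%, so a high producer's accuracy alone does not mean that a pixel labeled forest is really forest.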
Classification accuracy assessment (cont.)
 Sampling considerations
• Test area
 Different from and more extensive than the training areas
 Withhold some training areas for postclassification accuracy assessment
 Being homogeneous, test areas might not provide a valid indication of
classification accuracy at the individual pixel level of land cover variability
• Wall-to-wall comparison
 Expensive
 Defeat the whole purpose of remote sensing
• Random sampling
 Collecting a large sample of randomly distributed points → too expensive and
difficult
 e.g. 3/4 of Taiwan's area is covered by the Central Mountain Range
 Only sample those pixels without influence of potential registration error
 Several pixels away from field boundaries
 Stratified random sampling
 Each land cover category → stratum
Classification accuracy assessment (cont.)
 Sampling considerations (cont.)
• Accomplishment of random sampling
Overlay the classified output data with a grid
Test cells within the grid are selected randomly and groups of pixels
within the test cells are evaluated
• Sample unit
Individual pixels, clusters of pixels or polygons
• Sample number
General area: 50 samples per category
Large area or more than 12 categories: 75 – 100 samples per category
Depends on the variability of each category
 Wetlands need more samples than open water
Classification accuracy assessment (cont.)
 Evaluating classification error matrices
• Table 7.4: error matrix (randomly sampled test)
Producer’s accuracy for forest (84%) > overall accuracy
(65%) → good at classifying forest?!
User’s accuracy for forest is only 60%
The classification is only good at classifying water
Self test 2
 Employ all methods and concepts of
classification that you have learned so
far to classify the file ca_coast.dat
carefully. The ground truths in the
validation region will be provided next
week in the form of ROIs to assess your
result.
Tutorial: multispectral classification
 Read image
• File → Open Image File
 Subdirectory: envidata
 File: can_tmr.img
 RGB Color
 Bands 4, 3, and 2
• Review Image Colors
 False color infrared photograph
 Bright red areas → high infrared reflectance → healthy vegetation → under cultivation, or along rivers
 Slightly darker red areas → native vegetation → coniferous trees
 Several distinct geologic and urbanization classes are also readily apparent
• Cursor Location/Value
• Examine Spectral Plots
 Tools → Profiles → Z Profile (Spectrum)
 Note the relations between image color and spectral shape
 Pay attention to the location of the image bands in the spectral profile, marked by the red, green, and
blue bars in the plot
Tutorial: multispectral classification
(cont.)
 Unsupervised Classification
• Classification → Unsupervised → K-Means or IsoData
• K-Means
 Uses a cluster analysis approach which requires the analyst to select the number
of clusters to be located in the data, arbitrarily locates this number of cluster
centers, then iteratively repositions them until optimal spectral separability is
achieved
 Choose K-Means as the method, use all of the default values and click on OK
 Review the results contained in can_km.img.
 Experiment with different numbers of classes, change thresholds, standard
deviations, and maximum distance error values to determine their effect on the
classification.
• Isodata
 Calculates class means evenly distributed in the data space and then iteratively
clusters the remaining pixels using minimum distance techniques. Each
iteration recalculates means and reclassifies pixels with respect to the new
means
 Choose IsoData as the method, use all of the default values and click on OK
 Review the results contained in can_iso.img.
Tutorial: multispectral classification
(cont.)
 Regions of Interest (ROI)
• Select Training Sets Using Regions of Interest (ROI)
• Restore Predefined ROIs
From the #1 Main Image menu bar, choose Overlay → Region of
Interest
Choose File → Restore ROIs
File: CLASSES.ROI
• Create Your Own ROIs
Overlay → Region of Interest
Draw a polygon
Fix the polygon by clicking the right mouse button a second time
New Region
Edit
Tutorial: multispectral classification
(cont.)
 Supervised Classification
• Supervised classification requires that the user
select training areas for use as the basis for
classification
• Classification → Supervised → [method]
[method] is one of the supervised classification methods in
the pull-down menu (Parallelepiped, Minimum Distance,
Mahalanobis Distance, Maximum Likelihood, Spectral
Angle Mapper, Binary Encoding, or Neural Net). Use one of
the two methods below for selecting training areas, also
known as regions of interest (ROIs).
Tutorial: multispectral classification
(cont.)
 Classical Supervised Multispectral Classification
• Parallelepiped
 Uses a simple decision rule to classify multispectral data. The decision boundaries form
an n-dimensional parallelepiped in the image data space. The dimensions of the
parallelepiped are defined based upon a standard deviation threshold from the mean of
each selected class
 Pre-saved results are in the file can_pcls.img
 Perform your own classification using the CLASSES.ROI regions of interest
• Maximum Likelihood
 Assumes that the statistics for each class in each band are normally distributed
 Calculates the probability that a given pixel belongs to a specific class
 Unless a probability threshold is selected, all pixels are classified
 Each pixel is assigned to the class that has the highest probability
• Minimum Distance
 Uses the mean vectors of each ROI and calculates the Euclidean distance from each
unknown pixel to the mean vector for each class
• Mahalanobis Distance
 A direction-sensitive distance classifier that uses statistics for each class
 Similar to maximum likelihood, but assumes all class covariances are equal and is therefore faster
Tutorial: multispectral classification
(cont.)
 Spectral Classification Methods
• Developed specifically for use on Hyperspectral data,
but provide an alternative/improved method for
classifying multispectral data
• The Endmember Collection Dialog
Spectral → Mapping Methods → Endmember Collection
(Classification → Endmember Collection)
Open File
 File: can_tmr.img
 Endmember Collection: Parallel dialog
Algorithm → [method]
 [method] represents: Parallelepiped, Minimum Distance, Mahalanobis Distance,
Maximum Likelihood, Binary Encoding, and the Spectral Angle Mapper (SAM)
Tutorial: multispectral classification
(cont.)
 Spectral Classification Methods (cont.)
• Binary Encoding Classification
Encodes the data and endmember spectra into 0s and 1s based on
whether a band falls below or above the spectrum mean
An exclusive OR function is used to compare each encoded reference
spectrum with the encoded data spectra and a classification image is
produced
All pixels are classified to the endmember with the greatest number of
bands that match unless the user specifies a minimum match threshold,
in which case some pixels may be unclassified if they do not meet the
criteria
Algorithm → Binary Encoding
Import → from ROI from Input File
Select All Items
Endmember Spectra
Options → Plot Endmembers
Apply
Binary Encoding Parameters
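The binary encoding idea described on this slide can be sketched in plain Python. This illustrates the principle only and is not ENVI's implementation; the function names, the `min_match` parameter and the spectra in the usage note are hypothetical.

```python
def encode(spectrum):
    """1 where a band is above the spectrum mean, else 0."""
    mean = sum(spectrum) / len(spectrum)
    return [1 if value > mean else 0 for value in spectrum]

def classify_binary(pixel, endmembers, min_match=0.0):
    """Return the endmember whose code matches the pixel code in the most bands."""
    pixel_code = encode(pixel)
    best_name, best_fraction = None, -1.0
    for name, spectrum in endmembers.items():
        # XOR is 1 where the codes differ, so matches = bands - sum(XOR).
        mismatches = sum(a ^ b for a, b in zip(pixel_code, encode(spectrum)))
        fraction = 1.0 - mismatches / len(pixel_code)
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    # Below the minimum match threshold the pixel stays unclassified (None).
    return best_name if best_fraction >= min_match else None
```

With `endmembers = {"vegetation": (30, 40, 25, 120, 80, 40), "water": (60, 50, 40, 10, 5, 3)}`, a pixel whose brightest bands are the fourth and fifth matches the vegetation code in every band. Because only the shape relative to the mean is encoded, overall brightness has no effect.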
Tutorial: multispectral classification
(cont.)
 Spectral Classification Methods (cont.)
• Spectral Angle Mapper Classification
Uses the n-dimensional angle to match pixels to reference
spectra
Determines the spectral similarity between two spectra by
calculating the angle between the spectra, treating them as
vectors in a space with dimensionality equal to the number
of bands
Endmember Collection
Algorithm → Spectral Angle Mapper
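The spectral angle itself is the arccosine of the normalized dot product of the two spectra. The sketch below illustrates the calculation; the function names, the optional `max_angle` threshold and the reference spectra in the usage note are assumptions, not ENVI's API.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def classify_sam(pixel, references, max_angle=None):
    """Return the reference with the smallest angle, or None above max_angle."""
    best = min(references, key=lambda name: spectral_angle(pixel, references[name]))
    if max_angle is not None and spectral_angle(pixel, references[best]) > max_angle:
        return None
    return best
```

Because the angle ignores vector length, the spectrum (15, 20, 12, 60) is matched to a reference (30, 40, 25, 120) even though it is only half as bright. This makes the method relatively insensitive to illumination differences.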
Tutorial: multispectral classification
(cont.)
 Post Classification Processing
• Classification Method    Rule Image Values Represent
Parallelepiped           Number of bands that satisfied the parallelepiped criteria
Minimum Distance         Sum of the distances from the class means
Maximum Likelihood       Probability of pixel belonging to class
Mahalanobis Distance     Distances from the class means
Binary Encoding          Binary match in percent
Spectral Angle Mapper    Spectral angle in radians
Tools → Color Mapping → ENVI Color Tables
 Stretch Bottom and Stretch Top sliders
Cursor Location/Value
Classification → Post Classification → Rule Classifier
 File: can_tmr.sam
 Rule Image Classifier Tool
Tutorial: multispectral classification
(cont.)
 Post Classification Processing (cont.)
• Class Statistics
Classification → Post Classification → Class Statistics
Select All Items
• Confusion Matrix
Comparison of two classified images (the classification and the “truth”
image), or a classified image and ROIs
The truth image can be another classified image, or an image created
from actual ground truth measurements
• Classification → Post Classification → Confusion
Matrix → [method]
Using Ground Truth Image, or Using Ground Truth ROIs.
• Match Classes Parameters dialog
Tutorial: multispectral classification
(cont.)
 Post Classification Processing (cont.)
• Clump and Sieve
 For generalizing classification images, Sieve is usually run first to remove the
isolated pixels based on a size (number of pixels) threshold. Clump is run to add
spatial coherency to existing classes by combining adjacent similar classified
areas
• Compare the pre-calculated results in the files can_sv.img
(sieve) and can_clmp.img (clump of the sieve result) to the
classified image can_pcls.img
• Classification → Post Classification → Sieve Classes
• Classification → Post Classification → Clump Classes
• Combine Classes
• Classification → Post Classification → Combine Classes
 File: can_sam.img
 Add Combination
Tutorial: multispectral classification
(cont.)
 Post Classification Processing (cont.)
• Edit Class Colors
 Tools → Color Mapping → Class Color Mapping
 To make the changes permanent, select Options → Save Changes
• Overlay Classes
 Classification → Post Classification → Overlay Classes
 Select can_tmr.img band 3 for each RGB band
 Use can_comb.img as the classification input
• Interactive Classification Overlays
 Interactively toggle classes on and off as overlays on a displayed image, to edit
classes, get class statistics, merge classes, and edit class colors.
 Display band 4 of can_tmr.img
 Overlay → Classification
 Try the various options for assessing the classification under the Options menu
 Choose various options under the Edit menu to interactively change the
contents of specific classes
 File → Save Image As → [Device]
Tutorial: multispectral classification
(cont.)
 Post Classification Processing (cont.)
• Classes to Vector Layers
• Overlay → Vectors
 File: can_clmp.img
• File → Open Vector File → ENVI Vector File
 Files: can_v1.evf and can_v2.evf.
 Select All Layers
 Load Selected
• Classification → Post Classification → Classification to Vector
 Raster to Vector Input Band dialog.
 Choose the generalized image can_clmp.img
 Select Region #1 and Region #2 and enter the root name canrtv
 Load Selected at the bottom of the dialog.
 Load Vector
 Edit→Edit Layer Properties
• Classification Keys Using Annotation
 Overlay → Annotation
 Object → Map Key
 Edit Map Key Items
