Unsupervised Classification


Classification
Advantages of Visual Interpretation:
1. The human brain is the best data processor for information extraction and can
perform much more complex operations than computers can match.
2. The interpreter's expertise can be part of the input.
Disadvantages:
1. We can only see three bands at a time and cannot process more.
2. Time consuming and costly. It cannot be automated, so it is not practical for
large projects.
3. Requires training and experience to do good work.
Computer-Aided Classification
Parametric: assumes normal distribution of the data
Supervised classification: the user provides the computer with examples
of known features in multi-dimensional feature space. The computer first
analyzes the statistical parameters of the training data and then assigns every
other pixel to one of the example classes based on statistical similarity.
Unsupervised classification: instead of providing the computer with examples
of features in multi-dimensional feature space, the user lets the computer
identify a pre-specified number of spectral clusters such that the differences
between clusters are maximized and the differences within clusters are minimized.
It is the user's responsibility to assign a class label to each cluster. Note: one
class may have many clusters.
Hybrid classification: takes advantage of both supervised and unsupervised
classification.
1. Collect training sets.
2. Run unsupervised classification to identify spectral clusters within the training sets.
3. Classify the image with the clusters.
4. Regroup the clusters into the original classes.
Advantages of Computer-Aided Classification
1. Can be automated for large-area applications.
2. Better consistency.
3. Can process as many images with as many bands as necessary.
4. Although image processing experience is required for classification,
the analyst does not need ground experience of how features on the
ground appear in the image under a given band combination.
Supervised Classification Algorithms
1. Minimum-distance-to-means classifier
The training set provides a mean value for each class in each band (the mean
vector). Each pixel is assigned to the class to which it has the shortest Euclidean
distance.
[Figure: scatter plot of training pixels for classes u, w, s, f, and c in spectral space, Band x versus Band y.]
Advantage: mathematically simple and computationally efficient.
Disadvantage: insensitive to the different degrees of variance among classes.
For example, the triangle point will be assigned to class s although it really
belongs to class u. Because of this, the classifier is not widely used.
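The minimum-distance rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the training values for classes u and s are hypothetical, not taken from the slides.

```python
import math

def class_means(training):
    """Mean vector per class. training maps a label to a list of pixel vectors."""
    means = {}
    for label, pixels in training.items():
        # zip(*pixels) groups the DN values band by band
        means[label] = tuple(sum(band) / len(pixels) for band in zip(*pixels))
    return means

def classify_min_distance(pixel, means):
    """Assign the pixel to the class whose mean vector is nearest (Euclidean)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))

# Hypothetical training pixels (Band x, Band y): u is widely spread, s is compact
training = {
    "u": [(20, 60), (40, 80), (30, 70)],
    "s": [(50, 40), (52, 42), (51, 41)],
}
means = class_means(training)
label = classify_min_distance((45, 50), means)
print(means, label)
```

With these made-up samples the pixel (45, 50) goes to class s purely because the s mean is closer. The spread of class u plays no role, which is the weakness noted above.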
Parallelepiped
[Figure: scatter plot of classes u, w, s, f, and c in Band x versus Band y spectral space, with a rectangular region spanning the DN range of each class.]
The rules for assigning a pixel to a class are determined by the regions
spanned by the DN ranges of each category, as shown above. The rectangular
regions are called parallelepipeds. The variance associated with each category
is thereby taken into account. Pixels falling outside all parallelepipeds belong to
the “unknown” category, and pixels falling in an overlap area are “not sure” or are
arbitrarily assigned to one of the overlapping categories.
Advantage: fast and computationally efficient.
Disadvantage: decision regions overlap, which causes problems, because DN values
are often highly correlated across bands for different features.
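A minimal sketch of the parallelepiped rule follows. It assumes each box is spanned by the minimum and maximum training DN in every band; the function names and sample values are hypothetical.

```python
def class_boxes(training):
    """Per-class box: (lowest DN per band, highest DN per band)."""
    boxes = {}
    for label, pixels in training.items():
        lows = tuple(min(band) for band in zip(*pixels))
        highs = tuple(max(band) for band in zip(*pixels))
        boxes[label] = (lows, highs)
    return boxes

def classify_parallelepiped(pixel, boxes):
    """Return the class whose box contains the pixel, else 'unknown' or 'not sure'."""
    hits = [
        label
        for label, (lows, highs) in boxes.items()
        if all(lo <= v <= hi for v, lo, hi in zip(pixel, lows, highs))
    ]
    if not hits:
        return "unknown"      # outside every parallelepiped
    if len(hits) > 1:
        return "not sure"     # inside an overlap area
    return hits[0]

# Hypothetical training pixels (Band x, Band y) with overlapping DN ranges
training = {
    "u": [(20, 60), (40, 80), (30, 70)],
    "s": [(35, 55), (55, 65), (45, 45)],
}
boxes = class_boxes(training)
print(classify_parallelepiped((25, 75), boxes))
print(classify_parallelepiped((38, 62), boxes))
print(classify_parallelepiped((10, 10), boxes))
```

The second test pixel lies inside both boxes, so the sketch reports it as "not sure" rather than picking a class arbitrarily.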
Maximum Likelihood Classification
Assumption: the DN values for each class are normally distributed.
[Figure: normal distribution curve centered on the class mean μ with standard deviation σ. About 68.3% of values fall within μ ± σ, 95.5% within μ ± 2σ, and 99.7% within μ ± 3σ.]
If a pixel falls outside the range μ ± 3σ, the probability that it belongs to the class is only about
0.3%; equivalently, we are 99.7% sure that the pixel does not belong to this class.
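The 68.3%, 95.5%, and 99.7% figures can be checked from the normal distribution itself. A small sketch using only the Python standard library (the function name is an assumption for illustration):

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
```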
In a two-dimensional spectral space, each class forms a probability density region.
The probability that a pixel belongs to a given category is defined by its distance
from the class mean, measured in standard deviations. The pixel is assigned to the
class to which it most likely belongs (maximum likelihood).
Disadvantage: a large number of computations, particularly when many spectral bands
are involved.
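The idea can be sketched as follows. To stay short, this example makes a simplifying assumption that is not in the slides: the bands are treated as uncorrelated, so each class is described by a per-band mean and variance rather than a full covariance matrix. Function names and sample values are hypothetical.

```python
import math

def class_stats(training):
    """Per-class (means, variances) for each band, using the sample variance."""
    stats = {}
    for label, pixels in training.items():
        n = len(pixels)
        means = [sum(band) / n for band in zip(*pixels)]
        variances = [
            sum((v - mu) ** 2 for v in band) / (n - 1)
            for band, mu in zip(zip(*pixels), means)
        ]
        stats[label] = (means, variances)
    return stats

def log_likelihood(pixel, means, variances):
    """Log of the normal density, summed over bands (bands assumed independent)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        for v, mu, var in zip(pixel, means, variances)
    )

def classify_max_likelihood(pixel, stats):
    """Assign the pixel to the class with the highest likelihood."""
    return max(stats, key=lambda label: log_likelihood(pixel, *stats[label]))

# Hypothetical training pixels (Band x, Band y): u is widely spread, s is compact
training = {
    "u": [(20, 60), (40, 80), (30, 70), (10, 50), (50, 90)],
    "s": [(50, 40), (52, 42), (51, 41), (49, 39), (53, 43)],
}
stats = class_stats(training)
print(classify_max_likelihood((45, 50), stats))
```

With these made-up samples, the pixel (45, 50) is closer to the s mean in Euclidean terms, yet it is assigned to u because it lies many standard deviations away from the compact s class.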
Unsupervised Classification
Unsupervised classification does not use training data as the basis for
classification. Instead, it examines the unknown pixels in an image and aggregates
them into a number of clusters based on the natural groupings present in the image
values. The classification assumes that DN values within a given cover type
should be close together in spectral space, while data in different classes
should be comparatively well separated.
The categories that unsupervised classification identifies are not land cover or
land use classes, but spectral classes or clusters. Some cover types may
encompass multiple clusters. For example, agricultural land may include tobacco,
soy beans, sugar cane, rice, wheat, etc., which look different spectrally.
The analyst must label each of the clusters after unsupervised
classification, using other sources of data.
Unsupervised Classification Algorithms
The K-means approach:
1. Draw a 45-degree line in spectral space and divide it into the
number of classes specified by the user. The center of each segment
(class interval) is the initial mean for that class.
2. Assign each pixel to one of the classes based
on the minimum-distance rule.
3. Recalculate the mean center values for each
cluster and reassign the class membership
of each pixel.
4. Repeat step 3 until a pre-specified percentage
of pixels does not change class membership.
[Figure: Band x versus Band y spectral space showing the 45-degree line, the class intervals, and the mean centers.]
K-means Approach
[Figure: successive iterations of the K-means approach.]
Recalculating the means at each iteration causes the mean points
to migrate to the new locations of the cluster centers.
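The K-means steps can be sketched as below. The sketch assumes 8-bit data (DN range 0 to 255) for placing the initial means along the diagonal; the function names, the `stable_fraction` parameter, and the sample pixels are hypothetical.

```python
import math

def diagonal_seeds(k, n_bands, dn_min=0.0, dn_max=255.0):
    """Step 1: centers of k equal segments along the diagonal of spectral space."""
    step = (dn_max - dn_min) / k
    return [tuple([dn_min + (i + 0.5) * step] * n_bands) for i in range(k)]

def assign(pixels, means):
    """Step 2: index of the nearest mean for every pixel (minimum-distance rule)."""
    return [min(range(len(means)), key=lambda j: math.dist(p, means[j])) for p in pixels]

def update_means(pixels, labels, means):
    """Step 3: recompute each cluster mean; an empty cluster keeps its old mean."""
    new_means = []
    for j, old in enumerate(means):
        members = [p for p, lab in zip(pixels, labels) if lab == j]
        if members:
            new_means.append(tuple(sum(band) / len(members) for band in zip(*members)))
        else:
            new_means.append(old)
    return new_means

def k_means(pixels, k, stable_fraction=0.98, max_iter=50):
    """Step 4: iterate until the given fraction of pixels keeps its cluster."""
    means = diagonal_seeds(k, len(pixels[0]))
    labels = assign(pixels, means)
    for _ in range(max_iter):
        means = update_means(pixels, labels, means)
        new_labels = assign(pixels, means)
        unchanged = sum(a == b for a, b in zip(labels, new_labels)) / len(pixels)
        labels = new_labels
        if unchanged >= stable_fraction:
            break
    return labels, means

# Hypothetical two-band pixels forming a dark group and a bright group
pixels = [(10, 12), (12, 10), (11, 11), (200, 210), (205, 200), (210, 205)]
labels, means = k_means(pixels, k=2)
print(labels, means)
```

The returned labels are cluster indices only. As noted earlier, assigning a land cover name to each cluster remains the analyst's job.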
Nonparametric Classification-Neural Networks
A neural network classifies images in a manner analogous to networks of nerve cells.
It does not assume normality for the training sets; rather, it is
learning based. The neural network learns from the training data
and then assigns class membership
to each pixel in the image.
[Figure: network diagram with an input layer (images), a hidden layer (F2 node), and an output layer (classes).]
Hybrid Classification
Step 1. Collect training sets as in supervised classification.
Step 2. Perform unsupervised classification on the training sets
to get subclasses for each class.
Step 3. Perform supervised classification with the subclasses.
Step 4. Aggregate the spectral subclasses in the classified image
back into the original classes.
The above process is also called guided clustering. The advantage
of the guided clustering is that it allows very different spectral clusters
to represent the same class.
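The four steps can be sketched as below. Two choices here are assumptions rather than part of the slides: a very small k-means splits each class into subclasses, and the minimum-distance rule serves as the supervised classifier in step 3. All names and sample values are hypothetical.

```python
import math

def mean_of(points):
    """Mean vector of a list of pixel vectors."""
    return tuple(sum(band) / len(points) for band in zip(*points))

def nearest(pixel, centers):
    """Index of the center closest to the pixel (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(pixel, centers[j]))

def split_into_subclasses(pixels, k, n_iter=10):
    """Step 2: tiny k-means over one class's training pixels; returns subclass means."""
    centers = list(pixels[:k])  # simplistic seeding: the first k training pixels
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in pixels:
            groups[nearest(p, centers)].append(p)
        centers = [mean_of(g) if g else c for g, c in zip(groups, centers)]
    return centers

def guided_clustering(training, k_per_class):
    """Steps 1-2: subclass means plus the parent class of each subclass."""
    centers, parents = [], []
    for label, pixels in training.items():
        for center in split_into_subclasses(pixels, k_per_class[label]):
            centers.append(center)
            parents.append(label)
    return centers, parents

def classify(pixel, centers, parents):
    """Steps 3-4: pick the nearest subclass, then map it back to its original class."""
    return parents[nearest(pixel, centers)]

# Step 1: hypothetical training sets; agriculture holds two distinct crop signatures
training = {
    "agriculture": [(30, 90), (80, 40), (32, 88), (78, 42)],
    "water": [(10, 10), (12, 8), (11, 9)],
}
centers, parents = guided_clustering(training, {"agriculture": 2, "water": 1})
print(classify((75, 45), centers, parents), classify((33, 85), centers, parents))
```

Both test pixels come back as agriculture even though they sit near different subclass means, which is the point of guided clustering.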
[Figure: diverse spectral clusters within the agriculture class.]
Hybrid classification allows these diverse spectral clusters to be treated as
separate subclasses during classification, which enhances classification accuracy.
Since we already know the class membership of each cluster, it is trivial to put
them back into the same class afterwards.
Post Classification Fine Tuning: Reclassification
1. Take out the trouble classes or clusters (usually done by masking).
2. Perform classification (usually unsupervised classification) to create
new small classes or clusters. Because the majority of the data is masked
out, there is less confusion and cleaner small classes can be generated.
3. Merge the reclassification result with the majority of the data.
4. Sometimes it takes a few iterations to achieve a satisfactory result.
Post Classification Fine Tuning: Map Editing
The last step in classification is map editing, which is often not mentioned. The
following steps may be required.
1. Smoothing: use spatial filters to remove the “salt and pepper” effect that
is caused by isolated pixels in the middle of a large continuous class. This
can be done by running the following filters:
a) mean filter
b) majority filter
c) median filter
d) sieve
2. Hand editing to correct obviously misclassified pixels.
This may be very time consuming, but it improves map accuracy.
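As an illustration of smoothing, here is a minimal 3x3 majority filter on a small classified grid. The strict-majority rule, the edge handling (windows are clipped at the image border), and the sample grid are assumptions chosen to keep the sketch short.

```python
from collections import Counter

def majority_filter(grid):
    """3x3 majority filter over a 2D list of class labels."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input grid is left untouched
    for r in range(rows):
        for c in range(cols):
            # Collect the neighborhood, clipped at the image border
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            label, count = Counter(window).most_common(1)[0]
            # Relabel only when one class holds a strict majority of the window
            if count > len(window) / 2:
                out[r][c] = label
    return out

# Hypothetical classified map with one isolated pixel of class 2
classified = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 1],
    [3, 3, 3, 3],
]
smoothed = majority_filter(classified)
print(smoothed)
```

The isolated class 2 pixel is absorbed into class 1, while the continuous strip of class 3 along the bottom edge is preserved.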
Land use/Land cover Classification
Land cover: the type of physical feature present on the surface of the Earth.
For example, corn fields, lakes, forests, concrete highways.
Land use: the human activity or economic function associated with a
specific piece of land. For example, the land use of agriculture can include
corn, rice, sugar cane, tobacco, orchards, and so on, all of which are
different land cover types.
Knowledge of both land use and land cover can be important for land
planning and land management activities. Ideally, land use and land cover
should be presented on separate maps. In practice, it is often most efficient
to mix the two systems when remote sensing data form the principal data
source for such mapping activities.
Criteria for USGS Landuse/landcover Classification System
1. The minimum interpretation accuracy with remotely sensed data is >=85%.
2. Accuracy of interpretation for the several categories should be about equal.
3. Results should be repeatable from one interpreter to another and from one time of
sensing to another.
4. Applicable to extensive areas.
5. Categorization should permit land use to be inferred from the land cover types.
6. Suitable for use with remotely sensed data obtained at different times of the year.
7. Categories can be divided into more detailed subcategories that can be obtained
from large-scale imagery or ground surveys.
8. Aggregation of categories should be possible.
9. Comparison with future land use and land cover data should be possible.
10. Multiple uses of land should be recognized when possible.
Note: These criteria were set up prior to the widespread use of satellite imagery and computer-aided
classification. While most of the 10 criteria have withstood the test of time, the first two are not
always attainable when mapping land use and land cover over large, complex geographic areas.
USGS Landuse/Landcover Classification System for Use with Remotely Sensed Data
IGBP Land Cover
Value   Description
0       Water Bodies
1       Evergreen Needleleaf Forest
2       Evergreen Broadleaf Forest
3       Deciduous Needleleaf Forest
4       Deciduous Broadleaf Forest
5       Mixed Forest
6       Closed Shrublands
7       Open Shrublands
8       Woody Savannas
9       Savannas
10      Grasslands
11      Permanent Wetlands
12      Croplands
13      Urban and Built-Up
14      Cropland/Natural Vegetation Mosaic
15      Permanent Snow and Ice
16      Barren or Sparsely Vegetated
17      Unclassified