2003 DIP Chap 4
Initial Statistical Extraction and
Image Quality Assessment
Dr. John R. Jensen
Department of Geography
University of South Carolina
Columbia, SC 29208
Jensen, 2003
Image Processing System Considerations
The analyst responsible for analyzing the digital remote sensor
data must first assess its quality. This is normally
performed by:
1. Computing fundamental image statistics and evaluating
them to see if there are any unusual anomalies in the image
data that might be of concern, and
2. performing a subjective evaluation of the appearance of the
remote sensor data.
Image Processing Mathematical Notation
The following notation will be used to describe the
mathematical operations applied to the digital remote sensor
data:
i = a row (or line) in the imagery
j = a column (or sample) in the imagery
k = a band of imagery
l = another band of imagery
n = total number of picture elements (pixels) in an array
BV_ijk = brightness value in row i, column j, of band k
BV_ik = ith brightness value in band k
BV_il = ith brightness value in band l
min_k = minimum value of band k
max_k = maximum value of band k
range_k = range of actual brightness values in band k
quant_k = quantization level of band k (e.g., 2^8 = 0 to 255;
2^12 = 0 to 4095)
µ_k = mean of band k
var_k = variance of band k
s_k = standard deviation of band k
skewness_k = skewness of a band k distribution
kurtosis_k = kurtosis of a band k distribution
cov_kl = covariance between pixel values in two bands,
k and l
r_kl = correlation between pixel values in two bands,
k and l
X_c = measurement vector for class c composed of
brightness values (BV_ijk) from row i, column j, and
band k
M_c = mean vector for class c
M_d = mean vector for class d
µ_ck = mean value of the data in class c, band k
s_ck = standard deviation of the data in class c, band k
v_ckl = covariance matrix of class c for bands k through l;
shown as V_c
v_dkl = covariance matrix of class d for bands k through l;
shown as V_d
Remote Sensing Sampling Theory
A population is an infinite or finite set of elements. An
infinite population could be all possible images that might be
acquired of the Earth in 2001. All Landsat 7 ETM+ images of
Charleston, S.C., acquired in 2001 form a finite population.
A sample is a subset of the elements taken from a population
used to make inferences about certain characteristics of the
population. For example, we might decide to analyze a June
1, 2001, Landsat image of Charleston. If observations with
certain characteristics are systematically excluded from the
sample either deliberately or inadvertently (such as selecting
images obtained only in the spring of the year), it is a biased
sample. Sampling error is the difference between the true
value of a population characteristic and the value of that
characteristic inferred from a sample.
Remote Sensing Sampling Theory
• Large samples drawn randomly from natural populations
usually produce a symmetrical frequency distribution. Most
values are clustered around some central value, and the
frequency of occurrence declines away from this central
point. A graph of the distribution appears bell shaped and is
called a normal distribution.
• Many statistical tests used in the analysis of remotely
sensed data assume that the brightness values recorded in a
scene are normally distributed. Unfortunately, remotely
sensed data may not be normally distributed and the analyst
must be careful to identify such conditions. In such instances,
nonparametric statistical theory may be preferred.
[Figure: Common Symmetric and Skewed Distributions in Remotely Sensed Data]
Remote Sensing Sampling Theory
• The histogram is a useful graphic representation of the
information content of a remotely sensed image.
• It is instructive to review how a histogram of a single band
of imagery, k, composed of i rows and j columns with a
brightness value BVijk at each pixel location is constructed.
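The construction described above can be sketched in a few lines of Python. This is only an illustration: the 3 x 4 band `band_k` and the helper name `band_histogram` are hypothetical and do not come from the slides.

```python
from collections import Counter

def band_histogram(band):
    """Count how often each brightness value occurs in one band.

    `band` is a list of rows; each row is a list of brightness values.
    """
    counts = Counter()
    for row in band:        # i indexes the rows
        for bv in row:      # j indexes the columns
            counts[bv] += 1
    return counts

# Hypothetical 3 x 4 band of 8-bit brightness values (illustrative only).
band_k = [
    [10, 12, 12, 15],
    [10, 12, 15, 15],
    [12, 12, 15, 200],
]

hist = band_histogram(band_k)
for bv in sorted(hist):
    # One '#' per pixel with this brightness value.
    print(f"BV {bv:3d}: {'#' * hist[bv]} ({hist[bv]})")
```

Each bar length is the frequency of one brightness value, so an isolated value such as 200 stands out immediately as a possible anomaly.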
[Figure: Histogram of a Single Band of Landsat Thematic Mapper Data of Charleston, SC]
[Figure: Histogram of Thermal Infrared Imagery of a Thermal Plume in the Savannah River]
Remote Sensing Metadata
Metadata is “data or information about data”. Most quality
digital image processing systems read, collect, and store
metadata about a particular image or sub-image. It is
important that the image analyst have access to this metadata.
In the most fundamental instance, metadata
might include:
the file name, date of last modification, level of quantization
(e.g., 8-bit), number of rows and columns, number of bands,
univariate statistics (minimum, maximum, mean, median,
mode, standard deviation), perhaps some multivariate
statistics, geo-referencing performed (if any), and pixel size.
Viewing Individual Pixels
Viewing individual pixel brightness values in a remotely
sensed image is one of the most useful methods for
assessing the quality and information content of the data.
Virtually all digital image processing systems allow the
analyst to:
1. use a mouse-controlled cursor (cross-hair) to
identify a geographic location in the image (at a
particular row and column or geographic x,y
coordinate) and display its brightness value in n
bands, and
2. display the individual brightness values of an
individual band in a matrix (raster) format.
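The two operations listed above amount to simple array lookups. The sketch below assumes a hypothetical 3-band, 2 x 2 image stored as `image[k][i][j]` (band, row, column); the storage layout and the function names are illustrative assumptions, not part of any particular image processing system.

```python
# Hypothetical 3-band, 2 x 2 image stored as image[k][i][j].
image = [
    [[130, 165], [100, 135]],   # band 1
    [[57, 35], [25, 50]],       # band 2
    [[180, 215], [135, 200]],   # band 3
]

def pixel_values(image, i, j):
    """Return the brightness value at row i, column j in every band."""
    return [band[i][j] for band in image]

def raster_display(band):
    """Format one band as a matrix of right-aligned brightness values."""
    return "\n".join(" ".join(f"{bv:4d}" for bv in row) for row in band)

print(pixel_values(image, 0, 1))   # cursor query at row 0, column 1
print(raster_display(image[0]))    # raster view of the first band
```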
[Figure: Cursor and Raster Display of Brightness Values]
[Figure: Individual Pixel Display of Brightness Values]
[Figure: Raster Display of Brightness Values]
[Figure: Three-Dimensional Evaluation of Pixel Brightness Values within a Geographic Area]
Remote Sensing Univariate Statistics
The mean of a single band of imagery composed of n
brightness values (BVik) is computed using the formula:
$$\mu_k = \frac{\sum_{i=1}^{n} BV_{ik}}{n}$$
The sample mean, µ_k, is an unbiased estimate of the
population mean. For symmetrical distributions, the sample
mean tends to be closer to the population mean than any other
unbiased estimate (such as the median or mode).
Unfortunately, the sample mean is a poor measure of central
tendency when the set of observations is skewed or contains
an extreme value.
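As a rough check, the mean of each band of the five-pixel hypothetical dataset used in this section can be computed directly from the definition. The extra value of 255 appended to Band 2 is an assumption introduced here only to show how one extreme value pulls the mean away from the median.

```python
from statistics import median

# Five-pixel hypothetical dataset from this section, one list per band.
bands = {
    1: [130, 165, 100, 135, 145],
    2: [57, 35, 25, 50, 65],
    3: [180, 215, 135, 200, 205],
    4: [205, 255, 195, 220, 235],
}

def band_mean(values):
    """Mean of a band: sum of brightness values divided by n."""
    return sum(values) / len(values)

means = {k: band_mean(v) for k, v in bands.items()}
print(means)

# Assumed outlier (not in the original data): one saturated pixel.
band2_outlier = bands[2] + [255]
print(band_mean(band2_outlier), median(band2_outlier))
```

With the outlier added, the Band 2 mean rises from 46.4 to about 81.2 while the median only moves to 53.5.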
Sample Hypothetical Dataset of Brightness Values

Pixel   Band 1    Band 2   Band 3            Band 4
        (green)   (red)    (near-infrared)   (near-infrared)
(1,1)   130       57       180               205
(1,2)   165       35       215               255
(1,3)   100       25       135               195
(1,4)   135       50       200               220
(1,5)   145       65       205               235
Univariate Statistics for the Hypothetical Sample Dataset

                           Band 1    Band 2   Band 3            Band 4
                           (green)   (red)    (near-infrared)   (near-infrared)
Mean (µ_k)                 135       46.40    187               222
Variance (var_k)           562.50    264.80   1007.50           570
Standard deviation (s_k)   23.71     16.27    31.74             23.87
Minimum (min_k)            100       25       135               195
Maximum (max_k)            165       65       215               255
Range (BV_r)               65        40       80                60
Remote Sensing Univariate Statistics - Variance
The variance of a sample is the average squared deviation of
all possible observations from the sample mean. The variance
of a band of imagery, vark, is computed using the equation:
$$\mathrm{var}_k = \frac{\sum_{i=1}^{n} (BV_{ik} - \mu_k)^2}{n}$$
The numerator of the expression is the corrected sum of
squares (SS). If the sample mean (µ_k) were actually the
population mean, this would be an accurate measurement of
the variance.
Remote Sensing Univariate Statistics
Unfortunately, there is some underestimation because the
sample mean was calculated in a manner that minimized the
squared deviations about it. Therefore, the denominator of the
variance equation is reduced to n – 1, producing a larger,
unbiased estimate of the sample variance:
$$\mathrm{var}_k = \frac{SS}{n - 1}$$
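The effect of the n – 1 denominator can be checked on Band 1 of the hypothetical sample dataset. This is a minimal sketch; the function name `corrected_sum_of_squares` is an illustrative choice.

```python
def corrected_sum_of_squares(values):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(values) / len(values)
    return sum((bv - m) ** 2 for bv in values)

band1 = [130, 165, 100, 135, 145]   # Band 1 of the sample dataset
n = len(band1)

ss = corrected_sum_of_squares(band1)
var_n = ss / n                  # divides by n (underestimates)
var_n_minus_1 = ss / (n - 1)    # divides by n - 1 (unbiased)

print(ss, var_n, var_n_minus_1)
```

Dividing by n – 1 gives 562.5, the Band 1 variance reported for the sample dataset, whereas dividing by n gives the smaller value 450.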
Remote Sensing Univariate Statistics
The standard deviation is the positive square root of the
variance. The standard deviation of the pixel brightness values
in a band of imagery, sk, is computed as
$$s_k = \sqrt{\mathrm{var}_k}$$
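A short sketch of this computation for Band 1 of the hypothetical sample dataset follows. It assumes the n – 1 form of the variance and compares the result with `statistics.stdev` from the Python standard library, which also uses n – 1.

```python
import math
from statistics import stdev

def band_std(values):
    """Standard deviation as the positive square root of the n - 1 variance."""
    n = len(values)
    m = sum(values) / n
    variance = sum((bv - m) ** 2 for bv in values) / (n - 1)
    return math.sqrt(variance)

band1 = [130, 165, 100, 135, 145]   # Band 1 of the sample dataset
s1 = band_std(band1)
print(round(s1, 3), round(stdev(band1), 3))
```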
Remote Sensing Univariate Statistics
Skewness is a measure of the asymmetry of a histogram and is
computed using the formula
$$\mathrm{skewness}_k = \frac{\sum_{i=1}^{n} \left( \frac{BV_{ik} - \mu_k}{s_k} \right)^3}{n}$$
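The skewness formula can be evaluated directly. The sketch below assumes s_k is the n – 1 standard deviation defined earlier and uses Band 1 of the hypothetical sample dataset; the symmetric list is an added illustrative case.

```python
import math

def skewness(values):
    """Mean of the cubed standardized deviations, following the formula above."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((bv - m) ** 2 for bv in values) / (n - 1))
    return sum(((bv - m) / s) ** 3 for bv in values) / n

band1 = [130, 165, 100, 135, 145]   # Band 1 of the sample dataset
symmetric = [1, 2, 3]               # illustrative symmetric values

print(skewness(band1))      # negative: the low value 100 stretches the left tail
print(skewness(symmetric))  # zero for a symmetric set of values
```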
Remote Sensing Univariate Statistics
A histogram may be symmetric but have a peak that is very
sharp or one that is subdued when compared with a perfectly
normal distribution. A perfectly normal distribution (histogram)
has zero kurtosis. The greater the positive kurtosis value, the
sharper the peak in the distribution when compared with a
normal histogram. Conversely, a negative kurtosis value
suggests that the peak in the histogram is less sharp than that of
a normal distribution. Kurtosis is computed using the formula
$$\mathrm{kurtosis}_k = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{BV_{ik} - \mu_k}{s_k} \right)^4 \right] - 3$$
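A minimal sketch of the kurtosis formula, again assuming s_k is the n – 1 standard deviation and using Band 1 of the hypothetical sample dataset. With only five values the result is illustrative rather than meaningful.

```python
import math

def kurtosis(values):
    """Mean of the fourth-power standardized deviations, minus 3."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((bv - m) ** 2 for bv in values) / (n - 1))
    return sum(((bv - m) / s) ** 4 for bv in values) / n - 3

band1 = [130, 165, 100, 135, 145]   # Band 1 of the sample dataset
print(kurtosis(band1))   # negative: flatter than a normal histogram
```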
Remote Sensing Multivariate Statistics
The different remote-sensing-derived spectral measurements for
each pixel often change together in some predictable fashion. If
there is no relationship between the brightness value in one
band and that of another for a given pixel, the values are
mutually independent; that is, an increase or decrease in one
band’s brightness value is not accompanied by a predictable
change in another band’s brightness value. Because spectral
measurements of individual pixels may not be independent,
some measure of their mutual interaction is needed. This
measure, called the covariance, is the joint variation of two
variables about their common mean.
Remote Sensing Multivariate Statistics
To calculate covariance, we first compute the corrected sum of
products (SP) defined by the equation
$$SP_{kl} = \sum_{i=1}^{n} (BV_{ik} - \mu_k)(BV_{il} - \mu_l)$$
Remote Sensing Multivariate Statistics
It is computationally more efficient to use the following
formula to arrive at the same result:
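$$SP_{kl} = \sum_{i=1}^{n} (BV_{ik} \times BV_{il}) - \frac{\left( \sum_{i=1}^{n} BV_{ik} \right) \left( \sum_{i=1}^{n} BV_{il} \right)}{n}$$
The first term on the right-hand side is called the uncorrected sum of products.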
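Both forms of the sum of products can be compared on Bands 1 and 2 of the hypothetical sample dataset. This is only a sketch; the variable names are illustrative.

```python
band1 = [130, 165, 100, 135, 145]   # Band 1 of the sample dataset
band2 = [57, 35, 25, 50, 65]        # Band 2 of the sample dataset
n = len(band1)

# Definition: sum of products of deviations from each band's mean.
m1 = sum(band1) / n
m2 = sum(band2) / n
sp_direct = sum((a - m1) * (b - m2) for a, b in zip(band1, band2))

# Computational form: uncorrected sum of products minus a correction term.
uncorrected = sum(a * b for a, b in zip(band1, band2))
sp_fast = uncorrected - sum(band1) * sum(band2) / n

print(uncorrected, sp_direct, sp_fast)
```

The computational form needs only running sums, so it can be accumulated in a single pass over the image without first computing the band means.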
Remote Sensing Multivariate Statistics
Just as simple variance was calculated by dividing the corrected
sums of squares (SS) by (n – 1), covariance is calculated by
dividing SP by (n – 1). Therefore, the covariance between
brightness values in bands k and l, covkl, is equal to
$$\mathrm{cov}_{kl} = \frac{SP_{kl}}{n - 1}$$
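Applying this to every pair of bands gives the full variance-covariance matrix. The sketch below uses the four-band hypothetical sample dataset; the nested-list layout of `bands` is an assumption made for illustration.

```python
def covariance(x, y):
    """cov = SP / (n - 1), with SP the corrected sum of products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / (n - 1)

# Hypothetical sample dataset, one list per band (bands 1 to 4).
bands = [
    [130, 165, 100, 135, 145],
    [57, 35, 25, 50, 65],
    [180, 215, 135, 200, 205],
    [205, 255, 195, 220, 235],
]

# Entry [k][l] is the covariance of band k+1 with band l+1;
# the diagonal holds each band's variance.
cov_matrix = [[covariance(bk, bl) for bl in bands] for bk in bands]
for row in cov_matrix:
    print(" ".join(f"{v:9.2f}" for v in row))
```

The matrix is symmetric, which is why only the lower triangle is usually reported.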
Format of a Variance-Covariance Matrix

         Band 1     Band 2     Band 3            Band 4
         (green)    (red)      (near-infrared)   (near-infrared)
Band 1   SS_1       cov_1,2    cov_1,3           cov_1,4
Band 2   cov_2,1    SS_2       cov_2,3           cov_2,4
Band 3   cov_3,1    cov_3,2    SS_3              cov_3,4
Band 4   cov_4,1    cov_4,2    cov_4,3           SS_4
Computation of Variance-Covariance Between
Bands 1 and 2 of the Sample Data

        Band 1   (Band 1 × Band 2)   Band 2
        130      7,410               57
        165      5,775               35
        100      2,500               25
        135      6,750               50
        145      9,425               65
Sum     675      31,860              232

$$SP_{12} = 31{,}860 - \frac{(675)(232)}{5} = 540$$
$$\mathrm{cov}_{12} = \frac{540}{4} = 135$$
Variance-Covariance Matrix of the Sample Data

         Band 1    Band 2   Band 3            Band 4
         (green)   (red)    (near-infrared)   (near-infrared)
Band 1   562.50    -        -                 -
Band 2   135       264.80   -                 -
Band 3   718.75    275.25   1007.50           -
Band 4   537.50    64       663.75            570
Remote Sensing Multivariate Statistics
To estimate the degree of interrelation between variables in a
manner not influenced by measurement units, the correlation
coefficient, r, is commonly used. The correlation between two
bands of remotely sensed data, rkl, is the ratio of their
covariance (covkl) to the product of their standard deviations
(sksl); thus:
$$r_{kl} = \frac{\mathrm{cov}_{kl}}{s_k s_l}$$
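The ratio can be evaluated for pairs of bands in the hypothetical sample dataset. This is a minimal sketch using the n – 1 forms of covariance and standard deviation; the function name `correlation` is an illustrative choice.

```python
import math

def correlation(x, y):
    """r = cov / (s_x * s_y), all computed with n - 1 denominators."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

band1 = [130, 165, 100, 135, 145]   # green
band2 = [57, 35, 25, 50, 65]        # red
band3 = [180, 215, 135, 200, 205]   # near-infrared

r12 = correlation(band1, band2)
r13 = correlation(band1, band3)
print(f"{r12:.4f} {r13:.4f}")
```

Because the n – 1 factors cancel, r is unitless and always lies between -1 and 1, which makes it easier to compare across band pairs than the raw covariance.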
Correlation Matrix for the Sample Data

         Band 1    Band 2   Band 3            Band 4
         (green)   (red)    (near-infrared)   (near-infrared)
Band 1   -         -        -                 -
Band 2   0.35      -        -                 -
Band 3   0.95      0.53     -                 -
Band 4   0.94      0.16     0.87              -
Univariate and Multivariate Statistics of Landsat TM Data of Charleston, SC

Band   Min   Max   Mean         Standard Deviation
1      51    242   65.163137    10.231356
2      17    115   25.797593    5.956048
3      14    131   23.958016    8.469890
4      5     105   26.550666    15.690054
5      0     193   32.014001    24.296417
6      0     128   15.103553    12.738188
7      102   124   110.734372   4.305065

Covariance Matrix
Band   Band 1       Band 2      Band 3       Band 4       Band 5       Band 6       Band 7
1      104.680654   58.797907   82.602381    69.603136    142.947000   94.488082    24.464596
2      58.797907    35.474507   48.644220    45.539546    90.661412    57.877406    14.812886
3      82.602381    48.644220   71.739034    76.954037    149.566052   91.234270    23.827418
4      69.603136    45.539546   76.954037    246.177785   342.523400   157.655947   46.815767
5      142.947000   90.661412   149.566052   342.523400   590.315858   294.019002   82.994241
6      94.488082    57.877406   91.234270    157.655947   294.019002   162.261439   44.674247
7      24.464596    14.812886   23.827418    46.815767    82.994241    44.674247    18.533586

Correlation Matrix
Band   Band 1     Band 2     Band 3     Band 4     Band 5     Band 6     Band 7
1      1.000000   0.964874   0.953195   0.433582   0.575042   0.724997   0.555425
2      0.964874   1.000000   0.964263   0.487311   0.626501   0.762857   0.577699
3      0.953195   0.964263   1.000000   0.579068   0.726797   0.845615   0.653461
4      0.433582   0.487311   0.579068   1.000000   0.898511   0.788821   0.693087
5      0.575042   0.626501   0.726797   0.898511   1.000000   0.950004   0.793462
6      0.724997   0.762857   0.845615   0.788821   0.950004   1.000000   0.814648
7      0.555425   0.577699   0.653461   0.693087   0.793462   0.814648   1.000000
[Figure: 3-Dimensional View of the Thermal Infrared Matrix of Data]
[Figure: Two-dimensional Feature Space Plot of TM Bands 3 and 4]