Lecture Notes


Psych 5500/6500
Measures of Variability
Fall, 2008
Measures of Variability
We will look at three ways of measuring how much
the scores differ from each other.
1. Mean Absolute Deviation
2. Variance
3. Standard Deviation
These approaches are based upon the concept that
you can tell how much the scores differ from
each other by looking at how much they differ
from the mean.
Spatial Analogy
The two figures below represent the location of
students within two rooms. In room A the
variability of location is large; in room B it is small.
Spatial Analogy (cont.)
How might we want to measure the variability of
location? One solution would be to find the
average distance each student is from every
other student. These distances are shown
below.
Spatial Analogy (cont.)
A simpler procedure would be to find the geographic
center of the students, and measure the average
distance each student is from that.
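The two procedures can be compared with a small Python sketch. The student coordinates below are hypothetical (not read from the figures), straight-line (Euclidean) distance is assumed, and the helper names are illustrative. The only point is that both measures come out larger for the spread-out room.

```python
import math
from itertools import combinations

# Hypothetical (x, y) student locations: room A spread out, room B clustered.
room_a = [(0.0, 0.0), (8.0, 1.0), (2.0, 9.0), (9.0, 8.0), (5.0, 4.0)]
room_b = [(4.0, 4.0), (5.0, 4.5), (4.5, 5.0), (5.5, 5.5), (5.0, 5.0)]

def mean_pairwise_distance(points):
    """Average distance between every pair of students."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_distance_to_center(points):
    """Average distance from each student to the geographic center (centroid)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum(math.dist(p, (cx, cy)) for p in points) / len(points)

for name, room in [("Room A", room_a), ("Room B", room_b)]:
    print(name,
          round(mean_pairwise_distance(room), 2),
          round(mean_distance_to_center(room), 2))
```

The center-based version needs only one distance per student instead of one per pair, which is why it is the simpler procedure.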
Variability
In a similar fashion, if we want to know how close
scores are to each other we could measure how
close each score is to every other score. A
simpler approach would be to measure how close
each score is to the mean (which is located in the
center of the scores). If all the scores are similar
in value, they will all be close to the mean; if
the scores differ a great deal, some will be far from
the mean.
Variability: Example 1
Sample: Y = 11, 7, 10, 9, 11, 12
$\bar{Y} = 10$
Step 1: determine how far each score is from the mean.

 Y  -  Ȳ  =  deviation from mean
11  - 10  =   1
 7  - 10  =  -3
10  - 10  =   0
 9  - 10  =  -1
11  - 10  =   1
12  - 10  =   2
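Step 1 can be written as a short Python sketch. The list holds the Example 1 scores; the variable names are illustrative.

```python
# Example 1 scores from the slide.
scores = [11, 7, 10, 9, 11, 12]

mean = sum(scores) / len(scores)          # 60 / 6 = 10.0
deviations = [y - mean for y in scores]   # how far each score is from the mean

for y, d in zip(scores, deviations):
    print(f"{y:>3} - {mean:g} = {d:g}")
```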
Mean (Interesting Property #2)
Given our spatial analogy, it would make sense to add up the
distances (deviations) from the mean and find the average
deviation as our measure of variability. This does not work,
however, because the sum of the deviations from the mean (e.g.,
1 + (-3) + 0 + (-1) + 1 + 2) always equals zero. The mean is
the only value for which this is true (for example, plug any number
other than 10 into the table on the previous slide and you
will not get a sum of deviations equal to zero). The
problem here (the reason we end up with an average
distance of zero) is that we allow both negative
and positive distances from the mean; it would make more
sense not to allow that.
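A quick Python check of this property, using the Example 1 scores. The helper name is hypothetical. Because the sum of (Y - c) equals ΣY - Nc, it is zero only when c = ΣY/N, which is the mean; with ΣY = 60 and N = 6, trying c = 8, 9, 11, 12 gives 12, 6, -6, -12.

```python
scores = [11, 7, 10, 9, 11, 12]   # Example 1, mean = 10

def sum_of_deviations(values, center):
    """Sum of (Y - center) over all scores."""
    return sum(y - center for y in values)

print(sum_of_deviations(scores, 10))   # the mean: 0
for c in (8, 9, 11, 12):               # any other value: nonzero
    print(c, sum_of_deviations(scores, c))
```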
Mean Absolute Deviation
One solution would be to use the absolute value of the
deviations, and find the average (mean) of those
deviations as our measure of variability.
 Y  -  Ȳ  =  deviation  |deviation|
11  - 10  =      1           1
 7  - 10  =     -3           3
10  - 10  =      0           0
 9  - 10  =     -1           1
11  - 10  =      1           1
12  - 10  =      2           2
Mean absolute deviation = (1 + 3 + 0 + 1 + 1 + 2)/6 = 8/6 = 1.33
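The same calculation as a minimal Python sketch on the Example 1 scores. Note that "MAD" is sometimes used elsewhere for the *median* absolute deviation; here it means the mean of the absolute deviations from the mean, as on the slide.

```python
scores = [11, 7, 10, 9, 11, 12]   # Example 1
mean = sum(scores) / len(scores)  # 10.0

# Absolute values stop positive and negative deviations from cancelling out.
abs_devs = [abs(y - mean) for y in scores]
mean_abs_dev = sum(abs_devs) / len(scores)
print(round(mean_abs_dev, 2))  # 1.33
```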
Variance
Another approach, the one used by the ‘variance’, is to
square each deviation and find the average
(mean) of the squared deviations.
 Y  -  Ȳ  =  deviation  deviation²
11  - 10  =      1          1
 7  - 10  =     -3          9
10  - 10  =      0          0
 9  - 10  =     -1          1
11  - 10  =      1          1
12  - 10  =      2          4
Mean squared deviation = (1 + 9 + 0 + 1 + 1 + 4)/6 = 16/6 = 2.67
Variance (defined)
The variance, then, is the average squared distance
each score is from the mean. This is technically
stated as ‘the mean squared deviation from the
mean’. The spatial analogy still applies, but we
square each distance before finding the average.
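"Square each distance, then average" translates directly into a minimal Python sketch on the Example 1 scores (variable names are illustrative):

```python
scores = [11, 7, 10, 9, 11, 12]   # Example 1
mean = sum(scores) / len(scores)  # 10.0

squared_devs = [(y - mean) ** 2 for y in scores]   # 1, 9, 0, 1, 1, 4
variance = sum(squared_devs) / len(scores)          # mean squared deviation = 16 / 6
print(round(variance, 2))  # 2.67
```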
Variance (continued)
The variance is the mean of the squared deviations:
(1 + 9 + 0 + 1 + 1 + 4)/6 = 16/6 = 2.67. If the scores are similar to
each other, the mean squared deviation will be small; if the
scores differ a lot, the mean squared deviation will be larger.
Sum of Squares
To find the variance we sum the squared
deviations and divide by N. That sum of squared
deviations has a name.
 Y  -  Ȳ  =  deviation  deviation²
11  - 10  =      1          1
 7  - 10  =     -3          9
10  - 10  =      0          0
 9  - 10  =     -1          1
11  - 10  =      1          1
12  - 10  =      2          4
“Sum of the squared deviations” = 1 + 9 + 0 + 1 + 1 + 4 = 16
This is usually abbreviated to “Sum of Squares”, or simply
“SS”.
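SS is just "sum the squared deviations," so a function for it is a few lines of Python. The function name `sum_of_squares` is an illustrative choice, not a standard library call.

```python
def sum_of_squares(values):
    """SS: the sum of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)

print(sum_of_squares([11, 7, 10, 9, 11, 12]))  # 16.0 (Example 1)
```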
Variance (computation)
The process we used to compute SS is called the
‘definitional formula’ for SS: $SS = \sum (Y - \bar{Y})^2$
The symbol for the variance of the sample is $S^2$:
$$S^2 = \frac{SS}{N} = \frac{\sum (Y - \bar{Y})^2}{N} = \frac{16}{6} = 2.67$$
Note that because we are summing squared values, there is no way for
SS (or the variance) to be a negative number.
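A minimal Python sketch of $S^2 = SS/N$, followed by a spot check of the non-negativity claim on randomly generated samples. The random data are hypothetical and only illustrate the point; the algebraic reason is that every term $(Y - \bar{Y})^2$ is at least zero.

```python
import random

def sum_of_squares(values):
    """SS = sum of (Y - mean)^2."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)

def variance(values):
    """Sample variance as defined in these notes: S^2 = SS / N."""
    return sum_of_squares(values) / len(values)

print(round(variance([11, 7, 10, 9, 11, 12]), 2))  # 2.67

# Squared terms are never negative, so SS and S^2 are never negative.
random.seed(0)
for _ in range(1000):
    sample = [random.uniform(-100, 100) for _ in range(random.randint(1, 20))]
    assert sum_of_squares(sample) >= 0 and variance(sample) >= 0
```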
N or (N-1) ?
We have defined the variance of our sample as being:
$$S^2 = \frac{SS}{N}$$
You may have encountered a similar, but different, formula for
variance that has (N-1) in the denominator. That is actually
something different, and we will be covering it in the next
lecture. Note that when SPSS gives you ‘variance’ it uses the
(N-1) formula.
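Python's standard-library `statistics` module happens to provide both versions, which makes the difference easy to see on the Example 1 scores: `statistics.pvariance` divides SS by N (the definition used in this lecture), while `statistics.variance` divides by N - 1 (the version the slide says SPSS reports). Why N - 1 is used is left for the next lecture.

```python
import statistics

scores = [11, 7, 10, 9, 11, 12]   # Example 1, SS = 16, N = 6

print(round(statistics.pvariance(scores), 2))  # 2.67  (16 / 6)
print(round(statistics.variance(scores), 2))   # 3.2   (16 / 5)
```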
Variance: Example 2
Let’s look at another sample:
Example 2: Y = 5, 10, 18, 8, 7, 12
Compare that to our first sample where:
Example 1: Y = 11, 7, 10, 9, 11, 12
Note that Example 2 has greater variability among
its scores. Because variance measures variability, the
variance of Example 2 should be greater than the
variance of Example 1.
Example 2 (Cont.)
Y = 5, 10, 18, 8, 7, 12. Again the mean is 10.
 Y  -  Ȳ  =  deviation  deviation²
 5  - 10  =     -5         25
10  - 10  =      0          0
18  - 10  =      8         64
 8  - 10  =     -2          4
 7  - 10  =     -3          9
12  - 10  =      2          4
SS = 106
$$S^2 = \frac{106}{6} = 17.67$$
Note how much the score of 18 added to the SS when its deviation
of 8 was squared.
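The prediction from the previous slide can be checked with a short Python sketch; `variance` here is an illustrative helper implementing $S^2 = SS/N$.

```python
def variance(values):
    """S^2 = SS / N, the definition used in these notes."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values) / len(values)

example1 = [11, 7, 10, 9, 11, 12]
example2 = [5, 10, 18, 8, 7, 12]

print(round(variance(example1), 2))  # 2.67
print(round(variance(example2), 2))  # 17.67
```

Both samples have a mean of 10, so the larger variance of Example 2 comes entirely from its scores being more spread out.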
Effect of scores that are far from the mean
Because the variance is the average squared
distance each score is from the mean,
scores that are far from the mean have a
disproportionate effect on the variance. A
score that is 1 away from the mean adds $1^2 = 1$
to the SS, while a score that is 10 away from
the mean adds $10^2 = 100$ to the SS.
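One way to see the disproportionate effect is to print each score's share of SS for Example 2. In this minimal sketch the single score of 18 contributes 64 of the 106 (about 60%), even though it is only one of six scores.

```python
scores = [5, 10, 18, 8, 7, 12]   # Example 2
mean = sum(scores) / len(scores) # 10.0
squared = [(y - mean) ** 2 for y in scores]
ss = sum(squared)                # 106.0

# Share of SS contributed by each score; squaring magnifies large deviations.
for y, sq in zip(scores, squared):
    print(f"Y = {y:>2}  deviation = {y - mean:>5.1f}  adds {sq:>5.1f}  ({sq / ss:.0%} of SS)")
```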
Mean (Interesting Property #3)
The mean of the scores will give you the
smallest possible ‘sum of squared
deviations’. In other words, if you used any
number other than the mean (10) to compute
SS in the previous examples then the
resulting value of SS would have been
larger.
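A brute-force check of this property in Python: compute SS about many candidate centers and keep the smallest. The candidate grid (8.0 to 12.0 in steps of 0.1) is an arbitrary choice. Algebraically, SS measured about any value c equals the SS about the mean plus N(c - Ȳ)², so any c other than the mean adds a positive amount; for example, SS about 12 is 16 + 6·2² = 40.

```python
scores = [11, 7, 10, 9, 11, 12]   # Example 1, mean = 10

def ss_about(values, center):
    """Sum of squared deviations measured from an arbitrary center."""
    return sum((y - center) ** 2 for y in values)

# Try candidate centers from 8.0 to 12.0 in steps of 0.1.
candidates = [8 + i / 10 for i in range(41)]
best = min(candidates, key=lambda c: ss_about(scores, c))
print(round(best, 1), ss_about(scores, 10))  # 10.0 16
```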
Variance: Example 3
Y = 10, 10, 10, 10, 10, 10. Again the mean is 10.
Note that the scores are identical, so the variance should be
zero.

 Y  -  Ȳ  =  deviation  deviation²
10  - 10  =      0          0
10  - 10  =      0          0
10  - 10  =      0          0
10  - 10  =      0          0
10  - 10  =      0          0
10  - 10  =      0          0
SS = 0
$$S^2 = \frac{0}{6} = 0$$
Formulas for SS
The definitional formula for SS has the advantage of making
it clear exactly what ‘SS’ is: the ‘sum of the squared
deviations’.
Definitional formula: $SS = \sum (Y - \bar{Y})^2$
The definitional formula has the disadvantage of being slow
and cumbersome for large data sets. A much faster way to
compute SS using a calculator is with the computational formula.
Computational formula: $SS = \sum Y^2 - \frac{(\sum Y)^2}{N}$
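To see why the two formulas always give the same SS, expand the square in the definitional formula and then substitute $\bar{Y} = \frac{\sum Y}{N}$:

```latex
\begin{aligned}
SS &= \sum (Y - \bar{Y})^2 = \sum Y^2 - 2\bar{Y}\sum Y + N\bar{Y}^2 \\
   &= \sum Y^2 - 2\frac{(\sum Y)^2}{N} + \frac{(\sum Y)^2}{N} \\
   &= \sum Y^2 - \frac{(\sum Y)^2}{N}
\end{aligned}
```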
Computational Formula
Y = 11, 7, 10, 9, 11, 12
Computational formula: $SS = \sum Y^2 - \frac{(\sum Y)^2}{N}$
$\sum Y^2 = 11^2 + 7^2 + 10^2 + 9^2 + 11^2 + 12^2 = 616$
$\sum Y = 11 + 7 + 10 + 9 + 11 + 12 = 60$
$(\sum Y)^2 = 60^2 = 3600, \quad N = 6$
$$SS = 616 - \frac{3600}{6} = 616 - 600 = 16$$
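A minimal Python sketch that computes SS both ways for Examples 1 and 2; the function names are illustrative. The computational version only needs two running totals (ΣY and ΣY²), which is what makes it fast on a calculator.

```python
def ss_definitional(values):
    """SS = sum of (Y - mean)^2."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)

def ss_computational(values):
    """SS = sum(Y^2) - (sum(Y))^2 / N, using only the totals of Y and Y^2."""
    n = len(values)
    total = sum(values)
    total_sq = sum(y * y for y in values)
    return total_sq - total ** 2 / n

for scores in ([11, 7, 10, 9, 11, 12], [5, 10, 18, 8, 7, 12]):
    print(ss_definitional(scores), ss_computational(scores))
```

One caveat for computer (rather than calculator) use: with floating-point data whose mean is very large relative to their spread, subtracting two nearly equal totals can lose precision, so software often prefers the definitional form or a dedicated algorithm.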
Interpreting Variance
So what does knowing, for example, that a sample
has a variance of ‘10’ tell us about the sample?
Well, it tells us that the average squared distance
each score is from the mean is 10.
The variance also has meaning when it comes to
comparing two samples. If sample A had a
variance of 6 and sample B had a variance of 8,
then the scores in sample B varied more than the
scores in sample A.
And finally, if the variance of a sample equals zero, that
tells us that all the scores were identical.
Standard Deviation
The last measure of variability we will consider
is the standard deviation. It is simply the
positive square root of the variance. Its
symbol is ‘S’.
$$S = \sqrt{S^2}$$
Example 1: $S = \sqrt{2.67} = 1.63$
Example 2: $S = \sqrt{17.67} = 4.20$
Example 3: $S = \sqrt{0} = 0$
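The three standard deviations can be reproduced with Python's `statistics` module. This sketch takes the square root of `statistics.pvariance` (which divides by N, matching these notes) and cross-checks it against `statistics.pstdev`, which computes the same N-denominator standard deviation directly. The rounded results should match the slide (Python prints 4.20 as 4.2).

```python
import math
import statistics

examples = {
    "Example 1": [11, 7, 10, 9, 11, 12],
    "Example 2": [5, 10, 18, 8, 7, 12],
    "Example 3": [10, 10, 10, 10, 10, 10],
}

for name, scores in examples.items():
    s = math.sqrt(statistics.pvariance(scores))    # S = sqrt(S^2), with S^2 = SS / N
    assert math.isclose(s, statistics.pstdev(scores))
    print(name, round(s, 2))
```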
Interpreting Standard Deviation
So what does knowing, for example, that a
sample has a standard deviation of ‘4’ tell
us about the sample? Well, it tells us that
the square root of the average squared
deviation from the mean is ‘4’. As we will
see in a future lecture, knowing the
standard deviation is both interesting and
comprehensible. Until then...
Interpreting Standard Deviation (cont.)
...at least you know that, as with the variance,
if sample A has a standard deviation of 3
and sample B has a standard deviation of
5, then the scores in sample B differed
more than the scores in sample A.
And you know that if a sample has a standard deviation
of zero, all of the scores were identical.