Transcript Chapter 4
Variability
In statistics, our goal is to measure the
amount of variability for a particular set of
scores, that is, a distribution.
If all the scores are the same, there is no variability.
If the differences between scores are small, variability is small.
If the differences between scores are large, variability is large.
Variability
Variability provides a quantitative measure
of the degree to which scores in a
distribution are spread out or clustered
together.
Goal: to describe how spread out the
scores are in a distribution
Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the
Wadsworth Group, a division of Thomson Learning
Figure 4.1
Population distributions of heights and weights
Variability (cont.)
Variability will serve two purposes:
Describe the distribution: are the scores close together or spread out over a large distance?
Measure how well an individual score (or group of scores) represents the entire distribution.
Variability (cont.)
Variability provides information about how
much error to expect when you are using a
sample to represent a population.
Three measures of variability
Range
Interquartile range
Standard deviation
Range
The range is the difference between the
upper real limit of the largest (maximum) X
value and the lower real limit of the
smallest (minimum) X value.
Range is the most obvious way to
describe how spread out the scores are.
Range (cont.)
Problem: Completely determined by the
two extreme values and ignores the other
scores in the distribution.
It often does not give an accurate
description of the variability for the entire
distribution.
Considered a crude and unreliable
measure of variability
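As a rough sketch in Python, the range defined above could be computed as follows. The function name `score_range` and the `precision` parameter are illustrative, not from the slides. The sketch assumes scores are measured to the nearest whole unit, so the real limits lie half a unit beyond each score.

```python
def score_range(scores, precision=1.0):
    """Range = upper real limit of the max minus lower real limit of the min.

    Assumption: scores are measured to the nearest `precision` unit, so the
    real limits sit half a unit above and below each score.
    """
    upper_real_limit = max(scores) + precision / 2
    lower_real_limit = min(scores) - precision / 2
    return upper_real_limit - lower_real_limit


scores = [3, 7, 12, 8, 5]
print(score_range(scores))  # 12.5 - 2.5 = 10.0

# Only the two extreme scores matter: a very different set of middle
# scores gives exactly the same range.
print(score_range([3, 3, 3, 3, 12]))  # 10.0
```

The second call illustrates the problem noted above: the range cannot tell a clustered distribution from a scattered one when the extremes match.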
Interquartile Range and
Semi-Interquartile Range
Divide the distribution into four equal parts
Q1, Q2, Q3
The interquartile range is defined as the
distance between the first quartile and the
third quartile
Interquartile range = Q3 − Q1
Semi-interquartile range = (Q3 − Q1) / 2
The quartiles Q1, Q2, and Q3 divide the distribution into four sections, each containing 25% of the scores.
Figure 4.2
The interquartile range
Interquartile Range (cont.)
When the interquartile range is used to
describe variability, it commonly is
transformed into the semi-interquartile
range.
Semi-interquartile range is one-half of the
interquartile range
Interquartile Range (cont.)
Because the semi-interquartile range is
derived from the middle 50% of a
distribution, it is less likely to be influenced
by extreme scores and therefore gives a
better and more stable measure of
variability than the range.
Interquartile Range (cont.)
Does not take into account distances
between individual scores
Does not give a complete picture of how
scattered or clustered the scores are.
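The interquartile and semi-interquartile range can be sketched in Python as below. Quartiles can be estimated by several conventions; this sketch uses `statistics.quantiles` with `method="inclusive"`, which is an assumption and may differ slightly from the textbook's procedure. The function name `interquartile_range` and the data are illustrative.

```python
import statistics


def interquartile_range(scores):
    """Return (interquartile range, semi-interquartile range)."""
    # Assumption: the "inclusive" quartile convention; other conventions
    # can give slightly different Q1 and Q3 for small data sets.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return iqr, iqr / 2


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(interquartile_range(scores))   # (4.0, 2.0)

# Replacing the largest score with an extreme value changes the range
# a great deal but leaves the interquartile range untouched.
extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]
print(interquartile_range(extreme))  # (4.0, 2.0)
```

Because only the middle 50% of the scores is used, the extreme score of 90 has no effect here.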
Standard Deviation
Most commonly used
Most important measure of variability
Standard deviation uses the mean of the
distribution as a reference point and
measures variability by considering the
distance between each score and the
mean.
Standard Deviation (cont.)
Are the scores clustered or scattered?
Standard deviation approximates the average
distance from the mean.
Standard Deviation (cont.)
Goal of standard deviation is to measure
the standard, or typical, distance from the
mean.
Deviation is the distance and direction
from the mean
deviation score = X − μ
Standard Deviation (cont.)
Step 1
Determine the deviation or distance from the
mean for each individual score.
If μ = 50 and X = 53:
deviation score = X − μ
= 53 − 50
= +3
Standard Deviation (cont.)
If μ = 50 and X = 45:
deviation score = X − μ
= 45 − 50
= −5
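Step 1 can be written as a one-line Python function. The name `deviation_score` is illustrative; the values are the two examples above.

```python
def deviation_score(x, mu):
    """Signed distance of a score from the mean: positive above, negative below."""
    return x - mu


print(deviation_score(53, 50))  # 3  (3 points above the mean)
print(deviation_score(45, 50))  # -5 (5 points below the mean)
```

The sign gives the direction from the mean and the magnitude gives the distance.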
Standard Deviation (cont.)
Step 2: Calculate the mean of the
deviation scores
Add the deviation scores
Divide by N
Standard Deviation (cont.)
X    X − μ
8    +5
1    −2
3     0
0    −3
Deviation scores must add up to zero
Σ(X − μ) = 0
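A short Python check of Step 2, using the four scores 8, 1, 3, 0 from the table above. The variable names are illustrative.

```python
scores = [8, 1, 3, 0]

mu = sum(scores) / len(scores)           # 12 / 4 = 3.0
deviations = [x - mu for x in scores]    # [5.0, -2.0, 0.0, -3.0]

# The positive and negative deviations cancel, so the sum (and therefore
# the mean) of the deviation scores is zero.
mean_deviation = sum(deviations) / len(deviations)

print(deviations)       # [5.0, -2.0, 0.0, -3.0]
print(mean_deviation)   # 0.0
```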
Standard Deviation (cont.)
Step 3: Square each deviation score.
Why? The average of the deviation
scores will not work as a measure of
variability.
Why? They always add up to zero
Standard Deviation (cont.)
Step 3 cont.:
Using the squared values, you can now
compute the mean squared deviation
This is called variance
Variance = mean squared deviation
Standard Deviation (cont.)
By squaring the deviation scores:
You get rid of the + and –
You get a measure of variability based on
squared distances
This is useful for some inferential statistics
Note: This distance is not the best descriptive
measure for variability
Standard Deviation (cont.)
Step 4: Make a correction for squaring the
distances by getting the square root.
Standard deviation = √variance
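The four steps can be put together in a minimal Python sketch. The function name `variance_and_sd` is illustrative, and the sketch treats the scores as a complete population (dividing by N).

```python
import math


def variance_and_sd(scores):
    """Population variance and standard deviation, following Steps 1 to 4."""
    n = len(scores)
    mu = sum(scores) / n

    # Step 1: deviation of each score from the mean.
    deviations = [x - mu for x in scores]
    # Step 2 is skipped: the mean of the deviations is always zero.
    # Step 3: square each deviation and take the mean (the variance).
    variance = sum(d ** 2 for d in deviations) / n
    # Step 4: undo the squaring by taking the square root.
    return variance, math.sqrt(variance)


variance, sd = variance_and_sd([8, 1, 3, 0])
print(variance)       # 9.5
print(round(sd, 2))   # 3.08
```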
Sum of Squared Deviations (SS)
Variance = mean squared deviation = SS / N
Definitional Formula
SS = Σ(X − μ)²
Sum of Squared Deviations (SS)
Definitional Formula
X    X − μ    (X − μ)²
1    −1       1
0    −2       4
6    +4       16
1    −1       1
ΣX = 8
μ = 2
Σ(X − μ)² = 22
Computational Formula
SS = ΣX² − (ΣX)² / N
Computational Formula for SS
X    X²
1    1
0    0
6    36
1    1
ΣX = 8
ΣX² = 38
SS = ΣX² − (ΣX)² / N
= 38 − (8)² / 4
= 38 − 64 / 4
= 38 − 16
= 22
Definitional vs. Computational?
The definitional formula is the most direct way of
calculating the sum of squares.
However, if the mean is not a whole number, the deviation scores
contain decimals and the calculation can become cumbersome.
The computational formula is most commonly used.
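Both formulas for SS can be compared in a short Python sketch, using the scores 1, 0, 6, 1 from the worked example. The function names are illustrative.

```python
def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)


def ss_computational(scores):
    """SS from the sum of scores and the sum of squared scores."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return sum_x_squared - sum_x ** 2 / n


scores = [1, 0, 6, 1]
print(ss_definitional(scores))    # 22.0
print(ss_computational(scores))   # 38 - 64/4 = 22.0
```

The computational version never computes individual deviations, which is why it is easier to apply by hand when the mean is not a whole number.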
Formulas
Variance = SS / N
Standard deviation = √variance = √(SS / N)
Formulas (cont.)
Variance and standard deviation are
parameters of a population and are
identified with the Greek letter σ (sigma).
Population standard deviation = σ = √(SS / N)
Population variance = σ² = SS / N
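As a cross-check, Python's standard library provides `statistics.pvariance` and `statistics.pstdev`, which divide by N as in the population formulas. The sketch below compares them with SS / N for the scores 1, 0, 6, 1 used earlier (SS = 22). The variable names are illustrative.

```python
import math
import statistics

scores = [1, 0, 6, 1]
N = len(scores)
mu = sum(scores) / N
SS = sum((x - mu) ** 2 for x in scores)   # 22.0

sigma_squared = SS / N                    # population variance: 5.5
sigma = math.sqrt(sigma_squared)          # population standard deviation

print(sigma_squared)      # 5.5
print(round(sigma, 3))    # 2.345

# The library functions should agree with the hand formulas.
print(math.isclose(sigma_squared, statistics.pvariance(scores)))  # True
print(math.isclose(sigma, statistics.pstdev(scores)))             # True
```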
Figure 4.4
Graphic presentation of the mean and standard deviation
Figure 4.5
Variability of a sample selected from a population
Figure 4.6
Largest and smallest distance from the mean
Example (pg. 94)
X    X − M    (X − M)²
1    −4       16
6     1       1
4    −1       1
3    −2       4
8     3       9
7     2       4
6     1       1
ΣX = 35
n = 7
M = 35/7 = 5
SS = Σ(X − M)² = 36
Degrees of Freedom
Degrees of freedom are used when computing
sample variance.
With a sample of n scores, the first n − 1
scores are free to vary,
but the final score is restricted because the deviations
from the sample mean must add up to zero.
As a result, the sample is said to have
n − 1 degrees of freedom.
Degrees of Freedom
Degrees of freedom, or df, for sample
variance are defined as
df = n – 1
where n is the number of scores in the
sample.
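A short Python sketch of how degrees of freedom enter the sample calculation, using the example scores 1, 6, 4, 3, 8, 7, 6 (SS = 36, n = 7). It assumes the usual definition of sample variance as SS / df, which these slides do not state explicitly. The variable names are illustrative.

```python
import math
import statistics

sample = [1, 6, 4, 3, 8, 7, 6]
n = len(sample)
M = sum(sample) / n                        # 35 / 7 = 5.0
SS = sum((x - M) ** 2 for x in sample)     # 36.0

df = n - 1                                 # 6 degrees of freedom
# Assumption: sample variance = SS / df (divide by n - 1, not n).
sample_variance = SS / df                  # 6.0
sample_sd = math.sqrt(sample_variance)

print(df)                    # 6
print(sample_variance)       # 6.0
print(round(sample_sd, 2))   # 2.45

# statistics.variance also divides by n - 1, so it should agree.
print(math.isclose(sample_variance, statistics.variance(sample)))  # True
```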
Table 4.2
Reporting the mean and standard deviation in APA format