Transcript Chapter 4

Chapter 4
Variability
Variability

In statistics, our goal is to measure the
amount of variability for a particular set of
scores, a distribution.
If all the scores are the same, there is no variability
If the differences between scores are small, variability is small
If the differences between scores are large, variability is large

Variability
Variability provides a quantitative measure
of the degree to which scores in a
distribution are spread out or clustered
together.
 Goal: to describe how spread out the
scores are in a distribution

Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the
Wadsworth Group, a division of Thomson Learning
Figure 4.1
Population distributions of heights and weights
Variability (cont.)

Variability will serve two purposes

Describe the distribution
Close together
Spread out over a large distance

Measure how well an individual score (or
group of scores) represents the entire
distribution
Variability (cont.)
Variability provides information about how
much error to expect when you are using a
sample to represent a population.
Three measures of variability:

Range
Interquartile range
Standard deviation

Range
The range is the difference between the
upper real limit of the largest (maximum) X
value and the lower real limit of the
smallest (minimum) X value.
 Range is the most obvious way to
describe how spread out the scores are.
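
The definition above can be sketched in Python. This is only an illustration: it assumes the scores are recorded in whole units, so each real limit lies half a unit beyond the score. The function name `range_with_real_limits`, the `precision` parameter, and the data are made up for the example.

```python
def range_with_real_limits(scores, precision=1.0):
    """Range = upper real limit of the maximum X minus lower real limit of the minimum X.

    Assumption: scores are recorded to the nearest `precision` unit, so the
    real limits sit half a unit above and below each recorded score.
    """
    half_unit = precision / 2
    upper_real_limit = max(scores) + half_unit
    lower_real_limit = min(scores) - half_unit
    return upper_real_limit - lower_real_limit


scores = [3, 7, 12, 8, 5]               # made-up example data
print(range_with_real_limits(scores))   # 12.5 - 2.5 = 10.0
```

Only the largest and smallest scores enter the calculation; the other three scores could change without affecting the result.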

Range (cont.)
Problem: Completely determined by the
two extreme values and ignores the other
scores in the distribution.
 It often does not give an accurate
description of the variability for the entire
distribution.
 Considered a crude and unreliable
measure of variability

Interquartile Range and
Semi-Interquartile Range
Divide the distribution into four equal parts
 Q1, Q2, Q3
 The interquartile range is defined as the
distance between the first quartile and the
third quartile

[Figure labels: Q1, Q2, and Q3 divide the distribution into four sections of 25% each; the interquartile range and the semi-interquartile range are marked.]
Figure 4.2
The interquartile range
Interquartile Range (cont.)
When the interquartile range is used to
describe variability, it commonly is
transformed into the semi-interquartile
range.
 Semi-interquartile range is one-half of the
interquartile range
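
A minimal Python sketch of the two quantities follows. It assumes the "inclusive" quartile method of the standard library function `statistics.quantiles` (Python 3.8 or later); other quartile conventions can give slightly different values. The function names and the data are illustrative only.

```python
import statistics


def interquartile_range(scores):
    """Distance between the first quartile (Q1) and the third quartile (Q3)."""
    # Assumption: quartiles are found with the "inclusive" interpolation method.
    q1, _q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1


def semi_interquartile_range(scores):
    """One-half of the interquartile range."""
    return interquartile_range(scores) / 2


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # made-up example data
print(interquartile_range(scores))         # Q3 - Q1 = 7 - 3 = 4.0
print(semi_interquartile_range(scores))    # 4.0 / 2 = 2.0
```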

Interquartile Range (cont.)

Because the semi-interquartile range is
derived from the middle 50% of a
distribution, it is less likely to be influenced
by extreme scores and therefore gives a
better and more stable measure of
variability than the range.
Interquartile Range (cont.)
Does not take into account distances
between individual scores
 Does not give a complete picture of how
scattered or clustered the scores are.

Standard Deviation
Most commonly used
 Most important measure of variability
 Standard deviation uses the mean of the
distribution as a reference point and
measures variability by considering the
distance between each score and the
mean.

Standard Deviation (cont.)
Are the scores clustered or scattered?
A deviation is the distance and direction of a single score from the mean; the standard deviation describes the typical distance.

Standard Deviation (cont.)
Goal of standard deviation is to measure
the standard, or typical, distance from the
mean.
 Deviation is the distance and direction
from the mean
deviation score = $X - \mu$

Standard Deviation (cont.)

Step 1

Determine the deviation or distance from the
mean for each individual score.
If $\mu = 50$ and $X = 53$:
deviation score = $X - \mu$
= 53 - 50
= +3
Standard Deviation (cont.)
If $\mu = 50$ and $X = 45$:
deviation score = $X - \mu$
= 45 - 50
= -5
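
Step 1 can be sketched as a one-line function. The name `deviation_score` is chosen for illustration; the numbers are the two examples worked above.

```python
def deviation_score(x, mu):
    """Signed distance of a score X from the population mean."""
    return x - mu


mu = 50
print(deviation_score(53, mu))   # 3: three points above the mean
print(deviation_score(45, mu))   # -5: five points below the mean
```

The sign gives the direction (above or below the mean) and the absolute value gives the distance.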
Standard Deviation (cont.)

Step 2: Calculate the mean of the deviation scores
Add the deviation scores
Divide by N

Standard Deviation (cont.)
$X$    $X - \mu$
8      +5
1      -2
3       0
0      -3

Deviation scores must add up to zero
$\sum (X - \mu) = 0$
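
The same check can be run in Python as a small illustration, using the four scores from the table above. The variable names are arbitrary.

```python
scores = [8, 1, 3, 0]                    # X values from the table above
mu = sum(scores) / len(scores)           # population mean = 12 / 4 = 3.0
deviations = [x - mu for x in scores]    # Step 1: X - mu for each score

print(deviations)                            # [5.0, -2.0, 0.0, -3.0]
print(sum(deviations))                       # 0.0
print(sum(deviations) / len(deviations))     # Step 2: the mean deviation is also 0.0
```

Because the positive and negative deviations cancel, the mean deviation is zero for any set of scores, which is why it cannot serve as a measure of variability.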
Standard Deviation (cont.)
Step 3: Square each deviation score.
 Why? The average of the deviation
scores will not work as a measure of
variability.
 Why? They always add up to zero

Standard Deviation (cont.)
Step 3 cont.:
 Using the squared values, you can now
compute the mean squared deviation
 This is called variance


Variance = mean squared deviation
Standard Deviation (cont.)

By squaring the deviation scores:
You get rid of the + and –
 You get a measure of variability based on
squared distances
 This is useful for some inferential statistics
 Note: This distance is not the best descriptive
measure for variability

Standard Deviation (cont.)

Step 4: Make a correction for squaring the
distances by getting the square root.

Standard deviation = $\sqrt{\text{variance}}$
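
The four steps can be put together in a short Python sketch. It treats the scores as a complete population (dividing by N); the function name is illustrative, and the data are the four scores used in the deviation table earlier.

```python
import math


def population_standard_deviation(scores):
    """Follow the four steps: deviations, (mean deviation), squares, square root."""
    n = len(scores)
    mu = sum(scores) / n
    deviations = [x - mu for x in scores]          # Step 1: distance from the mean
    # Step 2: the mean of the deviations is always zero, so it is skipped here.
    squared = [d ** 2 for d in deviations]         # Step 3: square each deviation
    variance = sum(squared) / n                    # mean squared deviation
    return math.sqrt(variance)                     # Step 4: undo the squaring


scores = [8, 1, 3, 0]
# Squared deviations are 25, 4, 0, 9, so the variance is 38 / 4 = 9.5.
print(population_standard_deviation(scores))       # sqrt(9.5), roughly 3.08
```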
Sum of Squared Deviations (SS)

Variance = mean squared deviation = $\frac{SS}{N}$
Definitional Formula
$SS = \sum (X - \mu)^2$
Sum of Squared Deviations (SS)
Definitional Formula

$X$    $X - \mu$    $(X - \mu)^2$
1      -1           1
0      -2           4
6      +4           16
1      -1           1

$\sum X = 8$
$\mu = 2$
$SS = \sum (X - \mu)^2 = 22$

Computational Formula
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
Computational Formula for SS

$X$    $X^2$
1      1
0      0
6      36
1      1

$\sum X = 8$
$\sum X^2 = 38$

$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
$= 38 - \frac{(8)^2}{4}$
$= 38 - \frac{64}{4}$
$= 38 - 16$
$= 22$
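
As a quick check that the two formulas agree, here is a small Python sketch applied to the same four scores. The function names are illustrative.

```python
def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)


def ss_computational(scores):
    """SS from the sum of X squared and the squared sum of X."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return sum_x_squared - sum_x ** 2 / n


scores = [1, 0, 6, 1]
print(ss_definitional(scores))    # 1 + 4 + 16 + 1 = 22.0
print(ss_computational(scores))   # 38 - 64/4 = 22.0
```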
Definitional vs. Computational?
The definitional formula is the most direct way of calculating the sum of squares
However, if you have numbers with decimals, it can become cumbersome
The computational formula is most commonly used

Formulas
Variance = $\frac{SS}{N}$
Standard deviation = $\sqrt{\text{variance}} = \sqrt{\frac{SS}{N}}$
Formulas (cont.)

Variance and standard deviation are parameters of a population and will be identified with the Greek letter $\sigma$ (sigma)
Population standard deviation = $\sigma = \sqrt{\frac{SS}{N}}$
Population variance = $\sigma^2 = \frac{SS}{N}$
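
The population formulas can be checked against Python's standard library, as a sketch. The data are the four scores from the SS example (SS = 22, N = 4); `statistics.pvariance` and `statistics.pstdev` are the library's population versions, used here only as a cross-check.

```python
import math
import statistics

scores = [1, 0, 6, 1]
n = len(scores)
mu = sum(scores) / n
ss = sum((x - mu) ** 2 for x in scores)           # SS = 22.0

population_variance = ss / n                      # sigma squared = 22 / 4 = 5.5
population_sd = math.sqrt(population_variance)    # sigma = sqrt(5.5)

# Cross-check the hand calculation against the standard library.
assert math.isclose(population_variance, statistics.pvariance(scores))
assert math.isclose(population_sd, statistics.pstdev(scores))

print(population_variance)        # 5.5
print(round(population_sd, 2))    # 2.35
```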
Figure 4.4
Graphic presentation of the mean and standard deviation
Figure 4.5
Variability of a sample selected from a population
Figure 4.6
Largest and smallest distance from the mean
Example (pg. 94)
$X$    $X - M$    $(X - M)^2$
1      -4         16
6       1         1
4      -1         1
3      -2         4
8       3         9
7       2         4
6       1         1

$\sum X = 35$
$n = 7$
$M = 35/7 = 5$
$SS = \sum (X - M)^2 = 36$
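
The example can be reproduced with a few lines of Python, shown here as an illustrative sketch. The variable names are arbitrary; the data are the seven sample scores above.

```python
sample = [1, 6, 4, 3, 8, 7, 6]             # X values from the example
n = len(sample)                            # n = 7
M = sum(sample) / n                        # sample mean = 35 / 7 = 5.0
deviations = [x - M for x in sample]       # X - M for each score
ss = sum(d ** 2 for d in deviations)       # sum of squared deviations

print(n, M, ss)                            # 7 5.0 36.0
```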
Degrees of Freedom
Degrees of freedom are used when computing sample variance.
With a sample of n scores, the first n - 1 scores are free to vary, but the final score is restricted: once the sample mean is known, the final score is determined by the others.
As a result, the sample is said to have n - 1 degrees of freedom
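
The restriction on the final score can be illustrated in Python. This sketch assumes the sample mean M is already known; the function name `final_score` is made up, and the data are the first six scores of the earlier example (n = 7, M = 5).

```python
def final_score(free_scores, n, M):
    """Return the one value the last score can take.

    With the sample mean fixed at M, the n scores must total n * M,
    so the last score is whatever is left after the first n - 1.
    """
    assert len(free_scores) == n - 1
    return n * M - sum(free_scores)


free_scores = [1, 6, 4, 3, 8, 7]              # first n - 1 = 6 scores, free to vary
print(final_score(free_scores, n=7, M=5))     # 35 - 29 = 6
```

Changing any of the first six scores changes the forced value of the seventh, but the seventh itself can never be chosen freely.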

Degrees of Freedom
Degrees of freedom, or df, for sample
variance are defined as
df = n – 1
where n is the number of scores in the
sample.
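
As a sketch of how df enters the calculation, the standard sample variance divides SS by df = n - 1 rather than by n. That formula is not written out in this transcript, so treat it as an assumption here. The data are the seven scores from the earlier example (SS = 36), and `statistics.variance` is used only as a cross-check.

```python
import math
import statistics

sample = [1, 6, 4, 3, 8, 7, 6]
n = len(sample)
M = sum(sample) / n
ss = sum((x - M) ** 2 for x in sample)      # SS = 36.0

df = n - 1                                  # degrees of freedom = 6
sample_variance = ss / df                   # 36 / 6 = 6.0
sample_sd = math.sqrt(sample_variance)      # sqrt(6)

# The standard library's sample variance also divides by n - 1.
assert math.isclose(sample_variance, statistics.variance(sample))

print(df, sample_variance, round(sample_sd, 2))   # 6 6.0 2.45
```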

Table 4.2
Reporting the mean and standard deviation in APA format