Variability
Introduction to Statistics
Chapter 4
Jan 22, 2009
Class #4
Describing Variability
Variability describes, as an exact quantitative measure, how spread
out or clustered together the scores are
Variability is usually defined in terms of distance
How far apart scores are from each other
How far apart scores are from the mean
How representative a score is of the data set as a whole
Describing Variability: the Range
Simplest and most obvious way of describing
variability
Range = Highest - Lowest
(real limits)
The range only takes into account the two extreme
scores and ignores any values in between. To counter
this, the distribution can be divided into quarters
(quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%
The Interquartile range: the distance between the first and
third quartiles (Q3 – Q1)
The Semi-Interquartile range: one half of the Interquartile
range
Interquartile range (IQR)
The most common percentiles are quartiles. Quartiles divide
data sets into fourths or four equal parts.
• The 1st quartile, denoted Q1, divides the bottom 25% of the
data from the top 75%. Therefore, the 1st quartile is
equivalent to the 25th percentile.
• The 2nd quartile divides the bottom 50% of the data from the
top 50% of the data, so that the 2nd quartile is equivalent to
the 50th percentile, which is equivalent to the median.
• The 3rd quartile divides the bottom 75% of the data from the
top 25% of the data, so that the 3rd quartile is equivalent to
the 75th percentile.
Interquartile range (IQR)
The interquartile range (IQR) is the distance
between the 75th percentile and the 25th
percentile
The IQR is essentially the range of the middle
50% of the data
Because it uses the middle 50%, the IQR is not
affected by outliers (extreme values)
Interquartile range (IQR)
Example:
Compute the interquartile range for the sorted data
18, 33, 58, 67, 73, 93, 147
With n = 7 scores, the 25th and 75th percentiles are at positions
.25*(7+1) = 2 and .75*(7+1) = 6, i.e. the 2nd and 6th
observations, respectively.
IQR = 93 - 33 = 60.
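The worked example above can be checked with a short Python sketch. It assumes the standard-library function statistics.quantiles with its "exclusive" method, which places the p-th percentile at position p*(n+1) as the slide does. Other software may use a different quartile convention and give slightly different answers. The variable names are only illustrative.

```python
import statistics

# Sorted scores from the example
scores = [18, 33, 58, 67, 73, 93, 147]

# method="exclusive" locates each quartile at position p * (n + 1),
# which matches the hand calculation on the slide
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")

iqr = q3 - q1        # interquartile range
semi_iqr = iqr / 2   # semi-interquartile range

print(q1, q3, iqr)   # 33.0 93.0 60.0
```

The semi-interquartile range for these scores is then 60 / 2 = 30.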
Describing Variability:
Deviation in a Population
A more sophisticated measure of variability is one
that shows how scores cluster around the mean
Deviation is the distance of a score from the mean:
X - μ, e.g. 11 - 6.35 = 4.65, 3 - 6.35 = -3.35
A measure representative of the variability of all the
scores would be the mean of the deviation scores:
add all the deviations and divide by N
$\frac{\sum (X - \mu)}{N}$
However, the deviation scores always add up to zero (as the mean
serves as the balance point for the scores)
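A minimal Python sketch of why the mean deviation is not a useful measure. It assumes the 20 scores listed in the variance table of these slides, whose mean is 6.35. The variable names are illustrative.

```python
import math

# The 20 scores from the variance table in these slides (mean = 6.35)
scores = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]

mean = sum(scores) / len(scores)

# Deviation of each score from the mean (positive above, negative below)
deviations = [x - mean for x in scores]

# The positive and negative deviations cancel, so their mean is zero
# (up to floating-point rounding)
mean_deviation = sum(deviations) / len(scores)

print(mean)                                             # 6.35
print(math.isclose(mean_deviation, 0.0, abs_tol=1e-9))  # True
```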
Describing Variability:
Variance in a Population
X      X - μ    (X - μ)²
3      -3.35     11.22
3      -3.35     11.22
4      -2.35      5.52
4      -2.35      5.52
4      -2.35      5.52
5      -1.35      1.82
5      -1.35      1.82
5      -1.35      1.82
6      -0.35      0.12
6      -0.35      0.12
6      -0.35      0.12
6      -0.35      0.12
7       0.65      0.42
7       0.65      0.42
8       1.65      2.72
8       1.65      2.72
9       2.65      7.02
10      3.65     13.32
10      3.65     13.32
11      4.65     21.62
Sum     0       106.55
To remove the +/- signs we
simply square each deviation
before finding the average. This
is called the Variance:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{106.55}{20} = 5.33$
The numerator is referred to as the
Sum of Squares (SS): it is
the sum of the squared deviations
around the mean value
SS is a basic component of
variability
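The table calculation can be reproduced with a few lines of Python. This is a sketch using the 20 scores from the table, with illustrative variable names.

```python
# The 20 scores from the table
scores = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]

N = len(scores)
mu = sum(scores) / N

# Square each deviation so that negative and positive values no longer cancel
squared_deviations = [(x - mu) ** 2 for x in scores]

# Sum of Squares (SS): the numerator of the variance
ss = sum(squared_deviations)

# Population variance: the mean of the squared deviations
population_variance = ss / N

print(round(ss, 2), round(population_variance, 2))  # 106.55 5.33
```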
Variability:
Variance in a Population
let X = [3, 4, 5, 6, 7]
Mean = 5
(X - Mean) = [-2, -1, 0, 1, 2]
subtract Mean from each number in X
(X - Mean)² = [4, 1, 0, 1, 4]
squared deviations from the mean
Σ(X - Mean)² = 10
sum of squared deviations from the mean (SS)
Σ(X - Mean)²/N = 10/5 = 2
average squared deviation from the mean
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Variability:
Variance in a Population
let X = [1, 3, 5, 7, 9]
Mean = 5
(X - Mean) = [-4, -2, 0, 2, 4]
subtract Mean from each number in X
(X - Mean)² = [16, 4, 0, 4, 16]
squared deviations from the mean
Σ(X - Mean)² = 40
sum of squared deviations from the mean (SS)
Σ(X - Mean)²/N = 40/5 = 8
average squared deviation from the mean
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
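The two small data sets [3, 4, 5, 6, 7] and [1, 3, 5, 7, 9] share the same mean but differ in spread. The sketch below repeats the slide steps for both, using a hypothetical helper named population_variance.

```python
def population_variance(xs):
    """Average squared deviation from the mean (SS / N)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    return ss / n

tight = [3, 4, 5, 6, 7]    # scores clustered near the mean of 5
spread = [1, 3, 5, 7, 9]   # same mean, scores twice as far from it

var_tight = population_variance(tight)
var_spread = population_variance(spread)

print(var_tight, var_spread)  # 2.0 8.0
```

Doubling every distance from the mean multiplies the variance by four, because the distances are squared.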
Variability:
Variance in a Population
Variance can be calculated as the sum of
squares (SS) divided by N
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$
Variability: Variance in a Sample
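Variance in a sample:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
n is the number of scores; the denominator is n - 1
SS is the Sum of Squared Deviations From the Mean:
$SS = \sum (X - \bar{X})^2$
So, the sample variance (s²) is approximately the average squared deviation
from the mean (SS is divided by n - 1 rather than n)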
Describing Variability:
Population and Sample Variance
Population variance is designated by σ²
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{SS}{N}$
Sample Variance is designated by s²
Samples tend to be less variable than the populations they come from: dividing
SS by n would therefore give a biased (too small) estimate of population variability
Degrees of Freedom (df): the number of independent (free to
vary) scores. In a sample, the sample mean must be known
before the variance can be calculated, therefore the final score is
dependent on earlier scores: df = n -1
$s^2 = \frac{\sum (X - M)^2}{n - 1} = \frac{SS}{n - 1} = \frac{106.55}{20 - 1} = 5.61$
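A quick check of the population and sample figures for the 20 scores used in these slides. This sketch assumes Python's standard-library statistics module, where pvariance divides SS by N and variance divides SS by n - 1.

```python
import statistics

# The 20 scores used in these slides
scores = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]

df = len(scores) - 1  # degrees of freedom for a sample: n - 1

pop_var = statistics.pvariance(scores)    # SS / N
sample_var = statistics.variance(scores)  # SS / (n - 1)

print(df)                                       # 19
print(round(pop_var, 2), round(sample_var, 2))  # 5.33 5.61
```

Dividing by n - 1 instead of n makes the sample variance slightly larger, which offsets the tendency of samples to understate population variability.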
Describing Variability: the Standard Deviation
Variance is a measure based on squared distances
In order to get around this, we can take the square
root of the variance, which gives us the standard
deviation
Population (σ) and Sample (s) standard deviation
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$
Variability:
Standard Deviation of a Sample
The square root of Variance is called the
Standard Deviation
Variance:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Standard Deviation:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Variability: Standard Deviation
“The Standard Deviation tells us
approximately how far the scores vary from
the mean on average”
It is approximately the average deviation of
scores from the mean
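To see what "approximately the average deviation" means in practice, the sketch below computes the standard deviation of the 20 scores used in these slides, and compares it with the mean absolute deviation. The mean absolute deviation is included only as a comparison and is not a measure these slides define.

```python
import math

# The 20 scores used in these slides
scores = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

population_sd = math.sqrt(ss / n)    # sigma: square root of SS / N
sample_sd = math.sqrt(ss / (n - 1))  # s: square root of SS / (n - 1)

# Comparison only: the literal average distance from the mean
mean_abs_dev = sum(abs(x - mean) for x in scores) / n

print(round(population_sd, 2), round(sample_sd, 2), round(mean_abs_dev, 2))
# 2.31 2.37 1.92
```

The standard deviation is in the same units as the scores, and is close to, but somewhat larger than, the literal average distance from the mean.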
The Standard Deviation and the
Normal Distribution
There are known percentages of scores above or below any given
point on a normal curve
34% of scores between the
mean and 1 SD above or
below the mean
An additional 14% of scores
between 1 and 2 SDs above or
below the mean
Thus, about 96% of all scores
are within 2 SDs of the mean
(34% + 34% + 14% + 14% =
96% with these rounded figures; the more precise value is about 95%)
Note: 34% and 14% figures can
be useful to remember
[Figure: normal curve; vertical axis: Probability Density]
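The 34% and 14% figures can be checked against the standard normal curve. This sketch assumes statistics.NormalDist from the Python standard library (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal: scores expressed in SD units

# Proportion of scores between the mean and 1 SD above it
mean_to_1sd = z.cdf(1) - z.cdf(0)

# Proportion of scores between 1 SD and 2 SDs above the mean
one_to_2sd = z.cdf(2) - z.cdf(1)

# Proportion of scores within 2 SDs of the mean, on either side
within_2sd = z.cdf(2) - z.cdf(-2)

print(round(mean_to_1sd * 100, 1))  # 34.1
print(round(one_to_2sd * 100, 1))   # 13.6
print(round(within_2sd * 100, 1))   # 95.4
```

The rounded 34% and 14% figures sum to 96%, slightly above the more precise 95.4%.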
Describing Variability
The standard deviation is the most common
measure of variability, but the others can be used.
A good measure of variability must be stable and reliable,
i.e. not greatly affected by little details in the data, such as:
Extreme scores
Multiple sampling from the same population
Open-ended distributions
Both the variance and SD are related to other
statistical techniques
SS Computational Formula
Note this formula on page 93. In later
chapters, we will be using this alternate SS
formula.
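The formula itself is not reproduced in this transcript. The sketch below assumes it is the usual computational form, SS = ΣX² - (ΣX)²/N, and checks that it matches the definitional SS for the 20 scores used in these slides. That the textbook's page 93 shows exactly this form is an assumption.

```python
# The 20 scores used in these slides
scores = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]
n = len(scores)

# Definitional formula: sum of squared deviations from the mean
mean = sum(scores) / n
ss_definitional = sum((x - mean) ** 2 for x in scores)

# Computational formula (assumed form): works from the raw scores,
# so the mean and the individual deviations are never needed
sum_x = sum(scores)
sum_x_squared = sum(x ** 2 for x in scores)
ss_computational = sum_x_squared - sum_x ** 2 / n

print(sum_x, sum_x_squared)                                   # 127 913
print(round(ss_definitional, 2), round(ss_computational, 2))  # 106.55 106.55
```

The computational form is convenient when the mean is not a whole number, since it avoids squaring many decimal deviations by hand.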
Credits
http://www.le.ac.uk/pc/sk219/introtostats1.ppt#259,4,Plotting Data:
describing spread of data
http://math.usask.ca/~miket/Sullivan_PP/Chapter_3/sec3_4.ppt#24