Chapter 4: Variability

Variability
• The goal for variability is to obtain a measure of
how spread out the scores are in a distribution.
• A measure of variability usually accompanies a
measure of central tendency as basic descriptive
statistics for a set of scores.
Central Tendency and Variability
• Central tendency describes the central point of
the distribution, and variability describes how the
scores are scattered around that central point.
• Together, central tendency and variability are
the two primary values that are used to describe
a distribution of scores.
Variability
• Variability serves both as a descriptive measure
and as an important component of most
inferential statistics.
• As a descriptive statistic, variability measures
the degree to which the scores are spread out or
clustered together in a distribution.
• In the context of inferential statistics, variability
provides a measure of how accurately any
individual score or sample represents the entire
population.
Variability (cont'd.)
• When the population variability is small, all of the
scores are clustered close together and any
individual score or sample will necessarily
provide a good representation of the entire set.
• On the other hand, when variability is large and
scores are widely spread, it is easy for one or
two extreme scores to give a distorted picture of
the general population.
Measuring Variability
• Variability can be measured with
– The range
– The standard deviation/variance
• In both cases, variability is determined by
measuring distance.
- distance between two scores
- distance between a score and the mean
The Range (p. 106)
• The range is the total distance covered by the
distribution, from the highest score to the lowest
score (using the upper and lower real limits of
the range).
range = URL for Xmax – LRL for Xmin
The Range (cont'd.)
• Alternative definitions of range:
– When scores are whole numbers or discrete
variables with numerical scores, the range tells us
the number of measurement categories.
range = Xmax – Xmin + 1
– Alternatively, the range can be defined as the
difference between the largest score and the
smallest score. (commonly used definition,
especially for discrete variables)
range = Xmax – Xmin
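The three definitions of the range can be compared side by side. This is a minimal sketch, assuming whole-number scores so that the real limits lie 0.5 above and below each score; the data are the scores from Demo 4.1, borrowed here only for illustration, and the variable names are hypothetical.

```python
scores = [10, 7, 6, 10, 6, 15]   # scores from Demo 4.1, used only as an illustration

x_max, x_min = max(scores), min(scores)

# Assumption: whole-number scores, so real limits are 0.5 above/below each score.
range_real_limits = (x_max + 0.5) - (x_min - 0.5)   # URL for Xmax - LRL for Xmin
range_categories = x_max - x_min + 1                # number of measurement categories
range_simple = x_max - x_min                        # largest score minus smallest score

print(range_real_limits, range_categories, range_simple)   # 10.0 10 9
```

For whole-number scores the first two definitions give the same value, and the simple difference is one unit smaller.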
The Standard Deviation
• Standard deviation measures the standard
(average) distance between a score and the
mean.
• The calculation of standard deviation can be
summarized as a four-step process:
Dispersion: Deviation (p. 107-)
• deviation = X – μ
→ First idea: find the mean of the deviations.
→ Not a good measure of variability. Why? Because the mean serves as a balance point for the distribution:
→ Σ(X – μ) = 0 (see p. 107, Example 4.1)
The mean deviation is always zero, no matter what the scores are.
• So we need to get rid of the +/– signs:
→ Sum the squared deviations (but this also squares the unit of measurement, e.g. squared dollars, squared years of age).
→ Take the mean of the squared deviations (a standard/average measure).
→ Take the square root (the unit of measurement is back to normal, e.g. dollars, years of age).
Example 4.1 (p.107)
• N = 4

           X    X-μ
           8     5
           1    -2
           3     0
           0    -3
sum =     12     0
mean =     3
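A short check of the balance-point property, using the four scores of Example 4.1. This is only a sketch; the variable names are illustrative.

```python
scores = [8, 1, 3, 0]                    # Example 4.1, N = 4

mu = sum(scores) / len(scores)           # population mean
deviations = [x - mu for x in scores]    # signed distance of each score from the mean

print(mu)                # 3.0
print(deviations)        # [5.0, -2.0, 0.0, -3.0]
print(sum(deviations))   # 0.0
```

The positive and negative deviations cancel exactly, which is why the mean deviation cannot serve as a measure of variability.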
The Standard Deviation (cont'd.)
1. Compute the deviation (distance from the mean) for each
score.
2. Square each deviation.
3. Compute the mean of the squared deviations. For a
population, this involves summing the squared deviations
(sum of squares, SS) and then dividing by N. The resulting
value is called the variance or mean square and measures
the average squared distance from the mean.
For samples, variance is computed by dividing the sum
of the squared deviations (SS) by n - 1, rather than N.
The value, n - 1, is known as degrees of freedom (df)
and is used so that the sample variance will provide an
unbiased estimate of the population variance.
4. Finally, take the square root of the variance to obtain the
standard deviation.
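The four steps can be sketched as one small function. The function name and the `sample` flag are hypothetical, introduced only for this illustration; the data are the population scores of Example 4.2.

```python
import math

def variance_and_sd(scores, sample=False):
    """Four steps: deviations, squared deviations, mean square, square root."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]          # step 1: deviation of each score
    squared = [d ** 2 for d in deviations]           # step 2: square each deviation
    ss = sum(squared)                                # sum of squares (SS)
    variance = ss / (n - 1) if sample else ss / n    # step 3: divide by N, or n - 1 for a sample
    return variance, math.sqrt(variance)             # step 4: square root gives the SD

# Population from Example 4.2: X = 1, 9, 5, 8, 7
pop_var, pop_sd = variance_and_sd([1, 9, 5, 8, 7])
print(pop_var)   # 8.0
```

Calling the same function with `sample=True` divides SS by n - 1 instead of N.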
Measures of Dispersion
• Range
• Variance
• Standard Deviation
Computing the Variance
Steps in computing the variance:
Step 1: Find the mean.
Step 2: Find the difference between each observation and the mean, and
square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences by the number of items in
the population.
Population Variance
Sum of Squares:
\[ SS = \sum (X - \mu)^2 = \sum X^2 - \frac{(\sum X)^2}{N} \]
Variance = mean squared deviation:
\[ \sigma^2 = \frac{SS}{N} \]
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \mu^2 \]
Example 4.2 (p.109)
• N = 5
• X = 1, 9, 5, 8, 7

           X    X-μ   (X-μ)²
           1    -5      25
           9     3       9
           5    -1       1
           8     2       4
           7     1       1
sum =     30     0      40   ← Sum of Squares (SS)
mean =     6             8   ← Variance
Ex 4 (p.110)
• N = 5, X = 4, 0, 7, 1, 3
• Calculate the variance

           X    X-μ   (X-μ)²
           4     1       1
           0    -3       9
           7     4      16
           1    -2       4
           3     0       0
sum =     15     0      30   ← Sum of Squares (SS)
mean =     3             6   ← Variance
Example 4.3 (p.111)
• N = 4
• X = 1, 0, 6, 1

           X    X-μ   (X-μ)²    X*X
           1    -1       1        1
           0    -2       4        0
           6     4      16       36
           1    -1       1        1
sum =      8     0      22       38   ← Sum of Squares: SS = 22, ΣX² = 38
mean =     2             5.5          ← Variance

\[ SS = \sum X^2 - \frac{(\sum X)^2}{N} = 38 - \frac{8^2}{4} = 38 - 16 = 22 \]
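The definitional and computational formulas for SS can be checked against each other on the scores of Example 4.3. A minimal sketch with illustrative variable names:

```python
scores = [1, 0, 6, 1]   # Example 4.3, N = 4
N = len(scores)
mu = sum(scores) / N

# Definitional formula: sum of squared deviations from the mean
ss_definitional = sum((x - mu) ** 2 for x in scores)

# Computational formula: sum of X squared minus (sum of X) squared over N
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / N

print(ss_definitional)       # 22.0
print(ss_computational)      # 22.0
print(ss_definitional / N)   # 5.5
```

The computational form avoids computing each deviation, which is convenient when the mean is not a whole number.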
Sample Variance
\[ SS = \sum (X - M)^2 = \sum X^2 - \frac{(\sum X)^2}{n} \]
\[ \sum (X - \bar{X})^2 = \sum X^2 - n\bar{X}^2, \qquad \bar{X} = M \]
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right] \]
Sample Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
where:
s² is the sample variance
x is the value of each observation in the sample
x̄ is the mean of the sample
n is the number of observations in the sample
degrees of freedom: df
• Remember: the population mean is a fixed value, but the sample mean changes whenever you collect a new sample!
• Example 4.6 (p.117)
• n = 3, M = 5, X = 2, 9, ?
• The 3rd score is determined by M = ΣX/n: ΣX = nM = 15, so the 3rd score must be 15 – 2 – 9 = 4.
Only n-1 scores are free to vary, i.e. are independent of each other and can take any value.
degrees of freedom = n-1
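The restriction in Example 4.6 can be worked through in a few lines. This is a sketch; the variable names are illustrative.

```python
n, M = 3, 5              # sample size and sample mean from Example 4.6
known_scores = [2, 9]    # the n - 1 scores that were free to vary

# Since M = sum(X) / n, the total must be n * M, which fixes the last score.
last_score = n * M - sum(known_scores)

print(last_score)   # 4
```

Once the mean and the first n - 1 scores are known, the remaining score has no freedom left.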
degrees of freedom: df=1, n=2
\[ X_1 - M = X_1 - \frac{X_1 + X_2}{2} = \frac{X_1 - X_2}{2} = c \]
\[ X_2 - M = X_2 - \frac{X_1 + X_2}{2} = \frac{X_2 - X_1}{2} = -c \]
degrees of freedom: df=1, n=2 (cont.)
\[ SS = \sum_{i=1}^{2} (X_i - M)^2 = c^2 + c^2 \]
\[ s^2 = \frac{SS}{n-1} = c^2 + c^2 = 2c^2 \]
degrees of freedom: df=2, n=3
\[ X_1 - M = X_1 - \frac{X_1 + X_2 + X_3}{3} = \frac{2X_1 - X_2 - X_3}{3} = c_1 \]
\[ X_2 - M = X_2 - \frac{X_1 + X_2 + X_3}{3} = \frac{-X_1 + 2X_2 - X_3}{3} = c_2 \]
\[ X_3 - M = X_3 - \frac{X_1 + X_2 + X_3}{3} = \frac{-X_1 - X_2 + 2X_3}{3} = -(c_1 + c_2) \]
\[ SS = \sum_{i=1}^{3} (X_i - M)^2 = d_1^2 + d_2^2 \]
degrees of freedom: df=2, n=3 (cont.)
• Actually, SS can be reformulated into 2 distinct “distances”:
\[ SS = \sum_{i=1}^{3} (X_i - M)^2 = d_1^2 + d_2^2 \]
\[ d_1 = \frac{c_1 - c_2}{\sqrt{2}} \]
\[ d_2 = \frac{3(c_1 + c_2)}{\sqrt{6}} \]
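The two-distance decomposition can be checked numerically. This sketch assumes d1 = (c1 - c2)/√2 and d2 = 3(c1 + c2)/√6, the reading for which the identity SS = d1² + d2² holds; the sample X = 2, 9, 4 is an illustrative choice with M = 5.

```python
import math

scores = [2, 9, 4]                      # illustrative sample, n = 3
M = sum(scores) / len(scores)           # sample mean, 5.0

# Deviations of the first two scores; the third deviation is -(c1 + c2).
c1, c2 = scores[0] - M, scores[1] - M

ss = sum((x - M) ** 2 for x in scores)  # usual sum of squared deviations

# Assumed form of the two "distances"
d1 = (c1 - c2) / math.sqrt(2)
d2 = 3 * (c1 + c2) / math.sqrt(6)

print(ss)                                     # 26.0
print(math.isclose(ss, d1 ** 2 + d2 ** 2))    # True
```

Three squared deviations collapse into two independent squared distances, matching df = n - 1 = 2.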
degrees of freedom: df=n-1
• with n scores in the sample
• SS can be reformulated into n-1 distinct
“distances”, so “mean” variation has n-1 degrees
of freedom:
\[ SS = \sum_{i=1}^{n} (X_i - M)^2 = d_1^2 + d_2^2 + \dots + d_{n-1}^2 \]
\[ s^2 = \frac{SS}{n-1} = \frac{d_1^2 + d_2^2 + \dots + d_{n-1}^2}{n-1} \]
Example 4.5 (p.116)
• n = 7
• X = 1, 6, 4, 3, 8, 7, 6

           X    X-M   (X-M)²    X*X
           1    -4      16        1
           6     1       1       36
           4    -1       1       16
           3    -2       4        9
           8     3       9       64
           7     2       4       49
           6     1       1       36
sum =     35     0      36      211   ← Sum of Squares: SS = 36, ΣX² = 211
mean =     5             6            ← Variance: s² = SS/(n-1) = 36/6
Ex 2 (p.118)
• N = 5, X = 1, 5, 7, 3, 4
• if instead this is a sample of n=5

           X    X-μ   (X-μ)²    X*X
           1    -3       9        1
           5     1       1       25
           7     3       9       49
           3    -1       1        9
           4     0       0       16
sum =     20     0      20      100   ← Sum of Squares: SS = 20, ΣX² = 100
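The same five scores give different variances depending on whether they are treated as a population or as a sample. A minimal sketch using Python's standard `statistics` module, where `pvariance` divides SS by N and `variance` divides SS by n - 1:

```python
import statistics

scores = [1, 5, 7, 3, 4]   # Ex 2: SS = 20

population_variance = statistics.pvariance(scores)   # SS / N       = 20 / 5
sample_variance = statistics.variance(scores)        # SS / (n - 1) = 20 / 4

print(population_variance, sample_variance)
```

The sample variance is the larger of the two because the same SS is divided by a smaller number.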
Unbiased, Biased
If the average value of a sample statistic over all possible samples = population parameter → unbiased
e.g. we select 100 samples (n=5) → calculate 100 sample means and variances → calculate the average mean (ΣM/100) and average variance (Σs²/100). If
\[ \frac{\sum M}{100} = \mu, \qquad \frac{\sum s^2}{100} = \sigma^2 \]
→ unbiased
Otherwise, the sample statistic is biased.
Example 4.7 (p.120)
• Population: N=6, X = 0, 0, 3, 3, 9, 9
→ μ = 4, σ² = 14
• Select 9 samples with n=2 (sample statistics in Table 4.1)

Sample       M        SS/n     SS/(n-1)
1            0         0          0
2            1.5       2.25       4.5
3            4.5      20.25      40.5
4            1.5       2.25       4.5
5            3         0          0
6            6         9         18
7            4.5      20.25      40.5
8            6         9         18
9            9         0          0
sum =       36        63        126
mean =       4         7         14
          Unbiased   Biased    Unbiased
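The table can be reproduced by enumerating the samples. This sketch assumes the 9 samples are all ordered pairs drawn with replacement from the three distinct score values 0, 3 and 9, which matches the sample means listed in the table; the variable names are illustrative.

```python
from itertools import product

values = [0, 3, 9]   # distinct scores in the population (assumed sampled with replacement)
n = 2

means, biased, unbiased = [], [], []
for sample in product(values, repeat=n):     # all 9 ordered samples of size 2
    M = sum(sample) / n
    ss = sum((x - M) ** 2 for x in sample)
    means.append(M)
    biased.append(ss / n)                    # divides by n
    unbiased.append(ss / (n - 1))            # divides by n - 1

print(sum(means) / len(means))         # 4.0  (equals the population mean)
print(sum(biased) / len(biased))       # 7.0  (underestimates the population variance of 14)
print(sum(unbiased) / len(unbiased))   # 14.0 (equals the population variance)
```

Averaged over all samples, dividing by n - 1 recovers the population variance, while dividing by n gives only half of it in this example.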
Visualize your data set by μ, σ or M, s
• using a histogram or other graphs
• If your score = 85 → is it high enough to get the scholarship?
Properties of the Standard Deviation
• If a constant is added to every score in a
distribution, the standard deviation will not be
changed.
• If you visualize the scores in a frequency
distribution histogram, then adding a constant
will move each score so that the entire
distribution is shifted to a new location.
• The center of the distribution (the mean)
changes, but the standard deviation remains the
same.
\[ E(c + X) = c + \mu, \qquad V(c + X) = V(X) \]
Properties of the Standard Deviation (cont'd.)
• If each score is multiplied by a constant, the
standard deviation will be multiplied by the same
constant.
• Multiplying by a constant will multiply the
distance between scores, and because the
standard deviation is a measure of distance, it
will also be multiplied.
\[ E(cX) = c\mu, \qquad V(cX) = c^2 V(X) \]
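Both properties can be checked on a small data set. This is a sketch using the population scores of Example 4.2 and a hypothetical positive constant c = 3; `statistics.pstdev` is the standard-library population standard deviation.

```python
import math
import statistics

scores = [1, 9, 5, 8, 7]   # population from Example 4.2
c = 3                      # hypothetical positive constant

sd = statistics.pstdev(scores)
sd_shifted = statistics.pstdev([x + c for x in scores])   # add c to every score
sd_scaled = statistics.pstdev([x * c for x in scores])    # multiply every score by c

print(math.isclose(sd_shifted, sd))       # True: adding a constant leaves the SD unchanged
print(math.isclose(sd_scaled, c * sd))    # True: multiplying by c multiplies the SD by c
```

The variance of the scaled scores is c² times the original variance, so its square root scales by c.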
The Mean and Standard Deviation
as Descriptive Statistics
• If you are given numerical values for the mean
and the standard deviation, you should be able
to construct a visual image (or a sketch) of the
distribution of scores.
• As a general rule, about 70% of the scores will
be within one standard deviation of the mean,
and about 95% of the scores will be within a
distance of two standard deviations of the mean.
The Empirical Rule
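For a normal distribution, the proportions behind this rule of thumb can be computed directly. This is a sketch that assumes normally distributed scores; other distribution shapes will give different proportions. `NormalDist` is part of Python's standard `statistics` module.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, standard deviation 1

within_1_sd = z.cdf(1) - z.cdf(-1)   # proportion within one SD of the mean
within_2_sd = z.cdf(2) - z.cdf(-2)   # proportion within two SDs of the mean

print(round(within_1_sd, 3))   # 0.683
print(round(within_2_sd, 3))   # 0.954
```

These values are close to the rounded figures of about 70% and about 95%.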
Example 4.8 (p.124-125)
• Is there any consistent difference between the two treatments?
• Experiment A: 2 sets of data, sd is quite small
• → so ∆M = 5 is distinct and easy to see
• Experiment B: 2 sets of data, sd is quite large
• → ∆M = 5: it is difficult to discern/recognize the difference
• error variance: the result of unsystematic differences (unexplained and uncontrolled differences between scores), e.g. static noise (radio)
Demo 4.1 (p. 129)
• Sample: n=6, X = 10, 7, 6, 10, 6, 15
• Compute the variance and standard deviation.
ΣX = 54, ΣX² = 546
\[ SS = \sum (X - M)^2 = \sum X^2 - \frac{(\sum X)^2}{n} \]
SS = 546 – (54²/6) = 546 – 486 = 60
s² = SS/(n-1) = 60/5 = 12
s = √12 = 3.4641
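The demo can be reproduced with the computational formula and cross-checked against the standard library. A minimal sketch; the variable names are illustrative.

```python
import math
import statistics

scores = [10, 7, 6, 10, 6, 15]   # Demo 4.1 sample, n = 6
n = len(scores)

sum_x = sum(scores)                       # 54
sum_x2 = sum(x ** 2 for x in scores)      # 546
ss = sum_x2 - sum_x ** 2 / n              # 546 - 486 = 60.0
s2 = ss / (n - 1)                         # sample variance, 12.0
s = math.sqrt(s2)                         # sample standard deviation

print(ss, s2, round(s, 4))                # 60.0 12.0 3.4641

# Cross-check with the standard library (both divide SS by n - 1)
print(math.isclose(s, statistics.stdev(scores)))   # True
```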