Center & spread Central tendency & variability
Descriptive Statistics: Overview
Measures of Center
– Mode
– Median
– Mean
Measures of Symmetry
– Skewness
Measures of Spread
– Range
– Inter-quartile Range
– Variance
– Standard deviation
Measures of Position
– Percentile
– Deviation Score
– Z-score
Central tendency
• Seeks to provide a single value that best
represents a distribution
[Figure: frequency distribution, Nightly Hours of Sleep (x-axis) against No. of People (y-axis)]
[Figure: frequency distribution, # of wheels (x-axis) against # of vehicles (y-axis)]
[Figure: frequency distribution, Income in 1,000s (x-axis) against No. of People (y-axis)]
Central tendency
• Seeks to provide a single value that best
represents a distribution
• Typical measures are
– mode
– median
– mean
Mode
• the most frequently occurring score value
• corresponds to the highest point on the frequency
distribution
For a given sample, N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis)]
Mode
• The mode is not sensitive to extreme scores.
For a given sample, N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 50
The mode = 39
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis)]
Mode
• a distribution may have more than one mode
For a given sample, N = 16:
34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40
The modes = 35 and 39
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis)]
Mode
• there may be no unique mode, as in the case of a
rectangular distribution
For a given sample, N = 16:
33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40
No unique mode
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis)]
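The mode slides can be checked with a short Python sketch. It assumes the standard-library `statistics.multimode` function (Python 3.8+), which returns every value tied for the highest frequency; the variable names are illustrative only.

```python
from statistics import multimode

# The N = 16 samples listed on the mode slides.
one_mode = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
two_modes = [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40]
rectangular = [33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40]

# multimode lists all values that share the highest frequency.
modes_one = multimode(one_mode)
modes_two = multimode(two_modes)
modes_flat = multimode(rectangular)

# Replacing the extreme score 45 with 50 does not move the mode.
modes_extreme = multimode(one_mode[:-1] + [50])

print(modes_one, modes_two, modes_flat, modes_extreme)
```

For the rectangular sample every score is returned, which is one way of seeing that there is no unique mode.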
Median
• the score value that cuts the distribution in half
(the “middle” score)
• 50th percentile
For N = 15 the median is the eighth score = 37
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis)]
Median
For N = 16 the median is the average of the eighth and ninth scores = 37.5
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis)]
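A minimal sketch of the odd-N and even-N cases. The scores below are hypothetical (the data behind the slide's histograms are not given); they were chosen so that the medians come out to 37 and 37.5, as on the slides.

```python
from statistics import median

# Hypothetical sorted scores, NOT the data behind the slide's figures.
odd_sample = [33, 34, 35, 35, 36, 36, 37, 37, 38, 38, 38, 39, 39, 40, 40]  # N = 15
even_sample = odd_sample + [40]                                            # N = 16

# Odd N: the middle (eighth) score.
median_odd = median(odd_sample)

# Even N: the average of the two middle (eighth and ninth) scores.
median_even = median(even_sample)

print(median_odd, median_even)
```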
Mean
• this is what people usually have in mind when they
say “average”
• the sum of the scores divided by the number of
scores
For a sample: $\bar{X} = \dfrac{\sum X}{n}$
For a population: $\mu = \dfrac{\sum X}{N}$
Changing the value of a single score may not affect the mode
or median, but it will affect the mean.
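As a quick check of this claim, the sketch below reuses the N = 16 sample from the mode slides and changes only its largest score from 45 to 50. The variable names are illustrative.

```python
from statistics import mean, median, multimode

original = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
changed = original[:-1] + [50]   # only the largest score is altered

# The mode and median stay where they were ...
same_mode = multimode(original) == multimode(changed)
same_median = median(original) == median(changed)

# ... but the mean moves, because every score enters the sum.
mean_original = mean(original)
mean_changed = mean(changed)

print(same_mode, same_median, mean_original, mean_changed)
```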
Mean
In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter.
In order for the mean to be meaningful, the variable of interest must be measured on an interval scale.
[Figure: frequency distribution, Nightly Hours of Sleep (x-axis) against No. of People (y-axis), $\bar{X} = 7.07$]
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis), $\bar{X} = 2.4$]
Mean
The mean is sensitive to extreme scores and is appropriate for more symmetrical distributions.
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis), $\bar{X} = 36.8$]
[Figure: frequency distribution, Score (x-axis) against Frequency (y-axis), $\bar{X} = 36.5$]
[Figure: frequency distribution, Income in 1,000s (x-axis) against No. of People (y-axis), $\bar{X} = 93.2$]
Symmetry
• a symmetrical distribution exhibits no skewness
• in a symmetrical distribution with a single mode, the mean = median = mode
[Figure: frequency distribution, Nightly Hours of Sleep (x-axis) against No. of People (y-axis)]
Skewed distributions
• Skewness refers to the asymmetry of the distribution
• A positively skewed distribution is asymmetrical, with its longer tail pointing in the positive direction.
• mode < median < mean
[Figure: frequency distribution, Income in 1,000s (x-axis) against No. of People (y-axis), with the mode, median, and mean marked]
Mode = $70,000
Median = $88,700
Mean = $93,600
Skewed distributions
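• A negatively skewed distribution is asymmetrical, with its longer tail pointing in the negative direction.
• mode > median > mean
[Figure: frequency distribution, Test score (x-axis) against No. of People (y-axis), with the mean, median, and mode marked]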
Measures of central tendency
Mode
+ quick & easy to compute
+ useful for nominal data
- poor sampling stability
Median
+ not affected by extreme scores
- somewhat poor sampling stability
Mean
+ sampling stability
+ related to variance
- inappropriate for discrete data
- affected by skewed distributions
Distributions
• Center: mode, median, mean
• Shape: symmetrical, skewed
• Spread
[Figure: frequency distribution, Scores (x-axis) against # of People (y-axis)]
Measures of Spread
• the dispersion of scores from the center
• a distribution of scores is highly variable if the scores
differ wildly from one another
• Three statistics to measure variability
– range
– interquartile range
– variance
Range
• largest score minus the smallest score
• says nothing about how scores vary around the center
• greatly affected by extreme scores (defined by them)
• the two distributions in the figure have the same range (80), but their spreads look different
[Figure: two frequency distributions, Scores (x-axis) against # of People (y-axis)]
Interquartile range
• the distance between the 25th percentile and the 75th percentile
• for the two distributions in the figure:
– Q3-Q1 = 70 - 30 = 40
– Q3-Q1 = 52.5 - 47.5 = 5
• effectively ignores the top and bottom quarters, so extreme scores are not influential
• dismisses 50% of the distribution
[Figure: two frequency distributions, Scores (x-axis) against # of People (y-axis)]
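The contrast between the range and the interquartile range can be sketched in Python. The two samples below are hypothetical (the data behind the slide's figure are not given), and quartile conventions differ between textbooks and software; this sketch assumes the `method="inclusive"` option of `statistics.quantiles`.

```python
from statistics import quantiles

def value_range(scores):
    """Largest score minus the smallest score."""
    return max(scores) - min(scores)

def iqr(scores):
    """Q3 - Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

# Hypothetical samples: same extremes, very different middles.
wide = [10, 20, 30, 40, 50, 60, 70, 80, 90]
narrow = [10, 47, 48, 49, 50, 51, 52, 53, 90]

print(value_range(wide), value_range(narrow))  # identical ranges
print(iqr(wide), iqr(narrow))                  # very different IQRs
```

Both samples share the same two extreme scores, so the range cannot tell them apart, while the IQR reflects only the middle half of each sample.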
Deviation measures
• Might be better to see how much scores differ from the center of the distribution, using distance
• Scores further from the mean have larger deviation scores (in absolute value)

            Score   Deviation
Amy         10      -40
Theo        20      -30
Max         30      -20
Henry       40      -10
Leticia     50      0
Charlotte   60      10
Pedro       70      20
Tricia      80      30
Lulu        90      40
AVERAGE     50
Deviation measures
• To see how ‘deviant’ the distribution is relative to another, we could sum these scores
• But this would leave us with a big fat zero, because the negative and positive deviations cancel
SUM of deviations: (-40) + (-30) + (-20) + (-10) + 0 + 10 + 20 + 30 + 40 = 0
Deviation measures
So we use squared deviations from the mean
This is the sum of squares (SS)
$SS = \sum (X - \bar{X})^2$

            Score   Deviation   Sq. Deviation
Amy         10      -40         1600
Theo        20      -30         900
Max         30      -20         400
Henry       40      -10         100
Leticia     50      0           0
Charlotte   60      10          100
Pedro       70      20          400
Tricia      80      30          900
Lulu        90      40          1600
SUM                 0           6000
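The sum-of-squares calculation for the nine scores can be reproduced in a few lines of Python; the variable names are illustrative.

```python
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # Amy through Lulu

mean_score = sum(scores) / len(scores)

# Deviation score: distance of each score from the mean.
deviations = [x - mean_score for x in scores]

# Raw deviations cancel out, so their sum carries no information about spread.
sum_of_deviations = sum(deviations)

# Squaring removes the sign; the total is the sum of squares (SS).
sum_of_squares = sum(d ** 2 for d in deviations)

print(mean_score, sum_of_deviations, sum_of_squares)
```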
Variance
We take the “average” squared deviation from the mean and call it VARIANCE
For a population: $\sigma^2 = \dfrac{SS}{N}$
For a sample: $s^2 = \dfrac{SS}{n-1}$
(to correct for the fact that sample variance tends to underestimate population variance)
Variance
1. Find the mean.
2. Subtract the mean from every score.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the SS by N (population) or n - 1 (sample).
For the nine scores from Amy (10) to Lulu (90): mean = 50, sum of deviations = 0, SS = 6000, so the sample variance is 6000/8 = 750.
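The five steps can be compared against Python's standard library, which offers both denominators: `statistics.pvariance` divides by N and `statistics.variance` divides by n - 1. This is a sketch for checking the arithmetic, not a prescribed method.

```python
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# Steps 1-4 by hand: mean, deviations, squares, sum.
mean_score = sum(scores) / len(scores)
ss = sum((x - mean_score) ** 2 for x in scores)

# Step 5, both ways.
population_variance = ss / len(scores)        # SS / N
sample_variance = ss / (len(scores) - 1)      # SS / (n - 1)

# The library functions should agree with the hand calculation.
print(population_variance, pvariance(scores))
print(sample_variance, variance(scores))
```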
Standard deviation
The standard deviation is the square root of the variance
$s = \sqrt{s^2} = \sqrt{\dfrac{SS}{n-1}}$
The standard deviation measures spread in the original
units of measurement, while the variance does so in units
squared.
Variance is good for inferential stats.
Standard deviation is nice for descriptive stats.
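A small sketch of the relationship between the two statistics, assuming the nine scores 10 through 90 used for the variance calculation. `statistics.stdev` uses the n - 1 denominator, so it should equal the square root of the sample variance.

```python
import math
from statistics import stdev, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

sample_variance = variance(scores)      # in squared score units
sample_sd = stdev(scores)               # back in the original score units

# The standard deviation is the square root of the variance.
sd_from_variance = math.sqrt(sample_variance)

print(sample_variance, round(sample_sd, 2), round(sd_from_variance, 2))
```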
Example
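Two distributions with the same N and the same mean but different spread:
• N = 28, $\bar{X}$ = 50, $s^2$ = 555.55, s = 23.57
• N = 28, $\bar{X}$ = 50, $s^2$ = 140.74, s = 11.86
[Figure: two frequency distributions, Scores (x-axis) against # of People (y-axis)]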
Descriptive Statistics: Quick Review
Measures of Center
– Mode
– Median
– Mean
Measures of Symmetry
– Skewness
Measures of Spread
– Range
– Inter-quartile Range
– Variance
– Standard deviation
Descriptive Statistics: Quick Review
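Mean
– For a population: $\mu = \dfrac{\sum X}{N}$
– For a sample: $\bar{X} = \dfrac{\sum X}{n}$
Variance
– For a population: $\sigma^2 = \dfrac{SS}{N}$
– For a sample: $s^2 = \dfrac{SS}{n-1}$
Standard Deviation
– For a population: $\sigma = \sqrt{\sigma^2}$
– For a sample: $s = \sqrt{s^2}$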
Exercise
1
2
3
4
5
• Treat this little distribution as a sample and calculate:
– Mode, median, mean
– Range, variance, standard deviation
Measures of Position
How to describe a data
point in relation to its
distribution
Measures of Position
Quantile
Deviation Score
Z-score
Quantiles
Quartile
Divides ranked scores into four equal parts
(minimum) | 25% | 25% | (median) | 25% | 25% | (maximum)
Quantiles
Decile
Divides ranked scores into ten equal parts
10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10% | 10%
Quantiles
Percentile rank
Divides ranked scores into 100 equal parts
Percentile rank of score $x$ = $\dfrac{\text{number of scores less than } x}{\text{total number of scores}} \times 100$
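The percentile-rank formula translates directly into a short function. The function name and the sample are illustrative; the sample reuses the nine scores 10 through 90 from the deviation slides.

```python
def percentile_rank(x, scores):
    """Percentage of scores strictly less than x."""
    below = sum(1 for s in scores if s < x)
    return below / len(scores) * 100

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

rank_of_lowest = percentile_rank(10, scores)   # no scores lie below the minimum
rank_of_middle = percentile_rank(50, scores)   # 4 of the 9 scores lie below 50

print(rank_of_lowest, round(rank_of_middle, 1))
```

Because only scores strictly below x are counted, the middle score of nine lands just under the 50th percentile under this definition.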
Deviation Scores
For a population: deviation = $X - \mu$
For a sample: deviation = $X - \bar{X}$

            Score   Deviation
Amy         10      -40
Theo        20      -30
Max         30      -20
Henry       40      -10
Leticia     50      0
Charlotte   60      10
Pedro       70      20
Tricia      80      30
Lulu        90      40
Average     50
• What if we want to compare scores from distributions that have different means and standard deviations?
• Example
– Nine students' scores on two different tests
– Tests scored on different scales
Nine Students on Two Tests
            Test 1   Test 2
Amy         10       1
Theo        20       2
Max         30       3
Henry       40       4
Leticia     50       5
Charlotte   60       6
Pedro       70       7
Tricia      80       8
Lulu        90       9
Average     50       5
Nine Students on Two Tests
            Test 1   Test 2   Deviation Score 1   Deviation Score 2
Amy         10       1        -40                 -4
Theo        20       2        -30                 -3
Max         30       3        -20                 -2
Henry       40       4        -10                 -1
Leticia     50       5        0                   0
Charlotte   60       6        10                  1
Pedro       70       7        20                  2
Tricia      80       8        30                  3
Lulu        90       9        40                  4
Average     50       5
Z-Scores
• Z-scores modify a distribution so that it is
centered on 0 with a standard deviation of 1
• Subtract the mean from a score, then divide
by the standard deviation
For a population: $z = \dfrac{X - \mu}{\sigma}$
For a sample: $z = \dfrac{X - \bar{X}}{s}$
Z-Scores
            Test 1   Test 2   Z-Score 1   Z-Score 2
Amy         10       1        -1.5        -1.5
Theo        20       2        -1.2        -1.2
Max         30       3        -.77        -.77
Henry       40       4        -.39        -.39
Leticia     50       5        0           0
Charlotte   60       6        .39         .39
Pedro       70       7        .77         .77
Tricia      80       8        1.2         1.2
Lulu        90       9        1.5         1.5
Average     50       5        0           0
St Dev      25.8     2.58     1           1
(The St Dev row divides SS by N = 9: $\sqrt{6000/9} \approx 25.8$ and $\sqrt{60/9} \approx 2.58$.)
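The two-test comparison can be sketched in Python. This sketch assumes the population standard deviation (`statistics.pstdev`, dividing by N), since that is what reproduces the St Dev values of about 25.8 and 2.58; the helper name `z_scores` is illustrative.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """(X - mean) / standard deviation, using the N denominator (an assumption)."""
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]

test1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
test2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

z1 = z_scores(test1)
z2 = z_scores(test2)

# Each student has the same z-score on both tests despite the different scales.
for a, b in zip(z1, z2):
    print(round(a, 2), round(b, 2))
```

The z-scores themselves have a mean of 0 and a standard deviation of 1, which is what makes scores from differently scaled tests comparable.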
Z-Scores
A distribution of Z-scores…
• Always has a mean of zero
• Always has a standard deviation of 1
• Converting to standard or z scores does not change the shape
of the distribution: z scores cannot normalize a non-normal
distribution
A Z-score is interpreted as “number of standard
deviations above/below the mean”
Exercise
On their third test, the class average was 45 and the
standard deviation was 6. Fill in the rest.
            Test 3   Z-Score
Amy         52
Theo        39
Max                  -1.5
Henry                1.3
Descriptive Statistics: Quick Review
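Mean
– For a population: $\mu = \dfrac{\sum X}{N}$
– For a sample: $\bar{X} = \dfrac{\sum X}{n}$
Variance
– For a population: $\sigma^2 = \dfrac{SS}{N}$
– For a sample: $s^2 = \dfrac{SS}{n-1}$
Standard Deviation
– For a population: $\sigma = \sqrt{\sigma^2}$
– For a sample: $s = \sqrt{s^2}$
Z-score
– For a population: $z = \dfrac{X - \mu}{\sigma}$
– For a sample: $z = \dfrac{X - \bar{X}}{s}$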
Messing with Units
If you add or subtract a constant from each value in a distribution, then
• the mean is increased/decreased by that amount
• the standard deviation is unchanged
• the z-scores are unchanged
If you multiply or divide each value in a distribution by a positive constant, then
• the mean is multiplied/divided by that amount
• the standard deviation is multiplied/divided by that amount
• the z-scores are unchanged
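These rules can be verified numerically. The sketch below uses the eight scores from the slides' example (mean 6) and assumes the population standard deviation (`statistics.pstdev`); the helper name `z_scores` is illustrative.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """(X - mean) / standard deviation, using the N denominator (an assumption)."""
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]

scores = [5, 3, 5, 7, 7, 8, 4, 9]
shifted = [x + 1 for x in scores]    # add a constant to every value
scaled = [x * 10 for x in scores]    # multiply every value by a positive constant

for label, data in [("original", scores), ("plus 1", shifted), ("times 10", scaled)]:
    rounded_z = [round(z, 2) for z in z_scores(data)]
    print(label, mean(data), round(pstdev(data), 2), rounded_z)
```

Adding 1 moves the mean from 6 to 7 and leaves the standard deviation alone; multiplying by 10 scales both; the z-scores are the same in all three cases.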
Example
            Score   Dev’s   Sq dev   Z-score
Theo        5       -1      1        -.5
Max         3       -3      9        -1.5
Henry       5       -1      1        -.5
Leticia     7       1       1        .5
Charlotte   7       1       1        .5
Pedro       8       2       4        1.0
Tricia      4       -2      4        -1.0
Lulu        9       3       9        1.5
MEAN        6
STDEV       1.94
Adding 1
            Score   Dev’s   Sq dev   Z-score
Theo        6       -1      1        -.5
Max         4       -3      9        -1.5
Henry       6       -1      1        -.5
Leticia     8       1       1        .5
Charlotte   8       1       1        .5
Pedro       9       2       4        1.0
Tricia      5       -2      4        -1.0
Lulu        10      3       9        1.5
MEAN        7
STDEV       1.94
Multiplying by 10
            Score   Dev’s   Sq dev   Z-score
Theo        50      -10     100      -.5
Max         30      -30     900      -1.5
Henry       50      -10     100      -.5
Leticia     70      10      100      .5
Charlotte   70      10      100      .5
Pedro       80      20      400      1.0
Tricia      40      -20     400      -1.0
Lulu        90      30      900      1.5
MEAN        60
STDEV       19.4
Other Standardized Distributions
The Z distribution is not the only standardized distribution.
You can easily create others (it’s just messing with units,
really).
Other Standardized Distributions
Example:
Let’s change these test scores into ETS type scores (mean 500, stdev 100)

            Score
Theo        5
Max         3
Henry       5
Leticia     7
Charlotte   7
Pedro       8
Tricia      4
Lulu        9
Average     6
St Dev      1.94
Other Standardized Distributions
Here’s How:
1. Convert to Z scores
2. Multiply by 100 to increase the st dev
3. Add 500 to increase the mean

            Score   Z-Score   ETS type score
Theo        5       -.5       450
Max         3       -1.5      350
Henry       5       -.5       450
Leticia     7       .5        550
Charlotte   7       .5        550
Pedro       8       1.0       600
Tricia      4       -1.0      400
Lulu        9       1.5       650
Average     6       0         500
St Dev      1.94    1         100
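The same three steps can be written as one small function. This is a sketch: the name `rescale` is hypothetical, and it assumes the population standard deviation (`statistics.pstdev`). It works with unrounded z-scores, so individual results differ slightly from a table that rounds z to one decimal place first (for example, about 448 rather than 450 for a score of 5).

```python
from statistics import mean, pstdev

def rescale(scores, new_mean, new_sd):
    """Convert to z-scores, stretch to new_sd, then shift to new_mean."""
    m, sd = mean(scores), pstdev(scores)
    return [new_mean + new_sd * (x - m) / sd for x in scores]

scores = [5, 3, 5, 7, 7, 8, 4, 9]

# ETS-type scale: mean 500, standard deviation 100.
ets = rescale(scores, new_mean=500, new_sd=100)

print([round(v) for v in ets])
print(round(mean(ets)), round(pstdev(ets)))
```

The same function would cover any other standardized scale by passing a different target mean and standard deviation.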
Exercise
            Score   Percentile   Deviation Score   Z-Score   IQ type score (Mean 100, Stdev 10)
Theo        20
Max         18
Henry       13
Leticia     17
Charlotte   19
Pedro       16
Tricia      11
Lulu        9