Transcript File

Descriptive Statistics: Overview
• Measures of Center
– Mode
– Median
– Mean
• Measures of Symmetry
– Skewness
• Measures of Spread
– Range
– Inter-quartile Range
– Variance
– Standard deviation
• Measures of Position
– Percentile
– Deviation Score
– Z-score
Central tendency
• Seeks to provide a single value that best
represents a distribution
Central tendency
[Figure: histogram of Nightly Hours of Sleep (x-axis, 3.5 to 11.5) against No. of People (y-axis)]
Central tendency
[Figure: histogram of # of wheels (x-axis, 0 to 6) against # of vehicles (y-axis)]
Central tendency
[Figure: histogram of Income in 1,000s (x-axis) against No. of People (y-axis)]
Central tendency
• Seeks to provide a single value that best
represents a distribution
• Typical measures are
– mode
– median
– mean
Mode
• the most frequently occurring score value
• corresponds to the highest point on the frequency
distribution
For a given sample, N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39
[Figure: frequency histogram of Score (x-axis, 33 to 45) against Frequency (y-axis)]
Mode
• The mode is not sensitive to extreme scores.
For a given sample, N = 16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 50
The mode = 39
[Figure: frequency histogram of Score (x-axis, 33 to 50) against Frequency (y-axis)]
Mode
• a distribution may have more than one mode
For a given sample, N = 16:
34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40
The modes = 35 and 39
[Figure: frequency histogram of Score (x-axis, 33 to 40) against Frequency (y-axis)]
Mode
• there may be no unique mode, as in the case of a
rectangular distribution
For a given sample, N = 16:
33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40
No unique mode
[Figure: frequency histogram of Score (x-axis, 33 to 40) against Frequency (y-axis)]
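The four mode slides above can be checked with a short Python sketch. It uses `statistics.multimode` from the standard library, which returns every value tied for the highest frequency. The variable names are illustrative and not part of the slides.

```python
from statistics import multimode

# Samples copied from the four mode slides (N = 16 each).
base = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
extreme = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 50]
bimodal = [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40]
rectangular = [33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40]

# multimode returns a list of all values that share the highest count.
modes_base = multimode(base)                # [39]
modes_extreme = multimode(extreme)          # [39]: changing 45 to 50 has no effect
modes_bimodal = multimode(bimodal)          # [35, 39]
modes_rectangular = multimode(rectangular)  # every score ties, so there is no unique mode
```

A result list with more than one entry signals that the mode is not unique, which is why the sketch uses `multimode` rather than a function that returns a single value.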
Median
• the score value that cuts the distribution in half
(the “middle” score)
• 50th percentile
For N = 15, the median is the eighth score = 37
[Figure: frequency histogram of Score (x-axis, 33 to 40) against Frequency (y-axis)]
Median
For N = 16, the median is the average of the eighth and ninth scores = 37.5
[Figure: frequency histogram of Score (x-axis, 33 to 40) against Frequency (y-axis)]
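The odd-N and even-N rules from the two median slides can be sketched in Python as below. The slides do not list the underlying scores, so both samples are hypothetical, chosen only so that the results match the slide values (37 and 37.5). `middle_score` is an illustrative name.

```python
from statistics import median

def middle_score(scores):
    """Median by the slide rule: the middle score, or the mean of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two middle scores

# Hypothetical samples (not from the slides).
odd_sample = [33, 34, 35, 36, 36, 37, 37, 37, 38, 38, 38, 39, 39, 40, 40]       # N = 15
even_sample = [33, 34, 35, 36, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 40, 40]  # N = 16

median_odd = middle_score(odd_sample)    # the eighth score
median_even = middle_score(even_sample)  # average of the eighth and ninth scores
```

The standard library's `statistics.median` applies the same rule and can be used to cross-check the hand-written function.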
Mean
• this is what people usually have in mind when they
say “average”
• the sum of the scores divided by the number of
scores
For a sample: $\bar{X} = \frac{\sum X}{n}$
For a population: $\mu = \frac{\sum X}{N}$
Changing the value of a single score may not affect the mode
or median, but it will affect the mean.
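A minimal Python sketch of that last point, reusing the N = 16 sample from the mode slides and changing its largest score from 45 to 50. The variable names are illustrative.

```python
from statistics import mean, median, multimode

original = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
changed = original[:-1] + [50]  # replace the largest score, 45, with 50

# The mean is the sum of the scores divided by the number of scores.
mean_original = mean(original)  # 618 / 16
mean_changed = mean(changed)    # 623 / 16

# The median and the mode depend only on the middle and most frequent scores.
median_original, median_changed = median(original), median(changed)
mode_original, mode_changed = multimode(original), multimode(changed)
```

Only the mean moves (from 38.625 to 38.9375); the median and mode both stay at 39.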
Mean
[Figure: histogram of Nightly Hours of Sleep (x-axis, 3.5 to 11.5) against No. of People (y-axis), with the mean marked: $\bar{X} = 7.07$]
In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the population parameter.
[Figure: frequency histogram with the mean marked: $\bar{X} = 2.4$]
In order for the mean to be meaningful, the variable of interest must be measured on an interval scale.
Mean
[Figure: two frequency histograms of Score (x-axis, 33 to 40) against Frequency (y-axis), with the means marked: $\bar{X} = 36.8$ and $\bar{X} = 36.5$]
[Figure: histogram of Income in 1,000s (x-axis) against No. of People (y-axis), with the mean marked: $\bar{X} = 93.2$]
The mean is sensitive to extreme scores and is appropriate for more symmetrical distributions.
Symmetry
• a symmetrical distribution exhibits no skewness
• in a symmetrical, unimodal distribution the Mean = Median = Mode
[Figure: histogram of Nightly Hours of Sleep (x-axis, 3.5 to 11.5) against No. of People (y-axis)]
Skewed distributions
• Skewness refers to the asymmetry of the distribution
• A positively skewed distribution is asymmetrical, with its longer tail pointing in the positive direction.
• mode < median < mean
[Figure: histogram of Income in 1,000s (x-axis) against No. of People (y-axis), with the mode, median, and mean marked]
Mode = $70,000
Median = $88,700
Mean = $93,600
Skewed distributions
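• A negatively skewed distribution is asymmetrical, with its longer tail pointing in the negative direction.
• mode > median > mean
[Figure: histogram of Test score (x-axis, 0 to 100) against No. of People (y-axis), with the mode, median, and mean marked]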
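The orderings on the two skewness slides can be illustrated with a small Python sketch. The data are hypothetical (not the income or test-score data behind the figures), and `skewness` is an illustrative helper that uses the moment-based definition, one of several common skewness formulas. The ordering of mode, median, and mean is typical of skewed data rather than guaranteed for every sample.

```python
from statistics import mean, median, mode

def skewness(values):
    """Moment-based skewness: third central moment divided by the second to the power 1.5."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical incomes in 1,000s with a long right tail.
incomes = [70, 70, 70, 80, 85, 90, 95, 110, 130, 180]
# Mirroring the sample flips the tail to the left.
mirrored = [250 - x for x in incomes]

right = (mode(incomes), median(incomes), mean(incomes))    # mode < median < mean
left = (mode(mirrored), median(mirrored), mean(mirrored))  # mode > median > mean

skew_right = skewness(incomes)   # positive
skew_left = skewness(mirrored)   # negative
```

The sign of the skewness statistic matches the direction in which the longer tail points.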
Measures of central tendency
Mode
+ quick & easy to compute
+ useful for nominal data
- poor sampling stability
Median
+ not affected by extreme scores
- somewhat poor sampling stability
Mean
+ sampling stability
+ related to variance
- inappropriate for discrete data
- affected by skewed distributions
Distributions
• Center: mode, median, mean
• Shape: symmetrical, skewed
• Spread
[Figure: histogram of Scores (x-axis, 0 to 100) against # of People (y-axis)]
Measures of Spread
• the dispersion of scores from the center
• a distribution of scores is highly variable if the scores differ widely from one another
• Three statistics to measure variability
– range
– interquartile range
– variance
Range
• largest score minus the smallest score
[Figure: two histograms of Scores (x-axis, 0 to 100) against # of People (y-axis)]
• these two distributions have the same range (80), but their spreads look different
• says nothing about how scores vary around the center
• greatly affected by extreme scores (it is defined by them)
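A short Python sketch of the first point: two samples with the same range but visibly different spread. Both samples are hypothetical, not the data behind the figure, and the population standard deviation (`statistics.pstdev`) is used here only as a second yardstick.

```python
from statistics import pstdev

def score_range(scores):
    """Range: largest score minus smallest score."""
    return max(scores) - min(scores)

# Hypothetical samples with the same extremes (10 and 90).
spread_out = [10, 20, 30, 40, 50, 60, 70, 80, 90]
bunched = [10, 48, 49, 50, 50, 50, 51, 52, 90]

range_spread_out = score_range(spread_out)  # 90 - 10 = 80
range_bunched = score_range(bunched)        # 90 - 10 = 80

# A measure that uses every score tells the two samples apart.
sd_spread_out = pstdev(spread_out)
sd_bunched = pstdev(bunched)
```

Because the range looks only at the two extreme scores, it cannot distinguish these samples.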
Interquartile range
• the distance between the 25th percentile and the 75th percentile
[Figure: two histograms of Scores (x-axis, 0 to 100) against # of People (y-axis)]
• for the more spread-out distribution: Q3 - Q1 = 70 - 30 = 40
• for the more tightly clustered distribution: Q3 - Q1 = 52.5 - 47.5 = 5
• effectively ignores the top and bottom quarters, so extreme scores are not influential
• dismisses 50% of the distribution
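The interquartile range and its resistance to extreme scores can be sketched in Python with `statistics.quantiles`. The samples are hypothetical, and `iqr` is an illustrative helper. Quartile conventions differ between textbooks and software, so the choice of `method="inclusive"` is an assumption, not something the slides specify.

```python
from statistics import quantiles

def iqr(scores):
    """Interquartile range: Q3 - Q1."""
    # The default method ("exclusive") uses a different interpolation rule,
    # so it can return different quartiles for the same data.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]         # hypothetical sample
with_outlier = [10, 20, 30, 40, 50, 60, 70, 80, 900]  # same sample, one extreme score

iqr_scores = iqr(scores)         # Q3 - Q1 = 70 - 30 = 40
iqr_outlier = iqr(with_outlier)  # still 40: the top quarter is ignored

range_scores = max(scores) - min(scores)               # 80
range_outlier = max(with_outlier) - min(with_outlier)  # 890
```

The extreme score changes the range from 80 to 890 but leaves the interquartile range untouched.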
Deviation measures
• It might be better to see how much scores differ from the center of the distribution, using distance
• Scores further from the mean have larger deviation scores (in absolute value)

Name        Score   Deviation
Amy            10         -40
Theo           20         -30
Max            30         -20
Henry          40         -10
Leticia        50           0
Charlotte      60          10
Pedro          70          20
Tricia         80          30
Lulu           90          40
AVERAGE        50
Deviation measures
• To see how ‘deviant’ the distribution is relative to another, we could sum these deviation scores
• But this would leave us with a big fat zero, because the negative and positive deviations cancel

Name        Score   Deviation
Amy            10         -40
Theo           20         -30
Max            30         -20
Henry          40         -10
Leticia        50           0
Charlotte      60          10
Pedro          70          20
Tricia         80          30
Lulu           90          40
SUM                         0
Deviation measures
So we use squared deviations from the mean.
This is the sum of squares (SS):
$SS = \sum (X - \bar{X})^2$

Name        Score   Deviation   Sq. Deviation
Amy            10         -40            1600
Theo           20         -30             900
Max            30         -20             400
Henry          40         -10             100
Leticia        50           0               0
Charlotte      60          10             100
Pedro          70          20             400
Tricia         80          30             900
Lulu           90          40            1600
SUM                         0            6000
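The deviation table can be reproduced with a few lines of Python. This is a minimal sketch using the nine scores from the slides; the variable names are illustrative.

```python
# Scores from the slides.
scores = {
    "Amy": 10, "Theo": 20, "Max": 30, "Henry": 40, "Leticia": 50,
    "Charlotte": 60, "Pedro": 70, "Tricia": 80, "Lulu": 90,
}

mean_score = sum(scores.values()) / len(scores)  # 50.0

# Deviation score: distance of each score from the mean, keeping the sign.
deviations = {name: x - mean_score for name, x in scores.items()}

# Raw deviations always cancel out, so their sum carries no information.
sum_of_deviations = sum(deviations.values())  # 0.0

# Squaring removes the signs before summing.
sum_of_squares = sum(d ** 2 for d in deviations.values())  # 6000.0
```

The zero is no accident of this sample: for any data, $\sum (X - \bar{X}) = \sum X - n\bar{X} = 0$.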
Variance
We take the “average” squared deviation from the mean and call it VARIANCE.
For a population: $\sigma^2 = \frac{SS}{N}$
For a sample: $s^2 = \frac{SS}{n - 1}$
(dividing by n - 1 corrects for the fact that the sample variance tends to underestimate the population variance)
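The reason for dividing by n - 1 can be checked by brute force. The sketch below treats the nine slide scores as a complete population and lists every possible sample of size 3 drawn with replacement. The sample size, the sampling scheme, and the names are assumptions made for illustration.

```python
from itertools import product
from statistics import mean, pvariance

population = [10, 20, 30, 40, 50, 60, 70, 80, 90]
sigma_squared = pvariance(population)  # SS / N for the whole population

def ss(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

n = 3
# Every ordered sample of size n drawn with replacement: 9 ** 3 = 729 samples.
samples = list(product(population, repeat=n))

# Average each estimator over all possible samples.
avg_divide_by_n = mean(ss(s) / n for s in samples)
avg_divide_by_n_minus_1 = mean(ss(s) / (n - 1) for s in samples)
```

Averaged over all samples, SS / n comes out at (n - 1) / n of the population variance, while SS / (n - 1) matches it.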
Variance
1. Find the mean.
2. Subtract the mean from every score.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the SS by N (population) or n - 1 (sample).

Name        Score   Dev’n   Sq. Dev.
Amy            10     -40       1600
Theo           20     -30        900
Max            30     -20        400
Henry          40     -10        100
Leticia        50       0          0
Charlotte      60      10        100
Pedro          70      20        400
Tricia         80      30        900
Lulu           90      40       1600
SUM                     0       6000

$s^2 = \frac{SS}{n - 1} = \frac{6000}{8} = 750$
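The five steps above map directly onto code. The sketch below is one way to write them in Python; `variance_by_steps` and its `sample` flag are illustrative names, and the standard library's `statistics.variance` and `statistics.pvariance` serve as a cross-check.

```python
from statistics import pvariance, variance

def variance_by_steps(scores, sample=True):
    """Variance computed with the five steps from the slide."""
    n = len(scores)
    mean_score = sum(scores) / n                   # 1. find the mean
    deviations = [x - mean_score for x in scores]  # 2. subtract the mean from every score
    squared = [d ** 2 for d in deviations]         # 3. square the deviations
    ss = sum(squared)                              # 4. sum the squared deviations
    return ss / (n - 1) if sample else ss / n      # 5. divide SS by n - 1 or by N

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

sample_variance = variance_by_steps(scores)                    # 6000 / 8 = 750.0
population_variance = variance_by_steps(scores, sample=False)  # 6000 / 9
```

Treating the nine scores as a sample gives 750, as on the slide. Treating them as the whole population gives the smaller value 6000 / 9.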
Standard deviation
The standard deviation is the square root of the variance
$s = \sqrt{s^2} = \sqrt{\frac{SS}{n - 1}}$
The standard deviation measures spread in the original
units of measurement, while the variance does so in units
squared.
Variance is good for inferential stats.
Standard deviation is nice for descriptive stats.
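As a small worked sketch in Python, the sample standard deviation of the nine scores from the deviation slides follows from the variance in one extra step. The variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean_score = sum(scores) / n
ss = sum((x - mean_score) ** 2 for x in scores)  # sum of squares, 6000.0

s_squared = ss / (n - 1)  # variance: 750.0, in squared score units
s = sqrt(s_squared)       # standard deviation, in the original score units
```

Here s is roughly 27.4 score points, a figure that can be read directly against the 10 to 90 scale of the data, unlike the variance of 750.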
Example
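[Figure: two histograms of Scores (x-axis, 0 to 100) against # of People (y-axis), both with a mean of 50]
Wider distribution: N = 28, $\bar{X} = 50$, $s^2 = 555.55$, $s = 23.57$
Narrower distribution: N = 28, $\bar{X} = 50$, $s^2 = 140.74$, $s = 11.86$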
Descriptive Statistics: Quick Review
• Measures of Center
– Mode
– Median
– Mean
• Measures of Symmetry
– Skewness
• Measures of Spread
– Range
– Inter-quartile Range
– Variance
– Standard deviation
Descriptive Statistics: Quick Review
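                     For a population:             For a sample:
Mean                 $\mu = \frac{\sum X}{N}$      $\bar{X} = \frac{\sum X}{n}$
Variance             $\sigma^2 = \frac{SS}{N}$     $s^2 = \frac{SS}{n - 1}$
Standard Deviation   $\sigma = \sqrt{\sigma^2}$    $s = \sqrt{s^2}$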