3 Measures of Dispersion

Measures of Dispersion
or
Measures of Variability
Measures of Variability
 A single summary figure that describes
the spread of observations within a
distribution.
MEASURES OF
DISPERSION
 RANGE
 INTERQUARTILE RANGE
 VARIANCE
 STANDARD DEVIATION
Measures of Variability
 Range
 Difference between the smallest and largest
observations.
 Interquartile Range
 Range of the middle half of scores.
 Variance
 Mean of all squared deviations from the mean.
 Standard Deviation
 Measure of the average amount by which
observations deviate from the mean. The square
root of the variance.
Variability Example:
Range
 Las Vegas Hotel Rates
52, 76, 100, 136, 186, 196, 205, 250,
254, 264, 264, 280, 282, 283, 303, 313,
317, 317, 325, 373, 384, 384, 400, 402,
417, 422, 472, 480, 643, 693, 732, 749,
750, 791, 891
 Range: 891 - 52 = 839
Pros and Cons of the
Range
 Pros
 Very easy to
compute.
 Scores exist in the
data set.
 Cons
 Value depends only
on two scores.
 Very sensitive to
outliers.
 Influenced by sample
size (the larger the
sample, the larger the
range tends to be).
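The sensitivity of the range to a single extreme score can be illustrated with a minimal Python sketch. The `rates` list is copied from the X / X² table of hotel rates later in this transcript; the helper name `value_range` is an assumption made for illustration only.

```python
# Hotel rates as listed in the X / X^2 table of this transcript.
rates = [472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
         643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
         373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693]


def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


full_range = value_range(rates)             # 891 - 52
without_max = sorted(rates)[:-1]            # drop the single largest rate
trimmed_range = value_range(without_max)    # 791 - 52

print(full_range, trimmed_range)
```

Dropping one observation out of 35 changes the range by 100, which is what "depends only on two scores" means in practice.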
Interquartile Range
 The interquartile range is Q3 - Q1.
 50% of the observations in the distribution lie within the interquartile range.
 The following figure shows the relationship between the quartiles, the median, and the interquartile range.
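Interquartile Range
Quartiles (positions in the ordered data):
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ observation}$$
$$Q_2 = \left(\frac{2(n+1)}{4}\right)^{\text{th}} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation}$$
$$Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ observation}$$
Interquartile range:
$$IQR = Q_3 - Q_1$$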
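As a rough sketch of the position rule, the Python below locates the k-th quartile at position k(n+1)/4 in the sorted data. Interpolating linearly when the position is not a whole number is an assumption, since the slides do not say how fractional positions are handled. The function name `quartile_by_position` is hypothetical, and the data are the hotel rates from the X / X² table later in this transcript.

```python
def quartile_by_position(values, k):
    """Value at 1-based position k(n+1)/4 of the sorted data (k = 1, 2, 3).

    Assumption: fractional positions are resolved by linear interpolation
    between the two neighbouring observations.
    """
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    position = k * (n + 1) / 4
    lower = int(position)               # whole part of the position
    fraction = position - lower         # fractional part, 0 if exact
    upper = min(lower + 1, n)           # stay inside the data at the top end
    low_val, high_val = data[lower - 1], data[upper - 1]
    return low_val + fraction * (high_val - low_val)


# Hotel rates as listed in the X / X^2 table of this transcript.
rates = [472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
         643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
         373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693]

q1 = quartile_by_position(rates, 1)   # position 9
q2 = quartile_by_position(rates, 2)   # position 18 (the median)
q3 = quartile_by_position(rates, 3)   # position 27
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

With n = 35 the three positions are whole numbers (9, 18, 27), so no interpolation is needed for this data set.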
Pros and Cons of the
Interquartile Range
 Pros
 Fairly easy to
compute.
 Scores exist in the
data set.
 Eliminates influence
of extreme scores.
 Cons
 Discards much of the
data.
Percentiles and Quartiles
 Maximum is 100th percentile: 100% of
values lie at or below the maximum
 Median is 50th percentile: 50% of values
lie at or below the median
 Any percentile can be calculated. But the
most common are 25th (1st Quartile) and
75th (3rd Quartile)
Locating Percentiles in a
Frequency Distribution
 A percentile is a score below which a specific percentage of the distribution falls.
 The 75th percentile is a score below which 75% of the cases fall.
 The median is the 50th percentile: 50% of the cases fall below it.
 Quartiles are percentiles too: the lower quartile is the 25th percentile and the upper quartile is the 75th percentile.
Locating Percentiles in a Frequency Distribution
[Figure: frequency table with cumulative percentages, marking the rows where 25%, 50%, and 80% of cases are first included, i.e. the 25th, 50th, and 80th percentiles.]
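One way to locate a percentile in a frequency distribution is to accumulate the frequencies and stop at the first score whose cumulative percentage reaches the target. The sketch below is only an illustration: the frequency table is hypothetical (it is not the one from the slide), and the function name `percentile_from_frequencies` is made up for this example.

```python
def percentile_from_frequencies(freq_table, p):
    """Return the first score whose cumulative percentage reaches p.

    freq_table: list of (score, frequency) pairs sorted by score.
    """
    total = sum(freq for _, freq in freq_table)
    cumulative = 0
    for score, freq in freq_table:
        cumulative += freq
        if 100 * cumulative / total >= p:
            return score
    raise ValueError("p must be between 0 and 100")


# Hypothetical frequency distribution (NOT the data from the slide).
table = [(1, 2), (2, 3), (3, 5), (4, 6), (5, 4)]

p25 = percentile_from_frequencies(table, 25)   # cumulative 25% reached at score 2
p50 = percentile_from_frequencies(table, 50)   # cumulative 50% reached at score 3
p80 = percentile_from_frequencies(table, 80)   # cumulative 80% reached at score 4
print(p25, p50, p80)
```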
Five Number Summary
 Minimum Value
 1st Quartile
 Median
 3rd Quartile
 Maximum Value
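A five-number summary can be sketched with Python's standard `statistics` module. The choice of `method="exclusive"` is an assumption made here because it places the quartiles at positions k(n+1)/4, the rule used on the quartile slide. The data are the hotel rates from the X / X² table later in this transcript.

```python
import statistics

# Hotel rates as listed in the X / X^2 table of this transcript.
rates = [472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
         643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
         373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693]

# Three cut points that split the data into four equal groups.
q1, median, q3 = statistics.quantiles(rates, n=4, method="exclusive")

five_number_summary = (min(rates), q1, median, q3, max(rates))
print(five_number_summary)
```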
VARIANCE:
Take the deviation of each observation from the mean, square each deviation, and average the squared deviations.
STANDARD DEVIATION:
"ROOT-MEAN-SQUARE DEVIATION": the square root of the variance.
Variance
 The average squared amount by which a score deviates from the mean.
 Score - Mean = Difference Score
 Average of Difference Scores = 0
 Because the plain average is always 0, square the difference scores (negatives become positives) before averaging.
Variance:
Computational Formula
 Population
$$\sigma^2 = \frac{N\sum X^2 - (\sum X)^2}{N^2}$$
 Sample
$$S^2 = \frac{n\sum X^2 - (\sum X)^2}{n^2}$$
Variance
 Use the computational formula to calculate the variance.

X      X²
3      9
4      16
4      16
4      16
6      36
7      49
7      49
8      64
8      64
9      81
Sum: 60   Sum: 400

$$S^2 = \frac{n\sum X^2 - (\sum X)^2}{n^2} = \frac{10(400) - (60)^2}{10^2} = \frac{4000 - 3600}{100} = 4.0$$
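The ten scores above can be used to check, in a short Python sketch, that the difference scores average to zero and that the definitional and computational formulas give the same variance. Variable names are illustrative only; the division by n follows the formula used in this example.

```python
scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]
n = len(scores)
mean = sum(scores) / n                           # 60 / 10 = 6.0

# Difference scores: they always sum (and so average) to zero.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)

# Definitional form: mean of the squared deviations.
variance_definitional = sum(d ** 2 for d in deviations) / n

# Computational form: (n * sum(X^2) - (sum X)^2) / n^2.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
variance_computational = (n * sum_x2 - sum_x ** 2) / n ** 2

print(sum_of_deviations, variance_definitional, variance_computational)
```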
Variability Example:
Variance
 Las Vegas Hotel Rates

$$S^2 = \frac{n\sum X^2 - (\sum X)^2}{n^2} = \frac{35(6686202) - (13386)^2}{35^2} = \frac{234017070 - 179184996}{1225} = 44760.88$$

X      X²
472    222784
303    91809
280    78400
282    79524
417    173889
400    160000
254    64516
205    42025
384    147456
264    69696
317    100489
76     5776
643    413449
480    230400
136    18496
250    62500
100    10000
732    535824
317    100489
264    69696
384    147456
750    562500
402    161604
422    178084
373    139129
325    105625
313    97969
749    561001
791    625681
196    38416
891    793881
283    80089
52     2704
186    34596
693    480249
Sum: 13386   Sum: 6686202
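The hotel-rate sums and the resulting variance can be reproduced with a few lines of Python. This is a sketch of the computational formula as used in this example (dividing by n²), not a general-purpose routine.

```python
import math

# Hotel rates as listed in the X column of the table above.
rates = [472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
         643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
         373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693]

n = len(rates)
sum_x = sum(rates)                       # sum of X
sum_x2 = sum(x * x for x in rates)       # sum of X squared

# Computational formula: (n * sum(X^2) - (sum X)^2) / n^2
variance = (n * sum_x2 - sum_x ** 2) / n ** 2
std_dev = math.sqrt(variance)            # back to dollars rather than squared dollars

print(n, sum_x, sum_x2, round(variance, 2), round(std_dev, 2))
```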
Pros and Cons of Variance
 Pros
 Takes all data into
account.
 Lends itself to
computation of other
stable measures (and
is a prerequisite for
many of them).
 Cons
 Hard to interpret.
 Can be influenced by
extreme scores.
Standard Deviation
 To “undo” the squaring of difference
scores, take the square root of the
variance.
 Return to original units rather than
squared units.
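Quantifying Uncertainty
 Standard deviation: measures the variation of a variable in the sample.
 Technically,
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
 Note that this form divides by N - 1, whereas the worked examples in this deck divide by n.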
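The difference between dividing by N - 1 and dividing by n can be seen on the ten scores from the variance example. In Python's standard `statistics` module, `stdev` uses the N - 1 denominator and `pstdev` uses n; the sketch below simply compares the two.

```python
import statistics

scores = [3, 4, 4, 4, 6, 7, 7, 8, 8, 9]

# Divides the sum of squared deviations by n (as in the worked examples).
sd_divide_by_n = statistics.pstdev(scores)

# Divides by N - 1 (as in the formula with the N - 1 denominator).
sd_divide_by_n_minus_1 = statistics.stdev(scores)

print(sd_divide_by_n, sd_divide_by_n_minus_1)
```

The N - 1 version is always slightly larger, and the gap shrinks as the sample grows.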
Standard Deviation
Measure of the average amount by which
observations deviate on either side of the mean.
The square root of the variance.
 Population
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{N\sum X^2 - (\sum X)^2}{N^2}}$$
 Sample
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n^2}}$$
Variability Example:
Standard Deviation
Mean: 6
Standard Deviation: 2
Definitional formula:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{(3-6)^2 + (4-6)^2 + (4-6)^2 + (4-6)^2 + (6-6)^2 + (7-6)^2 + (7-6)^2 + (8-6)^2 + (8-6)^2 + (9-6)^2}{10}} = \sqrt{\frac{40}{10}} = \sqrt{4.0} = 2.0$$
Computational formula:
$$S = \sqrt{\frac{n\sum X^2 - (\sum X)^2}{n^2}} = \sqrt{\frac{10(400) - (60)^2}{10^2}} = \sqrt{\frac{4000 - 3600}{100}} = \sqrt{4.0} = 2.0$$
Variability Example:
Standard Deviation
Las Vegas Hotel Rates
[Figure: histogram of hotel rates. x-axis: Rates, in bins from 0-99 to 800-899; y-axis: Frequency, 0 to 9.]
Mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{13386}{35} = \$382.46$$
Standard Deviation:
$$S = \sqrt{\frac{35(6686202) - (13386)^2}{35^2}} = \sqrt{\frac{234017070 - 179184996}{1225}} = \sqrt{44760.88} = \$211.57$$
Pros and Cons of Standard
Deviation
 Pros
 Lends itself to computation
of other stable measures
(and is a prerequisite for
many of them).
 Average of deviations
around the mean.
 Majority of data within one
standard deviation above or
below the mean.
 Cons
 Influenced by extreme
scores.
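The claim that the majority of the data lies within one standard deviation of the mean can be checked on the hotel rates. The sketch below uses `statistics.pstdev`, which divides by n as the worked examples do; the data are copied from the X / X² table earlier in this transcript.

```python
import statistics

# Hotel rates as listed in the X / X^2 table of this transcript.
rates = [472, 303, 280, 282, 417, 400, 254, 205, 384, 264, 317, 76,
         643, 480, 136, 250, 100, 732, 317, 264, 384, 750, 402, 422,
         373, 325, 313, 749, 791, 196, 891, 283, 52, 186, 693]

mean = statistics.fmean(rates)
sd = statistics.pstdev(rates)            # divides by n

# Observations no more than one standard deviation from the mean.
within_one_sd = [x for x in rates if mean - sd <= x <= mean + sd]
share = len(within_one_sd) / len(rates)

print(len(within_one_sd), len(rates), round(share, 3))
```

For this data set 24 of the 35 rates fall inside the band, roughly two thirds.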
Mean and Standard
Deviation
 Using the mean and standard deviation
together:
 Is an efficient way to describe a distribution with just
two numbers.
 Allows a direct comparison between distributions
that are on different scales.
One hundred samples were selected, each containing 100 normal individuals. The mean systolic BP of each sample is presented:
90, 120, 100, 120, 110, 140, 110, 120, 130, 160, 100, 130, etc.

Systolic BP level   No. of samples
90                  5
100                 10
110                 20
120                 34
130                 20
140                 10
150                 5
160                 2
Mean = 120
SD = 10
Normal Distribution
Mean = 120
SD = 10
[Figure: normal curve centred at 120, x-axis from 90 to 150.]
•The curve describes the probability of getting any range of values, i.e., P(X > 120), P(X < 100), P(110 < X < 130)
•Area under the curve = probability
•Area under the whole curve = 1
•Probability of getting any specific value = 0, e.g., P(X = 120) = 0
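The three probabilities named above can be evaluated as areas under the curve with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch that assumes the systolic BP means follow an exactly normal distribution with mean 120 and SD 10.

```python
from statistics import NormalDist

# Assumption: exactly normal with mean 120 and standard deviation 10.
bp = NormalDist(mu=120, sigma=10)

p_above_120 = 1 - bp.cdf(120)                   # area to the right of the mean
p_below_100 = bp.cdf(100)                       # area more than 2 SD below the mean
p_110_to_130 = bp.cdf(130) - bp.cdf(110)        # area within 1 SD of the mean

print(round(p_above_120, 4), round(p_below_100, 4), round(p_110_to_130, 4))
```

The middle interval covers about 68% of the area, which matches the "majority within one standard deviation" rule of thumb.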
Descriptive Statistics
Variable: Age
[Figure: histogram of Age (x-axis 15 to 75) with 95% confidence interval plots for Mu and Median (axis 28 to 43).]

Anderson-Darling Normality Test
A-Squared:      0.962
P-Value:        0.014

Mean            36.4500
StDev           15.7356
Variance        247.608
Skewness        0.679626
Kurtosis        8.51E-02
N               60

Minimum         11.0000
1st Quartile    25.0000
Median          31.5000
3rd Quartile    46.7500
Maximum         79.0000

95% Confidence Interval for Mu:       32.3851 to 40.5149
95% Confidence Interval for Sigma:    13.3380 to 19.1921
95% Confidence Interval for Median:   28.0000 to 42.0000
WHICH MEASURE TO USE?
DISTRIBUTION OF DATA IS SYMMETRIC ---- USE MEAN & S.D.
DISTRIBUTION OF DATA IS SKEWED ---- USE MEDIAN & QUARTILES
ANY QUESTIONS
THANK YOU