DESCRIPTIVE STATISTICS
STATISTICAL MEASUREMENT OF DATA




Location (central tendency)
Dispersion (spread)
Skewness (symmetry)
Kurtosis (peakedness)
MEASURES OF LOCATION






Arithmetic mean
Geometric mean
Harmonic mean
Median
Percentiles
Mode
ARITHMETIC MEAN
(DISCRETE DATA)
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
where n is the number of observations.
ARITHMETIC MEAN
(GROUPED DATA)
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
where xi is the mid-class value (MCV) and fi
is the frequency of the ith class, whereas n
is the number of classes.
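The grouped-data mean above can be sketched in a few lines of Python. The function name `grouped_mean` is hypothetical, and the data are the marks table that appears later in these slides, with each class represented by its mid-class value (2 for 0–4, 7 for 5–9, and so on).

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f_i * x_i divided by sum of f_i."""
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total

# Marks table from the grouped-median example: classes 0-4, 5-9, ..., 20-24.
midpoints = [2, 7, 12, 17, 22]       # mid-class values (MCV)
frequencies = [2, 8, 14, 17, 9]      # number of students per class

mean_marks = grouped_mean(midpoints, frequencies)
print(mean_marks)  # 715 / 50 = 14.3
```

This is only an estimate of the true mean, since every observation in a class is treated as if it sat at the mid-class value.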
GEOMETRIC MEAN
The geometric mean is used where
relative changes (especially percentages)
are being considered.
HARMONIC MEAN
The harmonic mean is used when the data
consist of rates such as prices ($/kg),
speeds (km/h) or production (output/man-hour).
GEOMETRIC MEAN
Geometric mean = $\sqrt[n]{x_1 x_2 \cdots x_n}$
where n is the number of observations.
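A minimal sketch of both means follows. The geometric mean uses the formula above; the harmonic mean uses its standard definition (n divided by the sum of reciprocals), which is not written out on the slides. The function names and the sample figures are hypothetical.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """n divided by the sum of reciprocals (standard definition, assumed here)."""
    return len(values) / sum(1 / v for v in values)

growth_factors = [1.10, 1.20, 0.90]   # hypothetical yearly growth factors
speeds_kmh = [40, 60]                 # hypothetical: same distance at each speed

gm = geometric_mean(growth_factors)   # average growth factor per year
hm = harmonic_mean(speeds_kmh)        # average speed over the whole trip, 48 km/h
print(gm, hm)
```

Note that the harmonic mean of 40 and 60 km/h is 48 km/h, not the arithmetic mean of 50 km/h, because more time is spent travelling at the lower speed.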
MEDIAN
The median is the middle observation
of a set of data arranged in ascending
or descending order, i.e. it divides
the set of data into two equal parts in
terms of the number of observations.
MEDIAN
The rank of the median is given by
$\frac{n + 1}{2}$
where n is the total number of
observations.
MEDIAN (DISCRETE DATA)
When n is odd, the median is the
middle observation.
When n is even, the median is the
average or midpoint of the two
middle observations.
MEDIAN (DISCRETE DATA)
Example 1:
27 13 62 5 44 29 16
Rearranged:
5 13 16 27 29 44 62
Rank of median = (7 + 1) / 2 = 4
Median = 27
MEDIAN (DISCRETE DATA)
Example 2:
5 13 16 27 29 44
Rank of median = (6 + 1) / 2 = 3.5
Median = (16 + 27) / 2 = 21.5
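The two examples above can be reproduced with a short sketch based on the rank (n + 1) / 2. The function name `median_by_rank` is hypothetical.

```python
def median_by_rank(data):
    """Median of discrete data using the 1-based rank (n + 1) / 2."""
    values = sorted(data)
    n = len(values)
    rank = (n + 1) / 2
    lower = values[int(rank) - 1]      # observation at the whole part of the rank
    if rank == int(rank):              # n odd: the rank is a whole number
        return lower
    upper = values[int(rank)]          # n even: average the two middle observations
    return (lower + upper) / 2

median_odd = median_by_rank([27, 13, 62, 5, 44, 29, 16])   # Example 1
median_even = median_by_rank([5, 13, 16, 27, 29, 44])      # Example 2
print(median_odd, median_even)  # 27 21.5
```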
MEDIAN (GROUPED DATA)
In this case, the value of the median
can only be estimated since the
identity of each observation is
unknown in the whole frequency
distribution.
MEDIAN (GROUPED DATA)
We proceed as follows:
- Determine the rank of the median
- Locate the cell in which the median is found
- Use linear interpolation or simple proportion to evaluate the median
MEDIAN (GROUPED DATA)
The method of linear interpolation
assumes that the observations within
each cell are evenly spread or uniformly
distributed.
MEDIAN (GROUPED DATA)
$\text{Median} = \text{LCB} + \left( \frac{R_{\text{median}}}{\text{frequency}} \times \text{cell width} \right)$
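where LCB is the lower cell boundary and
frequency is the frequency of the cell
containing the median, and Rmedian is the
rank of the median in its cell. This is
obtained by taking the overall rank of the
median and subtracting the cumulative
frequency of the previous cell.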
MEDIAN (E. g. GROUPED DATA)
Marks      No. of students   Less-than CF
0 – 4             2                2
5 – 9             8               10
10 – 14          14               24
15 – 19          17               41
20 – 24           9               50
Total            50
MEDIAN (E. g. GROUPED DATA)
n = 50
Rank of median = (50 + 1) / 2 = 25.5
Location of median: cell ‘15 – 19’
$\text{Median} = 14.5 + \left( \frac{25.5 - 24}{17} \times 5 \right) = 14.94$
MEDIAN (E. g. GROUPED DATA)
[Figure: interpolation diagram for the median cell. Ranks axis: 25, 25.5, 41. Values axis: 14.5, Q2, 19.5.]
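The three steps (rank, locate the cell, interpolate) can be sketched as follows for the marks table. The function name `grouped_median` is hypothetical, and the lower cell boundaries (-0.5, 4.5, ...) are taken halfway between adjacent class limits, as the value 14.5 in the worked example suggests.

```python
def grouped_median(lower_boundaries, width, frequencies):
    """Estimate the median of a frequency table by linear interpolation."""
    n = sum(frequencies)
    rank = (n + 1) / 2                  # overall rank of the median
    cumulative = 0                      # cumulative frequency of the previous cells
    for lcb, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= rank:
            rank_in_cell = rank - cumulative
            return lcb + rank_in_cell / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")

lower_boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5]
frequencies = [2, 8, 14, 17, 9]

median_marks = grouped_median(lower_boundaries, 5, frequencies)
print(round(median_marks, 2))  # 14.94
```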
PERCENTILES
Percentiles are statistics which divide a
distribution into 100 equal parts in terms
of the number of observations.
The most well-known ones are quartiles
and deciles.
QUARTILES
The rank of the first or lower quartile
(Q1) is given by $\frac{n + 1}{4}$.
The rank of the third or upper quartile
(Q3) is given by $\frac{3(n + 1)}{4}$,
where n is the total number of
observations.
QUARTILES (DISCRETE DATA)
Example 1:
27 13 62 5 44 29 16
Rearranged:
5 13 16 27 29 44 62
Rank of Q1 = (7 + 1) / 4 = 2
Q1 = 13
Rank of Q3 = 3(7 + 1) / 4 = 6
Q3 = 44
QUARTILES (DISCRETE DATA)
Example 2:
5 13 16 27 29 44
Rank of Q1 = (6 + 1) / 4 = 1.75
Q1 = 5 + 0.75(13 – 5) = 11
Rank of Q3 = 3(6 + 1) / 4 = 5.25
Q3 = 29 + 0.25(44 – 29) = 32.75
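Both quartile examples follow the same pattern: find the rank, then interpolate between neighbouring observations when the rank is fractional. A minimal sketch, with hypothetical function names, is given below. It assumes the rank falls between 1 and n, which holds for the examples here but not for very small samples.

```python
def value_at_rank(sorted_values, rank):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    whole = int(rank)
    fraction = rank - whole
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

def quartiles(data):
    """Return (Q1, Q3) using the ranks (n + 1) / 4 and 3(n + 1) / 4."""
    values = sorted(data)
    n = len(values)
    return value_at_rank(values, (n + 1) / 4), value_at_rank(values, 3 * (n + 1) / 4)

q_example1 = quartiles([27, 13, 62, 5, 44, 29, 16])   # Example 1
q_example2 = quartiles([5, 13, 16, 27, 29, 44])       # Example 2
print(q_example1, q_example2)  # (13, 44) (11.0, 32.75)
```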
PERCENTILES
$P_k = \text{LCB} + \left( \frac{R_k}{\text{frequency}} \times \text{cell width} \right)$
where Rk is the rank of the kth percentile
in its cell. This is obtained by taking the
overall rank of the percentile and
subtracting the cumulative frequency of
the previous cell.
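The percentile formula can be sketched for the marks table as below. The slides do not state the overall rank of the kth percentile; this sketch assumes k(n + 1) / 100, by analogy with the quartile ranks (n + 1) / 4 and 3(n + 1) / 4. The function name `grouped_percentile` is hypothetical.

```python
def grouped_percentile(k, lower_boundaries, width, frequencies):
    """Estimate the k-th percentile of a frequency table by linear interpolation."""
    n = sum(frequencies)
    rank = k * (n + 1) / 100            # assumed overall rank of the k-th percentile
    cumulative = 0                      # cumulative frequency of the previous cells
    for lcb, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= rank:
            rank_in_cell = rank - cumulative       # R_k in the formula
            return lcb + rank_in_cell / f * width
        cumulative += f
    raise ValueError("rank lies beyond the last cell")

lower_boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5]
frequencies = [2, 8, 14, 17, 9]

q1 = grouped_percentile(25, lower_boundaries, 5, frequencies)
q2 = grouped_percentile(50, lower_boundaries, 5, frequencies)
q3 = grouped_percentile(75, lower_boundaries, 5, frequencies)
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 10.48 14.94 18.69
```

With k = 50 the sketch reproduces the grouped median of 14.94 obtained earlier.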
PERCENTILES
Percentiles can be estimated from a
cumulative frequency ogive by
interpolation.
MODE (DISCRETE DATA)
The mode is the observation that occurs most
often, i.e. the one with the highest frequency.
It can be easily located by visual inspection.
NOTE
If more than one observation shares the
same highest frequency, we say that there are
several modes, but we can also say that there
is no mode.
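For longer lists, visual inspection can be replaced by a frequency count. The sketch below returns every observation that shares the highest frequency, so the multi-modal case in the note shows up as a list with more than one entry. The function name and the sample data are hypothetical.

```python
from collections import Counter

def modes(data):
    """All observations sharing the highest frequency, in ascending order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

single_mode = modes([1, 2, 2, 3])              # one mode
several_modes = modes([3, 5, 5, 7, 8, 8, 2])   # two observations tie
print(single_mode, several_modes)  # [2] [5, 8]
```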
MODE (GROUPED DATA)
In this case, we talk about a modal class,
which is the class with the highest frequency.
A rough approximation for a single value of
the mode is the MCV of the modal class.
The mode can be found quite accurately by
using a formula or from a histogram.
MODE (GROUPED DATA)
A useful approximate formula for finding the
mode of a moderately skewed distribution is
Mode = mean – 3(mean – median)
MODE (GROUPED DATA)
$\text{Mode} = \text{LCB} + \left( \frac{f_1}{f_1 + f_2} \times \text{cell width} \right)$
where f1 is the difference in frequencies
between the modal class and the class
preceding it and f2 is the difference in
frequencies between the modal class and the
class immediately after it.
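As an illustration, the formula can be applied to the marks table from the grouped-median example, where the modal class is 15–19 (frequency 17), preceded by a class of frequency 14 and followed by one of frequency 9. This calculation does not appear on the slides, and the function name `grouped_mode` is hypothetical.

```python
def grouped_mode(lcb, width, f_modal, f_before, f_after):
    """Mode of grouped data: LCB + f1 / (f1 + f2) * cell width."""
    f1 = f_modal - f_before     # difference with the preceding class
    f2 = f_modal - f_after      # difference with the following class
    return lcb + f1 / (f1 + f2) * width

mode_marks = grouped_mode(lcb=14.5, width=5, f_modal=17, f_before=14, f_after=9)
print(round(mode_marks, 2))  # 14.5 + 3/11 * 5 = 15.86
```

The result lies closer to the lower boundary of the modal class because the preceding class has the higher of the two neighbouring frequencies.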
MODE (GROUPED DATA)
We can also use a histogram to find the
mode. We simply represent the modal
class and the classes preceding it and
immediately after it.
MODE (GROUPED DATA)
[Figure: histogram of frequency (axis from 0 to 50) against age of people, with classes 20-40, 40-60 and 60-80.]
MEASURES OF DISPERSION




Range
Quartile deviation
Standard deviation
Coefficient of variation
RANGE (DISCRETE DATA)
The range is the numerical difference
between the maximum and the
minimum observations.
RANGE (GROUPED DATA)
The range is the numerical difference
between the upper cell limit of the last
cell and the lower cell limit of the first
cell.
QUARTILE DEVIATION
The quartile deviation or semi-interquartile range is defined as
$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$
This quantity is not affected by outliers and
extreme values, since it depends only on the
middle half of the data.
STANDARD DEVIATION
AND VARIANCE
The standard deviation is the positive
square root of the variance. All
formulae are given in terms of the
variance which is equal to
$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$
STANDARD DEVIATION
(discrete data)
The standard deviation is the best
measure of spread since it can be used
for further statistical processing.
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
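The two forms of the variance given earlier (mean squared deviation, and mean of squares minus square of the mean) can be checked against each other numerically. The sketch below uses the seven observations from the median example and the divisor n, as on the slides; the function names are hypothetical.

```python
import math

def variance_definition(data):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_shortcut(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) / n - mean ** 2

data = [27, 13, 62, 5, 44, 29, 16]     # mean is 196 / 7 = 28
var_a = variance_definition(data)      # 2312 / 7
var_b = variance_shortcut(data)        # 7800 / 7 - 28**2, the same value
sd = math.sqrt(var_a)
print(round(var_a, 2), round(var_b, 2), round(sd, 2))  # 330.29 330.29 18.17
```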
STANDARD DEVIATION
(grouped data)
$s = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}}$
with the usual definitions of xi and fi.
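A minimal sketch of the grouped-data formula, applied to the marks table from the grouped-median example (mid-class values 2, 7, 12, 17, 22). The calculation is an illustration and does not appear on the slides; the function name `grouped_sd` is hypothetical.

```python
import math

def grouped_sd(midpoints, frequencies):
    """Standard deviation of grouped data, weighting each MCV by its frequency."""
    total = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total
    variance = sum(f * (x - mean) ** 2
                   for f, x in zip(frequencies, midpoints)) / total
    return math.sqrt(variance)

sd_marks = grouped_sd([2, 7, 12, 17, 22], [2, 8, 14, 17, 9])
print(round(sd_marks, 2))  # sqrt(1460.5 / 50) = sqrt(29.21), about 5.4
```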
COEFFICIENT OF VARIATION
The purpose of the coefficient of
variation is to compare dispersions in
various distributions.
$\text{Coefficient of variation} = \frac{s}{\bar{x}}$
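Because the coefficient of variation is a ratio, it has no units, which is what makes it usable for comparing distributions measured on different scales. A short sketch, using the seven observations from the median example and a hypothetical function name:

```python
import math

def coefficient_of_variation(data):
    """Standard deviation (divisor n) divided by the mean."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return math.sqrt(variance) / mean

cv = coefficient_of_variation([27, 13, 62, 5, 44, 29, 16])
print(round(cv, 2))  # about 18.17 / 28 = 0.65
```

The ratio is often quoted as a percentage (here about 65%), and it is only meaningful when the mean is not zero.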
SKEWNESS
Skewness is a measure of asymmetry. It
indicates whether there is a concentration
of low or high observations.
A distribution having a lot of low
observations is positively skewed whereas
one which has more high observations
displays negative skewness.
SKEWNESS
A distribution which is symmetrical
has zero skewness (e.g. the
Normal distribution).
MEASURE OF SKEWNESS
Coefficient of skewness:
$\frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
or
$\frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
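Both coefficients can be sketched as below. The inputs are illustrative values for the marks table from the grouped-median example: the median (14.94) comes from the slides, while the mean, mode and standard deviation are rounded results of applying the slides' formulas to that table and are not stated in the original. Function names are hypothetical.

```python
def skewness_from_mode(mean, mode, sd):
    """(Mean - Mode) / Standard deviation."""
    return (mean - mode) / sd

def skewness_from_median(mean, median, sd):
    """3 * (Mean - Median) / Standard deviation."""
    return 3 * (mean - median) / sd

# Illustrative, rounded values for the marks table (see the note above).
mean, median, mode, sd = 14.3, 14.94, 15.86, 5.40

sk_mode = skewness_from_mode(mean, mode, sd)
sk_median = skewness_from_median(mean, median, sd)
print(round(sk_mode, 2), round(sk_median, 2))  # -0.29 -0.36
```

Both coefficients are negative here, which agrees with the mean lying below the median and the mode. The two formulas need not give the same number, since the second relies on an approximate relationship between mean, median and mode.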
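POSITIVE SKEWNESS
[Figure: positively skewed curve; from left to right: Mode, Q2, Mean.]
NEGATIVE SKEWNESS
[Figure: negatively skewed curve; from left to right: Mean, Q2, Mode.]
ZERO SKEWNESS (SYMMETRY)
[Figure: symmetrical curve; Mean, Mode and Median coincide.]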
KURTOSIS
Kurtosis indicates the degree of
‘peakedness’ of a unimodal
frequency distribution.
Kurtosis usually indicates to which
extent a curve (distribution) departs
from the bell-shaped or normal curve.
KURTOSIS
[Figures: platykurtic, mesokurtic and leptokurtic curves.]
KURTOSIS
The formulae for calculating kurtosis
are given by
$\frac{\sum (x - \bar{x})^4}{n s^4}$
and, with frequencies f,
$\frac{\sum f (x - \bar{x})^4}{n s^4}$
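A minimal sketch of the first formula, assuming s is the standard deviation computed with divisor n as elsewhere in these slides. The function name `kurtosis` is hypothetical and the data are the seven observations from the median example.

```python
def kurtosis(data):
    """Sum of fourth-power deviations divided by n * s^4 (s uses divisor n)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n      # s squared
    fourth = sum((x - mean) ** 4 for x in data)
    return fourth / (n * variance ** 2)                    # s^4 = (s^2)^2

k_example = kurtosis([27, 13, 62, 5, 44, 29, 16])
k_two_point = kurtosis([1, 3])   # two equally likely values: exactly 1.0
print(k_example, k_two_point)
```

Under this definition a normal curve gives a value of 3, so results below 3 point towards a platykurtic shape and results above 3 towards a leptokurtic one. With only a handful of observations the figure is a rough indication at best.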