Transcript msv-33

www.making-statistics-vital.co.uk
MSV 33: Measures of Spread
The Bee Academy
‘And our topic today, my
fellow bees, is spread!’
Professor Zzub
‘Mmm...’
‘No, no, no, Millie! I mean,
how can we measure
how spread out a data set is?’
‘I’m lost.
Example please...’
‘The data sets 1, 3, 5, 7, 9 and 3, 4, 5, 6, 7
have the same mean, but the first set
is clearly more spread out than the second.’
‘So you are asking how we could
measure that – how about the
top number take away the
bottom for each set? If the
spread is big, that’ll be big!’
1, 3, 5, 7, 9 and 3, 4, 5, 6, 7
‘Nice idea, Ding – and this measure is
used! It’s called the RANGE.
So the range for our first set
is 9 - 1 = 8, while the
range for our second set is 7 - 3 = 4.’
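A minimal sketch of Ding's idea in Python. The function name `data_range` is made up for this illustration and does not come from the slides.

```python
def data_range(values):
    """Range: top value take away bottom value."""
    return max(values) - min(values)

first_set = [1, 3, 5, 7, 9]
second_set = [3, 4, 5, 6, 7]

print(data_range(first_set))   # 8
print(data_range(second_set))  # 4
```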
‘Let me guess – there’s
more to it than that.’
‘Sadly, Brenda, the range is badly
affected by extreme values or
outliers. It can give a rather
misleading picture of the data.’
1, 3, 5, 7, 9, 11, 13 (Range = 12)
and 3, 4, 5, 6, 7, 8, 20 (Range = 17)
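A small sketch of how a single extreme value drives the range, using the two data sets shown. The helper `data_range` and the variable names are hypothetical, chosen only for this example.

```python
def data_range(values):
    """Range: top value take away bottom value."""
    return max(values) - min(values)

evenly_spread = [1, 3, 5, 7, 9, 11, 13]
with_outlier = [3, 4, 5, 6, 7, 8, 20]

print(data_range(evenly_spread))     # 12
print(data_range(with_outlier))      # 17

# Drop only the last value (20) and the range collapses.
print(data_range(with_outlier[:-1])) # 5
```

Six of the seven values in the second set sit between 3 and 8, yet the range reports 17 because of the single value 20.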
‘Okay, then, don’t take all the data; chuck
away the lowest quarter, and the highest
quarter, and THEN take the range.
Just taking the middle 50%, you’ve got rid
of all those extreme values.’
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
‘Great idea, Paul – so for example with this small data set,
we can mark the quartiles, Q1 = 3,
Q2 = 6 (the median) and Q3 = 9...’
‘... And the Interquartile Range is Q3 – Q1 = 9 – 3 = 6,
the range of the middle 50% of the data.’
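One possible way to check this in Python, using the standard library's `statistics.quantiles`. Quartile conventions differ between textbooks and software, so treat the choice of method here as an assumption; for this data set the default method gives an interquartile range of 6, matching the slide.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
# The default method ("exclusive") is only one of several conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 6.0 9.0
print(iqr)         # 6.0
```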
‘I’ve got another idea!’
‘What’s that, Millie?’
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
‘Go back to this data set again. We could find the mean,
then find the difference of each of these numbers from the mean,
and then add the differences together.
If the numbers are spread out, then this will be big!’
‘That is nearly a great idea, Millie, but watch what happens...’
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
‘So the differences add to 0. Always.’
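A short derivation of why the sum is always 0. The symbols are notation assumed here rather than taken from the slides: x_i for the data values, n for how many there are, and \bar{x} for their mean.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

For 1, 2, ..., 11 the mean is 6, so the differences are -5, -4, ..., 4, 5, and they cancel in pairs.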
‘But that is easily fixed...’
‘Find the POSITIVE difference of each of
these numbers from the mean,
and then add these differences together.
It won’t be 0 now!’
‘Indeed, Virender,
the sum now is 30.
But is that a fair measure of spread?’
‘Surely you have to divide by the total
number of numbers you have to take an average!’
‘Excellent, Ding!
And this takes us to what is called
‘the mean deviation from the mean’.
If we write it in symbols, we have
$$\text{mean deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|.$$’
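A minimal Python sketch of the mean deviation from the mean, applied to the data set 1 to 11. The function name `mean_deviation` is hypothetical.

```python
def mean_deviation(values):
    """Average of the positive differences from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# The positive differences sum to 30, so the result is 30 / 11.
print(round(mean_deviation(data), 3))  # 2.727
```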
‘There is still a problem, however:
the modulus function is not always easy
to handle mathematically. It is true that
|ab| = |a||b|, but it is not generally true
that |a + b| = |a| + |b|.’
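A quick counterexample makes the point; the particular values of a and b are chosen here purely for illustration.

```latex
a = 3,\; b = -3: \qquad
|ab| = |-9| = 9 = |a|\,|b|,
\qquad\text{but}\qquad
|a + b| = |0| = 0 \ne 6 = |a| + |b| .
```

In general only the inequality |a + b| ≤ |a| + |b| holds.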
‘Well, there are other ways to make the
differences from the mean all positive.
You could square the differences, for example!’
‘Great idea, Millie. So we can find the square of the difference
of each of these numbers from the mean,
and then add these together.
Then divide by the total number of numbers we have.’
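The recipe just described, sketched in Python for the data set 1 to 11. The function name `mean_square_deviation` is made up for this example.

```python
def mean_square_deviation(values):
    """Average of the squared differences from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# The squared differences are 25, 16, 9, 4, 1, 0, 1, 4, 9, 16, 25,
# which sum to 110, and 110 / 11 = 10.
print(mean_square_deviation(data))  # 10.0
```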
‘This is called the Mean Square Deviation (MSD), or ‘the population variance’.
If we multiply out, we get an alternative formulation
that is usually easier to calculate,
especially if the mean is not a whole number.’
‘As before.’
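The multiplying out can be written as follows. The notation x_i, n and \bar{x} (data values, number of values, mean) is assumed here for illustration.

```latex
\text{MSD}
  = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \frac{1}{n}\sum_{i=1}^{n} x_i^2
    - 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i
    + \bar{x}^2
  = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 .
```

For 1, 2, ..., 11 the squares sum to 506 and the mean is 6, so the alternative form gives 506/11 - 36 = 46 - 36 = 10, the same value as 110/11 = 10 from the first form.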
‘So have we got it now?
Is this the measure of spread
we generally use?’
‘We are very nearly there, Brenda. There is, sadly,
a problem with the MSD. Most of the time we are
taking a SAMPLE from a population. We would like
the expectation of our variance statistic to be the
variance of the population. But in order for that to
happen...’
‘We have to take our MSD statistic...’
‘And divide by n - 1 rather than n.’
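A small check of this claim, offered as a sketch under stated assumptions rather than a proof. It treats 1, 3, 5, 7, 9 as the whole population and lists every ordered sample of size 2 drawn with replacement (that sampling scheme is an assumption made for this illustration). Exact fractions are used so there is no rounding, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sum_sq_dev(values):
    """Sum of squared differences from the mean, as an exact fraction."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values)

population = [1, 3, 5, 7, 9]
population_variance = sum_sq_dev(population) / len(population)

n = 2  # sample size (assumed for this illustration)
samples = list(product(population, repeat=n))  # all 25 ordered samples

# Average each statistic over every possible sample.
avg_divide_by_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(population_variance)      # 8
print(avg_divide_by_n)          # 4
print(avg_divide_by_n_minus_1)  # 8
```

Dividing by n gives an average that is too small, while dividing by n - 1 gives an average equal to the population variance.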
‘This statistic is called the ‘sample variance’ or simply the
‘variance’. The expected value of this is the population variance.
As with the population variance statistic,
there is an alternative form...’
‘Which is often
easier to use.’
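The alternative form can be written as below, using the same multiplying-out step as for the population variance. The notation x_i, n and \bar{x} is assumed here for illustration.

```latex
\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \frac{1}{n-1}\left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right).
```

For 1, 2, ..., 11 this gives (506 - 11 × 36)/10 = 110/10 = 11.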
‘So is that all the
measures of
spread we need to
know?’
‘I should add, Virender, that we do use the
square root of the MSD (called the RMSD) and
the square root of the variance
(called the Standard Deviation, or SD)
as measures of spread too. The advantage
of the RMSD and the SD is that they are
measured in the same units as the random
variable we are interested in.’
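Both square roots are available in Python's standard library, which gives a quick way to try them on the data set 1 to 11: `statistics.pstdev` divides by n (the RMSD) and `statistics.stdev` divides by n - 1 (the SD). The variable names are chosen only for this sketch.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

rmsd = statistics.pstdev(data)  # square root of the MSD (divide by n)
sd = statistics.stdev(data)     # square root of the variance (divide by n - 1)

print(round(rmsd, 3))  # 3.162
print(round(sd, 3))    # 3.317

# The same values by hand: MSD = 10 and variance = 11 for this data set.
print(math.isclose(rmsd, math.sqrt(10)))  # True
print(math.isclose(sd, math.sqrt(11)))    # True
```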
‘So to summarise...’
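Range = top value – bottom value.
Interquartile range (IQR) = Q3 – Q1, where the quartiles Q1, Q2 and Q3 divide the data set into four groups of equal size.
Mean Square Deviation (population variance): $\text{MSD} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Root Mean Square Deviation: $\text{RMSD} = \sqrt{\text{MSD}}$.
Variance (or sample variance) $= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
Standard Deviation $= \sqrt{\text{Variance}}$.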
With thanks to pixabay.com
www.making-statistics-vital.co.uk
is written by Jonny Griffiths