STORIES AND STATISTICS

Prepared by Frank Swain
National Coordinator for Science Training for Journalists
Royal Statistical Society
020 7614 3947
Contents
Communicating numbers
Percentages & percentage points
Surveys
Averages
Uncertainty
Trends
Correlation versus causation
Probabilities: what makes a value unusual?
Absolute and relative risk
Imagery
Communicating numbers
Your numbers are characters in the story: give them some personality
Breaking down big numbers
“1.4 million photos are uploaded a second”
1.4m photos x 86,400 seconds in a day ÷ 500 million users
= about 240 photos per person per day
Realistic?
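The sanity check above can be reproduced in a few lines. This is a minimal sketch using only the figures quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide; the variable names are illustrative.
photos_per_second = 1_400_000   # "1.4 million photos are uploaded a second"
seconds_per_day = 86_400
users = 500_000_000

# Scale the headline figure down to one person's day.
photos_per_user_per_day = photos_per_second * seconds_per_day / users
print(round(photos_per_user_per_day))  # roughly 240 photos per person per day
```

A per-person figure this large suggests the original claim deserves a second look.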
Putting numbers in context
Numbers often need to be scaled to
be meaningful e.g. per person, per
passenger mile etc.
[Image labels: tourist info centres, hospitals]
Putting numbers in context
“The implant has been used by around 1.4 million women since it was introduced in 1999. In its 11 years of use, medicine regulators have recorded 584 pregnancies among users”
“…for every 1,000 women using it, less than one will get pregnant over a three-year period”
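As a rough illustration of scaling, the raw count can be turned into a rate per 1,000 users. This sketch uses only the two figures quoted above. It is a crude ratio over all users since 1999, not the three-year rate in the second quote, and it ignores how long each woman used the implant.

```python
# Figures quoted above; the variable names are illustrative.
recorded_pregnancies = 584
users = 1_400_000

# Crude scaling: recorded pregnancies per 1,000 women who have used the implant.
per_thousand = recorded_pregnancies / users * 1_000
print(f"{per_thousand:.2f} recorded pregnancies per 1,000 users")
```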
Percentages
Percentages below 1% are difficult
to interpret: “3 in every 10,000”
is clearer than 0.03%.
Also be careful with percentages
above 100%: it can be clearer to
say double, triple, etc.
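One way to apply the first tip is to convert a small percentage into a "so many in every N" phrase. The helper below is a hypothetical sketch, not a standard function. It picks the smallest power of ten that gives a count of at least one.

```python
from fractions import Fraction

def as_frequency(percent: str) -> str:
    """Express a small percentage as 'n in every N' (hypothetical helper)."""
    share = Fraction(percent) / 100      # exact arithmetic, no rounding error
    if share <= 0:
        raise ValueError("percentage must be positive")
    base = 100
    while share * base < 1:              # grow the base until the count reaches 1
        base *= 10
    count = share * base
    if count.denominator == 1:
        number = str(count.numerator)
    else:
        number = f"{float(count):.1f}"
    return f"{number} in every {base:,}"

print(as_frequency("0.03"))  # 3 in every 10,000
```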
Percentages
Know the difference between a
percentage and a percentage point.
VAT increased from 17.5% to 20% in January 2011.
This is a rise of 2.5 percentage points,
not a rise of 2.5%.
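The two measures can be put side by side. This sketch assumes the previous standard rate was 17.5%, which is implied by the 2.5 percentage point rise to 20%; the variable names are illustrative.

```python
old_rate = 17.5   # assumed previous rate in percent (20% minus 2.5 points)
new_rate = 20.0   # rate from January 2011, in percent

# Percentage points: simple difference between the two rates.
point_change = new_rate - old_rate

# Percent: the change relative to where the rate started.
relative_change = (new_rate - old_rate) / old_rate * 100

print(f"{point_change:.1f} percentage points")
print(f"{relative_change:.1f}% relative rise")
```

On these figures the tax rate itself went up by roughly 14%, which is a very different statement from "2.5%".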
UK smoking rate
1948: 26m smokers (65%)
1970: 25m smokers (55%)
“The smoking population shrank by 4 per cent”
“The smoking rate has declined 10 percentage points”
[Pictogram key: one figure = 1 million smokers or 1 million non-smokers]
Surveys
What’s been counted?
How many… chairs? footprints? hearts beating? ballot papers? …people?
Polls and surveys
• Polls are ways of finding out
what a population thinks
without asking everyone
• Sample size: a poll of 1,000
people has a margin of error of
about ± 3 percentage points (95%
confidence interval) from sampling alone
• So be careful with small
subgroups of the sample: 100
people gives about ± 10 percentage points
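The ± 3 and ± 10 figures can be approximated with the usual margin-of-error formula. This is a sketch under stated assumptions: a simple random sample, a 95% confidence level (z = 1.96) and the worst-case split of 50/50. Real polls with weighting or clustering will have wider margins.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for a sample of n."""
    return z * math.sqrt(p * (1 - p) / n) * 100

for sample_size in (1000, 100):
    print(sample_size, round(margin_of_error(sample_size), 1))
```

Because the margin shrinks with the square root of the sample size, a subgroup a tenth of the size has a margin roughly three times as wide.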
Survey example
“…couples now expect to
blow an average of £20,273
tying the knot…”
• Which average?
• Whose wedding?
• Who’s asking?
Do you have the exact
questions the pollster asked?
Are they precise and fair?
Polls and surveys
Do the people surveyed reflect
the wider population?
(selection bias)
Were the questions asked in a fair
way?
(response bias)
Who commissioned the survey?
Statistical significance
• So how do we know if an event really
is interesting or if it was just random
variation?
• That’s what ‘statistical significance’ is
about.
• For example, is a cluster of cancer
cases in an area suspicious or likely to
be just natural variation?
League tables
League tables are often meaningless because the natural
variation is far bigger than the differences in the table
There are many different ways of
calculating an average.
Which is the appropriate one to use?
Variation and distributions
We often want to
summarise a distribution
of values with one
number – an average.
But there are different
types of average: mean,
median and mode.
Averages
Average does not mean the same
thing as typical.
Different averages tell different stories
– say which you are using.
Averages
Mode, £275
Median, £377
Mean, £463
Averages
Bottom line:
Give an idea of the size and shape
of the spread around the average.
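To see how the three averages can diverge, here is a small sketch using Python's standard statistics module. The amounts are hypothetical and deliberately right-skewed; they are not the data behind the slides.

```python
import statistics

# Hypothetical, right-skewed amounts in pounds (not taken from the slides).
values = [250, 275, 275, 275, 300, 350, 400, 450, 600, 1450]

mean = statistics.mean(values)        # pulled upwards by the one large value
median = statistics.median(values)    # the middle of the ordered values
mode = statistics.mode(values)        # the most common value

# Quartiles give a sense of the spread around the average.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"mode {mode}, median {median}, mean {mean}")
print(f"middle half of values lies between {q1} and {q3}")
```

With skewed data the mode, median and mean line up in that order, so the choice of average changes the story.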
Normal distribution
About 68.2% of values lie within one standard deviation of the mean,
and about 95.4% within two.
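These two percentages can be checked with the error function from Python's math module. This is a sketch for an idealised normal distribution; real data rarely follows it exactly. Note that the first figure rounds to 68.3%, so the 68.2% above appears to be truncated.

```python
import math

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(f"within 1 sd: {share_within(1):.1%}")
print(f"within 2 sd: {share_within(2):.1%}")
```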
A WORD ON “AVERAGE”
Do countries win more Olympic medals at home?
[Bar chart, 0 to 120 medals: medals won on average at away Games versus medals won at home Games, for South Korea, Spain, United States, Australia, Greece, China and Great Britain]
How accurate are the figures?
“The number of people out of work rose
by 38,000 to 2.49 million in the three
months to June, official figures show.”
GOLDACRE: “The estimated change over the past quarter is 38,000,
but the 95% confidence interval is ± 87,000, running from -49,000
to 125,000. That wide range clearly includes zero, no change at all.”
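The check described in the quote takes only a few lines. This sketch uses only the figures quoted above; the variable names are illustrative.

```python
estimate = 38_000   # reported rise in the number of people out of work
margin = 87_000     # 95% confidence interval half-width, from the quote

low, high = estimate - margin, estimate + margin
includes_zero = low <= 0 <= high

print(f"Interval: {low:,} to {high:,}")
print("Could be no change at all" if includes_zero else "Change is distinguishable from zero")
```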
One change in the numbers
does not make a trend.
Blips often happen.
Trends
Beware spurious connections that don’t
amount to ‘a causes b’.
Correlation and causation
• A significant correlation between two variables
does not imply one causes the other.
• Often there is a common cause for both
variables, or it’s just a coincidence.
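A small simulation can show how a common cause produces a strong correlation between two variables that have no effect on each other. Everything here is hypothetical: the data is randomly generated with a fixed seed, and the names are illustrative.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# A hidden common cause drives both a and b; a and b never influence each other.
cause = [rng.gauss(0, 1) for _ in range(2000)]
a = [c + rng.gauss(0, 0.5) for c in cause]
b = [c + rng.gauss(0, 0.5) for c in cause]

r_raw = pearson(a, b)

# Subtract the common cause and the relationship disappears.
a_rest = [x - c for x, c in zip(a, cause)]
b_rest = [y - c for y, c in zip(b, cause)]
r_adjusted = pearson(a_rest, b_rest)

print(f"correlation of a and b: {r_raw:.2f}")
print(f"after removing the common cause: {r_adjusted:.2f}")
```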
“Regression to the mean”
The most abused correlation
in the world!
“One in a million”.
Probability and coincidences
• The chance of an
event can be very
small, but if it has lots
of opportunities to
happen, it can be near
certain.
• Most weeks someone
wins the lottery.
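The effect of many opportunities can be sketched as follows. The numbers are hypothetical (a one-in-a-million event given five million chances), and the calculation assumes the opportunities are independent of one another.

```python
def chance_of_at_least_one(p: float, opportunities: int) -> float:
    """Probability an event with chance p happens at least once in n independent tries."""
    return 1 - (1 - p) ** opportunities

# Hypothetical figures: a "one in a million" event with five million chances to occur.
result = chance_of_at_least_one(1e-6, 5_000_000)
print(f"{result:.1%} chance of happening at least once")
```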
Probability
“the chances… an astonishing 48 million to one”
Actually it’s only 133,000 to one…
…and there are around 167,000 third children born in the UK each year.
Always think about how
many opportunities there
were for a coincidence to
happen
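The two odds quoted above are consistent with a shared-birthday coincidence, since 365 cubed is about 48 million and 365 squared is about 133,000. The sketch below assumes that is what the claim concerned (three siblings born on the same date), which the slide text does not state. It also assumes birthdays are evenly spread over 365 days and independent.

```python
days = 365

# Chance that all three children are born on one date named in advance.
headline_odds = days ** 3

# Chance that the second and third child match the first child's birthday,
# whatever that date happens to be.
actual_odds = days ** 2

# Number of opportunities each year, from the slide.
third_children_per_year = 167_000
expected_per_year = third_children_per_year / actual_odds

print(f"headline: {headline_odds:,} to one")
print(f"actual: {actual_odds:,} to one")
print(f"expected cases per year: {expected_per_year:.2f}")
```

Under these assumptions the coincidence would be expected about once a year in the UK.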
Probability
Extremes
You should know what the absolute and
the relative risks are, and communicate both.
What, me worry?
Risk
Google tells me….
diabetes,
weight gain,
cigarette smoke,
HRT,
solariums
…all “double” my risk of cancer
Risk example
“Bacon increases risk of colorectal
cancer by 20%”
But how bad is that?
Risk example
About 5 out of 100
people develop
colorectal cancer.
Risk example
If all 100 ate 3 extra rashers
every day... The number
would rise to six
So…
“Bacon increases risk of colorectal cancer by 20%”
is therefore the same as saying
“About 1 extra case per 100 people”
Risk
• Absolute risk increases from 5% to 6%
• Absolute risk increases by 1 percentage
point
• Relative risk increases by 20%
• 100 people eating 50g of processed meat
every day for the rest of their lives would
lead to 1 extra case of colorectal cancer
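The conversion between relative and absolute risk can be sketched in a few lines, using only the figures quoted above. The function name and its parameters are illustrative.

```python
def risk_summary(baseline: float, relative_increase: float, group_size: int = 100):
    """Turn a baseline risk and a relative increase into absolute terms."""
    new_risk = baseline * (1 + relative_increase)
    point_change = (new_risk - baseline) * 100       # percentage points
    extra_cases = (new_risk - baseline) * group_size # extra cases in the group
    return new_risk, point_change, extra_cases

# About 5 in 100 people develop colorectal cancer; the headline says "+20%".
new_risk, point_change, extra_cases = risk_summary(0.05, 0.20)

print(f"absolute risk rises from 5% to {new_risk:.0%}")
print(f"that is {point_change:.0f} percentage point")
print(f"or about {extra_cases:.0f} extra case per 100 people")
```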
Apply the same rules to a graphic that
you would a story: strive for accuracy,
clarity and a strong narrative.
Visualising data
Resources
Royal Statistical Society
StraightStatistics.org
FullFact.org
STATS.org
UnderstandingUncertainty.org