Summary Statistics and Confidence Intervals


Summary Statistics &
Confidence Intervals
Annie Herbert
Medical Statistician
Research & Development Support Unit
Salford Royal NHS Foundation Trust
[email protected]
0161 2064567
Timetable
Time      Task
60 mins   Presentation
20 mins   Coffee Break
90 mins   Practical Tasks in IT Room
Outline
• Sampling
• Summary statistics
• Confidence intervals
• Statistics Packages
‘Population’ and ‘Sample’
• Studying population of interest. Usually would
like to know typical value and spread of outcome
measure in population.
• Data from entire population usually impossible
or inefficient/expensive so take a sample
(even census data can have missing values).
• Sample must be representative of population.
• Randomise!
E.g. Randomised Controlled Trial (RCT)
POPULATION → SAMPLE → RANDOMISATION → GROUP 1 → OUTCOME
                                    → GROUP 2 → OUTCOME
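As a rough illustration of the randomisation step, the sketch below shuffles a sample and splits it into two groups. The participant IDs, the seed and the function name `randomise` are assumptions made for this example only; real trials normally use a dedicated randomisation service rather than ad hoc code.

```python
import random


def randomise(participants, seed=None):
    """Randomly split participants into two groups of (near-)equal size."""
    rng = random.Random(seed)      # seed is only here to make the example repeatable
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


sample_ids = list(range(1, 21))    # hypothetical participant IDs 1 to 20
group_1, group_2 = randomise(sample_ids, seed=2024)
print("Group 1:", sorted(group_1))
print("Group 2:", sorted(group_2))
```

Because allocation depends only on chance, neither the researcher nor the participant can influence who ends up in which group.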
Types of Data
            Categorical          Numerical/Continuous
Example:    • Yes/No             • Weight
            • Blood Group        • Pain Score
Graphs:     • Bar Chart          • Histogram
            • Pie Chart          • Box and Whisker Plot
Summary:    • Frequency (n)      • Mean & Standard Deviation (SD)
            • Proportion (%)     • Median & Inter-quartile range (IQR)
Types of Average
(‘Average’ - a number which typifies a set of numbers)
• Mean = Total divided by n
• Median = Middle value
• Mode = Most common value/group
(rarely used)
Types of Average - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
Mean = (1 + 3 + 4 + … + 10) ÷ 10 = 5.8
Median = (6 + 7) ÷ 2 = 6.5 (midway between the 5th and 6th ordered values)
Mode = 7
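The three averages for the pain score data can be checked with a few lines of Python. This is a minimal sketch using the standard library `statistics` module; the variable name `pain_scores` is just a label chosen for the example.

```python
from statistics import mean, median, mode

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

print("Ordered:", sorted(pain_scores))
print("Mean:   ", mean(pain_scores))    # total divided by n
print("Median: ", median(pain_scores))  # average of the two middle values, as n is even
print("Mode:   ", mode(pain_scores))    # most common value
```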
Mean or Median?
Roughly Normally distributed:
• Mean or median
• Mean by convention
[Figures: histogram of roughly Normally distributed data; histogram of skewed data]
Skewed:
• Median
• Less affected by extreme
values
Variation and Spread
• Standard Deviation (‘SD’)
- Roughly the average distance of values from the mean
- Use alongside mean
• Inter-Quartile Range (‘IQR’)
- Range in which middle 50% of the data lie
(middle 50% when ordered)
- Use alongside median
• Range
- Highest and lowest value
- Possibly quote in addition to SD/IQR
Types of Variation - Example
Pain score data: 10, 8, 7, 7, 1, 7, 6, 5, 3, 4
Ordered: 1, 3, 4, 5, 6, 7, 7, 7, 8, 10
Quartiles: lower between the 2nd and 3rd values, median between the 5th and 6th, upper between the 8th and 9th
SD = 2.6
IQR = (3.75, 7.25)
Range = (1, 10)
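The same figures can be reproduced in Python as a check. This is a sketch, not the method used to prepare the slide: statistics packages differ in how they interpolate quartiles, and the `"exclusive"` method of `statistics.quantiles` is chosen here because it happens to give the values quoted above.

```python
from statistics import quantiles, stdev

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

sd = stdev(pain_scores)  # sample SD, which divides by n - 1
# Quartiles by the (n + 1) position rule; other rules give slightly different values
q1, q2, q3 = quantiles(pain_scores, n=4, method="exclusive")

print(f"SD    = {sd:.1f}")
print(f"IQR   = ({q1}, {q3})")
print(f"Range = ({min(pain_scores)}, {max(pain_scores)})")
```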
Standard Error
• Not the same as standard deviation.
• Calculated using a measure of variability
and sample size.
• Used to construct confidence intervals.
• Not very informative when given alongside
statistics or as error bars on a plot.
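For a sample mean, the usual formula is SE = SD ÷ √n. The sketch below applies it to the pain score data used earlier; the larger sample sizes in the loop are hypothetical and are only there to show how the SE shrinks as n grows while the SD stays the same.

```python
from math import sqrt
from statistics import stdev

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

n = len(pain_scores)
sd = stdev(pain_scores)  # variability of individual values
se = sd / sqrt(n)        # uncertainty in the sample mean
print(f"n = {n}: SD = {sd:.2f}, SE of mean = {se:.2f}")

# Hypothetical larger samples with the same SD: the SE falls, the SD does not
for bigger_n in (40, 160):
    print(f"n = {bigger_n}: SD = {sd:.2f}, SE of mean = {sd / sqrt(bigger_n):.2f}")
```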
Sample statistic is the best guess
of the (true) population value
• E.g. Sample mean is the best estimate of
mean in population.
• Mean is likely to differ if we take a new
sample from the population.
• So we know the estimate is unlikely to be
exactly right.
Confidence Intervals (CIs)
• Confidence interval = “range of values that we
can be confident will contain the true value of
the population”.
• The “give or take a bit” for best estimate.
• Convention is to use a 95% confidence interval
(‘95% CI’).
• But also leaves 5% confidence that this interval
does not contain the true value.
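As an illustration, a 95% CI for a mean is commonly calculated as mean ± t × SE. The sketch below does this for the pain score data from the earlier slides. The interval is computed here for illustration and is not quoted in the presentation; the critical value 2.262 (t distribution, 9 degrees of freedom) is hard-coded because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

pain_scores = [10, 8, 7, 7, 1, 7, 6, 5, 3, 4]

# Two-sided 95% critical value of the t distribution with n - 1 = 9 degrees of
# freedom, taken from standard tables (assumption: only valid for n = 10)
T_CRIT_95_DF9 = 2.262

n = len(pain_scores)
estimate = mean(pain_scores)
se = stdev(pain_scores) / sqrt(n)

lower = estimate - T_CRIT_95_DF9 * se
upper = estimate + T_CRIT_95_DF9 * se
print(f"Mean = {estimate:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

With only 10 observations the interval is wide, which is the "give or take a bit" made explicit.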
Example: Legislation for smoke-free workplaces and health of bar workers in Ireland: before and after study (Allwright et al; BMJ Oct 2005)

                                     Before (N=138)   After (N=138)   Difference (95% CI)
Salivary cotinine (nmol/l), median   29.0             5.1             -22.7 (-26.7 to -19.0)
Any respiratory symptoms, n (%)      90 (65%)         67 (49%)        -16.7 (-26.1 to -7.3)
Runny nose/sneezing, n (%)           61 (44%)         48 (35%)        -9.4 (-19.8 to 0.9)
Example: Supplementary feeding with either ready-to-use fortified spread or corn-soy blend in wasted adults
starting antiretroviral therapy in Malawi
(MacDonald et al; BMJ May 2009)
“After 14 weeks, patients receiving
fortified spread had a greater increase
in BMI and fat-free body mass than
those receiving corn-soy blend: 2.2 (SD
1.9) v 1.7 (SD 1.6) (difference 0.5, 95%
confidence interval 0.2 to 0.8), and 2.9
(SD 3.2) v 2.2 (SD 3.0) kg (difference 0.7
kg, 0.2 to 1.2 kg), respectively.”
Example: Sample size matters
What proportion of patients attending clinic are satisfied?
Sample size   Number satisfied   Proportion satisfied   95% CI for proportion
10            7                  70%                    35% to 93%
25            18                 70%                    50% to 88%
50            35                 70%                    55% to 82%
100           70                 70%                    60% to 79%
1000          700                70%                    67% to 73%
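The slide does not say how these intervals were calculated. The sketch below assumes the "exact" (Clopper-Pearson) method, which gives 35% to 93% for 7 out of 10; the function names `binom_cdf`, `solve_p` and `exact_ci` are invented for this example, and other methods would give somewhat different limits.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return total


def solve_p(k, n, target):
    """Bisection for the p at which P(X <= k) equals target (the CDF falls as p rises)."""
    low, high = 0.0, 1.0
    for _ in range(60):
        mid = (low + high) / 2
        if binom_cdf(k, n, mid) > target:
            low = mid   # p too small: too much probability at or below k
        else:
            high = mid
    return (low + high) / 2


def exact_ci(successes, n, confidence=0.95):
    """Clopper-Pearson interval for a proportion (assumed method, see lead-in)."""
    alpha = 1 - confidence
    lower = 0.0 if successes == 0 else solve_p(successes - 1, n, 1 - alpha / 2)
    upper = 1.0 if successes == n else solve_p(successes, n, alpha / 2)
    return lower, upper


for n, satisfied in [(10, 7), (25, 18), (50, 35), (100, 70), (1000, 700)]:
    lower, upper = exact_ci(satisfied, n)
    print(f"n = {n:>4}: {satisfied / n:.0%} satisfied, 95% CI {lower:.1%} to {upper:.1%}")
```

The estimate stays at about 70% throughout, but the interval narrows steadily as the sample size grows. Note that 18 out of 25 is 72%, so that row of the output will not match the table to the last digit.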
Example: % confidence matters
What proportion of patients attending clinic are satisfied?
Sample size = 50
No. satisfied = 35
Proportion satisfied = 70%
90% CI: 58% to 81%
95% CI: 55% to 82%
99% CI: 51% to 85%
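To see why a higher confidence level widens the interval, the sketch below uses the simple Normal approximation, estimate ± z × SE. This is a swapped-in technique chosen for brevity, not the method behind the figures above, so its limits differ by a percentage point or two; the point is how the multiplier z grows with the confidence level.

```python
from math import sqrt
from statistics import NormalDist

n, satisfied = 50, 35
p = satisfied / n
se = sqrt(p * (1 - p) / n)  # standard error of a proportion (Normal approximation)

intervals = {}
for confidence in (0.90, 0.95, 0.99):
    # z leaves (1 - confidence) / 2 in each tail of the standard Normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    intervals[confidence] = (p - z * se, p + z * se)
    lower, upper = intervals[confidence]
    print(f"{confidence:.0%} CI: z = {z:.3f}, {lower:.1%} to {upper:.1%}")
```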
p-values vs. Confidence Intervals
• p-value:
- Weight of evidence to reject null hypothesis
- No clinical interpretation
• Confidence Interval:
- Can be used to reject null hypothesis
- Clinical interpretation
- Effect size
- Direction of effect
- Precision of population estimate
So… it’s not all about p-values!
• For some hypotheses p-value and CI will both
indicate whether to reject it or not.
• A CI will also provide an estimate, as well as a
range for that estimate.
• General medical journals prefer CI.
Statistical Packages
Package       Summary Statistics                         Confidence Intervals
SPSS          • Not user-friendly                        Doesn’t provide a CI for some key
              • Gives a large choice of statistics       comparative statistics,
                to calculate                             e.g. simple percentage
StatsDirect   • One right-click                          Provides a CI for most statistics
              • Will produce a set of 20 or so of the
                most commonly used statistics
Thanks for listening!