Lecture 01. Introduction to Medical Statistics
Why Do Statistics?
Extrapolate from data collected to make general
conclusions about larger population from which
data sample was derived
Allows general conclusions to be made from limited
amounts of data
To do this we must assume that all data is randomly
sampled from an infinitely large population, then
analyse this sample and use results to make
inferences about the population
Walter Frank Raphael Weldon
Karl Pearson
Data
Categorical data: values belong to categories
Nominal data: there is no natural order to the categories
e.g. blood groups
Ordinal data: there is a natural order e.g. Adverse Events
(Mild/Moderate/Severe/Life Threatening)
Binary data: there are only two possible categories
e.g. alive/dead
Numerical data: the value is a number
(either measured or counted)
Continuous data: measurement is on a continuum
e.g. height, age, haemoglobin
Discrete data: a “count” of events e.g. number of pregnancies
Descriptive Statistics:
concerned with summarising or describing a
sample, e.g. mean, median
Inferential Statistics:
concerned with generalising from a sample,
to make estimates and inferences about a
wider population, e.g. T Test, Chi Square test
Statistical Terms
Mean: the average of the data
sensitive to outlying data
Median: the middle of the data
not sensitive to outlying data
Mode: most commonly occurring value
Range: the spread of the data (minimum to maximum)
Interquartile (IQ) range: the spread of the middle 50% of the data
(25th to 75th percentile)
commonly used for skewed data
Standard deviation: a single number which measures how much
the observations vary around the mean
Symmetrical data: data that follows normal distribution
(mean=median=mode)
report mean & standard deviation & n
Skewed data: not normally distributed
(mean ≠ median ≠ mode)
report median & IQ Range
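As a small illustration of the terms above, the sketch below uses Python's standard `statistics` module on a made-up list of haemoglobin values. The data and the helper name `summarise` are hypothetical, not from the lecture. It shows how a single outlying value moves the mean and standard deviation much more than the median and IQ range.

```python
import statistics

def summarise(values):
    """Return the descriptive statistics named in the lecture for one sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": (min(values), max(values)),
        "iq_range": (q1, q3),
        "sd": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical haemoglobin values (g/dL), invented for illustration only.
haemoglobin = [12.1, 12.8, 13.0, 13.4, 13.5, 13.9, 14.2, 14.6, 15.0]
with_outlier = haemoglobin + [25.0]  # one implausibly high value added

before = summarise(haemoglobin)
after = summarise(with_outlier)

for name in ("mean", "median", "sd", "iq_range"):
    print(name, before[name], "->", after[name])
```

This is why the slide suggests reporting median and IQ range for skewed data, and mean, standard deviation and n for symmetrical data.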
Standard Normal Distribution
Mean +/- 1 SD encompasses about 68% of observations
Mean +/- 2 SD encompasses about 95% of observations
Mean +/- 3 SD encompasses about 99.7% of observations
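These percentages can be checked directly from the normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `fraction_within` is only illustrative. It also shows that the multiplier giving exactly 95% coverage is a little under 2 SD, which is where the familiar 1.96 comes from.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {fraction_within(k):.1%}")

# Multiplier that leaves 2.5% in each tail, i.e. exactly 95% in the middle.
z_95 = standard_normal.inv_cdf(0.975)
print(f"95% of observations lie within {z_95:.2f} SD of the mean")
```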
Steps in Statistical Testing
Null hypothesis
Ho: there is no difference between the groups
Alternative hypothesis
H1: there is a difference between the groups
Collect data
Calculate the test statistic, e.g. T test, Chi Square
Interpret P value and confidence intervals
P value ≤ 0.05 Reject Ho
P value > 0.05 Accept Ho
Draw conclusions
Meaning of P
P Value: the probability of observing a result
as extreme or more extreme than the one
actually observed, from chance alone, if the null
hypothesis is true
Lets us decide whether to reject or accept the
null hypothesis
P > 0.05: Not significant
P = 0.01 to 0.05: Significant
P = 0.001 to 0.01: Very significant
P < 0.001: Extremely significant
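The bands above can be written as a small lookup. This is only a sketch: the function name `describe_p_value` is hypothetical, and because the bands on the slide share their end points, the choice of which band a value of exactly 0.05, 0.01 or 0.001 falls into is an assumption made here.

```python
def describe_p_value(p):
    """Map a P value to the wording used in the lecture's table.

    Boundary handling (0.05, 0.01 and 0.001 go to the more significant
    band) is an assumption; the slide does not specify it.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p > 0.05:
        return "Not significant"
    if p >= 0.01:
        return "Significant"
    if p >= 0.001:
        return "Very significant"
    return "Extremely significant"

for p in (0.903, 0.39, 0.03, 0.004, 0.0001):
    print(p, describe_p_value(p))
```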
T Test
T test checks whether two samples are likely to have come from the same
or different populations
Used on continuous variables
Example: Age of patients in the APC study (APC/placebo)
PLACEBO: mean age 60.6 years, SD +/- 16.5, n = 840, 95% CI 59.5-61.7
APC: mean age 60.5 years, SD +/- 17.2, n = 850, 95% CI 59.3-61.7
What is the P value?
0.01 / 0.05 / 0.10 / 0.90 / 0.99
P = 0.903: not significant, consistent with patients from the same population
(groups designed to be matched by randomisation so no surprise!!)
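The P value in this example can be approximated from the summary figures alone. The sketch below is not necessarily how the lecturer obtained 0.903: it uses the normal approximation to the t distribution, which is reasonable here only because both groups are large, and the function name `two_sample_p_value` is hypothetical. Each mean is paired with the SD and n that reproduce its quoted 95% CI.

```python
import math
from statistics import NormalDist

def two_sample_p_value(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided P value for a difference in means from summary statistics.

    Normal approximation to the t test; suitable only for large groups.
    """
    # Standard error of the difference between the two sample means.
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_diff
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Placebo: 60.6 years, SD 16.5, n 840. APC: 60.5 years, SD 17.2, n 850.
p_age = two_sample_p_value(60.6, 16.5, 840, 60.5, 17.2, 850)
print(f"P = {p_age:.3f}")
```

A difference of 0.1 years is tiny compared with the standard error of the difference (about 0.8 years), so the result lands very close to the value quoted on the slide.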
T Test: SAFE “Serum Albumin”
           n      mean   SD    95% CI
PLACEBO    3500   28     10    27.7-28.3
ALBUMIN    3500   30     10    29.7-30.3
Q: Are these albumin levels different?
Ho = Levels are the same (any difference is there by chance)
H1 = Levels are too different to have occurred purely by chance
Statistical test: T test P < 0.0001 (extremely significant)
Reject null hypothesis (Ho) and accept alternate hypothesis (H1)
i.e. if both samples came from the same overall group, a difference
this large would arise by chance less than 1 time in 10 000,
therefore we can say they are very likely to be different
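The 95% confidence intervals in the albumin table follow from the mean, SD and n. A minimal sketch, assuming the usual large-sample formula of mean +/- 1.96 standard errors (the function name `confidence_interval_95` is hypothetical):

```python
import math

def confidence_interval_95(mean, sd, n):
    """Approximate 95% CI for a mean: mean +/- 1.96 standard errors."""
    standard_error = sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

placebo_ci = confidence_interval_95(28, 10, 3500)
albumin_ci = confidence_interval_95(30, 10, 3500)

# True only if the top of the placebo interval reaches the albumin interval.
intervals_overlap = placebo_ci[1] >= albumin_ci[0]

print([round(x, 1) for x in placebo_ci])
print([round(x, 1) for x in albumin_ci])
print("overlap:", intervals_overlap)
```

The two intervals are narrow because n is large, and they do not overlap, which fits with the very small P value reported for this comparison.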
RANDOMIZED CONTROLLED TRIALS
Reducing Sample Size
Same results but using much smaller sample size (one tenth)
           ALIVE          DEAD          TOTAL         % DEAD
PLACEBO    58 (69.2%)     26 (30.8%)    84 (100%)     30.8
APC        64 (75.3%)     21 (24.7%)    85 (100%)     24.7
TOTAL      122 (72.2%)    47 (27.8%)    169 (100%)
Reduction in death rate = 6.1% (still the same)
Perform Chi Square test: P = 0.39
If there were truly no difference, a difference in mortality this
large would happen by chance 39 in 100 times, therefore results
not significant
Again, power of a study to find a difference depends a lot
on sample size for binary data as well as continuous data
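A Chi Square test on a 2x2 table like the one above can be sketched with the standard library alone. This version is the Pearson test without continuity correction, which is an assumption: the slide does not say which variant was used, so the P value it prints need not match the quoted 0.39 exactly, although the conclusion (not significant) is the same. The function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi-square statistic, P value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: placebo, treated. Columns: alive, dead (counts from the reduced table).
chi2, p_value = chi_square_2x2(58, 26, 64, 21)
print(f"chi-square = {chi2:.2f}, P = {p_value:.2f}")
```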
Summary
Size matters = BIGGER IS BETTER
Spread matters = SMALLER IS BETTER
Bigger difference = EASIER TO FIND
Smaller difference = MORE DIFFICULT TO FIND
To find a small difference you need a big study
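The effect of study size can be made concrete by holding the difference and the spread fixed and changing only n. The sketch below reuses the difference of 2 and SD of 10 from the albumin example; the group sizes are arbitrary illustrative choices, the function name is hypothetical, and the normal approximation it relies on is rough for the smallest groups.

```python
import math

def p_value_for_equal_groups(difference, sd, n_per_group):
    """Two-sided P value (normal approximation) for a difference in means
    between two equal-sized groups that share the same SD."""
    se_diff = sd * math.sqrt(2.0 / n_per_group)
    z = abs(difference) / se_diff
    # erfc keeps precision when the P value is extremely small.
    return math.erfc(z / math.sqrt(2.0))

# Same difference (2) and same spread (SD 10); only the group size changes.
sizes = (10, 35, 350, 3500)
p_values = [p_value_for_equal_groups(2, 10, n) for n in sizes]

for n, p in zip(sizes, p_values):
    print(f"n per group = {n:5d}  P = {p:.2g}")
```

With the same difference and spread, the P value falls steadily as the groups get larger, which is the point of the last line of the summary.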