Lecture 01. Introduction to Medical Statistics

Why Do Statistics?

Extrapolate from data collected to make general
conclusions about larger population from which
data sample was derived

Allows general conclusions to be made from limited
amounts of data

To do this we must assume that all data is randomly
sampled from an infinitely large population, then
analyse this sample and use results to make
inferences about the population
(Pictured: Walter Frank Raphael Weldon and Karl Pearson)
Data


Categorical data:  values belong to categories

Nominal data: there is no natural order to the categories
e.g. blood groups

Ordinal data: there is a natural order to the categories
e.g. adverse events (Mild/Moderate/Severe/Life Threatening)

Binary data: there are only two possible categories
e.g. alive/dead
Numerical data:  the value is a number
(either measured or counted)

Continuous data: measurement is on a continuum
e.g. height, age, haemoglobin

Discrete data: a “count” of events e.g. number of pregnancies

Descriptive Statistics:
concerned with summarising or describing a
sample e.g. mean, median

Inferential Statistics:
concerned with generalising from a sample,
to make estimates and inferences about a
wider population e.g. T test, Chi Square test
Statistical Terms

Mean: the average of the data
 sensitive to outlying data
Median: the middle value of the ordered data
 not sensitive to outlying data
Mode: the most commonly occurring value
Range: the spread of the data (largest value minus smallest value)
IQ range (interquartile range): the spread of the middle 50% of the data
 commonly used for skewed data
Standard deviation: a single number which measures how much the observations vary around the mean
Symmetrical data: data that follows a normal distribution
 (mean = median = mode)
 report mean & standard deviation & n
Skewed data: not normally distributed
 (mean ≠ median ≠ mode)
 report median & IQ range
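The terms above can be tried out on a small sample. This is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical (a right-skewed sample such as length of stay in days) and are not taken from the lecture.

```python
import statistics

# Hypothetical right-skewed sample (not from the lecture); one outlying value of 30
data = [2, 3, 3, 4, 5, 6, 7, 9, 30]

mean = statistics.mean(data)          # pulled upwards by the outlier
median = statistics.median(data)      # middle value, unaffected by the outlier
mode = statistics.mode(data)          # most commonly occurring value
data_range = max(data) - min(data)    # largest minus smallest

# Quartiles: cut points that split the ordered data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # spread of the middle 50% of the data

sd = statistics.stdev(data)           # sample standard deviation

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"range={data_range} IQ range={iqr} SD={sd:.1f}")
```

Because the outlier drags the mean above the median, this sample would be reported as median & IQ range rather than mean & standard deviation.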
Standard Normal Distribution
Mean +/- 1 SD encompasses approximately 68% of observations
Mean +/- 2 SD encompasses approximately 95% of observations (more precisely, +/- 1.96 SD)
Mean +/- 3 SD encompasses approximately 99.7% of observations
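These percentages can be checked directly from the normal curve. A short sketch using `statistics.NormalDist` from the Python standard library; the helper name `coverage` is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

def coverage(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD covers {coverage(k):.1%} of observations")
```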
Steps in Statistical Testing

Null hypothesis
Ho: there is no difference between the groups

Alternative hypothesis
H1: there is a difference between the groups

Collect data

Calculate the test statistic e.g. T test, Chi square test

Interpret P value and confidence intervals
P value ≤ 0.05 Reject Ho
P value > 0.05 Do not reject Ho (often loosely described as "accepting" Ho)

Draw conclusions
Meaning of P

P Value: the probability of observing a result
as extreme or more extreme than the one
actually observed from chance alone
(i.e. assuming the null hypothesis is true)

Lets us decide whether or not to reject the
null hypothesis
P > 0.05             Not significant
P = 0.01 to 0.05     Significant
P = 0.001 to 0.01    Very significant
P < 0.001            Extremely significant
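The bands above amount to a simple lookup. A minimal sketch follows; the function name `significance_label` is hypothetical, and the treatment of values falling exactly on a boundary (0.01 or 0.001) is an assumption, since the slide does not say which band they belong to.

```python
def significance_label(p):
    """Map a P value to the descriptive bands used in the lecture.

    Assumption: a value exactly on a boundary (0.01, 0.001) is placed
    in the less extreme band.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p > 0.05:
        return "Not significant"
    if p >= 0.01:
        return "Significant"
    if p >= 0.001:
        return "Very significant"
    return "Extremely significant"

for p in (0.903, 0.03, 0.005, 0.0001):
    print(p, significance_label(p))
```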
T Test



T test checks whether two samples are likely to have come from the same
or different populations
Used on continuous variables
Example: Age of patients in the APC study (APC/placebo)
PLACEBO:
mean age 60.6 years
SD +/- 16.5
n = 840
95% CI 59.5-61.7

APC:
mean age 60.5 years
SD +/- 17.2
n = 850
95% CI 59.3-61.7

What is the P value?
0.01
0.05
0.10
0.90
0.99
P = 0.903: not significant, so the patients are consistent with coming from the same population
(groups designed to be matched by randomisation so no surprise!!)
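The confidence intervals and P value in this example can be reproduced from the summary figures alone. This is a sketch, not necessarily the method used in the study: it uses a normal approximation to the unequal-variance T test, which is reasonable for groups of this size. The pairing of SD and n with each group is inferred from which figures reproduce the quoted 95% CIs, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def ci95(mean, sd, n):
    """Approximate 95% confidence interval for a mean: mean +/- 1.96 standard errors."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

def two_sample_p(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided P value from summary statistics (normal approximation, large samples)."""
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # standard error of the difference
    t = (mean1 - mean2) / se_diff                    # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(t)))           # both tails
    return t, p

# Age summary figures from the slide (group pairing inferred from the CIs)
placebo = (60.6, 16.5, 840)
apc = (60.5, 17.2, 850)

print("placebo 95% CI:", [round(x, 1) for x in ci95(*placebo)])
print("APC 95% CI:", [round(x, 1) for x in ci95(*apc)])

t, p = two_sample_p(*placebo, *apc)
print(f"t = {t:.2f}, P = {p:.3f}")
```

A difference of 0.1 years is tiny compared with its standard error of about 0.8 years, which is why the P value is so close to 1.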
T Test: SAFE “Serum Albumin”

           n       mean   SD    95% CI
PLACEBO    3500    28     10    27.7-28.3
ALBUMIN    3500    30     10    29.7-30.3
Q: Are these albumin levels different?
Ho = Levels are the same (any difference is there by chance)
H1 = Levels are too different to have occurred purely by chance
Statistical test: T test, P < 0.0001 (extremely significant)
Reject null hypothesis (Ho) and accept alternative hypothesis (H1)
i.e. if both samples came from the same overall group, a difference
this large would occur by chance less than 1 in 10 000 times,
therefore we can say they are very likely to be different
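As a rough check on why the P value is so small, the test statistic can be worked out by hand from the summary figures (n = 3500, SD = 10 in each group, means 28 and 30). This is a sketch using the standard error of the difference between two independent means; the symbols are introduced here for illustration.

```latex
\begin{align*}
SE_{\text{diff}} &= \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}
                  = \sqrt{\frac{10^2}{3500} + \frac{10^2}{3500}} \approx 0.239 \\
t &= \frac{\bar{x}_2 - \bar{x}_1}{SE_{\text{diff}}}
   = \frac{30 - 28}{0.239} \approx 8.4
\end{align*}
```

The observed difference is more than 8 standard errors away from zero, which puts P far below 0.0001.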
RANDOMIZED CONTROLLED TRIALS
Reducing Sample Size

Same results but using much smaller sample size (one tenth)
           ALIVE          DEAD          TOTAL         % DEAD
PLACEBO    58 (69.2%)     26 (30.8%)    84 (100%)     30.8
APC        64 (75.3%)     21 (24.7%)    85 (100%)     24.7
TOTAL      122 (72.2%)    47 (27.8%)    169 (100%)
Reduction in death rate = 6.1% (still the same)
Perform Chi Square test: P = 0.39
A difference in mortality this large could have happened by chance
about 39 in 100 times, therefore results not significant
Again, the power of a study to find a difference depends a lot
on sample size for binary data as well as continuous data
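The effect of sample size on the Chi Square test can be sketched with the counts above. This uses the uncorrected Pearson formula for a 2x2 table and only the Python standard library; the exact P depends on details such as whether a continuity correction is applied, so it need not reproduce the quoted 0.39 to the last digit. The ten-fold table is an assumption made for comparison (the small counts multiplied by ten), not data from the slides.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) using 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Rows: placebo, treatment; columns: alive, dead (counts from the slide)
small = chi_square_2x2(58, 26, 64, 21)

# Hypothetical: same proportions with ten times as many patients
large = chi_square_2x2(580, 260, 640, 210)

print(f"small study: chi2 = {small[0]:.2f}, P = {small[1]:.2f}")
print(f"ten-fold study: chi2 = {large[0]:.2f}, P = {large[1]:.4f}")
```

With identical proportions, the test statistic scales with the number of patients, so the same reduction in death rate moves from not significant to significant once the study is large enough.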
Summary

Size matters = BIGGER IS BETTER

Spread matters = SMALLER IS BETTER

Bigger difference = EASIER TO FIND

Smaller difference = MORE DIFFICULT TO FIND

To find a small difference you need a big study
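The summary points can be seen in one formula: the test statistic for comparing two means grows with the difference and the sample size, and shrinks with the spread. A minimal sketch, assuming two equal-sized groups with the same SD; the function name `z_statistic` and the alternative scenarios are hypothetical, with the baseline numbers borrowed from the albumin example (difference 2, SD 10).

```python
import math

def z_statistic(diff, sd, n_per_group):
    """Difference between two means divided by its standard error.

    Assumes two groups of equal size with a common SD.
    """
    return diff / (sd * math.sqrt(2 / n_per_group))

baseline = z_statistic(diff=2, sd=10, n_per_group=350)

print("baseline:          ", round(baseline, 2))
print("bigger study:      ", round(z_statistic(2, 10, 3500), 2))  # size matters
print("smaller spread:    ", round(z_statistic(2, 5, 350), 2))    # spread matters
print("bigger difference: ", round(z_statistic(4, 10, 350), 2))   # easier to find
print("smaller difference:", round(z_statistic(0.5, 10, 350), 2)) # harder to find
```

A larger statistic means a smaller P value, so each of the first three changes makes a real difference easier to detect.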