ARTS II - UHL Writing Club

Crash Course in Statistics
Dr Kelvin Ng Kuan Huei
MBBS MRCP
Specialist Registrar in CPT/GIM
‘There are three kinds of lies:
lies, damned lies, and statistics.’
-- Benjamin Disraeli
Why understand statistics?
• Statistics help us to see patterns
• Bad statistics = Bad Decisions
• If you don’t understand statistics, you can’t spot
bad statistics
Quantitative vs Qualitative
• Qualitative
– Complete, detailed description
– Researcher may only roughly know the endpoint
– Researcher is the data-gathering instrument
– Data in the form of pictures, words or objects
– Subjective
– ‘Rich’, but more time-consuming and not generalizable
– Example: ‘Red apple was the favourite as it was sweeter, crunchier and tastier but on the other hand green apple was more refreshing!’
• Quantitative
– Classify, count and analyse statistically
– Researcher knows what the endpoint is
– Researcher uses tools
– Data in the form of numbers
– Objective
– Efficient, allows hypothesis testing, but with loss of detail
– Example: ‘The red apple was the favourite compared with the green apple with P<0.05’
Observational studies vs RCT
• Experimental and quasi-experimental
• Observational studies
– Easy, fast and relatively cheap
– Dependent on stratification to deal with e.g. selection bias, covariates
• RCT
– Balancing of confounding factors
– Lack of generalization, not always applicable,
slow
Statistics
• Descriptive Statistics
– Describe or summarise data
• Inferential Statistics
– Make statistical inferences and draw
conclusions
• Estimation
– Confidence interval
– Parameter estimation
• Hypothesis testing
– Null hypothesis
Descriptive statistics
• Measures of central tendency
– Mean, mode, median
• Measures of dispersion and variability
– Standard deviation, variance
• Diagrams e.g. stem and leaf, box plots
Descriptive statistics
Sample
– 9, 4, 5, 4, 7, 4, 2, 5
– Sorted: 2, 4, 4, 4, 5, 5, 7, 9
– Mean = 5
– Median = 4.5
– Mode = 4
– Standard deviation = 2 (population formula; the n − 1 sample formula gives about 2.14)
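The figures on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library `statistics` module; the variable names are illustrative and not part of the original talk.

```python
import statistics

sample = [9, 4, 5, 4, 7, 4, 2, 5]  # the sample from the slide

mean = statistics.mean(sample)             # arithmetic mean
median = statistics.median(sample)         # middle of the sorted values
mode = statistics.mode(sample)             # most frequent value
population_sd = statistics.pstdev(sample)  # divides by n
sample_sd = statistics.stdev(sample)       # divides by n - 1

print(sorted(sample))
print(mean, median, mode, population_sd)
print(round(sample_sd, 2))
```

The slide's standard deviation of 2 matches the population formula (`pstdev`); the sample formula (`stdev`) gives a slightly larger value because it divides by n − 1.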
Inferential Statistics
• Reach conclusions beyond the immediate data alone, i.e. make inferences about the population based on a sample
• True state of affairs + chance = sample
– Sampling error
– Central limit theorem, i.e. the means of large enough samples are approximately normally distributed
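The idea that "true state of affairs + chance = sample" can be illustrated with a small simulation. The sketch below assumes a skewed (exponential) population with mean 1 and standard deviation 1; the population, sample size and number of samples are all hypothetical choices made for illustration. The sample means cluster around the true mean, with a spread close to the standard error SD / sqrt(n).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # hypothetical size of each sample
NUM_SAMPLES = 2000  # hypothetical number of repeated samples

def sample_mean(n):
    # Draw n values from a skewed population (exponential, mean 1, SD 1).
    values = [random.expovariate(1.0) for _ in range(n)]
    return sum(values) / n

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5  # standard error predicted by theory

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

Each individual sample mean differs from the true mean of 1 purely by chance; that difference is the sampling error.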
Inferential Statistics
• Comparisons analysis
– Either compares means or medians between
groups
• Correlation analysis
– Correlation does not imply causation
• Regression analysis
– Incorporates multiple covariates into equation
Comparisons Analysis
• T-test
– Comparison of means between two groups
• Mann-Whitney U and Wilcoxon matched-pairs test
– Comparisons of medians (unpaired and paired data respectively)
• ANOVA and Kruskal-Wallis test
– Comparison of means between more than two unrelated groups (ANOVA)
– Comparison of medians between more than two unrelated groups (Kruskal-Wallis test)
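To make the difference between a mean-based and a rank-based comparison concrete, here is a minimal sketch that computes only the test statistics (not the P values) for two small groups. The data are hypothetical, the t statistic shown is the unequal-variance (Welch) form, and the helper names are invented for this example.

```python
import math
import statistics

def welch_t(a, b):
    """t statistic for two independent samples, allowing unequal variances."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a beats b, ties count one half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )

group_a = [2, 4, 6]  # hypothetical measurements
group_b = [1, 2, 3]  # hypothetical measurements

t_stat = welch_t(group_a, group_b)
u_stat = mann_whitney_u(group_a, group_b)
print(round(t_stat, 3), u_stat)
```

The t statistic depends on the actual values, so an outlier can change it a lot; the U statistic depends only on how the values rank against each other.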
Correlations analysis
• Linear datasets?
• Spearman rank correlation
– Ordinal data, with no need for a normal distribution
• Pearson's product moment correlation
– Interval data
Correlation does not imply cause and effect!
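The two coefficients can be sketched in plain Python. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks of the data; the helper names and the example data (a curved but strictly increasing relationship) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product moment correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical: rises with x, but not in a straight line

print(round(pearson(x, y), 3), spearman(x, y))
```

Because y always increases with x, the rank correlation is perfect, while Pearson's coefficient falls just short of 1 because the relationship is not linear.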
Regression analysis
• Does not assume normal sampling. Allows modelling the dependence of one variable on another (or on several)
• Binomial dataset
– Chi-squared (χ²) test
• Linear regression
• Multiple regression
Linear regression
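A simple linear regression models a dependent variable y as a straight-line function of one covariate x. The sketch below fits that line by ordinary least squares; the function name and the data are hypothetical and only meant to show the arithmetic.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]  # hypothetical covariate
y = [2, 4, 5, 4, 5]  # hypothetical dependent variable

intercept, slope = fit_line(x, y)
print(round(intercept, 2), round(slope, 2))
```

Multiple regression follows the same principle with more than one covariate in the equation.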
Multiple regression
Correlation vs regression
• Correlation
– Makes no assumption about which variable depends on the other
– Tests for interdependence
• Regression
– Assumes one variable is dependent on the covariates
– One-way causal relationship (in linear regression)
Correlation or regression analysis?
The P value
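• It is not a measure of the hypothesis itself, i.e. it is not the probability that the null hypothesis is true
• It is the probability of obtaining a result at least as extreme as the one observed by chance alone, assuming the null hypothesis is true
• But the null hypothesis is not a random event!
• A P value of <0.05 means a less than 5% chance of obtaining such a result by chance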
• Pre-test probability
– Bayesian probability
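A worked example may help fix the definition. Assume, hypothetically, that a coin lands heads 9 times out of 10 and the null hypothesis is that the coin is fair. The P value is the probability, under that null hypothesis, of a result at least this extreme. The function name below is invented for this sketch.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical experiment: 9 heads in 10 tosses, null hypothesis "coin is fair".
one_sided_p = binomial_tail(10, 9)  # 9 or 10 heads
two_sided_p = 2 * one_sided_p       # also counts 0 or 1 heads (symmetric null)

print(round(one_sided_p, 4), round(two_sided_p, 4))
```

Both values are below 0.05, yet neither is the probability that the coin is fair; that would also depend on the pre-test probability.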
The P value
• High P value
– Study may be underpowered
– Limited clinical difference
• Low P value
– With a large enough sample size, even trivial differences reach statistical significance
– Statistical significance does not equate to clinical significance
P value is no replacement for common sense!
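The effect of sample size on the P value can be sketched with a two-sample z-test, assuming a known standard deviation. The difference of 0.1 units, the standard deviation of 1 and the group sizes are all hypothetical.

```python
import math

def two_sided_p(difference, sd, n_per_group):
    """Two-sided P value of a two-sample z-test with known, equal SD."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = difference / standard_error
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# The same trivial difference, tested in a small and in a very large study.
p_small_study = two_sided_p(difference=0.1, sd=1.0, n_per_group=20)
p_large_study = two_sided_p(difference=0.1, sd=1.0, n_per_group=10_000)

print(p_small_study > 0.05, p_large_study < 0.05)
```

The difference between the groups is identical in both studies; only the sample size changed, which is why a low P value alone says nothing about clinical importance.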
Type I and II Errors
• Type I error (α error)
– False positive, i.e. reject the null hypothesis when it is true
• Type II error (β error)
– False negative, i.e. fail to reject the null hypothesis when it is false
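Both error rates can be estimated by simulation. The sketch below repeatedly runs a two-sided z-test at the 5% level on samples from a normal population with known standard deviation 1. The sample size, number of trials and the true effect of 0.5 are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

N = 30          # hypothetical sample size per study
TRIALS = 4000   # hypothetical number of simulated studies
Z_CRIT = 1.96   # two-sided 5% critical value of the normal distribution

def rejects_null(true_mean):
    """Run one study testing 'mean = 0' and report whether it rejects."""
    sample = [random.gauss(true_mean, 1.0) for _ in range(N)]
    z = (sum(sample) / N) * N ** 0.5  # known SD of 1
    return abs(z) > Z_CRIT

# Null hypothesis true: every rejection is a false positive (type I error).
type1_rate = sum(rejects_null(0.0) for _ in range(TRIALS)) / TRIALS

# Null hypothesis false: every non-rejection is a false negative (type II error).
type2_rate = sum(not rejects_null(0.5) for _ in range(TRIALS)) / TRIALS

print(round(type1_rate, 3), round(type2_rate, 3))
```

The type I rate stays near the chosen α of 0.05, while the type II rate depends on the sample size and the size of the true effect.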
Subgroup analysis
• Not statistically powered
• Multiple testing
• Usually not adjusted for covariates
• Predetermined endpoints
• ISIS-2 and star signs
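The multiple testing problem is simple arithmetic. As a sketch, assume 12 independent subgroups (for example one per star sign), each tested at the 5% level when no real effect exists; both the count and the independence are assumptions made for illustration.

```python
ALPHA = 0.05
NUM_SUBGROUPS = 12  # hypothetical: one subgroup per star sign

# Chance that at least one subgroup looks "significant" purely by chance.
familywise_error = 1 - (1 - ALPHA) ** NUM_SUBGROUPS

# Bonferroni correction: a stricter per-test threshold that keeps the
# overall false positive rate at or below ALPHA.
bonferroni_threshold = ALPHA / NUM_SUBGROUPS

print(round(familywise_error, 2), round(bonferroni_threshold, 4))
```

Under these assumptions there is close to an even chance of at least one spurious finding, which is why unplanned subgroup results deserve caution.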
Hazard ratio
• Hazard ratio
– The risk of an event (e.g. death, composite endpoint) in one group relative to another group
– A value of 1 suggests no difference between comparator groups
– Often reported with 95% confidence intervals
Relative vs absolute risk reduction
• Beware of headline-grabbing statements!
– If I buy two lottery tickets, I increase my chances of winning by 100%
– If I buy two lottery tickets, I increase my chance of winning to 0.0001%
• Significance of effect is dependent on incidence
• Important in health economics assessments
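The dependence on incidence is easy to see with numbers. The sketch below compares two hypothetical treatments that both halve the risk of an event, one for a common event and one for a rare event; the risks and the function name are invented for illustration.

```python
def risk_reduction(control_risk, treated_risk):
    """Return (relative risk reduction, absolute risk reduction)."""
    absolute = control_risk - treated_risk
    relative = absolute / control_risk
    return relative, absolute

# Hypothetical event risks, expressed as proportions.
common_rrr, common_arr = risk_reduction(control_risk=0.20, treated_risk=0.10)
rare_rrr, rare_arr = risk_reduction(control_risk=0.0002, treated_risk=0.0001)

print(f"common event: RRR {common_rrr:.0%}, ARR {common_arr:.2%}")
print(f"rare event:   RRR {rare_rrr:.0%}, ARR {rare_arr:.2%}")
```

Both treatments can claim a 50% relative risk reduction, but the absolute benefit differs by a factor of a thousand.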
Summary
• Studies
– Qualitative vs quantitative
– Observation vs RCT
• Statistics
– Descriptive: central measures, dispersion
– Inferential: central limit theorem, comparison, correlation, regression
• Concepts
– Hazard ratios
– Errors
– P-values
– Subgroup analysis
Questions?