Transcript Slides
ASPIRE Class 5
Biostatistics and Data Collection Tools
Daniel M. Witt, PharmD, BCPS, FCCP
Learning Objectives
ASPIRE
Class 5: Biostatistics
• Differentiate between descriptive and inferential statistics
• Choose an appropriate statistical test based on the type of data being
analyzed
• Describe the concepts of normal distribution, population, and sample
• Formulate the analytic plan for a research study
• Evaluate various data collection tools and databases to collect study
data
Elements of a Research Protocol
Objectives
Background
Population
Design
Procedures
Analytical
Plan
Class 5 Assignment
Please come prepared with the above items
October 20th, 2:30-5:00 p.m. @
Kaiser Permanente Central Support Services
Biostatistics
Daniel M. Witt, PharmD, FCCP, BCPS, CACP
Kaiser Permanente Colorado
Why Biostatistics?
Which medical practices actually help?
Determining what therapies are helpful based on simple
experience doesn’t work
– Biologic variability
– Placebo effect
An Example
[Scatterplot: cardiac output vs. drug dose, suggesting an upward trend]
Drug appears to be effective at increasing CO
An Example
[Scatterplot: cardiac output vs. drug dose, showing no trend]
Clearly no relationship between drug dose and CO
Biostatistics
A useful tool
Turns clinical and laboratory experience into quantitative
statements
Determines whether and by how much a treatment or procedure
affected a group of patients
Turns boring data into an interesting story
Learning Point
Experiments rarely include entire population
Selecting unrepresentative samples (bad luck) is unlikely but
possible
Biostatistical procedures permit estimation of the chance of
such bad luck
Tell a story (who, what, why, where, how)
General Research Goals
Obtain descriptive information about a population based on a
sample of that population
Test hypotheses about the population
Minimize bias
Random Variables
Definition:
– Outcomes of an experiment or observation whose values
cannot be anticipated with certainty
Two types
– Discrete
– Continuous
“This is important because….”
– choosing (and evaluating) statistical methods depends, in
part, on the type of data (variables) used
Discrete (counting) Variables
Two types
– Nominal: classified into groups in no particular order, and with
no indication of relative severity (e.g., sex, mortality, disease
state, bleeding, stroke, MI)
Discrete Variables
– Ordinal: ranked in a specific order, but with no consistent
level of magnitude difference between ranks (e.g., NYHA
class, trauma score)
Caution: Mean and standard deviation are NOT reported with this type of data
Continuous (measuring) Variables
– Data are ranked in a specific order with a consistent change in
magnitude between units (e.g., heart rate, LDL cholesterol,
blood glucose, INR, blood pressure, time, distance)
Continuous Data
Summarizing Data
Normal distribution (most common model for population distributions)
• Bell-shaped frequency distribution
• Landmarks
x̄: mean
SD: standard deviation
[Histograms of two normal samples: N=200, mean=40, SD=5.0; N=150, mean=15, SD=2.5, with mean and ±SD marked]
Mean (average)
• Only used for continuous, normally distributed data
• Sensitive to outliers
• Most commonly used measure of central tendency
Non-Normal Distributions
[Skewed histogram: N=100, mean=37.6, SD=4.5, with mean ± SD marked]
• Although mean and SD can be calculated for any population,
they do not summarize a non-normal distribution as well as
they do a normal one
• A better approach is to use percentiles
Median (50th percentile)
Half of the observations fall below the median and half lie above
• Can be used for ordinal or continuous data
• Insensitive to outliers
Percentiles
• The point in a distribution below which a given percentage of
values fall (e.g., a value at the 25th percentile is larger than 25%
of the other values in the sample; at the 75th percentile, larger
than 75%)
• Do not assume that the population has a normal distribution
[Normal curve: 68% of values fall within mean ± 1 SD, 95% within mean ± 2 SD]
Standard Deviation (SD)
Appropriately applied only to data that are
normally or near normally distributed
Applicable only to continuous data
Within ±1 SD are found 68% of the sample's values;
within ±2 SD, 95%
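These summary measures (mean, median, percentiles, SD) can be sketched with Python's standard library; the sample data below are made up for illustration.

```python
import statistics

# Hypothetical sample of a continuous variable (e.g., heart rates)
data = [62, 68, 70, 71, 72, 74, 75, 78, 80, 95]  # note the outlier, 95

mean = statistics.mean(data)      # sensitive to the outlier
median = statistics.median(data)  # insensitive to the outlier
sd = statistics.stdev(data)       # sample standard deviation

# Quartiles: the 25th, 50th, and 75th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4)

print(mean, median, sd)  # mean is pulled above the median by the outlier
print(q1, q2, q3)
```

Note how the single outlier (95) pulls the mean above the median, illustrating why the median is preferred for non-normal data.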
Hypothesis Testing
The null hypothesis (Ho)
– posits no difference between groups being compared (Group A = Group
B)
– a statistical convention (but a good one)
– is used to assist in determining whether any observed differences between
groups are due to chance alone (bad luck)
In other words, is any observed difference likely due to sampling variation?
Hypothesis Testing
Example: A new anti-obesity medication is compared to an
existing one to determine if one agent is better at achieving
goal BMI at the recommended starting dose.
– Results:

             At Goal   Not At Goal   Totals
New Drug        60          40         100
Old Drug        50          50         100
Totals         110          90         200

Ho: success rate for new drug = success rate for old drug
Hypothesis Testing
Tests for statistical significance determine if the data
are consistent with Ho
If Ho is “rejected” = statistically significant difference between groups
(unlikely due to chance or ‘bad luck’)
If Ho is “accepted” = no statistically significant difference between
groups (results may be due to ‘bad luck’)
Hypothesis Testing
– The distribution (range of values) for statistical tests when Ho
is true is known
– Depending on this statistic’s value, Ho is accepted or
rejected
Choosing the appropriate statistical test depends on:
– Type of data (nominal, ordinal, continuous)
– Study design (parallel, cross-over, etc.)
– Presence of confounding variables
Hypothesis Testing
[χ² distribution under Ho, with critical values 3.84 (upper 5%) and 6.64 (upper 1%) marked]
For our example,
– the data are nominal, the design is parallel, and there are no confounders
– the appropriate test is χ²
The frequency distribution of χ² when Ho is true is shown above
Hypothesis Testing
Large values are possible when Ho is true, but they occur
infrequently (5% of the time χ² is >3.84 and only 1% of the
time χ² is >6.64)
These extreme values are used to demarcate the point(s) at
which Ho is accepted or rejected
Hypothesis Testing
For our example:
– using the data in the formula for calculating χ² yields a value of 1.64
– because 1.64 < 3.84, accept Ho and say that the new drug is not statistically
significantly better than the old drug in getting patients to their goal BMI with
the recommended starting dose
[χ² distribution with the calculated value, 1.64, falling below the 3.84 critical value]
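The χ² value of 1.64 quoted above can be reproduced assuming Yates' continuity correction for a 2×2 table (the slides do not state which formula was used, so the correction is an assumption). A minimal pure-Python sketch:

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for a 2x2 table [[a, b], [c, d]],
    optionally with Yates' continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in [(a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)]:
        exp = row * col / n              # expected count under Ho
        diff = abs(obs - exp)
        if yates:                        # continuity correction
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / exp
    return stat

# The slide's table: new drug 60/40 at goal, old drug 50/50
chi2 = chi_square_2x2(60, 40, 50, 50)
print(round(chi2, 2))  # 1.64 — below the 3.84 critical value, so accept Ho
```

Without the correction the statistic is about 2.02, still below 3.84, so the conclusion is unchanged.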
Decision Errors

                 Underlying “Truth”
Your Decision    Ho is true     Ho is false
Accept Ho        No error       Type II error
Reject Ho        Type I error   No error
Decision Errors
The probability of making a Type I error is defined as the
significance level α
– By setting α at 0.05, this effectively means that 1 out of 20
times a Type I error will occur when Ho is rejected
– The calculated probability that a Type I error has occurred is
called the “p-value”
– When the α level is set a priori, Ho is rejected when p < α
Decision Errors
The probability of making a Type II error (accepting Ho when it
should be rejected) is termed β
By convention, β should be < 0.20
Decision Errors
Power (1-β)
The ability to detect actual differences between groups
Power is increased by:
– Increasing α
– Increasing n
– Large differences between populations
Power is decreased by:
– Poor study design
– Incorrect statistical tests
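As a rough illustration of how power depends on α, n, and the size of the true difference, here is a simplified two-sample z-test power calculation (a normal-approximation sketch, not an exact t-test power computation):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with
    n subjects per group, true difference delta, common SD sigma."""
    z_crit = 1.96 if alpha == 0.05 else 2.576   # critical value for alpha
    se = sigma * math.sqrt(2.0 / n)             # SE of the difference in means
    return normal_cdf(delta / se - z_crit)

# Power rises with n (and falls when alpha is made stricter)
print(power_two_sample(delta=0.5, sigma=1.0, n=50))
print(power_two_sample(delta=0.5, sigma=1.0, n=100))
```

With a true difference of half an SD, doubling n from 50 to 100 per group raises power from roughly 0.70 to roughly 0.94, which is why sample size is the most common lever for achieving β < 0.20.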
Statistical Significance
Areas for Vigilance
Size of p-value is not related to the importance of the
result
Statistically significant does not necessarily mean
clinically significant
Lack of statistical significance does not mean results
are unimportant
Choosing a Statistical Test
Parametric versus non-parametric
Parametric tests assume an underlying normal distribution
Non-parametric tests:
– Non-normally distributed data
– Nominal or ordinal data
Choosing a Statistical Test
Continuous Data
Student’s t-test
– 1 sample: compares the mean of the study sample to a known
population mean
– 2 sample (independent samples): compares the means of 2 normal
distributions
– Paired: compares the means of paired or matched samples
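A minimal sketch of the pooled-variance (independent-samples) t statistic described above, using made-up data:

```python
import math

def two_sample_t(x, y):
    """Independent-samples t statistic assuming equal variances
    (pooled-variance Student's t-test). Returns (t, df)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    # Pooled sample variance from the two sums of squares
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)
    t = (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

t, df = two_sample_t([1, 2, 3], [2, 3, 4])
print(round(t, 3), df)  # t ≈ -1.225 with 4 degrees of freedom
```

The resulting t and its degrees of freedom would then be compared against the t distribution's critical value, exactly as the χ² statistic was compared to 3.84 earlier.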
Choosing a Statistical Test
Continuous Data
Analysis of variance (ANOVA)
– Compares the means of 3 or more groups in a study
– Multiple comparison procedures are used to determine
which groups actually differ from each other
e.g., Bonferroni, Tukey, Scheffe, others
Analysis of covariance (ANCOVA)
– Controls for the effects of confounding variables
Choosing a Statistical Test
Ordinal Data
Wilcoxon rank sum
Mann-Whitney U
Wilcoxon signed rank
Kruskal-Wallis
Friedman
These tests may also be used for non-normally distributed continuous
data
Choosing a Statistical Test
Nominal Data
χ²
– Compares percentages between 2 or more groups
Fisher’s exact test
– Infrequent outcomes
McNemar’s
– Paired samples
Mantel-Haenszel
– Controls for influence of confounders
95% Confidence Intervals
When the ABSOLUTE difference between groups is considered:
– A 95% confidence interval that excludes zero is considered statistically
significant
– The 95% confidence interval also provides information regarding the
MAGNITUDE of the difference between groups
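Applying this idea to the earlier anti-obesity example (60/100 vs. 50/100 at goal), a 95% CI for the absolute difference in proportions can be sketched with the normal approximation (the approximation method is an assumption, not stated in the slides):

```python
import math

def ci_diff_proportions(x1, n1, x2, n2, z=1.96):
    """Normal-approximation 95% CI for an absolute difference
    in proportions (p1 - p2)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# New drug: 60/100 at goal; old drug: 50/100 at goal
lo, hi = ci_diff_proportions(60, 100, 50, 100)
print(round(lo, 3), round(hi, 3))  # interval includes zero: not significant
```

The interval runs from about -0.04 to +0.24: it includes zero (consistent with the earlier χ² result) but also shows the plausible magnitude of benefit, anywhere from a 4% disadvantage to a 24% advantage.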
Regression
Regression is useful in constructing predictive models
Multiple regression involves modeling many possible predictor
variables to ascertain which ones predict a particular target variable
Regression modeling is often used to control or adjust for the
effects of confounding variables
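A minimal sketch of simple (one-predictor) least-squares regression, the building block of the multiple-regression models described above; the data are hypothetical:

```python
def linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical predictor/outcome pairs lying exactly on y = 1 + 2x
b0, b1 = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # → 1.0 2.0
```

Multiple regression generalizes this to several predictors at once, which is what allows it to adjust for confounders.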
[Figure: example of predictive modeling. Expected performance (99% CI) derived from a regression model is plotted against observed performance; sites where observed differs from expected by >5% are flagged. Circ Cardiovasc Qual Outcomes 2011;4:22-29]
Survival Analysis
Studies the time between entry into a study and some
event (e.g., death)
Takes into account that some subjects leave the study
for reasons other than the ‘event’ (e.g., lost to follow-up,
study period ends)
May be utilized to arrive at different types of models
– Kaplan-Meier
– Cox Regression Model
Proportional hazards regression analysis
Kaplan Meier
Uses survival times (or censored survival times) to
estimate the proportion of people who would survive a
given length of time under the same circumstances
Allows for the production of a survival curve
Uses the log-rank test to test for statistically significant
differences between groups
Survival Analysis - Kaplan-Meier Survival Curve
[Kaplan-Meier curves of survival probability (0.0-1.0) vs. time for Treatment and Control groups]
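A minimal Kaplan-Meier estimator sketch, showing how censored subjects leave the risk set without stepping the curve (hypothetical data; tie-handling conventions are ignored for simplicity):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times: follow-up time per subject; events: 1 = event, 0 = censored.
    Returns a list of (time, survival probability) at each event time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    # Walk through subjects in time order; censored subjects reduce
    # the risk set but do not trigger a step in the curve
    for t, e in sorted(zip(times, events)):
        if e == 1:
            surv *= (at_risk - 1) / at_risk
            curve.append((t, surv))
        at_risk -= 1
    return curve

# Hypothetical data: event at t=1, censored at t=2, event at t=3
print(kaplan_meier([1, 2, 3], [1, 0, 1]))
```

Note the censored subject at t=2: survival stays at 2/3 until t=3, when the last subject still at risk has the event and the curve drops to 0.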
Cox Regression Modeling
Reported graphically like Kaplan-Meier
Investigates several variables at a time
Allows calculation of relative risk estimate while adjusting for
differences between groups