H - University of Kansas Medical Center

Introduction to Biostatistics
for Clinical and Translational
Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
FRONTIERS: The Heartland Institute of Clinical and Translational Research
Course Information
 Jo A. Wick, PhD
 Office Location: 5028 Robinson
 Email: [email protected]
 Lectures are recorded and posted at
http://biostatistics.kumc.edu under ‘Educational
Opportunities’
Course Objectives
 Understand the role of statistics in the scientific
process
 Understand features, strengths and limitations of
descriptive, observational and experimental
studies
 Distinguish between association and causation
 Understand roles of chance, bias and confounding
in the evaluation of research
Course Calendar
 June 29: Descriptive Statistics and Core Concepts
 July 6: Hypothesis Testing
 July 13: Linear Regression & Survival Analysis
 July 20: Clinical Trial & Experimental Design
Probability Review
Experiment
 An experiment is a process whose results are not
known until after it has been performed.
 The range of possible outcomes is known in advance
 We do not know the exact outcome, but would like to
know the chances of its occurrence
 The probability of an outcome E, denoted P(E), is
a numerical measure of the chances of E
occurring.
 0 ≤ P(E) ≤ 1
Probability
 The most common definition of probability is the
relative frequency view:
$P(x = a) = \frac{\text{\# of times } x = a}{\text{total \# of observations of } x}$
 Probabilities for the outcomes of a random variable
x are represented through a probability distribution:
[Figure: example probability distributions, P(x) plotted against x]
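The relative frequency view can be sketched in a few lines of Python. The observations below are hypothetical values chosen for illustration, not data from the course; the function simply divides each count by the total number of observations.

```python
from collections import Counter

# Hypothetical observations of a discrete random variable x (not from the slides).
observations = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]

def relative_frequencies(values):
    """Return {a: (# of times x = a) / (total # of observations of x)}."""
    counts = Counter(values)
    total = len(values)
    return {a: counts[a] / total for a in sorted(counts)}

# The resulting mapping is an empirical probability distribution for x.
distribution = relative_frequencies(observations)
for a, p in distribution.items():
    print(f"P(x = {a}) = {p:.2f}")
```

Each probability lies between 0 and 1, and the probabilities sum to 1 across all observed outcomes.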
Population Parameters
 Most often our research questions involve
unknown population parameters:
What is the average BMI among 5th graders?
What proportion of hospital patients acquire a hospital-based infection?
 To determine these values exactly would require a
census.
 However, due to a prohibitively large population (or
other considerations) a sample is taken instead.
Sample Statistics
 Statistics describe or summarize sample
observations.
 They vary from sample to sample, making them
random variables.
 We use statistics generated from samples to make
inferences about the parameters that describe
populations.
Sampling Variability
[Figure: a population with parameters μ and σ; three samples drawn from it, with x̄ = 0, s = 1; x̄ = 0.15, s = 1.1; and x̄ = 0.1, s = 0.98; and the resulting sampling distribution of x̄]
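Sampling variability can be illustrated with a small simulation. This is a sketch under assumed settings: a normal population with μ = 0 and σ = 1, samples of size 25, and a fixed random seed. None of these values come from the slides.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

def draw_sample(n, mu=0.0, sigma=1.0):
    """Draw n observations from a normal population with mean mu and SD sigma."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Three samples from the same population give three different (mean, SD) pairs.
for _ in range(3):
    sample = draw_sample(25)
    print(f"mean = {statistics.mean(sample):.2f}, s = {statistics.stdev(sample):.2f}")

# Repeating the draw many times traces out the sampling distribution of the mean.
sample_means = [statistics.mean(draw_sample(25)) for _ in range(1000)]
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
print(f"sampling distribution: center = {center:.3f}, spread = {spread:.3f}")
```

The center of the simulated sampling distribution should sit near μ, and its spread should be close to σ/√n = 0.2, which is why larger samples give steadier estimates.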
Types of Samples

 Random sample: each
person has an equal chance
of being selected.
 Convenience sample:
persons are selected
because they are convenient or readily available.
The principal way to guarantee that the sample
 Systematic sample: persons selected based on
a pattern.
 Stratified sample: persons selected from within
subgroup.
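The sampling schemes above can be sketched as follows. The sampling frame, the subgroup labels "A" and "B", and the sample sizes are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(7)  # fixed seed for repeatability

# Hypothetical sampling frame: 100 person IDs, each tagged with a subgroup.
population = [(pid, "A" if pid <= 60 else "B") for pid in range(1, 101)]

# Random sample: every person has the same chance of selection.
random_sample = rng.sample(population, 10)

# Systematic sample: every 10th person, starting from the 5th.
systematic_sample = population[4::10]

# Stratified sample: a separate random draw within each subgroup.
stratified_sample = []
for group in ("A", "B"):
    members = [person for person in population if person[1] == group]
    stratified_sample.extend(rng.sample(members, 5))
```

A convenience sample has no comparable recipe, since it is defined by whoever happens to be available.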
Random Sampling
 For studies, it is optimal (but not always possible)
for the sample providing the data to be
representative of the population under study.
 Simple random sampling provides a
representative sample (theoretically).
 A sampling scheme in which every possible sub-sample
of size n from a population is equally likely to be selected
 Assuming the sample is representative, the summary
statistics (e.g., mean) should be ‘good’ estimates of the
true quantities in the population.
• The larger n is, the better estimates will be.
Types of Samples
 We will explore the impact of sampling when we
discuss Experimental Design on July 20.
Hypothesis Testing
Recall: Types of Data
 All data contains information.
 It is important to recognize that the hierarchy
implied in the level of measurement of a variable
has an impact on
(1) how we describe the variable data and
(2) what statistical methods we use to analyze it.
Levels of Measurement
 Nominal: difference
 Ordinal: difference, order
(Nominal and ordinal data are discrete and qualitative.)
 Interval: difference, order, equivalence of intervals
 Ratio: difference, order, equivalence of intervals,
absolute zero
(Interval and ratio data are continuous and quantitative.)
Types of Data
NOMINAL → ORDINAL → INTERVAL → RATIO (information increases)
Levels of Measurement
 The levels are in increasing order of mathematical
structure—meaning that more mathematical
operations and relations are defined—and the
higher levels are required in order to define some
statistics.
 At the lower levels, assumptions tend to be less
restrictive and the appropriate data analysis
techniques tend to be less sensitive.
 In general, it is desirable to have a higher level of
measurement.
Levels of Measurement
Level | Statistical Summary | Mathematical Relation/Operation
Nominal | Mode | one-to-one transformations
Ordinal | Median | monotonic transformations
Interval | Mean, Standard Deviation | positive linear transformations
Ratio | Geometric Mean, Coefficient of Variation | multiplication by c > 0
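As an illustration of which summary goes with which level, the sketch below applies each statistic to a small dataset at the matching level. All four datasets are hypothetical, and the functions come from Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`).

```python
import statistics

# Hypothetical data at each level of measurement (illustration only).
blood_types = ["O", "A", "O", "B", "O", "AB"]   # nominal
pain_scores = [1, 2, 2, 3, 5]                    # ordinal (1 = none ... 5 = severe)
temperatures_c = [36.5, 37.0, 37.5, 38.0]        # interval
weights_kg = [1.0, 2.0, 4.0, 8.0]                # ratio

mode_value = statistics.mode(blood_types)        # nominal: most frequent category
median_value = statistics.median(pain_scores)    # ordinal: middle value
mean_value = statistics.mean(temperatures_c)     # interval: mean and SD
sd_value = statistics.stdev(temperatures_c)
geometric_mean = statistics.geometric_mean(weights_kg)   # ratio only
coefficient_of_variation = statistics.stdev(weights_kg) / statistics.mean(weights_kg)
```

Summaries from a lower level remain valid at higher levels (a median of weights is fine), but not the other way around.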
Recall: Hypotheses
 Null hypothesis “H0”: statement of no
differences or association between variables
 This is the hypothesis we test—the first step in the
‘recipe’ for hypothesis testing is to assume H0 is true
 Alternative hypothesis “H1”: statement of
differences or association between variables
 This is what we are trying to prove
Hypothesis Testing
 One-tailed hypothesis: outcome is expected in a
single direction (e.g., administration of
experimental drug will result in a decrease in
systolic BP)
 H1 includes ‘<‘ or ‘>’
 Two-tailed hypothesis: the direction of the effect is
unknown (e.g., experimental therapy will result in a
different response rate than that of current
standard of care)
 H1 includes ‘≠‘
Hypothesis Testing
 The statistical hypotheses are statements
concerning characteristics of the population(s) of
interest:
 Population mean: μ
 Population variability: σ
 Population rate (or proportion): π
 Population correlation: ρ
 Example: It is hypothesized that the response rate
for the experimental therapy is greater than that of
the current standard of care.
 πExp > πSOC ← This is H1.
Recall: Decisions
 Type I Error (α): a true H0 is incorrectly rejected
 “An innocent man is proven GUILTY in a court of law”
 Commonly accepted rate is α = 0.05
 Type II Error (β): failing to reject a false H0
 “A guilty man is proven NOT GUILTY in a court of law”
 Commonly accepted rate is β = 0.2
 Power (1 – β): correctly rejecting a false H0
 “Justice has been served”
 Commonly accepted rate is 1 – β = 0.8
Decisions
Conclusion | Truth: H1 | Truth: H0
H1 | Correct: Power | Type I Error
H0 | Type II Error | Correct
Basic Recipe for Hypothesis
Testing
1. State H0 and H1
2. Assume H0 is true
3. Collect the evidence—from the sample data,
compute the appropriate sample statistic and the
test statistic

Test statistics quantify the level of evidence within the
sample—they also provide us with the information for
computing a p-value (e.g., t, chi-square, F)
4. Determine if the test statistic is large enough to
meet the a priori determined level of evidence
necessary to reject H0 (. . . or, is p < α?)
Example: Carbon Monoxide
 An experiment is undertaken to determine the
concentration of carbon monoxide in air.
 It is hypothesized that the actual concentration is
significantly greater than 10 mg/m3.
 Eighteen air samples are obtained and the
concentration for each sample is measured.
 The random variable (outcome) x is carbon monoxide
concentration.
 The characteristic (parameter) of interest is μ—the true
average concentration of carbon monoxide in air.
Step 1: State H0 & H1
 H1: μ > 10 mg/m3 ← We think!
 H0: μ ≤ 10 mg/m3 ← We assume in order to test!
Step 2: Assume μ = 10
[Figure: distribution P(x) centered at μ = 10]
Step 3: Evidence
10.25, 10.37, 10.66, 10.47, 10.56, 10.22,
10.44, 10.38, 10.63, 10.40, 10.39, 10.26,
10.32, 10.35, 10.54, 10.33, 10.48, 10.68
Sample statistic: $\bar{x} = 10.43$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{10.43 - 10}{1.02/\sqrt{18}} = 1.79$
What does 1.79 mean? How do we use it?
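The arithmetic behind the test statistic can be checked with a short sketch. It takes the summary values quoted on the slide as given (sample mean 10.43, s = 1.02, n = 18) rather than recomputing them from the raw readings; the function name `one_sample_t` is a hypothetical helper.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t = (sample mean - mu0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Summary values as quoted on the slide.
t_stat = one_sample_t(sample_mean=10.43, mu0=10.0, sample_sd=1.02, n=18)
print(f"t = {t_stat:.2f}")
```

The statistic expresses how many standard errors the sample mean lies above the hypothesized value of 10.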
Student’s t Distribution
 Remember when we assumed H0 was true?
[Figure: Step 2: Assume μ = 10; distribution P(x) centered at μ = 10]
Student’s t Distribution
 What we were actually doing was setting up this
theoretical Student’s t distribution from which the p-value can be calculated:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10 - 10}{1.02/\sqrt{18}} = 0$
[Figure: Student’s t distribution, P(x) plotted against t, centered at t = 0]
Student’s t Distribution
 Assuming the true air concentration of carbon
monoxide is actually 10 mg/m3, how likely is it
that we should get evidence in the form of a
sample mean equal to 10.43?
$P(\bar{x} \ge 10.43) = \,?$
[Figure: Step 2: Assume μ = 10; distribution P(x) centered at μ = 10, with the observed x̄ = 10.43 marked]
Student’s t Distribution
 We can say how likely by framing the statement in
terms of the probability of an outcome:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{10 - 10}{1.02/\sqrt{18}} = 0$
p = P(t ≥ 1.79) = 0.0456
[Figure: Student’s t distribution centered at t = 0, with the observed t = 1.79 marked]
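One way to see what this p-value means is to simulate the world in which H0 is true and count how often chance alone gives a t statistic of 1.79 or more. This is a Monte Carlo sketch, not the exact calculation used on the slide. It assumes normally distributed measurements, and the seed and number of simulations are arbitrary choices.

```python
import math
import random

rng = random.Random(1)            # fixed seed for repeatability
n, mu0, sigma = 18, 10.0, 1.02    # sigma is an assumed population SD; the null
                                  # distribution of t does not depend on it
observed_t = 1.79
n_sims = 20000

def t_statistic(sample, mu0):
    """One-sample t statistic computed from raw observations."""
    size = len(sample)
    mean = sum(sample) / size
    variance = sum((x - mean) ** 2 for x in sample) / (size - 1)
    return (mean - mu0) / math.sqrt(variance / size)

# Draw samples with H0 true (mu = 10) and count t statistics at least as
# large as the observed one.
exceed = 0
for _ in range(n_sims):
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    if t_statistic(sample, mu0) >= observed_t:
        exceed += 1

p_simulated = exceed / n_sims
print(f"simulated p = {p_simulated:.4f}")
```

The simulated proportion should land close to the tabulated value of 0.0456, up to simulation noise.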
Step 4: Make a Decision
 Decision rule: if p ≤ α, the chances of getting the
actual collected evidence from our sample given
the null hypothesis is true are very small.
 The observed data conflicts with the null ‘theory.’
 The observed data supports the alternative ‘theory.’
 Since the evidence (data) was actually observed and our
theory (H0) is unobservable, we choose to believe that
our evidence is the more accurate portrayal of reality and
reject H0 in favor of H1.
Step 4: Make a Decision
 What if our evidence had not been in as great a
degree of conflict with our theory?
 p > α: the chances of getting the actual collected
evidence from our sample given the null hypothesis is
true are pretty high
 We fail to reject H0.
Decision
 How do we know if the decision we made was the
correct one?
 We don’t!
 If α = 0.05, the chances of our decision being an incorrect
rejection of a true H0 are no greater than 5%.
 We have no way of knowing whether we made this kind
of error—we only know that our chances of making it in
this setting are relatively small.
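The 5% figure can be checked by simulation: generate many experiments in which H0 really is true and count how often the decision rule rejects it. For simplicity this sketch swaps in a one-sided z test with a known σ instead of the t test used in the example. The sample size, seed and number of experiments are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(42)           # fixed seed for repeatability
alpha, n, mu0, sigma = 0.05, 30, 0.0, 1.0
n_experiments = 4000

def upper_tail_p(sample):
    """One-sided z-test p-value for H1: mu > mu0, with sigma treated as known."""
    z = (sum(sample) / len(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 1.0 - NormalDist().cdf(z)

# Every simulated experiment is drawn with H0 true, so every rejection
# counted here is a Type I error.
false_rejections = sum(
    upper_tail_p([rng.gauss(mu0, sigma) for _ in range(n)]) <= alpha
    for _ in range(n_experiments)
)
type_i_rate = false_rejections / n_experiments
print(f"Type I error rate = {type_i_rate:.3f}")
```

In any single study we still cannot tell whether a rejection was one of these errors; the simulation only shows the long-run rate.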
Which test do I use?
 What kind of outcome do you have?
 Nominal? Ordinal? Interval? Ratio?
 How many samples do you have?
 Are they related or independent?
Types of Tests
One Sample
Measurement Level | Population Parameter | Hypotheses | Sample Statistic | Inferential Method(s)
Nominal | Proportion π | H0: π = π0; H1: π ≠ π0 | p = x/n | Binomial test or z test (if np > 10 & nq > 10)
Ordinal | Median M | H0: M = M0; H1: M ≠ M0 | m = p50 | Wilcoxon signed-rank test
Interval | Mean μ | H0: μ = μ0; H1: μ ≠ μ0 | x̄ | Student’s t or Wilcoxon (if non-normal or small n)
Ratio | Mean μ | H0: μ = μ0; H1: μ ≠ μ0 | x̄ | Student’s t or Wilcoxon (if non-normal or small n)
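For the nominal row, an exact binomial test can be sketched with the standard library alone. The data (8 events among 10 patients, tested against π0 = 0.5) are hypothetical, and the two-sided p-value uses one common convention: add up the probabilities of all outcomes that are no more likely than the observed one.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_test_two_sided(x, n, pi0):
    """Exact two-sided p-value: total probability of outcomes no more likely than x under H0."""
    observed = binomial_pmf(x, n, pi0)
    tolerance = 1e-12  # guards against floating-point ties
    return sum(
        binomial_pmf(k, n, pi0)
        for k in range(n + 1)
        if binomial_pmf(k, n, pi0) <= observed + tolerance
    )

# Hypothetical data: 8 events among 10 patients, H0: pi = 0.5.
x, n, pi0 = 8, 10, 0.5
p_hat = x / n                                  # sample statistic p = x/n
p_value = binomial_test_two_sided(x, n, pi0)   # 112/1024 for these inputs
print(f"p-hat = {p_hat}, p-value = {p_value:.4f}")
```

With n this small the z approximation in the table would not be appropriate, since nπ0 = 5 falls below the np > 10 guideline.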
Types of Tests
 Parametric methods: make assumptions about
the distribution of the data (e.g., normally
distributed) and are suited for sample sizes large
enough to assess whether the distributional
assumption is met
 Nonparametric methods: make no assumptions
about the distribution of the data and are suitable
for small sample sizes or large samples where
parametric assumptions are violated
 Use ranks of the data values rather than actual data
values themselves
 Loss of power when parametric test is appropriate
Types of Tests
Two Independent Samples
Measurement Level | Population Parameters | Hypotheses | Sample Statistics | Inferential Method(s)
Nominal | π1, π2 | H0: π1 = π2; H1: π1 ≠ π2 | p1 = x1/n1, p2 = x2/n2 | Fisher’s exact or Chi-square (if cell counts > 5)
Ordinal | M1, M2 | H0: M1 = M2; H1: M1 ≠ M2 | m1, m2 | Median test
Interval | μ1, μ2 | H0: μ1 = μ2; H1: μ1 ≠ μ2 | x̄1, x̄2 | Student’s t or Mann-Whitney (if non-normal, unequal variances or small n)
Ratio | μ1, μ2 | H0: μ1 = μ2; H1: μ1 ≠ μ2 | x̄1, x̄2 | Student’s t or Mann-Whitney (if non-normal, unequal variances or small n)
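For two independent proportions, the chi-square test on a 2x2 table can be sketched as below. The counts are hypothetical. The sketch uses the plain Pearson statistic without a continuity correction, and it relies on the fact that with 1 degree of freedom the upper-tail probability equals erfc(√(statistic/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts: rows are groups 1 and 2, columns are event / no event.
counts = [[10, 20], [20, 10]]
p1 = counts[0][0] / sum(counts[0])   # sample proportion in group 1
p2 = counts[1][0] / sum(counts[1])   # sample proportion in group 2
chi2, p_value = chi_square_2x2(counts)
print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

All expected cell counts here are 15, so the cell-count guideline in the table is met; with smaller counts Fisher's exact test would be the choice.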
Comparing Central Tendency
# Groups = 2
  Normal or large n
    Independent Samples: 2-sample t
    Dependent Samples: Paired t
  Non-normal or small n
    Independent Samples: Wilcoxon Rank-Sum
    Dependent Samples: Wilcoxon Signed-Rank
# Groups > 2
  Normal or large n
    Independent Samples: ANOVA
    Dependent Samples: 2-way ANOVA
  Non-normal or small n
    Independent Samples: Kruskal-Wallis
    Dependent Samples: Friedman’s
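The chart can be expressed as a simple lookup. The mapping below is a reading of the chart under the usual pairing of tests with designs, and the function and variable names are hypothetical.

```python
# Lookup keyed by (more_than_two_groups, normal_or_large_n, independent_samples).
TEST_CHART = {
    (False, True, True): "2-sample t",
    (False, True, False): "Paired t",
    (False, False, True): "Wilcoxon Rank-Sum",
    (False, False, False): "Wilcoxon Signed-Rank",
    (True, True, True): "ANOVA",
    (True, True, False): "2-way ANOVA",
    (True, False, True): "Kruskal-Wallis",
    (True, False, False): "Friedman's",
}

def choose_test(n_groups, normal_or_large_n, independent):
    """Return the test the chart suggests for comparing central tendency."""
    if n_groups < 2:
        raise ValueError("the chart covers comparisons of two or more groups")
    return TEST_CHART[(n_groups > 2, normal_or_large_n, independent)]

print(choose_test(2, normal_or_large_n=True, independent=False))   # Paired t
```

The three questions (how many groups, is normality or a large n plausible, are the samples related) are the same ones posed under "Which test do I use?".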
One-Sample Test of a Mean
 Dissolving times (seconds) of a drug in gastric
juice:
42.7, 43.4, 44.6, 45.1, 45.6, 45.9, 46.8, 47.6
$\bar{x} = 45.21$, $s^2 = 2.69$
 It is hypothesized that the drug will take more than
45 seconds to fully dissolve.
 H1: μ > 45
 H0: μ ≤ 45
$t = \frac{45.21 - 45}{0.58} = 0.36$
p = P(t > 0.36) = 0.36
[Figure: Student’s t distribution centered at t = 0]
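The dissolving-time example can be reproduced from the raw data. The sketch below uses only the standard library, so the upper-tail probability is obtained by numerically integrating the Student's t density with Simpson's rule; that is an implementation choice for this illustration, not the method used on the slide. Working from unrounded values gives t of about 0.37 rather than the 0.36 obtained from the rounded figures.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] and symmetry about 0."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

times = [42.7, 43.4, 44.6, 45.1, 45.6, 45.9, 46.8, 47.6]  # seconds, from the slide
mu0 = 45.0
n = len(times)
mean = statistics.mean(times)
variance = statistics.variance(times)           # sample variance s^2
t_stat = (mean - mu0) / math.sqrt(variance / n)
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, t = {t_stat:.2f}, p = {p_value:.2f}")
```

Since p is far above α = 0.05, the data give no grounds to reject H0: μ ≤ 45.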
Two-Sample Test of Means
 Clotting times (minutes) of blood for subjects given
one of two different drugs:
Drug B: 8.8, 8.4, 7.9, 8.7, 9.1, 9.6 ($\bar{x}_1 = 8.75$)
Drug G: 9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5 ($\bar{x}_2 = 9.74$)
 It is hypothesized that the two drugs will result in
different blood-clotting times.
 H1: μB ≠ μG
 H0: μB = μG
Two-Sample Test of Means
 What we’re actually hypothesizing: H0: μB − μG = 0
$P(|\bar{x}_1 - \bar{x}_2| \ge 0.99) = \,?$
[Figure: distribution P(x) centered at μB − μG = 0, with the observed difference x̄1 − x̄2 = −0.99 marked]
Two-Sample Test of Means
 What we’re actually hypothesizing: H0: μB − μG = 0
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{8.75 - 9.74}{0.40} = -2.475$
p = P(|t| > 2.475) = 0.03
[Figure: Student’s t distribution centered at t = 0, with t = −2.48 and t = 2.48 marked]
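The two-sample statistic can be recomputed from the raw clotting times. The sketch below shows two common forms of the standard error: the pooled-variance form of the classic Student's t test and the unpooled (Welch) form. With these data the pooled form gives a standard error of about 0.40 and |t| of about 2.48, in line with the values quoted above; that the quoted figures correspond to the pooled form is an inference from the numbers, not something the slide states.

```python
import math
import statistics

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]           # clotting times (minutes)
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

n1, n2 = len(drug_b), len(drug_g)
mean1, mean2 = statistics.mean(drug_b), statistics.mean(drug_g)
var1, var2 = statistics.variance(drug_b), statistics.variance(drug_g)
difference = mean1 - mean2

# Pooled-variance (classic Student's t) standard error, df = n1 + n2 - 2.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = difference / se_pooled

# Unpooled (Welch) standard error, which does not assume equal variances.
se_unpooled = math.sqrt(var1 / n1 + var2 / n2)
t_unpooled = difference / se_unpooled

print(f"pooled:   SE = {se_pooled:.3f}, t = {t_pooled:.3f}")
print(f"unpooled: SE = {se_unpooled:.3f}, t = {t_unpooled:.3f}")
```

The two versions differ only slightly here because the sample variances are of similar size.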
Assumptions of t
 In order to use the parametric Student’s t test, we
have a few assumptions that need to be met:
 Approximate normality of the observations
 In the case of two samples, approximate equality of the
sample variances
Assumption Checking
 To assess the assumption of normality, a simple
histogram would show any issues with skewness
or outliers:
Assumption Checking
 Skewness
Assumption Checking
 Other graphical assessments include the QQ plot:
Assumption Checking
 Violation of normality:
Assumption Checking
 To assess the assumption of equal variances
(when groups = 2), simple boxplots would show
any issues with heteroscedasticity:
Assumption Checking
 Rule of thumb: if
the larger variance is more than 2 times the smaller,
the assumption has been violated.
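The rule of thumb is easy to apply in code. As an illustration, the sketch below checks it on the clotting-time data from the two-sample example; the helper name `variance_ratio` is hypothetical.

```python
import statistics

def variance_ratio(sample_a, sample_b):
    """Ratio of the larger sample variance to the smaller one."""
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)

# Clotting times (minutes) from the two-sample example.
drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

ratio = variance_ratio(drug_b, drug_g)
violated = ratio > 2   # rule of thumb from the slide
print(f"variance ratio = {ratio:.3f}, assumption violated: {violated}")
```

For these data the ratio comes out just under 2, so the rule of thumb does not flag a violation, though only narrowly.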
Now what?
 If you have enough observations (20? 30?) to be
able to determine that the assumptions are
feasible, check them.
 If violated:
• Try a transformation to correct the violated assumptions (natural
log) and reassess; proceed with the t-test if fixed
• If a transformation doesn’t work, proceed with a non-parametric test
• Skip the transformation altogether and proceed to the nonparametric test
 If okay, proceed with t-test.
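To illustrate the transformation step, the sketch below measures skewness before and after a natural log transformation. The data are hypothetical right-skewed values, and skewness is computed with the simple moment formula m3 / m2^1.5, one of several definitions in use.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed, strictly positive measurements.
raw = [1, 2, 2, 3, 4, 8, 16, 40]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before = {skew_raw:.2f}, after log = {skew_logged:.2f}")
```

The log transformation pulls in the long right tail, so the transformed values are much closer to symmetric. After transforming, the assumptions are reassessed on the new scale.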
Now what?
 If you have too small a sample to adequately
assess the assumptions, perform the nonparametric test instead.
 For the one-sample t, we typically substitute the Wilcoxon
signed-rank test
 For the two-sample t, we typically substitute the Mann-Whitney test
Consequences of Nonparametric
Testing
 Robust!
 Less powerful because they are based on ranks
which do not contain the full level of information
contained in the raw data
 When in doubt, use the nonparametric test—it will
be less likely to give you a ‘false positive’ result.
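To show what "based on ranks" means in practice, the sketch below replaces the clotting times from the two-sample example with their ranks and computes the Wilcoxon rank-sum and Mann-Whitney U statistics. It stops at the statistic; obtaining a p-value requires tables or software. The helper `midranks` is a hypothetical name, and tied values receive the average of their ranks.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]           # clotting times (minutes)
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

ranks = midranks(drug_b + drug_g)
n1, n2 = len(drug_b), len(drug_g)
rank_sum_b = sum(ranks[:n1])                      # Wilcoxon rank-sum for Drug B
u_b = rank_sum_b - n1 * (n1 + 1) / 2              # Mann-Whitney U for Drug B
u_g = n1 * n2 - u_b
u_statistic = min(u_b, u_g)
print(f"rank sum (B) = {rank_sum_b}, U = {u_statistic}")
```

Only the ordering of the observations enters the calculation, which is why these methods are robust to outliers and skewness but give up some of the information in the raw values.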
Summary
 Probability review
 Population parameters
 Sample statistics
 Types of samples
 Hypothesis testing
 Matching the level of measurement to the type of test
 Recipe for hypothesis testing
 Types of tests
• Parametric versus nonparametric
 Assumption checking