Hypothesis Testing
Field Epidemiology
Hypothesis
• Hypothesis testing is conducted in etiologic
study designs, such as case-control or
cohort studies, as well as in experimental
study designs.
• A hypothesis is a statement of association
between an exposure (predictor) and an
outcome (disease or health event).
• Hypotheses are one-tailed or two-tailed.
• The null hypothesis states that there is no
association.
Examples
• Smoking is not associated with lung cancer
(Null hypothesis)
• Smoking is associated with a higher
incidence of lung cancer (One-tailed
hypothesis)
• Smoking is associated with a lower incidence
of lung cancer (or it is protective) (One-tailed
hypothesis)
• Smoking has some association with lung
cancer (uncertain of how it influences lung
cancer) (Two-tailed hypothesis)
Rules of Thumb
• Usually there is one main hypothesis
and a couple of secondary hypotheses
• The more specific you are in your
statement of hypothesis, the easier it
will be to answer your question
• Usually stated in the paper as “The
purpose of the study is to….”
Epidemiologic Decision Making
               Disease   No Disease   Total
Exposure          a          b         a+b
No Exposure       c          d         c+d
Total            a+c        b+d         N
Relative Risk
$RR = \frac{a/(a+b)}{c/(c+d)}$
RR = the likelihood of developing the disease
in the exposed group compared to the unexposed group
Relative Risk for a disease exposure
              CVD   No CVD   Total
Obesity        75     25      100
No Obesity     25     75      100
Total         100    100      200
$RR = \frac{75/100}{25/100} = 3.00$
C.I. (2.10 - 4.29)
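The relative risk for the obesity example can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `relative_risk` is hypothetical, and the interval assumes the common log-scale (Wald) approximation. The slides do not say which method produced their interval, although this one gives the same limits for the obesity table.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Relative risk and approximate 95% C.I. for a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    Assumption: the C.I. uses the log-scale (Wald) approximation.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Obesity and CVD table: a=75, b=25, c=25, d=75
rr, lower, upper = relative_risk(75, 25, 25, 75)
print(f"RR = {rr:.2f}, 95% C.I. ({lower:.2f} - {upper:.2f})")
```

An interval that excludes 1.0, as this one does, is what allows the null hypothesis of no association to be rejected at the 5% level.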
Relative Risk for preventive intervention
                Disease   No Disease   Total
Counseling         25         75        100
No Counseling      50         50        100
Total              75        125        200
$RR = \frac{25/100}{50/100} = 0.50$
C.I. (.39-.79)
Relative Risk Calculation
                       Ct   No CT   Total
Used Condoms           30     70     100
Did not use Condoms    60     40     100
Total                  90    110     200
RR =            =
Attributable Risk
• AR = Ie - Io
• The difference between the incidence rates
in the exposed (Ie) and nonexposed (Io) groups
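As a worked illustration, the relative risk and attributable risk can both be computed for the condom-use table above. This is a sketch under one assumption that the slides leave implicit: condom use is treated as the exposure, so Ie is the incidence among condom users and Io the incidence among non-users. The variable names are illustrative only.

```python
# Counts from the condom-use table (assumption: condom use is the exposure).
used_ct, used_no_ct = 30, 70          # used condoms: Ct, no Ct
not_used_ct, not_used_no_ct = 60, 40  # did not use condoms: Ct, no Ct

incidence_exposed = used_ct / (used_ct + used_no_ct)                # Ie
incidence_unexposed = not_used_ct / (not_used_ct + not_used_no_ct)  # Io

relative_risk = incidence_exposed / incidence_unexposed      # Ie / Io
attributable_risk = incidence_exposed - incidence_unexposed  # Ie - Io

print(f"Ie = {incidence_exposed:.2f}, Io = {incidence_unexposed:.2f}")
print(f"RR = {relative_risk:.2f}")
print(f"AR = {attributable_risk:.2f}")
```

A relative risk below 1 and a negative attributable risk both point the same way: the exposure is associated with a lower incidence, as with any protective factor.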
Odds Ratio
• $OR = \frac{a/c}{b/d}$, or the odds of
exposure in the diseased compared to the odds of
exposure in the non-diseased
• Mathematically equivalent to the simpler
formula $OR = \frac{a \times d}{b \times c}$
Odds Ratio
               Ct   No CT   Total
Douching       60     30      90
No douching    40     70     110
Total         100    100     200
$OR = \frac{60 \times 70}{40 \times 30} = 3.50$
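The two forms of the odds ratio can be checked against each other on the douching table. This is a small sketch for illustration; the function name `odds_ratio` is hypothetical and the cell labels follow the a, b, c, d layout of the 2x2 table.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c) for a 2x2 table."""
    return (a * d) / (b * c)

# Douching table: a=60 (douching, Ct), b=30 (douching, no Ct),
#                 c=40 (no douching, Ct), d=70 (no douching, no Ct)
a, b, c, d = 60, 30, 40, 70

cross_product = odds_ratio(a, b, c, d)
ratio_of_odds = (a / c) / (b / d)  # odds of exposure in diseased / non-diseased

print(f"O.R. (cross-product) = {cross_product:.2f}")
print(f"O.R. (ratio of odds) = {ratio_of_odds:.2f}")
```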
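T-test - Continuous data
         Number   Mean CD4   Standard Deviation   p-value
Men       4350      326            288            < .001
Women      925      431            330
Formula: $t = \frac{\bar{x}_A - \bar{x}_B - \Delta_0}{\sqrt{s_p^2 (1/n_A + 1/n_B)}}$
where $\Delta_0$ is the difference under the null hypothesis and
$s_p^2$ is the pooled variance for the entire study population
Which group is more immuno-suppressed?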
C.I. For Mean CD4
         Number   Mean    S.D.    S.E.    95% C.I.
Men       4350    326.2   288.4    4.37   317.6 - 334.8
Women      925    430.7   330.1   10.86   409.4 - 451.9
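The standard errors and confidence intervals for the mean CD4 counts follow from S.E. = S.D. / sqrt(N) and mean ± 1.96 × S.E. The sketch below assumes the normal-approximation multiplier 1.96, which is reasonable for samples this large; the helper name `mean_ci` is hypothetical.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Standard error and approximate 95% C.I. for a sample mean."""
    se = sd / math.sqrt(n)
    return se, mean - z * se, mean + z * se

# Values taken from the C.I. For Mean CD4 table.
se_men, lo_men, hi_men = mean_ci(326.2, 288.4, 4350)
se_women, lo_women, hi_women = mean_ci(430.7, 330.1, 925)

print(f"Men:   S.E. = {se_men:.2f}, 95% C.I. {lo_men:.1f} - {hi_men:.1f}")
print(f"Women: S.E. = {se_women:.2f}, 95% C.I. {lo_women:.1f} - {hi_women:.1f}")
```

The results may differ from the table in the last digit because the table's inputs are themselves rounded. The two intervals do not overlap, which agrees with the small p-value reported for the t-test.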
T-test
• If the sample sizes are different, the variances must
first be pooled
• $\text{pooled var} = \frac{(4015-1)(71.0) + (955-1)(84.9)}{4015 + 955 - 2} \approx 74$
• $t = \frac{34.8 - 29.9 - 0}{\sqrt{74\,(1/4015 + 1/955)}} = \frac{4.9}{0.31} \approx 16$
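The pooled-variance t statistic can be computed step by step from the figures on this slide. This is a sketch under stated assumptions: the first group is taken to have n = 4015, mean 34.8 and variance 71.0, the second n = 955, mean 29.9 and variance 84.9, following the order in which the numbers appear; the denominator is the square root of the pooled variance times (1/n_A + 1/n_B). The function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean_a, var_a, n_a, mean_b, var_b, n_b, null_diff=0.0):
    """Two-sample t statistic using a pooled variance."""
    # Weight each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean_a - mean_b - null_diff) / se_diff
    return pooled_var, se_diff, t

pooled_var, se_diff, t = pooled_t(34.8, 71.0, 4015, 29.9, 84.9, 955)
print(f"pooled variance = {pooled_var:.1f}")
print(f"S.E. of difference = {se_diff:.2f}")
print(f"t = {t:.1f}")
```

With these inputs the pooled variance rounds to 74 and t rounds to 16, on 4015 + 955 - 2 degrees of freedom.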
Normally Distributed Data
[Histogram: Age at Entry. x-axis: AGE_YRS; y-axis: Frequency. Mean = 34.0, Std. Dev = 8.83, N = 5877]
Non-normally distributed data
[Histogram: CD4/mm3. x-axis: TH_L_CNT; y-axis: Frequency. Mean = 344.5, Std. Dev = 298.80, N = 5275]
χ² Test of statistical association
• Used to determine statistical association
for categorical data
$\chi^2 = \sum \frac{(O - E)^2}{E}$
χ² Test - Categorical data
             < 200   ≥ 200   p-value
Men (%)       37.6    62.4   < .001
Women (%)     20.4    79.6
χ² Calculation
Observed
                       ER use   No ER use   Total
Given Hotline Number     100       400       500
No Hotline Number        300       200       500
Total                    400       600      1000
Expected
                       ER use   No ER use   Total
Given Hotline Number     200       300       500
No Hotline Number        200       300       500
Total                    400       600      1000
$\chi^2 = \frac{(100-200)^2}{200} + \frac{(400-300)^2}{300} + \frac{(300-200)^2}{200} + \frac{(200-300)^2}{300} = 166.7$, 1 D.F. (look up in table)
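The expected counts and the χ² statistic for the hotline example can be derived directly from the observed table. The sketch below uses plain Python with no statistics library; expected counts are computed as row total × column total / grand total, and the variable names are illustrative.

```python
# Observed counts: rows = given hotline number / no hotline number,
# columns = ER use / no ER use.
observed = [[100, 400],
            [300, 200]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under the null hypothesis of no association.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over all four cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print("Expected:", expected)
print(f"chi-square = {chi_square:.1f}, D.F. = {degrees_of_freedom}")
```

Library routines may give a somewhat different value for a 2x2 table: `scipy.stats.chi2_contingency`, for example, applies a continuity correction by default.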
Multivariable techniques
Continuous Outcome                        Categorical Outcome
Linear regression                         Logistic Regression
Generalized estimating equations (GEE)    Cox Regression
ANOVA                                     GEE
                                          Polychotomous