Study designs: Cross-sectional studies
Principles of Epidemiology for Public Health (EPID600)
Study designs: Cross-sectional studies,
ecologic studies (and confidence intervals)
Victor J. Schoenbach, PhD home page
Department of Epidemiology
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
www.unc.edu/epid600/
2/22/2011
Cross-sectional studies
1
Signs from around the world
In a Copenhagen airline ticket office:
“We take your bags and send them in all
directions.”
2
Signs from around the world
In a Norwegian cocktail lounge:
“Ladies are requested not to have
children in the bar.”
3
Signs from around the world
Rome laundry:
“Ladies, leave your clothes here and
spend the afternoon having a good time.”
4
Faster keyboarding - 1
I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I
was rdanieg. The phaonmneal pweor of the hmuan mnid,
aoccdrnig to a rscheearch at Cmabrigde Uinervtisy. It
dn'seot mttaer in waht oredr the ltteers in a wrod are, the
olny iprmoatnt tihng is taht the frist and lsat ltteer be in
the rghit pclae. The rset can be a taotl mses and you can
sitll raed it wouthit a porbelm.
• Gary C. Ramseyer's First Internet Gallery of Statistics
Jokes http://davidmlane.com/hyperstat/humorf.html (#162)
5
Faster keyboarding - 2
Most of my friends could read this with understanding
and rather quickly I might add. Then I had them read a
statistical bit of literature:
• Miittluvraae asilyans sattes an idtenossiy ctuoonr
epilsle is the itternoiecsno of a panle pleralal to the xlyapne and the sruacfe of a btiiarave nmarol
dbttiisruein.
Gary C. Ramseyer's First Internet Gallery of Statistics Jokes
http://davidmlane.com/hyperstat/humorf.html (#162)
6
Today – outline
• Cross-sectional studies (and sampling)
• Ecologic studies
• Confidence intervals
10/15/2001
Cross-sectional studies
8
Cross-sectional studies
• Cross-sectional studies include surveys
• People are studied at a “point” in time, without
follow-up.
• Can combine a cross-sectional study with follow-up
to create a cohort study.
• Can conduct repeated cross-sectional studies to
measure change in a population.
2/10/2009
Cross-sectional studies
9
Cross-sectional studies
• Number of uninsured Americans rises to 50.7
million. (USA Today, 9/17/2010; data from Census Bureau)
• In 2007-2008, almost one in five children older than
5 years was obese. (Health, United States, 2010; data from
the National Health and Nutrition Examination Survey)
• 35% (~7.4 million) of births to U.S. women during
the preceding 5 years were mistimed or unwanted
(2002 National Survey of Family Growth, Series 23, No. 25, Table 21)
[Source: www.cdc.gov/nchs/]
2/22/2011
Cross-sectional studies
10
Cross-sectional studies
• Incidence information is not available from a typical
cross-sectional study
• Sometimes can reconstruct incidence from historical
information
• Example: the incidence proportion of quitting
smoking, called the “quit ratio”:
ex-smokers / ever-smokers
is calculated from survey data.
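As a minimal sketch of the quit ratio arithmetic, the snippet below uses made-up survey counts. The function name `quit_ratio` and the numbers are hypothetical and do not come from any survey cited here; ever-smokers are taken to be ex-smokers plus current smokers.

```python
def quit_ratio(former_smokers, current_smokers):
    """Quit ratio = ex-smokers / ever-smokers, where ever = ex + current."""
    ever_smokers = former_smokers + current_smokers
    if ever_smokers == 0:
        raise ValueError("no ever-smokers in the sample")
    return former_smokers / ever_smokers


# Hypothetical survey counts, for illustration only
print(quit_ratio(former_smokers=300, current_smokers=200))  # 0.6
```

Both counts come from a single survey, which is why this incidence-like measure can be reconstructed without follow-up.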
2/10/2009
Cross-sectional studies
11
Measure prevalence at “point” in time
• “Snapshot” of a population, a “still life”
• Can measure attitudes, beliefs, behaviors, personal or
family history, genetic factors, existing or past health
conditions, or anything else that does not require follow-up to assess.
• The source of most of what we know about the
population
10/15/2001
Cross-sectional studies
12
Population census
• A cross-sectional study of an entire
population
• Provides the denominator data for
many purposes (e.g., estimation of
rates, assessing generalizability,
projecting from smaller studies)
• A huge effort – people can be difficult to
find and to count; may not want to
provide data
• Some countries maintain accurate and
current registries of the entire country
2/22/2011
Cross-sectional studies
13
National surveys conducted by NCHS
National Health Interview Survey (NHIS) –
household interviews
National Health and Nutrition Examination
Survey (NHANES) – interviews and physical
examinations
National Survey of Family Growth (NSFG) –
household interviews
National Health Care Survey (NHCS) –
medical records
2/22/2011
Cross-sectional studies
14
National surveys
• Designed to be representative of the entire country
• Modes: household interview, telephone, mail
• Employ complex sampling designs to optimize efficiency
(tradeoff between information and cost)
• Logistically challenging (answering machines, cellphones, . . .)
See presentation by Dr. Anjani Chandra at
www.minority.unc.edu/institute/2003/materials/slides/Chandra-20030522.ppt
2/22/2011
Cross-sectional studies
15
Example: National Health Interview Survey
• Conducted every year in U.S. by National
Center for Health Statistics (CDC)
• “Stratified, multistaged, household survey
that covers the civilian noninstitutionalized
population of the United States”
• Redesigned every decade to use new
census
10/15/2001
Cross-sectional studies
16
“multistaged”
• Improves logistical feasibility and reduces costs
(though reduces precision)
1. Divide population into primary sampling units
(PSU’s)
PSU = primary sampling unit: metropolitan statistical
area, county, group of adjacent counties
2/10/2009
Cross-sectional studies
17
“multistaged”
2. Select sample of census block groups (SSU’s)
within each selected PSU
3. Map each selected census block group or
examine building permits
4. Select one cluster of 4-8 housing units
dispersed evenly throughout the block
NCHS draws a new representative sample for
each week’s interviews
2/10/2009
Cross-sectional studies
18
“stratified”
• US divided into 1,900 PSU’s
• Largest 52 PSU’s are “self-representing”
• Rest of PSU’s divided into 73 categories (“strata”),
based on socioeconomic and demographic variables
• Sampling takes place separately within each category
(“stratum”)
10/15/2001
Cross-sectional studies
19
Sample size and Precision
Sample size   Lower 95%   Point estimate   Upper 95%   Width
100           0.17        0.25             0.33        0.16
400           0.21        0.25             0.29        0.08
900           0.22        0.25             0.28        0.06
1600          0.23        0.25             0.27        0.04
(p = 0.25; p(1 - p) = 0.188; sqrt(p(1 - p)) = 0.43301)
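The limits in the sample size and precision table can be approximated with a short calculation. This is a sketch that assumes the table was built with the usual normal-approximation interval, estimate ± 1.96 × sqrt(p(1 − p)/n); the function name `wald_ci` is illustrative.

```python
from math import sqrt


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


for n in (100, 400, 900, 1600):
    lower, upper = wald_ci(0.25, n)
    print(f"n={n:5d}  lower={lower:.2f}  upper={upper:.2f}")
```

Because the standard error shrinks with the square root of n, quadrupling the sample size only halves the width of the interval.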
7/30/2010
Cross-sectional studies
20
Weighted sampling
Age group    Hypothetical Pop (1,000's)   Unweighted Sample   Weighted Sample
20-39 yrs    18,000                       900                 400
40-59 yrs    18,000                       900                 400
60-69 yrs    8,000                        400                 400
Total        44,000                       2,200               1,200
3/6/2006
Cross-sectional studies
21
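A rough sketch of how sampling weights could be derived from the hypothetical population and sample sizes in the weighted sampling table. It reads the "Weighted Sample" column as an equal-allocation design (400 per age group) in which the oldest group is oversampled. The stratum prevalences and function names are invented for illustration.

```python
# Population (in thousands) and equal-allocation sample sizes from the hypothetical table
population_thousands = {"20-39": 18_000, "40-59": 18_000, "60-69": 8_000}
sample_sizes = {"20-39": 400, "40-59": 400, "60-69": 400}


def sampling_weights(pop_thousands, samples):
    """Weight = number of people in the population that each respondent represents."""
    return {g: pop_thousands[g] * 1_000 / samples[g] for g in pop_thousands}


def weighted_prevalence(stratum_prevalence, weights, samples):
    """Combine stratum-specific sample prevalences using the sampling weights."""
    total = sum(weights[g] * samples[g] for g in weights)
    cases = sum(weights[g] * samples[g] * stratum_prevalence[g] for g in weights)
    return cases / total


weights = sampling_weights(population_thousands, sample_sizes)

# Hypothetical stratum prevalences, invented for illustration only
prevalence = {"20-39": 0.10, "40-59": 0.20, "60-69": 0.30}

print(weights)
print(weighted_prevalence(prevalence, weights, sample_sizes))
```

With equal allocation the crude sample prevalence would be the simple average of the three strata (0.20 here); the weights pull the estimate back toward the younger, larger strata.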
“stratified”
• Also place census blocks into categories and
sample within each
• Oversample some strata
10/15/2001
Cross-sectional studies
22
“Defined population”
• Studies, especially cross-sectional studies, are easiest to
interpret when they are based in a population that has some
existence apart from the study itself (“defined population”)
1. Political subdivision (city, county, state)
2. Institutional (HMO, employer, profession)
• Probability sampling enables statistical generalizability to
the defined population
2/10/2009
Cross-sectional studies
23
Surveys of sentinel populations
• HIV seroprevalence survey in three county STD
clinics in central NC in 1988
• 3,000 anonymous, unlinked, leftover sera
• Anonymous questionnaire for demographics
and risk factors
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. HIV
seroprevalence in sexually transmitted disease clients in a low-prevalence southern
state. Ann Epidemiol 1993;3:281-288]
2/22/2011
Cross-sectional studies
24
HIV seroprevalence
Group              % HIV+
Homosexual men     46
Bisexual men       25
Heterosexual men   1.6
Women              0.6
Total              2.5
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. HIV
seroprevalence in sexually transmitted disease clients in a low-prevalence southern
state. Ann Epidemiol 1993;3:281-288]
10/15/2001
Cross-sectional studies
25
Seroprevalence (% HIV+) by risk factors
Characteristic               Gay    Hetero   Women
Syphilis (history/current)   53     9.0      3
Gonorrhea (history)          37     2.6      1
Anal intercourse             41     1.7      2
Paid for sex: 5.2
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. HIV
seroprevalence in sexually transmitted disease clients in a low-prevalence southern state.
Ann Epidemiol 1993;3:281-288]
10/14/2003
Cross-sectional studies
26
Interpretation
• Measures prevalence – if incidence is our
real interest, prevalence is often not a good
surrogate measure
• Studies only “survivors” and “stayers”
• May be difficult to determine whether a
“cause” came before an “effect” (exception:
genetic factors)
10/15/2001
Cross-sectional studies
27
Other points
• Can select participants by exposure status or from the
population overall
• Can select participants by disease status – the study may
then not be distinguishable from a case-control study with
prevalent cases
10/15/2001
Cross-sectional studies
28
Outline
• Cross-sectional studies (and sampling)
• Ecologic studies
• Confidence intervals
10/15/2001
Cross-sectional studies
29
“Ecologic” studies
• Most study designs – cross-sectional, case-control, cohort, intervention trials – can be carried
out with individuals or with groups
• Group-level studies which use routinely collected
data are easier and less costly
• Group-level studies that involve interventions
may not be easier or less costly
10/15/2001
Cross-sectional studies
30
Types of group-level variables
• Summary of individual-level variable (e.g.,
median household income, % with high
school diploma)
• Property of the aggregate (e.g.,
neighborhood grocery stores, seat belt
legislation, “community competence”)
3/6/2006
Cross-sectional studies
31
Interpretation
• Link between summary exposure variable and
individual-level outcome must be inferred
• Inference from group to individual is not
always sound
2/22/2011
Cross-sectional studies
32
Example: Male Circumcision and HIV
(Slope indicates strength of relationship;
r indicates linearity)
Source: Bongaarts J, et al. The relationship between male circumcision and HIV infection in African populations. AIDS 1989; 3(6): 373-7.
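To illustrate the distinction between slope and r noted on this slide, here is a small sketch with made-up group-level values. These are not the Bongaarts data, and the function name `slope_and_r` is illustrative. Both hypothetical datasets are perfectly linear (r = 1), yet their slopes differ fourfold.

```python
from math import sqrt


def slope_and_r(x, y):
    """Least-squares slope of y on x, and the Pearson correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Hypothetical group-level exposure values (e.g., % exposed in each population)
x = [10, 20, 30, 40, 50]
steep = [2 * v for v in x]      # outcome rises 2 units per unit of exposure
shallow = [0.5 * v for v in x]  # outcome rises 0.5 units per unit of exposure

print(slope_and_r(x, steep))
print(slope_and_r(x, shallow))
```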
2/22/2011
Cross-sectional studies
33
Outline
• Cross-sectional studies (and sampling)
• Ecologic studies
• Confidence intervals
10/15/2001
Cross-sectional studies
34
Confidence intervals
• Provide a plausible range for the quantity
being estimated
• Width indicates the precision of an estimate
for a given level of “confidence”
• Confidence intervals quantify only random
error from sampling variation, not systematic
error from nonresponse, study design, etc.
3/8/2006
Cross-sectional studies
35
Confidence level vs. precision
• The more vague my estimate, the more
confident I can be that it includes the
population parameter: “I am 100%
confident that the prevalence of HIV is
between 0 and 100%”.
• The more specific my estimate, the lower
my confidence: “I am 0% confident that
the prevalence of HIV is 5.23%”
10/15/2001
Cross-sectional studies
36
Confidence intervals – interpretation
• Simple interpretations are typically not
precise
• Precise interpretations are typically not
simple
10/12/2004
Cross-sectional studies
37
Simple but imprecise
• “There is 95% confidence that the interval
contains the true value”
– True, but begs the question – how to
define “confidence”
10/15/2001
Cross-sectional studies
38
Simple but imprecise
• “There is a 95% probability that the interval
contains the true value”
– Not quite correct: probability (as
conventionally defined) applies to a process,
not to a single instance
10/15/2001
Cross-sectional studies
39
Probability applies to a process: example
A 95% confidence interval can be viewed as a
measurement or estimation process that will
be correct (the interval includes the true
value of the parameter) 95% of the time and
incorrect 5% of the time.
Let us make up another estimation process
that will be correct (about) 95% of the time.
3/7/2006
Cross-sectional studies
40
Why probability applies to a process
• Estimate your gender by flipping a coin 5 times: if the result is 5 heads, estimate your gender to
be its opposite; otherwise estimate your gender
to be what you think it is now.
• Probability that estimate will be correct is
(1 – Probability of 5 heads) = 1 – 1/32 ≈ 0.97 = 97%
• Probability that estimate will be incorrect is 3%
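A quick sketch of the coin-flip estimation process, showing both the exact long-run accuracy and a simulation of it. It assumes a fair coin and that your current belief about your gender is correct; the names are illustrative.

```python
import random

P_FIVE_HEADS = 0.5 ** 5        # 1/32 = 0.03125
P_CORRECT = 1 - P_FIVE_HEADS   # 0.96875, which rounds to 97%


def estimate_is_correct(rng):
    """One run of the procedure: the estimate is wrong only if all 5 flips are heads."""
    flips = [rng.random() < 0.5 for _ in range(5)]
    return not all(flips)


def simulate(trials=100_000, seed=0):
    """Long-run proportion of runs in which the procedure gives the correct answer."""
    rng = random.Random(seed)
    return sum(estimate_is_correct(rng) for _ in range(trials)) / trials


print(P_CORRECT)
print(simulate())
```

The 97% describes the procedure over many repetitions. In any single run that happens to produce 5 heads, the estimate is simply wrong.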
6/29/2002
Cross-sectional studies
41
Why probability applies to a process
So we now have a measurement process that
will be correct 97% of the time. We will use it
to measure your gender.
Flip the coin 5 times, and suppose you get 5
heads
– Is there a 97% probability that you are of the
opposite sex?
6/29/2002
Cross-sectional studies
42
Precise but not simple
A 95% confidence interval is:
1. obtained by using a procedure that will include
the population parameter being estimated 95%
of the time
2. the set of all population values which are “likely”
to yield a sample like the one we obtained
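The first definition can be checked by simulation: repeat a study many times and count how often the interval includes the true value. This is a sketch using the normal-approximation interval; the true prevalence (0.25), sample size (400), number of studies and function name are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def coverage(true_p=0.25, n=400, studies=2_000, z=1.96, seed=1):
    """Fraction of simulated studies whose 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # Draw one study: n independent people, each a case with probability true_p
        cases = sum(rng.random() < true_p for _ in range(n))
        p_hat = cases / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_p <= p_hat + half_width:
            hits += 1
    return hits / studies


print(coverage())
```

The result should be close to 0.95. Any single interval either contains the true value or does not, which is why the 95% is a property of the procedure.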
2/22/2011
Cross-sectional studies
43
Suppose that this line represents the value
of the parameter we are trying to estimate
True value
10/15/2001
Cross-sectional studies
44
Possible estimates of that parameter in N
identical studies (shows sampling variation)
[Figure: dot plot of the study estimates scattered around the true value]
10/15/2001
Cross-sectional studies
45
One possible “true” value and how it would
manifest, on average, in N identical studies
[Figure: dot plot of the sampling distribution around the true value, with the central 95% of the distribution marked]
10/15/2001
Cross-sectional studies
46
Estimate from one study of a given size
?
Estimate
10/15/2001
Cross-sectional studies
47
A possible “true” value with < 2.5% chance of
being observed at or beyond the estimate
[Figure: sampling distribution centered on a possible true value (?), with the central 95% of the distribution and the study estimate marked]
10/14/2003
Cross-sectional studies
48
A possible true value with > 2.5% probability
of being observed at or beyond the estimate
[Figure: sampling distribution centered on a possible true value (?), with the central 95% of the distribution and the study estimate marked]
10/15/2001
Cross-sectional studies
49
A possible true value with > 2.5% probability
of being observed at or beyond the estimate
[Figure: sampling distribution centered on a possible true value (?), with the central 95% of the distribution and the study estimate marked]
10/15/2001
Cross-sectional studies
50
A possible true value with < 2.5% probability of
being observed at or beyond the estimate
[Figure: sampling distribution centered on a possible true value (?), with the central 95% of the distribution and the study estimate marked]
10/15/2001
Cross-sectional studies
51
What the confidence interval represents
[Figure: sampling distributions of the possible true values (?) overlaid, with the 95% confidence interval marked]
10/14/2003
Cross-sectional studies
52
What the confidence interval represents
[Figure: sampling distributions of the possible true values overlaid, with the 95% confidence interval marked]
10/15/2001
Cross-sectional studies
53
One possible “true” value and how it would
manifest, on average, in N identical studies
[Figure: dot plot of the sampling distribution around the true value, with 1.96 x s.e. marked on either side of the true value]
3/8/2006
Cross-sectional studies
54
Confidence intervals – another take
10/15/2001
Cross-sectional studies
55
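One possible population
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
56
Another possible population
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
57
A 3rd possible population
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
58
A 4th possible population
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
59
A 5th possible population
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
60
A 6th possible population
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
61
etc.
[Figure: population grid with cases marked O]
10/15/2001
Cross-sectional studies
62
There are 1.6 x 10^60 possible populations
(from no cases to all cases)
[Figure: population grid with cases marked O]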
10/15/2001
Cross-sectional studies
63
Suppose this is the population
(prevalence = 15%)
[Figure: population grid with 30 cases marked O]
10/15/2001
Cross-sectional studies
64
Take a sample (n=10)
[Figure: the same population grid, with its 30 cases marked O, from which 10 members are sampled]
10/15/2001
Cross-sectional studies
65
The sample
[Figure: the sample of 10, containing 2 cases marked O]
10/15/2001
Cross-sectional studies
66
Make point estimate of prevalence
[Figure: the sample of 10, containing 2 cases marked O]
10/15/2001
Cross-sectional studies
67
Interval estimate
• What are all the possible populations that
would be expected to yield this prevalence
in a sample of size 10?
6/29/2005
Cross-sectional studies
68
This one is not possible
[Figure: population grid with 1 case marked O]
10/15/2001
Cross-sectional studies
69
Possible, but VERY UNLIKELY
[Figure: population grid with 2 cases marked O]
3/8/2006
Cross-sectional studies
70
Not quite 2.5% probability (2.1%, in fact)
[Figure: population grid with 5 cases marked O]
3/8/2006
Cross-sectional studies
71
Yields just about 2.5% (3%, actually) probability of
selecting 2 (or more) cases in 10
[Figure: population grid with 6 cases marked O]
3/8/2006
Cross-sectional studies
72
One possible “true” value and how it would
manifest, on average, in N identical studies
[Figure: dot plot of the sampling distribution around the true value, with the central 95% of the distribution marked]
3/8/2006
Cross-sectional studies
73
Just above 2.5% (actually 2.6%) probability of
selecting 2 (or fewer) cases in 10
[Figure: population grid in which a little over half of the members are cases, marked O]
3/8/2006
Cross-sectional studies
74
Just below 2.5% (actually 2.4%) probability of
selecting 2 (or fewer) cases in 10
[Figure: population grid in which a little over half of the members are cases, marked O]
3/8/2006
Cross-sectional studies
75
Interval estimate for 2/10
• Lower bound: 2.5% (5 cases)
• Upper bound: 55% (110 cases)
Meaning: Our sample of 10 with 2 cases provides
evidence to exclude, at conventional error
tolerance, populations with fewer than 5 cases or
more than 110 cases. Populations with 5-110
cases cannot be excluded as likely sources for this
sample.
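The tail probabilities quoted on the preceding slides can be checked with the hypergeometric distribution. This sketch assumes sampling without replacement from a population of 200, a size inferred from "2.5% (5 cases)" rather than stated outright on the slides; the function names are illustrative.

```python
from math import comb

N, n, x = 200, 10, 2  # population size (inferred), sample size, cases observed


def pmf(k, K):
    """P(exactly k cases in a sample of n | K cases in a population of N)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def p_at_least(K):
    """P(x or more cases in the sample); small when K is too low to explain the data."""
    return sum(pmf(k, K) for k in range(x, n + 1))


def p_at_most(K):
    """P(x or fewer cases in the sample); small when K is too high to explain the data."""
    return sum(pmf(k, K) for k in range(0, x + 1))


for K in (5, 6):
    print(f"K={K:3d}  P(2 or more cases) = {p_at_least(K):.3f}")
for K in (109, 110):
    print(f"K={K:3d}  P(2 or fewer cases) = {p_at_most(K):.3f}")

# Populations for which neither tail probability drops below 2.5%
not_excluded = [K for K in range(N + 1)
                if p_at_least(K) >= 0.025 and p_at_most(K) >= 0.025]
print(not_excluded[0], not_excluded[-1])
```

Under a strict 2.5% cutoff the retained populations run from 6 to 109 cases. The bounds of 5 and 110 cases given on the slide are the adjacent populations, whose tail probabilities (about 2.1% and 2.4%) fall just under the cutoff.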
3/8/2006
Cross-sectional studies
76
Interval estimate for 2/10
• Actual population prevalence was 15%,
which in fact is between 2.5% and 55%.
• 2.5% to 55% is a very wide interval, i.e.,
a very imprecise estimate
• To make it more precise, we need a
larger sample
3/8/2006
Cross-sectional studies
77
Signs from around the world – Germany
“A sign posted in Germany's Black Forest:
It is strictly forbidden on our black forest
camping site that people of different sex, for
instance, men and women, live together in
one tent unless they are married with each
other for that purpose.”
78
Signs from around the world – Finland
On the faucet in a Finnish washroom:
“To stop the drip, turn cock to right.”
79