Week 2 - Statistical Conclusion and Internal Validity
Outline
• Validity of Inference
• Theory of Validity
• Statistical Conclusion Validity
• Internal Validity
• Construct Validity – Jill
• External Validity – Tim
• Trade-offs – Tim et al
• Discussion
Validity of Inference
VALIDITY
• The approximate truth of an inference
– Judgment about the extent to which relevant
evidence supports the inference as being true
• Always entails fallible human judgments
– Evidence comes from both empirical findings
and their consistency with past findings and
theories
• Validity judgments are never absolute
– No certainty that inferences are true or that all
possible alternatives have been falsified
Validity of Inferences
• Validity is a property of inferences
– not of designs or methods
• Even using a randomized experiment does
not guarantee a valid causal inference
– Could be “broken” by
• Differential attrition
• Low statistical power
• Improper statistical analysis
• Sampling error
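The first item in the list above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the outcome has no treatment effect by construction, and the dropout rule (low scorers leave the treatment arm) is hypothetical. Units are randomly assigned, yet the comparison among completers still shows a spurious difference.

```python
import random
from statistics import mean

def simulate_differential_attrition(n=20_000, seed=1):
    """Difference in completer means when dropout depends on arm and outcome."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        outcome = rng.gauss(0, 1)      # no treatment effect by construction
        if rng.random() < 0.5:         # random assignment to treatment
            # Hypothetical dropout rule: low scorers leave the treatment arm.
            if outcome > -0.5:
                treated.append(outcome)
        else:
            control.append(outcome)    # nobody drops out of the control arm
    return mean(treated) - mean(control)

spurious_difference = simulate_differential_attrition()
print(f"difference among completers = {spurious_difference:.2f}")
```

The design is a randomized experiment, but the inference "treatment raised the outcome" would be wrong here, because the remaining groups are no longer comparable.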
Why is it important to remember
that validity is a property of a
knowledge claim, not a property
of the design?
Three Theories of Truth
• Correspondence theory
– A knowledge claim is true if it corresponds to the
world – e.g., see it raining
• Coherence theory
– A claim is true if it belongs to a coherent set of
claims
• Pragmatism
– A claim is true if it is useful to believe it
• Philosophers do not agree on which theory of
truth is correct – and for us it doesn’t matter!
– Science uses them all to approximate the truth
The Theory of Validity is
pragmatic and uses them all
• Correspondence between empirical evidence
and abstract inferences
• Sensitive to degree of coherence between
findings and theory
• Pragmatic ruling out of alternative
explanations
• Truth is a social construction!
Campbell & Stanley, 1963
• Followed Campbell (1957) closely in defining
internal and external validity.
• Internal validity: inferences about whether
“the experimental treatments make a
difference in this specific experimental
instance.” (p. 5)
• External validity: asked “to what
populations, settings, treatment variables
and measurement variables can this effect
be generalized?” (p. 5)
Cook & Campbell (1979)
Expanded Typology of Validity
To draw generalized causal inferences it is
useful to treat the causal and generalizability
aspects of the inferences separately:
– Statistical conclusion Validity
– Internal Validity
– Construct Validity
– External Validity
Corresponds to 4 Questions
• How large and reliable is covariation
between the presumed cause and the effect?
• Is the covariation causal, or would it have
been obtained without the treatment?
• Which general constructs are involved in the
persons (units), treatments, observations,
and settings (UTOS)?
• How generalizable is the locally-embedded
causal relationship over varied UTOS?
• These questions and inferences are often
considered separately, so it is practical to
have the typology reflect that
• However, they are often related - and
different combinations are possible (e.g.,
internal validity with or without construct
validity)
• Interesting to consider the limits of
combinations (e.g., to what extent is both
high internal and external validity possible?)
Threats To Validity
• Are specific reasons why we can be partly
or completely wrong in our inferences
– About covariation, causation, constructs or
variations across UTOS
• It is useful to anticipate criticisms of
inferences by considering the types of
limitations encountered by past research.
• Heuristics, such as a list of potential
threats, allow us to account for threats in
the design or by including measures of
anticipated threats.
3 Critical Questions about Threats
• For any particular experiment and finding:
– How would the threat apply in this case?
– Is there other evidence that the threat is
plausible rather than just possible?
– Does the threat operate in the same direction as
the observed effect (so that it could partially or
totally explain it)?
• But “ruling out” threats is a falsification
enterprise, so is always limited
Statistical Conclusion Validity
• The validity of inferences about the
covariation or correlation between the
treatment and the outcome
• How large and reliable is the covariation?
– Whether the variables covary or not
– How strongly they covary (SCC, p. 42)
Testing covariation
• Null hypothesis significance testing (NHST)
• Common misunderstandings of p value
• NHST tells little about effect size
• Effect size bound by confidence intervals
• An alternative approach
• SCC recommend these along with exact p of type I
error
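As a rough illustration of reporting an effect size bounded by a confidence interval, the sketch below computes a mean difference, a 95% interval, and a standardized effect (Cohen's d). The scores are invented for illustration, the function name is hypothetical, and the interval uses a normal approximation for brevity; small samples would normally call for a t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_difference_ci(treated, control, level=0.95):
    """Return (difference, lower, upper, cohens_d) for two independent groups."""
    n1, n2 = len(treated), len(control)
    diff = mean(treated) - mean(control)
    s1, s2 = stdev(treated), stdev(control)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    # Normal-approximation critical value (an assumption made for brevity).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return diff, diff - z * se, diff + z * se, diff / pooled_sd

# Hypothetical outcome scores, invented for illustration only.
treated = [12, 15, 14, 16, 13, 17, 15, 14]
control = [11, 13, 12, 14, 12, 13, 11, 12]
diff, lower, upper, d = mean_difference_ci(treated, control)
print(f"difference = {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], d = {d:.2f}")
```

Unlike a bare p value, the interval shows both the direction of the effect and how precisely its size is estimated.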
Classical
Interpretation of p value
• In the classic interpretation, exact Type I
probability levels tell us the probability that
the results that were observed in the
experiment could have been obtained by
chance from a population in which the null
hypothesis is true (Cohen, 1994 as cited in SCC, p. 44).
• “Perhaps not the most interesting
hypothesis” (SCC)
Alternative
Interpretation of p value
• p value (probability level) signifies the
confidence we can have in deciding
among the following claims:
– 1) Treatment A did better than treatment B
(sign of effect is +)
– 2) Treatment B did better than treatment A
(sign of effect is -)
– 3) The sign is uncertain (p > .05 signifies 3,
“too close to call”)
Incorrect statistical conclusions (SCC,
p.42)
– 1) Whether the variables covary
– Type I error (claim of a difference when
there is none)
– Type II error (conclude that there is no
effect when in fact there is one)
– 2) How strongly they covary
– Overestimate magnitude of covariation
(and confidence in estimate of magnitude)
– Underestimate magnitude of covariation
(and confidence in estimate of magnitude)
Threats to Statistical Conclusion Validity
• Low statistical power
– See Table 2.3 (pp. 46-7) for methods to increase power
• Violated assumptions of the test statistics
• Fishing and the error rate problem
• Unreliability of measures
– Always attenuates bivariate relationships
• Restriction of range – floor and ceiling effects
• Unreliability of treatment implementation
• Extraneous variance in experimental setting
• Heterogeneity of respondents (units)
• Inaccurate effect size estimation
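Two of the threats above have simple closed forms that can be sketched numerically. The first function shows how fishing inflates the chance of at least one Type I error, assuming the tests are independent. The second applies the classical test theory attenuation formula for unreliable measures. The function names and input values are illustrative assumptions, not taken from SCC.

```python
from math import sqrt

def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the reliability of each measure."""
    return true_r * sqrt(reliability_x * reliability_y)

# Error rate grows quickly as more outcomes are tested at alpha = .05.
for k in (1, 5, 20):
    print(k, round(familywise_error_rate(k), 3))

# Hypothetical true correlation of .50 measured with imperfect instruments.
print(round(attenuated_correlation(0.50, 0.70, 0.80), 3))
```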
Can we prove that
covariation between a
treatment and an
outcome is zero?
To support the causal
inferences, three things
must be established (p. 53):
• 1) A precedes B in time (use design)
• 2) A covaries with B (use statistics)
• 3) No other explanation for the relationship
is plausible (use design if possible)
Internal Validity
• The ability to infer with confidence that an
independent variable has produced the
observed differences in the dependent
variable (Singleton & Straits, 2005, p. 188)
• Isolating the independent variable
• Controlling confounds
• Validity: the approximate truth of an
inference (SCC, p. 34)
Internal validity
• The validity of inferences about whether
observed covariation between A (treatment)
and B (outcome) reflects a causal
relationship from A to B as those variables
were manipulated or measured.
• Is the covariation causal or would the same
effect be obtained without treatment?
Internal validity
• “Local Molar Causal Validity”
– Local: generalizability is zero, limited to UTOS
– Molar: treatments are a complex package
– Causal: restricted to claims that “A caused B”
• “One of the things that's most difficult to
grasp about internal validity is that it is only
relevant to the specific study in question”
(Trochim, 2006).
Threats to Internal Validity
• Each threat signifies a distinct class of
extraneous other possible causes (p. 55)
– Ambiguous temporal precedence
– Selection bias
– History
– Maturation
– Statistical regression
– Attrition (a special case of selection bias)
– Testing effects
– Instrumentation
– Additive/Interactive effects of these threats
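Statistical regression is often the least intuitive threat in the list above, so a small simulation may help. This is a sketch under a hypothetical data-generating model: each score is a stable true score plus independent measurement error, and no treatment is applied at all. Units selected for extreme pretest scores still move toward the mean at posttest.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n=10_000, cutoff=1.0, seed=0):
    """Select units with extreme pretest scores and compare with posttest."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n)]
    # Pretest and posttest share the true score but have independent errors.
    pretest = [t + rng.gauss(0, 1) for t in true_scores]
    posttest = [t + rng.gauss(0, 1) for t in true_scores]
    # Keep only units whose pretest score exceeds the cutoff.
    selected = [i for i in range(n) if pretest[i] > cutoff]
    return (mean(pretest[i] for i in selected),
            mean(posttest[i] for i in selected))

pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group: pretest mean = {pre_mean:.2f}, posttest mean = {post_mean:.2f}")
```

A study that enrolled only high scorers and then observed lower scores later could mistake this drop for a treatment effect.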
*GARDASIL DAILY DOUBLE*
Threats to internal validity are not
necessarily independent of each other.
Define two threats to internal validity
and explain how they could be related
/ co-occur in a study.
Randomization Controls Most
Threats to Internal Validity
• Indeed, all except
– Differential attrition
– Differential testing
Relating Statistical
and Internal Validity
• Both concern operations (not the constructs
they represent)
• Statistical conclusion validity is concerned
with errors in assessing covariation
• Internal validity is concerned with errors in
causal-reasoning
• Internal validity depends substantially on
statistical conclusion validity
Jill and Tim
• Jill
–Construct Validity
• Tim
–External Validity
–Trade-offs
Shadish 2011
• Evaluators discuss external validity much
less than internal validity
• Some idea of disagreements in the field(s)
• Threats to validity overlap
– E.g., attrition is listed as a threat to internal
validity. But because sample size drops, it can
threaten power (statistical conclusion validity),
may require changing how we describe who is
and is not in the study (construct validity), and
may raise questions about whether the
intervention would have the same effect in those
who dropped out (external validity).
THREATS TO VALIDITY
DISCUSSION QUESTIONS
What happens to the precision, and
confidence intervals, of effect size
estimates when a study has low
power?
What kind of validity is threatened?
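One way to explore the precision part of this question numerically is to look at how the width of a confidence interval changes with sample size. The sketch below is an illustration only: it assumes two equal-sized groups, a known common standard deviation, and a normal approximation, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(n_per_group, sd=1.0, level=0.95):
    """Half-width of a normal-approximation CI for a two-group mean difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error of the difference between two group means.
    return z * sd * sqrt(2 / n_per_group)

# Smaller groups give wider intervals, i.e. less precise estimates.
for n in (10, 40, 160):
    print(n, round(ci_half_width(n), 3))
```

Under these assumptions, quadrupling the group size halves the interval width.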
A specific instance of
selection bias is also defined
in SCC’s list as a separate
threat to internal validity.
What is it?
Confounding of treatment
effects with population
differences threatens _______
validity
You are a part of a research team that has
been funded to tackle the adult obesity
epidemic. The hypothesis is that adults
receiving the intervention will have a healthier
weight than adults who do not receive the
intervention. You ask your boss, “How will we
measure healthy weight?” To which, your
boss replies, “Simple, we will ask each
participant their height and weight.” You ask,
“That’s it?”, and your boss replies, “Yes.”
You’re new to the team, but you really want
to speak up because this is a threat to
_________ validity, known as
____________________.
*HERPES DAILY DOUBLE*
Random sampling, though
rarely performed in
experimental designs,
improves what kind of
validity?
*HEPATITIS DAILY DOUBLE*
You work at High Times Community College
and your coworker comes to work sharing the
results of a new study. He says, “Listen to this!
In a new study, students were randomly
assigned to take 10, 15, or 20 units of course
credit. Results show that college students who
took 20 or more credits were less likely to
engage in marijuana use. So to reduce the
prevalence of marijuana use here at High
Times CC, we have to implement a policy
putting a minimum credit hour of 20 for all
students!”. You, having taken H699, take a
closer look at the report and see that the study
was conducted at one university…Harvard.
Your response to your colleague is, “Sorry, my
friend, but this study most likely lacks
_________ because ____________________”
You want to test out a novel approach to
improving psychological distress among
college students. Your technique is
provided to students that come into the
campus counseling center. You conduct
two week follow-ups with these students
and see that their self-reported levels of
psychological distress have improved. You
are ready to tell your boss about the
success of your program when your
colleague points out that your study has
a threat to _______ validity known as
____________.
Secular trends pose a
threat to
_________ validity.
You have completed an RCT in
which you examined the impact
of an SAT preparation course on
SAT performance. You want to
see if results differ for boys
versus girls.
What will happen to your power if
your sample is divided by
gender?
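A quick calculation can make the question above concrete. The sketch below approximates power for a two-sided, two-sample z-test on a standardized effect, then compares a full sample with a subgroup half its size. The effect size (d = 0.3), the group sizes, and the function name are all hypothetical, and the z-test stands in for whatever analysis the study would actually use.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = effect_size_d * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

full_sample_power = approx_power(0.3, 100)  # all students, 100 per arm
subgroup_power = approx_power(0.3, 50)      # one gender only, 50 per arm
print(f"full sample: {full_sample_power:.2f}, subgroup: {subgroup_power:.2f}")
```

With the same true effect, each subgroup analysis has less power than the pooled analysis because each arm contains fewer units.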