CFA - Texas Tech University


CFA
1
2
3
4
5
6
7
• Rationale: Adjust and repair the measurement model as needed without violating the integrity of the structural model test. Often we have no particular theoretical interest in measurement, except as a means of testing theory at the construct level.
• Not without controversy, however.
8
• Chi-square difference test: $\Delta\chi^2 = \chi^2_s - \chi^2_m$ (structural model minus measurement model), with $df_s - df_m$ degrees of freedom.
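The difference test above can be computed directly. The following is a minimal Python sketch using only the standard library. The chi-square tail probability comes from the regularized incomplete gamma function (power series for small arguments, continued fraction otherwise, as in standard numerical references). The function names and the fit statistics in the usage line are hypothetical, not output from any particular SEM program.

```python
import math


def _lower_series(a: float, z: float) -> float:
    """Regularized lower incomplete gamma P(a, z) via its power series (used when z < a + 1)."""
    term = total = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total


def _upper_cf(a: float, z: float) -> float:
    """Regularized upper incomplete gamma Q(a, z) via a continued fraction (used when z >= a + 1)."""
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * h


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    return 1.0 - _lower_series(a, z) if z < a + 1.0 else _upper_cf(a, z)


def chi2_difference_test(chi2_s: float, df_s: int, chi2_m: float, df_m: int):
    """Structural model (s) nested in measurement model (m): returns (delta chi2, delta df, p)."""
    diff, ddf = chi2_s - chi2_m, df_s - df_m
    return diff, ddf, chi2_sf(diff, ddf)


# Hypothetical fit statistics for illustration only.
print(chi2_difference_test(85.3, 26, 70.1, 24))
```

Because the structural model is the more restricted of the two, a small p-value indicates that the structural constraints significantly worsen fit relative to the measurement model.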
9
Reliability (Really validity?)
$\frac{(\sum\lambda)^2}{(\sum\lambda)^2 + \sum\theta}$
AVE
$\frac{\sum\lambda^2}{\sum\lambda^2 + \sum\theta}$
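Both formulas translate directly into code. This is a small Python sketch, assuming standardized loadings λ, so that each error variance is θ = 1 - λ². The loading values are hypothetical.

```python
def composite_reliability(loadings, error_vars):
    """(sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    s = sum(loadings) ** 2
    return s / (s + sum(error_vars))


def average_variance_extracted(loadings, error_vars):
    """Sum of squared loadings / [sum of squared loadings + sum of error variances]."""
    s2 = sum(x * x for x in loadings)
    return s2 / (s2 + sum(error_vars))


# Hypothetical standardized loadings; with standardized indicators, theta = 1 - lambda^2.
lam = [0.9, 0.8, 0.7]
theta = [1 - x * x for x in lam]
print(round(composite_reliability(lam, theta), 3))       # 0.845
print(round(average_variance_extracted(lam, theta), 3))  # 0.647
```

Note that the reliability formula sums the loadings before squaring, while AVE squares each loading first, so AVE is never larger than the reliability value.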
10
• Discriminant Validity
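• AVE > φ² for each construct (φ is the correlation between the two factors)
• φ reliably different from 1.0 (e.g., its confidence interval excludes 1.0)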
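The two criteria can be checked mechanically. A minimal Python sketch follows. It assumes the AVE criterion is applied to both constructs and operationalizes "φ not equal to 1.0" as an approximate 95% Wald interval lying entirely below 1.0. Both choices and all numbers are illustrative assumptions.

```python
def fornell_larcker_ok(ave_1: float, ave_2: float, phi: float) -> bool:
    """True if each factor's AVE exceeds the squared interfactor correlation."""
    return ave_1 > phi ** 2 and ave_2 > phi ** 2


def phi_below_one(phi: float, se: float, z: float = 1.96) -> bool:
    """True if the approximate interval phi +/- z * se lies entirely below 1.0."""
    return phi + z * se < 1.0


# Hypothetical values for two factors.
print(fornell_larcker_ok(0.65, 0.58, phi=0.70))  # True: 0.49 is below both AVEs
print(phi_below_one(0.70, se=0.05))              # True: upper limit is 0.798
print(fornell_larcker_ok(0.65, 0.58, phi=0.80))  # False: 0.64 exceeds 0.58
```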
11
12
13
14
15
16
Causal Inference Issues
• Causal inference is often elusive in the social and behavioral sciences.
• Prototypes of causal effects seem to implicate primary (single) causes.
– billiard balls
– bacteria or viruses
• In reality, effects usually have multiple causes
– For distress:
  • Stressors
  • Personal dispositions
  • Familial factors
  • Social environment
  • Biological environment
17
Causal Inference, continued
• Effects of causes are not always constant
– social buffers
– developmental stages
– immune system interventions
– synergistic causal effects
– stochastic variation in causal factor strength
– stochastic measurement factors
18
David Hume's framework for Causality
• If E is said to be the effect of C, then
– 1) C and E must have temporal and spatial
contiguity: ASSOCIATION
– 2) C must precede E temporally: DIRECTION
– 3) There must be CONSTANT
CONJUNCTION:
If C, then E for all situations
19
Although still influential, Hume's
analysis is known to have limitations.
• Analysis of any cause C must be isolated
from competing causes (ISOLATION)
• Constant conjunction is too restrictive:
stochastic processes affect causal
relations, and mechanisms may vary
across situations.
– Causal relations may be expressed in terms
of expectations over stochastic variation
20
Formal causal analyses have led to
important advances
• Robert Koch, the Nobel Prize-winning bacteriologist, investigated bacteria as causes of disease using three principles:
– The organism must be found in all cases of the disease in question. (association)
– The organism must be isolated and grown in pure culture. (isolation)
– When susceptible subjects are inoculated with the isolated organism, they must develop the disease. (direction and hedged constant conjunction)
21
Causal Process in Time
• In the behavioral, social, and biological
sciences, the units of observation cannot
be trusted to stay the same over time.
• For example, in Koch's inoculation test,
how do we know that the subject had not
been infected by chance?
• For studies of distress, we expect both
stress and distress to change over time.
22
Statisticians developed the
randomized experiment to address
causal issues:
• Randomly assign subjects to one of two
conditions, Treatment (T) or Control (C),
• Administer treatment and control
procedures
• Measure outcome variable Y (assumed to
reflect the process of interest) blind to
treatment group
• Infer effect of treatment from difference in
group means
23
Holland’s formal analysis of
randomized experiments:
• Suppose Y(u) is a measurement on subject u that reflects the
process that is supposed to be affected by treatment, T.
• If subject u is given treatment T, then Y_T(u) is observed.
• If subject u is given a control treatment, C, then Y_C(u) is observed.
• We would like to compare Y_T(u) with Y_C(u), but only one of these can be observed, because u is in either T or C.
– Let the desired comparison be called D = Y_T(u) - Y_C(u).
– Holland calls this the effect of cause T.
• Although D cannot be observed, its average can be estimated by computing $\bar{D} = \bar{Y}_T - \bar{Y}_C$.
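A short simulation makes Holland's point concrete. This Python sketch gives every hypothetical subject both potential outcomes, Y_C(u) and Y_T(u), randomly reveals only one of them, and compares the difference in group means with the (normally unobservable) average of D. The distributions, effect size, and seed are arbitrary assumptions for illustration.

```python
import random
import statistics


def simulate_experiment(n: int = 20_000, seed: int = 1):
    """Potential-outcomes simulation: each subject has Y_C(u) and Y_T(u), but only one is observed."""
    rng = random.Random(seed)
    y_c = [rng.gauss(50, 10) for _ in range(n)]       # control outcome Y_C(u) (hypothetical)
    d = [rng.gauss(5, 2) for _ in range(n)]           # individual effect D(u) (hypothetical)
    y_t = [yc + du for yc, du in zip(y_c, d)]         # treated outcome Y_T(u)
    treated = [rng.random() < 0.5 for _ in range(n)]  # random assignment to T or C
    obs_t = [yt for yt, t in zip(y_t, treated) if t]      # Y_T observed only in group T
    obs_c = [yc for yc, t in zip(y_c, treated) if not t]  # Y_C observed only in group C
    true_avg_d = statistics.fmean(d)                  # not available in a real study
    est_avg_d = statistics.fmean(obs_t) - statistics.fmean(obs_c)
    return true_avg_d, est_avg_d


true_d, est_d = simulate_experiment()
print(f"true average D = {true_d:.2f}, estimate = {est_d:.2f}")
```

The individual D(u) values are never used by the estimator; randomization alone lets the between-group difference track their average.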
24
Between-subject information is substituted for within-subject information.
• Within-subject analyses are intuitively appealing, but require strong assumptions about constancy over time.
• When the average of D differs from 0, ASSOCIATION is established.
• Randomization prior to treatment deals with the causal issue of DIRECTION.
• It also partially supports ISOLATION (double-blind trials and manipulation checks help address other aspects of isolation).
• Randomization does not establish CONSTANT
CONJUNCTION. The effect is only established for the
specific experimental conditions used in the study.
25
Key Feature: Treatment is applied to
subjects sampled into group T
• Holland argues that this manipulation is critical to guarantee DIRECTION and ISOLATION.
• Holland and Rubin go on to assert that
clear causal inference is only possible if
manipulation is at least conceivable. They
propose the motto,
NO CAUSATION
WITHOUT MANIPULATION
26
NO CAUSATION
WITHOUT MANIPULATION
• This motto is not popular with sociologists
and economists. It explicitly denies causal
status to personal attributes, such as race,
sex, age, nationality, and family history.
• Instead, it encourages the investigation of
processes such as discrimination, physical
changes corresponding to age,
government policy, and biochemical
consequences of genetic makeup.
27
NO CAUSATION
WITHOUT MANIPULATION
• To illustrate, Holland would not say that my
height causes me to hit my head going
into my suburban cellar, as my height
cannot be manipulated.
• My failure to duck and the dangerous obstruction, by contrast, could be shown to be causally related to my bumped head.
28
Structural Equation Models
• Researchers studying topics such as stress, discrimination, poverty, and coping cannot easily design randomized experiments.
• Structural Equation Models (SEM) are
often presented as a major tool for
establishing causes.
29
SEM and ISOLATION,
ASSOCIATION, and DIRECTION
• Consider a simple SEM model:
– Y = b1 X + e
[Path diagram: X → Y, with residual e → Y]
• For every unit change in X, Y is expected to change by b1 units. This equation implies a clear association of Y and X, and it makes the assumed direction underlying the association unambiguous. For the equation to be meaningful in terms of causation, we must also assume that alternative causes of Y are accounted for by the independent stochastic term, e.
• Bollen calls the requirement that e be uncorrelated with
X, the pseudo-isolation condition.
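The pseudo-isolation condition can be illustrated by simulation. In this Python sketch (hypothetical values: b1 = 0.5, seed 7), the least-squares slope recovers b1 when e is independent of X. When the residual also contains a component correlated with X, the same estimator absorbs that component.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope b1 = cov(X, Y) / var(X)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(7)
n, b1 = 10_000, 0.5
x = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]             # independent of X: pseudo-isolation holds
y = [b1 * xi + ei for xi, ei in zip(x, e)]
# Residual now includes 0.5 * X, so it is correlated with X: pseudo-isolation fails.
y_confounded = [b1 * xi + (ei + 0.5 * xi) for xi, ei in zip(x, e)]
print(round(ols_slope(x, y), 2), round(ols_slope(x, y_confounded), 2))
```

The first slope should come out near 0.5 and the second near 1.0, even though the causal coefficient of X is 0.5 in both data sets.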
30
Analysis of Randomized Experiment
through SEM
Y = b0 + b1 X + e
• Let X take one of two values representing whether a subject received the treatment (X=1) or the placebo control (X=0). Then b1 estimates the average of D. Because the assignment is randomized, X is expected to be uncorrelated with residual causes of Y.
– Randomization justifies the pseudo-isolation condition.
• The randomized experiment also reminds us that between-subject comparisons can be informative about average within-subject effects. We can contemplate what would have happened if a given subject had been assigned to a different group.
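The correspondence between b1 and the group-mean difference is exact, not approximate. This Python sketch fits Y = b0 + b1 X + e by least squares with a 0/1 indicator X. It assumes a hypothetical outcome model (intercept 20, effect 4, noise SD 5) and checks that b1 equals the treated mean minus the control mean and that b0 equals the control mean.

```python
import random
import statistics


def dummy_regression(x, y):
    """OLS of Y on a 0/1 treatment indicator X; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


rng = random.Random(3)
x = [1 if rng.random() < 0.5 else 0 for _ in range(500)]  # randomized assignment
y = [20 + 4 * xi + rng.gauss(0, 5) for xi in x]            # hypothetical outcome model
b0, b1 = dummy_regression(x, y)
mean_t = statistics.fmean(yi for xi, yi in zip(x, y) if xi == 1)
mean_c = statistics.fmean(yi for xi, yi in zip(x, y) if xi == 0)
print(f"b1 = {b1:.3f}, difference in means = {mean_t - mean_c:.3f}, b0 = {b0:.3f}")
```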
31
In non-experimental studies, Isolation
is difficult to establish
• We need to specify EVERY causal factor that is
correlated with X, the causal variable of
interest.
Y = b0 + b1 X + b2 W2 + b3 W3 + b4 W4 + e
[Path diagram: X, W2, W3, W4, and e as causes of Y]
32
The effects of model misspecification
• Suppose W2 is missing from the data set, even though we know it is correlated with both Y and X. If we know that W2 is a causal factor for both X and Y, then we would portray the model as on the right:
[Two path diagrams with nodes X, W2, e, and Y; the correctly specified model on the right shows W2 as a cause of both X and Y.]
• If we consider the misspecified model, in which W2 is missing, we can see that the estimated effect of X will absorb the association transmitted through W2 (the path X ← W2 → Y). When W2 affects X and Y in the same direction, the causal impact of X will be overestimated in the misspecified model.
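Omitted-variable bias of this kind can be reproduced numerically. In this Python sketch the generating model is hypothetical: W2 → X with coefficient 0.7, X → Y with 0.3, and W2 → Y with 0.5, all with unit-variance noise. The regression that omits W2 should give a slope near 0.3 + 0.5 × 0.7 / 1.49 ≈ 0.535. The regression that includes W2 should give a slope near the true 0.3.

```python
import random
import statistics


def center(v):
    """Subtract the mean so that intercepts drop out of the normal equations."""
    m = statistics.fmean(v)
    return [a - m for a in v]


def slopes_two_predictors(x, w, y):
    """OLS slopes of Y on X and W (with intercept), solving the 2x2 normal equations."""
    x, w, y = center(x), center(w), center(y)
    sxx = sum(a * a for a in x)
    sww = sum(a * a for a in w)
    sxw = sum(a * b for a, b in zip(x, w))
    sxy = sum(a * b for a, b in zip(x, y))
    swy = sum(a * b for a, b in zip(w, y))
    det = sxx * sww - sxw ** 2
    return (sww * sxy - sxw * swy) / det, (sxx * swy - sxw * sxy) / det


rng = random.Random(11)
n = 20_000
w2 = [rng.gauss(0, 1) for _ in range(n)]
x = [0.7 * w + rng.gauss(0, 1) for w in w2]                          # W2 -> X
y = [0.3 * xi + 0.5 * w + rng.gauss(0, 1) for xi, w in zip(x, w2)]   # X -> Y and W2 -> Y

xc, yc = center(x), center(y)
b_short = sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)  # misspecified: W2 omitted
b_long, _ = slopes_two_predictors(x, w2, y)                            # W2 included
print(f"W2 omitted: {b_short:.3f}   W2 included: {b_long:.3f}")
```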
33
Missing Data Mechanisms
• Terms suggested by Rubin
– Rubin (1976), Little & Rubin (1987)
• MISSING COMPLETELY AT RANDOM (MCAR)
– Which data point is missing cannot be predicted
by any variable, measured or unmeasured.
• Prob(M|Y)=Prob(M)
– The missing data pattern is ignorable. Analyzing only the complete data gives unbiased estimates, although with less precision.
34
Missing Data Mechanisms
• MISSING AT RANDOM (MAR)
– Which data point is missing is systematically
related to subject characteristics, but these are all
measured
• Conditional on observed variables, missingness is
random
• Prob(M|Y)=Prob(M|Yobserved)
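– E.g., less educated respondents might not answer a certain question (education itself being measured).
– Missingness can be treated as ignorable in likelihood-based analyses (e.g., EM, FIML) that include the measured variables.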
35
Missing Data Mechanisms
• NOT MISSING AT RANDOM (NMAR)
– Data are missing because of a process related to the value that is unavailable
• Someone was too depressed to come and report about their depression
• An abused woman is not allowed to meet the interviewer
– The missing data pattern is not ignorable.
– Whether missing data are MAR or NMAR cannot usually be established empirically.
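The practical difference between the three mechanisms shows up even for a simple mean. This Python sketch generates a fully observed covariate ("educ") and an outcome y with true mean 10, deletes y under each mechanism, and averages the remaining cases. The missingness probabilities are hypothetical logistic functions chosen for illustration. Only under MCAR should the complete-case mean stay near 10. Under MAR the bias can be removed by a model that uses educ (as EM or FIML does), but not under NMAR.

```python
import math
import random
import statistics


def logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def complete_case_means(n: int = 20_000, seed: int = 5):
    rng = random.Random(seed)
    educ = [rng.gauss(0, 1) for _ in range(n)]           # fully observed covariate
    y = [10 + 2 * e + rng.gauss(0, 1) for e in educ]     # true mean of y is 10
    mechanisms = {
        "MCAR": lambda e, yi: 0.3,                         # unrelated to any variable
        "MAR": lambda e, yi: logistic(-1.5 * e),           # depends only on observed educ
        "NMAR": lambda e, yi: logistic(-1.5 * (yi - 10)),  # depends on the missing y itself
    }
    results = {}
    for name, p_missing in mechanisms.items():
        kept = [yi for e, yi in zip(educ, y) if rng.random() >= p_missing(e, yi)]
        results[name] = statistics.fmean(kept)
    return results


for name, mean in complete_case_means().items():
    print(f"{name}: complete-case mean of y = {mean:.2f}")
```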
36
Approaches to Missing Data
• Listwise deletion
– If a person is missing on any analysis variable, he is
dropped from the analysis.
• Pairwise deletion
– Correlations/Covariances are computed using all available
pairs of data.
• Imputation of missing data values.
• Model-based use of complete data
– EM (expectation-maximization) algorithm
• SEM-based FIML
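Listwise and pairwise deletion differ in which cases enter each covariance or correlation. This Python sketch uses a hypothetical three-variable data set with missing values coded as None. Listwise deletion correlates every pair on the same four complete rows, while pairwise deletion uses a different n for each pair.

```python
import math
from itertools import combinations


def pearson(pairs):
    """Pearson correlation for a list of (a, b) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def listwise(data):
    """Drop every row with any missing value (None), then correlate; returns {(i, j): (n, r)}."""
    rows = [r for r in data if None not in r]
    return {(i, j): (len(rows), pearson([(r[i], r[j]) for r in rows]))
            for i, j in combinations(range(len(data[0])), 2)}


def pairwise(data):
    """For each pair of variables, use every row where both are present; returns {(i, j): (n, r)}."""
    out = {}
    for i, j in combinations(range(len(data[0])), 2):
        pairs = [(r[i], r[j]) for r in data if r[i] is not None and r[j] is not None]
        out[(i, j)] = (len(pairs), pearson(pairs))
    return out


# Hypothetical data: 8 cases, 3 variables, scattered missing values.
data = [
    (1.0, 2.0, 3.0), (2.0, None, 2.5), (3.0, 3.5, None), (4.0, 4.0, 5.0),
    (5.0, 6.0, 4.5), (None, 5.0, 6.0), (6.0, 7.5, 7.0), (7.0, 7.0, None),
]
print(listwise(data))
print(pairwise(data))
```

Because each pairwise entry rests on a different subset of cases, a pairwise covariance matrix need not be positive definite, which can cause problems when it is fed to an SEM program.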
37
EM and FIML
• Use available data to infer the sample moment matrix.
• Use information from the assumed multivariate distribution (typically multivariate normal)
• Patterns of associations can be structured or
unstructured.
• Now implemented in AMOS, EQS, Mplus
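To show what EM does in the simplest case, this Python sketch estimates the means and covariance matrix of a bivariate normal (x, y) when x is fully observed and some y values are missing (None). The E-step fills in the expected sufficient statistics E[y | x] and E[y² | x] for the missing values. The M-step recomputes the moments. The data are hypothetical and generated under MAR: y is missing more often when x is high. This is an illustration only, not how AMOS, EQS, or Mplus implement EM or FIML. For this saturated model, FIML maximizes the same observed-data likelihood and should reach the same estimates.

```python
import random
import statistics


def em_bivariate(x, y, iters: int = 100):
    """EM estimates (mu_x, mu_y, s_xx, s_xy, s_yy) for a bivariate normal; y may contain None."""
    n = len(x)
    obs = [v for v in y if v is not None]
    mu_x, mu_y = statistics.fmean(x), statistics.fmean(obs)
    s_xx = statistics.pvariance(x)  # x is complete, so its ML moments never change
    s_yy, s_xy = statistics.pvariance(obs), 0.0
    for _ in range(iters):
        # E-step: conditional moments of each missing y given its observed x.
        beta = s_xy / s_xx
        resid = s_yy - beta * s_xy   # residual variance of y given x
        ey, eyy, exy = [], [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)   # E[y | x]
                ey.append(m)
                eyy.append(m * m + resid)       # E[y^2 | x]
                exy.append(xi * m)
            else:
                ey.append(yi)
                eyy.append(yi * yi)
                exy.append(xi * yi)
        # M-step: update moments from the completed sufficient statistics.
        mu_y = sum(ey) / n
        s_xy = sum(exy) / n - mu_x * mu_y
        s_yy = sum(eyy) / n - mu_y ** 2
    return mu_x, mu_y, s_xx, s_xy, s_yy


rng = random.Random(2)
n = 5_000
x = [rng.gauss(0, 1) for _ in range(n)]
y_full = [2 + 0.6 * xi + rng.gauss(0, 0.8) for xi in x]  # true mean of y is 2
# MAR: y is missing with probability 0.7 when x > 0, else 0.1.
y = [yi if rng.random() >= (0.7 if xi > 0 else 0.1) else None for xi, yi in zip(x, y_full)]
listwise_mean = statistics.fmean([v for v in y if v is not None])
_, em_mean, *_ = em_bivariate(x, y)
print(f"listwise mean of y = {listwise_mean:.2f}, EM mean of y = {em_mean:.2f}")
```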
38
Example of CFA with Means Model
Row     Parameter   Population  Complete (n=400)    Listwise            FIML (EQS)
                                Est.      SE        Est.      SE        Est.      SE
V1      factor 1    1           1.000     -         1.000     -         1.000     -
V2      factor 1    .9          0.894     0.0060    0.965     0.0270    0.900     0.0060
V3      factor 1    .9          0.901     0.0060    0.996     0.0290    0.915     0.0060
V4      factor 1    .8          0.800     0.0060    0.890     0.0240    0.808     0.0060
V5      factor 1    .8          0.798     0.0050    0.889     0.0230    0.807     0.0050
V6      factor 1    .7          0.690     0.0050    0.751     0.0240    0.693     0.0060
V7      factor 2    1           1.000     -         1.000     -         1.000     -
V8      factor 2    .9          0.910     0.0110    0.941     0.0440    0.903     0.0110
V9      factor 2    .9          0.907     0.0120    0.957     0.0500    0.899     0.0130
V10     factor 2    .8          0.815     0.0110    0.838     0.0430    0.811     0.0110
V11     factor 2    .7          0.707     0.0110    0.702     0.0330    0.702     0.0120
V12     factor 2    .5          0.514     0.0090    0.523     0.0370    0.508     0.0100
F1      mean        100         99.174    0.6910    83.484    2.4660    98.438    0.6380
F2      mean        50          48.765    0.7010    42.629    2.4520    48.999    0.6410
D1-F1   variance    100         105.250   8.7600    96.575    29.9810   115.293   9.4120
D2-F2   variance    100         118.870   9.9000    117.810   36.1000   119.463   9.8380
D1-D2   covariance  60          70.570    7.4200    55.810    25.4400   71.273    7.6540
39
Multiple Imputation
• Substitute expected values plus random noise for missing values.
• Repeat more than 5 times, producing several completed data sets, and analyze each one.
• Combine estimates and standard errors using formulas described by Rubin (1987). See also Schafer & Graham (2002), Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
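The combining step is short enough to write out. This Python sketch applies Rubin's (1987) rules to m estimates of one parameter and their squared standard errors. The pooled estimate is the mean of the m estimates. The total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The input values are hypothetical, and degrees-of-freedom adjustments for tests are not included.

```python
import math
import statistics


def rubin_combine(estimates, variances):
    """Pool m multiply-imputed estimates; returns (pooled estimate, pooled SE)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b           # total variance
    return q_bar, math.sqrt(t)


# Hypothetical results from m = 5 imputed data sets: estimates and squared SEs.
est = [0.42, 0.47, 0.39, 0.45, 0.44]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
print(rubin_combine(est, var))
```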
40
41
42
43
Communicating SEM Results
• Keeping up with the expert recommendations
– Psychological Methods
– Specialty journals
  • Structural Equation Modeling
  • Multivariate Behavioral Research
  • Applied Psychological Measurement
  • Psychometrika
• Two kinds of audiences
– Researchers interested in the substance of the empirical
contribution
– Experts in SEM
44
Talking Points of Hoyle & Panter, McDonald & Ho
• Model specification
– Theoretical justification
– Identifiability
• Measurement Model
• Structural Model
• Model estimation
– Characteristics of data
• Distribution form
• Sample size
• Missing data
45
Talking Points of Hoyle & Panter, McDonald & Ho (continued)
• Model estimation
– Estimation method: ML, GLS, ULS, ADF
– Goodness of estimates and standard errors
• Model Selection and Fit Statistics
• Alternative and Equivalent Models
• Reporting Results
– Path diagrams
– Tabular information
– Use software conventions?
46