Some factor analysis terminology

tom.h.wilson
Department of Geology and Geography
West Virginia University
Morgantown, WV
We’ll spend some time in class today illustrating factor analysis using an example data set taken from Sun (2002).
Sun (2002): example data used to illustrate factor analysis
The complete data set collected by Teti (1974) is available in the MultiVar folder (see Deck.mtw).
Sun (2002), from Teti (1974)
[Figure: Water Quality I. Acidity, SO4, total Ca, and pH by sample site (concentration axis 0 to 500; sample sites 0 to 30).]
[Figure: Water Quality II. Fe, NO3, Cl, Mn2+, total Na, and total K by sample site (concentration axis 0 to 30; sample sites 0 to 30).]
From Sun (1998)
[Figure legend: low acidity (<10 mg/L), high acidity (>30 mg/L). Teti (1974); Sun (2002).]
Some factor analysis terminology
Factor loadings represent the degree to which each variable contributes to each factor.
Scores represent the value of each sample (observation, individual, area, etc.) on each of the derived factors. Factors can be thought of as additional variables: a sample’s score on a factor is a weighted combination of the measured input variables. The term is used in a similar way in discriminant analysis.
Communality is the proportion of each variable’s total variance that is accounted for by the retained factors.
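Equimax rotation combines the varimax criterion, which simplifies the factors, with the quartimax criterion, which simplifies the variables; the goal is loadings in which each variable loads high on one factor and low on the others.
Varimax rotation rotates the loadings to maximize the variance of the squared loadings within each factor, so that each factor has a few large loadings and the rest close to zero.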
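The terms above can be made concrete with a short sketch. It assumes principal-component extraction from the correlation matrix, which is only one of several extraction methods (the slides do not say which one Sun used), and it runs on a made-up data set rather than Teti's measurements. The function name `pc_factor_analysis` is hypothetical.

```python
import numpy as np


def pc_factor_analysis(data, n_factors):
    """Hypothetical helper: principal-component factor extraction (unrotated).

    data: array of shape (n_samples, n_variables).
    Returns loadings, communalities, scores and all eigenvalues (descending).
    """
    # Standardize each variable (zero mean, unit sample variance).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Loadings: eigenvectors scaled by sqrt(eigenvalue). Entry (i, j) is the
    # correlation between variable i and factor j.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

    # Communality: share of each variable's (unit) variance reproduced by the
    # retained factors, i.e. the row sum of squared loadings.
    communalities = (loadings ** 2).sum(axis=1)

    # Scores: each sample's value on each factor, a weighted combination of its
    # standardized measurements, scaled so every factor has unit variance.
    scores = z @ eigvecs[:, :n_factors] / np.sqrt(eigvals[:n_factors])

    return loadings, communalities, scores, eigvals


# Made-up example: 4 variables that all share one underlying source of variation.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 1))
data = latent + 0.5 * rng.normal(size=(30, 4))

loadings, communalities, scores, eigvals = pc_factor_analysis(data, n_factors=2)
print(np.round(loadings, 2))
print(np.round(communalities, 2))
```

In this sketch a loading is literally the correlation between a variable and a factor's scores, which is why squaring and summing the loadings in a row gives that variable's communality.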
Some data
[Table: X and Y values as listed on the slide: 7.49, 2.32, 3.82, 4.94, 9.10, 6.73, 6.22, 0.30, 2.59, 3.23, 3.86, 1.51, 4.14, 5.75, 3.30, 1.70]
[Figure: two scatter plots of Y (0 to 6) against X (0 to 10).]
Score plot
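The factor is a weighted combination of the measured variables that explains a certain proportion of the variance in your data set. You could think of Factor 1 as similar to the result obtained by a multiple linear regression analysis, with one difference: no variable is treated as the dependent one, so the factor follows the direction of greatest variation shared by all the variables.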
The factors represent combinations of variables that explain the variance observed in the data. Factor 1 explains the largest share of the variance in the data; Factor 2 accounts for less of the variance than Factor 1 but more than Factor 3, and so on.
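The ordering described above can be read off the eigenvalues of the correlation matrix when factors are extracted by principal components (an assumption here; other extraction methods compute the explained variance differently). A minimal sketch on made-up data; `variance_explained` is a hypothetical name:

```python
import numpy as np


def variance_explained(data):
    """Hypothetical helper: proportion of total variance per factor.

    Uses the eigenvalues of the correlation matrix (principal-component
    extraction). With p standardized variables the total variance is p.
    """
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse so Factor 1 comes first.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    proportion = eigvals / corr.shape[0]
    return proportion, np.cumsum(proportion)


rng = np.random.default_rng(1)
shared = rng.normal(size=(50, 1))
# Made-up data: 5 variables, the first 3 strongly tied to one common source.
data = np.hstack([shared + 0.3 * rng.normal(size=(50, 3)),
                  rng.normal(size=(50, 2))])

proportion, cumulative = variance_explained(data)
for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"Factor {i}: {p:.1%} of variance (cumulative {c:.1%})")
```

Because the trace of a correlation matrix equals the number of variables, the proportions always sum to 1, and sorting the eigenvalues is what makes Factor 1 the largest contributor.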
Which of these variables, or which combination of them, would explain the majority of the variance in this 5-dimensional data set?
Sun (2002), from Teti (1974)
We reduce this multidimensional data set to two dimensions defined by the first and second factors. In this two-dimensional space clusters form, and in hindsight these clusters appear to reflect primarily differences in acidity. In fact, we can reclassify some points based on these findings.
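Reducing a data set to the first two factors amounts to computing each sample's score on Factor 1 and Factor 2 and plotting those pairs. The sketch below assumes principal-component extraction and uses a made-up two-group data set as a stand-in for low- and high-acidity sites; it is not Teti's data, and `two_factor_scores` is a hypothetical name.

```python
import numpy as np


def two_factor_scores(data):
    """Hypothetical helper: scores on the first two (unrotated) factors."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    top2 = np.argsort(eigvals)[::-1][:2]  # indices of the two largest eigenvalues
    # Project standardized data onto the two leading eigenvectors, unit variance.
    return z @ eigvecs[:, top2] / np.sqrt(eigvals[top2])


rng = np.random.default_rng(2)
# Made-up stand-in: 20 "low" and 20 "high" sites measured on 5 variables;
# the high group is shifted upward on every variable.
low = rng.normal(size=(20, 5))
high = rng.normal(size=(20, 5)) + 3.0
data = np.vstack([low, high])
group = np.array(["low"] * 20 + ["high"] * 20)

scores = two_factor_scores(data)
for g in ("low", "high"):
    print(g, np.round(scores[group == g].mean(axis=0), 2))
```

Plotting `scores[:, 0]` against `scores[:, 1]` (for example with matplotlib) would give the score plot. In this made-up case the two groups separate along Factor 1, because the group difference is the largest source of variance.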
Score Plot
[Figure legend: low acidity (<10 mg/L), high acidity (>30 mg/L). Teti (1974); Sun (2002).]
Sun (2002): a comparison of two rotation schemes.
The score plots reveal no difference between the two approaches to rotation.
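Equimax and varimax both belong to the orthomax family of orthogonal rotations. A widely used SVD-based iteration is sketched below under the usual orthomax parameterization: `gamma = 1` gives varimax, `gamma = 0` quartimax, and `gamma = k/2` (k factors) equimax. Treat it as a sketch: `orthomax` and `varimax_criterion` are hypothetical names, and many packages first normalize each row of the loadings by its communality (Kaiser normalization), which this version omits. Because the rotation is orthogonal, communalities are unchanged and the score plot is only rotated, so the relative positions of the samples stay the same.

```python
import numpy as np


def orthomax(loadings, gamma=1.0, max_iter=500, tol=1e-10):
    """Hypothetical helper: orthogonal rotation of a loading matrix.

    gamma = 1 -> varimax, gamma = 0 -> quartimax, gamma = k / 2 -> equimax,
    where k is the number of factors. Returns rotated loadings and the
    orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the orthomax criterion.
        target = rotated ** 3 - (gamma / p) * rotated * (rotated ** 2).sum(axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion < previous * (1 + tol):
            break
    return loadings @ rotation, rotation


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (loadings ** 2).var(axis=0).sum()


# Made-up loadings with clean simple structure, deliberately rotated by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.radians(30)
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
unrotated = simple @ turn

rotated, rotation = orthomax(unrotated, gamma=1.0)  # varimax
print(np.round(rotated, 2))  # close to `simple`, up to column order and sign
```

Under this parameterization equimax and varimax coincide whenever exactly two factors are retained, since k/2 = 1.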
[Figure (two panels): Beach Sands vs. Offshore Sands, sorting (1.0 to 1.35) plotted against porosity (0.31 to 0.36).]
Is factor analysis useful when we have only 2 variables?
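One way to look at this question, assuming principal-component extraction from the correlation matrix: with two standardized variables the correlation matrix is [[1, r], [r, 1]]. Its eigenvalues are 1 + r and 1 - r, and its eigenvectors lie at 45 degrees to the original axes whatever the data are, so the two factors are just the scaled sum and difference of the standardized variables and the only information the analysis adds is r. The sketch below checks this numerically on made-up data; `two_variable_factors` is a hypothetical name.

```python
import numpy as np


def two_variable_factors(x, y):
    """Hypothetical helper: principal-component factors for two variables.

    Returns the correlation r, the eigenvalues (descending) and eigenvectors
    (columns) of the 2 x 2 correlation matrix.
    """
    r = np.corrcoef(x, y)[0, 1]
    eigvals, eigvecs = np.linalg.eigh(np.array([[1.0, r], [r, 1.0]]))
    return r, eigvals[::-1], eigvecs[:, ::-1]


# Made-up data: any two correlated variables behave the same way.
rng = np.random.default_rng(3)
x = rng.normal(size=25)
y = 0.6 * x + rng.normal(scale=0.8, size=25)

r, eigvals, eigvecs = two_variable_factors(x, y)
print(round(r, 3), np.round(eigvals, 3))  # eigenvalues are 1 + |r| and 1 - |r|
print(np.round(eigvecs, 3))               # every entry is +/- 0.707 (45 degrees)
```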
Factor analysis (score plot) for the Swedish Mining data
[Figure (two panels): Factor Analysis, Unrotated; Factor Analysis, Varimax Rotation. Each plots Factor 2 against Factor 1.]
In today’s lab exercise we’ll return to the Swedish mining
data and evaluate the potential of factor analysis to help us
decide which of the prospective areas may be productive.