Transcript Class 7 & 8

Logistic Regression
and Discriminant
Function Analysis
Logistic Regression vs.
Discriminant Function Analysis
• Similarities
– Both predict group membership for each
observation (classification)
– Dichotomous DV
– Requires an estimation and validation sample
to assess predictive accuracy
– If the split between groups is not more extreme than 80/20, the two methods yield similar results in practice
Logistic Reg vs. Discrim: Differences
• Discriminant Analysis
  – Assumes MV normality
  – Assumes equality of VCV matrices
  – Large numbers of predictors (which strain MV normality) cannot be accommodated
  – Predictors must be continuous, interval level
  – More powerful when assumptions are met
  – Many assumptions, rarely met in practice
  – Categorical IVs create problems
• Logistic Regression
  – No assumption of MV normality
  – No assumption of equality of VCV matrices
  – Can accommodate large numbers of predictors more easily
  – Categorical predictors OK (e.g., dummy codes)
  – Less powerful when assumptions are met
  – Few assumptions, typically met in practice
  – Categorical IVs can be dummy coded
Logistic Regression
• Outline:
– Categorical Outcomes: Why not OLS
Regression?
– General Logistic Regression Model
– Maximum Likelihood Estimation
– Model Fit
– Simple Logistic Regression
Categorical Outcomes: Why not
OLS Regression?
• Dichotomous outcomes:
– Passed / Failed
– CHD / No CHD
– Selected / Not Selected
– Quit / Did Not Quit
– Graduated / Did Not Graduate
Categorical Outcomes: Why not
OLS Regression?
• Example: Relationship b/w performance and turnover
• Line of best fit?!
• Errors (Y − Y′) across values of performance (X)?
[Scatterplot residue: Turnover (0/1) on the Y axis against Performance (1.5 to 5.0) on the X axis.]
Problems with Dichotomous Outcomes/DVs
• The regression surface is intrinsically non-linear
• Errors assume one of two possible values, violating the assumption of normally distributed errors
• Violates the assumption of homoscedasticity
• Predicted values of Y greater than 1 and smaller than 0 can be obtained
• The true magnitude of the effects of IVs may be greatly underestimated
• Solution: Model the data using Logistic Regression, NOT OLS Regression
Logistic Regression vs. Regression
• Logistic regression predicts a probability
that an event will occur
– Range of possible responses between 0 and 1
– Must use an s-shaped curve to fit data
• OLS regression assumes linear relationships and can't fit an s-shaped curve
  – Violates the assumption of normally distributed errors
  – Creates heteroscedasticity
Example: Relationship b/w Age
and CHD (1 = Has CHD)
General Logistic Regression Model
• Y′ (the outcome variable) is the probability of having one outcome or another, based on a nonlinear function of the best linear combination of predictors:

$$Y' = \frac{e^{a + b_1X_1}}{1 + e^{a + b_1X_1}}$$

Where:
• Y′ = probability of an event
• The linear portion of the equation (a + b1X1) is used to predict the probability of the event (0, 1); it is not an end in itself
The logistic (logit) transformation
• The DV is dichotomous; the purpose is to estimate the probability of occurrence (0, 1)
  – Thus, the DV is transformed into a likelihood
• The logit/logistic transformation accomplishes this (the linear regression equation predicts the log of the odds):

$$\text{odds} = \frac{P}{1 - P} = \frac{P(Y = 1)}{1 - P(Y = 1)} = \frac{P(Y = 1)}{P(Y = 0)}$$

$$\log(\text{odds}) = \text{logit}(P) = \ln\!\left(\frac{P}{1 - P}\right) = \ln\!\left(\frac{Y'}{1 - Y'}\right) = A + \sum_j B_j X_{ij}$$
Probability Calculation

$$P = Y' = \frac{e^{a + bX}}{1 + e^{a + bX}}$$

Where:
• The relation between logit(P) and X is intrinsically linear
• b = expected change in logit(P) given a one-unit change in X
• a = intercept
• e = the exponential constant (base of the natural logarithm)
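A minimal sketch of this back-and-forth between probability, odds/logit, and the linear predictor. The intercept and slope values below are hypothetical, chosen only for illustration; they are not from the lecture data.

import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))

def predicted_probability(a, b, x):
    """Inverse of the logit: P = e^(a + bX) / (1 + e^(a + bX))."""
    z = a + b * x                      # linear predictor (the logit)
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical coefficients for illustration only
a, b = -3.0, 0.5
for x in (2, 4, 6, 8):
    p = predicted_probability(a, b, x)
    print(f"X = {x}: logit = {a + b * x:+.2f}, P(Y=1) = {p:.3f}")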
Ordinary Least Squares (OLS)
Estimation
• The purpose is to obtain the estimates that best minimize the sum of squared errors, Σ(Y − Y′)²
• The estimates chosen best describe the
relationships among the observed variables
(IVs and DV)
• Estimates chosen maximize the probability of
obtaining the observed data (i.e., these are
the population values most likely to produce
the data at hand)
Maximum Likelihood (ML) estimation
• OLS can’t be used in logistic regression
because of non-linear nature of relationships
• In ML, the purpose is to obtain the parameter
estimates most likely to produce the data
– ML estimators are those with the greatest joint
likelihood of reproducing the data
• In logistic regression, each model yields an ML joint probability (likelihood) value
• Because this value tends to be very small (e.g., .00000015), it is transformed by taking −2 times its natural log
• The −2 log transformation also yields a statistic with a known distribution (the chi-square distribution)
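A small sketch of why the −2 log transformation is used: the joint likelihood of a toy set of observations is tiny, while −2LL is on a workable scale. The outcome and probability values below are made up for illustration.

import math

def neg2_log_likelihood(y, p):
    """-2 log likelihood for observed 0/1 outcomes y and predicted probabilities p."""
    ll = sum(math.log(p_i) if y_i == 1 else math.log(1 - p_i)
             for y_i, p_i in zip(y, p))
    return -2 * ll

# Toy data (hypothetical): the joint likelihood is very small, which is why -2LL is reported
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]
n2ll = neg2_log_likelihood(y, p)
likelihood = math.exp(-n2ll / 2)       # back-transform to the joint likelihood
print(f"joint likelihood = {likelihood:.8f}, -2LL = {n2ll:.3f}")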
Model Fit
• In Logistic Regression, R & R2 don’t make sense
• Evaluate model fit using the -2log likelihood (-2LL)
value obtained for each model (through ML
estimation)
– The -2LL value reflects fit of model; used to
compare fit of nested models
– The -2LL measures lack of fit – extent to
which model fits data poorly
– When the model fits the data perfectly, -2LL = 0
• Ideally, the -2LL value for the null model (i.e., the model with no predictors, or "intercept-only" model) will be larger than that for the model with predictors
Comparing Model Fit
 The fit of the null model can be tested against the fit of the model with predictors using a chi-square test:

$$\chi^2 = (-2LL_{Null}) - (-2LL_{Model})$$

Where:
• χ² = chi-square for the improvement in model fit (df = the difference in the number of predictors between the two models)
• -2LL_Null = -2 log likelihood value for the null (intercept-only) model
• -2LL_Model = -2 log likelihood value for the hypothesized model
• The same test can be used to compare a nested model with k predictor(s) to a model with k+1 predictors, etc.
• Same logic as OLS regression, but the models are compared using a different fit index (-2LL)
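A minimal sketch of this likelihood-ratio comparison of nested models, with hypothetical −2LL values and df chosen only to show the mechanics.

from scipy.stats import chi2

# Hypothetical -2LL values for a null model and a model with 3 added predictors
neg2LL_null, neg2LL_model, df = 120.0, 105.5, 3
chi_sq = neg2LL_null - neg2LL_model            # improvement in fit
p_value = chi2.sf(chi_sq, df)                  # upper-tail chi-square probability
print(f"chi-square = {chi_sq:.1f}, df = {df}, p = {p_value:.4f}")   # ~14.5, p ≈ .002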
Pseudo R2
• Assessment of overall model fit
• Calculation:

$$R^2 = \frac{(-2LL_{Null}) - (-2LL_{Model})}{-2LL_{Null}}$$

• Two primary pseudo R² statistics:
  – Nagelkerke: less conservative; preferred by some because its maximum is 1
  – Cox & Snell: more conservative
• Interpret like R² in OLS regression
Unique Prediction
• In OLS regression, the significance tests for the beta weights indicate whether each IV is a unique predictor
• In logistic regression, the Wald test is used for the same purpose
Similarities to Regression
• You can use all of the following
procedures you learned about OLS
regression in logistic regression
– Dummy coding for categorical IVs
– Hierarchical entry of variables (compare
changes in % classification; significance of
Wald test)
– Stepwise entry (but don't use it; it's atheoretical)
– Moderation tests
Simple Logistic Regression
Example
• Data collected from 50 employees
• Y = success in training program (1 = pass; 0 = fail)
• X1 = Job aptitude score (5 = very high; 1 = very low)
• X2 = Work-related experience (months)
Syntax in SPSS

LOGISTIC REGRESSION PASS                 (PASS is the DV)
  /METHOD = ENTER APT EXPER              (APT and EXPER are the IVs)
  /SAVE = PRED PGROUP
  /CLASSPLOT
  /PRINT = GOODFIT
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
Results
• Block 0: The null model results
– Can’t do any worse than this
• Block 1: Method = Enter
– Tests of the model of interest
– Interpret data from here
Tests whether the model is significantly better than the null model. A significant chi-square means yes!

Omnibus Tests of Model Coefficients
Step 1         Chi-square    df    Sig.
  Step           10.169       2    .006
  Block          10.169       2    .006
  Model          10.169       2    .006

Step, Block & Model yield the same results because all IVs were entered in the same block.
Results Continued
Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1            59.066a                .184                    .245
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

-2 Log Likelihood is an index of fit: a smaller number means better fit (perfect fit = 0)
Pseudo R² – interpret like R² in regression; Nagelkerke is preferred by some because its maximum is 1, Cox & Snell is the uniformly more conservative estimate
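The pseudo R² values above can be reproduced from the printed −2LL, the omnibus chi-square, and the sample size, using the standard Cox & Snell and Nagelkerke definitions. A sketch (the null −2LL of 69.235 is implied by adding the model chi-square to the model −2LL; expect small rounding differences from the rounded output):

import math

n = 50                                      # 50 employees
neg2LL_model = 59.066                       # from the Model Summary
model_chi_sq = 10.169                       # from the omnibus test
neg2LL_null = neg2LL_model + model_chi_sq   # 69.235 (implied, not printed)

cox_snell = 1 - math.exp(-model_chi_sq / n)
nagelkerke = cox_snell / (1 - math.exp(-neg2LL_null / n))
print(f"Cox & Snell = {cox_snell:.3f}")     # ~.184
print(f"Nagelkerke  = {nagelkerke:.3f}")    # ~.245-.246 (rounding in the printed -2LL)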
Classification: Null Model vs. Model Tested

Classification Table (Step 0; a. Constant is included in the model. b. The cut value is .500)
                           Predicted PASS
Observed PASS           fail      pass      Percentage Correct
  fail                    0        24               .0
  pass                    0        26             100.0
  Overall Percentage                               52.0

Null model: 52% correct classification

Classification Table (Step 1; a. The cut value is .500)
                           Predicted PASS
Observed PASS           fail      pass      Percentage Correct
  fail                   16         8              66.7
  pass                    6        20              76.9
  Overall Percentage                               72.0

Model tested: 72% correct classification
Variables in Equation

Variables in the Equation
Step 1a         B       S.E.      Wald     df    Sig.    Exp(B)
  APT          .549     .235     5.473      1    .019     1.731
  EXPER        .111     .052     4.577      1    .032     1.118
  Constant   -3.050    1.146     7.086      1    .008      .047
a. Variable(s) entered on step 1: APT, EXPER.

B → effect of a one-unit change in the IV on the log odds (hard to interpret)
*Odds Ratio (OR) → Exp(B) in SPSS = more interpretable; a one-unit increase in aptitude multiplies the odds of passing by about 1.73
Wald → like a t test, but uses the chi-square distribution
Significance → used to determine whether the Wald test is significant
Histogram of Predicted Probabilities
To Flag Misclassified Cases
SPSS syntax
COMPUTE PRED_ERR=0.
IF LOW NE PGR_1 PRED_ERR=1.
You can use this for additional analyses to
explore causes of misclassification
Results Continued
Hosmer and Lemeshow Test
Step    Chi-square    df    Sig.
1          6.608       8    .579

An index of model fit. The chi-square compares the observed events in the data with the events predicted by the model. The non-significant result means that the observed and expected values are similar – this is good!
Hierarchical Logistic Regression
• Question: Which of the following
variables predict whether a woman is
hired to be a Hooters girl?
– Age
– IQ
– Weight
Simultaneous v. Hierarchical

Simultaneous – Block 1: IQ, Age, Weight
Omnibus Tests of Model Coefficients
Step 1         Chi-square    df    Sig.
  Step           48.462       3    .000
  Block          48.462       3    .000
  Model          48.462       3    .000

Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1           142.383a                .296                    .395
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

Hierarchical – Block 1: IQ
Omnibus Tests of Model Coefficients
Step 1         Chi-square    df    Sig.
  Step             .289       1    .591
  Block            .289       1    .591
  Model            .289       1    .591
Cox & Snell .002; Nagelkerke .003

Hierarchical – Block 2: Age
Omnibus Tests of Model Coefficients
Step 1         Chi-square    df    Sig.
  Step           42.044       1    .000
  Block          42.044       1    .000
  Model          42.333       2    .000
Cox & Snell .264; Nagelkerke .353

Hierarchical – Block 3: Weight
Omnibus Tests of Model Coefficients
Step 1         Chi-square    df    Sig.
  Step            6.129       1    .013
  Block           6.129       1    .013
  Model          48.462       3    .000
Cox & Snell .296; Nagelkerke .395
Simultaneous v. Hierarchical

Simultaneous – Block 1: IQ, Age, Weight
Classification Table (a. The cut value is .500)
                           Predicted Hired
Observed Hired        not hired    hired    Percentage Correct
  not hired                53        12            81.5
  hired                    26        47            64.4
  Overall Percentage                               72.5

Hierarchical – Block 1: IQ
Classification Table (a. The cut value is .500)
                           Predicted Hired
Observed Hired        not hired    hired    Percentage Correct
  not hired                 8        57            12.3
  hired                     6        67            91.8
  Overall Percentage                               54.3

Hierarchical – Block 2: Age
Classification Table (a. The cut value is .500)
                           Predicted Hired
Observed Hired        not hired    hired    Percentage Correct
  not hired                55        10            84.6
  hired                    28        45            61.6
  Overall Percentage                               72.5

Hierarchical – Block 3: Weight
Classification Table (a. The cut value is .500)
                           Predicted Hired
Observed Hired        not hired    hired    Percentage Correct
  not hired                53        12            81.5
  hired                    26        47            64.4
  Overall Percentage                               72.5
Simultaneous v. Hierarchical

Simultaneous – Block 1: IQ, Age, Weight
Variables in the Equation
Step 1a         B       S.E.      Wald     df    Sig.     Exp(B)
  IQ          -.009     .015      .372      1    .542       .991
  age         -.591     .125    22.224      1    .000       .554
  weight      -.277     .117     5.630      1    .018       .758
  Constant    8.264    1.821    20.602      1    .000   3881.775
a. Variable(s) entered on step 1: IQ, age, weight.

Hierarchical – Block 1: IQ
Variables in the Equation
Step 1a         B       S.E.      Wald     df    Sig.     Exp(B)
  IQ           .006     .012      .289      1    .591      1.006
  Constant    -.185     .585      .100      1    .752       .831
a. Variable(s) entered on step 1: IQ.

Hierarchical – Block 2: Age
Variables in the Equation
Step 1a         B       S.E.      Wald     df    Sig.     Exp(B)
  IQ          -.003     .014      .032      1    .858       .997
  age         -.591     .120    24.220      1    .000       .554
  Constant    6.484    1.533    17.899      1    .000    654.298
a. Variable(s) entered on step 1: age.

Hierarchical – Block 3: Weight
Variables in the Equation
Step 1a         B       S.E.      Wald     df    Sig.     Exp(B)
  IQ          -.009     .015      .372      1    .542       .991
  age         -.591     .125    22.224      1    .000       .554
  weight      -.277     .117     5.630      1    .018       .758
  Constant    8.264    1.821    20.602      1    .000   3881.775
a. Variable(s) entered on step 1: IQ, age, weight.
Multinomial Logistic Regression
• A form of logistic regression that allows
prediction of probability into more than 2 groups
– Based on a multinomial distribution
• Sometimes called polytomous logistic regression
• Conducts an omnibus test first for each predictor across 3+ groups (like ANOVA)
  – Then conducts pairwise comparisons (like post hoc tests in ANOVA)
Objectives of Discriminant Analysis
• Determining whether significant differences exist
between average scores on a set of variables for
2+ a priori defined groups
• Determining which IVs account for most of the
differences in average score profiles for 2+
groups
• Establishing procedures for classifying objects
into groups based on scores on a set of IVs
• Establishing the number and composition of the
dimensions of discrimination between groups
formed from the set of IVs
Discriminant Analysis
• Discriminant analysis develops a linear
combination that can best separate groups.
• Opposite of MANOVA
• In MANOVA, groups are usually constructed
by researcher and have clear structure (e.g., a
2 x 2 factorial design). Groups = IVs
• In discriminant analysis, the groups usually
have no particular structure and their formation
is not under experimental control. Groups =
DVs
How Discrim Works
• Linear combinations (discriminant functions) are
formed that maximize the ratio of between-groups
variance to within-groups variance for a linear
combination of predictors.
• Total # discriminant functions = # groups – 1 OR
# of predictors (whichever is smaller)
• If more than one discriminant function is formed,
subsequent discriminant functions are
independent of prior combinations and account for
as much remaining group variation as possible.
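The lecture runs discriminant analysis in SPSS; the sketch below only illustrates the same idea (a linear combination maximizing between- over within-group variance, with # groups − 1 functions) using scikit-learn. The data are randomly generated placeholders with variable names borrowed from the upcoming example, not the lecture dataset.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 250
X = np.column_stack([
    rng.normal(1300, 100, n),   # GRE (V+Q)
    rng.normal(4.4, 2.3, n),    # number of publications
    rng.normal(6.0, 2.0, n),    # years to finish degree
])
y = rng.integers(0, 2, n)       # fake 0/1 group labels so the sketch runs

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
scores = lda.transform(X)       # discriminant function scores (# groups - 1 = 1 column)
pred = lda.predict(X)           # classification into groups
print(scores.shape, (pred == y).mean())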
Assumptions in Discrim
• Multivariate normality of IVs
  – Violation is more problematic when there is overlap between groups
• Homogeneity of VCV matrices
• Linear relationships
• IVs continuous (interval scale)
– Can accommodate nominal but violates MV normality
• Single categorical DV
Results influenced by:
• Outliers (classification may be wrong)
• Multicollinearity (interpretation of coefficients
difficult)
Sample Size Considerations
• Observations: # Predictors
– Suggested 20 observations per predictor
– Minimum required 5 observations per
predictor
• Observations: Groups (in DV)
– Minimum: smallest group size exceeds # of
IVs
– Practical Guide: Each group should have 20+
observations
– Wide variation in group size impacts results
(i.e., classification is incorrect)
Example
In this hypothetical example, data from 500
graduate students seeking jobs were examined.
Available for each student were three predictors:
GRE(V+Q), Years to Finish the Degree, and
Number of Publications. The outcome measure
was categorical: “Got a job” versus “Did not get a
job.” Half of the sample was used to determine
the best linear combination for discriminating the
job categories. The second half of the sample
was used for cross-validation.
DISCRIMINANT
/GROUPS=job(1 2)
/VARIABLES=gre pubs years
/SELECT=sample(1)
/ANALYSIS ALL
/SAVE=CLASS SCORES PROBS
/PRIORS SIZE
/STATISTICS=MEAN STDDEV UNIVF BOXM COEFF
RAW CORR COV GCOV TCOV TABLE CROSSVALID
/PLOT=COMBINED SEPARATE MAP
/PLOT=CASES
/CLASSIFY=NONMISSING POOLED .
Interpreting Output
• Box's M
• Eigenvalues
• Wilks' Lambda
• Discriminant Weights
• Discriminant Loadings
Group Statistics
JOB          Variable                    Mean      Std. Deviation    Valid N (listwise; unweighted = weighted)
Oops!        GRE (V+Q)                 1296.20         96.913              179
             Number of Publications       3.50          2.029              179
             Years to Finish Degree       6.47          2.094              179
Got One!     GRE (V+Q)                 1305.87        101.824               71
             Number of Publications       6.55          1.593               71
             Years to Finish Degree       4.85          1.179               71
Total        GRE (V+Q)                 1298.94         98.224              250
             Number of Publications       4.36          2.357              250
             Years to Finish Degree       6.01          2.016              250
Tests of Equality of Group Means
Variable                    Wilks' Lambda        F      df1    df2    Sig.
GRE (V+Q)                       .998            .492     1     248    .483
Number of Publications          .658         129.009     1     248    .000
Years to Finish Degree          .867          37.885     1     248    .000
Test Results
Box's M        49.679
F  Approx.      8.137
   df1              6
   df2       114277.8
   Sig.          .000
Tests the null hypothesis of equal population covariance matrices.
Violates Assumption of Homogeneity of VCV matrices. But
this test is sensitive in general and sensitive to violations
of multivariate normality too. Tests of significance in
discriminant analysis are robust to moderate violations of
the homogeneity assumption.
Eigenvalues
Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
1              .693a          100.0            100.0              .640
a. First 1 canonical discriminant functions were used in the analysis.

Wilks' Lambda
Test of Function(s)    Wilks' Lambda    Chi-square    df    Sig.
1                          .590           129.854      3    .000
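With a single discriminant function, the canonical correlation and Wilks' lambda in these two tables are simple functions of the eigenvalue. A quick check of the printed values using those standard identities (a sketch, not SPSS output):

import math

eigenvalue = 0.693
canonical_corr = math.sqrt(eigenvalue / (1 + eigenvalue))   # ~.640
wilks_lambda = 1 / (1 + eigenvalue)                         # ~.590
print(f"canonical correlation = {canonical_corr:.3f}, Wilks' lambda = {wilks_lambda:.3f}")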
Discriminant Weights
Standardized Canonical Discriminant Function Coefficients
                              Function 1
GRE (V+Q)                        -.308
Number of Publications            .944
Years to Finish Degree           -.423

Discriminant Loadings
Structure Matrix
                              Function 1
Number of Publications            .866
Years to Finish Degree           -.469
GRE (V+Q)                         .054
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.

Data from both of these outputs indicate that one of the predictors best discriminates who did/did not get a job. Which one is it?
Canonical Discriminant Function Coefficients
                              Function 1
GRE (V+Q)                        -.003
Number of Publications            .493
Years to Finish Degree           -.225
(Constant)                       3.268
Unstandardized coefficients
This is the raw canonical discriminant function.

Functions at Group Centroids
JOB              Function 1
Oops!               -.522
Got One!            1.317
Unstandardized canonical discriminant functions evaluated at group means
The means for the groups on the raw canonical discriminant function can be used to establish cut-off points for classification.
Prior Probabilities for Groups
JOB          Prior     Cases Used in Analysis (unweighted)
Oops!         .716              179
Got One!      .284               71
Total        1.000              250

Classification can be based on distance from the group centroids and can take into account information about the prior probability of group membership.
Classification Results (b, c, d)
                                           Predicted Group Membership
                              JOB          Oops!    Got One!    Total
Cases Selected
  Original         Count      Oops!         170         9        179
                              Got One!       23        48         71
                   %          Oops!         95.0       5.0      100.0
                              Got One!      32.4      67.6      100.0
  Cross-validateda Count      Oops!         169        10        179
                              Got One!       24        47         71
                   %          Oops!         94.4       5.6      100.0
                              Got One!      33.8      66.2      100.0
Cases Not Selected
  Original         Count      Oops!         175        10        185
                              Got One!       17        48         65
                   %          Oops!         94.6       5.4      100.0
                              Got One!      26.2      73.8      100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 87.2% of selected original grouped cases correctly classified.
c. 89.2% of unselected original grouped cases correctly classified.
d. 86.4% of selected cross-validated grouped cases correctly classified.
[Histogram of Canonical Discriminant Function 1 scores, JOB = Oops!: Mean = -.55, Std. Dev = 1.10, N = 364. Two modes?]
[Histogram of Canonical Discriminant Function 1 scores, JOB = Got One!: Mean = 1.30, Std. Dev = .62, N = 136.]
Violation of the homogeneity assumption can affect the
classification. To check, the analysis can be conducted
using separate group covariance matrices.
Classification Results (a, b) – separate group covariance matrices
                                           Predicted Group Membership
                              JOB          Oops!    Got One!    Total
Cases Selected
  Original         Count      Oops!         165        14        179
                              Got One!       21        50         71
                   %          Oops!         92.2       7.8      100.0
                              Got One!      29.6      70.4      100.0
Cases Not Selected
  Original         Count      Oops!         168        17        185
                              Got One!       11        54         65
                   %          Oops!         90.8       9.2      100.0
                              Got One!      16.9      83.1      100.0
a. 86.0% of selected original grouped cases correctly classified.
b. 88.8% of unselected original grouped cases correctly classified.
No noticeable change in the accuracy of classification.
Discriminant Analysis:
Three Groups
The group that did not get a job was actually composed of
two subgroups—those that got interviews but did not land
a job and those that were never interviewed. This accounts
for the bimodality in the discriminant function scores. The
discriminant analysis of the three groups allows for the
derivation of one more discriminant function, perhaps
indicating the characteristics that separate those who get
interviews from those who don’t, or, those who have
successful interviews from those whose interviews do not
produce a job offer.
Remember this?
[Histogram of Canonical Discriminant Function 1 scores, JOB = Oops! (repeated from above): Mean = -.55, Std. Dev = 1.10, N = 364. Two modes?]
DISCRIMINANT
/GROUPS=group(1 3)
/VARIABLES=gre pubs years
/SELECT=sample(1)
/ANALYSIS ALL
/SAVE=CLASS SCORES PROBS
/PRIORS SIZE
/STATISTICS=MEAN STDDEV UNIVF BOXM
COEFF RAW CORR COV GCOV TCOV TABLE
CROSSVALID
/PLOT=COMBINED SEPARATE MAP
/PLOT=CASES
/CLASSIFY=NONMISSING POOLED .
Group Statistics
GROUP           Variable                    Mean      Std. Deviation    Valid N (listwise; unweighted = weighted)
Unemployed      GRE (V+Q)                 1307.54         85.491               54
                Number of Publications       1.59          1.434               54
                Years to Finish Degree       8.57          1.797               54
Got a Job       GRE (V+Q)                 1305.87        101.824               71
                Number of Publications       6.55          1.593               71
                Years to Finish Degree       4.85          1.179               71
Interview Only  GRE (V+Q)                 1291.30        101.382              125
                Number of Publications       4.32          1.664              125
                Years to Finish Degree       5.56          1.467              125
Total           GRE (V+Q)                 1298.94         98.224              250
                Number of Publications       4.36          2.357              250
                Years to Finish Degree       6.01          2.016              250
Tests of Equality of Group Means
Variable                    Wilks' Lambda        F       df1    df2    Sig.
GRE (V+Q)                       .994            .761      2     247    .468
Number of Publications          .455         147.864      2     247    .000
Years to Finish Degree          .529         109.977      2     247    .000
Test Results
Box's M        21.796
F  Approx.      1.780
   df1             12
   df2       137372.4
   Sig.          .045
Tests the null hypothesis of equal population covariance matrices.
Separating the three groups produces better
homogeneity of VCV matrices.
Still significant, but just barely. Not enough to worry
about.
Eigenvalues
Function    Eigenvalue    % of Variance    Cumulative %    Canonical Correlation
1             5.353a           99.1             99.1              .918
2              .047a             .9            100.0              .211
a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda
Test of Function(s)    Wilks' Lambda    Chi-square    df    Sig.
1 through 2                .150           466.074      6    .000
2                          .955            11.246      2    .004
Two significant linear combinations can be derived, but
they are not of equal importance.
Weights
Standardized Canonical Discriminant Function Coefficients
                              Function 1    Function 2
GRE (V+Q)                         .734          .194
Number of Publications          -1.246          .521
Years to Finish Degree           1.032          .602

Loadings
Structure Matrix
                              Function 1    Function 2
Number of Publications           -.466          .867*
Years to Finish Degree            .401          .796*
GRE (V+Q)                         .008          .354*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function

What do the linear combinations mean now?
Canonical Discriminant Function Coefficients
                              Function 1    Function 2
GRE (V+Q)                         .007          .002
Number of Publications           -.781          .326
Years to Finish Degree            .701          .409
(Constant)                     -10.496        -6.445
Unstandardized coefficients

Functions at Group Centroids
GROUP             Function 1    Function 2
Unemployed           4.026          .162
Got a Job           -2.469          .251
Interview Only       -.337         -.213
Unstandardized canonical discriminant functions evaluated at group means
[Plot of the group centroids in DF1 (horizontal) by DF2 (vertical) space: got a job at the far left of DF1, interview only near the middle, unemployed at the far right.]
Functions at Group Centroids (repeated)
GROUP             Function 1    Function 2
Unemployed           4.026          .162
Got a Job           -2.469          .251
Interview Only       -.337         -.213
Unstandardized canonical discriminant functions evaluated at group means

[The same DF1 by DF2 centroid plot, shown alongside the weights and loadings:]

Weights                  DF1       DF2
No. Pubs              -1.246      .521
Yrs to finish          1.032      .602
GRE                     .734      .194

Loadings                 DF1       DF2
No. Pubs               -.466      .867
Yrs to finish           .401      .796
GRE                     .008      .354
This figure shows that discriminant function #1, which is made up of number of publications and years to finish, reliably differentiates between those who got jobs, had interviews only, and had no job or interview. Specifically, a high value on DF1 was associated with not getting a job, suggesting that having few publications (loading = -.466) and taking a long time to finish (loading = .401) was associated with not getting a job.
Prior Probabilities for Groups
GROUP            Prior     Cases Used in Analysis (unweighted)
Unemployed        .216              54
Got a Job         .284              71
Interview Only    .500             125
Total            1.000             250
Classification Function Coefficients
                              Unemployed    Got a Job    Interview Only
GRE (V+Q)                         .238          .190           .205
Number of Publications         -10.539        -5.440         -7.256
Years to Finish Degree          11.018         6.503          7.808
(Constant)                    -196.112      -123.212       -139.036
Fisher's linear discriminant functions
Territorial Map
[Territorial map of Canonical Discriminant Function 1 (horizontal axis, -6.0 to 6.0) by Canonical Discriminant Function 2 (vertical axis, -6.0 to 6.0). Boundary lines separate the regions assigned to the three groups; * marks each group centroid.]

Symbols used in the territorial map:
  1 = Unemployed
  2 = Got a Job
  3 = Interview Only
  * = group centroid
Canonical Discriminant Functions
[Scatterplot of cases on Function 1 (horizontal) by Function 2 (vertical), with markers for the Got a Job, Unemployed, and Interview Only groups and their group centroids.]
Classification
A classification function is derived for each group. The
original data are used to estimate a classification score for
each person, for each group. The person is then assigned
to the group that produces the largest classification score.
Classification Function Coefficients
                              Unemployed    Got a Job    Interview Only
GRE (V+Q)                         .238          .190           .205
Number of Publications         -10.539        -5.440         -7.256
Years to Finish Degree          11.018         6.503          7.808
(Constant)                    -196.112      -123.212       -139.036
Fisher's linear discriminant functions
Classification Results (b, c, d)
                                                   Predicted Group Membership
                              GROUP            Unemployed    Got a Job    Interview Only    Total
Cases Selected
  Original         Count      Unemployed            51            0              3            54
                              Got a Job              0           51             20            71
                              Interview Only         0           13            112           125
                   %          Unemployed           94.4           .0            5.6         100.0
                              Got a Job              .0         71.8           28.2         100.0
                              Interview Only         .0         10.4           89.6         100.0
  Cross-validateda Count      Unemployed            51            0              3            54
                              Got a Job              0           51             20            71
                              Interview Only         0           13            112           125
                   %          Unemployed           94.4           .0            5.6         100.0
                              Got a Job              .0         71.8           28.2         100.0
                              Interview Only         .0         10.4           89.6         100.0
Cases Not Selected
  Original         Count      Unemployed            62            0              4            66
                              Got a Job              0           47             18            65
                              Interview Only         4           11            104           119
                   %          Unemployed           93.9           .0            6.1         100.0
                              Got a Job              .0         72.3           27.7         100.0
                              Interview Only        3.4          9.2           87.4         100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 85.6% of selected original grouped cases correctly classified.
c. 85.2% of unselected original grouped cases correctly classified.
d. 85.6% of selected cross-validated grouped cases correctly classified.
Is the classification better than would be expected by chance? Observed values (rows = actual group, columns = predicted group):

Actual             Unemployed    Got a Job    Interview Only    All
Unemployed              51            0              3            54
Got a Job                0           51             20            71
Interview Only           0           13            112           125
All                     51           64            135           250
Expected classification by chance: E = (Row total x Column total) / Total N

Actual             Unemployed       Got a Job        Interview Only     All
Unemployed         (51x54)/250      (64x54)/250      (135x54)/250        54
Got a Job          (51x71)/250      (64x71)/250      (135x71)/250        71
Interview Only     (51x125)/250     (64x125)/250     (135x125)/250      125
All                     51               64               135           250
Correct classification that would occur by chance:

Actual             Unemployed    Got a Job    Interview Only    All
Unemployed           11.016        13.824         29.16           54
Got a Job            14.484        18.176         38.34           71
Interview Only       25.5          32             67.5           125
All                     51            64            135           250
The difference between chance-expected and actual classification can be tested with a chi-square as well:

$$\chi^2 = \sum \frac{(f_{observed} - f_{expected})^2}{f_{expected}}$$

= 145.13 + 13.82 + 23.47 + 14.48 + 59.25 + 8.77 + 25.5 + 11.28 + 29.34

Chi-square = 331.04

Where degrees of freedom = (# groups − 1)², so df = 4
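A minimal sketch that reproduces this chance-corrected check directly from the observed classification counts above.

# Observed classification counts (rows = actual group, columns = predicted group)
observed = [
    [51,  0,   3],   # actual: Unemployed
    [ 0, 51,  20],   # actual: Got a Job
    [ 0, 13, 112],   # actual: Interview Only
]
n = sum(sum(row) for row in observed)                    # 250
row_totals = [sum(row) for row in observed]              # 54, 71, 125
col_totals = [sum(col) for col in zip(*observed)]        # 51, 64, 135

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_obs in enumerate(row):
        f_exp = row_totals[i] * col_totals[j] / n        # E = (row x column) / N
        chi_sq += (f_obs - f_exp) ** 2 / f_exp
df = (len(observed) - 1) ** 2                            # (# groups - 1)^2 = 4
print(f"chi-square = {chi_sq:.2f}, df = {df}")           # ~331.04, df = 4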