Project Poster - Penn State Department of Statistics

Analysing the Retention in University Park by Logistic Regression
Benaglia, T.A., Hummel, R.M., Pietras, J., Altman, N.
Department of Statistics – Penn State University
Acknowledgment:
This project was developed in Stat 511 course – Fall 2004.
Thanks to the College of Science for providing the data.
Objective:
For students who enroll as freshmen in the College of Science, what factors most
significantly influence whether they stay at University Park or transfer to another
Penn State campus? The dependent variable is categorical, that is, students’ choice
of campus is categorized as (0) staying at University Park, or (1) leaving University
Park. The initial potential predictors are Ethnicity, High School GPA, SAT Math
Score, SAT Verbal Score, FTCAP Math 140, FTCAP Math 110, FTCAP Math 40,
and FTCAP BMath (where FTCAP scores were from University-wide freshmen
testing).
• CAMPUS is recorded either as (0) if the student chooses to stay at University Park
(UP) or as (1) if the student chooses to transfer to another Penn State campus.
• Ethnicity is categorized as:
2 - African/Black American
3 - Asian American
4 - Latino American
5 - White American
The categories of Native American (1) and Other (6) were not included in the study
because there were not enough Native American students to make an appropriate
analysis and the category of Other would have no meaning when reporting our
analysis based on ethnicity.
• High School GPA (HSGPA) is the GPA reported by the incoming student’s
graduating high school and is measured on a 4.0 scale.
• SAT Math Score (SATMATH) is the SAT Math Score reported by ETS to the
University, which is measured on an 800 point scale.
• SAT Verbal Score (SATVERB) is the SAT Verbal Score reported by ETS to the
University, which is measured on an 800 point scale.
• FTCAP Math 140 (FTCAPMATH140) is the freshmen testing score for Math 140.
• FTCAP Math 110 (FTCAPMATH110) is the freshmen testing score for Math 110.
• FTCAP Math 40 (FTCAPMATH40) is the freshmen testing score for Math 40.
• FTCAP BMath (FTCAPBMATH) is the freshmen testing score for B Math.
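The ethnicity coding above can be turned into the indicator variables AFRICAN, ASIAN, and HISPANIC named in the results. The following is a minimal sketch, assuming White American (code 5) serves as the reference category, which the poster implies but does not state; the function name `dummy_code` and the dictionary layout are hypothetical.

```python
# Ethnicity codes from the data description; codes 1 (Native American)
# and 6 (Other) were excluded from the study.
ETHNICITY_CODES = {2: "AFRICAN", 3: "ASIAN", 4: "HISPANIC", 5: "WHITE"}

# Assumption: White American is the reference level, since the final
# model lists only AFRICAN, ASIAN, and HISPANIC as indicators.
REFERENCE = "WHITE"


def dummy_code(code):
    """Return 0/1 indicators for one student, or None if the code is excluded."""
    if code not in ETHNICITY_CODES:
        return None
    label = ETHNICITY_CODES[code]
    return {
        name: int(name == label)
        for name in ETHNICITY_CODES.values()
        if name != REFERENCE
    }


if __name__ == "__main__":
    for code in (1, 2, 3, 4, 5, 6):
        print(code, dummy_code(code))
```

Under this coding, a student in the reference category has all three indicators equal to 0, so the model intercept describes that group.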
From the figure above, notice that as FTCAPMATH110 scores increase:
• The probability for African American students to stay at UP rises quickly and then
approaches 1 asymptotically.
• The probability for Asian students to stay at UP decreases.
• The probability for Hispanic students to stay at UP decreases.
• The probability for Caucasian students to stay at UP increases fairly steadily. This
probability does not change with the interaction except by the intercept.
Figure 1: Estimated Probabilities by African, Asian and Hispanic
In the boxplots above, note that African American students have a higher probability,
on average, of staying at UP, compared to non-African American students. Asian
students, on the other hand, have a lower probability, on average, of staying at UP,
compared to non-Asian students. Hispanic students, like African American students,
have a higher probability, on average, of staying at UP, compared to non-Hispanic
students.
This unusual plot and the significance of only the ASIAN*FTCAPMATH110
interaction suggest that the relationship may differ between Asian and non-Asian
students. Re-categorizing the observations as Asian or non-Asian (rather than
African-American, Hispanic, Asian, and Caucasian) and fitting a logistic regression
on this new variable yields the following plot. (Only FTCAPMATH110 is used
because it was found earlier to be the only significant non-ethnicity predictor.)
Figure 4: Estimated Probabilities versus FTCAPMATH110 by Asian
Figure 2: Estimated Probabilities versus FTCAPMATH110 by Ethnicity
Data Analysis
Results.
After backward stepwise selection, the final model includes the explanatory
variables AFRICAN, ASIAN, HISPANIC, and FTCAPMATH110.
Figure 3: Estimated Probabilities versus FTCAPMATH110 with interaction
Methods.
Since the dependent variable (CAMPUS) is categorical (dichotomous), logistic
regression is appropriate: it models the probability that a student will stay at
UP as a function of the possible explanatory variables. Following our research
question, the first step was to apply variable selection methods to determine
which predictors were significant in explaining whether or not students who transfer
their majors from the College of Science will stay at the UP campus.
Data Description:
The data were compiled by the College of Science Dean’s Office from all incoming
freshmen enrolling in the College of Science during the Fall 2003 semester. Students
with missing information were not considered in this study. Any results obtained in
this study will be applicable to any incoming freshmen who first report a major in the
College of Science and subsequently transfer their major to another Penn State
college. The data were specified in the following way:
The figure above shows a very distinct separation by ethnicity in the probability
trends for whether or not a student stays at UP based on their FTCAPMATH110
score.
It is possible that there may be a relationship between combinations of variables,
which is described using interaction terms. For example, Asian students with high
FTCAPMATH110 scores may be under significant pressure to stay at the UP
campus.
In this model, however, only FTCAPMATH110 is significant, so the next step is to
fit the regression of CAMPUS on AFRICAN, ASIAN, HISPANIC, FTCAPMATH110, and
the interaction terms AFRICAN*FTCAPMATH110, ASIAN*FTCAPMATH110, and
HISPANIC*FTCAPMATH110.
The only interaction term that is significant is ASIAN*FTCAPMATH110. (The
interaction terms with AFRICAN and HISPANIC are not significant.)
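For reference, the interaction model described here can be written out explicitly. The notation is an assumption for illustration only: the poster reports no coefficients, p denotes the probability of staying at UP, and the beta symbols are placeholders for the fitted values.

```latex
\[
\operatorname{logit}(p) = \log\frac{p}{1-p}
  = \beta_0
  + \beta_1\,\mathrm{AFRICAN}
  + \beta_2\,\mathrm{ASIAN}
  + \beta_3\,\mathrm{HISPANIC}
  + \beta_4\,x
  + \beta_5\,(\mathrm{AFRICAN}\cdot x)
  + \beta_6\,(\mathrm{ASIAN}\cdot x)
  + \beta_7\,(\mathrm{HISPANIC}\cdot x),
\qquad x = \mathrm{FTCAPMATH110}.
\]
```

In this form the slope in x on the logit scale is \(\beta_4\) for the reference group and \(\beta_4 + \beta_6\) for Asian students, so a significant ASIAN*FTCAPMATH110 term means the two groups respond differently to the test score.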
In this plot, the probability of staying at University Park based on FTCAPMATH110
scores differs significantly between Asian and non-Asian students. For Asian
students, as FTCAPMATH110 scores increase, the probability of staying at UP
decreases; for non-Asian students, as FTCAPMATH110 scores increase, the
probability of staying at UP increases.
Conclusions:
The only significant predictors of whether or not a student will stay at UP are the
student’s FTCAPMATH110 score and whether the student is Asian or non-Asian.
If the student is Asian, then, as FTCAPMATH110 scores increase, the student’s
probability of staying at UP decreases dramatically. If the student is non-Asian, then,
as FTCAPMATH110 scores increase, the student’s probability of staying at UP
increases almost as dramatically. The probability of an Asian student staying at UP
given a very poor grade (0) on the FTCAPMATH110 is nearly 1. This probability
sinks to about .64 as the FTCAPMATH110 scores rise to 26 (a perfect score). For a
non-Asian student, a low FTCAPMATH110 score (0) yields a probability of
approximately .66 that the student will stay at UP. This increases to about .91 for a
perfect score. The two groups have the same predicted probability at an
FTCAPMATH110 score of approximately 18.5.
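The reported probabilities can be checked against one another with a short back-of-the-envelope calculation. This is a sketch, not the authors' fitted model: it assumes each group follows a straight line on the logit scale, takes the non-Asian endpoints (0.66 at score 0, 0.91 at score 26), the Asian value at a perfect score (0.64), and the crossing point (18.5) from the text, and asks what Asian probability at score 0 those figures imply. All variable names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))


MAX_SCORE = 26      # perfect FTCAPMATH110 score, as stated in the conclusions
CROSS_SCORE = 18.5  # score at which the two groups are reported to coincide

# Non-Asian line on the logit scale, from the two reported endpoints.
b0_non = logit(0.66)
b1_non = (logit(0.91) - b0_non) / MAX_SCORE

# Asian line: pass through the crossing point and the reported 0.64 at score 26.
z_cross = b0_non + b1_non * CROSS_SCORE
b1_asian = (logit(0.64) - z_cross) / (MAX_SCORE - CROSS_SCORE)
b0_asian = z_cross - b1_asian * CROSS_SCORE

# Implied probability of staying for an Asian student with a score of 0.
p0_asian = inv_logit(b0_asian)

if __name__ == "__main__":
    print("non-Asian slope (logit scale):", round(b1_non, 4))
    print("Asian slope (logit scale):", round(b1_asian, 4))
    print("implied Asian probability at score 0:", round(p0_asian, 3))
```

Under these assumptions the implied Asian probability at a score of 0 comes out above 0.98, which agrees with the "nearly 1" stated in the conclusions, and the Asian slope is negative while the non-Asian slope is positive.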