A way to integrate IR and Academic activities to
enhance institutional effectiveness.
Introduction
The University of Alabama (State of Alabama, USA) was one of the first
universities to use data mining not only for retention but throughout the
enrollment cycle, including intervention and recruitment. For example, high
school students most likely to attend the university and freshmen at risk of
dropping out are identified. Of particular interest is that this was a joint
effort between data mining students in the Department of Statistics and the
Enrollment Office.
This led to the idea of establishing cooperation between the Planning Unit and
the Department of Mathematical Statistics and Actuarial Science at the UFS. In the
Statistics department a post graduate course in Data Mining is presented using
SAS Enterprise Miner. Because this course has a practical component
that constitutes at least 40% of the final mark, it creates the opportunity to
involve the students in Institutional Research activities. A total of twenty projects
were identified and one assigned to each of the thirty students enrolled for this
course. The data used for these projects come from the student database of the
UFS.
The Post Graduate Course
Currently only an introductory data mining course is presented. For optimal
results it will be necessary to introduce a further, more advanced data mining
course.
Course 1:
Introduction to Data Mining.
In this course SAS Enterprise Miner and the use of predictive models are
introduced. A broad overview of the modeling techniques of Logistic Regression,
Decision Trees, and Neural Networks is provided. The concepts of data
partitioning, model assessment using lift charts and ROC curves, and model
implementation are presented. The project is part of this course.
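The course itself uses SAS Enterprise Miner, so the following is only an illustrative sketch in plain Python of three of the concepts named above: data partitioning, the lift at a given depth of a ranked list, and the area under the ROC curve. The function names, the 70/30 split, and the toy scores are assumptions made for illustration and are not taken from the course material.

```python
import random


def partition(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def lift_at(scores, targets, fraction=0.1):
    """Lift in the top `fraction` of cases ranked by score:
    (success rate in the top group) / (overall success rate)."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(t for _, t in ranked[:top_n]) / top_n
    overall_rate = sum(targets) / len(targets)
    return top_rate / overall_rate


def roc_auc(scores, targets):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for s, t in zip(scores, targets) if t == 1]
    negatives = [s for s, t in zip(scores, targets) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy model scores and observed outcomes (hypothetical values).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
targets = [1, 1, 0, 1, 0, 0, 1, 0]

train, valid = partition(range(10))
print(len(train), len(valid))
print(lift_at(scores, targets, fraction=0.25))
print(roc_auc(scores, targets))
```

A lift above 1 in the top group means the model ranks successful students ahead of what random selection would give; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.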
Course 2:
Advanced Data Mining.
This should provide a more in-depth coverage of the technical aspects of each of
the modeling tools discussed in the first course. Topics in Statistical Decision
Theory and unsupervised learning can also be included.
The dataset and the variables
X1  ID  Identification number of student
X2  RACE  Race of student
X3  GENDER  Gender of student
X4  CAMPUSY1  Campus of registration for year one
X5  FACULTYY1  Faculty for year one
X6  MINYEARSTOGRAD  Minimum years to obtain qualification registered for
X7  EXTENDEDPY1  Is the qualification registered for an extended program? (Y, N, NOTAV)
X8  AGEY1  Age of the student when registering for year one
X9  H_LANGUAGE  The home language of the student
X10  M_COUNT  The M-count obtained by the student
X11  NUMCREDITS1Y1  The total number of credits registered for in the first semester of year one
X12  PROPCREDITSPASSED1Y1  The proportion of credits passed in the first semester of year one
X13  NUMCREDITSY1  The total number of credits registered for in year one
X14  PROPCREDITSPASSEDY1  The proportion of credits passed in year one
Y  TARGET  The binary dependent variable, which is 1 to indicate success and 0 to indicate failure
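As an illustration only, one record of this dataset could be represented as follows. The field names come from the variable list above; the Python types and the validation checks are assumptions, since the slides do not specify how the values are stored.

```python
from dataclasses import dataclass

# Allowed codes for EXTENDEDPY1, as listed in the variable description.
EXTENDED_VALUES = {"Y", "N", "NOTAV"}


@dataclass
class StudentRecord:
    """One student record. Types are assumed, not taken from the slides."""
    ID: str
    RACE: str
    GENDER: str
    CAMPUSY1: str
    FACULTYY1: str
    MINYEARSTOGRAD: int
    EXTENDEDPY1: str
    AGEY1: int
    H_LANGUAGE: str
    M_COUNT: float
    NUMCREDITS1Y1: float
    PROPCREDITSPASSED1Y1: float
    NUMCREDITSY1: float
    PROPCREDITSPASSEDY1: float
    TARGET: int


def validation_errors(record):
    """Return a list of messages for values outside their documented range."""
    errors = []
    if record.EXTENDEDPY1 not in EXTENDED_VALUES:
        errors.append("EXTENDEDPY1 must be Y, N or NOTAV")
    for name in ("PROPCREDITSPASSED1Y1", "PROPCREDITSPASSEDY1"):
        value = getattr(record, name)
        if not 0.0 <= value <= 1.0:
            errors.append(name + " must be a proportion between 0 and 1")
    if record.TARGET not in (0, 1):
        errors.append("TARGET must be 0 or 1")
    return errors


# A made-up record for illustration; none of these values are real data.
example_values = dict(
    ID="S0001", RACE="A", GENDER="F", CAMPUSY1="Main", FACULTYY1="Humanities",
    MINYEARSTOGRAD=3, EXTENDEDPY1="N", AGEY1=19, H_LANGUAGE="English",
    M_COUNT=36.0, NUMCREDITS1Y1=64.0, PROPCREDITSPASSED1Y1=0.75,
    NUMCREDITSY1=128.0, PROPCREDITSPASSEDY1=0.8, TARGET=1,
)
example = StudentRecord(**example_values)
print(validation_errors(example))
```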
The projects
The following projects were identified:
1. Build predictive models using decision trees, regression, and neural networks
to identify successful students in Faculty A at the end of the first year of
study.
Definitions:
Success is defined as the event that a student in Faculty A completes the
qualification registered for in year one in the minimum time.
Failure is defined as the event that a student in Faculty A fails to complete
the qualification registered for in year one in the minimum time.
Variables included in dataset:
ID, RACE, GENDER, CAMPUSY1, MINYEARSTOGRAD, EXTENDEDPY1,
AGEY1, H_LANGUAGE, M_COUNT, NUMCREDITSY1,
PROPCREDITSPASSEDY1, TARGET
Preliminary results for Project 1:
Faculty of Natural and Agricultural Sciences
Faculty of Economic and Management Sciences
Faculty of the Humanities
Faculties can now be compared.
Consider the rule that leads to the highest probability for success in each of
the three faculties:
Rule for Faculty of Natural and Agricultural Sciences:
If PROPCREDITSPASSEDY1 > 0.89 and NUMCREDITSY1 > 142,
then P(Success) = 0.41
Rule for Faculty of Economic and Management Sciences:
If PROPCREDITSPASSEDY1 > 0.81, NUMCREDITSY1 > 116.5, and
M_COUNT > 40.5, then P(Success) = 0.59
Rule for Faculty of the Humanities:
If M_COUNT > 34.5 and PROPCREDITSPASSEDY1 > 0.74,
then P(Success) = 0.58
2. Build predictive models using decision trees, regression, and neural networks to
identify students likely to drop out from Faculty A at the end of year one.
Definitions:
Dropout is defined as the event that a student, who did not graduate at the end
of year one, is not registered at the beginning of year two for any qualification in
Faculty A. Only data available at the end of the first semester should be used.
No-Dropout is defined as the event that a student is still registered for a
qualification in Faculty A (not necessarily the same qualification as in year one).
Variables included in dataset:
ID, RACE, GENDER, CAMPUSY1, MINYEARSTOGRAD, AGEY1,
H_LANGUAGE, M_COUNT, NUMCREDITS1Y1,
PROPCREDITSPASSED1Y1, TARGET
3. Build predictive models using decision trees, regression, and neural networks
to identify students in Faculty A likely to pass more than 90% of the
courses registered for in year one. Only data available at the time of
registration should be used.
4. Build predictive models using decision trees, regression, and neural networks
to identify students in Faculty A likely to pass less than 20% of the
courses registered for in year one. Only data available at the time of
registration should be used.
Variables included in dataset:
ID, RACE, GENDER, CAMPUSY1, MINYEARSTOGRAD, AGEY1,
H_LANGUAGE, M_COUNT, TARGET
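The target definitions for Projects 2 to 4 can be sketched as simple labeling functions. This is an illustration under stated assumptions: the inputs `graduated_y1` and `registered_in_faculty_y2` are hypothetical flags not present in the variable list, the event of interest is coded as 1 in each project, and the proportion of credits passed in year one (PROPCREDITSPASSEDY1) stands in for the proportion of courses passed.

```python
def dropout_target(graduated_y1, registered_in_faculty_y2):
    """Project 2: 1 if the student did not graduate at the end of year one
    and is not registered in Faculty A at the start of year two, else 0.
    Both arguments are hypothetical boolean flags."""
    if not graduated_y1 and not registered_in_faculty_y2:
        return 1
    return 0


def high_pass_target(prop_passed_y1):
    """Project 3: 1 if more than 90% was passed in year one, else 0."""
    return 1 if prop_passed_y1 > 0.90 else 0


def low_pass_target(prop_passed_y1):
    """Project 4: 1 if less than 20% was passed in year one, else 0."""
    return 1 if prop_passed_y1 < 0.20 else 0


print(dropout_target(graduated_y1=False, registered_in_faculty_y2=False))
print(high_pass_target(0.95), low_pass_target(0.95))
```

Note that the targets are built from year-one outcomes, while the predictors for Projects 3 and 4 are restricted to what is known at registration; the outcome variables must therefore not appear among the model inputs.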
“Faculty A” can be:
A. Humanities
B. Education
C. Natural and Agricultural Sciences
D. Business and Management Sciences
E. All faculties
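To make the faculty comparison concrete, the three highest-probability rules quoted under the preliminary results for Project 1 can be written as a small lookup. This is only a sketch: the thresholds and probabilities are those quoted on the slides, while the dictionary layout, the function name, and the sample record are assumptions. Only the single best rule per faculty is given in the slides, so the function returns None when that rule does not apply.

```python
# Each entry: (condition on a record, P(Success) quoted for that rule).
# NUMCREDITSY1 is spelled as in the variable list.
RULES = {
    "Natural and Agricultural Sciences": (
        lambda r: r["PROPCREDITSPASSEDY1"] > 0.89 and r["NUMCREDITSY1"] > 142,
        0.41,
    ),
    "Economic and Management Sciences": (
        lambda r: r["PROPCREDITSPASSEDY1"] > 0.81
        and r["NUMCREDITSY1"] > 116.5
        and r["M_COUNT"] > 40.5,
        0.59,
    ),
    "Humanities": (
        lambda r: r["M_COUNT"] > 34.5 and r["PROPCREDITSPASSEDY1"] > 0.74,
        0.58,
    ),
}


def best_rule_probability(faculty, record):
    """Return the quoted P(Success) if the faculty's best rule applies to
    the record, otherwise None (the other rules are not in the slides)."""
    condition, probability = RULES[faculty]
    return probability if condition(record) else None


# Hypothetical student record for illustration.
student = {"PROPCREDITSPASSEDY1": 0.95, "NUMCREDITSY1": 150, "M_COUNT": 42}
for faculty in RULES:
    print(faculty, best_rule_probability(faculty, student))
```

Written this way, the comparison is easy to read off: M_COUNT appears in the rules for two faculties but not for Natural and Agricultural Sciences, whose best rule also carries the lowest probability of success.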
The advantages
1. Students are exposed to real life data sets.
2. Students are introduced to an aspect of Institutional Research and will gain
insight into the challenges universities face.
3. Recent computing advances have created an increased demand for Business
Intelligence (BI) professionals. The courses are designed to educate students
to meet the marketplace demand.
4. Cooperation and understanding between support services and academics are
promoted.
5. Because several projects run simultaneously, the university is provided with
BI on a large scale to support strategic decision making.
6. Projects can be updated every year to accommodate new enrollments.
7. Possible changes over time in predictive models can be investigated.
8. The projects will enable comparison between faculties. It will, for example, be
possible to compare the indicators for a dropout in Faculty A with those of
Faculty B.
The challenges
1. Well-equipped computer laboratories should be available for student use.
2. Students with an insufficient level of computer literacy should be prevented from
entering the course.
3. A data warehouse should be in place and properly maintained for reliable results.
4. The student projects should be closely monitored and supervised.
Thank you