Predictive Analytics Journey SPSS Modeler “Stream”
Predictive Analytics Proof of Concept (POC)
September 2014
Additional Information on the Sac State Predictive Analytics POC
EDUCAUSE 2014 Poster Session
Proof of Concept (POC) Objectives
1. Provide predictive insights for a university-wide strategic
issue/program (e.g. student success and student retention)
2. Demonstrate the capability of predictive analytics for broader
application
3. Develop expertise with IBM SPSS Modeler in partnership with the
vendor and key campus leaders
4. Identify gaps in the data and next steps for architecting and
deploying a predictive analytics solution
Predictive Analytics Journey
Indicators and Milestones:*
1. First semester grade point average
2. Second semester grade point average
3. Enroll full time
4. Earn summer credits
5. Complete a college success course or first-year experience program
* Subset of Indicators and Milestones identified by the Institute for Higher Education Leadership
and Policy (IHELP): “Student Flow Analysis: CSU Student Progress Toward Graduation”
SPSS Modeler “Stream”
Using Factors from Published Study
Inside the “Super Node”
Additional Data Prep
“Auto Prep” Option in SPSS Modeler
Choose Speed, Accuracy, or Manual
How Good was the POC Model?
SPSS Modeler
Predictions for Each Student in Cohort
Predictive Analytics POC
Lessons Learned
1. Learned how to use IBM SPSS Modeler
2. Data Prep is key – and time-consuming!
3. Consider moving some of the Data Prep to the ETL layer (i.e. model the data so it can easily be
used as input for analytics)
4. You must “know your data”
5. You must be familiar with statistical methods to prep the data properly and to understand the
results
6. Optimal Predictive Analytics Project Team: Data Modeler, BI Analyst, Subject Matter Expert from
functional area, and Data Scientist
7. Correlation vs. Cause
8. The output may be one step in developing advising programs, identifying advising cohorts, or
advising individuals; however, caution should be taken in directly advising a student based on one
predictive model looking 5 years out
9. Predictive analytics is an on-going, iterative process
10. There is an opportunity to write the predicted outcomes to the data warehouse and use them to
track the usefulness of the model and to create dashboards to track the success of resulting
programs
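The last lesson, storing predictions and later checking them against what actually happened, can be illustrated with a minimal Python sketch. It is not part of the POC: the row layout (`student_id`, `predicted`, `actual`) and the sample values are hypothetical stand-ins for whatever the data warehouse table would hold.

```python
from collections import Counter

def confusion_counts(rows):
    """Tally (predicted, actual) pairs, skipping students whose outcome is not yet known."""
    counts = Counter()
    for row in rows:
        if row["actual"] is None:  # outcome still pending, cannot be scored yet
            continue
        counts[(row["predicted"], row["actual"])] += 1
    return counts

# Hypothetical stored predictions (True = predicted/actually graduated).
stored = [
    {"student_id": 1, "predicted": True, "actual": True},
    {"student_id": 2, "predicted": True, "actual": False},
    {"student_id": 3, "predicted": False, "actual": False},
    {"student_id": 4, "predicted": False, "actual": True},
    {"student_id": 5, "predicted": True, "actual": None},
]

counts = confusion_counts(stored)
scored = sum(counts.values())
# Share of scored students whose prediction matched the actual outcome.
hit_rate = (counts[(True, True)] + counts[(False, False)]) / scored
print(f"scored={scored}, hit rate={hit_rate:.2f}")
```

Counts like these could feed a dashboard that shows, cohort by cohort, how often the model's predictions held up.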
Additional POC Work
1. In addition to focusing on the IHELP indicators and milestones,
several models using a broader set of data from the data
warehouse were developed
2. Experimented with different cohort years and different targets
3. Used IBM SPSS Modeler to develop a basic POC Faculty
Retention Model
4. Used IBM SPSS Modeler for descriptive analytics for AD ASTRA
event and scheduling data
Predictive Analytics
Next Steps
1. Link to campus strategic plan, identify an opportunity for predictive analytics to
contribute to its success, and build target models with the “optimal team” as described
previously (tight collaboration with campus functional areas)
2. Continue to develop models focused on student success, but explore other areas such
as university advancement, scheduling, etc.
3. Move from using flat file extracts to connecting IBM SPSS Modeler directly to the data
warehouse
4. Develop data models and ETLs to better prep the data for predictive analytics and data
mining
5. Identify missing data or data gaps and close the gaps if possible with the data that we
have
6. Partition data to develop the model on a subset of data and then test its predictive
power on the remaining set
7. Continue to learn and build expertise with SPSS Modeler and its capabilities as well as
continue to build expertise in statistical methods in general
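The partitioning step in the list above (develop the model on one subset, test it on the rest) can be sketched in plain Python. This is an illustration only, not the POC's SPSS Modeler stream: the cohort is synthetic, the field names `first_sem_gpa` and `graduated` are hypothetical, and the "model" is a deliberately simple GPA cutoff chosen on the training partition.

```python
import random

def partition(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, rows):
    """Share of rows where 'GPA >= threshold' matches the known outcome."""
    hits = sum(1 for r in rows if (r["first_sem_gpa"] >= threshold) == r["graduated"])
    return hits / len(rows)

# Synthetic cohort for illustration only; real data would come from the warehouse.
cohort = [
    {"student_id": i, "first_sem_gpa": 2.0 + (i % 5) * 0.5, "graduated": (i % 5) >= 2}
    for i in range(100)
]

train, test = partition(cohort)

# "Fit": pick the GPA cutoff that scores best on the training partition only.
candidates = sorted({r["first_sem_gpa"] for r in train})
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

# "Test": measure predictive power on rows the model never saw.
test_accuracy = accuracy(best_threshold, test)
print(f"cutoff={best_threshold}, held-out accuracy={test_accuracy:.2f}")
```

Scoring on the held-out rows gives a more honest estimate of predictive power than scoring on the rows used to build the model.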