Searching for Patterns: Sean Early PSLC Summer School 2007

Searching for Patterns:
Sean Early
PSLC Summer School 2007
•Question: Which is a better predictor of
performance in a cognitive tutor, error rate or
assistance score?
•Method: Logistic Regression using LFA
preliminary analysis to determine assistance
score
•Outcome: Deeply aggregated data shows that
both are good predictors (p<.001); Complete
case analysis incomplete due to hurdles [data
scrubbing, software, insufficient memory]
Searching for Patterns
Sean Early
University of Southern California
PSLC Summer School 2007
Data Mining Track
June 22, 2007
Background
• ASSISTments data set from the 2004-2005
school year (Heffernan)
• N=179
• MCAS 39 step table
• Data covers one year of 8th grade math
online tutoring system results
• Question:
Which is a better predictor of
performance in a cognitive tutor,
error rate or assistance score?
Method
• Logistic Regression using LFA preliminary
analysis to determine assistance score
LFA represents an extension of Rasch
modeling such that the log-odds of a
correct response are the sum of terms for
the underlying ability of the student, the
difficulty of the knowledge component, and
the number of opportunities that the
student has had to respond correctly to
that knowledge component
Model
The Learning Factors Algorithm can be
represented as:
\[ \ln\left(\frac{p}{1-p}\right) = \alpha_i + \beta_j + \gamma_j\,(t-1) \]
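To make the logit form above concrete, here is a minimal Python sketch that turns the three terms into a predicted probability. It assumes the opportunity term is the gamma coefficient multiplied by the number of prior opportunities; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def lfa_probability(alpha_i, beta_j, gamma_j, prior_opportunities):
    """Probability of a correct response, assuming
    logit(p) = alpha_i + beta_j + gamma_j * prior_opportunities."""
    logit = alpha_i + beta_j + gamma_j * prior_opportunities
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical values: a student slightly below average (alpha),
# one knowledge component (beta), and a positive learning term (gamma).
for t in range(1, 6):
    p = lfa_probability(alpha_i=-0.5, beta_j=0.2, gamma_j=0.3,
                        prior_opportunities=t - 1)
    print(f"opportunity {t}: p(correct) = {p:.3f}")
```

With a positive gamma, the predicted probability rises with each additional opportunity, which is the growth the model is meant to capture.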
Results
• Preliminary results using standard logistic
regression showed no significant
difference in the predictive power of errors
and requests for assistance
• These variables were significantly
correlated with each other and with the
posttest, such that the fewer errors or
hints, the better the final performance
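A comparison of this kind can be sketched in plain Python. The example below fits a one-predictor logistic regression by gradient ascent, once for error rate and once for hint requests, and reports each model's log-likelihood. Everything here is an assumption for illustration: the data are made up (they are not the ASSISTments data), the variable names are hypothetical, and comparing log-likelihoods is only a rough stand-in for a proper significance test.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def standardize(xs):
    """Rescale to mean 0 and unit variance so one step size suits both predictors."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def log_likelihood(xs, ys, b0, b1):
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Made-up toy data: one row per student.
error_rate = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
hint_count = [1, 0, 3, 2, 5, 4, 7, 6]
passed = [1, 1, 1, 0, 1, 0, 0, 0]

results = {}
for name, xs in [("error_rate", error_rate), ("hint_count", hint_count)]:
    z = standardize(xs)
    b0, b1 = fit_logistic(z, passed)
    results[name] = (b1, log_likelihood(z, passed, b0, b1))
    print(name, "slope:", round(b1, 3), "log-likelihood:", round(results[name][1], 3))
```

A negative slope means that more errors (or more hints) go with a lower chance of a good outcome; similar log-likelihoods would suggest the two predictors do about equally well on that data.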
Unanswered Questions
• The simple logistic regression model
leaves much to be desired. It fails to
account for the alpha term (the student’s
initial skill level) or for the rate of
growth across time. Both gaps can be
addressed through the LFA model: the alpha
term acts as an intercept, while the gamma
term captures growth across successive
opportunities to learn.
Data Mining Hurdles I Didn’t See Coming:
or, What I Wish I Knew Then
• Data scrubbing (deleting repeated cases due to
multiple knowledge components in one problem)
needs to happen before creating aggregate data
sets
• Excel is more powerful than I thought (Pivot
Tables may indeed be the best thing MS has
given us)
• 65,000 cases of information applied to a logistic
regression algorithm makes my computer seize
up with a case of the “not todays”
• Codebooks are our friends
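The data-scrubbing point can be illustrated with a small Python sketch: when a step is tagged with several knowledge components, it shows up once per component, and aggregating before removing those repeats inflates the counts. The row layout, field order, and names below are hypothetical, and the rows are made up.

```python
from collections import defaultdict

# Hypothetical rows: (student, problem, step, knowledge_component, error)
rows = [
    ("s1", "p1", 1, "kc_fractions", 1),
    ("s1", "p1", 1, "kc_area", 1),  # same step, repeated for a second KC
    ("s1", "p1", 2, "kc_area", 0),
    ("s2", "p1", 1, "kc_fractions", 0),
    ("s2", "p1", 1, "kc_area", 0),  # same step, repeated for a second KC
]

def scrub(rows):
    """Keep one row per (student, problem, step), dropping per-KC repeats."""
    seen = set()
    unique = []
    for student, problem, step, _kc, error in rows:
        key = (student, problem, step)
        if key not in seen:
            seen.add(key)
            unique.append((student, problem, step, error))
    return unique

def error_rate_by_student(step_rows):
    """Aggregate (student, problem, step, error) rows into an error rate per student."""
    totals = defaultdict(lambda: [0, 0])  # student -> [errors, steps]
    for student, _problem, _step, error in step_rows:
        totals[student][0] += error
        totals[student][1] += 1
    return {s: errors / steps for s, (errors, steps) in totals.items()}

# Aggregating without scrubbing counts the repeated step twice.
naive = error_rate_by_student([(s, p, st, e) for s, p, st, _kc, e in rows])
clean = error_rate_by_student(scrub(rows))
print("naive:", naive)
print("clean:", clean)
```

In this toy case the repeated step makes student s1 look worse in the unscrubbed aggregate than in the scrubbed one, which is why the scrubbing has to come first.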
Final Thoughts
• For those of us who believe in the mastery vs.
performance orientation framework, please take
my lack of a more concrete product to share as
evidence of my mastery orientation. I feel as
though I learned a great deal this week. I also
feel that with about 10 more hours of work and a
more powerful machine at my side, I just may be
able to answer my initial question in a way that
captures some of the more interesting subtleties
in the data set.
