Educational Data Mining Overview
Educational Data Mining
and DataShop
John Stamper
Carnegie Mellon University
9/12/2012
PSLC Corporate Partner Meeting 2012
The Classroom of the Future
Which picture represents the
“Classroom of the Future”?
The Classroom of the Future
The answer is both!
Depends on how much money you have...
… but maybe not what you think…
The Classroom of the Future
Rich vs. Poor
– Poor kids will be forced to rely on “cheap” technology
– Rich kids will have access to “expensive” teachers
We are seeing this today!
– Waldorf school in Silicon Valley – no technology
– NGLC Wave III Grants
– MOOCs (AI Course at Stanford)
– Growth of adaptive technology companies
– Online instruction
– … and more…
What does this mean?
My view is that we cannot stop this; I believe we
must accept that economics will force this route.
We should focus on improving learning technology
• New ways to improve teacher-student access
• Add more adaptive features to learning software
Intelligent Tutors, at scale, using data!
Educational Data Mining
• “Educational Data Mining is an emerging
discipline, concerned with developing
methods for exploring the unique types of
data that come from educational settings, and
using those methods to better understand
students, and the settings which they learn
in.”
– www.educationaldatamining.org
Classes of EDM Methods
(Baker & Yacef, 2009)
• Prediction
• Clustering
• Relationship Mining
• Discovery with Models
• Distillation of Data for Human Judgment
Prediction
• Develop a model which can infer a single
aspect of the data (predicted variable) from
some combination of other aspects of the
data (predictor variables)
• Does a student know a skill?
• Which students are off-task?
• Which students will fail the class?
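As one illustration of the first question (does a student know a skill?), here is a minimal sketch of Bayesian Knowledge Tracing, a technique widely used in the EDM literature but not named on this slide. The predictor variables are a student's past correct/incorrect responses on one skill; the predicted variable is the probability that the skill is known. All parameter values (p_init, p_learn, p_slip, p_guess) are illustrative assumptions, not values fitted to any data.

```python
def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step for a single skill.

    Parameter defaults are made-up illustration values.
    """
    if correct:
        # Evidence: known and did not slip, versus unknown and guessed.
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        # Evidence: known but slipped, versus unknown and did not guess.
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    # Chance of learning the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def trace(responses, p_init=0.3, **params):
    """Return the estimated P(known) after each observed response."""
    p = p_init
    history = []
    for correct in responses:
        p = bkt_update(p, correct, **params)
        history.append(p)
    return history


estimates = trace([False, True, True, True])
```

With these assumed parameters, a run of correct answers pushes the estimate toward 1, while an error pulls it down; a tutor could compare the estimate against a mastery threshold to decide when to move on.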
Clustering
• Find points that naturally group together,
splitting full data set into set of clusters
• Usually used when nothing is known about
the structure of the data
– What behaviors are prominent in domain?
– What are the main groups of students?
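To make the idea of "points that naturally group together" concrete, here is a minimal k-means sketch in plain Python. k-means is only one of many clustering algorithms and is chosen here for brevity; the two features (hint rate, error rate) and the six student records are hypothetical toy data.

```python
import math


def kmeans(points, k, iters=20):
    """Tiny k-means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical per-student features: (hint rate, error rate).
students = [
    (0.10, 0.10), (0.80, 0.70), (0.15, 0.05),
    (0.90, 0.80), (0.12, 0.08), (0.85, 0.75),
]
labels, centers = kmeans(students, k=2)
```

On this toy data the students split into a low-hint, low-error group and a high-hint, high-error group. Real data would call for feature scaling and a more careful choice of k and of the initial centroids.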
Relationship Mining
• Discover relationships between variables in a
data set with many variables
– Association rule mining
– Correlation mining
– Sequential pattern mining
– Causal data mining
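As a small illustration of the first method in the list, association rule mining scores a rule such as "hint request implies error" by its support and confidence. The sketch below computes both for a single rule; the behavior names and the five sessions are hypothetical toy data.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Sessions containing every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    # Sessions containing both the antecedent and the consequent.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical sessions, each a set of observed behaviors.
sessions = [
    {"hint_request", "error"},
    {"hint_request", "error", "off_task"},
    {"hint_request"},
    {"error"},
    {"correct"},
]
support, confidence = rule_stats(sessions, {"hint_request"}, {"error"})
```

A full miner (Apriori, for example) would enumerate many candidate rules and keep those above chosen support and confidence thresholds; this sketch only evaluates one rule.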
Discovery with Models
• Pre-existing model (developed with EDM
prediction methods… or clustering… or
knowledge engineering)
• Applied to data and used as a component in
another analysis
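A minimal sketch of this pattern, under stated assumptions: a pre-existing model (here a hand-written, purely hypothetical off-task rule with an invented 80-second threshold) labels each logged action, and those labels then feed a second analysis that ranks students by off-task rate. The log fields and values are toy data.

```python
from statistics import mean


def is_off_task(action):
    """Stand-in for a pre-existing model; the 80 s threshold is invented."""
    return action["seconds"] > 80


def off_task_rate(actions):
    """Fraction of a student's actions that the model labels as off-task."""
    return mean(1.0 if is_off_task(a) else 0.0 for a in actions)


# Hypothetical logs: seconds spent on each action, per student.
logs = {
    "s1": [{"seconds": 12}, {"seconds": 95}, {"seconds": 130}, {"seconds": 20}],
    "s2": [{"seconds": 10}, {"seconds": 15}, {"seconds": 9}, {"seconds": 30}],
}

# Second analysis: the model's output becomes a variable of its own.
rates = {student: off_task_rate(actions) for student, actions in logs.items()}
ranked = sorted(rates, key=rates.get, reverse=True)
```

In practice the second analysis might relate the model's labels to learning outcomes; the ranking here just shows the model acting as a component rather than as the end result.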
Distillation of Data for Human Judgment
• Making complex data understandable by
humans to leverage their judgment
• Text replays are a simple example of this
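A text replay, roughly, renders a slice of log data as readable lines so that a human coder can judge what the student was doing. The sketch below is one assumed way to do that; the field names (time, step, input, outcome) and the three logged actions are hypothetical.

```python
def text_replay(actions):
    """Render logged actions as aligned, human-readable lines."""
    lines = []
    for a in actions:
        lines.append(
            f"{a['time']:>6.1f}s  {a['step']:<12} {a['input']:<8} {a['outcome']}"
        )
    return "\n".join(lines)


# Hypothetical clip of one student's actions.
clip = [
    {"time": 0.0, "step": "x+3=7", "input": "x=5", "outcome": "WRONG"},
    {"time": 1.4, "step": "x+3=7", "input": "x=6", "outcome": "WRONG"},
    {"time": 9.8, "step": "x+3=7", "input": "x=4", "outcome": "RIGHT"},
]
replay = text_replay(clip)
print(replay)
```

A coder reading this clip might note two quick wrong answers followed by a slower correct one, a judgment that is hard to automate but easy for a person to make from the distilled view.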
Knowledge Engineering
• Creating a model by hand rather than
automatically fitting model
• In one comparison, it led to a worse fit to gold-standard
labels of the construct of interest than data mining
(Roll et al., 2005), but similar qualitative performance
LearnLab
• The LearnLab has played a pivotal role in the
creation of the EDM community
• The CMDM thrust of the center focuses on
Educational Data Mining
• DataShop is also a key tool for the EDM
community
DataShop
• Open repository for educational data
• Many large-scale datasets both public and
private
• Tools for
– exploratory data analysis
– learning curves
– domain model testing
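To show what a learning curve summarizes, here is a minimal sketch that computes the error rate at each practice opportunity from a list of logged attempts. It is not DataShop's implementation; the row layout (student, skill, correct) and the data are assumptions made for illustration.

```python
from collections import defaultdict


def learning_curve(rows):
    """Error rate by opportunity number.

    rows: (student, skill, correct) tuples in chronological order.
    """
    seen = defaultdict(int)               # (student, skill) -> attempts so far
    totals = defaultdict(lambda: [0, 0])  # opportunity -> [errors, attempts]
    for student, skill, correct in rows:
        seen[(student, skill)] += 1
        opportunity = seen[(student, skill)]
        totals[opportunity][1] += 1
        if not correct:
            totals[opportunity][0] += 1
    return {opp: errs / n for opp, (errs, n) in sorted(totals.items())}


# Hypothetical attempts by two students on one skill.
rows = [
    ("a", "add", False), ("b", "add", False),
    ("a", "add", False), ("b", "add", True),
    ("a", "add", True), ("b", "add", True),
]
curve = learning_curve(rows)
# curve == {1: 1.0, 2: 0.5, 3: 0.0}
```

A smoothly declining curve suggests the skill is well defined in the domain model; a flat or jagged curve is a hint that the skill may need to be split or relabeled.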
DataShop
• Import/Export of data
• Custom fields
• Easy Knowledge Model creation and validation
• Web services for tools integration
Demo
Engaging the KDD/ICDM Community
• Some hesitation from these groups
– Educational data not interesting
– Too applied
– Not “big” enough for eScience
• This was one motivation for the 2010 KDD Cup
KDD Cup Competition
Knowledge Discovery and Data Mining (KDD)
is the most prestigious conference in the data
mining and machine learning fields
KDD Cup is the premier data mining challenge
2010 KDD Cup called “Educational Data
Mining Challenge”
Ran from April 2010 through June 2010
KDD Cup Competition
Competition goal is to predict student responses given tutor
data provided by Carnegie Learning
Dataset                        Students   Steps        File size
Algebra I 2008-2009            3,310      9,426,966    3 GB
Bridge to Algebra 2008-2009    6,043      20,768,884   5.43 GB
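A prediction task like this needs a scoring rule. The slide does not state the metric, so the sketch below assumes root mean squared error (RMSE) between the predicted probability of a correct response and the actual 0/1 outcome; the sample predictions are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predictions and 0/1 outcomes."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical predictions for four student steps (1 = correct response).
actual = [1, 0, 1, 1]
baseline = rmse([0.5, 0.5, 0.5, 0.5], actual)  # always predict 0.5
model = rmse([0.9, 0.2, 0.8, 0.7], actual)     # an informed guesser
```

Under this assumed metric, lower is better, and always predicting 0.5 scores exactly 0.5, which gives a simple reference point for any submitted model.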
KDD Cup Competition
655 registered participants
130 participants who submitted predictions
3,400 submissions
KDD Cup Competition
Advances in prediction and cognitive
modeling
Excitement in the KDD Community
The datasets are now in the “wild” and
showing up in non-KDD conferences
New competitions have been done and are
in the works
Opportunities
• Huge potential for EDM and DataShop to
improve educational systems
• DataShop is open and staff is available to help
get users started
• Great option for creating capstone projects
EDM Community is Online!
www.educationaldatamining.org
EDM 2013 in Memphis, TN in July
Questions: [email protected]