Automating Cognitive Model Improvement by A*Search and


Data mining with DataShop
Ken Koedinger
CMU Director of PSLC
Professor of Human-Computer Interaction &
Psychology
Carnegie Mellon University
“Knowledge components
are the germ of transfer”
Goal of the week:
What does Ken mean by this?
Overview

Motivation for data mining


Exploratory Data Analysis



Better understanding of students =>
better instructional design
DataShop demo, Excel
Learning curves & Learning Factors Analysis
Example project from last summer
Data Mining Questions & Methods

What is going on with student learning & performance?
  Exploratory data analysis
    Summary & visualization tools in DataShop
    Tools in Excel: Auto filter, Pivot Tables, Solver

How to reliably model student achievement?
  Item Response Theory (IRT)
    Basis for standardized tests, SAT, GRE, TIMSS…
    Version of “logistic regression”
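To make the "version of logistic regression" point concrete, here is a minimal sketch. It assumes the one-parameter (Rasch) form of IRT, which the slides do not specify; the names `theta` (student proficiency) and `b` (item difficulty) are illustrative, not taken from the talk.

```python
import math


def rasch_p_correct(theta: float, b: float) -> float:
    """Probability of a correct response under a 1-parameter IRT (Rasch) model.

    theta: student proficiency (assumed name)
    b:     item difficulty (assumed name)
    The model is a logistic function of the difference theta - b.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


if __name__ == "__main__":
    # A student whose proficiency equals the item difficulty has a 50% chance.
    for theta in (-1.0, 0.0, 1.0):
        print(theta, round(rasch_p_correct(theta, b=0.0), 3))
```

Note that nothing in this model changes with practice: the same student gets the same prediction on the same item at any point in time.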
Data Mining Questions & Methods 2

What’s the nature of knowledge students are learning?
How can we discover cognitive models of student learning that fit their learning curves?
  Learning Factors Analysis (LFA)
    Extends IRT to account for learning
    Search algorithm: Discover cognitive model(s) that capture how student learning transfers over tasks over time
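One way to read "extends IRT to account for learning" is the additive form sketched below. The slides do not give an equation, so this is an assumption: the log-odds of success is the student's proficiency plus, for each knowledge component (KC) the step requires, a KC easiness term and a learning-rate term multiplied by the number of prior practice opportunities. All parameter names (`theta`, `beta`, `gamma`) are illustrative.

```python
import math


def additive_p_correct(theta, kcs, beta, gamma, opportunities):
    """Predicted probability of success on one step (illustrative sketch).

    theta:         student proficiency (assumed name)
    kcs:           KCs the step requires
    beta:          dict KC -> easiness (assumed name)
    gamma:         dict KC -> learning rate per practice opportunity (assumed name)
    opportunities: dict KC -> number of prior opportunities for this student
    """
    logit = theta + sum(
        beta[kc] + gamma[kc] * opportunities.get(kc, 0) for kc in kcs
    )
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    beta = {"circle-area": -0.5}
    gamma = {"circle-area": 0.3}
    for n in range(4):
        p = additive_p_correct(0.0, ["circle-area"], beta, gamma, {"circle-area": n})
        print(n, round(p, 3))
```

With a positive learning rate the predicted success probability rises with each opportunity, which is what a declining error-rate learning curve looks like. The search over alternative KC models is not shown here.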
What features of a tutor lead to the most learning?
  Learning Decomposition
    Extends LFA to explore different rates of learning due to different forms of instruction

How to extract reliable inferences about causal mechanisms from correlations in data?
  Causal modeling using Tetrad
Overview

Motivation for data mining


Exploratory Data Analysis



Better understanding of students =>
better instructional design
Next
Demo: DataShop, Excel
Learning curves & Learning Factors Analysis
Example project from last summer
DataShop Demo …
Before going to DataShop, let’s look at a tutor (1997 version!) that generated the example data set we’ll look at
TWO_CIRCLES_IN_SQUARE problem: Initial screen
TWO_CIRCLES_IN_SQUARE problem: An error a few steps later
TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes problem
How to get to the DataShop: Go to http://learnlab.org & click …
PSLC’s DataShop
  Researchers get data access, visualizations, statistical tools
  Learning curves track student learning over time
  Discover what concepts & skills students need help with
PSLC’s DataShop
  Learning curves reveal over- and under-practiced knowledge components
  Rectangle-area has a low initial error rate, but is practiced often
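A learning curve of this kind can be computed from ordered log data. The sketch below assumes each log record is a `(student, kc, correct)` triple in time order; that record shape and the function name are illustrative and are not DataShop's own format.

```python
from collections import defaultdict


def learning_curve(transactions):
    """Error rate per practice opportunity, for each knowledge component (KC).

    transactions: iterable of (student, kc, correct) in time order (assumed shape).
    Returns {kc: [error rate at opportunity 1, opportunity 2, ...]}.
    """
    seen = defaultdict(int)                          # (student, kc) -> opportunities so far
    errors = defaultdict(lambda: defaultdict(int))   # kc -> opportunity -> error count
    totals = defaultdict(lambda: defaultdict(int))   # kc -> opportunity -> attempt count
    for student, kc, correct in transactions:
        seen[(student, kc)] += 1
        opp = seen[(student, kc)]
        totals[kc][opp] += 1
        if not correct:
            errors[kc][opp] += 1
    return {
        kc: [errors[kc][opp] / totals[kc][opp] for opp in sorted(totals[kc])]
        for kc in totals
    }


if __name__ == "__main__":
    log = [
        ("s1", "rectangle-area", False),
        ("s2", "rectangle-area", True),
        ("s1", "rectangle-area", True),
        ("s2", "rectangle-area", True),
    ]
    print(learning_curve(log))
```

A KC whose curve starts low and stays low over many opportunities is a candidate for being over-practiced.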
Other DataShop Features
  Error Reports
    - Identify misconceptions by looking for common student errors
    - When do students ask for hints?
    - Are there alternative correct strategies?
  Performance Profiler
  Export Data
    - Get all or part of the data in tab-delimited file
    - Use your favorite analysis tools …
Exported File Loaded into Excel
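The same tab-delimited export can be read by a script instead of Excel. The sketch below uses only Python's standard `csv` module; the column names and the sample rows are made up for illustration and should be replaced with the headers found in the actual exported file.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for an exported file; real column names may differ.
SAMPLE_EXPORT = (
    "Anon Student Id\tStep Name\tOutcome\n"
    "s1\tstep-a\tINCORRECT\n"
    "s1\tstep-a\tHINT\n"
    "s1\tstep-a\tCORRECT\n"
    "s2\tstep-a\tCORRECT\n"
)


def outcome_counts(tab_delimited_text, outcome_column="Outcome"):
    """Count how often each outcome value occurs in tab-delimited text."""
    reader = csv.DictReader(io.StringIO(tab_delimited_text), delimiter="\t")
    return Counter(row[outcome_column] for row in reader)


if __name__ == "__main__":
    print(outcome_counts(SAMPLE_EXPORT))
```

For a file on disk, pass the open file object to `csv.DictReader` in place of the `io.StringIO` wrapper.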
Overview

Motivation for data mining


Exploratory Data Analysis



Better understanding of students =>
better instructional design
DataShop demo, Excel
Next
Learning curves & Learning Factors Analysis
Example project from last summer
Cognitive Model drives behavior of intelligent tutor systems …

Cognitive Model: expert component of intelligent tutors that models how students solve problems

Problem: 3(2x - 5) = 9
  If goal is solve a(bx+c) = d, then rewrite as abx + ac = d  ->  6x - 15 = 9
  If goal is solve a(bx+c) = d, then rewrite as abx + c = d  ->  6x - 5 = 9
  If goal is solve a(bx+c) = d, then rewrite as bx+c = d/a  ->  2x - 5 = 3

Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction
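The three rewrite rules on this slide can be turned into a toy model tracer. This is only a sketch under simplifying assumptions: an equation of the form px + q = r is represented as the tuple `(p, q, r)`, and the rule labels and function name are illustrative rather than taken from any tutor's code.

```python
from fractions import Fraction


def trace_step(a, b, c, d, student_step):
    """Match a student's rewrite of a(bx + c) = d against candidate rules.

    student_step is (p, q, r), meaning the student wrote px + q = r.
    Returns the label of the first matching rule, or None if nothing matches.
    """
    rules = {
        "distribute": (a * b, a * c, d),             # abx + ac = d
        "distribute-bug": (a * b, c, d),             # abx + c = d (c not multiplied by a)
        "divide-both-sides": (b, c, Fraction(d, a)),  # bx + c = d/a
    }
    for label, expected in rules.items():
        if tuple(student_step) == expected:
            return label
    return None


if __name__ == "__main__":
    # 3(2x - 5) = 9, so a=3, b=2, c=-5, d=9
    for step in [(6, -15, 9), (6, -5, 9), (2, -5, 3), (6, -15, 3)]:
        print(step, "->", trace_step(3, 2, -5, 9, step))
```

Knowing which rule matched is what lets a tutor respond in context: a correct rule lets the student proceed, while the buggy rule can trigger a targeted message.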
Cognitive Model drives behavior of intelligent tutor systems …

Cognitive Model: expert component of intelligent tutors that models how students solve problems

Problem: 3(2x - 5) = 9
  If goal is solve a(bx+c) = d, then rewrite as abx + ac = d  ->  6x - 15 = 9
    Hint message: “Distribute a across the parentheses.”
    Known? = 85% chance
  If goal is solve a(bx+c) = d, then rewrite as abx + c = d  ->  6x - 5 = 9
    Bug message: “You need to multiply c by a also.”
  2x - 5 = 3
    Known? = 45%

Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction
Knowledge Tracing: Assesses student's knowledge growth -> individualized activity selection and pacing
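The "Known?" percentages can be read as running estimates that a student knows a rule. One common way to maintain such an estimate is a Bayesian update after each observed step; the slides do not state the update rule, so the sketch below is an assumption, and the slip, guess, and learn values are placeholders rather than fitted parameters.

```python
def update_known(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.1):
    """One Bayesian knowledge-tracing style update (illustrative parameters).

    p_known: prior probability the student knows the rule
    correct: whether the observed step was correct
    p_slip:  chance of an error despite knowing the rule
    p_guess: chance of a correct step without knowing the rule
    p_learn: chance of learning the rule at this opportunity
    """
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_total = evidence_known + (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_total = evidence_known + (1 - p_known) * (1 - p_guess)
    posterior = evidence_known / evidence_total
    # Account for the chance of learning from this practice opportunity.
    return posterior + (1 - posterior) * p_learn


if __name__ == "__main__":
    print(round(update_known(0.45, correct=True), 3))
    print(round(update_known(0.85, correct=False), 3))
```

An estimate like this is what would drive activity selection and pacing: rules with a low estimate get more practice, rules with a high estimate get less.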
Cognitive Modeling Challenge

Problem: Intelligent Tutoring Systems depend on a cognitive model, which is hard to get right
  Hard to program, but more importantly …
  A high-quality cognitive model requires a deep understanding of student thinking
  Cognitive models created by intuition are often wrong (e.g., Koedinger & Nathan, 2004)
Significance of improving a cognitive model

A better cognitive model means:
  better feedback & hints (model tracing)
  better problem selection & pacing (knowledge tracing)
Making cognitive models better advances basic cognitive science
How can we use student data to build better cognitive models?

Cognitive Task Analysis methods
  Think alouds, Difficulty Factors Assessment
  Peer collaboration dialog analysis
  General lecture Tuesday
  TagHelper track
Newer:
  Data mining of student interactions with on-line tutors
Back to DataShop to illustrate
Use log data to test alternative knowledge representations

Which “knowledge component” analysis is correct is an empirical question!
Log data from tutors provides data to compare different KC analyses
  Find which “germ” accounts for student learning behaviors
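Comparing KC analyses on the same log data comes down to relabeling each step with a different KC and recounting practice opportunities. The sketch below illustrates that relabeling; the step names and both KC models are invented for the example.

```python
def opportunity_numbers(steps, kc_model):
    """Label each step with its KC and the opportunity count for that KC.

    steps:    one student's steps, in time order
    kc_model: dict mapping step name -> KC name under one candidate KC analysis
    Returns a list of (kc, opportunity_number) pairs, one per step.
    """
    counts = {}
    labeled = []
    for step in steps:
        kc = kc_model[step]
        counts[kc] = counts.get(kc, 0) + 1
        labeled.append((kc, counts[kc]))
    return labeled


if __name__ == "__main__":
    steps = ["circle-area", "square-area", "circle-area"]  # invented step names
    coarse = {"circle-area": "area", "square-area": "area"}
    fine = {"circle-area": "circle-area", "square-area": "square-area"}
    print(opportunity_numbers(steps, coarse))
    print(opportunity_numbers(steps, fine))
```

Under the coarse model the third step is a third opportunity on one KC; under the finer model it is only the second opportunity on its own KC. The same data therefore produce different learning curves under different KC models, and the curves can be compared for smoothness.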
Not a smooth learning curve -> this knowledge component model is wrong. It does not capture genuine student difficulties.
This more specific knowledge component (KC) model (2 KCs) is also wrong -- still no smooth drop in error rate.
Ah! Now we are getting a smooth learning curve. This even more specific decomposition (12 KCs) better tracks the nature of student difficulties & transfer from one problem situation to another.
Overview

Motivation for data mining


Exploratory Data Analysis



Better understanding of students =>
better instructional design
Demo: DataShop, Excel
Learning curves & Learning Factors Analysis
Example project from last summer
Next
Example project from 2006

Rafferty (Stanford) & Yudelson (U Pitt)
Analyzed a data set from Geometry
Applied Learning Factors Analysis (LFA)
Driving questions:
  Are students learning at the same rate as assumed in prior LFA models?
  Do we need different cognitive models (KC models) to account for low-achieving vs. high-achieving students?
Rafferty & Yudelson Results 1


Different student learning rates? Yes
Rafferty & Yudelson Results 2

Is it “faster” learning or “different” learning?

Fit with a more compact model is better for low-pretest, high-learning students


Students with an apparently faster learning rate are learning a more “compact”, general and transferable domain model
(Became basis of Anna Rafferty’s master’s thesis)
Data Mining-DataShop Offerings
Tomorrow
Lectures in 3501 Newell-Simon Hall, activities here (Wean 5202)
1. Educational data mining overview & introduction to using the DataShop
   Follow-up activities:
     Exercise in using DataShop for exploratory data analysis
     Use tutor/course that generated target data set. Begin data export, data scrubbing, exploratory data analysis
2. Learning from learning curves: Item Response Theory, Learning Factors Analysis
3. Other data mining techniques: Learning decomposition, causal models with Tetrad
   Define metrics to address driving question, begin analysis
Questions?
What’s next?

Tomorrow:
  Do you know which offerings you will go to tomorrow?
  Any conflicts -- two you want to go to that are at the same time?
END