Automating Cognitive Model Improvement by A* Search and
Learning from Learning Curves using Learning Factors Analysis
Hao Cen, Kenneth Koedinger, Brian Junker
Human-Computer Interaction Institute
Carnegie Mellon University

Cen, H., Koedinger, K., Junker, B. Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement. The 8th International Conference on Intelligent Tutoring Systems. 2006.
Cen, H., Koedinger, K., Junker, B. Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor. The 13th International Conference on Artificial Intelligence in Education (AIED 2007). 2007.
Student Performance As They Practice with the LISP Tutor

Production Rule Analysis: evidence for the production rule as an appropriate unit of knowledge acquisition

[Figure: learning curve; y-axis Error Rate (0.0-0.5), x-axis Opportunity to Apply Rule (Required Exercises), 0-14]
Using learning curves to evaluate a cognitive model

• Lisp Tutor Model
  • Learning curves used to validate cognitive model
  • Fit better when organized by knowledge components (productions) rather than surface forms (programming language terms)
• But, curves not smooth for some production rules
  • "Blips" in learning curves indicate the knowledge representation may not be right
  • Corbett, Anderson, O'Brien (1995)
  • Let me illustrate …
Curve for "Declare Parameter" production rule

What's happening on the 6th & 10th opportunities?


• How are steps with blips different from others?
• What's the unique feature or factor explaining these blips?
• Can modify cognitive model using unique factor present at "blips"
  • Blips occur when to-be-written program has 2 parameters
  • Split Declare-Parameter by parameter-number factor:
    • Declare-first-parameter
    • Declare-second-parameter

Learning curve analysis by hand & eye …

Steps in programming problems where the function ("method") has two parameters (Corbett, Anderson, O'Brien, 1995)
Can learning curve analysis be automated?

• Learning curve analysis
  • Identify blips by hand & eye
  • Manually create a new model
  • Qualitative judgment
• Need to automatically:
  • Identify blips by system
  • Propose alternative cognitive models
  • Evaluate each model quantitatively
Overview

• Learning Factors Analysis algorithm
• A Geometry Cognitive Model and Log Data
• Experiments and Results
Learning Factors Analysis (LFA): A Tool for KC Analysis

• LFA is a method for discovering & evaluating alternative cognitive models
  • Finds knowledge component decomposition that best predicts student performance & learning transfer
• Inputs
  • Data: Student success on tasks in domain over time
  • Codes: Hypothesized factors that drive task difficulty
  • A mapping between these factors & domain tasks
• Outputs
  • A rank ordering of most predictive cognitive models
  • For each model, a measure of its generalizability & parameter estimates for knowledge component difficulty, learning rates, & student proficiency
Learning Factors Analysis (LFA) draws from multiple disciplines

• Machine Learning & AI
  • Combinatorial search (Russell & Norvig, 2003)
  • Exponential-family principal component analysis (Gordon, 2002)
• Psychometrics & Statistics
  • Q Matrix & Rule Space (Tatsuoka 1983, Barnes 2005)
  • Item response learning model (Draney, et al., 1995)
  • Item response assessment models (DiBello, et al., 1995; Embretson, 1997; von Davier, 2005)
• Cognitive Psychology
  • Learning curve analysis (Corbett, et al 1995)
Steps in Learning Factors Analysis

We've talked about some of these steps 1-4 before …
LFA – 1. The Q Matrix

• How to represent relationship between knowledge components and student tasks?
  • Tasks also called items, questions, problems, or steps (in problems)
• Q-Matrix (Tatsuoka, 1983)

Item    | Add | Sub | Mul | Div
2*8     |  0  |  0  |  1  |  0
2*8 - 3 |  0  |  1  |  1  |  0

• 2*8 is a single-KC item
• 2*8 - 3 is a conjunctive-KC item, involves two KCs
• What good is a Q matrix? Used to predict student accuracy on items not previously seen, based on KCs involved
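The Q-matrix's predictive use can be sketched in a few lines of Python. This is an illustrative toy only: the dictionary layout, the mastery probabilities, and the simple product rule are my assumptions, not the paper's fitted model.

```python
# The slide's Q-matrix: rows are items, columns are knowledge components (KCs).
q_matrix = {
    "2*8":     {"Add": 0, "Sub": 0, "Mul": 1, "Div": 0},
    "2*8 - 3": {"Add": 0, "Sub": 1, "Mul": 1, "Div": 0},
}

def kcs_for_item(item):
    """KCs an item exercises -- the basis for predicting accuracy on unseen items."""
    return [kc for kc, used in q_matrix[item].items() if used]

def predict_correct(item, p_known):
    """Toy conjunctive prediction: P(correct) is the product of the per-KC
    mastery probabilities the item involves (an illustrative assumption)."""
    p = 1.0
    for kc in kcs_for_item(item):
        p *= p_known[kc]
    return p
```

Even this toy shows why 2*8 - 3 is harder than 2*8: it requires success on two KCs rather than one.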
LFA – 2. The Statistical Model

• Problem: How to predict student responses from model?
• Solution: Additive Factor Model (Draney, et al. 1995; Cen, Koedinger, Junker, 2006)
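The equation image on this slide was lost in extraction. The Additive Factor Model it names is conventionally written (following Draney et al., 1995 and Cen, Koedinger, & Junker, 2006) as:

```latex
\ln \frac{p_{ij}}{1 - p_{ij}} \;=\; \theta_i \;+\; \sum_k \beta_k Q_{jk} \;+\; \sum_k \gamma_k Q_{jk} T_{ik}
```

where \(p_{ij}\) is the probability that student \(i\) answers item \(j\) correctly, \(\theta_i\) is student proficiency, \(\beta_k\) is the easiness of knowledge component \(k\), \(\gamma_k\) is its learning rate, \(T_{ik}\) counts the student's prior practice opportunities on KC \(k\), and \(Q_{jk}\) indicates whether item \(j\) involves KC \(k\).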
LFA – 2. An alternative "conjunctive" model

• Conjunctive Factor Model (Cen, Koedinger, Junker, 2008)
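The equation image was lost here as well. A conjunctive formulation consistent with the cited paper multiplies per-KC success probabilities, so an item is answered correctly only if every required KC succeeds:

```latex
p_{ij} \;=\; \prod_k \left( \frac{e^{\theta_i + \beta_k + \gamma_k T_{ik}}}{1 + e^{\theta_i + \beta_k + \gamma_k T_{ik}}} \right)^{Q_{jk}}
```

with the same symbols as the additive model above; this reconstruction is my best reading of the lost slide, not a verbatim copy.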
LFA - 4. Model Evaluation

• How to compare cognitive models?
• A good model minimizes prediction risk by balancing fit with data & complexity (Wasserman 2005)
• Compare BIC for the cognitive models
  • BIC is "Bayesian Information Criterion"
  • BIC = -2*log-likelihood + numPar * log(numObs)
  • Better (lower) BIC == better prediction of data that haven't been seen
• Mimics cross validation, but is faster to compute
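The BIC formula on the slide is directly computable; a minimal sketch (function and variable names are mine, not from the slides):

```python
import math

def bic(log_likelihood, num_params, num_obs):
    """Bayesian Information Criterion: -2*LL + k*log(n). Lower is better."""
    return -2 * log_likelihood + num_params * math.log(num_obs)

# Between two models with identical fit, BIC prefers the simpler one:
simple = bic(log_likelihood=-100.0, num_params=5, num_obs=1000)
complex_ = bic(log_likelihood=-100.0, num_params=10, num_obs=1000)
# simple < complex_
```

The log(n) penalty is what makes BIC stricter than AIC on large datasets, which matters here since tutor logs contain thousands of observations.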
LFA – 5. Expert Labeling & P-Matrix

• Problem: How to find potential improvements to the existing cognitive model?
• Solution: Have experts look for difficulty factors that are candidates for new KCs. Put these in P matrix.

Q Matrix
Item     | Add | Sub | Mul
2*8      |  0  |  0  |  1
2*8 - 3  |  0  |  1  |  1
2*8 - 30 |  0  |  1  |  1
3+2*8    |  1  |  0  |  1

P Matrix
Item     | Deal with negative | Order of Ops
2*8      | 0 | 0
2*8 - 3  | 0 | 0
2*8 - 30 | 1 | 0
3+2*8    | 0 | 1
…
LFA – 5. Expert Labeling and P-Matrix

• Operators on Q and P
  • Q + P[,1]
  • Q[,2] * P[,1]

Q-Matrix after adding P[,1]:
Item     | Add | Sub | Mul | Div | neg
2*8      |  0  |  0  |  1  |  0  |  0
2*8 - 3  |  0  |  1  |  1  |  0  |  0
2*8 - 30 |  0  |  1  |  1  |  0  |  1

Q-Matrix after splitting Q[,2] by P[,1]:
Item     | Add | Sub | Mul | Div | Sub-neg
2*8      |  0  |  0  |  1  |  0  |  0
2*8 - 3  |  0  |  1  |  1  |  0  |  0
2*8 - 30 |  0  |  0  |  1  |  0  |  1
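The two operators can be sketched with numpy. Column indices are 0-based here (vs. the slides' 1-based Q[,2] and P[,1]); the matrices are the slide's toy example, and the function names are mine.

```python
import numpy as np

# Rows: 2*8, 2*8-3, 2*8-30; Q columns: Add, Sub, Mul, Div.
Q = np.array([[0, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0]])
p1 = np.array([0, 0, 1])  # P[,1]: "deal with negative"

def add_factor(Q, p):
    """Q + P[,1]: append the difficulty factor as a new KC column."""
    return np.column_stack([Q, p])

def split_kc(Q, kc_index, p):
    """Q[,k] * P[,1]: split KC k into a with-factor KC and a without-factor KC."""
    Qs = Q.copy()
    with_factor = Qs[:, kc_index] * p            # new KC: k AND factor
    Qs[:, kc_index] = Qs[:, kc_index] * (1 - p)  # old KC: k AND NOT factor
    return np.column_stack([Qs, with_factor])

Q_split = split_kc(Q, 1, p1)  # split "Sub" by "negative" -> adds a "Sub-neg" column
```

After the split, 2*8 - 30 exercises Sub-neg rather than Sub, matching the split shown on the slide.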
LFA – 6. Model Search

• Problem: How to find best model given P-matrix?
• Solution: Combinatorial search
  • A best-first search algorithm (Russell & Norvig 2002)
  • Guided by a heuristic, such as BIC
  • Start from an existing model

Combinatorial Search

Goal: Do model selection within the logistic regression model space

Steps:
1. Start from an initial "node" in search graph
2. Iteratively create new child nodes by splitting a model using covariates or "factors"
3. Employ a heuristic (e.g. fit to learning curve) to rank each node
4. Expand from a new node in the heuristic order by going back to step 2

[Figure: search tree from the Original Model (BIC = 4328); first expansions include Split by Embed (4301), Split by Backward (4313), and Split by Initial (4312), among 50+ candidate children (e.g. 4320, 4322, 4324, 4325); 15 expansions later the best model reaches BIC = 4248]

Automates the process of hypothesizing alternative KC models & testing them against data
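The four-step search loop above can be sketched generically. `expand` and `score` stand in for the model-splitting operators and the AIC/BIC heuristic; all names are illustrative, not the authors' implementation.

```python
import heapq
import itertools

def best_first_search(initial_model, expand, score, max_expansions=15):
    """Best-first model search: always expand the lowest-scoring frontier node.
    `expand(model)` yields child models; `score(model)` is the heuristic (lower
    is better, e.g. BIC)."""
    counter = itertools.count()  # tie-breaker so heapq never compares models
    frontier = [(score(initial_model), next(counter), initial_model)]
    best_score, best_model = frontier[0][0], initial_model
    for _ in range(max_expansions):
        if not frontier:
            break
        s, _, model = heapq.heappop(frontier)
        if s < best_score:
            best_score, best_model = s, model
        for child in expand(model):
            heapq.heappush(frontier, (score(child), next(counter), child))
    return best_model, best_score
```

In LFA, `expand` would apply the P-matrix add/split operators to the current Q-matrix and `score` would refit the statistical model and return its BIC.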
Overview

• Learning Factors Analysis algorithm
• A Geometry Cognitive Model and Log Data
• Experiments and Results
Domain of current study

• Domain of study: the area unit of the geometry tutor
• Cognitive model: 15 skills
  1. Circle-area
  2. Circle-circumference
  3. Circle-diameter
  4. Circle-radius
  5. Compose-by-addition
  6. Compose-by-multiplication
  7. Parallelogram-area
  8. Parallelogram-side
  9. Pentagon-area
  10. Pentagon-side
  11. Trapezoid-area
  12. Trapezoid-base
  13. Trapezoid-height
  14. Triangle-area
  15. Triangle-side
Log Data -- Skills in the Base Model

Student | Step | Skill               | Opportunity
A       | p1s1 | Circle-area         | 1
A       | p2s1 | Circle-area         | 2
A       | p2s2 | Rectangle-area      | 1
A       | p2s3 | Compose-by-addition | 1
A       | p3s1 | Circle-area         | 3
The Split

• Binary Split -- splits a skill into a skill with a factor value & a skill without the factor value.

After Splitting Circle-area by Embed:

Before the split:
Student | Step | Skill               | Opportunity | Factor-Embed
A       | p1s1 | Circle-area         | 1           | alone
A       | p2s1 | Circle-area         | 2           | embed
A       | p2s2 | Rectangle-area      | 1           |
A       | p2s3 | Compose-by-addition | 1           |
A       | p3s1 | Circle-area         | 3           | alone

After the split:
Student | Step | Skill               | Opportunity
A       | p1s1 | Circle-area-alone   | 1
A       | p2s1 | Circle-area-embed   | 1
A       | p2s2 | Rectangle-area      | 1
A       | p2s3 | Compose-by-addition | 1
A       | p3s1 | Circle-area-alone   | 2
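The renumbering of opportunity counts is the subtle part of the split: each new skill's opportunities restart from 1. A small Python sketch over the slide's rows (the tuple data structure and function name are my own):

```python
from collections import defaultdict

# Rows from the slide: (student, step, skill, factor value or None)
rows = [("A", "p1s1", "Circle-area", "alone"),
        ("A", "p2s1", "Circle-area", "embed"),
        ("A", "p2s2", "Rectangle-area", None),
        ("A", "p2s3", "Compose-by-addition", None),
        ("A", "p3s1", "Circle-area", "alone")]

def binary_split(rows, skill):
    """Split `skill` by its factor value, then renumber opportunities per
    (student, skill) -- counts restart from 1 for each new split skill."""
    counts = defaultdict(int)
    out = []
    for student, step, s, factor in rows:
        new_skill = f"{s}-{factor}" if s == skill and factor else s
        counts[(student, new_skill)] += 1
        out.append((student, step, new_skill, counts[(student, new_skill)]))
    return out

split_rows = binary_split(rows, "Circle-area")
# e.g. p3s1 becomes ("A", "p3s1", "Circle-area-alone", 2)
```

This reproduces the "after" table above: step p2s1 becomes the first opportunity of Circle-area-embed, and p3s1 the second opportunity of Circle-area-alone.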
The Heuristics

• Good model captures sufficient variation in data but is not overly complicated
  • balance between model fit & complexity, minimizing prediction risk (Wasserman 2005)
• AIC and BIC used as heuristics in the search
  • two estimators of prediction risk
  • balance between fit & parsimony
  • select models that fit well without being too complex
  • AIC = -2*log-likelihood + 2*number of parameters
  • BIC = -2*log-likelihood + number of parameters * log(number of observations)
System: Best-first Search

• an informed graph search algorithm
• guided by a heuristic
• Heuristics – AIC, BIC
• Start from an existing model

[Figure: search tree grown across successive animation slides. From the Original Model (AIC = 5328), the first expansions include Split by Embed (5301), Split by Backward (5313), and Add Formula (5312), among 50+ candidate children (e.g. 5320, 5322, 5324, 5325); 15 expansions later the best model reaches AIC = 5248]
Overview

• Learning Factors Analysis algorithm
• A Geometry Cognitive Model and Log Data
• Experiments and Results
Experiment 1

• Q: How can we describe learning behavior in terms of an existing cognitive model?
• A: Fit logistic regression model in equation above (slide 27) & get coefficients
Experiment 1 – Results

• Higher intercept of skill -> easier skill
• Higher slope of skill -> faster students learn it

Skill              | Intercept | Slope | Avg Opportunities | Initial Probability | Avg Probability | Final Probability
Parallelogram-area |  2.14     | -0.01 | 14.9              | 0.95                | 0.94            | 0.93
Pentagon-area      | -2.16     |  0.45 |  4.3              | 0.2                 | 0.63            | 0.84

• Higher intercept of student -> student initially knew more

Student  | Intercept
student0 | 1.18
student1 | 0.82
student2 | 0.21

Model Statistics: AIC 3,950 | BIC 4,285 | MAD 0.083

The AIC, BIC & MAD statistics provide alternative ways to evaluate models. MAD = Mean Absolute Deviation.
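The probability columns in the table follow from the fitted logit. A sketch of the conversion, assuming a single-KC step and the AFM form (the function name is mine; student proficiency theta comes from the student intercepts):

```python
import math

def p_correct(theta, intercept, slope, opportunities):
    """Success probability implied by a skill's fitted intercept and slope
    after a given number of prior practice opportunities (AFM logit)."""
    logit = theta + intercept + slope * opportunities
    return 1 / (1 + math.exp(-logit))

# An easy skill (high intercept) starts near mastery; a hard one does not:
easy_start = p_correct(0, 2.14, -0.01, 0)   # Parallelogram-area-like
hard_start = p_correct(0, -2.16, 0.45, 0)   # Pentagon-area-like
```

The table's probabilities also average over students with different theta values, so this per-student sketch will not reproduce them exactly.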
Experiment 2

• Q: How can we improve a cognitive model?
• A: Run LFA on data including factors & search through model space
Experiment 2 – Results with BIC

Model 1 (Number of Splits: 3)
1. Binary split compose-by-multiplication by figurepart segment
2. Binary split circle-radius by repeat repeat
3. Binary split compose-by-addition by backward backward
Number of Skills: 18 | AIC: 3,888.67 | BIC: 4,248.86 | MAD: 0.071

Model 2 (Number of Splits: 3)
1. Binary split compose-by-multiplication by figurepart segment
2. Binary split circle-radius by repeat repeat
3. Binary split compose-by-addition by figurepart area-difference
Number of Skills: 18 | AIC: 3,888.67 | BIC: 4,248.86 | MAD: 0.071

Model 3 (Number of Splits: 2)
1. Binary split compose-by-multiplication by figurepart segment
2. Binary split circle-radius by repeat repeat
Number of Skills: 17 | AIC: 3,897.20 | BIC: 4,251.07 | MAD: 0.075

• Splitting Compose-by-multiplication into two skills – CMarea and CMsegment, making a distinction of the geometric quantity being multiplied
Experiment 3

• Q: Will some skills be better merged than if they are separate skills? Can LFA recover some elements of the original model if we search from a merged model, given difficulty factors?
• A: Run LFA on the data of a merged model, and search through the model space
Experiment 3 – Merged Model

• Merge some skills in the original model to remove some distinctions; add these as difficulty factors to consider
• The merged model has 8 skills:
  • Circle-area, Circle-radius => Circle
  • Circle-circumference, Circle-diameter => Circle-CD
  • Parallelogram-area, Parallelogram-side => Parallelogram
  • Pentagon-area, Pentagon-side => Pentagon
  • Trapezoid-area, Trapezoid-base, Trapezoid-height => Trapezoid
  • Triangle-area, Triangle-side => Triangle
  • Compose-by-addition
  • Compose-by-multiplication
• Add difficulty factor "direction": forward vs. backward
Experiment 3 – Results

Model 1 (Number of Splits: 4; Number of skills: 12)
Circle*area, Circle*radius*initial, Circle*radius*repeat, Compose-by-addition, Compose-by-addition*area-difference, Compose-by-multiplication*area-combination, Compose-by-multiplication*segment
AIC: 3,884.95 | BIC: 4,169.315 | MAD: 0.075

Model 2 (Number of Splits: 3; Number of skills: 11)
All skills are the same as those in model 1 except that
1. Circle is split into Circle*backward*initial, Circle*backward*repeat, Circle*forward
2. Compose-by-addition is not split
AIC: 3,893.477 | BIC: 4,171.523 | MAD: 0.079

Model 3 (Number of Splits: 4; Number of skills: 12)
All skills are the same as those in model 1 except that
1. Circle is split into Circle*backward*initial, Circle*backward*repeat, Circle*forward
2. Compose-by-addition is split into Compose-by-addition and Compose-by-addition*segment
AIC: 3,887.42 | BIC: 4,171.786 | MAD: 0.077
Experiment 3 – Results

• Recovered three skills (Circle, Parallelogram, Triangle) => distinctions made in the original model are necessary
• Partially recovered two skills (Triangle, Trapezoid) => some original distinctions necessary, some are not
• Did not recover one skill (Circle-CD) => original distinction may not be necessary
• Recovered one skill (Pentagon) in a different way => original distinction may not be as significant as distinction caused by another factor
Beyond Experiments 1-3

• Q: Can we use LFA to improve tutor curriculum by identifying over-taught or under-taught rules?
  • Thus adjust their contribution to curriculum length without compromising student performance
• A: Combine results from experiments 1-3
Beyond Experiments 1-3 -- Results

• Parallelogram-side is over-taught.
  • high intercept (2.06), low slope (-.01)
  • initial success probability .94, average number of practices per student is 15
• Trapezoid-height is under-taught.
  • low intercept (-1.55), positive slope (.27)
  • final success probability is .69, far from the level of mastery; the average number of practices per student is 4
• Suggestions for curriculum improvement
  • Reducing the amount of practice for Parallelogram-side should save student time without compromising their performance.
  • More practice on Trapezoid-height is needed for students to reach mastery.
Beyond Experiments 1-3 -- Results

• How about Compose-by-multiplication?

Skill | Intercept | Slope | Avg Practice Opportunities | Initial Probability | Avg Probability | Final Probability
CM    | -.15      | .1    | 10.2                       | .65                 | .84             | .92

With final probability .92, students seem to have mastered Compose-by-multiplication.
Beyond Experiments 1-3 -- Results

• However, after the split:

Skill     | Intercept | Slope | Avg Practice Opportunities | Initial Probability | Avg Probability | Final Probability
CM        | -.15      | .1    | 10.2                       | .65                 | .84             | .92
CMarea    | -.009     | .17   |  9                         | .64                 | .86             | .96
CMsegment | -1.42     | .48   |  1.9                       | .32                 | .54             | .60

• CMarea does well, with final probability .96.
• But CMsegment has final probability only .60 and an average amount of practice less than 2.
• Suggestion for curriculum improvement: increase the amount of practice for CMsegment.
Conclusions and Future Work

• Learning Factors Analysis combines statistics, human expertise, & combinatorial search to evaluate & improve a cognitive model
• System able to evaluate a model in seconds & search 100s of models in 4-5 hours
  • Model statistics are meaningful
  • Improved models are interpretable & suggest tutor improvement
• Planning to use LFA for datasets from other tutors to test potential for model & tutor improvement

END