
Some issues and applications in
cognitive diagnosis and
educational data mining
Brian W. Junker
Department of Statistics
Carnegie Mellon University
[email protected]
Presentation to the International Meeting of the Psychometric Society
Tokyo Japan, July 2007
1
Rough Outline
What to do when someone comes into my
office?
• Cognitive Diagnosis Models (CDM’s) in Psychometrics: a partial review
• The Assistments Project: Using CDM’s in
a learning-embedded assessment system
• Educational Data Mining
2
What are CDM’s? How are they
related?
• Rupp (2007), Fu & Li (2007), Junker
(1999), Roussos (1994), and others
• Many definitions try to characterize what
the unique challenges are, but…
• A simple definition of CDM:
“A latent trait measurement model
useful for inferences about cognitive
states or processes”
3
“…Measurement Model useful for
inferences about cognitive…”
• Unidimensional Item Response Models
• Multidimensional Item Response Models
– Compensatory structure (e.g. Reckase, 1985, 1997)
– Multiplicative structure (e.g. Embretson, 1984, 1997)
– Task difficulty (LLTM; e.g. Fischer, 1995) vs. person
attribute modeling (MIRT, e.g. Reckase, 1997)
• (Constrained) Latent Class Models
– Macready and Dayton (1977); Haertel (1989); Maris
(1999);
• Bayes Net Models
– Mislevy et al. (e.g. Mislevy, Almond, Yan & Steinberg, 1999)
– AI and data mining communities (more later…)
4
Constrained Latent Class Models
• Basic ingredients
• X_ij is data (task/item response)
• Q_jk is design (Q-matrix, skills, KC’s, transfer model, …)
• α_ik is latent (knowledge state component of examinee)
[ α_i = (α_i1, …, α_iK) is a latent class label ]
5
Constrained Latent Class Models
• The Q matrix is the incidence matrix of a
bipartite graph
• All such models look like (one-layer)
discrete-node Bayesian networks.
6
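A toy sketch of these three ingredients in code; all values below are made up for illustration, not taken from any real assessment.

```python
import numpy as np

# X_ij: observed task/item responses; rows are examinees i, columns are items j.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1]])

# Q_jk: the design / transfer model; rows are items j, columns are skills (KC's) k.
# This is the incidence matrix of the bipartite item-by-skill graph.
Q = np.array([[1, 0],
              [1, 1],
              [0, 1]])

# alpha_ik: latent knowledge states; alpha_i = (alpha_i1, ..., alpha_iK) is
# examinee i's latent class label.
alpha = np.array([[1, 0],
                  [1, 1],
                  [0, 1]])
```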
Constrained Latent Class Models
• Relate α_ik to X_ij probabilistically: P(X_ij = 1 | α_i) = P_j(α_i)
• Now it looks exactly like an IRT model
• What is the form of P_j(α_i)?
– Conjunctive (many forms!)
– Disjunctive (less common!)
– Other??
7
Two simple conjunctive forms…
• DINA
• Examined by Junker & Sijtsma (2001)
• Antecedents incl. Macready & Dayton (1977);
Haertel (1989); Tatsuoka (1983, 1995)
• Natural choice in educational data mining
• Difficult to assign credit/blame for failure
8
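A minimal sketch of the DINA item response function as described in Junker & Sijtsma (2001); parameter values below are hypothetical.

```python
import numpy as np

def dina_prob(alpha, Q, guess, slip):
    """P(X_ij = 1 | alpha_i) under DINA: ALL required skills are needed.

    alpha : (I, K) 0/1 array of examinee skill states
    Q     : (J, K) 0/1 Q-matrix
    guess : (J,) guessing parameters g_j
    slip  : (J,) slip parameters s_j
    """
    # eta_ij = 1 iff examinee i has every skill required by item j
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(float)
    return (1 - slip) ** eta * guess ** (1 - eta)

# Hypothetical example values:
alpha = np.array([[1, 0], [1, 1]])
Q = np.array([[1, 0], [1, 1]])
print(dina_prob(alpha, Q, guess=np.array([0.2, 0.1]), slip=np.array([0.1, 0.15])))
```

Under the conjunctive rule, one missing required skill drops the success probability all the way to the guessing parameter, which is why credit or blame for a failure is hard to attach to any particular skill.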
A second simple conjunctive form
• NIDA
• Also examined by Junker & Sijtsma (2001)
• Antecedents incl. Maris’ MCLCM (1999)
• Maybe more readily assign credit/blame
9
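The corresponding sketch for NIDA, where slips and guesses attach to skills rather than items (again with hypothetical parameter values).

```python
import numpy as np

def nida_prob(alpha, Q, guess, slip):
    """P(X_ij = 1 | alpha_i) under NIDA; guess and slip are per-skill (length K)."""
    # Success factor for each skill: (1 - s_k) if mastered, g_k if not.
    factor = np.where(alpha[:, None, :] == 1, 1 - slip, guess)   # broadcasts to (I, 1, K)
    # Skills not required by item j (Q_jk = 0) contribute a factor of 1.
    return np.prod(factor ** Q[None, :, :], axis=2)              # (I, J)

# Hypothetical values:
alpha = np.array([[1, 0], [1, 1]])
Q = np.array([[1, 0], [1, 1]])
print(nida_prob(alpha, Q, guess=np.array([0.3, 0.2]), slip=np.array([0.1, 0.1])))
```

Because each required-but-missing skill contributes its own guessing factor g_k, a wrong answer points more directly at particular skills, which is the sense in which credit/blame may be more readily assigned.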
A generalization of NIDA
• RedRUM
• π*_j is the maximal probability of success
• r*_jk is the penalty for each required attribute not possessed
• Introduced by Hartz (2002); cf. DiBello, Stout &
Roussos (1995).
10
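A sketch of the reduced RUM item response function in its standard form (Hartz, 2002); the numbers below are illustrative only.

```python
import numpy as np

def rrum_prob(alpha, Q, pi_star, r_star):
    """P(X_ij = 1 | alpha_i) under the reduced RUM ("RedRUM").

    pi_star : (J,)   pi*_j, success probability when all required skills are mastered
    r_star  : (J, K) r*_jk in (0, 1), multiplicative penalty for each required
                     attribute the examinee does not possess
    """
    missing = (1 - alpha[:, None, :]) * Q[None, :, :]   # 1 where a required skill is absent
    return pi_star[None, :] * np.prod(r_star[None, :, :] ** missing, axis=2)

# Hypothetical values:
alpha = np.array([[1, 0]])
Q = np.array([[1, 1]])
print(rrum_prob(alpha, Q, pi_star=np.array([0.9]), r_star=np.array([[0.5, 0.4]])))
# 0.9 * 0.4 = 0.36: the penalty for the one missing required skill.
```

NIDA is the special case in which π*_j and r*_jk depend on the skills alone, with π*_j = Π_k (1 − s_k)^Q_jk and r*_jk = g_k / (1 − s_k).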
Compensatory & disjunctive
forms are also possible
• Weaver & Junker (2004, unpubl.)
• Looks like multidimensional Rasch model
• Plausible for some multi-strategy settings
– limited proportional reasoning domain
• DINO, NIDO, … (Rupp, 2007)
– Pathological gambling as in DSM-IV (Templin & Henson, 2006)
11
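Hedged sketches of the two non-conjunctive forms mentioned here: DINO, the disjunctive mirror of DINA (as in Templin & Henson, 2006), and a generic compensatory logistic form. The compensatory parameterization below is only illustrative, not the Weaver & Junker (2004) model itself.

```python
import numpy as np

def dino_prob(alpha, Q, guess, slip):
    """P(X_ij = 1 | alpha_i) under DINO: possessing ANY required skill suffices."""
    omega = np.any((alpha[:, None, :] == 1) & (Q[None, :, :] == 1), axis=2).astype(float)
    return (1 - slip) ** omega * guess ** (1 - omega)

def compensatory_prob(alpha, Q, skill_effect, difficulty):
    """An illustrative compensatory form: mastered required skills add on the
    logit scale, so strength on one skill can offset weakness on another."""
    logit = alpha @ (Q * skill_effect).T - difficulty    # (I, J)
    return 1 / (1 + np.exp(-logit))

# Hypothetical values:
alpha = np.array([[1, 0], [0, 0]])
Q = np.array([[1, 1]])
print(dino_prob(alpha, Q, guess=np.array([0.1]), slip=np.array([0.1])))
# 0.9 for the first examinee (one required skill held), 0.1 for the second.
print(compensatory_prob(alpha, Q, skill_effect=np.array([1.5, 1.0]), difficulty=np.array([0.5])))
```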
A Common Framework
Mixed nonlinear logistic regression models
logit P(X_ij = 1 | α_i) = λ′ h(α_i, Q_j)
where
• λ is a coefficient vector;
• h(α_i, Q_j) is a vector of Q_jk-weighted main effects and interactions among latent attributes:
α_ik Q_jk,  α_ik1 Q_jk1 α_ik2 Q_jk2,  α_ik1 Q_jk1 α_ik2 Q_jk2 α_ik3 Q_jk3, …
Henson et al. (LCDM, 2007); von Davier (GDM,
2005)
12
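A sketch of how h(α_i, Q_j) can be built for one item (intercept, Q-weighted main effects, and all interactions among the item's required skills), with the λ's entering through a logit link. This follows the generic description above; any concrete coding details are assumptions.

```python
import numpy as np
from itertools import combinations

def lcdm_features(alpha_i, Q_j):
    """h(alpha_i, Q_j): intercept, Q_jk-weighted main effects, and all higher-order
    interactions among the skills that item j requires."""
    required = np.flatnonzero(Q_j)
    h = [1.0]                                     # intercept
    for order in range(1, len(required) + 1):
        for subset in combinations(required, order):
            h.append(float(np.prod([alpha_i[k] for k in subset])))
    return np.array(h)

def lcdm_prob(alpha_i, Q_j, lam):
    """Item response probability with logit P(X_ij = 1 | alpha_i) = lambda' h(alpha_i, Q_j)."""
    return 1 / (1 + np.exp(-(lam @ lcdm_features(alpha_i, Q_j))))

# Hypothetical two-skill item: lambda = (intercept, main1, main2, interaction).
print(lcdm_prob(np.array([1, 1]), np.array([1, 1]), lam=np.array([-2.0, 1.0, 1.0, 2.0])))
```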
LCDM’s / GDM’s
Obtain RedRUM, NIDA, DINA, DINO, etc., by constraining the λ’s!
• Weaker constraints on the λ’s: conjunctive/disjunctive blends, etc.
• Potentially powerful
– unifying framework for many CDM’s
– exploratory modeling tool
13
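A worked toy example of the constraint idea (self-contained, with made-up guess/slip values): for a two-skill item, zeroing the main-effect λ's leaves only an intercept plus the two-way interaction, which is exactly a DINA-type item.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Two-skill item with Q_j = (1, 1): h(alpha) = (1, a1, a2, a1*a2).
# DINA-type constraint: main-effect lambdas = 0; only the intercept and the
# highest-order interaction are free (illustrative numbers below).
lam = np.array([np.log(0.2 / 0.8),                       # intercept -> guess g_j = 0.2
                0.0, 0.0,                                 # main effects constrained to zero
                np.log(0.9 / 0.1) - np.log(0.2 / 0.8)])   # interaction -> 1 - slip = 0.9
for a1, a2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    h = np.array([1.0, a1, a2, a1 * a2])
    print((a1, a2), round(float(sigmoid(lam @ h)), 3))
# 0.2 unless both skills are mastered, then 0.9 -- exactly a DINA item.
```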
Many general frameworks, model
choices and design choices
• Conceptual: Fu & Li (2007); Rupp (2007)
• Extensions: HO-DINA, MS-DINA and others (de
la Torre & Douglas, 2004, 2005); Fusion model
system (Roussos et al., in press); Bayes Nets
(Mislevy et al., 1999)
• Model Families: Henson et al. (2007); von
Davier (2005), etc.
What to do when someone comes into my office?
14
Example: ASSISTments Project
• Web-based 8th grade mathematics tutoring
system
• ASSIST with, and ASSESS, progress toward
Massachusetts Comprehensive Assessment
System Exam (MCAS)
• Main statistical/measurement goals
– Predict students’ MCAS scores at end of year
– Provide feedback to teachers
• Ken Koedinger (Carnegie Mellon),
Neil Heffernan (Worcester Polytechnic),
& over 50 others at CMU, WPI and Worcester
Public Schools
15
The ASSISTment Tutor
• Main items: released MCAS items or “morphs”
• Incorrect main → “scaffold” items
– “One-step” breakdowns of the main task
– Buggy feedback, hints on request, etc.
• Multiple Knowledge Component (Q-matrix) models:
– 1 IRT θ
– 5 MCAS math strands
– 39 MCAS standards
– 77-106 “expert coded” basic skills
• Goals:
– Predict MCAS scores
– KC feedback: learned/not learned, etc.
16
Goal: Predicting MCAS
• The exact content of the MCAS exam is not known
until months after it is given
• The ASSISTments themselves are ongoing throughout
the school year as students learn (from teachers, from
ASSISTment interactions, etc.).
[Figure: percent correct on the system per student, plotted month by month from September through March]
17
Methods: Predicting MCAS
• Regression approaches [Feng et al., 2006; Anozie & Junker, 2006; Ayers & Junker, 2006/2007]:
– Percent correct on main questions
– Percent correct on scaffold questions
– Rasch proficiency on main questions
– Online metrics (efficiency and help-seeking; e.g. Campione et al., 1985; Grigorenko & Sternberg, 1998)
– Both end-of-year and “month-by-month” models (a rough cross-validation sketch follows below)
• Bayes net (DINA model) approaches:
– Predicting KC-coded MCAS questions from Bayes nets (DINA model) applied to ASSISTments [Pardos et al., 2006]
– Regression on number of KC’s mastered in the DINA model [Anozie, 2006]
18
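A rough sketch of the regression-style prediction with 10-fold cross-validation reporting MAD and RMSE. The features and the synthetic data below are placeholders, not the actual ASSISTment summaries or results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-student summaries (the real ASSISTment features differ):
n = 200
X = np.column_stack([rng.uniform(0, 100, n),        # % correct on main questions
                     rng.poisson(3, n),             # hint requests per problem
                     rng.exponential(30, n)])       # seconds per problem
y = 0.4 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 6, n)   # synthetic "MCAS" score

def cv_mad_rmse(X, y, n_folds=10):
    """10-fold cross-validated MAD and RMSE for a linear regression predictor."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        errors.append(y[test] - pred)
    e = np.concatenate(errors)
    return np.mean(np.abs(e)), np.sqrt(np.mean(e ** 2))

print(cv_mad_rmse(X, y))
```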
Results: Predicting MCAS
10-fold cross-validation using:

Predictors                    df   CV-MAD   CV-RMSE   Remarks
PctCorrMain                    1    7.18     8.65     7 months, main questions only
#KC’s of 77 learned (DINA)     1    6.63     8.62     3 months, mains and scaffolds
Rasch proficiency              1    5.90     7.18     7 months, main questions only
PctCorrMain + 4 metrics       35    5.46     6.56     7 months; 5 summaries each month
Rasch profic. + 5 metrics      6    5.24     6.46     7 months, main questions only
19
Results: Predicting MCAS
• Limits of what we can accomplish for prediction
– Feng et al. (in press) estimate best-possible MAD ≈ 6 from split-half experiments with MCAS
– Ayers & Junker (2007) reliability calculation suggests approximate bounds 1.05 ≤ MAD ≤ 6.46
– Best observed MAD ≈ 5.24
• Tradeoff:
– Greater model complexity (DINA) can help [Pardos et
al, 2006; Anozie, 2006];
– Accounting for question difficulty (Rasch), plus online
metrics, does as well [Ayers & Junker, 2007]
20
Goal: KC Feedback
• Providing feedback on
– individual students
– groups of students
• Multiple KC (Q-matrix) models:
– 1 IRT θ
– 5 MCAS math strands
– 39 MCAS standards
– 106 “expert coded” basic skills
• Scaffolding:
– Optimal measures of single KC’s?
– Optimal tutoring aids?
– When more than one transfer model is involved, scaffolds fail to line up with at least one of them!
• Use the DINA model, 106 KC’s
21
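One way to produce per-KC feedback of this kind is to compute each student's posterior probability of mastery under the DINA model. A small enumeration-based sketch follows, with hypothetical item parameters; brute-force enumeration is feasible only for a handful of KC's, not 106 (which is where the Bayes-net machinery comes in).

```python
import numpy as np
from itertools import product

def dina_likelihood(x, alpha, Q, guess, slip):
    """Likelihood of response vector x (length J) given skill pattern alpha."""
    eta = np.all(alpha >= Q, axis=1).astype(float)            # (J,)
    p = (1 - slip) ** eta * guess ** (1 - eta)
    return np.prod(p ** x * (1 - p) ** (1 - x))

def kc_mastery_posterior(x, Q, guess, slip, prior=None):
    """Posterior P(alpha_k = 1 | x) for each KC, enumerating all 2^K patterns."""
    K = Q.shape[1]
    patterns = np.array(list(product([0, 1], repeat=K)))
    if prior is None:
        prior = np.full(len(patterns), 1 / len(patterns))     # uniform over classes
    post = np.array([dina_likelihood(x, a, Q, guess, slip) for a in patterns]) * prior
    post /= post.sum()
    return patterns.T @ post                                   # marginal mastery per KC

# Hypothetical three-item, two-KC example:
Q = np.array([[1, 0], [0, 1], [1, 1]])
x = np.array([1, 0, 0])
print(kc_mastery_posterior(x, Q, guess=np.full(3, 0.2), slip=np.full(3, 0.1)))
```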
Results: KC Feedback
• Average percent of
KC’s mastered:
30-40%
• February dip reflects
a recording error for
main questions
• Monthly split-half
cross-val accuracy
68-73% on average
22
Results: KC Feedback
23
Digression: Learning within DINA
• Current model “wakes up reborn” each month: with no data, the posterior falls back to the prior, ignoring previous response behavior.
• Using last month’s posterior as this month’s prior
treats previous response behavior too strongly
(exchangeable with present).
• Wenyi Jiang (ongoing, CMU) is looking at
incorporating a Markov learning model for each
KC in DINA.
24
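An illustrative per-KC Markov transition of the kind that would carry last month's posterior forward as a tempered prior, rather than either discarding it or treating it as exchangeable with the present. This is only a sketch of the general idea, not the actual model in the ongoing work.

```python
import numpy as np

def advance_one_month(p_mastered, p_learn, p_forget=0.0):
    """Carry last month's posterior mastery probability forward as this month's
    prior via a simple per-KC Markov transition (illustrative rates only)."""
    return p_mastered * (1 - p_forget) + (1 - p_mastered) * p_learn

# E.g. a KC with posterior mastery 0.55 in March and a 20% chance of being
# learned during the month becomes an April prior of 0.64:
print(advance_one_month(0.55, p_learn=0.20))
```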
Digression: Question & KC Model
Characteristics
Main item: Which graph contains the points in the table?
(x, y) = (-2, -3), (-1, -1), (1, 3)
Scaffolds:
1. Quadrant of (-2, -3)?
2. Quadrant of (-1, -1)?
3. Quadrant of (1, 3)?
4. [Repeat main]
[Figures: posterior boxplots of the guess parameters g_j and slip parameters s_j]
25
Some questions driven by
ASSISTments
• Different KC models for different purposes seem
necessary.
– How deeply meaningful are the KC’s?
• Q-matrix is KC → task design; what about task → examinee design?
– Henson & Douglas (2005) provide recent developments in KL-based item selection for CDM’s
– Most settings have both designed and undesigned missingness
– Interactions between assignment design and learning
• How close to right does the CDM have to be?
– Douglas & Chiu (2007) have started misspecification studies
– Perhaps the Henson/von Davier frameworks can help?
– For ASSISTments and other settings, this is a sparse data model
fit question!
• How to design and improve the KC model?
26
Some options for
designing/improving KC model
• Expert Opinion, Iterations
• Rule space method (Tatsuoka 1983, 1995)
• Directly minimizing Σ_ij ||X̂_ij − X_ij|| as a function of Q (Barnes 2005, 2006): Boolean regression & variable generation/selection [related: Leenen et al., 2000]
• Learning Factors Analysis (Cen, Koedinger & Junker
2005, 2006): learning curve misfit is a better clue to
improving the Q-matrix than static performance misfit
27
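A rough sketch of the "minimize Σ_ij ||X̂_ij − X_ij|| over Q" idea via a greedy single-entry search. This is a stand-in illustration under a noise-free conjunctive prediction rule, not Barnes' (2005, 2006) actual algorithm.

```python
import numpy as np
from itertools import product

def q_misfit(X, Q):
    """Sum_ij |Xhat_ij - X_ij| for a candidate Q, where each examinee is assigned
    the skill pattern that best reproduces their responses under a noise-free
    conjunctive rule."""
    K = Q.shape[1]
    patterns = np.array(list(product([0, 1], repeat=K)))
    # Predicted response of each pattern to each item: 1 iff all required skills held.
    pred = np.all(patterns[:, None, :] >= Q[None, :, :], axis=2).astype(int)  # (2^K, J)
    # Each examinee gets the pattern minimizing their own misfit; sum those misfits.
    per_pattern_err = np.abs(X[:, None, :] - pred[None, :, :]).sum(axis=2)    # (I, 2^K)
    return per_pattern_err.min(axis=1).sum()

def improve_q(X, Q, n_passes=2):
    """Greedy search: flip single Q entries whenever doing so lowers the misfit."""
    Q = Q.copy()
    for _ in range(n_passes):
        for j, k in product(range(Q.shape[0]), range(Q.shape[1])):
            trial = Q.copy()
            trial[j, k] = 1 - trial[j, k]
            if q_misfit(X, trial) < q_misfit(X, Q):
                Q = trial
    return Q

# Tiny hypothetical example: 3 examinees, 3 items, 2 candidate skills.
X = np.array([[1, 0, 0], [1, 1, 1], [0, 0, 1]])
Q0 = np.array([[1, 1], [1, 1], [1, 1]])
print(improve_q(X, Q0))
```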
From
www.educationaldatamining.org
• Educational Data Mining Workshop, at the 13th International
Conference on Artificial Intelligence in Education (AI-ED). Los
Angeles, California, USA. July 9, 2007.
• Workshop on Educational Data Mining, at the 7th IEEE International
Conference on Advanced Learning Technologies. Niigata, Japan.
During the period July 18-20, 2007.
• Workshop on Educational Data Mining at the 21st National
Conference on Artificial Intelligence (AAAI 2006). Boston, USA. July
16-17, 2006.
• Workshop on Educational Data Mining at the 8th International
Conference on Intelligent Tutoring Systems (ITS 2006). Jhongli,
Taiwan, 2006.
• Workshop on Educational Data Mining at the 20th National Conference on Artificial Intelligence (AAAI 2005). Pittsburgh, USA, 2005.
28
From AAAI 2005
• Evaluating the Feasibility of Learning Student Models from Data (Anders Jonnson, Jeff Johns, Hasmik Mehranian, Ivon Arroyo, Beverly Woolf, Andrew Barto, Donald Fisher, and Sridhar Mahadevan)
• Topic Extraction from Item-Level Grades (Titus Winters, Christian Shelton, Tom Payne, and Guobiao Mei)
• An Educational Data Mining Tool to Browse Tutor-Student Interactions: Time Will Tell! (Jack Mostow, Joseph Beck, Hao Cen, Andrew Cuneo, Evandro Gouvea, and Cecily Heiner)
• A Data Collection Framework for Capturing ITS Data Based on an Agent Communication Standard (Olga Medvedeva, Girish Chavan, and Rebecca S. Crowley)
• Data Mining Patterns of Thought (Earl Hunt and Tara Madhyastha)
• The Q-matrix Method: Mining Student Response Data for Knowledge (Tiffany Barnes)
• Automating Cognitive Model Improvement by A* Search and Logistic Regression (Hao Cen, Kenneth Koedinger, and Brian Junker)
• Looking for Sources of Error in Predicting Student’s Knowledge (Mingyu Feng, Neil T. Heffernan, and Kenneth R. Koedinger)
• Time and Attention: Students, Sessions, and Tasks (Andrew Arnold, Richard Scheines, Joseph E. Beck, and Bill Jerome)
• Logging Students’ Model-Based Learning and Inquiry Skills in Science (Janice Gobert, Paul Horwitz, Barbara Buckley, Amie Mansfield, Edmund Burke, and Dimitry Markman)
29
Educational Data Mining
• Often very clever algorithms & data
management, not constrained by quant or
measurement traditions
• A strength (open to new approaches)
• A weakness (re-inventing the wheel, failing
to see where a well-understood difficulty
lies, etc)
30
Conclusions? Questions…
• Lots of options for CDM’s, not yet much practical
experience beyond “my model worked here”
• Significant design questions remain, and seem
to admit quantitative solutions
• Need to be connected to real projects
– real world constraints
– real world competitors in EDM
• It would be mutually advantageous to join with
EDM and draw EDM (partially?) into our
community…
Can we do it? Do we want to?
31
END
(references follow)
32
REFERENCES
•
Anozie, N. O. (2006). Investigating the utility of a conjunctive model in Q matrix assessment using monthly student records in an online tutoring system. Proposal to the 2007 Annual Meeting of the National Council on Measurement in Education.
•
Anozie, N.O. & Junker, B.W. (2006). Predicting end-of-year accountability assessment scores
from monthly student records in an online tutoring system. American Association for Artificial
Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
•
Anozie, N. O. & Junker, B. W. (2007). Investigating the utility of a conjunctive model in Q matrix assessment using monthly student records in an online tutoring system. Paper presented to the Annual Meeting of the National Council on Measurement in Education. Chicago, IL.
•
Ayers, E. & Junker, B. W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for Artificial Intelligence Workshop on Educational Data
Mining (AAAI-06), July 17, 2006, Boston, MA.
•
Ayers, E. & Junker, B. W. (2006). IRT modeling of tutor performance to predict end of year exam
scores. Submitted for publication.
•
Barnes, T. (2005). Q-matrix Method: Mining Student Response Data for Knowledge. In the
Proceedings of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005 (AAAI
Technical Report #WS-05-02).
33
•
Barnes, T., J. Stamper, T. Madhyastha. (2006). Comparative analysis of concept derivation using
the q-matrix method and facets. Proceedings of the AAAI 21st National Conference on Artificial
Intelligence Educational Data Mining Workshop (AAAI2006), Boston, MA, July 17, 2006.
•
Campione, J. C., Brown, A. L., & Bryant, N. R. (1985). Individual differences in learning and memory. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 103–126). New York: W. H. Freeman.
•
Cen, H., Koedinger, K., & Junker, B. (2005). Automating Cognitive Model Improvement by A* Search and Logistic Regression. In Technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005.
•
Cen, H., K. Koedinger, & B. Junker (2006). Learning factors analysis: a general method for
cognitive model evaluation and improvement. Presented at the Eighth International Conference on
Intelligent Tutoring Systems (ITS 2006), Jhongli, Taiwan.
•
Cen, H., K. Koedinger, & B. Junker (2007). Is more practice necessary? Improving learning
efficiency with the Cognitive Tutor through educational data mining. Presented at the 13th Annual
Conference on Artificial Intelligence in Education (AIED 2007), Los Angeles CA.
•
De la Torre, J. & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis.
Psychometrika, 69, 333-353.
•
de la Torre, J., & Douglas, J. A. (2005). Modeling multiple strategies in cognitive diagnosis. Paper
presented at the annual meeting of the National Council on Measurement in Education, Montréal,
QC, Canada.
•
DiBello, L. V., Stout, W. F., & Roussos, L. A. (1995). Unified cognitive/psychometric diagnostic assessment likelihood-based classification techniques. In P. D. Nichols, S. F. Chipman, & R. L. Brennan (Eds.), Cognitively diagnostic assessment (pp. 361-389). Hillsdale, NJ: Lawrence Erlbaum.
34
•
Douglas, J. A. & Chiu, C.-Y. (2007). Relationships Between Competing Latent Variable Models:
Implications of Model Misspecification. Paper presented at the Annual Meeting of the National
Council on Research in Education, Chicago IL.
•
Embretson, S. E. (1984). A General Latent Trait Model for Response Processes. Psychometrika,
49, 175–186.
•
Embretson, S. E. (1997). Multicomponent response models. Chapter 18, pp. 305-322 in van der Linden, W. J. and Hambleton, R. K. (Eds.) (1997). Handbook of Modern Item Response Theory. New York: Springer.
•
Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with
intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley
& Chan (Eds.) Proceedings of the Eighth International Conference on Intelligent Tutoring
Systems. Springer-Verlag: Berlin. pp 31–40.
•
Fischer, G. H. (1995). The linear logistic test model. Chapter 8, pp. 131-156 in Fischer, G. H. &
Molenaar, I. (1995). Rasch Models: Foundations, Recent Developments, and Applications. New
York: Springer.
•
Fu, J., & Li, Y. (2007). Cognitively Diagnostic Psychometric Models: An Integrative Review. Paper
presented at the Annual Meeting of the National Council on Measurement in Education. Chicago
IL.
•
Grigorenko, E. L. and Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124, 75–
111.
•
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of
achievement items. Journal of Educational Measurement, 26, 301–321.
35
•
Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL.
•
Henson, R. A., Templin, J. L., & Willse, J. T. (2007). Defining a family of cognitive diagnosis
models using log-linear models with latent variables. Invited paper presented at the Annual
Meeting of the National Council on Measurement in Education. Chicago IL.
•
Henson, R. A., & Douglas, J. (2005). Test Construction for Cognitive Diagnosis. Applied
Psychological Measurement, 29, 262–277.
•
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.
•
Leenen, I., Van Mechelen, I., & Gelman, A. (2000). Bayesian probabilistic extensions of a
deterministic classification model. Computational Statistics, 15, 355-371.
•
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187-212.
•
Mislevy, R.J., Almond, R.G., Yan, D., & Steinberg, L.S. (1999). Bayes nets in educational
assessment: Where do the numbers come from? In K.B. Laskey & H.Prade (Eds.), Proceedings of
the Fifteenth Conference on Uncertainty in Artificial Intelligence (437-446). San Francisco: Morgan
Kaufmann.
•
Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 33, 379-416.
36
•
Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using Fine Grained Skill
Models to Fit Student Performance with Bayesian Networks. Workshop in Educational Data
Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.
•
Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied
Psychological Measurement, 9, 401-412.
•
Reckase, M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271-286). New York, NY: Springer-Verlag.
•
Roussos, L. (1994). Summary and review of cognitive diagnosis models. Unpublished manuscript.
•
Roussos, L., diBello, L. V., Stout, W., Hartz, S., Henson, R. A., & Templin, J. H. (in press). The
fusion model skills diagnosis system. In J. P. Leighton, & Gierl, M. J. (Ed.), Cognitively diagnostic
assessment for education: Theory and practice. Thousand Oaks, CA: SAGE.
•
Rupp, A. A. (2007). Unique Characteristics of Cognitive Diagnosis Models. Invited paper
presented at the Annual Meeting of the National Council on Measurement in Education. Chicago
IL.
•
Tatsuoka, K. K. (1983). Rule space: an approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.
•
Tatsuoka, K. K. (1995). Architecture of knowledge structures and cognitive diagnosis: a statistical
pattern recognition and classification approach. Chapter 14 in Nichols, P. D., Chipman, S. F. and
Brennan, R. L. (eds.) (1995). Cognitively diagnostic assessment. Hillsdale, NJ: Lawrence Erlbaum
Associates.
37
•
Templin, J., & Henson, R. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287-305.
•
Weaver, R. & Junker, B. W. (2004). Investigating the foundations of a cognitively
diagnostic assessment through both traditional psychometric and skills-based
measurement models: Advanced Data Analysis Final Report. Unpublished technical
report. Pittsburgh, PA: Department of Statistics, Carnegie Mellon University.
•
von Davier, M. (2005). A General Diagnostic Model Applied to Language Testing
Data. Technical Report RR-05-16. Princeton NJ: Educational Testing Service.
38