Transcript Document

Addressing the Testing Challenge
with a Web-Based E-Assessment
System that Tutors as it Assesses
Mingyu Feng, Worcester Polytechnic Institute (WPI)
Neil T. Heffernan, Worcester Polytechnic Institute (WPI)
Kenneth R. Koedinger, Carnegie Mellon University (CMU)
The “ASSISTment” System


An e-assessment and e-learning system
that does both ASSISTing of students and
assessMENT (movie)

www.assistment.org

Massachusetts Comprehensive Assessment System
“MCAS”
Web-based system built on
Common Tutoring Object
Platform (CTOP) [1]
[1] Nuzzo-Jones, G., Macasek, M.A., Walonoski, J., Rasmussen, K.P., Heffernan, N.T. Common
Tutor Object Platform, an e-Learning Software Development Strategy. WPI technical report
WPI-CS-TR-06-08.
May 25th, 2006
WWW’06
ASSISTment




We break multi-step problems
into “scaffolding questions”
“Hint Messages”: given on
demand that give hints about
what step to do next
“Buggy Message”: a context
sensitive feedback message
“Knowledge Components”:
Skills, Strategies, concepts



The state reports to teachers on
5 areas
We seek to report on 100
knowledge components
How does a student work with
the ASSISTment? (movie)
The original question
(Demo/movie)
a. Congruence
b. Perimeter
c. Equation-Solving
The 1st scaffolding question
Congruence
The 2nd scaffolding question
Perimeter
A buggy message
A hint message
Goal


Help students learn (this paper’s goal [2][3])
Assess students’ performance and present
results to teachers (the focus of this work)

Online “Grade book” report
[2] Razzaq, L., Feng, M., Nuzzo-Jones, G.,
Heffernan, N.T., Koedinger, K.R., Junker, B.,
Ritter, S., Knight, A., Aniszczyk, C., Choksey, S.,
Livak, T., Mercado, E., Turner, T.E., Upalekar, R.,
Walonoski, J.A., Macasek, M.A., Rasmussen, K.P.
(2005). The Assistment Project: Blending
Assessment and Assisting. In C.K. Looi, G.
McCalla, B. Bredeweg, & J. Breuker (Eds.)
Proceedings of the 12th International Conference on
Artificial Intelligence In Education, 555-562.
Amsterdam: IOS Press.
[3] Razzaq, L., Heffernan, N.T. (in press).
Scaffolding vs. hints in the Assistment System. In
Ikeda, Ashley & Chan (Eds.). Proceedings of the
Eighth International Conference on Intelligent
Tutoring Systems. Springer-Verlag: Berlin. pp. 635-644. 2006.
Outline for the talk


Part I: Using Dynamic Measures
Part II: Longitudinal models tracking student
learning over time


Able to tell which schools provide the most
learning to students
Can we tell teachers which skills are being
learned?
Data Source



600+ students of two middle schools
Used the ASSISTment system every
other week from Sep. 2004 to June 2005
Real MCAS score


test taken in May 2005
2 paper and pencil based tests,
administered in Sep. 2004 and March
2005.
Part I: Using Dynamic Measures

Research Questions


Can we predict students' MCAS scores more accurately
by using the online assistance information (time,
performance on scaffolding questions, number of
attempts, number of hints)?
Can this online assessment system predict MCAS
scores better than the traditional paper-and-pencil
test does?
Part I: Using Dynamic Measures

Approach


Run forward stepwise linear regression to train up
regression models using different independent variables
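As a rough illustration of the approach, forward stepwise selection can be sketched in a few lines of plain Python. The slides do not say which entry criterion was used, so this sketch assumes BIC in its Gaussian form, n ln(RSS/n) + k ln(n), which may differ from the BIC variant behind the reported values. The feature names and all data in the usage example are hypothetical.

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on the columns plus an intercept.

    Solves the normal equations by Gaussian elimination; assumes the
    columns are not collinear.
    """
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in columns]
    k = len(cols)
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols] for ci in cols]
    c = [sum(ci[r] * y[r] for r in range(n)) for ci in cols]
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        c[p], c[pivot] = c[pivot], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for q in range(p, k):
                a[r][q] -= f * a[p][q]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][q] * b[q] for q in range(p + 1, k))) / a[p][p]
    return sum((y[r] - sum(b[j] * cols[j][r] for j in range(k))) ** 2 for r in range(n))

def bic(rss, n, k):
    """Gaussian BIC up to an additive constant (assumed form, see lead-in)."""
    return n * math.log(rss / n) + k * math.log(n)

def forward_stepwise(features, y):
    """Greedily add the feature that lowers BIC the most; stop when none does."""
    n = len(y)
    selected = []
    best = bic(ols_rss([], y), n, 1)  # intercept-only baseline
    while True:
        scored = []
        for name in features:
            if name in selected:
                continue
            cols = [features[m] for m in selected + [name]]
            scored.append((bic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not scored:
            break
        score, name = min(scored)
        if score >= best:
            break
        best = score
        selected.append(name)
    return selected

# Hypothetical data: scores driven by percent correct and hint use, plus small noise.
features = {
    "pct_correct": [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.3],
    "avg_hint": [3.0, 2.5, 1.0, 2.0, 0.5, 1.5, 0.2, 2.8],
    "seat_number": [3, 1, 4, 1, 5, 9, 2, 6],  # deliberately irrelevant
}
noise = [0.5, 0.4, -0.3, -0.2, 0.1, -0.3, -0.4, 0.2]
y = [20 + 30 * p - 2 * h + e
     for p, h, e in zip(features["pct_correct"], features["avg_hint"], noise)]
selected = forward_stepwise(features, y)
print(selected)
```

Statistical packages often use an F-test or p-value threshold for entry instead; swapping the criterion only changes the `bic` function and the stopping test.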
Result

Model      Independent variables                    # Variables entered   R2     BIC+   MAD*
Model I    Paper practice results only              2                     .588   -358   6.20881
Model II   The single online static metric of       1                     .567   -343   6.21108
           percent correct on original questions
Model III  Model II plus all other online measures  5                     .663   -423   5.44183

+ BIC: Bayesian Information Criterion
* MAD: Mean Absolute Deviance
Part I: Using Dynamic Measures
Model III

Order  Variables                  Coeff.    Std. Coeff.
1      PERCENT_CORRECT            32.976    .425
2      AVG_ATTEMPT                -11.209   -.199
3      AVG_ITEM_TIME              -.037     -.143
4      AVG_HINT_REQUEST           -2.420    -.121
5      ORIGINAL_PERCENT_CORRECT   12.618    1.66

What do we see from Model III?

The more hints, attempts, and time a student needs to solve
a problem, the worse the student's predicted score.
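To make the signs of the Model III coefficients concrete, the short sketch below computes how the predicted score changes when a student's online measures change. The slide does not report the intercept, so only differences between predictions can be computed (the intercept cancels). The example student is hypothetical.

```python
# Unstandardized coefficients of Model III, as reported on the slide.
MODEL_III = {
    "PERCENT_CORRECT": 32.976,
    "AVG_ATTEMPT": -11.209,
    "AVG_ITEM_TIME": -0.037,
    "AVG_HINT_REQUEST": -2.420,
    "ORIGINAL_PERCENT_CORRECT": 12.618,
}

def predicted_score_change(deltas):
    """Change in predicted MCAS score for a given change in each measure."""
    return sum(MODEL_III[name] * delta for name, delta in deltas.items())

# Hypothetical: a student who, on average, needs one more hint
# and half an attempt more per item than another student.
change = predicted_score_change({"AVG_HINT_REQUEST": 1.0, "AVG_ATTEMPT": 0.5})
print(round(change, 4))
```

With all other measures held equal, the extra hint and extra half attempt lower the predicted score by about 8 points.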
Part II: Track Learning Longitudinally

Recall the problems of prediction in the Grade book
Only based on a static measure (discussed in Part I)
Time ignored (addressed in Part II)

What if we take time into consideration?
Research Questions
Can our system detect performance improving over time?
Can we tell the difference in learning rates of students
from different schools? Teachers? (Who cares?)
Do students differ in how they learn different skills?
Approach -- longitudinal data analysis
Note: Different from Razzaq, Feng et al., which looks at student performance gain over learning
opportunity pairs within the ASSISTment system, here “learning” includes students learning in
class too.
Longitudinal Data Analysis

What do we get from a longitudinal model?

Average population trajectory for the specified group

Trajectory indicated by two parameters
intercept: $\gamma_{00}$
slope: $\gamma_{10}$
The average estimated score for a group at time j is
$\hat{Y}_{j} = \gamma_{00} + \gamma_{10} \cdot TIME_{j}$

One trajectory for every single student

Each student gets two parameters that vary from the group
average
intercept: $\gamma_{00} + \zeta_{0i}$
slope: $\gamma_{10} + \zeta_{1i}$
The estimated score for student i at time j is
$\hat{Y}_{ij} = (\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i}) \cdot TIME_{j}$

Students’ initial knowledge is indicated by the
intercept, while the slope shows the learning rate
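The two trajectory formulas can be sketched directly in code. The symbol names follow Singer and Willett's notation (gamma for the group parameters, zeta for each student's deviation), which is an assumption about the symbols lost from the slide. The group values 18 and 1.29 are the growth-model estimates reported in the Part II results of this talk; the student deviations, and the choice of month 0 as the starting point, are hypothetical.

```python
def group_score(gamma00, gamma10, month):
    """Average estimated score for the group at a given time."""
    return gamma00 + gamma10 * month

def student_score(gamma00, gamma10, zeta0, zeta1, month):
    """Estimated score for one student, whose intercept and slope
    deviate from the group average by zeta0 and zeta1."""
    return (gamma00 + zeta0) + (gamma10 + zeta1) * month

# Group estimates reported in the Part II results of the talk.
GAMMA00, GAMMA10 = 18.0, 1.29

# Hypothetical student: starts 3 points above average, learns 0.29 points/month slower.
ZETA0, ZETA1 = 3.0, -0.29

group_at_5 = group_score(GAMMA00, GAMMA10, 5)
student_at_5 = student_score(GAMMA00, GAMMA10, ZETA0, ZETA1, 5)
print(round(group_at_5, 2), round(student_at_5, 2))
```

In this example the student starts ahead of the group, but the gap narrows over time because the student's slope is smaller.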
[4] Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. Oxford University
Press, New York.
17 students from one class: % correct (Y axis) over a given month (X axis)
Table 2. Regression Models
Part II: Track Learning Longitudinally

Result


Unconditional model (model A): no predictors
Growth model (model B)
estimated initial average PredictedScore = 18
estimated average monthly learning rate = 1.29
Observation: students were learning over time
Add in school/teacher/class (model D/E/F)
Model D shows a statistically significant
advantage as measured by BIC
Observation: students from different
schools differ on both incoming
knowledge and learning rate

Model                                              BIC     #param
Unconditional means model (Model A, no predictor)  31712   3
Unconditional growth model (Model B, TIME)         31628   6
Model D (TIME + SCHOOL)                            31616   8
Model E (TIME + TEACHER)                           31672   20
Model F (TIME + CLASS)                             31668   70

BIC difference, Model A to Model B: Diff = 84
BIC difference, Model B to Model D: Diff = 12
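The model comparison above comes down to simple arithmetic on the reported BIC values, where a lower BIC is better. The sketch below recomputes the two differences shown on the slide and picks the model with the lowest BIC. The dictionary layout and function name are illustrative only.

```python
# BIC and parameter counts as reported on the slide.
MODELS = {
    "A": {"predictors": "none", "bic": 31712, "n_params": 3},
    "B": {"predictors": "TIME", "bic": 31628, "n_params": 6},
    "D": {"predictors": "TIME + SCHOOL", "bic": 31616, "n_params": 8},
    "E": {"predictors": "TIME + TEACHER", "bic": 31672, "n_params": 20},
    "F": {"predictors": "TIME + CLASS", "bic": 31668, "n_params": 70},
}

def bic_drop(baseline, candidate):
    """How much BIC falls when moving from baseline to candidate (positive = better)."""
    return MODELS[baseline]["bic"] - MODELS[candidate]["bic"]

best_model = min(MODELS, key=lambda name: MODELS[name]["bic"])

print(bic_drop("A", "B"))  # adding TIME
print(bic_drop("B", "D"))  # adding SCHOOL on top of TIME
print(best_model)
```

Models E and F have a higher BIC than Model B despite their extra parameters, which is consistent with the slide singling out Model D. A commonly cited rule of thumb treats a BIC difference above 10 as strong evidence for the lower-BIC model.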
Part II: Track Learning Longitudinally

The last question

Can we detect differences
in the learning rates of
different skills?
[Figure: Growth of 5 Skills over Time for One Student. Y axis: Percent Correct (0 to 80). X axis: Time (Sept to March). Series: Geometry, Algebra, Measurement, Data Analysis, Number Sense.]
[Figure: Growth of 5 Skills over Time for One Student, with a linear trend line for each skill. Y axis: Percent Correct (0 to 80). X axis: Time (Sept to March). Series: Geometry, Algebra, Measurement, Data Analysis, Number Sense.]
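The linear trend lines in the figure can be reproduced with an ordinary least-squares fit per skill, where the slope serves as a simple per-skill learning rate. The sketch below is a simplification, not the mixed-effects model used in the paper, and the monthly percent-correct values are hypothetical because the plotted data points are not available in this transcript.

```python
def fit_line(ys):
    """Least-squares (intercept, slope) for values observed at months 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# HYPOTHETICAL percent-correct values for one student, Sept through March.
scores = {
    "Geometry": [20, 25, 30, 35, 40, 45, 50],
    "Algebra": [30, 30, 35, 35, 40, 40, 45],
}

trends = {skill: fit_line(ys) for skill, ys in scores.items()}
for skill, (intercept, slope) in trends.items():
    print(f"{skill}: start {intercept:.1f}%, {slope:.2f} points per month")
```

Comparing the fitted slopes across skills gives a first, rough answer to whether a student is learning some skills faster than others.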
Part II: Track Learning Longitudinally

The last question

Can we detect differences
in the learning rates of
different skills?
Yes, we can! In this paper we showed that we can use the model with 5
skills to predict the students' own data more accurately.
More recent studies we have done show that an even
finer-grained model (98 skills) is better not only at predicting our
online data, but also at predicting the students' test scores.
[7] Pardos, Z. A., Heffernan, N. T., Anderson, B. & Heffernan, C. (in press). Using Fine-Grained Skill
Models to Fit Student Performance with Bayesian Networks. Workshop in Educational Data Mining held
at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.
[8] Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (in press). Using Mixed-Effects Modeling to
Compare Different Grain-Sized Skill Models. AAAI'06 Workshop on Educational Data Mining, Boston,
2006.
Large Scale : ASSISTment project

ASSISTments are tagged with skills
Large Scale : ASSISTment project

Is the mapping of skills/knowledge components
any good?

Teachers get reports that they think are credible and
useful. [6]
[6] Feng, M., Heffernan, N.T. (in press). Informing Teachers Live about
Student Learning: Reporting in the Assistment System. To be published
in Technology, Instruction, Cognition, and Learning Journal Vol. 3. Old
City Publishing, Philadelphia, PA. 2006.
Large Scale : ASSISTment project

We built 300 ASSISTments, providing ~8
hours of content, using the Builder [5]

Is the content we created good at producing
learning?

Do students learn from these? [2]
Good enough that it's used by 1,500 8th graders in
Worcester, every two weeks as part of their math
class. (2nd year)
[5] Heffernan N.T., Turner T.E., Lourenco A.L.N., Macasek M.A., Nuzzo-Jones G., Koedinger K.R., The ASSISTment builder:
Towards an Analysis of Cost Effectiveness of ITS creation, Accepted by FLAIRS2006, Florida, USA (2006).
Large Scale : ASSISTment project

Other work using hints, attempts, and time

Can detect who is “gaming” and prevent it
Machine learning of user models
[9] Walonoski, J., Heffernan, N.T. (accepted). Detection and Analysis of Off-Task Gaming Behavior in
Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International
Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 382-391. 2006.
[10] Walonoski, J., Heffernan, N. T. (accepted). Prevention of Off-Task Gaming Behavior in Intelligent
Tutoring Systems. Proceedings of the Eighth International Conference on Intelligent Tutoring Systems.
Conclusion



Our online assessment system did a better
job of predicting student knowledge by being
able to take into consideration how much
tutoring assistance was needed.
Promising evidence was found that the online
system was able to track students’ learning
well over the course of a year.
We found that the system could reliably track
students’ learning of individual skills.
Some of the ASSISTMENT TEAM
Leena RAZZAQ*, Mingyu FENG, Goss NUZZO-JONES, Neil T. HEFFERNAN,
Kenneth KOEDINGER+, Brian JUNKER+, Steven RITTER, Andrea KNIGHT+,
Carnegie Learning
Edwin MERCADO*, Terrence E. TURNER, Ruta UPALEKAR, Jason A. WALONOSKI,
Michael A. MACASEK, Christopher ANISZCZYK, Sanket CHOKSEY, Tom LIVAK, Kai RASMUSSEN

* This research was made possible by the US Dept of Education, Institute of Education
Science, "Effective Mathematics Education Research" program grant #R305K03140, the Office of
Naval Research grant #N00014-03-1-0221, NSF CAREER award to Neil Heffernan, and the Spencer
Foundation. Authors Razzaq and Mercado were funded by the National Science Foundation under
Grant No. 0231773. All the opinions in this article are those of the authors, and not those of any
of the funders.
Future work

Predict Student State Test Scores




Regression + longitudinal analysis [9]
Incorporate finer grained cognitive models
Item level prediction [8]
Apply the models in current reporting system
[9] Feng, M., Heffernan, N.T., & Koedinger, K.R. (in press). Predicting state test scores better with intelligent
tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.).
Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp.
31-40. 2006.
[8] Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006, accepted). Using Mixed-Effects Modeling to
Compare Different Grain-Sized Skill Models. AAAI'06 Workshop on Educational Data Mining, Boston, 2006.