154 - Modelling the way - International Educational Data Mining

Diagnosis through problem solving approaches
Kelvin H R Ng | Kevin Hartman | Kai Liu | Andy W H Khong
Nanyang Technological University, Singapore
Overview
• Singapore mathematics pedagogy
• RIGHT method
• Objective
• Platform design
• Data collection
• Preprocessing
• Clustering
• Discussion
Singapore Math – Second Grade Subtraction
Question: Mr Chew has 39 Mathematics workbooks on his table. He has 3 fewer English workbooks than Mathematics workbooks on his table. How many English workbooks are there?
[Bar model: Math bar = 39; English bar = ?, shorter than the Math bar by 3]
Singapore Math – Second Grade Multiplication
Question: There are 5 plates of food. Each plate has 3 pies. How many pies are there altogether?
[Bar model: five equal units of 3; total = ?]
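As a worked check of the two bar models above (the answers are not stated on the slides and are computed here from the questions):

```latex
\[
\text{English workbooks} = 39 - 3 = 36,
\qquad
\text{Pies altogether} = 5 \times 3 = 15
\]
```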
RIGHT¹
• R: Read the word problem
• I: Identify the nouns, numeric values, and unknown variable
• G: Graph these values in a box diagram
• H: Have the appropriate calculation done by reasoning through the diagram
• T: Triple-check and review the work
1 Polya, G. (2014). How to solve it: A new aspect of mathematical method. Princeton University Press.
Objectives
• Identifying problem solving approaches
• Differentiating schematic and script-like processes
Script¹: a collection of discrete actions, followed to achieve a goal or specific outcome. Example: ordering food at a restaurant.
Schema²: a consolidation of known methods to achieve a general goal. Example: methods to obtain meals (ordering food, cooking, etc.).
1 Abelson, R. P. (1981). Psychological status of the script concept. American Psychologist, 36(7), 715.
2 Rumelhart, D., & Ortony, A. (1979). The representation of knowledge in memory. Representation and understanding: Studies in cognitive science (Bobrow, D. G., & Collins, A. (Eds.)), 211-236.
Platform Design
• Structured
• Multiple choice
• Lecture
• Unstructured
Data Collection
• E-learning arithmetic module
• Addition, subtraction, multiplication, division
• 36 second-grade students
• Non-compulsory holiday assignment
[Figure: curriculum timeline with a "Current progress" marker. 1st Grade; 2nd Grade: model drawing for 1-step addition/subtraction; 3rd Grade: model drawing for 1-step multiplication/division]
Data Preprocessing
• Non-first MCQ choices → alter choice
‒ Indicators for highlighting and undo events
‒ Model template selection activity
‒ Remove correctness from true/false action
Each action sequence runs from initiating a problem to moving on to another problem or activity.
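The preprocessing rules above can be sketched roughly as follows. The raw event names (`start_problem`, `select_choice`, `highlight`, `submit`, `true_false:correct`) and the `TF` token are hypothetical, since the slides do not show the platform's log format; only the SL/AT/HL/SM codes come from the deck.

```python
# Hypothetical mapping from raw event names to action tokens.
# SL/AT/HL/SM follow the deck's legend; TF is an assumed code.
TOKEN = {"highlight": "HL", "submit": "SM", "true_false": "TF"}


def split_into_sequences(events):
    """Cut a raw log into sequences, one per initiated problem."""
    sequences, current = [], None
    for ev in events:
        if ev == "start_problem":
            if current:  # close the previous non-empty sequence
                sequences.append(current)
            current = []
        elif current is not None:
            current.append(ev)
    if current:
        sequences.append(current)
    return sequences


def tokenize(raw_events):
    """Map raw events of one sequence to action tokens."""
    tokens, chose_before = [], False
    for ev in raw_events:
        name = ev.split(":")[0]  # drop correctness suffix, e.g. ":correct"
        if name == "select_choice":
            # First MCQ selection is SL; every later one is an alteration (AT).
            tokens.append("AT" if chose_before else "SL")
            chose_before = True
        elif name in TOKEN:
            tokens.append(TOKEN[name])
        # Events with unknown names are ignored in this sketch.
    return tokens


def preprocess(log):
    return [tokenize(seq) for seq in split_into_sequences(log)]
```

With these assumptions, a log of three selections, a submit, a highlight, another selection and a submit yields SL-AT-AT-SM-HL-AT-SM.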
Clustering
• Affinity propagation
• Pairwise similarity
  ‒ Sequence length difference
  ‒ Jaccard distance
  ‒ Common word order
  ‒ Inverse document frequency
Common word order
• Capturing the order of common atomic units between two sequences
\[
CWO(S_1, S_2) =
\begin{cases}
1 - \dfrac{2 \sum_{i=1}^{l} |x_i - y_i|}{l^2}, & \text{if } l \text{ is even} \\[2ex]
1 - \dfrac{2 \sum_{i=1}^{l} |x_i - y_i|}{l^2 - 1}, & \text{if } l \text{ is odd} \\[2ex]
1, & \text{if } l \text{ is odd and } l = 1
\end{cases}
\]
Example:
S1: A brown dog is eating near the wall
S2: A cat is eating beside the brown wall
[Figure: the common words of S1 and S2, numbered 1 to 6 to show their order in each sentence]
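A minimal sketch of the common word order measure, under a few assumptions that the slide does not spell out: l is the number of common units, x_i = i is the rank of the i-th common unit in S1, y_i is the S1 rank of the i-th common unit encountered in S2, repeated units are matched by occurrence (first with first, second with second), and l = 0 returns 0.

```python
from collections import Counter


def label_occurrences(tokens):
    """Turn tokens into (token, occurrence number) pairs so repeats stay distinct."""
    seen = Counter()
    labelled = []
    for tok in tokens:
        seen[tok] += 1
        labelled.append((tok, seen[tok]))
    return labelled


def common_word_order(s1, s2):
    a = label_occurrences(s1)
    b = label_occurrences(s2)
    shared = set(a) & set(b)
    # Rank of every common unit by its position in S1 (1-based).
    rank_in_s1 = {
        lab: i
        for i, lab in enumerate((lab for lab in a if lab in shared), start=1)
    }
    # Walk S2 and record the S1 rank of each common unit.
    y = [rank_in_s1[lab] for lab in b if lab in shared]
    l = len(y)
    if l == 0:
        return 0.0  # assumption: no common units means no shared order
    if l == 1:
        return 1.0
    total = sum(abs(x - yi) for x, yi in enumerate(y, start=1))
    denom = l * l if l % 2 == 0 else l * l - 1
    return 1.0 - 2.0 * total / denom


s1 = "A brown dog is eating near the wall".lower().split()
s2 = "A cat is eating beside the brown wall".lower().split()
print(round(common_word_order(s1, s2), 3))  # 0.667
```

For the two example sentences the common words are a, brown, is, eating, the, wall (l = 6), the rank differences sum to 6, and the score is 1 − 2·6/36 = 2/3.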
Common word order
• Originally designed for text
• Sparse bag-of-words
Example action sequences and their order indices:
S1: SL-AT-AT-SM-HL-AT-SM → 1-1-2-3-4-5-1
S2: SL-AT-SM-AT-AT-HL-SM → 1-1-3-2-5-4-1
Legend:
SL – select MCQ choice
HL – highlight keyword
SM – submit answer
AT – alter MCQ choice
Jaccard Distance
• Dissimilarity of unique terms
\[
JaccardDist(S_1, S_2) = 1 - JaccardSim(S_1, S_2)
\]
\[
JaccardSim(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}
\]
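A minimal sketch of the Jaccard distance over the unique action tokens of two sequences. Returning 0 for two empty sequences is an assumption; the slide does not cover that case.

```python
def jaccard_distance(s1, s2):
    """1 - |S1 ∩ S2| / |S1 ∪ S2| over the sets of unique tokens."""
    a, b = set(s1), set(s2)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sequences are treated as identical
    return 1.0 - len(a & b) / len(union)


# Two sequences that use the same four actions in a different order
# have distance 0, which is why an order-sensitive term is also needed.
print(jaccard_distance("SL-AT-AT-SM-HL-AT-SM".split("-"),
                       "SL-AT-SM-AT-AT-HL-SM".split("-")))  # 0.0
```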
Term rarity
• Capture rarity of non-common terms
\[
idf(t_i, D) = \log \frac{|D|}{|\{d \in D : t_i \in d\}|}
\]
\[
TR = \max_{a_i \in (S_1 \cup S_2) \setminus (S_1 \cap S_2)} idf(a_i, D)
\]
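A minimal sketch of the term rarity component. It assumes a natural logarithm (the slide only writes "log"), that both sequences belong to the corpus D so every term has a document frequency of at least 1, and that TR is 0 when the two sequences share all their terms.

```python
import math


def idf(term, corpus):
    """Inverse document frequency of a term over a corpus of token sequences."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)


def term_rarity(s1, s2, corpus):
    """Largest idf among terms that appear in exactly one of the two sequences."""
    non_common = set(s1) ^ set(s2)  # (S1 ∪ S2) \ (S1 ∩ S2)
    if not non_common:
        return 0.0  # assumption: no non-common terms means no rarity penalty
    return max(idf(t, corpus) for t in non_common)


corpus = [["SL", "SM"], ["SL", "HL", "SM"], ["SL", "AT", "SM"], ["SL", "SM"]]
print(term_rarity(corpus[0], corpus[1], corpus))  # log(4 / 1): HL occurs in 1 of 4 sequences
```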
Sequence length difference
\[
ld_{ij} = abs(length(S_1) - length(S_2))
\]
Clustering
• Affinity propagation
• Pairwise similarity
  ‒ Sequence length difference
  ‒ Jaccard distance
  ‒ Common word order
  ‒ Inverse document frequency
\[
\begin{aligned}
dist(S_1, S_2) ={} & w_1 \cdot JaccardDist(S_1, S_2) + w_2 \cdot CWO(S_1, S_2) \\
& + w_3 \cdot \max_{t_j \notin S_1 \cap S_2} idf(t_j, D) \\
& + w_4 \cdot abs(length(S_1) - length(S_2))
\end{aligned}
\]
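A rough sketch of how the weighted sum could be assembled and turned into the similarity matrix that affinity propagation expects. The weights are hypothetical (the slides give no values), only two of the four components are wired in to keep the example short, and negating the distance to obtain a similarity is an assumption. Common word order and term rarity would plug in as further `(weight, function)` pairs.

```python
def jaccard_dist(s1, s2):
    a, b = set(s1), set(s2)
    return 1.0 - len(a & b) / len(a | b) if (a | b) else 0.0


def length_diff(s1, s2):
    return abs(len(s1) - len(s2))


def combined_distance(s1, s2, weighted_terms):
    """Weighted sum of pairwise components; weighted_terms is [(weight, function), ...]."""
    return sum(w * f(s1, s2) for w, f in weighted_terms)


def similarity_matrix(sequences, weighted_terms):
    """Negated pairwise distances, so that larger values mean more similar."""
    n = len(sequences)
    return [
        [-combined_distance(sequences[i], sequences[j], weighted_terms)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical weights, for illustration only.
terms = [(1.0, jaccard_dist), (0.5, length_diff)]
seqs = [["SL", "SM"], ["SL", "AT", "SM"], ["SL", "SM"]]
sim = similarity_matrix(seqs, terms)
```

A matrix like `sim` can be handed to an affinity propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`.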
Post-Clustering
• Sequential pattern mining¹ to summarize each cluster with a descriptive label
• Merging clusters with similar semantics²
• Bypass order of actions between delimiters
Action sequence archetype (ASA) ↔ Cluster
1 Hu, Y. H., Wu, F., & Liao, Y. J. (2013). An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports. Journal of Systems and Software, 86(5), 1224-1238.
2 Southavilay, V., Markauskaite, L., & Jacobson, M. (2013, July). From "Events" to "Activities": Creating Abstraction Techniques for Mining Students' Model-Based Inquiry Processes. In Educational Data Mining 2013.
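To illustrate how a cluster might be summarized for labelling, the sketch below counts the support of contiguous action patterns within one cluster. This is a simplified stand-in, not the tree-based algorithm of Hu et al. cited above; the function name, the fixed pattern length, and the single minimum support are assumptions.

```python
from collections import Counter


def frequent_patterns(sequences, length, min_support):
    """Contiguous patterns of a given length found in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # A set, so each pattern counts once per sequence.
        grams = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(grams)
    return {g: c for g, c in support.items() if c >= min_support}


cluster = [["SL", "AT", "SM"], ["SL", "AT", "AT", "SM"], ["SL", "SM"]]
patterns = frequent_patterns(cluster, length=2, min_support=2)
# Patterns shared by at least two sequences: SL-AT and AT-SM.
```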
Discussion
Sequential Pattern Mining
            Video   Structured   Unstructured   MCQ
Phase I        35          138            270   144
Phase II       30           93            127    52
AP Clustering
            Video   Structured   Unstructured   MCQ
Phase I        18           89             92    20
Phase II       11           25             21     9
• Further filter by semantics
               Video   Structured   Unstructured
Phase I & II      10           11             15
Early Prediction of Stop-out
Logistic Regression
               Accuracy
Activity       Stop-out   Persist
Structured     53.08%     55.23%
Unstructured   54.56%     69.81%
MCQ            0%         91.84%
Variable sets: score-based, sequence-based
Variables                   Accuracy   Kappa statistic
Structured                  48.00%     -0.06
Structured + Unstructured   66.67%      0.43
MCQ                         100.00%     1.00
Videos                      75.00%      0.48
Structured                  81.48%      0.61
Unstructured                81.82%      0.63
ASA Diagnosis
• Students who initiated videos tend to go further
• In structured activity (scaffolded), students who display schematic variants progress further
• In unstructured activity, students who fall back onto scripts are more perseverant
Now that we know the different approaches students use, what do we do next?