ITS2004 data mining - School of Computer Science
Carnegie
Mellon
Some Useful Design Tactics for Mining ITS Data
Jack Mostow
Project LISTEN (www.cs.cmu.edu/~listen)
Carnegie Mellon University
Funding: National Science Foundation
ITS 04 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceio, Brazil
7/22/2004
Outline
1. Project LISTEN’s Reading Tutor
2. Modify tutor to get mineable data
3. Map data stream to analyzable data set
4. Mine data set to discover insights
Project LISTEN’s Reading Tutor (video)
John Rubin (2002). The Sounds of Speech (Show 3). On Reading Rockets (Public Television series commissioned by U.S. Department of Education). Washington, DC: WETA.
Available at www.cs.cmu.edu/~listen.
Thanks to fellow LISTENers
Tutoring:
Dr. Joseph Beck, mining tutorial data
Prof. Albert Corbett, cognitive tutors
Prof. Rollanda O’Connor, reading
Prof. Kathy Ayres, stories for children
Joe Valeri, activities and interventions
Becky Kennedy, linguist
Listening:
Dr. Mosur Ravishankar, recognizer
Dr. Evandro Gouvea, acoustic training
John Helman, transcriber
Programmers:
Andrew Cuneo, application
Karen Wong, Teacher Tool
Field staff:
Dr. Roy Taylor
Kristin Bagwell
Julie Sleasman
Grad students:
Hao Cen, HCI
Cecily Heiner, MCALL
Peter Kant, Education
Shanna Tellerman, ETC
Plus:
Advisory board
Research partners: DePaul, UBC, U. Toronto
Schools
Project LISTEN’s Reading Tutor:
A rich source of experimental data
2003-2004 database:
9 schools
> 200 computers
> 50,000 sessions
> 1.5M tutor responses
> 10M words recognized
Embedded experiments
Randomized trials
Modify tutor to get mineable data
Log operations at grain size and level of interest
Click <x, y> at time t: motor control
Click “Goldilocks”: item selection
Reify operations to log them analyzably
Handwriting or speech → typed input
Freehand drawing → graphical palette (Geometry Tutor)
Free-form responses → menu selection (Self 88)
Natural language → sentence starters (Goodman 03)
Time student and tutor actions
Time allocation reflects motivation (ITS 02)
Hasty responses indicate guessing (TICL 04)
Latency reflects automaticity (TICL 04)
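As an illustration of the logging and timing tactics above, here is a minimal sketch of a timed event record. The `Event` fields, the `response_latencies` helper, and the 1-second `threshold` are hypothetical choices made for this example, not the Reading Tutor's actual log format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    actor: str       # "student" or "tutor"
    operation: str   # reified operation, e.g. "click_word" rather than raw <x, y>
    target: str      # item the operation applied to, e.g. "Goldilocks"
    start: float     # seconds since session start
    end: float


def response_latencies(events: List[Event]) -> List[float]:
    """Gap between the end of a tutor action and the start of the next student action."""
    latencies: List[float] = []
    last_tutor_end: Optional[float] = None
    for ev in sorted(events, key=lambda e: e.start):
        if ev.actor == "tutor":
            last_tutor_end = ev.end
        elif last_tutor_end is not None:
            latencies.append(ev.start - last_tutor_end)
            last_tutor_end = None  # count only the first student action per tutor action
    return latencies


def is_hasty(latency: float, threshold: float = 1.0) -> bool:
    """Flag a response as possibly hasty; the threshold is an assumed value."""
    return latency < threshold
```

Logging both start and end times for every action keeps latency and time-allocation measures computable later without re-instrumenting the tutor.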
Modify tutor: add relevant data
Randomize tutorial decisions
What skill to test, what help to give
Probe skills
Assess cognitive development (Arroyo 00)
Test vocabulary words (IJAIE 01)
Insert automated comprehension questions (TICL 04)
Import student data
Gender, age, IQ (Shute 96)
Prior knowledge (Corbett 00)
Pretest scores (TICL 04)
Hand-label when appropriate
Transcribe (some) spoken input (FLET 04)
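A randomized tutorial decision is only mineable if the choice is logged together with its context. The sketch below shows one way to do that; `randomized_decision`, the record fields, and the option names are hypothetical and chosen for illustration only.

```python
import random
from typing import Dict, List, Sequence


def randomized_decision(
    rng: random.Random,
    student_id: str,
    word: str,
    options: Sequence[str],
    log: List[Dict[str, object]],
) -> str:
    """Pick one option uniformly at random and log the choice with its context."""
    choice = rng.choice(list(options))
    log.append(
        {
            "student_id": student_id,
            "word": word,
            "options": list(options),  # record what could have been chosen
            "decision": choice,
        }
    )
    return choice
```

Passing in a seeded `random.Random` rather than using the module-level generator makes the assignment reproducible when a log has to be audited.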
Modify tutor: an example
Randomize: explain some new words but not others.
Probe: test each new word the next day.
Did kids do better on explained vs. unexplained words?
Overall: NO; 38% vs. 36%, N = 3,171 trials (IJAIE 01).
Rare, 1-sense words tested 1-2 days later: YES! 44% >> 26%, N = 189.
Map data stream to data set:
structure data into a single type
Data stream: heterogeneous events over time
Data set: elements with the same features
Segment into shorter episodes
Tutorial action(s) + student response (Beck 00)
Slice into narrower strands
Successive encounters of a specific word (AMLDP 98)
Successive instances of a specific skill (learning curves)
Measure aggregated events
Allocation of time among activities (ITS 02)
Formulate data as experimental trials
Context where the trial occurred
Decision made in this trial
Outcome based on subsequent events
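Segmenting and slicing can both be expressed as simple groupings over a time-ordered event list. The following is a rough sketch under assumed conventions: each event is a dict with hypothetical keys `time`, `actor`, `student`, and `word`, and an episode is taken to start at each tutor action.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


def segment_episodes(events: List[Event]) -> List[List[Event]]:
    """Segment: each tutor action opens an episode; later student events join it."""
    episodes: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["actor"] == "tutor" or not episodes:
            episodes.append([ev])
        else:
            episodes[-1].append(ev)
    return episodes


def slice_by_word(events: List[Event]) -> Dict[Tuple[str, str], List[Event]]:
    """Slice: one strand per (student, word), in time order."""
    strands: Dict[Tuple[str, str], List[Event]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["time"]):
        strands[(ev["student"], ev["word"])].append(ev)
    return dict(strands)
```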
Map data stream to data set:
Formulate data as experimental trials
Data stream:
Context:
Student is reading a story: ‘People sit down and …’
Student needs help on a word: student clicks ‘read.’
Decision (randomized): Tutor chooses what help to give
Student continues reading: ‘… read a book.’
Time passes…
Student sees word in a later sentence: ‘I love to read stories.’
Outcome: read fluently?
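The mapping from the stream above to trials can be sketched as a single pass that pairs each help event with the same student's next encounter of the same word. The event keys (`kind`, `student`, `word`, `help_type`, `fluent`, `time`) are hypothetical, and the stream is assumed to record only later encounters as `encounter` events.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def build_trials(events: List[Event]) -> List[Dict[str, Any]]:
    """One trial per help event: context, decision, and a later outcome if any."""
    ordered = sorted(events, key=lambda e: e["time"])
    trials: List[Dict[str, Any]] = []
    for i, ev in enumerate(ordered):
        if ev["kind"] != "help":
            continue
        outcome = None  # stays None when the word is never seen again
        for later in ordered[i + 1:]:
            if (
                later["kind"] == "encounter"
                and later["student"] == ev["student"]
                and later["word"] == ev["word"]
            ):
                outcome = later["fluent"]
                break
        trials.append(
            {
                "student": ev["student"],      # context
                "word": ev["word"],            # context
                "help_type": ev["help_type"],  # decision
                "fluent": outcome,             # outcome
            }
        )
    return trials
```

Trials with no later encounter keep `fluent` as `None` so they can be excluded explicitly rather than silently counted as failures.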
Map data stream to data set: trials
Context:                          Decision:       Outcome:
Student_ID         Target_Word    Help_Type       Fluent
mwb6-5-1996-05-02  sink           RhymesWith      no
fJH8-4-1994-11-01  gnaw           StartsLike      yes
mDA5-5-1996-04-24  dirt           Autophonics     yes
mST6-6-1994-01-25  people         WordInContext   yes
mGH6-6-1990-10-01  breakfast      SayWord         no
mJK4-5-1995-12-16  YOU            Autophonics     no
fGA4-3-1995-10-25  home           RhymesWith      yes
mBD7-9-1994-12-29  finally        Recue           yes
mCD4-8-1996-03-06  Three          OnsetRime       yes
fso5-8-1994-06-29  Stars          OnsetRime       yes
…
(191,487 more trials)
Mine data set to make discoveries
Count outcome frequency
Success rate of each help type (ICALL 04)
Fit a parametric model
Knowledge tracing (Corbett 95)
Train a model
Statistics, e.g. regression (TICL 04)
Machine learning, e.g. decision trees (AIED 01)
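For the parametric-model option, the standard knowledge tracing update can be sketched as follows: condition the probability that a skill is known on the observed response, then allow for learning. The parameter names are hypothetical, any values passed in are illustrative rather than fitted to Reading Tutor data, and all probabilities are assumed to lie strictly between 0 and 1.

```python
from typing import Iterable, List


def kt_update(p_known: float, correct: bool,
              p_learn: float, p_guess: float, p_slip: float) -> float:
    """One knowledge tracing step: Bayesian update on the response, then learning."""
    if correct:
        num = p_known * (1 - p_slip)
        den = num + (1 - p_known) * p_guess
    else:
        num = p_known * p_slip
        den = num + (1 - p_known) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_learn


def kt_trace(outcomes: Iterable[bool], p_init: float,
             p_learn: float, p_guess: float, p_slip: float) -> List[float]:
    """Estimated probability of knowing the skill after each observed response."""
    estimates: List[float] = []
    p = p_init
    for correct in outcomes:
        p = kt_update(p, correct, p_learn, p_guess, p_slip)
        estimates.append(p)
    return estimates
```

Fitting the four parameters to data is a separate step that this sketch does not cover.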
Count outcome frequency:
which help types worked best?
Best: Rhymes With 69.2% ± 0.4%
Worst: Recue 55.6% ± 0.4%
Compare within level to control for word difficulty.
                Same day:                     Later day:
Grade 1 words:  Say In Context, Onset Rime    Onset Rime
Grade 2 words:  Say In Context, Rhymes With   Rhymes With
Grade 3 words:  Say In Context                Rhymes With, One Grapheme
Supplying the word helped best in the short term…
But rhyming hints had longer-lasting benefits.
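Comparing within level amounts to stratifying the success counts by word level before ranking help types. A minimal sketch, assuming each trial is a dict with hypothetical keys `level`, `help_type`, and `fluent`:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def success_rates_by_level(
    trials: Iterable[Dict[str, Any]],
) -> Dict[Tuple[int, str], float]:
    """Success rate for each (word level, help type) cell."""
    counts: Dict[Tuple[int, str], List[int]] = defaultdict(lambda: [0, 0])
    for t in trials:
        cell = counts[(t["level"], t["help_type"])]
        cell[0] += 1 if t["fluent"] else 0  # successes
        cell[1] += 1                        # trials
    return {key: s / n for key, (s, n) in counts.items()}


def best_help_per_level(
    rates: Dict[Tuple[int, str], float],
) -> Dict[int, Tuple[str, float]]:
    """Highest-rate help type within each level (first one seen wins ties)."""
    best: Dict[int, Tuple[str, float]] = {}
    for (level, help_type), rate in rates.items():
        if level not in best or rate > best[level][1]:
            best[level] = (help_type, rate)
    return best
```

Ranking within each level keeps a help type from looking good merely because it happened to be given on easier words.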
Summary: modify, map, mine.
1. Modify tutor to make data mineable.
Log, reify, time, hand-label, import, probe, randomize.
2. Map data streams to data sets.
Segment, slice, measure.
3. Mine data set to make discoveries.
Count, fit, train.
See videos, papers, etc. at www.cs.cmu.edu/~listen.
Thank you! Questions?
Modify tutor to get mineable data
word features
Structure of Reading Tutor database
Reading Tutor                 Database level       Student
List readers                  Session              Login
List stories                  Story Encounter      Pick stories
Show one sentence at a time   Sentence Encounter   Read sentence
Listens and helps             Word Encounter       Read each word
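One way to picture the database levels named on this slide is as nested records. The sketch below assumes that sessions contain story encounters, which contain sentence encounters, which contain word encounters; that nesting and every field name are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class WordEncounter:
    word: str
    fluent: bool


@dataclass
class SentenceEncounter:
    text: str
    words: List[WordEncounter] = field(default_factory=list)


@dataclass
class StoryEncounter:
    title: str
    sentences: List[SentenceEncounter] = field(default_factory=list)


@dataclass
class Session:
    student_id: str
    stories: List[StoryEncounter] = field(default_factory=list)


def iter_word_encounters(session: Session) -> Iterator[Tuple[str, str, str, bool]]:
    """Flatten the hierarchy into (student, story, word, fluent) rows."""
    for story in session.stories:
        for sentence in story.sentences:
            for w in sentence.words:
                yield session.student_id, story.title, w.word, w.fluent
```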
Map data stream to data set:
formulate data as experimental trials
Context where the trial occurred
Decision made in this trial
Outcome based on subsequent events
Context             Decision             Outcome                   Source
Student is stuck    Prompt or cough?     Next event in dialog      FF 2000
Before a new word   Explain it or not?   Test word next day        IJAIE 01
Click on word       What help to give?   Word read OK next time?   SSSR 04
7/22/2004
Carnegie
Mellon
Learning curves for students’ help requests
[Figure: one curve per reading level (Grade 1, Grade 2, Grade 3, Grade 4); x-axis: number of previous encounters (0 to 15); y-axis: 0.0 to 0.4]
Try to predict subset
Selected data
Grade 1-2 level
1-6 prior encounters
53 students
175,961 words
29,278 help requests
Train predictive model
Count help requests 5x
Predict other kids’ data
71% accuracy
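A learning curve of this kind can be computed by counting, for each word encounter, how many times the same student has already encountered that word. A minimal sketch, assuming encounters arrive as hypothetical `(student, word, time, asked_help)` tuples:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Encounter = Tuple[str, str, float, bool]  # (student, word, time, asked_help)


def help_request_curve(encounters: Iterable[Encounter]) -> Dict[int, float]:
    """Help-request rate as a function of the number of previous encounters."""
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    tally: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for student, word, _time, asked in sorted(encounters, key=lambda e: e[2]):
        n_prior = seen[(student, word)]
        tally[n_prior][0] += int(asked)  # help requests at this encounter count
        tally[n_prior][1] += 1           # all encounters at this encounter count
        seen[(student, word)] += 1
    return {n: helped / total for n, (helped, total) in sorted(tally.items())}
```

Splitting the input by reading level before calling the function would give one curve per level.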
Count outcome frequency
(average success rate 66.1%)
Example: ‘People sit down and read a book.’
Whole word:
24,841 Say In Context
56,791 Say Word
Decomposition:
6,280 Syllabify
14,223 Onset Rime
19,677 Sound Out
22,933 One Grapheme
Analogy:
13,165 Rhymes With
13,671 Starts Like
Semantic:
14,685 Recue
2,285 Show Picture
488 Sound Effect
Which types stood out?
Best: Rhymes With 69.2% ± 0.4%
Worst: Recue 55.6% ± 0.4%
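The ± 0.4% margins are consistent with one binomial standard error computed from the trial counts listed on this slide; whether the slide used exactly this formula is an assumption. A short sketch of that calculation, with `binomial_se` as a hypothetical helper name:

```python
import math


def binomial_se(rate: float, n: int) -> float:
    """Standard error of a success rate estimated from n independent trials."""
    return math.sqrt(rate * (1.0 - rate) / n)


# Rates and counts as reported on the slide.
rhymes_with_se = binomial_se(0.692, 13165)
recue_se = binomial_se(0.556, 14685)
```

Both values round to 0.004, that is, 0.4 percentage points. This treats trials as independent, which ignores repeated trials from the same student.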