ITS2004 data mining - School of Computer Science


Carnegie Mellon
Some Useful Design Tactics for Mining ITS Data
Jack Mostow
Project LISTEN (www.cs.cmu.edu/~listen)
Carnegie Mellon University
Funding: National Science Foundation
ITS 04 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceio, Brazil
Project LISTEN
7/22/2004
Outline
1. Project LISTEN’s Reading Tutor
2. Modify tutor to get mineable data
3. Map data stream to analyzable data set
4. Mine data set to discover insights
Project LISTEN’s Reading Tutor (video)
John Rubin (2002). The Sounds of Speech (Show 3). On Reading Rockets (Public Television series commissioned by U.S. Department of Education). Washington, DC: WETA. Available at www.cs.cmu.edu/~listen.
Thanks to fellow LISTENers
Tutoring:
• Dr. Joseph Beck, mining tutorial data
• Prof. Albert Corbett, cognitive tutors
• Prof. Rollanda O’Connor, reading
• Prof. Kathy Ayres, stories for children
• Joe Valeri, activities and interventions
• Becky Kennedy, linguist
Listening:
• Dr. Mosur Ravishankar, recognizer
• Dr. Evandro Gouvea, acoustic training
• John Helman, transcriber
Programmers:
• Andrew Cuneo, application
• Karen Wong, Teacher Tool
Field staff:
• Dr. Roy Taylor
• Kristin Bagwell
• Julie Sleasman
Grad students:
• Hao Cen, HCI
• Cecily Heiner, MCALL
• Peter Kant, Education
• Shanna Tellerman, ETC
Plus:
• Advisory board
• Research partners: DePaul, UBC, U. Toronto
• Schools
Project LISTEN’s Reading Tutor: a rich source of experimental data
2003-2004 database:
• 9 schools
• > 200 computers
• > 50,000 sessions
• > 1.5M tutor responses
• > 10M words recognized
Embedded experiments
• Randomized trials
Modify tutor to get mineable data
Log operations at the grain size and level of interest
• Click <x, y> at time t: motor control
• Click “Goldilocks”: item selection
Reify operations so they can be logged in analyzable form
• Handwriting or speech → typed input
• Freehand drawing → graphical palette (Geometry Tutor)
• Free-form responses → menu selection (Self 88)
• Natural language → sentence starters (Goodman 03)
Time student and tutor actions
• Time allocation reflects motivation (ITS 02)
• Hasty responses indicate guessing (TICL 04)
• Latency reflects automaticity (TICL 04)
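A minimal sketch of timing student and tutor actions, assuming each logged event is a reified operation (an actor, an action, the item acted on, and a timestamp in seconds). The `Event` fields, the helper names, and the 2-second `HASTY_SECONDS` cutoff are hypothetical choices for illustration, not the measures used in the cited papers.

```python
from dataclasses import dataclass

# Hypothetical cutoff (seconds): faster responses are flagged as possible guesses.
HASTY_SECONDS = 2.0


@dataclass
class Event:
    actor: str    # "tutor" or "student"
    action: str   # reified operation, e.g. "click_word" rather than a raw <x, y>
    item: str     # what the operation applied to, e.g. the word "Goldilocks"
    time: float   # seconds since the session started


def response_latencies(events):
    """Pair each tutor action with the next student action; return (prompt, response, latency)."""
    pairs = []
    pending = None  # latest tutor event still awaiting a student response
    for ev in sorted(events, key=lambda e: e.time):
        if ev.actor == "tutor":
            pending = ev
        elif ev.actor == "student" and pending is not None:
            pairs.append((pending, ev, ev.time - pending.time))
            pending = None
    return pairs


def hasty_responses(pairs, cutoff=HASTY_SECONDS):
    """Return the student responses that came faster than the cutoff."""
    return [resp for _, resp, latency in pairs if latency < cutoff]


log = [
    Event("tutor", "show_choices", "stories", 0.0),
    Event("student", "click_item", "Goldilocks", 1.2),
    Event("tutor", "show_sentence", "bears", 5.0),
    Event("student", "click_word", "bears", 9.5),
]
pairs = response_latencies(log)
print([round(lat, 1) for _, _, lat in pairs])    # [1.2, 4.5]
print([r.item for r in hasty_responses(pairs)])  # ['Goldilocks']
```

Logging the clicked item rather than raw screen coordinates is what makes the second line possible: the analysis can name which choice was hasty.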
Modify tutor: add relevant data
Randomize tutorial decisions
• What skill to test, what help to give
Probe skills
• Assess cognitive development (Arroyo 00)
• Test vocabulary words (IJAIE 01)
• Insert automated comprehension questions (TICL 04)
Import student data
• Gender, age, IQ (Shute 96)
• Prior knowledge (Corbett 00)
• Pretest scores (TICL 04)
Hand-label when appropriate
• Transcribe (some) spoken input (FLET 04)
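Randomizing a tutorial decision and logging it together with its context turns routine tutoring into an embedded experiment. A minimal sketch, assuming a hypothetical `choose_help` hook and a small illustrative menu of help types; a seeded `random.Random` makes the assignment reproducible for later analysis.

```python
import random

# Illustrative subset of help types (hypothetical menu for this sketch).
HELP_TYPES = ["SayWord", "RhymesWith", "StartsLike", "OnsetRime"]


def choose_help(student_id, word, rng, log):
    """Pick a help type uniformly at random and log the decision with its context."""
    help_type = rng.choice(HELP_TYPES)
    log.append({
        "student": student_id,
        "word": word,
        "help_type": help_type,
        "randomized": True,  # marks this as an experimental decision
    })
    return help_type


rng = random.Random(2004)  # fixed seed so the assignment can be replayed
decisions = []
for w in ["read", "book", "stories"]:
    choose_help("s1", w, rng, decisions)
print([d["help_type"] for d in decisions])
```

Uniform random choice keeps every help type's trials comparable, since no student or word property influences which help was given.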
Modify tutor: an example
Randomize: explain some new words but not others.
Probe: test each new word the next day.
Did kids do better on explained vs. unexplained words?
• Overall: NO; 38% vs. 36%, N = 3,171 trials (IJAIE 01).
• Rare, single-sense words tested 1-2 days later: YES! 44% >> 26%, N = 189.
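One standard way to check whether a gap such as 44% vs. 26% exceeds chance is a two-proportion z-test on the probe outcomes. The slide does not say which test the original analysis used. The sketch below assumes each trial is a dict with hypothetical `condition` and `correct` fields, and its toy counts are invented for illustration.

```python
import math


def tally(trials):
    """Return {condition: (successes, trials)} from dicts with 'condition' and 'correct'."""
    counts = {}
    for t in trials:
        s, n = counts.get(t["condition"], (0, 0))
        counts[t["condition"]] = (s + int(t["correct"]), n + 1)
    return counts


def two_proportion_z(s1, n1, s2, n2):
    """z statistic for H0: p1 == p2, using the pooled success rate."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Toy data (hypothetical): 100 probes per condition.
trials = ([{"condition": "explained", "correct": i < 50} for i in range(100)]
          + [{"condition": "unexplained", "correct": i < 30} for i in range(100)])
counts = tally(trials)
z = two_proportion_z(*counts["explained"], *counts["unexplained"])
print(counts, round(z, 2))  # |z| > 1.96 corresponds to p < .05, two-sided
```

The test assumes independent trials. Because each student contributes many probes, a model that accounts for per-student effects would be more conservative.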
Map data stream to data set: structure data into a single type
• Data stream: heterogeneous events over time
• Data set: elements with the same features
Segment into shorter episodes
• Tutorial action(s) + student response (Beck 00)
Slice into narrower strands
• Successive encounters of a specific word (AMLDP 98)
• Successive instances of a specific skill (learning curves)
Measure aggregated events
• Allocation of time among activities (ITS 02)
Formulate data as experimental trials
• Context where the trial occurred
• Decision made in this trial
• Outcome based on subsequent events
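A minimal sketch of segmenting, slicing, and measuring. It assumes the stream is a list of dicts with hypothetical `actor`, `time`, and (for word events) `word` fields. `segment` cuts the stream into episodes consisting of one tutor action plus the student events that follow it. `slice_by_word` pulls out each word's successive student encounters. `time_by_activity` measures one aggregate and assumes a hypothetical `activity` label and a `duration` in seconds on each event.

```python
from collections import defaultdict


def segment(events):
    """Episodes: each tutor action followed by the student events up to the next tutor action."""
    episodes, current = [], None
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["actor"] == "tutor":
            if current is not None:
                episodes.append(current)
            current = {"tutor": ev, "responses": []}
        elif current is not None:  # student events before the first tutor action are dropped
            current["responses"].append(ev)
    if current is not None:
        episodes.append(current)
    return episodes


def slice_by_word(events):
    """Strands: each word's student encounters, in time order (case-insensitive)."""
    strands = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["actor"] == "student" and ev.get("word"):
            strands[ev["word"].lower()].append(ev)
    return dict(strands)


def time_by_activity(events):
    """Aggregate measure: total seconds spent in each activity."""
    totals = defaultdict(float)
    for ev in events:
        totals[ev["activity"]] += ev["duration"]
    return dict(totals)


stream = [
    {"actor": "tutor", "time": 0, "activity": "story", "duration": 1.0},
    {"actor": "student", "time": 1, "word": "read", "activity": "story", "duration": 0.5},
    {"actor": "tutor", "time": 2, "activity": "help", "duration": 2.0},
    {"actor": "student", "time": 3, "word": "book", "activity": "story", "duration": 0.5},
    {"actor": "student", "time": 9, "word": "Read", "activity": "story", "duration": 0.5},
]
print(len(segment(stream)), list(slice_by_word(stream)), time_by_activity(stream))
```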
Map data stream to data set: formulate data as experimental trials
Data stream:
Context:
• Student is reading a story: ‘People sit down and …’
• Student needs help on a word: student clicks ‘read.’
Decision (randomized):
• Tutor chooses what help to give.
Student continues reading: ‘… read a book.’
Time passes…
Outcome:
• Student sees the word in a later sentence: ‘I love to read stories.’
• Read fluently?
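The example above can be sketched as a function that turns each help event into a (context, decision, outcome) trial by looking ahead for the same student's next encounter of the word in a different, later sentence. All field names are hypothetical. The `fluent` flag is assumed to be computed upstream, for instance from recognizer output, which this sketch does not model.

```python
def make_trials(events):
    """Turn help events into trials.

    Context: student and target word when help was given.
    Decision: the (randomized) help type the tutor chose.
    Outcome: whether the student read the word fluently at its next
    encounter in a later sentence; None if no such encounter is logged.
    """
    events = sorted(events, key=lambda e: e["time"])
    trials = []
    for i, ev in enumerate(events):
        if ev["type"] != "help":
            continue
        outcome = None
        for later in events[i + 1:]:
            if (later["type"] == "read_word"
                    and later["student"] == ev["student"]
                    and later["word"].lower() == ev["word"].lower()
                    and later["sentence"] != ev["sentence"]):
                outcome = later["fluent"]
                break
        trials.append({"student": ev["student"], "word": ev["word"],
                       "help_type": ev["help_type"], "fluent": outcome})
    return trials


events = [
    {"type": "help", "student": "s1", "word": "read", "help_type": "RhymesWith", "sentence": 1, "time": 10},
    {"type": "help", "student": "s1", "word": "book", "help_type": "SayWord", "sentence": 1, "time": 11},
    {"type": "read_word", "student": "s1", "word": "read", "sentence": 1, "fluent": False, "time": 12},
    {"type": "read_word", "student": "s1", "word": "book", "sentence": 1, "fluent": True, "time": 13},
    {"type": "read_word", "student": "s1", "word": "read", "sentence": 7, "fluent": True, "time": 300},
]
trials = make_trials(events)
print(trials)
```

Skipping encounters in the same sentence keeps the outcome from being contaminated by the help itself, since the word is usually read right after the tutor helps.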
Map data stream to data set: trials

Context                         Decision       Outcome
Student_ID         Target_Word  Help_Type      Fluent
mwb6-5-1996-05-02  sink         RhymesWith     no
fJH8-4-1994-11-01  gnaw         StartsLike     yes
mDA5-5-1996-04-24  dirt         Autophonics    yes
mST6-6-1994-01-25  people       WordInContext  yes
mGH6-6-1990-10-01  breakfast    SayWord        no
mJK4-5-1995-12-16  YOU          Autophonics    no
fGA4-3-1995-10-25  home         RhymesWith     yes
mBD7-9-1994-12-29  finally      Recue          yes
mCD4-8-1996-03-06  Three        OnsetRime      yes
fso5-8-1994-06-29  Stars        OnsetRime      yes
…
(191,487 more trials)
Mine data set to make discoveries
Count outcome frequency
• Success rate of each help type (ICALL 04)
Fit a parametric model
• Knowledge tracing (Corbett 95)
Train a model
• Statistics, e.g. regression (TICL 04)
• Machine learning, e.g. decision trees (AIED 01)
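Knowledge tracing (Corbett 95) is the parametric model named above. Below is a minimal sketch of its standard per-step update with four per-skill parameters: initial knowledge `p_init`, learning rate `p_transit`, guess `p_guess`, and slip `p_slip`. The parameter values in the usage line are made up. Fitting them to logged data (for example by expectation-maximization or a grid search) is not shown.

```python
def p_correct(p_know, p_guess, p_slip):
    """Predicted probability of a correct response given current P(known)."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess


def bkt_update(p_know, correct, p_transit, p_guess, p_slip):
    """Bayesian update of P(known) on one observation, then apply the chance to learn."""
    if correct:
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * p_transit


def trace(observations, p_init, p_transit, p_guess, p_slip):
    """Return the predicted P(correct) before each observation, and the final P(known)."""
    p_know, predictions = p_init, []
    for correct in observations:
        predictions.append(p_correct(p_know, p_guess, p_slip))
        p_know = bkt_update(p_know, correct, p_transit, p_guess, p_slip)
    return predictions, p_know


preds, final = trace([False, True, True], p_init=0.5, p_transit=0.1, p_guess=0.2, p_slip=0.1)
print([round(p, 3) for p in preds], round(final, 3))
```

Predicting each response before updating on it is what lets the fitted model be scored against held-out data.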
Count outcome frequency: which help types worked best?
• Best: Rhymes With 69.2% ± 0.4%
• Worst: Recue 55.6% ± 0.4%
Compare within level to control for word difficulty.

                Same day                      Later day
Grade 1 words   Say In Context, Onset Rime    Onset Rime
Grade 2 words   Say In Context, Rhymes With   Rhymes With
Grade 3 words   Say In Context                Rhymes With, One Grapheme

Supplying the word helped best in the short term, but rhyming hints had longer-lasting benefits.
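The within-level comparison can be sketched as a grouped count. It assumes hypothetical trial fields: `level` (grade level of the word), `timing` ("same" or "later" day), `help_type`, and a boolean `fluent`. For each cell, it picks the single highest success rate. The slide lists two help types in some cells, and this sketch does not reproduce whatever criterion produced those pairs.

```python
from collections import defaultdict


def best_help_by_cell(trials, min_n=1):
    """For each (level, timing) cell, return (help_type, success_rate) with the highest rate."""
    counts = defaultdict(lambda: [0, 0])  # (level, timing, help_type) -> [successes, n]
    for t in trials:
        c = counts[(t["level"], t["timing"], t["help_type"])]
        c[0] += int(t["fluent"])
        c[1] += 1
    best = {}
    for (level, timing, help_type), (s, n) in counts.items():
        if n < min_n:  # skip help types with too few trials in this cell
            continue
        rate = s / n
        cell = (level, timing)
        if cell not in best or rate > best[cell][1]:
            best[cell] = (help_type, rate)
    return best


# Toy trials (hypothetical) for grade 1 words.
toy = [
    {"level": 1, "timing": "same", "help_type": "SayInContext", "fluent": True},
    {"level": 1, "timing": "same", "help_type": "SayInContext", "fluent": True},
    {"level": 1, "timing": "same", "help_type": "OnsetRime", "fluent": False},
    {"level": 1, "timing": "later", "help_type": "SayInContext", "fluent": False},
    {"level": 1, "timing": "later", "help_type": "OnsetRime", "fluent": True},
]
print(best_help_by_cell(toy))
```

Grouping by word level first means each help type is compared only against others used on words of similar difficulty.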
Summary: modify, map, mine.
1. Modify tutor to make data mineable.
   Log, reify, time, hand-label, import, probe, randomize.
2. Map data streams to data sets.
   Segment, slice, measure.
3. Mine data set to make discoveries.
   Count, fit, train.
See videos, papers, etc. at www.cs.cmu.edu/~listen.
Thank you! Questions?
Modify tutor to get mineable data
word features
Structure of Reading Tutor database
Records: Student → Session → Story Encounter → Sentence Encounter → Word Encounter
Reading Tutor: lists readers → lists stories → shows one sentence at a time → listens and helps
Student: logs in → picks stories → reads sentence → reads each word
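One way to turn this nested structure into a flat data set is to walk it and emit one row per word encounter, carrying its ancestors along as context. This is a minimal sketch with hypothetical dataclasses. The real database's tables and columns are not shown in this talk, and the `accepted` field (whether the recognizer accepted the word) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class WordEncounter:
    word: str
    accepted: bool  # hypothetical: did the speech recognizer accept this word?


@dataclass
class SentenceEncounter:
    text: str
    words: List[WordEncounter] = field(default_factory=list)


@dataclass
class StoryEncounter:
    title: str
    sentences: List[SentenceEncounter] = field(default_factory=list)


@dataclass
class Session:
    student_id: str
    stories: List[StoryEncounter] = field(default_factory=list)


def word_rows(session: Session) -> Iterator[Tuple[str, str, str, str, bool]]:
    """Flatten the hierarchy: one row per word encounter, with its ancestors as context."""
    for story in session.stories:
        for sentence in story.sentences:
            for w in sentence.words:
                yield (session.student_id, story.title, sentence.text, w.word, w.accepted)


s = Session("s1", [StoryEncounter("Books", [SentenceEncounter(
    "People sit down and read a book.",
    [WordEncounter("read", False), WordEncounter("book", True)])])])
for row in word_rows(s):
    print(row)
```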
Map data stream to data set: formulate data as experimental trials
• Context where the trial occurred
• Decision made in this trial
• Outcome based on subsequent events

Context             Decision              Outcome                   Source
Student is stuck    Prompt or cough?      Next event in dialog      FF 2000
Before a new word   Explain it or not?    Test word next day        IJAIE 01
Click on word       What help to give?    Word read OK next time?   SSSR 04
Learning curves for students’ help requests
[Figure: help-request learning curves by reading level (Grade 1 to Grade 4); x-axis: number of previous encounters (0 to 15); y-axis: 0.0 to 0.4]
Try to predict subset
Selected data:
• Grade 1-2 level
• 1-6 prior encounters
• 53 students
• 175,961 words
• 29,278 help requests
Train predictive model
• Count help requests 5x
• Predict other kids’ data
• 71% accuracy
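A learning curve like the one plotted can be computed by counting, for each word encounter, how many times that student had already seen the word. This is a minimal sketch that assumes hypothetical encounter records with `student`, `word`, `time`, and a boolean `help` (whether the student requested help). It returns the help-request rate for each number of previous encounters.

```python
from collections import defaultdict


def learning_curve(encounters, max_prior=15):
    """Help-request rate by number of previous encounters of the same word by the same student."""
    seen = defaultdict(int)              # (student, word) -> encounters so far
    stats = defaultdict(lambda: [0, 0])  # prior encounters -> [help requests, encounters]
    for e in sorted(encounters, key=lambda x: x["time"]):
        key = (e["student"], e["word"].lower())
        prior = seen[key]
        if prior <= max_prior:
            stats[prior][0] += int(e["help"])
            stats[prior][1] += 1
        seen[key] += 1
    return {k: h / n for k, (h, n) in sorted(stats.items())}


toy = [
    {"student": "s1", "word": "cat", "time": 1, "help": True},
    {"student": "s1", "word": "cat", "time": 2, "help": True},
    {"student": "s1", "word": "Cat", "time": 3, "help": False},
    {"student": "s2", "word": "cat", "time": 4, "help": True},
    {"student": "s2", "word": "cat", "time": 5, "help": False},
]
print(learning_curve(toy))
```

Counting encounters per student and word is the "slice into narrower strands" step; averaging across strands at each encounter index gives the curve.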
Count outcome frequency (average success rate 66.1%)
Example: ‘People sit down and read a book.’
Whole word:
• 24,841 Say In Context
• 56,791 Say Word
Decomposition:
• 6,280 Syllabify
• 14,223 Onset Rime
• 19,677 Sound Out
• 22,933 One Grapheme
Analogy:
• 13,165 Rhymes With
• 13,671 Starts Like
Semantic:
• 14,685 Recue
• 2,285 Show Picture
• 488 Sound Effect
Which types stood out?
• Best: Rhymes With 69.2% ± 0.4%
• Worst: Recue 55.6% ± 0.4%
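The ± figures are consistent with one binomial standard error, √(p(1 − p)/n), computed using each help type's trial count from the list above. The slide does not say how the intervals were computed, so this is only a check under that assumption. It also treats trials as independent, which ignores the fact that the same students contribute many trials.

```python
import math


def binomial_se(p, n):
    """Standard error of a success rate p estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# Rates and trial counts as reported on this slide.
for name, p, n in [("Rhymes With", 0.692, 13165), ("Recue", 0.556, 14685)]:
    print(f"{name}: {p:.1%} ± {binomial_se(p, n):.1%}")
# Rhymes With: 69.2% ± 0.4%
# Recue: 55.6% ± 0.4%
```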