
Towards educational data mining:
Using data mining methods for automated
chat analysis to understand and support
inquiry learning processes
Anjo Anjewierden, Bas Kollöffel and Casper Hulshof
Anjo Anjewierden
http://anjo.blogs.com
Department of Instructional Technology
Faculty of Behavioural Sciences
University of Twente
The Netherlands
Overview (1)
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions
Motivation
• Chats can structure collaborative learning
– Doing vs. doing and discussing with other learners
• Current use of chats is limited to
– Logging the messages for later analysis
• Our goals related to chat analysis
– Provide adaptive feedback based on on-line
analysis of the chats
– Make the learner part of the simulation by
visualising her actions and behaviour (e.g. through
avatars)
Approach
• Define models by which messages can be
classified
– One model is based on term usage
– Another model is based on the grammar
– Later we want to combine the models to find
"semantic patterns"
• By applying the models, each message of a particular chat can be assigned a class
– Aggregation of class assignments over time is what an avatar can visualise (see the sketch below)
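As an illustration of this aggregation step, here is a minimal Python sketch; it is not the authors' implementation, and the window size and example class labels are assumptions.

```python
from collections import Counter

def aggregate(classified_messages, window=10):
    """Turn per-message class assignments into class proportions over
    the most recent messages, e.g. as input for an avatar visualisation.

    classified_messages: list of (timestamp, class_label) tuples in
    chronological order. The window size of 10 is an arbitrary choice."""
    recent = [label for _, label in classified_messages[-window:]]
    counts = Counter(recent)
    total = sum(counts.values()) or 1
    return {label: count / total for label, count in counts.items()}

# Hypothetical usage with example class labels.
history = [(1, "regulative"), (2, "domain"), (3, "domain"), (4, "social")]
print(aggregate(history))  # {'regulative': 0.25, 'domain': 0.5, 'social': 0.25}
```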
Inquiry learning
Learning environment
• Both learners see the same simulation
on two different screens
• One learner can run the simulation
• Learners use chat to discuss:
– Simulations to run, variable settings, etc.
– Interpretation of the results of simulations
– Which answer to give to a question
– etc.
Overview (2)
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions
Classifications of chats
• Which functions should we distinguish in chat
messages?
• We use a classification proposed by Gijlers
and De Jong (2005):
– Regulative: planning, monitoring, agreeing, etc.
– Domain: transformative
– Technical: about the learning environment
– Social: greetings, compliments and other off-task messages
Examples
• Regulative:
– Ok // Yes // Next
– I think the answer is 3
– Perhaps we should try again
• Domain:
– The momentum becomes negative
– Speed of the red ball is 2 m/s
• Technical:
– Move the mouse to the right
• Social:
– Well done partner
Data used
• Chats collected by Nadira Saab for her Ph.D.
research (University of Amsterdam, 2005)
• Domain: simulations related to collisions (e.g.
momentum for elastic and inelastic collisions)
• Language: Dutch
• 78 chat sessions
• 16879 chat messages
Data normalisation
• Messages are extremely noisy
– Misspellings (accidental and on purpose)
– Chat language (w8 = wait)
– See paper for Dutch examples
• Messages have been manually
corrected to obtain words that can be
found in the dictionary
– Grammar has not been corrected
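A minimal sketch of what such a normalisation step could look like; the replacement table below is a hypothetical example, since the actual corrections were made by hand and the data is in Dutch.

```python
import re

# Hypothetical chat-language replacements; the real corrections were
# made manually and in Dutch (see the paper for examples).
CHAT_LANGUAGE = {
    "w8": "wait",
}

def normalise(message: str) -> str:
    """Lower-case a message and expand known chat-language tokens.
    Remaining words would be checked against a dictionary; grammar
    is not corrected."""
    tokens = re.findall(r"\w+|\S", message.lower())
    return " ".join(CHAT_LANGUAGE.get(t, t) for t in tokens)

print(normalise("W8, I think the answer is 3"))
# -> "wait , i think the answer is 3"
```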
Overview (3)
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions
Types of features
• For each class one can define
– Characterising terms (domain: speed, increases)
– Grammatical patterns:
• the speed increases (<article> <noun> <verb>)
• I think (<personal pronoun> <verb>)
– Both terms and syntactic patterns are used by
humans to classify the messages
• Data mining
– Discover the terms and patterns automatically
Words as features
• Each word in a message is a feature
– Order is not taken into account
– Smileys, !, ?, integers are separate words
• Example
– The answer is 5!!!! :-)
– Features: { answer, is, the, #, !, <smiley> }
(where # is any integer)
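A small sketch of this feature extraction, following the slide's example; the token patterns for smileys and punctuation are assumptions, not the authors' definitions.

```python
import re

# Very rough smiley pattern; an assumption for illustration only.
SMILEY = re.compile(r"^[:;=8][-o^]?[)(\]\[dpDP]$")

def word_features(message: str) -> set:
    """Bag-of-words features: order is ignored, '!' and '?' are
    separate tokens, integers map to '#' and smileys to '<smiley>'."""
    features = set()
    for token in re.findall(r"[:;=8][-o^]?[)(\]\[dpDP]|\w+|[!?]", message.lower()):
        if token.isdigit():
            features.add("#")
        elif SMILEY.match(token):
            features.add("<smiley>")
        else:
            features.add(token)
    return features

print(word_features("The answer is 5!!!! :-)"))
# -> set containing: answer, is, the, #, !, <smiley>
```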
Grammar as features
• Each message is parsed by a part-of-speech (POS) tagger
– Determines role words play in a message (noun,
verb, etc.)
• POS-sequences are a feature, if:
1. They occur at least 20 times, and
2. They do not fully overlap a longer sequence
• Example:
1. the speed: {<article>, <noun>, <article> <noun>}
2. Remove full overlaps: {<article> <noun>}
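A sketch of how such POS-sequence features could be collected; the threshold of 20 occurrences comes from the slide, while the maximum sequence length and the tagger (not included here) are assumptions.

```python
from collections import Counter

def pos_ngrams(tag_sequences, min_count=20, max_n=4):
    """Collect POS n-grams that occur at least `min_count` times and
    are not fully contained in a longer n-gram that also qualifies.

    tag_sequences: one list of POS tags per message, e.g. the output
    of a Dutch POS tagger."""
    counts = Counter()
    for tags in tag_sequences:
        for n in range(1, max_n + 1):
            for i in range(len(tags) - n + 1):
                counts[tuple(tags[i:i + n])] += 1

    frequent = {seq for seq, c in counts.items() if c >= min_count}

    def contained(short, longer):
        n = len(short)
        return any(longer[i:i + n] == short for i in range(len(longer) - n + 1))

    # Remove sequences that fully overlap a longer frequent sequence.
    return {seq for seq in frequent
            if not any(len(other) > len(seq) and contained(seq, other)
                       for other in frequent)}

# Hypothetical usage: POS tags for "the speed increases" and similar messages.
tagged = [["article", "noun", "verb"]] * 25
print(pos_ngrams(tagged))  # keeps only the longest frequent sequence
```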
Naive Bayes classifier
• Standard Naive Bayes classifier is used
– Once for the word features
– Once for the grammar features
• See paper for technical details
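For concreteness, a minimal Naive Bayes sketch over set-valued features; the Laplace smoothing and the exact probability estimates here are assumptions, so see the paper for the actual formulation. The same classifier would be trained twice, once on the word features and once on the grammar features.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal Naive Bayes over set-valued features (words or POS
    sequences). A sketch, not the exact formulation used in the paper."""

    def train(self, examples):
        # examples: list of (feature_set, class_label) pairs
        self.class_counts = Counter(label for _, label in examples)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in examples:
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        self.total = sum(self.class_counts.values())

    def classify(self, features):
        best, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log prior plus log likelihood with Laplace smoothing.
            score = math.log(count / self.total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```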
Overview (4)
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions
Experiment
• Four researchers each classified 400
messages
– Randomly selected with a bias towards longer
messages (nearly all short messages are
regulative)
– 1280 unique messages were classified
• Expert manually checked whether the
classifications were "correct"
• Result was used to create two classification
models (words, grammar) using Naive Bayes
Overview (5)
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions
Results by demonstration
Overview (6)
• Motivation
• Classification of educational chats
• Methods for automated analysis
• Experiment
• Results
• Conclusions
Conclusions
• Automatic classification of messages
– Naive Bayes works surprisingly well
• Even for a small feature set per item (chat)
• And for a large number of features over all items
– Sufficiently accurate for
• The classes we used
• Visualising aggregated learner behaviour through
avatars
• Misspellings are a source of concern
Future work
• Combining manual and automatic
classification
– Started: see interaction classification tool
– Can speed up chat coding in general (also for
research)
• Find "semantic patterns" in chats
– Based on combining information from the word
and grammar models
– Relate these "semantic patterns" to learner actions
in the simulation environment
Thank you!
• And thanks to
– Nadira Saab
– Hannie Gijlers
– Petra Hendrikse
– Sylvia van Borkulo
– Jan van der Meij
– Wouter van Joolingen
– and the anonymous reviewers