Rachel Hogue's presentation on Big Data in Education
Big Data
in Education
Rachel Hogue
Overview
Big Data and Education Communities
Why Collect Educational Data?
Learning Theories
eLearning
What Data Can We Collect?
Examples of eLearning Companies and their Use of Big Data
Data Analysis
Existing implementations using educational data
Methods that work well for educational data
MOOCdb
Privacy concerns
Communities
International Educational Data Mining Society
Founded July 2011
EDM workshop in 2005 (at the Association for the Advancement of
Artificial Intelligence)
EDM conference in 2008
Journal of Educational Data Mining (JEDM) since 2009
Society for Learning Analytics Research
First conference: Learning Analytics and Knowledge (LAK)
2011
Journal of Learning Analytics, founded 2012
Why Collect Educational Data?
Personalize education
Better assessment of learners
Multiple dimensions: social, cognitive, emotional, metacognitive
Multiple levels: individual, group, institutional
To promote new scientific discoveries and to advance
learning sciences
Many theories; little hard data to support them
Opportunity to discover new learning patterns
“Not only can you look at unique learning trajectories of individuals,
but the sophistication of the models of learning goes up enormously.”
Arthur Graesser, Editor,
Journal of Educational Psychology
A Look Backwards
Collecting educational data was highly resource-intensive and
difficult to scale
Much of the data that was easily collectible was purely
summative in nature
Getting data on learning processes and learner behaviors, in
field settings, required methods like
Quantitative field observations
Video recordings
Think-Aloud studies
None of which scale easily
Learning Types
Visual (spatial)
Auditory
Kinesthetic / haptic
Learning Theories
Problem-Based Learning
Anchored Instruction
Cognitive Apprenticeship
Situated Learning
eLearning
WBI – Web-Based Instruction
Learning technology
Networking and computing technologies are used to improve
educational practices
MOOC
Massive Open Online Course
What Data Can We Collect?
Administrative data - who are you?
Address, name, birth date
Content data – inferred properties about material
Difficulty, subject
Longitudinal data - data from a long period of time
Grades
Standardized testing results
Time on task
Attendance
Click patterns
How long a student holds a mouse pointer over a particular answer
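A measure such as time on task is usually derived from raw click timestamps rather than logged directly. The following is a minimal sketch, assuming events arrive as timestamps in seconds and that any gap longer than a cutoff means the student stepped away. The function name and the 300-second cutoff are illustrative assumptions, not something the presentation specifies.

```python
def time_on_task(timestamps, idle_cutoff=300):
    """Sum the gaps between consecutive events, skipping gaps above idle_cutoff seconds."""
    ordered = sorted(timestamps)
    total = 0
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= idle_cutoff:  # long gaps are treated as idle time, not work
            total += gap
    return total


# Hypothetical click times (seconds since the session started).
clicks = [0, 30, 75, 1000, 1040]
print(time_on_task(clicks))
```

The cutoff is a modeling choice: a smaller value undercounts slow readers, a larger one counts coffee breaks as study time.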
What Data is Available Already?
PSLC DataShop
a central repository to secure and store research data
a set of analysis and reporting tools
>250,000 hours of students using educational software
>30 million student actions, responses & annotations
Actions: entering an equation, manipulating a vector, typing a
phrase, requesting help
Responses: error feedback, strategic hints
Annotations: correctness, time, skill/concept
http://pslcdatashop.org/about/
Online Education Formats
Video
Online modules
Written documents
Audio files
Instructions for activity or task
CourseSmart
Embeds technology directly into digital textbooks
Provides an “engagement index score”, which measures how
much students are interacting with their eTextbooks (viewing
pages, highlighting, writing notes, etc.).
Researchers have found that the engagement index score
helps instructors predict student outcomes more accurately
than traditional measures, such as class participation.
Duolingo
Site and smartphone app to help people learn foreign
languages
Luis von Ahn
Professor at Carnegie Mellon
CAPTCHA and reCAPTCHA
“twofer”
Data from Duolingo
How long does it take someone to become proficient in a
certain aspect of a language?
How much practice is optimal?
What is the consequence of missing a few days?
There are theories about learning languages, such as the idea
that adjectives should be taught before adverbs, but
previously, there was little hard data to support these
theories
Conclusions from Duolingo Data
The best way to teach a language depends on the students’
native tongue and the language they’re trying to acquire
Example: Spanish -> English
“it” tends to confuse and create anxiety for Spanish speakers,
since the word doesn’t easily translate into their language
Women do better at sports terms
Men do better at cooking and food terms
In Italy, women as a group learn English better than men
Learning Analytics Implementations
Still very few
Knewton: https://www.youtube.com/watch?v=LldxxVRj4FU
Signals project at Purdue University:
http://www.educause.edu/ero/article/signals-applyingacademic-analytics
Ellucian Degree Works, “a comprehensive academic advising,
transfer articulation, and degree audit solution that aligns
students, advisors, and institutions to a common goal: helping
students graduate on time.”
Blackboard Analytics http://www.blackboard.com/Platforms/Analytics/Overview.aspx
Analysis Methods
Prediction
Structure Discovery
Relationship Mining
Prediction
Develop a model which can infer a single aspect of the data
(predicted variable) from some combination of other aspects
of the data (predictor variables)
Which students are off-task?
Which students will fail the class?
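As an illustration of prediction, a "will this student fail?" model can be sketched with logistic regression. This is one possible technique, not necessarily the one the presentation has in mind. The two predictor variables (attendance rate and average quiz score), the toy data, and all function names below are assumptions made up for the example.

```python
import math


def predict_proba(weights, x):
    """Probability of the predicted variable being 1 (here: the student fails)."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Toy data: [attendance rate, average quiz score]; label 1 means the student failed.
rows = [[0.95, 0.90], [0.90, 0.80], [0.85, 0.75], [0.80, 0.85],
        [0.40, 0.35], [0.50, 0.30], [0.30, 0.45], [0.45, 0.40]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights = train_logistic(rows, labels)
print(predict_proba(weights, [0.90, 0.85]))  # engaged student: low failure probability
print(predict_proba(weights, [0.35, 0.30]))  # disengaged student: high failure probability
```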
Structure Discovery
Find structure and patterns in the data that emerge
“naturally”
No specific target or predictor variable
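Clustering is one common form of structure discovery. The sketch below uses a plain k-means loop to group students by two made-up features (minutes of video watched and number of forum posts). The data, the feature choice, and the deterministic "first k points" initialization are assumptions for illustration only.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points for repeatability."""
    centroids = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters


# Hypothetical students: (minutes of video watched, forum posts).
students = [(10, 1), (200, 12), (12, 0), (15, 2), (210, 15), (190, 10)]
centroids, clusters = kmeans(students, k=2)
print(clusters)
```

No label such as "pass" or "fail" is supplied; the two groups emerge from the activity data alone, which is the point of the method.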
Relationship Mining
Discover relationships between variables in a data set with
many variables
Correlation or causation
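A simple form of relationship mining is to scan every pair of variables for strong correlation. The sketch below does this with the Pearson coefficient. The variable names, the toy values, and the 0.8 threshold are assumptions for illustration, and a strong correlation found this way says nothing by itself about causation.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_pairs(table, threshold=0.8):
    """Return (name_a, name_b, r) for every variable pair with |r| >= threshold."""
    found = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, r))
    return found


# Hypothetical per-student variables, one value per student.
table = {
    "hours_studied": [1, 2, 3, 4, 5],
    "quiz_score": [52, 60, 71, 80, 88],
    "hints_requested": [9, 7, 6, 3, 2],
    "seat_number": [8, 10, 7, 9, 8],
}
for a, b, r in strongest_pairs(table):
    print(a, b, round(r, 2))
```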
MOOCdb
Collaborative, online learning research
Different Formats of Data
SQL Dump
Student state information
XML files
Course information
EdX Platform
Emails and Surveys
JSON lines
Clickstream data
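Clickstream data in JSON lines format stores one JSON object per line, so it can be read one event at a time without loading the whole file. A minimal sketch follows; the field names `user` and `event_type` are hypothetical and do not reflect the actual edX log schema.

```python
import json
from collections import Counter


def count_events(lines):
    """Count events per (user, event type) from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        event = json.loads(line)
        counts[(event["user"], event["event_type"])] += 1
    return counts


# Hypothetical clickstream records, one JSON object per line.
sample_lines = [
    '{"user": "s1", "event_type": "play_video"}',
    '{"user": "s1", "event_type": "pause_video"}',
    '{"user": "s2", "event_type": "play_video"}',
    '',
    '{"user": "s1", "event_type": "play_video"}',
]
counts = count_events(sample_lines)
print(counts[("s1", "play_video")])
```

The same function accepts an open file object, since iterating over a file yields its lines.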
Multiple Platforms and Data Control
EdX and Coursera
Controlled by MIT and Stanford, separate entities
Data model to organize raw data streams
Unifies different platforms
MOOCdb
Each class:
Student Information Tables
Observations Tables
Submissions Tables
Collaboration Tables
Feedback Tables
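The five table groups above can be pictured as a small relational schema. The sketch below uses SQLite with heavily simplified, hypothetical table and column names; it is not the actual MOOCdb schema, only an illustration of how one shared layout lets the same query run on data from any platform.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the five MOOCdb table groups.
SCHEMA = """
CREATE TABLE students       (student_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE observations   (student_id INTEGER, resource TEXT, duration_s INTEGER);
CREATE TABLE submissions    (student_id INTEGER, problem TEXT, correct INTEGER);
CREATE TABLE collaborations (student_id INTEGER, forum_post TEXT);
CREATE TABLE feedback       (student_id INTEGER, survey_answer TEXT);
"""


def build_course_db():
    """Create an in-memory database holding the tables for one class."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build_course_db()
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "US"), (2, "IN")])
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [(1, "lecture1", 600), (2, "lecture1", 300), (2, "lecture2", 450)],
)

# Total seconds of resource use per country.
rows = conn.execute(
    """
    SELECT s.country, SUM(o.duration_s)
    FROM students AS s
    JOIN observations AS o ON o.student_id = s.student_id
    GROUP BY s.country
    ORDER BY s.country
    """
).fetchall()
print(rows)
```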
Benefits of MOOCdb
Public, shared data model; avoid redundant work
Foster analytic consistency
Engage more people
MOOCviz
Resource use compared by country
Privacy Concerns
Hardcopy records were phased out in favor of district-based
hard drive storage some time ago, but the advent of cloud
computing has seen a trend toward the creation of third-party data silos (or clouds).
Teachers and parents are concerned about privacy breaches
by hackers and marketers
InBloom
Gates-funded nonprofit that houses student data in the cloud
Closed its doors after parental protest
In May 2014, the Obama administration released an 85-page
report on big data and its use in the US among consumers
and businesses
"Big data and other technological innovations, including new online
course platforms that provide students real time feedback, promise to
transform education by personalizing learning. At the same time, the
federal government must ensure educational data linked to
individual students gathered in school is used for educational
purposes, and protect students against their data being shared or used
inappropriately."
History of Educational Big Data Policies
1974: The Family Educational Rights and Privacy Act of 1974 (FERPA)
2000: The Children's Online Privacy Protection Act of 1998 (COPPA) takes effect
2005: New initiative granting money to states that implement Statewide Longitudinal Data Systems (SLDS)
2008: FERPA is expanded: contracted vendors and school volunteers now have access to the data, with or without parental input
2011: The Shared Learning Collaborative (SLC), which will later become inBloom, is created
2011: FERPA is amended once again, granting "authorized representatives" of state authorities access to student data
Questions or Comments?
Email me at [email protected] with any questions.
https://www.coursera.org/course/bigdata-edu