Ethics-aware predictive learning analytics

Ethics-aware Learning Analytics
Mykola Pechenizkiy
http://www.win.tue.nl/~mpechen/
IRB and Big Data NSF Workshop, George Mason University
Arlington, VA, USA
Who I am
Applied Data Mining researcher → Data scientist
– Predictive analytics, evolving data, big data
– Adaptive learning, concept drift, context
– Web analytics, customer/student/user analytics
Educational Data Mining/Learning Analytics-related:
– EDM 2011, EDM 2015, LASI 2014, JEDM
– Handbook of EDM
– President-Elect IEDMS
IRB_BD@GMU
9 Nov 2014
Ethics-aware Predictive Learning Analytics
Mykola Pechenizkiy, TU Eindhoven
Outline
• Big Data opportunities with education
platforms
• Fears of Big Data coming to schools
• Reconsidering priorities in
developing/adopting Data-Driven
Education paradigm
–Ethics-awareness and trustworthiness
• Take-aways: where advice from IRB panels
is welcome
More ICT – More Data Sources
Four Major Types of Learning &
Kinds of Questions EDM/LA Can Assist with
• How to (re)organize the classes, or assessment, or placement of materials based on usage and performance data
• How to help learners in (re-)finding useful material, whether done individually or collaboratively with peers
• How to identify those who would benefit from provided feedback, study advice or other help; how to decide which kind of help would be most effective?
• How to help learners in (re-)finding useful material, whether done individually or collaboratively with peers
Kinds of Data Being Collected
• Administrative data
– Who follows which program, who takes which course,
registers for an (interim) exam, reexams
– Demographics, school grades, etc
• MOOC and LMS
– Resource usage data
– Assessment/assignments data (online tests, source code)
– Forums, collaboration, feedback/help requests
– Students’ evaluation of learning resources
• ITS, educational games, professional learning,
e-Health, simulators, ...
• Gaming, browsing, Gmail, Facebook, Twitter
EDM/LA: Data → Approach → Knowledge
Data
• Interactions data
– Usage logs & contexts
• Administrative data
– Enrolments
– Results
– Payments
– Graduation
– Employment
• “Feedback” data
– Opinions
– Preferences
– Needs
• Descriptive data
– Demographics
– Characteristics
Approach
• Classification
– Categorizing students
• Clustering
– Grouping similar students
• Association Analysis, Sequence mining
– Find courses taken together or popular (parts of) study programs
• Visual Analytics
– Facilitate reasoning about the process or results via interactive data/model visualization
• Process mining
– Understanding study curricula
Goals
• Identify high risk students
• Predict new student application rates
• Predict students retention/dropout
• Course planning & scheduling
• Faculty teaching load estimation
• Predict demand for resources (library, cafeteria, housing)
• Predict alumni donation
Learning@Scale Potential
Two central questions in DDE
• “Does it work?” and “Which way is better?”
Ongoing research:
• Gaining insights via (massive) A/B testing
• Predictive modeling with actionable attributes
– Prediction vs. persuasion vs. manipulation
• Predictive modeling with sensitive attributes
– Ethics-aware personalization w/out discrimination
Data Trumps Experts’ Intuition
• LAK, AIED & EDM: help in understanding what works and what does not, student modeling etc
• MOOC, ITS & L@S: A/B testing is becoming popular
• MOOC platforms provide support for A/B testing
Intuitive design can be replaced by data-driven design
Example by Ken Koedinger (CMU) at Data-driven education @NIPS2013
If We Were Able to Look Deeper
How these averages
could possibly differ per
• Student learning style
• Student background
• Country they studied
• Ethnicity
• Gender
• Parents
• ….
Uplift Predictors
Suppose we do have data from A/B testing
• The control dataset
– individuals on which no action was taken
• The treatment dataset
– individuals on which an action was taken
Build a model which predicts the causal influence of
the action on a given individual
• Some students prefer a story, others – a formula,
e.g. girls => story, boys => formula
• Challenging to learn such predictors, but feasible!
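One simple way to see what an uplift estimate looks like is to compare outcome rates between the treatment and control datasets within each segment of students. The sketch below is only an illustration under stated assumptions: the segment labels, the binary outcome (for example, passing a quiz) and the toy records are all hypothetical, and a per-segment rate difference is a much cruder estimator than a learned uplift model.

```python
from collections import defaultdict


def segment_uplift(control, treatment):
    """Estimate uplift per segment as treatment rate minus control rate.

    Each record is a (segment, outcome) pair, where outcome is 0 or 1.
    """
    def rates(records):
        totals = defaultdict(int)
        successes = defaultdict(int)
        for segment, outcome in records:
            totals[segment] += 1
            successes[segment] += outcome
        return {s: successes[s] / totals[s] for s in totals}

    control_rates = rates(control)
    treatment_rates = rates(treatment)
    # Uplift is only defined for segments observed in both datasets.
    shared = control_rates.keys() & treatment_rates.keys()
    return {s: treatment_rates[s] - control_rates[s] for s in shared}


# Hypothetical toy data: (segment, passed_quiz)
control = [("A", 1), ("A", 0), ("A", 0), ("A", 0),
           ("B", 1), ("B", 1), ("B", 0), ("B", 0)]
treatment = [("A", 1), ("A", 1), ("A", 1), ("A", 0),
             ("B", 1), ("B", 0), ("B", 0), ("B", 0)]

uplift = segment_uplift(control, treatment)
print(uplift)
```

In this toy data the action helps segment A and hurts segment B, which is the situation in which a single global choice of action is not the best choice for everyone.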
Fear of Privacy Violation & Data Misuse
corpwatch.org/img/original/google.jpg
• “Many companies are looking to profit from
student and teacher data that can be easily
collected, stored, processed, customized,
analyzed, and then ultimately resold”.
Philip McRae (Alberta Teachers’ Association)
If We Were Able to Look Deeper
How these averages
could possibly differ per
• Student learning style
• Student background
• Country they studied
• Ethnicity
• Gender
• Parents
(slide annotation: sensitive attributes)
• ….
cf. discrimination in hiring, granting credit or loans, etc.
Fear of Predictive Analytics
Are the decisions based on predictive models
always ethical?
• (Personalized) decisions may be unfair to a certain
group (race, ethnicity, gender)
Are the models/decisions trustworthy?
• Do predictive models give guarantees?
• Is the accuracy high enough?
• Do models provide meaningful insights?
• Are they interpretable and transparent?
• “Correlation is not causation”
Fears of Personalization
• “When Personalization Goes Bad”
http://www.portical.org/blog/when-personalization-goes-bad
• “Rebirth of the Teaching Machine through the
Seduction of Data Analytics: This Time It's Personal”
http://www.philmcrae.com/2/post/2013/04/rebirth-of-the-teaching-maching-through-the-seduction-of-data-analytics-this-time-its-personal1.html
• “This time it is Personal and Dangerous”
http://barbarabray.net/2013/12/30/this-time-its-personal-and-dangerous/
Postcard (World’s Fair, Paris 1899) predicting what
learning will be like in France in the year 2000
© Pawel Kuczynski
Predicting with Sensitive Attributes
Paradox: we need to use personal data to
control for unethical predictive analytics
• “Fairness through awareness” Dwork et al.
• “It’s Not Privacy, and it’s Not Fair” Dwork & Mulligan
• “Discrimination and Privacy in the Information Society” Custers et al. (Eds)
• Data mining for discrimination discovery
• Explainable vs. unethical discrimination
• Accuracy-discrimination tradeoff
Take-aways
ITS, MOOC, OLI – massive scale, cheap
and scalable experimentation online
• What should be the policies on
student data collection, sharing and use?
Potential for data-driven education, finding
out what works for students best via randomized trials
• What is and is not ethical? (cf. the Facebook study)
Effects of persuasion are not uniform
• Potential and need for personalization
• DM can learn causal models from A/B testing data
• How to prevent malignant forms of DDE
General guidelines for ethics-aware personalization and
persuasion?
Thank you!
• Feedback, questions, collaboration ideas:
[email protected]
• Staying connected:
nl.linkedin.com/in/mpechen/
Fears of Big Data
Coming to Schools
personal/educational data misuse,
poor predictions, bad personalization
SIAT@SFU
11 Aug 2014
Predictive Analytics for Data-Driven Education
Mykola Pechenizkiy, TU Eindhoven
Connections to Privacy & Ethics
• What is the philosophy of an education data scientist?
• Is EDM always ethical?
• Is EDM a threat to privacy?
• Dangers of misuse of information
• Unethical decision making or personalization
Will these discussions slow down or kill the development and adoption of predictive learning analytics?
Predicting with Actionable Attributes
Prediction vs. manipulation;
uplift predictors
Data Trumps Intuition
• LAK, AIED & EDM: help in understanding what works and what does not, student modeling etc
• MOOC, ITS & L@S: A/B testing is becoming popular
• MOOC platforms provide support for A/B testing
Intuitive design can be replaced by data-driven design
Example by Ken Koedinger (CMU) at Data-driven education @NIPS2013
If We Were Able to Look Deeper
How these averages
could possibly differ per
• Student learning style
• Student background
• Country they studied
• Ethnicity
• Gender
• ….
Towards Personalized Medicine
• A typical medical trial:
– treatment group: gets the treatment
– control group: gets placebo (or another
treatment)
– do a statistical test to show that the treatment is
better than placebo
• With uplift predictors we can find out
– for whom the treatment works and works best or
– in case of alternative treatments – which
treatment works best for whom
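The "statistical test" step of a typical trial can be sketched as a one-sided two-proportion z-test comparing success rates in the treatment and control groups. This is a minimal illustration, assuming binary outcomes and samples large enough for the normal approximation; the function name and the example counts are hypothetical, and the slides do not say which test is meant.

```python
import math


def two_proportion_z_test(success_t, n_t, success_c, n_c):
    """One-sided z-test for H1: treatment success rate > control success rate.

    Returns (z, p_value). Assumes the pooled rate is strictly between 0 and 1.
    """
    p_t = success_t / n_t
    p_c = success_c / n_c
    pooled = (success_t + success_c) / (n_t + n_c)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 100 succeed with treatment, 45 of 100 with placebo.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, one-sided p = {p_value:.4f}")
```

A test like this only answers whether the treatment is better on average; it says nothing about for whom it works, which is the question uplift predictors address.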
Uplift Predictors
Suppose we do have data from A/B testing
• C: the control dataset
– individuals on which no action was taken
• T: the treatment dataset
– individuals on which an action was taken
Build a model which predicts the causal influence of
the action on a given individual
• Challenging, if we assume that there is no
globally better action
– Some students prefer a story, others – a formula
• But it is feasible
Uplift Predictors: Conclusions
• Learn how to choose an action when there is
no globally better action
• Clear evidence that this is feasible
• Demonstrated that the effect of an action is not
uniform across individuals
– focusing on individuals sensitive to choice of action
helps to build better uplift predictors
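To illustrate what "no globally better action" means in practice, the sketch below compares the expected outcome of always choosing one action with a policy that picks the best action per segment. All numbers, segment names and action names here are hypothetical; in a real setting the per-segment success rates would come from an uplift model fitted on A/B testing data rather than being known in advance.

```python
def expected_outcome(policy, success_rate, segment_share):
    """Expected success rate when `policy` maps each segment to an action."""
    return sum(
        share * success_rate[(segment, policy[segment])]
        for segment, share in segment_share.items()
    )


# Hypothetical success rates per (segment, action) and population shares.
success_rate = {
    ("s1", "story"): 0.75, ("s1", "formula"): 0.25,
    ("s2", "story"): 0.25, ("s2", "formula"): 0.5,
}
segment_share = {"s1": 0.5, "s2": 0.5}
actions = ["story", "formula"]

always_story = {seg: "story" for seg in segment_share}
always_formula = {seg: "formula" for seg in segment_share}
# Personalized policy: choose the action with the highest rate in each segment.
personalized = {
    seg: max(actions, key=lambda a: success_rate[(seg, a)])
    for seg in segment_share
}

print(expected_outcome(always_story, success_rate, segment_share))
print(expected_outcome(always_formula, success_rate, segment_share))
print(expected_outcome(personalized, success_rate, segment_share))
```

With these assumed rates, the personalized policy outperforms both uniform policies, because each segment responds best to a different action.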
“Fairness through awareness” by Cynthia Dwork et al.
In order to treat similar individuals similarly we must collect more data about individuals.
Connections between privacy-preserving and fair predictive modeling.
“It’s Not Privacy, and it’s Not Fair”
Cynthia Dwork & Deirdre K. Mulligan
“Discrimination and Privacy in the Information Society” Custers et al. (Eds) Springer, 2013
Predicting with Sensitive Attributes
Discrimination-aware mining;
bias-aware mining
Sensitive Attributes
• Demographics (gender, race, income,
education of parents)
• Proxies to demographics (home address or
school location)
• Some (un)known artifacts of data collection
– Different instances of a course
– Different instructors
– Different groups (locations)
Predicting with Sensitive Attributes
[Diagram: learning and applying a model L with sensitive attributes]
1. Training: historical data drawn from the population (source), with attributes X, sensitive attribute S and labels y, are used to learn the model L: y = L(X, S)
2. Application: use L for unseen (testing) data: y' = L(X', S')
Enforcing P(Y|X,S) = P(Y|X)
Action? a' = argmax(p(y' = 1))
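The constraint P(Y|X,S) = P(Y|X) can be checked empirically by comparing, within each value of X, the rate of positive predictions across the groups defined by S. The sketch below is a rough diagnostic under stated assumptions: X is a single discrete attribute, predictions are binary, and the record layout and toy data are hypothetical. A gap of zero in every stratum is what the constraint asks for.

```python
from collections import defaultdict


def conditional_gap(records):
    """For each value of x, return the spread of positive rates across s.

    Each record is an (x, s, y_pred) triple with y_pred equal to 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # (x, s) -> [positives, total]
    for x, s, y_pred in records:
        counts[(x, s)][0] += y_pred
        counts[(x, s)][1] += 1

    rates_by_x = defaultdict(list)
    for (x, _s), (positives, total) in counts.items():
        rates_by_x[x].append(positives / total)

    # Largest minus smallest positive rate among the s-groups of each stratum.
    return {x: max(rates) - min(rates) for x, rates in rates_by_x.items()}


# Hypothetical predictions: x = prior grade band, s = sensitive group.
records = [
    ("high", "f", 1), ("high", "f", 1), ("high", "m", 1), ("high", "m", 1),
    ("low", "f", 0), ("low", "f", 0), ("low", "m", 1), ("low", "m", 0),
]
gaps = conditional_gap(records)
print(gaps)
```

Here the predictions do not depend on S for the "high" stratum, but they do for the "low" stratum, so the constraint is violated there.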
Predicting with Sensitive Attributes
• Accuracy-discrimination tradeoff:
– Data massaging for discrimination-free predictions
(ICDM);
– discrimination-aware decision trees, Bayesian
classifiers, regression (DAMI, KAIS, ICDM)
• Explainable (ethical/legal) vs. unethical (ICDM)
• Data mining for discrimination discovery
(TKDD)
• Paradox: we need to use personal data to
control for unethical predictive analytics
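The accuracy-discrimination tradeoff can be made concrete by scoring two sets of predictions on both criteria. In this sketch, discrimination is measured as the difference in positive-prediction rates between a favored group and everyone else; that choice of measure, the function names and the toy data are assumptions made for illustration and are not taken from the cited papers. Note that computing the measure requires the sensitive attribute itself, which is the paradox stated above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def discrimination(y_pred, groups, favored):
    """Positive-prediction rate of the favored group minus that of the rest."""
    fav = [p for p, g in zip(y_pred, groups) if g == favored]
    rest = [p for p, g in zip(y_pred, groups) if g != favored]
    return sum(fav) / len(fav) - sum(rest) / len(rest)


# Hypothetical data: historical labels already favor group "a".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]

pred_unconstrained = [1, 1, 1, 0, 1, 0, 0, 0]  # reproduces the labels exactly
pred_adjusted = [1, 1, 0, 0, 1, 1, 0, 0]       # equal positive rates per group

for name, pred in [("unconstrained", pred_unconstrained),
                   ("adjusted", pred_adjusted)]:
    print(name, accuracy(y_true, pred), discrimination(pred, groups, "a"))
```

In this toy case, removing the gap in positive rates costs accuracy with respect to the historical labels, because those labels encode the disparity in the first place.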
Predictive analytics should
• provide better tooling for DDE,
• help to eliminate Big Data fears in the
changing face of modern education, and
• not boost these fears of the general
public, educators, students and other
stakeholders
Conclusions
ITS, MOOC, OLI – massive scale, cheap
and scalable experimentation online
• Potential for data-driven education,
finding out what works best for students
– DM can help to generate promising hypotheses to test
• Effects of interventions/persuasion are not
uniform
– Potential and need for personalization
– DM can help to learn causal models from A/B testing
data: uplift predictors
Fears of (malignant forms of) DDE and DDP
• Ethics-aware and context-aware personalization