Anti-discrimination and privacy
protection in released datasets
Sara Hajian
Josep Domingo-Ferrer
Data mining
• There are negative social perceptions
about data mining, among them:
• Potential privacy invasion
• Potential discrimination
Discrimination
• Discrimination is unfair or unequal
treatment of people based on membership
in a category or a minority, without regard
to individual merit.
Discrimination
• Example: U.S. federal laws prohibit
discrimination on the basis of:
• Race, Color, Religion, Nationality, Sex, Marital
status, Age, Pregnancy
• In a number of settings:
• Credit/insurance scoring
• Sale, rental, and financing of housing
• Personnel selection and wages
• Access to public accommodations, education,
nursing homes, adoptions, and health care.
Discrimination
• Discrimination can be either direct or indirect:
• Direct discrimination occurs when decisions
are made based on sensitive attributes.
• Indirect discrimination occurs when decisions
are made based on non-sensitive attributes
which are strongly correlated with biased
sensitive ones.
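To make the idea of a proxy attribute concrete, the sketch below measures how strongly a non-sensitive attribute predicts a sensitive one. The records, the attribute names (`zip`, `minority`) and the helper `proxy_strength` are hypothetical and only illustrate the idea; they are not taken from the slides.

```python
from collections import defaultdict
from fractions import Fraction


def proxy_strength(records, attribute, sensitive_attribute, sensitive_value):
    """For each value of `attribute`, return the share of records that
    carry `sensitive_value` in `sensitive_attribute`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        key = record[attribute]
        totals[key] += 1
        if record[sensitive_attribute] == sensitive_value:
            hits[key] += 1
    return {key: Fraction(hits[key], totals[key]) for key in totals}


# Hypothetical toy data: ZIP code is non-sensitive, minority status is sensitive.
records = (
    [{"zip": "10451", "minority": "yes"}] * 4
    + [{"zip": "10451", "minority": "no"}] * 1
    + [{"zip": "10001", "minority": "yes"}] * 1
    + [{"zip": "10001", "minority": "no"}] * 4
)

shares = proxy_strength(records, "zip", "minority", "yes")
for zip_code, share in sorted(shares.items()):
    print(zip_code, float(share))
```

In this toy data, 80% of the records in one ZIP code belong to the minority group, so a rule that decides by ZIP code could discriminate indirectly even though the sensitive attribute never appears in it.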
Discrimination in Data mining
• Automated data collection and data mining
techniques such as classification rule mining
have paved the way to making automated
decisions:
• loan granting/denial
• insurance premium computation
• personnel selection and wages
Discrimination in Data mining
• If the training datasets are biased with
respect to discriminatory attributes such as
gender, race, or religion, discriminatory
decisions may ensue.
• Anti-discrimination techniques have been
introduced in data mining:
• Discrimination discovery
• Discrimination prevention
Discrimination in Data mining
• Discrimination discovery
• Consists of supporting the discovery of
discriminatory decisions hidden, either directly
or indirectly, in a dataset of historical decision
records.
Discrimination Discovery
• Different measures of the discriminatory power
of the mined decision rules can be
defined, according to the provisions of
different anti-discrimination regulations:
• Extended lift (elift)
• Selection lift (slift)
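The slides name these two measures without defining them. A minimal sketch, assuming the definitions commonly used in the discrimination discovery literature for a rule A, B → C (A a sensitive itemset, B a context, C a decision): elift = conf(A, B → C) / conf(B → C) and slift = conf(A, B → C) / conf(¬A, B → C). The toy records and all function names below are hypothetical.

```python
from fractions import Fraction


def matches(record, itemset):
    """True if the record carries every attribute=value pair of the itemset."""
    return all(record.get(attr) == value for attr, value in itemset.items())


def confidence(records, premise, outcome):
    """conf(premise -> outcome): share of records satisfying `premise`
    (a predicate) that also match the `outcome` itemset."""
    covered = [r for r in records if premise(r)]
    if not covered:
        raise ValueError("premise covers no records")
    return Fraction(sum(matches(r, outcome) for r in covered), len(covered))


def elift(records, sensitive, context, outcome):
    """Extended lift: conf(A, B -> C) / conf(B -> C).
    Raises ZeroDivisionError if conf(B -> C) is zero."""
    with_a = confidence(
        records, lambda r: matches(r, sensitive) and matches(r, context), outcome
    )
    base = confidence(records, lambda r: matches(r, context), outcome)
    return with_a / base


def slift(records, sensitive, context, outcome):
    """Selection lift: conf(A, B -> C) / conf(not A, B -> C).
    Raises ZeroDivisionError if conf(not A, B -> C) is zero."""
    with_a = confidence(
        records, lambda r: matches(r, sensitive) and matches(r, context), outcome
    )
    without_a = confidence(
        records, lambda r: not matches(r, sensitive) and matches(r, context), outcome
    )
    return with_a / without_a


def rows(gender, credit, n):
    # Hypothetical historical decision records, all in the same context.
    return [{"gender": gender, "city": "NYC", "credit": credit} for _ in range(n)]


records = (
    rows("female", "deny", 3) + rows("female", "grant", 1)
    + rows("male", "deny", 1) + rows("male", "grant", 3)
)

e = elift(records, {"gender": "female"}, {"city": "NYC"}, {"credit": "deny"})
s = slift(records, {"gender": "female"}, {"city": "NYC"}, {"credit": "deny"})
print("elift =", e, " slift =", s)
```

Here women are denied credit in 3 of 4 cases against 4 of 8 overall and 1 of 4 for men, so elift is 3/2 and slift is 3. A value above a chosen threshold would flag the rule as potentially discriminatory; the threshold itself depends on the applicable regulation.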
Discrimination in Data mining
• Discrimination prevention
• Consists of inducing patterns that do not lead
to discriminatory decisions even if trained
from a dataset containing them.
Discrimination Prevention
• How can we train an unbiased classifier
when the training data is biased?
• As for privacy, the challenge is to find an
optimal trade-off between (measurable)
protection against unfair discrimination, and
(measurable) utility of the data/models for data
mining.
Discrimination Prevention
• Methods:
• Transform the source data (pre-processing)
• Modify the data mining methods (in-processing)
• Modify the resulting discriminatory models (post-processing)
The framework
• The framework for discrimination
prevention can be described in terms of
two phases:
• Discrimination Measurement
• Data Transformation
Data transformation
• The purpose is to transform the original data DB in
such a way as to remove direct and/or indirect
discriminatory biases, with minimum impact
• on the data and
• on legitimate decision rules,
• so that no unfair decision rule can be mined from the
transformed data.
Data transformation
• As part of this effort, metrics should be
developed that specify
• which records should be changed,
• how many records should be changed,
• and how those records should be changed during
data transformation.
Utility measures
• Measuring direct discrimination removal
• Measuring indirect discrimination removal
• Measuring Data Quality
• Misses Cost (MC)
• Ghost Cost (GC)
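The two data quality measures can be sketched as follows, assuming their usual meaning: MC is the percentage of rules extractable from the original data that can no longer be extracted from the transformed data, and GC is the percentage of rules extractable from the transformed data that were not extractable from the original data. The rule sets below are hypothetical placeholders for the output of a rule miner, and the function names are illustrative.

```python
def misses_cost(original_rules, transformed_rules):
    """Percentage of original rules lost after the transformation."""
    original, transformed = set(original_rules), set(transformed_rules)
    if not original:
        return 0.0
    return 100.0 * len(original - transformed) / len(original)


def ghost_cost(original_rules, transformed_rules):
    """Percentage of rules in the transformed data that are new
    artifacts, i.e. were not extractable from the original data."""
    original, transformed = set(original_rules), set(transformed_rules)
    if not transformed:
        return 0.0
    return 100.0 * len(transformed - original) / len(transformed)


# Hypothetical rules, written as (premise, consequent) pairs.
original_rules = {
    (("job=clerk",), "credit=grant"),
    (("job=manager",), "credit=grant"),
    (("savings=low",), "credit=deny"),
    (("history=bad",), "credit=deny"),
}
transformed_rules = {
    (("job=clerk",), "credit=grant"),
    (("job=manager",), "credit=grant"),
    (("age=young",), "credit=deny"),
    (("zip=10001",), "credit=grant"),
    (("savings=high",), "credit=grant"),
}

mc = misses_cost(original_rules, transformed_rules)
gc = ghost_cost(original_rules, transformed_rules)
print(f"MC = {mc:.1f}%  GC = {gc:.1f}%")
```

With these placeholder sets, two of the four original rules are lost (MC = 50%) and three of the five transformed rules are new (GC = 60%). Lower values of both indicate that the transformation preserved more of the legitimate knowledge in the data.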
Thanks for your attention