Agenda - Schul

Agenda
1. What is (Web) data mining? And what does it have to do with privacy? – a simple view –
2. Examples of data mining and "privacy-preserving data mining":
   • Association-rule mining (& privacy-preserving AR mining)
   • Collaborative filtering (& privacy-preserving collaborative filtering)
3. A second look at ... privacy
4. A second look at ... Web / data mining
5. The goal: More than modelling and hiding – Towards a comprehensive view of Web mining and privacy. Threats, opportunities and solution approaches.
6. An outlook: Data mining for privacy
Privacy Problems: Example 1
Technical background of the problem:
• The dataset allows for Web mining (e.g., which search queries lead to which site choices),
• it violates k-anonymity (e.g., "Lilburn" → a likely k = #inhabitants of Lilburn)
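The k-anonymity check in Example 1 can be sketched as follows: group the records by their quasi-identifier values and take the size of the smallest group as k. This is a minimal sketch, not the analysis behind the slide; the function name `k_anonymity`, the field names, and all records are hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values on all quasi-identifier fields (0 if no records)."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical search-log users; only the town name comes from the slide.
records = [
    {"town": "Lilburn", "age_band": "60-69"},
    {"town": "Lilburn", "age_band": "30-39"},
    {"town": "Lilburn", "age_band": "30-39"},
    {"town": "Atlanta", "age_band": "30-39"},
    {"town": "Atlanta", "age_band": "30-39"},
    {"town": "Atlanta", "age_band": "20-29"},
]

k_town = k_anonymity(records, ["town"])                  # town alone
k_town_age = k_anonymity(records, ["town", "age_band"])  # town plus age band
```

In this toy data the town alone leaves every user in a group of three, while adding a second attribute isolates one user (k = 1). Each further attribute revealed by a query can only keep k the same or shrink it.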
Privacy Problems: Example 2
Where do people live who will buy the Koran soon?
Technical background of the problem:
• A mashup of different data sources
  • Amazon wishlists
  • Yahoo! People (addresses)
  • Google Maps
  each with insufficient k-anonymity, allows for attribute matching and thereby inferences
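Attribute matching across sources can be sketched as a join on shared attributes: when exactly one record in the second source matches, the two records are linked and an inference becomes possible. This is a minimal sketch under stated assumptions; `link_records`, the field names, and all people and addresses are hypothetical, and no real service API is used.

```python
def link_records(left, right, keys):
    """Join two record lists on the given attributes, keeping only
    left records that match exactly one right record (unambiguous links)."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # k = 1 in the second source: re-identified
            linked.append({**matches[0], **row})
    return linked

# Hypothetical wishlist entries (name and city are publicly visible).
wishlists = [
    {"name": "Pat Example", "city": "Springfield", "wishlist_item": "book A"},
    {"name": "Sam Sample", "city": "Shelbyville", "wishlist_item": "book B"},
]
# Hypothetical address directory.
directory = [
    {"name": "Pat Example", "city": "Springfield", "street": "1 Sample Street"},
    {"name": "Sam Sample", "city": "Shelbyville", "street": "2 Test Road"},
    {"name": "Sam Sample", "city": "Shelbyville", "street": "9 Demo Lane"},
]

linked = link_records(wishlists, directory, ["name", "city"])
```

Only the first person is linked to a street address. The second remains ambiguous because two directory entries share the same name and city, which is the protection that a larger k provides.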
Privacy Problems: Example 3
Predicting political affiliation from Facebook profile and link data (1): Most Conservative Traits
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487
Lindamood et al. 09 & Heatherly et al. 09
Predicting political affiliation from Facebook profile and link data (2): Most Liberal Traits per Trait Name
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985
Lindamood et al. 09 & Heatherly et al. 09
"Privacy-preserving Web mining" example:
find patterns, unlink personal data
Volvo S40 website targets people in 20s
• Are visitors in their 20s or 40s?
• Which demographic groups like/dislike the website?
• An example of the "Randomization Approach" to PPDM: R. Agrawal and R. Srikant, "Privacy Preserving Data Mining", SIGMOD 2000.
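Randomization Approach Overview
[Diagram: each user's record (e.g., 30 | 70K | ... and 50 | 40K | ...) passes through a Randomizer, yielding randomized records such as 65 | 20K | ... and 25 | 60K | ... From the randomized records, the distribution of Age and the distribution of Salary are reconstructed and passed to the Data Mining Algorithms, which produce the Model.]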
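The two stages in the overview, randomizing each value and then reconstructing the distribution, can be sketched as below. This is a simplified illustration in the spirit of the iterative Bayesian reconstruction described by Agrawal and Srikant, not their implementation: it assumes Gaussian noise with a publicly known spread, a discretized age range, and a fixed number of iterations instead of a convergence test. All names and the synthetic ages are hypothetical.

```python
import math
import random

def randomize(values, sigma, rng):
    """Perturb each value with zero-mean Gaussian noise of known spread sigma."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def reconstruct(randomized, bin_mids, sigma, iterations=30):
    """Estimate the probability of each bin of the ORIGINAL values from the
    randomized values alone, by repeated Bayesian re-weighting."""
    n, k = len(randomized), len(bin_mids)
    probs = [1.0 / k] * k  # start from a uniform guess
    # Noise likelihood of observing w if the true value sat at bin midpoint a
    # (the Gaussian normalizing constant cancels in the update below).
    likelihood = [
        [math.exp(-((w - a) ** 2) / (2.0 * sigma ** 2)) for a in bin_mids]
        for w in randomized
    ]
    for _ in range(iterations):
        updated = [0.0] * k
        for row in likelihood:
            denom = sum(l * p for l, p in zip(row, probs))
            for j in range(k):
                updated[j] += row[j] * probs[j] / denom  # posterior over bins
        probs = [u / n for u in updated]
    return probs

rng = random.Random(0)
# Synthetic ages clustered around 25 and 45 (made up for illustration).
ages = [rng.choice([25.0, 45.0]) + rng.uniform(-5.0, 5.0) for _ in range(500)]
randomized_ages = randomize(ages, sigma=10.0, rng=rng)
bin_mids = [12.5 + 5.0 * i for i in range(10)]  # midpoints 12.5 ... 57.5
estimated = reconstruct(randomized_ages, bin_mids, sigma=10.0)
```

The miner only ever sees `randomized_ages` and the noise parameters. `estimated` approximates the aggregate age distribution, not any individual's age.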
Seems to work well!
[Chart: Number of People (0 to 1200) by Age (20 to 60), comparing the Original, Randomized, and Reconstructed distributions]
What is collaborative filtering?
"People like what people like them like" – regardless of support and confidence
User-based Collaborative Filtering

• Idea: People who agreed in the past are likely to agree again
• To predict a user's opinion for an item, use the opinion of similar users
• Similarity between users is decided by looking at their overlap in opinions for other items
• Next step: build a model of user types → "global model" rather than "local patterns" as mining result
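The prediction step described in these bullets can be sketched as a similarity-weighted average of other users' ratings. This is a minimal sketch, assuming cosine similarity computed over co-rated items only (one common choice; the slides do not name a specific measure). The user names, items, and ratings are hypothetical.

```python
import math

def similarity(a, b):
    """Cosine similarity of two users' ratings over the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Predict user's rating for item from similar users who rated it."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        s = similarity(ratings[user], their_ratings)
        weighted_sum += s * their_ratings[item]
        weight_total += s
    return weighted_sum / weight_total if weight_total else None

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "d": 1},
}

sim_alice_bob = similarity(ratings["alice"], ratings["bob"])
predicted = predict(ratings, "alice", "d")
```

Because bob agreed with alice on every shared item, his rating of item "d" dominates the prediction, while carol's diverging history counts for less.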
1. Privacy as confidentiality:
"the right to be let alone" – and to hide data
Data
Is this all there is to privacy?
2. Privacy as control:
informational self-determination
Data
Don't do THIS!
• e.g. data privacy: "the right of the individual to decide what information about himself should be communicated to others and under what circumstances" (Westin, 1970)
• behind much of data-protection legislation (see Eleni Kosta's talk)
Discussion item: What is this an example of?
Tracing anonymous edits in Wikipedia http://wikiscanner.virgil.gr/
[Method: Attribute matching]
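For anonymous Wikipedia edits, the matched attribute is the logged IP address, which can be looked up against the address ranges registered to organizations. The lookup below is a minimal sketch of that idea, not WikiScanner's code; `owner_of` and the organization names are hypothetical, and the ranges are reserved documentation addresses.

```python
import ipaddress

def owner_of(ip, ranges):
    """Return the owner of the first range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, owner in ranges:
        if addr in ipaddress.ip_network(network):
            return owner
    return None

# Hypothetical range-to-owner table (documentation address blocks).
ranges = [
    ("192.0.2.0/24", "Example Corp (hypothetical)"),
    ("198.51.100.0/24", "Example Agency (hypothetical)"),
]

# Hypothetical anonymous edits: (article, IP address logged for the edit).
edits = [
    ("Article A", "192.0.2.17"),
    ("Article B", "203.0.113.5"),
]

attributed = [(article, owner_of(ip, ranges)) for article, ip in edits]
```

The first edit is attributed to an organization, although not to a person; the second falls outside every known range and stays unattributed.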
Results (an example)