
Mosaic: Quantifying Privacy Leakage
in Mobile Networks
Ning Xia (Northwestern University)
Han Hee Song (Narus Inc.)
Yong Liao (Narus Inc.)
Marios Iliofotou (Narus Inc.)
Antonio Nucci (Narus Inc.)
Zhi-Li Zhang (University of Minnesota)
Aleksandar Kuzmanovic (Northwestern University)
Scenario
• Different information
• Different services
• Different devices
• ISP A and CSP B, each with addresses IP1 … IPK
• Dynamic IPs across CSPs/ISPs
Problem
Diagram: IP1 … IPK (ISP A) and IP1 … IPK (CSP B) → input packet traces → Tessellation → Mosaic, with the labels "Other research work" and "We are here!"
How much private information can be obtained and
expanded about end users by monitoring network traffic?
Motivation
"I will know everything about everyone!"
Diagram: agencies and bad guys observing the traffic of ISP A and CSP B (IP1 … IPK)
Mobile traffic:
• Relevant: more personal information
• Challenging: frequent IP changes
Challenges
• How can users be tracked when they hop across different IPs?
Sessions: flows (5-tuples) are grouped into sessions
Diagram: sessions on IP1, IP2, and IP3 along their time axes
Traffic markers: identifiers in the traffic that can be used to differentiate users
With traffic markers, it is possible to connect users' true identities to their sessions.
Datasets
Dataset                Source   Description
3h-Dataset             CSP-A    Complete payload
9h-Dataset             CSP-A    Only HTTP headers
Ground Truth Dataset   CSP-B    Payload & RADIUS info.

• 3h-Dataset: main dataset for most experiments
• 9h-Dataset: for quantifying privacy leakage
• Ground Truth Dataset: for evaluating session attribution
  • RADIUS: provides session owners
Methodology Overview
Diagram: traffic from ISP A and CSP B (IP1 … IPK) feeds two stages:
• Network data analysis → Tessellation (mapping from sessions to users):
  • Traffic attribution via traffic markers
  • Traffic attribution via activity fingerprinting
• Mosaic construction, combining network data analysis with web crawling
Combine information from both network data and
OSN profiles to infer the user mosaic.
Traffic Attribution via Traffic Markers
Traffic Markers:
• Identifiers in the traffic that differentiate users
• Key/value pairs from HTTP headers
• User IDs, device IDs, or session IDs

Domain        Keywords                   Category      Source
osn1.com      c_user=<OSN1_ID>           OSN User ID   Cookies
osn2.com      oauth_token=<OSN2_ID>-##   OSN User ID   HTTP header
admob.com     X-Admob-ISU                Advertising   HTTP header
pandora.com   user_id                    User ID       Cookies
google.com    sid                        Session ID    Cookies

How can we select and evaluate traffic markers from network data?
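A minimal sketch of marker extraction, assuming a hand-curated marker table like the one above and requests already parsed into a host, a URL, and a header dictionary. The `MARKERS` list, the `domain#key` naming, and the helper names are hypothetical; the osn2.com OAuth entry is left out to keep the parsing short.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical marker table modeled on the slide: (domain, key, where to look).
MARKERS = [
    ("osn1.com", "c_user", "cookie"),
    ("pandora.com", "user_id", "cookie"),
    ("google.com", "sid", "cookie"),
    ("admob.com", "X-Admob-ISU", "header"),
]


def domain_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    host = host.lower().split(":")[0]  # drop any port
    return host == domain or host.endswith("." + domain)


def extract_markers(host, url, headers):
    """Return {'domain#key': value} for every configured marker in one HTTP request."""
    lower_headers = {k.lower(): v for k, v in headers.items()}
    cookies = SimpleCookie()
    cookies.load(lower_headers.get("cookie", ""))
    query = parse_qs(urlsplit(url).query)
    found = {}
    for domain, key, source in MARKERS:
        if not domain_matches(host, domain):
            continue
        value = None
        if source == "cookie":
            if key in cookies:
                value = cookies[key].value
            elif key in query:  # some services also carry the ID in the URL
                value = query[key][0]
        elif source == "header":
            value = lower_headers.get(key.lower())
        if value:
            found[f"{domain}#{key}"] = value
    return found
```

For example, a request to `www.osn1.com` with the cookie `c_user=12345` yields `{"osn1.com#c_user": "12345"}`.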
Traffic Attribution via Traffic Markers
OSN IDs as Anchors:
• The most popular user identifiers among all services
• Linked to users' public profiles

OSN       Source                 Session Coverage
OSN1 ID   HTTP URL and cookies   1.3%
OSN2 ID   HTTP header            1.0%
(Top 2 OSN providers from North America)

Only 2.3% of sessions contain OSN IDs.
OSN IDs can be used as anchors, but their session coverage is too small.
Traffic Attribution via Traffic Markers
Block Generation: Group Sessions into Blocks
Diagram: sessions on the same IP (IP1) along a time axis, one carrying an OSN ID; nearby other sessions join its block, and a gap ≥ δ ends the block.
Session interval δ:
• Depends on the CSP
• δ = 60 seconds in our study
Block:
• A group of sessions on the same IP within a short period of time
• Traffic markers are shared by all sessions in the same block
99K session blocks were generated from the 12M sessions.
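Block generation can be sketched as follows, assuming each session carries its IP, start and end times, and any markers already extracted. Measuring the gap from the block's latest session end to the next session start is our assumption; the slide only specifies the threshold δ. `Session`, `build_blocks`, and `block_markers` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field

DELTA_SECONDS = 60.0  # session interval δ used in the study; CSP-dependent


@dataclass
class Session:
    ip: str
    start: float  # seconds
    end: float
    markers: dict = field(default_factory=dict)  # {'domain#key': value}


def build_blocks(sessions, delta=DELTA_SECONDS):
    """Group sessions on the same IP into blocks; a gap >= delta starts a new block."""
    by_ip = defaultdict(list)
    for s in sessions:
        by_ip[s.ip].append(s)
    blocks = []
    for group in by_ip.values():
        group.sort(key=lambda s: s.start)
        current, block_end = [group[0]], group[0].end
        for s in group[1:]:
            if s.start - block_end >= delta:
                blocks.append(current)
                current, block_end = [s], s.end
            else:
                current.append(s)
                block_end = max(block_end, s.end)
        blocks.append(current)
    return blocks


def block_markers(block):
    """Markers seen anywhere in a block are shared by all of its sessions."""
    merged = {}
    for s in block:
        merged.update(s.markers)
    return merged
```

Sharing markers across a block is what lets one OSN ID label its marker-less neighbors on the same IP.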
Traffic Attribution via Traffic Markers
Culling the Traffic Markers: OSN IDs are not enough
• Uniqueness: Can the traffic marker differentiate between users?
• Persistency: How long does a traffic marker remain the same?
Figure: uniqueness and persistency scores (y-axis, 0.9 to 1) for traffic markers such as craigslist.org#cl_b, google.com#sid, mydas.mobi#mac-id, mobclix.com#u, pandora.com#user_id, mobclix.com#uid, OSN1 ID, and admob.com#isu
• Uniqueness = 1: no two users share the same google.com#sid value
• Persistency ≈ 1: the value of google.com#sid remains the same for the same user over nearly the entire observation period
We pick 625 traffic markers with uniqueness = 1, persistency >
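One way to score markers, assuming ground-truth user labels (e.g., from RADIUS) for each observed marker value. The exact definitions are assumptions: uniqueness is the fraction of distinct values seen with exactly one user, and persistency is the average share of a user's observations that carry that user's most common value. The persistency threshold is left as a parameter because the slide's cut-off value is not shown; all function names are hypothetical.

```python
from collections import Counter, defaultdict


def uniqueness(observations):
    """observations: list of (user, value) pairs for one marker.
    Fraction of distinct values that were seen with exactly one user."""
    users_per_value = defaultdict(set)
    for user, value in observations:
        users_per_value[value].add(user)
    if not users_per_value:
        return 0.0
    unique = sum(1 for users in users_per_value.values() if len(users) == 1)
    return unique / len(users_per_value)


def persistency(observations):
    """Average share of each user's observations carrying that user's most common value."""
    values_per_user = defaultdict(Counter)
    for user, value in observations:
        values_per_user[user][value] += 1
    if not values_per_user:
        return 0.0
    shares = [c.most_common(1)[0][1] / sum(c.values()) for c in values_per_user.values()]
    return sum(shares) / len(shares)


def select_markers(obs_by_marker, min_persistency):
    """Keep markers with uniqueness == 1 and persistency above the chosen threshold."""
    return sorted(
        m for m, obs in obs_by_marker.items()
        if uniqueness(obs) == 1.0 and persistency(obs) > min_persistency
    )
```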
Traffic Attribution via Traffic Markers
Traffic Attribution: Connecting the Dots
Diagram: Tessellation user Ti spans blocks on IP 1, IP 2, and IP 3, linked by the same OSN ID and by the same traffic marker.
Traffic markers are the key to attributing sessions on
different IP addresses to the same user.
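Connecting the dots amounts to merging blocks that share any marker value. The sketch below uses a union-find structure for this, which is our choice of technique rather than the authors' stated implementation; `tessellate` and its input format (one `{'domain#key': value}` dict per block) are hypothetical.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def tessellate(block_markers):
    """block_markers: list of {'domain#key': value} dicts, one per block.
    Returns groups of block indices; each group is one tessellation user."""
    uf = UnionFind(len(block_markers))
    first_seen = {}  # (marker, value) -> first block index carrying it
    for i, markers in enumerate(block_markers):
        for item in markers.items():
            if item in first_seen:
                uf.union(first_seen[item], i)
            else:
                first_seen[item] = i
    groups = {}
    for i in range(len(block_markers)):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())
```

Transitivity is the point: a block sharing an OSN ID with one block and a google.com#sid value with another joins all three into one user.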
Traffic Attribution via Activity Fingerprinting
• What if a session block has no traffic markers?
Assumption (Activity Fingerprinting):
• Users can be identified from the DNS names of their favorite services
DNS names → service classes and service providers:
• Extracted 54,000 distinct DNS names
• Classified into 21 classes

Service class   Service providers
Search          bing, google, yahoo
Chat            skype, mtalk.google.com
Dating          plentyoffish, date
E-commerce      amazon, ebay
Email           google, hotmail, yahoo
News            msnbc, ew, cnn
Picture         Flickr, picasa
…               …

Activity Fingerprinting:
• Favorite (top-k) DNS names as the user's "fingerprint"
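A sketch of top-k activity fingerprints, assuming per-block lists of queried DNS names. Breaking ranking ties by name, and matching a marker-less block to the user with the highest Jaccard similarity above a hypothetical `min_sim` of 0.5, are assumptions; the slides do not say how fingerprints are compared.

```python
from collections import Counter


def fingerprint(dns_names, k):
    """Top-k most frequent DNS names (ties broken by name) as a frozenset."""
    ranked = sorted(Counter(dns_names).items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(name for name, _ in ranked[:k])


def jaccard(a, b):
    """Set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_block(block_dns, user_fingerprints, k, min_sim=0.5):
    """Return the tessellation user whose fingerprint best matches the block, or None."""
    fp = fingerprint(block_dns, k)
    best_user, best_sim = None, 0.0
    for user, user_fp in user_fingerprints.items():
        sim = jaccard(fp, user_fp)
        if sim > best_sim:
            best_user, best_sim = user, sim
    return best_user if best_sim >= min_sim else None
```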
Traffic Attribution via Activity Fingerprinting
• F_i: the top-k DNS names of a user, used as the "activity fingerprint"
• Y(F_i): uniqueness of the fingerprint
Figure: Y(F_i) on the y-axis (0.9 to 1; closer to 1 means a more distinct fingerprint) vs. normalized fingerprint DNS names on the x-axis (0 to 1, normalized by the total number of DNS names), for k = 4, 5, 6, 7, 8
Mobile users can be identified by the DNS names of their preferred services.
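To reproduce a curve like Y(F_i) for several k, one assumed definition is the fraction of users whose top-k fingerprint is shared by no other user. The sketch below computes that for each k; it does not implement the x-axis normalization, and the function names are hypothetical.

```python
from collections import Counter


def top_k(dns_names, k):
    """Top-k most frequent DNS names (ties broken by name) as a frozenset."""
    ranked = sorted(Counter(dns_names).items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(name for name, _ in ranked[:k])


def fingerprint_uniqueness(fingerprints):
    """Fraction of users whose fingerprint is not shared by any other user."""
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints.values())
    unique = sum(1 for fp in fingerprints.values() if counts[fp] == 1)
    return unique / len(fingerprints)


def uniqueness_by_k(dns_by_user, ks):
    """Uniqueness of top-k fingerprints for each k in ks."""
    return {
        k: fingerprint_uniqueness({u: top_k(names, k) for u, names in dns_by_user.items()})
        for k in ks
    }
```

With two users whose traffic differs only in their third-ranked name, k = 2 gives identical fingerprints (uniqueness 0.0) while k = 3 separates them (1.0), which is the intuition behind larger k yielding more distinct fingerprints.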
Traffic Attribution Evaluation
Legend: session; R_i = RADIUS user (ground truth); T_i = Tessellation user (correct?)
Cases illustrated, comparing Tessellation users T_i, T_j with RADIUS users R_i, R_j: correct (not complete) and not correct.

$$\text{Coverage} = \frac{\text{identified sessions/users}}{\text{total sessions/users}}$$

$$\text{Accuracy on Covered Set} = \frac{\text{correctly identified sessions/users}}{\text{total identified sessions/users}}$$
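The two metrics can be computed at the session level as below, assuming each Tessellation user is labeled with the RADIUS user who owns most of its sessions (a majority-vote rule chosen for illustration, not stated on the slide). `session_metrics` is a hypothetical name; user-level metrics follow the same pattern with users in place of sessions.

```python
from collections import Counter, defaultdict


def session_metrics(truth, attribution):
    """truth: {session: RADIUS user} for all sessions.
    attribution: {session: tessellation user} for the sessions we identified.
    Returns (coverage, accuracy on covered set)."""
    # Label each tessellation user with the RADIUS user owning most of its sessions.
    votes = defaultdict(Counter)
    for sess, t_user in attribution.items():
        votes[t_user][truth[sess]] += 1
    label = {t_user: c.most_common(1)[0][0] for t_user, c in votes.items()}
    correct = sum(1 for sess, t_user in attribution.items() if label[t_user] == truth[sess])
    coverage = len(attribution) / len(truth) if truth else 0.0
    accuracy = correct / len(attribution) if attribution else 0.0
    return coverage, accuracy
```

For example, if 4 of 5 sessions are attributed and 3 of those 4 land with the right RADIUS user, coverage is 0.8 and accuracy on the covered set is 0.75.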
Traffic Attribution Evaluation
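Evaluation Results
Figure: Coverage and Accuracy on Covered Set, for users and sessions, for three methods: OSN ID extraction, via traffic markers, via activity fingerprinting
• Coverage (bar labels): 15.70%, 43.20%, 2.40%, 49.80%, 78.60%, 69.00%
• Accuracy on Covered Set, users (OSN ID extraction / via traffic markers / via activity fingerprinting): 100% / 99.30% / 96.40%
• Accuracy on Covered Set, sessions (same order): 100% / 94.50% / 92.50%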
Construction of User Mosaic
• Mosaic of a real user: information classes with sub-classes (e.g., residence, coordinates, city, state, etc.), shaded from least gain to most gain
MOSAIC with 12 information classes (tesserae):
• Information (education, affiliation, etc.) from OSN profiles
• Information (locations, devices, etc.) from users' network data
Quantifying Privacy Leakage
• Leakage from OSN profiles vs. from network data
Figure: number of users (0 to 15,000) per information class (News_info., Content_exch., Entertainment, Affiliation, E-commerce, Education, Art_culture, Location, Social_actvty, Association, Demographics, Device_info.), split into public OSN profiles only, activity analysis only, and both public OSN profiles & activity analysis
• OSN profiles provide static user information (education, interests)
• Analysis of network data provides real-time activities and locations
• Information from both sides can corroborate each other
Information from OSN profiles and network data can complete
and corroborate each other.
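A mosaic can be represented as a map from information class to values, recording which side each value came from. This sketch uses hypothetical names and an assumed rule that a class is corroborated when both sources report at least one common value; it shows how the two sources complete (union of classes and values) and corroborate each other.

```python
def build_mosaic(osn_profile, network_info):
    """Each input maps an information class to a set of values.
    Returns {class: {"values", "by_source", "corroborated"}}."""
    mosaic = {}
    for source, info in (("osn", osn_profile), ("network", network_info)):
        for cls, values in info.items():
            entry = mosaic.setdefault(cls, {"values": set(), "by_source": {}})
            entry["values"] |= set(values)  # completion: union of both sides
            entry["by_source"][source] = set(values)
    for entry in mosaic.values():
        sides = entry["by_source"]
        # corroboration (assumed rule): both sides report a common value
        entry["corroborated"] = len(sides) == 2 and bool(sides["osn"] & sides["network"])
    return mosaic
```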
Preventing User Privacy Leakage
• Protect traffic markers: traffic markers (OSN IDs, etc.) should be limited and encrypted
• Restrict 3rd parties: third-party applications/developers should be strongly regulated
• Protect user profiles: OSN public profiles should be carefully obfuscated
Conclusions
• The prevalence of OSN use leaves users' true identities exposed in network traffic
• Tracking techniques used by mobile apps and services make traffic attribution easier
• Sessions can be labeled with network users' true identities, even when they contain no identity leaks
• Various types of information can be gleaned to paint a rich digital mosaic of users
Mosaic: Quantifying Privacy Leakage
in Mobile Networks
Thanks!