Social Network Privacy


Lecture 20: Privacy in Online
Social Networks
Xiaowei Yang
• References:
– On the Leakage of Personally Identifiable
Information Via Online Social Networks by
Balachander Krishnamurthy and Craig E. Wills
– Characterizing Privacy in Online Social Networks
by Balachander Krishnamurthy and Craig E. Wills
Problem
• Online social networks are places for users to
share private information
– Personally identifiable information (PII)
• Information that can be used to distinguish or trace an
individual’s identity either alone or when combined
with other information linkable to an individual
• Examples of PII
– Photos
– Status update
• However, this information can be leaked to
unintended parties
Today
• Measurement studies of the importance of the
problem
• PII can be leaked to third-party websites that
make users’ browsing history linkable
• OSN default privacy settings leak PII
Types of private bits in OSNs
Who can see your private bits
USER PRIVACY CONTROLS
• Defaults are dangerous
– By default, information in a user’s Facebook
profile/content, and comments (as on a user’s
“Wall”) are viewable by any other user in the
user’s networks
• Has it changed?
– MySpace uses similar permissive defaults in terms
of access to a user’s information—all users have
access to all other users’ information.
Do users change their defaults?
• A 2005 study found that
– only 1.2% of college Facebook users at CMU
changed the searchability of their thumbnail profile
– 0.06% changed their profile visibility (second row)
• 75% of 200 users in the Facebook London
regional network have their full profile
viewable by other users in the network
Measurement Methodology
• MySpace
– Generated 5000 random numeric userids in an
observed range of valid userids
– Retrieved their corresponding user profiles
• Bebo
– Examined the profiles of users who were members
of interest groups within Bebo
• Facebook
– Join regional networks
• Large and Small
• Geographic diversity
• Linguistic/culture diversity
– Used the random network browsing feature of
Facebook to crawl users’ profiles
• 10 users are displayed
– 200 retrievals for each regional network
• 1600-1700 users
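The MySpace sampling step above could be sketched as follows. This is a minimal illustration, not the study's actual crawler: the ID bounds, the seed, and the function name `sample_userids` are assumptions, and the slides do not say whether IDs were drawn with or without replacement (the sketch draws distinct IDs).

```python
import random


def sample_userids(low, high, count, seed=None):
    """Draw `count` distinct numeric IDs uniformly from [low, high]."""
    rng = random.Random(seed)
    return rng.sample(range(low, high + 1), count)


# Hypothetical ID range; the slides do not give the observed bounds.
candidate_ids = sample_userids(1_000_000, 9_999_999, 5000, seed=20)
```

Not every sampled ID has to belong to a live account, so the number of profiles actually retrieved can be smaller than the number of IDs generated.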
Results
• MySpace
– Obtained profile information for 3851 valid
userids
– 79% (3046) of users retained their default settings
– Profile, friends, comments and user content world viewable.
• Bebo
– 80% of the Bebo users allowed their profile,
friends, comments and user content to be viewable.
Facebook
Observations
• Users in smaller networks are less concerned about
making private information available
• Higher privacy value in profile information than
list of friends
• Wall is the most valuable
– 79% of those with a viewable profile allowed their
Wall to be viewable to anyone in the network for NY
– 83% for Seattle
– 95% for the Worcester region.
USE OF THIRD-PARTY
DOMAINS
Information leakage to 3rd party
domains
• PII is sent to 3rd party domains via HTTP
requests
• Same PII may be sent to the same 3rd party
domains when users browse other websites
– As a result, online history becomes traceable
HTTP Background
• A cookie is a piece of text stored by a web browser
• A cookie is sent as an HTTP header by a web server to
a web browser
• The web browser sends it back unchanged to the server
each time it accesses the server
• A cookie makes web browsing stateful
– HTTP itself is a stateless request/response protocol
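The cookie round trip described above can be sketched with Python's standard `http.cookies` module. The cookie name `session_id`, its value, and the helper `cookie_header_for_request` are made up for illustration; a real browser also tracks domain, path, and expiry, which this sketch ignores.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header (name and value are made up).
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
set_cookie_header = server_cookie.output()


def cookie_header_for_request(set_cookie_line):
    """Browser side: store the cookie and echo it back unchanged."""
    _, _, value = set_cookie_line.partition(": ")
    jar = SimpleCookie()
    jar.load(value)
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return "Cookie: " + pairs


request_header = cookie_header_for_request(set_cookie_header)
```

Because the same value comes back on every later request to that server, the server can tie those requests together, which is what makes browsing stateful.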
HTTP background (cont.)
• An HTTP request contains
– the method to be applied to the resource
– Request-URI (the uniform resource identifier to
the resource)
– The protocol version in use
• Example of a request line containing a Request-URI
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
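As a rough sketch, the three parts of the request line and the headers that follow can be pulled apart like this. The function name `parse_request` is an assumption, and the parser is deliberately minimal (no body handling, no folded headers).

```python
def parse_request(raw):
    """Split a raw HTTP request into method, Request-URI, version, and headers."""
    lines = raw.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, version, headers


raw = "GET /pub/WWW/TheProject.html HTTP/1.1\r\nHost: www.w3.org\r\n\r\n"
method, request_uri, version, headers = parse_request(raw)
```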
HTTP background (cont.)
• Referer is a request header field
• Specifies to the server the address (URI) of the
resource from which the Request-URI was
obtained
– I.e., who asked for the server URI
• Referer allows a server to generate customized
contents
PII in OSNs
Sample of Leakage
• Friendid is associated with the doubleclick
cookie
• Other sites the user browses can be linked to
the friendid
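The linkage described above can be illustrated with a toy model of a third-party server's log. Everything here is hypothetical: the class `ThirdPartyLog`, the `osn.example` and `news.example` URLs, and the assumption that the OSN ID appears as a `friendid` query parameter in the Referer.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def extract_friendid(referer):
    """Return the friendid query parameter of a Referer URL, if present."""
    values = parse_qs(urlparse(referer).query).get("friendid")
    return values[0] if values else None


class ThirdPartyLog:
    """Toy model of what a third-party server can record per tracking cookie."""

    def __init__(self):
        self.pages_by_cookie = defaultdict(list)
        self.id_by_cookie = {}

    def record(self, cookie, referer):
        # Every request carries the tracking cookie and the referring page.
        self.pages_by_cookie[cookie].append(referer)
        friendid = extract_friendid(referer)
        if friendid is not None:
            self.id_by_cookie[cookie] = friendid

    def history_for(self, friendid):
        # All pages seen under any cookie that was ever tied to this OSN ID.
        return [
            page
            for cookie, fid in self.id_by_cookie.items()
            if fid == friendid
            for page in self.pages_by_cookie[cookie]
        ]


log = ThirdPartyLog()
log.record("cookie-A", "http://osn.example/profile?friendid=12345")
log.record("cookie-A", "http://news.example/article/7")
history = log.history_for("12345")
```

One request that carries both the cookie and the ID is enough: every other page logged under that cookie, before or after, becomes attributable to the same OSN account.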
Leakage of OSN IDs
• z.digg.com is a 3rd party advertisement site
Leakage via External Applications
Leakage of pieces of PII
Protection Against PII Leakage
• User actions
– Provide no PII in OSNs
– Filtering HTTP headers
• Referer, Cookie
– Disallow cookies
–…
• Aggregators
– Filtering PII
– Are they going to do it?
• OSNs
– Strip PII from HTTP requests
– A session specific value for UID
• External applications
– Similarly, strip PII from HTTP requests
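Two of the protections listed above, filtering the Referer and Cookie headers and stripping PII from request URLs, might look roughly like this. The parameter names in `PII_PARAMS`, the example URLs, and the rule that "third party" means "different host" are simplifying assumptions, not something the slides specify.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

SENSITIVE_HEADERS = {"referer", "cookie"}
PII_PARAMS = {"friendid", "uid", "email"}  # hypothetical parameter names


def filter_headers(headers, request_host, page_host):
    """Drop Referer and Cookie when the request goes to a different host."""
    if request_host == page_host:
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def strip_pii_from_url(url):
    """Remove query parameters that are assumed to carry PII."""
    parts = urlparse(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in PII_PARAMS
    ]
    return urlunparse(parts._replace(query=urlencode(kept)))


clean_url = strip_pii_from_url("http://osn.example/profile?friendid=12345&tab=wall")
clean_headers = filter_headers(
    {"Referer": "http://osn.example/profile?friendid=12345", "Cookie": "id=1", "Accept": "*/*"},
    request_host="ads.example",
    page_host="osn.example",
)
```

Header filtering is something a user's browser or proxy can do on its own, while URL stripping is more naturally done by the OSN or the external application that generates the links.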
Problem Not Unique to OSNs
• Any site you have an account with can leak PII in the same way
• Examples
– A news site leaks user email addresses to online
aggregators
– A travel site embeds a user’s first name and default
airport in its cookies, and leaks them to any site
hiding in its domain
Conclusion
• Eric Schmidt “If you have something that you
don’t want anyone to know, maybe you
shouldn’t be doing it in the first place.”
• As you click links and browse online, third parties
learn a lot more about you than you might think
Discussion
• What can be done to improve online user
privacy?
– Browser isolation
• Next lecture: privacy-preserving online
advertisements
• Law enforcement?
Lecture 21: Privacy and Online
Advertising
References
• Challenges in Measuring Online Advertising
Systems by Saikat Guha, Bin Cheng, and Paul
Francis
• Serving Ads from localhost for Performance,
Privacy, and Profit by Saikat Guha, Alexey
Reznichenko, Kevin Tang, Hamed Haddadi,
and Paul Francis
Problem
• Online advertising funds many web services
– E.g., all the free stuff we get from Google
• Ad networks gather much user information
• How do they use the user information?
Goals
• Determining how well ad networks target users
Methodology
• Creating two clients representing two different
user types
• Measuring the different ads each client sees
Challenges
• How to compare ads
• How to collect a representative snapshot of ads
• Quantifying the differences
• Avoiding measurement artifacts
Comparing Ads is challenging
• Ads don’t have unique IDs
• A & B are semantically the same, but with
different text
• A & C are different, but with same display URLs
How to decide whether two ads are the same?
• Easy but illegal approach: comparing destination
URLs
– FP (false positive): flagged as equal but actually different
– FN (false negative): actually equal but not flagged
• Display URL has the lowest FNs, so use the display
URL to define ad equality
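A minimal sketch of display-URL equality is given below. The `Ad` fields and the normalization steps (lowercasing, dropping the scheme, a leading `www.`, and a trailing slash) are assumptions made for illustration; the slides only say that the display URL is the comparison key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    title: str
    text: str
    display_url: str


def ad_key(ad):
    """Normalize the display URL so trivial variations compare equal."""
    url = ad.display_url.strip().lower()
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def same_ad(a, b):
    """Treat two ads as equal when their normalized display URLs match."""
    return ad_key(a) == ad_key(b)


ad_a = Ad("Cheap flights", "Book now", "www.Travel.example/")
ad_b = Ad("Low-cost airfare", "Reserve today", "travel.example")
ad_c = Ad("Cheap flights", "Book now", "flights.example")
```

Here `ad_a` and `ad_b` compare equal despite different text, while `ad_c` does not. Two genuinely different ads that share a display URL would still be flagged as equal, which is the false-positive cost of this choice.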
Taking a Snapshot
• More ads exist than can be displayed on any single page
• How to determine all Ads that may be fed to a
user?
– Reload the page multiple times
– But too many reloads may lead to ads churn: old
ads expire, new ads show up
Determining the # of reloads
• Reloads every 5 seconds
• Repeated for 200 queries
• Curve becomes linear after more than 10 reloads
– Ad churn takes over
• Use 10 reloads as the threshold
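The curve mentioned above can be thought of as the cumulative number of distinct ads seen after each reload. A small sketch of that bookkeeping, using made-up ad identifiers and hypothetical function names:

```python
def cumulative_unique(reloads):
    """Number of distinct ads seen after each successive reload."""
    seen = set()
    counts = []
    for ads in reloads:
        seen.update(ads)
        counts.append(len(seen))
    return counts


def new_ads_per_reload(counts):
    """How many previously unseen ads each reload contributed."""
    if not counts:
        return []
    return [counts[0]] + [b - a for a, b in zip(counts, counts[1:])]


# Made-up data: each set holds the ads shown on one reload of the same page.
reloads = [{"a", "b"}, {"b", "c"}, {"c"}, {"d"}]
counts = cumulative_unique(reloads)
gains = new_ads_per_reload(counts)
```

If the per-reload gain settles at a steady nonzero rate instead of dropping to zero, the remaining growth is plausibly churn rather than undiscovered ads, which is the reasoning behind picking a cutoff.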
Quantifying Change
• Metrics
– Jaccard index: $J(A, B) = |A \cap B| / |A \cup B|$
– Extended Jaccard index (cosine similarity)
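Both metrics can be sketched in a few lines. The extended form below is the common weighted generalization x·y / (|x|² + |y|² − x·y); that formula, the function names, and the convention of returning 1.0 for two empty inputs are assumptions, since the slides do not spell them out.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of ad keys."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def extended_jaccard(x, y):
    """Weighted form: x.y / (|x|^2 + |y|^2 - x.y), for dicts of ad -> weight."""
    keys = set(x) | set(y)
    dot = sum(x.get(k, 0.0) * y.get(k, 0.0) for k in keys)
    norm_x = sum(v * v for v in x.values())
    norm_y = sum(v * v for v in y.values())
    denom = norm_x + norm_y - dot
    return dot / denom if denom else 1.0


overlap = jaccard({"a", "b", "c"}, {"b", "c", "d"})
weighted = extended_jaccard({"a": 1.0, "b": 1.0}, {"b": 1.0, "c": 1.0})
```

With all weights equal to 1 the extended form reduces to the plain set-based index, so the two can be compared directly.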
Comparing Effectiveness
• Views: # of page reloads containing the ad
• Value: # of page reloads scaled by the position of
the ad
• Overlap: Jaccard index
Comparing Effectiveness
The winner is
• Weight: log(views) or log(value)
Avoiding artifacts
• Different system parameters may lead to different
ads view
– Browsers used different DNS servers
– Browsers receive different cookies
– HTTP proxy
Analysis
• Configure two or more instances to differ by
one parameter
• Comparing results for
– Search Ads
– Website Ads
– Online Social Network Ads
Search Ads
• A, B: control w/o cookies
• C, D: w/ cookies enabled, seeded w/ different personae
• Google 730 random product-related queries for 5 days
• No obvious behavioral targeting in search ads. Why?
– Keyword based ads bidding
• Location targeting not studied
Website Ads
• Measure 15 websites that show Google ads
• A, B: control in NY
• C: SF; D: Germany
• Location affects web ads
Website Ads
• A, B: control
• C: browse 3 out of 15 websites
• D and E: browse random websites and Google search
random websites
• Google does not use browsing behavior to pick ads
Online social network ads
• Set up three or more Facebook profiles
• A, B: control and identical
• C: differs from A by one profile parameter
Online social network ads
• Use all profile parameters to customize ads
• Age and gender are two primary factors
• Diurnal patterns due to ads churn
– Should it increase or decrease?
• Education and relationship matter less, except
for engaged and non-engaged women
Checking Impact of Sexual
Preference
• Six profiles with different sexual preferences
• Two males interested in females (male control)
• Two females interested in males (female
control)
• One male interested in male
• One female interested in female
Ads differ by sexual preferences
Other results
• Found neutral ads targeted exclusively to gay
men
• Clicking would reveal to the advertiser a
user’s sexual preference
• 66 ads shown exclusively to gay men more
than 50 times during experiments
Summary
• Search ads are largely keyword-based so far
• Website ads use location but probably not
behavior
• Social network ads use all profile attributes to
target users
Question: how can we design a
privacy-preserving online
advertising system?
Goals
• Support online advertising
– A good revenue source to fund online services
• Preserve user privacy
PrivAd
• Serving Ads from a localhost client
• Actors: user, publisher, advertiser, broker, and
dealer
How it works
• Advertisers upload ads to broker
• The user client subscribes with the broker to a set of ads
matching the user’s profile
– The message is encrypted with the broker’s public key and
contains a symmetric key
• The Broker sends filtered ads to the user client
– Ads are encrypted with the symmetric key
• Dealer anonymizes the client’s message to Broker
Ad View/Click Reporting
• When a user views or clicks an ad, the user client sends
a view/click report containing ad ID and
publisher ID to the broker via the dealer
• The dealer attaches a unique report ID, removes
client identity information, and keeps a mapping from
the report ID to the user identity information
Click-fraud defense
• Broker provides dealer the report IDs if it
suspects click-fraud
• The dealer finds the user
• The dealer stops relaying ads to user if convinced
• Questions not answered: how the broker detects
click-fraud, and what the punishment is
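The dealer's bookkeeping for reporting and click-fraud handling might be sketched as below. This is a toy model under stated assumptions, not PrivAd's design: the class and method names are hypothetical, the payload is treated as opaque bytes (encryption is omitted), and the `fraud_threshold` rule for when the dealer is "convinced" is invented because the slides leave that open.

```python
import uuid
from collections import Counter


class Dealer:
    """Toy dealer: anonymizes reports and can map report IDs back to clients."""

    def __init__(self, fraud_threshold=3):
        self.fraud_threshold = fraud_threshold  # hypothetical policy knob
        self._client_by_report = {}
        self.blocked_clients = set()

    def relay_report(self, client_id, payload):
        """Attach a fresh report ID and drop the client identity."""
        report_id = uuid.uuid4().hex
        self._client_by_report[report_id] = client_id
        # Only this anonymized record is forwarded to the broker.
        return {"report_id": report_id, "payload": payload}

    def handle_suspicion(self, report_ids):
        """Look up who sent the suspicious reports and block repeat offenders."""
        counts = Counter(
            self._client_by_report[r] for r in report_ids if r in self._client_by_report
        )
        for client, n in counts.items():
            if n >= self.fraud_threshold:
                self.blocked_clients.add(client)
        return self.blocked_clients


dealer = Dealer(fraud_threshold=2)
r1 = dealer.relay_report("alice", b"opaque-1")
r2 = dealer.relay_report("alice", b"opaque-2")
r3 = dealer.relay_report("bob", b"opaque-3")
blocked = dealer.handle_suspicion([r1["report_id"], r2["report_id"], r3["report_id"]])
```

The broker only ever sees report IDs and payloads, and the dealer only ever sees client identities and opaque payloads, so linking a user to an ad requires the two to cooperate.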
Defining User Privacy
• Unlinkability
– No single player can link the identity of user with
any piece of user’s profile
– No single player can link together more than some
limited number of pieces of personalization
information of a given user
• The dealer learns User A clicks on some ad
• The broker learns someone clicked on ad X
• Not robust to dealer/broker collusion
Scaling PrivAd
• Ads churn is significant
• 2GB/month of compressed ad data
Discussion
• What challenges might PrivAd face in a
practical deployment?