AA - Carnegie Mellon University's Heinz College

Information Revelation and Privacy
in Online Social Networks
Ralph Gross and Alessandro Acquisti
[email protected] [email protected]
Heinz Seminars, October 3rd, 2005
Information revelation and privacy
in online social networks
• Online social networks (OSN): sites that facilitate
interaction between members through their self-published personal profiles
• How much do users of OSN reveal about
themselves online?
– A lot
• To whom?
– Friends and strangers
• Why?
Why?
• Rationality hypothesis: signaling
• Low privacy sensitivity
• Herding behavior
• Peer pressure
• Myopic discounting
• Incomplete information
• …
Privacy, economics, and rationality
1. Incomplete information
2. Bounded rationality
3. Affective processes, psychological/behavioral
deviations from pure rationality model
Our study
• Starts research on privacy implications of OSN
• Provides first quantification of observed behavior
• Studies actual usage data
• Discusses trade-offs and incentives and advances
behavioral hypotheses
– Yet, still preliminary
→ Implications extend beyond OSN domain
Agenda
1. Online social networks
– The Facebook
2. CMU students and the Facebook
i. Usage data
– Patterns of information revelation
– Inferred privacy preferences
→ Risks and trade-offs
ii. User survey (pilot)
– Users’ knowledge and expectations
→ Drivers and incentives
3. Next step
– Experiments
Online Social Networks
What are online social networks?
• Sites that facilitate interaction between members through
their self-published personal profiles
• Common core:
– Through the site, individuals offer representations of their sel[ves] to
others to peruse, with the intention of contacting or being contacted by
others, to meet new friends or dates, find new jobs, receive or provide
recommendations, …
• Progressive diversification and sophistication of purposes
and usage patterns
– Social Software Weblog groups hundreds of social networking sites in
nine categories (business, common interests, dating, face-to-face
facilitation, friends, pets, photos, …)
→ Classifieds <> OSN <> blogs
A history of online
social networks
• 1960s: Plato (University of Illinois)
• 1997: SixDegrees.com
• After 2002: commercial explosion
– Friendster, Orkut, LinkedIn, …,
– Viral growth with participation expanding at rates topping
20% a month
– 7 million Friendster users; 2 million MySpace users; 16
million registered on Tickle to take personality test
(Leonard 2004)
– Revenues: advertising, data trading, subscriptions
– Media attention: Salon, NYT, Wired, …
Research on online
social networks
• boyd (2003): trust and intimacy on OSN
• Donath and boyd (2004): representation of self on
OSN
• Liu and Maes (2005): harvesting OSN for
recommender systems
• (some additional research uses OSN data for other
purposes)
From (social) network theory
to online networks
• Milgram (1967): the small world problem
– Watts (2003): six degrees
• Granovetter (1973, 1983): weak and strong ties
• Milgram (1977): the familiar stranger
→ What about the “unknown buddy”?
Social network theory and privacy
• Strahilevitz (2005):
Discourse about privacy should be based “on what
the parties should have expected to follow the initial
disclosure of information by someone other than the
defendant”
– Consideration of expected information flows within/outside
somebody’s social network should inform that person’s
expectations for privacy
• However, application to online social network
reveals challenges
Online vs offline social networks
1. Offline: extremely diverse ties. Online: simplistic
binary relations (boyd 2004)
2. Number of strong ties not significantly increased,
but number of weak ties can increase substantially
(Donath and boyd 2004)
– From a dozen intimate ties plus 1000 to 1700
“acquaintances,” to hundreds of direct “friends” and
hundreds of thousands of relations
Hence:
• Online social networks are vaster and have more
weak ties than offline social networks
→ An imagined community?
• Anderson (1991)
• Intimacy and trust
– Sharing the same personal information with a large and
potentially unknown number of friends and strangers
• Intimate with everybody? (Gerstein 1984)
→ Ability to meaningfully interact with others is mildly
augmented, while ability of others to access the person is
significantly enlarged
Online social networks and
personal information
1. Pretense of identifiability changes across different types of
sites
Anonymous <> Pseudonymous <> Fully identified
2. Type of information revealed or elicited often orbits around
hobbies and interests, but can stride from there in different
directions
– From classified to journals
3. Visibility of information is highly variable
– Members only
– Everybody
Online social networks and privacy
• Privacy implications of OSN depend on the level of
identifiability of the information provided, its possible
recipients, and its possible uses
– Re-identification
• Two directions: known>additional information; unknown>known
– To whom may identifiable information be made available?
• Site, third-parties (hackers, government), users (little control on
social network and its expansion)
– Risks
• From identity theft to online and physical stalking; from
embarrassment and blackmailing to spam and price discrimination
Online social networks and privacy
• And yet:
– OSN can also offer tools to address online privacy
problems
– “Social networking has the potential to create an intelligent
order in the current chaos by letting you manage how
public you make yourself and why and who can contact
you.” Tribe.net CEO Mark Pincus
→ Is that true?
The Facebook
The Facebook
• www.facebook.com
• Started February 2004
– Attracted Silicon Valley funding
• Has spread to 2000 schools and 4.2 million users
• Typically attracts 80 percent of a school’s
undergraduate population
– Also gets graduate students, faculty members, staff, and
alumni
• Now targeting high schools
• Growing media attention
Facebook’s privacy policy
• …is lax, but straightforwardly so:
“Facebook also collects information about you from other sources, such as
newspapers and instant messaging services. This information is gathered
regardless of your use of the Web Site.”
…
“We use the information about you that we have collected from other sources
to supplement your profile unless you specify in your privacy settings that
you do not want this to be done.”
…
“In connection with these offerings and business operations, our service
providers may have access to your personal information for use in
connection with these business activities.”
Facebook and
unique privacy issues
• Unique data
– Includes home location, current location (from IP address),
etc.
• Uniquely identified
– College email account
– Contact information
• Ostensibly bounded community
– “Shared real space”
→ …or imagined community?
CMU students and the Facebook: usage data
Studies
• Gross and Acquisti, Proceedings of WPES 2005
• Acquisti and Gross, Proceedings of PET 2006
Data gathering
• In June 2005, we created Facebook profiles with different
characteristics
– E.g., degree of connectedness, geographical location, …
• We searched for CMU Facebook members’ profiles using
advanced search feature and extracted profile IDs
– Downloaded profiles
– Inferred additional information not immediately visible from profiles
Demographics
Information revelation
• Male users are 63% more likely to leave their phone
number than female users
• Single male users tend to report their phone
numbers at even higher frequencies
Data verifiability
Privacy risks
• Stalking
• Re-identification
• Digital dossier
Privacy risks: Stalking
• Real-World Stalking
– College life centers around class attendance
– Facebook users put home address and class list on their
profiles; whereabouts are known for large portions of the day
• Online stalking
– Facebook profiles list AIM screennames
– AIM lets users add “buddies” without notification
– Unless AIM privacy settings have been changed, adversary can
track when user is online
Privacy risks: Re-identification
• Demographics re-identification
• 87% of US population is uniquely identified by {gender, ZIP,
date of birth} (Sweeney, 2001)
• Facebook users who put this information on their profiles
could be linked to outside, de-identified data sources
• Face re-identification
• Facebook profiles often show high quality facial images
• Images can be linked to de-identified profiles on e.g.
Match.com or Friendster.com using face recognition
• Social Security Number re-identification
• Anatomy of a social security number: xxx yy zzzz
• Based on hometown and date of birth xxx and yy can be
narrowed down substantially
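The demographics re-identification risk above can be illustrated with a minimal sketch. Everything in it is hypothetical: the records, the field layout, and the helper names `unique_fraction` and `link` are invented for illustration and are not the authors' data or method. It only shows the general idea that a {gender, ZIP, date of birth} combination held by a single person acts as a join key between a named profile and a "de-identified" record.

```python
from collections import Counter

# Hypothetical named profiles: (name, gender, ZIP, date of birth)
profiles = [
    ("alice", "F", "15213", "1985-03-14"),
    ("bob",   "M", "15213", "1984-11-02"),
    ("carol", "F", "15217", "1985-03-14"),
    ("dave",  "M", "15213", "1984-11-02"),
]

# Hypothetical "de-identified" records: names removed, demographics kept
deidentified = [
    ("F", "15213", "1985-03-14", "record A"),
    ("M", "15213", "1984-11-02", "record B"),
]


def unique_fraction(rows):
    """Fraction of named rows whose (gender, ZIP, DOB) triple is unique."""
    counts = Counter((g, z, d) for _, g, z, d in rows)
    unique = sum(1 for _, g, z, d in rows if counts[(g, z, d)] == 1)
    return unique / len(rows)


def link(named_rows, anonymous_rows):
    """Attach a name to each anonymous row whose triple matches exactly one profile."""
    counts = Counter((g, z, d) for _, g, z, d in named_rows)
    by_key = {
        (g, z, d): name
        for name, g, z, d in named_rows
        if counts[(g, z, d)] == 1
    }
    return [
        (by_key[(g, z, d)], payload)
        for g, z, d, payload in anonymous_rows
        if (g, z, d) in by_key
    ]


print(unique_fraction(profiles))     # share of profiles with a unique triple
print(link(profiles, deidentified))  # anonymous records that can be named
```

In this toy data, "bob" and "dave" share the same triple, so the record matching them stays ambiguous, while a record matching a unique triple is linked to a name.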
Privacy risks: Digital Dossier
• Users reveal sensitive information (e.g. current partners,
political views) in profiles
• Simple script programs allow adversaries to
continuously retrieve and save all profile information
• Cheap hard drives enable essentially indefinite storage
Data accessibility
• Profile Searchability
– We measured the percentage of users that changed search
default setting away from being searchable to everyone on the
Facebook to only being searchable to CMU users
– 1.2% of users (18 female, 45 male) made use of this privacy
setting
• Profile Visibility
– We evaluated the number of CMU users that changed profile
visibility by restricting access from unconnected users
– Only 3 profiles (0.06%) in total fall into this category
• Caveat: We would not detect users who had made themselves
both unsearchable and invisible within CMU network (safe to
assume their number is very low)
Actual data accessibility:
An imagined community?
• Extensive, uncontrolled social networks
• Fragile protection:
– Fake email addresses
– Manipulating users
– Geographical location
– Advanced search features
• Using advanced search features, various profile information can be
searched for, e.g. relationship status, phone number, sexual
preferences, political views and (college) residence
• By keeping track of the profile IDs returned in the different
searches, a significant portion of the previously inaccessible
information can be reconstructed
– AIM
→ Facebook profiles are, effectively, public data
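The reconstruction idea behind the advanced search point can be sketched abstractly as inverting a set of search results. This is only an illustration under assumptions: the `search_results` mapping, the attribute names, the profile IDs, and the function `reconstruct` are all hypothetical, and no real site interface is modeled. Each search for an attribute value returns a set of profile IDs, and merging those sets per ID rebuilds attribute values that the profile pages themselves do not show.

```python
from collections import defaultdict

# Hypothetical search results: (attribute, value) -> set of profile IDs returned
search_results = {
    ("relationship_status", "single"): {101, 102},
    ("relationship_status", "in a relationship"): {103},
    ("political_views", "liberal"): {101, 103},
    ("residence", "Example Hall"): {102},
}


def reconstruct(results):
    """Invert per-value search results into per-profile attribute records."""
    profiles = defaultdict(dict)
    for (attribute, value), profile_ids in results.items():
        for profile_id in profile_ids:
            profiles[profile_id][attribute] = value
    return dict(profiles)


for profile_id, attributes in sorted(reconstruct(search_results).items()):
    print(profile_id, attributes)
```

The design point is that hiding a profile page does little if the same attributes remain queryable: the search index leaks one attribute value per query.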
Actual data accessibility:
An imagined community
• “What a great illustration of how things you might
not mind being public in one context can cause
all sorts of problems when they wind up globally
public.”
– CMU student
Initial hypotheses
• Default settings (Mackay 1991)/ Myopic discounting?
– Less than 2% make their profiles less searchable
– Less than 1% make their profiles less visible
• Peer pressure
• Incomplete information and biased perspectives
– An imagined community
• Or simply:
– Low privacy concerns
– Signaling
• Single males list their phone number far more frequently
than females (highly significant difference)
User survey (pilot)
(Pilot) Survey
• Goals
– Understand CMU Facebook users’ degree of awareness about
the site and its information revelation patterns; understand their
privacy attitudes and expectations
• Thirty-six online questions
• Anonymous, paid
• Pilot
– 50 subjects
– Focused on Facebook users
• Survey link
CAVEAT:
The following results are based on our pilot test (50 subjects).
Hence they must only be considered suggestive trends rather
than robust evidence. We are now exploring the same
questions in the full survey – please contact us for the most
recent results: [email protected].
Generic concerns
(7-point Likert scale)
[Figure: density plots of responses. Panels: State of the economy; Threats of terrorism; Threats to personal privacy; Global warming; Same-sex marriage; Permeable borders; US vetoes global warming regulations]
Specific concerns
(7-point Likert scale)
[Figure: density plots of responses. Panels: Stranger knows address; Friend of friend knew contact information; Partners info]
Attitudes vs. behavior
• Share of users with high sensitivity (Likert >5) to
partner/sexual orientation information who provide
it on Facebook: ~70%
• Share of users with high sensitivity (Likert >5) to
home location and class schedule information who
provide it on Facebook: ~32%
• Share of users with high sensitivity (Likert >5) to
contact information who provide it on Facebook:
~42%
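The shares above follow a simple recipe: keep only respondents whose concern rating exceeds 5, then compute the fraction of those who still provide the information. A minimal sketch of that arithmetic, using made-up responses (the field names `concern` and `reveals` and the function `share_revealing` are assumptions, not the survey's actual coding):

```python
def share_revealing(responses, threshold=5):
    """Among respondents with concern > threshold, the share who reveal the item.

    Returns None when nobody exceeds the threshold.
    """
    sensitive = [r for r in responses if r["concern"] > threshold]
    if not sensitive:
        return None
    return sum(1 for r in sensitive if r["reveals"]) / len(sensitive)


# Hypothetical responses: Likert concern rating (1-7) and whether the
# respondent's profile shows the corresponding information
responses = [
    {"concern": 7, "reveals": True},
    {"concern": 6, "reveals": True},
    {"concern": 6, "reveals": False},
    {"concern": 3, "reveals": True},
    {"concern": 5, "reveals": False},
]

share = share_revealing(responses)
print(f"{share:.0%} of highly concerned respondents still reveal the item")
```

Note that a rating of exactly 5 is excluded, matching the "Likert >5" cutoff stated in the bullets.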
Awareness:
visibility and searchability
• 21% incorrectly believe only CMU users can
search their profiles
• 71% do not realize that everybody at UPitt can
search their profiles
• 40% do not realize that anybody on Facebook
can search their profiles
• 31% do not realize that everybody at CMU can
read their profiles
• On the other side, 23% incorrectly believe that
everybody on Facebook can read their profiles
Facebook’s privacy policy,
revisited
“Facebook also collects information about you from other sources, such as
newspapers and instant messaging services. This information is gathered
regardless of your use of the Web Site.”
• 85% believe that is not the case
“We use the information about you that we have collected from other sources
to supplement your profile unless you specify in your privacy settings that
you do not want this to be done.”
• 87% believe that is not the case
“In connection with these offerings and business operations, our service
providers may have access to your personal information for use in
connection with these business activities.”
• 60% believe that is not the case
• Control: perusal of privacy policy does not improve awareness
Privacy concerns
• 69% believe that the information other
Facebook users reveal may create privacy
risks for those users
• But:
[Figure: density plot of responses to “Are you concerned about your personal privacy on the Facebook?”]
Information revelation
• Reasons to provide more personal information
(in order of importance):
1. No factor in particular, it's just fun
2. No factor in particular, but the amount of information I reveal is
necessary to me and other users to benefit from the
FaceBook
3. No factor in particular, rather I am following the norms and
habits common on the site
4. Quite simply, expressing myself and defining my online
persona
5. Showing more information about me to "advertise" myself
…
– Getting more potential dates
Other privacy concerns
• Reasons for low privacy concerns (in order of
importance):
1. Control on information
2. Control on access
3. CMU environment
4. Student environment
…
Other privacy concerns
• Does your Facebook profile contain
information that you might not mind being
"public" within your Facebook or CMU
network, but that would indeed bother you if
other people could access it (e.g., family,
interviewers, etc.)?
– 50% answer yes
Is it possible/likely?
[Figure: density plots of responses. Panels: Possible; Likely (graphs by q31)]
Next steps
• Full survey
– Users and non-users: different privacy sensitivities?
• Experiments
– Control for initial privacy settings
– Control for perception of other users’ information patterns
– Control for perception of other users’ information revelation
• Other scripts
– Study evolution of a new network
– Study dynamics of information revelation
Conclusions
• OSN offer exciting ground for privacy research
– Plenty of information revelation
– Alternative explanations
– Actual usage data
• The unknown buddy?
• An imagined community?
Conclusions
• Facebook users claim, in general, to be concerned
about their privacy but
– Publish plenty of personal information
– Do not use privacy enhancing features
• However, they are both
– …uninformed about specific information revelation patterns
– … aware of generic possibilities
• Suggestive evidence pointing towards:
– Signaling, but also
– Myopic discounting
– Incomplete information