Lecture27 - The University of Texas at Dallas


UT DALLAS
Erik Jonsson School of Engineering & Computer Science
Data and Applications Security
Security and Privacy in
Online Social Networks
Murat Kantarcioglu
Bhavani Thuraisingham
Thanks to Raymond Heatherly and Barbara Carminati for
helping in slide preparations
April 2012
FEARLESS engineering
Outline
• Introduction to Social Networks
• Properties of Social Networks
• Social Network Analysis Basics
• Data Privacy Basics
• Privacy and Social Networks
• Access control issues for Online Social Networks
Social Networks
• Social networks have important implications for our daily lives.
– Spread of Information
– Spread of Disease
– Economics
– Marketing
• Social network analysis could be used for many activities
related to information and security informatics.
– Terrorist network analysis
Enron Social Graph*
* http://jheer.org/enron/
Romantic Relations at “Jefferson High School”
Emergence of Online Social Networks
• Online social networks have become increasingly popular.
• Example: Facebook*
– Facebook has more than 200
million active users.
– More than 100 million users
log on to Facebook at least
once each day
– More than two-thirds of
Facebook users are outside of
college
– The fastest growing
demographic is those 35 years
old and older
*http://www.facebook.com/press/info.php?statistics
Properties of Social Networks
• “Small-world” phenomenon
– Milgram asked participants to pass a letter to one of their
close contacts in order to get it to an assigned individual
– Most of the letters were lost (~75% of the letters)
– The letters that reached their destination passed through only about six people.
– Origin of the “six degrees of separation” idea
– Mean geodesic distance $\ell$ of graphs grows logarithmically, or even more slowly, with the network size $n$ ($d_{ij}$ is the shortest distance between nodes $i$ and $j$):
$$\ell = \frac{2}{n(n-1)} \sum_{i<j} d_{ij}$$
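The mean geodesic distance can be checked on a toy graph. The sketch below assumes an undirected, connected, unweighted graph and averages the shortest-path length over the n(n-1)/2 distinct node pairs; the function names and the four-node path graph are illustrative, not part of the lecture.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop counts of shortest paths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj):
    """Average d_ij over all unordered pairs of a connected, undirected graph."""
    nodes = sorted(adj)
    n = len(nodes)
    total = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        # Only look at later nodes so that each unordered pair is counted once.
        total += sum(dist[v] for v in nodes[i + 1:])
    return 2 * total / (n * (n - 1))


# Hypothetical path graph a - b - c - d (pair distances 1, 2, 3, 1, 2, 1).
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
mean_distance = mean_geodesic_distance(path_graph)
```

For the path graph the six pair distances sum to 10, so the mean is 10/6. One BFS per node costs O(n(n+m)) overall, which is fine for small graphs but would need sampling on a network with hundreds of thousands of nodes.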
“Small-World” Example: Six Degrees of
Kevin Bacon
Properties of Social Networks
• Degree Distribution
• Clustering
• Other important properties
– Community Structure
– Assortativity
– Clustering Patterns
– Homophily
– …
• Many of these properties could be used for
analyzing social networks.
Social Network Mining
• Social network data is represented as a graph
– Individuals are represented as nodes
• Nodes may have attributes to represent personal traits
– Relationships are represented as edges
• Edges may have attributes to represent relationship
types
• Edges may be directed
• Common Social Network Mining tasks
– Node classification
– Link Prediction
Data Privacy Basics
• How to share data without violating privacy?
• Meaning of privacy?
– Identity disclosure
– Sensitive Attribute disclosure
• Current techniques for structured data
– K-anonymity
– L-diversity
– Secure multi-party computation
• Problem: Publishing private data while, at the same
time, protecting individual privacy
• Challenges:
– How to quantify privacy protection?
– How to maximize the usefulness of published data?
– How to minimize the risk of disclosure?
– …
Sanitization and Anonymization
• Automated de-identification of private data with certain privacy guarantees
– As opposed to the “formal determination by statisticians” requirement of HIPAA
• Two major research directions
1. Perturbation (e.g., random noise addition)
2. Anonymization (e.g., k-anonymization)
• Removing unique identifiers is not sufficient
• Quasi-identifier (QI)
– Maximal set of attributes that could help identify individuals
– Assumed to be publicly available (e.g., voter registration lists)
• As a process
1. Remove all unique identifiers
2. Identify QI-attributes, model adversary’s background knowledge
3. Enforce some privacy definition (e.g., k-anonymity)
Re-identifying “anonymous” data (Sweeney ’01)
• 37 US states mandate
collection of information
• She purchased the voter
registration list for
Cambridge Massachusetts
– 54,805 people
• 69% unique on postal code and birth date
• 87% US-wide with all three (postal code, birth date, and sex)
• Solution: k-anonymity
– Any combination of values appears at least k times
• Developed systems that guarantee k-anonymity
– Minimize distortion of results
k-Anonymity
• Each released record should be indistinguishable from at least (k-1) others on its QI attributes
• Alternatively: the cardinality of any query result on the released data should be at least k
• k-anonymity is one (the first) of many privacy definitions in this line of work
– l-diversity, t-closeness, m-invariance, delta-presence...
• Complementary Release Attack
– Different releases can be linked together to compromise k-anonymity.
– Solution:
• Consider all of the released tables before releasing a new one, and try to avoid linking.
• Other data holders may release data that can be used in this kind of attack. In general, this kind of attack is hard to prevent completely.
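The k-anonymity condition on quasi-identifiers can be stated as a short check. This is a minimal sketch, assuming the released table is a list of dictionaries; the column names and the generalized values are hypothetical and are not taken from the lecture.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QI values occurs in at least k records."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical released table with generalized quasi-identifiers.
released = [
    {"zip": "750**", "birth_year": "198*", "sex": "F", "diagnosis": "flu"},
    {"zip": "750**", "birth_year": "198*", "sex": "F", "diagnosis": "asthma"},
    {"zip": "752**", "birth_year": "197*", "sex": "M", "diagnosis": "flu"},
    {"zip": "752**", "birth_year": "197*", "sex": "M", "diagnosis": "diabetes"},
]
QI = ["zip", "birth_year", "sex"]

two_anonymous = is_k_anonymous(released, QI, 2)    # each QI group has 2 records
three_anonymous = is_k_anonymous(released, QI, 3)  # no group reaches 3 records
```

The check only inspects a single release. It says nothing about the complementary release attack, where two tables that are each k-anonymous are joined on shared attributes.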
L-diversity principles
• L-diversity principle: A q-block is l-diverse if it contains at least l “well-represented” values for the sensitive attribute S. A table is l-diverse if every q-block is l-diverse.
• l-diversity may be difficult and unnecessary to achieve.
– Consider a single sensitive attribute with two values: HIV positive (1%) and HIV negative (99%)
– The two values have very different degrees of sensitivity
• l-diversity is unnecessary to achieve
– 2-diversity is unnecessary for an equivalence class that contains only negative records
• l-diversity is difficult to achieve
– Suppose there are 10,000 records in total
– To have distinct 2-diversity, there can be at most 10,000 × 1% = 100 equivalence classes
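The sketch below checks distinct l-diversity, the simplest reading of “well-represented” (at least l different sensitive values per q-block), and reproduces the 100-class bound from the HIV example. The function names and the toy table are assumptions made for illustration.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every q-block holds at least l distinct sensitive values."""
    blocks = defaultdict(set)
    for r in records:
        blocks[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in blocks.values())


def max_two_diverse_blocks(total_records, minority_percent):
    """Each 2-diverse block needs at least one minority record, so the number
    of minority records bounds the number of blocks."""
    return total_records * minority_percent // 100


# Hypothetical table: block A mixes both values, block B is all negative.
table = [
    {"block": "A", "hiv": "positive"},
    {"block": "A", "hiv": "negative"},
    {"block": "B", "hiv": "negative"},
    {"block": "B", "hiv": "negative"},
]
table_is_2_diverse = is_distinct_l_diverse(table, ["block"], "hiv", 2)
bound = max_two_diverse_blocks(10000, 1)
```

Block B fails the check even though it discloses nothing sensitive, which is the “unnecessary” case above; the bound of 100 blocks for 10,000 records is the “difficult” case.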
Privacy Preserving Distributed Data Mining
• Goal of data mining is summary results
– Association rules
– Classifiers
– Clusters
• The results alone need not violate privacy
– Contain no individually identifiable values
– Reflect overall results, not individual organizations
• The problem is computing the results without access to the data!
• Data needed for data mining may be distributed among parties
– Credit card fraud data
• Inability to share data due to privacy reasons
– HIPAA
• Even partial results may need to be kept private
Secure Multi-Party Computation (SMC)
• The goal is computing a function $f(x_1, x_2, \ldots, x_n)$ without revealing $x_i$
• Semi-Honest Model
– Parties follow the protocol
• Malicious Model
– Parties may or may not follow the protocol
• We cannot do better than the ideal situation in which a trusted third party exists
• Generic SMC is too inefficient for PPDDM
– Enhancements being explored
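As one concrete instance of computing f without revealing the inputs, the sketch below simulates a secure sum with additive secret sharing in the semi-honest model. The protocol choice, the modulus, and the function names are assumptions for illustration; the lecture does not prescribe a protocol, and a real deployment would run each party on a separate machine over secure channels.

```python
import secrets

# Assumed public modulus; it must exceed any sum the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n random-looking shares that add up to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate the protocol in one process and return the sum of the inputs."""
    n = len(private_inputs)
    # Row i holds the shares party i sends out; column j is what party j receives.
    sent = [make_shares(x, n) for x in private_inputs]
    # Each party publishes only the sum of the shares it received.
    partial = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial) % MODULUS


total = secure_sum([12, 7, 30])
```

Each published partial sum mixes one share from every party, so a single semi-honest party learns the total but not another party's input. Colluding parties, or a party that deviates from the protocol, are outside what this sketch protects against.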
Graph Model
Lindamood et al. 09 &
Heatherly et al. 09
• Graph represented by a set of homogeneous
vertices and a set of homogeneous edges
• Each node also has a set of Details, one of
which is considered private.
Naïve Bayes Classification
Lindamood et al. 09 &
Heatherly et al. 09
• Classification based only on specified
attributes in the node
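A details-only classifier can be sketched as follows. This is a simplified naive Bayes that scores only the details present on a node and uses add-one smoothing; the smoothing, the data layout, and the detail names are assumptions for illustration and are not the exact model of Lindamood et al. or Heatherly et al.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_nodes):
    """labeled_nodes: list of (set_of_details, class_label) pairs."""
    class_counts = Counter(label for _, label in labeled_nodes)
    detail_counts = defaultdict(Counter)
    for details, label in labeled_nodes:
        for d in details:
            detail_counts[label][d] += 1
    return class_counts, detail_counts


def predict(model, details):
    """Return the class maximizing log P(c) + sum of log P(d | c)."""
    class_counts, detail_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for d in details:
            # Add-one smoothing keeps unseen details from zeroing the product.
            score += math.log((detail_counts[label][d] + 1) / (n_c + 2))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training nodes with made-up details and class labels.
training = [
    ({"group:a", "music:x"}, "class1"),
    ({"group:a"}, "class1"),
    ({"group:b"}, "class2"),
    ({"group:b", "music:x"}, "class2"),
]
model = train_naive_bayes(training)
guess = predict(model, {"group:a"})
```

Working in log space avoids underflow when a profile lists many details, which matters on a data set with more than a hundred thousand unique traits.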
Naïve Bayes with Links
Lindamood et al. 09 &
Heatherly et al. 09
• Rather than calculate the probability of a link from
person nx to ny, we calculate the probability of
a link from nx to a person with ny’s traits
Link Weights
Lindamood et al. 09 &
Heatherly et al. 09
• Links also have associated weights
• Represents how ‘close’ a friendship is
suspected to be using the following formula:
Collective Inference
Lindamood et al. 09 &
Heatherly et al. 09
• Collection of techniques that use node
attributes and the link structure to refine
classifications.
• Uses local classifiers to establish a set of
priors for each node
• Uses traditional relational classifiers as the
iterative step in classification
Relational Classifiers
Lindamood et al. 09 &
Heatherly et al. 09
• Class Distribution Relational Neighbor
• Weighted-Vote Relational Neighbor
• Network-only Bayes Classifier
• Network-only Link-based Classification
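To make the iterative step concrete, here is a minimal sketch of a weighted-vote relational neighbor update applied repeatedly: each unlabeled node replaces its class distribution with the link-weighted average of its neighbors' distributions, starting from priors supplied by a local classifier. It is a simplified stand-in for relaxation labeling; the data layout, the fixed iteration count, and the toy graph are assumptions.

```python
def wvrn_step(neighbors, probs, labeled):
    """One synchronous update of every unlabeled node's class distribution."""
    updated = {}
    for node, dist in probs.items():
        if node in labeled or not neighbors.get(node):
            # Known labels and isolated nodes keep their current distribution.
            updated[node] = dict(dist)
            continue
        totals = {c: 0.0 for c in dist}
        weight_sum = 0.0
        # Link weights are assumed to be positive.
        for other, w in neighbors[node].items():
            weight_sum += w
            for c in dist:
                totals[c] += w * probs[other][c]
        updated[node] = {c: totals[c] / weight_sum for c in dist}
    return updated


def collective_inference(neighbors, priors, labeled, iterations=10):
    """Refine the local-classifier priors by repeated neighbor voting."""
    probs = {n: dict(d) for n, d in priors.items()}
    for _ in range(iterations):
        probs = wvrn_step(neighbors, probs, labeled)
    return probs


# Hypothetical graph: u is unlabeled, a and b have known classes.
neighbors = {"u": {"a": 3.0, "b": 1.0}, "a": {"u": 3.0}, "b": {"u": 1.0}}
priors = {
    "a": {"c1": 1.0, "c2": 0.0},
    "b": {"c1": 0.0, "c2": 1.0},
    "u": {"c1": 0.5, "c2": 0.5},
}
result = collective_inference(neighbors, priors, labeled={"a", "b"})
```

Node u ends at 0.75 for c1, because the link to a carries three times the weight of the link to b. The other relational classifiers in the list would replace only the body of the update step.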
Experimental Data
Lindamood et al. 09 &
Heatherly et al. 09
• 167,000 profiles from the Facebook online
social network
• Restricted to public profiles in the Dallas/Fort
Worth network
• Over 3 million links
General Data Properties
Lindamood et al. 09 &
Heatherly et al. 09
Property | Value
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | .45
Probability Conservative | .55
Inference Methods
Lindamood et al. 09 &
Heatherly et al. 09
• Details only: Uses Naïve Bayes classifier to
predict attribute
• Links Only: Uses only the link structure to
predict attribute
• Average: Classifies based on an average of
the probabilities computed by Details and
Links
Predicting Private Details
Lindamood et al. 09 &
Heatherly et al. 09
• Attempt to predict the value of the political
affiliation attribute
• Three Inference Methods used as the local
classifier
• Relaxation labeling used as the Collective
Inference method
Removing Details
Lindamood et al. 09 &
Heatherly et al. 09
• Ensures that no ‘false’ information is added to
the network; all details in the released graph
were entered by the user
• Details that have the highest global
probability of indicating political affiliation
are removed from the network
Removing Links
Lindamood et al. 09 &
Heatherly et al. 09
• Ensures that the link structure of the released
graph is a subset of the original graph
• From each node, removes the links to the
neighbors that are most like the current node
Most Liberal Traits
Lindamood et al. 09 &
Heatherly et al. 09
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927
Group | buck fush | 27.05782866
Most Conservative Traits
Lindamood et al. 09 &
Heatherly et al. 09
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487
Most Liberal Traits per Trait Name
Lindamood et al. 09 &
Heatherly et al. 09
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985
Experiments
Lindamood et al. 09 &
Heatherly et al. 09
• Conducted on 35,000 nodes which recorded
political affiliation
• Tests removing 0 details and 0 links, 10
details and 0 links, 0 details and 10 links, and
10 details and 10 links
• Varied Training Set size from 10% of
available nodes to 90%
Local Classifier Results
Lindamood et al. 09 &
Heatherly et al. 09
Collective Inference Results
Lindamood et al. 09 &
Heatherly et al. 09
Online Social Networks Access Control
Issues
• Current access control systems for online
social networks are either too restrictive or
too loose
– “selected friends”
• Bebo, Facebook, and Multiply
– “neighbors” (i.e., the set of users having musical preferences and tastes similar to mine)
• Last.fm
– “friends of friends”
• Facebook, Friendster, Orkut
– “contacts of my contacts” (2nd degree contacts), “3rd” and “4th degree contacts”
• Xing
Challenges
I want only my
family and close
friends to see this
picture.
Requirements
• Many different online social networks with different
terminology
– Facebook vs. LinkedIn
• We need to have flexible models that can represent
– User’s profiles
– Relationships among users
• (e.g. Bob is Alice’s close friend)
– Resources
• (e.g., online photo albums)
– Relationships among users and resources
• (e.g., Bob is the owner of the photo album and Alice is tagged in
this photo),
– Actions (e.g., post a message on someone’s wall).
Overview of the Solution
• We use semantic web technologies (e.g.,
OWL) to represent the social network knowledge
base.
• We use the Semantic Web Rule Language (SWRL)
to represent various security, admin and filter
policies.
Modeling User Profiles and Resources
• Existing ontologies such as FoAF could be
extended to capture user profiles.
• Relationships among resources could be
captured by using OWL concepts
– PhotoAlbum rdfs:subClassOf Resource
– PhotoAlbum consistsOf Photos
Modeling Relationships Among Users
• We model relationships among users by defining an N-ary
relationship
– :Christine
    a :Person ;
    :has_friend _:Friendship_Relation_1 .
  _:Friendship_Relation_1
    a :Friendship_Relation ;
    :Friendship_trust :HIGH ;
    :Friendship_value :Mike .
• OWL reasoners cannot be used to infer some relationships,
such as Christine being a third-degree friend of John.
– Such computations need to be done separately and represented
by using a new class.
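The separate computation of n-th degree friendship can be sketched as a breadth-first search over the friendship links. The adjacency-dictionary layout and the intermediate friend Ann are hypothetical; Christine, Mike and John are the names used on the slide.

```python
from collections import deque


def friendship_degree(friends, source, target):
    """Smallest number of friendship hops from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, depth = queue.popleft()
        for other in friends.get(person, ()):
            if other == target:
                return depth + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return None


# Hypothetical friendship chain: Christine - Mike - Ann - John.
friends = {
    "Christine": ["Mike"],
    "Mike": ["Christine", "Ann"],
    "Ann": ["Mike", "John"],
    "John": ["Ann"],
}
degree = friendship_degree(friends, "Christine", "John")
```

Here the degree is 3, so a fact such as “Christine is a third-degree friend of John” could then be asserted in the knowledge base as an instance of the new class.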
Specifying Policies Using OSN Knowledge
Base
• Most of the OSN information
could be captured using OWL to
represent a rich set of concepts
• This makes it possible to specify
very flexible access control
policies
– “Photos could be accessed by
friends only” automatically
implies that a closeFriend can access
the photos too.
– Policies could easily be defined
based on user-resource
relationships.
Security Policies for OSNs
• Access control policies
• Filtering policies
– Could be specified by user
– Could be specified by authorized user
• Admin policies
– Security admin specifies who is authorized to specify
filtering and access control policies
– Example: if U1 isParentOf U2 and U2 is a child, then
U1 can specify filtering policies for U2.
Security Policy Specification (using
semantic web technologies)
• Semantic Web Rule Language (SWRL) is used for
specifying access control, filtering and authorization
policies.
• SWRL is based on OWL:
– all rules are expressed in terms of OWL concepts
(classes, properties, individuals, literals…).
• Using SWRL, the subject, object and actions are
specified
• Rules can carry different authorizations that state the
subject’s rights on the target object.
Knowledge Base for Authorizations and
Prohibitions
• Authorizations/prohibitions need to be specified
using OWL
– A different object property for each action
supported by the OSN.
– Authorizations/prohibitions could automatically
propagate based on action hierarchies
• Assume “post” is a subproperty of “write”
• If a user is given “post” permission, then the user
will have “write” permission as well
• Admin prohibitions need to be specified slightly
differently: (Supervisor, Target, Object, Privilege)
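The propagation along a property hierarchy can be mimicked outside a reasoner. The sketch below assumes each property has at most one direct super-property and stores the hierarchy as a plain dictionary; in the actual framework an OWL reasoner would draw this inference from the subproperty axioms.

```python
def implied_properties(prop, sub_property_of):
    """Return prop followed by every super-property it is declared under."""
    implied = []
    seen = set()
    current = prop
    # Walk upward until the top of the hierarchy; `seen` guards against cycles.
    while current is not None and current not in seen:
        seen.add(current)
        implied.append(current)
        current = sub_property_of.get(current)
    return implied


# Hierarchies taken from the slides: post is under write, closeFriend under friend.
action_hierarchy = {"post": "write"}
relationship_hierarchy = {"closeFriend": "friend"}

post_implies = implied_properties("post", action_hierarchy)
close_friend_implies = implied_properties("closeFriend", relationship_hierarchy)
```

A user granted “post” therefore also holds “write”, and a closeFriend also counts as a friend, which is why a friends-only photo policy covers close friends. The inference runs one way only: holding “write” does not yield “post”.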
Security Rule Examples
• SWRL rule specification does depend on the
authorization and OSN knowledge bases.
– It is not possible to specify generic rules
• Examples:
Security Rule Enforcement
• A reference monitor evaluates the requests.
• Admin request for access control could be
evaluated by rule rewriting
– Example: Assume Bob submits the following
admin request
– Rewrite as the following rule
Security Rule Enforcement
• Admin requests for Prohibitions could be rewritten as
well.
– Example: Bob issues the following prohibition request
– Rewritten version
• Access control requests need to consider both filter and
access control policies
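One possible way for a reference monitor to combine the two policy types is sketched below. It assumes that the reasoner has already derived the sets of authorized and prohibited (subject, action, object) triples and that prohibitions coming from filtering policies take precedence; the slides do not state a conflict-resolution rule, so that precedence and all names here are assumptions.

```python
def decide(request, authorizations, prohibitions):
    """Grant a request only if it is authorized and not prohibited."""
    # Assumption: a matching prohibition overrides any authorization.
    if request in prohibitions:
        return False
    return request in authorizations


# Hypothetical derived facts; Bob and Alice appear in the lecture's examples.
authorizations = {
    ("Bob", "read", "album1"),
    ("Alice", "read", "album1"),
}
prohibitions = {
    ("Alice", "read", "album1"),  # e.g. produced by a filtering policy
}

bob_allowed = decide(("Bob", "read", "album1"), authorizations, prohibitions)
alice_allowed = decide(("Alice", "read", "album1"), authorizations, prohibitions)
```

Under this rule Bob's request is granted and Alice's is denied. Requests that match no authorization are denied by default.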
Framework Architecture
[Figure: framework architecture. Components: Social Network Application, Reference Monitor, Knowledge Base Queries, Semantic Web Reasoning Engine, Policy Retrieval, Policy Store, SN Knowledge Base. Labeled flows: Access request, Access Decision, Modified Access request, Reasoning Result.]
Conclusions
• Various attacks exist to
– Identify nodes in anonymized data
– Infer private details
• Recent attempts to increase social network access control to
limit some of the attacks
• Balancing privacy, security and usability on online social
networks will be an important challenge
• Directions
– Scalability
• We are currently implementing such a system to test its scalability.
– Usability
• Create techniques to automatically learn rules
• Create simple user interfaces so that users can easily specify these
rules.