Bridging the Gap Between
Physical Location and Online
Social Networks
Justin Cranshaw, Eran Toch, Jason Hong, Aniket Kittur,
Norman Sadeh
School of Computer Science, Carnegie Mellon University
UbiComp 2010
Outline
• Introduction
• Related Works
• Method
• Model Description
• Results
• Discussion
• Conclusion
• Comment
2011/9/20
Weiting Hsu
2
Introduction
• Find relationships between users’ mobility
patterns and the structural properties of their
underlying social network in a location-sharing
social network.
• Some related work exists, but it cannot provide
enough evidence to reliably establish a
relationship between people who are in the same
place at the same time.
– Co-location among strangers is frequent
– Location tracking is inherently partial and inexact
Introduction
• To meet the above challenges, they introduce a novel
set of features and evaluate them on two tasks:
– Predicting whether two co-located users are friends
on Facebook
– Predicting the number of friends a user has in the
social network
• They also examine the relative importance of the
predictors used, and show that looking deeper into
the characteristics of the locations visited can
significantly improve performance on these tasks
Related Works
• Challenge
– How to infer properties of the social behavior of users
from their location trails
• Related Works
– Eigenvalue decomposition
• A. Eigenbehaviors: identifying structure in routine.*
Behavioral Ecology and Sociobiology 63, 7 (May 2009)
• Inferring friendship network structure by using mobile phone
data. Proceedings of the National Academy of Sciences 106,
36 (September 2009)
– Hierarchical modeling
• Mining user similarity based on location history*. In ACM
SIGSPATIAL GIS 2008
Related Works
• Friendship and Mobility: User Movement In
Location-Based Social Networks
• GeoLife: Building social networks using human
location history
– Share travel experience by GPS trajectories
– Travel recommendation
– Perform personalized friend & location
recommendation*
– Trajectories Data Download
http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/
Method-Locaccino
• Used to collect data from users
• Web application developed by the Mobile
Commerce Lab at Carnegie Mellon University
• Allows a user to share her current location with
other Locaccino users through her Facebook
social network
• Location is determined by a combination of GPS,
WiFi, and IP geolocation
• Runs on laptops and mobile devices
• Updates the user’s location every 10 minutes
Method-Recruitment, Demographics
and Data Collection
• 489 users
• Participation ranged from 7 days to several months
(mean of 74 days, median of 38 days)
• 2 million of 3 million location observations are used,
after ignoring IP geolocation and observations
outside of the Pittsburgh metropolitan area*
• Most observations were collected by the laptop locator
software (93.7%)
– Limitation: laptops are mostly stationary
– Observations are sporadic
Method-Co-location
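• Divide the latitude and longitude space into
discrete 0.0002 × 0.0002 degree (latitude × longitude)
grid cells (about 30 m × 30 m)
• Divide the time coordinate into whole 10-minute
intervals
• Two users count as co-located when they are observed
in the same grid cell during the same interval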
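The discretization above can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes each observation is a (user, latitude, longitude, Unix timestamp in seconds) tuple, and the names `cell_key` and `co_located_pairs` and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

GRID_DEG = 0.0002    # grid cell size in degrees, as stated on the slide
SLOT_SEC = 10 * 60   # whole 10-minute intervals

def cell_key(lat, lon, ts):
    """Map one observation to its (latitude cell, longitude cell, time slot) bin."""
    return (math.floor(lat / GRID_DEG),
            math.floor(lon / GRID_DEG),
            int(ts // SLOT_SEC))

def co_located_pairs(observations):
    """Return the set of user pairs seen in the same cell during the same slot."""
    bins = defaultdict(set)
    for user, lat, lon, ts in observations:
        bins[cell_key(lat, lon, ts)].add(user)
    pairs = set()
    for users in bins.values():
        # Sort so each undirected pair has a single canonical form.
        pairs.update(combinations(sorted(users), 2))
    return pairs

# Made-up observations for illustration only.
obs = [
    ("alice", 40.44331, -79.94289, 1000),
    ("bob",   40.44333, -79.94287, 1100),  # same cell, same 10-minute slot
    ("carol", 40.44331, -79.94289, 5000),  # same cell, different slot
]
pairs = co_located_pairs(obs)
```

Binning by floor division is only one possible choice; observations that fall just across a cell or slot boundary are treated as not co-located in this sketch.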
Method-Network Data
• S : Facebook social network of Locaccino
users. There is an edge between vertices u1, u2
∈ S if and only if u1 and u2 are friends on
Facebook
• C : The Co-location Network, an undirected graph
in which an edge exists between u1 and u2
if they were co-located
• S ∩ C : The Co-located Friends Network
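The three networks can be illustrated with plain edge sets. This is a sketch under the assumption that both graphs are undirected and share the same user IDs; the helper `undirected` and the example edges are made up.

```python
def undirected(edges):
    """Store each undirected edge once, as a sorted tuple."""
    return {tuple(sorted(e)) for e in edges}

# Hypothetical Facebook friendships (S) and co-location edges (C).
S = undirected([("alice", "bob"), ("bob", "carol")])
C = undirected([("bob", "alice"), ("alice", "dave"), ("carol", "dave")])

# The Co-located Friends Network keeps only edges present in both graphs.
co_located_friends = S & C
```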
MODEL DESCRIPTIONS
• Measuring the diversity of a location
– Characterize the type of location where an observation
occurred*
– Measured by: (1) frequency (raw count of user
observations that occurred there), (2) user count
(the total number of unique users who visited the
location), (3) entropy (takes into account both the user
count and the relative proportions of their
observations)
Measuring the diversity of a location
• L : a location, U : the set of all users
• U_L = {u ∈ U : u was observed at location L}
• O_u : the set of location observations of u, O = ∪_{u∈U} O_u
• Each observation o ∈ O is a tuple {user ID, latitude and longitude coordinates, timestamp}
• O_{u,L} = {o ∈ O_u : o ∈ L}, O_L = {o ∈ O : o ∈ L}
• The probability that a random draw from O_L belongs to u is
P_L(u) = |O_{u,L}| / |O_L|
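The three diversity measures follow from the definitions above. The sketch below assumes that entropy means Shannon entropy (natural logarithm) over the per-user proportions at a location; the function name `location_diversity` and the input format are illustrative.

```python
import math
from collections import Counter

def location_diversity(users_observed):
    """users_observed: one user ID per observation at location L (the set O_L).

    Returns (frequency, user count, entropy) for that location.
    """
    counts = Counter(users_observed)   # |O_{u,L}| for each user u
    freq = len(users_observed)         # |O_L|
    user_count = len(counts)           # |U_L|
    # Assumed form: Shannon entropy of the proportions |O_{u,L}| / |O_L|.
    entropy = -sum((c / freq) * math.log(c / freq) for c in counts.values())
    return freq, user_count, entropy

# Made-up location with four observations from three users.
freq, user_count, entropy = location_diversity(["a", "a", "b", "c"])
```

Under this definition, a location visited by a single user has entropy 0, and entropy grows as observations are spread more evenly over more users.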
MODEL DESCRIPTIONS
• Co-location feature categories
– For each co-location edge {u1,u2} of C, they extract
67 features, divided into 4 categories
• The 4 categories
– Intensity and Duration
– Location Diversity
– Specificity
– Structural Properties
MODEL DESCRIPTIONS
• Intensity and Duration
– Measure qualities related to the size and the spatial and temporal range
of the set of co-locations
• Location Diversity
– For a given location l, compute Freq(l), UserCount(l) and
Entropy(l) for every co-location between two users
• Specificity
– Measures how specific a location is to the pair of co-located users
– TF-IDF_{u1,u2}(l) is defined as the number of times the two
users were observed co-located at l divided by Freq(l)
• Structural Properties
– Measure the strength of the structural relationship between a pair of
users in C
– NumMutualNeighbors, NeighborhoodOverlap, LocationOverlap*
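A small sketch of the specificity and structural features. The TF-IDF ratio follows the definition on the slide. The slides do not define NeighborhoodOverlap, so the Jaccard form used here (shared neighbors divided by all neighbors of either user) is an assumption, as are the function names and the example graph.

```python
def tf_idf(pair_colocations_at_l, freq_l):
    """Co-locations of the pair at l, divided by all observations at l."""
    return pair_colocations_at_l / freq_l

def neighbors(edges, u):
    """All users sharing an edge with u in an undirected edge set."""
    return {b if a == u else a for a, b in edges if u in (a, b)}

def structural_features(edges, u1, u2):
    """Return (NumMutualNeighbors, NeighborhoodOverlap) for the pair."""
    n1 = neighbors(edges, u1) - {u2}
    n2 = neighbors(edges, u2) - {u1}
    mutual = n1 & n2
    union = n1 | n2
    # Assumed definition: Jaccard overlap of the two neighborhoods.
    overlap = len(mutual) / len(union) if union else 0.0
    return len(mutual), overlap

# Made-up co-location graph.
C = {("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("bob", "dave")}
num_mutual, overlap = structural_features(C, "alice", "bob")
specificity = tf_idf(3, 12)  # pair seen together 3 times at a place with 12 observations
```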
MODEL DESCRIPTIONS
• Measuring the regularity of a user’s routine
– R ⊆ {L, D, H} : the components of the restriction
(L: location, D: day of the week, H: hour of the day)
– o(R) : the restriction of o to the components of R
– O_u(R) = {o(R) : o ∈ O_u}
– Probability distribution over the elements of O_u(R)
MODEL DESCRIPTIONS
• User mobility features
– For each vertex u of C, we extract 64 features from the
data, divided into 3 categories
• The 3 Categories
– Intensity and Duration
– Location Diversity: measure the diversity of the
location observations of a single user
– Mobility Regularity: compute the schedule size and
schedule entropy on the four restrictions
{L}, {L, H}, {L, D}, {L, H, D}
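The regularity features can be sketched as below. The slides do not give the formulas, so this assumes that schedule size is the number of distinct restricted observations and that schedule entropy is the Shannon entropy of their empirical distribution. The tuple layout (location, day, hour) and all names are illustrative.

```python
import math
from collections import Counter

# Position of each component inside an observation tuple (location, day, hour).
FIELDS = {"L": 0, "D": 1, "H": 2}

def restrict(obs, R):
    """o(R): keep only the components of the observation named in R."""
    return tuple(obs[FIELDS[c]] for c in R)

def schedule_stats(observations, R):
    """Return (schedule size, schedule entropy) for one user under restriction R."""
    counts = Counter(restrict(o, R) for o in observations)
    total = sum(counts.values())
    size = len(counts)  # assumed: number of distinct restricted observations
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return size, entropy

# Made-up observations for one user.
obs_u = [("home", "Mon", 9), ("work", "Mon", 10),
         ("work", "Tue", 10), ("home", "Tue", 9)]
size_L, ent_L = schedule_stats(obs_u, "L")
size_LHD, ent_LHD = schedule_stats(obs_u, "LHD")
```

In this toy example the user looks regular under {L} (two places, visited equally often) and less regular under {L, H, D}, where every observation is distinct.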
RESULTS
• Inferring social network ties from co-location
– FacebookFriends : boolean, indicates whether each edge of C
is also in S
– 307 co-location edges where the users were
Facebook friends and 3330 co-location edges where
the users were not Facebook friends
– Trained 6 classifiers on the data, using a 50-fold
cross-validation procedure
• The classifier identifies 74 of 307 friendships
and 3295 of 3330 non-friendships
• The overall accuracy of the classifier is high (92%)
• But the class distribution is heavily biased towards non-friendship,
so accuracy alone overstates performance
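The effect of the class imbalance can be checked with simple arithmetic on the counts above. The metric names are standard; the "always predict non-friendship" baseline is added here for comparison and is not from the slides.

```python
tp, fn = 74, 307 - 74        # friendships found / missed
tn, fp = 3295, 3330 - 3295   # non-friendships found / wrongly flagged as friends

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)        # share of true friendships recovered
precision = tp / (tp + fp)     # share of predicted friendships that are real
baseline = (tn + fp) / total   # accuracy of always predicting "not friends"

print(f"accuracy={accuracy:.3f} recall={recall:.3f} "
      f"precision={precision:.3f} baseline={baseline:.3f}")
# prints: accuracy=0.926 recall=0.241 precision=0.679 baseline=0.916
```

The 92% accuracy is only about one point above the trivial baseline, while roughly three quarters of the friendships are missed.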
RESULTS
• To examine the relative predictive power of the 4 feature
classes
– Trained two AdaBoost classifiers (exponential loss,
decision stumps): one using only the Intensity and Duration
features, the other using the three remaining classes
• 50-fold cross-validation for all classifier estimates
• Co-location alone is not a very strong predictor of online
friendship, but the prediction can be significantly
improved by looking at additional contextual social
properties of the locations the users visit
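For readers unfamiliar with AdaBoost over decision stumps, here is a generic textbook-style sketch in pure Python. It is not the authors’ implementation or feature set: the toy data, the function names, and the exhaustive stump search are all assumptions made for illustration.

```python
import math

def stump_predict(stump, row):
    """A decision stump: threshold one feature and output +1 or -1."""
    feature, threshold, sign = stump
    return sign if row[feature] <= threshold else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds=10):
    """Return a list of (alpha, stump) pairs forming the boosted model."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Exponential-loss reweighting: misclassified points gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy data (made up): one feature; +1 = friends, -1 = not friends.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
predictions = [predict(model, row) for row in X]
```

Comparing feature classes as on the slide would mean training one such model per feature subset and comparing their cross-validated performance.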
RESULTS
• Inferring the number of friends from user
mobility data
– Consider the relationship between the number of
Facebook friends a user has in Locaccino and her
mobility patterns
– Calculated Pearson’s correlation between the
node degrees in S and each user mobility feature
listed in Table 2
• MaxEntropyWeekend (cor=0.39 with 95% CI=(0.31, 0.47))
• SchEntropyL (cor=0.16 with 95% CI=(0.06, 0.25))
• SchEntropyLH (cor=0.28 with 95% CI=(0.20, 0.37))
• SchEntropyLD (cor=0.28 with 95% CI=(0.19, 0.37))
• SchEntropyLHD (cor=0.32 with 95% CI=(0.23, 0.40))
• The most highly correlated sample statistic on the set of locations is always the maximum:
MaxEntropy (cor=0.29 with 95% CI=(0.20, 0.37)), MaxUserCount (cor=0.30 with 95%
CI=(0.21, 0.38)), and MaxFreq (cor=0.29 with 95% CI=(0.20, 0.37))*
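A correlation with a 95% confidence interval can be computed as sketched below. The slides do not say how the intervals were obtained, so the Fisher z-transformation used here is an assumption, and the data are made up.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z-transformation (needs n > 3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Made-up values: one mobility feature and the number of friends per user.
feature = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
friends = [2, 3, 5, 6, 9, 4]
r = pearson_r(feature, friends)
low, high = fisher_ci(r, len(feature))
```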
RESULTS
• Users who visit highly diverse locations tend to have
more social network ties than those who do not
• Users who have irregular schedules tend to have more
ties in the online social network S.
• To better understand the interrelations between user
mobility features and the number of friends, they
conducted a multiple regression analysis
– The types of places a user visits and the regularity of the
user’s routine are stronger predictors of the number of
Locaccino friends than how long or how
intensely the user uses the system
Multiple regression fit: adj R² = 0.21, p-value < 0.001
DISCUSSION
• The co-location network has roughly 3 times as
many edges as the social network, yet the social
network is better connected
• The co-location network has many small
disconnected components, but it has a single large
and highly connected subcomponent.
• Co-location graph contains important information
that can be used to reconstruct a portion of the social
network.
Conclusion
• Location-based features (such as the entropy of
a location) have significant correlations with
real social behavior features
• Explored connection between an online social
network and the location traces of its users
• Evaluated a set of features of the location
observations for their potential in analyzing the
social behavior of the users
Conclusion
• Future Work
– Can different types of social relationships be
inferred from location data?
– Can tie strength be estimated from locations?
– Does offline interaction spur online
communication?
– Seek a more diverse pool of participants, as
other populations could exhibit different online
and offline behavior
Comment
• Pros
– Provides a set of novel features for reference
– A more careful study of the relationship
between location and social network
– The classifier is useful in friend recommendation
systems to find users with strong co-location
patterns who are not yet friends in the social network
• Cons
– Did not clearly explain how the features are
combined to make predictions