Transcript Slide 1
MEASUREMENT AND ANALYSIS OF
ONLINE SOCIAL NETWORKS
Professor : Dr Sheykh Esmaili
1
Presenters:
Pourya Aliabadi
Boshra Ardallani
Paria Rakhshani
INTRODUCTION
The Internet has spawned different types of
information sharing systems, including the Web.
MySpace (over 190 million users)
Orkut (over 62 million)
LinkedIn (over 11 million)
LiveJournal (over 5.5 million)
Unlike the Web, which is largely organized
around content, online social networks are
organized around users.
2
INTRODUCTION(CONT.)
Users join a network, publish their profile and
create links to any other users with whom they
associate.
The resulting social network provides a basis for
maintaining social relationships, for finding
users with similar interests, and for locating
content.
Understanding of the graph structure of online
social networks is necessary to evaluate current
systems, to design future online social network
based systems.
3
INTRODUCTION(CONT.)
Recent work has proposed the use of social
networks to mitigate email spam, to improve
Internet search, and to defend against Sybil
attacks.
We obtained our data by crawling publicly
accessible information on these sites
4
INTRODUCTION(CONT.)
This differs from content graphs like the graph
formed by Web hyperlinks, where the popular
pages (authorities) and the pages with many
references (hubs) are distinct.
We find that online social networks contain a
large, strongly connected core of high-degree
nodes, surrounded by many small clusters of lowdegree nodes.
Flow of information in these networks.
5
ONLINE SOCIAL NETWORKING SITES
Online social networking sites. are usually
run by individual corporations.
Users. must register with a site, possibly under a
pseudonym. Some sites allow browsing of public
data without explicit.
Links. The social network is composed of user
accounts and links between users. Some sites
(e.g. Flickr, LiveJournal) allow users to link to
any other user, without consent from the link
target.
6
ONLINE SOCIAL NETWORKING SITES(CONT.)
Groups. Most sites enable users to create and
join special interest groups.
Users can post messages to groups and upload shared
content to the group. Certain groups are moderated.
admission to such a group and postings to a group are
controlled by a user designated as the group’s moderator.
Other groups are unrestricted, allowing any member to join
and post messages or content.
7
IS THE SOCIAL NETWORK USED IN
LOCATING CONTENT?
Only Orkut is a “pure” social networking site, in
the sense that the primary purpose of the site is
finding and connecting to new users.
Flickr, YouTube, and LiveJournal are used for
sharing photographs, videos, and blogs,
respectively.
8
WHY STUDY SOCIAL NETWORKS?
Are already at the heart of some very popular
Web sites.
Play an important role in future personal and
commercial online interaction.
Help us understand the impact of online social
networks on the future Internet.
We speculate how our data might be of interest to
researchers in other disciplines.
9
SHARED INTEREST AND TRUST
Adjacent users in a social network tend to trust
each other.
A number of research systems have been
proposed to exploit this trust.
Adjacent users in a social network also tend to
have common interests.
Users browse neighboring regions of their social
network because they are likely to find content
that is of interest to them.
10
IMPACT ON FUTURE INTERNET
Impact on future Internet
Impact on other disciplines
Sociologists can examine our data to test existing
theories
Studying the structure of online social networks
may help improve the understanding of online
campaigning and viral marketing.
Political campaigns have realized the importance
of blogs in elections.
11
HOW TO GET DATASETS?
Sites reluctant to give out data
Cannot enumerate user list
Performed crawls of user graph
Crawled using cluster of 58 machines
Used APIs where available
Otherwise, used HTML screen scraping
12
CHALLENGES IN CRAWLING LARGE GRAPHS
Need to crawl quickly
Underlying social networks changing rapidly
Need to crawl completely
Social networks aren’t necessarily connected, some
users have no links, or small clusters
Need to estimate the crawl coverage
13
HOW TO VERIFY SAMPLES
Obtain a random user sample
Conduct a crawl using these random users as
seeds
See if these random nodes connect to the original
WCC (weakly connected component)
14
DATASET FROM FLICKR
Used API to conduct the crawl
Obtained random users by guessing usernames
(########@N00) to evaluate coverage
Covered 27% of user population, but remaining
users have very few links
15
DATASET FROM LIVEJOURNAL
Used API to conduct the crawl
Obtained random users using special URL
http://www.livejournal.com/random.bml
Crawl covered 95% of user population
16
DATASET FROM ORKUT
Used
HTML screen-scraping to conduct the
crawl
17
DATASET FROM YOUTUBE
Used API to conduct the crawl
Could not obtain random users
Usernames user-specified strings
Unable to estimate fraction of users covered
18
HIGH-LEVEL DATA CHARACTERISTICS
Flickr
LiveJournal
Orkut
YouTube
Number of Users
1.8M
5.3M
3.1M
1.2M
Avg. friends per User
12.2
17.0
106
4.3
Number of User Groups
0.1M
7.5M
8.7M
30K
Avg. Group per User
4.6
21.2
106
0.25
Metrics
vary by orders of magnitude
However, networks share many key properties
19
19
ANALYSIS OF NETWORK STRUCTURE
Characterize the structural properties of the four
network and compare them
Link symetry
Power-law node degrees
Correlation of indegree and outdegree
Path lengths and diameter
Link degree correlations
Densely connected core
Tightly clustered fringe
20
20
HOW ARE THE LINKS DISTRIBUTED?
Distribution of indegree and outdegree is similar
● Underlying cause is link symmetry
21
LINK SYMMETRY
symmetry
YouTube
Flickr
LiveJournal
Orkut
79.1%
62.0%
73.5%
100.0%
Possibly contributed by informing users of new incoming links
Unlike other complex networks, such as the Web
● Sites like cnn.com receive much links more than they give
makes it harder to identify reputable sources
22
22
POWER-LAW NODE DEGREES
All social networks show properties consistent
with power-law networks.
● The majority of the nodes have small degree, and a
few nodes have significantly higher degree
23
CORRELATION OF INDEGREE AND OUTDEGREE
outdegree vs. indegree in web
outdegree vs. indegree in social networks
PW
CNN
OSN
24
PATH LENGTHS AND DIAMETER
network
Avg.path len
diameter
web
16.12
905
flickr
5.67
27
livejournal
5.88
20
orkat
4.25
9
youtube
5.10
21
all four networks have short path length from
4.25 – 5.88
25
25
LINK DEGREE CORRELATIONS
Examine which users tend to connect to each other
Focus on:
Joint degree distribution
How often nodes of different degrees connect to each other
Scale free behavior
A value calculated directly from the joint degree distribution
of graph
Assortativity
A measure of the likelihood for nodes to connect to other
nodes with similar degrees
26
JOINT DEGREE DISTRIBUTION AND SCALE-FREE BEHAVIOUR
27
DENSELY CONNECTED CORE
comprising of between 1% and 10% of the highest degree nodes
removing 10% of core nodes results in breaking up graph into
millions of very small SCCs
graphs below show results as nodes are removed starting with
highest-degree nodes (left) and path length as graph is
constructed beginning with highest-degree nodes(right)
Sub logarithmic growth
28
28
CLUSTERING COEFFICIENT
Clustering coefficient C is a metric of cliquishness
Number of links between friends
C
Number of links that could exist
Online social networks are tightly clustered
10,000 times more clustered than random graphs
5-50 times more clustered than random power-law graphs
29
TIGHTLY CLUSTERED FRINGE
Low-degree users show high degree of clustering
Social network graphs show stronger clustering
30
GROUPS
Network
Groups
Usage
Avg. Size
Avg. C
Flickr
103M
21%
82
0.47
LiveJournal
7M
61%
15
0.81
Orkut
9M
13%
37
0.52
YouTube
30M
8%
10
0.34
31
31
WHAT DOES THE STRUCTURE LOOK LIKE
o the networks contain a densely
connected core of high-degree
nodes;
o and that this core links small
groups of strongly clustered, lowdegree nodes at the fringes of the
network.
octopus
32
CONCLUSIONS
Structure of OSNs is significantly different from the Web
Higher degree symmetry in OSNs
Much higher levels of local clustering in OSNs
Privacy controls make graph crawling very difficult
Pure social networks different from content sharing networks
33
Thanks
34