Social-network-worksheet

Social Network Mining for
Digital Library Application
Dr. Thanachart Ritbumroong
King Mongkut’s University of Technology Thonburi
Assist. Prof. Dr. Satidchoke Phosaard
Suranaree University of Technology
Agenda
 Social Media Mining Concepts
 Data Extraction and Preparation
 Social Network Analysis
 Social Media Mining for Recommender System
SOCIAL MEDIA CONCEPTS
Social Media Mining
 The process of representing, analyzing, and extracting
actionable patterns from social media data
 The study of how individuals (also known as social atoms)
interact and how communities (i.e., social molecules) form
 Social Media is … “the group of internet-based applications
that build on the ideological and technological foundations of
Web 2.0, and that allow the creation and exchange of user-generated
content” … (Kaplan and Haenlein, 2010)
Applications
 Facebook: People you may know
 Amazon: Other customers suggested these items
 Netflix: movie suggestions for you
 Targeted marketing
 Online advertising
Major Components of a Network
 Vertices
– Nodes, agents, entities, or items
– Representing people, or social structures (workgroups, teams,
organizations, institutions, states, or countries)
 Edges
– Links, ties, connections, or relationships
– Connecting two vertices together
– Representing proximity, collaborations, partnerships, transactions, etc.
Directed Graph (figure: vertices ʋ1, ʋ2, ʋ3)
Undirected Graph (figure: vertices ʋ1, ʋ2, ʋ3)
Network Data
 Differ from attribute data
 Two ways of presenting network data
– as a matrix
          Nicole  Tim  Mike
  Nicole    0      1    1
  Tim       0      0    0
  Mike      1      0    0
– as an edge list
  Vertex 1   Vertex 2
  Nicole     Tim
  Nicole     Mike
  Mike       Nicole
  Nicole     Tim
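The two representations carry the same information, which a short Python sketch can illustrate. It assumes a three-person directed network in which the matrix rows and columns share the same name order (Nicole, Tim, Mike); the function names are hypothetical helpers and are not part of NodeXL.

```python
# Adjacency matrix for a small directed network.
# Assumption: rows and columns follow the order given in `names`.
names = ["Nicole", "Tim", "Mike"]
matrix = [
    [0, 1, 1],  # Nicole -> Tim, Nicole -> Mike
    [0, 0, 0],  # Tim has no outgoing ties
    [1, 0, 0],  # Mike -> Nicole
]


def matrix_to_edges(names, matrix):
    """Return a (source, target) pair for every non-zero matrix cell."""
    return [
        (names[i], names[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


def edges_to_matrix(names, edges):
    """Rebuild the adjacency matrix from an edge list."""
    index = {name: k for k, name in enumerate(names)}
    result = [[0] * len(names) for _ in names]
    for source, target in edges:
        result[index[source]][index[target]] += 1
    return result


edge_list = matrix_to_edges(names, matrix)
round_trip = edges_to_matrix(names, edge_list)
```

Converting to an edge list and back returns the original matrix, so either form can serve as input for the metrics that follow.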
Types of Networks
 Full, Partial, and Egocentric Networks
– Full: contain all entities
– Partial: subset, topic centric
– Egocentric: include only individuals who are connected to a specified
ego (person)
 Unimodal, Multimodal, and Affiliation Networks
– Unimodal: one type of vertex
– Multimodal: many types of vertex (persons, posts, pictures, etc.)
– Affiliation: bimodal network
 Multiplex Networks
– multiple types of connection (following, reply to, mention, etc.)
Network Analysis Metrics
 Aggregate Network Metrics: describing entire networks
– Density
• the level of interconnectedness of the vertices
• the number of relationships observed in a network
divided by the total number of possible relationships
– Centralization
• the extent to which the network is centered on one or a few important
nodes
Density
 density = e / emax, where e is the number of relationships present and
emax is the total number of possible relationships
 directed graph: emax = n*(n-1)
 undirected graph: emax = n*(n-1)/2
 Directed Graph example (figure: ʋ1, ʋ2, ʋ3): density = 3/6 = 0.5
 Undirected Graph example (figure: ʋ1, ʋ2, ʋ3): density = 2/3 = 0.67
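The density calculation can be sketched in a few lines of Python. The function name and its parameters are illustrative assumptions; the formulas are the directed and undirected versions of emax given above.

```python
def density(num_vertices, num_edges, directed=True):
    """Observed edges divided by the maximum possible number of edges."""
    n = num_vertices
    if n < 2:
        return 0.0  # no relationship is possible with fewer than two vertices
    e_max = n * (n - 1) if directed else n * (n - 1) / 2
    return num_edges / e_max


directed_example = density(3, 3, directed=True)     # 3 edges out of 6 possible
undirected_example = density(3, 2, directed=False)  # 2 edges out of 3 possible
```

With three vertices, three directed edges give 0.5 and two undirected edges give about 0.67, in line with the two examples on the slide.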
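Centralization
 Freeman’s general formula for centralization:
$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$$
where C_D(n*) is the maximum value in the network, C_D(i) is the degree of
vertex i, and g = N is the number of vertices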
Degree Centralization (figure: three example networks with CD = 0.167, CD = 1.0, and CD = 0.167)
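Freeman's degree centralization can be checked with a small Python sketch. The two test networks (a five-vertex star and a five-vertex path) are assumptions chosen for illustration and are not necessarily the networks drawn on the slide; the function name is hypothetical.

```python
def degree_centralization(vertices, edges):
    """Freeman degree centralization for an undirected network.

    Sums the gap between the largest degree and every vertex's degree,
    then divides by (N - 1)(N - 2), the largest possible value of that sum.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("centralization needs at least three vertices")
    degree = {v: 0 for v in vertices}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    max_degree = max(degree.values())
    return sum(max_degree - d for d in degree.values()) / ((n - 1) * (n - 2))


vertices = ["A", "B", "C", "D", "E"]
star = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]  # A is the hub
path = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]  # a simple chain

star_score = degree_centralization(vertices, star)
path_score = degree_centralization(vertices, path)
```

The star is maximally centralized (1.0), while the path, whose degrees are nearly equal, scores about 0.167.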
Network Analysis Metrics
 Vertex-Specific Network Metrics: describing a specific
vertex
– Degree Centrality
• a simple count of the total number of connections linked to a vertex
• for directed networks: in-degree (pointing inward) and out-degree (pointing
outward)
– Betweenness Centrality
• the number of pairs of other vertices whose shortest paths pass
through the vertex
Normalized Degree Centrality
 divide the degree by the maximum possible degree, i.e., (N-1)
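A minimal Python sketch of degree and normalized degree centrality follows. The adjacency dictionary describes a hypothetical undirected five-vertex chain, and the function names are illustrative only.

```python
# Hypothetical undirected network stored as vertex -> list of neighbors.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}


def degree_centrality(adj):
    """Count the connections of every vertex."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def normalized_degree_centrality(adj):
    """Divide each degree by the maximum possible degree, N - 1."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


degrees = degree_centrality(adj)
normalized = normalized_degree_centrality(adj)
```

Normalizing makes scores comparable across networks of different sizes, because every value falls between 0 and 1.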
Betweenness Centrality
 how many pairs of individuals would have to go through you
in order to reach one another in the minimum number of hops?
(figure: vertices connected in a chain, A - B - C - D - E)
 A lies between no two other vertices
 B lies between A and 3 other vertices: C, D, and E
 C lies between 4 pairs of vertices (A,D),(A,E),(B,D),(B,E)
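The counts above can be reproduced with a short Python sketch. It assumes the five vertices form an undirected chain A - B - C - D - E, which is what the three statements imply, and it uses a plain breadth-first search rather than an optimized algorithm; all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Return distances and shortest-path counts from source."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                paths[w] += paths[u]  # every shortest path to u extends to w
    return dist, paths


def betweenness(adj):
    """For each vertex, sum the share of shortest paths passing through it."""
    info = {v: bfs(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t):
                continue
            dist_v, paths_v = info[v]
            on_path = v in dist_s and t in dist_v
            if on_path and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += paths_s[v] * paths_v[t] / paths_s[t]
    return score


adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = betweenness(adj)
```

On this chain the sketch gives 0 for A, 3 for B, and 4 for C, matching the pairs listed above.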
NODEXL: INTRODUCTION
What is NodeXL?
 Open-source software for
social network analysis
 Extension to Microsoft Excel
 Easy to use
 Provides basic network
analysis and visualization
features
 www.codeplex.com/NodeXL
NodeXL Template
 Edges
– Vertex 1 = source
– Vertex 2 = destination
– Attributes
NodeXL Edge List
Vertices will be automatically generated
Showing the graph
There are several automatic layouts that can be selected from the control in the
graph pane or in the NodeXL ribbon.
Fruchterman-Reingold
Harel-Koren Fast Multiscale
Adding descriptive data
 Color
– CSS color names
– RGB format (240, 12, 135)
 Size
– Between 1 and 100
 Shape
– 1 = Circle
– 2 = Disk
– 3 = Sphere
– 4 = Square
– 5 = Solid Square
– 6 = Diamond
– 7 = Solid Diamond
– 8 = Triangle
– 9 = Solid Triangle
– 10 = Label
– 11 = Image
 Label
Color
Autofilling
 Allows you to provide
instructions on how
NodeXL should fill in the
worksheet columns, such
as those relating to size
and shape.
Autofilling
Graph with details
Calculating Metrics
 Network analysis metrics
can be automatically
calculated in NodeXL.
 Once completed, NodeXL
displays each vertex-specific
metric in a set of
Graph Metrics columns in
the Vertices worksheet.
Graph Metrics
 Graph type. Undirected or directed
 Vertices. The number of total
vertices
 Unique edges. The number of
unique edges found in the edges
worksheet.
 Edges with duplicates. The
number of repeated vertex pairs on
the edges worksheet.
 Total edges. The number of total
edges
 Self-loops. The number of edges
that connect a vertex with itself.
Graph Metrics (Cont')
 Connected components. The
number of connected components
(i.e., clusters of vertices that are
connected to each other but
separate from other vertices in the
graph).
 Single vertex connected
components. The number of
isolated vertices that are not
connected to any other vertices in
the graph.
 Maximum vertices in a connected
component. The number of
vertices in the connected
component with the most vertices.
Graph Metrics (Cont')
 Maximum edges in a connected
component. The number of edges
in the connected component with
the most edges.
 Maximum geodesic distance
(diameter). The geodesic distance
is the length of the shortest path
between two people.
 Average geodesic distance. The
average of all geodesic distances.
This value gives a sense of how
“close” community members are
to one another.
 Graph density. A number
between 0 and 1 indicating how
interconnected the vertices are in
the network.
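Several of these graph-level metrics can be computed with a breadth-first search, as the Python sketch below illustrates. The example network is hypothetical, and averaging over reachable pairs only is an assumption of this sketch; NodeXL's exact conventions may differ.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distance from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_components(adj):
    """Group vertices into sets that are mutually reachable."""
    seen = set()
    components = []
    for v in adj:
        if v not in seen:
            component = set(bfs_distances(adj, v))
            seen |= component
            components.append(component)
    return components


def geodesic_summary(adj):
    """Return (maximum, average) geodesic distance over reachable pairs."""
    lengths = [
        d
        for v in adj
        for w, d in bfs_distances(adj, v).items()
        if w != v
    ]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical undirected network: chain A-B-C, pair E-F, isolated vertex D.
adj = {
    "A": ["B"], "B": ["A", "C"], "C": ["B"],
    "D": [],
    "E": ["F"], "F": ["E"],
}
components = connected_components(adj)
single_vertex_components = sum(1 for c in components if len(c) == 1)
max_vertices = max(len(c) for c in components)
diameter, average_distance = geodesic_summary(adj)
```

For this network the sketch finds three connected components, one of them a single vertex, a largest component of three vertices, a diameter of 2, and an average geodesic distance of 1.25.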
Vertex Specific Metrics
 Degree
– The degree of a vertex (sometimes
called degree Centrality) is a count
of the number of unique edges that
are connected to it.
 Betweenness Centrality
– how many pairs of individuals
would have to go through you in
order to reach one another in the
minimum number of hops?
 Closeness Centrality
– How close each person is to the
other people in the network
– the inverse of the sum of the
shortest distances between the
vertex and all other vertices
reachable from it
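Closeness centrality, as defined above, can be sketched in Python as the inverse of the summed shortest distances. The five-vertex chain is a hypothetical example, and returning 0.0 for an isolated vertex is a choice made for this sketch.

```python
from collections import deque


def closeness(adj, vertex):
    """Inverse of the sum of geodesic distances to all reachable vertices."""
    dist = {vertex: 0}
    queue = deque([vertex])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1 / total if total else 0.0  # isolated vertex: no distances to sum


# Hypothetical undirected chain A - B - C - D - E.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
closeness_a = closeness(adj, "A")  # distances 1 + 2 + 3 + 4 = 10
closeness_c = closeness(adj, "C")  # distances 2 + 1 + 1 + 2 = 6
```

The middle vertex C scores higher (1/6) than the end vertex A (1/10), because it is closer on average to everyone else.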
Vertex Specific Metrics
 Eigenvector Centrality
– takes into consideration not only how
many connections a vertex has (i.e., its
degree), but also the degree of the
vertices that it is connected to
– a measure of the importance of a node
in a network
– It assigns relative scores to all nodes
in the network based on the principle
that connections to high-scoring nodes
contribute more to the score of the
node in question than equal
connections to low-scoring nodes
 PageRank
– estimates the importance of each vertex
within the graph using a link analysis
algorithm developed by Larry Page and Sergey Brin
 Clustering Coefficient
– the clustering coefficient of a vertex
quantifies how close the vertex and its
neighbors are to being a clique (complete graph)
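The local clustering coefficient lends itself to a short Python sketch. It uses the common definition for undirected graphs, links among a vertex's neighbors divided by the number of possible links among them; the example network and function name are hypothetical, and treating vertices with fewer than two neighbors as 0.0 is an assumption of this sketch.

```python
from itertools import combinations


def clustering_coefficient(adj, vertex):
    """Share of possible links among a vertex's neighbors that exist."""
    neighbors = adj[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pair of neighbors to connect
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Hypothetical undirected network: triangle A-B-C with D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
coefficient_a = clustering_coefficient(adj, "A")
coefficient_c = clustering_coefficient(adj, "C")
coefficient_d = clustering_coefficient(adj, "D")
```

A sits inside a complete triangle and scores 1.0, while C scores 1/3 because only one of the three possible links among its neighbors exists.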
Import data from Social Media
Social Network Importer
 allows users to directly download and import different Facebook
networks
 http://socialnetimporter.codeplex.com/
 Installation Guide
– Close NodeXL
– Download the zip file from http://socialnetimporter.codeplex.com/
– Unzip the file; you will find two items:
• FacebookAPI.DLL
• SocialNetImporter.DLL
– Copy these files to the NodeXL Plug-ins Directory specified in "Import
Options..." (see "Using third-party graph data importers in NodeXL Excel
Template 2014")
– Restart NodeXL; you should see the Facebook Import option in the
NodeXL>Data>Import menu.
Exercise: Facebook
 Download data from your
Facebook account
Edge List
Visualizing Social Network
Calculate Metrics
Vertex Specific Metrics
NODEXL: CLUSTERING
Clustering
 NodeXL can automatically identify clusters based on the
network structure.
 An algorithm will look for groups of densely clustered vertices
that are only loosely connected to vertices in another cluster.
 The number of clusters is not predetermined; instead the
algorithm dynamically determines the number it thinks is best.
Clustering Results
Visualizing Clusters
NODEXL: MULTIMODAL NETWORK
Import data from FB Fanpage
 Using Social
Network Importer
to download FB
fanpage data
 LibraryCMU
FB Fanpage Data
Visualizing Likes & Comments
PERSONALIZATION AND RECOMMENDER SYSTEMS
Personalization
 Information and services can be modified to meet the unique
and specific needs of an individual or a community
 by changing presentation, content, and/or services based on a
person’s task, background, history, device, information needs,
location, etc. (user’s context)
Recommender Systems
 A type of personalization that learns about a person’s needs
and then proactively identifies and recommends information that
matches those needs
 Useful when they identify information a person was previously
unaware of
 Can be user-driven, in which case the user directly invokes
and supports the personalization process by providing
explicit input.
Content-Based vs. Collaborative Filtering
 Content-Based systems focus on properties of items.
Similarity of items is determined by measuring the similarity
in their properties.
 Collaborative-Filtering systems focus on the relationship
between users and items. Similarity of items is determined by
the similarity of the ratings of those items by the users who
have rated both items.
Simple way to do it
 Transforming Multimodal Affiliation Networks into Unimodal
Networks
– Bimodal affiliation networks can be transformed into two single-mode
networks
– person-to-affiliation network → person-to-person network
– person-to-page network → person-to-person network
– person-to-item network → person-to-person network
Person-to-affiliation network
Example of person-to-affiliation network
  username       discussion
  Adam Kuban     B_InVideos
  Adam Kuban     B_SupposedTop10
  Adam Kuban     B_WomanFindsCell
  Adam Kuban     F_Portland
  ag3208         F_CuttingMelon
  AliceBlue      F_DoubleParked
  AliceBlue      F_Portions
  alliect        F_Vietnamese
  Alm25          F_BestFarmers
  Amandarama     F_CuttingMelon
  Amandarama     F_IveNeverTasted
  annatr         F_SundriedTomatoes
  annatr         F_BestFarmers
  annatr         F_Portions
  anniedra       F_IveNeverTasted
  annien         F_ChezLaurence
  arm1970        F_BestFarmers
  atom12         B_WomanFindsCell
  AuntJone       F_FearBroiling
  avryan         F_SundriedTomatoes
  BananaMonkey   F_BestFarmers
  BangieB        F_CheffTell
  Barbieri13     F_FearBroiling
  bebes          F_SundriedTomatoes
  bessfour       F_FearBroiling
Create person-to-affiliation matrix
 Use a pivot table to create the matrix
– columns: affiliation
– rows: user
– values: count of relationships
Create an affiliation-to-affiliation matrix
 Create the matrix by summing the products of the relationships
between two affiliations
– rows: affiliation
– columns: affiliation
– values: sum of products
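The two spreadsheet steps (pivot, then sum of products) can be sketched in Python. The rows below are a handful of username and discussion pairs in the style of the example edge list; the function names are hypothetical, and the sketch is an illustration of the idea rather than what NodeXL or Excel does internally.

```python
from collections import defaultdict

# A few (username, discussion) rows from a person-to-affiliation edge list.
edges = [
    ("annatr", "F_SundriedTomatoes"),
    ("annatr", "F_BestFarmers"),
    ("annatr", "F_Portions"),
    ("Alm25", "F_BestFarmers"),
    ("AliceBlue", "F_DoubleParked"),
    ("AliceBlue", "F_Portions"),
    ("avryan", "F_SundriedTomatoes"),
]


def pivot(edges):
    """Count relationships per (user, affiliation) cell, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for user, affiliation in edges:
        table[user][affiliation] += 1
    return table


def affiliation_matrix(table):
    """For each pair of affiliations, sum the products of their counts over users."""
    affiliations = sorted({a for row in table.values() for a in row})
    result = {a: {b: 0 for b in affiliations} for a in affiliations}
    for row in table.values():
        for a in row:
            for b in row:
                result[a][b] += row[a] * row[b]
    return result


person_to_affiliation = pivot(edges)
affiliation_to_affiliation = affiliation_matrix(person_to_affiliation)
```

Each off-diagonal cell counts the users two discussions share (F_BestFarmers and F_Portions share annatr, so the cell is 1), and each diagonal cell counts the users of a single discussion.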
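Similarity measures
 Cosine-based similarity
Also known as vector-based similarity, this formulation
views two items and their ratings as vectors, and defines the
similarity between them as the cosine of the angle between these vectors:
$$\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\lVert \vec{x} \rVert \, \lVert \vec{y} \rVert}$$
Example
$$\vec{x} = (4.75, 4.5, 5, 4.25, 4), \qquad \vec{y} = (4, 3, 5, 2, 1)$$
$$\lVert \vec{x} \rVert = \sqrt{4.75^2 + 4.5^2 + 5^2 + 4.25^2 + 4^2} = 10.09$$
$$\lVert \vec{y} \rVert = \sqrt{4^2 + 3^2 + 5^2 + 2^2 + 1^2} = 7.416$$
$$\vec{x} \cdot \vec{y} = (4.75 \times 4) + (4.5 \times 3) + (5 \times 5) + (4.25 \times 2) + (4 \times 1) = 70$$
$$\cos(\vec{x}, \vec{y}) = \frac{70}{10.09 \times 7.416} = 0.935$$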
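The worked example can be verified with a few lines of Python. The function name is an illustrative assumption; the two rating vectors are the ones used in the example.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = (4.75, 4.5, 5, 4.25, 4)
y = (4, 3, 5, 2, 1)
similarity = cosine_similarity(x, y)
```

Rounded to three decimals the result is 0.935, the value obtained in the example.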