Introductory Social Network Analysis
with Pajek
November 4, 2016
Teaching Assistant: Anum Masood
SEIEE
Overview of Network Analysis Tools
Pajek
network analysis and visualization,
menu driven, suitable for large networks
platforms: Windows (on Linux via Wine)
Netlogo
agent based modeling
recently added network modeling capabilities
platforms: any (Java)
GUESS
network analysis and visualization,
extensible, script-driven (jython)
platforms: any (Java)
Other software tools that we will not be using but that you may find useful:
visualization and analysis:
UCInet - user friendly social network visualization and analysis software (suitable for smaller networks)
iGraph - if you are familiar with R, you can use iGraph as a module to analyze or create large networks, or you can directly use the C functions
Jung - comprehensive Java library of network analysis, creation and visualization routines
Graph package for Matlab (untested?) - if Matlab is the environment you are most comfortable in, here are some basic routines
SIENA - for p* models and longitudinal analysis
SNA package for R - all sorts of analysis + heavy duty stats to boot
NetworkX - python based free package for analysis of large graphs
InfoVis Cyberinfrastructure - large agglomeration of network analysis tools/routines, partly menu driven
visualization only:
GraphViz - open source network visualization software (can handle large/specialized networks)
TouchGraph - need to quickly create an interactive visualization for the web?
yEd - free, graph visualization and editing software
specialized:
CLAIR library - NLP and IR library (Perl Based) includes network analysis routines
Tools Useful for Social Networks
• Pajek: extensive menu-driven functionality, including many network
metrics, and manipulations
– But not extensible
• GUESS: extensible, scriptable tool for exploratory data analysis, but with a more
limited selection of built-in methods than Pajek
• NetLogo: general agent based simulation platform with excellent
network modeling support
– many of the demos in this course were built with NetLogo
• NetDraw: network visualization tool associated with UCInet. UCInet is
not free, but NetDraw is.
Other Tools: gephi / Cytoscape
• http://gephi.org
• primarily for visualization, has some nice
touches
• Cytoscape is mainly used for visualization and analysis of
biological networks, pathways, and interactions.
Other Visualization Tools:
Walrus
• developed at CAIDA, available under the GNU GPL
• “…best suited to visualizing moderately sized graphs that are
nearly trees. A graph with a few hundred thousand nodes and only
a slightly greater number of links is likely to be comfortable to
work with.”
• Java-based
• Implemented Features
– rendering at a guaranteed frame rate regardless of graph size
– coloring nodes and links with a fixed color, or by RGB values stored in
attributes
– labeling nodes
– picking nodes to examine attribute values
– generating subgraph: displaying a subset of nodes or links based on a
user-supplied boolean attribute
– interactive pruning of the graph to temporarily reduce clutter and
occlusion
– zooming in and out
Source: CAIDA, http://www.caida.org/tools/visualization/walrus/
Visualization Tool: GraphViz
• Takes descriptions of graphs in simple text languages
• Outputs images in useful formats
• Options for shapes and colors
• Standalone or use as a library
• dot: hierarchical or layered drawings of directed graphs, by
avoiding edge crossings and reducing edge length
• neato (Kamada-Kawai) and fdp (Fruchterman-Reingold with
heuristics to handle larger graphs)
• twopi – radial layout
• circo – circular layout
http://www.graphviz.org/
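To make the "simple text language" concrete, here is a minimal Python sketch that writes a DOT description of a small directed graph. The function name to_dot and the three arcs are illustrative assumptions; only the basic `digraph { "a" -> "b"; }` syntax of the DOT language is relied on.

```python
def to_dot(arcs, name="G"):
    """Return a DOT description of a directed graph given (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    for source, target in arcs:
        # Quote node names so labels containing spaces remain valid DOT identifiers.
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Illustrative arcs between three of the dining-table nodes.
dot_text = to_dot([("Ada", "Louise"), ("Ada", "Cora"), ("Cora", "Ada")])
print(dot_text)
```

Saving the printed text as graph.dot and running `dot -Tpng graph.dot -o graph.png` should render it with the layered dot layout; substituting neato, fdp, twopi or circo for dot selects the other layouts listed above.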
Dot (GraphViz)
Visualization Tools: yEd - Java™ Graph Editor
http://www.yworks.com/en/products_yed_about.htm
(good primarily for layouts, scales better, maybe free)
yEd and 26,000 Nodes (Takes a Few Seconds)
Visualization Tools: Prefuse
• (free) user interface toolkit for interactive information visualization
– built in Java using Java2D graphics library
– data structures and algorithms
– pipeline architecture featuring reusable, composable modules
– animation and rendering support
– architectural techniques for scalability
• requires knowledge of Java programming
• website: http://prefuse.sourceforge.net/
– CHI paper http://guir.berkeley.edu/pubs/chi2005/prefuse.pdf
Simple Prefuse Visualizations
Source: Prefuse, http://prefuse.sourceforge.net/
Prefuse Application: Flow Maps
A flow map of migration from California
from 1995-2000, generated automatically
by Prefuse system using edge routing but
no layout adjustment.
 http://graphics.stanford.edu/papers/flow_map_layout/
Prefuse Application: Vizster
 http://jheer.org/vizster/
Visualization Tool: Manyeyes
• http://manyeyes.alphaworks.ibm.com/manyeyes/
• Only for visualization
• Not just for networks, but for many other data types
• Web based, very easy to use
Outline
• In Pajek
– visualization and layouts
– degree
– connected components
– snowball sampling
– one mode projections of bipartite graphs
– thresholding weighted graphs
Using Pajek for Exploratory Social
Network Analysis
• Pajek – (pronounced in Slovenian as Pah-yek) means ‘spider’
• website: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
• wiki: http://pajek.imfm.si/doku.php
– download application (free)
– tutorials
– lectures
– datasets
• Windows only (works on Linux via Wine, Mac via Darwine)
• Reference book: ‘Exploratory Social Network Analysis with Pajek’ by
Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj
Pajek: Interface
we’ll use today
Drop down list of networks opened or created with pajek. Active is displayed
Drop down list of network partitions by discrete variables, e.g. degree, mode, label
Drop down list of continuous node attributes, e.g. centrality, clustering coefficients
can be used for clustering
Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download
Pajek: Opening a Network File
click on folder icon
to open a file
Save changes to your network, network partitions, etc., if you’d like to keep them
Pajek: Working with Network Files
• The active network, partition, etc is shown on top of the drop
down list
Draw the network
Pajek data format
(figure: example nodes Louise, Ada, Cora)
*Vertices 26
1 "Ada" 0.1646 0.2144 0.5000
2 "Cora" 0.0481 0.3869 0.5000
3 "Louise" 0.3472 0.1913 0.5000
..
*Arcs
1 3 2 c Black
1 2 1 c Black
2 1 1 c Black
..
*Edges
2 3 1 c Black
..
• *Vertices 26: number of vertices
• numbers after each vertex label: vertex x,y,z coordinates (optional)
• *Arcs: directed edges
from Ada(1) to Louise(3) w/ weight “2” and color Black
• *Edges: undirected edges
between Ada(1) and Cora(2) w/ weight “1” and color Black
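As a rough illustration of how this format can be read outside Pajek, the following Python sketch parses the *Vertices, *Arcs and *Edges sections of a small .net string. It is not Pajek's own reader: the function name parse_pajek is hypothetical, the sample is cut down to three vertices, and the sketch assumes the third field of an arc or edge row is a numeric weight (defaulting to 1 when absent) and ignores coordinates and color options.

```python
import shlex


def parse_pajek(text):
    """Parse the *Vertices, *Arcs and *Edges sections of a Pajek .net string."""
    labels, arcs, edges = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "..":
            continue  # skip blanks and the ".." elision used on the slide
        if line.startswith("*"):
            section = line.split()[0].lower()  # e.g. "*vertices"
            continue
        fields = shlex.split(line)  # keeps quoted labels such as "Ada" together
        if section == "*vertices":
            labels[int(fields[0])] = fields[1]
        elif section in ("*arcs", "*edges"):
            source, target = int(fields[0]), int(fields[1])
            # Assumption: a third field, when present, is the numeric weight.
            weight = float(fields[2]) if len(fields) > 2 else 1.0
            target_list = arcs if section == "*arcs" else edges
            target_list.append((source, target, weight))
    return labels, arcs, edges


sample = """*Vertices 3
1 "Ada" 0.1646 0.2144 0.5000
2 "Cora" 0.0481 0.3869 0.5000
3 "Louise" 0.3472 0.1913 0.5000
*Arcs
1 3 2 c Black
1 2 1 c Black
2 1 1 c Black
*Edges
2 3 1 c Black
"""

labels, arcs, edges = parse_pajek(sample)
print(labels)
print(arcs)
print(edges)
```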
Pajek: Let’s Get Started
• Opening a network
– File > Network > Read
• Visualization
– Draw > Draw
• Essential measurements
Pajek: Opening a File
• A planar graph and layouts in Pajek
• Download the file 'NetScience.net' from the website
[http://vlado.fmf.uni-lj.si/pub/networks/data/collab/netscience.htm]
• Open it in Pajek by either clicking on the yellow folder icon
under the word "Network" or by selecting
File > Network > Read from the main menu panel
• A report window should pop up confirming that the graph
has been read, and the filename and location will be displayed
in the 'active' position of the network dropdown list
Pajek: Visualization & Manual
Positioning
• Visualize the network using Pajek's Draw > Draw command
from the main menu panel.
• This will bring up the 'draw' window with its own menu bar at
the top
• Reposition the vertices by clicking on them and holding down
the mouse button while dragging them to a new location.
Continue doing this until you have shown that the graph is
planar (no edges have to cross)
• (If you think this is really fun to do in your spare time, go to
http://www.planarity.net)
Pajek: Visualization & Layout
Algorithms
• Now let Pajek do the work for you by selecting from the draw
toolbar several layout algorithms under 'Layout > Energy'.
• Why did you select the layout algorithm you did?
• Did the layout leave any lines crossed? If you were to do this
assignment over, what order would you do it in?
A Directed Network
• girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)
• first and second choices shown
• nodes in the figure: Louise, Ada, Lena, Adele, Marion, Jane, Frances, Cora, Eva,
Maxine, Mary, Anna, Ruth, Edna, Robin, Betty, Martha, Jean, Laura, Alice, Hazel,
Helen, Ellen, Ella, Irene, Hilda
Node Centrality: Degree
• Node network properties
– from immediate connections
• indegree
how many directed edges (arcs) are incident on a node (indegree=3 in the figure)
• outdegree
how many directed edges (arcs) originate at a node (outdegree=2 in the figure)
• degree (in or out)
number of edges incident on a node (degree=5 in the figure)
– labels
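The three definitions can be checked with a few lines of Python. This is a sketch under assumptions: the function name degrees and the node names are made up, and the arcs are chosen only so that node "x" reproduces the figure's values (indegree 3, outdegree 2, degree 5).

```python
from collections import Counter


def degrees(arcs):
    """Return {node: (indegree, outdegree, degree)} for a list of (source, target) arcs."""
    indegree, outdegree = Counter(), Counter()
    for source, target in arcs:
        outdegree[source] += 1  # the arc originates at its source
        indegree[target] += 1   # the arc is incident on its target
    nodes = set(indegree) | set(outdegree)
    return {n: (indegree[n], outdegree[n], indegree[n] + outdegree[n]) for n in nodes}


# Hypothetical arcs: three point at "x", two leave it.
example_arcs = [("a", "x"), ("b", "x"), ("c", "x"), ("x", "d"), ("x", "e")]
result = degrees(example_arcs)
print(result["x"])
```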
Centrality: Degree
• More on degree and other centrality measures in
the next lecture…
• Degree: calculate it
– Net > Partitions > Degree
• Visualize degree centrality
– Draw > Draw-Vector
– If nodes are not the right size, use resize option
• Options > Size > of Vertices
• Adjust the default size
Connected Components
• Strongly connected components
– Any two nodes in the component can
be reached from each other by
following directed edges
– In the example figure (nodes A to H): BCDE, A, GH, F
• Weakly connected components: every node can be reached
from every other node when edge directions are ignored
– In the example figure: ABCDE, GHF
• In undirected networks one talks simply about
“connected components”
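Both kinds of component can be computed with simple graph searches. The sketch below uses Kosaraju's two-pass depth-first search for the strong components and an undirected search for the weak ones; this is one standard technique, not necessarily what Pajek does internally. The arcs are hypothetical, since the figure's edges are not recoverable from the text, and were chosen only so that the output matches the components listed above.

```python
from collections import defaultdict


def strongly_connected_components(nodes, arcs):
    """Kosaraju's algorithm: DFS finish order, then DFS on the reversed graph."""
    forward, backward = defaultdict(list), defaultdict(list)
    for source, target in arcs:
        forward[source].append(target)
        backward[target].append(source)

    visited, order = set(), []

    def visit(node):  # pass 1: record nodes in order of completion
        visited.add(node)
        for nxt in forward[node]:
            if nxt not in visited:
                visit(nxt)
        order.append(node)

    for node in nodes:
        if node not in visited:
            visit(node)

    assigned, components = set(), []

    def collect(node, component):  # pass 2: follow arcs backwards
        assigned.add(node)
        component.add(node)
        for prev in backward[node]:
            if prev not in assigned:
                collect(prev, component)

    for node in reversed(order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components


def weakly_connected_components(nodes, arcs):
    """Connected components after dropping edge directions."""
    neighbours = defaultdict(set)
    for source, target in arcs:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical arcs chosen to reproduce the components named on the slide.
nodes = list("ABCDEFGH")
arcs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "B"),
        ("F", "G"), ("G", "H"), ("H", "G")]
strong = sorted("".join(sorted(c)) for c in strongly_connected_components(nodes, arcs))
weak = sorted("".join(sorted(c)) for c in weakly_connected_components(nodes, arcs))
print(strong)
print(weak)
```

The recursive search is fine for small teaching examples; a very large network would need an iterative version to stay within Python's recursion limit.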
The bowtie model of the Web
Broder et al. (1999)
• SCC (strongly connected component):
– can reach all nodes from any other by
following directed edges
• IN
– can reach SCC from any node in ‘IN’
component by following directed
edges
• OUT
– can reach any node in ‘OUT’
component from SCC
• Tendrils and tubes
– connect to IN and/or OUT
components but not SCC
• Disconnected
– isolated components
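The SCC, IN and OUT parts follow directly from reachability, which the sketch below illustrates in Python. It assumes a seed node already known to lie in the central SCC, and it lumps tendrils, tubes and disconnected pieces together under "OTHER"; the function names and the toy graph are made up for illustration.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Return every node reachable from start (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bowtie(nodes, arcs, seed):
    """Split nodes into SCC, IN, OUT and OTHER relative to the SCC containing seed."""
    forward, backward = defaultdict(set), defaultdict(set)
    for source, target in arcs:
        forward[source].add(target)
        backward[target].add(source)
    downstream = reachable(seed, forward)   # nodes the seed can reach
    upstream = reachable(seed, backward)    # nodes that can reach the seed
    scc = downstream & upstream
    in_part = upstream - scc
    out_part = downstream - scc
    other = set(nodes) - scc - in_part - out_part
    return {"SCC": scc, "IN": in_part, "OUT": out_part, "OTHER": other}


# Toy graph: i -> {a <-> b} -> o, a tendril i -> t, and a disconnected pair x -> y.
toy_nodes = ["i", "a", "b", "o", "t", "x", "y"]
toy_arcs = [("i", "a"), ("a", "b"), ("b", "a"), ("b", "o"), ("i", "t"), ("x", "y")]
parts = bowtie(toy_nodes, toy_arcs, seed="a")
print(parts)
```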
Bipartite networks
Going from a Bipartite to a One-mode Graph
• Two-mode network (figure: group 1 and group 2)
• One mode projection
– two nodes from the first
group are connected if
they link to the same node
in the second group
– some loss of information
– naturally high occurrence
of cliques
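A short Python sketch of the projection, under stated assumptions: each projected edge is weighted by the number of group-2 nodes the pair shares (one common convention, not confirmed by the slides), and the names, memberships and function names are invented. The small threshold helper illustrates the "thresholding weighted graphs" step from the outline by keeping only edges at or above a minimum weight.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(memberships):
    """memberships: (group-1 node, group-2 node) pairs -> {(u, v): shared count}."""
    members = defaultdict(set)
    for first, second in memberships:
        members[second].add(first)
    weights = defaultdict(int)
    for group in members.values():
        # Everyone attached to the same group-2 node becomes pairwise connected,
        # which is why projections contain so many cliques.
        for u, v in combinations(sorted(group), 2):
            weights[(u, v)] += 1
    return dict(weights)


def threshold(weights, minimum):
    """Keep only edges whose weight is at least minimum."""
    return {pair: w for pair, w in weights.items() if w >= minimum}


# Hypothetical two-mode data: people (group 1) and activities (group 2).
memberships = [("Ada", "club"), ("Cora", "club"), ("Louise", "club"),
               ("Ada", "choir"), ("Cora", "choir")]
projected = one_mode_projection(memberships)
strong_ties = threshold(projected, 2)
print(projected)
print(strong_ties)
```

The projection no longer records which activity produced a tie, which is the loss of information noted above.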
Pajek: Wrap Up
• Used frequently by sociologists
– UCInet is comparable and arguably more user friendly (but not free)
• Extensive functionality
– But not extendable
• What we covered
– visualization
– node properties: degree
– connected components
– k-neighbors
– converting two-mode networks to one-mode
– thresholding the network
Quick Overview: Pajek
Pajek:
Program for Large Network Analysis.
Download page:
http://pajek.imfm.si/doku.php?id=download
Manual:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/doc/pajekMan.pdf
Quick Overview: Pajek
• Draw “Network” with Pajek:
– List of neighbours (Arcslist/Edgeslist) (unweighted graph)
– Pairs of lines (Arcs/Edges) (weighted graph)
– Matrix
Quick Overview: Pajek
• List of neighbours (Arcslist/Edgeslist)
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcslist
1 2 4
2 3
3 1 4
4 5
*Edgeslist
1 5
Words starting with * must be written in the first column of the line.
The definition of vertices comes first: each vertex is given a label.
*Arcslist declares a list of directed lines from selected vertices:
the first number in a row is the initial vertex and the remaining
numbers are its terminal vertices.
*Edgeslist declares a list of undirected lines.
No empty lines are allowed.
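To see how a list of neighbours corresponds to individual lines, the Python sketch below expands each row into (initial, terminal) pairs. It assumes rows of whitespace-separated vertex numbers; the function name expand_neighbour_list is hypothetical.

```python
def expand_neighbour_list(rows):
    """Expand rows like "1 2 4" into pairs: (1, 2), (1, 4)."""
    pairs = []
    for row in rows:
        # First number is the initial vertex, the rest are its neighbours.
        source, *targets = (int(field) for field in row.split())
        pairs.extend((source, target) for target in targets)
    return pairs


arc_pairs = expand_neighbour_list(["1 2 4", "2 3", "3 1 4", "4 5"])
edge_pairs = expand_neighbour_list(["1 5"])
print(arc_pairs)
print(edge_pairs)
```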
Quick Overview: Pajek
1. read the .net file
2. draw the network
Quick Overview: Pajek
• Pairs of lines (Arcs/Edges)
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcs
1 2 1
1 4 1
2 3 2
3 1 1
3 4 2
4 5 1
*Edges
1 5 1
Every arc/edge is defined separately in a new line by its initial and
terminal vertex.
Directed lines are defined using *Arcs, undirected lines are defined
using *Edges; the third number in each row defines the weight.
Quick Overview: Pajek
• Matrix
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Matrix
0 1 0 1 1
0 0 2 0 0
1 0 0 2 0
0 0 0 0 1
1 0 0 0 0
In this format directed lines are
given in the matrix form
(*Matrix).
We can transform bidirected
arcs to edges.
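The relation between the matrix form and the Arcs/Edges form can be sketched in Python as follows. This is an illustration, not Pajek's transformation: it turns a pair of reciprocal arcs into one edge only when both weights are equal, which is an assumption made here to keep the example simple, and the function name matrix_to_lines is hypothetical.

```python
def matrix_to_lines(matrix):
    """Convert a weighted adjacency matrix into (arcs, edges) with 1-based vertices."""
    size = len(matrix)
    arcs, edges = [], []
    for i in range(size):
        for j in range(size):
            weight = matrix[i][j]
            if weight == 0:
                continue
            if i != j and matrix[j][i] == weight:
                # Reciprocal arcs with equal weight: emit a single undirected edge.
                if i < j:
                    edges.append((i + 1, j + 1, weight))
            else:
                arcs.append((i + 1, j + 1, weight))
    return arcs, edges


matrix = [
    [0, 1, 0, 1, 1],
    [0, 0, 2, 0, 0],
    [1, 0, 0, 2, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
matrix_arcs, matrix_edges = matrix_to_lines(matrix)
print(matrix_arcs)
print(matrix_edges)
```

For this matrix the result agrees with the Arcs/Edges listing of the same five-vertex network: six arcs plus the single edge between vertices 1 and 5.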
Quick Overview: Pajek
• Export to bmp, eps…
Case Study: Pajek
• Computing indegree and outdegree using Pajek:
double click Partitions