Measuring and Extracting Proximity in Networks

Download Report

Transcript Measuring and Extracting Proximity in Networks

Measuring and Extracting
Proximity in Networks
By - Yehuda Koren, Stephen C.North and Chris Volinsky
- Rahul Sehgal
Introduction











Network
Information hidden in a network
What is Proximity?
Why do we need proximity in a network?
Methods for measuring and extracting proximities in a network.
CFEC ( Cycle free effective conductance) and how to compute
cycle-free escape probability
Extracting proximity through proximity graphs obtained by CFEC
Working with large networks
Experiments
Conclusion
Questions
Network



Collection of edges or
links
Collection of nodes
These edges and links
help in deciding the
proximity in a given
network.
node1
node 5
Node 6
node 2
node 4
node 3
Hidden information in a network
C
D
A
E
F
B
Hidden information in a network
C

If two people speak on
the phone to many
common friends, the
probability is high that
they will talk to each
other in the future, or
perhaps that they
already communicate
through some other
medium as email.
D
A
E
F
B
Hidden information in a network
C

If two nodes are connected
to a common node then it
might be possible that they
have a strong relationship
or they don’t have any
relationship at all.
for example: in a telephone
network where many people
call to service center but it
might be possible that two
people who are calling they
don’t know each other at all.
D
A
E
F
B
What is proximity?



Proximity is a method of measuring distance
or closeness between different objects.
It is a method of finding hidden relationship
between the objects.
It is a method of finding similarities and helps
in clustering objects or nodes.
Need of proximity





It measures potential information exchange between
two non-linked objects through intermediaries
It can measure the extent to which two nodes
belong to each other.
It helps in knowing the likelihood that a link will exist
in future.
In social network settings proximity helps to predict
or track the propagation of an idea, product or
disease .
It helps in discovering unexpected communities in a
network.
Various Methods

Graph-theoretic distance:

It is length of shortest path connecting two nodes
measured either by number of hops or sum of weight of
edges
Limitations

Proximity decays as nodes become farther apart.

Information may be lost due to friction or noise at a
particular node.

This method doesn’t assume that proximity can exist via
multiple paths.
Various Methods contd…

Network Flow (maximal
network flow)


Limited capacity is
assigned to each edge,
depending upon the weight
of that edge and then
compute the maximal
number of units that can be
simultaneously delivered
from node s to node t.
It prefers high weight edges
and captures the premise
that an increasing number
of alternative paths
increases the proximity.
For example we consider
the adjacent figure.
a1
A
B
b1
A
B
a1
Various Methods contd…

Network Flow (maximal network flow)
Limitations:


It disregards the length of the path.
It also follow that the maximal s-t flow (that is graph flow
from node s to node t) in a graph equals the minimal s-t cut
– that is the minimal edge capacity we need to remove to
disconnect s from t, therefore we cannot implement it in a
robust system.
Various Methods contd…

Effective Conductance (EC):





Modeling of networks as an electric circuit by treating the edges
as resistors whose conductance is the given edge weight.
In this method we keep the starting node (s) as 1 and the end
node (t) as 0. and then we solve the linear equations for getting
voltage and current on each edge.
It accounts for both path length and number of alternative paths .
It avoids dependence on single shortest path and bottlenecks
It has a monotonicity property which states that in an electrical
resistor network, increasing the conductance of any resistor or
increasing the number of resistors increase conductance between
any two nodes.
Various Methods contd…

Limitations of Effective Conductance
s1
Monotonicity has its limitations consider following example
a1
t1
same EC
s
a1
t
Various Methods contd…

Sink augmented effective conductance



Each node in a network is connected to a universal sink
which is at voltage zero.
This universal sink competes with the node t.
Universal sink “tax” each node that absorbs a portion of out
going current. Consequently it forces all the node to have
degree greater than 1. So, our restriction for degree one
node doesn’t exist any more.
Various Methods contd…

Limitations of sink augmented effective
conductance


Its required to know how much current will flow through
each node . Understanding how such parameter will
influence the proximity is a difficult task.
It destroys the concept of monotonicity . It means whenever
a new node is added to the network it has a direct link to
the sink but not to t. It strengthens sink and it compete with
node t and thus the proximity between s and t is lost.
So, we look for a solution that can overcome the limitations of
above methods…..
Random Walk

Definitions:

s
Random Walk is a transition from one state to another
without depending on the previous state. The transition
could be to the same state also.
a1
a2
a3
a4
t
Random Walk
 Random walk in network proximity is the infinite number
of attempts that is made to reach from starting node s to
end node t. It might be possible that when we traverse
this path we might return back to the starting node s.
s
t
Random Walk

Explanation



In network it might be possible that during random walk we
might go back to s without going to t.
As we can see in the diagram it might be possible that we
return back to s via path s–a1-a2-a3-s without going to t.
We are having a cyclic path which is leading us back to
starting point s.
continued =>
Random Walk
Diagram
a3
s
a1
a2
a4
a5
t
What is our goal?

We have to improve the effective
conductance measure and avoid any cyclic
path…..
CFEC( Cycle Free Effective Conductance)


It considers random walk interpretation
DEFINITION: The cycle-free escape probability from s
to t is the probability that a random walk originating at s
will reach t without visiting any node more than once.

In random walk, a probability of transition from
node i to node j is
probability of transition from node i to node j
= weight of edge from i to j
= degree of node i.
=
CFEC( Cycle Free Effective Conductance)

The probability that a random walk starting
at v1 for path P= v1-v2-…-vr, will follow
this path is given by:
Features of CFEC:

We have following equalities:

Multiplying by the degree:
R = set of simple paths from s to t, simple path are those that never visit the
same node twice

CFEC discourages long paths as the probability of following the path
decays exponentially with its length. It is given by

It supports proximity measure for multiple paths.
Degree-1 nodes dilute the significance of path from s to t. So we cut the
main graph into sub-graph. And we preserve the original degree of the
nodes.

Explanation of CFEC features
CFEC discourages long paths:
s
c1
c2
ck
t
Proximity decreases
Computing Cycle Free Escape Probability

Restrict the sum in following equation to K most probable simple
path.

Edge weights are transformed into edge lengths, establishing 1-1
correspondence between path probability and path length.
 Path probability is given as
Extracting Proximity Through Proximity
Graphs :

Extract a subgraph
that maximizes the ratio:
= subgraph
= constant,
 Find subset of R, which required for above problem.
 Set of simple paths R is sorted in ascending order of
weights of paths.
Extracting Proximity Through Proximity
Graphs :

Optimizing the given formula using branch
and bound path merging algorithm:
Branch and Bound path merging
algorithm
Working with large networks

Growing candidate graphs via {s, t}
neighborhoods


Producing the candidate graph is to find a sub graph
containing the highest weight paths originating at either
s or t.
Now, the problem becomes , find a sub graph containing
shortest paths originating at either s or t. Our objective is
to expand the neighborhoods of s and t. This is done by
Dijkstra’s algorithm for computing shortest path on
graphs with non-negative edge lengths.
Working with large networks contd….

Determining neighborhood size


we determine paths which are not longer than L – log(€).
Path lengths greater than is are not useful. Here L is the
length of shortest s-t path.
In practice, (L – log(€))/2 – neighborhoods might be too
large.
Working with large networks contd….

Pruning the neighborhood:

We prune the neighbor in such a way that dist(s,i)+dist(t,i) >
β , where s is our starting node and t is terminating node.

Then, we exclude from the neighborhood any i for which
dist(s,i)+dist(i,t) > L – log(€).
Experiments:

Online movie database IMDB.

Co-authorship graph.

Telecommunication graph.
Experiments:
Telecommunication Graph




Extract the candidate graph by growing
neighborhoods from the two nodes of
interest.
2000 random pairs of telephone numbers to
calculate the CFEC value.
For 1808(90%) of these pairs paths between
them were found. Others do not have any
known path between them, this could be due
to, these number were not in frequent use.
We have values for
as 1,5,10 and 50.
Telecommunication Graph (contd…)
Conclusion:




CFEC proximity allows us to readily compute
proximity graphs, which are small portions of the
network that are aimed at capturing a related
proximity value.
An analyst studying proximity in a graph has to
focus on the most relevant part of the graph.
It is extension of connection graph which is capable
of presenting compact relationship between objects
of a network.
We can deduce relationship between more than two
endpoints, the flexibility to handle edge direction,
and the fact that they are obtained by solving an
intuitively tunable optimization problem.
QUESTIONS??????
References:
References: