Link Prediction

Download Report

Transcript Link Prediction

A Graph-Based
Approach to Link
Prediction in Social
Networks Using a
Pareto-Optimal Genetic
Algorithm
Jeff Naruchitparames
University of Nevada, Reno - CSE
CS 790: Complex Networks, Fall 2010
• biological
social
2
3
4
very dynamic
‣ Social networks =heterogeneous
‣ Dynamic, judgmental environment
‣ Affect friendships over time
5
6
‣ 1-2 hop distance only
‣ Friend-of-friend
7
‣ Multiple hops; >1
‣ Structural; purely graphbased
‣ No explicit correlation
between potential
friends...
8
‣ Silva, et. al.,
‣
A Graph-based Recommendation System Using Genetic Algorithms,
2010
9
10
11
Friends-of-Friends
2 hops
Filter
Order
12
Filtering
“It’s more probable that you know a friend of your
friend than any other random person”
Mitchell M., Complex Systems: Network Thinking, 2006.
13
14
15
Indexes
16
What’s missing?
‣ Heterogeneity
‣ Human behavior and preferences
‣ Multiple hops
17
My approach
• Pretty much a filtering problem...
18
My approach
‣ Components (for filtering)
‣ Betweenness centrality
‣ Community detection
‣ Clique Percolation Method (CPM)
‣ Friends of friends
‣ 10-dimensional Pareto-optimal
genetic algorithm
19
Betweenness Centrality
20
Community Detection
21
‣ Remove duplicates
‣ Remove our test cases
‣ (More on this later...)
22
The Genetic Algorithm
Part
23
Pareto Fronts
24
The Features
1.
# of shared friends
2.
location
3.
age range
4.
general interest
5.
music
6.
attended same events
7.
groups
8.
movies
9.
education
10. religion/politics
25
Pareto Optimality
‣
Localized to implementation of selection
‣
‣
Feature subset selection
We want to find the best combination of these subsets that can
give us the best solutions for how we determine friendships
26
Pareto Optimality and
Feature Subset Selection
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
C1 0
1 0 1 0 0 0 1 1
0
C2 1
1 0 0 0 1 0 1 0
1
.
.
.
Cn 0
0 0 0 1 0 0 1 0
27
0
A Point System
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
U1 -
3
- 11 -
-
- 20 44 -
U2 -
1
- 13 -
-
- 31 9
-
- 49 61 -
-
.
.
.
Un - 10 - 14 -
28
Pareto Optimality
‣
‣
Compare with the test cases we removed earlier...
For all chromosomes in population, do:
‣
If ALL test cases ≥ optimal Pareto front
‣
Calculate fitness
‣
‣
Good to go
Else
‣
‣
Calculate fitness
Continue onto next chromosome
29
Fitness Function
• ∑ ∑ p ln( f
n
10
i
i=1
j
p
) i-1
j=1
30
Continuing on with the
Evolutionary Process
‣
‣
‣
‣
‣
Apply fitness proportional selection
Randomly select 2 parents to mate
Apply 1-point crossover (82% chance)
Bit mutation (0.05% chance)
Do this until ALL test cases better than Pareto front OR fitness does
not improve for 5 consecutive generations
31
1-Point Crossover
32
Conclusion
‣
Complex network theory + Genetic algorithm + social theory
‣
Betweenness centrality
‣
Community detection
‣
‣
Binary 10-dimensional Pareto-optimal genetic algorithm
‣
‣
Clique Percolation Method
Dominant, fitness proportional selection
Several levels of filtering and selection (aka filtering ☺)
33
Future Work
‣
Better fitness function (need to ask Sociologists)
‣
Weighted chromosome for Pareto optimization (as opposed to binary)
‣
Prove all this stuff actually works (sociology standpoint??)
‣
Parallelize or GPU-ize the code (it’s in Python)
34
35