slides - Simon Fraser University
Download
Report
Transcript slides - Simon Fraser University
Mining Social Ties Beyond Homophily
Hongwei Liang*
Ke Wang*
Feida Zhu #
* Simon
# Singapore
16-05-18
Fraser University, Canada
Management University, Singapore
ICDE2016 Helsinki, Finland
1
Outline
Introduction & Motivation
Problem Formulation
Solution
Evaluation
Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
2
Integrating graphs and
demographic data
5
ID
SEX
RACE
LOCATOIN
1
F
Asian
US
2
F
Latino
US
…
…
…
…
(a) Graph topology
(b) User profile
integration
Graph data and demographic data are everywhere
But the two aspects of data maybe incomplete within single
social network
More comprehensive analysis can be done by integrating them
16-05-18
ICDE2016 Helsinki, Finland
3
Facebook Dating App Example
All except black women preferred white men,
while all men except Asians preferred Asian women.
16-05-18
ICDE2016 Helsinki, Finland
4
Social Ties (Group Relationships)
Form:
1
2
3
4
5
6
7
F
Asian
US
M
8
9
R1 : (Sex: M)
10
11
12
13
Latino
White
Canada
Finland
14
(Sex: F, Race: Asian)
conf = 7 /14; supp = 7/15
R2 : (Sex: M, Race: Asian)
(Sex: F, Race: Asian)
conf = 0; supp = 0
16-05-18
ICDE2016 Helsinki, Finland
5
Homophily in Social Network
Homophily principle: love of the same
Homophily effect is well-known and is often “dominant”
R3 : (Sex: F, Location: US)
(Sex: M, Location: US)
conf = 4/6; supp = 4/15
Homophily captures “primary” bond
Literature largely focuses on applications based on homophily
e.g.: community detection, link prediction, friend/product recommendation
16-05-18
ICDE2016 Helsinki, Finland
6
Beyond Homophily
Unearth the treasures beyond homophily?
Assume “Location” is homophilic in a dating network
R4: (Sex: F, Location: US)
standard confidence?
(Sex: M, Location: Canada)
VS
conf = 2/6, not interesting
US
1
2
Canada
3
4
nhp = 2/ (6 – 4) = 100%, interesting !
support of the homophily effect is 4/15
(Sex: F, Location: US)
(Sex: M, Location: US)
Finland
5
new metric that remove homophily?
6
7
F
Reads as: if a female from US does NOT want
her partner to be from US, there is a high chance
that she prefers a partner from Canada.
M
8
16-05-18
9
10
11
12
13
14
ICDE2016 Helsinki, Finland
7
Potential Applications
Target advertising
Homophily pattern:
(JOB : Lawyer, PRODUCT : Stocks) → (PRODUCT : Stocks)
Non-Homphily pattern: (JOB : Lawyer, PRODUCT : Stocks) → (PRODUCT : Bond)
Helpful in link predicting, beyond homphily
Friend/dating Recommendation
User behaviors/habits analysis
Profile completion
Criminal investigation
16-05-18
ICDE2016 Helsinki, Finland
8
Non-homophily Preference
Non-homophily preference: a probability of links going to a node
described by 𝑟, given 𝑙 ∧ 𝑤 and exclude the homophily effect
𝑤
𝑛ℎ𝑝 𝑙
𝑤
𝑠𝑢𝑝𝑝(𝑙 𝑟)
𝑟 =
𝑠𝑢𝑝𝑝 𝑙 ∧ 𝑤 − 𝑠𝑢𝑝𝑝(ℎ𝑜𝑚𝑜𝑝ℎ𝑖𝑙𝑦 𝑒𝑓𝑓𝑒𝑐𝑡)
Example: (Sex: F, Location: US)
(Sex: M, Location: Canada)
(Sex: F, Location: US)
(Sex: M, Location: US)
Captures “secondary bonds” beyond “primary bonds”
nhp does not have the regular anti-monotonicity
Adding an attribute on the RHS may increase supp(homophily effect)
16-05-18
ICDE2016 Helsinki, Finland
9
Outline
Introduction & Motivation
Problem Formulation
Solution
Evaluation
Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
10
Problem - Mining Top-k GRs
Given
an multi-dimensional information network
the setting of homophily for attributes
a supp threshold, a nhp threshold and an integer k
Goal
discover the top-k GRs, ranked by nph followed by supp, and
each of them satisfies the supp and nhp thresholds
16-05-18
ICDE2016 Helsinki, Finland
11
Challenges
Storage
Space =
, if single table
Computation
Exponential order of attributes value combination
nhp does not have anti-monotonicity
If only supp pruning: small threshold, and post-processing is needed
How to deal with?
Storage: favourable data modeling
Computation: ingenious enumeration with efficient pruning strategies
16-05-18
ICDE2016 Helsinki, Finland
12
Outline
Introduction & Motivation
Problem Formulation
Solution
Evaluation
Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
13
for mining t op-k GRs by leveraging the enumeration and
pruning st rategies presented in Section 4, but representing
the graph data using a more efficient data struct ure.
We shall st ore the node and edge information separately.
Fig. 3 shows t he data st ruct ure for our running example.
LArray
cont ains the records
for individuals
could
occur
Compact 3-table data
presentation:
combines
profilet hat
data
and
in the LHS of GRs and RArray cont ains t he records for
individuals that could occur in the RHS of GRs. Out is t he
graph topology together
out -degree of a record and I nd is t he st art ing position of
the outgoing edges in EArray. EArray cont ains one record
No redundant records,
data
for each
edgeare
andlinked
Ptr is tby
he pointers
point er to the record for t he
destination node in RArray. This data st ruct ure has the size
Space complexity V (# Attr V + 2) + E (# Attr E + 1) + V # Attr V ,
which is more compact t han t he single t able at t he beginning
of Section 4 by eliminating t he term E
2 # Attr V . We
assume t hat this struct ure is held in memory. We use these
tables t o part ition t he dat a for counting the support s for
GRs. For example, the first row in LArray represents t he
record 1 for LHS, which connects to the dest inat ion records
2, 4 and 5 for RHS, found by the point ers Ptr kept in t he
ent ries [I nd, I nd + Out 1] of EArray.
Our algorit hm enumerates each attribut e subset L W R following t he subset-first dept h-first order as described in t he
previous sect ion. To comput e supp and di vConf of t he GRs
at the node for L W R, it part it ions the data using t he at-
Data Model
16-05-18
ICDE2016 Helsinki, Finland
14
SFDF Enumeration
Subset-First Depth-First (SFDF) Enumeration
Subset-First: some kind of reverse order, all parts of supp, including
that for homophily effect, are available when computing nhp
Depth-First: only materialize the current branch
16-05-18
ICDE2016 Helsinki, Finland
15
Dynamic Ordering
How to make nhp anti-monotone?
Dynamically order the homophily attributes, on the basis of whether
the same homophily attributes were enumerated in the LHS
dynamic ordering
assume both A and B are
homophily attributes
for the GRs with same
is anti-monotone,
with the help of dynamic ordering
16-05-18
ICDE2016 Helsinki, Finland
16
Multiple Pruning Strategies
supp based pruning
nhp based pruning
enabled with the help of dynamic ordering
Top-k pruning tights up the nph threshold
The mining task finishes in one phase
16-05-18
ICDE2016 Helsinki, Finland
17
Data partition and Pruning
Partition attributes while computing supp and nhp?
Recursive partition with linear CountingSort
At each node, GR representing the homophily effect is generated
first, e.g.
is generated earlier than
8B
b1
10b B
21
b21b
…...
11b
1b2
1A
b1b21a12
b1b21a3
16-05-18
ICDE2016 Helsinki, Finland
b2
b3
2
1
b21b
3
2
b21b
1
3
18
Outline
Introduction & Motivation
Problem Formulation
Solution
Evaluation
Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
19
Experimental Evaluation
Implementation: C++
Platform: CentOS 6.4 with Intel 8-core processors 2.53GHz and 12G of RAM
Real Datasets
Pokec Social Network data1
o 1,436,515 users and 21,078,140 edges, 6 node attributes
DBLP co-authorship data2
o 28,702 authors and 66,832 directed edges, 2 node and 1 edge attributes
Evaluation Measures
Interestingness
Efficiency (runtime)
1http://snap.stanford.edu/data/soc-pokec.html
2[Zhao
16-05-18
et al. SIGMOD 11]
ICDE2016 Helsinki, Finland
20
Interestingness: Case Study
Case study
A top GR
from Pockec data derives:
This pair suggests a big difference in the preference of opposite sex
partners by males and females when looking for sexual partners
A top GR from DBLP data
Authors in the DB area often collaborate with those in the DM area
when collaborating with those not in their own area
16-05-18
ICDE2016 Helsinki, Finland
21
Efficiency Study: Pokec Data
600
GRMiner(k)
GRMiner
BL2
BL1
A+B+C+D
A+B+C
A+B
A
550
500
300
A: supp based pruning
250
B: compact 3-table data storage
200
C: nhp based pruning
450
Time (sec)
400
350
300
150
D: top-k pruning
250
100
0
200
20
40
60
80
100
minNhp (%)
150
3000
100
0
10
1
2
10
3
10
Time (sec)
Time (sec)
10
2500
250
200
150
100
0
Default Parameters Setting
4
10
20
10000
40
60
minNhp (%)
minSupp = 50 (absolute value)
2000
1500
minNhp = 50%
1000
500
k = 100
100
80
100 1
k
0
4
6
8
10
12
Dimension
16-05-18
ICDE2016 Helsinki, Finland
22
Outline
Introduction & Motivation
Problem Formulation
Solution
Evaluation
Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
23
Conclusion, Extensions and Future Work
Conclusion
Mining social ties beyond homophily, many potential applications
Compact data presentation
Novel enumeration with multiple pruning strategies
Interestingness and efficiency study on real data
Extensions and future work
Alternative metrics other than nhp, such as lift, laplace, gain, etc
Deal with unstructured data
Predictive model
16-05-18
ICDE2016 Helsinki, Finland
24
Q&A?
Thanks !
16-05-18
ICDE2016 Helsinki, Finland
25