slides - Simon Fraser University

Download Report

Transcript slides - Simon Fraser University

Mining Social Ties Beyond Homophily
Hongwei Liang*
Ke Wang*
Feida Zhu #
* Simon
# Singapore
16-05-18
Fraser University, Canada
Management University, Singapore
ICDE2016 Helsinki, Finland
1
Outline
 Introduction & Motivation
 Problem Formulation
 Solution
 Evaluation
 Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
2
Integrating graphs and
demographic data
5
ID
SEX
RACE
LOCATOIN
1
F
Asian
US
2
F
Latino
US
…
…
…
…
(a) Graph topology
(b) User profile
integration
 Graph data and demographic data are everywhere
 But the two aspects of data maybe incomplete within single
social network
 More comprehensive analysis can be done by integrating them
16-05-18
ICDE2016 Helsinki, Finland
3
Facebook Dating App Example
All except black women preferred white men,
while all men except Asians preferred Asian women.
16-05-18
ICDE2016 Helsinki, Finland
4
Social Ties (Group Relationships)
 Form:
1
2
3
4
5
6
7
F
Asian
US
M
8
9
 R1 : (Sex: M)
10
11
12
13
Latino
White
Canada
Finland
14
(Sex: F, Race: Asian)
conf = 7 /14; supp = 7/15
 R2 : (Sex: M, Race: Asian)
(Sex: F, Race: Asian)
conf = 0; supp = 0
16-05-18
ICDE2016 Helsinki, Finland
5
Homophily in Social Network
 Homophily principle: love of the same
 Homophily effect is well-known and is often “dominant”
 R3 : (Sex: F, Location: US)
(Sex: M, Location: US)
conf = 4/6; supp = 4/15
 Homophily captures “primary” bond
 Literature largely focuses on applications based on homophily
 e.g.: community detection, link prediction, friend/product recommendation
16-05-18
ICDE2016 Helsinki, Finland
6
Beyond Homophily
 Unearth the treasures beyond homophily?
 Assume “Location” is homophilic in a dating network
R4: (Sex: F, Location: US)
standard confidence?
(Sex: M, Location: Canada)
VS
conf = 2/6, not interesting
US
1
2
Canada
3
4
nhp = 2/ (6 – 4) = 100%, interesting !
support of the homophily effect is 4/15
(Sex: F, Location: US)
(Sex: M, Location: US)
Finland
5
new metric that remove homophily?
6
7
F
Reads as: if a female from US does NOT want
her partner to be from US, there is a high chance
that she prefers a partner from Canada.
M
8
16-05-18
9
10
11
12
13
14
ICDE2016 Helsinki, Finland
7
Potential Applications
 Target advertising

Homophily pattern:
(JOB : Lawyer, PRODUCT : Stocks) → (PRODUCT : Stocks)
Non-Homphily pattern: (JOB : Lawyer, PRODUCT : Stocks) → (PRODUCT : Bond)
 Helpful in link predicting, beyond homphily
 Friend/dating Recommendation
 User behaviors/habits analysis
 Profile completion
 Criminal investigation
16-05-18
ICDE2016 Helsinki, Finland
8
Non-homophily Preference
 Non-homophily preference: a probability of links going to a node
described by 𝑟, given 𝑙 ∧ 𝑤 and exclude the homophily effect
𝑤
𝑛ℎ𝑝 𝑙
𝑤
𝑠𝑢𝑝𝑝(𝑙 𝑟)
𝑟 =
𝑠𝑢𝑝𝑝 𝑙 ∧ 𝑤 − 𝑠𝑢𝑝𝑝(ℎ𝑜𝑚𝑜𝑝ℎ𝑖𝑙𝑦 𝑒𝑓𝑓𝑒𝑐𝑡)
Example: (Sex: F, Location: US)
(Sex: M, Location: Canada)
(Sex: F, Location: US)
(Sex: M, Location: US)
 Captures “secondary bonds” beyond “primary bonds”
 nhp does not have the regular anti-monotonicity
 Adding an attribute on the RHS may increase supp(homophily effect)
16-05-18
ICDE2016 Helsinki, Finland
9
Outline
 Introduction & Motivation
 Problem Formulation
 Solution
 Evaluation
 Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
10
Problem - Mining Top-k GRs
 Given
 an multi-dimensional information network
 the setting of homophily for attributes
 a supp threshold, a nhp threshold and an integer k
 Goal
 discover the top-k GRs, ranked by nph followed by supp, and
each of them satisfies the supp and nhp thresholds
16-05-18
ICDE2016 Helsinki, Finland
11
Challenges
 Storage
 Space =
, if single table
 Computation
 Exponential order of attributes value combination
 nhp does not have anti-monotonicity
 If only supp pruning: small threshold, and post-processing is needed
 How to deal with?
 Storage: favourable data modeling
 Computation: ingenious enumeration with efficient pruning strategies
16-05-18
ICDE2016 Helsinki, Finland
12
Outline
 Introduction & Motivation
 Problem Formulation
 Solution
 Evaluation
 Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
13
for mining t op-k GRs by leveraging the enumeration and
pruning st rategies presented in Section 4, but representing
the graph data using a more efficient data struct ure.
We shall st ore the node and edge information separately.
Fig. 3 shows t he data st ruct ure for our running example.
LArray
cont ains the records
for individuals
could
occur
Compact 3-table data
presentation:
combines
profilet hat
data
and
in the LHS of GRs and RArray cont ains t he records for
individuals that could occur in the RHS of GRs. Out is t he
graph topology together
out -degree of a record and I nd is t he st art ing position of
the outgoing edges in EArray. EArray cont ains one record
No redundant records,
data
for each
edgeare
andlinked
Ptr is tby
he pointers
point er to the record for t he
destination node in RArray. This data st ruct ure has the size
Space complexity V (# Attr V + 2) + E (# Attr E + 1) + V # Attr V ,
which is more compact t han t he single t able at t he beginning
of Section 4 by eliminating t he term E
2 # Attr V . We
assume t hat this struct ure is held in memory. We use these
tables t o part ition t he dat a for counting the support s for
GRs. For example, the first row in LArray represents t he
record 1 for LHS, which connects to the dest inat ion records
2, 4 and 5 for RHS, found by the point ers Ptr kept in t he
ent ries [I nd, I nd + Out 1] of EArray.
Our algorit hm enumerates each attribut e subset L W R following t he subset-first dept h-first order as described in t he
previous sect ion. To comput e supp and di vConf of t he GRs
at the node for L W R, it part it ions the data using t he at-
Data Model



16-05-18
ICDE2016 Helsinki, Finland
14
SFDF Enumeration
 Subset-First Depth-First (SFDF) Enumeration
 Subset-First: some kind of reverse order, all parts of supp, including
that for homophily effect, are available when computing nhp
 Depth-First: only materialize the current branch
16-05-18
ICDE2016 Helsinki, Finland
15
Dynamic Ordering
 How to make nhp anti-monotone?
 Dynamically order the homophily attributes, on the basis of whether
the same homophily attributes were enumerated in the LHS
dynamic ordering
assume both A and B are
homophily attributes

for the GRs with same
is anti-monotone,
with the help of dynamic ordering
16-05-18
ICDE2016 Helsinki, Finland
16
Multiple Pruning Strategies
 supp based pruning
 nhp based pruning
 enabled with the help of dynamic ordering
 Top-k pruning tights up the nph threshold
 The mining task finishes in one phase
16-05-18
ICDE2016 Helsinki, Finland
17
Data partition and Pruning
 Partition attributes while computing supp and nhp?
 Recursive partition with linear CountingSort
 At each node, GR representing the homophily effect is generated
first, e.g.
is generated earlier than
8B
b1
10b B
21
b21b
…...
11b
1b2
1A
b1b21a12
b1b21a3
16-05-18
ICDE2016 Helsinki, Finland
b2
b3
2
1
b21b
3
2
b21b
1
3
18
Outline
 Introduction & Motivation
 Problem Formulation
 Solution
 Evaluation
 Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
19
Experimental Evaluation
 Implementation: C++
 Platform: CentOS 6.4 with Intel 8-core processors 2.53GHz and 12G of RAM
 Real Datasets
 Pokec Social Network data1
o 1,436,515 users and 21,078,140 edges, 6 node attributes
 DBLP co-authorship data2
o 28,702 authors and 66,832 directed edges, 2 node and 1 edge attributes
 Evaluation Measures
 Interestingness
 Efficiency (runtime)
1http://snap.stanford.edu/data/soc-pokec.html
2[Zhao
16-05-18
et al. SIGMOD 11]
ICDE2016 Helsinki, Finland
20
Interestingness: Case Study
 Case study
 A top GR
from Pockec data derives:
This pair suggests a big difference in the preference of opposite sex
partners by males and females when looking for sexual partners
 A top GR from DBLP data
Authors in the DB area often collaborate with those in the DM area
when collaborating with those not in their own area
16-05-18
ICDE2016 Helsinki, Finland
21
Efficiency Study: Pokec Data
600
GRMiner(k)
GRMiner
BL2
BL1
A+B+C+D
A+B+C
A+B
A
550
500
300
A: supp based pruning
250
B: compact 3-table data storage
200
C: nhp based pruning
450
Time (sec)
400
350
300
150
D: top-k pruning
250
100
0
200
20
40
60
80
100
minNhp (%)
150
3000
100
0
10
1
2
10
3
10
Time (sec)
Time (sec)
10
2500
250
200
150
100
0
Default Parameters Setting
4
10
20
10000
40
60
minNhp (%)
minSupp = 50 (absolute value)
2000
1500
minNhp = 50%
1000
500
k = 100
100
80
100 1
k
0
4
6
8
10
12
Dimension
16-05-18
ICDE2016 Helsinki, Finland
22
Outline
 Introduction & Motivation
 Problem Formulation
 Solution
 Evaluation
 Conclusion & Future Work
16-05-18
ICDE2016 Helsinki, Finland
23
Conclusion, Extensions and Future Work
 Conclusion
 Mining social ties beyond homophily, many potential applications
 Compact data presentation
 Novel enumeration with multiple pruning strategies
 Interestingness and efficiency study on real data
 Extensions and future work
 Alternative metrics other than nhp, such as lift, laplace, gain, etc
 Deal with unstructured data
 Predictive model
16-05-18
ICDE2016 Helsinki, Finland
24
Q&A?
Thanks !
16-05-18
ICDE2016 Helsinki, Finland
25