ACL/IJCNLP 2009 Invited Talk (Transcript)

Transfer Learning for Link Prediction

Qiang Yang (楊強)
Hong Kong University of Science and Technology
Hong Kong, China
http://www.cse.ust.hk/~qyang
We often find it easier… [a series of motivating example slides]
Transfer Learning? (迁移学习)

People often transfer knowledge to novel situations:
  Chess → Checkers
  C++ → Java
  Physics → Computer Science

Transfer Learning:
The ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains).
But direct application will not work

Machine Learning assumes that training and future (test) data:
  follow the same distribution, and
  are in the same feature space.
When distributions are different

Classification accuracy (+, -):
  Training: 90%
  Testing: 60%

When features and distributions are different

Classification accuracy (+, -):
  Training: 90%
  Testing: 10%
Cross-Domain Sentiment Classification

  Training: DVD          Test: DVD   Classifier accuracy: 82.55%
  Training: Electronics  Test: DVD   Classifier accuracy: 71.85%
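A minimal sketch of this kind of in-domain versus cross-domain comparison (the data variables and the bag-of-words plus logistic-regression pipeline are illustrative assumptions, not the setup that produced the numbers above):

```python
# Hypothetical sketch: train a sentiment classifier on one review domain and
# evaluate it on another. Variable names and the pipeline are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def evaluate(train_texts, train_labels, test_texts, test_labels):
    vec = CountVectorizer(min_df=2)          # bag-of-words features
    clf = LogisticRegression(max_iter=1000)  # simple linear classifier
    clf.fit(vec.fit_transform(train_texts), train_labels)
    return accuracy_score(test_labels, clf.predict(vec.transform(test_texts)))

# In-domain: train on DVD reviews, test on DVD reviews.
# acc_in = evaluate(dvd_train_x, dvd_train_y, dvd_test_x, dvd_test_y)
# Cross-domain: train on Electronics reviews, test on DVD reviews.
# acc_cross = evaluate(elec_train_x, elec_train_y, dvd_test_x, dvd_test_y)
# Typically acc_cross < acc_in, because the word distributions differ across domains.
```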
When distributions are different

  Wireless sensor networks: signal distributions differ across time periods
  (day time vs. night time), devices (Device 1 vs. Device 2), or locations in space.
When features are different

  Heterogeneous: different feature spaces
  Training: text
    "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ..."
    "Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ..."
  Future (test): images of apples and bananas
Transfer Learning: Source Domains

  [Diagram: input = source-domain data (labeled or unlabeled) plus target-domain
  training data (labeled or unlabeled); the learning process outputs predictions
  on unlabeled target-domain test data.]
An overview of various settings of transfer learning

  Inductive Transfer Learning: labeled data are available in the target domain
    Case 1: no labeled data in the source domain → Self-taught Learning
    Case 2: labeled data are available in the source domain; source and target
            tasks are learnt simultaneously → Multi-task Learning
  Transductive Transfer Learning: labeled data are available only in the source domain
    Assumption: different domains but a single task → Domain Adaptation
    Assumption: single domain and single task → Sample Selection Bias / Covariate Shift
  Unsupervised Transfer Learning: no labeled data in either the source or the target domain
Talk Outline

  Transfer Learning: a quick introduction
  Link prediction and collaborative filtering problems
  Transfer Learning for the sparsity problem
    Codebook Method
    CST Method
    Collective Link Prediction Method
  Conclusions and Future Work
Network Data

  Social Networks
    Facebook, Twitter, Live Messenger, etc.
  Information Networks
    The Web (1.0 with hyperlinks and 2.0 with tags)
    Citation networks, Wikipedia, etc.
  Biological Networks
    Gene networks, etc.
  Other Networks
A Real-World Study [Leskovec-Horvitz WWW '08]

  Who talks to whom on MSN Messenger
    Network: 240M nodes, 1.3 billion edges
  Conclusions:
    Average path length is 6.6
    90% of nodes are reachable in fewer than 8 steps
Network Structures: Links

  Global network structures
    Small world
    Long-tail distributions
  Local network structures
    Relational learning (relation == link)
    Link prediction (e.g. recommendation)
    Group (cluster) structures
Global Network Structures

  Small world
    Six degrees of separation [Milgram, 1960s]
    Clustering and communities
    Evolutionary patterns
  Long-tail distributions
    Personalized recommendation: Amazon
    They also bring a challenge for recommendation and relational learning: sparseness
Local Network Structures

  Link Prediction
    A form of Statistical Relational Learning
    Object classification: predict the category of an object based on its attributes and links
      Is this person a student?
    Link classification: predict the type of a link
      Are they co-authors?
    Link existence: predict whether a link exists or not
      Does he like this book?
    Link cardinality estimation: predict the number of links of a node

  See An Introduction to Statistical Relational Learning by Taskar and Getoor.
Link Prediction

  Task: predict missing links in a network
  Main challenge: network data sparsity
Product Recommendation (Amazon.com)
Examples: Collaborative Filtering

  Ideal setting
    Training data: dense rating matrix (density >= 2.0%)
    Test data: rating matrix entries to be predicted
    CF model result: RMSE 0.8721
Examples: Collaborative Filtering

  Realistic setting
    Training data: sparse rating matrix (density <= 0.6%)
    Test data: rating matrix entries to be predicted
    CF model result: RMSE 0.9748
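To make the dense-versus-sparse contrast concrete, here is a minimal matrix-factorization CF sketch (plain SGD over observed entries; the function names and hyperparameters are illustrative assumptions, not the CF model behind the RMSE numbers above):

```python
import numpy as np

def factorize(R, mask, d=10, lr=0.01, reg=0.02, epochs=50, seed=0):
    """Plain SGD matrix factorization over the observed entries of R.
    R: n x m rating matrix; mask: 1 where a rating is observed, 0 otherwise."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((m, d))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for u, i in zip(rows, cols):
            err = R[u, i] - U[u] @ V[i]
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

def rmse(R, mask, U, V):
    """RMSE over the entries marked in mask (e.g. a held-out test mask)."""
    diff = (R - U @ V.T)[mask.astype(bool)]
    return float(np.sqrt(np.mean(diff ** 2)))

# With a dense training mask, held-out RMSE is relatively low; as the observed
# density drops toward a fraction of a percent, RMSE degrades markedly -- the
# sparsity problem that transfer learning is meant to alleviate.
```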
Transfer Learning for Collaborative Filtering?

  IMDB Database
  Amazon.com
Codebook Transfer

  Bin Li, Qiang Yang, Xiangyang Xue.
  Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction.
  In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI '09), Pasadena, CA, USA, July 11-17, 2009.
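Codebook transfer builds a "codebook" of cluster-level rating patterns from a dense source domain and expands it to fill a sparse target domain. The original algorithm uses orthogonal nonnegative matrix tri-factorization for the co-clustering; the sketch below substitutes plain k-means and hard assignments, so it only illustrates the codebook idea (the function names and defaults are mine, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(R_src, k=8, l=8, seed=0):
    """Co-cluster the dense source rating matrix (here with k-means on rows and
    columns) and average ratings per (user-cluster, item-cluster) block."""
    u_lab = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(R_src)
    i_lab = KMeans(n_clusters=l, n_init=10, random_state=seed).fit_predict(R_src.T)
    B = np.zeros((k, l))
    for a in range(k):
        for b in range(l):
            block = R_src[np.ix_(u_lab == a, i_lab == b)]
            B[a, b] = block.mean() if block.size else R_src.mean()
    return B

def codebook_transfer(R_tgt, mask, B, iters=20, seed=0):
    """Alternately assign target users/items to codebook rows/columns so the
    expanded codebook fits the observed target ratings, then fill the rest."""
    rng = np.random.default_rng(seed)
    n, m = R_tgt.shape
    k, l = B.shape
    i_assign = rng.integers(l, size=m)
    for _ in range(iters):
        u_assign = np.array([np.argmin([np.sum(mask[u] * (R_tgt[u] - B[a, i_assign]) ** 2)
                                        for a in range(k)]) for u in range(n)])
        i_assign = np.array([np.argmin([np.sum(mask[:, i] * (R_tgt[:, i] - B[u_assign, b]) ** 2)
                                        for b in range(l)]) for i in range(m)])
    return B[np.ix_(u_assign, i_assign)]   # filled n x m target rating matrix
```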
Limitations of Codebook Transfer

  Same rating range
    Source and target data must have the same range of ratings, e.g. [1, 5]
  Homogeneous dimensions
    User and item dimensions must be similar
  In reality
    The range of ratings can be 0/1 or [1, 5]
    User and item dimensions may be very different
Coordinate System Transfer

  Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang.
  Transfer Learning in Collaborative Filtering for Sparsity Reduction.
  In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), Atlanta, Georgia, USA, July 11-15, 2010.
Types of User Feedback

  Explicit feedback: users express preferences explicitly
    1-5: rating
    ?: missing
  Implicit feedback: users express preferences implicitly
    1: observed browsing/purchasing
    0: unobserved browsing/purchasing

  Target domain
    R: explicit ratings
  One auxiliary domain (two or more data sources)
    Implicit data: user u has implicitly browsed/purchased item j
Problem Formulation

  One target domain: rating matrix R with n users and m items
  One auxiliary domain (two data sources):
    User side: shares the n common users with R
    Item side: shares the m common items with R
  Goal: predict the missing values in R
Our Solution: Coordinate System Transfer

  Step 1: Coordinate System Construction
  Step 2: Coordinate System Adaptation
Coordinate System Transfer
Step 1: Coordinate System Construction

  Sparse SVD on the auxiliary data, where the resulting factor matrices give the top-d principal coordinates.

  Definition (Coordinate System):
  A coordinate system is a matrix whose columns are orthonormal bases (principal coordinates), ordered in descending order of their corresponding eigenvalues.
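A minimal sketch of Step 1, assuming the sparse SVD is approximated by a truncated SVD of the auxiliary rating matrix with missing entries treated as zeros (the paper's exact solver may differ):

```python
import numpy as np
from scipy.sparse.linalg import svds

def coordinate_system(R_aux, d=10):
    """Return the top-d left and right singular vectors (principal coordinates)
    of an auxiliary rating matrix; missing entries are treated as zeros here."""
    U, s, Vt = svds(np.asarray(R_aux, dtype=float), k=d)
    order = np.argsort(-s)            # svds does not guarantee descending order
    return U[:, order], Vt[order].T

# U0 from the user-side auxiliary matrix and V0 from the item-side auxiliary
# matrix serve as the coordinate systems transferred to the target domain.
```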
Coordinate System Transfer
Step 2: Coordinate System Adaptation

  Adapt the coordinate systems discovered in the auxiliary domain to the target domain.
  The effect from the auxiliary domain:
    Initialization: take the auxiliary coordinate systems as the seed model in the target domain
    Regularization: penalize target factors that deviate from the auxiliary coordinate systems
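A hedged reconstruction of the adaptation objective (the notation is mine; the AAAI-10 paper's exact formulation may differ in details such as orthonormality constraints on the factors):

$$\min_{U,\,B,\,V}\;\; \frac{1}{2}\,\bigl\| Y \odot \bigl(R - U B V^{\top}\bigr) \bigr\|_F^2 \;+\; \frac{\rho}{2}\Bigl( \|U - U_0\|_F^2 + \|V - V_0\|_F^2 \Bigr)$$

where $Y$ indicates the observed target ratings, $\odot$ is the element-wise product, $U_0$ and $V_0$ are the coordinate systems transferred from the auxiliary domain (also used to initialize $U$ and $V$), and $\rho$ controls the strength of the transfer.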
Coordinate System Transfer
Algorithm
Experimental Results
Data Sets and Evaluation Metrics

  Data sets: extracted from MovieLens and Netflix
  Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):

    $\mathrm{MAE} = \frac{1}{|T_E|} \sum_{(u,i) \in T_E} \left| r_{ui} - \hat{r}_{ui} \right|, \qquad \mathrm{RMSE} = \sqrt{ \frac{1}{|T_E|} \sum_{(u,i) \in T_E} \left( r_{ui} - \hat{r}_{ui} \right)^2 }$

  where $r_{ui}$ and $\hat{r}_{ui}$ are the true and predicted ratings, respectively, and $|T_E|$ is the number of test ratings.
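The same two metrics as code (a direct transcription of the formulas above):

```python
import numpy as np

def mae_rmse(r_true, r_pred):
    """MAE and RMSE over the test ratings (two arrays of equal length)."""
    err = np.asarray(r_true, dtype=float) - np.asarray(r_pred, dtype=float)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))
```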
Experimental Results
Baselines and Parameter Settings

  Baselines
    Average Filling (AF)
    Latent Matrix Factorization (LFM) (Bell and Koren, ICDM '07)
    Collective Matrix Factorization (CMF) (Singh and Gordon, KDD '08)
    OptSpace (Keshavan, Montanari and Oh, NIPS '10)
  Parameter settings
    Tradeoff parameters
      LFM: { 0.001, 0.01, 0.1, 1, 10, 100 }
      CMF: { 0.1, 0.5, 1, 5, 10 }
      CST:
    Latent dimensions: { 5, 10, 15 }
    Average results over 10 random trials are reported
Experimental Results
Results (1/2)

  Observations:
    CST performs significantly better (t-test, 99% confidence) than all baselines at all sparsity levels.
    The transfer learning methods (CST, CMF) beat the two non-transfer learning methods (AF, LFM).

Experimental Results
Results (2/2)

  Observations:
    CST consistently improves as d increases, demonstrating that the auxiliary knowledge is useful (it helps avoid overfitting).
    The improvement (CST vs. OptSpace) is more significant with fewer observed ratings, demonstrating the effectiveness of CST in alleviating data sparsity.
Related Work (Transfer Learning)

  Cross-domain collaborative filtering
  Domain adaptation methods
    One-sided:
    Two-sided:
Limitations of CST and CBT

  Users behave differently in different domains
  Rating bias
    Users tend to rate items that they like
Our Solution: Collective Link Prediction (CLP)

  Jointly learn multiple domains together
  Learn the similarity of different domains
    Consistency between domains indicates similarity

  Bin Cao, Nathan Liu and Qiang Yang.
  Transfer Learning for Collective Link Prediction in Multiple Heterogeneous Domains.
  In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel, June 2010.
Recommendation with Heterogeneous Domains

  Items span multiple, heterogeneous domains in many recommendation systems
    eBay, Google Reader, Douban, etc.
Example: Collective Link Prediction

  Training data: sparse movie rating matrix
  Dense auxiliary data: a movie tagging matrix (0/1) and a book rating matrix
  Transfer learning combines these sources
  Test data: movie rating matrix; RMSE 0.8855
A Nonparametric Bayesian Approach

  Based on Gaussian process models
  The key part is the kernel, which models user relations as well as task relations
Making Prediction

  Similar to a memory-based approach: the predictive mean combines observed ratings, weighted by the similarity between items and the similarity between tasks
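As a hedged sketch of this prediction step, in generic Gaussian-process regression notation (not necessarily the paper's exact formulation), with a kernel that multiplies item similarity by task (domain) similarity:

$$\hat{r}_{*} = \mathbf{k}_{*}^{\top} \bigl(K + \sigma^{2} I\bigr)^{-1} \mathbf{r}, \qquad k\bigl((i,t),(i',t')\bigr) = k_{\text{item}}(i,i')\, k_{\text{task}}(t,t')$$

so the prediction is a weighted average of the observed ratings $\mathbf{r}$, with weights driven jointly by item similarity and task similarity, much like a memory-based method.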
Experimental Results
Conclusions and Future Work

  Transfer Learning (舉一反三: inferring many cases from one example)
  Transfer Learning has many applications
    Sensor-based location estimation
    Sensor-based activity recognition
  Future:
    When to transfer, and when not to transfer
    Finding source domains on the World Wide Web
    Heterogeneous transfer learning