Research Progress Report
國立雲林科技大學
National Yunlin University of Science and Technology
Bridging Domains Using World Wide Knowledge for Transfer Learning
Evan Wei Xiang, Bin Cao, Derek Hao Hu, and Qiang Yang
TKDE, 2010
presented by Wen-Chung Liao, 2010/05/12
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Motivation
Supervised learning requires sufficient labeled instances.
It is not easy or feasible to obtain new labeled data in a domain of interest.
To solve this problem, transfer learning techniques
─ capture the shared knowledge from some related domains (source domains) where labeled data are available
─ use the knowledge to improve the performance of data mining tasks in a target domain
─ include domain adaptation techniques
However, transfer learning may not work well when the difference (information gap) between the source and target domains is large.
Objectives
To solve this problem, introduce a bridge between the two different domains
─ by leveraging additional knowledge sources such as Wikipedia or the Open Directory Project (ODP)
─ treat the two domains as if they were drawn from a single underlying distribution
─ cast the "domain adaptation problem" as a classification problem under the supervised setting or a semisupervised (transductive) setting
Introduce a novel domain adaptation algorithm called BIG (Bridging Information Gap)
─ apply semisupervised learning (SSL) to domain adaptation problems based on the use of the auxiliary data (bridge)
Data used
─ the labeled data from the source domain
─ the unlabeled data from the target domain
─ an auxiliary data source such as Wikipedia
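To make the SSL idea concrete, here is a minimal sketch that pools the auxiliary (bridge) data with the unlabeled target data and pseudo-labels both. It is not the BIG algorithm: it swaps in plain self-training with a nearest-centroid classifier instead of a TSVM, and all data points and function names are hypothetical toy values chosen for illustration.

```python
from math import dist

def centroids(points, labels):
    """Return the mean vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Assign x to the class whose centroid is nearest."""
    return min(cents, key=lambda y: dist(cents[y], x))

def self_train(source_x, source_y, unlabeled, rounds=3):
    """Self-training: pseudo-label the unlabeled pool, refit, repeat."""
    cents = centroids(source_x, source_y)
    for _ in range(rounds):
        pseudo = [predict(cents, x) for x in unlabeled]
        cents = centroids(source_x + unlabeled, source_y + pseudo)
    return cents

# Toy data (assumed): labeled source, auxiliary "bridge", unlabeled target.
source_x = [[0.0, 0.0], [4.0, 4.0]]
source_y = [-1, +1]
auxiliary = [[1.0, 1.0], [3.0, 3.0]]
target = [[1.5, 0.5], [2.6, 3.5]]

model = self_train(source_x, source_y, auxiliary + target)
target_pred = [predict(model, x) for x in target]
```

The only point of the sketch is the data flow: the classifier is fit on labeled source data, while the unlabeled pool contains both the bridge and the target instances.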
Support vector machines (SVMs)
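The formulas from this slide did not survive in the transcript. For reference, the standard textbook soft-margin SVM primal is given below; the notation (w, b, slack variables ξ_i, penalty C, n labeled instances) is an assumption and may differ from the notation used on the original slide.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.}\quad y_i\,(w^{\top} x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n
```

The geometric margin of the separating hyperplane is 2/‖w‖, so minimizing ‖w‖ maximizes the margin.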
Methodology
Information gap with no background knowledge available: SVM
Information gap with background knowledge: TSVM
Selecting the set of unlabeled data {x_i} from K to minimize the margin: NP-hard
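Because exact selection is NP-hard, some approximation is needed. The slides do not show the paper's min-margin procedure, so the following is only a hypothetical greedy heuristic under stated assumptions: given an already trained linear classifier (w, b), rank the candidates in K by their distance to the decision boundary and keep the k closest ones. The names `margin`, `select_min_margin`, and the toy values are illustrative, not taken from the paper.

```python
from math import sqrt

def margin(w, b, x):
    """Unsigned distance from x to the hyperplane w.x + b = 0."""
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def select_min_margin(w, b, candidates, k):
    """Greedy heuristic: keep the k candidates closest to the boundary."""
    return sorted(candidates, key=lambda x: margin(w, b, x))[:k]

# Hypothetical linear classifier with boundary x1 + x2 = 4.
w, b = [1.0, 1.0], -4.0

# Hypothetical auxiliary instances K.
K = [[0.0, 0.0], [2.0, 1.5], [2.5, 2.0], [5.0, 5.0]]

bridge = select_min_margin(w, b, K, k=2)
```

Here the two points nearest the boundary are kept and the two far from it are dropped. A fuller version would retrain the classifier after each selection round.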
Experiments
Conclusions
THREE MAJOR CONTRIBUTIONS
1) We view the problem from a new perspective, i.e., we consider the
problem of transfer learning as one of filling in the information gap
based on a large document corpus.
2) We show that we can successfully bridge the source and target domains
using well-developed semisupervised learning algorithms.
3) We propose a min-margin algorithm that can effectively identify and
reduce the information gap between two domains.
FUTURE WORK
─ First, we plan to validate the effectiveness of our approach through other semisupervised learning algorithms and other relational knowledge bases.
─ We plan to extend our approach to be able to consider heterogeneous transfer learning.
─ Finally, we will try to develop online TSVM methods for incremental cross-domain transductive learning.
Comments
Advantages
─ new perspective
Drawbacks
Applications
─ Web and document data mining applications
─ information retrieval
─ spam detection
─ online advertisement
─ Web search