Active learning Bootstrapping
國立雲林科技大學
National Yunlin University of Science and Technology
Mining relational data from text: From strictly supervised to weakly supervised learning
Presenter : Shao-Wei Cheng
Authors : Zhu Zhang
IS 2008
Intelligent Database Systems Lab
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Personal Comments
N.Y.U.S.T.
I. M.
Motivation
The world today is full of information sources that often represent
the same information in different ways, and many relations are
hidden in natural language text.
While supervised learning is usually preferred when applicable,
it is not always easy to acquire a large amount of labeled
training data.
Relation: (author, book)
(1) “… Shakespeare’s famous work Hamlet …”
(2) “… A Brief History of Time was written by Stephen Hawking …”
Objectives
The goal of the study is to automatically classify relations
between entities using machine learning techniques (SVM),
especially weakly supervised learning algorithms.
Active learning
Bootstrapping
Introduce random subspace-based algorithms, in the context
of both active learning and bootstrapping, for relation
classification.
Relation types: ROLE, PART, AT, NEAR, SOC
Example: Shares of Disney, parent company of ABC, are up five eighths.
Methodology
Co-training algorithm
Active learning
RandSelect: Random sampling strategy.
ActiveLearnBaseline: Presents the examples with the highest uncertainty to
the user for annotation.
ActiveLearnBagging: Committee-based strategy. Presents the examples on which
the committee disagrees most for further labeling.
ActiveLearnSubspace: The features are randomly sampled with probability p.
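The selection strategies listed above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the slides do not say how uncertainty or committee disagreement is measured, so least-confidence uncertainty and vote entropy are assumptions here, and all function names are hypothetical. The SVM classifiers themselves are left out; the sketch only shows how examples (or features) would be picked once class probabilities or committee votes are available.

```python
import math
import random
from collections import Counter

def uncertainty(probs):
    """Least-confidence uncertainty (assumed measure): 1 minus the top class probability."""
    return 1.0 - max(probs)

def vote_entropy(votes):
    """Entropy of a committee's label votes (assumed measure); 0 when all members agree."""
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in Counter(votes).values())

def sample_subspace(n_features, p, rng):
    """Random subspace: keep each feature index independently with probability p."""
    return [j for j in range(n_features) if rng.random() < p]

def select_most(scores, k):
    """Indices of the k highest-scoring unlabeled examples."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# RandSelect: pick examples uniformly at random.
rng = random.Random(0)
random_pick = rng.sample(range(3), k=1)

# ActiveLearnBaseline: toy class probabilities for three unlabeled examples.
posteriors = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
baseline_pick = select_most([uncertainty(p) for p in posteriors], k=1)

# ActiveLearnBagging: toy votes from a three-member committee per example.
committee_votes = [
    ["ROLE", "ROLE", "ROLE"],
    ["ROLE", "PART", "AT"],
    ["ROLE", "ROLE", "PART"],
]
bagging_pick = select_most([vote_entropy(v) for v in committee_votes], k=1)

# ActiveLearnSubspace: feature indices one committee member would be trained on.
feature_subset = sample_subspace(n_features=10, p=0.5, rng=rng)

print(baseline_pick, bagging_pick)
```

In both toy cases the second example is chosen, because its class probabilities are closest to even and its committee votes are split three ways.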
Methodology
Bootstrapping
BootSelf-Y: The highest-probability label is assigned.
BootBagging-Y: Committee-based strategy; the highest-probability label is
assigned.
BootSubspace-Y: Random sampling in feature space.
The modified bootstrapping algorithms are named BootSelf-I,
BootBagging-I, and BootSubspace-I; the “I” stands for
“incremental”.
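A self-training loop in the spirit of BootSelf might look like the sketch below. Several things here are assumptions rather than details from the slides: a toy nearest-centroid classifier stands in for the SVM, the confidence `threshold` and `max_rounds` parameters are hypothetical, and the difference between the "-Y" and "-I" variants is not modeled because the slides give only the names. The bagging and subspace variants would replace the single model with a committee, which is also not shown.

```python
import math

def centroid_fit(X, y):
    """Toy stand-in for the SVM (assumption): one mean vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_proba(model, x):
    """Softmax over negative Euclidean distances to each label centroid."""
    weights = {lab: math.exp(-math.dist(x, c)) for lab, c in model.items()}
    z = sum(weights.values())
    return {lab: w / z for lab, w in weights.items()}

def self_train(X_lab, y_lab, X_unl, threshold=0.7, max_rounds=10):
    """Label confident unlabeled examples, add them to the training set, retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        model = centroid_fit(X_lab, y_lab)
        remaining = []
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)  # highest-probability label
            if probs[label] >= threshold:
                X_lab.append(x)
                y_lab.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing new was labeled this round
            break
        pool = remaining
    return centroid_fit(X_lab, y_lab), y_lab

# Two labeled seed examples and three unlabeled ones (toy 2-D feature vectors).
seed_X = [[0.0, 0.0], [4.0, 4.0]]
seed_y = ["AT", "PART"]
unlabeled = [[0.5, 0.2], [3.8, 4.1], [1.0, 0.5]]

model, labels = self_train(seed_X, seed_y, unlabeled)
probs = predict_proba(model, [3.5, 3.5])
print(labels, max(probs, key=probs.get))
```

The loop stops either when no remaining example clears the threshold or after `max_rounds` passes, so a poorly chosen threshold cannot make it run indefinitely.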
Experiments
Dataset
From the ACE corpus
Data treatment
Parse the sentences into syntactic trees.
Convert into chunklink format and generate feature vectors.
Example sentence: John hit the ball.
Performance of the SVM model.
Experiments
Active learning
Experiments
Something about Co-training algorithm
Bootstrapping
Conclusion
A variety of weakly supervised learning algorithms (active learning
and bootstrapping) can take advantage of large
amounts of unlabeled data when labeling is costly.
The innovative use of random subspace (RS)-based algorithms in the
context of weakly supervised learning demonstrated an empirical
advantage.
Personal Comments
Advantage
The goal of the study is clear.
Drawback
Some proper nouns are used without any explanation.
Application
Relation extraction.
Weakly supervised learning.