Active learning Bootstrapping


國立雲林科技大學
National Yunlin University of Science and Technology
Mining relational data from text: From strictly supervised to weakly supervised learning
Presenter : Shao-Wei Cheng
Authors : Zhu Zhang
IS 2008
Intelligent Database Systems Lab
Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Personal Comments
N.Y.U.S.T.
I. M.
Motivation

Today's information sources are numerous and often represent the same
information in different ways, and many relations are hidden in natural
language text.

While supervised learning is usually preferred when applicable,
it is not always easy to acquire a large amount of labeled
training data.
Relation: (author, book)
(1) “… Shakespeare’s famous work Hamlet …”
(2) “… A Brief History of Time was written by Stephen Hawking …”
Objectives


The goal of the study is to automatically classify relations
between entities using machine learning techniques (SVM),
especially weakly supervised learning algorithms.

Active learning

Bootstrapping
Introduce random subspace-based algorithms in the context
of both active learning and bootstrapping for relation
classification.
Relation types (labels from the slide figure): ROLE, PART, AT, NEAR, SOC
Example: Shares of Disney, parent company of ABC, are up five eighths.
Methodology
Co-training algorithm

Active learning

RandSelect: Random sampling strategy.

ActiveLearnBaseline: Presents the examples with the highest uncertainty to
the user for annotation.

ActiveLearnBagging: Committee-based strategy. Presents the example on which
the committee disagrees most for further labeling.

ActiveLearnSubspace: The features are randomly sampled with probability p.
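
The selection strategies above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the use of vote entropy as the disagreement measure, and the toy probabilities and votes are all assumptions made for the example. The slides do not say how disagreement is measured or how the committee members are trained.

```python
import math
import random
from collections import Counter


def least_confident(probabilities):
    """Index of the example whose top class probability is lowest (uncertainty sampling)."""
    return min(range(len(probabilities)), key=lambda i: max(probabilities[i]))


def vote_entropy(votes):
    """Entropy of the committee's label votes for one example (0 means full agreement)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def most_disputed(committee_votes):
    """Index of the example with the highest vote entropy (assumed disagreement measure)."""
    return max(range(len(committee_votes)), key=lambda i: vote_entropy(committee_votes[i]))


def sample_subspace(n_features, p, rng):
    """Keep each feature index independently with probability p."""
    return [j for j in range(n_features) if rng.random() < p]


# Hypothetical class-probability outputs for three unlabeled examples.
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
print(least_confident(probs))  # 1

# Hypothetical votes from a three-member committee on the same examples.
votes = [["ROLE", "ROLE", "ROLE"], ["ROLE", "AT", "PART"], ["AT", "AT", "ROLE"]]
print(most_disputed(votes))  # 1

# One random feature subspace; a committee member would be trained on these columns only.
print(sample_subspace(10, 0.5, random.Random(0)))
```

In a subspace-based committee, each member would presumably be trained on its own `sample_subspace` draw, and `most_disputed` would then pick the query from their votes.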
Methodology


Bootstrapping

BootSelf-Y: The highest-probability label is assigned.

BootBagging-Y: Committee-based strategy, and the highest-probability label is
assigned.

BootSubspace-Y: Random sampling in feature space.
The modified bootstrapping variants are named BootSelf-I,
BootBagging-I, and BootSubspace-I; the “I” stands for
“incremental”.
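
The general idea of bootstrapping by self-training can be sketched as below. This is a generic loop under stated assumptions, not the paper's BootSelf algorithm: a toy nearest-centroid classifier stands in for the SVM, the confidence score is a normalized inverse distance, and `self_train`, `rounds`, and `batch` are hypothetical names chosen for the example. Each round, the current model labels the unlabeled pool and the most confident predictions are moved into the training set.

```python
def centroid_fit(X, y):
    """Return one mean vector per label (toy stand-in for training an SVM)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence); confidence is a normalized inverse distance."""
    weights = {}
    for label, c in model.items():
        dist = sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
        weights[label] = 1.0 / (1e-9 + dist)
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, rounds=5, batch=1):
    """Each round, move the `batch` most confidently labeled pool examples into the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = centroid_fit(X_lab, y_lab)
        scored = [(centroid_predict(model, x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _conf), x in scored[:batch]:
            X_lab.append(x)
            y_lab.append(label)
            pool.remove(x)
    return centroid_fit(X_lab, y_lab), X_lab, y_lab


# Two labeled seeds and four unlabeled points (hypothetical toy data).
seeds_X, seeds_y = [[0.0, 0.0], [10.0, 10.0]], ["AT", "ROLE"]
pool = [[1.0, 0.5], [9.0, 9.5], [2.0, 1.0], [8.0, 9.0]]
model, X_all, y_all = self_train(seeds_X, seeds_y, pool, rounds=4, batch=1)
print(sorted(y_all))
```

A bagging or subspace variant would replace the single model with a committee and assign the label the committee finds most probable.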
Experiments

Dataset

From the ACE corpus.

Data treatment

Parse the sentences into syntactic trees.

Convert into chunklink format and generate feature vectors.

Example sentence: John hit the ball.

Performance of the SVM model.
Experiments

Active learning
Experiments
Remarks on the co-training algorithm

Bootstrapping
Conclusion

A variety of weakly supervised learning algorithms (active learning
and bootstrapping) can take advantage of a large
amount of unlabeled data when labeling is costly.

Innovative use of random subspace (RS)-based algorithms in the context of
weakly supervised learning demonstrated an empirical
advantage.
Personal Comments

Advantage

The goal of the study is clear.

Drawback

Some proper nouns are used without any explanation.
Application

Relation extraction.

Weakly supervised learning.