Machine Learning


Modern Topics in Multivariate Methods for
Data Analysis
• Semi-Supervised Learning
• Transfer Learning
• Active Learning
• Summary
Semi-Supervised Learning
Semi-supervised learning is an extension of supervised learning. We have two sets of data: a small set of labeled examples and a large set of unlabeled examples.
Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
Motivation: labeled data is sometimes hard to obtain.
An example from Mars data analysis: starting from a digital elevation map of a Martian landscape, a geomorphic map shows the landforms chosen and defined by a domain expert. The geomorphic map of this landscape is drawn manually.
Segmentation
Segmentation results: 2,631 segments, each homogeneous in slope, curvature, and flood, displayed on an elevation background.
Classification: Labeling
A representative subset of segments is labeled with one of the following six classes:
• Plain
• Crater Floor
• Convex Crater Walls
• Concave Crater Walls
• Convex Ridges
• Concave Ridges
Labeled segments.
How do we approach semi-supervised learning?
The following figures (obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007) contrast a case with no unlabeled data, a case with unlabeled data, and graph-based models.
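Graph-based models can be sketched with a simple label-propagation loop. The graph, data, and stopping rule below are illustrative (not the exact algorithm in Zhu's tutorial): labeled nodes are clamped to their labels, and every unlabeled node repeatedly takes the average score of its neighbours until the scores settle.

```python
# Toy label propagation on a small chain graph 0 - 1 - 2 - 3 - 4.
# Scores are soft 0/1 labels; only nodes 0 and 4 are labeled.
graph = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3],
}
clamped = {0: 0.0, 4: 1.0}                   # the two labeled nodes
scores = {n: clamped.get(n, 0.5) for n in graph}

for _ in range(100):                          # iterate to (near) convergence
    for n in graph:
        if n not in clamped:                  # labeled nodes stay clamped
            scores[n] = sum(scores[m] for m in graph[n]) / len(graph[n])

labels = {n: int(scores[n] > 0.5) for n in graph}
```

Unlabeled nodes end up with the label of the nearest labeled node along the graph; the midpoint node converges to the maximally uncertain score 0.5.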
Assumptions in Semi-Supervised Learning
How can we learn from unlabeled data at all? The answer lies in the set of assumptions made about the unlabeled data distribution. If the assumptions are right, an advantage can be obtained from the unlabeled data; but if they are wrong, performance can actually decrease.
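One way to see the role of the assumptions is a self-training sketch on toy 2-D data (the scheme and data here are illustrative, not the lecture's exact method): each unlabeled point greedily adopts the label of its nearest labeled point, closest first. This only works when the cluster assumption holds, i.e. nearby points really do share a label.

```python
def dist2(a, b):
    # Squared Euclidean distance between two 2-D points.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def self_train(labeled, unlabeled):
    """labeled: list of (point, label); unlabeled: list of points.
    Repeatedly labels the unlabeled point closest to any labeled point."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Most confident pseudo-label first: the closest (point, neighbour) pair.
        u, (_, lab) = min(
            ((u, l) for u in pool for l in labeled),
            key=lambda pair: dist2(pair[0], pair[1][0]),
        )
        labeled.append((u, lab))       # adopt the neighbour's label
        pool.remove(u)
    return dict(labeled)

labels = self_train(
    [((0, 0), "A"), ((10, 10), "B")],   # one labeled point per cluster
    [(1, 0), (9, 10), (0, 1), (10, 9)], # unlabeled points near the clusters
)
```

If the two clusters overlapped (the assumption fails), early pseudo-labeling mistakes would propagate, which is exactly the risk described above.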
Transfer Learning
• The goal is to transfer knowledge gathered from previous experience.
• Also called Inductive Transfer or Learning to Learn.
• Example: Invariant transformations across tasks.
Motivation for Transfer Learning
Once a predictive model is built, there are reasons to believe the model will cease to be valid at some point in time. The difference here is that the source and target domains can be completely different.
Traditional Approach to Classification
Each database (DB1, DB2, …, DBn) trains its own learning system independently.
Transfer Learning
Learning systems are trained on the source-domain databases (DB1, DB2); the knowledge they capture is passed to the learning system trained on the target-domain database (DB new).
Transfer Learning Scenarios:
1. Labeling in a new domain is costly. Example: DB1 (labeled) supports classification of patient group G1, while DB2 (unlabeled) requires classification of patient group G2.
2. Data is outdated: a model was created with one survey (Survey 1 → learning system), but a new survey (Survey 2) is now available. Does the old model still apply?
Functional Transfer: Multitask Learning
Tasks are trained in parallel with a combined architecture: shared input nodes and internal nodes feed separate output nodes (e.g., Left, Straight, Right).
Figure obtained from Brazdil et al., Metalearning: Applications to Data Mining, Chapter 7, Springer, 2009.
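A minimal NumPy sketch of the combined architecture (layer sizes and task names are illustrative): every task head reads the same hidden representation, so during training, gradients from any task would update the shared weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared hidden layer reused by all tasks; one small head per task.
W_shared = rng.normal(size=(4, 8))               # input dim 4 -> hidden dim 8
heads = {t: rng.normal(size=(8,)) for t in ("left", "straight", "right")}

def forward(x, task):
    h = np.tanh(x @ W_shared)       # internal nodes shared across all tasks
    return float(h @ heads[task])   # task-specific output node

x = rng.normal(size=(4,))
outputs = {t: forward(x, t) for t in heads}      # one prediction per task
```

The shared hidden activation `h` is computed once per input regardless of task, which is what lets the tasks transfer structure to one another.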
Knowledge of Parameters
Assume a prior distribution over the model parameters. In the source domain, learn the parameters and adjust the prior distribution; in the target domain, learn the parameters using the source prior distribution.
Assume Parameter Similarity
By Bayes' theorem, P(y|x) = P(x|y) P(y) / P(x).
Task A yields parameters A; Task B yields parameters B ≈ A. Assume a hyper-distribution over parameters with low variance.
Knowledge of Parameters
Find coefficients w_S using SVMs on the source task; then find coefficients w_T using SVMs on the target task, initializing the search with w_S.
Feature Transfer
Learn a representation shared by the source and target domains: minimize the loss function L(y, f(x)) over multiple tasks (multiple regions on Mars), identifying features common to all tasks.
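A toy stand-in for "identify features common to all tasks" (the scores, threshold, and regions below are all hypothetical, and real feature transfer would learn the shared representation by minimizing the multi-task loss): keep only the features that score as informative in every task.

```python
def common_features(task_scores, threshold=0.5):
    """task_scores: one list of per-feature relevance scores per task.
    Return indices of features that clear the threshold in *every* task,
    i.e. a representation shared across the tasks."""
    n_features = len(task_scores[0])
    return [i for i in range(n_features)
            if all(scores[i] > threshold for scores in task_scores)]

# Three hypothetical Mars regions scoring five terrain features:
region_scores = [
    [0.9, 0.2, 0.8, 0.1, 0.7],
    [0.8, 0.9, 0.6, 0.2, 0.9],
    [0.7, 0.1, 0.9, 0.3, 0.6],
]
shared = common_features(region_scores)   # features informative everywhere
```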
Instance Transfer Learning
Instance Transfer: samples from the source domain are filtered and added to the target-domain data, giving the learning system a larger target dataset. An algorithm that does this is TrAdaBoost.
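The core move of TrAdaBoost can be sketched in a few lines (this shows only the reweighting step with made-up weights, not the full published algorithm): source instances that the current hypothesis misclassifies are assumed less relevant to the target and have their weights shrunk, while misclassified target instances gain weight as in ordinary AdaBoost.

```python
def reweight(source_w, source_err, target_w, target_err,
             beta_s=0.6, beta_t=1.8):
    """One boosting round of TrAdaBoost-style reweighting (sketch).
    *_w: instance weights; *_err: True where the hypothesis was wrong.
    beta_s < 1 shrinks bad source instances; beta_t > 1 boosts hard
    target instances (illustrative constants, not the derived values)."""
    new_s = [w * (beta_s if e else 1.0) for w, e in zip(source_w, source_err)]
    new_t = [w * (beta_t if e else 1.0) for w, e in zip(target_w, target_err)]
    return new_s, new_t

s_w, t_w = reweight(
    [1.0, 1.0, 1.0], [True, False, True],   # source: two mistakes -> downweighted
    [1.0, 1.0], [True, False],              # target: one mistake -> upweighted
)
```

Over several rounds, persistently misclassified source instances fade out of the training set, which is the "filter samples" step in the diagram.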
Active Learning
Active learning is part of the field of supervised learning. We have labeled and unlabeled data; the novel idea is that we can choose which examples to label during learning. It is also called "query learning". From the unlabeled data, the learner selects examples to be labeled and added to the labeled data.
Types of Active Learning:
1. Query Synthesis. The learner can request a label for any example in the instance space. This is only appropriate for small, finite domains, since some synthesized examples may have no meaning.
2. Stream-Based Selective Sampling. Instances are drawn one at a time from the input distribution, and the learner decides whether to query or discard each one. For example, one can choose to query only examples that fall in regions of uncertainty.
3. Pool-Based Sampling. Assume a small set of labeled examples and a large pool of unlabeled examples. We evaluate and rank the whole pool of unlabeled examples, then choose one or more examples to label.
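Pool-based sampling with an uncertainty criterion can be sketched as follows (the probability model is a stub — any classifier returning class probabilities would do, and the pool values are made up): rank the pool by how close each predicted positive-class probability is to 0.5, then query the top examples.

```python
import math

def most_uncertain(pool, predict_proba, k=1):
    """Rank the unlabeled pool by uncertainty (distance of the positive-class
    probability from 0.5) and return the k examples to query next."""
    ranked = sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    return ranked[:k]

# Stub "classifier": probability of the positive class for a 1-D input,
# a logistic curve whose decision boundary sits at x = 0.
def predict_proba(x):
    return 1.0 / (1.0 + math.exp(-x))

pool = [-3.0, -0.2, 1.5, 0.05, 4.0]
queries = most_uncertain(pool, predict_proba, k=2)
```

The chosen queries are the points nearest the decision boundary, i.e. exactly where an extra label is most informative.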
Sampling Based on Uncertainty
With the same number of labels, random sampling reaches 70% accuracy while uncertainty sampling reaches 90%. Figure taken from "Active Learning" by Burr Settles, Morgan & Claypool, 2012.
Uncertainty is highest where the predicted class probability is near 0.5, and lowest where it is near 0.0 or 1.0.
Summary
Few labeled examples, labeling is expensive, many unlabeled examples → Semi-Supervised Learning.
Similar classification tasks, but indications that the distributions have changed → Transfer Learning.
Few training examples, labeling is expensive → Active Learning.