Theoretical Analysis of Multi-Instance Learning
Min-Ling Zhang    Zhi-Hua Zhou
[email protected]
National Key Laboratory for Novel Software Technology, Nanjing University
2002.10.11
Outline
Introduction
Theoretical analysis
PAC learning model
PAC learnability of APR
Real-valued multi-instance learning
Future work
Introduction
Origin
Multi-instance learning originated from the problem of "drug activity prediction", and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple instance problem with axis-parallel rectangles" (1997).
Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of "multi-instance learning" to "multi-part learning", and pointed out that many previously studied problems are "multi-part" problems rather than "multi-instance" ones.
Introduction-cont’d
Drug activity prediction problem
Fig.1. The shape of a molecule changes as it rotates its bonds
Comparisons
Fig.2. Classical and multi-instance learning frameworks
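To make the framework difference concrete, the standard multi-instance assumption can be stated in a few lines: a bag is labeled positive iff at least one of its instances satisfies the underlying instance-level concept. The following is a minimal sketch (mine, not from the slides); the threshold concept and the toy data are hypothetical stand-ins:

```python
from typing import Callable, Sequence

def bag_label(instances: Sequence[float], concept: Callable[[float], bool]) -> bool:
    """Standard multi-instance assumption: a bag is positive iff at
    least one instance satisfies the instance-level concept."""
    return any(concept(x) for x in instances)

def is_active(x: float) -> bool:
    """Hypothetical instance-level concept: a 1-D activation threshold."""
    return x > 0.8

print(bag_label([0.1, 0.9, 0.3], is_active))  # True  -> positive bag
print(bag_label([0.2, 0.5], is_active))       # False -> negative bag
```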
Introduction-cont’d
Experiment data

Data set   #dim   #bags   #pos bags   #neg bags   #instances   #instances/bag (max / min / ave)
musk1      166     92      47          45           476         40 / 2 / 5.17
musk2      166    102      39          63          6598         1044 / 1 / 64.69
APR (Axis-Parallel Rectangles) algorithms
GFS elim-count APR (standard)
GFS elim-kde APR (outside-in)
Iterated discrim APR (inside-out)
Best performance (iterated discrim APR): musk1 92.4%, musk2 89.2%
Fig.3. APR algorithms
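For a feel of what these algorithms construct, here is a minimal sketch of the naive "all-positive" APR baseline that Dietterich et al. start from: the bounding box of all instances in positive bags, with a bag predicted positive iff one of its instances falls inside. This is my illustration, not their published code, and the toy data is hypothetical:

```python
import numpy as np

def fit_all_positive_apr(pos_bags):
    """Bounding box of every instance in every positive bag.
    pos_bags: list of (n_i, d) arrays.  Returns (lo, hi) corner vectors."""
    all_pos = np.vstack(pos_bags)
    return all_pos.min(axis=0), all_pos.max(axis=0)

def predict_bag(bag, lo, hi):
    """A bag is positive iff some instance lies inside the APR."""
    inside = np.all((bag >= lo) & (bag <= hi), axis=1)
    return bool(inside.any())

# hypothetical 2-D toy data: two positive bags
pos_bags = [np.array([[1.0, 1.0], [5.0, 2.0]]), np.array([[2.0, 3.0]])]
lo, hi = fit_all_positive_apr(pos_bags)
print(predict_bag(np.array([[0.0, 0.0], [3.0, 2.0]]), lo, hi))  # True
```

The three algorithms on the slide refine this idea by shrinking or growing the rectangle and by selecting discriminating features, rather than taking the raw bounding box.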
Introduction-cont’d
Various algorithms
APR (T. G. Dietterich et al. 1997)
MULTINST (P. Auer 1997)
Diverse Density (O. Maron 1998)
Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
Relic (G. Ruffo 2000)
EM-DD (Q. Zhang & S. A. Goldman 2001)
……
Introduction-cont’d
Comparison on benchmark data sets

Algorithms             Musk1 (%correct)   Musk2 (%correct)
iterated-discrim APR   92.4               89.2
Citation-kNN           92.4               86.3
Diverse Density        88.9               82.5
RELIC                  83.7               87.3
MULTINST               76.7               84.0
BP                     75.0               67.7
C4.5                   68.5               58.8

Fig.4. A comparison of several multi-instance learning algorithms
Introduction-cont’d
Application area
Drug activity prediction (T. G. Dietterich et al. 1997)
Stock prediction (O. Maron 1998)
Learning a simple description of a person from a series of images (O. Maron 1998)
Natural scene classification (O. Maron & A. L. Ratan 1998)
Event prediction (G. M. Weiss & H. Hirsh 1998)
Data mining and computer security (G. Ruffo 2000)
……
Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.
Theoretical analysis
PAC learning model
Definition and its properties
VC dimension
PAC learnability of APR
Real-valued multi-instance learning
Theoretical Analysis-PAC model
Computational learning theory
L. G. Valiant (1984)
"A theory of the learnable"
Inductive learning
Used for constructing a mathematical model of a
cognitive process.
Fig.5. Diagram of a framework for learning: the world W produces an actual example, the coder P turns it into a coded example, and the machine M outputs 0/1
PAC model-cont’d
Definition of PAC learning
We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given
a confidence parameter δ (0 < δ < 1), and
an accuracy parameter ε (0 < ε < 1),
there is a positive integer m_L = m_L(δ, ε) such that
for any target concept t ∈ H, and
for any probability distribution µ on X,
whenever m ≥ m_L, µ^m{ s ∈ S(m, t) | er_µ(L(s), t) < ε } > 1 − δ.
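The definition does not say how large m_L must be. For the special case of a finite hypothesis space and a consistent learner, a standard bound (a known result, though not stated on these slides) gives m_L(δ, ε) = ⌈(1/ε)(ln|H| + ln(1/δ))⌉; a minimal sketch:

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Sample size sufficient for a consistent learner over a finite
    hypothesis space: m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# e.g. |H| = 2^20 hypotheses, 95% confidence, 5% error
print(pac_sample_bound(2**20, epsilon=0.05, delta=0.05))  # 338
```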
PAC model-cont’d
Properties of a pac learning algorithm
It is probable that a useful training sample is presented.
One can only expect that the output hypothesis is
approximately correct.
m_L depends upon δ and ε, but not on t and µ.
If there is a pac learning algorithm for a hypothesis space
H, then we say that H is pac-learnable.
Efficient pac learning algorithm
If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient.
It is usually necessary to require a pac learning algorithm
to be efficient.
PAC model-cont’d
VC dimension
The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
The VC dimension of a hypothesis space H, denoted by VCdim(H), describes the "expressive power" of H in a sense. Generally, the greater VCdim(H) is, the greater the "expressive power" of H, and hence the more difficult H is to learn.
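As a concrete illustration (not from the slides): the axis-parallel rectangles in d-dimensional space, the APR hypothesis space analyzed later, have VC dimension exactly 2d. The brute-force check below verifies this for d = 2, using the fact that a labeling is realizable iff the smallest rectangle enclosing the positive points excludes every negative point:

```python
import itertools

def inside(rect, p):
    """True iff point p lies inside the axis-parallel rectangle rect."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, p))

def shattered_by_rectangles(points):
    """Brute-force check: do axis-parallel rectangles shatter `points`?"""
    dims = len(points[0])
    for labels in itertools.product([False, True], repeat=len(points)):
        pos = [p for p, b in zip(points, labels) if b]
        if not pos:
            continue  # the all-negative labeling is always realizable
        # smallest axis-parallel rectangle containing all positive points
        rect = tuple((min(p[i] for p in pos), max(p[i] for p in pos))
                     for i in range(dims))
        if any(inside(rect, p) for p, b in zip(points, labels) if not b):
            return False
    return True

# Four points in a diamond are shattered by rectangles in the plane ...
print(shattered_by_rectangles([(0, 1), (1, 0), (2, 1), (1, 2)]))          # True
# ... but adding the centre point breaks shattering; VCdim = 2d = 4 here.
print(shattered_by_rectangles([(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]))  # False
```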
PAC model-cont’d
Consistency
If for any target concept t ∈ H and any training sample s = ((x1,b1), (x2,b2), ..., (xm,bm)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(xi) = t(xi) = bi for all i, then we say that L is a consistent algorithm.
VC dimension and pac learnability
If H has finite VC dimension and L is a consistent learning algorithm for H, then H is pac-learnable (and L is a pac learning algorithm for H).
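This existence result has a quantitative counterpart. A sufficient sample size for a consistent learner, as I recall it from Blumer et al. (an assumption on my part; the slides do not state it, and constants differ between presentations), is:

```latex
% Sufficient sample size for a consistent learner over H with VCdim(H) = d
% (Blumer et al. 1989; quoted from memory, constants may vary):
m_L(\delta, \varepsilon) \;=\;
\max\!\left( \frac{4}{\varepsilon}\log_2\frac{2}{\delta},\;
             \frac{8d}{\varepsilon}\log_2\frac{13}{\varepsilon} \right)
```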
Theoretical Analysis-PAC learning of APR
Early work
While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the pac learnability of APR and showed that if
each instance in a bag is drawn from a product distribution, and
all instances in a bag are drawn independently,
then APR is pac learnable under the multi-instance learning framework with sample complexity O((d²n⁶/ε¹⁰) log(nd)) and time complexity O((d⁵n¹²/ε²⁰) log²(nd)).
PAC learning of APR-cont’d
A hardness result
Via the analysis of VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm than Long & Tan's, with sample complexity O((d²n²/ε²) log d) and time complexity O(d³n²/ε²).
More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, which is an NP-complete problem.
PAC learning of APR-cont’d
A further reduction
A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that:
If H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multi-instance examples.
Via a reduction to the "Statistical Query" model (M. Kearns 1993), APR is pac learnable from multi-instance examples with sample complexity O(d²n/ε²) and time complexity O(d³n²/ε²).
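The idea behind the first bullet can be sketched in a few lines (my paraphrase of the reduction, not material from the slides):

```python
import random

def to_noisy_single_instance(bags):
    """Blum & Kalai's reduction idea, sketched: draw one instance per
    bag and copy the bag's label.  Instances drawn from negative bags
    are always labeled correctly; an instance drawn from a positive bag
    may itself be negative, so the positive labels carry 1-sided random
    classification noise.

    bags: list of (instances, bag_label) pairs.
    """
    return [(random.choice(instances), label) for instances, label in bags]
```

Running any learner that tolerates 1-sided classification noise on the converted sample then yields a pac learner from multi-instance examples.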
PAC learning of APR-cont’d
Summary

Algorithm           Sample complexity          Time complexity            Constraints                    Theoretical tools
P. M. Long et al.   O((d²n⁶/ε¹⁰) log(nd))      O((d⁵n¹²/ε²⁰) log²(nd))    product distribution,          p-concept,
                                                                          independent instances          VC dimension
P. Auer et al.      O((d²n²/ε²) log d)         O(d³n²/ε²)                 independent instances          VC dimension
A. Blum et al.      O(d²n/ε²)                  O(d³n²/ε²)                 independent instances          statistical query model,
                                                                          VC dimension

Fig.6. A comparison of the three theoretical algorithms
Theoretical Analysis-Real-valued multi-instance learning
Real-valued multi-instance learning
It is worthwhile to note that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN and Diverse Density algorithms to real-valued multi-instance learning. They also provided a flexible procedure for generating chemically realistic artificial data sets, and studied the performance of the modified algorithms on them.
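To fix notation, one common formalization of real-valued multi-instance labels (assumed here for illustration; the slides do not commit to a particular one) takes the bag's label to be the best response among its instances:

```python
from typing import Callable, Sequence

def bag_affinity(instances: Sequence, f: Callable) -> float:
    """One common real-valued MI assumption: a bag's label is the best
    response among its instances, e.g. the binding affinity of the
    molecule's best conformation under instance-level model f."""
    return max(f(x) for x in instances)
```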
Future work
Further theoretical analysis of multi-instance
learning.
Design multi-instance modifications for neural
networks, decision trees, and other popular
machine learning algorithms.
Explore more issues which can be translated
into multi-instance learning problems.
Design appropriate bag generating methods.
……
Thanks