
Theoretical Analysis of Multi-Instance Learning
张敏灵 (Min-Ling Zhang), 周志华 (Zhi-Hua Zhou)
[email protected]
National Key Laboratory for Novel Software Technology, Nanjing University
2002.10.11
Outline
• Introduction
• Theoretical analysis
  • PAC learning model
  • PAC learnability of APR
  • Real-valued multi-instance learning
• Future work
Introduction
• Origin
  • Multi-instance learning originated from the problem of "drug activity prediction", and was first formalized by T. G. Dietterich et al. in their seminal paper "Solving the multiple-instance problem with axis-parallel rectangles" (1997).
  • Later, in 2001, J.-D. Zucker and Y. Chevaleyre extended the concept of "multi-instance learning" to "multi-part learning", and pointed out that many previously studied problems are "multi-part" problems rather than "multi-instance" ones.
Introduction-cont’d
• Drug activity prediction problem
Fig.1. The shape of a molecule changes as it rotates its bonds
• Comparisons
Fig.2. Classical and multi-instance learning frameworks
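To make the contrast in Fig.2 concrete, here is a minimal Python sketch (not from the original slides; all names are illustrative) of how the two kinds of training data are typically represented:

```python
from typing import List, Tuple

Instance = List[float]                    # one feature vector
ClassicalExample = Tuple[Instance, int]   # classical: (instance, label)
Bag = Tuple[List[Instance], int]          # multi-instance: (instances, bag label)

# Classical supervised learning: every instance carries its own label.
classical_data: List[ClassicalExample] = [
    ([0.2, 0.7], 1),
    ([0.9, 0.1], 0),
]

# Multi-instance learning: one label per bag; a bag is positive if at
# least one of its instances is positive (e.g. one molecule, many
# low-energy conformations).
mi_data: List[Bag] = [
    ([[0.2, 0.7], [0.3, 0.6], [0.8, 0.8]], 1),   # positive bag
    ([[0.9, 0.1], [0.7, 0.2]], 0),               # negative bag
]
```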
Introduction-cont’d
• Experiment data

Data set   #dim   #bags   #pos bags   #neg bags   #instances   #instances/bag (max / min / ave)
musk1      166    92      47          45          476          40 / 2 / 5.17
musk2      166    102     39          63          6598         1044 / 1 / 64.69
• APR (Axis-Parallel Rectangles) algorithms (the basic idea is sketched below)
  • GFS elim-count APR (standard)
  • GFS elim-kde APR (outside-in)
  • Iterated discrim APR (inside-out): musk1: 92.4%, musk2: 89.2%
Fig.3. APR algorithms
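As a rough illustration of the APR idea, the sketch below builds only the naive "all-positive" rectangle, which is the common starting point; it is not Dietterich et al.'s elim-count, elim-kde, or iterated-discrim procedure:

```python
import numpy as np

def fit_all_positive_apr(bags, labels):
    """Naive 'all-positive' APR: the bounding box of every instance in
    every positive bag. The real algorithms grow or shrink this box to
    exclude instances from negative bags; this sketch is only the
    common starting point."""
    pos = np.vstack([np.asarray(b) for b, y in zip(bags, labels) if y == 1])
    return pos.min(axis=0), pos.max(axis=0)

def predict_bag(apr, bag):
    """A bag is labeled positive iff at least one of its instances
    falls inside the axis-parallel rectangle."""
    lower, upper = apr
    bag = np.asarray(bag)
    inside = np.all((bag >= lower) & (bag <= upper), axis=1)
    return int(inside.any())
```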
Introduction-cont’d
• Various algorithms
  • APR (T. G. Dietterich et al. 1997)
  • MULTINST (P. Auer 1997)
  • Diverse Density (O. Maron 1998) (see the sketch after this list)
  • Bayesian-kNN, Citation-kNN (J. Wang et al. 2000)
  • Relic (G. Ruffo 2000)
  • EM-DD (Q. Zhang & S. A. Goldman 2001)
  • ……
Introduction-cont’d
• Comparison on benchmark data sets

Algorithms              Musk1 (%correct)   Musk2 (%correct)
iterated-discrim APR    92.4               89.2
Citation-kNN            92.4               86.3
Diverse Density         88.9               82.5
RELIC                   83.7               87.3
MULTINST                76.7               84.0
BP                      75.0               67.7
C4.5                    68.5               58.8

Fig.4. A comparison of several multi-instance learning algorithms
Introduction-cont’d
• Application areas
  • Drug activity prediction (T. G. Dietterich et al. 1997)
  • Stock prediction (O. Maron 1998)
  • Learning a simple description of a person from a series of images (O. Maron 1998)
  • Natural scene classification (O. Maron & A. L. Ratan 1998)
  • Event prediction (G. M. Weiss & H. Hirsh 1998)
  • Data mining and computer security (G. Ruffo 2000)
  • ……
Multi-instance learning has been regarded as the fourth machine learning framework, parallel to supervised learning, unsupervised learning, and reinforcement learning.
Theoretical analysis
• PAC learning model
  • Definition and its properties
  • VC dimension
• PAC learnability of APR
• Real-valued multi-instance learning
Theoretical Analysis-PAC model
• Computational learning theory
  • L. G. Valiant (1984)
    • "A theory of the learnable"
    • Deductive learning
    • Used for constructing a mathematical model of a cognitive process.
Fig.5. Diagram of a framework for learning: the world W supplies an actual example, the coder P turns it into a coded example, and the learning machine M outputs 0/1
PAC model-cont’d
• Definition of PAC learning
We say that a learning algorithm L is a pac (probably approximately correct) learning algorithm for the hypothesis space H if, given
  • a confidence parameter δ (0 < δ < 1), and
  • an accuracy parameter ε (0 < ε < 1),
there is a positive integer mL = mL(δ, ε) such that
  • for any target concept t ∈ H, and
  • for any probability distribution µ on X,
whenever m ≥ mL, µ^m{ s ∈ S(m,t) | er_µ(L(s), t) < ε } > 1 − δ.
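For intuition about the size of mL(δ, ε), here is the standard finite-hypothesis-space bound for consistent learners (a textbook fact, not from the slides): m ≥ (1/ε) ln(|H|/δ) samples suffice.

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Textbook PAC bound for a finite hypothesis space H and a
    consistent learner: any m >= (1/epsilon) * ln(|H|/delta) samples
    give error below epsilon with probability at least 1 - delta.
    Note the bound depends only on epsilon, delta and |H|, never on
    the target concept t or the distribution mu."""
    return math.ceil(math.log(h_size / delta) / epsilon)

# Example: |H| = 2**20 hypotheses, 5% error, 99% confidence -> 370 samples.
print(pac_sample_bound(2 ** 20, epsilon=0.05, delta=0.01))
```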
PAC model-cont’d
• Properties of a pac learning algorithm
  • It is probable that a useful training sample is presented.
  • One can only expect that the output hypothesis is approximately correct.
  • mL depends upon δ and ε, but not on t and µ.
If there is a pac learning algorithm for a hypothesis space H, then we say that H is pac-learnable.
• Efficient pac learning algorithm
If the running time of a pac learning algorithm L is polynomial in 1/δ and 1/ε, then L is said to be efficient. It is usually necessary to require a pac learning algorithm to be efficient.
PAC model-cont’d
• VC dimension
  • The VC (Vapnik-Chervonenkis) dimension of a hypothesis space H is a notion originally defined by Vapnik and Chervonenkis (1971), and was introduced into computational learning theory by Blumer et al. (1986).
  • The VC dimension of a hypothesis space H, denoted by VCdim(H), describes the ‘expressive power’ of H in a sense. Generally, the greater VCdim(H) is, the greater the ‘expressive power’ of H, and the more difficult H is to learn. (A brute-force shattering check is sketched below.)
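The "shattering" behind VCdim(H) can be checked by brute force on small point sets. The sketch below (our own illustration) uses the fact that a labeling is realizable by an axis-parallel rectangle iff the bounding box of the positive points contains no negative point; the 4-point diamond shows VCdim ≥ 4 for APRs in the plane (it is in fact exactly 2d in d dimensions):

```python
import itertools
import numpy as np

def apr_realizes(points, labels):
    """An axis-parallel rectangle realizes a labeling iff the bounding
    box of the positive points contains no negative point (the
    all-negative labeling is realized by an empty rectangle)."""
    pos = points[labels == 1]
    if len(pos) == 0:
        return True
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    neg = points[labels == 0]
    inside = np.all((neg >= lo) & (neg <= hi), axis=1)
    return not inside.any()

def shattered(points):
    """True if APRs realize all 2^n labelings of the point set."""
    n = len(points)
    return all(
        apr_realizes(points, np.array(lab))
        for lab in itertools.product([0, 1], repeat=n)
    )

# The 4-point 'diamond' is shattered by rectangles in the plane.
diamond = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])
print(shattered(diamond))   # True
```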
PAC model-cont’d
• Consistency
If for any target concept t ∈ H and any training sample s = ((x1,b1), (x2,b2), ..., (xm,bm)) for t, the corresponding hypothesis L(s) ∈ H agrees with s, i.e. L(s)(xi) = t(xi) = bi, then we say that L is a consistent algorithm.
• VC dimension and pac learnability
If H has finite VC dimension and L is a consistent learning algorithm for H, then H is pac-learnable.
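A standard quantitative form of this implication, due to Blumer et al. (the exact constants vary by source; stated here only for reference):

```latex
\text{If } \mathrm{VCdim}(H) = d < \infty \text{ and } L \text{ is consistent,
then } L \text{ is a pac learning algorithm for } H \text{ with}
\quad
m_L(\delta, \varepsilon)
  = O\!\left(\frac{1}{\varepsilon}
      \left(d \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right).
```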
Theoretical Analysis-PAC learning of APR
• Early work
While T. G. Dietterich et al. proposed three APR algorithms for multi-instance learning, P. M. Long & L. Tan (1997) gave a theoretical analysis of the pac learnability of APR and showed that if
  • each instance in a bag is drawn from a product distribution, and
  • all instances in a bag are drawn independently,
then APR is pac learnable under the multi-instance learning framework with sample complexity O((d^2 n^6 / ε^10) log(nd/δ)) and time complexity O((d^5 n^12 / ε^20) log(nd/δ)), where d is the dimension and n is the number of instances per bag.
PAC learning of APR-cont’d
• A hardness result
Via an analysis of VC dimension, P. Auer et al. (1998) gave a much more efficient pac learning algorithm than Long & Tan's, with sample complexity O((d^2 n^2 / ε^2) log(d/δ)) and time complexity O(d^3 n^2 / ε^2).
More importantly, they proved that if the instances in a bag are not independent, then learning APR under the multi-instance learning framework is as hard as learning DNF formulas, which is an NP-complete problem.
PAC learning of APR-cont’d
• A further reduction
A. Blum & A. Kalai (1998) further studied the problem of pac learning APR from multi-instance examples, and proved that
  • if H is pac learnable from 1-sided (or 2-sided) random classification noise, then H is pac learnable from multi-instance examples;
  • via a reduction to the "Statistical Query" model (M. Kearns 1993), APR is pac learnable from multi-instance examples with sample complexity O(d^2 n / ε^2) and time complexity O(d^3 n^2 / ε^2).
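The intuition behind this reduction can be sketched as follows (our illustration, assuming instances within a bag are drawn independently): label each drawn instance with its bag's label, so negatives are always correct while positives may be wrong, which is exactly 1-sided random classification noise.

```python
import random

def bags_to_noisy_examples(bags, labels):
    """Sketch of the idea behind Blum & Kalai's reduction: draw one
    instance per bag and give it the bag's label. Instances from
    negative bags are always labeled correctly; an instance drawn
    from a positive bag may lie outside the target APR yet be labeled
    1, i.e. the positive labels carry 1-sided random classification
    noise. A noise-tolerant (e.g. statistical-query) learner can then
    be run on the resulting single-instance sample."""
    return [(random.choice(bag), y) for bag, y in zip(bags, labels)]
```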
PAC learning of APR-cont’d
• Summary

Algorithm           Sample complexity                Time complexity                  Constraints                                   Theoretical tools
P. M. Long et al.   O((d^2 n^6 / ε^10) log(nd/δ))    O((d^5 n^12 / ε^20) log(nd/δ))   product distribution, independent instances   p-concept, VC dimension
P. Auer et al.      O((d^2 n^2 / ε^2) log(d/δ))      O(d^3 n^2 / ε^2)                 independent instances                         VC dimension
A. Blum et al.      O(d^2 n / ε^2)                   O(d^3 n^2 / ε^2)                 independent instances                         statistical query model, VC dimension

Fig.6. A comparison of the three theoretical algorithms
Theoretical Analysis-Real-valued multi-instance learning
• Real-valued multi-instance learning
  • It is worthwhile to note that in several applications of the multiple-instance problem, the actual predictions desired are real-valued. For example, the binding affinity between a molecule and a receptor is quantitative, so a real-valued label of binding strength is preferable.
  • S. Ray & D. Page (2001) showed that the problem of multi-instance regression is NP-complete; furthermore, D. R. Dooly et al. (2001) showed that learning from real-valued multi-instance examples is as hard as learning DNF.
  • At nearly the same time, R. A. Amar et al. (2001) extended the kNN, Citation-kNN, and Diverse Density algorithms for real-valued multi-instance learning (a bag-level kNN sketch follows below); they also provided a flexible procedure for generating chemically realistic artificial data sets and studied the performance of these modified algorithms on them.
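To make the bag-level kNN extension concrete, here is a minimal sketch (our illustration; Amar et al.'s actual algorithms include the Citation-kNN "citers" mechanism and Diverse Density variants, omitted here) that regresses a bag's real-valued label from its k nearest training bags under the minimal Hausdorff distance of Wang & Zucker:

```python
import numpy as np

def minimal_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance between two bags: the distance
    between their closest pair of instances."""
    a, b = np.asarray(bag_a), np.asarray(bag_b)
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return pairwise.min()

def knn_regress_bag(query_bag, train_bags, train_labels, k=3):
    """Predict a real-valued bag label as the average label of the
    k nearest training bags under the minimal Hausdorff distance."""
    dists = [minimal_hausdorff(query_bag, b) for b in train_bags]
    nearest = np.argsort(dists)[:k]
    return float(np.mean([train_labels[i] for i in nearest]))
```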
Future work
• Further theoretical analysis of multi-instance learning.
• Design multi-instance modifications for neural networks, decision trees, and other popular machine learning algorithms.
• Explore more issues which can be translated into multi-instance learning problems.
• Design appropriate bag generating methods.
• ……
Thanks