mmis-v2 - Fordham University Computer and Information

Download Report

Transcript mmis-v2 - Fordham University Computer and Information

A Combinatorial Fusion
Method for Feature Mining
Ye Tian, Gary Weiss, D. Frank Hsu, Qiang Ma
Fordham University
Presented by Gary Weiss
Introduction
• Feature construction/engineering often a critical
step in the data mining process
– Can be very time-consuming and may require a lot of
manual effort
• Our approach is to use a combinatorial method to
automatically construct new features
– We refer to this as “feature fusion”
– Geared toward helping to predict rare classes
– For now it is restricted to numerical features, but can be
extended to other features
2
How does this relate to MMIS?
• One MMIS category is local pattern analysis
– How to efficiently identify quality knowledge
from a single data source
– Listed data preparation and selection as
subtopics and also mentioned fusion
• We acknowledge that this work probably is
not what most people think of as MMIS
3
How can we view this work as MMIS?
• Think of each feature as piece of information
– Our fusion approach integrates these pieces
• Fusion itself is a proper topic for MMIS since it can
also be used with multiple info sources
– The fusion method we employ does not really care if the
information (i.e., features) are from a single source
• As complexity of constructed features increases,
each can be viewed as a classifier
– Each fused feature is an information source
– This view is bolstered by other work on data fusion that
using ensembles to combine each fused feature
4
Description of the Method
1. A data set is a collection of records where each
feature has a score
–
We assume numerical features
2. We then replace scores by ranks
–
Ordering of ranks determined by whether larger or
small scores better predict class
3. Compute performance of each feature
4. Compute performance of feature combinations
5. Decide which combinations to evaluate/use
5
Step 1: A data set
A
B
C
F1
1
3
5
F2
4
3
5
F3
3
5
2
F4
2
5
6
F5
8
4
7
Class
1
0
1
D
E
F
7
11
15
6
13
16
15
16
4
3
7
13
2
14
11
0
0
0
G
H
9
17
7
15
14
9
1
8
18
3
1
0
6
Step 2: Scores replaced by Ranks
F1
F2
F3
F4
F5
A
1
2
2
2
5
B
2
1
4
4
3
C
3
3
1
5
4
D
4
4
7
3
1
E
6
6
8
6
7
F
7
8
3
8
6
G
5
8
5
6
1
8
7
5
7
2
H
7
Step 3: Compute Feature Performance
F2 Rank
Class
B
1
0
A
2
1
C
3
1
D
4
0
G
5
1
E
6
0
H
7
0
F
8
0
• Performance measures how
well feature predicts minority
class
• We sort rows by feature rank
and measure performance on
top n%, where n% belong to
minority class
• In this case we evaluate on top
3 rows. Since 2 of 3 are minority
(class=1), performance = .66
8
Step 3 continued
Feature
F1
Performance
0.67
F2
0.67
F3
0.67
F4
0.67
F5
0.00
9
Step 4: Compute Performance of
Feature Combinations
• Let F6 be fused F1F2F3F4F5
F6
Rank
F6
Rank
• Rank combination function is
average of ranks
A
B
2.4
2.8
E
F
6.6
6.4
• Compute rank of F6 for each
record
C
3.2
G
5.0
• Compute performance of F6
as in step 3
D
3.8
H
5.8
10
Step 5: What Combinations to Use?
• Given n features there are 2n – 1 possible
combinations
– C(n,1) + C(n,2) … C(n.n)
– This “fully exhaustive” fusion strategy is practical for
many values of n
• We try other strategies in case not feasible
– k-exhaustive strategy selects k best features and tries
all combinations
– k-fusion strategy uses all n features but fuses at most
k features at once
11
Combinatorial Fusion Table
k-fusion for values of k shown below
Number
Features
1
1
1
2
2
3
3
3
6
7
4
4
10
14
15
5
5
15
25
30
31
6
6
21
41
56
62
7
7
28
63
98
119 126 127
8
8
36
92
162 218 246 254
255
9
9
45
129
255 381 465 501
510
10
10 55
175
385 637 847 967 1012 1022
2
3
4
5
6
7
8
9
10
63
511
1023
12
Combinatorial Fusion Algorithm
• Combinatorial strategy generates features
– Performance metric determines which are best
• Used to determine which k features for k-fusion
• Also used to determine order of features to add
• We add a feature if it leads to a statistically
significant improvement (p ≤ .10)
– As measured on validation data
– This limits the number of features
– But requires a lot of computation
13
Example Run of Algorithm
AUC
Feature
p-value
valid
test
(+ means added)
0.670 0.682
--
{F1,F2,F3,F4,F5}
0.766 0.757
0.001
+F3F4
0.731
0.771 0.774
F1F2
0.063
+F1F3
14
Description of Experiments
• We use Weka’s DT, 1-NN, and Naïve
Bayes methods
• Analyze performance on 10 data sets
– With and without fused features
– Focus on AUC as the main metric
• More appropriate than accuracy especially with
skewed data
• Use 3 combinatorial fusion strategies
– 2-fusion, 3-fusion, and 6-exhaustive
15
Results
Summary Results over all 10 Data Sets
Bayes
DT
1-NN
Strategy
AUC
W-L-D
AUC
W-L-D
AUC
W-L-D
2-fusion
0.009
5-1-4
0.105
7-0-3
0.016
5-4-1
3-fusion
0.009
6-4-0
0.103
8-1-1
0.023
5-4-1
6-exhaustive 0.003
3-6-1
0.115
9-0-1
0.027
6-1-3
Results over 4 Most Skewed Data Sets (< 10% Minority)
Bayes
DT
1-NN
Strategy
AUC
W-L-D
AUC
W-L-D
AUC
W-L-D
2-fusion
0.004
1-1-2
0.195
4-0-0
0.065
4-0-0
3-fusion
0.012
2-2-0
0.185
3-1-0
0.059
3-0-1
6-exhaustive 0.006
1-3-0
0.194
4-0-0
0.062
4-0-0
16
Discussion of Results
• No one of the 3 fusion schemes is clearly best
• The methods seem to help, but the biggest
improvement is clearly with the DT method
– May be explained by traditional DT methods having
limited expressive power
• They can only consider 1 feature at a time
• Can never perfectly learn simple concepts like F1+F2 > 10,
but can with feature-fusion
– Bigger improvement for highly skewed data sets
• Identifying rare cases is difficult and may require looking at
many features in parallel
17
Future Work
• More comprehensive experiments
– More data sets, more skewed data sets, more
combinatorial fusion strategies
• Use of heuristics to more intelligently choose
fused features
– Performance measure now used only to order
– Use of diversity measures
– Avoid building classifier to determine which fused
features to add
• Handle non-numerical features
18
Conclusion
• Showed how a method from information
fusion can be applied to feature
construction
• Results encouraging but more study
needed
• Extending the method should lead to further
improvements
19
Questions?
20
Detailed Results: Accuracy
Dataset
Bayes
w/o
w
Decision Trees
w
1-NN
Diff
w/o
Diff
w/o
w
Diff
bio
98.8 98.8
0.0
99.4 99.4 0.0
99.2
99.2
0.0
letter-a
98.4 98.4
0.0
98.6 98.6 0.0
98.9
98.9
0.0
income
92.0 92.0
0.0
94.5 94.5 0.0
92.4
92.4
0.0
stock
80.4 80.4
0.0
90.3 90.3 0.0
86.3
86.3
0.0
hepatitis 84.0 84.0
0.0
86.2 80.7 -5.6 89.3
89.3
0.0
68.9 75.3
6.5
75.1 75.2 0.1
62.6
75.0 12.4
german 73.2 73.2
0.0
69.5 73.0 3.5
68.1
71.6
crx
70.1 70.1
0.0
60.3 75.1 14.9 60.4
bands
67.0 67.0
0.0
61.4 61.4 0.0
65.3
65.3
0.0
boa1
55.0 57.0
2.0
51.0 56.9 6.0
52.6
57.5
5.0
physics
3.5
73.6 13.2
21
Dataset Strat.
2-F
3-F
bio
6-EX
2-F
letter-a 3-F
6-EX
2-F
income 3-F
6-EX
2-F
3-F
stock
6-EX
2-F
hepatitis 3-F
6-EX
2-F
physics 3-F
6-EX
2-F
german 3-F
6-EX
2-F
3-F
crx
6-EX
2-F
3-F
bands
6-EX
2-F
3-F
boa1
6-EX
w/o
.943
.963
.901
.725
.864
.498
.740
.762
.750
.571
Bayes
w
Diff
.923 -.020
.954
.010
.926 -.017
.963
.000
.960 -.003
.962 -.001
.901
.000
.897 -.004
.900 -.001
.762
.037
.767
.043
.769
.044
.869
.005
.868
.004
.864
.000
.498
.000
.506
.008
.506
.008
.751
.011
.723 -.017
.736 -.004
.762
.000
.779
.017
.755 -.007
.779
.029
.747 -.003
.744 -.006
.596
.024
.602
.031
.589
.018
Decision Trees
w/o w
Diff
.752 .256
.496 .742 .247
.759 .264
.943 .021
.922 .919 -.003
.937 .014
.736 .241
.494 .734 .239
.739 .245
.755 .260
.496 .751 .255
.747 .252
.755 .000
.755 .759 .004
.760 .005
.499 .000
.499 .499 .000
.499 .000
.609 .118
.492 .606 .115
.654 .162
.646 .000
.646 .670 .024
.722 .076
.611 .108
.504 .603 .099
.580 .076
.538 .041
.497 .548 .050
.553 .056
w/o
.499
.937
.593
.524
.819
.504
.609
.639
.655
.515
1-NN
w
Diff
.663 .164
.651 .152
.663 .164
.961 .024
.937 .000
.961 .024
.612 .020
.621 .028
.612 .020
.575 .051
.578 .054
.564 .040
.803 -.016
.826 .007
.821 .002
.504 .000
.495 -.008
.504 .000
.607 -.001
.593 -.015
.609 .000
.653 .014
.673 .034
.667 .028
.559 -.096
.644 -.012
.655 .000
.509 -.005
.509 -.006
.509 -.005
Detailed
Results:
AUC
22