Feature Selection
Jamshid Shanbehzadeh, Samaneh Yazdani
Department of Computer Engineering, Faculty of Engineering,
Khorazmi University (Tarbiat Moallem University of Tehran)
1
Outline
Dimension Reduction
Feature Selection
Application of Feature Selection and Software
2
Outline
Part 1: Dimension Reduction
 Dimension
 Feature Space
 Definition & Goals
 Curse of dimensionality
 Research and Application
 Grouping of dimension reduction methods
Part 2: Feature Selection
 Parts of feature set
 Feature Selection Approach
Part 3: Application of Feature Selection and Software
3
Part 1:
Dimension Reduction
4
Dimension Reduction
Dimension
Dimension (Feature or Variable):
A measurement of a certain aspect of an object
Two features of a person:
• weight
• height
5
Dimension Reduction
Feature Space
Feature Space:
An abstract space where each pattern sample is represented as a point
6
Dimension Reduction
Introduction
Large and high-dimensional data: web documents, etc.
A large amount of resources is needed in:
 Information retrieval
 Classification tasks
 Data preservation, etc.
Dimension Reduction
Dimension Reduction
Definition & Goals
Dimensionality reduction:
The study of methods for reducing the number of dimensions describing the object
General objectives of dimensionality reduction:
Reduce the computational cost
Improve the quality of data for efficient data-intensive processing tasks
8
Dimension Reduction
Definition & Goals
[Figure: scatter plot of Height (cm) vs. Weight (kg), showing Class 1 (overweight) and Class 2 (underweight) samples]
Dimension reduction:
preserves information on the classification of overweight and underweight as much as possible,
makes classification easier, and
reduces data size (2 features → 1 feature).
Dimension Reduction
Curse of dimensionality
As the number of dimensions increases, a fixed data sample becomes exponentially sparse.
Example:
Observe that the data become more and more sparse in higher dimensions.
An effective solution to the problem of the "curse of dimensionality" is:
Dimensionality reduction
10
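A minimal numeric illustration of this sparsity (a sketch added here for concreteness, not part of the original slides): with a fixed number of points drawn uniformly from the unit cube, the average distance to the nearest neighbour grows quickly with the number of dimensions.

import numpy as np

# Sketch: fixed sample size, growing dimension -> the same data covers the space
# more and more sparsely (nearest neighbours drift apart).
rng = np.random.default_rng(0)
n = 300
for d in (1, 2, 5, 10, 20, 50):
    X = rng.random((n, d))                      # n points uniform in [0, 1]^d
    diff = X[:, None, :] - X[None, :, :]        # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # ignore distance to itself
    print(f"d={d:3d}  mean nearest-neighbour distance = {dist.min(axis=1).mean():.3f}")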
Dimension Reduction
Research and Application
Why has dimension reduction been a subject of much research recently?
Massive data of large dimensionality in:
Knowledge discovery
Text mining
Web mining
and . . .
11
Dimension Reduction
Grouping of dimension reduction methods
Dimensionality reduction approaches include
Feature Selection
Feature Extraction
12
Dimension Reduction
Grouping of dimension reduction methods : Feature Selection
Dimensionality reduction approaches include
Feature Selection: the problem of choosing a small subset of features that ideally are
necessary and sufficient to describe the target concept.
Example
 Feature set = {X, Y}
 Two classes
Goal: Classification
 Feature X or Feature Y?
 Answer: Feature X
13
Dimension Reduction
Grouping of dimension reduction methods : Feature Selection
Feature Selection (FS)
Selects features, e.g. preserves weight
14
Dimension Reduction
Grouping of dimension reduction methods
Dimensionality reduction approaches include
Feature Extraction: creates new features based on transformations or combinations of
the original feature set.
New feature ← original features {X1, X2}
15
Dimension Reduction
Grouping of dimension reduction methods
Feature Extraction (FE)
Generates features, e.g. preserves weight / height
16
Dimension Reduction
Grouping of dimension reduction methods
Dimensionality reduction approaches include
Feature Extraction: creates new features based on transformations or combinations of
the original feature set.
 N: Number of original features
 M: Number of extracted features
 M<N
17
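For concreteness, a small sketch (not from the slides) of feature extraction with principal component analysis: M = 2 new features are built as linear combinations of the N = 5 original ones.

import numpy as np

# Sketch of feature extraction: PCA via SVD builds M < N new features,
# each a linear combination of all original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 samples, N = 5 original features
Xc = X - X.mean(axis=0)                  # centre the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
M = 2
Z = Xc @ Vt[:M].T                        # M = 2 extracted features (principal components)
print(Z.shape)                           # (100, 2): each new feature mixes all originals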
Dimension Reduction
Question: Feature Selection Or Feature Extraction
Feature selection or feature extraction? It depends on the problem. Examples:
 Pattern recognition: the problem of dimensionality reduction is to extract a small set of
features that recovers most of the variability of the data.
 Text mining: the problem is defined as selecting a small subset of words or terms
(not new features that are combinations of words or terms).
 Image compression: the problem is finding the best extracted features to describe the image.
18
Part 2:
Feature selection
19
Feature selection
 Thousands to millions of low-level features: select the most relevant ones to build
better, faster, and easier-to-understand learning machines.
[Figure: data matrix X, selecting m of the n original features]
20
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Three disjoint categories of features:
 Irrelevant
 Weakly Relevant
 Strongly Relevant
21
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Two classes: {Lion, Deer}
We use some features to classify a new instance.
To which class does this animal belong?
22
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Two classes: {Lion, Deer}
We use some features to classify a new instance.
 Feature 1: Number of legs
Q: Number of legs?
A: 4
So, number of legs is an irrelevant feature.
23
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Two classes: {Lion, Deer}
We use some features to classify a new instance.
 Feature 1: Number of legs
 Feature 2: Color
Q: What is its color?
A: (shown in the image)
So, Color is an irrelevant feature.
24
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Two classes: {Lion, Deer}
We use some features to classify a new instance.
 Feature 1: Number of legs
 Feature 2: Color
 Feature 3: Type of food
Q: What does it eat?
A: Grass
So, Feature 3 (type of food) is a relevant feature.
25
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Three classes: {Lion, Deer, Leopard}
We use some features to classify a new instance.
To which class does this animal belong?
26
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Three classes: {Lion, Deer, Leopard}
We use some features to classify a new instance.
 Feature 1: Number of legs
Q: Number of legs?
A: 4
So, number of legs is an irrelevant feature.
27
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Three classes: {Lion, Deer, Leopard}
We use some features to classify a new instance.
 Feature 1: Number of legs
 Feature 2: Color
Q: What is its color?
A: (shown in the image)
So, Color is a relevant feature.
28
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Three classes: {Lion, Deer, Leopard}
We use some features to classify a new instance.
 Feature 1: Number of legs
 Feature 2: Color
 Feature 3: Type of food
Q: What does it eat?
A: Meat
So, Feature 3 (type of food) is a relevant feature.
29
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Goal: Classification
Three classes: {Lion, Deer, Leopard}
We use some features to classify a new instance.
 Feature 1: Number of legs
 Feature 2: Color
 Feature 3: Type of food
 Add a new feature: Felidae (member of the cat family)
Felidae is a weakly relevant feature.
 Optimal set: {Color, Type of food} or {Color, Felidae}
30
Feature selection
Parts of feature set
 Irrelevant OR Relevant
 Traditionally, feature selection research has focused on searching for relevant features.
[Figure: the feature set partitioned into irrelevant and relevant features]
31
Feature selection
Parts of feature set
 Irrelevant OR Relevant: An Example for the Problem
Data set (five Boolean features):

F1 F2 F3 F4 F5 | C
 0  0  1  0  1 | 0
 0  1  0  0  1 | 1
 1  0  1  0  1 | 1
 …  (further rows follow the same relations)

C = F1 ∨ F2
F3 = ¬F2, F5 = ¬F4
Optimal subset:
{F1, F2} or {F1, F3}
32
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Formal Definition 1 (Irrelevance) :
 Irrelevance indicates that the feature is not necessary at all.
 In the previous example: F4 and F5 are irrelevant.
33
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Definition 1 (Irrelevance): Let F be the full set of features, Fi a feature, and Si = F − {Fi}.
A feature Fi is irrelevant iff
∀ S'i ⊆ Si :  P(C | Fi, S'i) = P(C | S'i)
Irrelevance indicates that the feature is not necessary at all.
34
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Categories of relevant features:
 Strongly Relevant
 Weakly Relevant
[Figure: the relevant part of the feature set is divided into weakly relevant and strongly relevant features]
35
Feature selection
Parts of feature set
 Irrelevant OR Relevant: An Example for the Problem
Data set (five Boolean features):

F1 F2 F3 F4 F5 | C
 0  0  1  0  1 | 0
 0  1  0  0  1 | 1
 1  0  1  0  1 | 1
 …  (further rows follow the same relations)

C = F1 ∨ F2
F3 = ¬F2, F5 = ¬F4
36
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Formal Definition 2 (Strong relevance):
 Strong relevance of a feature indicates that the feature is always necessary for an
optimal subset.
 It cannot be removed without affecting the original conditional class distribution.
 In the previous example: feature F1 is strongly relevant.
37
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Definition 2 (Strong relevance): A feature Fi is strongly relevant iff
P(C | Fi, Si) ≠ P(C | Si)
A strongly relevant feature cannot be removed without affecting the original conditional
class distribution.
38
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Formal Definition 3 (Weak relevance):
 Weak relevance suggests that the feature is not always necessary but may become
necessary for an optimal subset under certain conditions.
 In the previous example: F2 and F3 are weakly relevant.
[Figure: F4 and F5 irrelevant; F2 and F3 weakly relevant; F1 strongly relevant]
39
Feature selection
Parts of feature set
 Irrelevant OR Relevant
Definition 3 (Weak relevance): A feature Fi is weakly relevant iff
P(C | Fi, Si) = P(C | Si)  and  ∃ S'i ⊂ Si such that P(C | Fi, S'i) ≠ P(C | S'i)
Weak relevance suggests that the feature is not always necessary but may become
necessary for an optimal subset under certain conditions.
40
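The three definitions can be checked mechanically on the Boolean example. The sketch below (not from the slides) assumes the data set contains each of the eight settings of the free variables F1, F2, F4 exactly once; it then labels every feature by comparing conditional class distributions as in Definitions 1-3.

from itertools import combinations, product

# Truth table consistent with the slide's relations: F3 = not F2, F5 = not F4,
# C = F1 or F2 (assumption: all 8 settings of F1, F2, F4 occur once each).
rows = [{"F1": f1, "F2": f2, "F3": 1 - f2, "F4": f4, "F5": 1 - f4, "C": f1 | f2}
        for f1, f2, f4 in product([0, 1], repeat=3)]
features = ["F1", "F2", "F3", "F4", "F5"]

def class_dist(conditions):
    """P(C = 1 | conditions), estimated from the rows matching all (feature, value) pairs."""
    match = [r["C"] for r in rows if all(r[f] == v for f, v in conditions)]
    return sum(match) / len(match) if match else None

def changes_distribution(fi, subset):
    """True if additionally conditioning on Fi changes P(C | subset) for some assignment."""
    for values in product([0, 1], repeat=len(subset)):
        base = class_dist(list(zip(subset, values)))
        for v in (0, 1):
            cond = class_dist(list(zip(subset, values)) + [(fi, v)])
            if base is not None and cond is not None and cond != base:
                return True
    return False

for fi in features:
    rest = [f for f in features if f != fi]
    if changes_distribution(fi, rest):                       # Definition 2
        label = "strongly relevant"
    elif any(changes_distribution(fi, list(s))               # Definition 3
             for k in range(len(rest)) for s in combinations(rest, k)):
        label = "weakly relevant"
    else:                                                    # Definition 1
        label = "irrelevant"
    print(fi, "->", label)   # expected: F1 strong; F2, F3 weak; F4, F5 irrelevant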
Feature selection
Parts of feature set
 Optimal Feature Subset
 Example:
In order to determine the target concept (C=g(F1, F2)):
 F1 is indispensable
 One of F2 and F3 can be disposed of
 Both F4 and F5 can be discarded.
Optimal subset: either {F1, F2} or {F1, F3}
 The goal of feature selection is to find either of them.
41
Feature selection
Parts of feature set
 Optimal Feature Subset
Optimal subset: either {F1, F2} or {F1, F3}
Conclusion
An optimal subset should include all strongly relevant features, none of the irrelevant
features, and a subset of the weakly relevant features.
Open question: which of the weakly relevant features should be selected and which of
them should be removed?
42
Feature selection
Parts of feature set
 Redundancy
Solution
 Defining Feature Redundancy
43
Feature selection
Parts of feature set
 Redundancy
Redundancy
It is widely accepted that two features are redundant to each other if their values
are completely correlated.
 In the previous example: F2 and F3 (F3 = ¬F2, so their values are completely correlated).
44
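A tiny check of the "completely correlated" notion on toy data (a sketch, not from the slides): when F3 = ¬F2, |corr(F2, F3)| = 1 and one of the two can be dropped.

import numpy as np

# F3 is the negation of F2 -> completely (negatively) correlated -> redundant pair.
F2 = np.array([0, 1, 0, 1, 1, 0])
F3 = 1 - F2
F4 = np.array([0, 0, 1, 1, 0, 1])                 # an unrelated feature, for contrast
corr = np.corrcoef(np.vstack([F2, F3, F4]))
print(abs(corr[0, 1]))    # ~1.0 -> F2 and F3 are redundant to each other
print(abs(corr[0, 2]))    # < 1  -> F2 and F4 are not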
Feature selection
Parts of feature set
 Redundancy
Markov blanket
It is used when one feature is correlated with a set of features.
Given a feature Fi, let Mi ⊆ F (Fi ∉ Mi). Mi is said to be a Markov blanket for Fi if
P(F − Mi − {Fi}, C | Fi, Mi) = P(F − Mi − {Fi}, C | Mi)
The Markov blanket condition requires that Mi subsume not only the information that Fi
has about C, but also about all of the other features.
45
Feature selection
Parts of feature set
 Redundancy
 The redundancy definition further divides weakly relevant features into redundant and
non-redundant ones.
[Figure: the feature set partitioned into irrelevant, weakly relevant (regions II and III), and strongly relevant features]
II: Weakly relevant and redundant features
III: Weakly relevant but non-redundant features
Optimal subset: strongly relevant features + weakly relevant but non-redundant features
46
Feature selection
Approaches
Feature selection approaches:
 Subset evaluation
 Individual evaluation
 New framework
47
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Framework of feature selection via subset evaluation
48
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Subset Generation
[Flowchart: (1) Original feature set → (2) Subset generation → Evaluation (goodness of the subset) → (3) Stopping criterion (No: back to subset generation; Yes:) → (4) Validation]
Subset generation produces a subset of features for evaluation; evaluation measures the
goodness of the subset.
Subset generation can start with:
• no features
• all features
• a random subset of features
49
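The loop in the flowchart can be written down generically. The sketch below is a sequential forward-selection skeleton (an illustration of this write-up, not a specific algorithm from the slides); evaluate stands for any goodness measure, filter score or wrapper accuracy, supplied by the caller.

def forward_selection(features, evaluate, max_features):
    """Generic subset-evaluation loop: generation -> evaluation -> stopping criterion."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_features:                  # stopping criterion (size)
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # subset generation: extend the current subset by each remaining feature
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        score, f = max(scored)                           # evaluation: goodness of the subset
        if score <= best_score:                          # stop when no better subset appears
            break
        selected, best_score = selected + [f], score
    return selected, best_score                          # validation happens afterwards

# toy usage: the score rewards subsets containing the "useful" features {0, 2}
useful = {0, 2}
subset, score = forward_selection(range(5), lambda s: len(useful & set(s)) - 0.01 * len(s), 3)
print(subset, score)   # picks the two useful features, e.g. [2, 0] with score 1.98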
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Subset search method - Exhaustive Search Example
Examine all combinations of feature subsets.
Example:
{f1,f2,f3} => { {f1},{f2},{f3},{f1,f2},{f1,f3},{f2,f3},{f1,f2,f3} }
Order of the search space: O(2^d), where d is the number of features.
The optimal subset is achievable.
Too expensive if the feature space is large.
50
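A short sketch of what exhaustive generation means in practice (not from the slides): every non-empty subset of the d features is enumerated, hence 2^d − 1 candidates.

from itertools import combinations

features = ["f1", "f2", "f3"]
subsets = [set(c) for r in range(1, len(features) + 1)
           for c in combinations(features, r)]
print(len(subsets))   # 7 = 2^3 - 1 non-empty subsets
print(subsets)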
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Subset Evaluation
[Flowchart: (1) Original feature set → (2) Subset generation → Evaluation → (3) Stopping criterion → (4) Validation]
Evaluation measures the goodness of the subset and compares it with the previous best
subset; if the new subset is better, it replaces the previous best subset.
51
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Subset Evaluation
Each feature and feature subset needs to be evaluated for importance by a criterion.
The existing feature selection algorithms, based on the criterion functions used in
searching for informative features, can be generally categorized as:
 Filter model
 Wrapper model
 Embedded methods
Note: Different criteria may select different features.
52
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Filter
The filter approach utilizes the data alone to decide which features should be kept,
without running the learning algorithm.
The filter approach basically pre-selects the features, and then applies the selected
feature subset to the clustering algorithm.
Evaluation function ≠ classifier
The effect of the selected subset on the performance of the classifier is ignored.
53
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Filter (1)- Independent Criterion
 Some popular independent criteria are
 Distance measures (Euclidean distance measure).
 Information measures (Entropy, Information gain, etc.)
 Dependency measures (Correlation coefficient)
 Consistency measures
54
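As an example of one of the information measures above, the sketch below (not from the slides) computes the information gain IG(C; F) = H(C) − H(C | F) of a discrete feature.

import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, labels):
    """IG(C; F) = H(C) - H(C | F): higher means the feature is more informative."""
    h_cond = sum((feature == v).mean() * entropy(labels[feature == v])
                 for v in np.unique(feature))
    return entropy(labels) - h_cond

# toy data: F1 determines the class, F2 is noise
C  = np.array([0, 0, 1, 1, 1, 1])
F1 = np.array([0, 0, 1, 1, 1, 1])
F2 = np.array([0, 1, 0, 1, 0, 1])
print(information_gain(F1, C))   # = H(C), the maximum possible
print(information_gain(F2, C))   # ~0, F2 tells us nothing about C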
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Wrappers
In wrapper methods, the performance of a learning algorithm is used to evaluate the
goodness of selected feature subsets.
 Evaluation function = classifier
 Takes the classifier into account.
55
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Wrappers (2)
 Wrappers utilize a learning machine as a “black box” to score subsets of features
according to their predictive power.
56
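A minimal wrapper-style scorer, assuming scikit-learn is available (the dataset and classifier here are arbitrary choices for illustration): the learning machine is treated as a black box and the goodness of a subset is its cross-validated accuracy.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

def wrapper_score(feature_idx):
    """Goodness of a feature subset = cross-validated accuracy of a classifier using it."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, list(feature_idx)], y, cv=5).mean()

print(wrapper_score([0]))         # a single feature
print(wrapper_score([0, 2, 3]))   # usually higher: a more informative subset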
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Filters vs. Wrappers
Filters
 Advantages
 Fast execution: Filters generally involve a non-iterative computation on the
dataset, which can execute much faster than a classifier training session
 Generality: Since filters evaluate the intrinsic properties of the data, rather than
their interactions with a particular classifier, their results exhibit more generality:
the solution will be “good” for a larger family of classifiers
Disadvantages
 The main disadvantage of the filter approach is that it totally ignores the effects
of the selected feature subset on the performance of the induction algorithm
57
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Filters vs. Wrappers
Wrappers
 Advantages
 Accuracy: wrappers generally achieve better recognition rates than filters
since they are tuned to the specific interactions between the classifier and the
dataset
Disadvantages
 Slow execution: since the wrapper must train a classifier for each feature
subset (or several classifiers if cross-validation is used), the method can become
infeasible for computationally intensive methods
 Lack of generality: the solution lacks generality since it is tied to the bias of
the classifier used in the evaluation function.
58
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Embedded methods
[Flowchart: all features → train SVM → eliminate useless feature(s) → performance degradation? No: continue; Yes: stop]
Recursive Feature Elimination (RFE) SVM. Guyon-Weston, 2000. US patent 7,117,188
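A simplified recursive-feature-elimination loop in the spirit of the slide (a sketch assuming scikit-learn; the stopping rule here is a fixed target size rather than the performance-degradation test shown above).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)       # scale features for the linear SVM
remaining = list(range(X.shape[1]))

while len(remaining) > 5:                      # stop when 5 features are left
    svm = LinearSVC(dual=False, max_iter=5000).fit(X[:, remaining], y)
    weights = np.abs(svm.coef_).ravel()        # importance = |weight| in the trained SVM
    worst = int(np.argmin(weights))            # least useful feature this round
    del remaining[worst]                       # eliminate it and retrain

print(remaining)                               # indices of the surviving features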
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Stopping Criterion
[Flowchart: (1) Original feature set → (2) Subset generation → Evaluation (goodness of the subset) → (3) Stopping criterion → (4) Validation]
Based on the generation procedure:
• a pre-defined number of features
• a pre-defined number of iterations
Based on the evaluation function:
• whether addition or deletion of a feature does not produce a better subset
• whether an optimal subset based on some evaluation function is achieved
60
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Result Validation
[Flowchart: (1) Original feature set → (2) Subset generation → Evaluation (goodness of the subset) → (3) Stopping criterion → (4) Validation]
Validation is basically not part of the feature selection process itself: compare the
results with already established results, or with results from competing feature
selection methods.
61
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Subset Evaluation: Advantage
 A feature subset selected by this approach approximates the optimal subset:
[Figure: the feature set partitioned into irrelevant, weakly relevant (regions II and III), and strongly relevant features]
II: Weakly relevant and redundant features
III: Weakly relevant but non-redundant features
Optimal subset: strongly relevant features + weakly relevant but non-redundant features
62
Feature selection
Approaches : Subset Evaluation (Feature Subset Selection )
Subset Evaluation: Disadvantages
The high computational cost of the subset search makes the subset evaluation approach
inefficient for high-dimensional data.
63
Feature selection
Approaches
Feature selection approaches:
 Subset evaluation
 Individual evaluation
 New framework
64
Feature selection
Approaches : Individual Evaluation (Feature Weighting/Ranking)
Individual method (Feature Ranking / Feature weighting)
Individual methods evaluate each feature individually according to a criterion.
They then select the features that either satisfy a condition or are top-ranked.
 By contrast, exhaustive, greedy and random searches are subset search methods because
they evaluate each candidate subset.
65
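A minimal feature-ranking sketch (not from the slides): each feature is scored on its own, here by absolute correlation with the class, and the top-k features are kept; any univariate criterion could be plugged in instead.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 3
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 3] - X[:, 7] + 0.3 * rng.normal(size=n) > 0).astype(int)

# individual evaluation: one score per feature, computed independently of the others
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
top_k = np.argsort(scores)[::-1][:k]
print(top_k)   # expected to contain features 0, 3 and 7 (in some order)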
Feature selection
Approaches : Individual Evaluation (Feature Weighting/Ranking)
Individual Evaluation: Advantage
 Linear time complexity in terms of dimensionality N.
 The individual method is therefore efficient for high-dimensional data.
66
Feature selection
Approaches : Individual Evaluation (Feature Weighting/Ranking)
Individual Evaluation: Disadvantages
It is incapable of removing redundant features.
For high-dimensional data which may contain a large number of redundant features,
this approach may produce results far from optimal.
[Figure: individual evaluation selects all weakly and strongly relevant features (Select = Weakly + Strongly), including the redundant ones]
67
Feature selection
Approaches
Feature selection approaches:
 Subset evaluation
 Individual evaluation
 New framework
68
Feature selection
New Framework
New Framework
The new framework of feature selection is composed of two steps:
First step (Relevance analysis): determines the subset of relevant features by
removing the irrelevant ones.
Second step (Redundancy analysis): determines and eliminates redundant features
from the relevant ones and thus produces the final subset.
69
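An FCBF-flavoured sketch of the two-step framework (an illustration under simplifying assumptions, not the algorithm from the cited papers): relevance is approximated by the absolute correlation of each feature with the class, and a feature is treated as redundant when it is more strongly correlated with an already selected feature than with the class.

import numpy as np

def two_step_selection(X, y, relevance_threshold=0.2):
    d = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
    # Step 1 (relevance analysis): drop features weakly related to the class.
    candidates = [j for j in np.argsort(relevance)[::-1]
                  if relevance[j] >= relevance_threshold]
    # Step 2 (redundancy analysis): scan in decreasing relevance; drop a feature if it is
    # more correlated with an already selected feature than with the class.
    selected = []
    for j in candidates:
        redundant = any(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) >= relevance[j]
                        for i in selected)
        if not redundant:
            selected.append(j)
    return selected

# toy data: feature 1 copies feature 0 (redundant), feature 2 is noise (irrelevant)
rng = np.random.default_rng(0)
f0 = rng.normal(size=300)
X = np.column_stack([f0, f0 + 0.01 * rng.normal(size=300), rng.normal(size=300)])
y = (f0 > 0).astype(int)
print(two_step_selection(X, y))   # one of features 0/1 survives; the copy and the noise are dropped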
Part 3:
Applications of Feature Selection
And
Software
70
Feature selection
Applications of Feature Selection
71
Feature selection
Applications of Feature Selection
 Text categorization: Importance
 Information explosion
 80% of information is stored in text documents:
journals, web pages, emails...
 Difficult to extract specific information with current technologies
Feature selection
Applications of Feature Selection
 Text categorization
Assigning documents to a fixed set of categories
[Figure: a news article goes into a categorizer, which assigns it to one of the categories: sports, cultures, health, politics, economics, vacations]
73
Feature selection
Applications of Feature Selection
 Text categorization
Text Categorization
Documents are represented by a vector whose dimension is the size of the vocabulary,
containing word frequency counts.
Vocabulary ~ 15,000 words (i.e. each document is represented by a 15,000-dimensional vector)
Typical tasks:
- Automatic sorting of documents into web directories
- Detection of spam email
74
Feature selection
Applications of Feature Selection
 Text categorization
Major characteristic, or difficulty, of text categorization:
the high dimensionality of the feature space
 Goal: Reduce the original feature space without sacrificing categorization accuracy
75
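A small sketch of reducing a bag-of-words feature space, assuming scikit-learn is available (the documents, labels and k below are made up for illustration): documents become word-count vectors, and a univariate test keeps the k terms most associated with the category labels.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["the match ended in a late goal",
        "the team won the cup final",
        "parliament passed the new budget",
        "the minister announced tax reform"]
labels = ["sports", "sports", "politics", "politics"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                        # one dimension per vocabulary word
selector = SelectKBest(chi2, k=5).fit(X, labels)   # keep the 5 most discriminative terms
print(vec.get_feature_names_out()[selector.get_support()])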
Feature selection
Applications of Feature Selection
 Image retrieval
Importance: rapid increase in the size and number of image collections from both
civilian and military equipment
Problem: the information cannot be accessed or made use of unless it is organized.
Content-based image retrieval: instead of being manually annotated with
text-based keywords, images are indexed by their own visual contents (features),
such as color, texture, shape, etc.
One of the biggest problems in making content-based image retrieval truly
scalable to large image collections is still the "curse of dimensionality".
76
Feature selection
Applications of Feature Selection
 Image retrieval
Paper: "ReliefF Based Feature Selection in Content-Based Image Retrieval",
A. Sarrafzadeh, Habibollah Agh Atabay, Mir Mosen Pedram, Jamshid Shanbehzadeh
Image dataset: COIL-20, containing 1440 grayscale pictures from 20 classes of objects.
77
Feature selection
Applications of Feature Selection
 Image retrieval
In this paper they use:
Legendre moments to extract features,
the ReliefF algorithm to select the most relevant and non-redundant features, and
a support vector machine to classify images,
and they study the effects of the selected features on classification accuracy.
78
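For intuition about the selection step, here is a sketch of the basic two-class Relief weighting scheme (the paper uses the multi-class ReliefF variant; this simplified version only illustrates the idea of rewarding features that separate nearest instances of different classes).

import numpy as np

def relief_weights(X, y, n_iter=100, seed=0):
    """Basic two-class Relief: reward features that differ on the nearest miss
    and agree on the nearest hit of randomly sampled instances."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span              # scale features to [0, 1]
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf
        same = (y == y[i])
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))     # nearest same-class instance
        miss = np.argmin(np.where(~same, dist, np.inf))   # nearest other-class instance
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_iter

# toy usage: feature 0 separates the classes, feature 1 is noise
X = np.array([[0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [0.8, 0.2]])
y = np.array([0, 0, 1, 1])
print(relief_weights(X, y, n_iter=50))   # weight of feature 0 should dominate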
Feature selection
Weka Software: What can we do with it?
Weka is a piece of software, written in Java, that provides an array of machine
learning tools, many of which can be used for data mining
Pre-processing data
Features selection
Features extraction
Regression
Classify data
Clustering data
Associate rules
More functions
Create random data set
Connect data sets in other formats
Visualize data
…….
References
[1] M. Dash and H. Liu, "Dimensionality Reduction," in Encyclopedia of Computer
Science and Engineering, John Wiley & Sons, Inc., 2, 958-966, 2009.
[2] H. Liu and L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and
Clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491-502, 2005.
[3] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," Journal of
Machine Learning Research 3, 1157-1182, 2003.
[4] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy,"
Journal of Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[5] H. Liu and H. Motoda, "Computational Methods of Feature Selection," Chapman and
Hall/CRC Press, 2007.
[6] I. Guyon, Lecture 2: Introduction to Feature Selection.
[7] M. Dash and H. Liu, Feature Selection for Classification.
80
References
[8] Makoto Miwa, A Survey on Incremental Feature Extraction
[9] Lei Yu, Feature Selection and Its Application in Genomic Data Analysis
81