
國立雲林科技大學
National Yunlin University of Science and Technology
Adaptive self-organized maps based on bidirectional approximate reasoning and its applications to information filtering
Yu Liu
Knowledge-Based Systems, Vol. 19, 2006, pp. 719–729.
Presenter : Wei-Shen Tai
Advisor : Professor Chung-Chian Hsu
2006/11/29
Outline
- Introduction
- BAR inference network and fuzzy similarity distance
- ASOMBAR
- ASOMBAR algorithm for information filtering
- Information filtering example and results
- Conclusion
- Comments
Motivation
- Similarity measurement in SOM
  - The Euclidean distance assumes that all factors are equally important.
  - The distortion of heavily weighted factors should have more influence on the similarity measure than that of lightly weighted ones.
Objective
- A novel fuzzy similarity distance
  - Replace the Euclidean distance to reflect the real relations more precisely.
  - Improve the effectiveness of the competitive process and the cooperative process.
BAR inference network and fuzzy similarity distance
- Similarity measure of BAR
  - If an input fuzzy variable X̃ is similar to X̃j, then the output Ỹ of the inference network should be similar to Ỹj.
  - Addresses the problem that, in real situations, different pairs of input and output data often contain inconsistencies.
- Ordered weighted averaging (OWA)
  - Combines fuzzy membership values in constraint satisfaction problems, scheduling, and group decision making.
  - [OWA aggregation formula omitted: the ordered weights run from the biggest factor down to the smallest factor, and λ decides how much the biggest factor and the other, smaller factors in one input vector influence the proposed fuzzy distance.]
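As a concrete illustration of how such an OWA-weighted distance can behave, the Python sketch below sorts the per-dimension differences between an input and a reference vector, gives the biggest difference the weight λ, and spreads the remaining weight 1 − λ evenly over the other differences. The function name fuzzy_owa_distance and the even split of the residual weight are assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def fuzzy_owa_distance(x, w, lam=0.5):
    """OWA-style distance between an input vector x and a reference vector w.

    The largest per-dimension difference gets weight `lam`; the remaining
    differences share the weight (1 - lam) equally. This is a hedged sketch
    of the idea, not the paper's exact fuzzy similarity distance.
    """
    diffs = np.sort(np.abs(np.asarray(x, float) - np.asarray(w, float)))[::-1]  # descending order
    if diffs.size == 1:
        return float(diffs[0])
    owa_weights = np.full(diffs.size, (1.0 - lam) / (diffs.size - 1))
    owa_weights[0] = lam  # emphasis on the biggest factor
    return float(np.dot(owa_weights, diffs))

# Example: as lam grows, the single largest mismatch dominates the distance.
x = [0.9, 0.1, 0.1]
w = [0.1, 0.1, 0.1]
print(fuzzy_owa_distance(x, w, lam=0.9))  # 0.72
print(fuzzy_owa_distance(x, w, lam=0.5))  # 0.40
```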
ASOMBAR (Adaptive SOMBAR)
- Competition:
  - The particular neuron with the minimum fuzzy similarity distance is declared the winner of the competition.
- Cooperation:
  - The winning neuron determines the spatial location of a topological neighborhood of excited neurons.
- Synaptic adaptation:
  - Enables the excited neurons to increase their individual values through suitable adjustments applied to their synaptic weights.
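To make the competition, cooperation, and synaptic-adaptation steps concrete, here is a minimal SOM-style training loop in Python that plugs the fuzzy_owa_distance sketch from the previous section into the competition step. The grid size, Gaussian neighborhood, and linearly decaying learning rate and radius are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def train_asombar(data, grid=(5, 5), epochs=20, lam=0.6,
                  lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM-style training loop using the fuzzy OWA distance sketch
    for the competition step (illustrative, not the paper's exact algorithm).

    `data`: array of shape (n_samples, dim).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    coords = np.array([[(r, c) for c in range(cols)] for r in range(rows)], dtype=float)

    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5  # shrinking neighborhood radius
        for x in data:
            # Competition: the neuron with the minimum fuzzy similarity distance wins.
            dists = np.array([[fuzzy_owa_distance(x, weights[r, c], lam)
                               for c in range(cols)] for r in range(rows)])
            winner = np.unravel_index(np.argmin(dists), dists.shape)
            # Cooperation: Gaussian topological neighborhood around the winner.
            grid_d2 = np.sum((coords - np.array(winner, dtype=float)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))[..., None]
            # Synaptic adaptation: move the excited neurons toward the input.
            weights += lr * h * (x - weights)
    return weights
```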
ASOMBAR algorithm for information filtering
- Information filtering
  - Separates relevant documents from irrelevant ones and removes the irrelevant documents to meet the user's current interest.
- ASOMBAR
  - Filters the available information according to the similarity degrees between the documents and the user profile.
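A minimal sketch of the filtering step, assuming documents and the user profile are already encoded as keyword-frequency vectors over the same vocabulary: a document is kept only when its fuzzy distance to the profile (the fuzzy_owa_distance sketch above) falls below a threshold. The threshold value and the dictionary-based encoding are assumptions for illustration.

```python
def filter_documents(doc_vectors, profile, lam=0.6, threshold=0.3):
    """Keep documents whose fuzzy distance to the user profile is small.

    `doc_vectors`: dict mapping document id -> keyword-frequency vector.
    `profile`: the user-profile vector (same keyword order).
    The threshold value is an assumption for illustration.
    """
    relevant = {}
    for doc_id, vec in doc_vectors.items():
        if fuzzy_owa_distance(vec, profile, lam) < threshold:
            relevant[doc_id] = vec  # similar enough to the user's interests
    return relevant
```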
Information filtering example
- User profile
  - Described by the high-frequency important words related to the user's interest.
- Parameter λ
  - Decides how much the biggest factor and the other factors in one input vector influence the distance.
  - Influences the convergence of the distortion errors.
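For illustration, a simple way to realize such a profile is to count how often a fixed list of important words occurs in the documents the user has marked as relevant and normalize the counts; the keyword list, the whitespace tokenizer, and the normalization below are assumptions, not details taken from the paper.

```python
from collections import Counter

def build_user_profile(interesting_docs, keywords):
    """Build a user-profile vector from normalized keyword frequencies.

    `interesting_docs`: list of document texts the user found relevant.
    `keywords`: fixed list of high-frequency important words (assumed given).
    """
    counts = Counter()
    for text in interesting_docs:
        counts.update(w for w in text.lower().split() if w in keywords)
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in keywords]  # one weight per keyword
```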
Results
- ASOMBAR with λ ≥ 0.5
  - More effective than both GNG and the basic SOM.
- Topological neighborhoods
  - The neighborhoods of ASOMBAR and SOM look very similar.
  - However, the neighborhood of ASOMBAR, based on the fuzzy similarity distance, changes more smoothly than that of SOM.
Conclusions
- A new fuzzy similarity distance
  - Replaces the Euclidean distance, inspired by BAR and OWA theory.
  - Pays more attention to the heavily weighted element and places less emphasis on the lightly weighted elements.
- Information filtering
  - Applied to construct the user's profile and to perform document filtering.
Comments
- Advantage
  - The idea of non-Euclidean similarity measurement in SOM.
  - Ordered weighted averaging (OWA) in the similarity measurement.
- Drawback
  - The effectiveness cannot be improved significantly.
  - The BAR concept is not applied appropriately in this study.
- Application
  - Information filtering and other applications related to similarity measurement.