The Self-Organizing Map

Self Organization of a Massive Document Collection
IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000
Teuvo Kohonen
Helsinki University of Technology, Espoo, Finland
Presented by Chu Huei-Ming, 2004/12/01
Reference
• A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation
– Dmitri G. Roussinov
– Hsinchun Chen
– Department of MIS, Karl Eller Graduate School of Management, University of Arizona
• http://dlist.sir.arizona.edu/archive/00000460/01/A_Scalable-98.htm
Outline
• Introduction
• The Self-Organizing Map
• Statistical Models of Documents
• Rapid Construction of Large Document Maps
• Conclusion
Introduction
• From simple search to browsing of self-organized data collections
– Document collections could be automatically organized in some meaningful way.
– The context of the other related documents residing nearby would then help in understanding the true meaning of a document.
Introduction
• The SOM can first be computed using any representative subset of the old input data.
• New input items can be mapped straight into the most similar models without re-computation of the whole mapping.
The Self-Organizing Map
• Unsupervised-learning neural-network method
• Similarity graph of input data
• Arranged as a regular, usually two-dimensional
grid
SOM algorithm
[Diagram: input data vector nodes N1 … NJ connected through weights w_ij to the map model vectors M1 … MI on the grid]
SOM algorithm
1. Initialize input nodes, output nodes, and connection weights
– Input vector: the top (most frequently occurring) N terms
– Output nodes: a two-dimensional map (grid) of M nodes
– Initialize the weights w_ij from the N input nodes to the M output nodes to small random values
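A minimal sketch of this initialization step in Python with NumPy; the sizes (N terms, the grid dimensions) and the weight range are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Illustrative sizes: the N most frequent terms as input nodes, an R x C output grid
N, ROWS, COLS = 100, 10, 14
M = ROWS * COLS

rng = np.random.default_rng(0)
# Weights w_ij from the N input nodes to the M output nodes, small random values
W = rng.uniform(0.0, 0.05, size=(M, N))
# Grid coordinate of each output node, used later by the neighborhood function
grid_coords = np.array([(r, c) for r in range(ROWS) for c in range(COLS)], dtype=float)
```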
SOM algorithm
2. Present each document in order
– Describe each document as an input vector of N coordinates
3. Compute the distance to all nodes
d_i = \sum_{j=0}^{N-1} [x_j(t) - w_{ji}(t)]^2    (1)
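A minimal sketch of the distance computation in Eq. (1) and the selection of the closest node; the array names and shapes are illustrative assumptions:

```python
import numpy as np

def distances_to_nodes(x, W):
    """Eq. (1): squared Euclidean distance from the input vector x (length N)
    to every output node, given the weight matrix W of shape (M, N)."""
    # d_i = sum_j (x_j(t) - w_ji(t))^2 for each of the M output nodes
    return np.sum((W - x) ** 2, axis=1)

# Illustrative usage: a document described by N = 5 term coordinates, M = 12 nodes
rng = np.random.default_rng(0)
W = rng.uniform(0.0, 0.1, size=(12, 5))
x = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
d = distances_to_nodes(x, W)
winner = int(np.argmin(d))   # the node with the smallest distance becomes i*
```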
SOM algorithm
• 4. Select winning node i* and update weights to node i* and its neighbors
– Update the weights to node i* and its neighbors to reduce the distances between them and the input vector x_j(t):
w_{ij}(t+1) = w_{ij}(t) + \eta(t) [x_j(t) - w_{ij}(t)]    (2)
– \eta(t) is an error-adjusting coefficient (0 < \eta(t) < 1) that decreases over time
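A hedged sketch of the update in Eq. (2). Treating the neighborhood as all nodes within a fixed grid radius of the winner is an assumption made here for concreteness; the slides do not fix that definition:

```python
import numpy as np

def update_weights(W, grid_coords, x, winner, eta, radius):
    """Eq. (2): move node i* and its grid neighbors toward the input vector x.

    W           : (M, N) weight matrix
    grid_coords : (M, 2) row/column position of each node on the map
    eta         : error-adjusting coefficient, 0 < eta < 1, decreasing over time
    radius      : nodes within this grid distance of the winner are updated
    """
    grid_dist = np.linalg.norm(grid_coords - grid_coords[winner], axis=1)
    neighbors = grid_dist <= radius
    # w_ij(t+1) = w_ij(t) + eta(t) * (x_j(t) - w_ij(t)) for the selected nodes
    W[neighbors] += eta * (x - W[neighbors])
    return W
```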
SOM algorithm
• Recursive regression process
• t is the index of the regression step
• x(t) is the presentation of a sample of input x
• h_{c(x),i}(t) is the neighborhood function
• m_{c(x)}(t) is the model ("winner") that matches best with x(t)

m_i(t+1) = m_i(t) + h_{c(x),i}(t) [x(t) - m_i(t)]    (3)
c(x) = arg min_i ||x - m_i||    (4)
h_{c(x),i}(t) = \alpha(t) \exp( -||r_i - r_{c(x)}||^2 / (2\sigma^2(t)) )    (5)
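A sketch of one regression step combining Eqs. (3)-(5); the array layout (one row per model vector, one row per grid coordinate) is an assumption for illustration:

```python
import numpy as np

def som_regression_step(models, grid_coords, x, alpha, sigma):
    """One recursive regression step.

    models      : (M, N) model vectors m_i(t)
    grid_coords : (M, 2) location r_i of each model on the map grid
    alpha       : learning-rate factor alpha(t)
    sigma       : neighborhood width sigma(t)
    """
    # Eq. (4): winner c(x) = argmin_i ||x - m_i||
    c = int(np.argmin(np.linalg.norm(models - x, axis=1)))
    # Eq. (5): Gaussian neighborhood h_{c(x),i}(t)
    sq_grid_dist = np.sum((grid_coords - grid_coords[c]) ** 2, axis=1)
    h = alpha * np.exp(-sq_grid_dist / (2.0 * sigma ** 2))
    # Eq. (3): m_i(t+1) = m_i(t) + h_{c(x),i}(t) * (x(t) - m_i(t))
    return models + h[:, None] * (x - models)
```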
SOM algorithm
• \alpha(t) (0 < \alpha(t) < 1) is the learning-rate factor, which decreases monotonically with the regression steps
• \sigma(t) corresponds to the width of the neighborhood function, which also decreases monotonically with the regression steps
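The slides only require that both factors decrease monotonically over the regression steps; a simple linear schedule, sketched below with illustrative start and end values, is one common choice rather than the paper's prescription:

```python
def linear_decay(start, end, step, total_steps):
    """Monotonically decreasing schedule usable for alpha(t) or sigma(t)."""
    frac = min(step / float(total_steps), 1.0)
    return start + (end - start) * frac

# e.g. alpha(t) from 0.9 down to 0.01, sigma(t) from half the map width down to 1:
# alpha = linear_decay(0.9, 0.01, t, T)
# sigma = linear_decay(map_width / 2.0, 1.0, t, T)
```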
Statistical Models of Documents
• The primitive vector space model
• Latent semantic indexing (LSI)
• Randomly projected histograms
• Histograms on the word category map
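To make the random-projection idea concrete, here is a generic sketch of projecting high-dimensional word histograms to a lower dimension with a fixed random matrix; the dense Gaussian projection matrix and the dimensions are assumptions, not the exact construction used in the paper:

```python
import numpy as np

def random_projection(histograms, target_dim, seed=0):
    """Project word histograms (shape: documents x vocabulary) to target_dim
    dimensions by multiplying with a fixed random matrix."""
    vocab_size = histograms.shape[1]
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(vocab_size, target_dim)) / np.sqrt(target_dim)
    # Pairwise similarities of the documents are approximately preserved
    return histograms @ R
```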
Rapid Construction of
Large Document Maps
• Fast distance computation
• Estimation of larger maps based on carefully
constructed smaller ones
• Rapid fine-tuning of the large maps
Estimation of larger maps
• Estimate good initial values for the model
vectors of a very large map
• Based on the asymptotic values of the model
vectors of a much smaller map
– The superscript d refers to the “dense” lattice
– The superscript s refers to the “sparse” lattice
– If three "sparse" vectors m_i'^{(s)}, m_j'^{(s)}, m_k'^{(s)} do not lie on the same straight line:
m_h'^{(d)} = \alpha_h m_i'^{(s)} + \beta_h m_j'^{(s)} + (1 - \alpha_h - \beta_h) m_k'^{(s)}
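A sketch of this interpolation for a single dense-lattice position: the coefficients \alpha_h and \beta_h are solved from the grid geometry so that the dense position is a combination of the three sparse positions, and then applied to the sparse model vectors. The data layout is an assumption for illustration:

```python
import numpy as np

def interpolate_dense_model(pos_h, sparse_positions, sparse_models):
    """Initialize one dense-map model m_h'(d) from three sparse models
    m_i'(s), m_j'(s), m_k'(s) that do not lie on a straight line.

    pos_h            : (2,) grid position of the dense node h
    sparse_positions : (3, 2) grid positions of the chosen sparse nodes i, j, k
    sparse_models    : (3, N) converged model vectors of those nodes
    """
    pi, pj, pk = sparse_positions
    # Solve pos_h = a*pi + b*pj + (1 - a - b)*pk for the coefficients a and b
    A = np.column_stack([pi - pk, pj - pk])
    a, b = np.linalg.solve(A, pos_h - pk)
    coeffs = np.array([a, b, 1.0 - a - b])
    # m_h'(d) = alpha_h*m_i'(s) + beta_h*m_j'(s) + (1 - alpha_h - beta_h)*m_k'(s)
    return coeffs @ sparse_models
```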
Estimation of larger maps
Fig. 3. Illustration of the relation of the model vectors in a sparse (s, solid lines) and dense (d, dashed lines) grid. Here m_h'^{(d)} shall be interpolated in terms of the three closest "sparse" models.
Rapid fine-tuning of the large maps
• Addressing old winners
– in the middle of the training process
– the same training input is used again
– the new winner is found at or in the vicinity of the old one
Fig. 4. Finding the new winner in the vicinity of the old one, whereby the old winner is directly located by a pointer. The pointer is then updated.
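A sketch of this shortcut: each training input keeps a pointer to its previous winner, and the next winner is searched only among the models near that node. The search radius and the pointer bookkeeping shown here are illustrative assumptions:

```python
import numpy as np

def find_winner_near_old(x, models, grid_coords, old_winner, radius=2.0):
    """Search for the new winner only in the grid neighborhood of the old one,
    instead of scanning the whole map."""
    near = np.linalg.norm(grid_coords - grid_coords[old_winner], axis=1) <= radius
    candidates = np.flatnonzero(near)
    dists = np.linalg.norm(models[candidates] - x, axis=1)
    return int(candidates[np.argmin(dists)])

# During training, keep one pointer per document and refresh it after each search:
# winner_pointer[doc_id] = find_winner_near_old(x, models, grid_coords, winner_pointer[doc_id])
```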
Performance evaluation of the new methods
• Average quantization error
– The average distance of each input from the closest model
vector
• Classification accuracy
– The separability of different classes of patents on the resulting
map
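A sketch of how the average quantization error could be computed over a set of input vectors; this uses the plain full search for the closest model, not the accelerated search described above:

```python
import numpy as np

def average_quantization_error(inputs, models):
    """Mean distance of each input vector from its closest model vector."""
    errors = []
    for x in inputs:
        d = np.linalg.norm(models - x, axis=1)   # distance to every model vector
        errors.append(d.min())                   # distance to the winner
    return float(np.mean(errors))
```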
Conclusion
• The similarity graph (document map) is suitable for interactive data mining or exploration tasks.
• A new method of forming statistical models of
documents
• Several new fast computing methods (shortcuts)