PosNegNMF-JMPDiscoveryMeeting

A novel clustering JMP platform using non-negative matrix factorization
Paul Fogel (1), Yann Gaston-Mathé (2), S. Stanley Young (3)
(1) Consultant, Paris, France; (2) YGM Consult, Paris, France; (3) CGStat, Raleigh, NC, USA
Abstract
• If the data is non-negative, Non-negative Matrix
Factorization or NMF [1, 2] can be used to
cluster the observations, the variables, or both.
By its nature, NMF clustering is focused on the
large values.
• Our idea is to normalize the data, e.g. by
subtracting the row/column means, and split the
matrix into positive and negative parts. NMF
clustering applied to the concatenated data,
“PosNegNMF” [3], gives equal weight to large
and small values.
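As a minimal sketch of the split described above, the following plain-Python example centers a small made-up matrix by subtracting column means (one of the normalizations the abstract mentions; picking column means here is an assumption), separates the positive and negative parts, and concatenates them column-wise. The function names and the toy data are hypothetical and are not taken from irMF.

```python
def center_columns(matrix):
    """Subtract each column's mean (assumed normalization for this sketch)."""
    n_rows = len(matrix)
    col_means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, col_means)] for row in matrix]


def split_pos_neg(matrix):
    """Return (positive part, absolute value of negative part)."""
    pos = [[max(0.0, x) for x in row] for row in matrix]
    neg = [[max(0.0, -x) for x in row] for row in matrix]
    return pos, neg


def concat_columns(left, right):
    """Place the two parts side by side: [pos | neg]."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


# Hypothetical toy data, not the hospital admissions matrix.
data = [[5.0, 1.0, 3.0],
        [1.0, 4.0, 2.0],
        [3.0, 1.0, 4.0]]

centered = center_columns(data)
pos, neg = split_pos_neg(centered)
posneg = concat_columns(pos, neg)  # non-negative, twice as many columns
```

Every entry of `posneg` is non-negative, and `pos - neg` recovers the centered matrix, so a low value in the original data becomes a large value in the negative half and can drive the clustering just as a high value does.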
Objective
• Powerful clustering approaches
• Intuitive visualizations
• Available in JMP through our platform: inferential &
robust Matrix Factorization, irMF
Materials & Methods
• The numbers of emergency hospital admissions for
cardiovascular disease (CVD), myocardial infarction
(MI), congestive heart failure (CHF), respiratory
disease, and diabetes were collected in 26 US
communities for the years 2000–2003.
• We considered the normalized matrix of
contingency ratios – similar to the normalization
used by Correspondence Analysis.
• NMF clustering was applied to the positive and
negative parts of the matrix – after taking the
absolute value of the negative elements.
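The contingency-ratio normalization mentioned in the second bullet can be sketched as follows. This assumes the ratio is the observed count divided by the count expected under row/column independence, centered by subtracting 1, as is common in Correspondence Analysis; the exact scaling used in the poster is not stated. The function name and the counts are hypothetical.

```python
def centered_contingency_ratios(counts):
    """Observed / expected under independence, minus 1 (assumed definition)."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    return [
        [n * grand_total / (r * c) - 1.0 for n, c in zip(row, col_totals)]
        for row, r in zip(counts, row_totals)
    ]


# Hypothetical counts: rows stand for communities, columns for causes.
counts = [[10, 20],
          [30, 40]]
ratios = centered_contingency_ratios(counts)
```

Cells with more admissions than expected come out positive and cells with fewer come out negative, which is what makes the subsequent split into positive and negative parts meaningful.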
Results
• The NMF screeplots indicate that four components
provide small residual variance and good stability.
• On the heatmap of the reordered rows and columns
of the matrix, the third cluster (rows 12 to 16) is
mostly characterized by a lower number of hospital
admissions due to respiratory disease – illustrating
the role of low values for PosNegNMF clustering.
• The interpretation of the same clustering on the
biplot of the correspondence analysis requires
extensive expertise.
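To illustrate how a residual-sum-of-squares screeplot and row clusters might be obtained, here is a minimal sketch using the multiplicative updates of Lee and Seung [1] on a small non-negative matrix. It is written with plain Python lists so that it runs without third-party packages. Whether irMF uses this algorithm is not stated in the poster; the function names, the toy matrix, and the rule of assigning each row to its largest component are all assumptions made for illustration.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def rss(v, w, h):
    """Residual sum of squares between v and its approximation w @ h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def nmf(v, rank, n_iter=500, seed=0, eps=1e-12):
    """Multiplicative-update NMF (Frobenius loss): v is approximated by w @ h."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(len(v))]
    h = [[rng.random() + 0.1 for _ in range(len(v[0]))] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * a / (b + eps) for x, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * a / (b + eps) for x, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def row_clusters(w):
    """Assign each row to the component with the largest loading (assumed rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical non-negative matrix with two groups of rows.
v = [[2.0, 0.0, 0.0, 1.0],
     [1.8, 0.0, 0.0, 1.2],
     [0.0, 1.5, 2.0, 0.0],
     [0.0, 1.7, 2.2, 0.0]]

# Scree values: residual sum of squares for each candidate number of components.
rss_by_rank = {k: rss(v, *nmf(v, k)) for k in (1, 2, 3)}
w2, h2 = nmf(v, 2)
labels = row_clusters(w2)
```

Plotting `rss_by_rank` against the number of components gives a screeplot; the stability criterion shown in panel (b) is a separate diagnostic that this sketch does not attempt to reproduce.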
Figure 1. (a) Residual sum of squares; (b) Row clustering stability
Conclusions
• The heatmap of the reordered rows and
columns of a matrix, when properly
normalized, can add insight to the SVD
clustering produced by Correspondence
Analysis, in particular with respect to the
interpretation of the biplot axes.
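The reordering behind such a heatmap can be sketched as below: rows and columns are sorted so that members of the same cluster sit next to each other. The function name, the toy matrix, and the cluster labels are hypothetical; in practice the labels would come from the clustering step.

```python
def reorder_by_cluster(matrix, row_labels, col_labels):
    """Sort rows and columns by cluster label, keeping original order within a cluster."""
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    reordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return reordered, row_order, col_order


# Hypothetical matrix and cluster labels.
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
row_labels = [1, 0, 1]
col_labels = [0, 1, 0]

reordered, row_order, col_order = reorder_by_cluster(matrix, row_labels, col_labels)
```

The returned `row_order` and `col_order` can be used to relabel the heatmap axes so that each block of the reordered matrix can be traced back to the original cities and causes.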
Figure 2. NMF clustering and re-ordering of hospital admissions by city and cause.
Red: High count; Blue: Low count.
Figure 3. Correspondence analysis biplot of hospital admissions by city and cause (PosNegNMF
clusters are represented by the city label colors)
References
1. Lee, D. D.; Seung, H. S. Learning the Parts of Objects by Non-Negative Matrix Factorization. Nature 1999,
Volume 401, pp. 788–791
2. Fogel, P.; Hawkins, D. M.; Beecher, C.; Luta, G.; Young, S. S. A Tale of Two Matrix Factorizations. The American
Statistician 2013, Volume 67, no. 4, pp. 207–218
3. Fogel, P.; Gaston-Mathé, Y.; Hawkins, D.; Fogel, F.; Luta, G.; Young, S. S. Applications of a Novel Clustering
Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health. International
Journal of Environmental Research and Public Health 2016, 13, x; doi:10.3390/