Sparse Generative Topographic Mapping for Both Data Visualization and Clustering.
Journal: Journal of Chemical Information and Modeling
Published Date: Dec 24, 2018
Abstract
Sparse generative topographic mapping (SGTM) is developed by modifying the conventional GTM algorithm so that data visualization and clustering are achieved simultaneously. In the original GTM the weight of each grid point is constant, whereas in the proposed SGTM it becomes a variable, which enables data points to be clustered on two-dimensional maps. The appropriate number of clusters is determined by optimization based on the Bayesian information criterion. Analysis of numerical simulation data sets, along with quantitative structure-property relationship and quantitative structure-activity relationship data sets, confirmed that the proposed SGTM provides the same degree of visualization performance as the original GTM and clusters data points appropriately. Python and MATLAB code for the proposed algorithm is available at https://github.com/hkaneko1985/gtm-generativetopographicmapping.
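The idea of a variable grid-point weight can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy data, and the fixed grid-point centers are hypothetical, and it assumes the standard expectation-maximization update for mixture weights (the mean responsibility of each grid point) together with the generic form of the Bayesian information criterion. The number of free parameters is left to the caller because the abstract does not say how the paper counts them.

```python
import numpy as np

def responsibilities(X, centers, beta, weights):
    """Return R with shape (K, N): posterior of grid point k given sample n."""
    # Squared Euclidean distance between every center and every sample.
    d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    with np.errstate(divide="ignore"):  # log(0) = -inf for zero-weight grid points
        log_p = np.log(weights)[:, None] - 0.5 * beta * d2
    log_p -= log_p.max(axis=0, keepdims=True)  # stabilize the exponentials
    R = np.exp(log_p)
    return R / R.sum(axis=0, keepdims=True)

def update_weights(R):
    """Assumed M-step for the weights: mean responsibility per grid point."""
    return R.mean(axis=1)

def bic(log_likelihood, n_params, n_samples):
    """Generic BIC: lower is better. n_params is supplied by the caller."""
    return -2.0 * log_likelihood + n_params * np.log(n_samples)

# Hypothetical toy data: two tight groups and three grid-point centers,
# the third of which lies far from every sample.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0], [20.0, 20.0]])
weights = np.full(3, 1.0 / 3.0)  # the original GTM keeps these fixed at 1/K

for _ in range(5):
    R = responsibilities(X, centers, beta=1.0, weights=weights)
    weights = update_weights(R)  # variable weights: unused grid points shrink

# Each sample is assigned to the grid point with the largest responsibility.
labels = R.argmax(axis=0)
```

In this sketch the weight of the distant grid point shrinks toward zero, so only the grid points that actually explain data keep non-negligible weight and can act as clusters. Candidate models with different numbers of clusters could then be compared with `bic`, keeping the one with the lowest value.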