Machine learning algorithm for feature space clustering of mixed data with missing information based on molecule similarity.
Journal:
Journal of Biomedical Informatics
Published Date:
Nov 15, 2021
Abstract
Clustering algorithms have recently attracted significant attention in machine learning applications owing to their effectiveness. Nevertheless, existing algorithms still have several issues that remain to be resolved. For example, most existing algorithms transform one type of feature into another, which disregards the specific properties of the data. In addition, most of them consider all features, which may make computation difficult and result in sub-optimal performance. To address these difficulties, this paper proposes a novel technique for clustering categorical and numerical features based on feature space clustering of mixed data with missing information (FSCMMI). The procedure involves three stages. First, FSCMMI divides the given dataset according to the missing information in instances and the feature types. The second stage uses a decision-tree procedure to identify the association between instances. Finally, the third stage computes the closeness measure for numerical features and categorical features. We also propose a new training algorithm to cluster mixed datasets. Extensive experimental results on benchmark datasets show that the proposed FSCMMI outperforms several state-of-the-art clustering methods in terms of accuracy and efficiency.
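To make the first and third stages more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that missing entries are encoded as `None`, that a feature counts as numerical when all of its observed values are numbers, and that a Gower-style similarity can stand in for the closeness measure, which the abstract does not define. The function names (`split_by_missing`, `split_by_type`, `closeness`) and the toy data are hypothetical. The decision-tree stage is omitted because the abstract gives no detail about it.

```python
# Illustrative sketch only; the closeness measure here is a Gower-style
# stand-in, not the measure proposed for FSCMMI.

def split_by_missing(rows):
    """Return (complete, incomplete) instance indices."""
    complete = [i for i, r in enumerate(rows) if all(v is not None for v in r)]
    incomplete = [i for i, r in enumerate(rows) if any(v is None for v in r)]
    return complete, incomplete


def split_by_type(rows):
    """Return (numerical, categorical) feature indices, inferred from observed values."""
    numerical, categorical = [], []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if observed and all(isinstance(v, (int, float)) for v in observed):
            numerical.append(j)
        else:
            categorical.append(j)
    return numerical, categorical


def feature_ranges(rows, numerical):
    """Range (max - min) of each numerical feature over its observed values."""
    ranges = {}
    for j in numerical:
        observed = [r[j] for r in rows if r[j] is not None]
        ranges[j] = max(observed) - min(observed)
    return ranges


def closeness(a, b, numerical, ranges):
    """Similarity in [0, 1], averaged over features observed in both instances."""
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip features missing in either instance
        if j in numerical:
            span = ranges[j]
            total += 1.0 - (abs(x - y) / span if span > 0 else 0.0)
        else:
            total += 1.0 if x == y else 0.0  # simple matching for categories
        used += 1
    return total / used if used else 0.0


# Hypothetical toy dataset: two numerical features and one categorical feature.
rows = [
    [1.0, "a", 10.0],
    [2.0, "a", None],
    [5.0, "b", 30.0],
    [None, "b", 20.0],
]
complete, incomplete = split_by_missing(rows)
numerical, categorical = split_by_type(rows)
ranges = feature_ranges(rows, numerical)
sim_01 = closeness(rows[0], rows[1], numerical, ranges)
sim_23 = closeness(rows[2], rows[3], numerical, ranges)
```

Keeping numerical and categorical features separate, rather than converting one type into the other, matches the motivation stated in the abstract. The choice to skip features that are missing in either instance is an assumption of this sketch.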