An efficient framework based on local multi-representatives and noise-robust synthetic example generation for self-labeled semi-supervised classification.
Journal:
Neural networks : the official journal of the International Neural Network Society
PMID:
39889375
Abstract
While self-labeled methods can exploit both unlabeled and labeled instances to train classifiers, they are also constrained by the number and distribution of labeled instances. SEG-SSC, k-means-SSC, LC-SSC, and LCSEG-SSC are sophisticated solutions for overcoming these restrictions. However, when classes overlap, they suffer from the following technical defects: (a) they fail to effectively improve the labeled instance distribution because they identify only a single local representative per cluster; (b) they either predict the identified unlabeled local representatives with low accuracy or require a high degree of manual intervention; and (c) they fail to effectively increase the number of labeled instances because they generate noisy synthetic examples. To address these issues, a framework based on local multi-representatives and noise-robust synthetic example generation (LMR-NRSEG-SSC) is proposed for self-labeled semi-supervised classification. First, a newly proposed local multi-representatives search algorithm based on multi-granularity ideas partitions labeled and unlabeled instances into independent clusters and identifies unlabeled local multi-representatives in each cluster. Second, a newly proposed divide-and-conquer self-labeling strategy predicts the labels of these unlabeled local multi-representatives, with the goal of improving the labeled instance distribution. Third, a newly proposed noise-robust oversampling technique based on local multi-representatives creates safe labeled synthetic instances with little noise, with the goal of increasing the number of labeled instances. Finally, almost any self-labeled method can be applied to the improved labeled and unlabeled instances to train classifiers effectively. Experiments demonstrated that LMR-NRSEG-SSC outperformed 7 sophisticated self-labeled frameworks in improving 2 advanced self-labeled methods on extensive benchmark datasets.
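The abstract only outlines the four stages of the framework, so the sketch below is an illustrative Python approximation rather than the authors' algorithm: k-means stands in for the multi-granularity local multi-representatives search, a cluster-local kNN stands in for the divide-and-conquer self-labeling, same-cluster/same-label interpolation stands in for the noise-robust oversampling, and scikit-learn's self-training plays the role of the downstream self-labeled method. All function names, parameters, and criteria (e.g., choosing representatives nearest the centroid) are assumptions for illustration.

```python
"""Minimal sketch of an LMR-NRSEG-SSC-style pipeline (illustrative stand-ins only)."""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC


def lmr_nrseg_ssc_sketch(X_l, y_l, X_u, n_clusters=5, reps_per_cluster=3,
                         synth_per_rep=2, seed=0):
    """X_l/y_l: labeled data (integer labels); X_u: unlabeled data."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_l, X_u])

    # Step 1 (stand-in): partition labeled and unlabeled instances into clusters.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    assign = km.fit_predict(X_all)
    lab_assign, unl_assign = assign[:len(X_l)], assign[len(X_l):]

    new_X, new_y = [], []
    for c in range(n_clusters):
        unl_idx = np.where(unl_assign == c)[0]
        lab_idx = np.where(lab_assign == c)[0]
        if len(unl_idx) == 0:
            continue
        # Assumed criterion: the unlabeled points closest to the centroid act
        # as the cluster's local multi-representatives.
        d = np.linalg.norm(X_u[unl_idx] - km.cluster_centers_[c], axis=1)
        reps = unl_idx[np.argsort(d)[:reps_per_cluster]]

        # Step 2 (stand-in): label representatives with a kNN fitted on the
        # cluster's own labeled points, falling back to the global labeled set.
        Xi, yi = (X_l[lab_idx], y_l[lab_idx]) if len(lab_idx) else (X_l, y_l)
        knn = KNeighborsClassifier(n_neighbors=min(3, len(Xi))).fit(Xi, yi)
        rep_labels = knn.predict(X_u[reps])
        new_X.append(X_u[reps])
        new_y.append(rep_labels)

        # Step 3 (stand-in): oversample by interpolating each representative
        # toward a same-label labeled point of the same cluster, so synthetic
        # instances stay inside the local class region (noise avoidance).
        for r, lbl in zip(reps, rep_labels):
            same = Xi[yi == lbl]
            if len(same) == 0:
                continue
            for _ in range(synth_per_rep):
                mate = same[rng.integers(len(same))]
                alpha = rng.uniform(0.0, 0.5)  # stay close to the representative
                new_X.append((X_u[r] + alpha * (mate - X_u[r]))[None, :])
                new_y.append([lbl])

    X_aug = np.vstack([X_l] + new_X)
    y_aug = np.concatenate([y_l] + new_y)

    # Step 4: run a self-labeled method on the improved labeled set plus the
    # remaining unlabeled instances (scikit-learn marks unlabeled samples as -1).
    X_fit = np.vstack([X_aug, X_u])
    y_fit = np.concatenate([y_aug, -np.ones(len(X_u), dtype=int)])
    clf = SelfTrainingClassifier(SVC(probability=True))
    return clf.fit(X_fit, y_fit)
```

The interpolation factor is kept in [0, 0.5] so synthetic points lie nearer the (now labeled) representative than the labeled mate, which is one simple way to keep synthetic instances away from overlapping class regions; the paper's actual noise-robust criterion may differ.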