ieGENES: A machine learning method for selecting differentially expressed genes in cancer studies.
Journal:
Journal of biomedical informatics
PMID:
40024422
Abstract
Gene selection is crucial for cancer classification using microarray data. To improve cancer classification accuracy, we developed a new wrapper method for gene selection called ieGENES. First, we proposed a parsimonious kernel machine regularization (PKMR) model, which applies ridge regularization to kernel machine-driven classification to tackle multicollinearity and obtain stable estimates in high-dimensional settings. We then developed the ieGENES algorithm to optimally identify relevant genes while iteratively eliminating redundant ones based on leave-one-out cross-validation accuracy. In particular, we developed a new methodology to optimally update model parameters upon gene removal. The ieGENES algorithm was evaluated on six cancer microarray datasets and compared with existing methods. Classification accuracy and the number of differentially expressed genes (DEGs) identified were assessed. In terms of gene selection accuracy, ieGENES outperformed multiple wrapper methods on 5 of 6 datasets (Colon, Leukemia, Hepato, Glioma, and Breast Cancers), with statistically significant improvements (p<0.001). For the Colon dataset, ieGENES achieved 96.21% accuracy with 167 DEGs. The proposed ieGENES technique demonstrated superior performance in identifying DEGs for cancer diagnosis compared with existing techniques. It offers a promising tool for identifying biologically relevant genes in microarray data analysis and biomarker discovery for cancer research.
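The general idea of a wrapper that removes genes one at a time under leave-one-out cross-validation (LOOCV) can be illustrated with a minimal sketch. This is not the authors' PKMR model or their parameter-update methodology, neither of which is specified in the abstract. It assumes a linear kernel, a ridge-regularized least-squares classifier with labels coded as -1/+1, a greedy stopping rule, and synthetic data; the function names `loocv_accuracy` and `backward_eliminate` and the parameter `lam` are hypothetical. The LOOCV predictions are obtained from a single fit using the standard closed-form identity for ridge-type linear smoothers.

```python
import numpy as np

def loocv_accuracy(X, y, lam=1.0):
    """LOOCV accuracy of a ridge-regularized, linear-kernel least-squares
    classifier (assumed stand-in model, labels must be -1/+1)."""
    n = X.shape[0]
    K = X @ X.T                                  # linear kernel (assumption)
    H = K @ np.linalg.inv(K + lam * np.eye(n))   # smoother ("hat") matrix
    y_hat = H @ y
    h = np.diag(H)
    # Closed-form leave-one-out prediction for a ridge-type linear smoother.
    loo_pred = (y_hat - h * y) / (1.0 - h)
    return float(np.mean(np.sign(loo_pred) == y))

def backward_eliminate(X, y, lam=1.0, min_genes=1):
    """Greedy backward elimination (illustrative stopping rule): drop the
    gene whose removal gives the best LOOCV accuracy, as long as accuracy
    does not fall below the current best."""
    selected = list(range(X.shape[1]))
    best = loocv_accuracy(X[:, selected], y, lam)
    while len(selected) > min_genes:
        scores = []
        for g in selected:
            rest = [j for j in selected if j != g]
            scores.append((loocv_accuracy(X[:, rest], y, lam), g))
        acc, g = max(scores)
        if acc < best:          # every possible removal hurts accuracy: stop
            break
        selected.remove(g)
        best = acc
    return selected, best

# Synthetic toy data (not from the paper): 30 samples, 8 "genes",
# with only gene 0 made informative about the class label.
rng = np.random.default_rng(0)
y = np.where(rng.random(30) < 0.5, -1.0, 1.0)
X = rng.normal(size=(30, 8))
X[:, 0] += 2.0 * y

baseline = loocv_accuracy(X, y)
selected, best = backward_eliminate(X, y)
```

Each candidate removal here triggers a full refit, which becomes expensive with thousands of genes; the abstract indicates that ieGENES instead updates model parameters upon gene removal, a step this sketch does not attempt to reproduce.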