An integrated approach for key gene selection and cancer phenotype classification: Improving diagnosis and prediction.

Journal: Computers in biology and medicine

Published Date: Jul 5, 2025

Abstract

The identification of key features and reliable phenotype classification remains pivotal in cancer research, with direct implications for early diagnosis, prognosis, treatment optimization, and cost reduction in healthcare. This study introduces a hybrid model that integrates statistical and machine learning (ML) algorithms to enhance feature selection and improve classification accuracy for cancer phenotypes. Five well-known statistical tests (LIMMA, SAM, ANOVA, KW-test, and t-test) are employed to identify significant features based on statistical decision markers. The dominant features identified across both binary and multi-class datasets are then used for cancer phenotype classification using various ML methods, including LDA, LR, NB, GPC, KNN, ANN, SVM (with radial, polynomial, and linear kernels), and RF. The model's robustness is validated using eight distinct microarray gene expression datasets, combined with various resampling protocols. The results show consistent improvements over previous benchmarks in the literature, with the RF classifier performing better in binary classification tasks and SVM-r demonstrating superior performance in multi-class settings. Additionally, the analysis of the bladder cancer dataset led to the identification of 13 key genes (MYH11, CCN1, FHL1, MYL9, EFEMP1, FILIP1L, RGS2, MATN2, CALD1, TNC, PALLD, ADAMTS9-AS2, and CELF2) that demonstrated strong discriminatory power. These genes were further validated through enrichment in relevant GO terms and KEGG pathways, emphasizing their diagnostic and prognostic significance. Moreover, Gene-TF and Gene-miRNA network analyses highlighted critical regulators, including TFs such as CYR61, SMAD4, SOX2, TP63, and AR, along with miRNAs such as hsa-let-7b-5p, hsa-miR-34a-5p, hsa-let-7a-5p, hsa-let-7c-5p, and hsa-miR-16-5p, underscoring the functional impact of the selected features.
In conclusion, the proposed approach effectively generates a streamlined set of optimal features, providing valuable biological insights and laying the groundwork for more accurate and effective tools in cancer diagnosis and prediction.
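
The two-stage idea described in the abstract (statistical feature selection followed by ML classification under resampling) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only one of the five tests (a Welch t-statistic), only one of the listed classifiers (KNN), leave-one-out cross-validation as a stand-in for the unspecified resampling protocols, and synthetic toy data instead of microarray datasets. All function names, parameter values, and the choice to redo gene selection inside each fold are assumptions made for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, k):
    """Rank genes by |t| between class 0 and class 1; return the top-k column indices."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(welch_t(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def knn_predict(X_train, y_train, x, genes, n_neighbors=3):
    """Majority vote among the nearest training samples, using only the selected genes."""
    dists = sorted(
        (math.dist([row[g] for g in genes], [x[g] for g in genes]), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = [lab for _, lab in dists[:n_neighbors]]
    return max(set(votes), key=votes.count)


def loocv_accuracy(X, y, k=5, n_neighbors=3):
    """Leave-one-out accuracy; genes are re-selected in every fold (an assumption
    of this sketch) so the held-out sample never influences feature selection."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_tr, y_tr, k)
        correct += knn_predict(X_tr, y_tr, X[i], genes, n_neighbors) == y[i]
    return correct / len(X)


def make_toy_data(n_per_class=15, n_genes=50, n_informative=5, shift=4.0, seed=0):
    """Synthetic 'expression' matrix: the first n_informative genes are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            if lab == 1:
                for g in range(n_informative):
                    row[g] += shift
            X.append(row)
            y.append(lab)
    return X, y


X, y = make_toy_data()
selected = top_genes(X, y, 5)      # indices of the highest-ranked genes
accuracy = loocv_accuracy(X, y)    # fraction of held-out samples classified correctly
print("selected genes:", sorted(selected))
print("LOOCV accuracy:", accuracy)
```

On real expression data, the same structure would apply with the ranking step replaced by whichever tests are in use and the classifier swapped for any of the methods named in the abstract.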

Authors

Md Matiur Rahaman
Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China.

Bandhan Sarker
Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200240, PR China.

Muhammad Habibulla Alamin
School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, PR China.

Farzana Ferdousi
Department of Statistics, Faculty of Science, Gopalganj Science and Technology University, Gopalganj, 8100, Bangladesh.

External Resources

PubMed ID: 40618695
