ieGENES: A machine learning method for selecting differentially expressed genes in cancer studies.

Journal: Journal of biomedical informatics
PMID:

Abstract

Gene selection is crucial for cancer classification using microarray data. To improve cancer classification accuracy, we developed a new wrapper method for gene selection called ieGENES. First, we proposed a parsimonious kernel machine regularization (PKMR) model, which applies ridge regularization to kernel-machine-driven classification to tackle multicollinearity and obtain stable estimates in high-dimensional settings. We then developed the ieGENES algorithm to optimally identify relevant genes while iteratively eliminating redundant ones based on leave-one-out cross-validation accuracy. In particular, we developed a new methodology to optimally update model parameters upon gene removal. The ieGENES algorithm was evaluated on six cancer microarray datasets and compared with existing methods in terms of classification accuracy and the number of differentially expressed genes (DEGs) identified. In terms of gene selection accuracy, ieGENES outperformed multiple wrapper methods on 5 out of 6 datasets (Colon, Leukemia, Hepato, Glioma, and Breast Cancers), with statistically significant improvements (p<0.001). For the Colon dataset, ieGENES achieved 96.21% accuracy with 167 DEGs. The proposed ieGENES technique demonstrated superior performance in identifying DEGs for cancer diagnosis compared with existing techniques. It offers a promising tool for identifying biologically relevant genes in microarray data analysis and biomarker discovery for cancer research.
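
The general idea of wrapper-style backward elimination driven by leave-one-out cross-validation (LOOCV) accuracy can be illustrated with a minimal sketch. This is not the authors' PKMR model or their parameter-update scheme, neither of which is specified in the abstract. It assumes a linear kernel, a ridge-regularized least-squares classifier without a bias term, class labels coded as +1/-1, and a full refit after each candidate removal. The function names `loo_accuracy` and `backward_eliminate` and the parameters `lam` and `min_genes` are hypothetical.

```python
import numpy as np


def loo_accuracy(X, y, lam=1.0):
    """LOOCV accuracy of a ridge-regularized linear-kernel classifier.

    Assumptions (not from the paper): linear kernel, no bias term,
    labels y in {+1, -1}, ridge penalty lam > 0.
    """
    K = X @ X.T                                   # linear kernel matrix (n x n)
    n = K.shape[0]
    # Hat matrix H = (K + lam*I)^-1 K, so that fitted values are H @ y.
    H = np.linalg.solve(K + lam * np.eye(n), K)
    fitted = H @ y
    # Exact leave-one-out residuals for kernel ridge regression:
    # e_i = (y_i - f_i) / (1 - H_ii), so no explicit refitting is needed.
    loo_resid = (y - fitted) / (1.0 - np.diag(H))
    loo_pred = y - loo_resid
    return float(np.mean(np.sign(loo_pred) == y))


def backward_eliminate(X, y, lam=1.0, min_genes=1):
    """Greedily drop one gene at a time while LOOCV accuracy does not decrease."""
    genes = list(range(X.shape[1]))
    best = loo_accuracy(X[:, genes], y, lam)
    while len(genes) > min_genes:
        # Score every candidate removal by the LOOCV accuracy of the reduced set.
        scores = [
            loo_accuracy(X[:, [g for g in genes if g != j]], y, lam)
            for j in genes
        ]
        k = int(np.argmax(scores))
        if scores[k] < best:                      # every removal hurts: stop
            break
        best = scores[k]
        genes.pop(k)                              # discard the most redundant gene
    return genes, best
```

Calling `backward_eliminate(X, y)` on a samples-by-genes matrix `X` returns the indices of the retained genes and the LOOCV accuracy of the reduced model. The sketch recomputes the kernel from scratch for every candidate, which is only practical for small gene sets; the abstract indicates that ieGENES instead updates model parameters upon gene removal.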

Authors

  • Xiao-Lei Xia
    Centre for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, BT9 7BL, UK. Electronic address: xxia01@qub.ac.uk.
  • Shang-Ming Zhou
    Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom.
  • Yunguang Liu
    Affiliated Hospital of Youjiang Medical University for Nationalities, Department of Pediatrics, Baise, PR China. Electronic address: ygliu02@gmail.com.
  • Na Lin
    Affiliated Hospital of Youjiang Medical University for Nationalities, Department of Pediatrics, Baise, PR China. Electronic address: jxee18@163.com.
  • Ian M Overton
    Health Data Research Wales and Northern Ireland, Queen's University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK; The Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, 97 Lisburn Road, Belfast BT9 7AE, UK. Electronic address: i.overton@qub.ac.uk.