Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

Journal: Journal of Biomedical Informatics
PMID:

Abstract

For each cancer type, only a few genes are informative. Owing to the so-called 'curse of dimensionality', gene selection remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS (a hybrid of the cuckoo optimization algorithm and harmony search), with a support vector machine as the classifier. The method was applied to four microarray datasets, and its performance was assessed by leave-one-out cross-validation. Comparison with other evolutionary algorithms suggested that the proposed method significantly outperforms them, selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type.
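The filter stage and the leave-one-out evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pearson`, `mrmr_select`, `loocv_accuracy`) are hypothetical, absolute Pearson correlation is assumed as a stand-in for the mutual information normally used in MRMR, a nearest-centroid rule is assumed in place of the support vector machine, and the COA-HS search itself is not reproduced because the abstract does not specify it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mrmr_select(columns, labels, k):
    """Greedy MRMR-style filter returning the indices of k genes (k >= 1).

    columns[j] holds the expression of gene j across all samples.
    ASSUMPTION: |Pearson r| replaces mutual information for both
    relevance (gene vs. label) and redundancy (gene vs. gene).
    """
    n_genes = len(columns)
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Start from the single most relevant gene.
    selected = [max(range(n_genes), key=lambda j: relevance[j])]
    while len(selected) < min(k, n_genes):
        def score(j):
            # Relevance minus mean redundancy with the genes already chosen.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        candidates = [j for j in range(n_genes) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


def loocv_accuracy(columns, labels, genes):
    """Leave-one-out accuracy on the chosen genes.

    ASSUMPTION: a nearest-centroid classifier stands in for the SVM.
    """
    n = len(labels)
    correct = 0
    for held in range(n):
        # Class centroids computed without the held-out sample.
        centroids = {}
        for cls in set(labels):
            idx = [i for i in range(n) if i != held and labels[i] == cls]
            if idx:
                centroids[cls] = [
                    sum(columns[g][i] for i in idx) / len(idx) for g in genes
                ]
        point = [columns[g][held] for g in genes]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == labels[held]
    return correct / n


# Made-up toy data: gene 1 is an exact rescaling of gene 0 (fully redundant),
# gene 2 carries a weaker but less redundant class signal.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_genes = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],
    [5.0, 4.0, 6.0, 7.0, 6.0, 8.0],
]
toy_selected = mrmr_select(toy_genes, toy_labels, 2)
toy_accuracy = loocv_accuracy(toy_genes, toy_labels, toy_selected)
```

On the toy data, the filter keeps only one of the two redundant genes and pairs it with gene 2, which is the behaviour the redundancy penalty is meant to produce. In a wrapper setup, a function like `loocv_accuracy` would serve as the fitness that the search algorithm evaluates for each candidate gene subset.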

Authors

  • V Elyasigomari
    School of Engineering and Materials Science, Queen Mary University of London, London E1 4NS, United Kingdom.
  • D A Lee
    School of Engineering and Materials Science, Queen Mary University of London, London E1 4NS, United Kingdom.
  • H R C Screen
    School of Engineering and Materials Science, Queen Mary University of London, London E1 4NS, United Kingdom.
  • M H Shaheed
    School of Engineering and Materials Science, Queen Mary University of London, London E1 4NS, United Kingdom. Electronic address: m.h.shaheed@qmul.ac.uk.