Filter feature selectors in the development of binary QSAR models.

Journal: SAR and QSAR in environmental research
Published Date:

Abstract

The application of machine learning methods to the construction of quantitative structure-activity relationship models is a complex computational problem in which reducing the dimensionality of the molecular structure representation plays a fundamental role in predicting a target activity. Feature selection as a pre-processing step has been reported to be effective for dimensionality reduction, yielding simpler and more understandable models. In this paper, a comparative performance study of 13 state-of-the-art filter methods for feature selection is conducted. Structure-activity relationship models are constructed using three widely used classifiers and a diverse collection of datasets. The algorithms are compared using robust statistical tests. According to the experimental results, there are substantial differences in performance among the evaluated feature selection methods. The methods that exhibit the best performance are correlation-based feature selection, fast clustering-based feature selection and the set cover method.
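
As an illustration of what a filter selector does, the following is a minimal sketch of correlation-based feature selection with greedy forward search. It is not the implementation evaluated in the paper. It assumes the commonly cited merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), and substitutes absolute Pearson correlation for the correlation measure; the function names and the toy binary descriptors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(subset, columns, y):
    """Merit of a feature subset: high feature-class correlation, low inter-feature correlation."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(columns[j], y)) for j in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, y):
    """Greedy forward search: add the best feature while the merit strictly improves."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, j = max((cfs_merit(selected + [j], columns, y), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


# Toy data (hypothetical): each column is one binary descriptor over 8 compounds.
y = [1, 1, 1, 1, 0, 0, 0, 0]          # binary activity label
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],         # descriptor 0: matches the label
    [1, 1, 1, 0, 0, 0, 0, 0],         # descriptor 1: near-duplicate of descriptor 0
    [1, 0, 1, 0, 1, 0, 1, 0],         # descriptor 2: uncorrelated with the label
]
selected, best = cfs_forward(columns, y)
print(selected, best)
```

On this toy data the search keeps only descriptor 0: the redundant and the uninformative descriptors both lower the merit, which is the behaviour a filter of this kind is meant to show before any classifier is trained.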

Authors

  • G Cerruela García
    Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.
  • J Pérez-Parras Toledano
    Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.
  • A de Haro García
    Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.
  • N García-Pedrajas
    Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain.