Cell fishing: A similarity based approach and machine learning strategy for multiple cell lines-compound sensitivity prediction.

Journal: PloS one
Published Date:

Abstract

Predicting the sensitivity of cell lines to a given set of compounds is an important factor in the optimization of in vitro assays. To date, the most common prediction strategies are based on machine learning or other quantitative structure-activity relationship (QSAR) approaches. In the present research, we propose and discuss a straightforward strategy that is not based on any learned model but relies exclusively on the chemical similarity of a query compound to reference compounds with annotated activity against cell lines. We also compare the performance of the proposed method to machine learning predictions on the same problem. A curated database of compound-cell line associations derived from ChEMBL version 22 was created for algorithm construction and cross-validation. Validation was done using 10-fold cross-validation and by testing the models on new data obtained from ChEMBL version 25. In terms of accuracy, both methods perform similarly, with values around 0.65 across 750 cell lines in 10-fold cross-validation experiments. By combining both methods it is possible to achieve a correct classification rate of 66% on more than 26000 newly reported interactions comprising 11000 new compounds. A web service implementing the described approaches (both similarity-based and machine learning-based models) is freely available at: http://bioquimio.udla.edu.ec/cellfishing.
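The similarity-only idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fingerprint type, similarity measure, neighbour count, or voting rule, so the Tanimoto coefficient, the parameters `k` and `min_similarity`, the similarity-weighted vote, and the function names below are all assumptions chosen for illustration. Fingerprints are represented as plain sets of "on" bit indices to keep the example self-contained.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# A reference record: (fingerprint bits, cell line name, annotated as active?)
Reference = Tuple[FrozenSet[int], str, bool]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_sensitivity(
    query: FrozenSet[int],
    references: Iterable[Reference],
    k: int = 3,                   # assumed neighbour count, not from the paper
    min_similarity: float = 0.3,  # assumed similarity cutoff, not from the paper
) -> Dict[str, bool]:
    """Predict, per cell line, whether the query compound is active.

    For each cell line, the k most similar reference compounds above the
    cutoff cast a similarity-weighted vote; no model is trained.
    """
    hits_per_line: Dict[str, List[Tuple[float, bool]]] = {}
    for fingerprint, cell_line, active in references:
        similarity = tanimoto(query, fingerprint)
        if similarity >= min_similarity:
            hits_per_line.setdefault(cell_line, []).append((similarity, active))

    predictions: Dict[str, bool] = {}
    for cell_line, hits in hits_per_line.items():
        nearest = sorted(hits, key=lambda hit: hit[0], reverse=True)[:k]
        total = sum(similarity for similarity, _ in nearest)
        if total == 0:
            continue  # no usable evidence for this cell line
        active_weight = sum(similarity for similarity, active in nearest if active)
        predictions[cell_line] = active_weight / total >= 0.5
    return predictions
```

Cell lines with no reference compound above the cutoff are simply absent from the result, which reflects that a similarity-based method can only make a call where annotated neighbours exist.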

Authors

  • E Tejera
    Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador.
  • I Carrera
    Departamento de Informática y Ciencias de la Computación, Escuela Politécnica Nacional, Quito, Ecuador.
  • Karina Jimenes-Vargas
    Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador.
  • V Armijos-Jaramillo
    Ingeniería en Biotecnología, Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito, Ecuador.
  • A Sánchez-Rodríguez
    Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador.
  • M Cruz-Monteagudo
    Center for Computational Science (CCS), University of Miami (UM), Miami, FL, United States of America.
  • Y Perez-Castillo
    Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador.