Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses.

Journal: Scientific Reports
PMID:

Abstract

The druggable proteome refers to proteins that can bind small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The best classifier was obtained with the support vector machine method, using 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was complemented with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify the highest-affinity drugs capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, 23 of which showed unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .
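
The tri-amino acid composition descriptors mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code (their scripts are in the repository linked above); it only shows one common way to compute such descriptors, assuming the 20 standard amino acids, overlapping windows of three residues, and normalization by the number of valid windows. The function name `tripeptide_composition` and the toy sequence are hypothetical. The sketch yields all 8000 possible descriptors; how the 200 used by the final model were selected is not stated in the abstract and is not shown here.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return the 8000 tripeptide frequencies of a protein sequence.

    Overlapping windows of three residues are counted. Windows that
    contain a non-standard residue (e.g. 'X') are skipped, and counts
    are divided by the number of windows actually counted.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or without any valid window.
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


# Toy sequence for illustration only (not a real protein).
features = tripeptide_composition("MKTAYIAKQR")
print(len(features), features[INDEX["MKT"]])
```

Vectors of this kind, one per protein, could then be passed to any standard classifier such as a support vector machine and evaluated with cross-validation.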

Authors

  • Andrés López-Cortés
    Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Mariscal Sucre Avenue, Quito, 170129, Ecuador. aalc84@gmail.com.
  • Alejandro Cabrera-Andrade
    RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, 15071, Spain.
  • Gabriela Echeverría-Garcés
    Centro de Referencia Nacional de Genómica, Secuenciación y Bioinformática, Instituto Nacional de Investigación en Salud Pública "Leopoldo Izquieta Pérez", Quito, Ecuador.
  • Paulina Echeverría-Espinoza
    Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
  • Micaela Pineda-Albán
    Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
  • Nicole Elsitdie
    Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
  • José Bueno-Miño
    Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
  • Carlos M Cruz-Segundo
    RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain.
  • Julian Dorado
    Information and Communications Technologies Department, Faculty of Computer Science, University of A Coruna, Campus de Elviña s/n, 15071 A Coruña, Spain. Electronic address: julian@udc.es.
  • Alejandro Pazos
    Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, A Coruña, Spain.
  • Humberto Gonzáles-Díaz
    Department of Organic Chemistry II, University of the Basque Country UPV/EHU, Leioa 48940, Biscay, Spain.
  • Yunierkis Pérez-Castillo
  • Eduardo Tejera
  • Cristian R Munteanu
    Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160. crm.publish@gmail.com.