Optimal Descriptor Subset Search via Chemical Information and Target Activity-Guided Algorithm for Antimicrobial Peptide Prediction.

Journal: Journal of Chemical Information and Modeling

Abstract

Antimicrobial peptides (AMPs) have emerged as a promising alternative to conventional drugs because of their potential for combating multidrug-resistant pathogens. Various computational approaches have been developed for AMP prediction, ranging from shallow learning methods to advanced deep learning techniques. More recently, shallow learning models built on self-learned features derived from protein language models have also been studied. However, the performance of shallow learning AMP models depends strongly on the quality of descriptors obtained through manual feature engineering, an approach that may miss crucial information because it assumes that the initial descriptor set already captures all relevant information. AExOp-DCS was introduced as an automatic feature domain optimization algorithm that identifies the "optimal" descriptor set, guided by the chemical structure and biological activity of the compounds under study. QSAR models built on AExOp-DCS-optimized descriptors outperform those built on nonoptimized sets. In this study, we explore the use of AExOp-DCS to identify optimal descriptor subsets for AMP modeling. Experimental results show that the descriptors returned by AExOp-DCS carry information comparable to that of the descriptors used in top-performing models while exhibiting higher discriminative capacity. Models built on the AExOp-DCS descriptors achieved performance comparable to state-of-the-art approaches while using fewer descriptors, suggesting a more efficient modeling process. By reducing dimensionality without sacrificing accuracy, this approach contributes to more efficient computational pipelines for AMP discovery. Finally, AExOp-DCS-SEQ, a Java tool for peptide descriptor search and AMP classification, is freely available to researchers.
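For orientation only, the sketch below shows the kind of conventional pipeline the abstract contrasts with: a fixed, hand-chosen peptide descriptor set (here, amino acid composition) narrowed by a univariate class-separation filter (a Fisher-style score). It is not AExOp-DCS, whose search procedure the abstract does not describe. The descriptor family, the scoring rule, the function names (`aa_composition`, `fisher_score`, `select_descriptors`), the 1/0 label convention, and the toy sequences are all hypothetical choices made for illustration.

```python
from statistics import mean, pvariance

# Hypothetical descriptor family: fractions of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in the sequence."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def fisher_score(pos: list[float], neg: list[float]) -> float:
    """Fisher-style score: squared gap between class means over summed variances."""
    num = (mean(pos) - mean(neg)) ** 2
    den = pvariance(pos) + pvariance(neg)
    if den == 0.0:
        # Constant within each class: separable only if the class means differ.
        return 0.0 if num == 0.0 else float("inf")
    return num / den


def select_descriptors(sequences: list[str], labels: list[int], k: int) -> list[str]:
    """Keep the k composition descriptors that best separate label 1 (AMP) from 0."""
    X = [aa_composition(s) for s in sequences]
    scored = []
    for j, aa in enumerate(AMINO_ACIDS):
        pos = [x[j] for x, y in zip(X, labels) if y == 1]
        neg = [x[j] for x, y in zip(X, labels) if y == 0]
        scored.append((fisher_score(pos, neg), aa))
    # Stable sort: ties keep the AMINO_ACIDS order.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [aa for _, aa in scored[:k]]


if __name__ == "__main__":
    # Toy, hypothetical data: K-rich "AMPs" vs D-rich "non-AMPs".
    seqs = ["KKKKLLLLAA", "KKKKKLLLAA", "DDDDLLLLAA", "DDDDDLLLAA"]
    labels = [1, 1, 0, 0]
    print(select_descriptors(seqs, labels, k=2))  # prints ['D', 'K']
```

Because each descriptor is scored independently, a univariate filter like this ignores redundancy between descriptors, and it can only narrow a descriptor set that was fixed in advance.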

Authors

  • Luis A García-González
    Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas, La Habana, Cuba.
  • Yovani Marrero-Ponce
    Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional, Colegio de Ciencias de la Salud, Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador.
  • César R García-Jacas
    Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México.
  • Sergio A Aguila Puentes
    Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México, Km. 107 Carretera Tijuana-Ensenada, Ensenada, Baja California C. P. 22860, México.