Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein-ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small-molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. We highlight significant challenges, including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases in existing data sets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
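
As an illustration of what treating proteins and ligands as a "language" can mean in practice, the following minimal sketch tokenizes a protein sequence and a ligand SMILES string and maps the tokens to integer ids of the kind an LSTM or transformer would consume. It is not taken from the review: the function names, the simplified SMILES pattern, and the choice of id 0 for padding and unknown tokens are all assumptions made for illustration.

```python
import re

# Simplified SMILES pattern (an assumption, not a full SMILES grammar):
# bracket atoms, the two-letter halogens Br and Cl, two-digit ring
# closures such as %12, and otherwise any single character.
SMILES_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_protein(sequence):
    """Split a protein sequence into one token per amino-acid letter."""
    return list(sequence.strip().upper())


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, branch, and ring tokens."""
    return SMILES_PATTERN.findall(smiles.strip())


def build_vocab(token_lists):
    """Assign ids starting at 1; id 0 is reserved for padding/unknown."""
    unique_tokens = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx + 1 for idx, tok in enumerate(unique_tokens)}


def encode(tokens, vocab):
    """Map tokens to ids, sending out-of-vocabulary tokens to 0."""
    return [vocab.get(tok, 0) for tok in tokens]


if __name__ == "__main__":
    protein_tokens = tokenize_protein("MKTAYIAK")                # toy peptide
    ligand_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")     # aspirin
    protein_vocab = build_vocab([protein_tokens])
    ligand_vocab = build_vocab([ligand_tokens])
    print(encode(protein_tokens, protein_vocab))
    print(encode(ligand_tokens, ligand_vocab))
```

In a PLI model, the two id sequences would typically be passed through separate embedding layers before a sequence encoder; character-level tokens are only one option, and subword or k-mer vocabularies are common alternatives.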

Authors

  • James Michels
    Department of Computer and Information Science, University of Mississippi, University, Mississippi 38677, United States.
  • Ramya Bandarupalli
    Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States.
  • Amin Ahangar Akbari
    Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States.
  • Thai Le
    Biomedical and Health Informatics, University of Washington, Seattle, WA, USA.
  • Hong Xiao
    Department of Computer and Information Science and Institute for Data Science, University of Mississippi, University, Mississippi 38677, United States.
  • Jing Li
    Department of Neurosurgery, Tianjin Medical University General Hospital, Tianjin, China.
  • Erik F Y Hom
    Department of Biology and Center for Biodiversity and Conservation Research, University of Mississippi, University, Mississippi 38677, United States.