Emerging opportunities of using large language models for translation between drug molecules and indications.

Journal: Scientific reports
Published Date:

Abstract

A drug molecule is a substance that changes an organism's mental or physical state. Every approved drug has an indication, which refers to the therapeutic use of that drug for treating a particular medical condition. While Large Language Models (LLMs), a generative Artificial Intelligence (AI) technique, have recently demonstrated effectiveness in translating between molecules and their textual descriptions, there remains a gap in research regarding their application to translation between drug molecules and indications (which describe the disease, condition or symptoms for which the drug is used), or vice versa. Addressing this challenge could greatly benefit the drug discovery process. The capability of generating a drug from a given indication would allow for the discovery of drugs aimed at specific diseases or targets and ultimately provide patients with better treatments. In this paper, we first propose a new task, the translation between drug molecules and corresponding indications, and then test existing LLMs on this new task. Specifically, we consider nine variations of the T5 LLM and evaluate them on two public datasets obtained from ChEMBL and DrugBank. Our experiments show early results of using LLMs for this task and provide a perspective on the state of the art. We also emphasize the current limitations and discuss future work that has the potential to improve performance on this task. The creation of molecules from indications, or vice versa, will allow for more efficient targeting of diseases and significantly reduce the cost of drug discovery, with the potential to revolutionize the field of drug discovery in the era of generative AI.

Authors

  • David Oniani
    Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minnesota, USA.
  • Jordan Hilsman
    Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA.
  • Chengxi Zang
    The Department of Population Health Sciences, Weill Cornell Medicine, New York, New York.
  • Junmei Wang
    Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA; Department of Pharmaceutical Sciences, School of Pharmacy, NIDA National Center of Excellence for Computational Drug Abuse Research, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA, 15213, USA. Electronic address: junmei.wang@pitt.edu.
  • Lianjin Cai
    Department of Pharmaceutical Sciences, Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA.
  • Jan Zawala
    Jerzy Haber Institute of Catalysis and Surface Chemistry, Polish Academy of Sciences, Kraków, Poland.
  • Yanshan Wang
    Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.