GPCRSPACE: A New GPCR Real Expanded Library Based on Large Language Models Architecture and Positive Sample Machine Learning Strategies.

Journal: Journal of medicinal chemistry
Published Date:

Abstract

G protein-coupled receptors (GPCRs) are essential to numerous physiological processes, and the search for novel therapeutics that target them is central to drug discovery. Despite the abundance of GPCR-targeting drugs, many receptors still lack selective modulators, which indicates significant untapped therapeutic potential. To bridge this gap, we introduce GPCRSPACE, a novel GPCR-focused library of purchasable, real compounds built with a large language model architecture for G protein-coupled receptors (GPCR LLM). Unlike traditional machine learning models, GPCR LLM is trained with a positive-sample machine learning strategy and does not require constructing any negative samples. This not only reduces false negatives but also reduces the time spent labeling negative samples. By learning the chemical space of GPCR-targeting molecules, GPCR LLM accelerates the identification and screening of compounds that may interact with GPCRs. GPCRSPACE, built on GPCR LLM, outperforms existing chemical data sets in synthesizability, structural diversity, and GPCR-likeness, making it a valuable tool for GPCR drug discovery.
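
The positive-sample idea described in the abstract can be illustrated with a deliberately small sketch. It is not the GPCR LLM architecture, which the abstract does not specify. It assumes a character-level bigram model over SMILES strings as a stand-in for a language model, and the function names (`train_bigram`, `mean_log_likelihood`) and the example strings are hypothetical placeholders rather than curated GPCR ligands. The model is fitted only on molecules labeled as positives, and a candidate is then scored by how likely it is under that model, so no negative set is ever constructed.

```python
import math
from collections import Counter

START, END = "^", "$"  # sentinel symbols, assumed absent from the SMILES strings


def train_bigram(smiles_list):
    """Count character bigrams over positive examples only (no negatives used)."""
    pair_counts = Counter()
    context_counts = Counter()
    vocab = {END}
    for s in smiles_list:
        seq = START + s + END
        vocab.update(s)
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts, vocab


def mean_log_likelihood(s, model):
    """Average per-transition log-probability of s, with add-one smoothing."""
    pair_counts, context_counts, vocab = model
    seq = START + s + END
    v = len(vocab) + 1  # one extra slot for characters never seen in training
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + v)
        total += math.log(p)
    return total / (len(seq) - 1)


# Toy "positive" set: arbitrary amine-like SMILES used purely for illustration.
positives = ["CCN(CC)CC", "CCNCC", "CN(C)CCc1ccccc1"]
model = train_bigram(positives)

# A string resembling the training set should score higher than an unrelated one.
similar = mean_log_likelihood("CCNCC", model)
unrelated = mean_log_likelihood("[Na+].[Cl-]", model)
print(similar > unrelated)
```

In this sketch the score plays the role of a crude "GPCR-likeness" ranking: candidates from a purchasable library would be sorted by likelihood and the top of the list kept. A real system would replace the bigram counts with a trained sequence model and would need a principled threshold, which this example does not attempt.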

Authors

  • Shiming Chen
  • Feisheng Zhong
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China.