Large Language Model Agent for Modular Task Execution in Drug Discovery
Journal: arXiv
Published Date: Jun 26, 2025
Abstract
We present a modular framework powered by large language models (LLMs) that
automates and streamlines key tasks across the early-stage computational drug
discovery pipeline. By combining LLM reasoning with specialized tools, the
framework performs biomedical data retrieval, domain-specific question
answering, molecular generation, property prediction, property-aware molecular
refinement, and 3D protein-ligand structure generation. In a case study
targeting BCL-2 in lymphocytic leukemia, the agent autonomously retrieved
relevant biomolecular information (FASTA sequences, SMILES
representations, and literature) and answered mechanistic questions with
improved contextual accuracy over standard LLMs. It then generated chemically
diverse seed molecules and predicted 67 ADMET-related properties, which guided
iterative molecular refinement. Within a pool of 194 molecules, two
refinement rounds increased the number of molecules with a quantitative
estimate of drug-likeness (QED) above 0.6 from 34 to 55, and the number
passing at least four of five empirical drug-likeness rules from 29 to 52.
The framework also employed Boltz-2 to generate 3D
protein-ligand complexes and provide rapid binding affinity estimates for
candidate compounds. These results demonstrate that the approach effectively
supports molecular screening, prioritization, and structure evaluation. Its
modular design enables flexible integration of evolving tools and models,
providing a scalable foundation for AI-assisted therapeutic discovery.
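The framework pairs LLM reasoning with interchangeable domain tools. The Python sketch below shows one way such modularity could be organized: a name-to-function tool registry plus a planner that dispatches to it. It is not the authors' implementation. `ToolRegistry`, `plan_steps`, `run_agent`, and the tool names are hypothetical, and the LLM planner is replaced by a fixed plan.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


@dataclass
class ToolRegistry:
    """Hypothetical registry mapping tool names to callables."""

    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, fn: Tool) -> None:
        self.tools[name] = fn

    def call(self, name: str, arg: str) -> str:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](arg)


def plan_steps(task: str) -> List[Tuple[str, str]]:
    """Stand-in for the LLM planner: returns (tool name, argument) pairs.

    A real agent would prompt an LLM to choose the tools; here the plan is fixed.
    """
    return [
        ("retrieve_sequence", task),
        ("generate_molecules", task),
        ("predict_properties", task),
    ]


def run_agent(task: str, registry: ToolRegistry) -> List[str]:
    # Dispatch each planned step to the registered tool, in order.
    return [registry.call(name, arg) for name, arg in plan_steps(task)]


if __name__ == "__main__":
    registry = ToolRegistry()
    # Stub tools; real ones would wrap databases, generative models, or predictors.
    registry.register("retrieve_sequence", lambda t: f"FASTA for {t}")
    registry.register("generate_molecules", lambda t: f"seed SMILES for {t}")
    registry.register("predict_properties", lambda t: f"ADMET table for {t}")
    for output in run_agent("BCL-2", registry):
        print(output)
```

The agent loop only looks up tools by name. Swapping in a new generator or structure predictor (for instance, a wrapper around Boltz-2) therefore means registering another function, not changing the loop.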
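The abstract reports two screening counts over a 194-molecule pool: molecules with QED > 0.6, and molecules passing at least four of five empirical drug-likeness rules. The minimal Python sketch below shows how such counts could be tallied. It assumes per-molecule properties have already been predicted (for example with RDKit). The abstract does not name the five rules, so this choice is an assumption: simplified versions of the commonly used Lipinski, Ghose, Veber, Egan, and Muegge filters. The function names are hypothetical.

```python
from typing import Callable, Dict, List

Props = Dict[str, float]  # assumed keys: qed, mw, logp, hbd, hba, tpsa, rotb


def lipinski(p: Props) -> bool:
    # Rule of five: pass with at most one violation.
    violations = sum([p["mw"] > 500, p["logp"] > 5, p["hbd"] > 5, p["hba"] > 10])
    return violations <= 1


def ghose(p: Props) -> bool:
    # Simplified: MW and logP windows only (molar refractivity and atom count omitted).
    return 160 <= p["mw"] <= 480 and -0.4 <= p["logp"] <= 5.6


def veber(p: Props) -> bool:
    return p["rotb"] <= 10 and p["tpsa"] <= 140


def egan(p: Props) -> bool:
    return p["logp"] <= 5.88 and p["tpsa"] <= 131.6


def muegge(p: Props) -> bool:
    # Simplified: ring, carbon, and heteroatom counts omitted.
    return (200 <= p["mw"] <= 600 and -2 <= p["logp"] <= 5 and p["tpsa"] <= 150
            and p["rotb"] <= 15 and p["hba"] <= 10 and p["hbd"] <= 5)


RULES: List[Callable[[Props], bool]] = [lipinski, ghose, veber, egan, muegge]


def rules_passed(p: Props) -> int:
    # Count how many of the five filters this molecule satisfies.
    return sum(rule(p) for rule in RULES)


def summarize(pool: List[Props], qed_cut: float = 0.6, min_rules: int = 4) -> Dict[str, int]:
    # Tally the two screening metrics over a pool of molecules.
    return {
        "pool": len(pool),
        "qed_above_cut": sum(p["qed"] > qed_cut for p in pool),
        "passing_rules": sum(rules_passed(p) >= min_rules for p in pool),
    }
```

Each rule is a separate predicate. This keeps the "at least four of five" threshold to a single parameter (`min_rules`) and lets individual rules be replaced or tightened independently.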