The influence of prompt engineering on large language models for protein-protein interaction identification in biomedical literature.

Journal: Scientific Reports
PMID:

Abstract

Identifying protein-protein interactions (PPIs) is a foundational task in biomedical natural language processing. While specialized models have been developed, the potential of general-domain large language models (LLMs) in PPI extraction, particularly for researchers without computational expertise, remains unexplored. This study evaluates the effectiveness of proprietary LLMs (GPT-3.5, GPT-4, and Google Gemini) in PPI prediction through systematic prompt engineering. We designed six prompting scenarios of increasing complexity, from basic interaction queries to sophisticated entity-tagged formats, and assessed model performance across multiple benchmark datasets (LLL, IEPA, HPRD50, AIMed, BioInfer, and PEDD). Carefully designed prompts effectively guided LLMs in PPI prediction. Gemini 1.5 Pro achieved the highest performance across most datasets, with notable F-scores in LLL (90.3%), IEPA (68.2%), HPRD50 (67.5%), and PEDD (70.2%). GPT-4 showed competitive performance, particularly in the LLL dataset (87.3%). We identified and addressed a positive prediction bias, demonstrating improved performance after evaluation refinement. While not surpassing specialized models, general-purpose LLMs with appropriate prompting strategies can effectively perform PPI prediction tasks, offering valuable tools for biomedical researchers without extensive computational expertise.
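To make the described setup concrete, below is a minimal illustrative sketch of what an entity-tagged PPI prompt of the kind mentioned in the abstract might look like. The tag format, prompt wording, example sentence, and the optional OpenAI chat-completions call are assumptions for illustration only, not the exact prompts or pipeline used in the study.

    # Illustrative sketch (not the authors' exact prompts): build an
    # entity-tagged PPI prompt and optionally query an OpenAI chat model.
    import os

    def build_ppi_prompt(sentence: str, protein_a: str, protein_b: str) -> str:
        """Tag the two candidate proteins and ask for a yes/no interaction decision."""
        tagged = (sentence
                  .replace(protein_a, f"<p1>{protein_a}</p1>", 1)
                  .replace(protein_b, f"<p2>{protein_b}</p2>", 1))
        return (
            "You are a biomedical text-mining assistant.\n"
            f"Sentence: {tagged}\n"
            "Question: Does this sentence describe an interaction between the "
            "protein tagged <p1> and the protein tagged <p2>? "
            "Answer strictly with 'yes' or 'no'."
        )

    if __name__ == "__main__":
        # Hypothetical LLL-style sentence used only to demonstrate the tagging.
        prompt = build_ppi_prompt(
            "GerE stimulates cotD transcription and inhibits cotA transcription.",
            "GerE", "cotD")
        print(prompt)

        # Optionally send the prompt to a chat model if an API key is configured.
        if os.environ.get("OPENAI_API_KEY"):
            from openai import OpenAI
            client = OpenAI()
            reply = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )
            print(reply.choices[0].message.content)

A constrained yes/no answer format like this also makes it straightforward to score predictions against benchmark labels, which is relevant to the positive prediction bias the abstract reports identifying and correcting during evaluation.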

Authors

  • Yung-Chun Chang
    Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan.
  • Ming-Siang Huang
    Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan.
  • Yi-Hsuan Huang
    Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan.
  • Yi-Hsuan Lin
    Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan.