The influence of prompt engineering on large language models for protein-protein interaction identification in biomedical literature.
Journal:
Scientific Reports
PMID:
40319086
Abstract
Identifying protein-protein interactions (PPIs) is a foundational task in biomedical natural language processing. While specialized models have been developed, the potential of general-domain large language models (LLMs) in PPI extraction, particularly for researchers without computational expertise, remains unexplored. This study evaluates the effectiveness of proprietary LLMs (GPT-3.5, GPT-4, and Google Gemini) in PPI prediction through systematic prompt engineering. We designed six prompting scenarios of increasing complexity, from basic interaction queries to sophisticated entity-tagged formats, and assessed model performance across multiple benchmark datasets (LLL, IEPA, HPRD50, AIMed, BioInfer, and PEDD). Carefully designed prompts effectively guided LLMs in PPI prediction. Gemini 1.5 Pro achieved the highest performance across most datasets, with notable F-scores in LLL (90.3%), IEPA (68.2%), HPRD50 (67.5%), and PEDD (70.2%). GPT-4 showed competitive performance, particularly in the LLL dataset (87.3%). We identified and addressed a positive prediction bias, demonstrating improved performance after evaluation refinement. While not surpassing specialized models, general-purpose LLMs with appropriate prompting strategies can effectively perform PPI prediction tasks, offering valuable tools for biomedical researchers without extensive computational expertise.
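As an illustration of the abstract's "entity-tagged" prompting scenario, a minimal sketch is shown below; the prompt wording, the example sentence, and the use of the OpenAI chat API are assumptions for illustration, not the paper's actual templates or pipeline.

```python
# Minimal sketch of an entity-tagged PPI prompt (illustrative only; the
# paper's exact prompt templates and evaluation pipeline are not reproduced).
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical LLL-style sentence with the two candidate proteins tagged.
sentence = (
    "The <p1>GerE</p1> protein stimulates transcription of the cotD gene "
    "by <p2>sigma-K RNA polymerase</p2>."
)

prompt = (
    "You are an expert biomedical curator. The sentence below contains two "
    "proteins marked with <p1>...</p1> and <p2>...</p2> tags.\n"
    f"Sentence: {sentence}\n"
    "Do the two tagged proteins interact? Answer only 'yes' or 'no'."
)

response = client.chat.completions.create(
    model="gpt-4",    # or another chat model such as gpt-3.5-turbo
    temperature=0,    # deterministic answers simplify binary evaluation
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected: "yes" or "no"
```

Collecting such yes/no answers over every tagged protein pair in a benchmark corpus (e.g., LLL or AIMed) would allow computing precision, recall, and F-score against the gold interaction labels, in the spirit of the evaluation described above.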