Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM
Journal: arXiv
Published Date: May 2, 2025
Abstract
With the advent of artificial intelligence (AI), many researchers are
attempting to extract structured information from document-level biomedical
literature by fine-tuning large language models (LLMs). However, they face
significant challenges, such as the need for expensive hardware like
high-performance GPUs and the high labor costs associated with annotating
training datasets, especially in the biomedical realm. Recent research on LLMs,
such as GPT-4 and Llama3, has shown promising performance in zero-shot
settings, inspiring us to explore a novel approach to achieve the same results
from unannotated full documents using general LLMs with lower hardware and
labor costs. Our approach combines two major stages: named entity recognition
(NER) and relation extraction (RE). NER identifies chemical, disease and gene
entities from the document with synonym and hypernym extraction using an LLM
with a crafted prompt. RE extracts relations between entities based on
predefined relation schemas and prompts. To enhance the effectiveness of the
prompts, we propose a five-part template structure and scenario-based prompt
design principles, along with an evaluation method to systematically assess the
prompts. Finally, we evaluated our approach against fine-tuning and pre-trained
models on two biomedical datasets: ChemDisGene and CDR. The experimental
results indicate that our proposed method can achieve accuracy comparable
to that of fine-tuning and pre-trained models, but with reduced human and
hardware expenses.
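
The two-stage flow described above (NER first, then RE constrained by a predefined relation schema) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the prompt wording, the JSON output format, the function names, and the `llm` callable (any function that maps a prompt string to a response string) are all hypothetical, and the paper's five-part template is not reproduced here.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt wording; the paper's actual templates are not shown here.
NER_PROMPT = (
    "List every chemical, disease and gene entity in the document below, "
    "with synonyms and hypernyms, as a JSON list of objects with keys "
    "'name', 'type', 'synonyms', 'hypernyms'.\n\nDocument:\n{document}"
)
RE_PROMPT = (
    "Given the entities {entities} and the allowed relation types {schema}, "
    "return a JSON list of objects with keys 'head', 'relation', 'tail' "
    "for relations stated in the document.\n\nDocument:\n{document}"
)

LLM = Callable[[str], str]  # assumed interface: prompt in, raw text out


def extract_entities(document: str, llm: LLM) -> List[Dict]:
    """Stage 1 (NER): ask the model for entities and parse its JSON reply."""
    return json.loads(llm(NER_PROMPT.format(document=document)))


def extract_relations(
    document: str, entities: List[Dict], schema: List[str], llm: LLM
) -> List[Dict]:
    """Stage 2 (RE): ask for relations among the stage-1 entities."""
    names = [entity["name"] for entity in entities]
    prompt = RE_PROMPT.format(
        entities=json.dumps(names),
        schema=json.dumps(schema),
        document=document,
    )
    triples = json.loads(llm(prompt))
    # Keep only triples that use a schema relation and known entity names.
    known = set(names)
    return [
        t for t in triples
        if t["relation"] in schema and t["head"] in known and t["tail"] in known
    ]


def run_pipeline(document: str, schema: List[str], llm: LLM) -> List[Dict]:
    """Run NER, then RE, on one unannotated document."""
    entities = extract_entities(document, llm)
    return extract_relations(document, entities, schema, llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so a canned stub can stand in for the model when checking the parsing and filtering logic.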