BiomedRAG: A retrieval augmented large language model for biomedicine.

Journal: Journal of biomedical informatics
PMID:

Abstract

Retrieval-augmented generation (RAG) enhances the performance of large language models (LLMs) by retrieving knowledge from an established database. However, these models retrieve information at the sentence or paragraph level, potentially introducing noise and degrading generation quality. To address these issues, we propose BiomedRAG, a novel framework that directly feeds automatically retrieved chunk-based documents into the LLM. Our evaluation of BiomedRAG across four biomedical natural language processing tasks using eight datasets demonstrates that the proposed framework not only improves performance by 9.95% on average but also achieves state-of-the-art results, surpassing various baselines by 4.97%. BiomedRAG paves the way for more accurate and adaptable LLM applications in the biomedical domain.
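
The general idea of feeding retrieved chunks directly into the LLM input can be illustrated with a minimal sketch. This is not the BiomedRAG implementation: the abstract does not describe the retriever, so a plain bag-of-words cosine similarity stands in for whatever chunk scoring the framework uses, and every name here (split_into_chunks, retrieve_chunks, build_prompt, the toy corpus, the prompt layout) is a hypothetical chosen for illustration.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=10):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def bag_of_words(text):
    """Lowercased token counts; an assumed stand-in for a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_chunks(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Place the retrieved chunks ahead of the input (hypothetical layout)."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Context:\n{context}\n\nInput: {query}\nOutput:"


# Toy corpus (illustrative only, not from the paper's datasets).
corpus = (
    "Aspirin inhibits platelet aggregation and reduces the risk of stroke. "
    "Metformin lowers blood glucose in patients with type 2 diabetes."
)
query = "Which drug lowers blood glucose?"
chunks = split_into_chunks(corpus, chunk_size=10)
top = retrieve_chunks(query, chunks, top_k=1)
prompt = build_prompt(query, top)
```

The resulting prompt string would then be passed to an LLM of the reader's choice. The chunk size and top_k values here are arbitrary, and a real system would likely replace the bag-of-words scorer with a trained retriever.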

Authors

  • Mingchen Li
    Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA.
  • Halil Kilicoglu
    School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, United States.
  • Hua Xu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Rui Zhang
    Department of Cardiology, Zhongda Hospital, Medical School of Southeast University, Nanjing, China.